|
3 | 3 | Developing plugins for MMIF Python SDK |
4 | 4 | ====================================== |
5 | 5 |
|
6 | | - |
7 | 6 | Overview |
8 | 7 | -------- |
9 | 8 |
|
@@ -80,10 +79,41 @@ And the plugin code. |
80 | 79 | def help(): |
81 | 80 | return "location format: `<DOCUMENT_ID>.video`" |
82 | 81 |
|
83 | | -
|
84 | | -
|
85 | | -Bulit-in Document Location Scheme Plugins |
| 82 | +Built-in Document Location Scheme Plugins |
86 | 83 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
87 | 84 |
|
88 | | -At the moment, ``mmif-python`` PyPI distribution ships a built-in *docloc* plugin that support both ``http`` and ``https`` schemes. |
| 85 | +At the moment, the ``mmif-python`` PyPI distribution ships a built-in *docloc* plugin that supports both the ``http`` and ``https`` schemes. This plugin implements the caching described below, so repeated access to the same URL does not trigger multiple downloads. |
89 | 86 | Take a look at :mod:`mmif_docloc_http` module for details. |
| 87 | + |
| 88 | +Caching for Remote File Access |
| 89 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 90 | + |
| 91 | +When developing plugins that resolve remote document locations (e.g., ``http``, ``s3``, or custom schemes), it is highly recommended to implement caching to avoid repeated network requests or file downloads. Since ``mmif-python`` may call the ``resolve`` function multiple times for the same document location during processing, caching can significantly improve performance. |
| 92 | + |
| 93 | +A simple and effective approach is to use a module-level dictionary as a cache. Because Python modules are singletons (loaded once and cached in ``sys.modules``), this cache persists for the entire lifetime of the Python process, across multiple MMIF files and Document objects. |
| 94 | + |
| 95 | +Here's an example of how to implement caching in a plugin: |
| 96 | + |
| 97 | +.. code-block:: python |
| 98 | +
|
| 99 | + # mmif_docloc_myscheme/__init__.py |
| 100 | +
|
| 101 | + _cache = {} |
| 102 | +
|
| 103 | + def resolve(docloc): |
| 104 | + if docloc in _cache: |
| 105 | + return _cache[docloc] |
| 106 | +
|
| 107 | + # ... your resolution logic here ... |
| 108 | + resolved_path = do_actual_resolution(docloc) |
| 109 | +
|
| 110 | + _cache[docloc] = resolved_path |
| 111 | + return resolved_path |
| 112 | +
|
| 113 | +This pattern ensures that: |
| 114 | + |
| 115 | +* The first call to ``resolve`` performs the actual resolution (download, API call, etc.) |
| 116 | +* Subsequent calls for the same location return the cached result immediately |
| 117 | +* The cache is shared across all MMIF objects processed within the same Python process |
| 118 | + |
| 119 | +See :mod:`mmif_docloc_http` for a concrete example of this caching strategy in action. |