Commit 637d8d7

Merge branch 'release/0.27.0'
2 parents b249046 + f1cb679 commit 637d8d7

15 files changed: +139 -39 lines changed

CHANGELOG

Lines changed: 10 additions & 0 deletions
@@ -2,6 +2,16 @@
 ChangeLog
 *********

+0.27.0 (2018-07-19)
+===================
+- Feature: Support the Hypothes.is annotator toolbar on pdfs and files converted to pdf. This is
+  not enabled by default; see `docs/integrations.rst` for instructions on enabling it. NB: the
+  default urls, page titles, and document ids that hypothes.is gets from the rendered document when
+  running in an MFR context are not very useful, so MFR will try to provide more appropriate values to
+  the annotator. These may not be valid for all use cases, please see the document mentioned
+  above for details. (h/t @jamescdavis for helping to debug a race condition in the loader!)
+- Code: Don't let pytest descend into node_modules/. (thanks, @birdbrained!)
+
 0.26.0 (2018-06-22)
 ===================
 - Feature: Teach MFR to identify itself when requesting metadata from WaterButler. This will allow

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ Guide
    install
    quickstart
    overview
+   integrations
    code

 Project info

docs/integrations.rst

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+.. _integrations:
+
+Integrations
+============
+
+
+Hypothes.is annotator
+---------------------
+
+MFR supports loading the `Hypothes.is <https://hypothes.is/>`_ annotation sidebar on pdfs and files converted to pdf. Hypothes.is allows users to publicly comment and converse on internet-accessible files. The annotator is not automatically loaded; it must be signaled to turn on by the parent iframe. MFR also overrides some of the properties used by the sidebar to identify the annotation.
+
+
+Enabling
+^^^^^^^^
+
+The annotator is not loaded automatically for every MFR pdf render. The parent frame will need to send the ``startHypothesis`` event to the MFR iframe to start loading the annotator. If the iframe is created via ``mfr.js``, then this signal can be sent by calling ``.startHypothesis()`` on the Render object. If ``mfr.js`` is not used, then the signal can be sent by calling ``.postMessage()`` on the iframe:
+
+.. code-block:: javascript
+
+    $('iframe')[0].contentWindow.postMessage('startHypothesis', mfrUrl);
+
+When the iframe receives this event, it will override the pdf.js metadata the annotator extracts, then inject the Hypothes.is loader script into the iframe.
+
+Hypothes.is support can be completely disabled by setting the ``ENABLE_HYPOTHESIS`` flag to ``False`` in the pdf extension settings (``mfr.extensions.pdf.settings``). If running via the OSF's docker-compose, add ``PDF_EXTENSION_CONFIG_ENABLE_HYPOTHESIS=0`` to ``.docker-compose.mfr.env`` in the osf.io repo and recreate the container. If this flag is turned off, sending the ``startHypothesis`` event to the iframe will do nothing.
+
+
+Annotator metadata
+^^^^^^^^^^^^^^^^^^
+
+The annotator client links annotations to both the url of the document and an identifier embedded in the pdf. It also attaches the page title as metadata to the annotation. [#f1]_ In MFR, all three of these may be unsuitable for one reason or another, so MFR will override the properties that the client retrieves to provide more appropriate values. These properties are:
+
+**URL**: The MFR url can be complex, especially since it takes another url as a query parameter. Hypothes.is can handle reordering of the top-level parameters, but any change to the internal url will be taken as a new url, causing annotations to be lost. In addition, the url is used by Hypothes.is to provide share links and "view-in-context" links. Visiting an MFR render url will load the iframe, but without an external frame to send the ``startHypothesis`` signal, the annotations will never be loaded. Visiting an MFR export url will start a download of the document, with no chance of showing annotations. Instead, MFR sets the annotation url to the parent frame, which is expected to be simpler and provide more context.
+
+**Document ID**: The document ID is an identifier embedded in the pdf. pdf.js will extract this value, or if it is not present, return the md5 hash of the first 1024 bytes of data in the pdf. User-provided pdfs will *usually* contain IDs, but may not. If the pdf is updated, there is no guarantee that the ID will be preserved across revisions. If the ID changes, the document could lose its annotations. PDFs exported by LibreOffice do not contain any identifiers and may change unpredictably. For these reasons, MFR exports a stable identifier that should persist across revisions. The stable ID is defined by the auth provider. The OSF auth provider uses a hash of file metadata that is particular to that file and unlikely to change. MFR does not modify the file, instead overwriting the identifier detected by pdf.js, which is then read by the annotator client.
+
+**Title**: The annotator will derive the annotation page title from the pdf title. Similar to Document IDs, user-provided pdfs may or may not have a title. LibreOffice-exported pdfs do not have an embedded title. If an embedded title isn't found, the annotator will fall back to the iframe document's title, which, if not set, will default to the path part of the iframe url. This results in annotation titles of "render" or "export", with no distinguishing attributes from other MFR annotations. MFR works around this by updating the pdf.js-detected title and page title with the source file's name.
+
+.. rubric:: Footnotes
+
+.. [#f1] If the page title changes between annotations, the client will send the new page title with new annotations, but the Hypothes.is aggregator will discard that and `use the first title received <https://github.com/hypothesis/h/blob/8410ff35150ea600c02458e4558a67db7c926816/h/activity/bucketing.py#L27>`_ for that identifier.
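
The fallback fingerprint described under **Document ID** can be approximated in a few lines of Python. This is only a sketch of the behavior as the document above describes it (an md5 hash of the first 1024 bytes of the file), not pdf.js's actual implementation. The function name ``fallback_fingerprint`` and the sample byte strings are made up for illustration.

```python
import hashlib


def fallback_fingerprint(pdf_bytes: bytes, prefix_len: int = 1024) -> str:
    """Hash only the first ``prefix_len`` bytes of the file (hypothetical helper)."""
    return hashlib.md5(pdf_bytes[:prefix_len]).hexdigest()


# Hypothetical revisions of one document; these are not real pdf files.
rev1 = b'%PDF-1.4\n' + b'A' * 2000   # original upload
rev2 = b'%PDF-1.5\n' + b'A' * 2000   # re-export that changes the first kilobyte
rev3 = rev1 + b'appended content'    # edit that only touches bytes after the prefix

print(fallback_fingerprint(rev1) == fallback_fingerprint(rev2))  # prefix differs
print(fallback_fingerprint(rev1) == fallback_fingerprint(rev3))  # prefix identical
```

Under these assumptions, any revision that alters the first kilobyte produces a new identifier, which is the situation the stable ID is meant to avoid.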

mfr/extensions/pdf/export.py

Lines changed: 4 additions & 0 deletions
@@ -1,5 +1,6 @@
 import os
 import imghdr
+import logging
 from http import HTTPStatus

 from PIL import Image, TiffImagePlugin
@@ -9,6 +10,8 @@
 from mfr.extensions.pdf import exceptions
 from mfr.extensions.pdf.settings import EXPORT_MAX_PAGES

+logger = logging.getLogger(__name__)
+

 class PdfExporter(extension.BaseExporter):

@@ -63,6 +66,7 @@ def tiff_to_pdf(self, tiff_img, max_size):
         c.save()

     def export(self):
+        logger.debug('pdf-export: format::{}'.format(self.format))
         parts = self.format.split('.')
         export_type = parts[-1].lower()
         max_size = [int(x) for x in parts[0].split('x')] if len(parts) == 2 else None
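
To make the format parsing at the top of ``export()`` easier to follow, here is a standalone sketch of the same three lines. The wrapper function ``parse_export_format`` and the sample inputs are hypothetical and not part of MFR; only the parsing expressions are taken from the hunk above.

```python
from typing import List, Optional, Tuple


def parse_export_format(fmt: str) -> Tuple[str, Optional[List[int]]]:
    """Split a format string such as '1200x1200.pdf' into (type, max_size)."""
    parts = fmt.split('.')
    export_type = parts[-1].lower()
    # A size prefix is only recognized when the string has exactly two parts.
    max_size = [int(x) for x in parts[0].split('x')] if len(parts) == 2 else None
    return export_type, max_size


print(parse_export_format('1200x1200.pdf'))  # size prefix plus type
print(parse_export_format('PDF'))            # bare type, no size limit
```

Note that a size prefix that is not of the form ``<int>x<int>`` would raise ``ValueError`` in this sketch, since ``int()`` is applied without validation.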

mfr/extensions/pdf/render.py

Lines changed: 14 additions & 15 deletions
@@ -21,36 +21,35 @@ class PdfRenderer(extension.BaseRenderer):
     def render(self):

         download_url = munge_url_for_localdev(self.metadata.download_url)
+        escaped_name = escape_url_for_template(
+            '{}{}'.format(self.metadata.name, self.metadata.ext)
+        )
         logger.debug('extension::{} supported-list::{}'.format(self.metadata.ext,
                                                                settings.EXPORT_SUPPORTED))
         if self.metadata.ext.lower() not in settings.EXPORT_SUPPORTED:
             logger.debug('Extension not found in supported list!')
             return self.TEMPLATE.render(
                 base=self.assets_url,
                 url=escape_url_for_template(download_url.geturl()),
+                stable_id=self.metadata.stable_id,
+                file_name=escaped_name,
                 enable_hypothesis=settings.ENABLE_HYPOTHESIS,
             )

         logger.debug('Extension found in supported list!')
         exported_url = furl.furl(self.export_url)
-        if settings.EXPORT_TYPE:
-            if settings.EXPORT_MAXIMUM_SIZE:
-                exported_url.args['format'] = '{}.{}'.format(settings.EXPORT_MAXIMUM_SIZE,
-                                                             settings.EXPORT_TYPE)
-            else:
-                exported_url.args['format'] = settings.EXPORT_TYPE
-
-            self.metrics.add('needs_export', True)
-            return self.TEMPLATE.render(
-                base=self.assets_url,
-                url=escape_url_for_template(exported_url.url),
-                enable_hypothesis=settings.ENABLE_HYPOTHESIS
-            )
+        if settings.EXPORT_MAXIMUM_SIZE:
+            exported_url.args['format'] = '{}.{}'.format(settings.EXPORT_MAXIMUM_SIZE,
+                                                         settings.EXPORT_TYPE)
+        else:
+            exported_url.args['format'] = settings.EXPORT_TYPE

-        # TODO: is this dead code? ``settings.EXPORT_TYPE`` is never None or empty
+        self.metrics.add('needs_export', True)
         return self.TEMPLATE.render(
             base=self.assets_url,
-            url=escape_url_for_template(download_url.geturl()),
+            url=escape_url_for_template(exported_url.url),
+            stable_id=self.metadata.stable_id,
+            file_name=escaped_name,
             enable_hypothesis=settings.ENABLE_HYPOTHESIS,
         )

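
The simplified branch above always sets a ``format`` query parameter on the export url. A rough standard-library equivalent is sketched below for readers unfamiliar with ``furl``. It is an illustration only: ``with_export_format`` is a hypothetical helper, the example url is made up, and the real code keeps using ``furl``.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_export_format(export_url, export_type, maximum_size=None):
    """Return export_url with its 'format' query parameter set (hypothetical helper)."""
    if not export_type:
        # Mirrors the idea that the export type is mandatory configuration.
        raise ValueError('export_type is mandatory')
    fmt = '{}.{}'.format(maximum_size, export_type) if maximum_size else export_type
    scheme, netloc, path, query, fragment = urlsplit(export_url)
    # Drop any existing 'format' value so the new one replaces it.
    params = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True) if k != 'format']
    params.append(('format', fmt))
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))


# Made-up url for illustration only.
print(with_export_format('http://localhost:7778/export?url=abc', 'pdf', '1200x1200'))
print(with_export_format('http://localhost:7778/export?url=abc', 'pdf'))
```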

mfr/extensions/pdf/settings.py

Lines changed: 2 additions & 1 deletion
@@ -4,9 +4,10 @@
 config = settings.child('PDF_EXTENSION_CONFIG')

 EXPORT_TYPE = config.get('EXPORT_TYPE', 'pdf')
+assert EXPORT_TYPE  # mandatory config
 EXPORT_MAXIMUM_SIZE = config.get('EXPORT_MAXIMUM_SIZE', '1200x1200')

-ENABLE_HYPOTHESIS = config.get_bool('ENABLE_HYPOTHESIS', False)
+ENABLE_HYPOTHESIS = config.get_bool('ENABLE_HYPOTHESIS', True)

 # supports multiple files in the form of a space separated string
 EXPORT_SUPPORTED = config.get('EXPORT_SUPPORTED', '.tiff .tif').split(' ')

mfr/extensions/pdf/templates/viewer.mako

Lines changed: 4 additions & 1 deletion
@@ -424,8 +424,11 @@ http://sourceforge.net/adobe/cmap/wiki/License/
         window.pymChild.sendMessage('embed', 'embed-responsive-pdf');
     </script>
     % if enable_hypothesis:
+        <script>
+            window.MFR_STABLE_ID = '${stable_id}';
+            window.MFR_FILE_NAME = '${file_name}';
+        </script>
         <script src="/static/js/mfr.child.hypothesis.js"></script>
     % endif
   </body>
 </html>
-

mfr/extensions/unoconv/export.py

Lines changed: 0 additions & 11 deletions
@@ -1,12 +1,6 @@
 import os
 import subprocess

-from pdfrw import (
-    PdfReader,
-    PdfWriter
-)
-
-
 from mfr.core import extension
 from mfr.core import exceptions

@@ -39,8 +33,3 @@ def export(self):
                 extension=extension or '',
                 exporter_class='unoconv',
             )
-
-        pdf = PdfReader(self.output_file_path)
-        pdf.ID[0] = self.metadata.stable_id
-        pdf.ID[1] = self.metadata.unique_key
-        PdfWriter(self.output_file_path, trailer=pdf).write()

mfr/providers/osf/provider.py

Lines changed: 6 additions & 2 deletions
@@ -59,6 +59,7 @@ async def metadata(self):
         differently.
         """
         download_url = await self._fetch_download_url()
+        logger.debug('download_url::{}'.format(download_url))
         if '/file?' in download_url:
             # URL is for WaterButler v0 API
             # TODO Remove this when API v0 is officially deprecated
@@ -124,8 +125,10 @@ async def metadata(self):
         self.metrics.add('metadata.clean_url_args', str(cleaned_url))
         meta = metadata['data']
         unique_key = hashlib.sha256((meta['etag'] + cleaned_url.url).encode('utf-8')).hexdigest()
-        stable_id = hashlib.sha256('/{}/{}/{}'.format(meta['resource'], meta['provider'], meta['path'])
-                                   .encode('utf-8')).hexdigest()
+        stable_str = '/{}/{}{}'.format(meta['resource'], meta['provider'], meta['path'])
+        stable_id = hashlib.sha256(stable_str.encode('utf-8')).hexdigest()
+        logger.debug('stable_identifier: str({}) hash({})'.format(stable_str, stable_id))
+
         return provider.ProviderMetadata(name, ext, content_type, unique_key, download_url, stable_id)

     async def download(self):
@@ -177,6 +180,7 @@ async def _fetch_download_url(self):
         )
         await request.release()

+        logger.debug('osf-download-resolver: request.status::{}'.format(request.status))
         if request.status != 302:
             raise exceptions.MetadataError(
                 request.reason,
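
The difference between the two hashes computed in ``metadata()`` can be seen in a small standalone sketch. The hashing expressions follow the hunk above; the helper names, the sample metadata values, and the url are invented for illustration. The assumption that ``path`` already starts with a slash is inferred from the format string changing from ``'/{}/{}/{}'`` to ``'/{}/{}{}'`` and is not confirmed by the source.

```python
import hashlib


def stable_id(resource: str, provider: str, path: str) -> str:
    """Hash of values that identify the file, independent of its revision."""
    # Assumes ``path`` begins with '/', e.g. '/5b50f1' (hypothetical value).
    stable_str = '/{}/{}{}'.format(resource, provider, path)
    return hashlib.sha256(stable_str.encode('utf-8')).hexdigest()


def unique_key(etag: str, cleaned_url: str) -> str:
    """Hash that changes whenever the etag (and so the content) changes."""
    return hashlib.sha256((etag + cleaned_url).encode('utf-8')).hexdigest()


# Hypothetical metadata for two revisions of the same file.
meta_v1 = {'resource': 'abc12', 'provider': 'osfstorage', 'path': '/5b50f1', 'etag': 'etag-rev-1'}
meta_v2 = dict(meta_v1, etag='etag-rev-2')
url = 'http://localhost:7777/v1/resources/abc12/providers/osfstorage/5b50f1'

same_stable = (stable_id(meta_v1['resource'], meta_v1['provider'], meta_v1['path'])
               == stable_id(meta_v2['resource'], meta_v2['provider'], meta_v2['path']))
same_unique = unique_key(meta_v1['etag'], url) == unique_key(meta_v2['etag'], url)
print(same_stable, same_unique)
```

With these inputs the stable ID is shared by both revisions while the unique key is not, which matches the roles the two values play in the commit.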

mfr/server/handlers/core.py

Lines changed: 5 additions & 0 deletions
@@ -2,6 +2,7 @@
 import abc
 import uuid
 import asyncio
+import logging
 import pkg_resources

 import tornado.web
@@ -31,6 +32,8 @@
     'Content-Encoding',
 ]

+logger = logging.getLogger(__name__)
+

 class CorsMixin:

@@ -110,6 +113,7 @@ async def prepare(self):
                 provider=settings.PROVIDER_NAME,
                 code=400,
             )
+        logger.debug('target_url::{}'.format(self.url))

         self.provider = utils.make_provider(
             settings.PROVIDER_NAME,
@@ -120,6 +124,7 @@ async def prepare(self):

         self.metadata = await self.provider.metadata()
         self.extension_metrics.add('ext', self.metadata.ext)
+        logger.debug('extension::{}'.format(self.metadata.ext))

         self.cache_provider = waterbutler.core.utils.make_provider(
             settings.CACHE_PROVIDER_NAME,
