Can't properly decode ToUnicode CMap #1597

@shibaev

When TT_EOF cmap.pdf is validated against the PDF/A-2a flavour, veraPDF reports a failure of rule 6.2.11.7.2-1: "The Font dictionary of all fonts shall define the map of all used character codes to Unicode values, either via a ToUnicode entry, or other mechanisms as defined in ISO 19005-2, 6.2.11.7.2". Internally, the following exception is logged while parsing the ToUnicode CMap:

org.verapdf.pd.font.cmap.CMapFactory getCMap
WARNING: Can't parse CMap CMap 13 0 obj, using default
java.io.IOException: CMap contains invalid entry in bfchar. Expected TT_HEXSTRING but got TT_EOF
        at org.verapdf.pd.font.cmap.CMapParser.checkTokenType(CMapParser.java:336)
        at org.verapdf.pd.font.cmap.CMapParser.readSingleToUnicodeMapping(CMapParser.java:241)
        at org.verapdf.pd.font.cmap.CMapParser.processList(CMapParser.java:143)
        at org.verapdf.pd.font.cmap.CMapParser.processObject(CMapParser.java:95)
        at org.verapdf.pd.font.cmap.CMapParser.parse(CMapParser.java:80)
        at org.verapdf.pd.font.cmap.CMapFactory.getCMap(CMapFactory.java:60)
        at org.verapdf.pd.font.cmap.PDCMap.getCMapFile(PDCMap.java:119)
        at org.verapdf.pd.font.cmap.PDCMap.getCMapFile(PDCMap.java:111)
        at org.verapdf.pd.font.cmap.PDCMap.toUnicode(PDCMap.java:262)
        at org.verapdf.pd.font.PDFont.cMapToUnicode(PDFont.java:328)
        at org.verapdf.pd.font.PDFont.toUnicode(PDFont.java:312)
        at org.verapdf.pd.font.PDSimpleFont.toUnicode(PDSimpleFont.java:59)
        at org.verapdf.gf.model.impl.operator.textshow.GFGlyph.<init>(GFGlyph.java:127)
        at org.verapdf.gf.model.impl.operator.textshow.GFGlyph.<init>(GFGlyph.java:70)
        at org.verapdf.gf.model.impl.operator.textshow.GFGlyph.getGlyph(GFGlyph.java:152)
        at org.verapdf.gf.model.impl.operator.textshow.GFOpTextShow.getUsedGlyphs(GFOpTextShow.java:154)
        at org.verapdf.gf.model.impl.operator.textshow.GFOpTextShow.getLinkedObjects(GFOpTextShow.java:110)
        at org.verapdf.gf.model.impl.operator.textshow.GFOpStringTextShow.getLinkedObjects(GFOpStringTextShow.java:58)
        at org.verapdf.pdfa.validation.validators.BaseValidator.addAllLinkedObjects(BaseValidator.java:285)
        at org.verapdf.pdfa.validation.validators.BaseValidator.checkNext(BaseValidator.java:250)
        at org.verapdf.pdfa.validation.validators.BaseValidator.validate(BaseValidator.java:185)
        at org.verapdf.pdfa.validation.validators.BaseValidator.validateAll(BaseValidator.java:149)
        at org.verapdf.processor.ProcessorImpl.validate(ProcessorImpl.java:241)
        at org.verapdf.processor.ProcessorImpl.process(ProcessorImpl.java:119)
        at org.verapdf.processor.BatchFileProcessor.processItem(BatchFileProcessor.java:167)
        at org.verapdf.processor.BatchFileProcessor.processList(BatchFileProcessor.java:85)
        at org.verapdf.processor.AbstractBatchProcessor.process(AbstractBatchProcessor.java:104)
        at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:125)
        at org.verapdf.gui.ValidateWorker.doInBackground(ValidateWorker.java:63)
        at javax.swing.SwingWorker$1.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at javax.swing.SwingWorker.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

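However, the corresponding ToUnicode stream can be decoded properly, and veraPDF does not report any issues for the decoded file TT_EOF cmap decoded.pdf. This suggests that the CMap content itself is valid and that the parser reaches the end of the stream data prematurely when reading the original, encoded stream.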
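
To illustrate what the message "Expected TT_HEXSTRING but got TT_EOF" means, here is a minimal sketch of a bfchar reader. It is not veraPDF's CMapParser; the class BfCharSketch and its methods are hypothetical names made up for this example, and it assumes the CMap text has already been decoded into a Java String. Each entry in a bfchar block is a pair of hex strings, so running out of input after the first half of a pair is an error:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the veraPDF implementation. */
final class BfCharSketch {
    // A token is either a hex string such as <0041> or a bare word such as endbfchar.
    private static final Pattern TOKEN =
            Pattern.compile("<([0-9A-Fa-f\\s]*)>|([^\\s<>]+)");

    /** Reads every beginbfchar ... endbfchar block and returns code -> Unicode text. */
    static Map<Integer, String> parseBfChar(String cmapText) throws IOException {
        Map<Integer, String> mappings = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(cmapText);
        boolean inBlock = false;
        while (m.find()) {
            String word = m.group(2); // null when the token is a hex string
            if (!inBlock) {
                if ("beginbfchar".equals(word)) {
                    inBlock = true;
                }
                continue;
            }
            if ("endbfchar".equals(word)) {
                inBlock = false;
                continue;
            }
            // Inside a block: the current token is the source code,
            // and the next token must be the destination hex string.
            String source = requireHex(m, true);
            String destination = requireHex(m, m.find());
            mappings.put(Integer.parseInt(source, 16), decodeUtf16Be(destination));
        }
        if (inBlock) {
            throw new IOException("Expected hex string or endbfchar but got end of input");
        }
        return mappings;
    }

    // Returns the hex digits of the current token, or fails if there is no
    // token left (truncated data) or the token is not a hex string.
    private static String requireHex(Matcher m, boolean found) throws IOException {
        if (!found) {
            throw new IOException("Expected hex string but got end of input");
        }
        if (m.group(1) == null) {
            throw new IOException("Expected hex string but got '" + m.group(2) + "'");
        }
        return m.group(1).replaceAll("\\s", "");
    }

    // ToUnicode destinations are UTF-16BE code units written as hex digits.
    private static String decodeUtf16Be(String hex) {
        if (hex.length() % 2 != 0) {
            hex += "0"; // PDF treats a missing final hex digit as 0
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_16BE);
    }
}
```

With complete input such as "2 beginbfchar <01> <0041> <02> <0042> endbfchar", the sketch returns both mappings. If the same text is cut off after "<02>", it throws an IOException about reaching the end of input. That is the kind of failure one would expect if the stream data handed to the parser were shorter than the full decoded CMap, though the actual cause in veraPDF is not established here.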
