Crash on malformed input #42

@tlby

Thank you for this work!

I found some malformed content that makes sniffpy.sniff() raise an exception.

Example:

import sniffpy
buf = (
    b'\n\xef\xbb\xbf<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict'
    b'//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n<html'
    b' xmlns="http://www.w3.org/1999/xhtml">\n<html xmlns:fb="http://ww'
    b'w.facebook.com/2008/fbml">\n[...]\n</html>\r\n<!-- Performance op'
    b'timized by W3 Total Cache. Learn more: https://www.w3-edge.com/pr'
    b'oducts/\r\n\r\nPage Caching using disk: enhanced (SSL caching dis'
    b'abled)\r\n\r\n Served from: [...] @ 2017-03-29 16:04:56 by W3 Tot'
    b'al Cache -->'
)
sniffpy.sniff(buf)

produces:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/sniff.py", line 205, in sniff
    return sniff_unknown(resource, sniff_scriptable=not no_sniff)
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/sniff.py", line 106, in sniff_unknown
    mime_type = match.match_video_audio_type_pattern(resource)
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/match.py", line 93, in match_video_audio_type_pattern
    if is_mp3_pattern(resource):
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/match.py", line 132, in is_mp3_pattern
    if not match_mp3_header(resource, offset, parsed_values):
  File "/share/github.com/codeprentice-org/sniffpy/sniffpy/utils.py", line 49, in match_mp3_header
    parsed_values['layer'] = layer[0] >> 1
TypeError: 'int' object is not subscriptable
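
The last frame suggests that `layer` is already an int by the time `layer[0]` is evaluated. One plausible cause (an assumption, since the source of utils.py is not shown here) is that the value comes from indexing a bytes object, which yields an int in Python 3 rather than a length-1 bytes. A minimal sketch that reproduces the same error message outside sniffpy; the header bytes and variable names are made up for illustration:

```python
# Hypothetical 4-byte MP3-style frame header, used only for illustration.
header = b"\xff\xfb\x90\x00"

# In Python 3, indexing bytes returns an int; slicing returns bytes.
second_byte = header[1]          # int (0xfb)
second_slice = header[1:2]       # bytes (b"\xfb")

# Masking an int gives another int, so subscripting it fails.
layer_bits = second_byte & 0x06
try:
    layer_bits[0] >> 1
except TypeError as exc:
    message = str(exc)

# Shifting the int directly works without any subscript.
layer = layer_bits >> 1

print(type(second_byte).__name__, type(second_slice).__name__)
print(message)
print(layer)
```

If that is what happens in match_mp3_header, dropping the `[0]` would avoid the TypeError, though whether the resulting value is the intended one depends on code not visible in this report.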

A few things are interesting to note about this content:

  • Byte order mark follows a newline
  • Line endings switch from \n to \r\n midway
  • Multiple <html> tags
  • Mismatched open/close tags

This sample was found on the web (crawled by commoncrawl.org in 2017). I have replaced large sections that were not significant to the crash with "[...]".
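
The oddities listed above can be checked mechanically, which may help when building a regression test. A rough sketch, assuming a shortened stand-in for the sample rather than the real page; `describe` is a hypothetical helper and not part of sniffpy:

```python
import codecs

# Shortened stand-in for the sample in this report (not the full page).
sample = (
    b'\n\xef\xbb\xbf<!DOCTYPE html>\n'
    b'<html xmlns="http://www.w3.org/1999/xhtml">\n'
    b'<html xmlns:fb="http://www.facebook.com/2008/fbml">\n'
    b'</html>\r\n<!-- cache -->'
)


def describe(data: bytes) -> dict:
    """Summarize the structural oddities noted in the report."""
    crlf = data.count(b"\r\n")
    return {
        # Offset of the UTF-8 BOM; 0 would be the normal position, -1 means absent.
        "bom_offset": data.find(codecs.BOM_UTF8),
        # Bare "\n" line endings, excluding those that are part of "\r\n".
        "lf_only": data.count(b"\n") - crlf,
        "crlf": crlf,
        # Opening and closing <html> tags, counted naively by substring.
        "html_open": data.count(b"<html"),
        "html_close": data.count(b"</html>"),
    }


report = describe(sample)
print(report)
```

Substring counting is deliberately naive here; it is only meant to flag the mixed line endings, the misplaced byte order mark, and the unbalanced tags, not to parse HTML.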
