Skip to content

Conversation

@suricactus
Copy link

When diffing model fields that are binary, represent their value as a hex representation.

Prior to this change, all fields that are not datetime or JSON will be passed to Django's smart_str/force_str from within django-auditlog's get_field_value.
The value will then be encoded as a utf8 string, which does not work for all binary values, as they may contain arbitrary bytes - for example a md5 sum or a raster image.

This commit introduces a new special handling for the values of BinaryField. The value will be converted to the hex representation of the binary. Hex was chosen over the default Python representation with \x escape characters to save some space from excessive \x in the generated string for the audit diff.


Example error that motivated this PR:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte
[...]
  File "/usr/local/lib/python3.10/site-packages/auditlog/diff.py", line 186, in model_instance_diff
    new_value = get_field_value(new, field)
  File "/usr/local/lib/python3.10/site-packages/auditlog/diff.py", line 84, in get_field_value
   value = smart_str(getattr(obj, field.name, None))
  File "/usr/local/lib/python3.10/site-packages/django/utils/encoding.py", line 33, in smart_str
    return force_str(s, encoding, strings_only, errors)
  File "/usr/local/lib/python3.10/site-packages/django/utils/encoding.py", line 74, in force_str
    raise DjangoUnicodeDecodeError(s, *e.args)
django.utils.encoding.DjangoUnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte. You passed in b'\x96\x9ax\x83\xee\x8b\n3\xa6!\xdb\x9f\xe0\xe7\x8c$' (<class 'bytes'>)

When diffing model fields that are binary, represent their value as a
hex representation.

Prior to this change, all fields that are not datetime or JSON will be
passed to Django's `smart_str`/`force_str` from within django-auditlog's
`get_field_value`.
The value will then be encoded as a utf8 string, which does not work for
all binary values, as they may contain arbitrary bytes - for example a
md5 sum or a raster image.

This commit introduces a new special handling for the values of
`BinaryField`. The value will be converted to the hex representation of
the binary. Hex was chosen over the default Python representation with
`\x` escape characters to save some space from excessive `\x` in the
generated string for the audit diff.

---

Example error that motivated this PR:

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte
[...]
  File "/usr/local/lib/python3.10/site-packages/auditlog/diff.py", line 186, in model_instance_diff
    new_value = get_field_value(new, field)
  File "/usr/local/lib/python3.10/site-packages/auditlog/diff.py", line 84, in get_field_value
   value = smart_str(getattr(obj, field.name, None))
  File "/usr/local/lib/python3.10/site-packages/django/utils/encoding.py", line 33, in smart_str
    return force_str(s, encoding, strings_only, errors)
  File "/usr/local/lib/python3.10/site-packages/django/utils/encoding.py", line 74, in force_str
    raise DjangoUnicodeDecodeError(s, *e.args)
django.utils.encoding.DjangoUnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 0: invalid start byte. You passed in b'\x96\x9ax\x83\xee\x8b\n3\xa6!\xdb\x9f\xe0\xe7\x8c$' (<class 'bytes'>)
```
@hramezani
Copy link
Member

Thanks @suricactus for reporting this and providing the breaking case.

It would be great to include the fix as well.

Please remember to add a changelog entry in case you want to continue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants