Description
Overview
Once the response is received, newUTF8WithFallbackReader determines the content type (media type and charset). If the media type is textual, the body's character encoding is converted to UTF-8; otherwise the reader is a no-op.
Currently, the check for a textual media type is inspired by go-colly's implementation, which essentially blacklists a limited set of non-textual media types.
synapse/fetcher/http/charset.go
Lines 76 to 86 in 9d92807
    isTextualContent := func(mimeType string) bool {
        switch {
        case strings.HasPrefix(mimeType, "image/"),
            strings.HasPrefix(mimeType, "video/"),
            strings.HasPrefix(mimeType, "audio/"),
            strings.HasPrefix(mimeType, "font/"):
            return false
        default:
            return true
        }
    }
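To illustrate the gap, here is a minimal standalone sketch that copies the check above into a named function (isTextualContentDenylist is a hypothetical name, not something in the repository) and runs it against a few binary media types. Anything outside the four blacklisted prefixes, such as application/pdf or application/zip, is reported as textual.

```go
package main

import (
	"fmt"
	"strings"
)

// isTextualContentDenylist mirrors the blacklist check from the excerpt
// as a standalone function. The name is hypothetical, for illustration only.
func isTextualContentDenylist(mimeType string) bool {
	switch {
	case strings.HasPrefix(mimeType, "image/"),
		strings.HasPrefix(mimeType, "video/"),
		strings.HasPrefix(mimeType, "audio/"),
		strings.HasPrefix(mimeType, "font/"):
		return false
	default:
		return true
	}
}

func main() {
	samples := []string{
		"text/html",
		"image/png",
		"application/pdf",
		"application/zip",
		"application/octet-stream",
	}
	for _, mt := range samples {
		// Binary application/* types fall through to the default branch.
		fmt.Printf("%-26s textual=%v\n", mt, isTextualContentDenylist(mt))
	}
}
```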
Scope
Instead of filtering out "what the media-type is not", determine whether it belongs to a known textual type such as text/*, application/json, or *+xml, so that only textual content goes through the conversion flow and the error surface is reduced.
synapse/fetcher/http/charset.go
Lines 101 to 103 in 9d92807
    if !isTextualContent(mimeType) {
        return nil, nil
    }
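A rough sketch of what the allowlist proposed in the scope could look like, not a definitive implementation. The function name isTextualMediaType, the choice to accept a raw Content-Type value, and the extra entries beyond text/*, application/json, and *+xml (namely *+json, application/xml, and application/javascript) are all assumptions for discussion. It uses mime.ParseMediaType from the standard library to strip parameters and normalize case.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// isTextualMediaType is a hypothetical allowlist-based check.
// It returns true only for media types known to carry text.
func isTextualMediaType(contentType string) bool {
	// ParseMediaType lowercases the media type and drops parameters
	// such as "; charset=utf-8". Unparseable values are treated as non-textual.
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err != nil {
		return false
	}

	switch {
	case strings.HasPrefix(mediaType, "text/"):
		return true
	case strings.HasSuffix(mediaType, "+xml"),
		strings.HasSuffix(mediaType, "+json"): // assumption: structured JSON suffix is also wanted
		return true
	}

	// Exact matches; the entries beyond application/json are assumptions.
	switch mediaType {
	case "application/json", "application/xml", "application/javascript":
		return true
	}
	return false
}

func main() {
	samples := []string{
		"text/html; charset=iso-8859-1",
		"application/json",
		"application/atom+xml",
		"application/pdf",
		"image/png",
	}
	for _, ct := range samples {
		fmt.Printf("%-32s textual=%v\n", ct, isTextualMediaType(ct))
	}
}
```

With a check like this, unknown or binary types default to "not textual", so the early return shown above would skip them instead of attempting a charset conversion.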