
Refactor media-type filtering logic to determine textual content #5

@ritvikos

Overview

Once the response is received, newUTF8WithFallbackReader determines the content type (media type and charset). If the media type is textual, the body's character encoding is converted to UTF-8; otherwise, it is a no-op.
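For illustration, here is a minimal sketch of how the media type and charset could be split out of a Content-Type header using the standard library's mime.ParseMediaType. The helper name parseContentType is hypothetical and is not taken from the project; the actual logic inside newUTF8WithFallbackReader may differ, and the charset conversion itself is not shown.

```go
package main

import (
	"fmt"
	"mime"
	"strings"
)

// parseContentType is a hypothetical helper (not the project's code).
// It splits a Content-Type header value into a lowercase media type and
// a lowercase charset. The charset is empty when the header has none.
func parseContentType(header string) (string, string, error) {
	// mime.ParseMediaType lowercases the media type and parameter names,
	// but leaves parameter values untouched.
	mediaType, params, err := mime.ParseMediaType(header)
	if err != nil {
		return "", "", err
	}
	return mediaType, strings.ToLower(params["charset"]), nil
}

func main() {
	mediaType, charset, err := parseContentType("Text/HTML; charset=ISO-8859-1")
	if err != nil {
		fmt.Println("invalid Content-Type:", err)
		return
	}
	fmt.Println(mediaType, charset)
}
```

A missing or malformed header yields an error here, so the caller can decide whether to fall back to sniffing or to skip the conversion.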

Currently, the approach to determining whether a media type is textual is inspired by go-colly's implementation, which blacklists a limited set of non-textual media types:

isTextualContent := func(mimeType string) bool {
	switch {
	case strings.HasPrefix(mimeType, "image/"),
		strings.HasPrefix(mimeType, "video/"),
		strings.HasPrefix(mimeType, "audio/"),
		strings.HasPrefix(mimeType, "font/"):
		return false
	default:
		return true
	}
}

Scope

Instead of filtering by what the media type is not, determine whether it belongs to a known textual type, such as text/*, application/json, or *+xml. This way, only textual content is processed in the flow, which reduces the error surface.

if !isTextualContent(mimeType) {
	return nil, nil
}
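One possible shape for the allowlist-based check is sketched below. It covers the types named above (text/*, application/json, *+xml); the extra entries in textualTypes and the +json suffix rule are assumptions added for illustration, not a list agreed on in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// textualTypes holds media types outside text/* that are treated as
// textual. Apart from application/json, these entries are assumptions.
var textualTypes = map[string]bool{
	"application/json":                  true,
	"application/xml":                   true,
	"application/javascript":            true,
	"application/x-www-form-urlencoded": true,
}

// isTextualContent reports whether mimeType is on the textual allowlist.
// Unknown or empty media types are treated as non-textual.
func isTextualContent(mimeType string) bool {
	mt := strings.ToLower(strings.TrimSpace(mimeType))
	// Drop parameters such as "; charset=utf-8" if they are still attached.
	if i := strings.IndexByte(mt, ';'); i >= 0 {
		mt = strings.TrimSpace(mt[:i])
	}
	switch {
	case strings.HasPrefix(mt, "text/"):
		return true
	case strings.HasSuffix(mt, "+xml"), strings.HasSuffix(mt, "+json"):
		// Structured syntax suffixes, e.g. application/atom+xml.
		return true
	default:
		return textualTypes[mt]
	}
}

func main() {
	for _, mt := range []string{"text/html", "application/atom+xml", "application/pdf"} {
		fmt.Println(mt, isTextualContent(mt))
	}
}
```

Note one behavioral difference from the blacklist: image/svg+xml is rejected by the image/ prefix rule today, but matches the +xml suffix in this sketch. Whether SVG should count as textual is a design decision left open here.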

Metadata

Labels

Focus: Http Fetcher (Related to the Synapse Http Fetcher (/fetcher/http) component)
good first issue (Good for newcomers)
