
Bug: html_decode_and_sanitize can exceed Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES #121

@corylown

Description


EBSCO::EDS::Record#html_decode_and_sanitize fails when the data being sanitized contains an element with more attributes than Nokogiri allows by default (Nokogiri::Gumbo::DEFAULT_MAX_ATTRIBUTES, which is 400).

We've seen an increased number of these errors since late December 2024.

A larger :max_attributes value can be passed through the :parser_options hash of the Sanitize config. See: https://github.com/rgrove/sanitize?tab=readme-ov-file#parser_options-hash. The edsapi-ruby gem may need to set a higher limit based on the data passed through this sanitize method, handle cases where the limit is still exceeded more gracefully, or both.

I was able to trigger this error by searching the EDS API for:

  • anom fbi
  • fitch communes
  • uri menezes

Top of stack trace:

    [GEM_ROOT]/gems/nokogiri-1.18.2-x86_64-linux-gnu/lib/nokogiri/html5/document_fragment.rb:166 :in `fragment`
    [GEM_ROOT]/gems/nokogiri-1.18.2-x86_64-linux-gnu/lib/nokogiri/html5/document_fragment.rb:166 :in `initialize`
    [GEM_ROOT]/gems/nokogiri-1.18.2-x86_64-linux-gnu/lib/nokogiri/xml/document_fragment.rb:44 :in `new`
    [GEM_ROOT]/gems/nokogiri-1.18.2-x86_64-linux-gnu/lib/nokogiri/html5/document_fragment.rb:84 :in `parse`
    [GEM_ROOT]/gems/nokogiri-1.18.2-x86_64-linux-gnu/lib/nokogiri/html5.rb:281 :in `fragment`
    [GEM_ROOT]/gems/sanitize-6.1.3/lib/sanitize.rb:138 :in `fragment`
    [GEM_ROOT]/gems/sanitize-6.1.3/lib/sanitize.rb:67 :in `fragment`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/record.rb:966 :in `html_decode_and_sanitize`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/record.rb:936 :in `sanitize_data`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/record.rb:904 :in `block in get_item_data`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/record.rb:902 :in `each`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/record.rb:902 :in `get_item_data`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/record.rb:189 :in `initialize`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/results.rb:57 :in `new`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/results.rb:57 :in `block in initialize`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/results.rb:55 :in `each`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/results.rb:55 :in `initialize`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/session.rb:247 :in `new`
    [GEM_ROOT]/gems/ebsco-eds-1.1.5/lib/ebsco/eds/session.rb:247 :in `search` 
