We should evaluate upgrading to the latest trafilatura. I see a relevant note:
"trafilatura==1.8.", # must stay below v1.11. to allow easy extraction of canonical_url
However, the 2.0.0 release notes list improvements to URL extraction, so we should evaluate upgrading as part of our regular maintenance. Is this "easy extraction of canonical_url" issue still a concern? What was the pin originally for?