Releases: hplt-project/data-analytics-tool
HPLTAnalytics v1.2
https://github.com/hplt-project/data-analytics-tool/blob/v1.2/CHANGELOG.md
What's Changed
- Bump transformers from 4.52.1 to 4.53.0 in /deployment by @dependabot[bot] in #43
- Support for HPLTv3 documents by @mbanon in #47
- Added Domain Classification (#8) by @levnikolaevich in #46
- Reuse register labels when they exist in HPLT3 by @mbanon in #48
- Fixing stopwords.kab by @BoFFire in #50
- Bump jspdf from 3.0.1 to 3.0.2 in /front by @dependabot[bot] in #51
- Three by @mbanon in #45
- Mini frontend fixes by @mbanon in #53
New Contributors
- @BoFFire made their first contribution in #50
Full Changelog: v1.1...v1.2
HPLTAnalytics v1.1
What's Changed
https://github.com/hplt-project/data-analytics-tool/blob/v1.1/CHANGELOG.md
Full Changelog: v1.0...v1.1
HPLTAnalytics v1.0
What's Changed
https://github.com/hplt-project/data-analytics-tool/blob/v1.0/CHANGELOG.md
Full Changelog: v0.4...v1.0
HPLT Analytics 0.4
HPLT Analytics 0.3
What's Changed
- New frontend and graphics
- Integrate WDS (Web Document Scorer)
- Integrate HeLIPort
- Support for more metadata (domains, TLDs, collections, ...)
- Support for more languages (tokenization, stopwords, ...)
- Added PII detection
- Lighter-weight PDFs
- Added samples to frontend
- Added register labels to frontend
- Added reports for HPLT v2
- Added tests
- Library version bumps
- Other minor fixes
Contributors
- @mbanon Main developer
- @lukasweymann Frontend & visual design
- @gramirez-prompsit QA + feature selection
- @ZJaume HeLIport
- @pablop16n Web Document Scorer (WDS) & Stopwords review
- @aliciannz & @TudorMN Stopwords generation
Full Changelog: v0.2-ALPHA...v0.3
HPLT Analytics 0.2-ALPHA
HPLT Analytics 0.1-ALPHA
First released version.
Some known issues:
- No GPU support.
- Corpus names cannot contain strings such as ".tsv"
- OOM errors when processing very large corpora.
- YAML files will be overwritten if a new one is processed with the same name.
These will be fixed in upcoming releases.
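
Until then, two of the known issues (corpus names containing strings such as ".tsv", and YAML files being overwritten) can be guarded against before launching a run. The following is only a minimal sketch and not part of the tool: the helper names `check_corpus_name` and `safe_yaml_path`, the `.yaml` extension, and the `-1`, `-2` suffix scheme are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: the release notes only name ".tsv" explicitly; extend as needed.
PROBLEM_SUBSTRINGS = (".tsv",)


def check_corpus_name(name: str) -> list[str]:
    """Return the problematic substrings found in a corpus name (hypothetical helper)."""
    return [s for s in PROBLEM_SUBSTRINGS if s in name]


def safe_yaml_path(directory: Path, corpus_name: str) -> Path:
    """Return a YAML path in `directory` that does not collide with an existing file.

    Assumption: output files are named "<corpus_name>.yaml"; the tool's real
    naming scheme may differ.
    """
    candidate = directory / f"{corpus_name}.yaml"
    counter = 1
    # Append -1, -2, ... until an unused file name is found.
    while candidate.exists():
        candidate = directory / f"{corpus_name}-{counter}.yaml"
        counter += 1
    return candidate
```

For example, `check_corpus_name("mycorpus.tsv.en")` reports `".tsv"`, so the corpus can be renamed before processing, and `safe_yaml_path` can be used to pick a name for a backup copy of an existing YAML file before a new run writes to the same name.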