

@karpnv (Collaborator) commented Nov 10, 2023

Common Crawl dataset preprocessing

karpnv and others added 30 commits September 12, 2023 04:28
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
karpnv and others added 30 commits March 19, 2024 09:32
* YouTube German config and new processors

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Added Merge Manifests processor

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Clean de.yaml pipeline config

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Fix Lang2Iso

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* fix typo

* Fix empty-list error (IndexError: list index out of range)

* Added requirements.txt

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Fixed paths for audio TN

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Updated requirements.txt

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

---------

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
* YouTube German config and new processors

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Added Merge Manifests processor

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Clean de.yaml pipeline config

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Fix Lang2Iso

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* fix typo

* Fix empty-list error (IndexError: list index out of range)

* Added requirements.txt

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Fixed paths for audio TN

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Updated requirements.txt

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* New processors for calculating metrics: WER, CER, edge CER, and length difference ratio

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>

* Update utils.py

* Update aggregate_segments.py

* Update aggregate_segments.py

* Update aggregate_segments.py

---------

Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Sasha Meister <ameister@nvidia.com>
Co-authored-by: Sasha Meister <ameister@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>


3 participants