replace python-based Ukrainian lemmatizer in docker image with native built-in lemmatizer #127

@tomatolog

Description

Proposal:

After manticoresoftware/manticoresearch#4414 is merged, we need to update the Docker image to use the new native built-in Ukrainian lemmatizer instead of the old pymorphy2-based setup,

as requested at manticoresoftware/manticoresearch#4414 (comment).

The current Docker image still includes the old Ukrainian lemmatizer path:

  • Dockerfile installs manticore-lemmatizer-uk
  • Dockerfile installs Python 3.9, pymorphy2, and pymorphy2-dicts-uk for lemmatize_uk
  • PYTHONWARNINGS is set for pymorphy2.analyzer

Once the native Ukrainian lemmatizer is available in the Manticore packages, we need to remove the Python runtime and the pymorphy2 dependency chain from the Docker image and rely on the native asset-backed implementation.

That should allow:

  • faster Docker image builds
  • a smaller image and dependency surface
  • fewer Python/pip/CVE maintenance issues
  • Ukrainian morphology that continues to work with morphology = 'lemmatize_uk'

Acceptance criteria:

  • Remove Python 3.9 / pip / pymorphy2 / pymorphy2-dicts-uk install steps from Dockerfile
  • Remove obsolete PYTHONWARNINGS for pymorphy2.analyzer
  • Stop installing the old external manticore-lemmatizer-uk package if it is no longer needed
  • Keep or update clt_tests/tests/test-ukrainian-morphology.rec so Docker CI verifies lemmatize_uk
  • Confirm Docker image build succeeds and Ukrainian morphology CLT passes
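
The removal criteria above could be checked mechanically before the CLT run. Below is a minimal sketch in Python, assuming the Dockerfile is available as plain text; the function name `find_obsolete_references` and the exact patterns are hypothetical, derived only from the items listed in this issue, and are not part of the repository or its CI.

```python
import re

# Hypothetical patterns, taken from the items this issue lists for removal.
# The real Dockerfile may spell some of them differently (assumption).
OBSOLETE_PATTERNS = {
    "manticore-lemmatizer-uk": re.compile(r"manticore-lemmatizer-uk"),
    "pymorphy2": re.compile(r"pymorphy2"),  # also matches pymorphy2-dicts-uk
    "PYTHONWARNINGS": re.compile(r"\bPYTHONWARNINGS\b"),
    "python3.9": re.compile(r"python[\s-]*3\.9", re.IGNORECASE),
}


def find_obsolete_references(dockerfile_text):
    """Return (line_number, label, stripped_line) for each leftover reference."""
    hits = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        # Skip Dockerfile comment lines so explanatory notes do not trigger hits.
        if line.lstrip().startswith("#"):
            continue
        for label, pattern in OBSOLETE_PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
    return hits
```

Calling `find_obsolete_references(open("Dockerfile").read())` on the updated file would be expected to return an empty list once the cleanup is done. A text scan like this only complements the image build and the Ukrainian morphology CLT; it does not replace them.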

Checklist:

To be completed by the assignee. Check off tasks that have been completed or are not applicable.

Details
  • Implementation completed
  • Tests developed
  • Documentation updated
  • Documentation reviewed
  • Changelog updated
