Version
main
Which installation method(s) does this occur on?
Source
Describe the bug.
Summary
Audit of api/ found missing core dependencies (used in api/src but not declared) and dependencies that are only needed for tests and should be moved to an optional test extra.
1. Add missing core dependencies
These packages are imported in api/src but not listed in api/pyproject.toml. They should be added under [project] dependencies.
| Package | PyPI name | Notes |
|---|---|---|
| numpy | numpy | Used in yolox, ocr, pdfium, transforms, table_and_chart, etc. |
| pypdfium2 | pypdfium2 | Used in pdf util, PDF engines, metadata aggregators, pptx_helper. |
| requests | requests | Used in rest client, nim client, helpers, tika engine. |
| OpenCV | opencv-python | Imported as cv2 in transforms and model_interface/helpers. |
| Pillow | Pillow | Imported as PIL in transforms, aggregators, image_helpers, cached. |
| gRPC | grpcio | Imported as grpc in parakeet model interface. |
| scikit-learn | scikit-learn | Imported as sklearn; used in table_and_chart.py for sklearn.cluster.DBSCAN. |
| redis | redis | Used in util/service_clients/redis/redis_client.py. |
| python-docx | python-docx | Imported as docx in docx extractor (internal/extract/docx/.../docxreader.py). |
| python-pptx | python-pptx | Imported as pptx in pptx helper (internal/extract/pptx/engines/pptx_helper.py). |
| minio | minio | Used in internal/store/embed_text_upload.py for Minio client. |
| pymilvus | pymilvus | Used in internal/store/embed_text_upload.py for Collection, connections, bulk writer. |
| aiohttp | aiohttp | Used in internal/extract/pdf/engines/llama.py for async HTTP. |
| scipy | scipy | Used in internal/primitives/nim/model_interface/parakeet.py (scipy.io.wavfile). |
| nvidia-riva-client | nvidia-riva-client | Imported as riva.client in parakeet model interface. |
| unstructured-client | unstructured-client | Used in internal/extract/pdf/engines/unstructured_io.py. |
| tqdm | tqdm | Used in util/dataloader/dataloader.py. |
| python-dateutil | python-dateutil | Imported as dateutil in util/converters/datetools.py. |
| fastparquet | fastparquet | Used in util/converters/dftools.py. |
Optional: Add openai if the LLM summarizer UDF (api/src/udfs/llm_summarizer_udf.py) is part of the shipped package.
GPU / optional: cudf is used in util/converters/dftools.py; consider adding as an optional extra (e.g. gpu or cudf) rather than a core dependency.
2. Move test-only dependencies out of core
These are currently in dependencies but are only used by tests. Move them into [project.optional-dependencies] (e.g. a test extra).
| Package | Action |
|---|---|
| moviepy | Remove from core dependencies; add to optional-dependencies (e.g. test). Only used in api_tests/util/dataloader/ (dataloader_test_tools, test_dataloader_video). |
| pydantic-settings | Remove from core dependencies (not used in api src or api_tests). Add to an optional extra later if needed. |
Acceptance criteria
- The packages listed in section 1 are declared in api/pyproject.toml under dependencies (and optionally openai if applicable; consider cudf as an optional extra).
- moviepy and pydantic-settings are removed from core dependencies.
- An optional extra (e.g. test) exists and includes moviepy (and optionally pytest/ray if desired for test runs).
Minimum reproducible example
Relevant log output
Other/Misc.
No response