feat(docs): Quarto-generated documentation site #96
cf66cf8
be2837b
a289123
3578ce4
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -181,3 +181,9 @@ cython_debug/ | |
|
|
||
| # VSCode configuration | ||
| .vscode/ | ||
|
|
||
| /.quarto/ | ||
| **/*.quarto_ipynb | ||
| _site/ | ||
| docs/ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| execute: | ||
| freeze: auto | ||
| project: | ||
| type: website | ||
| render: | ||
| - index.qmd | ||
| - docs/**.qmd | ||
| - classifai/**.py | ||
| - DEMO/**.ipynb | ||
| - DEMO/**.py | ||
| - DEMO/**.qmd | ||
| - README.md | ||
| - DEMO/README.md | ||
| - classifai/vectorisers/__init__.py | ||
| - classifai/indexers/__init__.py | ||
| - classifai/servers/__init__.py | ||
|
|
||
|
|
||
| website: | ||
| title: "ClassifAI" | ||
| page-navigation: true | ||
| navbar: | ||
| background: light | ||
| search: true | ||
| left: | ||
| - file: docs/index.qmd | ||
| text: "Documentation" | ||
| right: | ||
| - icon: github | ||
| href: https://github.com/datasciencecampus/classifai | ||
|
|
||
| sidebar: | ||
| - id: index | ||
| title: "Documentation" | ||
| style: "docked" | ||
| collapse-level: 1 | ||
| contents: | ||
| - section: "Overview" | ||
| contents: | ||
| - docs/index.qmd | ||
| - section: "Vectorisers" | ||
| contents: | ||
| - section: "Vectorisers Overview" | ||
| contents: | ||
| - docs/vectorisers.qmd | ||
| - docs/vectorisers.base.VectoriserBase.qmd | ||
| - docs/vectorisers.base.VectoriserBase.transform.qmd | ||
| - section: "Specific Vectorisers" | ||
| contents: | ||
| - docs/vectorisers.huggingface.HuggingFaceVectoriser.qmd | ||
| - docs/vectorisers.ollama.OllamaVectoriser.qmd | ||
| - docs/vectorisers.gcp.GcpVectoriser.qmd | ||
| - section: "Indexers" | ||
| contents: | ||
| - docs/indexers.qmd | ||
| - docs/indexers.VectorStore.qmd | ||
| - docs/indexers.VectorStore.embed.qmd | ||
| - docs/indexers.VectorStore.search.qmd | ||
| - docs/indexers.VectorStore.reverse_search.qmd | ||
| - docs/indexers.VectorStore.from_filespace.qmd | ||
| - section: "Servers" | ||
| contents: | ||
| - docs/servers.qmd | ||
| - docs/servers.get_router.qmd | ||
| - docs/servers.get_server.qmd | ||
| - docs/servers.run_server.qmd | ||
| - docs/servers.make_endpoints.qmd | ||
| - section: "DEMO" | ||
| contents: | ||
| - file: DEMO/README.md | ||
| - file: DEMO/general_workflow_demo.ipynb | ||
| - file: DEMO/custom_vectoriser.ipynb | ||
| - file: DEMO/custom_preprocessing_and_postprocessing_hooks.ipynb | ||
|
|
||
| interlinks: | ||
| sources: | ||
| python: | ||
| url: https://docs.python.org/3/ | ||
|
|
||
| format: | ||
| html: | ||
| theme: cosmo | ||
| css: styles.css | ||
| toc: true | ||
| grid: | ||
| sidebar-width: 400px | ||
| body-width: 900px | ||
| margin-width: 200px | ||
| gutter-width: 1.0rem | ||
|
|
||
| quartodoc: | ||
| style: pkgdown | ||
| dir: docs | ||
| renderer: | ||
| style: _renderer.py | ||
| show_signature_annotations: false | ||
| # renderer: | ||
| # style: markdown | ||
| package: classifai | ||
| parser: google | ||
| sections: | ||
| - title: Vectorisers | ||
| desc: "Utilities to project text into numerical representation in a semantic vector space" | ||
| contents: | ||
| - vectorisers | ||
| - vectorisers.base | ||
| - vectorisers.base.VectoriserBase | ||
| - vectorisers.base.VectoriserBase.transform | ||
| - vectorisers.huggingface.HuggingFaceVectoriser | ||
| - vectorisers.ollama.OllamaVectoriser | ||
| - vectorisers.gcp.GcpVectoriser | ||
| - title: Indexers | ||
| desc: "Creation of Vector Stores for efficient similarity search and retrieval" | ||
| contents: | ||
| - indexers | ||
| - indexers.main | ||
| - indexers.VectorStore | ||
| - indexers.dataclasses.VectorStoreSearchInput | ||
| - indexers.dataclasses.VectorStoreSearchOutput | ||
| - indexers.dataclasses.VectorStoreReverseSearchInput | ||
| - indexers.dataclasses.VectorStoreReverseSearchOutput | ||
| - indexers.dataclasses.VectorStoreEmbedInput | ||
| - indexers.dataclasses.VectorStoreEmbedOutput | ||
| - indexers.VectorStore.embed | ||
| - indexers.VectorStore.search | ||
| - indexers.VectorStore.reverse_search | ||
| - indexers.VectorStore.from_filespace | ||
| - title: Servers | ||
| desc: "Expose ClassifAI functionality via FastAPI endpoints" | ||
| contents: | ||
| - servers | ||
| - servers.main | ||
| - servers.get_router | ||
| - servers.get_server | ||
| - servers.run_server | ||
| - servers.make_endpoints |
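
The sidebar in this config lists generated pages by file name, while the `quartodoc` block lists the objects to document, so the two can drift apart. Below is a minimal sketch of a consistency check between them. It assumes quartodoc writes one page per `contents` entry to `<dir>/<entry>.qmd` plus an index page, which is what the sidebar paths above imply. The helper names (`expected_pages`, `missing_sidebar_pages`, `unlisted_pages`) are hypothetical and not part of Quarto or quartodoc. The lists are copied by hand from the config rather than parsed from YAML.

```python
# Output directory taken from `quartodoc.dir` in the config.
QUARTODOC_DIR = "docs"

# Entries copied from the `quartodoc.sections[*].contents` lists.
QUARTODOC_CONTENTS = [
    "vectorisers",
    "vectorisers.base",
    "vectorisers.base.VectoriserBase",
    "vectorisers.base.VectoriserBase.transform",
    "vectorisers.huggingface.HuggingFaceVectoriser",
    "vectorisers.ollama.OllamaVectoriser",
    "vectorisers.gcp.GcpVectoriser",
    "indexers",
    "indexers.main",
    "indexers.VectorStore",
    "indexers.dataclasses.VectorStoreSearchInput",
    "indexers.dataclasses.VectorStoreSearchOutput",
    "indexers.dataclasses.VectorStoreReverseSearchInput",
    "indexers.dataclasses.VectorStoreReverseSearchOutput",
    "indexers.dataclasses.VectorStoreEmbedInput",
    "indexers.dataclasses.VectorStoreEmbedOutput",
    "indexers.VectorStore.embed",
    "indexers.VectorStore.search",
    "indexers.VectorStore.reverse_search",
    "indexers.VectorStore.from_filespace",
    "servers",
    "servers.main",
    "servers.get_router",
    "servers.get_server",
    "servers.run_server",
    "servers.make_endpoints",
]

# Generated pages referenced under `website.sidebar` (DEMO files excluded).
SIDEBAR_PAGES = [
    "docs/index.qmd",
    "docs/vectorisers.qmd",
    "docs/vectorisers.base.VectoriserBase.qmd",
    "docs/vectorisers.base.VectoriserBase.transform.qmd",
    "docs/vectorisers.huggingface.HuggingFaceVectoriser.qmd",
    "docs/vectorisers.ollama.OllamaVectoriser.qmd",
    "docs/vectorisers.gcp.GcpVectoriser.qmd",
    "docs/indexers.qmd",
    "docs/indexers.VectorStore.qmd",
    "docs/indexers.VectorStore.embed.qmd",
    "docs/indexers.VectorStore.search.qmd",
    "docs/indexers.VectorStore.reverse_search.qmd",
    "docs/indexers.VectorStore.from_filespace.qmd",
    "docs/servers.qmd",
    "docs/servers.get_router.qmd",
    "docs/servers.get_server.qmd",
    "docs/servers.run_server.qmd",
    "docs/servers.make_endpoints.qmd",
]


def expected_pages(contents, out_dir):
    """Assumed naming rule: one `<out_dir>/<entry>.qmd` page per entry."""
    return {f"{out_dir}/{entry}.qmd" for entry in contents}


def missing_sidebar_pages(sidebar_pages, contents, out_dir):
    """Sidebar pages that no `contents` entry (or the index) would generate."""
    known = expected_pages(contents, out_dir) | {f"{out_dir}/index.qmd"}
    return sorted(page for page in sidebar_pages if page not in known)


def unlisted_pages(sidebar_pages, contents, out_dir):
    """Generated pages that the sidebar never links to."""
    return sorted(expected_pages(contents, out_dir) - set(sidebar_pages))


missing = missing_sidebar_pages(SIDEBAR_PAGES, QUARTODOC_CONTENTS, QUARTODOC_DIR)
unlisted = unlisted_pages(SIDEBAR_PAGES, QUARTODOC_CONTENTS, QUARTODOC_DIR)
print("Sidebar pages with no generator:", missing)
print("Generated pages not in sidebar:", unlisted)
```

Under that naming assumption, every sidebar page has a matching `contents` entry, while the `indexers.dataclasses.*` pages and the `*.main` / `vectorisers.base` module pages are generated but not linked from the sidebar. Whether that omission is intentional is for the PR author to confirm.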
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,63 @@ | ||
| from __future__ import annotations | ||
|
|
||
| from numpydoc.docscrape import NumpyDocString | ||
| from plum import dispatch | ||
| from quartodoc import MdRenderer | ||
| from quartodoc import ast as qast | ||
|
|
||
|
|
||
| class Renderer(MdRenderer): | ||
| style = "siuba" | ||
|
|
||
| @dispatch | ||
| def render(self, el): | ||
| """General render method. | ||
| Note: overloading of `render` enabled via plum.dispatch to allow different | ||
| rendering behaviour for some elements. | ||
| """ | ||
| prev_obj = getattr(self, "crnt_obj", None) | ||
| self.crnt_obj = el | ||
| res = super().render(el) | ||
| self.crnt_obj = prev_obj | ||
|
|
||
| return res | ||
|
|
||
| @dispatch | ||
| def render(self, el: qast.DocstringSectionSeeAlso): # noqa: F811 | ||
| """Numpy Docstring style render method. | ||
| Note: overloading of `render` enabled via plum.dispatch to allow different | ||
| rendering behaviour for some elements. | ||
| """ | ||
| lines = el.value.split("\n") | ||
|
|
||
| # each entry in parsed has form: ([('func1', '<directive>'), ...], <description>) | ||
| parsed = NumpyDocString("")._parse_see_also(lines) | ||
|
|
||
| result = [] | ||
| for funcs, description in parsed: | ||
| links = [f"[{name}](`{self._name_to_target(name)}`)" for name, role in funcs] | ||
|
|
||
| str_links = ", ".join(links) | ||
|
|
||
| if description: | ||
| str_description = "<br>".join(description) | ||
| result.append(f"{str_links}: {str_description}") | ||
| else: | ||
| result.append(str_links) | ||
|
|
||
| return "\n".join(f"* {item}" for item in result) | ||
|
|
||
| def _name_to_target(self, name: str): | ||
| """Helper method to convert a function/class name to a full target path, | ||
| used for Numpy Docstring style render method. | ||
| """ | ||
| crnt_path = getattr(self.crnt_obj, "path", None) | ||
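| # Guard against objects without a path so rsplit is never called on None. | ||
| parent = crnt_path.rsplit(".", 1)[0] + "." if crnt_path else "" | ||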
| pkg = "classifai." | ||
|
|
||
| if crnt_path and not (name.startswith(pkg) or name.startswith(parent)): | ||
| return f"{parent}{name}" | ||
| elif not name.startswith(pkg): | ||
| return f"{pkg}{name}" | ||
|
|
||
| return name |
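
To make the "See Also" handling in this renderer easier to review, here is a standalone sketch of the same name-resolution rule and link formatting, with no quartodoc, plum, or numpydoc dependency. The functions `name_to_target` and `render_see_also` are illustrative re-statements, not the PR's code: the current object's path is passed in explicitly instead of being read from `self.crnt_obj`, the input is assumed to already be in the `([(name, role), ...], description_lines)` shape described in the comment above, and the Markdown bullet output is an assumption about the intended formatting.

```python
from __future__ import annotations

PKG = "classifai."


def name_to_target(name: str, current_path: str | None, pkg: str = PKG) -> str:
    """Expand a short name from a See Also entry into a full dotted path."""
    # Parent of the object being rendered, e.g. "classifai.indexers.VectorStore."
    parent = current_path.rsplit(".", 1)[0] + "." if current_path else ""

    if current_path and not (name.startswith(pkg) or name.startswith(parent)):
        # Relative name: resolve it next to the current object.
        return f"{parent}{name}"
    if not name.startswith(pkg):
        # No current object to anchor on: fall back to the package root.
        return f"{pkg}{name}"
    # Already fully qualified.
    return name


def render_see_also(parsed, current_path: str | None) -> str:
    """Render parsed See Also entries as a Markdown bullet list of interlinks."""
    items = []
    for funcs, description in parsed:
        links = ", ".join(
            f"[{name}](`{name_to_target(name, current_path)}`)" for name, _role in funcs
        )
        items.append(f"{links}: {'<br>'.join(description)}" if description else links)
    return "\n".join(f"* {item}" for item in items)


# Hypothetical parsed entries, as if taken from VectorStore.embed's docstring.
parsed_example = [
    ([("search", None)], ["Query the store."]),
    ([("classifai.servers.get_router", None)], []),
]
rendered = render_see_also(parsed_example, "classifai.indexers.VectorStore.embed")
print(rendered)
```

Passing `None` as the current path exercises the fallback branch, where a bare name is prefixed with the package root rather than a parent path.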
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| # Package Overview {.unnumbered} | ||
|
|
||
| {{< include README.md >}} |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,30 @@ | ||
| """Indexers package.""" | ||
| # pylint: disable=C0301 | ||
| """This module provides functionality for creating a vector index from a text file. | ||
|
Contributor
"from a csv (text) file" instead? |
||
| It defines the `VectorStore` class, which is used to model and create vector databases | ||
| from CSV text files using a vectoriser object. | ||
|
|
||
| This class interacts with the Vectoriser class from the vectorisers submodule, | ||
| expecting that any vector model used to generate embeddings used in the | ||
| VectorStore objects is an instance of one of these classes, most notably | ||
|
Contributor
I would remove the hard reference to there being 3 Vectoriser classes. We might add more, users might make a custom one. Possibly it could refer to the base class instead |
||
| that each vectoriser object should have a transform method. | ||
|
|
||
| Key Features: | ||
| - Batch processing of input files to handle large datasets. | ||
| - Support for CSV file format (additional formats may be added in future updates). | ||
| - Integration with a custom embedder for generating vector embeddings. | ||
| - Logging for tracking progress and handling errors during processing. | ||
|
|
||
| Dependencies: | ||
| - polars: For handling data in tabular format and saving it as a Parquet file. | ||
| - tqdm: For displaying progress bars during batch processing. | ||
| - numpy: For vector cosine similarity calculations. | ||
| - A custom file iterator (`iter_csv`) for reading input files in batches. | ||
|
|
||
| Usage: | ||
| This module is intended to be used with the Vectorisers module and the | ||
| servers module from ClassifAI, to create scalable, modular, searchable | ||
| vector databases from your own text data. | ||
| """ | ||
|
|
||
| from .dataclasses import ( | ||
| VectorStoreEmbedInput, | ||
|
|
||
Should this docstring also capture the other functionalities of the VectorStore, such as search(), reverse_search(), embed() and from_filespace()? The key features and docstring currently only refer to what happens in the index creation / constructor step.
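
For reviewers weighing what the module docstring should cover, the flow it describes (read a CSV in batches, embed each batch through an object with a `transform` method, then rank stored vectors by cosine similarity) can be sketched in plain Python. Everything here is hypothetical and for illustration only: `ToyVectoriser`, `build_index`, and `search` are not ClassifAI's API, the `id` and `text` column names are assumed, and the standard library stands in for polars and numpy.

```python
import csv
import io
import math
from itertools import islice


class ToyVectoriser:
    """Hypothetical vectoriser: bag-of-words counts over a tiny fixed vocabulary."""

    VOCAB = ("apple", "banana", "car", "engine")

    def transform(self, texts):
        # One vector per input text, one dimension per vocabulary word.
        return [[float(t.lower().split().count(w)) for w in self.VOCAB] for t in texts]


def iter_batches(rows, batch_size):
    """Yield successive lists of at most `batch_size` rows."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        yield batch


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_index(csv_text, vectoriser, batch_size=2):
    """Embed the `text` column batch by batch and store (id, vector) pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    index = []
    for batch in iter_batches(reader, batch_size):
        vectors = vectoriser.transform([row["text"] for row in batch])
        index.extend((row["id"], vec) for row, vec in zip(batch, vectors))
    return index


def search(index, query, vectoriser, top_k=1):
    """Return the ids of the `top_k` stored vectors most similar to the query."""
    query_vec = vectoriser.transform([query])[0]
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]


CSV_TEXT = "id,text\n1,apple banana\n2,car engine\n3,banana banana apple\n"
index = build_index(CSV_TEXT, ToyVectoriser())
print(search(index, "engine", ToyVectoriser()))
print(search(index, "banana", ToyVectoriser(), top_k=2))
```

The sketch separates the build step from the query step, which mirrors the reviewer's point that the docstring currently describes only index creation and not the search-side methods.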