Integration with Vector Databases #1
Walkthrough
The recent changes introduce advanced vector handling and querying capabilities.
Actionable comments posted: 9
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files ignored due to path filters (8)
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/data_level0.bin is excluded by !**/*.bin
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/header.bin is excluded by !**/*.bin
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/length.bin is excluded by !**/*.bin
- chroma_db/104d460b-aaa5-4746-969c-b131149e52a7/link_lists.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/data_level0.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/header.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/length.bin is excluded by !**/*.bin
- chroma_db/7f8bd9ff-1cf4-4944-81ab-e7c257a0268c/link_lists.bin is excluded by !**/*.bin
Files selected for processing (4)
- SimplerLLM/language/llm.py (3 hunks)
- SimplerLLM/tools/vector_db.py (1 hunks)
- new.py (1 hunks)
- requirements.txt (1 hunks)
Files skipped from review due to trivial changes (1)
- requirements.txt
Additional context used
Ruff
SimplerLLM/tools/vector_db.py
1-1: os imported but unused. Remove unused import: os (F401)
new.py
3-3: os imported but unused. Remove unused import: os (F401)
SimplerLLM/language/llm.py
1-1: os imported but unused. Remove unused import: os (F401)
2-2: dotenv.load_dotenv imported but unused. Remove unused import: dotenv.load_dotenv (F401)
4-4: SimplerLLM.language.llm_providers.openai_llm.generate_response imported but unused. Remove unused import: SimplerLLM.language.llm_providers.openai_llm.generate_response (F401)
5-5: SimplerLLM.language.llm_providers.openai_llm.generate_response_async imported but unused. Remove unused import: SimplerLLM.language.llm_providers.openai_llm.generate_response_async (F401)
116-116: Undefined name openai_llm (F821)
155-155: Undefined name openai_llm (F821)
215-215: Undefined name gemini_llm (F821)
252-252: Undefined name gemini_llm (F821)
301-301: Undefined name anthropic_llm (F821)
339-339: Undefined name anthropic_llm (F821)
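The F401 and F821 findings above tend to appear together when functions are imported by name but the code still calls them through a module prefix. The sketch below reproduces that pattern with the standard library's statistics module as a stand-in. It is only an illustration, not the code in llm.py, and both function names are hypothetical.

```python
from statistics import mean  # binds only `mean`; the name `statistics` is not defined


def broken_average(values):
    # Same shape as the F821 findings: the module prefix was never imported,
    # so this raises NameError when the function is called, not at import time.
    return statistics.mean(values)  # noqa: F821


def fixed_average(values):
    # Call the name that was actually imported; this also clears the F401 warning.
    return mean(values)
```

Because the failure only happens at call time, code paths that are not exercised by a test can hide this kind of error until a linter or a user hits it.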
Additional comments not posted (4)
SimplerLLM/tools/vector_db.py (2)
25-26: Review of query_similar method. This method is a simple wrapper around query_vectors, which is a good use of code reuse. Ensure that the behavior of query_vectors is as intended, as any change will affect this method too.
15-16: Review of store_vectors method. The method correctly stores vectors with unique IDs. However, consider handling potential exceptions that might arise from database operations to improve robustness.
new.py (1)
59-64: Review of main function and script entry point. The main function and the script's entry point are well-defined. It's good practice to include such entry points in scripts intended for direct execution.
SimplerLLM/language/llm.py (1)
58-62: Review of vector storage and querying methods. The methods store_response_as_vector and find_similar_responses are straightforward and utilize the VectorDB class effectively. Ensure that the underlying methods in VectorDB handle errors appropriately.
    def __init__(self):
        persistence_directory = "./chroma_db"
        self.client = chromadb.PersistentClient(path=persistence_directory)
        self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
        self.collection = self.client.get_or_create_collection(
            name="responses",
            embedding_function=self.embedding_function
        )
Review of VectorDB class constructor.
The constructor initializes the PersistentClient and sets up a collection with an embedding function. The hard-coded path for the database ("./chroma_db") could be made configurable to enhance flexibility in different environments.
- persistence_directory = "./chroma_db"
+ def __init__(self, persistence_directory="./chroma_db"):
+ self.client = chromadb.PersistentClient(path=persistence_directory)
Committable suggestion was skipped due to low confidence.
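One way to make the path configurable, sketched with the standard library only: resolve the directory from an explicit argument, then an environment variable, then the default. The helper name resolve_persistence_directory and the variable SIMPLERLLM_CHROMA_DIR are assumptions made for illustration; neither appears in the PR.

```python
import os
from pathlib import Path

DEFAULT_PERSISTENCE_DIR = "./chroma_db"  # default taken from the constructor under review


def resolve_persistence_directory(explicit=None, env_var="SIMPLERLLM_CHROMA_DIR"):
    """Pick the storage path: explicit argument, then environment, then default.

    `env_var` is a hypothetical name chosen for this sketch.
    """
    if explicit:
        return Path(explicit)
    return Path(os.environ.get(env_var, DEFAULT_PERSISTENCE_DIR))


# The resolved path would then be passed on, e.g.
# chromadb.PersistentClient(path=str(resolve_persistence_directory(persistence_directory)))
```

Keeping the default equal to "./chroma_db" means existing callers that pass nothing would see no change in behavior.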
    def store_response(self, text):
        self.collection.add(documents=[text], ids=[f"id_{self.collection.count()}"])
Review of store_response method.
The method adds a single document to the collection. Using the collection's count as an ID is risky as it can lead to race conditions in a concurrent environment. Consider using a more robust method for generating unique IDs.
- ids=[f"id_{self.collection.count()}"]
+ import uuid
+ ids=[str(uuid.uuid4())]
Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- def store_response(self, text):
-     self.collection.add(documents=[text], ids=[f"id_{self.collection.count()}"])
+ def store_response(self, text):
+     import uuid
+     self.collection.add(documents=[text], ids=[str(uuid.uuid4())])
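Count-based IDs can collide even without concurrency, for example after a deletion. The sketch below shows this with a hypothetical in-memory FakeCollection; it is not the chromadb API, and it raises on duplicate IDs purely to make the collision visible (a real collection may handle duplicates differently).

```python
import uuid


class FakeCollection:
    """Hypothetical in-memory stand-in for a collection; not the chromadb API."""

    def __init__(self):
        self._docs = {}

    def add(self, documents, ids):
        for doc_id, doc in zip(ids, documents):
            if doc_id in self._docs:
                raise ValueError(f"duplicate id: {doc_id}")
            self._docs[doc_id] = doc

    def delete(self, ids):
        for doc_id in ids:
            self._docs.pop(doc_id, None)

    def count(self):
        return len(self._docs)


def store_with_count_id(collection, text):
    # ID derived from the current size: reused once anything is deleted.
    doc_id = f"id_{collection.count()}"
    collection.add(documents=[text], ids=[doc_id])
    return doc_id


def store_with_uuid(collection, text):
    # Random UUID: does not depend on the collection's state.
    doc_id = str(uuid.uuid4())
    collection.add(documents=[text], ids=[doc_id])
    return doc_id
```

Storing two items, deleting the first, and storing a third with store_with_count_id produces "id_1" twice, while store_with_uuid is unaffected by the deletion.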
        self.vector_db = VectorDB()
Review of LLM class constructor.
The constructor initializes a VectorDB instance. It's good practice to allow dependency injection for better testing and flexibility.
- self.vector_db = VectorDB()
+ def __init__(self, vector_db=None):
+ self.vector_db = vector_db if vector_db else VectorDB()
Committable suggestion
- self.vector_db = VectorDB()
+ def __init__(self, vector_db=None):
+     self.vector_db = vector_db if vector_db else VectorDB()
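A minimal sketch of how the injected dependency could be used in a test. The LLM class here is a trimmed stand-in, and InMemoryVectorDB is a hypothetical test double; the method names come from the review text, but their signatures are assumed. The sketch checks `is None` instead of truthiness so that an injected object that happens to be falsy (for example, one defining `__len__` on an empty store) is not silently replaced.

```python
class InMemoryVectorDB:
    """Hypothetical test double; signatures are assumed, not taken from the PR."""

    def __init__(self):
        self.responses = []

    def store_response(self, text):
        self.responses.append(text)

    def query_similar(self, query_text):
        # Naive substring match stands in for embedding similarity.
        return [r for r in self.responses if query_text.lower() in r.lower()]


class LLM:
    """Trimmed stand-in for the class under review."""

    def __init__(self, vector_db=None):
        # `is None` keeps any explicitly injected object, even a falsy one.
        self.vector_db = InMemoryVectorDB() if vector_db is None else vector_db

    def store_response_as_vector(self, text):
        self.vector_db.store_response(text)

    def find_similar_responses(self, query_text):
        return self.vector_db.query_similar(query_text)
```

With this shape, a test can pass its own double and inspect it afterwards, with no database files written to disk.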
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 2
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- new.py (1 hunks)
Additional context used
Ruff
new.py
3-3: os imported but unused. Remove unused import: os (F401)
Additional comments not posted (1)
new.py (1)
62-67: LGTM! The main function is straightforward and does not require changes.
print("\nQuerying for similar responses:")
for query_prompt in query_prompts:
    print(f"\nQuery: {query_prompt}")
    start_time = time.time()
    similar_responses = llm.find_similar_responses(query_prompt)
    end_time = time.time()
    print(f"Time taken: {end_time - start_time:.2f} seconds")
    print("Similar responses:")
    for i, response in enumerate(similar_responses, 1):
        print(f"{i}. {response}")
Add error handling to the querying process.
The loop for querying similar responses is clear and straightforward. However, adding error handling would improve the robustness of the test.
+     try:
          similar_responses = llm.find_similar_responses(query_prompt)
+     except Exception as e:
+         print("Error occurred:", e)
Committable suggestion
- print("\nQuerying for similar responses:")
- for query_prompt in query_prompts:
-     print(f"\nQuery: {query_prompt}")
-     start_time = time.time()
-     similar_responses = llm.find_similar_responses(query_prompt)
-     end_time = time.time()
-     print(f"Time taken: {end_time - start_time:.2f} seconds")
-     print("Similar responses:")
-     for i, response in enumerate(similar_responses, 1):
-         print(f"{i}. {response}")
+ print("\nQuerying for similar responses:")
+ for query_prompt in query_prompts:
+     print(f"\nQuery: {query_prompt}")
+     start_time = time.time()
+     try:
+         similar_responses = llm.find_similar_responses(query_prompt)
+     except Exception as e:
+         print("Error occurred:", e)
+         continue
+     end_time = time.time()
+     print(f"Time taken: {end_time - start_time:.2f} seconds")
+     print("Similar responses:")
+     for i, response in enumerate(similar_responses, 1):
+         print(f"{i}. {response}")
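The same loop can be factored into a small helper so the error path is testable. This is a sketch under assumptions: run_queries is a hypothetical name, find_similar is any callable taking a prompt (such as llm.find_similar_responses), and time.perf_counter is swapped in for time.time because it is intended for measuring short intervals.

```python
import time


def run_queries(find_similar, query_prompts):
    """Run each query, timing it and recording None for any that fail."""
    results = {}
    for query_prompt in query_prompts:
        start = time.perf_counter()
        try:
            responses = find_similar(query_prompt)
        except Exception as exc:
            # Report the failure and move on so one bad query does not stop the run.
            print(f"Query {query_prompt!r} failed: {exc}")
            results[query_prompt] = None
            continue
        elapsed = time.perf_counter() - start
        print(f"Query {query_prompt!r} took {elapsed:.2f} seconds")
        results[query_prompt] = list(responses)
    return results
```

Returning the results instead of only printing them lets a caller distinguish a failed query (None) from one that simply found nothing (an empty list).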
I have added functions to store text in a vector database. I used Chroma, which here uses the all-MiniLM-L6-v2 model from the Sentence Transformers library to compute embeddings.
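As a rough illustration of what a similarity query over embeddings does, the sketch below ranks stored texts by cosine similarity to a query vector. The toy 3-dimensional vectors stand in for real embeddings, the function names are hypothetical, and cosine similarity is just one common choice; it is not necessarily the distance metric configured for the collection in this PR.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, stored, top_k=2):
    # Sort stored texts from most to least similar and keep the first top_k.
    ranked = sorted(
        stored.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


# Toy vectors invented for this example; real embeddings have many more dimensions.
stored_vectors = {
    "response about cats": [0.9, 0.1, 0.0],
    "response about dogs": [0.8, 0.3, 0.1],
    "response about taxes": [0.0, 0.2, 0.9],
}
```

A vector database performs the same kind of ranking, but with an index that avoids comparing the query against every stored vector.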
In SimplerLLM/tools/vector_db.py, I added the following code:
Then, in SimplerLLM/language/llm.py, I made the following modifications to the existing code to invoke the vector database:
Initialized an instance of the class
Then
The following libraries must be installed:
pip install chromadb sentence-transformers
Finally, in requirements.txt, I specified the correct versions.
You can test this by running the following sample code:
Summary by CodeRabbit
New Features
Enhancements
Dependencies
- Added sentence-transformers and chromadb to the project dependencies.