This application automates the retrieval, processing, and embedding of Mars rover data into a memory system designed for Retrieval-Augmented Generation (RAG). The pipeline is built on AWS Step Functions and a serverless architecture to orchestrate data processing scalably and efficiently.
The pipeline is designed to enable a chatbot with contextual memory, simulating the ability to "remember" and reference Mars Rover data in conversations. It performs the following steps:
- **Fetch Chat and Logs:**
  - Retrieves chat messages and logs from the API endpoints.
  - Outputs relevant data and metadata for further processing.
- **Generate and Store Memories:**
  - Processes chat data and stores memory entries in a structured format.
  - Stores these entries in a database or S3 bucket (if configured).
- **Embed Memories into PineconeDB:**
  - Embeds memory entries into Pinecone for use in RAG workflows and chatbot conversations (see the sketch below).
- `functions/`: Lambda functions for chat and log retrieval.
  - `rover_chat`: Handles chat interactions and memory generation.
  - `get_logs`: Retrieves logs for analysis and debugging.
- `layers/`: Shared dependencies for Lambda functions.
- `tests/`: Unit tests for pipeline components.
- `template.yaml`: AWS SAM template defining serverless resources.
1. Clone the repository:

   ```bash
   git clone https://github.com/amfelso/rover-chatbot.git
   cd rover-chatbot
   ```

2. Set up your environment:

   ```bash
   make setup
   ```

   This will:
   - Install Python dependencies from `layers/rover_chat/requirements.txt`
   - Create a `.env` file from `.env.example`

3. Configure your API keys by editing the `.env` file (a quick check is sketched after this list):

   ```
   PINECONE_API_KEY=your-pinecone-api-key-here
   OPENAI_API_KEY=your-openai-api-key-here
   ```
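After editing `.env`, a few lines of Python can confirm the keys are actually visible. This sketch assumes `python-dotenv` is installed (check `layers/rover_chat/requirements.txt` for what `make setup` actually pulls in).

```python
# Quick sanity check that the API keys in .env are loadable.
# Assumes python-dotenv is installed; adjust if the project loads .env differently.
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

for key in ("PINECONE_API_KEY", "OPENAI_API_KEY"):
    value = os.getenv(key)
    status = "set" if value and not value.startswith("your-") else "MISSING"
    print(f"{key}: {status}")
```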
- `make check-tools` - Verify required tools installed
- `make setup` - Create venv, install dependencies, and create `.env`
- `make install` - Install Python dependencies only
- `make login` - Configure AWS credentials from `.env`
- `make test` - Run all tests (automatically loads `.env`)
- `make test-unit` - Run unit tests only
- `make test-integration` - Run integration tests only (requires deployed stack and `API_ENDPOINT` in `.env`)
- `make lint` - Run flake8 linter and SAM template validation
- `make build` - Build SAM application
- `make deploy` - Lint, test, build, and deploy to AWS
- `make clean` - Clean build artifacts, venv, and Python cache files
**Automatic Deployment:** The pipeline deploys automatically via GitHub Actions when code updates are merged to the release branch.
**Manual Deployment:**

```bash
make deploy
```

This will lint, test, build, and deploy the application to AWS using SAM. After deployment, the API endpoint will be written to `.env` as `WEBENDPOINT`.
Tests cover individual Lambda functions and the pipeline as a whole.

```bash
# Run all tests
make test

# Run only unit tests
make test-unit

# Run only integration tests (requires deployed stack and API_ENDPOINT in .env)
make test-integration
```

**Note:**
- Unit tests can run locally without any AWS resources.
- Integration tests require the stack to be deployed and `API_ENDPOINT` to be set in `.env`.
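For orientation, a unit test for the chat handler might look like the sketch below. The module path matches the repository layout, but the handler name, event shape, and the mocked `generate_reply` helper are assumptions about the handler's contract; see `tests/unit/test_chat.py` for the real tests.

```python
# Illustrative unit test sketch; see tests/unit/test_chat.py for the real tests.
# The handler name, event shape, and generate_reply helper are assumptions.
import json
from unittest.mock import patch

from functions.rover_chat import app


def test_chat_returns_response():
    event = {"body": json.dumps({
        "user_prompt": "Hi Rover! How are you?",
        "conversation_id": "test",
        "earth_date": "2012-08-06",
    })}
    # Patch a hypothetical helper so the test runs without live API keys.
    with patch.object(app, "generate_reply", return_value="Hello from Mars!"):
        result = app.lambda_handler(event, None)
    assert result["statusCode"] == 200
    assert "Hello from Mars!" in result["body"]
```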
- AWS SAM Developer Guide: Introduction to the SAM specification, the SAM CLI, and serverless application concepts.
- Pinecone Documentation: Guide to setting up and managing vector embeddings for efficient RAG workflows.
For more details, see the source code and comments in each function directory.
- Deploy to AWS:

  ```bash
  sam deploy --guided
  ```
- **Request:**
  - Endpoint: `/chat`
  - Method: `POST`
  - Payload:

    ```json
    {
      "user_prompt": "Hi Rover! How are you?",
      "conversation_id": "test",
      "earth_date": "2012-08-06"
    }
    ```

- **Response:**
  - Example:

    ```json
    {
      "response": "Today I explored Jezero Crater and captured images of layered sedimentary rocks!"
    }
    ```
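Once the stack is deployed, the endpoint can be exercised with a few lines of Python. This sketch assumes the deployed API URL was written to `.env` as `WEBENDPOINT` (per the deployment notes above) and that the `requests` and `python-dotenv` packages are available.

```python
# Call the deployed /chat endpoint; assumes WEBENDPOINT is set in .env
# and that the requests and python-dotenv packages are installed.
import os

import requests
from dotenv import load_dotenv

load_dotenv()
url = os.environ["WEBENDPOINT"].rstrip("/") + "/chat"

payload = {
    "user_prompt": "Hi Rover! How are you?",
    "conversation_id": "test",
    "earth_date": "2012-08-06",
}
resp = requests.post(url, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json()["response"])
```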
This repository includes automated workflows for:

- Testing: Runs `pytest` for unit tests.
- Linting: Runs `flake8` to ensure code quality.
- Deployment: Deploys the stack using AWS SAM.
```
rover-chatbot/
├── README.md
├── template.yaml        # AWS SAM template
├── .gitignore           # Git ignore file
├── .github/             # GitHub Actions workflows
│   ├── workflows/
│   │   ├── Develop.yml
│   │   └── Release.yml
├── functions/           # Lambda function code
│   ├── rover_chat/
│   │   ├── __init__.py
│   │   ├── app.py       # ChatProcessor Lambda handler
│   │   ├── helpers.py   # Helper functions
├── layers/              # Lambda layer for shared dependencies
│   ├── rover_chat/
│   │   ├── requirements.txt
├── tests/               # Unit tests
│   ├── unit/
│   │   ├── __init__.py
│   │   ├── test_chat.py
│   ├── __init__.py
```
- Support for additional endpoints (e.g., chat history, Rover status).
- Integration with more advanced memory systems for multi-turn conversations.
- Enhanced logging and monitoring with CloudWatch and X-Ray.
This project is licensed under the MIT License. See LICENSE for details.