FloatChat is an AI-powered conversational interface that makes complex ARGO oceanographic data accessible through intuitive, natural-language queries. Built with modern technologies, it allows researchers, oceanographers, and data enthusiasts to explore ocean data through simple conversations, interactive visualizations, and intelligent insights.
- AI-Powered Search: Uses sentence transformers and semantic embeddings for intelligent data discovery
- Geospatial Intelligence: Automatic geocoding and location-aware filtering
- Interactive Visualizations: Beautiful depth-time plots, maps, and real-time analytics
- Natural Language Interface: Ask questions in plain English about ocean data
- Real-time Processing: Fast query response with intelligent caching and hybrid scoring
- Modern UI: Beautiful Streamlit interface with ocean-themed design
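As an illustration of the query caching mentioned above, repeated queries can reuse a memoized embedding via `functools.lru_cache`. The function name, cache size, and placeholder embedding below are hypothetical, not FloatChat's actual implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def embed_query(query: str) -> tuple:
    """Compute (or reuse) the embedding for a repeated query string.

    A real implementation would call a sentence transformer here; this
    placeholder just derives a deterministic vector from the text.
    """
    return tuple(float(ord(c) % 7) for c in query[:8])

# A repeated query hits the cache instead of recomputing the embedding
embed_query("salinity near the equator")
embed_query("salinity near the equator")
print(embed_query.cache_info().hits)  # → 1
```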
```mermaid
graph TB
    A[NetCDF Files] --> B[Data Ingestion]
    B --> C[PostgreSQL Database]
    B --> D[Vector Embeddings]
    C --> E[Flask API Backend]
    D --> E
    E --> F[Streamlit Frontend]
    F --> G[Interactive Visualizations]
    F --> H[Chat Interface]
    I[User Queries] --> H
    H --> E
    E --> J[AI Processing]
    J --> K[Results & Visualizations]
    L[Google Gemini AI] --> E
    E --> M[Natural Language Explanations]
```
- Flask - Lightweight web framework for API development
- PostgreSQL - Relational database for structured oceanographic data
- Sentence Transformers - AI embeddings using the `all-MiniLM-L6-v2` model
- Google Gemini AI - Natural language processing for query explanations
- psycopg2-binary - PostgreSQL adapter for database operations
- Streamlit - Interactive web application framework with custom CSS
- Plotly - Advanced 2D/3D data visualizations with interactive features
- Pandas - Data manipulation and analysis for real-time processing
- Geopy - Geocoding and location services for place name resolution
- xarray - Multi-dimensional NetCDF file processing
- NumPy - Numerical computing for vector operations
- Pandas - Data analysis and time series processing
- SQLAlchemy - Database ORM for advanced query operations
```text
floatchat-clean/
├── 📁 api/                        # Flask backend service
│   ├── Dockerfile                 # Multi-stage Docker build for API
│   ├── app.py                     # Main Flask application
│   ├── main.py                    # API entry point with CORS
│   ├── query.py                   # Advanced query processing with Gemini AI
│   ├── fallback_query.py          # Fallback query handling
│   └── requirements.txt           # Backend dependencies
│
├── 📁 frontend/                   # Streamlit user interface
│   ├── Dockerfile                 # Multi-stage Docker build for Frontend
│   ├── chatbot_ui.py              # Main chat interface with visualizations
│   ├── front.py                   # Multi-page application with navigation
│   ├── map_page.py                # Geospatial visualizations and maps
│   ├── timedepthplot.py           # Depth-time analysis and heatmaps
│   ├── dummy.py                   # Demo data utilities
│   ├── FloatChat.png              # Application logo
│   ├── layered-waves-haikei.svg   # Background graphics
│   └── requirements.txt           # Frontend dependencies
│
├── 📁 ingestion/                  # Data processing pipeline
│   ├── main.py                    # NetCDF → PostgreSQL + Vector embeddings
│   ├── requirements.txt           # Ingestion dependencies
│   └── tempCodeRunnerFile.py      # Development utilities
│
├── 📁 data/                       # Raw NetCDF oceanographic files
│   └── 20250901_prof.nc           # Sample ARGO float data
│
├── 📁 infra/                      # Infrastructure and setup scripts
├── 📁 .github/                    # GitHub Actions workflows
│   └── workflows/
│       └── ci.yml                 # CI/CD pipeline for testing and Docker builds
├── dummy.db                       # SQLite demo database
├── requirements.txt               # Global project dependencies
└── README.md                      # This documentation
```
- Python 3.8+
- PostgreSQL 13+ (or SQLite for demo)
- Git
```bash
# Clone the repository
git clone https://github.com/SyedOwais312/floatchat.git
cd floatchat-clean

# Install all dependencies at once
pip install -r requirements.txt

# API only
pip install -r api/requirements.txt
# Frontend only
pip install -r frontend/requirements.txt
# Ingestion only
pip install -r ingestion/requirements.txt
```

Set up PostgreSQL:

```bash
# Create database and user
sudo -u postgres psql
```

```sql
CREATE DATABASE floatchatai;
CREATE USER floatchat_user WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE floatchatai TO floatchat_user;
\q
```

The project includes a `dummy.db` SQLite database for immediate testing.
Create a `.env` file in the root directory:

```bash
# API Configuration
GOOGLE_API_KEY=your_google_gemini_api_key
QUERY_API=http://127.0.0.1:5000/query   # local API endpoint

# Database Configuration (if using PostgreSQL)
DB_HOST=localhost
DB_NAME=floatchatai
DB_USER=floatchat_user
DB_PASSWORD=your_secure_password
```

Start the backend API:

```bash
cd api
python main.py
# API will be available at http://localhost:5000
```

Start the frontend:

```bash
cd frontend
streamlit run front.py
# Frontend will be available at http://localhost:8501
```

Run the data ingestion pipeline:

```bash
cd ingestion
python main.py
# This processes NetCDF files and populates the database
```

- Frontend: http://localhost:8501 - Interactive ocean data explorer
- API: http://localhost:5000 - REST API endpoints
- Map Visualization: Navigate to Map page in the frontend
- Time-Depth Analysis: Navigate to Depth-Time Plots page
FloatChat uses multi-stage Docker builds for optimized container images. Both the API and Frontend services can be run in separate containers.
Build the images:

```bash
cd api
docker build -t floatchat-api:latest .
```

```bash
cd frontend
docker build -t floatchat-frontend:latest .
```

Run the API container:

```bash
docker run --rm -p 5000:5000 \
  --env-file .env \
  floatchat-api:latest
```

- API will be available at http://localhost:5000
- Ensure your `.env` file contains `GOOGLE_API_KEY` and database credentials

Run the frontend container:

```bash
docker run --rm -p 8501:8501 \
  --env-file .env \
  floatchat-frontend:latest
```

- Frontend will be available at http://localhost:8501
- Ensure your `.env` file contains `QUERY_API` pointing to your API endpoint
For easier orchestration, you can use Docker Compose:
```yaml
version: '3.8'
services:
  api:
    build: ./api
    ports:
      - "5000:5000"
    env_file:
      - .env
    environment:
      - DB_HOST=postgres
    depends_on:
      - postgres
  frontend:
    build: ./frontend
    ports:
      - "8501:8501"
    env_file:
      - .env
    depends_on:
      - api
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: floatchatai
      POSTGRES_USER: floatchat_user
      POSTGRES_PASSWORD: your_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
```

Run with:

```bash
docker-compose up --build
```

- Multi-stage builds: Optimized image sizes by separating build and runtime stages
- Wheel-based installation: Faster builds with pre-compiled Python packages
- Network resilience: Configured with extended timeouts for slow network connections
- Production-ready: Minimal runtime images with only necessary dependencies
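A multi-stage build combining the ideas above might look like the following sketch (illustrative only; the shipped `api/Dockerfile` may differ in base image, stage names, and entry point):

```dockerfile
# --- Build stage: pre-compile wheels so the runtime stage stays small ---
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Extended timeout helps on slow network connections
RUN pip wheel --no-cache-dir --timeout 120 -r requirements.txt -w /wheels

# --- Runtime stage: install only the pre-built wheels ---
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
COPY . .
EXPOSE 5000
CMD ["python", "main.py"]
```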
File: `.github/workflows/ci.yml`

This workflow runs on every push and pull request to `main`:
- `test` job
  - Checks out the repository
  - Installs Python 3.11 and project dependencies
  - Runs `python -m compileall ...` as a lightweight syntax and import check
- `docker` job
  - Builds both API and Frontend Docker images using BuildKit
  - Uses GitHub Actions cache for faster subsequent builds
  - Pushes both images to GitHub Container Registry (`ghcr.io`) on successful pushes to `main`:
    - `ghcr.io/<repo-name>-api:latest`
    - `ghcr.io/<repo-name>-frontend:latest`
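A minimal sketch of what the `docker` job described above might look like (job name, action versions, and details are illustrative; consult the actual `.github/workflows/ci.yml` for the real configuration):

```yaml
docker:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - uses: docker/build-push-action@v6
      with:
        context: ./api          # separate build context per service
        push: ${{ github.ref == 'refs/heads/main' }}
        tags: ghcr.io/${{ github.repository }}-api:latest
        cache-from: type=gha    # reuse the GitHub Actions layer cache
        cache-to: type=gha,mode=max
```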
After a successful build, you can pull and run the images:
```bash
# Login to GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Pull API image
docker pull ghcr.io/<your-username>/<repo-name>-api:latest

# Pull Frontend image
docker pull ghcr.io/<your-username>/<repo-name>-frontend:latest
```

- To push to another registry (Docker Hub, AWS ECR, etc.), update the `REGISTRY`, `IMAGE_NAME`, and login step accordingly.
- Add more quality gates (pytest, linting, etc.) by inserting steps into the `test` job.
- Set additional secrets under the repository settings if you need third-party registry credentials.
- The workflow uses separate build contexts for API and Frontend to optimize build times.
Ask questions in plain English about ocean data:
- "Show salinity profiles near the equator"
- "Find temperature data near Mumbai"
- "Compare ocean data at lat=-43.037, long=130"
- "What's the salinity trend in the Pacific Ocean?"
- Semantic Search: Find relevant data using meaning, not just keywords
- Vector Embeddings: 384-dimensional embeddings for precise matching
- Hybrid Scoring: Combines semantic similarity with geographic proximity
- Natural Language Explanations: AI-generated explanations using Google Gemini
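The hybrid scoring idea above can be sketched in plain NumPy: blend the cosine similarity of the query and profile embeddings with a geographic proximity term. The blend weight `alpha`, the 100 km proximity scale, and the function names are illustrative assumptions, not FloatChat's actual formula:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def hybrid_score(query_vec, profile_vec, query_loc, profile_loc, alpha=0.7):
    """Blend cosine similarity with geographic proximity (illustrative weights)."""
    cos = np.dot(query_vec, profile_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(profile_vec))
    dist = haversine_km(*query_loc, *profile_loc)
    proximity = 1.0 / (1.0 + dist / 100.0)  # decays with distance (100 km scale)
    return alpha * cos + (1 - alpha) * proximity

# Identical embeddings at the same location score the maximum of 1.0
v = np.array([0.1, 0.5, 0.2])
print(round(hybrid_score(v, v, (0.0, 60.0), (0.0, 60.0)), 3))  # → 1.0
```

Ranking candidate profiles by this score returns results that are both semantically relevant and geographically close, which is the behaviour the list above describes.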
- Interactive Maps: Geospatial visualization of ARGO float trajectories
- Depth-Time Heatmaps: Visualize ocean parameters across time and depth
- Profile Comparisons: Side-by-side analysis of different ocean variables
- Real-time Charts: Dynamic Plotly visualizations with hover details
- Automatic Geocoding: Convert place names to coordinates
- Location-Aware Filtering: Filter data by geographic proximity
- Distance Calculations: Find nearest float data to any location
- Coordinate Extraction: Parse lat/long from natural language queries
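Coordinate extraction of the `lat=..., long=...` form can be sketched with a regular expression. The pattern and function name here are illustrative, not the project's actual parser; queries without explicit coordinates would fall through to geocoding:

```python
import re

# Illustrative pattern: matches "lat=<num>, lon=<num>" or "lat=<num>, long=<num>"
COORD_RE = re.compile(
    r"lat\s*=\s*(-?\d+(?:\.\d+)?)\s*,\s*lon(?:g)?\s*=\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_coords(query: str):
    """Return (lat, lon) if the query spells out coordinates, else None."""
    m = COORD_RE.search(query)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(extract_coords("Find temperature data at lat=-43.037, long=130"))
# → (-43.037, 130.0)
print(extract_coords("Show me ocean data near Mumbai"))  # → None (geocode instead)
```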
```text
# Example queries you can ask:
"Show salinity profiles near the equator in March 2023"
"Compare temperature in Arabian Sea last 6 months"
"Find temperature data at lat=-43.037, long=130"
"Show me ocean data near Mumbai"
"What's the salinity trend in the Pacific Ocean?"
```

```python
import requests

QUERY_API = "http://127.0.0.1:5000/query"  # as configured in .env

# Query the API
response = requests.post(QUERY_API,
                         json={"query": "Show salinity near the equator"})
data = response.json()

# Access results
for profile in data:
    print(f"Profile ID: {profile['profile_id']}")
    print(f"Location: {profile['lat']}, {profile['lon']}")
    print(f"Time: {profile['time']}")
    print(f"AI Explanation: {profile['query_explain']}")

    # Access depth-level data
    for level in profile['depth_levels']:
        print(f"  Pressure: {level['pres']} dbar, "
              f"Temperature: {level['temp']}°C, "
              f"Salinity: {level['salinity']} PSU")
```

The Streamlit frontend provides multiple pages:
- Home: Welcome page with feature overview
- FloatChat: AI-powered chat interface
- 🗺 Map: Interactive geospatial visualizations
- Profile Comparison: Side-by-side data analysis
- Depth-Time Plots: Time series and heatmap analysis
Update database credentials in the respective files:

```python
# api/main.py, ingestion/main.py
DB_CONFIG = {
    "host": "localhost",
    "database": "floatchatai",
    "user": "your_username",
    "password": "your_password"
}
```

```python
# api/query.py
TOP_K = 1                        # Number of top results to return
RADIUS_METERS = 50_000           # Search radius in meters
MODEL_NAME = 'all-MiniLM-L6-v2'  # Sentence transformer model
```

```python
# frontend/chatbot_ui.py
API_BASE_URL = "http://127.0.0.1:5000"  # Backend API URL (local)
TIMEOUT_SECONDS = 30                    # Request timeout
```

- NetCDF Files: ARGO float data in standard NetCDF format
- Variables: Temperature (TEMP), Salinity (PSAL), Pressure (PRES)
- Metadata: Latitude, Longitude, Time (JULD), Platform Number
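ARGO's JULD variable stores time as days since the 1950-01-01 reference epoch, so converting it to a timestamp is a simple offset. A minimal sketch (the project's ingestion code may rely on xarray's built-in time decoding instead):

```python
from datetime import datetime, timedelta

# ARGO's JULD time variable counts days since this reference epoch
ARGO_EPOCH = datetime(1950, 1, 1)

def juld_to_datetime(juld_days: float) -> datetime:
    """Convert an ARGO JULD value (days since 1950-01-01) to a datetime."""
    return ARGO_EPOCH + timedelta(days=juld_days)

print(juld_to_datetime(1.5))  # midday on the day after the epoch
```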
```json
{
  "profile_id": 123,
  "lat": -43.037,
  "lon": 130.0,
  "time": "2023-03-15 12:00:00",
  "depth_levels": [
    {"pres": 5.0, "temp": 18.5, "salinity": 35.2},
    {"pres": 10.0, "temp": 18.3, "salinity": 35.1}
  ],
  "query_explain": "Based on your query about salinity near the equator, I found oceanographic data from the Indian Ocean. The surface temperature is around 26.3°C, but it gets cooler as you go deeper..."
}
```

The application includes a demo mode that works without a database:

- Start the frontend: `streamlit run frontend/front.py`
- Ask any ocean-related question (if you don't specify a date and time, the current date and time are used)
- The system will provide demo data with realistic ocean profiles
```bash
# Health check
curl http://localhost:5000/

# Query test
curl -X POST http://localhost:5000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Show salinity near the equator"}'
```

- Follow PEP 8 guidelines
- Use meaningful variable names
- Add docstrings to functions
- Include type hints where possible
- **Collaboration Features** - Share and annotate findings
- ARGO Program - For providing the oceanographic data
- Streamlit - For the amazing web framework
- Flask - For the lightweight API framework
- PostgreSQL - For the robust database system
- Sentence Transformers - For the semantic search capabilities
- Google Gemini - For natural language processing
- Hugging Face - For the pre-trained models
- Plotly - For the interactive visualizations
- Geopy - For geocoding services
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Contact the maintainer for direct support
This project is licensed under the MIT License - see the LICENSE file for details.
Star ⭐ this repository if you find it helpful!
Here’s what FloatChat looks like in action:


