FloatChat is an AI-powered conversational interface that makes complex ARGO oceanographic data accessible through intuitive, natural-language queries. Built with modern technologies, it allows researchers, oceanographers, and data enthusiasts to explore ocean data through simple conversations, interactive visualizations, and intelligent insights.
- AI-Powered Search: Uses sentence transformers and semantic embeddings for intelligent data discovery
- Geospatial Intelligence: Automatic geocoding and location-aware filtering
- Interactive Visualizations: Beautiful depth-time plots, maps, and real-time analytics
- Natural Language Interface: Ask questions in plain English about ocean data
- Real-time Processing: Fast query response with intelligent caching and hybrid scoring
- Modern UI: Beautiful Streamlit interface with ocean-themed design
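As an illustration of the query caching mentioned above, repeated queries can reuse a memoized embedding via `functools.lru_cache`. The function name, cache size, and placeholder embedding below are hypothetical, not FloatChat's actual implementation:

```python
from functools import lru_cache

@lru_cache(maxsize=256)
def embed_query(query: str) -> tuple:
    """Compute (or reuse) the embedding for a repeated query string.

    A real implementation would call a sentence transformer here; this
    placeholder just derives a deterministic vector from the text.
    """
    return tuple(float(ord(c) % 7) for c in query[:8])

# A repeated query hits the cache instead of recomputing the embedding
embed_query("salinity near the equator")
embed_query("salinity near the equator")
print(embed_query.cache_info().hits)  # → 1
```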
```mermaid
graph TB
    A[NetCDF Files] --> B[Data Ingestion]
    B --> C[PostgreSQL Database]
    B --> D[Vector Embeddings]
    C --> E[Flask API Backend]
    D --> E
    E --> F[Streamlit Frontend]
    F --> G[Interactive Visualizations]
    F --> H[Chat Interface]
    I[User Queries] --> H
    H --> E
    E --> J[AI Processing]
    J --> K[Results & Visualizations]
    L[Google Gemini AI] --> E
    E --> M[Natural Language Explanations]
```
- Flask - Lightweight web framework for API development
- PostgreSQL - Relational database for structured oceanographic data
- Sentence Transformers - AI embeddings using the `all-MiniLM-L6-v2` model
- Google Gemini AI - Natural language processing for query explanations
- psycopg2-binary - PostgreSQL adapter for database operations
- Streamlit - Interactive web application framework with custom CSS
- Plotly - Advanced 2D/3D data visualizations with interactive features
- Pandas - Data manipulation and analysis for real-time processing
- Geopy - Geocoding and location services for place name resolution
- xarray - Multi-dimensional NetCDF file processing
- NumPy - Numerical computing for vector operations
- Pandas - Data analysis and time series processing
- SQLAlchemy - Database ORM for advanced query operations
```text
floatchat-clean/
├── 📁 api/                        # Flask backend service
│   ├── Dockerfile                 # Multi-stage Docker build for API
│   ├── app.py                     # Main Flask application
│   ├── main.py                    # API entry point with CORS
│   ├── query.py                   # Advanced query processing with Gemini AI
│   ├── fallback_query.py          # Fallback query handling
│   └── requirements.txt           # Backend dependencies
│
├── 📁 frontend/                   # Streamlit user interface
│   ├── Dockerfile                 # Multi-stage Docker build for Frontend
│   ├── chatbot_ui.py              # Main chat interface with visualizations
│   ├── front.py                   # Multi-page application with navigation
│   ├── map_page.py                # Geospatial visualizations and maps
│   ├── timedepthplot.py           # Depth-time analysis and heatmaps
│   ├── dummy.py                   # Demo data utilities
│   ├── FloatChat.png              # Application logo
│   ├── layered-waves-haikei.svg   # Background graphics
│   └── requirements.txt           # Frontend dependencies
│
├── 📁 ingestion/                  # Data processing pipeline
│   ├── main.py                    # NetCDF → PostgreSQL + Vector embeddings
│   ├── requirements.txt           # Ingestion dependencies
│   └── tempCodeRunnerFile.py      # Development utilities
│
├── 📁 data/                       # Raw NetCDF oceanographic files
│   └── 20250901_prof.nc           # Sample ARGO float data
│
├── 📁 infra/                      # Infrastructure and setup scripts
├── 📁 .github/                    # GitHub Actions workflows
│   └── workflows/
│       └── ci.yml                 # CI/CD pipeline for testing and Docker builds
├── dummy.db                       # SQLite demo database
├── requirements.txt               # Global project dependencies
└── README.md                      # This documentation
```
- Python 3.8+
- PostgreSQL 13+ (or SQLite for demo)
- Git
```bash
# Clone the repository
git clone https://github.com/SyedOwais312/floatchat.git
cd floatchat-clean

# Install all dependencies at once
pip install -r requirements.txt

# API only
pip install -r api/requirements.txt
# Frontend only
pip install -r frontend/requirements.txt
# Ingestion only
pip install -r ingestion/requirements.txt
```

Set up PostgreSQL:

```bash
# Create database and user
sudo -u postgres psql
```

```sql
CREATE DATABASE floatchatai;
CREATE USER floatchat_user WITH PASSWORD 'your_secure_password';
GRANT ALL PRIVILEGES ON DATABASE floatchatai TO floatchat_user;
\q
```

The project includes a `dummy.db` SQLite database for immediate testing.
Create a `.env` file in the root directory:

```bash
# API Configuration
GOOGLE_API_KEY=your_google_gemini_api_key
QUERY_API=http://127.0.0.1:5000/query   # local API endpoint

# Database Configuration (if using PostgreSQL)
DB_HOST=localhost
DB_NAME=floatchatai
DB_USER=floatchat_user
DB_PASSWORD=your_secure_password
```

Start the backend API:

```bash
cd api
python main.py
# API will be available at http://localhost:5000
```

Start the frontend:

```bash
cd frontend
streamlit run front.py
# Frontend will be available at http://localhost:8501
```

Run the data ingestion pipeline:

```bash
cd ingestion
python main.py
# This processes NetCDF files and populates the database
```

- Frontend: http://localhost:8501 - Interactive ocean data explorer
- API: http://localhost:5000 - REST API endpoints
- Map Visualization: Navigate to Map page in the frontend
- Time-Depth Analysis: Navigate to Depth-Time Plots page
FloatChat uses multi-stage Docker builds for optimized container images. Both the API and Frontend services can be run in separate containers.
Build the images:

```bash
cd api
docker build -t floatchat-api:latest .
```

```bash
cd frontend
docker build -t floatchat-frontend:latest .
```

Run the API container:

```bash
docker run --rm -p 5000:5000 \
  --env-file .env \
  floatchat-api:latest
```

- API will be available at http://localhost:5000
- Ensure your `.env` file contains `GOOGLE_API_KEY` and database credentials

Run the frontend container:

```bash
docker run --rm -p 8501:8501 \
  --env-file .env \
  floatchat-frontend:latest
```

- Frontend will be available at http://localhost:8501
- Ensure your `.env` file contains `QUERY_API` pointing to your API endpoint
For easier orchestration, you can use Docker Compose:
```yaml
version: '3.8'
services:
  api:
    build: ./api
    ports:
      - "5000:5000"
    env_file:
      - .env
    environment:
      - DB_HOST=postgres
    depends_on:
      - postgres
  frontend:
    build: ./frontend
    ports:
      - "8501:8501"
    env_file:
      - .env
    depends_on:
      - api
  postgres:
    image: postgres:13
    environment:
      POSTGRES_DB: floatchatai
      POSTGRES_USER: floatchat_user
      POSTGRES_PASSWORD: your_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
volumes:
  postgres_data:
```

Run with:

```bash
docker-compose up --build
```

- Multi-stage builds: Optimized image sizes by separating build and runtime stages
- Wheel-based installation: Faster builds with pre-compiled Python packages
- Network resilience: Configured with extended timeouts for slow network connections
- Production-ready: Minimal runtime images with only necessary dependencies
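A multi-stage build combining the ideas above might look like the following sketch (illustrative only; the shipped `api/Dockerfile` may differ in base image, stage names, and entry point):

```dockerfile
# --- Build stage: pre-compile wheels so the runtime stage stays small ---
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Extended timeout helps on slow network connections
RUN pip wheel --no-cache-dir --timeout 120 -r requirements.txt -w /wheels

# --- Runtime stage: install only the pre-built wheels ---
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
COPY . .
EXPOSE 5000
CMD ["python", "main.py"]
```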
File: `.github/workflows/ci.yml`

This workflow runs on every push and pull request to `main`:
- `test` job
  - Checks out the repository
  - Installs Python 3.11 and project dependencies
  - Runs `python -m compileall ...` as a lightweight syntax and import check
- `docker` job
  - Builds both API and Frontend Docker images using BuildKit
  - Uses GitHub Actions cache for faster subsequent builds
  - Pushes both images to GitHub Container Registry (`ghcr.io`) on successful pushes to `main`:
    - `ghcr.io/<repo-name>-api:latest`
    - `ghcr.io/<repo-name>-frontend:latest`
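A minimal sketch of what the `docker` job described above might look like (job name, action versions, and details are illustrative; consult the actual `.github/workflows/ci.yml` for the real configuration):

```yaml
docker:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    - uses: docker/build-push-action@v6
      with:
        context: ./api          # separate build context per service
        push: ${{ github.ref == 'refs/heads/main' }}
        tags: ghcr.io/${{ github.repository }}-api:latest
        cache-from: type=gha    # reuse the GitHub Actions layer cache
        cache-to: type=gha,mode=max
```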
After a successful build, you can pull and run the images:
```bash
# Login to GitHub Container Registry
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Pull API image
docker pull ghcr.io/<your-username>/<repo-name>-api:latest

# Pull Frontend image
docker pull ghcr.io/<your-username>/<repo-name>-frontend:latest
```

- To push to another registry (Docker Hub, AWS ECR, etc.), update the `REGISTRY`, `IMAGE_NAME`, and login step accordingly.
- Add more quality gates (pytest, linting, etc.) by inserting steps into the `test` job.
- Set additional secrets under the repository settings if you need third-party registry credentials.
- The workflow uses separate build contexts for API and Frontend to optimize build times.
Ask questions in plain English about ocean data:
- "Show salinity profiles near the equator"
- "Find temperature data near Mumbai"
- "Compare ocean data at lat=-43.037, long=130"
- "What's the salinity trend in the Pacific Ocean?"
- Semantic Search: Find relevant data using meaning, not just keywords
- Vector Embeddings: 384-dimensional embeddings for precise matching
- Hybrid Scoring: Combines semantic similarity with geographic proximity
- Natural Language Explanations: AI-generated explanations using Google Gemini
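The hybrid scoring idea above can be sketched in plain NumPy: blend the cosine similarity of the query and profile embeddings with a geographic proximity term. The blend weight `alpha`, the 100 km proximity scale, and the function names are illustrative assumptions, not FloatChat's actual formula:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def hybrid_score(query_vec, profile_vec, query_loc, profile_loc, alpha=0.7):
    """Blend cosine similarity with geographic proximity (illustrative weights)."""
    cos = np.dot(query_vec, profile_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(profile_vec))
    dist = haversine_km(*query_loc, *profile_loc)
    proximity = 1.0 / (1.0 + dist / 100.0)  # decays with distance (100 km scale)
    return alpha * cos + (1 - alpha) * proximity

# Identical embeddings at the same location score the maximum of 1.0
v = np.array([0.1, 0.5, 0.2])
print(round(hybrid_score(v, v, (0.0, 60.0), (0.0, 60.0)), 3))  # → 1.0
```

Ranking candidate profiles by this score returns results that are both semantically relevant and geographically close, which is the behaviour the list above describes.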
- Interactive Maps: Geospatial visualization of ARGO float trajectories
- Depth-Time Heatmaps: Visualize ocean parameters across time and depth
- Profile Comparisons: Side-by-side analysis of different ocean variables
- Real-time Charts: Dynamic Plotly visualizations with hover details
- Automatic Geocoding: Convert place names to coordinates
- Location-Aware Filtering: Filter data by geographic proximity
- Distance Calculations: Find nearest float data to any location
- Coordinate Extraction: Parse lat/long from natural language queries
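Coordinate extraction of the `lat=..., long=...` form can be sketched with a regular expression. The pattern and function name here are illustrative, not the project's actual parser; queries without explicit coordinates would fall through to geocoding:

```python
import re

# Illustrative pattern: matches "lat=<num>, lon=<num>" or "lat=<num>, long=<num>"
COORD_RE = re.compile(
    r"lat\s*=\s*(-?\d+(?:\.\d+)?)\s*,\s*lon(?:g)?\s*=\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)

def extract_coords(query: str):
    """Return (lat, lon) if the query spells out coordinates, else None."""
    m = COORD_RE.search(query)
    return (float(m.group(1)), float(m.group(2))) if m else None

print(extract_coords("Find temperature data at lat=-43.037, long=130"))
# → (-43.037, 130.0)
print(extract_coords("Show me ocean data near Mumbai"))  # → None (geocode instead)
```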
```text
# Example queries you can ask:
"Show salinity profiles near the equator in March 2023"
"Compare temperature in Arabian Sea last 6 months"
"Find temperature data at lat=-43.037, long=130"
"Show me ocean data near Mumbai"
"What's the salinity trend in the Pacific Ocean?"
```

```python
import requests

QUERY_API = "http://127.0.0.1:5000/query"  # as configured in .env

# Query the API
response = requests.post(QUERY_API,
                         json={"query": "Show salinity near the equator"})
data = response.json()

# Access results
for profile in data:
    print(f"Profile ID: {profile['profile_id']}")
    print(f"Location: {profile['lat']}, {profile['lon']}")
    print(f"Time: {profile['time']}")
    print(f"AI Explanation: {profile['query_explain']}")

    # Access depth-level data
    for level in profile['depth_levels']:
        print(f"  Pressure: {level['pres']} dbar, "
              f"Temperature: {level['temp']}°C, "
              f"Salinity: {level['salinity']} PSU")
```

The Streamlit frontend provides multiple pages:
- Home: Welcome page with feature overview
- FloatChat: AI-powered chat interface
- 🗺 Map: Interactive geospatial visualizations
- Profile Comparison: Side-by-side data analysis
- Depth-Time Plots: Time series and heatmap analysis
Update database credentials in the respective files:

```python
# api/main.py, ingestion/main.py
DB_CONFIG = {
    "host": "localhost",
    "database": "floatchatai",
    "user": "your_username",
    "password": "your_password"
}
```

```python
# api/query.py
TOP_K = 1                        # Number of top results to return
RADIUS_METERS = 50_000           # Search radius in meters
MODEL_NAME = 'all-MiniLM-L6-v2'  # Sentence transformer model
```

```python
# frontend/chatbot_ui.py
API_BASE_URL = "http://127.0.0.1:5000"  # Backend API URL (local)
TIMEOUT_SECONDS = 30                    # Request timeout
```

- NetCDF Files: ARGO float data in standard NetCDF format
- Variables: Temperature (TEMP), Salinity (PSAL), Pressure (PRES)
- Metadata: Latitude, Longitude, Time (JULD), Platform Number
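ARGO's JULD variable stores time as days since the 1950-01-01 reference epoch, so converting it to a timestamp is a simple offset. A minimal sketch (the project's ingestion code may rely on xarray's built-in time decoding instead):

```python
from datetime import datetime, timedelta

# ARGO's JULD time variable counts days since this reference epoch
ARGO_EPOCH = datetime(1950, 1, 1)

def juld_to_datetime(juld_days: float) -> datetime:
    """Convert an ARGO JULD value (days since 1950-01-01) to a datetime."""
    return ARGO_EPOCH + timedelta(days=juld_days)

print(juld_to_datetime(1.5))  # midday on the day after the epoch
```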
```json
{
  "profile_id": 123,
  "lat": -43.037,
  "lon": 130.0,
  "time": "2023-03-15 12:00:00",
  "depth_levels": [
    {"pres": 5.0, "temp": 18.5, "salinity": 35.2},
    {"pres": 10.0, "temp": 18.3, "salinity": 35.1}
  ],
  "query_explain": "Based on your query about salinity near the equator, I found oceanographic data from the Indian Ocean. The surface temperature is around 26.3°C, but it gets cooler as you go deeper..."
}
```

The application includes a demo mode that works without a database:

- Start the frontend: `streamlit run frontend/front.py`
- Ask any ocean-related question (if you don't specify a date and time, the current date and time are used)
- The system will provide demo data with realistic ocean profiles
```bash
# Health check
curl http://localhost:5000/

# Query test
curl -X POST http://localhost:5000/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Show salinity near the equator"}'
```

- Follow PEP 8 guidelines
- Use meaningful variable names
- Add docstrings to functions
- Include type hints where possible
- **Collaboration Features** - Share and annotate findings
- ARGO Program - For providing the oceanographic data
- Streamlit - For the amazing web framework
- Flask - For the lightweight API framework
- PostgreSQL - For the robust database system
- Sentence Transformers - For the semantic search capabilities
- Google Gemini - For natural language processing
- Hugging Face - For the pre-trained models
- Plotly - For the interactive visualizations
- Geopy - For geocoding services
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: Contact the maintainer for direct support
This project is licensed under the MIT License - see the LICENSE file for details.
Star ⭐ this repository if you find it helpful!
Here’s what FloatChat looks like in action:


