wchwawa/data-embedding-rag-system

🚀 Project Setup Guide

🌐 Live Demo

👉 Click here to view the live site

🛠️ Tech Stack

TypeScript, Next.js, LangChain, ChatGPT, Supabase, Azure, RSS, Zod

📦 Dependency Setup

A. Automatic Setup

chmod +x setup.sh
./setup.sh

B. Manual Setup

  1. Make sure Node.js is installed.
  2. In the project root, run:
    npm install
  3. Copy .env.example to .env.local and fill in your own API keys.

🔑 Environment Variables

Refer to .env.example and fill in your credentials in .env.local.
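
A missing key usually only surfaces as a failed API call at runtime, so a small startup check can help. The following is a minimal sketch, not code from this repository: the variable names are hypothetical placeholders, and the real list is whatever .env.example contains.

```typescript
// Hypothetical variable names for illustration; use the keys listed in .env.example.
const REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "SUPABASE_URL", "SUPABASE_ANON_KEY"] as const;

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(
  env: Env,
  required: readonly string[] = REQUIRED_ENV_VARS
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js entry point this could be called as `findMissingEnvVars(process.env)`, aborting with a readable message if the returned list is not empty.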


🗄️ Database Setup

  1. Get familiar with Supabase.
  2. Enable the pgvector extension (see the Supabase vector columns guide).
  3. Run the following script in the Supabase SQL editor:
-- Enable pgvector if not already enabled
CREATE EXTENSION IF NOT EXISTS vector;
-- Drop old function if exists
DROP FUNCTION IF EXISTS match_documents(vector, integer, jsonb);
-- Create documents table
CREATE TABLE IF NOT EXISTS documents (
  id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  content text,
  embedding vector(1536),
  created_at timestamp DEFAULT now(),
  metadata jsonb
);
-- Optional: create index for jsonb metadata tag search
CREATE INDEX IF NOT EXISTS idx_metadata_tags ON documents USING gin ((metadata->'tags'));
-- Create match_documents function
CREATE OR REPLACE FUNCTION match_documents (
  query_embedding vector(1536),
  match_count int,
  filter jsonb DEFAULT '{}'::jsonb
)
RETURNS TABLE (
  id uuid,
  title text,
  content text,
  metadata jsonb,
  similarity float
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    docs.id,
    docs.metadata->>'title' AS title,
    docs.content,
    docs.metadata,
    1 - (docs.embedding <=> query_embedding) AS similarity
  FROM documents AS docs
  WHERE
    (
      ((filter - 'tags') = '{}'::jsonb) OR (docs.metadata @> (filter - 'tags'))
    )
    AND
    (
      NOT (filter ? 'tags') OR 
      (
        (docs.metadata ? 'tags') AND
        (jsonb_typeof(docs.metadata->'tags') = 'array') AND
        (docs.metadata->'tags' ?| ARRAY(SELECT jsonb_array_elements_text(filter->'tags')))
      )
    )
  ORDER BY docs.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;
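
The filter argument of match_documents combines two rules: every key other than tags must be contained in the row's metadata, and if tags is present, the row must share at least one tag with it. The TypeScript sketch below mirrors that WHERE clause in memory so the behavior is easier to reason about. It is an illustration, not code from this repository: the names `matchesFilter` and `contains` are made up, `contains` only approximates the jsonb `@>` operator, and filter tags are assumed to be an array of strings.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isObject(value: Json): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Simplified approximation of jsonb containment (left @> right).
function contains(left: Json, right: Json): boolean {
  if (Array.isArray(right)) {
    if (!Array.isArray(left)) return false;
    const leftItems: Json[] = left;
    // Every element on the right must be contained in some element on the left.
    return right.every((r) => leftItems.some((l) => contains(l, r)));
  }
  if (isObject(right)) {
    if (!isObject(left)) return false;
    const l: JsonObject = left;
    const r: JsonObject = right;
    return Object.keys(r).every((key) => key in l && contains(l[key], r[key]));
  }
  return left === right; // scalars must be equal
}

// Mirrors the WHERE clause of match_documents for a single row's metadata.
function matchesFilter(metadata: JsonObject, filter: JsonObject = {}): boolean {
  const { tags, ...rest } = filter;

  // Rule 1: (filter - 'tags') is empty, or metadata @> (filter - 'tags').
  const restOk = Object.keys(rest).length === 0 || contains(metadata, rest);
  if (!restOk) return false;

  // Rule 2: with no 'tags' key in the filter, no tag check applies.
  if (!("tags" in filter)) return true;

  // metadata.tags must exist and be an array.
  const docTags = metadata.tags;
  if (!Array.isArray(docTags)) return false;

  // Assumption: filter.tags is an array of strings. The SQL function would
  // raise an error for a non-array value; this sketch simply matches nothing.
  const wanted = Array.isArray(tags)
    ? tags.filter((t): t is string => typeof t === "string")
    : [];
  return wanted.some((t) => docTags.includes(t));
}
```

For example, with a hypothetical `source` metadata key, a filter of `{ source: "news", tags: ["ai", "finance"] }` matches rows whose metadata has `source` equal to `"news"` and whose tags include `"ai"` or `"finance"`. An empty filter matches every row.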

📥 Data Retrieval & Indexing

  • Configure the news topics you want to fetch in script/process-indexing.ts.
  • For valid parameters, see: Alpha Vantage News Sentiment docs
  • Podcast parameters (e.g., account, topic) are preset and cannot be changed.
  • To fetch and index data, run:
    npm run test ./script/process-indexing.ts
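
As a rough illustration of what a news topic configuration turns into, the sketch below builds an Alpha Vantage News Sentiment request URL. It is not taken from script/process-indexing.ts: the `NewsQuery` shape and `buildNewsSentimentUrl` name are hypothetical, and the query parameters (`function=NEWS_SENTIMENT`, `topics`, `limit`, `sort`, `apikey`) reflect the public Alpha Vantage documentation as I understand it, so check them against the docs linked above.

```typescript
// Hypothetical query shape for illustration only.
interface NewsQuery {
  topics: string[]; // e.g. ["technology", "earnings"]
  apiKey: string;
  limit?: number;
  sort?: "LATEST" | "EARLIEST" | "RELEVANCE";
}

// Builds the request URL; optional parameters are omitted when not set.
function buildNewsSentimentUrl(query: NewsQuery): string {
  const url = new URL("https://www.alphavantage.co/query");
  url.searchParams.set("function", "NEWS_SENTIMENT");
  if (query.topics.length > 0) {
    // Multiple topics are sent as one comma-separated value.
    url.searchParams.set("topics", query.topics.join(","));
  }
  if (query.limit !== undefined) {
    url.searchParams.set("limit", String(query.limit));
  }
  if (query.sort !== undefined) {
    url.searchParams.set("sort", query.sort);
  }
  url.searchParams.set("apikey", query.apiKey);
  return url.toString();
}
```

The resulting string can be passed to `fetch`. Keeping URL construction in one pure function makes topic changes easy to test without calling the API.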

🖥️ Run the Server

npm run dev

🗃️ Database and Types

  • Check types/supabase.ts for the current schema types.
  • After changing the database schema, regenerate types with:
    npm run supabase:generate-types

🏗️ System Structure

diagram
