This repository contains a cutting-edge project exploring the use of Large Language Models (LLMs) to gain a competitive advantage in stock selection. The system is built to automate research and streamline the process of identifying promising investment opportunities by analyzing vast amounts of textual data and market metrics.

The project focuses on building an intelligent Research Automation System capable of:
- Understanding natural language queries to identify relevant stocks (e.g., "What are companies that build data centers?").
- Enabling advanced filtering and search capabilities for all stocks listed on the New York Stock Exchange (NYSE) based on key metrics, including Market Capitalization, Volume, Sector, and more.
- Leveraging state-of-the-art AI technologies to bridge the gap between textual data analysis and actionable investment insights.
- Users can enter complex queries in natural language to identify stocks meeting specific criteria.
- Example: "Show me tech companies with a market capitalization greater than $10 billion."
- Advanced search options for stocks based on:
- Market Capitalization
- Volume
- Industry/Sector
- And more.
- Integrates Large Language Models to analyze sentiment from news articles, reports, and other textual sources, enhancing decision-making.
- Retrieves up-to-date market data using Yahoo Finance (yFinance).
- Uses vector embeddings and similarity search to match user queries with relevant stocks efficiently.
- Streamlit: Interactive and user-friendly web interface.
- Pinecone: Vector database for fast similarity search.
- OpenAI API: Natural Language Processing (NLP) with GPT models.
- Groq API: High-performance AI computing for model execution.
- LangChain: Framework for working with LLMs and embeddings.
- HuggingFace Sentence Transformers: For embedding textual data.
- scikit-learn: To compute cosine similarity between embeddings.
- yFinance: Real-time market data retrieval.
- dotenv: Securely manage environment variables.
- NumPy: Data manipulation and analysis.
- Requests: To handle API requests.
- Python 3.8+
- API keys for OpenAI, Pinecone, and Groq.
-
Clone the repository:
git clone https://github.com/sheicky/stock_analysis_with_LLM.git cd stock_analysis_with_LLM -
Install dependencies:
pip install -r requirements.txt
-
Set up environment variables:
Create a.envfile in the root directory and add the following:OPENAI_API_KEY=<your_openai_api_key> PINECONE_API_KEY=<your_pinecone_api_key> GROQ_API_KEY=<your_groq_api_key>
-
Run the Streamlit application:
streamlit run app.py
check my demo here on youtbe : https://www.youtube.com/watch?v=M9TzqpBcggg
-
Query Processing
- User inputs are processed using OpenAI’s GPT model, converting natural language queries into actionable search commands.
-
Stock Retrieval
- Stocks are filtered using Yahoo Finance data and further refined using vector similarity with Pinecone.
-
Sentiment Analysis
- News articles and reports are embedded using HuggingFace Sentence Transformers, and sentiment scores are computed to aid trading decisions.
- Deep Sentiment Analysis: Integrate advanced LLMs for context-aware sentiment scoring.
- Multi-Market Support: Extend coverage to global stock markets beyond the NYSE.
- Prediction Models: Incorporate time-series forecasting for price and volume trends.