A research-driven project that blends classic ML for large-scale review analysis with a multi-agent LLM layer (LangGraph + OpenAI) to simulate a virtual user-board session. Result: product teams get high-quality, data-grounded insights in hours—not weeks—slashing the time and cost of early-stage user research.
Read the full article here: https://open.substack.com/pub/vvk93/p/from-reviews-to-roadmap-building?r=t37oj&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
```
.
├── Vladimir_Kovtunovskiy/homework2-userboard-simulation/
│   ├── cluster_outputs/            # Output directory for review clustering results
│   │   └── clusters_data.json      # JSON file containing clustered review data and keywords
│   ├── multiagent_outputs/         # Output directory for the user board simulation
│   │   ├── board_session.log       # Detailed log file for the simulation run
│   │   └── userboard_report.md     # Final markdown report summarizing the simulation
│   ├── data_types.py               # Defines shared data structures (Persona, FeatureProposal)
│   ├── persona_generator.py        # Generates user personas from cluster data using an LLM
│   ├── review_prep_pipeline.py     # Processes, cleans, embeds, and clusters user reviews
│   ├── board_simulation.py         # Core logic for simulating the multi-agent discussion
│   ├── userboard_pipeline.py       # Main script orchestrating the pipeline (clustering -> personas -> simulation)
│   ├── requirements.txt            # Python package dependencies
│   ├── spotify_reviews.csv         # Input dataset of Spotify user reviews
│   └── README.md                   # This file
└── ... (other project files/folders)
```
- `spotify_reviews.csv`: The raw input data containing user reviews for Spotify.
- `requirements.txt`: Lists the Python libraries required to run the project.
- `data_types.py`: Contains Python `dataclass` definitions for `Persona` and `FeatureProposal`, ensuring consistent data handling across modules.
- `review_prep_pipeline.py`:
  - Loads reviews from `spotify_reviews.csv`.
  - Cleans and preprocesses the review text.
  - Calculates sentiment scores (positive, neutral, negative).
  - Generates text embeddings using Sentence Transformers (optimized for MPS).
  - Performs dimensionality reduction using UMAP.
  - Clusters the reviews using K-Means, determining the optimal `k`.
  - Extracts relevant keywords for each cluster using TF-IDF.
  - Saves the clustering results, including keywords and sample reviews, to `cluster_outputs/clusters_data.json`.
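To illustrate the TF-IDF keyword step, here is a minimal dependency-free sketch. `top_keywords_per_cluster` is a hypothetical helper written for this README, not the pipeline's actual code, which would typically use scikit-learn:

```python
import math
from collections import Counter

def top_keywords_per_cluster(clusters, top_n=5):
    """Rank words per cluster by TF-IDF, treating each cluster's
    concatenated reviews as one document. A simplified stand-in for
    the pipeline's keyword-extraction step."""
    docs = [" ".join(reviews).lower().split() for reviews in clusters.values()]
    n_docs = len(docs)
    # Document frequency: number of clusters each word appears in.
    df = Counter(w for doc in docs for w in set(doc))
    result = {}
    for name, doc in zip(clusters, docs):
        tf = Counter(doc)
        scores = {w: (c / len(doc)) * math.log(n_docs / df[w]) for w, c in tf.items()}
        result[name] = [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]]
    return result
```

Words that dominate one cluster but are rare elsewhere (e.g. "ads" in an ads-complaint cluster) rank highest, which is why TF-IDF keywords make good cluster labels.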
- `persona_generator.py`:
  - Reads the `clusters_data.json` file.
  - Uses an LLM (e.g., GPT-4) to generate distinct user `Persona` objects based on the characteristics and feedback within selected clusters.
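The LLM call itself requires API access, but the prompt construction can be sketched. `build_persona_prompt`, its wording, and the assumed cluster schema (`keywords` plus `sample_reviews`) are illustrative assumptions, not the module's actual code:

```python
def build_persona_prompt(cluster: dict) -> str:
    """Turn one entry from clusters_data.json into a persona-generation
    prompt. The cluster schema shown here (keywords + sample_reviews)
    is an assumption about the JSON layout."""
    keywords = ", ".join(cluster.get("keywords", []))
    samples = "\n".join(f"- {r}" for r in cluster.get("sample_reviews", []))
    return (
        "You are helping with user research for a music-streaming app.\n"
        f"Cluster keywords: {keywords}\n"
        f"Sample reviews:\n{samples}\n"
        "Create one realistic user persona (name, goals, frustrations) "
        "grounded strictly in this feedback."
    )
```

Grounding the prompt in cluster keywords and verbatim sample reviews is what keeps the generated personas tied to real user feedback rather than LLM stereotypes.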
- `board_simulation.py`:
  - Takes generated `Persona` objects and proposed `FeatureProposal`s as input.
  - Initializes AI agents for each persona, plus a facilitator agent, using LangChain/LangGraph.
  - Simulates a structured discussion over several rounds, where the facilitator asks questions about the features and the personas respond based on their profiles.
  - Captures the entire discussion transcript.
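Stripped of the LangChain/LangGraph machinery, the round structure reduces to a loop like the following sketch, where `respond` stands in for the LLM-backed persona agent:

```python
def run_discussion(personas, features, respond, rounds=2):
    """Simulate facilitator-led rounds. respond(persona, question) is a
    stand-in for the per-persona LLM agent call; the real module wires
    this through LangChain/LangGraph agents instead."""
    transcript = []
    for rnd in range(1, rounds + 1):
        for feature in features:
            question = f"Round {rnd}: What do you think of '{feature}'?"
            transcript.append(("Facilitator", question))
            for persona in personas:
                transcript.append((persona, respond(persona, question)))
    return transcript
```

Injecting `respond` as a parameter also makes the loop testable with a stub before any API keys are involved.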
- `userboard_pipeline.py`:
  - Acts as the main entry point and orchestrator.
  - Sequentially runs persona generation, feature ideation (based on clusters), and the board simulation. (Note: it currently relies on a pre-existing `clusters_data.json` rather than running the review preparation step itself.)
  - Uses LangGraph to manage the state and flow between these steps.
  - Generates a final summary report (`userboard_report.md`) in the `multiagent_outputs` directory.
- `cluster_outputs/`: Directory where the output of `review_prep_pipeline.py` is stored.
- `multiagent_outputs/`: Directory where the outputs of `userboard_pipeline.py` (logs and the final report) are stored.
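Conceptually, the LangGraph orchestration threads a shared state through each step. Ignoring the framework, the flow looks roughly like the sketch below; the step functions are placeholders, not the pipeline's real implementations:

```python
def run_pipeline(state: dict, steps) -> dict:
    """Thread a shared state dict through an ordered list of
    (name, step) pairs, mimicking how LangGraph passes state between
    nodes. Each step returns a partial state update."""
    for name, step in steps:
        state = {**state, **step(state)}
    return state

# Hypothetical step functions mirroring the stages described above:
steps = [
    ("personas", lambda s: {"personas": [f"persona for {c}" for c in s["clusters"]]}),
    ("features", lambda s: {"features": [f"fix {c}" for c in s["clusters"]]}),
    ("report",   lambda s: {"report": f"{len(s['personas'])} personas discussed {len(s['features'])} features"}),
]
```

LangGraph adds conditional edges, retries, and checkpointing on top of this basic state-passing pattern.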
- Clone the repository:

  ```shell
  git clone <your-repo-url>
  cd <your-repo-directory>/board-simulation
  ```

- Create a virtual environment (recommended):

  ```shell
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install dependencies:

  ```shell
  pip install -r requirements.txt
  ```

- Download NLTK data (if not already present). Run Python and execute:

  ```python
  import nltk
  nltk.download('vader_lexicon')
  nltk.download('stopwords')
  ```

- Set up your OpenAI API key. Create a `.env` file in the `BoardSimulation` directory and add:

  ```
  OPENAI_API_KEY='your_openai_api_key_here'
  ```

  Alternatively, set it as an environment variable.
The main pipeline can be executed by running the `userboard_pipeline.py` script:

```shell
python userboard_pipeline.py
```

This script will:

- Load cluster data from `cluster_outputs/clusters_data.json`. (Note: it assumes this file exists; the pipeline does not trigger clustering automatically, so run `review_prep_pipeline.py` separately first if needed.)

  ```shell
  # To generate cluster data (if needed):
  python review_prep_pipeline.py
  ```

- Select top clusters based on negative sentiment.
- Generate feature ideas based on the selected clusters using an LLM.
- Generate user personas based on the selected clusters using an LLM.
- Run the board simulation with the generated personas and features.
- Generate a summary report (`userboard_report.md`) and log file (`board_session.log`) in the `multiagent_outputs` directory.
You can also run the `review_prep_pipeline.py` script independently if you only need to perform the review clustering:

```shell
# Uses default input ./spotify_reviews.csv and output ./cluster_outputs/
python review_prep_pipeline.py

# Specify input/output
python review_prep_pipeline.py --csv path/to/reviews.csv --out path/to/output_dir
```

- AI Agent Simulation: Leverages LLMs to simulate realistic user personas and discussions.
- Data-Driven Personas: Personas are grounded in real user feedback clusters.
- Automated Insight Generation: Streamlines the process of understanding user sentiment and potential feature reception.
- Modular Pipeline: Code is organized into distinct, reusable modules.
- MPS Acceleration: Utilizes Apple Silicon GPUs for faster embedding generation in the clustering pipeline.
- LangGraph Orchestration: Uses LangGraph for managing the multi-step simulation pipeline state.
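The `--csv`/`--out` flags used when running `review_prep_pipeline.py` standalone could be wired with `argparse` roughly as follows; this is a sketch of the interface, not the script's actual code:

```python
import argparse

def parse_args(argv=None):
    """Mirror the CLI shown in the usage section: --csv for the input
    reviews file, --out for the clustering output directory. Defaults
    match the README; the real script's option handling may differ."""
    parser = argparse.ArgumentParser(description="Cluster app-store reviews")
    parser.add_argument("--csv", default="./spotify_reviews.csv",
                        help="Path to the input reviews CSV")
    parser.add_argument("--out", default="./cluster_outputs",
                        help="Directory for clustering results")
    return parser.parse_args(argv)
```

Passing `argv=None` lets `argparse` fall back to `sys.argv`, while tests can supply an explicit list of flags.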