This project automates the process of collecting, processing, and organizing your liked content from X (Twitter) into a structured Knowledge Base.
- Ingestion: Fetches liked tweets via Twitter API (or uses mock data for testing).
- Extraction: Fetches and extracts content from linked URLs (HTML, PDF).
- Analysis: Uses LLM (via LM Studio) to classify content, generate summaries, and extract key takeaways.
- First-pass X classification: `post` vs `article`.
- Post subtype classification: `normal_post`, `comment_or_reply`, `paper_or_project_reco`, `course_teaching`, `release`. `comment_or_reply` posts are ignored.
- Article outputs include title, abstract, and ~5 keywords.
- Paper/project/course/release posts include structured links and metadata when available.
- Coarse content classification is still retained: `paper`, `article` (X Article), `long_blog`, `blog`, `thread`, `comment`, etc.
- Fine AI taxonomy (for `paper`/`article`/`long_blog`/`blog`): e.g. `LLM`, `RAG`, `Agent`, `MLOps`, `Evaluation`.
- Knowledge Base: Generates a Markdown-based knowledge base with:
- Individual item files
- Tag index
- Weekly digests
- Tag Control: Applies a tag allowlist from `keep.txt` (or `TAGS_KEEP_FILE`) to prevent noisy tags.
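The allowlist step can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `load_allowlist`/`filter_tags` helpers are hypothetical names, but the `keep.txt` default and the `TAGS_KEEP_FILE` override match the behavior described above.

```python
import os
from pathlib import Path
from typing import List, Optional, Set

def load_allowlist(path: Optional[str] = None) -> Set[str]:
    """Read one tag per line from keep.txt (or TAGS_KEEP_FILE), skipping blanks and # comments."""
    p = Path(path or os.environ.get("TAGS_KEEP_FILE", "keep.txt"))
    if not p.exists():
        return set()  # no allowlist file -> keep all tags
    return {
        line.strip()
        for line in p.read_text(encoding="utf-8").splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }

def filter_tags(tags: List[str], allow: Set[str]) -> List[str]:
    """Drop tags not on the allowlist; an empty allowlist keeps everything."""
    return tags if not allow else [t for t in tags if t in allow]
```

With `keep.txt` containing `LLM` and `RAG`, a noisy tag list like `["LLM", "RAG", "random-noise"]` is reduced to `["LLM", "RAG"]`.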
- Python 3.8+
- LM Studio running locally (compatible with the OpenAI API).
  - Start the LM Studio server on `http://127.0.0.1:1234`.
  - Load a model (e.g., Mistral, Llama 3).
- Twitter API Bearer Token (optional, for real data).
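A quick way to confirm the LM Studio server is reachable before running the pipeline is to query its OpenAI-compatible model list endpoint. A stdlib-only sketch (the `lm_studio_ready` helper is hypothetical; `/v1/models` is the standard OpenAI-compatible route):

```python
import json
import urllib.request

def lm_studio_ready(base_url: str = "http://127.0.0.1:1234") -> bool:
    """True when the OpenAI-compatible /v1/models endpoint responds with a loaded model."""
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            return bool(json.load(resp).get("data"))
    except (OSError, ValueError):
        return False
```

If this returns `False`, check that the server is started and a model is loaded before running `start.sh`.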
- Clone the repository.
- Set up the environment:
```bash
# Create virtual environment
python3 -m venv xlike
source xlike/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install
```
- Configure environment:
```bash
cp .env.example .env
# Edit .env with your configuration
```
You can use the helper script `start.sh` to activate the environment and run the pipeline automatically.
Run with mock data to test the pipeline end to end; this generates 5 mock items:

```bash
./start.sh --mock --limit 5
```

Ensure `TWITTER_BEARER_TOKEN` is set in `.env` for API mode, or `TWITTER_USERNAME`/`PASSWORD` for Browser mode.
API Mode (fast, but limited metadata):

```bash
./start.sh --api --limit 10
```

Browser Mode (recommended for full classification) uses Playwright to log in and scrape likes, handling thread/blog classification:

```bash
./start.sh --browser --limit 10

# Run in visible mode to debug login
./start.sh --browser --visible --limit 10
```

Use your normal Chrome profile to avoid repeated login challenges.
- Fully quit Chrome first (important, or you may hit the `ProcessSingleton` lock).
- Run with your system Chrome profile path:

```bash
BROWSER_USER_DATA_DIR="$HOME/Library/Application Support/Google/Chrome" \
CHROME_PROFILE_DIR="Default" \
PROXY_URL='' \
HEADLESS=false \
./start.sh --browser --visible --all --since 2026-01-01 --until 2026-03-09
```

If your daily profile is not `Default`, set `CHROME_PROFILE_DIR` to `Profile 1`, `Profile 2`, etc.
If direct profile mode runs into Chrome profile locks or startup errors, use CDP mode instead.
Practical setup tested on 2026-03-10:
```bash
# 1) Prepare a reusable profile clone (one-time or occasional refresh)
mkdir -p /tmp/chrome_cdp_userdata
rsync -a --delete "$HOME/Library/Application Support/Google/Chrome/Default" /tmp/chrome_cdp_userdata/
cp "$HOME/Library/Application Support/Google/Chrome/Local State" /tmp/chrome_cdp_userdata/Local\ State

# 2) Start real Chrome with CDP
open -na "Google Chrome" --args \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome_cdp_userdata \
  --profile-directory=Default

# 3) Run x-likes via CDP for a date range
CHROME_CDP_URL='http://127.0.0.1:9222' PROXY_URL='' HEADLESS=false \
./start.sh --browser --all --since 2026-03-09 --until 2026-03-10
```

Legacy direct-CDP command (may fail on newer Chrome builds when using the default profile dir directly):
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/Library/Application Support/Google/Chrome" \
--profile-directory=Default
CHROME_CDP_URL="http://127.0.0.1:9222" ./start.sh --browser --visible --limit 10- Range tested:
2026-03-09to2026-03-10. - Ingestion + classification + summarization completed successfully in CDP mode.
- 5 items were processed (all matched items were on
2026-03-09). - Post/Article outputs were generated as expected:
article: 1post: 4 (release: 1,normal_post: 3)
- Example outputs:
  - `output/items/2026/03/2026-03-09-gpt-oss-inference-from-scratch-2030893100181401958.md` (article with Abstract/Keywords)
  - `output/items/2026/03/2026-03-09-2031120122619060445.md` (post release with Release Info)
- `--limit N`: Number of items to process.
- `--all`: Try to fetch all likes in the date range.
- `--since YYYY-MM-DD`: Inclusive start date filter.
- `--until YYYY-MM-DD`: Inclusive end date filter.
- `--visible`: Run the browser in visible mode.
- `--sync-obsidian`: After processing, sync markdown reports to Obsidian.
- `--sync-only`: Only sync existing markdown reports to Obsidian (no ingestion/classification/summarization).
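The inclusive `--since`/`--until` semantics can be illustrated with a small filter. This is a sketch (the `in_range` helper is hypothetical), but both bounds are inclusive as the flag descriptions state:

```python
from datetime import date
from typing import Optional

def in_range(day: date, since: Optional[date] = None, until: Optional[date] = None) -> bool:
    """Both bounds are inclusive, matching --since/--until."""
    if since is not None and day < since:
        return False
    if until is not None and day > until:
        return False
    return True
```

So an item liked on `2026-03-10` still matches `--since 2026-03-09 --until 2026-03-10`.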
```bash
./start.sh --browser --all --since 2026-01-01 --until 2026-03-09 --visible
```

- `src/agents`: LLM agents for classification and summarization.
- `src/browser`: Browser automation (Playwright) and HTTP fetching.
- `src/ingest`: Data ingestion (Twitter API, Mock).
- `src/knowledge_base`: Markdown generation logic.
- `src/parser`: Content extraction (Readability, PyPDF).
- `data/`: Raw data storage (optional).
- `output/`: Generated Knowledge Base.
Check the output/ directory for the generated Markdown files.
- `output/items/YYYY/MM/`: Individual Markdown files for each liked item, partitioned by year/month.
- `output/tags/`: Index files for each tag.
- `output/weekly/`: Weekly digest files.
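The year/month partitioning can be sketched as follows. The `item_path` helper is hypothetical; the filename shape (`YYYY-MM-DD[-slug]-<id>.md`) follows the example outputs shown earlier:

```python
from datetime import date
from pathlib import Path
from typing import Optional

def item_path(root: Path, liked_on: date, item_id: str, slug: Optional[str] = None) -> Path:
    """Build output/items/YYYY/MM/YYYY-MM-DD[-slug]-<id>.md."""
    middle = f"-{slug}" if slug else ""
    name = f"{liked_on.isoformat()}{middle}-{item_id}.md"
    return root / "items" / f"{liked_on:%Y}" / f"{liked_on:%m}" / name
```

For example, an article liked on 2026-03-09 with a title slug lands next to slug-less posts from the same day, under `output/items/2026/03/`.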
This project supports syncing generated markdown reports into an Obsidian vault via Obsidian CLI.
- Enable/install Obsidian CLI in Obsidian (1.12+).
- Configure the vault path in `.env`:

```bash
OBSIDIAN_VAULT_PATH=/absolute/path/to/your/vault
OBSIDIAN_XLIKE_ROOT=x-like
OBSIDIAN_SYNC_ITEMS_LIMIT=20
OBSIDIAN_CLI_BIN=obsidian
```
- Run sync:
```bash
# Run the full x-likes pipeline and then sync
./start.sh --browser --limit 10 --sync-obsidian

# Sync only existing output markdown (no new fetch/analyze)
./start.sh --sync-only
```
Sync policy:
- `items`: sync only the most recent N markdown files (`OBSIDIAN_SYNC_ITEMS_LIMIT`), append-only (existing files are skipped).
- `weekly`: sync all weekly markdown files, with overwrite enabled to keep weekly reports up to date.
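The append-only item policy can be approximated like this. A sketch under assumptions: the `items_to_sync` helper and mtime-based recency ordering are illustrative, while `OBSIDIAN_SYNC_ITEMS_LIMIT` and its default of 20 come from the `.env` example above:

```python
import os
from pathlib import Path
from typing import List

def items_to_sync(output_items: Path, vault_items: Path) -> List[Path]:
    """Pick the most recent N item files, then drop any already present in the vault."""
    limit = int(os.environ.get("OBSIDIAN_SYNC_ITEMS_LIMIT", "20"))
    recent = sorted(output_items.rglob("*.md"),
                    key=lambda p: p.stat().st_mtime, reverse=True)[:limit]
    return [p for p in recent if not (vault_items / p.relative_to(output_items)).exists()]
```

Files already in the vault are never rewritten, which is what "append-only" means here; weekly reports go through a separate overwrite path.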
Real vault integration was tested against:
/Users/wonster/Library/Mobile Documents/iCloud~md~obsidian/Documents/Notes
What is working:
- The `x-like/` root folder is created in the vault.
- `weekly` report sync works and supports overwrite.
- Most `items` files sync correctly in append-only mode.
- Sync logic now verifies that target file content was actually written after each CLI `create` call.
- Sync logic now uses command timeouts to prevent indefinite hanging.
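The verify-after-`create` behavior can be sketched as a wrapper like the one below. This is hypothetical (the real sync invokes the Obsidian CLI with its own arguments); it only illustrates combining a subprocess timeout with a post-write content check instead of trusting the CLI exit code alone:

```python
import subprocess
from pathlib import Path
from typing import List

def run_and_verify(cmd: List[str], target: Path, expected: str, timeout: float = 30.0) -> None:
    """Run a sync command with a hard timeout, then confirm the target file was
    actually written with the expected content; raise instead of silently passing."""
    subprocess.run(cmd, check=True, timeout=timeout)  # TimeoutExpired on hang
    if not target.exists() or target.read_text(encoding="utf-8") != expected:
        raise RuntimeError(f"verification failed: {target} not written as expected")
```

A hang raises `subprocess.TimeoutExpired` and a missing or wrong file raises `RuntimeError`, which is the fail-fast behavior described in the known-issues section below.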
Known issue observed on current Obsidian install:
- Obsidian CLI prints an installer warning (`out of date`) and can become unstable.
- In test runs, two target item files repeatedly failed to land in the vault despite prior "created" logs:
  - `items/2026/03/2026-03-09-2030896015830827038.md`
  - `items/2026/03/2026-03-09-gpt-oss-inference-from-scratch-2030893100181401958.md`
- With strict verification enabled, these cases now fail fast with an explicit timeout/error instead of silently passing.
Operational note:
- Before debugging, a vault backup was created at: `/tmp/obsidian-vault-Notes-backup-20260310-153844`