XLikes Agent

This project automates the process of collecting, processing, and organizing your liked content from X (Twitter) into a structured Knowledge Base.

Features

  • Ingestion: Fetches liked tweets via Twitter API (or uses mock data for testing).
  • Extraction: Fetches and extracts content from linked URLs (HTML, PDF).
  • Analysis: Uses an LLM (via LM Studio) to classify content, generate summaries, and extract key takeaways.
    • First-pass X classification: post vs article.
    • Post subtype classification: normal_post, comment_or_reply, paper_or_project_reco, course_teaching, release.
    • comment_or_reply posts are ignored.
    • Article outputs include title, abstract, and ~5 keywords.
    • Paper/project/course/release posts include structured links and metadata when available.
    • Coarse content classification is still retained: paper, article (X Article), long_blog, blog, thread, comment, etc.
    • Fine AI taxonomy (for paper/article/long_blog/blog): e.g. LLM, RAG, Agent, MLOps, Evaluation.
  • Knowledge Base: Generates a Markdown-based knowledge base with:
    • Individual item files
    • Tag index
    • Weekly digests
  • Tag Control: Applies a tag allowlist from keep.txt (or TAGS_KEEP_FILE) to prevent noisy tags.
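
The allowlist behaviour can be sketched as follows. The keep.txt contents below are hypothetical (tag names borrowed from the fine AI taxonomy above), and filter_tags is an illustrative sketch, not the project's actual implementation:

```shell
# Hypothetical keep.txt allowlist; tags not listed here are dropped.
cat > keep.txt <<'EOF'
LLM
RAG
Agent
MLOps
Evaluation
EOF

filter_tags() {  # echo only the tags that appear verbatim in keep.txt
  for tag in "$@"; do
    grep -qx "$tag" keep.txt && echo "$tag"
  done
  return 0
}
```

For example, `filter_tags LLM SomeNoiseTag RAG` would emit only `LLM` and `RAG`.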

Prerequisites

  1. Python 3.8+
  2. LM Studio running locally (serving an OpenAI-compatible API).
    • Start LM Studio server on http://127.0.0.1:1234.
    • Load a model (e.g., Mistral, Llama 3).
  3. Twitter API Bearer Token (Optional, for real data).

Installation

  1. Clone the repository.
  2. Set up the environment:
    # Create virtual environment
    python3 -m venv xlike
    source xlike/bin/activate
    
    # Install dependencies
    pip install -r requirements.txt
    
    # Install Playwright browsers
    playwright install
  3. Configure environment:
    cp .env.example .env
    # Edit .env with your configuration
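
Taken together, the environment variables referenced later in this README can be sketched as a .env file (the key names all appear in the sections below; the values are placeholders, and .env.example remains the authoritative list):

```shell
# Sketch of a .env; values are placeholders, key names are documented below.
TWITTER_BEARER_TOKEN=your-bearer-token
# Browser mode credentials
TWITTER_USERNAME=your-handle
TWITTER_PASSWORD=your-password
# Tag allowlist file
TAGS_KEEP_FILE=keep.txt
# Obsidian sync
OBSIDIAN_VAULT_PATH=/absolute/path/to/your/vault
OBSIDIAN_XLIKE_ROOT=x-like
OBSIDIAN_SYNC_ITEMS_LIMIT=20
OBSIDIAN_CLI_BIN=obsidian
```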

Usage

You can use the helper script start.sh to activate the virtual environment and run the pipeline automatically.

Run with Mock Data (Default)

This will generate 5 mock items to test the pipeline.

./start.sh --mock --limit 5

Run with Real Twitter Data

Ensure TWITTER_BEARER_TOKEN is set in .env for API mode, or TWITTER_USERNAME/TWITTER_PASSWORD for Browser mode.
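
A quick pre-flight check along those lines can be sketched as (variable names from above; the pipeline's actual validation may differ):

```shell
# Sketch: infer which mode is configured from the documented env vars.
creds_mode() {
  if [ -n "${TWITTER_BEARER_TOKEN:-}" ]; then
    echo "api"
  elif [ -n "${TWITTER_USERNAME:-}" ] && [ -n "${TWITTER_PASSWORD:-}" ]; then
    echo "browser"
  else
    echo "none"
  fi
}
```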

API Mode (Fast, but limited metadata):

./start.sh --api --limit 10

Browser Mode (Recommended for full classification): Uses Playwright to log in and scrape likes, enabling thread/blog classification.

./start.sh --browser --limit 10
# Run in visible mode to debug login
./start.sh --browser --visible --limit 10

Run Directly With Real Chrome Profile

Use your normal Chrome profile to avoid repeated login challenges.

  1. Fully quit Chrome first (important: otherwise you may hit Chrome's ProcessSingleton profile lock).
  2. Run with system Chrome profile path:
BROWSER_USER_DATA_DIR="$HOME/Library/Application Support/Google/Chrome" \
CHROME_PROFILE_DIR="Default" \
PROXY_URL='' \
HEADLESS=false \
./start.sh --browser --visible --all --since 2026-01-01 --until 2026-03-09

If your daily profile is not Default, set CHROME_PROFILE_DIR to Profile 1, Profile 2, etc.

Optional: CDP Mode

If direct profile mode runs into Chrome profile locks or startup errors, use CDP mode instead.

Practical setup tested on 2026-03-10:

# 1) Prepare a reusable profile clone (one-time or occasional refresh)
mkdir -p /tmp/chrome_cdp_userdata
rsync -a --delete "$HOME/Library/Application Support/Google/Chrome/Default" /tmp/chrome_cdp_userdata/
cp "$HOME/Library/Application Support/Google/Chrome/Local State" /tmp/chrome_cdp_userdata/Local\ State

# 2) Start real Chrome with CDP
open -na "Google Chrome" --args \
  --remote-debugging-port=9222 \
  --user-data-dir=/tmp/chrome_cdp_userdata \
  --profile-directory=Default

# 3) Run x-likes via CDP for a date range
CHROME_CDP_URL='http://127.0.0.1:9222' PROXY_URL='' HEADLESS=false \
./start.sh --browser --all --since 2026-03-09 --until 2026-03-10

Legacy direct-CDP command (may fail on newer Chrome builds when using default profile dir directly):

"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
  --remote-debugging-port=9222 \
  --user-data-dir="$HOME/Library/Application Support/Google/Chrome" \
  --profile-directory=Default

CHROME_CDP_URL="http://127.0.0.1:9222" ./start.sh --browser --visible --limit 10

Current Run Status (2026-03-10, Post/Article validation)

  • Range tested: 2026-03-09 to 2026-03-10.
  • Ingestion + classification + summarization completed successfully in CDP mode.
  • 5 items were processed (all matched items were on 2026-03-09).
  • Post/Article outputs were generated as expected:
    • article: 1
    • post: 4 (release: 1, normal_post: 3)
  • Example outputs:
    • output/items/2026/03/2026-03-09-gpt-oss-inference-from-scratch-2030893100181401958.md (article with Abstract/Keywords)
    • output/items/2026/03/2026-03-09-2031120122619060445.md (post release with Release Info)

Options

  • --limit N: Number of items to process.
  • --all: Try to fetch all likes in the date range.
  • --since YYYY-MM-DD: Inclusive start date filter.
  • --until YYYY-MM-DD: Inclusive end date filter.
  • --visible: Run browser in visible mode.
  • --sync-obsidian: After processing, sync markdown reports to Obsidian.
  • --sync-only: Sync existing markdown reports to Obsidian only (no ingestion/classification/summarization).

Example: Jan 1 to Mar 9

./start.sh --browser --all --since 2026-01-01 --until 2026-03-09 --visible

Project Structure

  • src/agents: LLM agents for classification and summarization.
  • src/browser: Browser automation (Playwright) and HTTP fetching.
  • src/ingest: Data ingestion (Twitter API, Mock).
  • src/knowledge_base: Markdown generation logic.
  • src/parser: Content extraction (Readability, PyPDF).
  • data/: Raw data storage (optional).
  • output/: Generated Knowledge Base.

Output

Check the output/ directory for the generated Markdown files.

  • output/items/YYYY/MM/: Individual Markdown files for each liked item, partitioned by year/month.
  • output/tags/: Index files for each tag.
  • output/weekly/: Weekly digest files.
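
The year/month partitioning above can be sketched as a small path helper. The `<date>-<slug>.md` file name mirrors the example outputs earlier in this README, but the slug format is illustrative:

```shell
# Sketch: derive an item's output path from its date and slug.
item_path() {  # usage: item_path YYYY-MM-DD SLUG
  d=$1; slug=$2
  year=${d%%-*}                              # "2026-03-09" -> "2026"
  month=$(printf '%s' "$d" | cut -d- -f2)    # "2026-03-09" -> "03"
  echo "output/items/$year/$month/$d-$slug.md"
}
```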

Obsidian Sync (Obsidian CLI)

This project supports syncing generated markdown reports into an Obsidian vault via Obsidian CLI.

  1. Enable/install Obsidian CLI in Obsidian (1.12+).
  2. Configure vault path in .env:
    OBSIDIAN_VAULT_PATH=/absolute/path/to/your/vault
    OBSIDIAN_XLIKE_ROOT=x-like
    OBSIDIAN_SYNC_ITEMS_LIMIT=20
    OBSIDIAN_CLI_BIN=obsidian
  3. Run sync:
    # Run full x-likes pipeline and then sync
    ./start.sh --browser --limit 10 --sync-obsidian
    
    # Sync only existing output markdown (no new fetch/analyze)
    ./start.sh --sync-only

Sync policy:

  • items: sync only most recent N markdown files (OBSIDIAN_SYNC_ITEMS_LIMIT), append-only (existing files are skipped).
  • weekly: sync all weekly markdown files, overwrite enabled to keep weekly reports up to date.
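
The items policy can be sketched under these assumptions: plain `cp` stands in for the Obsidian CLI create call, mtime decides "most recent", and paths contain no spaces. This is a sketch of the policy, not the project's implementation:

```shell
# Sketch: copy the N most recently modified item files into the vault,
# skipping any that already exist there (append-only).
sync_items() {  # usage: sync_items SRC_DIR DST_DIR LIMIT
  src=$1; dst=$2; limit=$3
  mkdir -p "$dst"
  find "$src" -name '*.md' | xargs ls -t | head -n "$limit" |
  while read -r f; do
    rel=${f#"$src"/}
    mkdir -p "$dst/$(dirname "$rel")"
    [ -e "$dst/$rel" ] || cp "$f" "$dst/$rel"  # skip existing targets
  done
}
```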

Current Status (2026-03-10)

Real vault integration was tested against:

  • /Users/wonster/Library/Mobile Documents/iCloud~md~obsidian/Documents/Notes

What is working:

  • x-like/ root folder is created in vault.
  • weekly report sync works and supports overwrite.
  • Most items files sync correctly in append-only mode.
  • Sync logic now verifies that target file content was actually written after each CLI create call.
  • Sync logic now uses command timeouts to prevent indefinite hanging.
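
The verify-after-write pattern with a timeout, as described in the last two points, can be sketched like this. `timeout` is GNU coreutils (absent on stock macOS), and the wrapped command here is a stand-in for the Obsidian CLI create call:

```shell
# Sketch: run a create command under a timeout, then confirm the target
# file actually has content before reporting success.
create_verified() {  # usage: create_verified TARGET_FILE CMD [ARGS...]
  target=$1; shift
  if ! timeout 30 "$@"; then
    echo "create command failed or timed out" >&2
    return 1
  fi
  if [ ! -s "$target" ]; then
    echo "target was not written: $target" >&2
    return 1
  fi
}
```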

Known issue observed on current Obsidian install:

  • Obsidian CLI prints an installer out-of-date warning and can become unstable.
  • In test runs, two target item files repeatedly failed to land in vault despite prior "created" logs:
    • items/2026/03/2026-03-09-2030896015830827038.md
    • items/2026/03/2026-03-09-gpt-oss-inference-from-scratch-2030893100181401958.md
  • With strict verification enabled, these cases now fail fast with explicit timeout/error instead of silently passing.

Operational note:

  • Before debugging, a vault backup was created at:
    • /tmp/obsidian-vault-Notes-backup-20260310-153844
