Merged
110 changes: 110 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -0,0 +1,110 @@
name: Bug report
description: Something in bytebell-server or the bytebell CLI is broken or behaving unexpectedly.
title: "[bug] "
labels: ["bug", "needs-triage"]
body:
  - type: markdown
    attributes:
      value: |
        Thanks for taking the time to file a bug! A few quick tips:
        - Please redact API keys, tokens, and private repo URLs from anything you paste.
        - Server and CLI logs live in `~/.bytebell/logs/`.
        - Only the first four fields are required — fill in the rest if you have it handy.

  - type: textarea
    id: summary
    attributes:
      label: What happened?
      description: A short description of the bug. What did you do, and what went wrong?
      placeholder: I ran `bytebell index https://github.com/...` and the worker stalled at PROCESSING.
    validations:
      required: true

  - type: textarea
    id: repro
    attributes:
      label: Steps to reproduce
      description: The minimum sequence of commands or clicks needed to see the bug.
      placeholder: |
        1. `bytebell boot`
        2. `bytebell index <repo-url>`
        3. Wait ~2 minutes
    validations:
      required: true

  - type: textarea
    id: expected
    attributes:
      label: What did you expect to happen?
    validations:
      required: true

  - type: input
    id: version
    attributes:
      label: Bytebell version
      description: Output of `bytebell --version`.
      placeholder: e.g. 0.4.2
    validations:
      required: true

  - type: dropdown
    id: component
    attributes:
      label: Which part of Bytebell is affected?
      description: Pick whatever feels closest — "Not sure" is fine.
      multiple: true
      options:
        - "Not sure"
        - "bytebell-server (HTTP / MCP / workers)"
        - "bytebell CLI / TUI"
        - "Ingestion (@bb/ingest-github)"
        - "MCP surface (@bb/mcp)"
        - "Adapter (@bb/mongo / @bb/neo4j / @bb/redis)"
        - "LLM layer (@bb/llm)"
        - "Config / first-run setup (@bb/config)"
        - "Docs / README"

  - type: dropdown
    id: llm
    attributes:
      label: LLM provider
      options:
        - "Not applicable / don't know"
        - "OpenRouter"
        - "Ollama (local)"

  - type: input
    id: os
    attributes:
      label: OS and architecture
      placeholder: e.g. macOS 14.5 arm64, Ubuntu 22.04 x86_64

  - type: input
    id: bun
    attributes:
      label: Bun version
      description: Output of `bun --version`.

  - type: textarea
    id: logs
    attributes:
      label: Logs or error output
      description: Paste relevant lines from `~/.bytebell/logs/server-YYYY-MM-DD.log` or `cli-YYYY-MM-DD.log`. Redact secrets.
      render: shell

  - type: textarea
    id: context
    attributes:
      label: Anything else?
      description: Repo you were ingesting, recent config changes, screenshots — whatever might help.

  - type: checkboxes
    id: preflight
    attributes:
      label: Before you submit
      options:
        - label: I searched existing issues and didn't find a duplicate.
          required: true
        - label: I've redacted any secrets from the information above.
          required: true
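The form asks reporters to redact secrets before pasting log excerpts. A minimal TypeScript sketch of a pre-paste scrubber follows. `redactSecrets` is hypothetical (it is not a bytebell CLI command), and the formats it matches (`sk-…` style API keys, GitHub `ghp_`/`github_pat_` tokens, `Bearer` headers, `user:token@` credentials inside URLs) are assumptions, so treat it as a starting point rather than a guarantee that nothing sensitive remains.

```typescript
// Hypothetical pre-paste scrubber for excerpts from ~/.bytebell/logs/.
// The patterns are assumptions about common secret formats; extend as needed.
const SECRET_PATTERNS: ReadonlyArray<[RegExp, string]> = [
  // Authorization headers first, so the whole credential is dropped at once.
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, "Bearer [REDACTED]"],
  // API keys in the common "sk-..." style.
  [/\bsk-[A-Za-z0-9_-]{16,}/g, "sk-[REDACTED]"],
  // GitHub classic ("ghp_") and fine-grained ("github_pat_") tokens.
  [/\b(?:ghp|github_pat)_[A-Za-z0-9_]{20,}/g, "[REDACTED_GITHUB_TOKEN]"],
  // user:password@ credentials embedded in URLs; host and path are kept.
  [/(https?:\/\/)[^\s\/@:]+:[^\s\/@]+@/g, "$1[REDACTED]@"],
];

function redactSecrets(text: string): string {
  let out = text;
  for (const [pattern, replacement] of SECRET_PATTERNS) {
    out = out.replace(pattern, replacement);
  }
  return out;
}

const sample = [
  "Authorization: Bearer abc.def-123",
  "key=sk-or-v1-0123456789abcdef0123",
  "clone https://alice:s3cret@github.com/acme/private.git",
].join("\n");
console.log(redactSecrets(sample));
```

The sketch keeps repository hosts and paths intact, so private repository names still need a manual check before pasting.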
11 changes: 11 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,11 @@
blank_issues_enabled: false
contact_links:
  - name: Security vulnerability
    url: https://github.com/ByteBell/bytebell-oss/blob/main/SECURITY.md
    about: Please do NOT open a public issue. Email team@bytebell.ai — see SECURITY.md for details and PGP info.
  - name: Question, help, or discussion
    url: https://github.com/ByteBell/bytebell-oss/discussions
    about: For "how do I…", design discussions, or sharing what you built — use Discussions, not issues.
  - name: Read the contributing guide
    url: https://github.com/ByteBell/bytebell-oss/blob/main/contributing.md
    about: New here? The contributing guide explains the architecture, package layout, and how to get a local dev loop running.
46 changes: 46 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.yml
@@ -0,0 +1,46 @@
name: Feature request
description: Suggest a new capability or improvement for Bytebell.
title: "[feat] "
labels: ["enhancement", "needs-triage"]
body:
  - type: markdown
    attributes:
      value: |
        Have an idea? Great — describe the problem first, then your proposal. Don't worry
        about getting the architecture right; maintainers will help shape it during triage.

  - type: textarea
    id: problem
    attributes:
      label: What problem are you trying to solve?
      description: Be concrete — a real workflow, a repo you couldn't ingest, a query you couldn't run.
    validations:
      required: true

  - type: textarea
    id: proposal
    attributes:
      label: What would you like to see?
      description: Your proposed solution. Rough sketches are fine.
    validations:
      required: true

  - type: textarea
    id: alternatives
    attributes:
      label: Alternatives you considered
      description: Workarounds you tried, other tools, or different shapes for the same idea.

  - type: textarea
    id: context
    attributes:
      label: Additional context
      description: Links, screenshots, prior art in other projects, related issues.

  - type: checkboxes
    id: preflight
    attributes:
      label: Before you submit
      options:
        - label: I searched existing issues and discussions and didn't find a duplicate.
          required: true
62 changes: 0 additions & 62 deletions README.md
@@ -1,67 +1,5 @@
# Bytebell [bytebell.ai]

## What is this and why does it exist

If you've ever worked on a codebase that spans multiple repositories, you already know the pain. You open up your copilot or your coding agent, you ask it something like "how does this authentication service in Repo A affect the user management flow in Repo B" and it just goes blank.

It either hallucinates an answer or tells you it doesn't have enough context. And the thing is, it's not the model's fault. The model is smart enough. The problem is that no tool in the current ecosystem gives it the right context to work with.

Every code intelligence tool today, whether it's vector search based like **claude-context** and **Cody**, or AST based like **Serena** and **code-graph-mcp**, or even production-grade tools like **Sourcegraph** with **SCIP** indexing, does fundamentally the same thing.

They read your code structurally. They parse syntax trees, they map function calls, they build embedding vectors. All of that is useful to some degree, but it completely misses the question that actually matters: what is this code for? Not what it looks like syntactically, not what functions it calls, but what its purpose is, what business logic it encodes, why it exists in the first place, and how it connects to code in entirely different repositories written by entirely different teams.

**ByteBell** takes a fundamentally different approach. Instead of parsing structure and hoping the model figures out meaning at query time, we run an LLM once at index time across your entire codebase. The LLM reads every file and extracts its **semantic purpose**, its **business role**, its **cross-repo dependencies**, what it does and why it does it. All of that understanding gets stored in a **persistent semantic graph** that lives across sessions, across models, across every copilot and agent you use. You pay the LLM cost once during indexing and then every tool in your stack benefits from that understanding forever.

The key insight here is that every other tool in this space persists an index. ByteBell persists meaning. This is an architectural difference that changes everything downstream.

## How it actually works

When you point ByteBell at your repositories, it does a one-time semantic indexing pass. For every file, it sends the code to an LLM with a carefully designed prompt that asks it to extract several things: what the file does in plain English, what business domain it belongs to, how it relates to other files in this repo and in other repos, what would break if you changed it, and what the intent is behind the key functions and classes.
All of these semantic annotations get stored in a graph database where the nodes are files, functions, and concepts, and the edges are semantic relationships like "this service depends on that authentication module for JWT validation" rather than just "this file imports that file." The difference is massive because when a model later queries this graph it gets back actual understanding, not just a list of files that look syntactically similar to the query.
The graph is persistent and shared. It doesn't disappear when your session ends. It doesn't get rebuilt every time you switch from one copilot to another. Whether you're using Claude, or GPT, or an open source model, or switching between Cursor and Windsurf and Claude Code, they all read from the same semantic graph. Your codebase understanding is decoupled from any single tool or model.
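To make the distinction between "imports" edges and semantic edges concrete, here is a TypeScript sketch of how a per-file annotation could be turned into reason-carrying edges. Every type and field name (`FileAnnotation`, `SemanticEdge`, `edgesFromAnnotation`) is hypothetical, modeled only on the questions the indexing prompt is described as asking; it is not ByteBell's actual schema. Node ids are assumed to be `repo:path` strings so that cross-repo edges can be detected.

```typescript
// Hypothetical shapes, mirroring the questions the indexing prompt asks.
type NodeKind = "file" | "function" | "concept";

interface GraphNode {
  id: string; // assumed "repo:path" form, e.g. "auth-svc:src/jwt.ts"
  kind: NodeKind;
  repo: string;
}

interface FileAnnotation {
  node: GraphNode;
  purpose: string; // what the file does, in plain English
  businessDomain: string; // which business domain it belongs to
  dependsOn: { target: string; reason: string }[]; // targets in any repo, with the why
  breakageRisk: string; // what would break if it changed
}

// A semantic edge records *why* two nodes are related, not just that they are.
interface SemanticEdge {
  from: string;
  to: string;
  reason: string;
  crossRepo: boolean;
}

function repoOf(id: string): string {
  const i = id.indexOf(":");
  return i === -1 ? id : id.slice(0, i);
}

function edgesFromAnnotation(a: FileAnnotation): SemanticEdge[] {
  return a.dependsOn.map(({ target, reason }) => ({
    from: a.node.id,
    to: target,
    reason,
    crossRepo: repoOf(target) !== a.node.repo,
  }));
}

const annotation: FileAnnotation = {
  node: { id: "user-svc:src/users/service.ts", kind: "file", repo: "user-svc" },
  purpose: "Creates and updates user accounts",
  businessDomain: "accounts",
  dependsOn: [
    { target: "auth-svc:src/jwt.ts", reason: "JWT validation for user sessions" },
    { target: "user-svc:src/users/model.ts", reason: "persists user records" },
  ],
  breakageRisk: "login and profile updates",
};
console.log(edgesFromAnnotation(annotation));
```

Storing the reason on the edge is what lets a later query answer "why does Repo B depend on Repo A" instead of only "which files import which".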

## The cost problem and how we solved it

The obvious objection to using LLMs for indexing is cost. If you're running Claude Opus on every file in a large codebase, you'll burn through thousands of dollars before you even start working. So we did something that nobody else has done properly: we ran a systematic benchmark across 14 different models to find the sweet spot between accuracy and cost.
We tested on 30 Kubernetes ecosystem files with roughly 33,800 average input tokens per file and 3,200 output tokens per file. We scored each model across 7 categories including search accuracy, graph quality, semantic understanding, cross-repo integration, section mapping, business context extraction, and JSON formatting. Any model that scored below 70 points was dropped as unusable regardless of how cheap it was.
The results were surprising.
DeepSeek V4 Flash can index 1,000 files for just $7.01 with an accuracy score of 71.13. That's roughly 100x cheaper than Claude Opus 4.7 at $752.70 and only about 2.3 points behind it in quality. GLM 5.1 sits in a nice balanced spot at $23.24 with 72.22 accuracy. Claude Sonnet 4.6 is the premium quality option at $149.40 with the highest accuracy at 73.56, if you need the absolute best analysis and don't mind paying for it.

Models like GPT 5.4 scored 55.65 which is just completely unusable, and Step 3.5 Flash came in at 69.71 which is cheap but falls below our quality floor.

So the default recommendation is DeepSeek V4 Flash for most use cases, because it gives you production-quality semantic indexing at a cost that's genuinely negligible even for very large codebases. You can always run a premium model like Sonnet on your most critical repositories if you want those extra 2 points of accuracy.
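The per-1,000-file figures above scale linearly with per-file token counts and per-token prices. A minimal TypeScript sketch of that arithmetic, using the benchmark's averages of 33,800 input and 3,200 output tokens per file; the prices in the example are placeholders, not the rates behind the published numbers, and the estimate ignores retries, prompt caching, and failed calls.

```typescript
// Back-of-the-envelope indexing cost. Prices are USD per million tokens;
// plug in the current rates for whichever model you choose.
interface ModelPricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

function indexingCostUSD(
  files: number,
  avgInputTokens: number,
  avgOutputTokens: number,
  price: ModelPricing,
): number {
  const perFile =
    (avgInputTokens * price.inputPerMTok + avgOutputTokens * price.outputPerMTok) / 1_000_000;
  return files * perFile;
}

// Placeholder prices, chosen only to illustrate the formula.
const placeholderPrice: ModelPricing = { inputPerMTok: 0.1, outputPerMTok: 0.4 };
console.log(indexingCostUSD(1_000, 33_800, 3_200, placeholderPrice).toFixed(2)); // 4.66
```

Because input tokens outnumber output tokens roughly 10 to 1 in this workload, the input price tends to dominate the total unless a model's output price is much higher.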

## What we tested and what we found

We ran ByteBell on the SWE-bench Verified benchmark, specifically on Astropy and OpenTelemetry, which are the two most important repositories in that dataset, roughly 8 GB of code combined. We compared task performance with ByteBell's semantic context layer (MCP) versus raw model performance without it.

The per-task results across 55 tasks show something that seems counterintuitive at first. ByteBell feeds the model 22 less context, the average cost per task is actually lower with ByteBell at $0.52 versus $0.73 without it.
The reason is simple: when the model has real semantic understanding of the codebase, it stops wasting tokens exploring dead ends, searching through irrelevant files, and guessing at relationships. It knows exactly where to look and what things mean. The result is 60% less cost and 80% faster responses with the same or better accuracy.

But the really important finding was about cross-repository performance. We tested on 150,000+ files across 46 Kubernetes ecosystem repositories, and in cross-repo scenarios, even SOTA models with full prompt caching and claude.md configurations don't just perform poorly, they fail to complete the task entirely. They literally cannot finish. The model runs out of context, gets confused about which repo it's looking at, hallucinates connections that don't exist, and eventually gives up or produces garbage. ByteBell's persistent semantic graph is the only approach we've seen that gives the model enough cross-repo understanding to actually solve these problems end to end.

## How ByteBell compares to existing tools

We did a detailed comparison against the major tools in this space and the differences are architectural, not incremental.
Vector search tools like claude-context and Cody embed your code into vector space and retrieve chunks that are semantically similar to your query. This works okay for simple "find code that looks like this" queries, but it fundamentally treats code like English text, which it is not.

Code is logic with complex dependency chains, side effects, and implicit contracts that don't show up in embedding similarity. These tools also don't persist understanding across sessions and don't share context across different copilots.

AST and LSP based tools like **Serena** and **code-graph-mcp** parse your code into abstract syntax trees and use language server protocols to map structural relationships. They know what calls what and what imports what but they have zero understanding of business intent. They can tell you that function A calls function B but they cannot tell you why that call exists or what business rule it implements. They also work within a single repository boundary and have no concept of cross-repo semantic connections.

**GitNexus** builds a static AST graph, which is essentially a more sophisticated version of the AST approach. It maps out structural relationships across your codebase in a graph format, which is useful, but again, it's purely structural. It knows syntax, not semantics. And it doesn't persist any understanding across sessions.

**Graphify** combines AST parsing with multimodal analysis, so it tries to understand code through multiple representations beyond just the syntax tree. It's a step in the right direction, but it's still fundamentally building a structural graph enriched with pattern matching rather than extracting actual semantic intent. No cross-repo graph, no persistent meaning across sessions.

**Sourcegraph** with **SCIP** indexing is probably the most production-grade tool in this list, and it does excellent structural code intelligence at scale. But SCIP is a structural indexing format: it gives you precise code navigation and cross-references, but not semantic understanding. It also only partially supports cross-repo connections and doesn't share context across different copilots and agents.
Augment is a cloud SaaS approach that does provide some cost reduction per query, but it doesn't persist meaning across sessions, doesn't share across copilots, doesn't build a cross-repo semantic graph, and, critically, it cannot run on-prem or air-gapped, which is a dealbreaker for many enterprises.

**ByteBell** is the only tool that checks every box. Persistent semantic understanding across sessions, shared across every copilot and agent, one graph that works with every model, cross-repo semantic connections, business context per commit, fully on-prem and air-gapped capable, 80%+ cost reduction per query, and 20 to 40% accuracy improvement on cross-repo tasks.

Based on what we've seen so far, we believe that open source models with access to ByteBell's semantic context layer can improve their performance by at least 10%-40% compared to current SOTA models running without it. The early results already point strongly in that direction but we need to prove it across the full dataset to make that claim definitively.

If you can sponsor API credits on OpenRouter, OpenAI, or Anthropic, or if you know someone who can, please reach out. Every dollar goes directly into running benchmarks on the complete SWE-bench Verified dataset and we will publish all results openly. This is an open source project and the benchmark results will be open too.

## Quickstart

> Looking for the full CLI reference? Every `bytebell` subcommand, flag, and option lives in **[commands.md](commands.md)**. The Quickstart below is the minimum sequence from zero to a queryable graph.
8 changes: 4 additions & 4 deletions infra/docker/docker-compose.yml
@@ -4,7 +4,7 @@ services:
     container_name: bytebell-mongo
     restart: unless-stopped
     ports:
-      - "127.0.0.1:27117:27017"
+      - "127.0.0.1:${MONGO_HOST_PORT:-27017}:27017"
     volumes:
       - mongo_data:/data/db
     environment:
@@ -25,8 +25,8 @@ services:
     container_name: bytebell-neo4j
     restart: unless-stopped
     ports:
-      - "127.0.0.1:7474:7474"
-      - "127.0.0.1:7787:7687"
+      - "127.0.0.1:${NEO4J_HTTP_HOST_PORT:-7474}:7474"
+      - "127.0.0.1:${NEO4J_BOLT_HOST_PORT:-7687}:7687"
     volumes:
       - neo4j_data:/data
     environment:
@@ -47,7 +47,7 @@ services:
     container_name: bytebell-redis
     restart: unless-stopped
     ports:
-      - "127.0.0.1:6479:6379"
+      - "127.0.0.1:${REDIS_HOST_PORT:-6379}:6379"
     volumes:
       - redis_data:/data
     command: ["redis-server", "--appendonly", "yes"]
24 changes: 19 additions & 5 deletions packages/cli/README.md
@@ -36,16 +36,30 @@ indexing, configuration, server lifecycle, and inspection.
   matching `bytebell set …` hint). Auto-fills blank infra config keys
   with local-docker defaults (mongo / neo4j / neo4j-user / redis) and
   generates a random Neo4j password if one isn't already set. Writes
-  `infra/docker/.env`, runs `docker compose -f
+  `infra/docker/.env` (Neo4j password + host ports derived from the
+  configured URIs), runs `docker compose -f
   infra/docker/docker-compose.yml up -d`, polls
   `docker compose ps --format json` until all three services report
   `healthy`, then invokes `ensureServerRunning()` (existing helper) to
   spawn `bytebell-server`. Idempotent — re-running on an already-up
-  stack is a fast no-op.
+  stack is a fast no-op. When a compose host port is already taken,
+  boot drops into an Ink picker (`PortConflictSelector.tsx`) offering
+  three choices: reuse the existing service on that port (compose
+  starts only the unconflicted services), stop the conflicting
+  container and reuse the port, or change bytebell's host port for
+  the affected service (mongo / neo4j-bolt / redis URI gets rewritten
+  via `setConfigValue`, compose env is regenerated, retry). Up to
+  four conflict rounds before giving up.
 - `bytebell shutdown` — sends SIGTERM to the server PID, polls until
-  the PID file vanishes (≤ 30 s), and prints the `docker compose down`
-  hint. Docker infra is **left running** by design — warm re-boots are
-  fast.
+  the PID file vanishes (≤ 30 s), then asks (Ink prompt
+  `StopInfraPrompt.tsx`) whether to stop Docker infra too. Default
+  answer is **Yes** (Enter tears down `mongo + neo4j + redis` via
+  `docker compose down --remove-orphans`); pressing `n` / Esc keeps the
+  containers running for fast warm re-boots and prints the manual
+  `docker compose down` hint. The prompt is skipped when stdin isn't a
+  TTY (CI-safe — falls back to keeping infra up). Two flags override
+  the prompt deterministically: `--with-docker` always stops infra,
+  `--keep-docker` always leaves it running; passing both is rejected.
 - `bytebell server start` — low-level wrapper that spawns the server
   in the foreground (Ctrl+C to stop). Used during dev; everyday users
   prefer `bytebell boot`.
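The new `bytebell shutdown` behavior is a small precedence table: conflicting flags are rejected, an explicit flag wins, a non-TTY stdin keeps infra up, and otherwise the user is asked with Yes as the default. A hypothetical TypeScript sketch of that decision follows; `decideInfraAction` and `resolvePromptAnswer` are illustrative names, not functions from the CLI, and mapping every non-Enter, non-`y` key to "keep" is an assumption.

```typescript
// Hypothetical sketch of the precedence described for `bytebell shutdown`.
interface ShutdownFlags {
  withDocker?: boolean; // --with-docker
  keepDocker?: boolean; // --keep-docker
}

type InfraAction = "stop" | "keep" | "ask";

function decideInfraAction(flags: ShutdownFlags, stdinIsTTY: boolean): InfraAction {
  if (flags.withDocker && flags.keepDocker) {
    throw new Error("--with-docker and --keep-docker cannot be combined");
  }
  // Explicit flags win, even when stdin is not a TTY (e.g. in CI).
  if (flags.withDocker) return "stop";
  if (flags.keepDocker) return "keep";
  // No TTY means no prompt: keep infra up as the CI-safe fallback.
  return stdinIsTTY ? "ask" : "keep";
}

// Interprets the prompt's answer when the action is "ask". Enter (empty input)
// takes the default, Yes; treating any other key as "keep" is an assumption.
function resolvePromptAnswer(key: string): "stop" | "keep" {
  const k = key.trim().toLowerCase();
  return k === "" || k === "y" ? "stop" : "keep";
}

console.log(decideInfraAction({}, false)); // keep
console.log(decideInfraAction({ withDocker: true }, false)); // stop
console.log(resolvePromptAnswer("")); // stop
```

Checking the flags before the TTY test is what keeps `--with-docker` usable in CI, where the prompt never appears.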