
Commit b616abc

Docs update + og image
1 parent 593798c

4 files changed

Lines changed: 100 additions & 1 deletion


astro.config.mjs

Lines changed: 46 additions & 0 deletions
@@ -33,6 +33,52 @@ export default defineConfig({
         href: "https://github.com/chunkhound/chunkhound",
       },
     ],
+    head: [
+      // Open Graph
+      {
+        tag: "meta",
+        attrs: {
+          property: "og:image",
+          content: "https://chunkhound.github.io/og-image.png",
+        },
+      },
+      {
+        tag: "meta",
+        attrs: {
+          property: "og:image:width",
+          content: "1200",
+        },
+      },
+      {
+        tag: "meta",
+        attrs: {
+          property: "og:image:height",
+          content: "630",
+        },
+      },
+      {
+        tag: "meta",
+        attrs: {
+          property: "og:type",
+          content: "website",
+        },
+      },
+      // Twitter Card
+      {
+        tag: "meta",
+        attrs: {
+          name: "twitter:card",
+          content: "summary_large_image",
+        },
+      },
+      {
+        tag: "meta",
+        attrs: {
+          name: "twitter:image",
+          content: "https://chunkhound.github.io/og-image.png",
+        },
+      },
+    ],
     sidebar: [
       { label: "Quickstart", slug: "quickstart" },
       { label: "How-To Guides", slug: "how-to" },
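Starlight injects each `head` entry into the `<head>` of every page. To show roughly what the six new entries amount to, here is a minimal sketch that serializes entries of the same `{ tag, attrs, content }` shape into HTML. `renderHeadEntry`, `escapeAttr`, and `VOID_TAGS` are hypothetical helpers written for illustration; this is not Starlight's own rendering code, and its real output may differ in attribute order or formatting.

```javascript
// Hypothetical helpers (not Starlight's code): serialize a head entry of the
// shape { tag, attrs, content } to an HTML string for illustration.

// Escape characters that would break out of a double-quoted attribute value.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Void elements such as <meta> and <link> never take a closing tag.
const VOID_TAGS = new Set(["meta", "link", "base"]);

function renderHeadEntry({ tag, attrs = {}, content = "" }) {
  const attrString = Object.entries(attrs)
    .map(([key, value]) => ` ${key}="${escapeAttr(value)}"`)
    .join("");
  if (VOID_TAGS.has(tag)) return `<${tag}${attrString}>`;
  // Non-void tags wrap their content, inserted as-is in this sketch.
  return `<${tag}${attrString}>${content}</${tag}>`;
}

// Two of the entries added in this commit.
const head = [
  { tag: "meta", attrs: { property: "og:image", content: "https://chunkhound.github.io/og-image.png" } },
  { tag: "meta", attrs: { name: "twitter:card", content: "summary_large_image" } },
];

console.log(head.map(renderHeadEntry).join("\n"));
```

Note that for `<meta>` the page-facing value lives in `attrs.content`, not in the entry-level `content` field, which is why every entry in the config nests `content` inside `attrs`.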

public/og-image.png

91.7 KB

public/og-image.svg

Lines changed: 47 additions & 0 deletions

src/content/docs/code-research.mdx

Lines changed: 7 additions & 1 deletion
@@ -326,7 +326,13 @@ This implements a classic **precision-recall tradeoff**—cast a wide net during
 
 ### Map-Reduce Synthesis with Clustering
 
-When filtered results exceed single-LLM context limits (>150k tokens for large repositories), the system uses **token-bounded K-means clustering** to prevent context collapse. Files are partitioned into clusters of max 30k tokens each, synthesized in parallel with cluster-local citations [1][2][3], then deterministically remapped to global numbers before the reduce phase combines summaries.
+When semantic clustering produces multiple clusters from filtered results, the system uses **two-phase HDBSCAN clustering with map-reduce synthesis** to prevent context collapse:
+
+1. **Phase 1 (Natural Boundary Discovery)**: HDBSCAN (Hierarchical Density-Based Spatial Clustering) discovers natural semantic boundaries in the embedding space, grouping files where they are cohesively related rather than forcing arbitrary partitions. This respects the inherent structure of your codebase, identifying both semantically dense clusters and outliers that don't fit natural groupings.
+
+2. **Phase 2 (Token-Budget Grouping)**: Clusters are greedily merged based on centroid distance while respecting the 30k token limit per cluster, preserving semantic coherence during merging.
+
+Files are partitioned into token-bounded clusters, synthesized in parallel with cluster-local citations [1][2][3], then deterministically remapped to global numbers before the reduce phase combines summaries.
 
 This avoids **progressive compression loss** from iterative summarization chains (summary → summary-of-summary). Each cluster synthesizes once with full context, preserving architectural details while enabling arbitrary scaling. Cluster-local citation namespaces enable maximum parallelism—no coordination needed during map phase. The reduce LLM integrates remapped summaries with explicit instructions to preserve citations (not generate new ones), ensuring every [N] traces to actual source files.
 
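Phase 2 of the new docs text can be sketched as a greedy closest-pair merge. This is a minimal illustration under stated assumptions, not ChunkHound's code: each Phase 1 cluster is assumed to carry a precomputed token count and an embedding centroid, distance is assumed to be Euclidean, and a merged centroid is assumed to be the file-count-weighted mean of its parts. `greedyMerge`, `mergePair`, and `euclidean` are hypothetical names, and Phase 1 (HDBSCAN) itself is not reproduced.

```javascript
// Hypothetical sketch of Phase 2 (token-budget grouping); not ChunkHound's code.
// Assumed cluster shape: { files: string[], tokens: number, centroid: number[] }.

// Euclidean distance between two centroid vectors (assumed metric).
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return Math.sqrt(sum);
}

// Merge two clusters; the new centroid is the file-count-weighted mean (assumption).
function mergePair(a, b) {
  const na = a.files.length;
  const nb = b.files.length;
  return {
    files: [...a.files, ...b.files],
    tokens: a.tokens + b.tokens,
    centroid: a.centroid.map((x, i) => (x * na + b.centroid[i] * nb) / (na + nb)),
  };
}

// Repeatedly merge the closest pair whose combined size fits in maxTokens.
function greedyMerge(clusters, maxTokens = 30000) {
  let current = clusters.slice(); // shallow copy; input clusters are never mutated
  for (;;) {
    let best = null;
    for (let i = 0; i < current.length; i++) {
      for (let j = i + 1; j < current.length; j++) {
        // Skip any pair that would exceed the per-cluster token budget.
        if (current[i].tokens + current[j].tokens > maxTokens) continue;
        const d = euclidean(current[i].centroid, current[j].centroid);
        if (best === null || d < best.d) best = { i, j, d };
      }
    }
    if (best === null) return current; // no pair fits the budget: done
    const merged = mergePair(current[best.i], current[best.j]);
    current = current.filter((_, k) => k !== best.i && k !== best.j);
    current.push(merged);
  }
}

// a.py and b.py are close and fit together (20k tokens); c.py stays alone
// because merging it with anything would exceed 30k.
const result = greedyMerge([
  { files: ["a.py"], tokens: 10000, centroid: [0, 0] },
  { files: ["b.py"], tokens: 10000, centroid: [0, 1] },
  { files: ["c.py"], tokens: 25000, centroid: [10, 10] },
]);
console.log(result.map((c) => `${c.files.join("+")}: ${c.tokens}`));
```

Merging the closest pair first is what keeps semantically related clusters together while the budget is respected. The pairwise scan is cubic in the number of clusters overall, which is fine for a sketch over a handful of clusters, and the sketch does not split a single cluster that already exceeds the budget, a case the docs text does not describe.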

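The deterministic remapping of cluster-local citations described in the updated docs can be sketched as follows. This assumes each map-phase result arrives as a summary string plus the ordered list of files its local citations [1], [2], and so on point to; `remapCitations` and that input shape are hypothetical, since the source does not specify ChunkHound's data structures. Because clusters partition the files, a file should not appear in two clusters; the sketch still reuses a file's first number if one does, as a defensive choice.

```javascript
// Hypothetical sketch of citation remapping; not ChunkHound's implementation.
// Assumed input: [{ summary, sources }], where sources[k] is the file that the
// cluster-local citation [k + 1] refers to.
function remapCitations(clusterResults) {
  const globalIndex = new Map(); // file path -> global citation number
  const summaries = clusterResults.map(({ summary, sources }) => {
    // Number files in a fixed order (cluster order, then local order) so the
    // result is deterministic; a file seen before keeps its first number.
    const localToGlobal = sources.map((file) => {
      if (!globalIndex.has(file)) globalIndex.set(file, globalIndex.size + 1);
      return globalIndex.get(file);
    });
    // Rewrite every [N]; markers with no matching source are left untouched.
    return summary.replace(/\[(\d+)\]/g, (match, n) => {
      const g = localToGlobal[Number(n) - 1];
      return g === undefined ? match : `[${g}]`;
    });
  });
  // A Map iterates in insertion order, so entries come out sorted by number.
  const references = [...globalIndex].map(([file, n]) => `[${n}] ${file}`);
  return { summaries, references };
}

const { summaries, references } = remapCitations([
  { summary: "Parsing starts in [1] and tokenizes via [2].", sources: ["src/parser.py", "src/lexer.py"] },
  { summary: "The CLI entry point [1] loads settings from [2].", sources: ["src/cli.py", "src/config.py"] },
]);
console.log(summaries.join("\n"));
console.log(references.join("\n"));
```

Each map call only ever sees its own local numbers, so the map phase needs no shared state; all coordination happens in this single sequential pass before the reduce phase.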