code-dna Programmatic API Reference

This document covers the public library API exported from code-dna/lib. For CLI usage, see the README. For MCP server usage, see MCP.md.

Installation

npm install code-dna

Requires Node.js 20 or later.

Import from the library entry point:

import { analyze, formatMarkdown, formatYaml } from 'code-dna/lib';

Functions

analyze

Run the full 4-layer code-dna analysis pipeline on a project directory.

async function analyze(
  rootPath: string,
  options: Partial<AnalysisOptions>,
): Promise<DnaOutput>

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `rootPath` | `string` | Absolute or relative path to the project root to analyse. Resolved to an absolute path internally. |
| `options` | `Partial<AnalysisOptions>` | Analysis configuration. All fields are optional; omitted fields use defaults or values from `.codedna.yaml`. |

Returns: Promise<DnaOutput> — A fully-populated DNA output object ready for formatting.

Example:

import { analyze } from 'code-dna/lib';

const dna = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  tokenBudget: 8000,
  gitDepth: 500,
});

console.log(dna.metadata.projectName);
console.log(dna.metadata.totalFiles);

Layer selection:

// Structural skeleton only (no git, no patterns, no risk)
const skeletonOnly = await analyze('/path/to/project', { layers: [1] });

// Skip git archaeology (layer 2)
const withoutGit = await analyze('/path/to/project', { layers: [1, 3, 4] });

// Language filter
const filtered = await analyze('/path/to/project', {
  layers: [1, 2, 3, 4],
  languages: ['typescript', 'python'],
});

Error handling:

Layer 2 (git) failures do not abort the pipeline. If git history cannot be read, the analysis continues with empty git results. Other errors reject the promise.

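try {
  const dna = await analyze('/path/to/project', {});
  console.log(dna.metadata.projectName);
} catch (err) {
  // `err` is `unknown` under strict TypeScript, so narrow it before reading `.message`.
  console.error('Analysis failed:', err instanceof Error ? err.message : err);
}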

formatMarkdown

Format a DnaOutput as Markdown using an explicit token budget allocation.

function formatMarkdown(dna: DnaOutput, budget: SectionBudget): string

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `dna` | `DnaOutput` | Fully-populated DNA output from `analyze()`. |
| `budget` | `SectionBudget` | Token allocation per section. Use `allocateTokenBudget()` to compute this. |

Returns: string — UTF-8 Markdown document.

Example:

import { analyze, formatMarkdown, allocateTokenBudget } from 'code-dna/lib';

const dna = await analyze('/path/to/project', {});

const budget = allocateTokenBudget(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15, conventions: 15,
    hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  {
    architecture: dna.architecture.layers.length * 50 + 100,
    moduleMap: (dna.skeleton.root.children?.length ?? 1) * 40,
    dependencies: dna.dependencies.edges.length * 30,
    conventions: 200,
    hotFiles: dna.hotFiles.length * 60,
    riskSurface: dna.riskSurface.length * 70,
    apiSurface: dna.apiSurface.length * 50,
  },
);

const markdown = formatMarkdown(dna, budget);

formatYaml

Format a DnaOutput as YAML. Unlike Markdown, YAML output includes the full dataset with no token-based truncation.

function formatYaml(dna: DnaOutput): string

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `dna` | `DnaOutput` | Fully-populated DNA output from `analyze()`. |

Returns: string — UTF-8 YAML document. Suitable for machine consumption or as input to diff.

Example:

import { analyze, formatYaml } from 'code-dna/lib';
import fs from 'node:fs';

const dna = await analyze('/path/to/project', {});
const yaml = formatYaml(dna);
fs.writeFileSync('CODEBASE-DNA.yaml', yaml, 'utf8');

computeDnaDiff

Compare two DnaOutput snapshots and return a structured diff.

function computeDnaDiff(oldDna: DnaOutput, newDna: DnaOutput): DnaDiff

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `oldDna` | `DnaOutput` | The baseline DNA snapshot. |
| `newDna` | `DnaOutput` | The newer DNA snapshot to compare against the baseline. |

Returns: DnaDiff — Structured diff covering file changes, symbol changes, dependency changes, risk movements, and convention shifts.

Example:

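import { analyze } from 'code-dna/lib';
// The diff helpers are not re-exported from the library entry point.
import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';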

const before = await analyze('/path/to/project', {});
// ... time passes, code changes ...
const after = await analyze('/path/to/project', {});

const diff = computeDnaDiff(before, after);
console.log(`Files added: ${diff.summary.filesAdded}`);
console.log(`Symbols removed: ${diff.summary.symbolsRemoved}`);

Note: computeDnaDiff is exported from src/core/diff-engine.ts, not from src/lib.ts. Import it directly:

import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';

formatDnaDiff

Format a DnaDiff as a Markdown report suitable for human review or inclusion in a pull-request description.

function formatDnaDiff(diff: DnaDiff): string

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `diff` | `DnaDiff` | Diff produced by `computeDnaDiff()`. |

Returns: string — UTF-8 Markdown diff report with sections for file changes, symbol changes, dependency changes, risk surface movements, and convention changes.

Example:

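import fs from 'node:fs';
import { computeDnaDiff, formatDnaDiff } from 'code-dna/dist/core/diff-engine.js';

// `before` and `after` are DnaOutput snapshots returned by analyze().
const diff = computeDnaDiff(before, after);
const report = formatDnaDiff(diff);
fs.writeFileSync('DNA-DIFF.md', report, 'utf8');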

allocateTokenBudget

Distribute a total token budget across DNA output sections, proportional to configured weights and estimated data sizes.

function allocateTokenBudget(
  totalBudget: number,
  weights: Record<string, number>,
  dataSizes: Record<string, number>,
): SectionBudget

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `totalBudget` | `number` | Total token count to distribute. |
| `weights` | `Record<string, number>` | Relative weight per section (higher = more tokens). |
| `dataSizes` | `Record<string, number>` | Estimated raw data size per section (used to avoid over-allocating to sparse sections). |

Returns: SectionBudget — Token allocation per section. Each value is at least 200 tokens and at most 40% of totalBudget.

Constraints:

  • Minimum per section: 200 tokens
  • Maximum per section: 40% of total budget
  • Truncation order when budget is tight (lowest priority first): apiSurface, hotFiles, riskSurface, dependencies, conventions, moduleMap, architecture

Example:

const budget = allocateTokenBudget(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15,
    conventions: 15, hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  { architecture: 300, moduleMap: 1200, dependencies: 600,
    conventions: 200, hotFiles: 400, riskSurface: 350, apiSurface: 150 },
);
// budget.moduleMap will be larger than budget.apiSurface
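
The constraints above can be illustrated with a small stand-alone sketch. It is not the library's implementation: `sketchAllocate` is a hypothetical function, and capping each share at the section's estimated data size is an assumption about how `dataSizes` might be used. It also does not redistribute leftover tokens or apply the truncation order.

```typescript
type Section =
  | 'architecture' | 'moduleMap' | 'dependencies' | 'conventions'
  | 'hotFiles' | 'riskSurface' | 'apiSurface';

const MIN_TOKENS = 200;   // documented per-section minimum
const MAX_FRACTION = 0.4; // documented per-section maximum (40% of total)

// Hypothetical re-implementation for illustration only; the real
// allocateTokenBudget() may weigh data sizes differently.
function sketchAllocate(
  total: number,
  weights: Record<Section, number>,
  dataSizes: Record<Section, number>,
): Record<Section, number> {
  const sections = Object.keys(weights) as Section[];
  const weightSum = sections.reduce((sum, s) => sum + weights[s], 0);
  const result = {} as Record<Section, number>;
  for (const s of sections) {
    // Proportional share of the total budget for this section.
    const share = Math.floor((weights[s] / weightSum) * total);
    // Assumption: never give a section more than its estimated data size.
    const wanted = Math.min(share, dataSizes[s]);
    // Clamp to the documented minimum and maximum.
    const capped = Math.min(wanted, Math.floor(total * MAX_FRACTION));
    result[s] = Math.max(MIN_TOKENS, capped);
  }
  return result;
}

const sketchBudget = sketchAllocate(
  8000,
  { architecture: 15, moduleMap: 25, dependencies: 15,
    conventions: 15, hotFiles: 10, riskSurface: 10, apiSurface: 5 },
  { architecture: 300, moduleMap: 1200, dependencies: 600,
    conventions: 200, hotFiles: 400, riskSurface: 350, apiSurface: 150 },
);
console.log(sketchBudget);
```

In this sketch the sparse `apiSurface` section is lifted to the 200-token floor, while `moduleMap` is limited by its estimated data size rather than by its weight.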

estimateTokens

Estimate the token count of a string using a simple character-count heuristic (approximately 4 characters per token).

function estimateTokens(text: string): number

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| `text` | `string` | The text to estimate. |

Returns: number — Estimated token count.
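
A character-count heuristic of this kind can be sketched in a few lines. `approxTokens` below is a hypothetical stand-in, not the library function, and rounding up is an assumption; the real `estimateTokens()` may round differently.

```typescript
// Hypothetical stand-in for estimateTokens(): roughly 4 characters per token.
// Rounding up is an assumption made for this sketch.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// A 28-character string is estimated at 7 tokens.
console.log(approxTokens('export function analyze() {}'));
```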


Types

AnalysisOptions

Options passed to the top-level analyze() function.

interface AnalysisOptions {
  path: string;
  format: 'md' | 'yaml';
  output?: string;
  layers: number[];
  languages?: string[];
  tokenBudget: number;
  gitDepth?: number;
  scope?: string;
}

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `path` | `string` | | Absolute or relative path to the project root. |
| `format` | `'md' \| 'yaml'` | `'md'` | Output format. |
| `output` | `string?` | `undefined` | Destination file path. `undefined` means stdout. |
| `layers` | `number[]` | `[1, 2, 3, 4]` | Which pipeline layers to execute. |
| `languages` | `string[]?` | all | Restrict to these language IDs, e.g. `['typescript', 'python']`. |
| `tokenBudget` | `number` | `8000` | Target token count for Markdown output. |
| `gitDepth` | `number?` | `1000` | Maximum commits to traverse during git archaeology. |
| `scope` | `string?` | `undefined` | Subdirectory to scope the analysis to (relative to `path`). |

All fields are optional when passed to analyze(), which accepts Partial<AnalysisOptions>.
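
To make the default handling concrete, here is a minimal sketch of how a partial options object could be merged with the defaults listed above. `withDefaults` is a hypothetical helper, not part of code-dna, and it ignores values that the real pipeline may read from `.codedna.yaml`.

```typescript
// Local copy of the documented interface so the sketch is self-contained.
interface AnalysisOptions {
  path: string;
  format: 'md' | 'yaml';
  output?: string;
  layers: number[];
  languages?: string[];
  tokenBudget: number;
  gitDepth?: number;
  scope?: string;
}

// Hypothetical helper: fills in the documented defaults.
function withDefaults(
  rootPath: string,
  partial: Partial<AnalysisOptions>,
): AnalysisOptions {
  const defaults: AnalysisOptions = {
    path: rootPath,
    format: 'md',
    layers: [1, 2, 3, 4],
    tokenBudget: 8000,
    gitDepth: 1000,
  };
  // Later spreads win, so any field present in `partial` replaces the default
  // (including a field explicitly set to undefined).
  return { ...defaults, ...partial };
}

const resolved = withDefaults('/path/to/project', { layers: [1], tokenBudget: 4000 });
console.log(resolved);
```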


DnaOutput

The fully-populated DNA output produced by the core engine after all layers have run.

interface DnaOutput {
  metadata: GenomeHeader;
  skeleton: ModuleMap;
  dependencies: DependencyGraph;
  conventions: ConventionReport;
  hotFiles: HotFile[];
  riskSurface: RiskEntry[];
  apiSurface: ApiEntry[];
  architecture: ArchitectureReport;
}

| Field | Type | Description |
| --- | --- | --- |
| `metadata` | `GenomeHeader` | Project metadata and aggregate statistics. |
| `skeleton` | `ModuleMap` | Tree-shaped module structure with symbols. |
| `dependencies` | `DependencyGraph` | Import/export dependency graph. |
| `conventions` | `ConventionReport` | Detected coding conventions. |
| `hotFiles` | `HotFile[]` | Files with highest commit and author churn. |
| `riskSurface` | `RiskEntry[]` | Risk-ranked files with scores and factor breakdowns. |
| `apiSurface` | `ApiEntry[]` | Public API surface: exported symbols and HTTP endpoints. |
| `architecture` | `ArchitectureReport` | Architectural style and framework detection. |

GenomeHeader

Project-level metadata at the top of the DNA output.

interface GenomeHeader {
  projectName: string;
  version: string;
  generatedAt: string;
  gitRemote?: string;
  branch?: string;
  commitSha?: string;
  languages: LanguageStat[];
  totalFiles: number;
  totalLoc: number;
}

| Field | Type | Description |
| --- | --- | --- |
| `projectName` | `string` | Project name from `package.json` or directory name. |
| `version` | `string` | Semver version of the code-dna tool that produced this output. |
| `generatedAt` | `string` | ISO 8601 timestamp. |
| `gitRemote` | `string?` | Git remote URL if available. |
| `branch` | `string?` | Active git branch at analysis time. |
| `commitSha` | `string?` | HEAD commit SHA at analysis time. |
| `languages` | `LanguageStat[]` | Language breakdown sorted by file count descending. |
| `totalFiles` | `number` | Total source files analysed (unknown-language files excluded). |
| `totalLoc` | `number` | Total lines of code across all analysed files. |

LanguageStat

Per-language aggregate statistics.

interface LanguageStat {
  id: string;
  files: number;
  loc: number;
  percentage: number;
}

| Field | Type | Description |
| --- | --- | --- |
| `id` | `string` | Language identifier, e.g. `'typescript'`, `'python'`. |
| `files` | `number` | Number of source files in this language. |
| `loc` | `number` | Total lines of code in this language. |
| `percentage` | `number` | Percentage of total project LOC (0–100). |
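
The relationship between these fields can be shown with a short sketch that derives `LanguageStat` records from per-file data. The `FileInfo` input shape and `summarizeLanguages` are hypothetical; the sort order follows the "file count descending" rule documented for `GenomeHeader.languages`.

```typescript
interface LanguageStat {
  id: string;
  files: number;
  loc: number;
  percentage: number;
}

// Hypothetical input shape: one record per analysed file.
interface FileInfo {
  language: string;
  loc: number;
}

function summarizeLanguages(files: FileInfo[]): LanguageStat[] {
  const totalLoc = files.reduce((sum, f) => sum + f.loc, 0);
  const byLang = new Map<string, { files: number; loc: number }>();
  for (const f of files) {
    const entry = byLang.get(f.language) ?? { files: 0, loc: 0 };
    entry.files += 1;
    entry.loc += f.loc;
    byLang.set(f.language, entry);
  }
  return Array.from(byLang, ([id, agg]) => ({
    id,
    files: agg.files,
    loc: agg.loc,
    // Share of total project LOC on a 0-100 scale; 0 for an empty project.
    percentage: totalLoc === 0 ? 0 : (agg.loc * 100) / totalLoc,
  })).sort((a, b) => b.files - a.files); // file count descending
}

const stats = summarizeLanguages([
  { language: 'typescript', loc: 300 },
  { language: 'typescript', loc: 300 },
  { language: 'python', loc: 400 },
]);
console.log(stats);
```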

ModuleMap

Entry point for the tree-shaped structural skeleton.

interface ModuleMap {
  root: ModuleNode;
}

ModuleNode

A single node in the module tree — either a directory or a source file.

interface ModuleNode {
  name: string;
  path: string;
  type: 'directory' | 'file';
  role?: FileRole;
  language?: string;
  loc?: number;
  symbols?: SymbolEntry[];
  children?: ModuleNode[];
}

| Field | Type | Description |
| --- | --- | --- |
| `name` | `string` | Base name of the file or directory. |
| `path` | `string` | Full path relative to the project root. |
| `type` | `'directory' \| 'file'` | Node type. |
| `role` | `FileRole?` | Inferred semantic role (file nodes only). |
| `language` | `string?` | Programming language of the file. |
| `loc` | `number?` | Lines of code (file nodes only). |
| `symbols` | `SymbolEntry[]?` | Top-level exported symbols (file nodes only). |
| `children` | `ModuleNode[]?` | Child nodes (directory nodes only). |

FileRole values: controller, route, handler, service, usecase, model, entity, schema, repository, dao, middleware, util, helper, lib, test, spec, config, env, migration, type, interface, component, page, layout, entry, main, unknown.
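
Because the skeleton is a tree, most consumers will walk it recursively. The sketch below assumes a trimmed-down `ModuleNode` (only the fields it needs) and a hypothetical `collectFiles` helper; the sample tree is made up for illustration.

```typescript
// Subset of the documented ModuleNode fields, enough for a tree walk.
interface ModuleNode {
  name: string;
  path: string;
  type: 'directory' | 'file';
  loc?: number;
  children?: ModuleNode[];
}

// Depth-first walk that collects every file node under `node`.
function collectFiles(node: ModuleNode, out: ModuleNode[] = []): ModuleNode[] {
  if (node.type === 'file') {
    out.push(node);
  } else {
    for (const child of node.children ?? []) {
      collectFiles(child, out);
    }
  }
  return out;
}

// Made-up sample tree.
const sampleRoot: ModuleNode = {
  name: 'src', path: 'src', type: 'directory',
  children: [
    { name: 'index.ts', path: 'src/index.ts', type: 'file', loc: 40 },
    {
      name: 'core', path: 'src/core', type: 'directory',
      children: [
        { name: 'engine.ts', path: 'src/core/engine.ts', type: 'file', loc: 120 },
      ],
    },
  ],
};

const sampleFiles = collectFiles(sampleRoot);
const sampleLoc = sampleFiles.reduce((sum, f) => sum + (f.loc ?? 0), 0);
console.log(sampleFiles.map((f) => f.path), sampleLoc);
```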


SymbolEntry

A named symbol extracted from a source file.

interface SymbolEntry {
  name: string;
  kind: 'function' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'method';
  exported: boolean;
  signature?: string;
  decorators?: string[];
  loc: { start: number; end: number };
}

| Field | Type | Description |
| --- | --- | --- |
| `name` | `string` | Identifier as written in source. |
| `kind` | `string` | Category of the symbol. |
| `exported` | `boolean` | Whether the symbol is exported from its module. |
| `signature` | `string?` | Human-readable type signature, e.g. `(req: Request) => void`. |
| `decorators` | `string[]?` | Decorator names applied to the symbol, e.g. `['Controller', 'Get']`. |
| `loc` | `{ start: number; end: number }` | Source line range (1-indexed). |
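
As an illustration of how these fields fit together, the following sketch turns a list of symbols into one-line summaries of the exported ones. `describeExports` and the sample data are hypothetical.

```typescript
interface SymbolEntry {
  name: string;
  kind: 'function' | 'class' | 'interface' | 'type' | 'enum' | 'variable' | 'method';
  exported: boolean;
  signature?: string;
  decorators?: string[];
  loc: { start: number; end: number };
}

// Hypothetical helper: one summary line per exported symbol.
function describeExports(symbols: SymbolEntry[]): string[] {
  return symbols
    .filter((s) => s.exported)
    .map((s) => {
      const sig = s.signature ? `: ${s.signature}` : '';
      return `${s.kind} ${s.name}${sig} (lines ${s.loc.start}-${s.loc.end})`;
    });
}

// Made-up sample symbols.
const sampleSymbols: SymbolEntry[] = [
  { name: 'handle', kind: 'function', exported: true,
    signature: '(req: Request) => void', loc: { start: 10, end: 42 } },
  { name: 'internalCache', kind: 'variable', exported: false, loc: { start: 3, end: 3 } },
  { name: 'UserController', kind: 'class', exported: true,
    decorators: ['Controller'], loc: { start: 50, end: 90 } },
];

const exportLines = describeExports(sampleSymbols);
console.log(exportLines.join('\n'));
```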

DependencyGraph

Import/export relationships between all source files.

interface DependencyGraph {
  nodes: DependencyNode[];
  edges: DependencyEdge[];
  circularDeps: string[][];
}

| Field | Type | Description |
| --- | --- | --- |
| `nodes` | `DependencyNode[]` | One node per file that participates in the import graph. |
| `edges` | `DependencyEdge[]` | Directed import edges. |
| `circularDeps` | `string[][]` | Groups of file paths that form circular dependency cycles. |

DependencyNode

A single file in the dependency graph with degree metrics.

interface DependencyNode {
  path: string;
  inDegree: number;
  outDegree: number;
}

| Field | Type | Description |
| --- | --- | --- |
| `path` | `string` | File path relative to the project root. |
| `inDegree` | `number` | Number of other files that import this file (fan-in). |
| `outDegree` | `number` | Number of files this file imports (fan-out). |

DependencyEdge

A directed edge in the dependency graph.

interface DependencyEdge {
  from: string;
  to: string;
  type: 'import' | 'reexport' | 'dynamic';
}

| Field | Type | Description |
| --- | --- | --- |
| `from` | `string` | Importing file path. |
| `to` | `string` | Imported module path (resolved to a project file where possible). |
| `type` | `string` | How the dependency was introduced. |
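
The fan-in and fan-out metrics on `DependencyNode` follow directly from the edge list. The sketch below recomputes them with a hypothetical `degreesFromEdges` helper; it assumes at most one edge per pair of files, so edge counts equal file counts. How code-dna itself derives the degrees is not specified here.

```typescript
interface DependencyEdge {
  from: string;
  to: string;
  type: 'import' | 'reexport' | 'dynamic';
}

interface DependencyNode {
  path: string;
  inDegree: number;
  outDegree: number;
}

// Hypothetical helper: derive degree metrics from edges, highest fan-in first.
function degreesFromEdges(edges: DependencyEdge[]): DependencyNode[] {
  const nodes = new Map<string, DependencyNode>();
  const nodeFor = (path: string): DependencyNode => {
    let node = nodes.get(path);
    if (!node) {
      node = { path, inDegree: 0, outDegree: 0 };
      nodes.set(path, node);
    }
    return node;
  };
  for (const edge of edges) {
    nodeFor(edge.from).outDegree += 1; // the importer gains fan-out
    nodeFor(edge.to).inDegree += 1;    // the imported file gains fan-in
  }
  return Array.from(nodes.values()).sort((a, b) => b.inDegree - a.inDegree);
}

// Made-up sample edges.
const sampleEdges: DependencyEdge[] = [
  { from: 'src/a.ts', to: 'src/c.ts', type: 'import' },
  { from: 'src/b.ts', to: 'src/c.ts', type: 'import' },
  { from: 'src/c.ts', to: 'src/d.ts', type: 'reexport' },
];

const ranked = degreesFromEdges(sampleEdges);
console.log(ranked[0]);
```

Files with a high `inDegree` are the ones whose changes ripple furthest, which is why a ranking like this is a useful first look at a graph.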

ConventionReport

Coding conventions observed across the project.

interface ConventionReport {
  naming: NamingConvention;
  fileOrganization: string;
  importStyle: ImportStyle;
  errorHandling: string;
  testPattern: string;
  exportStyle: string;
  confidence: number;
}

| Field | Type | Description |
| --- | --- | --- |
| `naming` | `NamingConvention` | Identifier and file naming conventions. |
| `fileOrganization` | `string` | One of: `'by-feature'`, `'by-layer'`, `'by-type'`, `'hybrid'`. |
| `importStyle` | `ImportStyle` | Import ordering and path style. |
| `errorHandling` | `string` | One of: `'try-catch'`, `'result-type'`, `'callback'`, `'mixed'`. |
| `testPattern` | `string` | One of: `'co-located'`, `'mirror-directory'`, `'flat'`, `'mixed'`. |
| `exportStyle` | `string` | One of: `'named'`, `'default'`, `'barrel'`, `'mixed'`. |
| `confidence` | `number` | Confidence score for the detected conventions (0–1). |

NamingConvention fields: files, functions, classes, variables, constants — each a human-readable style name (e.g. 'camelCase', 'kebab-case', 'PascalCase', 'snake_case').

ImportStyle fields: ordering (one of 'external-first', 'internal-first', 'alphabetical', 'unordered'), pathStyle (one of 'relative', 'absolute', 'alias', 'mixed').


HotFile

A file identified as a churn hotspot via git archaeology.

interface HotFile {
  path: string;
  commitCount: number;
  authorCount: number;
  lastModified: string;
  reason: string;
}

| Field | Type | Description |
| --- | --- | --- |
| `path` | `string` | File path relative to the project root. |
| `commitCount` | `number` | Total commits touching this file. |
| `authorCount` | `number` | Number of distinct authors. |
| `lastModified` | `string` | ISO 8601 timestamp of the most recent commit. |
| `reason` | `string` | Human-readable explanation of why this file is considered hot. |

RiskEntry

A file ranked by its composite risk score.

interface RiskEntry {
  path: string;
  score: number;
  factors: RiskFactor[];
}

| Field | Type | Description |
| --- | --- | --- |
| `path` | `string` | File path relative to the project root. |
| `score` | `number` | Composite risk score (0–100). Higher is riskier to change. |
| `factors` | `RiskFactor[]` | Individual factors contributing to the score. |

RiskFactor

A single dimension of risk for a file.

interface RiskFactor {
  type: 'high-centrality' | 'low-test-coverage' | 'high-churn' | 'many-authors' | 'high-complexity';
  value: number;
  description: string;
}

| Field | Type | Description |
| --- | --- | --- |
| `type` | `string` | Which risk dimension this factor measures. |
| `value` | `number` | Normalised numeric value for this factor. |
| `description` | `string` | Human-readable explanation. |
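
A typical use of the risk surface is to pick out the files above some score and see which factor dominates each one. The sketch below uses a hypothetical `topRisks` helper and made-up data; comparing `value` across factor types assumes the normalised values share a scale, which the reference does not state.

```typescript
interface RiskFactor {
  type: 'high-centrality' | 'low-test-coverage' | 'high-churn' | 'many-authors' | 'high-complexity';
  value: number;
  description: string;
}

interface RiskEntry {
  path: string;
  score: number;
  factors: RiskFactor[];
}

// Hypothetical helper: entries at or above `minScore`, riskiest first,
// each paired with the factor that has the largest value.
function topRisks(entries: RiskEntry[], minScore: number) {
  return entries
    .filter((e) => e.score >= minScore) // filter() copies, so sort() leaves the input intact
    .sort((a, b) => b.score - a.score)
    .map((e) => {
      const dominant = e.factors.reduce<RiskFactor | undefined>(
        (best, f) => (best === undefined || f.value > best.value ? f : best),
        undefined,
      );
      return { path: e.path, score: e.score, dominant: dominant?.type ?? 'none' };
    });
}

// Made-up sample data.
const sampleRisks: RiskEntry[] = [
  { path: 'src/utils/format.ts', score: 35,
    factors: [{ type: 'high-complexity', value: 0.5, description: 'Deep nesting' }] },
  { path: 'src/core/engine.ts', score: 82,
    factors: [
      { type: 'many-authors', value: 0.4, description: 'Six authors' },
      { type: 'high-churn', value: 0.9, description: 'Changed in most commits' },
    ] },
  { path: 'src/lib.ts', score: 64, factors: [] },
];

const risky = topRisks(sampleRisks, 60);
console.log(risky);
```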

ApiEntry

A single entry on the public API surface.

interface ApiEntry {
  path: string;
  symbol: string;
  kind: 'endpoint' | 'export' | 'public-method';
  signature: string;
  httpMethod?: string;
  route?: string;
}

| Field | Type | Description |
| --- | --- | --- |
| `path` | `string` | File declaring this API entry. |
| `symbol` | `string` | Symbol name as declared in source. |
| `kind` | `string` | Category of API entry. |
| `signature` | `string` | Type signature of the symbol. |
| `httpMethod` | `string?` | HTTP method for endpoint entries (e.g. `'GET'`, `'POST'`). |
| `route` | `string?` | URL route pattern for endpoint entries (e.g. `'/users/:id'`). |
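
Since `httpMethod` and `route` are only set for endpoint entries, a consumer that wants a route listing has to filter on `kind` and check both optional fields. A minimal sketch, with a hypothetical `listEndpoints` helper and made-up entries:

```typescript
interface ApiEntry {
  path: string;
  symbol: string;
  kind: 'endpoint' | 'export' | 'public-method';
  signature: string;
  httpMethod?: string;
  route?: string;
}

// Hypothetical helper: "METHOD /route" lines for endpoint entries, sorted.
function listEndpoints(entries: ApiEntry[]): string[] {
  const lines: string[] = [];
  for (const e of entries) {
    // Skip non-endpoints and endpoints missing either optional field.
    if (e.kind === 'endpoint' && e.httpMethod && e.route) {
      lines.push(`${e.httpMethod} ${e.route}`);
    }
  }
  return lines.sort();
}

// Made-up sample entries.
const sampleApi: ApiEntry[] = [
  { path: 'src/routes/users.ts', symbol: 'createUser', kind: 'endpoint',
    signature: '(req: Request) => void', httpMethod: 'POST', route: '/users' },
  { path: 'src/lib.ts', symbol: 'analyze', kind: 'export',
    signature: '(rootPath: string) => Promise<DnaOutput>' },
  { path: 'src/routes/users.ts', symbol: 'getUser', kind: 'endpoint',
    signature: '(req: Request) => void', httpMethod: 'GET', route: '/users/:id' },
];

const endpointLines = listEndpoints(sampleApi);
console.log(endpointLines.join('\n'));
```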

ArchitectureReport

High-level architectural characterisation of the project.

interface ArchitectureReport {
  style: string;
  framework: FrameworkInfo;
  layers: LayerInfo[];
  confidence: number;
}

| Field | Type | Description |
| --- | --- | --- |
| `style` | `string` | One of: `'mvc'`, `'hexagonal'`, `'layered'`, `'event-driven'`, `'monolith'`, `'unknown'`. |
| `framework` | `FrameworkInfo` | Detected framework and detection evidence. |
| `layers` | `LayerInfo[]` | Logical layers found in the project. |
| `confidence` | `number` | Confidence score for the architecture detection (0–1). |

FrameworkInfo fields: name (e.g. 'nextjs', 'express', 'fastapi', 'spring-boot', 'nestjs'), version?, markers: string[] (evidence files/packages).

LayerInfo fields: name, directories: string[], fileCount: number, role: string.


DnaDiff

The full structural diff between two DnaOutput snapshots, produced by computeDnaDiff().

interface DnaDiff {
  summary: DnaDiffSummary;
  fileChanges: {
    added: string[];
    removed: string[];
    modified: string[];
  };
  symbolChanges: {
    added: Array<{ name: string; kind: string; file: string; line: number }>;
    removed: Array<{ name: string; kind: string; file: string }>;
  };
  dependencyChanges: {
    added: string[];
    removed: string[];
  };
  riskChanges: RiskChange[];
  conventionChanges: string[];
}

DnaDiffSummary fields: filesAdded, filesRemoved, filesModified, symbolsAdded, symbolsRemoved, riskIncreased, riskDecreased — all number.

RiskChange fields: file: string, oldScore: number, newScore: number, direction: 'increased' | 'decreased'.
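
The `added` and `removed` lists in `fileChanges` are, conceptually, set differences between the two snapshots' file paths. The sketch below shows that idea with a hypothetical `diffPaths` helper. It is not the `computeDnaDiff()` implementation and does not attempt to detect modified files.

```typescript
// Hypothetical helper: set difference over two lists of file paths.
function diffPaths(
  oldPaths: string[],
  newPaths: string[],
): { added: string[]; removed: string[] } {
  const oldSet = new Set(oldPaths);
  const newSet = new Set(newPaths);
  return {
    added: newPaths.filter((p) => !oldSet.has(p)),   // only in the newer snapshot
    removed: oldPaths.filter((p) => !newSet.has(p)), // only in the baseline
  };
}

// Made-up sample paths.
const pathDiff = diffPaths(
  ['src/a.ts', 'src/b.ts', 'src/legacy.ts'],
  ['src/a.ts', 'src/b.ts', 'src/new-feature.ts'],
);
console.log(pathDiff);
```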


SectionBudget

Token allocation for each DNA output section.

interface SectionBudget {
  architecture: number;
  moduleMap: number;
  dependencies: number;
  conventions: number;
  hotFiles: number;
  riskSurface: number;
  apiSurface: number;
}

All values are positive integers representing allocated token counts. Produced by allocateTokenBudget() and consumed by formatMarkdown().