categories: add mmap .ndb backend for custom category lists by fabiodepin · Pull Request #3157 · ntop/nDPI

fabiodepin · 2026-04-15T20:31:49Z

Please sign (check) the below before submitting the Pull Request:

[ X ] I have signed the ntop Contributor License Agreement at https://github.com/ntop/legal/blob/main/individual-contributor-licence-agreement.md
[ X ] I have read the contributing guidelines at https://github.com/ntop/nDPI/blob/dev/CONTRIBUTING.md
[ X ] I have updated the documentation (in doc/) to reflect the changes made (if applicable)

Link to the related issue: #3151

Describe changes:

Introduce a compiled .ndb backend (mmap) for external custom category matching, with LEGACY / NDB_ONLY / HYBRID modes, while keeping the existing -G / Aho-Corasick list path unchanged.
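For readers unfamiliar with the approach, the following is a minimal sketch of mapping a compiled file read-only with mmap. It is not the PR's code: `struct mapped_file`, `map_file_readonly()` and `unmap_file()` are hypothetical names, and header or version validation of the .ndb content is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a read-only mapping; not an nDPI type. */
struct mapped_file {
  const unsigned char *data;
  size_t size;
};

/* Map `path` read-only. Returns 0 on success, -1 on failure. */
static int map_file_readonly(const char *path, struct mapped_file *out) {
  struct stat st;
  void *p;
  int fd = open(path, O_RDONLY);

  if (fd < 0) return -1;
  if (fstat(fd, &st) != 0 || st.st_size <= 0) {
    close(fd);
    return -1;
  }

  p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (p == MAP_FAILED) return -1;

  out->data = (const unsigned char *)p;
  out->size = (size_t)st.st_size;
  return 0;
}

/* Release the mapping and reset the handle. */
static void unmap_file(struct mapped_file *m) {
  if (m->data != NULL) munmap((void *)m->data, m->size);
  m->data = NULL;
  m->size = 0;
}
```

Because pages are faulted in on demand, a mapping like this avoids parsing text and building in-memory structures at startup, which is the load-time argument made for the .ndb path.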

Adds ndpi_load_category_ndb_file() / ndpi_unload_category_ndb(), ndpiReader --category-ndb and --category-ndb-reload-interval, a polling-based hot-reload helper, offline builder ndpi_gen_categories_bin, shared hostname normalization (generator + runtime), and on-disk layout in ndpi_categories_bin.h (domains plus IPv4/IPv6 prefix entries).
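The point of sharing hostname normalization between generator and runtime is that both sides must produce byte-identical keys. A minimal sketch of such a helper follows; the rules shown (ASCII lowercasing, dropping one trailing dot) are assumptions for illustration, since the actual rules are not spelled out in this thread, and `normalize_hostname()` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical normalization: lowercase ASCII and drop one trailing dot.
 * Returns 0 on success, -1 if the input is empty or does not fit in `out`. */
static int normalize_hostname(const char *in, char *out, size_t out_len) {
  size_t n = strlen(in);
  size_t i;

  if (n > 0 && in[n - 1] == '.') n--;    /* "example.com." -> "example.com" */
  if (n == 0 || n >= out_len) return -1; /* empty, or no room for the NUL */

  for (i = 0; i < n; i++)
    out[i] = (char)tolower((unsigned char)in[i]);
  out[n] = '\0';
  return 0;
}
```

If the builder and the lookup path both call the same function, a name stored as `www.example.com` is found regardless of the case or trailing dot seen on the wire.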

The generator writes the database atomically (temporary file + fsync + rename) so ndpiReader can reload a new valid file without restart.
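The temporary file + fsync + rename sequence can be sketched as below. This is an illustration of the general POSIX pattern, not the generator's actual code; `write_file_atomically()` and the `.tmp.<pid>` suffix are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes to `path` so that readers see either the old file or
 * the complete new one. Returns 0 on success, -1 on failure. */
static int write_file_atomically(const char *path, const void *buf, size_t len) {
  char tmp[4096];
  const char *p = (const char *)buf;
  int fd;

  /* The temporary file lives next to the target so rename() stays on the
   * same filesystem. */
  if (snprintf(tmp, sizeof(tmp), "%s.tmp.%ld", path, (long)getpid()) >= (int)sizeof(tmp))
    return -1;

  fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return -1;

  while (len > 0) {
    ssize_t w = write(fd, p, len);
    if (w <= 0) { close(fd); unlink(tmp); return -1; }
    p += w;
    len -= (size_t)w;
  }

  /* Flush the data to stable storage before publishing the new name. */
  if (fsync(fd) != 0) { close(fd); unlink(tmp); return -1; }
  if (close(fd) != 0) { unlink(tmp); return -1; }

  /* rename() atomically replaces the target path. */
  if (rename(tmp, path) != 0) { unlink(tmp); return -1; }
  return 0;
}
```

A process that already has the old file mapped keeps reading the old inode until it unmaps it, so a reader never observes a half-written database.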

IvanNardi · 2026-04-16T12:14:00Z

Interesting stuff! This patch is quite big, so I might need some time to properly review it.
In the meantime, if possible:

• try to fix the compilation warnings that you can find in the CI logs
• we want to be able to compile the library without -lpthread (via --disable-global-context-support). In that case, we likely can't update the db at runtime, and all lock/unlock operations should be no-ops

While I understand that load time is lower with this change, I really would like to see some tests and numbers about runtime/lookup performance. Is that possible?

fabiodepin · 2026-04-16T12:33:05Z

Thanks for the feedback.

That makes sense.

I'll address the CI warnings first.

Regarding builds without pthread / --disable-global-context-support: I agree that runtime DB updates should not require pthread in that configuration. I'll adjust the implementation so that lock/unlock become no-ops there, and any runtime reload-specific path is either disabled or compiled out as needed, while keeping the basic .ndb loading path working when possible.
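One common way to get no-op locks in a build without pthread is a thin wrapper selected at compile time. The sketch below assumes a placeholder macro, `HYPOTHETICAL_USE_PTHREAD`, and hypothetical `ndb_lock_*` names; the macro nDPI's configure script actually defines, and the PR's real wrapper, may differ.

```c
#ifdef HYPOTHETICAL_USE_PTHREAD /* placeholder, not nDPI's real config macro */
#include <pthread.h>

typedef pthread_rwlock_t ndb_lock_t;

static int ndb_lock_init(ndb_lock_t *l)    { return pthread_rwlock_init(l, NULL); }
static int ndb_lock_read(ndb_lock_t *l)    { return pthread_rwlock_rdlock(l); }
static int ndb_lock_write(ndb_lock_t *l)   { return pthread_rwlock_wrlock(l); }
static int ndb_unlock(ndb_lock_t *l)       { return pthread_rwlock_unlock(l); }
static int ndb_lock_destroy(ndb_lock_t *l) { return pthread_rwlock_destroy(l); }

#else /* no pthread: every operation succeeds and does nothing */

typedef int ndb_lock_t; /* dummy type so structs keep the same field */

static int ndb_lock_init(ndb_lock_t *l)    { *l = 0; return 0; }
static int ndb_lock_read(ndb_lock_t *l)    { (void)l; return 0; }
static int ndb_lock_write(ndb_lock_t *l)   { (void)l; return 0; }
static int ndb_unlock(ndb_lock_t *l)       { (void)l; return 0; }
static int ndb_lock_destroy(ndb_lock_t *l) { (void)l; return 0; }

#endif
```

Call sites stay identical in both configurations; only the reload path, which needs real mutual exclusion, has to be disabled or compiled out when the no-op variant is selected.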

For performance, yes — I can add tests and numbers. I'll collect:

• load time
• memory usage
• runtime / lookup performance

I'll compare the legacy path (-G / existing structures) against the .ndb backend, and I can include both a real ndpiReader run and a smaller lookup-focused benchmark if useful.

fabiodepin · 2026-04-17T16:43:07Z

Thanks again for the review.

Quick update on the requested points:

CI warnings:
I went through the CI logs and fixed the compilation warnings that were reported.

Build without pthread (--disable-global-context-support):
This is now supported:
• the code builds without pthread
• all lock/unlock operations are implemented as no-ops in that mode
• runtime DB updates (hot reload) are effectively disabled when global context support is off

The basic .ndb loading path still works in this configuration.
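For context on what is switched off in that mode, a polling-based change check can be as small as comparing `stat()` results between polls. The sketch below is an assumption about how such a helper might look, not the PR's implementation; `struct file_stamp` and `file_changed()` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical snapshot of the attributes compared between two polls. */
struct file_stamp {
  dev_t dev;
  ino_t ino;
  off_t size;
  time_t mtime;
  int valid; /* 0 until the first successful stat() */
};

/* Returns 1 if `path` differs from the last snapshot (and updates it),
 * 0 if it is unchanged, -1 if stat() fails. The first call returns 1. */
static int file_changed(const char *path, struct file_stamp *last) {
  struct stat st;

  if (stat(path, &st) != 0) return -1;

  if (last->valid && last->dev == st.st_dev && last->ino == st.st_ino &&
      last->size == st.st_size && last->mtime == st.st_mtime)
    return 0;

  last->dev = st.st_dev;
  last->ino = st.st_ino;
  last->size = st.st_size;
  last->mtime = st.st_mtime;
  last->valid = 1;
  return 1;
}
```

Comparing the inode as well as size and mtime fits a writer that replaces the file via rename, because the replacement is a new inode even when the timestamp has one-second granularity.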

Runtime / lookup performance:
I’m currently working on a benchmark for this.

It measures:
• load time
• RSS / memory usage
• lookup performance (hostname + IPv4, hit/miss/LPM)
• basic latency percentiles (block-based)

Initial results (micro dataset) already show:
• much lower load time for .ndb
• significantly better hostname miss performance
• lookup performance overall comparable to legacy

I’m now extending this to larger datasets to provide more representative numbers. I’ll share detailed results shortly.

fabiodepin · 2026-04-18T12:35:15Z

Benchmark

I ran a dedicated benchmark (not included in this PR to keep it focused):
tests/performance/category_ndb_bench.c

It measures:

• load time
• RSS / memory usage
• lookup latency (hostname + IPv4, hit/miss/LPM)
• median and block percentiles
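Block-based percentiles are typically obtained by timing a batch of lookups at a time, since a single lookup of a few hundred nanoseconds is close to the cost of reading the clock. The sketch below shows one way to do that; it is not taken from category_ndb_bench.c, and `measure_blocks()`, `percentile()` and the stand-in workload `count_call()` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

typedef void (*bench_fn)(void *ctx);

/* Stand-in workload: a real benchmark would perform one lookup here. */
static void count_call(void *ctx) { (*(unsigned long *)ctx)++; }

static double now_ns(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

static int cmp_double(const void *a, const void *b) {
  double x = *(const double *)a, y = *(const double *)b;
  return (x > y) - (x < y);
}

/* Time n_blocks batches of block_size calls each; out_ns receives the
 * per-call average of every block, sorted in ascending order. */
static void measure_blocks(bench_fn fn, void *ctx, size_t n_blocks,
                           size_t block_size, double *out_ns) {
  size_t b, i;
  for (b = 0; b < n_blocks; b++) {
    double t0 = now_ns();
    for (i = 0; i < block_size; i++) fn(ctx);
    out_ns[b] = (now_ns() - t0) / (double)block_size;
  }
  qsort(out_ns, n_blocks, sizeof(out_ns[0]), cmp_double);
}

/* Nearest-rank percentile (pct in 0..100) of a sorted array. */
static double percentile(const double *sorted, size_t n, unsigned pct) {
  size_t rank;
  if (n == 0) return 0.0;
  rank = (n * pct + 99) / 100; /* ceil(n * pct / 100) in integers */
  if (rank == 0) rank = 1;
  if (rank > n) rank = n;
  return sorted[rank - 1];
}
```

With this scheme, `percentile(samples, n, 50)` is the median block and `percentile(samples, n, 99)` describes the slowest blocks rather than the slowest individual lookups, which is worth keeping in mind when reading the reported numbers.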

⸻

Key results

Large dataset (~7.5M hostnames, ~100k IPv4 rules)

.ndb file size: ~332 MB

Load time

.ndb: ~962 ms
legacy: ~3446 ms
→ ~3.5× faster

Memory (RSS after load)

.ndb: ~711 MB
legacy: ~1.82 GB
→ >2× reduction

Hostname lookup

hit:
- .ndb: ~181 ns
- legacy: ~167 ns
  → essentially on par (.ndb ~8% slower)
miss:
- .ndb: ~226 ns
- legacy: ~407 ns
  → ~1.8× faster

⸻

Interpretation

• Load time and memory usage are significantly improved with .ndb
• Hostname lookup remains competitive even at multi-million scale
• Hostname miss is consistently faster with .ndb

⸻

Current limitation

The benchmark also highlights the current weak point:

IPv4 lookup in .ndb is still implemented as a linear scan (O(N))

This dominates lookup cost for large IPv4 tables and will be addressed separately.
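To make the O(N) cost concrete, a linear longest-prefix-match scan has to visit every entry on every lookup, because a longer matching prefix may appear anywhere in the table. The sketch below illustrates that shape only; `struct ipv4_prefix` is a hypothetical in-memory entry (the real on-disk layout is defined in ndpi_categories_bin.h and is not shown in this thread), and addresses are assumed to be in host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefix entry, for illustration only. */
struct ipv4_prefix {
  uint32_t network;    /* host byte order */
  uint8_t prefix_len;  /* 0..32 */
  uint16_t category;
};

static uint32_t prefix_mask(uint8_t len) {
  /* A shift by 32 would be undefined, so /0 is handled separately. */
  return len == 0 ? 0u : (uint32_t)(0xFFFFFFFFu << (32 - len));
}

/* Linear longest-prefix match: scans all n entries and keeps the matching
 * entry with the longest prefix. Returns NULL if nothing matches. */
static const struct ipv4_prefix *lpm_linear(const struct ipv4_prefix *tbl,
                                            size_t n, uint32_t addr) {
  const struct ipv4_prefix *best = NULL;
  size_t i;

  for (i = 0; i < n; i++) {
    uint32_t mask;
    if (tbl[i].prefix_len > 32) continue; /* skip malformed entries */
    mask = prefix_mask(tbl[i].prefix_len);
    if ((addr & mask) == (tbl[i].network & mask) &&
        (best == NULL || tbl[i].prefix_len > best->prefix_len))
      best = &tbl[i];
  }
  return best;
}
```

With ~100k IPv4 rules, as in the large dataset above, every lookup performs on the order of 100k mask-and-compare steps, which is why this path dominates at scale.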

⸻

Summary

Even at large scale:

• .ndb significantly improves load time and memory footprint
• hostname lookup scales well and remains competitive
• the remaining bottleneck is the IPv4 lookup path

fabiodepin · 2026-04-18T12:55:45Z

Happy to add the benchmark to the PR if that would make review easier.

IvanNardi · 2026-04-20T14:25:24Z

> Load time
> • .ndb: ~962 ms
> • legacy: ~3446 ms
>   → ~3.5× faster
>
> Memory (RSS after load)
> • .ndb: ~711 MB
> • legacy: ~1.82 GB
>   → >2× reduction

Interesting numbers, but quite different from the ones reported in #3151:

> • ~100x reduction in memory usage
> • ~7x faster startup vs -G

IvanNardi · 2026-04-20T14:26:05Z

> Happy to add the benchmark to the PR if that would make review easier.

Yes, please. I would like to run some tests locally before reviewing this patch.

fabiodepin · 2026-04-20T16:37:14Z

Good point — the numbers differ because they come from two different kinds of measurements.

In #3151, the numbers are from a real ndpiReader run, which includes:

• full runtime setup
• text parsing / runtime structure build in the legacy path
• the full application environment

The benchmark I shared here isolates only the category load + lookup path, so it measures:

• mmap load vs legacy load
• lookup latency (hostname / IPv4)
• RSS after initialization

So they are complementary rather than directly equivalent:

• #3151 ("High memory usage and long startup time in ndpiReader with large category lists (-G) – proposal for binary .ndb backend") shows real ndpiReader behavior
• the benchmark here shows isolated load/lookup behavior

Regarding the benchmark: yes, I’ll add it to the PR so it can be run locally.

fabiodepin · 2026-04-20T23:21:35Z

I’ve just pushed the benchmark code to the PR:

• tests/performance/category_ndb_bench.c
• Makefile target in tests/performance

It’s a simple tool to compare .ndb vs legacy for:

• load time
• RSS
• hostname + IPv4 lookup latency

It supports micro/scale/stress profiles and load-only / lookup runs.

A couple of notes:

• large IPv4 tables will highlight the current O(N) lookup behavior in .ndb
• mixed_global currently reuses per-case pools and is not yet a true cross-API mixed loop

If you want to try it quickly (from repo root):

# Micro dataset: quick sanity check (.ndb vs legacy)
./tests/performance/category_ndb_bench --profile micro --backend both --mode fixed

# Micro dataset with mixed lookup patterns
./tests/performance/category_ndb_bench --profile micro --backend both --mode mixed_by_case

# Scale dataset: load / RSS comparison
./tests/performance/category_ndb_bench --profile scale --backend both --only-load

# Scale dataset: lookup comparison
./tests/performance/category_ndb_bench --profile scale --backend both --mode mixed_by_case

# Stress dataset: .ndb-only load path
./tests/performance/category_ndb_bench --profile stress --backend ndb --only-load

Let me know if you’d like any adjustments or additional scenarios.

sonarqubecloud · 2026-05-05T23:00:13Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Introduce a compiled .ndb backend (mmap) for external custom category matching, with LEGACY / NDB_ONLY / HYBRID modes, while keeping the existing -G (Aho-Corasick) path unchanged.

Add ndpi_load_category_ndb_file() / ndpi_unload_category_ndb(), CLI options (--category-ndb, --category-ndb-reload-interval), a polling-based hot-reload helper, and the offline builder ndpi_gen_categories_bin.

Implement shared hostname normalization (generator + runtime) and define the on-disk layout in ndpi_categories_bin.h (domains and IPv4/IPv6 prefix entries).

The generator writes the database atomically (temporary file + fsync + rename), allowing ndpiReader to reload a valid file without restart.

category_ndb: use no-op locks when global context support is disabled

…oad benchmarks

Add a dedicated benchmark tool, category_ndb_bench, to compare the compiled .ndb backend against the legacy custom-category path.

The benchmark measures:
- load time
- RSS / memory usage
- hostname and IPv4 lookup latency
- median and block percentiles

It supports synthetic micro/scale/stress profiles, temp or persisted .ndb generation, load-only / lookup-only modes, and explicit backend selection (ndb / legacy / both).

Also add guardrails and help text for large legacy runs:
- default legacy safety caps for hosts / IPv4 rules
- clear skip/error behavior depending on backend mode
- mixed_global caveat in output/help
- recommended commands in --help

sonarqubecloud · 2026-05-19T23:44:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

fabiodepin force-pushed the categories-ndb branch 5 times, most recently from 0ed3acd to 42fe9f9 Compare April 16, 2026 01:27

fabiodepin force-pushed the categories-ndb branch 4 times, most recently from 331dd37 to 68896c5 Compare April 17, 2026 16:11

fabiodepin force-pushed the categories-ndb branch from 68896c5 to 4bdb2d0 Compare April 20, 2026 23:07

fabiodepin force-pushed the categories-ndb branch from 4bdb2d0 to 108db01 Compare May 5, 2026 22:59

fabiodepin and others added 2 commits May 19, 2026 20:42

fabiodepin force-pushed the categories-ndb branch from 108db01 to 4a368ac Compare May 19, 2026 23:44

fabiodepin wants to merge 2 commits into ntop:dev from fabiodepin:categories-ndb
