Skip to content

Conversation

@digizeph
Copy link
Member

@digizeph digizeph commented Dec 5, 2025

This pull request introduces major enhancements and refactoring to the bgpkit-commons crate, focusing on historical RPKI data support, API improvements, and codebase consolidation. The most significant changes include adding RPKIviews as a historical RPKI data source with efficient streaming, unifying and expanding public API types, migrating external crates (as2org-rs and peeringdb-rs) into the codebase, and providing a new builder pattern for AS information loading. Comprehensive tests and new example programs demonstrate the improved functionality.

Historical RPKI Data Support

  • Added RPKIviews as a historical RPKI data source, including support for multiple collectors, new loading methods (RpkiTrie::from_rpkiviews, from_rpkiviews_file, from_rpkiviews_files), and file discovery (list_rpkiviews_files). Streaming optimizations allow early termination when extracting rpki-client.json from large .tgz archives, significantly reducing test completion time.
  • Updated RIPE NCC historical data loading to use the JSON format (output.json.xz) for consistency and richer data, enabled by default via the xz feature in oneio. [1] [2]

Public API and Builder Pattern

  • Added stable public API types for ROA and ASPA objects, and unified rpki-client JSON parsing logic for all sources.
  • Introduced a new AsInfoBuilder struct for ergonomic, fluent configuration of AS information loading, replacing confusing boolean parameters. Added corresponding methods to BgpkitCommons for builder creation and loading. [1] [2]

Crate Consolidation and Feature Flags

  • Migrated the CAIDA AS-to-Organization mapping (as2org-rs) and PeeringDB API client (peeringdb-rs) directly into the asinfo module, removing external dependencies and simplifying maintenance. Feature flags updated accordingly. [1] [2] [3]

New and Updated Examples

  • Added examples/rpki_historical.rs to demonstrate historical RPKI data loading and file listing. Updated examples/list_aspas.rs to count ASPA objects for the first day of each year (2020-2025) using both RIPE and RPKIviews sources. [1] [2] [3]

Testing and Utility Improvements

  • Added comprehensive unit and integration tests for the migrated as2org and peeringdb modules, and new helper methods for PeeringDB usability (len, is_empty, contains, get_all_asns).

- Added a new `AsInfoBuilder` with fluent API methods to load AS information using specific data sources.
- Introduced `asinfo_builder()` and `load_asinfo_with(builder)` methods for improved API ergonomics in `BgpkitCommons`.
- Retained `load_asinfo(bool, ...)` for backward compatibility.
- Exposed `PeeringdbData` for public access.
- Migrated `as2org-rs` crate into `bgpkit-commons` as `as2org` module:
  - Implemented AS-to-Organization mapping with `As2org` struct and core methods.
  - Removed external `as2org-rs` dependency from `Cargo.toml`.
- Migrated `peeringdb-rs` crate into `bgpkit-commons` as `peeringdb` module:
  - Integrated PeeringDB API client with `PeeringdbNet` struct and utility functions.
  - Removed external `peeringdb-rs` dependency from `Cargo.toml`.
- Updated `asinfo` feature flags to replace external dependencies with `regex`.
- Added comprehensive tests for `as2org` and `peeringdb` modules for reliability.
- Simplified maintenance by unifying all functionality in the shared repository.
optimization

This commit adds RPKIviews as an additional historical RPKI data source
alongside RIPE NCC archives, with efficient streaming extraction that
avoids downloading entire 300+ MB archives.

- New `RpkiViewsCollector` enum with four collectors:
  - Josephine (default) - A2B Internet, Amsterdam
  - Amber - Massar, Lugano
  - Dango - IIJ, Tokyo
  - Kerfuffle - Kerfuffle LLC, Fremont
- `RpkiTrie::from_rpkiviews(collector, date)` for loading historical
  data
- `list_rpkiviews_files(collector, date)` for discovering available
  archives
- `HistoricalRpkiSource` enum for explicitly selecting data source

- `rpki-client.json` is located at position 3-4 in RPKIviews archives
- Early termination after ~80MB instead of downloading full 300+ MB
- New helper functions:
  - `extract_file_from_tgz(url, target_path)` - Stream and extract
    specific file
  - `list_files_in_tgz(url, max_entries)` - List archive contents with
    limit
  - `tgz_contains_file(url, target_path)` - Check file existence
- Test completion time reduced from minutes to ~8 seconds

- New internal `rpki_client.rs` module with `RpkiClientData` struct
- Handles format variations across sources:
  - ASN formats: numeric (12345) vs string ("AS12345")
  - ASPA field names: `customer_asid` vs `customer`
  - Provider arrays: numbers vs strings
- Used by Cloudflare, RIPE historical, and RPKIviews sources

- New stable `Roa` struct: prefix, asn, max_length, not_before,
  not_after
- New stable `Aspa` struct: customer_asn, providers
- Internal rpki-client structs now `pub(crate)` only
- New `BgpkitCommons` methods:
  - `load_rpki_historical(source, date)`
  - `list_rpki_files(source, date)`
  - `load_rpki_from_files(urls, date)`

- Changed from CSV to `output.json.xz` format for consistency
- Provides richer data including expiry timestamps

- Added `reqwest` (blocking) for HTTP streaming
- Added `tar` crate for reading tar archives
- Enabled `xz` feature in `oneio` for .xz decompression

- New `examples/rpki_historical.rs` demonstrating historical data
  loading
- Updated `examples/list_aspas.rs` to count ASPA objects for 2020-2025

- Fixed clippy warnings (derivable_impls, dead_code)
- Added #[allow(dead_code)] for public API functions used by consumers

Files changed:
- Added: src/rpki/rpki_client.rs (371 lines)
- Added: src/rpki/rpkiviews.rs (667 lines)
- Added: examples/rpki_historical.rs (64 lines)
- Modified: src/rpki/mod.rs, src/rpki/cloudflare.rs,
  src/rpki/ripe_historical.rs
- Modified: src/lib.rs, Cargo.toml, CHANGELOG.md
- Modified: examples/list_aspas.rs
- Modified: src/asinfo/as2org.rs, src/asinfo/peeringdb.rs (clippy fixes)
Rename RPKIviews collectors to
SoborostNet/MassarsNet/AttnJp/KerfuffleNet. Add streaming tgz
extraction, unified rpki-client JSON parsing, public Roa and Aspa types,
historical loading/listing APIs, and update examples and CHANGELOG
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request introduces comprehensive enhancements to the bgpkit-commons crate, focusing on historical RPKI data support through RPKIviews, consolidation of external dependencies, and API improvements. The key additions include efficient streaming extraction from .tgz archives, unified rpki-client JSON parsing across data sources, migration of as2org-rs and peeringdb-rs functionality into the main codebase, and a new builder pattern for AS information loading.

Key Changes:

  • Added RPKIviews as a historical RPKI data source with streaming optimization that reduces download requirements from 300+ MB to ~80 MB
  • Unified RPKI data parsing with internal rpki_client module supporting Cloudflare, RIPE, and RPKIviews formats
  • Migrated as2org-rs and peeringdb-rs into asinfo module, removing external dependencies

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
src/rpki/rpkiviews.rs New module implementing RPKIviews data loading with streaming .tgz extraction
src/rpki/rpki_client.rs Unified rpki-client JSON parsing with robust deserializers for multiple formats
src/rpki/ripe_historical.rs Updated to use JSON format (output.json.xz) instead of CSV
src/rpki/mod.rs Enhanced public API with Roa/Aspa structs, HistoricalRpkiSource enum, and new loading methods
src/rpki/cloudflare.rs Simplified to use shared rpki_client parsing logic
src/asinfo/as2org.rs Complete migration of CAIDA AS-to-Organization functionality from external crate
src/asinfo/peeringdb.rs Complete migration of PeeringDB API client from external crate
src/asinfo/mod.rs New AsInfoBuilder pattern for ergonomic data source configuration
src/lib.rs New BgpkitCommons methods for historical RPKI loading and file listing
examples/rpki_historical.rs New example demonstrating historical RPKI data loading
examples/list_aspas.rs Updated to count ASPAs across years using both RIPE and RPKIviews
tests/*.rs Added feature flags for conditional compilation
Cargo.toml Updated dependencies: removed as2org-rs/peeringdb-rs, added reqwest/tar
CHANGELOG.md Comprehensive documentation of changes
Comments suppressed due to low confidence (1)

src/rpki/mod.rs:536

  • The reload() method only handles RIPE historical data when date is Some, but with the new RPKIviews support, users may have loaded from RPKIviews. Reloading will fall back to RIPE, which might not match the original source. Consider tracking the original source in RpkiTrie to reload from the same source.
    pub fn reload(&mut self) -> Result<()> {
        match self.date {
            Some(date) => {
                let trie = RpkiTrie::from_ripe_historical(date)?;
                self.trie = trie.trie;
                self.aspas = trie.aspas;
            }
            None => {
                let trie = RpkiTrie::from_cloudflare()?;
                self.trie = trie.trie;
                self.aspas = trie.aspas;
            }
        }

        Ok(())

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +166 to +177
let mut gunzip = Command::new("gunzip")
.arg("-c")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.map_err(|e| {
crate::BgpkitCommonsError::data_source_error(
"RPKIviews",
format!("Failed to spawn gunzip: {}", e),
)
})?;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code spawns external gunzip processes without any timeout or resource limits. A malicious or malformed .tgz file could cause the gunzip process to hang indefinitely or consume excessive resources. Consider adding a timeout mechanism or resource monitoring.

Copilot uses AI. Check for mistakes.
Comment on lines 274 to 276
// Don't wait for writer thread - it will terminate when pipe closes
// Just detach it
drop(writer_thread);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The writer thread is dropped without being joined, which could lead to incomplete writes or resource leaks. While there's a comment explaining this, it's a potential issue. Consider using drop(writer_thread.join()); to ensure clean shutdown, or at minimum document why fire-and-forget is acceptable here.

Suggested change
// Don't wait for writer thread - it will terminate when pipe closes
// Just detach it
drop(writer_thread);
// Wait for writer thread to finish to ensure clean shutdown
let _ = writer_thread.join();

Copilot uses AI. Check for mistakes.
}

should_stop.store(true, std::sync::atomic::Ordering::Relaxed);
drop(writer_thread);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thread management issue. The writer thread is dropped without joining, which could result in incomplete operations or resource leaks.

Suggested change
drop(writer_thread);
match writer_thread.join() {
Ok(Ok(())) => {},
Ok(Err(e)) => return Err(e),
Err(_) => {
return Err(crate::BgpkitCommonsError::data_source_error(
"RPKIviews",
"Writer thread panicked while extracting file from tgz",
));
}
}

Copilot uses AI. Check for mistakes.
// Backwards Compatibility Type Aliases
// ============================================================================

/// Type alias for backwards compatibility. Use [`Roa`] instead.
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The deprecated type alias RoaEntry has a deprecation notice, but there's no replacement guidance that points to the new Roa type. The notice says "Use Roa instead" which is good, but consider adding a version when this will be removed for better planning.

Suggested change
/// Type alias for backwards compatibility. Use [`Roa`] instead.
/// Type alias for backwards compatibility. Use [`Roa`] instead.
/// Deprecated since 0.10.0. This alias will be removed in version 0.12.0.

Copilot uses AI. Check for mistakes.
/// Load RPKI data from RPKIviews for a specific date.
///
/// This will use the first (earliest) available file for the given date from the specified collector.
/// By default, uses the Kerfuffle collector.
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment on line 531 says "By default, uses the Kerfuffle collector" but this is misleading - the function takes a collector parameter and doesn't use any default. The default is actually SoborostNet as defined in the Default impl for RpkiViewsCollector. Update the documentation to clarify that the caller specifies the collector.

Suggested change
/// By default, uses the Kerfuffle collector.
/// The caller must specify the collector to use.

Copilot uses AI. Check for mistakes.
Comment on lines 265 to 272
should_stop.store(true, std::sync::atomic::Ordering::Relaxed);
break;
}
}
}

// Signal writer to stop (in case we finished iterating)
should_stop.store(true, std::sync::atomic::Ordering::Relaxed);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The atomic flag should_stop uses Ordering::Relaxed which provides no synchronization guarantees. While this may work in practice for this use case (a simple flag), consider using Ordering::SeqCst or at least Ordering::Release/Ordering::Acquire pairs for clearer memory ordering semantics.

Suggested change
should_stop.store(true, std::sync::atomic::Ordering::Relaxed);
break;
}
}
}
// Signal writer to stop (in case we finished iterating)
should_stop.store(true, std::sync::atomic::Ordering::Relaxed);
should_stop.store(true, std::sync::atomic::Ordering::SeqCst);
break;
}
}
}
// Signal writer to stop (in case we finished iterating)
should_stop.store(true, std::sync::atomic::Ordering::SeqCst);

Copilot uses AI. Check for mistakes.
}

should_stop.store(true, std::sync::atomic::Ordering::Relaxed);
drop(writer_thread);
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same pattern of dropping the writer thread without joining appears here as well. This could leave threads running or cause resource leaks. Consider joining the thread or documenting why this is safe.

Suggested change
drop(writer_thread);
let _ = writer_thread.join();

Copilot uses AI. Check for mistakes.
Comment on lines +166 to +177
let mut gunzip = Command::new("gunzip")
.arg("-c")
.stdin(Stdio::piped())
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.map_err(|e| {
crate::BgpkitCommonsError::data_source_error(
"RPKIviews",
format!("Failed to spawn gunzip: {}", e),
)
})?;
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gunzip process is spawned without checking if the gunzip command is available. If gunzip is not installed or not in PATH, this will fail with a potentially confusing error. Consider adding a check or documenting this system dependency in the module documentation.

Copilot uses AI. Check for mistakes.
src/rpki/mod.rs Outdated
}

#[test]
fn test_validate_check_expiry() {
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The test name test_validate_check_expiry is very generic. Consider renaming it to something more descriptive like test_validate_check_expiry_with_time_constraints to better indicate what scenarios are being tested.

Suggested change
fn test_validate_check_expiry() {
fn test_validate_check_expiry_with_time_constraints() {

Copilot uses AI. Check for mistakes.
Ok(0) => break,
Ok(n) => n,
Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
Err(_) => break, // Connection closed or error, stop gracefully
Copy link

Copilot AI Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the writer thread error handling on line 220, network errors during HTTP streaming are silently swallowed with Err(_) => break. This could hide real errors (like corrupted data or unexpected disconnections) from the user. Consider logging the error before breaking or distinguishing between expected EOF and unexpected errors.

Copilot uses AI. Check for mistakes.
Use SeqCst ordering for should_stop/should_stop_writer, log network read
errors instead of silently breaking, and join the writer thread to
ensure a clean shutdown. Add module documentation noting the `gunzip`
requirement for RPKIviews. Add a deprecation note to the RoaEntry alias,
rename/update related tests, and clarify collector usage in the
from_rpkiviews docs and test.
@digizeph digizeph merged commit 10e169f into main Dec 5, 2025
1 check passed
@digizeph digizeph deleted the revisions branch December 5, 2025 05:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants