-
Notifications
You must be signed in to change notification settings - Fork 1
Crate consolidation, RPKIviews support, refactors #27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Added a new `AsInfoBuilder` with fluent API methods to load AS information using specific data sources. - Introduced `asinfo_builder()` and `load_asinfo_with(builder)` methods for improved API ergonomics in `BgpkitCommons`. - Retained `load_asinfo(bool, ...)` for backward compatibility. - Exposed `PeeringdbData` for public access.
- Migrated `as2org-rs` crate into `bgpkit-commons` as `as2org` module: - Implemented AS-to-Organization mapping with `As2org` struct and core methods. - Removed external `as2org-rs` dependency from `Cargo.toml`. - Migrated `peeringdb-rs` crate into `bgpkit-commons` as `peeringdb` module: - Integrated PeeringDB API client with `PeeringdbNet` struct and utility functions. - Removed external `peeringdb-rs` dependency from `Cargo.toml`. - Updated `asinfo` feature flags to replace external dependencies with `regex`. - Added comprehensive tests for `as2org` and `peeringdb` modules for reliability. - Simplified maintenance by unifying all functionality in the shared repository.
optimization
This commit adds RPKIviews as an additional historical RPKI data source
alongside RIPE NCC archives, with efficient streaming extraction that
avoids downloading entire 300+ MB archives.
- New `RpkiViewsCollector` enum with four collectors:
- Josephine (default) - A2B Internet, Amsterdam
- Amber - Massar, Lugano
- Dango - IIJ, Tokyo
- Kerfuffle - Kerfuffle LLC, Fremont
- `RpkiTrie::from_rpkiviews(collector, date)` for loading historical
data
- `list_rpkiviews_files(collector, date)` for discovering available
archives
- `HistoricalRpkiSource` enum for explicitly selecting data source
- `rpki-client.json` is located at position 3-4 in RPKIviews archives
- Early termination after ~80MB instead of downloading full 300+ MB
- New helper functions:
- `extract_file_from_tgz(url, target_path)` - Stream and extract
specific file
- `list_files_in_tgz(url, max_entries)` - List archive contents with
limit
- `tgz_contains_file(url, target_path)` - Check file existence
- Test completion time reduced from minutes to ~8 seconds
- New internal `rpki_client.rs` module with `RpkiClientData` struct
- Handles format variations across sources:
- ASN formats: numeric (12345) vs string ("AS12345")
- ASPA field names: `customer_asid` vs `customer`
- Provider arrays: numbers vs strings
- Used by Cloudflare, RIPE historical, and RPKIviews sources
- New stable `Roa` struct: prefix, asn, max_length, not_before,
not_after
- New stable `Aspa` struct: customer_asn, providers
- Internal rpki-client structs now `pub(crate)` only
- New `BgpkitCommons` methods:
- `load_rpki_historical(source, date)`
- `list_rpki_files(source, date)`
- `load_rpki_from_files(urls, date)`
- Changed from CSV to `output.json.xz` format for consistency
- Provides richer data including expiry timestamps
- Added `reqwest` (blocking) for HTTP streaming
- Added `tar` crate for reading tar archives
- Enabled `xz` feature in `oneio` for .xz decompression
- New `examples/rpki_historical.rs` demonstrating historical data
loading
- Updated `examples/list_aspas.rs` to count ASPA objects for 2020-2025
- Fixed clippy warnings (derivable_impls, dead_code)
- Added #[allow(dead_code)] for public API functions used by consumers
Files changed:
- Added: src/rpki/rpki_client.rs (371 lines)
- Added: src/rpki/rpkiviews.rs (667 lines)
- Added: examples/rpki_historical.rs (64 lines)
- Modified: src/rpki/mod.rs, src/rpki/cloudflare.rs,
src/rpki/ripe_historical.rs
- Modified: src/lib.rs, Cargo.toml, CHANGELOG.md
- Modified: examples/list_aspas.rs
- Modified: src/asinfo/as2org.rs, src/asinfo/peeringdb.rs (clippy fixes)
Rename RPKIviews collectors to SoborostNet/MassarsNet/AttnJp/KerfuffleNet. Add streaming tgz extraction, unified rpki-client JSON parsing, public Roa and Aspa types, historical loading/listing APIs, and update examples and CHANGELOG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request introduces comprehensive enhancements to the bgpkit-commons crate, focusing on historical RPKI data support through RPKIviews, consolidation of external dependencies, and API improvements. The key additions include efficient streaming extraction from .tgz archives, unified rpki-client JSON parsing across data sources, migration of as2org-rs and peeringdb-rs functionality into the main codebase, and a new builder pattern for AS information loading.
Key Changes:
- Added RPKIviews as a historical RPKI data source with streaming optimization that reduces download requirements from 300+ MB to ~80 MB
- Unified RPKI data parsing with internal
rpki_clientmodule supporting Cloudflare, RIPE, and RPKIviews formats - Migrated
as2org-rsandpeeringdb-rsintoasinfomodule, removing external dependencies
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| src/rpki/rpkiviews.rs | New module implementing RPKIviews data loading with streaming .tgz extraction |
| src/rpki/rpki_client.rs | Unified rpki-client JSON parsing with robust deserializers for multiple formats |
| src/rpki/ripe_historical.rs | Updated to use JSON format (output.json.xz) instead of CSV |
| src/rpki/mod.rs | Enhanced public API with Roa/Aspa structs, HistoricalRpkiSource enum, and new loading methods |
| src/rpki/cloudflare.rs | Simplified to use shared rpki_client parsing logic |
| src/asinfo/as2org.rs | Complete migration of CAIDA AS-to-Organization functionality from external crate |
| src/asinfo/peeringdb.rs | Complete migration of PeeringDB API client from external crate |
| src/asinfo/mod.rs | New AsInfoBuilder pattern for ergonomic data source configuration |
| src/lib.rs | New BgpkitCommons methods for historical RPKI loading and file listing |
| examples/rpki_historical.rs | New example demonstrating historical RPKI data loading |
| examples/list_aspas.rs | Updated to count ASPAs across years using both RIPE and RPKIviews |
| tests/*.rs | Added feature flags for conditional compilation |
| Cargo.toml | Updated dependencies: removed as2org-rs/peeringdb-rs, added reqwest/tar |
| CHANGELOG.md | Comprehensive documentation of changes |
Comments suppressed due to low confidence (1)
src/rpki/mod.rs:536
- The
reload()method only handles RIPE historical data whendateisSome, but with the new RPKIviews support, users may have loaded from RPKIviews. Reloading will fall back to RIPE, which might not match the original source. Consider tracking the original source inRpkiTrieto reload from the same source.
pub fn reload(&mut self) -> Result<()> {
match self.date {
Some(date) => {
let trie = RpkiTrie::from_ripe_historical(date)?;
self.trie = trie.trie;
self.aspas = trie.aspas;
}
None => {
let trie = RpkiTrie::from_cloudflare()?;
self.trie = trie.trie;
self.aspas = trie.aspas;
}
}
Ok(())
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let mut gunzip = Command::new("gunzip") | ||
| .arg("-c") | ||
| .stdin(Stdio::piped()) | ||
| .stdout(Stdio::piped()) | ||
| .stderr(Stdio::piped()) | ||
| .spawn() | ||
| .map_err(|e| { | ||
| crate::BgpkitCommonsError::data_source_error( | ||
| "RPKIviews", | ||
| format!("Failed to spawn gunzip: {}", e), | ||
| ) | ||
| })?; |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code spawns external gunzip processes without any timeout or resource limits. A malicious or malformed .tgz file could cause the gunzip process to hang indefinitely or consume excessive resources. Consider adding a timeout mechanism or resource monitoring.
src/rpki/rpkiviews.rs
Outdated
| // Don't wait for writer thread - it will terminate when pipe closes | ||
| // Just detach it | ||
| drop(writer_thread); |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The writer thread is dropped without being joined, which could lead to incomplete writes or resource leaks. While there's a comment explaining this, it's a potential issue. Consider using drop(writer_thread.join()); to ensure clean shutdown, or at minimum document why fire-and-forget is acceptable here.
| // Don't wait for writer thread - it will terminate when pipe closes | |
| // Just detach it | |
| drop(writer_thread); | |
| // Wait for writer thread to finish to ensure clean shutdown | |
| let _ = writer_thread.join(); |
src/rpki/rpkiviews.rs
Outdated
| } | ||
|
|
||
| should_stop.store(true, std::sync::atomic::Ordering::Relaxed); | ||
| drop(writer_thread); |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thread management issue. The writer thread is dropped without joining, which could result in incomplete operations or resource leaks.
| drop(writer_thread); | |
| match writer_thread.join() { | |
| Ok(Ok(())) => {}, | |
| Ok(Err(e)) => return Err(e), | |
| Err(_) => { | |
| return Err(crate::BgpkitCommonsError::data_source_error( | |
| "RPKIviews", | |
| "Writer thread panicked while extracting file from tgz", | |
| )); | |
| } | |
| } |
| // Backwards Compatibility Type Aliases | ||
| // ============================================================================ | ||
|
|
||
| /// Type alias for backwards compatibility. Use [`Roa`] instead. |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The deprecated type alias RoaEntry has a deprecation notice, but there's no replacement guidance that points to the new Roa type. The notice says "Use Roa instead" which is good, but consider adding a version when this will be removed for better planning.
| /// Type alias for backwards compatibility. Use [`Roa`] instead. | |
| /// Type alias for backwards compatibility. Use [`Roa`] instead. | |
| /// Deprecated since 0.10.0. This alias will be removed in version 0.12.0. |
src/rpki/rpkiviews.rs
Outdated
| /// Load RPKI data from RPKIviews for a specific date. | ||
| /// | ||
| /// This will use the first (earliest) available file for the given date from the specified collector. | ||
| /// By default, uses the Kerfuffle collector. |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment on line 531 says "By default, uses the Kerfuffle collector" but this is misleading - the function takes a collector parameter and doesn't use any default. The default is actually SoborostNet as defined in the Default impl for RpkiViewsCollector. Update the documentation to clarify that the caller specifies the collector.
| /// By default, uses the Kerfuffle collector. | |
| /// The caller must specify the collector to use. |
src/rpki/rpkiviews.rs
Outdated
| should_stop.store(true, std::sync::atomic::Ordering::Relaxed); | ||
| break; | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Signal writer to stop (in case we finished iterating) | ||
| should_stop.store(true, std::sync::atomic::Ordering::Relaxed); |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The atomic flag should_stop uses Ordering::Relaxed which provides no synchronization guarantees. While this may work in practice for this use case (a simple flag), consider using Ordering::SeqCst or at least Ordering::Release/Ordering::Acquire pairs for clearer memory ordering semantics.
| should_stop.store(true, std::sync::atomic::Ordering::Relaxed); | |
| break; | |
| } | |
| } | |
| } | |
| // Signal writer to stop (in case we finished iterating) | |
| should_stop.store(true, std::sync::atomic::Ordering::Relaxed); | |
| should_stop.store(true, std::sync::atomic::Ordering::SeqCst); | |
| break; | |
| } | |
| } | |
| } | |
| // Signal writer to stop (in case we finished iterating) | |
| should_stop.store(true, std::sync::atomic::Ordering::SeqCst); |
src/rpki/rpkiviews.rs
Outdated
| } | ||
|
|
||
| should_stop.store(true, std::sync::atomic::Ordering::Relaxed); | ||
| drop(writer_thread); |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same pattern of dropping the writer thread without joining appears here as well. This could leave threads running or cause resource leaks. Consider joining the thread or documenting why this is safe.
| drop(writer_thread); | |
| let _ = writer_thread.join(); |
| let mut gunzip = Command::new("gunzip") | ||
| .arg("-c") | ||
| .stdin(Stdio::piped()) | ||
| .stdout(Stdio::piped()) | ||
| .stderr(Stdio::piped()) | ||
| .spawn() | ||
| .map_err(|e| { | ||
| crate::BgpkitCommonsError::data_source_error( | ||
| "RPKIviews", | ||
| format!("Failed to spawn gunzip: {}", e), | ||
| ) | ||
| })?; |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gunzip process is spawned without checking if the gunzip command is available. If gunzip is not installed or not in PATH, this will fail with a potentially confusing error. Consider adding a check or documenting this system dependency in the module documentation.
src/rpki/mod.rs
Outdated
| } | ||
|
|
||
| #[test] | ||
| fn test_validate_check_expiry() { |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The test name test_validate_check_expiry is very generic. Consider renaming it to something more descriptive like test_validate_check_expiry_with_time_constraints to better indicate what scenarios are being tested.
| fn test_validate_check_expiry() { | |
| fn test_validate_check_expiry_with_time_constraints() { |
src/rpki/rpkiviews.rs
Outdated
| Ok(0) => break, | ||
| Ok(n) => n, | ||
| Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue, | ||
| Err(_) => break, // Connection closed or error, stop gracefully |
Copilot
AI
Dec 5, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the writer thread error handling on line 220, network errors during HTTP streaming are silently swallowed with Err(_) => break. This could hide real errors (like corrupted data or unexpected disconnections) from the user. Consider logging the error before breaking or distinguishing between expected EOF and unexpected errors.
Use SeqCst ordering for should_stop/should_stop_writer, log network read errors instead of silently breaking, and join the writer thread to ensure a clean shutdown. Add module documentation noting the `gunzip` requirement for RPKIviews. Add a deprecation note to the RoaEntry alias, rename/update related tests, and clarify collector usage in the from_rpkiviews docs and test.
This pull request introduces major enhancements and refactoring to the
bgpkit-commonscrate, focusing on historical RPKI data support, API improvements, and codebase consolidation. The most significant changes include adding RPKIviews as a historical RPKI data source with efficient streaming, unifying and expanding public API types, migrating external crates (as2org-rsandpeeringdb-rs) into the codebase, and providing a new builder pattern for AS information loading. Comprehensive tests and new example programs demonstrate the improved functionality.Historical RPKI Data Support
RpkiTrie::from_rpkiviews,from_rpkiviews_file,from_rpkiviews_files), and file discovery (list_rpkiviews_files). Streaming optimizations allow early termination when extractingrpki-client.jsonfrom large.tgzarchives, significantly reducing test completion time.output.json.xz) for consistency and richer data, enabled by default via thexzfeature inoneio. [1] [2]Public API and Builder Pattern
AsInfoBuilderstruct for ergonomic, fluent configuration of AS information loading, replacing confusing boolean parameters. Added corresponding methods toBgpkitCommonsfor builder creation and loading. [1] [2]Crate Consolidation and Feature Flags
as2org-rs) and PeeringDB API client (peeringdb-rs) directly into theasinfomodule, removing external dependencies and simplifying maintenance. Feature flags updated accordingly. [1] [2] [3]New and Updated Examples
examples/rpki_historical.rsto demonstrate historical RPKI data loading and file listing. Updatedexamples/list_aspas.rsto count ASPA objects for the first day of each year (2020-2025) using both RIPE and RPKIviews sources. [1] [2] [3]Testing and Utility Improvements
as2organdpeeringdbmodules, and new helper methods for PeeringDB usability (len,is_empty,contains,get_all_asns).