fix(ghidra): implement flow-insensitive block discovery#2990
fix(ghidra): implement flow-insensitive block discovery#2990sashwathsubra wants to merge 3 commits intomandiant:masterfrom
Conversation
…nction truncation
There was a problem hiding this comment.
Code Review
This pull request introduces several bug fixes and performance optimizations, most notably a flow-insensitive block iteration for Ghidra to prevent function truncation and various safety checks in Ghidra extractors. However, the newly introduced _RuleFeatureIndex contains a critical bug that breaks Substring and Regex feature matching and causes a performance regression by re-indexing rules during every match call. A copy-paste error was also identified in the changelog.
| # inspect(match_details) | ||
| # | ||
| # aliased here so that the type can be documented and xref'd. | ||
| class _RuleFeatureIndex: |
There was a problem hiding this comment.
The _RuleFeatureIndex implementation has a critical correctness bug: it breaks rules that use Substring or Regex features. These features match against String features in the FeatureSet via partial or pattern matching. However, get_candidates performs an exact lookup (feature in self.features). Since the extracted features in the FeatureSet are String objects and the indexed features are Substring/Regex objects, they will never match exactly, causing these rules to be incorrectly filtered out and never evaluated.
| index = _RuleFeatureIndex(rules) | ||
| candidates = index.get_candidates(features) |
There was a problem hiding this comment.
Instantiating _RuleFeatureIndex(rules) inside the match function introduces a significant performance regression. The match function is called for every scope (file, function, basic block) during analysis. Re-indexing the entire rule set on every call is computationally expensive ($O(Rules \times Features)$) and likely far outweighs the benefit of skipping rule.evaluate calls, especially since evaluate is already optimized for short-circuiting. The index should be constructed once and reused, or this change should be reverted.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
@sashwathsubra This pull request contains unrelated changes to the rules engine and cache. It also contains unrelated formatting changes. Please remove all unrelated changes and post a screenshot of all tests passing locally before we give this a review. |
This is a focused PR to fix the Ghidra function truncation bug.
Related Issue
Fixes #2989