
fix(ghidra): implement flow-insensitive block discovery #2990

Open
sashwathsubra wants to merge 3 commits into mandiant:master from sashwathsubra:fix/ghidra-function-truncation


@sashwathsubra

This is a focused PR to fix the Ghidra function truncation bug.
Related Issue
Fixes #2989


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces several bug fixes and performance optimizations, most notably a flow-insensitive block iteration for Ghidra to prevent function truncation and various safety checks in Ghidra extractors. However, the newly introduced _RuleFeatureIndex contains a critical bug that breaks Substring and Regex feature matching and causes a performance regression by re-indexing rules during every match call. A copy-paste error was also identified in the changelog.
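A minimal sketch of why flow-insensitive block discovery can avoid truncation, assuming the fix enumerates every basic block inside the function's body instead of walking control-flow edges from the entry point. The toy CFG `FUNCTION_BODY` and both helper functions are hypothetical and use no Ghidra API; the "unresolved indirect jump" is just one illustrative way a block can lose its incoming edge.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy CFG: block start address -> successor block addresses.
# Block 0x30 belongs to the function body but has no recorded incoming edge
# (e.g. it is only reached through an indirect jump the disassembler could not resolve).
FUNCTION_BODY: Dict[int, List[int]] = {
    0x10: [0x20],
    0x20: [],  # ends in an unresolved indirect jump, so no outgoing edge is known
    0x30: [0x40],
    0x40: [],
}


def blocks_by_flow(body: Dict[int, List[int]], entry: int) -> Set[int]:
    """Flow-sensitive: only blocks reachable from the entry via known edges."""
    seen: Set[int] = set()
    work = deque([entry])
    while work:
        addr = work.popleft()
        if addr in seen or addr not in body:
            continue
        seen.add(addr)
        work.extend(body[addr])
    return seen


def blocks_flow_insensitive(body: Dict[int, List[int]]) -> Set[int]:
    """Flow-insensitive: every block whose address lies in the function body."""
    return set(body)


print(sorted(hex(a) for a in blocks_by_flow(FUNCTION_BODY, 0x10)))
print(sorted(hex(a) for a in blocks_flow_insensitive(FUNCTION_BODY)))
```

Here the flow-following walk stops at `0x20` and reports a truncated function, while enumerating the body returns all four blocks.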

# inspect(match_details)
#
# aliased here so that the type can be documented and xref'd.
class _RuleFeatureIndex:

critical

The _RuleFeatureIndex implementation has a critical correctness bug: it breaks rules that use Substring or Regex features. These features match against String features in the FeatureSet via partial or pattern matching. However, get_candidates performs an exact lookup (feature in self.features). Since the extracted features in the FeatureSet are String objects and the indexed features are Substring/Regex objects, they will never match exactly, causing these rules to be incorrectly filtered out and never evaluated.
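A minimal sketch of the failure mode and one conservative fix. It uses hypothetical `String`, `Substring`, and `Regex` dataclasses rather than capa's real feature classes, and a hypothetical `RuleFeatureIndex` that is not the PR's `_RuleFeatureIndex`. Rules that use pattern features are placed in an always-check set, so an exact-lookup index can no longer filter them out.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Set, Union


# Hypothetical stand-ins for capa's feature classes (illustration only).
@dataclass(frozen=True)
class String:
    value: str


@dataclass(frozen=True)
class Substring:
    value: str

    def matches(self, s: String) -> bool:
        # partial match: the substring appears anywhere in the extracted string
        return self.value in s.value


@dataclass(frozen=True)
class Regex:
    pattern: str

    def matches(self, s: String) -> bool:
        # pattern match against the extracted string
        return re.search(self.pattern, s.value) is not None


Feature = Union[String, Substring, Regex]


class RuleFeatureIndex:
    """Hypothetical index: exact features are looked up by equality,
    while rules using Substring/Regex are never pruned."""

    def __init__(self, rules: Dict[str, List[Feature]]):
        self.by_feature: Dict[Feature, Set[str]] = {}
        self.always_check: Set[str] = set()
        for name, features in rules.items():
            for f in features:
                if isinstance(f, (Substring, Regex)):
                    # no extracted feature will ever equal f, so keep the rule as a candidate
                    self.always_check.add(name)
                else:
                    self.by_feature.setdefault(f, set()).add(name)

    def get_candidates(self, features: Set[Feature]) -> Set[str]:
        candidates = set(self.always_check)
        for f in features:
            candidates |= self.by_feature.get(f, set())
        return candidates


extracted = String("powershell -enc AAAA")
# An exact lookup misses the Substring feature even though it matches semantically.
print(extracted in {Substring("powershell")}, Substring("powershell").matches(extracted))

index = RuleFeatureIndex({
    "exact": [String("cmd.exe")],
    "substr": [Substring("powershell")],
    "regex": [Regex(r"https?://")],
})
print(sorted(index.get_candidates({extracted})))
```

Keeping such rules unconditionally trades some pruning for correctness; a tighter variant could test each extracted `String` against the indexed patterns inside `get_candidates`.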

Comment on lines +336 to +337
index = _RuleFeatureIndex(rules)
candidates = index.get_candidates(features)

high

Instantiating _RuleFeatureIndex(rules) inside the match function introduces a significant performance regression. The match function is called for every scope (file, function, basic block) during analysis. Re-indexing the entire rule set on every call is computationally expensive ($O(Rules \times Features)$) and likely far outweighs the benefit of skipping rule.evaluate calls, especially since evaluate is already optimized for short-circuiting. The index should be constructed once and reused, or this change should be reverted.
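A minimal sketch of the suggested alternative: build the index once and pass it into every `match` call. Rules are modeled hypothetically as sets of required exact features (all must be present), and the `match` signature here is illustrative, not capa's real API. Where capa would actually hold the index (for example, next to the loaded rule set) is an assumption.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical simplified model: a rule is the set of features it requires (logical AND).
Rules = Dict[str, FrozenSet[str]]


class RuleFeatureIndex:
    """Hypothetical index mapping each feature to the rules that mention it."""

    build_count = 0  # counts constructions, for the demo only

    def __init__(self, rules: Rules):
        RuleFeatureIndex.build_count += 1
        self.by_feature: Dict[str, Set[str]] = {}
        for name, feats in rules.items():
            for f in feats:
                self.by_feature.setdefault(f, set()).add(name)

    def get_candidates(self, features: Set[str]) -> Set[str]:
        out: Set[str] = set()
        for f in features:
            out |= self.by_feature.get(f, set())
        return out


def match(rules: Rules, index: RuleFeatureIndex, features: Set[str]) -> List[str]:
    """Evaluate only candidate rules; the index is passed in, not rebuilt per call."""
    return sorted(
        name for name in index.get_candidates(features) if rules[name] <= features
    )


rules: Rules = {
    "create process": frozenset({"api:CreateProcessA"}),
    "write file": frozenset({"api:CreateFileA", "api:WriteFile"}),
}
index = RuleFeatureIndex(rules)  # built once, before any scope is matched

scopes = [
    {"api:CreateProcessA"},
    {"api:CreateFileA"},
    {"api:CreateFileA", "api:WriteFile", "mnemonic:xor"},
]
results = [match(rules, index, feats) for feats in scopes]
print(results, RuleFeatureIndex.build_count)
```

The build cost is paid once per rule set instead of once per scope, so each `match` call only does work roughly proportional to the extracted features plus the candidate rules it evaluates.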

sashwathsubra and others added 2 commits April 3, 2026 19:25
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@mike-hunhoff
Collaborator

@sashwathsubra This pull request contains unrelated changes to the rules engine and cache. It also contains unrelated formatting changes. Please remove all unrelated changes and post a screenshot of all tests passing locally before we give this a review.



Development

Successfully merging this pull request may close these issues.

bug(ghidra): function truncation during analysis

3 participants