fix(conf): fix meta-externalagent UA case and use ASN verification #17
Open
adri wants to merge 1 commit into cnlangzi:main
The actual User-Agent string is lowercase "meta-externalagent", not "Meta-ExternalAgent". Since UA matching is case-sensitive, the bot was not being detected. Also switch from RDNS to ASN 32934 (Meta) verification for faster and more reliable detection. Ref: https://developers.facebook.com/docs/sharing/webmasters/crawler
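To illustrate why the case mismatch prevented detection, here is a minimal sketch that assumes the matcher performs a plain case-sensitive substring check. The function name `ua_matches` and the sample header value are hypothetical and are not taken from this repository.

```python
def ua_matches(user_agent: str, pattern: str) -> bool:
    """Case-sensitive substring match (assumed matcher behavior)."""
    return pattern in user_agent


# Illustrative header value using the lowercase token the PR describes.
sample_ua = (
    "meta-externalagent/1.1 "
    "(+https://developers.facebook.com/docs/sharing/webmasters/crawler)"
)

old_pattern = "Meta-ExternalAgent"  # mixed-case value the PR replaces
new_pattern = "meta-externalagent"  # lowercase value introduced by this PR

print(ua_matches(sample_ua, old_pattern))  # False: case differs, so no match
print(ua_matches(sample_ua, new_pattern))  # True: exact-case token is found
```

Under this assumption the old pattern can never match a request from the crawler, which is consistent with the bot going undetected before the fix.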
Reviewer's Guide
Updates the Meta AI/Facebook crawler bot definition to match the actual lowercase User-Agent string and switches verification from reverse DNS and domain list to ASN-based verification for Meta’s ASN 32934.
Sequence diagram for updated Meta crawler detection (UA + ASN)
sequenceDiagram
actor MetaCrawler
participant WebServer
participant BotDetector
participant ConfigMetaExternalagent
MetaCrawler->>WebServer: HTTP GET /resource
WebServer->>BotDetector: Request with headers and client_ip
BotDetector->>ConfigMetaExternalagent: Load kind AITraining name meta-externalagent
ConfigMetaExternalagent-->>BotDetector: ua meta-externalagent, asn 32934
BotDetector->>BotDetector: Compare UserAgent == meta-externalagent
alt UserAgent matches
BotDetector->>ASNService: Lookup ASN for client_ip
ASNService-->>BotDetector: ASN result
alt ASN == 32934
BotDetector-->>WebServer: Mark as verified Meta crawler
else ASN != 32934
BotDetector-->>WebServer: Not verified Meta crawler
end
else UserAgent does not match
BotDetector-->>WebServer: Not Meta crawler
end
WebServer-->>MetaCrawler: HTTP response
Flow diagram for new Meta crawler verification logic
flowchart TD
A[Start request] --> B[Read UserAgent and client_ip]
B --> C{UserAgent == meta-externalagent?}
C -- No --> D[Treat as normal traffic]
C -- Yes --> E[Lookup ASN for client_ip]
E --> F{ASN == 32934?}
F -- Yes --> G[Classify as verified Meta AI/Facebook crawler]
F -- No --> H[Do not classify as Meta crawler]
D --> I[Continue standard handling]
G --> I
H --> I
I --> J[End]
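The two diagrams above can be sketched in code as follows. This is only an illustration of the decision flow, not the project's implementation: `lookup_asn`, `classify`, and the small prefix table are hypothetical, the table uses documentation-only IP ranges and a documentation-only second ASN (64500), and the User-Agent check is assumed to be a case-sensitive substring match. Only the token `meta-externalagent` and ASN 32934 come from the PR.

```python
import ipaddress
from typing import Optional

META_ASN = 32934  # Meta's ASN, as stated in the PR description
META_UA_TOKEN = "meta-externalagent"

# Hypothetical prefix-to-ASN table built from documentation ranges.
# A real deployment would query a full IP-to-ASN database instead.
ASN_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): META_ASN,
    ipaddress.ip_network("198.51.100.0/24"): 64500,
}


def lookup_asn(client_ip: str) -> Optional[int]:
    """Return the ASN announcing client_ip, or None if it is unknown."""
    addr = ipaddress.ip_address(client_ip)
    for network, asn in ASN_TABLE.items():
        if addr in network:
            return asn
    return None


def classify(user_agent: str, client_ip: str) -> str:
    """Follow the flowchart: check the UA first, then the ASN."""
    if META_UA_TOKEN not in user_agent:
        return "not-meta"    # treated as normal traffic
    if lookup_asn(client_ip) == META_ASN:
        return "verified"    # UA and ASN both match
    return "unverified"      # UA claims Meta, but the IP is outside AS32934


print(classify("meta-externalagent/1.1", "192.0.2.10"))
print(classify("meta-externalagent/1.1", "198.51.100.7"))
print(classify("Mozilla/5.0", "192.0.2.10"))
```

Checking the User-Agent before the ASN keeps the lookup off the path for ordinary traffic, which matches the order shown in both diagrams.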
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@             Coverage Diff              @@
##             main      #17       +/-   ##
===========================================
- Coverage   72.76%   61.50%   -11.27%
===========================================
  Files          15       24        +9
  Lines         661     1000      +339
===========================================
+ Hits          481      615      +134
- Misses        136      327      +191
- Partials       44       58       +14
Summary by Sourcery
Update Meta crawler bot configuration to correctly detect the meta-externalagent user-agent using ASN-based verification instead of RDNS/domain matching.