Path extraction using regular expressions on file/http response content.#17
Open
thiezn wants to merge 2 commits into assetnote:master from
@infosec-au I ran the BigQuery query for PHP myself today on the GitHub dataset, which gave me about 1.5 million results. If you're interested, I can share the wordlist itself so you can include it on assetnote.
Hi,
Here's a pull request that adds the ability to apply a regular expression to GitHub file contents or HTTP Archive response bodies. The result is a wordlist of unique paths to the files/URLs whose content matches the regular expression.
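To make the idea concrete, here is a small local sketch of the same filtering step: apply a regex to each file's content and keep the sorted, de-duplicated set of paths that match. The `File` type and `matchingPaths` helper are hypothetical stand-ins for rows returned by BigQuery, not code from this PR.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// File is a hypothetical stand-in for one dataset row: a path
// (GitHub file path or HTTP Archive URL path) plus its content.
type File struct {
	Path    string
	Content string
}

// matchingPaths returns the sorted, de-duplicated paths whose content
// matches pattern, i.e. the wordlist the query is meant to produce.
func matchingPaths(files []File, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]struct{})
	for _, f := range files {
		if re.MatchString(f.Content) {
			seen[f.Path] = struct{}{}
		}
	}
	paths := make([]string, 0, len(seen))
	for p := range seen {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths, nil
}

func main() {
	files := []File{
		{"/admin/login.php", `<?php echo $_POST["user"]; ?>`},
		{"/index.html", "<h1>Hello</h1>"},
		{"/admin/login.php", `<?php $x = $_POST["pw"]; ?>`},
	}
	paths, err := matchingPaths(files, `\$_(GET|POST)`)
	if err != nil {
		panic(err)
	}
	fmt.Println(paths) // [/admin/login.php]
}
```

Duplicate paths collapse into one entry because a wordlist only needs each path once, regardless of how many files or responses matched.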
Note that this is a large amount of data to process, so it can get expensive quickly. To cut costs, I've added a --sample-rate parameter that queries only a subset of the GitHub or HTTP Archive datasets.
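One way a sample rate could be turned into SQL is BigQuery's `TABLESAMPLE SYSTEM (n PERCENT)` clause. This is an assumption: the PR does not show how --sample-rate is implemented, and `sampleClause` plus the table name below are hypothetical. The sketch treats the rate as a percentage (0.01 means 0.01%).

```go
package main

import "fmt"

// sampleClause converts a sample rate given as a percentage into a
// BigQuery TABLESAMPLE clause. Hypothetical helper, not the PR's code.
func sampleClause(ratePercent float64) (string, error) {
	if ratePercent <= 0 || ratePercent > 100 {
		return "", fmt.Errorf("sample rate must be in (0, 100], got %g", ratePercent)
	}
	if ratePercent == 100 {
		return "", nil // whole table: no sampling clause needed
	}
	return fmt.Sprintf("TABLESAMPLE SYSTEM (%g PERCENT)", ratePercent), nil
}

func main() {
	clause, err := sampleClause(0.01)
	if err != nil {
		panic(err)
	}
	// `my_dataset.contents` is a placeholder table name for illustration.
	fmt.Println("SELECT path, content FROM `my_dataset.contents` " + clause)
}
```

Block-level table sampling is used here instead of a `WHERE RAND() < x` filter because a random filter still scans, and bills for, the whole table, while `TABLESAMPLE SYSTEM` reads only a subset of the table's storage blocks.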
I wasn't able to test the code fully because I don't have read permission on the gs://commonspeak-udf/URI.min.js file. The queries work fine when I remove the temporary path-parsing function from the httparchive SQL query. It's also the first time I've touched Go code, so forgive me if I've done anything stupid here.
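The temporary function built on URI.min.js presumably reduces each HTTP Archive URL to its path. For local testing without access to that file, Go's standard `net/url` package does the equivalent job; `urlPath` is a hypothetical helper, not the UDF itself.

```go
package main

import (
	"fmt"
	"net/url"
)

// urlPath extracts the path component of a full URL, dropping the
// scheme, host, query string and fragment. An empty path becomes "/".
func urlPath(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Path == "" {
		return "/", nil
	}
	return u.Path, nil
}

func main() {
	p, err := urlPath("https://example.com/app/error?code=500#top")
	if err != nil {
		panic(err)
	}
	fmt.Println(p) // /app/error
}
```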
To be honest, the cost/benefit of these queries probably doesn't add up, so feel free to close this pull request. Since I spent some time writing the code, I thought I'd at least open a pull request in case it's interesting to you.
Example queries on GitHub
An example use case is extracting all paths that contain so-called PHP superglobals ($_GET, $_POST, etc.). These files take input from a web browser, so they are more likely to be susceptible to vulnerabilities. For this it makes more sense to use the GitHub source, since it contains the raw PHP files; HTTP responses only contain the rendered output of the processed .php files.
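BigQuery's `REGEXP_CONTAINS` and Go's `regexp` package both use RE2 syntax, so a pattern can be checked locally before paying for a query. The pattern below, covering the superglobals that carry browser-supplied input, is an assumption; the exact regex used for the PHP run is not shown in the PR.

```go
package main

import (
	"fmt"
	"regexp"
)

// superglobalRe matches PHP superglobals holding request input.
// \b stops matches on longer identifiers such as $_POSTAL.
var superglobalRe = regexp.MustCompile(`\$_(GET|POST|REQUEST|COOKIE|FILES)\b`)

func main() {
	samples := []string{
		`<?php $id = $_GET['id']; ?>`,
		`<?php echo "static page"; ?>`,
		`<?php if (isset($_COOKIE["sess"])) { /* ... */ } ?>`,
	}
	for _, s := range samples {
		// Prints whether the sample matches and the first match found.
		fmt.Println(superglobalRe.MatchString(s), superglobalRe.FindString(s))
	}
}
```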
Example run with a sample rate of 0.01%. This processed around 213.48 GB on BigQuery.
Example run without any sampling processes 2.64 TB. At the current price of $5 per TB, that comes to roughly $13 (2.64 × $5 = $13.20).
Another example would be to do something similar for JavaScript files using a regex like 'eval\(.*\)|\.setInterval\(|dangerouslySetInnerHTML|bypassSecurityTrustAs'. Or you could use a regex to match the top-level domains of known bug bounty targets.
Example queries on HTTP Archive
The HTTP Archive dataset can be used in the same way as the GitHub one. For example, you could build a wordlist of paths whose responses contain Java Spring Boot error messages. Spring Boot generates a so-called Whitelabel Error Page when no explicit error page has been defined.
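Spring Boot's default error page carries the heading "Whitelabel Error Page", which makes it an easy regex target in response bodies. The case-insensitive pattern and the sample body below are assumptions for illustration; the PR's actual regex for this run is not shown.

```go
package main

import (
	"fmt"
	"regexp"
)

// whitelabelRe matches Spring Boot's default error page heading.
// (?i) makes the match case-insensitive to tolerate markup variations.
var whitelabelRe = regexp.MustCompile(`(?i)whitelabel error page`)

func main() {
	body := `<html><body><h1>Whitelabel Error Page</h1>
<p>This application has no explicit mapping for /error, so you are seeing this as a fallback.</p></body></html>`
	fmt.Println(whitelabelRe.MatchString(body)) // true
}
```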
Example run with a sample rate of 0.01%. This processed around 3.08 GB on BigQuery.
Example run without any sampling processes 31.97 TB. At the current price of $5 per TB, that adds up to about $160.
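The cost figures follow from on-demand pricing: terabytes scanned multiplied by the per-TB rate. The $5 rate is the one quoted in this PR at the time of writing; since BigQuery pricing changes, the sketch takes it as a parameter.

```go
package main

import "fmt"

// usdPerTB is the on-demand rate quoted in the PR; treat it as an input.
const usdPerTB = 5.0

// queryCostUSD estimates on-demand cost from terabytes scanned.
func queryCostUSD(tbScanned, ratePerTB float64) float64 {
	return tbScanned * ratePerTB
}

func main() {
	fmt.Printf("GitHub, no sampling:       $%.2f\n", queryCostUSD(2.64, usdPerTB))  // $13.20
	fmt.Printf("HTTP Archive, no sampling: $%.2f\n", queryCostUSD(31.97, usdPerTB)) // $159.85
}
```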
Kind regards,
Thiezn