Commit d9086bc

- fix typos and spell check docs + readme
- fix bug with files too small to compute tlsh
1 parent 76a2571 commit d9086bc

15 files changed

+66
-66
lines changed

README.rst

Lines changed: 8 additions & 8 deletions
@@ -15,7 +15,7 @@
1515
Security auditing and static code analysis
1616
=================================================
1717

18-
Aura is a static analysis framework developed as a response to ever increasing threat of malicious packages and vulnerable code published on PyPI.
18+
Aura is a static analysis framework developed as a response to the ever-increasing threat of malicious packages and vulnerable code published on PyPI.
1919

2020

2121
Project goals:
@@ -28,20 +28,20 @@ Project goals:
2828
Why Aura?
2929
---------
3030

31-
While there are other tools with a functionality that overlaps with Aura such as Bandit, dlint, semgrep etc. the focus of these alternatives is different which impacts functionality and how they are being used. These alternatives are mainly intended to be used in a similar way to linters, integrated into IDEs, frequently run during the development which makes it important to **minimize false positives** and reporting with clear **actionable** explanations in ideal cases.
31+
While there are other tools with functionality that overlaps with Aura such as Bandit, dlint, semgrep etc. the focus of these alternatives is different which impacts the functionality and how they are being used. These alternatives are mainly intended to be used in a similar way to linters, integrated into IDEs, frequently run during the development which makes it important to **minimize false positives** and reporting with clear **actionable** explanations in ideal cases.
3232

33-
Aura on the other hand reports on **behaviour of the code**, **anomalies** and **vulnerabilities** with as much information as possible at the cost of false positive. There are a lot of things reported by aura that are not necessarily actionable by a user but they tell you a lot about the behaviour of the code such as doing network communication, accessing sensitive files or using mechanisms associated with obfuscation indicating a possible malicious code. By collecting this kind of data and aggregating it together, Aura can be compared in functionality to other security systems such as antivirus, IDS or firewalls that are essentially doing the same analysis but on a different kind of data (network communication, running processes etc).
33+
Aura on the other hand reports on ** behavior of the code**, **anomalies**, and **vulnerabilities** with as much information as possible at the cost of false positive. There are a lot of things reported by aura that are not necessarily actionable by a user but they tell you a lot about the behavior of the code such as doing network communication, accessing sensitive files, or using mechanisms associated with obfuscation indicating a possible malicious code. By collecting this kind of data and aggregating it together, Aura can be compared in functionality to other security systems such as antivirus, IDS, or firewalls that are essentially doing the same analysis but on a different kind of data (network communication, running processes, etc).
3434

3535
Here is a quick overview of differences between Aura and other similar linters and SAST tools:
3636

3737
- **input data**:
3838
- **Other SAST tools** - usually restricted to only python (target) source code and python version under which the tool is installed.
39-
- **Aura** can analyze both binary (or non python code) and python source code as well. Able to analyze a mixture of python code compatible with different python versions (py2k & py3k) using **the same Aura installation**.
39+
- **Aura** can analyze both binary (or non-python code) and python source code as well. Able to analyze a mixture of python code compatible with different python versions (py2k & py3k) using **the same Aura installation**.
4040
- **reporting**:
41-
- **Other SAST tools** - Aims at integrating well with other systems such as IDEs, CI systems with actionable results while trying to minimize false positives to prevent overwhelming users with too much non-significant alerts.
42-
- **Aura** - reports as much information as possible that is not immediately actionable such as behavioral and anomaly analysis. Output format is designed for easy machine processing and aggregation rather then human readable.
41+
- **Other SAST tools** - Aims at integrating well with other systems such as IDEs, CI systems with actionable results while trying to minimize false positives to prevent overwhelming users with too many non-significant alerts.
42+
- **Aura** - reports as much information as possible that is not immediately actionable such as behavioral and anomaly analysis. The output format is designed for easy machine processing and aggregation rather than human readable.
4343
- **configuration**:
44-
- **Other SAST tools** - The tools is fine-tuned to the target project by customizing the signatures to target specific technologies used by the target project. Overriding configuration is often possible by inserting comments inside the source code such as ``# nosec`` that will suppress the alert at that position
44+
- **Other SAST tools** - The tools are fine-tuned to the target project by customizing the signatures to target specific technologies used by the target project. The overriding configuration is often possible by inserting comments inside the source code such as ``# nosec`` that will suppress the alert at that position
4545
- **Aura** - it is expected that there is little to no knowledge in advance about the technologies used by code that is being scanned such as auditing a new python package for approval to be used as a dependency in a project. In most cases, it is not even possible to modify the scanned source code such as using comments to indicate to linter or aura to skip detection at that location because it is scanning a copy of that code that is hosted at some remote location.
4646

4747

@@ -62,7 +62,7 @@ Running Aura
6262

6363
docker run -ti --rm sourcecodeai/aura:dev scan pypi://requests -v
6464

65-
Aura uses a so called URIs to identify the protocol and location to scan, if no protocol is used, the scan argument is treated as a path to the file or directory on a local system.
65+
Aura uses a so-called URIs to identify the protocol and location to scan, if no protocol is used, the scan argument is treated as a path to the file or directory on a local system.
6666

6767

6868
Diff packages::
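
The URI convention described in the "Running Aura" hunk above (a protocol prefix such as ``pypi://``, with a fallback to a local path when no protocol is given) could be sketched roughly as follows. The helper name ``split_scan_target`` and the use of ``None`` for "no protocol" are assumptions made for illustration; this is not Aura's actual URI handler code.

```python
from urllib.parse import urlparse


def split_scan_target(target):
    """Return (protocol, location) for a scan argument.

    Hypothetical helper: an argument without a ``scheme://`` prefix is
    treated as a path on the local system, as the README describes.
    """
    if "://" not in target:
        return None, target
    parsed = urlparse(target)
    # For "pypi://requests" the package name is parsed as the netloc part.
    return parsed.scheme, parsed.netloc + parsed.path


print(split_scan_target("pypi://requests"))
print(split_scan_target("./some/local/dir"))
```

A real handler would dispatch on the protocol to a matching backend; the sketch only shows the split between "has a protocol" and "plain path".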

aura/uri_handlers/base.py

Lines changed: 1 addition & 3 deletions
@@ -173,15 +173,13 @@ def __compute_hashes(self):
                 sha1.update(buffer)
                 sha256.update(buffer)
                 sha512.update(buffer)
-
                 buffer = fd.read(4096)
 
         try:
             tl.final()
+            self.metadata["tlsh"] = tl.hexdigest()
         except ValueError:  # TLSH needs at least 256 bytes
             pass
-        else:
-            self.metadata["tlsh"] = tl.hexdigest()
 
         self.metadata["md5"] = md5.hexdigest()
         self.metadata["sha1"] = sha1.hexdigest()
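
For context, the pattern this hunk touches (hash a file in fixed-size chunks, and treat the fuzzy hash as optional because very small inputs cannot produce one) might look like the standalone sketch below. The function name ``compute_hashes`` and the optional-import handling are assumptions for illustration, not Aura's code; the ``tlsh`` calls mirror the ones visible in the diff (``update``, ``final``, ``hexdigest``).

```python
import hashlib

try:
    import tlsh  # optional third-party package; the sketch works without it
except ImportError:
    tlsh = None


def compute_hashes(path, chunk_size=4096):
    """Hash a file chunk by chunk; add a TLSH digest only when possible."""
    names = ("md5", "sha1", "sha256", "sha512")
    digests = {name: hashlib.new(name) for name in names}
    fuzzy = tlsh.Tlsh() if tlsh is not None else None

    with open(path, "rb") as fd:
        buffer = fd.read(chunk_size)
        while buffer:
            for digest in digests.values():
                digest.update(buffer)
            if fuzzy is not None:
                fuzzy.update(buffer)
            buffer = fd.read(chunk_size)

    metadata = {name: digest.hexdigest() for name, digest in digests.items()}

    if fuzzy is not None:
        try:
            fuzzy.final()
            # Inside the try block, as in the commit: a failure on a tiny
            # input must not prevent the other hashes from being reported.
            metadata["tlsh"] = fuzzy.hexdigest()
        except ValueError:
            pass

    return metadata
```

Keeping both ``final()`` and ``hexdigest()`` under the same ``except ValueError`` means either call may reject a too-small input without affecting the cryptographic digests.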

docs/source/analyzers.rst

Lines changed: 8 additions & 8 deletions
@@ -6,31 +6,31 @@ Aura ships by default with a huge amount of built-in analyzers. To find which an
66
Technical description
77
=====================
88

9-
Analyzers are developed as hooks that take input data for processing and output either detection result or a ScanLocation for Aura to scan. There are two major types of analyzers. The first one is a classic "normal" analyzer that receives as input a file/directory path with metadata and performs an analysis. This way any kind of file can be processed including non-source code (Python) files. Second type of analyzer is called visitor. It takes an already parsed source code as an input (AST tree) and performs tree traversal, detections and modifications of this tree. A visitor analyzer can modify the tree and such visitors can be chained together which is a core part of a static analysis functionality. A visitor workflow on top of a Python source code is as following:
9+
Analyzers are developed as hooks that take input data for processing and output either detection result or a ScanLocation for Aura to scan. There are two major types of analyzers. The first one is a classic "normal" analyzer that receives as input a file/directory path with metadata and performs an analysis. This way any kind of file can be processed including non-source code (Python) files. The second type of analyzer is called a visitor. It takes an already parsed source code as an input (AST tree) and performs tree traversal, detections, and modifications of this tree. A visitor analyzer can modify the tree and such visitors can be chained together which is a core part of static analysis functionality. A visitor workflow on top of a Python source code is as follows:
1010

11-
- Convert: converts a raw json (parsed ast) into internal representation of nodes that aura uses for further analysis.
12-
- Rewrite: rewrites the AST tree into while retaining it's semantic equivalent. This is done by applying rules such as constant propagation, string concatenation etc... that removes an unnecessary complexity from the AST tree.
11+
- Convert: converts a raw JSON (parsed ast) into an internal representation of nodes that aura uses for further analysis.
12+
- Rewrite: rewrites the AST tree while retaining its semantic equivalent. This is done by applying rules such as constant propagation, string concatenation, etc... that remove unnecessary complexity from the AST tree.
1313
- Taint Analysis: performs taint analysis using defined semantic rules.
14-
- Read Only: runs all read only node visitors, see description below.
14+
- Read Only: runs all read-only node visitors, see description below.
1515

16-
Read only visitors are a special type of visitors that as the name suggest are prohibited doing any kind of modifications to the tree. This is where the majority of detections that produce results are happening. Since these analyzers are read only, Aura can run them in parallel on each visited node instead of doing a separate tree traversal for each of the analyzers. This provides a massive performance boost and it is highly recommended to always code AST node analyzers as read only visitors.
16+
Read-only visitors are a special type of visitors that as the name suggests are prohibited from doing any kind of modifications to the tree. This is where the majority of detections that produce results are happening. Since these analyzers are read-only, Aura can run them in parallel on each visited node instead of doing a separate tree traversal for each of the analyzers. This provides a massive performance boost and it is highly recommended to always code AST node analyzers as read-only visitors.
1717

1818

1919
ScanLocation is a special type of an item that points to either a directory or a file and tells aura to scan it using enabled analyzers. A common use case for outputting a ScanLocation is when the analyzer itself for example unpacks a zip file and want to process the extracted files in a recursive way
2020

21-
Detection result is a standard way to produce an information/result that is by the end of the analysis reported back to the user or serialized into output format.
21+
The detection result is a standard way to produce an information/result that is by the end of the analysis reported back to the user or serialized into the output format.
2222

2323

2424
Creating analyzers
2525
==================
2626

27-
Standard (path based) analyzer
27+
Standard (path-based) analyzer
2828
------------------------------
2929

3030
TODO
3131

3232

33-
Read only AST visitor
33+
Read-only AST visitor
3434
---------------------
3535

3636
TODO
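
Since both "Creating analyzers" sections are still marked TODO, the following is a minimal sketch of the Rewrite and Read Only stages from the technical description above, written against Python's standard ``ast`` module rather than Aura's internal node representation. All names here (``FoldStringConcat``, ``find_eval_calls``, ``find_urls``, ``analyze``) are hypothetical, the Convert and Taint Analysis stages are not modeled, and "running visitors in parallel" is approximated by letting every read-only check share a single tree traversal.

```python
import ast


class FoldStringConcat(ast.NodeTransformer):
    """Rewrite stage: fold "a" + "b" into the single constant "ab"."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold nested operands first
        if (
            isinstance(node.op, ast.Add)
            and isinstance(node.left, ast.Constant)
            and isinstance(node.right, ast.Constant)
            and isinstance(node.left.value, str)
            and isinstance(node.right.value, str)
        ):
            folded = ast.Constant(node.left.value + node.right.value)
            return ast.copy_location(folded, node)
        return node


def find_eval_calls(node, hits):
    """Read-only check: record calls to eval()."""
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    ):
        hits.append(("eval_call", node.lineno))


def find_urls(node, hits):
    """Read-only check: record string constants that look like URLs."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and node.value.startswith("http")
    ):
        hits.append(("url", node.value))


def analyze(source, checks=(find_eval_calls, find_urls)):
    """Rewrite the tree once, then run all read-only checks in one pass."""
    tree = FoldStringConcat().visit(ast.parse(source))
    hits = []
    for node in ast.walk(tree):  # single traversal shared by all checks
        for check in checks:
            check(node, hits)
    return hits


print(analyze('url = "http://" + "example.com"\neval(url)'))
```

Because the rewrite stage runs first, the URL check sees the folded constant ``"http://example.com"`` even though the source splits it across two literals, which is the reason the description gives for simplifying the tree before detection.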

docs/source/apip.rst

Lines changed: 3 additions & 1 deletion
@@ -2,4 +2,6 @@ apip
22
====
33

44
Aura contains an experimental wrapper around the ``pip`` command that would intercept any package installation and sends it to Aura for analysis.
5-
This wrapper is available under the `<project root>/aura/apip.py` and can be copy/pasted into your bin directory. this is done automatically in case the framework is installed via poetry. apip requires to have the ``AURA_PATH`` environment variable set to point to the aura installation, e.g. the `aura` command, which you can find by running ``which aura`` in your shell. Usage of apip is exactly the same as using the pip command, it proxies everything behind the scenes to the pip script and monkey patch the pip installation to allow intercepting of the package installation.
5+
This wrapper is available under the `<project root>/aura/apip.py` and can be copy/pasted into your bin directory. this is done automatically in case the framework is installed via poetry. apip requires having the ``AURA_PATH`` environment variable set to point to the aura installation, e.g. the `aura` command, which you can find by running ``which aura`` in your shell. Usage of apip is exactly the same as using the pip command, it proxies everything behind the scenes to the pip script and monkey patch the pip installation to allow intercepting of the package installation.
6+
7+
As pip itself does not provide any standard mechanism to hook into package installation, the ``apip`` is using a monkey patching technique to modify existing pip structures to be able to intercept package installations. We are trying to push for a native functionality using this GitHub issue ticket: https://github.com/pypa/pip/issues/8938 .
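
The monkey patching technique mentioned above can be illustrated in isolation. The sketch below does not touch pip and does not reproduce apip's implementation: ``Installer`` is a hypothetical stand-in class, and ``patch_install`` and ``audit`` are names invented for this example. It only shows the general idea of replacing a method at runtime with a wrapper that runs a check before delegating to the original.

```python
import functools


class Installer:
    """Hypothetical stand-in for an installer class; not pip's real API."""

    def install(self, package):
        return f"installed {package}"


def patch_install(cls, audit):
    """Replace cls.install with a wrapper that calls audit(package) first."""
    original = cls.install

    @functools.wraps(original)
    def wrapper(self, package):
        if not audit(package):
            raise RuntimeError(f"installation of {package!r} blocked by audit")
        return original(self, package)

    cls.install = wrapper
    return original  # returned so the caller can restore it later


seen = []


def audit(package):
    """Toy audit hook: records the package and rejects one fixed name."""
    seen.append(package)
    return package != "evil-package"


patch_install(Installer, audit)
print(Installer().install("requests"))
```

The drawback of this approach, and presumably the motivation for the linked pip issue, is that it depends on internal structures that can change between releases, whereas a native hook would give a stable interception point.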
