
Commit 37a38ae

Merge pull request #3 from mutating/develop

0.0.3

2 parents f261e98 + 1169592

17 files changed, 574 additions & 17 deletions

.github/workflows/lint.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14", "3.14t"]
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14", "3.14t", "3.15.0-alpha.1"]
 
     steps:
     - uses: actions/checkout@v4
```

.github/workflows/tests_and_coverage.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@ jobs:
     strategy:
       matrix:
         os: [macos-latest, ubuntu-latest, windows-latest]
-        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14", "3.14t"]
+        python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "3.14", "3.14t", "3.15.0-alpha.1"]
 
     steps:
     - uses: actions/checkout@v4
```

README.md

Lines changed: 124 additions & 10 deletions
````diff
@@ -17,26 +17,35 @@
 
 ![logo](https://raw.githubusercontent.com/mutating/getsources/develop/docs/assets/logo_1.svg)
 
+This library lets you retrieve a function's source code at runtime. It can serve as a foundation for tools that work with [ASTs](https://en.wikipedia.org/wiki/Abstract_syntax_tree). It is a thin wrapper around [`inspect.getsource`](https://docs.python.org/3/library/inspect.html#inspect.getsource) and [`dill.source.getsource`](https://dill.readthedocs.io/en/latest/dill.html#dill.source.getsource).
 
-This library is needed to obtain the source code of functions at runtime. It can be used, for example, as a basis for libraries that work with [AST](https://en.wikipedia.org/wiki/Abstract_syntax_tree) on the fly. In fact, it is a thin layer built around [`inspect.getsource`](https://docs.python.org/3/library/inspect.html#inspect.getsource) and [`dill.source.getsource`](https://dill.readthedocs.io/en/latest/dill.html#dill.source.getsource).
+
+## Table of contents
+
+- [**Installation**](#installation)
+- [**Get raw source**](#get-raw-source)
+- [**Get cleaned source**](#get-cleaned-source)
+- [**Generate source hashes**](#generate-source-hashes)
 
 
 ## Installation
 
-You can install [`getsources`](https://pypi.python.org/pypi/getsources) using pip:
+You can install [`getsources`](https://pypi.python.org/pypi/getsources) with pip:
 
 ```bash
 pip install getsources
 ```
+You can also use [`instld`](https://github.com/pomponchik/instld) to quickly try this package and others without installing them.
 
-You can also quickly try out this and other packages without having to install using [instld](https://github.com/pomponchik/instld).
 
+## Get raw source
 
-## Usage
+The standard library provides the [`getsource`](https://docs.python.org/3/library/inspect.html#inspect.getsource) function that returns the source code of functions and other objects. However, this does not work with functions defined in the [`REPL`](https://docs.python.org/3/tutorial/interpreter.html#interactive-mode).
 
-The basic function of the library is `getsource`, which works similarly to the function of the same name from the standard library:
+This library provides a function with the same name and nearly the same interface, but without this limitation:
 
 ```python
+# You can run this code snippet in the REPL.
 from getsources import getsource
 
 def function():
@@ -47,12 +56,37 @@ print(getsource(function))
 #>     ...
 ```
 
-Unlike its counterpart from the standard library, this thing can also work:
+This allows AST-based tools to work reliably in both scripts and the `REPL`. All other functions in the library are built on top of it.
+
+> ⚠️ Please note that this library is intended solely for retrieving the source code of functions of any kind, including generators, async functions, regular functions, class methods, lambdas, and so on. It is not intended for classes, modules, or other objects. Other use cases may work, but they are not covered by the test suite.
+
 
-- With lambda functions
-- With functions defined inside REPL
+## Get cleaned source
 
-We also often need to trim excess indentation from a function object to make it easier to further process the resulting code. To do this, use the `getclearsource` function:
+The [`getsource`](#get-raw-source) function returns a function's source code in raw form. This means that the code snippet captures some unnecessary surrounding code.
+
+Here is an example where the standard `getsource` output includes extra leading whitespace:
+
+```python
+if True:
+    def function():
+        ...
+
+print(getsource(function))
+#>     def function():
+#>         ...
+```
+
+> ↑ Notice the extra leading spaces.
+
+For lambda functions, it may also return the entire surrounding expression:
+
+```python
+print(getsource(lambda x: x))
+#> print(getsource(lambda x: x))
+```
+
+To address these issues, the library provides a function called `getclearsource`, which returns the function's source with unnecessary context removed:
 
 ```python
 from getsources import getclearsource
@@ -66,6 +100,86 @@ print(getclearsource(SomeClass.method))
 #> @staticmethod
 #> def method():
 #>     ...
+print(getclearsource(lambda x: x))
+#> lambda x: x
+```
+
+To extract only the substring containing a lambda function, the library uses AST parsing behind the scenes. Unfortunately, this [does not allow](https://stackoverflow.com/a/55386046/14522393) it to distinguish between multiple lambda functions defined in a single line, so in this case you will get an exception:
+
+```python
+lambdas = [lambda: None, lambda x: x]
+
+getclearsource(lambdas[0])
+#> ...
+#> getsources.errors.UncertaintyWithLambdasError: Several lambda functions are defined in a single line of code, can't determine which one.
+```
+
+If you absolutely must obtain at least some source code for these lambdas, use [`getsource`](#get-raw-source):
+
+```python
+try:
+    getclearsource(function)
+except UncertaintyWithLambdasError:
+    getsource(function)
+```
+
+However, in general, the `getclearsource` function is recommended for retrieving the source code of functions when working with the AST.
+
+
+## Generate source hashes
+
+In some cases, you may not care about a function's exact source, but you still need to distinguish between different implementations. In this case, the `getsourcehash` function is useful. It returns a short hash string derived from the function's source code:
+
+```python
+from getsources import getsourcehash
+
+def function():
+    ...
+
+print(getsourcehash(function))
+#> 7SWJGZ
 ```
 
-As you can see, the resulting source code text has no extra indentation, but in all other respects this function is completely identical to the usual `getsource`.
+> ⓘ A hash string uses only characters from the [`Crockford Base32`](https://en.wikipedia.org/wiki/Base32) alphabet, which consists solely of uppercase English letters and digits; ambiguous characters are excluded, which makes the hash easier to read.
+
+> ⓘ The `getsourcehash` function is built on top of [`getclearsource`](#get-cleaned-source) and ignores "extra" characters in the source code.
+
+By default, the hash string length is 6 characters, but you can choose a length from 4 to 8 characters:
+
+```python
+print(getsourcehash(function, size=4))
+#> WJGZ
+print(getsourcehash(function, size=8))
+#> XG7SWJGZ
+```
+
+By default, the full source code of a function is used, including its name and arguments. If you only want to compare function bodies, pass `only_body=True`:
+
+```python
+def function_1():
+    ...
+
+def function_2(a=5):
+    ...
+
+print(getsourcehash(function_1, only_body=True))
+#> V587A6
+print(getsourcehash(function_2, only_body=True))
+#> V587A6
+```
+
+By default, docstrings are considered part of the function body. If you want to skip them as well, pass `skip_docstring=True`:
+
+```python
+def function_1():
+    """some text"""
+    ...
+
+def function_2(a=5):
+    ...
+
+print(getsourcehash(function_1, only_body=True, skip_docstring=True))
+#> V587A6
+print(getsourcehash(function_2, only_body=True, skip_docstring=True))
+#> V587A6
+```
````
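The README's *Get raw source* section says `inspect.getsource` does not work for functions defined in the `REPL`. The underlying reason is that such functions have no source file from which `inspect` can read lines back. The sketch below reproduces that situation with the standard library only, by compiling code under a made-up pseudo-filename (`<not-a-real-file>` is an arbitrary placeholder chosen for illustration). It does not use `getsources` itself, and interactive shells that keep their input in `linecache` may behave differently:

```python
import inspect

# Compile a function from a string under a pseudo-filename that has no file
# on disk and no linecache entry, similar to code typed into a classic REPL.
source = "def function():\n    return 42\n"
namespace: dict = {}
exec(compile(source, "<not-a-real-file>", "exec"), namespace)
function = namespace["function"]

try:
    inspect.getsource(function)
    source_available = True
except OSError:
    # inspect finds no lines for "<not-a-real-file>" and gives up.
    source_available = False

print(function())        # the function itself works
print(source_available)  # but its source text cannot be recovered
```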

getsources/__init__.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -1,2 +1,4 @@
 from getsources.base import getsource as getsource
 from getsources.clear import getclearsource as getclearsource
+from getsources.errors import UncertaintyWithLambdasError as UncertaintyWithLambdasError
+from getsources.hash import getsourcehash as getsourcehash
```

getsources/clear.py

Lines changed: 22 additions & 0 deletions
```diff
@@ -1,11 +1,33 @@
+from ast import Lambda, get_source_segment, parse, walk
 from typing import Any, Callable
 
 from getsources import getsource
+from getsources.errors import UncertaintyWithLambdasError
+from getsources.helpers.is_lambda import is_lambda
 
 
 def getclearsource(function: Callable[..., Any]) -> str:
     source_code = getsource(function)
 
+    if is_lambda(function):
+        stripped_source_code = source_code.strip()
+        tree = parse(stripped_source_code)
+
+        first = True
+        lambda_node = None
+        for node in walk(tree):
+            if isinstance(node, Lambda):
+                if not first:
+                    raise UncertaintyWithLambdasError('Several lambda functions are defined in a single line of code, can\'t determine which one.')
+                lambda_node = node
+                first = False
+
+        segment_source = get_source_segment(stripped_source_code, lambda_node) # type: ignore[arg-type]
+        if segment_source is None:
+            raise UncertaintyWithLambdasError('It seems that the AST for the lambda function has been modified; can\'t extract the source code.')
+        return segment_source
+
+
     splitted_source_code = source_code.split('\n')
 
     indent = 0
```
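`getclearsource` in `getsources/clear.py` handles two cases. For lambdas it parses the stripped line, walks the AST, refuses to guess once it meets a second `Lambda` node, and slices the expression out with `ast.get_source_segment`. For everything else it goes on to strip the extra indentation, but that part is cut off in this diff. A self-contained sketch of both steps, assuming `textwrap.dedent` as a stand-in for the indentation logic that is not shown; `clear_source` is a hypothetical name, it works on a source string instead of a function object, and it raises `ValueError` where the library raises `UncertaintyWithLambdasError`:

```python
import ast
import textwrap


def clear_source(raw: str, is_lambda: bool) -> str:
    """Hypothetical stand-in for getclearsource, working on raw source text.

    - For a lambda, cut the single lambda expression out of its line.
    - Otherwise, remove the common leading indentation with textwrap.dedent
      (an assumption: the library's own indentation code is not shown here).
    """
    if not is_lambda:
        return textwrap.dedent(raw)

    stripped = raw.strip()
    tree = ast.parse(stripped)
    lambdas = [node for node in ast.walk(tree) if isinstance(node, ast.Lambda)]
    if len(lambdas) != 1:
        # Zero or several candidates: there is no unambiguous answer.
        # The library raises UncertaintyWithLambdasError for the "several" case.
        raise ValueError(f"expected exactly one lambda, found {len(lambdas)}")

    # Node offsets refer to `stripped`, so slice that same string.
    segment = ast.get_source_segment(stripped, lambdas[0])
    if segment is None:
        raise ValueError("the lambda node has no position information")
    return segment


nested_def = "    def function():\n        ...\n"
print(clear_source(nested_def, is_lambda=False), end="")
# def function():
#     ...
print(clear_source("print(getsource(lambda x: x))", is_lambda=True))
# lambda x: x
```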

getsources/errors.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -0,0 +1,2 @@
+class UncertaintyWithLambdasError(Exception):
+    ...
```

getsources/hash.py

Lines changed: 53 additions & 0 deletions
```diff
@@ -0,0 +1,53 @@
+import hashlib
+from ast import Constant, Expr, Lambda, get_source_segment, parse, walk
+from typing import Any, Callable
+
+from getsources import getclearsource
+from getsources.helpers.is_lambda import is_lambda
+
+ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ'
+
+
+def get_body_text(function: Callable[..., Any], source: str, skip_docstring: bool) -> str:
+    tree = parse(source)
+
+    if is_lambda(function):
+        body_nodes = []
+
+        for node in walk(tree):
+            if isinstance(node, Lambda):
+                body_nodes.append(node.body)
+
+    else:
+        function_node = tree.body[0]
+        body_nodes = function_node.body # type: ignore[attr-defined]
+    first = body_nodes[0]
+
+    if skip_docstring and body_nodes and (isinstance(first, Expr) and isinstance(first.value, Constant) and isinstance(first.value.value, str)):
+        body_nodes = body_nodes[1:]
+
+    return '\n'.join([get_source_segment(source, statement) for statement in body_nodes]) # type: ignore[misc]
+
+
+def getsourcehash(function: Callable[..., Any], size: int = 6, only_body: bool = False, skip_docstring: bool = False) -> str:
+    if not 4 <= size <= 8:
+        raise ValueError('The hash string size must be in range 4..8.')
+    if skip_docstring and not only_body:
+        raise ValueError('You can omit the docstring only if the `only_body=True` option is set.')
+
+    source_code = getclearsource(function)
+    if not only_body:
+        interesting_part = source_code
+    else:
+        interesting_part = get_body_text(function, source_code, skip_docstring=skip_docstring)
+
+    digest = hashlib.sha256(interesting_part.encode('utf-8')).digest()
+    number = int.from_bytes(digest, 'big')
+    base = len(ALPHABET)
+
+    chars = []
+    for _ in range(size):
+        number, rem = divmod(number, base)
+        chars.append(ALPHABET[rem])
+
+    return ''.join(reversed(chars))
```
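`getsourcehash` hashes the cleaned source (or, with `only_body=True`, the body statements reassembled by `get_body_text`) with SHA-256, reads the digest as a big-endian integer, and emits its lowest `size` base-32 digits over the Crockford alphabet, most significant first. One consequence is that, for the same input, a shorter hash is always a suffix of a longer one, which is consistent with the README outputs `WJGZ`, `7SWJGZ` and `XG7SWJGZ`. The sketch below mirrors both steps on plain source strings. `body_text` and `encode_hash` are hypothetical names, `body_text` only handles a single `def` (no lambdas), and the sketch only compares hashes with each other rather than reproducing the README values:

```python
import ast
import hashlib

# Crockford Base32 alphabet: digits plus uppercase letters, without I, L, O, U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def body_text(source: str, skip_docstring: bool = False) -> str:
    """Hypothetical helper: the statements of the first def in `source`."""
    function_node = ast.parse(source).body[0]
    statements = function_node.body
    first = statements[0]
    is_docstring = (
        isinstance(first, ast.Expr)
        and isinstance(first.value, ast.Constant)
        and isinstance(first.value.value, str)
    )
    if skip_docstring and is_docstring:
        statements = statements[1:]
    # Rebuild the text from each statement's exact source slice.
    return "\n".join(ast.get_source_segment(source, s) for s in statements)


def encode_hash(text: str, size: int = 6) -> str:
    """Hypothetical stand-alone version of the encoding in getsourcehash."""
    if not 4 <= size <= 8:
        raise ValueError("size must be in range 4..8")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    for _ in range(size):
        # Peel off the least significant base-32 digit on each pass.
        number, rem = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[rem])
    # Digits come out least significant first, so reverse them.
    return "".join(reversed(chars))


source_1 = 'def function_1():\n    """some text"""\n    ...\n'
source_2 = "def function_2(a=5):\n    ...\n"

body_1 = body_text(source_1, skip_docstring=True)
body_2 = body_text(source_2, skip_docstring=True)
print(body_1 == body_2)                            # True: both bodies are "..."
print(encode_hash(body_1) == encode_hash(body_2))  # True

digits_4 = encode_hash(source_2, size=4)
digits_8 = encode_hash(source_2, size=8)
print(digits_8.endswith(digits_4))  # True: shorter hashes are suffixes
```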

getsources/helpers/__init__.py

Whitespace-only changes.

getsources/helpers/is_lambda.py

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+from types import FunctionType
+from typing import Any, Callable
+
+
+def is_lambda(function: Callable[..., Any]) -> bool:
+    return isinstance(function, FunctionType) and function.__name__ == "<lambda>"
```
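`is_lambda` works because Python gives every function created by a `lambda` expression the name `<lambda>`, and binding it to a variable does not change that, while a `def` function takes its name from the definition. A quick check, with the helper's one-line body copied inline so the snippet runs without the package installed (`regular` and `square` are illustrative names):

```python
from types import FunctionType
from typing import Any, Callable


def is_lambda(function: Callable[..., Any]) -> bool:
    # Same check as getsources/helpers/is_lambda.py.
    return isinstance(function, FunctionType) and function.__name__ == "<lambda>"


def regular():
    ...


square = lambda x: x * x  # binding a name does not rename the lambda

print(is_lambda(square))   # True
print(is_lambda(regular))  # False
print(is_lambda(len))      # False: built-ins are not FunctionType instances
```

Since `__name__` is writable, this is a name-based heuristic: a `def` function whose `__name__` is reassigned to `"<lambda>"` would also be reported as a lambda.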

pyproject.toml

Lines changed: 2 additions & 1 deletion
```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "getsources"
-version = "0.0.2"
+version = "0.0.3"
 authors = [
     { name="Evgeniy Blinov", email="zheni-b@yandex.ru" },
 ]
@@ -28,6 +28,7 @@ classifiers = [
     'Programming Language :: Python :: 3.12',
     'Programming Language :: Python :: 3.13',
     'Programming Language :: Python :: 3.14',
+    'Programming Language :: Python :: 3.15',
     'Programming Language :: Python :: Free Threading',
     'Programming Language :: Python :: Free Threading :: 3 - Stable',
     'License :: OSI Approved :: MIT License',
```
