Commit 6191647

chore(migration): Migrate code from googleapis/python-bigquery-dataframes into packages/bigframes (#16505)
See #15999. This PR should be merged with a merge-commit, not a squash-commit, in order to preserve the git history.
2 parents d10f9fc + da3f29b commit 6191647

File tree

1,482 files changed (+394,723 −0 lines)


.librarian/config.yaml

Lines changed: 4 additions & 0 deletions
```diff
@@ -24,6 +24,10 @@ libraries:
   # Allow generation for google-cloud-bigtable once this bug is fixed.
   - id: "google-cloud-bigtable"
     generate_blocked: true
+  # TODO(https://github.com/googleapis/google-cloud-python/issues/16489):
+  # Allow releases for bigframes once the bug above is fixed.
+  - id: "bigframes"
+    release_blocked: true
   # TODO(https://github.com/googleapis/google-cloud-python/issues/16506):
   # Allow generation for google-cloud-firestore once this bug is fixed.
   - id: "google-cloud-firestore"
```

.librarian/state.yaml

Lines changed: 9 additions & 0 deletions
```diff
@@ -1,5 +1,14 @@
 image: us-central1-docker.pkg.dev/cloud-sdk-librarian-prod/images-prod/python-librarian-generator@sha256:234b9d1f2ddb057ed7ac6a38db0bf8163d839c65c6cf88ade52530cddebce59e
 libraries:
+- id: bigframes
+  version: 2.39.0
+  last_generated_commit: ""
+  apis: []
+  source_roots:
+  - packages/bigframes
+  preserve_regex: []
+  remove_regex: []
+  tag_format: '{id}-v{version}'
 - id: bigquery-magics
   version: 0.12.2
   last_generated_commit: ""
```

packages/bigframes/.coveragerc

Lines changed: 38 additions & 0 deletions
```ini
# -*- coding: utf-8 -*-
#
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Generated by synthtool. DO NOT EDIT!
[run]
branch = True
omit =
    google/__init__.py
    google/cloud/__init__.py

[report]
fail_under = 35
show_missing = True
exclude_lines =
    # Re-enable the standard pragma
    pragma: NO COVER
    # Ignore debug-only repr
    def __repr__
    # Ignore abstract methods
    raise NotImplementedError
omit =
    */gapic/*.py
    */proto/*.py
    */site-packages/*.py
    google/cloud/__init__.py
```
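The `exclude_lines` entries in the `[report]` section are regular expressions matched against source lines. As a hypothetical illustration (this class is not part of the commit), the marked lines below would be omitted from the coverage report under that configuration:

```python
class Point:
    """A toy class showing which lines .coveragerc would exclude."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __repr__(self):  # excluded: matches the "def __repr__" pattern
        return f"Point({self.x}, {self.y})"

    def area(self):
        # excluded: matches the "raise NotImplementedError" pattern
        raise NotImplementedError

    def debug(self):  # pragma: NO COVER -- excluded: matches "pragma: NO COVER"
        print("debugging")


print(repr(Point(1, 2)))
```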

packages/bigframes/.flake8

Lines changed: 33 additions & 0 deletions
```ini
# -*- coding: utf-8 -*-
#
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Generated by synthtool. DO NOT EDIT!
[flake8]
ignore = E203, E231, E266, E501, W503
exclude =
  # Exclude generated code.
  **/proto/**
  **/gapic/**
  **/services/**
  **/types/**
  *_pb2.py

  # Standard linting exemptions.
  **/.nox/**
  __pycache__,
  .git,
  *.pyc,
  conf.py
```
Lines changed: 8 additions & 0 deletions
## Constraints

- Only add git commits. Do not change git history.
- Follow the spec file for development.
- Check off items in the "Acceptance criteria" and "Detailed steps" sections with `[x]`.
- Please do this as they are completed.
- Refer back to the spec after each step.
Lines changed: 9 additions & 0 deletions
## Documentation

If a method or property is implementing the same interface as a third-party
package such as pandas or scikit-learn, place the relevant docstring in the
corresponding `third_party/bigframes_vendored/package_name` directory, not in
the `bigframes` directory. Implementations may be placed in the `bigframes`
directory, though.

@../tools/test_docs.md
Lines changed: 67 additions & 0 deletions
## Adding a scalar operator

For an example, see commit
[c5b7fdae74a22e581f7705bc0cf5390e928f4425](https://github.com/googleapis/python-bigquery-dataframes/commit/c5b7fdae74a22e581f7705bc0cf5390e928f4425).

To add a new scalar operator, follow these steps:

1. **Define the operation dataclass:**
   - In `bigframes/operations/`, find the relevant file (e.g., `geo_ops.py` for geography functions) or create a new one.
   - Create a new dataclass inheriting from `base_ops.UnaryOp` for unary
     operators, `base_ops.BinaryOp` for binary operators, `base_ops.TernaryOp`
     for ternary operators, or `base_ops.NaryOp` for operators with many
     arguments. Note that these categories count the number of column-like
     arguments: a function that takes only a single column but several literal
     values is still a `UnaryOp`.
   - Define the `name` of the operation and any parameters it requires.
   - Implement the `output_type` method to specify the data type of the result.

2. **Export the new operation:**
   - In `bigframes/operations/__init__.py`, import your new operation dataclass and add it to the `__all__` list.

3. **Implement the user-facing function (pandas-like):**

   - Identify the canonical function from pandas, geopandas, awkward array, or
     another popular Python package that this operator implements.
   - Find the corresponding class in BigFrames. For example, the implementation
     for most `geopandas.GeoSeries` methods is in
     `bigframes/geopandas/geoseries.py`. Pandas `Series` methods are implemented
     in `bigframes/series.py` or one of the accessors, such as `StringMethods`
     in `bigframes/operations/strings.py`.
   - Create the user-facing function that will be called by users (e.g., `length`).
   - If the SQL method differs from pandas or geopandas in a way that can't be
     reconciled, raise a `NotImplementedError` with an appropriate message and a
     link to the feedback form.
   - Add the docstring to the corresponding file in
     `third_party/bigframes_vendored`, modeled after pandas / geopandas.

4. **Implement the user-facing function (SQL-like):**

   - In `bigframes/bigquery/_operations/`, find the relevant file (e.g., `geo.py`) or create a new one.
   - Create the user-facing function that will be called by users (e.g., `st_length`).
   - This function should take a `Series` for any column-like inputs, plus any other parameters.
   - Inside the function, call `series._apply_unary_op`,
     `series._apply_binary_op`, or similar, passing the operation dataclass you
     created.
   - Add a comprehensive docstring with examples.
   - In `bigframes/bigquery/__init__.py`, import your new user-facing function and add it to the `__all__` list.

5. **Implement the compilation logic:**
   - In `bigframes/core/compile/scalar_op_compiler.py`:
     - If the BigQuery function has a direct equivalent in Ibis, you can often reuse an existing Ibis method.
     - If not, define a new Ibis UDF using `@ibis_udf.scalar.builtin` to map to the specific BigQuery function signature.
   - Create a new compiler implementation function (e.g., `geo_length_op_impl`).
   - Register this function to your operation dataclass using `@scalar_op_compiler.register_unary_op` or `@scalar_op_compiler.register_binary_op`.
   - This implementation will translate the BigQuery DataFrames operation into the appropriate Ibis expression.

6. **Add Tests:**
   - Add system tests in the `tests/system/` directory to verify the end-to-end
     functionality of the new operator. Test various inputs, including edge cases
     and `NULL` values.

     Where possible, run the same test code against pandas or GeoPandas and
     compare that the outputs are the same (except for dtypes where BigFrames
     differs from pandas).
   - If you are overriding a pandas or GeoPandas property, add a unit test to
     ensure the correct behavior (e.g., raising `NotImplementedError` if the
     functionality is not supported).
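Step 1 above can be sketched as follows. This is a hypothetical, self-contained illustration of the dataclass pattern, not the actual bigframes code: the real base classes live under `bigframes/operations/`, and the `GeoLengthOp` name and dtype string here are assumptions for the example.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class UnaryOp:
    """Stand-in for base_ops.UnaryOp: an operation on one column-like input."""

    def output_type(self, *input_types):
        raise NotImplementedError


@dataclasses.dataclass(frozen=True)
class GeoLengthOp(UnaryOp):
    """Hypothetical operation dataclass: name it, then declare the result dtype."""

    name: str = "geo_length"

    def output_type(self, *input_types):
        # ST_LENGTH returns a floating-point length regardless of the
        # input geography, so the result dtype is fixed.
        return "Float64"


op = GeoLengthOp()
print(op.name, op.output_type("Geography"))  # → geo_length Float64
```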
Lines changed: 18 additions & 0 deletions
## Code Style with nox

- We use the automatic code formatter `black`. You can run it using
  the nox session `format`. This will eliminate many lint errors. Run via:

  ```bash
  nox -r -s format
  ```

- PEP8 compliance is required, with exceptions defined in the linter configuration.
  If you have `nox` installed, you can test that you have not introduced
  any non-compliant code via:

  ```bash
  nox -r -s lint
  ```

- When writing tests, use the idiomatic "pytest" style.
Lines changed: 10 additions & 0 deletions
## Testing code samples

Code samples are very important for accurate documentation. We use the "doctest"
framework to ensure the samples are functioning as expected. After adding a code
sample, please ensure it is correct by running doctest. To run the doctests
for just a single method, refer to the following example:

```bash
pytest --doctest-modules bigframes/pandas/__init__.py::bigframes.pandas.cut
```
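As a minimal illustration of the doctest style (a hypothetical helper, not part of this commit), the interactive examples embedded in a docstring double as tests when run under `--doctest-modules` or the standard-library `doctest` module:

```python
import doctest


def add(a: int, b: int) -> int:
    """Return the sum of two integers.

    Examples that doctest will execute and check:

    >>> add(2, 3)
    5
    >>> add(-1, 1)
    0
    """
    return a + b


# Run the docstring examples in this module; a failure count of 0 means
# every sample produced exactly the output shown in its docstring.
results = doctest.testmod()
print(results.failed)  # → 0
```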
Lines changed: 28 additions & 0 deletions
## Testing with nox

Use `nox` to instrument our tests.

- To test your changes, run unit tests with `nox`:

  ```bash
  nox -r -s unit
  ```

- To run a single unit test:

  ```bash
  nox -r -s unit-3.14 -- -k <name of test>
  ```

- Ignore this step if you lack access to Google Cloud resources. To run system
  tests, you can execute:

  ```bash
  # Run all system tests
  nox -r -s system

  # Run a single system test
  nox -r -s system-3.14 -- -k <name of test>
  ```

- After each change, the codebase must have better coverage than it had
  previously. You can test coverage via `nox -s unit system cover` (takes a long
  time). Omit `system` if you lack access to cloud resources.
