Merged

test #25

11 changes: 9 additions & 2 deletions data/analytics/ducky.py
@@ -1,3 +1,10 @@
import duckdb
import duckdb
medium

There is trailing whitespace on this line. It's good practice to remove it to keep the code style clean and consistent, as recommended by PEP 8.

Suggested change
import duckdb
import duckdb
References
  1. PEP 8, the style guide for Python code, recommends avoiding extraneous whitespace, including trailing whitespace.
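
One way to catch this kind of issue before review is a small check over the file's text. The following is a minimal sketch, not part of this PR; the function name `lines_with_trailing_whitespace` is hypothetical.

```python
def lines_with_trailing_whitespace(source: str) -> list[int]:
    """Return the 1-based numbers of lines that end in a space or tab."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        # A line has trailing whitespace if stripping spaces/tabs changes it.
        if line != line.rstrip(" \t")
    ]
```

For instance, passing the text of `data/analytics/ducky.py` would report line 1 if `import duckdb` is followed by a stray space.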


duckdb.sql("SELECT 42 FROM data/2026-01-14/hey.parquet")
df = duckdb.read_parquet("../2026-01-14/hey.parquet")
medium

The file path ../2026-01-14/hey.parquet is hardcoded. This makes the script less flexible and harder to reuse. Consider passing the file path as a command-line argument (e.g., using argparse) or reading it from an environment variable. This will make the script more portable and easier to use in different environments or with different data files.
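
A minimal sketch of that suggestion, assuming a `--path` flag and a `PARQUET_PATH` environment variable (both names are hypothetical and not part of this PR). The command-line flag takes precedence, then the environment variable, then the path currently hardcoded in the script.

```python
import argparse
import os

# Hypothetical fallback: the path that is currently hardcoded in the script.
DEFAULT_PARQUET_PATH = "../2026-01-14/hey.parquet"


def resolve_parquet_path(argv=None, environ=None):
    """Return the path from --path, else $PARQUET_PATH, else the default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(description="Summarise a Parquet file.")
    parser.add_argument(
        "--path",
        # The environment variable (assumed name) supplies the default value.
        default=environ.get("PARQUET_PATH", DEFAULT_PARQUET_PATH),
        help="Parquet file to read",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv).path
```

The script could then call `duckdb.read_parquet(resolve_parquet_path())` in place of the literal path.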


duckdb.sql("DESCRIBE SELECT * FROM df").show()

duckdb.sql("SELECT language, COUNT(language) AS c_p \
FROM df \
GROUP BY language \
ORDER BY c_p DESC").show()
Comment on lines +7 to +10
medium

For multi-line strings, especially SQL queries, triple quotes (""" or ''') are generally more readable and maintainable than a backslash \ for line continuation. This is a common Python idiom and fits PEP 8's emphasis on readability.

For example:

duckdb.sql("""
    SELECT language, COUNT(language) AS c_p
    FROM df
    GROUP BY language
    ORDER BY c_p DESC
""").show()
References
  1. PEP 8 emphasizes readability. While not a strict rule against backslash continuation for strings, using triple-quoted strings for multi-line strings is a widely accepted convention for improving readability.
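
Beyond readability, backslash continuation has a practical pitfall: inside a string literal the backslash and the newline are both dropped, so the only thing separating SQL tokens across lines is the space before each backslash. The sketch below illustrates this with a shortened version of the query; the variable names are hypothetical.

```python
import textwrap

# With a space before the backslash, the joined string is valid SQL.
with_space = "SELECT language, COUNT(language) AS c_p \
FROM df"

# Without that space, the alias and the keyword run together ("c_pFROM").
without_space = "SELECT language, COUNT(language) AS c_p\
FROM df"

# A triple-quoted string keeps the newlines, so tokens stay separated.
triple_quoted = textwrap.dedent("""
    SELECT language, COUNT(language) AS c_p
    FROM df
""").strip()

print(with_space)
print(without_space)
print(triple_quoted)
```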

medium

The file is missing a newline character at the end. Ending files with a newline is a common convention, and POSIX defines a text line as terminated by a newline. This avoids problems with file concatenation and some command-line tools. PEP 8 also recommends this.

References
  1. PEP 8 recommends that all files should end in a single newline character.
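
A minimal sketch of how this could be checked on the raw file bytes; the function name `ends_with_single_newline` is hypothetical, and the sketch treats an empty file as failing the check.

```python
def ends_with_single_newline(data: bytes) -> bool:
    """True if the content ends in exactly one newline character."""
    # Reject both a missing final newline and extra blank lines at the end.
    return data.endswith(b"\n") and not data.endswith(b"\n\n")
```

It could be applied with something like `ends_with_single_newline(Path("data/analytics/ducky.py").read_bytes())`, using `pathlib.Path` from the standard library.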

5 changes: 2 additions & 3 deletions worker.py
@@ -123,15 +123,14 @@ def get_github_data(self, start_in_repo_num: int = 0, batch_size: int = 500, git

except Exception as validation_error:
print(f"Validation error for repo {github_data_points.get('full_name')}: {validation_error}")
print("Skipping this repo and continuing...")
print("Skipping this repo and continuing")
continue

remaining_api_calls = github_instance.rate_limiting
remaining = remaining_api_calls[0]

if remaining_api_calls == 1:
if remaining == 2:
medium

Using 2 as a hardcoded value here is a 'magic number'. It would be better to define it as a constant with a descriptive name (e.g., API_RATE_LIMIT_THRESHOLD = 2) at a more global scope to improve readability and maintainability. Additionally, using <= is more robust than == for this check. It ensures that the loop will break even if the number of remaining calls happens to skip the exact value of 2 (e.g., by dropping from 3 to 1).

Suggested change
if remaining == 2:
if remaining <= 2:
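
Both points of the comment can be sketched together as follows. This is an illustration under assumptions, not the code in worker.py: the constant name comes from the comment's own example, and the helper `should_stop` is hypothetical.

```python
# Named constant instead of a magic number (name taken from the review comment).
API_RATE_LIMIT_THRESHOLD = 2


def should_stop(remaining: int, threshold: int = API_RATE_LIMIT_THRESHOLD) -> bool:
    """Stop once the remaining API calls are at or below the threshold."""
    # <= still triggers if the counter skips the exact threshold value.
    return remaining <= threshold


# Simulated counter that drops from 3 straight to 1, skipping 2.
observed = [5, 3, 1]
print([remaining == API_RATE_LIMIT_THRESHOLD for remaining in observed])
print([should_stop(remaining) for remaining in observed])
```

With `==`, none of the simulated readings match, so the loop would keep going; with `<=`, the last reading triggers the stop. In the diff above, `remaining` is the first element of `github_instance.rate_limiting`, so the check would become `if should_stop(remaining):`.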

print(f"Reached batch size limit of {batch_size}")

break

# # start_in_repo_num = counter