From 090d586c2ccc05495e4c5ef27839cf6318c52493 Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 09:31:09 +0000
Subject: [PATCH 01/11] Fix a variable name

---
 sources/academy/platform/getting_started/apify_client.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sources/academy/platform/getting_started/apify_client.md b/sources/academy/platform/getting_started/apify_client.md
index 9622ee324..ce875609d 100644
--- a/sources/academy/platform/getting_started/apify_client.md
+++ b/sources/academy/platform/getting_started/apify_client.md
@@ -189,7 +189,7 @@ from apify_client import ApifyClient
 
 client = ApifyClient(token='YOUR_TOKEN')
 
-actor = client.actor('YOUR_USERNAME/adding-actor').call(run_input={
+run = client.actor('YOUR_USERNAME/adding-actor').call(run_input={
     'num1': 4,
     'num2': 2
 })

From 813d3a471271ebe1f7aaf740487ef5e42347f793 Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 09:31:34 +0000
Subject: [PATCH 02/11] requirements.txt is not a Python file

---
 sources/academy/tutorials/python/process_data_using_python.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sources/academy/tutorials/python/process_data_using_python.md b/sources/academy/tutorials/python/process_data_using_python.md
index 5e72eaddb..50efade77 100644
--- a/sources/academy/tutorials/python/process_data_using_python.md
+++ b/sources/academy/tutorials/python/process_data_using_python.md
@@ -31,7 +31,7 @@ In the page that opens, you can see your newly created Actor. In the **Settings*
 
 First, we'll start with the `requirements.txt` file. Its purpose is to list all the third-party packages that your Actor will use. We will be using the `pandas` package for parsing the downloaded weather data, and the `matplotlib` package for visualizing it. We don't care about versions of these packages, so we list just their names:
 
-```py
+```text
 # Add your dependencies here.
 # See https://pip.pypa.io/en/latest/cli/pip_install/#requirements-file-format
 # for how to format them

From 21023d559772c5d16900ce2fa3f4927127a98a8c Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 09:34:00 +0000
Subject: [PATCH 03/11] Group commands on process_data_using_python.md

---
 sources/academy/tutorials/python/process_data_using_python.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/sources/academy/tutorials/python/process_data_using_python.md b/sources/academy/tutorials/python/process_data_using_python.md
index 50efade77..56e6aa197 100644
--- a/sources/academy/tutorials/python/process_data_using_python.md
+++ b/sources/academy/tutorials/python/process_data_using_python.md
@@ -44,6 +44,8 @@ The Actor's main logic will live in the `main.py` file. Let's delete everything
 
 Next, we'll import all the packages we will use in the code:
 
+
+
 ```py
 from io import BytesIO
 import os
@@ -127,6 +129,8 @@ print(f'Result is available at {os.environ["APIFY_API_PUBLIC_BASE_URL"]}'
       + f'/v2/key-value-stores/{os.environ["APIFY_DEFAULT_KEY_VALUE_STORE_ID"]}/records/prediction.png')
 ```
 
+
+
 And that's it! Now you can save the changes in the editor, and then click **Build and run** at the bottom of the page. The Actor will get built, the built Actor image will get saved for future re-use, and then it will be executed. You can follow the progress of the Actor build and the Actor run in the **Last build** and **Last run** tabs, respectively, in the developer console in the Actor source view. Once the Actor finishes running, it will output the URL where you can access the plot we created in its log.
 
 ![Building and running the BBC Weather Parser Actor](./images/bbc-weather-parser-source.png)

From 09e511b552c50f9a30f0f40b8e77d7bfd0478cec Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:00:13 +0000
Subject: [PATCH 04/11] Change a few requirements.txt file definitions to be
 'text' rather than Python formatting

---
 sources/academy/tutorials/python/scrape_data_python.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sources/academy/tutorials/python/scrape_data_python.md b/sources/academy/tutorials/python/scrape_data_python.md
index df8dfcdbf..b67fc6e44 100644
--- a/sources/academy/tutorials/python/scrape_data_python.md
+++ b/sources/academy/tutorials/python/scrape_data_python.md
@@ -63,7 +63,7 @@ In the page that opens, you can see your newly created Actor. In the **Settings*
 
 First we'll start with the `requirements.txt` file. Its purpose is to list all the third-party packages that your Actor will use. We will be using the `requests` package for downloading the BBC Weather pages, and the `beautifulsoup4` package for parsing and processing the downloaded pages. We don't care about versions of these packages, so we list just their names:
 
-```py
+```text
 # Add your dependencies here.
 # See https://pip.pypa.io/en/latest/cli/pip_install/#requirements-file-format
 # for how to format them
@@ -231,7 +231,7 @@ In the page that opens, you can see your newly created Actor. In the **Settings*
 
 First, we'll start with the `requirements.txt` file. Its purpose is to list all the third-party packages that your Actor will use. We will be using the `pandas` package for parsing the downloaded weather data, and the `matplotlib` package for visualizing it. We don't care about versions of these packages, so we list just their names:
 
-```py
+```text
 # Add your dependencies here.
 # See https://pip.pypa.io/en/latest/cli/pip_install/#requirements-file-format
 # for how to format them

From dbad94a8cfccd584ff24114ed84c9f725e469470 Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:00:43 +0000
Subject: [PATCH 05/11] Add a few doccmd groups

---
 sources/academy/tutorials/python/scrape_data_python.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/sources/academy/tutorials/python/scrape_data_python.md b/sources/academy/tutorials/python/scrape_data_python.md
index b67fc6e44..4f042c4a8 100644
--- a/sources/academy/tutorials/python/scrape_data_python.md
+++ b/sources/academy/tutorials/python/scrape_data_python.md
@@ -78,6 +78,8 @@ Finally, we can get to writing the main logic for the Actor, which will live in
 
 First, we need to import all the packages we will use in the code:
 
+
+
 ```py
 from datetime import datetime, time, timedelta, timezone
 import os
@@ -205,6 +207,8 @@ default_dataset_client.push_items(weather_data)
 
 print(f'Results have been saved to the dataset with ID {os.environ["APIFY_DEFAULT_DATASET_ID"]}')
 ```
 
+
+
 ### Running the Actor
 
 And that's it! Now you can save the changes in the editor, and then click **Build and run** at the bottom of the page. The Actor will get built, the built Actor image will get saved for future reuse, and then it will be executed. You can follow the progress of the Actor build and the Actor run in the **Last build** and **Last run** tabs, respectively, in the developer console in the Actor source view. Once the Actor finishes running, you can view the scraped data in the **Dataset** tab in the Actor run view.
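An aside on the rename in the first patch: `.call()` waits for the run to finish and returns a run object, not an Actor, and that run object is what links a finished run to the **Dataset** tab mentioned above. A minimal sketch of reading those results with the same client — the Actor name here is a placeholder, not the tutorial's real identifier:

```py
from apify_client import ApifyClient

client = ApifyClient(token='YOUR_TOKEN')

# .call() blocks until the run finishes and returns the run object,
# which is why `run` is a better name for this variable than `actor`.
run = client.actor('YOUR_USERNAME/bbc-weather-scraper').call()

# The run object carries the ID of the default dataset, so the items
# shown in the Dataset tab can also be fetched programmatically.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(f'Scraped {len(items)} records')
```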
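The bare `+` lines in these hunks add doccmd grouping markers — HTML comments whose text does not survive in this rendering of the patch. `doccmd` runs a checker over each fenced code block of a documentation file in isolation; grouping presumably tells it to treat consecutive blocks as one unit, which matters here because the blocks that follow the imports block only make sense together with it. A sketch of the failure mode the grouping avoids, with illustrative names:

```py
# Checked together (grouped), these two lesson snippets form one valid
# program. Checked in isolation, the second snippet would trip an
# undefined-name check (e.g. ruff's F821), because its imports live in
# the first snippet.

# Snippet 1: the imports block the hunks wrap with a group marker.
from io import BytesIO
import os

# Snippet 2: later code relying on those imports.
buffer = BytesIO()
store_id = os.environ.get('APIFY_DEFAULT_KEY_VALUE_STORE_ID', 'unset')
print(store_id, buffer.getvalue())
```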
@@ -244,6 +248,8 @@ The Actor's main logic will live in the `main.py` file. Let's delete everything
 
 Next, we'll import all the packages we will use in the code:
 
+
+
 ```py
 from io import BytesIO
 import os
@@ -327,6 +333,8 @@ print(f'Result is available at {os.environ["APIFY_API_PUBLIC_BASE_URL"]}'
       + f'/v2/key-value-stores/{os.environ["APIFY_DEFAULT_KEY_VALUE_STORE_ID"]}/records/prediction.png')
 ```
 
+
+
 And that's it! Now you can save the changes in the editor, and then click **Build and run** at the bottom of the page. The Actor will get built, the built Actor image will get saved for future re-use, and then it will be executed. You can follow the progress of the Actor build and the Actor run in the **Last build** and **Last run** tabs, respectively, in the developer console in the Actor source view. Once the Actor finishes running, it will output the URL where you can access the plot we created in its log.
 
 ![Building and running the BBC Weather Parser Actor](./images/bbc-weather-parser-source.png)

From 3933d6effe755d8d2cbef510465181a3ccde9f94 Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:01:29 +0000
Subject: [PATCH 06/11] Change the rendering of a Python REPL from py to pycon

---
 .../scraping_basics_javascript2/07_extracting_data.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md b/sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md
index 5c9fb9554..383637020 100644
--- a/sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md
+++ b/sources/academy/webscraping/scraping_basics_javascript2/07_extracting_data.md
@@ -148,7 +148,7 @@ if (priceText.startsWith("From ")) {
 
 Great! Only if we didn't overlook an important pitfall called [floating-point error](https://en.wikipedia.org/wiki/Floating-point_error_mitigation). In short, computers save floating point numbers in a way which isn't always reliable:
 
-```py
+```pycon
 > 0.1 + 0.2
 0.30000000000000004
 ```

From e38f9e980c0fae036e1e6eb9c087cb8f6362ceb9 Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:14:26 +0000
Subject: [PATCH 07/11] Use an import to satisfy ruff

---
 .../webscraping/scraping_basics_python/04_downloading_html.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sources/academy/webscraping/scraping_basics_python/04_downloading_html.md b/sources/academy/webscraping/scraping_basics_python/04_downloading_html.md
index e3866cfcb..6441eca53 100644
--- a/sources/academy/webscraping/scraping_basics_python/04_downloading_html.md
+++ b/sources/academy/webscraping/scraping_basics_python/04_downloading_html.md
@@ -34,7 +34,7 @@ Now let's test that all works. Inside the project directory we'll create a new f
 
 ```py
 import httpx
 
-print("OK")
+print("OK", httpx.__version__)
 ```
 
 Running it as a Python program will verify that our setup is okay and we've installed HTTPX:

From 95ffe17acd345582c13da0907a20dfc8c8a6074a Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:19:20 +0000
Subject: [PATCH 08/11] Add a few doccmd groups

---
 .../webscraping/scraping_basics_python/05_parsing_html.md | 4 ++++
 .../scraping_basics_python/06_locating_elements.md        | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/sources/academy/webscraping/scraping_basics_python/05_parsing_html.md b/sources/academy/webscraping/scraping_basics_python/05_parsing_html.md
index dbfa52cb9..b8458a48c 100644
--- a/sources/academy/webscraping/scraping_basics_python/05_parsing_html.md
+++ b/sources/academy/webscraping/scraping_basics_python/05_parsing_html.md
@@ -46,6 +46,8 @@ Now let's use it for parsing the HTML. The `BeautifulSoup` object allows us to w
 
 We'll update our code to the following:
 
+
+
 ```py
 import httpx
 from bs4 import BeautifulSoup
@@ -74,6 +76,8 @@ first_heading = headings[0]
 print(first_heading.text)
 ```
 
+
+
 If we run our scraper again, it prints the text of the first `h1` element:
 
 ```text
diff --git a/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md b/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md
index 0708dc071..290c7a277 100644
--- a/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md
+++ b/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md
@@ -164,6 +164,8 @@ We can use Beautiful Soup's `.contents` property to access individual nodes. It
 
 It seems like we can read the last element to get the actual amount. Let's fix our program:
 
+
+
 ```py
 import httpx
 from bs4 import BeautifulSoup
@@ -198,6 +200,8 @@ The results seem to be correct, but they're hard to verify because the prices vi
     print(title, price, sep=" | ")
 ```
 
+
+
 The output is much nicer this way:
 
 ```text

From 5c0428a96eb23758cd034feda092aba03c26815b Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:19:37 +0000
Subject: [PATCH 09/11] Mark Python output as text, not Python

---
 .../webscraping/scraping_basics_python/06_locating_elements.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md b/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md
index 290c7a277..c0307592c 100644
--- a/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md
+++ b/sources/academy/webscraping/scraping_basics_python/06_locating_elements.md
@@ -158,7 +158,7 @@ When translated to a tree of Python objects, the element with class `price` will
 
 We can use Beautiful Soup's `.contents` property to access individual nodes. It returns a list of nodes like this:
 
-```py
+```text
 ["\n", <span class="visually-hidden">Sale price</span>, "$74.95"]
 ```

From b7f96eefb41c3a804edf0c2248e995a08114c375 Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:25:53 +0000
Subject: [PATCH 10/11] Split some imports onto separate lines

---
 sources/platform/proxy/datacenter_proxy.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/sources/platform/proxy/datacenter_proxy.md b/sources/platform/proxy/datacenter_proxy.md
index 1cf81cb78..04ed35e53 100644
--- a/sources/platform/proxy/datacenter_proxy.md
+++ b/sources/platform/proxy/datacenter_proxy.md
@@ -118,7 +118,8 @@ await Actor.exit();
 ```python
 from apify import Actor
-import requests, asyncio
+import asyncio
+import requests
 
 async def main():
     async with Actor:
@@ -258,7 +259,8 @@ await Actor.exit();
 ```python
 from apify import Actor
-import requests, asyncio
+import asyncio
+import requests
 
 async def main():
     async with Actor:

From b90a58c180c2ecb062784e9889bd55e0da03f87a Mon Sep 17 00:00:00 2001
From: Adam Dangoor
Date: Fri, 5 Dec 2025 10:51:04 +0000
Subject: [PATCH 11/11] Use more doccmd groups

---
 .../scraping_basics_python/07_extracting_data.md | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/sources/academy/webscraping/scraping_basics_python/07_extracting_data.md b/sources/academy/webscraping/scraping_basics_python/07_extracting_data.md
index eb49b7ce6..7724b099c 100644
--- a/sources/academy/webscraping/scraping_basics_python/07_extracting_data.md
+++ b/sources/academy/webscraping/scraping_basics_python/07_extracting_data.md
@@ -34,6 +34,12 @@ It's because some products have variants with different prices. Later in the cou
 
 Ideally we'd go and discuss the problem with those who are about to use the resulting data. For their purposes, is the fact that some prices are just minimum prices important? What would be the most useful representation of the range for them? Maybe they'd tell us that it's okay if we just remove the `From` prefix?
 
+
+
+
+
 ```py
 price_text = product.select_one(".price").contents[-1]
 price = price_text.removeprefix("From ")
@@ -51,6 +57,8 @@ else:
     price = min_price
 ```
 
+
+
 :::tip Built-in string methods
 
 If you're not proficient in Python's string methods, [.startswith()](https://docs.python.org/3/library/stdtypes.html#str.startswith) checks the beginning of a given string, and [.removeprefix()](https://docs.python.org/3/library/stdtypes.html#str.removeprefix) removes something from the beginning of a given string.
 
@@ -59,6 +67,8 @@ If you're not proficient in Python's string methods, [.startswith()](https://doc
 
 :::
 
 The whole program would look like this:
 
+
+
 ```py
 import httpx
 from bs4 import BeautifulSoup
@@ -112,7 +122,7 @@ These might be useful in some complex scenarios, but in our case, they won't mak
 
 We got rid of the `From` and possible whitespace, but we still can't save the price as a number in our Python program:
 
-```py
+```pycon
 >>> price = "$1,998.00"
 >>> float(price)
 Traceback (most recent call last):
@@ -154,7 +164,7 @@
 
 Great! Only if we didn't overlook an important pitfall called [floating-point error](https://en.wikipedia.org/wiki/Floating-point_error_mitigation). In short, computers save floating point numbers in a way which isn't always reliable:
 
-```py
+```pycon
 >>> 0.1 + 0.2
 0.30000000000000004
 ```
 
@@ -174,6 +184,8 @@ price_text = (
 )
 ```
 
+
+
 In this case, removing the dot from the price text is the same as if we multiplied all the numbers with 100, effectively converting dollars to cents. For converting the text to a number we'll use `int()` instead of `float()`. This is how the whole program looks like now:
 
 ```py
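The series is cut off inside this final listing. For the dollars-to-cents conversion the hunk context describes, a self-contained sketch — the sample HTML is a stand-in, since in the lesson `product` comes from looping over the store page's product cards:

```py
from bs4 import BeautifulSoup

# Stand-in for one product card, so the snippet runs on its own.
html = '<div class="price"><span class="visually-hidden">Sale price</span>From $1,998.00</div>'
product = BeautifulSoup(html, "html.parser")

price_text = (
    product
    .select_one(".price")
    .contents[-1]
    .strip()
    .removeprefix("From ")
    .replace("$", "")
    .replace(".", "")
    .replace(",", "")
)
# Dropping the dot turns dollars into integer cents ("1,998.00" -> 199800),
# so int() avoids floating-point error entirely.
price = int(price_text)
print(price)  # 199800
```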