\n",
@@ -1708,86 +1639,101 @@
"7 HackerNews\n",
"8 Medium\n",
"9 HackADay"
- ]
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df_partners\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"partner\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Medium\",\n \"AWS\",\n \"Browserbase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df_partners"
+ }
},
- "execution_count": 11,
"metadata": {},
- "output_type": "execute_result"
+ "output_type": "execute_result",
+ "execution_count": 11
}
],
- "source": [
- "df_partners"
- ]
+ "metadata": {
+ "id": "jNLaHXlEOisi",
+ "colab": {
+ "height": 363,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "0dd7f4dc-9fee-444c-ac1c-a6f829063f76"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Save the DataFrames to CSV files",
+ "outputs": [],
"metadata": {
"id": "v0CBYVk7qA5Z"
},
- "source": [
- "Save the responses to CSV"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "BtEbB9pmQGhO",
- "outputId": "c25648bc-ba04-4e32-e15f-2a650d3ac3ba"
- },
+ "source": "# Save each DataFrame to its own CSV file\ndf_company.to_csv(\"company_info.csv\", index=False)\ndf_founders.to_csv(\"founders.csv\", index=False)\ndf_pricing.to_csv(\"pricing_plans.csv\", index=False)\ndf_partners.to_csv(\"partners.csv\", index=False)\n# Print confirmation\nprint(\"Data saved to CSV files\")\n",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Data saved to CSV files\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "# Save the DataFrames to a CSV file\n",
- "df_company.to_csv(\"company_info.csv\", index=False)\n",
- "df_founders.to_csv(\"founders.csv\", index=False)\n",
- "df_pricing.to_csv(\"pricing_plans.csv\", index=False)\n",
- "df_partners.to_csv(\"partners.csv\", index=False)\n",
- "# Print confirmation\n",
- "print(\"Data saved to CSV files\")\n"
- ]
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c25648bc-ba04-4e32-e15f-2a650d3ac3ba"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## 🔗 Resources",
+ "outputs": [],
"metadata": {
"id": "-1SZT8VzTZNd"
},
- "source": [
- "## 🔗 Resources"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "\n\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
"metadata": {
"id": "dUi2LtMLRDDR"
},
- "source": [
- "\n",
- "
\n",
- "\n",
- "\n",
- "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
- "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
- "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
- "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
- "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
- "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
- "\n",
- "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
}
],
"metadata": {
@@ -1795,8 +1741,8 @@
"provenance": []
},
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "name": "python3",
+ "display_name": "Python 3"
},
"language_info": {
"name": "python"
@@ -1804,4 +1750,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
-}
+}
\ No newline at end of file
diff --git a/cookbook/company-info/scrapegraph_sdk.ipynb b/cookbook/company-info/scrapegraph_sdk.ipynb
index c66398a..1a8e450 100644
--- a/cookbook/company-info/scrapegraph_sdk.ipynb
+++ b/cookbook/company-info/scrapegraph_sdk.ipynb
@@ -1 +1,1888 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"collapsed_sections":["IzsyDXEWwPVt"],"authorship_tag":"ABX9TyO57uo4LpNqAm10rmE0B6Q5"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["
"],"metadata":{"id":"ReBHQ5_834pZ"}},{"cell_type":"markdown","source":["## 🕷️ Extract Company Info with Official Scrapegraph SDK"],"metadata":{"id":"jEkuKbcRrPcK"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"3Q5VM3SsRlxO"}},{"cell_type":"markdown","source":["### 🔧 Install `dependencies`"],"metadata":{"id":"IzsyDXEWwPVt"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","source":["### 🔑 Import `ScrapeGraph` API key"],"metadata":{"id":"apBsL-L2KzM7"}},{"cell_type":"markdown","source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"],"metadata":{"id":"ol9gQbAFkh9b"}},{"cell_type":"code","source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","executionInfo":{"status":"ok","timestamp":1734532300517,"user_tz":-60,"elapsed":6877,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"f6b837cd-0f00-49cc-cb6f-f2bca57544f5"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SGAI_API_KEY not found in environment.\n","Please enter your SGAI_API_KEY: ··········\n","SGAI_API_KEY has been set in the environment.\n"]}]},{"cell_type":"markdown","source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"],"metadata":{"id":"jnqMB2-xVYQ7"}},{"cell_type":"markdown","source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","
\n"],"metadata":{"id":"VZvxbjfXvbgd"}},{"cell_type":"code","source":["from pydantic import BaseModel, Field\n","from typing import List, Dict, Optional\n","\n","# Schema for founder information\n","class FounderSchema(BaseModel):\n"," name: str = Field(description=\"Name of the founder\")\n"," role: str = Field(description=\"Role of the founder in the company\")\n"," linkedin: str = Field(description=\"LinkedIn profile of the founder\")\n","\n","# Schema for pricing plans\n","class PricingPlanSchema(BaseModel):\n"," tier: str = Field(description=\"Name of the pricing tier\")\n"," price: str = Field(description=\"Price of the plan\")\n"," credits: int = Field(description=\"Number of credits included in the plan\")\n","\n","# Schema for social links\n","class SocialLinksSchema(BaseModel):\n"," linkedin: str = Field(description=\"LinkedIn page of the company\")\n"," twitter: str = Field(description=\"Twitter page of the company\")\n"," github: str = Field(description=\"GitHub page of the company\")\n","\n","# Schema for company information\n","class CompanyInfoSchema(BaseModel):\n"," company_name: str = Field(description=\"Name of the company\")\n"," description: str = Field(description=\"Brief description of the company\")\n"," founders: List[FounderSchema] = Field(description=\"List of company founders\")\n"," logo: str = Field(description=\"Logo URL of the company\")\n"," partners: List[str] = Field(description=\"List of company partners\")\n"," pricing_plans: List[PricingPlanSchema] = Field(description=\"Details of pricing plans\")\n"," contact_emails: List[str] = Field(description=\"Contact emails of the company\")\n"," social_links: SocialLinksSchema = Field(description=\"Social links of the company\")\n"," privacy_policy: str = Field(description=\"URL to the privacy policy\")\n"," terms_of_service: str = Field(description=\"URL to the terms of service\")\n"," api_status: str = Field(description=\"API status page 
URL\")"],"metadata":{"id":"dlrOEgZk_8V4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🚀 Initialize `SGAI Client` and start extraction"],"metadata":{"id":"cDGH0b2DkY63"}},{"cell_type":"markdown","source":["Initialize the client for scraping (there's also an async version [here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"],"metadata":{"id":"4SLJgXgcob6L"}},{"cell_type":"code","source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = Client(api_key=sgai_api_key)"],"metadata":{"id":"PQI25GZvoCSk"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"],"metadata":{"id":"M1KSXffZopUD"}},{"cell_type":"code","source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://scrapegraphai.com/\",\n"," user_prompt=\"Extract info about the company\",\n"," output_schema=CompanyInfoSchema,\n",")"],"metadata":{"id":"2FIKomclLNFx"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Print the response"],"metadata":{"id":"YZz1bqCIpoL8"}},{"cell_type":"code","source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(\"Company Info:\")\n","print(json.dumps(result, 
indent=2))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","executionInfo":{"status":"ok","timestamp":1734532533318,"user_tz":-60,"elapsed":339,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"8d7b2955-1569-4b3a-8ffe-014a8442dd12"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Request ID: 87a7ea1a-9dd4-4d1d-ae76-b419ead57c11\n","Company Info:\n","{\n"," \"company_name\": \"ScrapeGraphAI\",\n"," \"description\": \"ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents. It enables developers to perform intelligent AI scraping and extract structured information from websites using advanced AI techniques.\",\n"," \"founders\": [\n"," {\n"," \"name\": \"\",\n"," \"role\": \"Founder & Technical Lead\",\n"," \"linkedin\": \"https://www.linkedin.com/in/perinim/\"\n"," },\n"," {\n"," \"name\": \"Marco Vinciguerra\",\n"," \"role\": \"Founder & Software Engineer\",\n"," \"linkedin\": \"https://www.linkedin.com/in/marco-vinciguerra-7ba365242/\"\n"," },\n"," {\n"," \"name\": \"Lorenzo Padoan\",\n"," \"role\": \"Founder & Product Engineer\",\n"," \"linkedin\": \"https://www.linkedin.com/in/lorenzo-padoan-4521a2154/\"\n"," }\n"," ],\n"," \"logo\": \"https://scrapegraphai.com/images/scrapegraphai_logo.svg\",\n"," \"partners\": [\n"," \"PostHog\",\n"," \"AWS\",\n"," \"NVIDIA\",\n"," \"JinaAI\",\n"," \"DagWorks\",\n"," \"Browserbase\",\n"," \"ScrapeDo\",\n"," \"HackerNews\",\n"," \"Medium\",\n"," \"HackADay\"\n"," ],\n"," \"pricing_plans\": [\n"," {\n"," \"tier\": \"Free\",\n"," \"price\": \"$0\",\n"," \"credits\": 100\n"," },\n"," {\n"," \"tier\": \"Starter\",\n"," \"price\": \"$20/month\",\n"," \"credits\": 5000\n"," },\n"," {\n"," \"tier\": \"Growth\",\n"," \"price\": \"$100/month\",\n"," \"credits\": 40000\n"," },\n"," {\n"," \"tier\": \"Pro\",\n"," \"price\": \"$500/month\",\n"," \"credits\": 250000\n"," 
}\n"," ],\n"," \"contact_emails\": [\n"," \"contact@scrapegraphai.com\"\n"," ],\n"," \"social_links\": {\n"," \"linkedin\": \"https://www.linkedin.com/company/101881123\",\n"," \"twitter\": \"https://x.com/scrapegraphai\",\n"," \"github\": \"https://github.com/ScrapeGraphAI/Scrapegraph-ai\"\n"," },\n"," \"privacy_policy\": \"https://scrapegraphai.com/privacy\",\n"," \"terms_of_service\": \"https://scrapegraphai.com/terms\",\n"," \"api_status\": \"https://scrapegraphapi.openstatus.dev\"\n","}\n"]}]},{"cell_type":"markdown","source":["### 💾 Save the output to a `CSV` file"],"metadata":{"id":"2as65QLypwdb"}},{"cell_type":"markdown","source":["Let's create a pandas dataframe and show the tables with the extracted content"],"metadata":{"id":"HTLVFgbVLLBR"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Flatten and save main company information\n","company_info = {\n"," \"company_name\": result[\"company_name\"],\n"," \"description\": result[\"description\"],\n"," \"logo\": result[\"logo\"],\n"," \"contact_emails\": \", \".join(result[\"contact_emails\"]),\n"," \"privacy_policy\": result[\"privacy_policy\"],\n"," \"terms_of_service\": result[\"terms_of_service\"],\n"," \"api_status\": result[\"api_status\"],\n"," \"linkedin\": result[\"social_links\"][\"linkedin\"],\n"," \"twitter\": result[\"social_links\"][\"twitter\"],\n"," \"github\": result[\"social_links\"].get(\"github\", None)\n","}\n","\n","# Creating dataframes\n","df_company = pd.DataFrame([company_info])\n","df_founders = pd.DataFrame(result[\"founders\"])\n","df_pricing = pd.DataFrame(result[\"pricing_plans\"])\n","df_partners = pd.DataFrame({\"partner\": result[\"partners\"]})"],"metadata":{"id":"1lS9O1KOI51y"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Show flattened 
tables"],"metadata":{"id":"JJI9huPkOY9t"}},{"cell_type":"code","source":["df_company"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":153},"id":"vZs8ZutKOT63","executionInfo":{"status":"ok","timestamp":1734533012061,"user_tz":-60,"elapsed":199,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"1278a9b9-2ab8-4150-8d37-328d4eb27e49"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" company_name description \\\n","0 ScrapeGraphAI ScrapeGraphAI is a powerful AI scraping API de... \n","\n"," logo \\\n","0 https://scrapegraphai.com/images/scrapegraphai... \n","\n"," contact_emails privacy_policy \\\n","0 contact@scrapegraphai.com https://scrapegraphai.com/privacy \n","\n"," terms_of_service api_status \\\n","0 https://scrapegraphai.com/terms https://scrapegraphapi.openstatus.dev \n","\n"," linkedin twitter \\\n","0 https://www.linkedin.com/company/101881123 https://x.com/scrapegraphai \n","\n"," github \n","0 https://github.com/ScrapeGraphAI/Scrapegraph-ai "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_company","summary":"{\n \"name\": \"df_company\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"company_name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"ScrapeGraphAI\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents. It enables developers to perform intelligent AI scraping and extract structured information from websites using advanced AI techniques.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"logo\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/images/scrapegraphai_logo.svg\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"contact_emails\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"contact@scrapegraphai.com\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"privacy_policy\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/privacy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"terms_of_service\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/terms\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"api_status\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphapi.openstatus.dev\"\n ],\n \"semantic_type\": \"\",\n 
\"description\": \"\"\n }\n },\n {\n \"column\": \"linkedin\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://www.linkedin.com/company/101881123\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"twitter\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://x.com/scrapegraphai\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"github\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://github.com/ScrapeGraphAI/Scrapegraph-ai\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":10}]},{"cell_type":"code","source":["df_founders"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":143},"id":"QR-fyx5cOetl","executionInfo":{"status":"ok","timestamp":1734533051319,"user_tz":-60,"elapsed":304,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"4b7d55ed-9ef4-44f9-9008-688d734ca820"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" name role \\\n","0 Founder & Technical Lead \n","1 Marco Vinciguerra Founder & Software Engineer \n","2 Lorenzo Padoan Founder & Product Engineer \n","\n"," linkedin \n","0 https://www.linkedin.com/in/perinim/ \n","1 https://www.linkedin.com/in/marco-vinciguerra-... \n","2 https://www.linkedin.com/in/lorenzo-padoan-452... "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_founders","summary":"{\n \"name\": \"df_founders\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"\",\n \"Marco Vinciguerra\",\n \"Lorenzo Padoan\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"role\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Founder & Technical Lead\",\n \"Founder & Software Engineer\",\n \"Founder & Product Engineer\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"linkedin\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"https://www.linkedin.com/in/perinim/\",\n \"https://www.linkedin.com/in/marco-vinciguerra-7ba365242/\",\n \"https://www.linkedin.com/in/lorenzo-padoan-4521a2154/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":11}]},{"cell_type":"code","source":["df_pricing"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":175},"id":"SWpCvl53OgyQ","executionInfo":{"status":"ok","timestamp":1734533059550,"user_tz":-60,"elapsed":312,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"c256f5e5-227a-4df4-da16-d0021aaf03a1"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" tier price credits\n","0 Free $0 100\n","1 Starter $20/month 5000\n","2 Growth $100/month 40000\n","3 Pro $500/month 250000"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_pricing","summary":"{\n \"name\": \"df_pricing\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"tier\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Starter\",\n \"Pro\",\n \"Free\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"price\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"$20/month\",\n \"$500/month\",\n \"$0\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"credits\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 118819,\n \"min\": 100,\n \"max\": 250000,\n \"num_unique_values\": 4,\n \"samples\": [\n 5000,\n 250000,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":12}]},{"cell_type":"code","source":["df_partners"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"jNLaHXlEOisi","executionInfo":{"status":"ok","timestamp":1734533067079,"user_tz":-60,"elapsed":216,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"6f075db5-fc3f-437d-9aaa-d6f8e3085c49"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" partner\n","0 PostHog\n","1 AWS\n","2 NVIDIA\n","3 JinaAI\n","4 DagWorks\n","5 Browserbase\n","6 ScrapeDo\n","7 HackerNews\n","8 Medium\n","9 HackADay"],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df_partners","summary":"{\n \"name\": \"df_partners\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"partner\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Medium\",\n \"AWS\",\n \"Browserbase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":13}]},{"cell_type":"markdown","source":["Save the results to CSV"],"metadata":{"id":"v0CBYVk7qA5Z"}},{"cell_type":"code","source":["# Save the DataFrames to a CSV file\n","df_company.to_csv(\"company_info.csv\", index=False)\n","df_founders.to_csv(\"founders.csv\", index=False)\n","df_pricing.to_csv(\"pricing_plans.csv\", index=False)\n","df_partners.to_csv(\"partners.csv\", index=False)\n","# Print confirmation\n","print(\"Data saved to CSV files\")\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","executionInfo":{"status":"ok","timestamp":1734533092882,"user_tz":-60,"elapsed":213,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"3f05c8ba-7b34-4b53-ab20-bfcc78060557"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to CSV files\n"]}]},{"cell_type":"markdown","source":["## 🔗 Resources"],"metadata":{"id":"-1SZT8VzTZNd"}},{"cell_type":"markdown","source":["\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"],"metadata":{"id":"dUi2LtMLRDDR"}}]}
+{
+ "cells": [
+ {
+ "source": "## 🕷️ Extract Company Info with Official Scrapegraph SDK\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk) [Open in Colab](https://colab.research.google.com/drive/12d7LycLAYO2bFsBo_jtPHXSaIg7AqR3O?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract Company Info with Official Scrapegraph SDK\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk) [](https://colab.research.google.com/drive/12d7LycLAYO2bFsBo_jtPHXSaIg7AqR3O?usp=sharing)",
+ "text/markdown": "## 🕷️ Extract Company Info with Official Scrapegraph SDK\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk) [](https://colab.research.google.com/drive/12d7LycLAYO2bFsBo_jtPHXSaIg7AqR3O?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:04:40.205Z",
+ "executionStartTime": "2026-03-26T00:04:40.205Z"
+ },
+ {
+ "source": "",
+ "outputs": [],
+ "metadata": {
+ "id": "3Q5VM3SsRlxO"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "%%capture\n!pip install scrapegraph-py",
+ "outputs": [],
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "You can find your ScrapeGraph API key in the [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com/).",
+ "outputs": [],
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import getpass\nimport os\n\nif not os.environ.get(\"SGAI_API_KEY\"):\n os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "SGAI_API_KEY not found in environment.\n",
+ "Please enter your SGAI_API_KEY: ··········\n",
+ "SGAI_API_KEY has been set in the environment.\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "f6b837cd-0f00-49cc-cb6f-f2bca57544f5",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 6877,
+ "user_tz": -60,
+ "timestamp": 1734532300517
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n\n",
+ "outputs": [],
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List, Dict, Optional\n\n# Schema for founder information\nclass FounderSchema(BaseModel):\n name: str = Field(description=\"Name of the founder\")\n role: str = Field(description=\"Role of the founder in the company\")\n linkedin: str = Field(description=\"LinkedIn profile of the founder\")\n\n# Schema for pricing plans\nclass PricingPlanSchema(BaseModel):\n tier: str = Field(description=\"Name of the pricing tier\")\n price: str = Field(description=\"Price of the plan\")\n credits: int = Field(description=\"Number of credits included in the plan\")\n\n# Schema for social links\nclass SocialLinksSchema(BaseModel):\n linkedin: str = Field(description=\"LinkedIn page of the company\")\n twitter: str = Field(description=\"Twitter page of the company\")\n github: str = Field(description=\"GitHub page of the company\")\n\n# Schema for company information\nclass CompanyInfoSchema(BaseModel):\n company_name: str = Field(description=\"Name of the company\")\n description: str = Field(description=\"Brief description of the company\")\n founders: List[FounderSchema] = Field(description=\"List of company founders\")\n logo: str = Field(description=\"Logo URL of the company\")\n partners: List[str] = Field(description=\"List of company partners\")\n pricing_plans: List[PricingPlanSchema] = Field(description=\"Details of pricing plans\")\n contact_emails: List[str] = Field(description=\"Contact emails of the company\")\n social_links: SocialLinksSchema = Field(description=\"Social links of the company\")\n privacy_policy: str = Field(description=\"URL to the privacy policy\")\n terms_of_service: str = Field(description=\"URL to the terms of service\")\n api_status: str = Field(description=\"API status page URL\")",
+ "outputs": [],
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🚀 Initialize `SGAI Client` and start extraction",
+ "outputs": [],
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Initialize the client for scraping. An async client is also available; see the [async SmartScraper example](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py).",
+ "outputs": [],
+ "metadata": {
+ "id": "4SLJgXgcob6L"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import os\nfrom scrapegraph_py import Client\n\n# Initialize the client with the API key stored in the SGAI_API_KEY environment variable\nsgai_client = Client(api_key=os.environ[\"SGAI_API_KEY\"])",
+ "outputs": [],
+ "metadata": {
+ "id": "PQI25GZvoCSk"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Here we use the `SmartScraper` service to extract structured data from a webpage with AI.\n\n> If you already have the page as an HTML file, you can upload it and use `LocalScraper` instead.\n",
+ "outputs": [],
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Request structured company information from the ScrapeGraphAI homepage\nrepo_response = sgai_client.smartscraper(\n website_url=\"https://scrapegraphai.com/\",\n user_prompt=\"Extract info about the company\",\n output_schema=CompanyInfoSchema,\n)",
+ "outputs": [],
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Print the response",
+ "outputs": [],
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import json\n\n# Print the response\nrequest_id = repo_response['request_id']\nresult = repo_response['result']\n\nprint(f\"Request ID: {request_id}\")\nprint(\"Company Info:\")\nprint(json.dumps(result, indent=2))",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Request ID: 87a7ea1a-9dd4-4d1d-ae76-b419ead57c11\n",
+ "Company Info:\n",
+ "{\n",
+ " \"company_name\": \"ScrapeGraphAI\",\n",
+ " \"description\": \"ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents. It enables developers to perform intelligent AI scraping and extract structured information from websites using advanced AI techniques.\",\n",
+ " \"founders\": [\n",
+ " {\n",
+ " \"name\": \"\",\n",
+ " \"role\": \"Founder & Technical Lead\",\n",
+ " \"linkedin\": \"https://www.linkedin.com/in/perinim/\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"Marco Vinciguerra\",\n",
+ " \"role\": \"Founder & Software Engineer\",\n",
+ " \"linkedin\": \"https://www.linkedin.com/in/marco-vinciguerra-7ba365242/\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"Lorenzo Padoan\",\n",
+ " \"role\": \"Founder & Product Engineer\",\n",
+ " \"linkedin\": \"https://www.linkedin.com/in/lorenzo-padoan-4521a2154/\"\n",
+ " }\n",
+ " ],\n",
+ " \"logo\": \"https://scrapegraphai.com/images/scrapegraphai_logo.svg\",\n",
+ " \"partners\": [\n",
+ " \"PostHog\",\n",
+ " \"AWS\",\n",
+ " \"NVIDIA\",\n",
+ " \"JinaAI\",\n",
+ " \"DagWorks\",\n",
+ " \"Browserbase\",\n",
+ " \"ScrapeDo\",\n",
+ " \"HackerNews\",\n",
+ " \"Medium\",\n",
+ " \"HackADay\"\n",
+ " ],\n",
+ " \"pricing_plans\": [\n",
+ " {\n",
+ " \"tier\": \"Free\",\n",
+ " \"price\": \"$0\",\n",
+ " \"credits\": 100\n",
+ " },\n",
+ " {\n",
+ " \"tier\": \"Starter\",\n",
+ " \"price\": \"$20/month\",\n",
+ " \"credits\": 5000\n",
+ " },\n",
+ " {\n",
+ " \"tier\": \"Growth\",\n",
+ " \"price\": \"$100/month\",\n",
+ " \"credits\": 40000\n",
+ " },\n",
+ " {\n",
+ " \"tier\": \"Pro\",\n",
+ " \"price\": \"$500/month\",\n",
+ " \"credits\": 250000\n",
+ " }\n",
+ " ],\n",
+ " \"contact_emails\": [\n",
+ " \"contact@scrapegraphai.com\"\n",
+ " ],\n",
+ " \"social_links\": {\n",
+ " \"linkedin\": \"https://www.linkedin.com/company/101881123\",\n",
+ " \"twitter\": \"https://x.com/scrapegraphai\",\n",
+ " \"github\": \"https://github.com/ScrapeGraphAI/Scrapegraph-ai\"\n",
+ " },\n",
+ " \"privacy_policy\": \"https://scrapegraphai.com/privacy\",\n",
+ " \"terms_of_service\": \"https://scrapegraphai.com/terms\",\n",
+ " \"api_status\": \"https://scrapegraphapi.openstatus.dev\"\n",
+ "}\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "8d7b2955-1569-4b3a-8ffe-014a8442dd12",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 339,
+ "user_tz": -60,
+ "timestamp": 1734532533318
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 💾 Save the output to `CSV` files",
+ "outputs": [],
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Let's flatten the result into pandas DataFrames, one per table, and display the extracted content",
+ "outputs": [],
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import pandas as pd\n\n# Flatten the main company information into a single row\ncompany_info = {\n \"company_name\": result[\"company_name\"],\n \"description\": result[\"description\"],\n \"logo\": result[\"logo\"],\n \"contact_emails\": \", \".join(result[\"contact_emails\"]),\n \"privacy_policy\": result[\"privacy_policy\"],\n \"terms_of_service\": result[\"terms_of_service\"],\n \"api_status\": result[\"api_status\"],\n \"linkedin\": result[\"social_links\"][\"linkedin\"],\n \"twitter\": result[\"social_links\"][\"twitter\"],\n \"github\": result[\"social_links\"].get(\"github\", None)\n}\n\n# Create one DataFrame per table\ndf_company = pd.DataFrame([company_info])\ndf_founders = pd.DataFrame(result[\"founders\"])\ndf_pricing = pd.DataFrame(result[\"pricing_plans\"])\ndf_partners = pd.DataFrame({\"partner\": result[\"partners\"]})",
+ "outputs": [],
+ "metadata": {
+ "id": "1lS9O1KOI51y"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Show flattened tables",
+ "outputs": [],
+ "metadata": {
+ "id": "JJI9huPkOY9t"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "df_company",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " company_name description \\\n",
+ "0 ScrapeGraphAI ScrapeGraphAI is a powerful AI scraping API de... \n",
+ "\n",
+ " logo \\\n",
+ "0 https://scrapegraphai.com/images/scrapegraphai... \n",
+ "\n",
+ " contact_emails privacy_policy \\\n",
+ "0 contact@scrapegraphai.com https://scrapegraphai.com/privacy \n",
+ "\n",
+ " terms_of_service api_status \\\n",
+ "0 https://scrapegraphai.com/terms https://scrapegraphapi.openstatus.dev \n",
+ "\n",
+ " linkedin twitter \\\n",
+ "0 https://www.linkedin.com/company/101881123 https://x.com/scrapegraphai \n",
+ "\n",
+ " github \n",
+ "0 https://github.com/ScrapeGraphAI/Scrapegraph-ai "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df_company\",\n \"rows\": 1,\n \"fields\": [\n {\n \"column\": \"company_name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"ScrapeGraphAI\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"ScrapeGraphAI is a powerful AI scraping API designed for efficient web data extraction to power LLM applications and AI agents. It enables developers to perform intelligent AI scraping and extract structured information from websites using advanced AI techniques.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"logo\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/images/scrapegraphai_logo.svg\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"contact_emails\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"contact@scrapegraphai.com\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"privacy_policy\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/privacy\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"terms_of_service\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphai.com/terms\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"api_status\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://scrapegraphapi.openstatus.dev\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"linkedin\",\n \"properties\": {\n \"dtype\": \"string\",\n 
\"num_unique_values\": 1,\n \"samples\": [\n \"https://www.linkedin.com/company/101881123\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"twitter\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://x.com/scrapegraphai\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"github\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"https://github.com/ScrapeGraphAI/Scrapegraph-ai\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df_company"
+ }
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 10
+ }
+ ],
+ "metadata": {
+ "id": "vZs8ZutKOT63",
+ "colab": {
+ "height": 153,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "1278a9b9-2ab8-4150-8d37-328d4eb27e49",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 199,
+ "user_tz": -60,
+ "timestamp": 1734533012061
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "df_founders",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " name role \\\n",
+ "0 Founder & Technical Lead \n",
+ "1 Marco Vinciguerra Founder & Software Engineer \n",
+ "2 Lorenzo Padoan Founder & Product Engineer \n",
+ "\n",
+ " linkedin \n",
+ "0 https://www.linkedin.com/in/perinim/ \n",
+ "1 https://www.linkedin.com/in/marco-vinciguerra-... \n",
+ "2 https://www.linkedin.com/in/lorenzo-padoan-452... "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df_founders\",\n \"rows\": 3,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"\",\n \"Marco Vinciguerra\",\n \"Lorenzo Padoan\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"role\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Founder & Technical Lead\",\n \"Founder & Software Engineer\",\n \"Founder & Product Engineer\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"linkedin\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"https://www.linkedin.com/in/perinim/\",\n \"https://www.linkedin.com/in/marco-vinciguerra-7ba365242/\",\n \"https://www.linkedin.com/in/lorenzo-padoan-4521a2154/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df_founders"
+ }
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 11
+ }
+ ],
+ "metadata": {
+ "id": "QR-fyx5cOetl",
+ "colab": {
+ "height": 143,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "4b7d55ed-9ef4-44f9-9008-688d734ca820",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 304,
+ "user_tz": -60,
+ "timestamp": 1734533051319
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "df_pricing",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " tier price credits\n",
+ "0 Free $0 100\n",
+ "1 Starter $20/month 5000\n",
+ "2 Growth $100/month 40000\n",
+ "3 Pro $500/month 250000"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df_pricing\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"tier\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Starter\",\n \"Pro\",\n \"Free\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"price\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"$20/month\",\n \"$500/month\",\n \"$0\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"credits\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 118819,\n \"min\": 100,\n \"max\": 250000,\n \"num_unique_values\": 4,\n \"samples\": [\n 5000,\n 250000,\n 100\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df_pricing"
+ }
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 12
+ }
+ ],
+ "metadata": {
+ "id": "SWpCvl53OgyQ",
+ "colab": {
+ "height": 175,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c256f5e5-227a-4df4-da16-d0021aaf03a1",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 312,
+ "user_tz": -60,
+ "timestamp": 1734533059550
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "df_partners",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " partner\n",
+ "0 PostHog\n",
+ "1 AWS\n",
+ "2 NVIDIA\n",
+ "3 JinaAI\n",
+ "4 DagWorks\n",
+ "5 Browserbase\n",
+ "6 ScrapeDo\n",
+ "7 HackerNews\n",
+ "8 Medium\n",
+ "9 HackADay"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df_partners\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"partner\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Medium\",\n \"AWS\",\n \"Browserbase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df_partners"
+ }
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 13
+ }
+ ],
+ "metadata": {
+ "id": "jNLaHXlEOisi",
+ "colab": {
+ "height": 363,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "6f075db5-fc3f-437d-9aaa-d6f8e3085c49",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 216,
+ "user_tz": -60,
+ "timestamp": 1734533067079
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Save the results to CSV",
+ "outputs": [],
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Save each DataFrame to its own CSV file\ndf_company.to_csv(\"company_info.csv\", index=False)\ndf_founders.to_csv(\"founders.csv\", index=False)\ndf_pricing.to_csv(\"pricing_plans.csv\", index=False)\ndf_partners.to_csv(\"partners.csv\", index=False)\n# Print confirmation\nprint(\"Data saved to CSV files\")\n",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Data saved to CSV files\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "3f05c8ba-7b34-4b53-ab20-bfcc78060557",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 213,
+ "user_tz": -60,
+ "timestamp": 1734533092882
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "## 🔗 Resources",
+ "outputs": [],
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyO57uo4LpNqAm10rmE0B6Q5",
+ "collapsed_sections": [
+ "IzsyDXEWwPVt"
+ ]
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
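For readers who want the same flattening step without pandas, here is a minimal sketch that uses only the standard library `csv` module. It assumes `result` has the nested shape printed by the SmartScraper cell above. The dict below is a trimmed, illustrative copy of that printed output. `rows_to_csv` and `find_blank_fields` are hypothetical helper names and are not part of the ScrapeGraphAI SDK. The blank-field check is included because the sample response above returned an empty `name` for one founder, and `pd.DataFrame` would carry that gap into `founders.csv` without any warning.

```python
import csv
import io

# Illustrative, trimmed copy of the `result` dict printed above (values taken from that output).
result = {
    "company_name": "ScrapeGraphAI",
    "founders": [
        {"name": "", "role": "Founder & Technical Lead",
         "linkedin": "https://www.linkedin.com/in/perinim/"},
        {"name": "Marco Vinciguerra", "role": "Founder & Software Engineer",
         "linkedin": "https://www.linkedin.com/in/marco-vinciguerra-7ba365242/"},
    ],
    "pricing_plans": [
        {"tier": "Free", "price": "$0", "credits": 100},
        {"tier": "Starter", "price": "$20/month", "credits": 5000},
    ],
    "partners": ["PostHog", "AWS"],
}


def rows_to_csv(rows, fieldnames):
    """Hypothetical helper: serialize a list of flat dicts to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def find_blank_fields(rows):
    """Hypothetical helper: return (row_index, field) pairs whose string value is empty or whitespace."""
    return [(i, key) for i, row in enumerate(rows)
            for key, value in row.items()
            if isinstance(value, str) and not value.strip()]


# One CSV text per table, mirroring founders.csv, pricing_plans.csv and partners.csv.
founders_csv = rows_to_csv(result["founders"], ["name", "role", "linkedin"])
pricing_csv = rows_to_csv(result["pricing_plans"], ["tier", "price", "credits"])
partners_csv = rows_to_csv([{"partner": p} for p in result["partners"]], ["partner"])

# Flag extraction gaps before they end up in the saved files.
print(find_blank_fields(result["founders"]))  # [(0, 'name')]
```

To write real files instead of in-memory strings, pass a handle from `open("founders.csv", "w", newline="")` to `csv.DictWriter`. The `newline=""` argument keeps the `csv` module's `\r\n` row endings from picking up an extra `\r` on Windows.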
diff --git a/cookbook/github-trending/scrapegraph_langchain.ipynb b/cookbook/github-trending/scrapegraph_langchain.ipynb
index cf0e99e..f26ed98 100644
--- a/cookbook/github-trending/scrapegraph_langchain.ipynb
+++ b/cookbook/github-trending/scrapegraph_langchain.ipynb
@@ -1 +1,986 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"collapsed_sections":["jnqMB2-xVYQ7"],"authorship_tag":"ABX9TyMtqjL6QES980AWc+JvHhAw"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["
"],"metadata":{"id":"ReBHQ5_834pZ"}},{"cell_type":"markdown","source":["## 🕷️🦜 Extract Github Trending Repositories with langchain-scrapegraph\n"],"metadata":{"id":"jEkuKbcRrPcK"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"IhozYNwsgJzt"}},{"cell_type":"markdown","source":["### 🔧 Install `dependencies`"],"metadata":{"id":"IzsyDXEWwPVt"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install langchain-scrapegraph"]},{"cell_type":"markdown","source":["### 🔑 Import `ScrapeGraph` API key"],"metadata":{"id":"apBsL-L2KzM7"}},{"cell_type":"markdown","source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"],"metadata":{"id":"ol9gQbAFkh9b"}},{"cell_type":"code","source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","executionInfo":{"status":"ok","timestamp":1734619154386,"user_tz":-60,"elapsed":4395,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"07af4bbe-c226-4fb3-8f68-7ccd429f59cc"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SGAI_API_KEY not found in environment.\n","Please enter your SGAI_API_KEY: ··········\n","SGAI_API_KEY has been set in the environment.\n"]}]},{"cell_type":"markdown","source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"],"metadata":{"id":"jnqMB2-xVYQ7"}},{"cell_type":"markdown","source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","
\n"],"metadata":{"id":"VZvxbjfXvbgd"}},{"cell_type":"code","source":["from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Schema for Trending Repositories\n","# This defines only the structure of how a single repository should look like\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Schema that contains a list of repositories\n","# This references the previous schema\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of github trending repositories\")"],"metadata":{"id":"dlrOEgZk_8V4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🚀 Initialize `langchain-scrapegraph` tools and start extraction"],"metadata":{"id":"cDGH0b2DkY63"}},{"cell_type":"markdown","source":["Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n","\n","You can find more info in the [official langchain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n"],"metadata":{"id":"M1KSXffZopUD"}},{"cell_type":"code","source":["from langchain_scrapegraph.tools import SmartScraperTool\n","\n","# Will automatically get SGAI_API_KEY from environment\n","# Initialization without output schema\n","# tool = SmartScraperTool()\n","\n","# Since we have defined an output schema, let's use it\n","# This will force the tool to have always the same output structure\n","tool = 
SmartScraperTool(llm_output_schema=ListRepositoriesSchema)"],"metadata":{"id":"2FIKomclLNFx"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["`Invoke` the tool"],"metadata":{"id":"XGAEX1ZPY7b7"}},{"cell_type":"code","source":["# Request for Trending Repositories\n","result = tool.invoke({\n"," \"website_url\":\"https://github.com/trending\",\n"," \"user_prompt\":\"Extract only the first ten github trending repositories\",\n"," }\n",")"],"metadata":{"id":"GZQr_Y59Y0df"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["> As you may have noticed, we are not passing the `llm_output_schema` while invoking the tool, this will make life easier to `AI agents` since they will not need to generate one themselves with high risk of failure. Instead, we force the tool to return always a structured output that follows your previously defined schema. To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n"],"metadata":{"id":"-6YKuEqiZcPC"}},{"cell_type":"markdown","source":["Print the response"],"metadata":{"id":"YZz1bqCIpoL8"}},{"cell_type":"code","source":["import json\n","\n","print(\"Trending Repositories:\")\n","print(json.dumps(result, indent=2))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","executionInfo":{"status":"ok","timestamp":1734621006360,"user_tz":-60,"elapsed":247,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"1faad90e-9f9a-496a-e771-d92007e06b0e"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Trending Repositories:\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"XiaoMi/ha_xiaomi_home\",\n"," \"description\": \"Xiaomi Home Integration for Home Assistant\",\n"," \"stars\": 11097,\n"," \"forks\": 472,\n"," \"today_stars\": 3023,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"comet-ml/opik\",\n"," \"description\": \"Open-source end-to-end LLM 
Development Platform\",\n"," \"stars\": 2741,\n"," \"forks\": 169,\n"," \"today_stars\": 91,\n"," \"language\": \"Java\"\n"," },\n"," {\n"," \"name\": \"EbookFoundation/free-programming-books\",\n"," \"description\": \"\\ud83d\\udcda Freely available programming books\",\n"," \"stars\": 341919,\n"," \"forks\": 62038,\n"," \"today_stars\": 225,\n"," \"language\": \"HTML\"\n"," },\n"," {\n"," \"name\": \"konfig-dev/konfig\",\n"," \"description\": \"Sunset as of December 2024\",\n"," \"stars\": 689,\n"," \"forks\": 192,\n"," \"today_stars\": 224,\n"," \"language\": \"TypeScript\"\n"," },\n"," {\n"," \"name\": \"anoma/anoma\",\n"," \"description\": \"Reference implementation of Anoma\",\n"," \"stars\": 9451,\n"," \"forks\": 452,\n"," \"today_stars\": 4129,\n"," \"language\": \"Elixir\"\n"," },\n"," {\n"," \"name\": \"stripe/stripe-ios\",\n"," \"description\": \"Stripe iOS SDK\",\n"," \"stars\": 2292,\n"," \"forks\": 1004,\n"," \"today_stars\": 49,\n"," \"language\": \"Swift\"\n"," },\n"," {\n"," \"name\": \"Guovin/iptv-api\",\n"," \"description\": \"IPTV live TV source update tool\",\n"," \"stars\": 9385,\n"," \"forks\": 2010,\n"," \"today_stars\": 91,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"facebookresearch/AnimatedDrawings\",\n"," \"description\": \"Code to accompany \\\"A Method for Animating Children's Drawings of the Human Figure\\\"\",\n"," \"stars\": 11473,\n"," \"forks\": 988,\n"," \"today_stars\": 398,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"apache/airflow\",\n"," \"description\": \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n"," \"stars\": 37690,\n"," \"forks\": 14411,\n"," \"today_stars\": 25,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"seleniumbase/SeleniumBase\",\n"," \"description\": \"Python APIs for web automation, testing, and bypassing bot-detection.\",\n"," \"stars\": 6646,\n"," \"forks\": 1028,\n"," \"today_stars\": 624,\n"," \"language\": 
\"Python\"\n"," }\n"," ]\n","}\n"]}]},{"cell_type":"markdown","source":["### 💾 Save the output to a `CSV` file"],"metadata":{"id":"2as65QLypwdb"}},{"cell_type":"markdown","source":["Let's create a pandas dataframe and show the table with the extracted content"],"metadata":{"id":"HTLVFgbVLLBR"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"repositories\"])\n","df"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":730},"id":"1lS9O1KOI51y","executionInfo":{"status":"ok","timestamp":1734621018939,"user_tz":-60,"elapsed":516,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"4dc6a8db-7f6c-49b7-90fa-0e74c2cf77dd"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" name \\\n","0 XiaoMi/ha_xiaomi_home \n","1 comet-ml/opik \n","2 EbookFoundation/free-programming-books \n","3 konfig-dev/konfig \n","4 anoma/anoma \n","5 stripe/stripe-ios \n","6 Guovin/iptv-api \n","7 facebookresearch/AnimatedDrawings \n","8 apache/airflow \n","9 seleniumbase/SeleniumBase \n","\n"," description stars forks \\\n","0 Xiaomi Home Integration for Home Assistant 11097 472 \n","1 Open-source end-to-end LLM Development Platform 2741 169 \n","2 📚 Freely available programming books 341919 62038 \n","3 Sunset as of December 2024 689 192 \n","4 Reference implementation of Anoma 9451 452 \n","5 Stripe iOS SDK 2292 1004 \n","6 IPTV live TV source update tool 9385 2010 \n","7 Code to accompany \"A Method for Animating Chil... 11473 988 \n","8 Apache Airflow - A platform to programmaticall... 37690 14411 \n","9 Python APIs for web automation, testing, and b... 6646 1028 \n","\n"," today_stars language \n","0 3023 Python \n","1 91 Java \n","2 225 HTML \n","3 224 TypeScript \n","4 4129 Elixir \n","5 49 Swift \n","6 91 Python \n","7 398 Python \n","8 25 Python \n","9 624 Python "],"text/html":["\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"apache/airflow\",\n \"comet-ml/opik\",\n \"stripe/stripe-ios\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n \"Open-source end-to-end LLM Development Platform\",\n \"Stripe iOS SDK\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 105428,\n \"min\": 689,\n \"max\": 341919,\n \"num_unique_values\": 10,\n \"samples\": [\n 37690,\n 2741,\n 2292\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19376,\n \"min\": 169,\n \"max\": 62038,\n \"num_unique_values\": 10,\n \"samples\": [\n 14411,\n 169,\n 1004\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1451,\n \"min\": 25,\n \"max\": 4129,\n \"num_unique_values\": 9,\n \"samples\": [\n 25,\n 91,\n 49\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Python\",\n \"Java\",\n \"Swift\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"}},"metadata":{},"execution_count":13}]},{"cell_type":"markdown","source":["Save it to CSV"],"metadata":{"id":"v0CBYVk7qA5Z"}},{"cell_type":"code","source":["# Save the DataFrame to a CSV file\n","csv_file = 
\"trending_repositories.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","executionInfo":{"status":"ok","timestamp":1734621022471,"user_tz":-60,"elapsed":236,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"bf8a22dc-e35e-4bae-948b-71c4c4622504"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to trending_repositories.csv\n"]}]},{"cell_type":"markdown","source":["## 🔗 Resources"],"metadata":{"id":"-1SZT8VzTZNd"}},{"cell_type":"markdown","source":["\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","- 🦜 **Langchain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"],"metadata":{"id":"dUi2LtMLRDDR"}}]}
+{
+ "cells": [
+ {
+ "source": "## 🕷️🦜 Extract GitHub Trending Repositories with langchain-scrapegraph\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langchain-1) [Open in Colab](https://colab.research.google.com/drive/19OQbJC_i7oZI789lcUJ4_j1GmdjzlP-Z?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️🦜 Extract GitHub Trending Repositories with langchain-scrapegraph\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langchain-1) [Open in Colab](https://colab.research.google.com/drive/19OQbJC_i7oZI789lcUJ4_j1GmdjzlP-Z?usp=sharing)",
+ "text/markdown": "## 🕷️🦜 Extract GitHub Trending Repositories with langchain-scrapegraph\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langchain-1) [Open in Colab](https://colab.research.google.com/drive/19OQbJC_i7oZI789lcUJ4_j1GmdjzlP-Z?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:06:18.208Z",
+ "executionStartTime": "2026-03-26T00:06:18.208Z"
+ },
+ {
+ "source": "",
+ "outputs": [],
+ "metadata": {
+ "id": "IhozYNwsgJzt"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "%%capture\n!pip install langchain-scrapegraph",
+ "outputs": [],
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "You can find your ScrapeGraph API key in the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/).",
+ "outputs": [],
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
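+ "source": "import getpass\nimport os\n\nif os.environ.get(\"SGAI_API_KEY\"):\n    print(\"SGAI_API_KEY found in environment.\")\nelse:\n    print(\"SGAI_API_KEY not found in environment.\")\n    os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Please enter your SGAI_API_KEY: \").strip()\n    print(\"SGAI_API_KEY has been set in the environment.\")",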
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "SGAI_API_KEY not found in environment.\n",
+ "Please enter your SGAI_API_KEY: ··········\n",
+ "SGAI_API_KEY has been set in the environment.\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "07af4bbe-c226-4fb3-8f68-7ccd429f59cc",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 4395,
+ "user_tz": -60,
+ "timestamp": 1734619154386
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "outputs": [],
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for Trending Repositories\n# Defines the structure of a single repository\nclass RepositorySchema(BaseModel):\n    name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n    description: str = Field(description=\"Description of the repository\")\n    stars: int = Field(description=\"Star count of the repository\")\n    forks: int = Field(description=\"Fork count of the repository\")\n    today_stars: int = Field(description=\"Stars gained today\")\n    language: str = Field(description=\"Programming language used\")\n\n# Schema that contains a list of repositories\n# Each item follows RepositorySchema\nclass ListRepositoriesSchema(BaseModel):\n    repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")",
+ "outputs": [],
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🚀 Initialize `langchain-scrapegraph` tools and start extraction",
+ "outputs": [],
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Here we use `SmartScraperTool` to extract structured data from a webpage using AI.\n\n> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n\nYou can find more info in the [official LangChain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/).\n",
+ "outputs": [],
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from langchain_scrapegraph.tools import SmartScraperTool\n\n# Reads SGAI_API_KEY from the environment automatically\n# Initialization without an output schema:\n# tool = SmartScraperTool()\n\n# Since we have defined an output schema, let's use it.\n# This forces the tool to always return the same output structure\ntool = SmartScraperTool(llm_output_schema=ListRepositoriesSchema)",
+ "outputs": [],
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "`Invoke` the tool",
+ "outputs": [],
+ "metadata": {
+ "id": "XGAEX1ZPY7b7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Request for Trending Repositories\nresult = tool.invoke(\n    {\n        \"website_url\": \"https://github.com/trending\",\n        \"user_prompt\": \"Extract only the first ten github trending repositories\",\n    }\n)",
+ "outputs": [],
+ "metadata": {
+ "id": "GZQr_Y59Y0df"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "> As you may have noticed, we do not pass `llm_output_schema` when invoking the tool. This makes life easier for `AI agents`, since they do not need to generate a schema themselves, which is error-prone. Instead, the tool always returns structured output that follows your previously defined schema. To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph).\n",
+ "outputs": [],
+ "metadata": {
+ "id": "-6YKuEqiZcPC"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Print the response",
+ "outputs": [],
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import json\n\nprint(\"Trending Repositories:\")\nprint(json.dumps(result, indent=2))",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Trending Repositories:\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"XiaoMi/ha_xiaomi_home\",\n",
+ " \"description\": \"Xiaomi Home Integration for Home Assistant\",\n",
+ " \"stars\": 11097,\n",
+ " \"forks\": 472,\n",
+ " \"today_stars\": 3023,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"comet-ml/opik\",\n",
+ " \"description\": \"Open-source end-to-end LLM Development Platform\",\n",
+ " \"stars\": 2741,\n",
+ " \"forks\": 169,\n",
+ " \"today_stars\": 91,\n",
+ " \"language\": \"Java\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"EbookFoundation/free-programming-books\",\n",
+ " \"description\": \"\\ud83d\\udcda Freely available programming books\",\n",
+ " \"stars\": 341919,\n",
+ " \"forks\": 62038,\n",
+ " \"today_stars\": 225,\n",
+ " \"language\": \"HTML\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"konfig-dev/konfig\",\n",
+ " \"description\": \"Sunset as of December 2024\",\n",
+ " \"stars\": 689,\n",
+ " \"forks\": 192,\n",
+ " \"today_stars\": 224,\n",
+ " \"language\": \"TypeScript\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"anoma/anoma\",\n",
+ " \"description\": \"Reference implementation of Anoma\",\n",
+ " \"stars\": 9451,\n",
+ " \"forks\": 452,\n",
+ " \"today_stars\": 4129,\n",
+ " \"language\": \"Elixir\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"stripe/stripe-ios\",\n",
+ " \"description\": \"Stripe iOS SDK\",\n",
+ " \"stars\": 2292,\n",
+ " \"forks\": 1004,\n",
+ " \"today_stars\": 49,\n",
+ " \"language\": \"Swift\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"Guovin/iptv-api\",\n",
+ " \"description\": \"IPTV live TV source update tool\",\n",
+ " \"stars\": 9385,\n",
+ " \"forks\": 2010,\n",
+ " \"today_stars\": 91,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"facebookresearch/AnimatedDrawings\",\n",
+ " \"description\": \"Code to accompany \\\"A Method for Animating Children's Drawings of the Human Figure\\\"\",\n",
+ " \"stars\": 11473,\n",
+ " \"forks\": 988,\n",
+ " \"today_stars\": 398,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"apache/airflow\",\n",
+ " \"description\": \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n",
+ " \"stars\": 37690,\n",
+ " \"forks\": 14411,\n",
+ " \"today_stars\": 25,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"seleniumbase/SeleniumBase\",\n",
+ " \"description\": \"Python APIs for web automation, testing, and bypassing bot-detection.\",\n",
+ " \"stars\": 6646,\n",
+ " \"forks\": 1028,\n",
+ " \"today_stars\": 624,\n",
+ " \"language\": \"Python\"\n",
+ " }\n",
+ " ]\n",
+ "}\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "1faad90e-9f9a-496a-e771-d92007e06b0e",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 247,
+ "user_tz": -60,
+ "timestamp": 1734621006360
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Let's create a pandas DataFrame and display the extracted content as a table.",
+ "outputs": [],
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import pandas as pd\n\n# Convert the list of repository dicts to a DataFrame\ndf = pd.DataFrame(result[\"repositories\"])\ndf",
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ " name \\\n",
+ "0 XiaoMi/ha_xiaomi_home \n",
+ "1 comet-ml/opik \n",
+ "2 EbookFoundation/free-programming-books \n",
+ "3 konfig-dev/konfig \n",
+ "4 anoma/anoma \n",
+ "5 stripe/stripe-ios \n",
+ "6 Guovin/iptv-api \n",
+ "7 facebookresearch/AnimatedDrawings \n",
+ "8 apache/airflow \n",
+ "9 seleniumbase/SeleniumBase \n",
+ "\n",
+ " description stars forks \\\n",
+ "0 Xiaomi Home Integration for Home Assistant 11097 472 \n",
+ "1 Open-source end-to-end LLM Development Platform 2741 169 \n",
+ "2 📚 Freely available programming books 341919 62038 \n",
+ "3 Sunset as of December 2024 689 192 \n",
+ "4 Reference implementation of Anoma 9451 452 \n",
+ "5 Stripe iOS SDK 2292 1004 \n",
+ "6 IPTV live TV source update tool 9385 2010 \n",
+ "7 Code to accompany \"A Method for Animating Chil... 11473 988 \n",
+ "8 Apache Airflow - A platform to programmaticall... 37690 14411 \n",
+ "9 Python APIs for web automation, testing, and b... 6646 1028 \n",
+ "\n",
+ " today_stars language \n",
+ "0 3023 Python \n",
+ "1 91 Java \n",
+ "2 225 HTML \n",
+ "3 224 TypeScript \n",
+ "4 4129 Elixir \n",
+ "5 49 Swift \n",
+ "6 91 Python \n",
+ "7 398 Python \n",
+ "8 25 Python \n",
+ "9 624 Python "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"apache/airflow\",\n \"comet-ml/opik\",\n \"stripe/stripe-ios\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n \"Open-source end-to-end LLM Development Platform\",\n \"Stripe iOS SDK\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 105428,\n \"min\": 689,\n \"max\": 341919,\n \"num_unique_values\": 10,\n \"samples\": [\n 37690,\n 2741,\n 2292\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19376,\n \"min\": 169,\n \"max\": 62038,\n \"num_unique_values\": 10,\n \"samples\": [\n 14411,\n 169,\n 1004\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1451,\n \"min\": 25,\n \"max\": 4129,\n \"num_unique_values\": 9,\n \"samples\": [\n 25,\n 91,\n 49\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Python\",\n \"Java\",\n \"Swift\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df"
+ }
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 13
+ }
+ ],
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 730,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "4dc6a8db-7f6c-49b7-90fa-0e74c2cf77dd",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 516,
+ "user_tz": -60,
+ "timestamp": 1734621018939
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Save it to CSV",
+ "outputs": [],
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"trending_repositories.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Data saved to trending_repositories.csv\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "bf8a22dc-e35e-4bae-948b-71c4c4622504",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 236,
+ "user_tz": -60,
+ "timestamp": 1734621022471
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "## 🔗 Resources",
+ "outputs": [],
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦜 **LangChain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyMtqjL6QES980AWc+JvHhAw",
+ "collapsed_sections": [
+ "jnqMB2-xVYQ7"
+ ]
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/github-trending/scrapegraph_llama_index.ipynb b/cookbook/github-trending/scrapegraph_llama_index.ipynb
index 17f2b6a..2836b69 100644
--- a/cookbook/github-trending/scrapegraph_llama_index.ipynb
+++ b/cookbook/github-trending/scrapegraph_llama_index.ipynb
@@ -1,335 +1,284 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {
- "id": "ReBHQ5_834pZ"
- },
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
+ "source": "## 🕷️ Extract GitHub Trending Repositories with LlamaIndex and ScrapeGraphAI's APIs\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-llama-index-2) [Open in Colab](https://colab.research.google.com/drive/18Y1lwdrUm1qVG_Yh1Em_1z_3zyLcWAjV?usp=share_link)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract GitHub Trending Repositories with LlamaIndex and ScrapeGraphAI's APIs\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-llama-index-2) [Open in Colab](https://colab.research.google.com/drive/18Y1lwdrUm1qVG_Yh1Em_1z_3zyLcWAjV?usp=share_link)",
+ "text/markdown": "## 🕷️ Extract GitHub Trending Repositories with LlamaIndex and ScrapeGraphAI's APIs\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-llama-index-2) [Open in Colab](https://colab.research.google.com/drive/18Y1lwdrUm1qVG_Yh1Em_1z_3zyLcWAjV?usp=share_link)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"metadata": {
"id": "jEkuKbcRrPcK"
},
- "source": [
- "## 🕷️ Extract Github Trending Repositories with llama index and scrapegraphai's APIs\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:05:24.782Z",
+ "executionStartTime": "2026-03-26T00:05:24.782Z"
},
{
- "cell_type": "markdown",
+ "source": "",
+ "outputs": [],
"metadata": {
"id": "IhozYNwsgJzt"
},
- "source": [
- ""
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
"metadata": {
"id": "IzsyDXEWwPVt"
},
- "source": [
- "### 🔧 Install `dependencies`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "%%capture\n!pip install llama-index\n!pip install llama-index-tools-scrapegraphai\n",
+ "outputs": [],
"metadata": {
"id": "os_vm0MkIxr9"
},
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install llama-index\n",
- "!pip install llama-index-tools-scrapegraphai\n"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
"metadata": {
"id": "apBsL-L2KzM7"
},
- "source": [
- "### 🔑 Import `ScrapeGraph` API key"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "You can find your ScrapeGraph API key in the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/).",
+ "outputs": [],
"metadata": {
"id": "ol9gQbAFkh9b"
},
- "source": [
- "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "sffqFG2EJ8bI",
- "outputId": "07af4bbe-c226-4fb3-8f68-7ccd429f59cc"
- },
+ "source": "import os\nfrom getpass import getpass\n\n# Check if the API key is already set in the environment\nsgai_api_key = os.getenv(\"SGAI_API_KEY\")\n\nif sgai_api_key:\n print(\"SGAI_API_KEY found in environment.\")\nelse:\n print(\"SGAI_API_KEY not found in environment.\")\n # Prompt the user to input the API key securely (hidden input)\n sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n if sgai_api_key:\n # Set the API key in the environment\n os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n print(\"SGAI_API_KEY has been set in the environment.\")\n else:\n print(\"No API key entered. Please set the API key to continue.\")\n",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"SGAI_API_KEY not found in environment.\n",
"SGAI_API_KEY has been set in the environment.\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import os\n",
- "from getpass import getpass\n",
- "\n",
- "# Check if the API key is already set in the environment\n",
- "sgai_api_key = os.getenv(\"SGAI_API_KEY\")\n",
- "\n",
- "if sgai_api_key:\n",
- " print(\"SGAI_API_KEY found in environment.\")\n",
- "else:\n",
- " print(\"SGAI_API_KEY not found in environment.\")\n",
- " # Prompt the user to input the API key securely (hidden input)\n",
- " sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n",
- " if sgai_api_key:\n",
- " # Set the API key in the environment\n",
- " os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n",
- " print(\"SGAI_API_KEY has been set in the environment.\")\n",
- " else:\n",
- " print(\"No API key entered. Please set the API key to continue.\")\n"
- ]
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "07af4bbe-c226-4fb3-8f68-7ccd429f59cc"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
"metadata": {
"id": "jnqMB2-xVYQ7"
},
- "source": [
- "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
+ "outputs": [],
"metadata": {
"id": "VZvxbjfXvbgd"
},
- "source": [
- "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
- "\n",
- "\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for Trending Repositories\n# Defines the structure of a single repository\nclass RepositorySchema(BaseModel):\n    name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n    description: str = Field(description=\"Description of the repository\")\n    stars: int = Field(description=\"Star count of the repository\")\n    forks: int = Field(description=\"Fork count of the repository\")\n    today_stars: int = Field(description=\"Stars gained today\")\n    language: str = Field(description=\"Programming language used\")\n\n# Schema that contains a list of repositories\n# Each item follows RepositorySchema\nclass ListRepositoriesSchema(BaseModel):\n    repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")",
+ "outputs": [],
"metadata": {
"id": "dlrOEgZk_8V4"
},
- "outputs": [],
- "source": [
- "from pydantic import BaseModel, Field\n",
- "from typing import List\n",
- "\n",
- "# Schema for Trending Repositories\n",
- "# This defines only the structure of how a single repository should look like\n",
- "class RepositorySchema(BaseModel):\n",
- " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
- " description: str = Field(description=\"Description of the repository\")\n",
- " stars: int = Field(description=\"Star count of the repository\")\n",
- " forks: int = Field(description=\"Fork count of the repository\")\n",
- " today_stars: int = Field(description=\"Stars gained today\")\n",
- " language: str = Field(description=\"Programming language used\")\n",
- "\n",
- "# Schema that contains a list of repositories\n",
- "# This references the previous schema\n",
- "class ListRepositoriesSchema(BaseModel):\n",
- " repositories: List[RepositorySchema] = Field(description=\"List of github trending repositories\")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction",
+ "outputs": [],
"metadata": {
"id": "cDGH0b2DkY63"
},
- "source": [
- "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Here we use `scrapegraph_smartscraper` to extract structured data from a webpage using AI.\n\n> If you already have an HTML file, you can upload it and use `scrapegraph_local_scrape` instead.\n\nYou can find more info in the [official LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/).",
+ "outputs": [],
"metadata": {
"id": "M1KSXffZopUD"
},
- "source": [
- "Here we use `scrapegraph_smartscraper` to extract structured data using AI from a webpage.\n",
- "\n",
- "\n",
- "> If you already have an HTML file, you can upload it and use `scrapegraph_local_scrape` instead.\n",
- "\n",
- "You can find more info in the [official llama-index documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec",
+ "outputs": [],
"metadata": {
"id": "bNt9QkkEncIA"
},
- "outputs": [],
- "source": [
- "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec"
- ]
- },
- {
"cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "scrapegraph_tool = ScrapegraphToolSpec()",
+ "outputs": [],
"metadata": {
"id": "2FIKomclLNFx"
},
- "outputs": [],
- "source": [
- "scrapegraph_tool = ScrapegraphToolSpec()"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "`Invoke` the tool",
+ "outputs": [],
"metadata": {
"id": "XGAEX1ZPY7b7"
},
- "source": [
- "`Invoke` the tool"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "response = scrapegraph_tool.scrapegraph_smartscraper(\n prompt=\"Extract only the first ten github trending repositories\",\n url=\"https://github.com/trending\",\n api_key=os.environ.get(\"SGAI_API_KEY\"),\n schema=ListRepositoriesSchema,\n)",
+ "outputs": [],
"metadata": {
"id": "GZQr_Y59Y0df"
},
- "outputs": [],
- "source": [
- "response = scrapegraph_tool.scrapegraph_smartscraper(\n",
- " prompt=\"Extract only the first ten github trending repositories\",\n",
- " url=\"https://github.com/trending\",\n",
- " api_key=os.environ.get(\"SGAI_API_KEY\"),\n",
- " schema=ListRepositoriesSchema,\n",
- ")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "> As you may have noticed, we are not passing the `llm_output_schema` while invoking the tool. This makes life easier for `AI agents`, since they do not need to generate a schema themselves, which carries a high risk of failure. Instead, we force the tool to always return structured output that follows your previously defined schema. To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n",
+ "outputs": [],
"metadata": {
"id": "-6YKuEqiZcPC"
},
- "source": [
- "> As you may have noticed, we are not passing the `llm_output_schema` while invoking the tool, this will make life easier to `AI agents` since they will not need to generate one themselves with high risk of failure. Instead, we force the tool to return always a structured output that follows your previously defined schema. To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Print the response",
+ "outputs": [],
"metadata": {
"id": "YZz1bqCIpoL8"
},
- "source": [
- "Print the response"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "F1VfD8B4LPc8",
- "outputId": "1faad90e-9f9a-496a-e771-d92007e06b0e"
- },
+ "source": "import json\n\nprint(\"Trending Repositories:\")\nprint(json.dumps(response, indent=2))",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Trending Repositories:\n",
"{\n",
@@ -416,53 +365,58 @@
" }\n",
" ]\n",
"}\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import json\n",
- "\n",
- "print(\"Trending Repositories:\")\n",
- "print(json.dumps(response, indent=2))"
- ]
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "1faad90e-9f9a-496a-e771-d92007e06b0e"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
"metadata": {
"id": "2as65QLypwdb"
},
- "source": [
- "### 💾 Save the output to a `CSV` file"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Let's create a pandas DataFrame and display the extracted content as a table",
+ "outputs": [],
"metadata": {
"id": "HTLVFgbVLLBR"
},
- "source": [
- "Let's create a pandas dataframe and show the table with the extracted content"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 730
- },
- "id": "1lS9O1KOI51y",
- "outputId": "4dc6a8db-7f6c-49b7-90fa-0e74c2cf77dd"
- },
+ "source": "import pandas as pd\n\n# Convert dictionary to DataFrame\ndf = pd.DataFrame(response[\"repositories\"])\ndf",
"outputs": [
{
"data": {
- "application/vnd.google.colaboratory.intrinsic+json": {
- "summary": "{\n \"name\": \"df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"apache/airflow\",\n \"comet-ml/opik\",\n \"stripe/stripe-ios\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n \"Open-source end-to-end LLM Development Platform\",\n \"Stripe iOS SDK\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 105428,\n \"min\": 689,\n \"max\": 341919,\n \"num_unique_values\": 10,\n \"samples\": [\n 37690,\n 2741,\n 2292\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19376,\n \"min\": 169,\n \"max\": 62038,\n \"num_unique_values\": 10,\n \"samples\": [\n 14411,\n 169,\n 1004\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1451,\n \"min\": 25,\n \"max\": 4129,\n \"num_unique_values\": 9,\n \"samples\": [\n 25,\n 91,\n 49\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Python\",\n \"Java\",\n \"Swift\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
- "type": "dataframe",
- "variable_name": "df"
- },
"text/html": [
"\n",
"\n",
@@ -887,113 +841,127 @@
"7 398 Python \n",
"8 25 Python \n",
"9 624 Python "
- ]
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 10,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"apache/airflow\",\n \"comet-ml/opik\",\n \"stripe/stripe-ios\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 10,\n \"samples\": [\n \"Apache Airflow - A platform to programmatically author, schedule, and monitor workflows\",\n \"Open-source end-to-end LLM Development Platform\",\n \"Stripe iOS SDK\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 105428,\n \"min\": 689,\n \"max\": 341919,\n \"num_unique_values\": 10,\n \"samples\": [\n 37690,\n 2741,\n 2292\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19376,\n \"min\": 169,\n \"max\": 62038,\n \"num_unique_values\": 10,\n \"samples\": [\n 14411,\n 169,\n 1004\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1451,\n \"min\": 25,\n \"max\": 4129,\n \"num_unique_values\": 9,\n \"samples\": [\n 25,\n 91,\n 49\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Python\",\n \"Java\",\n \"Swift\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df"
+ }
},
- "execution_count": 13,
"metadata": {},
- "output_type": "execute_result"
+ "output_type": "execute_result",
+ "execution_count": 13
}
],
- "source": [
- "import pandas as pd\n",
- "\n",
- "# Convert dictionary to DataFrame\n",
- "df = pd.DataFrame(response[\"repositories\"])\n",
- "df"
- ]
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 730,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "4dc6a8db-7f6c-49b7-90fa-0e74c2cf77dd"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Save it to CSV",
+ "outputs": [],
"metadata": {
"id": "v0CBYVk7qA5Z"
},
- "source": [
- "Save it to CSV"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "BtEbB9pmQGhO",
- "outputId": "bf8a22dc-e35e-4bae-948b-71c4c4622504"
- },
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"trending_repositories.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Data saved to trending_repositories.csv\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "# Save the DataFrame to a CSV file\n",
- "csv_file = \"trending_repositories.csv\"\n",
- "df.to_csv(csv_file, index=False)\n",
- "print(f\"Data saved to {csv_file}\")"
- ]
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "bf8a22dc-e35e-4bae-948b-71c4c4622504"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## 🔗 Resources",
+ "outputs": [],
"metadata": {
"id": "-1SZT8VzTZNd"
},
- "source": [
- "## 🔗 Resources"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "\n\n\n\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
"metadata": {
"id": "dUi2LtMLRDDR"
},
- "source": [
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "\n",
- "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
- "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
- "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
- "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
- "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
- "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
- "\n",
- "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
}
],
"metadata": {
"colab": {
+ "provenance": [],
"collapsed_sections": [
"jnqMB2-xVYQ7"
- ],
- "provenance": []
+ ]
},
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "name": "python3",
+ "display_name": "Python 3"
},
"language_info": {
+ "name": "python",
+ "version": "3.10.14",
+ "mimetype": "text/x-python",
+ "file_extension": ".py",
+ "pygments_lexer": "ipython3",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.14"
+ "nbconvert_exporter": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
-}
+}
\ No newline at end of file
diff --git a/cookbook/github-trending/scrapegraph_sdk.ipynb b/cookbook/github-trending/scrapegraph_sdk.ipynb
index cd591e9..462ab90 100644
--- a/cookbook/github-trending/scrapegraph_sdk.ipynb
+++ b/cookbook/github-trending/scrapegraph_sdk.ipynb
@@ -1 +1,1116 @@
-{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"collapsed_sections":["IzsyDXEWwPVt","jnqMB2-xVYQ7","cDGH0b2DkY63","2as65QLypwdb"],"authorship_tag":"ABX9TyM1qXPrrrWt8sAHKB8wCDas"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["
\n","
\n",""],"metadata":{"id":"ReBHQ5_834pZ"}},{"cell_type":"markdown","source":["## 🕷️ Extract Github Trending Repositories with Official Scrapegraph SDK"],"metadata":{"id":"jEkuKbcRrPcK"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"d7Zro0xiuo-l"}},{"cell_type":"markdown","source":["### 🔧 Install `dependencies`"],"metadata":{"id":"IzsyDXEWwPVt"}},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","source":["### 🔑 Import `ScrapeGraph` API key"],"metadata":{"id":"apBsL-L2KzM7"}},{"cell_type":"markdown","source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"],"metadata":{"id":"ol9gQbAFkh9b"}},{"cell_type":"code","source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","executionInfo":{"status":"ok","timestamp":1734439787062,"user_tz":-60,"elapsed":5826,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"ab74193e-e746-4de6-d65d-33a2a26b5d86"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["SGAI_API_KEY not found in environment.\n","Please enter your SGAI_API_KEY: ··········\n","SGAI_API_KEY has been set in the environment.\n"]}]},{"cell_type":"markdown","source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"],"metadata":{"id":"jnqMB2-xVYQ7"}},{"cell_type":"markdown","source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","
\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"],"metadata":{"id":"VZvxbjfXvbgd"}},{"cell_type":"code","source":["from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Schema for Trending Repositories\n","# This defines only the structure of how a single repository should look like\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Schema that contains a list of repositories\n","# This references the previous schema\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of github trending repositories\")"],"metadata":{"id":"dlrOEgZk_8V4"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["### 🚀 Initialize `SGAI Client` and start extraction"],"metadata":{"id":"cDGH0b2DkY63"}},{"cell_type":"markdown","source":["Initialize the client for scraping (there's also an async version 
[here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"],"metadata":{"id":"4SLJgXgcob6L"}},{"cell_type":"code","source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = Client(api_key=sgai_api_key)"],"metadata":{"id":"PQI25GZvoCSk"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"],"metadata":{"id":"M1KSXffZopUD"}},{"cell_type":"code","source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://github.com/trending\",\n"," user_prompt=\"Extract only the visible github trending repositories\",\n"," output_schema=ListRepositoriesSchema,\n",")"],"metadata":{"id":"2FIKomclLNFx"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["Print the response"],"metadata":{"id":"YZz1bqCIpoL8"}},{"cell_type":"code","source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(\"Trending Repositories:\")\n","print(json.dumps(result, indent=2))"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","executionInfo":{"status":"ok","timestamp":1734439624722,"user_tz":-60,"elapsed":266,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"6b4db540-076e-4d3f-a5ef-a29e14fbb233"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Request ID: 1e3b00ff-4b55-497c-8046-8ec5503cdafd\n","Trending Repositories:\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"Byaidu/PDFMathTranslate\",\n"," \"description\": \"PDF scientific paper translation with 
preserved formats - \\u57fa\\u4e8e AI \\u5b8c\\u6574\\u4fdd\\u7559\\u6392\\u7248\\u7684 PDF \\u6587\\u6863\\u5168\\u6587\\u53cc\\u8bed\\u7ffb\\u8bd1\\uff0c\\u652f\\u6301 Google/DeepL/Ollama/OpenAI \\u7b49\\u670d\\u52a1\\uff0c\\u63d0\\u4f9b CLI/GUI/Docker\",\n"," \"stars\": 8902,\n"," \"forks\": 633,\n"," \"today_stars\": 816,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"bigskysoftware/htmx\",\n"," \"description\": \"htmx - high power tools for HTML\",\n"," \"stars\": 39143,\n"," \"forks\": 1324,\n"," \"today_stars\": 186,\n"," \"language\": \"JavaScript\"\n"," },\n"," {\n"," \"name\": \"commaai/openpilot\",\n"," \"description\": \"openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 275+ supported cars.\",\n"," \"stars\": 50945,\n"," \"forks\": 9206,\n"," \"today_stars\": 132,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8108,\n"," \"forks\": 1011,\n"," \"today_stars\": 1221,\n"," \"language\": \"Jupyter Notebook\"\n"," },\n"," {\n"," \"name\": \"stripe/stripe-ios\",\n"," \"description\": \"Stripe iOS SDK\",\n"," \"stars\": 2179,\n"," \"forks\": 994,\n"," \"today_stars\": 19,\n"," \"language\": \"Swift\"\n"," },\n"," {\n"," \"name\": \"RIOT-OS/RIOT\",\n"," \"description\": \"RIOT - The friendly OS for IoT\",\n"," \"stars\": 5234,\n"," \"forks\": 2017,\n"," \"today_stars\": 168,\n"," \"language\": \"C\"\n"," },\n"," {\n"," \"name\": \"zju3dv/EasyVolcap\",\n"," \"description\": \"EasyVolcap: Accelerating Neural Volumetric Video Research\",\n"," \"stars\": 802,\n"," \"forks\": 52,\n"," \"today_stars\": 30,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more. 
It offers real-time capabilities to see, hear, and speak, along with advanced tools like weather checks, web search, and RAG.\",\n"," \"stars\": 3245,\n"," \"forks\": 313,\n"," \"today_stars\": 296,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"DS4SD/docling\",\n"," \"description\": \"Get your documents ready for gen AI\",\n"," \"stars\": 15201,\n"," \"forks\": 774,\n"," \"today_stars\": 281,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"Guovin/iptv-api\",\n"," \"description\": \"\\ud83d\\udcfaIPTV\\u7535\\u89c6\\u76f4\\u64ad\\u6e90\\u66f4\\u65b0\\u5de5\\u5177\\uff1a\\u2728\\u592e\\u89c6\\u9891\\u3001\\ud83d\\udcf1\\u536b\\u89c6\\u3001\\u2615\\u5404\\u7701\\u4efd\\u5730\\u65b9\\u53f0\\u3001\\ud83c\\udf0f\\u6e2f\\u00b7\\u6fb3\\u00b7\\u53f0\\u3001\\ud83c\\udfa5\\u7535\\u5f71\\u3001\\ud83c\\udfae\\u6e38\\u620f\\u3001\\ud83c\\udfb5\\u97f3\\u4e50\\u3001\\ud83c\\udfad\\u7ecf\\u5178\\u5267\\u573a\\uff1b\\u652f\\u6301IPv4/IPv6\\uff1b\\u652f\\u6301\\u81ea\\u5b9a\\u4e49\\u589e\\u52a0\\u9891\\u9053\\uff1b\\u652f\\u6301\\u805a\\u5408\\u6e90\\u3001\\u4ee3\\u7406\\u6e90\\u3001\\u8ba2\\u9605\\u6e90\\u3001\\u5173\\u952e\\u5b57\\u641c\\u7d22\\uff1b\\u6bcf\\u5929\\u81ea\\u52a8\\u66f4\\u65b0\\u4e24\\u6b21\\uff0c\\u7ed3\\u679c\\u53ef\\u7528\\u4e8eTVBox\\u7b49\\u64ad\\u653e\\u8f6f\\u4ef6\\uff1b\\u652f\\u6301\\u5de5\\u4f5c\\u6d41\\u3001Docker(amd64/arm64/arm v7)\\u3001\\u547d\\u4ee4\\u884c\\u3001GUI\\u8fd0\\u884c\\u65b9\\u5f0f | IPTV live TV source update tool\",\n"," \"stars\": 9046,\n"," \"forks\": 1938,\n"," \"today_stars\": 101,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"fatedier/frp\",\n"," \"description\": \"A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.\",\n"," \"stars\": 87828,\n"," \"forks\": 13502,\n"," \"today_stars\": 64,\n"," \"language\": \"Go\"\n"," },\n"," {\n"," \"name\": \"facebookresearch/AnimatedDrawings\",\n"," \"description\": \"Code to accompany \\\"A Method for 
Animating Children's Drawings of the Human Figure\\\"\",\n"," \"stars\": 10766,\n"," \"forks\": 955,\n"," \"today_stars\": 38,\n"," \"language\": \"Python\"\n"," },\n"," {\n"," \"name\": \"gorilla/websocket\",\n"," \"description\": \"Package gorilla/websocket is a fast, well-tested and widely used WebSocket implementation for Go.\",\n"," \"stars\": 22633,\n"," \"forks\": 3495,\n"," \"today_stars\": 13,\n"," \"language\": \"Go\"\n"," },\n"," {\n"," \"name\": \"DefiLlama/chainlist\",\n"," \"description\": \"NA\",\n"," \"stars\": 2368,\n"," \"forks\": 2476,\n"," \"today_stars\": 5,\n"," \"language\": \"JavaScript\"\n"," },\n"," {\n"," \"name\": \"open-telemetry/opentelemetry-collector\",\n"," \"description\": \"OpenTelemetry Collector\",\n"," \"stars\": 4570,\n"," \"forks\": 1497,\n"," \"today_stars\": 4,\n"," \"language\": \"Go\"\n"," },\n"," {\n"," \"name\": \"RocketChat/Rocket.Chat\",\n"," \"description\": \"The communications platform that puts data protection first.\",\n"," \"stars\": 41169,\n"," \"forks\": 10877,\n"," \"today_stars\": 73,\n"," \"language\": \"TypeScript\"\n"," },\n"," {\n"," \"name\": \"langgenius/dify\",\n"," \"description\": \"Dify is an open-source LLM app development platform. 
Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.\",\n"," \"stars\": 54976,\n"," \"forks\": 8083,\n"," \"today_stars\": 127,\n"," \"language\": \"TypeScript\"\n"," }\n"," ]\n","}\n"]}]},{"cell_type":"markdown","source":["### 💾 Save the output to a `CSV` file"],"metadata":{"id":"2as65QLypwdb"}},{"cell_type":"markdown","source":["Let's create a pandas dataframe and show the table with the extracted content"],"metadata":{"id":"HTLVFgbVLLBR"}},{"cell_type":"code","source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"repositories\"])\n","df"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":635},"id":"1lS9O1KOI51y","executionInfo":{"status":"ok","timestamp":1734439642507,"user_tz":-60,"elapsed":262,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"34a068a4-0fd0-47aa-a139-637b803f14f5"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" name \\\n","0 Byaidu/PDFMathTranslate \n","1 bigskysoftware/htmx \n","2 commaai/openpilot \n","3 google-gemini/cookbook \n","4 stripe/stripe-ios \n","5 RIOT-OS/RIOT \n","6 zju3dv/EasyVolcap \n","7 TEN-framework/TEN-Agent \n","8 DS4SD/docling \n","9 Guovin/iptv-api \n","10 fatedier/frp \n","11 facebookresearch/AnimatedDrawings \n","12 gorilla/websocket \n","13 DefiLlama/chainlist \n","14 open-telemetry/opentelemetry-collector \n","15 RocketChat/Rocket.Chat \n","16 langgenius/dify \n","\n"," description stars forks \\\n","0 PDF scientific paper translation with preserve... 8902 633 \n","1 htmx - high power tools for HTML 39143 1324 \n","2 openpilot is an operating system for robotics.... 
50945 9206 \n","3 Examples and guides for using the Gemini API 8108 1011 \n","4 Stripe iOS SDK 2179 994 \n","5 RIOT - The friendly OS for IoT 5234 2017 \n","6 EasyVolcap: Accelerating Neural Volumetric Vid... 802 52 \n","7 TEN Agent is a conversational AI powered by TE... 3245 313 \n","8 Get your documents ready for gen AI 15201 774 \n","9 📺IPTV电视直播源更新工具:✨央视频、📱卫视、☕各省份地方台、🌏港·澳·台、🎥电影、🎮游戏... 9046 1938 \n","10 A fast reverse proxy to help you expose a loca... 87828 13502 \n","11 Code to accompany \"A Method for Animating Chil... 10766 955 \n","12 Package gorilla/websocket is a fast, well-test... 22633 3495 \n","13 NA 2368 2476 \n","14 OpenTelemetry Collector 4570 1497 \n","15 The communications platform that puts data pro... 41169 10877 \n","16 Dify is an open-source LLM app development pla... 54976 8083 \n","\n"," today_stars language \n","0 816 Python \n","1 186 JavaScript \n","2 132 Python \n","3 1221 Jupyter Notebook \n","4 19 Swift \n","5 168 C \n","6 30 Python \n","7 296 Python \n","8 281 Python \n","9 101 Python \n","10 64 Go \n","11 38 Python \n","12 13 Go \n","13 5 JavaScript \n","14 4 Go \n","15 73 TypeScript \n","16 127 TypeScript "],"text/html":["\n","
\n","
\n","\n","
\n"," \n"," \n"," | \n"," name | \n"," description | \n"," stars | \n"," forks | \n"," today_stars | \n"," language | \n","
\n"," \n"," \n"," \n"," | 0 | \n"," Byaidu/PDFMathTranslate | \n"," PDF scientific paper translation with preserve... | \n"," 8902 | \n"," 633 | \n"," 816 | \n"," Python | \n","
\n"," \n"," | 1 | \n"," bigskysoftware/htmx | \n"," htmx - high power tools for HTML | \n"," 39143 | \n"," 1324 | \n"," 186 | \n"," JavaScript | \n","
\n"," \n"," | 2 | \n"," commaai/openpilot | \n"," openpilot is an operating system for robotics.... | \n"," 50945 | \n"," 9206 | \n"," 132 | \n"," Python | \n","
\n"," \n"," | 3 | \n"," google-gemini/cookbook | \n"," Examples and guides for using the Gemini API | \n"," 8108 | \n"," 1011 | \n"," 1221 | \n"," Jupyter Notebook | \n","
\n"," \n"," | 4 | \n"," stripe/stripe-ios | \n"," Stripe iOS SDK | \n"," 2179 | \n"," 994 | \n"," 19 | \n"," Swift | \n","
\n"," \n"," | 5 | \n"," RIOT-OS/RIOT | \n"," RIOT - The friendly OS for IoT | \n"," 5234 | \n"," 2017 | \n"," 168 | \n"," C | \n","
\n"," \n"," | 6 | \n"," zju3dv/EasyVolcap | \n"," EasyVolcap: Accelerating Neural Volumetric Vid... | \n"," 802 | \n"," 52 | \n"," 30 | \n"," Python | \n","
\n"," \n"," | 7 | \n"," TEN-framework/TEN-Agent | \n"," TEN Agent is a conversational AI powered by TE... | \n"," 3245 | \n"," 313 | \n"," 296 | \n"," Python | \n","
\n"," \n"," | 8 | \n"," DS4SD/docling | \n"," Get your documents ready for gen AI | \n"," 15201 | \n"," 774 | \n"," 281 | \n"," Python | \n","
\n"," \n"," | 9 | \n"," Guovin/iptv-api | \n"," 📺IPTV电视直播源更新工具:✨央视频、📱卫视、☕各省份地方台、🌏港·澳·台、🎥电影、🎮游戏... | \n"," 9046 | \n"," 1938 | \n"," 101 | \n"," Python | \n","
\n"," \n"," | 10 | \n"," fatedier/frp | \n"," A fast reverse proxy to help you expose a loca... | \n"," 87828 | \n"," 13502 | \n"," 64 | \n"," Go | \n","
\n"," \n"," | 11 | \n"," facebookresearch/AnimatedDrawings | \n"," Code to accompany \"A Method for Animating Chil... | \n"," 10766 | \n"," 955 | \n"," 38 | \n"," Python | \n","
\n"," \n"," | 12 | \n"," gorilla/websocket | \n"," Package gorilla/websocket is a fast, well-test... | \n"," 22633 | \n"," 3495 | \n"," 13 | \n"," Go | \n","
\n"," \n"," | 13 | \n"," DefiLlama/chainlist | \n"," NA | \n"," 2368 | \n"," 2476 | \n"," 5 | \n"," JavaScript | \n","
\n"," \n"," | 14 | \n"," open-telemetry/opentelemetry-collector | \n"," OpenTelemetry Collector | \n"," 4570 | \n"," 1497 | \n"," 4 | \n"," Go | \n","
\n"," \n"," | 15 | \n"," RocketChat/Rocket.Chat | \n"," The communications platform that puts data pro... | \n"," 41169 | \n"," 10877 | \n"," 73 | \n"," TypeScript | \n","
\n"," \n"," | 16 | \n"," langgenius/dify | \n"," Dify is an open-source LLM app development pla... | \n"," 54976 | \n"," 8083 | \n"," 127 | \n"," TypeScript | \n","
\n"," \n","
\n","
\n","
\n","
\n"],"application/vnd.google.colaboratory.intrinsic+json":{"type":"dataframe","variable_name":"df","summary":"{\n \"name\": \"df\",\n \"rows\": 17,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 17,\n \"samples\": [\n \"Byaidu/PDFMathTranslate\",\n \"bigskysoftware/htmx\",\n \"RIOT-OS/RIOT\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 17,\n \"samples\": [\n \"PDF scientific paper translation with preserved formats - \\u57fa\\u4e8e AI \\u5b8c\\u6574\\u4fdd\\u7559\\u6392\\u7248\\u7684 PDF \\u6587\\u6863\\u5168\\u6587\\u53cc\\u8bed\\u7ffb\\u8bd1\\uff0c\\u652f\\u6301 Google/DeepL/Ollama/OpenAI \\u7b49\\u670d\\u52a1\\uff0c\\u63d0\\u4f9b CLI/GUI/Docker\",\n \"htmx - high power tools for HTML\",\n \"RIOT - The friendly OS for IoT\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 24731,\n \"min\": 802,\n \"max\": 87828,\n \"num_unique_values\": 17,\n \"samples\": [\n 8902,\n 39143,\n 5234\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4176,\n \"min\": 52,\n \"max\": 13502,\n \"num_unique_values\": 17,\n \"samples\": [\n 633,\n 1324,\n 2017\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 325,\n \"min\": 4,\n \"max\": 1221,\n \"num_unique_values\": 17,\n \"samples\": [\n 816,\n 186,\n 168\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"Python\",\n \"JavaScript\",\n \"Go\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n 
]\n}"}},"metadata":{},"execution_count":38}]},{"cell_type":"markdown","source":["Save it to CSV"],"metadata":{"id":"v0CBYVk7qA5Z"}},{"cell_type":"code","source":["# Save the DataFrame to a CSV file\n","csv_file = \"trending_repositories.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"],"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","executionInfo":{"status":"ok","timestamp":1734439655791,"user_tz":-60,"elapsed":303,"user":{"displayName":"ScrapeGraphAI","userId":"10474323355016263615"}},"outputId":"d2ec6dac-395f-4ddb-ad49-4bd672a04e5b"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stdout","text":["Data saved to trending_repositories.csv\n"]}]},{"cell_type":"markdown","source":["## 🔗 Resources"],"metadata":{"id":"-1SZT8VzTZNd"}},{"cell_type":"markdown","source":["\n","
\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"],"metadata":{"id":"dUi2LtMLRDDR"}}]}
+{
+ "cells": [
+ {
+ "source": "## 🕷️ Extract GitHub Trending Repositories with Official Scrapegraph SDK\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-1) [](https://colab.research.google.com/drive/1SSsI0naieeHjHcJkW22CqNIZwLHWjXmd?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract GitHub Trending Repositories with Official Scrapegraph SDK\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-1) [](https://colab.research.google.com/drive/1SSsI0naieeHjHcJkW22CqNIZwLHWjXmd?usp=sharing)",
+ "text/markdown": "## 🕷️ Extract GitHub Trending Repositories with Official Scrapegraph SDK\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-1) [](https://colab.research.google.com/drive/1SSsI0naieeHjHcJkW22CqNIZwLHWjXmd?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:08:25.271Z",
+ "executionStartTime": "2026-03-26T00:08:25.271Z"
+ },
+ {
+ "source": "",
+ "outputs": [],
+ "metadata": {
+ "id": "d7Zro0xiuo-l"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "%%capture\n!pip install scrapegraph-py",
+ "outputs": [],
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "You can find your ScrapeGraph API key in the [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com/)",
+ "outputs": [],
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import getpass\nimport os\n\nif not os.environ.get(\"SGAI_API_KEY\"):\n os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "SGAI_API_KEY not found in environment.\n",
+ "Please enter your SGAI_API_KEY: ··········\n",
+ "SGAI_API_KEY has been set in the environment.\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "ab74193e-e746-4de6-d65d-33a2a26b5d86",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 5826,
+ "user_tz": -60,
+ "timestamp": 1734439787062
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n
\n Pydantic Schema Quick Guide
\n\nTypes of Schemas \n\n1. Simple Schema \nUse this when you want to extract straightforward information, such as a single piece of content. \n\n```python\nfrom pydantic import BaseModel, Field\n\n# Simple schema for a single webpage\nclass PageInfoSchema(BaseModel):\n title: str = Field(description=\"The title of the webpage\")\n description: str = Field(description=\"The description of the webpage\")\n\n# Example Output JSON after AI extraction\n{\n \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n}\n```\n\n2. Complex Schema (Nested) \nIf you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n\n```python\nfrom pydantic import BaseModel, Field\nfrom typing import List\n\n# Define a schema for a single repository\nclass RepositorySchema(BaseModel):\n name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n description: str = Field(description=\"Description of the repository\")\n stars: int = Field(description=\"Star count of the repository\")\n forks: int = Field(description=\"Fork count of the repository\")\n today_stars: int = Field(description=\"Stars gained today\")\n language: str = Field(description=\"Programming language used\")\n\n# Define a schema for a list of repositories\nclass ListRepositoriesSchema(BaseModel):\n repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n\n# Example Output JSON after AI extraction\n{\n \"repositories\": [\n {\n \"name\": \"google-gemini/cookbook\",\n \"description\": \"Examples and guides for using the Gemini API\",\n \"stars\": 8036,\n \"forks\": 1001,\n \"today_stars\": 649,\n \"language\": \"Jupyter Notebook\"\n },\n {\n \"name\": \"TEN-framework/TEN-Agent\",\n \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 
Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n \"stars\": 3224,\n \"forks\": 311,\n \"today_stars\": 361,\n \"language\": \"Python\"\n }\n ]\n}\n```\n\nKey Takeaways \n- **Simple Schema**: Perfect for small, straightforward extractions. \n- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n\nBoth approaches give the AI a clear structure to follow, so the response comes back in the shape you defined.\n \n",
+ "outputs": [],
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for a single trending repository\n# This defines only what one repository entry should look like\nclass RepositorySchema(BaseModel):\n name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n description: str = Field(description=\"Description of the repository\")\n stars: int = Field(description=\"Star count of the repository\")\n forks: int = Field(description=\"Fork count of the repository\")\n today_stars: int = Field(description=\"Stars gained today\")\n language: str = Field(description=\"Programming language used\")\n\n# Schema that wraps a list of repositories\n# It nests the RepositorySchema defined above\nclass ListRepositoriesSchema(BaseModel):\n repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")",
+ "outputs": [],
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🚀 Initialize `SGAI Client` and start extraction",
+ "outputs": [],
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Initialize the synchronous client for scraping. An async client is also available; see [this example](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py).",
+ "outputs": [],
+ "metadata": {
+ "id": "4SLJgXgcob6L"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import os\n\nfrom scrapegraph_py import Client\n\n# Initialize the client with the API key stored in the environment by the previous cell\nsgai_client = Client(api_key=os.environ[\"SGAI_API_KEY\"])",
+ "outputs": [],
+ "metadata": {
+ "id": "PQI25GZvoCSk"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Here we use the `SmartScraper` service to extract structured data from a webpage using AI.\n\n> If you already have an HTML file, you can upload it and use `LocalScraper` instead.\n",
+ "outputs": [],
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Request for Trending Repositories\nrepo_response = sgai_client.smartscraper(\n website_url=\"https://github.com/trending\",\n user_prompt=\"Extract only the visible github trending repositories\",\n output_schema=ListRepositoriesSchema,\n)",
+ "outputs": [],
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Print the response",
+ "outputs": [],
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import json\n\n# Print the response\nrequest_id = repo_response['request_id']\nresult = repo_response['result']\n\nprint(f\"Request ID: {request_id}\")\nprint(\"Trending Repositories:\")\nprint(json.dumps(result, indent=2))",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Request ID: 1e3b00ff-4b55-497c-8046-8ec5503cdafd\n",
+ "Trending Repositories:\n",
+ "{\n",
+ " \"repositories\": [\n",
+ " {\n",
+ " \"name\": \"Byaidu/PDFMathTranslate\",\n",
+ " \"description\": \"PDF scientific paper translation with preserved formats - \\u57fa\\u4e8e AI \\u5b8c\\u6574\\u4fdd\\u7559\\u6392\\u7248\\u7684 PDF \\u6587\\u6863\\u5168\\u6587\\u53cc\\u8bed\\u7ffb\\u8bd1\\uff0c\\u652f\\u6301 Google/DeepL/Ollama/OpenAI \\u7b49\\u670d\\u52a1\\uff0c\\u63d0\\u4f9b CLI/GUI/Docker\",\n",
+ " \"stars\": 8902,\n",
+ " \"forks\": 633,\n",
+ " \"today_stars\": 816,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"bigskysoftware/htmx\",\n",
+ " \"description\": \"htmx - high power tools for HTML\",\n",
+ " \"stars\": 39143,\n",
+ " \"forks\": 1324,\n",
+ " \"today_stars\": 186,\n",
+ " \"language\": \"JavaScript\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"commaai/openpilot\",\n",
+ " \"description\": \"openpilot is an operating system for robotics. Currently, it upgrades the driver assistance system on 275+ supported cars.\",\n",
+ " \"stars\": 50945,\n",
+ " \"forks\": 9206,\n",
+ " \"today_stars\": 132,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"google-gemini/cookbook\",\n",
+ " \"description\": \"Examples and guides for using the Gemini API\",\n",
+ " \"stars\": 8108,\n",
+ " \"forks\": 1011,\n",
+ " \"today_stars\": 1221,\n",
+ " \"language\": \"Jupyter Notebook\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"stripe/stripe-ios\",\n",
+ " \"description\": \"Stripe iOS SDK\",\n",
+ " \"stars\": 2179,\n",
+ " \"forks\": 994,\n",
+ " \"today_stars\": 19,\n",
+ " \"language\": \"Swift\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"RIOT-OS/RIOT\",\n",
+ " \"description\": \"RIOT - The friendly OS for IoT\",\n",
+ " \"stars\": 5234,\n",
+ " \"forks\": 2017,\n",
+ " \"today_stars\": 168,\n",
+ " \"language\": \"C\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"zju3dv/EasyVolcap\",\n",
+ " \"description\": \"EasyVolcap: Accelerating Neural Volumetric Video Research\",\n",
+ " \"stars\": 802,\n",
+ " \"forks\": 52,\n",
+ " \"today_stars\": 30,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"TEN-framework/TEN-Agent\",\n",
+ " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more. It offers real-time capabilities to see, hear, and speak, along with advanced tools like weather checks, web search, and RAG.\",\n",
+ " \"stars\": 3245,\n",
+ " \"forks\": 313,\n",
+ " \"today_stars\": 296,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"DS4SD/docling\",\n",
+ " \"description\": \"Get your documents ready for gen AI\",\n",
+ " \"stars\": 15201,\n",
+ " \"forks\": 774,\n",
+ " \"today_stars\": 281,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"Guovin/iptv-api\",\n",
+ " \"description\": \"\\ud83d\\udcfaIPTV\\u7535\\u89c6\\u76f4\\u64ad\\u6e90\\u66f4\\u65b0\\u5de5\\u5177\\uff1a\\u2728\\u592e\\u89c6\\u9891\\u3001\\ud83d\\udcf1\\u536b\\u89c6\\u3001\\u2615\\u5404\\u7701\\u4efd\\u5730\\u65b9\\u53f0\\u3001\\ud83c\\udf0f\\u6e2f\\u00b7\\u6fb3\\u00b7\\u53f0\\u3001\\ud83c\\udfa5\\u7535\\u5f71\\u3001\\ud83c\\udfae\\u6e38\\u620f\\u3001\\ud83c\\udfb5\\u97f3\\u4e50\\u3001\\ud83c\\udfad\\u7ecf\\u5178\\u5267\\u573a\\uff1b\\u652f\\u6301IPv4/IPv6\\uff1b\\u652f\\u6301\\u81ea\\u5b9a\\u4e49\\u589e\\u52a0\\u9891\\u9053\\uff1b\\u652f\\u6301\\u805a\\u5408\\u6e90\\u3001\\u4ee3\\u7406\\u6e90\\u3001\\u8ba2\\u9605\\u6e90\\u3001\\u5173\\u952e\\u5b57\\u641c\\u7d22\\uff1b\\u6bcf\\u5929\\u81ea\\u52a8\\u66f4\\u65b0\\u4e24\\u6b21\\uff0c\\u7ed3\\u679c\\u53ef\\u7528\\u4e8eTVBox\\u7b49\\u64ad\\u653e\\u8f6f\\u4ef6\\uff1b\\u652f\\u6301\\u5de5\\u4f5c\\u6d41\\u3001Docker(amd64/arm64/arm v7)\\u3001\\u547d\\u4ee4\\u884c\\u3001GUI\\u8fd0\\u884c\\u65b9\\u5f0f | IPTV live TV source update tool\",\n",
+ " \"stars\": 9046,\n",
+ " \"forks\": 1938,\n",
+ " \"today_stars\": 101,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"fatedier/frp\",\n",
+ " \"description\": \"A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.\",\n",
+ " \"stars\": 87828,\n",
+ " \"forks\": 13502,\n",
+ " \"today_stars\": 64,\n",
+ " \"language\": \"Go\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"facebookresearch/AnimatedDrawings\",\n",
+ " \"description\": \"Code to accompany \\\"A Method for Animating Children's Drawings of the Human Figure\\\"\",\n",
+ " \"stars\": 10766,\n",
+ " \"forks\": 955,\n",
+ " \"today_stars\": 38,\n",
+ " \"language\": \"Python\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"gorilla/websocket\",\n",
+ " \"description\": \"Package gorilla/websocket is a fast, well-tested and widely used WebSocket implementation for Go.\",\n",
+ " \"stars\": 22633,\n",
+ " \"forks\": 3495,\n",
+ " \"today_stars\": 13,\n",
+ " \"language\": \"Go\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"DefiLlama/chainlist\",\n",
+ " \"description\": \"NA\",\n",
+ " \"stars\": 2368,\n",
+ " \"forks\": 2476,\n",
+ " \"today_stars\": 5,\n",
+ " \"language\": \"JavaScript\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"open-telemetry/opentelemetry-collector\",\n",
+ " \"description\": \"OpenTelemetry Collector\",\n",
+ " \"stars\": 4570,\n",
+ " \"forks\": 1497,\n",
+ " \"today_stars\": 4,\n",
+ " \"language\": \"Go\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"RocketChat/Rocket.Chat\",\n",
+ " \"description\": \"The communications platform that puts data protection first.\",\n",
+ " \"stars\": 41169,\n",
+ " \"forks\": 10877,\n",
+ " \"today_stars\": 73,\n",
+ " \"language\": \"TypeScript\"\n",
+ " },\n",
+ " {\n",
+ " \"name\": \"langgenius/dify\",\n",
+ " \"description\": \"Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.\",\n",
+ " \"stars\": 54976,\n",
+ " \"forks\": 8083,\n",
+ " \"today_stars\": 127,\n",
+ " \"language\": \"TypeScript\"\n",
+ " }\n",
+ " ]\n",
+ "}\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "6b4db540-076e-4d3f-a5ef-a29e14fbb233",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 266,
+ "user_tz": -60,
+ "timestamp": 1734439624722
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Let's load the extracted repositories into a pandas DataFrame and display them as a table",
+ "outputs": [],
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import pandas as pd\n\n# Convert dictionary to DataFrame\ndf = pd.DataFrame(result[\"repositories\"])\ndf",
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "
\n",
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " name | \n",
+ " description | \n",
+ " stars | \n",
+ " forks | \n",
+ " today_stars | \n",
+ " language | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " Byaidu/PDFMathTranslate | \n",
+ " PDF scientific paper translation with preserve... | \n",
+ " 8902 | \n",
+ " 633 | \n",
+ " 816 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " bigskysoftware/htmx | \n",
+ " htmx - high power tools for HTML | \n",
+ " 39143 | \n",
+ " 1324 | \n",
+ " 186 | \n",
+ " JavaScript | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " commaai/openpilot | \n",
+ " openpilot is an operating system for robotics.... | \n",
+ " 50945 | \n",
+ " 9206 | \n",
+ " 132 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " google-gemini/cookbook | \n",
+ " Examples and guides for using the Gemini API | \n",
+ " 8108 | \n",
+ " 1011 | \n",
+ " 1221 | \n",
+ " Jupyter Notebook | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " stripe/stripe-ios | \n",
+ " Stripe iOS SDK | \n",
+ " 2179 | \n",
+ " 994 | \n",
+ " 19 | \n",
+ " Swift | \n",
+ "
\n",
+ " \n",
+ " | 5 | \n",
+ " RIOT-OS/RIOT | \n",
+ " RIOT - The friendly OS for IoT | \n",
+ " 5234 | \n",
+ " 2017 | \n",
+ " 168 | \n",
+ " C | \n",
+ "
\n",
+ " \n",
+ " | 6 | \n",
+ " zju3dv/EasyVolcap | \n",
+ " EasyVolcap: Accelerating Neural Volumetric Vid... | \n",
+ " 802 | \n",
+ " 52 | \n",
+ " 30 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 7 | \n",
+ " TEN-framework/TEN-Agent | \n",
+ " TEN Agent is a conversational AI powered by TE... | \n",
+ " 3245 | \n",
+ " 313 | \n",
+ " 296 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 8 | \n",
+ " DS4SD/docling | \n",
+ " Get your documents ready for gen AI | \n",
+ " 15201 | \n",
+ " 774 | \n",
+ " 281 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 9 | \n",
+ " Guovin/iptv-api | \n",
+ " 📺IPTV电视直播源更新工具:✨央视频、📱卫视、☕各省份地方台、🌏港·澳·台、🎥电影、🎮游戏... | \n",
+ " 9046 | \n",
+ " 1938 | \n",
+ " 101 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 10 | \n",
+ " fatedier/frp | \n",
+ " A fast reverse proxy to help you expose a loca... | \n",
+ " 87828 | \n",
+ " 13502 | \n",
+ " 64 | \n",
+ " Go | \n",
+ "
\n",
+ " \n",
+ " | 11 | \n",
+ " facebookresearch/AnimatedDrawings | \n",
+ " Code to accompany \"A Method for Animating Chil... | \n",
+ " 10766 | \n",
+ " 955 | \n",
+ " 38 | \n",
+ " Python | \n",
+ "
\n",
+ " \n",
+ " | 12 | \n",
+ " gorilla/websocket | \n",
+ " Package gorilla/websocket is a fast, well-test... | \n",
+ " 22633 | \n",
+ " 3495 | \n",
+ " 13 | \n",
+ " Go | \n",
+ "
\n",
+ " \n",
+ " | 13 | \n",
+ " DefiLlama/chainlist | \n",
+ " NA | \n",
+ " 2368 | \n",
+ " 2476 | \n",
+ " 5 | \n",
+ " JavaScript | \n",
+ "
\n",
+ " \n",
+ " | 14 | \n",
+ " open-telemetry/opentelemetry-collector | \n",
+ " OpenTelemetry Collector | \n",
+ " 4570 | \n",
+ " 1497 | \n",
+ " 4 | \n",
+ " Go | \n",
+ "
\n",
+ " \n",
+ " | 15 | \n",
+ " RocketChat/Rocket.Chat | \n",
+ " The communications platform that puts data pro... | \n",
+ " 41169 | \n",
+ " 10877 | \n",
+ " 73 | \n",
+ " TypeScript | \n",
+ "
\n",
+ " \n",
+ " | 16 | \n",
+ " langgenius/dify | \n",
+ " Dify is an open-source LLM app development pla... | \n",
+ " 54976 | \n",
+ " 8083 | \n",
+ " 127 | \n",
+ " TypeScript | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
\n",
+ "
\n"
+ ],
+ "text/plain": [
+ " name \\\n",
+ "0 Byaidu/PDFMathTranslate \n",
+ "1 bigskysoftware/htmx \n",
+ "2 commaai/openpilot \n",
+ "3 google-gemini/cookbook \n",
+ "4 stripe/stripe-ios \n",
+ "5 RIOT-OS/RIOT \n",
+ "6 zju3dv/EasyVolcap \n",
+ "7 TEN-framework/TEN-Agent \n",
+ "8 DS4SD/docling \n",
+ "9 Guovin/iptv-api \n",
+ "10 fatedier/frp \n",
+ "11 facebookresearch/AnimatedDrawings \n",
+ "12 gorilla/websocket \n",
+ "13 DefiLlama/chainlist \n",
+ "14 open-telemetry/opentelemetry-collector \n",
+ "15 RocketChat/Rocket.Chat \n",
+ "16 langgenius/dify \n",
+ "\n",
+ " description stars forks \\\n",
+ "0 PDF scientific paper translation with preserve... 8902 633 \n",
+ "1 htmx - high power tools for HTML 39143 1324 \n",
+ "2 openpilot is an operating system for robotics.... 50945 9206 \n",
+ "3 Examples and guides for using the Gemini API 8108 1011 \n",
+ "4 Stripe iOS SDK 2179 994 \n",
+ "5 RIOT - The friendly OS for IoT 5234 2017 \n",
+ "6 EasyVolcap: Accelerating Neural Volumetric Vid... 802 52 \n",
+ "7 TEN Agent is a conversational AI powered by TE... 3245 313 \n",
+ "8 Get your documents ready for gen AI 15201 774 \n",
+ "9 📺IPTV电视直播源更新工具:✨央视频、📱卫视、☕各省份地方台、🌏港·澳·台、🎥电影、🎮游戏... 9046 1938 \n",
+ "10 A fast reverse proxy to help you expose a loca... 87828 13502 \n",
+ "11 Code to accompany \"A Method for Animating Chil... 10766 955 \n",
+ "12 Package gorilla/websocket is a fast, well-test... 22633 3495 \n",
+ "13 NA 2368 2476 \n",
+ "14 OpenTelemetry Collector 4570 1497 \n",
+ "15 The communications platform that puts data pro... 41169 10877 \n",
+ "16 Dify is an open-source LLM app development pla... 54976 8083 \n",
+ "\n",
+ " today_stars language \n",
+ "0 816 Python \n",
+ "1 186 JavaScript \n",
+ "2 132 Python \n",
+ "3 1221 Jupyter Notebook \n",
+ "4 19 Swift \n",
+ "5 168 C \n",
+ "6 30 Python \n",
+ "7 296 Python \n",
+ "8 281 Python \n",
+ "9 101 Python \n",
+ "10 64 Go \n",
+ "11 38 Python \n",
+ "12 13 Go \n",
+ "13 5 JavaScript \n",
+ "14 4 Go \n",
+ "15 73 TypeScript \n",
+ "16 127 TypeScript "
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "dataframe",
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 17,\n \"fields\": [\n {\n \"column\": \"name\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 17,\n \"samples\": [\n \"Byaidu/PDFMathTranslate\",\n \"bigskysoftware/htmx\",\n \"RIOT-OS/RIOT\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 17,\n \"samples\": [\n \"PDF scientific paper translation with preserved formats - \\u57fa\\u4e8e AI \\u5b8c\\u6574\\u4fdd\\u7559\\u6392\\u7248\\u7684 PDF \\u6587\\u6863\\u5168\\u6587\\u53cc\\u8bed\\u7ffb\\u8bd1\\uff0c\\u652f\\u6301 Google/DeepL/Ollama/OpenAI \\u7b49\\u670d\\u52a1\\uff0c\\u63d0\\u4f9b CLI/GUI/Docker\",\n \"htmx - high power tools for HTML\",\n \"RIOT - The friendly OS for IoT\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 24731,\n \"min\": 802,\n \"max\": 87828,\n \"num_unique_values\": 17,\n \"samples\": [\n 8902,\n 39143,\n 5234\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"forks\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4176,\n \"min\": 52,\n \"max\": 13502,\n \"num_unique_values\": 17,\n \"samples\": [\n 633,\n 1324,\n 2017\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"today_stars\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 325,\n \"min\": 4,\n \"max\": 1221,\n \"num_unique_values\": 17,\n \"samples\": [\n 816,\n 186,\n 168\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"language\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"Python\",\n \"JavaScript\",\n \"Go\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df"
+ }
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 38
+ }
+ ],
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 635,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "34a068a4-0fd0-47aa-a139-637b803f14f5",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 262,
+ "user_tz": -60,
+ "timestamp": 1734439642507
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Save it to CSV",
+ "outputs": [],
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"trending_repositories.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Data saved to trending_repositories.csv\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "d2ec6dac-395f-4ddb-ad49-4bd672a04e5b",
+ "executionInfo": {
+ "user": {
+ "userId": "10474323355016263615",
+ "displayName": "ScrapeGraphAI"
+ },
+ "status": "ok",
+ "elapsed": 303,
+ "user_tz": -60,
+ "timestamp": 1734439655791
+ }
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "## 🔗 Resources",
+ "outputs": [],
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "\n
\n
\n
\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "authorship_tag": "ABX9TyM1qXPrrrWt8sAHKB8wCDas",
+ "collapsed_sections": [
+ "IzsyDXEWwPVt",
+ "jnqMB2-xVYQ7",
+ "cDGH0b2DkY63",
+ "2as65QLypwdb"
+ ]
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/homes-forsale/scrapegraph_langchain.ipynb b/cookbook/homes-forsale/scrapegraph_langchain.ipynb
index 446538e..93d86e6 100644
--- a/cookbook/homes-forsale/scrapegraph_langchain.ipynb
+++ b/cookbook/homes-forsale/scrapegraph_langchain.ipynb
@@ -1 +1,1784 @@
-{"cells":[{"cell_type":"markdown","metadata":{"id":"ReBHQ5_834pZ"},"source":["
\n","
\n",""]},{"cell_type":"markdown","metadata":{"id":"jEkuKbcRrPcK"},"source":["## 🕷️ Extract Houses Listing with langchain-scrapegraph"]},{"cell_type":"markdown","metadata":{"id":"5cVkde_LpVkF"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"IzsyDXEWwPVt"},"source":["### 🔧 Install `dependencies`"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install langchain-scrapegraph"]},{"cell_type":"markdown","metadata":{"id":"apBsL-L2KzM7"},"source":["### 🔑 Import `ScrapeGraph` API key"]},{"cell_type":"markdown","metadata":{"id":"ol9gQbAFkh9b"},"source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","outputId":"c588274d-64ce-4d13-f12d-0458d9c4839d"},"outputs":[{"name":"stdout","output_type":"stream","text":["SGAI_API_KEY found in environment.\n"]}],"source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"]},{"cell_type":"markdown","metadata":{"id":"jnqMB2-xVYQ7"},"source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"]},{"cell_type":"markdown","metadata":{"id":"VZvxbjfXvbgd"},"source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","
\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dlrOEgZk_8V4"},"outputs":[],"source":["from pydantic import BaseModel, Field\n","from typing import List, Optional\n","\n","# Schema for a single house listing\n","class HouseListingSchema(BaseModel):\n"," price: int = Field(description=\"Price of the house in USD\")\n"," bedrooms: int = Field(description=\"Number of bedrooms\")\n"," bathrooms: int = Field(description=\"Number of bathrooms\")\n"," square_feet: int = Field(description=\"Total square footage of the house\")\n"," address: str = Field(description=\"Address of the house\")\n"," city: str = Field(description=\"City where the house is located\")\n"," state: str = Field(description=\"State where the house is located\")\n"," zip_code: str = Field(description=\"ZIP code of the house location\")\n"," tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n"," agent_name: str = Field(description=\"Name of the listing agent\")\n"," agency: str = Field(description=\"Agency listing the house\")\n","\n","# Schema containing a list of house listings\n","class HousesListingsSchema(BaseModel):\n"," houses: List[HouseListingSchema] = Field(description=\"List of house listings on Homes or similar 
platforms\")\n"]},{"cell_type":"markdown","metadata":{"id":"cDGH0b2DkY63"},"source":["### 🚀 Initialize `langchain-scrapegraph` tools and start extraction"]},{"cell_type":"markdown","metadata":{"id":"M1KSXffZopUD"},"source":["Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n","\n","You can find more info in the [official langchain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"p2BFhL53ore1"},"outputs":[],"source":["from langchain_scrapegraph.tools import SmartScraperTool\n","\n","# Will automatically get SGAI_API_KEY from environment\n","# Initialization without output schema\n","# tool = SmartScraperTool()\n","\n","# Since we have defined an output schema, let's use it\n","# This will force the tool to have always the same output structure\n","tool = SmartScraperTool(llm_output_schema=HousesListingsSchema)"]},{"cell_type":"markdown","metadata":{"id":"iCOcvpuOoubk"},"source":["`Invoke` the tool"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"2FIKomclLNFx"},"outputs":[],"source":["# Request for Homes Listings\n","result = tool.invoke({\n"," \"website_url\":\"https://www.homes.com/san-francisco-ca/?bb=nzpwspy0mS749snkvsb\",\n"," \"user_prompt\":\"Extract info about the houses visible on the page\",\n"," }\n",")"]},{"cell_type":"markdown","metadata":{"id":"gR2UZZwzo9Sn"},"source":["> As you may have noticed, we are not passing the `llm_output_schema` while invoking the tool, this will make life easier to `AI agents` since they will not need to generate one themselves with high risk of failure. Instead, we force the tool to return always a structured output that follows your previously defined schema. 
To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n"]},{"cell_type":"markdown","metadata":{"id":"YZz1bqCIpoL8"},"source":["Print the response"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","outputId":"00597cd7-bac8-4af1-a5fe-d88d2f0ffa8d"},"outputs":[{"name":"stdout","output_type":"stream","text":["Houses:\n","{\n"," \"houses\": [\n"," {\n"," \"price\": 549000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 0,\n"," \"address\": \"380 14th St Unit 405\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [\n"," \"New construction\"\n"," ],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 1799000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 2735,\n"," \"address\": \"123 Grattan St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94117\",\n"," \"tags\": [\n"," \"Edwardian-style\",\n"," \"investment\",\n"," \"owner-occupants\"\n"," ],\n"," \"agent_name\": \"Sean Engmann\",\n"," \"agency\": \"eXp Realty of Northern CA Inc.\"\n"," },\n"," {\n"," \"price\": 1995000,\n"," \"bedrooms\": 7,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 3330,\n"," \"address\": \"1590 Washington St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Nob Hill\",\n"," \"3-unit building\",\n"," \"investment\"\n"," ],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 549000,\n"," \"bedrooms\": 0,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 477,\n"," \"address\": \"240 Lombard St Unit 835\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94111\",\n"," \"tags\": [\n"," \"Bay view\",\n"," \"studio\",\n"," \"modern 
appliances\"\n"," ],\n"," \"agent_name\": \"Tim Gullicksen\",\n"," \"agency\": \"Corcoran Icon Properties\"\n"," },\n"," {\n"," \"price\": 5495000,\n"," \"bedrooms\": 10,\n"," \"bathrooms\": 7,\n"," \"square_feet\": 6505,\n"," \"address\": \"1057 Steiner St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94115\",\n"," \"tags\": [\n"," \"Victorian\",\n"," \"Bed & Breakfast\",\n"," \"Gilded Age\"\n"," ],\n"," \"agent_name\": \"Bonnie Spindler\",\n"," \"agency\": \"Corcoran Icon Properties\"\n"," },\n"," {\n"," \"price\": 925000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 779,\n"," \"address\": \"2 Fallon Place Unit 57\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94133\",\n"," \"tags\": [\n"," \"Russian Hill\",\n"," \"views\",\n"," \"exclusive-use deck\"\n"," ],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 898000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1175,\n"," \"address\": \"5160 Diamond Heights Blvd Unit 208C\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94131\",\n"," \"tags\": [],\n"," \"agent_name\": \"Joe Polyak\",\n"," \"agency\": \"Rise Homes\"\n"," },\n"," {\n"," \"price\": 1700000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1950,\n"," \"address\": \"1351 26th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94122\",\n"," \"tags\": [],\n"," \"agent_name\": \"Glenda Queensbury\",\n"," \"agency\": \"Referral Realty-BV\"\n"," },\n"," {\n"," \"price\": 1899000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1560,\n"," \"address\": \"340 Yerba Buena Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94127\",\n"," \"tags\": [],\n"," \"agent_name\": \"Jeannie Anderson\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," 
},\n"," {\n"," \"price\": 850000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1055,\n"," \"address\": \"588 Minna Unit 604\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mohamed Lakdawala\",\n"," \"agency\": \"Remax Prestigious Properties\"\n"," },\n"," {\n"," \"price\": 1990000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1280,\n"," \"address\": \"1450 Diamond St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94131\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mary Anne Villamil\",\n"," \"agency\": \"Kinetic Real Estate\"\n"," },\n"," {\n"," \"price\": 849000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 855,\n"," \"address\": \"81 Lansing St Unit 401\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Kristen Haenggi\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 1080000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 936,\n"," \"address\": \"451 Kansas St Unit 466\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94107\",\n"," \"tags\": [],\n"," \"agent_name\": \"Maureen DeBoer\",\n"," \"agency\": \"LKJ Realty\"\n"," },\n"," {\n"," \"price\": 1499000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 2145,\n"," \"address\": \"486 Yale St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94134\",\n"," \"tags\": [],\n"," \"agent_name\": \"Alicia Atienza\",\n"," \"agency\": \"Statewide Realty\"\n"," },\n"," {\n"," \"price\": 1140000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 998,\n"," \"address\": \"588 Minna Unit 801\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [],\n"," \"agent_name\": \"Milan 
Jezdimirovic\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 1988000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 3800,\n"," \"address\": \"183 19th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94121\",\n"," \"tags\": [\n"," \"Amazing Property\",\n"," \"Marina Style\",\n"," \"Needs TLC\"\n"," ],\n"," \"agent_name\": \"Leo Cheung\",\n"," \"agency\": \"eXp Realty of California, Inc\"\n"," },\n"," {\n"," \"price\": 1218000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1275,\n"," \"address\": \"1998 Pacific Ave Unit 202\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Light-filled\",\n"," \"Freshly painted\",\n"," \"Walker's paradise\"\n"," ],\n"," \"agent_name\": \"Grace Sun\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 895000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 837,\n"," \"address\": \"425 1st St Unit 2501\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [\n"," \"Unobstructed bay bridge views\",\n"," \"Open layout\"\n"," ],\n"," \"agent_name\": \"Matt Fuller\",\n"," \"agency\": \"Jackson Fuller Real Estate\"\n"," },\n"," {\n"," \"price\": 1499000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1500,\n"," \"address\": \"Unlisted Address\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"NA\",\n"," \"tags\": [\n"," \"Contractor's Special\",\n"," \"Fixer-upper\"\n"," ],\n"," \"agent_name\": \"Jaymee Faith Sagisi\",\n"," \"agency\": \"IMPACT\"\n"," },\n"," {\n"," \"price\": 900000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 930,\n"," \"address\": \"1101 Green St Unit 302\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Historic Art Deco\",\n"," \"Abundant natural 
light\"\n"," ],\n"," \"agent_name\": \"NA\",\n"," \"agency\": \"NA\"\n"," },\n"," {\n"," \"price\": 858000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1104,\n"," \"address\": \"260 King St Unit 557\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94107\",\n"," \"tags\": [],\n"," \"agent_name\": \"Miyuki Takami\",\n"," \"agency\": \"eXp Realty of California, Inc\"\n"," },\n"," {\n"," \"price\": 945000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 767,\n"," \"address\": \"307 Page St Unit 1\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94102\",\n"," \"tags\": [],\n"," \"agent_name\": \"NA\",\n"," \"agency\": \"NA\"\n"," },\n"," {\n"," \"price\": 1099000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1330,\n"," \"address\": \"1080 Sutter St Unit 202\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Annette Liberty\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 950000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 2090,\n"," \"address\": \"3328 26th St Unit 3330\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Isaac Munene\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 1088000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1065,\n"," \"address\": \"1776 Sacramento St Unit 503\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Marilyn Becklehimer\",\n"," \"agency\": \"Dio Real Estate\"\n"," },\n"," {\n"," \"price\": 1788888,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 1856,\n"," \"address\": \"2317 15th St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": 
\"CA\",\n"," \"zip_code\": \"94114\",\n"," \"tags\": [],\n"," \"agent_name\": \"Joel Gile\",\n"," \"agency\": \"Sequoia Real Estate\"\n"," },\n"," {\n"," \"price\": 1650000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1547,\n"," \"address\": \"2475 47th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94116\",\n"," \"tags\": [],\n"," \"agent_name\": \"Lucy Goldenshteyn\",\n"," \"agency\": \"Redfin\"\n"," },\n"," {\n"," \"price\": 998000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1202,\n"," \"address\": \"50 Lansing St Unit 201\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Tracey Broadman\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1595000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 5,\n"," \"square_feet\": 1995,\n"," \"address\": \"15 Joy St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mike Stack\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1028000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1065,\n"," \"address\": \"50 Lansing St Unit 403\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Robyn Kaufman\",\n"," \"agency\": \"Vivre Real Estate\"\n"," },\n"," {\n"," \"price\": 999000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1021,\n"," \"address\": \"338 Spear St Unit 6J\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [\n"," \"Spacious\",\n"," \"Balcony\",\n"," \"Bright courtyard views\"\n"," ],\n"," \"agent_name\": \"Paul Hwang\",\n"," \"agency\": \"Skybox Realty\"\n"," },\n"," {\n"," \"price\": 799800,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," 
\"square_feet\": 1109,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\",\n"," \"1-car garage\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 529880,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 740,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\",\n"," \"1-car garage\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 489000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 741,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\",\n"," \"1-car garage\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 1359000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1845,\n"," \"address\": \"170 Thrift St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94112\",\n"," \"tags\": [\n"," \"Updated\",\n"," \"Natural light\"\n"," ],\n"," \"agent_name\": \"Cristal Wright\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1295000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1214,\n"," \"address\": \"1922 43rd Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94116\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mila Romprey\",\n"," \"agency\": \"Premier Realty Associates\"\n"," },\n"," {\n"," \"price\": 1098000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1006,\n"," \"address\": \"150 Putnam St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," 
\"tags\": [],\n"," \"agent_name\": \"Genie Mantzoros\",\n"," \"agency\": \"Epic Real Estate & Asso. Inc.\"\n"," },\n"," {\n"," \"price\": 1189870,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1436,\n"," \"address\": \"327 Ordway St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94134\",\n"," \"tags\": [],\n"," \"agent_name\": \"Shawn Zahraie\",\n"," \"agency\": \"Affinity Enterprises, Inc\"\n"," },\n"," {\n"," \"price\": 899000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1118,\n"," \"address\": \"272 Farallones St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94112\",\n"," \"tags\": [],\n"," \"agent_name\": \"Janice Lee\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 30000,\n"," \"bedrooms\": 0,\n"," \"bathrooms\": 0,\n"," \"square_feet\": 0,\n"," \"address\": \"0 Evans Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"Land\",\n"," \"0.12 Acre\",\n"," \"$251,467 per Acre\"\n"," ],\n"," \"agent_name\": \"Heidy Carrera\",\n"," \"agency\": \"Berkshire Hathaway HomeService\"\n"," }\n"," ]\n","}\n"]}],"source":["import json\n","\n","print(json.dumps(result, indent=2))"]},{"cell_type":"markdown","metadata":{"id":"2as65QLypwdb"},"source":["### 💾 Save the output to a `CSV` file"]},{"cell_type":"markdown","metadata":{"id":"HTLVFgbVLLBR"},"source":["Let's create a pandas dataframe and show the table with the extracted content"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":488},"id":"1lS9O1KOI51y","outputId":"16c95c43-2312-4c08-9d29-b3b95de080c9"},"outputs":[{"data":{"text/html":["
\n","\n","
\n"," \n"," \n"," | \n"," price | \n"," bedrooms | \n"," bathrooms | \n"," square_feet | \n"," address | \n"," city | \n"," state | \n"," zip_code | \n"," tags | \n"," agent_name | \n"," agency | \n","
\n"," \n"," \n"," \n"," | 0 | \n"," 549000 | \n"," 1 | \n"," 1 | \n"," 0 | \n"," 380 14th St Unit 405 | \n"," San Francisco | \n"," CA | \n"," 94103 | \n"," [New construction] | \n"," Eddie O'Sullivan | \n"," Elevation Real Estate | \n","
\n"," \n"," | 1 | \n"," 1799000 | \n"," 4 | \n"," 2 | \n"," 2735 | \n"," 123 Grattan St | \n"," San Francisco | \n"," CA | \n"," 94117 | \n"," [Edwardian-style, investment, owner-occupants] | \n"," Sean Engmann | \n"," eXp Realty of Northern CA Inc. | \n","
\n"," \n"," | 2 | \n"," 1995000 | \n"," 7 | \n"," 3 | \n"," 3330 | \n"," 1590 Washington St | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [Nob Hill, 3-unit building, investment] | \n"," Eddie O'Sullivan | \n"," Elevation Real Estate | \n","
\n"," \n"," | 3 | \n"," 549000 | \n"," 0 | \n"," 1 | \n"," 477 | \n"," 240 Lombard St Unit 835 | \n"," San Francisco | \n"," CA | \n"," 94111 | \n"," [Bay view, studio, modern appliances] | \n"," Tim Gullicksen | \n"," Corcoran Icon Properties | \n","
\n"," \n"," | 4 | \n"," 5495000 | \n"," 10 | \n"," 7 | \n"," 6505 | \n"," 1057 Steiner St | \n"," San Francisco | \n"," CA | \n"," 94115 | \n"," [Victorian, Bed & Breakfast, Gilded Age] | \n"," Bonnie Spindler | \n"," Corcoran Icon Properties | \n","
\n"," \n"," | 5 | \n"," 925000 | \n"," 2 | \n"," 1 | \n"," 779 | \n"," 2 Fallon Place Unit 57 | \n"," San Francisco | \n"," CA | \n"," 94133 | \n"," [Russian Hill, views, exclusive-use deck] | \n"," Eddie O'Sullivan | \n"," Elevation Real Estate | \n","
\n"," \n"," | 6 | \n"," 898000 | \n"," 2 | \n"," 2 | \n"," 1175 | \n"," 5160 Diamond Heights Blvd Unit 208C | \n"," San Francisco | \n"," CA | \n"," 94131 | \n"," [] | \n"," Joe Polyak | \n"," Rise Homes | \n","
\n"," \n"," | 7 | \n"," 1700000 | \n"," 4 | \n"," 2 | \n"," 1950 | \n"," 1351 26th Ave | \n"," San Francisco | \n"," CA | \n"," 94122 | \n"," [] | \n"," Glenda Queensbury | \n"," Referral Realty-BV | \n","
\n"," \n"," | 8 | \n"," 1899000 | \n"," 3 | \n"," 2 | \n"," 1560 | \n"," 340 Yerba Buena Ave | \n"," San Francisco | \n"," CA | \n"," 94127 | \n"," [] | \n"," Jeannie Anderson | \n"," Coldwell Banker Realty | \n","
\n"," \n"," | 9 | \n"," 850000 | \n"," 2 | \n"," 2 | \n"," 1055 | \n"," 588 Minna Unit 604 | \n"," San Francisco | \n"," CA | \n"," 94103 | \n"," [] | \n"," Mohamed Lakdawala | \n"," Remax Prestigious Properties | \n","
\n"," \n"," | 10 | \n"," 1990000 | \n"," 3 | \n"," 1 | \n"," 1280 | \n"," 1450 Diamond St | \n"," San Francisco | \n"," CA | \n"," 94131 | \n"," [] | \n"," Mary Anne Villamil | \n"," Kinetic Real Estate | \n","
\n"," \n"," | 11 | \n"," 849000 | \n"," 1 | \n"," 1 | \n"," 855 | \n"," 81 Lansing St Unit 401 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [] | \n"," Kristen Haenggi | \n"," Compass | \n","
\n"," \n"," | 12 | \n"," 1080000 | \n"," 2 | \n"," 2 | \n"," 936 | \n"," 451 Kansas St Unit 466 | \n"," San Francisco | \n"," CA | \n"," 94107 | \n"," [] | \n"," Maureen DeBoer | \n"," LKJ Realty | \n","
\n"," \n"," | 13 | \n"," 1499000 | \n"," 4 | \n"," 2 | \n"," 2145 | \n"," 486 Yale St | \n"," San Francisco | \n"," CA | \n"," 94134 | \n"," [] | \n"," Alicia Atienza | \n"," Statewide Realty | \n","
\n"," \n"," | 14 | \n"," 1140000 | \n"," 2 | \n"," 2 | \n"," 998 | \n"," 588 Minna Unit 801 | \n"," San Francisco | \n"," CA | \n"," 94103 | \n"," [] | \n"," Milan Jezdimirovic | \n"," Compass | \n","
\n"," \n"," | 15 | \n"," 1988000 | \n"," 2 | \n"," 1 | \n"," 3800 | \n"," 183 19th Ave | \n"," San Francisco | \n"," CA | \n"," 94121 | \n"," [Amazing Property, Marina Style, Needs TLC] | \n"," Leo Cheung | \n"," eXp Realty of California, Inc | \n","
\n"," \n"," | 16 | \n"," 1218000 | \n"," 2 | \n"," 2 | \n"," 1275 | \n"," 1998 Pacific Ave Unit 202 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [Light-filled, Freshly painted, Walker's parad... | \n"," Grace Sun | \n"," Compass | \n","
\n"," \n"," | 17 | \n"," 895000 | \n"," 1 | \n"," 1 | \n"," 837 | \n"," 425 1st St Unit 2501 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [Unobstructed bay bridge views, Open layout] | \n"," Matt Fuller | \n"," Jackson Fuller Real Estate | \n","
\n"," \n"," | 18 | \n"," 1499000 | \n"," 3 | \n"," 1 | \n"," 1500 | \n"," Unlisted Address | \n"," San Francisco | \n"," CA | \n"," NA | \n"," [Contractor's Special, Fixer-upper] | \n"," Jaymee Faith Sagisi | \n"," IMPACT | \n","
\n"," \n"," | 19 | \n"," 900000 | \n"," 1 | \n"," 1 | \n"," 930 | \n"," 1101 Green St Unit 302 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [Historic Art Deco, Abundant natural light] | \n"," NA | \n"," NA | \n","
\n"," \n"," | 20 | \n"," 858000 | \n"," 1 | \n"," 1 | \n"," 1104 | \n"," 260 King St Unit 557 | \n"," San Francisco | \n"," CA | \n"," 94107 | \n"," [] | \n"," Miyuki Takami | \n"," eXp Realty of California, Inc | \n","
\n"," \n"," | 21 | \n"," 945000 | \n"," 2 | \n"," 1 | \n"," 767 | \n"," 307 Page St Unit 1 | \n"," San Francisco | \n"," CA | \n"," 94102 | \n"," [] | \n"," NA | \n"," NA | \n","
\n"," \n"," | 22 | \n"," 1099000 | \n"," 2 | \n"," 2 | \n"," 1330 | \n"," 1080 Sutter St Unit 202 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [] | \n"," Annette Liberty | \n"," Coldwell Banker Realty | \n","
\n"," \n"," | 23 | \n"," 950000 | \n"," 4 | \n"," 3 | \n"," 2090 | \n"," 3328 26th St Unit 3330 | \n"," San Francisco | \n"," CA | \n"," 94110 | \n"," [] | \n"," Isaac Munene | \n"," Coldwell Banker Realty | \n","
\n"," \n"," | 24 | \n"," 1088000 | \n"," 2 | \n"," 2 | \n"," 1065 | \n"," 1776 Sacramento St Unit 503 | \n"," San Francisco | \n"," CA | \n"," 94109 | \n"," [] | \n"," Marilyn Becklehimer | \n"," Dio Real Estate | \n","
\n"," \n"," | 25 | \n"," 1788888 | \n"," 4 | \n"," 3 | \n"," 1856 | \n"," 2317 15th St | \n"," San Francisco | \n"," CA | \n"," 94114 | \n"," [] | \n"," Joel Gile | \n"," Sequoia Real Estate | \n","
\n"," \n"," | 26 | \n"," 1650000 | \n"," 3 | \n"," 2 | \n"," 1547 | \n"," 2475 47th Ave | \n"," San Francisco | \n"," CA | \n"," 94116 | \n"," [] | \n"," Lucy Goldenshteyn | \n"," Redfin | \n","
\n"," \n"," | 27 | \n"," 998000 | \n"," 2 | \n"," 2 | \n"," 1202 | \n"," 50 Lansing St Unit 201 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [] | \n"," Tracey Broadman | \n"," Vanguard Properties | \n","
\n"," \n"," | 28 | \n"," 1595000 | \n"," 3 | \n"," 5 | \n"," 1995 | \n"," 15 Joy St | \n"," San Francisco | \n"," CA | \n"," 94110 | \n"," [] | \n"," Mike Stack | \n"," Vanguard Properties | \n","
\n"," \n"," | 29 | \n"," 1028000 | \n"," 2 | \n"," 2 | \n"," 1065 | \n"," 50 Lansing St Unit 403 | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [] | \n"," Robyn Kaufman | \n"," Vivre Real Estate | \n","
\n"," \n"," | 30 | \n"," 999000 | \n"," 1 | \n"," 1 | \n"," 1021 | \n"," 338 Spear St Unit 6J | \n"," San Francisco | \n"," CA | \n"," 94105 | \n"," [Spacious, Balcony, Bright courtyard views] | \n"," Paul Hwang | \n"," Skybox Realty | \n","
\n"," \n"," | 31 | \n"," 799800 | \n"," 2 | \n"," 2 | \n"," 1109 | \n"," 10 Innes Ct | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [New Construction, 1-car garage] | \n"," Lennar | \n"," Lennar | \n","
\n"," \n"," | 32 | \n"," 529880 | \n"," 1 | \n"," 1 | \n"," 740 | \n"," 10 Innes Ct | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [New Construction, 1-car garage] | \n"," Lennar | \n"," Lennar | \n","
\n"," \n"," | 33 | \n"," 489000 | \n"," 1 | \n"," 1 | \n"," 741 | \n"," 10 Innes Ct | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [New Construction, 1-car garage] | \n"," Lennar | \n"," Lennar | \n","
\n"," \n"," | 34 | \n"," 1359000 | \n"," 4 | \n"," 2 | \n"," 1845 | \n"," 170 Thrift St | \n"," San Francisco | \n"," CA | \n"," 94112 | \n"," [Updated, Natural light] | \n"," Cristal Wright | \n"," Vanguard Properties | \n","
\n"," \n"," | 35 | \n"," 1295000 | \n"," 3 | \n"," 1 | \n"," 1214 | \n"," 1922 43rd Ave | \n"," San Francisco | \n"," CA | \n"," 94116 | \n"," [] | \n"," Mila Romprey | \n"," Premier Realty Associates | \n","
\n"," \n"," | 36 | \n"," 1098000 | \n"," 3 | \n"," 1 | \n"," 1006 | \n"," 150 Putnam St | \n"," San Francisco | \n"," CA | \n"," 94110 | \n"," [] | \n"," Genie Mantzoros | \n"," Epic Real Estate & Asso. Inc. | \n","
\n"," \n"," | 37 | \n"," 1189870 | \n"," 3 | \n"," 2 | \n"," 1436 | \n"," 327 Ordway St | \n"," San Francisco | \n"," CA | \n"," 94134 | \n"," [] | \n"," Shawn Zahraie | \n"," Affinity Enterprises, Inc | \n","
\n"," \n"," | 38 | \n"," 899000 | \n"," 2 | \n"," 1 | \n"," 1118 | \n"," 272 Farallones St | \n"," San Francisco | \n"," CA | \n"," 94112 | \n"," [] | \n"," Janice Lee | \n"," Coldwell Banker Realty | \n","
\n"," \n"," | 39 | \n"," 30000 | \n"," 0 | \n"," 0 | \n"," 0 | \n"," 0 Evans Ave | \n"," San Francisco | \n"," CA | \n"," 94124 | \n"," [Land, 0.12 Acre, $251,467 per Acre] | \n"," Heidy Carrera | \n"," Berkshire Hathaway HomeService | \n","
\n"," \n","
\n","
"],"text/plain":[" price bedrooms bathrooms square_feet \\\n","0 549000 1 1 0 \n","1 1799000 4 2 2735 \n","2 1995000 7 3 3330 \n","3 549000 0 1 477 \n","4 5495000 10 7 6505 \n","5 925000 2 1 779 \n","6 898000 2 2 1175 \n","7 1700000 4 2 1950 \n","8 1899000 3 2 1560 \n","9 850000 2 2 1055 \n","10 1990000 3 1 1280 \n","11 849000 1 1 855 \n","12 1080000 2 2 936 \n","13 1499000 4 2 2145 \n","14 1140000 2 2 998 \n","15 1988000 2 1 3800 \n","16 1218000 2 2 1275 \n","17 895000 1 1 837 \n","18 1499000 3 1 1500 \n","19 900000 1 1 930 \n","20 858000 1 1 1104 \n","21 945000 2 1 767 \n","22 1099000 2 2 1330 \n","23 950000 4 3 2090 \n","24 1088000 2 2 1065 \n","25 1788888 4 3 1856 \n","26 1650000 3 2 1547 \n","27 998000 2 2 1202 \n","28 1595000 3 5 1995 \n","29 1028000 2 2 1065 \n","30 999000 1 1 1021 \n","31 799800 2 2 1109 \n","32 529880 1 1 740 \n","33 489000 1 1 741 \n","34 1359000 4 2 1845 \n","35 1295000 3 1 1214 \n","36 1098000 3 1 1006 \n","37 1189870 3 2 1436 \n","38 899000 2 1 1118 \n","39 30000 0 0 0 \n","\n"," address city state zip_code \\\n","0 380 14th St Unit 405 San Francisco CA 94103 \n","1 123 Grattan St San Francisco CA 94117 \n","2 1590 Washington St San Francisco CA 94109 \n","3 240 Lombard St Unit 835 San Francisco CA 94111 \n","4 1057 Steiner St San Francisco CA 94115 \n","5 2 Fallon Place Unit 57 San Francisco CA 94133 \n","6 5160 Diamond Heights Blvd Unit 208C San Francisco CA 94131 \n","7 1351 26th Ave San Francisco CA 94122 \n","8 340 Yerba Buena Ave San Francisco CA 94127 \n","9 588 Minna Unit 604 San Francisco CA 94103 \n","10 1450 Diamond St San Francisco CA 94131 \n","11 81 Lansing St Unit 401 San Francisco CA 94105 \n","12 451 Kansas St Unit 466 San Francisco CA 94107 \n","13 486 Yale St San Francisco CA 94134 \n","14 588 Minna Unit 801 San Francisco CA 94103 \n","15 183 19th Ave San Francisco CA 94121 \n","16 1998 Pacific Ave Unit 202 San Francisco CA 94109 \n","17 425 1st St Unit 2501 San Francisco CA 94105 \n","18 Unlisted Address San 
Francisco CA NA \n","19 1101 Green St Unit 302 San Francisco CA 94109 \n","20 260 King St Unit 557 San Francisco CA 94107 \n","21 307 Page St Unit 1 San Francisco CA 94102 \n","22 1080 Sutter St Unit 202 San Francisco CA 94109 \n","23 3328 26th St Unit 3330 San Francisco CA 94110 \n","24 1776 Sacramento St Unit 503 San Francisco CA 94109 \n","25 2317 15th St San Francisco CA 94114 \n","26 2475 47th Ave San Francisco CA 94116 \n","27 50 Lansing St Unit 201 San Francisco CA 94105 \n","28 15 Joy St San Francisco CA 94110 \n","29 50 Lansing St Unit 403 San Francisco CA 94105 \n","30 338 Spear St Unit 6J San Francisco CA 94105 \n","31 10 Innes Ct San Francisco CA 94124 \n","32 10 Innes Ct San Francisco CA 94124 \n","33 10 Innes Ct San Francisco CA 94124 \n","34 170 Thrift St San Francisco CA 94112 \n","35 1922 43rd Ave San Francisco CA 94116 \n","36 150 Putnam St San Francisco CA 94110 \n","37 327 Ordway St San Francisco CA 94134 \n","38 272 Farallones St San Francisco CA 94112 \n","39 0 Evans Ave San Francisco CA 94124 \n","\n"," tags agent_name \\\n","0 [New construction] Eddie O'Sullivan \n","1 [Edwardian-style, investment, owner-occupants] Sean Engmann \n","2 [Nob Hill, 3-unit building, investment] Eddie O'Sullivan \n","3 [Bay view, studio, modern appliances] Tim Gullicksen \n","4 [Victorian, Bed & Breakfast, Gilded Age] Bonnie Spindler \n","5 [Russian Hill, views, exclusive-use deck] Eddie O'Sullivan \n","6 [] Joe Polyak \n","7 [] Glenda Queensbury \n","8 [] Jeannie Anderson \n","9 [] Mohamed Lakdawala \n","10 [] Mary Anne Villamil \n","11 [] Kristen Haenggi \n","12 [] Maureen DeBoer \n","13 [] Alicia Atienza \n","14 [] Milan Jezdimirovic \n","15 [Amazing Property, Marina Style, Needs TLC] Leo Cheung \n","16 [Light-filled, Freshly painted, Walker's parad... 
Grace Sun \n","17 [Unobstructed bay bridge views, Open layout] Matt Fuller \n","18 [Contractor's Special, Fixer-upper] Jaymee Faith Sagisi \n","19 [Historic Art Deco, Abundant natural light] NA \n","20 [] Miyuki Takami \n","21 [] NA \n","22 [] Annette Liberty \n","23 [] Isaac Munene \n","24 [] Marilyn Becklehimer \n","25 [] Joel Gile \n","26 [] Lucy Goldenshteyn \n","27 [] Tracey Broadman \n","28 [] Mike Stack \n","29 [] Robyn Kaufman \n","30 [Spacious, Balcony, Bright courtyard views] Paul Hwang \n","31 [New Construction, 1-car garage] Lennar \n","32 [New Construction, 1-car garage] Lennar \n","33 [New Construction, 1-car garage] Lennar \n","34 [Updated, Natural light] Cristal Wright \n","35 [] Mila Romprey \n","36 [] Genie Mantzoros \n","37 [] Shawn Zahraie \n","38 [] Janice Lee \n","39 [Land, 0.12 Acre, $251,467 per Acre] Heidy Carrera \n","\n"," agency \n","0 Elevation Real Estate \n","1 eXp Realty of Northern CA Inc. \n","2 Elevation Real Estate \n","3 Corcoran Icon Properties \n","4 Corcoran Icon Properties \n","5 Elevation Real Estate \n","6 Rise Homes \n","7 Referral Realty-BV \n","8 Coldwell Banker Realty \n","9 Remax Prestigious Properties \n","10 Kinetic Real Estate \n","11 Compass \n","12 LKJ Realty \n","13 Statewide Realty \n","14 Compass \n","15 eXp Realty of California, Inc \n","16 Compass \n","17 Jackson Fuller Real Estate \n","18 IMPACT \n","19 NA \n","20 eXp Realty of California, Inc \n","21 NA \n","22 Coldwell Banker Realty \n","23 Coldwell Banker Realty \n","24 Dio Real Estate \n","25 Sequoia Real Estate \n","26 Redfin \n","27 Vanguard Properties \n","28 Vanguard Properties \n","29 Vivre Real Estate \n","30 Skybox Realty \n","31 Lennar \n","32 Lennar \n","33 Lennar \n","34 Vanguard Properties \n","35 Premier Realty Associates \n","36 Epic Real Estate & Asso. Inc. 
\n","37 Affinity Enterprises, Inc \n","38 Coldwell Banker Realty \n","39 Berkshire Hathaway HomeService "]},"execution_count":8,"metadata":{},"output_type":"execute_result"}],"source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"houses\"])\n","df"]},{"cell_type":"markdown","metadata":{"id":"v0CBYVk7qA5Z"},"source":["Save it to CSV"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","outputId":"5b64dab7-65d7-42cd-cd08-1631e64b28eb"},"outputs":[{"name":"stdout","output_type":"stream","text":["Data saved to houses_forsale.csv\n"]}],"source":["# Save the DataFrame to a CSV file\n","csv_file = \"houses_forsale.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"]},{"cell_type":"markdown","metadata":{"id":"-1SZT8VzTZNd"},"source":["## 🔗 Resources"]},{"cell_type":"markdown","metadata":{"id":"dUi2LtMLRDDR"},"source":["\n","
\n","
\n","
\n","\n","\n","- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","- 🦜 **Langchain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.14"}},"nbformat":4,"nbformat_minor":0}
+{
+ "cells": [
+ {
+ "source": "## 🕷️ Extract Houses Listing with langchain-scrapegraph\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langchain-2) [](https://colab.research.google.com/drive/1jpr-ch0NTKG7xc6aPIg_kJHfdTWVLb5L?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract Houses Listing with langchain-scrapegraph\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langchain-2) [](https://colab.research.google.com/drive/1jpr-ch0NTKG7xc6aPIg_kJHfdTWVLb5L?usp=sharing)",
+ "text/markdown": "## 🕷️ Extract Houses Listing with langchain-scrapegraph\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langchain-2) [](https://colab.research.google.com/drive/1jpr-ch0NTKG7xc6aPIg_kJHfdTWVLb5L?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:10:19.635Z",
+ "executionStartTime": "2026-03-26T00:10:19.635Z"
+ },
+ {
+ "source": "",
+ "outputs": [],
+ "metadata": {
+ "id": "5cVkde_LpVkF"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "%%capture\n!pip install langchain-scrapegraph",
+ "outputs": [],
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "You can find your ScrapeGraph API key in the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/).",
+ "outputs": [],
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import getpass\nimport os\n\nif not os.environ.get(\"SGAI_API_KEY\"):\n os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "SGAI_API_KEY found in environment.\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c588274d-64ce-4d13-f12d-0458d9c4839d"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n
\n Pydantic Schema Quick Guide
\n\nTypes of Schemas \n\n1. Simple Schema \nUse this when you want to extract straightforward information, such as a single piece of content. \n\n```python\nfrom pydantic import BaseModel, Field\n\n# Simple schema for a single webpage\nclass PageInfoSchema(BaseModel):\n title: str = Field(description=\"The title of the webpage\")\n description: str = Field(description=\"The description of the webpage\")\n\n# Example Output JSON after AI extraction\n{\n \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n}\n```\n\n2. Complex Schema (Nested) \nIf you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n\n```python\nfrom pydantic import BaseModel, Field\nfrom typing import List\n\n# Define a schema for a single repository\nclass RepositorySchema(BaseModel):\n name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n description: str = Field(description=\"Description of the repository\")\n stars: int = Field(description=\"Star count of the repository\")\n forks: int = Field(description=\"Fork count of the repository\")\n today_stars: int = Field(description=\"Stars gained today\")\n language: str = Field(description=\"Programming language used\")\n\n# Define a schema for a list of repositories\nclass ListRepositoriesSchema(BaseModel):\n repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n\n# Example Output JSON after AI extraction\n{\n \"repositories\": [\n {\n \"name\": \"google-gemini/cookbook\",\n \"description\": \"Examples and guides for using the Gemini API\",\n \"stars\": 8036,\n \"forks\": 1001,\n \"today_stars\": 649,\n \"language\": \"Jupyter Notebook\"\n },\n {\n \"name\": \"TEN-framework/TEN-Agent\",\n \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 
Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n \"stars\": 3224,\n \"forks\": 311,\n \"today_stars\": 361,\n \"language\": \"Python\"\n }\n ]\n}\n```\n\nKey Takeaways \n- **Simple Schema**: Perfect for small, straightforward extractions. \n- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n\nBoth approaches give the AI a clear structure to follow, which helps the extracted content match what you need.\n \n",
+ "outputs": [],
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for a single house listing\nclass HouseListingSchema(BaseModel):\n price: int = Field(description=\"Price of the house in USD\")\n bedrooms: int = Field(description=\"Number of bedrooms\")\n bathrooms: int = Field(description=\"Number of bathrooms\")\n square_feet: int = Field(description=\"Total square footage of the house\")\n address: str = Field(description=\"Address of the house\")\n city: str = Field(description=\"City where the house is located\")\n state: str = Field(description=\"State where the house is located\")\n zip_code: str = Field(description=\"ZIP code of the house location\")\n tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n agent_name: str = Field(description=\"Name of the listing agent\")\n agency: str = Field(description=\"Agency listing the house\")\n\n# Schema containing a list of house listings\nclass HousesListingsSchema(BaseModel):\n houses: List[HouseListingSchema] = Field(description=\"List of house listings on Homes.com or similar platforms\")\n",
+ "outputs": [],
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🚀 Initialize `langchain-scrapegraph` tools and start extraction",
+ "outputs": [],
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Here we use `SmartScraperTool` to extract structured data from a webpage using AI.\n\n> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n\nYou can find more info in the [official LangChain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/).\n",
+ "outputs": [],
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from langchain_scrapegraph.tools import SmartScraperTool\n\n# The tool reads SGAI_API_KEY from the environment automatically\n# Initialization without an output schema:\n# tool = SmartScraperTool()\n\n# Since we have defined an output schema, pass it in.\n# This forces the tool to always return the same output structure.\ntool = SmartScraperTool(llm_output_schema=HousesListingsSchema)",
+ "outputs": [],
+ "metadata": {
+ "id": "p2BFhL53ore1"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "`Invoke` the tool",
+ "outputs": [],
+ "metadata": {
+ "id": "iCOcvpuOoubk"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Request the house listings from Homes.com\nresult = tool.invoke({\n \"website_url\": \"https://www.homes.com/san-francisco-ca/?bb=nzpwspy0mS749snkvsb\",\n \"user_prompt\": \"Extract info about the houses visible on the page\",\n }\n)",
+ "outputs": [],
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "> As you may have noticed, we are not passing `llm_output_schema` when invoking the tool. This makes life easier for `AI agents`, since they do not need to generate a schema themselves, which is error-prone. Instead, the tool always returns structured output that follows the schema you defined earlier. To find out more, check the [langchain-scrapegraph README](https://github.com/ScrapeGraphAI/langchain-scrapegraph).\n",
+ "outputs": [],
+ "metadata": {
+ "id": "gR2UZZwzo9Sn"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Print the response",
+ "outputs": [],
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import json\n\nprint(json.dumps(result, indent=2))",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Houses:\n",
+ "{\n",
+ " \"houses\": [\n",
+ " {\n",
+ " \"price\": 549000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 0,\n",
+ " \"address\": \"380 14th St Unit 405\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94103\",\n",
+ " \"tags\": [\n",
+ " \"New construction\"\n",
+ " ],\n",
+ " \"agent_name\": \"Eddie O'Sullivan\",\n",
+ " \"agency\": \"Elevation Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1799000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 2735,\n",
+ " \"address\": \"123 Grattan St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94117\",\n",
+ " \"tags\": [\n",
+ " \"Edwardian-style\",\n",
+ " \"investment\",\n",
+ " \"owner-occupants\"\n",
+ " ],\n",
+ " \"agent_name\": \"Sean Engmann\",\n",
+ " \"agency\": \"eXp Realty of Northern CA Inc.\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1995000,\n",
+ " \"bedrooms\": 7,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 3330,\n",
+ " \"address\": \"1590 Washington St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [\n",
+ " \"Nob Hill\",\n",
+ " \"3-unit building\",\n",
+ " \"investment\"\n",
+ " ],\n",
+ " \"agent_name\": \"Eddie O'Sullivan\",\n",
+ " \"agency\": \"Elevation Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 549000,\n",
+ " \"bedrooms\": 0,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 477,\n",
+ " \"address\": \"240 Lombard St Unit 835\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94111\",\n",
+ " \"tags\": [\n",
+ " \"Bay view\",\n",
+ " \"studio\",\n",
+ " \"modern appliances\"\n",
+ " ],\n",
+ " \"agent_name\": \"Tim Gullicksen\",\n",
+ " \"agency\": \"Corcoran Icon Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 5495000,\n",
+ " \"bedrooms\": 10,\n",
+ " \"bathrooms\": 7,\n",
+ " \"square_feet\": 6505,\n",
+ " \"address\": \"1057 Steiner St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94115\",\n",
+ " \"tags\": [\n",
+ " \"Victorian\",\n",
+ " \"Bed & Breakfast\",\n",
+ " \"Gilded Age\"\n",
+ " ],\n",
+ " \"agent_name\": \"Bonnie Spindler\",\n",
+ " \"agency\": \"Corcoran Icon Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 925000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 779,\n",
+ " \"address\": \"2 Fallon Place Unit 57\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94133\",\n",
+ " \"tags\": [\n",
+ " \"Russian Hill\",\n",
+ " \"views\",\n",
+ " \"exclusive-use deck\"\n",
+ " ],\n",
+ " \"agent_name\": \"Eddie O'Sullivan\",\n",
+ " \"agency\": \"Elevation Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 898000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1175,\n",
+ " \"address\": \"5160 Diamond Heights Blvd Unit 208C\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94131\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Joe Polyak\",\n",
+ " \"agency\": \"Rise Homes\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1700000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1950,\n",
+ " \"address\": \"1351 26th Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94122\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Glenda Queensbury\",\n",
+ " \"agency\": \"Referral Realty-BV\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1899000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1560,\n",
+ " \"address\": \"340 Yerba Buena Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94127\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Jeannie Anderson\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 850000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1055,\n",
+ " \"address\": \"588 Minna Unit 604\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94103\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mohamed Lakdawala\",\n",
+ " \"agency\": \"Remax Prestigious Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1990000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1280,\n",
+ " \"address\": \"1450 Diamond St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94131\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mary Anne Villamil\",\n",
+ " \"agency\": \"Kinetic Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 849000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 855,\n",
+ " \"address\": \"81 Lansing St Unit 401\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Kristen Haenggi\",\n",
+ " \"agency\": \"Compass\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1080000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 936,\n",
+ " \"address\": \"451 Kansas St Unit 466\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94107\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Maureen DeBoer\",\n",
+ " \"agency\": \"LKJ Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1499000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 2145,\n",
+ " \"address\": \"486 Yale St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94134\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Alicia Atienza\",\n",
+ " \"agency\": \"Statewide Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1140000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 998,\n",
+ " \"address\": \"588 Minna Unit 801\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94103\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Milan Jezdimirovic\",\n",
+ " \"agency\": \"Compass\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1988000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 3800,\n",
+ " \"address\": \"183 19th Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94121\",\n",
+ " \"tags\": [\n",
+ " \"Amazing Property\",\n",
+ " \"Marina Style\",\n",
+ " \"Needs TLC\"\n",
+ " ],\n",
+ " \"agent_name\": \"Leo Cheung\",\n",
+ " \"agency\": \"eXp Realty of California, Inc\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1218000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1275,\n",
+ " \"address\": \"1998 Pacific Ave Unit 202\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [\n",
+ " \"Light-filled\",\n",
+ " \"Freshly painted\",\n",
+ " \"Walker's paradise\"\n",
+ " ],\n",
+ " \"agent_name\": \"Grace Sun\",\n",
+ " \"agency\": \"Compass\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 895000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 837,\n",
+ " \"address\": \"425 1st St Unit 2501\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [\n",
+ " \"Unobstructed bay bridge views\",\n",
+ " \"Open layout\"\n",
+ " ],\n",
+ " \"agent_name\": \"Matt Fuller\",\n",
+ " \"agency\": \"Jackson Fuller Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1499000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1500,\n",
+ " \"address\": \"Unlisted Address\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"NA\",\n",
+ " \"tags\": [\n",
+ " \"Contractor's Special\",\n",
+ " \"Fixer-upper\"\n",
+ " ],\n",
+ " \"agent_name\": \"Jaymee Faith Sagisi\",\n",
+ " \"agency\": \"IMPACT\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 900000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 930,\n",
+ " \"address\": \"1101 Green St Unit 302\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [\n",
+ " \"Historic Art Deco\",\n",
+ " \"Abundant natural light\"\n",
+ " ],\n",
+ " \"agent_name\": \"NA\",\n",
+ " \"agency\": \"NA\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 858000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1104,\n",
+ " \"address\": \"260 King St Unit 557\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94107\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Miyuki Takami\",\n",
+ " \"agency\": \"eXp Realty of California, Inc\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 945000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 767,\n",
+ " \"address\": \"307 Page St Unit 1\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94102\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"NA\",\n",
+ " \"agency\": \"NA\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1099000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1330,\n",
+ " \"address\": \"1080 Sutter St Unit 202\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Annette Liberty\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 950000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 2090,\n",
+ " \"address\": \"3328 26th St Unit 3330\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94110\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Isaac Munene\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1088000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1065,\n",
+ " \"address\": \"1776 Sacramento St Unit 503\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Marilyn Becklehimer\",\n",
+ " \"agency\": \"Dio Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1788888,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 1856,\n",
+ " \"address\": \"2317 15th St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94114\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Joel Gile\",\n",
+ " \"agency\": \"Sequoia Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1650000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1547,\n",
+ " \"address\": \"2475 47th Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94116\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Lucy Goldenshteyn\",\n",
+ " \"agency\": \"Redfin\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 998000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1202,\n",
+ " \"address\": \"50 Lansing St Unit 201\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Tracey Broadman\",\n",
+ " \"agency\": \"Vanguard Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1595000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 5,\n",
+ " \"square_feet\": 1995,\n",
+ " \"address\": \"15 Joy St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94110\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mike Stack\",\n",
+ " \"agency\": \"Vanguard Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1028000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1065,\n",
+ " \"address\": \"50 Lansing St Unit 403\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Robyn Kaufman\",\n",
+ " \"agency\": \"Vivre Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 999000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1021,\n",
+ " \"address\": \"338 Spear St Unit 6J\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [\n",
+ " \"Spacious\",\n",
+ " \"Balcony\",\n",
+ " \"Bright courtyard views\"\n",
+ " ],\n",
+ " \"agent_name\": \"Paul Hwang\",\n",
+ " \"agency\": \"Skybox Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 799800,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1109,\n",
+ " \"address\": \"10 Innes Ct\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"New Construction\",\n",
+ " \"1-car garage\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lennar\",\n",
+ " \"agency\": \"Lennar\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 529880,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 740,\n",
+ " \"address\": \"10 Innes Ct\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"New Construction\",\n",
+ " \"1-car garage\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lennar\",\n",
+ " \"agency\": \"Lennar\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 489000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 741,\n",
+ " \"address\": \"10 Innes Ct\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"New Construction\",\n",
+ " \"1-car garage\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lennar\",\n",
+ " \"agency\": \"Lennar\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1359000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1845,\n",
+ " \"address\": \"170 Thrift St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94112\",\n",
+ " \"tags\": [\n",
+ " \"Updated\",\n",
+ " \"Natural light\"\n",
+ " ],\n",
+ " \"agent_name\": \"Cristal Wright\",\n",
+ " \"agency\": \"Vanguard Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1295000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1214,\n",
+ " \"address\": \"1922 43rd Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94116\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mila Romprey\",\n",
+ " \"agency\": \"Premier Realty Associates\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1098000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1006,\n",
+ " \"address\": \"150 Putnam St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94110\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Genie Mantzoros\",\n",
+ " \"agency\": \"Epic Real Estate & Asso. Inc.\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1189870,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1436,\n",
+ " \"address\": \"327 Ordway St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94134\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Shawn Zahraie\",\n",
+ " \"agency\": \"Affinity Enterprises, Inc\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 899000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1118,\n",
+ " \"address\": \"272 Farallones St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94112\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Janice Lee\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 30000,\n",
+ " \"bedrooms\": 0,\n",
+ " \"bathrooms\": 0,\n",
+ " \"square_feet\": 0,\n",
+ " \"address\": \"0 Evans Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"Land\",\n",
+ " \"0.12 Acre\",\n",
+ " \"$251,467 per Acre\"\n",
+ " ],\n",
+ " \"agent_name\": \"Heidy Carrera\",\n",
+ " \"agency\": \"Berkshire Hathaway HomeService\"\n",
+ " }\n",
+ " ]\n",
+ "}\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "00597cd7-bac8-4af1-a5fe-d88d2f0ffa8d"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Let's load the extracted listings into a pandas DataFrame and display them as a table.",
+ "outputs": [],
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import pandas as pd\n\n# Convert the list of house records to a DataFrame (one row per listing)\ndf = pd.DataFrame(result[\"houses\"])\ndf",
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " | \n",
+ " price | \n",
+ " bedrooms | \n",
+ " bathrooms | \n",
+ " square_feet | \n",
+ " address | \n",
+ " city | \n",
+ " state | \n",
+ " zip_code | \n",
+ " tags | \n",
+ " agent_name | \n",
+ " agency | \n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " | 0 | \n",
+ " 549000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 0 | \n",
+ " 380 14th St Unit 405 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94103 | \n",
+ " [New construction] | \n",
+ " Eddie O'Sullivan | \n",
+ " Elevation Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 1 | \n",
+ " 1799000 | \n",
+ " 4 | \n",
+ " 2 | \n",
+ " 2735 | \n",
+ " 123 Grattan St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94117 | \n",
+ " [Edwardian-style, investment, owner-occupants] | \n",
+ " Sean Engmann | \n",
+ " eXp Realty of Northern CA Inc. | \n",
+ "
\n",
+ " \n",
+ " | 2 | \n",
+ " 1995000 | \n",
+ " 7 | \n",
+ " 3 | \n",
+ " 3330 | \n",
+ " 1590 Washington St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94109 | \n",
+ " [Nob Hill, 3-unit building, investment] | \n",
+ " Eddie O'Sullivan | \n",
+ " Elevation Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 3 | \n",
+ " 549000 | \n",
+ " 0 | \n",
+ " 1 | \n",
+ " 477 | \n",
+ " 240 Lombard St Unit 835 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94111 | \n",
+ " [Bay view, studio, modern appliances] | \n",
+ " Tim Gullicksen | \n",
+ " Corcoran Icon Properties | \n",
+ "
\n",
+ " \n",
+ " | 4 | \n",
+ " 5495000 | \n",
+ " 10 | \n",
+ " 7 | \n",
+ " 6505 | \n",
+ " 1057 Steiner St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94115 | \n",
+ " [Victorian, Bed & Breakfast, Gilded Age] | \n",
+ " Bonnie Spindler | \n",
+ " Corcoran Icon Properties | \n",
+ "
\n",
+ " \n",
+ " | 5 | \n",
+ " 925000 | \n",
+ " 2 | \n",
+ " 1 | \n",
+ " 779 | \n",
+ " 2 Fallon Place Unit 57 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94133 | \n",
+ " [Russian Hill, views, exclusive-use deck] | \n",
+ " Eddie O'Sullivan | \n",
+ " Elevation Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 6 | \n",
+ " 898000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1175 | \n",
+ " 5160 Diamond Heights Blvd Unit 208C | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94131 | \n",
+ " [] | \n",
+ " Joe Polyak | \n",
+ " Rise Homes | \n",
+ "
\n",
+ " \n",
+ " | 7 | \n",
+ " 1700000 | \n",
+ " 4 | \n",
+ " 2 | \n",
+ " 1950 | \n",
+ " 1351 26th Ave | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94122 | \n",
+ " [] | \n",
+ " Glenda Queensbury | \n",
+ " Referral Realty-BV | \n",
+ "
\n",
+ " \n",
+ " | 8 | \n",
+ " 1899000 | \n",
+ " 3 | \n",
+ " 2 | \n",
+ " 1560 | \n",
+ " 340 Yerba Buena Ave | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94127 | \n",
+ " [] | \n",
+ " Jeannie Anderson | \n",
+ " Coldwell Banker Realty | \n",
+ "
\n",
+ " \n",
+ " | 9 | \n",
+ " 850000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1055 | \n",
+ " 588 Minna Unit 604 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94103 | \n",
+ " [] | \n",
+ " Mohamed Lakdawala | \n",
+ " Remax Prestigious Properties | \n",
+ "
\n",
+ " \n",
+ " | 10 | \n",
+ " 1990000 | \n",
+ " 3 | \n",
+ " 1 | \n",
+ " 1280 | \n",
+ " 1450 Diamond St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94131 | \n",
+ " [] | \n",
+ " Mary Anne Villamil | \n",
+ " Kinetic Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 11 | \n",
+ " 849000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 855 | \n",
+ " 81 Lansing St Unit 401 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94105 | \n",
+ " [] | \n",
+ " Kristen Haenggi | \n",
+ " Compass | \n",
+ "
\n",
+ " \n",
+ " | 12 | \n",
+ " 1080000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 936 | \n",
+ " 451 Kansas St Unit 466 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94107 | \n",
+ " [] | \n",
+ " Maureen DeBoer | \n",
+ " LKJ Realty | \n",
+ "
\n",
+ " \n",
+ " | 13 | \n",
+ " 1499000 | \n",
+ " 4 | \n",
+ " 2 | \n",
+ " 2145 | \n",
+ " 486 Yale St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94134 | \n",
+ " [] | \n",
+ " Alicia Atienza | \n",
+ " Statewide Realty | \n",
+ "
\n",
+ " \n",
+ " | 14 | \n",
+ " 1140000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 998 | \n",
+ " 588 Minna Unit 801 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94103 | \n",
+ " [] | \n",
+ " Milan Jezdimirovic | \n",
+ " Compass | \n",
+ "
\n",
+ " \n",
+ " | 15 | \n",
+ " 1988000 | \n",
+ " 2 | \n",
+ " 1 | \n",
+ " 3800 | \n",
+ " 183 19th Ave | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94121 | \n",
+ " [Amazing Property, Marina Style, Needs TLC] | \n",
+ " Leo Cheung | \n",
+ " eXp Realty of California, Inc | \n",
+ "
\n",
+ " \n",
+ " | 16 | \n",
+ " 1218000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1275 | \n",
+ " 1998 Pacific Ave Unit 202 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94109 | \n",
+ " [Light-filled, Freshly painted, Walker's parad... | \n",
+ " Grace Sun | \n",
+ " Compass | \n",
+ "
\n",
+ " \n",
+ " | 17 | \n",
+ " 895000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 837 | \n",
+ " 425 1st St Unit 2501 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94105 | \n",
+ " [Unobstructed bay bridge views, Open layout] | \n",
+ " Matt Fuller | \n",
+ " Jackson Fuller Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 18 | \n",
+ " 1499000 | \n",
+ " 3 | \n",
+ " 1 | \n",
+ " 1500 | \n",
+ " Unlisted Address | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " NA | \n",
+ " [Contractor's Special, Fixer-upper] | \n",
+ " Jaymee Faith Sagisi | \n",
+ " IMPACT | \n",
+ "
\n",
+ " \n",
+ " | 19 | \n",
+ " 900000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 930 | \n",
+ " 1101 Green St Unit 302 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94109 | \n",
+ " [Historic Art Deco, Abundant natural light] | \n",
+ " NA | \n",
+ " NA | \n",
+ "
\n",
+ " \n",
+ " | 20 | \n",
+ " 858000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 1104 | \n",
+ " 260 King St Unit 557 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94107 | \n",
+ " [] | \n",
+ " Miyuki Takami | \n",
+ " eXp Realty of California, Inc | \n",
+ "
\n",
+ " \n",
+ " | 21 | \n",
+ " 945000 | \n",
+ " 2 | \n",
+ " 1 | \n",
+ " 767 | \n",
+ " 307 Page St Unit 1 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94102 | \n",
+ " [] | \n",
+ " NA | \n",
+ " NA | \n",
+ "
\n",
+ " \n",
+ " | 22 | \n",
+ " 1099000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1330 | \n",
+ " 1080 Sutter St Unit 202 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94109 | \n",
+ " [] | \n",
+ " Annette Liberty | \n",
+ " Coldwell Banker Realty | \n",
+ "
\n",
+ " \n",
+ " | 23 | \n",
+ " 950000 | \n",
+ " 4 | \n",
+ " 3 | \n",
+ " 2090 | \n",
+ " 3328 26th St Unit 3330 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94110 | \n",
+ " [] | \n",
+ " Isaac Munene | \n",
+ " Coldwell Banker Realty | \n",
+ "
\n",
+ " \n",
+ " | 24 | \n",
+ " 1088000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1065 | \n",
+ " 1776 Sacramento St Unit 503 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94109 | \n",
+ " [] | \n",
+ " Marilyn Becklehimer | \n",
+ " Dio Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 25 | \n",
+ " 1788888 | \n",
+ " 4 | \n",
+ " 3 | \n",
+ " 1856 | \n",
+ " 2317 15th St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94114 | \n",
+ " [] | \n",
+ " Joel Gile | \n",
+ " Sequoia Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 26 | \n",
+ " 1650000 | \n",
+ " 3 | \n",
+ " 2 | \n",
+ " 1547 | \n",
+ " 2475 47th Ave | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94116 | \n",
+ " [] | \n",
+ " Lucy Goldenshteyn | \n",
+ " Redfin | \n",
+ "
\n",
+ " \n",
+ " | 27 | \n",
+ " 998000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1202 | \n",
+ " 50 Lansing St Unit 201 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94105 | \n",
+ " [] | \n",
+ " Tracey Broadman | \n",
+ " Vanguard Properties | \n",
+ "
\n",
+ " \n",
+ " | 28 | \n",
+ " 1595000 | \n",
+ " 3 | \n",
+ " 5 | \n",
+ " 1995 | \n",
+ " 15 Joy St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94110 | \n",
+ " [] | \n",
+ " Mike Stack | \n",
+ " Vanguard Properties | \n",
+ "
\n",
+ " \n",
+ " | 29 | \n",
+ " 1028000 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1065 | \n",
+ " 50 Lansing St Unit 403 | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94105 | \n",
+ " [] | \n",
+ " Robyn Kaufman | \n",
+ " Vivre Real Estate | \n",
+ "
\n",
+ " \n",
+ " | 30 | \n",
+ " 999000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 1021 | \n",
+ " 338 Spear St Unit 6J | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94105 | \n",
+ " [Spacious, Balcony, Bright courtyard views] | \n",
+ " Paul Hwang | \n",
+ " Skybox Realty | \n",
+ "
\n",
+ " \n",
+ " | 31 | \n",
+ " 799800 | \n",
+ " 2 | \n",
+ " 2 | \n",
+ " 1109 | \n",
+ " 10 Innes Ct | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94124 | \n",
+ " [New Construction, 1-car garage] | \n",
+ " Lennar | \n",
+ " Lennar | \n",
+ "
\n",
+ " \n",
+ " | 32 | \n",
+ " 529880 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 740 | \n",
+ " 10 Innes Ct | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94124 | \n",
+ " [New Construction, 1-car garage] | \n",
+ " Lennar | \n",
+ " Lennar | \n",
+ "
\n",
+ " \n",
+ " | 33 | \n",
+ " 489000 | \n",
+ " 1 | \n",
+ " 1 | \n",
+ " 741 | \n",
+ " 10 Innes Ct | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94124 | \n",
+ " [New Construction, 1-car garage] | \n",
+ " Lennar | \n",
+ " Lennar | \n",
+ "
\n",
+ " \n",
+ " | 34 | \n",
+ " 1359000 | \n",
+ " 4 | \n",
+ " 2 | \n",
+ " 1845 | \n",
+ " 170 Thrift St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94112 | \n",
+ " [Updated, Natural light] | \n",
+ " Cristal Wright | \n",
+ " Vanguard Properties | \n",
+ "
\n",
+ " \n",
+ " | 35 | \n",
+ " 1295000 | \n",
+ " 3 | \n",
+ " 1 | \n",
+ " 1214 | \n",
+ " 1922 43rd Ave | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94116 | \n",
+ " [] | \n",
+ " Mila Romprey | \n",
+ " Premier Realty Associates | \n",
+ "
\n",
+ " \n",
+ " | 36 | \n",
+ " 1098000 | \n",
+ " 3 | \n",
+ " 1 | \n",
+ " 1006 | \n",
+ " 150 Putnam St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94110 | \n",
+ " [] | \n",
+ " Genie Mantzoros | \n",
+ " Epic Real Estate & Asso. Inc. | \n",
+ "
\n",
+ " \n",
+ " | 37 | \n",
+ " 1189870 | \n",
+ " 3 | \n",
+ " 2 | \n",
+ " 1436 | \n",
+ " 327 Ordway St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94134 | \n",
+ " [] | \n",
+ " Shawn Zahraie | \n",
+ " Affinity Enterprises, Inc | \n",
+ "
\n",
+ " \n",
+ " | 38 | \n",
+ " 899000 | \n",
+ " 2 | \n",
+ " 1 | \n",
+ " 1118 | \n",
+ " 272 Farallones St | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94112 | \n",
+ " [] | \n",
+ " Janice Lee | \n",
+ " Coldwell Banker Realty | \n",
+ "
\n",
+ " \n",
+ " | 39 | \n",
+ " 30000 | \n",
+ " 0 | \n",
+ " 0 | \n",
+ " 0 | \n",
+ " 0 Evans Ave | \n",
+ " San Francisco | \n",
+ " CA | \n",
+ " 94124 | \n",
+ " [Land, 0.12 Acre, $251,467 per Acre] | \n",
+ " Heidy Carrera | \n",
+ " Berkshire Hathaway HomeService | \n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " price bedrooms bathrooms square_feet \\\n",
+ "0 549000 1 1 0 \n",
+ "1 1799000 4 2 2735 \n",
+ "2 1995000 7 3 3330 \n",
+ "3 549000 0 1 477 \n",
+ "4 5495000 10 7 6505 \n",
+ "5 925000 2 1 779 \n",
+ "6 898000 2 2 1175 \n",
+ "7 1700000 4 2 1950 \n",
+ "8 1899000 3 2 1560 \n",
+ "9 850000 2 2 1055 \n",
+ "10 1990000 3 1 1280 \n",
+ "11 849000 1 1 855 \n",
+ "12 1080000 2 2 936 \n",
+ "13 1499000 4 2 2145 \n",
+ "14 1140000 2 2 998 \n",
+ "15 1988000 2 1 3800 \n",
+ "16 1218000 2 2 1275 \n",
+ "17 895000 1 1 837 \n",
+ "18 1499000 3 1 1500 \n",
+ "19 900000 1 1 930 \n",
+ "20 858000 1 1 1104 \n",
+ "21 945000 2 1 767 \n",
+ "22 1099000 2 2 1330 \n",
+ "23 950000 4 3 2090 \n",
+ "24 1088000 2 2 1065 \n",
+ "25 1788888 4 3 1856 \n",
+ "26 1650000 3 2 1547 \n",
+ "27 998000 2 2 1202 \n",
+ "28 1595000 3 5 1995 \n",
+ "29 1028000 2 2 1065 \n",
+ "30 999000 1 1 1021 \n",
+ "31 799800 2 2 1109 \n",
+ "32 529880 1 1 740 \n",
+ "33 489000 1 1 741 \n",
+ "34 1359000 4 2 1845 \n",
+ "35 1295000 3 1 1214 \n",
+ "36 1098000 3 1 1006 \n",
+ "37 1189870 3 2 1436 \n",
+ "38 899000 2 1 1118 \n",
+ "39 30000 0 0 0 \n",
+ "\n",
+ " address city state zip_code \\\n",
+ "0 380 14th St Unit 405 San Francisco CA 94103 \n",
+ "1 123 Grattan St San Francisco CA 94117 \n",
+ "2 1590 Washington St San Francisco CA 94109 \n",
+ "3 240 Lombard St Unit 835 San Francisco CA 94111 \n",
+ "4 1057 Steiner St San Francisco CA 94115 \n",
+ "5 2 Fallon Place Unit 57 San Francisco CA 94133 \n",
+ "6 5160 Diamond Heights Blvd Unit 208C San Francisco CA 94131 \n",
+ "7 1351 26th Ave San Francisco CA 94122 \n",
+ "8 340 Yerba Buena Ave San Francisco CA 94127 \n",
+ "9 588 Minna Unit 604 San Francisco CA 94103 \n",
+ "10 1450 Diamond St San Francisco CA 94131 \n",
+ "11 81 Lansing St Unit 401 San Francisco CA 94105 \n",
+ "12 451 Kansas St Unit 466 San Francisco CA 94107 \n",
+ "13 486 Yale St San Francisco CA 94134 \n",
+ "14 588 Minna Unit 801 San Francisco CA 94103 \n",
+ "15 183 19th Ave San Francisco CA 94121 \n",
+ "16 1998 Pacific Ave Unit 202 San Francisco CA 94109 \n",
+ "17 425 1st St Unit 2501 San Francisco CA 94105 \n",
+ "18 Unlisted Address San Francisco CA NA \n",
+ "19 1101 Green St Unit 302 San Francisco CA 94109 \n",
+ "20 260 King St Unit 557 San Francisco CA 94107 \n",
+ "21 307 Page St Unit 1 San Francisco CA 94102 \n",
+ "22 1080 Sutter St Unit 202 San Francisco CA 94109 \n",
+ "23 3328 26th St Unit 3330 San Francisco CA 94110 \n",
+ "24 1776 Sacramento St Unit 503 San Francisco CA 94109 \n",
+ "25 2317 15th St San Francisco CA 94114 \n",
+ "26 2475 47th Ave San Francisco CA 94116 \n",
+ "27 50 Lansing St Unit 201 San Francisco CA 94105 \n",
+ "28 15 Joy St San Francisco CA 94110 \n",
+ "29 50 Lansing St Unit 403 San Francisco CA 94105 \n",
+ "30 338 Spear St Unit 6J San Francisco CA 94105 \n",
+ "31 10 Innes Ct San Francisco CA 94124 \n",
+ "32 10 Innes Ct San Francisco CA 94124 \n",
+ "33 10 Innes Ct San Francisco CA 94124 \n",
+ "34 170 Thrift St San Francisco CA 94112 \n",
+ "35 1922 43rd Ave San Francisco CA 94116 \n",
+ "36 150 Putnam St San Francisco CA 94110 \n",
+ "37 327 Ordway St San Francisco CA 94134 \n",
+ "38 272 Farallones St San Francisco CA 94112 \n",
+ "39 0 Evans Ave San Francisco CA 94124 \n",
+ "\n",
+ " tags agent_name \\\n",
+ "0 [New construction] Eddie O'Sullivan \n",
+ "1 [Edwardian-style, investment, owner-occupants] Sean Engmann \n",
+ "2 [Nob Hill, 3-unit building, investment] Eddie O'Sullivan \n",
+ "3 [Bay view, studio, modern appliances] Tim Gullicksen \n",
+ "4 [Victorian, Bed & Breakfast, Gilded Age] Bonnie Spindler \n",
+ "5 [Russian Hill, views, exclusive-use deck] Eddie O'Sullivan \n",
+ "6 [] Joe Polyak \n",
+ "7 [] Glenda Queensbury \n",
+ "8 [] Jeannie Anderson \n",
+ "9 [] Mohamed Lakdawala \n",
+ "10 [] Mary Anne Villamil \n",
+ "11 [] Kristen Haenggi \n",
+ "12 [] Maureen DeBoer \n",
+ "13 [] Alicia Atienza \n",
+ "14 [] Milan Jezdimirovic \n",
+ "15 [Amazing Property, Marina Style, Needs TLC] Leo Cheung \n",
+ "16 [Light-filled, Freshly painted, Walker's parad... Grace Sun \n",
+ "17 [Unobstructed bay bridge views, Open layout] Matt Fuller \n",
+ "18 [Contractor's Special, Fixer-upper] Jaymee Faith Sagisi \n",
+ "19 [Historic Art Deco, Abundant natural light] NA \n",
+ "20 [] Miyuki Takami \n",
+ "21 [] NA \n",
+ "22 [] Annette Liberty \n",
+ "23 [] Isaac Munene \n",
+ "24 [] Marilyn Becklehimer \n",
+ "25 [] Joel Gile \n",
+ "26 [] Lucy Goldenshteyn \n",
+ "27 [] Tracey Broadman \n",
+ "28 [] Mike Stack \n",
+ "29 [] Robyn Kaufman \n",
+ "30 [Spacious, Balcony, Bright courtyard views] Paul Hwang \n",
+ "31 [New Construction, 1-car garage] Lennar \n",
+ "32 [New Construction, 1-car garage] Lennar \n",
+ "33 [New Construction, 1-car garage] Lennar \n",
+ "34 [Updated, Natural light] Cristal Wright \n",
+ "35 [] Mila Romprey \n",
+ "36 [] Genie Mantzoros \n",
+ "37 [] Shawn Zahraie \n",
+ "38 [] Janice Lee \n",
+ "39 [Land, 0.12 Acre, $251,467 per Acre] Heidy Carrera \n",
+ "\n",
+ " agency \n",
+ "0 Elevation Real Estate \n",
+ "1 eXp Realty of Northern CA Inc. \n",
+ "2 Elevation Real Estate \n",
+ "3 Corcoran Icon Properties \n",
+ "4 Corcoran Icon Properties \n",
+ "5 Elevation Real Estate \n",
+ "6 Rise Homes \n",
+ "7 Referral Realty-BV \n",
+ "8 Coldwell Banker Realty \n",
+ "9 Remax Prestigious Properties \n",
+ "10 Kinetic Real Estate \n",
+ "11 Compass \n",
+ "12 LKJ Realty \n",
+ "13 Statewide Realty \n",
+ "14 Compass \n",
+ "15 eXp Realty of California, Inc \n",
+ "16 Compass \n",
+ "17 Jackson Fuller Real Estate \n",
+ "18 IMPACT \n",
+ "19 NA \n",
+ "20 eXp Realty of California, Inc \n",
+ "21 NA \n",
+ "22 Coldwell Banker Realty \n",
+ "23 Coldwell Banker Realty \n",
+ "24 Dio Real Estate \n",
+ "25 Sequoia Real Estate \n",
+ "26 Redfin \n",
+ "27 Vanguard Properties \n",
+ "28 Vanguard Properties \n",
+ "29 Vivre Real Estate \n",
+ "30 Skybox Realty \n",
+ "31 Lennar \n",
+ "32 Lennar \n",
+ "33 Lennar \n",
+ "34 Vanguard Properties \n",
+ "35 Premier Realty Associates \n",
+ "36 Epic Real Estate & Asso. Inc. \n",
+ "37 Affinity Enterprises, Inc \n",
+ "38 Coldwell Banker Realty \n",
+ "39 Berkshire Hathaway HomeService "
+ ]
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 8
+ }
+ ],
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 488,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "16c95c43-2312-4c08-9d29-b3b95de080c9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Save the DataFrame to a CSV file",
+ "outputs": [],
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"houses_forsale.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Data saved to houses_forsale.csv\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "5b64dab7-65d7-42cd-cd08-1631e64b28eb"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "## 🔗 Resources",
+ "outputs": [],
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "\n
\n
\n
\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦜 **LangChain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.10.14",
+ "mimetype": "text/x-python",
+ "file_extension": ".py",
+ "pygments_lexer": "ipython3",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "nbconvert_exporter": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/homes-forsale/scrapegraph_llama_index.ipynb b/cookbook/homes-forsale/scrapegraph_llama_index.ipynb
index 6880ff2..9119ec0 100644
--- a/cookbook/homes-forsale/scrapegraph_llama_index.ipynb
+++ b/cookbook/homes-forsale/scrapegraph_llama_index.ipynb
@@ -1,329 +1,270 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {
- "id": "ReBHQ5_834pZ"
- },
- "source": [
- "
\n",
- "
\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
+ "source": "## 🕷️ Extract House Listings from Zillow with llama-index and ScrapeGraphAI APIs\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-llama-index-1) [](https://colab.research.google.com/drive/1MlWu2HBwmvvnXWyezwl93hyApHOufAQi?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract House Listings from Zillow with llama-index and ScrapeGraphAI APIs\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-llama-index-1) [](https://colab.research.google.com/drive/1MlWu2HBwmvvnXWyezwl93hyApHOufAQi?usp=sharing)",
+ "text/markdown": "## 🕷️ Extract House Listings from Zillow with llama-index and ScrapeGraphAI APIs\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-llama-index-1) [](https://colab.research.google.com/drive/1MlWu2HBwmvvnXWyezwl93hyApHOufAQi?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"metadata": {
"id": "jEkuKbcRrPcK"
},
- "source": [
- "## 🕷️ Extract Houses Listing on Zillow with llama-index and ScrapegraphAI APIs"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:09:56.460Z",
+ "executionStartTime": "2026-03-26T00:09:56.460Z"
},
{
- "cell_type": "markdown",
+ "source": "",
+ "outputs": [],
"metadata": {
"id": "5cVkde_LpVkF"
},
- "source": [
- ""
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
"metadata": {
"id": "IzsyDXEWwPVt"
},
- "source": [
- "### 🔧 Install `dependencies`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "%%capture\n!pip install llama-index\n!pip install llama-index-tools-scrapegraphai",
+ "outputs": [],
"metadata": {
"id": "os_vm0MkIxr9"
},
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install llama-index\n",
- "!pip install llama-index-tools-scrapegraphai"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
"metadata": {
"id": "apBsL-L2KzM7"
},
- "source": [
- "### 🔑 Import `ScrapeGraph` API key"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "You can find your ScrapeGraphAI API key on the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/).",
+ "outputs": [],
"metadata": {
"id": "ol9gQbAFkh9b"
},
- "source": [
- "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "sffqFG2EJ8bI",
- "outputId": "c588274d-64ce-4d13-f12d-0458d9c4839d"
- },
+ "source": "import os\nfrom getpass import getpass\n\n# Check if the API key is already set in the environment\nsgai_api_key = os.getenv(\"SGAI_API_KEY\")\n\nif sgai_api_key:\n print(\"SGAI_API_KEY found in environment.\")\nelse:\n print(\"SGAI_API_KEY not found in environment.\")\n # Prompt the user to input the API key securely (hidden input)\n sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n if sgai_api_key:\n # Set the API key in the environment\n os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n print(\"SGAI_API_KEY has been set in the environment.\")\n else:\n print(\"No API key entered. Please set the API key to continue.\")\n",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"SGAI_API_KEY not found in environment.\n",
"SGAI_API_KEY has been set in the environment.\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import os\n",
- "from getpass import getpass\n",
- "\n",
- "# Check if the API key is already set in the environment\n",
- "sgai_api_key = os.getenv(\"SGAI_API_KEY\")\n",
- "\n",
- "if sgai_api_key:\n",
- " print(\"SGAI_API_KEY found in environment.\")\n",
- "else:\n",
- " print(\"SGAI_API_KEY not found in environment.\")\n",
- " # Prompt the user to input the API key securely (hidden input)\n",
- " sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n",
- " if sgai_api_key:\n",
- " # Set the API key in the environment\n",
- " os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n",
- " print(\"SGAI_API_KEY has been set in the environment.\")\n",
- " else:\n",
- " print(\"No API key entered. Please set the API key to continue.\")\n"
- ]
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c588274d-64ce-4d13-f12d-0458d9c4839d"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
"metadata": {
"id": "jnqMB2-xVYQ7"
},
- "source": [
- "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n\nPydantic Schema Quick Guide\n\nTypes of Schemas \n\n1. Simple Schema \nUse this when you want to extract straightforward information, such as a single piece of content. \n\n```python\nfrom pydantic import BaseModel, Field\n\n# Simple schema for a single webpage\nclass PageInfoSchema(BaseModel):\n title: str = Field(description=\"The title of the webpage\")\n description: str = Field(description=\"The description of the webpage\")\n\n# Example Output JSON after AI extraction\n{\n \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n}\n```\n\n2. Complex Schema (Nested) \nIf you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n\n```python\nfrom pydantic import BaseModel, Field\nfrom typing import List\n\n# Define a schema for a single repository\nclass RepositorySchema(BaseModel):\n name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n description: str = Field(description=\"Description of the repository\")\n stars: int = Field(description=\"Star count of the repository\")\n forks: int = Field(description=\"Fork count of the repository\")\n today_stars: int = Field(description=\"Stars gained today\")\n language: str = Field(description=\"Programming language used\")\n\n# Define a schema for a list of repositories\nclass ListRepositoriesSchema(BaseModel):\n repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n\n# Example Output JSON after AI extraction\n{\n \"repositories\": [\n {\n \"name\": \"google-gemini/cookbook\",\n \"description\": \"Examples and guides for using the Gemini API\",\n \"stars\": 8036,\n \"forks\": 1001,\n \"today_stars\": 649,\n \"language\": \"Jupyter Notebook\"\n },\n {\n \"name\": \"TEN-framework/TEN-Agent\",\n \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n \"stars\": 3224,\n \"forks\": 311,\n \"today_stars\": 361,\n \"language\": \"Python\"\n }\n ]\n}\n```\n\nKey Takeaways \n- **Simple Schema**: Best for small, straightforward extractions. \n- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n\nBoth approaches give the AI a clear structure to follow, so the extracted content comes back in the shape you need.\n \n",
+ "outputs": [],
"metadata": {
"id": "VZvxbjfXvbgd"
},
- "source": [
- "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
- "\n",
- "
\n",
- " Pydantic Schema Quick Guide
\n",
- "\n",
- "Types of Schemas \n",
- "\n",
- "1. Simple Schema \n",
- "Use this when you want to extract straightforward information, such as a single piece of content. \n",
- "\n",
- "```python\n",
- "from pydantic import BaseModel, Field\n",
- "\n",
- "# Simple schema for a single webpage\n",
- "class PageInfoSchema(BaseModel):\n",
- " title: str = Field(description=\"The title of the webpage\")\n",
- " description: str = Field(description=\"The description of the webpage\")\n",
- "\n",
- "# Example Output JSON after AI extraction\n",
- "{\n",
- " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
- " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
- "}\n",
- "```\n",
- "\n",
- "2. Complex Schema (Nested) \n",
- "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
- "\n",
- "```python\n",
- "from pydantic import BaseModel, Field\n",
- "from typing import List\n",
- "\n",
- "# Define a schema for a single repository\n",
- "class RepositorySchema(BaseModel):\n",
- " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
- " description: str = Field(description=\"Description of the repository\")\n",
- " stars: int = Field(description=\"Star count of the repository\")\n",
- " forks: int = Field(description=\"Fork count of the repository\")\n",
- " today_stars: int = Field(description=\"Stars gained today\")\n",
- " language: str = Field(description=\"Programming language used\")\n",
- "\n",
- "# Define a schema for a list of repositories\n",
- "class ListRepositoriesSchema(BaseModel):\n",
- " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
- "\n",
- "# Example Output JSON after AI extraction\n",
- "{\n",
- " \"repositories\": [\n",
- " {\n",
- " \"name\": \"google-gemini/cookbook\",\n",
- " \"description\": \"Examples and guides for using the Gemini API\",\n",
- " \"stars\": 8036,\n",
- " \"forks\": 1001,\n",
- " \"today_stars\": 649,\n",
- " \"language\": \"Jupyter Notebook\"\n",
- " },\n",
- " {\n",
- " \"name\": \"TEN-framework/TEN-Agent\",\n",
- " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
- " \"stars\": 3224,\n",
- " \"forks\": 311,\n",
- " \"today_stars\": 361,\n",
- " \"language\": \"Python\"\n",
- " }\n",
- " ]\n",
- "}\n",
- "```\n",
- "\n",
- "Key Takeaways \n",
- "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
- "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
- "\n",
- "Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n",
- " \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\nclass HouseListingSchema(BaseModel):\n price: int = Field(description=\"Price of the house in USD\")\n bedrooms: int = Field(description=\"Number of bedrooms\")\n bathrooms: int = Field(description=\"Number of bathrooms\")\n square_feet: int = Field(description=\"Total square footage of the house\")\n address: str = Field(description=\"Address of the house\")\n city: str = Field(description=\"City where the house is located\")\n state: str = Field(description=\"State where the house is located\")\n zip_code: str = Field(description=\"ZIP code of the house location\")\n tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n agent_name: str = Field(description=\"Name of the listing agent\")\n agency: str = Field(description=\"Agency listing the house\")\n\n# Schema containing a list of house listings\nclass HousesListingsSchema(BaseModel):\n houses: List[HouseListingSchema] = Field(description=\"List of house listings on Zillow or similar platforms\")\n",
+ "outputs": [],
"metadata": {
"id": "dlrOEgZk_8V4"
},
- "outputs": [],
- "source": [
- "from pydantic import BaseModel, Field\n",
- "from typing import List\n",
- "\n",
- "class HouseListingSchema(BaseModel):\n",
- " price: int = Field(description=\"Price of the house in USD\")\n",
- " bedrooms: int = Field(description=\"Number of bedrooms\")\n",
- " bathrooms: int = Field(description=\"Number of bathrooms\")\n",
- " square_feet: int = Field(description=\"Total square footage of the house\")\n",
- " address: str = Field(description=\"Address of the house\")\n",
- " city: str = Field(description=\"City where the house is located\")\n",
- " state: str = Field(description=\"State where the house is located\")\n",
- " zip_code: str = Field(description=\"ZIP code of the house location\")\n",
- " tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n",
- " agent_name: str = Field(description=\"Name of the listing agent\")\n",
- " agency: str = Field(description=\"Agency listing the house\")\n",
- "\n",
- "# Schema containing a list of house listings\n",
- "class HousesListingsSchema(BaseModel):\n",
- " houses: List[HouseListingSchema] = Field(description=\"List of house listings on Zillow or similar platforms\")\n"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction",
+ "outputs": [],
"metadata": {
"id": "cDGH0b2DkY63"
},
- "source": [
- "### 🚀 Initialize `ScrapegraphToolSpec` tools and start extraction"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Here we use the `scrapegraph_smartscraper` method of `ScrapegraphToolSpec` to extract structured data from a webpage using AI.\n\n\n> If you already have an HTML file, you can upload it and use ScrapeGraphAI's `Localscraper` service instead.\n\nYou can find more info in the [official LlamaIndex documentation](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n\n",
+ "outputs": [],
"metadata": {
"id": "M1KSXffZopUD"
},
- "source": [
- "Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n",
- "\n",
- "\n",
- "> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n",
- "\n",
- "You can find more info in the [official langchain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n",
- "\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n\nscrapegraph_tool = ScrapegraphToolSpec()",
+ "outputs": [],
"metadata": {
"id": "p2BFhL53ore1"
},
- "outputs": [],
- "source": [
- "from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n",
- "\n",
- "scrapegraph_tool = ScrapegraphToolSpec()"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Invoke the tool's `scrapegraph_smartscraper` method",
+ "outputs": [],
"metadata": {
"id": "iCOcvpuOoubk"
},
- "source": [
- "`Invoke` the tool"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "response = scrapegraph_tool.scrapegraph_smartscraper(\n prompt=\"Extract information about houses for sale\",\n url=\"https://www.zillow.com/san-francisco-ca/\",\n api_key=os.getenv(\"SGAI_API_KEY\"),\n schema=HousesListingsSchema,\n)",
+ "outputs": [],
"metadata": {
"id": "2FIKomclLNFx"
},
- "outputs": [],
- "source": [
- "response = scrapegraph_tool.scrapegraph_smartscraper(\n",
- " prompt=\"Extract information about houses for sale\",\n",
- " url=\"https://www.zillow.com/san-francisco-ca/\",\n",
- " api_key=os.getenv(\"SGAI_API_KEY\"),\n",
- " schema=HousesListingsSchema,\n",
- ")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "> Note that the output schema is supplied explicitly through the `schema` argument rather than left for the model to infer, so the tool returns structured output that follows your previously defined schema. This is especially helpful for `AI agents`, which would otherwise have to generate a schema themselves, with a high risk of failure. To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n",
+ "outputs": [],
"metadata": {
"id": "gR2UZZwzo9Sn"
},
- "source": [
- "> As you may have noticed, we are not passing the `llm_output_schema` while invoking the tool, this will make life easier to `AI agents` since they will not need to generate one themselves with high risk of failure. Instead, we force the tool to return always a structured output that follows your previously defined schema. To find out more, check the following [README](https://github.com/ScrapeGraphAI/langchain-scrapegraph)\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Print the response",
+ "outputs": [],
"metadata": {
"id": "YZz1bqCIpoL8"
},
- "source": [
- "Print the response"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "F1VfD8B4LPc8",
- "outputId": "00597cd7-bac8-4af1-a5fe-d88d2f0ffa8d"
- },
+ "source": "import json\n\nprint(\"House Listings:\")\nprint(json.dumps(response, indent=2))",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Trending Repositories:\n",
"{\n",
@@ -464,45 +405,55 @@
" },\n",
" \"error\": \"\"\n",
"}\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import json\n",
- "\n",
- "print(\"Trending Repositories:\")\n",
- "print(json.dumps(response, indent=2))"
- ]
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "00597cd7-bac8-4af1-a5fe-d88d2f0ffa8d"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
"metadata": {
"id": "2as65QLypwdb"
},
- "source": [
- "### 💾 Save the output to a `CSV` file"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Let's create a pandas DataFrame and display the extracted content as a table",
+ "outputs": [],
"metadata": {
"id": "HTLVFgbVLLBR"
},
- "source": [
- "Let's create a pandas dataframe and show the table with the extracted content"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 488
- },
- "id": "1lS9O1KOI51y",
- "outputId": "16c95c43-2312-4c08-9d29-b3b95de080c9"
- },
+ "source": "import pandas as pd\n\n# Convert the list of extracted house listings into a DataFrame\ndf = pd.DataFrame(response[\"result\"][\"houses\"])\ndf",
"outputs": [
{
"data": {
@@ -704,73 +655,82 @@
"8 Lynn Anne Bell CHRISTIE'S INT'L R.E. SF "
]
},
- "execution_count": 10,
"metadata": {},
- "output_type": "execute_result"
+ "output_type": "execute_result",
+ "execution_count": 10
}
],
- "source": [
- "import pandas as pd\n",
- "\n",
- "# Convert dictionary to DataFrame\n",
- "df = pd.DataFrame(response[\"result\"][\"houses\"])\n",
- "df"
- ]
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 488,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "16c95c43-2312-4c08-9d29-b3b95de080c9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Save it to CSV",
+ "outputs": [],
"metadata": {
"id": "v0CBYVk7qA5Z"
},
- "source": [
- "Save it to CSV"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"zillow_forsale.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
+ "outputs": [],
"metadata": {
"id": "BtEbB9pmQGhO"
},
- "outputs": [],
- "source": [
- "# Save the DataFrame to a CSV file\n",
- "csv_file = \"zillow_forsale.csv\"\n",
- "df.to_csv(csv_file, index=False)\n",
- "print(f\"Data saved to {csv_file}\")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## 🔗 Resources",
+ "outputs": [],
"metadata": {
"id": "-1SZT8VzTZNd"
},
- "source": [
- "## 🔗 Resources"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
"metadata": {
"id": "dUi2LtMLRDDR"
},
- "source": [
- "\n",
- "
\n",
- "
\n",
- "
\n",
- "\n",
- "\n",
- "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
- "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
- "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
- "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
- "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
- "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
- "\n",
- "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
}
],
"metadata": {
@@ -778,22 +738,22 @@
"provenance": []
},
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "name": "python3",
+ "display_name": "Python 3"
},
"language_info": {
+ "name": "python",
+ "version": "3.10.14",
+ "mimetype": "text/x-python",
+ "file_extension": ".py",
+ "pygments_lexer": "ipython3",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.14"
+ "nbconvert_exporter": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
-}
+}
\ No newline at end of file
diff --git a/cookbook/homes-forsale/scrapegraph_sdk.ipynb b/cookbook/homes-forsale/scrapegraph_sdk.ipynb
index 0ea410c..1ae276b 100644
--- a/cookbook/homes-forsale/scrapegraph_sdk.ipynb
+++ b/cookbook/homes-forsale/scrapegraph_sdk.ipynb
@@ -1 +1,1747 @@
-{"cells":[{"cell_type":"markdown","metadata":{"id":"ReBHQ5_834pZ"},"source":["
\n","
\n",""]},{"cell_type":"markdown","metadata":{"id":"jEkuKbcRrPcK"},"source":["## 🕷️ Extract Houses Listing with Official Scrapegraph SDK"]},{"cell_type":"markdown","metadata":{"id":"8vZBkAWLq9C1"},"source":[""]},{"cell_type":"markdown","metadata":{"id":"IzsyDXEWwPVt"},"source":["### 🔧 Install `dependencies`"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"os_vm0MkIxr9"},"outputs":[],"source":["%%capture\n","!pip install scrapegraph-py"]},{"cell_type":"markdown","metadata":{"id":"apBsL-L2KzM7"},"source":["### 🔑 Import `ScrapeGraph` API key"]},{"cell_type":"markdown","metadata":{"id":"ol9gQbAFkh9b"},"source":["You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"sffqFG2EJ8bI","outputId":"18dfce64-db37-4825-d316-fabd064100d0"},"outputs":[{"name":"stdout","output_type":"stream","text":["SGAI_API_KEY found in environment.\n"]}],"source":["import getpass\n","import os\n","\n","if not os.environ.get(\"SGAI_API_KEY\"):\n"," os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")"]},{"cell_type":"markdown","metadata":{"id":"jnqMB2-xVYQ7"},"source":["### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"]},{"cell_type":"markdown","metadata":{"id":"VZvxbjfXvbgd"},"source":["If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n","\n","
\n"," Pydantic Schema Quick Guide
\n","\n","Types of Schemas \n","\n","1. Simple Schema \n","Use this when you want to extract straightforward information, such as a single piece of content. \n","\n","```python\n","from pydantic import BaseModel, Field\n","\n","# Simple schema for a single webpage\n","class PageInfoSchema(BaseModel):\n"," title: str = Field(description=\"The title of the webpage\")\n"," description: str = Field(description=\"The description of the webpage\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n"," \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n","}\n","```\n","\n","2. Complex Schema (Nested) \n","If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n","\n","```python\n","from pydantic import BaseModel, Field\n","from typing import List\n","\n","# Define a schema for a single repository\n","class RepositorySchema(BaseModel):\n"," name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n"," description: str = Field(description=\"Description of the repository\")\n"," stars: int = Field(description=\"Star count of the repository\")\n"," forks: int = Field(description=\"Fork count of the repository\")\n"," today_stars: int = Field(description=\"Stars gained today\")\n"," language: str = Field(description=\"Programming language used\")\n","\n","# Define a schema for a list of repositories\n","class ListRepositoriesSchema(BaseModel):\n"," repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n","\n","# Example Output JSON after AI extraction\n","{\n"," \"repositories\": [\n"," {\n"," \"name\": \"google-gemini/cookbook\",\n"," \"description\": \"Examples and guides for using the Gemini API\",\n"," \"stars\": 8036,\n"," \"forks\": 1001,\n"," \"today_stars\": 649,\n"," \"language\": \"Jupyter 
Notebook\"\n"," },\n"," {\n"," \"name\": \"TEN-framework/TEN-Agent\",\n"," \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n"," \"stars\": 3224,\n"," \"forks\": 311,\n"," \"today_stars\": 361,\n"," \"language\": \"Python\"\n"," }\n"," ]\n","}\n","```\n","\n","Key Takeaways \n","- **Simple Schema**: Perfect for small, straightforward extractions. \n","- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n","\n","Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n"," \n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"dlrOEgZk_8V4"},"outputs":[],"source":["from pydantic import BaseModel, Field\n","from typing import List, Optional\n","\n","# Schema for a single house listing\n","class HouseSchema(BaseModel):\n"," price: int = Field(description=\"Price of the house in USD\")\n"," bedrooms: int = Field(description=\"Number of bedrooms\")\n"," bathrooms: int = Field(description=\"Number of bathrooms\")\n"," square_feet: int = Field(description=\"Total square footage of the house\")\n"," address: str = Field(description=\"Address of the house\")\n"," city: str = Field(description=\"City where the house is located\")\n"," state: str = Field(description=\"State where the house is located\")\n"," zip_code: str = Field(description=\"ZIP code of the house location\")\n"," tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n"," agent_name: str = Field(description=\"Name of the listing agent. If not present or not sure write NA.\")\n"," agency: str = Field(description=\"Agency listing the house. 
If not present or not sure write NA.\")\n","\n","# Schema containing a list of house listings\n","class HouseListingsSchema(BaseModel):\n"," houses: List[HouseSchema] = Field(description=\"List of house listings on Homes or similar platforms\")\n"]},{"cell_type":"markdown","metadata":{"id":"cDGH0b2DkY63"},"source":["### 🚀 Initialize `SGAI Client` and start extraction"]},{"cell_type":"markdown","metadata":{"id":"4SLJgXgcob6L"},"source":["Initialize the client for scraping (there's also an async version [here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py))"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"PQI25GZvoCSk"},"outputs":[],"source":["from scrapegraph_py import Client\n","\n","# Initialize the client with explicit API key\n","sgai_client = Client(api_key=sgai_api_key, timeout=240)"]},{"cell_type":"markdown","metadata":{"id":"M1KSXffZopUD"},"source":["Here we use `Smartscraper` service to extract structured data using AI from a webpage.\n","\n","\n","> If you already have an HTML file, you can upload it and use `Localscraper` instead.\n","\n","\n","\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"2FIKomclLNFx"},"outputs":[],"source":["# Request for Trending Repositories\n","repo_response = sgai_client.smartscraper(\n"," website_url=\"https://www.homes.com/san-francisco-ca/?bb=nzpwspy0mS749snkvsb\",\n"," user_prompt=\"Extract info about the houses visible on the page\",\n"," output_schema=HouseListingsSchema,\n",")"]},{"cell_type":"markdown","metadata":{"id":"YZz1bqCIpoL8"},"source":["Print the response"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"F1VfD8B4LPc8","outputId":"1e849a65-6713-486c-e306-bb7c26db4bf9"},"outputs":[{"name":"stdout","output_type":"stream","text":["Request ID: 4e023916-2a41-40ea-bea5-efc422daf33e\n","{\n"," \"houses\": [\n"," {\n"," \"price\": 549000,\n"," \"bedrooms\": 
1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 477,\n"," \"address\": \"380 14th St Unit 405\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [\n"," \"New construction\"\n"," ],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 1799000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 2735,\n"," \"address\": \"123 Grattan St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94117\",\n"," \"tags\": [],\n"," \"agent_name\": \"Sean Engmann\",\n"," \"agency\": \"eXp Realty of Northern CA Inc.\"\n"," },\n"," {\n"," \"price\": 1995000,\n"," \"bedrooms\": 7,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 3330,\n"," \"address\": \"1590 Washington St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 549000,\n"," \"bedrooms\": 0,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 477,\n"," \"address\": \"240 Lombard St Unit 835\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94111\",\n"," \"tags\": [],\n"," \"agent_name\": \"Tim Gullicksen\",\n"," \"agency\": \"Corcoran Icon Properties\"\n"," },\n"," {\n"," \"price\": 5495000,\n"," \"bedrooms\": 10,\n"," \"bathrooms\": 7,\n"," \"square_feet\": 6505,\n"," \"address\": \"1057 Steiner St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94115\",\n"," \"tags\": [],\n"," \"agent_name\": \"Bonnie Spindler\",\n"," \"agency\": \"Corcoran Icon Properties\"\n"," },\n"," {\n"," \"price\": 925000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 779,\n"," \"address\": \"2 Fallon Place Unit 57\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94133\",\n"," \"tags\": [],\n"," 
\"agent_name\": \"Eddie O'Sullivan\",\n"," \"agency\": \"Elevation Real Estate\"\n"," },\n"," {\n"," \"price\": 898000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1175,\n"," \"address\": \"5160 Diamond Heights Blvd Unit 208C\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94131\",\n"," \"tags\": [],\n"," \"agent_name\": \"Joe Polyak\",\n"," \"agency\": \"Rise Homes\"\n"," },\n"," {\n"," \"price\": 1700000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1950,\n"," \"address\": \"1351 26th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94122\",\n"," \"tags\": [],\n"," \"agent_name\": \"Glenda Queensbury\",\n"," \"agency\": \"Referral Realty-BV\"\n"," },\n"," {\n"," \"price\": 1899000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1560,\n"," \"address\": \"340 Yerba Buena Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94127\",\n"," \"tags\": [],\n"," \"agent_name\": \"Jeannie Anderson\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 850000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1055,\n"," \"address\": \"588 Minna Unit 604\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mohamed Lakdawala\",\n"," \"agency\": \"Remax Prestigious Properties\"\n"," },\n"," {\n"," \"price\": 1990000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1280,\n"," \"address\": \"1450 Diamond St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94131\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mary Anne Villamil\",\n"," \"agency\": \"Kinetic Real Estate\"\n"," },\n"," {\n"," \"price\": 849000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 855,\n"," \"address\": \"81 Lansing St Unit 401\",\n"," \"city\": \"San 
Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Kristen Haenggi\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 1080000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 936,\n"," \"address\": \"451 Kansas St Unit 466\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94107\",\n"," \"tags\": [],\n"," \"agent_name\": \"Maureen DeBoer\",\n"," \"agency\": \"LKJ Realty\"\n"," },\n"," {\n"," \"price\": 1499000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 2145,\n"," \"address\": \"486 Yale St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94134\",\n"," \"tags\": [],\n"," \"agent_name\": \"Alicia Atienza\",\n"," \"agency\": \"Statewide Realty\"\n"," },\n"," {\n"," \"price\": 1140000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 998,\n"," \"address\": \"588 Minna Unit 801\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94103\",\n"," \"tags\": [],\n"," \"agent_name\": \"Milan Jezdimirovic\",\n"," \"agency\": \"Compass\"\n"," },\n"," {\n"," \"price\": 1988000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 3800,\n"," \"address\": \"183 19th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94121\",\n"," \"tags\": [\n"," \"Amazing Property\",\n"," \"Marina Style\",\n"," \"Needs TLC\"\n"," ],\n"," \"agent_name\": \"Leo Cheung\",\n"," \"agency\": \"eXp Realty of California, Inc\"\n"," },\n"," {\n"," \"price\": 1218000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1275,\n"," \"address\": \"1998 Pacific Ave Unit 202\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Light-filled\",\n"," \"Freshly painted\",\n"," \"Walker's paradise\"\n"," ],\n"," \"agent_name\": \"Grace Sun\",\n"," \"agency\": 
\"Compass\"\n"," },\n"," {\n"," \"price\": 895000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 837,\n"," \"address\": \"425 1st St Unit 2501\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [\n"," \"Unobstructed bay bridge views\",\n"," \"Open layout\"\n"," ],\n"," \"agent_name\": \"Matt Fuller\",\n"," \"agency\": \"Jackson Fuller Real Estate\"\n"," },\n"," {\n"," \"price\": 1499000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1500,\n"," \"address\": \"Unlisted Address\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"NA\",\n"," \"tags\": [\n"," \"Contractor's Special\",\n"," \"Fixer-upper\"\n"," ],\n"," \"agent_name\": \"Jaymee Faith Sagisi\",\n"," \"agency\": \"IMPACT\"\n"," },\n"," {\n"," \"price\": 900000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 930,\n"," \"address\": \"1101 Green St Unit 302\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [\n"," \"Historic Art Deco\",\n"," \"Iconic views\"\n"," ],\n"," \"agent_name\": \"NA\",\n"," \"agency\": \"NA\"\n"," },\n"," {\n"," \"price\": 858000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1104,\n"," \"address\": \"260 King St Unit 557\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94107\",\n"," \"tags\": [],\n"," \"agent_name\": \"Miyuki Takami\",\n"," \"agency\": \"eXp Realty of California, Inc\"\n"," },\n"," {\n"," \"price\": 945000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 767,\n"," \"address\": \"307 Page St Unit 1\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94102\",\n"," \"tags\": [],\n"," \"agent_name\": \"NA\",\n"," \"agency\": \"NA\"\n"," },\n"," {\n"," \"price\": 1099000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1330,\n"," \"address\": \"1080 
Sutter St Unit 202\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Annette Liberty\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 950000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 2090,\n"," \"address\": \"3328 26th St Unit 3330\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Isaac Munene\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 1088000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1065,\n"," \"address\": \"1776 Sacramento St Unit 503\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94109\",\n"," \"tags\": [],\n"," \"agent_name\": \"Marilyn Becklehimer\",\n"," \"agency\": \"Dio Real Estate\"\n"," },\n"," {\n"," \"price\": 1788888,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 3,\n"," \"square_feet\": 1856,\n"," \"address\": \"2317 15th St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94114\",\n"," \"tags\": [],\n"," \"agent_name\": \"Joel Gile\",\n"," \"agency\": \"Sequoia Real Estate\"\n"," },\n"," {\n"," \"price\": 1650000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1547,\n"," \"address\": \"2475 47th Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94116\",\n"," \"tags\": [],\n"," \"agent_name\": \"Lucy Goldenshteyn\",\n"," \"agency\": \"Redfin\"\n"," },\n"," {\n"," \"price\": 998000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1202,\n"," \"address\": \"50 Lansing St Unit 201\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Tracey Broadman\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1595000,\n"," \"bedrooms\": 3,\n"," 
\"bathrooms\": 5,\n"," \"square_feet\": 1995,\n"," \"address\": \"15 Joy St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mike Stack\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1028000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1065,\n"," \"address\": \"50 Lansing St Unit 403\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [],\n"," \"agent_name\": \"Robyn Kaufman\",\n"," \"agency\": \"Vivre Real Estate\"\n"," },\n"," {\n"," \"price\": 999000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1021,\n"," \"address\": \"338 Spear St Unit 6J\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94105\",\n"," \"tags\": [\n"," \"Spacious\",\n"," \"Balcony\",\n"," \"Bright courtyard views\"\n"," ],\n"," \"agent_name\": \"Paul Hwang\",\n"," \"agency\": \"Skybox Realty\"\n"," },\n"," {\n"," \"price\": 799800,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1109,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 529880,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 740,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\"\n"," ],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 489000,\n"," \"bedrooms\": 1,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 741,\n"," \"address\": \"10 Innes Ct\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"New Construction\"\n"," 
],\n"," \"agent_name\": \"Lennar\",\n"," \"agency\": \"Lennar\"\n"," },\n"," {\n"," \"price\": 1359000,\n"," \"bedrooms\": 4,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1845,\n"," \"address\": \"170 Thrift St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94112\",\n"," \"tags\": [\n"," \"Updated\",\n"," \"Single-family home\"\n"," ],\n"," \"agent_name\": \"Cristal Wright\",\n"," \"agency\": \"Vanguard Properties\"\n"," },\n"," {\n"," \"price\": 1295000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1214,\n"," \"address\": \"1922 43rd Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94116\",\n"," \"tags\": [],\n"," \"agent_name\": \"Mila Romprey\",\n"," \"agency\": \"Premier Realty Associates\"\n"," },\n"," {\n"," \"price\": 1098000,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1006,\n"," \"address\": \"150 Putnam St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94110\",\n"," \"tags\": [],\n"," \"agent_name\": \"Genie Mantzoros\",\n"," \"agency\": \"Epic Real Estate & Asso. 
Inc.\"\n"," },\n"," {\n"," \"price\": 1189870,\n"," \"bedrooms\": 3,\n"," \"bathrooms\": 2,\n"," \"square_feet\": 1436,\n"," \"address\": \"327 Ordway St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94134\",\n"," \"tags\": [],\n"," \"agent_name\": \"Shawn Zahraie\",\n"," \"agency\": \"Affinity Enterprises, Inc\"\n"," },\n"," {\n"," \"price\": 899000,\n"," \"bedrooms\": 2,\n"," \"bathrooms\": 1,\n"," \"square_feet\": 1118,\n"," \"address\": \"272 Farallones St\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94112\",\n"," \"tags\": [],\n"," \"agent_name\": \"Janice Lee\",\n"," \"agency\": \"Coldwell Banker Realty\"\n"," },\n"," {\n"," \"price\": 30000,\n"," \"bedrooms\": 0,\n"," \"bathrooms\": 0,\n"," \"square_feet\": 0,\n"," \"address\": \"0 Evans Ave\",\n"," \"city\": \"San Francisco\",\n"," \"state\": \"CA\",\n"," \"zip_code\": \"94124\",\n"," \"tags\": [\n"," \"Land\",\n"," \"0.12 Acre\",\n"," \"$251,467 per Acre\"\n"," ],\n"," \"agent_name\": \"Heidy Carrera\",\n"," \"agency\": \"Berkshire Hathaway HomeService\"\n"," }\n"," ]\n","}\n"]}],"source":["import json\n","\n","# Print the response\n","request_id = repo_response['request_id']\n","result = repo_response['result']\n","\n","print(f\"Request ID: {request_id}\")\n","print(json.dumps(result, indent=2))"]},{"cell_type":"markdown","metadata":{"id":"2as65QLypwdb"},"source":["### 💾 Save the output to a `CSV` file"]},{"cell_type":"markdown","metadata":{"id":"HTLVFgbVLLBR"},"source":["Let's create a pandas dataframe and show the table with the extracted content"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"1lS9O1KOI51y","outputId":"89fe200c-deca-45b1-be2e-6cf3e9f97fe2"},"outputs":[{"data":{
"text/plain":[" price bedrooms bathrooms square_feet \\\n","0 549000 1 1 477 \n","1 1799000 4 2 2735 \n","2 1995000 7 3 3330 \n","3 549000 0 1 477 \n","4 5495000 10 7 6505 \n","5 925000 2 1 779 \n","6 898000 2 2 1175 \n","7 1700000 4 2 1950 \n","8 1899000 3 2 1560 \n","9 850000 2 2 1055 \n","10 1990000 3 1 1280 \n","11 849000 1 1 855 \n","12 1080000 2 2 936 \n","13 1499000 4 2 2145 \n","14 1140000 2 2 998 \n","15 1988000 2 1 3800 \n","16 1218000 2 2 1275 \n","17 895000 1 1 837 \n","18 1499000 3 1 1500 \n","19 900000 1 1 930 \n","20 858000 1 1 1104 \n","21 945000 2 1 767 \n","22 1099000 2 2 1330 \n","23 950000 4 3 2090 \n","24 1088000 2 2 1065 \n","25 1788888 4 3 1856 \n","26 1650000 3 2 1547 \n","27 998000 2 2 1202 \n","28 1595000 3 5 1995 \n","29 1028000 2 2 1065 \n","30 999000 1 1 1021 \n","31 799800 2 2 1109 \n","32 529880 1 1 740 \n","33 489000 1 1 741 \n","34 1359000 4 2 1845 \n","35 1295000 3 1 1214 \n","36 1098000 3 1 1006 \n","37 1189870 3 2 1436 \n","38 899000 2 1 1118 \n","39 30000 0 0 0 \n","\n"," address city state zip_code \\\n","0 380 14th St Unit 405 San Francisco CA 94103 \n","1 123 Grattan St San Francisco CA 94117 \n","2 1590 Washington St San Francisco CA 94109 \n","3 240 Lombard St Unit 835 San Francisco CA 94111 \n","4 1057 Steiner St San Francisco CA 94115 \n","5 2 Fallon Place Unit 57 San Francisco CA 94133 \n","6 5160 Diamond Heights Blvd Unit 208C San Francisco CA 94131 \n","7 1351 26th Ave San Francisco CA 94122 \n","8 340 Yerba Buena Ave San Francisco CA 94127 \n","9 588 Minna Unit 604 San Francisco CA 94103 \n","10 1450 Diamond St San Francisco CA 94131 \n","11 81 Lansing St Unit 401 San Francisco CA 94105 \n","12 451 Kansas St Unit 466 San Francisco CA 94107 \n","13 486 Yale St San Francisco CA 94134 \n","14 588 Minna Unit 801 San Francisco CA 94103 \n","15 183 19th Ave San Francisco CA 94121 \n","16 1998 Pacific Ave Unit 202 San Francisco CA 94109 \n","17 425 1st St Unit 2501 San Francisco CA 94105 \n","18 Unlisted Address San
Francisco CA NA \n","19 1101 Green St Unit 302 San Francisco CA 94109 \n","20 260 King St Unit 557 San Francisco CA 94107 \n","21 307 Page St Unit 1 San Francisco CA 94102 \n","22 1080 Sutter St Unit 202 San Francisco CA 94109 \n","23 3328 26th St Unit 3330 San Francisco CA 94110 \n","24 1776 Sacramento St Unit 503 San Francisco CA 94109 \n","25 2317 15th St San Francisco CA 94114 \n","26 2475 47th Ave San Francisco CA 94116 \n","27 50 Lansing St Unit 201 San Francisco CA 94105 \n","28 15 Joy St San Francisco CA 94110 \n","29 50 Lansing St Unit 403 San Francisco CA 94105 \n","30 338 Spear St Unit 6J San Francisco CA 94105 \n","31 10 Innes Ct San Francisco CA 94124 \n","32 10 Innes Ct San Francisco CA 94124 \n","33 10 Innes Ct San Francisco CA 94124 \n","34 170 Thrift St San Francisco CA 94112 \n","35 1922 43rd Ave San Francisco CA 94116 \n","36 150 Putnam St San Francisco CA 94110 \n","37 327 Ordway St San Francisco CA 94134 \n","38 272 Farallones St San Francisco CA 94112 \n","39 0 Evans Ave San Francisco CA 94124 \n","\n"," tags agent_name \\\n","0 [New construction] Eddie O'Sullivan \n","1 [] Sean Engmann \n","2 [] Eddie O'Sullivan \n","3 [] Tim Gullicksen \n","4 [] Bonnie Spindler \n","5 [] Eddie O'Sullivan \n","6 [] Joe Polyak \n","7 [] Glenda Queensbury \n","8 [] Jeannie Anderson \n","9 [] Mohamed Lakdawala \n","10 [] Mary Anne Villamil \n","11 [] Kristen Haenggi \n","12 [] Maureen DeBoer \n","13 [] Alicia Atienza \n","14 [] Milan Jezdimirovic \n","15 [Amazing Property, Marina Style, Needs TLC] Leo Cheung \n","16 [Light-filled, Freshly painted, Walker's parad... 
Grace Sun \n","17 [Unobstructed bay bridge views, Open layout] Matt Fuller \n","18 [Contractor's Special, Fixer-upper] Jaymee Faith Sagisi \n","19 [Historic Art Deco, Iconic views] NA \n","20 [] Miyuki Takami \n","21 [] NA \n","22 [] Annette Liberty \n","23 [] Isaac Munene \n","24 [] Marilyn Becklehimer \n","25 [] Joel Gile \n","26 [] Lucy Goldenshteyn \n","27 [] Tracey Broadman \n","28 [] Mike Stack \n","29 [] Robyn Kaufman \n","30 [Spacious, Balcony, Bright courtyard views] Paul Hwang \n","31 [New Construction] Lennar \n","32 [New Construction] Lennar \n","33 [New Construction] Lennar \n","34 [Updated, Single-family home] Cristal Wright \n","35 [] Mila Romprey \n","36 [] Genie Mantzoros \n","37 [] Shawn Zahraie \n","38 [] Janice Lee \n","39 [Land, 0.12 Acre, $251,467 per Acre] Heidy Carrera \n","\n"," agency \n","0 Elevation Real Estate \n","1 eXp Realty of Northern CA Inc. \n","2 Elevation Real Estate \n","3 Corcoran Icon Properties \n","4 Corcoran Icon Properties \n","5 Elevation Real Estate \n","6 Rise Homes \n","7 Referral Realty-BV \n","8 Coldwell Banker Realty \n","9 Remax Prestigious Properties \n","10 Kinetic Real Estate \n","11 Compass \n","12 LKJ Realty \n","13 Statewide Realty \n","14 Compass \n","15 eXp Realty of California, Inc \n","16 Compass \n","17 Jackson Fuller Real Estate \n","18 IMPACT \n","19 NA \n","20 eXp Realty of California, Inc \n","21 NA \n","22 Coldwell Banker Realty \n","23 Coldwell Banker Realty \n","24 Dio Real Estate \n","25 Sequoia Real Estate \n","26 Redfin \n","27 Vanguard Properties \n","28 Vanguard Properties \n","29 Vivre Real Estate \n","30 Skybox Realty \n","31 Lennar \n","32 Lennar \n","33 Lennar \n","34 Vanguard Properties \n","35 Premier Realty Associates \n","36 Epic Real Estate & Asso. Inc. 
\n","37 Affinity Enterprises, Inc \n","38 Coldwell Banker Realty \n","39 Berkshire Hathaway HomeService "]},"execution_count":10,"metadata":{},"output_type":"execute_result"}],"source":["import pandas as pd\n","\n","# Convert dictionary to DataFrame\n","df = pd.DataFrame(result[\"houses\"])\n","df"]},{"cell_type":"markdown","metadata":{"id":"v0CBYVk7qA5Z"},"source":["Save it to CSV"]},{"cell_type":"code","execution_count":null,"metadata":{"colab":{"base_uri":"https://localhost:8080/"},"id":"BtEbB9pmQGhO","outputId":"96b40841-381e-49fb-db05-752dfe63ad00"},"outputs":[],"source":["# Save the DataFrame to a CSV file\n","csv_file = \"houses_forsale.csv\"\n","df.to_csv(csv_file, index=False)\n","print(f\"Data saved to {csv_file}\")"]},{"cell_type":"markdown","metadata":{"id":"-1SZT8VzTZNd"},"source":["## 🔗 Resources"]},{"cell_type":"markdown","metadata":{"id":"dUi2LtMLRDDR"},"source":["- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n","- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n","- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n","- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n","- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n","\n","Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"]}],"metadata":{"colab":{"provenance":[]},"kernelspec":{"display_name":"Python 3","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.14"}},"nbformat":4,"nbformat_minor":0}
+{
+ "cells": [
+ {
+ "source": "## 🕷️ Extract House Listings with the Official Scrapegraph SDK\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-2) [Open in Colab](https://colab.research.google.com/drive/1HHBUSFAHD_IvdeTAF60p6mtmeabo1s9P?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract House Listings with the Official Scrapegraph SDK\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-2) [Open in Colab](https://colab.research.google.com/drive/1HHBUSFAHD_IvdeTAF60p6mtmeabo1s9P?usp=sharing)",
+ "text/markdown": "## 🕷️ Extract House Listings with the Official Scrapegraph SDK\n\n[Open in Alph](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-sdk-2) [Open in Colab](https://colab.research.google.com/drive/1HHBUSFAHD_IvdeTAF60p6mtmeabo1s9P?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "metadata": {
+ "id": "jEkuKbcRrPcK"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:11:50.373Z",
+ "executionStartTime": "2026-03-26T00:11:50.372Z"
+ },
+ {
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
+ "metadata": {
+ "id": "IzsyDXEWwPVt"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "%%capture\n!pip install scrapegraph-py",
+ "outputs": [],
+ "metadata": {
+ "id": "os_vm0MkIxr9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 🔑 Import `ScrapeGraph` API key",
+ "outputs": [],
+ "metadata": {
+ "id": "apBsL-L2KzM7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "You can find your Scrapegraph API key in the [ScrapeGraphAI dashboard](https://dashboard.scrapegraphai.com/).",
+ "outputs": [],
+ "metadata": {
+ "id": "ol9gQbAFkh9b"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
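+ "source": "import getpass\nimport os\n\n# Prompt for the API key only if it is not already set in the environment\nif not os.environ.get(\"SGAI_API_KEY\"):\n    os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")\nelse:\n    print(\"SGAI_API_KEY found in environment.\")",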
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "SGAI_API_KEY found in environment.\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "18dfce64-db37-4825-d316-fabd064100d0"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
+ "metadata": {
+ "id": "jnqMB2-xVYQ7"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n#### Pydantic Schema Quick Guide\n\n**Types of Schemas**\n\n**1. Simple Schema**\n\nUse this when you want to extract straightforward information, such as a single piece of content.\n\n```python\nfrom pydantic import BaseModel, Field\n\n# Simple schema for a single webpage\nclass PageInfoSchema(BaseModel):\n    title: str = Field(description=\"The title of the webpage\")\n    description: str = Field(description=\"The description of the webpage\")\n```\n\nExample output JSON after AI extraction:\n\n```json\n{\n  \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n  \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n}\n```\n\n**2. Complex Schema (Nested)**\n\nIf you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n\n```python\nfrom pydantic import BaseModel, Field\nfrom typing import List\n\n# Define a schema for a single repository\nclass RepositorySchema(BaseModel):\n    name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n    description: str = Field(description=\"Description of the repository\")\n    stars: int = Field(description=\"Star count of the repository\")\n    forks: int = Field(description=\"Fork count of the repository\")\n    today_stars: int = Field(description=\"Stars gained today\")\n    language: str = Field(description=\"Programming language used\")\n\n# Define a schema for a list of repositories\nclass ListRepositoriesSchema(BaseModel):\n    repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n```\n\nExample output JSON after AI extraction:\n\n```json\n{\n  \"repositories\": [\n    {\n      \"name\": \"google-gemini/cookbook\",\n      \"description\": \"Examples and guides for using the Gemini API\",\n      \"stars\": 8036,\n      \"forks\": 1001,\n      \"today_stars\": 649,\n      \"language\": \"Jupyter Notebook\"\n    },\n    {\n      \"name\": \"TEN-framework/TEN-Agent\",\n      \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n      \"stars\": 3224,\n      \"forks\": 311,\n      \"today_stars\": 361,\n      \"language\": \"Python\"\n    }\n  ]\n}\n```\n\n**Key Takeaways**\n\n- **Simple Schema**: Perfect for small, straightforward extractions.\n- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\"\n\nBoth approaches give the AI a clear structure to follow, which helps ensure that the extracted content matches what you need.\n",
+ "outputs": [],
+ "metadata": {
+ "id": "VZvxbjfXvbgd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
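The nesting described in the quick guide can be checked locally before any API call. Below is a minimal sketch that assumes only that Pydantic is installed. It trims `RepositorySchema` to three fields and builds a payload from the example output. Constructing the outer model turns each nested dict into a `RepositorySchema` instance, and it rejects a value that cannot be coerced to `int`.

```python
from typing import List

from pydantic import BaseModel, Field, ValidationError


# Same shape as the nested schema in the guide (trimmed to three fields)
class RepositorySchema(BaseModel):
    name: str = Field(description="Name of the repository (e.g., 'owner/repo')")
    stars: int = Field(description="Star count of the repository")
    language: str = Field(description="Programming language used")


class ListRepositoriesSchema(BaseModel):
    repositories: List[RepositorySchema] = Field(description="List of GitHub trending repositories")


# A payload shaped like the example output JSON in the guide
payload = {
    "repositories": [
        {"name": "google-gemini/cookbook", "stars": 8036, "language": "Jupyter Notebook"},
        {"name": "TEN-framework/TEN-Agent", "stars": 3224, "language": "Python"},
    ]
}

# Nested dicts are validated and converted into RepositorySchema instances
parsed = ListRepositoriesSchema(**payload)
print(parsed.repositories[0].name)
print(type(parsed.repositories[1]).__name__)

# A star count that cannot be coerced to int is rejected
bad = {"repositories": [{"name": "x/y", "stars": "lots", "language": "Go"}]}
try:
    ListRepositoriesSchema(**bad)
    rejected = False
except ValidationError as err:
    rejected = True
    print("rejected:", len(err.errors()), "error(s)")
```

The example uses plain keyword construction rather than version-specific helpers, so it should behave the same under Pydantic v1 and v2.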
+ {
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for a single house listing\nclass HouseSchema(BaseModel):\n    price: int = Field(description=\"Price of the house in USD\")\n    bedrooms: int = Field(description=\"Number of bedrooms\")\n    bathrooms: int = Field(description=\"Number of bathrooms\")\n    square_feet: int = Field(description=\"Total square footage of the house\")\n    address: str = Field(description=\"Address of the house\")\n    city: str = Field(description=\"City where the house is located\")\n    state: str = Field(description=\"State where the house is located\")\n    zip_code: str = Field(description=\"ZIP code of the house location\")\n    tags: List[str] = Field(description=\"Tags like 'New construction' or 'Large garage'\")\n    agent_name: str = Field(description=\"Name of the listing agent. If it is missing or you are unsure, write NA.\")\n    agency: str = Field(description=\"Agency listing the house. If it is missing or you are unsure, write NA.\")\n\n# Schema containing a list of house listings\nclass HouseListingsSchema(BaseModel):\n    houses: List[HouseSchema] = Field(description=\"List of house listings on Homes.com or similar platforms\")\n",
+ "outputs": [],
+ "metadata": {
+ "id": "dlrOEgZk_8V4"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
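`HouseSchema` asks the model to write the string `NA` when an agent or agency is missing, so sentinel strings end up in the data. One alternative, which this notebook does not use, is to declare those fields `Optional` with a `None` default. The sketch below shows that hypothetical variant (`HouseSchemaOptional`, trimmed to a few fields). It also includes a small hypothetical helper, `na_to_none`, that converts output from the current schema. The sample listing is copied from the extraction output.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


# Hypothetical variant of HouseSchema: missing values become None instead of "NA"
class HouseSchemaOptional(BaseModel):
    price: int = Field(description="Price of the house in USD")
    address: str = Field(description="Address of the house")
    zip_code: Optional[str] = Field(default=None, description="ZIP code, or null if not shown")
    tags: List[str] = Field(default_factory=list, description="Tags like 'New construction'")
    agent_name: Optional[str] = Field(default=None, description="Listing agent, or null if not shown")
    agency: Optional[str] = Field(default=None, description="Listing agency, or null if not shown")


def na_to_none(listing: dict) -> dict:
    """Replace the "NA" sentinel produced by the current schema with None."""
    return {key: (None if value == "NA" else value) for key, value in listing.items()}


# One listing as it appears in the extraction output
raw = {
    "price": 900000,
    "address": "1101 Green St Unit 302",
    "zip_code": "94109",
    "tags": ["Historic Art Deco", "Iconic views"],
    "agent_name": "NA",
    "agency": "NA",
}

house = HouseSchemaOptional(**na_to_none(raw))
print(house.agent_name)
print(house.zip_code)
```

Whether `null` or a sentinel string works better depends on how the model behaves with your prompt. Treat this as a design option to test, not a guaranteed improvement.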
+ {
+ "source": "### 🚀 Initialize `SGAI Client` and start extraction",
+ "outputs": [],
+ "metadata": {
+ "id": "cDGH0b2DkY63"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Initialize the client for scraping (an async version is also available [here](https://github.com/ScrapeGraphAI/scrapegraph-sdk/blob/main/scrapegraph-py/examples/async_smartscraper_example.py)).",
+ "outputs": [],
+ "metadata": {
+ "id": "4SLJgXgcob6L"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from scrapegraph_py import Client\n\n# Initialize the client with an explicit API key and a generous timeout, since AI extraction can take a while\nsgai_client = Client(api_key=sgai_api_key, timeout=240)",
+ "outputs": [],
+ "metadata": {
+ "id": "PQI25GZvoCSk"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Here we use the `SmartScraper` service to extract structured data from a webpage using AI.\n\n> If you already have an HTML file, you can upload it and use `LocalScraper` instead.\n",
+ "outputs": [],
+ "metadata": {
+ "id": "M1KSXffZopUD"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Request the house listings visible on the Homes.com search page\nrepo_response = sgai_client.smartscraper(\n    website_url=\"https://www.homes.com/san-francisco-ca/?bb=nzpwspy0mS749snkvsb\",\n    user_prompt=\"Extract info about the houses visible on the page\",\n    output_schema=HouseListingsSchema,\n)",
+ "outputs": [],
+ "metadata": {
+ "id": "2FIKomclLNFx"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Print the response",
+ "outputs": [],
+ "metadata": {
+ "id": "YZz1bqCIpoL8"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "import json\n\n# Print the response\nrequest_id = repo_response['request_id']\nresult = repo_response['result']\n\nprint(f\"Request ID: {request_id}\")\nprint(json.dumps(result, indent=2))",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Request ID: 4e023916-2a41-40ea-bea5-efc422daf33e\n",
+ "{\n",
+ " \"houses\": [\n",
+ " {\n",
+ " \"price\": 549000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 477,\n",
+ " \"address\": \"380 14th St Unit 405\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94103\",\n",
+ " \"tags\": [\n",
+ " \"New construction\"\n",
+ " ],\n",
+ " \"agent_name\": \"Eddie O'Sullivan\",\n",
+ " \"agency\": \"Elevation Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1799000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 2735,\n",
+ " \"address\": \"123 Grattan St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94117\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Sean Engmann\",\n",
+ " \"agency\": \"eXp Realty of Northern CA Inc.\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1995000,\n",
+ " \"bedrooms\": 7,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 3330,\n",
+ " \"address\": \"1590 Washington St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Eddie O'Sullivan\",\n",
+ " \"agency\": \"Elevation Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 549000,\n",
+ " \"bedrooms\": 0,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 477,\n",
+ " \"address\": \"240 Lombard St Unit 835\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94111\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Tim Gullicksen\",\n",
+ " \"agency\": \"Corcoran Icon Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 5495000,\n",
+ " \"bedrooms\": 10,\n",
+ " \"bathrooms\": 7,\n",
+ " \"square_feet\": 6505,\n",
+ " \"address\": \"1057 Steiner St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94115\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Bonnie Spindler\",\n",
+ " \"agency\": \"Corcoran Icon Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 925000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 779,\n",
+ " \"address\": \"2 Fallon Place Unit 57\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94133\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Eddie O'Sullivan\",\n",
+ " \"agency\": \"Elevation Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 898000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1175,\n",
+ " \"address\": \"5160 Diamond Heights Blvd Unit 208C\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94131\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Joe Polyak\",\n",
+ " \"agency\": \"Rise Homes\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1700000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1950,\n",
+ " \"address\": \"1351 26th Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94122\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Glenda Queensbury\",\n",
+ " \"agency\": \"Referral Realty-BV\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1899000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1560,\n",
+ " \"address\": \"340 Yerba Buena Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94127\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Jeannie Anderson\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 850000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1055,\n",
+ " \"address\": \"588 Minna Unit 604\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94103\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mohamed Lakdawala\",\n",
+ " \"agency\": \"Remax Prestigious Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1990000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1280,\n",
+ " \"address\": \"1450 Diamond St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94131\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mary Anne Villamil\",\n",
+ " \"agency\": \"Kinetic Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 849000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 855,\n",
+ " \"address\": \"81 Lansing St Unit 401\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Kristen Haenggi\",\n",
+ " \"agency\": \"Compass\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1080000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 936,\n",
+ " \"address\": \"451 Kansas St Unit 466\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94107\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Maureen DeBoer\",\n",
+ " \"agency\": \"LKJ Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1499000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 2145,\n",
+ " \"address\": \"486 Yale St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94134\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Alicia Atienza\",\n",
+ " \"agency\": \"Statewide Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1140000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 998,\n",
+ " \"address\": \"588 Minna Unit 801\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94103\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Milan Jezdimirovic\",\n",
+ " \"agency\": \"Compass\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1988000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 3800,\n",
+ " \"address\": \"183 19th Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94121\",\n",
+ " \"tags\": [\n",
+ " \"Amazing Property\",\n",
+ " \"Marina Style\",\n",
+ " \"Needs TLC\"\n",
+ " ],\n",
+ " \"agent_name\": \"Leo Cheung\",\n",
+ " \"agency\": \"eXp Realty of California, Inc\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1218000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1275,\n",
+ " \"address\": \"1998 Pacific Ave Unit 202\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [\n",
+ " \"Light-filled\",\n",
+ " \"Freshly painted\",\n",
+ " \"Walker's paradise\"\n",
+ " ],\n",
+ " \"agent_name\": \"Grace Sun\",\n",
+ " \"agency\": \"Compass\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 895000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 837,\n",
+ " \"address\": \"425 1st St Unit 2501\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [\n",
+ " \"Unobstructed bay bridge views\",\n",
+ " \"Open layout\"\n",
+ " ],\n",
+ " \"agent_name\": \"Matt Fuller\",\n",
+ " \"agency\": \"Jackson Fuller Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1499000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1500,\n",
+ " \"address\": \"Unlisted Address\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"NA\",\n",
+ " \"tags\": [\n",
+ " \"Contractor's Special\",\n",
+ " \"Fixer-upper\"\n",
+ " ],\n",
+ " \"agent_name\": \"Jaymee Faith Sagisi\",\n",
+ " \"agency\": \"IMPACT\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 900000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 930,\n",
+ " \"address\": \"1101 Green St Unit 302\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [\n",
+ " \"Historic Art Deco\",\n",
+ " \"Iconic views\"\n",
+ " ],\n",
+ " \"agent_name\": \"NA\",\n",
+ " \"agency\": \"NA\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 858000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1104,\n",
+ " \"address\": \"260 King St Unit 557\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94107\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Miyuki Takami\",\n",
+ " \"agency\": \"eXp Realty of California, Inc\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 945000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 767,\n",
+ " \"address\": \"307 Page St Unit 1\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94102\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"NA\",\n",
+ " \"agency\": \"NA\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1099000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1330,\n",
+ " \"address\": \"1080 Sutter St Unit 202\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Annette Liberty\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 950000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 2090,\n",
+ " \"address\": \"3328 26th St Unit 3330\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94110\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Isaac Munene\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1088000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1065,\n",
+ " \"address\": \"1776 Sacramento St Unit 503\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94109\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Marilyn Becklehimer\",\n",
+ " \"agency\": \"Dio Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1788888,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 3,\n",
+ " \"square_feet\": 1856,\n",
+ " \"address\": \"2317 15th St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94114\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Joel Gile\",\n",
+ " \"agency\": \"Sequoia Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1650000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1547,\n",
+ " \"address\": \"2475 47th Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94116\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Lucy Goldenshteyn\",\n",
+ " \"agency\": \"Redfin\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 998000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1202,\n",
+ " \"address\": \"50 Lansing St Unit 201\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Tracey Broadman\",\n",
+ " \"agency\": \"Vanguard Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1595000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 5,\n",
+ " \"square_feet\": 1995,\n",
+ " \"address\": \"15 Joy St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94110\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mike Stack\",\n",
+ " \"agency\": \"Vanguard Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1028000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1065,\n",
+ " \"address\": \"50 Lansing St Unit 403\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Robyn Kaufman\",\n",
+ " \"agency\": \"Vivre Real Estate\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 999000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1021,\n",
+ " \"address\": \"338 Spear St Unit 6J\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94105\",\n",
+ " \"tags\": [\n",
+ " \"Spacious\",\n",
+ " \"Balcony\",\n",
+ " \"Bright courtyard views\"\n",
+ " ],\n",
+ " \"agent_name\": \"Paul Hwang\",\n",
+ " \"agency\": \"Skybox Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 799800,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1109,\n",
+ " \"address\": \"10 Innes Ct\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"New Construction\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lennar\",\n",
+ " \"agency\": \"Lennar\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 529880,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 740,\n",
+ " \"address\": \"10 Innes Ct\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"New Construction\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lennar\",\n",
+ " \"agency\": \"Lennar\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 489000,\n",
+ " \"bedrooms\": 1,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 741,\n",
+ " \"address\": \"10 Innes Ct\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"New Construction\"\n",
+ " ],\n",
+ " \"agent_name\": \"Lennar\",\n",
+ " \"agency\": \"Lennar\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1359000,\n",
+ " \"bedrooms\": 4,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1845,\n",
+ " \"address\": \"170 Thrift St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94112\",\n",
+ " \"tags\": [\n",
+ " \"Updated\",\n",
+ " \"Single-family home\"\n",
+ " ],\n",
+ " \"agent_name\": \"Cristal Wright\",\n",
+ " \"agency\": \"Vanguard Properties\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1295000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1214,\n",
+ " \"address\": \"1922 43rd Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94116\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Mila Romprey\",\n",
+ " \"agency\": \"Premier Realty Associates\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1098000,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1006,\n",
+ " \"address\": \"150 Putnam St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94110\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Genie Mantzoros\",\n",
+ " \"agency\": \"Epic Real Estate & Asso. Inc.\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 1189870,\n",
+ " \"bedrooms\": 3,\n",
+ " \"bathrooms\": 2,\n",
+ " \"square_feet\": 1436,\n",
+ " \"address\": \"327 Ordway St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94134\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Shawn Zahraie\",\n",
+ " \"agency\": \"Affinity Enterprises, Inc\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 899000,\n",
+ " \"bedrooms\": 2,\n",
+ " \"bathrooms\": 1,\n",
+ " \"square_feet\": 1118,\n",
+ " \"address\": \"272 Farallones St\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94112\",\n",
+ " \"tags\": [],\n",
+ " \"agent_name\": \"Janice Lee\",\n",
+ " \"agency\": \"Coldwell Banker Realty\"\n",
+ " },\n",
+ " {\n",
+ " \"price\": 30000,\n",
+ " \"bedrooms\": 0,\n",
+ " \"bathrooms\": 0,\n",
+ " \"square_feet\": 0,\n",
+ " \"address\": \"0 Evans Ave\",\n",
+ " \"city\": \"San Francisco\",\n",
+ " \"state\": \"CA\",\n",
+ " \"zip_code\": \"94124\",\n",
+ " \"tags\": [\n",
+ " \"Land\",\n",
+ " \"0.12 Acre\",\n",
+ " \"$251,467 per Acre\"\n",
+ " ],\n",
+ " \"agent_name\": \"Heidy Carrera\",\n",
+ " \"agency\": \"Berkshire Hathaway HomeService\"\n",
+ " }\n",
+ " ]\n",
+ "}\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "1e849a65-6713-486c-e306-bb7c26db4bf9"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
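The printed result is a plain dictionary, so it is easy to sanity-check before saving. The following is a minimal sketch using only the standard library. It assumes the response shape shown above: a `request_id` plus a `result` holding a `houses` list. The sample contains three listings copied from that output, and `summarize_listings` is a hypothetical helper name. It counts the listings, takes the median price, and flags rows that are probably not ordinary homes (zero square feet, like the land parcel at 0 Evans Ave) or that contain the `NA` sentinel.

```python
import statistics

# Three listings copied from the output above, in the same response shape
sample_response = {
    "request_id": "4e023916-2a41-40ea-bea5-efc422daf33e",
    "result": {
        "houses": [
            {"price": 549000, "square_feet": 477, "address": "380 14th St Unit 405",
             "agent_name": "Eddie O'Sullivan", "agency": "Elevation Real Estate"},
            {"price": 900000, "square_feet": 930, "address": "1101 Green St Unit 302",
             "agent_name": "NA", "agency": "NA"},
            {"price": 30000, "square_feet": 0, "address": "0 Evans Ave",
             "agent_name": "Heidy Carrera", "agency": "Berkshire Hathaway HomeService"},
        ]
    },
}


def summarize_listings(response: dict) -> dict:
    """Hypothetical helper: count listings, take the median price, flag odd rows."""
    houses = response["result"]["houses"]
    flagged = [
        h["address"]
        for h in houses
        # Zero square feet usually means land; "NA" means the agent/agency was not found
        if h["square_feet"] == 0 or "NA" in (h["agent_name"], h["agency"])
    ]
    return {
        "count": len(houses),
        "median_price": statistics.median(h["price"] for h in houses),
        "flagged": flagged,
    }


summary = summarize_listings(sample_response)
print(summary)
```

In the notebook you would pass `repo_response` instead of the sample dictionary.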
+ {
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
+ "metadata": {
+ "id": "2as65QLypwdb"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Let's create a pandas DataFrame and display the extracted content as a table.",
+ "outputs": [],
+ "metadata": {
+ "id": "HTLVFgbVLLBR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
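The `tags` field holds Python lists. By default `DataFrame.to_csv` writes a list cell as its string representation, such as `['New construction']`, which is awkward to read back. The sketch below assumes you want a flat string in the CSV instead. The `"; "` separator and the `sample_df` name are arbitrary choices, and the rows are copied from the extraction output. It also adds a price-per-square-foot column that treats 0 sq ft (the land parcel) as missing rather than dividing by zero.

```python
import pandas as pd

# Three rows shaped like result["houses"] (values copied from the output above)
rows = [
    {"price": 549000, "square_feet": 477, "address": "380 14th St Unit 405",
     "tags": ["New construction"]},
    {"price": 1799000, "square_feet": 2735, "address": "123 Grattan St", "tags": []},
    {"price": 30000, "square_feet": 0, "address": "0 Evans Ave",
     "tags": ["Land", "0.12 Acre", "$251,467 per Acre"]},
]
sample_df = pd.DataFrame(rows)

# Flatten list-valued tags into one string per cell so the CSV stays readable
sample_df["tags"] = sample_df["tags"].apply("; ".join)

# Price per square foot; where() turns 0 sq ft into NaN, so the result is NaN, not inf
sqft = sample_df["square_feet"].where(sample_df["square_feet"] > 0)
sample_df["price_per_sqft"] = (sample_df["price"] / sqft).round(2)

csv_text = sample_df.to_csv(index=False)
print(csv_text)
```

Joining tags keeps one row per listing. If you need to filter by individual tags later, `DataFrame.explode("tags")` on the unflattened column is an alternative.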
+ {
+ "source": "import pandas as pd\n\n# Convert the list of house dicts into a DataFrame (one row per listing)\ndf = pd.DataFrame(result[\"houses\"])\ndf",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " price bedrooms bathrooms square_feet \\\n",
+ "0 549000 1 1 477 \n",
+ "1 1799000 4 2 2735 \n",
+ "2 1995000 7 3 3330 \n",
+ "3 549000 0 1 477 \n",
+ "4 5495000 10 7 6505 \n",
+ "5 925000 2 1 779 \n",
+ "6 898000 2 2 1175 \n",
+ "7 1700000 4 2 1950 \n",
+ "8 1899000 3 2 1560 \n",
+ "9 850000 2 2 1055 \n",
+ "10 1990000 3 1 1280 \n",
+ "11 849000 1 1 855 \n",
+ "12 1080000 2 2 936 \n",
+ "13 1499000 4 2 2145 \n",
+ "14 1140000 2 2 998 \n",
+ "15 1988000 2 1 3800 \n",
+ "16 1218000 2 2 1275 \n",
+ "17 895000 1 1 837 \n",
+ "18 1499000 3 1 1500 \n",
+ "19 900000 1 1 930 \n",
+ "20 858000 1 1 1104 \n",
+ "21 945000 2 1 767 \n",
+ "22 1099000 2 2 1330 \n",
+ "23 950000 4 3 2090 \n",
+ "24 1088000 2 2 1065 \n",
+ "25 1788888 4 3 1856 \n",
+ "26 1650000 3 2 1547 \n",
+ "27 998000 2 2 1202 \n",
+ "28 1595000 3 5 1995 \n",
+ "29 1028000 2 2 1065 \n",
+ "30 999000 1 1 1021 \n",
+ "31 799800 2 2 1109 \n",
+ "32 529880 1 1 740 \n",
+ "33 489000 1 1 741 \n",
+ "34 1359000 4 2 1845 \n",
+ "35 1295000 3 1 1214 \n",
+ "36 1098000 3 1 1006 \n",
+ "37 1189870 3 2 1436 \n",
+ "38 899000 2 1 1118 \n",
+ "39 30000 0 0 0 \n",
+ "\n",
+ " address city state zip_code \\\n",
+ "0 380 14th St Unit 405 San Francisco CA 94103 \n",
+ "1 123 Grattan St San Francisco CA 94117 \n",
+ "2 1590 Washington St San Francisco CA 94109 \n",
+ "3 240 Lombard St Unit 835 San Francisco CA 94111 \n",
+ "4 1057 Steiner St San Francisco CA 94115 \n",
+ "5 2 Fallon Place Unit 57 San Francisco CA 94133 \n",
+ "6 5160 Diamond Heights Blvd Unit 208C San Francisco CA 94131 \n",
+ "7 1351 26th Ave San Francisco CA 94122 \n",
+ "8 340 Yerba Buena Ave San Francisco CA 94127 \n",
+ "9 588 Minna Unit 604 San Francisco CA 94103 \n",
+ "10 1450 Diamond St San Francisco CA 94131 \n",
+ "11 81 Lansing St Unit 401 San Francisco CA 94105 \n",
+ "12 451 Kansas St Unit 466 San Francisco CA 94107 \n",
+ "13 486 Yale St San Francisco CA 94134 \n",
+ "14 588 Minna Unit 801 San Francisco CA 94103 \n",
+ "15 183 19th Ave San Francisco CA 94121 \n",
+ "16 1998 Pacific Ave Unit 202 San Francisco CA 94109 \n",
+ "17 425 1st St Unit 2501 San Francisco CA 94105 \n",
+ "18 Unlisted Address San Francisco CA NA \n",
+ "19 1101 Green St Unit 302 San Francisco CA 94109 \n",
+ "20 260 King St Unit 557 San Francisco CA 94107 \n",
+ "21 307 Page St Unit 1 San Francisco CA 94102 \n",
+ "22 1080 Sutter St Unit 202 San Francisco CA 94109 \n",
+ "23 3328 26th St Unit 3330 San Francisco CA 94110 \n",
+ "24 1776 Sacramento St Unit 503 San Francisco CA 94109 \n",
+ "25 2317 15th St San Francisco CA 94114 \n",
+ "26 2475 47th Ave San Francisco CA 94116 \n",
+ "27 50 Lansing St Unit 201 San Francisco CA 94105 \n",
+ "28 15 Joy St San Francisco CA 94110 \n",
+ "29 50 Lansing St Unit 403 San Francisco CA 94105 \n",
+ "30 338 Spear St Unit 6J San Francisco CA 94105 \n",
+ "31 10 Innes Ct San Francisco CA 94124 \n",
+ "32 10 Innes Ct San Francisco CA 94124 \n",
+ "33 10 Innes Ct San Francisco CA 94124 \n",
+ "34 170 Thrift St San Francisco CA 94112 \n",
+ "35 1922 43rd Ave San Francisco CA 94116 \n",
+ "36 150 Putnam St San Francisco CA 94110 \n",
+ "37 327 Ordway St San Francisco CA 94134 \n",
+ "38 272 Farallones St San Francisco CA 94112 \n",
+ "39 0 Evans Ave San Francisco CA 94124 \n",
+ "\n",
+ " tags agent_name \\\n",
+ "0 [New construction] Eddie O'Sullivan \n",
+ "1 [] Sean Engmann \n",
+ "2 [] Eddie O'Sullivan \n",
+ "3 [] Tim Gullicksen \n",
+ "4 [] Bonnie Spindler \n",
+ "5 [] Eddie O'Sullivan \n",
+ "6 [] Joe Polyak \n",
+ "7 [] Glenda Queensbury \n",
+ "8 [] Jeannie Anderson \n",
+ "9 [] Mohamed Lakdawala \n",
+ "10 [] Mary Anne Villamil \n",
+ "11 [] Kristen Haenggi \n",
+ "12 [] Maureen DeBoer \n",
+ "13 [] Alicia Atienza \n",
+ "14 [] Milan Jezdimirovic \n",
+ "15 [Amazing Property, Marina Style, Needs TLC] Leo Cheung \n",
+ "16 [Light-filled, Freshly painted, Walker's parad... Grace Sun \n",
+ "17 [Unobstructed bay bridge views, Open layout] Matt Fuller \n",
+ "18 [Contractor's Special, Fixer-upper] Jaymee Faith Sagisi \n",
+ "19 [Historic Art Deco, Iconic views] NA \n",
+ "20 [] Miyuki Takami \n",
+ "21 [] NA \n",
+ "22 [] Annette Liberty \n",
+ "23 [] Isaac Munene \n",
+ "24 [] Marilyn Becklehimer \n",
+ "25 [] Joel Gile \n",
+ "26 [] Lucy Goldenshteyn \n",
+ "27 [] Tracey Broadman \n",
+ "28 [] Mike Stack \n",
+ "29 [] Robyn Kaufman \n",
+ "30 [Spacious, Balcony, Bright courtyard views] Paul Hwang \n",
+ "31 [New Construction] Lennar \n",
+ "32 [New Construction] Lennar \n",
+ "33 [New Construction] Lennar \n",
+ "34 [Updated, Single-family home] Cristal Wright \n",
+ "35 [] Mila Romprey \n",
+ "36 [] Genie Mantzoros \n",
+ "37 [] Shawn Zahraie \n",
+ "38 [] Janice Lee \n",
+ "39 [Land, 0.12 Acre, $251,467 per Acre] Heidy Carrera \n",
+ "\n",
+ " agency \n",
+ "0 Elevation Real Estate \n",
+ "1 eXp Realty of Northern CA Inc. \n",
+ "2 Elevation Real Estate \n",
+ "3 Corcoran Icon Properties \n",
+ "4 Corcoran Icon Properties \n",
+ "5 Elevation Real Estate \n",
+ "6 Rise Homes \n",
+ "7 Referral Realty-BV \n",
+ "8 Coldwell Banker Realty \n",
+ "9 Remax Prestigious Properties \n",
+ "10 Kinetic Real Estate \n",
+ "11 Compass \n",
+ "12 LKJ Realty \n",
+ "13 Statewide Realty \n",
+ "14 Compass \n",
+ "15 eXp Realty of California, Inc \n",
+ "16 Compass \n",
+ "17 Jackson Fuller Real Estate \n",
+ "18 IMPACT \n",
+ "19 NA \n",
+ "20 eXp Realty of California, Inc \n",
+ "21 NA \n",
+ "22 Coldwell Banker Realty \n",
+ "23 Coldwell Banker Realty \n",
+ "24 Dio Real Estate \n",
+ "25 Sequoia Real Estate \n",
+ "26 Redfin \n",
+ "27 Vanguard Properties \n",
+ "28 Vanguard Properties \n",
+ "29 Vivre Real Estate \n",
+ "30 Skybox Realty \n",
+ "31 Lennar \n",
+ "32 Lennar \n",
+ "33 Lennar \n",
+ "34 Vanguard Properties \n",
+ "35 Premier Realty Associates \n",
+ "36 Epic Real Estate & Asso. Inc. \n",
+ "37 Affinity Enterprises, Inc \n",
+ "38 Coldwell Banker Realty \n",
+ "39 Berkshire Hathaway HomeService "
+ ]
+ },
+ "metadata": {},
+ "output_type": "execute_result",
+ "execution_count": 10
+ }
+ ],
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 1000,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "89fe200c-deca-45b1-be2e-6cf3e9f97fe2"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "Save it to CSV",
+ "outputs": [],
+ "metadata": {
+ "id": "v0CBYVk7qA5Z"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"houses_forsale.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
+ "outputs": [
+ {
+ "name": "stdout",
+ "text": [
+ "Data saved to zillow_forsale.csv\n"
+ ],
+ "output_type": "stream"
+ }
+ ],
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "96b40841-381e-49fb-db05-752dfe63ad00"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "## 🔗 Resources",
+ "outputs": [],
+ "metadata": {
+ "id": "-1SZT8VzTZNd"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "\n
\n
\n
\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
+ "metadata": {
+ "id": "dUi2LtMLRDDR"
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "provenance": []
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.10.14",
+ "mimetype": "text/x-python",
+ "file_extension": ".py",
+ "pygments_lexer": "ipython3",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "nbconvert_exporter": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
\ No newline at end of file
diff --git a/cookbook/research-agent/crewai_integration.ipynb b/cookbook/research-agent/crewai_integration.ipynb
index 01ee97d..8e47bb0 100644
--- a/cookbook/research-agent/crewai_integration.ipynb
+++ b/cookbook/research-agent/crewai_integration.ipynb
@@ -1,235 +1,222 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "
\n",
- "
\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
+ "source": "# Scraping agent for ecommerce\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/crewai-integration) [](https://colab.research.google.com/drive/1oqGzVer_XTWOJq88tpcPyQ0B72l38YUL?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "# Scraping agent for ecommerce\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/crewai-integration) [](https://colab.research.google.com/drive/1oqGzVer_XTWOJq88tpcPyQ0B72l38YUL?usp=sharing)",
+ "text/markdown": "# Scraping agent for ecommerce\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/crewai-integration) [](https://colab.research.google.com/drive/1oqGzVer_XTWOJq88tpcPyQ0B72l38YUL?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"metadata": {
"id": "XjbB5NvzdB0j"
},
- "source": [
- "# Scraping agent for ecommerce"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:08:56.908Z",
+ "executionStartTime": "2026-03-26T00:08:56.908Z"
},
{
- "cell_type": "markdown",
+ "source": "",
+ "outputs": [],
"metadata": {
"id": "3DHEIAp316DR"
},
- "source": [
- ""
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## Installation of the required libraries",
+ "outputs": [],
"metadata": {
"id": "wPQ3ZmBjc-Mz"
},
- "source": [
- "## Installation of the required libraries"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "%%capture\n!pip install crewai\n!pip install crewai-tools",
+ "outputs": [],
"metadata": {
"id": "m9mtKbuacveq"
},
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install crewai\n",
- "!pip install crewai-tools"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## Environment configuration",
+ "outputs": [],
"metadata": {
"id": "ws1ueZZ9djul"
},
- "source": [
- "## Environment configuration"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "_XTTnMmZdnhz",
- "outputId": "dcc56a3b-7bc3-4e0e-d057-310e1e98b9d8"
- },
+ "source": "import os\nfrom getpass import getpass\n\n# Check if the API key is already set in the environment\nsgai_api_key = os.getenv(\"SCRAPEGRAPH_API_KEY\")\n\nif sgai_api_key:\n print(\"SCRAPEGRAPH_API_KEY found in environment.\")\nelse:\n print(\"SCRAPEGRAPH_API_KEY not found in environment.\")\n # Prompt the user to input the API key securely (hidden input)\n sgai_api_key = getpass(\"Please enter your SCRAPEGRAPH_API_KEY: \").strip()\n if sgai_api_key:\n # Set the API key in the environment\n os.environ[\"SCRAPEGRAPH_API_KEY\"] = sgai_api_key\n print(\"SCRAPEGRAPH_API_KEY has been set in the environment.\")\n else:\n print(\"No API key entered. Please set the API key to continue.\")",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"SCRAPEGRAPH_API_KEY found in environment.\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import os\n",
- "from getpass import getpass\n",
- "\n",
- "# Check if the API key is already set in the environment\n",
- "sgai_api_key = os.getenv(\"SCRAPEGRAPH_API_KEY\")\n",
- "\n",
- "if sgai_api_key:\n",
- " print(\"SCRAPEGRAPH_API_KEY found in environment.\")\n",
- "else:\n",
- " print(\"SCRAPEGRAPH_API_KEY not found in environment.\")\n",
- " # Prompt the user to input the API key securely (hidden input)\n",
- " sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n",
- " if sgai_api_key:\n",
- " # Set the API key in the environment\n",
- " os.environ[\"SCRAPEGRAPH_API_KEY\"] = sgai_api_key\n",
- " print(\"SCRAPEGRAPH_API_KEY has been set in the environment.\")\n",
- " else:\n",
- " print(\"No API key entered. Please set the API key to continue.\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
"metadata": {
+ "id": "_XTTnMmZdnhz",
"colab": {
"base_uri": "https://localhost:8080/"
},
- "id": "pbWasAqodoZG",
- "outputId": "06f75acf-c0be-4fdd-e7b7-9f19ce4a156a"
+ "outputId": "dcc56a3b-7bc3-4e0e-d057-310e1e98b9d8"
},
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "# Check if the API key is already set in the environment\nopenai_api_key = os.getenv(\"OPENAI_API_KEY\")\n\nif openai_api_key:\n print(\"OPENAI_API_KEY found in environment.\")\nelse:\n print(\"OPENAI_API_KEY not found in environment.\")\n # Prompt the user to input the API key securely (hidden input)\n openai_api_key = getpass(\"Please enter your OPENAI_API_KEY: \").strip()\n if openai_api_key:\n # Set the API key in the environment\n os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n print(\"OPENAI_API_KEY has been set in the environment.\")\n else:\n print(\"No API key entered. Please set the API key to continue.\")",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"OPENAI_API_KEY found in environment.\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "# Check if the API key is already set in the environment\n",
- "sgai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
- "\n",
- "if sgai_api_key:\n",
- " print(\"OPENAI_API_KEY found in environment.\")\n",
- "else:\n",
- " print(\"OPENAI_API_KEY not found in environment.\")\n",
- " # Prompt the user to input the API key securely (hidden input)\n",
- " sgai_api_key = getpass(\"Please enter your OPENAI_API_KEY: \").strip()\n",
- " if sgai_api_key:\n",
- " # Set the API key in the environment\n",
- " os.environ[\"OPENAI_API_KEY\"] = sgai_api_key\n",
- " print(\"OPENAI_API_KEY has been set in the environment.\")\n",
- " else:\n",
- " print(\"No API key entered. Please set the API key to continue.\")"
- ]
+ "metadata": {
+ "id": "pbWasAqodoZG",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "06f75acf-c0be-4fdd-e7b7-9f19ce4a156a"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## Imports",
+ "outputs": [],
"metadata": {
"id": "Dpb7akE3dUjX"
},
- "source": [
- "## Imports"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "from crewai import Agent, Crew, Process, Task\nfrom crewai_tools import ScrapegraphScrapeTool",
+ "outputs": [],
"metadata": {
"id": "2vOrmP3GdWJt"
},
- "outputs": [],
- "source": [
- "from crewai import Agent, Crew, Process, Task\n",
- "from crewai_tools import ScrapegraphScrapeTool"
- ]
- },
- {
"cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "lyC3gQWxeGVh",
- "outputId": "764eed43-2558-4e77-8fe5-e7280edda2d2"
- },
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "website = \"https://www.ebay.com/sch/i.html?_nkw=keyboard&_sacat=0&_from=R40&_trksid=p4432023.m570.l1313\"\ntool = ScrapegraphScrapeTool()\n\nagent = Agent(\n role=\"Web Researcher\",\n goal=\"Research and extract accurate information from websites\",\n backstory=\"You are an expert web researcher with experience in extracting and analyzing information from various websites.\",\n tools=[tool],\n)\n\ntask = Task(\n name=\"scraping task\",\n description=f\"Visit the website {website} and extract detailed information about all the keyboards available.\",\n expected_output=\"A file with the information extracted from the website.\",\n agent=agent,\n)\n\ncrew = Crew(\n agents=[agent],\n tasks=[task],\n)\n\nres = crew.kickoff()",
"outputs": [
{
"name": "stderr",
- "output_type": "stream",
"text": [
"WARNING:opentelemetry.trace:Overriding of current TracerProvider is not allowed\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "website = \"https://www.ebay.com/sch/i.html?_nkw=keyboard&_sacat=0&_from=R40&_trksid=p4432023.m570.l1313\"\n",
- "tool = ScrapegraphScrapeTool()\n",
- "\n",
- "agent = Agent(\n",
- " role=\"Web Researcher\",\n",
- " goal=\"Research and extract accurate information from websites\",\n",
- " backstory=\"You are an expert web researcher with experience in extracting and analyzing information from various websites.\",\n",
- " tools=[tool],\n",
- ")\n",
- "\n",
- "task = Task(\n",
- " name=\"scraping task\",\n",
- " description=f\"Visit the website {website} and extract detailed information about all the keyboards available.\",\n",
- " expected_output=\"A file with the informations extracted from the website.\",\n",
- " agent=agent,\n",
- ")\n",
- "\n",
- "crew = Crew(\n",
- " agents=[agent],\n",
- " tasks=[task],\n",
- ")\n",
- "\n",
- "res = crew.kickoff()"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
"metadata": {
+ "id": "lyC3gQWxeGVh",
"colab": {
- "base_uri": "https://localhost:8080/",
- "height": 137
+ "base_uri": "https://localhost:8080/"
},
- "id": "fiPH4Ta8JnCO",
- "outputId": "4b9d7fb0-511e-4158-de52-e5b549ca8f5e"
+ "outputId": "764eed43-2558-4e77-8fe5-e7280edda2d2"
},
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "res.raw",
"outputs": [
{
"data": {
- "application/vnd.google.colaboratory.intrinsic+json": {
- "type": "string"
- },
"text/plain": [
"'{\\n \"keyboards\": [\\n {\\n \"title\": \"Teclado para juegos SteelSeries Apex Pro TKL Gen 3 con cable certificado restaurado\",\\n \"price\": \"USD 164.99\",\\n \"seller_information\": {\\n \"seller_name\": \"gamesngadgetsplus\",\\n \"ratings\": 3580,\\n \"positive_feedback\": \"99.1%\"\\n },\\n \"item_specifics\": {\\n \"condition\": \"Restored\",\\n \"shipping_cost\": \"USD 63.74\",\\n \"shipping_origin\": \"United States\"\\n },\\n \"link\": \"https://www.ebay.com/itm/146284292952?_skw=keyboard&epid=14072039453&itmmeta=01JHZ2XD60HFYG3H2SGW3ZPD20&hash=item220f392b58:g:5NoAAOSw7ytnZb8d&itmprp=enc%3AAQAJAAAA4HoV3kP08IDx%2BKZ9MfhVJKnJnAISV9LpbsB4qVE8BPDpZj1d0167Ko%2FPBy31lCqIBYnInLHGEdjeLgwAL02glnIeEEqKoS8V3y38FHo2UIjJyTsghnHunyefi%2FEBvMC2cLNkeLXhfAORbJw3MAn4akA698gi%2FDHZRVsyYFpHdTlbNnmDWCY0Q9P9Fwen%2BFtKEK%2BFBGll8l%2F%2BfDJ1U5bOAA8iV65K2VaglTrW4CbrLvIABLQFhJAgq4wik7AiQpnRDahukAMx%2BX3VRfl8iGPeGPZ0WXJ93qe41x8cGWgH0ljL%7Ctkp%3ABFBMytP14o9l\"\\n },\\n {\\n \"title\": \"Teclado electrónico portátil de 61 teclas para todos los niveles (sin soporte)\",\\n \"price\": \"Not specified\",\\n \"seller_information\": {\\n \"seller_name\": \"Not specified\",\\n \"ratings\": \"Not specified\",\\n \"positive_feedback\": \"Not specified\"\\n },\\n \"item_specifics\": {\\n \"condition\": \"Not specified\",\\n \"shipping_cost\": \"Not specified\",\\n \"shipping_origin\": \"Not specified\"\\n },\\n \"link\": \"https://www.ebay.com/itm/116374921423?_skw=keyboard&itmmeta=01JHZ2XD60HFBCXZGCTNGMSD5V&hash=item1b187c60cf:g:PosAAOSwO~pnIUCg&itmprp=enc%3AAQAJAAAA4HoV3kP08IDx%2BKZ9MfhVJKmMPWqDzISoc%2Fy0NEb29gCBnD2SpusvA4MSMxa35GGERgjG9%2F6sPB22PedK5%2F1LDxZYJxeqhSv646gq6mzx0ZBojH%2BEibOOCTplRqmprWLDLC6HUF16BntMg6fgfm0Q6F4OvFJ1CqEgvY6uz2HQ759uu%2F2yNkR1WiVurWxkd1HRZZ1kvI6r3XrsTvwVQnSa%2F6SWbqbe0IYP8%2FN8ohPwBzLez7IxzcWbVbFWk31XtVDH0%2Bio5dT6ZWgofYotfUUQUuYjCwnuLCXR0BDwXX1BX4\"\\n },\\n {\\n \"title\": \"Teclado electrónico portátil de 61 teclas\",\\n \"price\": \"USD 49.99 (or Best 
Offer)\",\\n \"seller_information\": {\\n \"seller_name\": \"elanza771\",\\n \"ratings\": 4076,\\n \"positive_rating_percentage\": 95.4\\n },\\n \"item_specifics\": \"Totally new\"\\n },\\n // More items similar to above...\\n ]\\n}'"
- ]
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
},
- "execution_count": 23,
"metadata": {},
- "output_type": "execute_result"
+ "output_type": "execute_result",
+ "execution_count": 23
}
],
- "source": [
- "res.raw"
- ]
+ "metadata": {
+ "id": "fiPH4Ta8JnCO",
+ "colab": {
+ "height": 137,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "4b9d7fb0-511e-4158-de52-e5b549ca8f5e"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
}
],
"metadata": {
@@ -237,8 +224,8 @@
"provenance": []
},
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "name": "python3",
+ "display_name": "Python 3"
},
"language_info": {
"name": "python"
@@ -246,4 +233,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
-}
+}
\ No newline at end of file
diff --git a/cookbook/research-agent/llama_index_agent.ipynb b/cookbook/research-agent/llama_index_agent.ipynb
index bb22c21..8b5f926 100644
--- a/cookbook/research-agent/llama_index_agent.ipynb
+++ b/cookbook/research-agent/llama_index_agent.ipynb
@@ -1,264 +1,239 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {
- "id": "ReBHQ5_834pZ"
- },
- "source": [
- "
\n",
- "
\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
+ "source": "## 🕷️ Extract Keyboard prices with llama-index and ScrapegraphAI APIs\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/llama-index-agent) [](https://colab.research.google.com/drive/1oqGzVer_XTWOJq88tpcPyQ0B72l38YUL?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Extract Keyboard prices with llama-index and ScrapegraphAI APIs\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/llama-index-agent) [](https://colab.research.google.com/drive/1oqGzVer_XTWOJq88tpcPyQ0B72l38YUL?usp=sharing)",
+ "text/markdown": "## 🕷️ Extract Keyboard prices with llama-index and ScrapegraphAI APIs\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/llama-index-agent) [](https://colab.research.google.com/drive/1oqGzVer_XTWOJq88tpcPyQ0B72l38YUL?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"metadata": {
"id": "jEkuKbcRrPcK"
},
- "source": [
- "## 🕷️ Extract Keyboard prices with llama-index and ScrapegraphAI APIs"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:14:21.547Z",
+ "executionStartTime": "2026-03-26T00:14:21.540Z"
},
{
- "cell_type": "markdown",
+ "source": "

",
+ "outputs": [],
"metadata": {
"id": "bsSKqJT1dl6J"
},
- "source": [
- "

"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
"metadata": {
"id": "IzsyDXEWwPVt"
},
- "source": [
- "### 🔧 Install `dependencies`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "%%capture\n!pip install llama-index\n!pip install llama-index-tools-scrapegraphai",
+ "outputs": [],
"metadata": {
"id": "os_vm0MkIxr9"
},
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install llama-index\n",
- "!pip install llama-index-tools-scrapegraphai"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔑 Import `ScrapeGraph` and `OpenAI` API keys",
+ "outputs": [],
"metadata": {
"id": "apBsL-L2KzM7"
},
- "source": [
- "### 🔑 Import `ScrapeGraph` and `OpenaAI` API keys"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)",
+ "outputs": [],
"metadata": {
"id": "ol9gQbAFkh9b"
},
- "source": [
- "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "sffqFG2EJ8bI",
- "outputId": "ac476453-97dc-4514-a42a-dc7c40fb6011"
- },
+ "source": "import os\nfrom getpass import getpass\n\n# Check if the API key is already set in the environment\nsgai_api_key = os.getenv(\"SGAI_API_KEY\")\n\nif sgai_api_key:\n print(\"SGAI_API_KEY found in environment.\")\nelse:\n print(\"SGAI_API_KEY not found in environment.\")\n # Prompt the user to input the API key securely (hidden input)\n sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n if sgai_api_key:\n # Set the API key in the environment\n os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n print(\"SGAI_API_KEY has been set in the environment.\")\n else:\n print(\"No API key entered. Please set the API key to continue.\")\n",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"SGAI_API_KEY found in environment.\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import os\n",
- "from getpass import getpass\n",
- "\n",
- "# Check if the API key is already set in the environment\n",
- "sgai_api_key = os.getenv(\"SGAI_API_KEY\")\n",
- "\n",
- "if sgai_api_key:\n",
- " print(\"SGAI_API_KEY found in environment.\")\n",
- "else:\n",
- " print(\"SGAI_API_KEY not found in environment.\")\n",
- " # Prompt the user to input the API key securely (hidden input)\n",
- " sgai_api_key = getpass(\"Please enter your SGAI_API_KEY: \").strip()\n",
- " if sgai_api_key:\n",
- " # Set the API key in the environment\n",
- " os.environ[\"SGAI_API_KEY\"] = sgai_api_key\n",
- " print(\"SGAI_API_KEY has been set in the environment.\")\n",
- " else:\n",
- " print(\"No API key entered. Please set the API key to continue.\")\n"
- ]
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "ac476453-97dc-4514-a42a-dc7c40fb6011"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "You can find your OpenAI API key [here](https://platform.openai.com/api-keys)",
+ "outputs": [],
"metadata": {
"id": "FFla30JV8vyc"
},
- "source": [
- "You can find OpenAI key [here](https://auth.openai.com/authorize?audience=https%3A%2F%2Fapi.openai.com%2Fv1&auth0Client=eyJuYW1lIjoiYXV0aDAtc3BhLWpzIiwidmVyc2lvbiI6IjEuMjEuMCJ9&client_id=DRivsnm2Mu42T3KOpqdtwB3NYviHYzwD&device_id=ae0f442e-634e-48c9-97c1-586c458be4a9&issuer=https%3A%2F%2Fauth.openai.com&max_age=0&nonce=VXEuaDZsWUNQNHIyUDJ2N2k1TExpREUuWlg0R29YUmk5elUxYVU3QUpiNQ%3D%3D&redirect_uri=https%3A%2F%2Fplatform.openai.com%2Fauth%2Fcallback&response_mode=query&response_type=code&scope=openid+profile+email+offline_access&state=WGNmdGRub21STEtUcUMzRWRkYkFFbWI1VEJ6VkczYzBMdndBXzlnN05SZg%3D%3D&flow=treatment)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "Yx895fNp8uQV",
- "outputId": "a1f8ddb5-31d5-442c-edb1-5b08fc237d92"
- },
+ "source": "import os\nfrom getpass import getpass\n\n# Check if the API key is already set in the environment\nopenai_api_key = os.getenv(\"OPENAI_API_KEY\")\n\nif openai_api_key:\n print(\"OPENAI_API_KEY found in environment.\")\nelse:\n print(\"OPENAI_API_KEY not found in environment.\")\n # Prompt the user to input the API key securely (hidden input)\n openai_api_key = getpass(\"Please enter your OPENAI_API_KEY: \").strip()\n if openai_api_key:\n # Set the API key in the environment\n os.environ[\"OPENAI_API_KEY\"] = openai_api_key\n print(\"OPENAI_API_KEY has been set in the environment.\")\n else:\n print(\"No API key entered. Please set the API key to continue.\")\n",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"OPENAI_API_KEY found in environment.\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import os\n",
- "from getpass import getpass\n",
- "\n",
- "# Check if the API key is already set in the environment\n",
- "sgai_api_key = os.getenv(\"OPENAI_API_KEY\")\n",
- "\n",
- "if sgai_api_key:\n",
- " print(\"OPENAI_API_KEY found in environment.\")\n",
- "else:\n",
- " print(\"OPENAI_API_KEY not found in environment.\")\n",
- " # Prompt the user to input the API key securely (hidden input)\n",
- " sgai_api_key = getpass(\"Please enter your OPENAI_API_KEY: \").strip()\n",
- " if sgai_api_key:\n",
- " # Set the API key in the environment\n",
- " os.environ[\"OPENAI_API_KEY\"] = sgai_api_key\n",
- " print(\"OPENAI_API_KEY has been set in the environment.\")\n",
- " else:\n",
- " print(\"No API key entered. Please set the API key to continue.\")\n"
- ]
+ "metadata": {
+ "id": "Yx895fNp8uQV",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "a1f8ddb5-31d5-442c-edb1-5b08fc237d92"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "# Extract structured information from a website",
+ "outputs": [],
"metadata": {
"id": "hWc1Qdc18eHE"
},
- "source": [
- "# Etract structured informations from a website"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "",
+ "outputs": [],
"metadata": {
"id": "j3F6AZs7fK2q"
},
- "source": [
- ""
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
+ "source": "def scrapegraph_tool_invocation(prompt, url):\n    \"\"\"\n    Invokes the Scrapegraph smart scraper tool to extract information from a webpage based on a prompt.\n\n    Args:\n        prompt (str): The prompt describing what information to extract from the webpage.\n        url (str): The URL of the webpage to scrape.\n\n    Returns:\n        The response from the Scrapegraph tool containing the extracted information.\n\n    Note:\n        Requires the SGAI_API_KEY environment variable to be set.\n    \"\"\"\n    import os\n    from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n\n    scrapegraph_tool = ScrapegraphToolSpec()\n    response = scrapegraph_tool.scrapegraph_smartscraper(\n        prompt=prompt,\n        url=url,\n        api_key=os.getenv(\"SGAI_API_KEY\"),\n    )\n\n    return response",
+ "outputs": [],
"metadata": {
"id": "Y5zuhhCbYwaH"
},
- "outputs": [],
- "source": [
- "def scrapegraph_tool_invocation(prompt, url):\n",
- " \"\"\"\n",
- " Invokes the Scrapegraph smart scraper tool to extract information from a webpage based on a prompt.\n",
- "\n",
- " Args:\n",
- " prompt (str): The prompt describing what information to extract from the webpage.\n",
- " url (str): The URL of the webpage to scrape.\n",
- "\n",
- " Returns:\n",
- " The response from the Scrapegraph tool containing the extracted information.\n",
- "\n",
- " Note:\n",
- " Requires the SGAI_API_KEY environment variable to be set.\n",
- " \"\"\"\n",
- " import os\n",
- " from llama_index.tools.scrapegraph.base import ScrapegraphToolSpec\n",
- "\n",
- " scrapegraph_tool = ScrapegraphToolSpec()\n",
- " response = scrapegraph_tool.scrapegraph_smartscraper(\n",
- "\n",
- " prompt=prompt,\n",
- " url=url,\n",
- " api_key=os.getenv(\"SGAI_API_KEY\"),\n",
- " )\n",
- "\n",
- " return response"
- ]
- },
- {
"cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "from llama_index.core.tools import FunctionTool\nfrom llama_index.llms.openai import OpenAI\nfrom llama_index.core.agent import ReActAgent\n\n\nscrape_tool = FunctionTool.from_defaults(fn=scrapegraph_tool_invocation)\n\n# Initialize the LLM\nllm = OpenAI(model=\"gpt-4o\")\n\n# Initialize the ReAct agent\nagent = ReActAgent.from_tools([scrape_tool], llm=llm, verbose=True)",
+ "outputs": [],
"metadata": {
"id": "5trXVkGUYN2t"
},
- "outputs": [],
- "source": [
- "from llama_index.core.tools import FunctionTool\n",
- "from llama_index.llms.openai import OpenAI\n",
- "from llama_index.core.agent import ReActAgent\n",
- "\n",
- "\n",
- "scrape_tool = FunctionTool.from_defaults(fn=scrapegraph_tool_invocation)\n",
- "\n",
- "# initialize llm\n",
- "llm = OpenAI(model=\"gpt-4o\")\n",
- "\n",
- "# initialize ReAct agent\n",
- "agent = ReActAgent.from_tools([scrape_tool ], llm=llm, verbose=True)"
- ]
- },
- {
"cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "link = \"https://www.ebay.com/sch/i.html?_from=R40&_trksid=p4432023.m570.l1313&_nkw=keyboards&_sacat=0\"",
+ "outputs": [],
"metadata": {
"id": "mW9NSsXFdMe8"
},
- "outputs": [],
- "source": [
- "link = \"https://www.ebay.com/sch/i.html?_from=R40&_trksid=p4432023.m570.l1313&_nkw=keyboards&_sacat=0\""
- ]
- },
- {
"cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "5qKjYpdZYP9G",
- "outputId": "bd9fd0c9-7382-43d5-a1a3-7ff497669d4f"
- },
+ "executionEndTime": null,
+ "executionStartTime": null
+ },
+ {
+ "source": "res = agent.chat(f\"Extract me all the keyboard names and prices from the following website: {link}\")",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"> Running step 4abd0d38-1ec8-49dd-b5e3-7eb63e4de6eb. Step input: Extract me all the keyboard names and prices from the following website: https://www.ebay.com/sch/i.html?_from=R40&_trksid=p4432023.m570.l1313&_nkw=keyboards&_sacat=0\n",
"\u001b[1;3;38;5;200mThought: The current language of the user is English. I need to use a tool to help me extract all the keyboard names and prices from the provided eBay URL.\n",
@@ -340,36 +315,44 @@
"\n",
"If you need further details or have any other requests, feel free to ask!\n",
"\u001b[0m"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "res = agent.chat(f\"Extract me all the keyboard names and prices from the following website: {link}\")"
- ]
+ "metadata": {
+ "id": "5qKjYpdZYP9G",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "bd9fd0c9-7382-43d5-a1a3-7ff497669d4f"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### Print the result",
+ "outputs": [],
"metadata": {
"id": "Eej-kiuZfi1R"
},
- "source": [
- "### Print the result"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "QJSvlPpJflJN",
- "outputId": "ca3f12fb-c44d-43cf-9086-47006e06c159"
- },
+ "source": "print(res)",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Here are all the keyboard names and their prices extracted from the eBay website:\n",
"\n",
@@ -443,44 +426,52 @@
"68. Russian/English Language 78 Keys Slim Lightweight Portable Wired USB Keyboard - $25.99\n",
"\n",
"If you need further details or have any other requests, feel free to ask!\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "print(res)"
- ]
+ "metadata": {
+ "id": "QJSvlPpJflJN",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "ca3f12fb-c44d-43cf-9086-47006e06c159"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## 🔗 Resources",
+ "outputs": [],
"metadata": {
"id": "-1SZT8VzTZNd"
},
- "source": [
- "## 🔗 Resources"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "\n\n\n\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
"metadata": {
"id": "dUi2LtMLRDDR"
},
- "source": [
- "\n",
- "
\n",
- "
\n",
- "
\n",
- "\n",
- "\n",
- "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
- "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
- "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
- "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
- "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
- "- 🦙 **LlamaIndex:** [ScrapeGraph docs](https://docs.llamaindex.ai/en/stable/api_reference/tools/scrapegraph/)\n",
- "\n",
- "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
}
],
"metadata": {
@@ -488,22 +479,22 @@
"provenance": []
},
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "name": "python3",
+ "display_name": "Python 3"
},
"language_info": {
+ "name": "python",
+ "version": "3.10.14",
+ "mimetype": "text/x-python",
+ "file_extension": ".py",
+ "pygments_lexer": "ipython3",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
- "file_extension": ".py",
- "mimetype": "text/x-python",
- "name": "python",
- "nbconvert_exporter": "python",
- "pygments_lexer": "ipython3",
- "version": "3.10.14"
+ "nbconvert_exporter": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
-}
+}
\ No newline at end of file
diff --git a/cookbook/research-agent/scrapegraph_langgraph_tavily.ipynb b/cookbook/research-agent/scrapegraph_langgraph_tavily.ipynb
index cc37897..608e661 100644
--- a/cookbook/research-agent/scrapegraph_langgraph_tavily.ipynb
+++ b/cookbook/research-agent/scrapegraph_langgraph_tavily.ipynb
@@ -1,87 +1,103 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {
- "id": "ReBHQ5_834pZ"
- },
- "source": [
- "
\n",
- "
\n",
- ""
- ]
- },
- {
- "cell_type": "markdown",
+ "source": "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, and `tavily`\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langgraph-tavily) [](https://colab.research.google.com/drive/1lk4cjuRLDN0w71kXm9uAPQM6jH03yGW9?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, and `tavily`\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langgraph-tavily) [](https://colab.research.google.com/drive/1lk4cjuRLDN0w71kXm9uAPQM6jH03yGW9?usp=sharing)",
+ "text/markdown": "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, and `tavily`\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langgraph-tavily) [](https://colab.research.google.com/drive/1lk4cjuRLDN0w71kXm9uAPQM6jH03yGW9?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"metadata": {
"id": "jEkuKbcRrPcK"
},
- "source": [
- "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, and `tavily`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:07:22.280Z",
+ "executionStartTime": "2026-03-26T00:07:22.280Z"
},
{
- "cell_type": "markdown",
- "source": [
- ""
- ],
+ "source": "",
+ "outputs": [],
"metadata": {
"id": "cJrlyZbLwQek"
- }
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
"metadata": {
"id": "IzsyDXEWwPVt"
},
- "source": [
- "### 🔧 Install `dependencies`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 1,
+ "source": "%%capture\n!pip install langgraph langchain-scrapegraph langchain-openai \"langchain-community>=0.2.11\" tavily-python",
+ "outputs": [],
"metadata": {
"id": "os_vm0MkIxr9"
},
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install langgraph langchain-scrapegraph langchain-openai \"langchain-community>=0.2.11\" tavily-python"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 1,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔑 Import `ScrapeGraph`, `Tavily` and `OpenAI` API keys",
+ "outputs": [],
"metadata": {
"id": "apBsL-L2KzM7"
},
- "source": [
- "### 🔑 Import `ScrapeGraph`, `Tavily` and `OpenAI` API keys"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)",
+ "outputs": [],
"metadata": {
"id": "ol9gQbAFkh9b"
},
- "source": [
- "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {
- "id": "sffqFG2EJ8bI",
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "outputId": "7e47aa4d-edca-48f2-df6e-bd7fafabfc9c"
- },
+ "source": "import getpass\nimport os\n\nif not os.environ.get(\"SGAI_API_KEY\"):\n os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")\n\nif not os.environ.get(\"TAVILY_API_KEY\"):\n os.environ[\"TAVILY_API_KEY\"] = getpass.getpass(\"Tavily API key:\\n\")\n\nif not os.environ.get(\"OPENAI_API_KEY\"):\n os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key:\\n\")",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Scrapegraph API key:\n",
"··········\n",
@@ -89,322 +105,227 @@
"··········\n",
"OpenAI API key:\n",
"··········\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import getpass\n",
- "import os\n",
- "\n",
- "if not os.environ.get(\"SGAI_API_KEY\"):\n",
- " os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")\n",
- "\n",
- "if not os.environ.get(\"TAVILY_API_KEY\"):\n",
- " os.environ[\"TAVILY_API_KEY\"] = getpass.getpass(\"Tavily API key:\\n\")\n",
- "\n",
- "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
- " os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key:\\n\")"
- ]
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "7e47aa4d-edca-48f2-df6e-bd7fafabfc9c"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 2,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
"metadata": {
"id": "jnqMB2-xVYQ7"
},
- "source": [
- "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n\n Pydantic Schema Quick Guide\n\nTypes of Schemas \n\n1. Simple Schema \nUse this when you want to extract straightforward information, such as a single piece of content. \n\n```python\nfrom pydantic import BaseModel, Field\n\n# Simple schema for a single webpage\nclass PageInfoSchema(BaseModel):\n    title: str = Field(description=\"The title of the webpage\")\n    description: str = Field(description=\"The description of the webpage\")\n\n# Example Output JSON after AI extraction\n{\n    \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n    \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n}\n```\n\n2. Complex Schema (Nested) \nIf you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n\n```python\nfrom pydantic import BaseModel, Field\nfrom typing import List\n\n# Define a schema for a single repository\nclass RepositorySchema(BaseModel):\n    name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n    description: str = Field(description=\"Description of the repository\")\n    stars: int = Field(description=\"Star count of the repository\")\n    forks: int = Field(description=\"Fork count of the repository\")\n    today_stars: int = Field(description=\"Stars gained today\")\n    language: str = Field(description=\"Programming language used\")\n\n# Define a schema for a list of repositories\nclass ListRepositoriesSchema(BaseModel):\n    repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n\n# Example Output JSON after AI extraction\n{\n    \"repositories\": [\n        {\n            \"name\": \"google-gemini/cookbook\",\n            \"description\": \"Examples and guides for using the Gemini API\",\n            \"stars\": 8036,\n            \"forks\": 1001,\n            \"today_stars\": 649,\n            \"language\": \"Jupyter Notebook\"\n        },\n        {\n            \"name\": \"TEN-framework/TEN-Agent\",\n            \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n            \"stars\": 3224,\n            \"forks\": 311,\n            \"today_stars\": 361,\n            \"language\": \"Python\"\n        }\n    ]\n}\n```\n\nKey Takeaways \n- **Simple Schema**: Perfect for small, straightforward extractions. \n- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n\nBoth approaches give the AI a clear structure to follow, which helps the extracted content match what you need.\n \n",
+ "outputs": [],
"metadata": {
"id": "VZvxbjfXvbgd"
},
- "source": [
- "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
- "\n",
- "
\n",
- " Pydantic Schema Quick Guide
\n",
- "\n",
- "Types of Schemas \n",
- "\n",
- "1. Simple Schema \n",
- "Use this when you want to extract straightforward information, such as a single piece of content. \n",
- "\n",
- "```python\n",
- "from pydantic import BaseModel, Field\n",
- "\n",
- "# Simple schema for a single webpage\n",
- "class PageInfoSchema(BaseModel):\n",
- " title: str = Field(description=\"The title of the webpage\")\n",
- " description: str = Field(description=\"The description of the webpage\")\n",
- "\n",
- "# Example Output JSON after AI extraction\n",
- "{\n",
- " \"title\": \"ScrapeGraphAI: The Best Content Extraction Tool\",\n",
- " \"description\": \"ScrapeGraphAI provides powerful tools for structured content extraction from websites.\"\n",
- "}\n",
- "```\n",
- "\n",
- "2. Complex Schema (Nested) \n",
- "If you need to extract structured information with multiple related items (like a list of repositories), you can **nest schemas**.\n",
- "\n",
- "```python\n",
- "from pydantic import BaseModel, Field\n",
- "from typing import List\n",
- "\n",
- "# Define a schema for a single repository\n",
- "class RepositorySchema(BaseModel):\n",
- " name: str = Field(description=\"Name of the repository (e.g., 'owner/repo')\")\n",
- " description: str = Field(description=\"Description of the repository\")\n",
- " stars: int = Field(description=\"Star count of the repository\")\n",
- " forks: int = Field(description=\"Fork count of the repository\")\n",
- " today_stars: int = Field(description=\"Stars gained today\")\n",
- " language: str = Field(description=\"Programming language used\")\n",
- "\n",
- "# Define a schema for a list of repositories\n",
- "class ListRepositoriesSchema(BaseModel):\n",
- " repositories: List[RepositorySchema] = Field(description=\"List of GitHub trending repositories\")\n",
- "\n",
- "# Example Output JSON after AI extraction\n",
- "{\n",
- " \"repositories\": [\n",
- " {\n",
- " \"name\": \"google-gemini/cookbook\",\n",
- " \"description\": \"Examples and guides for using the Gemini API\",\n",
- " \"stars\": 8036,\n",
- " \"forks\": 1001,\n",
- " \"today_stars\": 649,\n",
- " \"language\": \"Jupyter Notebook\"\n",
- " },\n",
- " {\n",
- " \"name\": \"TEN-framework/TEN-Agent\",\n",
- " \"description\": \"TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Multimodal Live API, OpenAI Realtime API, RTC, and more.\",\n",
- " \"stars\": 3224,\n",
- " \"forks\": 311,\n",
- " \"today_stars\": 361,\n",
- " \"language\": \"Python\"\n",
- " }\n",
- " ]\n",
- "}\n",
- "```\n",
- "\n",
- "Key Takeaways \n",
- "- **Simple Schema**: Perfect for small, straightforward extractions. \n",
- "- **Complex Schema**: Use nesting to extract lists or structured data, like \"a list of repositories.\" \n",
- "\n",
- "Both approaches give the AI a clear structure to follow, ensuring that the extracted content matches exactly what you need.\n",
- " \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 17,
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for a single news item\nclass NewsItemSchema(BaseModel):\n title: str = Field(description=\"Title of the news article\")\n link: str = Field(description=\"URL to the news article\")\n description: str = Field(description=\"Summary/description of the news article\")\n\n# Schema that contains a list of news items\nclass ListNewsSchema(BaseModel):\n news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")",
+ "outputs": [],
"metadata": {
"id": "dlrOEgZk_8V4"
},
- "outputs": [],
- "source": [
- "from pydantic import BaseModel, Field\n",
- "from typing import List\n",
- "\n",
- "# Schema for a single news item\n",
- "class NewsItemSchema(BaseModel):\n",
- " title: str = Field(description=\"Title of the news article\")\n",
- " link: str = Field(description=\"URL to the news article\")\n",
- " description: str = Field(description=\"Summary/description of the news article\")\n",
- "\n",
- "# Schema that contains a list of news items\n",
- "class ListNewsSchema(BaseModel):\n",
- " news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 17,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🚀 Initialize the `scrapegraph` and `tavily` tools and the `langgraph` prebuilt agent, then run the `extraction`",
+ "outputs": [],
"metadata": {
"id": "cDGH0b2DkY63"
},
- "source": [
- "### 🚀 Initialize `scrapegraph` and `tavily` tools and `langgraph` prebuilt agent and run the `extraction`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Here we use `SmartScraperTool` to extract structured data from a webpage using AI.\n\n\n> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n\nYou can find more info in the [official LangChain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n\n",
+ "outputs": [],
"metadata": {
"id": "M1KSXffZopUD"
},
- "source": [
- "Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n",
- "\n",
- "\n",
- "> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n",
- "\n",
- "You can find more info in the [official langchain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n",
- "\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 18,
+ "source": "from langchain_scrapegraph.tools import SmartScraperTool\nfrom langchain_community.tools import TavilySearchResults\n\n# Reads SGAI_API_KEY from the environment automatically\n# Initialization without output schema\n# smartscraper_tool = SmartScraperTool()\n\n# Since we have defined an output schema, let's use it\n# This makes the tool always return the same output structure\nsmartscraper_tool = SmartScraperTool(llm_output_schema=ListNewsSchema)\n\n# Initialize the tavily tool to look for URLs\ntavily_tool = TavilySearchResults(\n    max_results=1,\n    name=\"urls_finder\",\n    description=\"Use this tool to find webpage URLs that satisfy the user request\",\n)\n\n",
+ "outputs": [],
"metadata": {
"id": "ySoE0Rowjgp1"
},
- "outputs": [],
- "source": [
- "from langchain_scrapegraph.tools import SmartScraperTool\n",
- "from langchain_community.tools import TavilySearchResults\n",
- "\n",
- "# Will automatically get SGAI_API_KEY from environment\n",
- "# Initialization without output schema\n",
- "# smartscraper_tool = SmartScraperTool()\n",
- "\n",
- "# Since we have defined an output schema, let's use it\n",
- "# This will force the tool to have always the same output structure\n",
- "smartscraper_tool = SmartScraperTool(llm_output_schema=ListNewsSchema)\n",
- "\n",
- "# Initialize tavily tool to look for URLs\n",
- "tavily_tool = TavilySearchResults(\n",
- " max_results=1,\n",
- " name=\"urls_finder\",\n",
- " description=\"Use this tool to find webpages urls that satisfy the user request\",\n",
- ")\n",
- "\n"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 18,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
- "source": [
- "We then initialize the `llm model` we want to use in the agent\n",
- "\n"
- ],
+ "source": "We then initialize the `llm` model that the agent will use\n\n",
+ "outputs": [],
"metadata": {
"id": "W54HVoYeiJbG"
- }
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "source": [
- "# First we initialize the llm model we want to use.\n",
- "from langchain_openai import ChatOpenAI\n",
- "\n",
- "llm_model = ChatOpenAI(model=\"gpt-4o\", temperature=0)"
- ],
+ "source": "# First we initialize the llm model we want to use.\nfrom langchain_openai import ChatOpenAI\n\nllm_model = ChatOpenAI(model=\"gpt-4o\", temperature=0)",
+ "outputs": [],
"metadata": {
"id": "ctrkEnltiBCD"
},
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": 5,
- "outputs": []
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
- "source": [
- "Here we use `create_react_agent` to quickly use one of the prebuilt agents from `langgraph.prebuilt` module\n",
- "\n",
- "You can find more info in the [official langgraph documentation](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)\n",
- "\n"
- ],
+ "source": "Here we use `create_react_agent` to quickly build one of the prebuilt agents from the `langgraph.prebuilt` module\n\nYou can find more info in the [official LangGraph documentation](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)\n\n",
+ "outputs": [],
"metadata": {
"id": "M0WY2Pa8Y8Pk"
- }
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "source": [
- "from langgraph.prebuilt import create_react_agent\n",
- "from langgraph.checkpoint.memory import MemorySaver\n",
- "\n",
- "# List of tools we want the agent to use\n",
- "tools = [smartscraper_tool, tavily_tool]\n",
- "\n",
- "# We set up the agent's memory to review the different reasoning steps\n",
- "memory = MemorySaver()\n",
- "\n",
- "# Add a configuration to specify where to store the graph states\n",
- "config = {\"configurable\": {\"thread_id\": \"1\"}}\n",
- "\n",
- "# Initialize the ReAct agent\n",
- "graph = create_react_agent(\n",
- " model=llm_model,\n",
- " tools=tools,\n",
- " checkpointer=memory,\n",
- ")"
- ],
+ "source": "from langgraph.prebuilt import create_react_agent\nfrom langgraph.checkpoint.memory import MemorySaver\n\n# List of tools we want the agent to use\ntools = [smartscraper_tool, tavily_tool]\n\n# We set up the agent's memory to review the different reasoning steps\nmemory = MemorySaver()\n\n# Add a configuration to specify where to store the graph states\nconfig = {\"configurable\": {\"thread_id\": \"1\"}}\n\n# Initialize the ReAct agent\ngraph = create_react_agent(\n model=llm_model,\n tools=tools,\n checkpointer=memory,\n)",
+ "outputs": [],
"metadata": {
"id": "Zo1BcIlHhcQP"
},
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": 19,
- "outputs": []
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
- "source": [
- "Let's visualize the `graph`"
- ],
+ "source": "Let's visualize the `graph`",
+ "outputs": [],
"metadata": {
"id": "_UYcJ2Mxip5w"
- }
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 8,
- "metadata": {
- "id": "2FIKomclLNFx",
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 350
- },
- "outputId": "3419b941-b409-499e-c1e3-54f2526d467f"
- },
+ "source": "from IPython.display import Image, display\n\ndisplay(Image(graph.get_graph().draw_mermaid_png()))",
"outputs": [
{
- "output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAHwAAAFNCAIAAABNLZxVAAAAAXNSR0IArs4c6QAAIABJREFUeJztnXlAU1cW/+/LvrOFNWERVMCdEarFrS4VUeo2VXFrtdqpW6f92dra0aozU63j1Lr+uqFVK1ZUVByZKrVgVRQ3VEDZBNkiBMhGNrLn90f8UUcj8l7uy0sgn79Cknfu4cvlvPvuPfdcxGq1Ag/OhUS0Az0Rj+gE4BGdADyiE4BHdALwiE4AFOgWLRbQXKfTKE1apdlsshh07jEkpTFIDDaJzaNwvSk+QTRc20JgjdMtJlB6s63mvqa+QiuIYtJZZDaP7M2n6XVmKPbxxmoFSqlRqzTTWaTmen3kQHbkAE5IFAOPtuCIfutXWfktlbAPM3IgJzyWBcMxImmTGGvua6Rig1JqHPEGPyCMDte+o6LX3tfkHBYPHuM9fLIfPK9cBdHD9mtnJcERzFEz+RDNOiT6rV9l8hbj2FkBVDoC0SdXo65Ue/FEy9xPwuhMOOMO7KIX5sqNeku37ODPo5Kbjm6rX/z3XlQahO6FUfTcjBYWh/xqSo9QvIP9n9ekrglj88gO2sHy/3LvkoJGJ/U0xQEA89eGH/1XneN2UIv+uKpd0WwYNQPmjcVdYLBJU5aG5B5tcdAOatEvn2odMNLbwVbdl+BejHaNueaBxhEj6ESvvKPyDabxQ/B9YHNxElP8rp2VOmIBnegP76pHvuHvSHtdp6mpqbGxkajLO8E3iBbZn111T43ZAgrRW0V6tcLE9nb03t0VRCLR1KlTS0tLCbn8pQSEMyrvqjBfjkL0RyWaXgPYmFtChclkwjaWtV2F+fIu0msAu+Y+9rCOYpz+331NiSl+0GfgdDrd1q1bL1++DACIi4v7+OOPrVbr1KlTO76QkpKyadMmg8GQlpaWk5PT3NzM5/OnTJny3nvvkclkAMDs2bOjoqKioqIyMjJ0Ot2BAwfmzp37zOVwfQYA5GW09I7jhkUzMVyLYmq3vlKb5BeEoY3OOXDgQHZ29rJly/h8fnZ2NpPJZLFYX3zxxfr165ctWxYfH+/r6wsAIJPJN27cGD16tFAorKio+PHHH3k83oIFC2xGCgoKdDrdjh07tFpteHj485dDh0JD5M16fEU36CwkEkKhwp9jaWxsZDKZixYtolAo06dPt70ZExMDAIiIiBgyZIjtHTKZfOjQIQR54oBIJMrLy+sQnUKhbNmyhclkvuhy6LC9KJo2jLPWXY3pWpXZ8cdfuyQnJ+t0uvfff7+qqqrzb8pksq1bt06fPn3cuHHV1dVS6R/jtgEDBnQo7hzYPIpWacJ2bVdFt1oAnYmL6ImJibt27ZJKpampqV988YXJZP83kUql8+fPv3nz5vLly/fs2RMbG2s2/9HRnKw4AIBCJSEkjP/3XQ0vLC5Z0WrA1sZLSUxMHD58+NGjR3fs2BEcHLxkyZLnv3Py5EmZTHbw4MGgoCAAQFBQUF0dhGkQzKjkRjoL40xvVy+js0hGvcWCw9KbwWAAAJBIpPnz5/v7+5eXlwMAGAwGAKC1tbXjawqFwsfHx6a47cdOxl3PXw4djdLE5mFcYUZxWUQ/tkZp5vpADjIZGRmXLl2aPHlya2tra2trv379AACBgYECgSA9PZ3JZLa1taWmpsbHxx8/fvzbb78dPHhwXl7e1atXLRaLQqHw9rYzEfT85XQ65CU3AIAXn4rtQhT/IFxfanUx9sewFyEUCg0Gw44dO7KyslJTUxcuXAgAQBBky5YtbDb7q6++Onv2rEwmGzdu3NKlS0+cOLFu3Tqj0Xjw4MGIiIhjx47Ztfn85dDdLslvC4/BuBqM4uHocVX7zfOyGasE2FrqTjRUthfmyqYvxygFivAi6M1ESMBksFJevGSVkpKiVtuZCRo0aFBxcfHz73t5eZ05c6brPmAjPz9//fr1z79vtVqtViuJZOff/cKFC1TqC6OHuFbXN46H2R90y3X3Li
lUMlMnKxhisdhisXTdIIlE6rg34odOp7MbYSwWi8VioVDs9Lzg4OCOB7FnrWks6Vtql26OxOwP6jXSHzfUpH4cxsLnQcktyMtoCYpg9BuOvaejHmmOmuFfdFmBuT13Ryk16bRmRxTHInqfOI7RYCm+0uZIq+7L0X/XT5gX6KARLM9Uo2f6VxWpHVk6cVOOf90w9S8hNIajKUfYk41yfmqOHMDu8yeOgx64C8d3NCQtDPbiQ8hzxv5HS3or8FGJ+tavcsedcHGUUuP3a6tHTfOHojiEBNI7eYqSfEXiG/w+cd2wy7erzdfOSg3t5gnzA6l0aBsoIKRKq2Sma2clep0loh+7V3821xf+RgPnU1+uFdfpiq+0jXjDL3aYQ2OV54G2KUAiMtg2BdAYpOBIJoNFYvMoXB+qyYTiWYlALCarSmHSKs0IAoqvKIR9WH3iONDltgFN9A6kTYaWer26zahVmhES0CghTwc/ePAgLCyMy+XCNctgkehMMptH5vFp4TEsEp4Pf/BFx5ulS5euXLkyLi6OaEew49ldRwAe0QnA/UQXCAS2HCP3xf1Ef/z48dN5AO6I+4nOYrHsLju4Ee7nvVarRbVO4oK4n+g+Pj6enu5s5HK5p6c7G6FQ6Bm9OBuRSOQZvXhAjfuJzuFwPDdSZ6NWqz03UmfD4/E8Pd3ZKJVKT0/3gBr3Ez0oKMgzTnc2YrHYM073gBr3E90zDUAAnmkAD1hwP9FDQ0M94cXZNDQ0eMKLB9S4n+ieFAwC8KRgeMCC+4nuyXshAE/eCwF4ZhkJwDPL6AEL7ie6t7f3i0oluAvuJ3rnhaTcAvcT3TOfTgCe+XQC8PR0AvD0dALw8/Nz957uNpt3k5KSaDQaiUSSyWRsNptKpZJIJCqVmpmZSbRrqHGb4gksFquhocH2ur293fZi2bJlhDqFEbcJL8nJyc88EwmFwjlz5hDnEXbcRvRZs2YJBP9TfHLy5MnQyzI4B7cR3cfHZ9KkSR0/hoaGPn0Gg3vhNqIDAObNmxcaGmp77b7d3M1E5/F4SUlJCIKEh4e7bzfHa/SikptkYoPRAH99J3HwzBu96hITE8XVAADINQpJCMLxpvgG0ig4H68KeZyulBovnZRIGvXhsWyNys2eG+l0sqxZZ7WAPnGc+Nd98GsIpuhqhSnrm8Zx80K4Pm4z/LfLrRwJi0MaPhmXs3qgxnQrOPj32mkrw9xdcQBAQhJfq7bgV3ISmugFv8hGTAuAZY1wEpL4NQ80Oi0uaQfQRG+s1nJ9u9uRmfJmXA68gSa6xYLwfDEezOGa+AUzlHIjHpahia5RGC0W95iw7CIGnRngk9TkTg9H3QaP6ATgEZ0APKITgEd0AvCITgAe0QnAIzoBeEQnAI/oBOARnQC6v+hqtbryYTnRXvwP3V/0pX9JPXcO9yNPUeEGootE9Y5cbjs43KUgbGmtpaV5/4Fvbty4qtGoQ0PD581dPGH8k1wiqVSyZ++/CwtvUKjUoUOHXb6c+/236b16RQEAzvwn8/iJdImkJSgoZPy4SXNmL6TT6Q+rKt7/6ztbt+z+Yd+e6urKwMDg997964gRYwAAqfNS5HJZ1pkTWWdOBAYGZfycTdTv+zSEiW4ym8rLH0yb+qYXz/tyft7mLesFgtDYmP5ms/lv6z6UyaUffLBWJpOk7dsbNyTepvjBQz+cyEyfOSM1PDyyoaH22PGfRI/r/7b2HwAAvV7/93+ufX/VmuCgkAMHv/tiy7qMn7O9vLw3bdz2yaerhgweOuvN+VSaqyxsESZ6SLDg4I8nbDmhycnTZvx5wtWrv8fG9C8ru1/5sHzjhq2vjZkAAKivrz13/j8Gg0GpbDvy84/r120eM3q8zYKfn/+OnV+uWvmx7cf3V60ZN3YiAGDp0lXvLVtQVHxn9KhxMdH9KBSKnx9/4MAhRP2mz0Pkyn1VdeXBQ99XVJQCAMxms0wmBQC0tDYDAEJChLbvCIVhFoulvV1bWHjDZDJt3r
J+85YnR3TbkkckrS22H5kMpu1FYGAwAEAiaSXo13o5hIl+5+6tT9e+Hzck/pM1G9ks9oZNayxWCwBAIAgFAJSU3OvbJwYAUFZ2n8/39/LylsokAIAtm3cG+P/PwbchIcKa2uqn36FSqAAAi8V1U50IE/3w4X0hIcItm3faDjPv6KfRfWMT4of/kLa7ublJ0Sa/eu3S+nWbAQBc7pOz+8LCItC25Wq7TQgbMrYpFb2j+toUNxgM2vY/alu8v2qNUBjWIKrz9vLZu+eALbjHxSUgCHI661iHhY79GJ3DZDClUgluvwcWCOvpQ4bE5+Sc/eXcGR7X68TJIyqVsram2mq1ms3mFavenvXmAoEgFEEQlUqpVqs5HI5QEDpzRurJU0f/tv7/jBzxmlQqyTpz/Mstu2xRqBMGDozLzTv/89GDXC5v2CsjAgIcPZbbcQgT/Z1Fy2VSyZ69/+ZyeSlTZs5+c8HXO7fcvXf7T3EJ8UOHH07fZzKZbN/kcri7d+2PiIhcuWJ1QEDg6dPHbt0q8PPjjxo51p//8pyy9/7yV5lMcjh9n7eXT3R0P1cQHVoC6cFNtZPeEbK9IPwVzWazbdOi1WptbHq89N3U2bMWLF7k7D1d+aebIwewouPhbz1wuWRPvV6/YtXbAQFBgwf9iUqllZTc1el0UVF9ifYLJi4nOoIgE1+fkpeXc+DgdzQarVev3hs3bB09ahzRfsHE5USn0WhzZi+cM3sh0Y7giBvMMnY/PKITgEd0AvCITgAe0QnAIzoBeEQnAI/oBOARnQA8ohMANNF9Q2gutj7jKHQWmUrHpVNCM0qlkqSNOljWXAFRpcY3CJesDWiiRw7iSJv0sKwRjqbN5MWnevvjsh0ZmujRQzlGvbnokgyWQSKxgtyjTa+96Y+Tecj1Xi783EKhknyD6P5CBkSzzgEhISqZUSkzXs9uefvzCK4vXvPe8IthPryrrrmvMRmtksd2oo1er0MQEs2BDDetVkOnM1AVIZXL5TQajc1md/41JpdMoSEhkcxhk/Cq9PIEqxM5e/ZsWlqag0aWLFly584dVJcsWLAgPj7+jTfeyMzMdLB1KLhN2dcOCgoKYmJifHxQlHv68MMPr1y5giAIlUqNiopasWJFYmIinj6+BCc9HInF4m+++QaKqVdffRWV4rYijrYXRqOxtLR03bp1H3zwARRnsOEM0dVqdVpa2ooVK6BYO3LkSF1dHapLwsPDO86jIpFIKpXq6tWrY8aMgeIPBpwhOofD+fzzz2FZu3jxokyGbmDK5/OfuYsymcxLly7BcgktuIv+2Wef1dbWQjS4cePGmJiXpNI9Q0BAAJ1Ot722Wq0BAQFXrlyB6BJa8BV9z54906ZNi4hAnWfbCaGhoUwmE9UlAQEBtktYLFZ6evqgQYMg+oMB9xu9HDx4cNSoUVFRUaiumjlzptlsPnPmDACgrKyMRqOhtQATnIai1dXVBw4cwMPyBx98cOXKFTwsOw1cRNfr9SkpKXhYtlqtVVVVcrncQSNlZWWrV6+G5BFq3C+8wGL//v1RUVGvvfaa85uGL/qDBw+sVuuAAQPgmu0gIyNDKBSOHDkSJ/tOAPLopaamZuPGjfgpDgBobW2tqqqCYqq8vDw/Px+KKXTAjVa3bt1Sq9VwbT6DSCSqq6uDZW3mzJk1NTWwrHURmOFFp9MBABgMd5pJl0gkIpFoyBCnbu2FFl7UavWkSZOcoPi9e/e+/vprWNb4fL6TFYcp+rlz53bv3g3LWicwGIw7d+5ANFhQULBp0yaIBl+Ok8OZ4xiNxqqqKrg233777YaGBrg2OwFOTM/MzExISAgPD4fRDbo/EMJLUVHRL7/84kzFP/vsM5FIBNdmWVkZXIOdAEF0MpkM8c7WFWzP8XBtHjly5Pz583Btvgi3nAZobm5GECQgAOYRHKWlpQUFBUuWLIFo80U4KvqXX345ZMiQ5ORkeC51fxwKLwaD4d69e85XXCaTffXVV9DNXr9+XS
wWQzf7PA6JTqPRjh071oUvQsbX1/fixYvQBaqurj59+jRcm3ZxSPSKioouFl2Bzs6dT6rzQGTkyJG+vjjndgHgUEwXiUQrV660LYB5QAX2nl5aWrp48WKozqBApVKtW7cOutnjx4931JnBD+yiT5w4cfr06VCdQQGXyxWJRPfv34dr9vDhwy0tLXBtPg9G0Q0GQ05ODmxn0LF161boITg1NbUjFww/MMb03NzcnJycbdu24eBS9wfjX5VOp7/11luwnUHNmjVrYC3d2cjPz29sbIRo0C4YRR85ciSuC6FdJCEhITsbZs3iY8eOwU0CtA+G6WCTybR7924c5pmJJz09/dGjR3i3guX5oqys7Pbt2zh0ACxUVVX5+fmhzVh/EfPnz4dip3OwhBcGg7Fq1SocnMGCQqH47LPPYFk7f/68UqmEZe1FYBG9d+/eCQkJODiDhfj4+JiYmObmZijWtm3b5oS5biyiZ2VlQV+4cYQPP/wwMBBCXVGz2TxhwgQvLy8YTnUKhvtAcnKyWCzG4QaDnR07dphMJqK96Cqoe7rJZJo3bx6UngURJpO5f/9+B420trZev34dkkedgVp0CoWyYMECfJzBztKlSyMjIx008ttvvzkntRG16NXV1T///DM+zmCHTCZPmDDB9nr06NHjx4/HYEQgEDhnFQz1OL2kpKS6uroLX3Q206ZNa2pqslW+9/LyunXrFtoh1ujRo3Hz7n9ALfqAAQP69++PjzMYefPNN5uamvT6P0oRUKlUDOtK586dmzhxIqqqA9hAHV569+7dp08ffJzBiE6nsyUMd0AikdCKXldXl5aW5gTFsYh+9OhRZyZDdYVTp04lJCQ8rZfFYkE7LU4mkyE+2XYOatFzc3Of6VaEQ6PRvvvuuzlz5jw9A4O2zwqFQqc9ZqMWfe7cub1798bHGYdYvXr1mjVrhEKh1WrFENN/+umn+nqHTibsOu6RVmc0WDUKU1ccFYvFu3btkkgkGzduFAqFXW9i3rx5Bw4c6NjNjgEEAC8+FelCN0Yt+t69e5cvX+6cGw4AoPKOuuiyQtqo9/KnGQ0W3NqxWixWB1dHeT60x1Wa8FjO0PHewZGd7UhBJ7rBYBgzZkxBQYEjznWdoitt9eXt8RP5HG+XO0biRSilxiunmxOn+IXFvLCAATrR9Xp9Xl6ecx7b7l5UNNcbRkyHmZrrNM4fEL2S5Bsey7L7qYvGdK3S/NvRlrGpwUQ7ghGjznr5VNP05SF2P0UXxVpaWrZv3w7Jsc6QNOrNXbpxuihUBiIT6zVt9pPF0IkulUrv3r0LybHOUMpMAaHutB/1eYR92PIWo92P0IkeEhLyySefQPKqM0xGi64dv7GKM1ArTFaL/X9WdKJ7eXkRXhaoG4BO9JKSku+++w43Z3oKqG+kjx49ws2ZngK6h464uDjHV8U8oBPd19fXORtEujfowsv169cPHz6MmzM9BXSii8VitCVXPTwPuvCSmJjo/Ooo3Q90osPdGd5jQRderly5kpmZiZszPQV0Pb2pqckZGxW6O+h6emJi4rRp03BzxlFKy+4/nf2Cgd8v/TZ2fHx9Pb4dC53oQqEwOjoaN2cc4nzO2ZWrFul0xGybRwU60QsKCn755RfcnHEIB/u4M0Enek1NjatlGtn4Lff8zl1bAQDTZ04YOz7+fM5Z2/ulZff/+uHSpOTEaTPG/2vb35WqJ1tbTCZT2r69b86e9HrS8KV/mZt/9Xe7Zq9fz39n6ZxJk0csemfWqdPQ6n2gu5EmJCRotVpYbUMkfuiw2bMWHD+R/uXmnWw2RygMAwDU1j766ONlERFRn6zZ2KaQHzj4XUuLePtX3wIAvtr+xW+55xbMfyciIuq33HOfb/h41460QYPinrap1Wo3/ePTiPDIj1avr6mpkkpbYXmLTnRXy2LswNvbJyRECACIjR3g5eVtezP9yH4SibTtX3u5HC4AgMvlbdm6oajojo+Pb86v2W8tXLro7fcAAGNGj1
/w1oyDh77/evv/zFrLFTK9Xj9q1LjXJ0BeiEcXXu7evfv77/b/E12Qe0WFcXEJNsUBAAkJrwIAKipLi4rvAABGjhxrex9BkIT44RWVpc9cHhIs6N9/UPqR/SdPZRgMBoiOoRO9rKyssLAQYvO4otGovb3+yG7kcnkAAImkVaNRAwB8vP+YLuXxvLRarUajefpyBEG2btmdNDHlu+93vrVoZlERtLKn6EQfPHjwiBEjYLWNB09nlPD5AUplW8ePcrkMAMDhcPn8AADA0x/JZFIKhfJ8oWAOh/PhB2sPHTzJZnPWf74a1v0Mnej9+/cfPnw4lIahw2QwbR25453+/QfdKyrsyDG+fDkXADBw4JDY2AEIgly/8WR7kcFguH4jv3//QWQymUalPf33sA1DQ4IFM2ekqjVqsRhOrQYyqtq+paWlNTU1AoEAStudIK7Ttastgt72M6TswmCyzvznRG3dIwQgpWUl0dH9IsIjT546eq+okEqlXb+Rv//AN4MGxr391rs8npdY3HQ66xgAiETS+u23O2pqq9d8vCE4WEChUk9nHSuveBAWFsH3839r0UyJpFUqlZzOOmbQ65e8s6LrycCPilWCKIYX384xsuhEv3DhwoMHD5xw2h4G0Xlcnr9/4O+/XygouKJSKZOSUng8r4ED4m7dLjibfbKismzsaxPXfLzBlpebEP+qRqM+d/5MXl4Om8X++KP1ttssl8MNDgq5c/cWCSHF9hsoEtXnX714JT/Pz89/7SebBAIUacCdiI4ure727dttbW3Y9q6h4t4lhVRsemUSH++G8OPC4caE171Do+30G3Tj9Pj4eHhe9VxQTwMUFxfj5kxPAZ3ot27dclrp5W4MuvASHh7OYqG4uXmwCzrRhw0bhpsnPQh04UUkEj148AA3Z3oK6EQvLCw8efIkbs70FNCFF4FAAHe+rWeCepzuGao7Duq0Oug1hXsg6EQvLS09dOgQbs70FNCJLhAIBg8ejJszPQV0MT06Otpl817cCHQ9XSKROKe0GI1OorNwr2OOK1wfKomM2P0I3S/W1NS0b98+SF51Bs+P2lzjBrlanVBXpvYNotn9CJ3oQUFBr7/+OiSvOiMwjE6m2O8mboFWaQ6KYDA59muFuGhtAABAxW1V6Q3VhAX2t9e7OGf21ie9HegvtF89Bl1P12g0P/30EyTHXkJ0PHfoeO9z+0UtDTq9m+ye1ipN4tr2zB21U5YEv0hx1D1do9EkJydfvnwZkpMvp6lGdzdP8bhaSyIhBj1M6TEUV+scn0Bau8oUHstOSPLl+nQ2LEQ3ZGSz2U6u+RrcixG8JAgAYDbCDINtbW2pqannzp2DaNMCAJXapfuQ68Z0XNHpdIcPH3733XcJaR216CdPnkxOTvasHzkC6qB24sQJJxxKgzd6vT4rK4uo1lGLPmPGjOdz/twOrVa7d+9eolrvoTFdr9fn5+c7IWvKLqhFv3nzppeXl2fayxFQh5fi4uK8vDx8nHEearWawF3IqEUfPXr0wIED8XHGebS0tBByULONHhrT5XJ5YWFhx4kOTga16FKpNCsra8mSJbi51P1BHV44HI7jh9sQTnl5OYGH2KIWnU6nL1++nKhD1GFx/fr1yspKolrvoTH95s2bHA6nX79+hLSORfSCggI/P7++ffvi41L3B8uEcn19PYETF1DYt2+fVColqnUsoo8aNYqof0womM3mH374wc/PjygHemJMVygUBQUFzim9bxeMon/zzTdz5swhsLO4NRgXCdVq9bVr12A74ySysrJKSkoIdABjT29tbW1ra3PNA49eyqRJkw4fPuzv70+UAz0upmu12uLiYmIrHGDPQdi+fXtTUxNUZ5wBi8UivKYEdtHZbHZ2djZUZ5zBP/7xj8ePHxPrA/ZDmxYuXFhTUwPVGdypqKh4+PChE6p4dE7PiulyuZxKpXI4HGLdcEj0zMxMlUq1ePFiqC51fxxK5ktOTs7NzYXnDL6kpaU5J7n+pf
Sg8JKampqRkUG0FwCC6FqttqWlJSIiAp5L3R9Hc4VZLNbmzZvv3IFWPg8PDAaDS5UIhpCg/emnn7p45Z0NGzZQqXZqaRFF94/pcrm8qqrKaYd2dwU4WxFEItH3338PxRR0fHx8XEpxaKILhUKRSORScdPGrl27nLZJquvADC9qtZrwh72naWho+O9//7ts2TKiHXkWmKIrlUqTyeQ5Z+2lwNxexuPxPv30UxcZPmZmZjrnkGAMQN5/v3fv3urqarg2MZCbm/vo0aO4uLgufJcAuv+Q0QXBpdLE9u3b4W7RREV2dvYz5eddDVxE/+ijjwoLC5ubm20/Tp48GdchxNNbh9auXUun09lsNn7NOQ7u4SUpKUkqlQqFwvT0dDwGlPv27bMde3379m21Wm2xWHg8HvRW4IJjIZuSkpJhw4bZUgZ1Oh1Oh1M/ePDA1m+GDh06ffp011ccX9EXLVpkNpttr+VyOU754NXV1QiC2M4NUSgURO1SRAUuok+aNGno0KE2LWyYzWY8jgKrqKgwmUxPv9PW1vbKK69AbwguuIh+/vz5Xr16cTgci+VJsRCr1VpeXg69obq6OqVS+fQ7AoFg7Nix0BuCC/YUjM7JzMzMzs7OyMgQi8VyuRxBEKVSKRaLg4KCILZSVFTU3t6OIAiJRAoODk5MTJw5c6bLHjvWAV6iAwBSUlJSUlIuXbqUkZFRW1vb1tZWW1sLV/SSkhIKhRISEpKUlDR16tSQEPeoPeXQkFFU2V5T2t7SoNOqTO1qM4IgRr3Z7jetVqvFYiGT7RcSw4zZbEYQhISQwAuK2/gGMdrVRiaH4htED4qgRQ1ks71w7GddBIvoaoXp1gVF2Y02tg+dG8ChMihUOplCJ5MpJOBycwqIUW8y6U1mo0Ul0aqlWq4PdfBor37DuET6hEp0sxlcPNb66L46qC+fw2e+qNijK6NTGWT1bQatfswM/14DiSkVhEL0ugr95VMSli/LL8wNHkA6R68xSmsVPF/S5EU+oktuAAAFcUlEQVQBUKundYmuin7/mvLWBUWvVwhOvYSLXKTSKdRz16A4pwsKXRK9rqL90ilZ2BCYAw8XQSvXa1oVsz5w6rDn5f9aj+5rrmR1T8UBACwfOivA++dtDc5s9CWiqxWm3462CAd1T8VtsL3pTF/ur0danNbiS0T/735x+JBgZzlDGD4CrqzFUnPfSUsfnYlefltpBhQ6x4US0vDDJ9T78mmJc9rqTPT8LKl/ZE/Jp6CzqTQ2vfS6sgvfdZQXil5dpGH7sagMyA/uUDhyYsO/ds2GbtY31Ls4n1DRK++qmTy3L3qJCjqHqlKYVHJTF77rEC8UvbZUzQtw6eVdPODwWY/uq/Fuxf6UW0u93k/AJuFzQIJM3vifczsrq29SKXRBSHTyhGWhgn4AgANH1vjzw8lkyo3bWSazMbbviJlvfMJkPFnLvldy4deL++SKpkD/SKsVrxr2XD9Wqwj3Sln2e7pGZTIacPnFlErJ3rR3tVrltMmrpyStMpuN/3ffe03NT5LCLl09IpM3vrNg+/TJq4vv5+b+fsD2/p2inPTj63kcv+mTP4ruM7xR/BAP3wAAJApJ8liPk/EO7Pd0rdJMpuByC71w6UcO2/e9xXvJZAoAYOjg5K07/3zj9pnpU1YDAPz9wua9+XcEQcKE/YtLL1ZUXU8B7xuN+jO/fB0ZHvfu23tsM/ISaQNOulPoZK0K95huX3STwUJl4TI8L6+8pmhr/ts/X+t4x2w2KpRP0pKoVEbHcravd3BtfTEAoKauSKNVjEpM7VgDIZHwGlNRGRQmF/fnEvuik8iIQYvLH1yllvaLHjll4sqn32TQ7SQhkclUi8UMAJC3iW1/Azz8eQazwaxR4H72p33RWTyKxYTLMzGLydNo2wL8UWyB5LB9AABqrQIPf57BqDczubiv59m/kbK5ZIsJlxtpn8iE2vqihsd/5MDoDS8ZLYQE9UEQ0p0iZ5zibjKYud4EhZeAcI
ayVYdHe6+PXVpWeTXt0F9Hj5jHZfuWPyywWMyL5/+7k0t8vINe+dMbNwrPmEz66D6vKlWSssqrXA4u5cPaFbrwPvbPPoPIC2I6CYREsVSSdi6fCbc9vp9w1btpZ3N25106CBBEGBwzYvisl141fcpHFArtbnFORdWNXmGDQ4L6qtS4VFXUyLRRAwPxsPw0L1w5un+1reSGLjiWj7cHroNRZ66/07jkn7hvuX/hTSN2mNeNXzu7d2m1yi07Ztj9iO8rlMhEz7/fP2b03D9vxOSnHdp16s3bp9n9iMPytnvjfW3E/AmvvfMig21i9YARzlhz72yN9PovsoYai3+kj91PLRaLok38IrPAXgYMjca0DUWg0IkDJpORQrFzP2QyuEym/YwXqwWU5tWs3O6M+nsvWZj+Zk11zJhwd8xvQYu4Utp3IPVP46D1iU54yXLdxAVBLVVOWk8hEJ3KSLYanaP4y0XvPZgd3ocmqZE7xxtCsFpA9Q3RrA+dl9Lz8hSMV6f4BoeSmx92W90flzQt2tDLmS12KaVsxBs+3j7m5oeEFRzHCb3aeP/XmhkrgtleTl2VRJHLWJireFSq5wbyGFzcn9mcgLReqWlVLlwXjjh9lIAua/dxVXve8VYSlRrQx49Kd8U1664gE6laqmQDEr1HTiMm1wFLfnpFobrkmkopM3L82F5BbCqD4vpjSrPRopa2q1o1WoUuPJY95s98Bsvp2br/H+w7MVpF+od3NY21+pY6LYmEUJlkGpOC09wkZhgcmrJVq9eafYMZXB9K9FB2rwEcKo3gLgJnx7S+3aJVmvQ6K3Cx8g4kCsLikNk8CkJYt7aDpwoGAbhSB+gxeEQnAI/oBOARnQA8ohOAR3QC+H+6fjfnM2J1xQAAAABJRU5ErkJggg==\n",
"text/plain": [
"
"
]
},
- "metadata": {}
+ "metadata": {},
+ "output_type": "display_data"
}
],
- "source": [
- "from IPython.display import Image, display\n",
- "\n",
- "display(Image(graph.get_graph().draw_mermaid_png()))"
- ]
+ "metadata": {
+ "id": "2FIKomclLNFx",
+ "colab": {
+ "height": 350,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "3419b941-b409-499e-c1e3-54f2526d467f"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 8,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
- "source": [
- "`Run the graph` and stream the agent reasoning.\n",
- "\n",
- "We are going to ask the agent to extract the content from a `specific webpage`."
- ],
+ "source": "`Run the graph` and stream the agent's reasoning.\n\nWe are going to ask the agent to extract the content from a `specific webpage`.",
+ "outputs": [],
"metadata": {
"id": "cw-T5CYWkCEN"
- }
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "source": [
- "# Inputs for the agent\n",
- "inputs = {\"messages\": [(\"user\", \"Find latest news related to robotics December 2024\")]}\n",
- "\n",
- "# Run the graph\n",
- "for event in graph.stream(inputs, config, stream_mode=\"values\"):\n",
- " event[\"messages\"][-1].pretty_print()\n"
- ],
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "Qn1rC2y8kAe9",
- "outputId": "15438700-b5ac-4439-b3b5-661de0bfef7c"
- },
- "execution_count": 20,
+ "source": "# Inputs for the agent\ninputs = {\"messages\": [(\"user\", \"Find latest news related to robotics December 2024\")]}\n\n# Run the graph\nfor event in graph.stream(inputs, config, stream_mode=\"values\"):\n event[\"messages\"][-1].pretty_print()\n",
"outputs": [
{
- "output_type": "stream",
"name": "stdout",
"text": [
"================================\u001b[1m Human Message \u001b[0m=================================\n",
@@ -431,48 +352,57 @@
"Name: SmartScraper\n",
"\n",
"{\"news\": [{\"title\": \"Matternet adds ANRA's UTM tech to expand drone delivery\", \"link\": \"https://www.therobotreport.com/matternet-adds-anras-utm-tech-to-expand-drone-delivery/\", \"description\": \"This latest partnership follows Matternet’s recent launch of a drone delivery operation in Silicon Valley.\"}, {\"title\": \"Helm.ai upgrades generative AI model to enrich autonomous driving data\", \"link\": \"https://www.therobotreport.com/helm-ai-upgrades-generative-ai-model-to-enrich-autonomous-driving-data/\", \"description\": \"Helm.ai said the new model enables automakers to generate diverse, realistic video data tailored to specific requirements.\"}, {\"title\": \"New research analyzes safety of Waymo robotaxis\", \"link\": \"https://www.therobotreport.com/new-research-analyzes-safety-of-waymo-robotaxis/\", \"description\": \"Waymo shared research with Swiss Re, one of the world’s largest insurance providers, analyzing liability claims related to collisions from 25.3 million fully autonomous miles driven.\"}, {\"title\": \"From AI to humanoids: top robotics trends of 2024\", \"link\": \"https://www.therobotreport.com/from-ai-to-humanoids-top-robotics-trends-of-2024/\", \"description\": \"The Robot Report Podcast reflects on the successes and challenges that defined the robotics industry in 2024.\"}, {\"title\": \"Symbotic acquires OhmniLabs, maker of disinfection & telepresence robots\", \"link\": \"https://www.therobotreport.com/symbotic-buys-healthcare-robot-maker-ohmnilabs/\", \"description\": \"With the acquisition of OhmniLabs, Symbotic said it will be better positioned to expand its capabilities for supply chain customers.\"}, {\"title\": \"Sanctuary AI shows new dexterity with in-hand manipulation skills\", \"link\": \"https://www.therobotreport.com/sanctuary-ai-showing-new-dexterity-with-in-hand-manipulation-skills/\", \"description\": \"Sanctuary AI showed its latest breakthrough with hydraulic actuation and precise in-hand manipulation to open 
up a range of high-value tasks.\"}, {\"title\": \"Apptronik partners with Google DeepMind to advance humanoid robots with AI\", \"link\": \"https://www.therobotreport.com/apptronik-partners-google-deepmind-advance-humanoid-robots-ai/\", \"description\": \"Apptronik will combine its iterative design experience and Apollo humanoid in testing with Google DeepMind’s AI platforms.\"}, {\"title\": \"Alimak Group, Skyline Robotics create autonomous building maintenance unit\", \"link\": \"https://www.therobotreport.com/alimak-group-skyline-robotics-create-autonomous-building-maintenance-unit/\", \"description\": \"Skyline Robotics said the joint system can help the industry handle increasingly complex design challenges and labor shortages.\"}, {\"title\": \"DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall\", \"link\": \"https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/\", \"description\": \"Beginning today, when certain DoorDash customers in Texas select drone delivery, their order will be delivered via Wing.\"}, {\"title\": \"Mcity says open-source digital twin enables cheaper autonomous vehicle testing\", \"link\": \"https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/\", \"description\": \"The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere.\"}, {\"title\": \"2024: The year humanoids woke up\", \"link\": \"https://www.therobotreport.com/2024-the-year-humanoids-woke-up/\", \"description\": \"Humanoids empowered by AI are coming, and the long-term market could be huge, Persona AI’s Nic Radford tells columnist Oliver Mitchell.\"}, {\"title\": \"Waymo robotaxis head to Tokyo with the help of Nihan Kotsu and GO\", \"link\": \"https://www.therobotreport.com/waymo-is-heading-to-tokyo-with-the-help-of-nihan-kotsu-and-go/\", \"description\": \"The first all-electric Jaguar I-PACEs for 
Waymo will arrive in Tokyo in early 2025 and will initially be driven by safety drivers.\"}, {\"title\": \"Realbotix earns Amazon development subsidy; partners with UOL\", \"link\": \"https://www.therobotreport.com/realbotix-earns-amazon-development-subsidy-partners-with-uol/\", \"description\": \"Realbotix plans to use the funding to directly support the completion of initiatives including the development of Robot Controller 3.0.\"}, {\"title\": \"Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception\", \"link\": \"https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/\", \"description\": \"Prototype of the Eyeonic Trace Laser Line Scanner, designed to provide sub–millimeter depth precision for next generation warehouse automation, robotics, farming, construction and manufacturing applications.\"}, {\"title\": \"Slip Robotics picks up $28M for trailer loading/unloading robots\", \"link\": \"https://www.therobotreport.com/slip-robotics-picks-up-28m-for-trailer-loading-unloading-robots/\", \"description\": \"Slip Robotics plans to use its latest funding to continue RɦD on its trailer loading/unloading robots as it serves commercial customers.\"}, {\"title\": \"Jetson Orin Nano Super developer kit available from NVIDIA\", \"link\": \"https://www.therobotreport.com/jetson-orin-nano-super-developer-kit-available/\", \"description\": \"NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users.\"}, {\"title\": \"Mbodi and T-Robotics are ABB Robotics' AI Startup Challenge winners\", \"link\": \"https://www.therobotreport.com/mbodi-and-t-robotics-are-abb-robotics-ai-startup-challenge-winners/\", \"description\": \"ABB Robotics is working with Mbodi and T-Robotics to make industrial robots easier to program and enable them to learn on their own.\"}, {\"title\": \"IEEE Awards announce Daniela Rus as 2025 Edison Medal recipient\", \"link\": 
\"https://www.therobotreport.com/ieee-awards-announce-daniela-rus-2025-edison-medal-recipient/\", \"description\": \"Currently the director of MIT CSAIL, Daniela Rus’ research interests include robotics, mobile computing, and data science.\"}, {\"title\": \"Eureka Robotics raises $10.5M to scale its vision systems in the U.S.\", \"link\": \"https://www.therobotreport.com/eureka-robotics-raises-10-5m-scale-its-vision-systems-in-u-s/\", \"description\": \"Eureka Robotics provides software and system to automate tasks that require high accuracy and high agility.\"}, {\"title\": \"Vision-guided cobot automates paint process for DENSO\", \"link\": \"https://www.therobotreport.com/denso-automates-paint-process-vision-guided-cobot/\", \"description\": \"DENSO deployed a 3D-vision-guided cobot with AI-based motion planning and control software to relieve employees of strenuous, tedious tasks.\"}, {\"title\": \"Brushed DC motors find use in robot applications, humanoid development\", \"link\": \"https://www.therobotreport.com/brushed-dc-motors-find-use-in-robot-applications-humanoid-development/\", \"description\": \"Recent research from Portescap found that brushed DC motors best fulfill the high requirements of humanoid robots.\"}, {\"title\": \"Diversity and inclusion can accelerate robotics innovation, finds Max Planck study\", \"link\": \"https://www.therobotreport.com/diversity-and-inclusion-can-accelerate-robotic-innovation-finds-max-planck-study/\", \"description\": \"The study outlined seven distinct benefits that diversity and inclusion bring to robotics research and innovation.\"}, {\"title\": \"Advanced Precision Strain Wave Gear Offers Torque Sensing to Robots\", \"link\": \"https://www.therobotreport.com/advanced-precision-strain-wave-gear-offers-torque-sensing-to-robots/\", \"description\": \"NA\"}, {\"title\": \"Innovative motion solutions are supporting the latest trends in robotics\", \"link\": 
\"https://www.therobotreport.com/innovative-motion-solutions-are-supporting-the-latest-trends-in-robotics/\", \"description\": \"NA\"}, {\"title\": \"Renishaw and RLS help to drive a robot revolution\", \"link\": \"https://www.therobotreport.com/renishaw-and-rls-help-to-drive-a-robot-revolution/\", \"description\": \"NA\"}, {\"title\": \"Ask an Expert Podcast: flexible conveyance for materials handling\", \"link\": \"https://www.therobotreport.com/ask-an-expert-flexible-conveyors-for-materials-handling/\", \"description\": \"NA\"}, {\"title\": \"Hop Onboard the AMR Revolution: Vision & Localization Unleashed\", \"link\": \"https://www.therobotreport.com/hop-onboard-the-amr-revolution-vision-localization-unleashed/\", \"description\": \"NA\"}]}\n"
- ]
+ ],
+ "output_type": "stream"
}
- ]
+ ],
+ "metadata": {
+ "id": "Qn1rC2y8kAe9",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "15438700-b5ac-4439-b3b5-661de0bfef7c"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 20,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "source": [
- "# get last message (assuming the last one is the Smartscraper tool response)\n",
- "result = graph.get_state(config).values[\"messages\"][-1].content\n",
- "\n",
- "import json\n",
- "# convert string into json\n",
- "result = json.loads(result)"
- ],
+ "source": "import json\n\n# Get the last message (assumed to be the SmartScraper tool response)\nresult = graph.get_state(config).values[\"messages\"][-1].content\n\n# Parse the JSON string into a Python dict\nresult = json.loads(result)",
+ "outputs": [],
"metadata": {
"id": "_12IqhcrkiHC"
},
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
"execution_count": 21,
- "outputs": []
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Print the response",
+ "outputs": [],
"metadata": {
"id": "YZz1bqCIpoL8"
},
- "source": [
- "Print the response"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 23,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "F1VfD8B4LPc8",
- "outputId": "3aebeb27-c529-4bd6-e3a4-6318f9dabc0b"
- },
+ "source": "print(json.dumps(result, indent=2))",
"outputs": [
{
- "output_type": "stream",
"name": "stdout",
"text": [
"{\n",
@@ -614,134 +544,58 @@
" }\n",
" ]\n",
"}\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "print(json.dumps(result, indent=2))"
- ]
+ "metadata": {
+ "id": "F1VfD8B4LPc8",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "3aebeb27-c529-4bd6-e3a4-6318f9dabc0b"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 23,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 💾 Save the output to a `CSV` file",
+ "outputs": [],
"metadata": {
"id": "2as65QLypwdb"
},
- "source": [
- "### 💾 Save the output to a `CSV` file"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Let's create a pandas DataFrame and display the extracted content as a table.",
+ "outputs": [],
"metadata": {
"id": "HTLVFgbVLLBR"
},
- "source": [
- "Let's create a pandas dataframe and show the table with the extracted content"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 24,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 896
- },
- "id": "1lS9O1KOI51y",
- "outputId": "76fb52bc-738a-44b4-ed34-13ce5ac9b26c"
- },
+ "source": "import pandas as pd\n\n# Convert the list of news items into a DataFrame\ndf = pd.DataFrame(result[\"news\"])\ndf",
"outputs": [
{
- "output_type": "execute_result",
"data": {
- "text/plain": [
- " title \\\n",
- "0 Matternet adds ANRA's UTM tech to expand drone... \n",
- "1 Helm.ai upgrades generative AI model to enrich... \n",
- "2 New research analyzes safety of Waymo robotaxis \n",
- "3 From AI to humanoids: top robotics trends of 2024 \n",
- "4 Symbotic acquires OhmniLabs, maker of disinfec... \n",
- "5 Sanctuary AI shows new dexterity with in-hand ... \n",
- "6 Apptronik partners with Google DeepMind to adv... \n",
- "7 Alimak Group, Skyline Robotics create autonomo... \n",
- "8 DoorDash partners with Wing to launch drone de... \n",
- "9 Mcity says open-source digital twin enables ch... \n",
- "10 2024: The year humanoids woke up \n",
- "11 Waymo robotaxis head to Tokyo with the help of... \n",
- "12 Realbotix earns Amazon development subsidy; pa... \n",
- "13 Eyeonic Trace Laser Line Scanner offers sub-mi... \n",
- "14 Slip Robotics picks up $28M for trailer loadin... \n",
- "15 Jetson Orin Nano Super developer kit available... \n",
- "16 Mbodi and T-Robotics are ABB Robotics' AI Star... \n",
- "17 IEEE Awards announce Daniela Rus as 2025 Ediso... \n",
- "18 Eureka Robotics raises $10.5M to scale its vis... \n",
- "19 Vision-guided cobot automates paint process fo... \n",
- "20 Brushed DC motors find use in robot applicatio... \n",
- "21 Diversity and inclusion can accelerate robotic... \n",
- "22 Advanced Precision Strain Wave Gear Offers Tor... \n",
- "23 Innovative motion solutions are supporting the... \n",
- "24 Renishaw and RLS help to drive a robot revolution \n",
- "25 Ask an Expert Podcast: flexible conveyance for... \n",
- "26 Hop Onboard the AMR Revolution: Vision & Local... \n",
- "\n",
- " link \\\n",
- "0 https://www.therobotreport.com/matternet-adds-... \n",
- "1 https://www.therobotreport.com/helm-ai-upgrade... \n",
- "2 https://www.therobotreport.com/new-research-an... \n",
- "3 https://www.therobotreport.com/from-ai-to-huma... \n",
- "4 https://www.therobotreport.com/symbotic-buys-h... \n",
- "5 https://www.therobotreport.com/sanctuary-ai-sh... \n",
- "6 https://www.therobotreport.com/apptronik-partn... \n",
- "7 https://www.therobotreport.com/alimak-group-sk... \n",
- "8 https://www.therobotreport.com/doordash-partne... \n",
- "9 https://www.therobotreport.com/mcity-open-sour... \n",
- "10 https://www.therobotreport.com/2024-the-year-h... \n",
- "11 https://www.therobotreport.com/waymo-is-headin... \n",
- "12 https://www.therobotreport.com/realbotix-earns... \n",
- "13 https://www.therobotreport.com/eyeonic-trace-l... \n",
- "14 https://www.therobotreport.com/slip-robotics-p... \n",
- "15 https://www.therobotreport.com/jetson-orin-nan... \n",
- "16 https://www.therobotreport.com/mbodi-and-t-rob... \n",
- "17 https://www.therobotreport.com/ieee-awards-ann... \n",
- "18 https://www.therobotreport.com/eureka-robotics... \n",
- "19 https://www.therobotreport.com/denso-automates... \n",
- "20 https://www.therobotreport.com/brushed-dc-moto... \n",
- "21 https://www.therobotreport.com/diversity-and-i... \n",
- "22 https://www.therobotreport.com/advanced-precis... \n",
- "23 https://www.therobotreport.com/innovative-moti... \n",
- "24 https://www.therobotreport.com/renishaw-and-rl... \n",
- "25 https://www.therobotreport.com/ask-an-expert-f... \n",
- "26 https://www.therobotreport.com/hop-onboard-the... \n",
- "\n",
- " description \n",
- "0 This latest partnership follows Matternet’s re... \n",
- "1 Helm.ai said the new model enables automakers ... \n",
- "2 Waymo shared research with Swiss Re, one of th... \n",
- "3 The Robot Report Podcast reflects on the succe... \n",
- "4 With the acquisition of OhmniLabs, Symbotic sa... \n",
- "5 Sanctuary AI showed its latest breakthrough wi... \n",
- "6 Apptronik will combine its iterative design ex... \n",
- "7 Skyline Robotics said the joint system can hel... \n",
- "8 Beginning today, when certain DoorDash custome... \n",
- "9 The Mcity test facility has been open since 20... \n",
- "10 Humanoids empowered by AI are coming, and the ... \n",
- "11 The first all-electric Jaguar I-PACEs for Waym... \n",
- "12 Realbotix plans to use the funding to directly... \n",
- "13 Prototype of the Eyeonic Trace Laser Line Scan... \n",
- "14 Slip Robotics plans to use its latest funding ... \n",
- "15 NVIDIA released Jetson Orin Nano Super Develop... \n",
- "16 ABB Robotics is working with Mbodi and T-Robot... \n",
- "17 Currently the director of MIT CSAIL, Daniela R... \n",
- "18 Eureka Robotics provides software and system t... \n",
- "19 DENSO deployed a 3D-vision-guided cobot with A... \n",
- "20 Recent research from Portescap found that brus... \n",
- "21 The study outlined seven distinct benefits tha... \n",
- "22 NA \n",
- "23 NA \n",
- "24 NA \n",
- "25 NA \n",
- "26 NA "
- ],
"text/html": [
"\n",
" \n",
@@ -1199,90 +1053,188 @@
"
\n",
" \n"
],
+ "text/plain": [
+ " title \\\n",
+ "0 Matternet adds ANRA's UTM tech to expand drone... \n",
+ "1 Helm.ai upgrades generative AI model to enrich... \n",
+ "2 New research analyzes safety of Waymo robotaxis \n",
+ "3 From AI to humanoids: top robotics trends of 2024 \n",
+ "4 Symbotic acquires OhmniLabs, maker of disinfec... \n",
+ "5 Sanctuary AI shows new dexterity with in-hand ... \n",
+ "6 Apptronik partners with Google DeepMind to adv... \n",
+ "7 Alimak Group, Skyline Robotics create autonomo... \n",
+ "8 DoorDash partners with Wing to launch drone de... \n",
+ "9 Mcity says open-source digital twin enables ch... \n",
+ "10 2024: The year humanoids woke up \n",
+ "11 Waymo robotaxis head to Tokyo with the help of... \n",
+ "12 Realbotix earns Amazon development subsidy; pa... \n",
+ "13 Eyeonic Trace Laser Line Scanner offers sub-mi... \n",
+ "14 Slip Robotics picks up $28M for trailer loadin... \n",
+ "15 Jetson Orin Nano Super developer kit available... \n",
+ "16 Mbodi and T-Robotics are ABB Robotics' AI Star... \n",
+ "17 IEEE Awards announce Daniela Rus as 2025 Ediso... \n",
+ "18 Eureka Robotics raises $10.5M to scale its vis... \n",
+ "19 Vision-guided cobot automates paint process fo... \n",
+ "20 Brushed DC motors find use in robot applicatio... \n",
+ "21 Diversity and inclusion can accelerate robotic... \n",
+ "22 Advanced Precision Strain Wave Gear Offers Tor... \n",
+ "23 Innovative motion solutions are supporting the... \n",
+ "24 Renishaw and RLS help to drive a robot revolution \n",
+ "25 Ask an Expert Podcast: flexible conveyance for... \n",
+ "26 Hop Onboard the AMR Revolution: Vision & Local... \n",
+ "\n",
+ " link \\\n",
+ "0 https://www.therobotreport.com/matternet-adds-... \n",
+ "1 https://www.therobotreport.com/helm-ai-upgrade... \n",
+ "2 https://www.therobotreport.com/new-research-an... \n",
+ "3 https://www.therobotreport.com/from-ai-to-huma... \n",
+ "4 https://www.therobotreport.com/symbotic-buys-h... \n",
+ "5 https://www.therobotreport.com/sanctuary-ai-sh... \n",
+ "6 https://www.therobotreport.com/apptronik-partn... \n",
+ "7 https://www.therobotreport.com/alimak-group-sk... \n",
+ "8 https://www.therobotreport.com/doordash-partne... \n",
+ "9 https://www.therobotreport.com/mcity-open-sour... \n",
+ "10 https://www.therobotreport.com/2024-the-year-h... \n",
+ "11 https://www.therobotreport.com/waymo-is-headin... \n",
+ "12 https://www.therobotreport.com/realbotix-earns... \n",
+ "13 https://www.therobotreport.com/eyeonic-trace-l... \n",
+ "14 https://www.therobotreport.com/slip-robotics-p... \n",
+ "15 https://www.therobotreport.com/jetson-orin-nan... \n",
+ "16 https://www.therobotreport.com/mbodi-and-t-rob... \n",
+ "17 https://www.therobotreport.com/ieee-awards-ann... \n",
+ "18 https://www.therobotreport.com/eureka-robotics... \n",
+ "19 https://www.therobotreport.com/denso-automates... \n",
+ "20 https://www.therobotreport.com/brushed-dc-moto... \n",
+ "21 https://www.therobotreport.com/diversity-and-i... \n",
+ "22 https://www.therobotreport.com/advanced-precis... \n",
+ "23 https://www.therobotreport.com/innovative-moti... \n",
+ "24 https://www.therobotreport.com/renishaw-and-rl... \n",
+ "25 https://www.therobotreport.com/ask-an-expert-f... \n",
+ "26 https://www.therobotreport.com/hop-onboard-the... \n",
+ "\n",
+ " description \n",
+ "0 This latest partnership follows Matternet’s re... \n",
+ "1 Helm.ai said the new model enables automakers ... \n",
+ "2 Waymo shared research with Swiss Re, one of th... \n",
+ "3 The Robot Report Podcast reflects on the succe... \n",
+ "4 With the acquisition of OhmniLabs, Symbotic sa... \n",
+ "5 Sanctuary AI showed its latest breakthrough wi... \n",
+ "6 Apptronik will combine its iterative design ex... \n",
+ "7 Skyline Robotics said the joint system can hel... \n",
+ "8 Beginning today, when certain DoorDash custome... \n",
+ "9 The Mcity test facility has been open since 20... \n",
+ "10 Humanoids empowered by AI are coming, and the ... \n",
+ "11 The first all-electric Jaguar I-PACEs for Waym... \n",
+ "12 Realbotix plans to use the funding to directly... \n",
+ "13 Prototype of the Eyeonic Trace Laser Line Scan... \n",
+ "14 Slip Robotics plans to use its latest funding ... \n",
+ "15 NVIDIA released Jetson Orin Nano Super Develop... \n",
+ "16 ABB Robotics is working with Mbodi and T-Robot... \n",
+ "17 Currently the director of MIT CSAIL, Daniela R... \n",
+ "18 Eureka Robotics provides software and system t... \n",
+ "19 DENSO deployed a 3D-vision-guided cobot with A... \n",
+ "20 Recent research from Portescap found that brus... \n",
+ "21 The study outlined seven distinct benefits tha... \n",
+ "22 NA \n",
+ "23 NA \n",
+ "24 NA \n",
+ "25 NA \n",
+ "26 NA "
+ ],
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "dataframe",
- "variable_name": "df",
- "summary": "{\n \"name\": \"df\",\n \"rows\": 27,\n \"fields\": [\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 27,\n \"samples\": [\n \"DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall\",\n \"Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception\",\n \"Mcity says open-source digital twin enables cheaper autonomous vehicle testing\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"link\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 27,\n \"samples\": [\n \"https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/\",\n \"https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/\",\n \"https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 23,\n \"samples\": [\n \"NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users.\",\n \"The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere.\",\n \"This latest partnership follows Matternet\\u2019s recent launch of a drone delivery operation in Silicon Valley.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}"
+ "summary": "{\n \"name\": \"df\",\n \"rows\": 27,\n \"fields\": [\n {\n \"column\": \"title\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 27,\n \"samples\": [\n \"DoorDash partners with Wing to launch drone deliveries in Dallas-Fort Worth mall\",\n \"Eyeonic Trace Laser Line Scanner offers sub-millimeter depth perception\",\n \"Mcity says open-source digital twin enables cheaper autonomous vehicle testing\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"link\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 27,\n \"samples\": [\n \"https://www.therobotreport.com/doordash-partners-wing-launch-drone-deliveries-dallas-fort-worth-mall/\",\n \"https://www.therobotreport.com/eyeonic-trace-laser-line-scanner-offers-sub-millimeter-depth-perception/\",\n \"https://www.therobotreport.com/mcity-open-source-digital-twin-enables-cheaper-av-testing/\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"description\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 23,\n \"samples\": [\n \"NVIDIA released Jetson Orin Nano Super Developer Kit, lowered the price and dropped an update for existing Nano users.\",\n \"The Mcity test facility has been open since 2015, and autonomous vehicle developers can now test their technology from anywhere.\",\n \"This latest partnership follows Matternet\\u2019s recent launch of a drone delivery operation in Silicon Valley.\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}",
+ "variable_name": "df"
}
},
"metadata": {},
+ "output_type": "execute_result",
"execution_count": 24
}
],
- "source": [
- "import pandas as pd\n",
- "\n",
- "# Convert dictionary to DataFrame\n",
- "df = pd.DataFrame(result[\"news\"])\n",
- "df"
- ]
+ "metadata": {
+ "id": "1lS9O1KOI51y",
+ "colab": {
+ "height": 896,
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "76fb52bc-738a-44b4-ed34-13ce5ac9b26c"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 24,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Save it to CSV",
+ "outputs": [],
"metadata": {
"id": "v0CBYVk7qA5Z"
},
- "source": [
- "Save it to CSV"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 26,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "BtEbB9pmQGhO",
- "outputId": "fe011a56-b28a-4e28-f2d1-4580c8a65596"
- },
+ "source": "# Save the DataFrame to a CSV file\ncsv_file = \"news.csv\"\ndf.to_csv(csv_file, index=False)\nprint(f\"Data saved to {csv_file}\")",
"outputs": [
{
- "output_type": "stream",
"name": "stdout",
"text": [
"Data saved to news.csv\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "# Save the DataFrame to a CSV file\n",
- "csv_file = \"news.csv\"\n",
- "df.to_csv(csv_file, index=False)\n",
- "print(f\"Data saved to {csv_file}\")"
- ]
+ "metadata": {
+ "id": "BtEbB9pmQGhO",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "fe011a56-b28a-4e28-f2d1-4580c8a65596"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 26,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "## 🔗 Resources",
+ "outputs": [],
"metadata": {
"id": "-1SZT8VzTZNd"
},
- "source": [
- "## 🔗 Resources"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "\n
\n\n\n- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n- 🦜 **Langchain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n\nMade with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n",
+ "outputs": [],
"metadata": {
"id": "dUi2LtMLRDDR"
},
- "source": [
- "\n",
- "
\n",
- "\n",
- "\n",
- "- 🚀 **Get your API Key:** [ScrapeGraphAI Dashboard](https://dashboard.scrapegraphai.com) \n",
- "- 🐙 **GitHub:** [ScrapeGraphAI GitHub](https://github.com/scrapegraphai) \n",
- "- 💼 **LinkedIn:** [ScrapeGraphAI LinkedIn](https://www.linkedin.com/company/scrapegraphai/) \n",
- "- 🐦 **Twitter:** [ScrapeGraphAI Twitter](https://twitter.com/scrapegraphai) \n",
- "- 💬 **Discord:** [Join our Discord Community](https://discord.gg/uJN7TYcpNa) \n",
- "- 🦜 **Langchain:** [ScrapeGraph docs](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n",
- "\n",
- "Made with ❤️ by the [ScrapeGraphAI](https://scrapegraphai.com) Team \n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
}
],
"metadata": {
@@ -1290,8 +1242,8 @@
"provenance": []
},
"kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
+ "name": "python3",
+ "display_name": "Python 3"
},
"language_info": {
"name": "python"
@@ -1299,4 +1251,4 @@
},
"nbformat": 4,
"nbformat_minor": 0
-}
+}
\ No newline at end of file
diff --git a/cookbook/research-agent/scrapegraph_langgraph_tavily_mysql.ipynb b/cookbook/research-agent/scrapegraph_langgraph_tavily_mysql.ipynb
index 2d9cbc7..1b05d9b 100644
--- a/cookbook/research-agent/scrapegraph_langgraph_tavily_mysql.ipynb
+++ b/cookbook/research-agent/scrapegraph_langgraph_tavily_mysql.ipynb
@@ -1,87 +1,103 @@
{
"cells": [
{
- "cell_type": "markdown",
- "metadata": {
- "id": "ReBHQ5_834pZ"
- },
- "source": [
- "
"
- ]
- },
- {
- "cell_type": "markdown",
+ "source": "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, `tavily` and `MySQL`\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langgraph-tavily-mysql) [](https://colab.research.google.com/drive/1uFyXOLLibGQKqvr8K7jaG_7S3V6oits2?usp=sharing)",
+ "outputs": [
+ {
+ "data": {
+ "text/plain": "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, `tavily` and `MySQL`\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langgraph-tavily-mysql) [](https://colab.research.google.com/drive/1uFyXOLLibGQKqvr8K7jaG_7S3V6oits2?usp=sharing)",
+ "text/markdown": "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, `tavily` and `MySQL`\n\n[](https://www.runalph.ai/notebooks/scrapegraphai/scrapegraph-langgraph-tavily-mysql) [](https://colab.research.google.com/drive/1uFyXOLLibGQKqvr8K7jaG_7S3V6oits2?usp=sharing)"
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
"metadata": {
"id": "jEkuKbcRrPcK"
},
- "source": [
- "## 🕷️ Research Agent with `scrapegraph`, `langgraph`, `tavily` and `MySQL`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "completed",
+ "execution_count": null,
+ "executionEndTime": "2026-03-26T00:13:02.867Z",
+ "executionStartTime": "2026-03-26T00:13:02.866Z"
},
{
- "cell_type": "markdown",
- "source": [
- ""
- ],
+ "source": "",
+ "outputs": [],
"metadata": {
"id": "-GmCGaVI3kcT"
- }
+ },
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔧 Install `dependencies`",
+ "outputs": [],
"metadata": {
"id": "IzsyDXEWwPVt"
},
- "source": [
- "### 🔧 Install `dependencies`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 1,
+ "source": "%%capture\n!pip install langgraph langchain-scrapegraph langchain-openai \"langchain-community>=0.2.11\" tavily-python",
+ "outputs": [],
"metadata": {
"id": "os_vm0MkIxr9"
},
- "outputs": [],
- "source": [
- "%%capture\n",
- "!pip install langgraph langchain-scrapegraph langchain-openai \"langchain-community>=0.2.11\" tavily-python"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 1,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🔑 Import `ScrapeGraph`, `Tavily` and `OpenAI` API keys",
+ "outputs": [],
"metadata": {
"id": "apBsL-L2KzM7"
},
- "source": [
- "### 🔑 Import `ScrapeGraph`, `Tavily` and `OpenAI` API keys"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)",
+ "outputs": [],
"metadata": {
"id": "ol9gQbAFkh9b"
},
- "source": [
- "You can find the Scrapegraph API key [here](https://dashboard.scrapegraphai.com/)"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 2,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "sffqFG2EJ8bI",
- "outputId": "132bdfb9-45c8-4c26-8ac2-d13593ba392a"
- },
+ "source": "import getpass\nimport os\n\nif not os.environ.get(\"SGAI_API_KEY\"):\n os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")\n\nif not os.environ.get(\"TAVILY_API_KEY\"):\n os.environ[\"TAVILY_API_KEY\"] = getpass.getpass(\"Tavily API key:\\n\")\n\nif not os.environ.get(\"OPENAI_API_KEY\"):\n os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key:\\n\")",
"outputs": [
{
"name": "stdout",
- "output_type": "stream",
"text": [
"Scrapegraph API key:\n",
"··········\n",
@@ -89,314 +105,227 @@
"··········\n",
"OpenAI API key:\n",
"··········\n"
- ]
+ ],
+ "output_type": "stream"
}
],
- "source": [
- "import getpass\n",
- "import os\n",
- "\n",
- "if not os.environ.get(\"SGAI_API_KEY\"):\n",
- " os.environ[\"SGAI_API_KEY\"] = getpass.getpass(\"Scrapegraph API key:\\n\")\n",
- "\n",
- "if not os.environ.get(\"TAVILY_API_KEY\"):\n",
- " os.environ[\"TAVILY_API_KEY\"] = getpass.getpass(\"Tavily API key:\\n\")\n",
- "\n",
- "if not os.environ.get(\"OPENAI_API_KEY\"):\n",
- " os.environ[\"OPENAI_API_KEY\"] = getpass.getpass(\"OpenAI API key:\\n\")"
- ]
+ "metadata": {
+ "id": "sffqFG2EJ8bI",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "132bdfb9-45c8-4c26-8ac2-d13593ba392a"
+ },
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 2,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n",
+ "outputs": [],
"metadata": {
"id": "jnqMB2-xVYQ7"
},
- "source": [
- "### 📝 Defining an `Output Schema` for Webpage Content Extraction\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n\n
\n",
+ "outputs": [],
"metadata": {
"id": "VZvxbjfXvbgd"
},
- "source": [
- "If you already know what you want to extract from a webpage, you can **define an output schema** using **Pydantic**. This schema acts as a \"blueprint\" that tells the AI how to structure the response.\n",
- "\n",
- "
\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 3,
+ "source": "from pydantic import BaseModel, Field\nfrom typing import List\n\n# Schema for a single news item\nclass NewsItemSchema(BaseModel):\n title: str = Field(description=\"Title of the news article\")\n link: str = Field(description=\"URL to the news article\")\n description: str = Field(description=\"Summary/description of the news article\")\n\n# Schema that contains a list of news items\nclass ListNewsSchema(BaseModel):\n news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")",
+ "outputs": [],
"metadata": {
"id": "dlrOEgZk_8V4"
},
- "outputs": [],
- "source": [
- "from pydantic import BaseModel, Field\n",
- "from typing import List\n",
- "\n",
- "# Schema for a single news item\n",
- "class NewsItemSchema(BaseModel):\n",
- " title: str = Field(description=\"Title of the news article\")\n",
- " link: str = Field(description=\"URL to the news article\")\n",
- " description: str = Field(description=\"Summary/description of the news article\")\n",
- "\n",
- "# Schema that contains a list of news items\n",
- "class ListNewsSchema(BaseModel):\n",
- " news: List[NewsItemSchema] = Field(description=\"List of news articles with their details\")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 3,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "### 🚀 Initialize `scrapegraph` and `tavily` tools and `langgraph` prebuilt agent and run the `extraction`",
+ "outputs": [],
"metadata": {
"id": "cDGH0b2DkY63"
},
- "source": [
- "### 🚀 Initialize `scrapegraph` and `tavily` tools and `langgraph` prebuilt agent and run the `extraction`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n\n\n> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n\nYou can find more info in the [official langchain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n\n",
+ "outputs": [],
"metadata": {
"id": "M1KSXffZopUD"
},
- "source": [
- "Here we use `SmartScraperTool` to extract structured data using AI from a webpage.\n",
- "\n",
- "\n",
- "> If you already have an HTML file, you can upload it and use `LocalScraperTool` instead.\n",
- "\n",
- "You can find more info in the [official langchain documentation](https://python.langchain.com/docs/integrations/tools/scrapegraph/)\n",
- "\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 21,
+ "source": "from langchain_scrapegraph.tools import SmartScraperTool\nfrom langchain_community.tools import TavilySearchResults\n\n# Will automatically get SGAI_API_KEY from environment\n# Initialization without output schema\n# smartscraper_tool = SmartScraperTool()\n\n# Since we have defined an output schema, let's use it\n# This will force the tool to have always the same output structure\nsmartscraper_tool = SmartScraperTool(llm_output_schema=ListNewsSchema)\n\n# Initialize tavily tool to look for URLs\ntavily_tool = TavilySearchResults(\n max_results=1,\n name=\"urls_finder\",\n description=\"Use this tool to find webpages urls that satisfy the user request\",\n)\n\n",
+ "outputs": [],
"metadata": {
"id": "ySoE0Rowjgp1"
},
- "outputs": [],
- "source": [
- "from langchain_scrapegraph.tools import SmartScraperTool\n",
- "from langchain_community.tools import TavilySearchResults\n",
- "\n",
- "# Will automatically get SGAI_API_KEY from environment\n",
- "# Initialization without output schema\n",
- "# smartscraper_tool = SmartScraperTool()\n",
- "\n",
- "# Since we have defined an output schema, let's use it\n",
- "# This will force the tool to have always the same output structure\n",
- "smartscraper_tool = SmartScraperTool(llm_output_schema=ListNewsSchema)\n",
- "\n",
- "# Initialize tavily tool to look for URLs\n",
- "tavily_tool = TavilySearchResults(\n",
- " max_results=1,\n",
- " name=\"urls_finder\",\n",
- " description=\"Use this tool to find webpages urls that satisfy the user request\",\n",
- ")\n",
- "\n"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 21,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "We then initialize the `llm model` we want to use in the agent\n\n",
+ "outputs": [],
"metadata": {
"id": "W54HVoYeiJbG"
},
- "source": [
- "We then initialize the `llm model` we want to use in the agent\n",
- "\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 22,
+ "source": "# First we initialize the llm model we want to use.\nfrom langchain_openai import ChatOpenAI\n\nllm_model = ChatOpenAI(model=\"gpt-4o\", temperature=0)",
+ "outputs": [],
"metadata": {
"id": "ctrkEnltiBCD"
},
- "outputs": [],
- "source": [
- "# First we initialize the llm model we want to use.\n",
- "from langchain_openai import ChatOpenAI\n",
- "\n",
- "llm_model = ChatOpenAI(model=\"gpt-4o\", temperature=0)"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 22,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Here we use `create_react_agent` to quickly use one of the prebuilt agents from `langgraph.prebuilt` module\n\nYou can find more info in the [official langgraph documentation](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)\n\n",
+ "outputs": [],
"metadata": {
"id": "M0WY2Pa8Y8Pk"
},
- "source": [
- "Here we use `create_react_agent` to quickly use one of the prebuilt agents from `langgraph.prebuilt` module\n",
- "\n",
- "You can find more info in the [official langgraph documentation](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/)\n",
- "\n"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 27,
+ "source": "from langgraph.prebuilt import create_react_agent\nfrom langgraph.checkpoint.memory import MemorySaver\n\n# List of tools we want the agent to use\ntools = [smartscraper_tool, tavily_tool]\n\n# We set up the agent's memory to review the different reasoning steps\nmemory = MemorySaver()\n\n# Add a configuration to specify where to store the graph states\nconfig = {\"configurable\": {\"thread_id\": \"1\"}}\n\n# Initialize the ReAct agent\ngraph = create_react_agent(\n model=llm_model,\n tools=tools,\n checkpointer=memory,\n)",
+ "outputs": [],
"metadata": {
"id": "Zo1BcIlHhcQP"
},
- "outputs": [],
- "source": [
- "from langgraph.prebuilt import create_react_agent\n",
- "from langgraph.checkpoint.memory import MemorySaver\n",
- "\n",
- "# List of tools we want the agent to use\n",
- "tools = [smartscraper_tool, tavily_tool]\n",
- "\n",
- "# We set up the agent's memory to review the different reasoning steps\n",
- "memory = MemorySaver()\n",
- "\n",
- "# Add a configuration to specify where to store the graph states\n",
- "config = {\"configurable\": {\"thread_id\": \"1\"}}\n",
- "\n",
- "# Initialize the ReAct agent\n",
- "graph = create_react_agent(\n",
- " model=llm_model,\n",
- " tools=tools,\n",
- " checkpointer=memory,\n",
- ")"
- ]
+ "cell_type": "code",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": 27,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "markdown",
+ "source": "Let's visualize the `graph`",
+ "outputs": [],
"metadata": {
"id": "_UYcJ2Mxip5w"
},
- "source": [
- "Let's visualize the `graph`"
- ]
+ "cell_type": "markdown",
+ "isExecuting": false,
+ "stdinRequest": null,
+ "executionState": "idle",
+ "execution_count": null,
+ "executionEndTime": null,
+ "executionStartTime": null
},
{
- "cell_type": "code",
- "execution_count": 28,
- "metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 350
- },
- "id": "2FIKomclLNFx",
- "outputId": "11374a54-6cab-4037-a836-46e3d0dcb946"
- },
+ "source": "from IPython.display import Image, display\n\ndisplay(Image(graph.get_graph().draw_mermaid_png()))",
"outputs": [
{
- "output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAHwAAAFNCAIAAABNLZxVAAAAAXNSR0IArs4c6QAAIABJREFUeJztnXlAU1cW/+/LvrOFNWERVMCdEarFrS4VUeo2VXFrtdqpW6f92dra0aozU63j1Lr+uqFVK1ZUVByZKrVgVRQ3VEDZBNkiBMhGNrLn90f8UUcj8l7uy0sgn79Cknfu4cvlvPvuPfdcxGq1Ag/OhUS0Az0Rj+gE4BGdADyiE4BHdALwiE4AFOgWLRbQXKfTKE1apdlsshh07jEkpTFIDDaJzaNwvSk+QTRc20JgjdMtJlB6s63mvqa+QiuIYtJZZDaP7M2n6XVmKPbxxmoFSqlRqzTTWaTmen3kQHbkAE5IFAOPtuCIfutXWfktlbAPM3IgJzyWBcMxImmTGGvua6Rig1JqHPEGPyCMDte+o6LX3tfkHBYPHuM9fLIfPK9cBdHD9mtnJcERzFEz+RDNOiT6rV9l8hbj2FkBVDoC0SdXo65Ue/FEy9xPwuhMOOMO7KIX5sqNeku37ODPo5Kbjm6rX/z3XlQahO6FUfTcjBYWh/xqSo9QvIP9n9ekrglj88gO2sHy/3LvkoJGJ/U0xQEA89eGH/1XneN2UIv+uKpd0WwYNQPmjcVdYLBJU5aG5B5tcdAOatEvn2odMNLbwVbdl+BejHaNueaBxhEj6ESvvKPyDabxQ/B9YHNxElP8rp2VOmIBnegP76pHvuHvSHtdp6mpqbGxkajLO8E3iBbZn111T43ZAgrRW0V6tcLE9nb03t0VRCLR1KlTS0tLCbn8pQSEMyrvqjBfjkL0RyWaXgPYmFtChclkwjaWtV2F+fIu0msAu+Y+9rCOYpz+331NiSl+0GfgdDrd1q1bL1++DACIi4v7+OOPrVbr1KlTO76QkpKyadMmg8GQlpaWk5PT3NzM5/OnTJny3nvvkclkAMDs2bOjoqKioqIyMjJ0Ot2BAwfmzp37zOVwfQYA5GW09I7jhkUzMVyLYmq3vlKb5BeEoY3OOXDgQHZ29rJly/h8fnZ2NpPJZLFYX3zxxfr165ctWxYfH+/r6wsAIJPJN27cGD16tFAorKio+PHHH3k83oIFC2xGCgoKdDrdjh07tFpteHj485dDh0JD5M16fEU36CwkEkKhwp9jaWxsZDKZixYtolAo06dPt70ZExMDAIiIiBgyZIjtHTKZfOjQIQR54oBIJMrLy+sQnUKhbNmyhclkvuhy6LC9KJo2jLPWXY3pWpXZ8cdfuyQnJ+t0uvfff7+qqqrzb8pksq1bt06fPn3cuHHV1dVS6R/jtgEDBnQo7hzYPIpWacJ2bVdFt1oAnYmL6ImJibt27ZJKpampqV988YXJZP83kUql8+fPv3nz5vLly/fs2RMbG2s2/9HRnKw4AIBCJSEkjP/3XQ0vLC5Z0WrA1sZLSUxMHD58+NGjR3fs2BEcHLxkyZLnv3Py5EmZTHbw4MGgoCAAQFBQUF0dhGkQzKjkRjoL40xvVy+js0hGvcWCw9KbwWAAAJBIpPnz5/v7+5eXlwMAGAwGAKC1tbXjawqFwsfHx6a47cdOxl3PXw4djdLE5mFcYUZxWUQ/tkZp5vpADjIZGRmXLl2aPHlya2tra2trv379AACBgYECgSA9PZ3JZLa1taWmpsbHxx8/fvzbb78dPHhwXl7e1atXLRaLQqHw9rYzEfT85XQ65CU3AIAXn4rtQhT/IFxfanUx9sewFyEUCg0Gw44dO7KyslJTUxcuXAgAQBBky5YtbDb7q6++Onv2rEwmGzdu3NKlS0+cOLFu3Tqj0Xjw4MGIiIhjx47Ztfn85dDdLslvC4/BuBqM4uHocVX7zfOyGasE2FrqTjRUthfmyqYvxygFivAi6M1ESMBksFJevGSVkpKiVtuZCRo0aFBxcfHz73t5eZ05c6brPmAjPz9//fr1z79vtVqtViuJZOff/cKFC1TqC6OHuFbXN46H2R90y3X3Li
lUMlMnKxhisdhisXTdIIlE6rg34odOp7MbYSwWi8VioVDs9Lzg4OCOB7FnrWks6Vtql26OxOwP6jXSHzfUpH4cxsLnQcktyMtoCYpg9BuOvaejHmmOmuFfdFmBuT13Ryk16bRmRxTHInqfOI7RYCm+0uZIq+7L0X/XT5gX6KARLM9Uo2f6VxWpHVk6cVOOf90w9S8hNIajKUfYk41yfmqOHMDu8yeOgx64C8d3NCQtDPbiQ8hzxv5HS3or8FGJ+tavcsedcHGUUuP3a6tHTfOHojiEBNI7eYqSfEXiG/w+cd2wy7erzdfOSg3t5gnzA6l0aBsoIKRKq2Sma2clep0loh+7V3821xf+RgPnU1+uFdfpiq+0jXjDL3aYQ2OV54G2KUAiMtg2BdAYpOBIJoNFYvMoXB+qyYTiWYlALCarSmHSKs0IAoqvKIR9WH3iONDltgFN9A6kTYaWer26zahVmhES0CghTwc/ePAgLCyMy+XCNctgkehMMptH5vFp4TEsEp4Pf/BFx5ulS5euXLkyLi6OaEew49ldRwAe0QnA/UQXCAS2HCP3xf1Ef/z48dN5AO6I+4nOYrHsLju4Ee7nvVarRbVO4oK4n+g+Pj6enu5s5HK5p6c7G6FQ6Bm9OBuRSOQZvXhAjfuJzuFwPDdSZ6NWqz03UmfD4/E8Pd3ZKJVKT0/3gBr3Ez0oKMgzTnc2YrHYM073gBr3E90zDUAAnmkAD1hwP9FDQ0M94cXZNDQ0eMKLB9S4n+ieFAwC8KRgeMCC+4nuyXshAE/eCwF4ZhkJwDPL6AEL7ie6t7f3i0oluAvuJ3rnhaTcAvcT3TOfTgCe+XQC8PR0AvD0dALw8/Nz957uNpt3k5KSaDQaiUSSyWRsNptKpZJIJCqVmpmZSbRrqHGb4gksFquhocH2ur293fZi2bJlhDqFEbcJL8nJyc88EwmFwjlz5hDnEXbcRvRZs2YJBP9TfHLy5MnQyzI4B7cR3cfHZ9KkSR0/hoaGPn0Gg3vhNqIDAObNmxcaGmp77b7d3M1E5/F4SUlJCIKEh4e7bzfHa/SikptkYoPRAH99J3HwzBu96hITE8XVAADINQpJCMLxpvgG0ig4H68KeZyulBovnZRIGvXhsWyNys2eG+l0sqxZZ7WAPnGc+Nd98GsIpuhqhSnrm8Zx80K4Pm4z/LfLrRwJi0MaPhmXs3qgxnQrOPj32mkrw9xdcQBAQhJfq7bgV3ISmugFv8hGTAuAZY1wEpL4NQ80Oi0uaQfQRG+s1nJ9u9uRmfJmXA68gSa6xYLwfDEezOGa+AUzlHIjHpahia5RGC0W95iw7CIGnRngk9TkTg9H3QaP6ATgEZ0APKITgEd0AvCITgAe0QnAIzoBeEQnAI/oBOARnQC6v+hqtbryYTnRXvwP3V/0pX9JPXcO9yNPUeEGootE9Y5cbjs43KUgbGmtpaV5/4Fvbty4qtGoQ0PD581dPGH8k1wiqVSyZ++/CwtvUKjUoUOHXb6c+/236b16RQEAzvwn8/iJdImkJSgoZPy4SXNmL6TT6Q+rKt7/6ztbt+z+Yd+e6urKwMDg997964gRYwAAqfNS5HJZ1pkTWWdOBAYGZfycTdTv+zSEiW4ym8rLH0yb+qYXz/tyft7mLesFgtDYmP5ms/lv6z6UyaUffLBWJpOk7dsbNyTepvjBQz+cyEyfOSM1PDyyoaH22PGfRI/r/7b2HwAAvV7/93+ufX/VmuCgkAMHv/tiy7qMn7O9vLw3bdz2yaerhgweOuvN+VSaqyxsESZ6SLDg4I8nbDmhycnTZvx5wtWrv8fG9C8ru1/5sHzjhq2vjZkAAKivrz13/j8Gg0GpbDvy84/r120eM3q8zYKfn/+OnV+uWvmx7cf3V60ZN3YiAGDp0lXvLVtQVHxn9KhxMdH9KBSKnx9/4MAhRP2mz0Pkyn1VdeXBQ99XVJQCAMxms0wmBQC0tDYDAEJChLbvCIVhFoulvV1bWHjDZDJt3r
J+85YnR3TbkkckrS22H5kMpu1FYGAwAEAiaSXo13o5hIl+5+6tT9e+Hzck/pM1G9ks9oZNayxWCwBAIAgFAJSU3OvbJwYAUFZ2n8/39/LylsokAIAtm3cG+P/PwbchIcKa2uqn36FSqAAAi8V1U50IE/3w4X0hIcItm3faDjPv6KfRfWMT4of/kLa7ublJ0Sa/eu3S+nWbAQBc7pOz+8LCItC25Wq7TQgbMrYpFb2j+toUNxgM2vY/alu8v2qNUBjWIKrz9vLZu+eALbjHxSUgCHI661iHhY79GJ3DZDClUgluvwcWCOvpQ4bE5+Sc/eXcGR7X68TJIyqVsram2mq1ms3mFavenvXmAoEgFEEQlUqpVqs5HI5QEDpzRurJU0f/tv7/jBzxmlQqyTpz/Mstu2xRqBMGDozLzTv/89GDXC5v2CsjAgIcPZbbcQgT/Z1Fy2VSyZ69/+ZyeSlTZs5+c8HXO7fcvXf7T3EJ8UOHH07fZzKZbN/kcri7d+2PiIhcuWJ1QEDg6dPHbt0q8PPjjxo51p//8pyy9/7yV5lMcjh9n7eXT3R0P1cQHVoC6cFNtZPeEbK9IPwVzWazbdOi1WptbHq89N3U2bMWLF7k7D1d+aebIwewouPhbz1wuWRPvV6/YtXbAQFBgwf9iUqllZTc1el0UVF9ifYLJi4nOoIgE1+fkpeXc+DgdzQarVev3hs3bB09ahzRfsHE5USn0WhzZi+cM3sh0Y7giBvMMnY/PKITgEd0AvCITgAe0QnAIzoBeEQnAI/oBOARnQA8ohMANNF9Q2gutj7jKHQWmUrHpVNCM0qlkqSNOljWXAFRpcY3CJesDWiiRw7iSJv0sKwRjqbN5MWnevvjsh0ZmujRQzlGvbnokgyWQSKxgtyjTa+96Y+Tecj1Xi783EKhknyD6P5CBkSzzgEhISqZUSkzXs9uefvzCK4vXvPe8IthPryrrrmvMRmtksd2oo1er0MQEs2BDDetVkOnM1AVIZXL5TQajc1md/41JpdMoSEhkcxhk/Cq9PIEqxM5e/ZsWlqag0aWLFly584dVJcsWLAgPj7+jTfeyMzMdLB1KLhN2dcOCgoKYmJifHxQlHv68MMPr1y5giAIlUqNiopasWJFYmIinj6+BCc9HInF4m+++QaKqVdffRWV4rYijrYXRqOxtLR03bp1H3zwARRnsOEM0dVqdVpa2ooVK6BYO3LkSF1dHapLwsPDO86jIpFIKpXq6tWrY8aMgeIPBpwhOofD+fzzz2FZu3jxokyGbmDK5/OfuYsymcxLly7BcgktuIv+2Wef1dbWQjS4cePGmJiXpNI9Q0BAAJ1Ot722Wq0BAQFXrlyB6BJa8BV9z54906ZNi4hAnWfbCaGhoUwmE9UlAQEBtktYLFZ6evqgQYMg+oMB9xu9HDx4cNSoUVFRUaiumjlzptlsPnPmDACgrKyMRqOhtQATnIai1dXVBw4cwMPyBx98cOXKFTwsOw1cRNfr9SkpKXhYtlqtVVVVcrncQSNlZWWrV6+G5BFq3C+8wGL//v1RUVGvvfaa85uGL/qDBw+sVuuAAQPgmu0gIyNDKBSOHDkSJ/tOAPLopaamZuPGjfgpDgBobW2tqqqCYqq8vDw/Px+KKXTAjVa3bt1Sq9VwbT6DSCSqq6uDZW3mzJk1NTWwrHURmOFFp9MBABgMd5pJl0gkIpFoyBCnbu2FFl7UavWkSZOcoPi9e/e+/vprWNb4fL6TFYcp+rlz53bv3g3LWicwGIw7d+5ANFhQULBp0yaIBl+Ok8OZ4xiNxqqqKrg233777YaGBrg2OwFOTM/MzExISAgPD4fRDbo/EMJLUVHRL7/84kzFP/vsM5FIBNdmWVkZXIOdAEF0MpkM8c7WFWzP8XBtHjly5Pz583Btvgi3nAZobm5GECQgAOYRHKWlpQUFBUuWLIFo80U4KvqXX345ZMiQ5ORkeC51fxwKLwaD4d69e85XXCaTffXVV9DNXr9+XS
wWQzf7PA6JTqPRjh071oUvQsbX1/fixYvQBaqurj59+jRcm3ZxSPSKioouFl2Bzs6dT6rzQGTkyJG+vjjndgHgUEwXiUQrV660LYB5QAX2nl5aWrp48WKozqBApVKtW7cOutnjx4931JnBD+yiT5w4cfr06VCdQQGXyxWJRPfv34dr9vDhwy0tLXBtPg9G0Q0GQ05ODmxn0LF161boITg1NbUjFww/MMb03NzcnJycbdu24eBS9wfjX5VOp7/11luwnUHNmjVrYC3d2cjPz29sbIRo0C4YRR85ciSuC6FdJCEhITsbZs3iY8eOwU0CtA+G6WCTybR7924c5pmJJz09/dGjR3i3guX5oqys7Pbt2zh0ACxUVVX5+fmhzVh/EfPnz4dip3OwhBcGg7Fq1SocnMGCQqH47LPPYFk7f/68UqmEZe1FYBG9d+/eCQkJODiDhfj4+JiYmObmZijWtm3b5oS5biyiZ2VlQV+4cYQPP/wwMBBCXVGz2TxhwgQvLy8YTnUKhvtAcnKyWCzG4QaDnR07dphMJqK96Cqoe7rJZJo3bx6UngURJpO5f/9+B420trZev34dkkedgVp0CoWyYMECfJzBztKlSyMjIx008ttvvzkntRG16NXV1T///DM+zmCHTCZPmDDB9nr06NHjx4/HYEQgEDhnFQz1OL2kpKS6uroLX3Q206ZNa2pqslW+9/LyunXrFtoh1ujRo3Hz7n9ALfqAAQP69++PjzMYefPNN5uamvT6P0oRUKlUDOtK586dmzhxIqqqA9hAHV569+7dp08ffJzBiE6nsyUMd0AikdCKXldXl5aW5gTFsYh+9OhRZyZDdYVTp04lJCQ8rZfFYkE7LU4mkyE+2XYOatFzc3Of6VaEQ6PRvvvuuzlz5jw9A4O2zwqFQqc9ZqMWfe7cub1798bHGYdYvXr1mjVrhEKh1WrFENN/+umn+nqHTibsOu6RVmc0WDUKU1ccFYvFu3btkkgkGzduFAqFXW9i3rx5Bw4c6NjNjgEEAC8+FelCN0Yt+t69e5cvX+6cGw4AoPKOuuiyQtqo9/KnGQ0W3NqxWixWB1dHeT60x1Wa8FjO0PHewZGd7UhBJ7rBYBgzZkxBQYEjznWdoitt9eXt8RP5HG+XO0biRSilxiunmxOn+IXFvLCAATrR9Xp9Xl6ecx7b7l5UNNcbRkyHmZrrNM4fEL2S5Bsey7L7qYvGdK3S/NvRlrGpwUQ7ghGjznr5VNP05SF2P0UXxVpaWrZv3w7Jsc6QNOrNXbpxuihUBiIT6zVt9pPF0IkulUrv3r0LybHOUMpMAaHutB/1eYR92PIWo92P0IkeEhLyySefQPKqM0xGi64dv7GKM1ArTFaL/X9WdKJ7eXkRXhaoG4BO9JKSku+++w43Z3oKqG+kjx49ws2ZngK6h464uDjHV8U8oBPd19fXORtEujfowsv169cPHz6MmzM9BXSii8VitCVXPTwPuvCSmJjo/Ooo3Q90osPdGd5jQRderly5kpmZiZszPQV0Pb2pqckZGxW6O+h6emJi4rRp03BzxlFKy+4/nf2Cgd8v/TZ2fHx9Pb4dC53oQqEwOjoaN2cc4nzO2ZWrFul0xGybRwU60QsKCn755RfcnHEIB/u4M0Enek1NjatlGtn4Lff8zl1bAQDTZ04YOz7+fM5Z2/ulZff/+uHSpOTEaTPG/2vb35WqJ1tbTCZT2r69b86e9HrS8KV/mZt/9Xe7Zq9fz39n6ZxJk0csemfWqdPQ6n2gu5EmJCRotVpYbUMkfuiw2bMWHD+R/uXmnWw2RygMAwDU1j766ONlERFRn6zZ2KaQHzj4XUuLePtX3wIAvtr+xW+55xbMfyciIuq33HOfb/h41460QYPinrap1Wo3/ePTiPDIj1avr6mpkkpbYXmLTnRXy2LswNvbJyRECACIjR3g5eVtezP9yH4SibTtX3u5HC4AgMvlbdm6oajojo+Pb86v2W8tXLro7fcAAGNGj1
/w1oyDh77/evv/zFrLFTK9Xj9q1LjXJ0BeiEcXXu7evfv77/b/E12Qe0WFcXEJNsUBAAkJrwIAKipLi4rvAABGjhxrex9BkIT44RWVpc9cHhIs6N9/UPqR/SdPZRgMBoiOoRO9rKyssLAQYvO4otGovb3+yG7kcnkAAImkVaNRAwB8vP+YLuXxvLRarUajefpyBEG2btmdNDHlu+93vrVoZlERtLKn6EQfPHjwiBEjYLWNB09nlPD5AUplW8ePcrkMAMDhcPn8AADA0x/JZFIKhfJ8oWAOh/PhB2sPHTzJZnPWf74a1v0Mnej9+/cfPnw4lIahw2QwbR25453+/QfdKyrsyDG+fDkXADBw4JDY2AEIgly/8WR7kcFguH4jv3//QWQymUalPf33sA1DQ4IFM2ekqjVqsRhOrQYyqtq+paWlNTU1AoEAStudIK7Ttastgt72M6TswmCyzvznRG3dIwQgpWUl0dH9IsIjT546eq+okEqlXb+Rv//AN4MGxr391rs8npdY3HQ66xgAiETS+u23O2pqq9d8vCE4WEChUk9nHSuveBAWFsH3839r0UyJpFUqlZzOOmbQ65e8s6LrycCPilWCKIYX384xsuhEv3DhwoMHD5xw2h4G0Xlcnr9/4O+/XygouKJSKZOSUng8r4ED4m7dLjibfbKismzsaxPXfLzBlpebEP+qRqM+d/5MXl4Om8X++KP1ttssl8MNDgq5c/cWCSHF9hsoEtXnX714JT/Pz89/7SebBAIUacCdiI4ure727dttbW3Y9q6h4t4lhVRsemUSH++G8OPC4caE171Do+30G3Tj9Pj4eHhe9VxQTwMUFxfj5kxPAZ3ot27dclrp5W4MuvASHh7OYqG4uXmwCzrRhw0bhpsnPQh04UUkEj148AA3Z3oK6EQvLCw8efIkbs70FNCFF4FAAHe+rWeCepzuGao7Duq0Oug1hXsg6EQvLS09dOgQbs70FNCJLhAIBg8ejJszPQV0MT06Otpl817cCHQ9XSKROKe0GI1OorNwr2OOK1wfKomM2P0I3S/W1NS0b98+SF51Bs+P2lzjBrlanVBXpvYNotn9CJ3oQUFBr7/+OiSvOiMwjE6m2O8mboFWaQ6KYDA59muFuGhtAABAxW1V6Q3VhAX2t9e7OGf21ie9HegvtF89Bl1P12g0P/30EyTHXkJ0PHfoeO9z+0UtDTq9m+ye1ipN4tr2zB21U5YEv0hx1D1do9EkJydfvnwZkpMvp6lGdzdP8bhaSyIhBj1M6TEUV+scn0Bau8oUHstOSPLl+nQ2LEQ3ZGSz2U6u+RrcixG8JAgAYDbCDINtbW2pqannzp2DaNMCAJXapfuQ68Z0XNHpdIcPH3733XcJaR216CdPnkxOTvasHzkC6qB24sQJJxxKgzd6vT4rK4uo1lGLPmPGjOdz/twOrVa7d+9eolrvoTFdr9fn5+c7IWvKLqhFv3nzppeXl2fayxFQh5fi4uK8vDx8nHEearWawF3IqEUfPXr0wIED8XHGebS0tBByULONHhrT5XJ5YWFhx4kOTga16FKpNCsra8mSJbi51P1BHV44HI7jh9sQTnl5OYGH2KIWnU6nL1++nKhD1GFx/fr1yspKolrvoTH95s2bHA6nX79+hLSORfSCggI/P7++ffvi41L3B8uEcn19PYETF1DYt2+fVColqnUsoo8aNYqof0womM3mH374wc/PjygHemJMVygUBQUFzim9bxeMon/zzTdz5swhsLO4NRgXCdVq9bVr12A74ySysrJKSkoIdABjT29tbW1ra3PNA49eyqRJkw4fPuzv70+UAz0upmu12uLiYmIrHGDPQdi+fXtTUxNUZ5wBi8UivKYEdtHZbHZ2djZUZ5zBP/7xj8ePHxPrA/ZDmxYuXFhTUwPVGdypqKh4+PChE6p4dE7PiulyuZxKpXI4HGLdcEj0zMxMlUq1ePFiqC51fxxK5ktOTs7NzYXnDL6kpaU5J7n+pf
Sg8JKampqRkUG0FwCC6FqttqWlJSIiAp5L3R9Hc4VZLNbmzZvv3IFWPg8PDAaDS5UIhpCg/emnn7p45Z0NGzZQqXZqaRFF94/pcrm8qqrKaYd2dwU4WxFEItH3338PxRR0fHx8XEpxaKILhUKRSORScdPGrl27nLZJquvADC9qtZrwh72naWho+O9//7ts2TKiHXkWmKIrlUqTyeQ5Z+2lwNxexuPxPv30UxcZPmZmZjrnkGAMQN5/v3fv3urqarg2MZCbm/vo0aO4uLgufJcAuv+Q0QXBpdLE9u3b4W7RREV2dvYz5eddDVxE/+ijjwoLC5ubm20/Tp48GdchxNNbh9auXUun09lsNn7NOQ7u4SUpKUkqlQqFwvT0dDwGlPv27bMde3379m21Wm2xWHg8HvRW4IJjIZuSkpJhw4bZUgZ1Oh1Oh1M/ePDA1m+GDh06ffp011ccX9EXLVpkNpttr+VyOU754NXV1QiC2M4NUSgURO1SRAUuok+aNGno0KE2LWyYzWY8jgKrqKgwmUxPv9PW1vbKK69AbwguuIh+/vz5Xr16cTgci+VJsRCr1VpeXg69obq6OqVS+fQ7AoFg7Nix0BuCC/YUjM7JzMzMzs7OyMgQi8VyuRxBEKVSKRaLg4KCILZSVFTU3t6OIAiJRAoODk5MTJw5c6bLHjvWAV6iAwBSUlJSUlIuXbqUkZFRW1vb1tZWW1sLV/SSkhIKhRISEpKUlDR16tSQEPeoPeXQkFFU2V5T2t7SoNOqTO1qM4IgRr3Z7jetVqvFYiGT7RcSw4zZbEYQhISQwAuK2/gGMdrVRiaH4htED4qgRQ1ks71w7GddBIvoaoXp1gVF2Y02tg+dG8ChMihUOplCJ5MpJOBycwqIUW8y6U1mo0Ul0aqlWq4PdfBor37DuET6hEp0sxlcPNb66L46qC+fw2e+qNijK6NTGWT1bQatfswM/14DiSkVhEL0ugr95VMSli/LL8wNHkA6R68xSmsVPF/S5EU+oktuAAAFcUlEQVQBUKundYmuin7/mvLWBUWvVwhOvYSLXKTSKdRz16A4pwsKXRK9rqL90ilZ2BCYAw8XQSvXa1oVsz5w6rDn5f9aj+5rrmR1T8UBACwfOivA++dtDc5s9CWiqxWm3462CAd1T8VtsL3pTF/ur0danNbiS0T/735x+JBgZzlDGD4CrqzFUnPfSUsfnYlefltpBhQ6x4US0vDDJ9T78mmJc9rqTPT8LKl/ZE/Jp6CzqTQ2vfS6sgvfdZQXil5dpGH7sagMyA/uUDhyYsO/ds2GbtY31Ls4n1DRK++qmTy3L3qJCjqHqlKYVHJTF77rEC8UvbZUzQtw6eVdPODwWY/uq/Fuxf6UW0u93k/AJuFzQIJM3vifczsrq29SKXRBSHTyhGWhgn4AgANH1vjzw8lkyo3bWSazMbbviJlvfMJkPFnLvldy4deL++SKpkD/SKsVrxr2XD9Wqwj3Sln2e7pGZTIacPnFlErJ3rR3tVrltMmrpyStMpuN/3ffe03NT5LCLl09IpM3vrNg+/TJq4vv5+b+fsD2/p2inPTj63kcv+mTP4ruM7xR/BAP3wAAJApJ8liPk/EO7Pd0rdJMpuByC71w6UcO2/e9xXvJZAoAYOjg5K07/3zj9pnpU1YDAPz9wua9+XcEQcKE/YtLL1ZUXU8B7xuN+jO/fB0ZHvfu23tsM/ISaQNOulPoZK0K95huX3STwUJl4TI8L6+8pmhr/ts/X+t4x2w2KpRP0pKoVEbHcravd3BtfTEAoKauSKNVjEpM7VgDIZHwGlNRGRQmF/fnEvuik8iIQYvLH1yllvaLHjll4sqn32TQ7SQhkclUi8UMAJC3iW1/Azz8eQazwaxR4H72p33RWTyKxYTLMzGLydNo2wL8UWyB5LB9AABqrQIPf57BqDczubiv59m/kbK5ZIsJlxtpn8iE2vqihsd/5MDoDS8ZLYQE9UEQ0p0iZ5zibjKYud4EhZeAcI
ayVYdHe6+PXVpWeTXt0F9Hj5jHZfuWPyywWMyL5/+7k0t8vINe+dMbNwrPmEz66D6vKlWSssqrXA4u5cPaFbrwPvbPPoPIC2I6CYREsVSSdi6fCbc9vp9w1btpZ3N25106CBBEGBwzYvisl141fcpHFArtbnFORdWNXmGDQ4L6qtS4VFXUyLRRAwPxsPw0L1w5un+1reSGLjiWj7cHroNRZ66/07jkn7hvuX/hTSN2mNeNXzu7d2m1yi07Ztj9iO8rlMhEz7/fP2b03D9vxOSnHdp16s3bp9n9iMPytnvjfW3E/AmvvfMig21i9YARzlhz72yN9PovsoYai3+kj91PLRaLok38IrPAXgYMjca0DUWg0IkDJpORQrFzP2QyuEym/YwXqwWU5tWs3O6M+nsvWZj+Zk11zJhwd8xvQYu4Utp3IPVP46D1iU54yXLdxAVBLVVOWk8hEJ3KSLYanaP4y0XvPZgd3ocmqZE7xxtCsFpA9Q3RrA+dl9Lz8hSMV6f4BoeSmx92W90flzQt2tDLmS12KaVsxBs+3j7m5oeEFRzHCb3aeP/XmhkrgtleTl2VRJHLWJireFSq5wbyGFzcn9mcgLReqWlVLlwXjjh9lIAua/dxVXve8VYSlRrQx49Kd8U1664gE6laqmQDEr1HTiMm1wFLfnpFobrkmkopM3L82F5BbCqD4vpjSrPRopa2q1o1WoUuPJY95s98Bsvp2br/H+w7MVpF+od3NY21+pY6LYmEUJlkGpOC09wkZhgcmrJVq9eafYMZXB9K9FB2rwEcKo3gLgJnx7S+3aJVmvQ6K3Cx8g4kCsLikNk8CkJYt7aDpwoGAbhSB+gxeEQnAI/oBOARnQA8ohOAR3QC+H+6fjfnM2J1xQAAAABJRU5ErkJggg==\n",
"text/plain": [
"