From 81c0b4a14da4ea53d13562a91fd9994685150f98 Mon Sep 17 00:00:00 2001 From: grexrr Date: Sat, 15 Nov 2025 16:27:53 +0000 Subject: [PATCH 1/3] add notebook --- notebooks/openstreetmap_rag_pipeline.ipynb | 673 +++++++++++++++++++++ 1 file changed, 673 insertions(+) create mode 100644 notebooks/openstreetmap_rag_pipeline.ipynb diff --git a/notebooks/openstreetmap_rag_pipeline.ipynb b/notebooks/openstreetmap_rag_pipeline.ipynb new file mode 100644 index 0000000..6fab4f8 --- /dev/null +++ b/notebooks/openstreetmap_rag_pipeline.ipynb @@ -0,0 +1,673 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Qo0Iu0MMaOTU" + }, + "source": [ + "# OpenStreetMap RAG pipeline" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MGH9HBLVagni" + }, + "source": [ + "> [OpenStreetMap](https://www.openstreetmap.org/) is a map of the world, created by people like you and free to use under an open license.\n", + "\n", + "OpenStreetMap has already provided [Overpass Api](https://wiki.openstreetmap.org/wiki/Overpass_API) queries\n", + "\n", + "notebook 会展示:基础 OSM 查询 → LLM summarization → Agent + tools\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nn5lixpbbCEP" + }, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6266U8y-avQ9" + }, + "source": [ + "## Setup Environment\n", + "- Install `haystack-ai`, `osm-integration-haystack`\n", + "- Setup `OPENAI_API_KEY`\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "u-Acpx6na0DA" + }, + "outputs": [], + "source": [ + "!pip install -q haystack-ai osm-integration-haystack" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6KSIfsm4ayYi" + }, + "source": [ + "## Add Your API Keys" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "dsCD2okqfEOg", + "outputId": "fe35ba66-9a57-4cf6-e3ae-ecd21801e76f" + }, + "outputs": 
[ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Enter OpenAI API key:··········\n" + ] + } + ], + "source": [ + "import os\n", + "from getpass import getpass\n", + "\n", + "if \"OPENAI_API_KEY\" in os.environ:\n", + " del os.environ[\"OPENAI_API_KEY\"]\n", + "\n", + "if \"OPENAI_API_KEY\" not in os.environ:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2SVEZnuVdpXl" + }, + "source": [ + "## Part1: Setup Environment\n", + "\n", + "这个部分是使用LLM agent处理OpenStreetMap数据的前置步骤。`OSMFetcher`来自'osm_integration_haystack'负责通过封装的`Overpass API` queries..." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NgzxySwdjHFs" + }, + "source": [ + "### 1.1 From Name [String] to Coordination [Tuple]\n", + "\n", + "这个部分示范了使用[Nominatim](https://nominatim.org/)来decode为具体坐标。这个子章节主要说明了数据输入的一种alternative。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "pOfsHARDjRzN" + }, + "outputs": [], + "source": [ + "!pip install -q geopy" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "z7r0PZf1jUpL", + "outputId": "067fd540-1e3f-468d-9386-ee76742614ac" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Query: saints peter and paul's catholic church\n", + "Latitude: 51.8989077\n", + "Longitude: -8.4743188\n", + "Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland\n" + ] + } + ], + "source": [ + "from geopy.geocoders import Nominatim\n", + "\n", + "geolocator = Nominatim(user_agent=\"haystack-osm-cookbook-demo\")\n", + "\n", + "# Geo-decoding a name string into geocode\n", + "location_name = \"saints peter and paul's catholic church\"\n", + "location = geolocator.geocode(location_name)\n", + "\n", + "print(f\"Query: 
{location_name}\")\n", + "print(f\"Latitude: {location.latitude}\")\n", + "print(f\"Longitude: {location.longitude}\")\n", + "print(f\"Display name: {location.address}\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4nFzlApoj8_L" + }, + "source": [ + "实际的Geocoding/decoding往往还需要考虑歧义和命中率等问题,预处理并非是本教程的重点。在一般的map-based软件实践中,出于对精度的考虑,后端接受的往往是一组实际的(latitude, longitude) tuple。在此我们就直接使用Saints Peter and Paul's Catholic Church的坐标,并尝试寻找200米内最近的咖啡店。" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "slQpVtXHWvIH" + }, + "outputs": [], + "source": [ + "from osm_integration_haystack import OSMFetcher\n", + "\n", + "CENTER = (51.8989077, -8.4743188) # (lat, lon)\n", + "RADIUS_M = 1000" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "4hO-qR8Lewgj" + }, + "outputs": [], + "source": [ + "osm_fetcher = OSMFetcher(\n", + " preset_center=CENTER, # Cork, Ireland\n", + " preset_radius_m=RADIUS_M, # 200m radius\n", + " target_osm_types=[\"node\"], # Only search nodes\n", + " target_osm_tags=[\"amenity\"], # Search amenity types\n", + " maximum_query_mb=2, # Limit query size\n", + " overpass_timeout=20\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "y_x4EHRKv9Te", + "outputId": "70584bae-dfc1-4863-fd25-c053d21b4921" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:20][maxsize:2000000];\n", + " (\n", + " node[amenity](around:1000,51.8989077,-8.4743188);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-15T15:10:27Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] 
Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 955 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n" + ] + } + ], + "source": [ + "result = osm_fetcher.run() # Haystack component 标准接口\n", + "documents = result[\"documents\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DgwnILdAwCyl", + "outputId": "1b7e6004-ba01-4c4c-ef06-96cee27e5ab9" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "📄 type: \n", + "\n", + "--- content ---\n", + "Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00\n", + "\n", + "--- meta keys ---\n", + "['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']\n", + "\n", + "--- full meta ---\n", + "{'address': {'housenumber': '6-7',\n", + " 'postcode': 'T12 FH27',\n", + " 'street': \"Carey's Lane\"},\n", + " 'category': 'restaurant',\n", + " 'distance_m': 27.86087599824802,\n", + " 'lat': 51.8990101,\n", + " 'lon': -8.4739482,\n", + " 'name': 'Koto',\n", + " 'osm_id': 5203928867,\n", + " 'osm_type': 'node',\n", + " 'source': 'openstreetmap',\n", + " 'tags': {'amenity': 'restaurant',\n", + " 'contact:facebook': 'https://www.facebook.com/KotoCork/',\n", + " 'contact:instagram': 'https://www.instagram.com/kotocork',\n", + " 'cuisine': 'asian',\n", + " 'email': 'info@koto.ie',\n", + " 'opening_hours': 'Mo-Su 12:00-22:00',\n", + " 'phone': '+353-21-4274172',\n", + " 'smoking': 'no',\n", + " 'website': 'https://koto.ie/'},\n", + " 'tags_norm': {'amenity': 'restaurant',\n", + " 'contact_facebook': 'https://www.facebook.com/KotoCork/',\n", + " 'contact_instagram': 'https://www.instagram.com/kotocork',\n", + " 'cuisine': 'asian',\n", + " 'email': 'info@koto.ie',\n", + " 'opening_hours': 'Mo-Su 12:00-22:00',\n", + " 'phone': '+353-21-4274172',\n", + " 'smoking': False,\n", + " 'website': 
'https://koto.ie/'}}\n" + ] + } + ], + "source": [ + "from pprint import pprint\n", + "\n", + "first_doc = documents[0]\n", + "print(\"📄 type:\", type(first_doc))\n", + "\n", + "print(\"\\n--- content ---\")\n", + "print(first_doc.content)\n", + "\n", + "print(\"\\n--- meta keys ---\")\n", + "print(list(first_doc.meta.keys()))\n", + "\n", + "print(\"\\n--- full meta ---\")\n", + "pprint(first_doc.meta)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hEYMy0ZKwKcy", + "outputId": "0e83a3a3-a7ad-4943-950f-0b3932dce2c4" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Previewing first 5 documents:\n", + "\n", + "1. Koto\n", + " Type: restaurant\n", + " Distance: 27.9 m\n", + " Location: (51.8990101, -8.4739482)\n", + " Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00\n", + "\n", + "2. Dukes\n", + " Type: cafe\n", + " Distance: 28.7 m\n", + " Location: (51.8991234, -8.474089)\n", + " Content: Cafe: Dukes, Carey's Lane, 4, Cork.\n", + "\n", + "3. Soba Asian Street Food\n", + " Type: fast_food\n", + " Distance: 30.1 m\n", + " Location: (51.8989516, -8.4738856)\n", + " Content: Fast_food: Soba Asian Street Food.\n", + "\n", + "4. OffBeat Donuts\n", + " Type: fast_food\n", + " Distance: 35.1 m\n", + " Location: (51.8990968, -8.4739097)\n", + " Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.\n", + "\n", + "5. Burritos and Blues\n", + " Type: fast_food\n", + " Distance: 43.6 m\n", + " Location: (51.899271, -8.4745565)\n", + " Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. 
Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...\n", + "\n" + ] + } + ], + "source": [ + "def preview_documents(docs, limit=5):\n", + " print(f\"Previewing first {min(len(docs), limit)} documents:\\n\")\n", + "\n", + " for i, doc in enumerate(docs[:limit], start=1):\n", + " name = doc.meta.get(\"name\", \"Unknown\")\n", + " category = doc.meta.get(\"category\", \"Unknown\")\n", + " distance = doc.meta.get(\"distance_m\", 0.0)\n", + " lat = doc.meta.get(\"lat\")\n", + " lon = doc.meta.get(\"lon\")\n", + "\n", + " print(f\"{i}. {name}\")\n", + " print(f\" Type: {category}\")\n", + " print(f\" Distance: {distance:.1f} m\")\n", + " print(f\" Location: ({lat}, {lon})\")\n", + " print(f\" Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}\")\n", + " print()\n", + "\n", + "preview_documents(documents, limit=5)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R3WqXmSzyIbx" + }, + "source": [ + "## Part2: Pipeline to look for the nearest coffee shop" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kzPKYWR4y3rb" + }, + "outputs": [], + "source": [ + "from haystack import Pipeline\n", + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "from haystack.dataclasses import ChatMessage\n", + "from haystack.utils import Secret" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "t1AFay1Wy6HJ" + }, + "outputs": [], + "source": [ + "prompt_template = [\n", + " ChatMessage.from_system(\n", + " \"You are a geographic information assistant. 
\"\n", + " \"Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query.\"\n", + " ),\n", + " ChatMessage.from_user(\n", + " \"\"\"\n", + " User location: {{ user_location }}\n", + " Search radius: {{ radius }}m\n", + " User query: {{ query }}\n", + "\n", + " Available location data:\n", + " {% for document in documents %}\n", + " - {{ document.content }}\n", + " Location: ({{ document.meta.lat }}, {{ document.meta.lon }})\n", + " Distance: {{ document.meta.distance_m }}m\n", + " Type: {{ document.meta.category }}\n", + " {% endfor %}\n", + "\n", + " Please:\n", + " 1. Find all locations that are relevant to the user's query\n", + " 2. Sort them by distance\n", + " 3. Recommend the nearest 3 locations\n", + " 4. Provide a short description for each\n", + "\n", + " Please respond in English.\n", + " \"\"\"\n", + " ),\n", + "]\n", + "\n", + "prompt_builder = ChatPromptBuilder(\n", + " template=prompt_template,\n", + " required_variables=[\"user_location\", \"radius\", \"query\", \"documents\"], # optional, depends on what your pipeline requires\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "2m5fzhegy8MT" + }, + "outputs": [], + "source": [ + "llm = OpenAIChatGenerator(\n", + " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", + " model=\"gpt-4o-mini\",\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "o9HPsQUky-V2", + "outputId": "79f5ddc4-7482-4045-eb03-5b86792d32d2" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\n", + "🚅 Components\n", + " - osm_fetcher: OSMFetcher\n", + " - prompt_builder: ChatPromptBuilder\n", + " - llm: OpenAIChatGenerator\n", + "🛤️ Connections\n", + " - osm_fetcher.documents -> prompt_builder.documents (List[Document])\n", + " - prompt_builder.prompt -> llm.messages (list[ChatMessage])" + ] + }, + "execution_count": 152, + 
"metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "coffee_pipeline = Pipeline()\n", + "coffee_pipeline.add_component(\"osm_fetcher\", osm_fetcher)\n", + "coffee_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", + "coffee_pipeline.add_component(\"llm\", llm)\n", + "\n", + "# documents to prompt_builder\n", + "coffee_pipeline.connect(\"osm_fetcher.documents\", \"prompt_builder.documents\")\n", + "# ChatPromptBuilder output toward prompt(List[ChatMessage]) as llm.messages\n", + "coffee_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "gLebpkTG1JD8" + }, + "outputs": [], + "source": [ + "search_query = \"coffee shop\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "O87vWjTSzAI0", + "outputId": "e85bfcea-00f0-4a48-bb3e-415183e5df12" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:20][maxsize:2000000];\n", + " (\n", + " node[amenity](around:1000,51.8989077,-8.4743188);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-15T15:11:30Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 955 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "Role: ChatRole.ASSISTANT\n", + "\n", + "Assistant reply:\n", + "\n", + "Based on your query for coffee shops in Cork within a 1000m radius from your location, here are the nearest three options:\n", + "\n", + "1. 
**Dukes, Carey's Lane, 4, Cork** \n", + " - **Distance:** 28.70m \n", + " - **Description:** A cozy café located on Carey's Lane, perfect for grabbing a quick coffee or enjoying a light snack in a relaxed atmosphere.\n", + "\n", + "2. **Plus & Minus, Cork** \n", + " - **Distance:** 45.59m \n", + " - **Description:** This café offers a range of coffee options alongside an inviting ambiance, ideal for both work and socializing.\n", + "\n", + "3. **Rebel Coffee Cork, French Church Street, 4, Cork** \n", + " - **Distance:** 53.15m \n", + " - **Description:** A charming coffee shop that focuses on quality brews and has a menu offering light bites, creating a perfect environment for coffee lovers.\n", + "\n", + "These establishments are the closest coffee options for you in Cork, making them excellent choices for your coffee cravings. Enjoy!\n" + ] + } + ], + "source": [ + "user_location = \"Cork, Ireland\"\n", + "radius = 1000\n", + "\n", + "result = coffee_pipeline.run(\n", + " {\n", + " \"osm_fetcher\": {},\n", + " \"prompt_builder\": {\n", + " \"user_location\": user_location,\n", + " \"radius\": radius,\n", + " \"query\": search_query,\n", + " },\n", + " }\n", + ")\n", + "\n", + "reply = result[\"llm\"][\"replies\"][0]\n", + "print(\"Role:\", reply.role)\n", + "print(\"\\nAssistant reply:\\n\")\n", + "print(reply.text)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wXA_fXMML6cT" + }, + "source": [ + "## Part 3 : Planning an afternoon itinerary with an Agent and OSM tools\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 177, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "94GznuQYL7i0", + "outputId": "ff520151-27a4-4ba9-d29d-2420f36873c3" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\n", + "Inputs:\n", + " - center: Optional[Tuple[float, float]]\n", + " - radius_m: Optional[int]\n", + "Outputs:\n", + " - documents: List[Document]" + ] + }, + "execution_count": 177, + "metadata": 
{}, + "output_type": "execute_result" + } + ], + "source": [ + "from osm_integration_haystack import OSMFetcher\n", + "\n", + "ITINERARY_CENTER = (51.898403, -8.473978) # (lat, lon)\n", + "ITINERARY_RADIUS_M = 1000 # 1000m 半径\n", + "\n", + "itinerary_fetcher = OSMFetcher(\n", + " preset_center=ITINERARY_CENTER,\n", + " preset_radius_m=ITINERARY_RADIUS_M,\n", + " target_osm_types=[\"node\"],\n", + " target_osm_tags=[\n", + " \"amenity\", # cafe / bar / pub / restaurant / place_of_worship\n", + " \"tourism\", # 景点、博物馆\n", + " \"leisure\", # 公园等\n", + " ],\n", + " maximum_query_mb=4,\n", + " overpass_timeout=30,\n", + ")\n", + "\n", + "itinerary_fetcher\n" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} From 0e0622e257a67e3067863e05eff48fd037356f99 Mon Sep 17 00:00:00 2001 From: grexrr Date: Sat, 15 Nov 2025 18:25:28 +0000 Subject: [PATCH 2/3] update --- notebooks/openstreetmap_rag_pipeline.ipynb | 1653 ++++++++++++-------- 1 file changed, 1007 insertions(+), 646 deletions(-) diff --git a/notebooks/openstreetmap_rag_pipeline.ipynb b/notebooks/openstreetmap_rag_pipeline.ipynb index 6fab4f8..9370d1a 100644 --- a/notebooks/openstreetmap_rag_pipeline.ipynb +++ b/notebooks/openstreetmap_rag_pipeline.ipynb @@ -1,673 +1,1034 @@ { - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "Qo0Iu0MMaOTU" - }, - "source": [ - "# OpenStreetMap RAG pipeline" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MGH9HBLVagni" - }, - "source": [ - "> [OpenStreetMap](https://www.openstreetmap.org/) is a map of the world, created by people like you and free to use under an open license.\n", - "\n", - "OpenStreetMap has already provided [Overpass Api](https://wiki.openstreetmap.org/wiki/Overpass_API) queries\n", - "\n", - "notebook 会展示:基础 OSM 查询 → LLM summarization → Agent + tools\n" - 
] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "nn5lixpbbCEP" - }, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6266U8y-avQ9" - }, - "source": [ - "## Setup Environment\n", - "- Install `haystack-ai`, `osm-integration-haystack`\n", - "- Setup `OPENAI_API_KEY`\n", - "\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "u-Acpx6na0DA" - }, - "outputs": [], - "source": [ - "!pip install -q haystack-ai osm-integration-haystack" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "6KSIfsm4ayYi" - }, - "source": [ - "## Add Your API Keys" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "dsCD2okqfEOg", - "outputId": "fe35ba66-9a57-4cf6-e3ae-ecd21801e76f" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Enter OpenAI API key:··········\n" - ] - } - ], - "source": [ - "import os\n", - "from getpass import getpass\n", - "\n", - "if \"OPENAI_API_KEY\" in os.environ:\n", - " del os.environ[\"OPENAI_API_KEY\"]\n", - "\n", - "if \"OPENAI_API_KEY\" not in os.environ:\n", - " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "2SVEZnuVdpXl" - }, - "source": [ - "## Part1: Setup Environment\n", - "\n", - "这个部分是使用LLM agent处理OpenStreetMap数据的前置步骤。`OSMFetcher`来自'osm_integration_haystack'负责通过封装的`Overpass API` queries..." 
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NgzxySwdjHFs" - }, - "source": [ - "### 1.1 From Name [String] to Coordination [Tuple]\n", - "\n", - "这个部分示范了使用[Nominatim](https://nominatim.org/)来decode为具体坐标。这个子章节主要说明了数据输入的一种alternative。" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "pOfsHARDjRzN" - }, - "outputs": [], - "source": [ - "!pip install -q geopy" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "z7r0PZf1jUpL", - "outputId": "067fd540-1e3f-468d-9386-ee76742614ac" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Query: saints peter and paul's catholic church\n", - "Latitude: 51.8989077\n", - "Longitude: -8.4743188\n", - "Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland\n" - ] - } - ], - "source": [ - "from geopy.geocoders import Nominatim\n", - "\n", - "geolocator = Nominatim(user_agent=\"haystack-osm-cookbook-demo\")\n", - "\n", - "# Geo-decoding a name string into geocode\n", - "location_name = \"saints peter and paul's catholic church\"\n", - "location = geolocator.geocode(location_name)\n", - "\n", - "print(f\"Query: {location_name}\")\n", - "print(f\"Latitude: {location.latitude}\")\n", - "print(f\"Longitude: {location.longitude}\")\n", - "print(f\"Display name: {location.address}\")\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "4nFzlApoj8_L" - }, - "source": [ - "实际的Geocoding/decoding往往还需要考虑歧义和命中率等问题,预处理并非是本教程的重点。在一般的map-based软件实践中,出于对精度的考虑,后端接受的往往是一组实际的(latitude, longitude) tuple。在此我们就直接使用Saints Peter and Paul's Catholic Church的坐标,并尝试寻找200米内最近的咖啡店。" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "slQpVtXHWvIH" - }, - "outputs": [], - "source": [ - "from osm_integration_haystack import OSMFetcher\n", - 
"\n", - "CENTER = (51.8989077, -8.4743188) # (lat, lon)\n", - "RADIUS_M = 1000" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "4hO-qR8Lewgj" - }, - "outputs": [], - "source": [ - "osm_fetcher = OSMFetcher(\n", - " preset_center=CENTER, # Cork, Ireland\n", - " preset_radius_m=RADIUS_M, # 200m radius\n", - " target_osm_types=[\"node\"], # Only search nodes\n", - " target_osm_tags=[\"amenity\"], # Search amenity types\n", - " maximum_query_mb=2, # Limit query size\n", - " overpass_timeout=20\n", - " )" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "y_x4EHRKv9Te", - "outputId": "70584bae-dfc1-4863-fd25-c053d21b4921" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Current Query:\n", - "\n", - " [out:json][timeout:20][maxsize:2000000];\n", - " (\n", - " node[amenity](around:1000,51.8989077,-8.4743188);\n", - " );\n", - " out geom;\n", - " \n", - "Status: 200\n", - "Response: {\n", - " \"version\": 0.6,\n", - " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", - " \"osm3s\": {\n", - " \"timestamp_osm_base\": \"2025-11-15T15:10:27Z\",\n", - " \"copyright\": \"The data included in this document is from www.ope...\n", - "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", - "[OSM_Doc_Converter] Loaded 955 entries.\n", - "[OSM_Doc_Converter] Batch-processing data cleaning.\n" - ] - } - ], - "source": [ - "result = osm_fetcher.run() # Haystack component 标准接口\n", - "documents = result[\"documents\"]" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "DgwnILdAwCyl", - "outputId": "1b7e6004-ba01-4c4c-ef06-96cee27e5ab9" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "📄 type: \n", - "\n", - "--- content ---\n", - "Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. 
Tags: opening_hours=Mo-Su 12:00-22:00\n", - "\n", - "--- meta keys ---\n", - "['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']\n", - "\n", - "--- full meta ---\n", - "{'address': {'housenumber': '6-7',\n", - " 'postcode': 'T12 FH27',\n", - " 'street': \"Carey's Lane\"},\n", - " 'category': 'restaurant',\n", - " 'distance_m': 27.86087599824802,\n", - " 'lat': 51.8990101,\n", - " 'lon': -8.4739482,\n", - " 'name': 'Koto',\n", - " 'osm_id': 5203928867,\n", - " 'osm_type': 'node',\n", - " 'source': 'openstreetmap',\n", - " 'tags': {'amenity': 'restaurant',\n", - " 'contact:facebook': 'https://www.facebook.com/KotoCork/',\n", - " 'contact:instagram': 'https://www.instagram.com/kotocork',\n", - " 'cuisine': 'asian',\n", - " 'email': 'info@koto.ie',\n", - " 'opening_hours': 'Mo-Su 12:00-22:00',\n", - " 'phone': '+353-21-4274172',\n", - " 'smoking': 'no',\n", - " 'website': 'https://koto.ie/'},\n", - " 'tags_norm': {'amenity': 'restaurant',\n", - " 'contact_facebook': 'https://www.facebook.com/KotoCork/',\n", - " 'contact_instagram': 'https://www.instagram.com/kotocork',\n", - " 'cuisine': 'asian',\n", - " 'email': 'info@koto.ie',\n", - " 'opening_hours': 'Mo-Su 12:00-22:00',\n", - " 'phone': '+353-21-4274172',\n", - " 'smoking': False,\n", - " 'website': 'https://koto.ie/'}}\n" - ] - } - ], - "source": [ - "from pprint import pprint\n", - "\n", - "first_doc = documents[0]\n", - "print(\"📄 type:\", type(first_doc))\n", - "\n", - "print(\"\\n--- content ---\")\n", - "print(first_doc.content)\n", - "\n", - "print(\"\\n--- meta keys ---\")\n", - "print(list(first_doc.meta.keys()))\n", - "\n", - "print(\"\\n--- full meta ---\")\n", - "pprint(first_doc.meta)\n" - ] + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + } + }, + "cells": [ + { + "cell_type": 
"markdown", + "source": [ + "# OpenStreetMap RAG pipeline" + ], + "metadata": { + "id": "Qo0Iu0MMaOTU" + } + }, + { + "cell_type": "markdown", + "source": [ + "## OpenStreetMap + Haystack: From basic queries to agents\n", + "\n", + "\n", + "\n", + "\n", + "[OpenStreetMap](https://www.openstreetmap.org/) is a free, community-driven map of the world. In this notebook, we use the [osm-integration-haystack](https://github.com/grexrr/osm-integration-haystack) package to turn OpenStreetMap data into Haystack `Document`s and then plug them into LLM workflows.\n", + "\n", + "We'll walk through two progressively more advanced scenarios:\n", + "\n", + "1. **Basic OSM query → LLM summarization** \n", + " Use `OSMFetcher` to retrieve and preprocess nearby points of interest (POIs) around Cork city centre, then build a prompt that summarizes the locations for a specific user query (e.g. “find coffee shops nearby”).\n", + "\n", + "2. **Agent + tools: itinerary planner** \n", + " Wrap an OSM-based pipeline as a Haystack `PipelineTool`, expose it to an agent and let the LLM call this tool to plan an afternoon itinerary in Cork." + ], + "metadata": { + "id": "MGH9HBLVagni" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Setup" + ], + "metadata": { + "id": "6266U8y-avQ9" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install -q haystack-ai osm-integration-haystack" + ], + "metadata": { + "id": "u-Acpx6na0DA" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Part 1: OpenStreetMap + LLM Summarization\n", + "\n", + "This part is a **preparation step** before using Agents and tools. \n", + "We focus on turning raw OpenStreetMap data into a small knowledge base of `Document`s via `OSMFetcher`, and then asking an LLM to summarize it. 
In simpler terms, Part 1 demonstrates the basic pattern:\n", + "\n", + "🗺️ OpenStreetMap (Overpass API) \n", + "  → 📡 OSMFetcher \n", + "  → 📄 Documents (our knowledge base) \n", + "  → 🧩 ChatPromptBuilder + 🧠 OpenAIChatGenerator \n", + "  → 🤖 LLM summarization\n", + "\n", + "This will lay the foundation for more complex, **agentic** behavior in Part 2, where we'll wrap this logic into a reusable tool that an Agent can call automatically." + ], + "metadata": { + "id": "2SVEZnuVdpXl" + } + }, + { + "cell_type": "markdown", + "source": [ + "**Authorization**\n", + "\n", + "Before you start, provide your own OpenAI API key:" + ], + "metadata": { + "id": "tvFHVh7IgdoD" + } + }, + { + "cell_type": "code", + "source": [ + "import os\n", + "from getpass import getpass\n", + "\n", + "if \"OPENAI_API_KEY\" in os.environ:\n", + " del os.environ[\"OPENAI_API_KEY\"]\n", + "\n", + "if \"OPENAI_API_KEY\" not in os.environ:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")" + ], + "metadata": { + "id": "tLZGEaPxkb4y" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "**Extra:** From Name (String) to Coordinates (Tuple)\n", + "\n", + "In this example we use [Nominatim](https://nominatim.org/) to **geocode** the place name \n", + "*Saints Peter and Paul's Catholic Church* into latitude/longitude coordinates. \n", + "\n", + "This is not the main focus of the notebook. In real-world geocoding workflows you usually have to deal with ambiguity, match quality, and various string-cleaning heuristics, which are out of scope here. In most map-based applications, for accuracy and robustness, backend services expect a concrete `(latitude, longitude)` tuple rather than raw location strings." 
+ ], + "metadata": { + "id": "NgzxySwdjHFs" + } + }, + { + "cell_type": "code", + "source": [ + "!pip install -q geopy" + ], + "metadata": { + "id": "pOfsHARDjRzN" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from geopy.geocoders import Nominatim\n", + "\n", + "geolocator = Nominatim(user_agent=\"haystack-osm-cookbook-demo\")\n", + "\n", + "# Geocode a place name into (latitude, longitude) coordinates\n", + "location_name = \"saints peter and paul's catholic church\"\n", + "location = geolocator.geocode(location_name)\n", + "\n", + "print(f\"Query: {location_name}\")\n", + "print(f\"Latitude: {location.latitude}\")\n", + "print(f\"Longitude: {location.longitude}\")\n", + "print(f\"Display name: {location.address}\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "z7r0PZf1jUpL", + "outputId": "067fd540-1e3f-468d-9386-ee76742614ac" + }, + "execution_count": null, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "hEYMy0ZKwKcy", - "outputId": "0e83a3a3-a7ad-4943-950f-0b3932dce2c4" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Previewing first 5 documents:\n", - "\n", - "1. Koto\n", - " Type: restaurant\n", - " Distance: 27.9 m\n", - " Location: (51.8990101, -8.4739482)\n", - " Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00\n", - "\n", - "2. Dukes\n", - " Type: cafe\n", - " Distance: 28.7 m\n", - " Location: (51.8991234, -8.474089)\n", - " Content: Cafe: Dukes, Carey's Lane, 4, Cork.\n", - "\n", - "3. Soba Asian Street Food\n", - " Type: fast_food\n", - " Distance: 30.1 m\n", - " Location: (51.8989516, -8.4738856)\n", - " Content: Fast_food: Soba Asian Street Food.\n", - "\n", - "4. 
OffBeat Donuts\n", - " Type: fast_food\n", - " Distance: 35.1 m\n", - " Location: (51.8990968, -8.4739097)\n", - " Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.\n", - "\n", - "5. Burritos and Blues\n", - " Type: fast_food\n", - " Distance: 43.6 m\n", - " Location: (51.899271, -8.4745565)\n", - " Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...\n", - "\n" - ] - } - ], - "source": [ - "def preview_documents(docs, limit=5):\n", - " print(f\"Previewing first {min(len(docs), limit)} documents:\\n\")\n", - "\n", - " for i, doc in enumerate(docs[:limit], start=1):\n", - " name = doc.meta.get(\"name\", \"Unknown\")\n", - " category = doc.meta.get(\"category\", \"Unknown\")\n", - " distance = doc.meta.get(\"distance_m\", 0.0)\n", - " lat = doc.meta.get(\"lat\")\n", - " lon = doc.meta.get(\"lon\")\n", - "\n", - " print(f\"{i}. {name}\")\n", - " print(f\" Type: {category}\")\n", - " print(f\" Distance: {distance:.1f} m\")\n", - " print(f\" Location: ({lat}, {lon})\")\n", - " print(f\" Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}\")\n", - " print()\n", - "\n", - "preview_documents(documents, limit=5)\n" - ] + "output_type": "stream", + "name": "stdout", + "text": [ + "Query: saints peter and paul's catholic church\n", + "Latitude: 51.8989077\n", + "Longitude: -8.4743188\n", + "Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "### Step 1\n", + "Here we pass the coordinate tuple directly, which is the more conventional input." 
+ ], + "metadata": { + "id": "4nFzlApoj8_L" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "slQpVtXHWvIH" + }, + "outputs": [], + "source": [ + "from osm_integration_haystack import OSMFetcher\n", + "\n", + "CENTER = (51.8989077, -8.4743188) # (lat, lon)\n", + "RADIUS_M = 1000" + ] + }, + { + "cell_type": "code", + "source": [ + "osm_fetcher = OSMFetcher(\n", + " preset_center=CENTER, # Cork, Ireland\n", + " preset_radius_m=RADIUS_M, # 1000 m radius\n", + " target_osm_types=[\"node\"], # Only search nodes\n", + " target_osm_tags=[\"amenity\"], # Search amenity types\n", + " maximum_query_mb=2, # Limit query size\n", + " overpass_timeout=20\n", + " )" + ], + "metadata": { + "id": "4hO-qR8Lewgj" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "result = osm_fetcher.run() # standard Haystack component interface\n", + "documents = result[\"documents\"]" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "y_x4EHRKv9Te", + "outputId": "70584bae-dfc1-4863-fd25-c053d21b4921" + }, + "execution_count": null, + "outputs": [ { - "cell_type": "markdown", - "metadata": { - "id": "R3WqXmSzyIbx" - }, - "source": [ - "## Part2: Pipeline to look for the nearest coffee shop" - ] + "output_type": "stream", + "name": "stdout", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:20][maxsize:2000000];\n", + " (\n", + " node[amenity](around:1000,51.8989077,-8.4743188);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-15T15:10:27Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 955 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n" + ] + } + ] + 
}, + { + 
"cell_type": "code", + "source": [ + "from pprint import pprint\n", + "\n", + "first_doc = documents[0]\n", + "print(\"📄 type:\", type(first_doc))\n", + "\n", + "print(\"\\n--- content ---\")\n", + "print(first_doc.content)\n", + "\n", + "print(\"\\n--- meta keys ---\")\n", + "print(list(first_doc.meta.keys()))\n", + "\n", + "print(\"\\n--- full meta ---\")\n", + "pprint(first_doc.meta)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "DgwnILdAwCyl", + "outputId": "1b7e6004-ba01-4c4c-ef06-96cee27e5ab9" + }, + "execution_count": null, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "kzPKYWR4y3rb" - }, - "outputs": [], - "source": [ - "from haystack import Pipeline\n", - "from haystack.components.builders import ChatPromptBuilder\n", - "from haystack.components.generators.chat import OpenAIChatGenerator\n", - "from haystack.dataclasses import ChatMessage\n", - "from haystack.utils import Secret" - ] + "output_type": "stream", + "name": "stdout", + "text": [ + "📄 type: \n", + "\n", + "--- content ---\n", + "Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. 
Tags: opening_hours=Mo-Su 12:00-22:00\n", + "\n", + "--- meta keys ---\n", + "['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']\n", + "\n", + "--- full meta ---\n", + "{'address': {'housenumber': '6-7',\n", + " 'postcode': 'T12 FH27',\n", + " 'street': \"Carey's Lane\"},\n", + " 'category': 'restaurant',\n", + " 'distance_m': 27.86087599824802,\n", + " 'lat': 51.8990101,\n", + " 'lon': -8.4739482,\n", + " 'name': 'Koto',\n", + " 'osm_id': 5203928867,\n", + " 'osm_type': 'node',\n", + " 'source': 'openstreetmap',\n", + " 'tags': {'amenity': 'restaurant',\n", + " 'contact:facebook': 'https://www.facebook.com/KotoCork/',\n", + " 'contact:instagram': 'https://www.instagram.com/kotocork',\n", + " 'cuisine': 'asian',\n", + " 'email': 'info@koto.ie',\n", + " 'opening_hours': 'Mo-Su 12:00-22:00',\n", + " 'phone': '+353-21-4274172',\n", + " 'smoking': 'no',\n", + " 'website': 'https://koto.ie/'},\n", + " 'tags_norm': {'amenity': 'restaurant',\n", + " 'contact_facebook': 'https://www.facebook.com/KotoCork/',\n", + " 'contact_instagram': 'https://www.instagram.com/kotocork',\n", + " 'cuisine': 'asian',\n", + " 'email': 'info@koto.ie',\n", + " 'opening_hours': 'Mo-Su 12:00-22:00',\n", + " 'phone': '+353-21-4274172',\n", + " 'smoking': False,\n", + " 'website': 'https://koto.ie/'}}\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "def preview_documents(docs, limit=5):\n", + " print(f\"Previewing first {min(len(docs), limit)} documents:\\n\")\n", + "\n", + " for i, doc in enumerate(docs[:limit], start=1):\n", + " name = doc.meta.get(\"name\", \"Unknown\")\n", + " category = doc.meta.get(\"category\", \"Unknown\")\n", + " distance = doc.meta.get(\"distance_m\", 0.0)\n", + " lat = doc.meta.get(\"lat\")\n", + " lon = doc.meta.get(\"lon\")\n", + "\n", + " print(f\"{i}. 
{name}\")\n", + " print(f\" Type: {category}\")\n", + " print(f\" Distance: {distance:.1f} m\")\n", + " print(f\" Location: ({lat}, {lon})\")\n", + " print(f\" Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}\")\n", + " print()\n", + "\n", + "preview_documents(documents, limit=5)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "hEYMy0ZKwKcy", + "outputId": "0e83a3a3-a7ad-4943-950f-0b3932dce2c4" + }, + "execution_count": null, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "t1AFay1Wy6HJ" - }, - "outputs": [], - "source": [ - "prompt_template = [\n", - " ChatMessage.from_system(\n", - " \"You are a geographic information assistant. \"\n", - " \"Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query.\"\n", - " ),\n", - " ChatMessage.from_user(\n", - " \"\"\"\n", - " User location: {{ user_location }}\n", - " Search radius: {{ radius }}m\n", - " User query: {{ query }}\n", - "\n", - " Available location data:\n", - " {% for document in documents %}\n", - " - {{ document.content }}\n", - " Location: ({{ document.meta.lat }}, {{ document.meta.lon }})\n", - " Distance: {{ document.meta.distance_m }}m\n", - " Type: {{ document.meta.category }}\n", - " {% endfor %}\n", - "\n", - " Please:\n", - " 1. Find all locations that are relevant to the user's query\n", - " 2. Sort them by distance\n", - " 3. Recommend the nearest 3 locations\n", - " 4. Provide a short description for each\n", - "\n", - " Please respond in English.\n", - " \"\"\"\n", - " ),\n", - "]\n", - "\n", - "prompt_builder = ChatPromptBuilder(\n", - " template=prompt_template,\n", - " required_variables=[\"user_location\", \"radius\", \"query\", \"documents\"], # optional, depends on what your pipeline requires\n", - ")\n" - ] + "output_type": "stream", + "name": "stdout", + "text": [ + "Previewing first 5 documents:\n", + "\n", + "1. 
Koto\n", + " Type: restaurant\n", + " Distance: 27.9 m\n", + " Location: (51.8990101, -8.4739482)\n", + " Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00\n", + "\n", + "2. Dukes\n", + " Type: cafe\n", + " Distance: 28.7 m\n", + " Location: (51.8991234, -8.474089)\n", + " Content: Cafe: Dukes, Carey's Lane, 4, Cork.\n", + "\n", + "3. Soba Asian Street Food\n", + " Type: fast_food\n", + " Distance: 30.1 m\n", + " Location: (51.8989516, -8.4738856)\n", + " Content: Fast_food: Soba Asian Street Food.\n", + "\n", + "4. OffBeat Donuts\n", + " Type: fast_food\n", + " Distance: 35.1 m\n", + " Location: (51.8990968, -8.4739097)\n", + " Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.\n", + "\n", + "5. Burritos and Blues\n", + " Type: fast_food\n", + " Distance: 43.6 m\n", + " Location: (51.899271, -8.4745565)\n", + " Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...\n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Part2: Pipeline to look for the nearest coffee shop" + ], + "metadata": { + "id": "R3WqXmSzyIbx" + } + }, + { + "cell_type": "code", + "source": [ + "from haystack import Pipeline\n", + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "from haystack.dataclasses import ChatMessage\n", + "from haystack.utils import Secret" + ], + "metadata": { + "id": "kzPKYWR4y3rb" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "prompt_template = [\n", + " ChatMessage.from_system(\n", + " \"You are a geographic information assistant. 
\"\n", + " \"Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query.\"\n", + " ),\n", + " ChatMessage.from_user(\n", + " \"\"\"\n", + " User location: {{ user_location }}\n", + " Search radius: {{ radius }}m\n", + " User query: {{ query }}\n", + "\n", + " Available location data:\n", + " {% for document in documents %}\n", + " - {{ document.content }}\n", + " Location: ({{ document.meta.lat }}, {{ document.meta.lon }})\n", + " Distance: {{ document.meta.distance_m }}m\n", + " Type: {{ document.meta.category }}\n", + " {% endfor %}\n", + "\n", + " Please:\n", + " 1. Find all locations that are relevant to the user's query\n", + " 2. Sort them by distance\n", + " 3. Recommend the nearest 3 locations\n", + " 4. Provide a short description for each\n", + "\n", + " Please respond in English.\n", + " \"\"\"\n", + " ),\n", + "]\n", + "\n", + "prompt_builder = ChatPromptBuilder(\n", + " template=prompt_template,\n", + " required_variables=[\"user_location\", \"radius\", \"query\", \"documents\"], # optional, depends on what your pipeline requires\n", + ")\n" + ], + "metadata": { + "id": "t1AFay1Wy6HJ" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "llm = OpenAIChatGenerator(\n", + " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", + " model=\"gpt-4o-mini\",\n", + ")" + ], + "metadata": { + "id": "2m5fzhegy8MT" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "coffee_pipeline = Pipeline()\n", + "coffee_pipeline.add_component(\"osm_fetcher\", osm_fetcher)\n", + "coffee_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", + "coffee_pipeline.add_component(\"llm\", llm)\n", + "\n", + "# documents to prompt_builder\n", + "coffee_pipeline.connect(\"osm_fetcher.documents\", \"prompt_builder.documents\")\n", + "# ChatPromptBuilder output toward prompt(List[ChatMessage]) as llm.messages\n", + 
"coffee_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "o9HPsQUky-V2", + "outputId": "79f5ddc4-7482-4045-eb03-5b86792d32d2" + }, + "execution_count": null, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "2m5fzhegy8MT" - }, - "outputs": [], - "source": [ - "llm = OpenAIChatGenerator(\n", - " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", - " model=\"gpt-4o-mini\",\n", - ")" + "output_type": "execute_result", + "data": { + "text/plain": [ + "\n", + "🚅 Components\n", + " - osm_fetcher: OSMFetcher\n", + " - prompt_builder: ChatPromptBuilder\n", + " - llm: OpenAIChatGenerator\n", + "🛤️ Connections\n", + " - osm_fetcher.documents -> prompt_builder.documents (List[Document])\n", + " - prompt_builder.prompt -> llm.messages (list[ChatMessage])" ] + }, + "metadata": {}, + "execution_count": 152 + } + ] + }, + { + "cell_type": "code", + "source": [ + "search_query = \"coffee shop\"" + ], + "metadata": { + "id": "gLebpkTG1JD8" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "user_location = \"Cork, Ireland\"\n", + "radius = 1000\n", + "\n", + "result = coffee_pipeline.run(\n", + " {\n", + " \"osm_fetcher\": {},\n", + " \"prompt_builder\": {\n", + " \"user_location\": user_location,\n", + " \"radius\": radius,\n", + " \"query\": search_query,\n", + " },\n", + " }\n", + ")\n", + "\n", + "reply = result[\"llm\"][\"replies\"][0]\n", + "print(\"Role:\", reply.role)\n", + "print(\"\\nAssistant reply:\\n\")\n", + "print(reply.text)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "O87vWjTSzAI0", + "outputId": "e85bfcea-00f0-4a48-bb3e-415183e5df12" + }, + "execution_count": null, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "o9HPsQUky-V2", - 
"outputId": "79f5ddc4-7482-4045-eb03-5b86792d32d2" - }, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "🚅 Components\n", - " - osm_fetcher: OSMFetcher\n", - " - prompt_builder: ChatPromptBuilder\n", - " - llm: OpenAIChatGenerator\n", - "🛤️ Connections\n", - " - osm_fetcher.documents -> prompt_builder.documents (List[Document])\n", - " - prompt_builder.prompt -> llm.messages (list[ChatMessage])" - ] - }, - "execution_count": 152, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "coffee_pipeline = Pipeline()\n", - "coffee_pipeline.add_component(\"osm_fetcher\", osm_fetcher)\n", - "coffee_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", - "coffee_pipeline.add_component(\"llm\", llm)\n", - "\n", - "# documents to prompt_builder\n", - "coffee_pipeline.connect(\"osm_fetcher.documents\", \"prompt_builder.documents\")\n", - "# ChatPromptBuilder output toward prompt(List[ChatMessage]) as llm.messages\n", - "coffee_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")\n" - ] + "output_type": "stream", + "name": "stdout", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:20][maxsize:2000000];\n", + " (\n", + " node[amenity](around:1000,51.8989077,-8.4743188);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-15T15:11:30Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 955 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "Role: ChatRole.ASSISTANT\n", + "\n", + "Assistant reply:\n", + "\n", + "Based on your query for coffee shops in Cork within a 1000m radius from your location, here are the nearest three options:\n", + "\n", + "1. 
**Dukes, Carey's Lane, 4, Cork** \n", + " - **Distance:** 28.70m \n", + " - **Description:** A cozy café located on Carey's Lane, perfect for grabbing a quick coffee or enjoying a light snack in a relaxed atmosphere.\n", + "\n", + "2. **Plus & Minus, Cork** \n", + " - **Distance:** 45.59m \n", + " - **Description:** This café offers a range of coffee options alongside an inviting ambiance, ideal for both work and socializing.\n", + "\n", + "3. **Rebel Coffee Cork, French Church Street, 4, Cork** \n", + " - **Distance:** 53.15m \n", + " - **Description:** A charming coffee shop that focuses on quality brews and has a menu offering light bites, creating a perfect environment for coffee lovers.\n", + "\n", + "These establishments are the closest coffee options for you in Cork, making them excellent choices for your coffee cravings. Enjoy!\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Part 3 : Planning an afternoon itinerary with an Agent and OSM tools\n", + "\n" + ], + "metadata": { + "id": "wXA_fXMML6cT" + } + }, + { + "cell_type": "code", + "source": [ + "from osm_integration_haystack import OSMFetcher\n", + "\n", + "CENTER = (51.898403, -8.473978)\n", + "RADIUS_M = 1000\n", + "\n", + "itinerary_fetcher = OSMFetcher(\n", + " preset_center=CENTER,\n", + " preset_radius_m=RADIUS_M,\n", + " target_osm_types=[\"node\"],\n", + " target_osm_tags=[\n", + " \"amenity\",\n", + " \"tourism\",\n", + " \"leisure\",\n", + " ],\n", + " maximum_query_mb=4,\n", + " overpass_timeout=30,\n", + ")\n" + ], + "metadata": { + "id": "94GznuQYL7i0" + }, + "execution_count": 197, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.dataclasses import ChatMessage\n", + "\n", + "itinerary_prompt_template = [\n", + " ChatMessage.from_system(\n", + " \"You are a local travel planner in Cork, Ireland. 
\"\n", + " \"Always answer in concise English.\"\n", + " ),\n", + " ChatMessage.from_user(\n", + " \"User request:\\n{{ user_request }}\\n\\n\"\n", + " \"Here are some nearby locations from OpenStreetMap:\\n\"\n", + " \"{% if documents %}\"\n", + " \"{% for doc in documents[:40] %}\"\n", + " \"- {{ doc.meta.get('name', 'Unknown') }} \"\n", + " \"(type: {{ doc.meta.get('category', 'unknown') }}, \"\n", + " \"distance: {{ '%.1f'|format(doc.meta.get('distance_m', 0)) }} m)\\n\"\n", + " \"{% endfor %}\"\n", + " \"{% else %}\"\n", + " \"No locations available.\\n\"\n", + " \"{% endif %}\\n\\n\"\n", + " \"Using this information, suggest 1–2 itineraries starting from a church or \"\n", + " \"historic religious site, then a study-friendly cafe, and ending at a bar/pub.\"\n", + " ),\n", + "]\n", + "\n", + "itinerary_prompt_builder = ChatPromptBuilder(template=itinerary_prompt_template)\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "nvAaFQC3XJ0I", + "outputId": "445cccd7-8b27-4045-aeb5-117cfcc5d135" + }, + "execution_count": 198, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "gLebpkTG1JD8" - }, - "outputs": [], - "source": [ - "search_query = \"coffee shop\"" - ] + "output_type": "stream", + "name": "stderr", + "text": [ + "WARNING:haystack.components.builders.chat_prompt_builder:ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. 
To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.\n" + ] + } + ] + }, + { + "cell_type": "code", + "source": [ + "from haystack import Pipeline\n", + "\n", + "agent_itinerary_pipeline = Pipeline()\n", + "agent_itinerary_pipeline.add_component(\"itinerary_osm_fetcher\", itinerary_fetcher)\n", + "agent_itinerary_pipeline.add_component(\"itinerary_prompt_builder\", itinerary_prompt_builder)\n", + "\n", + "# Feed the OSMFetcher documents into the ChatPromptBuilder's `documents` template variable\n", + "agent_itinerary_pipeline.connect(\n", + " \"itinerary_osm_fetcher.documents\",\n", + " \"itinerary_prompt_builder.documents\",\n", + ")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "P6NvzwTEXZRl", + "outputId": "6856790d-c5eb-44f6-e43d-13daea1806a0" + }, + "execution_count": 199, + "outputs": [ { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "hEYMy0ZKwKcy", - "outputId": "0e83a3a3-a7ad-4943-950f-0b3932dce2c4" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Current Query:\n", - "\n", - " [out:json][timeout:20][maxsize:2000000];\n", - " (\n", - " node[amenity](around:1000,51.8989077,-8.4743188);\n", - " );\n", - " out geom;\n", - " \n", - "Status: 200\n", - "Response: {\n", - " \"version\": 0.6,\n", - " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", - " \"osm3s\": {\n", - " \"timestamp_osm_base\": \"2025-11-15T15:11:30Z\",\n", - " \"copyright\": \"The data included in this document is from www.ope...\n", - "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", - "[OSM_Doc_Converter] Loaded 955 entries.\n", - "[OSM_Doc_Converter] Batch-processing data cleaning.\n", - "Role: ChatRole.ASSISTANT\n", - "\n", - "Assistant reply:\n", - "\n", - "Based on your query for coffee shops in Cork within a 1000m radius from your location, here are the 
nearest 
three options:\n", - "\n", - "1. **Dukes, Carey's Lane, 4, Cork** \n", - " - **Distance:** 28.70m \n", - " - **Description:** A cozy café located on Carey's Lane, perfect for grabbing a quick coffee or enjoying a light snack in a relaxed atmosphere.\n", - "\n", - "2. **Plus & Minus, Cork** \n", - " - **Distance:** 45.59m \n", - " - **Description:** This café offers a range of coffee options alongside an inviting ambiance, ideal for both work and socializing.\n", - "\n", - "3. **Rebel Coffee Cork, French Church Street, 4, Cork** \n", - " - **Distance:** 53.15m \n", - " - **Description:** A charming coffee shop that focuses on quality brews and has a menu offering light bites, creating a perfect environment for coffee lovers.\n", - "\n", - "These establishments are the closest coffee options for you in Cork, making them excellent choices for your coffee cravings. Enjoy!\n" - ] - } - ], - "source": [ - "user_location = \"Cork, Ireland\"\n", - "radius = 1000\n", - "\n", - "result = coffee_pipeline.run(\n", - " {\n", - " \"osm_fetcher\": {},\n", - " \"prompt_builder\": {\n", - " \"user_location\": user_location,\n", - " \"radius\": radius,\n", - " \"query\": search_query,\n", - " },\n", - " }\n", - ")\n", - "\n", - "reply = result[\"llm\"][\"replies\"][0]\n", - "print(\"Role:\", reply.role)\n", - "print(\"\\nAssistant reply:\\n\")\n", - "print(reply.text)\n" + "output_type": "execute_result", + "data": { + "text/plain": [ + "\n", + "🚅 Components\n", + " - itinerary_osm_fetcher: OSMFetcher\n", + " - itinerary_prompt_builder: ChatPromptBuilder\n", + "🛤️ Connections\n", + " - itinerary_osm_fetcher.documents -> itinerary_prompt_builder.documents (List[Document])" ] + }, + "metadata": {}, + "execution_count": 199 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Test Pipeline output" + ], + "metadata": { + "id": "hoBe5d7Gb3UK" + } + }, + { + "cell_type": "code", + "source": [ + "test_res = agent_itinerary_pipeline.run(\n", + " {\n", + " 
\"itinerary_prompt_builder\": {\n", + " \"user_request\": \"I want to spend an afternoon in Cork city centre...\",\n", + " \"template_variables\": {}\n", + " }\n", + " }\n", + ")\n", + "\n", + "msgs = test_res[\"itinerary_prompt_builder\"][\"prompt\"]\n", + "for m in msgs:\n", + " print(m.role, \":\\n\", m.text, \"\\n\")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "bFNJGinpb55q", + "outputId": "f7de75e9-f1d8-4cc2-e4ee-fec2705696db" + }, + "execution_count": 201, + "outputs": [ { - "cell_type": "markdown", - "metadata": { - "id": "wXA_fXMML6cT" - }, - "source": [ - "## Part 3 : Planning an afternoon itinerary with an Agent and OSM tools\n", - "\n" - ] + "output_type": "stream", + "name": "stdout", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:30][maxsize:4000000];\n", + " (\n", + " node[amenity](around:1000,51.898403,-8.473978);\n", + "node[tourism](around:1000,51.898403,-8.473978);\n", + "node[leisure](around:1000,51.898403,-8.473978);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-15T16:52:29Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 1052 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "ChatRole.SYSTEM :\n", + " You are a local travel planner in Cork, Ireland. Always answer in concise English. 
\n", + "\n", + "ChatRole.USER :\n", + " User request:\n", + "I want to spend an afternoon in Cork city centre...\n", + "\n", + "Here are some nearby locations from OpenStreetMap:\n", + "- bicycle_parking (type: bicycle_parking, distance: 2.0 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 9.9 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 12.5 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 14.5 m)\n", + "- waste_basket (type: waste_basket, distance: 15.4 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 21.5 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 23.6 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 23.8 m)\n", + "- Cork Walks (type: information, distance: 25.4 m)\n", + "- waste_basket (type: waste_basket, distance: 26.6 m)\n", + "- waste_basket (type: waste_basket, distance: 28.1 m)\n", + "- The Pavilion (type: events_venue, distance: 29.4 m)\n", + "- waste_basket (type: waste_basket, distance: 29.5 m)\n", + "- Burger King (type: fast_food, distance: 30.5 m)\n", + "- Fellini (type: cafe, distance: 30.9 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 35.3 m)\n", + "- The Pana Shuffle (type: artwork, distance: 35.5 m)\n", + "- Intermission Bar (type: bar, distance: 36.1 m)\n", + "- waste_basket (type: waste_basket, distance: 37.2 m)\n", + "- waste_basket (type: waste_basket, distance: 39.0 m)\n", + "- AbraKebabra (type: fast_food, distance: 40.0 m)\n", + "- Mutton Lane Inn (type: pub, distance: 41.0 m)\n", + "- Cafe Mexicana (type: restaurant, distance: 45.2 m)\n", + "- waste_basket (type: waste_basket, distance: 45.3 m)\n", + "- waste_basket (type: waste_basket, distance: 46.7 m)\n", + "- Boots (type: pharmacy, distance: 50.7 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 50.8 m)\n", + "- bench (type: bench, distance: 58.2 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 58.9 m)\n", + "- fountain (type: fountain, distance: 60.6 
m)\n", + "- 14A Restaurant (type: restaurant, distance: 61.0 m)\n", + "- Oyster Tavern (type: pub, distance: 61.0 m)\n", + "- waste_basket (type: waste_basket, distance: 61.2 m)\n", + "- Soba Asian Street Food (type: fast_food, distance: 61.3 m)\n", + "- The Farmgate Café (type: restaurant, distance: 66.6 m)\n", + "- Koto (type: restaurant, distance: 67.5 m)\n", + "- Bank of Ireland (type: bank, distance: 69.2 m)\n", + "- Krispy Kreme (type: fast_food, distance: 71.5 m)\n", + "- Akira (type: restaurant, distance: 73.3 m)\n", + "- Euronet (type: atm, distance: 74.6 m)\n", + "\n", + "\n", + "Using this information, suggest 1–2 itineraries starting from a church or historic religious site, then a study-friendly cafe, and ending at a bar/pub. \n", + "\n" + ] + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Wrap the pipeline in a `PipelineTool`" + ], + "metadata": { + "id": "-PPU018bcMbT" + } + }, + { + "cell_type": "code", + "source": [ + "from haystack.tools import PipelineTool\n", + "\n", + "osm_itinerary_tool = PipelineTool(\n", + " pipeline=agent_itinerary_pipeline,\n", + " name=\"osm_itinerary_tool\",\n", + " description=(\n", + " \"Fetches nearby POIs and \"\n", + " \"builds a chat-style prompt summarizing them.\"\n", + " ),\n", + " # Tool input -> pipeline input\n", + " input_mapping={\n", + " # tool's \"user_request\" -> pipeline's \"itinerary_prompt_builder.user_request\"\n", + " \"user_request\": [\"itinerary_prompt_builder.user_request\"],\n", + " },\n", + " # Pipeline output -> tool output name\n", + " output_mapping={\n", + " \"itinerary_prompt_builder.prompt\": \"prompt\",\n", + " },\n", + ")\n" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" }, + "id": "tAjAAUc4cYKE", + "outputId": "b3a2df0a-146a-4bc5-b829-1736d4935bf6" + }, + "execution_count": 202, + "outputs": [ { - "cell_type": "code", - "execution_count": 177, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "94GznuQYL7i0", - 
"outputId": "ff520151-27a4-4ba9-d29d-2420f36873c3" 
- }, - "outputs": [ - { - "data": { - "text/plain": [ - "\n", - "Inputs:\n", - " - center: Optional[Tuple[float, float]]\n", - " - radius_m: Optional[int]\n", - "Outputs:\n", - " - documents: List[Document]" - ] - }, - "execution_count": 177, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from osm_integration_haystack import OSMFetcher\n", - "\n", - "ITINERARY_CENTER = (51.898403, -8.473978) # (lat, lon)\n", - "ITINERARY_RADIUS_M = 1000 # 1000 m radius\n", - "\n", - "itinerary_fetcher = OSMFetcher(\n", - " preset_center=ITINERARY_CENTER,\n", - " preset_radius_m=ITINERARY_RADIUS_M,\n", - " target_osm_types=[\"node\"],\n", - " target_osm_tags=[\n", - " \"amenity\", # cafe / bar / pub / restaurant / place_of_worship\n", - " \"tourism\", # attractions, museums\n", - " \"leisure\", # parks, etc.\n", - " ],\n", - " maximum_query_mb=4,\n", - " overpass_timeout=30,\n", - ")\n", - "\n", - "itinerary_fetcher\n" - ] + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.12/dist-packages/pydantic/json_schema.py:2324: PydanticJsonSchemaWarning: Default value is not JSON serializable; excluding default from JSON schema [non-serializable-default]\n", + " warnings.warn(message, PydanticJsonSchemaWarning)\n" + ] } - ], - "metadata": { + ] + }, + { + "cell_type": "code", + "source": [ + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "from haystack.components.agents import Agent\n", + "from haystack.dataclasses import ChatMessage\n", + "from haystack.utils import Secret\n", + "\n", + "itinerary_llm = OpenAIChatGenerator(\n", + " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", + " model=\"gpt-4o-mini\",\n", + ")\n", + "\n", + "itinerary_agent = Agent(\n", + " chat_generator=itinerary_llm,\n", + " tools=[osm_itinerary_tool],\n", + " system_prompt=(\n", + " \"You are a helpful local guide in Cork, Ireland.\\n\\n\"\n", + " \"When the user asks you to plan an itinerary, first call 
'osm_itinerary_tool'. 
\"\n", + " \"This tool returns a list of chat messages under the field 'prompt', which already \"\n", + " \"contains the user's request and a list of nearby locations.\\n\\n\"\n", + " \"Read those messages carefully, then respond with 1–2 itineraries \"\n", + " \"(church -> cafe -> bar/pub), including approximate walking distances.\"\n", + " ),\n", + ")\n", + "\n", + "itinerary_agent.warm_up()" + ], + "metadata": { + "id": "y5pk5UBzZQ0m" + }, + "execution_count": 203, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "user_request = (\n", + " \"I want to spend an afternoon in Cork city centre. \"\n", + " \"Please plan 1–2 possible itineraries where I:\\n\"\n", + " \"1) start by visiting a church or historic religious site,\\n\"\n", + " \"2) then go to a quiet cafe where I can study or work on my laptop,\\n\"\n", + " \"3) and finally end the day in a nice bar or pub nearby.\\n\\n\"\n", + " \"All places should be within reasonable walking distance. \"\n", + " \"For each itinerary, please include the place names, approximate distances between stops, \"\n", + " \"and a short explanation of why you chose them.\"\n", + ")\n", + "\n", + "result = itinerary_agent.run(messages=[ChatMessage.from_user(user_request)])\n", + "\n", + "final_msg = result[\"messages\"][-1]\n", + "print(\"Final role:\", final_msg.role)\n", + "print(\"\\nAssistant final reply:\\n\")\n", + "print(final_msg.text)\n" + ], + "metadata": { "colab": { - "provenance": [] + "base_uri": "https://localhost:8080/" }, - "kernelspec": { - "display_name": "Python 3", - "name": "python3" - }, - "language_info": { - "name": "python" + "id": "ikF1Grx0Z7Bo", + "outputId": "814423b0-872a-4359-d054-b5a45953a77a" + }, + "execution_count": 204, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:30][maxsize:4000000];\n", + " (\n", + " node[amenity](around:1000,51.898403,-8.473978);\n", + 
"node[tourism](around:1000,51.898403,-8.473978);\n", + "node[leisure](around:1000,51.898403,-8.473978);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-15T16:56:42Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 1052 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "Final role: ChatRole.ASSISTANT\n", + "\n", + "Assistant final reply:\n", + "\n", + "Here are two possible itineraries for an afternoon in Cork city centre:\n", + "\n", + "### Itinerary 1:\n", + "1. **Start: St. Anne's Shandon Church**\n", + " - **Description:** This iconic church is famous for its stunning architecture and views from the tower. It's an excellent spot to explore Cork's history.\n", + " - **Distance to next stop:** ~1.1 km (approx. 14 minutes walk)\n", + "\n", + "2. **Stop 2: The Farmgate Café**\n", + " - **Description:** Located at the English Market, this café offers a cozy atmosphere, delicious food, and a good environment for studying or working on your laptop.\n", + " - **Distance to next stop:** ~0.5 km (approx. 6 minutes walk)\n", + "\n", + "3. **End: Mutton Lane Inn**\n", + " - **Description:** A charming pub with a fantastic selection of local beers and a vibrant atmosphere, perfect for winding down after a day of exploration.\n", + " - **Total Walking Distance:** ~1.6 km\n", + "\n", + "---\n", + "\n", + "### Itinerary 2:\n", + "1. **Start: St. Patrick's Street Church**\n", + " - **Description:** This historic church is known for its beautiful stained glass and serene ambiance, ideal for a peaceful start to your afternoon.\n", + " - **Distance to next stop:** ~0.8 km (approx. 10 minutes walk)\n", + "\n", + "2. 
**Stop 2: Fellini**\n", + " - **Description:** A quaint café popular among students and remote workers, offering a quiet space and great coffee for studying or working.\n", + " - **Distance to next stop:** ~0.3 km (approx. 4 minutes walk)\n", + "\n", + "3. **End: Intermission Bar**\n", + " - **Description:** A laid-back bar with excellent service, perfect for enjoying local drinks and soaking in the evening atmosphere after a productive day.\n", + " - **Total Walking Distance:** ~1.1 km\n", + "\n", + "Both itineraries include paths that are easy to navigate and will lead you through some of Cork's most delightful locations. Enjoy your afternoon!\n" + ] } - }, - "nbformat": 4, - "nbformat_minor": 0 -} + ] + } + ] +} \ No newline at end of file From f53a966c1d9f74c840cb80cf1f0813f158538c32 Mon Sep 17 00:00:00 2001 From: grexrr Date: Sun, 16 Nov 2025 01:09:03 +0000 Subject: [PATCH 3/3] update --- notebooks/openstreetmap_rag_pipeline.ipynb | 2217 +++++++++++--------- 1 file changed, 1210 insertions(+), 1007 deletions(-) diff --git a/notebooks/openstreetmap_rag_pipeline.ipynb b/notebooks/openstreetmap_rag_pipeline.ipynb index 9370d1a..bbb8862 100644 --- a/notebooks/openstreetmap_rag_pipeline.ipynb +++ b/notebooks/openstreetmap_rag_pipeline.ipynb @@ -1,1034 +1,1237 @@ { - "nbformat": 4, - "nbformat_minor": 0, - "metadata": { - "colab": { - "provenance": [] - }, - "kernelspec": { - "name": "python3", - "display_name": "Python 3" - }, - "language_info": { - "name": "python" - } - }, - "cells": [ - { - "cell_type": "markdown", - "source": [ - "# OpenStreetMap RAG pipeline" - ], - "metadata": { - "id": "Qo0Iu0MMaOTU" - } - }, - { - "cell_type": "markdown", - "source": [ - "## OpenStreetMap + Haystack: From basic queries to agents\n", - "\n", - "       \n", - "\n", - "\n", - "[OpenStreetMap](https://www.openstreetmap.org/) is a free, community-driven map of the world. 
In this notebook, we use the [osm-integration-haystack](https://github.com/grexrr/osm-integration-haystack) package to turn OpenStreetMap data into `Haystack Document`s and then plug them into LLM workflows.\n", - "\n", - "We'll together walk through two progressively more advanced scenarios:\n", - "\n", - "1. **Basic OSM query → LLM summarization** \n", - " Use `OSMFetcher` to retrieve and preprocess nearby points of interest (POIs) around Cork city centre, then build a prompt that summarizes the locations for a specific user query (e.g. “find coffee shops nearby”).\n", - "\n", - "2. **Agent + tools: itinerary planner** \n", - " Wrap an OSM-based pipeline as a Haystack `PipelineTool`, expose it to an agent and let the LLM call this tool to plan an afternoon itinerary in Cork." - ], - "metadata": { - "id": "MGH9HBLVagni" - } - }, - { - "cell_type": "markdown", - "source": [ - "## Setup" - ], - "metadata": { - "id": "6266U8y-avQ9" - } - }, - { - "cell_type": "code", - "source": [ - "!pip install -q haystack-ai osm-integration-haystack" - ], - "metadata": { - "id": "u-Acpx6na0DA" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "## Part 1: OpenStreetMap + LLM Summarization\n", - "\n", - "This part is a **preparation step** before using Agents and tools. \n", - "We focus on turning raw OpenStreetMap data into a small, vector-like knowledge base via `OSMFetcher`, and then asking an LLM to summarize it. In simpler terms, Part 1 demonstrates the basic pattern:\n", - "\n", - "🗺️ OpenStreetMap (Overpass API) \n", - "  → 📡 OSMFetcher \n", - "  → 📄 Documents (our vectorized knowledge base) \n", - "  → 🧩 ChatPromptBuilder + 🧠 OpenAIChatGenerator \n", - "  → 🤖 LLM summarization\n", - "\n", - "This will lay the foundation for more complex, **agentic** behavior in Part 2, where we'll wrap this logic into a reusable tool that an Agent can call automatically." 
- ], - "metadata": { - "id": "2SVEZnuVdpXl" - } - }, - { - "cell_type": "markdown", - "source": [ - "**Authorization**\n", - "\n", - "Before start, you need to provide your own OpenAI API key:" - ], - "metadata": { - "id": "tvFHVh7IgdoD" - } - }, - { - "cell_type": "code", - "source": [ - "import os\n", - "from getpass import getpass\n", - "\n", - "if \"OPENAI_API_KEY\" in os.environ:\n", - " del os.environ[\"OPENAI_API_KEY\"]\n", - "\n", - "if \"OPENAI_API_KEY\" not in os.environ:\n", - " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")" - ], - "metadata": { - "id": "tLZGEaPxkb4y" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "markdown", - "source": [ - "**Extra:** From Name (String) to Coordinates (Tuple)\n", - "\n", - "In this example we use [Nominatim](https://nominatim.org/) to **geocode** the place name \n", - "*Saints Peter and Paul's Catholic Church* into latitude/longitude coordinates. \n", - "\n", - "This is not the main focus of the notebook. In real-world geocoding workflows you usually have to deal with ambiguity, match quality, and various string-cleaning heuristics, which are out of scope here. In most map-based applications, for accuracy and robustness, backend services expect a concrete `(latitude, longitude)` tuple rather than raw location strings." 
- ], - "metadata": { - "id": "NgzxySwdjHFs" - } - }, - { - "cell_type": "code", - "source": [ - "!pip install -q geopy" - ], - "metadata": { - "id": "pOfsHARDjRzN" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "from geopy.geocoders import Nominatim\n", - "\n", - "geolocator = Nominatim(user_agent=\"haystack-osm-cookbook-demo\")\n", - "\n", - "# Geocode a place name into (latitude, longitude) coordinates\n", - "location_name = \"saints peter and paul's catholic church\"\n", - "location = geolocator.geocode(location_name)\n", - "\n", - "print(f\"Query: {location_name}\")\n", - "print(f\"Latitude: {location.latitude}\")\n", - "print(f\"Longitude: {location.longitude}\")\n", - "print(f\"Display name: {location.address}\")\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Qo0Iu0MMaOTU" + }, + "source": [ + "# OpenStreetMap RAG pipeline" + ] }, - "id": "z7r0PZf1jUpL", - "outputId": "067fd540-1e3f-468d-9386-ee76742614ac" - }, - "execution_count": null, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Query: saints peter and paul's catholic church\n", - "Latitude: 51.8989077\n", - "Longitude: -8.4743188\n", - "Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "### Step 1\n", - "Here we can pass the coordinate tuple directly, which is the more conventional input." 
- ], - "metadata": { - "id": "4nFzlApoj8_L" - } - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "slQpVtXHWvIH" - }, - "outputs": [], - "source": [ - "from osm_integration_haystack import OSMFetcher\n", - "\n", - "CENTER = (51.8989077, -8.4743188) # (lat, lon)\n", - "RADIUS_M = 1000" - ] - }, - { - "cell_type": "code", - "source": [ - "osm_fetcher = OSMFetcher(\n", - " preset_center=CENTER, # Cork, Ireland\n", - " preset_radius_m=RADIUS_M, # 1000 m radius\n", - " target_osm_types=[\"node\"], # Only search nodes\n", - " target_osm_tags=[\"amenity\"], # Search amenity types\n", - " maximum_query_mb=2, # Limit query size\n", - " overpass_timeout=20\n", - " )" - ], - "metadata": { - "id": "4hO-qR8Lewgj" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "result = osm_fetcher.run() # standard Haystack component interface\n", - "documents = result[\"documents\"]" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "markdown", + "metadata": { + "id": "MGH9HBLVagni" + }, + "source": [ + "## OpenStreetMap + Haystack: From basic queries to agents\n", + "\n", + "       \n", + "\n", + "\n", + "[OpenStreetMap](https://www.openstreetmap.org/) is a free, community-driven map of the world. In this notebook, we use the [osm-integration-haystack](https://github.com/grexrr/osm-integration-haystack) package to turn OpenStreetMap data into `Haystack Document`s and then plug them into LLM workflows.\n", + "\n", + "Together, we'll walk through two progressively more advanced scenarios:\n", + "\n", + "1. **Basic OSM query → LLM summarization** \n", + " Use `OSMFetcher` to retrieve and preprocess nearby points of interest (POIs) around Cork city centre, then build a prompt that summarizes the locations for a specific user query (e.g. “find coffee shops nearby”).\n", + "\n", + "2. 
**Agent + tools: itinerary planner** \n", + " Wrap an OSM-based pipeline as a Haystack `PipelineTool`, expose it to an agent and let the LLM call this tool to plan an afternoon itinerary in Cork." + ] }, - "id": "y_x4EHRKv9Te", - "outputId": "70584bae-dfc1-4863-fd25-c053d21b4921" - }, - "execution_count": null, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Current Query:\n", - "\n", - " [out:json][timeout:20][maxsize:2000000];\n", - " (\n", - " node[amenity](around:1000,51.8989077,-8.4743188);\n", - " );\n", - " out geom;\n", - " \n", - "Status: 200\n", - "Response: {\n", - " \"version\": 0.6,\n", - " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", - " \"osm3s\": {\n", - " \"timestamp_osm_base\": \"2025-11-15T15:10:27Z\",\n", - " \"copyright\": \"The data included in this document is from www.ope...\n", - "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", - "[OSM_Doc_Converter] Loaded 955 entries.\n", - "[OSM_Doc_Converter] Batch-processing data cleaning.\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "from pprint import pprint\n", - "\n", - "first_doc = documents[0]\n", - "print(\"📄 type:\", type(first_doc))\n", - "\n", - "print(\"\\n--- content ---\")\n", - "print(first_doc.content)\n", - "\n", - "print(\"\\n--- meta keys ---\")\n", - "print(list(first_doc.meta.keys()))\n", - "\n", - "print(\"\\n--- full meta ---\")\n", - "pprint(first_doc.meta)\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "markdown", + "metadata": { + "id": "6266U8y-avQ9" + }, + "source": [ + "## Setup" + ] }, - "id": "DgwnILdAwCyl", - "outputId": "1b7e6004-ba01-4c4c-ef06-96cee27e5ab9" - }, - "execution_count": null, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "📄 type: \n", - "\n", - "--- content ---\n", - "Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. 
Tags: opening_hours=Mo-Su 12:00-22:00\n", - "\n", - "--- meta keys ---\n", - "['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']\n", - "\n", - "--- full meta ---\n", - "{'address': {'housenumber': '6-7',\n", - " 'postcode': 'T12 FH27',\n", - " 'street': \"Carey's Lane\"},\n", - " 'category': 'restaurant',\n", - " 'distance_m': 27.86087599824802,\n", - " 'lat': 51.8990101,\n", - " 'lon': -8.4739482,\n", - " 'name': 'Koto',\n", - " 'osm_id': 5203928867,\n", - " 'osm_type': 'node',\n", - " 'source': 'openstreetmap',\n", - " 'tags': {'amenity': 'restaurant',\n", - " 'contact:facebook': 'https://www.facebook.com/KotoCork/',\n", - " 'contact:instagram': 'https://www.instagram.com/kotocork',\n", - " 'cuisine': 'asian',\n", - " 'email': 'info@koto.ie',\n", - " 'opening_hours': 'Mo-Su 12:00-22:00',\n", - " 'phone': '+353-21-4274172',\n", - " 'smoking': 'no',\n", - " 'website': 'https://koto.ie/'},\n", - " 'tags_norm': {'amenity': 'restaurant',\n", - " 'contact_facebook': 'https://www.facebook.com/KotoCork/',\n", - " 'contact_instagram': 'https://www.instagram.com/kotocork',\n", - " 'cuisine': 'asian',\n", - " 'email': 'info@koto.ie',\n", - " 'opening_hours': 'Mo-Su 12:00-22:00',\n", - " 'phone': '+353-21-4274172',\n", - " 'smoking': False,\n", - " 'website': 'https://koto.ie/'}}\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "def preview_documents(docs, limit=5):\n", - " print(f\"Previewing first {min(len(docs), limit)} documents:\\n\")\n", - "\n", - " for i, doc in enumerate(docs[:limit], start=1):\n", - " name = doc.meta.get(\"name\", \"Unknown\")\n", - " category = doc.meta.get(\"category\", \"Unknown\")\n", - " distance = doc.meta.get(\"distance_m\", 0.0)\n", - " lat = doc.meta.get(\"lat\")\n", - " lon = doc.meta.get(\"lon\")\n", - "\n", - " print(f\"{i}. 
{name}\")\n", - " print(f\" Type: {category}\")\n", - " print(f\" Distance: {distance:.1f} m\")\n", - " print(f\" Location: ({lat}, {lon})\")\n", - " print(f\" Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}\")\n", - " print()\n", - "\n", - "preview_documents(documents, limit=5)\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "u-Acpx6na0DA", + "outputId": "b8122b37-d53e-4120-ae17-68f9cb21d70d" + }, + "outputs": [], + "source": [ + "!pip install -q haystack-ai osm-integration-haystack" + ] }, - "id": "hEYMy0ZKwKcy", - "outputId": 
"0e83a3a3-a7ad-4943-950f-0b3932dce2c4" - }, - "execution_count": null, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Previewing first 5 documents:\n", - "\n", - "1. Koto\n", - " Type: restaurant\n", - " Distance: 27.9 m\n", - " Location: (51.8990101, -8.4739482)\n", - " Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00\n", - "\n", - "2. Dukes\n", - " Type: cafe\n", - " Distance: 28.7 m\n", - " Location: (51.8991234, -8.474089)\n", - " Content: Cafe: Dukes, Carey's Lane, 4, Cork.\n", - "\n", - "3. Soba Asian Street Food\n", - " Type: fast_food\n", - " Distance: 30.1 m\n", - " Location: (51.8989516, -8.4738856)\n", - " Content: Fast_food: Soba Asian Street Food.\n", - "\n", - "4. OffBeat Donuts\n", - " Type: fast_food\n", - " Distance: 35.1 m\n", - " Location: (51.8990968, -8.4739097)\n", - " Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.\n", - "\n", - "5. Burritos and Blues\n", - " Type: fast_food\n", - " Distance: 43.6 m\n", - " Location: (51.899271, -8.4745565)\n", - " Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...\n", - "\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Part 2: Pipeline to look for the nearest coffee shop" - ], - "metadata": { - "id": "R3WqXmSzyIbx" - } - }, - { - "cell_type": "code", - "source": [ - "from haystack import Pipeline\n", - "from haystack.components.builders import ChatPromptBuilder\n", - "from haystack.components.generators.chat import OpenAIChatGenerator\n", - "from haystack.dataclasses import ChatMessage\n", - "from haystack.utils import Secret" - ], - "metadata": { - "id": "kzPKYWR4y3rb" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "prompt_template = [\n", - " ChatMessage.from_system(\n", - " \"You are a geographic information assistant. 
\"\n", - " \"Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query.\"\n", - " ),\n", - " ChatMessage.from_user(\n", - " \"\"\"\n", - " User location: {{ user_location }}\n", - " Search radius: {{ radius }}m\n", - " User query: {{ query }}\n", - "\n", - " Available location data:\n", - " {% for document in documents %}\n", - " - {{ document.content }}\n", - " Location: ({{ document.meta.lat }}, {{ document.meta.lon }})\n", - " Distance: {{ document.meta.distance_m }}m\n", - " Type: {{ document.meta.category }}\n", - " {% endfor %}\n", - "\n", - " Please:\n", - " 1. Find all locations that are relevant to the user's query\n", - " 2. Sort them by distance\n", - " 3. Recommend the nearest 3 locations\n", - " 4. Provide a short description for each\n", - "\n", - " Please respond in English.\n", - " \"\"\"\n", - " ),\n", - "]\n", - "\n", - "prompt_builder = ChatPromptBuilder(\n", - " template=prompt_template,\n", - " required_variables=[\"user_location\", \"radius\", \"query\", \"documents\"], # optional, depends on what your pipeline requires\n", - ")\n" - ], - "metadata": { - "id": "t1AFay1Wy6HJ" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "llm = OpenAIChatGenerator(\n", - " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", - " model=\"gpt-4o-mini\",\n", - ")" - ], - "metadata": { - "id": "2m5fzhegy8MT" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "coffee_pipeline = Pipeline()\n", - "coffee_pipeline.add_component(\"osm_fetcher\", osm_fetcher)\n", - "coffee_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", - "coffee_pipeline.add_component(\"llm\", llm)\n", - "\n", - "# Feed the fetched documents into the prompt builder\n", - "coffee_pipeline.connect(\"osm_fetcher.documents\", \"prompt_builder.documents\")\n", - "# Pass the rendered prompt (List[ChatMessage]) to the LLM as its messages\n", - 
"coffee_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "markdown", + "metadata": { + "id": "2SVEZnuVdpXl" + }, + "source": [ + "## Part 1: Knowledge Base Vectorization\n", + "\n", + "This part is a **preparation step** before using Agents and tools. \n", + "We focus on turning raw OpenStreetMap data into a small, vector-like knowledge base via `OSMFetcher`; in the next part we'll ask an LLM to summarize it. In simpler terms, Part 1 demonstrates steps 1-2 of the basic pattern:\n", + "\n", + "🗺️ OpenStreetMap (Overpass API) \n", + "  → 1. 📡 OSMFetcher \n", + "  → 2. 📄 Documents (our vectorized knowledge base) \n", + "  → 3. 🧩 ChatPromptBuilder + 🧠 OpenAIChatGenerator \n", + "  → 4. 🤖 LLM summarization\n", + "\n", + "This lays the foundation for the more complex, **agentic** behavior introduced in later sections, where we'll wrap this logic into a reusable tool that an agent can call automatically." 
+ ] }, - "id": "o9HPsQUky-V2", - "outputId": "79f5ddc4-7482-4045-eb03-5b86792d32d2" - }, - "execution_count": null, - "outputs": [ { - "output_type": "execute_result", - "data": { - "text/plain": [ - "\n", - "🚅 Components\n", - " - osm_fetcher: OSMFetcher\n", - " - prompt_builder: ChatPromptBuilder\n", - " - llm: OpenAIChatGenerator\n", - "🛤️ Connections\n", - " - osm_fetcher.documents -> prompt_builder.documents (List[Document])\n", - " - prompt_builder.prompt -> llm.messages (list[ChatMessage])" + "cell_type": "markdown", + "metadata": { + "id": "tvFHVh7IgdoD" + }, + "source": [ + "**Authorization**\n", + "\n", + "Before you start, you need to provide your own OpenAI API key:" ] - }, - "metadata": {}, - "execution_count": 152 - } - ] - }, - { - "cell_type": "code", - "source": [ - "search_query = \"coffee shop\"" - ], - "metadata": { - "id": "gLebpkTG1JD8" - }, - "execution_count": null, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "user_location = \"Cork, Ireland\"\n", - "radius = 1000\n", - "\n", - "result = coffee_pipeline.run(\n", - " {\n", - " \"osm_fetcher\": {},\n", - " \"prompt_builder\": {\n", - " \"user_location\": user_location,\n", - " \"radius\": radius,\n", - " \"query\": search_query,\n", - " },\n", - " }\n", - ")\n", - "\n", - "reply = result[\"llm\"][\"replies\"][0]\n", - "print(\"Role:\", reply.role)\n", - "print(\"\\nAssistant reply:\\n\")\n", - "print(reply.text)\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" }, - "id": "O87vWjTSzAI0", - "outputId": "e85bfcea-00f0-4a48-bb3e-415183e5df12" - }, - "execution_count": null, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Current Query:\n", - "\n", - " [out:json][timeout:20][maxsize:2000000];\n", - " (\n", - " node[amenity](around:1000,51.8989077,-8.4743188);\n", - " );\n", - " out geom;\n", - " \n", - "Status: 200\n", - "Response: {\n", - " \"version\": 0.6,\n", - " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", - " 
\"osm3s\": {\n", - " \"timestamp_osm_base\": \"2025-11-15T15:11:30Z\",\n", - " \"copyright\": \"The data included in this document is from www.ope...\n", - "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", - "[OSM_Doc_Converter] Loaded 955 entries.\n", - "[OSM_Doc_Converter] Batch-processing data cleaning.\n", - "Role: ChatRole.ASSISTANT\n", - "\n", - "Assistant reply:\n", - "\n", - "Based on your query for coffee shops in Cork within a 1000m radius from your location, here are the nearest three options:\n", - "\n", - "1. **Dukes, Carey's Lane, 4, Cork** \n", - " - **Distance:** 28.70m \n", - " - **Description:** A cozy café located on Carey's Lane, perfect for grabbing a quick coffee or enjoying a light snack in a relaxed atmosphere.\n", - "\n", - "2. **Plus & Minus, Cork** \n", - " - **Distance:** 45.59m \n", - " - **Description:** This café offers a range of coffee options alongside an inviting ambiance, ideal for both work and socializing.\n", - "\n", - "3. **Rebel Coffee Cork, French Church Street, 4, Cork** \n", - " - **Distance:** 53.15m \n", - " - **Description:** A charming coffee shop that focuses on quality brews and has a menu offering light bites, creating a perfect environment for coffee lovers.\n", - "\n", - "These establishments are the closest coffee options for you in Cork, making them excellent choices for your coffee cravings. 
Enjoy!\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "## Part 3 : Planning an afternoon itinerary with an Agent and OSM tools\n", - "\n" - ], - "metadata": { - "id": "wXA_fXMML6cT" - } - }, - { - "cell_type": "code", - "source": [ - "from osm_integration_haystack import OSMFetcher\n", - "\n", - "CENTER = (51.898403, -8.473978)\n", - "RADIUS_M = 1000\n", - "\n", - "itinerary_fetcher = OSMFetcher(\n", - " preset_center=CENTER,\n", - " preset_radius_m=RADIUS_M,\n", - " target_osm_types=[\"node\"],\n", - " target_osm_tags=[\n", - " \"amenity\",\n", - " \"tourism\",\n", - " \"leisure\",\n", - " ],\n", - " maximum_query_mb=4,\n", - " overpass_timeout=30,\n", - ")\n" - ], - "metadata": { - "id": "94GznuQYL7i0" - }, - "execution_count": 197, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "from haystack.components.builders import ChatPromptBuilder\n", - "from haystack.dataclasses import ChatMessage\n", - "\n", - "itinerary_prompt_template = [\n", - " ChatMessage.from_system(\n", - " \"You are a local travel planner in Cork, Ireland. 
\"\n", - " \"Always answer in concise English.\"\n", - " ),\n", - " ChatMessage.from_user(\n", - " \"User request:\\n{{ user_request }}\\n\\n\"\n", - " \"Here are some nearby locations from OpenStreetMap:\\n\"\n", - " \"{% if documents %}\"\n", - " \"{% for doc in documents[:40] %}\"\n", - " \"- {{ doc.meta.get('name', 'Unknown') }} \"\n", - " \"(type: {{ doc.meta.get('category', 'unknown') }}, \"\n", - " \"distance: {{ '%.1f'|format(doc.meta.get('distance_m', 0)) }} m)\\n\"\n", - " \"{% endfor %}\"\n", - " \"{% else %}\"\n", - " \"No locations available.\\n\"\n", - " \"{% endif %}\\n\\n\"\n", - " \"Using this information, suggest 1–2 itineraries starting from a church or \"\n", - " \"historic religious site, then a study-friendly cafe, and ending at a bar/pub.\"\n", - " ),\n", - "]\n", - "\n", - "itinerary_prompt_builder = ChatPromptBuilder(template=itinerary_prompt_template)\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tLZGEaPxkb4y", + "outputId": "fcbe2edb-d574-4b0a-df64-5da3652dd81f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Enter OpenAI API key:··········\n" + ] + } + ], + "source": [ + "import os\n", + "from getpass import getpass\n", + "\n", + "if \"OPENAI_API_KEY\" in os.environ:\n", + " del os.environ[\"OPENAI_API_KEY\"]\n", + "\n", + "if \"OPENAI_API_KEY\" not in os.environ:\n", + " os.environ[\"OPENAI_API_KEY\"] = getpass(\"Enter OpenAI API key:\")" + ] }, - "id": "nvAaFQC3XJ0I", - "outputId": "445cccd7-8b27-4045-aeb5-117cfcc5d135" - }, - "execution_count": 198, - "outputs": [ { - "output_type": "stream", - "name": "stderr", - "text": [ - "WARNING:haystack.components.builders.chat_prompt_builder:ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. 
By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.\n" - ] - } - ] - }, - { - "cell_type": "code", - "source": [ - "from haystack import Pipeline\n", - "\n", - "agent_itinerary_pipeline = Pipeline()\n", - "agent_itinerary_pipeline.add_component(\"itinerary_osm_fetcher\", itinerary_fetcher)\n", - "agent_itinerary_pipeline.add_component(\"itinerary_prompt_builder\", itinerary_prompt_builder)\n", - "\n", - "# Feed the OSMFetcher documents into the ChatPromptBuilder's `documents` template variable\n", - "agent_itinerary_pipeline.connect(\n", - " \"itinerary_osm_fetcher.documents\",\n", - " \"itinerary_prompt_builder.documents\",\n", - ")\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "markdown", + "metadata": { + "id": "NgzxySwdjHFs" + }, + "source": [ + "**Extra:** From Name (String) to Coordinates (Tuple)\n", + "\n", + "In this example we use [Nominatim](https://nominatim.org/) to **geocode** the place name \n", + "*Saints Peter and Paul's Catholic Church* into latitude/longitude coordinates. \n", + "\n", + "This is not the main focus of the notebook. In real-world geocoding workflows you usually have to deal with ambiguity, match quality, and various string-cleaning heuristics, which are out of scope here. In most map-based applications, for accuracy and robustness, backend services expect a concrete `(latitude, longitude)` tuple rather than raw location strings.\n", + "\n", + "Feel free to use any place or landmark you like!" 
+ ] }, - "id": "P6NvzwTEXZRl", - "outputId": "6856790d-c5eb-44f6-e43d-13daea1806a0" - }, - "execution_count": 199, - "outputs": [ { - "output_type": "execute_result", - "data": { - "text/plain": [ - "\n", - "🚅 Components\n", - " - itinerary_osm_fetcher: OSMFetcher\n", - " - itinerary_prompt_builder: ChatPromptBuilder\n", - "🛤️ Connections\n", - " - itinerary_osm_fetcher.documents -> itinerary_prompt_builder.documents (List[Document])" + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "pOfsHARDjRzN" + }, + "outputs": [], + "source": [ + "!pip install -q geopy" ] - }, - "metadata": {}, - "execution_count": 199 - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "Test Pipeline output" - ], - "metadata": { - "id": "hoBe5d7Gb3UK" - } - }, - { - "cell_type": "code", - "source": [ - "test_res = agent_itinerary_pipeline.run(\n", - " {\n", - " \"itinerary_prompt_builder\": {\n", - " \"user_request\": \"I want to spend an afternoon in Cork city centre...\",\n", - " \"template_variables\": {}\n", - " }\n", - " }\n", - ")\n", - "\n", - "msgs = test_res[\"itinerary_prompt_builder\"][\"prompt\"]\n", - "for m in msgs:\n", - " print(m.role, \":\\n\", m.text, \"\\n\")\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" }, - "id": "bFNJGinpb55q", - "outputId": "f7de75e9-f1d8-4cc2-e4ee-fec2705696db" - }, - "execution_count": 201, - "outputs": [ { - "output_type": "stream", - "name": "stdout", - "text": [ - "Current Query:\n", - "\n", - " [out:json][timeout:30][maxsize:4000000];\n", - " (\n", - " node[amenity](around:1000,51.898403,-8.473978);\n", - "node[tourism](around:1000,51.898403,-8.473978);\n", - "node[leisure](around:1000,51.898403,-8.473978);\n", - " );\n", - " out geom;\n", - " \n", - "Status: 200\n", - "Response: {\n", - " \"version\": 0.6,\n", - " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", - " \"osm3s\": {\n", - " \"timestamp_osm_base\": \"2025-11-15T16:52:29Z\",\n", - " \"copyright\": \"The data included in 
this document is from www.ope...\n", - "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", - "[OSM_Doc_Converter] Loaded 1052 entries.\n", - "[OSM_Doc_Converter] Batch-processing data cleaning.\n", - "ChatRole.SYSTEM :\n", - " You are a local travel planner in Cork, Ireland. Always answer in concise English. \n", - "\n", - "ChatRole.USER :\n", - " User request:\n", - "I want to spend an afternoon in Cork city centre...\n", - "\n", - "Here are some nearby locations from OpenStreetMap:\n", - "- bicycle_parking (type: bicycle_parking, distance: 2.0 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 9.9 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 12.5 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 14.5 m)\n", - "- waste_basket (type: waste_basket, distance: 15.4 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 21.5 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 23.6 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 23.8 m)\n", - "- Cork Walks (type: information, distance: 25.4 m)\n", - "- waste_basket (type: waste_basket, distance: 26.6 m)\n", - "- waste_basket (type: waste_basket, distance: 28.1 m)\n", - "- The Pavilion (type: events_venue, distance: 29.4 m)\n", - "- waste_basket (type: waste_basket, distance: 29.5 m)\n", - "- Burger King (type: fast_food, distance: 30.5 m)\n", - "- Fellini (type: cafe, distance: 30.9 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 35.3 m)\n", - "- The Pana Shuffle (type: artwork, distance: 35.5 m)\n", - "- Intermission Bar (type: bar, distance: 36.1 m)\n", - "- waste_basket (type: waste_basket, distance: 37.2 m)\n", - "- waste_basket (type: waste_basket, distance: 39.0 m)\n", - "- AbraKebabra (type: fast_food, distance: 40.0 m)\n", - "- Mutton Lane Inn (type: pub, distance: 41.0 m)\n", - "- Cafe Mexicana (type: restaurant, distance: 45.2 m)\n", - "- waste_basket (type: waste_basket, distance: 45.3 m)\n", - "- waste_basket (type: 
waste_basket, distance: 46.7 m)\n", - "- Boots (type: pharmacy, distance: 50.7 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 50.8 m)\n", - "- bench (type: bench, distance: 58.2 m)\n", - "- bicycle_parking (type: bicycle_parking, distance: 58.9 m)\n", - "- fountain (type: fountain, distance: 60.6 m)\n", - "- 14A Restaurant (type: restaurant, distance: 61.0 m)\n", - "- Oyster Tavern (type: pub, distance: 61.0 m)\n", - "- waste_basket (type: waste_basket, distance: 61.2 m)\n", - "- Soba Asian Street Food (type: fast_food, distance: 61.3 m)\n", - "- The Farmgate Café (type: restaurant, distance: 66.6 m)\n", - "- Koto (type: restaurant, distance: 67.5 m)\n", - "- Bank of Ireland (type: bank, distance: 69.2 m)\n", - "- Krispy Kreme (type: fast_food, distance: 71.5 m)\n", - "- Akira (type: restaurant, distance: 73.3 m)\n", - "- Euronet (type: atm, distance: 74.6 m)\n", - "\n", - "\n", - "Using this information, suggest 1–2 itineraries starting from a church or historic religious site, then a study-friendly cafe, and ending at a bar/pub. 
\n", - "\n" - ] - } - ] - }, - { - "cell_type": "markdown", - "source": [ - "Wrap with PipelineTool" - ], - "metadata": { - "id": "-PPU018bcMbT" - } - }, - { - "cell_type": "code", - "source": [ - "from haystack.tools import PipelineTool\n", - "\n", - "osm_itinerary_tool = PipelineTool(\n", - " pipeline=agent_itinerary_pipeline,\n", - " name=\"osm_itinerary_tool\",\n", - " description=(\n", - " \"Fetches nearby POIs and \"\n", - " \"builds a chat-style prompt summarizing.\"\n", - " ),\n", - " # Tool input -> Pipeline input\n", - " input_mapping={\n", - " # tool's \"user_request\" -> pipeline's \"prompt_builder.user_request\"\n", - " \"user_request\": [\"itinerary_prompt_builder.user_request\"],\n", - " },\n", - " # Pipeline output -> Tool output name\n", - " output_mapping={\n", - " \"itinerary_prompt_builder.prompt\": \"prompt\",\n", - " },\n", - ")\n" - ], - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "z7r0PZf1jUpL", + "outputId": "0c543243-f0fc-442a-ca06-3ae3def07bd6" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Query: saints peter and paul's catholic church\n", + "Latitude: 51.8989077\n", + "Longitude: -8.4743188\n", + "Display name: Saints Peter and Paul's Catholic Church, Carey's Lane, The Marsh, Centre B ED, Cork, County Cork, Munster, T12 FH27, Éire / Ireland\n" + ] + } + ], + "source": [ + "from geopy.geocoders import Nominatim\n", + "\n", + "geolocator = Nominatim(user_agent=\"haystack-osm-cookbook-demo\")\n", + "\n", + "# Geocode a place name into coordinates\n", + "location_name = \"saints peter and paul's catholic church\"\n", + "location = geolocator.geocode(location_name)\n", + "\n", + "print(f\"Query: {location_name}\")\n", + "print(f\"Latitude: {location.latitude}\")\n", + "print(f\"Longitude: {location.longitude}\")\n", + "print(f\"Display name: 
{location.address}\")\n" + ] 
}, - "id": "tAjAAUc4cYKE", - "outputId": "b3a2df0a-146a-4bc5-b829-1736d4935bf6" - }, - "execution_count": 202, - "outputs": [ { - "output_type": "stream", - "name": "stderr", - "text": [ - "/usr/local/lib/python3.12/dist-packages/pydantic/json_schema.py:2324: PydanticJsonSchemaWarning: Default value is not JSON serializable; excluding default from JSON schema [non-serializable-default]\n", - " warnings.warn(message, PydanticJsonSchemaWarning)\n" - ] + "cell_type": "markdown", + "metadata": { + "id": "4nFzlApoj8_L" + }, + "source": [ + "...here we can simply use a coordinate tuple, which is the more conventional input. In this scenario, we start by fetching every \"node\" tagged with \"amenity\" within 1000 meters for later LLM processing." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "id": "slQpVtXHWvIH" + }, + "outputs": [], + "source": [ + "from osm_integration_haystack import OSMFetcher\n", + "\n", + "CENTER = (51.8989077, -8.4743188) # (lat, lon)\n", + "RADIUS_M = 1000" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "id": "4hO-qR8Lewgj" + }, + "outputs": [], + "source": [ + "osm_fetcher = OSMFetcher(\n", + " preset_center=CENTER, # Cork, Ireland\n", + " preset_radius_m=RADIUS_M, # 1000m radius\n", + " target_osm_types=[\"node\"], # Only search nodes\n", + " target_osm_tags=[\"amenity\"], # Search amenity types\n", + " maximum_query_mb=2, # Limit query size\n", + " overpass_timeout=20\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ej4JUi-rfOLd" + }, + "source": [ + "In the context of OpenStreetMap, terms like `\"node\"` and `\"amenity\"` refer to well-defined [elements](https://wiki.openstreetmap.org/wiki/Elements) and [map features](https://wiki.openstreetmap.org/wiki/Map_features) that describe how real-world objects are encoded in the map data (for example, a café as a point node with an `amenity=cafe` tag). The exact tagging scheme is not the focus of this tutorial. 
In the following examples, we’ll use a small subset of these categories to keep the queries simple and focused.\n", + "\n", + "The `OSMFetcher` component wraps the Overpass API and exposes a few key parameters:\n", + "\n", + "- `preset_center: Optional[Tuple[float, float]]` \n", + " Default center point for all queries, as a `(latitude, longitude)` tuple. \n", + "\n", + "- `preset_radius_m: Optional[int]` \n", + " Default search radius in **meters** around the center. \n", + "\n", + "- `target_osm_types: Optional[Union[str, List[str]]]` \n", + " Which OSM element types to query: `\"node\"`, `\"way\"`, and/or `\"relation\"`. \n", + " If omitted, the fetcher queries all three: `[\"node\", \"way\", \"relation\"]`.\n", + "\n", + "- `target_osm_tags: Optional[Union[str, List[str]]]` \n", + " A list of top-level OSM tags to filter by, such as `[\"amenity\", \"tourism\", \"leisure\"]`. \n", + " If set, the Overpass query will only return elements that have at least one of these tags. \n", + " If left as `None`, the fetcher does **not** filter by tag and will return all matching elements for the chosen types.\n", + "\n", + "- `maximum_query_mb: Optional[int]` \n", + " Rough upper bound on the Overpass response size, in megabytes. \n", + " This is passed to Overpass as `maxsize` to avoid huge responses and timeouts (default: `5` MB).\n", + "\n", + "- `max_token: int` \n", + " Intended as a soft budget for how much data should be returned to downstream LLM components. \n", + " In an LLM/Agent setting, this can be used to limit or compress the total amount of text and metadata so that it fits comfortably within the model's context window (default: `12000`).\n", + "\n", + "- `overpass_timeout: Optional[int]` \n", + " Timeout for the Overpass API request, in seconds (default: `25`). 
\n", + " If the query is too heavy or the server is slow, this helps prevent the call from hanging indefinitely.\n", + "\n", + "In most map-based backends, the typical pattern is to accept concrete `(lat, lon)` coordinates (for example, from the frontend's map widget or the user's GPS location) and then query nearby OSM elements using these parameters.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "u4ePKvV9f6RM" + }, + "source": [ + "... then we transform the returned OpenStreetMap data into `Haystack Document`s." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "y_x4EHRKv9Te", + "outputId": "9cd49bde-a13c-4300-acb6-dcb2b5ac654f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:20][maxsize:2000000];\n", + " (\n", + " node[amenity](around:1000,51.8989077,-8.4743188);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-16T00:05:43Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 955 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n" + ] + } + ], + "source": [ + "result = osm_fetcher.run()\n", + "documents = result[\"documents\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "XXzKvJRVgVPy" + }, + "source": [ + "### Inspecting a single `Document`\n", + "\n", + "Haystack represents each piece of retrieved data as a `Document` with two main parts:\n", + "\n", + "- `content`: human-readable, unstructured text. \n", + " This is what we usually embed, retrieve and show to the user. 
LLMs and retrievers\n", + " mainly \"look at\" this field.\n", + "\n", + "- `meta`: machine-readable, structured metadata stored as a Python dictionary. \n", + " This is where we keep all the fields that are useful for filtering, ranking or\n", + " business logic (ids, coordinates, categories, tags, etc.)." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DgwnILdAwCyl", + "outputId": "02b553e4-d9be-4ddd-9165-6b8dfbca2672" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "📄 type: \n", + "\n", + "--- content ---\n", + "Cafe: Dukes, Carey's Lane, 4, Cork.\n", + "\n", + "--- meta keys ---\n", + "['source', 'osm_id', 'osm_type', 'lat', 'lon', 'name', 'category', 'tags', 'tags_norm', 'address', 'distance_m']\n", + "\n", + "--- full meta ---\n", + "{'address': {'city': 'Cork',\n", + " 'country': 'IE',\n", + " 'housenumber': '4',\n", + " 'street': \"Carey's Lane\"},\n", + " 'category': 'cafe',\n", + " 'distance_m': 28.70318839718862,\n", + " 'lat': 51.8991234,\n", + " 'lon': -8.474089,\n", + " 'name': 'Dukes',\n", + " 'osm_id': 1128095411,\n", + " 'osm_type': 'node',\n", + " 'source': 'openstreetmap',\n", + " 'tags': {'amenity': 'cafe',\n", + " 'cuisine': 'coffee_shop',\n", + " 'entrance': 'main',\n", + " 'internet_access': 'wlan',\n", + " 'phone': '00353214905877',\n", + " 'wheelchair': 'yes'},\n", + " 'tags_norm': {'amenity': 'cafe',\n", + " 'cuisine': 'coffee_shop',\n", + " 'entrance': 'main',\n", + " 'internet_access': 'wlan',\n", + " 'phone': '00353214905877',\n", + " 'wheelchair': True}}\n" + ] + } + ], + "source": [ + "from pprint import pprint\n", + "\n", + "first_doc = documents[1]\n", + "print(\"📄 type:\", type(first_doc))\n", + "\n", + "print(\"\\n--- content ---\")\n", + "print(first_doc.content)\n", + "\n", + "print(\"\\n--- meta keys ---\")\n", + "print(list(first_doc.meta.keys()))\n", + "\n", + "print(\"\\n--- full meta 
---\")\n", + "pprint(first_doc.meta)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EpB5PMmOiSWH" + }, + "source": [ + "... and here is the preview of the preprocessed documents which will be passed to the subsequent pipeline." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "hEYMy0ZKwKcy", + "outputId": "eb08092b-ee51-49f8-f2ca-d58e8247765b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Previewing first 5 documents:\n", + "\n", + "1. Koto\n", + " Type: restaurant\n", + " Distance: 27.9 m\n", + " Location: (51.8990101, -8.4739482)\n", + " Content: Restaurant: Koto, Carey's Lane, 6-7, T12 FH27. Tags: opening_hours=Mo-Su 12:00-22:00\n", + "\n", + "2. Dukes\n", + " Type: cafe\n", + " Distance: 28.7 m\n", + " Location: (51.8991234, -8.474089)\n", + " Content: Cafe: Dukes, Carey's Lane, 4, Cork.\n", + "\n", + "3. Soba Asian Street Food\n", + " Type: fast_food\n", + " Distance: 30.1 m\n", + " Location: (51.8989516, -8.4738856)\n", + " Content: Fast_food: Soba Asian Street Food.\n", + "\n", + "4. OffBeat Donuts\n", + " Type: fast_food\n", + " Distance: 35.1 m\n", + " Location: (51.8990968, -8.4739097)\n", + " Content: Fast_food: OffBeat Donuts, French Church Street, 17, Cork.\n", + "\n", + "5. Burritos and Blues\n", + " Type: fast_food\n", + " Distance: 43.6 m\n", + " Location: (51.899271, -8.4745565)\n", + " Content: Fast_food: Burritos and Blues, Paul Street, 9, Cork. 
Tags: opening_hours=Mo-We 12:00-20:00; Th-Sa 12:00-21:00; Su 13:00-...\n", + "\n" + ] + } + ], + "source": [ + "def preview_documents(docs, limit=5):\n", + " print(f\"Previewing first {min(len(docs), limit)} documents:\\n\")\n", + "\n", + " for i, doc in enumerate(docs[:limit], start=1):\n", + " name = doc.meta.get(\"name\", \"Unknown\")\n", + " category = doc.meta.get(\"category\", \"Unknown\")\n", + " distance = doc.meta.get(\"distance_m\", 0.0)\n", + " lat = doc.meta.get(\"lat\")\n", + " lon = doc.meta.get(\"lon\")\n", + "\n", + " print(f\"{i}. {name}\")\n", + " print(f\" Type: {category}\")\n", + " print(f\" Distance: {distance:.1f} m\")\n", + " print(f\" Location: ({lat}, {lon})\")\n", + " print(f\" Content: {doc.content[:120]}{'...' if len(doc.content) > 120 else ''}\")\n", + " print()\n", + "\n", + "preview_documents(documents, limit=5)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "R3WqXmSzyIbx" + }, + "source": [ + "## Part 2: Pipeline to look for the nearest coffee shop" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GdsJW9bWjSJH" + }, + "source": [ + "I know that a query like “find the nearest coffee shop” is, by itself, a very simple geo-filtering task that you can solve with a couple of distance calculations and a sort. 
In this example, however, I frame it as an LLM task to show how preprocessing can enable richer logic on top of the same data.\n", + "\n", + "`OSMFetcher` converts each OpenStreetMap point of interest into a Haystack `Document` with two sides (as you have seen in the previous section):\n", + "\n", + "- `content` holds a short, human-readable description of the place (name, category, address, and a few tags).\n", + "- `meta` stores all the structured fields, such as `lat`, `lon`, `category`, `address`, and a pre-computed `distance_m` from the search center (the user's location passed into `OSMFetcher`).\n", + "\n", + "In a real pipeline you would typically embed the `content` of each Document so that the embeddings capture the semantic meaning of the place descriptions - for example whether the text mentions “laptop”, “Wi-Fi”, “study”, “quiet”, “busy bar”, “traditional pub”, and so on. At the same time, the numeric `distance_m` in `meta` gives you the classic “map-style” filter: how far this place is from the user.\n", + "\n", + "In this pipeline the LLM never has to implement raw geospatial math. Instead, it reads the semantic description in `content` and combines it with the pre-computed `distance_m` field to decide which places both match the user's intent and are close enough. The low-level geospatial logic is pushed into `OSMFetcher`, and the LLM focuses purely on semantic filtering and ranking.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eRZrGO22-IpS" + }, + "source": [ + "### Step 1. Build the Prompt and initialize a Pipeline\n", + "We begin by building the prompt and specifying the LLM we are using." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "id": "kzPKYWR4y3rb" + }, + "outputs": [], + "source": [ + "from haystack import Pipeline\n", + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "from haystack.dataclasses import ChatMessage\n", + "from haystack.utils import Secret" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "id": "t1AFay1Wy6HJ" + }, + "outputs": [], + "source": [ + "prompt_template = [\n", + " ChatMessage.from_system(\n", + " \"You are a geographic information assistant. \"\n", + " \"Based on the provided OpenStreetMap data, help the user find nearby places that match the user's query.\"\n", + " ),\n", + " ChatMessage.from_user(\n", + " \"\"\"\n", + " User location: {{ user_location }}\n", + " Search radius: {{ radius }}m\n", + " User query: {{ query }}\n", + "\n", + " Available location data:\n", + " {% for document in documents %}\n", + " - {{ document.content }}\n", + " Location: ({{ document.meta.lat }}, {{ document.meta.lon }})\n", + " Distance: {{ document.meta.distance_m }}m\n", + " Type: {{ document.meta.category }}\n", + " {% endfor %}\n", + "\n", + " Please:\n", + " 1. Find all locations that are relevant to the user's query\n", + " 2. Sort them by distance\n", + " 3. Recommend the nearest 3 locations\n", + " 4. 
Provide a short description for each\n", + "\n", + " Please respond in English.\n", + " \"\"\"\n", + " ),\n", + "]\n", + "\n", + "prompt_builder = ChatPromptBuilder(\n", + " template=prompt_template,\n", + " required_variables=[\"user_location\", \"radius\", \"query\", \"documents\"], # optional, depends on what your pipeline requires\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "id": "2m5fzhegy8MT" + }, + "outputs": [], + "source": [ + "llm = OpenAIChatGenerator(\n", + " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", + " model=\"gpt-4o-mini\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WPIJ1ZwH-PwR" + }, + "source": [ + "Here we connect `osm_fetcher.documents` to `prompt_builder.documents` and `prompt_builder.prompt` to the selected LLM." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "o9HPsQUky-V2", + "outputId": "793c2bc8-14e2-474f-a57b-ca48bbe58b63" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\n", + "🚅 Components\n", + " - osm_fetcher: OSMFetcher\n", + " - prompt_builder: ChatPromptBuilder\n", + " - llm: OpenAIChatGenerator\n", + "🛤️ Connections\n", + " - osm_fetcher.documents -> prompt_builder.documents (List[Document])\n", + " - prompt_builder.prompt -> llm.messages (list[ChatMessage])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "coffee_pipeline = Pipeline()\n", + "coffee_pipeline.add_component(\"osm_fetcher\", osm_fetcher)\n", + "coffee_pipeline.add_component(\"prompt_builder\", prompt_builder)\n", + "coffee_pipeline.add_component(\"llm\", llm)\n", + "\n", + "# Feed the fetched documents into the prompt builder\n", + "coffee_pipeline.connect(\"osm_fetcher.documents\", \"prompt_builder.documents\")\n", + "# Feed the ChatPromptBuilder prompt (List[ChatMessage]) into llm.messages\n", + 
"coffee_pipeline.connect(\"prompt_builder.prompt\", \"llm.messages\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bf3B4GkQAGsR" + }, + "source": [ + "### Step 2. Query with natural language" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": { + "id": "gLebpkTG1JD8" + }, + "outputs": [], + "source": [ + "search_query = \"find me the nearest coffee shop for work, needs wifi\"" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "O87vWjTSzAI0", + "outputId": "f5869513-eb92-4fde-aa90-ad3ccf5b2b0f" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:20][maxsize:2000000];\n", + " (\n", + " node[amenity](around:1000,51.8989077,-8.4743188);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-16T00:08:54Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 955 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "Role: ChatRole.ASSISTANT\n", + "\n", + "Assistant reply:\n", + "\n", + "Based on your query for the nearest coffee shops with Wi-Fi in Cork, here are the top three recommendations sorted by distance:\n", + "\n", + "1. **Dukes**\n", + " - **Type:** Cafe\n", + " - **Location:** Carey's Lane, 4, Cork.\n", + " - **Distance:** 28.7m\n", + " - **Description:** A cozy cafe offering a selection of coffee and pastries, perfect for a work session. \n", + "\n", + "2. 
**Rebel Coffee Cork**\n", + " - **Type:** Cafe\n", + " - **Location:** French Church Street, 4, Cork.\n", + " - **Distance:** 53.1m\n", + " - **Description:** A great spot for coffee lovers, providing a comfortable work environment with Wi-Fi available.\n", + "\n", + "3. **Cork Coffee Roasters**\n", + " - **Type:** Cafe\n", + " - **Location:** (specific street not provided).\n", + " - **Distance:** 61.5m\n", + " - **Description:** Known for its freshly brewed coffee and inviting ambiance, it's a great place to get some work done with Wi-Fi access.\n", + "\n", + "These options are close to your location and provide the ideal settings for work while you enjoy good coffee.\n" + ] + } + ], + "source": [ + "user_location = \"Cork, Ireland\"\n", + "radius = 1000\n", + "\n", + "result = coffee_pipeline.run(\n", + " {\n", + " \"osm_fetcher\": {},\n", + " \"prompt_builder\": {\n", + " \"user_location\": user_location,\n", + " \"radius\": radius,\n", + " \"query\": search_query,\n", + " },\n", + " }\n", + ")\n", + "\n", + "reply = result[\"llm\"][\"replies\"][0]\n", + "print(\"Role:\", reply.role)\n", + "print(\"\\nAssistant reply:\\n\")\n", + "print(reply.text)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ENWaEbSeAS2S" + }, + "source": [ + "If you recall the document shown in the previous section, you'll notice that `Dukes` has the tag `'internet_access': 'wlan'`, which matches what we are looking for!" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wXA_fXMML6cT" + }, + "source": [ + "## Part 3: Planning an afternoon itinerary with an Agent and OSM tools\n", + "\n", + "Of course, in a real application we are usually dealing with a more open-ended, multi-step reasoning task. 
Rather than answering a single question like “Where's the nearest coffee shop that has wifi”, the user now gives a vague but structured request: plan an afternoon itinerary with three stages — a historic site, a quiet cafe to work in, and a nice bar or pub nearby.\n", + "\n", + "To tackle this, we expose `OSMFetcher` as a tool and give it to an agent built with `OpenAIChatGenerator`. The agent receives a list of nearby places and is solely responsible for selecting, organizing, and justifying an itinerary — using both semantic and geographic reasoning.\n", + "\n", + "This setup allows the LLM to act more like a local guide: instead of answering one-shot prompts, it explores tool outputs and composes a meaningful plan in response to an open-ended user request." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "r3aiYHTcBs5d" + }, + "source": [ + "### Step 1. Initial Setup" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "94GznuQYL7i0" + }, + "outputs": [], + "source": [ + "from osm_integration_haystack import OSMFetcher\n", + "\n", + "CENTER = (51.898403, -8.473978)\n", + "RADIUS_M = 1000\n", + "\n", + "itinerary_fetcher = OSMFetcher(\n", + " preset_center=CENTER,\n", + " preset_radius_m=RADIUS_M,\n", + " target_osm_types=[\"node\"],\n", + " target_osm_tags=[\n", + " \"amenity\",\n", + " \"tourism\",\n", + " \"leisure\",\n", + " ],\n", + " maximum_query_mb=4,\n", + " overpass_timeout=30,\n", + ")\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "nvAaFQC3XJ0I", + "outputId": "120f97bf-c2da-475e-e18a-da90fdada9ab" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "WARNING:haystack.components.builders.chat_prompt_builder:ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. 
By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.\n" + ] + } + ], + "source": [ + "from haystack.components.builders import ChatPromptBuilder\n", + "from haystack.dataclasses import ChatMessage\n", + "\n", + "itinerary_prompt_template = [\n", + " ChatMessage.from_user(\n", + " \"User request:\\n{{ user_request }}\\n\\n\"\n", + " \"Here are some nearby locations from OpenStreetMap:\\n\"\n", + " \"{% if documents %}\"\n", + " \"{% for doc in documents[:40] %}\"\n", + " \"- {{ doc.meta.get('name', 'Unknown') }} \"\n", + " \"(type: {{ doc.meta.get('category', 'unknown') }}, \"\n", + " \"distance: {{ '%.1f'|format(doc.meta.get('distance_m', 0)) }} m)\\n\"\n", + " \"{% endfor %}\"\n", + " \"{% else %}\"\n", + " \"No locations available.\\n\"\n", + " \"{% endif %}\\n\\n\"\n", + " ),\n", + "]\n", + "\n", + "itinerary_prompt_builder = ChatPromptBuilder(template=itinerary_prompt_template)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PXDvvPGSCY8Y" + }, + "source": [ + "### Step 2. Build a pipeline for the Agent tool\n", + "\n", + "In the agentic scenario, you are **strongly advised** to wrap the `OSMFetcher` and `ChatPromptBuilder` into a single pipeline. If you exposed `OSMFetcher` directly as a tool, the agent would receive a large, complex list of Documents, which can easily exceed the context window and make planning harder. 
By composing this pipeline first and then wrapping it as a `PipelineTool`, we give the agent just enough curated information to reason effectively.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "P6NvzwTEXZRl", + "outputId": "1cc34d1c-6842-46cf-c356-5ddf1dd60bab" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "\n", + "🚅 Components\n", + " - itinerary_osm_fetcher: OSMFetcher\n", + " - itinerary_prompt_builder: ChatPromptBuilder\n", + "🛤️ Connections\n", + " - itinerary_osm_fetcher.documents -> itinerary_prompt_builder.documents (List[Document])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from haystack import Pipeline\n", + "\n", + "agent_itinerary_pipeline = Pipeline()\n", + "agent_itinerary_pipeline.add_component(\"itinerary_osm_fetcher\", itinerary_fetcher)\n", + "agent_itinerary_pipeline.add_component(\"itinerary_prompt_builder\", itinerary_prompt_builder)\n", + "\n", + "# Pass OSMFetcher's documents into ChatPromptBuilder's template_variables.documents\n", + "agent_itinerary_pipeline.connect(\n", + " \"itinerary_osm_fetcher.documents\",\n", + " \"itinerary_prompt_builder.documents\",\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "hoBe5d7Gb3UK" + }, + "source": [ + "...we first test the pipeline output with a simple user prompt." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "bFNJGinpb55q", + "outputId": "22330729-b9d7-4962-9115-90af61f37370" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:30][maxsize:4000000];\n", + " (\n", + " node[amenity](around:1000,51.898403,-8.473978);\n", + "node[tourism](around:1000,51.898403,-8.473978);\n", + "node[leisure](around:1000,51.898403,-8.473978);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-16T00:45:46Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 1052 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "ChatRole.USER :\n", + " User request:\n", + "I want to spend an afternoon in Cork city centre...\n", + "\n", + "Here are some nearby locations from OpenStreetMap:\n", + "- bicycle_parking (type: bicycle_parking, distance: 2.0 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 9.9 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 12.5 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 14.5 m)\n", + "- waste_basket (type: waste_basket, distance: 15.4 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 21.5 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 23.6 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 23.8 m)\n", + "- Cork Walks (type: information, distance: 25.4 m)\n", + "- waste_basket (type: waste_basket, distance: 26.6 m)\n", + "- waste_basket (type: waste_basket, distance: 28.1 m)\n", + "- The Pavilion (type: events_venue, distance: 29.4 m)\n", + "- waste_basket (type: 
waste_basket, distance: 29.5 m)\n", + "- Burger King (type: fast_food, distance: 30.5 m)\n", + "- Fellini (type: cafe, distance: 30.9 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 35.3 m)\n", + "- The Pana Shuffle (type: artwork, distance: 35.5 m)\n", + "- Intermission Bar (type: bar, distance: 36.1 m)\n", + "- waste_basket (type: waste_basket, distance: 37.2 m)\n", + "- waste_basket (type: waste_basket, distance: 39.0 m)\n", + "- AbraKebabra (type: fast_food, distance: 40.0 m)\n", + "- Mutton Lane Inn (type: pub, distance: 41.0 m)\n", + "- Cafe Mexicana (type: restaurant, distance: 45.2 m)\n", + "- waste_basket (type: waste_basket, distance: 45.3 m)\n", + "- waste_basket (type: waste_basket, distance: 46.7 m)\n", + "- Boots (type: pharmacy, distance: 50.7 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 50.8 m)\n", + "- bench (type: bench, distance: 58.2 m)\n", + "- bicycle_parking (type: bicycle_parking, distance: 58.9 m)\n", + "- fountain (type: fountain, distance: 60.6 m)\n", + "- 14A Restaurant (type: restaurant, distance: 61.0 m)\n", + "- Oyster Tavern (type: pub, distance: 61.0 m)\n", + "- waste_basket (type: waste_basket, distance: 61.2 m)\n", + "- Soba Asian Street Food (type: fast_food, distance: 61.3 m)\n", + "- The Farmgate Café (type: restaurant, distance: 66.6 m)\n", + "- Koto (type: restaurant, distance: 67.5 m)\n", + "- Bank of Ireland (type: bank, distance: 69.2 m)\n", + "- Krispy Kreme (type: fast_food, distance: 71.5 m)\n", + "- Akira (type: restaurant, distance: 73.3 m)\n", + "- Euronet (type: atm, distance: 74.6 m)\n", + "\n", + " \n", + "\n" + ] + } + ], + "source": [ + "test_res = agent_itinerary_pipeline.run(\n", + " {\n", + " \"itinerary_prompt_builder\": {\n", + " \"user_request\": \"I want to spend an afternoon in Cork city centre...\",\n", + " \"template_variables\": {}\n", + " }\n", + " }\n", + ")\n", + "\n", + "msgs = test_res[\"itinerary_prompt_builder\"][\"prompt\"]\n", + "for m in msgs:\n", + " 
print(m.role, \":\\n\", m.text, \"\\n\")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-PPU018bcMbT" + }, + "source": [ + "### Step 3. Wrap the pipeline with `PipelineTool`\n", + "\n", + "The agent will use this as a single callable tool, which also helps reduce total token usage and stay within the model's context limit (e.g., 12,000 tokens). Of course, the actual token usage depends on your own configuration, in particular the size of the search area and how much detail `OSMFetcher` includes for each fetched location." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "tAjAAUc4cYKE", + "outputId": "9e4064f5-e5c7-472c-da50-319c28ae1745" + }, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/lib/python3.12/dist-packages/pydantic/json_schema.py:2324: PydanticJsonSchemaWarning: Default value is not JSON serializable; excluding default from JSON schema [non-serializable-default]\n", + " warnings.warn(message, PydanticJsonSchemaWarning)\n" + ] + } + ], + "source": [ + "from haystack.tools import PipelineTool\n", + "\n", + "osm_itinerary_tool = PipelineTool(\n", + " pipeline=agent_itinerary_pipeline,\n", + " name=\"osm_itinerary_tool\",\n", + " description=(\n", + " \"Fetches nearby POIs and \"\n", + " \"builds a chat-style prompt summarizing them.\"\n", + " ),\n", + "\n", + " input_mapping={\n", + " \"user_request\": [\"itinerary_prompt_builder.user_request\"],\n", + " },\n", + "\n", + " output_mapping={\n", + " \"itinerary_prompt_builder.prompt\": \"prompt\",\n", + " },\n", + ")\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pjXuB_uUIfmD" + }, + "source": [ + "### Step 4. 
Create the Agent\n", + "\n", + "We now create a Haystack `Agent` that knows how to use our `osm_itinerary_tool`.\n", + "This agent uses a chat-based LLM (`OpenAIChatGenerator`) and is given both:\n", + "\n", + "* The `PipelineTool` (so it can fetch and summarize nearby POIs)\n", + "* A `system_prompt` (so it knows **when** to call the tool and **how** to respond)\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "y5pk5UBzZQ0m" + }, + "outputs": [], + "source": [ + "from haystack.components.generators.chat import OpenAIChatGenerator\n", + "from haystack.components.agents import Agent\n", + "from haystack.dataclasses import ChatMessage\n", + "from haystack.utils import Secret\n", + "\n", + "itinerary_llm = OpenAIChatGenerator(\n", + " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", + " model=\"gpt-4o-mini\",\n", + ")\n", + "\n", + "itinerary_agent = Agent(\n", + " chat_generator=itinerary_llm,\n", + " tools=[osm_itinerary_tool],\n", + " system_prompt=(\n", + " \"You are a helpful local guide in Cork, Ireland.\\n\\n\"\n", + " \"When the user asks you to plan an itinerary, first call 'osm_itinerary_tool'. \"\n", + " \"This tool returns a list of chat messages under the field 'prompt', which already \"\n", + " \"contains the user's request and a list of nearby locations.\\n\\n\"\n", + " \"Read those messages carefully, then respond with 1–2 itineraries \"\n", + " \"(church -> cafe -> bar/pub), including approximate walking distances.\"\n", + " ),\n", + ")\n", + "\n", + "itinerary_agent.warm_up()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "dGAxWS6BKnSe" + }, + "source": [ + "...then we give it a sufficiently complex user prompt." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ikF1Grx0Z7Bo", + "outputId": "706d8cdd-df97-45e5-b731-cac92179c091" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Current Query:\n", + "\n", + " [out:json][timeout:30][maxsize:4000000];\n", + " (\n", + " node[amenity](around:1000,51.898403,-8.473978);\n", + "node[tourism](around:1000,51.898403,-8.473978);\n", + "node[leisure](around:1000,51.898403,-8.473978);\n", + " );\n", + " out geom;\n", + " \n", + "Status: 200\n", + "Response: {\n", + " \"version\": 0.6,\n", + " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", + " \"osm3s\": {\n", + " \"timestamp_osm_base\": \"2025-11-16T01:00:12Z\",\n", + " \"copyright\": \"The data included in this document is from www.ope...\n", + "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", + "[OSM_Doc_Converter] Loaded 1052 entries.\n", + "[OSM_Doc_Converter] Batch-processing data cleaning.\n", + "Final role: ChatRole.ASSISTANT\n", + "\n", + "Assistant final reply:\n", + "\n", + "Here are two possible itineraries for spending an afternoon in Cork city centre, incorporating your requests:\n", + "\n", + "### Itinerary 1\n", + "1. **Visit St. Fin Barre's Cathedral** \n", + " - **Distance from Starting Point:** Approximately 1.0 km (12 minutes walk)\n", + " - **Why:** This stunning Gothic cathedral is one of Cork's most significant landmarks, offering beautiful architecture and a peaceful atmosphere.\n", + "\n", + "2. **Dentist Appointment - The Dental Suite** \n", + " - **Distance from St. Fin Barre's Cathedral:** Approximately 1.0 km (12 minutes walk)\n", + " - **Why:** Known for patient comfort and care, this dental clinic is conveniently located and well-rated in the city.\n", + "\n", + "3. 
**End at The Oliver Plunkett Pub** \n", + " - **Distance from The Dental Suite:** Approximately 500 m (6 minutes walk)\n", + " - **Why:** A lively pub known for its music and great atmosphere, making it a perfect place to unwind after your appointment.\n", + "\n", + "### Itinerary 2\n", + "1. **Visit Christ Church Cathedral** \n", + " - **Distance from Starting Point:** Approximately 800 m (10 minutes walk)\n", + " - **Why:** This beautiful historic site offers a glimpse into the rich history of Cork and lovely views.\n", + "\n", + "2. **Dentist Appointment - Cork Dental Care** \n", + " - **Distance from Christ Church Cathedral:** Approximately 600 m (8 minutes walk)\n", + " - **Why:** A well-regarded dental clinic known for its friendly service is located conveniently near the city centre.\n", + "\n", + "3. **End at The Flying Enterprise Bar** \n", + " - **Distance from Cork Dental Care:** Approximately 400 m (5 minutes walk)\n", + " - **Why:** This cozy bar has a welcoming atmosphere and is perfect for relaxing and enjoying a drink post-dentist visit.\n", + "\n", + "### Summary\n", + "Both itineraries offer a blend of historic charm, needed health care, and a chance to relax in a vibrant part of the city. They are designed to ensure that all locations are within reasonable walking distances of each other, making for a pleasant afternoon in Cork.\n" + ] + } + ], + "source": [ + "user_request = (\n", + " \"I want to spend an afternoon in Cork city centre. \"\n", + " \"Please plan 1–2 possible itineraries where I:\\n\"\n", + " \"1) start by visiting a church or historic religious site,\\n\"\n", + " \"2) then go to the dentist for painful torture,\\n\"\n", + " \"3) and finally end the day in a nice bar or pub nearby.\\n\\n\"\n", + " \"All places should be within reasonable walking distance. 
\"\n", + " \"For each itinerary, please include the place names, approximate distances between stops, \"\n", + " \"and a short explanation of why you chose them.\"\n", + ")\n", + "\n", + "result = itinerary_agent.run(messages=[ChatMessage.from_user(user_request)])\n", + "\n", + "final_msg = result[\"messages\"][-1]\n", + "print(\"Final role:\", final_msg.role)\n", + "print(\"\\nAssistant final reply:\\n\")\n", + "print(final_msg.text)\n" + ] } - ] - }, - { - "cell_type": "code", - "source": [ - "from haystack.components.generators.chat import OpenAIChatGenerator\n", - "from haystack.components.agents import Agent\n", - "from haystack.dataclasses import ChatMessage\n", - "from haystack.utils import Secret\n", - "\n", - "itinerary_llm = OpenAIChatGenerator(\n", - " api_key=Secret.from_env_var(\"OPENAI_API_KEY\"),\n", - " model=\"gpt-4o-mini\",\n", - ")\n", - "\n", - "itinerary_agent = Agent(\n", - " chat_generator=itinerary_llm,\n", - " tools=[osm_itinerary_tool],\n", - " system_prompt=(\n", - " \"You are a helpful local guide in Cork, Ireland.\\n\\n\"\n", - " \"When the user asks you to plan an itinerary, first call 'osm_itinerary_tool'. \"\n", - " \"This tool returns a list of chat messages under the field 'prompt', which already \"\n", - " \"contains the user's request and a list of nearby locations.\\n\\n\"\n", - " \"Read those messages carefully, then respond with 1–2 itineraries \"\n", - " \"(church -> cafe -> bar/pub), including approximate walking distances.\"\n", - " ),\n", - ")\n", - "\n", - "itinerary_agent.warm_up()" - ], - "metadata": { - "id": "y5pk5UBzZQ0m" - }, - "execution_count": 203, - "outputs": [] - }, - { - "cell_type": "code", - "source": [ - "user_request = (\n", - " \"I want to spend an afternoon in Cork city centre. 
\"\n", - " \"Please plan 1–2 possible itineraries where I:\\n\"\n", - " \"1) start by visiting a church or historic religious site,\\n\"\n", - " \"2) then go to a quiet cafe where I can study or work on my laptop,\\n\"\n", - " \"3) and finally end the day in a nice bar or pub nearby.\\n\\n\"\n", - " \"All places should be within reasonable walking distance. \"\n", - " \"For each itinerary, please include the place names, approximate distances between stops, \"\n", - " \"and a short explanation of why you chose them.\"\n", - ")\n", - "\n", - "result = itinerary_agent.run(messages=[ChatMessage.from_user(user_request)])\n", - "\n", - "final_msg = result[\"messages\"][-1]\n", - "print(\"Final role:\", final_msg.role)\n", - "print(\"\\nAssistant final reply:\\n\")\n", - "print(final_msg.text)\n" - ], - "metadata": { + ], + "metadata": { + "accelerator": "GPU", "colab": { - "base_uri": "https://localhost:8080/" + "gpuType": "A100", + "provenance": [] }, - "id": "ikF1Grx0Z7Bo", - "outputId": "814423b0-872a-4359-d054-b5a45953a77a" - }, - "execution_count": 204, - "outputs": [ - { - "output_type": "stream", - "name": "stdout", - "text": [ - "Current Query:\n", - "\n", - " [out:json][timeout:30][maxsize:4000000];\n", - " (\n", - " node[amenity](around:1000,51.898403,-8.473978);\n", - "node[tourism](around:1000,51.898403,-8.473978);\n", - "node[leisure](around:1000,51.898403,-8.473978);\n", - " );\n", - " out geom;\n", - " \n", - "Status: 200\n", - "Response: {\n", - " \"version\": 0.6,\n", - " \"generator\": \"Overpass API 0.7.62.8 e802775f\",\n", - " \"osm3s\": {\n", - " \"timestamp_osm_base\": \"2025-11-15T16:56:42Z\",\n", - " \"copyright\": \"The data included in this document is from www.ope...\n", - "[OSM_Doc_Converter] Reading Raw OSM GeoJson...\n", - "[OSM_Doc_Converter] Loaded 1052 entries.\n", - "[OSM_Doc_Converter] Batch-processing data cleaning.\n", - "Final role: ChatRole.ASSISTANT\n", - "\n", - "Assistant final reply:\n", - "\n", - "Here are two possible 
itineraries for an afternoon in Cork city centre:\n", - "\n", - "### Itinerary 1:\n", - "1. **Start: St. Anne's Shandon Church**\n", - " - **Description:** This iconic church is famous for its stunning architecture and views from the tower. It's an excellent spot to explore Cork's history.\n", - " - **Distance to next stop:** ~1.1 km (approx. 14 minutes walk)\n", - "\n", - "2. **Stop 2: The Farmgate Café**\n", - " - **Description:** Located at the English Market, this café offers a cozy atmosphere, delicious food, and a good environment for studying or working on your laptop.\n", - " - **Distance to next stop:** ~0.5 km (approx. 6 minutes walk)\n", - "\n", - "3. **End: Mutton Lane Inn**\n", - " - **Description:** A charming pub with a fantastic selection of local beers and a vibrant atmosphere, perfect for winding down after a day of exploration.\n", - " - **Total Walking Distance:** ~1.6 km\n", - "\n", - "---\n", - "\n", - "### Itinerary 2:\n", - "1. **Start: St. Patrick's Street Church**\n", - " - **Description:** This historic church is known for its beautiful stained glass and serene ambiance, ideal for a peaceful start to your afternoon.\n", - " - **Distance to next stop:** ~0.8 km (approx. 10 minutes walk)\n", - "\n", - "2. **Stop 2: Fellini**\n", - " - **Description:** A quaint café popular among students and remote workers, offering a quiet space and great coffee for studying or working.\n", - " - **Distance to next stop:** ~0.3 km (approx. 4 minutes walk)\n", - "\n", - "3. **End: Intermission Bar**\n", - " - **Description:** A laid-back bar with excellent service, perfect for enjoying local drinks and soaking in the evening atmosphere after a productive day.\n", - " - **Total Walking Distance:** ~1.1 km\n", - "\n", - "Both itineraries include paths that are easy to navigate and will lead you through some of Cork's most delightful locations. 
Enjoy your afternoon!\n" - ] + "kernelspec": { + "display_name": "Python 3", + "name": "python3" + }, + "language_info": { + "name": "python" } - ] - } - ] -} \ No newline at end of file + }, + "nbformat": 4, + "nbformat_minor": 0 +}