Skip to content

Latest commit

 

History

History
42 lines (27 loc) · 3.73 KB

File metadata and controls

42 lines (27 loc) · 3.73 KB

iOS Shortcuts Agent Setup

This guide will walk you through setting up and using the iOS Shortcuts Agent framework.

Prerequisites

  • An iPhone or iPad running iOS 14 or later
  • The Shortcuts app installed on your device
  • Access to your iCloud Drive

Installation

  1. Download the shortcuts: Find a way to get the shortcut files into your iCloud Drive. This can be done on a Mac, through iCloud.com on Windows, or by visiting GitHub in a mobile browser.
  2. Import the shortcuts: Open the Shortcuts app on your iPhone or iPad. Navigate to the folder in iCloud Drive where you saved the shortcut files. Tap on each shortcut file to import it into the Shortcuts app.

TLDR

  • Download and import the provided shortcuts into your Shortcuts app.
  • Create or edit an agent profile as a context object in your shortcut.
  • Use the "Chain Gemini API Calls" shortcut to process the agent's context and interact with the Gemini API.

Building Your Own Agent Shortcut

To leverage this framework, you'll create your own custom shortcuts that utilize the provided agent functions. The general workflow for these custom shortcuts involves three key steps:

  1. Create an Agent Profile: Your shortcut will first define an "agent profile." This is an initial context object that sets up the parameters and persona for your AI agent. Examples can be seen at Example Agent Profiles.
  2. Load and Edit Agent Profile: Load the agent profile into your custom shortcut. You can then edit this agent profile within the shortcut as needed. This might include adding a user prompt directly to the context object or simulating function calls by adding them to the context object. A 'blank' example can be seen at the Gemini Conversation Shortcut.
  3. Chain Gemini API Calls: Finally, your shortcut will pass this initial context object to the "Chain Gemini API Calls" function provided within this repository. This function orchestrates the interaction with the Gemini API, allowing your agent to process information, generate responses, and potentially perform further actions based on the initial context and the agent profile's capabilities.

The "Chain Gemini API Calls" Shortcut: The Backend Brains

The "Chain Gemini API Calls" shortcut is the core intelligence of this framework, acting as the backend orchestrator for your AI agent. Here's a breakdown of its key functionalities:

  1. Sends Input to Gemini API: It takes the input, which includes your agent's context and profile, and sends it to the Gemini API for processing.
  2. Processes Gemini Output: Upon receiving a response from the Gemini API, this shortcut parses the output to determine the next action.
  3. Displays Model Text Output: If the Gemini API's response includes a text output, the shortcut will display this directly to the user.
  4. Handles Function Calls: The shortcut checks if the model has requested a function call. If so, it will execute the corresponding shortcut on your iPhone. It is crucial that the name of the function called by the model in its response precisely matches the name of the shortcut on your device.
  5. Detects Task Completion: The shortcut also monitors for the model calling a specific "finished task" function. This signal indicates that the agent has completed its objective, allowing the "Chain Gemini API Calls" shortcut to end the overall response.

This modular approach allows you to build a variety of agent-like behaviors by customizing the agent profile and initial context for different tasks. This intricate process facilitates dynamic, multi-turn interactions where the AI agent can respond with text, call other Shortcuts for specific actions, and intelligently know when its task is complete.