- Purpose of the Project: To develop an AI-powered agent that can access and query data from the geotour.gr WordPress website to provide users with information and answer their questions about places of interest in Crete. This will enhance the user experience on the website by offering an interactive and informative way to explore Crete’s attractions.
- Overview of the AI Agent: The agent uses the pydantic-ai library to interact with the Gemini language model. It works in two steps: it first asks the user for a keyword (search term), then accepts questions about that term. It retrieves matching data from a custom WordPress API endpoint, constructs prompts for Gemini from the question and the data, and generates human-readable answers by combining Gemini’s responses with relevant information from the retrieved data.
Current Implementation
Schema-Aware Prompts
To improve the agent’s ability to understand and interpret the data from the API, schema-aware prompts have been introduced. This involves:
- Defining a schema for the JSON data. This is a modified version of the Geotour schema; the modifications are applied in the function that builds the custom endpoint’s response.
- Translating the schema into a clear natural language description within the prompt.
- Modifying the API response to align with the schema and make the data more readily interpretable by Gemini.
By incorporating the schema information into the prompts, Gemini can better understand the structure and meaning of the data, leading to more accurate and relevant answers.
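As an illustration of the second step above, the sketch below turns a schema into a natural language description that can be placed in a prompt. The field names and meanings in LISTING_SCHEMA and the helper describe_schema() are assumptions made for this example; they are not the actual modified Geotour schema.

```python
# Hypothetical schema: field names and meanings are illustrative only,
# not the real modified Geotour schema.
LISTING_SCHEMA = {
    "title": "the name of the place",
    "description": "a short text describing the place",
    "position": "the latitude and longitude of the place",
    "url": "the page on geotour.gr that covers the place",
}


def describe_schema(schema: dict[str, str]) -> str:
    """Turn a field-to-meaning mapping into prompt-ready sentences."""
    lines = ["Each listing is a JSON object with these fields:"]
    for field, meaning in schema.items():
        lines.append(f'- "{field}": {meaning}.')
    return "\n".join(lines)


schema_text = describe_schema(LISTING_SCHEMA)
print(schema_text)
```

Keeping the description in one mapping means the prompt text and the endpoint’s response format can be updated together when the schema changes.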
Core Components (Code and Functionality)
The AI agent is implemented in Python using the pydantic-ai library, which facilitates interaction with the Gemini language model. The code comprises several key functions:
- get_listing_data(): Fetches data from the custom geotour.gr WordPress API, handles Brotli decompression, and retries failed requests.
- answer_question(): Orchestrates the interaction with Gemini by constructing prompts, sending requests, and processing responses.
The agent has been successfully integrated into the geotour.gr website using a custom WordPress plugin. The JavaScript code handles the communication between the frontend and the Python agent via AJAX requests.
API Interaction and Data Handling
The agent interacts with a custom WordPress API endpoint (https://www.geotour.gr/wp-json/geotourai/v1/listings) to retrieve data about places of interest in Crete. It handles potential errors, including Brotli decompression errors and HTTP request failures, using retries and fallback mechanisms.
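The retry and fallback behaviour described above could look roughly like the following sketch. It is not the actual get_listing_data() code: fetch_with_retries() and the fetch callable are hypothetical names, and the fallback shown here (stop accepting Brotli after the first failure) is one possible strategy assumed for illustration. The HTTP request itself is replaced by a fake function so the example runs without network access.

```python
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[bool], bytes],
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> bytes:
    """Call fetch(allow_brotli) until it succeeds or attempts run out.

    `fetch` is a hypothetical callable that performs the HTTP request and
    returns the decoded body. After the first failure Brotli is disabled,
    so the server can answer with a different encoding.
    """
    allow_brotli = True
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(allow_brotli)
        except Exception as error:  # a real client would catch narrower types
            last_error = error
            allow_brotli = False  # fallback: request a non-Brotli response
            if attempt < max_attempts:
                time.sleep(delay_seconds * attempt)  # linear backoff
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Simulated endpoint: fails while Brotli is allowed, succeeds afterwards.
calls = []


def fake_fetch(allow_brotli: bool) -> bytes:
    calls.append(allow_brotli)
    if allow_brotli:
        raise ValueError("simulated Brotli decompression error")
    return b'[{"title": "Knossos"}]'


body = fetch_with_retries(fake_fetch)
```

Passing the request function in as a parameter keeps the retry logic testable on its own, independent of the HTTP library used.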
User Interface and Search Term Input
A user-friendly interface has been implemented to enhance user interaction. The UI dynamically changes based on user actions:
- The user is prompted to input the search term.
- This search term is displayed and can be cleared.
- The button label changes between “Set” and “Ask”.
- The placeholder text updates to guide user input.
- Tooltips provide additional context.
- Previous questions and answers are displayed while the search term is active.
Prompt Construction and Gemini Integration
The answer_question() function constructs prompts for the Gemini language model. These prompts include the user’s question and a subset of the retrieved data filtered by the search term. The agent interacts with Gemini using the pydantic-ai library.
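A minimal sketch of this prompt construction is shown below. It assumes the listings arrive as a list of dictionaries and that filtering happens locally with a case-insensitive substring match; in the real agent the search term may instead be applied by the REST API. The helpers filter_listings() and build_prompt(), the prompt wording, and the sample data are all hypothetical.

```python
import json


def filter_listings(listings: list[dict], search_term: str) -> list[dict]:
    """Keep listings in which any field value mentions the search term."""
    term = search_term.casefold()
    return [
        item for item in listings
        if any(term in str(value).casefold() for value in item.values())
    ]


def build_prompt(question: str, listings: list[dict], search_term: str) -> str:
    """Combine the filtered listings and the user's question into one prompt."""
    relevant = filter_listings(listings, search_term)
    data = json.dumps(relevant, ensure_ascii=False, indent=2)
    return (
        "Answer the question using only the listings below.\n\n"
        f"Listings:\n{data}\n\n"
        f"Question: {question}"
    )


# Made-up sample data for illustration.
sample = [
    {"title": "Knossos Palace", "description": "Minoan site near Heraklion"},
    {"title": "Balos Lagoon", "description": "Beach in Chania"},
]
prompt = build_prompt("What can I see there?", sample, "minoan")
```

The resulting string would then be passed to Gemini through pydantic-ai; that call is omitted here because it depends on the agent’s configuration.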
Answer Generation and Presentation
The agent generates human-readable answers by combining Gemini’s responses with relevant information extracted from the retrieved data. It presents the answers in a structured format. URLs are converted to links using the add_href_tags() function.
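The source of add_href_tags() is not reproduced in this document, so the following is only one plausible way to implement such a function, using a regular expression. The URL pattern and the handling of trailing punctuation are assumptions.

```python
import html
import re

# Simplified pattern: an http(s) scheme followed by non-space characters.
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def add_href_tags(text: str) -> str:
    """Wrap each http(s) URL in an anchor tag, leaving other text unchanged."""
    def to_link(match: re.Match) -> str:
        url = match.group(0)
        # Trailing punctuation usually belongs to the sentence, not the URL.
        stripped = url.rstrip(".,;:!?)")
        trailing = url[len(stripped):]
        safe = html.escape(stripped, quote=True)
        return f'<a href="{safe}">{safe}</a>{trailing}'

    return URL_PATTERN.sub(to_link, text)


linked = add_href_tags("See https://www.geotour.gr/knossos.")
```

Escaping the URL before placing it in the href attribute avoids broken markup when a URL contains characters such as & or quotes.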
Strengths
- Leverages the power of the Gemini language model for natural language understanding and generation.
- Interacts with a live WordPress API to provide up-to-date information.
- Handles errors and inconsistencies in the API response.
Limitations
- The agent’s ability to reason about relationships between different fields or perform complex queries on the data is limited.
- Automatically extracting a meaningful keyword from the user’s question did not work reliably, so a two-step flow (search term first, then questions) was implemented as a workaround for identifying the most relevant term.
- The prompt size needs to be managed carefully, both to avoid exceeding the model’s context window and to improve response time, which is currently slow. The prompt size depends directly on the search term, because this term filters the information that is returned from the REST API and fed into the agent.
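One simple way to keep the prompt size bounded is to cap the amount of listing data by a character budget, as sketched below. This is an assumption about how the limit could be enforced, not a description of the current code; fit_to_budget() is a hypothetical helper, and character count is only a rough proxy for the model’s token count.

```python
import json


def fit_to_budget(listings: list[dict], max_chars: int) -> list[dict]:
    """Keep listings in order until their serialized size would exceed max_chars."""
    kept = []
    used = 0
    for item in listings:
        size = len(json.dumps(item, ensure_ascii=False))
        if used + size > max_chars:
            break  # stop at the first listing that does not fit
        kept.append(item)
        used += size
    return kept


# Made-up sample data; each item serializes to 14 characters.
items = [{"title": "A"}, {"title": "B"}, {"title": "C"}]
trimmed = fit_to_budget(items, max_chars=30)
```

Because listings are kept in their original order, this works best when the API already returns the most relevant results first.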
Long-Term Goals
- Enhance Contextual Understanding: Improve the agent’s ability to understand the context of user questions, especially follow-up questions, by refining the get_relevant_history() function and implementing techniques like pronoun resolution, location awareness, and keyword expansion.
- Improve Response Quality: Generate more accurate, informative, and relevant responses by optimizing prompt construction, incorporating schema-aware prompts, and potentially experimenting with different Gemini models or fine-tuning techniques.
- Expand Functionality: Add more advanced features to the agent, such as:
  - Geolocation: Utilize the position field or other location data to answer location-based questions and provide recommendations.
  - User Preferences: Allow users to specify their preferences (e.g., types of places, historical periods) and personalize the responses accordingly.
- Optimize Performance and Scalability: Ensure the agent can handle multiple users and requests efficiently, optimize data retrieval and processing, and maintain a responsive user experience even under high load.
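For the contextual-understanding goal, one possible starting point for refining get_relevant_history() is to rank earlier question and answer pairs by word overlap with the new question. The sketch below is hypothetical: the function’s real signature and logic are not shown in this document, and simple word overlap does not address pronoun resolution or common words such as "is".

```python
import re


def get_relevant_history(
    question: str,
    history: list[tuple[str, str]],
    max_items: int = 3,
) -> list[tuple[str, str]]:
    """Return past (question, answer) pairs that share words with the new question."""
    def words(text: str) -> set[str]:
        return set(re.findall(r"\w+", text.casefold()))

    current = words(question)
    scored = [
        (len(current & words(q + " " + a)), index, (q, a))
        for index, (q, a) in enumerate(history)
    ]
    # Keep overlapping pairs only: highest overlap first, newest first on ties.
    relevant = [entry for entry in scored if entry[0] > 0]
    relevant.sort(key=lambda entry: (-entry[0], -entry[1]))
    return [pair for _, _, pair in relevant[:max_items]]


# Made-up conversation history for illustration.
past = [
    ("Where is Knossos?", "Knossos is near Heraklion."),
    ("Tell me about Balos", "Balos is a lagoon in Chania."),
]
context = get_relevant_history("When was Knossos built?", past)
```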