This document outlines the design of the vector database for the "MyOneTrueAllyPrototype" project.
The core purpose is to enable advanced semantic search for the project's key features, allowing the AI to act as a "true ally." By vectorizing the user's personalized data stored in the `Memory` table and in list items (`EntryItem`), we will provide the foundation for Gemini 2.5 Flash to generate highly personalized and context-aware responses.
By integrating a vector database with our relational database, we will build a robust system that enables the AI to accurately understand past user context, preferences, and specific data points (e.g., locations, categories).
The core configuration of the vector store is as follows:

- Embedding model: `text-embedding-004`
- Index name: `my-one-true-ally`
- Similarity metric: `cosine`
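For reference, creating an index with this configuration via the Pinecone Python client could look like the minimal sketch below. The 768 dimension matches the default output size of `text-embedding-004`; the API key and the serverless cloud/region values are placeholder assumptions, not decisions made in this document.

```python
from pinecone import Pinecone, ServerlessSpec

# Placeholder API key; the real key would come from the project's secret store.
pc = Pinecone(api_key="PINECONE_API_KEY")

# text-embedding-004 returns 768-dimensional vectors by default,
# and this design specifies cosine similarity.
pc.create_index(
    name="my-one-true-ally",
    dimension=768,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # assumed serverless deployment
)
```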
The chunking strategy for each data source is as follows:

- `EntryItem`: No chunking.
- `Memory`: This table stores AI-generated summaries and user-provided, manually registered data. `RecursiveCharacterTextSplitter` will be used to chunk excessively long summaries.

We will use the `text-embedding-004` model to vectorize the user's input.
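To make this pipeline concrete, here is a minimal ingestion sketch assuming LangChain's `RecursiveCharacterTextSplitter`, the `google-generativeai` SDK, and the Pinecone Python client. The chunk sizes, the `upsert_memory` helper, and the `memory_id` metadata field are illustrative assumptions rather than finalized design decisions.

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter
import google.generativeai as genai
from pinecone import Pinecone

genai.configure(api_key="GOOGLE_API_KEY")                    # placeholder credentials
index = Pinecone(api_key="PINECONE_API_KEY").Index("my-one-true-ally")

# Chunk sizes are illustrative; they would be tuned against real Memory summaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)

def upsert_memory(memory_id: str, summary: str) -> None:
    """Split an overly long Memory summary, embed each chunk, and upsert it."""
    chunks = splitter.split_text(summary)
    vectors = []
    for i, chunk in enumerate(chunks):
        result = genai.embed_content(
            model="models/text-embedding-004",
            content=chunk,
            task_type="retrieval_document",
        )
        vectors.append({
            "id": f"memory-{memory_id}-{i}",
            "values": result["embedding"],
            # Keep the relational primary key so results can be joined back
            # to the Memory table.
            "metadata": {"memory_id": memory_id, "chunk": i, "source": "Memory"},
        })
    index.upsert(vectors=vectors)
```

Storing the relational primary key in the vector metadata is what allows search hits to be joined back to the corresponding `Memory` or `EntryItem` rows.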
In a vector search system, response speed and cost are critical factors. We will proceed with the design while keeping the following points in mind:
- Index Growth: As the data (`Memory` and `EntryItem`) grows, the index size will increase. We will manage Pinecone's index settings appropriately to prevent a decline in search speed.
- `top-k` Tuning: We will adjust the number of results to retrieve (`top-k`) to strike a balance between response quality and speed. By narrowing down the most relevant results, we can reduce the number of input tokens sent to the LLM, optimizing both response speed and cost (see the query sketch after this list).
- Embedding Cost: The `text-embedding-004` model is priced based on the number of tokens. We need to be mindful of costs, especially when vectorizing long `Memory` texts.
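A minimal retrieval sketch follows, assuming the same `google-generativeai` and Pinecone clients as in the ingestion sketch; the default `top_k` of 5 and the `search_memories` helper name are illustrative starting points for the tuning described above.

```python
import google.generativeai as genai
from pinecone import Pinecone

genai.configure(api_key="GOOGLE_API_KEY")                    # placeholder credentials
index = Pinecone(api_key="PINECONE_API_KEY").Index("my-one-true-ally")

def search_memories(query: str, top_k: int = 5):
    """Embed the user's input and retrieve the top-k most similar chunks."""
    embedded = genai.embed_content(
        model="models/text-embedding-004",
        content=query,
        task_type="retrieval_query",   # query-side task type
    )
    # include_metadata=True returns the relational references needed to join
    # results back to the Memory and EntryItem tables.
    return index.query(
        vector=embedded["embedding"],
        top_k=top_k,
        include_metadata=True,
    )
```

Only the retrieved chunks and their metadata are passed on to Gemini 2.5 Flash, which is what keeps the LLM's input token count bounded.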
The vector database system's correctness and speed are crucial. We will perform tests based on the following strategy:
- We will verify that calls to the `text-embedding-004` model are successful and return vectors of the expected dimensions. We will also validate that data is written to Pinecone correctly.
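As an illustration of this strategy, a first pass at these checks might look like the following sketch. The use of pytest-style test functions is an assumption, not a project decision, and the 768-dimension assertion reflects the default output size of `text-embedding-004`.

```python
import google.generativeai as genai
from pinecone import Pinecone

genai.configure(api_key="GOOGLE_API_KEY")                    # placeholder credentials
index = Pinecone(api_key="PINECONE_API_KEY").Index("my-one-true-ally")

def test_embedding_returns_expected_dimension():
    # text-embedding-004 is expected to return 768-dimensional vectors by default.
    result = genai.embed_content(
        model="models/text-embedding-004",
        content="smoke-test input",
    )
    assert len(result["embedding"]) == 768

def test_upsert_is_acknowledged():
    # Confirm that a write to the index is accepted; a non-zero vector is used
    # because cosine similarity is undefined for all-zero vectors.
    dummy = {"id": "test-vector", "values": [0.1] * 768, "metadata": {"source": "test"}}
    response = index.upsert(vectors=[dummy])
    assert response.upserted_count == 1
```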