Since we want to map incoming news to prospective stocks, we need to build a knowledge base the AI can use to retrieve relevant stocks based on keywords and sectors.
The knowledge base is essential because, without grounding, an AI model can hallucinate its results.
The knowledge base contains:
- the stock code,
- the company name,
- the sectors, and
- the keywords that describe the company itself.
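To make the structure concrete, here is a minimal sketch of one entry in Python; the field names and the example values are illustrative, not taken from the actual dataset:

```python
from dataclasses import dataclass

@dataclass
class StockEntry:
    code: str            # stock ticker code
    company: str         # company name
    sector: str          # sector/subsector label
    keywords: list[str]  # terms describing what the company does

# Illustrative entry only, not a row from the real knowledge base.
example = StockEntry(
    code="BBCA",
    company="Bank Central Asia",
    sector="Banking",
    keywords=["bank", "lending", "consumer banking"],
)
```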
We used Gemini to generate this information; since the model is connected to Google's services, it helps keep the description of each stock accurate.
Here is what the knowledge base looks like:

Since there are 900+ stocks in Indonesia, including the whole knowledge base in the prompt would hit the model's context window limit, the hard cap on how much text it can take in at once.
Instead of throwing the entire knowledge base at the AI, we need a way to filter the data so that tokens are used efficiently. Therefore, a chain of prompts is implemented: instead of including all 900+ stocks, the work is split across several AI prompts. The chain looks like this:
This two-step AI pipeline cut our token usage by roughly 80%.
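As a rough sketch of how the two steps could be wired together in Python (reusing the StockEntry type from the earlier sketch; the prompts, the gemini-1.5-flash model name, and the line-based parsing are assumptions, not the actual production code):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is assumed

def call_gemini(prompt: str) -> str:
    return model.generate_content(prompt).text

def extract_subsectors(news_batch: list[str]) -> set[str]:
    """Step 1: ask the model which subsectors the news items touch."""
    prompt = (
        "List the stock subsectors relevant to these news items, one per line:\n"
        + "\n".join(news_batch)
    )
    lines = call_gemini(prompt).splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def map_news_to_stocks(news_batch: list[str], kb: list[StockEntry]) -> str:
    """Step 2: send only the slice of the knowledge base that matches Step 1."""
    subsectors = extract_subsectors(news_batch)
    relevant = [e for e in kb if e.sector.lower() in subsectors]
    prompt = (
        "Match each news item to the most relevant stocks.\nStocks: "
        + ", ".join(f"{e.code} ({e.company})" for e in relevant)
        + "\nNews:\n" + "\n".join(news_batch)
    )
    return call_gemini(prompt)
```

The savings come from the second call: only the handful of stocks in the matched subsectors is sent, instead of all 900+ entries.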
Since Gemini has a free tier, I tried to think about how to optimize cost while making sure the prompts still deliver the desired output.
From here, I decided to run an experiment: can the model assess at least 10 news items per request, using only the knowledge base entries filtered by the subsectors extracted from the news?
Thankfully, it worked. Assuming around 60 news items per day, usage still fits within the free tier, which brings the total cost close to zero!
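As a quick sanity check on that math, with placeholder headlines standing in for real news:

```python
# 60 news items / 10 per request = 6 batches, i.e. roughly 12 Gemini calls
# per day if both pipeline steps run once per batch.

def batch(items: list[str], size: int = 10) -> list[list[str]]:
    """Split the day's news into request-sized batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

daily_news = [f"news item {i}" for i in range(60)]  # placeholder data
print(len(batch(daily_news)), "requests per pipeline step")  # -> 6
```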