Since we want to map incoming news to prospective stocks, we need to build a knowledge base the AI can use to retrieve relevant stocks based on keywords and sectors.
The knowledge base is essential because, without grounding, an AI model can hallucinate its results.
The knowledge base contains:
- the stock code,
- the company name,
- the sectors, and
- the keywords that describe the company itself.
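To make the structure concrete, here is a minimal sketch of one entry in Python; the field names and the example values are illustrative, not taken from the actual dataset:

```python
from dataclasses import dataclass

@dataclass
class StockEntry:
    code: str            # stock ticker code
    company: str         # company name
    sector: str          # sector/subsector label
    keywords: list[str]  # terms describing what the company does

# Illustrative entry only, not a row from the real knowledge base.
example = StockEntry(
    code="BBCA",
    company="Bank Central Asia",
    sector="Banking",
    keywords=["bank", "lending", "consumer banking"],
)
```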
We used Gemini to generate this information; since the model is connected to Google's services, it helps keep the description of each stock accurate.
Here is what the knowledge base looks like:

Since there are 900+ stocks in Indonesia, including the whole knowledge base in the prompt would hit the model's context window limit, the hard cap on how much text it can take in at once.
Instead of throwing the entire knowledge base at the AI, we need a way to filter the data so that tokens are used efficiently. Therefore, a chain of prompts is implemented: instead of including all 900+ stocks, the work is split across several AI prompts. The chain looks like this:
This two-step AI pipeline cut our token usage by roughly 80%.
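As a rough sketch of how the two steps could be wired together in Python (reusing the StockEntry type from the earlier sketch; the prompts, the gemini-1.5-flash model name, and the line-based parsing are assumptions, not the actual production code):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is assumed

def call_gemini(prompt: str) -> str:
    return model.generate_content(prompt).text

def extract_subsectors(news_batch: list[str]) -> set[str]:
    """Step 1: ask the model which subsectors the news items touch."""
    prompt = (
        "List the stock subsectors relevant to these news items, one per line:\n"
        + "\n".join(news_batch)
    )
    lines = call_gemini(prompt).splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def map_news_to_stocks(news_batch: list[str], kb: list[StockEntry]) -> str:
    """Step 2: send only the slice of the knowledge base that matches Step 1."""
    subsectors = extract_subsectors(news_batch)
    relevant = [e for e in kb if e.sector.lower() in subsectors]
    prompt = (
        "Match each news item to the most relevant stocks.\nStocks: "
        + ", ".join(f"{e.code} ({e.company})" for e in relevant)
        + "\nNews:\n" + "\n".join(news_batch)
    )
    return call_gemini(prompt)
```

The savings come from the second call: only the handful of stocks in the matched subsectors is sent, instead of all 900+ entries.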
Since Gemini has a free tier, I tried to think about how to optimize cost while making sure the prompts still deliver the desired output.
From here, I decided to run an experiment: can the model assess at least 10 news items per request, using only the knowledge base entries filtered by the subsectors extracted from the news?
Thankfully, it worked. Assuming around 60 news items per day, usage still fits within the free tier, which brings the total cost close to zero!
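As a quick sanity check on that math, with placeholder headlines standing in for real news:

```python
# 60 news items / 10 per request = 6 batches, i.e. roughly 12 Gemini calls
# per day if both pipeline steps run once per batch.

def batch(items: list[str], size: int = 10) -> list[list[str]]:
    """Split the day's news into request-sized batches."""
    return [items[i:i + size] for i in range(0, len(items), size)]

daily_news = [f"news item {i}" for i in range(60)]  # placeholder data
print(len(batch(daily_news)), "requests per pipeline step")  # -> 6
```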