Executive Summary

For my AI app Aiba Breath I needed to extract, categorise and repurpose news stories. While seemingly simple task, to get stellar results, one needs to employ a complex mix of prompt engineering techniques with human interaction at certain points. Example results you can see in the end of this case-study.

Solution

1. Define Unified News Summary Format

First things first.

Although LLMs work well with any unstructured texts, providing structured input helps achieve much better results.

To be able to extract news in a unified format, I needed to develop one. So, I turned to GPT for assistance. :-)

After getting the answer, I manually edited it to be used in the prompt of the next step.

2. Extract News Stories Summaries in a Unified Format

After collecting news stories, the first step is to extract them from their original sources and format them in a unified manner. This task is executed automatically through the OpenAI GPT-4 API.

At this stage, I send two types of messages to the GPT — a system message containing the instructions and details of the expected output (sourced from the previous step's completion), and a user message that includes the source text converted into embeddings from a vector database (for this, I utilize Pinecone and FAISS).

If I'm using YouTube videos as news sources, sometimes GPT may miss some of the stories. This is where I use the self-consistency technique.