Excellent.
From OpenAI’s prompting guide, I’ve been using their suggested “inner monologue” approach as a means of building up a thought-through response over multiple steps. (It appears to work better with GPT-4.) There are some other useful techniques in here too.
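The core of the trick is to ask the model to do its reasoning inside a delimiter, then strip that section out before showing the user. A minimal sketch of the idea, assuming the openai Python client and an OPENAI_API_KEY in the environment (the prompt wording is my own paraphrase, not verbatim from the guide):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model to reason privately inside triple quotes, then answer after.
system = (
    'Work through the problem step by step inside triple quotes ("""..."""), '
    "then give your final answer to the user after the closing quotes."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": "Is 1,001 prime?"},
    ],
)

full = response.choices[0].message.content
# Keep only what follows the last delimiter, hiding the working-out.
answer = full.split('"""')[-1].strip()
print(answer)
```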
… is exceedingly useful as it lets you dump arbitrary Python objects to disk and then retrieve them easily, without thinking at all about wrangling them to/from another data format like JSON.
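For anyone who hasn’t used it, the round trip really is this short (standard library, nothing assumed beyond a writable disk):

```python
import pickle

# Dump an arbitrary Python object (nested structures included) to disk...
results = {"model": "gpt-4", "scores": [0.91, 0.87], "notes": {"run": 3}}

with open("results.pkl", "wb") as f:
    pickle.dump(results, f)

# ...and get it back later, structure intact, no format wrangling.
with open("results.pkl", "rb") as f:
    restored = pickle.load(f)

assert restored == results
```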
Perplexity suggests there are potential issues with pickle, though (it is fairly vintage, and unpickling untrusted data can execute arbitrary code), and points to some alternatives here:
I can’t attest to the veracity of its claims, but it’s definitely worth a poke about.
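I haven’t dug into its suggestions, but the usual first port of call for simple data is plain JSON, since json.load never executes code the way pickle.load can on untrusted input. A minimal sketch:

```python
import json

# JSON only covers plain data (dicts, lists, strings, numbers, bools, None),
# but loading it is safe regardless of where the file came from.
results = {"model": "gpt-4", "scores": [0.91, 0.87]}

with open("results.json", "w") as f:
    json.dump(results, f)

with open("results.json") as f:
    restored = json.load(f)

assert restored == results
```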
… is becoming more of a thing, as a route to meaningful comparison across different prompts, datasets, and so on. If you were looking for a rigorous way of establishing a baseline for a particular dataset-and-pipeline combo, then this points the way towards how you might start thinking about it:
Optimizing LLMs: Tools and Techniques for Peak Performance Testing - Semaphore
It’s a bit of a lightweight article (and geared towards CI), but there are some helpful links in there, including to LangChain Evaluators, which gives you a few different tools for LLM-powered evaluation: answer comparison, criteria comparison, and so on.
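A minimal sketch of the criteria evaluator, assuming LangChain’s load_evaluator entry point and an OPENAI_API_KEY in the environment (this API has moved around between versions, so treat it as indicative rather than definitive):

```python
from langchain.evaluation import load_evaluator

# An LLM-as-judge evaluator: grades an output against a named criterion.
# load_evaluator defaults to an OpenAI chat model unless you pass llm=...
evaluator = load_evaluator("criteria", criteria="conciseness")

result = evaluator.evaluate_strings(
    prediction="Paris is the capital of France, and has been for centuries.",
    input="What is the capital of France?",
)

# Typically a dict with "reasoning", "value" ("Y"/"N") and "score" (1/0).
print(result)
```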
<aside> ☀️ If we look busy then maybe the robots will spare us.
</aside>