<aside> 🙌 This is my space for prompt experiments which may (or may not!) be of interest to others. These have generally been thrown at ChatGPT to see what sticks, following any best practice that I’ve stumbled across (short is best, how to structure data, etc.).

The thinking here is that by using chat – which is free – I’m getting a feel for what works and what doesn’t, which can then be built into an appropriate LangChain chain using the Completions API.

</aside>

Writing Python scripts

Bashing out quick and horrid Python scripts for data manipulation in Jupyter works well, e.g.

Create a python script which opens a text file, splits its contents into pages using a given delimiter, then saves each page out into a new directory as a new file
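For reference, a minimal sketch of the kind of script this prompt is asking for. It is not the output ChatGPT actually gave. The function name `split_into_pages`, the form-feed default delimiter and the `page_0001.txt` naming are all assumptions made for illustration.

```python
from pathlib import Path


def split_into_pages(source, out_dir, delimiter="\f"):
    """Split the text in `source` on `delimiter` and write one file per page."""
    text = Path(source).read_text(encoding="utf-8")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the new directory if needed

    # Skip chunks that are empty or whitespace-only (e.g. after a trailing delimiter).
    pages = [chunk for chunk in text.split(delimiter) if chunk.strip()]

    written = []
    for number, page in enumerate(pages, start=1):
        target = out / f"page_{number:04d}.txt"  # hypothetical naming scheme
        target.write_text(page.strip() + "\n", encoding="utf-8")
        written.append(target)
    return written
```

In Jupyter this would be called as something like `split_into_pages("book.txt", "pages", delimiter="---")`, where both paths and the delimiter are placeholders.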

Working with OCR text

The aim is to take scraggy OCR text (e.g. from Hathi) and output something more structured.

Processing into Markdown and removing badly OCR’d illustrations:

Convert the following OCR text from a historical document into structured markdown. The text contains badly formatted OCR interpretation of a diagram which should be ignored.

<file contents here>

Update: removing diagrams works well, whereas converting to Markdown resulted in a substantially rewritten text. This would therefore likely work better as two prompts.
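A rough sketch of what the two-prompt version might look like in code. The prompt wording here is an untested guess, not something I have run. `complete` is a hypothetical stand-in for whatever function sends a prompt to the model and returns its reply, so no particular API is assumed.

```python
from typing import Callable

# Both templates are assumptions: reworded halves of the single prompt above.
REMOVE_DIAGRAMS = (
    "Remove any badly formatted OCR interpretation of a diagram from the "
    "following OCR text of a historical document. Leave all other text "
    "exactly as it is.\n\n{text}"
)
TO_MARKDOWN = (
    "Convert the following OCR text from a historical document into "
    "structured markdown. Do not reword the text.\n\n{text}"
)


def clean_then_convert(ocr_text: str, complete: Callable[[str], str]) -> str:
    """Run the two prompts in sequence, feeding the first result into the second."""
    without_diagrams = complete(REMOVE_DIAGRAMS.format(text=ocr_text))
    return complete(TO_MARKDOWN.format(text=without_diagrams))
```

Keeping the model call behind a plain callable means the chain can be tried with a fake function first, then pointed at a real model later.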

Processing into Markdown with additional instructions:

Convert the following unstructured OCR text from a historical document into clean and structured markdown, removing page numbers. Note the following:

- Pages typically start with the current chapter heading IN CAPITALS.
- Pages typically finish with the page number.
- Pages sometimes have footnote references to other pages, indicated by the use of an asterisk
- ‘En’ dashes are sometimes (but not always) used to separate a heading from its content
- Lines which are only 1-3 characters long are often transcription errors.

<file contents here>
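Two of the notes above (the trailing page number and the very short lines) are mechanical enough that they could be applied in code before the text ever reaches the prompt. This is a sketch of that alternative, not what the experiment above did. The name `preclean_page`, the digits-only page number pattern and the 3-character cut-off are assumptions.

```python
import re

PAGE_NUMBER = re.compile(r"^\d{1,4}$")  # assumed: page numbers are bare digits


def preclean_page(page: str, max_noise_len: int = 3) -> str:
    """Drop a trailing page number and very short lines from one OCR page."""
    lines = [line.rstrip() for line in page.splitlines()]

    # Discard blank lines at the end so the last real line can be inspected.
    while lines and not lines[-1].strip():
        lines.pop()
    # Pages typically finish with the page number.
    if lines and PAGE_NUMBER.match(lines[-1].strip()):
        lines.pop()

    kept = []
    for line in lines:
        stripped = line.strip()
        # Lines of 1-3 characters are often transcription errors; blank
        # lines are kept because they separate paragraphs.
        if stripped and len(stripped) <= max_noise_len:
            continue
        kept.append(line)
    return "\n".join(kept)
```

The short-line rule is blunt: it would also throw away a genuine short line such as a Roman numeral heading, which is why the prompt version only says "often".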

Filtering and extrapolation

For instance, categorising different passages (diagram, general, etc.), then processing each separately.

Interpreting machine/process diagrams based solely on a text description:

# Split the diagram content
From the following OCR text of a historical document, return only the text pertaining to the description of a diagram. "..."

# Derive the machine's purpose
From the following historical description of a machine, summarise its purpose in a single sentence. "..."

# List the processes involved
From the following historical description of a machine, list the processes involved. "..."

# Filter the list, etc.

# List its parts
From the following description of a machine, list its mechanical parts. "..."
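The steps above could be strung together roughly as follows. This is a sketch only: `complete` is a hypothetical callable that sends one prompt to the model and returns its reply, the dictionary keys are made up for illustration, and the "filter the list" step is left out because it isn't spelled out above.

```python
from typing import Callable, Dict

# Prompt text is taken from the notes above, with "..." swapped for {text}.
EXTRACT_DIAGRAM = (
    "From the following OCR text of a historical document, return only the "
    'text pertaining to the description of a diagram. "{text}"'
)
MACHINE_PROMPTS = {
    "purpose": (
        "From the following historical description of a machine, summarise "
        'its purpose in a single sentence. "{text}"'
    ),
    "processes": (
        "From the following historical description of a machine, list the "
        'processes involved. "{text}"'
    ),
    "parts": (
        "From the following description of a machine, list its mechanical "
        'parts. "{text}"'
    ),
}


def interpret_diagram(ocr_text: str, complete: Callable[[str], str]) -> Dict[str, str]:
    """Extract the diagram description, then ask each follow-up question about it."""
    description = complete(EXTRACT_DIAGRAM.format(text=ocr_text))
    results = {"description": description}
    for name, template in MACHINE_PROMPTS.items():
        results[name] = complete(template.format(text=description))
    return results
```

Each follow-up prompt only sees the extracted description rather than the whole page, which is the point of splitting the diagram content out first.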