💡 TL;DR:

Since my last update, I built a full web app to show Claude Code live in action taking the exam, and demoed the app to various interested parties. Here’s a screenshot from the web app:

A screenshot of the web app showing Claude Code’s attempts at the IUCN Red List Assessor Exam
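
For anyone curious how such a live view can be wired up, here’s a minimal sketch of one way to do it, not the app’s actual code: it assumes a Flask backend and that the exam run is driven through Claude Code’s non-interactive `claude -p` mode, with the output relayed to the browser via server-sent events.

```python
# Minimal sketch (hypothetical, not the app's actual implementation):
# relay a Claude Code run's stdout to the browser via server-sent events.
import subprocess
from flask import Flask, Response

app = Flask(__name__)

@app.route("/exam/stream")
def stream_exam():
    def generate():
        # Launch Claude Code in non-interactive ("print") mode with an exam prompt.
        proc = subprocess.Popen(
            ["claude", "-p", "Take the IUCN Red List Assessor exam in exam.md"],
            stdout=subprocess.PIPE,
            text=True,
        )
        # Forward each line of output as an SSE event so the page updates live.
        for line in proc.stdout:
            yield f"data: {line.rstrip()}\n\n"
        yield "event: done\ndata: exam finished\n\n"

    return Response(generate(), mimetype="text/event-stream")

if __name__ == "__main__":
    app.run(port=8000, threaded=True)
```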

Moreover, with a little prompt engineering, Claude Code (using Sonnet) now consistently passes 8 of 8 exams, averaging 86%, comfortably clearing the 75% pass mark, and doing far better than I can!
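
To make those numbers concrete, here’s a tiny sketch of the pass check; the per-exam scores below are made up for illustration, and only the 75% pass mark and the 8-of-8, 86%-average outcome come from the actual runs.

```python
# Pass-mark check over one run's per-exam scores. The scores below are
# illustrative only; the real runs averaged 86% across the 8 exams.
PASS_MARK = 75.0

def summarise(scores: list[float]) -> str:
    passes = sum(s >= PASS_MARK for s in scores)
    avg = sum(scores) / len(scores)
    return f"passed {passes} of {len(scores)} exams, average {avg:.0f}%"

print(summarise([88, 84, 90, 82, 85, 87, 86, 86]))
# -> passed 8 of 8 exams, average 86%
```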

Here are some closing thoughts to wrap up this project.

Another important question on my mind, though, is to be clear about how much genuine research a given project involves, as opposed to just AI-driven software engineering, even if accelerating Red List assessments would be extremely impactful work for conservation. On the other hand, working with AI agents is new for all of us, so designing effective workflows around them is certainly novel terrain. For example, it’s not yet clear how best to connect these agents to Anil’s corpus of scientific literature, which is an exciting future direction. Still, it’s important to be mindful of this research vs. application trade-off.

The next steps for taking this further will be to approach the IUCN to (a) gauge their interest and (b) see whether we could access their SIS data, with its full history of Red List assessments. Michael Dales has mentioned he can put me in touch with the head of the Red List Unit, Craig Hilton-Taylor, who certainly seems an appropriate person to talk to about this. So we’ll see where this goes from here, but for now I am hopeful this project at least serves as a valuable PoC and an example of how AI can contribute positively to the conservation domain.