Background:
The IUCN Red List is a "Barometer of Life" that "measures the pressures acting on species, which guides and informs conservation actions to help prevent extinctions."
172,600 species have been assessed for the IUCN Red List to date; the IUCN's target is 260,000 by 2030.
So, what is the IUCN's main constraint in reaching this target? New assessments and re-assessments cannot be carried out fast enough, because there are not enough trained experts. As a result, many groups lack coverage (e.g. only 18% of plants have been assessed), and many existing assessments are out of date.
Hypothesis:
> 💡 A carefully-designed AI workflow could significantly accelerate Red List assessments
Why do I think AI would be helpful here? The task seems well suited to an AI workflow, since:
- The assessment process is extremely well-documented: see https://www.iucnredlist.org/assessment/process.
- There is a huge corpus of existing assessments, including drafts and version histories, on which to train and evaluate whether AI can complete the task successfully.
- AI can read through and ‘learn’ from 160,000 assessments in a way that humans can’t.
- AI is multilingual and multimodal, so it can incorporate diverse sources of information.
- The goal for Red List assessments is consistent adherence to clearly articulated protocols (which AI is good at), not creativity (which humans are better at).
We now have AI models that achieve gold-medal-level results at international maths and informatics olympiads. Why not leverage these capabilities for Red List assessments?
This could even serve simply as a preliminary validation step: verifying that the Red List criteria have been applied appropriately and consistently by human assessors.
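To make the validation idea concrete, here is a minimal sketch of one automated consistency check, assuming a simplified reading of Criterion A1 (population reduction) thresholds only; the dictionary field names are hypothetical, and a real check would cover criteria A–E with all their sub-conditions and data-quality qualifiers.

```python
# Minimal illustrative sketch. Assumptions: the dict fields are hypothetical,
# thresholds follow a simplified reading of IUCN Criterion A1 only, and a real
# check would cover criteria A-E, sub-criteria, and data-quality qualifiers.

RANK = {"LC": 0, "NT": 1, "VU": 2, "EN": 3, "CR": 4}

def a1_implied_category(decline_pct: float) -> str:
    """Category implied by an estimated population reduction over 10 years /
    3 generations under Criterion A1 (causes reversible, understood, ceased)."""
    if decline_pct >= 90:
        return "CR"
    if decline_pct >= 70:
        return "EN"
    if decline_pct >= 50:
        return "VU"
    return "LC"  # A1 alone does not trigger a threatened category

def check_a1_consistency(assessment: dict) -> list[str]:
    """Flag cases where the listed category is *less* threatened than the
    documented decline implies (a more threatened listing is fine, since
    other criteria may apply)."""
    implied = a1_implied_category(assessment["a1_decline_pct"])
    if RANK[assessment["category"]] < RANK[implied]:
        return [
            f"{assessment['species']}: a documented decline of "
            f"{assessment['a1_decline_pct']}% implies at least {implied} under "
            f"A1, but the assessment lists {assessment['category']}."
        ]
    return []

# Hypothetical example with made-up numbers:
print(check_a1_consistency(
    {"species": "Example species", "a1_decline_pct": 72, "category": "VU"}
))
```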
Concerns:
The biggest concern I foresee is that AI cannot do fieldwork or speak to local communities and experts the way human assessors can. But even if limited to re-assessments, AI could be a very valuable tool for checking whether earlier assessments have held up and whether species need to be reclassified. Expanding coverage with new assessments is hard enough; keeping existing assessments up to date will only get harder as coverage grows.
Methodology:
- Build the workflow from AI subagents, with ground-truth 'unit test' evals for each component.
- For example, we could have a carefully tested agent to guide each information search according to best practices (see the first sketch after this list):
- Population status and trends
- Geographic range
- Habitat and Ecology
- Use and Trade
- Recent, current or projected future threats
- Conservation actions
- Each of these might require developing specialised skills that standardise how specific data sources are accessed.
- The workflow could differ depending on the taxa in question. Experts could collaborate to improve the prompts and skills.
- To mitigate data leakage (the models may already have seen published assessments), we can develop on pre-2024 assessments, evaluate on assessments published from 2025 onwards, and restrict evaluation to models whose knowledge cut-off predates the test set (a minimal splitting sketch is included below). We can also compare fine-tuned models vs prompted models vs models given extra RLHF.
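Here is a minimal sketch of how one per-topic subagent and its ground-truth 'unit test' eval might be structured. Everything in it is an assumption: `run_agent` is a placeholder for whichever LLM/tooling stack is chosen, and the eval simply scores the agent's draft section against the corresponding field of a published assessment.

```python
# Minimal sketch of one per-topic subagent plus a ground-truth eval.
# Assumptions: `run_agent` is a placeholder for whichever LLM/tooling stack is
# used; `published` represents a section extracted from an existing, published
# Red List assessment and used as ground truth.

from dataclasses import dataclass
from typing import Callable

@dataclass
class SubagentTask:
    topic: str            # e.g. "Geographic range", "Use and Trade"
    species: str
    instructions: str     # topic-specific best-practice search protocol

def run_agent(task: SubagentTask) -> dict:
    """Placeholder: call the underlying model/tool stack and return a
    structured draft section, e.g. {'population_trend': 'decreasing', ...}."""
    raise NotImplementedError

def eval_subagent(task: SubagentTask, published: dict,
                  score: Callable[[dict, dict], float]) -> float:
    """'Unit test' for one component: score the agent's draft section against
    the corresponding section of a published assessment."""
    draft = run_agent(task)
    return score(draft, published)

# Hypothetical usage: exact-match scoring on the population trend field.
task = SubagentTask(
    topic="Population status and trends",
    species="Adansonia digitata",
    instructions="Search peer-reviewed literature and national datasets; "
                 "report trend direction with citations.",
)
trend_match = lambda draft, truth: float(
    draft.get("population_trend") == truth.get("population_trend")
)
# eval_subagent(task, published_section, trend_match) would then return 1.0 or 0.0.
```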
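The leakage mitigation itself can be expressed as a simple temporal split. A minimal sketch, assuming each assessment record carries a publication year and each candidate model has a known knowledge cut-off year (both field names are hypothetical):

```python
# Minimal sketch of the temporal split described above. Assumptions: each
# assessment record has a `published_year`, and each candidate model has a
# known `knowledge_cutoff_year`; both field names are hypothetical.

def split_by_year(assessments: list[dict],
                  dev_before: int = 2024,
                  test_from: int = 2025) -> tuple[list[dict], list[dict]]:
    """Development set: assessments published before `dev_before`.
    Held-out test set: assessments published in `test_from` or later."""
    dev = [a for a in assessments if a["published_year"] < dev_before]
    test = [a for a in assessments if a["published_year"] >= test_from]
    return dev, test

def eligible_for_test(model: dict, test_from: int = 2025) -> bool:
    """Only evaluate models whose knowledge cut-off predates the test period,
    so the held-out assessments cannot have leaked into pretraining."""
    return model["knowledge_cutoff_year"] < test_from
```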
Next steps:
- I’ve started the Red List Assessor training myself, to see what it entails and eventually to contribute my own assessment.
- A soft goal could be to see how far I can get with drafting a Red List entry for the African Baobab (Adansonia digitata).
- Why African Baobab? I saw a lot of them in Makuleke and Mashatu, so I have a decent understanding of them and their distribution.
- I’ll document the process thoroughly and note the pain points, then use that experience to design the AI workflow.
- The next step will be to build an MVP end-to-end agent to prove the concept.