<aside> 💡

TL;DR

Yes!

</aside>

Summary

I recently completed the IUCN Red List Assessor Training course, scoring 80% on the final exam and receiving my official certification (you need more than 75% to pass). Upon completing it, I was curious how Claude Code would do, so I decided to put it to the test.

So, how does Claude Code do? Pretty well! It passed four out of five exam runs, averaging 80%.

Claude Code’s exam results. The top row is my own personal attempt, the bottom 5 are Claude Code’s. Claude Code got the highest grade of 88%.

Humans are allowed to repeat the exam as many times as needed until they pass, so four out of five is a very good result. Moreover, I am confident the incorrect answers are not due to an innate limitation, but rather just require more careful context engineering.

I remain confident that AI can significantly help the IUCN scale up Red List assessments.

Background:

What I did: