Last Updated: July 2020

Creating natural-sounding speech from text is considered a “grand challenge” in the field of AI and has been a research goal for decades. Over the last two years, WellSaid Labs' research has produced a series of breakthroughs in the quality of text-to-speech systems.

This is a collection of observations about WellSaid’s research and TTS capabilities. Below, you can find samples that demonstrate the range of our voices. The samples are unfiltered and unedited; what you hear is a transparent representation of the research in its current state. Here, you’ll get a sense of where we’ve been, from approaching human parity to achieving it, and of where we’ll go next: new languages and more complex texts.

Voices

Meet our AI voice actors. Listen to all fifteen voices as they read Shirley Anita Chisholm's 1970 speech, "For the Equal Rights Amendment."

Vanessa

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/38f95f4b-0de2-4e42-8ff3-f5dc85953a5d/VN-26.mp3

Alana

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/27426390-0507-48e4-a233-628fd8660101/AB-245.mp3

David

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e77db1e5-636c-4655-8626-8d639615c2b6/DD-24.mp3

Wade

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d4e51f3b-e823-4ef2-85dc-738549b618f5/WC-95.mp3

Paige

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/6cbbace0-660c-4ece-8db6-b18a622c047b/PL-6.mp3

Ava

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/27075e0e-a21e-4105-8a83-c86e959012c2/AM-33.mp3

Isabel

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a522b4ae-e20e-472f-a255-6270a0383a2d/IV-12.mp3

Nicole

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/2d1d5865-d4c7-4a3a-b3d7-3c659d2dbf19/NL-5.mp3

Tristan

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/1d88b29d-489a-4021-96ba-cd38497678cf/TF-320.mp3

Kai

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/e93bf84c-127a-4080-818e-60c56801a1ec/KM-33.mp3

Sofia

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/ca2a21b4-6c2d-481f-8158-1bec438f055f/SH-18.mp3

Ramona

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/cc7b3655-6b02-4d4d-a49f-e25a61139ed6/RJ-13.mp3

Tobin

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/95bb234e-153e-4b9e-90a8-4df68e41b288/TA-1.mp3

Patrick

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/d1f32e91-e607-48de-bafc-2b6f06526857/PK-1.mp3

Jeremy

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a26e02d3-3054-4654-89ea-12bb96614ba0/JG-1.mp3

Transcript Credits: Thank you, Shirley Anita Chisholm, for your work on equal rights.

Naturalness

In June 2020, WellSaid Labs' text-to-speech became the first to achieve human parity for naturalness on short audio clips across multiple voices. The evaluation covered 512 clips of 15 seconds or less and 15 voices.

Brief History

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/166b98c7-2967-4655-b008-95ac85bd47f8/image_(3).png

<aside> 📓 In September 2018, WellSaid Labs replicated Tacotron-2, the state of the art at the time.

</aside>

Over the course of two years of research, we incrementally improved our mean opinion score for naturalness. Our initial research was evaluated anecdotally.
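
For readers unfamiliar with the metric, a mean opinion score is conventionally the average of listener ratings on a 1 to 5 scale. The following is a minimal Python sketch of that calculation, not WellSaid Labs' evaluation pipeline. The function name, the sample ratings, and the normal-approximation confidence interval are all assumptions made for illustration.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, 95% CI half-width) for a list of 1-5 listener ratings.

    Hypothetical helper: the 1-5 scale is the conventional MOS scale, and the
    confidence interval uses a normal approximation (an assumption).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mean = statistics.fmean(ratings)
    # Standard error of the mean, scaled by 1.96 for an approximate 95% interval.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Invented ratings for one synthetic clip set; not real evaluation data.
synthetic_ratings = [5, 4, 4, 5, 3, 4, 5, 4]
mos, ci = mean_opinion_score(synthetic_ratings)
print(f"MOS = {mos:.2f} ± {ci:.2f}")
```

Under this sketch, a synthetic voice and human recordings would each get a score from the same listener pool, and the two would be compared with their intervals in mind.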