For our first prototype of the Voice Cursor, I wanted to test the simplest version of the idea I presented: detecting pauses in speech. Since the full Zoom integration is too complex for a first attempt, I built a lightweight demo that monitors the microphone input for silence and for soft pauses such as “uhh”, “er”, “erm”, and “umm”. When the system detects a silence longer than half a second, or one of the preset soft pauses, it indicates that a pause was detected.
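The silence side of this can be sketched with a simple frame-energy threshold. The sample rate, frame size, and RMS threshold below are illustrative assumptions, not the prototype's actual values, and the synthetic frames stand in for real microphone input (recognizing filler words like “uhh” would additionally need a speech recognizer on top):

```python
# Minimal sketch of the pause-detection idea, assuming 16 kHz mono audio.
# SILENCE_RMS and the frame size are hypothetical tuning values.
import math

SAMPLE_RATE = 16000
FRAME_SIZE = 320          # 20 ms frames at 16 kHz
SILENCE_RMS = 0.01        # assumed energy threshold for "silence"
PAUSE_SECONDS = 0.5       # pause length that triggers detection

FRAMES_FOR_PAUSE = int(PAUSE_SECONDS * SAMPLE_RATE / FRAME_SIZE)  # 25 frames

def rms(frame):
    """Root-mean-square energy of one audio frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_pauses(frames):
    """Yield the frame index at which each half-second pause is detected."""
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) < SILENCE_RMS:
            silent_run += 1
            if silent_run == FRAMES_FOR_PAUSE:
                yield i
        else:
            silent_run = 0

# Synthetic demo: 1 s of "speech" followed by 1 s of silence.
loud = [[0.1] * FRAME_SIZE] * 50
quiet = [[0.0] * FRAME_SIZE] * 50
pauses = list(detect_pauses(loud + quiet))
print(pauses)  # the pause is flagged 25 frames (0.5 s) into the silence
```

In a live demo the frames would come from a microphone stream rather than synthetic lists, but the thresholding logic stays the same.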
I decided to style this prototype in the spirit of Manfred Mohr’s algorithmic works, where structures unfold step by step according to simple rules. Instead of geometric lines, I used text. The unfolding comes from how the system gradually surfaces possible next words (a feature still to be implemented). It’s algorithmic, reactive, and never exactly the same twice.
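The planned unfolding could look something like the toy below: on each detected pause, candidate next words are surfaced in a shuffled order. The bigram table and sentence are entirely hypothetical stand-ins; the real feature would draw candidates from a language model rather than a hand-written dictionary:

```python
# Toy sketch of the "unfolding text" idea: surface candidate next words
# from a tiny hand-made bigram table. All entries are hypothetical.
import random

BIGRAMS = {
    "I": ["think", "mean", "guess"],
    "think": ["we", "it", "the"],
    "we": ["should", "could", "can"],
}

def surface_candidates(last_word, rng):
    """Return candidate next words in a freshly shuffled order."""
    options = BIGRAMS.get(last_word, [])
    return rng.sample(options, k=len(options))

rng = random.Random()  # unseeded, so no two runs unfold identically
for word in ["I", "think", "we"]:
    print(word, "->", surface_candidates(word, rng))
```

Because the shuffle is unseeded, each run surfaces the same vocabulary in a different order, which is what gives the piece its "never exactly the same twice" quality.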
This prototype doesn’t aim to be perfect or even particularly “useful” yet. The point is to see what it feels like when the computer actively listens for hesitation and responds. It’s a playful sketch of the bigger idea: a real-time linguistic co-pilot.
Idea sketch
For my second prototype, the exercise in class really helped me stretch the original concept (the CV-based transliteration and speech helper, a.k.a. the “Voice Cursor”) and eventually flip the whole direction toward what my teammate and I are now pursuing: the NFC networking assistant.