First Prototype (~9/4)


Description

For our first prototype of the Voice Cursor, I wanted to test the simplest version of the idea I presented: detecting pauses in speech. Since the full Zoom integration is too complex for a first attempt, I built a lightweight demo that monitors the microphone input for silence and for soft pauses (filler sounds) such as “uhh”, “er”, “erm”, and “umm”. When the system detects silence lasting longer than half a second, or one of the preset soft pauses, it shows that a pause was detected.
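The pause rule above can be sketched roughly as follows. This is a minimal illustration, not the prototype's actual code. It assumes that audio arrives as fixed-length frames of float samples in [-1.0, 1.0], and that some speech-to-text step (not shown) supplies individual words. The frame length, the energy threshold, and names such as PauseDetector, feed_audio, and feed_word are hypothetical. Only the half-second rule and the list of filler words come from the description.

```python
import math

# Hypothetical settings for illustration; the prototype's real values are not documented.
FRAME_SECONDS = 0.02                      # assumed 20 ms audio frames
SILENCE_RMS = 0.01                        # assumed energy threshold for "silence"
PAUSE_SECONDS = 0.5                       # "longer than half a second" (from the description)
PAUSE_FRAMES = round(PAUSE_SECONDS / FRAME_SECONDS)   # 25 frames with these values
SOFT_PAUSES = {"uhh", "er", "erm", "umm"}  # filler words listed in the description


def rms(frame):
    """Root-mean-square energy of one frame of float samples (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class PauseDetector:
    """Hypothetical helper that flags long silences and filler words."""

    def __init__(self):
        self.silent_frames = 0   # consecutive quiet frames seen so far
        self.fired = False       # report each silent stretch only once

    def feed_audio(self, frame):
        """Return True the first time a silent stretch exceeds PAUSE_SECONDS."""
        if rms(frame) < SILENCE_RMS:
            self.silent_frames += 1
            if self.silent_frames > PAUSE_FRAMES and not self.fired:
                self.fired = True
                return True
        else:
            # Speech resumed: reset the silence counter.
            self.silent_frames = 0
            self.fired = False
        return False

    def feed_word(self, word):
        """Return True if a transcribed word is one of the preset soft pauses."""
        return word.lower().strip(".,!?") in SOFT_PAUSES
```

Counting silence in whole frames avoids floating-point drift from adding up fractional durations. The detector fires once per silent stretch, so the display is not flooded while the speaker stays quiet.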

I decided to style this prototype in the spirit of Manfred Mohr’s algorithmic works, where structures unfold step by step according to simple rules. Instead of geometric lines, I used text. The unfolding comes from how the system gradually surfaces possible next words (still to be implemented). It’s algorithmic, reactive, and never exactly the same twice.

This prototype doesn’t aim to be perfect or even particularly “useful” yet. The point is to see what it feels like when the computer actively listens for hesitation and responds. It’s a playful sketch of the bigger idea: a real-time linguistic co-pilot.

Idea sketch

  1. We take the microphone input and the speaker's outline/script as inputs.
  2. As a further step (for big stages), we are thinking of building a simple robot on a rail that detects and follows the speaker, recognizes postures and hand gestures, and suggests improvements.

Second Prototype (~9/11)


Description

For my second prototype, the in-class exercise really helped me stretch the original concept (the CV-based transliteration and speech helper, a.k.a. the “Voice Cursor”) and eventually flip the whole direction toward what my teammate and I are now pursuing: the NFC networking assistant.

5 Alternative Meanings

  1. Cursor as connector - instead of text editing, the cursor becomes a way to link two people.
  2. Pause as data - hesitation is not a flaw but information that can be tracked or logged.
  3. Voice as identity - the way we speak is a kind of personal business card.
  4. Transliteration as translation - moving between modes, not just languages (spoken → written, or spoken → network edge).
  5. Cursor as trail - it leaves a mark of where you’ve been, a history of interactions.