The PECSS App was envisioned to use the sensors available in a regular smartphone to extract crucial insights about the patient while they work on their in-vivo or imaginal exposure exercises. One of the fundamental input streams we wanted the app to incorporate was audio, and that is exactly what I contributed in this work thread: I developed the “Digiscape Audio Module” for Android, which extracts important audio features from the patient’s smartphone in a completely non-intrusive, offline fashion.
Code Repository: https://github.com/DiptarkBose/DigiScape
On this page, I explain how I developed the audio module for Android.
At school, we were shown pictures of perfect sinusoidal waves whenever the topic of sound came up. But real-world audio is hardly that simple: the sound waves around us are compositions of many frequencies. Imagine you are sitting in a park. Children laughing produce high-frequency waves, whereas a lawnmower produces lower-frequency ones. Birds chirping around you sit towards the higher end of the spectrum, while a football being kicked lands as a low-frequency 'thud'.
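To make the "mixture of frequencies" idea concrete, here is a small, self-contained Java sketch (illustrative only, not part of the actual module) that mixes a low "lawnmower" hum with a high "birdsong" tone, then uses the Goertzel algorithm to measure how much energy each frequency contributes to the combined wave. The specific frequencies and sample rate are made-up example values.

```java
public class FrequencyMix {
    // Goertzel algorithm: power contributed by one target frequency
    // to a block of samples.
    static double goertzelPower(double[] samples, double targetHz, double sampleRate) {
        double omega = 2.0 * Math.PI * targetHz / sampleRate;
        double coeff = 2.0 * Math.cos(omega);
        double sPrev = 0, sPrev2 = 0;
        for (double x : samples) {
            double s = x + coeff * sPrev - sPrev2;
            sPrev2 = sPrev;
            sPrev = s;
        }
        return sPrev * sPrev + sPrev2 * sPrev2 - coeff * sPrev * sPrev2;
    }

    public static void main(String[] args) {
        double sampleRate = 8000;          // illustrative sample rate
        int n = 800;                       // 0.1 s of audio
        double[] mix = new double[n];
        // "Lawnmower" hum at 120 Hz plus a quieter "birdsong" tone at 3000 Hz
        for (int i = 0; i < n; i++) {
            double t = i / sampleRate;
            mix[i] = Math.sin(2 * Math.PI * 120 * t)
                   + 0.5 * Math.sin(2 * Math.PI * 3000 * t);
        }
        System.out.println("power @ 120 Hz  = " + goertzelPower(mix, 120, sampleRate));
        System.out.println("power @ 3000 Hz = " + goertzelPower(mix, 3000, sampleRate));
        System.out.println("power @ 1000 Hz = " + goertzelPower(mix, 1000, sampleRate));
    }
}
```

Both component frequencies show up with large power values, while an absent frequency (1000 Hz here) shows almost none, even though the raw waveform is just one jumbled array of numbers.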
Thus, ambient audio is vastly complex and consists of a combination of multiple sinusoids. In reality, a sound wave around you looks something like this:
In an Android environment, the mic's input is captured and stored as an array of numerical values; the AudioRecord class and its methods help us achieve that. So essentially, audio in our context is just an array of numbers.
// Polls the AudioRecord object and stores discrete audio samples in buffer
int nread = recorder.read(buffer, 0, buffer.length);
Something like [126, -57, 73, 56, 73, 452, ...]
That's it! That's all the sound information we have with us to proceed with!
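AudioRecord itself only runs on a real device, so as a stand-in, here is a tiny plain-Java sketch that synthesizes the kind of 16-bit sample array recorder.read() would fill for a pure tone. The 440 Hz tone and 8 kHz sample rate are arbitrary choices for illustration, not values the module necessarily uses.

```java
public class PcmSketch {
    // Fills a buffer with 16-bit PCM samples of a pure tone, mimicking the
    // array that AudioRecord.read() hands back from the mic.
    static short[] synthesize(double freqHz, double sampleRate, int count) {
        short[] buffer = new short[count];
        for (int i = 0; i < count; i++) {
            double t = i / sampleRate;
            // Scale to 80% of the 16-bit range to leave headroom
            buffer[i] = (short) (Math.sin(2 * Math.PI * freqHz * t) * Short.MAX_VALUE * 0.8);
        }
        return buffer;
    }

    public static void main(String[] args) {
        short[] buffer = synthesize(440, 8000, 8);
        // Print the first few "discrete values", just like the array above
        for (short s : buffer) System.out.print(s + " ");
        System.out.println();
    }
}
```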
At first glance, this array of numbers gives no indication of what is happening in our surroundings. Yet these numbers are all we have to paint that picture. Is there an engine idling near the mic? Is the television switched on? Is someone speaking?
The main challenge in developing the Digiscape audio module is understanding how this array of numerical values can help us paint a picture of the smartphone's environment.
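As a first, deliberately crude illustration of turning raw samples into a "feature", consider the root-mean-square (RMS) amplitude of a buffer: a single number that already separates a quiet room from a loud one. This is a sketch of the general idea, not necessarily a feature the module computes in exactly this form, and the sample buffers below are hypothetical.

```java
public class EnergyFeature {
    // RMS amplitude of a PCM buffer: one number summarizing how loud it is.
    static double rms(short[] buffer) {
        double sumSquares = 0;
        for (short s : buffer) sumSquares += (double) s * s;
        return Math.sqrt(sumSquares / buffer.length);
    }

    public static void main(String[] args) {
        short[] quietRoom = {12, -9, 7, -11, 10, -8};                 // hypothetical near-silence
        short[] runningTv = {9100, -8700, 9300, -8900, 9000, -8600};  // hypothetical loud buffer
        System.out.println("quiet RMS = " + rms(quietRoom));
        System.out.println("loud  RMS = " + rms(runningTv));
    }
}
```

Real features (spectral shape, zero-crossing rate, and so on) are more informative, but the principle is the same: reduce the raw sample array to numbers that say something about the environment.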
There are tons of sound-detection projects already implemented in Python that simply take in a .wav file and churn out the most probable category the sound belongs to. So why aren't we using something of this sort? Because the audio Digiscape handles is highly sensitive, all computation, classification, and processing must happen on the device itself, totally offline; no network/API calls are allowed. Sending the audio to an external server where a Python script could work its magic is simply not an option for us.