Introduction

Phonic relies on machine learning models to measure, estimate or classify response analytics. These range from very simple linear or heuristic models to larger, more complicated end-to-end neural networks.

Most of our machine learning models are supervised language models trained on datasets that humans have labelled manually. For example, the labels for a sentiment analysis model might be Positive, Negative, or Neutral. The advantage of this approach is that the model learns to attribute phraseology to sentiment polarity in the same manner that a human would. This is a tremendous benefit, and means that the model will tend to produce outputs that seem reasonable to humans. However, since these models learn from human-labelled data, they will only ever be as accurate as the humans labelling the data. In machine learning this is known as "garbage in, garbage out": a poorly labelled dataset will produce a poorly performing model.

To mitigate inter-labeller variability (particularly on tasks as subjective as the degree of emotion in an audio clip) we typically have the same piece of data labelled multiple times. This produces a distribution of labels, improving model accuracy and allowing us to understand prediction confidence. Other techniques are implemented to improve black-box observability and output correctness.
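As a rough illustration of how repeated labels become a distribution, the sketch below counts the labels given to a single item and reports the share of labellers who agree with the most common one. The function names and the example labels are hypothetical and are not taken from Phonic's pipeline.

```python
from collections import Counter


def label_distribution(labels):
    """Turn the repeated human labels for one item into a probability distribution."""
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def majority_label(labels):
    """Return the most common label and the share of labellers who chose it."""
    dist = label_distribution(labels)
    best = max(dist, key=dist.get)
    return best, dist[best]


# Hypothetical example: five labellers rate the same audio clip.
clip_labels = ["Positive", "Positive", "Neutral", "Positive", "Negative"]
distribution = label_distribution(clip_labels)
top_label, agreement = majority_label(clip_labels)
```

A low agreement share flags an item on which labellers disagreed, which is one simple way to reason about how much confidence a label deserves.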

For questions about the models below, or other inquiries, please contact us at [email protected].

1. Multi-Modal Sentiment Analysis

What is Sentiment?

Sentiment is a unidimensional measure of the emotional content of a text response. The dimension on which sentiment is measured is called “valence” or polarity: a measure of the intrinsic positivity or negativity of a piece of text. A sentiment measure consists of a valence (positive, neutral or negative) and a magnitude from zero to one.

This convention is adopted from psychology and cognitive science literature and is common in machine learning. Some variations, including the one presented in the Phonic dashboard, contain a "mixed" measure, in addition to measures of positive and negative sentiment independently. Responses will be rated as "mixed" when they appear to have both positive and negative sentiment. For example, a response might say "I loved the food, but I hated how long I had to wait." A naive sentiment implementation would observe that this contains both positive and negative sentiment, averaging to neutral. The Mixed class represents this variance.
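The difference between naive averaging and a Mixed class can be sketched as follows. This is a minimal illustration that assumes positive and negative sentiment are scored independently on a zero-to-one scale; the function names and the 0.4 threshold are hypothetical and do not describe the dashboard's actual logic.

```python
def naive_valence(positive, negative):
    """Collapse both scores into one signed number; opposing signals cancel out."""
    return positive - negative


def classify_sentiment(positive, negative, threshold=0.4):
    """Map independent positive/negative scores in [0, 1] to a sentiment class.

    The threshold is an arbitrary illustrative cut-off.
    """
    if positive >= threshold and negative >= threshold:
        return "Mixed"
    if positive >= threshold:
        return "Positive"
    if negative >= threshold:
        return "Negative"
    return "Neutral"


# Hypothetical scores for "I loved the food, but I hated how long I had to wait."
pos_score, neg_score = 0.8, 0.8
naive_result = naive_valence(pos_score, neg_score)       # cancels to zero
mixed_result = classify_sentiment(pos_score, neg_score)  # keeps both signals
```

Keeping the two scores separate until the final step is what lets the strongly positive and strongly negative halves of the response survive instead of cancelling to neutral.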

Sentiment Examples

Calculating Sentiment from Text and Tone of Voice

The current implementation calculates sentiment using text and tone of voice. In other words, we calculate one measure of sentiment from the transcription, and another measure from the tonal content of the speech. These two measures are then combined in a final calculation. Research suggests that tone of voice carries more than five times as much communicative information as text alone. In practice, the prediction tends to be dominated by the text component unless the question prompts an emotional response.
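One simple way to combine two such measures is a weighted average, sketched below. This is an assumption made for illustration, not a description of the final calculation Phonic uses: the function name, the signed score range of -1 to 1, and the default text weight of 0.7 are all hypothetical.

```python
def combine_sentiment(text_score, tone_score, text_weight=0.7):
    """Blend signed text and tone scores (each in [-1, 1]) into a valence and magnitude.

    text_weight is an illustrative assumption; the remainder goes to the tone score.
    """
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must be between 0 and 1")
    combined = text_weight * text_score + (1.0 - text_weight) * tone_score
    if combined > 0:
        valence = "positive"
    elif combined < 0:
        valence = "negative"
    else:
        valence = "neutral"
    # Magnitude runs from zero to one, matching the sentiment definition above.
    return valence, abs(combined)


# Hypothetical example: negative wording delivered in a mildly upbeat tone.
valence, magnitude = combine_sentiment(text_score=-0.8, tone_score=0.2)
```

With a text weight above 0.5 the transcript dominates the result, and the tone score only changes the outcome when it is strong.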

2. Emotional Analysis: Sentiment in Higher Dimensions

Sentiment is useful for quickly filtering (e.g. "show me the 10 most negative responses"), however it is a highly simplified model of human emotion. Actual emotion is complex and multi-dimensional, and any metric is an approximation at best. Sentiment analysis, being a one-dimensional measurement, is both the most robust measure and the most egregious oversimplification. This simplicity and robustness, however, make it very popular in industry. Internal research is ongoing into other sentiment and emotion frameworks. For example, a model could be trained on the eight primary classes in Plutchik's Wheel of Emotion, yielding an 8-dimensional measurement of emotion. Our emotional analysis model uses an ontology of 22 emotions.
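To make the idea of a higher-dimensional measurement concrete, the sketch below stores one score per primary emotion in Plutchik's wheel and reads off the strongest one. The eight emotion names come from Plutchik's model; the function names and the example scores are hypothetical, and this is not the 22-emotion ontology used by the emotional analysis model.

```python
# The eight primary emotions in Plutchik's Wheel of Emotion, in a fixed order.
PLUTCHIK_PRIMARY = (
    "joy", "trust", "fear", "surprise",
    "sadness", "disgust", "anger", "anticipation",
)


def emotion_vector(scores):
    """Arrange per-emotion scores into an 8-dimensional vector; missing emotions score 0."""
    unknown = set(scores) - set(PLUTCHIK_PRIMARY)
    if unknown:
        raise ValueError(f"unknown emotions: {sorted(unknown)}")
    return [float(scores.get(name, 0.0)) for name in PLUTCHIK_PRIMARY]


def dominant_emotion(vector):
    """Return the name of the emotion with the highest score."""
    best_index = max(range(len(vector)), key=vector.__getitem__)
    return PLUTCHIK_PRIMARY[best_index]


# Hypothetical scores for a single response.
vector = emotion_vector({"joy": 0.7, "trust": 0.2, "sadness": 0.1})
strongest = dominant_emotion(vector)
```

Where sentiment reduces a response to a single number, a vector like this keeps joy and sadness as separate dimensions.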

Emotional analysis is currently in beta. Contact our team for more information.

3. Auto-Coding/Classification

Auto-coding automatically tags survey responses with a set of user-defined classes. For example, a question asking about sneaker preference could be coded according to any of the following frames:

[Image: example code frames for the sneaker preference question]
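The input and output of auto-coding can be sketched with a simple keyword matcher. Keyword matching is only a stand-in used here for illustration, not the model Phonic uses, and the code frame, keywords and function name below are hypothetical.

```python
import re


def auto_code(response, code_frame):
    """Tag a response with every class whose keywords appear in it.

    code_frame maps a user-defined class name to a list of lowercase keywords.
    """
    words = set(re.findall(r"[a-z']+", response.lower()))
    return [code for code, keywords in code_frame.items() if words & set(keywords)]


# Hypothetical code frame for a question about sneaker preference.
sneaker_frame = {
    "Comfort": ["comfortable", "comfy", "cushioned"],
    "Price": ["cheap", "affordable", "price"],
    "Style": ["stylish", "look", "colour"],
}

tags = auto_code("I like them because they are comfortable and cheap.", sneaker_frame)
```

A response can receive several classes at once, or none at all if nothing in the frame applies.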