There's a tendency for social systems to change when you measure them... this post describes my potentially risky endeavor to take measurements of the social systems that matter most to me: my relationships. Hopefully, the effect of this measurement will be positive, as I may be able to learn something about myself, reflect, and grow. Or maybe I'll decide interpersonal informatics should never be attempted, and the data enabling it should remain buried deep within the servers in Pryor Lake, Oklahoma.

Here I'll describe my journey exploring the digital traces of my social life. The seed of this project was planted when I canceled my Sprint cell phone service and switched to Google Voice in the spring of 2019. Before that, I occasionally cleared my phone to save space and deleted old message threads. Now, however, every interaction mediated through my phone number was saved for eternity and available for export through Google Takeout.

It wasn't until I left my job in the aerospace industry and began a PhD that I had the space and openness of mind to explore this further. I found myself in one of the strangest sections of the University of Colorado Boulder's library, reading books like Measurement of Love and Intimate Relations (by Oliver C.S. Tzeng) and The Pleasure of Your Company: A Socio-Psychological Analysis of Modern Sociability (by Emile Jean Pin). These books were strange, certainly, but oddly fascinating. Another book, Interpersonal Perception (by R.D. Laing et al.), described how relationship outcomes could be reliably predicted from a quantitative survey of interpersonal perception. It wasn't agreement that mattered in relationships, but understanding. And this understanding could be measured, apparently. These books described old research threads just waiting to be brought into the 21st century.

The Interpersonal Informatics Aggregator

The university hackathon, HackCU, was a good setting to start exploring technology and the measurement of relationships. I built the framework for what I called the "Interpersonal Informatics Aggregator." The goal was to create a local, individual database that stores personal data from various online services in a structured, useful way, supporting personal growth and helping individuals quit social media apps (by facilitating data transfer, saving contacts, etc.). The prototype I made during the hackathon was rough: essentially a Python script that parsed the JSON and HTML from Google's and Facebook's data dumps into a SQLite database. I ran out of time to do anything useful with the data.
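To give a sense of what that kind of script looks like, here is a minimal sketch of loading one messages export into SQLite. It is not the hackathon code. The table layout, the `load_export` helper, and the sample JSON fields (`sender_name`, `timestamp_ms`, `content`) are assumptions for illustration, so check them against your own data dump before relying on them.

```python
import json
import sqlite3

# Hypothetical export: the field names below are assumed, not taken from the post.
SAMPLE_EXPORT = json.dumps({
    "messages": [
        {"sender_name": "Alex", "timestamp_ms": 1554076800000, "content": "Coffee tomorrow?"},
        {"sender_name": "Me", "timestamp_ms": 1554076860000, "content": "Yes, 9am works."},
        {"sender_name": "Alex", "timestamp_ms": 1554076900000},  # e.g. a photo with no text
    ],
})


def create_schema(conn):
    """Create one flat table that holds messages from every platform."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               platform TEXT NOT NULL,
               sender TEXT NOT NULL,
               sent_at_ms INTEGER NOT NULL,
               body TEXT NOT NULL
           )"""
    )


def load_export(conn, raw_json, platform):
    """Insert every text message from one export; return how many rows were added."""
    data = json.loads(raw_json)
    rows = [
        (platform, m["sender_name"], m["timestamp_ms"], m["content"])
        for m in data.get("messages", [])
        if "content" in m  # skip entries that carry no text
    ]
    conn.executemany(
        "INSERT INTO messages (platform, sender, sent_at_ms, body) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # use a file path instead to keep the database
create_schema(conn)
inserted = load_export(conn, SAMPLE_EXPORT, "facebook")
```

Keeping a `platform` column on a single table, rather than one table per service, makes it easy to query across services later.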

This spring (2021), I'm taking a cool class on Info Viz, and we had an assignment to "tell a story with data." I couldn't imagine a better excuse to take the project, and my exploration of interpersonal informatics, one step further.

The primary work I had to finish on the aggregator was contact merging. I had to export all my Apple contacts to Google and write a parser for the Google Contacts CSV export. I then unified the contacts based on name and phone number, which gave each of my friends a single "personhood" across multiple platforms.
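For readers curious how merging on name and phone number can work, here is a small sketch. It assumes that two records belong to the same person when they share either a normalized name or a normalized phone number, which may be looser than the rule the aggregator actually uses. The record fields and the helper names (`normalize_phone`, `normalize_name`, `merge_contacts`) are hypothetical.

```python
import re
from collections import defaultdict


def normalize_phone(raw):
    """Keep the last 10 digits so '+1 (303) 555-0101' matches '3035550101'."""
    digits = re.sub(r"\D", "", raw or "")
    return digits[-10:] if len(digits) >= 10 else ""


def normalize_name(raw):
    """Lowercase and collapse whitespace so casing differences don't matter."""
    return " ".join((raw or "").lower().split())


def merge_contacts(records):
    """Group records that share a name or phone number (union-find over indices)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (kind, value) -> index of the first record carrying that key
    for i, rec in enumerate(records):
        name = normalize_name(rec.get("name"))
        phone = normalize_phone(rec.get("phone"))
        keys = [("name", name)] if name else []
        if phone:
            keys.append(("phone", phone))
        for key in keys:
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(i)].append(rec)
    return [groups[root] for root in sorted(groups)]


records = [
    {"platform": "apple", "name": "Alex Rivera", "phone": "+1 (303) 555-0101"},
    {"platform": "facebook", "name": "alex rivera", "phone": ""},
    {"platform": "google_voice", "name": "", "phone": "3035550101"},
    {"platform": "facebook", "name": "Sam Lee", "phone": ""},
]
people = merge_contacts(records)
```

Matching on name alone will wrongly merge two different friends who share a name, so a real pipeline would probably want a manual review step for those cases.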

The Data

I had previously brainstormed a few social informatics interfaces for a personal health informatics class, and they relied on sentiment analysis. So I ran all 68,000 of my messages through TextBlob's sentiment analyzer and exported a CSV of the messages, along with each one's word count and the time it was sent. This file contains every message I have ever sent or received on Facebook Messenger (which I started using in 2008) and on Google Voice (which I started using in 2019). The data is therefore incomplete: it doesn't include my text messages from before 2019.
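Here is a rough sketch of that export step, not the exact script I ran. The column names and the `build_rows` and `write_csv` helpers are assumptions. The scorer is passed in as a parameter so the example runs without TextBlob installed; `textblob_polarity` shows the real call, `TextBlob(text).sentiment.polarity`, which returns a value between -1 and 1.

```python
import csv
import io
from datetime import datetime, timezone


def textblob_polarity(text):
    """Score a message with TextBlob (imported lazily; needs `pip install textblob`)."""
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def build_rows(messages, score=textblob_polarity):
    """Turn (timestamp in ms, text) pairs into rows with time, word count, and polarity."""
    rows = []
    for sent_at_ms, body in messages:
        sent_at = datetime.fromtimestamp(sent_at_ms / 1000, tz=timezone.utc)
        rows.append({
            "sent_at": sent_at.isoformat(),
            "word_count": len(body.split()),  # whitespace-separated tokens
            "polarity": score(body),
        })
    return rows


def write_csv(rows, handle):
    """Write the rows with a header line to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=["sent_at", "word_count", "polarity"])
    writer.writeheader()
    writer.writerows(rows)


def stub_score(text):
    """Stand-in scorer for the demo; swap in textblob_polarity for real use."""
    return 1.0 if "great" in text.lower() else 0.0


messages = [
    (1554076800000, "That sounds great"),
    (1554076860000, "See you at nine"),
]
rows = build_rows(messages, score=stub_score)
buffer = io.StringIO()
write_csv(rows, buffer)
```

Calling `build_rows(messages)` with no `score` argument uses TextBlob instead of the stub.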


The first thing I wanted to look at was the temporal distribution of my messages. A scatterplot seemed like a good first step, simply showing my messages across time. I used the date sent as the X axis and the word count as the Y axis. To do this, I booted up an Observable Notebook to apply what we had learned in class about D3. I started to learn the limits of D3, as the rendering of the plot became very slow with 68,000 data points (it completely crashed when I tried to color the points by sentiment... that will have to wait for another time).

The current live rendering of the plot seems to be capped by D3, and only a few data points are showing. Luckily I grabbed a screenshot when it rendered correctly, and have pasted that below (this was before I could add titles/axis labels). The current live version (with titles/labels, but far fewer data points) is here.

The X axis is the date a message was sent or received. The Y axis is the number of words in the message.