We gained access to AWS through ITS infrastructure in late 2020, as we also began migrating our website to WordPress — the beginning of a new digital era for The Michigan Daily. AWS offers dozens of services; we use only a few of them:
An EC2 instance is essentially an Amazon cloud computer that we control. The machine runs continuously as long as the instance is active: even if a website has low traffic, the instance keeps running, and we pay for every hour it runs. Across our two instances, we pay over 200 dollars a year. Given that both our notification server and library are rarely used, this is quite the inefficient use of our funds.
There are also developer-experience pain points with using EC2 instances. Deployment (at least currently) requires SSH-ing into the instance and running the deploy steps by hand. And, there is low visibility into whether an instance is malfunctioning. There are ways to mitigate this, but they’d involve quite a lot of AWS configuration, which I have little interest in pursuing. While I do think this kind of work (i.e., DevOps) can be valuable in some organizations, I don’t think we’re quite at the stage where we need to, or should be, conducting deep dives into AWS. I’m much more interested in building applications and delivering valuable user experiences while also being able to deploy quickly.
In contrast to continuously running machines, there is the “serverless” model. Serverless is a bit of a misnomer: there are still servers running. However, we don’t have to manage those servers ourselves, and we only pay for the traffic or compute we actually use. We’ve used the serverless model through AWS Lambda functions. A Lambda function “spins up” (activates) when triggered, does some work, and then “spins down.” We pay only for the few seconds or minutes of work, and nothing else.
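To make that concrete, here is a minimal sketch of what one of these functions can look like (hypothetical TypeScript on the Node.js 18+ Lambda runtime, not our actual code):

```ts
// Hypothetical Lambda handler (Node.js 18+ runtime). AWS spins up an
// execution environment when the function is invoked, runs this handler,
// and bills us only for the time the handler spends working.
export const handler = async (): Promise<{ statusCode: number; body: string }> => {
  // One-off job: fetch a page and report its size.
  const response = await fetch("https://www.michigandaily.com/");
  const html = await response.text();
  return { statusCode: 200, body: JSON.stringify({ bytes: html.length }) };
};
```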
Note that a Lambda still has to start up a real execution environment: when a function spins up, AWS provisions a fresh environment and loads the runtime and our code before the handler can run. This startup delay is called a “cold start,” and it can take quite some time, which may not be optimal for time-sensitive operations. For a tracker or scraper, this can be fine: a few seconds or minutes of delay won’t have too much impact. However, for something like a web application, having several seconds between server responses may be untenable.
Deploying a Lambda function has similar developer pain points.
We will continue to use AWS to host static files with S3, run data retrieval processes with Lambda functions, and manage domains with Route53.
Data retrieval processes are generally one-off, so we don’t need continuous deployments. And, there aren’t really any other platforms (besides val.town) that periodically run functions.
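For illustration, here is one way that scheduling can be expressed, using an AWS CDK stack in TypeScript; the stack and asset names are hypothetical, and this isn’t necessarily how our functions are wired up:

```ts
// Illustration only: schedule a data-retrieval Lambda with AWS CDK (v2).
import { App, Duration, Stack } from "aws-cdk-lib";
import * as events from "aws-cdk-lib/aws-events";
import * as targets from "aws-cdk-lib/aws-events-targets";
import * as lambda from "aws-cdk-lib/aws-lambda";

const app = new App();
const stack = new Stack(app, "ScraperStack");

// The data-retrieval function (e.g. a handler like the sketch above, built into dist/).
const scraper = new lambda.Function(stack, "ScraperFunction", {
  runtime: lambda.Runtime.NODEJS_18_X,
  handler: "index.handler",
  code: lambda.Code.fromAsset("dist"),
  timeout: Duration.minutes(1),
});

// Invoke it once a day; we pay only for those invocations.
new events.Rule(stack, "DailySchedule", {
  schedule: events.Schedule.rate(Duration.days(1)),
  targets: [new targets.LambdaFunction(scraper)],
});
```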
There are three main points of friction or inefficiency. We’ve already discussed 1) poor developer experience around deployment and 2) poor cost justification, given the small number of users, for administrative or internal tooling.
Another point of friction is content management. Our crosswords are built with a static builder website; the resulting JSON files are exported to Google Drive and ingested by the player website. Our recruitment website has a similar publication flow: a long Google Doc formatted in ArchieML that has to be manually deployed. Similarly, special landing pages require a tedious Google Spreadsheet.
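The ArchieML step itself is simple; here is a sketch using the open-source archieml package, with a made-up document standing in for the real recruitment doc:

```ts
// Sketch of the ArchieML-to-JSON step, using the open-source "archieml"
// npm package. The text below is a hypothetical stand-in for the Google Doc
// the recruitment site is actually built from.
import * as archieml from "archieml";

const docText = `
headline: Join The Michigan Daily
[sections]
name: News
blurb: Report on campus and the city.
name: Web
blurb: Build the tools the newsroom runs on.
[]
`;

const parsed = archieml.load(docText);
console.log(JSON.stringify(parsed, null, 2));
// => { "headline": "Join The Michigan Daily", "sections": [ { "name": "News", ... }, ... ] }
```

The friction isn’t the parsing; it’s the manual export-and-deploy loop around it.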
Newsrooms want to be lean and nimble. We’ve long had a pervasive philosophy at The Daily that we don’t need a database if we have a Google Doc or spreadsheet. After all, ArchieML can become JSON, and a spreadsheet is close enough to SQL most of the time. But I think there are certain websites that would benefit from more robust content management systems.