Thankfully, the data I needed had already been scraped and shared in a Reddit post I found. Without it, I would have had to scrape the subreddit myself, which would have taken a long time because the Reddit API is rate limited. The project uses data scraped from the r/JapanTravel subreddit, including both submissions and comments. As described in the Reddit post, the downloaded files are in .zst format: submissions.zst and comments.zst. After converting them to .csv, we can explore the data.
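For anyone wanting to reproduce the conversion step, here is a minimal sketch of how it could be done in Python. It is not necessarily the script I used. It assumes the .zst files contain newline-delimited JSON (one post per line), which is common for Reddit dumps, and it relies on the third-party zstandard package. The function names and the field names in the usage note are illustrative assumptions, not taken from the dump itself.

```python
import csv
import io
import json


def iter_zst_lines(path):
    """Yield decoded text lines from a .zst file (assumed newline-delimited JSON)."""
    import zstandard  # third-party: pip install zstandard

    with open(path, "rb") as fh:
        # Large dumps are often compressed with a big window, so raise the limit.
        dctx = zstandard.ZstdDecompressor(max_window_size=2**31)
        with dctx.stream_reader(fh) as reader:
            yield from io.TextIOWrapper(reader, encoding="utf-8")


def ndjson_to_csv(lines, out_path, fields):
    """Write one CSV row per JSON line, keeping only the requested fields.

    Returns the number of rows written. Missing fields become empty strings.
    """
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fields)
        writer.writeheader()
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            writer.writerow({f: record.get(f, "") for f in fields})
            count += 1
    return count
```

A call might look like `ndjson_to_csv(iter_zst_lines("submissions.zst"), "submission.csv", ["created_utc", "title", "selftext"])`, where the three field names are assumptions about the raw JSON keys and should be checked against the actual file. Streaming line by line keeps memory use low even when the decompressed file is large.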

submission.csv

Each submission has 6 columns.

To us, the most important columns are Date, Title, and Content. But first, let’s look at the dataframe.

[Image: preview of the submissions dataframe]
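A first look like this can be sketched with pandas. This is only an illustration: the helper name `load_submissions` is hypothetical, and parsing the Date column as a datetime is an assumption about how the dates are stored in the CSV.

```python
import pandas as pd

# The three columns singled out above as most important.
KEY_COLUMNS = ["Date", "Title", "Content"]


def load_submissions(source):
    """Read the submissions CSV (a path or file-like object) into a dataframe."""
    df = pd.read_csv(source)
    # Assumption: Date holds parseable date strings; unparseable values become NaT.
    df["Date"] = pd.to_datetime(df["Date"], errors="coerce")
    return df
```

With the file in place, `load_submissions("submission.csv")[KEY_COLUMNS].head()` shows the first five rows of the key columns, and `df.shape` confirms the number of rows and columns.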

About the data

Data Challenges