Digging into data
IRE’s experienced trainers will start with the basics of navigating Google Sheets and using formulas, then walk you through sorting, filtering and aggregating data with pivot tables to find story ideas. You'll come away with a solid base for analyzing data in your newsroom, including how to find and request data, identify and clean dirty data, find story ideas and make your work ironclad.
IRE’s Laura Moscoso & Adam Rhodes
<aside>
🔗 https://drive.google.com/file/d/1GaroE7e5JZOBJxKIijLhKIrSEs1alkcH/view?usp=sharing
^ step-by-step intro to Google Sheets for complete beginners
https://docs.google.com/document/d/1fXx7kVNu26gluSP-iR-134tdHsR7NlpuEs-5NSFVALk/edit?usp=sharing
^ useful spreadsheet formulas for reporting
https://support.google.com/docs/table/25273?hl=en
^ complete list of Google Sheets formulas
https://source.opennews.org/articles/building-cleaner-smarter-spreadsheets/
^ guide to structuring spreadsheets
https://docs.google.com/presentation/d/1mPCXxmyEhBGRlwaLQOTZ6B8uXjKHzBDL_QLp6bw09hk/edit?usp=sharing
^ guide to scraping on Google Sheets
https://docs.google.com/document/d/1vepo_gI05d259lLDHGUURJp2avcn1aitmlgUKrTUFTI/edit?usp=sharing
^ Data Diary example
https://github.com/Quartz/bad-data-guide
^ types of bad data & how to fix them
https://github.com/propublica/guides/blob/master/data-bulletproofing.md
^ how to fact check data
</aside>
What’s data journalism?
- All the components of regular journalism (interviews, observation, documents) + data.
- Data is useful when it generates new story ideas, adds credibility to existing stories, proves what’s actually happening (vs. what people say is happening), packages information into digestible/interactive formats.
Process
- Generate a story idea and angle
- Story ≠ topic (i.e. “university endowment investments” vs. “how have the university’s endowment investments changed in light of recent divestment campaigns?”)
- Angle = why should people care? What makes this story timely, relevant, and interesting to your audience? What role does data play in it?
- Obtain data + interview ppl to fill in background info about it
- When was it collected? How? By whom? About whom? For what purpose?
- Talk with both the people who collect the data and the people the data is about (otherwise your reporting is one-sided).
- Analyze data + interview ppl about your findings
- Keep a data diary to track your analysis, observations, and questions (reference example ^), then use those notes to direct your interviews with human sources.
- Visualize data + write article
- Reference visualization & writing with numbers notes.
- Article should follow standard journalism style, even if it’s based in data. Some basics:
- Inverted pyramid structure = key info first, followed by rest of info in order of descending importance. Write with an eye on what readers need & want to know.
- Lede = opening paragraph(s). Can be direct (deliver findings from data) or delayed (ease in with an anecdote/description).
- Nut graf = paragraph(s) after lede. Encapsulate the angle, i.e. what this article is about and why it matters.
- Body = info from data and interviews.
- Kicker = final paragraph(s). Wrap up the article depending on the story’s ‘vibes’ (a closing quote, a look ahead, or a callback to the lede).
- Fact check data + article before publishing
- Reference the Quartz & ProPublica GitHub guides ^.
What can Google Sheets do?
- Easiest way to analyze data without coding.
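A few formulas cover most day-one analysis. This is a hypothetical example — the column letters and the “Police” value are made up for illustration; assume column B holds a dollar amount and column C a department name, in rows 2–100:

```
=SUM(B2:B100)
=AVERAGE(B2:B100)
=MEDIAN(B2:B100)
=COUNTIF(C2:C100, "Police")
=SUMIF(C2:C100, "Police", B2:B100)
```

The first three total and summarize a column (median is often a better headline number than average when the data is skewed); COUNTIF counts the rows matching a value, and SUMIF totals one column only for the rows that match. Pivot tables (Insert → Pivot table) do the same kind of grouping and aggregating without formulas.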
Import data
- Formats:
- CSV (comma-separated values): ideal format, can import directly.
- PDF: use Tabula to extract the data into a CSV, then import. If you’re worried about malware, use Dangerzone to clean the file first.
- Websites: use the IMPORTHTML() function in Google Sheets (reference the guide to scraping with Sheets ^ for alternate methods; see the sketch after this list).
- Url = the address of the webpage, in quotes.
- Query = “table” or “list”, depending on how the data is formatted on the webpage.
- Index = which table/list on the webpage you want (i.e. “1,” “2,” etc). If your first guess is wrong, just try the next number.
- Locale = can ignore.
- TIP: paste the url, query, and index into separate cells so you can reference them in the function.
- TIP: always copy your scraped table and paste it back as values so it unlinks from the formula, in case the webpage changes or shuts down.
- Good ol’ copy-paste!
- Keep one copy of the data in its original form for future reference.
- Record source notes — when & where you got the data.
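A minimal sketch of what the scraping formula looks like in a cell. The URL here is a made-up placeholder; swap in the page that actually holds your table, and bump the index if the first table isn’t the one you want:

```
=IMPORTHTML("https://example.com/page-with-a-table", "table", 1)
```

The cell-reference version from the tip above, assuming A1 holds the url, B1 the query (“table”) and C1 the index (1):

```
=IMPORTHTML(A1, B1, C1)
```

Once the table loads, select it, copy, and use Edit → Paste special → Values only in a fresh sheet so your copy survives if the page changes.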
Clean data