Look at your data

Come with a healthy skepticism

Basic checks: record counts, pivot tables, histograms and scatterplots

Looking for:

Are there any values missing?

Are there extreme outliers?

How many unique values does each column have, and do the counts make sense?

Are there weird trends?
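The checks above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a prescribed workflow: the records, the column names (county, year, payment) and the helper functions are all hypothetical, and the outlier test uses the common 1.5 x IQR rule of thumb, which is only one of several reasonable choices.

```python
from collections import Counter
import statistics

# Hypothetical records; in practice these might come from csv.DictReader.
records = [
    {"county": "Adams", "year": "2020", "payment": "1200"},
    {"county": "Adams", "year": "2021", "payment": "1350"},
    {"county": "Baker", "year": "2020", "payment": ""},
    {"county": "Baker", "year": "2021", "payment": "1280"},
    {"county": "Clark", "year": "2020", "payment": "1310"},
    {"county": "Clark", "year": "2021", "payment": "98000"},
]


def missing_counts(rows):
    """Count empty or absent values in each column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None or str(value).strip() == "":
                counts[column] += 1
    return dict(counts)


def unique_counts(rows, column):
    """Count how often each distinct value appears in one column."""
    return Counter(row[column] for row in rows)


def pivot_count(rows, row_key, col_key):
    """A tiny pivot table: number of records for each (row, column) pair."""
    return Counter((row[row_key], row[col_key]) for row in rows)


def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR outside the middle 50% of the data."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Skip blanks before converting to numbers, and report them separately.
payments = [float(r["payment"]) for r in records if r["payment"].strip()]

print("records:", len(records))
print("missing:", missing_counts(records))
print("unique counties:", unique_counts(records, "county"))
print("county x year:", pivot_count(records, "county", "year"))
print("possible outliers:", iqr_outliers(payments))
```

A flagged value is a prompt to go back to the source, not proof of an error: the 98000 here could be a typo or a genuine, newsworthy payment.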

Weigh against known totals

Are there summaries you can check your data against?
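One way to check against a published summary is to total your own records by group and flag any group that is off by more than a small tolerance. The sketch below assumes hypothetical field names (state, grants), made-up published figures and an arbitrary 1% tolerance; adjust all of them to the data at hand.

```python
def compare_to_published(rows, published, group_key, value_key, tolerance=0.01):
    """Return groups whose computed total differs from the published total
    by more than `tolerance`, expressed as a fraction of the published figure."""
    computed = {}
    for row in rows:
        group = row[group_key]
        computed[group] = computed.get(group, 0) + row[value_key]

    mismatches = {}
    for group, expected in published.items():
        actual = computed.get(group, 0)
        if expected == 0:
            is_off = actual != 0
        else:
            is_off = abs(actual - expected) / abs(expected) > tolerance
        if is_off:
            mismatches[group] = (actual, expected)
    return mismatches


# Hypothetical record-level data and hypothetical published totals.
grant_rows = [
    {"state": "OH", "grants": 40},
    {"state": "OH", "grants": 60},
    {"state": "PA", "grants": 75},
]
published_totals = {"OH": 100, "PA": 90}

print(compare_to_published(grant_rows, published_totals, "state", "grants"))
```

A mismatch does not say which side is wrong. It only says where to start asking questions.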

The bigger the finding, the more skeptical you should be and the harder you should try to disprove it

Using imperfect data

If you can’t go broad (nationally, full sample), go small

If your data is too big to check or review manually, can you review a sample large enough to feel comfortable with the rest?
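A simple random sample is one way to pick the records to review by hand. The sketch below is an illustration under assumptions: the function name, the sample size of 25 and the seed are arbitrary, and a fixed seed is used only so a colleague can reproduce the same sample.

```python
import random


def sample_for_review(rows, size, seed=2024):
    """Pick `size` records at random, without replacement, for manual checking.
    A fixed seed makes the same sample reproducible later."""
    if size >= len(rows):
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, size)


# Hypothetical dataset of 1,000 records identified only by an id.
all_rows = [{"id": i} for i in range(1000)]

to_review = sample_for_review(all_rows, 25)
print([row["id"] for row in to_review])
```

How many records are "enough" is a judgment call that depends on how the data will be used; the code only guarantees that the picks are random rather than the first or most convenient rows.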

Some data that is readily available is meaningless

Can you still use the data if you disclose its weaknesses and play to its strengths in your analysis?

Know what you can’t say

Superlatives can come back to bite you

Beware of making universal statements based on a limited dataset or a subset of data

A majority isn’t everyone

Reframe what you’re proving: “We can’t say X, but we can say Y”

Hypothesize ambitiously, but be humble when drawing conclusions

Dealbreakers

When the payoff (story, findings) isn’t worth the effort (cleaning, more data collection, heavy caveats in the story)

When the data is just too flawed