Basic checks: record counts, pivot tables, histograms + scatterplots
Looking for:
Are there any values missing?
Are there extreme outliers?
Count unique values
Weird trends
Weigh against known totals
Are there summaries you can check your data against?
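The checks listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method. It assumes the data is already loaded as a list of dicts, and the names `basic_checks`, `numeric_field`, `category_field`, and `known_total` are hypothetical. The 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import statistics
from collections import Counter


def basic_checks(rows, numeric_field, category_field, known_total=None):
    """Return a small report of sanity checks for a list of dict records."""
    # Count missing values (None or empty string) per field
    missing = {}
    for row in rows:
        for key, value in row.items():
            if value is None or value == "":
                missing[key] = missing.get(key, 0) + 1

    # Keep only usable numbers for the numeric checks
    values = [row[numeric_field] for row in rows
              if isinstance(row.get(numeric_field), (int, float))]

    # Flag extreme values with the 1.5 * IQR rule (assumed convention;
    # needs at least two numeric values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]

    report = {
        "record_count": len(rows),
        "missing_by_field": missing,
        "unique_values": Counter(row.get(category_field) for row in rows),
        "outliers": outliers,
        "sum": sum(values),
    }
    # Weigh the computed sum against a published total, if one exists
    if known_total is not None:
        report["difference_from_known_total"] = sum(values) - known_total
    return report
```

A nonzero `difference_from_known_total` or a non-empty `outliers` list is a prompt to go look at the records, not proof of an error.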
The bigger the finding, the more skeptical you should be and the harder you should try to disprove it
If you can’t go broad (nationally, full sample), go small
If your data is too big to check or review manually, can you review a sample large enough to feel comfortable with it?
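One way to pull records for a manual spot check is a simple random sample with a fixed seed, so a colleague can draw the same records again. This is a sketch only: the function name `review_sample` and its parameters are hypothetical, and how large a sample is "enough" remains a judgment call the code does not answer.

```python
import random


def review_sample(rows, size, seed=0):
    """Return a reproducible simple random sample of rows for manual review."""
    # A dedicated generator keeps the draw repeatable for a given seed
    rng = random.Random(seed)
    # Never ask for more records than exist
    size = min(size, len(rows))
    return rng.sample(rows, size)
```

Calling `review_sample(rows, 50, seed=1)` twice on the same list returns the same 50 records, which makes the review itself checkable.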
Some data that is readily available is meaningless
Can you still use the data if you disclose the weakness and play to its strength in your analysis?
Superlatives can come back to bite you
Beware of making universal statements off a limited dataset or a subset of data
A majority isn’t everyone
Reframe what you’re proving: “We can’t say X, but we can say Y”
Hypothesize ambitiously but be humble when drawing any conclusions
When the payoff (story, findings) isn’t worth the effort (cleaning, more data collection, heavy caveats in the story)
When the data is just too flawed