Identify factors and correlations that help explain the positive test rates and mortality percentages we are seeing by city/county. Candidate factors include the COVID tests being used in each city/county, testing and reporting methodologies, staffing levels, percentage of the population tested, etc.
Example: in Florida, counties that are right next to each other and have overlapping zip codes are reporting strikingly different positive test rates (6% vs. 13%).
Factor analysis models / linear regression models
Linear regression modeling would start by identifying the factors that potentially influence the positive test rates. These factors would form the feature vector for the model. We would collect this feature data for each community/region for which we have a positive test rate. Then we would use the factor data and the positive test rate data to train and test the model. Once the model predicts the positive test rates accurately on the test data, we can use the fitted regression to identify how each factor correlates with the positive test rate. To be accurate, we need to include as many potential factors as possible and gather this data for as many regions as possible.
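As a rough illustration of the train/test workflow described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the three factor names, the synthetic data, and the "true" effects used to generate that data are invented, not collected values. The regression is solved by hand with the normal equations only to keep the sketch dependency-free; in a notebook, a library such as scikit-learn or statsmodels could take the place of fit_ols.

```python
import random

# Hypothetical factor names; the real list would come from the data collection effort.
FACTORS = ["pct_pop_tested", "staff_per_10k", "uses_test_kit_b"]

def fit_ols(X, y):
    """Ordinary least squares: solve the normal equations (X^T X) b = X^T y."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column for the intercept
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(n):
        p = max(range(c, n), key=lambda k: abs(A[k][c]))
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(n):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][n] for i in range(n)]  # [intercept, b1, b2, ...]

def predict(coef, X):
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], r)) for r in X]

def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# SYNTHETIC data standing in for the collected regions (not real measurements).
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    pct_tested = rng.uniform(1, 10)
    staff = rng.uniform(1, 5)
    kit_b = 1.0 if rng.random() < 0.5 else 0.0
    # Invented relationship: staffing has no effect, the other two factors do.
    rate = 12.0 - 0.5 * pct_tested + 3.0 * kit_b + rng.gauss(0, 0.5)
    X.append([pct_tested, staff, kit_b])
    y.append(rate)

# Train on the first 75% of regions, test on the remaining 25%.
split = int(0.75 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

coef = fit_ols(X_train, y_train)
test_r2 = r_squared(y_test, predict(coef, X_test))

print(f"R^2 on held-out regions: {test_r2:.3f}")
for name, b in zip(["intercept"] + FACTORS, coef):
    print(f"{name:>16}: {b:+.3f}")
```

The sign and size of each coefficient indicate how that factor moves with the positive test rate when the others are held fixed. That reading is only meaningful if the held-out score is reasonable and the factors are not strongly correlated with each other.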
Some of the data is publicly available at the case level by zip code/county. Other information (such as the tests used, methodologies, processes, and staffing levels) would need to be gathered by calling the test centers in each county.
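To make the two data sources concrete, the sketch below computes a positive test rate per county from case-level rows and joins it with factors gathered by phone. The column names (county, zipcode, result), the county labels, the zip codes, and the collected values are all placeholders invented for illustration; the real public exports and call notes will look different.

```python
import csv
import io
from collections import defaultdict

# Placeholder for a public case-level export: one row per test result.
PUBLIC_CSV = """county,zipcode,result
County A,00001,positive
County A,00001,negative
County A,00002,negative
County A,00002,negative
County B,00002,positive
County B,00002,positive
County B,00003,negative
County B,00003,negative
County B,00003,negative
County C,00004,negative
County C,00004,negative
"""

# Placeholder for information gathered by calling test centers.
COLLECTED = {
    "County A": {"test_kit": "Kit X", "staff_count": 12},
    "County B": {"test_kit": "Kit Y", "staff_count": 7},
    # County C has not been called yet.
}

def positive_rates(case_rows):
    """Return {county: positive tests / total tests}."""
    tested = defaultdict(int)
    positive = defaultdict(int)
    for row in case_rows:
        tested[row["county"]] += 1
        if row["result"] == "positive":
            positive[row["county"]] += 1
    return {county: positive[county] / tested[county] for county in tested}

def build_feature_rows(rates, collected):
    """Join rates with collected factors; skip counties with no factor data yet."""
    rows = []
    for county, rate in sorted(rates.items()):
        factors = collected.get(county)
        if factors is None:
            continue
        rows.append({"county": county, "positive_rate": rate, **factors})
    return rows

rates = positive_rates(csv.DictReader(io.StringIO(PUBLIC_CSV)))
feature_rows = build_feature_rows(rates, COLLECTED)
for row in feature_rows:
    print(row)
```

Counties without collected factor data are dropped here rather than filled with guesses. Whether to drop or impute is a design choice the project would need to make.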
Predicted
Predictors-Factors
Test factors
Community factors
Presence of specific situations
Jupyter Notebooks (to access and format the data for ingestion by the data model, to run the data model, and to provide an explanation/story around the data model)
AirTable (to store the data collected for the factors and the positive test rates by community/region)
Internal Scraping/Data Collection Tools