Explaining Adjoining Regional Differences in Positive Covid Test Rates

Scope/Broad Description

Create factors/correlations for the numbers we are seeing for positive test rates and mortality percentages by city/county with other factors such as the covid tests that are being used in each city/county, testing/reporting methodologies , staffing levels, percentage of pop tested, etc.

Example: Florida and Florida counties that are right next to each other and have overlapping zip codes are having strikingly different positive test rates (6 % vs 13%).

Tecnique

factor analysis models / linear regression models

Description of the Experiment

Linear regression data modeling would involve identifying the factors potentially influencing the positive test rates . These would be implemented as the feature vector for the model. We would collect this feature data for the communities/regions we have positive test rates for. Then ,we would use this factor data and positive test rate data to "Train" and "test" our data model . Once the data model is accurately predicting the positive test rates, we can apply linear regression to identify how the factors are correlating with the positive test rate . To be accurate , we need to include as many potential factors as possible and get this data for as many regions as possible

Data Points

Some of the data is made publicly available at the case level by zipcode/county. Other info (like test used/methodologies/processes/staffing levels) would need to be gathered by calling the test centers in each county.

Variables to consider

Predicted

Postive tests rate

Predictors-Factors

Test factors
- type of test used (as we just saw this week, the governor of Ohio in the US got a positive test result using the quick test on Monday. He was retested with a slower test and it is negative. he has no symptoms. he is getting tested with another 3rd test now that is supposed to be more accurate)
- test centers
Community factors
- demographics
- include labor/ income factors (areas that have business providing less than a living wage where workers must get overtime to pay bills, those workers are not taking the paid time off to get tested and stay home when sick)
- age
- ethnicity
- race
Presence of specific situations
- prisons (if there is a prison in a community with a population of at least X % of the community then that feature of the community would be positive. In this kind of analysis, "features" are modeled as binary values (1 or 0 ) in a feature vector )
- slaughter houses ( it seems the problem with those is that they host teams of workers form other regions with lower standard and are housed in rally close quarters without the necessary precautions)
Tools

Jupyter Notebooks (to access and format the data for ingestion by the data model, to run the data model, and to provide explanation/ a story around the data model)

AirTable (to store the data collected for the factors and the positive test rates by community ,region)

Internal Scraping/Data Collection Tools

To Do:
- [ ] Define factors and data points
- [ ] define data colelction process
- [ ] create tasks for data collection process
- [ ] implement Jupyter Note book for regression data model

Scope/Broad Description

Tecnique

Description of the Experiment

Data Points

Variables to consider

Tools

To Do: