COVID-19 Analysis
So in case you didn't notice, March Madness kinda didn't happen this year. So instead, I tried to analyze some COVID-19 data to see what I could learn and potentially contribute to that effort. The same site that hosts the March Madness algorithm competition, Kaggle.com, hosts various other data-centric competitions throughout the year. And while I don't think there was a terribly competitive attitude to it, there was a COVID-19 "competition."
I tried getting involved early on - largely because I needed to spend the PTO I had otherwise intended to use for March Madness. Unfortunately, I wasn't feeling terribly helpful, and I was a bit dissatisfied with the data we had. So I desperately attempted to reach out to the keeper of all things data, Google:
Yo @googlemaps, on the off chance you aren't already, store that "busyness" factor you show for shops and restaurants - guessing that's one of the best ways to measure social distancing. (Also would someday be nice to have in the API if it's not already) 🙂
— Brandon Hoeksema (@BrandonHoeksema) March 15, 2020
I was hoping they would make accessible the data they show in these little histogram sorta charts. These show you a baseline, expected busyness as well as the current actual busyness.

I was also trying to be active in the Kaggle forums to get that idea some traction as well as to make sure that I didn't miss it when it became available.
It was early April when my phone told me about Google's mobility reports which sounded like exactly what I was asking for. EXCEPT they put the data in PDF format. Which basically makes it impossible for anyone to efficiently analyze this data. So after trying to politely complain about that in the Kaggle forums and receiving no indication that more useable data was coming (Kaggle is owned by Google), I frustratedly gave up on contributing to the analysis effort. But two days ago, I checked Google's Mobility Report page and discovered that they had at some point added a CSV format for the mobility data.
So now I have no excuse. Other than that the organized "competition" is over. But hey, I'm too curious to not take at least a little peek at this data.
The main relationship I've been interested in understanding better is that of social distancing and its effect on the spread of coronavirus. I intend to do a few more plots/charts in the future, but I managed to throw something together real quick. Note, I may discover major issues with my current analysis, but for now I have a bazillion plots that generally seem pretty reasonable to me - one for the entire US, one for each state, and one for each county. The x-axis shows the percent change in people's activity relative to a pre-national-emergency baseline. Google's dataset was quite rich and split the data into 6 different categories, which I'm plotting where available. The y-axis represents how fast we're spreading the virus. Specifically, it is the number of days it's taking to double the number of deaths - deaths are theoretically more reliable than cases since it doesn't require a test to die from the virus. That said, deaths are also a lagging indicator, so trying to appropriately line up the social distancing numbers with the spread numbers is a shot in a dimly lit room. But anyway, the larger "days to double" is, the slower the spread.
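For anyone curious what "days to double" could look like in code, here's a rough sketch. It is not necessarily how my plots were generated. It assumes deaths grow roughly exponentially over a trailing window, and the function names, the 7-day window, and the 14-day lag are all made-up placeholders for illustration, not values I'm claiming are right.

```python
import math
from typing import List, Optional, Tuple


def days_to_double(cumulative: List[int], window: int = 7) -> List[Optional[float]]:
    """Estimate doubling time (in days) at each index of a cumulative count.

    Assumes exponential growth over the trailing `window` days (an assumption
    for this sketch). Returns None where the estimate is undefined (not enough
    history, or a zero count at the start of the window) and math.inf where
    the count did not grow over the window.
    """
    result: List[Optional[float]] = []
    for i, current in enumerate(cumulative):
        if i < window or cumulative[i - window] <= 0:
            result.append(None)
            continue
        ratio = current / cumulative[i - window]
        if ratio <= 1.0:
            result.append(math.inf)  # no growth, so it never doubles
        else:
            # Solve ratio ** (T / window) == 2 for T.
            result.append(window * math.log(2) / math.log(ratio))
    return result


def pair_with_lag(
    mobility: List[Optional[float]],
    doubling: List[Optional[float]],
    lag_days: int = 14,  # placeholder guess, not a measured value
) -> List[Tuple[float, float]]:
    """Pair the mobility change on day t with the doubling time on day t + lag_days."""
    pairs = []
    for t, change in enumerate(mobility):
        j = t + lag_days
        if change is not None and j < len(doubling) and doubling[j] is not None:
            pairs.append((change, doubling[j]))
    return pairs
```

Each `(change, doubling_time)` pair would then be one point on a scatter plot like the ones described above. Changing `lag_days` is the "dimly lit room" part: deaths trail behavior by some unknown number of days, so that number is a guess.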
Let's take a look at how Denver's doing as an example:
The white points fit fairly well with what I'd expect - the more people stay home, the longer it'll take to spread this virus. The red points similarly match my expectation - the more people avoid public transit, the longer it'll take to spread the virus.
For all the plots for all of the US states and counties: COVID-19 Google Drive.
And as always, a relevant xkcd: