Open Source Data Science Masters - First Project

I've been wanting to dive into data science, so I've decided to work through the Open Source Data Science Masters, which looks like a logically organized and very in-depth curriculum.

Right now I'm mainly working on two parallel tracks: catching up on my math and statistics (it's been quite a few years since I've needed them), and diving into data analysis in Python (and probably R eventually), learning things along the way.

In terms of math and stats, I'm starting with Linear Algebra. It's a good course so far.

For data analysis, I grabbed some census data - in particular, state-by-state migration flows from 2005 to 2016. I'm gathering a series of questions to ask of this data, and I'm certain I'll need more data to answer them.

Here are the questions I've thought of so far (some were contributed by @gusandrews):

  • What are the states that get the most migrants? How has that changed over time?
  • What are the patterns of migration from red states to blue states, and vice versa? (For this I'll need voting results for each state over time, and I'll have to decide how to define "blue" and "red".)
  • Is state-by-state migration influenced by economic growth? (For this I'll need the GDP of each state over time.)
  • Is state-by-state migration influenced by housing prices/availability and real estate investment? (For this I'll need housing and real estate data for each state. Some of it might come from another census data set.)
  • Has migration been influenced by climate-related disasters? (For this I'll need data on natural disasters by state and year.)
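
As a rough sketch of how the first question might be approached, assume the flows can be reduced to (year, origin, destination, count) records. The record layout, the function names, and the numbers below are all illustrative assumptions; the counts are made-up placeholders, not census figures. This is plain Python for now, since the pandas version is a topic for later.

```python
from collections import defaultdict

# Hypothetical records: (year, origin_state, destination_state, migrants).
# The counts are made-up placeholders, NOT real census figures.
flows = [
    (2005, "NY", "FL", 100),
    (2005, "CA", "TX", 80),
    (2005, "TX", "FL", 30),
    (2006, "NY", "FL", 90),
    (2006, "CA", "TX", 120),
    (2006, "FL", "TX", 20),
]


def inflow_by_year(records):
    """Sum incoming migrants per destination state for each year."""
    totals = defaultdict(lambda: defaultdict(int))
    for year, _origin, dest, count in records:
        totals[year][dest] += count
    return {year: dict(states) for year, states in totals.items()}


def top_destinations(records, n=1):
    """Return the n states with the largest inflow for each year."""
    result = {}
    for year, states in inflow_by_year(records).items():
        # Rank destination states by total inflow, largest first.
        ranked = sorted(states.items(), key=lambda kv: kv[1], reverse=True)
        result[year] = ranked[:n]
    return result


top = top_destinations(flows)
print(top)
```

Comparing the top destinations from one year to the next is one simple way to see how the ranking changes over time.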

I might eventually see if I can find more granular migration data - like by county.

Anyway, the next blog post will be about getting this migration data into Python's pandas library and beginning to do a bit of analysis.
