As a data analyst, I think about data too much. I was driving home from work today and a thought crossed my mind... again. I ponder this too often, but so much data goes uncollected, unobserved, and just taken for granted. What could we learn from the seemingly unnecessary pieces of reality that just pass us by? Even if we could collect data in the tremendous quantities in which it rolls by us, how would we analyze googols of data (remember, a googol is a number)? Well, in all honesty, Google is probably in the best position to answer that question.
The first problem to consider comes from a term that I recently learned from Google's chief economist, Hal Varian... data munging. It can also be called data prep, but I think "munging" is meant to be a more general term (I'm not really sure). This is what gives much of my labor its value. I'll admit, I'm a youngster in the labor force, so I'm still learning and honing my skills, but if data is not properly formatted, shaped, reshaped, transformed, merged, appended, dropped, sorted, and so on, then it might as well just pass right by us. This can be the most time-consuming part of analyzing data, whatever the data happens to be.
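To make "munging" a little more concrete, here's a toy sketch in Python of a few of those steps (normalizing, dropping, converting, merging, sorting) using only the standard library. The records, field names, and helper functions are all made up for illustration; it just shows the shape of the work, not how any particular tool does it.

```python
# A toy data-munging pass using only the standard library.
# The records and field names ("id", "amount", "region") are hypothetical.

def normalize(records):
    """Strip and lowercase field names, and strip whitespace from string values."""
    return [
        {k.strip().lower(): (v.strip() if isinstance(v, str) else v) for k, v in rec.items()}
        for rec in records
    ]

def drop_missing(records, field):
    """Drop observations where the given field is missing or empty."""
    return [rec for rec in records if rec.get(field)]

def convert(records, field, func):
    """Apply a type conversion (e.g. float) to one field in every record."""
    return [{**rec, field: func(rec[field])} for rec in records]

def merge(left, right, key):
    """Inner-join two lists of dicts on a shared key; unmatched rows are dropped."""
    lookup = {rec[key]: rec for rec in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]

# Messy raw input: inconsistent headers, stray whitespace, a missing value.
sales = [
    {" ID ": "2", "Amount": " 10.5 "},
    {" ID ": "1", "Amount": "3"},
    {" ID ": "3", "Amount": ""},
]
regions = [{"id": "1", "region": "East"}, {"id": "2", "region": "West"}]

# Clean the sales data, attach each sale's region, then sort by amount.
munged = sorted(
    merge(convert(drop_missing(normalize(sales), "amount"), "amount", float),
          normalize(regions), "id"),
    key=lambda rec: rec["amount"],
)
print(munged)
# [{'id': '1', 'amount': 3.0, 'region': 'East'}, {'id': '2', 'amount': 10.5, 'region': 'West'}]
```

Even in a toy like this, most of the lines go to getting the data into shape rather than analyzing it, which is sort of the point.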
Just as a proposition... could data munging be automated? From a VERY abstract point of view, I think this question calls for a general definition of data and of the sorts of analysis that are relevant. There are so many ways to look at data. I think the answer is, "sure, if you know the source and its natural format." But what about new data, like the spawning of a never-before-witnessed black market? There might be ways to collect, munge, and analyze data from current black markets, but how would you generalize that process when a new (unexpected) one shows up?
The black market is only an example, but consider a scenario where every piece of the DGP (Data Generating Process) is observable. First off, this is impossible (we'd spend more time observing than there is time), so... anyway, imagine it's so. I'd guess that tools along the lines of anomaly detection would be used. In that sense, we (think we) know how the world is now, and we observe how it changes - pretty simple. I wasn't going anywhere in particular with this, but these are the little frustrations that roll through my mind on a daily basis.
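Here's one very simple version of that idea, sketched in Python: summarize "how the world is now" with a mean and standard deviation, then flag new observations that land too many standard deviations away. The function names, the 3-standard-deviation threshold, and the sample numbers are arbitrary assumptions for illustration; real anomaly detection tools are usually far more sophisticated than a z-score check.

```python
from statistics import mean, stdev

def fit_baseline(history):
    """Summarize 'how the world is now' as a mean and sample standard deviation.
    Needs at least two observations (statistics.stdev raises otherwise)."""
    return mean(history), stdev(history)

def flag_anomalies(new_values, baseline, threshold=3.0):
    """Return new observations more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # No variation in the baseline: anything different from it counts as a change.
        return [x for x in new_values if x != mu]
    return [x for x in new_values if abs(x - mu) / sigma > threshold]

history = [10, 11, 9, 10, 12, 10, 9, 11]  # hypothetical day-to-day readings
baseline = fit_baseline(history)
print(flag_anomalies([10, 11, 25, 8], baseline))  # [25]
```

The catch, of course, is the baseline itself: this only works if we really do know how the world is now.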
On a completely different note, I'm gonna talk about health care. Not what we should or shouldn't do, but a recent perspective I've had on what's been taking place in Washington lately. I'm sort of porting my thoughts about data to an optimistic viewpoint on health care. This first came to mind when I thought the Dems' reform bill was inevitable, which may not be the case anymore with Scott Brown in the Senate now. Anyway, aside from all the things that I think are inefficient with our current system, I do want Washington to make major changes to it sometime. I don't care if it's a Democratic or Republican bill, just change it... and I almost don't care what you change, as long as it's broad and relatively major. Of course, my personal views are different, because in the end it's my own pocket I care about.
This is about understanding our health system. Right now we know the rules and how to play the game as it is. From a "social consciousness" (and data) perspective, this is like having a data set of a couple hundred thousand observations, one for each person who plays the game. They each have their own opinions and such. From day to day, the game changes very little. If Washington passes legislation that turns football into rugby (metaphorically speaking), then everyone has to adjust... somehow. Managers can't use the same strategies, and neither can players. It's like giving each person one more observation to consider in their data set, and thus a couple hundred thousand more observations. Now we have some variation, which was entirely absent before.
My point is that if we all have this second observation in our data sets, we can develop a better understanding of what's really important in our health care system. Rather than listening to politicians, economists, opinion leaders, etc. tell us what's important, we can learn firsthand. The problem is that adapting costs resources, whether it's time, money, or stress. It's not necessarily a pleasant approach, but we are sure to learn a lot about our health care system and ourselves. Right now, we think we know what's important, doctors think they know what's important, politicians think they know what's important. We all think we know what's important to play the screwed-up game we have right now, but how do we play the game that provides us with the healthiest nation as a whole? No one knows, because we can't figure out the best rules for that game.
No matter what happens with health care, it's in the American psyche now and we'll think about it more regardless. If things do change, pay attention and reform your opinions. Our current system is messy, and any new system will not be an end-all (it may even be messier). But we need to pay attention to the effects that any of these changes will have on our lives, because we won't get the chance to again until we go through another one of these polarizing health care debates.
Pick o' the Post: "On Impulse" by Animals As Leaders on Animals As Leaders
This is Animals As Leaders' debut album, and this track is pretty cool. Oddly enough, I really like the techno-ish beat around 1:20 and 2:50. This whole album has, just, amazing guitar work on it. The musicianship reminds me of so many different artists at various times (BTBAM, Guthrie Govan, DT, Meshuggah), but they are uniquely Animals As Leaders. Enjoy.