Saturday, January 30, 2010

Measuring Life

I know I just posted about this, but I've learned a few things now and need to voice what they are.  I told a friend about your.flowingdata.com (yfd) and was immediately referred to daytum.com.  The UI seemed more user-friendly than yfd.  I soon learned why.

Daytum is basically a counter.  If you want to count the drinks you've had, Daytum.  If you want to count the number of apples you eat, Daytum.  Limited, but user-friendly.  Yfd has a more flexible data structure.  You can specify the type of data (categorical, event, counter, measurement), and they... tag items better.  This wouldn't seem like a deciding factor, but Daytum, for example, would not let me differentiate watching South Park on jtv from watching basketball on jtv.  Maybe I'm missing something, but it was frustrating, and I just decided I'd put my Measuring Life efforts into yfd because of the increased flexibility in what and how to measure my life.  Another important (deciding) feature: I couldn't figure out how to track my weight on Daytum - I came up with a hack to do it, but it seemed I would have to download the data myself and view it that way.  Anyway, yfd it is.

I decided that logging my life this way needed to be an extremely simple process.  There's a visualization tool that yfd provides that will pair two "actions" and generate graphs of the duration between those two actions.  So, whenever I start or stop an activity, I use the following rules: Start) "action"-ing, Stop) "action"-ed.  Now whenever I'd like I can pair these actions and keep track of how much time I spend doing them.  For example, I am "blogging" right now, and I will have "blogged" when I'm finished here.
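
To make the pairing idea concrete, here's a rough sketch of how paired start/stop entries could be turned into durations.  This is not yfd's actual tool, just my guess at the logic: the function name pair_durations and the sample log are made up for illustration, and the stop word is passed in explicitly because not every verb follows the "-ing"/"-ed" rule (running/ran).

```python
from datetime import datetime, timedelta

def pair_durations(log, start_action, stop_action):
    """Pair each start_action with the next stop_action and return the durations.

    `log` is a list of (timestamp, action) tuples. Hypothetical helper,
    not part of yfd.
    """
    durations = []
    started_at = None
    for timestamp, action in sorted(log):
        if action == start_action:
            started_at = timestamp  # a repeated start overwrites the earlier one
        elif action == stop_action and started_at is not None:
            durations.append(timestamp - started_at)
            started_at = None  # wait for the next start
    return durations

# Made-up sample entries for a single day.
log = [
    (datetime(2010, 1, 30, 9, 0), "blogging"),
    (datetime(2010, 1, 30, 9, 45), "blogged"),
    (datetime(2010, 1, 30, 18, 0), "running"),
    (datetime(2010, 1, 30, 18, 30), "ran"),
    (datetime(2010, 1, 30, 21, 0), "blogging"),
    (datetime(2010, 1, 30, 21, 20), "blogged"),
]

blog_time = sum(pair_durations(log, "blogging", "blogged"), timedelta())
print(blog_time)  # 1:05:00
```

A stop with no matching start is simply ignored, which seems like the safest choice for the times my phone drops an entry.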

If my comp's on, then logging this data is pretty simple and takes <10 sec.  If I have to use my (unreliable) BB Storm, then it takes <15 sec, or my phone's not working properly and it doesn't get logged.

Using Twitter to do this is pretty simple.  You follow @yfd and create an account with your.flowingdata.com, @yfd requests to follow you and voila!  For what I'm doing, adding data is simple.  Just tweet "d yfd running" when I start running, and "d yfd ran" when I'm done running.  Or sleeping, or cooking, or eating, etc.

More generally, the syntax has a format something like this:

d yfd [action] [value] [unit] [time] #[tag] #[tag] ... #[tag]

... something like that.  For now, I'm keeping it simple.
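
Just to pin down how I read that format, here's a small sketch that splits a message into its pieces.  Everything about it is an assumption on my part: parse_yfd_message is a made-up name, the field order is taken from the line above (which I already said is only "something like" the real syntax), the time is kept as raw text because I don't know what formats yfd accepts, and the "weigh" message is an invented example.

```python
def parse_yfd_message(message):
    """Split a 'd yfd ...' message into action, value, unit, time, and tags.

    Assumed format: d yfd [action] [value] [unit] [time] #[tag] ...
    This is a guess at the syntax, not yfd's real parser.
    """
    tokens = message.split()
    if tokens[:2] != ["d", "yfd"]:
        raise ValueError("expected message to start with 'd yfd'")
    tokens = tokens[2:]

    # Anything starting with '#' is a tag; the rest are positional fields.
    tags = [t[1:] for t in tokens if t.startswith("#")]
    fields = [t for t in tokens if not t.startswith("#")]
    if not fields:
        raise ValueError("missing action")

    entry = {"action": fields[0], "value": None, "unit": None,
             "time": None, "tags": tags}
    rest = fields[1:]

    # The value is optional; treat the next field as a value only if numeric.
    if rest:
        try:
            entry["value"] = float(rest[0])
            rest = rest[1:]
        except ValueError:
            pass

    # A unit only makes sense right after a value.
    if entry["value"] is not None and rest:
        entry["unit"] = rest[0]
        rest = rest[1:]

    # Whatever is left is kept as an unparsed time string.
    if rest:
        entry["time"] = " ".join(rest)
    return entry

simple = parse_yfd_message("d yfd running")
detailed = parse_yfd_message("d yfd weigh 170 lbs #health")
```

Here simple has only the action filled in, while detailed picks up the value 170.0, the unit "lbs", and the tag "health".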

Sorry, Daytum.  You just don't cut it in this analyst's book.  Next I'll be looking into grafitter.com, whose syntax is a little different.  The functionality doesn't appear (on the surface) to be "better" than yfd, but we'll see.

Pick o' the Post #5: "The Day that Never Comes" by Metallica on Death Magnetic

Personal Data Collections, Analysis, and Visualization

I am extremely excited about this new vein of "research" that I've found.  Collecting and Analyzing/Visualizing personal data.  I'm also writing this post because I've run across enough websites and services that I want some kind of reference so I don't have to remember all of them.

First, I found your.flowingdata.com.  I've looked into this the most so far, and have only glanced at other services such as daytum.com and grafitter.com.  There are a couple of characteristics which will shape my decision as to which service(s) I want to use.

At one end of the spectrum of personal data collection, we can imagine a 1984-like deal, where it's completely open.  On the other side, you can authorize everything.  It seems that these services have started from the authorization side of things.  It seems that a big part of this competition is ease of use... then visualization.  We'll see - I've only been at this for a bit (couple hours now).  See, I could've logged that time, but it's just not convenient enough yet.

Similar data sites:
flowingdata.com
curetogether.com
quantifiedself.com
manyeyes.alphaworks.ibm.com/manyeyes/
tweetstats.com
twistori.com

Pick o' the Post #4: "Up from the Skies" by Jimi Hendrix on Axis: Bold as Love

Thursday, January 28, 2010

Wednesday, January 20, 2010

How much data are we conscious of? ... health care and a song pick too.

As a data analyst, I think about data too much.  I was driving home from work today and a thought crossed my mind... again.  I ponder this too often, but there is so much data that goes uncollected, unobserved, and just taken for granted.  What could we learn from the seemingly unnecessary pieces of reality that just pass us by?  Even if we could collect data in the tremendous quantities that it rolls by us, how would we analyze googols of data - remember, a googol's a number... ?  Well, in all honesty, Google is probably in the best position to answer that question.


The first problem to consider comes from a term that I recently learned from Google's chief economist, Hal Varian... data munging.  This can be considered data prep also, but I think "munging" is meant to be a more general term - not really sure.  This is what gives much of my labor its value.  I'll admit, I'm a youngster in the labor force, so I'm still learning and honing my skills, but if data is not formatted, shaped, reshaped, transformed, merged, appended, dropped, sorted, etc., etc. properly then it might as well just pass right by us.  This can be the most time consuming part of analyzing data, whatever the data happens to be.
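
For a tiny taste of what I mean by munging, here's a toy sketch that cleans, merges, drops, and sorts a few records.  The data and the names (raw_log, activities, munge) are all made up for illustration; real munging is messier than this.

```python
# Made-up raw records: minutes arrive as messy strings, one is blank.
raw_log = [
    {"id": "2", "minutes": "45"},
    {"id": "1", "minutes": " 30 "},
    {"id": "3", "minutes": ""},
]
# Made-up lookup table; note there is no entry for id "3".
activities = [
    {"id": "1", "name": "blogging"},
    {"id": "2", "name": "running"},
]

def munge(raw_log, activities):
    """Format, merge, drop, and sort the raw records (toy example)."""
    name_by_id = {row["id"]: row["name"] for row in activities}
    cleaned = []
    for row in raw_log:
        minutes = row["minutes"].strip()           # format: trim whitespace
        if not minutes or row["id"] not in name_by_id:
            continue                               # drop: incomplete or unmatched rows
        cleaned.append({
            "name": name_by_id[row["id"]],         # merge: attach the activity name
            "minutes": int(minutes),               # transform: text to number
        })
    return sorted(cleaned, key=lambda r: r["minutes"])  # sort: shortest first

tidy = munge(raw_log, activities)
```

After this, tidy holds two rows (blogging at 30 minutes, then running at 45) and the blank record is gone.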


Just as a proposition... could data munging be automated?  From a VERY abstract pov, I think this question calls for generalizing a definition of data and the sorts of analysis that are relevant.  There are so many ways to look at data.  I think the answer is, "sure, if you know the source and its natural format."  But what about new data, like the spawning of a never-before-witnessed black market?  There might be ways to collect, munge, and analyze data from current black markets, but how would you generalize that process when a new (unexpected) one shows up?
The black market is only an example, but consider a scenario where every piece of the DGP (Data Generating Process) is observable.  First off, this is impossible - we'd spend more time observing than time exists, so... .  Anyway, imagine it's so, I'd guess that tools along the lines of anomaly detection would be used.  In that sense, we (think we) know how the world is now, and we observe how it changes - pretty simple.  I wasn't going anywhere in particular with this, but these are the little frustrations that roll through my mind on a daily basis.


On a completely different note, I'm gonna talk about health care.  Not what we should or shouldn't do, but a recent perspective I've had on what's been taking place in Washington lately.  Sort of porting my thoughts about data to an optimistic viewpoint on health care.  This first came to mind when I'd thought the Dems' reform bill was an eventuality, which may not be the case anymore with Scott Brown in the Senate now.  Anyway, aside from all the things that I think are inefficient with our current system, I do want Washington to make major changes to it sometime - I don't care if it's a dem or rep bill, just change it... and, I almost don't care what you change as long as it's broad and relatively major.  Of course, I have different personal views because in the end it's my pocket I care about.


This is about understanding our health system.  Right now we know the rules and how to play the game as it is.  From a "social consciousness" (and data) perspective, this is like having a data set of a couple hundred thousand observations - one for each person that plays the game.  They each have their own opinions and such.  From day to day, the game changes very little.  If Washington passes legislation that turns football into rugby (metaphorical), then everyone has to adjust... somehow.  Managers can't use the same strategies, players can't either.  It's like giving each person one more observation to consider in their data set - thus a couple hundred thousand more observations.  Now we have some variation, which was entirely absent before.

My point is that if we all have this second observation in our data sets we can develop a better understanding of what's really important in our health care system.  Rather than listening to politicians, economists, opinion leaders, etc. tell us what's important, we can learn first hand.  The problem with this is that it costs resources to adapt, whether it's time, money, or stress - it's not necessarily a pleasant approach, but we are sure to learn a lot about our health care system and ourselves.  Right now, we think we know what's important, doctors think they know what's important, politicians think they know what's important.  We all think we know what's important to play the screwed up game we have right now, but how do we play the game that provides us with the healthiest nation as a whole?  No one knows that because we can't figure out the best rules for that game.

No matter what happens with health care, it's in the American psyche now and we'll think about it more regardless.  If things do change, pay attention and reform your opinions.  Our current system is messy, and any new system will not be an end-all (it may even be messier).  But we need to pay attention to the effects that any of these changes will have on our lives, because we won't get the chance to again until we go through another one of these polarizing health care debates.

Pick o' the Post:  "On Impulse" by Animals As Leaders on Animals As Leaders

This is Animals As Leaders' debut album, and this track is pretty cool.  Oddly enough, I really like the techno-ish beat around 1:20 and 2:50.  This whole album has, just, amazing guitar work on it.  The musicianship reminds me of so many different artists at various times (BTBAM, Guthrie Govan, DT, Meshuggah), but they are very uniquely Animals As Leaders.  Enjoy.

Sunday, January 17, 2010

Calculating Words

Wow! I thought calculatingwords.blogspot.com would be too general to be available. Here it is, and I like the title... "countingwords" was taken, but I think I like this better. Either way, it's here to stay.

Hi, my name's Dan Bowen and I'm a data analyst for BookLamp.org, hence... Calculating Words. I also find that when I mean to express something in writing, I'm finicky about how it reads - for sure, I'm not alone... and not always perfect about it. I use every character I can think of, no matter how obscure, to get the feeling I'm looking for... ellipses, double-ellipses, double-dash (thanks Paul), dashes, ampersands, carets, semicolons, pipes, tildes, etc. ... and often. Sometimes I feel like there's an order of operations for punctuation, or the intended pause isn't long enough, or maybe a certain %haracter pops into my head that I just want to use.

I can think of times when I've spent hours writing something up and just scrapped it as a lost cause, even though I still really want to get the idea out - again, I can't be alone here. Anyway, I calculate words for a living, and my own as an exercise.

Nuff with the intro, this first post (below) is something I plan to keep up on a weekly basis. Aptly named, "Pick of the Week." Thanks to my friend Austin's BlahBlahBlog for inspiring this. I'm going to try to keep these as generally "acceptable" as possible. So, I'll do my best to steer clear of all-out weedles, or straight up death metal. But, I'm not sure how long that's gonna last.

The pick of the week is: (... drumroll ...)

"Desert of Song" by Between the Buried and Me on The Great Misdirect