May 14, 2012
 

Updated 2012-05-18 14:54 PDT:

I missed this when I wrote the post, but the algorithm Overview uses to cluster the documents is a new one. Here’s the write-up:

Hierarchical Clustering and Tagging of Mostly Disconnected Data


If you’ve been following my tweet stream, you saw me tweet this:

At $1,450 a month for five seats, I think the service is overpriced. Moreover, Twitter, Facebook/Instagram, Google/YouTube or Yahoo/Flickr could easily build this into their web sites and deliver it for free, essentially bypassing two middlemen: Geofeedia and the news organization subscribing to Geofeedia. And a clever RSS / Yahoo! Pipes hacker could build something like this for use in a newsroom. For that matter, if you limit yourself to Twitter, you can do most of this with Twitter / Advanced Search.

I must admit that I love the idea and think this could evolve into something game-changing. I wrote about the potential for this back in January 2010!

The Twitter Streaming API — How It Works and Why It’s A Big Deal

To get an idea what this could become, check out Knowledge Discovery from Data Streams by Joao Gama.

Moving on, I don’t know how I’ve managed to be a tech blogger writing about computational journalism without discovering Overview until last week, but it happened. Twitter serendipity at work – I was watching my Interactions page and saw a tweet of mine retweeted by @overviewproject. The Overview project is led by Jonathan Stray. You can see the entire team here.

Overview is open source, lives on GitHub and appears to be a mix of Ruby and Java. I’m currently testing it out for potential inclusion in one of my computational journalism appliances. It’s a browser / desktop application, so most likely it will end up in the successor to Data Journalism Developer Studio 2012LX. If you want to work with it yourself, the instructions are here.

So which of the two represents the future of journalism? Both, of course! With the proper underlying database and real-time knowledge discovery algorithms, Geofeedia could be a game-changer. But in the long run, as a for-profit service, I think they’ll either get acquired or duplicated by the big players. The Overview project, on the other hand, is an open source project. It’s well-funded by the Knight Foundation and Associated Press, and the team is led by one of the well-known names in computational journalism. Overview is certainly going to be part of my future.