
Data Journalism Developer Studio 2012LX

 

Data Journalism Developer Studio 2012 Overview

Download Data Journalism Developer Studio 2012 From SUSE Gallery

Data Journalism Developer Studio 2012 on Github

Data Journalism Developer Studio 2012 Blog


I’ve just released Data Journalism Developer Studio 2012. This is a major refactoring of the code base. The major user-visible changes are:

  1. I’ve removed RStudio Server for the time being. It was redundant for most users, and removing it freed up over 100 MB on the released appliances. I do plan to put an installer script for it on the appliance at a later date.
  2. With that space freed up, I was able to move some frequently used packages out of the optional install scripts and into the released appliance. They are:
    1. The R Commander GUI. This turns R into a spreadsheet-like user interface. I’ve included the Text Mining plugin as well.
    2. Google Refine. This is another spreadsheet-like tool for working with messy data. The Tesseract Optical Character Recognition package is also included.
    3. Maqetta. This is a WYSIWYG HTML5 user interface builder based on the Dojo JavaScript libraries.
    4. The Perl utilities are back in the main appliance.
  3. I’ve re-organized the install scripts slightly. The BARD re-districting mapping tool is now part of the Spatial task view, and the “beancounter” financial database tool is now part of the Finance task view.

There’s more coming in the next few weeks on the road map. I’ve been testing the Octopress lightweight blogging platform. It’s quite technical – it’s billed as a blogging platform for hackers, and that’s a pretty good description. It’s very lightweight, though, and it works with Github for painless deployment and version control. There will be a sample blog for the Data Journalism Developer Studio 2012 up on Github in a day or so.

Now that the Twitter Perl libraries are back in the main appliance, I’ll be putting my Twitter user and tweet CSV dump routines on the appliance. That way, you’ll be able to acquire tweets or user lists and process them from the appliance desktop.
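As a rough illustration of what a tweet CSV dump involves, here is a minimal sketch. It is written in Python rather than Perl, it does not call the Twitter API, and the column names and sample records are made up for the example; the actual routines may differ on every one of these points.

```python
import csv
import io

# Hypothetical column layout; the real dump routines may use other fields.
FIELDS = ["id", "screen_name", "created_at", "text"]


def dump_tweets_csv(tweets, out):
    """Write an iterable of tweet-like dicts to `out` as CSV with a header row.

    Returns the number of data rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for tweet in tweets:
        # Keep only the known columns; missing keys become empty cells.
        writer.writerow({name: tweet.get(name, "") for name in FIELDS})
        count += 1
    return count


# Made-up sample records standing in for data fetched elsewhere.
sample = [
    {"id": 1, "screen_name": "znmeb", "created_at": "2011-05-30", "text": "Hello, world"},
    {"id": 2, "screen_name": "example", "text": 'He said "hi"\nthen left'},
]

buffer = io.StringIO()
rows_written = dump_tweets_csv(sample, buffer)
```

The csv module quotes commas, double quotes, and embedded newlines in tweet text, which is where hand-rolled string joins tend to break.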

 

About Data Journalism Developer Studio


In all the technology news last week, you might have missed this story. I only saw it mentioned on Reuters, not on any of the major technology blogs that I read. As is my usual practice when I see a technology story that matches my interests, I try to locate the original sources and post links on Twitter. So in case you missed those, here they are:

LinkedIn shares were a bubble: academic model | Reuters http://meb.tw/iNiM8R (@znmeb)

Is There a Bubble in LinkedIn's Stock Price? http://meb.tw/loYBD3 [pdf] (@znmeb)

There’s a fair amount of technical detail about the model in the paper cited in my second tweet. If you want even more, the model itself is documented here:

How to Detect an Asset Bubble by Robert Jarrow, Younes Kchia, Philip Protter :: SSRN http://meb.tw/iqvwUQ

So what’s the story here? From “Is There a Bubble in LinkedIn’s Stock Price?”:

It has been well documented in the financial press that a methodology is needed that can identify an asset price bubble in real time. William Dudley, the President of the New York Federal Reserve, in an interview with Planet Money [3] stated “…what I am proposing is that we try to identify bubbles in real time, try to develop tools to address those bubbles, try to use those tools when appropriate to limit the size of those bubbles and, therefore, try to limit the damage when those bubbles burst.”

It is also widely recognized that this is not an easy task. Indeed, in 2009 the Federal Reserve Chairman Ben Bernanke said in Congressional Testimony [1] “It is extraordinarily difficult in real time to know if an asset price is appropriate or not”.

Here’s a link to the William Dudley interview, and one to Bernanke’s testimony.

Professor Jarrow and his colleagues took up the challenge laid down by the Federal Reserve Board. The model they have devised is quite complex, involving stochastic differential equations and reproducing kernel Hilbert spaces. They tested this model on stock price data from “the alleged internet dotcom bubble (and beyond), from 1999 to 2005.” While there will no doubt be much more peer review of the data, model and conclusions, the test shows promise. Moreover, it can be applied to the price of any publicly-traded stock. The test has three possible results:

  1. There’s definitely a bubble.
  2. There’s definitely not a bubble.
  3. No conclusion about a bubble can be drawn from the data.
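The three outcomes can be made concrete with a toy version of the decision rule. In this family of models the verdict depends on how quickly volatility grows as the price gets large: there is a bubble when the integral of x / sigma(x)^2 converges at infinity. The sketch below assumes, purely for illustration, that volatility follows a power law sigma(x) = c * x^alpha, in which case that integral converges exactly when alpha > 1. The function name and the idea of feeding it a confidence interval for alpha are assumptions of this example. It skips the hard part of the real method, which is estimating and extrapolating the volatility function from price data.

```python
def classify_bubble(alpha_low, alpha_high):
    """Three-way verdict from an interval estimate of the volatility exponent.

    Assumes sigma(x) = c * x**alpha for large prices x (an illustrative
    simplification). Under that assumption the integral of x / sigma(x)**2
    converges at infinity exactly when alpha > 1.
    """
    if alpha_low > alpha_high:
        raise ValueError("alpha_low must not exceed alpha_high")
    if alpha_low > 1.0:
        # The whole interval lies above 1: the integral converges.
        return "bubble"
    if alpha_high <= 1.0:
        # The whole interval lies at or below 1: the integral diverges.
        return "no bubble"
    # The interval straddles 1, so the data cannot decide.
    return "inconclusive"


verdicts = [
    classify_bubble(1.2, 1.6),
    classify_bubble(0.4, 0.9),
    classify_bubble(0.8, 1.3),
]
```

The third outcome arises whenever the estimate is not precise enough to put the exponent on one side of the threshold.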

So now we come to LinkedIn. LinkedIn was publicly traded for the first time on May 19, 2011, using the symbol LNKD. Professor Jarrow and his colleagues obtained real-time price data from Bloomberg for the first four days of trading and applied their model. And their claim is quite definitive:

We have found, definitively, that there is a price bubble!

While the technology is certainly interesting in its own right, at least to data journalists like myself, what are the wider implications of this? First of all, the context of the Dudley interview was the Finance / Insurance / Real Estate (FIRE) sector and the holdings of the Federal Reserve Board in that industry. As we all know, the Great Recession we discuss on a daily basis originated in the FIRE sector.

The context of the model Jarrow et al. have created, on the other hand, is publicly-traded stocks. In particular, the model was initially tested on Internet stocks during a well-documented bubble, and applied to a social media stock within days of its initial public offering. Moreover, the model should work in real time. Given a live data feed and enough computing capacity, it should be possible to monitor data and make investment decisions in real time.

Even though the model is designed for real-time publicly-traded stocks, it should be applicable to any financial time series that satisfies the underlying mathematical assumptions. This includes, for example, prices of shares in the “secondary markets” for companies like Facebook and Twitter. I haven’t attempted to implement the model yet – I’ve been away from computational finance for several years and I’m in the process of coming back up to speed on the methodologies. The core technologies are available in the Data Journalism Developer Studio, however, and if anyone is interested in working on this, send me a tweet @znmeb.

 

I’ve just pushed release 1.0.0 of the Data Journalism Developer Studio into the SUSE Gallery. Changes:

  • The base appliance ships with Mozilla Firefox as the browser rather than Chromium. Chromium is available as an add-on installation script set. This was a difficult decision for me to make, but the version of Chromium in the Open Build Service is 13.0.xxx, which is updated frequently and can be unstable. This is roughly equivalent to Google’s “Canary” build on Windows and Macintosh. Chromium was proving too unstable for regular use, so I replaced it with Firefox.
  • I added CoffeeScript to the install scripts for node.js and NowJS. If you’re a JavaScript developer, I welcome more suggestions for node.js packages.

I’m planning to open the project up to other developers in the near future. Now that the Fundry feature request mechanism is in place, the road map is public. My own plan is to start building user-level documentation. Most of the software in the appliance is well-documented on its own, but there aren’t too many examples of application-level usage that I’ve been able to find.


 

I’m really conflicted about this. On the one hand, I know Twitter needs to sell advertising, and web services need to promote themselves. And yes, this is a real news event, not a manufactured story. But I wonder – are we heading back to the days of “Yellow Journalism” in the tweet stream? Please comment below.

 

According to Mashable, “Kraft Looks to Reward Twitter Users Who Tweet About Mac & Cheese”:

Under a new program quietly rolled out over the past few weeks, any time two people individually use the phrase “mac & cheese” in a tweet, they’ll each get a link pointing out the “Mac & Jinx.” The first one to click the link and give Kraft his or her address gets five free boxes of Kraft’s mac and cheese and a T-shirt.

It seems that “Mac & Cheese” is now a Trending Topic, as of 2011-03-08 19:12 UTC. But when you click on the topic, you see this Promoted Tweet:

What could be worse? Alyssa Milano, who has 1,403,372 followers, posted this tweet:

This could get interesting. 

Update: it has gotten interesting. @WootLive has gotten into the act.

Update: FriendsEAT has tweeted about capacity issues stemming from their article.

Oh, by the way – the Kraft campaign that started this whole thing is being run by Crispin Porter + Bogusky. Does that name sound familiar? It’s the same agency that came up with the Groupon Super Bowl ads about Tibet and seafood curry.

 

Update 2011-03-20

For a variety of reasons, I have replaced the Social Media Analytics Research Toolkit, Code Like A Pirate and Project Kipling with a new, modular appliance called the Data Journalism Developer Studio. All of the software found in those three appliances can be installed via scripts provided in the new appliance. Links:


Upon careful reading of Twitter’s API Terms of Service, I have decided to temporarily remove two appliances from the SUSE Studio Gallery. Those two appliances are the Social Media Analytics Research Toolkit (SMART@znmeb) and Project Kipling Real-Time Data Journalism Tools. I do intend to put them back on line at some point in the future, but I do not at this time know when they will be back, because I haven’t determined the scope of required changes to the appliances or their marketing materials. Why? These two appliances may be in violation of item 4.A. below:

4. You will not attempt or encourage others to:

A. sell, rent, lease, sublicense, redistribute, or syndicate the Twitter API or Twitter Content to any third party for such party to develop additional products or services without prior written approval from Twitter;

B. remove or alter any proprietary notices or marks on the Twitter API or Twitter Content;

C. use or access the Twitter API for purposes of monitoring the availability, performance, or functionality of any of Twitter’s products and services or for any other benchmarking or competitive purposes; or

D. use Twitter Marks as part of the name of your company or Service, or in any product, service, or logos created by you. You may not use Twitter Marks in a manner that creates a sense of endorsement, sponsorship, or false association with Twitter. All use of Twitter Marks, and all goodwill arising out of such use, will inure to Twitter’s benefit.

E. use or access the Twitter API to aggregate, cache (except as part of a Tweet), or store place and other geographic location information contained in Twitter Content.

While I don’t encourage people to redistribute Twitter data, the appliances do have the ability to collect Twitter data and I can’t prevent their users from redistributing it. I want to emphasize that Twitter has not asked me to take these appliances down! I don’t know that they violate the letter of item 4.A., but I think they violate the spirit of that clause, so I am removing them until I can determine in what form they are viable products.

 

Update 2011-02-08: Twitter blogs about the Al Jazeera campaign.

Robin Sloan (@robinsloan) of Twitter has written a blog post detailing the Al Jazeera campaign. He confirmed that Al Jazeera is in fact watching the keywords and promoting tweets if the keywords become trending topics. Robin has some very nice graphics on the blog post showing the spikes in tweets per hour around the promotions, although they only show the tweet rate spikes, not the Promoted Tweet insertion points, the keywords, or any of the other detailed tracking that their analytics platform is capable of providing.

Speaking of the analytics platform, a little more detail about the underlying mechanisms surfaced last week on SlideShare. Kevin Weil, Twitter’s head of analytics, posted this presentation on Rainbird. It’s an interesting approach – a patch to the open-source Cassandra database to allow hierarchical counting.


Update 2011-02-07: Yet another Promoted Tweet using “Egypt” to get attention on Twitter. This time it’s “Trade King”, an online brokerage.

When does it end? After last night’s disgraceful Groupon commercials on the Super Bowl and last week’s Kenneth Cole tweet, I’m beginning to think there’s not much human misery left that someone won’t try to use to hawk their wares. Twitter, I think it’s time you planted a stake in the ground and said, “There are some search keywords we will not allow in Promoted Tweets. Egypt is the first, and there will be others.”


Update 2011-02-05: The Committee to Protect Journalists (CPJ, tweeting as @pressfreedom) has purchased a Promoted Tweet.


Update 2011-02-04: Al Jazeera has now purchased a Promoted Trend hashtag “#demandaljazeera”!

I see the story is being picked up now from various blogs. It’s going to be an interesting weekend, with the events in Egypt competing for our attention with Super Bowl XLV.

Twitter’s Promoted Trends clearly are a winner. Just in case nobody has reminded you of this lately, Al Jazeera and Twitter are both businesses. Twitter, the business, sold Al Jazeera, the business, advertising, just as it has sold advertising to Audi, Google and others recently. And the cable companies that Al Jazeera wants to carry its content are businesses, too. This is about money, pure and simple. This is about closing sales.


If you’re following the events unfolding in Egypt, I’m sure you’ve heard the major news stories, including the attempts by the Egyptian government to shut down cell phone and Internet communications. And I’m also sure by now you’ve heard of Al Jazeera English, which is, as the name suggests, the English-language service of Al Jazeera. For some background on Al Jazeera English, you can read these stories in the New York Times:

As the Times notes, Al Jazeera English’s images and stories are getting through, even though governments may be attempting to block them. But in addition, Al Jazeera English is actively using Twitter’s advertising mechanisms – Promoted Accounts and Promoted Tweets – to build a following on Twitter and market itself! Al Jazeera English has purchased Promoted Tweets on the major hashtags – #Egypt, #jan25, #Mubarak, #egipto – and other searches such as “Egypt” and “Mubarak” and even the names of some of the new cabinet ministers.

The following screen shots are typical.


What’s even more interesting is that Al Jazeera English has purchased a Promoted Account, which means it sometimes shows up at the top of “Who To Follow”:


I haven’t seen a Promoted Trend yet – perhaps Al Jazeera English marketing thought that would be tacky. And I’m not sure what to make of all of this, given Twitter’s blog post, “The Tweets Must Flow”. Perhaps better journalists than I will step forward and provide me with some clues in the comments. Meanwhile, follow the money.

Oh, yeah – while we’re on the subject – Yellow journalism: From Wikipedia, the free encyclopedia

© 2011 Borasky Research Journal Suffusion theme by Sayontan Sinha