

Data Journalism Developer Studio 2012LX

 

Download Data Journalism Developer Studio 2012 From SUSE Gallery


Update 2012-01-15 – While curating a story on Sentiment Analysis and the 2012 Election, I discovered this blog post by Laurent Luce on Twitter sentiment analysis using Python and NLTK, the Natural Language Toolkit. NLTK is not in the base appliance, but you can install it with the following commands:

> cd /home/studio/Install-Scripts/Python-NLTK
> ./cleanup.bash
> ./install-dependencies.bash
> ./install.bash


Update 2012-01-14 – I added the ‘textir’, ‘tm.webmining’ and ‘tm.sentiment’ library packages to the base appliance in version 2.2.0. So there’s no need to install anything in the base appliance if you want to do sentiment analysis. A good overview of sentiment analysis can be found at Sentiment Analysis and Subjectivity.


If you’ve been following the history of the Data Journalism Developer Studio, you know that it evolved from three previous appliances. Those appliances have been discontinued, but the software in them for the most part lives on in the current one. I’ve been seeing quite a bit of search traffic to my blog coming from the “sentiment analysis” keyword, so I’m posting this mini-guide to getting started.

Sentiment analysis in Data Journalism Developer Studio 2012 is done using the textir R library package. Textir is a “set of tools for inference about text and associated speaker/document sentiment,” created by Assistant Professor of Econometrics and Statistics and Robert L. Graves Faculty Fellow Matt Taddy of the University of Chicago Booth School of Business.

If you’re interested in the mathematics behind this package, Professor Taddy has posted a paper to arXiv.org titled “Inverse Regression for Analysis of Sentiment in Text.” The paper describes three sample problems and their solutions: ideology in political speeches, on-line restaurant reviews, and business news and stock performance. The political speech, restaurant review and business news datasets are included with the library. See also On Estimation and Selection for Topic Models.

The easiest way to get this package is to install it via RStudio. Start up RStudio and select the “Packages” tab in the lower-right pane. Then press the “Install Packages” button. Type “textir” in the middle line on the form and press “Install”.
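If you’d rather work from a terminal than from the RStudio form, the same install can be sketched as below. This is a minimal sketch, not something the appliance prescribes: it assumes Rscript is on the PATH and that the machine can reach a CRAN mirror, and the mirror URL is my own choice for illustration.

```shell
# Assumption: Rscript is on the PATH and a CRAN mirror is reachable.
# The mirror URL is an illustrative choice, not an appliance default.
CRAN_MIRROR="https://cloud.r-project.org"

if command -v Rscript >/dev/null 2>&1; then
  # Install textir from CRAN, then try loading it to confirm the install.
  Rscript -e "install.packages('textir', repos = '${CRAN_MIRROR}')"
  Rscript -e "library(textir)" || echo "textir did not load; check the install output." >&2
else
  echo "Rscript not found; use the RStudio Packages tab instead." >&2
fi
```

If the package loads without complaint, it is ready to use from either the R console or RStudio.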

 



Last week, two stories broke about vendors showing off their sentiment analysis tools on social media messages about the 2012 election. The “smaller” story is about Twitter “predicting” the results of the New Hampshire primary. The “larger” story is about Facebook making a deal with Politico to share public and private data about the GOP candidates.

As you can imagine, this topic is of extreme interest to me, and I’ve taken two steps in researching this story.

  1. I’ve put the CRAN sentiment analysis library packages ‘textir’ and ‘tm.sentiment’ into the base Data Journalism Developer Studio 2012 appliance, so you can experiment with this in the comfort and safety of your own home, without having to buy any software.
  2. I’ve started curating news and technology articles on the topic at Scoop.it: Sentiment Analysis and the 2012 Election.

I’m not sure how long this is going to be an active news story. The ACLU has weighed in on the Facebook – Politico deal, but in the larger context of SOPA, it may get lost in the shuffle.

Research papers on sentiment analysis:
