Download Data Journalism Developer Studio 2012 From SUSE Gallery
Update 2012-01-15 – In curating a story on Sentiment Analysis and the 2012 Election, I discovered this blog post by Laurent Luce on Twitter sentiment analysis using Python and NLTK, the natural language processing toolkit. The Python NLTK is not in the base appliance, but it can be installed using the following commands:
> cd /home/studio/Install-Scripts/Python-NLTK
> ./cleanup.bash
> ./install-dependencies.bash
> ./install-bash
Update 2012-01-14 – I added the ‘textir’, ‘tm.webmining’ and ‘tm.sentiment’ library packages to the base appliance in version 2.2.0. So there’s no need to install anything in the base appliance if you want to do sentiment analysis. A good overview of sentiment analysis can be found at Sentiment Analysis and Subjectivity.
If you’ve been following the history of the Data Journalism Developer Studio, you know that it evolved from three previous appliances. Those appliances have been discontinued, but the software in them for the most part lives on in the current one. I’ve been seeing quite a bit of search traffic to my blog coming from the “sentiment analysis” keyword, so I’m posting this mini-guide to getting started.
Sentiment analysis in Data Journalism Developer Studio 2012 is done using the textir R library package. Textir is a “set of tools for inference about text and associated speaker/document sentiment,” created by Assistant Professor of Econometrics and Statistics and Robert L. Graves Faculty Fellow Matt Taddy of the University of Chicago Booth School of Business.
If you’re interested in the mathematics behind this package, Professor Taddy has posted a document to Archiv.org, titled “Inverse Regression for Analysis of Sentiment in Text.” Three sample problems and their solutions are described in the paper: ideology in political speeches, on-line restaurant reviews and business news and stock performance. The political speech, restaurant review and business news datasets are included with the library. See also On Estimation and Selection for Topic Models.
The easiest way to get this package is to install it via Rstudio. Start up Rstudio and select the “Packages” tab in the lower right quadrant. Then press the “Install Packages” button. Type “textir” in the middle line on the form and press “Install”.
