
Ed Borasky

 

Data Journalism Developer Studio 2012LX Blog


By now, you’ve probably seen the reactions to Apple’s “education event” yesterday. My take is that it was 100% Apple marketing and zero “disrupting education.” It was all about selling overpriced tablets to schools that are struggling to keep teachers on the payroll. It was all about forcing authors to buy new Macintosh machines or upgrade existing ones to Mac OS X “Lion”. And it was about a restrictive EULA for authors.

Textbooks should be free! That’s one way to disrupt education. And CK12.org provides Science, Technology, Engineering and Mathematics (STEM) textbooks for free. These are textbooks developed by educators, not marketers. They work on iPads, Kindles and PDF readers, or you can read them online in your browser. There are authoring tools on the web site as well. The current CK-12 FlexBooks Library lists 38 mathematics textbooks, 34 in science and 20 in other subjects. Some have both student and teacher editions. Once you have an account, you can access the authoring and reformatting tools. I highly recommend creating one even if you only want to read or teach from the books.

And education software should be free! The most comprehensive collection of free educational software I’ve found is openSUSE Linux for Education (openSUSE:Education-Li-f-e). This is a LiveDVD that will boot on most PC-based hardware with at least 1 GB of RAM. You don’t even need a hard drive: since it’s a LiveDVD, Li-f-e doesn’t touch the hard drive unless you explicitly tell it to. If you want, you can copy the DVD to a USB drive and boot from that. The directions for that are here.

Li-f-e is an absolutely stunning collection of software. It has the openSUSE 12.1 32-bit Linux operating system, the GNOME 3, KDE 4 and ultra-light IceWM desktops, desktop / productivity software, and a comprehensive collection of educational software for students ranging from pre-school all the way up into graduate school. It also has a complete Linux / Apache / MySQL / PHP (LAMP) server stack, a Linux Terminal Server Project (LTSP) server stack and a complete suite of professional software and web development tools. And the Scratch tools for teaching kids to program are there.

Given that these free tools exist, and have been around since well before the iPad, I don’t see how Apple marketing can claim to be disrupting education. There’s real disruption if you know where to look.

 

Download Data Journalism Developer Studio 2012 From SUSE Gallery


Update 2012-01-15 – In curating a story on Sentiment Analysis and the 2012 Election, I discovered this blog post by Laurent Luce on Twitter sentiment analysis using Python and NLTK, the Natural Language Toolkit. NLTK is not in the base appliance, but you can install it with the following commands:

> cd /home/studio/Install-Scripts/Python-NLTK
> ./cleanup.bash
> ./install-dependencies.bash
> ./install.bash
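Once NLTK is installed, its `NaiveBayesClassifier` is a common starting point for tweet sentiment. To show roughly what a classifier like that does, here’s a minimal sketch of a multinomial naive Bayes classifier in plain Python, with no NLTK required. It’s my own illustration, not Laurent Luce’s code or NLTK’s implementation; the four training “tweets”, the tokenizer rule and the function names are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case, keep runs of letters/apostrophes, drop words under 3 chars."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if len(w) >= 3]


def train(examples):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of words seen with that label
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, label_docs in label_counts.items():
        counts = word_counts[label]
        n_words = sum(counts.values())
        score = math.log(label_docs / n_docs)  # log prior
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothing keeps unseen word/label pairs nonzero
                score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data, not a real labeled corpus.
training = [
    ("I love this candidate, great speech", "positive"),
    ("What a great debate performance", "positive"),
    ("Awful answers, I hate this policy", "negative"),
    ("Terrible debate, awful night", "negative"),
]
model = train(training)
print(classify(model, "great speech tonight"))    # positive
print(classify(model, "awful policy, terrible"))  # negative
```

Scores are summed in log space so that many small probabilities don’t underflow to zero on longer texts.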


Update 2012-01-14 – I added the ‘textir’, ‘tm.webmining’ and ‘tm.sentiment’ library packages to the base appliance in version 2.2.0. So there’s no need to install anything in the base appliance if you want to do sentiment analysis. A good overview of sentiment analysis can be found at Sentiment Analysis and Subjectivity.


If you’ve been following the history of the Data Journalism Developer Studio, you know that it evolved from three previous appliances. Those appliances have been discontinued, but the software in them for the most part lives on in the current one. I’ve been seeing quite a bit of search traffic to my blog coming from the “sentiment analysis” keyword, so I’m posting this mini-guide to getting started.

Sentiment analysis in Data Journalism Developer Studio 2012 is done using the textir R library package. Textir is a “set of tools for inference about text and associated speaker/document sentiment,” created by Assistant Professor of Econometrics and Statistics and Robert L. Graves Faculty Fellow Matt Taddy of the University of Chicago Booth School of Business.

If you’re interested in the mathematics behind this package, Professor Taddy has posted a paper to arXiv.org titled “Inverse Regression for Analysis of Sentiment in Text.” It describes three sample problems and their solutions: ideology in political speeches, online restaurant reviews, and business news and stock performance. The political speech, restaurant review and business news datasets are included with the library. See also On Estimation and Selection for Topic Models.

The easiest way to get this package is to install it via RStudio. Start up RStudio and select the “Packages” tab in the lower right quadrant. Then press the “Install Packages” button. Type “textir” in the middle line on the form and press “Install”.

 



Last week, two stories broke about vendors showing off their sentiment analysis tools on social media messages about the 2012 election. The “smaller” story is about Twitter “predicting” the results of the New Hampshire primary. The “larger” story is about Facebook making a deal with Politico to share public and private data about the GOP candidates.

As you can imagine, this topic is of extreme interest to me, and I’ve taken two steps in researching this story:

  1. I’ve put the CRAN sentiment analysis library packages ‘textir’ and ‘tm.sentiment’ into the base Data Journalism Developer Studio 2012 appliance, so you can experiment with them in the comfort and safety of your own home, without having to buy any software.
  2. I’ve started curating news and technology articles on the topic at Scoop.it: Sentiment Analysis and the 2012 Election.

I’m not sure how long this is going to be an active news story. The ACLU has weighed in on the Facebook – Politico deal, but in the larger context of SOPA, it may get lost in the shuffle.

Research papers on sentiment analysis:

 

Data Journalism Developer Studio 2012 Overview


Data Journalism Developer Studio 2012 on Github

Data Journalism Developer Studio 2012 Blog


I’ve just released Data Journalism Developer Studio 2012. This is a major refactoring of the code base. The major user-visible changes are:

  1. I’ve removed RStudio Server for the time being. It was redundant for most users, and removing it freed up over 100 MB on the released appliances. I do plan to put an installer script for it on the appliance at a later date.
  2. Given the availability of a big chunk of space, I was able to move some frequently used packages out of the options and into the released appliance. They are:
    1. The R Commander GUI. This turns R into a spreadsheet-like user interface. I’ve included the Text Mining plugin as well.
    2. Google Refine. This is another spreadsheet-like tool for working with messy data. The Tesseract Optical Character Recognition package is also included.
    3. Maqetta. This is a WYSIWYG HTML5 user interface builder based on the Dojo JavaScript libraries.
    4. The Perl utilities are back in the main appliance.
  3. I’ve reorganized the install scripts slightly. The BARD redistricting mapping tool is now part of the Spatial task view, and the “beancounter” financial database tool is now part of the Finance task view.

There’s more on the road map for the next few weeks. I’ve been testing the Octopress lightweight blogging platform. It’s quite technical (it’s billed as a blogging platform for hackers, and that’s a pretty good description), but it’s very lightweight, and it works with GitHub for painless deployment and version control. There will be a sample blog for the Data Journalism Developer Studio 2012 up on GitHub in a day or so.

Now that the Twitter Perl libraries are back in the main appliance, I’ll be putting my Twitter user and tweet CSV dump routines on the appliance. That way, you’ll be able to acquire tweets or user lists and process them from the appliance desktop.

 



I was on the webinar that introduced this book, along with thousands of others. Over 30,000 registered, and the count of people who attended was 10,899. If you had a Kindle, you could download this book for free. I did. In fact, if you’re an Amazon Prime member, you can still get the Kindle edition for free. Dan Zarrella has been collecting Twitter and Facebook data for some time now, and this book is the result of a careful study of what works, what doesn’t work, and what sometimes works. It’s an easy read and very useful.


Pulse is along the lines of the previous book, but is much more detailed, and talks about more data sources. Moreover, it’s research-oriented – how to do the sort of thing Dan Zarrella did to define his hierarchy of contagiousness.

Pulse covers Google search trends, Facebook connections, blogs and Twitter in some detail. The themes are “what we surf, who we friend and what we say.” Chapter 8 describes three “potential pulses”: “where we go, what we buy and how we play.” You should think of this book as an overview; you’ll need to dig deeper if you want to implement any of the research described.


One of the recurring themes in the past two years has been the so-called “lean startup”. I have some skepticism about the concept, particularly the way it’s been described in blogs. So it’s refreshing to see a book like Venture Deals come out that’s full of actual meat, not just admonitions to “fail faster.” The authors were here in Portland a few weeks ago for a well-attended lecture on the contents, and I have a signed copy. If you’re starting a business, you need to read this book first.


I’ve covered this book at some length here, so I’ll defer you to the previous review. While it’s very advanced technically, it’s the best book published this year on trading technology. Most of the year’s other trading books are rehashes of decades-old technical analysis methods that may or may not work any more. If you want to be a trader, I highly recommend getting this book.


We heard a lot about economic inequality this year, and we’ll hear a lot more in 2012. Whether it’s Occupy Wall Street, calls for higher taxes on the wealthy by Warren Buffett, or President Obama’s speech in Osawatomie, Kansas, economic inequality and how to reverse it have become a topic of interest.

Changing Inequality is an easy-to-read book on the subject. It traces the causes of economic inequality over the past three decades, and suggests a few possible ways to reverse the trend. It should be noted, however, that reversing a 30-year trend of rising inequality isn’t easy. For example, as noted in the book,

In the late 1980s, when it first became clear that rapid increases in inequality were more than a short-term or cyclical phenomenon, researchers began to look for causes. It was almost a decade before widespread consensus was reached among economists that these changes were largely driven by skill-biased increases in demand, many of them probably the result of technological changes linked to a growing use of computer technologies.

The challenge for policy-makers is to devise policies that promote both growth and equality. Changing Inequality is a good place to start.

 



The world of computational finance has changed dramatically since I first got interested in the underlying mathematics in 1982. We’ve seen events like the stock market crashes in 1987 and 1989, the failure of Long Term Capital Management in 1998, and more recently, the collapse of Lehman Brothers in September 2008 and the “Flash Crash” in May of 2010.

I’ve spent a fair amount of time over the past year catching up on the theory and practice of algorithmic trading. The following three books are the best I’ve found on the subject. Having made my way through them, I consider traditional technical analysis at best useless and at worst downright suicidal. They are expensive; if you can only afford one of them, I’d recommend the second, Asset Price Dynamics, Volatility, and Prediction by Stephen J. Taylor.


 

Financial Markets and Trading is the newest of these books, and is also the most expensive. It’s designed as a textbook at the undergraduate / graduate level and is fairly self-contained. Schmidt does cover a lot of ground, however, and for implementation details you’ll probably need to search out the original papers on the Internet.

Two things make this book unique:

  • An extended section on high-frequency trading, including an overview of the May 2010 “Flash Crash”, and
  • A comprehensive chapter on testing technical trading rules.

These testing techniques go well beyond the traditional backtesting / optimization techniques that are well known among traders. As this book and its references show, technical analysis sometimes works and sometimes doesn’t. You’ll need these algorithms to tell the difference.
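To make “testing a technical trading rule” a bit more concrete, here’s a minimal sketch of one such test, a permutation test: measure the average next-day return earned while a simple moving-average rule says “long”, then see how often randomly shuffled copies of the same returns do at least as well. The rule, the 20-day window, the synthetic data and the function names are my own assumptions for illustration; this is not Schmidt’s specific procedure.

```python
import random


def ma_rule_mean_return(returns, window=20):
    """Mean next-period return on days the price closes above its moving average."""
    prices = [100.0]  # arbitrary starting level; returns[t] takes price t to t+1
    for r in returns:
        prices.append(prices[-1] * (1.0 + r))
    picked = []
    for t in range(window - 1, len(returns)):
        moving_avg = sum(prices[t - window + 1 : t + 1]) / window
        if prices[t] > moving_avg:      # signal uses prices up to t only
            picked.append(returns[t])   # ...and earns the following return
    return sum(picked) / len(picked) if picked else 0.0


def permutation_test(returns, n_perm=200, window=20, seed=0):
    """One-sided p-value: how often shuffled returns do at least as well."""
    rng = random.Random(seed)
    observed = ma_rule_mean_return(returns, window)
    shuffled = list(returns)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys any serial dependence the rule exploits
        if ma_rule_mean_return(shuffled, window) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_perm + 1)


# Synthetic, deliberately trending series: 50 days up 1%, 50 days down 1%, x6.
trending = ([0.01] * 50 + [-0.01] * 50) * 6
observed, p_value = permutation_test(trending)
print(f"mean return when long: {observed:.4f}, p-value: {p_value:.3f}")
```

Shuffling keeps the distribution of daily returns but scrambles their order, so a rule that only profits from order should look unremarkable on the shuffled copies. On real price data you’d also want to account for trading costs and for how many rules you tried.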


 

 

As I noted above, if you can only afford one of these books, this is the one to get. Unique features include:

  • Spreadsheet formulas for many of the algorithms,
  • Algorithms for extracting information from high-frequency data, and
  • Implied return density calculations from option prices.

There are also some algorithms for testing technical trading rules, but I think Schmidt’s treatment of the subject is far more comprehensive.
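The implied return density item is worth a concrete example. One standard technique is the Breeden-Litzenberger relation: the risk-neutral density of the terminal price at strike K is q(K) = e^{rT} ∂²C/∂K², the discounted-back second derivative of the call price with respect to strike. Here’s a minimal finite-difference sketch that checks this on Black-Scholes prices standing in for market quotes; the parameter values and function names are assumptions, and I’m not claiming this is Taylor’s exact procedure.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, vol, t):
    """Black-Scholes price of a European call on a non-dividend-paying asset."""
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t) / (vol * math.sqrt(t))
    d2 = d1 - vol * math.sqrt(t)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * t) * norm_cdf(d2)


def implied_density(strikes, calls, rate, t):
    """Breeden-Litzenberger: q(K) = exp(rT) * d2C/dK2, by central differences.

    Assumes an evenly spaced strike grid; returns the interior strikes and q."""
    h = strikes[1] - strikes[0]
    q = [
        math.exp(rate * t) * (calls[i - 1] - 2.0 * calls[i] + calls[i + 1]) / h ** 2
        for i in range(1, len(strikes) - 1)
    ]
    return strikes[1:-1], q


# Black-Scholes prices stand in for market quotes (hypothetical parameters).
spot, rate, vol, t = 100.0, 0.02, 0.25, 0.5
strikes = [20.0 + 0.5 * i for i in range(401)]  # 20.0 to 220.0
calls = [bs_call(spot, k, rate, vol, t) for k in strikes]
inner, q = implied_density(strikes, calls, rate, t)

h = strikes[1] - strikes[0]
total_mass = sum(q) * h                                 # should be close to 1
mean_price = sum(k * d for k, d in zip(inner, q)) * h   # close to spot * exp(rT)
print(f"total probability {total_mass:.4f}, mean terminal price {mean_price:.2f}")
```

With real quotes, strikes are unevenly spaced and prices are noisy, so in practice people usually smooth or fit the implied volatility curve before differentiating.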


 

This is the oldest book of the three, and probably the most theoretical. However, it provides much more detail on market microstructure models than the other two, and it includes a chapter on order execution timing strategies.
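For readers who haven’t met a market microstructure model before, here’s a small example of one: Roll’s 1984 estimator, which backs the effective bid-ask spread out of the negative autocovariance that bid-ask bounce puts into successive trade-price changes, spread = 2·sqrt(−cov(Δp_t, Δp_{t−1})). This is a minimal sketch on simulated data; the random-walk “efficient” price, its volatility, the 0.10 spread and the function name are assumptions, and I’m not claiming this is how the book presents the model.

```python
import math
import random


def roll_spread(prices):
    """Roll (1984): spread = 2 * sqrt(-cov(dp_t, dp_{t-1}))."""
    dp = [b - a for a, b in zip(prices, prices[1:])]
    x, y = dp[1:], dp[:-1]  # each price change paired with the previous one
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    if cov >= 0:
        return float("nan")  # the model needs negative autocovariance
    return 2.0 * math.sqrt(-cov)


# Hypothetical simulation: the efficient price follows a random walk and each
# trade prints at the bid or the ask with equal probability.
rng = random.Random(42)
true_spread = 0.10
mid = 100.0
trades = []
for _ in range(100_000):
    mid += rng.gauss(0.0, 0.02)
    trades.append(mid + rng.choice((-1, 1)) * true_spread / 2)

estimate = roll_spread(trades)
print(f"true spread {true_spread:.3f}, Roll estimate {estimate:.3f}")
```

On real tick data the sample autocovariance is sometimes positive, in which case the estimator is undefined; the sketch returns NaN rather than guessing.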


 

I’ve collected some resources on income and wealth inequality. I’m just now digging into the mathematics, so this list will no doubt grow.




  • America’s Growing Income Gap, by the Numbers (ProPublica): http://t.co/jYuKrNt5
  • Income Inequality Near You (ProPublica): http://t.co/D8CpQBHQ
  • Changing Inequality, by Rebecca M. Blank: http://t.co/1u0hYCMJ
  • Wealth inequality in the United States (Wikipedia): http://t.co/3YIvAfEq
  • Economic inequality (Wikipedia): http://t.co/a2XINHIb
  • Income inequality metrics (Wikipedia): http://t.co/155c9cGd
  • Gini coefficient (Wikipedia): http://t.co/ygZqppcj
  • Lorenz curve (Wikipedia): http://t.co/E7OjIiwC
  • Pareto distribution (Wikipedia): http://t.co/pahxtTiK
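Since the Gini coefficient, the Lorenz curve and the Pareto distribution all appear in the reading list, here’s a minimal sketch tying them together: compute the Gini coefficient as one minus twice the area under the Lorenz curve, then check it against the known closed form for a Pareto distribution with tail index α > 1, G = 1/(2α − 1). The simulated incomes are an assumption for illustration, not real data, and the function names are my own.

```python
import random


def lorenz_curve(incomes):
    """Points (population share, income share), poorest first, from (0, 0)."""
    xs = sorted(incomes)
    total, n = sum(xs), len(xs)
    points, running = [(0.0, 0.0)], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(incomes):
    """Gini = 1 - 2 * (area under the Lorenz curve), area by the trapezoid rule."""
    pts = lorenz_curve(incomes)
    area = sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return 1.0 - 2.0 * area


print(gini([5, 5, 5, 5]))  # 0.0: perfect equality
print(gini([0, 0, 0, 1]))  # 0.75: one person has everything, (n - 1) / n

# Simulated Pareto incomes (x_min = 1, tail index alpha); theory says G = 1 / (2*alpha - 1).
rng = random.Random(1)
alpha = 3.0
incomes = [rng.paretovariate(alpha) for _ in range(100_000)]
print(f"sample Gini {gini(incomes):.3f} vs. theoretical {1 / (2 * alpha - 1):.3f}")
```

The trapezoid rule on the discrete Lorenz curve gives the same value as the usual sorted-income formula, so either route works for survey data.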
 

 

Yet it seems to me that both inside the administration and outside of it there’s a shortage of turning to economists with specific expertise in recessions for advice on coping with the recession. These things are rare events. Buffett’s a smart guy and an old guy, but the USA has never been in this situation throughout the entirety of even his business career.

via The President Should Call Some Economists | ThinkProgress.

There are thousands of government economists in Washington and elsewhere, many of them noted PhDs. They report to the President via several Cabinet departments. They work in the Federal Reserve Board. They work for Congress as staffers or in the Congressional Budget Office. We pay their salaries via taxes, just like we pay our elected officials. In short, the President has “called some economists”. So have Congress and the Federal Reserve Board.

The media can and should hold these economists accountable, just as they do elected officials and corporate leaders like Buffett and Mulally. And it turns out that this is easy to do. The Departments of Treasury, Commerce and Labor all have excellent web sites with data, analysis and research papers. The Federal Reserve Board has an excellent web site with more data, analysis and research papers. So does the Congressional Budget Office. There is absolutely, positively no shortage of turning to economists with specific expertise in recessions for advice on coping with the recession.

Mr. Yglesias, it’s your job to advocate for the ThinkProgress agenda. But it’s the President’s job to synthesize the solutions from these thousands of economists and sell them to the American people. It also is his job to sell them to a collection of elected GOP local, state and Federal officials who want to see him voted out of office in November 2012. Recruiting industry leaders like Warren Buffett and Alan Mulally for advice, validation, salesmanship — whatever they can offer to help Washington restore sustainable economic growth — is a damn good idea.

 

Updated 2011-07-31:

1. I’ve turned off One True Fan, most likely permanently, because it was confusing some Highlighter users.

2. I’ve turned off Disqus and WordPress commenting as well, though this might be temporary. For the moment, I want to test out Highlighter as the main/only method of discussion.

3. I haven’t found a list of all sites using Highlighter yet, but I can recommend two run by friends of mine, Audrey Watters (@audreywatters) and Michelle Rae Anderson (@mediaChick):


At O’Reilly’s miniTOC Portland conference yesterday (hashtag #TOCPDX), Josh Mullineaux of Highlighter.com presented a brief overview of a new tool for websites, called Highlighter. I’ve enabled Highlighter on this site. There’s a video on the Highlighter home page, but here’s how it works:

1. Highlight any text or image on the site with the mouse / trackpad. You will get a three-option menu: “Save This”, “Share It” and “Comment”.

2. The “Save This” option saves the highlighted content in your Highlighter profile. The “Share It” option lets you share the highlighted content on Facebook, Twitter, or in an email.

3. The “Comment” option is a little more interesting, and I think this is the awesome part of Highlighter. You can comment on the highlighted content, and your comment is sent to the website owner, me in this case, for moderation. If the owner approves, the comment is posted and any Highlighter subscriber can see it and join in the discussion.

There’s a good bit more to this:

  • Highlighter subscribers can follow each other’s streams, just like on Twitter or Facebook. You can think of it as a social network for publishers and their readers. You can join Highlighter here: http://highlighter.com/register/
  • For the publisher, there are detailed analytics about how your readers are engaging with your site.
  • Subscribers have a profile page, which I’ve linked to my Twitter and LinkedIn profiles. Mine is http://highlighter.com/znmeb/
Installation is simple; in my case, I just installed a WordPress plugin. But any web site where you can add JavaScript to the page footer can use Highlighter. And even if you aren’t a publisher, you can subscribe to Highlighter and join in the discussion. I love it!
© 2011 Borasky Research Journal