znmeb

May 16, 2012
 

I’ve just released Computational Journalism Server 1.2.6. There are two major updates to the functionality.

  1. I’ve added a script to upgrade the server to a desktop. The “install-lxde.bash” script installs a full LXDE desktop. You’ll have the Firefox browser, the Claws email client, the AbiWord word processor, Gnumeric spreadsheet, the Leafpad text editor, the LXTerminal terminal emulator, ePDFViewer and a graphical file manager.
  2. Given a desktop install, I’ve added scripts to download and install the prototype Overview tool for semantic visualization and hierarchical clustering of large document sets. I’ve been experimenting with this for three days now and I think it belongs in every computational journalist’s tool set.

I wrote a bit about Overview two days ago. The Overview team has an ambitious road map and I’m pretty confident their approach will help working journalists make sense of the volumes of text available. The tool as documented on the Overview web site runs on Windows and Macintosh personal computers with no modifications. In addition to the ability to install Overview in a desktop-enhanced Computational Journalism Server, I’ve provided scripts to install it on any openSUSE 12.1, Fedora 17 or Ubuntu 12.04 desktop. With minor tweaks it should run on older versions or other Linux distributions.

Here’s a sneak peek at the documentation for the new features, derived from https://github.com/znmeb/Computational-Journalism-Server/blob/master/Overview/README.md

Running Overview on the Computational Journalism Server

What’s Overview?

Overview is a tool for semantic visualization and hierarchical clustering of large document sets. Jonathan Stray of the Associated Press leads the development, with funding provided by a Knight Foundation grant.

The main project page is at http://overview.ap.org/. It’s an open source project and its repositories are on Github at https://github.com/overview. And they have a Twitter account: @overviewproject.

If you want to run the prototype of Overview on a Windows or Macintosh personal computer, the instructions are here: Getting Started with the Overview Prototype. If you want to run Overview on the Computational Journalism Server or a Linux Desktop, read on!

Running the Overview Prototype on the Computational Journalism Server

  1. You’ll need to download and install the Computational Journalism Server first. I recommend doing the “install-all.bash” full install rather than just installing the base appliance.
  2. Next, you’ll need to install the LXDE desktop. As “root”, do
    # cd /opt/Computational-Journalism-Server
    # ./install-lxde.bash

    When the script asks if you want to trust the repository, answer “a” for “always”.

  3. After the LXDE desktop repositories, patterns and packages are installed, you’ll be sent to a YaST2 session to change the default run level. Enter “Expert Mode”. Tab to the “Set default runlevel after booting to:” field and select “5: Full multiuser with network and display manager”. Then tab to “OK” and press “Enter”.
  4. Reboot and log in on the console as the non-root user you created when you installed the appliance. Select “LXDE” in the “Desktop” pulldown menu. You should be in the LXDE desktop. The “install-lxde.bash” script installs a full LXDE desktop. You’ll have the Firefox browser, the Claws email client, the AbiWord word processor, Gnumeric spreadsheet, the Leafpad text editor, the LXTerminal terminal emulator, ePDFViewer and a graphical file manager.
  5. Open an “LX Terminal”. The menu button is in the lower left. Start the menu and select “System Tools -> LXTerminal”.
  6. In the terminal, type
    $ cd /opt/Computational-Journalism-Server
    $ cp -a Overview/ ~

    This creates a copy in your home directory where you can write.

  7. Type
    $ cd ~/Overview
    $ ./install-overview-openSUSE.bash

    This takes quite a while, because the script downloads and compiles the required Ruby 1.9.3 from source.

  8. Type
    $ ./test-overview.bash

    This will run all the test cases. The “caracas” example takes quite a bit of time in the Ruby NLP step, but the others run fairly quickly on my 8 GB dual-core laptop.
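
Steps 6 through 8 can be chained in one shell function, so that a failed install stops the run instead of going on to the tests. This is only a sketch and is not part of the repository: the function name `setup_overview` and its two optional arguments are hypothetical, while the paths and script names are the ones from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the server): chains steps 6-8.
# Arguments, both optional and introduced here for illustration only:
#   $1 - directory holding the server checkout
#        (default: /opt/Computational-Journalism-Server)
#   $2 - writable directory to copy Overview into (default: $HOME)
setup_overview() (
  # The body runs in a subshell, so the caller's working directory
  # is left untouched by the "cd" below.
  src="${1:-/opt/Computational-Journalism-Server}"
  dest="${2:-$HOME}"

  # Step 6: make a writable copy of the Overview directory.
  cp -a "$src/Overview" "$dest/" || exit 1

  # Step 7: run the openSUSE installer (downloads and compiles Ruby 1.9.3).
  cd "$dest/Overview" || exit 1
  ./install-overview-openSUSE.bash || exit 1

  # Step 8: run the test cases only if the install succeeded.
  ./test-overview.bash
)

# Usage, as the ordinary (non-root) user in an LXTerminal:
#   setup_overview
```

The function's exit status is that of the last step that ran, so `setup_overview && echo done` only prints after the test cases pass.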

Running the Overview Prototype on Linux Desktop / Laptop

If you already have a Linux desktop, do the following. I’ve tested this on openSUSE 12.1, Ubuntu 12.04 “Precise Pangolin” and Fedora 17 “Beefy Miracle”. It can probably be made to work on older Fedora or Ubuntu desktops with a little tweaking, but I’m not testing it on them. It will get tested on openSUSE 12.2 when the beta comes out.

  1. Install “git”.
  2. In some convenient directory where you have write access, type
    $ git clone http://github.com/znmeb/Computational-Journalism-Server
    $ cd Computational-Journalism-Server/Overview
    $ ./install-overview-<DISTRO>.bash

    where <DISTRO> is Fedora, Ubuntu or openSUSE. Do not run this as “root” – use an ordinary user account!

  3. That will take some time; as noted above, one of the steps is to download and recompile Ruby 1.9.3 from source. When it’s done, type “./test-overview.bash” as above.
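
If you’d rather not pick <DISTRO> by hand, the choice can be sketched as two small shell functions. This is an illustration, not something in the repository: `installer_for` and `detect_distro_id` are hypothetical names, and the detection assumes the distribution ships an `/etc/os-release` file with an `ID` field, which may not hold on older releases. The installer script names are the ones given in step 2.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Computational Journalism Server.

# Map a distribution ID to the installer script name from step 2.
# Returns non-zero for a distribution the scripts do not cover.
installer_for() {
  case "$1" in
    fedora)    echo "install-overview-Fedora.bash" ;;
    ubuntu)    echo "install-overview-Ubuntu.bash" ;;
    opensuse*) echo "install-overview-openSUSE.bash" ;;
    *)         return 1 ;;
  esac
}

# Print the ID field of an os-release style file.
# Assumption: the file exists and defines ID; pass another path to override.
detect_distro_id() {
  local file="${1:-/etc/os-release}"
  [ -r "$file" ] || return 1
  # Source the file in a subshell so its variables do not leak out.
  ( . "$file" && printf '%s\n' "$ID" )
}

# Usage, as an ordinary user inside Computational-Journalism-Server/Overview:
#   script="$(installer_for "$(detect_distro_id)")" && "./$script"
```

When the distribution is not recognized, `installer_for` fails and nothing is run, which is safer than guessing an installer.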
May 15, 2012
 

Surely you’ve seen this: I got one of those emails. As far as I’m concerned it’s spam and I’ll be disabling it. It is not

  • Timely – what good is a summary of last week’s tweets?
  • Relevant – I don’t see any evidence that it tracks the topics I care about. And, of course, if I did care about a topic, I’d be tracking it on Twitter via search, in my RSS feed reader, via search engines, on email lists and even in face-to-face meetings.
  • Personal – It’s a damn email autoresponder, fercryingoutloud! Sure, it knows my name and Twitter handle, just like every other email autoresponder I’ve ever joined.

I think this is a giant leap backwards for Twitter. Email marketing represents everything a lot of us hate about the Internet. It’s annoying and for the most part a waste of the senders’ time as well as the receivers’. I’m on probably a dozen or two email lists / Google Groups relevant to my interests. But I rarely give out my email address any more to, say, download a “free white paper” or some other “content marketing” gizmo.

Twitter has an email list of hundreds of millions of addresses. How long do you suppose it will take phishers to copy the emails, hook up databases of Twitter handles and email addresses and start pumping out fake “best of Twitter” emails? How long do you suppose it will be before advertisers want “Promoted Stories” sent out to this mailing list? And if you’re using GMail to read these emails, well, Google is making advertising dollars on Twitter’s back! What’s up with that?

I haven’t seen many complaints about this so far in the tech blogs. I think the focus is on Facebook’s IPO and Yahoo’s attempting to fire its way to growth. But I think it’s a bad idea, and I’ve unsubscribed.

If I were working at Twitter, I’d go the exact opposite way. Instead of building a “weekly news magazine”, I’d build a breaking real-time world news ticker. Give me a page with a map of the world. Capture an average of tweet rates by geotag, time of day and day of the week in a database. When the tweet rate takes a sharp increase at a location, light it up on the map and give me a link to search for the tweets.
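
The detection step of that idea can be sketched in a few lines of shell and awk. Everything here is assumed for illustration: the function name `find_spikes`, the whitespace-separated file layout (location, hour of day, day of week, tweets per minute) and the “at least three times the average” threshold. It has nothing to do with how Twitter actually stores or serves its data.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "sharp increase" check.
#
# find_spikes BASELINE CURRENT [FACTOR]
#   BASELINE: lines of "location hour weekday average_rate"
#   CURRENT:  lines of "location hour weekday current_rate"
#   FACTOR:   how many times the average counts as a spike (default 3)
# Prints "location current_rate average_rate" for every spike.
find_spikes() {
  awk -v factor="${3:-3}" '
    # First file: remember the average for each (location, hour, weekday) slot.
    NR == FNR { avg[$1 FS $2 FS $3] = $4; next }
    # Second file: report slots whose rate is at least factor times the average.
    {
      key = $1 FS $2 FS $3
      if (key in avg && avg[key] > 0 && $4 >= factor * avg[key])
        print $1, $4, avg[key]
    }
  ' "$1" "$2"
}

# Usage:
#   find_spikes baseline.txt now.txt 3
```

Locations with no baseline are skipped rather than flagged, since there is no average to compare against. The locations it prints are the ones to light up on the map.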

May 14, 2012
 

If you’ve been following my tweet stream, you saw me tweet this:

At $1450 a month for five seats, I think the service is overpriced. Moreover, Twitter, Facebook/Instagram, Google/YouTube or Yahoo/Flickr could easily build this into their web sites and deliver it for free, essentially by-passing two middlemen – Geofeedia and the news organization subscribing to Geofeedia. And a clever RSS / Yahoo! Pipes hacker could build something like this for use in a newsroom. For that matter, if you limit yourself to Twitter you can do most of this with Twitter / Advanced Search.

I must admit that I love the idea and think this could evolve into something game-changing. I wrote about the potential for this back in January 2010!

The Twitter Streaming API — How It Works and Why It’s A Big Deal

To get an idea what this could become, check out Knowledge Discovery from Data Streams by Joao Gama.

Moving on, I don’t know how I’ve managed to be a tech blogger writing about computational journalism without discovering Overview until last week, but it happened. Twitter serendipity at work – I was watching my Interactions page and saw a tweet of mine retweeted by @overviewproject. The Overview project is led by Jonathan Stray. You can see the entire team here.

Overview is open source, lives on Github and appears to be a mix of Ruby and Java. I’m currently testing it out for potential inclusion in one of my computational journalism appliances. It’s a browser / desktop application, so most likely it will end up in the successor to Data Journalism Developer Studio 2012LX. If you want to work with it yourself, the instructions are here.

So which of the two represents the future of journalism? Both, of course! With the proper underlying database and real-time knowledge discovery algorithms, Geofeedia could be a game-changer. But in the long run, as a for-profit service, I think they’ll either get acquired or duplicated by the big players. The Overview project, on the other hand, is an open source project. It’s well-funded by the Knight Foundation and Associated Press, and the team is led by one of the well-known names in computational journalism. Overview is certainly going to be part of my future.

 

May 13, 2012
 

It’s Sunday. It’s Mother’s Day. It’s the 27th anniversary of my arrival in Portland, Oregon. It’s hot. I’m grumpy. I need a laugh. Don’t you?

So … here are my picks for the five funniest movies of all time. I can’t honestly rank them, so I’ll simply give you the list.

Wag the Dog is one of those films that I love for the subtle references to the events of the time. Has it grown stale with time? I don’t think so. Our political process hasn’t changed as far as I can tell. Now, of course, we’d see viral YouTube videos and Twitter Trending Topics, but tails will continue to wag dogs and this movie will always be funny.

The Producers (1968): If I absolutely, positively had to pick the movie that made me laugh longer and harder than any other, this would be it. This was the first of the amazing list of comedies Mel Brooks and his troupe made, and it will always be my favorite.

Peter Sellers was one of the truly great comic talents of the 20th century. I could have picked any of a dozen of his films, but I settled on one of the earlier ones, The Mouse That Roared. This is biting satire at its finest. Few had heard of Sellers when this film was made, but it made him a world-wide star.

No list of comedies would be complete without one from Neil Simon. I’ve seen them all and this one remains my favorite. The Sunshine Boys stars George Burns and Walter Matthau as two aging comedians. They were a famous vaudeville act, but they broke up. Poor Richard Benjamin has to engineer a reunion, but … well … they can’t stand each other.

Finally, here’s a hint – http://www.youtube.com/watch?v=LS75NtlH3gI. But I’m guessing you’ll want to watch the whole thing. Like Sellers, Danny Kaye was a comic genius who appeared in dozens of films. But it’s The Court Jester that stands out in my mind above all the others.

So – dear readers – what are your funniest movies of all time?

May 12, 2012
 

As I’ve noted recently, I’m in the process of migrating the Computational Journalism Server towards a full Platform as a Service offering. To that end, my development environments now run the three major Linux community desktops – Ubuntu, openSUSE and Fedora. I’m writing a bunch of convenience scripts so I can operate them all in a similar manner, and I run all three distros with the GNOME 3 GNOME shell.

This week, I upgraded from the stable Fedora 16 to the beta Fedora 17. For a beta, Fedora 17 is remarkably stable. There were no major issues with either of my machines, unlike Ubuntu 12.04 LTS, which required some video hacking to run on my workstation. The desktop, like previous Fedora desktops, is mostly standard GNOME issue, unlike Ubuntu’s.

I still prefer openSUSE green to Fedora blue, but that’s easily changed. In short, if I wanted to, I could make this my main desktop rather than openSUSE without a major learning curve or intensive customization. I couldn’t do that on Ubuntu, even using Ubuntu’s GNOME 3 GNOME shell.

On the laptop, Ubuntu and openSUSE both support “powertop” and “cpupower” / “cpufrequtils” for managing the processor frequency. Fedora has “powertop” but I couldn’t find a tool to set the power governors and ended up writing a Perl script to do that on Fedora. Neither Ubuntu nor Fedora appears to have a comprehensive system configuration GUI tool set like the one openSUSE provides with YaST2.
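
For readers who want the same effect without Perl, here is a minimal shell sketch. It is not the Perl script mentioned above. It relies on the kernel’s cpufreq sysfs files (`/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor`); the function name `set_governor` and its optional second argument are hypothetical, and which governor names are accepted depends on the cpufreq driver in use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write one governor name to every CPU's
# scaling_governor file. Needs root on a real system.
#
# set_governor GOVERNOR [CPU_ROOT]
#   GOVERNOR: e.g. "ondemand", "performance" or "powersave"
#   CPU_ROOT: sysfs CPU directory (default: /sys/devices/system/cpu);
#             the override exists only so the function can be tried
#             against a scratch directory.
set_governor() {
  local governor="$1"
  local cpu_root="${2:-/sys/devices/system/cpu}"
  local found=0 f

  for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
    # If the glob matched nothing, $f is the literal pattern; skip it.
    [ -e "$f" ] || continue
    printf '%s\n' "$governor" > "$f" || return 1
    found=1
  done

  # Fail if no CPU exposed a cpufreq interface at all.
  [ "$found" -eq 1 ]
}

# Usage, in a root shell:
#   set_governor ondemand
```

The governors a given machine supports are listed in the `scaling_available_governors` file next to each `scaling_governor`.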

That’s not necessarily a bad thing on servers, since that’s usually done by editing configuration files and running command-line tools. Still, there are thousands of annoying differences in what configuration files are called and where they live among the three distributions. Fedora’s tools in the “system-config-*” packages are better than anything I could find in Ubuntu.

Moving on to documentation, I’d rate Fedora the highest among the three. For example, this page clearly documents what you need to do to run OpenStack Essex on Fedora 17. I couldn’t find anything for Essex on openSUSE at all, and Ubuntu’s documentation was written by a third party, not the Ubuntu community. For where I am now in the Computational Journalism Server project, documentation is more important than anything except underlying operating-system-level quality.

As far as I’m concerned, all three distributions are solid under the hood. They all track security issues and issue bug fixes promptly, they all run on reasonable hardware configurations without much hassle, and they all appear to have solid communities and solid financial backing. Under the hood, Linux is Linux is Linux.

But on the developer’s desktop, it’s the little things that matter. Ubuntu’s “consumer-oriented” Unity desktop put me off instantly. Even when I installed GNOME shell to get my preferred desktop, the color scheme was annoying and the workflow wasn’t as smooth as it is with openSUSE and Fedora. That puts Ubuntu at the bottom.

Overall, I’d rate Fedora 17 slightly better than the current stable openSUSE, 12.1. There’s more software packaged in the Fedora base, including some of the CRAN library packages for R. The documentation is better on Fedora, but the system administration GUI tools are better on openSUSE.

Will I switch? That’s a tough call. openSUSE isn’t standing still; there’s a beta of 12.2 scheduled for release the same week as Fedora 17 stable. If Fedora has anything like SUSE Studio or openSUSE Build Service, I haven’t found it. So most likely I’ll blow away the Ubuntu partitions and start testing openSUSE 12.2 beta. Still, Fedora 17 is a solid working Linux desktop and I’ll be using it more as time passes.