May 16, 2012
 

I’ve just released Computational Journalism Server 1.2.6. There are two major updates to the functionality.

  1. I’ve added a script to upgrade the server to a desktop. The “install-lxde.bash” script installs a full LXDE desktop. You’ll have the Firefox browser, the Claws email client, the AbiWord word processor, Gnumeric spreadsheet, the Leafpad text editor, the LXTerminal terminal emulator, ePDFViewer and a graphical file manager.
  2. Given a desktop install, I’ve added scripts to download and install the prototype Overview tool for semantic visualization and hierarchical clustering of large document sets. I’ve been experimenting with this for three days now and I think it belongs in every computational journalist’s tool set.

I wrote a bit about Overview two days ago. The Overview team has an ambitious road map and I’m pretty confident their approach will help working journalists make sense of the volumes of text available. The tool as documented on the Overview web site runs on Windows and Macintosh personal computers with no modifications. In addition to the ability to install Overview in a desktop-enhanced Computational Journalism Server, I’ve provided scripts to install it on any openSUSE 12.1, Fedora 17 or Ubuntu 12.04 desktop. With minor tweaks it should run on older versions or other Linux distributions.

Here’s a sneak peek at the documentation for the new features, derived from https://github.com/znmeb/Computational-Journalism-Server/blob/master/Overview/README.md

Running Overview on the Computational Journalism Server

What’s Overview?

Overview is a tool for semantic visualization and hierarchical clustering of large document sets. Jonathan Stray of the Associated Press leads the development, with funding provided by a Knight Foundation grant.

The main project page is at http://overview.ap.org/. It’s an open source project and its repositories are on Github at https://github.com/overview. And they have a Twitter account: @overviewproject.

If you want to run the prototype of Overview on a Windows or Macintosh personal computer, the instructions are here: Getting Started with the Overview Prototype. If you want to run Overview on the Computational Journalism Server or a Linux Desktop, read on!

Running the Overview Prototype on the Computational Journalism Server

  1. You’ll need to download and install the Computational Journalism Server first. I recommend doing the “install-all.bash” full install rather than just installing the base appliance.
  2. Next, you’ll need to install the LXDE desktop. As “root”, do

    # cd /opt/Computational-Journalism-Server
    # ./install-lxde.bash

    When the script asks if you want to trust the repository, answer “a” for “always”.

  3. After the LXDE desktop repositories, patterns and packages are installed, you’ll be sent to a YaST2 session to change the default run level. Enter “Expert Mode”. Tab to the “Set default runlevel after booting to:” field and select “5: Full multiuser with network and display manager”. Then tab to “OK” and press “Enter”.
  4. Reboot and log in on the console as the non-root user you created when you installed the appliance. Select “LXDE” in the “Desktop” pulldown menu. You should be in the LXDE desktop. The “install-lxde.bash” script installs a full LXDE desktop. You’ll have the Firefox browser, the Claws email client, the AbiWord word processor, Gnumeric spreadsheet, the Leafpad text editor, the LXTerminal terminal emulator, ePDFViewer and a graphical file manager.
  5. Open an “LXTerminal”. The menu button is in the lower left. Start the menu and select “System Tools -> LXTerminal”.
  6. In the terminal, type

    $ cd /opt/Computational-Journalism-Server
    $ cp -a Overview/ ~

    This creates a copy in your home directory where you can write.

  7. Type

    $ cd ~/Overview
    $ ./install-overview-openSUSE.bash

    This will take quite a bit of time, because the script downloads the required Ruby 1.9.3 and recompiles it from source.

  8. Type

    $ ./test-overview.bash

    This will run all the test cases. The “caracas” example takes quite a bit of time in the Ruby NLP step, but the others run fairly quickly on my 8 GB dual-core laptop.
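
Steps 6 through 8 can be wrapped in a couple of shell functions. This is only a rough sketch, not part of the released scripts: the function names `stage_overview` and `run_overview_setup` are made up for illustration, and the only things taken from the steps above are the paths and the two script names.

```shell
#!/bin/bash
# Sketch only: the function names are hypothetical, not part of the repository.

# stage_overview SRC DEST
# Copy SRC/Overview into DEST (a directory you can write to) and print
# the path of the copy.
stage_overview() {
  local src="$1" dest="$2"
  if [ ! -d "$src/Overview" ]; then
    echo "no Overview directory under $src" >&2
    return 1
  fi
  cp -a "$src/Overview" "$dest/" || return 1
  echo "$dest/Overview"
}

# run_overview_setup DIR
# Run the openSUSE installer and then the test cases inside DIR,
# stopping at the first step that fails.
run_overview_setup() {
  local dir="$1"
  ( cd "$dir" && ./install-overview-openSUSE.bash && ./test-overview.bash )
}
```

As the ordinary (non-root) user, something like `dir=$(stage_overview /opt/Computational-Journalism-Server "$HOME") && run_overview_setup "$dir"` would then do the copy, the install and the test run in one go.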

Running the Overview Prototype on Linux Desktop / Laptop

If you already have a Linux desktop, do the following. I’ve tested this on openSUSE 12.1, Ubuntu 12.04 “Precise Pangolin” and Fedora 17 “Beefy Miracle”. It can probably be made to work on older Fedora or Ubuntu desktops with a little tweaking, but I’m not testing it on them. It will get tested on openSUSE 12.2 when the beta comes out.

  1. Install “git”.
  2. In some convenient directory where you have write access, type

    $ git clone http://github.com/znmeb/Computational-Journalism-Server
    $ cd Computational-Journalism-Server/Overview
    $ ./install-overview-<DISTRO>.bash

    where <DISTRO> is Fedora, Ubuntu or openSUSE. Do not run this as “root” – use an ordinary user account!

  3. That will take some time; as noted above, one of the steps is to download and recompile Ruby 1.9.3 from source. When it’s done, type “./test-overview.bash” as above.
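
If you’d rather not type the distribution name by hand, the choice of installer in step 2 can be scripted. The sketch below makes some assumptions: the helper names are hypothetical, the Fedora and Ubuntu script names simply follow the `install-overview-<DISTRO>.bash` pattern given above, and it relies on an os-release style file with an `ID=` line, which may not exist on every release mentioned here.

```shell
#!/bin/bash
# Sketch only: helper names are hypothetical, and an os-release style
# file is assumed to be available.

# installer_for_distro ID
# Map a distribution ID to the matching installer script name.
installer_for_distro() {
  case "$1" in
    fedora)    echo "install-overview-Fedora.bash" ;;
    ubuntu)    echo "install-overview-Ubuntu.bash" ;;
    opensuse*) echo "install-overview-openSUSE.bash" ;;
    *)         echo "unsupported distribution: $1" >&2; return 1 ;;
  esac
}

# distro_id [FILE]
# Print the ID field of an os-release style file (default /etc/os-release).
# The file is sourced in a subshell so its variables do not leak out.
distro_id() {
  local file="${1:-/etc/os-release}"
  ( . "$file" && echo "$ID" )
}

# require_non_root
# Fail if the caller is root; the installer is meant for an ordinary user.
require_non_root() {
  if [ "$(id -u)" -eq 0 ]; then
    echo "do not run this as root" >&2
    return 1
  fi
}
```

From the `Overview` directory, `require_non_root && ./"$(installer_for_distro "$(distro_id)")"` would then pick and run the right script.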
May 12, 2012
 

As I’ve noted recently, I’m in the process of migrating the Computational Journalism Server towards a full Platform as a Service offering. To that end, my development environments now run the three major Linux community desktops – Ubuntu, openSUSE and Fedora. I’m writing a bunch of convenience scripts so I can operate them all in a similar manner, and I run all three distros with the GNOME 3 GNOME shell.

This week, I upgraded from the stable Fedora 16 to the beta Fedora 17. For a beta, Fedora 17 is remarkably stable. There were no major issues with either of my machines, unlike Ubuntu 12.04 LTS, which required some video hacking to run on my workstation. The desktop, like previous Fedora desktops, is mostly standard GNOME issue, unlike Ubuntu’s.

I still prefer openSUSE green to Fedora blue, but that’s easily changed. In short, if I wanted to, I could make this my main desktop rather than openSUSE without a major learning curve or intensive customization. I couldn’t do that on Ubuntu, even using Ubuntu’s GNOME 3 GNOME shell.

On the laptop, Ubuntu and openSUSE both support “powertop” and “cpupower” / “cpufrequtils” for managing the processor frequency. Fedora has “powertop” but I couldn’t find a tool to set the power governors and ended up writing a Perl script to do that on Fedora. Neither Ubuntu nor Fedora appears to have a comprehensive system configuration GUI tool set like the one openSUSE provides with YaST2.

That’s not necessarily a bad thing on servers, since that’s usually done by editing configuration files and running command-line tools. Still, there are thousands of annoying differences in what configuration files are called and where they live among the three distributions. Fedora’s tools in the “system-config-*” packages are better than anything I could find in Ubuntu.
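
For the governor gap on Fedora mentioned above, the job comes down to writing a governor name into the kernel’s cpufreq files under sysfs. Here is a minimal shell sketch of that idea. It is not the Perl script I actually wrote; the function name is hypothetical, and it assumes the kernel exposes `cpufreq/scaling_governor` for each CPU and that you run it as root on a real system.

```shell
#!/bin/bash
# Sketch only: set_governor is a hypothetical helper, not an existing tool.

# set_governor GOVERNOR [CPU_ROOT]
# Write GOVERNOR into every cpuN/cpufreq/scaling_governor file under
# CPU_ROOT (default /sys/devices/system/cpu) and print how many CPUs
# were updated.
set_governor() {
  local governor="$1" root="${2:-/sys/devices/system/cpu}" file count=0
  for file in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -e "$file" ] || continue   # the glob matched nothing
    echo "$governor" > "$file" || return 1
    count=$((count + 1))
  done
  echo "$count"
}
```

For example, `set_governor powersave` asks every CPU to use the “powersave” governor; the names the kernel will accept are listed in `scaling_available_governors` in the same directory.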

Moving on to documentation, I’d rate Fedora the highest among the three. For example, this page clearly documents what you need to do to run OpenStack Essex on Fedora 17. I couldn’t find anything for Essex on openSUSE at all, and Ubuntu’s documentation was written by a third party, not the Ubuntu community. For where I am now in the Computational Journalism Server project, documentation is more important than anything except underlying operating-system-level quality.

As far as I’m concerned, all three distributions are solid under the hood. They all track security issues and issue bug fixes promptly, they all run on reasonable hardware configurations without much hassle, and they all appear to have solid communities and solid financial backing. Under the hood, Linux is Linux is Linux.

But on the developer’s desktop, it’s the little things that matter. Ubuntu’s “consumer-oriented” Unity desktop put me off instantly. Even when I installed GNOME shell to get my preferred desktop, the color scheme was annoying and the workflow wasn’t as smooth as it is with openSUSE and Fedora. That puts Ubuntu at the bottom.

Overall, I’d rate Fedora 17 slightly better than the current stable openSUSE, 12.1. There’s more software packaged in the Fedora base, including some of the CRAN library packages for R. The documentation is better on Fedora, but the system administration GUI tools are better on openSUSE.

Will I switch? That’s a tough call. openSUSE isn’t standing still; there’s a beta of 12.2 scheduled for release the same week as Fedora 17 stable. If Fedora has anything like SUSE Studio or openSUSE Build Service, I haven’t found it. So most likely I’ll blow away the Ubuntu partitions and start testing openSUSE 12.2 beta. Still, Fedora 17 is a solid working Linux desktop and I’ll be using it more as time passes.

May 05, 2012
 

First of all, let me put this in perspective. I’ve been using Linux on workstations and laptops since Red Hat Linux 6.2. I stayed with Red Hat all the way through Red Hat Linux 9. When Red Hat split the distribution into Red Hat Enterprise Linux and Fedora Core in 2003, I switched to Debian. I ran Debian for about six months, then switched to Gentoo Linux. In the summer of 2008, I switched to openSUSE Linux and I’ve been on openSUSE since then.

Every time one of the major community Linux distributions ships a new stable release, I try it out. So far, none of the Debian, Fedora, Ubuntu or Mint releases has come out significantly better than openSUSE, so I’ve stuck with it. And that remains true for Ubuntu 12.04 LTS “Precise Pangolin”. If that were the end of the story, I could close this blog post now. But it’s not.

If you’ve been following this blog and my Twitter stream and Github account, you’ll know that I’ve been collecting tools for computational journalism and packaging them as appliances. And I’m moving on towards a Platform as a Service. One of the requirements I’ve put on that is that the tools should be distribution-agnostic as much as possible. Up to now, everything has been on openSUSE because of the SUSE Studio appliance construction tools and to a lesser extent the openSUSE Build Service package repositories. But I’ve come to the point where I need to make things work on Fedora and Ubuntu.

So I’ve quad-booted my laptop (Windows, openSUSE, Fedora 16 and Ubuntu 12.04). And I’m trying to triple-boot my workstation with openSUSE, Fedora and Ubuntu. Which brings us to the first problem – openSUSE and Fedora installed cleanly on the workstation, but Ubuntu 12.04 didn’t. In particular, the Ubuntu desktop doesn’t even come up on a 1024×768 monitor!

I can understand Linux not coming up on a wireless card that’s relatively new. I can understand Linux having trouble with a touchpad or with audio. After all, the hardware makers design for Windows and Apple, not Linux desktops / laptops. But a 1024×768 monitor that’s run everything from Gentoo / WindowMaker to KDE 3.5 to KDE 4 to GNOME 2 and GNOME 3 and LXDE and Cinnamon on openSUSE? A 1024×768 monitor that runs Fedora 16 without any problems? That’s just plain wrong!

I did get the Ubuntu desktop working on the laptop, which is a much newer configuration. I’m not going to spend a great deal of time on how ugly the desktop actually is when it works. That’s been covered in numerous places and desktops are

  1. A matter of personal taste, and
  2. Customized to the user’s workflow.

But for someone who, like me, is used to the GNOME 2 desktop as delivered in previous versions of Ubuntu and Fedora, the openSUSE customization of GNOME 2 and the current clean implementations of GNOME 3 on openSUSE and Fedora, Ubuntu’s Unity desktop is jarring. And it’s really hard to figure out how to do things, where stuff is, and so on.

Moreover, the whole distribution is “pushy” – it’s hawking subscriptions to Ubuntu One cloud music, for example. The software installer has favorite apps, and so on. It’s like having a Kindle Fire or an iPad or visiting the Chrome Web Store or Google Play – the Ubuntu desktop is trying to sell you something every time you move your mouse. Ubuntu has turned the Linux desktop into just another media consumption device!

That’s two strikes – annoyances but not deal-breakers. But what I want to do with Fedora and Ubuntu is use them as hosts for virtual appliances, just like I use openSUSE and Windows / VirtualBox now. In openSUSE and Fedora, I can go into the software installer and select a “pattern” and get everything I need to do that. If Ubuntu has that, it’s well hidden under the games and the productivity suites and the media apps. Sure, I can go find how to do that on Ubuntu on the web, but it seems to be going against the grain of the distribution. It only took me two minutes to find it on Fedora after almost four years of working daily on openSUSE!

I’m sure “Precise Pangolin” is a fine distribution “under the hood.” The previous long-term support version, 10.04, is an acknowledged workhorse in servers along with Debian, RHEL/CentOS/Scientific Linux and SLES. I have to test on it, and I’ll figure out how to be productive on it. But if Canonical can’t come up with a desktop built for Linux professionals like me, they’re going to lose us.

May 02, 2012
 

As I’ve noted here, the Computational Journalism Server “wants to be a Platform-as-a-Service (PaaS) when it grows up.” In plotting the way forward to that goal, I’ve looked at three options:

  1. Remain on openSUSE / SUSE Studio and collect other open source tools to provide the additional services that would make the current server into a true PaaS.
  2. Start with Cloud Foundry or one of its derivatives and add the computational journalism tools.
  3. Start with the Red Hat OpenShift Origin PaaS and add the computational journalism tools.

Remain on openSUSE

If I remain on openSUSE, as noted above, I’d need to collect more tools to provide additional services. Many of these deal with the underlying infrastructure. My target infrastructure is OpenStack Essex. That’s still the target. Moreover, the overall goal is still to provide an R-language tool set for dealing with large-scale computational journalism problems using library packages in the Comprehensive R Archive Network that implement parallel, cluster and grid computing.

When I started the original Data Journalism Developer Studio project, the Platform as a Service concept was in its infancy. It has matured rapidly, though. Cloud Foundry and Red Hat OpenShift are both about a year old, and several derivatives of Cloud Foundry have already appeared.

Cloud Foundry runs on Ubuntu Linux and OpenShift on Red Hat / Fedora. There isn’t an equivalent packaged solution for openSUSE. I’d have to build that for the Computational Journalism Server to be a true PaaS. And that seems to me a diversion from the mission.

Cloud Foundry

Cloud Foundry is an open source project from VMware. Derivative projects include AppFog for PHP projects, PaaS.io for Haskell, Stackato for Perl and Python and a community fork called CloudFreeStyle. Of these, neither AppFog nor PaaS.io is relevant, since there’s little PHP or Haskell in the current or planned use cases for the Computational Journalism Server. So the options are Cloud Foundry itself, Stackato or CloudFreeStyle.

I’ve ruled out the base Cloud Foundry for two main reasons.

  1. Most of the frameworks, services and tools provided by Cloud Foundry are totally unfamiliar to me. I know enough Ruby to do simple scripting and enough Java to call class libraries, but I know virtually nothing about Ruby on Rails, and I know absolutely nothing about Spring, Scala, RabbitMQ or Eclipse. I’d have a steep learning curve on tools that aren’t relevant to the core of the Computational Journalism Server – R, SQL and NoSQL databases, Hadoop and to a lesser extent Perl and Python.
  2. For an open source project with a year’s history under its belt, the documentation is, in a word, abysmal. In particular, the tasks I need to accomplish to make Computational Journalism Server into a PaaS – primarily adding R parallel programming capabilities and application packages – are totally undocumented.

I’m a member of the CloudFreeStyle project. I joined because I wanted to learn how to do what’s in Cloud Foundry and how to enhance it. Because it’s a source-level fork of Cloud Foundry, it would be easy to add functionality and ignore the components I don’t need, at least in the beginning. The “glue logic” to talk to applications and to the cloud infrastructure is already there and should “just work”. But, like its parent, there’s little documentation and I’d have to figure things out from the source.

Finally, there’s Stackato. Stackato is a very impressive product and ActiveState’s documentation, support and tool set are world-class. I’ve been a happy ActiveState Perl Development Kit user for years. If the Computational Journalism Server were a commercial product / business venture rather than an open source project, I’d go with Stackato. But the Computational Journalism Server isn’t there yet and may never be.

OpenShift Origin

OpenShift Origin, released on April 30, 2012, is an open source PaaS construction platform from Red Hat. I’ve spent about a day and a half browsing the documentation and I’m blown away by how comprehensive it is, especially for someone like me who wants to build a tool set from the ground up. The OpenShift Origin documentation is every bit as awesome as Cloud Foundry’s is abysmal.

The demos include a number of “LAMP stack oldies but goodies” – MediaWiki, WordPress, and Drupal. There’s also an OpenShift Origin LiveCD, based on Fedora 16, that turns any 64-bit Intel / AMD workstation, laptop or virtual machine into an OpenShift PaaS. With a few additional steps you can install OpenShift Origin permanently on a real or virtual machine.

The Way Forward

At the moment, I’m keeping three options open:

  1. Remain on openSUSE,
  2. CloudFreeStyle, and
  3. OpenShift Origin.

But I suspect the strength of the documentation will pull the project towards OpenShift Origin sooner rather than later. “Watch this space,” as the saying goes.