As I’ve noted here, the Computational Journalism Server “wants to be a Platform-as-a-Service (PaaS) when it grows up.” In plotting the way forward to that goal, I’ve looked at three options:
- Remain on openSUSE / SUSE Studio and collect other open source tools to provide the additional services that would make the current server into a true PaaS.
- Start with Cloud Foundry or one of its derivatives and add the computational journalism tools.
- Start with the Red Hat OpenShift Origin PaaS and add the computational journalism tools.
Remain on openSUSE
If I remain on openSUSE, as noted above, I’d need to collect more tools to provide the additional services. Many of these deal with the underlying infrastructure, and my target infrastructure is still OpenStack Essex. The overall goal is also unchanged: to provide an R-language tool set for large-scale computational journalism problems, using library packages from the Comprehensive R Archive Network (CRAN) that implement parallel, cluster and grid computing.
When I started the original Data Journalism Developer Studio project, the Platform as a Service concept was in its infancy. It has matured rapidly, though. Cloud Foundry and Red Hat OpenShift are both about a year old, and several derivatives of Cloud Foundry have already appeared.
Cloud Foundry runs on Ubuntu Linux and OpenShift on Red Hat / Fedora. There isn’t an equivalent packaged solution for openSUSE, so I’d have to build one myself for the Computational Journalism Server to become a true PaaS. That seems to me a diversion from the mission.
Cloud Foundry
Cloud Foundry is an open source project from VMware. Derivative projects include AppFog for PHP projects, PaaS.io for Haskell, Stackato for Perl and Python, and a community fork called CloudFreeStyle. Of these, neither AppFog nor PaaS.io is relevant, since there’s little PHP or Haskell in the current or planned use cases for the Computational Journalism Server. That leaves three options: Cloud Foundry itself, Stackato or CloudFreeStyle.
I’ve ruled out the base Cloud Foundry for two main reasons.
- Most of the frameworks, services and tools provided by Cloud Foundry are totally unfamiliar to me. I know enough Ruby to do simple scripting and enough Java to call class libraries, but I know virtually nothing about Ruby on Rails, and I know absolutely nothing about Spring, Scala, RabbitMQ or Eclipse. I’d face a steep learning curve with tools that aren’t relevant to the core of the Computational Journalism Server – R, SQL and NoSQL databases, Hadoop and, to a lesser extent, Perl and Python.
- For an open source project with a year’s history under its belt, the documentation is, in a word, abysmal. In particular, the tasks I need to accomplish to make the Computational Journalism Server into a PaaS – primarily adding R parallel programming capabilities and application packages – are totally undocumented.
I’m a member of the CloudFreeStyle project. I joined because I wanted to learn how Cloud Foundry works and how to enhance it. Because CloudFreeStyle is a source-level fork of Cloud Foundry, it would be easy to add functionality and ignore the components I don’t need, at least in the beginning. The “glue logic” to talk to applications and to the cloud infrastructure is already there and should “just work”. But, like its parent, it has little documentation, and I’d have to figure things out from the source.
Finally, there’s Stackato. Stackato is a very impressive product, and ActiveState’s documentation, support and tool set are world-class. I’ve been a happy ActiveState Perl Development Kit user for years. If the Computational Journalism Server were a commercial product / business venture rather than an open source project, I’d go with Stackato. But the Computational Journalism Server isn’t there yet and may never be.
OpenShift Origin
OpenShift Origin, released on April 30, 2012, is an open source PaaS construction platform from Red Hat. I’ve spent about a day and a half browsing the documentation, and I’m blown away by how comprehensive it is, especially for someone like me who wants to build a tool set from the ground up. The OpenShift Origin documentation is every bit as awesome as Cloud Foundry’s is abysmal.
The demos include a number of “LAMP stack oldies but goodies” – MediaWiki, WordPress, and Drupal. There’s also an OpenShift Origin LiveCD, based on Fedora 16, that turns any 64-bit Intel / AMD workstation, laptop or virtual machine into an OpenShift PaaS. With a few additional steps you can install OpenShift Origin permanently on a real or virtual machine.
The Way Forward
At the moment, I’m keeping three options open:
- Remain on openSUSE,
- CloudFreeStyle, and
- OpenShift Origin.
But I suspect the strength of the OpenShift Origin documentation will pull the project in that direction sooner rather than later. “Watch this space,” as the saying goes.