Borasky Research Journal Google+ Page

Borasky Research Journal Amazon Store


Data Journalism Developer Studio 2012LX

 

Data Journalism Developer Studio 2012LX Blog


By now, you’ve probably seen the reactions to Apple’s “education event” yesterday. My take is that it was 100% Apple marketing and zero “disrupting education.” It was all about selling overpriced tablets to schools that are struggling to keep teachers on the payroll. It was all about forcing authors to buy new Macintosh machines or upgrade existing ones to MacOS X “Lion”. And it was about a restrictive EULA for authors.

Textbooks should be free! That’s one way to disrupt education. And CK12.org provides Science, Technology, Engineering and Mathematics (STEM) textbooks for free. These are textbooks developed by educators, not marketers. They work on iPads, Kindles and PDF readers, or you can read them online in your browser. There are authoring tools on the web site as well. The current CK-12 FlexBooks Library lists 38 mathematics textbooks, 34 in science and 20 in other subjects. Some have both student and teacher editions. Once you have an account, you can access the authoring and reformatting tools. I highly recommend creating one even if you only want to read or teach from the books.

And education software should be free! The most comprehensive collection of free educational software I’ve found is openSUSE Linux for Education – openSUSE:Education-Li-f-e. This is a LiveDVD that will boot on most PC-based hardware with at least 1 GB of RAM. You don’t even need a hard drive – since it’s a Live DVD, Li-f-e doesn’t touch the hard drive unless you explicitly direct it to do so. If you want, you can copy the DVD to a USB drive and boot from that. The directions for that are here.

Li-f-e is an absolutely stunning collection of software. It has the openSUSE 12.1 32-bit Linux operating system, the GNOME 3, KDE 4 and ultra-light IceWM desktops, desktop / productivity software, and a comprehensive collection of educational software for students ranging from pre-school all the way up into graduate school. It also has a complete Linux / Apache / MySQL / PHP (LAMP) server stack, a Linux Terminal Server Project (LTSP) server stack and a complete suite of professional software and web development tools. And the Scratch tools for teaching kids to program are there.

Given that these free tools exist, and have been around since well before the iPad, I don’t see how Apple marketing can claim to be disrupting education. There’s real disruption if you know where to look.

 

I’ve just pushed release 1.0.0 of the Data Journalism Developer Studio into the SUSE Gallery. Changes:

  • The base appliance ships with Mozilla Firefox as the browser rather than Chromium. Chromium is available through a set of add-on installation scripts. This was a difficult decision for me to make, but the version of Chromium in the Open Build Service is 13.0.xxx, which is updated frequently and can be unstable; it is roughly equivalent to Google’s “Canary” build on Windows and Macintosh. Chromium was proving too unstable for regular use, so I replaced it with Firefox.
  • I added CoffeeScript to the install scripts for node.js and NowJS. If you’re a JavaScript developer, I welcome more suggestions for node.js packages.

I’m planning to open the project up to other developers in the near future. Now that the Fundry feature request mechanism is in place, the road map is public. My own plan is to start building user-level documentation. Most of the software in the appliance is well-documented on its own, but there aren’t too many examples of application-level usage that I’ve been able to find.


 

I’m really conflicted about this. On the one hand, I know Twitter needs to sell advertising, and web services need to promote themselves. And yes, this is a real news event, not a manufactured story. But I wonder – are we heading back to the days of “Yellow Journalism” in the tweet stream? Please comment below.

 

According to Mashable, “Kraft Looks to Reward Twitter Users Who Tweet About Mac & Cheese”:

Under a new program quietly rolled out over the past few weeks, any time two people individually use the phrase “mac & cheese” in a tweet, they’ll each get a link pointing out the “Mac & Jinx.” The first one to click the link and give Kraft his or her address gets five free boxes of Kraft’s mac and cheese and a T-shirt.

It seems that “Mac & Cheese” is now a Trending Topic, as of 2011-03-08 19:12 UTC. But when you click on the topic, you see this Promoted Tweet:

What could be worse? Alyssa Milano, who has 1,403,372 followers, posted this tweet:

This could get interesting. 

Update: it has gotten interesting. @WootLive has gotten into the act.

Update: FriendsEAT has tweeted about capacity issues stemming from their article.

Oh, by the way, the Kraft campaign that started this whole thing is being run by Crispin Porter + Bogusky. Does that name sound familiar? It’s the same agency that came up with the Groupon Super Bowl ads about Tibet and seafood curry.

 

For a variety of reasons, mostly due to appliance development, I haven’t upgraded my workstation to openSUSE 11.4 yet. So I took the time over the weekend to do the upgrade to Release Candidate 2. As is often the case, a lot has changed under the hood but there aren’t many really earth-shaking user-visible updates.

First of all, I’m a GNOME desktop user, and prefer the openSUSE enhanced GNOME desktop over the ones currently shipping with Fedora and Ubuntu. I’m not going to get into the GNOME 3 / Unity / KDE 4 desktop battle – one has to make a decision in order to have a productive workflow, and I’ve made it in favor of GNOME 2, openSUSE edition. So if you care about other desktops, you’ll need to look elsewhere at the moment.

The desktop definitely feels more responsive than 11.3. The browsers start up faster, both Firefox, which is currently at 4.0b12, and Chrome unstable, currently at 11.0.686.3 dev. I don’t know whether this is because the browsers themselves start faster, the desktop launches programs faster, the kernel memory management is better, or some combination of these, but it’s definitely worth the upgrade right there. I won’t be taking this machine back to 11.3!

There are only incremental improvements in the overall GNOME desktop – it’s currently at 2.32. The details on the upgrade can be found here. But the productivity suite has changed from OpenOffice.org to LibreOffice 3.3.1. Again, I’m not going to get into the decision-making process, but I switched to LibreOffice a few weeks ago and it seems much more responsive than OpenOffice.org.

Now for a more significant enhancement: openSUSE 11.4 supports both KVM and Xen as virtual appliance platforms out of the box, and VirtualBox has been upgraded to 4.0.4. I mostly run my appliances in VMware Workstation because they’re desktops, and I like its “Unity” mode, which makes applications in the appliance appear on my workstation’s desktop. But I will be testing out all the other options now that they’re built in.

The standard openSUSE 11.4 release now includes a video editor – PiTiVi. I’ve been trying to choose a video editor for Project Kipling and have installed all the open source video editors I could find. But that makes the appliance larger than it needs to be and exposes a number of extra repositories. Now that there’s a “standard” video editor, I’ll be using it in Project Kipling.

Speaking of appliances, the plan for SUSE Studio is to have the repositories for 11.4 appliances available at the time the distribution is released, March 10, 2011. I will be porting Project Kipling to 11.4 as soon as the repositories are available in SUSE Studio. The others will probably just sit at 11.3 for a few weeks; I need to look at the tradeoffs between building applications from upstream source and installing them from the openSUSE Build Service and other repositories.

So should you upgrade to openSUSE 11.4 on March 10th? Of course! Should you replace your current operating system with openSUSE 11.4? Sixteen months ago, when openSUSE 11.2 came out, I would have said “Yes!” The base operating system, desktops, browsers and productivity tools were as good as Windows or MacOS X at near zero cost, and for a scientific workstation user like myself, a Linux desktop was and still is the only economically viable option.

But tablets, smart phones, “the cloud” and the Great Recession have changed that. Steve Jobs is right – we live in a post-PC world. openSUSE 11.4 — and Fedora and Ubuntu — are PC operating systems. There’s a tremendous shakeout happening in how we use digital technology, and I’m not convinced that any Linux desktop is the way forward. The battles now are for the best integrated pocket form factor, tablet and notebook. openSUSE, Fedora and Ubuntu seem doomed forever to battle for the one percent of the population that needs a high-performance low-cost workstation.

 

Update 2011-03-20

For a variety of reasons, I have replaced the Social Media Analytics Research Toolkit, Code Like A Pirate and Project Kipling with a new, modular appliance called the Data Journalism Developer Studio. All of the software found in those three appliances can be installed via scripts provided in the new appliance. Links:


Upon careful reading of Twitter’s API Terms of Service, I have decided to temporarily remove two appliances from the SUSE Studio Gallery. Those two appliances are the Social Media Analytics Research Toolkit (SMART@znmeb) and Project Kipling Real-Time Data Journalism Tools. I do intend to put them back online at some point in the future, but I do not know at this time when they will be back, because I haven’t determined the scope of required changes to the appliances or their marketing materials. Why? These two appliances may be in violation of item 4.A below:

4. You will not attempt or encourage others to:

A. sell, rent, lease, sublicense, redistribute, or syndicate the Twitter API or Twitter Content to any third party for such party to develop additional products or services without prior written approval from Twitter;

B. remove or alter any proprietary notices or marks on the Twitter API or Twitter Content;

C. use or access the Twitter API for purposes of monitoring the availability, performance, or functionality of any of Twitter’s products and services or for any other benchmarking or competitive purposes; or

D. use Twitter Marks as part of the name of your company or Service, or in any product, service, or logos created by you. You may not use Twitter Marks in a manner that creates a sense of endorsement, sponsorship, or false association with Twitter. All use of Twitter Marks, and all goodwill arising out of such use, will inure to Twitter’s benefit.

E. use or access the Twitter API to aggregate, cache (except as part of a Tweet), or store place and other geographic location information contained in Twitter Content.

While I don’t encourage people to redistribute Twitter data, the appliances do have the ability to collect Twitter data and I can’t prevent users from redistributing it. I want to emphasize that Twitter has not asked me to take these appliances down! I don’t know that they violate the letter of item 4.A, but I think they violate the spirit of that clause, so I am removing them until I can determine in what form they are viable products.

 

This tutorial covers profiling of Linux servers using open-source tools such as “iostat”, “oprofile” and “blktrace”. Both processor-bound and I/O-bound cases are covered, and the emphasis is on tools that provide visual displays of relevant metrics.

Linux Server Profiling: Using Open Source Tools For Bottleneck Analysis
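As a small illustration of the kind of metric iostat reports, here is a minimal Python sketch (my own, not taken from the tutorial) that computes a per-device busy percentage, similar in spirit to iostat’s %util column, from two snapshots of /proc/diskstats. It assumes the standard Linux /proc/diskstats layout, in which the 13th whitespace-separated column is the cumulative number of milliseconds the device has spent doing I/O; the function names parse_diskstats and utilization are hypothetical.

```python
"""Sketch: an iostat-style %util figure from two /proc/diskstats snapshots."""

from typing import Dict


def parse_diskstats(text: str) -> Dict[str, int]:
    """Map device name -> cumulative milliseconds spent doing I/O.

    Each /proc/diskstats line starts with major number, minor number and
    device name, followed by the statistics fields. The 10th statistics
    field (index 12 after splitting the whole line) is the total number
    of milliseconds the device has spent doing I/Os.
    """
    busy_ms: Dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or malformed lines
        busy_ms[fields[2]] = int(fields[12])
    return busy_ms


def utilization(before: str, after: str, interval_ms: float) -> Dict[str, float]:
    """Percent of the sampling interval each device was busy."""
    b = parse_diskstats(before)
    a = parse_diskstats(after)
    # Only report devices present in both snapshots.
    return {
        dev: 100.0 * (a[dev] - b[dev]) / interval_ms
        for dev in a
        if dev in b
    }
```

To use it, read /proc/diskstats twice, a known interval apart (measured with time.monotonic()), and pass both snapshots and the elapsed milliseconds to utilization(). On devices that service many requests in parallel, such as RAID arrays and SSDs, a high busy percentage does not necessarily mean the device is saturated.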

 



Disclosure

As you probably know, I live in the Portland, Oregon area and have for many years. One of the must-visit places here is Powell’s Books. The book links in this post will all take you to Powell’s as part of their Partner Program. If you’d like to join the program too, here’s the link.

Updated September 11, 2011: I recently purchased a Kindle, and two of these books are now available in that format. For those two, I’ve tweeted my Amazon Affiliate links out and have embedded those tweets here.

Prerequisite Software

To get the most out of these books, you will need to install some software. You will need Mondrian, R, GGobi, and the ggplot2, rggobi and DescribeDisplay R packages. All of these will run on a Windows, Macintosh or Linux desktop / laptop, including most netbooks. And they are all free, open source software. An easy way to get them all, packaged in an openSUSE Linux appliance, is to download Data Journalism Developer Studio 2012.

Ggplot2: Elegant Graphics for Data Analysis (Use R)

ggplot2 is an advanced graphics package for the R programming language. It is based on Leland Wilkinson’s grammar of graphics (The Grammar of Graphics, 2nd Edition). ggplot2 generates the most beautiful static graphics I’ve ever seen. You can use ggplot2 at any stage of your analysis. Simple exploratory plots can be made with a single call to the “qplot” function, and when you’re ready to create a final report or presentation, you can get publication-quality graphics.

The two things I like most about the ggplot2 package are

  • The absolutely stunning visual appeal of the plots it produces: Dr. Wickham has paid great attention to the visual aspects of the output. I don’t know of another package in any language that generates such beautiful plots.
  • The numerous built-in analysis methods: Boxplots, kernel and quantile regression and smoothing, faceted plots – all are “standard equipment” with ggplot2.
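To make the boxplot bullet concrete, here is a minimal Python sketch of the statistics a Tukey-style boxplot displays: quartiles computed by linear interpolation (R’s default quantile type 7), whiskers reaching the most extreme data points within 1.5 × IQR of the box, and everything beyond flagged as an outlier. To my understanding this matches what ggplot2’s boxplots draw by default, but treat that as an assumption and check the ggplot2 documentation; the names BoxStats, quantile7 and box_stats are hypothetical.

```python
"""Sketch: the summary statistics behind a Tukey-style boxplot."""

import math
from typing import List, NamedTuple


class BoxStats(NamedTuple):
    lower_whisker: float
    q1: float
    median: float
    q3: float
    upper_whisker: float
    outliers: List[float]


def quantile7(sorted_x: List[float], p: float) -> float:
    """Linear-interpolation quantile (R's type 7) of non-empty, sorted data."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def box_stats(data: List[float], coef: float = 1.5) -> BoxStats:
    """Quartiles, whiskers and outliers for a non-empty list of numbers."""
    x = sorted(data)
    q1, med, q3 = (quantile7(x, p) for p in (0.25, 0.5, 0.75))
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - coef * iqr, q3 + coef * iqr
    # Whiskers stop at the most extreme points still inside the fences.
    inside = [v for v in x if lo_fence <= v <= hi_fence]
    outliers = [v for v in x if v < lo_fence or v > hi_fence]
    return BoxStats(min(inside), q1, med, q3, max(inside), outliers)
```

For example, box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]) gives quartiles of 3, 5 and 7, whiskers at 1 and 8, and flags 100 as the only outlier.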
Tweet by M. Edward Borasky (@znmeb): “ggplot2: Elegant Graphics for Data Analysis (Use R) by Hadley Wickham http://t.co/GfXsOJm via @”

Interactive Graphics for Data Analysis: Principles and Examples

This book is a complete course in interactive graphics for data analysis. It is mostly based on the Mondrian interactive statistical data visualization system, although there is some use of R as well. The first part covers the basic tools, and the second part gives case studies.

The case studies really are the best part of the book. They cover geographical analysis and some interesting history, including the sinking of the Titanic and the 2004 Florida election. As I note below, there is some overlap in tools between Mondrian and GGobi, but you really need both books and both packages to be able to do everything.

Tweet by M. Edward Borasky (@znmeb): “Interactive Graphics for Data Analysis: Principles and Examples (Chapman & Hall/CRC C... by Martin Theus http://t.co/6XbdVWU via @”

Interactive and Dynamic Graphics for Data Analysis: With R and Ggobi (Use R)

As the title implies, this book is also a complete course in data analysis using interactive graphics. But the focus here is on R and GGobi rather than Mondrian. While there is some overlap in the tools, there are some things Mondrian does that GGobi doesn’t do, and vice versa. A partial list:

  • Geographic datasets: Mondrian only
  • Mosaic plots: Mondrian only
  • Classification: GGobi only
  • Clustering: GGobi only
  • Social network graphs: GGobi only

In addition, GGobi integrates directly with R and ggplot2 via the rggobi and DescribeDisplay packages. There are some integration points between R and Mondrian, but that integration isn’t as tight as it is with R, GGobi and ggplot2.

Tweet by M. Edward Borasky (@znmeb): “Interactive and Dynamic Graphics for Data Analysis: With R and GGobi (Use R) by Dianne Cook http://t.co/TxTU5ov via @”
© 2011 Borasky Research Journal Suffusion theme by Sayontan Sinha