Prerequisite Software
To get the most out of these books, you will need to install some software. You will need Mondrian, R, GGobi, and the ggplot2, rggobi and DescribeDisplay R packages. All of these will run on a Windows, Macintosh or Linux desktop / laptop, including most netbooks. And they are all free, open source software.
ggplot2 is an advanced graphics package for the R programming language. It is based on the grammar of graphics, as described in the book The Grammar of Graphics (Statistics and Computing). ggplot2 generates the most beautiful static graphics I’ve ever seen. You can use ggplot2 at any stage of your analysis. Exploratory plots can be made with a single call to the “qplot” function, and when you’re ready to create a final report or presentation, you can get publication-quality graphics.
The two things I like most about the ggplot2 package are
- The absolutely stunning visual appeal of the plots it produces: Dr. Wickham has paid great attention to the visual aspects of the output. I don’t know of another package in any language that generates such beautiful plots.
- The numerous built-in analysis methods: Boxplots, kernel and quantile regression and smoothing, faceted plots – all are “standard equipment” with ggplot2.
This book is a complete course in interactive graphics for data analysis. It is mostly based on the Mondrian interactive statistical data visualization system, although there is some use of R as well. The first part covers the basic tools, and the second part gives case studies.
The case studies really are the best part of the book. They cover geographical analysis and some interesting history: the sinking of the Titanic and the 2004 Florida election. As I note below, there is some overlap in tools between Mondrian and GGobi, but you really need both books and both packages to be able to do everything.
As the title implies, this book is also a complete course in data analysis using interactive graphics. But the focus here is on R and GGobi rather than Mondrian. While there is some overlap in the tools, there are some things Mondrian does that GGobi doesn’t do, and vice versa. A partial list:
- Geographic datasets: Mondrian only
- Mosaic plots: Mondrian only
- Classification: GGobi only
- Clustering: GGobi only
- Social network graphs: GGobi only
In addition, GGobi integrates directly with R and ggplot2 via the rggobi and DescribeDisplay packages. There are some integration points between R and Mondrian, but that integration isn’t as tight as the integration among R, GGobi and ggplot2.
I’ve just learned that CrisisCamp will be starting up a Portland session this weekend. Here’s the EventBrite listing if you’re in or near Portland, Oregon:
http://crisiscamphaitipdx.eventbrite.com/
What is CrisisCamp? “This Saturday, (and Sunday if there’s interest) CrisisCamp will bring together volunteers to collaborate on technology projects which aim to assist in Haiti’s relief efforts by providing data, information, maps and technical assistance to NGOs, relief agencies and the public.”
Projects include:
Port Au Prince Basemap
We Have, We Need Exchange
Languages and Translation
Mobile Applications 4 Crisis Response
NPR’s Crisis Wiki
Family Reunification Systems
Tweak the Tweet
I’ll be there — I’m hoping to help out with Tweak the Tweet.
Please join me if you can – this event is free and open to the public. You don’t have to be technical to volunteer time. There will be projects that can be done by anybody who has used Google.
I haven’t posted personal stuff on my blogs very often, even when I had six of them. But I learned something today that I feel I have to write about. I sometimes joke that my Twitter screen name, “znmeb”, is my initials, “meb”, preceded by the initials of my two childhood heroes, Zorro and Captain Nemo. In actual fact, though, my two childhood heroes were my father, a biochemist, and Jack Benny.
I’ve been able to pick up parts of the story but not the complete history. The brief version is this: CBS discovered some master recordings of Jack Benny programs ranging from 1952 through 1961. The complete list is here. The International Jack Benny Fan Club offered to pay to have them digitized, thus preserving them from the inevitable decay that old media will suffer.
There were some lengthy negotiations, including a letter from Benny’s estate allowing the release of the masters. However, CBS has decided that they will not release the masters to the International Jack Benny Fan Club for digital preservation. If this decision is not reversed, the media will decay in storage and this part of America’s history will be lost.
Here are a few more links I’ve been able to track down on Twitter:
http://www.nypost.com/p/entertainment/tv/cbs_benches_jack_benny_n7b76Pou9inaGa1S9JqDxM
http://www.tvweek.com/blogs/tvbizwire/2010/01/cbs-nixes-unearthed-jack-benny.php
http://voiceactors.wordpress.com/2010/01/19/jack-benny-program/
http://overlawyered.com/2010/01/cant-clear-the-copyrights-contd/
http://www.boingboing.net/2010/01/18/cbs-uncovers-rare-ja.html
http://themoderatevoice.com/59065/killing-comedic-heritage-cbs-reportedly-seals-some-classic-jack-benny-show-comedy-masters/
I can’t imagine what my youth would have been like without Jack Benny. And I don’t think it’s right or even good business for CBS to withhold these broadcasts from preservation and future audiences. If you feel that way too, please join me in sending an email to
http://www.cbs.com/info/user_services/fb_global_form.php
or send a snail mail to
Sumner Redstone, Executive Chairman
Leslie Moonves, President and CEO
51 West 52 Street
New York, New York 10019-6188
Please note: it isn’t about the money. The funds exist to make this preservation happen! If the preservation is done, future audiences will be able to enjoy these shows. If it isn’t, they’ll be lost.
There’s a Facebook page for this as well, at
http://www.facebook.com/pages/Tell-Les-Moonves-to-preserve-The-Jack-Benny-Benny-Program-masters/287864780538. If you’re on Facebook, please join us there.
As you probably know, there are quite a few tools out there that attempt to “score” Twitter users. I’ve looked at most of them, and I have yet to find one that does everything. But the one that’s the most flexible, customizable and useful to me as a micro-blogger is Twitalyzer 2.0.
Twitalyzer is the brainchild of Eric T. Peterson (@erictpeterson), a noted web analytics expert and author of Web Analytics Demystified: A Marketer’s Guide to Understanding How Your Web Site Affects Your Business . Eric brings a passion for analytics and an understanding of the need for actionable metrics and reports to the Twitter scoring arena, something I haven’t seen in any other tool.
What’s new in 2.0? Quite a bit. There are more metrics, a 51-page handbook, tools for segmentation of users, benchmarks, goals, sentiment analysis, and, of course, more of the flexible dashboards and reporting that set Twitalyzer 1.0 apart from the other Twitter scoring tools. I counted 15 separate reports, and I probably missed some. You can plot trends for 22 separate metrics over time.
The two things I liked the most about Twitalyzer 1.0 were:
- All of the metrics were defined. You could see what was being counted and what those counts meant.
- There were clear recommendations on how to improve your scores.
Twitalyzer 2.0 has kept that. There are many more metrics, but they are still all defined. And the recommendations are still there, along with a new “Goals” report that allows you to set goals and track your progress towards them.
But in my view, the most important new feature of Twitalyzer 2.0 is the Segmentation / Tagging functionality. I’m still learning how to use this, but the examples in the handbook are very well written, and it’s clearly a vital part of any analytics tool set.
How does Twitalyzer compare with the other Twitter scoring tools? There are two others I’ve used in depth, TwitterGrader and Klout. TwitterGrader reports only a single score, and there is no definition of how that score is derived or what actions one should take to improve it. Klout has a few reports, a number of metrics and recommendations for how to improve them, but the Klout reports seem to be full of old data, and it can take hours for them to update your results. And I didn’t see anything like Twitalyzer’s segmentation capability.
There are a few things that could be improved.
- Location: Twitalyzer maintains separate lists for all “spellings” of a locality. For example, there are separate lists for “Portland, OR”, “Portland, Oregon” and “Portland, Oregon, USA”. Twitalyzer isn’t the only tool that suffers from this – TwitterGrader does too, and many tools don’t do location-based analytics at all. But it would be fairly easy to combine most of the spellings and misspellings of a given metropolitan area like Portland / Vancouver into a single location, using a combination of Twitter Search and the Google Maps Geocoding API.
- CSV export of metrics time series: Twitalyzer can export a single time series to CSV format now in the “Trends” menu. But there are 22 or so metrics; a combined CSV file of all of them would be very useful, especially for someone like me who wants to correlate Twitter metrics with other metrics, campaigns, events, and so on.
- I’d like to be able to integrate Twitalyzer data with the Clicky web analytics tools. There is Google Analytics integration now, but I’m not sure I’m going to stay with Google Analytics, even though it’s free and an “industry standard.” Clicky is real-time; Google Analytics isn’t.
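To make the Location suggestion above a bit more concrete, here is a minimal Python sketch of merging the different spellings of a locality into one list. It is only an illustration under stated assumptions: the lookup table `CANONICAL_PLACES` and the functions `clean`, `lookup_place` and `group_by_place` are hypothetical names of my own, and the table stands in for the geocoding step. It does not call Twitter Search or the Google Maps Geocoding API, and it says nothing about how Twitalyzer works internally.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical stand-in for a geocoding service: maps a cleaned-up
# location string to one canonical place name. A real version would
# query a geocoder instead of using this hand-written table.
CANONICAL_PLACES: Dict[str, str] = {
    "portland or": "Portland / Vancouver metro",
    "portland oregon": "Portland / Vancouver metro",
    "portland oregon usa": "Portland / Vancouver metro",
    "vancouver wa": "Portland / Vancouver metro",
}


def clean(location: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in location.lower()
    )
    return " ".join(kept.split())


def lookup_place(location: str) -> Optional[str]:
    """Return the canonical place for a location string, or None if unknown."""
    return CANONICAL_PLACES.get(clean(location))


def group_by_place(
    locations: Iterable[str],
    geocode: Callable[[str], Optional[str]] = lookup_place,
) -> Dict[str, List[str]]:
    """Group raw location strings under their canonical place.

    Strings the geocoder does not recognize keep their own (stripped)
    spelling as the group name, so nothing is silently dropped.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in locations:
        place = geocode(raw) or raw.strip()
        groups[place].append(raw)
    return dict(groups)


if __name__ == "__main__":
    profile_locations = [
        "Portland, OR",
        "Portland, Oregon",
        "Portland, Oregon, USA",
        "Seattle, WA",
    ]
    for place, spellings in group_by_place(profile_locations).items():
        print(place, "<-", spellings)
```

The `geocode` parameter is left pluggable so that the table lookup could be swapped for a call to a real geocoding service without touching the grouping logic.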
Actually, now that I think about it, the birds in the PDX area don’t fly South for the winter. If the weather gets really bad, they just get a motel on the Oregon Coast. But seriously, folks, as you may know, I had four WordPress blogs, a Blogger blog, a Posterous blog and also have a LinkedIn page and a Facebook page. And I tweet. A lot. More than any other Portlander, I think.
Something had to give – and it was five of the six blogs. I’ve left the Posterous blog on line – I’m still moving posts from it to this one. And I’ve left http://borasky-research.net/smart-at-znmeb on line, because there are still some links to it out there. But I’ve imported all of its posts, comments and pages here.
I’ve imported the Blogger blog here and deleted it. And I’ve imported the Linux Capacity Planning blog and the AlgoCompSynth blog posts, comments and pages here as well and deleted them. So this is it – my only blog! Accept no substitutes!
Other changes:
- I’ve switched from the Carrington theme to the Atahualpa theme. I went through a lot of themes in the process, and in the end, it was the fact that I could get a three-column blog easily and the nature images at the top that led me to this one. It’s amazingly flexible and powerful.
- I’ve integrated IntenseDebate commenting. This means you can log in with openID, Twitter or Facebook to comment, and your comments everywhere in the IntenseDebate world can be synchronized. I didn’t really do any investigation of IntenseDebate vs. Disqus as a comment management platform – there’s a popular WordPress plugin for IntenseDebate, so I went with it.
- Each post has buttons at the bottom so you can post it on Twitter, Facebook, Delicious, Reddit, Digg, StumbleUpon and a few other places. Each post also has a TweetMeme retweet button.
The R programming language was featured about a year ago in a New York Times article (http://bit.ly/iaqQ). I’ve been an R user since 2000, so I’ve collected some resources for people who want to get started with R.
The first place to start is the R Project web site at http://www.r-project.org/. Next, you’ll actually want to install R itself. There are several options, depending on your environment.
- Linux
  - Using your distro’s native packages. Most Linux distros either have R available in the base repositories or have it available from external repositories. The advantage of this is that it will be integrated with your package management system. The disadvantages are that you may not get the latest version of R, and there is no uniformity between distros about how R itself is named or how many R libraries are packaged.
  - Download a package from the Comprehensive R Archive Network (CRAN). Select a mirror at http://cran.r-project.org/mirrors.html. Then follow the “Linux” link at the top. That will give you packages for Ubuntu, Debian, Suse and Red Hat. Red Hat includes Red Hat Enterprise 4 and 5 plus Fedora. Suse includes both the SUSE Linux Enterprise and openSUSE versions.
  - Build from source. Instructions for doing this are at http://cran.fhcrc.org/doc/manuals/R-admin.html
- Windows or MacOS X
  - Select a mirror at http://cran.r-project.org/mirrors.html.
  - Follow the Windows or MacOS X link in the top panel, just under the Linux link.
  - On Windows, follow the “base” link and download “R-2.10.1-win32.exe”. It’s a standard Windows installer, which you just run.
  - On MacOS X, download and install “R-2.10.1.dmg”
I usually build R from source on my Linux machines. Once you’ve got R installed, you should have most of the documentation. But everything is also available on line at http://cran.r-project.org/manuals.html. You’ll definitely want to read the Introduction at http://cran.r-project.org/doc/manuals/R-intro.html and the FAQ at http://cran.r-project.org/faqs.html.
Here are a few books on R and statistics / data visualization:
Data Visualization and R Programming Books