What the Heck is Happening to Internet Explorer?

If you’re a web analytics aficionado, you know that most analytics tools, including Clicky, give you statistics on which browsers your visitors are using. Through the magic of Clicky’s real-time analysis, you can see this for my web site for the past 30 days. I’m opening this blog post up for comments – the question is, “What the Heck is Happening to Internet Explorer?”

Eight R Video Tutorials on VCASMO

Thanks to Drew Conway (@drewconway), a PhD student at New York University, there are now eight excellent video tutorials on using the R language up on VCASMO. I think I should do a blog post about how magical VCASMO is and why they should be eating SlideShare’s lunch, but for now, I’ll just say:

  • The presentations there are synchronized video and slides, and
  • They have an API.

The eight tutorials are:

  1. MATLAB/R Dictionary (Rosetta Stone Talk – 1/3)
    Presentation given by Harlan Harris at the NYC R Statistical Meetup on January 7, 2010.
  2. Learning R via Python…or the other way around (Rosetta Stone Talk – 2/3)
    Presentation given by Drew Conway at the NYC R Statistical Meetup on January 7, 2010.
  3. Data munging with SQL and R (Rosetta Stone Talk – 3/3)
    Presentation given by Josh Reich at the NYC R Statistical Meetup on January 7, 2010.
  4. Use Rapache: It Works!
    Presentation given at the Bay Area useR Group on January 10, 2010 by Jeff Horner, creator of the Rapache module, on “R-driven web applications”.
  5. Web Development with R
    Presentation given at the Bay Area useR Group on January 10, 2010 by Jeroen Ooms, on how to create web applications using R.
  6. Social Network Analysis in R
    Presentation by Drew Conway on August 6, 2009 at the NYC R Statistical Programming Meetup on how to perform basic social network analysis in R using the igraph package.
  7. Introduction to the Grammar of Graphics with ggplot2 in R
    A detailed introduction to the Grammar of Graphics as implemented in R with the data visualization library ggplot2. This talk was given by Harlan Harris to the NYC R Statistical Meetup on December 3, 2009.
  8. Visualizing Data in R with ggplot2
    Drew Conway presents a brief talk on how to visualize data in R with ggplot2 at the NYC R Statistical Meetup on December 3, 2009.

In case you missed them, here are Amazon and Powell’s links for the ggplot2 book:

ggplot2: Elegant Graphics for Data Analysis (Use R)

Three Must-Have Books on Data Visualization

Prerequisite Software

To get the most out of these books, you will need to install some software. You will need Mondrian, R, GGobi, and the ggplot2, rggobi and DescribeDisplay R packages. All of these will run on a Windows, Macintosh or Linux desktop / laptop, including most netbooks. And they are all free, open source software.

ggplot2: Elegant Graphics for Data Analysis (Use R)

ggplot2 is an advanced graphics package for the R programming language. It is based on the grammar of graphics, described in the book The Grammar of Graphics (Statistics and Computing). ggplot2 generates the most beautiful static graphics I’ve ever seen. You can use ggplot2 at any stage of your analysis. Exploratory plots can be made with a single call to the “qplot” function, and when you’re ready to create a final report or presentation, you can get publication-quality graphics.

The two things I like most about the ggplot2 package are

  • The absolutely stunning visual appeal of the plots it produces: Dr. Hadley Wickham, the package’s author, has paid great attention to the visual aspects of the output. I don’t know of another package in any language that generates such beautiful plots.
  • The numerous built-in analysis methods: Boxplots, kernel and quantile regression and smoothing, faceted plots – all are “standard equipment” with ggplot2.

Interactive Graphics for Data Analysis: Principles and Examples (Chapman & Hall/CRC Computer Science & Data Analysis)

This book is a complete course in interactive graphics for data analysis. It is mostly based on the Mondrian interactive statistical data visualization system, although there is some use of R as well. The first part covers the basic tools, and the second part gives case studies.

The case studies really are the best part of the book. They cover geographical analysis, some interesting history from the sinking of the Titanic and the 2004 Florida election. As I note below, there is some overlap in tools between Mondrian and GGobi, but you really need both books and both packages to be able to do everything.

Interactive and Dynamic Graphics for Data Analysis: With R and GGobi (Use R)

As the title implies, this book is also a complete course in data analysis using interactive graphics. But the focus here is on R and GGobi rather than Mondrian. While there is some overlap in the tools, there are some things Mondrian does that GGobi doesn’t do, and vice versa. A partial list:

  • Geographic datasets: Mondrian only
  • Mosaic plots: Mondrian only
  • Classification: GGobi only
  • Clustering: GGobi only
  • Social network graphs: GGobi only

In addition, GGobi integrates directly with R and ggplot2 via the rggobi and DescribeDisplay packages. There are some integration points between R and Mondrian, but that integration isn’t as tight as the integration among R, GGobi and ggplot2.

Tweak-the-Tweet

As many of you know, I’ve been doing a lot of research into social media analytics, especially algorithms for text analysis of Twitter data. My focus has been on what the machine learning people call unsupervised learning. Why? Because I’ve come to the realization that tweets are an evolving language. They’re really a meta-language – a tweet could be about a web page, a blog, a picture, etc.

Tweets often aren’t complete sentences or other linguistic constructs as we know them. Twitter has @replies, hashtags, retweets, “follow friday”, trending topics, link shorteners, and a number of other new linguistic constructs that don’t appear in the natural human languages we use in everyday conversation.
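
To make those constructs concrete, here is a minimal Python sketch that pulls @mentions, hashtags, links and retweet markers out of a tweet's text. The regular expressions are my own simplified assumptions, not Twitter's official rules, and the function name extract_constructs is made up for this example.

```python
import re

# Simplified, assumed patterns; Twitter's real entity rules are more involved.
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
URL = re.compile(r"https?://\S+")
RETWEET = re.compile(r"^RT\s+@(\w{1,15})", re.IGNORECASE)


def extract_constructs(tweet: str) -> dict:
    """Pull Twitter-specific constructs out of a tweet's text."""
    urls = URL.findall(tweet)
    # Blank out URLs first so a '#fragment' inside a link is not counted as a hashtag.
    text = URL.sub(" ", tweet)
    retweet = RETWEET.match(text)
    return {
        "mentions": MENTION.findall(text),
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "urls": urls,
        "retweet_of": retweet.group(1) if retweet else None,
        "is_reply": text.startswith("@"),
    }


if __name__ == "__main__":
    sample = "RT @drewconway: New R tutorials http://example.com/abc #rstats #FollowFriday"
    print(extract_constructs(sample))
```

Counts of these constructs per tweet could then serve as features for the kind of unsupervised analysis mentioned above, alongside the ordinary words.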

There is now a project called Tweak-the-Tweet, TtT for short. TtT is a project of the University of Colorado at Boulder. Here’s the news release: CU Grad Student’s ‘Tweet’ Approach Streamlines Online Communications During Haiti Disaster.

As the story notes, Tweak-the-Tweet is helping Haiti relief efforts by providing standardized syntax for Twitter communications. I think this is very important. You can think of TtT as the Twitter equivalent of the telegraph and ham radio’s Morse code, or police and citizens band’s “10-codes.” It’s a way of conveying a lot of information in a small 140-character space. ReadWriteWeb covered the story here:

Tweak the Tweet: New Twitter Hashtag Syntax for Sharing Information During Catastrophes
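
To show why a standardized syntax matters to software, here is a rough Python sketch of a parser for a hashtag-delimited tweet. The tag names in FIELD_TAGS and the sample tweet are illustrative assumptions on my part; the project's own pages define the actual vocabulary.

```python
import re

# Hypothetical field tags, for illustration only; not the project's official list.
FIELD_TAGS = {"need", "offer", "loc", "contact", "info"}

TOKEN = re.compile(r"#(\w+)")


def parse_structured_tweet(tweet: str) -> dict:
    """Map each known field tag to the text that follows it, up to the next hashtag."""
    fields = {}
    matches = list(TOKEN.finditer(tweet))
    for i, match in enumerate(matches):
        tag = match.group(1).lower()
        # The value runs from the end of this tag to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(tweet)
        value = tweet[match.end():end].strip()
        if tag in FIELD_TAGS:
            fields[tag] = value
        else:
            # Any other hashtag is kept as a plain topic marker.
            fields.setdefault("topics", []).append(tag)
    return fields


if __name__ == "__main__":
    print(parse_structured_tweet(
        "#haiti #need water and tents #loc Jacmel #contact @example_user"
    ))
```

Once tweets follow a predictable layout like this, a few lines of code can turn them into records that can be sorted, mapped or matched, which is much harder with free-form text.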

Here’s the main page for the project: HELPING HAITI: TWEAK the TWEET (TtT). You can follow them on Twitter and visit the project wiki. One of the projects we’ll be working on at this weekend’s CrisisCampPDX is Tweak-the-Tweet, so I’ll be posting an update next week. Meanwhile, I urge anyone who can to help out this worthwhile effort.

CrisisCamp Coming to Portland!

I’ve just learned that CrisisCamp will be starting up a Portland session this weekend. Here’s the EventBrite listing if you’re in or near Portland, Oregon:

http://crisiscamphaitipdx.eventbrite.com/

What is CrisisCamp? “This Saturday, (and Sunday if there’s interest) CrisisCamp will bring together volunteers to collaborate on technology projects which aim to assist in Haiti’s relief efforts by providing data, information, maps and technical assistance to NGOs, relief agencies and the public.”

Projects include:

Port Au Prince Basemap

We Have, We Need Exchange

Languages and Translation

Mobile Applications 4 Crisis Response

NPR’s Crisis Wiki

Family Reunification Systems

Tweak the Tweet

I’ll be there — I’m hoping to help out with Tweak the Tweet.

Please join me if you can – this event is free and open to the public. You don’t have to be technical to volunteer time. There will be projects that can be done by anybody who has used Google.

Please Help Save The Jack Benny Masters!

I haven’t posted personal stuff on my blogs very often, even when I had six of them. But I learned something today that I feel I have to write about. I sometimes joke that my Twitter screen name, “znmeb”, is my initials, “meb”, preceded by the initials of my two childhood heroes, Zorro and Captain Nemo. In actual fact, though, my two childhood heroes were my father, a biochemist, and Jack Benny.

I’ve been able to pick up parts of the story but not the complete history. The brief version is this: CBS discovered some master recordings of Jack Benny programs ranging from 1952 through 1961. The complete list is here. The International Jack Benny Fan Club offered to pay to have them digitized, thus preserving them from the inevitable decay that old media will suffer.

There were some lengthy negotiations, including a letter from Benny’s estate allowing the release of the masters. However, CBS has decided that they will not release the masters to the International Jack Benny Fan Club for digital preservation. If this decision is not reversed, the media will decay in storage and this part of America’s history will be lost.

Here are a few more links I’ve been able to track down on Twitter:

http://www.nypost.com/p/entertainment/tv/cbs_benches_jack_benny_n7b76Pou9inaGa1S9JqDxM

http://www.tvweek.com/blogs/tvbizwire/2010/01/cbs-nixes-unearthed-jack-benny.php

http://voiceactors.wordpress.com/2010/01/19/jack-benny-program/

http://overlawyered.com/2010/01/cant-clear-the-copyrights-contd/

http://www.boingboing.net/2010/01/18/cbs-uncovers-rare-ja.html

http://themoderatevoice.com/59065/killing-comedic-heritage-cbs-reportedly-seals-some-classic-jack-benny-show-comedy-masters/

I can’t imagine what my youth would have been like without Jack Benny. And I don’t think it’s right or even good business for CBS to withhold these broadcasts from preservation and future audiences. If you feel that way too, please join me in sending an email to

http://www.cbs.com/info/user_services/fb_global_form.php

or send a snail mail to

Sumner Redstone, Executive Chairman
Leslie Moonves, President and CEO

51 West 52 Street
New York, New York 10019-6188

Please note: it isn’t about the money. The funds exist to make this preservation happen! If the preservation is done, future audiences will be able to enjoy these shows. If it isn’t, they’ll be lost.

There’s a Facebook page for this as well, at

http://www.facebook.com/pages/Tell-Les-Moonves-to-preserve-The-Jack-Benny-Benny-Program-masters/287864780538. If you’re on Facebook, please join us there.

Twitalyzer 2.0 - A Big Step Forward

As you probably know, there are quite a few tools out there that attempt to “score” Twitter users. I’ve looked at most of them, and I have yet to find one that does everything. But the one that’s the most flexible, customizable and useful to me as a micro-blogger is Twitalyzer 2.0.

Twitalyzer is the brainchild of Eric T. Peterson (@erictpeterson), a noted web analytics expert and author of Web Analytics Demystified: A Marketer’s Guide to Understanding How Your Web Site Affects Your Business. Eric brings a passion for analytics and an understanding of the need for actionable metrics and reports to the Twitter scoring arena, something I haven’t seen in any other tool.

What’s new in 2.0? Quite a bit. There are more metrics, a 51-page handbook, tools for segmentation of users, benchmarks, goals, sentiment analysis, and, of course, more of the flexible dashboards and reporting that set Twitalyzer 1.0 apart from the other Twitter scoring tools. I counted 15 separate reports, and I probably missed some. You can plot trends for 22 separate metrics over time.

The two things I liked the most about Twitalyzer 1.0 were:

  1. All of the metrics were defined. You could see what was being counted and what those counts meant.
  2. There were clear recommendations on how to improve your scores.

Twitalyzer 2.0 has kept that. There are many more metrics, but they are still all defined. And the recommendations are still there, along with a new “Goals” report that allows you to set goals and track your progress towards them.

But in my view, the most important new feature of Twitalyzer 2.0 is the Segmentation / Tagging functionality. I’m still learning how to use this, but the examples in the handbook are very well written, and it’s clearly a vital part of any analytics tool set.

How does Twitalyzer compare with the other Twitter scoring tools? There are two others I’ve used in depth, TwitterGrader and Klout. TwitterGrader reports only a single score, and there is no definition of how that score is derived or what actions one should take to improve it. Klout has a few reports, a number of metrics and recommendations for how to improve them, but the Klout reports seem to be full of old data, and it can take hours for them to update your results. And I didn’t see anything like Twitalyzer’s segmentation capability.

There are a few things that could be improved.

  1. Location: Twitalyzer maintains separate lists for all “spellings” of a locality. For example, there are separate lists for “Portland, OR”, “Portland, Oregon” and “Portland, Oregon, USA”. Twitalyzer isn’t the only tool that suffers from this – TwitterGrader does too, and many tools don’t do location-based analytics at all. But it would be fairly easy to combine most of the spellings and misspellings of a given metropolitan area like Portland / Vancouver into a single location, using a combination of Twitter Search and the Google Maps Geocoding API.
  2. CSV export of metrics time series: Twitalyzer can export a single time series to CSV format now in the “Trends” menu. But there are 22 or so metrics; a combined CSV file of all of them would be very useful, especially for someone like me who wants to correlate Twitter metrics with other metrics, campaigns, events, and so on.
  3. I’d like to be able to integrate Twitalyzer data with the Clicky web analytics tools. There is Google Analytics integration now, but I’m not sure I’m going to stay with Google Analytics, even though it’s free and an “industry standard.” Clicky is real-time; Google Analytics isn’t.
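
As a rough illustration of the first suggestion, here is a Python sketch that collapses several spellings of a locality into one key. It uses a tiny hand-written alias table rather than Twitter Search or a geocoding service, so STATE_NAMES, canonical_location and group_by_location are all assumptions made for the example, not anything Twitalyzer provides.

```python
import re

# Hypothetical alias table; a real one might be built from geocoder results.
STATE_NAMES = {"or": "oregon", "wa": "washington"}
TRAILING_COUNTRY = re.compile(r",\s*(usa|us|united states)$")


def canonical_location(raw: str) -> str:
    """Reduce common spellings of 'City, State[, Country]' to a single key."""
    text = re.sub(r"\s+", " ", raw.strip().lower())
    text = TRAILING_COUNTRY.sub("", text)
    parts = [part.strip() for part in text.split(",") if part.strip()]
    if len(parts) >= 2:
        # Expand a state abbreviation if we know it; otherwise keep it as typed.
        parts[1] = STATE_NAMES.get(parts[1], parts[1])
    return ", ".join(parts)


def group_by_location(profiles: dict) -> dict:
    """Group user names by canonical location; `profiles` maps user -> raw location."""
    groups = {}
    for user, raw in profiles.items():
        groups.setdefault(canonical_location(raw), []).append(user)
    return groups


if __name__ == "__main__":
    print(group_by_location({
        "user_a": "Portland, OR",
        "user_b": "Portland, Oregon",
        "user_c": "Portland, Oregon, USA",
    }))
```

Misspellings and metropolitan-area merges such as Portland / Vancouver would still need a geocoder or a larger alias table; this only handles the easy cases.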

Twitter Search Changes – What Do They Mean?

As you probably recall, Twitter released the Streaming API into production on January 5, 2010. My in-depth analysis is here. Six days later, Twitter made the following announcement:

Search API: High-Volume and Repeated Queries Should Migrate to Streaming API

What does this mean? Well, if you’re a vendor of Twitter monitoring tools that depend on Twitter Search for your input data, you should be working on this now. Why?

As I note in my analysis of the streaming API, there is an extra relevance and ranking filter between the raw tweets coming into Twitter and the tweets that get indexed for Twitter Search. As a result of this extra filter, there are more tweets available to users of the Streaming API than there are to users of the Search API.

I haven’t found any details on the current relevance and ranking filters, where they are going, or how fast they’re planning to get where they’re going. But the message is clear:

“This transition begins a fundamental shift towards a high value, high result quality, lower query volume Search API.” And in a thread on the Twitter API Developers’ Google Group, the author of that announcement, John Kalucki, added, “Both Search and Streaming discard all statuses from low-quality users. Search additionally filters the remaining statuses for relevance and ranking purposes. This may be hard to see now, unless you cross-reference the Streaming results, but this divergence will soon accelerate and become more obvious.”
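
The cross-referencing mentioned in that quote can be sketched in a few lines of Python. This assumes you have already collected status IDs for the same query and time window from both APIs; the collection step is not shown, and the function name cross_reference is my own.

```python
def cross_reference(streaming_ids, search_ids):
    """Compare status IDs seen via the Streaming API with those returned by Search."""
    streaming = set(streaming_ids)
    search = set(search_ids)
    in_both = streaming & search
    return {
        "in_both": in_both,
        # Seen on the stream but missing from Search: candidates for Search's extra filter.
        "streaming_only": streaming - search,
        # Seen in Search only: most likely gaps in our own streaming collection.
        "search_only": search - streaming,
        # Fraction of streamed statuses that Search also returned.
        "search_coverage": len(in_both) / len(streaming) if streaming else 0.0,
    }


if __name__ == "__main__":
    report = cross_reference([101, 102, 103, 104], [102, 104, 105])
    print(report)
```

Tracking the coverage figure over several days would be one way to watch for the divergence the announcement predicts.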

I’m already seeing some evidence of this on the Twitter Developers’ Google Group. Specifically, see this thread. The message to us developers is, “Get serious with the Streaming API.” But what about end users?

Certainly, if you’re using monitoring tools that depend on Twitter Search, you should be asking the vendors about this. But if you’re just using Twitter Search for your own purposes, I think this is very good news. Personally, I recommend using Advanced Twitter Search at http://search.twitter.com/advanced. Again, from the announcement:

“Shifting the heaviest users away from Search should dramatically improve the overall Search experience. Resources can be allocated to the search architecture’s strength: historical, complex and high value queries.”

The geese are migrating, and so have I – Part Deux

Actually, now that I think about it, the birds in the PDX area don’t fly south for the winter. If the weather gets really bad, they just get a motel on the Oregon Coast. But seriously, folks, as you may know, I had four WordPress blogs, a Blogger blog, a Posterous blog and also have a LinkedIn page and a Facebook page. And I tweet. A lot. More than any other Portlander, I think.

Something had to give – and it was five of the six blogs. I’ve left the Posterous blog on line – I’m still moving posts from it to this one. And I’ve left http://borasky-research.net/smart-at-znmeb on line, because there are still some links to it out there. But I’ve imported all of its posts, comments and pages here.

I’ve imported the Blogger blog here and deleted it. And I’ve imported the Linux Capacity Planning blog and the AlgoCompSynth blog posts, comments and pages here as well and deleted them. So this is it – my only blog! Accept no substitutes!

Other changes:

  1. I’ve switched from the Carrington theme to the Atahualpa theme. I went through a lot of themes in the process, and in the end, it was the fact that I could get a three-column blog easily and the nature images at the top that led me to this one. It’s amazingly flexible and powerful.
  2. I’ve integrated IntenseDebate commenting. This means you can log in with openID, Twitter or Facebook to comment, and your comments everywhere in the IntenseDebate world can be synchronized. I didn’t really do any investigation of IntenseDebate vs. Disqus as a comment management platform – there’s a popular WordPress plugin for IntenseDebate, so I went with it.
  3. Each post has buttons at the bottom so you can post it on Twitter, Facebook, Delicious, Reddit, Digg, StumbleUpon and a few other places. Each post also has a TweetMeme retweet button.

Getting Started with the R Programming Language

The R programming language was featured about a year ago in a New York Times article (http://bit.ly/iaqQ). I’ve been an R user since 2000, so I’ve collected some resources for people who want to get started with R.

The first place to start is the R Project web site at http://www.r-project.org/. Next, you’ll actually want to install R itself. There are several options, depending on your environment.

  • Linux
    • Using your distro’s native packages. Most Linux distros either have R available in the base repositories or have it available from external repositories. The advantage of this is that it will be integrated with your package management system. The disadvantages are that you may not get the latest version of R, and there is no uniformity between distros about how R itself is named or how many R libraries are packaged.
    • Download a package from the Comprehensive R Archive Network (CRAN). Select a mirror at http://cran.r-project.org/mirrors.html. Then follow the “Linux” link at the top. That will give you packages for Ubuntu, Debian, Suse and Red Hat. Red Hat includes Red Hat Enterprise 4 and 5 plus Fedora. Suse includes both the SUSE Linux Enterprise and openSUSE versions.
    • Build from source. Instructions for doing this are at http://cran.fhcrc.org/doc/manuals/R-admin.html
  • Windows or MacOS X
    • Select a mirror at http://cran.r-project.org/mirrors.html.
    • Follow the Windows or MacOS X link in the top panel, just under the Linux link.
      • On Windows, follow the “base” link and download “R-2.10.1-win32.exe”. It’s a standard Windows installer, which you just run.
      • On MacOS X, download and install “R-2.10.1.dmg”

I usually build R from source on my Linux machines. Once you’ve got R installed, you should have most of the documentation. But everything is also available on line at http://cran.r-project.org/manuals.html. You’ll definitely want to read the Introduction at http://cran.r-project.org/doc/manuals/R-intro.html and the FAQ at http://cran.r-project.org/faqs.html.

Here are a few books on R and statistics / data visualization:

Data Visualization and R Programming Books
