
Data Journalism Developer Studio 2012LX Blog

 

Updated 2011-07-31:

1. I’ve turned off One True Fan, most likely permanently, because it was confusing some Highlighter users.

2. I’ve turned off Disqus and WordPress commenting as well, though this might be temporary. For the moment, I want to test out Highlighter as the main/only method of discussion.

3. I haven’t found a list of all sites using Highlighter yet, but I can recommend two run by friends of mine, Audrey Watters (@audreywatters) and Michelle Rae Anderson (@mediaChick):


At O’Reilly’s miniTOC Portland conference yesterday (hashtag #TOCPDX), Josh Mullineaux of Highlighter.com presented a brief overview of a new tool for websites, called Highlighter. I’ve enabled Highlighter on this site. There’s a video on the Highlighter home page, but here’s how it works:

1.  Highlight any text or image on the site with the mouse / trackpad. You will get a three-option menu: “Save This”, “Share It” and “Comment”.

2. The “Save This” option saves the highlighted content in your Highlighter profile. The “Share It” option lets you share the highlighted content on Facebook, Twitter, or in an email.

3. The “Comment” option is a little more interesting, and I think this is the awesome part of Highlighter. You can comment on the highlighted content, and your comment is sent to the website owner, me in this case, for moderation. If the owner approves, the comment is posted and any Highlighter subscriber can see it and join in the discussion.

There’s a good bit more to this:

  • Highlighter subscribers can follow each other’s streams, just like on Twitter or Facebook. You can think of it as a social network for publishers and their readers. You can join Highlighter here: http://highlighter.com/register/
  • For the publisher, there are detailed analytics about how your readers are engaging with your site.
  • Subscribers have a profile page, which I’ve linked to my Twitter and LinkedIn profiles. Mine is http://highlighter.com/znmeb/
The installation is simple – in my case, I’ve simply installed a WordPress plugin. But any web site where you can install JavaScript in the page footer can use Highlighter. And even if you aren’t a publisher, you can subscribe to Highlighter and join in the discussion. I love it!
 
Goodbye and thanks for all the animated GIFs http://twitpic.com/5p6pn1
@znmeb
M. Edward Borasky

 

I’m really conflicted about this. On the one hand, I know Twitter needs to sell advertising, and web services need to promote themselves. And yes, this is a real news event, not a manufactured story. But I wonder – are we heading back to the days of “Yellow Journalism” in the tweet stream? Please comment below.

 

According to Mashable’s “Kraft Looks to Reward Twitter Users Who Tweet About Mac & Cheese”:

Under a new program quietly rolled out over the past few weeks, any time two people individually use the phrase “mac & cheese” in a tweet, they’ll each get a link pointing out the “Mac & Jinx.” The first one to click the link and give Kraft his or her address gets five free boxes of Kraft’s mac and cheese and a T-shirt.

It seems that “Mac & Cheese” is now a Trending Topic, as of 2011-03-08 19:12 UTC. But when you click on the topic, you see this Promoted Tweet:

What could be worse? Alyssa Milano, who has 1,403,372 followers, posted this tweet:

This could get interesting. 

Update: it has gotten interesting. @WootLive has gotten into the act.

Update: FriendsEAT has tweeted about capacity issues stemming from their article.

Oh, by the way — the Kraft campaign that started this whole thing is being run by Crispin Porter + Bogusky. Does that name sound familiar? It’s the same agency that came up with the GroupOn Super Bowl ads about Tibet and seafood curry.

 

Update 2011-03-20

For a variety of reasons, I have replaced the Social Media Analytics Research Toolkit, Code Like A Pirate and Project Kipling with a new, modular appliance called the Data Journalism Developer Studio. All of the software found in those three appliances can be installed via scripts provided in the new appliance. Links:


Upon careful reading of Twitter’s API Terms of Service, I have decided to temporarily remove two appliances from the SUSE Studio Gallery. Those two appliances are the Social Media Analytics Research Toolkit (SMART@znmeb) and Project Kipling Real-Time Data Journalism Tools. I do intend to put them back on line at some point in the future, but I do not at this time know when they will be back, because I haven’t determined the scope of required changes to the appliances or their marketing materials. Why? These two appliances may be in violation of item 4.A. below:

4. You will not attempt or encourage others to:

A. sell, rent, lease, sublicense, redistribute, or syndicate the Twitter API or Twitter Content to any third party for such party to develop additional products or services without prior written approval from Twitter;

B. remove or alter any proprietary notices or marks on the Twitter API or Twitter Content;

C. use or access the Twitter API for purposes of monitoring the availability, performance, or functionality of any of Twitter’s products and services or for any other benchmarking or competitive purposes; or

D. use Twitter Marks as part of the name of your company or Service, or in any product, service, or logos created by you. You may not use Twitter Marks in a manner that creates a sense of endorsement, sponsorship, or false association with Twitter. All use of Twitter Marks, and all goodwill arising out of such use, will inure to Twitter’s benefit.

E. use or access the Twitter API to aggregate, cache (except as part of a Tweet), or store place and other geographic location information contained in Twitter Content.

While I don’t encourage people to redistribute Twitter data, the appliances do have the ability to collect Twitter data and I can’t prevent users from redistributing it. I want to emphasize that Twitter has not asked me to take these appliances down! I don’t know that they violate the letter of item 4.A., but I think they violate the spirit of that clause, so I am removing them until I can determine in what form they are viable products.

 

HootSuite - Social Media Dashboard

Full disclosure – I am now a HootSuite Pro Affiliate. Now – the rest of the story. As you no doubt recall, I’ve received a Google Cr-48 Chrome Notebook as part of the pilot program. So I need a Twitter management tool that works on the Cr-48, as well as on my other machines using the Chrome browser.

So I checked out all the free tools in the Chrome Web Store, including the free version of HootSuite. Besides HootSuite, there are Tweetdeck, Seesmic and a few other lesser-known tools. I tried the three main ones – Tweetdeck, Seesmic and HootSuite. First up was Tweetdeck. Tweetdeck is a very nice tool, but it is missing two features:

  1. It only connects to Twitter, Facebook, Buzz and Foursquare. I’m not on Facebook, Buzz or Foursquare, but I am on Twitter and LinkedIn. Tweetdeck doesn’t appear to connect to LinkedIn.
  2. Tweetdeck does not appear to be able to schedule posts.

In addition, Tweetdeck appears to only have the built-in light-type-on-dark-background theme. I found it difficult to use for long periods as a result.

Seesmic is quite a bit better. It connects to LinkedIn as well as Twitter, Facebook, Buzz and Foursquare. It has two themes, black on white and white on black. And as of last week, Seesmic allows scheduled posts. 

HootSuite, even in the free version, connects to Twitter, Facebook, Facebook Pages, LinkedIn, Ping.fm, WordPress.com blogs, MySpace, Foursquare and mixi. It doesn’t seem to connect to Buzz, however. And HootSuite allows scheduled posts. HootSuite has its own integrated link shortener, and with the free version you get 30 days of statistics history for link clicks. There’s another nice feature that apparently only HootSuite has – when you see a tweet in one of the streams, you can expand the whole conversation if there is one!

What you get with the Pro version, which is US$5.99 a month, is:

  • Unlimited statistics history
  • Google Analytics integration
  • Facebook insights
  • Influence scores
  • No advertising (although I display the Promoted Tweets)
  • Available membership in the affiliate program
  • Access to some training tools, called the HootSuite University

So, I signed up for HootSuite Pro and for the Affiliate Program. I haven’t explored the HootSuite University yet, but it looks interesting if you’re a community manager or other social media professional. If you’d like to sign up too, here’s a handy widget:

HootSuite - Social Media Dashboard

 


I’ve just released version 2.0.0 of the Social Media Analytics Research Toolkit. In addition to the usual updating of packages to the most recent versions, SMART@znmeb now has a sentiment analysis library! The library is an R package that I discovered on Twitter just today called textir. Textir is a “set of tools for inference about text and associated speaker/document sentiment,” created by Assistant Professor of Econometrics and Statistics and Robert L. Graves Faculty Fellow Matt Taddy of the University of Chicago Booth School of Business.

If you’re interested in the mathematics behind this package, Professor Taddy has posted a document to arXiv.org, titled “Inverse Regression for Analysis of Sentiment in Text.” Three sample problems and their solutions are described in the paper: ideology in political speeches, on-line restaurant reviews, and business news and stock performance. I’m excited to have this package available.

The political speech and restaurant review datasets are included with the library, but I couldn’t find the business news data set. I’m also adding the package to Project Kipling, but it will be a day or so before I get that build completed. I’ve been wanting a sentiment analysis capability in the appliances for quite some time, but haven’t been able to find an open source package until now.

One final note: 2.0.0 will be the last release of the larger appliance from 2010, Code Like A Pirate. Everything in that appliance and more can be found in Project Kipling, and there have been so few downloads of it that there’s no point in duplicating the effort and taking up the extra disk space. I’m going to leave the Open Virtualization Format (OVF) file up on SUSE Studio for Code Like A Pirate 2.0.0, but all the other builds will be removed and no more will be done.

 

Truthy

I’m sure you’ve all heard the phrase, “Gee, I wish I’d said that!” Well, when you craft software like I do, sometimes you run across something and say, “Gee, I wish I’d built that!” I’m talking about Truthy, a project at Indiana University.

Truthy is a web service that collects data from the Twitter “Gardenhose” Streaming API feed. “Gardenhose” is currently delivering a random sample of about 10% of the full public “Firehose” stream. Truthy scans the stream for “memes” – #hashtags, @mentions / @replies and URLs. Twitter calls these entities. When Truthy sees a significant change in traffic for a meme, it investigates further.
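To make the idea concrete, here is a minimal sketch of that kind of pipeline: pull hashtags, mentions and URLs out of tweet text, count them per time window, and flag the ones whose traffic jumps. This is not Truthy's code or algorithm. The regular expression, the function names and the `ratio` and `min_count` thresholds are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Simplified pattern (an assumption): hashtags, @mentions and http(s) URLs.
# Real tweets need more careful tokenization than this.
ENTITY_PATTERN = re.compile(r"#\w+|@\w+|https?://\S+")


def extract_entities(text):
    """Return the hashtags, mentions and URLs found in one tweet's text.

    Hashtags and mentions are lowercased; URLs are left as written.
    """
    entities = []
    for match in ENTITY_PATTERN.findall(text):
        entities.append(match if match.startswith("http") else match.lower())
    return entities


def count_entities(tweets):
    """Count how many tweets in a window mention each entity."""
    counts = Counter()
    for text in tweets:
        # Use a set so an entity repeated inside one tweet counts once.
        counts.update(set(extract_entities(text)))
    return counts


def find_spikes(previous, current, ratio=3.0, min_count=3):
    """Flag entities whose count grew by at least `ratio` between windows.

    `ratio` and `min_count` are hypothetical thresholds, not Truthy's.
    """
    spikes = []
    for entity, count in current.items():
        baseline = max(previous.get(entity, 0), 1)  # avoid dividing by zero
        if count >= min_count and count / baseline >= ratio:
            spikes.append(entity)
    return sorted(spikes)
```

You would call `count_entities` on the tweet texts from two consecutive time windows and pass both results to `find_spikes`. A production system would need a sturdier baseline than a single previous window.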

The goal of Truthy is to track memes about US politics and track the propagation of them in Twitter. To be more specific, Truthy looks for “astroturfing” and other misinformation campaigns and provides some stunning visualizations of how these campaigns are initiated and spread. From the FAQ:

“What does ‘Truthy’ mean?
“A truthy meme relies on deceptive tactics to represent misinformation as fact. The Truthy system uses ‘truthy’ to refer to activities such as political smear campaigns, astroturfing, and other social pollution.”

“What is astroturfing?
“Astroturfing denotes political, advertising, or public relations campaigns that are formally planned by an organization, but are disguised as spontaneous, popular ‘grassroots’ behavior. The term refers to AstroTurf, a brand of synthetic carpeting designed to look like natural grass.”

Truthy is fascinating to explore. Go to the Memes page and you’ll see the memes Truthy is tracking, sorted by network size. When I wrote this, “#p2” was the meme with the largest network. If you click on the network visualization, it enlarges and Truthy displays a panel of statistics. But the real point of Truthy is that you can watch the stream of tweets tagged with this meme go by in a widget on the right. As they go by, you can get a sense of what the people propagating the meme are saying, and press the “Truthy” button if you think a tweet satisfies the definition.

There’s another way to inject your opinion into the Truthy system. From the Science Friday archives:

“Want to participate in an mini-experiment? Tweet using the key words ‘#truthy, @scifri, and @truthyatindiana’ and the project leaders will try to track their spread.”

The architecture of Truthy is described on their “About” page. In addition to the Klatsch visualization and analysis of the spread of memes, Truthy also attempts to do sentiment analysis and classification on the tweets it collects. The sentiment analysis algorithm is described in:

Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena

The “About” page also links to a paper describing the mathematics of the analysis of meme propagation, based on data collected during the Brown – Coakley election in Massachusetts to fill the Senate seat of the late Senator Edward Kennedy.

From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search

When I discovered Truthy, I sent an email to the investigators, asking whether Truthy is open source. It is not open source at the present time, although most of the components listed in the FAQ are:

“What technology do you use?
“We use a variety of tools to bring you the Truthy service. The overall effort is directed using our own custom scripting language, which we call ‘Klatsch’. This language uses the Gephi Toolkit for graph layout. We also rely on a number of other open source tools, including Boost, Django, Google Chart Tools, ImageMagick, JQuery, MPlayer, MySQL, and the Twitter APIs. Our thanks to the authors of these tools for making our site possible! Finally we gratefully acknowledge CNETS and NSF for funding the computing infrastructure that hosts the Truthy service.”

So you won’t be seeing Truthy in the Social Media Analytics Research Toolkit any time soon. Personally, I think technologies like this have a great potential in social media marketing as well. Perhaps Twitter will incorporate something like this into their “Resonance” algorithm for their “Promoted” advertising model. And, of course, Twitter isn’t the only place where memes are injected and propagate.

Gee, I wish I’d built that!

 

 

Download “Getting Started with the Social Media Analytics Research Toolkit” (pdf, 1.25 megabytes)


Download the Social Media Analytics Research Toolkit


I’ve decided to do a weekly review post of the events I found notable on line during the week, complete with rants, snark and quite possibly even some math. Hey, math is what I do. ;-)

Angelgate

By now, you’ve probably heard the story – Michael Arrington posted on Techcrunch that he’d gotten wind of a secret meeting of “super angels” at Bin 38. He crashed the meeting, and reported that the super angels were allegedly “colluding” to “hold valuations down,” an act which is possibly felonious. There was a hashtag, more blog posts, tweets, a supposedly “leaked” private email, an apparent “feud” between two well-known Silicon Valley “angels”, Dave McClure and Ron Conway, and, of course, Hitler finding out about the whole thing.

It’s hard to find anything good in this story, and it’s even harder to find anything good on Techcrunch these days. We’ve been treated to a post by Sarah Lacy blaming teachers’ unions in “Why Our Schools Suck” and another one saying, “But from where I sit, it never felt much like a recession at all.” We’ve heard from Vivek Wadhwa about “Silicon Valley’s Dark Secret: It’s All About Age,” and Arrington saying “Too Few Women In Tech? Stop Blaming The Men.” There was a strange post from Arrington: “Blogging and Mass Psychomanipulation.” And finally, Arrington posted that “AngelGate won’t take over the TechCrunch Disrupt agenda.”

Great going, Arrington – throw out an allegation of Federal criminal behavior, get two Silicon Valley legends publicly at each other’s throats, post a private email from one and a deleted tweet from the other, threaten to post more private emails and then have the expectation that they’ll be on your stage at your conference bright and early Monday morning all unicorns and rainbows to “talk about how venture capitalists and angel investors can help entrepreneurs succeed.”

Mr. Arrington, I can’t imagine why either Ron Conway or Dave McClure would want to be up there with you. What does either of them have to gain? For that matter, what do entrepreneurs have to gain from attending Techcrunch Disrupt? They’ve heard all the advice already, they know the game, and most of them are too busy trying to become the one in 20 that succeeds to spend money on a plane ticket, a hotel room and a conference pass. The only reason they’d be there at all would be to “network” with the very people whose ethics and character you’ve slammed! Angelgate makes Silicon Valley look elitist, arrogant, vicious and out of touch – more out of touch with Main Street than Wall Street ever was.

President Obama said it best, I think:

“We understand exactly who and what got us into this mess. Now, we don’t mind cleaning it up… But don’t just stand there and say, ‘You’re not holding the mop right.’

“Instead of standing on the sidelines, why don’t you grab a mop? Help us clean up this mess and get America back on track!”

So I’ve stopped reading Techcrunch. I’m one of those people who are driven to be lifelong learners, and there’s nothing I can learn from Techcrunch any more. Techcrunch seems to have become one big game of “Let’s you and him fight.” It’s mostly rumor, innuendo and bait. And sadly, people are taking the bait. It’s propaganda for the most part, and the only good news is that critical thinking isn’t dead – the commenters are calling bullshit early and often.

If Silicon Valley is truly about helping entrepreneurs succeed, about “building what people want to buy”, creating jobs in America, and, yes, capitalism, lose the attitude that Silicon Valley is the best place – or even the only place, as I’ve heard – to build a startup. Because what I saw this week has convinced me that it’s the worst place. Silicon Valley is the last place I’d go to give birth to a business.

Twitter Analytics

At a sports marketing conference, Twitter announced that it would be releasing a real-time analytics dashboard. There hasn’t been much detail released about it, but the announcement that it would be “free” struck me as unlikely, considering the way it was described. I suspect that it’s not free, but bundled with purchases of Twitter advertising and other marketing services.

Why do I think that? Well, there are two kinds of metrics one can get from Twitter. The first kind is publicly available now and is the basis behind Twitalyzer, Klout and many other services. Using the Social Media Analytics Research Toolkit, you can build any conceivable dashboard, real-time or otherwise, from the publicly available Twitter feed. Between the REST, Search and Streaming APIs, including the possibility of negotiated elevated access up to the full Firehose of public tweets, one can obtain

  • Who tweets what when, and sometimes even from where and to whom,
  • Who is following whom, who is listed by whom,
  • Who retweets what, who marks what as a favorite,
  • Click data for links tweeted using publicly-available tracking APIs, like ow.ly and bit.ly,
  • Worldwide, regional and, in some cases, city-level trending topics, and
  • Search query results about places and topics or combinations thereof.
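As a rough illustration of the tallies such a dashboard starts from, here is a minimal sketch that counts who tweets and who gets retweeted. The record layout (`user`, `text`, `retweet_of`) is a simplified, hypothetical one invented for this example, not the actual shape of Twitter API responses, and the sample data is made up.

```python
from collections import Counter


def summarize(tweets):
    """Tally tweets per author and retweets received per original author.

    Each tweet is a dict with hypothetical keys:
      "user"       - screen name of the person who posted it
      "text"       - the tweet text
      "retweet_of" - screen name of the original author, or None
    """
    tweets_per_user = Counter()
    retweets_received = Counter()
    for tweet in tweets:
        tweets_per_user[tweet["user"]] += 1
        original_author = tweet.get("retweet_of")
        if original_author is not None:
            retweets_received[original_author] += 1
    return tweets_per_user, retweets_received


# Made-up sample records, for illustration only.
sample = [
    {"user": "alice", "text": "New post is up", "retweet_of": None},
    {"user": "bob", "text": "RT @alice: New post is up", "retweet_of": "alice"},
    {"user": "carol", "text": "RT @alice: New post is up", "retweet_of": "alice"},
    {"user": "bob", "text": "Lunch time", "retweet_of": None},
]

tweets_per_user, retweets_received = summarize(sample)
```

From counters like these, a dashboard only needs a charting layer. The harder part is collecting the records within the API rate limits.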

With the exception of elevated access levels to the Streaming API, all of that is available now for free. In short, you can pretty much analyze everything people write on Twitter. But there’s another set of metrics that only exist at the moment inside Twitter’s data centers – metrics about how people behave reading Twitter. We don’t know

  • How does our Twitter page compare with other Twitter pages – how many people visit @znmeb, for example, compared with @justinkistner,
  • What parts of our Twitter pages work and what parts don’t, and
  • What do Twitter users search for?

With Google or Bing, I can get keyword suggestions – in the case of Bing, even a determination of “commercial intent” vs. “just looking.” There’s an entire search engine optimization / marketing industry built up around these tools, many of them free. We’re even starting to see “social media optimization” tools, although I haven’t seen anything that looked like it had any substance. Without knowing what people are searching for on Twitter, how can I know what to tweet? I can tell you what people are tweeting about, but not what they wish we were tweeting about.

Google and Microsoft may be in a position to give away search optimization tools to market advertising. Compete and Quantcast and Alexa may be in a position to give away competitive analysis data to market higher-level or more detailed data. Hubspot can give away dozens of social media reports and white papers to market their tools and services.

But I don’t think Twitter is in that position. I don’t think they can afford to give away detailed Twitter page analytics or search user behavior data. So I’d be very surprised if their “free” real-time analytics dashboard is anything more than a new presentation of information we can already get externally via the APIs. And if they really are planning to give away analytics beyond a dashboard made from the already-publicly-available data, I think that’s a huge strategic mistake.

 

© 2011 Borasky Research Journal