Jul 28, 2011
 

Updated 2011-07-31:

1. I’ve turned off One True Fan, most likely permanently, because it was confusing some Highlighter users.

2. I’ve turned off Disqus and WordPress commenting as well, though this might be temporary. For the moment, I want to test out Highlighter as the main/only method of discussion.

3. I haven’t found a list of all sites using Highlighter yet, but I can recommend two run by friends of mine, Audrey Watters (@audreywatters) and Michelle Rae Anderson (@mediaChick):


At O’Reilly’s miniTOC Portland conference yesterday (hashtag #TOCPDX), Josh Mullineaux of Highlighter.com presented a brief overview of a new tool for websites, called Highlighter. I’ve enabled Highlighter on this site. There’s a video on the Highlighter home page, but here’s how it works:

1.  Highlight any text or image on the site with the mouse / trackpad. You will get a three-option menu: “Save This”, “Share It” and “Comment”.

2. The “Save This” option saves the highlighted content in your Highlighter profile. The “Share It” gives you the option of sharing the highlighted content on Facebook, Twitter, or in an email.

3. The “Comment” option is a little more interesting, and I think this is the awesome part of Highlighter. You can comment on the highlighted content, and your comment is sent to the website owner, me in this case, for moderation. If the owner approves, the comment is posted and any Highlighter subscriber can see it and join in the discussion.

There’s a good bit more to this:

  • Highlighter subscribers can follow each other’s streams, just like on Twitter or Facebook. You can think of it as a social network for publishers and their readers. You can join Highlighter here: http://highlighter.com/register/
  • For the publisher, there are detailed analytics about how your readers are engaging with your site.
  • Subscribers have a profile page, which I’ve linked to my Twitter and LinkedIn profiles. Mine is http://highlighter.com/znmeb/
Installation is simple: in my case, I just installed a WordPress plugin. But any web site where you can install JavaScript in the page footer can use Highlighter. And even if you aren’t a publisher, you can subscribe to Highlighter and join in the discussion. I love it!
 Posted by at 18:57
Jul 02, 2011
 

Data Journalism Developer Studio 2012 Overview

Download Data Journalism Developer Studio 2012 From SUSE Gallery

Data Journalism Developer Studio on Github

Data Journalism Developer Studio 2012 Blog


This was done with a slide rule. http://i.imgur.com/L4r2B.jpg
@nickvolp
uıǝɥuǝdןoʌ ʞɔıu

Actually, no, it wasn’t done with just a slide rule! The calculations were most likely done with large mainframes! The Apollo Program started in 1961 with the famous speech by President Kennedy.

http://en.wikipedia.org/wiki/Apollo_program

At that time, the prevailing tools for orbital and other NASA calculations were large mainframes, the IBM 7090 and similar scientific computers from other vendors. By 1964, IBM had introduced System/360.

http://en.wikipedia.org/wiki/IBM_System/360

No doubt much work was done on IBM 7090/7094-class mainframes during the earlier parts of the Apollo project, but by the 1969 moon landing, most of the earlier machines would have been replaced with System/360 or similar mainframes from other vendors. In any event, there was much more compute power available during the Apollo project than slide rules!

 Posted by at 22:31
Jun 09, 2011
 

About Data Journalism Developer Studio


In all the technology news last week, you might have missed this story. I only saw it mentioned on Reuters, not on any of the major technology blogs that I read. As is my usual practice when I see a technology story that matches my interests, I try to locate the original sources and post links on Twitter. So in case you missed those, here they are:

LinkedIn shares were a bubble: academic model | Reuters http://meb.tw/iNiM8R
@znmeb
M. Edward Borasky
Is There a Bubble in LinkedIn's Stock Price? http://meb.tw/loYBD3 [pdf]
@znmeb
M. Edward Borasky

There’s a fair amount of technical detail about the model in the paper cited in my second tweet. If you want even more, the model itself is documented here:

How to Detect an Asset Bubble by Robert Jarrow, Younes Kchia, Philip Protter :: SSRN http://meb.tw/iqvwUQ

So what’s the story here? From “Is There a Bubble in LinkedIn’s Stock Price?”:

It has been well documented in the financial press that a methodology is needed that can identify an asset price bubble in real time. William Dudley, the President of the New York Federal Reserve, in an interview with Planet Money [3] stated “…what I am proposing is that we try to identify bubbles in real time, try to develop tools to address those bubbles, try to use those tools when appropriate to limit the size of those bubbles and, therefore, try to limit the damage when those bubbles burst.”

It is also widely recognized that this is not an easy task. Indeed, in 2009 the Federal Reserve Chairman Ben Bernanke said in Congressional Testimony [1] “It is extraordinarily difficult in real time to know if an asset price is appropriate or not”.

Here’s a link to the William Dudley interview, and one to Bernanke’s testimony.

Professor Jarrow and his colleagues took up the challenge laid down by the Federal Reserve Board. The model they have devised is quite complex, involving stochastic differential equations and reproducing kernel Hilbert spaces. They tested this model on stock price data from “the alleged internet dotcom bubble (and beyond), from 1999 to 2005.” While there will no doubt be much more peer review of the data, model and conclusions, the test shows promise. Moreover, it can be applied to the price of any publicly-traded stock. The test has three possible results:

  1. There’s definitely a bubble.
  2. There’s definitely not a bubble.
  3. No conclusion about a bubble can be drawn from the data.
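To make the three outcomes concrete, here is a toy sketch in Python. It is not the authors' estimator and not an implementation of their model. It assumes, purely for illustration, that volatility grows like a power of the price, σ(x) = x^α, in which case the tail integral of x/σ(x)² is finite (the paper's bubble condition, as I read it) exactly when α > 1. The function name, the standard-error input and the two-standard-error band are all hypothetical choices of mine.

```python
def classify_bubble(alpha_hat, std_err, z=2.0):
    """Toy three-way decision for a power-law volatility sigma(x) = x**alpha.

    Assumption: the integral of x / sigma(x)**2 = x**(1 - 2*alpha) over
    [a, infinity) converges exactly when alpha > 1. The band of z standard
    errors is an illustrative choice, not the procedure from the paper.
    """
    lower = alpha_hat - z * std_err
    upper = alpha_hat + z * std_err
    if lower > 1.0:
        return "bubble"        # whole band lies above the threshold
    if upper <= 1.0:
        return "no bubble"     # whole band lies at or below the threshold
    return "inconclusive"      # band straddles the threshold


for alpha_hat, std_err in [(1.5, 0.1), (0.8, 0.05), (1.05, 0.1)]:
    print(alpha_hat, std_err, classify_bubble(alpha_hat, std_err))
```

The point of the third branch is that an estimate too close to the threshold supports neither claim, which mirrors the "no conclusion" outcome in the list above.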

So now we come to LinkedIn. LinkedIn was publicly traded for the first time on May 19, 2011, using the symbol LNKD. Professor Jarrow and his colleagues obtained real-time price data from Bloomberg for the first four days of trading and applied their model. And their claim is quite definitive:

We have found, definitively, that there is a price bubble!

While the technology is certainly interesting in its own right, at least to data journalists like myself, what are the wider implications of this? First of all, the context of the Dudley interview was the Finance / Insurance / Real Estate (FIRE) sector and the holdings of the Federal Reserve Board in that industry. As we all know, the Great Recession we discuss on a daily basis originated in the FIRE sector.

The context of the model Jarrow et al. have created, on the other hand, is publicly-traded stocks. In particular, the model was initially tested on Internet stocks during a well-documented bubble, and applied to a social media stock within days of its initial public offering. Moreover, the model should work in real time. Given a live data feed and enough computing capacity, it should be possible to monitor data and make investment decisions in real time.

Even though the model is designed for real-time publicly-traded stocks, it should be applicable to any financial time series that satisfies the underlying mathematical assumptions. This includes, for example, prices of shares in the “secondary markets” for companies like Facebook and Twitter. I haven’t attempted to implement the model yet – I’ve been away from computational finance for several years and I’m in the process of coming back up to speed on the methodologies. The core technologies are available in the Data Journalism Developer Studio, however, and if anyone is interested in working on this, send me a tweet @znmeb.

 Posted by at 12:41
Feb 18, 2011
 

Nor is it the blog post I had intended to write about Twitter. Lisa Barone (@lisabarone), Chief Branding Officer of Outspoken Media, posted this on her blog today, and I think it’s a rather important post.

“I’m not sure if you’ve heard, but Twitter is infested with bots. Dirty, dirty, bots!

“For the past couple of weeks, the Twitter spam bots have been out in full force, often hitting accounts with handfuls of new cleavage-baring followers per minute. There’s been a lot of conversation about it on Twitter and lots of complaints things are spiraling out of control – but that’s all been from users. With so many folks complaining about the rise in Twitter bots and Twitter spam, I can’t help but think we haven’t heard much from Twitter. Where are they and what are they doing about it?”

A few weeks ago, I started to see accounts follow me on Twitter with very suspicious tweets and biographies. The accounts had strange names, often including a location, like “@OMG_boise”.  The bio would mention a few locations and appeared mostly normal, but the tweets appeared to be random English words. Sometimes they were actual sentences, and sometimes they looked like a response to a human’s first name. But they were clearly some kind of bot, or at best, tweets being posted via Amazon Mechanical Turk. But what I wasn’t seeing was links, either in the profiles or the tweets. They just looked like noise.

On Monday, February 14, 2011, that changed – they started to tweet links. So far, I’ve seen four domains, and I’ve sent Twitter a list of the domains. Twitter seems to be cleaning them up – most of the accounts actually tweeting the links have disappeared from Twitter Search. In any event, this seems to be insidious behavior – automatically following people in the hopes that they’ll follow back and mostly tweeting nonsense – but hardly the sort of thing that could really hurt Twitter in the same way as what Lisa describes. But if you’re still using some kind of automatic follow-back tool, this is one more reason to stop!

The same cannot be said for another form of spam that’s infesting Twitter. Twitter’s Trending Topics are more or less unusable, because as soon as a trend emerges, spambots jump on it, often on multiple trending topics at once! This kind of spam is particularly vicious because it interferes with two of Twitter’s three advertising tools – Promoted Trends and Promoted Tweets.

So let’s crowdsource this! If you have some spare time while you’re in Twitter, click on a Trending Topic or two, and if you see tweets with multiple Trending Topics and links, do a “Block and Report for Spam” on the account. At some point, I’m sure Twitter will figure out an algorithmic way to do this, but in the meantime it’s a start.
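The rule of thumb above (several Trending Topics plus a link in one tweet) is simple enough to sketch in Python. This is only an illustration of the heuristic, not anything Twitter provides: the function name, the `min_topics` threshold and the sample topics are all hypothetical, and the list of current trends is assumed to be supplied by hand.

```python
import re

# Matches an http or https link anywhere in the tweet text.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_like_trend_spam(text, trending_topics, min_topics=2):
    """Flag a tweet that mentions several trending topics and carries a link.

    The threshold of two topics is an assumption for illustration.
    """
    lowered = text.lower()
    hits = sum(1 for topic in trending_topics if topic.lower() in lowered)
    return hits >= min_topics and URL_PATTERN.search(text) is not None


# Made-up trends and tweets, for illustration only.
trends = ["#topicone", "#topictwo", "Some Celebrity"]
print(looks_like_trend_spam("#topicone #topictwo click http://example.com/x", trends))
print(looks_like_trend_spam("Enjoying #topicone tonight", trends))
```

A rule this crude will misfire on legitimate tweets that happen to mention two trends and a link, so it is better suited to building a list for human review than to automatic blocking.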

One final note: individual account blocking with the “Block and Report for Spam” button appears to be the only mechanism available for users to report spam to Twitter! Yet for both types of spam I’ve described – the tweeting of specific domains and co-opting Trending Topics – a simple Twitter Search will deliver a list of spammy tweets. I’d like to see an email address for Twitter where we could send those search queries, rather than having to crowdsource spam block / reports one tweet at a time.

So what was the blog post I intended to write about Twitter? It was about the leaked video tutorial for advertisers on how to use Promoted Accounts and Promoted Tweets, and how bullish I am on Twitter’s advertising model as a result of reviewing the video in its entirety – twice. Twitter – Lisa is dead on here. You’ve got a great shot at being a superb advertising platform if you can clean the garbage out of Twitter Search and Trending Topics.

Update: Here’s a guide from Erik Deckers on “10 Signs for Spotting Twitter Spammers.” Happy crowdsourcing!

Update 2011-02-24

Twitter seems to have cleaned up most of the first kind of spam – the “word salad” spammers. A search for the domains that were using them is now turning up empty at any rate. However, I did see this rather curious set of tweets from Twitter’s Taylor Singletary (@episod) yesterday:

Hashtag spam, on the other hand, continues unchecked. As noted in my tweet above, there’s one particular hashtag – “kcabwollofmaet#” spelled backwards – that seems to be part of a well-orchestrated campaign to boost follower counts, possibly tied into some Twitter applications. I did a Twitter Search for this hashtag using the “History.pl” function in Project Kipling, then plotted tweets per minute from the data. As that tweet also noted, over 90,000 accounts are spamming using this hashtag, some with as many as 119,000 followers! Twitter’s search returned 643,000 tweets!

Those numbers were shocking to me – 90,000 spam accounts and peaks of over 200 spam tweets per minute! Why is Twitter spending the disk space, RAM and processor cycles to index this junk?
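For anyone who wants to reproduce the tweets-per-minute count from their own search results, here is a minimal Python sketch. It is not the “History.pl” code from Project Kipling. It assumes the tweet timestamps have already been extracted as ISO 8601 strings, and the function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def tweets_per_minute(timestamps):
    """Count tweets in each calendar minute.

    Assumes each timestamp is an ISO 8601 string such as
    "2011-02-23T14:05:09"; real search output may need parsing first.
    """
    counts = Counter()
    for ts in timestamps:
        # Truncate to the minute so all tweets in that minute share a key.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        counts[minute] += 1
    return dict(sorted(counts.items()))


# Made-up timestamps, for illustration only.
sample = ["2011-02-23T14:05:09", "2011-02-23T14:05:48", "2011-02-23T14:06:01"]
per_minute = tweets_per_minute(sample)
for minute, count in per_minute.items():
    print(minute.isoformat(), count)
print("peak:", max(per_minute.values()))
```

The resulting minute-by-minute counts can be fed to any plotting tool, and the peak is just the largest value.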

 Posted by at 20:25
Feb 14, 2011
 

 Last year, I discovered a hoax on Wikipedia - The ‘глупо муравей’ story: Shostakovich, musique concrète, Wikipedia, bullshit and curation. With the help of my friends on a Shostakovich mailing list, I was able to get it corrected. Now I’d like to ask for your help getting another little piece of history documented in Wikipedia.

Years ago, I read a comment in a book on board games that Confucius had advised “the idle rich” to play weiqi (the game most of the world knows by its Japanese name, Go) rather than “let their minds stagnate.” I haven’t been able to track down that exact quotation. Wikipedia only says this:

Go originated in ancient China sometime before the 3rd century BC (exactly when is unknown), by which time it was already a popular pastime, as indicated by a reference to the game in the Analects of Confucius.

The closest I’ve been able to find in English translations is this:

Analects of Confucius – Ch.17 – 22/ Confucius said, “He who always has a full stomach but does nothing meaningful is simply a good-for-nothing. Is there not a game of chess? Even playing chess is better than idling the time away.”

Now there is a Chinese variant of chess, rarely seen outside of China. But I wonder – was Confucius talking about the Chinese version of chess, or was he talking about weiqi? The British have been rabid chess players for a long time, and perhaps the earliest translators substituted a game they knew well for a game they did not.

So the questions are:

1. Is the game referred to in Analects Chapter 17 really chess? Is it weiqi? Some other game?

2. Are there other references to either chess or weiqi in the writings of Confucius?

#ibmwatson? Are you up for this? Google Translate? LinkedIn Answers? Quora?

 Posted by at 19:12