
Dumping A Spreadsheet Of Your Most Recent Tweets - TweetCSV.pl

I am releasing the Perl script I run to dump a spreadsheet of my most recent tweets, TweetCSV.pl, as an open source project, under the Artistic License. This is the same license that Perl uses. I’ve been using this script or variations of it since before the Haiti earthquake, so it’s pretty well tested.

The master repository is on Github at http://github.com/znmeb/TweetCSV. As far as I know, TweetCSV.pl will run on any modern version of Perl, but I’ve only tested it on Perl 5.10 on openSUSE Linux 11.2, and with ActiveState ActivePerl on Windows. If you have any trouble running it, please feel free to send me a tweet @znmeb.

How does it work? TweetCSV.pl makes Twitter API calls using the Perl Net::Twitter module from CPAN. You get the results in a comma-separated-value (CSV) file, which you can then open in a spreadsheet. Instructions for downloading and running the script are at http://github.com/znmeb/TweetCSV/blob/master/README.
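The CSV step itself is simple, but it is worth doing with a real CSV writer: tweet text often contains commas and quotation marks, and those must be quoted for a spreadsheet to read the file correctly. Here is a minimal Python sketch of that step (the script itself is Perl), assuming each tweet has already been fetched as a record with id, created_at and text fields. Those field names and the sample tweets are hypothetical, not the script's actual columns.

```python
import csv
import io

# Hypothetical tweet records; the real script fetches tweets via Net::Twitter,
# and its actual column set may differ.
tweets = [
    {"id": 2, "created_at": "2010-03-04 00:20:11 +0000", "text": "Hello, Portland!"},
    {"id": 1, "created_at": "2010-03-03 23:59:02 +0000", "text": 'She said "hi"'},
]

def tweets_to_csv(rows, fieldnames=("id", "created_at", "text")):
    """Return the rows as CSV text with a header row, one tweet per line.

    csv.DictWriter quotes any field containing commas or quotes, so the
    file opens cleanly in a spreadsheet.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fieldnames), extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

print(tweets_to_csv(tweets))
```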

Again, please feel free to send me a tweet if you need help getting this running. And special thanks to Marc Mims (@semifor), who has developed the Net::Twitter Perl module that interfaces with the Twitter API!

Collecting Historical Twitter Data For A Topic - TopicTweetHistory.pl

I am releasing the Perl script I run to collect historical Twitter data for a topic, TopicTweetHistory.pl, as an open source project, under the Artistic License. This is the same license that Perl uses. I’ve been using this script or variations of it since before the Haiti earthquake, so it’s pretty well tested.

The master repository is on Github at http://github.com/znmeb/TopicTweetHistory. As far as I know, TopicTweetHistory.pl will run on any modern version of Perl, but I’ve only tested it on Perl 5.10 on openSUSE Linux 11.2, and with ActiveState ActivePerl on Windows. If you have any trouble running it, please feel free to send me a tweet @znmeb.

How does it work? The first thing TopicTweetHistory does is open a browser window to Advanced Twitter Search. Build your Twitter Search query there; once it returns the results you want, copy the query string and paste it back into TopicTweetHistory as input. The script then runs the search back in time and delivers all tweets that match the query. You get the results in a comma-separated-value (CSV) file, which you can then open in a spreadsheet. There are more details on running the script at http://github.com/znmeb/TopicTweetHistory/blob/master/README.
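Searching "back in time" generally means paging: fetch the newest matching tweets, then ask again for tweets older than the oldest one you already have, and repeat until nothing comes back. Here is a minimal Python sketch of that loop, assuming a max_id-style cursor. The search_page function is a hypothetical stand-in backed by an in-memory list, not a real Twitter call, and this is not necessarily how TopicTweetHistory.pl does its paging.

```python
# Hypothetical in-memory "search index"; tweet IDs increase with time.
INDEX = [{"id": i, "text": f"tweet {i} about #topic"} for i in range(1, 251)]

def search_page(query, max_id=None, per_page=100):
    """Hypothetical stand-in for one search request: up to per_page
    matching tweets with id <= max_id, newest first."""
    hits = [t for t in INDEX
            if query in t["text"] and (max_id is None or t["id"] <= max_id)]
    hits.sort(key=lambda t: t["id"], reverse=True)
    return hits[:per_page]

def search_back_in_time(query, per_page=100):
    """Collect every matching tweet by repeatedly requesting older pages."""
    collected = []
    max_id = None
    while True:
        page = search_page(query, max_id=max_id, per_page=per_page)
        if not page:
            break  # nothing older left
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen.
        max_id = page[-1]["id"] - 1
    return collected

tweets = search_back_in_time("#topic")
print(len(tweets), tweets[0]["id"], tweets[-1]["id"])  # 250 250 1
```

Moving the cursor to one below the oldest ID seen means no tweet is fetched twice, and an empty page is the signal that the history is exhausted.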

Again, please feel free to send me a tweet if you need help getting this running. And special thanks to Marc Mims (@semifor), who has developed the Net::Twitter Perl module that interfaces with the Twitter API!

Exporting Your Twitter Contacts to a Spreadsheet - Contacts.pl

I am releasing the Perl script I run to dump a spreadsheet of my Twitter followers and friends, Contacts.pl, as an open source project, under the Artistic License. This is the same license that Perl uses. I’ve been using this script or variations of it since before the Haiti earthquake, so it’s pretty well tested.

The master repository is on Github at http://github.com/znmeb/TwitterContacts. As far as I know, Contacts.pl will run on any modern version of Perl, but I’ve only tested it on Perl 5.10 on openSUSE Linux 11.2, and with ActiveState ActivePerl on Windows. If you have any trouble running it, please feel free to send me a tweet @znmeb.

How does it work? Contacts.pl makes Twitter API calls using the Perl Net::Twitter module from CPAN. You get the results in a comma-separated-value (CSV) file, which you can then open in a spreadsheet. Instructions for downloading and running the script are at http://github.com/znmeb/TwitterContacts/blob/master/README.
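One natural use of a followers-and-friends spreadsheet is seeing who follows you back. As a hypothetical illustration (not Contacts.pl's actual output format), here is a Python sketch that labels each screen name as mutual, follower-only or friend-only and writes the result as CSV. The screen names are made up.

```python
import csv
import io

def classify_contacts(followers, friends):
    """Label each contact: 'mutual' (both directions), 'follower'
    (follows you only) or 'friend' (you follow them only)."""
    followers, friends = set(followers), set(friends)
    labels = {}
    for name in followers | friends:
        if name in followers and name in friends:
            labels[name] = "mutual"
        elif name in followers:
            labels[name] = "follower"
        else:
            labels[name] = "friend"
    return labels

def contacts_csv(labels):
    """Write the labels as CSV, sorted by screen name, with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["screen_name", "relationship"])
    for name in sorted(labels):
        writer.writerow([name, labels[name]])
    return buf.getvalue()

# Hypothetical screen names, for illustration only.
labels = classify_contacts(followers=["alice", "bob"], friends=["bob", "carol"])
print(contacts_csv(labels))
```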

Again, please feel free to send me a tweet if you need help getting this running. And special thanks to Marc Mims (@semifor), who has developed the Net::Twitter Perl module that interfaces with the Twitter API!

USGS Working On Earthquake Detection Via Twitter!

I’ve just learned that the U.S. Geological Survey (USGS) has started a project to enhance detection of earthquakes using Twitter! The project is called the U.S. Geological Survey Twitter Earthquake Detector (TED). The web site is http://recovery.doi.gov/press/us-geological-survey-twitter-earthquake-detector-ted/, and the Twitter feed is @USGSted.

According to the site, “Social Internet technologies are providing the general public with anecdotal earthquake hazard information before scientific information has been published from authoritative sources. People local to an event are able to publish information via these technologies within seconds of their occurrence. In contrast, depending on the location of the earthquake, scientific alerts can take between 2 to 20 minutes.”

As you probably recall, I’ve been working with some of the projects that arose after the Haiti earthquake in January. I don’t have much data from that earthquake, but I do have data from the Chile quake and today’s Taiwan quake.

To give you an idea of how rapidly Twitter responds to an earthquake, the Chile earthquake occurred Saturday, February 27, 2010 at 06:34:14 UTC. In my data collected after the quake, using GeoTweetHistory.pl, the first tweet about the earthquake has the time stamp 06:34:37 UTC. That’s right, the first tweet was sent only 23 seconds after the quake! That first tweet reads simply, “TEMBLOR”.
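The 23-second gap is plain timestamp arithmetic. A quick Python check, using only the two UTC times quoted above:

```python
from datetime import datetime, timezone

# Times quoted above: the Chile earthquake and the first tweet in my data.
quake = datetime(2010, 2, 27, 6, 34, 14, tzinfo=timezone.utc)
first_tweet = datetime(2010, 2, 27, 6, 34, 37, tzinfo=timezone.utc)

delay = first_tweet - quake
print(delay.total_seconds())  # 23.0
```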

If you’d like to look at the data from today’s Taiwan earthquake, I’ve uploaded the data file from GeoTweetHistory.pl to http://github.com/znmeb/GeoTweetHistory/blob/master/taiwan_quake.zip. The earthquake occurred Thursday, March 4, 2010 at 00:18:52 UTC.

I don’t read Chinese, so I can’t tell which tweet was the first one about the earthquake. If you can read Chinese and would like to help, download the file, uncompress it, and open it in a spreadsheet. The collected tweets are listed newest first. Tweets about the earthquake should start appearing shortly after “2010-03-04 00:18:53 +0000”. Let me know on Twitter or in the comments here on the blog. Thanks!
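If you'd rather scan forward from the moment of the quake than backward from the newest tweet, a few lines of code can reorder the rows. Here is a minimal Python sketch that parses timestamps in the format shown above and lists tweets from the quake time onward, oldest first. The two-column layout (timestamp, text) and the sample rows are assumptions for illustration, not a description of the actual file.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S %z"  # matches timestamps like "2010-03-04 00:18:53 +0000"
QUAKE = datetime.strptime("2010-03-04 00:18:52 +0000", FMT)

# Hypothetical (timestamp, text) rows, newest first as in the spreadsheet.
rows = [
    ("2010-03-04 00:25:10 +0000", "tweet C"),
    ("2010-03-04 00:19:40 +0000", "tweet B"),
    ("2010-03-04 00:18:53 +0000", "tweet A"),
    ("2010-03-04 00:10:05 +0000", "tweet before the quake"),
]

def tweets_since(rows, start):
    """Return (datetime, text) pairs at or after start, oldest first."""
    parsed = [(datetime.strptime(ts, FMT), text) for ts, text in rows]
    return sorted((p for p in parsed if p[0] >= start), key=lambda p: p[0])

for when, text in tweets_since(rows, QUAKE):
    print(when.strftime("%H:%M:%S"), text)
```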

Collecting Twitter Data After An Event - GeoTweetHistory.pl

I am releasing the Perl script I run to collect Twitter data after an event, GeoTweetHistory.pl, as an open source project, under the Artistic License. This is the same license that Perl uses. I’ve been using this script or variations of it since before the Haiti earthquake, so it’s pretty well tested.

The master repository is on Github at http://github.com/znmeb/GeoTweetHistory. As far as I know, GeoTweetHistory.pl will run on any modern version of Perl, but I’ve only tested it on Perl 5.10 on openSUSE Linux 11.2, and with ActiveState ActivePerl on Windows. If you have any trouble running it, please feel free to send me a tweet @znmeb.

How does it work? When an event happens, people tweet about it. These tweets go into Twitter Search, and unless Twitter has blocked the person tweeting, the tweets get indexed. Events almost always have a location associated with them. In the case of an earthquake, the USGS gives the coordinates and time of the earthquake on their web site as soon as they have this information. The web site is http://earthquake.usgs.gov/earthquakes/recenteqsww/.

So all you have to do is go to the USGS site, find the earthquake details, and note the latitude and longitude of the epicenter. With those in hand, run GeoTweetHistory.pl. The script searches Twitter back in time and delivers all tweets within the circle you specify around that location. You get the results in a comma-separated-value (CSV) file, which you can then open in a spreadsheet. There are more details on running the script at http://github.com/znmeb/GeoTweetHistory/blob/master/README.
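Whether the geographic filtering happens on Twitter's side or in a script, "within a circle" comes down to a distance test against the center point. Here is a minimal Python sketch of that test using the haversine great-circle distance. The tweet fields, the center at (0, 0) and the radius are hypothetical, chosen only to make the numbers easy to check.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius of the Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_circle(tweets, center_lat, center_lon, radius_km):
    """Keep only tweets whose (lat, lon) lies inside the circle."""
    return [
        t for t in tweets
        if haversine_km(center_lat, center_lon, t["lat"], t["lon"]) <= radius_km
    ]

# Hypothetical geotagged tweets around a hypothetical center at (0, 0).
tweets = [
    {"text": "near", "lat": 0.0, "lon": 0.5},  # about 56 km away
    {"text": "far", "lat": 0.0, "lon": 2.0},   # about 222 km away
]
print([t["text"] for t in within_circle(tweets, 0.0, 0.0, radius_km=100)])  # ['near']
```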

Again, please feel free to send me a tweet if you need help getting this running. And special thanks to Marc Mims (@semifor), who has developed the Net::Twitter Perl module that interfaces with the Twitter API!

Chile Earthquake Tweak-the-Tweet Twitter Feed



Social Fresh Portland is Filling Up Rapidly & Price Goes Up Tomorrow!

I’ve just been informed that the Social Fresh Portland conference is filling up rapidly. There are only 300 seats, and over 100 of them are taken already. And the price goes up tomorrow! So, if you’re planning to come to the conference, whether you’re here in Portland or from somewhere else, head on over to Social Fresh Portland and sign up today! I’m sure this conference is going to sell out!

Tweak-the-Tweet for Chile

In response to yesterday’s earthquake in Chile, the good folks at EPIC Colorado have adapted their Tweak-the-Tweet syntax to both English and Spanish.

“Project EPIC has created a set of prescriptive tweets using the Tweak the Tweet syntax for the Chile earthquake. You can find both the English and Spanish versions of these tweets in this Google document (http://bit.ly/9psDqd).

Please help us by re-tweeting these from our @epiccolorado and @TtT_Pacific Twitter accounts. We would like to have response and relief organizations like the Red Cross, World Bank, or other appropriate organizations retweet these prescriptive tweets so that more people will pick up the syntax and make it useful. Let me know if you have any ideas or if you can help with this effort.”

You can also follow @sophiabliu for more information.

Social Fresh Portland

I’m volunteering at Social Fresh Portland, and I’d like to invite those of you outside of the Portland area to join me! As the web site says, “Social Fresh Portland (Oregon) is a social media conference for marketers. We focus on case studies and what social media can really do for business bottom lines.”

A number of speakers are from the Portland area, and I’ve met most of them. The list of Portland-based speakers includes

The complete list of speakers and schedule is here.

Social Fresh Portland is just the first of a long list of social media and technical conferences happening here in Portland in 2010. In the next few weeks, as the schedules start to appear online, I’ll be posting invitations. Meanwhile, here’s a little bit about why Portland is such a great place to visit.

The thing is, I don’t actually live in Portland, but in a suburb called Aloha that, strangely enough, has nothing to do with Hawaii. And I’ve never been to Hawaii, so I can’t very well ask you to visit there, can I? So, yes, definitely visit Portland!

I’ve been here for almost 25 years. What do we have?

Water:

Two major rivers meet here, and fresh drinking water literally falls out of the sky free for the taking! If you like it salty, there are a few bays and coves a couple of hours to the west.

Air:

We get our air mostly fresh off the ocean, or occasionally funneled through the Columbia Gorge by a high-pressure cell. In any event, we get our air before much of the US does, and we try our damnedest not to add stuff to it on its way East.

Mountains:

Yeah, there’s one not too far away that gave us a little trouble in 1980, but for the most part, they’re pretty to look at and a great place to go hiking or skiing or just hanging out in the lodge.

Parks:

There are so many that I can’t list them all, so I’ll just point you to my favorite: Tryon Creek State Park. If Nature holds true to form, the trilliums will be in full bloom there!

Beer:

Contrary to popular belief, you can get imported beer here. But why would you? Ours has better hops and more alcohol, and it’s served in pubs, restaurants, banquet halls and even movie theaters! According to Wikipedia, “In 2008, Portland had 30 microbreweries located within the city limits, more than any city in the world and greater than one-third of the state total. With 46 microbrew outlets, Portland has more breweries and brewpubs per capita than any other city in the United States. Many have won nationwide and international acclaim.”

Food and wine:

We grow it. We catch it in the ocean. We make it. We cook it. We eat it. We package it up and ship it. And we love to share it. Our food cart scene has been featured on national television and in the New York Times.

Entertainment:

New York has Greenwich Village. Washington has Georgetown. Portland has Portland! Jazz, folk, rock, symphonic, chamber, ballet, opera, and two new music ensembles. Portland has numerous theater companies and a major performing arts center. We have a listener-supported classical radio station that’s heard around the world on the Internet. Oh, yeah – if you happen to hear bagpipes, they just might be coming from a unicyclist.

Bloggers and Tweeters and Geeks, Oh! My!:

I’m a blogger. This is my blog. I had five others once. And I have a LinkedIn page. And a Facebook page. And I tweet. A lot – at last count more than any other Portlander.

Geeks: we have Linus Torvalds. Perhaps you’ve heard of Linux? He invented it. We have Ward Cunningham. Perhaps you’ve heard of the Wiki? He invented it. We have major contributors to Perl, PostgreSQL, Ruby, WordPress and other open source projects. We have Jive Software and Zapproved. We have the Silicon Florist. We have 30 Hour Day. We have Strange Love Live.

We love social media, software and (wait for it) social media software! Software is a craft here, just like belts, jewelry and beer. You can actually sit and watch us make it in coffee shops and pubs.

So if you’re looking for a great city to visit this year, we’re here! Just be careful crossing the street if you hear bagpipes.

A Peek Under Twitter's Hood

Yesterday, Evan Weaver tweeted “Twitter open source page is live! http://twitter.com/about/opensource”. This page is a fascinating peek under Twitter’s hood – the cutting edge open source technologies that power the popular microblogging service. For those of us who work with Twitter, this is required reading for career management and lifelong learning. And for those of us who are Twitter users, it’s a fascinating look at the future of the real-time web.

Ruby

As you may know, Twitter was originally a Ruby on Rails application. That’s actually where I first heard of Twitter – at RubyConf 2006. Early in 2007, I joined Twitter, and my first friends and followers were people I had met at RubyConf 2006.

As you’ll see below, Twitter has now incorporated many other technologies, but they still use Rails, and Ruby. In particular, the version of Ruby they use is Ruby Enterprise Edition (REE), a version tuned for stability and scalability.

Scala

Scala “is a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive. Code sizes are typically reduced by a factor of two to three when compared to an equivalent Java application.”

It’s not just Twitter that’s using Scala; Foursquare also uses it. Because of the high visibility of Twitter and Foursquare, I expect to hear a lot more about Scala in the coming months. For those of you in the Portland, Oregon area, there is now a Scala programmers’ group, @PDXScala.

Cassandra

Cassandra is one of the newer “NoSQL” databases. It was originally developed at Facebook, and released as open source in 2008. The description on the Cassandra home page reads, “A highly scalable, eventually consistent, distributed, structured key-value store.” The key term (pun intended) here is “key-value store”. This is somewhat like what they used to call in the ancient days (1950s) “associative memory”. Rather than specify an object (the value) by its location, we give it a name (the key) and the system can find it.
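To make the distinction concrete, here is the same idea in miniature using a Python list (access by position) and a Python dict (access by key). This is only an analogy: a dict lives in one process's memory, while Cassandra spreads its keys across many machines. The key naming scheme below is hypothetical.

```python
# Location-based access: you must know *where* the value lives (slot 1).
shelf = ["value stored at slot 0", "value stored at slot 1"]
print(shelf[1])

# Key-value access: you name the value with a key, and the store finds it.
store = {}
store["user:1234:screen_name"] = "znmeb"  # hypothetical key naming scheme
print(store["user:1234:screen_name"])
```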

Here’s an interview with Ryan King, Twitter’s Director of Storage, on why Twitter chose Cassandra.

Hadoop and Pig

Hadoop is another highly scalable distributed tool. Hadoop primarily implements the MapReduce programming model. MapReduce is a way of applying massive processing power to massive datasets: a “map” step processes many chunks of the data in parallel, and a “reduce” step combines the partial results. The concepts behind MapReduce originated in the early 1960s or even earlier, during the development of the Lisp programming language. An implementation of MapReduce has been patented by Google.
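The classic illustration of MapReduce is counting words. Here is a single-process Python sketch of the three stages: map, shuffle (grouping the emitted pairs by key) and reduce. It is not Hadoop's API; what Hadoop adds is running the map and reduce calls in parallel across many machines and doing the grouping for you. The sample documents are made up.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Map: emit (word, 1) for every word in one document."""
    return [(word.lower(), 1) for word in doc.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all the values emitted for one key."""
    return key, sum(values)

docs = ["earthquake in Chile", "Chile earthquake tweets", "tweets tweets"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["tweets"], counts["chile"])  # 3 2
```

Because each map call sees only one document and each reduce call sees only one key, both stages can be split across machines without coordination; only the shuffle needs to move data between them.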

Pig is a “scripting language” designed to work with Hadoop. It simplifies the programming tasks of people working with large datasets.

Summary

While the technologies are interesting to technologists like me, what does such massive power give the Twitter user? And why does Twitter need it? Here’s a simple example: The exact arrival rates of tweets at Twitter aren’t widely publicized. I don’t know if this is a “trade secret” or not – I’ve seen estimates of these rates on blogs but I’m not sure that the way those estimates were obtained is technically valid.

However, there is a publicly available subset of the full “Firehose” data stream, accessible via the Streaming API, called “Sample”. There’s no official documentation on what fraction of the full Firehose comes through the Sample stream. But in a sample I collected in January, I saw a peak of 81,718 tweets in a single hour!

And what was so special about that hour? To be precise, it was the hour between 01:00:00 and 02:00:00 UTC on January 13th, 2010. The Haiti earthquake happened at 21:53:10 UTC on January 12th, 2010, about three hours earlier. Remember, “Sample”, as the name implies, is only a subset of the full tweet stream, so the true peak across all of Twitter was higher still. That’s the reason Twitter needs the massive power it is getting from these cutting-edge technologies.
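Finding a peak hour like this amounts to truncating each timestamp to the start of its hour, counting tweets per hour, and taking the largest count. Here is a minimal Python sketch with a handful of hypothetical timestamps (not my January data), which also checks the gap between the quake and the start of that hour.

```python
from collections import Counter
from datetime import datetime, timezone

def peak_hour(timestamps):
    """Bucket UTC timestamps by the hour they fall in and return
    (hour_start, count) for the busiest hour."""
    buckets = Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps)
    return buckets.most_common(1)[0]

utc = timezone.utc
# Hypothetical tweet timestamps, for illustration only.
sample = [
    datetime(2010, 1, 13, 0, 59, 59, tzinfo=utc),
    datetime(2010, 1, 13, 1, 0, 0, tzinfo=utc),
    datetime(2010, 1, 13, 1, 30, 0, tzinfo=utc),
    datetime(2010, 1, 13, 1, 59, 59, tzinfo=utc),
]
hour, count = peak_hour(sample)
print(hour.isoformat(), count)  # 2010-01-13T01:00:00+00:00 3

# Time from the Haiti earthquake to the start of that hour.
quake = datetime(2010, 1, 12, 21, 53, 10, tzinfo=utc)
print(hour - quake)  # 3:06:50
```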

Update!

Todd Hoff of Highscalability.com has just published a more detailed analysis of Twitter’s use of Hadoop and Pig, including links to a presentation by Kevin Weil, Analytics Lead at Twitter. Both are highly recommended!

Books on the Technologies

Hadoop in Action by Chuck Lam
Pro Hadoop by Jason Venner
Programming in Scala by Martin Odersky
Programming Scala by Dean Wampler
Beginning Scala by David Pollak