

A Peek Under Twitter’s Hood (Updated!)


Yesterday, Evan Weaver tweeted “Twitter open source page is live! http://twitter.com/about/opensource”. This page is a fascinating peek under Twitter’s hood: the cutting-edge open source technologies that power the popular microblogging service. For those of us who work with Twitter, this is required reading for career management and lifelong learning. And for those of us who are Twitter users, it offers a look at the future of the real-time web.

Ruby

As you may know, Twitter was originally a Ruby on Rails application. That’s actually where I first heard of Twitter – at RubyConf 2006. Early in 2007, I joined Twitter, and my first friends and followers were people I had met at RubyConf 2006.

As you’ll see below, Twitter has since incorporated many other technologies, but it still uses Ruby and Rails. In particular, it runs Ruby Enterprise Edition (REE), a version of Ruby tuned for stability and scalability.

Scala

Scala “is a general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive. Code sizes are typically reduced by a factor of two to three when compared to an equivalent Java application.”

It’s not just Twitter that’s using Scala; Foursquare also uses it. Because of the high visibility of Twitter and Foursquare, I expect to hear a lot more about Scala in the coming months. For those of you in the Portland, Oregon area, there is now a Scala programmers’ group, @PDXScala.

Cassandra

Cassandra is one of the newer “NoSQL” databases. It was originally developed at Facebook and released as open source in 2008. The description on the Cassandra home page reads, “A highly scalable, eventually consistent, distributed, structured key-value store.” The key term (pun intended) here is “key-value store”. This is somewhat like what was called “associative memory” back in the ancient days (1950s). Rather than specify an object (the value) by its location, we give it a name (the key) and the system can find it.
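To make the key-value idea concrete, here is a minimal sketch in Python. It is a toy, single-process, in-memory store and not Cassandra’s API; the class name `ToyKeyValueStore`, its `put` and `get` methods, and the sample key and value are all made up for illustration.

```python
# Toy in-memory key-value store. Illustrative only: this is NOT Cassandra's API,
# and all names here are hypothetical.
class ToyKeyValueStore:
    def __init__(self):
        # A plain dict maps each key (the name) to its value (the object).
        self._data = {}

    def put(self, key, value):
        """Store a value under a name; an existing value is overwritten."""
        self._data[key] = value

    def get(self, key, default=None):
        """Look a value up by name instead of by location."""
        return self._data.get(key, default)


store = ToyKeyValueStore()
store.put("user:example", {"followers": 42})  # hypothetical key and value
profile = store.get("user:example")
missing = store.get("user:nobody")  # unknown keys fall back to the default (None)
```

A distributed store like Cassandra spreads the keys across many machines, but the caller’s view is much the same: hand over a key, get back a value.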

Here’s an interview with Ryan King, Twitter’s Director of Storage, on why Twitter chose Cassandra.

Hadoop and Pig

Hadoop is another highly scalable distributed tool. At its core, Hadoop implements the MapReduce programming model, a way of applying massive processing power to massive datasets: a “map” step runs on many machines in parallel, and a “reduce” step combines the results. The concepts behind MapReduce originated in the early 1960s or even earlier, during the development of the Lisp programming language. An implementation of MapReduce has been patented by Google.
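As a rough illustration of the MapReduce idea, here is a single-process word count in Python. It is only a sketch of the pattern, not Hadoop’s API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample tweets are invented for this example, and a real cluster would run the map and reduce steps on many machines at once.

```python
from collections import defaultdict


def map_phase(tweet):
    """Map: turn one input record into a list of (key, value) pairs."""
    return [(word.lower(), 1) for word in tweet.split()]


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values collected for one key into a single result."""
    return key, sum(values)


def word_count(tweets):
    # In this toy version everything runs in one process, one step after another.
    pairs = [pair for tweet in tweets for pair in map_phase(tweet)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical input records, made up for the example.
counts = word_count(["haiti earthquake", "earthquake relief for haiti"])
```

Because each call to `map_phase` looks at only one record and each call to `reduce_phase` looks at only one key, the work can be divided among as many machines as are available.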

Pig is a “scripting language” designed to work with Hadoop. It simplifies the programming tasks of people working with large datasets.

Summary

While the technologies are interesting to technologists like me, what does such massive power give the Twitter user? And why does Twitter need it? Here’s a simple example: The exact arrival rates of tweets at Twitter aren’t widely publicized. I don’t know if this is a “trade secret” or not – I’ve seen estimates of these rates on blogs but I’m not sure that the way those estimates were obtained is technically valid.

However, a subset of the full “Firehose” data stream, called “Sample”, is publicly available via the Streaming API. There’s no official documentation on what fraction of the full Firehose comes through the Sample stream. But in a sample I collected in January, I saw a peak of 81,718 tweets in a single hour!

And what was so special about that hour? It was, to be precise, the hour between 01:00:00 and 02:00:00 UTC January 13th, 2010. The Haiti earthquake happened at 21:53:10 UTC on January 12th, 2010 – about three hours earlier. Remember – “Sample”, as the name implies, is a subset of the full tweet stream! That’s the reason Twitter needs the massive power it is getting from these cutting-edge technologies.
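To put those figures in perspective, here is a small back-of-the-envelope calculation in Python. It uses only the numbers quoted above; the variable names are mine, and since the Sample stream’s fraction of the Firehose is undocumented, the sketch does not try to scale up to the full stream.

```python
from datetime import datetime, timedelta

# Figures quoted in the post.
SAMPLE_TWEETS_IN_PEAK_HOUR = 81_718
SECONDS_PER_HOUR = 3600

# Average arrival rate of Sample tweets during the peak hour.
sample_rate_per_second = SAMPLE_TWEETS_IN_PEAK_HOUR / SECONDS_PER_HOUR

# Time from the earthquake to the start of the peak hour (both in UTC).
earthquake = datetime(2010, 1, 12, 21, 53, 10)
peak_hour_start = datetime(2010, 1, 13, 1, 0, 0)
delay = peak_hour_start - earthquake

print(f"{sample_rate_per_second:.1f} Sample tweets per second")  # prints 22.7 ...
print(f"peak hour began {delay} after the earthquake")           # prints ... 3:06:50 ...
```

So the Sample stream alone averaged roughly 23 tweets per second for that hour, and the full Firehose rate must have been higher by whatever factor the sampling uses.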

Update!

Todd Hoff of Highscalability.com has just published a more detailed analysis of Twitter’s use of Hadoop and Pig, including links to a presentation by Kevin Weil, Analytics Lead at Twitter. Both are highly recommended!

Books on the Technologies

  • Hadoop in Action, by Chuck Lam
  • Pro Hadoop, by Jason Venner
  • Programming in Scala, by Martin Odersky
  • Programming Scala, by Dean Wampler
  • Beginning Scala, by David Pollak






