May 13, 2010
 

Today, May 13, 2010, marks the 25th anniversary of my arrival in the Portland, Oregon area. I’ve lived here longer than I’ve lived anywhere else, and I call it my home. For those of you reading this from outside the area, I invite you to come visit us. There are lots of conferences, festivals and other reasons to come here, but – well – it’s just an all-around wonderful place.

The thing is, I don’t actually live in Portland. I live in a suburb called Aloha that, strangely enough, has nothing to do with Hawaii. And I’ve never been to Hawaii, so I can’t very well ask you to visit there, can I? So, yes, definitely visit Portland!

What do we have?

  1. Water. Two major rivers meet here, and fresh water literally falls out of the sky free for the taking! If you like it salty, there are a few bays and coves a couple of hours to the west.
  2. Air. We get our air mostly fresh off the ocean, or occasionally funneled through the Columbia Gorge by a high-pressure cell. In any event, we get it before much of the US does, and we try our damnedest not to add stuff to it on its way east.
  3. Mountains. Yeah, there’s one not too far away that gave us a little trouble in 1980, but for the most part, they’re pretty to look at and a great place to go skiing.
  4. Parks. There are so many that I can’t list them all, so I’ll just give you a link to my favorite, Tryon Creek State Park, and my second favorite, Cooper Mountain Nature Park.
  5. Beer. Contrary to popular belief, you can get imported beer here. But why would you? Ours has better hops and more alcohol, and it’s served in pubs, restaurants, banquet halls and even movie theaters! To quote Wikipedia, “In 2008, Portland had 30 microbreweries located within the city limits, more than any city in the world and greater than one-third of the state total. With 46 microbrew outlets, Portland has more breweries and brewpubs per capita than any other city in the United States. Many have won nationwide and international acclaim.”
  6. Food and wine. We grow it. We catch it in the ocean. We make it. We cook it. We eat it. We package it up and ship it. And we love to share it. Our food cart scene has been featured on national television and in the New York Times.
  7. Entertainment. New York has Greenwich Village. Washington has Georgetown. Portland has — Portland! Jazz, folk, rock, symphonic, chamber, ballet, opera, and two new music ensembles. Portland has numerous theater companies and a major performing arts center. We have listener-supported jazz and classical radio stations heard around the world on the Internet. Oh, yeah – if you happen to hear bagpipes, they just might be coming from a unicyclist.
  8. Bloggers and Tweeters and Geeks, Oh! My!
    • I’m a blogger. This is my blog. I used to have five others. And I have a LinkedIn page. And I tweet. A lot – at last count more than any other Portlander. Folks around here call me @znmeb.
    • Geeks: we have Linus Torvalds. Perhaps you’ve heard of Linux? He invented it. We have Ward Cunningham. Perhaps you’ve heard of the wiki? He invented it. We have major contributors to Perl, PostgreSQL, Ruby, WordPress and other open source projects. We have Jive Software and Zapproved. We have the Silicon Florist. We have 30 Hour Day. We have Strange Love Live.
    • We love social media, software and (wait for it) social media software! Software is a craft here, just like belts, jewelry and beer. You can actually sit and watch us make it in coffee shops and pubs.

So if you’re looking for a great city to visit this year, we’re here! Just be careful crossing the street if you hear bagpipes.

 Posted by at 11:46
May 10, 2010
 

In case you want to block Facebook entirely, here are two links with directions. Enjoy!

How to Block Facebook http://meb.tw/ccqSEx
How to Block Facebook on your Computer without Software | eHow.com http://meb.tw/bQKOQq


Updated May 18th: Comments, pingbacks and trackbacks are back on. I’m still looking for the magic incantation to suppress the tweetbacks, though.

On more substantive matters:

1. There is a search site being bandied about the Internet that supposedly shows scandalous information freely searchable out of Facebook. I’m not a lawyer, but I strongly suspect that said site, which I am not going to name or link to, is in direct violation of Facebook’s Terms of Service. So, yes, it’s a great example of the risks, but I don’t think it’s a particularly productive way to achieve change.

2. There’s an old saying: “If you like us, tell your friends. If you don’t like us, tell us.”

In the case of Facebook, the “us” is the Board of Directors. Thanks to @davidhstannard and Wikipedia, here is Facebook’s Board of Directors:

http://en.wikipedia.org/wiki/Peter_Thiel
http://en.wikipedia.org/wiki/Marc_Andreessen
http://en.wikipedia.org/wiki/Mark_Zuckerberg
http://en.wikipedia.org/wiki/Donald_E._Graham
http://en.wikipedia.org/wiki/Jim_Breyer

It is their job to ensure that Facebook is responsive to the needs of all its stakeholders.

So, if you would like to see change happen at Facebook, those are the gentlemen who can make it happen.


Updated May 16th: I’ve disabled pingbacks and trackbacks for this article. Apparently I’m picking up a comment every time someone tweets a link to this article. I’ve deleted the tweets — I think they’re annoying.

I’ve also deleted a few links from comments. They are links to places I don’t support. This is a moderated blog. It always has been and always will be.


Yesterday afternoon, May 11, 2010, at 18:01 Pacific Daylight Time, I initiated a deletion of my Facebook account. In what has to be the most insulting part of the process, after I acknowledged that I indeed wanted to delete the account and successfully entered the CAPTCHA codes, I was told that it would take 14 days for the deletion to take place.

I can understand a one-day “grace” period, just on the off chance that someone might have captured my credentials and deleted my account without my knowledge. But not 14 days. That’s just plain insulting.

Now if I were really paranoid … (Thanks, @sooperay!)

One final note: Facebook’s Terms of Service for developers now state:

“You must give users control over their data by posting a privacy policy that explains what data you collect, and how you will use, store, and/or transfer their data….You may cache data you receive from the Facebook API in order to improve your application’s user experience, but you should try to keep the data up to date…You will delete all data you receive from us concerning a user if the user asks you to do so, and will provide a mechanism for users to make such a request.” (Emphasis added.)

So, Mr. Zuckerberg, can you provide me with a complete list of these developers who have received data from Facebook concerning me, so I can initiate the process of requesting that they delete all data they’ve received? Thanks in advance for your prompt attention in this matter!


Sometimes, satire says it best:

“It is no longer necessary to write new stories about Facebook privacy issues; just change the dates.” – @FakeAPStylebook on Twitter.

A popular interactive visualization by blogger Matt McKeon shows how Facebook has systematically made more and more information about its members public over time. In the visualization, you can play the progression over time or click individual dates to see what was public at any one time.

As the Fake AP Stylebook notes, with each day that passes, it gets harder and harder to say something new about Facebook and privacy. Bloggers like Marshall Kirkpatrick, Caroline McCarthy, Eben Moglen and Robert Scoble regularly write about the erosion of privacy online in general, and among Facebook members in particular.

I don’t spend a lot of time on Facebook. It is my only connection online to a few friends and relatives, but for the most part, my online networking is done on LinkedIn and Twitter. Over the year that I’ve been on Facebook, I’ve used it mostly as a way of finding out when and where local musicians are performing.

I am now making plans to delete my Facebook account. I’ve sent a message to those few friends of mine on Facebook that I have no other online connection with, and I have deactivated the account. I expect to delete the account within the month. Why am I leaving?

“I don’t know about you, but I have not yet witnessed a spontaneous recovery from incompetence.” – Susan Scott, Fierce Conversations.

And I think that’s what we’re talking about when we talk about Facebook and privacy. I think we are talking about massive incompetence. I am planning to leave Facebook because I believe their management is incompetent.

As shown in Matt McKeon’s interactive visualization, Facebook has changed. It has changed from a place where people could connect in safety and privacy to a huge data mine. Facebook’s 400+ million members’ personal data and online behavior tracks are apparently not only public, but for sale.

Facebook’s management has ignored the howls of protest from privacy advocates like Eben Moglen of the Software Freedom Law Center:

“The human race has susceptibility to harm but Mr. Zuckerberg has attained an unenviable record: he has done more harm to the human race than anybody else his age.”

Facebook’s management has ignored the concerns of respected journalists like ReadWriteWeb’s Marshall Kirkpatrick.

“I don’t buy Zuckerberg’s argument that Facebook is now only reflecting the changes that society is undergoing. I think Facebook itself is a major agent of social change and by acting otherwise Zuckerberg is being arrogant and condescending.”

Consumer groups have filed complaints with the FTC, four United States Senators have written to Facebook suggesting a reversal of recent changes, and hardly a day goes by without disclosure of yet another “bug” allowing personal data to “leak” out of Facebook. One of those Senators, Al Franken, has even posted instructions for disabling Facebook’s recent “gift” of members’ personal information to third parties. I don’t know what to call Facebook’s lack of response and failure to take the actions suggested by the Senators except incompetence.

The more polite of the protesters are calling for a single-day boycott.

Facebook Protest Facebook Group

“June 6th, 2010, chosen for being D-Day: Commit to NOT LOGGING INTO FACEBOOK for ONE DAY! That one day could cost them millions. Maybe THEN Zuckerberg will ‘believe in’ privacy.”

And if you do a Twitter search for “Facebook privacy”, most likely what you will find is anti-Facebook blog posts by big-name bloggers and tweeters, instructions for how to disable the latest Facebook “gifts” of personal information to third parties, links to the interactive visualization above, and so on. Here’s an Atom feed link if you want to see for yourself.

In the face of all of this, I don’t see how Facebook’s CEO, Mark Zuckerberg, can continue to claim:

“And then in the last 5 or 6 years, blogging has taken off in a huge way and all these different services that have people sharing all this information. People have really gotten comfortable not only sharing more information and different kinds, but more openly and with more people. That social norm is just something that has evolved over time.”

The only explanation I can offer is incompetence: a total failure, as Susan Scott so eloquently puts it, to “master the courage to interrogate reality.”

Perhaps the most telling article of them all is by Caroline McCarthy of CNET News: “Understanding Facebook’s Privacy Aftershocks”. A sample:

 Posted by at 16:18
May 6, 2010
 

Data Journalism Developer Studio 2012 Overview

Download Data Journalism Developer Studio 2012 From SUSE Gallery

Data Journalism Developer Studio on Github

Data Journalism Developer Studio 2012 Blog


The rise of Twitter has made large quantities of text available in multiple languages. As a result, text processing, text analytics and other natural language processing techniques have become a staple in business intelligence. So I’ve put together a list of what I think are the essential references in the area. I’ve attempted to arrange them in order of increasing mathematical sophistication. And, as always, I’ve provided Powell’s Partner Program links so you can buy them.

Most of the algorithms described in these books are available in the Data Journalism Developer Studio. For a complete description of the toolkit, see About The Data Journalism Developer Studio.


Even if you’re not a Python programmer, this book is probably the best place to start. The book will walk you through the Python language, and it’s written by the experts on the Python Natural Language Toolkit. The Python Natural Language Toolkit is one of the featured components of my Social Media Analytics Research Toolkit.


For Perl programmers, this book is a good place to start. Topics include pattern matching, data structures, probability, information retrieval, corpus linguistics, multivariate statistics, clustering and an introduction to R programming. Both Perl and R are available in the Social Media Analytics Research Toolkit.


After you’ve gotten started, this book will give you a good overview of the more technical and mathematical aspects of natural language processing. Topics include classical approaches, empirical and statistical approaches and applications. Machine translation, speech recognition, information retrieval, question answering, ontology construction and sentiment analysis are all covered.

The chapter on sentiment analysis is particularly well done. It covers most current techniques and includes sections on dealing with spam. Sentiment analysis is still somewhat controversial, although nearly all social media monitoring providers include it in some form. This chapter provides much-needed clarity on just what is and isn’t possible in sentiment analysis. Chapter 13, “Normalized Web Distance and Word Similarity”, is also notable. It describes the algorithms used in the CompLearn suite of programs that are part of the Social Media Analytics Research Toolkit.
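The quantity at the center of that chapter is the Normalized Web Distance. Its standard form is NWD(x, y) = [max{log f(x), log f(y)} − log f(x, y)] / [log N − min{log f(x), log f(y)}]. Here f(x) is the number of pages containing term x, f(x, y) is the number of pages containing both terms, and N is the total number of pages indexed. Below is a minimal Python sketch of that formula. It is not CompLearn's own code. The function name and the page counts are hypothetical, and it assumes you already have the counts from some search index.

```python
import math


def normalized_web_distance(fx: int, fy: int, fxy: int, n: int) -> float:
    """Normalized Web Distance between two search terms x and y.

    fx, fy -- number of pages containing x, and containing y
    fxy    -- number of pages containing both x and y
    n      -- total number of pages indexed (normalizing constant)
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur, so they are "infinitely" far apart
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical page counts, for illustration only.
print(normalized_web_distance(fx=9_000, fy=8_000, fxy=6_000, n=1_000_000))
```

Terms that almost always appear together score near 0. Terms that rarely co-occur score higher.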


This isn’t strictly a book about either text processing or natural language processing, but I’ve included it for three reasons:

  1. It covers all of the matrix decompositions one would use in text processing and natural language processing.
  2. It covers algorithms for social graph analysis.
  3. It has a very readable introduction to using tensors – arrays with more than two dimensions – in data mining. My opinion is that tensor-based algorithms are the future of natural language processing in general and text analytics in particular.
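To make "arrays with more than two dimensions" concrete, here is a small sketch that builds a keyword × author × time-period count tensor with NumPy. The posts, authors and periods are hypothetical toy data. The reshape at the end is one common way of "unfolding" a tensor back into a matrix so that ordinary matrix methods can be applied to it.

```python
import numpy as np

# Hypothetical toy corpus: (author, time period, text) triples.
posts = [
    ("alice", 0, "privacy facebook privacy"),
    ("bob", 0, "beer portland"),
    ("alice", 1, "facebook delete account"),
    ("bob", 1, "portland beer beer"),
]

keywords = sorted({w for _, _, text in posts for w in text.split()})
authors = sorted({a for a, _, _ in posts})
periods = sorted({t for _, t, _ in posts})
kw_idx = {w: i for i, w in enumerate(keywords)}
au_idx = {a: i for i, a in enumerate(authors)}
pe_idx = {t: i for i, t in enumerate(periods)}

# Third-order tensor of raw counts: keyword x author x time period.
X = np.zeros((len(keywords), len(authors), len(periods)))
for author, period, text in posts:
    for word in text.split():
        X[kw_idx[word], au_idx[author], pe_idx[period]] += 1

# Fixing the time index gives an ordinary keyword x author matrix for one period.
period0 = X[:, :, pe_idx[0]]

# Unfolding along the keyword mode lays every (author, period) slice side by side.
unfolded = X.reshape(len(keywords), -1)
print(unfolded)
```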

While also not totally about text / natural language processing, this book is an excellent overview of the technologies used in counterterrorism. There’s not as much technical detail as there is in the other books – you’ll need to follow the references. I’ve included this book because I see great potential for some of the technologies in business intelligence. For example, mining data for people in a social media site who “should” be friends or followers but aren’t is one technique businesses could “borrow” from law enforcement.
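The "should be friends but aren't" idea is usually called link prediction. One of its simplest baselines ranks unconnected pairs by how many neighbors they share. The sketch below is that baseline only, not the book's method. It assumes the follower graph can be treated as undirected, and the names and edges are hypothetical.

```python
from itertools import combinations


def suggest_links(edges, top=3):
    """Rank unconnected pairs of people by their number of common neighbors."""
    neighbors = {}
    for a, b in edges:  # treat each "follows" edge as undirected (an assumption)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = []
    for a, b in combinations(sorted(neighbors), 2):
        if b in neighbors[a]:
            continue  # already connected, nothing to suggest
        common = len(neighbors[a] & neighbors[b])
        if common:
            scores.append((common, a, b))
    # Highest score first; break ties alphabetically for stable output.
    scores.sort(key=lambda s: (-s[0], s[1], s[2]))
    return scores[:top]


follows = [("ann", "bob"), ("ann", "cat"), ("bob", "cat"),
           ("bob", "dan"), ("cat", "dan"), ("dan", "eve")]
print(suggest_links(follows))
# → [(2, 'ann', 'dan'), (1, 'bob', 'eve'), (1, 'cat', 'eve')]
```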


This book is an excellent overview of some of the more recent research in text mining. It includes chapters on “Detection of Bias in Media Outlets with Statistical Learning Methods”, “Topic Models”, “Utility-Based Information Distillation” and “Adaptive Information Filtering”. But in my opinion one chapter, “Nonnegative Matrix and Tensor Factorization for Discussion Tracking”, justifies the purchase of the book on its own.
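For readers who want to see nonnegative matrix factorization in action, here is a minimal NumPy sketch. It uses the classic Lee–Seung multiplicative updates for squared error on a hypothetical keyword × document count matrix. This is not the chapter's own algorithm, which also covers the tensor case. In the sketch, the columns of W act as "topics" (keyword weights) and H gives each document's weight on each topic.

```python
import numpy as np


def nmf(V, k, iterations=500, seed=0, eps=1e-9):
    """Factor a nonnegative m x n matrix V as W @ H, with W (m x k) and H (k x n) >= 0.

    Uses the Lee-Seung multiplicative updates that minimize ||V - W @ H||^2.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iterations):
        # Each update multiplies by a nonnegative ratio, so entries stay >= 0.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical keyword x document counts: two keyword groups, two document groups.
V = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 4, 1],
              [0, 0, 3, 2]], dtype=float)
W, H = nmf(V, k=2)
print(np.round(W @ H, 2))
```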

Much of modern text processing depends on linear algebra over so-called “bag of words” vector space models. In such a model, keywords are extracted from the text, and a collection of documents (called a corpus) is represented by arrays of keyword frequencies. In these models, a matrix is a two-dimensional array of the frequencies, usually with keywords indexing the rows and documents or document authors indexing the columns.
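As an illustration of the bag-of-words model, here is a bare-bones sketch that builds a keyword × document frequency matrix in plain Python. The corpus is hypothetical. Tokenization is a naive lowercase-and-split, with no stemming or stop-word removal. A real pipeline would use proper tokenizers, such as those in the Python Natural Language Toolkit.

```python
from collections import Counter


def term_document_matrix(corpus):
    """Return (keywords, matrix) where matrix[i][j] counts keyword i in document j."""
    counts = [Counter(doc.lower().split()) for doc in corpus]  # naive tokenizer
    keywords = sorted(set().union(*counts))  # every keyword seen in any document
    matrix = [[c[word] for c in counts] for word in keywords]  # missing words count as 0
    return keywords, matrix


corpus = [
    "Portland beer is great beer",
    "Facebook privacy is eroding",
    "great parks and great beer in Portland",
]
keywords, matrix = term_document_matrix(corpus)
for word, row in zip(keywords, matrix):
    print(f"{word:10s} {row}")
```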


Latent Semantic Analysis, sometimes called Latent Semantic Indexing, is a common technique in natural language processing, and this book explores it in both mathematical and practical detail. There is also a chapter on probabilistic topic modeling, such as latent Dirichlet allocation (LDA). If you want to experiment with these techniques, I recommend the open source Java-language Mallet package. Mallet is included in the Social Media Analytics Research Toolkit, as are R language tools for latent semantic analysis.
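At its core, latent semantic analysis is a truncated singular value decomposition of the keyword × document matrix. The NumPy sketch below keeps the k largest singular values and compares documents in the resulting "concept" space. The matrix and k = 2 are hypothetical toy choices. It covers LSA only; for LDA, Mallet is the tool mentioned above.

```python
import numpy as np

# Hypothetical keyword x document counts.
# Rows: beer, brewpub, hops, facebook, privacy, account. Columns: four documents.
A = np.array([[2, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 2],
              [0, 0, 1, 1]], dtype=float)

k = 2  # number of latent "concepts" to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Rank-k LSA: keep only the k largest singular values and their vectors.
s_k, Vt_k = s[:k], Vt[:k, :]
doc_vectors = (np.diag(s_k) @ Vt_k).T  # one k-dimensional row per document


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Documents about the same topic land close together in concept space.
print(cosine(doc_vectors[0], doc_vectors[1]))
print(cosine(doc_vectors[0], doc_vectors[2]))
```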


Finally, we come to my current area of research, Topic Detection and Tracking. This book is the classic reference on the subject, and is required reading if you’re interested in automated journalism.
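One classic baseline for the "detection" half of topic detection and tracking is first story detection. Each incoming story is compared with every earlier one, and it is flagged as new if even its best match falls below a similarity threshold. Here is a simplified sketch using raw term counts and cosine similarity. It is not the book's system. The stories and the threshold of 0.3 are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def first_stories(stream, threshold=0.3):
    """Flag each story whose best match among earlier stories is below threshold."""
    seen, flags = [], []
    for text in stream:
        vec = Counter(text.lower().split())
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)  # True means "this looks like a new topic"
        seen.append(vec)
    return flags


stream = [
    "volcano erupts near portland",
    "ash from the volcano reaches portland",
    "facebook changes privacy settings",
    "senators question facebook privacy settings",
]
print(first_stories(stream))
# → [True, False, True, False]
```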


Appendix – R Language Natural Language Processing Task View

In addition to the Python Natural Language Toolkit and Mallet, the Social Media Analytics Research Toolkit contains the R Natural Language Processing Task View. Here’s a copy of the contents of that task view as of 2010-08-23:

CRAN Task View: Natural Language Processing



 Posted by at 10:03