Privacy, Large Dataset Research, and the Netflix Prize

Netflix recently announced the cancellation of the second Netflix Prize in a post on its blog. Many researchers entered the first contest because it offered the chance to work with a large, real-world dataset, along with the promise of a one-million-dollar prize and worldwide publicity.

The company’s decision to cancel the contest settled a private lawsuit and closed an inquiry by the Federal Trade Commission. Ryan Singel describes the lawsuit in his Wired article Netflix Spilled Your Brokeback Mountain Secret, Lawsuit Claims. Jennifer Valentino-DeVries explains the FTC inquiry in the Wall Street Journal blog post FTC’s Privacy Worries Prompt Netflix to Cancel Contest.

In an earlier column, The State of User Tracking and the Impossibility of Anonymizing Data, I described current research on de-anonymization and re-identification, including the problems with the Netflix contest. Arvind Narayanan and Vitaly Shmatikov, who wrote that de-anonymization paper, have published An open letter to Netflix from the authors of the de-anonymization paper. In it, they say they hope Netflix will continue to work with researchers in a way that allows further advances while also preserving privacy through techniques such as differential privacy.

BellKor’s Pragmatic Chaos, the team that won the prize, wrote about the contest and the implications of its findings in the IEEE Spectrum article The Million Dollar Programming Prize. Dan Gillick describes the winning solution in more detail in Predicting Movie Ratings: The Math That Won The Netflix Prize.