AOL surely did not anticipate the immense backlash it would receive from the internet community when it released the anonymized search logs of about 500,000 AOL users on the AOL research website. The file consisted of about 20 million web queries from about 500,000 AOL users over the course of three months (March to May 2006). Each AOL username was replaced by a unique ID; everything else in the logs was kept unchanged.
AOL quickly took down the website, but a Google cache copy is still available. The large compressed log file (over 400 megabytes) can be obtained from Greg Sadetsky's website as a web or torrent download. Pulling the information from the AOL website certainly looks like an admission of wrongdoing, and AOL now appears to be in full damage control mode. The uncompressed log file consists of text files with a combined size of over two gigabytes.
Some questions naturally arise: Why did AOL release the data? Why is there such a big outcry in the web community? I think that AOL released the data for research and marketing purposes. It is a goldmine for every researcher working on search engines and user interaction, and marketers will surely analyse the keywords and search phrases extensively. One question remains: Why did AOL make the file available for download to the public? Would it not have been better to offer the file on DVD to researchers only?
Google will soon be offering six DVDs with search data as well, so it is not only evil AOL that is “sharing” user searches with others. There are two clear differences between the AOL and the Google approach. The first is that AOL released customer data, that is, data of people who are paying AOL for internet access, while Google has no such customer relationship. The second is that AOL released the data to the public, while Google will be offering the data on DVD, supposedly for researchers only.
I took a look at the first text file, which has a size of more than 200 megabytes. Each line begins with the unique ID that replaced the username, followed by the search query, the time of the search and, where there is one, the destination (URL) the user went to. Everything is unfiltered, so you could surely create some pretty accurate profiles from the user queries. User 205405 is searching for rape, child abuse and the like, while user 2603120 is looking for Spanish language courses in Chicago.
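To show how little work such a profile takes, here is a minimal Python sketch that groups the queries by user ID. It rests on assumptions I have not verified against the whole file: that the fields are tab-separated and appear in the order described above (ID, query, time, URL). The function name `group_queries_by_user` and the sample lines are made up for illustration; they are not taken from the AOL logs.

```python
from collections import defaultdict


def group_queries_by_user(lines, sep="\t"):
    """Collect the search queries logged for each anonymized user ID.

    Assumes each line looks like: ID <sep> query <sep> time <sep> url
    (field order and separator are assumptions, not confirmed facts).
    """
    profiles = defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # Skip anything that does not start with a numeric ID,
        # for example a header row or a malformed line.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        user_id, query = fields[0], fields[1]
        profiles[user_id].append(query)
    return dict(profiles)


# Invented sample lines in the assumed format (not real log entries).
sample_lines = [
    "1001\tspanish language courses\t2006-03-01 10:15:00\thttp://www.example.com\n",
    "1002\tcheap flights\t2006-03-01 10:16:30\t\n",
    "1001\tlearn spanish online\t2006-03-02 08:00:00\t\n",
]

profiles = group_queries_by_user(sample_lines)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

To run it against a real log file you would pass an open file object instead of `sample_lines`, since the function only iterates over lines. Keeping everything in one dictionary is fine for a sample, but for a 200 megabyte file you may prefer to filter for a single ID while reading.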
It will be interesting to see whether some of the 500k users sue AOL over this privacy infringement. Others will most likely demand that AOL report some of its users to the authorities because of their searches.
Maybe it is even possible to uncover a real name by analysing the searches. I did not take a closer look at them, but I saw searches for real names, and others are reporting that “searches for names of specific people, addresses, telephone numbers, illegal drugs, and more can be found in the logs as well.”