AOL surely did not anticipate the immense backlash from the internet community when it released the anonymized logs of 500,000 AOL users on the AOL research website. The file consists of about 20 million web queries from about 500,000 AOL users over the course of three months (March to May 2006). The AOL username was replaced by a unique ID; everything else in the logs was kept unchanged.
AOL quickly took down the website, but a Google cache copy is still available. The big compressed log file (over 400 megabytes) can be obtained from Greg Sadetsky's website as a web or torrent download. Pulling the information from the AOL website surely looks like an admission of wrongdoing, and AOL is now in full damage control mode. The uncompressed log file consists of text files with a combined size of over two gigabytes.
Some questions naturally arise: Why did AOL release the data? Why is there such a big outcry in the web community? I think that AOL released the data for research and marketing purposes. This is a goldmine for every researcher working on search engines and user interaction, and marketers will surely analyze the keywords and search phrases extensively. One question remains: Why did they make the file available for download to the public? Would it not be better to offer the file on DVD to researchers only?
Google will soon be offering six DVDs with search data as well, so it is not only evil AOL who is “sharing” user searches with others. There are two clear differences between the AOL and the Google approach. First, AOL released customer data, that is, data of people who pay AOL for internet access, while Google has no such customer relationship. Second, AOL released the data to the public, while Google will supposedly offer the data on DVD to researchers only.
I took a look at the first text file, which is more than 200 megabytes in size. Each line begins with the unique ID that replaced the username, followed by the search query, the time of the search, and the destination (URL) the user went to, if any. Everything is unfiltered, so you could surely create some pretty accurate profiles from the user queries. User 205405 is searching for rape, child abuse and the like, while user 2603120 is looking for Spanish language courses in Chicago.
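To give an idea of how little effort such profiling takes, here is a minimal Python sketch that groups queries by the anonymized ID. It assumes four tab-separated columns in the order described above (ID, query, time, URL); the real files may differ, for example by including a header row or additional columns. The sample lines, the IDs in them, and the function name `queries_by_user` are made up for illustration and are not taken from the AOL data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample lines: IDs, queries, and URLs are invented for illustration.
SAMPLE = (
    "1001\tspanish language courses\t2006-03-01 10:15:00\thttp://example.com\n"
    "1001\tcheap flights chicago\t2006-03-02 18:30:12\t\n"
    "1002\tweather forecast\t2006-03-01 07:45:30\thttp://example.org\n"
)


def queries_by_user(lines):
    """Group (time, query, url) tuples by anonymized user ID.

    Assumes tab-separated columns: ID, query, time, URL (URL may be empty).
    """
    profiles = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip header rows and malformed lines (ID must be numeric).
        if len(row) < 3 or not row[0].isdigit():
            continue
        user_id, query, when = row[0], row[1], row[2]
        # An empty or missing fourth column means no destination was recorded.
        url = row[3] if len(row) > 3 and row[3] else None
        profiles[user_id].append((when, query, url))
    return dict(profiles)


profiles = queries_by_user(io.StringIO(SAMPLE))
for user_id, entries in profiles.items():
    print(user_id, [query for _, query, _ in entries])
```

To run this against one of the extracted text files, you would pass an open file object instead of the `io.StringIO` sample. The point is only that a single pass over the file is enough to collect everything one ID ever searched for.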
It will be interesting to see if some of the 500,000 users will sue AOL over this privacy infringement. Others will most likely demand that AOL report some of its users to the authorities because of their searches.
Maybe it is even possible to uncover a real name by analyzing the searches. I did not take a closer look at them, but I saw searches for real names, and others are reporting that “searches for names of specific people, addresses, telephone numbers, illegal drugs, and more can be found in the logs as well.”
10 Responses to “Anonymized Logs of 500000 AOL users on the net”
Trackbacks/Pingbacks
-
AOL Search Data Shows Users Planning to commit Murder.
http://research.aol.com released a list of 20 million + searches by 500,000 AOL users. Contained in this list are social security numbers, credit cards and other personal information. There are some truly scary things in this database.
There are…
-
AOL Gate: Search Query Data Scandal
Techcrunch notes that AOL has released a file containing 20,000,000 queries from “anonymized” users. However, this is a problem because anything those users typed into AOL search–social security numbers, names, drug deals, etc can be…
-
[...] True to their destructive mission, AOL has publicly released semi-anonymised logs from their search site. This essentially means that all the search queries made of over 500k users between the months of march and may 2006 were posted out in the open. [...]


Wow, “20.000.000 million web queries from about 500.000 AOL users in the course of three months”, that's *a lot*. As in: more than 5 queries per second per user, 24 hours per day, 7 days per week.
You might mean 20 million ;-)
Only kidding, nice piece you wrote today.
Absolutely unbelievable that AOL does a stupid thing like this (or maybe I don't know AOL well enough, and then maybe it *is* believable ;-)
We now just have to wait for someone to:
break the unique code and create a mashup with other databases (say: phonebook)
I guess this will happen before September
Wanna bet ? ;-)
Was it this scandal that prompted you to post about the secretmaker software?
Cheers,
Peter
I stand corrected ;)
I really think they wanted to improve their image by releasing the data. They made one mistake though: they forgot about the rights of their customers and forgot about privacy altogether.
Big blunder, I suppose some people will lose their jobs over this.
The secretmaker software would not have changed the logfiles. Your ISP always knows what you are up to. Well, unless you use encryption for your traffic that is.
Let us take a look at a proxy for example:
Normal connection:
You <--> ISP <--> Destination
Proxy connection:
You <--> ISP <--> Proxy <--> Destination
The only possible solution is encryption, everything else can be read and logged.
Yes, I see.
So is there a good overall Firefox encryption extension that you would advise?
Cheers,
Peter
Well Peter, the problem with encryption is that both sides have to support it. It does no good if you encrypt your data and the other side is not able to decrypt it and send encrypted data back.
That means there is no such thing for Firefox. It is possible to use encryption in email with friends, for example, or encryption on your hard drive, but not the way you want it to be.
Regarding encryption: ok. But between User and Proxy it could work, couldn't it?
Regarding breaking the AOL semi-anonymised logs:
New York Times journalists have broken the first one already…
Cheers,
Peter
A site where you can search the data is here:
http://www.datablunder.com/logitems/query/
A *quick* site where you can search the AOL Logs for yourself, is here:
http://www.frogspy.com