ghacks Technology News

Anonymized Logs of 500000 AOL users on the net


AOL surely did not anticipate the immense backlash it would receive from the internet community when it released anonymized logs of 500,000 AOL users on the AOL research website. The file consisted of about 20 million web queries from about 500,000 AOL users, collected over three months (March to May 2006). Each AOL username was replaced by a unique ID; everything else in the logs was left unchanged.

AOL quickly took down the website, but a Google cache copy is still available. The big compressed log file (over 400 megabytes) can be obtained from Greg Sadetsky's website as a web or torrent download. Pulling the information from the AOL website surely looks like an admission of wrongdoing, and AOL now appears to be in full damage control mode. The uncompressed log file consists of text files with a combined size of over two gigabytes.

Some questions naturally arise: Why did AOL release the data? Why is there such a big outcry in the web community? I think that AOL released the data for research and marketing purposes. It is a goldmine for every researcher working on search engines and user interaction, and marketers will surely analyse the keywords and search phrases extensively. One question remains: Why did AOL make the file available for download to the public? Would it not have been better to offer the file on DVD to researchers only?

Google will be offering six DVDs with search data soon as well, so it is not only evil AOL that is “sharing” user searches with others. There are two clear differences between the AOL and the Google approach. First, AOL released customer data, that is, data of people who are paying AOL for internet access, while Google has no such customer relationship. Second, AOL released the data to the public, while Google will be offering the data on DVD, supposedly for researchers only.

I took a look at the first text file, which has a size of more than 200 megabytes. Each line begins with the unique ID that replaced the username, followed by the search query, the time of the search and, where there is one, the destination (URL) the user went to. Everything is unfiltered, so you could surely create some pretty accurate profiles from the user queries. User 205405 is searching for rape, child abuse and the like, while user 2603120 is looking for Spanish language courses in Chicago.
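
The line layout described above lends itself to a quick parsing sketch. The following Python snippet is only a minimal illustration: it assumes the fields are tab-separated and appear in the order listed (ID, query, time, optional URL). The delimiter, the made-up sample lines and the function names `parse_line` and `group_queries_by_user` are assumptions for illustration, not something taken from the AOL files.

```python
from collections import defaultdict


def parse_line(line):
    """Split one log line into (user_id, query, time, url).

    Assumes tab-separated fields in the order ID, query, time, URL.
    The URL is optional; a missing or empty URL is returned as None.
    Lines with fewer than three fields are treated as malformed.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return None
    user_id, query, time = fields[0], fields[1], fields[2]
    url = fields[3] if len(fields) > 3 and fields[3] else None
    return user_id, query, time, url


def group_queries_by_user(lines):
    """Collect every query per ID, in the order the lines appear."""
    profiles = defaultdict(list)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # skip malformed lines
        user_id, query, _time, _url = record
        profiles[user_id].append(query)
    return dict(profiles)


# Hypothetical sample lines, invented for this sketch (not real log data).
SAMPLE = [
    "1001\texample query one\t2006-03-01 10:00:00\thttp://example.com\n",
    "1001\texample query two\t2006-03-02 11:30:00\n",
    "1002\tanother example\t2006-04-15 09:15:00\t\n",
]

profiles = group_queries_by_user(SAMPLE)
for user_id, queries in profiles.items():
    print(user_id, queries)
```

Grouping the queries per ID like this is enough to turn the raw lines into one search history per person, which is why replacing the username with a number offers so little protection.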

It will be interesting to see if some of the 500k users will sue AOL over this privacy infringement. Others will most likely demand that AOL reports some of its users to the authorities because of their searches.

Maybe it is even possible to uncover a real name by analysing the searches. I did not take a closer look at them, but I saw searches for real names, and others are reporting that “searches for names of specific people, addresses, telephone numbers, illegal drugs, and more” can be found in the logs as well.



Categories: Search Engines





10 Responses to “Anonymized Logs of 500000 AOL users on the net”

  1. Peter Huesken says:

    Wow, “20.000.000 million web queries from about 500.000 AOL users in the course of three months”, that's *a lot*. As in: more than 5 queries per second per user 24 hours per day, 7 days per week.
    You might mean 20 million ;-)

    Only kidding, nice piece you wrote today.
    Absolutely unbelievable that AOL does a stupid thing like this (Or maybe I don't know AOL well enough and then maybe it *is* believable ;-)

    We now just have to wait for someone to:
    break the unique code and create a mashup with other databases (say: phonebook)
    I guess this will happen before September
    Wanna bet ? ;-)

    Was it this scandal that prompted you to post about the secretmaker software ?

    Cheers,
    Peter

  2. Martin says:

    I stand corrected ;)

    I really think they wanted to improve their image by releasing the data. They made one mistake though: they forgot about the rights of their customers, forgot about privacy altogether.

    Big blunder, I suppose some people will lose their jobs over this.

    The secretmaker software would not have changed the logfiles. Your ISP always knows what you are up to. Well, unless you use encryption for your traffic that is.

    Let us take a look at a proxy for example:

    Normal connection.

    You <--> ISP <--> Destination

    Proxy Connection

    You <--> ISP <--> Proxy <--> Destination

    The only possible solution is encryption, everything else can be read and logged.

  3. Peter Huesken says:

    Yes, I see.

    So is there a good overall Firefox encryption extension that you would advise ?

    Cheers,
    Peter

  4. Martin says:

    Well, Peter, the problem with encryption is that both sides have to support it. It does no good if you encrypt your data and the other side is not able to decrypt it and send encrypted data back.

    That means there is no such thing for Firefox. It is possible to use encryption in email with friends, for example, or encryption on your hard drive, but not the way you want it to be.

  5. Peter Huesken says:

    Regarding encryption: ok. But between User and Proxy it could work, couldn't it ?

    Regarding breaking the AOL semi-anonymised logs:
    New York Times journalists have broken the first one already…

    Cheers,
    Peter

  6. ty says:

    A site where you can search the data is here:

    http://www.datablunder.com/logitems/query/

  7. A *quick* site where you can search the AOL Logs for yourself, is here:

    http://www.frogspy.com


© 2005-2010 Ghacks.net. All Rights Reserved.