<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>gHacks technology news &#187; aol anonymized logs</title>
	<atom:link href="http://www.ghacks.net/tag/aol-anonymized-logs/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ghacks.net</link>
	<description>A technology blog covering software, mobile phones, gadgets, security, the Internet and other relevant areas.</description>
	<lastBuildDate>Mon, 23 Nov 2009 22:22:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Anonymized Logs of 500000 AOL users on the net</title>
		<link>http://www.ghacks.net/2006/08/07/anonymised-logs-of-500000-aol-users-on-the-net/</link>
		<comments>http://www.ghacks.net/2006/08/07/anonymised-logs-of-500000-aol-users-on-the-net/#comments</comments>
		<pubDate>Mon, 07 Aug 2006 07:39:01 +0000</pubDate>
		<dc:creator>Martin</dc:creator>
				<category><![CDATA[Search Engines]]></category>
		<category><![CDATA[aol]]></category>
		<category><![CDATA[aol anonymized logs]]></category>
		<category><![CDATA[aol logs]]></category>

		<guid isPermaLink="false">http://www.ghacks.net/2006/08/07/anonymised-logs-of-500000-aol-users-on-the-net/</guid>
		<description><![CDATA[AOL surely did not think about the immense backlash they would receive from the internet community when they released anonymised logs of 500,000 AOL users at the AOL research website. The file consisted of about 20 million web queries from about 500,000 AOL users over the course of three months (March to May 2006). The AOL username was replaced by a unique ID; everything else was kept unchanged in the logs.]]></description>
			<content:encoded><![CDATA[<p>AOL surely did not think about the immense backlash they would receive from the internet community when they released anonymized logs of 500,000 AOL users at the AOL research website. The file consisted of about 20 million web queries from about 500,000 AOL users over the course of three months (March to May 2006). The AOL username was replaced by a unique ID; everything else was kept unchanged in the logs.</p>
<p>AOL quickly took down the website, but a Google cache copy is still available. The large compressed log file (over 400 megabytes) can be obtained from <a title="aol log files download" target="_blank" href="http://www.gregsadetsky.com/aol-data/">Greg Sadetsky's website</a> as a web or torrent download. Pulling the information from the AOL website surely looks like an admission of wrongdoing, and AOL now appears to be in full damage-control mode. The uncompressed log file consists of text files with a combined size of over two gigabytes.</p>
<p><span id="more-693"></span>Some questions naturally arise: Why did AOL release the data? Why is there such a big outcry in the web community? I think that AOL released the data for research and <a target="_blank" title="marketing and the aol 500 k logs" href="http://plentyoffish.wordpress.com/2006/08/06/aol-releases-googles-most-prized-keyword-list-google-is-gonna-get-mega-spammed/">marketing purposes</a>. This is a goldmine for every researcher working on search engines and user interaction, and marketers will surely analyse the keywords and search phrases extensively. A question remains: Why did they make the file available for download to the public? Would it not have been better to offer the file on DVD to researchers only?</p>
<p><a title="google 6 dvd user searches" target="_blank" href="http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html">Google</a> will be offering six DVDs with search data soon as well, so it is not only evil AOL who is &#8220;sharing&#8221; user searches with others. There are two clear differences between the AOL and the Google approach. First, AOL released customer data, that is, data of people who are paying AOL for internet access, while Google has no such customer relationship. Second, AOL released the data to the public, while Google will supposedly be offering its data on DVD to researchers only.</p>
<p>I took a look at the first text file, which has a size of more than 200 megabytes. Each line begins with the unique ID that replaced the username, followed by the search query, the time of the search and, where available, the destination URL the user went to. Everything is unfiltered; you could surely create some pretty accurate profiles from the user queries. User 205405 is searching for rape, child abuse and the like, while user 2603120 is looking for Spanish language courses in Chicago.</p>
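<p>To illustrate how easily such lines turn into per-user profiles, here is a minimal Python sketch. It assumes the fields are tab-separated and appear in the order described above (ID, query, time, optional URL); the delimiter, the names <code>LogEntry</code>, <code>parse_line</code> and <code>group_by_user</code>, and the idea of skipping lines with any other number of fields are assumptions made for illustration, not details confirmed by the files themselves.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class LogEntry(NamedTuple):
    """One log line, using the field order described in the post."""
    user_id: str
    query: str
    timestamp: str
    url: Optional[str]  # None when the user did not click through


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one line; the tab delimiter is an assumption."""
    fields = line.rstrip("\n").split("\t")
    # Accept only 3 fields (no click) or 4 fields (with URL column).
    if len(fields) not in (3, 4):
        return None
    url = fields[3] if len(fields) == 4 and fields[3] else None
    return LogEntry(fields[0], fields[1], fields[2], url)


def group_by_user(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect every query per user ID, in file order."""
    profiles: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            profiles[entry.user_id].append(entry.query)
    return dict(profiles)
```

<p>Feeding the function an open file handle, for example <code>group_by_user(open(path, encoding="utf-8"))</code> with a hypothetical <code>path</code>, would return a dictionary that maps each ID to its list of queries, which is exactly the kind of profile that makes the release so problematic.</p>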
<p>It will be interesting to see whether some of the 500,000 users will sue AOL over this privacy infringement. Others will most likely demand that AOL report some of its users to the authorities because of their searches.</p>
<p>Maybe it is even possible to uncover a real name by analysing the searches. I did not take a closer look at them, but I saw searches for real names, and <a target="_blank" title="techcrunch" href="http://www.techcrunch.com/2006/08/06/aol-proudly-releases-massive-amounts-of-user-search-data/">others</a> are reporting that &#8220;<span style="font-style: italic">searches for names of specific people, addresses, telephone numbers, illegal drugs, and more can be found in the logs as well.</span>&#8221;</p>



]]></content:encoded>
			<wfw:commentRss>http://www.ghacks.net/2006/08/07/anonymised-logs-of-500000-aol-users-on-the-net/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
	</channel>
</rss>
