<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>gHacks Technology News &#124; Latest Tech News, Software And Tutorials &#187; google bot</title> <atom:link href="http://www.ghacks.net/tag/google-bot/feed/" rel="self" type="application/rss+xml" /><link>http://www.ghacks.net</link> <description>A technology news blog covering software, mobile phones, gadgets, security, the Internet and other relevant areas.</description> <lastBuildDate>Sat, 11 Feb 2012 09:52:46 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.1</generator> <atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/> <item><title>Access Websites As Google Bot</title><link>http://www.ghacks.net/2010/05/05/access-websites-as-google-bot/</link> <comments>http://www.ghacks.net/2010/05/05/access-websites-as-google-bot/#comments</comments> <pubDate>Wed, 05 May 2010 09:03:05 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[The Web]]></category> <category><![CDATA[be the bot]]></category> <category><![CDATA[google bot]]></category> <category><![CDATA[search engine]]></category> <category><![CDATA[washington post]]></category> <category><![CDATA[yahoo bot]]></category> <guid
isPermaLink="false">http://www.ghacks.net/?p=25155</guid> <description><![CDATA[Google bot is the general term for Google&#8217;s automated web crawling service that is linked to the Google search engine. Google&#8217;s requests to webpages identify themselves with a Google Bot user agent. This specific user agent is used for several purposes including identification and restrictions. Webmasters can for instance filter out Google Bot from [...]]]></description> <content:encoded><![CDATA[<p>Google bot is the general term for Google&#8217;s automated web crawling service that is linked to the Google search engine. Google&#8217;s requests to webpages identify themselves with a Google Bot user agent. This specific user agent is used for several purposes including identification and restrictions.</p><p>Webmasters can for instance filter out Google Bot from their website statistics to get a better picture of how many real users visit the site in a given period.</p><p>Some webmasters and services on the other hand try to cheat by allowing Google Bot access to all of their contents while they display a registration or purchase page to users who want to access the same information.</p><p><span
id="more-25155"></span>That&#8217;s not allowed according to Google&#8217;s terms of use, but some webmasters do it nevertheless.</p><p>Some users have now had the idea to pose as Google Bot to access the information without buying or registering first.</p><p><a
href="http://www.avivadirectory.com/bethebot/">Be The Bot</a> is a website that simplifies the process. It contains a form where a web address can be entered. The user can also choose whether to pose as Google Bot or Yahoo Bot. The requested URL will then be displayed on the same screen.</p><p><img
src="http://www.ghacks.net/wp-content/uploads/2010/05/bethebot_google_bot-500x297.png" alt="bethebot google bot" title="bethebot google bot" width="500" height="297" class="alignnone size-medium wp-image-25156" /></p><blockquote><p> Have you ever been googleing something, and you see exactly what you need in the preview, but when you click the link it doesnt show you what you want to see?<br
/> This is because the owners of the site are trying to trick you into buying something, or registering. It&#8217;s a common tactic on the internet. When Google visits the site, it gives something called a &#8220;Header&#8221;. This header tells the site who the visitor is. Google&#8217;s header is &#8220;Googlebot&#8221;. The programmers of the site check to see if the header says &#8220;Googlebot&#8221;, and if it does, it opens up all of its content for only googles eyes.</p></blockquote><p>This works on all pages that allow Google Bot or Yahoo Bot complete access to their website but block visitors by asking them to register or buy first.</p><p>It works for instance on the Washington Post website, which asks visitors to register before they can read the contents that are posted on the site. Pasting a URL copied from the website of the Post, or simply washingtonpost.com, into the URL form at Be The Bot provides immediate, unrestricted access to the contents. (via <a
href="http://www.online-tech-tips.com/cool-websites/view-members-only-content-without-registering/">Online Tech Tips</a>)</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2010/05/05/access-websites-as-google-bot/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>Check your robots.txt at Google</title><link>http://www.ghacks.net/2007/05/20/check-your-robotstxt-at-google/</link> <comments>http://www.ghacks.net/2007/05/20/check-your-robotstxt-at-google/#comments</comments> <pubDate>Sun, 20 May 2007 19:14:28 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Google]]></category> <category><![CDATA[The Web]]></category> <category><![CDATA[check robots.txt]]></category> <category><![CDATA[google bot]]></category> <category><![CDATA[robots.txt]]></category> <category><![CDATA[webmaster central]]></category> <category><![CDATA[wordpress robots.txt]]></category> <guid
isPermaLink="false">http://www.ghacks.net/2007/05/20/check-your-robotstxt-at-google/</guid> <description><![CDATA[A robots.txt file tells search engine robots which directories and files of a website they may crawl. A wrongly edited robots.txt file could mean that search engine robots no longer crawl your website, with the result that new articles will not be added to the search engine index.
This can be quite devastating, especially for webmasters who earn their living from their websites. It can take up to two weeks until changes become visible after you edit the robots.txt file, which is way too long if you made a mistake. ]]></description> <content:encoded><![CDATA[<p>A robots.txt file tells search engine robots which directories and files of a website they may crawl. A wrongly edited robots.txt file could mean that search engine robots no longer crawl your website, with the result that new articles will not be added to the search engine index.</p><p>This can be quite devastating, especially for webmasters who earn their living from their websites. It can take up to two weeks until changes become visible after you edit the robots.txt file, which is way too long if you make a mistake.</p><p>One great way to check whether your robots.txt is valid and does exactly what you want it to do is to check it in real time at <a
href="https://www.google.com/accounts/ServiceLogin?service=sitemaps&amp;passive=true&amp;nui=1&amp;continue=https://www.google.com/webmasters/tools/siteoverview&amp;followup=https://www.google.com/webmasters/tools/siteoverview&amp;hl=en" target="_blank">Google&#8217;s Webmaster Central</a> service. The first thing that you have to do is to create a free account and add your site to it.</p><p><span
id="more-1576"></span>Once that is done you can access the various services that are offered. One of them is the <em>robots.txt analysis</em>, which lets you check the robots.txt of your website. Google automatically retrieves the robots.txt from your website if one exists and adds the main URL to the list of URLs that should be checked.</p><p>You may add new entries to the robots.txt and to the list of URLs that should be checked. This is important for two reasons. First, you want to check new entries or a completely new robots.txt file, which means you have to add and edit entries.</p><p>Second, it is important to check various URLs and not only the main URL. Take ghacks, for example: all article pages have a URL structure that differs from that of the main page. To give you an example, I added the following robots.txt file and article pages. This is the right way if you run a WordPress blog. If you run a different website, you need to add a different robots.txt and different pages, of course.</p><p><strong>robots.txt</strong></p><p>User-agent: *<br
/> Disallow: /wp-<br
/> Disallow: /feed/<br
/> Disallow: /trackback/<br
/> Disallow: /rss/<br
/> Disallow: /comments/feed/<br
/> Disallow: /page/<br
/> Disallow: /date/<br
/> Disallow: /comments/</p><p>User-agent: Googlebot<br
/> Disallow: /*/feed/$<br
/> Disallow: /*/feed/rss/$<br
/> Disallow: /*/trackback/$<br
/> Disallow: /*?*<br
/> Disallow: /*?</p><p># This is the ad bot for google<br
/> User-agent: Mediapartners-Google*</p><p># Allow Everything<br
/> Allow: /*</p><p><strong>Test URLs against this robots.txt file</strong></p><p>http://www.ghacks.net/<br
/> http://www.ghacks.net/2007/05/20/support-ghacks/<br
/> http://www.ghacks.net/tag/<br
/> http://www.ghacks.net/category/<br
/> http://www.ghacks.net/2007/05/20/flitter-a-flickr-twitter-realtime-screensaver/trackback/</p><p>You may add a second search engine bot which should also try to crawl the site. It would be a good idea to select the Adsense bot, for instance. Clicking on check displays what would happen if Google Bot tried to crawl your website.</p><p>Allowed means that Google Bot is able to crawl that type of page, while Blocked means that this type of page will not be crawled. If the results are not to your satisfaction you can easily edit the robots.txt and check again until they are.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2007/05/20/check-your-robotstxt-at-google/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>Free Answers from Experts-Exchange.com</title><link>http://www.ghacks.net/2006/07/11/free-answers-from-experts-exchangecom/</link> <comments>http://www.ghacks.net/2006/07/11/free-answers-from-experts-exchangecom/#comments</comments> <pubDate>Tue, 11 Jul 2006 20:19:57 +0000</pubDate> <dc:creator>Martin Brinkmann</dc:creator> <category><![CDATA[Hacking]]></category> <category><![CDATA[experts exchange]]></category> <category><![CDATA[google bot]]></category> <category><![CDATA[members]]></category> <guid
isPermaLink="false">http://www.ghacks.net/2006/07/11/free-answers-from-experts-exchangecom/</guid> <description><![CDATA[You might have stumbled upon this site if you ever had a computer-related question that you could not answer. First step, ask Google. Second step, find websites that force you to log in or even pay to see the answer. Big thanks to eonestudio for making this little trick public. The above link displays a video which shows how you can take a look at the answers for free.]]></description> <content:encoded><![CDATA[<p>You might have stumbled upon this site if you ever had a computer-related question that you could not answer. First step, ask Google. Second step, find websites that force you to log in or even pay to see the answer. Big thanks to eonestudio for making this little trick public. The above link displays a video which shows how you can take a look at the answers for free.</p><p>All you need to do is the following. Search for something, e.g. printing from a website. The search string you would enter would look like the following.</p><p>printing from a website site:experts-exchange.com</p><p>Please note that it is not necessary to use the site:experts-exchange.com parameter; this is only done for demonstration purposes.</p><p><span
id="more-623"></span>The first search result will link directly to the experts-exchange.com article, which shows no solution but a button labeled View Solution. If you click that button you will see that you have to subscribe to view the solution. Not today though.</p><p>Go back to the Google results page and click on Cached, which loads a cached version of that webpage. Guess what? The cached page holds all the answers to the question right at the bottom. Just scroll down and see for yourself.</p> ]]></content:encoded> <wfw:commentRss>http://www.ghacks.net/2006/07/11/free-answers-from-experts-exchangecom/feed/</wfw:commentRss> <slash:comments>13</slash:comments> </item> </channel> </rss>
