Ways to find websites that are copying your content - gHacks Tech News


There are a couple of reasons why you may want to make sure that no other website has published content that you created without proper authorization. From a webmaster's perspective, the main reason is duplicate content caused by scrapers. Google, Bing and other search engines get it right most of the time and rank your content ahead of the scraping sites. Sometimes they do not, however, and your site ends up taking a backseat because it is outranked by sites that have copied your articles.

There are other issues worth mentioning, such as missing attribution or the chance that you are associated with a website that you have no affiliation with whatsoever.

Webmasters have a couple of options to deal with scraping sites.

  • Contact the webmaster or owner by email. If there is no web form or contact option on the site, try looking up the whois records and using the email addresses listed there. Even if the whois data is protected by a proxy, the record still lists an email address that you can use.
  • If that does not work out (I'd give it a week), you are left with no other option than to send a DMCA takedown request to the website owner, and maybe even to the provider that hosts the site, to get the content removed.
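As a rough illustration of the whois step, the sketch below pulls the email addresses out of whois output that you have already saved as text (for example from a command-line whois client or a web lookup). The function name, the regular expression and the sample record are all assumptions made up for this example, and the pattern is a simple approximation rather than a full email validator.

```python
import re

# Simple pattern for email addresses; good enough for whois output,
# but not a complete validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_contact_emails(whois_text: str) -> list[str]:
    """Return the unique email addresses in whois_text, in order of first appearance."""
    seen = set()
    emails = []
    for match in EMAIL_RE.findall(whois_text):
        address = match.lower()
        if address not in seen:
            seen.add(address)
            emails.append(address)
    return emails


# Hypothetical whois excerpt, invented purely for illustration.
SAMPLE_WHOIS = """\
Domain Name: EXAMPLE.COM
Registrar Abuse Contact Email: abuse@registrar.example
Registrant Email: proxy-1234@privacy.example
Admin Email: proxy-1234@privacy.example
"""

if __name__ == "__main__":
    for email in extract_contact_emails(SAMPLE_WHOIS):
        print(email)
```

A proxy address like the one in the sample usually forwards to the real owner, and the registrar's abuse contact is a reasonable second recipient if the owner does not respond.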

Finding websites that copy your content

One of the best ways to do that is to copy a sentence or paragraph from your article and search for it on a search engine such as Google Search or Bing.

[Image: copied website contents]

I suggest you add the sites to a list first before you visit them one by one to find contact information. Instead of searching for a sentence from your article, you can alternatively search for the title in quotes. That only works if the title is unique, however.
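If you check many articles, building the search links by hand gets tedious. The following is a minimal sketch that turns a sentence or title into exact-phrase (quoted) search URLs for Google and Bing. The function name is hypothetical, and the sketch only builds links for you to open in a browser; it does not fetch or parse any results.

```python
from urllib.parse import urlencode

# Public search endpoints; both accept the query in the "q" parameter.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def exact_phrase_urls(phrase: str) -> dict[str, str]:
    """Build one exact-phrase search URL per search engine."""
    # Collapse whitespace and drop embedded quotes so the phrase
    # can be wrapped in a single pair of double quotes.
    cleaned = " ".join(phrase.split()).replace('"', "")
    query = urlencode({"q": f'"{cleaned}"'})
    return {name: f"{base}?{query}" for name, base in SEARCH_ENGINES.items()}


if __name__ == "__main__":
    title = "Ways to find websites that are copying your content"
    for name, url in exact_phrase_urls(title).items():
        print(name, url)
```

Paste the printed links into a browser and add any site that shows your text to your list.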

Another option is to look at the trackbacks and pingbacks that your website receives, if that data is available to you. WordPress for instance displays this information in the admin interface. You then need to click through to the sites to see if and how they have copied your content. Some may only have quoted your content, or only referenced it with a link, while others will have copied it word for word.

Here are a couple of trackbacks from a site that not only scrapes the content but also runs it through so-called spinning software, which automatically replaces words with other words or phrases so that the result passes as unique content rather than duplicate content. While it is obvious to human readers that the spun content does not make any sense, search engine bots are not able to tell just yet.
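Because spun copies no longer match your sentences word for word, an exact-phrase search can miss them. Once you have the text of a suspected copy, a fuzzy comparison can help confirm it. The sketch below is only an illustration using Python's standard difflib module, not a method described in this article: it compares the two texts word by word and returns a score between 0 and 1. The function names and sample sentences are made up, and what counts as "suspiciously similar" is something you would have to judge for your own content.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_similarity(original: str, suspect: str) -> float:
    """Return a 0..1 score for how much of the word sequence is shared."""
    matcher = SequenceMatcher(None, words(original), words(suspect), autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Made-up sentences for illustration only.
    original = "Search engines usually rank the original article above the scraped copies"
    spun = "Search engines typically rank the original post above the scraped duplicates"
    unrelated = "The weather in Berlin was cold and rainy all week"

    print(round(word_similarity(original, spun), 2))
    print(round(word_similarity(original, unrelated), 2))
```

A spun copy keeps most words in the same order, so it scores far higher than unrelated text even though no full sentence matches exactly.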

[Image: scraped content]

WordPress administrators can filter the comments by ping so that only trackbacks and pingbacks are listed and not user comments.

Commercial services

Well-known services such as Copyscape or Plagiarismcheck provide search options, and sometimes even monitoring, for a price. Copyscape Premium, for instance, starts at $0.05 (5 cents) per search. For that you get options like batch scanning of up to 10,000 pages for copyright issues, full access to the service's database, and options to exclude results from certain sites.
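To get a feel for what a batch scan could cost, here is a small worked example. It assumes the starting price of 5 cents per search and, as a simplification of my own, that every page needs exactly one search; actual pricing may differ, so treat the numbers as an estimate only.

```python
# Assumed starting price of 5 cents per search; stored in cents
# so the multiplication stays in whole numbers.
PRICE_PER_SEARCH_CENTS = 5


def batch_cost_dollars(pages: int, price_cents: int = PRICE_PER_SEARCH_CENTS) -> float:
    """Estimate the cost in dollars, assuming one search per page."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    return pages * price_cents / 100


if __name__ == "__main__":
    for pages in (100, 1_000, 10_000):
        print(f"{pages} pages: ${batch_cost_dollars(pages):.2f}")
```

Under these assumptions, a full batch of 10,000 pages comes to $500, which is worth keeping in mind before scanning an entire site.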





