Ways to find websites that are copying your content
There are a couple of reasons why you may want to make sure that no other website has published content that you created without proper authorization. The main reason from a webmaster's perspective is duplicate content caused by scrapers. Google, Bing and other search engines get it right most of the time and rank your content ahead of the scraping sites. Sometimes, however, they do not, and your site ends up taking a backseat because it is outranked by sites that have copied your articles.
There are other issues worth mentioning, like missing attribution or the chance that you are associated with a website that you have no affiliation with whatsoever.
Webmasters have a couple of options to deal with scraping sites.
- Contact the webmaster or owner by email. If there is no web form or contact option on the site, look up the domain's whois record and use the email addresses listed there. Even if the whois data is protected by a proxy, the record lists an email address that you can use.
- If that does not work out (I'd give it a week's time), you are left with no other option than to send a DMCA request to the website owner, and maybe even to the provider that hosts the site, to get the content removed.
Finding websites that copy your content
One of the best ways to go about that is to copy a sentence or paragraph from your article and search for it on search engines like Google Search or Bing.
I suggest you add the sites to a list first before you visit them one by one to find contact information. Instead of searching for a sentence from your article, you can alternatively search for the title in quotes. That only works, however, if the title is unique.
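If you check many articles, building the search links by hand gets tedious. Here is a minimal Python sketch that turns a sentence or title into exact-phrase search links for Google and Bing. The function name and the optional own_domain parameter are my own choices for illustration, and it assumes that both search engines honor quoted phrases and the -site: operator in the q parameter of their search URLs.

```python
from urllib.parse import quote_plus


def exact_phrase_search_urls(phrase, own_domain=None):
    """Build Google and Bing search URLs for an exact-phrase query.

    phrase:     a sentence or title copied from your article
    own_domain: optional, your own domain, excluded via the -site: operator
                (hypothetical parameter, added for illustration)
    """
    # Collapse whitespace and drop double quotes so the phrase
    # cannot break out of the surrounding quotes.
    cleaned = " ".join(phrase.replace('"', " ").split())
    query = f'"{cleaned}"'
    if own_domain:
        query += f" -site:{own_domain}"
    encoded = quote_plus(query)
    return {
        "google": f"https://www.google.com/search?q={encoded}",
        "bing": f"https://www.bing.com/search?q={encoded}",
    }


if __name__ == "__main__":
    links = exact_phrase_search_urls(
        "Ways to find websites that are copying your content",
        own_domain="example.com",
    )
    for engine, url in links.items():
        print(engine, url)
```

Open the printed links in your browser and add any site that shows up to your list.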
Another option that you have is to look at the trackbacks and pingbacks that your website receives, if the data is available to you. WordPress, for instance, displays this information in the admin interface. There you need to click through to the sites to see if and how they have copied your content. Some may only have quoted your content, or only referenced it with a link, while others will have copied it word for word on their sites.
Here are a couple of trackbacks of a site that not only scrapes the content, but also runs it through so-called spinning software, which automatically replaces words with other words or phrases so that the result passes as unique content and not as duplicate content. While it is obvious to human readers that the contents do not make any sense, search engine bots are not able to detect that just yet.
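If you have pasted the text of a suspect page into a file or variable, a rough similarity score can help you tell a short quote from a full or spun copy. This is only a sketch that uses Python's standard difflib module. The function name is made up for illustration, and what score counts as "copied" is something you would have to judge for yourself.

```python
import re
from difflib import SequenceMatcher


def word_similarity(original, candidate):
    """Return a score between 0.0 and 1.0 for how much two texts overlap.

    The texts are compared word by word, ignoring case and punctuation,
    so a word-for-word copy scores 1.0 and a spun copy scores lower.
    """
    a = re.findall(r"\w+", original.lower())
    b = re.findall(r"\w+", candidate.lower())
    if not a or not b:
        return 0.0  # nothing to compare
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    mine = "The quick brown fox jumps over the lazy dog."
    spun = "The fast brown fox leaps over the idle dog."
    print(round(word_similarity(mine, mine), 2))
    print(round(word_similarity(mine, spun), 2))
```

A page that only quotes a sentence from a long article scores low, while a copy that has been run through spinning software still shares most of its word order with the original.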
WordPress administrators can filter the comments by ping so that only trackbacks and pingbacks are listed and not user comments.
Well-known services such as Copyscape or Plagiarismcheck provide you with search options and sometimes even monitoring for a price. Copyscape Premium, for instance, starts at 5 cents ($0.05) per search. For that, you get options like batch scanning of up to 10,000 pages for copyright issues, full access to the service's database and options to exclude results from certain sites.