There are a couple of reasons why you may want to make sure that no other website on the Internet has published content that you created without proper authorization. The main reason from a webmaster's perspective is duplicate content caused by scrapers. Google, Bing and other search engines get it right most of the time and rank your content ahead of the scraping sites. Sometimes, however, they do not, and your site ends up taking a backseat because it is outranked by sites that have copied your articles.
There are other issues worth mentioning, such as missing attribution or the chance that you are associated with a website that you have no affiliation with whatsoever.
Webmasters have a couple of options to deal with scraping sites.
One of the best options is to copy a sentence or paragraph from your article and search for it on search engines like Google Search or Bing, ideally enclosed in quotes so that only exact matches are returned.
I suggest you add the sites that turn up to a list first before you visit them one by one to find contact information. Instead of searching for a sentence from your article, you can alternatively search for the title in quotes. That only works, however, if the title is unique.
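The search described above can be prepared with a few lines of Python. This is only a minimal sketch: the helper names `pick_sentence` and `exact_match_urls` are made up for illustration, and picking the longest sentence is just one assumed way of finding a distinctive phrase. The script only builds the search URLs; you still open them in a browser yourself.

```python
import re
from urllib.parse import urlencode

# Public search pages of the two engines named in the article.
SEARCH_ENDPOINTS = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def pick_sentence(article_text, max_words=20):
    """Return the longest sentence of the article, cut to max_words words.

    Assumption: a long sentence is more likely to be unique than a short one.
    """
    sentences = re.split(r"(?<=[.!?])\s+", article_text.strip())
    longest = max(sentences, key=lambda s: len(s.split()))
    words = longest.rstrip(".!?").split()
    return " ".join(words[:max_words])


def exact_match_urls(phrase):
    """Build one search URL per engine with the phrase wrapped in quotes."""
    quoted = f'"{phrase}"'
    return {
        name: f"{base}?{urlencode({'q': quoted})}"
        for name, base in SEARCH_ENDPOINTS.items()
    }


if __name__ == "__main__":
    article = "Short one. This is a much longer sentence here. Tiny."
    phrase = pick_sentence(article)
    for engine, url in exact_match_urls(phrase).items():
        print(engine, url)
```

The same `exact_match_urls` helper works for the title search: pass the article title instead of a sentence.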
Another option is to look at the trackbacks and pingbacks that your website receives, if that data is available to you. WordPress, for instance, displays that information in the admin interface. You then need to click through to the sites to see if and how they have copied your content. Some may only have quoted your content or referenced it with a link, while others will have copied it word for word.
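If you have the text of your article and the text of a linking page at hand, the difference between a quote and a word-for-word copy can be estimated roughly. The following is a sketch under stated assumptions, not an established method: it compares overlapping five-word sequences, and the function names and both thresholds are hypothetical values chosen for illustration.

```python
import re


def shingles(text, n=5):
    """Return the set of all n-word sequences in the text, lowercased."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(original, candidate, n=5):
    """Fraction of the original's n-word sequences that appear in candidate."""
    original_shingles = shingles(original, n)
    if not original_shingles:
        return 0.0
    shared = original_shingles & shingles(candidate, n)
    return len(shared) / len(original_shingles)


def classify(original, candidate, quote_threshold=0.05, copy_threshold=0.8):
    """Label a page; both thresholds are assumed values, tune them yourself."""
    fraction = copied_fraction(original, candidate)
    if fraction >= copy_threshold:
        return "full copy"
    if fraction >= quote_threshold:
        return "quote"
    return "link or unrelated"


if __name__ == "__main__":
    article = "one two three four five six seven eight nine ten"
    page = "they wrote: one two three four five six, and more"
    print(classify(article, page))
```

A check like this only catches text that was copied verbatim. Content whose words have been swapped out automatically shares few or no five-word sequences with the original, so it would still need a manual look.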
Here are a couple of trackbacks from a site that not only scrapes the content, but also runs it through so-called spinning software, which automatically replaces words with other words or phrases so that the result passes as unique content rather than duplicate content. While it is obvious to human readers that the resulting text does not make any sense, search engine bots are not able to tell just yet.
WordPress administrators can filter the comments by ping so that only trackbacks and pingbacks are listed and not user comments.
Well-known services such as Copyscape or Plagiarismcheck provide you with search options, and sometimes even monitoring, for a price. Copyscape Premium, for instance, starts at 5 cents ($0.05) per search. For that, you get options like batch scanning of up to 10,000 pages for copyright issues, full access to the service's database and options to exclude results from certain sites.
Ghacks is a technology news blog that was founded in 2005 by Martin Brinkmann. It has since then become one of the most popular tech news sites on the Internet with five authors and regular contributions from freelance writers.