Plagiarism Detection: How to Win Against Thieves Who Steal Your Articles!




Plagiarists love the original content you publish at EzineArticles and other honest publishers because it ranks high in Google's search results. The trouble is that plagiarists do not credit you as the author or link back to your site: they drop the resource box and omit the link to the article source. Here are five steps you can take to protect your content, detect plagiarism, and get unauthorized copies of your content removed from the World Wide Web:
  1. Include copyright and author information when creating your articles,
  2. Set up an early detection system for finding plagiarists,
  3. Identify and contact the offenders,
  4. Identify and contact their registrars or hosts, and
  5. Submit a Digital Millennium Copyright Act (DMCA) complaint.

1. Include Copyright and Author Information with Your Articles

The first step in the war on plagiarism is to put copyright information in the article body and author information in the resource box. Within the article body, you can include a copyright notice along with the article title and its date of publication. Here is an example of what I use at the end of my articles:

Copyright © 2010 Royce Tivel Select Digitals selectdigitals.com.
After the Crash - Fixing a Tower Trainer 40 ARF, April 5, 2010

Depending on the publisher's article-submission requirements, you may not be able to use an active link or domain name in the article body, as I did above. Even where they are permitted, a plagiarist could strip all active links and URLs from the article, although a non-hyperlinked reference to your site might still remain, especially if the plagiarist is using software to automate the theft.

You can use the resource box to positively identify yourself as the author and can include a link to your web site or blog. Here is an example:
About the Author: Royce Tivel has written extensively about digital photography, Adobe, radio-controlled (RC) airplanes, WordPress, travel, and more. Visit his web site at Select Digitals, selectdigitals.com, for additional content on these subjects, including many images related to his articles published at [publisher's name goes here].
I strongly recommend using a hyperlink to your site in the resource box. The HTML code for the link looks like this:

<a href="http://www.selectdigitals.com" target="_blank" title="Content by Royce Tivel at SelectDigitals.com">Select Digitals</a>


An honest publisher will include the resource box, will not tamper with the article body, and will provide a link to the article source. If a plagiarist strips out the resource box or neglects to include a link to the article source, the chances are still good that the copyright and author information will be left in the article body.
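Once you find a copy, you can check mechanically which of your identifiers survived. The following is a minimal Python sketch, assuming you have saved the suspect page's HTML as a string; `surviving_markers` and its arguments are hypothetical names for illustration, and plain text matching will miss any marker the copier has reworded.

```python
import re


def surviving_markers(page_html: str, markers: list, domain: str) -> dict:
    """Report which attribution markers still appear in a copied page.

    markers: plain-text strings such as the copyright line or author name.
    domain: your site's domain, checked separately as a hyperlink target.
    """
    # Collapse whitespace and lowercase so line breaks and case changes
    # introduced by the copier do not hide a match.
    text = re.sub(r"\s+", " ", page_html).lower()
    result = {m: re.sub(r"\s+", " ", m).lower() in text for m in markers}
    # An href pointing at your domain means the resource-box link survived.
    link_pattern = r"href\s*=\s*[\"']https?://(www\.)?" + re.escape(domain.lower())
    result["link to " + domain] = re.search(link_pattern, text) is not None
    return result
```

For example, `surviving_markers(html, ["Royce Tivel", "Copyright ©"], "selectdigitals.com")` reports whether the author name and copyright line remain and whether any link still points to selectdigitals.com.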

2. Detect the Plagiarism Early

Plagiarism detection begins with an early warning system for plagiarists. I estimate that 90% of all article theft happens when the article is first published, and the worst offenders appear to be plagiarists with blogs. Today, content-aggregator software can easily gather content through RSS (Really Simple Syndication) feeds, manipulate it, and place it on a blog. "White hat" content aggregation that includes author credit and article source information is great for authors, but "black hat" manipulation of aggregated content, which removes the author and source information, is just plain article theft.

Many WordPress sites run the Multi User (MU) version and offer "members" a free WordPress blog as a sub-domain. BuddyPress is an offshoot of WordPress, and I have found plagiarized content at BuddyPress sites, too. In my experience, there is little or no supervision or monitoring of the "members," but the administrators of MU sites will terminate a blog when they receive a report of plagiarism. When a sub-domain on an MU site has plagiarized your material, a lookup will show the registrant as the "owner" of the domain, who is responsible for its sub-domains. Your plagiarism detection system must first identify the plagiarist before you can report them to the administrators.

Because of the blog problem, a Google Blog Search on the article title, a keyword, a phrase, or a "snippet" from the first paragraph of the article, with quote marks around the search term(s), is probably your best no-cost tool for plagiarism detection (Figure 1). Jonathan Bailey at plagiarismtoday.com has this advice for searches:

"I would focus not on titles but statistically improbable phrases within the work, 8-10 words long. Those produce good matches and are easy to find in a work."

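Bailey's advice can be partly automated. This Python sketch slides a 9-word window over each sentence of your article and ranks the windows by how many words fall outside a small list of common words. That ranking is a crude stand-in (my assumption, not Bailey's method) for true statistical improbability. `search_phrases` and `COMMON` are hypothetical names; the quoted strings it returns are ready to paste into a search box.

```python
import re

# A tiny stop-word list: a crude stand-in for real word-frequency statistics.
COMMON = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "is",
          "it", "that", "this", "with", "as", "at", "by", "be", "are", "was",
          "you", "your", "i", "my", "can", "will", "not", "from", "if"}


def search_phrases(article: str, length: int = 9, count: int = 3) -> list:
    """Return up to `count` quoted phrases of `length` words, favoring uncommon words."""
    candidates = []
    # Work sentence by sentence so a phrase never straddles a sentence break.
    for s_idx, sentence in enumerate(re.split(r"[.!?]+\s+", article)):
        words = re.findall(r"[A-Za-z0-9'-]+", sentence)
        for start in range(len(words) - length + 1):
            window = words[start:start + length]
            score = sum(w.lower() not in COMMON for w in window)
            candidates.append((score, s_idx, start, " ".join(window)))
    # Sort is stable, so ties stay in document order.
    candidates.sort(key=lambda c: -c[0])
    chosen, spans = [], []
    for _, s_idx, start, phrase in candidates:
        # Skip windows that overlap one already picked in the same sentence.
        if any(s == s_idx and abs(st - start) < length for s, st in spans):
            continue
        spans.append((s_idx, start))
        chosen.append(f'"{phrase}"')
        if len(chosen) == count:
            break
    return chosen
```

Overlapping windows are skipped so the phrases you search for come from different parts of the article rather than repeating nearly the same words.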

Figure 1: Google Blog Search

Once the search is completed (Figure 2), and if there are results matching your quoted search query, you will be able to look through the results for plagiarized content. Since the first result shown below came neither from my site nor from my article publisher, I would certainly want to check it out.

Figure 2: Blog Search Results

Google's search results include a title (blue), a snippet (black text), and a URL (green). The URL will include the domain name of an offending site. Clicking on either the title or URL will take the browser to the actual blog or web page. The domain name will also appear as part of the URL in the browser's address bar.

Even if the snippet contains text from your article, the title or URL may take you to a page with no trace of it. This happens when the plagiarist publishes your article, waits for Google to list it, and then replaces your article with his own page content: the plagiarized text remains in Google's snippet, but the links lead to the plagiarist's own content, hijacking your traffic! The "footprint" left in the snippet can be enough to shut down a site or blog.

A great feature of Google Blog Search comes at the bottom of the results page, where you will find options for setting up email alerts (the early warning system) so you are notified when sites use the search term in the future (Figure 3).

Figure 3: Alert Options

You are most likely to see plagiarized results within the first few days of publication, so I recommend setting up your alert to send an email once a day. You can end the alerts at any time. The alerts can be limited to blogs or can include comprehensive results for the Web as well; I choose the "comprehensive" option for my email alerts (Figure 4).

Figure 4: Create a Google Alert

3. Identify and Contact the Plagiarist

The best way to identify a plagiarist is to do a "whois" or similar "lookup" on the domain name, which displays contact information for the domain-name registrant. In my experience, plagiarists do not usually leave contact information on their pages. The domain registrant is required to provide it when registering the domain, but plagiarists do not always provide valid contact information! If you cannot find valid contact information for the registrant, you can report this to the registrar. Figure 5 shows the "lookup" form at domainwhitepages.com.

Figure 5: Whois Lookup at domainwhitepages.com
Depending on the lookup service used (internic, domaintools, domainwhitepages, etc.), the contact's email address might be an image and not text. In that case, you will have to type out the email address. Here is the registrant's information from a lookup of my web site:

Domain Whois Record: Registrant Contacts


In the case of selectdigitals.com, all of the information needed to contact the registrant is available. In my experience, registrants of MU sites have responded promptly to my complaints and removed the offending "member," so it is worthwhile to make the attempt and allow two or three days for a response. This gives the registrant a chance to comply with the original publisher's terms of service or to remove the content completely.

Sometimes, the registrar or registration service will provide a "firewall" for a registrant. At NameCheap.com, this is called "WhoisGuard." The registrar's contact information is given in the lookup and emails to the registrant are forwarded without giving away the registrant's "real" contact information.
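The web lookup services mentioned above are front ends for the WHOIS protocol (RFC 3912), which is simply a one-line query sent over TCP port 43. The Python sketch below assumes `whois.verisign-grs.com` as the starting server for .com domains; that registry typically returns a brief record naming the registrar's own WHOIS server, which holds the registrant details. Field labels vary between registrars, and the keywords used to flag privacy services such as WhoisGuard are a rough heuristic, so treat the output as a starting point for reading the full record. All function names here are hypothetical.

```python
import socket
from typing import Optional

# Keywords that often indicate a privacy/proxy service rather than the real
# registrant (a heuristic, not an exhaustive list).
PRIVACY_HINTS = ("whoisguard", "privacy", "proxy", "redacted")


def whois_query(domain: str, server: str = "whois.verisign-grs.com") -> str:
    """Send a raw WHOIS query: the domain plus CRLF over TCP port 43 (RFC 3912)."""
    with socket.create_connection((server, 43), timeout=10) as sock:
        sock.sendall((domain + "\r\n").encode("ascii"))
        chunks = []
        # The server closes the connection when the record is complete.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def referral_server(record: str) -> Optional[str]:
    """Return the registrar's WHOIS server named in a registry record, if any."""
    for line in record.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower() == "registrar whois server" and value.strip():
            return value.strip()
    return None


def registrant_contacts(record: str) -> dict:
    """Collect 'Registrant ...' fields and flag records that look proxied."""
    fields = {}
    for line in record.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower().startswith("registrant") and value.strip():
            fields[key.strip()] = value.strip()
    text = " ".join(fields.values()).lower()
    fields["likely_proxy"] = any(hint in text for hint in PRIVACY_HINTS)
    return fields
```

A typical run queries the registry first, passes `referral_server(record)` as the `server` argument for a second query, and feeds that second record to `registrant_contacts`. If `likely_proxy` comes back true, the email address shown probably forwards through a service like WhoisGuard rather than reaching the registrant directly.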

Your goal in contacting the registrant is to get the article published accurately, completely (including resource box), and identified with the complete article source. You can help the honest publisher by supplying the article title, a link to the article source, and a copy of the resource box.

I have found that contacting a plagiarist by email is the least effective method of removing plagiarized content. Still, this attempt should be made to give the honest publisher a chance to make necessary changes. Also, the fact that you have made the attempt will give more weight to your complaints sent to the registrar, host, or to Google. Give the suspected plagiarist two or three days to respond.

Translating Your Documents into a Foreign Language

If you are trying to contact a registrant, registrar, or host in a foreign country (non-English speaking, in my case), you can take advantage of the Google translation service (see the "Resources" section at the end of this article). I first write the letter in English and then use the translator to convert it into the foreign language. When you do this, test every link in the translated version: especially in complex links, some characters might be converted to the foreign language and the link might not work, so you might have to fix a translated link by hand.

My practice is to email the letter in English (my native language) together with the translated version. I typically compose my letters in MS Word and then paste them into my email client. Note: I suggest "plugging" the translated copy back into the translator as a check; translating it back to the original language might reveal problems that need fixing. Also, some languages translate better than others.
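The link test can be partly automated by comparing the URLs in the two versions of the letter. This is a minimal Python sketch, assuming both letters are available as plain text; `urls_in` and `broken_links` are hypothetical helpers, and the URL pattern is deliberately simple, so an unusual link may still need a manual check.

```python
import re

# Simple URL pattern: stops at whitespace, angle brackets, quotes, or ')'.
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


def urls_in(text: str) -> set:
    """Return the URLs in a text, minus trailing sentence punctuation."""
    return {u.rstrip(".,;:!?") for u in URL_RE.findall(text)}


def broken_links(original: str, translated: str) -> list:
    """List URLs from the original letter that did not survive translation intact."""
    kept = urls_in(translated)
    return sorted(u for u in urls_in(original) if u not in kept)
```

Any URL that `broken_links` reports should be pasted back into the translated letter exactly as it appears in the English version.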

4. Identify the Registrar or Host

A lookup of the plagiarist's domain name will include a list of its domain name servers (DNS). From the name servers listed in the lookup above, the web host is clearly identified as "IXWEBHOSTING.COM":
Name Server: NS5.IXWEBHOSTING.COM
Name Server: NS6.IXWEBHOSTING.COM
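Picking the host out of the name-server lines is mechanical. This Python sketch assumes a WHOIS record saved as text and keeps the last two labels of each name server; that shortcut is an assumption that fails for two-part suffixes such as .co.uk, so check the result against the record. `host_domains` is a hypothetical helper.

```python
def host_domains(record: str) -> list:
    """Collect the hosting domain(s) named in 'Name Server:' lines."""
    hosts = []
    for line in record.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key.strip().lower() == "name server" and value.strip():
            labels = value.strip().lower().rstrip(".").split(".")
            # Keep the last two labels: ns5.ixwebhosting.com -> ixwebhosting.com.
            # This naive rule is wrong for suffixes such as .co.uk.
            domain = ".".join(labels[-2:])
            if domain not in hosts:
                hosts.append(domain)
    return hosts
```

For the two name servers above, the function returns a single host domain, `ixwebhosting.com`, which is the name to look up next.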
A lookup on the name server's domain will yield additional information about the host used by the plagiarist, including the host's contact information. Here is some of the information available from a lookup of "ixwebhosting.com":

Host Contact Information
Similarly, a lookup for the registrar listed in the original "whois" will result in additional information about the registrar. A reputable registrar or host will provide information about reporting copyright infringements. As an example, the copyright policy for my domain's registrar can be viewed at enom.com.

Registrars often use resellers for the business of domain registration. The resellers also have stringent policies against abuse. For my domain, the reseller is listed in the original lookup as follows:
Registration Service Provided By: NameCheap.com
Contacting the registrar or host is probably the most effective way to take a plagiarist's site or blog off the air. Here is what I do. I create a Digital Millennium Copyright Act (DMCA) complaint, just as I would for a complaint written for Google, except I do not use a title directed to Google. Both hosts and registrars take these complaints very seriously and, in my experience, take fast action to block the offending sites from web access. Give the registrar or host two or three days to respond before going any further. I use this format for my complaints:

  1. To: [registrant, registrar, or host name]
  2. Date: [date and time]
  3. Identify the copyrighted work,
  4. Identify the offending web page, including the search query used to find it ("tower trainer 40"),
  5. Provide your contact information,
  6. Provide contact information (if any) for the plagiarist (the email address you used for the registrant),
  7. Include specific language as to the accuracy of your complaint, and
  8. Optional: If I have additional information, I put it here.
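Keeping the complaint in a fixed structure makes it easy to reuse for a registrant, a registrar, or a host. The Python sketch below mirrors the eight items above. The class name, field names, and the wording of `ACCURACY_STATEMENT` are my own assumptions; the statement is modeled on the good-faith and accuracy statements that 17 U.S.C. § 512(c)(3) asks for, so check the recipient's own DMCA instructions for the exact language they require.

```python
from dataclasses import dataclass
from datetime import datetime

# Placeholder wording modeled on the statements 17 U.S.C. 512(c)(3) asks for;
# replace it with the exact language the recipient requests.
ACCURACY_STATEMENT = (
    "I have a good faith belief that use of the material in the manner "
    "complained of is not authorized by the copyright owner, its agent, or the law. "
    "The information in this notification is accurate, and under penalty of "
    "perjury, I am the copyright owner or authorized to act on the owner's behalf."
)


@dataclass
class Complaint:
    """One field per item in the eight-part complaint format."""
    to: str                              # 1. registrant, registrar, or host
    date: datetime                       # 2. date and time
    copyrighted_work: str                # 3. your article (title and source URL)
    offending_page: str                  # 4. URL of the copy
    search_query: str                    # 4. query that found the copy
    your_contact: str                    # 5. your contact information
    plagiarist_contact: str = "unknown"  # 6. registrant email, if any
    additional_info: str = ""            # 8. optional notes

    def render(self) -> str:
        """Return the complaint as plain text ready to paste into an email."""
        lines = [
            f"To: {self.to}",
            f"Date: {self.date:%B %d, %Y %H:%M}",
            f"Copyrighted work: {self.copyrighted_work}",
            f"Infringing page: {self.offending_page} "
            f'(found by searching "{self.search_query}")',
            f"My contact information: {self.your_contact}",
            f"Infringer's contact information: {self.plagiarist_contact}",
            ACCURACY_STATEMENT,  # 7. accuracy statement
        ]
        if self.additional_info:
            lines.append(f"Additional information: {self.additional_info}")
        return "\n".join(lines)
```

Because only the `to` field changes between the registrant, registrar, and host versions, the same object can be rendered three times as the complaint escalates.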

When you identify a plagiarist from another country, it might seem like an impossible task to get the content removed, but you might be surprised. Recently, I was able to get a web site blocked by a Korean registrar, co.cc. After identifying the registrar, I looked at their terms of service and here is what I found:

"You agree that you will not upload, distribute or reproduce on the Web Site:
a. any copyrighted material, trademarks, or other proprietary information without obtaining the prior written consent of the owner of such proprietary rights...."


After I submitted my complaint to the domain service registrar, co.cc, I got a response the next day:

"Dear Sir, In reponse to your request, we have suspended ..., the domain won't work with co.cc domain for now.

However, I would like to inform you that we are just a domain service registrar. For that reason, we do not have any authority over deleting original web site. It seems keep happening no matter how many times we block up this kinds of sites, abuser do not stop abuse co.cc domain.

Please let me know if you face this kind of issues in the any future, I will try to take prompt action.

Thank you. "


The response reflects, I think, the frustration registrars and hosts feel in dealing with the huge problem of plagiarism. In this case, even though the site did not get deleted, it is no longer visible on the Web. If the site still appears in Google's search results, a DMCA complaint to Google should take care of the problem. Jonathan Bailey has this to say about contacting registrars:

"...even though it can work, I tell people to avoid sending notices to registrars as almost none will actually revoke a domain over a copyright issue. They will only do it if there is an issue with the domain itself. Your interaction with co.cc was the exception, not the rule (for better or worse)."


5. Submit a DMCA Complaint to Google

If nothing else seems to work, you can fax a DMCA complaint directly to Google. Instructions for submitting a DMCA complaint can be found through the resource link at the end of this article. Google has both legal support and AdSense support, and each support group has its own fax number for DMCA complaints (legal: (650) 963-3255; AdSense: (650) 618-8507). For action against a blog on Google's Blogger service, you can file a DMCA complaint online; check the resources at the end of this article for a link.

If AdSense ads appear on the site alongside the plagiarized content, a DMCA complaint to Google AdSense support just might hit the offender in the pocketbook. AdSense revenue is often the primary reason plagiarists use your articles: your valuable content draws more traffic to the AdSense site.

A useful add-on for Firefox users is SeoQuake. When this add-on is activated, hovering over an AdSense ad brings up "AdsSpy," with a link to information about the plagiarist's AdSense ID (Figures 6 and 7). You can include the plagiarist's ID in your DMCA complaint.

Figure 6: AdsSpy

Figure 7: AdSense ID
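If you are not using SeoQuake, the publisher ID can usually be found in the page source, where AdSense ad code carries it in a value such as `ca-pub-` followed by 16 digits. This Python sketch assumes that ID format; `adsense_ids` is a hypothetical helper that lists each distinct ID found in HTML you have saved from the offending page.

```python
import re

# Assumed format: "pub-" followed by 16 digits, often prefixed with "ca-"
# in the ad code. Verify any match by reading the surrounding ad code.
PUB_ID = re.compile(r"(?:ca-)?(pub-\d{16})\b")


def adsense_ids(page_html: str) -> list:
    """Return distinct AdSense publisher IDs found in a page's HTML source."""
    seen = []
    for match in PUB_ID.finditer(page_html):
        pub_id = match.group(1)
        if pub_id not in seen:
            seen.append(pub_id)
    return seen
```

The IDs come back without the `ca-` prefix, in the order they first appear, which makes it easy to paste them into the DMCA complaint.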

Plagiarism, Plagerism, Plagirism, Plaigarism

You don't have to know how to spell "plagiarism" to join the fight against plagiarists. You can still detect plagiarism and join the war to remove it. After four years of college, and after writing this article, I can still misspell plagiarism with the best of 'em. My favorite way to misspell it is "plagerism."

Resources







 I hope you enjoyed this article,
Royce Tivel


rtivel@selectdigitals.com








