Duplicating your own content

I have spent many hours puzzling over this subject and decided to blog about it. Here is my theory regarding duplicate content. The main problem occurs when a fairly new site publishes a blog post or article, and someone finds the text and submits it to an authority site. Google then spiders the authority site and indexes the resubmitted duplicate before the original gets spidered.

Duplicate content

You can define duplicate content as several copies of the same text in different areas of the internet. The problem occurs when search engine spiders come across these copies and cross-reference them against the index. A search engine only wants to display one version of the same text in order to keep results relevant for its end users. Usually all other copies are added to the supplemental index or, worse still, not added to the index at all.

Doing the right thing

Primarily, search result providers want to do what’s right and give credit to the original author for the content they write.
 

Determining the original author of the content

Suppose we have 4 copies of the same article posted on different domains. How is Google going to determine which is the original copy?

Example 1

Here we have 4 copies of the same article. Date is the date when the content was first discovered/indexed, and PageRank is Google’s determination of the value of the page. The smaller pages pointing to each larger page represent incoming links from other websites.

  • Article 1 was indexed a couple of weeks after the other copies. As a search engine, you would probably decide that this is not the original because it wasn’t published first.
  • Articles 2 and 3 have the same good PageRank, were first indexed at about the same time, and have the same number of incoming links.
  • Article 4 was indexed slightly after articles 2 and 3, and it also has less PageRank and fewer links, so as a search engine you might conclude this is not the best copy to list either.

    Thinking like a search engine, you would conclude that either article 2 or article 3 is the original content. At this point Google probably has to guess which one to display, so there is roughly a 50/50 chance that the original author loses out.
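The reasoning in Example 1 can be sketched in a few lines of Python. This is only an illustration of the theory in this post, not how Google actually works. The dates, PageRank values, link counts, the `Copy` class and the `date_tolerance_days` parameter are all made up for the sketch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Copy:
    name: str
    first_indexed: date   # when the copy was first discovered/indexed
    pagerank: int         # hypothetical PageRank value
    inbound_links: int    # hypothetical number of incoming links


def likely_originals(copies, date_tolerance_days=3):
    """Return the copies that tie for 'most likely original'."""
    earliest = min(c.first_indexed for c in copies)
    # Step 1: drop copies indexed clearly later than the earliest one.
    early = [
        c for c in copies
        if (c.first_indexed - earliest).days <= date_tolerance_days
    ]
    # Step 2: among the early copies, keep the best (PageRank, links) pair.
    best = max((c.pagerank, c.inbound_links) for c in early)
    return [c for c in early if (c.pagerank, c.inbound_links) == best]


# Invented numbers that follow the shape of Example 1.
copies = [
    Copy("article 1", date(2007, 3, 20), 4, 3),  # indexed weeks later
    Copy("article 2", date(2007, 3, 1), 5, 4),
    Copy("article 3", date(2007, 3, 1), 5, 4),
    Copy("article 4", date(2007, 3, 3), 2, 1),   # slightly later, weaker
]

print([c.name for c in likely_originals(copies)])
# prints ['article 2', 'article 3']
```

Two candidates are left, which is exactly the tie described above: on these signals alone there is nothing to separate article 2 from article 3.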

    Enter the scraper site

    Now let’s use the same example as before, but this time we add a “credit to the author” link to article 4. Article 4 can be considered a low-value rip-off site, but it was kind enough to credit the author with a link to the original.

    Example 2

    Now there is a clear sign to search engine spiders that article 2 is the original.
     

    When there are numerous copies of the same content on the internet, it is difficult for search engines to determine which is the original copy. It is highly probable that part of the algorithm factors in the linking structure between duplicates to determine the original.
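A rough sketch of that idea in Python: count only the links that go from one duplicate to another, and treat the copy that the other duplicates credit as the original. The function name `pick_original` and the input format are assumptions made for this illustration, and the real algorithm is not public.

```python
from collections import Counter


def pick_original(outlinks):
    """Guess the original from links between duplicates.

    `outlinks` maps each duplicate to the set of pages it links to.
    Returns the most-credited duplicate, or None when there is no
    clear winner.
    """
    duplicates = set(outlinks)
    credits = Counter()
    for source, targets in outlinks.items():
        for target in targets:
            # Only links pointing from one duplicate to another count.
            if target in duplicates and target != source:
                credits[target] += 1
    if not credits:
        return None  # no cross-links, so the tie remains
    ranked = credits.most_common()
    top, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # two copies are credited equally often
    return top


# Example 2: article 4 credits article 2 with a link.
example_2 = {
    "article 1": set(),
    "article 2": set(),
    "article 3": set(),
    "article 4": {"article 2"},
}
print(pick_original(example_2))  # prints article 2
```

Without the credit link the function returns None, which matches the 50/50 guess from the first example.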

    Common sense supports this theory, and it seems like a logical assumption to make.

    Duplicating your own content

    This solution could be implemented if your content is being scraped. By duplicating your own content and crediting yourself, you can make sure that you are identified as the original when that happens.
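One simple way to credit yourself is to append a link back to the original to every copy you republish or syndicate. The helper below is a minimal sketch of that idea. The name `add_credit_link`, the wording of the credit line and the example URL are all placeholders, not part of any particular blogging platform.

```python
from html import escape


def add_credit_link(body_html, title, original_url):
    """Append a 'credit to the author' link to a copy of an article."""
    credit = '<p>Originally published as <a href="{url}">{title}</a>.</p>'.format(
        url=escape(original_url, quote=True),  # keep & and quotes safe in the attribute
        title=escape(title),
    )
    return body_html + "\n" + credit


copy_html = add_credit_link(
    "<p>Article text goes here.</p>",
    "Duplicating your own content",
    "https://example.com/duplicating-your-own-content",  # placeholder URL
)
print(copy_html)
```

If a scraper lifts the copy wholesale, the credit link travels with it and points back at your original.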

    RSS feeds are growing rapidly and this problem is going to grow alongside them. Search engines will need to get smarter too.
