Duplicating your own content
I have spent many hours lamenting over this subject and decided to blog about it. Here is my theory regarding duplicate content. The main problem occurs when a fairly new site publishes a blog post or article and someone finds the text and submits it to an authority site. Google then spiders the authority site and indexes the resubmitted duplicate before the original gets spidered.
Duplicate content
You can define duplicate content as several copies of the same text in different areas of the internet. The problem occurs when a search engine's spider bots come across this text and cross-reference it against the index. Search engines only want to display one version of the same text in order to keep results relevant for their end users. Usually all other copies are added to the supplemental index or, worse still, not added to the index at all.
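To make "several copies of the same text" a little more concrete, here is a minimal sketch of how two pages could be compared. It uses word shingles and Jaccard similarity, which is a technique I am picking purely for illustration; I am not claiming this is what any search engine actually runs. The function names and sample sentences are made up.

```python
import re

def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word n-grams ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)

# Made-up sample texts, not taken from any real page.
original = "Search engines only want to display one version of the same text."
copied = "Search engines only want to display one version of the same text!"
unrelated = "RSS feeds make it easy to republish an article written by somebody else."

print(similarity(original, copied))     # the copy differs only in punctuation
print(similarity(original, unrelated))  # no three-word phrase in common
```

A score close to 1.0 would mark two pages as copies of each other. Where exactly to draw the line is a judgment call that this sketch does not try to answer.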
Doing the right thing
Primarily, search result providers want to do what’s right and give credit to the original author for the content they write.
Determining the original author of the content
Consider that we have 4 copies of the same article posted on different domains. How is Google going to determine which is the original copy?

Here we have 4 copies of the same article. Date is the date when the content was first discovered/indexed. PageRank is Google's determination of the value of the page. The smaller pages pointing to each larger page represent incoming links from other websites.
Enter the scraper site
Now let's use the same example as before, but this time we will add a “credit to the author” link to article 4. Article 4 can be considered a low-value rip-off site, but it was kind enough to credit the author with a link to the original, article 2.

Now there is a clear sign to search engine spiders that article 2 is the original.
When there are numerous copies of the same content on the internet, it is difficult for search engines to determine which is the original copy. It is highly probable that part of the algorithm factors in the linking structure between duplicates to determine the original.
Common sense supports this theory, and it seems like a logical assumption to make.
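The theory can be sketched in a few lines of code. This is only my guess at the idea, not Google's algorithm: each copy earns a credit for every other copy that links to it, and ties fall back to the earliest indexing date. The Copy class, the guess_original function, the URLs and the dates are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Copy:
    url: str            # hypothetical address of this copy
    first_indexed: str  # ISO date the copy was first discovered (made up)
    links_to: set[str] = field(default_factory=set)  # URLs this page links out to

def guess_original(copies: list[Copy]) -> Copy:
    """Pick the copy most often linked to by the other copies.

    Ties fall back to the earliest first-indexed date.
    """
    urls = {c.url for c in copies}
    credits = {c.url: 0 for c in copies}
    for c in copies:
        # Only links that point at another copy of the same text count.
        for target in c.links_to & urls:
            if target != c.url:
                credits[target] += 1
    return min(copies, key=lambda c: (-credits[c.url], c.first_indexed))

# The four-article example: article 4 credits article 2 with a link.
copies = [
    Copy("https://authority.example/article-1", "2008-01-02"),
    Copy("https://newblog.example/article-2", "2008-01-05"),
    Copy("https://mirror.example/article-3", "2008-01-07"),
    Copy("https://scraper.example/article-4", "2008-01-09",
         links_to={"https://newblog.example/article-2"}),
]
print(guess_original(copies).url)  # https://newblog.example/article-2
```

Take the credit link away from article 4 and the function falls back to the earliest-indexed copy, article 1. That is exactly the situation where an authority site that got spidered first would be taken for the original.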
Duplicating your own content
This solution could be implemented if your content is being scraped. By actually duplicating your own content and crediting yourself with a link back to the original, you can help ensure that you are identified as the original if that happens.
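As a rough sketch of what crediting yourself could look like in practice, the helper below builds the copy you would post elsewhere: the article body followed by a credit line that links back to the original. The function name, the wording of the credit line and the URL are all made up for illustration.

```python
from html import escape

def build_credited_copy(title: str, body_html: str, original_url: str) -> str:
    """Return body_html followed by a credit line linking to original_url."""
    link = '<a href="{}">{}</a>'.format(
        escape(original_url, quote=True),  # keep the attribute value well-formed
        escape(title),
    )
    return "{}\n<p>Originally published at {}.</p>".format(body_html, link)

# Hypothetical article body and address of the original post.
copy_html = build_credited_copy(
    "Duplicating your own content",
    "<p>I have spent many hours lamenting over this subject...</p>",
    "https://newblog.example/duplicating-your-own-content",
)
print(copy_html)
```

The body is passed through unchanged, so it is assumed to be HTML you already trust. Only the title and the URL are escaped.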
RSS feeds are growing rapidly, and this problem is going to grow alongside them. Search engines will need to get smarter too.