Duplicate content is exactly what the term suggests: content of which a copy exists somewhere else on the web. The term applies to all kinds of content, including text, video and images, but in the context of SEO it mainly refers to text. Any page, article or even snippet of text can have a duplicate somewhere on the web. Duplicate content has three main causes:
- One page in a website can be accessed via multiple URLs
The most common example is a home page that can be reached via www.example.com, example.com, example.com/index.html and www.example.com/index.html. This can be prevented by using 301 (permanent) redirects.
- The same or very similar content is shown in different ways on different URLs in one website
The most common example is a list of items that can be sorted in multiple ways, with every sort order displayed on its own URL.
- Copies of your content exist on other websites
The most common cause is other webmasters hijacking your content. There are many lazy webmasters out there who would rather steal content from others than write their own. Not all copied content is stolen, though. YouTube videos, for instance, are offered to everyone to embed on their own websites, and RSS feeds are a way for authors to syndicate their content.
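For the first two causes, it can help to see how several URLs collapse into one preferred address. The snippet below is only a rough sketch: the preferred host, the index file name and the `sort`/`order` query parameters are assumptions made for illustration, and a real site would need its own rules.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site settings, chosen only for this example.
PREFERRED_HOST = "www.example.com"
BARE_HOST = "example.com"
INDEX_FILE = "index.html"
SORT_PARAMS = {"sort", "order"}  # parameters that only change presentation


def canonical_url(url):
    """Map the different URLs of one page to a single preferred URL."""
    if "//" not in url:
        url = "//" + url  # let urlsplit recognise the host in "example.com/..."
    parts = urlsplit(url, scheme="http")

    # example.com and www.example.com should end up on the same host.
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST

    # "/index.html" and "/" are the same page.
    path = parts.path or "/"
    if path.endswith("/" + INDEX_FILE):
        path = path[: -len(INDEX_FILE)]

    # Drop parameters that only re-sort the same list of items.
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in SORT_PARAMS]
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))
```

With these assumed settings, all four home page variants mentioned above map to the same URL, and a sorted list view maps to the URL of the unsorted list.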
To find out whether there are copies of your content out there, you could use a tool like the one offered at copyscape.com.
On-site issues with duplicate content can easily be found with free tools like the one available from virante.com.
To see how similar two pages are, you could use the tool provided at duplicatecontent.net.
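To get a feeling for what "similar" can mean, here is a minimal sketch of one common way to compare two texts: split each into overlapping groups of words (shingles) and measure the overlap between the two sets (Jaccard similarity). This is not a description of how any of the tools above work; the function names and the shingle size of three words are assumptions for illustration.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are treated as identical
    return len(sa & sb) / len(sa | sb)
```

A score of 1.0 means the two texts contain exactly the same word sequences, and 0.0 means they share none. Where to draw the line for "duplicate" is a judgment call.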
Duplicate content issues can be fixed by making only one version of the content indexable by spiders. This can be accomplished by:
- Deleting all duplicate pages and making sure that you fix all links that might break. (This can be a lot of work.)
- Using robots.txt and the nofollow and noindex robots meta tags to keep spiders from crawling or indexing duplicates.
- Using 301 redirects to redirect all traffic from the duplicate content to the original content.
- Using the canonical tag (a link element with rel="canonical") to tell search engines which URL is the original. This can also be very useful to prevent penalties from Google for duplicate content.
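As a rough illustration of the last two options, the sketch below decides what to send back for a requested URL: a 301 redirect to the original, or, when the duplicate has to stay online, a canonical link element for the page's head. The function names and the `keep_duplicate_online` flag are hypothetical, and wiring this into a real web server or CMS is left out.

```python
from html import escape

# Robots meta tag for the head of a page that should not be indexed.
NOINDEX_META = '<meta name="robots" content="noindex">'


def canonical_link(original):
    """Build the link element that points spiders to the original URL."""
    return '<link rel="canonical" href="%s">' % escape(original, quote=True)


def respond(requested, original, keep_duplicate_online=False):
    """Return (status, headers, extra_head_html) for a requested URL."""
    if requested == original:
        return "200 OK", [], ""
    if keep_duplicate_online:
        # Serve the duplicate, but name the original in its head.
        return "200 OK", [], canonical_link(original)
    # Otherwise send visitors and spiders to the original for good.
    return "301 Moved Permanently", [("Location", original)], ""
```

The redirect is the stronger choice because the duplicate URL stops serving content altogether. The canonical tag fits cases such as sorted lists, where visitors still need the alternative URL.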
See what more Matt Cutts has to say about duplicate content and the canonical tag on mattcutts.com. As you can see there too, the people at Google make a lot of fuss about duplicate content. Whether you can be penalized for having duplicate content on your site is not clear, but I think Google cannot do that. As all big news organizations like BBC and CNN republish content provided by agencies like Reuters and Associated Press, it seems very unlikely that Google can punish you for republishing content. For further reading, see the post “SEO: There is no duplicate content penalty” on practicalecommerce.com.
Further reading: