How to Recover from a Duplicate Content Penalty

With Google’s Panda on the prowl, it’s always possible that your site will wake up one day penalized by the dreaded algorithm. A high search ranking is lost overnight. Panic spreads at the office as you search for what went wrong. Then the words strike you: duplicate content. You don’t know why, but nothing else seems to fit. Content that was fine before is penalized now, and you have no choice. You have to do something before irreparable damage is done.

Identifying Duplicate Content Penalties

Before you can recover, you first need to know that it’s duplicate content that’s penalizing your site. To do this, go to your Google Analytics profile. From there, find the Google Traffic section to see the traffic coming from the search engines. Dig down to the Organic primary dimension and click Google as a source. Do you see a sudden drop? If it’s on a weekend, you might not need to worry; organic traffic drops regularly on weekends. Scroll back to check whether the drop is in keeping with trends over time. If it’s not, you may have a penalty on your hands.
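If you export your daily organic sessions, the weekend check described above can be roughed out in a few lines of Python. This is only a sketch: the function name, the four-week lookback, the 50% threshold and the sample numbers are all assumptions for illustration, not anything Google Analytics provides.

```python
from datetime import date, timedelta
from statistics import mean


def is_unusual_drop(daily_sessions, day, weeks_back=4, threshold=0.5):
    """Return True if sessions on `day` fall below `threshold` times the
    average for the same weekday over the previous `weeks_back` weeks."""
    baseline = [
        daily_sessions[day - timedelta(weeks=w)]
        for w in range(1, weeks_back + 1)
        if (day - timedelta(weeks=w)) in daily_sessions
    ]
    if not baseline:
        return False  # not enough history to judge
    return daily_sessions[day] < threshold * mean(baseline)


# Hypothetical data: 1000 sessions on weekdays, 600 on weekends.
start = date(2024, 1, 1)  # a Monday
sessions = {}
for i in range(35):
    d = start + timedelta(days=i)
    sessions[d] = 600 if d.weekday() >= 5 else 1000

# Simulate a sudden drop on the fifth Monday.
sessions[date(2024, 1, 29)] = 300

print(is_unusual_drop(sessions, date(2024, 2, 3)))   # an ordinary Saturday
print(is_unusual_drop(sessions, date(2024, 1, 29)))  # the Monday with the drop
```

Comparing each day against the same weekday in earlier weeks keeps the normal weekend dip from looking like a penalty.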

Thankfully, you can use Google Webmaster Tools to discover whether you’re penalized for duplicate content as well. In Webmaster Tools, find the HTML Improvements section. There you’ll see some indicators of duplicate content. Check the title tag section to see if there are large numbers of duplicated title tags. You may not have multiple live pages with the same title tag, but if you’re running an eCommerce site, dynamic generation of that tag can cause issues.
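If you have your own crawl of the site, you can run a similar duplicate-title check yourself. The sketch below assumes you already have a list of (URL, title) pairs; the function name and the sample URLs are hypothetical.

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Group URLs by title text and keep only titles used by more than one URL."""
    by_title = defaultdict(list)
    for url, title in pages:
        # Normalize whitespace and case so trivial differences don't hide duplicates.
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Hypothetical crawl output.
pages = [
    ("https://shop.example.com/products/blue-widget", "Blue Widget | Example Shop"),
    ("https://shop.example.com/products/blue-widget?sessionid=abc", "Blue Widget | Example Shop"),
    ("https://shop.example.com/about", "About Us | Example Shop"),
]

for title, urls in find_duplicate_titles(pages).items():
    print(title, urls)
```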

This happens because, on certain eCommerce platforms, users can reach the same product by different routes. The platform generates a URL unique to the session when the user performs a search or navigates through category menus. All of that unique information is stored in the URL, which Google may then index. The result is a number of different dynamically generated URLs for the same product page. Different URL, same content, duplication warning.

A similar issue comes from search results pages on large eCommerce sites. The search results page may show 5 or 10 products on a given page, but 90% of the content outside of those products is the same. Google reads this as duplicate content, particularly when you have hundreds of pages of similar design.

On a side note, not all penalties are automatic applications of the algorithm. It’s possible that you may have received a manual penalty. In the Webmaster Tools dashboard, under Search Traffic, there is a manual actions button. Click it and see if you have received a message about manual sanctions applied to your site. This applies most often to a bad link profile rather than duplicate content, but it might still apply.

Fixing Duplicate Content Issues

Google Panda is trying to provide the best possible results to searchers. This means anything with too little content, content that’s valueless or content that’s duplicated can be the target of a penalty. So how do you deal with each of these issues?

NOINDEX. Once you have identified the content that is probably holding your site back, you have a few options. The first is to apply a NOINDEX tag to that content. This tag goes in the head section of your page’s HTML and tells search engine crawlers not to index the content at all. This is useful for duplicate pages that cannot be removed but are not worth serving up in search results. You are essentially telling the search engine that you need the page, but you don’t want it shown to searchers. What sort of pages should you NOINDEX?

• Regular email promotions. If you host HTML versions of emails, they often look very similar from week to week. If you want to keep old promotions up for reference or archival purposes, you need to NOINDEX them.
• Pages that are near but not total duplicates of other content you need on your site.

You should keep a FOLLOW directive alongside your NOINDEX; this allows search engines to follow any links on the page, even though they aren’t indexing the page itself. Otherwise you risk cutting off some content and hiding it from search engines behind a wall of NOINDEXed pages.
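For reference, the NOINDEX and FOLLOW combination is a single robots meta tag in the page head. The small helper below builds that tag as a string; the helper itself is hypothetical, and how you get the tag into your templates depends on your platform.

```python
def robots_meta(index=False, follow=True):
    """Build a robots meta tag; the defaults give the NOINDEX, FOLLOW combination."""
    directives = [
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ]
    return '<meta name="robots" content="{}">'.format(", ".join(directives))


# Place the resulting tag inside the <head> of the page you want kept out of the index.
print(robots_meta())
```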

Rel="canonical". This is a more elegant solution than NOINDEX for most eCommerce duplicate content. Take again the example of the dynamically generated URLs. No matter what path the user takes to reach a given page, the page they land on is always the same. The difference in URL is the reason for the duplicate penalty. The correct solution in this instance is the rel="canonical" tag. This tag, when coupled with a URL, tells the search engine that this page is an intentional duplicate of the linked page. Every product page, for example, would have a rel="canonical" tag pointing to the primary static URL for that product. Any dynamically generated URL for that product will be flagged as non-canonical, and the search engine will treat it as the canonical URL instead. Instead of a hundred URLs all pointing to the same page, you now have one.
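As a rough illustration, here is how a canonical tag could be derived from a dynamically generated URL. The sketch assumes that the path alone identifies the product and that everything in the query string is session or navigation data; that will not be true on every platform, and the function names and example URL are made up.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Drop the query string and fragment.

    Assumption: the path alone identifies the product page.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(url):
    """Build the <link rel="canonical"> tag for the page head."""
    href = escape(canonical_url(url), quote=True)
    return '<link rel="canonical" href="{}">'.format(href)


dynamic = "https://shop.example.com/products/blue-widget?sessionid=abc123&ref=search#reviews"
print(canonical_link_tag(dynamic))
```

Every dynamically generated variant of the product URL produces the same tag, which is the whole point: all of them name one primary address.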

You need to make sure, however, that you never tell Google that the canonical page is NOINDEX. If you do, you’re telling Google that all of these duplicate pages are actually a page that it can’t index. This essentially removes the page entirely from search results, which you don’t want.
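A quick audit for this mistake might look like the sketch below. It assumes you have the HTML of each page keyed by URL, and that canonical URLs are written exactly as the keys are; the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class HeadTagParser(HTMLParser):
    """Collect the canonical URL and robots directives from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots.update(d.strip().lower() for d in content.split(","))


def canonical_conflicts(pages):
    """Return (page, canonical) pairs whose canonical target is NOINDEXed.

    `pages` maps URL -> HTML source.
    """
    parsed = {}
    for url, html in pages.items():
        parser = HeadTagParser()
        parser.feed(html)
        parsed[url] = parser

    conflicts = []
    for url, parser in parsed.items():
        target = parsed.get(parser.canonical)
        if target is not None and "noindex" in target.robots:
            conflicts.append((url, parser.canonical))
    return conflicts


# Hypothetical pages: the canonical target carries a NOINDEX tag by mistake.
pages = {
    "https://shop.example.com/p/1": '<head><meta name="robots" content="noindex, follow"></head>',
    "https://shop.example.com/p/1?s=9": '<head><link rel="canonical" href="https://shop.example.com/p/1"></head>',
}
print(canonical_conflicts(pages))
```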

Rel="next", "prev". These are special tags for the search results issue mentioned above, where only a small percentage of the page changes from one results page to the next. You apply the rel="next" and rel="prev" tags to the page number buttons at the bottom of the search results. This tells Google that the pages before and after the current one are actually segments of the same page. Essentially, it takes your 20-page search results and treats them as though they were one long page with every result showing. No more duplicate content issues; even if they are separate pages on the live site, in the index they look like one page. This also helps combine the SEO effects of every individual page into one superpage.
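To make the idea concrete, the sketch below generates the prev and next markup for one page of a paginated result set. It writes them as link elements for the page head, which is one common placement; the same rel values can also go on the pagination links themselves. The ?page=N URL scheme and the function name are assumptions.

```python
def pagination_links(base_url, page, total_pages):
    """Build rel="prev" / rel="next" link tags for one page of paginated results.

    Assumption: pages are addressed as base_url?page=N, numbered from 1.
    """
    links = []
    if page > 1:
        links.append('<link rel="prev" href="{}?page={}">'.format(base_url, page - 1))
    if page < total_pages:
        links.append('<link rel="next" href="{}?page={}">'.format(base_url, page + 1))
    return links


# Page 5 of a 20-page search result gets both tags; the first and last pages get one each.
for tag in pagination_links("https://shop.example.com/search", 5, 20):
    print(tag)
```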

Addressing Thin Content

Sometimes your problem is not duplicate content but thin content. Thin content means pages on your site with fewer than, say, 300 words. With so few words, Google decides that your page can’t have much of value to show a user. The 300-word count is just an estimate; Google can let a 100-word page pass as valuable. It’s a judgment you need to make when reviewing your site.
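A word count will not make the judgment for you, but it can produce a shortlist of pages to review. The sketch below counts visible words in each page’s HTML and flags anything under a threshold. The 300-word default mirrors the rough estimate above and is not a rule Google publishes; the names and sample pages are hypothetical.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def word_count(html):
    """Count words in the visible text of an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(re.findall(r"\w+", " ".join(parser.chunks)))


def thin_pages(pages, min_words=300):
    """Return URLs whose visible word count is below `min_words`."""
    return [url for url, html in pages.items() if word_count(html) < min_words]


# Hypothetical pages keyed by URL.
pages = {
    "https://shop.example.com/guide": "<p>" + "word " * 400 + "</p>",
    "https://shop.example.com/products/blue-widget": "<p>Blue widget, now in stock.</p>",
}
print(thin_pages(pages))
```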

You have three options for thin content. You can remove the page entirely, you can hide it with NOINDEX or you can add more content to make it valuable. Adding content is the best option, of course, as long as it’s a page you can expand. Product descriptions often fall into this category. Fortunately, spicing up your product descriptions is not a difficult task.

Once you’ve dealt with your thin and duplicate content, you can tell Google you’re ready for a review and the search engine will check out your site. If the issues it saw have been fixed, your ranking should recover.