The internet is teeming with content, and of course we’re aiming to keep ours as original as possible. However, content is bound to become duplicated to some degree eventually.
The result could be anything from keyword cannibalization to Google not indexing whole swaths of your website, likely overlooking the specific page you want to rank.
In short, it’s something you want to keep on top of.
Let’s start with what Google defines as duplicate content, which is essentially “substantive blocks of content within or across domains that either completely match other content or are appreciably similar.”
Put another way: you can mention a topic on multiple pages, but you should avoid copying and pasting the same content across those pages.
In terms of detecting duplicate content:
- Google Search Console provides great insight through its HTML Improvements and Coverage reports, which can show you duplicate title tags and descriptions, as well as pages Google has excluded from the index.
- Online tools such as SEMRush (the only one I have experience with in this context) can scan your content and tell you what percentage of it is duplicated.
While researching this post, I also came across Siteliner and Copyscape, which are apparently good at this too.
- Manually - this is relatively easy if you only have a handful of pages. Type “site:yourdomain.com” into Google followed by a small snippet of your text in quotation marks; the results will show you which pages overlap.
You can also use this process to spot keyword cannibalization: search for the keyword in question, in quotation marks, along with “site:yourdomain.com”.
The results will also show you where each page ranks for the keyword in question.
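If you want to go a step beyond eyeballing search results, the same “how similar are these two pages?” question can be roughed out in a few lines of code. This is just an illustrative sketch, not one of the tools above: it compares two blocks of page copy (hypothetical example text) with Python’s standard-library `difflib` and reports a similarity ratio.

```python
from difflib import SequenceMatcher

def duplicate_ratio(text_a: str, text_b: str) -> float:
    """Rough similarity between two blocks of page copy (0.0 to 1.0)."""
    # Normalize case and whitespace so formatting differences
    # don't inflate (or deflate) the score.
    a = " ".join(text_a.lower().split())
    b = " ".join(text_b.lower().split())
    return SequenceMatcher(None, a, b).ratio()

# Hypothetical copy from two product pages on the same site.
page_one = "Our widgets are hand-made from recycled steel and ship worldwide."
page_two = "Our widgets are hand-made from recycled steel and ship to Canada."
print(f"{duplicate_ratio(page_one, page_two):.0%} similar")
```

A score near 100% on two live URLs is a strong hint that one of them should be rewritten, consolidated, or noindexed.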
Okay, now how do we fix duplicate content once we’ve found it?
- Rewrite the content as best you can: include reviews, unique product information, really anything that makes the content unique.
- Consolidate the content onto the best-ranking page, and 301 redirect the old pages to that “best” page.
If you have a lot of blog posts that require this process, it can take a long time to do properly.
- If all else fails, add a noindex meta tag to the HTML head of the page to tell Google you don’t want that page indexed.
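If you go the consolidation route, the 301 itself is usually a one-liner. Here is a minimal sketch for an Apache `.htaccess` file, assuming hypothetical paths `/old-duplicate-page/` and `/best-page/` (your server or CMS may use a different syntax, but the important part is the 301 “permanent” status):

```apache
# Permanently redirect the duplicate page to the consolidated "best" page.
Redirect 301 /old-duplicate-page/ /best-page/
```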
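And for the noindex option, the tag lives in the `<head>` of the page you want excluded. A minimal example:

```html
<head>
  <!-- Tell search engines not to include this page in their index. -->
  <meta name="robots" content="noindex">
</head>
```

Note that the page must remain crawlable (not blocked by robots.txt) for Google to see the tag at all.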
Unfortunately, I don’t have a method for preventing duplicate content beyond regular check-ups in Google Search Console, which is my preferred tool.
With that being said, what’s your go-to method? Do you have a process that prevents duplicate content from happening on your site? If so, I’d love to know.