Advanced SEO meta tags: canonical, noindex, nofollow & robots.txt
Every week, we get questions about which meta tags a site should be using to achieve X result. In many cases, people are trying to accomplish relatively simple tasks with overly complicated methods. When it comes to SEO, the best message you can send to search engines is the simplest one.
Here's a simple breakdown of the intended purpose of each tag, including examples of when and how they should be used.
Possibly the most misunderstood of meta tags, rel="canonical" seems to pop up all over the place—and rarely where it should!
Intended Purpose: To tell search engines the original source of content, when two or more copies of that content exist.
Best Use: To give credit to the original source of content, when you're republishing that content on another page of the same site or on a different site.
Common Misuse: Implementing canonical tags on pages that look similar, but aren't actually duplicate content, such as paginated listings pages or tabbed listing details content. rel="canonical" should only be used when content is essentially identical.
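To make this concrete, here is a minimal sketch of what a canonical tag looks like in a page's head, plus a small Python check that reads it back out. The page markup, the example.com URL and the helper names (CanonicalFinder, find_canonicals) are our own placeholders for illustration, not part of any SEO tool.

```python
from html.parser import HTMLParser

# Hypothetical republished page; the URL is a placeholder for illustration.
PAGE = """
<html><head>
<title>Republished article</title>
<link rel="canonical" href="https://example.com/original-article/">
</head><body><p>Same content as the original.</p></body></html>
"""


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel can hold several space-separated tokens, so split before checking.
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and attrs.get("href"):
            self.canonicals.append(attrs["href"])


def find_canonicals(html):
    """Return the list of canonical URLs declared in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


print(find_canonicals(PAGE))  # ['https://example.com/original-article/']
```

A page should normally declare at most one canonical URL, so a result with more than one entry (or one pointing at a page that isn't essentially identical) is worth a second look.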
The noindex tag is actually a message to bots, asking them not to include the page in the index. Note that noindexed pages will still be crawled, because a bot has to fetch the page before it can read the tag.
Intended Purpose: To ask search engines to exclude a specific page (or set of pages) from the search engine results.
Best Use: To eliminate pages that you don't want included in search engine results, such as a promotion that has expired. Noindex can also be used to eliminate your low-performing or "too similar but competing" pages from SERPs. Note: doing this incorrectly can be catastrophic for a site; always consult a trusted SEO expert first.
Common Misuse: Using noindex to hide private documents. Google respects noindex requests, but not all search engine robots are as honourable, and a noindexed page is still accessible to anyone who has the URL.
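As a rough illustration, the noindex request is usually written as a robots meta tag in the page's head. The sketch below shows that tag on a made-up expired promotion page and uses Python's standard HTML parser to confirm it is present. The page content and the helper names (RobotsMetaParser, robots_directives, asks_for_noindex) are assumptions for the example.

```python
from html.parser import HTMLParser

# Hypothetical expired promotion page, used only for illustration.
EXPIRED_PROMO = """
<html><head>
<title>Spring sale (ended)</title>
<meta name="robots" content="noindex">
</head><body><p>This promotion has expired.</p></body></html>
"""


class RobotsMetaParser(HTMLParser):
    """Gathers directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def asks_for_noindex(html):
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)


print(asks_for_noindex(EXPIRED_PROMO))  # True
```

A check like this only confirms that the request is on the page. It says nothing about whether a given bot will honour it, which is why noindex is the wrong tool for private documents.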
Robots.txt isn't actually a meta tag, but it comes up a lot when discussing them! A robots.txt file asks Google not to crawl specific pages (or sections of the site), but it does not stop those pages from being indexed if search engines discover their URLs some other way, such as through links.
Intended Purpose: To ensure search engine bots aren't wasting time by crawling pages of low importance.
Best Use: Don't. Most sites don't need one. If you want a robots.txt file, use it to ask crawlers not to spend their dedicated time for your site on pages or sections that are of low importance.
Common Misuse: Using robots.txt instead of noindex to eliminate results from the SERPs. Also, using robots.txt to hide private documents—search engines often still include the URLs of these pages in SERPs.
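For readers who haven't seen one, the sketch below shows a small robots.txt and uses Python's built-in urllib.robotparser to test which URLs it asks crawlers to skip. The file contents, the /search/ and /cart/ paths and the example.com URLs are invented for illustration; a real file would list your own low-importance sections.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /cart/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_rules(ROBOTS_TXT)

# can_fetch answers "may this bot crawl the URL?", not "will it be indexed?".
print(rules.can_fetch("Googlebot", "https://example.com/search/results"))  # False
print(rules.can_fetch("Googlebot", "https://example.com/products/shoes"))  # True
```

Note what the check cannot tell you: a disallowed URL can still show up in search results, and because bots never fetch the page, they never see a noindex tag placed on it either.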
The nofollow tag asks search engines not to "follow" a link, which means not crawling through to the linked content and not passing link juice or anchor text information to it. Historically, this meant the link juice would stay on your page, but updates in more recent years mean this link equity dissipates instead.
Intended Purpose: To ask search engines not to follow a specific link, or all links on a given page.
Best Use: To tag paid ads or links to content that you don't have control over. It can also be used to ask search engines not to follow links to low priority pages on the site, such as dashboard login pages or paginated pages.
Common Misuse: Using nofollow to heavily sculpt internal links instead of creating a logical navigation structure for a site.
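Here is a small sketch of the link-level form, rel="nofollow" on an individual anchor, together with a Python check that sorts a page's links into followed and nofollowed. The page fragment, the URLs and the helper names (LinkAuditor, audit_links) are placeholders we made up for the example. The page-level form, a robots meta tag with content="nofollow", is not covered here.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the URLs are placeholders for illustration.
PAGE = """
<p>Sponsored: <a href="https://advertiser.example/offer" rel="nofollow">Partner offer</a></p>
<p>Read our <a href="/guides/seo-basics/">SEO basics guide</a>.</p>
"""


class LinkAuditor(HTMLParser):
    """Sorts <a href> links by whether they carry rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may contain several tokens, e.g. "nofollow noopener".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def audit_links(html):
    """Return (followed, nofollowed) lists of hrefs found in the HTML."""
    auditor = LinkAuditor()
    auditor.feed(html)
    return auditor.followed, auditor.nofollowed


followed, nofollowed = audit_links(PAGE)
print(followed)    # ['/guides/seo-basics/']
print(nofollowed)  # ['https://advertiser.example/offer']
```

If an audit like this turns up a long list of nofollowed internal links, that is usually a sign of link sculpting, and the navigation structure is the better thing to fix.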
Don't make things harder for yourself...
Google has given SEOs and webmasters an arsenal of tags, which essentially serve as preference instructions for search engines. But too many people misunderstand these tags, and try to use advanced methods to accomplish tasks that are actually very basic.
If you develop a site that is well organized and logical, there should be very little need for advanced meta tags. The use of canonical, noindex, nofollow and robots.txt is best reserved for the rare exception, and shouldn't be part of your ongoing SEO strategy.