Whilst robots.txt allows you to manage the accessibility of your site’s overall content to crawlers, it doesn’t control whether pages are indexed. That’s where meta robots tags come in: they keep unwanted webpages out of the index.
However, as there’s a common misconception that robots.txt can control indexation, many SEOs are a little unsure how to use meta robots tags correctly – here’s what you should know.
What is the meta robots tag?
Quite simply, a meta robots tag tells search engine crawlers how to treat the page it sits on: whether to index it, whether to follow its links and how its snippet is served in SERPs.
Meta robots tags are pieces of HTML code that sit in the <head> section of a page, and they look a little something like this:
<meta name="robots" content="noindex, nofollow">
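To check which meta robots tags a page actually carries, you can pull them out of its HTML. The sketch below uses Python’s standard-library `html.parser`; the class name `MetaRobotsExtractor`, the set of tag names it looks for and the sample page are illustrative assumptions, not part of any SEO tool. For simplicity it scans the whole document rather than only the <head> section.

```python
from html.parser import HTMLParser


class MetaRobotsExtractor(HTMLParser):
    """Collect (name, content) pairs from <meta> tags that carry robots directives."""

    # Hypothetical set of meta names treated as robots-style tags.
    ROBOTS_NAMES = {"robots", "googlebot", "bingbot"}

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").strip().lower()
        content = attributes.get("content") or ""
        if name in self.ROBOTS_NAMES:
            self.tags.append((name, content))


def extract_meta_robots(html):
    """Return every robots-style meta tag found in the given HTML string."""
    parser = MetaRobotsExtractor()
    parser.feed(html)
    parser.close()
    return parser.tags


# Hypothetical page: an internal search results page kept out of the index.
page = """<!DOCTYPE html>
<html>
  <head>
    <title>Internal search results</title>
    <meta name="robots" content="noindex, follow">
  </head>
  <body>...</body>
</html>"""

print(extract_meta_robots(page))
```

Self-closing tags such as `<meta ... />` are handled too, because `HTMLParser` routes them through `handle_starttag` by default.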
Why are meta robots tags important for SEO?
There are several SEO benefits to using a meta robots tag. Firstly, the directives give you more control over how search engines index your content and treat its links. Without these instructions, search engines will attempt to crawl and index everything they can find on your site.
There are likely to be pages on your site that you definitely don’t want search engines indexing, such as pages with thin content, internal search results and PPC landing pages – and this is especially true of bigger sites, where you’ll need to take extra care to stay on top of crawlability and indexation.
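Meta robots tags are page-specific, telling search engines how to deal with the individual page they sit on. That’s why it’s so important for your site’s SEO that page-level directives work together with your sitemap and robots.txt file. For example, if robots.txt blocks a page, crawlers never fetch it and so never see its noindex tag, which means the URL can still end up in the index.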
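One way to catch that robots.txt conflict is to check whether a crawler is even allowed to fetch a page before relying on its meta robots tag. This is a minimal sketch using Python’s `urllib.robotparser`; the robots.txt rules, URLs and the helper name `crawler_can_see_meta_robots` are hypothetical. Note that Python’s parser does not implement every matching extension Google supports, such as `*` wildcards inside paths, so treat it as a rough check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the internal search results directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""


def crawler_can_see_meta_robots(url, user_agent="Googlebot"):
    """Return True if robots.txt lets the crawler fetch the page (and so read its meta tags)."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ("https://example.com/search/?q=shoes", "https://example.com/blog/post"):
    # A False result means any noindex tag on that page will never be read.
    print(url, crawler_can_see_meta_robots(url))
```

If a page you want deindexed comes back False here, the usual fix is to allow crawling so the noindex tag can be seen.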
Meta robots tag attributes and directives
Meta robots tags comprise two attributes. The first is the name attribute, which specifies which crawlers (also known as user-agents, or UAs) need to follow the instructions.
Usually, you’ll want all crawlers to follow the instructions in the meta robots tag, so the name attribute will be ‘robots’:
<meta name="robots" content="noindex">
Although most of the time you’ll want to use ‘robots’ as the default, you can also use as many meta robots tags in the <head> section as you need. So, if you wanted to instruct Google and Bing’s UAs specifically, it would look something like this:
<meta name="googlebot" content="noindex">
<meta name="bingbot" content="noindex">
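When a page carries both a generic ‘robots’ tag and crawler-specific tags, it helps to work out what each crawler ends up reading. The sketch below rests on a simplifying assumption: a crawler obeys tags named ‘robots’ plus tags named after its own user-agent token, and the directives are simply combined. The function name `directives_for` and the sample tags are hypothetical.

```python
def directives_for(user_agent, meta_tags):
    """Collect the directives a given crawler reads from a page's meta tags.

    Assumption: a crawler obeys tags named "robots" plus tags named after its
    own user-agent token (e.g. "googlebot"), and the directives are combined.
    """
    token = user_agent.lower()
    directives = []
    for name, content in meta_tags:
        if name.lower() in ("robots", token):
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive and directive not in directives:
                    directives.append(directive)
    return directives


# Hypothetical tags from a page that targets Google and Bing separately.
tags = [
    ("robots", "noarchive"),
    ("googlebot", "noindex"),
    ("bingbot", "noindex, nofollow"),
]

print(directives_for("googlebot", tags))
print(directives_for("bingbot", tags))
```

A crawler with no tag of its own, such as a hypothetical `otherbot`, would only pick up the generic ‘robots’ directives.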
The second attribute in meta robots tags is the content attribute, which holds the directives telling user-agents how to index the page and treat its links. Without a meta robots tag, the default is for the page to be indexed and all of its links to be followed.
So, what directives can you use in the content attribute? Here are a few of the most commonly used:
- index – include the specified page in the index,
- noindex – don’t index the page or show it in the SERPs,
- follow – follow the links on the page,
- nofollow – don’t follow the links on the page,
- none – this is a shortcut that specifies noindex, nofollow,
- all – a shortcut that specifies index, follow,
- noimageindex – don’t index the on-page images,
- noarchive – don’t show a cached version of the specified page on the SERPs,
- notranslate – stops Google from offering a translated version of the page in SERPs,
- nosnippet – prevents text and video snippets from showing within the SERPs.
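Two of the directives above, none and all, are shortcuts for a pair of other directives. A minimal sketch of splitting a content attribute into individual directives and expanding those shortcuts might look like this in Python; the names `SHORTCUTS` and `parse_directives` are hypothetical, and directive names are treated as case-insensitive.

```python
# Shortcut directives from the list above, expanded into their components.
SHORTCUTS = {
    "none": ["noindex", "nofollow"],
    "all": ["index", "follow"],
}


def parse_directives(content):
    """Split a content attribute into lowercase directives, expanding shortcuts."""
    directives = []
    for raw in content.split(","):
        directive = raw.strip().lower()
        if not directive:
            continue
        # Non-shortcut directives expand to themselves.
        for expanded in SHORTCUTS.get(directive, [directive]):
            if expanded not in directives:
                directives.append(expanded)
    return directives


print(parse_directives("none"))
print(parse_directives("NoImageIndex, nosnippet"))
```

Duplicates are dropped, so "noindex, none" collapses to noindex and nofollow.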
Remember that you can combine multiple directives in one tag by separating them with commas, but avoid conflicting directives such as “follow, nofollow”. When directives conflict, Google applies the more restrictive one, so in this instance the page’s links would be treated as nofollow.
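The “most restrictive wins” rule, together with the index, follow default, can be sketched in a few lines of Python. This is an illustration of the behaviour described above rather than Google’s actual implementation; the function names are hypothetical, and the input is assumed to be a list of lowercase directives with shortcuts such as none and all already expanded.

```python
# Conflicting pairs of directives: (permissive, restrictive).
CONFLICTS = [("index", "noindex"), ("follow", "nofollow")]


def resolve_conflicts(directives):
    """Drop the permissive side of any conflicting pair, keeping the more restrictive one."""
    resolved = list(directives)
    for permissive, restrictive in CONFLICTS:
        if permissive in resolved and restrictive in resolved:
            resolved.remove(permissive)
    return resolved


def effective_rules(directives):
    """Summarise whether the page is indexed and its links followed.

    With no directives at all, the default is index, follow.
    """
    resolved = resolve_conflicts(directives)
    return {
        "indexed": "noindex" not in resolved,
        "links_followed": "nofollow" not in resolved,
    }


print(resolve_conflicts(["follow", "nofollow"]))
print(effective_rules([]))
```

Keeping the restrictive side is the safe choice: a page you meant to hide stays hidden even if a permissive directive slips in alongside it.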
What is the X-Robots-Tag?
There is an alternative to meta robots tags if you’re trying to control how search engines crawl and index webpages: say hello to the X-Robots-Tag. However, it is a bit more complicated to implement than meta robots tags.
The X-Robots-Tag is particularly useful for stopping search engines from indexing non-HTML content, such as PDFs or images, where there’s no <head> section to hold a meta tag. It’s also handy for applying directives at scale, for example to every file of a certain type across your site, rather than editing pages one by one. The X-Robots-Tag is sent as an HTTP response header rather than an HTML tag, but any directive you can use in a meta robots tag can also be specified in an X-Robots-Tag. Do remember, though, that you should avoid using both on the same page.
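Because the X-Robots-Tag is a header, it can also name a crawler before its directives, for example `googlebot: nofollow`, a form Google’s documentation shows. The Python sketch below groups header values by the crawler they target, using "*" for values that apply to everyone. The function name and the `KNOWN_VALUE_RULES` set (directives that carry their own colon, such as max-snippet, so they aren’t mistaken for a crawler name) are assumptions made for this illustration.

```python
# Assumed set of directives whose syntax contains a colon, e.g. "max-snippet: 20".
KNOWN_VALUE_RULES = {"max-snippet", "max-image-preview", "max-video-preview", "unavailable_after"}


def parse_x_robots_tag(header_values):
    """Group X-Robots-Tag header values by the user-agent they target.

    Values without a user-agent prefix apply to all crawlers and are stored under "*".
    """
    rules = {}
    for value in header_values:
        target = "*"
        prefix, sep, rest = value.partition(":")
        candidate = prefix.strip().lower()
        # Treat "googlebot: noindex" as targeted, but not "max-snippet: 20".
        if sep and "," not in prefix and candidate not in KNOWN_VALUE_RULES:
            target, value = candidate, rest
        for directive in value.split(","):
            directive = directive.strip()
            if directive:
                rules.setdefault(target, []).append(directive)
    return rules


# Hypothetical header values from a single response.
headers = ["noindex", "googlebot: nofollow", "max-snippet: 20"]
print(parse_x_robots_tag(headers))
```

This only parses the header; it doesn’t try to resolve conflicts between the generic and crawler-specific values.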
In order to implement the X-Robots-Tag, you’ll need access to your website’s server configuration file, header.php or .htaccess. Without access to these, you’ll need to stick with meta robots tags to give instructions to crawlers.
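Server configuration files are the usual route, but if your site happens to be served by a Python application, the same header can be added in code. This is a minimal sketch assuming a WSGI app; the wrapper `add_x_robots_tag`, the stand-in `demo_app` and the choice to match on a `.pdf` path suffix are all hypothetical, and the helper `headers_for` exists only to show the result without running a server.

```python
from wsgiref.util import setup_testing_defaults


def add_x_robots_tag(app, directives="noindex, nofollow", suffixes=(".pdf",)):
    """Wrap a WSGI app so responses for matching paths carry an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()

        def start_with_header(status, headers, exc_info=None):
            # Append the header only for paths ending in one of the suffixes.
            if path.endswith(suffixes):
                headers = list(headers) + [("X-Robots-Tag", directives)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_header)

    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in app that answers every request with a tiny PDF body."""
    start_response("200 OK", [("Content-Type", "application/pdf")])
    return [b"%PDF-1.4 placeholder"]


def headers_for(app, path):
    """Call a WSGI app for one path and return the response headers as a dict."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)
        return lambda data: None

    app(environ, start_response)
    return captured["headers"]


site = add_x_robots_tag(demo_app)
print(headers_for(site, "/guides/brochure.pdf").get("X-Robots-Tag"))
print(headers_for(site, "/guides/").get("X-Robots-Tag"))
```

Matching on the file suffix mirrors the common pattern of applying the header to every file of one type, which is exactly where a meta tag can’t be used.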
For any site, it’s imperative that you get to grips with managing how your site is crawled and indexed – having control over this means you can stop search engines following links, prevent unwanted pages ending up in SERPs, control how your snippets are displayed and much more.
It may be a little complicated at first, but the robots are your friends!