Canonical URLs

Canonical is a term that Google popularized in search engine optimization several years ago. It refers to the preferred URL for a page that can be reached through more than one URL. Choosing that preferred URL is the act of canonicalization. To better understand, we will review an example.

The homepage of a website can be referred to in many ways. Users can get to the homepage through a variety of URLs, but they all go to the same place. There may be other combinations, but here are a few common ones.

  1. http://www.domain.com
  2. http://domain.com
  3. http://domain.com/
  4. http://domain.com/index.php
  5. http://www.domain.com/index.htm

It is better to choose one structure and redirect the others to it than to let search engines index every variant and let users arrive through any URL. There are two main reasons for this, covered below. When you place 301 (permanent) redirects on the URLs that you do not want indexed, anyone coming to the site is sent to the chosen URL. The chosen URL is the canonical URL.
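As a rough illustration of the redirect rule described above, here is a minimal Python sketch that maps each homepage variant to one chosen URL and reports when a 301 would be needed. The choice of http://domain.com/ as the canonical URL, the list of index filenames, and the function names canonical_url and redirect_for are all assumptions made for this example. In practice the redirect usually lives in the web server or CMS configuration rather than in application code.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed for this sketch: the non-www host is the preferred one.
CANONICAL_HOST = "domain.com"
SITE_HOSTS = ("domain.com", "www.domain.com")
# Default-document names that should collapse to the directory URL (assumed list).
INDEX_FILES = ("index.php", "index.htm", "index.html")


def canonical_url(url: str) -> str:
    """Return the canonical form of a URL on this site."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Treat the bare domain and the www host as the same site.
    if host in SITE_HOSTS:
        host = CANONICAL_HOST
    # An empty path and "/" both mean the homepage.
    path = parts.path or "/"
    # Strip a default-document filename so /index.php becomes /.
    head, _, tail = path.rpartition("/")
    if tail in INDEX_FILES:
        path = head + "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_for(url: str):
    """Return (301, target) when the URL is not canonical, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


variants = [
    "http://www.domain.com",
    "http://domain.com",
    "http://domain.com/",
    "http://domain.com/index.php",
    "http://www.domain.com/index.htm",
]
for v in variants:
    print(v, "->", redirect_for(v))
```

With these assumptions, all five variants resolve to http://domain.com/, and only that exact URL comes back with no redirect.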

Now that you understand the definition of a canonical URL, you probably wonder what relevance it has in website building and why you should care. The first reason to standardize your URLs is to avoid duplicate content issues when search engines index your website’s pages. If visitors can get to your homepage in four or five different ways, then search engines may index four or five copies of the exact same page. You might think that is a good idea, giving you more chances of being found, but duplicating content for the sake of having more pages is not acceptable behavior. At worst, duplicate content can be a reason for removal from search engines, and at the least a reason for some pages to be filtered out of the results.

The second reason for choosing one URL over the others is search engine optimization. When search engines index webpages, each page carries its own authority, depending on who links to the page and on how many other websites find the page relevant. The problem is that when other websites link to your pages, you have no control over the URLs they use, so links to the same page end up in several different forms. Having many links to a website is a benefit, but when links going to the same page use different URLs, the authority of that page is diluted. Search engines see each URL as a separate page, so you are hampering your efforts to achieve better rankings.

The easiest way to understand this is through simple math. If you have one thousand websites linking to http://domain.com, then this page is very popular according to other websites. But let’s say some site owners link to any of the five combinations above. Now maybe you have one hundred links going here, two hundred going somewhere else, another five hundred over there, and so on. It does not make sense to have links scattered willy-nilly, especially when you can control where they go right on your own server. Using canonical URLs tells the search engines which option to pick. For example, we have loads of people linking to pages on our social shopping network at Yozo.be. We prefer not to use the www. version, but people link to it anyway. Using the canonical tag (a link element with rel="canonical" in the head of the page), we signal to Google which URL to index (and value).
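The arithmetic above can be sketched in a few lines of Python. The first three link counts echo the figures in the text; the last two are invented so that the total comes to one thousand, and the pairing of counts with particular URLs is arbitrary. The helper canonical_link_tag is a hypothetical name for this example; it simply renders the link element that the canonical tag refers to.

```python
from html import escape

# Inbound links per homepage variant. The first three counts echo the text;
# the last two are made up so that the total comes to 1,000.
inbound_links = {
    "http://www.domain.com": 100,
    "http://domain.com": 200,
    "http://domain.com/": 500,
    "http://domain.com/index.php": 150,
    "http://www.domain.com/index.htm": 50,
}

# Split across variants, the best any single URL can claim is its own count.
strongest = max(inbound_links.values())
# Consolidated onto one canonical URL, every link counts toward the same page.
consolidated = sum(inbound_links.values())


def canonical_link_tag(url: str) -> str:
    """Render the <link rel="canonical"> element for a page's <head>."""
    return f'<link rel="canonical" href="{escape(url, quote=True)}">'


print(strongest, consolidated)  # 500 1000
print(canonical_link_tag("http://domain.com/"))
```

In this made-up split, the strongest single variant holds only half of the links that the page would hold if everything pointed at one URL.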

Further, when you link to your own pages within your website, it is important to keep the same URL structure as well. With so many blog scripts and content management systems on the market, it is easy to forget which structure you chose. By setting your canonical URLs, you can benefit from better onsite SEO as well.
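One way to catch internal links that drift from the chosen structure is to scan a page's HTML for links to the non-preferred host. The sketch below uses only the Python standard library. It assumes that domain.com is the preferred host and www.domain.com the alternate, and the names LinkCollector and inconsistent_links are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed for this sketch: domain.com is preferred, so links to the
# www host on the same site count as inconsistent.
ALTERNATE_HOSTS = {"www.domain.com"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inconsistent_links(html: str) -> list:
    """Return hrefs that point at a non-preferred host of this site."""
    collector = LinkCollector()
    collector.feed(html)
    return [h for h in collector.hrefs
            if urlsplit(h).netloc.lower() in ALTERNATE_HOSTS]


page = (
    '<a href="http://domain.com/about">About</a> '
    '<a href="http://www.domain.com/contact">Contact</a> '
    '<a href="/blog">Blog</a>'
)
print(inconsistent_links(page))  # ['http://www.domain.com/contact']
```

Relative links such as /blog have no host of their own, so they inherit whichever host the visitor arrived on and are not flagged here.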

To conclude, when we think about the concept of canonical URLs and their importance for SEO, the main reason to choose one URL over the others is to get the most benefit from outside links. Funneling more links to one location is better than spreading them around. It is similar to watering the lawn. The area that gets the most water benefits with nice green grass, while the area that receives only a few drops of water is brown and burnt from the sun.