So you work really hard, literally day and night, to ensure that your website stands up to the competition in search engines like Google, Bing, etc. You put a lot of effort into SEO: you write descriptions, meta keywords and catchy page titles, build strong internal linking, and much more. Yet somehow you fail to rank higher or, worse, don't appear at all in the results pages of these search engines.
Canonical/duplicate URLs may be the issue.
Canonical and duplicate URLs
In simple terms, out of multiple URLs pointing to the exact same, or almost the same, location, the canonical URL is the original, preferred URL.
The rest of the URLs are considered duplicate URLs. As an example, consider the following list of URLs pointing to the exact same location:
For you all these URLs would be the same, but for a search engine they are all absolutely different, and this is where the issue begins.
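To make the difference concrete, here is a minimal Python sketch. The four URL variants are assumed purely for illustration; they are not taken from any real site. Compared as plain strings, which is roughly how a crawler first sees them, every variant is distinct even though they all share the same path.

```python
from urllib.parse import urlsplit

# Hypothetical variants of one and the same page (assumed for illustration).
variants = [
    "http://example.com/",
    "http://www.example.com/",
    "https://example.com/",
    "https://www.example.com/",
]

# As plain strings, every variant is a different URL.
distinct = set(variants)

# Only the scheme and host differ; the path is identical in all of them.
paths = {urlsplit(url).path for url in variants}

print(len(distinct), "distinct URLs,", len(paths), "distinct path")
```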
What is the issue?
To understand the issue we'll have to consider a scenario.
You submit http://example.com to Google's search engine, and it consequently crawls that link and saves it in its index. Later, you effectively submit http://www.example.com (note the www here) to Google by specifying it in the href of a link on your site, say in an a element. What Google does, and will do for your site too, is crawl links from links. It will crawl the href mentioned above from the referring link, and as soon as this happens it notices one thing: the new URL resembles a previously indexed URL, http://example.com. It may consequently push down, or even worse remove, both URLs from its SRP (search results page).
So this is the issue: your webpage can be removed from Google! All your hard work and the time spent on writing amazing content gets trashed by one silly mistake.
But nobody can rule out mistakenly specifying a different URL that points to the same location, so there are different things to consider in solving such problems.
Perhaps the easiest way to stay out of canonical hot water with www, non-www, https and non-https URLs is to do external URL redirection.
This redirection ideally happens on the server, and can easily be applied on Apache servers through their configuration files. See how to force www and https in .htaccess for more details.
What happens in this case is that you decide on one URL structure for your web page, let's say https://www.example.com, which has both www and https, and then redirect all other requests (non-www and non-https) to it. This ensures that even if you mistakenly mention http://www.example.com in an href attribute, both you and the search engine will be redirected to the single preferred version. One location, no issue!
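The decision the server makes can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual Apache configuration: the function name redirect_target and the two PREFERRED_ constants are hypothetical, and the preferred version is assumed to be https://www.example.com as in the example above. A real server would answer with an HTTP 301 (permanent) redirect pointing at the returned URL.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preferred URL structure, matching the example in the text.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"


def redirect_target(url):
    """Return the preferred URL to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip a leading "www." so www and non-www hosts compare equal.
    bare = host[4:] if host.startswith("www.") else host
    if "www." + bare != PREFERRED_HOST:
        return None  # a different site altogether, leave it alone
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the preferred version
    # Keep the path and query, swap in the preferred scheme and host.
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path or "/", parts.query, parts.fragment)
    )


print(redirect_target("http://example.com/books/book1.php"))
```

Every non-preferred variant ends up at the same address, while a request that already uses https and www yields None and is served directly.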
Canonical tag
Sometimes URL redirection isn't the simplest way to get rid of canonical and duplicate URL problems on your website, as in the following cases:
Although it can do the job, the URL redirection code would become quite a bit more complicated than the simple trick we are about to give.
And that trick is to have a canonical link tag on your page.
The idea is that we specify the preferred, canonical URL in the href attribute of a link tag with rel="canonical", placed in the head of the page. This informs the search engine that the URL to be crawled more often and put into the index is the one mentioned in href, and that all the other ones are duplicates.
<link rel="canonical" href="http://www.example.com/books/book1.php">
Of course, the address bar won't change for either of the above URLs, i.e. there will still be different URLs pointing to the same location. However, we won't have to worry about duplicate content issues any more, since we have the canonical tag to the rescue!
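To check what a crawler would read from a page, the following Python sketch pulls the canonical URL out of some HTML. It assumes the canonical URL is declared with a link element carrying rel="canonical", which is the standard form in HTML. The class CanonicalFinder and the function find_canonical are hypothetical names made up for this example, and the page string is a made-up document that reuses the URL from above.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it encounters."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_tokens and self.canonical is None:
            self.canonical = attributes.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Made-up page used only to exercise the sketch.
page = (
    "<html><head>"
    '<link rel="canonical" href="http://www.example.com/books/book1.php">'
    "</head><body>Book 1</body></html>"
)
print(find_canonical(page))
```

Running this against each duplicate URL of a page should report the same canonical address every time; if it reports different ones, or none, the tag is not doing its job.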
And you don't care that much whether two URLs point to the same location... or do you?