What are URIs?
URIs are one of the most fundamental aspects of the World Wide Web.
They allow us to uniquely identify resources on the internet and essentially help us navigate our way around it. URIs are an integral part of the web, for if we take them out of the equation, the web would cease to function the way it does right now.
URIs define the entire web rather than being a part of it.
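To get to the definition: a URI, short for Uniform Resource Identifier, is a string of characters that uniquely identifies a resource.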
There are two classifications of URIs:
- Uniform Resource Locators, or URLs, are used to locate resources on the web instead of just uniquely identifying them. Almost every single person who uses the internet uses URLs at one point or another.
- Uniform Resource Names, or simply URNs, are used to identify given resources but not necessarily locate them (as is otherwise the case with URLs). URNs are not as common as URLs.
A common misconception amongst beginner, sometimes even experienced, developers is that URIs and URLs are the same. Strictly speaking, that's NOT the case.
URIs and URLs are not the same!
As we learnt above, URIs can either be URLs or URNs; all URLs are URIs but not all URIs are URLs. In other words, URIs are a superset of URLs.
This misconception, that URIs and URLs are the same, arises from the fact that most — in fact, almost all — of the URIs that we use on the web are URLs.
Of course, when a URI is a URL, we can call it by either name. But this should not lead us into believing that URIs and URLs are literally the exact same thing. Absolutely not.
A URL is just one classification of a URI, the other being URNs.
The components of a URI
In theory, a URI is a very simple concept. But to further simplify it, it's broken down into individual components, each serving a different purpose.
Here's an illustration of the general syntax of a URI:
scheme:[//authority]path[?query][#fragment]
There are five components depicted here: scheme, authority, path, query and fragment.
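As a rough illustration of this generic syntax, Python's standard urllib.parse module can split a URI into the same five components. The URI below is a made-up example chosen only to exercise every component, and note that urlsplit reports the authority, query and fragment without their //, ? and # delimiters.

```python
from urllib.parse import urlsplit

# Hypothetical URI, chosen only so that all five components are present.
uri = "https://example.com/lectures/lecture-1.html?lang=en-us#section1"

parts = urlsplit(uri)

print(parts.scheme)    # https
print(parts.netloc)    # example.com (the authority)
print(parts.path)      # /lectures/lecture-1.html
print(parts.query)     # lang=en-us
print(parts.fragment)  # section1
```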
Let's see what each of these components does...
Scheme
The scheme, sometimes also known as the URI's naming scheme, is the most important part of a URI — the entry point into the URI.
For example, the scheme that we're all mostly familiar with on the web is https. The https scheme lays out a specification of URIs that are used to identify resources served with the HTTP protocol. Similarly, mailto is used to identify resources that represent email addresses.
There are many different kinds of URI schemes. Some of the most popular ones are http, https (http with encryption), mailto, tel, telnet, ftp, file, data, ssh, irc, etc.
Each of these schemes is used to produce URIs whose meaning depends upon the semantics and syntax of the scheme itself. Different schemes use different parts of the general URI syntax (shown above) differently.
In this chapter, we'll only concern ourselves with the following schemes: https (and http), file, mailto and tel.
Authority
Many URI schemes leverage the concept of an authority, sometimes also known as the naming authority, to which the path and the following components are handed over for resolution.
The authority begins with two forward slashes (//) and ends with the next delimiter in the URI. It is composed of further subcomponents, which, just like a URI, can also be expressed using a generic syntax.
The generic syntax of an authority is described below:
[userinfo@]host[:port]
It's worthwhile noting that NOT every URI is composed of an authority.
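To make the authority's subcomponents concrete, here is a small sketch using Python's urllib.parse. The ftp URI is purely hypothetical, picked because it uses all three subcomponents; the second URI shows a case with no authority at all.

```python
from urllib.parse import urlsplit

# Hypothetical URI whose authority has userinfo, host and port.
with_authority = urlsplit("ftp://alice@files.example.com:2121/reports/")

authority = with_authority.netloc    # "alice@files.example.com:2121"
userinfo = with_authority.username   # "alice"
host = with_authority.hostname       # "files.example.com"
port = with_authority.port           # 2121 (as an integer)

# A URI without any authority: the scheme is followed directly by the path.
without_authority = urlsplit("mailto:contact@example.com")
print(repr(without_authority.netloc))  # ''
```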
As an example, in the https scheme, the authority component simply consists of a domain name. That is, in the HTTP URI https://example.com, the authority is //example.com, where example.com is simply the domain name of the underlying resource.
In the following section, when we explore the https (and http) URI scheme in detail, we'll explore the authority component along with it.
Path
The path is perhaps the easiest component of a URI to understand.
Just like the authority component applies to the preceding naming scheme, the path applies to the preceding naming scheme along with the naming authority (if any).
As the name suggests, the path typically addresses the 'path' to get to a resource. This path may either be physical, i.e. it represents an actual filesystem path on a computer, or be abstract, i.e. not existing in reality but processed by a computer for an appropriate response.
Once again, taking https as an example, in the URI https://example.com/lectures/lecture-1.html, the path is /lectures/lecture-1.html.
We'll learn more about paths, specifically in the https scheme, later on in this chapter when we explore the https scheme in detail.
Query
After the path comes the query in a URI.
The query begins with (and includes) the ? symbol.
We don't need to worry about understanding the query component at this stage because it's mostly used by server-side software and sometimes even by JavaScript, both of which are avenues that we're yet to discover.
Fragment
The fragment is the final component of a URI. As with the query component, not all URIs leverage the fragment.
The fragment begins with (and includes) the # symbol.
When used in https (or http), the fragment represents a destination anchor in the underlying resource, which is customarily an HTML document. The name of the anchor is simply the value of the fragment component, excluding the # symbol.
We'll see more about fragments and destination anchors in the next HTML Hyperlinks chapter.
The http and https URI schemes
It won't be wrong to say that, by far, the most popular and commonly-used set of URIs on the web fall under the category of http and https URIs. The webpage that you're currently viewing is also identified by such a URI.
So what's so special about https and http?
Well, to start off the discussion, the http scheme denotes URIs that apply to the HTTP protocol.
HTML, HTTP and URIs are the three core technologies that formulated the web in the early 90s. HTML was the format of the data, HTTP was the way this data was transferred, and URI was the way to identify a given piece of the data.
As simple as it could get.
HTTP can transfer much more than just hypertext!
Although the name 'HTTP' suggests that the protocol is only capable of transferring hypertext, this has changed over the course of the years since its inception. HTTP today can transfer any kind of data, including such things as images, videos, audio, PDFs, binary files, our very own HTML files, and a lot more.
In that sense, one might say that the term 'HTTP' today doesn't truly encompass what the protocol is capable of transferring, and it won't be wrong to say this.
Actually, the term 'HTTP' was crafted at a time when the only capability of the protocol was transferring just hypertext, but soon it evolved into a complex technology that became powerful enough to transfer just about any kind of data. In this evolution, however, the name of the protocol was kept the same.
But, frankly speaking, we feel that the term HTTP is quite cool despite the fact that it just talks about hypertext. What do you say?
Moving on, let's now talk about the https scheme.
The https scheme relies on the HTTPS protocol, which is just HTTP with an added layer of security.
You might ask: Why was HTTPS created? Well, let's see...
In the nascent web, security wasn't really a big issue until people started to realize the immense vulnerabilities that the web intrinsically carried along with it.
One of these was that of eavesdropping, whereby an attacker would tap into an HTTP connection between a client and a server and read all the data being transmitted therein.
Since HTTP was all just plain text, this led to a severe vulnerability of sensitive information being possibly leaked to eavesdroppers.
The solution, HTTPS.
HTTPS doesn't do an enormous amount of engineering to HTTP; it just takes whatever is delivered in the HTTP protocol and encrypts it (converts it into scrambled data). This encryption renders all the transferring data useless to an eavesdropper, who only gets to see gibberish data.
We've largely simplified the model of HTTPS and eavesdropping, which is one of the many ways of cracking web security, over here; in reality, the situation is far more complex than this. Getting into the details of HTTPS and the possibilities of cracking its security are both out of the scope of this text and require some highly technical topics.
Anyway, now that we know about the http and https URI schemes, let's quickly go through the more common syntax of such URIs, with an example URI.
Consider the following simple URI:
We start off with the scheme, which is https. Next comes the authority part, which for http and https URIs is just a domain name, optionally followed by a port number.
If the port is omitted, it's implicitly assumed to be 443 for https and 80 for http.
Since the example URI above doesn't include a port and its scheme is https, it's equivalent to the following URI that includes a port:
By default, browsers are configured to not display the port for http if it's 80 and for https if it's 443.
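The default-port rule can be sketched in a few lines of Python. The helper name effective_port and the www.example.com URI are assumptions made for this illustration; the two localhost URIs are taken from the examples of absolute URLs later in this chapter.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed here.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(uri):
    """Return the explicit port if the URI has one, else the scheme's default."""
    parts = urlsplit(uri)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)

print(effective_port("https://www.example.com/home.html"))  # 443
print(effective_port("http://localhost/items/1"))           # 80
print(effective_port("http://localhost:3000/"))             # 3000
```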
After the authority part, we have the path, which is /home.html.
For http and https URIs, making sense of the path is really simple. It just represents the hierarchical path on the underlying machine hosting the website that takes us to the end resource.
In the web's early days, paths were always physical paths, i.e. they were equivalent to a normal filesystem path on the server.
For instance, if the path /home.html shown above was a physical path, then /home.html would be representing an actual home.html file located inside the root directory of the website www.example.com on its respective server.
However, these days, paths are usually abstract, i.e. they don't exist for real. Abstract paths are processed by servers using some kind of a program that crafts a response based on the path requested.
For example, in the URI www.example.com/products/78, we might not actually have anything such as /products/78 on the underlying server (there isn't even a file extension on 78!). The path would probably be processed by the server, with the information of the product whose id is 78 obtained from a database.
Going forward, after the path, we have the query, as follows:
The query begins with (and includes) the ? symbol. It consists of a set of name=value pairs, known as parameters, delimited by & characters.
Each of the parameters in a query acts more or less like an HTML attribute in that it provides additional information in the URI.
For instance, in the example shown above, the lang parameter set to the value en-us tells the server that the home.html resource should be returned in the English (US) language.
We won't be discussing the query component any further than this as it relies upon server-side software and/or JavaScript to be parsed and reacted upon, and we haven't yet explored either of these avenues.
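For the curious, here's a minimal sketch of how a program might read the name=value pairs out of a query, using Python's urllib.parse. The URI is an assumption built from the pieces described above (the lang parameter comes from the text; the theme parameter is hypothetical, added only to show the & delimiter).

```python
from urllib.parse import urlsplit, parse_qsl

# Assumed example URI; only lang=en-us is mentioned in the text.
uri = "https://www.example.com/home.html?lang=en-us&theme=dark"

query = urlsplit(uri).query          # "lang=en-us&theme=dark" (without the ?)
parameters = dict(parse_qsl(query))  # split on & and then on =

print(parameters["lang"])   # en-us
print(parameters["theme"])  # dark
```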
The final thing left for an http (and https) URI is the fragment.
A fragment simply represents a particular section within the resource identified by the underlying URI, which is mostly an HTML document.
A fragment begins with (and includes) the # symbol.
In the example URI above, the fragment #section1 represents a section (or better to say, a destination anchor) in the home.html resource (an HTML file) with the name section1.
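As a quick sketch, Python's urllib.parse.urldefrag separates a URI from its fragment. The full URI here is an assumption pieced together from the parts described above; the returned fragment is the anchor name, without the # symbol.

```python
from urllib.parse import urldefrag

# Assumed URI combining the path and fragment discussed in the text.
document_uri, anchor = urldefrag("https://www.example.com/home.html#section1")

print(document_uri)  # https://www.example.com/home.html
print(anchor)        # section1
```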
We'll be discussing fragments and destination anchors in detail in the next HTML Hyperlinks chapter.
http URIs. It's a highly technical document, but there is still something of use in it for readers of any experience level.
The file URI scheme
Besides http and https, the file URI scheme is also very commonly used.
file is used by nearly every beginner when they start learning HTML. You have used it too, possibly without knowing it: the HTML files that you've been creating thus far in this course have all been launched in the browser using a file URI!
Go to any one of the HTML files that you created, open it up again in the browser, and notice the URL displayed in the address bar above.
The file URI scheme represents URIs that identify resources on the underlying filesystem.
As an example, consider the following file URI on a Windows computer:
It denotes a greeting.html file that resides in the directory path C:/Users/Alice/OneDrive/Desktop.
A file URI typically has an empty authority (which is why it begins with three slashes, as in file:///), and its path always represents an actual physical path pointing to a file on the underlying filesystem.
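Here's a minimal sketch of pulling the directory and file name out of a file URI in Python. It uses the Windows file URI that appears among this chapter's examples of absolute URLs, and treats the URI path purely as text (PurePosixPath is used only to split on forward slashes, not to touch the filesystem).

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, unquote

uri = "file:///C:/Users/Alice/OneDrive/Desktop/greeting.html"
parts = urlsplit(uri)

print(repr(parts.netloc))  # '' (nothing between the // and the path)

# Decode any percent-encoded characters, then split the path on slashes.
uri_path = PurePosixPath(unquote(parts.path))
print(uri_path.name)    # greeting.html
print(uri_path.parent)  # /C:/Users/Alice/OneDrive/Desktop
```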
A file URI can never be abstract!
At this beginning stage, you're all good to work with HTML in file URIs.
It's only once you're done learning CSS and JavaScript that you should, and probably would, transition to http URIs with the help of some server software (which you'll set up on your own).
file URIs in depth.
Absolute vs. relative URLs
In this section, we shall discuss two ways of referring to URLs while working with them on the web, namely absolute URLs and relative references.
First things first:
Some examples of absolute URLs are:
https://example.com/
https://example.com/home.html
http://localhost/items/1
http://localhost:3000/
file:///C:/Users/Alice/OneDrive/Desktop/greeting.html
An absolute URL doesn't have to be combined together with another URL in order to give us a complete URL to work with — it already is in complete form.
Note that we use the term 'URL' in our discussion only when we have a complete URL. For example, we'll indeed call https://example.com/home.html a URL, but not home.html, since it doesn't begin with a scheme.
As we shall see up next, home.html aligns with the second way of referring to a URL, i.e. via a relative reference.
As the name suggests, a relative reference is literally relative to another URL.
After resolving a relative reference, the complete URL that we get in the end is called its target URL. The URL relative to which a relative reference gets resolved is called the base URL.
Relative references or relative URLs?
Relative references are more commonly referred to as relative URLs (or relative URIs, which mean the same in this context) in resources out there.
However, we tend to avoid this naming, being in line with RFC 3986.
In fact, RFC 3986 itself states that the term 'relative URI' was used in previous RFCs but led to some readers misunderstanding that it referred to a subset of URIs, which wasn't the case.
Consequently, the term 'relative reference' was used in lieu of 'relative URI' to emphasize that it's merely a means of referring to another URI and not a URI itself.
Relative references are pretty commonly used on the web in order to save space and keep from specifying complete, absolute URLs when the resources linked are somehow related to the current URL (in the browser's address bar). Relative references lead to shorter addresses.
A relative reference can begin with two slashes (//); in this case, it denotes a network-path reference, beginning with its authority (recall that the authority begins with //).
For instance, if our base URL is http://example.com/home.html and our network-path reference is //codeguage.com/about, we'll get the following resolved URL: http://codeguage.com/about.
As you can see, the scheme of the target URL of a network-path reference is the same as that of the base URL. That is, in our example, we had http in the base URL and again http in the resolved, target URL.
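The scheme inheritance of network-path references can be checked with Python's urllib.parse.urljoin, which resolves a reference against a base URL. The https base below is an extra assumption, added to show that the same reference picks up whichever scheme the base URL has.

```python
from urllib.parse import urljoin

reference = "//codeguage.com/about"

# Same reference, resolved against an http base and an https base.
from_http = urljoin("http://example.com/home.html", reference)
from_https = urljoin("https://example.com/home.html", reference)

print(from_http)   # http://codeguage.com/about
print(from_https)  # https://codeguage.com/about
```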
A relative reference can even begin with a single slash (/); in this case, it denotes an absolute-path reference.
An absolute-path reference denotes the complete (the 'absolute') path of the target URL.
For instance, let's say that our base URL is http://example.com/items/watch.html and our absolute-path reference is /home.html. Then we'll get the following resolved URL: http://example.com/home.html.
Here are a handful of examples of absolute-path references, with the base URL http://example.com/items/watch.html:
- /home.html; resolves to http://example.com/home.html
- /about/our-story; resolves to http://example.com/about/our-story
- /cart/checkout; resolves to http://example.com/cart/checkout
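These resolutions can be reproduced with Python's urllib.parse.urljoin. This is only a sketch of how a program might resolve absolute-path references; notice that the path of the base URL is discarded entirely while its scheme and authority are kept.

```python
from urllib.parse import urljoin

base = "http://example.com/items/watch.html"
references = ["/home.html", "/about/our-story", "/cart/checkout"]

# Map each absolute-path reference to its target URL.
targets = {ref: urljoin(base, ref) for ref in references}

for ref, target in targets.items():
    print(ref, "->", target)
```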
If a relative reference doesn't begin with a slash (/), we have what's called a relative-path reference.
A relative-path reference denotes a partial path of the target URL; the complete path is determined by combining it with the path of the base URL (hence, the term 'relative').
Suppose that our base URL is again http://example.com/items/watch.html and our relative-path reference is cup.html. Then we'll get the following resolved URL: http://example.com/items/cup.html.
It's very easy to reason about this using the analogy of files and directories on a computer. That is, think of items as being a directory, containing the file watch.html; then when we refer to cup.html, since we're still in that items directory, we just go to the cup.html file in it.
Here are a few examples of relative-path references for the base URL http://example.com/items/watch.html:
- watches/big-watch.html; resolves to http://example.com/items/watches/big-watch.html
- ./cup.html; resolves to http://example.com/items/cup.html
- ../cup.html; resolves to http://example.com/cup.html
The last two examples here are worth consideration. The special values . and .. are called dot-directories.
- . represents the current directory. A reference beginning with ./ is essentially the same as the reference with these two characters trimmed off.
- .. represents the parent directory; it takes us one directory upwards.
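The files-and-directories analogy can be sketched directly in Python: take the directory of the base URL's path, append the reference, and collapse the dot-directories. The helper name resolve_path is hypothetical, and this is a simplification for illustration rather than a full resolution algorithm, so the sketch also compares its results against urllib.parse.urljoin.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

def resolve_path(base_path, reference):
    """Resolve a relative-path reference the way we would walk directories."""
    directory = posixpath.dirname(base_path)       # /items/watch.html -> /items
    joined = posixpath.join(directory, reference)  # e.g. /items/../cup.html
    return posixpath.normpath(joined)              # collapse . and .. segments

base = "http://example.com/items/watch.html"
base_path = urlsplit(base).path

for reference in ["watches/big-watch.html", "./cup.html", "../cup.html"]:
    by_analogy = resolve_path(base_path, reference)
    by_urljoin = urlsplit(urljoin(base, reference)).path
    print(reference, "->", by_analogy, by_analogy == by_urljoin)
```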
It's paramount for you to master these references since we'll be using them a lot in the coming chapters, as we start linking more and more resources into our HTML pages.
The mailto URI scheme
Two other particularly handy URI schemes are those that represent email addresses and telephone terminals. In this section and the next one, we shall explore these two kinds of URIs.
Starting off with the former, the mailto URI scheme is used to denote email addresses.
The mailto URI scheme represents URIs that identify resources accessible via Internet mail.
In simpler words, a mailto URI represents an email address, or multiple such addresses, to which an email will be sent.
A mailto URI has neither an authority component, nor a fragment; only a scheme (obviously), followed by a path, followed by an optional query.
By default, when a mailto URI is opened up in the browser, it takes us to the system's default email app (whichever it is) and creates a new email in it, directed to the given recipient(s) as mentioned in the URI.
mailto doesn't directly send an email; it just creates a new email in the email app which we have to send ourselves. In that sense, we can edit the email before dispatching it.
Let's consider an example URI:
This URI represents the email address contact@example.com. If we enter this URI into the browser's address bar, we'll be taken to the system's default emailing app, with a message crafted to be sent to the given address.
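As a small sketch of the scheme-path-query structure, the following Python snippet extracts the recipients from a mailto URI. The helper name mailto_recipients, the second URI and its subject parameter are assumptions made for illustration; the sketch also assumes that multiple addresses are separated by commas and ignores percent-encoding.

```python
from urllib.parse import urlsplit

def mailto_recipients(uri):
    """Return the list of addresses held in the path of a mailto URI."""
    parts = urlsplit(uri)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URI: " + uri)
    # The path holds the address(es); the optional query is left out of it.
    return [address for address in parts.path.split(",") if address]

print(mailto_recipients("mailto:contact@example.com"))
# Hypothetical URI with two recipients and a query.
print(mailto_recipients("mailto:a@example.com,b@example.com?subject=Hello"))
```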
mailto URIs in very fine, technical detail.
The tel URI scheme
Many websites provide mobile/telephone contact links which, when clicked, directly take us to the phone app with the underlying contact number filled in there, ready to be called.
Such links leverage the tel URI scheme.
The tel URI scheme represents URIs that identify resources reachable via a telephone number.
The basic syntax of tel URIs is extremely simple.
The telephone number comes right after tel: and can be delimited by hyphens (-) for better readability (by separating the country code, the area code, and so on).
tel URI. As specified before, use hyphens (-) for delimiting different parts of the number.
Here's an example URI:
The +1 at the beginning of the number is the country code of the US; hence, the shown number actually represents a phone number in the US.
Every country has an associated country code to be used in telephone numbers. For example, the UK has the code +44, Turkey has +90, France has +33, Saudi Arabia has +966, and so on.
The remaining parts of the number above narrow it down to specific locations. For instance, the segment 201 in the number above, after the country code, represents the area code of the northeastern part of the state of New Jersey in the US.
Different countries have different rules of defining numbers; some might use city codes and area codes whereas others might not use them at all.
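Since the hyphens in a tel URI are there only for readability, a program can simply drop them to get the number to dial. Here's a minimal sketch; the helper name dialable_number and the phone number itself are hypothetical, with the number merely following the +1 and 201 pattern described above.

```python
from urllib.parse import urlsplit

def dialable_number(uri):
    """Return the phone number of a tel URI with the visual hyphens removed."""
    parts = urlsplit(uri)
    if parts.scheme != "tel":
        raise ValueError("not a tel URI: " + uri)
    return parts.path.replace("-", "")

# Hypothetical number, used only for illustration.
print(dialable_number("tel:+1-201-555-0123"))  # +12015550123
```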
Anyway, moving on, browsers are configured by default to handle tel URIs by opening the underlying number in the system's phone application.
On mobile browsers, this is a straightforward process since the phone app exists in the same system as the browser app. However, on desktop devices, opening a tel URI usually leads to some kind of a browser pop-up asking where to redirect the call; if the browser is somehow connected to our phone, we can get the tel URI to be passed on to the mobile phone.
tel URIs in granular detail.