Part-II: Representation of URI

Shivam
Jan 18, 2020

In Part II of this series of posts on the URI, we will cover the syntax, or grammar, of a URI and look at the meaning of each of its parts in detail. If you haven’t read Part I of this series, please read it before going further so that we are on the same page.

Syntax Of URI:

URI = <schemeName>:<schemeSyntax>

Above is the syntax of an absolute URI. Here, schemeName is the name of the scheme being used (HTTP, FTP, URN, etc.), followed by a colon (“:”) and then a string, schemeSyntax, whose interpretation depends on the scheme.
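To make the two-part shape concrete, here is a minimal Python sketch that splits an absolute URI at its first colon. The function name `split_absolute_uri` and the sample URIs are only illustrative assumptions, and the sketch does not validate which characters a scheme name may contain.

```python
def split_absolute_uri(uri):
    """Split an absolute URI into (schemeName, schemeSyntax) at the first colon."""
    scheme, colon, scheme_specific = uri.partition(":")
    if not colon or not scheme:
        # No colon at all, or nothing before it: not an absolute URI.
        raise ValueError(f"not an absolute URI: {uri!r}")
    return scheme, scheme_specific


print(split_absolute_uri("urn:isbn:0451450523"))   # ('urn', 'isbn:0451450523')
print(split_absolute_uri("tel:+1-816-555-1212"))   # ('tel', '+1-816-555-1212')
```

Note that only the first colon matters: everything after it, including any further colons, belongs to the scheme-specific part.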

The syntax of the URI is dependent on the scheme.

  • Each URI begins with a scheme name, and each scheme has its own specification for assigning identifiers within that scheme.
  • This mechanism makes the URI an extensible naming system. For example, the URI specification for the HTTP scheme may differ from the specification for FTP or TELNET.
  • However, some subcomponents are common to most URIs: the scheme, authority, path, query, and fragment.

Following is the generic syntax of the URI:

URI = scheme “:” [ “//” authority ] path [ “?” query ] [ “#” fragment ]

  • Only the scheme must always be written out. The path is always present but may be empty, and the availability of the rest of the components depends on the scheme.
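The generic syntax can be tried out with the regular expression given in Appendix B of RFC 3986, which splits a URI reference into these five components. The sketch below wraps it in a small helper; the name `split_uri` and the sample URIs are assumptions made for illustration.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_PATTERN = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri(uri):
    """Return the generic components of `uri`; absent components are None."""
    match = URI_PATTERN.match(uri)  # every group is optional, so this always matches
    return {
        "scheme": match.group(2),
        "authority": match.group(4),
        "path": match.group(5),
        "query": match.group(7),
        "fragment": match.group(9),
    }


print(split_uri("ftp://ftp.example.com/pub/readme.txt"))
print(split_uri("mailto:someone@example.com"))
```

For the `mailto` URI, only the scheme and the path are filled in; the authority, query, and fragment come back as `None`, which shows how much of the generic syntax a scheme may leave unused.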

For example, if we break down the URI “https://www.google.com:80/over/there?name=ferret#nose”, we get the following details:

[Figure: URI Representation]
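The same breakdown can be reproduced in code. This is a small sketch using `urlsplit` from Python's standard `urllib.parse` module on the example URI above; Python calls the authority component `netloc`.

```python
from urllib.parse import urlsplit

uri = "https://www.google.com:80/over/there?name=ferret#nose"
parts = urlsplit(uri)

print("scheme   :", parts.scheme)    # https
print("authority:", parts.netloc)    # www.google.com:80
print("path     :", parts.path)      # /over/there
print("query    :", parts.query)     # name=ferret
print("fragment :", parts.fragment)  # nose
```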

URI Components:

Most URIs consist of Scheme, Authority, Path, Query, and Fragment components. Let’s look at each of them one by one:

Scheme

  • Each URI begins with a scheme name that refers to a specification for assigning identifiers within that scheme.
  • The scheme registry maintains the mapping between scheme names and their specifications. HTTP, FTP, URN, TEL, etc. are examples of schemes.
  • Many URI schemes are named after protocols. For example, HTTP is both a scheme name and a protocol name.
  • You can also create your own scheme and scheme specification; refer to this link for more details.

Authority

  • Many URI schemes delegate governance of the remaining part of the URI to the authority component.
  • The authority component is made up of three subcomponents: user-info, host, and port.
  • The syntax of the authority is as follows:

authority = [ user-info “@” ] host [ “:” port ]

  • This generic syntax provides a common means of distinguishing an authority based on a registered name (such as a DNS host name) or a server address (an IP address), along with optional port and user information.
  • The authority component is preceded by a double slash (“//”) and is terminated by the next slash (“/”), question mark (“?”), or number sign (“#”) character, or by the end of the URI.

[Figure: Authority Details]

User Information:

  • The user-info is an optional subcomponent that contains the information required to identify the user. We can use `:` as a delimiter to represent information like “userName:password”.
  • The `@` character is the delimiter that separates the user-info from the host.
  • The user-info can be percent-encoded or plain text. An application should not display anything that follows the first colon (`:`) in the user-info, because it may be a password.
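As a rough illustration of the last point, the sketch below strips everything after the first colon of the user-info before a URI is shown to the user. The helper name `redact_password` and the sample URI are hypothetical; only `urlsplit` and `urlunsplit` come from Python's standard library.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(uri):
    """Return `uri` without the part of the user-info that follows the first ':'."""
    parts = urlsplit(uri)
    # The last '@' in the authority separates the user-info from host[:port].
    userinfo, at, hostport = parts.netloc.rpartition("@")
    if not at:
        return uri  # no user-info present, nothing to hide
    user = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{user}@{hostport}"))


print(redact_password("ftp://alice:s3cret@example.com/files"))
# ftp://alice@example.com/files
```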

Host:

  • The host subcomponent consists of either a registered name or an IP address.

Port:

Port numbers are commonly reserved to identify specific services, so that an arriving packet can be forwarded to the right running application.

  • For example, the “HTTP” scheme defines a default port of “80”. For more details, refer to this link.
  • The port number is separated from the host by a colon (“:”).
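Here is a short sketch of how the host and port can be read from a URI, falling back to the scheme's default port when none is written. The `DEFAULT_PORTS` table and the `host_and_port` helper are assumptions for illustration; the authoritative defaults are defined by each scheme's specification.

```python
from urllib.parse import urlsplit

# Illustrative subset only; each scheme's specification defines its own default.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def host_and_port(uri):
    """Return (host, port), using the scheme's default port if none is given."""
    parts = urlsplit(uri)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


print(host_and_port("http://example.com/index.html"))        # ('example.com', 80)
print(host_and_port("http://user@example.com:8080/index"))   # ('example.com', 8080)
```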

Path

  • The path helps to identify a resource within the scope of the URI’s scheme and authority (if any).
  • It can also contain data, usually organized in hierarchical form.
  • The path is terminated by the first question mark (“?”) or number sign (“#”) character, or by the end of the URI.
  • If a URI contains an authority component, then the path component must either be empty or begin with a slash (“/”) character.
  • If a URI does not contain an authority component, then the path cannot begin with two slash characters (“//”).
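The last two rules can be expressed as a tiny check. This is only a sketch: `path_is_valid` is a hypothetical helper that takes the already separated authority (or `None` when there is none) and path, and it does not check which characters the path contains.

```python
def path_is_valid(authority, path):
    """Check the two path rules that depend on whether an authority is present."""
    if authority is not None:
        # With an authority, the path must be empty or start with "/".
        return path == "" or path.startswith("/")
    # Without an authority, a leading "//" would be misread as an authority.
    return not path.startswith("//")


print(path_is_valid("www.google.com:80", "/over/there"))  # True
print(path_is_valid("www.google.com:80", "over/there"))   # False
print(path_is_valid(None, "someone@example.com"))         # True
print(path_is_valid(None, "//over/there"))                # False
```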

Query

  • The query component contains a query string of non-hierarchical data, most often a sequence of key-value pairs.
  • The query begins after the first question mark (“?”) character and is terminated by a number sign (“#”) character or by the end of the URI.
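To see the key-value structure, here is a small sketch that uses `parse_qs` from Python's standard library. The extra `color=brown` pair is an assumption added to the earlier example so that there is more than one key.

```python
from urllib.parse import urlsplit, parse_qs

uri = "https://www.google.com:80/over/there?name=ferret&color=brown#nose"
query = urlsplit(uri).query   # text between "?" and "#"
params = parse_qs(query)      # each key maps to a list of values

print(query)    # name=ferret&color=brown
print(params)   # {'name': ['ferret'], 'color': ['brown']}
```

The values are lists because the same key may appear more than once in a query string.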

Fragment

  • The fragment is used to identify a secondary resource within a primary resource (URI).
  • The presence of a fragment component is indicated by a number sign (“#”) character, and the fragment runs to the end of the URI.
  • For example, it can be a link to a section of the same web page.
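A quick way to separate the primary resource from the fragment is `urldefrag` from Python's standard library, sketched here on the example URI used earlier in this post.

```python
from urllib.parse import urldefrag

uri = "https://www.google.com:80/over/there?name=ferret#nose"
primary, fragment = urldefrag(uri)

print(primary)    # https://www.google.com:80/over/there?name=ferret
print(fragment)   # nose
```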

Keep In Mind:

  • The syntax described in this post is only the generic URI syntax.
  • The actual syntax of a URI depends on its scheme or protocol.

Thanks. That’s all for Part II. I hope you learned something from this post. If you have any suggestions or questions, please add them in the comments below. Thanks again for reading, Happy Learning 👏.

Further Reading:

  • Part III of this series contains more examples of commonly used URIs.

Shivam

Product Engineer @ Gojek. Likes to write on Productivity, Android App Development, Kotlin, Software Engineering, etc.