Location Aware Apps — Connectivity

Sumit Dev
Nov 17, 2018

Smartphones and other devices with location tracking capability have enabled a number of use cases in everyday life. E.g.:

  1. Navigation services while travelling by various modes of transport
  2. On-demand services such as cabs, delivery and home services
  3. Health and fitness apps that track activity patterns
  4. Location based marketing services
  5. Augmented reality and gaming apps
  6. Indoor asset tracking
  7. Indoor proximity engagement

Location aware apps that run on these devices build on a number of underlying technologies. This article is part of a series that provides an overview of the various technologies brought together to deliver these use cases.

  1. Location
  2. Connectivity
  3. Geographic Information

Connectivity

Connectivity refers to the mechanism by which tracked devices communicate with a server. We first cover the various technologies used for the wireless network that the device attaches to. We then look at various communication protocols, and finally look at application considerations. In all these, the following parameters need to be evaluated:

  • Bandwidth: The amount of data that needs to be communicated
  • Latency: The time sensitivity of the data that needs to be communicated
  • Overheads: The overheads in each batch of data that is communicated
  • Power: The power consumed in communicating the data
  • Reliability: How well the system tolerates temporary connectivity outages

Wireless Networks

There is a range of wireless networking technologies available for connecting devices to a server, possibly through the Internet. A network is defined by the multiple ‘nodes’ that communicate using the same type of medium and are delimited by devices such as routers or gateways. The Internet itself is a collection of multiple such networks. Most of these technologies are industry standards, though a few proprietary solutions are used for specific use cases. The choice of the physical interface technology has a direct influence on the metrics described at the beginning of this article, and also the following aspects of the network. This in turn affects how the communication protocol is designed.

  • The architecture of the network i.e. the structural relationship between the various nodes in terms of the control flow. E.g. master-slave (one of the nodes controls the communication); multi-master (multiple nodes can autonomously control the communication); or peer-to-peer (any node can autonomously control the communication). Note that the architectural patterns at the protocol or application level (such as client-server or even peer-to-peer) are independent of this.
  • The topology of the network i.e. the arrangement of the communication links between the nodes. E.g. a point-to-point connection where the two endpoints are directly linked; a star network where a central hub is the router or gateway, and all other nodes connect to this hub; or a mesh network where a node can connect with any other node.
  • The area the network can cover. E.g. Personal Area Network (covers a small personal area, like a room), Local Area Network (covers a limited area, like a building), or Wide Area Network (covers large geographical area, like a city).

The following subsections summarize a few of the families of standards available for wireless networking, and how the standards interface with the rest of the protocol stack.

IEEE 802 Standards

The IEEE 802 family of standards is further divided into sub-families, e.g. 802.3 for Wired LAN (Ethernet), 802.11 for Wireless LAN (WiFi), and 802.15 for Wireless PAN (Bluetooth etc.). In terms of the protocol stack (detailed in the next section), this family specifies the Physical Layer and the Data Link Layer. The Data Link Layer consists of the Medium Access Control (MAC) and the Logical Link Control (LLC) sub-layers.

Industry Groups

A number of access technologies are governed by an industry group that uses, or in some cases defines, the relevant standards.

  • The NFC Forum utilizes standards like ISO/IEC 18092 and ISO/IEC 14443–2,3,4 to define the NFC protocols.
  • The WiFi Alliance is a non-profit body that is responsible for certifying IEEE 802.11 devices and ensures their interoperability.
  • The Bluetooth SIG (Special Interest Group) is an industry body that defines the Bluetooth specifications. IEEE ratified the specifications up to Bluetooth 1.2 as IEEE 802.15.1. Bluetooth specifies the entire protocol stack, with a number of optional application layer ‘profiles’ that are chosen according to the use case. E.g. the PAN profile is used for tethering over Bluetooth.
  • A number of standards for IoT devices are also governed by industry bodies.
  • The Thread Group maintains the transport layer protocol for using 6LoWPAN (IPv6 on Low-power WPAN — a network layer adaptation of IPv6) over the IEEE 802.15.4 (Low-Rate WPAN) MAC and physical layers.
  • The LoRa Alliance is a non-profit body developing a LPWAN (Low-Power WAN) technology in the form of the open LoRaWAN MAC layer that builds on the proprietary LoRa physical layer.

Cellular Standards

The 3GPP family of cellular standards specify the physical links for a Wireless WAN.

In the 2G GPRS and 3G UMTS specifications, the key components are the Serving GSN (GPRS Support Node) and the Gateway GSN, with the SGSN responsible for session and mobility management, and the GGSN interfacing with the public Internet. The data link layer on the ‘Mobile Station’ (i.e. the device) consists of the LLC and the RLC/MAC (Radio Link Control) sub-layers — the MS communicates with a similar stack on the BSS (Base Station Subsystem) that in turn interfaces with the SGSN and the GGSN.

In the case of 4G LTE, the key components are now the Serving Gateway (S-GW) and the Packet-Data-Network Gateway (P-GW). The data link layer on the ‘User Equipment’ (i.e. device) consists of the PDCP (Packet Data Convergence Protocol), the RLC (Radio Link Control) and the MAC layers on the user plane and a Radio Resource Control (RRC) layer on the control plane — the UE communicates with a similar stack on the evolved base station (eNodeB) that in turn interfaces with the S-GW and the P-GW.

Protocols

The Internet Reference Model, also referred to as TCP/IP, is a simplified form of the 7-layer OSI Model. The model emphasizes architectural principles, and loosely refers to four layers of software:

  • Application Layer — The protocols used by host processes to communicate over the network. The application may be architected as a client-server system, or as a peer-to-peer system. At this layer, usually, a human readable hostname (e.g. www) and domain name (e.g. acme.com) are combined into a fully qualified domain name (i.e. www.acme.com) to identify the host.
  • Transport Layer — The protocols with which hosts can communicate end-to-end, and make corrections for packets lost in the underlying network. The most used protocols are the connection-oriented Transmission Control Protocol (TCP) and the connectionless User Datagram Protocol (UDP). At this layer, a logical ‘port number’ is used to identify a specific communication end-point.
  • Network Layer — The Internet Protocol (IP) specifies the common connectionless protocol that provides best effort packet delivery across network boundaries. Best effort implies packets may be lost, corrupted, delivered out of order, or in some cases, duplicated. The two basic functions of this protocol are to identify hosts using their IP Address; and to set up routing tables to determine how packets should be forwarded between nodes.
  • Link Layer — The medium-specific protocols concerned with communicating with neighbouring hosts on the same local network without any intervening routers or gateways, e.g. the Address Resolution Protocol and the Point-to-Point Protocol. At this layer, nodes are identified by a physical address — typically the MAC (Medium Access Control) address.

The Internet consists of end hosts that communicate with each other via multiple intermediate nodes. Intermediate nodes may include repeaters (multi-port device that repeats incoming physical layer data on all outgoing ports), switches (multi-port device that directs incoming link layer data to appropriate output ports) or routers (multi-port device that routes IP packets between different network ports). On the egress path at a node, the data flows down the layers, with each layer constructing ‘packets’ of data with a layer specific header and a payload. On the ingress path, the data flows up the layers, with each layer acting upon the layer specific header, and then stripping it before passing the payload upwards.
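The flow of data down and up the layers can be illustrated with a toy sketch (illustrative only, not a real protocol stack): each layer prepends its header on the egress path and strips it on the ingress path.

```python
# Toy illustration of layered encapsulation (not a real protocol stack).
# On egress each layer prepends its header; on ingress each layer strips it.

LAYERS = ["link", "network", "transport", "application"]

def egress(payload: bytes) -> bytes:
    # Data flows down: application wraps first, link layer wraps last,
    # so the link header ends up outermost on the wire.
    for layer in reversed(LAYERS):
        header = f"[{layer}-hdr]".encode()
        payload = header + payload
    return payload

def ingress(packet: bytes) -> bytes:
    # Data flows up: each layer acts on and strips its own header
    # before passing the remaining payload upwards.
    for layer in LAYERS:
        header = f"[{layer}-hdr]".encode()
        assert packet.startswith(header), f"missing {layer} header"
        packet = packet[len(header):]
    return packet
```

Running a payload through `egress` and then `ingress` recovers the original bytes, mirroring the round trip between two end hosts.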

Link Layer

The Link Layer builds on underlying hardware that interfaces with the physical links described earlier.

Location aware apps need to connect to the Internet over wireless networks, and are often mobile. Both aspects pose their own sets of challenges:

  • With wireless networks the key challenges are that packet loss rates are higher than with wired networks and that connectivity can be irregular and intermittent. The transport layer is responsible for adapting its error handling algorithms for wireless networks — more details are covered in the next section.
  • With mobile connectivity, the challenge is in maintaining connectivity while changing the ‘attachment point’, or when switching networks. Session persistence can be implemented at different layers — at the application, the transport or the IP layers. These are described in the following sections.

Network Layer

Internet Addressing

The IANA (Internet Assigned Numbers Authority) coordinates the allocation of public Internet Addresses to service providers through a hierarchical structure under Regional Internet Registries. Service providers may statically bind an allocated IP address to a host, or more commonly, it may be dynamically assigned. Servers hosting websites or applications are typically assigned a static IP address, and use the Domain Name System (DNS) to map these to human readable domain names.

A network or collection of networks managed by single organization such as an Internet Service Provider (ISP) is referred to as an Autonomous System. The AS Number (ASN), also managed by IANA, is used by various routing protocols to identify the routing policies used by the organization. Routers within an AS (i.e. internal routers) use protocols such as Open Shortest Path First (OSPF) and Enhanced Interior Gateway Routing Protocol (EIGRP). Routers that operate across Systems (i.e. external routers) use protocols such as the Border Gateway Protocol (BGP).

IPv4 and IPv6

The original IPv4 used a 32-bit address space that has come under pressure with the ever increasing number of connected devices. This has been alleviated in the short term by using private address blocks that service providers can freely allocate. IPv4 has reserved the 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 blocks for this purpose. Clients from these private networks can access the public Internet through a router using NAPT (Network Address and Port Translation). This is the default form of Network Address Translation (NAT) that maps a private IP/port combination to a unique public IP/port, where the public IP may be shared by multiple hosts. A server in a private network needs a different NAT technique known as Port Forwarding that statically maps a port on the router to a private IP/port. Further, the server may use a Dynamic DNS service to map a name to its dynamic IP address, or set up a tunnel.
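Python's standard ipaddress module can test for the reserved private blocks listed above, and the NAPT mapping can be sketched as a simple table (a toy model; real NAT routers also track the protocol, timeouts and connection state):

```python
import ipaddress

# The reserved private blocks mentioned above.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in PRIVATE_BLOCKS)

class Napt:
    """Toy NAPT table: maps (private IP, private port) to a public port,
    with all mappings sharing one public IP."""
    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.table = {}  # (priv_ip, priv_port) -> public port

    def translate(self, priv_ip: str, priv_port: int):
        key = (priv_ip, priv_port)
        if key not in self.table:           # allocate a fresh public port
            self.table[key] = self.next_port
            self.next_port += 1
        return (self.public_ip, self.table[key])
```

Two private hosts using the same local port get distinct public ports, while repeated packets from the same host reuse the existing mapping.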

Most web based applications use a client-server architecture, where the client usually ‘pulls’, and server ‘push’ is only occasional — so most clients can run behind a NAT firewall. However, content providers prefer public addresses for their clients so as to reduce their reliance on the ISP hosted NAT. Also, enterprise applications like VPN and FTP require public addresses for their clients. IoT applications in particular are well suited to take advantage of the various enhancements offered by IPv6. So although the migration to the 128-bit address space defined by IPv6 may not be urgent, the pool of available IPv4 public addresses will continue to shrink. While a few services have already moved to IPv6-only hosts, most take a dual stack approach. Some services may also rely on Content Delivery Networks (CDN) that support IPv6 to alleviate the urgency to migrate.

IP Mobility

Mobility at the IP layer requires mechanisms that can be used to persist a host's IP address. IP Mobility within a cellular network is addressed using GTP (GPRS Tunneling Protocol). GTP tunneling is a technique used to run IP over UDP, embedding IP frames in UDP packets across the network backhaul to enable seamless intra-3GPP mobility. IP Mobility across multiple cellular networks is based on Mobile IP (MIP) in 2G/3G networks or Proxy Mobile IP (PMIP) in LTE networks. With Mobile IP, a ‘home agent’ receives packets for the ‘mobile node’ (MN), and tunnels them to its current location through a ‘foreign agent’ using IP-over-IP tunneling to a ‘Care of Address’. This host-based mobility approach requires changes to the MN, and can be inefficient. PMIP offers a network-based mobility approach, with a Mobile Access Gateway acting as a proxy for the MN.

When switching between Cellular and WiFi, IP Mobility is referred to as ‘WiFi Offload’. The GAN/UMA technology has been used in the past for offloading voice calls. A similar approach is used for data traffic using Host-based Dual Stack MIP (DSMIP). A more efficient approach to simultaneously use Cellular and WiFi, depending on the traffic, uses IP Flow Mobility (IFOM).

Transport Layer

The two prominent protocols at the transport layer are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). TCP is used when accurate delivery is important — it disassembles a ‘data stream’ from the source application into ‘segments’ of data, transports them to the destination, and then reassembles them into a data stream for the destination application. UDP is used when timely delivery is more important — it takes independent, self-contained ‘datagrams’ of data, and transports them from the source to the destination.

UDP

UDP is a simple, one-shot, stateless, message-oriented protocol that supports multicast. It is useful in scenarios where transmission errors can either be ignored, or can be handled at the application. Some application protocols that rely on UDP are:

  • Precision Time Protocol (PTP) uses UDP multicasting to synchronize clocks in a network to a designated ‘grandmaster’ down to a sub-microsecond range.
  • Dynamic Host Configuration Protocol (DHCP) is used by clients to request a server to dynamically assign an IP address. A session starts with a client using a UDP broadcast to discover the server to which the server responds with a UDP unicast.
  • Domain Name System (DNS) is a distributed hierarchical system for naming the computers on a network, wherein clients (DNS resolver) can query the ‘name servers’ to map a name to an IP address. The resolver usually makes a single UDP request to which it receives a single response.
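The one-shot, message-oriented nature of UDP shows up directly in the socket API. A minimal sketch of a datagram exchange over the loopback interface (kernel buffering of loopback datagrams lets this run in a single thread):

```python
import socket

# Server: a bound UDP socket is immediately ready to receive datagrams.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
host, port = server.getsockname()

# Client: no connection setup; just send a self-contained datagram.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"ping", (host, port))

# Server receives one datagram and replies to the sender's address.
data, client_addr = server.recvfrom(1024)
server.sendto(b"pong", client_addr)

reply, _ = client.recvfrom(1024)
client.close()
server.close()
```

Note that there is no handshake and no acknowledgement: if either datagram were lost, the exchange would simply stall, which is why loss handling is left to the application.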

TCP Overview

TCP is a peer-to-peer (i.e. no master-slave relationship) connection-oriented protocol. The communication is done in three phases — creating the connection; performing the data transfer; and finally terminating the connection. The connection is created with a ‘three way handshake’ — the client first sends a synchronization packet to the server to which the server responds with a synchronization-acknowledgement packet that the client then responds to with an acknowledgement packet. In this phase, each side communicates the initial ‘sequence number’ that will be incremented as packets are sent and acknowledged. The termination phase involves a ‘four way handshake’ — the client and server each send a ‘finish packet’ that the other side responds to with an acknowledgement.

The data transfer phase involves a number of end-to-end operations. The goal is to send the data stream with maximum throughput and collate it in the right sequence at the receiver, while ensuring fair usage across multiple connections. The sender transmits segments of data and waits for acknowledgements using a sliding window protocol. The sliding window is based on the minimum of a ‘receive window’ and a ‘congestion window’. These windows serve the following purposes:

  • Flow control to ensure that the receiver is not flooded: the receiver maintains a ‘receive window’ representing the free buffer size still available, limiting how much the sender can transmit without acknowledgement.
  • Congestion control to ensure that the network is not flooded: the sender maintains a ‘congestion window’ representing how much data can be in flight over the network without acknowledgement.

Congestion Control

On wired networks, data loss is assumed to be due to congestion. Data loss is detected when duplicate cumulative acknowledgements (DupAck) are received or when there is a retransmission timeout (RTO). Congestion control involves four states i.e. slow start (increase window size exponentially until a threshold, or loss), congestion avoidance (thereafter, increase window size linearly until loss), fast retransmit (retransmit on 3 DupAcks without waiting for RTO), and fast recovery (return to congestion avoidance before falling back to slow start). Long-lived connections (e.g. FTP) are usually in the congestion avoidance state, whereas short lived connections (e.g. HTTP) spend most of the time in the slow start state.
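The state transitions described above can be illustrated with a toy simulation (units are segments; real TCP counts bytes and includes many refinements). The sender's effective window at any time is the minimum of the congestion window simulated here and the receive window.

```python
# Toy simulation of TCP congestion control state transitions.
# Each event represents the outcome of one round trip.

def simulate(events, ssthresh=16):
    cwnd, state = 1, "slow-start"
    trace = []
    for event in events:
        if event == "ack":
            if state == "slow-start":
                cwnd *= 2                      # exponential growth
                if cwnd >= ssthresh:
                    state = "congestion-avoidance"
            else:
                cwnd += 1                      # linear growth
        elif event == "3-dupacks":             # fast retransmit + fast recovery
            ssthresh = max(cwnd // 2, 2)
            cwnd, state = ssthresh, "congestion-avoidance"
        elif event == "timeout":               # RTO: fall back to slow start
            ssthresh = max(cwnd // 2, 2)
            cwnd, state = 1, "slow-start"
        trace.append((state, cwnd))
    return trace
```

Five loss-free round trips show the exponential ramp crossing the threshold into linear growth, while a timeout resets the window to one segment.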

The original approach used cumulative acknowledgements (indicating that all prior segments were received), wherein data loss leads to an unwinding of the sliding window for retransmission. Selective acknowledgements (SACK) allows receivers to acknowledge discontinuous data segments, and enables selective retransmission by the sender — this is used when both ends support SACK.

Wireless TCP

On wireless networks, data loss is usually due to temporary radio interference. The standard approach treats these data losses as congestion, leading to a reduction in the congestion window and lower throughput. Below is a list of some of the techniques used to address this for single-hop networks, where the wireless host communicates directly with an access point (AP) that in turn communicates through a wired backhaul with fixed hosts.

  • Split network approach — split the link so that the wired part is handled with standard TCP, and the wireless part uses a customized TCP e.g. Indirect TCP. The AP acts as a proxy in both directions, and independently acknowledges both the sender and the receiver.
  • Link layer approaches — adapt the link layer to address these losses so that the TCP layer only needs to address congestion losses i.e. handle ‘local retransmissions’ at the link layer e.g. the Snooping TCP. While streaming to the wireless host, data loss is detected when the AP notices DupAcks by snooping into the TCP packets; whereas in the reverse direction, data loss is detected when the AP notices a break in the sequencing.
  • End-to-end approaches — adapt the congestion control and error recovery algorithms at the end hosts. In the previous approaches, the fixed host is unaware of the wireless link, and does not require any changes. However, an end-to-end approach like Mobile TCP requires changes to the fixed host also.

On multi-hop ad-hoc networks, there are additional constraints such as that adjacent hops cannot transmit simultaneously. The overall throughput tends to fall with the number of hops. Protocols such as TCP with Buffering Capability and Sequence Information (TCP-BuS), and Ad-Hoc TCP (ATCP) are used to address these challenges.

Transport Mobility

Transport layer mobility requires that the protocol preserve the end-to-end socket state. With Indirect TCP, the state of the proxy on the current AP needs to be transferred to the new AP (e.g. home agent to foreign agent). With Snooping TCP, handover is simpler as the end-to-end semantics are preserved. With Mobile TCP, the protocol itself is designed to account for mobility.

Multihoming and Multipath

The transport layer can also address reliability and mobility using multiple concurrent paths. MultiPath TCP (MPTCP) is an extension of TCP that allows it to simultaneously use multiple paths to connect with a regular TCP endpoint. It can also take advantage of multihoming, where the various paths connect through multiple network interfaces. Stream Control Transmission Protocol (SCTP) is an alternative transport layer implementation that creates an ‘association’ of multiple data streams, compared with the single stream ‘connection’ in TCP. SCTP also supports multihoming, using the multiple network interfaces on the client and the server for fault tolerance.

Transport Layer Roles

The transport layer is responsible for transporting packets from an endpoint on one host to an endpoint on another host. The endpoints are identified by the IP Address of the host, the transport protocol and a logical port number. A ‘socket’ is the software ‘instantiation’ of this endpoint as created by the host operating system. The transport layer ‘client’ is the host that takes the active role, responsible for initiating the communication request. The ‘server’ is the host that takes the passive role, responsible for listening for requests or data.

The server starts by instantiating a new socket and ‘binding’ it to a specific port. With TCP, the server then prepares to ‘listen’ for connections, then waits to ‘accept’ a request. When a TCP client needs to communicate, it sends a ‘connect’ request to the desired port. Once the server ‘accepts’ the request, it sets up a new socket to proceed with the communication. The connection is accepted on completion of the three-way handshake. Thereafter, both sides can use their sockets to send and receive data symmetrically. Once done, the client usually initiates the process of terminating the connection with the four-way handshake. TCP servers are implemented to be ‘concurrent’ i.e. they can continue processing new requests even when servicing an earlier request.
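The bind/listen/accept/connect sequence described above maps directly onto the BSD socket API. A minimal sketch in Python over the loopback interface (the server runs in a thread so that both roles fit in one script):

```python
import socket
import threading

def serve(server: socket.socket):
    # Passive role: accept one connection on a new socket, echo once.
    conn, addr = server.accept()      # blocks until a handshake completes
    data = conn.recv(1024)
    conn.sendall(b"echo:" + data)
    conn.close()

# Server: socket -> bind -> listen, then wait to accept.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
server.listen(1)
host, port = server.getsockname()
t = threading.Thread(target=serve, args=(server,))
t.start()

# Client: active role; 'connect' triggers the three-way handshake.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
client.sendall(b"hello")
reply = client.recv(1024)
client.close()                        # initiates the four-way termination
t.join()
server.close()
```

Note that `accept` returns a new socket for the established connection, leaving the original listening socket free to take further requests, which is what makes concurrent TCP servers possible.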

With UDP, the server is immediately ready to receive data after binding the socket. When a UDP client needs to communicate, it first ‘binds’ to a local port, then proceeds with sending its request, and then waits for the response. UDP servers are implemented to be ‘iterative’ i.e. they will process a single request at a time, and will be ready to process a new request after sending a response.

Application Layer

The protocols at this layer are specific to the user application e.g. FTP (1971) for sharing files, SMTP (1982) for email communication, and HTTP (1991) for web browsing. Simple time sensitive protocols use a connectionless transport protocol whereas most others use a connection-oriented transport.

The connection-oriented protocols are usually end-to-end, using a single ‘packet-switched’ connection between two hosts. In some cases, the protocol may also be hop-by-hop, with multiple disjoint connections between intermediate hosts in a store-and-forward setup (e.g. SMTP).

The applications usually use a client-server architecture where the client is responsible for initiating the connection. Once the connection is established, the communication itself is typically initiated by the client, but may also be initiated by the server in some cases. Client initiated communication is simpler, as the server is usually online, and available to accept connections. The communication may be in ‘pull-mode’ where the initiator is the data-consumer; or may be in ‘push-mode’ where the initiator is the data-source.

E-Mail

Each user or list is identified by an email address consisting of a ‘local part’ and a ‘domain’ separated by an ‘@’ character. SMTP clients use the domain to look up the ‘MX record’ using DNS and identify the destination host. The local part is used by the receiving mail server to identify the mailbox, and its interpretation is server specific. E.g. Gmail treats the local part as case insensitive, and ignores any ‘.’ characters. A number of services also support sub-addressing, where any string after a ‘+’ character in the local part is treated as a tag, and directed to the same mailbox.
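Gmail's interpretation of the local part, as described above, can be sketched as a small normalization function (the rules are provider specific; this assumes exactly the behaviour described here):

```python
def normalize_gmail(address: str) -> str:
    """Canonicalize a Gmail-style address: the local part is treated as
    case insensitive, '.' characters are ignored, and anything after a
    '+' is a sub-address tag routed to the same mailbox."""
    local, _, domain = address.partition("@")
    local = local.lower().split("+", 1)[0]   # drop the sub-address tag
    local = local.replace(".", "")           # dots are ignored
    return f"{local}@{domain.lower()}"
```

So `Jane.Doe+news@gmail.com` and `janedoe@gmail.com` resolve to the same mailbox under these rules.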

The email message itself consists of a message-header and a message-body, separated by a blank line. The header contains multiple fields, each of which is a key-value pair. SMTP does not require the ‘To:’ and ‘From:’ header fields to match the actual delivery list and sender’s mailbox. Instead, the email client communicates the delivery list as a part of the SMTP handshake protocol.

The message body initially supported only 7-bit ASCII, but with the development of the Multipurpose Internet Mail Extensions (MIME) specification (1992), a wide set of text and multimedia formats are now supported. MIME allows composing a message with multiple parts, arranged in a hierarchy, and with each ‘leaf part’ being of a specific media type and subtype. MIME parts also specify the charset that defines the character encoding used in the message body.
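Python's standard email package can compose such a multipart message. A sketch that builds a text body plus one attached file (the addresses are placeholders):

```python
from email.message import EmailMessage

# Build a message with a text/plain leaf part and one attachment;
# adding the attachment converts the message to multipart/mixed.
msg = EmailMessage()
msg["From"] = "alice@example.com"     # placeholder addresses
msg["To"] = "bob@example.com"
msg["Subject"] = "Report"
msg.set_content("See attached report.", charset="utf-8")
msg.add_attachment(
    b"id,value\n1,42\n",              # raw bytes of the attached file
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)
```

Serializing the message with `bytes(msg)` produces the header, blank line and MIME-encoded body parts described above.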

SMTP was designed as a push protocol wherein the sender’s user agent ‘pushes’ email messages to their mail server. The message is then pushed from server to server until it reaches the recipient’s server, from where the recipient’s user agent can pull the message. In case the user agent is an email client, it will use a different protocol like POP or IMAP, whereas a browser based email application will use HTTP.

WWW

The world wide web (WWW) was designed as a way to share and edit hyperlinked text documents (hypertext) over the internet. This creates a network of human readable content over the underlying network of computers. There are four key components — the protocol (HTTP — Hypertext Transfer Protocol), the format (HTML — Hypertext Markup Language), the client application (i.e. ‘user agent’ such as a web browser) and the server application (web server e.g. httpd).

HTTP is used by clients to request a server for hypertext documents and other resources. The resources on the server are addressed using a Uniform Resource Locator (URL) based on a Uniform Resource Identifier (URI) scheme. HTTP is a stateless, half-duplex request-response protocol in which messages are exchanged over a TCP/IP connection.

HTML is a markup language that allows text documents to be annotated with hyperlinks and other information. HTML is primarily intended to describe the structure of documents — the description of the presentation of the document is delegated to a stylesheet language, in this case CSS (Cascading Style Sheets). In addition to documents, from early on, HTML also had support for hypermedia i.e. interlinked multimedia content (graphics, images, audio and video). Over time, there was a need to support dynamic content on the client side (e.g. animations) as well as from the server side (e.g. ticker updates).

Along with HTML and CSS, the JavaScript scripting language completed the suite of front-end languages. The Document Object Model (DOM), which structures the elements of a document in a hierarchical tree, allows client-side scripts to act on specific elements. Other languages used in the browser over time were based on plugins (e.g. ActionScript with Flash) or transpilers.

Interactive forms and server-side dynamic content required server-side technologies like Common Gateway Interface (CGI) and numerous others such as Java Server Pages (JSP), Microsoft Active Server Pages (ASP) and more recently, Node.JS. Interactive web applications were later enabled with the introduction of Asynchronous JavaScript and XML (AJAX) capability using the XMLHttpRequest (XHR) Object in the browser’s JavaScript environment. The methods of this object allow client-side scripts to make asynchronous HTTP requests.

The HTTP server application usually listens on TCP port 80 for client requests. It is usually responsible for first authenticating the requests and interfacing with server side processes in various programming languages. The Apache HTTP Server (httpd) was one of the earliest implementations.

Web browsers are responsible for laying out and rendering the document using a ‘browser engine’. The engine implements the DOM data structure and enforces the security policies. Mosaic, followed by Netscape were some of the first browsers.

HTTP Communication

With HTTP/1.0 (1996), the client would first set up a TCP connection and then send a request message. The client would then terminate the connection on receiving the response message. HTTP/1.1 (1997) added support for persistent connections that allowed clients to reuse the connection for multiple sequential requests and avoid the TCP handshake overheads. Most browsers enable this by default — but to avoid resource wastage, browsers as well as HTTP server implementations close the connection after a configurable timeout. HTTP/1.1 also supported pipelined requests wherein multiple requests can be made without waiting for their responses. The responses would be sent back asynchronously, in full, and in order. Pipelining is not enabled by default in most browsers due to possible Head-of-Line blocking as a result of the in-order requirement. HTTP/2 (2015) added a mechanism for multiplexed requests where responses could be returned in chunks and out of order — this is supported by most browsers, but only over a secure transport, largely for improved privacy and to avoid any unintended interpretations by intermediaries.

HTTP Messages

In HTTP/1.0 and HTTP/1.1 the messages are transmitted as text. HTTP messages start with a ‘start line’, followed by a series of lines with ‘header fields’, an empty line and finally, an optional ‘message body’.

In a request message, the start line is a ‘request line’, containing a ‘request method’, a ‘request target’ (usually a URL), and the ‘protocol version’ (i.e. HTTP/<version>).

In a response message, the start line is a ‘status line’, containing the ‘protocol version’ (i.e. HTTP/<version>), a ‘status code’ and the ‘status text’. Header fields contain a key-value pair, separated by a colon. The ‘Content-Type’ and ‘Content-Length’ headers are typically used to represent the format and size of the message body. The content type is according to the MIME specifications.

In HTTP/2, the messages are composed into binary frames, supporting header compression and message multiplexing. A frame may be a header frame, a data frame, or some other type.

The request messages in HTTP/1.0 supported the GET, POST and HEAD request methods. HTTP/1.1 added the other methods in use today such as PUT and DELETE. The ‘Host’ header field was made mandatory in HTTP/1.1, whereas with HTTP/2, it should be omitted in a direct request. The ‘Authorization’ header is used by the client to send authentication information. The client may include a ‘Cookie’ header to pass any identifying strings received in prior server responses (with ‘Set-Cookie’ headers). The ‘User-Agent’ header is used to indicate the client program e.g. a web browser or a web crawler.

The status code on the response message is a three digit value. The first digit represents the category (i.e. 1xx for information, 2xx for success, 3xx for redirection, 4xx client error, and 5xx for server error). For large responses, HTTP/1.1 allows for the body to be sent in multiple chunks, where the ‘Content-Length’ header is skipped, and a ‘Transfer-Encoding’ header is set to ‘chunked’.
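The message structure described above can be assembled and parsed by hand. A simplified sketch (real parsers handle header folding, encodings and many edge cases):

```python
def build_request(method: str, target: str, headers: dict, body: bytes = b"") -> bytes:
    # Start line, then header fields, an empty line, and the optional body.
    lines = [f"{method} {target} HTTP/1.1"]
    if body:
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{key}: {value}" for key, value in headers.items()]
    return ("\r\n".join(lines) + "\r\n\r\n").encode() + body

def parse_status_line(line: str):
    # A status line: protocol version, three-digit code, status text.
    version, code, text = line.split(" ", 2)
    return version, int(code), text
```

A GET request with a mandatory Host header then serializes to exactly the text form an HTTP/1.1 server expects on the wire.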

Web Security

One aspect of web security is protecting against man-in-the-middle attacks. This is implemented by encrypting all HTTP data that is sent over TCP using an intermediate TLS layer (Transport Layer Security). The URI scheme uses the ‘https’ prefix for this scheme, and the default TCP port number at the server is changed to 443. The scheme is based on Public Key Infrastructure (PKI) where the basis of trust lies with a number of Certificate Authorities (CA). A secure server first creates an asymmetric key pair, and then registers its public key with a Registration Authority (RA). The RA issues a Public Key Certificate digitally signed by the CA. Browsers are shipped with the Root Certificates of major CAs, and use them to verify the integrity and authenticity of the leaf certificate (and associated intermediate certificates) sent by the server. In the ‘simple’ mode of operation, only the server is authenticated. In ‘mutual’ mode, the user also needs to generate a key pair, and install a Client Certificate in the browser — more often though, the client authentication is at the application level.

TLS is implemented in two sub-layers — the Handshake protocol that is used while setting up the secure connection; and the Record protocol used to transfer the data securely. The TLS transfer unit is a ‘record’, created by encapsulating the application data stream. TLS is designed to ensure the integrity and authenticity of the record headers, and the integrity, authenticity and confidentiality of the record payload.

Asymmetric key encryption based on the server certificate is used during the handshake, and symmetric key encryption is used after that. The symmetric keys are derived from a ‘master secret’ that is computed independently at the client and the server using a set of ‘input secrets’. The different TLS versions support many different cryptographic and key exchange algorithms. As browsers and servers can be implemented with different versions of TLS, the protocol starts with a negotiation on the latest version that both support, and the algorithms to use.

Another aspect of web security is protecting from vulnerabilities due to malicious usage of client-side scripting. The Same-Origin Policy (SOP) was originally designed to protect access to the DOM, and has since been extended to also protect the global JavaScript object. It ensures that pages from the same site are treated in isolation from pages from some other site. An ‘origin’ is defined by the combination of the protocol, the host, and the port. SOP restricts pages loaded from one origin from accessing resources at another origin. A number of mechanisms are available for relaxing this restriction if required e.g. Cross-Origin Resource Sharing (CORS). CORS enables cross-origin access by having the target origin’s server return a response message with the Access-Control-Allow-Origin and related header fields.
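A server-side sketch of the CORS response logic in Python, assuming a hypothetical application-level whitelist of allowed origins (real servers also handle preflight OPTIONS requests and headers such as Access-Control-Allow-Methods):

```python
def cors_headers(request_origin: str, allowed_origins: set) -> dict:
    """Return the CORS response headers for a cross-origin request.

    `allowed_origins` is an assumed application-level whitelist.
    """
    if request_origin in allowed_origins:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",  # response differs by Origin, so caches must key on it
        }
    return {}  # no CORS headers: the browser will block the cross-origin read
```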

HTTP Server Initiated Push

Real-time web applications build on the existing HTTP infrastructure for scenarios where information needs to be pushed to clients. As a client-server system, HTTP did not natively support such a mechanism, and various indirect approaches for such ‘bidirectional HTTP’ scenarios have been deployed.

Prior to HTML5, a number of approaches collectively known as Comet technologies were used.

  • In ‘HTTP long polling’, the client polls the server with requests, but the server does not respond immediately. It holds the connection on each request, waiting until it has a notification to send.
  • BOSH (Bidirectional-streams Over Synchronous HTTP) is an example of a protocol that uses long polling.
  • In ‘HTTP streaming’, the client opens a connection that the server never closes. The server sends asynchronous notifications on the connection using chunked transfer encoding. Chunked encoding is specific to HTTP/1.1, so this mechanism does not carry over to HTTP/2.
  • XHR streaming and iFrame streaming are two older approaches that were based on this method.
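The long polling pattern above can be sketched as a simple client loop. The `fetch` and `handle` callables are hypothetical stand-ins for the blocking HTTP request and the application's notification handler:

```python
def long_poll(fetch, handle, max_rounds=3):
    """Minimal long-polling client loop (a sketch, not tied to any real API).

    `fetch` issues a blocking request that the server holds open until it
    has a notification (or times out and returns None); `handle` consumes
    each notification. Re-issuing the request immediately after each
    response is what makes the loop behave like a push channel.
    """
    for _ in range(max_rounds):
        notification = fetch()      # blocks until the server responds
        if notification is not None:
            handle(notification)
        # loop immediately: the next held request is issued right away
```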

HTML5 specified two new mechanisms to support bidirectional communication.

  • Server-Sent Events (SSE) are a part of the HTML5 specification that builds on HTTP streaming and defines a new interface called EventSource for the JavaScript environment. This interface is designed to improve the efficiency of unidirectional traffic. The EventSource builds on the underlying Event mechanism for notifying DOM events to the application.
  • WebSockets are a part of the HTML5 specification that builds on HTTP to create a new bidirectional protocol with a new URI scheme using ‘ws’ and ‘wss’. This is built on a specific HTTP mechanism designed to upgrade an established connection to a new incompatible protocol.
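The SSE wire format itself is simple text. A minimal parsing sketch (handling only the ‘event:’ and ‘data:’ fields and the blank-line delimiter, not ‘id:’ or ‘retry:’):

```python
def parse_sse(stream: str):
    """Parse a Server-Sent Events text stream into (event, data) pairs."""
    events, name, data = [], "message", []
    for line in stream.splitlines():
        if line.startswith("event:"):
            name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and data:
            events.append((name, "\n".join(data)))  # dispatch on blank line
            name, data = "message", []  # default event name is 'message'
    return events
```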

With HTTP/2 there are some further changes to how asynchronous notification works.

  • HTTP/2 supports bidirectional streams, in that the client sends a request stream, and the server sends its response on the same stream. Each stream is a series of frames with a mechanism to indicate the end of stream. With this mechanism, there is no longer a need for chunked transfers. This streaming mechanism can continue to be used for Server-Sent Events.
  • HTTP/2 no longer supports the Upgrade header used to set up a WebSocket, so this mechanism is no longer available. However, alternate approaches are possible, e.g. bootstrapping WebSockets over HTTP/2 using the extended CONNECT method (RFC 8441).
  • HTTP/2 Server Push is a new mechanism that allows the server to push unsolicited content to a client in order to reduce load latencies. The server can use this mechanism to make predictions about future requests in response to a given client request — and then send the anticipated request message (as a PUSH_PROMISE frame) along with the corresponding response. However, this push is processed by the browser, and needs to be combined with SSE to trigger an Event.

Push Notifications are used to send messages asynchronously to a client application, and also to allow the user agent (e.g. browser) to notify a user even when the application is not active.

  • Web Push Notifications are implemented using the Web Push Protocol that builds on the Server Push mechanism defined in HTTP/2. A Service Worker (as defined in HTML5) is used to notify users even when the target web application is not loaded. The W3C has defined the browser ‘Push’ and ‘Notification’ APIs for this functionality, optionally using VAPID-based identification. There are a number of ‘push services’ that are used by the server side application to actually buffer and push the notification to the user agent.
  • Mobile Push Notifications are notification services from Apple and Google that deliver notifications to iOS and Android apps — the Apple Push Notification service (APNs) and Firebase Cloud Messaging (FCM).

It is also possible to implement push by reversing the client server roles specifically for this subset of interaction i.e. for notifications or callbacks. Such server (now the client) initiated push to a client (now the server) can be implemented using WebHooks. This is also the basis of the WebSub approach for aggregating web feeds.
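WebHook receivers typically verify that a callback really came from the expected sender. A common (though not universal) convention is an HMAC signature over the payload, carried in a header such as a hypothetical ‘X-Signature’ — a sketch:

```python
import hashlib
import hmac

def sign_webhook(secret: bytes, payload: bytes) -> str:
    """Compute an HMAC-SHA256 signature for a webhook payload."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_webhook(secret: bytes, payload: bytes, signature: str) -> bool:
    """Recompute the signature on the receiving side and compare."""
    # compare_digest avoids leaking information via timing differences
    return hmac.compare_digest(sign_webhook(secret, payload), signature)
```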

Intermediaries

In theory, with the end-to-end principle, communication on the Internet should not be affected by intermediaries. However, for a number of practical reasons, it is necessary to understand the role of intermediaries in shaping Internet traffic.

A number of intermediaries operate within the local network. Repeaters and hubs (multi-port repeaters), now deprecated, were used to simply repeat the signal on each input port to all other ports, primarily to increase the network span. A bridge operates at the link layer, and combines two or more sub-networks. It decides whether to forward or filter ‘frames’ between its ports based on their destination MAC address. This helps isolate the traffic in the various sub-networks, limiting the ‘collision domain’ while extending the ‘broadcast domain’. A switch is simply a multiport bridge that may incorporate additional efficiencies. A router operates at the network layer to forward ‘packets’ between two or more networks, and forms the boundary for a broadcast domain. The packets from a source are forwarded from one router to another until they reach their destination. Routers within a LAN are referred to as interior routers; and routers that connect a LAN to a WAN are referred to as border routers. A gateway is a generic term that can refer to any entity that forms a boundary between two dissimilar networks, and as such, is the edge of the local network. The gateway can operate at any layer, and may be implemented as a dedicated device, as a gateway router, or as part of a host machine. It is responsible for translating protocols between the two networks and may also be responsible for providing NAT, Firewall, and Proxy functionality.

A firewall operates above the network layer, and is used to ensure that only authorized packets or connections are allowed in or out. A firewall can be implemented as a hardware interceptor, as a software layer on the gateway computer or router, or as a software layer on the local computer. A network layer firewall filters packets either in a stateless or stateful manner at an IP address level. A transport layer firewall filters connection requests for a socket at a port level.

A proxy is a special HTTP server that is effectively an application layer firewall that can filter requests based on the protocol or the URL, at a process level. For forward proxies (what the term proxy usually refers to), the client browser needs to be configured so that it knows the right proxy for a given protocol and URL. When a proxy is configured, a multi-hop relay is established: the client sends all requests to the proxy, which in turn forwards them to its own proxy server or to the origin server. The only change is that the request target in the request message needs to use the absolute form (the full URL) instead of the origin form (the path only), so that the proxy knows the origin server. For reverse proxies, also known by the term gateway, the proxy itself is configured to redirect incoming requests to the right origin server. The redirection is transparent to the client, and may be based on the request path or other factors like server load.
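The difference in the request target can be sketched as follows (the URL is illustrative):

```python
from urllib.parse import urlsplit

def request_line(url: str, via_proxy: bool) -> str:
    """Build an HTTP/1.1 GET request line in origin form or absolute form.

    When talking to a forward proxy, the client must send the absolute
    URL so the proxy knows which origin server to contact.
    """
    parts = urlsplit(url)
    if via_proxy:
        target = url                       # absolute form: full URL
    else:
        target = parts.path or "/"         # origin form: path only
        if parts.query:
            target += "?" + parts.query
    return f"GET {target} HTTP/1.1"
```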

A tunnel is an intermediary that blindly relays packets from one connection to the other. A tunnel helps create a virtual end-to-end connection on a multi-hop relay. It typically works using repeated encapsulation e.g. a TCP packet embedded in an HTTP message. The HTTP CONNECT method is used to tunnel a TCP connection through intermediate proxies to any TCP port on the origin server. This approach is required for creating a secure end-to-end connection for TLS.

A load balancer (LB) is a software or hardware intermediary that can distribute workloads across multiple servers transparently to improve capacity and availability. A Layer 4 LB distributes loads based on transport layer information e.g. by using Destination NAT. A Layer 7 LB, that also serves as a reverse proxy, distributes load based on application layer information e.g. by using session information. A number of advanced techniques are used by the modern load balancer, broadly referred to as ADC (Application Delivery Controller), that may be part of a larger ADN (Application Delivery Network).
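A minimal sketch of the simplest distribution policy, round robin; real load balancers add health checks, weights, and (for Layer 7) session affinity, and the backend names here are illustrative:

```python
import itertools

class RoundRobinBalancer:
    """Rotate incoming requests across a fixed pool of backends."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(backends)

    def pick(self):
        # Each call returns the next backend in strict rotation
        return next(self._cycle)
```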

A cache is a generic term for any intermediary that improves response times. For web clients, multiple intermediate caches are used to serve any static content with the shortest possible response time. The browser itself caches content at a number of levels: an image cache (store an image only once per page) and a preload cache (load content that may be shown later) that can hold responses for each page; a JavaScript controlled Service Worker cache that can hold responses for each origin; a HTTP/2 Server Push cache that can hold responses for each connection; and the HTTP cache that can hold responses across origins. Intermediate proxies may also cache responses for all downstream clients. A CDN (Content Delivery Network) is a geographically distributed network of proxy servers, often located within ISP data centers. They are used by service providers to improve content delivery for their customers.

The response that is cached may be the actual resources, or it may be the redirection URL or status code. The cache is matched using the request method and the target, and may also use additional request header fields based on the headers mentioned in the ‘Vary’ header field of the response. A server can control the responses cached at the HTTP cache and other intermediaries using the ‘Cache-Control’ HTTP header. This allows the server to specify whether the response can be cached, whether it can be cached by ‘public’ intermediaries, whether it needs to be validated every time, and for how long it can be cached.
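A simplified sketch of interpreting the ‘Cache-Control’ header (no support for quoted directive values, and the shared-cache check covers only the most common directives):

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control header value into a directive dictionary."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # numeric arguments (e.g. max-age) become ints; bare directives become True
        directives[name.lower()] = int(value) if value.isdigit() else (value or True)
    return directives

def is_cacheable_by_shared_cache(header: str) -> bool:
    """'private' and 'no-store' both keep a response out of public caches."""
    d = parse_cache_control(header)
    return "no-store" not in d and "private" not in d
```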

Service Quality

Traffic on the open Internet is largely undifferentiated, and end-to-end delivery is best effort only. However, with specific service providers, and in closed networks, it is possible to provide differing levels of Quality of Service (QoS) based on the traffic.

Service quality is measured by delay (the latency from sender to receiver), jitter (variation in delay), bandwidth (end-to-end throughput) and reliability (error rate). QoS assurances are designed to improve one or more of these metrics for a select class of traffic. This may include approaches like bandwidth allocation and traffic shaping. Different applications are sensitive to different measures e.g. web browsing is sensitive to delay while media streaming is sensitive to jitter.

Given the scale and variation on the open Internet, QoS assurances for a select subset of traffic can potentially significantly degrade the service quality for non-differentiated traffic. In general, QoS assurances will work on networks where the service quality is already high, and carving out the assurances will not adversely affect the remaining traffic.

IEEE has a task group focused on creating standards for Time-Sensitive Networks (TSN) that addresses QoS management at Layer 2 and Layer 3 in network bridges in closed networks e.g. 802.1Qav addresses traffic shaping and 802.1Qat addresses bandwidth reservation for Audio Video Bridging (AVB).

In addition to service quality, a number of user applications such as online gaming and over-the-top (OTT) media streaming also measure Quality of Experience (QoE). QoE techniques are designed to compensate for variation in network service quality. An example of this is the MPEG-DASH (Dynamic Adaptive Streaming over HTTP) specification that uses adaptive bitrate streaming.

IoT Application Protocols

The difference between the application protocols for IoT and the Web is largely because whereas the Web is designed for a network of computer hosts, IoT consists of networks of sensor and actuator nodes. These nodes are significantly smaller in scale, and larger in number. The smaller scale implies that the protocols have to be designed to run in a constrained environment with lower compute capability and also lower power consumption, since the nodes are often wireless and battery powered. The larger number of nodes implies that protocols need to have a well defined hierarchy, and a discovery and addressability mechanism. IoT applications are usually architected with three layers — devices or nodes (D), gateways that aggregate the nodes (G), and backend services that consume and analyze the data (S). Correspondingly, the communication protocols need to address D2D, D2G, D2S and G2S scenarios.

With the large number of devices, it becomes important to keep the communication loosely coupled and asynchronous — in this respect, the request-reply architecture of the web becomes a limiting factor, and a publish-subscribe architecture is often preferred. The data exchange in a pub-sub system can be through a central broker (preferred for constrained devices) or through a shared bus (allows the network to scale better). Where the messages are custom and parsed by application logic, a message-centric (MCPS) protocol works better, whereas when the messages can be standardised and the contained data can be parsed and stored by the middleware, a data-centric (DCPS) protocol is preferred.
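The broker-based pub-sub decoupling can be sketched with an in-memory broker; real brokers such as MQTT add QoS levels, retained messages and wildcard topic matching, and the topic name below is illustrative:

```python
from collections import defaultdict

class Broker:
    """Minimal topic-based publish-subscribe broker sketch.

    Publishers and subscribers know only the broker and the topic
    names, not each other — which is the decoupling that makes
    pub-sub suit large fleets of devices.
    """

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # deliver to every subscriber of this topic (none is fine too)
        for callback in self._subscribers[topic]:
            callback(message)
```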

There are a large variety of IoT protocols some of which are:

  • HTTP (for capable devices) and Constrained Application Protocol (CoAP — for constrained devices) are request-reply based protocols that work with TCP and UDP respectively. CoAP also supports a broker-based pub-sub protocol.
  • Message Queuing Telemetry Transport (MQTT — for constrained devices) and Advanced Message Queuing Protocol (AMQP — for capable devices) are both broker-based MCPS protocols that work with TCP.
  • The OMG Data Distribution Service (DDS) is an example of a bus-based DCPS protocol that works with UDP. The Extensible Messaging and Presence Protocol (XMPP) can be setup as a bus-based DCPS protocol that can work over either TCP or HTTP.

DDS is suited to D2D communication; CoAP and MQTT are suited for D2G communication; and HTTP, AMQP and MQTT are suited for G2S communication. The CoAP Observe extension is suited to applications that require a push to constrained devices, and the CoAP Group Communication extension to applications that require multi-casting.

Web APIs

Web APIs are the HTTP interfaces that a web application server exposes to its clients. The APIs are defined by the accepted formats of the HTTP request messages, and the associated structure of the response messages. Specifically, it refers to the exposed resource paths, the HTTP methods available on those paths, and the data associated with those methods including the format and structure of the HTTP headers and body. API parameters may be passed as part of the path, attached to the path as a query string, included as a header field, or included in the message body.
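The parameter-passing options can be sketched as follows; the resource path, field names and token are hypothetical:

```python
from urllib.parse import urlencode

def build_request(base, resource_id, filters=None, token=None):
    """Assemble the pieces of a web API request (illustrative names only).

    Shows three of the common places API parameters go: the path, the
    query string, and a header field (the fourth, the message body, is
    not shown here).
    """
    url = f"{base}/devices/{resource_id}"             # parameter in the path
    if filters:
        url += "?" + urlencode(filters)               # parameters in the query string
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"  # parameter as a header field
    return url, headers
```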

The HTTP messaging scheme provides a large leeway in how they can be used to define web APIs. With the prevalence of web applications there is an increasing need to define rules and provide tools to make it easier to develop and maintain these APIs. There are a number of paradigms that help in this respect.

The REST (Representational State Transfer) paradigm is a constraint on HTTP, concerned with standardizing how resources should be identified, what they should represent, and how they should be managed. The HTTP request messages are expected to be self-contained and independent. In this way, the application state can be transferred between the client and the server using stateless ‘representations’. The HATEOAS (Hypermedia as the Engine of Application State) paradigm is a part of the REST guidelines with further constraints that enable clients to ‘discover’ the APIs instead of being directly coded to those APIs. This helps decouple the client and server implementations. Swagger (the basis of the OpenAPI Specification) is an example of a tool that helps build REST APIs.

It is recommended that resources be identified by ‘nouns’ i.e. the objects that need to be manipulated; and that the request methods are the ‘verbs’ that manipulate the objects. It is also recommended that all methods besides POST be idempotent. Non-idempotent operations need to be handled carefully by the protocol and should not be pipelined, since a retransmission could repeat their effect. Web applications often need to track HTTP sessions to identify a user across a series of request response transactions. Using a Session ID stored in a client side cookie requires the server to maintain the session state. The approach with REST is instead to include the authentication credentials in every request.
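The idempotency distinction can be illustrated with a toy resource store, where repeating the same PUT leaves the state unchanged but repeating a POST creates duplicates:

```python
class ResourceStore:
    """Toy store contrasting idempotent PUT with non-idempotent POST."""

    def __init__(self):
        self._items = {}
        self._next_id = 1

    def put(self, item_id, value):
        # Idempotent: the client names the resource, so a retransmitted
        # PUT overwrites with the same value and changes nothing
        self._items[item_id] = value
        return item_id

    def post(self, value):
        # Not idempotent: the server assigns a new id, so a retransmitted
        # POST creates a second resource
        item_id = self._next_id
        self._next_id += 1
        self._items[item_id] = value
        return item_id
```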

The RPC (Remote Procedure Call) paradigm is concerned with standardizing how the HTTP parameters are marshalled and unmarshalled. This allows RPC implementations to create a shim for the client and the server that abstracts the asynchronous message exchange into synchronous function calls, in various programming languages. An Interface Description Language (IDL) is used to define the parameters independent of the programming language. The Google gRPC technology is built on the Protocol Buffers (protobuf) IDL, also from Google. Apache Thrift, originally developed at Facebook, is another such IDL with its own RPC framework. Both of these are open source and provide their own tooling.

The Query Language paradigm is concerned with standardizing how data operations are performed. The REST paradigm is already compatible with the CRUD (Create, Retrieve, Update and Delete) database operations, wherein Create maps to POST, Retrieve maps to GET, Update maps to PUT and Delete maps to DELETE. The GraphQL technology from Facebook models the APIs as database queries built on the GET and POST methods. The Falcor technology from Netflix models the APIs as a way to navigate a global database schema.

Distributed Systems

In a distributed system, each of the services has some form of state that is stored in a local or shared database and that serves as the ‘source of truth’ for the system. As the states change over time, they need to be synchronised across the services. RESTful APIs and other such mechanisms provide a synchronous way to transfer state — one where the producer directly communicates with a consumer, and both need to be available when the communication occurs. An asynchronous mechanism is one where the producer and consumer communicate through an intermediary, eliminating the need for them to be available at the same time. This mechanism decouples the various components, and simplifies the scaling. Two different approaches are possible: message bus, or message broker. With a message bus, each service is required to conform to a defined standard, and a common data schema is used e.g. Enterprise Service Buses. A message broker enables more flexible communication where messages may be transformed as they are received or sent e.g. RabbitMQ, ActiveMQ. Each of these approaches may support multiple routing mechanisms such as broadcast, topic based publish/subscribe or address based unicast or multicast.

An alternate approach to synchronise state is to use events. Whereas state evolves over time, events are immutable transitions. There are a number of patterns used for such an event-driven architecture. It is even possible to use events themselves as the ‘source of truth’, and track state locally only, within each service. Events may be communicated and stored in a continuous stream using a tool like Kafka, or they may be accumulated in batches using a big data framework like Hadoop.
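Using events as the source of truth can be sketched as a fold over an immutable log; the event names and state shape here are illustrative:

```python
def replay(events, initial=None):
    """Derive current state from an immutable event log (event sourcing sketch).

    The log itself is the source of truth; the state is just the
    result of replaying the events in order.
    """
    state = dict(initial or {})
    for name, payload in events:
        if name == "device_registered":
            state[payload["id"]] = {"location": None}
        elif name == "location_reported":
            state[payload["id"]]["location"] = payload["location"]
    return state
```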

Application Considerations

The multiple generations of mobile data services starting with 2G EDGE/GPRS, then 3G UMTS/HSPA, and now 4G LTE have provided increasingly faster and lower latency mobile data connectivity. The EDGE network provides a typical download speed of 150 kbps and enables real-time location tracking. Dynamic map updates in location aware apps require the faster 3G speeds offered by HSPA of typically 1.5 Mbps. More immersive maps are enabled with the faster LTE speeds of around 15 Mbps. Mobile applications that depend on these mobile data services need to address a number of challenges.

  • There may be coverage gaps, and network speed may vary due to changes in network congestion and signal quality. Depending on the carrier’s network, the mobile may also switch between 2G, 3G and 4G environments. Mobile applications should therefore tolerate temporary network outages, bandwidth/latency fluctuations, and handover delays. E.g.:
  • Decouple user transitions from data transitions. UI should not get locked up waiting for network response.
  • Cache the data where possible e.g. for data that does not change or that does not change frequently.
  • Pre-fetch the data where possible e.g. for data that is known to be needed in advance.
  • Detect and handle failures e.g. let the user retry; log and retry later; or retry with a backoff algorithm.
  • Mobile devices are always constrained by the available battery capacity. The power consumed by succeeding generations of mobile data has been increasing. A large factor is the power consumed by the radio connection and disconnection processes. The presence of multiple concurrent applications and plugins also leads to increased battery consumption. Mobile applications need to be designed to minimize their overall battery consumption. E.g.:
  • Avoid constant polling. Balance the use of push notifications and polling depending on the application.
  • Avoid constant beacon updates. Combine multiple updates into a single burst.
  • Avoid constant streaming. Download data in short bursts when signal quality is good.
  • Avoid aggressive retrying. Backoff when responses are delayed.
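The retry guidance above can be sketched as an exponential backoff helper; the `sleep` parameter is injectable (e.g. `time.sleep` in a real app) so the delay policy can be tested without waiting:

```python
import random

def retry_with_backoff(operation, max_attempts=5, base_delay=0.5, sleep=None):
    """Retry a failing operation with exponential backoff and jitter.

    The delay doubles on each failure so a congested network is not
    hammered with retries; jitter keeps many clients from retrying in
    lockstep. `sleep` defaults to a no-op in this sketch.
    """
    sleep = sleep or (lambda seconds: None)
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, base_delay))  # backoff + jitter
```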
