Let’s dive into VoIP protocols — Episode 1 : SIP Part 1

Theoretical approach of the Session Initiation Protocol

Published in PCAP-Inspector, Mar 1, 2018

Hi! For this first real episode of the series, we begin our journey through the VoIP protocols with SIP. If you have not read the introduction yet, I invite you to check out Episode 0, where I describe the process and give a short introduction to VoIP protocols and their roles.

The basics

SIP was first designed in 1996 and standardized by the IETF in RFC 2543 in 1999, which was then superseded by RFC 3261 in 2002. It is, as defined in the RFC, “an application-layer control (signaling) protocol for creating, modifying, and terminating sessions with one or more participants. These sessions include Internet telephone calls, multimedia distribution, and multimedia conferences.”

Functionality

SIP provides 5 services:

User location: determination of the end system to be used for communication.
User availability: determination of the willingness of the called party to engage in communications.
User capabilities: determination of the media and media parameters to be used.
Session setup: “ringing”, establishment of session parameters at both called and calling party.
Session management: including transfer and termination of sessions, modifying session parameters, and invoking services.

To do so, SIP works much like HTTP: a client sends a request that invokes a method on a server and gets a response back. All messages are in text format. Addressing uses URIs that follow the same convention as e-mail addresses: sip:user@domain.
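To make the HTTP comparison concrete, here is a minimal Python sketch that serializes a SIP request as text and splits a simple SIP URI into its parts. It is only an illustration, not a full implementation: the helper names build_request and parse_sip_uri are made up for this example, the header values (alice, client.domain.xyz, the branch and tag strings) are hypothetical, and the URI parser ignores URI parameters and IPv6 hosts.

```python
def build_request(method, request_uri, headers, body=""):
    """Serialize a SIP request: start line, headers, blank line, body."""
    lines = [f"{method} {request_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Content-Length counts bytes, not characters.
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_sip_uri(uri):
    """Split a simple 'sip:user@host[:port]' URI (no parameters, no IPv6)."""
    scheme, _, rest = uri.partition(":")
    if scheme not in ("sip", "sips"):
        raise ValueError(f"not a SIP URI: {uri!r}")
    user, at, hostport = rest.rpartition("@")
    host, _, port = hostport.partition(":")
    return {
        "scheme": scheme,
        "user": user if at else None,
        "host": host,
        "port": int(port) if port else None,
    }


# Hypothetical OPTIONS request; every value below is illustrative only.
message = build_request(
    "OPTIONS",
    "sip:bruno@domain.xyz",
    [
        ("Via", "SIP/2.0/UDP client.domain.xyz;branch=z9hG4bK776asdhds"),
        ("Max-Forwards", "70"),
        ("From", "<sip:alice@domain.xyz>;tag=1928301774"),
        ("To", "<sip:bruno@domain.xyz>"),
        ("Call-ID", "a84b4c76e66710@client.domain.xyz"),
        ("CSeq", "1 OPTIONS"),
    ],
)

if __name__ == "__main__":
    print(message)
    print(parse_sip_uri("sip:bruno@domain.xyz:5060"))
```

The layout is the same as an HTTP request: a start line with the method, a block of "Name: value" headers, an empty line, and an optional body.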

Components

There are 2 types of components in a SIP system: User Agents and Servers.

The 3 types of servers are:

Proxy server: Forwards requests to the next hop.
Redirect server: Returns the location of the next hop to the requester.
Registration server (registrar): Keeps track of user locations.

The User Agent is composed of two parts:

UA Client: Initiates SIP requests.
UA Server: Receives SIP requests and responds on behalf of the user.

There is also a kind of third component type: Back-to-Back User Agents (B2BUA). From the client’s point of view they act like a proxy, but they offer more possibilities because they work differently under the hood. We’ll explore these differences in Part 2 with detailed examples.

Transport Layer

SIP can run over several transport protocols, mainly UDP and TCP. It does not rely on TCP for reliability: the protocol has its own retransmission mechanism, so it also works over UDP. It also supports TLS when encryption is needed, but we’ll come back to encryption later.
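To give an idea of what this built-in reliability looks like, here is a small Python sketch of the retransmission schedule that RFC 3261 describes for an unanswered INVITE over UDP: the retransmission interval starts at T1 (500 ms by default) and doubles each time, until the transaction gives up after 64*T1. The function name invite_send_times is made up for this example, and the sketch only computes the timing; it sends nothing on the network.

```python
T1 = 0.5  # seconds; default round-trip time estimate in RFC 3261


def invite_send_times(t1=T1):
    """Times (seconds after the first send) at which an unanswered INVITE
    is sent over an unreliable transport such as UDP."""
    deadline = 64 * t1   # Timer B: the transaction times out here
    times = [0.0]        # the initial transmission
    interval = t1        # Timer A starts at T1 ...
    t = interval
    while t < deadline:
        times.append(t)
        interval *= 2    # ... and doubles after every retransmission
        t += interval
    return times


if __name__ == "__main__":
    print(invite_send_times())
```

With the default T1, the request is sent seven times in total before the client gives up after 32 seconds. Over TCP or TLS these retransmissions are not needed, since the transport already takes care of delivery.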

Main interactions

Registering

As mentioned earlier, a registration server keeps track of the location (IP address) of users. It allows a URI such as sip:Bruno@domain.xyz to be translated to the address 1.2.3.4 on the network. To do so, a client sends a REGISTER request containing its URI, its IP address, and other optional information about itself. A user can also register multiple addresses, for example to have several endpoints ring at the same time. Most of the time, the proxy server also provides the registration function.
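The idea can be sketched as a simple lookup table. The toy Python class below maps a user's URI to one or more contact addresses, each with an expiry time, which is roughly what a registration server maintains. This is an assumption-laden illustration and not how any particular server is implemented: the class Registrar and its methods are made up for this example, the second address 5.6.7.8 is hypothetical, time is passed in explicitly to keep the sketch deterministic, and authentication is left out entirely.

```python
from collections import defaultdict


class Registrar:
    """Toy location table: user URI -> {contact URI: expiry time}."""

    def __init__(self):
        self._bindings = defaultdict(dict)

    def register(self, uri, contact, expires, now):
        """Handle the essence of a REGISTER: add, refresh or remove a contact."""
        if expires == 0:
            # Registering with an expiry of 0 removes the binding.
            self._bindings[uri].pop(contact, None)
        else:
            self._bindings[uri][contact] = now + expires

    def lookup(self, uri, now):
        """Return the contacts that are still valid for this URI."""
        contacts = self._bindings.get(uri, {})
        return sorted(c for c, expiry in contacts.items() if expiry > now)


registrar = Registrar()
# Two endpoints for the same user, so both can ring at the same time.
registrar.register("sip:Bruno@domain.xyz", "sip:Bruno@1.2.3.4", expires=3600, now=0)
registrar.register("sip:Bruno@domain.xyz", "sip:Bruno@5.6.7.8", expires=60, now=0)

if __name__ == "__main__":
    print(registrar.lookup("sip:Bruno@domain.xyz", now=30))   # both contacts
    print(registrar.lookup("sip:Bruno@domain.xyz", now=120))  # the short one expired
```

When a call arrives for sip:Bruno@domain.xyz, the proxy asks this table where to forward it. A binding that is not refreshed before it expires simply disappears from the answers.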

Call

A diagram will be much clearer than a paragraph. Here is the situation with a proxy:
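In text form, a typical call through a single proxy can be written down as an ordered list of messages. The Python sketch below follows the standard RFC 3261 call flow under a few assumptions of mine: the caller is named Alice (a hypothetical name), the called party is Bruno, there is only one proxy, and that proxy does not ask to stay in the path (no Record-Route), so once the call is answered the two user agents talk to each other directly. Real deployments often differ on that last point.

```python
# (sender, receiver, message) in the order the messages are sent.
FLOW = [
    ("Alice", "Proxy", "INVITE"),
    ("Proxy", "Alice", "100 Trying"),
    ("Proxy", "Bruno", "INVITE"),
    ("Bruno", "Proxy", "180 Ringing"),
    ("Proxy", "Alice", "180 Ringing"),
    ("Bruno", "Proxy", "200 OK"),
    ("Proxy", "Alice", "200 OK"),
    ("Alice", "Bruno", "ACK"),      # sent directly: no Record-Route assumed
    # ... the media session runs directly between Alice and Bruno ...
    ("Bruno", "Alice", "BYE"),
    ("Alice", "Bruno", "200 OK"),
]


def render(flow):
    """Format each step as 'Sender -> Receiver: message'."""
    return [f"{src} -> {dst}: {msg}" for src, dst, msg in flow]


def direct_messages(flow):
    """Messages exchanged between the user agents without the proxy."""
    return [msg for src, dst, msg in flow if "Proxy" not in (src, dst)]


if __name__ == "__main__":
    print("\n".join(render(FLOW)))
```

The proxy is only involved in setting up the call here. The ACK, the media itself, and the final BYE go straight from one endpoint to the other.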

In the next episode, we’ll put this into practice with real examples. We’ll talk about authentication, B2BUA vs. proxy, and the redirect server.