# Playing Together with Elixir Binaries-Strings :)

Play seriously to win over.

This article covers things that you’ll encounter while working with Strings and raw bytes, explained with real situational examples. I tried to design the images to focus on what we are talking about. Hope you like them.

Elixir Version

All the examples used in this article are executed in `iex` using the following combination of `Elixir/Erlang OTP`.

### Gentle Intro

In one of my projects, I got a heavy workout on packet parsing: using header lengths on raw binaries, and decoding and encoding 16, 32, and 64 bit strings. So, I thought I would share the experience.

Hope you already know the difference between `bitstring`, `binary`, `bit`, and `byte`. If true, do: `skip the following screen shot` else: `have a glance at it`.

“Every `binary` is a `bitstring`, but not every `bitstring` is a `binary`.”

In Elixir, a binary is written with `<<>>`. Of course, everybody knows that.

`iex(8)> data = <<"hello">>`
`"hello"`
`iex(9)> is_binary data`
`true`
`iex(10)> is_bitstring data`
`true`
`iex(11)> data2 = <<1,2,3::4>>`
`<<1, 2, 3::size(4)>>`
`iex(12)> is_bitstring data2`
`true`
`iex(13)> is_binary data2`
`false`

### What makes a binary different from bitstring ?

If the number of `bits` is a multiple of `8`, then we call it a `binary`.

Consider the following example.

`<<1,2,3::4>>`

In the above line, we did not mention the number of `bits` to be used for `1` and `2`, but we did for `3`. In Elixir, if the size is not mentioned, it defaults to `8 bits`. So, `<<1,2,3::4>>` is equal to `<<1::8, 2::8, 3::4>>`, which is `20 bits` of data. We cannot call it a `binary`, as the number of `bits` is `20`, which is not a multiple of `8`.
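The bit counting above can be checked directly. This is a minimal sketch using only the built-in `bit_size/1` and `is_binary/1`; the variable names are just for illustration.

```elixir
# 8 bits for 1, 8 bits for 2, 4 bits for 3: 20 bits in total.
data = <<1, 2, 3::4>>
bits = bit_size(data)

# Padding with 4 more bits brings the total to 24, a multiple of 8,
# so the padded value is a binary again.
padded = <<data::bitstring, 0::4>>

IO.inspect(bits)              # 20
IO.inspect(is_binary(data))   # false
IO.inspect(is_binary(padded)) # true
```

Note the `bitstring` type on `data` inside `<<>>`; without it, Elixir would expect an integer segment there.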

Have a look at the following representation.

### Raw bytes and Understanding Elixir representation

Strings in Elixir are binaries. Sorry for repeating the same statement again and again, but I have to. Even if someone wakes you up from sleep and asks, you are supposed to say that.

Consider the word `hello`. Each letter (each `grapheme`) in it is an ASCII character and takes 8 bits. So, the total `byte_size` of the word `hello` is 5.

`iex> byte_size "hello"`
`5`
`iex> String.graphemes "hello"`
`["h", "e", "l", "l", "o"]`
`iex> String.valid? <<35>>`
`true`
`iex> <<35>>`
`"#"    # valid string`

The ASCII (American Standard Code for Information Interchange) code for `#` is `35`. The binary representation of `35` is `100011`, which is 6 bits of data.

Here, `<<35>>` means we are telling Elixir to use `8` bits for `35`. So, `00100011` is the binary form of `35`. If you write it as `<<35::6>>`, you get a 6-bit `bitstring`, which is raw data and not a valid string.

`iex> <<35::6>>`
`<<35::size(6)>>`
`iex> String.valid?(<<35::6>>)`
`false`
`iex> String.valid?(<<35::8>>)`
`true`

#### Understanding Elixir Representation

Consider the following lines of code

`iex> match?("#", <<35>>)`
`true`
`iex> match? "#", <<0::1, 0::1, 1::1, 0::1, 0::1, 0::1, 1::1, 1::1>>`
`true`
`iex> match? <<35>>, <<0::1, 0::1, 1::1, 0::1, 0::1, 0::1, 1::1, 1::1>>`
`true`

Here, we are literally splitting `<<35::8>>` into its individual bits: `<<0::1, 0::1, 1::1, 0::1, 0::1, 0::1, 1::1, 1::1>>`.
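If you don't want to type the bits by hand, a bitstring generator in a `for` comprehension can pull them out one at a time. This is a small sketch of that idea; `source`, `bits` and `rebuilt` are illustrative names.

```elixir
source = <<35>>

# Walk over the binary one bit at a time.
bits = for <<bit::1 <- source>>, do: bit

# Put the bits back together into a bitstring, one bit per segment.
rebuilt = for bit <- bits, into: <<>>, do: <<bit::1>>

IO.inspect(bits)    # [0, 0, 1, 0, 0, 0, 1, 1]
IO.inspect(rebuilt) # "#"
```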

### Back End Story of Learning

When I was learning the basics of programming in Elixir, I used to turn the pages without reading whenever I saw the symbols `<<>>`. These symbols were a nightmare when I was still a kid in Elixir terms. Learning them felt like hitting a mountain with your head at a speed of 200. Just imagine.

OK! Stories apart. Once you get a clear picture in your mind of what is meant by raw bytes and valid strings, you’ll climb the mountain with ease.

Programmers deal with raw bytes far more often than with Strings, especially those who do a lot of parsing.

Programmers count memory, not length.

Remember the previous line; we will come back to it in depth later in the article.

### Extracting Sub-String

This is a real-world situation.

#### Extracting a String of Known Length

If you know the exact length of the string and position from where you want to extract, then you can go with the following approach

#### Using binary_part for raw bytes

When you work on a real-world project, you’ll encounter situations that deal with raw bytes of data. I would suggest you learn as much as possible before working with raw bytes.

`iex> binary_part("hello medium", 6, 6)`
`"medium"`

`binary_part(binary, start, length)` extracts `length` bytes from `binary`, starting at the byte offset `start`. It is used for splitting raw bytes of data.

When `length` is negative and within bounds, it extracts the bytes that come before `start` instead of the ones after it. The bytes are still returned in their original order.

#### Things to remember.

→Here, the `start` index cannot be negative.

→Here, binaries are `zero-indexed`, which means `binary_part("hello", 1, 1)` results in `e`, not `h`. To get `h`, you have to try `binary_part("hello", 0, 1)`. Hope you understood what zero-indexed means.

→The range described by `start` and `length` cannot go beyond the `byte_size` of the string. Otherwise, it raises an `ArgumentError` exception.
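The points above can be tried out in one go. This is a minimal sketch; the variable names and the `:out_of_bounds` atom are only for illustration.

```elixir
str = "hello medium"

# Six bytes starting at byte offset 6.
forward = binary_part(str, 6, 6)

# Negative length: the six bytes that come before offset 12 (the end).
backward = binary_part(str, 12, -6)

# Zero-indexed: offset 0 is the first byte.
first = binary_part(str, 0, 1)

# Asking for more bytes than are available raises ArgumentError.
result =
  try do
    binary_part(str, 6, 10)
  rescue
    ArgumentError -> :out_of_bounds
  end

IO.inspect({forward, backward, first, result})
# {"medium", "medium", "h", :out_of_bounds}
```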

#### Using binary_part in Guard clause

`binary_part` can be used in guard clauses as well.

Example: Packet Parsing

For example, say you are parsing packets like `$admin#medium#worlds#best#blog` and `$user#blackode#a#medium#writer`. You are asked to write a definition that receives a packet and tells each kind of packet apart from the others.

You can do this by splitting the packet with `String.split(packet, "#")` and using the `if` macro to do the job, but that takes more code logic. Instead, you can make use of `binary_part` in a guard clause like the following.

`defmodule Parser do`
`  def parse(packet) when binary_part(packet, 1, 5) == "admin" do`
`    IO.puts "Admin Packet !"`
`  end`
`  ...`
`end`

Check out the execution screenshot

=================Warning=================

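As I already mentioned in the things to remember section, if either the `length` or the `start` value is out of bounds, `binary_part` raises an `ArgumentError` exception. Inside a guard clause that error does not reach the caller: the guard simply fails and the clause does not match, so keep a fallback clause around for short or unexpected packets.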
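Here is a fuller sketch of the guard-based parser, under a few assumptions: the module name `PacketSketch`, the returned atoms, and the catch-all clause are mine and not part of the original parser. It shows that a packet too short for `binary_part` just falls through to the next clause.

```elixir
defmodule PacketSketch do
  # Byte 0 is the "$" marker, so the packet type starts at byte offset 1.
  def parse(packet) when binary_part(packet, 1, 5) == "admin", do: :admin
  def parse(packet) when binary_part(packet, 1, 4) == "user", do: :user

  # Reached when no guard matched, including when binary_part/3
  # failed inside a guard because the packet was too short.
  def parse(_packet), do: :unknown
end

IO.inspect(PacketSketch.parse("$admin#medium#worlds#best#blog")) # :admin
IO.inspect(PacketSketch.parse("$user#blackode#a#medium#writer")) # :user
IO.inspect(PacketSketch.parse("$a"))                             # :unknown
```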

#### — Extracting a string of Unknown Length

If you don’t know the length of the sub string, you cannot use the `binary_part` function directly. This is where binary pattern matching with `<<>>` comes in handy.

Situation
You are asked to extract the string from the position 6 to end of the string.

A String in Elixir always has a size that is a multiple of 8 bits, which is why we call it a binary. In other words, if the `bit_size` is divisible by 8, then we call that `bitstring` a `binary`.

As we discussed earlier, each ASCII letter in a string takes `8 bits`, which means `1 byte`. So, to skip `6` letters you have to skip `6 x 8 = 48` bits.
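A minimal sketch of that skip, written both in bytes and in bits. The sample string and variable names are illustrative, and it assumes the skipped part is plain ASCII so that 6 letters are exactly 6 bytes.

```elixir
str = "hello medium"

# Skip 6 bytes, keep whatever follows as a binary of any length.
<<_skipped::binary-size(6), rest::binary>> = str

# The same thing expressed in bits: 6 x 8 = 48.
<<_skipped_bits::size(48), rest_from_bits::binary>> = str

IO.inspect(rest)           # "medium"
IO.inspect(rest_from_bits) # "medium"
```

The `rest::binary` segment has to come last in the pattern, because it matches everything that remains.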

#### — Extracting first letter from the string

Situation
Extract the currency symbol from the string “$500”.

This can be achieved in many ways

#### String.first

`iex> string = "$500"`
`"$500"`
`iex> string |> String.first`
`"$"`

#### Pattern Matching

`iex> string = "$500"`
`"$500"`
`iex> <<first::8, _rest::binary>> = string`
`"$500"`
`iex> <<first>>`
`"$"`
`iex> first`
`36    # code point (ASCII code) of $`
`iex> <<35>>`
`"#"`
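One caveat: `first::8` grabs exactly one byte, which is fine for `$` but not for symbols that take more than one byte. A hedged sketch using the `utf8` segment type instead; the `€500` sample and the variable names are my own, not from the example above.

```elixir
price = "€500"

# utf8 matches one whole code point, however many bytes it needs.
<<code_point::utf8, amount::binary>> = price
symbol = <<code_point::utf8>>

# Taking a single byte cuts the symbol apart and leaves invalid UTF-8.
<<first_byte::8, _rest::binary>> = price
broken_valid = String.valid?(<<first_byte>>)

IO.inspect({symbol, amount, code_point}) # {"€", "500", 8364}
IO.inspect(broken_valid)                 # false
```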

#### String.split

Not recommended in this situation, but it is good to know that the option exists.

As we know, it splits the string based on the given pattern. If the pattern is `""`, it gives a somewhat different result.

`iex> string = "$500"`
`"$500"`
`iex> string |> String.split("")`
`["", "$", "5", "0", "0", ""]`

Note: there is no space inside `""`.

If you observe, it added an extra `""` at the head and at the tail. You have to trim them by passing the option `trim: true`.

`iex> string = "$500"`
`"$500"`
`iex> string |> String.split("", trim: true) |> hd`
`"$"`

#### String.slice

`iex> String.slice "$500", 0, 1`
`"$"`
`iex> String.slice "$500", -4, 1`
`"$"`

### String.slice [ VS ] binary_part

As we know, both take the arguments `(str, start, len)` and return a sub string starting at the offset `start` and of length `len`.

I kept wondering why there would be two functions with similar functionality. So, I started checking out the things that differentiate them.

#### Out of bound options

When `start` and `len` are out of bounds, `binary_part` raises an `ArgumentError`, as it is designed to be used with raw bytes. `String.slice` does not: it works against `String.length` and returns whatever is available.

Let’s check that.

`iex(14)> str = "hello medium"`
`"hello medium"`
`iex(15)> String.slice str, 6, 10`
`"medium"`
`iex(16)> binary_part str, 6, 10`
Bug Bug ..!!
`** (ArgumentError) argument error`
`    :erlang.binary_part("hello medium", 6, 10)`

Here, only `6` letters remain after position `6`, but we tried to extract a sub string of `len 10`. So, `binary_part` raised an error, while `String.slice` returned the sub string from index `6` to the end of the string. Hope you got the point.
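If you want the forgiving behaviour of `String.slice` while still counting in bytes, you can clamp the length yourself. The helper below is hypothetical (`SliceSketch.clamped_part/3` is not part of Elixir), and is just a sketch that assumes non-negative `start` and `len`.

```elixir
defmodule SliceSketch do
  # Hypothetical helper: like binary_part/3, but never reads past the end.
  def clamped_part(bin, start, len)
      when is_binary(bin) and start >= 0 and len >= 0 do
    size = byte_size(bin)

    if start >= size do
      ""
    else
      # Take at most the number of bytes that are actually left.
      binary_part(bin, start, min(len, size - start))
    end
  end
end

IO.inspect(SliceSketch.clamped_part("hello medium", 6, 10)) # "medium"
IO.inspect(SliceSketch.clamped_part("hello", 9, 2))         # ""
```

Keep in mind that this still works on bytes, so on non-ASCII text it can return a different result than `String.slice`.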

### Raw Bytes and Graphemes

In `String.slice(str, start, len)`, `start` is an index into the graphemes, whereas in `binary_part` it is an index into the bytes.

It will be clearer with the following example.

`iex> str = "hełło"`
`"hełło"`
`iex> String.length str`
`5`
`iex> byte_size str`
`7`
`iex> String.graphemes str`
`["h", "e", "ł", "ł", "o"]`

I hope you understand what I mean by `graphemes`. The number of `graphemes` in `str` is `5`, but its `byte_size` is `7`, because each `ł` takes 2 bytes. That is where these functions differ from each other.

`byte_size/1` counts the underlying `raw bytes`, and `String.length/1` counts `graphemes`, that is, the characters as a reader sees them.

The function `String.slice` deals with Unicode graphemes, and `binary_part` deals with bytes.

In general, `binary_part` deals with `raw bytes`.

#### Internal Representation of String (Raw Bytes)

You can see the `binary `representation of any string with a little hack of joining the string with `<<0>>` .

`iex> str = "hełło"`
`"hełło"`
`iex> raw = str <> <<0>>`
`<<104, 101, 197, 130, 197, 130, 111, 0>>`
`iex(37)> String.slice raw, 2, 3`
`"łło"`
`iex(38)> binary_part raw, 2, 3`
`<<197, 130, 197>>`
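The last result is worth a second look: three bytes starting at offset 2 cover one whole `ł` and only half of the next one, so what comes back is no longer a valid string. A small sketch to confirm this, with illustrative variable names:

```elixir
str = "hełło"

# Three bytes from offset 2: one full "ł" plus the first byte of the next.
cut = binary_part(str, 2, 3)
cut_valid = String.valid?(cut)

# Four bytes cover both two-byte letters completely.
whole = binary_part(str, 2, 4)

# String.slice counts graphemes, so two graphemes give the same result.
sliced = String.slice(str, 2, 2)

IO.inspect({cut, cut_valid}) # {<<197, 130, 197>>, false}
IO.inspect({whole, sliced})  # {"łł", "łł"}
```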

Elixir has a `Base` module which helps you with decoding and encoding binaries. Have a look here.

Hope you enjoyed playing with strings. Practice makes perfect. Try to parse an IPv4 packet based on its header length.
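If you want a starting point for that exercise, here is one possible sketch, not a complete parser. The module name `Ipv4Sketch`, the returned map, and the hand-built sample packet are my own assumptions; the field layout follows the standard IPv4 header, where the header length (IHL) is counted in 32-bit words. It does not verify the checksum.

```elixir
defmodule Ipv4Sketch do
  # Fixed 20-byte part of the header, followed by options and payload.
  def parse(
        <<version::4, ihl::4, _tos::8, total_length::16, _id::16, _flags::3,
          _fragment_offset::13, ttl::8, protocol::8, _checksum::16,
          src::binary-size(4), dst::binary-size(4), rest::binary>>
      )
      when version == 4 and ihl >= 5 do
    # IHL counts 4-byte words; anything beyond 5 words is options.
    options_size = (ihl - 5) * 4

    case rest do
      <<options::binary-size(options_size), payload::binary>> ->
        {:ok,
         %{
           ihl: ihl,
           total_length: total_length,
           ttl: ttl,
           protocol: protocol,
           src: src,
           dst: dst,
           options: options,
           payload: payload
         }}

      _ ->
        {:error, :truncated}
    end
  end

  def parse(_other), do: {:error, :invalid}
end

# A hand-built sample: 20-byte header without options plus 4 payload bytes.
packet =
  <<4::4, 5::4, 0, 24::16, 0::16, 0::3, 0::13, 64, 6, 0::16,
    10, 0, 0, 1, 10, 0, 0, 2, "data">>

{:ok, parsed} = Ipv4Sketch.parse(packet)
IO.inspect(parsed.payload) # "data"
```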

If you found this helpful, please lend a hand and share it. Let others benefit from it too.

Sharing is Caring.