Pattern Matching a Function Body: Binaries

Elixir can pattern match on binaries. If you recall, a string in Elixir is implemented as a binary, which lets us do some interesting things with strings. It is also important to understand the limitations of binary matching so we can choose the best approach for our problem.

Matching a String Prefix

Because a string is a binary, we can match on the beginning of a string like this:

defmodule StringTests do

  def match_greeting("Hello " <> subject), do: {:hello, subject}
  def match_greeting("Greetings " <> subject), do: {:greetings, subject}
  def match_greeting("Good morning!"), do: {:morning, nil}
  def match_greeting(_other), do: :unknown

end

StringTests.match_greeting("Hello Tom")
#=> {:hello, "Tom"}

StringTests.match_greeting("Greetings Jane")
#=> {:greetings, "Jane"}

StringTests.match_greeting("Good morning!")
#=> {:morning, nil}

StringTests.match_greeting("Buenos dias")
#=> :unknown

This works when we know the exact prefix we are looking for. The match is case sensitive, and everything after the prefix gets bound to the variable. This can be useful when working with a text-based protocol over TCP or UDP. For example, command strings like the following can be matched and parsed quickly.

  • "GET /url/endpoint"
  • "SAY Hey guys! How's it going?"
  • "POKE friend_user_name"
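Here is a minimal sketch of how command strings like these could be matched. The CommandParser module, its parse/1 function, and the returned tuples are hypothetical names chosen for illustration; they are not part of the exercise project.

```elixir
defmodule CommandParser do
  # Hypothetical module, for illustration only.
  # Each clause matches a literal prefix (including the trailing space)
  # and binds whatever follows it to a variable.
  def parse("GET " <> path), do: {:get, path}
  def parse("SAY " <> message), do: {:say, message}
  def parse("POKE " <> user_name), do: {:poke, user_name}

  # Anything else, including a differently cased command, falls through.
  def parse(_other), do: {:error, :unknown_command}
end

CommandParser.parse("GET /url/endpoint")
#=> {:get, "/url/endpoint"}

CommandParser.parse("SAY Hey guys! How's it going?")
#=> {:say, "Hey guys! How's it going?"}

CommandParser.parse("get /url/endpoint")
#=> {:error, :unknown_command}
```

The last call shows the case sensitivity: "get " is not the same prefix as "GET ", so only the catch-all clause matches.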

Matching the Middle or End?

You might want the pattern to match on the end of the string instead. Let’s try that:

greeting <> "Tom" = "Hello Tom"
#=> ** (ArgumentError) the left argument of <> operator inside a match should be always a literal binary as its size can't be verified, got: greeting

As you can see, matching at the end isn’t allowed. For a binary pattern match to work, the size of every piece except the last one must be known. The pattern can match a beginning of known size and capture a remainder of unknown size at the end.
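When the suffix is what you care about, one workaround is to compute the size of the front piece first, so that the pattern has a known size to work with. The sketch below assumes a hypothetical SuffixMatch.strip_suffix/2 helper; for a simple yes/no check, the standard String.ends_with?/2 function is usually the simpler choice.

```elixir
defmodule SuffixMatch do
  # Hypothetical helper, for illustration only.
  # The guard makes sure the computed prefix size is never negative.
  def strip_suffix(text, suffix) when byte_size(text) >= byte_size(suffix) do
    # The size of the front piece is now known, so the pattern is allowed.
    prefix_size = byte_size(text) - byte_size(suffix)

    case text do
      <<prefix::binary-size(prefix_size), rest::binary>> when rest == suffix ->
        {:ok, prefix}

      _ ->
        :error
    end
  end

  # Text shorter than the suffix, or non-binary input, can never match.
  def strip_suffix(_text, _suffix), do: :error
end

SuffixMatch.strip_suffix("Hello Tom", "Tom")
#=> {:ok, "Hello "}

SuffixMatch.strip_suffix("Hello Tim", "Tom")
#=> :error

String.ends_with?("Hello Tom", "Tom")
#=> true
```

Notice that the size calculation happens in the function body, not in the function head. That extra work is a hint that this problem is not a natural fit for a function clause pattern match.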

Matching a Fixed Size String

Binary pattern matching works well for matching and parsing a fixed size string where the structure is already known. Imagine that you have some date values stored in the format "YYYYMMDD". The values are stored as a string. You want to display the date as "MM/DD/YYYY". Binary pattern matching works great here!

Let’s look at this example and then we’ll break it down.

defmodule Formatting do

  def date(<< year::binary-size(4), month::binary-size(2), day::binary-size(2) >>) do
    "#{month}/#{day}/#{year}"
  end

end

Formatting.date("20181230")
#=> "12/30/2018"

The pattern match uses << >> to indicate that it is matching a binary. In this example we define a pattern that breaks the data into 3 chunks bound to the variables year, month, and day, with each chunk specifying how many bytes it takes.

Defining a pattern like this lets the BEAM perform fast matches. It also lets us “declare” the pattern we want which allows us to elegantly and quickly parse the data. It’s just so cool!
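One thing to keep in mind is that a fixed size pattern like this only matches input that is exactly 8 bytes long. The sketch below, using a hypothetical SafeFormatting module, adds a fallback clause for inputs of the wrong size and a second function that uses a final ::binary segment to capture whatever follows the date.

```elixir
defmodule SafeFormatting do
  # Hypothetical module, for illustration only.

  # Matches only when the input is exactly 4 + 2 + 2 = 8 bytes.
  def date(<<year::binary-size(4), month::binary-size(2), day::binary-size(2)>>) do
    {:ok, "#{month}/#{day}/#{year}"}
  end

  # Any other input lands here instead of raising a FunctionClauseError.
  def date(_other), do: :error

  # The last segment has no size, so it captures the rest of the input.
  def date_prefix(
        <<year::binary-size(4), month::binary-size(2), day::binary-size(2), rest::binary>>
      ) do
    {"#{month}/#{day}/#{year}", rest}
  end
end

SafeFormatting.date("20181230")
#=> {:ok, "12/30/2018"}

SafeFormatting.date("2018-12-30")
#=> :error

SafeFormatting.date_prefix("20181230T0930")
#=> {"12/30/2018", "T0930"}
```

The sizes are counted in bytes, not characters, so this approach fits data made of single-byte characters such as digits.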

Practice Exercises

The following exercises continue using the Pattern Matching project. We continue focusing on making a single test pass at a time.

The tests we are focusing on are in test/binaries_test.exs. Remember to focus on the test file as a specification for what the code should do and what the sample inputs look like.

Running the following command will execute all the tests in this file. Running all the tests now will show they all fail.

$ mix test test/binaries_test.exs

[...]

Finished in 0.06 seconds
7 tests, 7 failures

Randomized with seed 905586

Exercise #1 – Binary.identify_command/1

In this exercise you write the function identify_command/1 that takes a string where the start of the text contains a text-based command. Real protocols work this way. For example, under the Hypertext Transfer Protocol v1.1 (HTTP 1.1) specification, a request starts with a text command. This is, of course, a dramatically simplified usage and test case.

There are 2 tests for this function. One is the ability to correctly identify the commands we care about and the other handles unsupported commands.

mix test test/binaries_test.exs:18
mix test test/binaries_test.exs:22

Make the tests pass by using pattern matching in the function declaration.

Exercise #2 – Binary.format_phone/1

In this exercise you write the function format_phone/1 that takes a string containing a US-based phone number with no formatting or special characters.

There are 2 tests for this function. One is the ability to correctly parse the input value and return a correctly formatted string. The other handles inputs that don’t match.

mix test test/binaries_test.exs:30
mix test test/binaries_test.exs:35

Make the tests pass by using pattern matching in the function declaration.

Exercise #3 – Binary.image_type/1

Binary data can be something other than a string. Image files often have a standardized header that describes the file. We can use pattern matching to identify the header data of a file and classify it for us.

In this exercise you write the function image_type/1 that takes a binary (not a Unicode string), containing some image file signatures.

There are 3 tests for this function. The first two handle correctly identifying PNG and JPG files. The other test deals with unsupported file signatures.

mix test test/binaries_test.exs:48
mix test test/binaries_test.exs:52
mix test test/binaries_test.exs:58

Make the tests pass by using pattern matching in the function declaration.

Recap

If you are interested in going deeper on binary pattern matching, then there are many great resources and examples online. The Elixir documentation for defining a bitstring (aka the function named <<>>/1) is an excellent reference. There are other great examples of binary pattern matching by others as well. For example, if you want to learn about matching on PNG headers, you can check out articles like this one.

The important thing to remember with binary pattern matching is where it works well and where it doesn’t. It works great in these situations:

  • matching on a command-style prefix
  • matching and unpacking a fixed size string (like for formatting)
  • matching and unpacking data from a fixed size binary structure (like a header)
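To illustrate the last item in the list, here is a sketch that unpacks the start of a GIF file. A GIF file begins with the text "GIF87a" or "GIF89a", followed by the image width and height as 16-bit little-endian integers. The GifHeader module and the sample bytes are made up for illustration.

```elixir
defmodule GifHeader do
  # Hypothetical module, for illustration only.
  # Matches the signature, then unpacks two 16-bit little-endian integers.
  def dimensions(
        <<"GIF", version::binary-size(3), width::little-integer-size(16),
          height::little-integer-size(16), _rest::binary>>
      )
      when version in ["87a", "89a"] do
    {:ok, {width, height}}
  end

  def dimensions(_other), do: :error
end

# Made-up sample data: a GIF89a signature, a width of 320 (0x0140),
# a height of 240 (0x00F0), and a few filler bytes.
sample = <<"GIF89a", 0x40, 0x01, 0xF0, 0x00, 0, 0, 0>>

GifHeader.dimensions(sample)
#=> {:ok, {320, 240}}

GifHeader.dimensions("not a gif")
#=> :error
```

The final _rest::binary segment matters here. Without it, the pattern would only match a binary that ends right after the height.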

If you want to parse data from the middle or end of a string and there isn’t a predictable location for it, then you probably want a Regular Expression. Luckily, you have that available in the Regex module! A function clause pattern match just can’t do that much work. Remember, when pattern matching, the BEAM is trying to answer the question, “Should I execute this function clause?”
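As a small sketch of the Regular Expression approach, the example below pulls a value out of the middle of a string with Regex.named_captures/2. The LogLine module and the log line format are assumptions made up for illustration.

```elixir
defmodule LogLine do
  # Hypothetical module, for illustration only.
  # The value we want can appear anywhere in the line, so the search
  # happens in the function body with a regular expression.
  def user_name(line) do
    case Regex.named_captures(~r/user=(?<name>\w+)/, line) do
      %{"name" => name} -> {:ok, name}
      nil -> :error
    end
  end
end

LogLine.user_name("login ok user=jane ip=10.0.0.7")
#=> {:ok, "jane"}

LogLine.user_name("login failed ip=10.0.0.7")
#=> :error
```

The function head accepts any input, and the real work happens in the body. Pattern matching still helps there: the case clauses match on the result of the regular expression.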

Binary pattern matching is awesome! Just keep this tool in mind for when you have a suitable problem.

10 Comments

  1. Joep Stender on July 10, 2020 at 2:14 pm

    The link to https://hexdocs.pm/elixir/Kernel.SpecialForms.html in this article is displayed incorrectly.

    • Mark Ericksen on July 10, 2020 at 2:31 pm

      Thanks Joep! There is a problem linking directly to the function. I can’t do it or it corrupts the rendering. 🙁 The function I’m referring to is <<>>/1

      I’ve fixed the rendering problem but at the cost of a less precise link.

  2. romenigld on December 14, 2020 at 12:20 pm

    Wow this help me a lot, I always don’t understand well when I see something about binary using elixir.
    Now this opens my mind.
    Thank you so much, your explanations are awesome!
    i will now read on Documentation about this <> and be prepared.

    • Mark Ericksen on December 14, 2020 at 1:35 pm

      Awesome! I’m glad it is helpful!

  3. Mark Johnson on February 23, 2021 at 12:47 am

    These lessons are proving to be very helpful for me. I was surprised to find that the rest of the binary data does not match unless ::binary is appended.
    def image_type(<>), do: :png doesn’t work but
    def image_type(<>), do: :png does work

    Thank you, Mark, for making this available.

  4. Vitaly Vasiliev on June 5, 2021 at 9:31 pm

    1.13 elixir, little problem with 1 test, it is not accepting “SAY “, only “SAY” etc

    • Mark Ericksen on June 6, 2021 at 6:34 am

      Hello Vitaly! Can you be more specific? I just tested using Elixir 1.12.1 (current latest released version) and all the tests pass without a need to change.

  5. Vitaly Vasiliev on June 5, 2021 at 9:31 pm

    Fixed by removing space in the test. 🙂

  6. Jaison Vieira on October 4, 2021 at 8:31 pm

    My tests passed doing this:

    def image_type(@png_signature _image_body), do: :png
    def image_type(@jpg_signature _image_body), do: :jpg

    I see your solution is different. Is it the best way to do or just different? Why do both of them work?

  7. Maksym Kosenko on December 13, 2021 at 3:03 pm

    The links from “Recap” were very helpful for me! Thanks a lot, Mark
