Pattern Matching a Function Body: Binaries
Elixir can pattern match on binaries. If you recall, a string in Elixir is implemented as a binary type. This lets us do some interesting things with strings. It is also important to understand the limitations of binary matching so we can choose the best approach for our problem.
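To make the "a string is a binary" point concrete, here is a small sketch you can paste into IEx or run as a script. The sample values are only illustrations.

```elixir
# A string is a binary, so the is_binary/1 check accepts it.
IO.inspect(is_binary("Hello"))
#=> true

# The same two bytes written as a string and as a raw binary.
IO.inspect("Hi" == <<72, 105>>)
#=> true

# Sizes are measured in bytes, not characters: "é" takes 2 bytes in UTF-8.
IO.inspect(byte_size("h\u00E9llo"))
#=> 6
IO.inspect(String.length("h\u00E9llo"))
#=> 5
```

The byte-versus-character difference matters later, because sizes in a binary pattern are counted in bytes.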
Matching a String Prefix
A string is a binary type. We can match on the beginning of a string like this:
defmodule StringTests do
def match_greeting("Hello " <> subject), do: {:hello, subject}
def match_greeting("Greetings " <> subject), do: {:greetings, subject}
def match_greeting("Good morning!"), do: {:morning, nil}
def match_greeting(_other), do: :unknown
end
StringTests.match_greeting("Hello Tom")
#=> {:hello, "Tom"}
StringTests.match_greeting("Greetings Jane")
#=> {:greetings, "Jane"}
StringTests.match_greeting("Good morning!")
#=> {:morning, nil}
StringTests.match_greeting("Buenos dias")
#=> :unknown
This works when we know the exact prefix we are looking for. The match is case sensitive, and everything after the prefix gets bound to the variable. This can be useful when working with a text-based protocol over TCP or UDP. For example, command strings like the following can be matched and parsed quickly.
"GET /url/endpoint"
"SAY Hey guys! How's it going?"
"POKE friend_user_name"
Matching the Middle or End?
You might want to pattern match on the end of a string instead. Let’s try that:
greeting <> "Tom" = "Hello Tom"
#=> ** (ArgumentError) the left argument of <> operator inside a match should be always a literal binary as its size can't be verified, got: greeting
As you can see, matching at the end isn’t allowed. For a binary pattern match to work, it must know the size of every piece being matched except the last one. It can match on a beginning of known size and capture the rest, of unknown size, at the end.
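Here is a short sketch of both sides of that rule. The first match works because the leading piece has a known size. The `SuffixSketch` module is a hypothetical helper showing one way to handle a known suffix outside of a pattern, using `String.ends_with?/2` and `binary_part/3`.

```elixir
# A known-size beginning followed by "the rest" is a valid pattern.
<<head::binary-size(5), rest::binary>> = "Hello Tom"
IO.inspect({head, rest})
#=> {"Hello", " Tom"}

defmodule SuffixSketch do
  # Returns {:ok, prefix} when `string` ends with `suffix`, otherwise :error.
  # The check happens in the function body, not in the function head.
  def strip_suffix(string, suffix) when is_binary(string) and is_binary(suffix) do
    if String.ends_with?(string, suffix) do
      prefix_size = byte_size(string) - byte_size(suffix)
      {:ok, binary_part(string, 0, prefix_size)}
    else
      :error
    end
  end
end

SuffixSketch.strip_suffix("Hello Tom", "Tom")
#=> {:ok, "Hello "}
SuffixSketch.strip_suffix("Hello Jane", "Tom")
#=> :error
```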
Matching a Fixed Size String
Binary pattern matching works well for matching and parsing a fixed size string where the structure is already known. Imagine that you have some date values stored in the format "YYYYMMDD". The values are stored as a string. You want to display the date as "MM/DD/YYYY". Binary pattern matching works great here!
Let’s look at this example and then we’ll break it down.
defmodule Formatting do
def date(<< year::binary-size(4), month::binary-size(2), day::binary-size(2) >>) do
"#{month}/#{day}/#{year}"
end
end
Formatting.date("20181230")
#=> "12/30/2018"
The pattern match uses the << >> characters to indicate it is a binary type. In this example we define a pattern that breaks the data into 3 chunks using the variable names year, month, and day, with each chunk specifying how large it is in bytes.
Defining a pattern like this lets the BEAM perform fast matches. It also lets us “declare” the pattern we want which allows us to elegantly and quickly parse the data. It’s just so cool!
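One thing to keep in mind is what happens when the input does not have the expected shape. The sketch below is a variation on the example above, not the original module: the name `DateSketch`, the tagged return values, and the fallback clause are assumptions added for illustration.

```elixir
defmodule DateSketch do
  # Matches only binaries that are exactly 4 + 2 + 2 = 8 bytes long.
  def date(<<year::binary-size(4), month::binary-size(2), day::binary-size(2)>>) do
    {:ok, "#{month}/#{day}/#{year}"}
  end

  # Any other input (wrong length, not a binary) lands here
  # instead of raising a FunctionClauseError.
  def date(_other), do: :error
end

DateSketch.date("20181230")
#=> {:ok, "12/30/2018"}
DateSketch.date("2018-12-30")
#=> :error
DateSketch.date("abcdefgh")
#=> {:ok, "ef/gh/abcd"}
```

The last call is a reminder that this pattern checks only the sizes of the pieces, not whether they contain digits.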
Practice Exercises
The following exercises continue using the Pattern Matching project. We continue focusing on making a single test pass at a time.
The tests we are focusing on are in test/binaries_test.exs. Running the following command will execute all the tests in this file. Running all the tests now will show they all fail.
Remember to focus on the test file as a specification for what the code should do and what the sample inputs look like.
$ mix test test/binaries_test.exs
[...]
Finished in 0.06 seconds
7 tests, 7 failures
Randomized with seed 905586
Exercise #1 – Binary.identify_command/1
In this exercise you write the function identify_command/1 that takes a string where the start of the text contains a text-based command. Real protocols work this way; the Hypertext Transfer Protocol v1.1 (HTTP 1.1) specification is one example. This is, of course, a dramatically simplified usage and test case.
There are 2 tests for this function. One is the ability to correctly identify the commands we care about and the other handles unsupported commands.
mix test test/binaries_test.exs:18
mix test test/binaries_test.exs:22
Make the tests pass by using pattern matching in the function declaration.
Exercise #2 – Binary.format_phone/1
In this exercise you write the function format_phone/1 that takes a string containing a US-based phone number with no formatting or special characters.
There are 2 tests for this function. One is the ability to correctly parse the input value and return a correctly formatted string. The other handles inputs that don’t match.
mix test test/binaries_test.exs:30
mix test test/binaries_test.exs:35
Make the tests pass by using pattern matching in the function declaration.
Exercise #3 – Binary.image_type/1
Binary data can be something other than a string. Image files often have a standardized header that describes the file. We can use pattern matching to identify the header data of a file and classify it for us.
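To show the idea on a format that is not part of this exercise, here is a sketch that reads a GIF header. It relies on the GIF layout of a 6-byte signature ("GIF87a" or "GIF89a") followed by the width and height as 16-bit little-endian integers. The module name `GifSketch` and the return values are hypothetical.

```elixir
defmodule GifSketch do
  # Match the literal "GIF" signature, a 3-byte version, and two
  # 16-bit little-endian integers. The rest of the file is ignored.
  def dimensions(
        <<"GIF", version::binary-size(3), width::little-integer-size(16),
          height::little-integer-size(16), _rest::binary>>
      )
      when version in ["87a", "89a"] do
    {:ok, version, width, height}
  end

  def dimensions(_other), do: :error
end

# A hand-built header for a 320x200 image, followed by some filler bytes.
header = <<"GIF89a", 320::little-integer-size(16), 200::little-integer-size(16), 0, 0, 0>>

GifSketch.dimensions(header)
#=> {:ok, "89a", 320, 200}
GifSketch.dimensions("not an image")
#=> :error
```

The same approach of literal signature bytes first and sized fields after applies to other file formats.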
In this exercise you write the function image_type/1 that takes a binary (not a Unicode string), containing some image file signatures.
There are 3 tests for this function. The first two handle correctly identifying PNG and JPG files. The other test deals with unsupported file signatures.
mix test test/binaries_test.exs:48
mix test test/binaries_test.exs:52
mix test test/binaries_test.exs:58
Make the tests pass by using pattern matching in the function declaration.
Recap
If you are interested in going deeper on binary pattern matching, then there are many great resources and examples online. The Elixir documentation for defining a bitstring (the special form <<>>/1) is an excellent reference. Others have published great examples of binary pattern matching as well. For example, if you want to learn about matching on PNG headers, you can check out articles like this one.
The important thing to remember with binary pattern matching is where it works well and where it doesn’t. It works great in these situations:
- matching on a command-style prefix
- matching and unpacking a fixed size string (like for formatting)
- matching and unpacking data from a fixed size binary structure (like a header)
If you want to parse data from the middle or end of a string and there isn’t a predictable location for it, then you probably want a regular expression. Luckily, you have that available in the Regex module! You just can’t do that much work in a pattern match function clause. Remember, when pattern matching, the BEAM is trying to answer the question, “Should I execute this function clause?”
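As a minimal sketch of the regular expression route, the function below finds a date that can appear anywhere in a string. The module name `RegexSketch`, the "YYYY-MM-DD" input format, and the return values are assumptions made for this example.

```elixir
defmodule RegexSketch do
  # Regex.run/2 returns the full match followed by each capture group,
  # or nil when the pattern is not found anywhere in the text.
  def find_date(text) when is_binary(text) do
    case Regex.run(~r/(\d{4})-(\d{2})-(\d{2})/, text) do
      [_full, year, month, day] -> {:ok, "#{month}/#{day}/#{year}"}
      nil -> :error
    end
  end
end

RegexSketch.find_date("Backup finished on 2018-12-30 without errors")
#=> {:ok, "12/30/2018"}
RegexSketch.find_date("No date in here")
#=> :error
```

Note that the searching happens inside the function body. The function head only checks that the argument is a binary.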
Binary pattern matching is awesome! Just keep this tool in mind so you can reach for it when you have a suitable problem.