using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Optimizing for Tail-Call Recursion in F#

Rewriting for Tail-Call Recursion in F#

While this post is written in F# and specifically for it (referencing the IL shown by ILDASM), the principles and practices here can be applied to any functional (or non-functional) language that supports tail-call recursion and provides optimizations for it.

Recently I was working on a project, and I had to extend the F# List type to make things a bit simpler. I wrote a method takeThrough that allowed me to provide a predicate, and much like takeWhile, it would return elements until the predicate returned false, then it would return one more element. This was important for the code I was writing: I needed to return everything up to and including the first element that caused the predicate to return false.

So, I wrote this method called takeThrough:

let takeThrough(predicate)(source) =
    let rec loop sourceTemp =
        let head = sourceTemp |> List.head
        if head |> predicate = true then
            head :: (sourceTemp |> List.tail |> loop)
        else
            [head]
    loop source

The problem with this method is that it cannot be optimized for tail-call recursion in its current state.
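To make the intended behavior concrete, here is a small usage sketch. It repeats the definition above so the block runs on its own; the sample lists and predicates are made up for illustration. Note that, as written, the function assumes some element eventually fails the predicate. If none does, List.head ends up being called on an empty list and throws.

```fsharp
// The original, non-tail-recursive definition, repeated so this runs standalone.
let takeThrough predicate source =
    let rec loop sourceTemp =
        let head = sourceTemp |> List.head
        if head |> predicate = true then
            head :: (sourceTemp |> List.tail |> loop)
        else
            [head]
    loop source

// Everything below 3, plus the first element that fails the test.
let firstExample = takeThrough (fun x -> x < 3) [1; 2; 3; 4; 5]

// The very first element fails, so only that element comes back.
let secondExample = takeThrough (fun x -> x < 0) [7; 8; 9]

printfn "%A" firstExample   // [1; 2; 3]
printfn "%A" secondExample  // [7]
```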

What is Tail-Call Recursion?

In order to understand why we're talking about optimizing for tail-call recursion here, we first have to understand what tail-call recursion is.

In functional languages such as F# things like loops are discouraged, and in some of them even unavailable completely. So, in order to avoid them we have to rewrite our methods for recursion of some form (usually).

If we were to take this same example in C# it might look something like:

IEnumerable<T> TakeThrough<T>(IEnumerable<T> source, Predicate<T> predicate)
{
    var continueLoop = true;

    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item;
            continueLoop = true;
        }
        else
        {
            yield return item;
            continueLoop = false;
        }

        if (!continueLoop)
        {
            break;
        }
    }
}

(This may not be the most optimal manner to write this method in, but it guarantees success.)

So we're just looping through each item here and returning it as we go. With F# we don't want to use such a construct, as we want to avoid loops. So we go to recursion. In C#, the F# code we wrote above might look more like:

IEnumerable<T> TakeThrough<T>(IEnumerable<T> source, Predicate<T> predicate)
{
    return _loop(source, predicate);
}

IEnumerable<T> _loop<T>(IEnumerable<T> sourceTemp, Predicate<T> predicate)
{
    var head = sourceTemp.First();

    if (predicate(head))
    {
        var result = new List<T>();
        result.Add(head);
        result.AddRange(_loop(sourceTemp.Skip(1), predicate));
        return result;
    }
    else
    {
        return new List<T> { head };
    }
}

It's almost verbatim identical to the F# version. The problem we can see here is a StackOverflowException being thrown if source is large enough and predicate returns false late enough. This is what we're hoping to avoid with tail-call recursion.

Remember: in order for a method to be optimized for tail-call recursion, the recursive call has to be the last thing the method does.

Now you might look at that method and say "well, the last thing that happens is everything is piped to loop." Not quite true. It's easy to miss that head :: is the very last thing the method has to do.

This is an important note because loop is called first, and only then is its result given to the cons operator (::). The call to loop is therefore not the last thing the method does.
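One way to see this is to name the intermediate value. The sketch below is the same function with the recursive result bound to a hypothetical name, rest, purely for illustration: the cons can only run after loop has returned, so every call has to keep its stack frame alive until then.

```fsharp
let takeThroughExplicit predicate source =
    let rec loop sourceTemp =
        let head = sourceTemp |> List.head
        if predicate head then
            // Step 1: the recursive call runs first and must return...
            let rest = sourceTemp |> List.tail |> loop
            // Step 2: ...before the cons can build the result. This pending
            // work is what stops the call from being a tail call.
            head :: rest
        else
            [head]
    loop source

printfn "%A" (takeThroughExplicit (fun x -> x < 3) [1; 2; 3; 4; 5])  // [1; 2; 3]
```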

The if/else is ugly too

Of course the other problem is the if/else construct, but that can be fixed with a match head |> predicate with and then match to each boolean value (true and false).

Right, so that's simple. Easy fix:

match head |> predicate with
| true -> head :: (sourceTemp |> List.tail |> loop)
| false -> [head]

Great. We solved the easy idiomatic issue, but how in the world do we make it tail-call recursive?

Visualizing our tail-call recursion

The first thing we have to do is determine how we can write the structure of this method so that the loop call is the last thing to happen. Ignore what it does for now. We just want to know what it would have to look like. We need a visual.

let takeThrough predicate list =
    let rec loop ... =
        ...
        loop ...
    loop ...

So we know what it should look like-ish. That's a very good start. Now we have to figure out how we can get it to that state.

So we know that our method needs to match each item with a predicate, and then return a List of all the elements that matched and the next element. So we need to accumulate a list of elements.

The key word there is accumulate. We need a variable in our loop that acts as an accumulator in this case.

Now we know our visual needs to change:

let takeThrough predicate list =
    let rec loop acc ... =
        ...
        loop newAcc ...
    loop [] ...

This looks about right. Our acc will be a List since that's what we're building, and we're going to pass the newAcc to loop each time we iterate, after passing an empty list to our loop to get started.
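If the accumulator idea is new, it may help to see it on a simpler problem first. The sketch below sums a list two ways; it is not part of takeThrough, and the names sumNaive and sumAcc are made up for this example. The second version has exactly the shape of the skeleton above.

```fsharp
// Not tail recursive: the addition happens after the recursive call returns.
let rec sumNaive list =
    match list with
    | [] -> 0
    | head :: tail -> head + sumNaive tail

// Tail recursive: the running total travels along in acc,
// so the call to loop is the last thing each step does.
let sumAcc list =
    let rec loop acc remaining =
        match remaining with
        | [] -> acc
        | head :: tail -> loop (acc + head) tail
    loop 0 list

printfn "%d" (sumNaive [1; 2; 3; 4])  // 10
printfn "%d" (sumAcc [1; 2; 3; 4])    // 10
```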

Creating our tail-call recursion

So now that we've visualized it, we can start to write the final pieces of it.

We'll start at the final line: loop [] .... What do we know about this call? We know that we only have two parameters in the method, and one variable (well, constant, but it's a function so it will look like a variable). And that's all we need. So we'll pass our initial list to our loop because it's the only thing we have to pass.

let takeThrough predicate list =
    let rec loop acc ... =
        ...
        loop newAcc ...
    loop [] list

Now our definition for loop has to change:

let takeThrough predicate list =
    let rec loop acc listTemp =
        ...
        loop newAcc newListTemp
    loop [] list

Alright, great progress. We just have to apply our operations now. In our case, the newAcc will be the appended list, and the listTemp will be stripped of the first item. Let's get the logic for head in there and work from that.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> ... loop newAcc newList
        | false -> ...
    loop [] list

Perfect! We're almost done. Getting newAcc and newList is easy: newAcc is just List.append acc [head], and newList is just listTemp |> List.tail.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> loop (List.append acc [head]) (listTemp |> List.tail)
        | false -> ...
    loop [] list

The last issue is our false condition: what do we do here?

Simple: we just return what the newAcc would have been.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> loop (List.append acc [head]) (listTemp |> List.tail)
        | false -> List.append acc [head]
    loop [] list

And we've achieved our goal of tail-call recursion. The very last thing in loop is a call to itself. (Remember that in this case, match is the last condition, then one of two things happens: we call loop or we return the new stuff.)
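A variation worth considering, though it is not what the IL in this post was generated from: List.append acc [head] copies the accumulator on every step, which makes the whole loop quadratic in the number of elements taken. A common alternative is to cons onto the accumulator and reverse once at the end. The sketch below also pattern matches on the list, so that an empty input (or a list where every element passes) returns what has been collected instead of throwing from List.head. That behavior is my assumption about what is wanted, not something the original specifies, and the name takeThroughRev is made up for this example.

```fsharp
let takeThroughRev predicate source =
    let rec loop acc remaining =
        match remaining with
        // Ran out of elements: return what we have, in original order.
        | [] -> List.rev acc
        // Element passes: keep it and continue. The call to loop is in tail position.
        | head :: tail when predicate head -> loop (head :: acc) tail
        // First failing element: include it and stop.
        | head :: _ -> List.rev (head :: acc)
    loop [] source

printfn "%A" (takeThroughRev (fun x -> x < 3) [1; 2; 3; 4; 5])  // [1; 2; 3]
printfn "%A" (takeThroughRev (fun x -> x < 3) ([] : int list))   // []

// A long input where the predicate fails only at the very end.
let big = takeThroughRev (fun x -> x < 1000000) [1 .. 1000000]
printfn "%d" (List.length big)  // 1000000
```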

Verifying with ILDASM

If we look at the IL for this method, we'll see the following:

.method assembly static class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> 
        loop@4<T>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!T,bool> predicate,
                  class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> acc,
                  class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> sourceTemp) cil managed
{
  // Code size       69 (0x45)
  .maxstack  6
  .locals init ([0] !!T head)
  IL_0000:  nop
  IL_0001:  ldarg.2
  IL_0002:  call       !!0 [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Head<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0007:  stloc.0
  IL_0008:  ldarg.0
  IL_0009:  ldloc.0
  IL_000a:  callvirt   instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!T,bool>::Invoke(!0)
  IL_000f:  brfalse.s  IL_0031
  IL_0011:  ldarg.0
  IL_0012:  ldarg.1
  IL_0013:  ldloc.0
  IL_0014:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::get_Empty()
  IL_0019:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::Cons(!0,
                                                                                                                                                                class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0>)
  IL_001e:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::op_Append<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>,
                                                                                                                                                      class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0023:  ldarg.2
  IL_0024:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Tail<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0029:  starg.s    sourceTemp
  IL_002b:  starg.s    acc
  IL_002d:  starg.s    predicate
  IL_002f:  br.s       IL_0000
  IL_0031:  ldarg.1
  IL_0032:  ldloc.0
  IL_0033:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::get_Empty()
  IL_0038:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::Cons(!0,
                                                                                                                                                                class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0>)
  IL_003d:  tail.
  IL_003f:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::op_Append<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>,
                                                                                                                                                      class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0044:  ret
} // end of method List::loop@4

We're only concerned with our recursion:

IL_0000:  nop
...
IL_000f:  brfalse.s  IL_0031
...
IL_002f:  br.s       IL_0000
IL_0031:  ldarg.1
...
IL_003d:  tail.

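Just as we hoped, the br.s IL_0000 at IL_002f simply jumps back to the beginning of the method instead of calling it again. (The tail. prefix at IL_003d belongs to the final op_Append call in the false branch, not to the recursion.)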

Functionally replacing if in F#

Eliminating if from F#

Recently I was asked by a colleague if there were a better way to write a specific method this colleague was using. It was a simple method which called a couple other methods and returned a value from them. Essentially, if two conditions met specific criteria, call one of four other methods. Oh, and it was in F#.

Naturally, as C-style programmers it's easy for us to use if or switch to do what we want, but for some reason when we look at functional languages we cannot seem to reason how we should replace these two constructs with match. It must be trivial, right? We must be missing some silly detail. That's not entirely true. We're not missing anything trivial, we're just not being creative enough.

Functional languages like F# bear the advantage of being very verbose about what's going on. They're also great at implicitly typing things, and making a function read as a mathematical expression. I emphasize that last part for a reason: if we begin to look at our code as a mathematical expression instead of code, we will hopefully see what we're missing.

Let's look at a much reduced sample of the code we were working with:

type ThingType =
    | Left = 0
    | Right = 1

member private this.methodLeftOne =
    true

member private this.methodRightOne =
    false

member private this.methodLeftTwo =
    false

member private this.methodRightTwo =
    true

member this.MatchAndIf var thingType =
    match var with
    | 1 -> if thingType = ThingType.Left then this.methodLeftOne else this.methodRightOne
    | 2 -> if thingType = ThingType.Left then this.methodLeftTwo else this.methodRightTwo
    | _ -> false

My colleague was calling the MatchAndIf function which was to return a boolean value from the two parameters. The code in the other four methods was a bit more complex, but I've simplified it here so we can see how things will turn out.

So, we're looking at a pretty simple bit of code: if var and thingType are 1 and ThingType.Left respectively, return this.methodLeftOne; if they're 1 and any other ThingType value, return this.methodRightOne; and so on. Pretty easy to follow.

We have a slight inconsistency here, however. If thingType is set to a non-valid value, then unexpected (well, unintended) things can happen: any value other than ThingType.Left is treated as ThingType.Right. This is not so ideal. Fixing it with this code would be a mess, as we would now have if ... then ... else if ... then ... else .... Sure, that does what we want, but it's really ugly for F#.
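To see how a non-valid value can turn up at all: F# enums are thin wrappers over their underlying integer type, so any int can be converted into a ThingType, whether it names a case or not. The sketch below uses the built-in enum function; the value 42 and the function names viaIf and viaMatch are arbitrary choices for illustration.

```fsharp
type ThingType =
    | Left = 0
    | Right = 1

// 42 is not a named case, but this conversion is still allowed.
let bogus = enum<ThingType> 42

// The if/else shape from the sample: anything that is not Left counts as Right.
let viaIf thingType =
    if thingType = ThingType.Left then "left" else "right"

// A match with an explicit fallback keeps the unknown value separate.
let viaMatch thingType =
    match thingType with
    | ThingType.Left -> "left"
    | ThingType.Right -> "right"
    // Without this wildcard the compiler warns that the match is incomplete,
    // because an enum can hold values outside its named cases.
    | _ -> "unknown"

printfn "%s" (viaIf bogus)     // right
printfn "%s" (viaMatch bogus)  // unknown
```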

Nested match

So, the first thing we might think of to rewrite it is to use a nested match. Alright, easy enough, replace the inner if with a match.

member this.NestedMatch var thingType =
    match var with
    | 1 ->
        match thingType with
        | ThingType.Left -> this.methodLeftOne
        | _ -> this.methodRightOne
    | 2 ->
        match thingType with
        | ThingType.Left -> this.methodLeftTwo
        | _ -> this.methodRightTwo
    | _ -> false

This is obviously more F#-like. It gives us a lot more peace-of-mind, right? But we didn't fix the issue above, so let's do that.

member this.NestedMatchFixed var thingType =
    match var with
    | 1 ->
        match thingType with
        | ThingType.Left -> this.methodLeftOne
        | ThingType.Right -> this.methodRightOne
        | _ -> false
    | 2 ->
        match thingType with
        | ThingType.Left -> this.methodLeftTwo
        | ThingType.Right -> this.methodRightTwo
        | _ -> false
    | _ -> false

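Wait a minute, why do we need three default (_) cases? Ah, right, because if 1 or 2 is matched, the inner match never falls through to the outer default case. If we omit the inner default, F# will not implicitly return false: it gets very upset, warning about an incomplete match at compile time and throwing a MatchFailureException at run time. (That's not always a bad thing.)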

Tuple match

Well, we might think to ourselves "I can just match on a Tuple instead." Indeed that's true, let's see how that looks.

member this.TupleMatch var thingType =
    match (var, thingType) with
    | (1, ThingType.Left) -> this.methodLeftOne
    | (1, ThingType.Right) -> this.methodRightOne
    | (2, ThingType.Left) -> this.methodLeftTwo
    | (2, ThingType.Right) -> this.methodRightTwo
    | _ -> false

Alright, that's not bad. We've gotten a lot closer to our goal. But now we have things knowing about things they shouldn't. The TupleMatch method does too many things inside it. It's looking for a var of 1 or 2 and a ThingType.

Finally Isolating Everything

The only other thing we can do to fix this (which I can tell you is the best option based on the context of what code I had) is to check thingType in our Match method, and pipe var to our methodLeft or methodRight method (whichever is appropriate).

member private this.methodLeft var =
    match var with
    | 1 -> true
    | _ -> false

member private this.methodRight var =
    match var with
    | 2 -> true
    | _ -> false

member this.FinalMatch var thingType =
    match thingType with
    | ThingType.Left -> var |> this.methodLeft
    | ThingType.Right -> var |> this.methodRight
    | _ -> false

Now each method is only responsible for checking and reporting the parts it cares about. We complied with the Single Responsibility Principle (SRP) and we kept it entirely functional. Each method looks at only the code it cares about; it's not worried about what the next method down the chain is doing.
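As a sanity check that this refactoring did not change behavior, the tuple version and the final version can be compared over a handful of inputs. The sketch below uses module-level functions instead of members so that it runs on its own in F# Interactive. The function names and the set of sample inputs are mine; the return values are taken from the reduced sample above.

```fsharp
type ThingType =
    | Left = 0
    | Right = 1

// Tuple version, with the four helper results from the sample inlined.
let tupleMatch var thingType =
    match (var, thingType) with
    | (1, ThingType.Left) -> true    // methodLeftOne
    | (1, ThingType.Right) -> false  // methodRightOne
    | (2, ThingType.Left) -> false   // methodLeftTwo
    | (2, ThingType.Right) -> true   // methodRightTwo
    | _ -> false

let methodLeft var =
    match var with
    | 1 -> true
    | _ -> false

let methodRight var =
    match var with
    | 2 -> true
    | _ -> false

let finalMatch var thingType =
    match thingType with
    | ThingType.Left -> var |> methodLeft
    | ThingType.Right -> var |> methodRight
    | _ -> false

// Sample inputs, including values outside the expected ranges.
let inputs =
    [ for var in 0 .. 3 do
        for thingType in [ ThingType.Left; ThingType.Right; enum<ThingType> 42 ] do
            yield (var, thingType) ]

let allAgree =
    inputs
    |> List.forall (fun (var, thingType) -> tupleMatch var thingType = finalMatch var thingType)

printfn "%b" allAgree  // true
```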

SQL Server Datatypes: How to avoid VarChar

I've seen, time and time again, programmers make many of the same mistakes regarding their SQL datatypes, and one of them is to use VarChar for almost everything. I've seen it so many times that if I had a nickel for each time I saw it, well, let's just say my McLaren P1 would be yellow.

Why do people use VarChar so much?

Well, to be honest, it's easy. We, as people, are generally lazy, and it's easy to store anything in a VarChar(50), or worse, a VarChar(MAX)! Why is this a bad thing? Well for some data, it's not, but for others, it's just not the best option. As developers and programmers, we almost always have a choice as to how we should store our data, and sometimes, it's easy to make an inefficient one.

Let's take a solid example. I was over on Stack Overflow one day, and I noticed a developer doing something odd: the developer was storing an IP address (we'll assume IPv4 of 192.168.0.1 which is a pretty common IP for default gateways in small home and office networks) in a VarChar or a Char field. I'm not sure of the precision, or which type it was (as the developer left out the DDL), but for the sake of argument let's assume it was the smallest precision required to store any IP Address, and as such a VarChar(15).

The developer, much like the rest of us, was trying to find a way to shrink the amount of data used. So, the developer proposed that, instead of storing 1.1.1.1, we just omit all the characters except the last two (in this example: .1), and keep only the fourth octet in the database. The downfall of this is quite obvious: we now have no way of distinguishing whether our value is 1.1.1.1, 2.2.2.1, 3.3.3.1 or any other address ending in .1. But, there's a better way.

Let's take a peek at what we know at this point:
  1. The data being stored is binary data;
  2. It's being stored in a string field;
  3. The maximum length on the string field is 15 characters.

Now this doesn't just apply to IP Addresses; it also applies to hashes, encrypted data and other binary objects.

At first glance this might not seem so bad. The IP Address as a string is 192.168.0.1. The maximum data-size is going to be 17 bytes, as the VarChar type takes one byte per character, and two bytes of overhead. The size for our specific address is 13 (11 characters plus the overhead), by the same math. The developer took the time to address the issue of fitting the data within the seemingly smallest datatype possible. But what did the developer forget?

First, we're trying to store binary data. The smallest way to store this (at least in string format) is either in hexadecimal or Base64 encoding. Let's assume we use hexadecimal (it really doesn't matter either way). We're storing data that is four bytes, which means we need eight characters. Our example leaves us with 0xC0A80001 or, for short: C0A80001. So, this alone allows us to reduce our maximum storage space to almost half its original size (10 bytes instead of 17), and our utilized space (for this example) to 10 bytes from 13. With just one quick optimization we converted our 15-character string to an 8-character hexadecimal string. Now that we know that, we can make another optimization and change it to a Char(8) type. This removes the two bytes of overhead, and leaves our example at a cool 8 bytes of storage space.
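The conversion and the byte counts can be checked with a few lines of F#. This is only a sketch of the arithmetic: varCharBytes and charBytes are made-up helpers that encode the simplified storage model used in this article (one byte per character, plus two bytes of overhead for the variable-length type), not anything SQL Server exposes.

```fsharp
open System
open System.Net

let address = "192.168.0.1"

// Parse the dotted string into its four raw bytes, then hex-encode them.
let bytes = IPAddress.Parse(address).GetAddressBytes()
let hex = BitConverter.ToString(bytes).Replace("-", "")

// Storage model used in this article: one byte per character,
// plus two bytes of overhead for the variable-length type.
let varCharBytes (value: string) = value.Length + 2
let charBytes (value: string) = value.Length

printfn "%s" hex                     // C0A80001
printfn "%d" (varCharBytes address)  // 13
printfn "%d" (varCharBytes hex)      // 10
printfn "%d" (charBytes hex)         // 8
```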

But, we're forgetting one small thing: SQL Server (at least, Microsoft SQL Server) has a Binary type. Much like the Char type, the Binary type has a fixed size. The difference is that the Binary type can store raw byte data. It takes a length, just like the Char does, so in our case, it would be Binary(4) (to store four bytes for one IPv4 address). The binary type will only store the raw data for the address, so we're left with:
  1. Byte 1: 0xC0
  2. Byte 2: 0xA8
  3. Byte 3: 0x00
  4. Byte 4: 0x01
Microsoft SQL Server also has a VarBinary type which works just like the VarChar type. It supports the same size limits: 1-8000 or MAX. It also requires two bytes of overhead for each row, just like a VarChar type.
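On the application side, getting to and from those four bytes is not much work. The sketch below uses the .NET IPAddress class and assumes the bytes are written to and read from a Binary(4) column unchanged; the database access itself is left out, and the variable names are mine.

```fsharp
open System.Net

// The four bytes a Binary(4) column would hold for 192.168.0.1.
let stored = IPAddress.Parse("192.168.0.1").GetAddressBytes()
printfn "%d" stored.Length  // 4

// Reading the value back: IPAddress can be constructed from the raw bytes.
let restored = IPAddress(stored)
let roundTripped = restored.ToString()
printfn "%s" roundTripped   // 192.168.0.1
```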

The nice thing about using a Binary type for this field is that it allows us to save a significant amount of space. By optimizing this field, we've saved up to 13 bytes of storage per row (a maximum of 17 bytes as a VarChar(15), down to 4). How significant is that? If we had 500,000,000 rows, we've saved up to 6.5GB of data. (And for big-data applications, 500,000,000 rows is insignificant.)

You might say, "well my application is small data, 500,000,000 rows is a pretty significant number, and 6.5GB for that many records is small." While that may be true, this is just one field we've optimized.

The DateTime example

Let's take another example: I've seen a lot of people use the VarChar type for DateTime data as well, when it's completely unnecessary. SQL Server has several types for DateTime data, the more useful being DateTime, DateTime2, and DateTimeOffset. Microsoft recommends that you no longer use DateTime for new work, as the DateTime2 and DateTimeOffset types align with the SQL standard, and are more portable. The DateTime2 and DateTimeOffset fields also have better precision and a larger range.

Why is this so important? You can just as easily store a DateTime as a string in a VarChar field, and then parse it later. The problem with that is that you can't filter quite so easily for certain criteria. It's easy (at least with a DateTime2 field) to filter for dates within a certain range, on a certain date, etc. It's less intuitive with any string type.

The other problem is less obvious: with a VarChar type, there is no validation done that guarantees the input string is a DateTime string. This means it's up to whatever logic you have manipulating the database to make this guarantee.
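Both problems are easy to reproduce outside the database. The sketch below shows the same effects in application code, using .NET's DateTime as a stand-in for a typed column; how SQL Server itself compares strings depends on collation, so treat this as an illustration rather than a description of the server. The sample dates and the M/d/yyyy format are assumptions made for the example.

```fsharp
open System
open System.Globalization

// Two dates stored as text, in a month/day/year format.
let earlierText = "9/1/2015"
let laterText = "10/2/2015"

// Ordinal string comparison puts "10/..." before "9/...", which is the wrong order.
let textSaysEarlierFirst = String.CompareOrdinal(earlierText, laterText) < 0

// Typed values compare chronologically.
let earlier = DateTime(2015, 9, 1)
let later = DateTime(2015, 10, 2)
let typedSaysEarlierFirst = earlier < later

// A string field accepts anything; a typed parse rejects it.
let parsedOk, _ =
    DateTime.TryParseExact("not a date", "M/d/yyyy", CultureInfo.InvariantCulture, DateTimeStyles.None)

printfn "%b" textSaysEarlierFirst   // false
printfn "%b" typedSaysEarlierFirst  // true
printfn "%b" parsedOk               // false
```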

What about the NVarChar and NChar types?

I've not discussed these so far because we were talking about binary data, which in most any form is stored in some ASCII or raw form. These types (NVarChar and NChar) are Unicode (UTF-16, specifically) variants of the VarChar and Char types, respectively. These types take two bytes per character, with the variable-length type taking an extra two bytes of overhead. In our example, were the first field type an NVarChar(15) it would have taken up to 32 bytes of data. (That's 30 bytes for the 15 characters plus two bytes of overhead.) The specifiable sizes for these two fields are any integers in the range 1-4000, or MAX for NVarChar.
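The doubling is easy to confirm with .NET's encoders, which use the same one-byte and two-byte-per-character representations for this kind of text. The sketch below only checks the arithmetic; the two-byte overhead figure is the simplified model used throughout this article, and the variable names are mine.

```fsharp
open System.Text

let address = "192.168.0.1"

// One byte per character for single-byte text, two for UTF-16.
let singleByteCount = Encoding.ASCII.GetByteCount(address)
let utf16Count = Encoding.Unicode.GetByteCount(address)

// Maximum sizes under the article's model: character bytes plus two bytes of overhead.
let maxVarChar15 = 15 * 1 + 2
let maxNVarChar15 = 15 * 2 + 2

printfn "%d" singleByteCount  // 11
printfn "%d" utf16Count       // 22
printfn "%d" maxVarChar15     // 17
printfn "%d" maxNVarChar15    // 32
```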

What do the numbers in parentheses represent?

Many fields have an optional size, precision or other parameter to represent different amounts and forms of data that can be stored within them. For all fields we're discussing in this article, the parentheses represent how many characters (for the Char, VarChar, NChar and NVarChar types), or how many bytes (for the Binary and VarBinary types) the field can store.

What are the VarChar, NVarChar and VarBinary types doing internally?

All three of these types work in a very specific way, internally. You can see that the maximum size any of the three of them can take is up to 8000 bytes, but what does that mean?

Internally, in Microsoft SQL Server, the variable length fields (which have the optional MAX specification) store data in one of two ways:
  1. For data that fits within 8000 bytes, the data is stored in-row;
  2. For data greater than 8000 bytes, the data is stored out-of-row and a pointer to the data is stored in-row.

This should help clarify what the server is doing, and what the specifications mean, and why I always cringe when I see VarChar(MAX) or NVarChar(MAX) in a situation that doesn't call for it.

In summation:

As always: know your data, know your users, and most of all, know your environment.