using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Getting started with programming and getting absolutely nowhere (Part 11)

Why do we compose?

Lesson 10: Building some 'fuzzy logic' for our names

If you've been following along in this series (which I recommend that you do, it starts way back here), you'll notice that I've done a lot of a |> (g >> f), and not as much a |> g |> f or f(g(a)). If you're a curious person (which you should be, curiosity is the only way we learn something new), you're probably wondering why. Why do I write things as a |> (g >> f) rather than the other two options? Well, today's lesson is going to go into that and help us learn what the differences are, so that hopefully things are clearer to you in the future.

First thing first, what is the purpose of these three things?

Before we begin our adventure, I want to refresh our memory back to lesson 1, when I said:

The pipe-right (|>) operator takes the value on the left side, and transforms it to be the last argument of the function on the right side.
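As a quick refresher, here is a minimal sketch of that rule. The names addFive and subtract are hypothetical helpers made up for this illustration; they are not from the earlier lessons.

```fsharp
// Hypothetical helpers, only here to illustrate pipe-right.
let addFive x = x + 5
let subtract a b = a - b

let direct = addFive 3      // an ordinary call
let piped = 3 |> addFive    // the same call, with the value written first

// With more than one parameter, the piped value fills the LAST slot:
// `10 |> subtract 3` is the same as `subtract 3 10`, which is 3 - 10.
let pipedLast = 10 |> subtract 3

printfn "%d %d %d" direct piped pipedLast
```

Note that the piped value lands in the last position, which is why 10 |> subtract 3 computes 3 - 10 and not 10 - 3.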

So we need to define an even more basic idea first, because pipe-right works on that. So let's do that.

In programming, we often work with something called a function or method. These terms are sometimes interchangeable, and sometimes not, but we'll define a function first:

A function is a symbol that takes one parameter and returns one value.

So what's a symbol?

In software-engineering, programming, whatever you call it, a "symbol" is a name mapped to a memory location. This is an important concept, because we often think of a symbol bound to a value (let x = 5) as being the value itself: it's not. The symbol x is a memory-pointer to a portion of memory that holds the value of 5.

So, think of it this way: if you look in your phone 'contacts' you see a "name", but that name often has a "number" associated with it. When you call or text someone you don't call their name, you call the number. The name is a "symbol" that points to the "number". Hopefully that helps clear it up.

So if a function is a symbol, what does it point to?

That depends, and I'm not nearly educated enough to go into full details on that, but I'll give you the "skinny":

  • When you declare a function let f = (+) 5, you declare a "symbol" (f in this case) that maps to a set of instructions ((+) 5 in this case).
  • When you call "f" (such as f(3)), you are actually asking the software to load the instructions, and then execute them with the value "3" being passed as the parameter. Of course, it's not obvious here because I used partial application (I gave (+) only one of its two arguments), so let's redefine f: let f x = x + 5. Now is it clear? We have a memory location that is a set of instructions (there are three, we'll get into that shortly) that performs x + 5.
  • When you pass x to f, you are passing it via a "stack", which is an internal memory component that can be read from and written to, but only at the top value. This is another important concept. We only ever see the "top" value of the stack, or the last one to be pushed in.
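The two ways of writing f in the list above can be put side by side. This is only a sketch: the names fPartial and fExplicit are made up here so that both versions can live in the same file.

```fsharp
// `f` written by handing (+) only its first argument (partial application).
let fPartial = (+) 5

// `f` written with the parameter spelled out.
let fExplicit x = x + 5

// Both symbols point at "add 5 to whatever I'm given".
let a = fPartial 3
let b = fExplicit 3

printfn "%d %d" a b
```

Both symbols have the signature int -> int, and calling either one with 3 gives the same answer.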

So, here's a more clear example:

  • let x = 5: declares a symbol "x" that points to an address holding the value 5;
  • let f x = x + 5: declares a symbol "f" that points to a function that expects a single parameter bound to symbol x (a different x), then executes the instruction x + 5 (plus at least two others, again I'll get to that later);
  • let y = f x: passes the value bound to symbol x as the first parameter to the function bound to symbol f, then places the result of that into an address bound to symbol y;
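Here are those three declarations as one small script, with one extra line (the symbol z, which is my addition for illustration) to show that the x inside f really is a different x:

```fsharp
let x = 5          // the outer symbol `x`
let f x = x + 5    // this `x` is the parameter of `f`, not the `x` above
let y = f x        // pass the outer `x` (5) into `f`

// Calling `f` with some other value does nothing to the outer `x`.
let z = f 100

printfn "x=%d y=%d z=%d" x y z
```

The outer x is still 5 after both calls, because the x in let f x only exists inside f.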

Alright, so does that make sense? Now we understand that f is just a symbol, that points to a memory location with some instructions. So this introduces composition, which is the idea that we can build a function out of smaller parts. Let's define g: let g x = x * 2.

So the symbol g points to a memory location that says "load and execute the three instructions", but this means we can build the composite function "g, then f": let gf = g >> f. Think of the >> operator as being "and then": we say "do g, and then do f with the result." (If you're coming from math class, be careful: that is f(g(x)), not g(f(x)).) We can also let fg = f >> g, which is "do f, then do g". The interesting thing is that the symbol gf or fg is just a symbol holding the two functions, and saying "when you get done with the first, give that result to the second, then give me the result."
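Order matters with >>, and a tiny sketch makes that visible. It uses the same f and g as the text, written with explicit parameters:

```fsharp
let f x = x + 5
let g x = x * 2

let gf = g >> f    // do g, and then do f: (x * 2) + 5
let fg = f >> g    // do f, and then do g: (x + 5) * 2

let viaGf = gf 5
let viaFg = fg 5

printfn "gf 5 = %d, fg 5 = %d" viaGf viaFg
```

Same two functions, same input, different answers, purely because of which one goes first.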

So now let's go through our example:

let x = 5 // Bind `5` to the symbol `x`, `x` is a value of `int`
let f = (+) 5 // Bind `+ 5` to the symbol `f`, `f` is a function of `int -> int`
let g = (*) 2 // Bind `* 2` to the symbol `g`, `g` is a function of `int -> int`
let gf = g >> f // Bind `g and then f` to the symbol `gf`, `gf` is a function of `int -> int`
let y = x |> gf // Bind the result of `gf x` to the symbol `y`, `y` is a value of `int`

Now at the moment we define y, gf gets executed with the value x, which in turn means g gets executed with x, then f gets executed with g(x). Interestingly, we can model this entire operation more explicitly:

let gf x =
    let x_g = g x
    f x_g

Now the problem here is that it takes more time and consideration to realize what happened (after you get used to composition, obviously): we bound an intermediate symbol to the result of g, then returned the result of f with that intermediate symbol. Interestingly, gf will have the same int -> int signature at the root level, but it is now defined as let gf (x : int) : int =, rather than let gf : int -> int =. That is a subtle but important difference: gf now directly relies upon the top value of the stack rather than indirectly.

Now I'm noting this because I want to show you the pipe-right option of gf: let gf x = x |> g |> f, which is equivalent to let gf x = f (g x). So we now have three options for writing gf (all producing the same result):

let gf x = f (g x) // Write them as parameterized calls
let gf x = x |> g |> f // Write them as piped calls
let gf = g >> f // Write them as a composition
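If you'd like to convince yourself that the three really do agree, here's a small sketch that checks them against each other. The three gf versions are renamed (gfNested, gfPiped, gfComposed) only so they can sit in one file, and the sample inputs are arbitrary.

```fsharp
let f = (+) 5
let g = ( * ) 2

let gfNested x = f (g x)         // parameterized calls
let gfPiped x = x |> g |> f      // piped calls
let gfComposed = g >> f          // composition

// Arbitrary sample inputs, chosen only for this check.
let inputs = [0; 1; 5; -3]

let allAgree =
    inputs
    |> List.forall (fun x -> gfNested x = gfPiped x && gfPiped x = gfComposed x)

printfn "All three agree: %b" allAgree
```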

I would choose the last option each and every time, and I want you to understand why: when we see let gf = g >> f, we should immediately think to ourselves that gf is a single step, composed of sub-steps g and then f. The important part there is that gf is a single step. It's a single operation, it stands for one thing to do, whereas x |> g |> f stands for multiple things to do. It may not imply it directly, but it indirectly says "take x, do g with it, then do f with that result", whereas g >> f says "do g and then do f immediately." Yes, all three mean the same thing, and yes, all three result in the same value, but it's important for us to write our code readably, because we want people to understand it. When I see f (g x) I think "do f, then do g with x, and give that to the f being done." It's much harder to see the immediate idea that f is just the next step in the pipeline, especially once you get to curried or "multi-parameter" functions.
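To give a feel for that last point about curried functions, here's a sketch with three hypothetical functions (double, isBig and describe are invented for this example, they aren't part of the series). Compare how the nested version and the composed version read:

```fsharp
// Hypothetical multi-parameter functions, for illustration only.
let double x = x * 2
let isBig threshold x = x > threshold
let describe label flag = sprintf "%s: %b" label flag

// Nested calls: you have to read from the inside out.
let nested x = describe "big" (isBig 10 (double x))

// Composition: each partially applied function is simply the next step.
let composed = double >> isBig 10 >> describe "big"

let fromNested = nested 6
let fromComposed = composed 6

printfn "%s | %s" fromNested fromComposed
```

Both give the same string, but in the composed version the steps appear in the order they happen.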

We can also note that let gf = g >> f is shorter than the other two results, in number of characters and time to type. (I'm faster at > than ( or ), for example, and I'd bet you probably are too.)

Bottom line, pick the clearest option

I want to boil this entire discussion down to one idea: write the code that's most obvious. Write the code that's most clear. Write the code that's most meaningful. We've had a trope in the software engineering world for a long time:

Write your code as if the person that has to maintain it is a violent sociopath who knows where you live.

Don't write "clever" things (unless there's a very good reason to); write things that are direct, obvious and say what they mean.

What are the three instructions I talked about earlier?

Right, so I want to clarify this one. When I said that let f x = x + 5 is three instructions, what did I mean by that?

Well, in the software engineering world we don't get anything for free (except coffee sometimes, but only if we have a good employer). We pay for everything. When we define let f x = x + 5 we have to do at least three things here:

  1. Load the top value in the stack to x;
  2. Perform the (+) operation with x and 5;
  3. Return the result;

We have to do all three, and even if we count + as two function calls (because it is: it takes two arguments, and with currying each argument is its own call), we have to do at least those three things. We have to load the top value on the stack, we have to perform our + 5 (we'll call that one instruction), and we have to return that value.
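The "two calls" idea can be sketched in F# itself. This is a conceptual picture only: I'm not claiming this is what the compiler actually emits (it is free to optimise it into something simpler), and the names firstCall and secondCall are just for illustration.

```fsharp
let f x = x + 5

// `3 + 5` can be read as two applications of (+):
let firstCall = (+) 3            // give (+) its first argument: a function int -> int
let secondCall = firstCall 5     // give it the second argument: the value 8

// Which is the same answer `f` produces for 3.
let viaF = f 3

printfn "%d %d" secondCall viaF
```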

I've slowed down on these posts for the moment, largely because a lot of my free time has been lost, but I'm going to continue to do at least one a week, and when I get more time I'll be doing them more often than that. I have a few different projects I'm working on that will all be described here in time, so we'll be going into some pretty advanced topics. It would be prudent to try to study a few of these things in your free time, as we have a lot to cover yet.