# Starting our next problem

Lesson 13: Currying (and not the 'Steph' kind)

As expected, I want to continue on to the next issue we're going to solve. One of the many things I deal with on a daily basis is Spam Filtering, which we use a third-party provider for, as well as the built-in spam filtering of our Email system. The problem, of course, ends up being that spammers have gotten really good at getting around these filters.

Now, being who I am, I'm going to share some information with you on ways we can help alleviate this issue. One of the first things we'll do is replace all the invalid / unicode code-points that are out of our normal processing scope. What do I mean by this? Well let's see what we're talking about.

## All 'text' has an encoding

Let's discuss this for a moment. All text we see has an "encoding" in a computer system — a way to map a visual character with a number. There are many encodings out there, but some of the most popular are detailed below:

• ASCII: this was one of the first encodings used on text. In this encoding we map 128 different characters to the numbers `0` through `127`, using 7 bits of space. We have since built an encoding known as "Extended ASCII", which maps an additional 128 characters to the `128` - `255` space using a full 'byte' (8-bits).
• EBCDIC: this was an encoding invented by IBM for it's main-frame systems. This maps characters in a similar manner as ASCII, but it does so with a different 'table'. (A 'table' simply being a matrix of bit combinations to the appropriate character/signal.)
• UTF-32: this maps all the unicode code-points to a 32-bit number, which indicates what code-point should be rendered.
• UTF-16: this maps all the unicode code-points to a 16-bit number, which may also have a second 16-bit number after it. The .NET framework internally uses this format, so we're going to spend a lot of interaction with it, specifically.
• UTF-8: this maps all the unicode code-points to an 8-bit number, which may also have additional 8-bit numbers after it. This is one of the most common encodings, and is backwords compatible with ASCII (`0`-`127`). You'll find this is used a lot on the web, and in cross-platform solutions because it has a somewhat low memory footprint (requires a minimum of one byte per character, up to 6 bytes per character if I recall correctly), and it's easy to work with.

I'm not going to go into detail on this subject, but you need to understand that there are many encodings, and they all mean stuff. This is a hugely immense topic, and there is a good explanation here. We just need to know that this means there are many characters to be represented, and we want to account for that.

Now Email can use any of these encodings (it's usually specified in the body somewhere), but we don't actually need to know that part. We'll actually allow the .NET API to pass us a string to a function, so knowing the encoding is less important because .NET will convert it to UTF-16. Here's what we really need to know:

• Most characters that we see are a single UTF-16 value, something between `0` and `65535`;
• Some characters are two UTF-16 values, with each being a "high surrogate" or a "low surrogate";
• If we convert the string to strictly UTF-32, each character is a single integer value;

I've actually done some of the hard part of mapping the characters to code-points (actually, I did all of the hard part), which is evidenced at this repository, under the `String.fs` file, notably `toCodePoints` and `fromCodePoints`. There is also a `toCodePoint` and `fromCodePoint` in the `Char.fs` file. The basics of these functions are noted below:

``````/// Returns the character(s) represented by the unicode code-point.
let fromCodePoint = System.Char.ConvertFromUtf32
/// Returns the unicode code-point represented by the string at the i'th position.
let toCodePoint (s : string) (i : int) = (s, i) |> System.Char.ConvertToUtf32

/// Converts the string to an array of Int32 code-points (the actual Unicode Code Point number).
let toCodePoints (str : string) : int seq =
let mapper i c =
// Ignore the low-surrogate because it's already been converted
if System.Char.IsLowSurrogate(c) then None
else i |> Char.toCodePoint str |> Some
str
|> Seq.mapi mapper
|> Seq.choose id

/// Converts the array of Int32 code-points (the actual Unicode Code Point number) to a string.
let fromCodePoints =
Array.map (Char.fromCodePoint >> explode)
>> Array.flatten
>> implode
``````

This also uses my `Array.flatten` function, which we talked about in a previous blog post.

## Ok, so what's our goal?

So the first step in this whole thing is to define the problem. Let's consider we want to flag emails which have certain terms in them, for ease of use we'll define a sample of basic words:

``````let listToCodePoints = List.map (String.toCodePoints >> Seq.toArray)
let filters = ["nope"; "fail"] |> listToCodePoints
``````

We want to flag the words `nope` and `fail`. Now, ordinarily this would be easy, because we can match the word with the filter. The problem here is that the word can have 'obsfucated' or 'confusable' characters, such as `ℕope`, `𝑵ope`, `ռope`, etc. We can also have `𝕱ail`, and `𝓕ail`, etc. So, the initial issue is that each of these strings looks like what we want to match, but the computer has no way of knowing that. So we'll define some tests:

``````let listToCodePoints = List.map (String.toCodePoints >> Seq.toArray)
let filters = ["nope"; "fail"] |> listToCodePoints
let terms = ["ℕope"; "𝑵ope"; "ռope"; "nope"; "𝕱ail"; "𝓕ail"; "pass"; "𝕿rue"; "𝓽𝓻𝓾𝓮"] |> listToCodePoints
``````

We want a process to find what "filters" each term matched. The easiest way is to build a function:

``````let matchedFilters term =
let fn = term |> transformToLowerTerm |> (=)
filters |> List.filter fn
``````

And then define `let transformToLowerTerm = id`. Right? We don't have a function for that yet, but we know it'll take a string and return one, so return the same thing (this let's us start building our function chain). Now, if we define a few more helpers, we get the actual transformation:

``````let transformToLowerTerm = id

let matchedFilters term =
let fn = term |> transformToLowerTerm |> (=)
filters |> List.filter fn

let any = (<) 0

let matchedTerms = terms |> List.filter (matchedFilters >> List.length >> any) |> List.map String.fromCodePoints
``````

We should only get `val matchedTerms : string list = ["nope"]`, just one so far. Now we need to define the transformation to let us compare our strings. For this, the Unicode Consortium gives us a list of "confusable" characters, under the Security section, which has a confusables.txt file. This file has a pretty simple CSV format, any `#` indicates a comment for the remainder of the line (ignore it, basically), and each field is separated by a semicolon (`;`). There are also tabs included, but we can strip those with `String.Trim`.

This file we're given by the Unicode Consortium is actually really easy to read with F#, about 20 lines or so:

``````type UnicodeConfusables = { Original : int; Replacement : int array; Field3 : string }

let mapToConfusable =
function
| [|a; b; c|] ->
{ Original = a |> String.trim |> Int.fromHex
Replacement = b |> String.trim |> String.split ' ' |> Array.map Int.fromHex
Field3 = c |> String.trim } |> Some
| _ -> None

let file = "http://www.unicode.org/Public/security/10.0.0/confusables.txt"
let data =
use wc = new WebClient()

let confusables =
data
|> String.split '\n'
|> Seq.filter (String.startsWith "#" >> not)
|> Seq.map (Seq.takeWhile ((<>) '#') >> Seq.toArray >> String.implode >> String.split ';')
|> Seq.choose mapToConfusable
|> Seq.toArray
``````

So now we're down to building a proper `transformToLowerTerm` function. I've already given away what it should do by the name, so you should try your hand at building it first. Here's a hint: the transform term is simply going to map any of the confusable characters (as deemed by having a presence in the `confusables` array with the same `Original` code-point value), and the lower-case term is simply going to take anything `A-Z` and change it to `a-z`.

Once you're done, you should have two functions which we can compose into the `transformToLowerTerm`:

``````let transformToLowerTerm = transformTerm >> lowerCaseTerm
``````

So if you've done it right, then the entire block here should compile:

``````let transformToLowerTerm = transformTerm >> lowerCaseTerm

let matchedFilters term =
let fn = term |> transformToLowerTerm |> (=)
filters |> List.filter fn
``````

Here are the functions I've defined, you don't have to use them but you should be able to reproduce the same thing with them, thus for any given input, my version of `lowerCaseTerm` should return the same as yours, and any given input to `transformTerm` should return the same as yours:

``````let transformTerm =
Array.map (fun codePoint ->
match codePoint |> findCandidates with
| [||] -> [|codePoint|]
| candidates -> candidates..Replacement)
>> Array.flatten

let A = 'A' |> int
let Z = 'Z' |> int
let a = 'a' |> int

let lowerCaseTerm = Array.map (function | c when c >= A && c <= Z -> c + (a - A) | c -> c)
``````

You'll notice that everything is pretty simple, possible with the exception of the math in `lowerCaseTerm`, but in the ASCII encoding (and, conveniently, the UTF-16 encoding) the characters `A-Z` are sequential, starting at value `65` (`0x41`). The characters `a-z` are also sequential, starting at `97` (`0x61`). Since our characters are already integers, we could have compared directly to `>= 0x41` and `<= 0x5A`, but it's easier to follow if we define constants that hold the values there. The `int` function will return the integer value of the input character, so when we do `int 'A'`, we get `65`. (You can try this in F# Interactive to see what I mean.)

So, for our math, we just need to test that the character is in the range `A-Z` (defined by `c >= A && c <= Z`, and if it is return `c + (a - A)`, which will take `c` and add the difference between the lower-case `a` to the upper-case `A`. This should always be `32` or `0x20`. (Since each lower-case character is exactly 32-positions ahead of the upper-case one, `c + 32` is always the lower-case character if we're an upper-case character.)

At this point we've done everything required to map the confusables for a word, so we'll look at the entire source and see what can or can't be done to improve it quick, then call it a day.

``````type UnicodeConfusables = { Original : int; Replacement : int array; Field3 : string }

let mapToConfusable =
function
| [|a; b; c|] ->
{ Original = a |> String.trim |> Int.fromHex
Replacement = b |> String.trim |> String.split ' ' |> Array.map Int.fromHex
Field3 = c |> String.trim } |> Some
| _ -> None

let file = "http://www.unicode.org/Public/security/10.0.0/confusables.txt"
let data =
use wc = new WebClient()

let confusables =
data
|> String.split '\n'
|> Seq.filter (String.startsWith "#" >> not)
|> Seq.map (Seq.takeWhile ((<>) '#') >> Seq.toArray >> String.implode >> String.split ';')
|> Seq.choose mapToConfusable
|> Seq.toArray

let listToCodePoints = List.map (String.toCodePoints >> Seq.toArray)

let filters = ["nope"; "fail"] |> listToCodePoints
let terms = ["ℕope"; "𝑵ope"; "ռope"; "nope"; "𝕱ail"; "𝓕ail"; "pass"; "𝕿rue"; "𝓽𝓻𝓾𝓮"] |> listToCodePoints

let findCandidates codePoint = confusables |> Array.filter (fun x -> x.Original = codePoint)

let transformTerm =
Array.map (fun codePoint ->
match codePoint |> findCandidates with
| [||] -> [|codePoint|]
| candidates -> candidates..Replacement)
>> Array.flatten

let A = 'A' |> int
let Z = 'Z' |> int
let a = 'a' |> int

let lowerCaseTerm = Array.map (function | c when c >= A && c <= Z -> c + (a - A) | c -> c)

let transformToLowerTerm = transformTerm >> lowerCaseTerm

let matchedFilters term =
let fn = term |> transformToLowerTerm |> (=)
filters |> List.filter fn

let any = (<) 0

let matchedTerms = terms |> List.filter (matchedFilters >> List.length >> any) |> List.map String.fromCodePoints
terms |> List.map (transformToLowerTerm >> String.fromCodePoints)
``````

Right off the bat I had an idea to eliminate some of the complexities of `lowerCaseTerm`, let's see where it goes.

Let's look at what we want to do again: if it's a character between `'A'` and `'Z'` we want to replace it with the lower-case (add `32`) version.

``````let lowerCaseTerm =
let uppers = ['A'..'Z'] |> List.map int
Array.map (fun c ->
if uppers |> List.contains c then c + 32
else c)
``````

Now this may not be shorter, but there's no more complicated math (since we've hardcoded `32`, we could replace that with our `a - A` again, but instead we'll just add the 32. Of course, we can also take advantage of the framework here, and use `System.Char.IsUpper`, but that works specifically on the `char`. We could convert our `int` to `char`, but then there is a possibility of converting an invalid value. For this reason, we'll stick with one of the first two options. We could also rewrite the `Array.map` as:

``````Array.map (function | c when uppers |> List.contains c -> c + 32 | c -> c)
``````

Which does the same thing, but with a quick (pointless) pattern-match. Again, what you desire to use here is your choice, and I am simply offering different alternatives that do the same thing. This is to demonstrat that you can often do the same thing in many ways with F#. As always, I recommend writing it in the way that is most clear and concise. Don't be clever, write it in a way that you (and others) can understand it.

Once we're done our output should look something like the following:

``````val matchedTerms : string list =
["ℕope"; "𝑵ope"; "ռope"; "nope"; "𝕱ail"; "𝓕ail"]
val it : string list =
["nope"; "nope"; "nope"; "nope"; "fail"; "fail"; "pass"; "true"; "true"]
``````

I've been running this in F# Interactive, so it automatically assigns `terms |> List.map (transformToLowerTerm >> String.fromCodePoints)` to `it`. Otherwise, everything ought to make sense: the first list is just terms that look like `nope` or `fail`, and the second is all terms transformed to lower-case.

## Where we (could) fail

One of the biggest issues, that might not be immediately apparent, with how we do this is: what happens if we have multiple confusables for a character? We only transform the first one, which means we could be missing out on some possible conversions. We'll go over fixing this in the next lesson, but I want you to start thinking about it now. Essentially, our goal will be to swap out the `transformTerm` with a version that returns every possible option (which for each of our samples is still only `1`).

Overall we should be heading down a new track this time, we won't be using a lot of the old stuff we learned, because we now need to think about a new type of problem. This next section of this series will be bringing us into another domain, which we will use to further define our knowledge. I want you to continue to explore these ideas before I bring the series to them, that way you can start testing your knowledge.

# Currying (and not the 'Steph' kind)

Lesson 12: How do we become experts?

In this lessson we're going to talk about currying (and I do mean neither the food nor the basketball player), which is one of the most complex topics in F#. The F# designers (shout out to Don Syme, Phillip Carter, and many others) have done a wonderful job of implementing this very difficult and complex idea, and it's important that we, as F# developers, understand how it works.

Currying is the idea that a function can rewrite itself to support having only one parameter. This is hugely important, and I haven't yet clarified this in our adventure, so I want to do so here. My friend Chris asked for this discussion on Twitter, and I feel absolutely obliged to make it happen.

I want to start this chat off with a quick analogy: you're standing at the checkout line in the grocery store, you start putting all your items on the belt, but the person running the register can only scan one item at a time. Since they want to keep things going quickly, they scan your items in the same order you put them on in. This is similar to currying, where the person who is running the register would be the 'main function' you are calling, but due to the ability to only handle one thing at a time, they take one, handle it, then move to the next one. Now part of the way through you decided you want a 'subtotal', they stop taking items, and deliver the result at that moment to you. This is the partial application of functions.

This analogy is what I want to liken to currying, this is a very difficult concept to understand, and the designers and engineers of F# have done a wonderful job making it happen, they really have. I really want you to have a clear understanding of how currying works, and why we do it. F# makes this extremely trivial: we're going to use the example of summing a list of integers with the `List.sumBy` method, rather than `List.sum`. We'll use this to demonstrate what exactly happens with currying.

To start, we'll define a `sumBy1` function, which takes `a` (a list), and computes `List.sumBy` with a `map` (projection function). This is a simple task, simply pass `map` to the `List.sumBy` function, then pass `a`.

``````let sumBy1 map a =
a |> List.sumBy map
``````

Next, we want to 'curry' this function. As I mentioned before, 'currying' is the process of rewriting the function so that it has one parameter, not two. Here, we do this by pushing `a` to an `inner` function, and then returning that inner function. We effectively move the `a` parameter out of the main function, and into an inner function. If you look at the type signature of this, you'll see it's an `('a -> int) -> 'a list -> int`, which our previous function was as well. This should help introduce the 'partial application' process I mentioned before, where we can run just one function and get a result of a partial applied function. (Here, we could call `sumBy2` with a `map`, and not an `a`, and get the 'partially applied' version, we can also do that with the previous version: `sumBy1`.)

Another good point to discuss here is that signature. The `a -> b -> c` signature that we're seeing on these functions tells us how F# curried it. Each of the arrows is a new result, it's a return. So `a -> b -> c` takes an '`a`', and returns a `b -> c`, which is a function. So it takes the first input, then returns a function which can take the second input, and then returns the final result. This should begin to make it clear why we are discussing currying and partial application together: they're directly related.

We can write a pre-curried version of our `sumBy1` function, which we'll call `sumBy2`, and have it posess the same `('a -> int) -> 'a list -> int` signature.

``````let sumBy2 map =
let inner a =
a |> List.sumBy map
inner
``````

Here's an interesting observation: we magically created a new variable within our function, we never got passed an `a`, but we managed to create it out of nowhere. It just appeared. That's how currying works. It doesn't care about the parent function at all, it's concerned about the individual function currently in scope. We can create variables at will, and gather a whole new featureset, we don't have to even worry of the parent function.

Now with this `sumBy2`, it should be obvious that `inner` can be rewritten again to eliminate the inner function altogether, and we can make an actual function that is the inner function, as seen below:

``````let sumBy3Inner map = map |> List.sumBy
``````

What this introduces us to is composition, realistically, we could write `let sumFn = sumBy3Inner id`, for example, to build the partially applied version of our function. What should be interesting here is that we're piping the `map`, even though we're not executing the function. We're simply building it up. We could call `a |> sumBy3Inner id`, but we're going to build a `sumBy3` to further compose the chain.

``````let sumBy3 a = a |> sumBy3Inner
``````

Now we get into some complex currying. We want to write our function such that the arguments are reversed: we take the list first, then the function second. This leaves us with the `sumBy4` example, but it's more difficult than that, we need to find a way to curry it out.

``````let sumBy4 a map =
a |> List.sumBy map
``````

Curiously, the solution to this is actually the exact same solutions as we did for `sumBy2`, but instead of `inner a`, we'll build `inner map`. The same partial application aspects apply, and we've built a whole new function inside the previous one. The issue here, is that we can't abstract the `inner` function out, it's just not possible.

``````let sumBy5 a =
let inner map =
a |> List.sumBy map
inner
``````

Now if we collect the sum of each of these functions, we find that each of the results is a single `int` - this tells me our functions probably work, because we're working with very different types. We can also, of course, run this in F# interactive and prove it to ourselves, which we'll do.

``````let sum1 = [1..100] |> sumBy1 id
let sum2 = [1..100] |> sumBy2 id
let sum3 = [1..100] |> sumBy3 id
let sum4 = id |> sumBy4 [1..100]
let sum5 = id |> sumBy5 [1..100]
``````

So all five of our values here result in the same thing, an `int` of the value `5050`. This demonstrates that currying has no net effect on the result. The next question you might have is how does this apply to more than two parameters? The same as it applies to just two:

``````let fnWith3Params a b c =
a + b + c

let curriedFnWith3Params a =
let innerFn1 b =
let innerFn2 c =
a + b + c
innerFn2
innerFn1
``````

We just recursively apply the currying. The most inner function returns the full result, and each outer function returns it's inner. This is all the magic, which can hardly be classified as magic. Now when we talk about partial application, there's no magic here either. Now that the function is curried, we can pretty cleanly explain partial application.

When we partially apply we don't do anything particularly abstract, we actually do some pretty standard stuff. With our `curriedFnWith3Params` example, when we `let tempFn = curriedFnWith3Params 1`, we simply get `innerFn1` with the `innerFn2` value `a` pre-applied. When we `let tempFn = tempFn 2`, we get `innerFn2` with `a` and `b` pre-applied (or baked-in), then when we call `tempFn 3`, we get the result of that which is `6`. That's it! No magic!

``````let add1 = 1 + 2 + 3
let add2 = fnWith3Params 1 2 3
let add3 = curriedFnWith3Params 1 2 3
``````

As previously demonstrated, our three-parameter function results are all the same. We've proven that we get the same result, regardless of which form we use to write it, we have also proven that currying can be done, and it can easily be done, and we can even abstract the curried functions out of the main function and reuse them.

By now you should have a basic idea of how this comes in to play with composition. It should be somewhat clear that when we build a composition chain, we're essentially asking the compiler to build a new function that is a very straightforward chain of the others. These can even be curried as well.

At this point, you may be asking yourself: "why is this important, why do I need to understand this?" There's a very long, drawn out and contrived answer to this, so instead, I'm going to be realistic: you probably don't need to know the drawn-out details of currying, but here's what you do need to know:

• A function has exactly one parameter, so if it looks like it has more than one, it really doesn't. Each parameter produces a new function with exactly one parameter, in the case of multi-parameter functions.
• When we call one of these functions, we are actually calling into a tree of functions, which means we can stop at any branch and get the current "prepared" chain.
• A comma-separated list of arguments, and a space-separated (with no comma) list of arguments are different: one builds a set of tupled parameters, the other builds a set of functions.

For a developer coming from C#, C++, VB.NET, PHP, Java, or JavaScript (or any other language where you separate parameters/arguments by commas), this might seem weird, but when you do that in F# it's considered one argument. A function defined `let fn (p1, p2, p3)` is not the same as a function defined `let fn p1 p2 p3`: the first function takes a single parameter that has three parts, and the second function is three curried functions each taking a single parameter for it's part.

Interestingly, we can apply the aspect of currying to our own design patterns, instead of needing all the parameters up-front, we can build functions that take some of the parameters and then return a new function to take the rest. This let's us maintain SRP (Single-Responsibility Principle) in F#, but in a totally new fashion. We don't need to take a parameter that's currently useless, we can leave that to the function it's being handled by. We can return a new function, which is natively built-in to the language, so our users don't have any idea that we didn't need the new parameter initially.

I actually demonstrated this the other day, in the example of bad code I wrote. When I have the `match p |> getSpecificity with`, I return one of two functions. Both of those functions have the same signature, so I could Have just embedded them in the `match`. Instead, I took advantage of function currying, so the API only has what is needed, and comes up with new parameters on the fly.

Now I could go on explaining away how this works, I really could. I've been working with this for some time now (48 hours, in fact), and I could tell you another 2-3 pages of storytime. Instead, I want to direct you to one of Scott Wlaschin's references, F# For Fun and Profit, particularly the currying section, as well as the partial application section.

I hope this was useful for you, and I hope that even though this was long winded (100+ lines of explanation for 30 lines of code) you still learned something. My first (and only, really) goal with this series is to help guide you to a new path of learning, I want to share what I know with you, with the hopes that one day you'll share what you know with the rest of us.

# How do we become experts?

Lesson 11: Why do we compose?

As I've written this series, I've realized I haven't covered one very important aspect of becoming a programmer: how do we become experts? How do we get better?

## Programming well is not easy

Any seasoned software developer can tell you that becoming a good programmer is not an easy task, but I'll let you in on a little secret: becoming good at anything requires a lot of practice, and many failures. If you ask any athlete, you'll get the same response. Programming shares this facet.

So with all this said, I want it to be painfully clear: programming takes practice. It takes a lot of practice, refining your skills, honing your talents, and focusing on what's important: solving problems. As a software engineer you are expected to solve problems, in whatever manner you must. Because programming takes practice, I want to continue solving problems with you, and guide you towards how we should think about them.

If you've ever wondered how the 'pros' write complex, structured code, it's actually just a great deal of practice and studying (much like any other activity). In fact, we often write simplistic code at the beginning, and begin reducing it to more complex, but concise code. We're going to explore some examples of that today.

## Example one: eliminating lambda's

One of the most common things I do in F# is a lambda. That is, an anonymous function of the form `fun <parameters> -> <expression>`, you'll find that you end up using these a lot because they're easy to express: define the activity you need to do and do it. We'll take the following example and eliminate each lambda.

Initially you might write some code that looks like the following (I purposefully violated a couple aspects of F# here, bear with me we'll fix them):

``````let rows =
[parameters.InitialOffset..mainSheet.LastRowNum]
|> List.map (fun num -> num |> mainSheet.GetRow)
|> List.map (fun row ->
match row with
| null -> None
| _ -> Some row)
|> List.filter (fun rowOpt -> rowOpt.IsSome)
|> List.map (fun x -> x |> Option.get)
``````

Which is clear, tells us exactly what is happening, but can be made shorter and more idiomatic.

The first thing to note is the `List.map` lambda. If you ever see a lambda of the form `fun x -> x |> someFunctions`, you should understand that you can completely eliminate the lambda here. We can rewrite `fun x -> x |> someThing` as `someThing`, no need for the `fun x -> x |>`. This also applies if you're writing functions of the form `someThing(x)`, or `someThing x`. If the lambda has one parameter, and we're using that parameter only for a single call, then we can remove the entire lambda and replace it with the function chain.

``````|> List.map mainSheet.GetRow
``````

Boom, done. Next, we can eliminate that last `List.filter` and `List.map` with a single `List.choose`:

``````|> List.choose (fun x -> x)
``````

Which now introduces an interesting idea: how do we reduce `fun x -> x` to a single function call?

Well, F# has a handy, built-in function called the 'identity function', or `id` for short. This function is essentially the same as our lambda, so we can replace `fun x -> x` with `id`:

``````|> List.choose id
``````

Next, we should realize that `List.choose` can work in place of our second `List.map`, so we'll replace that:

``````let rows =
[parameters.InitialOffset..mainSheet.LastRowNum]
|> List.map mainSheet.GetRow
|> List.choose (fun row ->
match row with
| null -> None
| _ -> Some row)
``````

And now we get to another interesting function that requires some framework knowledge: we can actually replace that entire `List.choose` lambda with a single function: `Option.ofObj`. This function takes any 'object' type and returns an `Option`: `None` if the object was `null`, and `Some object` if it was not.

``````let rows =
[parameters.InitialOffset..mainSheet.LastRowNum]
|> List.map mainSheet.GetRow
|> List.choose Option.ofObj
``````

There is also an `Option.ofNullable`, which works on nullable value-types (like a nullable 'int').

## Example two: identifying functionality that can be abstracted

In one of my work projects, I have the following function, which should definitely be reduced:

``````let compare (m1 : SpecPerson) (m2 : SpecPerson) =
let result =
m1.FirstName = m2.FirstName
&& m1.LastName = m2.LastName
&& m1.State = m2.State
let result =
result &&
match m1.MagicId, m2.MagicId with
| Some a, Some b -> a = b
| _ -> true
let result =
result &&
match m1.MiddleName, m2.MiddleName with
| Some a, Some b -> a = b
| _ -> true
let result =
result &&
match m1.City, m2.City with
| Some a, Some b -> a = b
| _ -> true
let result =
result &&
match m1.Zip, m2.Zip with
| Some a, Some b -> a = b
| _ -> true
let result =
result &&
match m1.OtherThing, m2.OtherThing with
| Some a, Some b -> a = b
| _ -> true
result
``````

If it's not painfully obvious here, I'm comparing several values, and in the case of optionals, if not both of the things are `Some`, then considering them `true`. We can really abstract that to a higher-order function:

``````let compare (m1 : SpecPerson) (m2 : SpecPerson) =
let optionalComparison a b =
match a, b with
| Some a, Some b -> a = b
| _ -> true
m1.FirstName = m2.FirstName
&& m1.LastName = m2.LastName
&& m1.State = m2.State
&& optionalComparison m1.MagicId m2.MagicId
&& optionalComparison m1.MiddleName m2.MiddleName
&& optionalComparison m1.City m2.City
&& optionalComparison m1.Zip m2.Zip
&& optionalComparison m1.OtherThing m2.OtherThing
``````

Now doesn't that seem nicer? We should always try to identify what's common and pull it up a level, so we can reuse it. This is a really tough thing to be able to do in the beginning, so I don't expect you to get it right away. As you gain experience you'll learn what can be lifted out more effectively, and you'll gather a stronger understanding of what is important to lift out.

## Example three: think critically about what you're doing

I recently had the following `match` in a project I was working on:

``````match p with
| One OtherSpecific
| Multiple (AnyNumber OtherSpecific)
| Multiple (Two OtherSpecific)
| Multiple (Zero OtherSpecific)
| Multiple (Many OtherSpecific) -> doSpecificThing
| One OtherNonspecific
| Multiple (AnyNumber OtherNonspecific)
| Multiple (Two OtherNonspecific)
| Multiple (Zero OtherNonspecific)
| Multiple (Many OtherNonspecific)  -> doNonspecificThing
``````

Of course, we can really improve this, what terrible dev wro—....oh yeah, me. We should realize that we're doing something that can easily be pushed away:

``````let getSpecificity q =
match q with
| One s
| Multiple (AnyNumber s)
| Multiple (Two s)
| Multiple (Zero s)
| Multiple (Many s) -> s

match p |> getSpecificity with
| OtherSpecific -> doSpecificThing
| OtherNonspecific -> doNonspecificThing
``````

I'm supposed to be a professional and I let that ugly mess live in a project for quite some time, so if you ever begin to doubt yourself: even the good programmers do it.

You may also notice that this is no shorter than the original version (both are 11 lines, though the second version has a debatable line-break), but interestingly, this version is reusable, if we ever have to `getSpecificity` elsewhere, we can. This should be incredibly important to you, and you absolutely must consider it: if something is generic, build it to reuse it.

## You may not be perfect, but no one is

I don't think I've stated it in this series yet, so here we go: you may not be perfect, but no one is. Programming is a way of thinking, it's an entire mindset. If you stick with it, you'll eventually become so familiar with the entire process that it'll become second nature, you'll begin looking at things more clearly. There will be several "aha" moments — one day you'll be sitting on the porch thinking about something completely unrelated, and then an epiphany will hit, and things become immedately clear. This happens to all of us, and it will most probably happen to you.

Every time I write one of these I reflect on what the previous lessons include, and continually disappoint myself. I really, really expected to do better for you all, and I promise it'll come. If anyone wants a specific topic or 'thing' (where 'thing' can be a feature, concept, problem) covered, please let me know on Twitter. I want you to want to read this, and for that to happen I need to know what you want to see. You can also comment on this post, I read all comments and (unfortunately) delete most as spam, but it's a safe medium to reach out.

# Why do we compose?

If you've been following along in this series (which I recommend that you do, it starts way back here), you'll notice that I've done a lot of `a |> (g >> f)`, and not as much `a |> g |> f` or `f(g(a))`, if you're a curious person (which you should be, curiosity is the only way we learn something new), you're probably wondering why. Why do I write things as a |> (g >> f)` rather than the other two options? Well, today's lesson is going to go into that and help us learn what the differences are, so that hopefully things are more clear to you in the future.

## First thing first, what is the purpose of these three things?

Before we begin our adventure, I want to refresh our memory back to lesson 1, when I said:

The pipe-right (`|>`) operator takes the value on the left side, and transforms it to be the last argument of the function on the right side.

So we need to define an even more basic idea first, because pipe-right works on that. So let's do that.

In programming, we often work with something called a function or method, now these terms are sometimes interchangeable, and sometimes not, but we'll define a function first:

A function is a symbol that takes one parameter and returns one value.

## So what's a symbol?

In software-engineering, programming, whatever you call it, a "symbol" is a name mapped to a memory location. This is an important concept, because we often thing of symbols to a value (`let x = 5`) as being the value themselves: they're not. The symbol `x` is a memory-pointer to a portion of memory that holds the value of `5`.

So, think of it this way: if you look in your phone 'contacts' you see a "name", but that name often has a "number" associated with it. When you call or text someone you don't call their name, you call the number. The name is a "symbol" that points to the "number". Hopefully that helps clear it up.

## So if a function is a symbol, what does it point to?

That depends, and I'm not nearly educated enough to go into full details on that, but I'll give you the "skinny":

• When you declare a function `let f = (+) 5`, you declare a "symbol" (`f` in this case) that maps to a set of instructions (`(+) 5` in this case).
• When you call "`f`" (such as `f(3)`), you are actually asking the software to load the instructions, and then execute them with the value "`3`" being passed as parameter. Of course, it's not obvious here because I used composition, so let's redefine `f`: `let f x = x + 5`. Now is it clear? We have a memory location that is a set of instructions (there are three, we'll get into that shortly) that performs `x + 5`.
• When you pass `x` to `f`, you are passing it via a "stack", which is an internal memory component that can be read to and written to, but only the top value. This is another important concept. We only ever see the "top" value of the stack, or the last one to be pushed in.

So, here's a more clear example:

• `let x = 5`: declares a symbol "`x`" that ponts to an address of value `5`;
• `let f x = x + 5`: declares a symbol "`f`" that points to a function that expects a single parameter bound to symbol `x` (a different `x`) then exectues the instruction `x + 5` (plus at least two others, again I'll get to that later);
• `let y = f x`: declares a symbol `y` that is the symbol `x` passed as the first parameter to the function of symbol `f`, then places the result of that into an address bound to symbol `y`;

Alright, so does that make sense? Now we understand that `f` is just a symbol, that points to a memory location with some instructions. So this introduces composition, which is the idea that we can build a function out of smaller parts. Let's define `g`: `let g x = x * 2`.

So the symbol `g` points to a memory location that says "load and execute the three instructions", but this means we can build the composite function `g of f`: `let gf = g >> f`. Think of the `>>` operator as being "and then", we say "do `g`, and then do `f` with the result." We can also `let fg = f >> g`, which is "do `f`, then do `g`". The interesting thing is that the symbol `gf` or `fg` is just a symbol holding the two functions, and saying "when you get done with the first, give that result to the second, then give me the result."

So now let's go through our example:

``````let x = 5 // Bind `5` to the symbol `x`, `x` is a value of `int`
let f = (+) 5 // Bind `+ 5` to the symbol `f`, `f` is a function of `int -> int`
let g = (*) 2 // Bind `* 2` to the symbol `g`, `g` is a function of `int -> int`
let gf = g >> f // Bind `g and then f` to the symbol `gf`, `gf` is a function of `int -> int`
let y = x |> gf // Bind the result of `gf x` to the symbol `y`, `y` is a value of `int`
``````

Now at the moment we define `y`, `gf` gets executed with the value `x`, which in tern means `g` gets executed with `x`, then `f` gets executed with `g(x)`. Interestingly, we can model this entire operation more clearly:

``````let gf x =
let x_g = g x
f x_g
``````

Now the problem here is that it takes more time and consideration to realize what happened (after you get used to composition, obviously): we bound an intermediate symbol to the result of `g`, then returned the result of `f` with that intermediate symbol. Interestingly `gf` will have the same `int -> int` signature at the root level, but it is now defined as `let gf (x : int) : int =`, rather than `let gf : int -> int =`, which is a subtle but important difference: `gf` now directly relies upon the top value of the stack rather than indirectly.

Now I'm noting this because I want to show you the pipe-right option of `gf`: `let gf x = x |> g |> f`, which is equivalent to `let gf x = f (g x)`. So we now have three options for writing `gf` (all being an equivalent result):

``````let gf x = f (g x) // Write them as parameterized calls
let gf x = x |> g |> f // Write them as piped calls
let gf = g >> f // Write them as a composition
``````

I would choose the last result each and every time and I want you to understand why: when we see `let gf = g >> f`, we should immediately think to ourselves that `gf` is a single step, composed of sub-steps `g` and then `f`. The important part there is that `gf` is a single step, it's a single operation, it stands for one thing to do, whereas `x |> g |> f` stands for multiple things to do. It may not imply it directly, but it indirectly says "take `x`, do `g` with it, then do `f` with that result", whereas `g >> f` says "do `g` and then do `f` immediately." Yes, all three mean the same thing, and yes, all three result in the same value, but it's important for us to write our code readably, we want people to understand it. When I see `f (g x)` I think "do `f`, then do `g` with `x`, and give that to the `f` being done." It's much harder to see the immediate idea that `f` is just the next step in the pipeline, especially once you get to curried or "multi-parameter" functions.

We can also note that `let gf = g >> f` is shorter than the other two results, in number of characters and time to type. (I'm faster at `>` than `(` or `)`, for example, and I'd bet you probably are too.)

## Bottom line, pick the clearest option

I want to boil this entire discussion down to one idea: write the code that's most obvious. Write the code that's most clear. Write the code that's most meaningful. We've had a trope in the software engineering world for a long time:

Write your code as if the person that has to maintain it is a violent sociopath who knows where you live.

Don't write "clever" things (unless theres a very good reason to), write things that are direct, obvious and say what they mean.

## What are the three instructions I talked about earlier?

Right, so I want to clarify this one, when I said we define `let f x = x + 5` being three instructions, what do I mean by that?

Well, in the software engineering world we don't get anything for free (except coffee sometimes, but only if we have a good employer). We pay for everything. When we define `let f x = x + 5` we have to do at least three things here:

1. Load the top value in the stack to `x`;
2. Perform the `(+)` operation with `x` and `5`;
3. Return the result;

We have to do all three, and even if we write-off `+` as two function calls (because it is, it takes two arguments so it's two calls), we have to do at least those three things. We have to load the top value on the stack, we have to perform our `+ 5` (we'll call that one instruction), and we have to return that value.

I've slowed down on these posts for the moment, largely because a lot of my free time has been lost, but I'm going to continue to do at least one a week, and when I get more time I'll be doing them more than that. I have a few different projects I'm working on that will all be described here in time, so we'll be going into some pretty advanced topics, so it would be prudent to try to study a few of these things in your free time, as we have a lot to cover yet.

# Building some 'fuzzy logic' for our names

Lesson 9: Knowing the terminology is important

If you recall, in lesson 7 I mentioned the following:

We'll then (at a later date) explore how we can modify it to account for multi-part last names, such as `LA ROUSE`, or `MAC GREGOR`. This is a lot harder than it looks, and we'll only worry about that for `Sheet 2`, because in `Sheet 1` it's always separated out.

Today, we're going to explore this part of our topic - we'll build some 'fuzzy logic' that doesn't seem fuzzy, but happens to be. Of course, first we need to define 'fuzzy logic', so let's do that.

## What is 'fuzzy logic'?

Fuzzy logic is used fairly often in programming, it's essentially building a set of rules that sacrifice one of the following to make up for the other two: predictability, accuracy, and speed. Ours will sacrifice accuracy for speed and predictability. We're going to build a function that is mostly correct, but sometimes wrong, to allow us to solve the 95% scenario.

We need some "rules" for our fuzzy logic, so let's define them:

• If a word is <= 3 characters, it's part of a "compound word";
• A compound word should be grouped with the next word if possible;
• If a word is a single character followed by a single period, in any quantity, it's an "initial";

In [lesson 8], we built `splitNames`:

``````let splitNames (name : string) =
match ' ' |> name.Split with
| [|first; last|] -> (Some first, None, Some last)
| [|first; middle; last|] -> (Some first, Some middle, Some last)
| _ -> (None, None, None)
``````

This isn't fuzzy logic so much as bad logic: we just assume that two parts = first, last name, and three parts = first, middle, last name. We make absolutely no guarantee that there's even an apporpriate value there, for the name of `John Mac Gregor`, we get `(Some John, Some Mac, Some Gregor)`. That's not right at all, for `Rhonda La Rouse`, we get `(Some Rhonda, Some La, Some Rouse)`, again wrong.

So let's define portions of our 'fuzzy logic':

``````let isInitial : char array -> bool = (function | [|_; '.'|] -> true | _ -> false)
``````

Good start, we test a `char array` for two elements of `anything, '.'`. We can test this easily: `"a." |> Seq.toArray |> isInitial`, which passes.

Now we need a function, `isOnlyInitials` which tells us if a `string` only consists of `initials`, i.e. `D.W.` should be `true`:

``````let isOnlyInitials : string -> bool =
Seq.toArray
>> Array.mapi (fun i x -> (x, i))
>> Array.groupBy (fun (x, i) -> i / 2)
>> Array.map snd
>> Array.map (Array.map fst)
>> Array.forall isInitial
``````

This is an interesting function, because we use `mapi` to give us the index and element, then `groupBy` to put them into pairs, then `map` to only get the second element (`x, i`), then `map (map fst)` to give us just the first element from each sub-array. We're left with just `forall isInitial`, which is equivalent to `Array.filter isInitial |> Array.length = 0`. Simple, right?

## Let's appy this to our existing function

So how do we apply this? We need to modify `splitNames` to account for the initial, and for the compound name. To do so, we'll get rid of the `match`, because we're now just going to work our fuzzy logic. First we need to get the parts:

``````let nameParts = ' ' |> name.Split
``````

That's a good start, and it gives us an array of `string` objects representing each part. Previously this was all we needed, but now we need to put them into groupings based on our new rules.

Now we need to "fold" over our names, and collect them into an array of names that are based on our rules. For this we'll build a folder that takes a `string list` and then a `string` and returns a `string list`, with the first element being the last element tested.

The basic idea here is to match `acc` with `prev::rem`, and if `prev` is <= 3 characters, and it's not only initials, then our new `prev` should be `prev part`.

``````nameParts
|> Array.fold (fun acc pt ->
match acc with
| prev::rem
when prev |> Seq.length <= 3
&& prev |> (isOnlyInitials >> not) ->
(sprintf "%s %s" prev pt)::rem
| _ -> pt::acc
) []
``````

## Make it work for Ron, darnit!

Pretty simple, right? So if we test it with a basic data-set: `["John Mac Gregor"; "Rhonda La Rouse"; "John C. Mac Gregor"; "Rhonda A. La Rouse"; "La Ronda Doe"] |> List.map splitNames;;`, we can see that it returns what we expected. Now, we lose one particular case I ran into: `"Do Ray Doe"`, which turns into `(Some "Do Ray", None, Some "Doe")`, when it should have been `(Some "Do", Some "Ray", Some "Doe")`. We also lost support for `Ron Doe`, which returns all `None` because `Ron` is paired with `Doe`. To fix this we'll alter our `when` guard clause:

``````let splitNames (name : string) =
let isInitial : char array -> bool = (function | [|_; '.'|] -> true | _ -> false)
let isOnlyInitials : string -> bool =
Seq.toArray
>> Array.mapi (fun i x -> (x, i))
>> Array.groupBy (fun (x, i) -> i / 2)
>> Array.map snd
>> Array.map (Array.map fst)
>> Array.forall isInitial
let nameParts = ' ' |> name.Split
let nameParts =
nameParts
|> Array.fold (fun acc pt ->
match acc with
| prev::rem
when prev |> Seq.length <= 3
&& prev |> (isOnlyInitials >> not)
&& rem |> List.length >= 1 ->
(sprintf "%s %s" prev pt)::rem
| prev::rem
when prev |> Seq.length <= 2
&& prev |> (isOnlyInitials >> not) ->
(sprintf "%s %s" prev pt)::rem
| _ -> pt::acc
) []
|> Array.ofList
|> Array.rev
match nameParts with
| [|first; middle; last|] -> (Some first, Some middle, Some last)
| [|first; last|] -> (Some first, None, Some last)
| _ -> (None, None, None)
``````

We also added a second `prev::rem` case, the first matches if there's already more than one other word with a length of <= 3 (so `Ron Doe` is fixed), the second matches any if the first word is <= 2 characters (working for `La Ronda`, but not properly for `Bo David`). Now we could add a filter for `Bo` to not match on the `<= 2` guard clause, but that adds a maintainability issue. Instead, we just document that it's 99% accurate, and what cases we know won't work properly.

Today's lesson is somewhat short, but introduces a very complex topic. I recommend that you make sure you fully understand what is happening, as it's important to know. Also, if anyone has a better idea on the `isOnlyInitials` function, I'd be very interested in hearing about it. The grouping by `index / 2` works, but I'm sure there's a better way to do it.