# Building some 'fuzzy logic' for our names

Lesson 9: Knowing the terminology is important

If you recall, in lesson 7 I mentioned the following:

We'll then (at a later date) explore how we can modify it to account for multi-part last names, such as `LA ROUSE`, or `MAC GREGOR`. This is a lot harder than it looks, and we'll only worry about that for `Sheet 2`, because in `Sheet 1` it's always separated out.

Today, we're going to explore this part of our topic - we'll build some 'fuzzy logic' that doesn't seem fuzzy, but happens to be. Of course, first we need to define 'fuzzy logic', so let's do that.

## What is 'fuzzy logic'?

Fuzzy logic is used fairly often in programming, it's essentially building a set of rules that sacrifice one of the following to make up for the other two: predictability, accuracy, and speed. Ours will sacrifice accuracy for speed and predictability. We're going to build a function that is mostly correct, but sometimes wrong, to allow us to solve the 95% scenario.

We need some "rules" for our fuzzy logic, so let's define them:

• If a word is <= 3 characters, it's part of a "compound word";
• A compound word should be grouped with the next word if possible;
• If a word is a single character followed by a single period, in any quantity, it's an "initial";

In [lesson 8], we built `splitNames`:

``````let splitNames (name : string) =
match ' ' |> name.Split with
| [|first; last|] -> (Some first, None, Some last)
| [|first; middle; last|] -> (Some first, Some middle, Some last)
| _ -> (None, None, None)
``````

This isn't fuzzy logic so much as bad logic: we just assume that two parts = first, last name, and three parts = first, middle, last name. We make absolutely no guarantee that there's even an apporpriate value there, for the name of `John Mac Gregor`, we get `(Some John, Some Mac, Some Gregor)`. That's not right at all, for `Rhonda La Rouse`, we get `(Some Rhonda, Some La, Some Rouse)`, again wrong.

So let's define portions of our 'fuzzy logic':

``````let isInitial : char array -> bool = (function | [|_; '.'|] -> true | _ -> false)
``````

Good start, we test a `char array` for two elements of `anything, '.'`. We can test this easily: `"a." |> Seq.toArray |> isInitial`, which passes.

Now we need a function, `isOnlyInitials` which tells us if a `string` only consists of `initials`, i.e. `D.W.` should be `true`:

``````let isOnlyInitials : string -> bool =
Seq.toArray
>> Array.mapi (fun i x -> (x, i))
>> Array.groupBy (fun (x, i) -> i / 2)
>> Array.map snd
>> Array.map (Array.map fst)
>> Array.forall isInitial
``````

This is an interesting function, because we use `mapi` to give us the index and element, then `groupBy` to put them into pairs, then `map` to only get the second element (`x, i`), then `map (map fst)` to give us just the first element from each sub-array. We're left with just `forall isInitial`, which is equivalent to `Array.filter isInitial |> Array.length = 0`. Simple, right?

## Let's appy this to our existing function

So how do we apply this? We need to modify `splitNames` to account for the initial, and for the compound name. To do so, we'll get rid of the `match`, because we're now just going to work our fuzzy logic. First we need to get the parts:

``````let nameParts = ' ' |> name.Split
``````

That's a good start, and it gives us an array of `string` objects representing each part. Previously this was all we needed, but now we need to put them into groupings based on our new rules.

Now we need to "fold" over our names, and collect them into an array of names that are based on our rules. For this we'll build a folder that takes a `string list` and then a `string` and returns a `string list`, with the first element being the last element tested.

The basic idea here is to match `acc` with `prev::rem`, and if `prev` is <= 3 characters, and it's not only initials, then our new `prev` should be `prev part`.

``````nameParts
|> Array.fold (fun acc pt ->
match acc with
| prev::rem
when prev |> Seq.length <= 3
&& prev |> (isOnlyInitials >> not) ->
(sprintf "%s %s" prev pt)::rem
| _ -> pt::acc
) []
``````

## Make it work for Ron, darnit!

Pretty simple, right? So if we test it with a basic data-set: `["John Mac Gregor"; "Rhonda La Rouse"; "John C. Mac Gregor"; "Rhonda A. La Rouse"; "La Ronda Doe"] |> List.map splitNames;;`, we can see that it returns what we expected. Now, we lose one particular case I ran into: `"Do Ray Doe"`, which turns into `(Some "Do Ray", None, Some "Doe")`, when it should have been `(Some "Do", Some "Ray", Some "Doe")`. We also lost support for `Ron Doe`, which returns all `None` because `Ron` is paired with `Doe`. To fix this we'll alter our `when` guard clause:

``````let splitNames (name : string) =
let isInitial : char array -> bool = (function | [|_; '.'|] -> true | _ -> false)
let isOnlyInitials : string -> bool =
Seq.toArray
>> Array.mapi (fun i x -> (x, i))
>> Array.groupBy (fun (x, i) -> i / 2)
>> Array.map snd
>> Array.map (Array.map fst)
>> Array.forall isInitial
let nameParts = ' ' |> name.Split
let nameParts =
nameParts
|> Array.fold (fun acc pt ->
match acc with
| prev::rem
when prev |> Seq.length <= 3
&& prev |> (isOnlyInitials >> not)
&& rem |> List.length >= 1 ->
(sprintf "%s %s" prev pt)::rem
| prev::rem
when prev |> Seq.length <= 2
&& prev |> (isOnlyInitials >> not) ->
(sprintf "%s %s" prev pt)::rem
| _ -> pt::acc
) []
|> Array.ofList
|> Array.rev
match nameParts with
| [|first; middle; last|] -> (Some first, Some middle, Some last)
| [|first; last|] -> (Some first, None, Some last)
| _ -> (None, None, None)
``````

We also added a second `prev::rem` case, the first matches if there's already more than one other word with a length of <= 3 (so `Ron Doe` is fixed), the second matches any if the first word is <= 2 characters (working for `La Ronda`, but not properly for `Bo David`). Now we could add a filter for `Bo` to not match on the `<= 2` guard clause, but that adds a maintainability issue. Instead, we just document that it's 99% accurate, and what cases we know won't work properly.

Today's lesson is somewhat short, but introduces a very complex topic. I recommend that you make sure you fully understand what is happening, as it's important to know. Also, if anyone has a better idea on the `isOnlyInitials` function, I'd be very interested in hearing about it. The grouping by `index / 2` works, but I'm sure there's a better way to do it.  