using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Getting started with programming and getting absolutely nowhere (Part 15)

Building a list of several candidates

Lesson 14: Starting our next problem

In the previous lesson, I mentioned the following:

One of the biggest issues, that might not be immediately apparent, with how we do this is: what happens if we have multiple confusables for a character? We only transform the first one, which means we could be missing out on some possible conversions. We'll go over fixing this in the next lesson, but I want you to start thinking about it now. Essentially, our goal will be to swap out the transformTerm with a version that returns every possible option (which for each of our samples is still only 1).

If it's not clear, what I mean by this is that if a character we're given could be confused with two different characters, then we want to consider both possibilities. We want to be able to test all variants, to ensure we don't miss any.

For this, we need a way to gather all the variants. Here we'll go back to a quick REPL integration, to begin our development. We want a function, called getCombinations, that allows us to provide an array of arrays, and returns another array of arrays of each possible combination of those positions. So, if we provided [|[|"g"|]; [|"r"|]; [|"e"; "a"|]; [|"y"|]|], we want to get back [|[|"g"; "r"; "e"; "y"|]; [|"g"; "r"; "a"; "y"|]|], thus, we should have the following:

// This should be [|"grey"; "gray"|]
let combinationTest = [|[|'g'|]; [|'r'|]; [|'e'; 'a'|]; [|'y'|]|] |> getCombinations |> Array.map String.implode

Make sense? So given an array of options for a position, return all possible combinations of those options. If we provide [|[|'w'|]; [|'o'|]; [|'r'|]; [|'d'|]|], get [|[|'w'; 'o'; 'r'; 'd'|]|].

We're going to do this in a fairly simple manner, using recursion (but not tail-call recursion, in our example). Now I've done the hard part, and written the getCombinations function, in this lesson we'll focus on modifying our current program to use it.

Getting all the combinations

This function is somewhat simple, and lacks tail-call recursion, so a good exercise for you might be to rewrite it for tail-call recursion. (It can be done, and I have done it with this just to prove it, we'll discuss that in a different non-lesson.)

let rec getCombinations<'a> (input : 'a array array) : 'a array array =
    let currentArray = input |> Array.head
    let tail = input |> Array.tail
    if tail |> Array.length |> any then
        tail
        |> getCombinations
        |> Array.map (fun ia ->
            currentArray
            |> Array.map (fun ca ->
                [|[|ca|]; ia|]
                |> Array.flatten))
        |> Array.flatten
    else currentArray |> Array.map Array.initOne

So, we basically get the sub-combinations, and combine them with the combinations of the option we currently have. Pretty simple, right?

Now we can test this with the previous tests, and prove it to ourselves, but I leave that up to you. (I already proved it to myself, it's your turn to prove that it works like we want.) We use the 'generic type' specification here, as it should really work on anything. So, your signature should be 'a array array -> 'a array array.

Next, we need to pull this to our transform

Previously, we defined transformTerm as follows:

let transformTerm =
    Array.map (fun codePoint ->
        match codePoint |> findCandidates with
        | [||] -> [|codePoint|]
        | candidates -> candidates.[0].Replacement)
    >> Array.flatten

We're going to slightly update this defnition. The first step is to modify candidates to return all the replacements: candidates |> Array.map (fun x -> x.Replacement)). Next, we need to modify the [||] (empty) match-case to return [|[|codePoint|]|]. Then, we need to add a call to getCombinations at the end. Finally, move Array.flatten to the end, in an Array.map.

let transformTerm =
    Array.map (fun codePoint ->
        match codePoint |> findCandidates with
        | [||] -> [|[|codePoint|]|]
        | candidates -> candidates |> Array.map (fun x -> x.Replacement))
    >> getCombinations
    >> Array.map Array.flatten

This, of course, breaks our transformToLowerTerm and matchedFilters. For transformToLowerTerm we just need to Array.map lowerCaseTerm instead, and the matchedFilters will be a bit more difficult. Our previous version was pretty simple:

let matchedFilters term = filters |> List.filter (term |> transformToLowerTerm |> (=))

Here, we have to think about what needs done now. We have multiple terms being matched to multiple filters, as such we have a significant rewrite to do, but only on the List.filter portion. We'll modify the filter to filter each term against the current filter, and then test that we got any matches. If we did, any form of the term matched our filter.

let matchedFilters term =
    filters
    |> List.filter (fun filter ->
        let transformations = term |> transformToLowerTerm
        let anyMatch = transformations |> Array.filter ((=) filter)
        anyMatch |> Array.length |> any)

That should be all we need to modify, we can test our transformToLowerTerm with the following:

terms |> List.map (transformToLowerTerm >> Array.map String.fromCodePoints)

Small bugs begin to appear

We do happen to have a bug in this, and I'll show you what it is quick.

In the case that we add a confusable, we always ignore the original character. This can be problematic if we add confusables for the regular ASCII characters, such as o, p, or e. To fix this, we simply add Array.append [|[|codePoint|]|] after our Array.map (fun x -> x.Replacement). This will include the original character as well.

We can test all this out on our real data-set, by changing our confusables and filters:

let confusables2 =
    data
    |> String.split '\n'
    |> Seq.filter (String.startsWith "#" >> not)
    |> Seq.map (Seq.takeWhile ((<>) '#') >> Seq.toArray >> String.implode >> String.split ';')
    |> Seq.choose mapToConfusable
    |> Seq.toArray

let confusables =
    [|{ Original = ("ℕ", 0) ||> Char.toCodePoint; Replacement = "m" |> String.toCodePoints |> Seq.toArray; Field3 = "" }|]
    |> Array.append confusables2

let listToCodePoints = List.map (String.toCodePoints >> Seq.toArray)

let filters = ["mope"; "fail"] |> listToCodePoints
let terms = ["ℕope"; "𝑵ope"; "ռope"; "nope"; "𝕱ail"; "𝓕ail"; "pass"; "𝕿rue"; "𝓽𝓻𝓾𝒆"] |> listToCodePoints

This should result in matchedTerms being val matchedTerms : string list = ["ℕope"; "𝕱ail"; "𝓕ail"] instead of the whole list, if we change mope to nope we still get the first "ℕope", which proves everything works properly.

Great work, team!

Overall, we should have learned that if we structure our code well, we can add new functionality without having to do too much modification. We can easily add a whole new feature without creating much of a problem at all.


The entire goal of this lesson is to show how we can add substantial functionality to our software without doing major rewrites. It also helps demonstrate how we should be thinking about problems as we solve them. Work from a small issue up to the larger one, and build independent stages to allow easy troubleshooting, testing and integration of them. This wasn't a long lesson, but it's just as important as any other. (I've been swamped at work all week thanks to the ball being dropped by an API we consume, so haven't had a lot of time to get a good post together. Hopefully this meets everyone's expectations.)

Loading