Building a list of several candidates
Lesson 14: Starting our next problem
In the previous lesson, I mentioned the following:
One of the biggest issues, that might not be immediately apparent, with how we do this is: what happens if we have multiple confusables for a character? We only transform the first one, which means we could be missing out on some possible conversions. We'll go over fixing this in the next lesson, but I want you to start thinking about it now. Essentially, our goal will be to swap out the transformTerm
with a version that returns every possible option (which for each of our samples is still only 1
).
If it's not clear, what I mean by this is that if a character we're given could be confused with two different characters, then we want to consider both possibilities. We want to be able to test all variants, to ensure we don't miss any.
For this, we need a way to gather all the variants. Here we'll go back to a quick REPL integration, to begin our development. We want a function, called getCombinations
, that allows us to provide an array of arrays, and returns another array of arrays of each possible combination of those positions. So, if we provided [|[|"g"|]; [|"r"|]; [|"e"; "a"|]; [|"y"|]|]
, we want to get back [|[|"g"; "r"; "e"; "y"|]; [|"g"; "r"; "a"; "y"|]|]
, thus, we should have the following:
// This should be [|"grey"; "gray"|]
let combinationTest = [|[|'g'|]; [|'r'|]; [|'e'; 'a'|]; [|'y'|]|] |> getCombinations |> Array.map String.implode
Make sense? So given an array of options for a position, return all possible combinations of those options. If we provide [|[|'w'|]; [|'o'|]; [|'r'|]; [|'d'|]|]
, get [|[|'w'; 'o'; 'r'; 'd'|]|]
.
We're going to do this in a fairly simple manner, using recursion (but not tail-call recursion, in our example). Now I've done the hard part, and written the getCombinations
function, in this lesson we'll focus on modifying our current program to use it.
Getting all the combinations
This function is somewhat simple, and lacks tail-call recursion, so a good exercise for you might be to rewrite it for tail-call recursion. (It can be done, and I have done it with this just to prove it, we'll discuss that in a different non-lesson.)
let rec getCombinations<'a> (input : 'a array array) : 'a array array =
let currentArray = input |> Array.head
let tail = input |> Array.tail
if tail |> Array.length |> any then
tail
|> getCombinations
|> Array.map (fun ia ->
currentArray
|> Array.map (fun ca ->
[|[|ca|]; ia|]
|> Array.flatten))
|> Array.flatten
else currentArray |> Array.map Array.initOne
So, we basically get the sub-combinations, and combine them with the combinations of the option we currently have. Pretty simple, right?
Now we can test this with the previous tests, and prove it to ourselves, but I leave that up to you. (I already proved it to myself, it's your turn to prove that it works like we want.) We use the 'generic type' specification here, as it should really work on anything. So, your signature should be 'a array array -> 'a array array
.
Next, we need to pull this to our transform
Previously, we defined transformTerm
as follows:
let transformTerm =
Array.map (fun codePoint ->
match codePoint |> findCandidates with
| [||] -> [|codePoint|]
| candidates -> candidates.[0].Replacement)
>> Array.flatten
We're going to slightly update this defnition. The first step is to modify candidates
to return all the replacements: candidates |> Array.map (fun x -> x.Replacement))
. Next, we need to modify the [||]
(empty) match-case to return [|[|codePoint|]|]
. Then, we need to add a call to getCombinations
at the end. Finally, move Array.flatten
to the end, in an Array.map
.
let transformTerm =
Array.map (fun codePoint ->
match codePoint |> findCandidates with
| [||] -> [|[|codePoint|]|]
| candidates -> candidates |> Array.map (fun x -> x.Replacement))
>> getCombinations
>> Array.map Array.flatten
This, of course, breaks our transformToLowerTerm
and matchedFilters
. For transformToLowerTerm
we just need to Array.map lowerCaseTerm
instead, and the matchedFilters
will be a bit more difficult. Our previous version was pretty simple:
let matchedFilters term = filters |> List.filter (term |> transformToLowerTerm |> (=))
Here, we have to think about what needs done now. We have multiple terms being matched to multiple filters, as such we have a significant rewrite to do, but only on the List.filter
portion. We'll modify the filter to filter each term against the current filter, and then test that we got any matches. If we did, any form of the term matched our filter.
let matchedFilters term =
filters
|> List.filter (fun filter ->
let transformations = term |> transformToLowerTerm
let anyMatch = transformations |> Array.filter ((=) filter)
anyMatch |> Array.length |> any)
That should be all we need to modify, we can test our transformToLowerTerm
with the following:
terms |> List.map (transformToLowerTerm >> Array.map String.fromCodePoints)
Small bugs begin to appear
We do happen to have a bug in this, and I'll show you what it is quick.
In the case that we add a confusable, we always ignore the original character. This can be problematic if we add confusables for the regular ASCII characters, such as o
, p
, or e
. To fix this, we simply add Array.append [|[|codePoint|]|]
after our Array.map (fun x -> x.Replacement)
. This will include the original character as well.
We can test all this out on our real data-set, by changing our confusables
and filters
:
let confusables2 =
data
|> String.split '\n'
|> Seq.filter (String.startsWith "#" >> not)
|> Seq.map (Seq.takeWhile ((<>) '#') >> Seq.toArray >> String.implode >> String.split ';')
|> Seq.choose mapToConfusable
|> Seq.toArray
let confusables =
[|{ Original = ("ℕ", 0) ||> Char.toCodePoint; Replacement = "m" |> String.toCodePoints |> Seq.toArray; Field3 = "" }|]
|> Array.append confusables2
let listToCodePoints = List.map (String.toCodePoints >> Seq.toArray)
let filters = ["mope"; "fail"] |> listToCodePoints
let terms = ["ℕope"; "𝑵ope"; "ռope"; "nope"; "𝕱ail"; "𝓕ail"; "pass"; "𝕿rue"; "𝓽𝓻𝓾𝒆"] |> listToCodePoints
This should result in matchedTerms
being val matchedTerms : string list = ["ℕope"; "𝕱ail"; "𝓕ail"]
instead of the whole list, if we change mope
to nope
we still get the first "ℕope"
, which proves everything works properly.
Great work, team!
Overall, we should have learned that if we structure our code well, we can add new functionality without having to do too much modification. We can easily add a whole new feature without creating much of a problem at all.
The entire goal of this lesson is to show how we can add substantial functionality to our software without doing major rewrites. It also helps demonstrate how we should be thinking about problems as we solve them. Work from a small issue up to the larger one, and build independent stages to allow easy troubleshooting, testing and integration of them. This wasn't a long lesson, but it's just as important as any other. (I've been swamped at work all week thanks to the ball being dropped by an API we consume, so haven't had a lot of time to get a good post together. Hopefully this meets everyone's expectations.)