using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Getting started with programming and getting absolutely nowhere (Part 4)

Moving on to solving Business Problems

Lesson 3: Since We've Made Something, Let's Make it Cooler
Lesson 5: Now We Begin Our Adventure

By now I hope you have a grasp on the language we'll be using, because I'm done with the basic Syntax section. I'm going to start going into business problems, business logic, and we're going to actually build things for our business domain. These are real problems, real challenges that I've had to solve for work, and I want to give you real-world experience, instead of the contrived examples many "introduction to programming" blogs, tutorials and guides go through. (Things like "Hello World", "Fizz Buzz", "Project Euler" - it's not that I think these are not valuable, I just see no need to repeat the same thing over and over again. You can find plenty of tutorials that use these all over the net, mine will not be such things.)

You probably remember that in my first post I said we were going to explore things, and that I would not give you all the answers. I am still going to be following that modus operandi, and we're going to define some problems and talk out the solutions. From here-on-out any new syntax tricks I introduce I'll be brief on, basically I'll tell you the most effective term to google in order to get information on the component of the language. For this reason I recommend you bookmark F# for Fun and Profit. This will become an extraordinary tool on our adventures.

We're going to process a lot of spreadsheet data

I'm not sure if you're aware, but in the business world we use Excel a lot. I mean, way more than is healthy (or even reasonable). As a result of this, many of us in the IT field find that we end up processing Excel data a lot more than normal. (In fact, there's an entire community of programmers I happen to know working on a VBA IDE add-in, to make this less unpleasant.) I'm not telling you this to scare you, but it probably will; that said, don't be concerned, because there are tools out there to make this workable from F#: the first of which is COM Interop (which we won't do, because it sucks), and the second I'm going to mention being the NPOI library, available on NuGet.

We're going to use (or, at least, I'm going to use) the NPOI library, based on the Apache POI for Java. You'll find the documentation to be less than helpful, but it's a very powerful library. It allows you to work with the Excel sheet as a regular .NET object, instead of having to deal with it from a COM Interop point of view, which makes heavy use of dynamic typing. (This, as a side-effect, makes it more painful to work with in F# as dynamic typing is not a regular thing there.) Before we start working with an Excel workbook, though, I want to get some "helper" functions we'll use out of the way.

Keep in mind that I'm going to be demonstrating how to solve problems based on things I've gone through. This means many of the problems I go through here will be Excel based, but not all of them. So bear with it, if it seems like we're doing a lot of boring Excel processing and such, it just means I haven't don anything else at work for a while. None of these things are contrived, I really cannot stress that enough.

Creating some helpers for our operations

Right, so all that said, let's move on. Excel (and, as a result, most spreadsheet processors) use a letter+number to index a cell, with the letter being the column number (I.e. A is column 1), and the number being the row. For this reason, we probably want a function which can turn a letter-based-column-number into an actual digit, since NPOI (and arrays in general) index things by integer values. The function I've defined is pretty basic, but it convers a char to an integer, returning Some if it's valid, and None if not:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | _ -> None

Of course this has the flaw of only supporting "capital case" letters, but we could modify that to support lower-case as well:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (v - int 'a')
    | _ -> None

And if it's necessary for your implementation we can support literal digits:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (v - int 'a')
    | v when v >= int '0' && v <= int '9' -> Some (v - int '0')
    | _ -> None

So this gets us a function which we can test out that should allow us to turn "A" into the first column number, which when 0-based is 0: let colValue = 'A' |> letterVal. Pretty simple, and pretty powerful. Being an F# function that returns an Option, we can do something like the following: let colValues = 'AB' |> String.explode |> Array.choose letterVal. This should give us an array of [|0; 1|]. The problem with this is that we really need a single number, not two. For AB we want to get 27, so we could build a function (colNum) that defines the following:

let colNum =
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., i |> float))
    >> Array.sum
    >> int

When bugs fly

We have a huge bug with this though: for our value of "AB" it's going to return 1, not 27 as we would expect. Our mistake is not so subtle: the (float x) * is the problem: when x is 0 (as it is for A), it will always return 0. We could fix this by doing float x + 1., but then we're doing extra work that really is part of the letterVal logic, to fix this, we'll just apply the offset in letterVal:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (1 + v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (1 + v - int 'a')
    | v when v >= int '0' && v <= int '9' -> Some (1 + v - int '0')
    | _ -> None

But now we think to ourselves, 'why am I doing the same step (1 + v - int ...) in three places? Can I do that in one place?' We sure can:

let letterVal (c : char) =
    let f (c : char) v = 1 + v - int c
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> v |> f 'A' |> Some
    | v when v >= int 'a' && v <= int 'z' -> v |> f 'a' |> Some
    | v when v >= int '0' && v <= int '9' -> v |> f '0' |> Some
    | _ -> None

So this almost fixes our bug, now "AB" |> colnum returns 28, which is almost there. We need to subtract 1 from the result to get the actual value. For this we're going to just create a sub method, which will take two values and then return the first value subtracted from the second. You can define this yourself, but it should end up working as follows:

let colNum =
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., i |> float))
    >> Array.sum
    >> int
    >> sub 1

Beautiful. Our code works, we can test this by running ["A"; "Z"; "AA"; "AB"; "ZZ"; "AAA"] |> colNum, which should give us [1; 26; 27; 28; 702; 703]. We could subtract another 1 in our method to zero-base it, but instead we'll make a new method which returns colNum minus 1: let colNumZBase =. You can implement this on your own.

Making it a bit more usable

This whole colNum function is great, but I want to make the fun i x lambda an actual function, that can be used elsewhere. For this we have to think pretty hard about what it's doing, why and how. For those who don't know what's happening here, the Array.mapi lambda is converting the values from an integer to the placeholder for the base. If we consider that our 26. is our base, call it n, each value is going from x to x * 26.^i, where i is the index. (We reverse it so that as the order of the values goes right, they become larger indexes and thus higher placeholders. This means each position is n ^ i, where n is the base and i is the position, then we can define a function which does this math more abstractly:

let baseValue b i =
    System.Math.Pow(b, i)

Now our formula has some more meaning, but it's not fully reusable. We can see cases when we would be trying to do this often, in fact our baseValue always uses 26. as the base in our scenario, so we could subdefine even further: let indexVal = baseValue 26. and change our formula to fun i x -> (float x) * indexVal (float i), but we can apply further transformations to even more generalize it: let elementVal i x = i |> float |> baseValue 26. |> (*) x, which leads us for a final transformation into:

let colNum =
    let elementVal i x = i |> float |> baseValue 26. |> (*) x
    >> Array.choose letterVal
    >> Array.rev
    >> float
    >> Array.mapi elementVal
    >> Array.sum
    >> int
let colNumZBase = colNum >> sub 1

This leaves our entire colNum as just a series of function compositions, and each function is independent. This makes our code much more readable, and allows us to be more expressive in the future. Notice that we expected x to be a float in elementVal, we could remove the float call and then replace x with (float x) in the elementVal function, but that's less clear. Now the transformation is happening elsewhere - we want it to happen before that function to alleviate its responsibilities. We can expect i to be transformed to a float, because after all it's an index, but the values could have special transformations, so we don't want to make that the elementVal functions responsibility.

Organizing our code

We've built everything, and it all works, but we need to organize it. F# has Modules and Namespaces, I won't go into details of the differences, you can read about those elsewhere, instead I'm just going to drop all our code into a module in a separate file that we can reuse later.

module Excel
    /// Allows piping of the subtraction operator as the second argument: sub a b is equivalent to b - a.
    let private sub b a = ...

    /// Returns the integer value of a character if the charater is an alphabet character, or a number. (If char is between 'A' and 'Z', returns char - 'A' + 1, for example.)
    let letterVal (c : char) = ...

    /// Returns the value of the index for the specified base. I.e. base^i
    let private baseValue b i = ...

    /// Returns the 1-based index of the column number. I.e. "A" becomes 1, "AA" becomes 27.
    let colNum = ...

    /// Returns colNum - 1. I.e. "A" becomes 0, "AA" becomes 26.
    let colNumZBase = ...

This keeps it out of our way in our production work - because we won't need to modify or change it there (most likely). We're just going to consume it, which means we just need to be able to reference what we've already built. We could organize it further, since sub and baseValue are more abstract functions, but I leave that up to you if you like.

Some other helpful functions

I don't want to start actually processing an Excel workbook today, so I'm going to give you some more modules and such that I have that are particularly helpful. I won't explain all (most) of them, but they'll be used in my samples later and I want you to have them now.

The first three I'll explain slightly: F# has the three collection types, and these "flatten" a collection of collections, that is: if you have an array of arrays, this will convert them to one array, with all the sub elements. It's basically _.collect id, where id is the identity function, but I find it easier to read _.flatten in my code than _.collect id.

module Array
    /// Flattens an array of arrays to a single-dimensional array. (Equivalent to Array.collect id)
    let flatten<'a> : 'a array array -> 'a array = Array.collect id
module List
    /// Flattens a list of lists to a single-dimensional list. (Equivalent to List.collect id)
    let flatten<'a> : 'a list list -> 'a list = List.collect id
module Seq
    /// Flattens a sequence of sequences to a single-dimensional sequence. (Equivalent to Seq.collect id)
    let flatten<'a> : 'a seq seq -> 'a seq = Seq.collect id

I have all of these in three separate files, but you can arrange them how you like. They're called like a normal function on that collection type, with .flatten. The <'a> is an F# generic type-parameter. Feel free to look those up to learn more.

The rest of the helpful functions that I have (up to now) are below, each module is in it's own file for organization purposes:

module Object
    /// Equivalent to object.ToString.
    let toStr o = o.ToString()

module Char
    /// Returns true if a char is equal to the double-quote (").
    let isDQuote = (=) '"'
    /// Returns true if a char is equal to the single-quote (').
    let isSQuote = (=) '''
    /// Returns true if a char is equal to the double-quote (") or single-quote (').
    let isQuote c = c |> isDQuote || c |> isSQuote

module String
    open System
    /// Gets the char array of a string.
    let explode : string -> char array = Seq.toArray
    /// Assembles a string from a char array.
    let implode : char array -> string = String
    /// Splits a string using the StringSplitOptions on the specified char.
    let splitOpt (options : StringSplitOptions) (c : char) (str : string) = str.Split([|c|], options)
    /// Splits a string on the specified char using the default StringSplitOptions.
    let split (c : char) (str : string) = str.Split(c)
    /// Determines if a string contains a substring.
    let contains (search : string) (subject : string) = subject.Contains(search)
    /// Performs a String.Trim() which removes leading and trailing whitespace.
    let trim (str : string) = str.Trim()

    let private processStr func = explode >> func >> implode
    let private filterStr func = explode >> Array.filter func >> implode

    /// Removes all double-quote characters from a string.
    let stripDQuotes = filterStr (Char.isDQuote >> not)
    /// Removes all single-quote characters from a string
    let stripSQuotes = filterStr (Char.isSQuote >> not)
    /// Removes all double- or single-quote characters from a string in one pass.
    let stripQuotes = filterStr (Char.isQuote >> not)
    /// Filters a string to contain only digits.
    let digitsOnly = filterStr Char.IsDigit
    /// Filters a string to contain only letters.
    let lettersOnly = filterStr Char.IsLetter

All of these are pretty basic, so I won't explain them. Feel free to use them or not, but these are some of the most common functions I use in my work. You'll notice most of them are function compositions: I like to compose functions as much as possible, as it should reduce some overhead as far as function calls later on. (Whether it does or not is untested, but at the very least it forces me to be consistent.)

Defining function "purity"

Many of these are also "pure" functions, that is: a function which will return the exact same output every time it is run for a specific input. Think of something like a + b, the + function is pure, it will always return the same result for any two inputs. In our library functions here all of them are, in-fact, pure. They are entirely reproduceable. If we repeat the function at any given point in time it will always have the same result.

Some functions and methods in .NET are considered impure: they are not guaranteed to return the same output for any given input on each invokation. Consider the .NET Random: calling Next() will generate a different result on each subsequent call, which means more than just the input parameters can influence the function. If a function return value can change based on things other than input parameters, it's impure.

Bonus day: Filtering Parenthesis

One of the fields I get in these spreadsheets is a specific data field, often with a bunch of parenthesized information in it. Let me rephrase that: the data is ______ and then there's a nasty (____) in there, sometimes with sub-parenthesis.

This, of course, doesn't match with what I expect from the field, usually I am only concerned with the non-parenthetical data. (In fact, I'm always concerned with the non-parenthetical data, just occasionally I want the parenthetical data as well. Not often.) In order to support this, I built a function filterParens (filter parenthesis) which will go through each character in the string, and if it is an opening parenthesis, it will ignore all characters until that parenthesis is closed. It also supports sub-parentheticals, that is: (Something (Else)) will be entirely filtered, and (Something (Else) Entirely) will be as well.

I did this through clever use of a fold, which really wasn't all that clever. In fact, it's pretty straightforward:

let filterParens =
    >> Array.fold (fun acc el -> 
        match acc, el with
        | (items, i), '(' -> (items, i + 1)
        | (items, i), ')' -> (items, Math.Min(i - 1, 0))
        | (items, 0), _ -> (el::items, 0)
        | (items, i), _ -> (items, i)) ([], 0)
    >> fst
    >> List.rev
    >> List.toArray
    >> String.implode

If it's not obvious, acc is a tuple of the current list of items, and the level of parenthesis we are in. (Because F# is stateless and immutable by default, we cannot just increment or decrement the value for our levels. This is not a bad thing though - beginners may find it easier to understand a stateless language, which is why I selected F# for our adventures.) Also, fst is a function that takes a tuple and returns the first element, if you want to research further.

Now I don't allow it to drop below 0 (the Math.min(i - 1, 0)), so if we close a parenthesis before we opened it, the only result will be that we don't save either. Anything between them will still be saved, and anything after the opening one will not be saved (because it will be back to level 1). This is not a problem for my data-set, but if it is for yours then you could easily modify the second and third match expressions to account for that. (In fact, I recommend you try doing that. Your homework for this post is to modify filterParens to not stop at 0 levels, but allow negative levels as well, and not ignore text between )...(.)

We're done with todays lesson, I know it wasn't as "cool" as we had hoped, but this sets us up for success when we do begin processing things with NPOI. (Which I expect will be the next post here.) As always, keep in mind that while this may be somewhat mundane, it does lead in to bigger things. (I do have a form of a curriculum planned.)