using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Getting started with programming and getting absolutely nowhere (Part 4)

Moving on to solving Business Problems

Lesson 3: Since We've Made Something, Let's Make it Cooler
Lesson 5: Now We Begin Our Adventure

By now I hope you have a grasp on the language we'll be using, because I'm done with the basic Syntax section. I'm going to start going into business problems, business logic, and we're going to actually build things for our business domain. These are real problems, real challenges that I've had to solve for work, and I want to give you real-world experience, instead of the contrived examples many "introduction to programming" blogs, tutorials and guides go through. (Things like "Hello World", "Fizz Buzz", "Project Euler" - it's not that I think these are not valuable, I just see no need to repeat the same thing over and over again. You can find plenty of tutorials that use these all over the net, mine will not be such things.)

You probably remember that in my first post I said we were going to explore things, and that I would not give you all the answers. I am still going to be following that modus operandi, and we're going to define some problems and talk out the solutions. From here-on-out any new syntax tricks I introduce I'll be brief on, basically I'll tell you the most effective term to google in order to get information on the component of the language. For this reason I recommend you bookmark F# for Fun and Profit. This will become an extraordinary tool on our adventures.

We're going to process a lot of spreadsheet data

I'm not sure if you're aware, but in the business world we use Excel a lot. I mean, way more than is healthy (or even reasonable). As a result of this, many of us in the IT field find that we end up processing Excel data a lot more than normal. (In fact, there's an entire community of programmers I happen to know working on a VBA IDE add-in, to make this less unpleasant.) I'm not telling you this to scare you, but it probably will; that said, don't be concerned, because there are tools out there to make this workable from F#: the first of which is COM Interop (which we won't do, because it sucks), and the second I'm going to mention being the NPOI library, available on NuGet.

We're going to use (or, at least, I'm going to use) the NPOI library, based on the Apache POI for Java. You'll find the documentation to be less than helpful, but it's a very powerful library. It allows you to work with the Excel sheet as a regular .NET object, instead of having to deal with it from a COM Interop point of view, which makes heavy use of dynamic typing. (This, as a side-effect, makes it more painful to work with in F# as dynamic typing is not a regular thing there.) Before we start working with an Excel workbook, though, I want to get some "helper" functions we'll use out of the way.

Keep in mind that I'm going to be demonstrating how to solve problems based on things I've gone through. This means many of the problems I go through here will be Excel based, but not all of them. So bear with it, if it seems like we're doing a lot of boring Excel processing and such, it just means I haven't don anything else at work for a while. None of these things are contrived, I really cannot stress that enough.

Creating some helpers for our operations

Right, so all that said, let's move on. Excel (and, as a result, most spreadsheet processors) use a letter+number to index a cell, with the letter being the column number (I.e. A is column 1), and the number being the row. For this reason, we probably want a function which can turn a letter-based-column-number into an actual digit, since NPOI (and arrays in general) index things by integer values. The function I've defined is pretty basic, but it convers a char to an integer, returning Some if it's valid, and None if not:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | _ -> None

Of course this has the flaw of only supporting "capital case" letters, but we could modify that to support lower-case as well:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (v - int 'a')
    | _ -> None

And if it's necessary for your implementation we can support literal digits:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (v - int 'a')
    | v when v >= int '0' && v <= int '9' -> Some (v - int '0')
    | _ -> None

So this gets us a function which we can test out that should allow us to turn "A" into the first column number, which when 0-based is 0: let colValue = 'A' |> letterVal. Pretty simple, and pretty powerful. Being an F# function that returns an Option, we can do something like the following: let colValues = 'AB' |> String.explode |> Array.choose letterVal. This should give us an array of [|0; 1|]. The problem with this is that we really need a single number, not two. For AB we want to get 27, so we could build a function (colNum) that defines the following:

let colNum =
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., i |> float))
    >> Array.sum
    >> int

When bugs fly

We have a huge bug with this though: for our value of "AB" it's going to return 1, not 27 as we would expect. Our mistake is not so subtle: the (float x) * is the problem: when x is 0 (as it is for A), it will always return 0. We could fix this by doing float x + 1., but then we're doing extra work that really is part of the letterVal logic, to fix this, we'll just apply the offset in letterVal:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (1 + v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (1 + v - int 'a')
    | v when v >= int '0' && v <= int '9' -> Some (1 + v - int '0')
    | _ -> None

But now we think to ourselves, 'why am I doing the same step (1 + v - int ...) in three places? Can I do that in one place?' We sure can:

let letterVal (c : char) =
    let f (c : char) v = 1 + v - int c
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> v |> f 'A' |> Some
    | v when v >= int 'a' && v <= int 'z' -> v |> f 'a' |> Some
    | v when v >= int '0' && v <= int '9' -> v |> f '0' |> Some
    | _ -> None

So this almost fixes our bug, now "AB" |> colnum returns 28, which is almost there. We need to subtract 1 from the result to get the actual value. For this we're going to just create a sub method, which will take two values and then return the first value subtracted from the second. You can define this yourself, but it should end up working as follows:

let colNum =
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., i |> float))
    >> Array.sum
    >> int
    >> sub 1

Beautiful. Our code works, we can test this by running ["A"; "Z"; "AA"; "AB"; "ZZ"; "AAA"] |> colNum, which should give us [1; 26; 27; 28; 702; 703]. We could subtract another 1 in our method to zero-base it, but instead we'll make a new method which returns colNum minus 1: let colNumZBase =. You can implement this on your own.

Making it a bit more usable

This whole colNum function is great, but I want to make the fun i x lambda an actual function, that can be used elsewhere. For this we have to think pretty hard about what it's doing, why and how. For those who don't know what's happening here, the Array.mapi lambda is converting the values from an integer to the placeholder for the base. If we consider that our 26. is our base, call it n, each value is going from x to x * 26.^i, where i is the index. (We reverse it so that as the order of the values goes right, they become larger indexes and thus higher placeholders. This means each position is n ^ i, where n is the base and i is the position, then we can define a function which does this math more abstractly:

let baseValue b i =
    System.Math.Pow(b, i)

Now our formula has some more meaning, but it's not fully reusable. We can see cases when we would be trying to do this often, in fact our baseValue always uses 26. as the base in our scenario, so we could subdefine even further: let indexVal = baseValue 26. and change our formula to fun i x -> (float x) * indexVal (float i), but we can apply further transformations to even more generalize it: let elementVal i x = i |> float |> baseValue 26. |> (*) x, which leads us for a final transformation into:

let colNum =
    let elementVal i x = i |> float |> baseValue 26. |> (*) x
    >> Array.choose letterVal
    >> Array.rev
    >> float
    >> Array.mapi elementVal
    >> Array.sum
    >> int
let colNumZBase = colNum >> sub 1

This leaves our entire colNum as just a series of function compositions, and each function is independent. This makes our code much more readable, and allows us to be more expressive in the future. Notice that we expected x to be a float in elementVal, we could remove the float call and then replace x with (float x) in the elementVal function, but that's less clear. Now the transformation is happening elsewhere - we want it to happen before that function to alleviate its responsibilities. We can expect i to be transformed to a float, because after all it's an index, but the values could have special transformations, so we don't want to make that the elementVal functions responsibility.

Organizing our code

We've built everything, and it all works, but we need to organize it. F# has Modules and Namespaces, I won't go into details of the differences, you can read about those elsewhere, instead I'm just going to drop all our code into a module in a separate file that we can reuse later.

module Excel
    /// Allows piping of the subtraction operator as the second argument: sub a b is equivalent to b - a.
    let private sub b a = ...

    /// Returns the integer value of a character if the charater is an alphabet character, or a number. (If char is between 'A' and 'Z', returns char - 'A' + 1, for example.)
    let letterVal (c : char) = ...

    /// Returns the value of the index for the specified base. I.e. base^i
    let private baseValue b i = ...

    /// Returns the 1-based index of the column number. I.e. "A" becomes 1, "AA" becomes 27.
    let colNum = ...

    /// Returns colNum - 1. I.e. "A" becomes 0, "AA" becomes 26.
    let colNumZBase = ...

This keeps it out of our way in our production work - because we won't need to modify or change it there (most likely). We're just going to consume it, which means we just need to be able to reference what we've already built. We could organize it further, since sub and baseValue are more abstract functions, but I leave that up to you if you like.

Some other helpful functions

I don't want to start actually processing an Excel workbook today, so I'm going to give you some more modules and such that I have that are particularly helpful. I won't explain all (most) of them, but they'll be used in my samples later and I want you to have them now.

The first three I'll explain slightly: F# has the three collection types, and these "flatten" a collection of collections, that is: if you have an array of arrays, this will convert them to one array, with all the sub elements. It's basically _.collect id, where id is the identity function, but I find it easier to read _.flatten in my code than _.collect id.

module Array
    /// Flattens an array of arrays to a single-dimensional array. (Equivalent to Array.collect id)
    let flatten<'a> : 'a array array -> 'a array = Array.collect id
module List
    /// Flattens a list of lists to a single-dimensional list. (Equivalent to List.collect id)
    let flatten<'a> : 'a list list -> 'a list = List.collect id
module Seq
    /// Flattens a sequence of sequences to a single-dimensional sequence. (Equivalent to Seq.collect id)
    let flatten<'a> : 'a seq seq -> 'a seq = Seq.collect id

I have all of these in three separate files, but you can arrange them how you like. They're called like a normal function on that collection type, with .flatten. The <'a> is an F# generic type-parameter. Feel free to look those up to learn more.

The rest of the helpful functions that I have (up to now) are below, each module is in it's own file for organization purposes:

module Object
    /// Equivalent to object.ToString.
    let toStr o = o.ToString()

module Char
    /// Returns true if a char is equal to the double-quote (").
    let isDQuote = (=) '"'
    /// Returns true if a char is equal to the single-quote (').
    let isSQuote = (=) '''
    /// Returns true if a char is equal to the double-quote (") or single-quote (').
    let isQuote c = c |> isDQuote || c |> isSQuote

module String
    open System
    /// Gets the char array of a string.
    let explode : string -> char array = Seq.toArray
    /// Assembles a string from a char array.
    let implode : char array -> string = String
    /// Splits a string using the StringSplitOptions on the specified char.
    let splitOpt (options : StringSplitOptions) (c : char) (str : string) = str.Split([|c|], options)
    /// Splits a string on the specified char using the default StringSplitOptions.
    let split (c : char) (str : string) = str.Split(c)
    /// Determines if a string contains a substring.
    let contains (search : string) (subject : string) = subject.Contains(search)
    /// Performs a String.Trim() which removes leading and trailing whitespace.
    let trim (str : string) = str.Trim()

    let private processStr func = explode >> func >> implode
    let private filterStr func = explode >> Array.filter func >> implode

    /// Removes all double-quote characters from a string.
    let stripDQuotes = filterStr (Char.isDQuote >> not)
    /// Removes all single-quote characters from a string
    let stripSQuotes = filterStr (Char.isSQuote >> not)
    /// Removes all double- or single-quote characters from a string in one pass.
    let stripQuotes = filterStr (Char.isQuote >> not)
    /// Filters a string to contain only digits.
    let digitsOnly = filterStr Char.IsDigit
    /// Filters a string to contain only letters.
    let lettersOnly = filterStr Char.IsLetter

All of these are pretty basic, so I won't explain them. Feel free to use them or not, but these are some of the most common functions I use in my work. You'll notice most of them are function compositions: I like to compose functions as much as possible, as it should reduce some overhead as far as function calls later on. (Whether it does or not is untested, but at the very least it forces me to be consistent.)

Defining function "purity"

Many of these are also "pure" functions, that is: a function which will return the exact same output every time it is run for a specific input. Think of something like a + b, the + function is pure, it will always return the same result for any two inputs. In our library functions here all of them are, in-fact, pure. They are entirely reproduceable. If we repeat the function at any given point in time it will always have the same result.

Some functions and methods in .NET are considered impure: they are not guaranteed to return the same output for any given input on each invokation. Consider the .NET Random: calling Next() will generate a different result on each subsequent call, which means more than just the input parameters can influence the function. If a function return value can change based on things other than input parameters, it's impure.

Bonus day: Filtering Parenthesis

One of the fields I get in these spreadsheets is a specific data field, often with a bunch of parenthesized information in it. Let me rephrase that: the data is ______ and then there's a nasty (____) in there, sometimes with sub-parenthesis.

This, of course, doesn't match with what I expect from the field, usually I am only concerned with the non-parenthetical data. (In fact, I'm always concerned with the non-parenthetical data, just occasionally I want the parenthetical data as well. Not often.) In order to support this, I built a function filterParens (filter parenthesis) which will go through each character in the string, and if it is an opening parenthesis, it will ignore all characters until that parenthesis is closed. It also supports sub-parentheticals, that is: (Something (Else)) will be entirely filtered, and (Something (Else) Entirely) will be as well.

I did this through clever use of a fold, which really wasn't all that clever. In fact, it's pretty straightforward:

let filterParens =
    >> Array.fold (fun acc el -> 
        match acc, el with
        | (items, i), '(' -> (items, i + 1)
        | (items, i), ')' -> (items, Math.Min(i - 1, 0))
        | (items, 0), _ -> (el::items, 0)
        | (items, i), _ -> (items, i)) ([], 0)
    >> fst
    >> List.rev
    >> List.toArray
    >> String.implode

If it's not obvious, acc is a tuple of the current list of items, and the level of parenthesis we are in. (Because F# is stateless and immutable by default, we cannot just increment or decrement the value for our levels. This is not a bad thing though - beginners may find it easier to understand a stateless language, which is why I selected F# for our adventures.) Also, fst is a function that takes a tuple and returns the first element, if you want to research further.

Now I don't allow it to drop below 0 (the Math.min(i - 1, 0)), so if we close a parenthesis before we opened it, the only result will be that we don't save either. Anything between them will still be saved, and anything after the opening one will not be saved (because it will be back to level 1). This is not a problem for my data-set, but if it is for yours then you could easily modify the second and third match expressions to account for that. (In fact, I recommend you try doing that. Your homework for this post is to modify filterParens to not stop at 0 levels, but allow negative levels as well, and not ignore text between )...(.)

We're done with todays lesson, I know it wasn't as "cool" as we had hoped, but this sets us up for success when we do begin processing things with NPOI. (Which I expect will be the next post here.) As always, keep in mind that while this may be somewhat mundane, it does lead in to bigger things. (I do have a form of a curriculum planned.)

Getting started with programming and getting absolutely nowhere (Part 3)

Since we've made something, let's make it cooler

Lesson 2: Now That We're Programmers, Let's Make Something
Lesson 4: Moving on to Solving Business Problems

Now that we've built something that actually resembles a business problem, and I want to take this time to show you one of the coolest features of F# in my opinion. So in this post, I want to introduce the conecpt of a "DSL", or a "domain-specific language".

F# has a very powerful feature that allows you to create your own operators. What do I mean by operators? Things like +, or -, or the |> operator. You can build your own, which leads to building your own DSL.

What, exactly, is a DSL?

I mentioned the name "domain-specific language", but I know without context that has little meaning, so let's clarify. By building our own DSL we're actually creating a language that is a subset of another language. We're creating our own language! We don't even have to do much to make it happen, so let's investigate that a little bit.

First off, a DSL isn't always apporpriate, you need to take care to understand what a DSL applies to, and when to use it. As a language, F# already has a lot of operators, including some confusing ones, so adding more to the mix is probably not the greatest idea unless it makes sense, and it's going to be used a lot.

In work I've actually done this, and I'm going to show off the operator I created, how it works, and how you can create one. It's really quite simple, and if you've tried experimenting with the language you probably already learned that it's possible, and may have created one yourself. If so, congrats! That's awesome! You clearly figured it out faster than me, and I applaud you for that, because I didn't figure this out until a very long time after I started using F#.

Let's make a DSL

Seriously, let's create one. Remember that split function we built before? It was pretty basic, wasn't it? We took a char and a string, and we split apporpriately, but it's about time for us to make it a little easier.

In the work I do we do a lot of string parsing, which means a lot of calling split. In fact, I do it so often that I have a String.fs library file, that has this (and a few other) nifty functions in it. I carry that library to all my F# projects, and it saves me a great deal of time.

let split (c:char) (s:string) = s.Split(c)

Remember this method? Pretty basic, take a string and split on the char. We can partially apply it, even, to make it simpler if we're splitting on the same string a lot. Now what if I told you we could build our own operator to make this work? What if we could create something like:

let splitString = someString <|> someChar

Notice the <|> operator? Try to use it in F# right now, you probably can't. (Unless you already created it, then feel free to use it.) Because I'm a nice person I'm going to just give you this operator for free.

let (<|>) (s:string) (c:char) = s |> split c

It is that complex, adding your own operator is as easy as defining a function with the operator in parenthesis as the function name. We could literally create anything, the issue is, should we?

Uh oh - here be dragons, errr...dinosaurs

Be careful when defining a DSL

DSL's can be powerful things - they can make complex things simple, and they can make simple things complex. You want to define a DSL that fits, and a DSL that is powerful. Don't define a DSL "just because you can" - I've done so, bad idea.

However, if you define a DSL well. you can turn something increasingly complex into a much simpler feeling, it feels like a built-in operator. It feels like it belongs. It just feels natural.

Since this post is coming out on a Friday (at least here in the Eastern Time Zone - could easily be Saturday for many of you) you probably don't have to work tomorrow (or today, if it's already Saturday), so if you get a chance, experiment with custom operators. Try to figure out how they work, this one little feature can truly make things easy on you. Shoot, you can even try to build the opposite operator: >|< if you like, it should fit in with the code below. (Though in my real DSL for this, the operator is actually ><, but I demonstrated it as >|< since that's more directly transpated as the opposite of what we have.)

let names =
    rawNames <|> ' '
    |> Array.toList
    |> List.fold (fun (acc:string list) str ->
        match acc with
        | prev::tail when prev.Length <= 3 -> (prev >|< " " >|< str)::tail
        | _ -> str::acc) []
    |> List.rev
    |> Array.ofList

Bonus day: documentation

We've gone through writing some code, but the biggest thing people get caught up on is writing documentation. Now, on one hand, I'm a firm believer that good code documents itself. On the other hand, I know that I very infrequently write good code. So we always need a way to tell the "consumer" (often times ourselves, but if you're working on some API think of the people who will be using it) what to do with our method/function/variable/thing. This comes in many forms, one of which is the Visual Studio / F# documentation comments.

Have you ever wondered how people can put text into the intellisense-hover-dealy of the function/variable/parameter they provide? It's actually quite easy, and in F# it's trivial: use a triple-slash comment: ///.

/// Splits a string into an array on the specified char.
let (<|>) (s:string) (c:char) = s |> split c

Now when we hover any instance of our <|> operator, we should see that string show up:

Woah - its like magic

Damn that's cool. If you're building an F# project, you can also generate an XML file with all the triple-slash comment docs in it. Simply open the project properties, go to "Build" and check the "XML Documentation File" box. Visual Studio will then generate a .xml file in the same directory as your .exe/.dll that contains these comments in a form that Visual Studio can understand, and that can be used to generate web documentation. (This is part of how MSDN is built - the code comment documentation becomes the web documentation.)

As an aside, I want to tell you all how grateful I am to have met some of the developer comrades I have - a lot of them have been extremely helpful to me regarding programming, software development in general, and even doing things right.I was actually thinking about each one of them as I wrote this, and I would call them out by name but I don't want to reveal any personal information, so for those of you that I'm referring to (and you all know who you are): thank you.

Getting started with programming and getting absolutely nowhere (Part 2)

Now that we're programmers, let's make something

Lesson 1: How To Become a Programmer (In a Few Not-So-Easy Steps)
Lesson 3: Since We've Made Something, Let's Make It Cooler

So two days ago I posted a very rudimentary introduction into "how to become a programmer", which was mostly a rant about what is and isn't important, and about how F# works. (Why F#? Because it lets us be lazy, and it does a whole hell-of-a-lot for you.)

Today we're going to introduce the business domain.

Almost everything is "business logic"

The first thing about the "business domain" is that it's a very real thing. In the business domain everything is what's called "business logic", that is: logic that makes the business work. Now we're not talking "business" as in "for-profit corporate entity", but "business" as in "a person's concern". (Think: "it's none of my business".)

Basically, business logic is any logic that is related to the task at hand. So consider a "cart" system, business logic might be "we need to be able to add an item to the cart, remove an item from the cart, and change quantities of items in the cart." Simple, right? Somewhat straightforward. You may also here this referred to as "BL" or "domain logic", or the collective "domain" (which includes the "business domain" and the "business logic"). I'm going to call it "business logic" because that's what is most applicable to my situation, but feel free to use whatever terms you feel appropriate.

So we saw an example of business logic, what do we do with business logic? Usually we write our software around the business logic, so we may write a function addItemToCart or removeItemFromCart that adjusts the items in our cart as appropriate. These are considered "functions that perform business logic".

Simple, right? This is all largely basic, but it's important to know because we need to understand what is and isn't business logic. For example, when we disucss the implementation itself, we're no longer talking about business logic, but "application logic". The business logic is the broader picture: what is the purpose of our application? It's the "problem" the application is "solving", this can vary all over the place, but the basic idea is it's the higher-overview of the application.

All that long-worded rambling aside, let's define a piece of business logic that we want to build an application to solve:

Given a string, split it into a group of "words" on spaces, where each "word" has no-less-than three characters.

This is a very real problem I had to solve for my job recently, I won't say why, but it was necessary. So, we're going to focus specifically on this problem in this post.

The first step is to define the sub-steps

Every problem is, at it's core, a series of smaller problems. You can always break it down, though sometimes it's unecessary. For our problem let's break it down into subparts:

  1. Split a string on spaces
  2. Analyze the first word, if it has 3 or fewer characters, group it with the next word
  3. Repeat for the next word

So that's more manageable, and we can solve that in "sub-steps". We can solve each step independently. (Granted, we only have 3 steps, but we'll break step 2 down futher when we get to it.) Each step should be testable: so given some input I should have a defined output.

Splitting a string on spaces

So the first step is actually the easiest, split the string on spaces. in F# this is easily done via String.Split, which is available in the entire .NET framework:

let parts = "Some String With A Short Word".Split(' ')

So that's easy, and we can create a function that handles that pretty easily, we'll define a split function that takes a string, and a char (separator) and splits the string on the "char" (separator):

let split (c:char) (s:string) =

That's pretty basic, but why did we define a new function when we could just call s.Split(c) directly, with our own s and c? This is so that we can split strings "idiomatically", that is, matching the general style of F# code. F# isn't about calling methods, it's about composing functions, and you cannot compose String.Split(char) easily like that, so we define a function that lets us do so.

So now we could test our function, which involves simply calling it:

let parts = "Some String With A Short Word" |> split ' '

Well that was easy. This shows that we can pipe a string into the split function, and it does what we expect.

Moving to Step 2: this is going to hurt

So F# makes step 2 pretty easy, and if you have any programming experience with a non-functional language, I want you to forget it right now. What you think you know is not true in F#, and we need to redefine the problem.

We can break step 2 down a little further:

  1. Get a word
  2. Check how many characters the word has
  3. If < 3, it belongs with the next word

So let's start building a groupWords function, we're going to do this the "Elliott Brown" style, which is probably different from what you've usually seen, but this is where functional languages make things pretty awesome.

Instead of looking at the current word, we're going to look at the previous word, since it makes things easier. We're going to use a basic pattern match with a guard clause, List.fold, Array.toList, and Array.ofList.

The easiest way to do this involves converting the string array to a string list, which is done with the Array.toList:

let stringList = parts |> Array.toList

If you're following along in a REPL (the F# interactive window), you should notice the biggest difference between the printed versions of each is that Array has vertical-pipes on the inside of the brackets: [|element1; element2; ...|], and List does not: [element1; element2; ...].

So now we're going to fold over that list: a fold takes an accumulator, and a value. It iterates over each value in the List and applies a function to it, which usually combines the value with the accumulator somehow. Our fold is pretty basic:

let groupStrings (acc:string list) str =
    match acc with
    | prev::tail when prev.Length <= 3 -> (sprintf "%s %s" prev str)::tail
    | _ -> str::acc
let groupedStrings = stringList |> List.fold groupStrings []

We could modify this to take a length instead of 3, but I'll leave that up to you.

Some neat syntax things:

  1. The match is a "pattern match", similar to a C-style language switch, but on sterroids.
  2. The prev::tail is a pattern expression for a list: it means "match the first element in the list to prev, and the remaining elements to tail.
  3. The when ... is a "guard clause": it means this expression is matched first, then the entire pattern is matched if and only if the guard clause returns true.
  4. The -> means "return the right side as the result of the match expression." (Basically)
  5. The (sprintf "%s %s" prev str) just combines the two strings with a space between them.
  6. The ::tail now creates a new list with the sprintf block as the first element, and the remaining elements as the, well, remaining elements.
  7. The _ is a "match anything" pattern-expression.
  8. The [] is an empty list (the type is inferred to be whatever that "folder" expects - a string list in this case).

Now we just need to reverse the list, which in F# is easily achieved by List.rev:

let finalResults = groupedStrings |> List.rev

And lastly, we'll convert them back to an array with Array.ofList:

let result = Array.ofList

That's it, make it reusable

So now we've successfully built the functions to do this work, let's build a function that is reusable.

let splitAndGroupString =
    split ' '
    >> Array.toList
    >> List.fold (fun (acc:string list) str ->
        match acc with
        | prev::tail when prev.Length <= 3 -> (sprintf "%s %s" prev str)::tail
        | _ -> str::acc) []
    >> List.rev
    >> Array.ofList

And we can call it as:

let splitStr = "Some String With A Short Word" |> splitAndGroupString

And that's it!

We went through some basic business logic today, we'll be going through more complex stuff in the coming lessons, but it's good to take it slow (at first), and we'll eventually build up to some pretty complex operations.

I told you I would get more organized, and I did. As we continue I'll figure out how I plan to present things effectively, but for now I'm just going to stick with my gut. I hope you enjoyed it and learned something, but if not it's still helping me learn more and become a better software engineer, so you should probably count on more crap coming from me here soon.

Getting started with programming and getting absolutely nowhere (Part 1)

How to become a programmer (in a few not-so-easy steps)

Lesson 2: Now That We're Programmers, Let's Make Something

Recently a friend of mine has asked me to teach him "how to become a programmer", which is not really an easy task to accomplish, but I'll give it a shot. Since I'm an altruistic person, I am going to make this a series of blog posts on the idea of becoming a programmer, we're going to mostly look at the .NET structure, and the first few lessons will be in F# to get the ball rolling somewhat quickly.

The first rule of programming: forget what you think you know

All programmers, including myself, start somewhere. The biggest mistake I've made on the quest to become a programmer is to focus entirely on some "bigger goal" to achieve, some large project that I wanted to build or work on. Goals are great, but we have to start small. (Especially in the modern world, where we have hundreds of languages and frameworks and environments to choose from.)

So my very first word of wisdom is to forget everything you think you know. All of it. Forget what languages you think you want to learn, forget what areas you want to work in, forget what you know about "HTML coding for Myspace pages", ditch it all. We're going to start your career of programming from the beginning - the basics first, then we'll work up to the fun things.

"Hello World" is so cliche

I'm not going to bore you by starting our adventure with a "Hello World" program, instead we're going to start it from an actual business-driven design. Instead of contriving highly-egregious examples of programs to write, building some sort of "example" application, we're going to build real-world business applications: things that you will actually be expected to come up with. And we're going to do so poorly, so that we can learn where to improve. I'm going to take personal experiences (and mistakes) and burn them into your core, so that you can identify what makes good code good, and what makes bad code bad.

The first mistake is that people always lose interest, as you probably have by this point in this blog post. But hang tight, we have some code coming up, and I'm going to start showing you how we can take some basic, elementary ideas and turn them into truly remarkable pieces of software.

Without further ado, let's get going

We're going to begin our entourage into programming with F#, which is very much unusual from normal adventures. The easiest way to get started is to install a copy of Visual Studio 2017 Community (or if you prefer to purchase it, Enterprise or Pro), with the F# development tools selected. I'm not going to walk you through this for a few reasons:

  1. There are about a hundred thousand articles and blog posts on installing Visual Studio and adding and removing features.
  2. It's actually extremely self-explanatory, it really is pretty simple to get set up.
  3. I want you to struggle a bit with this.

Notice point 3: I'm not going to hold your hand, we're going to go on an adventure and we're all going to make mistakes, I want you to struggle with this entire venture, for two reasons on it's own:

  1. Only by struggling do you force yourself to think critically;
  2. I want you to decide if this is a path you wish to pursue.

Too many people say "teach me to be a great programmer" but refuse to put the work in to become one - I can't write a long, wordy article that makes you a good programmer, I can only guide you down the path to success. It is up to you to decide if you wish to continue down the path or not. By forcing you to struggle with real-world problems you'll be better prepared for how programming actually is.

So let's start with some code. Because we're using F# it's going to be a somewhat quick foray into our career, so open up Visual Studio, and find the F# Interactive window - I won't tell you where it is, so this is the first struggle you'll have to do on your own. (I'll give you a hint - it's under some sort of "Window" menu somewhere.)

Once we've opened F# Interactive, I want you to define some sort of "domain model" to describe some sort of arbitrary user. The F# syntax for such a thing is something like:

type SomeName = {

The syntax for SomeProperty can look something like:

SomeValue : SomeType

Right, so I'll give you the first object for free:

type User = {
    Id : int
    Name : string
    Email : string
    Phone : string option

For those of you who don't know, this is the basic F# syntax for defining a "class", but we get a lot more than just a class, so we won't call it such a thing (because naming is important - if it's not a duck we really ought not call it a duck). In F# this is called a Record.

A Record in F# has a few basic features: properties/fields, values, and is non-nullable. In fact, almost everything in F# is non-nullable which is why we're starting with this language. Nullable values are hard, and if you don't believe me just "Google" for a "NullReferenceException" - it should become clear.

What we've define here is a "User" with four properties: an "Id" (which is an integer: a whole number so-to-speak), a "Name" (which is a string, or a sequence of characters in a specific order), an "Email" (which is also a string), and a "Phone" (which is a string, but it's "optional" - we may or may not have a phone defined). An "option" in F# is a kind of null, but not really. It has two cases: a None value (which means the lack of a value entirely), or a Some value, which means that there is "Some" value there. These are important because these idiomatically represent the idea of a non-existent value.

One of my favourite features of F# is it's type-inference, that is, F# will detect what types you're playing with in most situations correctly. So we're going to define a "User" now, the syntax of which is:

let myUser = {
    Id = 1
    Name = "Elliott"
    Email = ""
    Phone = None

I believe I owe you an explanation: by specifying all four of these properties, F# understands that we want to create a User object. We didn't explicitly tell it that, but it assumed based on what we gave it that we wanted that type of object. We could have also told it that we wanted myUser to be a User with a type-annotation, that is, : SomeType, which would mean let myUser : User = {...} would create our myUser record. Simple, right?

Pointing out our First Mistake

I need to tell you that we already made a few mistakes here, not the least of which is the choice of myUser as a name. What does myUser mean? Why does it have the type-name in it? What will myUser be doing?

We really need to touch on naming, but I don't have a lot of time for that, so I'll be brief: naming things is hard. You should name something for what it means, rather than what it is. That is, instead of myUser, we should really name that ebrown, or something.

We also need to touch on the type-safety (which is one of many such safeties) in F#. As a language, F# forces you to define the entirety of a type at once. We know that Phone has a value of None, which happens to be the default, but we still have to tell F# that we want to set None explicitly to Phone. It won't do it by itself, we have to. Remove that line and you'll see what I'm talking about.

So what have we effectively done here? We've created a Type, but what is a type? What does it do? To put it short, we've created a point that we can hold data. So that's a good thing, right? We've created something that can store state, something that holds a group of related values. We could have just as easily done

let ebrownId = 1
let ebrownName = "Elliott"
let ebrownEmail = ""
let ebrownPhone = None

So what's the difference? Why did we build a type? I'll give you a hint: what stops you from sending ebrownId, ebrownName, alexEmail, ralphPhone to a method? Ah, that's what it's for. It groups related values together so they're unmistakeably related. They belong in a group.

Alright, so I've done a lot of talking (see: rambling), and we've effectively seen what, 20 lines of code? Let's actually make something happen. I'm not going to explain every little detail of this next snippet, but I will go over the general idea.

type User = {
    Id : int
    Name : string
    Email : string
    Phone : string option

let ebrown = {
    Id = 1
    Name = "Elliott"
    Email = ""
    Phone = None

let createReminder thing user =
    sprintf "Hello %s, this is a reminder to do the thing (%s)." user.Name thing

let remind =
    createReminder "Write a Blog Post" >> printfn "%s"

let reminder =
    ebrown |> remind

Right, so what happened? This is actually pretty simple, we already saw creating a user, now all we're adding is passing that user to a function and gathering a result. There are a couple key points to note here:

The pipe-right (|>) operator takes the value on the left side, and transforms it to be the last argument of the function on the right side. So ebrown |> remind is the same as remind ebrown. We use this pipe-right operator mostly to keep the order of things consistent. I.e. let someResult = someValue |> step1 |> step2 |> step3, thus allowing us to follow the "flow" of the functions.

The next notable piece is the compose-right (>>) operator, this has a similar effect as the pipe-right, except that instead of needing a value, it needs a function. So it builds a function that is composed of the steps of the first function, then the second function. This can also be chained as necessary, the difference between this and pipe-right being what they operate on: pipe-right operates on the value, compose-right operates on the function.

This brings us to an interesting aspect of F#: the type system. I've deliberately avoided discussing it yet, but now it's about time to get into the nitty-gritty of what the type-system does.

In most "regular" languages functions are presented as some type of prototype: you have an input and a result. This is usually written (in C-based languages) as ResultTypeName FunctionName(InputTypeName1 inputValue1, InputTypeName2 inputValue2, ...), etc., which basically says "this function takes [...] as input and returns a ....".

In F# we do things a little differently. Instead of declaring a function as having "all these" inputs each and every function in F# has exactly one input.

But Elliott, our createReminder takes two parameters, a thing and a user.

Technically, this is wrong. The createReminder function takes one input, then a new function is created which takes the second input. We call this "currying", in laymans terms it means that functions are partially applied, which is, kind of, part of the point of being a functional language.

Type Definitions aren't so scary

Enough blab, let's talk about a type definition. F# defines a function as 'a -> 'b, that is, 'a is an input which returns a 'b. If you look at the signature for our createReminder, you'll see that it is string -> User -> string, "I first take a string, which returns a function that takes a User, which returns a string."

If you look at remind, you'll see a very different signature: User -> unit. In F# the unit is the "void" type, that is, a unit is a nothing. It's basically the end of something. You can create a unit directly as (). If you run let unitThing = () in F# Interactive, you'll see it prints val unitThing : unit = (), meaning it's a nothing. So remind takes a User and returns a "nothing".

So, all this said, we can build a reasonable thought process from the following signatures:

string -> int -> unit            // Takes a string, then an int, returns a unit / nothing
string -> string -> string       // Takes a string, then a string, returns a string
int -> (string -> int) -> unit   // Takes a string, then a `string -> int` function, returns a nothing
(string -> int) -> float         // Takes a `string -> int` function, returns a float

Do those make sense? Good, I hope you thought that was easy, because I'm only going to make things more difficult from here.

What the snot does any of this do?

I want to intentionally confuse you for a moment, because learning requires a struggle. We're going to look at a highly contrived example (remember when I said I wouldn't do that? I'm breaking that rule apparently) of just how complex we can get.

let formula values =
    let xb =
        |> List.sum
        |> (/)
        <| (values.Length |> float)
    |> List.fold (fun acc x -> acc + (x - xb) * (x - xb)) 0.
    |> (/)
    <| (values.Length |> float) - 1.
    |> System.Math.Sqrt

Oh boy. This looks fun. Can anyone guess what's going on here? If you guessed this calculates the sample-standard deviation, you're spot on. We simply let xb be the average (calculated by summing the values, then divide by the length), then we simple "fold" over the values, summing the squares of x - xBar together, divide by N - 1 and then take the square-root.

You should have been able to guess that the pipe-left (<|) operator does what the pipe-right operator does, but in reverse. Instead of supplying the right side as the last argument to the left call, it supplies the left side as the last argument to the right call. Do note that nothing is special about the division (/) operator, we can actually treat operators as functions in F#. That is the biggest take-away from this.

We'll also notice this function has a type-definition of float list -> float.

Building an order of elements

I'm going to keep on the boring train for the rest of this section, since we're already there. (My rambling doesn't help.) We're going to discuss greating a list/array/sequence. (All three of these are different ways to represent a grouping of values that share a type.)

The most basic of these types is an Array, which is also an array and a []. All three of these are valid ways to state "array". Arrays are one of the most basic "collection" structures. Essentially, you declare an array (of a fixed size) and then set elements in it by "index" (0-based). In F# we access an array element by .[#], where # is the index (with the first being index 0) of the element we want.

Think of an array as a crayon box: you have a specific number of crayons (each of a different color) which you can access by "index" (or position). Simple, right? We could even represent a crayon box in F#:

let crayons : System.Drawing.Color array = ...

Pretty simple. You can construct an array of elements by wrapping them in [|...|], and separating each element by a semicolon. ([|System.Drawing.Color.Black; System.Drawing.Color.Gray; ...|]).

Next is a List or list. This is simply an Array that can be resized. The syntax is [...], with the same semicolon separator.

Finally, a sequence. This is a very special collection in that it's lazy. The syntax is {...} with the semi-colon separator. The thing about a Sequence (or seq) is that it doesn't initiate all the values initially, it only initiates them when you iterate them. (That is, ask for the next value.) While a list and array both allow you to index a value, a sequence does not. Remember this, because we'll need them later.

I'm going to stop here, but I'm going to pick up where I stopped here tomorrow, and we'll go over a couple business domain issues and learn how to build software to represent them. I'll be doing a lot of "here's how you solve the problem, each step of the code" and not a lot of "this is what _ means", so I expect you to research the different operations that we do, and try and use some critical thinking to push yourself to learn what's going on.

I do realize that this was probably the worst possible writing ever by me, but it's been a while since I've done one of these and I tried to just cover what I could as I thought of it. The next few sections of the series will get a lot more informative and organized.

Optimizing for Tail-Call Recursion in F#

Rewriting for Tail-Call Recursion in F#

While this post is written in F# and specifically for it (referencing ILASM), the principles and practices here can be applied to any function (or non-functional) language that uses tail-call recursion and provides optimizations for it.

Recently I was working on a project, and I had to extend the F# List type to make things a bit simpler. I wrote a method takeThrough that allowed me to provide a predicate, and much like takeWhile, it would return elements until the predicate returned false, then it would return one more element. This was important for the code I was writing, I needed to return everything up to and including the first element that caused the predicate to return false.

So, I wrote this method called takeThrough:

let takeThrough(predicate)(source) =
    let rec loop sourceTemp =
        let head = sourceTemp |> List.head
        if head |> predicate = true then
            head :: (sourceTemp |> List.tail |> loop)
    loop source

The problem with this method is that it cannot be optimized for tail-call recursion in it's current state.

What is Tail-Call Recursion?

In order to understand why we're talking about optimizing for tail-call recursion here, we have to first undestand what is tail-call recursion?

In functional languages such as F# things like loops are discouraged, and in some of them even unavailable completely. So, in order to avoid them we have to rewrite our methods for recursion of some form (usually).

If we were to take this same example in C# it might look something like:

IEnumerable<T> TakeThrough<T>(IEnumerable<T> source, Predicate<T> predicate)
    var continueLoop = true;

    foreach (var item in source)
        if (predicate(item))
            yield return item;
            continueLoop = true;
            yield return item;
            continueLoop = false;

        if (!continueLoop)

(This may not be the most optimal manner to write this method in, but it guarantees success.)

So we're just looping through each item here and returning it as we go. With F# we don't want to use such a construct, as we want to avoid loops. So we go to recursion, in C# the F# code we wrote above might look more like:

IEnumerable<T> TakeThrough<T>(IEnumerable<T> source, Predicate<T> predicate)
    return _loop(source, predicate);

IEnumerable<T> _loop<T>(IEnumerable<T> sourceTemp, Predicate<T> predicate)
    var head = sourceTemp.First();

    if (predicate(head))
        var result = new List<T>();
        result.AddRange(_loop(sourceTemp.Skip(1), predicate));
        return result;
        return new List<T> { head };

It's almost verbatim identical to the F# version. What we can see being a problem here is a StackOverflowException being through if source is large enough and predicate would return late enough. This is what we're hoping to avoid with Tail-Call recursion.

Remember: in order for a method to be optimized for tail-call recursion, the recursive call has to be the last thing the method does.

Now you might look at that method and say "well the last thing that happens is everything is piped to loop." Not quite true. We don't realize that head :: is the very last thing the method has to do.

This is an important note because loop is called, then that value is given to the concatenation operator.

The if/else is ugly too

Of course the other problem is the if/else construct, but that can be fixed with a match head |> predicate with and then match to each boolean value (true and false).

Right, so that's simple. Easy fix:

match head |> predicate with
| true -> head :: (sourceTemp |> List.tail |> loop)
| false -> [head]

Great. We solved the easy idiomatic issue, but how in the world do we make it tail-call recursive?

Visualizing our tail-call recursion

The first thing we have to do is determine how can we write the structure of this method so that the loop call is the last thing to happen? Ignore what it does for now. We just want to know what it would have to look like. We need a visual.

let takeThrough predicate list =
    let rec loop ... =
        loop ...
    loop ...

So we know what it should look like-ish. That's a very good start. Now we have to figure out how we can get it to that state.

So we know that our method needs to match each item with a predicate, and then return a List of all the elements that matched and the next element. So we need to accumulate a list of elements.

Notice I bolded accumulate. We need a variable in our loop that is an accumulator in this case.

Now we know our visual needs to change:

let takeThrough predicate list =
    let rec loop acc ... =
        loop newAcc ...
    loop [] ...

This looks about right. Our acc will be a List since that's what we're building out of, and we're going to pipe the newAcc to the list each time we iterate, and then pipe an empty list to our loop before we get started.

Creating our tail-call recursion

So now that we've visualized it, we can start to write the final pieces of it.

We'll start at the final line: loop [] .... What do we know about this call? We know that we only have two parameters in the method, and one variable (well, constant, but it's a function so it will look like a variable). And that's all we need. So we'll pass our initial list to our loop because it's the only we have to pass.

let takeThrough predicate list =
    let rec loop acc ... =
        loop newAcc ...
    loop [] list

Now our definition for loop has to change:

let takeThrough predicate list =
    let rec loop acc listTemp =
        loop newAcc newListTemp
    loop [] list

Alright, great progress. We just have to apply our operations now. In our case, the newAcc will be the appended list, and the listTemp will be stripped of the first item. Let's get the logic for head in there and work from that.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> ... loop newAcc newList
        | false -> ...
    loop [] list

Perfect! We're almost done, getting newAcc and newList are both easy: newAcc is just List.append acc [head], and newList is just listTemp |> List.tail.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> loop (List.append acc [head]) (sourceTemp |> List.tail)
        | false -> ...
    loop [] list

The last issue is our false condition: what do we do here?

Simple: we just kill the batman return what the newAcc would have been.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> loop (List.append acc [head]) (sourceTemp |> List.tail)
        | false -> List.append acc [head]
   loop [] list

And we've achieved our goal of tail-call recursion. The very last thing in loop is a call to itself. (Remember that in this case, match is the last condition, then one of two things happens: we call loop or we return the new stuff.)

Verifying with ILDASM

If we look at the IL for this method, we'll see the following:

.method assembly static class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> 
        loop@4<T>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!T,bool> predicate,
                  class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> acc,
                  class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> sourceTemp) cil managed
  // Code size       69 (0x45)
  .maxstack  6
  .locals init ([0] !!T head)
  IL_0000:  nop
  IL_0001:  ldarg.2
  IL_0002:  call       !!0 [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Head<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0007:  stloc.0
  IL_0008:  ldarg.0
  IL_0009:  ldloc.0
  IL_000a:  callvirt   instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!T,bool>::Invoke(!0)
  IL_000f:  brfalse.s  IL_0031
  IL_0011:  ldarg.0
  IL_0012:  ldarg.1
  IL_0013:  ldloc.0
  IL_0014:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::get_Empty()
  IL_0019:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::Cons(!0,
                                                                                                                                                                class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0>)
  IL_001e:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::op_Append<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>,
                                                                                                                                                      class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0023:  ldarg.2
  IL_0024:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Tail<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0029:  starg.s    sourceTemp
  IL_002b:  starg.s    acc
  IL_002d:  starg.s    predicate
  IL_002f:  br.s       IL_0000
  IL_0031:  ldarg.1
  IL_0032:  ldloc.0
  IL_0033:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::get_Empty()
  IL_0038:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::Cons(!0,
                                                                                                                                                                class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0>)
  IL_003d:  tail.
  IL_003f:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::op_Append<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>,
                                                                                                                                                      class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0044:  ret
} // end of method List::loop@4

We're only concerned with our recursion:

IL_0000:  nop
IL_000f:  brfalse.s  IL_0031
IL_002f:  br.s       IL_0000
IL_0031:  ldarg.1
IL_003d:  tail.

Just as we hoped, it's simply returning to the beginning of the method instead of calling it again.