using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Getting started with programming and getting absolutely nowhere (Part 5)

Now we begin our adventure


Finally! We've been working on this for a little more than a week now, so it's time we get to actually processing a spreadsheet. We're going to solve one of the many business problems that I've had, though it's going to be significantly anonymized. The same principles will hold true, and I've specifically structured this data so that we have to fight some of the same issues I've had to fight. While this is technically an example project, it is entirely based on a real-world problem, and everything we'll use here holds true. (In fact, with minimal modification this example project can be used to solve the same issue I have at work.)

We're going to explore this over a few posts. We are going to jump right in and start developing, but in stages. I want to do things one part at a time, so that you can see how and why it's important to formulate a "thought process" while developing software. As I mentioned before, we're going to use real problems, and I won't give you all the answers, but that doesn't mean I won't give you some of the answers. We're going to begin today with loading and reading our Excel spreadsheet. We won't do much else, because I want you to think about what the next step might be.

Defining the problem

The business problem we're going to solve first is based upon a list of items I receive on a periodic basis. It has tens of thousands of records, which I have to parse and insert into a database. Previously, this was done with a few simple steps in Excel, though sometimes it took a lot more than just a few steps. Some spreadsheets are not even a little bit normalized, and some have very bad data in them. One of them that I receive (which we'll actually solve last) comes in a very weird format, and I have to do some clever tricks to get it in the correct format. Once I've normalized the spreadsheet, I compare it to the most recently received spreadsheet of the same category, and determine what records are "new" to the database based on what is new to this spreadsheet.

So, our process will have three major steps, one of which is optional:

  1. (Optional) Preprocess the really messed up spreadsheet
  2. Normalize the Spreadsheet, splitting data appropriately.
  3. Compare the sheet to the previous version, determine what records are new and indicate them.

These are actually three separate applications that I have built at work. They all support "drag and drop" functionality, which means you can just drag a spreadsheet onto the "step" represented by the application. Between steps there is a short manual process, but that's only because I haven't yet automated it. (One day, I will.)

We're going to start with step 3. If we start with the last step, we can work from our expected output back to the input. It doesn't matter which order you go in, but for something as unguided as this I like to work backwards: figure out what our expected output is, then design a function (or set of functions) that can transform our input to the output. Since steps 1 and 2 only put the data in an "intermediate form", we don't care what their exact output is: we can make step 1 turn its input into whatever is appropriate for step 2, and step 2 can prepare it for step 3. The actual outputs there are irrelevant; we need to figure out what the output of step 3 is, then manipulate our input to fit it. You can work these problems in either direction, and in the future we'll work some problems from input » output, but today we'll stick to the output « input manner.

What output do we need?

This is the easiest part - we simply need an Excel sheet that has the following columns from our input data in this order: First Name, Middle Name, Last Name, Email, Magic ID. If a "person" has multiple Magic ID values, then we want one record per value. So if Elliott Brown has 1 and 2 for Magic ID, then there should be two records: Elliott V. Brown ... 1 and Elliott V. Brown ... 2. Simple, right?
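The "one record per Magic ID" rule can be sketched in a few lines. This is only an illustration: the RawPerson type and the expand function are made-up names for this example, not part of the project.

```fsharp
// Hypothetical shape for a person who has not been split up yet.
type RawPerson = {
    Name : string
    MagicIds : string list
}

// Produce one (name, magicId) pair for every Magic ID the person has.
let expand (p : RawPerson) =
    p.MagicIds |> List.map (fun id -> p.Name, id)

let rows = expand { Name = "Elliott V. Brown"; MagicIds = [ "1"; "2" ] }
// rows now holds two entries, one per Magic ID.
```

A person with no Magic ID values would produce no rows at all under this sketch, which may or may not be what you want.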

So let's look at our input and try to determine how to format it to our output. This is actually going to be somewhat challenging: it may look easy, but it really isn't. The first issue we note is that there are three name columns on the first sheet, but for some records the middle initial is part of the first name. Well that's not good. We also note that there are two Magic ID columns, and some records have neither, some have one, and some have both. Alright, that's not as bad to deal with. And of course, we have inconsistent email formats: some have none, some have one, and one has email1; email2. Looking at Sheet 2, we get even less consistent: if a user had multiple Magic ID values then they're separated by a number of spaces, and the first, middle and last names are all in the same column. Oh boy.

Every single one of these problems is something that my real data has: some records have a combination of the problems, some have one or two, some have none. We're going to build our application on the assumption that every row may have any combination of problems, but some of the problems in the sheet will be consistent.

The Magic ID format is going to be consistent across the whole sheet: if the sheet has one record with multiple columns for the Magic ID, then all records on the sheet will have that same structure. If the sheet has one record with all the names combined, then all records on the sheet will be like that. However, this only applies to the first and last name: the first and middle names may be arbitrarily combined or separated, so we still have to test for that.

Let's define a type for our output

This is a good start to getting our output in the right format - if we know what business rules apply to all output records then we can build a type that meets them. In our case, the following type is applicable:

type ListPerson = {
    FirstName : string
    MiddleName : string option
    LastName : string
    Email : string
    MagicId : string
}

So now we have a type which enforces our business rules: for our output we can only have a type that meets all those criteria, thus it must have a FirstName, LastName, Email and MagicId, but it might have a MiddleName.
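As a quick sketch of what that means in practice, here is the same type with two values built from it. The names and the email address are made up for illustration.

```fsharp
type ListPerson = {
    FirstName : string
    MiddleName : string option
    LastName : string
    Email : string
    MagicId : string
}

// Every mandatory field must be supplied; MiddleName may be Some or None.
let withMiddle =
    { FirstName = "Elliott"
      MiddleName = Some "V."
      LastName = "Brown"
      Email = "elliott@example.com"
      MagicId = "1" }

// Copy-and-update: same person, but without a middle name.
let withoutMiddle = { withMiddle with MiddleName = None }

// Leaving out a mandatory field is a compile error, so this line stays commented:
// let broken = { FirstName = "Elliott"; LastName = "Brown" }
```

The compiler is doing the business-rule checking for us here: there is no way to build a ListPerson that is missing an Email or a MagicId.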

Building an input to create our output

We cannot forget that we're currently thinking about the matching bit, not the normalization. So right now we only care about how to form some data for our output, and since we know that the output is the same as the input, just skinnied down (because we're removing items in this list that are on the previous one), then we can decide that our input must be the same format as our output. Thus, the ListPerson record should also be used to enforce the input.

With that said, we know we need to take a ListPerson list from the current list and a ListPerson list from the previous list, and find the values that appear on only one of them. This is actually really easy with F#, and we can define a function that does this:

let listCompare other el =
    other |> List.filter (compare el) |> List.length = 0

This takes an "other" list and an element from the current list to search for. It uses a custom compare function (for reasons we'll discuss shortly) to filter the other list down to only the elements where compare returns true, and returns true if no elements remain. This is pretty simple, and lets us call it by:

List.filter (listCompare oldListMembers)

Now we could, and should, modify listCompare to be more generic. Instead of using our forced compare function (defined elsewhere) we should let it take a function, but I'm going to leave that up to you.

Why did we use a custom comparer, and am I going to show it to you?

Of course. This custom compare function is defined so that we can make an optional comparison. By default F# builds a full equality comparer for a record type, but that means that if MiddleName is a string option, and in one list it's Some value, but in another list it's None, we'll treat those as unique records. In our particular case we don't want to do that: we want these to compare based on the data that both records have present. So if MiddleName is Some v for both, then they can compare MiddleName, but if it's None for either, then don't compare it.

The custom compare function is basically the following:

let compare (m1 : ListPerson) (m2 : ListPerson) =
    // Compare mandatory values first
    m1.FirstName = m2.FirstName &&
    m1.LastName = m2.LastName &&
    m1.Email = m2.Email &&
    m1.MagicId = m2.MagicId &&
    // Default the match to `true` if either is `None` so that the previous conditions are the only influence
    match m1.MiddleName, m2.MiddleName with
    | Some a, Some b -> a = b
    | _ -> true

As you can see it just checks everything, and doesn't count a difference in the type of option stored as a false result.
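To see the difference from the built-in record equality, here is a small self-contained sketch. The sample people are made up; the type, compare and listCompare are repeated so the block runs on its own.

```fsharp
type ListPerson = {
    FirstName : string
    MiddleName : string option
    LastName : string
    Email : string
    MagicId : string
}

// Note: this shadows F#'s built-in `compare` for the rest of the script.
let compare (m1 : ListPerson) (m2 : ListPerson) =
    m1.FirstName = m2.FirstName &&
    m1.LastName = m2.LastName &&
    m1.Email = m2.Email &&
    m1.MagicId = m2.MagicId &&
    match m1.MiddleName, m2.MiddleName with
    | Some a, Some b -> a = b
    | _ -> true

let listCompare other el =
    other |> List.filter (compare el) |> List.length = 0

let a =
    { FirstName = "Elliott"; MiddleName = Some "V."; LastName = "Brown"
      Email = "elliott@example.com"; MagicId = "1" }
let b = { a with MiddleName = None }       // same person, middle name missing
let c = { a with MiddleName = Some "X." }  // genuinely different middle name

let structural = (a = b)          // false: Some "V." is not None
let lenient = compare a b         // true: a missing middle name is ignored
let differing = compare a c       // false: both present, and they differ
let isNew = listCompare [ b ] a   // false: a already "exists" on the other list
```

So with structural equality a and b would be treated as two different people, while our custom compare treats them as the same record.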

We'll also want to do a List.distinctBy after our filter - in my real data I often get duplicated records within the same list but with or without a MiddleName, for example. We'll actually cheat a bit, and do the following:

List.distinctBy (fun u -> { u with MiddleName = None })

By setting MiddleName to None, then using that for the distinctBy, we'll only return one of the two records (and since the MiddleName is tertiary data in my situation, I don't care which) so that we don't have to deal with duplicates on that front either.
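Here is a small sketch of that trick with made-up sample records. List.distinctBy keeps the first element it sees for each key and discards later ones, so in this example the record that still has its MiddleName survives simply because it comes first.

```fsharp
type ListPerson = {
    FirstName : string
    MiddleName : string option
    LastName : string
    Email : string
    MagicId : string
}

let elliott =
    { FirstName = "Elliott"; MiddleName = Some "V."; LastName = "Brown"
      Email = "elliott@example.com"; MagicId = "1" }
let other =
    { FirstName = "Jane"; MiddleName = None; LastName = "Doe"
      Email = "jane@example.com"; MagicId = "2" }

// The same person appears twice: once with and once without a middle name.
let people = [ elliott; { elliott with MiddleName = None }; other ]

// The key ignores MiddleName, so both "Elliott" records share one key.
let deduped = people |> List.distinctBy (fun u -> { u with MiddleName = None })
```

Note that the key is only used for comparison: the elements that come back are the original records, not the copies with MiddleName set to None.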

So in the end, we'll end up with some code that looks as follows:

let listCompare other el =
    other |> List.filter (compare el) |> List.length = 0

printfn "Getting members not on old list."
let newListUnique =
    newListMembers
    |> List.filter (listCompare oldListMembers)
    |> List.distinctBy (fun u -> { u with MiddleName = None })
printfn "Getting members not on new list."
let oldListUnique =
    oldListMembers
    |> List.filter (listCompare newListMembers)
    |> List.distinctBy (fun u -> { u with MiddleName = None })

Well that's pretty basic. We should extract that fun u to a new function, which we may as well do now:

let distinction u = { u with MiddleName = None }

printfn "Getting members not on old list."
let newListUnique =
    newListMembers
    |> List.filter (listCompare oldListMembers)
    |> List.distinctBy distinction
printfn "Getting members not on new list."
let oldListUnique =
    oldListMembers
    |> List.filter (listCompare newListMembers)
    |> List.distinctBy distinction

And hell, we could even make that a total composition:

let listCompare other el = other |> List.filter (compare el) |> List.length = 0
let distinction u = { u with MiddleName = None }
let filterBy other = List.filter (listCompare other) >> List.distinctBy distinction

printfn "Getting members not on old list."
let newListUnique =
    newListMembers
    |> filterBy oldListMembers
printfn "Getting members not on new list."
let oldListUnique =
    oldListMembers
    |> filterBy newListMembers

Easy enough, it also helps us see what's happening more clearly, since the steps are clearly isolated.

Pulling some data in

Now that we know what our filter is, what our data is, and how it all interacts, let's pull in some good data. We're going to pull in data that already fits our model, which is included in the Step 3 Preparation workbook that's attached. There are just a few rows, like the rest of our data, but there's enough that we can do something with it. NPOI, and F# in general, make it really easy to load data. In fact, way easier than it should be. The F# compiler has a lot of tricks up its sleeve to make our lives easier, and we'll explore a couple of those pretty easily.

Today we're going to actually pull all the data in, filter it, and write it out. To pull the data in is pretty easy:

// Verbatim strings (@"...") keep the backslashes in Windows paths literal.
let newBookFilename = @"C:\Users\Elliott\Desktop\Step3_Testing_New.xlsx"
let newBook =
    use fs = new FileStream(newBookFilename, FileMode.Open, FileAccess.Read)
    XSSFWorkbook(fs)

let oldBookFilename = @"C:\Users\Elliott\Desktop\Step3_Testing_Old.xlsx"
let oldBook =
    use fs = new FileStream(oldBookFilename, FileMode.Open, FileAccess.Read)
    XSSFWorkbook(fs)

To do the XSSFWorkbook bit (which is just a construction of the XSSFWorkbook class from our FileStream) you need to open NPOI.XSSF.UserModel beforehand, along with System.IO for FileStream. I like to arrange these at the top of my file when they're large opens like this.

The next step is to pull our Process_Spec sheet in:

let newSheet = "Process_Spec" |> newBook.GetSheet
let oldSheet = "Process_Spec" |> oldBook.GetSheet

Well that was simple. We're already about 1/4 the way done. Let's pull in a list of all our specification members:

let newListMembers =
    [1..newSheet.LastRowNum]
    |> Seq.choose (newSheet.GetRow >> Option.ofObj)
    |> Seq.choose getSpecMember
    |> Seq.toList

let oldListMembers =
    [1..oldSheet.LastRowNum]
    |> Seq.choose (oldSheet.GetRow >> Option.ofObj)
    |> Seq.choose getSpecMember
    |> Seq.toList

Well this is really easy. And look, we've set ourselves up to just drop in the filtering we wrote in the previous section, damn F# really makes this simple. We start at 1 instead of 0 as we want to skip the header row. We're actually almost done.
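If the Seq.choose (GetRow >> Option.ofObj) step looks like magic, here is the same idea without NPOI. This is only a sketch: the rows array and getRow function are stand-ins for a sheet whose GetRow returns null for a row that doesn't exist.

```fsharp
// Stand-in for a sheet: index 0 is the header, index 2 is a "missing" row.
let rows : string[] = [| "header"; "row 1"; null; "row 3" |]

// Stand-in for sheet.GetRow: hands back null when there is no row.
let getRow (i : int) = rows.[i]

let present =
    [1 .. rows.Length - 1]                    // start at 1 to skip the header
    |> Seq.choose (getRow >> Option.ofObj)    // null -> None -> dropped
    |> Seq.toList
```

Option.ofObj turns null into None and anything else into Some, and Seq.choose keeps only the Some values, so the null row silently disappears.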

We want to save three new sheets on our newBook now: the list of all members on the newBook not on the oldBook, the list of all members on the oldBook not on the newBook, and then we'll format the list of members not on the oldBook in a specific format. This is used at work to drop into a separate Excel sheet that I format my SQL queries with, so some day this will be automated.

let newSheetName = "Process_Spec_New_Add"
let oldSheetName = "Process_Spec_Old_Add"
let templateSheetName = "Process_Spec_Template"

let header = { FirstName = "First Name"; MiddleName = Some "Middle Name"; LastName = "Last Name"; Email = "Email"; MagicId = "Magic ID" }
let addSheet func (workbook : IWorkbook) name list =
    match name |> workbook.GetSheet |> Option.ofObj with
    | Some _ -> name |> workbook.GetSheetIndex |> workbook.RemoveSheetAt
    | None -> ()

    let sheet = name |> workbook.CreateSheet
    (header::list) |> List.iteri (fun i el -> i |> sheet.CreateRow |> func el)

addSheet fillRow newBook newSheetName newListUnique
addSheet fillRow newBook oldSheetName oldListUnique
addSheet fillTemplateRow newBook templateSheetName newListUnique

let write (book : IWorkbook) =
    use fs = File.Create("Step3_Testing_Finished.xlsx")
    book.Write(fs)

write newBook

And we're done. Boy, that was easy. I left the implementation of getSpecMember, fillRow and fillTemplateRow out, so let me include them quickly for you. They're quite easy to define, and they really do give us more control over what's going on. We end up needing two functions first, which are helpers to set values on a cell:

let setValue (s : string) (cell:ICell) =
    s |> cell.SetCellValue
let setValueOpt (s : string option) (cell:ICell) =
    s |> Option.map (fun s -> cell |> setValue s) |> ignore

After that, we need another helper to get values. This is pretty easy, and I excluded formulas because I expect our previous process to strip them out entirely.

let getVal (cell : ICell) : string option =
    cell
    |> Option.ofObj
    |> Option.map (fun cell ->
        match cell.CellType with
        | CellType.Boolean -> cell.BooleanCellValue |> Object.toStr
        | CellType.Numeric -> cell.NumericCellValue |> Object.toStr
        | _ -> cell.StringCellValue)

You remember our Object.toStr method, right? If not, I recommend trying to implement it from scratch, based on the idea that it returns o.ToString(). We then have a getSpecMember method:

let getSpecMember (row : IRow) =
    let get = row.GetCell >> getVal
    match 0 |> get, 1 |> get, 2 |> get, 3 |> get, 4 |> get with
    | Some fName, mName, Some lName, Some email, Some magicId ->
        Some { FirstName = fName; MiddleName = mName; LastName = lName; Email = email; MagicId = magicId }
    | _ -> None

We use the options to tell us if we have a valid spec or not: when matched with Seq.choose, it will filter away all the None values. Next are the basic fillRow and fillTemplateRow, which should be self-explanatory:

let fillRow el (row : IRow) =
    0 |> row.CreateCell |> setValue el.FirstName
    1 |> row.CreateCell |> setValueOpt el.MiddleName
    2 |> row.CreateCell |> setValue el.LastName
    3 |> row.CreateCell |> setValue el.Email
    4 |> row.CreateCell |> setValue el.MagicId

let fillTemplateRow el (row : IRow) =
    0 |> row.CreateCell |> setValue el.FirstName
    1 |> row.CreateCell |> setValueOpt el.MiddleName
    2 |> row.CreateCell |> setValue el.LastName
    3 |> row.CreateCell |> setValue el.Email
    4 |> row.CreateCell |> setValue el.MagicId
    5 |> row.CreateCell |> setValue (el.MagicId |> String.digitsOnly)

The difference in them is that fillTemplateRow includes an additional column for the MagicId, but with only digit characters in it. In the real data I often get other characters in that column, which may or may not be valid, and for the processing I have to do I only want to consider the numeric characters at times. To use the IRow, ICell and IWorkbook you'll need to open NPOI.SS.UserModel, but then that's it. We have all the pieces to complete our task, in fact, we're pretty much done with it.
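String.digitsOnly isn't defined anywhere in this post. A minimal sketch, assuming all it needs to do is keep the characters '0' through '9' and drop everything else, could look like this (your own version may well differ):

```fsharp
// Assumed implementation: the real helper is not shown in this post.
module String =
    // Keep only the characters '0'..'9', in their original order.
    let digitsOnly (s : string) =
        s
        |> Seq.filter (fun c -> c >= '0' && c <= '9')
        |> Seq.toArray
        |> fun chars -> System.String(chars)

let cleaned = String.digitsOnly "AB-123 x4"
```

Defining our own module named String doesn't hide the built-in one, so the usual String functions remain available alongside digitsOnly.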

Put it together and ship it

At this point, we're done with the third step, and now we know what format our output from Step 2 should be in. This is important to establish as it allows us to work the two ends together - we'll actually work from the start and finish towards each other when we start with Step 2, which I think will end up being covered over more than one post, because it's a long and painful process. Hopefully today's post showed you how to actually start working on projects like a software engineer, and started to allow you to build up your knowledge and thought process. It's important to know how to solve problems; that's probably the biggest thing I see new software engineers struggle with. You need to break it down, and start thinking about what steps need to be done. Build things one step at a time - don't mind if it goes slowly at first, you'll eventually get to the point where you just start breaking problems down without even thinking about them.

By the time we finish, the whole thing as an F# script should look something like:

open System
open System.IO
open NPOI.XSSF.UserModel
open NPOI.SS.UserModel

type ListPerson = {
    FirstName : string
    MiddleName : string option
    LastName : string
    Email : string
    MagicId : string
}

let compare (m1 : ListPerson) (m2 : ListPerson) =
    // Compare mandatory values first
    m1.FirstName = m2.FirstName &&
    m1.LastName = m2.LastName &&
    m1.Email = m2.Email &&
    m1.MagicId = m2.MagicId &&
    // Default the match to `true` if either is `None` so that the previous conditions are the only influence
    match m1.MiddleName, m2.MiddleName with
    | Some a, Some b -> a = b
    | _ -> true

let setValue (s : string) (cell:ICell) =
    s |> cell.SetCellValue
let setValueOpt (s : string option) (cell:ICell) =
    s |> Option.map (fun s -> cell |> setValue s) |> ignore

let getVal (cell : ICell) : string option =
    cell
    |> Option.ofObj
    |> Option.map (fun cell ->
        match cell.CellType with
        | CellType.Boolean -> cell.BooleanCellValue |> Object.toStr
        | CellType.Numeric -> cell.NumericCellValue |> Object.toStr
        | _ -> cell.StringCellValue)

let listCompare other el = other |> List.filter (compare el) |> List.length = 0
let distinction u = { u with MiddleName = None }
let filterBy other = List.filter (listCompare other) >> List.distinctBy distinction

let getSpecMember (row : IRow) =
    let get = row.GetCell >> getVal
    match 0 |> get, 1 |> get, 2 |> get, 3 |> get, 4 |> get with
    | Some fName, mName, Some lName, Some email, Some magicId ->
        Some { FirstName = fName; MiddleName = mName; LastName = lName; Email = email; MagicId = magicId }
    | _ -> None

let fillRow el (row : IRow) =
    0 |> row.CreateCell |> setValue el.FirstName
    1 |> row.CreateCell |> setValueOpt el.MiddleName
    2 |> row.CreateCell |> setValue el.LastName
    3 |> row.CreateCell |> setValue el.Email
    4 |> row.CreateCell |> setValue el.MagicId

let fillTemplateRow el (row : IRow) =
    0 |> row.CreateCell |> setValue el.FirstName
    1 |> row.CreateCell |> setValueOpt el.MiddleName
    2 |> row.CreateCell |> setValue el.LastName
    3 |> row.CreateCell |> setValue el.Email
    4 |> row.CreateCell |> setValue el.MagicId
    5 |> row.CreateCell |> setValue (el.MagicId |> String.digitsOnly)

let newBookFilename = @"C:\Users\EBrown\Desktop\Step3_Testing_New.xlsx"
let newBook =
    use fs = new FileStream(newBookFilename, FileMode.Open, FileAccess.Read)
    XSSFWorkbook(fs)

let oldBookFilename = @"C:\Users\EBrown\Desktop\Step3_Testing_Old.xlsx"
let oldBook =
    use fs = new FileStream(oldBookFilename, FileMode.Open, FileAccess.Read)
    XSSFWorkbook(fs)

let newSheet = "Process_Spec" |> newBook.GetSheet
let oldSheet = "Process_Spec" |> oldBook.GetSheet

let newListMembers =
    [1..newSheet.LastRowNum]
    |> Seq.choose (newSheet.GetRow >> Option.ofObj)
    |> Seq.choose getSpecMember
    |> Seq.toList

let oldListMembers =
    [1..oldSheet.LastRowNum]
    |> Seq.choose (oldSheet.GetRow >> Option.ofObj)
    |> Seq.choose getSpecMember
    |> Seq.toList

printfn "Getting members not on old list."
let newListUnique = newListMembers |> filterBy oldListMembers
printfn "Getting members not on new list."
let oldListUnique = oldListMembers |> filterBy newListMembers

let newSheetName = "Process_Spec_New_Add"
let oldSheetName = "Process_Spec_Old_Add"
let templateSheetName = "Process_Spec_Template"

let header = { FirstName = "First Name"; MiddleName = Some "Middle Name"; LastName = "Last Name"; Email = "Email"; MagicId = "Magic ID" }
let addSheet func (workbook : IWorkbook) name list =
    match name |> workbook.GetSheet |> Option.ofObj with
    | Some _ -> name |> workbook.GetSheetIndex |> workbook.RemoveSheetAt
    | None -> ()

    let sheet = name |> workbook.CreateSheet
    (header::list) |> List.iteri (fun i el -> i |> sheet.CreateRow |> func el)

addSheet fillRow newBook newSheetName newListUnique
addSheet fillRow newBook oldSheetName oldListUnique
addSheet fillTemplateRow newBook templateSheetName newListUnique

let write (book : IWorkbook) =
    use fs = File.Create(@"C:\Users\EBrown\Desktop\Step3_Testing_Finished.xlsx")
    book.Write(fs)

write newBook

This is roughly 120 lines of code. We read two spreadsheets, filtered the members from each down to only unique values, and saved them back out in 120 lines of code. That's phenomenal. In the past five posts, we went from doing basic function work, to processing a full-on spreadsheet and filtering data in it. You're going to see that we're going to continue this trend of build and improve over the next few posts. We'll actually work out how to replace the variables we use with new and improved parameters, allowing us to send parameters to our application via arguments, which means we will allow the user to have a more dynamic interaction.

As we continue to explore (and I recommend you explore some on your own) we'll get more in-depth looks at how we can process these spreadsheets more effectively. We'll get in to doing the actual, raw-level morphing coming up soon, and actually translate our bad sheets into decent sheets, and decent sheets into good sheets.

This is all actually fairly straightforward, and you'll see once we get started that it's not nearly as hard as I make it out to be. We'll go through things rather quickly, but once it all clicks you should be able to go back to the previous posts and review them to make certain you understand what's happening. Most of what I do is pretty simple, and fairly reasonable, but there may be things that confuse you a bit or make you think a little harder than normal - that's intended. I want you to have to think and understand, I want you to think critically about what's going on, because that is the only way you become a better programmer.


I started this whole blog series for a friend of mine (currently a U.S. Marine deployed to Turkey) who has asked me, quite possibly fifty times over the past three years or so, to teach him how to become a programmer. (I mentioned this in the first of these articles.) Neither of us has had a schedule that lets us connect on it, so this series should allow a person to work at their own pace while still allowing me to work at mine. If anyone has any suggestions for material to cover, or how, please tweet at me (@EBrown8534) or comment. I really want to make this a good series for as many people as possible. And feel free to let me know if you like anything in particular as well, I would really love to hear any input you might have.

Second, if you're using VBA in Excel (or, really, any other VBA environment), this GoFundMe could really use your support! These guys and gals have been building a wonderful add-on for the Excel VBE (and they've even spent a good amount of time making it work in all VBE's, such as AutoCAD, SolidWorks, CorelDRAW and Sage 300 ERP), for a while now, all on their own free time. For the amount of work required, that's incredible. Even a few bucks will help them - they're a really great group.

Spreadsheets: Step3_Testing_Old.xlsx (9KB) Step3_Testing_New.xlsx (9.1KB)

Why it's important to be respectful and humble

Being Humble is Important

I want to talk (briefly) about why being humble is an important aspect of being a developer in the modern world.

Take it from a 50 year old programmer

One of the people who work with my company is a 50-something programmer, who's been doing this work for a very long time, and is mostly self-taught. (Maybe even entirely.) Recently I got the opportunity to sit down with this person for a one-on-one discussion, and his mentality and actions really put things in to perspective.

To give you some background information: this meeting was a discussion of some systems that he built which we'll be taking over shortly. We were going through a lot of the code and trying to make sure that we understood the flow of the software. It wasn't particularly confusing or disheartening, but there were some parts where, as we were reviewing it, I thought to myself, "boy, this part could really be done better."

This is the wrong mentality. Recently, it seems like society (especially in the United States) has become very centered on the idea of finding and pointing out other people's flaws, especially during a technical review. It's much easier to say "this is wrong" than it is to say "I like how you did __". It's much easier to put someone down than lift them up.

I'm telling you this because this particular gentleman changed my perspective on software engineering. As we were going through some of this code (which he had been the only maintainer of for a decade, literally), he was commenting to us out loud, saying things such as the following:

I know there's a better way to do that now, but when I wrote this I only knew how to do it this way.

Now this wasn't prompted, we didn't point out flaws in his code, we didn't ask "why is this _ instead of _?", we were just making sure we understood the broader picture. He felt the need to tell us that, which means he probably felt somewhat threatened by us.

The biggest mistake you can make is to be rude or aggressive towards your own camp

This is so much bigger than we think. For a seasoned developer to feel threatened enough to have to preemptively defend himself in that manner is a tragedy. And I won't lie: this is my fault. We had a similar sit-down with this same person several months ago, and I had pointed out some things that could have been done better. This is the wrong thing to do.

A colleague of mine on Twitter recently quoted, sed-replaced, and effectively coined the following phrase:

If you comment on someone's code, especially a junior dev, I expect you to say something nice, even if you have valid criticisms too.

It's easy to make people (especially junior developers) feel bad about the code they write, it's much harder to point out the good things. Just do it, swallow your pride (I know I have recently) and tell them what you liked, no matter how big or small.

I really liked the name you used for this, it actually makes the intent extremely clear.

Don't be sarcastic, mean it.


Don't worry, this post won't interrupt our continued adventures tomorrow, I'm going to be posting the technical articles on Monday/Wednesday/Friday, and non-technical articles (if I have any) on Tuesday and Thursday.

Getting started with programming and getting absolutely nowhere (Part 4)

Moving on to solving Business Problems


By now I hope you have a grasp on the language we'll be using, because I'm done with the basic Syntax section. I'm going to start going into business problems, business logic, and we're going to actually build things for our business domain. These are real problems, real challenges that I've had to solve for work, and I want to give you real-world experience, instead of the contrived examples many "introduction to programming" blogs, tutorials and guides go through. (Things like "Hello World", "Fizz Buzz", "Project Euler" - it's not that I think these are not valuable, I just see no need to repeat the same thing over and over again. You can find plenty of tutorials that use these all over the net, mine will not be such things.)

You probably remember that in my first post I said we were going to explore things, and that I would not give you all the answers. I am still going to be following that modus operandi, and we're going to define some problems and talk out the solutions. From here on out I'll be brief on any new syntax tricks I introduce: basically, I'll tell you the most effective term to google in order to get information on that component of the language. For this reason I recommend you bookmark F# for Fun and Profit. This will become an extraordinary tool on our adventures.

We're going to process a lot of spreadsheet data

I'm not sure if you're aware, but in the business world we use Excel a lot. I mean, way more than is healthy (or even reasonable). As a result of this, many of us in the IT field find that we end up processing Excel data a lot more than normal. (In fact, there's an entire community of programmers I happen to know working on a VBA IDE add-in, to make this less unpleasant.) I'm not telling you this to scare you, but it probably will; that said, don't be concerned, because there are tools out there to make this workable from F#: the first of which is COM Interop (which we won't do, because it sucks), and the second I'm going to mention being the NPOI library, available on NuGet.

We're going to use (or, at least, I'm going to use) the NPOI library, based on the Apache POI for Java. You'll find the documentation to be less than helpful, but it's a very powerful library. It allows you to work with the Excel sheet as a regular .NET object, instead of having to deal with it from a COM Interop point of view, which makes heavy use of dynamic typing. (This, as a side-effect, makes it more painful to work with in F# as dynamic typing is not a regular thing there.) Before we start working with an Excel workbook, though, I want to get some "helper" functions we'll use out of the way.

Keep in mind that I'm going to be demonstrating how to solve problems based on things I've gone through. This means many of the problems I go through here will be Excel based, but not all of them. So bear with it: if it seems like we're doing a lot of boring Excel processing and such, it just means I haven't done anything else at work for a while. None of these things are contrived, I really cannot stress that enough.

Creating some helpers for our operations

Right, so all that said, let's move on. Excel (and, as a result, most spreadsheet processors) uses a letter+number to index a cell, with the letter being the column number (i.e. A is column 1), and the number being the row. For this reason, we probably want a function which can turn a letter-based-column-number into an actual digit, since NPOI (and arrays in general) index things by integer values. The function I've defined is pretty basic, but it converts a char to an integer, returning Some if it's valid, and None if not:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | _ -> None

Of course this has the flaw of only supporting "capital case" letters, but we could modify that to support lower-case as well:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (v - int 'a')
    | _ -> None

And if it's necessary for your implementation we can support literal digits:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (v - int 'a')
    | v when v >= int '0' && v <= int '9' -> Some (v - int '0')
    | _ -> None

So this gets us a function which we can test out that should allow us to turn 'A' into the first column number, which when 0-based is 0 (wrapped in Some): let colValue = 'A' |> letterVal. Pretty simple, and pretty powerful. Because it's an F# function that returns an Option, we can do something like the following: let colValues = "AB" |> String.explode |> Array.choose letterVal. This should give us an array of [|0; 1|]. The problem with this is that we really need a single number, not two. For AB we want to get 27 (the zero-based index of the 28th column), so we could build a function (colNum) that defines the following:

let colNum =
    String.explode
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., i |> float))
    >> Array.sum
    >> int

When bugs fly

We have a huge bug with this though: for our value of "AB" it's going to return 1, not 27 as we would expect. Our mistake is not so subtle. The (float x) * is the problem: when x is 0 (as it is for A), that term is always 0, no matter which position the A is in. We could fix this by doing float x + 1., but then we're doing extra work that really is part of the letterVal logic. To fix this, we'll just apply the offset in letterVal:

let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (1 + v - int 'A')
    | v when v >= int 'a' && v <= int 'z' -> Some (1 + v - int 'a')
    | v when v >= int '0' && v <= int '9' -> Some (1 + v - int '0')
    | _ -> None

But now we think to ourselves, 'why am I doing the same step (1 + v - int ...) in three places? Can I do that in one place?' We sure can:

let letterVal (c : char) =
    let f (c : char) v = 1 + v - int c
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> v |> f 'A' |> Some
    | v when v >= int 'a' && v <= int 'z' -> v |> f 'a' |> Some
    | v when v >= int '0' && v <= int '9' -> v |> f '0' |> Some
    | _ -> None

So this almost fixes our bug: now "AB" |> colNum returns 28. That is the correct 1-based column number (A is 1, Z is 26, AA is 27, AB is 28), but the 27 we were after is the zero-based index, so we need to subtract 1 from the result to get it. For this we're going to just create a sub function, which will take two values and then return the first value subtracted from the second (so sub 1 x is x - 1). You can define this yourself, but it should end up working as follows:

let colNum =
    String.explode
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., i |> float))
    >> Array.sum
    >> int
    >> sub 1

Beautiful. Our code works; we can test this by running ["A"; "Z"; "AA"; "AB"; "ZZ"; "AAA"] |> List.map colNum, which should give us [0; 25; 26; 27; 701; 702]. The 1-based number is useful too, though, so rather than leave the subtraction baked in we'll take the sub 1 back out of colNum (the same test then gives [1; 26; 27; 28; 702; 703]) and make a new function which returns colNum minus 1: let colNumZBase =. You can implement this on your own.
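If you want something to check your own sub and colNumZBase against, here is one possible sketch. It is self-contained, so explode is a local stand-in for String.explode, letterVal is trimmed down to upper-case letters only, and this particular definition of sub is an assumption about a reasonable way to write it, not the only one.

```fsharp
// Stand-in for String.explode so this snippet runs on its own.
let explode (s : string) = s.ToCharArray()

// 1-based letter value, upper-case only to keep the sketch short.
let letterVal (c : char) =
    match int c with
    | v when v >= int 'A' && v <= int 'Z' -> Some (1 + v - int 'A')
    | _ -> None

// One possible sub: `sub a b` is `b - a`, so `x |> sub 1` is `x - 1`.
let sub a b = b - a

// 1-based column number: "A" is 1, "AB" is 28.
let colNum =
    explode
    >> Array.choose letterVal
    >> Array.rev
    >> Array.mapi (fun i x -> (float x) * System.Math.Pow(26., float i))
    >> Array.sum
    >> int

// Zero-based column index: "A" is 0, "AB" is 27.
let colNumZBase = colNum >> sub 1

let columns = ["A"; "Z"; "AA"; "AB"; "ZZ"; "AAA"]
let oneBased = columns |> List.map colNum
let zeroBased = columns |> List.map colNumZBase
printfn "%A" oneBased   // [1; 26; 27; 28; 702; 703]
printfn "%A" zeroBased  // [0; 25; 26; 27; 701; 702]
```

Putting the arguments of sub in "backwards" order is what lets it sit at the end of a composition: colNum >> sub 1 reads as "take the column number, then subtract 1".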

Making it a bit more usable

This whole colNum function is great, but I want to make the fun i x lambda an actual function that can be used elsewhere. For this we have to think pretty hard about what it's doing, why, and how. For those who don't know what's happening here, the Array.mapi lambda is converting each value from a digit to the amount that digit is worth in its position. If we consider that our 26. is our base, call it n, each value is going from x to x * 26.^i, where i is the index. (We reverse the array so that the rightmost letter gets index 0 and the indexes grow as we move left, which gives the letters further left the higher place values.) This means each position is worth n ^ i, where n is the base and i is the position, so we can define a function which does this math more abstractly:

let baseValue b i =
    System.Math.Pow(b, i)

Now our formula has some more meaning, but it's not fully reusable. We can see cases where we would be doing this often; in fact, our baseValue always uses 26. as the base in our scenario, so we could define a more specific version: let indexVal = baseValue 26. and change our formula to fun i x -> (float x) * indexVal (float i). We can apply further transformations to generalize it even more: let elementVal i x = i |> float |> baseValue 26. |> (*) x, which leads us to a final transformation into:

let colNum =
    let elementVal i x = i |> float |> baseValue 26. |> (*) x
    String.explode
    >> Array.choose letterVal
    >> Array.rev
    >> Array.map float
    >> Array.mapi elementVal
    >> Array.sum
    >> int
let colNumZBase = colNum >> sub 1

This leaves our entire colNum as just a series of function compositions, and each function is independent. This makes our code much more readable, and allows us to be more expressive in the future. Notice that we expected x to be a float in elementVal. We could remove the Array.map float call and then replace x with (float x) in the elementVal function, but that's less clear. Now the transformation is happening elsewhere: we want it to happen before that function to alleviate its responsibilities. We can expect i to be transformed to a float, because after all it's an index, but the values could have special transformations, so we don't want to make that the elementVal function's responsibility.

Organizing our code

We've built everything, and it all works, but we need to organize it. F# has Modules and Namespaces. I won't go into details of the differences (you can read about those elsewhere); instead I'm just going to drop all our code into a module in a separate file that we can reuse later.

module Excel
    /// Allows piping of the subtraction operator as the second argument: sub a b is equivalent to b - a.
    let private sub a b = ...

    /// Returns the integer value of a character if the character is an alphabet character, or a number. (If char is between 'A' and 'Z', returns char - 'A' + 1, for example.)
    let letterVal (c : char) = ...

    /// Returns the value of the index for the specified base. I.e. base^i
    let private baseValue b i = ...

    /// Returns the 1-based index of the column number. I.e. "A" becomes 1, "AA" becomes 27.
    let colNum = ...

    /// Returns colNum - 1. I.e. "A" becomes 0, "AA" becomes 26.
    let colNumZBase = ...

This keeps it out of our way in our production work - because we won't need to modify or change it there (most likely). We're just going to consume it, which means we just need to be able to reference what we've already built. We could organize it further, since sub and baseValue are more abstract functions, but I leave that up to you if you like.

Some other helpful functions

I don't want to start actually processing an Excel workbook today, so I'm going to give you some more modules and such that I have that are particularly helpful. I won't explain all (most) of them, but they'll be used in my samples later and I want you to have them now.

The first three I'll explain slightly: F# has three main collection types (Array, List, and Seq), and these functions "flatten" a collection of collections. That is, if you have an array of arrays, this will convert them to one array containing all the sub-elements. It's basically _.collect id, where id is the identity function, but I find it easier to read _.flatten in my code than _.collect id.

module Array
    /// Flattens an array of arrays to a single-dimensional array. (Equivalent to Array.collect id)
    let flatten<'a> : 'a array array -> 'a array = Array.collect id
module List
    /// Flattens a list of lists to a single-dimensional list. (Equivalent to List.collect id)
    let flatten<'a> : 'a list list -> 'a list = List.collect id
module Seq
    /// Flattens a sequence of sequences to a single-dimensional sequence. (Equivalent to Seq.collect id)
    let flatten<'a> : 'a seq seq -> 'a seq = Seq.collect id

I have all of these in three separate files, but you can arrange them how you like. They're called like a normal function on that collection type, with .flatten. The <'a> is an F# generic type-parameter. Feel free to look those up to learn more.
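To see what flattening actually does, here is a small sketch. It assumes you are working in a single script rather than separate files, so the helper is declared with module Array = ... inline and written as a plain function; the sample data is made up.

```fsharp
// Adds flatten alongside the built-in Array functions.
module Array =
    let flatten (arrays : 'a array array) : 'a array = Array.collect id arrays

// Made-up sample data: an array of arrays, one of them empty.
let nested = [| [| 1; 2 |]; [| 3 |]; [||]; [| 4; 5 |] |]

let flat = nested |> Array.flatten
printfn "%A" flat  // [|1; 2; 3; 4; 5|]

// The same thing without the helper, for comparison.
let flatWithoutHelper = nested |> Array.collect id
```

Only one level is removed: an array of arrays of arrays would come back as an array of arrays.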

The rest of the helpful functions that I have (up to now) are below; each module is in its own file for organization purposes:

module Object
    /// Equivalent to object.ToString.
    let toStr (o : obj) = o.ToString()

module Char
    /// Returns true if a char is equal to the double-quote (").
    let isDQuote = (=) '"'
    /// Returns true if a char is equal to the single-quote (').
    let isSQuote = (=) '\''
    /// Returns true if a char is equal to the double-quote (") or single-quote (').
    let isQuote c = c |> isDQuote || c |> isSQuote

module String
    open System
    /// Gets the char array of a string.
    let explode : string -> char array = Seq.toArray
    /// Assembles a string from a char array.
    let implode : char array -> string = String
    /// Splits a string using the StringSplitOptions on the specified char.
    let splitOpt (options : StringSplitOptions) (c : char) (str : string) = str.Split([|c|], options)
    /// Splits a string on the specified char using the default StringSplitOptions.
    let split (c : char) (str : string) = str.Split(c)
    /// Determines if a string contains a substring.
    let contains (search : string) (subject : string) = subject.Contains(search)
    /// Performs a String.Trim() which removes leading and trailing whitespace.
    let trim (str : string) = str.Trim()

    let private processStr func = explode >> func >> implode
    let private filterStr func = explode >> Array.filter func >> implode

    /// Removes all double-quote characters from a string.
    let stripDQuotes = filterStr (Char.isDQuote >> not)
    /// Removes all single-quote characters from a string
    let stripSQuotes = filterStr (Char.isSQuote >> not)
    /// Removes all double- or single-quote characters from a string in one pass.
    let stripQuotes = filterStr (Char.isQuote >> not)
    /// Filters a string to contain only digits.
    let digitsOnly = filterStr Char.IsDigit
    /// Filters a string to contain only letters.
    let lettersOnly = filterStr Char.IsLetter

All of these are pretty basic, so I won't explain them. Feel free to use them or not, but these are some of the most common functions I use in my work. You'll notice most of them are function compositions: I like to compose functions as much as possible, as it should reduce some overhead as far as function calls later on. (Whether it does or not is untested, but at the very least it forces me to be consistent.)
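A few of the string helpers in action might make the composition style easier to picture. This is a self-contained sketch: explode, implode and filterStr are local stand-ins for the versions in the String module above, and the sample strings are made up.

```fsharp
// Local stand-ins so the snippet runs without the String module.
let explode (s : string) = s.ToCharArray()
let implode (chars : char array) = System.String(chars)

// Keep only the characters for which `predicate` returns true.
let filterStr (predicate : char -> bool) =
    explode >> Array.filter predicate >> implode

let isQuote c = c = '"' || c = '\''
let stripQuotes = filterStr (isQuote >> not)
let digitsOnly = filterStr System.Char.IsDigit
let lettersOnly = filterStr System.Char.IsLetter

printfn "%s" (stripQuotes "\"O'Brien\"")    // OBrien
printfn "%s" (digitsOnly "(555) 123-4567")  // 5551234567
printfn "%s" (lettersOnly "Part #42-B")     // PartB
```

Each helper is just filterStr with a different predicate plugged in, which is why adding a new one is a one-liner.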

Defining function "purity"

Many of these are also "pure" functions, that is: a function which will return the exact same output every time it is run for a specific input. Think of something like a + b: the + function is pure, it will always return the same result for the same two inputs. In our library functions here all of them are, in fact, pure. They are entirely reproducible. If we repeat the function at any given point in time it will always have the same result.

Some functions and methods in .NET are considered impure: they are not guaranteed to return the same output for any given input on each invocation. Consider the .NET Random: calling Next() will generate a different result on each subsequent call, which means more than just the input parameters can influence the function. If a function's return value can change based on things other than input parameters, it's impure.
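Side by side, the difference looks something like this. The function names (add, roll, stamp) are made up for the illustration.

```fsharp
open System

// Pure: the result depends only on the arguments.
let add a b = a + b

// Impure: the result also depends on the hidden state inside `rng`.
let rng = Random()
let roll () = rng.Next(1, 7)

// Impure: reads the system clock, which is not an input parameter.
let stamp (msg : string) = sprintf "%O %s" DateTime.Now msg

printfn "%d %d" (add 2 3) (add 2 3)  // always 5 5
printfn "%d %d" (roll ()) (roll ())  // two values from 1 to 6, which may differ
printfn "%s" (stamp "hello")         // the prefix changes from run to run
```

A handy rule of thumb: if you could replace a call with its result everywhere and nothing would change, the function is pure. That works for add 2 3, but not for roll ().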


Bonus day: Filtering Parentheses

One of the fields I get in these spreadsheets is a specific data field, often with a bunch of parenthesized information in it. Let me rephrase that: the data is ______ and then there's a nasty (____) in there, sometimes with nested parentheses.

This, of course, doesn't match with what I expect from the field; usually I am only concerned with the non-parenthetical data. (In fact, I'm always concerned with the non-parenthetical data, just occasionally I want the parenthetical data as well. Not often.) In order to support this, I built a function filterParens (filter parentheses) which will go through each character in the string, and if it is an opening parenthesis, it will ignore all characters until that parenthesis is closed. It also supports sub-parentheticals, that is: (Something (Else)) will be entirely filtered, and (Something (Else) Entirely) will be as well.

I did this through clever use of a fold, which really wasn't all that clever. In fact, it's pretty straightforward:

let filterParens =
    String.explode
    >> Array.fold (fun acc el ->
        match acc, el with
        | (items, i), '(' -> (items, i + 1)                     // opening paren: go one level deeper
        | (items, i), ')' -> (items, System.Math.Max(i - 1, 0)) // closing paren: back out one level, never below 0
        | (items, 0), _ -> (el::items, 0)                       // outside all parens: keep the character
        | (items, i), _ -> (items, i)) ([], 0)                  // inside parens: drop the character
    >> fst
    >> List.rev
    >> List.toArray
    >> String.implode

If it's not obvious, acc is a tuple of the current list of items, and the level of parentheses we are in. (Because F# is stateless and immutable by default, we cannot just increment or decrement the value for our levels. This is not a bad thing though: beginners may find it easier to understand a stateless language, which is why I selected F# for our adventures.) Also, fst is a function that takes a tuple and returns the first element, if you want to research further.

Now I don't allow it to drop below 0 (the System.Math.Max(i - 1, 0)), so if we close a parenthesis before we opened it, the only result will be that we don't save either. Anything between them will still be saved, and anything after the opening one will not be saved (because it will be back to level 1). This is not a problem for my data-set, but if it is for yours then you could easily modify the second and third match expressions to account for that. (In fact, I recommend you try doing that. Your homework for this post is to modify filterParens to not stop at 0 levels, but allow negative levels as well, and not ignore text between )...(.)
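To make the behaviour concrete, here is a self-contained sketch of the same fold with a few made-up inputs. It assumes the level is clamped so it can never drop below 0, and it takes the string as an explicit parameter instead of composing with String.explode and String.implode, purely so it runs on its own.

```fsharp
// Same idea as filterParens above, written out with no helper modules.
let filterParens (input : string) =
    let kept, _ =
        input.ToCharArray()
        |> Array.fold (fun (acc : char list * int) el ->
            match acc, el with
            | (items, i), '(' -> (items, i + 1)
            | (items, i), ')' -> (items, System.Math.Max(i - 1, 0))
            | (items, 0), _ -> (el :: items, 0)
            | (items, i), _ -> (items, i)) ([], 0)
    System.String(kept |> List.rev |> List.toArray)

// Brackets in the output make the leftover spaces visible.
printfn "[%s]" (filterParens "Widget (Blue) Large")                  // [Widget  Large]
printfn "[%s]" (filterParens "Item (Something (Else) Entirely) 42")  // [Item  42]
printfn "[%s]" (filterParens "a) b (c")                              // [a b ]
```

Notice the double space left behind where the parenthetical used to be. If that matters for your data, a trim or a whitespace-collapsing step afterwards would be the place to deal with it.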


We're done with today's lesson. I know it wasn't as "cool" as we had hoped, but this sets us up for success when we do begin processing things with NPOI. (Which I expect will be the next post here.) As always, keep in mind that while this may be somewhat mundane, it does lead in to bigger things. (I do have a form of a curriculum planned.)

Getting started with programming and getting absolutely nowhere (Part 3)

Since we've made something, let's make it cooler

Lesson 2: Now That We're Programmers, Let's Make Something
Lesson 4: Moving on to Solving Business Problems

Now that we've built something that actually resembles a business problem, I want to take this time to show you one of the coolest features of F#, in my opinion. So in this post, I want to introduce the concept of a "DSL", or a "domain-specific language".

F# has a very powerful feature that allows you to create your own operators. What do I mean by operators? Things like +, or -, or the |> operator. You can build your own, which leads to building your own DSL.

What, exactly, is a DSL?

I mentioned the name "domain-specific language", but I know without context that has little meaning, so let's clarify. By building our own DSL we're actually creating a language that is a subset of another language. We're creating our own language! We don't even have to do much to make it happen, so let's investigate that a little bit.

First off, a DSL isn't always appropriate: you need to take care to understand what a DSL applies to, and when to use it. As a language, F# already has a lot of operators, including some confusing ones, so adding more to the mix is probably not the greatest idea unless it makes sense, and it's going to be used a lot.

In work I've actually done this, and I'm going to show off the operator I created, how it works, and how you can create one. It's really quite simple, and if you've tried experimenting with the language you probably already learned that it's possible, and may have created one yourself. If so, congrats! That's awesome! You clearly figured it out faster than me, and I applaud you for that, because I didn't figure this out until a very long time after I started using F#.

Let's make a DSL

Seriously, let's create one. Remember that split function we built before? It was pretty basic, wasn't it? We took a char and a string, and we split appropriately, but it's about time for us to make it a little easier.

In the work I do we do a lot of string parsing, which means a lot of calling split. In fact, I do it so often that I have a String.fs library file, that has this (and a few other) nifty functions in it. I carry that library to all my F# projects, and it saves me a great deal of time.

let split (c:char) (s:string) = s.Split(c)

Remember this function? Pretty basic: take a string and split on the char. We can even partially apply it, to make it simpler if we're splitting on the same char a lot. Now what if I told you we could build our own operator to make this work? What if we could create something like:

let splitString = someString <|> someChar

Notice the <|> operator? Try to use it in F# right now, you probably can't. (Unless you already created it, then feel free to use it.) Because I'm a nice person I'm going to just give you this operator for free.

let (<|>) (s:string) (c:char) = s |> split c

It really is that simple: adding your own operator is as easy as defining a function with the operator in parentheses as the function name. We could literally create anything; the issue is, should we?
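Here is the operator in use, as a small self-contained sketch with made-up input strings:

```fsharp
let split (c : char) (s : string) = s.Split(c)

/// Splits a string into an array on the specified char.
let (<|>) (s : string) (c : char) = s |> split c

let fields = "apple,banana,cherry" <|> ','
printfn "%A" fields  // [|"apple"; "banana"; "cherry"|]

// It is an ordinary function underneath, so the result pipes like anything else.
let count = ("one two three" <|> ' ') |> Array.length
printfn "%d" count   // 3
```

The operator form puts the string first and the separator second, which is the opposite order from split. That is what makes someString <|> someChar read naturally.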

Uh oh - here be dragons, errr...dinosaurs

Be careful when defining a DSL

DSLs can be powerful things - they can make complex things simple, and they can make simple things complex. You want to define a DSL that fits, and a DSL that is powerful. Don't define a DSL "just because you can" - I've done so, bad idea.

However, if you define a DSL well, you can turn something increasingly complex into something that feels much simpler. It feels like a built-in operator. It feels like it belongs. It just feels natural.

Since this post is coming out on a Friday (at least here in the Eastern Time Zone - could easily be Saturday for many of you) you probably don't have to work tomorrow (or today, if it's already Saturday), so if you get a chance, experiment with custom operators. Try to figure out how they work; this one little feature can truly make things easy on you. Shoot, you can even try to build the opposite operator: >|< if you like, it should fit in with the code below. (Though in my real DSL for this, the operator is actually ><, but I demonstrated it as >|< since that's more directly translated as the opposite of what we have.)

let names =
    rawNames <|> ' '
    |> Array.toList
    |> List.fold (fun (acc:string list) str ->
        match acc with
        | prev::tail when prev.Length <= 3 -> (prev >|< " " >|< str)::tail
        | _ -> str::acc) []
    |> List.rev
    |> Array.ofList
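If you get stuck on the experiment, here is one way it could go. This is only a sketch: it assumes the "opposite" operator simply glues two strings together (which is all the code above needs from it), and rawNames is a made-up sample value.

```fsharp
let split (c : char) (s : string) = s.Split(c)
let (<|>) (s : string) (c : char) = s |> split c

// Assumed definition of the "opposite" operator: plain concatenation.
let (>|<) (a : string) (b : string) = a + b

// Made-up sample input.
let rawNames = "Van Gogh Da Vinci Monet"

let names =
    (rawNames <|> ' ')
    |> Array.toList
    |> List.fold (fun (acc : string list) str ->
        match acc with
        | prev :: tail when prev.Length <= 3 -> (prev >|< " " >|< str) :: tail
        | _ -> str :: acc) []
    |> List.rev
    |> Array.ofList

printfn "%A" names  // [|"Van Gogh"; "Da Vinci"; "Monet"|]
```

Whether prev >|< " " >|< str is really clearer than sprintf "%s %s" prev str is exactly the kind of question to ask before adding an operator to your DSL.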

Bonus day: documentation

We've gone through writing some code, but the biggest thing people get caught up on is writing documentation. Now, on one hand, I'm a firm believer that good code documents itself. On the other hand, I know that I very infrequently write good code. So we always need a way to tell the "consumer" (often times ourselves, but if you're working on some API think of the people who will be using it) what to do with our method/function/variable/thing. This comes in many forms, one of which is the Visual Studio / F# documentation comments.

Have you ever wondered how people can put text into the intellisense-hover-dealy of the function/variable/parameter they provide? It's actually quite easy, and in F# it's trivial: use a triple-slash comment: ///.

/// Splits a string into an array on the specified char.
let (<|>) (s:string) (c:char) = s |> split c

Now when we hover any instance of our <|> operator, we should see that string show up:

Woah - its like magic

Damn that's cool. If you're building an F# project, you can also generate an XML file with all the triple-slash comment docs in it. Simply open the project properties, go to "Build" and check the "XML Documentation File" box. Visual Studio will then generate a .xml file in the same directory as your .exe/.dll that contains these comments in a form that Visual Studio can understand, and that can be used to generate web documentation. (This is part of how MSDN is built - the code comment documentation becomes the web documentation.)
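Triple-slash comments can also carry XML tags, which lets you describe each parameter and the return value separately. A minimal sketch, with the descriptive wording made up for the example:

```fsharp
let split (c : char) (s : string) = s.Split(c)

/// <summary>Splits a string into an array on the specified char.</summary>
/// <param name="s">The string to split.</param>
/// <param name="c">The separator character.</param>
/// <returns>The pieces of <c>s</c> found between each occurrence of <c>c</c>.</returns>
let (<|>) (s : string) (c : char) = s |> split c

let parts = "a b" <|> ' '
printfn "%A" parts  // [|"a"; "b"|]
```

For a one-line description the plain /// form is less noise; the tagged form starts to pay off once a function has several parameters that each need explaining.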


As an aside, I want to tell you all how grateful I am to have met some of the developer comrades I have - a lot of them have been extremely helpful to me regarding programming, software development in general, and even doing things right. I was actually thinking about each one of them as I wrote this, and I would call them out by name but I don't want to reveal any personal information, so for those of you that I'm referring to (and you all know who you are): thank you.

Getting started with programming and getting absolutely nowhere (Part 2)

Now that we're programmers, let's make something

Lesson 1: How To Become a Programmer (In a Few Not-So-Easy Steps)
Lesson 3: Since We've Made Something, Let's Make It Cooler

So two days ago I posted a very rudimentary introduction into "how to become a programmer", which was mostly a rant about what is and isn't important, and about how F# works. (Why F#? Because it lets us be lazy, and it does a whole hell-of-a-lot for you.)

Today we're going to introduce the business domain.

Almost everything is "business logic"

The first thing about the "business domain" is that it's a very real thing. In the business domain everything is what's called "business logic", that is: logic that makes the business work. Now we're not talking "business" as in "for-profit corporate entity", but "business" as in "a person's concern". (Think: "it's none of my business".)

Basically, business logic is any logic that is related to the task at hand. So consider a "cart" system: business logic might be "we need to be able to add an item to the cart, remove an item from the cart, and change quantities of items in the cart." Simple, right? Somewhat straightforward. You may also hear this referred to as "BL" or "domain logic", or the collective "domain" (which includes the "business domain" and the "business logic"). I'm going to call it "business logic" because that's what is most applicable to my situation, but feel free to use whatever terms you feel appropriate.

So we saw an example of business logic, what do we do with business logic? Usually we write our software around the business logic, so we may write a function addItemToCart or removeItemFromCart that adjusts the items in our cart as appropriate. These are considered "functions that perform business logic".
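To give those function names something to hang on to, here is a rough sketch of what such a cart could look like. Everything about it is an assumption for illustration: the Cart type (a map from item name to quantity), the changeQuantity name, and the rule that a quantity of zero or less removes the item.

```fsharp
// Hypothetical cart: item name -> quantity.
type Cart = Map<string, int>

let addItemToCart (item : string) (cart : Cart) : Cart =
    let current = cart |> Map.tryFind item |> Option.defaultValue 0
    cart |> Map.add item (current + 1)

let removeItemFromCart (item : string) (cart : Cart) : Cart =
    cart |> Map.remove item

let changeQuantity (item : string) (quantity : int) (cart : Cart) : Cart =
    if quantity <= 0 then cart |> Map.remove item
    else cart |> Map.add item quantity

let cart =
    Map.empty
    |> addItemToCart "widget"
    |> addItemToCart "widget"
    |> addItemToCart "gadget"
    |> changeQuantity "gadget" 5
    |> removeItemFromCart "widget"

printfn "%A" cart  // map [("gadget", 5)]
```

The three sentences of business logic map onto three functions. How the cart is stored (a Map here) is the implementation side of things, and could change without the business logic changing at all.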

Simple, right? This is all largely basic, but it's important to know because we need to understand what is and isn't business logic. For example, when we discuss the implementation itself, we're no longer talking about business logic, but "application logic". The business logic is the broader picture: what is the purpose of our application? It's the "problem" the application is "solving". This can vary all over the place, but the basic idea is it's the higher-level overview of the application.

All that long-worded rambling aside, let's define a piece of business logic that we want to build an application to solve:

Given a string, split it into a group of "words" on spaces, where each "word" has more than three characters.

This is a very real problem I had to solve for my job recently, I won't say why, but it was necessary. So, we're going to focus specifically on this problem in this post.

The first step is to define the sub-steps

Every problem is, at its core, a series of smaller problems. You can always break it down, though sometimes it's unnecessary. For our problem let's break it down into subparts:

  1. Split a string on spaces
  2. Analyze the first word, if it has 3 or fewer characters, group it with the next word
  3. Repeat for the next word

So that's more manageable, and we can solve that in "sub-steps". We can solve each step independently. (Granted, we only have 3 steps, but we'll break step 2 down further when we get to it.) Each step should be testable: so given some input I should have a defined output.

Splitting a string on spaces

So the first step is actually the easiest: split the string on spaces. In F# this is easily done via String.Split, which is available in the entire .NET framework:

let parts = "Some String With A Short Word".Split(' ')

So that's easy, and we can create a function that handles that pretty easily, we'll define a split function that takes a string, and a char (separator) and splits the string on the "char" (separator):

let split (c:char) (s:string) =
    s.Split(c)

That's pretty basic, but why did we define a new function when we could just call s.Split(c) directly, with our own s and c? This is so that we can split strings "idiomatically", that is, matching the general style of F# code. F# isn't about calling methods, it's about composing functions, and you cannot compose String.Split(char) easily like that, so we define a function that lets us do so.

So now we could test our function, which involves simply calling it:

let parts = "Some String With A Short Word" |> split ' '

Well that was easy. This shows that we can pipe a string into the split function, and it does what we expect.
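The payoff of wrapping the method in a function shows up as soon as you start building other functions out of it. A small sketch (splitOnSpace and wordCount are made-up names):

```fsharp
let split (c : char) (s : string) = s.Split(c)

// Partial application: fix the separator once, reuse the result.
let splitOnSpace = split ' '

// Composition: split, then count the pieces, as one new function.
let wordCount = splitOnSpace >> Array.length

let pieces = splitOnSpace "Some String With A Short Word"
printfn "%A" pieces
// [|"Some"; "String"; "With"; "A"; "Short"; "Word"|]
printfn "%d" (wordCount "Some String With A Short Word")  // 6
```

Putting the separator first and the string last is deliberate: the argument you pipe in (or compose over) has to be the last one.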

Moving to Step 2: this is going to hurt

So F# makes step 2 pretty easy, and if you have any programming experience with a non-functional language, I want you to forget it right now. What you think you know is not true in F#, and we need to redefine the problem.

We can break step 2 down a little further:

  1. Get a word
  2. Check how many characters the word has
  3. If it has 3 or fewer, it belongs with the next word

So let's start building a groupWords function, we're going to do this the "Elliott Brown" style, which is probably different from what you've usually seen, but this is where functional languages make things pretty awesome.

Instead of looking at the current word, we're going to look at the previous word, since it makes things easier. We're going to use a basic pattern match with a guard clause, List.fold, Array.toList, and Array.ofList.

The easiest way to do this involves converting the string array to a string list, which is done with the Array.toList:

let stringList = parts |> Array.toList

If you're following along in a REPL (the F# interactive window), you should notice the biggest difference between the printed versions of each is that Array has vertical-pipes on the inside of the brackets: [|element1; element2; ...|], and List does not: [element1; element2; ...].

So now we're going to fold over that list: a fold takes an accumulator, and a value. It iterates over each value in the List and applies a function to it, which usually combines the value with the accumulator somehow. Our fold is pretty basic:

let groupStrings (acc:string list) str =
    match acc with
    | prev::tail when prev.Length <= 3 -> (sprintf "%s %s" prev str)::tail
    | _ -> str::acc
let groupedStrings = stringList |> List.fold groupStrings []

We could modify this to take a length instead of 3, but I'll leave that up to you.
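If you want to compare notes afterwards, one possible shape for that is below. The names groupStringsBy, group and maxShort are made up for this sketch.

```fsharp
// `maxShort`: words this long or shorter get grouped with the next word.
let groupStringsBy (maxShort : int) (acc : string list) (str : string) =
    match acc with
    | prev :: tail when prev.Length <= maxShort -> (sprintf "%s %s" prev str) :: tail
    | _ -> str :: acc

// Fold with the chosen length, then put the words back in reading order.
let group maxShort words =
    words |> List.fold (groupStringsBy maxShort) [] |> List.rev

let words = ["Some"; "String"; "With"; "A"; "Short"; "Word"]
printfn "%A" (group 3 words)  // ["Some"; "String"; "With"; "A Short"; "Word"]
printfn "%A" (group 4 words)  // ["Some String"; "With A"; "Short"; "Word"]
```

Because the length is the first parameter, groupStringsBy 3 is exactly the original groupStrings, so nothing else in the pipeline has to change.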

Some neat syntax things:

  1. The match is a "pattern match", similar to a C-style language switch, but on steroids.
  2. The prev::tail is a pattern expression for a list: it means "match the first element in the list to prev, and the remaining elements to tail".
  3. The when ... is a "guard clause": the pattern is matched first, and then the whole case is taken if and only if the guard clause returns true.
  4. The -> means "return the right side as the result of the match expression." (Basically)
  5. The (sprintf "%s %s" prev str) just combines the two strings with a space between them.
  6. The ::tail now creates a new list with the sprintf block as the first element, and the remaining elements as the, well, remaining elements.
  7. The _ is a "match anything" pattern-expression.
  8. The [] is an empty list (the type is inferred to be whatever that "folder" expects - a string list in this case).
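If the fold still feels like magic, it can help to watch the accumulator change one word at a time. List.scan works like List.fold but keeps every intermediate accumulator, so this sketch uses it on a shortened, made-up word list:

```fsharp
let groupStrings (acc : string list) str =
    match acc with
    | prev :: tail when prev.Length <= 3 -> (sprintf "%s %s" prev str) :: tail
    | _ -> str :: acc

// Every accumulator the fold passes through, starting with the empty list.
let steps = ["With"; "A"; "Short"; "Word"] |> List.scan groupStrings []

steps |> List.iter (printfn "%A")
// []
// ["With"]
// ["A"; "With"]
// ["A Short"; "With"]
// ["Word"; "A Short"; "With"]
```

The newest word is always at the front, which is why the result comes out backwards and needs reversing at the end.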

Now we just need to reverse the list, which in F# is easily achieved by List.rev:

let finalResults = groupedStrings |> List.rev

And lastly, we'll convert them back to an array with Array.ofList:

let result = finalResults |> Array.ofList

That's it, make it reusable

So now we've successfully built the functions to do this work, let's build a function that is reusable.

let splitAndGroupString =
    split ' '
    >> Array.toList
    >> List.fold (fun (acc:string list) str ->
        match acc with
        | prev::tail when prev.Length <= 3 -> (sprintf "%s %s" prev str)::tail
        | _ -> str::acc) []
    >> List.rev
    >> Array.ofList

And we can call it as:

let splitStr = "Some String With A Short Word" |> splitAndGroupString

And that's it!

We went through some basic business logic today, we'll be going through more complex stuff in the coming lessons, but it's good to take it slow (at first), and we'll eventually build up to some pretty complex operations.

I told you I would get more organized, and I did. As we continue I'll figure out how I plan to present things effectively, but for now I'm just going to stick with my gut. I hope you enjoyed it and learned something, but if not it's still helping me learn more and become a better software engineer, so you should probably count on more crap coming from me here soon.