using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

F# Advent 2019: Dawn of the F# Domain Types

If you use .NET, build your objects in F#

So towards the end of October, I tweeted (as one does) about building domain objects and models in F#. I mentioned a few nice things about it, so today I intend to do a deeper dive into some of those nice things, and talk about why they are concerns you should keep in the back of your head when you make a decision about how to build your domain models.

Now it's no secret that I like F#. I talk about it a lot (among many other things), largely because when I am doing .NET work, there is almost always some F# involved. (It might be a small portion, or it might be a large portion, depends on the project.) As a result, if you've ever spoke with me in person about programming, I've probably talked about it.

Today, I'm going to show you why F# is a very cool tool in the .NET ecosystem, and how F# fixes a lot of class-oriented things that, in my opinion, C# does wrong.


F# does a lot of the heavy-lifting for object-oriented work

One of the most impressive things I feel F# has to offer is the level of generation it offers for object-oriented work. How it can give you a lot of things you usually spend a good chunk of time on for free. Three lines of F# can do something that would take, genuinely, dozens (or even hundreds) of lines of C#.

If you've ever done OOP work, chances are you've had to write a .ToString(), or a .Equals(object), or a .GetHashCode(). Spoiler alert: with F# you don't need to write any of those most of the time. (You can, but it's ill-advised.)

Let's take a common game-development scenario in C#: vectors. In physics, a vector is a distance and direction. Start at the origin (or point (0,0)), then adjust position from there based on the direction given, and the distance, move to the new location.

Commonly, we represent them in one of two ways:

  • r, Θ
  • x, y

You are likely familiar with the (x, y) format: this is standard point representation for the cartesian coordinate system. The (r, Θ) format is common in polar coordinate systems.

Long-winded math aside, the two values can be translated between. We'll build a sizable Vector2 feature-set into our program, and see how the F# and C# of the implementation vary.

A standard C# starting point might be something like the following:

public class Vector2 
{
    public int X { get; set; }
    public int Y { get; set; }
}

In F#, we would do something like the following:

type Vector2 = {
    X : int
    Y : int
}

Alright, so far F# hasn't really lived up to the promise. It's the same length as the C#, and we don't really see any advantages.

Fair. We haven't done anything to really take advantage of F# yet, but one of the things we already have is an implementation of .ToString(), .Equals(object), and .GetHashCode(). It also implements IEquatable<Vector2>, IStructuralEquatable, IComparable<Vector2>, IStructuralComparable, and IComparable.

I could go into a large digression on the intermediate language generated and how it works, why it's important, etc., instead, I'm going to just use a quick screenshot of the generated F# class (left) to the generated C# class (right).

F# and C# comparison

Wow are those different. F# implemented 9 additional methods and even setup a constructor for us. Not so bad. The C# version, on the other hand, is just the two properties. So really, our 4 lines of F# is equivalent to a lot, lot more in C#. In fact, the F# version is even immutable. Sure, we could omit the set in the C# to achieve the same, but then we have to define our own constructor. My point with this first example is to demonstrate how effective F# is at generation. We'll dig deeper into what it means in a few moments.

Now, I could stop here, and I suppose if I were publishing this right now I might, but today I'm going to go a little further.


Why is all this F# generation important?

This is probably the more important part to discuss: why is the generation F# provides so important? Why do I push it as such a major reason to build objects in F#?

It all boils down to a couple basic points:

  1. Structural equality and comparison is important for doing effective matching and evaluation of objects against one-another.
  2. These few functions are extremely important if the object itself is to be used in a HashMap scenario. (I.e. a dictionary key, etc.)

Basically, it boils down to performance and developer ease. A lot of times we want to be able to do something like if a = b then ..., but we find out that there is no = operator between a and b, even when a and b are the same type. So you go ahead and define your own operator, just to need to define a .Equals(object) method and then .GetHashCode(). You will also want both of these implemented if you are using LINQ, as some of the LINQ methods / query syntax require them.

Unfortunately, in C# we don't get those features. Instead we only have a base implementation (the default) for both Equals(object) and GetHashCode(). The base implementations don't do structural equality at all, so two objects that are exactly the same might not match via .Equals(object). (Spoiler: unless they are the exact same instance, they won't match via the default.)

This also comes in to heavy play with struct types, as it's expected that two structures (which are expected to be value types) that are instanced via the same values would be the same. But alas, by default, they are not such in .NET.

So, let's look at some situations we would have to write a lot of manual code for in C#, that F# just gives us.

Scenario 1: you want to determine if an element is in an array.

Let's assume you have an array of elements, and you want to determine if you have a certain element in the array. With the new LINQ syntax in C#, it should be rather simple, no?

var find = new Vector2() { X = 1, Y = 2 };
var results = elements.Where(x => x.Equals(find));

Seems reasonable, should work fine. Except it wont. The problem here is that we don't override .Equals(object), and we don't have a .Equals(Vector2), so our default C# version won't work. Our F# version will work fine though, as we have both of those options in the F# version. One note: for the F# version we must initialize find via the constructor, because it's immutable (which lends well to a later point).

To get the C# version to work, we really just need to pick one of the two to implement, though for fairness we'll actually implement both.

public override bool Equals(object other) => Equals(other as Vector2);
public bool Equals(Vector2 other) => other != null && X == other.X && Y == other.Y;    

So we added two lines (at a minimum) to our C# to make sure this works, but we end up with working code, so we are done here.

First note: your IDE might state that you didn't override GetHashCode(), and it's entirely correct! It's always a best practice to override both if you are doing one of them. Second note: you might find that the two lines printed by the C# version are different than the F# version, and they are. The F# version has a custom .ToString() implementation, which means when we do a print line for it, we get a nicer string.

Scenario 2: you have two collections, and you want to create the collection of the elements present in both.

This scenario is your basic Intersect scenario: you have some elements1 in one collection, and some more elements2 in another collection. You want to create a single collection that encompasses only those elements that are present in both collection.

You might write some code like the following:

var results = elementsA.Intersect(elementsB).ToArray();
Console.WriteLine(results.Length);

In the C# version, you'll find that no elements match no matter how you call the Intersect function, and that the F# version gives you some.

Again, we're missing a key piece, I alluded to it earlier: GetHashCode(). We need it for Intersect to work properly.

So, we'll create a poor implementation:

public override int GetHashCode() => X ^ Y;

Now the C# version works fine.

Note here: the F# GetHashCode() is a good implementation, ours was a poor attempt to make sure that we got something in place for .NET to use. In .NET, it's OK for two different elements to return the same code, because Equals(object) is the final decision maker, but two identical elements may NOT return different hash codes.


At this point, I wanted to go into some performance implications of the C# classes vs. the F# generated objects, but because this post is already long, and also because it's Christmas, I'm going to save that discussion for a later date. In fact, I'm going to target mid-January for that discussion. That said, I recommend doing some tests of the C# vs. F# objects, and especially in usage as dictionary keys. You might find some interesting results.

Additionally, this is all up on GitHub, so feel free to play with the exact code I was using to test our setups.

You're Logging Wrong: Stop It

You're taking Dependencies Wrong

Specifically, you're logging wrong.

Here's the deal, we all at some point think "Oh, I need to log some information, better write an abstraction!"

Because we programmers are too afraid (or arrogant) to take dependencies on someone else's "stuff", we always write our own abstractions. They're usually subtly different, but the 99% similarities are:

  1. An ILogger interface of some sort;
  2. A Log(Severity, message) in the ILogger;
  3. Two or three generic logger implementations;

This is basically the bulk of what most implementations call for. Always something like this.

This might look something like the following, in C#:

public enum LoggerLevel : byte
{
    Error = 0,
    Warning = 100,
    Information = 200,
    Verbose = 255
}

public interface ILogger
{
    void Log(LoggerLevel level, string msg);
    void Log<T>(LoggerLevel level, T obj);
}

Right? This probably looks familiar. This is a pretty common pattern, so common I call it the "logger pattern."

The purpose is to make it easy to say "Ok, here's an ILogger, do your thing." We have a logger sample for something like an in-memory buffer:

public class BaseLogger : ILogger
{
    private List<string> messages = new List<string>(10000);

    public void Log(LoggerLevel level, string msg)
    {
        messages.Add(msg);
    }

    public void Log<T>(LoggerLevel level, T obj)
    {
        messages.Add(obj.ToString());
    }
}

And we say "great, things are done."

Here's the deal: this is wrong.


What's wrong with ILogger?

Well, just about everything.

  1. It requires an implementation for every target logger, this means a lot of boiler-plate code, etc.;
  2. We always mess it up with trying to support "common" targets (AutoFac, for example);

A great person named David Fowler tweeted about this a few days ago:

This is what the .NET library ecosystem looks like today. Say you develop an interesting library/framework that calls into user code and also does some interesting logging (lets call this library A). A customer comes along and wants to use Autofac and log4net.

They file a feature request on your repository and you do what a good software engineer would do, you make an abstraction, A.ILogger and A.ICanActivateYourCode and you make 2 new libraries. A.Log4Net and A.Autofac. Rinse and repeat this process with libraries A to Z

What you end up with are different abstractions that all look the same but nobody wants to take a dependency because that's a big deal. What if it's the wrong one, what if that package goes away? The benefit needs to be huge in order for your core library to pull more deps.

So a customer that wants to use A, B and C with log4net and autofac install. A, B, C and A.log4Net, A.Autofact, B.log4Net, b.Autofac, C.log4net, C.Autofac.

I call this glue library hell. There are insufficient shared abstractions that exist, therefore everyone makes their own.

Not to mention you have to hope that enough of the API for the underlying libraries are appropriately exposed so that you have a consistent way to configure them.

David's point was that we have a million of the same implementations, but they're ever-so-slightly different.


So what's the 'right' way?

Ah yes, what is the 'right' way?

Well, due to the nature of logging, we are always adding a message / thing to the log, we're never actually doing anything else. As a result, the "right" way to take a logging dependency is not ILogger, but is instead an Action<string>. That is: a function that you pass a message to.

Wait, what do you mean?

Back to our original example, let's say we have a function DoWork, and DoWork takes an ILogger:

private static void DoWork(ILogger logger)
{
    for (var i = 0; i < COUNT; i++)
        logger.Log(LoggerLevel.Information, "Test");
}

This seems straightforward. "Yeah, you log the whatever with the Log method on the ILogger." Sure, makes sense.

Calling this is straightforward:

ILogger logger = new BaseLogger();
DoWork(logger);

And viola: we have logging.

But, we want to log with a function instead, so we replace DoWork:

private static void DoWork(Action<LoggerLevel, string> logger)
{
    for (var i = 0; i < COUNT; i++)
        logger(LoggerLevel.Information, "Test");
}

Now, instead of taking the ILogger directly, we'll take the Log function:

Action<LoggerLevel, string> logger = new BaseLogger().Log;
DoWork(logger);

Curiously, this makes life much easier on the consumer, as now they don't need a ILogger implementation (and that whole abstraction is just gone), instead, they pass a function.

This means that the consumer can say "here's a function that logs to AutoFac", or "here's a function that logs to log4net", etc.

Even moreso: our implementation is now compatible with any other utility using a logger (should they all use function logging). The same function can be slightly reworked for each utility, assuming they don't share a LogLevel. You just wrap it with a very quick lambda and life is good.


But wait, there's more.

There's one more thing here we should evaluate, which is the obvious: what about performance?

Ah yes, the age-old "is it fast, though?" question.

Everyone wants to make sure their code is fast, they want to make sure they don't have any performance loss. Why? I don't know, it's always something superficial.

So, is it fast? This is a good question, I guess.

Of course, me being me, I benchmarked various types of logging and threw it on GitHub, but the overall consensus was: it is as fast, or faster.

(Sidebar: the nop in the F# version is because of this bug. Looks like Don Syme has thoughts on how we can fix it.)

The Full Results

So I just ran my benchmark alet it do it's thing, I've put the full result below:

// * Summary *

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
  [Host]     : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT


|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  33.42 us |  0.4143 us |  0.3673 us |  33.36 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|             LogDirectFs |  35.46 us |  0.6995 us |  1.4755 us |  35.60 us |  0.68 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  53.94 us |  0.5640 us |  0.4710 us |  54.11 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogViaInjectionFs |  56.81 us |  0.9890 us |  0.8259 us |  56.72 us |  1.05 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  47.99 us |  0.5204 us |  0.4868 us |  47.91 us |  0.89 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|        LogViaCallbackFs |  50.81 us |  0.3474 us |  0.3250 us |  50.81 us |  0.94 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|            LogObjDirect | 666.71 us | 12.3811 us | 12.7145 us | 664.47 us | 12.38 |    0.32 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|          LogObjDirectFs | 668.30 us |  8.1078 us |  6.7704 us | 664.00 us | 12.39 |    0.15 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|      LogObjViaInjection | 709.30 us | 12.1255 us | 11.3422 us | 706.69 us | 13.14 |    0.26 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|    LogObjViaInjectionFs | 705.13 us |  6.9833 us |  6.5322 us | 705.46 us | 13.07 |    0.18 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|       LogObjViaCallback | 671.84 us |  6.9447 us |  5.4220 us | 673.47 us | 12.44 |    0.11 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|     LogObjViaCallbackFs | 669.24 us |  4.6712 us |  3.9007 us | 670.10 us | 12.41 |    0.09 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|         LogInlineDirect |  33.32 us |  0.3477 us |  0.3253 us |  33.24 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogInlineDirectFs |  33.43 us |  0.4313 us |  0.4035 us |  33.34 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  33.68 us |  0.6655 us |  0.7121 us |  33.39 us |  0.63 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
| LogInlineViaInjectionFs |  35.91 us |  0.8274 us |  2.4267 us |  35.15 us |  0.64 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  49.10 us |  0.6939 us |  0.6491 us |  49.02 us |  0.91 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineViaCallbackFs |  51.15 us |  0.5555 us |  0.5196 us |  51.01 us |  0.95 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineDynamicDirect |  76.38 us |  1.4895 us |  1.9884 us |  75.43 us |  1.43 |    0.05 |     18.7988 |      3.0518 |           - |            78.21 KB |

That's a lot of benchmarking. Let's crop that down:

|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  33.42 us |  0.4143 us |  0.3673 us |  33.36 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  53.94 us |  0.5640 us |  0.4710 us |  54.11 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  47.99 us |  0.5204 us |  0.4868 us |  47.91 us |  0.89 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |

Those are the three lines I want to look at.

First and foremost: the fastest way is to pass the BaseLogger class in directly (i.e.: no ILogger interface usage). That was pretty obviousl.

But, the next, interesting note is that the interface version is actually about 6us slower than passing the function within the interface. This is curious, not only is passing the function a more proper way to do it, but it's also ever-so-slightly faster. (Granted, the margins are tiny and statistically meaningless, but it does prove that there is not discernable performance disadvantage, so that argument is now completely irrelevant.)

The only "bad" difference is that there was an extra 0.06KB allocated, but I assume that's overhead for Action<>. It's also such a tiny amount that if you're using that (or the time, to be completely frank) for justification, you are not making the right decisions.

The only time the interface is a better option is if you are not passing it, but are doing things inline:

|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|         LogInlineDirect |  33.32 us |  0.3477 us |  0.3253 us |  33.24 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  33.68 us |  0.6655 us |  0.7121 us |  33.39 us |  0.63 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  49.10 us |  0.6939 us |  0.6491 us |  49.02 us |  0.91 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |

Here, you'll note that the overhead of Action<> actually puts the callback version at a significant disadvantage, and a discernable disadvantage in this case.


In Summation

To summarize our discussion:

  1. Don't make an ILogger, it's unbecoming.
  2. Use an Action or Function, it's much more dynamic and reusable.
  3. The only time this is not the case is when the logger is constructed inline with the thing being logged. Then use a regular function or class.

.NET Gotcha: System.Net.WebException: The remote server returned an error

F# / .NET: Gotcha's

The dreaded System.Net.WebException: The remote server returned an error mess...

If you've ever used the WebClient to target RESTful API's in .NET, you probably hit this error at least once:

System.Net.WebException: The remote server returned an error: {Some Error}.
   at System.Net.WebClient.DownloadDataInternal(Uri address, WebRequest& request)
   at System.Net.WebClient.DownloadString(Uri address)
   at System.Net.WebClient.DownloadString(String address)

Scenario: you're just chugging along, working on some stuff, then an exception comes in and BAM, messes up your whole day.

I had this happen to a coworker yesterday. He was working on something and kept wondering why the endpoint was giving him a 404: Not Found in a WebException. He'd never really hit this before, because this was the first time he needed to consume a RESTful API in .NET.

What's happening here?

When the .NET WebClient sends a request, it's backed by a WebRequest, which happens to have someone's "good" idea: when we hit a non-200 status code, throw an exception. That makes sense, right? If we hit a 404, it makes sense to throw an exception, right?

Sure, in 1990. It's 2019 though, we use the 404 to mean something. Some genius out there said "hey, when building RESTful API's, just send an HTTP status code with your 'thing', we already have a list of them." The problem is that up, down, left and right people were treating 300, 400, and 500 status codes as actual errors, and things would break. Well, .NET is no exception.

So how do we fix it?

Yes, this is the easy part. We have to use a try / catch or try / with, depending on the language (OH NO!).

Effectively, the following code will throw a System.Net.WebException: The remote server returned an error: (404) Not Found. on line 7, when called on line 11. We'll fix that.

open System.Net
let urlBase = "https://httpstat.us/"

let getCode code =
    use wc = new WebClient()
    wc.Headers.Add(HttpRequestHeader.Accept, "application/json")
    code |> sprintf "%s%i" urlBase |> wc.DownloadString

200 |> getCode |> printfn "%s"
202 |> getCode |> printfn "%s"
404 |> getCode |> printfn "%s"

So, what do we do?

Easy, catch the WebException and read the response:

open System.IO
open System.Net
let urlBase = "https://httpstat.us/"

let getCode code =
    try
        use wc = new WebClient()
        wc.Headers.Add(HttpRequestHeader.Accept, "application/json")
        code |> sprintf "%s%i" urlBase |> wc.DownloadString
    with :? WebException as ex ->
        use sr = new StreamReader(ex.Response.GetResponseStream())
        sr.ReadToEnd()

200 |> getCode |> printfn "%s"
202 |> getCode |> printfn "%s"
404 |> getCode |> printfn "%s"

This isn't perfect, we'll catch all WebException issues, so we want one more "guard" clause:

open System.IO
open System.Net
let urlBase = "https://httpstat.us/"

let getCode code =
    try
        use wc = new WebClient()
        wc.Headers.Add(HttpRequestHeader.Accept, "application/json")
        code |> sprintf "%s%i" urlBase |> wc.DownloadString
    with :? WebException as ex when ex.Status = WebExceptionStatus.ProtocolError ->
        use sr = new StreamReader(ex.Response.GetResponseStream())
        sr.ReadToEnd()

200 |> getCode |> printfn "%s"
202 |> getCode |> printfn "%s"
404 |> getCode |> printfn "%s"

Bam. Now we only catch ProtocolError exceptions, like 400, 500, etc.

.NET Gotcha: TimeSpan: Invalid Format String

F# / .NET: Gotcha's

This is another annoying "feature" in .NET. For some reasons, TimeSpan.ToString with a custom format string does all sorts of wonky stuff, particularly it will give you the "Invalid Format String" error on any unescaped character that isn't a valid format specifier.

Basically, take this list: Custom TimeSpan Format Strings and treat it literally.

Want to add a space between days and hours/minutes/seconds? Escape it: d\ hh\:mm\:ss. Yes, even spaces.

.NET Gotcha: A Generic Error Occurred in GDI+

F# / .NET: Gotcha's

This is one of the biggest, most painful errors I've ever seen in my life. There's absolutely no direction this error can take you.

Here's the thing: GDI+ doesn't really report any errors it encounters down the stack properly. They're all "Generic Errors".

So, some things to look for:

  • Is the resource disposed? If you have a Using / using / use, and you try to access the resource outside of that scope, you'll probably get this error.
  • Calling to .Save? There's a whole list of stuff for that. If you get this error when calling .Save on an Image / Bitmap / etc., you might want to check any and all of the following:
    • Does the directory exist? GDI+ won't create it.
    • Do you have permission to the directory / file? GDI+ won't tell you if that's the problem.
    • Did you dispose of the MemoryStream? That'll be a generic error. You'll want to make sure you save before disposal.

There are many, many more, and I'll be updating this list as time permits.