You're taking Dependencies Wrong

Specifically, you're logging wrong.

Here's the deal, we all at some point think "Oh, I need to log some information, better write an abstraction!"

Because we programmers are too afraid (or arrogant) to take dependencies on someone else's "stuff", we always write our own abstractions. They're usually subtly different, but the 99% similarities are:

1. An ILogger interface of some sort;
2. A Log(Severity, message) in the ILogger;
3. Two or three generic logger implementations;

This is basically the bulk of what most implementations call for. Always something like this.

This might look something like the following, in C#:

public enum LoggerLevel : byte
{
Error = 0,
Warning = 100,
Information = 200,
Verbose = 255
}

public interface ILogger
{
void Log(LoggerLevel level, string msg);
void Log<T>(LoggerLevel level, T obj);
}


Right? This probably looks familiar. This is a pretty common pattern, so common I call it the "logger pattern."

The purpose is to make it easy to say "Ok, here's an ILogger, do your thing." We have a logger sample for something like an in-memory buffer:

public class BaseLogger : ILogger
{
private List<string> messages = new List<string>(10000);

public void Log(LoggerLevel level, string msg)
{
}

public void Log<T>(LoggerLevel level, T obj)
{
}
}


And we say "great, things are done."

Here's the deal: this is wrong.

What's wrong with ILogger?

1. It requires an implementation for every target logger, this means a lot of boiler-plate code, etc.;
2. We always mess it up with trying to support "common" targets (AutoFac, for example);

This is what the .NET library ecosystem looks like today. Say you develop an interesting library/framework that calls into user code and also does some interesting logging (lets call this library A). A customer comes along and wants to use Autofac and log4net.

They file a feature request on your repository and you do what a good software engineer would do, you make an abstraction, A.ILogger and A.ICanActivateYourCode and you make 2 new libraries. A.Log4Net and A.Autofac. Rinse and repeat this process with libraries A to Z

What you end up with are different abstractions that all look the same but nobody wants to take a dependency because that's a big deal. What if it's the wrong one, what if that package goes away? The benefit needs to be huge in order for your core library to pull more deps.

So a customer that wants to use A, B and C with log4net and autofac install. A, B, C and A.log4Net, A.Autofact, B.log4Net, b.Autofac, C.log4net, C.Autofac.

I call this glue library hell. There are insufficient shared abstractions that exist, therefore everyone makes their own.

Not to mention you have to hope that enough of the API for the underlying libraries are appropriately exposed so that you have a consistent way to configure them.

David's point was that we have a million of the same implementations, but they're ever-so-slightly different.

So what's the 'right' way?

Ah yes, what is the 'right' way?

Well, due to the nature of logging, we are always adding a message / thing to the log, we're never actually doing anything else. As a result, the "right" way to take a logging dependency is not ILogger, but is instead an Action<string>. That is: a function that you pass a message to.

Wait, what do you mean?

Back to our original example, let's say we have a function DoWork, and DoWork takes an ILogger:

private static void DoWork(ILogger logger)
{
for (var i = 0; i < COUNT; i++)
logger.Log(LoggerLevel.Information, "Test");
}


This seems straightforward. "Yeah, you log the whatever with the Log method on the ILogger." Sure, makes sense.

Calling this is straightforward:

ILogger logger = new BaseLogger();
DoWork(logger);


And viola: we have logging.

But, we want to log with a function instead, so we replace DoWork:

private static void DoWork(Action<LoggerLevel, string> logger)
{
for (var i = 0; i < COUNT; i++)
logger(LoggerLevel.Information, "Test");
}


Now, instead of taking the ILogger directly, we'll take the Log function:

Action<LoggerLevel, string> logger = new BaseLogger().Log;
DoWork(logger);


Curiously, this makes life much easier on the consumer, as now they don't need a ILogger implementation (and that whole abstraction is just gone), instead, they pass a function.

This means that the consumer can say "here's a function that logs to AutoFac", or "here's a function that logs to log4net", etc.

Even moreso: our implementation is now compatible with any other utility using a logger (should they all use function logging). The same function can be slightly reworked for each utility, assuming they don't share a LogLevel. You just wrap it with a very quick lambda and life is good.

But wait, there's more.

There's one more thing here we should evaluate, which is the obvious: what about performance?

Ah yes, the age-old "is it fast, though?" question.

Everyone wants to make sure their code is fast, they want to make sure they don't have any performance loss. Why? I don't know, it's always something superficial.

So, is it fast? This is a good question, I guess.

Of course, me being me, I benchmarked various types of logging and threw it on GitHub, but the overall consensus was: it is as fast, or faster.

(Sidebar: the nop in the F# version is because of this bug. Looks like Don Syme has thoughts on how we can fix it.)

The Full Results

So I just ran my benchmark alet it do it's thing, I've put the full result below:

// * Summary *

BenchmarkDotNet=v0.11.4, OS=Windows 10.0.17134.407 (1803/April2018Update/Redstone4)
Intel Xeon CPU E3-1505M v6 3.00GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview-010184
[Host]     : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT
DefaultJob : .NET Core 2.1.7 (CoreCLR 4.6.27129.04, CoreFX 4.6.27129.04), 64bit RyuJIT

|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  33.42 us |  0.4143 us |  0.3673 us |  33.36 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|             LogDirectFs |  35.46 us |  0.6995 us |  1.4755 us |  35.60 us |  0.68 |    0.04 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  53.94 us |  0.5640 us |  0.4710 us |  54.11 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogViaInjectionFs |  56.81 us |  0.9890 us |  0.8259 us |  56.72 us |  1.05 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  47.99 us |  0.5204 us |  0.4868 us |  47.91 us |  0.89 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|        LogViaCallbackFs |  50.81 us |  0.3474 us |  0.3250 us |  50.81 us |  0.94 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|            LogObjDirect | 666.71 us | 12.3811 us | 12.7145 us | 664.47 us | 12.38 |    0.32 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|          LogObjDirectFs | 668.30 us |  8.1078 us |  6.7704 us | 664.00 us | 12.39 |    0.15 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|      LogObjViaInjection | 709.30 us | 12.1255 us | 11.3422 us | 706.69 us | 13.14 |    0.26 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|    LogObjViaInjectionFs | 705.13 us |  6.9833 us |  6.5322 us | 705.46 us | 13.07 |    0.18 |    185.5469 |     92.7734 |           - |          1093.84 KB |
|       LogObjViaCallback | 671.84 us |  6.9447 us |  5.4220 us | 673.47 us | 12.44 |    0.11 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|     LogObjViaCallbackFs | 669.24 us |  4.6712 us |  3.9007 us | 670.10 us | 12.41 |    0.09 |    185.5469 |     92.7734 |           - |           1093.9 KB |
|         LogInlineDirect |  33.32 us |  0.3477 us |  0.3253 us |  33.24 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|       LogInlineDirectFs |  33.43 us |  0.4313 us |  0.4035 us |  33.34 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  33.68 us |  0.6655 us |  0.7121 us |  33.39 us |  0.63 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
| LogInlineViaInjectionFs |  35.91 us |  0.8274 us |  2.4267 us |  35.15 us |  0.64 |    0.03 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  49.10 us |  0.6939 us |  0.6491 us |  49.02 us |  0.91 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineViaCallbackFs |  51.15 us |  0.5555 us |  0.5196 us |  51.01 us |  0.95 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |
|  LogInlineDynamicDirect |  76.38 us |  1.4895 us |  1.9884 us |  75.43 us |  1.43 |    0.05 |     18.7988 |      3.0518 |           - |            78.21 KB |


That's a lot of benchmarking. Let's crop that down:

|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|               LogDirect |  33.42 us |  0.4143 us |  0.3673 us |  33.36 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|         LogViaInjection |  53.94 us |  0.5640 us |  0.4710 us |  54.11 us |  1.00 |    0.00 |     18.8599 |      3.7231 |           - |            78.21 KB |
|          LogViaCallback |  47.99 us |  0.5204 us |  0.4868 us |  47.91 us |  0.89 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |


Those are the three lines I want to look at.

First and foremost: the fastest way is to pass the BaseLogger class in directly (i.e.: no ILogger interface usage). That was pretty obviousl.

But, the next, interesting note is that the interface version is actually about 6us slower than passing the function within the interface. This is curious, not only is passing the function a more proper way to do it, but it's also ever-so-slightly faster. (Granted, the margins are tiny and statistically meaningless, but it does prove that there is not discernable performance disadvantage, so that argument is now completely irrelevant.)

The only "bad" difference is that there was an extra 0.06KB allocated, but I assume that's overhead for Action<>. It's also such a tiny amount that if you're using that (or the time, to be completely frank) for justification, you are not making the right decisions.

The only time the interface is a better option is if you are not passing it, but are doing things inline:

|                  Method |      Mean |      Error |     StdDev |    Median | Ratio | RatioSD | Gen 0/1k Op | Gen 1/1k Op | Gen 2/1k Op | Allocated Memory/Op |
|------------------------ |----------:|-----------:|-----------:|----------:|------:|--------:|------------:|------------:|------------:|--------------------:|
|         LogInlineDirect |  33.32 us |  0.3477 us |  0.3253 us |  33.24 us |  0.62 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|   LogInlineViaInjection |  33.68 us |  0.6655 us |  0.7121 us |  33.39 us |  0.63 |    0.01 |     18.8599 |      3.7231 |           - |            78.21 KB |
|    LogInlineViaCallback |  49.10 us |  0.6939 us |  0.6491 us |  49.02 us |  0.91 |    0.01 |     18.8599 |      3.7231 |           - |            78.27 KB |


Here, you'll note that the overhead of Action<> actually puts the callback version at a significant disadvantage, and a discernable disadvantage in this case.

In Summation

To summarize our discussion:

1. Don't make an ILogger, it's unbecoming.
2. Use an Action or Function, it's much more dynamic and reusable.
3. The only time this is not the case is when the logger is constructed inline with the thing being logged. Then use a regular function or class.

Recently I've been digging more-and-more into F#, so I want to start putting together a list of things that I occasionally (or regularly) run into that new users of the language (or even those who are substantially experienced) might not have a great time with.

I'll be notating if it's a general .NET "gotcha", or an F#-specific "gotcha."

.NET:

If you have something you want to see in the list, please let me know either via Twitter or as a comment. I'll try to check back here regularly to keep this list as up-to-date as possible.

Bringing F# into the SQLCLR

It's been some-time since my last post, and don't worry, we're still going to continue the IMAP server. I've been swamped at work, and as a result, haven't had the time to properly dedicate to writing these posts (especially that series, which is a complex topic).

Excuses aside, today we're going to talk about something I recently did for work, which is integrating F# into the SQLCLR (part of Microsoft SQL Server).

For those who don't know, the SQLCLR is a feature of SQL Server that allows one to import .NET assemblies as user-defined functions, or stored procedures. On it's own it doesn't sound impressive, but the SQLCLR allows us to significantly improve performance in some cases, and moderately improve it in others.

I won't go into detail explaining the SQLCLR, a gentleman by the name of Soloman Rutzky does that quite well. I'll let his "Stairway to SQLCLR" give you the introduction.

No, what I'll do today is show you how to import F# into the SQLCLR, instead of just C# or VB.NET. The process is about as straightforward as Soloman describes, but there are a few "gotcha's", so I'm going to include those in our discussion here today.

Without further ado, let's get started.

First: create the project and add the System.Data reference

The first step is obviously to create a project to hold our SQL code. The project should be an F# Class Library, in .NET Framework (I'm using 4.7.1 and F# Core 4.4.3.0). You'll want a module for the functions, and in that module you'll want to open Microsoft.SqlServer.Server, and System.Data.SqlTypes.

Once we've done that, we'll build a function. There are a few rules to creating a function in .NET that can be seen by SQL Server:

1. The function must have the SqlFunction attribute;
2. All inputs must be tupled;
3. All input and output types must be a Sql[Something] type (SqlDouble, SqlInt, etc.);

So, for our example we're going to use a real-world example from my work: distance calculation from two geo-coded points.

To do this, we'll build a function that takes 4 double values: two Latitude/Longitude value sets.

let calculateDistance (fromLat : SqlDouble, fromLon : SqlDouble, toLat : SqlDouble, toLon : SqlDouble) : SqlDouble

That's the signature we'll use, next, we want to define how SQL should treat the function:

[<SqlFunction(
IsDeterministic = true,
IsPrecise = false,
SystemDataAccess = SystemDataAccessKind.None,
DataAccess = DataAccessKind.None)>]


This is where life gets special, so let me explain them piece-by-piece:

• SqlFunction: this is just the attribute we use, there is also SqlProcedure for stored procedures;
• IsDeterministic = true: this value should ONLY be set to true if the function is deterministic, that is, given any input value, it returns one and exactly one output, and that two calls to the function with the same input will result in the same output;
• IsPrecise = false: this value should ONLY be set to true if the function uses the DECIMAL or NUMERIC types, and does precise mathematical calculations;
• SystemDataAccess = SystemDataAccessKind.None: I'll be completely honest with you, I don't know what the difference between this and DataAccess are, but if you do any reading/writing to/from SQL, you should set it to Read, otherwise, probably use None (there's a small performance cost to setting this to Read, I leave it to you to decide whether or not to do so);
• DataAccess = DataAccessKind.None: see above;

So basically, what we did here is define a function and tell SQL what it should expect the function to do. One of the most impotant parts is the IsDeterministic flag: this tells SQL that if it called the function for a set of values, it can reuse that result for any subsequent calls with the same set of values. This means it can memoize the results. If your function has side-effects, do not set this flag to true, or you will get weird results. Basically, if your function is truly "pure" (no side-effects), mark it with IsDeterministic = true.

Next: write the code

Alright, so we've covered the hard parts, next, we write the function.

My version of this function used some logic that was specific to my workplace, so I'm going to remove it and we'll write a vanilla function:

let constMod = 1.852 / 1.61 * 60.
let divPi180 = Math.PI / 180.
let div180Pi = 180. / Math.PI

[<SqlFunction(
IsDeterministic = true,
IsPrecise = false,
SystemDataAccess = SystemDataAccessKind.None,
DataAccess = DataAccessKind.None)>]
let calculateDistance (fromLat : SqlDouble, fromLon : SqlDouble, toLat : SqlDouble, toLon : SqlDouble) : SqlDouble =
let fromLat = fromLat.Value
let fromLon = fromLon.Value
let toLat = toLat.Value
let toLon = toLon.Value

let fromLat = fromLat * divPi180
let toLat = toLat * divPi180
let fromLon = fromLon * divPi180
let toLon = toLon * divPi180

constMod *
(Math.Acos
((Math.Sin toLon) * (Math.Sin fromLon) +
(Math.Cos toLon) * (Math.Cos fromLon) * (Math.Cos (toLat - fromLat))))
|> SqlDouble


This should be self-explanatory: we basically convert the data and do some simple math on it.

Third: enable SQLCLR

Alright, so that's that entirety of our .NET code.

Now, we need to enable the SQLCLR, because it's disabled by default.

The SQLCLR can be enabled through GUI or T-SQL, I prefer to do it through GUI because I typo a lot.

To enable it:

1. Right click your server in SSMS;
2. Click "Facets";
3. In the "Facet" dropdown select "Surface Area Configuration";
4. Change "ClrIntegrationEnabled" to "True";
5. Click "OK";

Easy enough.

Fourth: trust the assembly, and import it

This is one spot where things aren't completely awesome: the FSharp.Core library isn't built to natively support a "SAFE" import to SQLCLR, so we have to trust it first.

To trust the assemblies, we'll want to get a SHA2_512 hash of them, and optionally, a description.

I, personally, don't care so much about the description at the moment, so I'll leave that out and let you locate it if you like. Instead, I'm just going to demonstrate how to hash it and trust it.

We need to trust FSharp.Core, and then our assembly:

DECLARE @hash AS BINARY(64) = (SELECT HASHBYTES('SHA2_512', (SELECT * FROM OPENROWSET (BULK 'C:\path\to\bin\dir\FSharp.Core.dll', SINGLE_BLOB) AS [Data])))


Then, our assembly:

DECLARE @hash AS BINARY(64) = (SELECT HASHBYTES('SHA2_512', (SELECT * FROM OPENROWSET (BULK 'C:\path\to\bin\dir\MyAssembly.dll', SINGLE_BLOB) AS [Data])))


Easy enough.

Because FSharp.Core isn't built for native SQL Server support (which, if anyone want's to fix, I've included the error at the end of this article), we have to add it with PERMISSION_SET = UNSAFE, which is, well...unsafe.

So, to load our assembly, we need a name, and the path:

CREATE ASSEMBLY [MyAssembly]
AUTHORIZATION dbo
FROM 'C:\path\to\bin\dir\MyAssembly.dll'
WITH PERMISSION_SET = SAFE


Not particularly hard. The name ([MyAssembly]) is not restricted to anything other than the regular NVARCHAR(128) for sysname, it does not need to match anything from the DLL, but probably easier if it does.

Finally: create the function

Alright, so our assembly is imported, we have it available, the last part is creating the function.

To create the function, we start it off like a normal T-SQL UDF:

CREATE FUNCTION CalculateDistance
(
@fromLat FLOAT,
@fromLon FLOAT,
@toLat FLOAT,
@toLon FLOAT
)
RETURNS FLOAT


If you've ever written a T-SQL Scalar-Valued UDF, this should look familiar. We build the signature exactly as we defined it in F#, and that part is super important: the signature cannot vary at all.

Next, we write the UDF:

AS EXTERNAL NAME [MyAssembly].[MyAssembly.Namespace.ModuleName].calculateDistance


The EXTERNAL NAME is a three part name:

1. The assembly name as specified in CREATE ASSEMBLY;
2. The assembly namespace and module name, the fully-qualified name of the first outer-container of the function we need;
3. The function name itself;

Once you've created the function, we're literally all done. You can now call directly into your CLR code:

SELECT dbo.CalculateDistance(@fromLat, @fromLon, @toLat, @toLon)


Demonstrations!

For those who want to see the performance difference, the original T-SQL function is:

CREATE FUNCTION CalculateDistanceUdf
(
@fromLat FLOAT,
@fromLon FLOAT,
@toLat FLOAT,
@toLon FLOAT
)
RETURNS FLOAT
WITH SCHEMABINDING
AS
BEGIN
RETURN (1.852 / 1.61) *
60 *
DEGREES(
ACOS(
END


The WITH SCHEMABINDING is a hint to try to tell SQL Server to mark the function deterministic, and it is as verified with SELECT OBJECTPROPERTY(OBJECT_ID('[dbo].[CalculateDistanceUdf]'), 'IsDeterministic'), but it still performs significantly slower than the SQLCLR alternative.

I borrowed the test from this article to run mine, and wrote them as follows:

CREATE TABLE Numbers (
Num INT NOT NULL,
CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Num)
)
GO

WITH N1(C) AS (SELECT 0 UNION ALL SELECT 0),
N2(C) AS (SELECT 0 FROM N1 AS T1 CROSS JOIN N1 AS T2),
N3(C) AS (SELECT 0 FROM N2 AS T1 CROSS JOIN N2 AS T2),
N4(C) AS (SELECT 0 FROM N3 AS T1 CROSS JOIN N3 AS T2),
N5(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N4 AS T2),
N6(C) AS (SELECT 0 FROM N4 AS T1 CROSS JOIN N4 AS T2 CROSS JOIN N3 AS T3),
Nums(Num) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM N6)
INSERT INTO Numbers(Num) SELECT Num FROM Nums
GO


This inserts 1048576 rows to the Numbers table, so it's a good-sized test.

Then we can run each of the following three tests:

DECLARE @fromLat AS FLOAT = 100
DECLARE @fromLon AS FLOAT = 100
DECLARE @toLat AS FLOAT = 120
DECLARE @toLon AS FLOAT = 120

SELECT MAX(dbo.CalculateDistance(Num / @fromLat, Num / @fromLon, Num / @toLat, Num / @toLon)) FROM Numbers
GO

DECLARE @fromLat AS FLOAT = 100
DECLARE @fromLon AS FLOAT = 100
DECLARE @toLat AS FLOAT = 120
DECLARE @toLon AS FLOAT = 120

SELECT MAX(dbo.CalculateDistanceUdf(Num / @fromLat, Num / @fromLon, Num / @toLat, Num / @toLon)) FROM Numbers
GO

DECLARE @fromLat AS FLOAT = 100
DECLARE @fromLon AS FLOAT = 100
DECLARE @toLat AS FLOAT = 120
DECLARE @toLon AS FLOAT = 120

SELECT MAX
(
(1.852 / 1.61) *
60 *
DEGREES(
ACOS(
FROM Numbers
GO


You can run these each individually to time them. My times were roughly 645ms for the SQLCLR, 3369ms for the T-SQL UDF, and 703ms for the inline T-SQL. As you can see, the SQLCLR function is faster than the inline T-SQL, and let's us encapsulate the logic in a single function. (This actually came about as an issue because we have the calculation there copied-and-pasted over several dozen queries, often 3-8x per query.)

So, that said, in this type of situation (raw math) there's no reason to use T-SQL for the task, and for something reasonably complex like this, no reason not to abstract it. Dump the code in .NET, write your unit tests, and then deploy the assembly to the SQL server.

Now, that said, there are times I wouldn't use a SQLCLR function, such as when the math is ultra simple: i.e. * 3, and there are times when a table-valued UDF would be far superior, so I don't want to make the suggestion that this will always help, just that it's another thing you can try, and it might actually surprise you.

For anyone curious, attempting to create an assembly in F# throws the following warning:

Warning: The Microsoft .NET Framework assembly 'fsharp.core, version=4.4.3.0, culture=neutral, publickeytoken=b03f5f7f11d50a3a, processorarchitecture=msil.' you are registering is not fully tested in the SQL Server hosted environment and is not supported. In the future, if you upgrade or service this assembly or the .NET Framework, your CLR integration routine may stop working. Please refer SQL Server Books Online for more details.

And using a PERMISSION_SET of EXTERNAL_ACCESS or SAFE throws the following error:

CREATE ASSEMBLY failed because type 'Microsoft.FSharp.Collections.FSharpMap2' in safe assembly 'FSharp.Core' has a static field 'empty'. Attributes of static fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and intermediate language.

Greetings everyone. Once again, it's been a while since I've posted anything. I've been swamped with work, personal issues, and then some. I got a new dog (hi Max!), and so on. Fortunately, I have a topic I want to talk about (you can probably guess based on the title what it is), and thanks to a friend of mine, who we'll call "Jim" becasue, well, that's his name, who asked about this on Twitter, I figured we could go all-in.

Disclaimer: I'm a rambler, and this was hastily written.

Jim was asking about cookie-based authentication in ASP.NET, which is a great topic because it's something you should absolutely never do, but we are going to do to help him demonstrate how such a thing can be used for test automation. (Jim is a VERY smart person, and is working up a demo on how we can use cookie authentication hand-in-hand with test automation to do a wide-variety of things. The idea is to allow a test client to authenticate itself to quickly and easily get into the application.) We're going to spend this whole blog-post going over a worst-practice, vs. a best-practice. We're going to do all the things I always say never do, and learn why. (There's a lot of reasons we should not do any of these things, I'll try to cover a few with some examples.)

Then, at the end, I'm going to show you how we could do this type of authentication easily and via some sort of API call, so that we could stick with the core authentication principles that we value, and also satisfy our testers. This will demonstrate where the testers and the developers should be able to work together, to develop a flow that works for both.

We're going to build this out in Visual Studio, as usual, and we'll go over how we can take a blank ASP.NET website and add cookie-based authentication. Typically, we would use Forms-based authentication, or Active Directory / Windows Authentication.

A cookie is a small (usually) piece of information that a users browser stores which allows them to carry and pass information to and from a website. This cookie is a piece of data that is sent client-to-server, and server-to-client. We use cookies to transfer non-confidential data, because cookies are incredibly insecure. Anyone can spoof a cookie, and we'll look at doing exactly that in this blog post as well.

Alright, so let's go ahread and create some cookie-based authentication in our ASP.NET application. For this, we'll have three pages:

• Default.aspx: the main landing page, visible to authenticated and unauthenticated users;
• Login.aspx: the login page, this will test the user login and create an appropriate cookie;
• Authenticated.aspx: a page only available to authenticated users;

Step 1: Create the Authenticated.aspx

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Authenticated.aspx.cs" Inherits="Poor_Cookie_Authentication__17_4_2018_.Authenticated" %>

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<title></title>
<body>
<form id="form1" runat="server">
<div>
</div>
<div>
</div>
</form>
</body>
</html>


Basically, we throw a single literal which will be the name of the authenticated user. We'll populate this from the code-behind file, which is also really simple. But, before we do the code-behind, let's build a user object.

I'm going to do extremely basic authentication: our User class will have a Username (email) and Password. It will also contain a static array of all eligible users, so as to allow us to "pretend" to log someone in. We'll create two sample users, with different passwords:

public class User
{
public static User[] Users { get; } =
new[]
{
};

public string Username { get; private set; }
public string Password { get; private set; }
}


This can be replaced with any type of user loading, but I kept it simple to allow easy demonstration.

public partial class Authenticated : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
{
}

var user = Models.User.Users[userId];
}
}


So, if it's not apparant, cookie reading is super simple in ASP.NET: simply call Request.Cookies[name].Value, I use ?.Value to avoid the need for an additional null-check. (If the cookie doesn't exist, Request.Cookies[name] returns null instead of throwing a KeyNotFoundException like a normal dictionary would.)

Step 2: Login.aspx

Alright, so the next step is to allow the user to login. This is really simple, and we'll do it with a small HTML page and code-behind. The login will take a username and password, and match that to a user in the User.Users array. If we find one, set a cookie with the ID.

Markup:

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Login.aspx.cs" Inherits="Poor_Cookie_Authentication__17_4_2018_.Login" %>

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<title></title>
<body>
<form id="form1" runat="server">
<div>
<asp:Literal runat="server" ID="litError"></asp:Literal><br />
<asp:Button runat="server" ID="btnSubmit" OnClick="btnSubmit_Click" Text="Login" />
</div>
</form>
</body>
</html>


Code:

public partial class Login : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
{
Response.Redirect("Authenticated.aspx");
}
}

protected void btnSubmit_Click(object sender, EventArgs e)
{

if (user != null)
{
Response.Redirect("Authenticated.aspx");
}

}
}


There are two ways to set a cookie:

• Use Response.Cookies.Add;
• Directly call to Response.Cookies[name], which will create the cookie if it does not exist;

Here, I chose the former as it's clearer. We add a new response cookie, set the expiration for 8 hours from now, and then redirect the user to the Authenticated.aspx page.

Step 3: Our Default.aspx with logout

The last step here is to make our Default.aspx page, which is again simple, and perform our logout on this page.

Markup:

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="Default.aspx.cs" Inherits="Poor_Cookie_Authentication__17_4_2018_.Default" %>

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<title></title>
<body>
<form id="form1" runat="server">
<div>
</div>
</form>
</body>
</html>


Code:

public partial class Default : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
{
lbLogout.Visible = false;
}
else
{
}
}

protected void lbLogout_Click(object sender, EventArgs e)
{
Response.Redirect("Default.aspx");
}
}


We literally have a "Login" and "Logout", which one is shown depends on whether or not the cookie is set. On logout, we set the cookie expiration to the past so that the browser will delete it, and redirect the user.

Authentication complete!

Alright, so all-in-all we're basicallly done with designing the cookie-based authentication of the application, now let's move on to the insecurities of it.

There are literally hundreds of things we could list that are reasons to not use cookie-based authentication, but let's just go over some of the basics:

• Cookies are handled entirely client-side. That means, the client must be trusted to track the entire lifetime of the cookie: value, expiration, etc., all of it is in the client's hand.
• Cookies are always plain-data. That is to say, they are not secured in any manner. In a non-HTTPS environment, cookies are clearly visible on transmission from client-to-server, and server-to-client. A man-in-the-middle attack with cookies is so unbelievably easy. (Facebook used to be vulnerable to session-hijacking using this attack, and I'll pull references on that later.)
• Cookies carry small amounts of data. There are limits to the size of a cookie, due to the nature of it. Because cookies are passed in the headers of a web request or response, they are limited to the maximum size of a header.

Alright, so let's exploit some cookie insecurity. We're going to do four things:

1. Change the expiration to keep ourselves logged in longer;
3. Change the user ID to cause the server to error (if you place cookies into SQL this can do really bad stuff);
4. Create a cookie with a user ID to login without credentials;

So the first task on our to-do list is to change a cookie expiration. We set it for 8 hours, but any sufficiently skilled user (and you really don't have to be all that skilled) can alter it. In fact, if you download the EditThisCookie plugin for Opera, you can do so with a two clicks. I'm going to use this to do the rest, and I'm just going to throw all the pictures with basic captions.

Step 1: We'll Login via Opera. That "Cookie" icon is our editor.

Step 2: Open the cookie. We click the "Cookie" Icon and expand the cookie we want ("UserId").

Step 3: Edit the Expiration and click the "Checkmark" to save.

As you can see, pretty easy.

Next, we'll edit the value to be someone else:

Now I did not relogin, I used the original login and edited who I was. That's important to remember because it demonstrates why this is insecure.

We can error the server:

And even make a new cookie without logging in:

Automating our Tests

Ok, on to the fun part: let's automate a login, but without ever hitting the login page or function.

This is almost too easy with .NET, we'll use the WebClient and manually send the Cookie header. To do this, I created a TestAutomation.aspx page:

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="TestAutomation.aspx.cs" Inherits="Poor_Cookie_Authentication__17_4_2018_.TestAutomation" %>

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<title></title>
<body>
<form id="form1" runat="server">
<div>
<asp:Literal runat="server" ID="litResponseData"></asp:Literal>
</div>
</form>
</body>
</html>


The code is trivial:

public partial class TestAutomation : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
using (var wc = new WebClient())
{
}
}
}


Yes, we logged in and downloaded the Authenticated.aspx text with 5 lines of code, 2 of which were braces. By manualy sending the header, we made it too easy for us to work with.

If you comment out the wc.Headers.Add line, you'll see that it returns the login form. This is because the WebClient follows the redirects. With that line in, we get the Hello ...! message.

An Ideal World

Alright, so all of this is done to demonstrate the point of how easy it is to use cookie-based authentication, and how we can exploit it, but also the ease of which it does what we want. One of the things Jim had mentioned to me was that he wanted the ability to either turn authentication off, or some other way to allow the user to login, but without needing to do too much complex work.

Typically, in this scenario, this is where the developer and tester would work together on a solution, one of which might be a very simple API call to do login: pass a username and password, then it logs that session in. We can simulate this by accepting Username and Password query-string parameters in our Login.aspx:

public partial class Login : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
{
}

{
Response.Redirect("Authenticated.aspx");
}
}

protected void btnSubmit_Click(object sender, EventArgs e)
{
}

{
if (user != null)
{
Response.Redirect("Authenticated.aspx");
}

}
}


So Login.aspx didn't change much, we just handle both cases now. By pushing login into a function, it made it easy to deal with the query-string based login, and the form-based login. We need to modify our TestAutomation.aspx page a little to accommodate:

<%@ Page Language="C#" AutoEventWireup="true" CodeBehind="TestAutomation.aspx.cs" Inherits="Poor_Cookie_Authentication__17_4_2018_.TestAutomation" %>

<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<title></title>
<body>
<form id="form1" runat="server">
<div>
<asp:Literal runat="server" ID="litResponseData1"></asp:Literal><br /><br />
<asp:Literal runat="server" ID="litResponseData2"></asp:Literal>
</div>
</form>
</body>
</html>


The code is the big change:

public partial class TestAutomation : System.Web.UI.Page
{
protected void Page_Load(object sender, EventArgs e)
{
// Method 1
using (var wc = new WebClient())
{
}

// Method 2
req.GetResponse(); // We don't need to do anything with the response

req = WebRequest.CreateHttp("http://localhost:60078/Authenticated.aspx");
var response = (HttpWebResponse)req.GetResponse();
using (var sr = new StreamReader(response.GetResponseStream()))
{
}
}
}


You see the // Method 2 comment? That is the part that uses the query-string to login. It can also use a POST to a form to login, if we wanted, though that is far more complex. Due to the ASP.NET event validation, it is far easier to virtually create the UI form in that case, and submit it. Instead, what we did is query-string based to support the testers use-case, and make it easy to do the testing, while also retaining most of our security.

This project is available on GitHub and I am allowing anyone to use it to any purpose, I simply ask that if you use the project directly, throw some sort of nice message on where you found it.

Implementing Server-Side RFC 3501 (IMAP) in F# (Part 2)

Alright, so the previous blog post got us introducded to our RFC 3501 (IMAP) implementation. We looked at the basic TCP/IP stack of the architecture. Today, we're going to actually start building part of the IMAP server itself.

To start building the IMAP server, we need to start building a mock file-system. Now me, personally, I'm about keeping things as simple as possible. As a result, the mock file-server I'm building is not going to do anything fancy. We'll use a DemoFiles folder in the IMAP server for all the data, then within that folder we'll have a Configuration file, then a Message_{Id} file for each message. The Configuration file will have some basic stuff:

• Password: to verify the user against on authentication;
• UIDVALIDITY: that value that RFC 3501 says can never change;
• NextUniqueIdentifier: the value that RFC 3501 dictates must change only when a message is received;

Next, each Message_{Id} file needs to have a few of it's own pieces:

• Attributes: at least all of the following attributes:
• \Seen
• \Answered
• \Flagged
• \Deleted
• \Draft
• Date/Time Received: the date and time that the message was received by the server (typically via SMTP);
• Message Size: the number of 8-bit octets in the message;
• Envelope Structure: the parsed header of the message;
• Body Structure: the parsed body (MIME-IME) of the message;
• Message Text: the text(s) in the message;

We'll worry about messages later, for now I just want to concern ourselves with the configuration, as that's the first part of implementing IMAP.

Building a Configuration File

Now we have several methods for building the configuration file: XML, JSON, INI, CSV, etc. Because I'm lazy, and because this is a sample storage system, I'm going to build it as a line-delimited file. I.e.: each line is a value in the config. (Easy to write by hand, easy to ready by hand, easy to parse with code, all-in-all, pretty easy.)

Our config file will look like:

Plaintext Password (For easy troubleshooting)
The UIDVALIDITY value (Constant)
The Next Unique Identifier value (Changes when a message comes in)


My sample configuration:

Password123
307110933
1


Building Storage Events

Using this basic configuration, we can start to modify our server to take note of file read/write. We can start modifying the IMAP server to start using some storage events.

type StorageEvents =
{ Authenticate : string * string -> User option
GetUidValidity : User -> int
GetNextUniqueId : User -> int }


So we can start developing some basic events here. We want to be able to tell if a user can be authenticated (take a username and password and output a Some User if it's valid, or None if not). We want to get the UIDVALIDITY value (take a User and return an int), and we want to get the next unique ID (take a User and output an int).

Now we don't really take commands from the server yet, but we can shoehorn a test or two in:

let objectToStr o = o.ToString()
let rec runClient (socket : Socket) =
let printfn = printfn "Socket %s: %s" (socket.RemoteEndPoint.ToString())
let user = Some { Id = "ebrown@example.com" }
async {
let! buffer = () |> socket.AsyncReceiveAll
let str = buffer |> Encoding.ASCII.GetString
sprintf "Received (%i bytes): %s" buffer.Length str |> printfn
if str.Length = 3 && str = "BYE" then printfn "Disconnected"
else
let! bytesSent = [| "Hello, other world!"B |> Some; user |> Option.map (storageEvents.GetUidValidity >> objectToStr >> Seq.map byte >> Seq.toArray) |] |> Array.choose id |> Array.concat |> socket.AsyncSend
bytesSent |> sprintf "Sent response (%i bytes)" |> printfn
return! socket |> runClient }


We'll send back the UIDVALIDITY value every time we get a packet from a client, that should be a good proof-of-concept.

Our server itself needs to change (ever-so-slightly): type Server (storageEvents) =, this will allow us to pass the storageEvents in, and the server has no idea what or how the underlying storage mechanism works (nor does it care).

Defining the entire storage system to work is really, really trivial:

type Configuration =
UidValididty : int
NextUniqueId : int }
let getConfig (f : string) =
let lines = System.IO.Path.Combine(@"DemoFiles", f, @"Configuration.txt") |> System.IO.File.ReadAllLines
{ Password = lines.[0]; UidValididty = lines.[1] |> int; NextUniqueId = lines.[2] |> int }
let serverEvents =
{ Authenticate = (fun (u, p) -> if (u |> getConfig).Password = p then Some { Id = u } else None)
GetUidValidity = (fun u -> (u.Id |> getConfig).UidValididty)
GetNextUniqueId = (fun u -> (u.Id |> getConfig).NextUniqueId) }


We defined a Configuration type, a function to parse it, and then created a serverEvents object to hold all the functions for working with the storage system. Our main function changes little:

use server = Server(serverEvents).Start()
printfn "Closing..."
0


We just put serverEvents in now.

If we put it all together to start testing it, we'll get Received 28 bytes: Hello, other world!307110933 on the client, which proves that our storage works.

We can push this off to a Storage module, to get it out of our work-area:

module EBrown.Imap.Server.Storage
open EBrown.Imap.Core

type Configuration =
UidValididty : int
NextUniqueId : int }
let getConfig (f : string) =
let lines = System.IO.Path.Combine(@"DemoFiles", f, @"Configuration.txt") |> System.IO.File.ReadAllLines
{ Password = lines.[0]; UidValididty = lines.[1] |> int; NextUniqueId = lines.[2] |> int }
let getStorageEvents () =
{ Authenticate = (fun (u, p) -> if (u |> getConfig).Password = p then Some { Id = u } else None)
GetUidValidity = (fun u -> (u.Id |> getConfig).UidValididty)
GetNextUniqueId = (fun u -> (u.Id |> getConfig).NextUniqueId) }


Implementing IMAP Commands

Alright, so now that we have a storage mechanism, we want to start implementing some IMAP commands and state and such.

IMAP commands are either tagged or untagged, and comprise of a command name, and possibly arguments. Thus a command would look like <TAG> <COMMANDNAME> <ARGUMENTS>, where <TAG> is either a client-generated string, or a * if the command is untagged. There are a few, basic commands, that can be implemented regardless of the state of the IMAP connection:

• CAPABILITY: the capability command is tagged and has no arguments. The response should be one untagged response with a list of what capabilities are currently supported by the server, and the server should send one tagged OK response (or a tagged BAD response if the command is badly formed).
• NOOP: the noop command is tagged and has no arguments. There is no specific response, but it can be used as a polling command as any command can return a status update. The server should send one tagged OK (or BAD) response after sending status update data, if appropriate.
• LOGOUT: the logout command is tagged and has no arguments. The response should be one untagged BYE, and one OK or BAD.

The format of an OK response to a command is <TAG> OK <COMMANDNAME> completed.

Thus, if a client sends a a002 NOOP, the server OK response would be a002 OK NOOP completed.

With all this in mind, we can start building a command parser and generator.

The first part we need is a Tag. This is honestly quite trivial:

type Tag = | Untagged | Tagged of string
let getTag = function | Untagged -> "*" | Tagged s -> s


As commands are either tagged or untagged, this makes life really easy.

Next, we need a general generateLine, which can be used to generate any given command or response:

let joinStrings (sep : string) (sarr : string array) = System.String.Join(sep, sarr)
let generateLine tag largs name rargs =
[|[|tag |> getTag |> Some|]; largs; [|name |> Some|]; rargs|]
|> Array.concat
|> Array.choose id
|> joinStrings " "


We generalize this enough so that we can use it for any command, which makes life extremely trivial to build a generateCommand:

let generateCommand tag name args = generateLine (Tagged tag) [||] name (args |> Array.map Some)


Now we can generalize command generation, let's build a capability command:

let capability tag = generateCommand tag "CAPABILITY" [||]
let capabilityResponse tag args = [|generateLine Untagged [||] "CAPABILITY" (args |> Array.map Some); generateLine (Tagged tag) [|"OK" |> Some|] "CAPABILITY" [|"completed" |> Some|]|]


Again, we're keeping it mostly simple.

The next step is to parse a command. This will be just as easy as generation, we'll define new types and a function to handle the input.

type Command = | Capability
type InputCommand = { Tag : Tag; Command : Command }
let parseCommand (command : string) =
let parseTag = function | "*" -> Untagged | s -> Tagged s
let parseCommandName =
function
| [|"CAPABILITY"|] -> Capability |> Some
| _ -> None
let parts = command.Split(' ')
parts.[1..] |> parseCommandName |> Option.map (fun c -> { Tag = parts.[0] |> parseTag; Command = c })


Again, quite simple. We parse the tag out, then we parse the command out, and return the result.

So with this, extending to add a NOOP command is really easy:

type Command = | Capability | Noop
....
let noop tag = generateCommand tag "NOOP" [||]
let noopResponse tag = [|generateLine (Tagged tag) [||] "NOOP" [|"completed" |> Some|]|]
....
let parseCommand (command : string) =
let parseTag = function | "*" -> Untagged | s -> Tagged s
let parseCommandName =
function
| [|"CAPABILITY"|] -> Capability |> Some
| [|"NOOP"|] -> Noop |> Some
| _ -> None
let parts = command.Split(' ')
parts.[1..] |> parseCommandName |> Option.map (fun c -> { Tag = parts.[0] |> parseTag; Command = c })


And now we have NOOP support.

Lastly, LOGOUT command support:

let logout tag = generateCommand tag "LOGOUT" [||]
let logoutResponse tag = [|generateLine Untagged [||] "BYE" ([|"IMAP4rev1"; "Server"; "logging"; "out"|] |> Array.map Some); generateLine (Tagged tag) [||] "LOGOUT" [|"completed" |> Some|]|]
let parseCommand (command : string) =
let parseTag = function | "*" -> Untagged | s -> Tagged s
let parseCommandName =
function
| [|"CAPABILITY"|] -> Capability |> Some
| [|"NOOP"|] -> Noop |> Some
| [|"LOGOUT"|] -> Logout |> Some
| _ -> None
let parts = command.Split(' ')
parts.[1..] |> parseCommandName |> Option.map (fun c -> { Tag = parts.[0] |> parseTag; Command = c })


Now as we work with this, we should realize that we can actually remove most of the specific functions, and use the InputCommand for the capability, noop, and logout functions we built:

type ClientCommandName = | Capability | Noop | Logout
type ClientCommand = { Tag : Tag; Command : ClientCommandName }
let generateClientCommandName =
function
| Capability -> ("CAPABILITY", [||])
| Noop -> ("NOOP", [||])
| Logout -> ("LOGOUT", [||])
let generateClientCommand (command : ClientCommand) = command.Command |> generateClientCommandName ||> generateLine command.Tag [||]
let parseClientCommand (command : string) =
let parseTag = function | "*" -> Untagged | s -> Tagged s
let parseCommandName =
function
| [|"CAPABILITY"|] -> Capability |> Some
| [|"NOOP"|] -> Noop |> Some
| [|"LOGOUT"|] -> Logout |> Some
| _ -> None
let parts = command.Split(' ')
parts.[1..] |> parseCommandName |> Option.map (fun c -> { ClientCommand.Tag = parts.[0] |> parseTag; Command = c })

type ServerCommandName = | Capability of OkBadResult * string array | Noop of OkBadResult | Logout of OkBadResult
type ServerCommand = { Tag : Tag; Command : ServerCommandName }
let generateServerCommandName (command : ServerCommand) =
match command.Command with
| Capability (res, options) ->
let lRes, rRes = res |> getOkBad
[|(Untagged, [||], "CAPABILITY", options); (command.Tag, [|lRes|], "CAPABILITY", [|rRes|])|]
| Noop res ->
let lRes, rRes = res |> getOkBad
[|command.Tag, [|lRes|], "NOOP", [|rRes|]|]
| Logout res ->
let lRes, rRes = res |> getOkBad
[|(Untagged, [||], "BYE", [|"IMAP4rev1 Server logging out"|]); command.Tag, [|lRes|], "LOGOUT", [|rRes|]|]
let generateServerCommand (command : ServerCommand) = command |> generateServerCommandName |> Array.map (fun (t, l, n, r) -> generateLine t (l |> Array.map Some) n (r |> Array.map Some))


So now we have a manner of building commands, somewhat dynamically. We can construct a command from what it needs, which means we can continue on with the next step: authentication / authorization. That will be the subject of the next blog post, as I want to allow a lot of this to sink in, so that we can hit the ground running in the next situation. We have a whole hell-of-a-lot to do yet, so we'll take it in shorter, more manageable bits.

As a reminder, the code for this is on GitHub. The version for this specific blog post is tree 8d7b3f15777d0be7cd78903bb5f0c94d8175230d`.

Due to time constraints, I didn't make it as far as I would have liked with this post. But never fear, consistent, slow progress is better than inconsistent progress.