using Programming;

A blog about some of the intricacies of programming and how one can get the best out of various languages.

Getting started with programming and getting absolutely nowhere (Part 1)

How to become a programmer (in a few not-so-easy steps)

Recently a friend of mine asked me to teach him "how to become a programmer", which is not really an easy task to accomplish, but I'll give it a shot. Since I'm an altruistic person, I'm going to turn this into a series of blog posts on becoming a programmer. We're mostly going to look at the .NET ecosystem, and the first few lessons will be in F# to get the ball rolling somewhat quickly.

The first rule of programming: forget what you think you know

All programmers, including myself, start somewhere. The biggest mistake I've made on the quest to become a programmer is to focus entirely on some "bigger goal" to achieve, some large project that I wanted to build or work on. Goals are great, but we have to start small. (Especially in the modern world, where we have hundreds of languages and frameworks and environments to choose from.)

So my very first word of wisdom is to forget everything you think you know. All of it. Forget what languages you think you want to learn, forget what areas you want to work in, forget what you know about "HTML coding for Myspace pages", ditch it all. We're going to start your career of programming from the beginning - the basics first, then we'll work up to the fun things.

"Hello World" is so cliche

I'm not going to bore you by starting our adventure with a "Hello World" program; instead, we're going to start from an actual business-driven design. Rather than contriving egregious examples or building some sort of "example" application, we're going to build real-world business applications: things that you will actually be expected to come up with. And we're going to do so poorly, so that we can learn where to improve. I'm going to take personal experiences (and mistakes) and burn them into your core, so that you can identify what makes good code good, and what makes bad code bad.

The first mistake people make is losing interest, as you probably have by this point in this blog post. But hang tight: we have some code coming up, and I'm going to start showing you how we can take some basic, elementary ideas and turn them into truly remarkable pieces of software.

Without further ado, let's get going

We're going to begin our foray into programming with F#, which is quite unusual compared to most introductions. The easiest way to get started is to install a copy of Visual Studio 2017 Community (or, if you prefer to purchase it, Professional or Enterprise) with the F# development tools selected. I'm not going to walk you through this for a few reasons:

  1. There are about a hundred thousand articles and blog posts on installing Visual Studio and adding and removing features.
  2. It's actually extremely self-explanatory, it really is pretty simple to get set up.
  3. I want you to struggle a bit with this.

Notice point 3: I'm not going to hold your hand. We're going to go on an adventure and we're all going to make mistakes. I want you to struggle with this entire venture, for two reasons of its own:

  1. Only by struggling do you force yourself to think critically;
  2. I want you to decide if this is a path you wish to pursue.

Too many people say "teach me to be a great programmer" but refuse to put the work in to become one - I can't write a long, wordy article that makes you a good programmer, I can only guide you down the path to success. It is up to you to decide if you wish to continue down the path or not. By forcing you to struggle with real-world problems you'll be better prepared for how programming actually is.

So let's start with some code. Because we're using F# it's going to be a somewhat quick foray into our career, so open up Visual Studio, and find the F# Interactive window - I won't tell you where it is, so this is the first struggle you'll have to do on your own. (I'll give you a hint - it's under some sort of "Window" menu somewhere.)

Once we've opened F# Interactive, I want you to define a "domain model" to describe an arbitrary user. The F# syntax for such a thing is something like:

type SomeName = {
    SomeProperty
}

The syntax for SomeProperty can look something like:

SomeValue : SomeType

Right, so I'll give you the first object for free:

type User = {
    Id : int
    Name : string
    Email : string
    Phone : string option
}

For those of you who don't know, this is the basic F# syntax for defining a "class", but we get a lot more than just a class, so we won't call it such a thing (because naming is important - if it's not a duck we really ought not call it a duck). In F# this is called a Record.

A Record in F# has a few basic features: it has named properties/fields, it holds values, and it is non-nullable. In fact, almost everything in F# is non-nullable, which is why we're starting with this language. Nullable values are hard, and if you don't believe me just "Google" for a "NullReferenceException"; it should become clear.

What we've defined here is a "User" with four properties: an "Id" (which is an integer: a whole number so-to-speak), a "Name" (which is a string, or a sequence of characters in a specific order), an "Email" (which is also a string), and a "Phone" (which is a string, but it's "optional" - we may or may not have a phone defined). An "option" in F# is a kind of null, but not really. It has two cases: a None value (which means the lack of a value entirely), or a Some value, which means that there is "Some" value there. These are important because they idiomatically represent the idea of a non-existent value.
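To see why an option is more useful than a null, here is a small sketch (the describePhone function, the sample users, and the phone number 555-0100 are made up for illustration) that uses pattern matching to handle the Some and None cases explicitly:

```fsharp
type User = {
    Id : int
    Name : string
    Email : string
    Phone : string option
}

// Pattern matching makes us handle both cases: there is Some phone, or there is None.
let describePhone (user : User) =
    match user.Phone with
    | Some phone -> sprintf "%s can be reached at %s" user.Name phone
    | None -> sprintf "%s has no phone number on file" user.Name

// Hypothetical sample data.
let withPhone = { Id = 1; Name = "Elliott"; Email = "example@example.com"; Phone = Some "555-0100" }
let withoutPhone = { Id = 2; Name = "Alex"; Email = "alex@example.com"; Phone = None }
```

If you delete the None branch, the compiler warns that the match is incomplete, which is exactly the kind of forgotten case a null would have let slip through.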

One of my favourite features of F# is its type inference; that is, F# will correctly detect what types you're playing with in most situations. So we're going to create a "User" now, the syntax of which is:

let myUser = {
    Id = 1
    Name = "Elliott"
    Email = "example@example.com"
    Phone = None
}

I believe I owe you an explanation: by specifying all four of these properties, F# understands that we want to create a User object. We didn't explicitly tell it that, but it assumed based on what we gave it that we wanted that type of object. We could have also told it that we wanted myUser to be a User with a type-annotation, that is, : SomeType, which would mean let myUser : User = {...} would create our myUser record. Simple, right?
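A minimal side-by-side sketch of the two approaches (the names inferredUser and annotatedUser are only for illustration). Both produce the same User record, and because F# records compare by their field values, the two are equal:

```fsharp
type User = { Id : int; Name : string; Email : string; Phone : string option }

// Inferred: F# works out from the field names that this is a User.
let inferredUser = { Id = 1; Name = "Elliott"; Email = "example@example.com"; Phone = None }

// Annotated: we state the type ourselves with ": User".
let annotatedUser : User = { Id = 1; Name = "Elliott"; Email = "example@example.com"; Phone = None }

// Records use structural equality, so this is true.
let sameUser = (inferredUser = annotatedUser)
```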

Pointing out our First Mistake

I need to tell you that we already made a few mistakes here, not the least of which is the choice of myUser as a name. What does myUser mean? Why does it have the type-name in it? What will myUser be doing?

We really need to touch on naming, but I don't have a lot of time for that, so I'll be brief: naming things is hard. You should name something for what it means, rather than what it is. That is, instead of myUser, we should really name that ebrown, or something.

We also need to touch on the type-safety (which is one of many such safeties) in F#. As a language, F# forces you to define the entirety of a record at once. We want Phone to be None, but a record field has no default value, so we still have to tell F# explicitly that Phone is None. It won't do it by itself; we have to. Remove that line and you'll see what I'm talking about.

So what have we effectively done here? We've created a Type, but what is a type? What does it do? To put it shortly, we've created a place where we can hold data. So that's a good thing, right? We've created something that can store state, something that holds a group of related values. We could just as easily have written:

let ebrownId = 1
let ebrownName = "Elliott"
let ebrownEmail = "example@example.com"
let ebrownPhone = None

So what's the difference? Why did we build a type? I'll give you a hint: what stops you from sending ebrownId, ebrownName, alexEmail, ralphPhone to a method? Ah, that's what it's for. It groups related values together so they're unmistakably related. They belong in a group.
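A small sketch of the difference, assuming two hypothetical helper functions, formatContactLoose and formatContact: the first accepts loose values and will happily mix one person's name with another person's email, while the second can only get the name and email from the same User.

```fsharp
type User = {
    Id : int
    Name : string
    Email : string
    Phone : string option
}

// Loose values: nothing stops a caller from passing ebrownName with alexEmail.
let formatContactLoose (name : string) (email : string) =
    sprintf "%s <%s>" name email

// Grouped values: the name and email always come from the same User.
let formatContact (user : User) =
    sprintf "%s <%s>" user.Name user.Email

let ebrown = { Id = 1; Name = "Elliott"; Email = "example@example.com"; Phone = None }
```

Calling formatContact ebrown gives "Elliott <example@example.com>", and there is no way to accidentally hand it someone else's email.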

Alright, so I've done a lot of talking (see: rambling), and we've effectively seen what, 20 lines of code? Let's actually make something happen. I'm not going to explain every little detail of this next snippet, but I will go over the general idea.

type User = {
    Id : int
    Name : string
    Email : string
    Phone : string option
}

let ebrown = {
    Id = 1
    Name = "Elliott"
    Email = "example@example.com"
    Phone = None
}

let createReminder thing user =
    sprintf "Hello %s, this is a reminder to do the thing (%s)." user.Name thing

let remind =
    createReminder "Write a Blog Post" >> printfn "%s"

let reminder =
    ebrown |> remind

Right, so what happened? This is actually pretty simple, we already saw creating a user, now all we're adding is passing that user to a function and gathering a result. There are a couple key points to note here:

The pipe-right (|>) operator takes the value on the left side, and transforms it to be the last argument of the function on the right side. So ebrown |> remind is the same as remind ebrown. We use this pipe-right operator mostly to keep the order of things consistent. I.e. let someResult = someValue |> step1 |> step2 |> step3, thus allowing us to follow the "flow" of the functions.
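A short sketch of the pipe-right operator using two throwaway helpers (timesTwo and addOne are made up for this example); the nested call and the piped version compute the same thing, but the piped one reads in the order the steps happen:

```fsharp
let timesTwo x = x * 2
let addOne x = x + 1

// Nested calls read inside-out...
let nested = addOne (timesTwo 5)          // 11

// ...while pipe-right reads left to right, in the order the steps run.
let piped = 5 |> timesTwo |> addOne       // 11

// A longer "flow": double every element of a list, then add them up.
let total = [1; 2; 3] |> List.map timesTwo |> List.sum   // 12
```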

The next notable piece is the compose-right (>>) operator. This has a similar effect to pipe-right, except that instead of needing a value, it needs a function: it builds a new function that runs the first function, then passes its result to the second function. This can also be chained as necessary. The difference between this and pipe-right is what they operate on: pipe-right operates on a value, compose-right operates on functions.
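Composition, sketched with the same kind of throwaway helpers (again hypothetical): >> glues two functions into a new one without touching any value yet, and the order in which you compose them matters.

```fsharp
let timesTwo x = x * 2
let addOne x = x + 1

// >> builds a new function: first timesTwo, then addOne.
let timesTwoThenAddOne = timesTwo >> addOne

// Reversing the order gives a different function: addOne first, then timesTwo.
let addOneThenTimesTwo = addOne >> timesTwo

let a = timesTwoThenAddOne 5   // 11
let b = addOneThenTimesTwo 5   // 12
```

Composing and then applying gives the same answer as piping a value through the same steps: (timesTwo >> addOne) 5 equals 5 |> timesTwo |> addOne.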

This brings us to an interesting aspect of F#: the type system. I've deliberately avoided discussing it yet, but now it's about time to get into the nitty-gritty of what the type-system does.

In most "regular" languages functions are presented as some type of prototype: you have an input and a result. This is usually written (in C-based languages) as ResultTypeName FunctionName(InputTypeName1 inputValue1, InputTypeName2 inputValue2, ...), etc., which basically says "this function takes [...] as input and returns a ....".

In F# we do things a little differently. Instead of declaring a function as having "all these" inputs, each and every function in F# has exactly one input.

But Elliott, our createReminder takes two parameters, a thing and a user.

Technically, that's not quite right. The createReminder function takes one input and returns a new function, which takes the second input. We call this "currying"; in layman's terms it means that functions can be partially applied, which is, kind of, part of the point of being a functional language.
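To make currying concrete, here is a sketch that restates createReminder from above as an explicit function-returning-a-function (createReminderExplicit and blogPostReminder are names I've made up), and partially applies it with only the first argument:

```fsharp
type User = { Id : int; Name : string; Email : string; Phone : string option }

// Written as if it had two parameters...
let createReminder thing (user : User) =
    sprintf "Hello %s, this is a reminder to do the thing (%s)." user.Name thing

// ...but it really is a function that returns a function, just like this one.
let createReminderExplicit =
    fun thing ->
        fun (user : User) ->
            sprintf "Hello %s, this is a reminder to do the thing (%s)." user.Name thing

// Supplying only the first input gives back a User -> string function.
let blogPostReminder = createReminder "Write a Blog Post"

let ebrown = { Id = 1; Name = "Elliott"; Email = "example@example.com"; Phone = None }
```

blogPostReminder ebrown returns "Hello Elliott, this is a reminder to do the thing (Write a Blog Post).", the same string the two-argument call produces.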

Type Definitions aren't so scary

Enough blab, let's talk about a type definition. F# writes the type of a function as 'a -> 'b, that is, a function that takes an 'a as input and returns a 'b. If you look at the signature for our createReminder, you'll see that it is string -> User -> string: "I first take a string, which returns a function that takes a User, which returns a string."

If you look at remind, you'll see a very different signature: User -> unit. In F# the unit is the "void" type, that is, a unit is a nothing. It's basically the end of something. You can create a unit directly as (). If you run let unitThing = () in F# Interactive, you'll see it prints val unitThing : unit = (), meaning it's a nothing. So remind takes a User and returns a "nothing".
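A tiny sketch of unit in practice (shout is a hypothetical helper): () is the only value of type unit, and a function whose last step is printfn hands it back to the caller.

```fsharp
// () is the one and only value of type unit.
let unitThing = ()

// printfn returns unit, so shout has the signature string -> unit.
let shout (message : string) : unit =
    printfn "%s!" (message.ToUpper())

// Calling it prints "WRITE A BLOG POST!" and gives us back ().
let result = shout "write a blog post"
```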

So, all this said, we can build a reasonable thought process from the following signatures:

string -> int -> unit            // Takes a string, then an int, returns a unit / nothing
string -> string -> string       // Takes a string, then a string, returns a string
int -> (string -> int) -> unit   // Takes an int, then a `string -> int` function, returns a nothing
(string -> int) -> float         // Takes a `string -> int` function, returns a float
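One hypothetical function for each of the signatures above (the names and bodies are made up; only the signatures matter), with the types annotated so F# Interactive will print them back to you:

```fsharp
// string -> int -> unit
let printRepeated (text : string) (count : int) : unit =
    printfn "%s" (String.replicate count text)

// string -> string -> string
let joinWithSpace (first : string) (second : string) : string =
    first + " " + second

// int -> (string -> int) -> unit
let printScore (bonus : int) (scorer : string -> int) : unit =
    printfn "Score: %d" (scorer "hello" + bonus)

// (string -> int) -> float
let averageScore (scorer : string -> int) : float =
    [ "one"; "three"; "five" ] |> List.averageBy (fun word -> float (scorer word))
```

For example, averageScore String.length averages the lengths 3, 5 and 4, giving 4.0.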

Do those make sense? Good, I hope you thought that was easy, because I'm only going to make things more difficult from here.

What the snot does any of this do?

I want to intentionally confuse you for a moment, because learning requires a struggle. We're going to look at a highly contrived example (remember when I said I wouldn't do that? I'm breaking that rule apparently) of just how complex we can get.

let formula values =
    let xb =
        values
        |> List.sum
        |> (/)
        <| (values.Length |> float)
    values
    |> List.fold (fun acc x -> acc + (x - xb) * (x - xb)) 0.
    |> (/)
    <| (values.Length |> float) - 1.
    |> System.Math.Sqrt

Oh boy. This looks fun. Can anyone guess what's going on here? If you guessed this calculates the sample standard deviation, you're spot on. We simply let xb be the average (calculated by summing the values, then dividing by the length), then we "fold" over the values, summing the squares of x - xb together, divide by N - 1, and then take the square root.
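For comparison, a sketch of the same sample standard deviation written step by step (sampleStdDev is my own name for it), using List.average and List.sumBy from FSharp.Core instead of operator tricks:

```fsharp
let sampleStdDev (values : float list) : float =
    let n = float values.Length
    // The mean, i.e. xb in the version above.
    let mean = List.average values
    // Sum of the squared distances from the mean.
    let sumOfSquares = values |> List.sumBy (fun x -> (x - mean) * (x - mean))
    // Divide by N - 1 (the "sample" part), then take the square root.
    sqrt (sumOfSquares / (n - 1.0))
```

For [2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0] the mean is 5 and the squared deviations sum to 32, so the result is sqrt (32 / 7), roughly 2.14.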

You should have been able to guess that the pipe-left (<|) operator does what the pipe-right operator does, but in reverse. Instead of supplying the left side as the last argument to the function on the right, it supplies the right side as the argument to the function on the left. Do note that nothing is special about the division (/) operator; we can actually treat operators as functions in F#. That is the biggest take-away from this.
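A short sketch of both ideas, operators as functions and the two pipes (tenDividedBy is a made-up name):

```fsharp
// Wrapping an operator in parentheses lets us call it like any other function.
let quotient = (/) 10.0 4.0               // same as 10.0 / 4.0, i.e. 2.5

// Partially applying (/) gives a function that divides 10.0 by its input.
let tenDividedBy = (/) 10.0

// Pipe-right feeds the left value into the function on the right...
let viaPipeRight = 4.0 |> tenDividedBy    // 2.5

// ...and pipe-left feeds the right value into the function on the left.
let viaPipeLeft = tenDividedBy <| 4.0     // 2.5
```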

We'll also notice this function has a type-definition of float list -> float.

Building an order of elements

I'm going to keep on the boring train for the rest of this section, since we're already there. (My rambling doesn't help.) We're going to discuss creating a list/array/sequence. (All three of these are different ways to represent a group of values that share a type.)

The most basic of these types is the array, whose type can be written either as int array or as int[] (for an array of integers); both are valid ways to say "array". Arrays are one of the most basic "collection" structures. Essentially, you declare an array (of a fixed size) and then set elements in it by "index" (0-based). In F# we access an array element by .[#], where # is the index (with the first being index 0) of the element we want.

Think of an array as a crayon box: you have a specific number of crayons (each of a different color) which you can access by "index" (or position). Simple, right? We could even represent a crayon box in F#:

let crayons : System.Drawing.Color array = ...

Pretty simple. You can construct an array of elements by wrapping them in [|...|], and separating each element by a semicolon. ([|System.Drawing.Color.Black; System.Drawing.Color.Gray; ...|]).
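A sketch of the crayon box as an F# array. To keep it dependency-free, this assumes plain colour names as strings rather than System.Drawing.Color values:

```fsharp
// A crayon box with a fixed number of slots.
let crayons = [| "Black"; "Gray"; "Red"; "Blue" |]

// Indexing is 0-based, so index 0 is the first crayon.
let firstCrayon = crayons.[0]      // "Black"

// The length is fixed, but we can swap out the crayon in a slot.
crayons.[1] <- "Green"

let crayonCount = crayons.Length   // 4
```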

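Next is a list (for example, string list). Unlike an array, an F# list is immutable: it's a singly linked list, so rather than resizing it in place you build new lists from existing ones (for example, by putting an element on the front with ::). If you really do want an array that can be resized, that's .NET's System.Collections.Generic.List<'T>, which F# also calls ResizeArray. The list syntax is [...], with the same semicolon separator.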
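A short sketch of working with an F# list (the colour names are just sample data); note that adding an element builds a new list and leaves the original one untouched:

```fsharp
let primaryColours = [ "Red"; "Yellow"; "Blue" ]

// :: puts an element on the front, producing a brand-new list.
let withGreen = "Green" :: primaryColours

// Lists also support 0-based indexing, by walking from the front.
let secondColour = withGreen.[1]   // "Red"
```

After this runs, primaryColours still has three elements and withGreen has four.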

Finally, a sequence. This is a very special collection in that it's lazy. The syntax is seq {...}: inside the braces you yield the values you want (for example, seq { yield 1; yield 2 }, or a range such as seq { 1 .. 10 }). The thing about a sequence (or seq) is that it doesn't compute all of its values up front; it only computes each one when you iterate over it (that is, ask for the next value). While a list and an array both allow you to index a value, a sequence does not. Remember this, because we'll need it later.
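A sketch that makes the laziness visible (squares and computed are hypothetical names): the sequence describes a million squares, but a counter shows that only the ones we actually ask for are ever computed.

```fsharp
// A ref cell we can update from inside the sequence expression.
let computed = ref 0

let squares =
    seq {
        for i in 1 .. 1000000 do
            // This side effect only runs when this element is requested.
            computed.Value <- computed.Value + 1
            yield i * i
    }

// Nothing has been computed yet; taking three elements computes exactly three.
let firstThree = squares |> Seq.take 3 |> List.ofSeq   // [1; 4; 9]
```

Checking computed.Value afterwards gives 3, not 1000000.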


I'm going to stop here, and tomorrow I'll pick up where we left off: we'll go over a couple of business-domain problems and learn how to build software to represent them. I'll be doing a lot of "here's how you solve the problem, each step of the code" and not a lot of "this is what _ means", so I expect you to research the different operations that we use, and to try to use some critical thinking to push yourself to learn what's going on.

I do realize that this was probably the worst possible writing ever by me, but it's been a while since I've done one of these and I tried to just cover what I could as I thought of it. The next few sections of the series will get a lot more informative and organized.

Phishing: how do we identify it?

Holy phishing batman

Note: this will be part of a longer series on identifying, preventing, and remedying issues related to phishing attempts.

I don't usually post about issues like this, but this is a huge problem that can definitely ruin people's lives, so I think it's important to get more information on the topic out there.


So recently I tweeted about two very obscure phishing scams I was sent:

Recently I got two very good-looking phishing emails, annotated to help future potential victims identify them and prevent scams.

It was obvious to me that they were scams: the tell-tale signs in each email immediately told me they were not legitimate emails from either of these organizations. (I happen to have accounts with both of them, which made identification slightly easier.)

We're in the year 2017, and if anyone tells you email or computer systems are dying then they're absolutely wrong. We're entering an age where the internet is becoming a requirement for everyday life, and I want to talk about one of the biggest problems with email: phishing.

What is phishing?

When I went to college (and even before then) we studied a term called 'social engineering', that is, the idea that an attacker will not attempt to hack/crack/break into your system through technical means, but will convince you to do it for them. This is the broader category that phishing fits into.

Phishing is the act of an attacker tricking a victim (usually an end user) into handing over sensitive information through faked online forms, websites, or emails, in a way that leaves the victim completely unaware they've just given that information away. The most common target (in my experience) is banking information.

With all this in mind, I want to help you avoid becoming a victim. There isn't a true "guide" for how you can stop yourself from becoming a victim, but there are steps you can take. For this demonstration I'm going to use actual phishing emails I was sent: real emails you may receive, and how you can identify them and avoid becoming a victim.

Do note: I'm not even going to talk about the technical aspects of these emails, this guide will be an end-user guide, for the people these emails are designed to victimize.

What does phishing look like?

To see what phishing looks like we're going to use two very real examples, which I've already annotated with some of the tell-tale signs of a phishing email (any one of these signs on its own isn't necessarily an indicator, but all of them together add up quickly).

Screenshot 1 - Wells Fargo Phishing Attempt

Screenshot 2 - Capital One Phishing Attempt

As you can see, I drew a lot of red. Let's talk about this section by section to explore how we can apply this more broadly.


Breaking down the Wells Fargo email

The first image is from 'Wells Fargo' (or someone who wants you to think they're Wells Fargo). We can see an issue with this right off the bat: the 'From' address. I happen to be a Wells Fargo customer (funnily enough, the address this phishing scam was sent to is not my account email), and all the emails I get from them come from a specific address: Wells Fargo Online <alerts@notify.wellsfargo.com>. So of course that was the first big red flag for me. However, let's assume you don't know that the usual alert address is alerts@notify.wellsfargo.com; the address itself still has one fatal flaw that should give you at least a yellow flag:

wellsfargo_notification_alerts@wellsfargo.com

Generally a bank won't send you an email from bankname_notification_alerts@bankname.com; that's not their usual M.O. Typically the address is just notifications@bankname.com or alerts@bankname.com.

So let's assume the sender address isn't a problem. We'll move on to the 'To' line. They sent this email To: Recipients. If you're using Outlook you can expand the contact card, and you'll find that the Recipients email address is wellsfargo_notification_alerts@wellsfargo.com. So the attacker sent the email to themselves, and they must have blind-carbon-copied us on it. That immediately tells me it's a phishing attempt. Why would my bank send an email to itself and blind copy me in?

If that wasn't obvious enough, and for some reason you're still in the yellow zone, we have this attachment named Wells Fargo Online Verification.htm.

For those of you who aren't technical users: an .htm document is the same thing as an .html document, which is basically a webpage. (It is a webpage, but it's not on the web at this point; it's on your PC.) This type of document can easily be processed by a web browser on your computer (Internet Explorer, Mozilla Firefox, Apple Safari, Opera, Microsoft Edge, Google Chrome) and can do many very cool things, and also many very bad things.

The problem is that most people aren't sure what to do with these documents, so organizations rarely send them to end users. Generally you can just double-click the document and it will open in your default browser.

Professional organizations don't usually send you these types of documents. When this type of information needs to be sent, you usually get one of two things: a link to a web page or a PDF file. Why? Because most people know what to do with both of those. They know to click a link, and they know to download/save/print a PDF.

The problem with HTML files is that they can contain malicious data. (There are more technical terms for it, but I'll save you the hassle.) They can install files on your PC, they can change settings (in some cases), and they can make you think you're going to a legitimate banking website.

We're not going to open it yet, we're going to mark it as a red flag and continue with the email.

The next thing we see that's a moderately yellow flag is the email body. There is a lot of text in this body that rubs me the wrong way.

We recently reviewed your account, and we are suspecting that your Wells Fargo account may have been accessed from an unauthorized computer.

This may be due to changes in your IP address or location. Protecting the security of your account and of the Wells Fargo network is our primary concern.

We are asking you to immediately login and report any unauthorized withdrawals, and check your account profile to make sure no changes have been made.

This alone isn't a bad phrase, but when it's combined with the next phrase it becomes more disturbing:

To protect your account please follow the instructions below:

  • LOG OFF AFTER USING YOUR ONLINE ACCOUNT

If it were truly a security concern, the bank wouldn't just recommend logging off after you finish using your account. They would also recommend changing your credentials. If your account is being accessed by another PC/user then logging off will not fix it. (In almost all cases, this is true.) So, if the sender were truly concerned about your information, as a bank would be, the recommendation would be to change your password.

Please Download the Attachment file of your Wells Fargo Online Verification and Open on on a browser to complete your account verification process:

Verify the information you entered is correct.

We apologize for any inconvenience this may cause, and appreciate your support in helping us maintaining the integrity of the entire Wells Fargo System. Please verify your account as soon as possible.

Why would I need to download an attachment to login? If I simply need to login to my account, send me a URL/address.

Lastly:

Copyright © 1999 - 2016 Wells Fargo. All rights reserved..

No self-respecting bank would leave that double-period typo in an email; bank emails go through a rigorous approval process. That typo alone isn't proof of a phishing attempt, but when added to the rest of the email: 'Mark as Spam'.


Breaking down the Capital One attempt

So we just broke the Wells Fargo attempt down, let's do the same for the Capital One phishing attempt.

I have an account with Capital One, and I have seen three distinct email addresses from them: Capital One <capitalone@service.capitalone.com>, Capital One <capitalone@notification.capitalone.com>, and Capital One <capitalone@email.capitalone.com>.

I've also seen a fourth one, but the address itself is slightly disconcerting: Capital One <capitalone@capitaloneemail.com>. That's a really bad email for a bank.

We do at least see a pattern: emails from them are generally Capital One <capitalone@somedomain.capitalone.com>, which is a mostly good thing. This means we can write off Capital One <notifications.alerts@capitalone.com> as a non-legitimate email.

Next we have the same problem as the Wells Fargo email: Recipients and the .htm attachment. We'll skip those since we talked about them above.

We get to the body, and we run into a few things that are fairly alarming:

It has come to our attention that your Billing Information records are recently changed.

That's grammatically wrong; 'records have recently changed' would be correct.

That requires you to verify your Billing Information. Failure to validate your billing information may result to account termination.

Capital One isn't going to terminate my auto loan over this; they would call me and verify it first. Even better: 'may result to'? Bad grammar again.

To verify your billing information, Please Download Attachment and open in a browser to Continue. We value your privacy and your preferences...

Why are 'Please Download Attachment' and 'Continue' capitalized on the first letter of each word (title case)? That's not normal. As well: 'value your privacy and your preferences'? What does that even mean? And then the three dots (ellipsis)? This is atrocious.

Failure to abide by these instructions may subject you to Capital One account restrictions or inactivity.

That's not what you said above.

TM and copyright © 2017 Capital One Inc. 1 Infinite Loop, MS 96-DM, Cupertino, CA 95015.

For those who don't know, that's Apple's address.


Summary

Overall, I hope this helps you (and your friends, family, coworkers and loved ones) identify phishing scams that look legitimate, and avoid falling victim to the tactics these attackers use to steal your information. In a future blog post I'll talk about why it's important to identify them, and how they steal your information.

Demonstrating Insecurity of Managed Windows Program Memory

Is Memory in a Managed Windows Program Secure?

Recently I was on one of the (many) Stack Exchange sites answering a question a user posted, like usual. This question was a bit different though: the asker was concerned about the best way to make sure people couldn't read the password the user entered out of memory.

Unfortunately, this is not a problem that can really be solved on consumer devices. Anyone with enough knowledge (and it's not really a lot) can do it. Today I'm going to demonstrate how, with a simple Visual Studio programme.

Essentially, what I'm going to do is 'connect' to a fake SQL server (nothing about it will really exist) and then demonstrate how one can (with a copy of Visual Studio) extract the entire Connection String of that SQL connection out. It's actually quite trivial and with enough practice can be done in seconds.

Of course there are other ways to do this: it can be done programmatically, there is other software for it, etc. I'm just going to demonstrate how any developer can do it with the tools (s)he has at their disposal.

Creating our test projects

So the first step is to create a test project we can use to 'attack'. We're going to consider this an attacker/victim scenario, since that's what one of the real world applications is.

Our code is going to be pretty simple:

using Evbpc.Framework.Utilities.Prompting;
using System;
using System.Data.SqlClient;

namespace VictimApplication
{
    class Program
    {
        static void Main(string[] args)
        {
            var consolePrompt = new ConsolePrompt(null);

            var connectionString = new SqlConnectionStringBuilder();
            connectionString.DataSource = consolePrompt.Prompt<string>("Enter the SQL server hostname/ip", PromptOptions.Required);
            connectionString.UserID = consolePrompt.Prompt<string>("Enter the SQL server user id", PromptOptions.Required);
            connectionString.Password = consolePrompt.Prompt<string>("Enter the SQL server password", PromptOptions.Required);
            connectionString.InitialCatalog = consolePrompt.Prompt<string>("Enter the SQL server database", PromptOptions.Required);

            using (var sqlConnection = new SqlConnection(connectionString.ToString()))
            {
                try
                {
                    Console.WriteLine("Connecting...");
                    sqlConnection.Open();

                    using (var command = new SqlCommand("SELECT 15", sqlConnection))
                    {
                        Console.WriteLine($"Command output: {command.ExecuteScalar()}");
                    }
                }
                catch
                {
                    Console.WriteLine("Could not establish a connection to the server.");
                }
            }

            Console.WriteLine("Press enter to exit.");
            Console.ReadLine();
        }
    }
}

Do note this uses my ConsolePrompt from GitHub.

So we have our victim application, now we'll go ahead and attack it.

Attaching Visual Studio to a running application

You're probably expecting a title like 'attacking an application with Visual Studio' but that's not as descriptive as what we're doing. Yes, this is how you attack it, but attack sounds nefarious. We're not doing anything nefarious, we're just attaching a debugger to a running application.

So we're going to open a new instance of Visual Studio, and not open or create a project. Just open the instance.

Screenshot 1 - Fresh Visual Studio Instance

So, we've opened Visual Studio (I'm using 2015 but this should work on 2010+). The next thing we'll do is launch our application.

Screenshot 2 - Launch Application Outside Debugger

Right, so we have the application running outside the debugger. No other instances of Visual Studio need to be open, nothing else needs to be running, just that application and our fresh instance. The next step is to attach the debugger to a process.

This is under Debug -> Attach to Process. You should see a new window open, and we want to find our 'Victim Application' (VictimApplication.exe).

Screenshot 3 - Attach to Process

We'll go ahead and attach it. Our screen should change to look like we're in a regular debug session, even though we didn't launch the program through Visual Studio.

Screenshot 4 - Debug Session is Green err Blue

Now we still have our other window open with our running application in it. All we have to do next is start checking it out and see what we can inspect.

This next part isn't required, but it should help you familiarize yourself with what we're going to do. Let's hit the 'Break All' button (CTRL + ALT + Break with default shortcuts).

Screenshot 5 - Break mode

As of this moment the program is paused. Since it's a console application, you can still type into it, but your typing will not be processed by the program at this point.

Screenshot 6 - Text not handled by program

Next we'll hit 'Show Diagnostic Tools' and then select the 'Memory Usage' tab. Once we have done that, we'll hit 'Take Snapshot'.

Screenshot 7 - Taking our first memory snapshot

So now we're at the point we can start inspecting objects in our program. The first thing we'll want to do is click the blue 429 (your number may vary) link to the list of objects.

We'll then sort them by name since we're not concerned about the count, we just want to look through them.

I'm going to inspect our ConsolePrompt as an example, which in this case is listed as Evbpc.Framework.Utilities.Prompting.ConsolePrompt. When you find an object you want to inspect, hover over it and you should see an icon that looks like a square grid with a circular shape on the top-left corner. Click that and a new page should open.

Screenshot 8 - Selecting an object to inspect

We'll then see a new page with all the instances of that object listed. If you hover over the Value, you should get a tool-tip that has a breakdown of the object itself, which you can explore just like normal. We'll see that the Logger is in fact null like we wanted.

Screenshot 9 - Exploring our object

Now that we've played with our explorer, we can go ahead and close that breakdown and continue with our program. We'll hit 'Continue' and resume execution. If you typed a server host into the console, you'll see as soon as we hit continue that the program continues to the next step. We'll fill out all our requirements and then break our program again when it starts connecting, and take another memory snapshot.

Screenshot 10 - Connecting to our server

We see that the new snapshot has 3,825 objects allocated, and the difference is an increase of 3,396. Our graph shows that we allocated a lot more memory (relatively speaking) and we can now go ahead and inspect our snapshot to try to find our password. We'll be looking for a string type with a value of pass.

We know it'll be part of the SqlConnection, so we'll sort that by name and then go down to SqlConnection and explore it like before.

Screenshot 11 - Find our SqlConnection

Upon exploring it, we'll use a different method of extracting our string. Click 'Referenced Objects' at the bottom of our window, and hover over the middle String object. (Mine is 0x2FB0BD8.)

Screenshot 12 - Extracting our Connection String

And there we have it. We have successfully extracted our password using a separate Visual Studio instance, while the original application kept running on its own.

Debug Symbols and why they are important

Of course, our demonstration was made slightly easier by the inclusion of the .pdb files (debug symbols). Usually you won't have access to these for a running application, so you'll sometimes have to look a little harder to find what you're looking for.

If you don't know what Debug Symbols are, Wikipedia has a nice description. Essentially, the pdb file ('Program Database') is the symbol file for .NET programs. It maps the compiled code back to its source, recording details such as which source line each instruction came from and the names of local variables.

Finding our String without SqlConnection

The last thing we'll do is find our string value without exploring the SqlConnection object. This time we'll work only from the Diff list.

So, we'll restart our application, then attach the debugger, then enter our host and user, then take a memory snapshot like we did earlier.

Screenshot 13 - Round 2 First Snapshot

Then we'll hit 'Continue', enter our password, and take another snapshot.

Screenshot 14 - Round 2 Second Snapshot

The next step is to disable 'Just My Code' in the filter. If we don't do this it becomes much more difficult to locate what we changed.

Screenshot 15 - Round 2 Disable Just My Code

The Count Diff. of +1 on the String type tells us that exactly one new string was created, which helps us narrow down what we're looking for. If we click once on it and view its 'Paths to Root', we discover +1 in String [Local Variable], so we're in the right place.

Screenshot 16 - Round 2 String Local Variable

We'll inspect the String like before (the square icon with the round outset). By default the list is sorted by 'Inclusive Size (Bytes)', so we'll sort it by 'Instance' instead. Theoretically, our password should be the last instance listed, and if we scroll to the bottom of the list we see that, indeed, it is.

We also see that our user id is right above it.

Screenshot 17 - Round 2 Find our Password


And there we have it! We learned how to inspect objects in our program when it was launched outside Visual Studio by attaching a debug instance of Visual Studio to it.

Optimizing for Tail-Call Recursion in F#

Rewriting for Tail-Call Recursion in F#

While this post is written in F# and specifically for it (referencing ILDASM output), the principles and practices here can be applied to any functional (or non-functional) language that supports tail-call recursion and provides optimizations for it.

Recently I was working on a project, and I had to extend the F# List type to make things a bit simpler. I wrote a method takeThrough that allowed me to provide a predicate. Much like takeWhile, it would return elements until the predicate returned false, and then it would return one more element. This was important for the code I was writing: I needed to return everything up to and including the first element that caused the predicate to return false.

So, I wrote this method called takeThrough:

let takeThrough(predicate)(source) =
    let rec loop sourceTemp =
        let head = sourceTemp |> List.head
        if head |> predicate = true then
            head :: (sourceTemp |> List.tail |> loop)
        else
            [head]
    loop source

The problem with this method is that it cannot be optimized for tail-call recursion in its current state.

What is Tail-Call Recursion?

In order to understand why we're talking about optimizing for tail-call recursion here, we first have to understand: what is tail-call recursion?

In functional languages such as F# things like loops are discouraged, and in some of them even unavailable completely. So, in order to avoid them we have to rewrite our methods for recursion of some form (usually).

If we were to take this same example in C# it might look something like:

IEnumerable<T> TakeThrough<T>(IEnumerable<T> source, Predicate<T> predicate)
{
    var continueLoop = true;

    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item;
            continueLoop = true;
        }
        else
        {
            yield return item;
            continueLoop = false;
        }

        if (!continueLoop)
        {
            break;
        }
    }
}

(This may not be the most optimal manner to write this method in, but it guarantees success.)

So we're just looping through each item here and returning it as we go. With F# we don't want to use such a construct, since we want to avoid loops, so we turn to recursion. In C#, the F# code we wrote above might look more like:

IEnumerable<T> TakeThrough<T>(IEnumerable<T> source, Predicate<T> predicate)
{
    return _loop(source, predicate);
}

IEnumerable<T> _loop<T>(IEnumerable<T> sourceTemp, Predicate<T> predicate)
{
    var head = sourceTemp.First();

    if (predicate(head))
    {
        var result = new List<T>();
        result.Add(head);
        result.AddRange(_loop(sourceTemp.Skip(1), predicate));
        return result;
    }
    else
    {
        return new List<T> { head };
    }
}

It's almost verbatim identical to the F# version. The problem we can see here is that a StackOverflowException will be thrown if source is long enough and predicate keeps returning true for long enough, because every recursive call adds another frame to the stack. This is what we're hoping to avoid with tail-call recursion.

Remember: in order for a method to be optimized for tail-call recursion, the recursive call has to be the last thing the method does.

Now you might look at that method and say "well, the last thing that happens is everything is piped to loop." Not quite. What's easy to miss is that head :: is the very last thing the method has to do.

This is an important note: loop is called first, and only then is its result handed to the cons (::) operator.
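To see what "the recursive call has to be the last thing the method does" means without the F# list machinery, here is a minimal C# sketch (the class and method names are hypothetical). Both methods sum the numbers 1 through n, but only the second makes its recursive call in tail position. Note that the C# compiler does not guarantee tail-call optimization, so this only illustrates the shape. With a large enough n, either version may still overflow the stack in C#.

```csharp
using System;

static class TailCallShapes
{
    // Not tail-recursive: after SumNonTail(n - 1) returns, we still have to add n,
    // so every call's stack frame has to stay alive until the recursion unwinds.
    public static long SumNonTail(int n)
    {
        if (n == 0)
        {
            return 0;
        }

        return n + SumNonTail(n - 1);
    }

    // Tail-recursive: the recursive call is the very last thing the method does.
    // The running total travels in the accumulator (acc) instead of on the stack.
    public static long SumTail(int n, long acc)
    {
        if (n == 0)
        {
            return acc;
        }

        return SumTail(n - 1, acc + n);
    }

    static void Main()
    {
        Console.WriteLine(SumNonTail(10)); // 55
        Console.WriteLine(SumTail(10, 0)); // 55
    }
}
```

The extra acc parameter is the same accumulator idea that the rest of this post builds toward for takeThrough.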

The if/else is ugly too

Of course the other problem is the if/else construct, but that can be fixed with a match head |> predicate with and then match to each boolean value (true and false).

Right, so that's simple. Easy fix:

match head |> predicate with
| true -> head :: (sourceTemp |> List.tail |> loop)
| false -> [head]

Great. We solved the easy idiomatic issue, but how in the world do we make it tail-call recursive?

Visualizing our tail-call recursion

The first thing we have to do is determine how we can write the structure of this method so that the loop call is the last thing to happen. Ignore what it does for now. We just want to know what it would have to look like. We need a visual.

let takeThrough predicate list =
    let rec loop ... =
        ...
        loop ...
    loop ...

So we know what it should look like-ish. That's a very good start. Now we have to figure out how we can get it to that state.

So we know that our method needs to match each item with a predicate, and then return a List of all the elements that matched and the next element. So we need to accumulate a list of elements.

Notice I bolded accumulate. We need a variable in our loop that is an accumulator in this case.

Now we know our visual needs to change:

let takeThrough predicate list =
    let rec loop acc ... =
        ...
        loop newAcc ...
    loop [] ...

This looks about right. Our acc will be a List, since that's what we're building. Each time we iterate we'll pass newAcc into the next call to loop, and before we get started we'll pass an empty list to loop as the initial accumulator.

Creating our tail-call recursion

So now that we've visualized it, we can start to write the final pieces of it.

We'll start at the final line: loop [] .... What do we know about this call? We know that we only have two parameters in the method (predicate and list), and one variable (well, a constant: loop is a function, but it will look like a variable). And that's all we need. So we'll pass our initial list to our loop, because it's the only thing we have to pass.

let takeThrough predicate list =
    let rec loop acc ... =
        ...
        loop newAcc ...
    loop [] list

Now our definition for loop has to change:

let takeThrough predicate list =
    let rec loop acc listTemp =
        ...
        loop newAcc newListTemp
    loop [] list

Alright, great progress. We just have to apply our operations now. In our case, newAcc will be the appended list, and newListTemp will be listTemp stripped of its first item. Let's get the logic for head in there and work from that.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> ... loop newAcc newList
        | false -> ...
    loop [] list

Perfect! We're almost done, getting newAcc and newList are both easy: newAcc is just List.append acc [head], and newList is just listTemp |> List.tail.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> loop (List.append acc [head]) (listTemp |> List.tail)
        | false -> ...
    loop [] list

The last issue is our false condition: what do we do here?

Simple: we just return what newAcc would have been.

let takeThrough predicate list =
    let rec loop acc listTemp =
        let head = listTemp |> List.head
        match head |> predicate with
        | true -> loop (List.append acc [head]) (listTemp |> List.tail)
        | false -> List.append acc [head]
    loop [] list

And we've achieved our goal of tail-call recursion. The very last thing in loop is a call to itself. (Remember that in this case the match is the last expression, and then one of two things happens: we call loop, or we return the finished list.)

Verifying with ILDASM

If we look at the IL for this method, we'll see the following:

.method assembly static class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> 
        loop@4<T>(class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!T,bool> predicate,
                  class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> acc,
                  class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T> sourceTemp) cil managed
{
  // Code size       69 (0x45)
  .maxstack  6
  .locals init ([0] !!T head)
  IL_0000:  nop
  IL_0001:  ldarg.2
  IL_0002:  call       !!0 [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Head<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0007:  stloc.0
  IL_0008:  ldarg.0
  IL_0009:  ldloc.0
  IL_000a:  callvirt   instance !1 class [FSharp.Core]Microsoft.FSharp.Core.FSharpFunc`2<!!T,bool>::Invoke(!0)
  IL_000f:  brfalse.s  IL_0031
  IL_0011:  ldarg.0
  IL_0012:  ldarg.1
  IL_0013:  ldloc.0
  IL_0014:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::get_Empty()
  IL_0019:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::Cons(!0,
                                                                                                                                                                class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0>)
  IL_001e:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::op_Append<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>,
                                                                                                                                                      class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0023:  ldarg.2
  IL_0024:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Collections.ListModule::Tail<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0029:  starg.s    sourceTemp
  IL_002b:  starg.s    acc
  IL_002d:  starg.s    predicate
  IL_002f:  br.s       IL_0000
  IL_0031:  ldarg.1
  IL_0032:  ldloc.0
  IL_0033:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::get_Empty()
  IL_0038:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0> class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!T>::Cons(!0,
                                                                                                                                                                class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!0>)
  IL_003d:  tail.
  IL_003f:  call       class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0> [FSharp.Core]Microsoft.FSharp.Core.Operators::op_Append<!!0>(class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>,
                                                                                                                                                      class [FSharp.Core]Microsoft.FSharp.Collections.FSharpList`1<!!0>)
  IL_0044:  ret
} // end of method List::loop@4

We're only concerned with our recursion:

IL_0000:  nop
...
IL_000f:  brfalse.s  IL_0031
...
IL_002f:  br.s       IL_0000
IL_0031:  ldarg.1
...
IL_003d:  tail.

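Just as we hoped, the recursive call is gone. The true branch overwrites the arguments (starg.s) and jumps back to the beginning of the method (br.s IL_0000) instead of calling it again. The tail. prefix at IL_003d belongs to the final op_Append call in the false branch, not to the recursion.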
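To make the IL above a little more concrete, here is a hand-written C# sketch of roughly what the compiler turned loop into: a single loop that "reassigns its arguments" and jumps back to the top. The class name, the IReadOnlyList<T> parameter, and the index standing in for sourceTemp are assumptions made for illustration. This is not what the F# compiler literally emits. There is one deliberate difference: the sketch returns the accumulator when the input runs out, whereas the F# version calls List.head on an empty list and throws.

```csharp
using System;
using System.Collections.Generic;

static class TailCallAsLoop
{
    // Hand-written approximation of the compiled loop@4: instead of calling itself,
    // it updates its state and jumps back to the top (starg.s ... br.s IL_0000).
    public static List<T> TakeThrough<T>(Func<T, bool> predicate, IReadOnlyList<T> source)
    {
        var acc = new List<T>(); // stands in for acc
        var position = 0;        // stands in for sourceTemp: everything from here on is "the tail"

        while (true)             // IL_0000 ... br.s IL_0000
        {
            // Assumption: stop when the input runs out. The F# version would call
            // List.head on an empty list here and throw instead.
            if (position >= source.Count)
            {
                return acc;
            }

            var head = source[position];
            acc.Add(head);         // both branches append head to the accumulator

            if (!predicate(head))  // brfalse.s IL_0031
            {
                return acc;        // false branch: return the finished list
            }

            position++;            // true branch: "List.tail", then loop again
        }
    }

    static void Main()
    {
        var result = TakeThrough(x => x < 3, new List<int> { 1, 2, 3, 4, 5 });
        Console.WriteLine(string.Join(", ", result)); // 1, 2, 3
    }
}
```

The sketch has a side benefit. List<T>.Add is amortized O(1), while List.append acc [head] copies acc on every iteration, so the F# version does quadratic work on long inputs. In F#, a common alternative is to build the accumulator with head :: acc and reverse it once at the end with List.rev.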

Functionally replacing if in F#

Eliminating if from F#

Recently I was asked by a colleague if there were a better way to write a specific method this colleague was using. It was a simple method which called a couple other methods and returned a value from them. Essentially, if two conditions met specific criteria, call one of four other methods. Oh, and it was in F#.

Naturally, as C-style programmers it's easy for us to use if or switch to do what we want, but for some reason when we look at functional languages we cannot seem to reason how we should replace these two constructs with match. It must be trivial, right? We must be missing some silly detail. That's not entirely true. We're not missing anything trivial, we're just not being creative enough.

Functional languages like F# have the advantage of being very explicit about what's going on. They're also great at implicitly typing things, and making a function read as a mathematical expression. I bolded that for a reason: if we begin to look at our code as a mathematical expression instead of code, we will hopefully see what we're missing.

Let's look at a much reduced sample of the code we were working with:

type ThingType = 
    Left = 0
    | Right = 1

member private this.methodLeftOne =
    true

member private this.methodRightOne =
    false

member private this.methodLeftTwo =
    false

member private this.methodRightTwo =
    true

member this.MatchAndIf var thingType =
    match var with
    | 1 -> if thingType = ThingType.Left then this.methodLeftOne else this.methodRightOne
    | 2 -> if thingType = ThingType.Left then this.methodLeftTwo else this.methodRightTwo
    | _ -> false

My colleague was calling the MatchAndIf function which was to return a boolean value from the two parameters. The code in the other four methods was a bit more complex, but I've simplified it here so we can see how things will turn out.

So, we're looking at a pretty simple bit of code. If var and thingType are 1 and ThingType.Left respectively, return methodLeftOne. If they're 1 and any other ThingType value, return methodRightOne, and so on. Pretty easy to follow.

We have a slight inconsistency here, however. If thingType is set to a non-valid value (F# enums can hold any value of their underlying integer type, not just the named cases), then unexpected (well, unintended) things can happen. This is not ideal. Fixing it in this code would be a mess, because we would end up with if ... then ... else if ... then ... else .... Sure, that does what we want, but it's really ugly for F#.

Nested match

So, the first thing we might think of to rewrite it is to use a nested match. Alright, easy enough, replace the inner if with a match.

member this.NestedMatch var thingType =
    match var with
    | 1 ->
        match thingType with
        | ThingType.Left -> this.methodLeftOne
        | _ -> this.methodRightOne
    | 2 ->
        match thingType with
        | ThingType.Left -> this.methodLeftTwo
        | _ -> this.methodRightTwo
    | _ -> false

This is obviously more F#-like. It gives us a lot more peace-of-mind, right? But we didn't fix the issue above, so let's do that.

member this.NestedMatchFixed var thingType =
    match var with
    | 1 ->
        match thingType with
        | ThingType.Left -> this.methodLeftOne
        | ThingType.Right -> this.methodRightOne
        | _ -> false
    | 2 ->
        match thingType with
        | ThingType.Left -> this.methodLeftTwo
        | ThingType.Right -> this.methodRightTwo
        | _ -> false
    | _ -> false

Wait a minute, why do we need three default (_) cases? Once 1 or 2 is matched, the inner match never falls through to the outer default case. If we omit the inner default cases, F# won't implicitly return false. Instead, the compiler warns about an incomplete match, and an unmatched value throws a MatchFailureException at runtime. (That's not always a bad thing.)

Tuple match

Well, we might think to ourselves "I can just match on a Tuple instead." Indeed that's true, let's see how that looks.

member this.TupleMatch var thingType =
    match (var, thingType) with
    | (1, ThingType.Left) -> this.methodLeftOne
    | (1, ThingType.Right) -> this.methodRightOne
    | (2, ThingType.Left) -> this.methodLeftTwo
    | (2, ThingType.Right) -> this.methodRightTwo
    | _ -> false

Alright, that's not bad. We've gotten a lot closer to our goal. But now we have things knowing about things they shouldn't. The TupleMatch method does too many things inside it. It's looking for a var of 1 or 2 and a ThingType.

Finally Isolating Everything

The only other thing we can do to fix this (which I can tell you is the best option based on the context of what code I had) is to check thingType in our Match method, and pipe var to our methodLeft or methodRight method (whichever is appropriate).

member private this.methodLeft var =
    match var with
    | 1 -> true
    | _ -> false

member private this.methodRight var =
    match var with
    | 2 -> true
    | _ -> false

member this.FinalMatch var thingType =
    match thingType with
    | ThingType.Left -> var |> this.methodLeft
    | ThingType.Right -> var |> this.methodRight
    | _ -> false

Now each method is only responsible for checking and reporting the parts it cares about. We complied with SRP (the Single Responsibility Principle) and kept it entirely functional. No method is worried about what the next method down the chain is doing.

C# Expression-Bodied Members

Using Expression-Bodied Members in C#6.0

So it's been a while since I've written a blog post (far too long if you ask me). I'll not go into too much detail, but I've been busy doing some life things. I recently changed jobs, moved, and started working on a great deal of other projects that just consume all my time, and I haven't had time to share any of my thoughts recently.

With that said, I do have something I wish to talk about at the moment, and that is a new feature found in C#6.0: Expression-Bodied Members.

This feature, in my opinion, is one of the more useful features of the recent language update. It opens a great many doors and has shortened a lot of my code substantially, which is always good. (For some reason, we strive to keep the number of characters/lines in our source code to a minimum.) It lets us take certain methods and properties, which would otherwise have taken up a fair amount of space, and shrink them down to a much more manageable size.

However, before we talk about this feature, we need to talk about all the parts of it. So let's begin with the most important part of an expression-bodied member: the expression.

What is an expression?

We're going to take a definition from webopedia (see http://www.webopedia.com/TERM/E/expression.html) which should sum it up fairly well:

In programming, an expression is any legal combination of symbols that represents a value.

This is a pretty basic definition, but should work to serve our needs quite well. When we use the term expression we mean a fairly simple set of instructions that returns a value. Do note: I bolded "return a value" for a reason. If the set of instructions does not return a value, then it is not an expression, and cannot make up an expression-bodied member. (We'll see this isn't entirely true shortly.)

A few examples of what we mean by expressions when discussing them in C#:

2 + 5
x * 4
"EBrown"
name ?? "No Name"
person.ToString()

The last one might surprise you. Yes, calling .ToString() on something is an expression, because .ToString() returns a value. The second-to-last might surprise you as well, but it's pretty simple: if name is null, then the expression evaluates to "No Name", and otherwise it evaluates to name.

Do note: expressions don't have to return a non-null value. The person.ToString() call above could potentially return a null value, and that's perfectly OK. It's still an expression. The value itself has no bearing on the definition of the term, only whether or not something actually returns a valid value. This is an important concept to bear in mind, as not all of our expressions have non-null values.

So what do expressions usually look like in C#?

Usually, the series of tokens that follows a return keyword, up to the terminating semicolon (;), is an expression. So all of our examples above might look like:

return 2 + 5;
return x * 4;
return "EBrown";
return name ?? "No Name";
return person.ToString();

Mind you, expressions don't have to follow a return statement. Consider that 2+5 is still an expression in the following example:

var myString = "Some string " + (2 + 5).ToString() + " with numbers in it.";

But we also have several other expressions:

2 + 5
(2 + 5).ToString()
"Some string " + (2 + 5).ToString() + " with numbers in it."

All three of these are expressions in their own right, and all three of them also contain other expressions. This is an important concept to understand: in the expression 2 + 5, there are actually two other expressions, 2 and 5.
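One way to see the nesting is to pull each sub-expression out into its own variable. This short C# sketch uses variable names chosen only for illustration. It builds the same string as the example above, one expression at a time.

```csharp
using System;

static class NestedExpressions
{
    static void Main()
    {
        int two = 2;                    // the expression 2
        int five = 5;                   // the expression 5
        int sum = two + five;           // the expression 2 + 5
        string digits = sum.ToString(); // the expression (2 + 5).ToString()

        // The outermost expression, built from the pieces above.
        string myString = "Some string " + digits + " with numbers in it.";

        Console.WriteLine(myString); // Some string 7 with numbers in it.
    }
}
```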

What does an expression-bodied member look like, in C#?

In C# 6.0, there are two types of expression-bodied members (C# 7.0 later extended the syntax to constructors, finalizers, and property accessors):

  • Expression-bodied readonly properties;
  • Expression-bodied methods;

Both types of expression-bodied members in C# look something like the following:

member => expression;

It's very simple, you provide a member, the "lambda" sign (=>), and an expression. You can also provide the standard member modifiers in the member itself as well. (Access modifiers, attributes, etc.) It's a regular member, it just uses a different body syntax. You should note that there are no braces in play, it's just a member and expression.

An example of a C# expression-bodied member:

public override string ToString() => $"Name: {Name}";

Note that there is no return statement. An expression-bodied member always returns a value. (Except in the case of void methods.) That is why we just talked about expressions to such detail. We need something to return. Something to give back.

What about void methods?

You can still use an expression-bodied member in a void method. The expression simply has to be a void call, or an expression whose result can be discarded as a statement. The following code is completely valid C# 6.0:

public void MethodA() { }
public void MethodB() => MethodA(); // `MethodA()` is `void`, and `MethodB()` is `void`
public string MethodC() { return "MethodC"; }
public void MethodD() => MethodC(); // The `string` result of `MethodC()` is discarded

The following is invalid:

public void MethodA() { }
public string MethodB() => MethodA(); // The expression returns a `void` type, but a `string` is expected
public void MethodC() => "MethodC"; // `"MethodC"` is a `string` value, not a valid statement, so it cannot be discarded

An expression-bodied member with a return type can mentally be rewritten as:

member { return expression; }

An expression-bodied member with a void return type can mentally be rewritten as:

member { expression; }
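In practice, the "expression;" in that rewrite has to be something C# accepts as a standalone statement: a method call, an assignment, an increment or decrement, an await, or a new object creation. Here is a minimal sketch using a hypothetical Tally class. It only shows what compiles. Whether you should write void members this way is a separate question.

```csharp
using System;

class Tally
{
    private int count;

    public int Count => count;                       // non-void: returns the value

    public void Increment() => count++;              // increment: allowed
    public void Reset() => count = 0;                // assignment: allowed
    public void Print() => Console.WriteLine(count); // void method call: allowed
    public void Describe() => ToString();            // non-void call: the result is discarded
    // public void Broken() => count + 1;            // would not compile: not a valid statement

    static void Main()
    {
        var tally = new Tally();
        tally.Increment();
        tally.Increment();
        tally.Print(); // 2
        tally.Reset();
        tally.Print(); // 0
    }
}
```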

How do I use expression-bodied members?

It's pretty simple to use an expression-bodied member. You have two options: an expression-bodied readonly property, and an expression-bodied method. Both of these are trivial to use, the only difference is a minor issue in syntax.

Just like a normal readonly property, an expression-bodied readonly property only has a get accessor. The difference is syntax. As we may recall, a normal readonly property may look something like:

public double TotalPrice { get { return Quantity * Price; } }

To convert this to an expression-bodied member, we replace the whole { get { return ...; } } block with => followed by the expression:

public double TotalPrice => Quantity * Price;

This is a property bodied by an expression. You'll note that there are no parameters passed, so as far as other code is concerned it's treated just like a normal property, that only has a getter.

The only difference between a method and a property bodied by an expression is that a method has the requisite parentheses (and any parameters) in its definition.

public override string ToString() { return $"Price: {Price}, Quantity: {Quantity}"; }

As a method, could be rewritten as:

public override string ToString() => $"Price: {Price}, Quantity: {Quantity}";

As you can see, in both cases we omitted the return altogether. Properties and methods that specify a non-void return type implicitly return whatever the result of the expression is.
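Putting the two together, here is a small self-contained sketch. OrderLine is a hypothetical class built around the Quantity, Price, and TotalPrice members above. An expression-bodied property compiles to a get accessor, so TotalPrice is recomputed on every read rather than stored, as the usage in Main shows.

```csharp
using System;

class OrderLine
{
    public int Quantity { get; set; }
    public double Price { get; set; }

    // Expression-bodied readonly property: a get accessor whose body is the expression,
    // evaluated each time TotalPrice is read.
    public double TotalPrice => Quantity * Price;

    // Expression-bodied method: equivalent to { return $"..."; }
    public override string ToString() => $"Price: {Price}, Quantity: {Quantity}";

    static void Main()
    {
        var line = new OrderLine { Quantity = 2, Price = 3 };
        Console.WriteLine(line.TotalPrice); // 6

        line.Quantity = 4;
        Console.WriteLine(line.TotalPrice); // 12
        Console.WriteLine(line);            // Price: 3, Quantity: 4
    }
}
```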

A real life example of the benefits of expression-bodied members

For this I'm going to use a partial copy of a class I wrote for a C# library I'm working on. (I'm omitting all the comments and attributes, for brevity.)

This is (most of) a Rectangle class I wrote in a drawing library (for a clone of Windows Forms for XNA). This first version is the version without expression bodied members at all. You can find the most recent version at: https://github.com/EBrown8534/Framework/blob/master/Evbpc.Framework/Drawing/Rectangle.cs

public struct Rectangle
{
    public Rectangle(Point location, Size size)
    {
        Location = location;
        Size = size;
    }

    public int Bottom { get { return Location.Y + Size.Height; } }
    public bool IsEmpty { get { return this == Empty; } }
    public int Left { get { return Location.X; } }
    public Point Location { get; }
    public int Right { get { return Location.X + Size.Width; } }
    public Size Size { get; }
    public int Top { get { return Location.Y; } }

    public override bool Equals(object obj) { return obj is Rectangle && (Rectangle)obj == this; }
    public override int GetHashCode() { return base.GetHashCode(); }

    public override string ToString()
    {
        return $"({Location.X},{Location.Y},{Size.Width},{Size.Height})";
    }

    public static bool operator ==(Rectangle left, Rectangle right)
    {
        return left.Location == right.Location && left.Size == right.Size;
    }

    public static bool operator !=(Rectangle left, Rectangle right)
    {
        return left.Location != right.Location || left.Size != right.Size;
    }

    public static readonly Rectangle Empty = new Rectangle(0, 0, 0, 0);
}

Pretty simple, right? I'm not going to discuss any of the other C#6.0 features that I've used, just know that there are some.

Now, let's see what this looks like if we replace all the smaller methods with expressions.

public struct Rectangle
{
    public Rectangle(Point location, Size size)
    {
        Location = location;
        Size = size;
    }

    public int Bottom => Location.Y + Size.Height;
    public bool IsEmpty => this == Empty;
    public int Left => Location.X;
    public Point Location { get; }
    public int Right => Location.X + Size.Width;
    public Size Size { get; }
    public int Top => Location.Y;

    public override bool Equals(object obj) => obj is Rectangle && (Rectangle)obj == this;
    public override int GetHashCode() => base.GetHashCode();
    public override string ToString() => $"({Location.X},{Location.Y},{Size.Width},{Size.Height})";
    public static bool operator ==(Rectangle left, Rectangle right) => left.Location == right.Location && left.Size == right.Size;
    public static bool operator !=(Rectangle left, Rectangle right) => left.Location != right.Location || left.Size != right.Size;

    public static readonly Rectangle Empty = new Rectangle(0, 0, 0, 0);
}

A little cleaner, yes? Most of the properties are now shorter, and the multi-line methods and operators each collapse to a single line, so the whole struct takes up far less space. A lot of that clutter is now gone.

Limitations of Expression-Bodied Members

One of the major limitations of expression-bodied members in C# 6.0 is exception throwing. Exceptions cannot be thrown directly from an expression-bodied member. You can still do things that would throw exceptions, but you cannot actually throw anything, because throw ... is a statement rather than an expression. (C# 7.0 later lifted this restriction by adding throw expressions, so => throw new ...; is valid there.)

See this Stack Overflow question and answer for more information on this limitation.
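Within C# 6.0, the usual way around this is to call a method that throws, because a method call is an expression. Here is a minimal sketch, assuming a hypothetical Person class and a hypothetical Fail<T> helper. The helper is declared to return T so it can sit inside an expression such as ??, but it never actually returns.

```csharp
using System;

class Person
{
    private readonly string name;

    public Person(string name)
    {
        this.name = name;
    }

    // In C# 6.0, "=> name ?? throw ..." does not compile, but calling a
    // throwing helper does, because the call itself is an expression.
    public string Name => name ?? Fail<string>("Name was not set.");

    // Hypothetical helper: typed as returning T, but it always throws.
    private static T Fail<T>(string message)
    {
        throw new InvalidOperationException(message);
    }

    static void Main()
    {
        Console.WriteLine(new Person("EBrown").Name); // EBrown

        try
        {
            Console.WriteLine(new Person(null).Name);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message); // Name was not set.
        }
    }
}
```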

DO's and DON'Ts of Expression-Bodied Members

Here are a few of the general do's and don'ts I use when determining if I can use an expression-bodied member:

  • DO use expression-bodied members on non-auto-implemented readonly properties

    • This helps reduce clutter in code and makes the intention much more explicit. It allows future programmers to see that the property was meant to be explicitly readonly, and that a set clause should never appear for it.


  • DON'T use expression-bodied members on static readonly fields (Empty, etc.)

    • Any static readonly fields should be simple values, which should never change. By rewriting them as expression-bodied members, these simple fields are now properties, and as such slightly more overhead is attributed to them. (Especially in the case of Empty fields.)


  • DO use expression-bodied members on methods with simple return statements

    • Writing a method that consists of a single return statement as an expression-bodied method lets the programmer be completely explicit about the method's intent.


  • DON'T use expression-bodied members when the expression contains multiple ternary or null-coalescing operators

    • An expression-bodied member is fine when the expression contains one ternary operator, one null-coalescing operator, or one of each, but it should not be used once either operator appears more than once. That creates confusion and makes debugging the method much more difficult.

And the last one, which you may or may not want to adopt (I have):

  • DON'T use expression-bodied members on void methods, period

    • In the case of void methods, an expression-bodied method is misleading. It hints that something should be returned (as an expression usually produces a value) when, by design, nothing is. This creates confusion among developers.

SQL Server Datatypes: How to avoid VarChar

I've seen, time and time again, programmers make many of the same mistakes regarding their SQL datatypes, and one of them is to use VarChar for almost everything. I've seen it so many times that if I had a nickel for each time I saw it, well, let's just say my McLaren P1 would be yellow.

Why do people use VarChar so much?

Well, to be honest, it's easy. We, as people, are generally lazy, and it's easy to store anything in a VarChar(50), or worse, a VarChar(MAX)! Why is this a bad thing? For some data it isn't, but for other data it just isn't the best option. As developers and programmers, we almost always have a choice in how we store our data, and sometimes it's easy to make an inefficient one.

Let's take a solid example. I was over on Stack Overflow one day, and I noticed a developer doing something odd: the developer was storing an IP address (we'll assume the IPv4 address 192.168.0.1, which is a pretty common IP for default gateways in small home and office networks) in a VarChar or a Char field. I'm not sure of the exact type or length (the developer left out the DDL), but for the sake of argument let's assume it was the smallest length able to store any IPv4 address: a VarChar(15).

The developer, much like the rest of us, was trying to shrink the amount of data used. So the developer proposed that, instead of storing 1.1.1.1, we omit all the characters except the last two (in this example: .1) and keep only the fourth octet in the database. The downfall of this is quite obvious: we now have no way of distinguishing whether our value is 1.1.1.1, 2.2.2.1, 3.3.3.1 or any other address ending in .1. But there's a better way.
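To make the collision concrete, here is a small C sketch. `keep_last_octet` is a hypothetical helper (not code from the original question) that keeps only the text after the final dot, the way the proposal would; two different addresses end up with the same stored value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: keeps only the fourth octet (the text after the
   last '.'), mimicking the "store just the end of the address" proposal. */
static void keep_last_octet(const char *ip, char *out, size_t out_size)
{
    assert(ip != NULL && out != NULL && out_size > 0);
    const char *dot = strrchr(ip, '.');
    const char *tail = (dot != NULL) ? dot + 1 : ip;
    snprintf(out, out_size, "%s", tail); /* truncates safely if out is small */
}
```

Calling it on "1.1.1.1" and on "2.2.2.1" yields "1" both times, so once the value is stored there is no way back to the original address.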

Let's take a peek at what we know at this point:
  1. The data being stored is binary data;
  2. It's being stored in a string field;
  3. The maximum length on the string field is 15 characters.
This doesn't just apply to IP addresses; it also applies to hashes, encrypted data and other binary objects.

At first glance this might not seem so bad. The IP address as a string is 192.168.0.1. The maximum data size is 17 bytes, as the VarChar type takes one byte per character plus two bytes of overhead. By the same math, our specific address (11 characters) takes 13 bytes. The developer took the time to fit the data within the seemingly smallest datatype possible. But what did the developer forget?

First, we're trying to store binary data. The smallest way to store it as a string is in either hexadecimal or Base64 encoding. Let's assume we use hexadecimal (it really doesn't matter either way). The data is four bytes long, so we need eight hexadecimal characters: our example becomes 0xC0A80001, or for short, C0A80001. This alone cuts our maximum storage space to a little over half its original size (17 bytes down to 10), and our utilized space for this example from 13 bytes to 10. With one quick optimization we converted our 15-character string into an 8-character hexadecimal string. Since the length is now always eight, we can make another optimization and change the column to a Char(8). That removes the two bytes of overhead and leaves our example at a cool 8 bytes of storage space.
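A minimal C sketch of this first optimization, assuming well-formed dotted-quad input. `parse_ipv4`, `ipv4_to_hex` and `varchar_bytes` are illustrative names, and `varchar_bytes` simply applies the "length plus two bytes" VarChar rule used in this post.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parses "a.b.c.d" into four octets. Returns 1 on success, 0 otherwise.
   A sketch: sscanf is lenient about leading whitespace and '+' signs. */
static int parse_ipv4(const char *text, unsigned char octets[4])
{
    unsigned int a, b, c, d;
    char extra;
    assert(text != NULL && octets != NULL);
    /* The trailing %c must NOT match: any leftover character means junk. */
    if (sscanf(text, "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return 0;
    if (a > 255 || b > 255 || c > 255 || d > 255)
        return 0;
    octets[0] = (unsigned char)a;
    octets[1] = (unsigned char)b;
    octets[2] = (unsigned char)c;
    octets[3] = (unsigned char)d;
    return 1;
}

/* Writes the four octets as eight uppercase hex digits (plus the NUL). */
static void ipv4_to_hex(const unsigned char octets[4], char out[9])
{
    snprintf(out, 9, "%02X%02X%02X%02X",
             (unsigned)octets[0], (unsigned)octets[1],
             (unsigned)octets[2], (unsigned)octets[3]);
}

/* Bytes a VarChar value occupies under this post's model:
   one byte per character plus two bytes of overhead. */
static size_t varchar_bytes(const char *value)
{
    return strlen(value) + 2;
}
```

For 192.168.0.1 this produces C0A80001, and the VarChar size of the value drops from 13 bytes to 10; switching to a fixed Char(8) then drops the two bytes of overhead as well.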

But, we're forgetting one small thing: SQL Server (at least, Microsoft SQL Server) has a Binary type. Much like the Char type, the Binary type has a fixed size. The difference is that the Binary type can store raw byte data. It takes a length, just like the Char does, so in our case, it would be Binary(4) (to store four bytes for one IPv4 address). The binary type will only store the raw data for the address, so we're left with:
  1. Byte 1: 0xC0
  2. Byte 2: 0xA8
  3. Byte 3: 0x00
  4. Byte 4: 0x01
Microsoft SQL Server also has a VarBinary type which works just like the VarChar type. It supports the same size limits: 1-8000 or MAX. It also requires two bytes of overhead for each row, just like a VarChar type.

The nice thing about using a Binary type for this field is that it saves a significant amount of space. By optimizing this field, we've saved up to 11 bytes of character data per row (15 characters down to 4 bytes), before even counting the two bytes of VarChar overhead. How significant is that? If we had 500,000,000 rows, we'd have saved 5.5GB of data. (And for big-data applications, 500,000,000 rows is insignificant.)
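The arithmetic behind that figure, as a tiny C sketch. `bytes_saved` is an illustrative helper that multiplies a per-row saving by a row count, with an assert guarding against 64-bit overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes saved when every one of `rows` rows shrinks by
   `saved_per_row` bytes. */
static uint64_t bytes_saved(uint64_t rows, uint64_t saved_per_row)
{
    /* Refuse inputs whose product would not fit in 64 bits. */
    assert(saved_per_row == 0 || rows <= UINT64_MAX / saved_per_row);
    return rows * saved_per_row;
}
```

`bytes_saved(500000000, 11)` is 5,500,000,000 bytes, which is the 5.5GB figure in decimal gigabytes. Counting the two bytes of VarChar overhead as well (13 bytes per row in the worst case) raises it to 6.5GB.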

You might say, "well my application is small data, 500,000,000 rows is a pretty significant number, and 5.5GB for that many records is small." While that may be true, this is just one field we've optimized.

The DateTime example

Let's take another example: I've seen a lot of people use the VarChar type for date and time data as well, when it's completely unnecessary. SQL Server has several types for date and time data, the most useful being DateTime, DateTime2, and DateTimeOffset. Microsoft recommends that you no longer use DateTime for new work, as the DateTime2 and DateTimeOffset types align with the SQL standard and are more portable. DateTime2 and DateTimeOffset also offer better precision and a larger date range.

Why is this so important? You could just as easily store a date as a string in a VarChar field and parse it later. The problem is that you can't filter on certain criteria nearly as easily. With a DateTime2 field it's easy to filter for dates within a certain range, on a certain date, and so on; with any string type it's far less intuitive.

The other problem is less obvious: with a VarChar type, there is no validation done that guarantees the input string is a DateTime string. This means it's up to whatever logic you have manipulating the database to make this guarantee.
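Both problems are easy to reproduce outside the database. The C sketch below assumes dates are written as YYYY-MM-DD; `is_valid_iso_date` is a hypothetical check standing in for the validation a date-typed column performs for you (its year range mirrors DateTime2's years 1 through 9999).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if s is a real calendar date in YYYY-MM-DD form, else 0.
   A VarChar column performs no check like this; it accepts any text. */
static int is_valid_iso_date(const char *s)
{
    static const int days_in_month[12] =
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int y, m, d;
    char extra;

    assert(s != NULL);
    if (strlen(s) != 10 || s[4] != '-' || s[7] != '-')
        return 0;
    if (sscanf(s, "%4d-%2d-%2d%c", &y, &m, &d, &extra) != 3)
        return 0;
    if (y < 1 || y > 9999 || m < 1 || m > 12 || d < 1)
        return 0;

    int max_day = days_in_month[m - 1];
    int leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    if (m == 2 && leap)
        max_day = 29;
    return d <= max_day;
}
```

It rejects 2015-02-30 and 12/01/2015 alike. The filtering problem shows up with plain string comparison: `strcmp("12/01/2015", "02/03/2016")` is positive, so a string-based range filter would place December 2015 after February 2016.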

What about the NVarChar and NChar types?

I haven't discussed these so far because we were talking about binary data, which is almost always stored in some ASCII or raw form. These types (NVarChar and NChar) are the Unicode (UTF-16, specifically) variants of the VarChar and Char types, respectively. They take two bytes per character, with the variable-length type taking an extra two bytes of overhead. In our example, had the original field been an NVarChar(15), it would have taken up to 32 bytes (30 bytes for the 15 characters plus two bytes of overhead). The size for either type can be any integer from 1 to 4000, or MAX for NVarChar.
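The sizing rules quoted so far can be collected in one place. This C sketch encodes only the simplified model this post uses (fixed types cost their declared size, variable types cost their actual length plus two bytes, and the N types cost two bytes per character); the enum and function names are illustrative, and the rest of SQL Server's row format is ignored.

```c
#include <assert.h>

enum sql_type { SQL_CHAR, SQL_VARCHAR, SQL_NCHAR, SQL_NVARCHAR,
                SQL_BINARY, SQL_VARBINARY };

/* declared: the number in parentheses (characters or bytes).
   used: characters or bytes actually stored in this value. */
static unsigned storage_bytes(enum sql_type type, unsigned declared,
                              unsigned used)
{
    assert(used <= declared);
    switch (type) {
    case SQL_CHAR:      return declared;         /* fixed, 1 byte/char  */
    case SQL_BINARY:    return declared;         /* fixed, raw bytes    */
    case SQL_NCHAR:     return 2 * declared;     /* fixed, 2 bytes/char */
    case SQL_VARCHAR:   return used + 2;         /* data + overhead     */
    case SQL_VARBINARY: return used + 2;         /* data + overhead     */
    case SQL_NVARCHAR:  return 2 * used + 2;     /* 2 bytes/char + 2    */
    }
    return 0;
}
```

With it, a full NVarChar(15) costs 32 bytes, a full VarChar(15) 17, the 11-character address in a VarChar(15) 13, a Char(8) 8, and a Binary(4) 4.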

What do the numbers in parentheses represent?

Many fields have an optional size, precision or other parameter that determines the amount and form of data they can store. For all the fields we're discussing in this article, the number in parentheses is how many characters (for the Char, VarChar, NChar and NVarChar types) or how many bytes (for the Binary and VarBinary types) the field can store.

What are the VarChar, NVarChar and VarBinary types doing internally?

All three of these types work in a very specific way internally. The largest explicit size any of them can take is 8000 bytes, but what does that mean?

Internally, in Microsoft SQL Server, the variable length fields (which have the optional MAX specification) store data in one of two ways:
  1. For data that fits within 8000 bytes, the data is stored in-row;
  2. For data greater than 8000 bytes, the data is stored out-of-row and a pointer to the data is stored in-row;
This should help clarify what the server is doing and what the size specifications mean, and why I always cringe when I see VarChar(MAX) or NVarChar(MAX) in a situation that doesn't call for it.
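A simplified C model of that rule, assuming only the 8000-byte threshold matters. In practice SQL Server also weighs the 8,060-byte row size limit and the table's "large value types out of row" option, so treat this as an illustration rather than the engine's actual logic; `place_value` and `place_nvarchar` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum placement { IN_ROW, OUT_OF_ROW };

/* Values up to 8000 bytes stay in the row; larger ones move out of row,
   leaving a pointer behind in the row itself. */
static enum placement place_value(size_t value_bytes)
{
    return value_bytes <= 8000 ? IN_ROW : OUT_OF_ROW;
}

/* NVarChar(MAX) stores two bytes per character, so the threshold is
   reached at half as many characters (4000). */
static enum placement place_nvarchar(size_t characters)
{
    assert(characters <= SIZE_MAX / 2);
    return place_value(characters * 2);
}
```

The NVarChar helper shows why the N types top out at 4000 characters in their sized form: 4000 characters is already 8000 bytes.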

In summation:

As always: know your data, know your users, and most of all, know your environment.

Visual C++: Bug with constant arithmetic loops

I was working with Visual C++ for another article I'm preparing, and I noticed an odd bug with the const modifier in Visual C++.

The following code demonstrates the issue:

#include "stdafx.h"
#include <stdio.h>
#include <Windows.h>

#define ITERATIONS 500000
#define GET_START_TIME QueryPerformanceCounter(&StartingTime);
#define GET_END_TIME QueryPerformanceCounter(&EndingTime);
#define CALC_DIFF_TIME ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart; ElapsedMicroseconds.QuadPart *= 1000000; ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;

int main()
{
    short results[ITERATIONS];
    const int n = 5;
    int m = 5;
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;
	
    QueryPerformanceFrequency(&Frequency);

    // This loop seems to take about 1400 us on my computer.
    printf("Beginning loop over %i iterations with n constant.\n", ITERATIONS);

    GET_START_TIME;

    for (int i = 0; i < ITERATIONS; i++)
    {
        int statement = i % 10;

        if (statement == 0)
            results[i] = n * 0;
        else if (statement == 4)
            results[i] = n * 4;
        else if (statement == 2)
            results[i] = n * 2;
        else if (statement == 5)
            results[i] = n * 5;
        else if (statement == 7)
            results[i] = n * 7;
        else if (statement == 6)
            results[i] = n * 6;
        else if (statement == 1)
            results[i] = n * 1;
        else if (statement == 3)
            results[i] = n * 3;
        else if (statement == 9)
            results[i] = n * 9;
        else if (statement == 8)
            results[i] = n * 8;
    }

    GET_END_TIME;
    CALC_DIFF_TIME;

    printf("Finished in %lld us.\n", ElapsedMicroseconds.QuadPart);

    // This one takes about 800 us on my computer.
    printf("Beginning loop over %i iterations with m variable.\n", ITERATIONS);

    GET_START_TIME;

    for (int i = 0; i < ITERATIONS; i++)
    {
        int statement = i % 10;

        if (statement == 0)
            results[i] = m * 0;
        else if (statement == 4)
            results[i] = m * 4;
        else if (statement == 2)
            results[i] = m * 2;
        else if (statement == 5)
            results[i] = m * 5;
        else if (statement == 7)
            results[i] = m * 7;
        else if (statement == 6)
            results[i] = m * 6;
        else if (statement == 1)
            results[i] = m * 1;
        else if (statement == 3)
            results[i] = m * 3;
        else if (statement == 9)
            results[i] = m * 9;
        else if (statement == 8)
            results[i] = m * 8;
    }

    GET_END_TIME;
    CALC_DIFF_TIME;

    printf("Finished in %lld us.\n", ElapsedMicroseconds.QuadPart);

    getchar();

    return 0;
}

Essentially, if I multiply by a constant (declared in the method) inside the if blocks, the loop takes 175% as long to run as it does when I use a regular variable.

I'm no expert on the subject, but this doesn't seem to be the expected behavior.

If anyone has any ideas on it, I'm all ears. Otherwise, I'm just going to chalk it up to a bug in the compiler or the execution runtime.


Additional investigation has revealed the following:

If the short array is replaced with an int array and the number of ITERATIONS is halved, both loops take the same amount of time. It seems the issue lies in the assignment: storing the second loop's arithmetic result in a short array is faster than storing it in an int array.

Update:

As it turned out, after inspecting the .asm file, the loops were being optimized away because results was never read. The compiler removed the bodies of the loops, and the only operation left was i % 10, which was compiled slightly differently in each loop.

As Hans Passant said on Stack Overflow:

Looking at the machine code is important to see what is happening. Very little of your code remains after the optimizer is done with it, the result[] assignments are all removed since they don't have any observable side-effects and the n and m identifiers never get used. All that remains is the code for i % 10. Which is optimized to a multiplication, much faster on Intel cores. It uses two different strategies for some reason, one is signed and the other is unsigned. You are seeing that the unsigned version is slightly faster. - Hans Passant, 13 Nov 2015
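For the curious, the "modulo by a constant becomes a multiplication" trick can be written out by hand. This C sketch shows the common unsigned 32-bit version of the technique; it is not the exact instruction sequence MSVC emitted for this program (the post doesn't reproduce that), and `mod10_by_multiply` is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Computes x % 10 without a divide instruction. 0xCCCCCCCD is
   ceil(2^35 / 10), so (x * 0xCCCCCCCD) >> 35 equals x / 10 for every
   32-bit unsigned x; the remainder then follows from x - 10 * (x / 10). */
static uint32_t mod10_by_multiply(uint32_t x)
{
    uint32_t quotient = (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
    return x - quotient * 10u;
}
```

A multiply and a shift are much cheaper than a hardware divide, which is why optimizers prefer this form; the signed variant needs extra instructions to correct for negative operands.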

I guess it goes to show: you can never depend on the compiler doing exactly what you think it does.
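If you want to time loops like these for real, their results have to be observable. Below is a minimal, portable C sketch of that fix under a few assumptions: it uses the standard `clock()` instead of QueryPerformanceCounter so it builds anywhere, it collapses the if/else chain into the equivalent `factor * (i % 10)`, and `fill_and_checksum` and `time_fill` are illustrative names. The stored values now feed a checksum the caller uses, so the optimizer can no longer simply throw the loop's work away.

```c
#include <assert.h>
#include <time.h>

#define ITERATIONS 500000

static short results[ITERATIONS];

/* Fills results[] with factor * (i % 10), the same values the if/else
   chain produces, then reads every element back into a checksum. */
static long long fill_and_checksum(int factor)
{
    long long sum = 0;
    for (int i = 0; i < ITERATIONS; i++)
        results[i] = (short)(factor * (i % 10));
    for (int i = 0; i < ITERATIONS; i++)
        sum += results[i];
    return sum;
}

/* Times one fill, storing the checksum through *checksum; returns the
   elapsed processor time in microseconds. */
static double time_fill(int factor, long long *checksum)
{
    assert(checksum != NULL);
    clock_t start = clock();
    *checksum = fill_and_checksum(factor);
    clock_t end = clock();
    return (double)(end - start) * 1000000.0 / CLOCKS_PER_SEC;
}
```

A caller would print both numbers, for example `double us = time_fill(5, &sum); printf("%.0f us, checksum %lld\n", us, sum);`. Printing the checksum is what keeps the work observable; with a factor of 5 it comes to 11,250,000.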

On GitHub as promised.

Download: Constant Arithmetic Bug (13-11-2015).zip (232.1KB)

About this Blog

What is Using Programming?

Using Programming is a blog about some of the hidden features of various languages, not-so-obvious optimization strategies, and other ways you can take advantage of various languages and their particular gems. This blog is not exclusive to any one language or framework; I'm going to cover things based on what I run into in my day-to-day work with various languages.

There will probably be more .NET (Visual Basic, C#, ASP.NET) and JavaScript posts, simply because that's what my full-time job and my pet projects are in, but never fear! I will be making posts on all the languages I run into.

Why was the name "Using Programming" chosen? 

The name Using Programming has two parts. First, it's a play on the C# using directive, which brings types from other namespaces into your code. Second, it stands for the ideal of this blog: to help developers get the most out of their programming experience.

So how do I get the most out of Using Programming?

The best way to use this blog as a resource is simply to experiment with the concepts I draw out. I will attach source code for everything I run into and blog about, so that you can try the exact same experiments I have done and see exactly how these things work. Some of the optimization strategies I will be going into may be of significant importance to you, and in that case you may find the source code even more useful.

What can I expect to see on Using Programming?

I'm going to try to follow a few guidelines here on Using Programming:
  • All posts will have a summary at the top to indicate a little bit about the topic;
  • All posts related to a language feature will include a digression on what problem the feature is designed to solve, and why it needs solving;

How is the source code licensed?

I will be placing all source code on GitHub under the MIT license. You may do anything you wish with it and redistribute it to your heart's content. The only request I make is that you give credit where credit is due.

How often is Using Programming updated?

I'll be attempting to make posts at least once a week to keep readers informed on all the things I've run into. Do note, however, that I can't guarantee a post every week, so don't fret if you don't see one for a week or two. I promise I'm still around.

Where can I find examples?

Source code for all articles can be found over on GitHub. You are free to use it to your heart's content and may do anything you wish with it. I don't guarantee that it follows best practices, though I do guarantee it fully covers the text of the article it accompanies.