using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

C# Expression-Bodied Members

Using Expression-Bodied Members in C#6.0

So it's been a while since I've written a blog post (far too long if you ask me). I'll not go into too much detail, but I've been busy doing some life things. I recently changed jobs, moved, and started working on a great deal of other projects that just consume all my time, and I haven't had time to share any of my thoughts recently.

With that said, I do have something I wish to talk about at the moment, and that is a new feature found in C#6.0: Expression-Bodied Members.

This feature, in my opinion, is one of the more useful features of the recent language update. It opens a great deal of doors and has shortened a lot of my code substantially, which is always good. (For some reason, we strive to keep the number of characters/lines on our source code to a minimum.) It offers us the ability to take certain methods and properties, which would have otherwise taken up a fair amount of space, and shrink them down to a much more manageable level.

However, before we talk about this feature, we need to talk about all the parts of it. So let's begin with the most important part of an expression-bodied member: the expression.

What is an expression?

We're going to take a definition from webopedia (see http://www.webopedia.com/TERM/E/expression.html) which should sum it up fairly well:

In programming, an expression is any legal combination of symbols that represents a value.

This is a pretty basic definition, but it should serve our needs quite well. When we use the term expression we mean a fairly simple set of instructions that return a value. Do note: I stress "return a value" for a reason. If the set of instructions does not return a value, then it is not an expression, and cannot make up an expression-bodied member. (We'll see this isn't entirely true shortly.)

A few examples of what we mean by expressions when discussing them in C#:

2 + 5
x * 4
"EBrown"
name ?? "No Name"
person.ToString()

The last one might surprise you - yes, calling .ToString() on something is an expression, as .ToString() returns a value. The second-to-last might surprise you as well, but it's pretty simple: if name is null, then the expression evaluates to "No Name".

Do note: expressions don't have to return a non-null value. The person.ToString() call above could potentially return a null value, and that's perfectly OK. It's still an expression. The value itself has no bearing on the definition of the term, only whether or not something actually returns a valid value. This is an important concept to bear in mind, as not all of our expressions have non-null values.

So what do expressions usually look like in C#?

Any series of tokens that follows a return token, up to the first subsequent semicolon (;), is an expression. So for all of our examples above, they might look like:

return 2 + 5;
return x * 4;
return "EBrown";
return name ?? "No Name";
return person.ToString();

Mind you, expressions don't have to follow a return statement. Consider that 2+5 is still an expression in the following example:

var myString = "Some string " + (2 + 5).ToString() + " with numbers in it.";

But we also have several other expressions:

2 + 5
(2 + 5).ToString()
"Some string " + (2 + 5).ToString() + " with numbers in it."

All three of these are expressions in their own right. All three of them also contain other expressions. This is an important concept to understand: in the expression 2 + 5, there are actually two other expressions, 2 and 5. These are both expressions as well.

What does an expression-bodied member look like, in C#?

There are two main types of expression-bodied members in C#6.0 (operators and readonly indexers accept the same syntax, and behave like these two):

  • Expression-bodied readonly properties;
  • Expression-bodied methods;

Both types of expression-bodied members in C# look something like the following:

member => expression;

It's very simple: you provide a member, the "lambda" sign (=>), and an expression. You can also provide the standard member modifiers on the member itself. (Access modifiers, attributes, etc.) It's a regular member; it just uses a different body syntax. You should note that there are no braces in play, just a member and an expression.

An example of a C# expression-bodied member:

public override string ToString() => $"Name: {Name}";

Note that there is no return statement. An expression-bodied member always returns a value. (Except in the case of void methods.) That is why we just talked about expressions in such detail. We need something to return. Something to give back.

What about void methods?

You can still use an expression-bodied member in a void method. The expression simply has to have a void type itself, or be a call whose result can be thrown away (what I'll call a "disposable" call). The following code is completely valid C#6.0:

public void MethodA() { }
public void MethodB() => MethodA(); // `MethodA()` is `void`, and `MethodB()` is `void`
public string MethodC() { return "MethodC"; }
public void MethodD() => MethodC(); // The result of `MethodC()` is disposable

The following is invalid:

public void MethodA() { }
public string MethodB() => MethodA(); // The expression returns a `void` type, but a `string` is expected
public void MethodC() => "MethodC"; // The expression is a `string` value, not a call, so the value is not disposable

An expression-bodied member with a return type can mentally be rewritten as:

member { return expression; }

An expression-bodied member with a void return type can mentally be rewritten as:

member { expression; }

How do I use expression-bodied members?

It's pretty simple to use an expression-bodied member. You have two options: an expression-bodied readonly property, and an expression-bodied method. Both of these are trivial to use; the only difference is a minor matter of syntax.

Just like a normal readonly property, an expression-bodied readonly property only has a get-method within it. The difference is syntax. As we may recall, a normal readonly property may look something like:

public double TotalPrice { get { return Quantity * Price; } }

To convert this to an expression-bodied member, we simply replace the getter with an expression of the previous syntax:

public double TotalPrice => Quantity * Price;

This is a property bodied by an expression. You'll note that there are no parameters passed, so as far as other code is concerned it's treated just like a normal property, that only has a getter.

The only difference between a method and a property being bodied by an expression is that a method has the requisite parentheses in the definition.

public override string ToString() { return $"Price: {Price}, Quantity: {Quantity}"; }

As an expression-bodied method, this could be rewritten as:

public override string ToString() => $"Price: {Price}, Quantity: {Quantity}";

As you can see, in both cases we omitted the return altogether. Properties and methods that specify a non-void return type implicitly return whatever the result of the expression is.

A real life example of the benefits of expression-bodied members

For this I'm going to use a partial copy of a class I wrote for a C# library I'm working on. (I'm omitting all the comments and attributes, for brevity.)

This is (most of) a Rectangle class I wrote in a drawing library (for a clone of Windows Forms for XNA). This first version is the version without expression-bodied members at all. You can find the most recent version at: https://github.com/EBrown8534/Framework/blob/master/Evbpc.Framework/Drawing/Rectangle.cs

public struct Rectangle
{
    public Rectangle(Point location, Size size)
    {
        Location = location;
        Size = size;
    }

    public int Bottom { get { return Location.Y + Size.Height; } }
    public bool IsEmpty { get { return this == Empty; } }
    public int Left { get { return Location.X; } }
    public Point Location { get; }
    public int Right { get { return Location.X + Size.Width; } }
    public Size Size { get; }
    public int Top { get { return Location.Y; } }

    public override bool Equals(object obj) { return obj is Rectangle && (Rectangle)obj == this; }
    public override int GetHashCode() { return base.GetHashCode(); }

    public override string ToString()
    {
        return $"({Location.X},{Location.Y},{Size.Width},{Size.Height})";
    }

    public static bool operator ==(Rectangle left, Rectangle right)
    {
        return left.Location == right.Location && left.Size == right.Size;
    }

    public static bool operator !=(Rectangle left, Rectangle right)
    {
        return left.Location != right.Location || left.Size != right.Size;
    }

    public static readonly Rectangle Empty = new Rectangle(0, 0, 0, 0);
}

Pretty simple, right? I'm not going to discuss any of the other C#6.0 features that I've used, just know that there are some.

Now, let's see what this looks like if we replace all the smaller methods and properties with expression-bodied members.

public struct Rectangle
{
    public Rectangle(Point location, Size size)
    {
        Location = location;
        Size = size;
    }

    public int Bottom => Location.Y + Size.Height;
    public bool IsEmpty => this == Empty;
    public int Left => Location.X;
    public Point Location { get; }
    public int Right => Location.X + Size.Width;
    public Size Size { get; }
    public int Top => Location.Y;

    public override bool Equals(object obj) => obj is Rectangle && (Rectangle)obj == this;
    public override int GetHashCode() => base.GetHashCode();
    public override string ToString() => $"({Location.X},{Location.Y},{Size.Width},{Size.Height})";
    public static bool operator ==(Rectangle left, Rectangle right) => left.Location == right.Location && left.Size == right.Size;
    public static bool operator !=(Rectangle left, Rectangle right) => left.Location != right.Location || left.Size != right.Size;

    public static readonly Rectangle Empty = new Rectangle(0, 0, 0, 0);
}

A little cleaner, yes? The horizontal space of our code has been significantly reduced for most of the methods and properties. A lot of that clutter is now gone.

Limitations of Expression-Bodied Members

One of the major limitations of expression-bodied members in C#6.0 is exception throwing. Exceptions cannot be thrown directly from an expression-bodied member. You can still do things that would throw exceptions, but you cannot actually throw anything. This is due to the fact that throw ... is a statement, rather than an expression. (C# 7.0 later added throw expressions, which lift this restriction.)

See this Stack Overflow question and answer for more information on this limitation.

DO's and DON'Ts of Expression-Bodied Members

Here are a few of the general do's and don'ts I use when determining if I can use an expression-bodied member:

  • DO use expression-bodied members on non-auto-implemented readonly properties

    • This helps reduce clutter in code and makes the intention much more explicit. It allows future programmers to see that the property was meant to be explicitly readonly, and that a set clause should never appear for it.


  • DON'T use expression-bodied members on static readonly fields (Empty, etc.)

    • Any static readonly fields should be simple values, which should never change. By rewriting them as expression-bodied members, these simple fields are now properties, and as such slightly more overhead is attributed to them. (Especially in the case of Empty fields.)


  • DO use expression-bodied members on methods with simple return statements

    • Methods that have a single return statement written as expression-bodied methods allow the programmer to be completely explicit about the intention of the method.


  • DON'T use expression-bodied members when the expression contains multiple ternary or null-coalescing operators

    • Expression-bodied members may be used when one of either (or one of each) is found, but should not be used if more than one of either is found. Stacking these operators creates confusion and makes debugging the method much more difficult.

And the last one, which you may or may not want to adopt (I have):

  • DON'T use expression-bodied members on void methods, period

    • In the case of void methods, an expression-bodied method is misleading. It tends to hint at the idea that something should be returned (as expressions should always return a value) when in fact nothing is to be returned, by design. It creates confusion among developers.

SQL Server Datatypes: How to avoid VarChar

I've seen, time and time again, programmers make many of the same mistakes regarding their SQL datatypes, and one of them is to use VarChar for almost everything. I've seen it so many times that if I had a nickel for each time I saw it, well, let's just say my McLaren P1 would be yellow.

Why do people use VarChar so much?

Well, to be honest, it's easy. We, as people, are generally lazy, and it's easy to store anything in a VarChar(50), or worse, a VarChar(MAX)! Why is this a bad thing? Well for some data, it's not, but for others, it's just not the best option. As developers and programmers, we almost always have a choice as to how we should store our data, and sometimes, it's easy to make an inefficient one.

Let's take a solid example. I was over on Stack Overflow one day, and I noticed a developer doing something odd: the developer was storing an IP address (we'll assume IPv4 of 192.168.0.1 which is a pretty common IP for default gateways in small home and office networks) in a VarChar or a Char field. I'm not sure on the precision of it, or which it was (as the developer left out the DDL), but for sake of argument let's assume it was the smallest precision required to store any IP Address, and as such a VarChar(15).

The developer, much like the rest of us, was trying to find a way to shrink the amount of data used. So, the developer proposed that, instead of storing 1.1.1.1, we just omit all the characters except the last two (in this example: .1), and keep only the fourth octet in the database. The downfall of this is quite obvious: we now have no way of distinguishing whether our value is 1.1.1.1, 2.2.2.1, 3.3.3.1 or any other address ending in .1. But, there's a better way.

Let's take a peek at what we know at this point:
  1. The data being stored is binary data;
  2. It's being stored in a string field;
  3. The maximum length on the string field is 15 characters;
Now this doesn't just apply to IP Addresses, it also applies to hashes, encrypted data and other binary objects.

At first glance this might not seem so bad. The IP Address as a string is 192.168.0.1. The maximum data-size is going to be 17 bytes, as the VarChar type takes one byte per character, and two bytes of overhead. The size for our specific address is 13 (11 characters plus the two bytes of overhead), by the same math. The developer took the time to address the issue of fitting the data within the seemingly smallest datatype possible. But what did the developer forget?

First, we're trying to store binary data. The smallest way to store this (at least in string format) is either in hexadecimal or Base64 encoding. Let's assume we use hexadecimal (it really doesn't matter either way). We're storing data that is four bytes, which means we need eight characters. Our example leaves us with 0xC0A80001 or, for short: C0A80001. So, this alone allows us to reduce our maximum storage space to a little over half its original size (10 bytes instead of 17), and our utilized space (for this example) to 10 bytes from 13. With just one quick optimization we converted our 15-character string to an 8-character hexadecimal string. Now that we know that, we can make another optimization and change it to a Char(8) type. This removes the two bytes of overhead, and leaves our example at a cool 8 bytes of storage space.

But, we're forgetting one small thing: SQL Server (at least, Microsoft SQL Server) has a Binary type. Much like the Char type, the Binary type has a fixed size. The difference is that the Binary type can store raw byte data. It takes a length, just like the Char does, so in our case, it would be Binary(4) (to store four bytes for one IPv4 address). The binary type will only store the raw data for the address, so we're left with:
  1. Byte 1: 0xC0
  2. Byte 2: 0xA8
  3. Byte 3: 0x00
  4. Byte 4: 0x01
Microsoft SQL Server also has a VarBinary type which works just like the VarChar type. It supports the same size limits: 1-8000 or MAX. It also requires two bytes of overhead for each row, just like a VarChar type.
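
As a rough illustration of the two encodings discussed above, here is a minimal C++ sketch that turns a dotted IPv4 string into the four raw bytes a Binary(4) field would hold, and into the 8-character hexadecimal form. The names pack_ipv4 and to_hex are hypothetical helpers invented for this example, and the parsing is deliberately simple; it is not how SQL Server itself performs any conversion.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: parses "a.b.c.d" into its four raw bytes.
std::array<std::uint8_t, 4> pack_ipv4(const std::string& text)
{
    std::array<std::uint8_t, 4> bytes{};
    std::istringstream in(text);

    for (std::size_t i = 0; i < bytes.size(); i++)
    {
        // Every octet after the first must be preceded by a dot.
        char separator = 0;
        if (i > 0 && (!(in >> separator) || separator != '.'))
            throw std::invalid_argument("expected '.' between octets");

        int octet = -1;
        if (!(in >> octet) || octet < 0 || octet > 255)
            throw std::invalid_argument("octet must be in the range 0-255");

        bytes[i] = static_cast<std::uint8_t>(octet);
    }

    // Anything left over (a fifth octet, stray text) makes the input invalid.
    char extra = 0;
    if (in >> extra)
        throw std::invalid_argument("unexpected trailing characters");

    return bytes;
}

// Hypothetical helper: renders the four bytes as 8 uppercase hex characters.
std::string to_hex(const std::array<std::uint8_t, 4>& bytes)
{
    static const char digits[] = "0123456789ABCDEF";
    std::string hex;
    for (std::uint8_t b : bytes)
    {
        hex += digits[b >> 4];   // high nibble
        hex += digits[b & 0x0F]; // low nibble
    }
    return hex;
}
```

Calling to_hex(pack_ipv4("192.168.0.1")) yields C0A80001, the same value worked out by hand above, while the array returned by pack_ipv4 is exactly the four bytes listed for the Binary(4) field.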

The nice thing about using a Binary type for this field is that it allows us to save a significant amount of space. By optimizing this field, we've saved up to 11 bytes of data per row (15 characters down to 4 bytes), before even counting the two bytes of VarChar overhead. How significant is that? If we had 500,000,000 rows, we've saved 5.5GB of data. (And for big-data applications, 500,000,000 rows is insignificant.)

You might say, "well my application is small data, 500,000,000 rows is a pretty significant number, and 5.5GB for that many records is small." While that may be true, this is just one field we've optimized.
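
The storage arithmetic in this section is easy to check mechanically. The sketch below restates it in C++, assuming only the sizing rules quoted in the text (one byte per VarChar character plus two bytes of overhead, and fixed-length types taking exactly their declared length). The function names are made up for illustration.

```cpp
#include <cstdint>

// VarChar: one byte per character, plus two bytes of overhead.
constexpr std::uint64_t varchar_bytes(std::uint64_t characters)
{
    return characters + 2;
}

// Char and Binary are fixed-length: the declared length is the storage size.
constexpr std::uint64_t fixed_bytes(std::uint64_t length)
{
    return length;
}

// Total bytes saved when every row shrinks by bytes_per_row.
constexpr std::uint64_t total_saved(std::uint64_t bytes_per_row, std::uint64_t rows)
{
    return bytes_per_row * rows;
}

// The figures from the text, checked at compile time.
static_assert(varchar_bytes(15) == 17, "VarChar(15) holds at most 17 bytes");
static_assert(varchar_bytes(8) == 10, "8 hex characters in a VarChar take 10 bytes");
static_assert(fixed_bytes(8) == 8, "Char(8) takes 8 bytes");
static_assert(fixed_bytes(4) == 4, "Binary(4) takes 4 bytes");
static_assert(total_saved(15 - 4, 500000000ULL) == 5500000000ULL,
              "11 bytes over 500,000,000 rows is 5.5 billion bytes");
```

If any of the figures were off, the static_assert lines would stop the file from compiling, which makes this a cheap way to sanity-check sizing assumptions before committing to a schema.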

The DateTime example

Let's take another example: I've seen a lot of people use the VarChar type for DateTime data as well, when it's completely unnecessary. SQL Server has several types for DateTime data, the more useful being DateTime, DateTime2, and DateTimeOffset. Microsoft recommends that you no longer use DateTime for new work, as the DateTime2 and DateTimeOffset types align with the SQL standard, and are more portable. The DateTime2 and DateTimeOffset fields also have better precision and a larger range.

Why is this so important? You can just as easily store a DateTime as a string in a VarChar field, and then parse it later. The problem with that is that you can't filter quite so easily for certain criteria. It's easy (at least with a DateTime2 field) to filter for dates within a certain range, on a certain date, etc. It's less intuitive with any string type.

The other problem is less obvious: with a VarChar type, there is no validation done that guarantees the input string is a DateTime string. This means it's up to whatever logic you have manipulating the database to make this guarantee.
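
The filtering problem can be shown outside the database as well. The following C++ sketch is only an analogy, assuming dates written as month/day/year text: a typed date compares by year, then month, then day, while a string compares character by character and gets the order wrong. The Date struct and text_before function are hypothetical names for this example.

```cpp
#include <string>
#include <tuple>

// A hypothetical typed date, standing in for a real date field.
struct Date
{
    int year;
    int month;
    int day;
};

// Typed comparison: year first, then month, then day.
bool operator<(const Date& a, const Date& b)
{
    return std::tie(a.year, a.month, a.day) < std::tie(b.year, b.month, b.day);
}

// String comparison: character by character, with no idea what a date is.
bool text_before(const std::string& a, const std::string& b)
{
    return a < b;
}
```

With these definitions, Date{2014, 12, 31} < Date{2015, 1, 1} is true, but text_before("12/31/2014", "01/01/2015") is false, because the text comparison stops at the first character ('1' versus '0') and never looks at the year.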

What about the NVarChar and NChar types?

I've not discussed these so far because we were talking about binary data, which in most any form is stored in some ASCII or raw form. These types (NVarChar and NChar) are Unicode (UTF-16, specifically) variants of the VarChar and Char types, respectively. These types take two bytes per character, with the variable-length type taking an extra two bytes of overhead. In our example, were the first field type an NVarChar(15) it would have taken up to 32 bytes of data. (As 30 bytes for the 15 characters plus two bytes of overhead.) The specifiable sizes for these two fields are any integers in the range 1-4000, or MAX for NVarChar.

What do the numbers in parentheses represent?

Many fields have an optional size, precision or other parameter to represent different amounts and forms of data that can be stored within them. For all fields we're discussing in this article, the parentheses represent how many characters (for the Char, VarChar, NChar and NVarChar types), or how many bytes (for the Binary and VarBinary types) the field can store.

What are the VarChar, NVarChar and VarBinary types doing internally?

All three of these types work in a very specific way, internally. You can see that the maximum size any of the three of them can take is up to 8000 bytes, but what does that mean?

Internally, in Microsoft SQL Server, the variable length fields (which have the optional MAX specification) store data in one of two ways:
  1. For data that fits within 8000 bytes, the data is stored in-row;
  2. For data greater than 8000 bytes, the data is stored out-of-row and a pointer to the data is stored in-row;
This should help clarify what the server is doing, and what the specifications mean, and why I always cringe when I see VarChar(MAX) or NVarChar(MAX), in a situation that doesn't call for it.

In summation:

As always: know your data, know your users, and most of all, know your environment.

Visual C++: Bug with constant arithmetic loops

I was working with Visual C++ for another article I'm preparing, and I noticed an odd bug with the const modifier in Visual C++.

The following code demonstrates the issue:

#include "stdafx.h"
#include <stdio.h>
#include <Windows.h>

#define ITERATIONS 500000
#define GET_START_TIME QueryPerformanceCounter(&StartingTime);
#define GET_END_TIME QueryPerformanceCounter(&EndingTime);
#define CALC_DIFF_TIME ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart; ElapsedMicroseconds.QuadPart *= 1000000; ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;

int main()
{
    short results[ITERATIONS];
    const int n = 5;
    int m = 5;
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;
	
    QueryPerformanceFrequency(&Frequency);

    // This loop seems to take about 1400 us on my computer.
    printf("Beginning loop over %i iterations with n constant.\n", ITERATIONS);

    GET_START_TIME;

    for (int i = 0; i < ITERATIONS; i++)
    {
        int statement = i % 10;

        if (statement == 0)
            results[i] = n * 0;
        else if (statement == 4)
            results[i] = n * 4;
        else if (statement == 2)
            results[i] = n * 2;
        else if (statement == 5)
            results[i] = n * 5;
        else if (statement == 7)
            results[i] = n * 7;
        else if (statement == 6)
            results[i] = n * 6;
        else if (statement == 1)
            results[i] = n * 1;
        else if (statement == 3)
            results[i] = n * 3;
        else if (statement == 9)
            results[i] = n * 9;
        else if (statement == 8)
            results[i] = n * 8;
    }

    GET_END_TIME;
    CALC_DIFF_TIME;

    printf("Finished in %lld us.\n", ElapsedMicroseconds.QuadPart);

    // This one takes about 800 us on my computer.
    printf("Beginning loop over %i iterations with m variable.\n", ITERATIONS);

    GET_START_TIME;

    for (int i = 0; i < ITERATIONS; i++)
    {
        int statement = i % 10;

        if (statement == 0)
            results[i] = m * 0;
        else if (statement == 4)
            results[i] = m * 4;
        else if (statement == 2)
            results[i] = m * 2;
        else if (statement == 5)
            results[i] = m * 5;
        else if (statement == 7)
            results[i] = m * 7;
        else if (statement == 6)
            results[i] = m * 6;
        else if (statement == 1)
            results[i] = m * 1;
        else if (statement == 3)
            results[i] = m * 3;
        else if (statement == 9)
            results[i] = m * 9;
        else if (statement == 8)
            results[i] = m * 8;
    }

    GET_END_TIME;
    CALC_DIFF_TIME;

    printf("Finished in %lld us.\n", ElapsedMicroseconds.QuadPart);

    getchar();

    return 0;
}

Essentially, if I use a constant (declared in the method) to multiply against in the if blocks, the loop takes about 175% as long to run as it does when I use a regular variable.

I'm no expert on the subject, but this doesn't seem to be the expected behavior.

If anyone has any ideas on it, I'm all ears. Otherwise, I'm just going to sum it all up in that it's a bug with the compiler or execution runtime.


Additional investigation has revealed the following:

If the short array is replaced with an int array, and the number of ITERATIONS is halved, then both loops take the same amount of time. It seems the issue lies somewhere in the assignment: storing the second loop's arithmetic result in a short array is faster than storing it in an int array.

Update:

As it turned out, after inspecting the .asm file, the loops were being optimized away because results was never used. This caused the bodies of the loops to be removed, and the only operation remaining was the i % 10 operation, which was compiled slightly differently for each loop.

As Hans Passant said on Stack Overflow:

Looking at the machine code is important to see what is happening. Very little of your code remains after the optimizer is done with it, the result[] assignments are all removed since they don't have any observable side-effects and the n and m identifiers never get used. All that remains is the code for i % 10. Which is optimized to a multiplication, much faster on Intel cores. It uses two different strategies for some reason, one is signed and the other is unsigned. You are seeing that the unsigned version is slightly faster. - Hans Passant, 13 Nov 2015

I guess it goes to show: you can never depend on the compiler doing exactly what you think it does.
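
One common way to keep a benchmark like this honest is to make the results observable, so the stores cannot be discarded as dead code. The sketch below is a minimal, portable variation on the test above, not the code from the original download: it uses std::chrono instead of QueryPerformanceCounter, a heap-allocated std::vector instead of a stack array, and the hypothetical helpers fill_and_sum and timed_run. An optimizer may still transform the loop heavily, so the generated assembly is worth checking either way.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Fills results with (i % 10) * factor and returns the sum of all entries.
// Returning the sum gives the stores an observable effect, so the compiler
// cannot simply delete the loop body.
long long fill_and_sum(std::vector<short>& results, int factor)
{
    for (std::size_t i = 0; i < results.size(); i++)
        results[i] = static_cast<short>(static_cast<int>(i % 10) * factor);

    long long sum = 0;
    for (short r : results)
        sum += r;
    return sum;
}

// Times one call to fill_and_sum and prints the checksum with the elapsed time.
long long timed_run(const char* label, std::vector<short>& results, int factor)
{
    auto start = std::chrono::steady_clock::now();
    long long sum = fill_and_sum(results, factor);
    auto end = std::chrono::steady_clock::now();

    long long us = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
    std::printf("%s: checksum %lld in %lld us\n", label, sum, us);
    return sum;
}
```

A caller would create std::vector<short> results(500000) and call timed_run once with a const int and once with a plain int. Printing the checksum is the important part: because the value reaches the output, the work that produces it has to happen somewhere.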

On GitHub as promised.

Download: Constant Arithmetic Bug (13-11-2015).zip (232.1KB)

About this Blog

What is Using Programming?

Using Programming is a blog about some of the hidden features of various languages, not-so-obvious optimization strategies, and other ways you can take advantage of various languages and their particular gems. This blog is not exclusive to any one language or framework; I'm going to cover things based on what I run into in my day-to-day work with various languages.

There will probably be a higher quantity of .NET (Visual Basic, C#, ASP.NET) and JavaScript posts simply because that's what my full-time job is in, and what my pet projects are in, but never fear! I will be making posts on all languages I run into.

Why was the name "Using Programming" chosen? 

The name Using Programming is a two-part name. First, it's a play on the C# style of including additional types from additional namespaces in your code. Second, it stands for the ideal of this blog: to help developers get the most out of their programming experience.

So how do I take the most out of Using Programming?

The best way to use this blog as a resource is to simply try and experiment with what concepts I am drawing out. Everything I run into and blog about I will attach source-code for, so that you may try the exact same experiments that I have done, to help you see exactly how these things work. Some of the optimization strategies I will be going into may be of significant importance to you, and as such you may find the source-code much more usable.

What can I expect to see on Using Programming?

I'm going to try to follow a few guidelines here on Using Programming:
  • All posts will have a summary at the top to indicate a little bit about the topic;
  • All posts related to a language feature will include a digression on what problem the feature is designed to solve, and why it needs solving;

How is the source code licensed?

I will be placing all source-code on GitHub under the MIT license. You may do anything you wish with it and redistribute it to your heart's content. The only request I make is that you include credit where credit is due.

How often is Using Programming updated?

I'll be attempting to make posts at least once a week to keep users informed on all the things I've run into. Do note, however, that I may not be able to guarantee a post each week, so don't fret if you don't see a post for a week or two. I promise, I'm still around.

Where can I find examples?

Source code for all articles can be found over on GitHub. You are free to use it to your heart's content, and may do anything you wish with it. I don't guarantee that it will follow best practices, though I do guarantee it fully covers the text of the article it represents.