using Programming;

A Blog about some of the intrinsics related to programming and how one can get the best out of various languages.

Visual C++: Bug with constant arithmetic loops

I was working with Visual C++ for another article I'm preparing, and I noticed an odd bug with the const modifier in Visual C++.

The following code demonstrates the issue:

#include "stdafx.h"
#include <stdio.h>
#include <Windows.h>

#define ITERATIONS 500000
#define GET_START_TIME QueryPerformanceCounter(&StartingTime);
#define GET_END_TIME QueryPerformanceCounter(&EndingTime);
#define CALC_DIFF_TIME ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart; ElapsedMicroseconds.QuadPart *= 1000000; ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;

int main()
{
    short results[ITERATIONS];
    const int n = 5;
    int m = 5;
    LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
    LARGE_INTEGER Frequency;
	
    QueryPerformanceFrequency(&Frequency);

    // This loop seems to take about 1400 us on my computer.
    printf("Beginning loop over %i iterations with n constant.\n", ITERATIONS);

    GET_START_TIME;

    for (int i = 0; i < ITERATIONS; i++)
    {
        int statement = i % 10;

        if (statement == 0)
            results[i] = n * 0;
        else if (statement == 4)
            results[i] = n * 4;
        else if (statement == 2)
            results[i] = n * 2;
        else if (statement == 5)
            results[i] = n * 5;
        else if (statement == 7)
            results[i] = n * 7;
        else if (statement == 6)
            results[i] = n * 6;
        else if (statement == 1)
            results[i] = n * 1;
        else if (statement == 3)
            results[i] = n * 3;
        else if (statement == 9)
            results[i] = n * 9;
        else if (statement == 8)
            results[i] = n * 8;
    }

    GET_END_TIME;
    CALC_DIFF_TIME;

    printf("Finished in %lld us.\n", ElapsedMicroseconds.QuadPart);

    // This one takes about 800 us on my computer.
    printf("Beginning loop over %i iterations with m variable.\n", ITERATIONS);

    GET_START_TIME;

    for (int i = 0; i < ITERATIONS; i++)
    {
        int statement = i % 10;

        if (statement == 0)
            results[i] = m * 0;
        else if (statement == 4)
            results[i] = m * 4;
        else if (statement == 2)
            results[i] = m * 2;
        else if (statement == 5)
            results[i] = m * 5;
        else if (statement == 7)
            results[i] = m * 7;
        else if (statement == 6)
            results[i] = m * 6;
        else if (statement == 1)
            results[i] = m * 1;
        else if (statement == 3)
            results[i] = m * 3;
        else if (statement == 9)
            results[i] = m * 9;
        else if (statement == 8)
            results[i] = m * 8;
    }

    GET_END_TIME;
    CALC_DIFF_TIME;

    printf("Finished in %lld us.\n", ElapsedMicroseconds.QuadPart);

    getchar();

    return 0;
}

Essentially, if I use a constant (declared in the method) to multiply against for the if blocks, it takes 175% of the time to run through the loops than if I use a regular variable.

I'm no expert on the subject, but this doesn't seem to be the expected behavior.

If anyone has any ideas on it, I'm all ears. Otherwise, I'm just going to sum it all up in that it's a bug with the compiler or execution runtime.


Additional investigation has revealed the following:

If the short array is replaced with an int array, and the number of ITERATIONS is halved, then both loops take the same amount of time. It seems the issue is somewhere with the assignment of the second arithmetic result to a short array is faster than assigning it to an int.

Update:

As it turned out, after inspecting the .asm file, the loops were being optimized because results was never used. This caused the body of the loops to be removed, and the only operation remaining was the i % 10 operation, which was slightly different for each loop.

As Hans Passant said on Stack Overflow:

Looking at the machine code is important to see what is happening. Very little of your code remains after the optimizer is done with it, the result[] assignments are all removed since they don't have any observable side-effects and the n and m identifiers never get used. All that remains is the code for i % 10. Which is optimized to a multiplication, much faster on Intel cores. It uses two different strategies for some reason, one is signed and the other is unsigned. You are seeing that the unsigned version is slightly faster. - Hans Passant, 13 Nov 2015

I guess it goes to show: you can never depend on the compiler doing exactly what you think it does.

On GitHub as promised.

Download: Constant Arithmetic Bug (13-11-2015).zip (232.1KB)