Micro optimization atoul

 

Introduction

This optimization example is learnt from [1]. This small piece of code aim to convert string to unsigned 64 bit integer.

Implementation

Base case

uint64_t Stoul::Atoui_Base(const char *b, const char *e)
{
    enforce(b < e);
    uint64_t result = 0;
    for (; b != e; ++b)
    {
        enforce(*b >= '0' && *b <= '9');
        result = result * 10 + (*b - '0');
    }

    return result;
}

 

001 case

uint64_t Stoul::Atoui_001(const char *b, const char *e)
{
    // we use a table in this solution
    static const uint64_t pow10[20] =
        {
            10000000000000000000UL
            , 1000000000000000000UL
            , 100000000000000000UL
            , 10000000000000000UL
            , 1000000000000000UL
            , 100000000000000UL
            , 10000000000000UL
            , 1000000000000UL
            , 100000000000UL
            , 10000000000UL
            , 1000000000UL
            , 100000000UL
            , 10000000UL
            , 1000000UL
            , 100000UL
            , 10000UL
            , 1000UL
            , 100UL
            , 10UL
            , 1UL
        };
    enforce(b < e);
    uint64_t result = 0;
    auto i = sizeof(pow10) / sizeof(*pow10) - (e - b);
    for (; b != e; ++b)
    {
        enforce(*b >= '0' && *b <= '9');
        result += pow10[i++] * (*b - '0');
    }
    return result;
}

In this case, we try to break calculation so that there is no dependency between loops and the CPU can optimize it out. We believe there should be improvement due to parallelism in the CPU.

002 case

uint64_t Stoul::Atoui_002(const char *b, const char *e)
{
    // we use a table in this solution
    // we reduce the number of comparison
    static const uint64_t pow10[20] =
        {
            10000000000000000000UL
            , 1000000000000000000UL
            , 100000000000000000UL
            , 10000000000000000UL
            , 1000000000000000UL
            , 100000000000000UL
            , 10000000000000UL
            , 1000000000000UL
            , 100000000000UL
            , 10000000000UL
            , 1000000000UL
            , 100000000UL
            , 10000000UL
            , 1000000UL
            , 100000UL
            , 10000UL
            , 1000UL
            , 100UL
            , 10UL
            , 1UL
        };
    enforce(b < e);
    uint64_t result = 0;
    auto i = sizeof(pow10) / sizeof(*pow10) - (e - b);
    for (; b != e; ++b)
    {
        // we can do this because subtraction is injective function --- no two inputs lead to the same output
        // in other word, there is only one possible input when d < 10, and that is all we need
        auto d = uint64_t(*b) - '0';
        if (d >= 10) throw std::range_error("ehm");
        result += pow10[i++] * d;
    }
    return result;
}

In this case, we build on top from 001 case. We further reduce the number of comparison from 2 to 1. We expect the improvement comes from we do lesser things.

Result

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE

2023-09-26T05:33:30+00:00
Running ./user_benchmark/cpp_showcases_Benchmark
Run on (20 X 5100 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x10)
  L1 Instruction 32 KiB (x10)
  L2 Unified 2048 KiB (x10)
  L3 Unified 24576 KiB (x1)
Load Average: 0.66, 1.30, 1.6

 

According to the author. The implementation of 001 case should bring improvements but from my implementation and testing, there is no improvement or even a little bit worse than the base line. Probability, C++ compiler is optimized it out for us already.

The 002 case is over performed for all cases as expected. Although there is no correlation between the number of digits and the improvements. 

One can find the benchmark source code here [3].

Reference

[1] N. T. C. W. (2016, January 13). code::dive conference 2015 - Andrei Alexandrescu - Writing Fast Code II. YouTube. https://www.youtube.com/watch?v=9tvbz8CSI8M

[2] project/app/Algo_Data_Structure/Stoul.hpp · main · SulfredLee / cpp_showcases · GitLab. (2023, September 26). GitLab. https://gitlab.com/SulfredLee/cpp_showcases/-/blob/main/project/app/Algo_Data_Structure/Stoul.hpp?ref_type=heads

[3] project/user_benchmark/Stoul_Benchmark.hpp · main · SulfredLee / cpp_showcases · GitLab. (2023, September 26). GitLab. https://gitlab.com/SulfredLee/cpp_showcases/-/blob/main/project/user_benchmark/Stoul_Benchmark.hpp?ref_type=heads

Comments