Game Engine, C 1, P 6, Benchmarking Fibers

What's here

In my last post, I decided to go with using fibers for a fiber-based multi-threading task-oriented parallel processing system. What a banger of a name. The only thing left is to test if my assumption was correct and to write the framework. But the thing is, I'm not that good with fibers. Although I understand basic stuff, I'm not sure I'll be able to write a fast framework on my own.

For this reason, I decided to do the most responsible thing and learn from examples. I found a few libraries that use fibers to achieve good performance. I then decided to write a benchmark to test them all out and compare, how good of an abstraction they are.
If you are interested in the results and not what this benchmark is and how to reproduce it, click here.
Here is the list of the libraries, that I decided to check out:
They seem to have followed the ideas, that were presented during GDC by Naughty Dog Studios, so they probably have performance in mind.

Designing the benchmark

Designing a good benchmark is not an easy task. You want to represent and shed light on these libraries as well as possible. But there are some differences between them, so I was hesitant to use features like manual task switching, task waiting and scheduling, task graphs and so on. I decided to focus on the multithreading alone. For this reason, I tried to build a fair playing ground for all the libraries and also include a bare-bones example of a beginner-friendly multithreading approach and a single-threaded approach.

My idea

I decided to do the simplest benchmark possible. It has a "max integer" as input and calculates the number of prime integers within [0;max]. I first made a single-threaded version and tried to find a suitable max to be able to showcase advantages of multithreaded programming, but not to take too long, because I lack patience and this is not a professional benchmark.

My implementation

The implementation is extremely simple. I used CMake to configure the project and import the libraries.
#include "benchmark_platform.h"

bool isPrimeNumber(test_type n) {
    bool isPrime = true;
    test_type max = (test_type)sqrt((double)n);
    for (test_type i = 2; i <= max; i++) {
        if (n % i == 0) {
            isPrime = false;
            break;
        }
    }
    return isPrime;
}

int main()
{
    auto t1 = std::chrono::high_resolution_clock::now();
    auto res = perform_test(20'000'000ull);
    auto t2 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> time_span = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
 std::cout << "Primes : " << res << std::endl;
 std::cout << "Spent,s: " << time_span.count() << std::endl;
 return 0;
}

As you can see, here is the prime check function and the main function. I am using C++11 chrono functions to measure the time. The header file that I use is here:
#ifndef BM_PLATFORM
#define BM_PLATFORM

#include <chrono>
#include <ctime>
#include <ratio>
#include <iostream>
#include <cmath>

using test_type = unsigned long;

bool isPrimeNumber(test_type n);

test_type perform_test(test_type max);

#endif // !BM_PLATFORM

It includes a declaration for the isPrimeNumber function and the perform_test function itself.
In the next blog post I will show you my single- and multi-threaded functions.
Next, here's the CMake file. It is very simple too. It uses a multi-choice option for the benchmarked item and selects the cpp file based on that.
set(PROJECT_NAME Benchmark)
set(LIBRARY_NAME ${PROJECT_NAME})
project(${PROJECT_NAME})
cmake_minimum_required(VERSION 3.15)

set(LibraryName "single" CACHE STRING "Name of the library to benchmark")
set_property(CACHE LibraryName PROPERTY STRINGS single multi)

set(SOURCES_BENCHMARK benchmark_${LibraryName}.cpp)
message("Performing benchmark " ${SOURCES_BENCHMARK})
add_executable(${LIBRARY_NAME} benchmark.cpp ${SOURCES_BENCHMARK})
target_include_directories(${LIBRARY_NAME} PUBLIC "./")
target_compile_features(${LIBRARY_NAME} PUBLIC cxx_std_17)

I've named the files accordingly, so selecting the file and adding it to the target executable was a piece of cake.

Future posts

In the next post, I'll discuss how I made a simple single- and multi-threaded implementation of the benchmark. Then Sewing and Marl implementations, FiberTaskingLib  and Fiber-Job-System. Lastly, I'll post the benchmark results and my library choice.

Comments