Game Engine, C 1, P 9, FiberTaskingLib and Fiber-job-system

Info

You can check out tasking and job if you want.
I have to admit, I had a pretty strong bias towards fiber-job-system from the beginning. It was the first library I spotted, and it referenced ideas by Christian Gyrling. Although I noticed that the library only had a Windows implementation, I was ready to work around that when the time came.
As for FiberTaskingLib, it felt to me like a weird mix between fiber-job-system and sewing, but I still wanted to give it a try just in case.

Installation

I didn't manage to get FiberTaskingLib working inside my CMake benchmarking project. The library ships its own CMake files, and they work fine on their own, so I took the lazy route and imported my code into the library project rather than the other way around. I was pretty satisfied with the result.
Fiber-job-system, on the other hand, doesn't have CMake files, but it consists of only a few source files, so I drafted a quick CMakeLists.txt to build it:
cmake_minimum_required(VERSION 3.0.0 FATAL_ERROR)

set(PROJECT_NAME_FJS "fjs")
project(${PROJECT_NAME_FJS})

file(GLOB_RECURSE FJS_SRC_FILES src/*.cpp)
add_library(${PROJECT_NAME_FJS} ${FJS_SRC_FILES})
target_include_directories(${PROJECT_NAME_FJS} PUBLIC include)

Nothing fancy here, just getting all the source files and including the header folder.

Benchmark

benchmark_tasking.cpp

To me, FiberTaskingLib feels like a cheap C++ overlay of a C library. Check out the code and see for yourself what I mean:
#include "benchmark_platform.h"
#include <atomic>
#include <thread>

#include "ftl/atomic_counter.h"
#include "ftl/task_scheduler.h"

static test_type core_count = (test_type)std::thread::hardware_concurrency();
static std::atomic<test_type> hit_count{0};

const test_type task_count = 256;

struct argument_t
{
    test_type offset;
    test_type max;
};

struct info_t
{
    argument_t* args;
    size_t job_count;
};

void test_increment(ftl::TaskScheduler* taskScheduler, void* argument)
{
    argument_t* arg = (argument_t*)argument;
    for (test_type i = arg->offset; i < arg->max; i += core_count)
    {
        if (isPrimeNumber(i)) ++hit_count;
    }
}

test_type perform_test(test_type mmax)
{
    hit_count = 0;

    test_type incr = core_count * task_count;
    // Upper bound on the number of jobs: core_count jobs per block of incr numbers.
    test_type job_count = (mmax / incr + 1) * core_count;
    argument_t* args = new argument_t[job_count];
    test_type index = 0;
    for (test_type i = 2; i < mmax; i += incr)
    {
        for (test_type j = 0; j < core_count; ++j)
        {
            test_type offset = i + j;
            (args + index)->offset = offset;
            (args + index)->max = offset + incr < mmax ? offset + incr : mmax;
            ++index;
        }
    }
    job_count = index;

    ftl::TaskScheduler taskScheduler;
    taskScheduler.Init();
    ftl::Task* tasks = new ftl::Task[job_count];
    for (test_type i = 0; i < job_count; ++i)
    {
        tasks[i] = { test_increment, (args + i) };
    }
    ftl::AtomicCounter counter(&taskScheduler);
    taskScheduler.AddTasks(job_count, tasks, &counter);
    delete[] tasks;
    taskScheduler.WaitForCounter(&counter, 0);

    delete[] args;
    return hit_count;
}

All of the data has to be allocated beforehand and passed as raw pointers. Perhaps that is a good thing.

benchmark_job.cpp

It was a bit difficult to get this one going. I had to restructure the loops a bit to fit the job manager's queue, and I'm still not sure what I got right and what I got wrong. Anyway, here's the code:
#include "benchmark_platform.h"

#include <thread>
#include <atomic>
#include <future>
#include <list>
#include <iostream>
#include <algorithm>

#include <fjs/Manager.h>
#include <fjs/Counter.h>
#include <fjs/List.h>
#include <fjs/Queue.h>

static test_type core_count = (test_type)std::thread::hardware_concurrency();
static std::atomic<test_type> hit_count{0};

const test_type task_count = 256;

struct argument_t
{
    test_type offset;
    test_type max;
};

struct info_t
{
    argument_t* args;
    size_t job_count;
};

static info_t minfo;

void test_increment(argument_t* arg)
{
    for (test_type i = arg->offset; i < arg->max; i += core_count)
    {
        if (isPrimeNumber(i)) ++hit_count;
    }
}

void main_test(fjs::Manager* mgr)
{
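    // Integer division: if job_count is not a multiple of task_count,
    // the trailing jobs are not scheduled.
    test_type pieces = minfo.job_count / task_count;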
    for (test_type k = 0; k < pieces; ++k)
    {
        fjs::List list(mgr);
        for (test_type i = 0; i < task_count; ++i)
        {
            list.Add(fjs::JobPriority::Normal, test_increment, (minfo.args + (i + k * task_count)));
        }
        list.Wait();
    }
}

test_type perform_test(test_type mmax)
{
    hit_count = 0;

    test_type incr = core_count * task_count;
    // Upper bound on the number of jobs: core_count jobs per block of incr numbers.
    test_type job_count = (mmax / incr + 1) * core_count;
    argument_t* args = new argument_t[job_count];
    test_type index = 0;
    for (test_type i = 2; i < mmax; i += incr)
    {
        for (test_type j = 0; j < core_count; ++j)
        {
            test_type offset = i + j;
            (args + index)->offset = offset;
            (args + index)->max = offset + incr < mmax ? offset + incr : mmax;
            ++index;
        }
    }
    job_count = index;
    minfo = { args, index };

    fjs::ManagerOptions managerOptions;
    managerOptions.NumFibers = task_count;
    managerOptions.ThreadAffinity = true;

    managerOptions.HighPriorityQueueSize = 1;
    managerOptions.NormalPriorityQueueSize = task_count;
    managerOptions.LowPriorityQueueSize = 1;

    managerOptions.ShutdownAfterMainCallback = true;

    fjs::Manager manager(managerOptions);

    if (manager.Run(main_test) != fjs::Manager::ReturnCode::Succes)
        std::cout << "oh no" << std::endl;

    delete[] args;
    return hit_count;
}

I have to add jobs to the list and wait for them in batches; otherwise I get either a buffer overflow or a memory access violation. I also felt that fiber-job-system really lacks a user-data pointer parameter in its main callback, which is why the job data goes through the static minfo variable.

Results

The results were surprising. I expected FiberTaskingLib to be slightly worse than Marl, and fiber-job-system to be the worst because of all the issues I ran into.
Benchmark  Launch 1  Launch 2  Launch 3  Launch 4  Launch 5  Launch 6  Launch 7  Total
Tasking    6.65683   6.18805   6.3974    6.18256   6.20655   6.22112   6.20632   6.2337
Job        7.58917   7.54043   7.65869   7.56652   7.61828   7.56469   7.55989   7.5848

Tasking, as expected, outperformed Job. However, I didn't expect it to get the best result overall! Fiber-job-system didn't come in last either. There aren't many conclusions I can draw in this post, so let's focus on the overall results in the next one. You can read it here.
