Time in C++ – present and future – Part 2

In the previous post we discussed elements of the <chrono> library which are already available as of C++11 or C++14. This is our present. What about the future? Enter the date and timezone libraries. We will talk about the date library today; it will, most likely, be available in C++20. The paper currently voted into C++20 can be found here.

But what about the people who want to play with it now? Well, Howard Hinnant has implemented a prototype of the library and has graciously made it available to everybody. date.h is provided as a header-only library, so using it in your favorite build system/IDE should be trivial. Even better, if you use Vcpkg, the library is available through this package manager. So if you are a Vcpkg user, obtaining the library is as simple as writing:

vcpkg.exe install date:x64-windows    

or

vcpkg.exe install date:x86-windows    

Here we will summarize some key ideas of the header date.h. What we discuss today will be applicable to the date library in C++20 and, hopefully, the differences will be minimal.

For a more in-depth look, sit down, relax, allocate 1 hour of your time and watch this:

The talk is extremely entertaining: Howard goes into the nitty-gritty details of the architecture of the library and discusses its performance. We will discuss the library focusing on its use cases and main types, leaving the rest to the video for anybody who’s interested.

Counting the days

First, we should say that date deals only with UTC (like system_clock), so all the dates we will see are UTC, for non-UTC dates we will need the timezone library.

I feel like a broken record, but date is built following the same principles as chrono: if it builds, it is correct; enforcing constraints and performing conversions is achieved through the type system. The core type of date is date::sys_days (day_point in the video):

using sys_days =
    std::chrono::time_point<std::chrono::system_clock, date::days>;

sys_days represents the number of days from the epoch of the system_clock. The epoch is 00:00:00 UTC, 01/01/1970 (standard in C++20, de facto standard before).

We can already start doing some tests:

#include <iostream>
#include <chrono>
#include "date/date.h"

int main(){

	auto now = std::chrono::system_clock::now();

	date::sys_days nowInDays =
	             std::chrono::floor<date::days>(now);

	std::cout << "Days since epoch " 
                  << nowInDays.time_since_epoch().count() << '\n';
}

Sure enough, the program prints 18136 days; using this site you’ll be able to find out when this article was actually written.

We saw before that chrono allowed us to lift the hood of its compile-time machinery and extend it. date does the same.

Story time: during my first “real” job I had to work in a codebase where dates were stored as days since the epoch in a double. Debatable idea, but when in Rome… Let’s see if we can handle a double number of days. We first create our custom duration having a double as underlying type.

using perH = std::chrono::hours::period;
using hoursInADay = std::ratio_multiply<std::ratio<24>, perH>;
using daysD = std::chrono::duration<double, hoursInADay>;
using sys_days_d = 
	std::chrono::time_point<std::chrono::system_clock, daysD>;

Then, we assign the value of now, expressed in the duration of system_clock, to an instance of sys_days_d. And, wait, no cast? Yes, no cast: a cast is needed only when introducing truncation error, not rounding error.

auto now = std::chrono::system_clock::now();
//No casts here
sys_days_d nowInDoubleDays = now;
	
std::cout << "Days since epoch (double) " 
	  << nowInDoubleDays.time_since_epoch().count() << '\n';

Now, we transform the difference between the number of days in double and the number of days in integer into seconds: here we have truncation (double to integer), so we need a cast.

std::chrono::seconds diff = 
	std::chrono::floor<std::chrono::seconds>
	(nowInDoubleDays - nowInDays);
std::cout << date::time_of_day<decltype(diff)>{ diff } << '\n';

time_of_day is a type that breaks a duration from midnight into hours, minutes and seconds.

Note: in older material one can find the factory function date::make_time. While still available in the library, it was replaced in the standard by direct calls to the constructor of time_of_day with appropriate deduction guides, so in C++20 we will be able to write the following:

std::cout << time_of_day{ diff } << '\n';

Have you noticed that we can call operator<< directly? The types in date.h (and the corresponding C++20 proposal) provide it, and C++20 introduces operator<< for the other types in chrono.

Counting months, years, etc.

Counting days since the epoch is all well and good, but we would like to do something more useful, like knowing today’s date. Let’s use date::year_month_day:

date::year_month_day ymd{ nowInDays };
std::cout <<"Today is " << ymd << '\n';

Or we would like to know the number of days between epoch and a particular date. We could try:

date::year_month_day ymd_does_not_compile{ 2019, 5, 8 };
std::cout << "Number of days between epoch and 8 May 2019 "
          << date::sys_days{ymd_does_not_compile}.time_since_epoch().count()
          << '\n';

And it doesn’t compile. Why? Again, type safety: implicit conversions between numbers and the types in date are not allowed. This, instead, will work:

date::year_month_day ymd2{date::year{2019}, 
                          date::month{5}, 
                          date::day{8} };
std::cout << "Number of days since epoch "
          << date::sys_days{ymd2}.time_since_epoch().count()  
          << '\n';

or, using the cute API, we can have even more readable code:

using namespace date;
date::year_month_day ymd2 = 2019_y/may/8_d;
std::cout << "Number of days since epoch " 
          << date::sys_days{ymd2}.time_since_epoch().count() 
          << '\n';

Using the cute API one can create year_month_day instances in three different ways:

date::year_month_day ymd3 = 2019_y / may / 8_d;
date::year_month_day ymd4 = 8_d/ may / 2019_y;
date::year_month_day ymd5 = may / 8_d / 2019_y;

Note: the C++20 version of the library will have the same literals with some differences; for example, there will be d instead of _d and May instead of may, so you will have to write 2019y/May/8d.

We have also seen that we can convert date::sys_days to date::year_month_day and vice versa. Now, what if I want to know what day of the week today is? Sure enough, we have a type for it.

date::year_month_weekday ymwd{ nowInDays };
std::cout << ymwd << '\n';

Done! So, what about finding out the final day of the current month? Yet another type.

date::year_month_day_last ymd_last = ymwd.year()/ymwd.month()/date::last;
std::cout << ymd_last.day() << '\n';

And finally, we have year_month_weekday_last which allows us to get the last Sunday of the current month:

date::year_month_weekday_last 
ymwd_last = ymwd.year()/ ymwd.month()/date::Sunday[date::last];

std::cout << date::year_month_day{ ymwd_last } << '\n';

Lastly, we should note that all these types provide arithmetic operators and have a consistent algebra. If you cannot find the operator you need then, most likely, its use would have required an expensive conversion. In that case you are responsible for performing the appropriate conversion, executing the operation and converting the date back if needed. It might look clumsy, but this way nothing is hidden and you are responsible for making the smart choices.

Invalid dates

The library seems to work well when dealing with valid dates. What if a date does not exist? In this case, September 31st:

date::year_month_day ymd2020 = 2020_y / 8 / date::last;
ymd2020 += date::months{1};
std::cout << ymd2020 << " "<< ymd2020.ok() << std::endl;

The types of the library implement the member function bool ok() that returns false if the value is invalid. With this knowledge we can choose what to do; the library imposes no choice on us.

For example, we can decide to snap to the end of the month: we disassemble the invalid date and use its year, its month and date::last to construct a new date snapped to the last valid day of that month.

date::year_month_day snapToEndOfMonth(const date::year_month_day& d) {

  if (d.ok()) return d;
  return d.year() / d.month() / date::last;
}

But we can also decide to overflow into the next month. Here we see that the date, while invalid, can still be converted to sys_days (number of days from the epoch). The number of days can then be converted back to a valid date, as we saw in one of the first examples.

date::year_month_day overflowIntoNextMonth(const date::year_month_day& d) {
  if (d.ok()) return d;
  date::sys_days days{ d };
  return date::year_month_day{ days };
}

constexpr

constexpr declares that it is possible to evaluate a function at compile time. Looking at the date library we can see that a slew of functions are constexpr, meaning that they can be evaluated at compile time. So the compiler can decide to push to compile time the calculations for which it already knows the inputs:

constexpr date::year_month_day dyw_last = 2019_y / 8 / date::last;
constexpr auto has_31_days = dyw_last.day() == date::day{ 31 };
static_assert(has_31_days, "No");

Conclusion

I hope that this non-exhaustive tour showed you how powerful, clean and simple this library is and the kind of facilities C++20 will have for date calculations. More information regarding all the types, operators and conversions can be found in the part of the proposal detailing the proposed wording for the standard.

Vcpkg and MSVC – First impressions

If you read the first half of the title, you might think that I’m having a stroke and have fallen on the keyboard. That’s not the case: MSVC stands for Microsoft Visual C++, part of Microsoft Visual Studio, while Vcpkg is a package manager from Microsoft for third-party libraries. Here we will explore the use of Vcpkg with Visual Studio 2019 on Windows 10. Vcpkg can also be used on macOS and Linux with other IDEs and build systems.

Why a package manager?

Package managers allow us to easily install 3rd-party libraries on our systems. If you are a Python user, you know that it is incredibly simple to get a new library: you issue the command pip install new_library_I_want and the library gets downloaded, installed and is ready to use.

As far as C++ on Linux is concerned, you usually only need to use your favorite package manager (apt-get, for example) and install the development package of the library you want (normally something like library_I_want-dev). Then you might need to fiddle with your build system to convince it to work with the library, and you may hit binary compatibility issues when linking. But in my (limited) experience with Linux, I ran into issues only a couple of times.

With Microsoft Visual Studio on Windows the experience was never particularly good. NuGet was a decent alternative, but finding C++ libraries that work for the latest MSVC version has been, at least for me, complicated.

You can always download the sources of the library you are interested in and integrate them in your project: this is a good solution for medium to large codebases that use pretty stable libraries.

Vcpkg and other modern package managers for C++ (like Conan) download the sources of the library you want and integrate them with your build system.

Vcpkg brings to the table:

  1. A good number of supported libraries (more than 1000 at the moment of writing);
  2. Simple to install;
  3. Simple to use with MSVC;
  4. Cross platform (learn a tool once, use it in Linux, Windows and MacOS);
  5. It encourages the distribution of libraries as source code and not binaries.
  6. They claim support for CMake and MSBuild (both recommended) plus other build systems, like Make. I tested only the integration with MSVC.

Why I am hesitant to join the bandwagon:

  1. Having many libraries does not automatically imply being a good package manager: the libraries should also be updated. I used OpenCV for this test: until 12 days ago only OpenCV 3.4.3 (released one year ago) was available. Now OpenCV 4.1.1 is available.
  2. There are other alternatives, like Conan, which might win in the long run;
  3. Will Microsoft continue to support it? If not, is there enough of a community to keep it going?
  4. As far as I can tell the way in which you can specify an older version of a library is convoluted (basically you need to checkout an old version of the repository of the package manager).

Distributing libraries as sources has great advantages: no binary compatibility issues and the possibility of tuning the library with appropriate compiler switches. On the other hand, installation times (i.e. compiling the library) can be pretty high. Luckily, the libraries are (unless specified otherwise) installed globally and made available to all projects.

For all these reasons I would suggest using Vcpkg, for now, only to play around with 3rd-party libraries; for serious work I would suggest continuing to integrate the library sources in MSVC or directly using a binary.

Install

The installation process is pretty simple. First you should get your hands on a git client and install Microsoft Visual Studio 2019 (Community edition is available for free). Then you need to open a terminal, and clone the Vcpkg repository somewhere on your disk:

git clone https://github.com/microsoft/vcpkg.git

Enter the vcpkg directory; inside you’ll find bootstrap-vcpkg.bat. Invoke it:

bootstrap-vcpkg.bat 

At the end you should see a reassuring:

Building vcpkg.exe… done.

Ok, Vcpkg is done. Now you need to make sure that it and MSVC can talk to each other; invoke:

vcpkg.exe integrate install

My first library

Now we want to search for our first library, OpenCV:

vcpkg.exe search opencv

We obtain the following list of libraries (this was done before the update of the OpenCV library):

darknet[opencv]         Build darknet with support for OpenCV
darknet[opencv-cuda]    Build darknet with support for a CUDA-enabled OpenCV
opencv      3.4.3-9     computer vision library
opencv[opengl]          opengl support for opencv
opencv[dnn]             opencv_dnn module
opencv[ovis]            opencv_ovis module
opencv[flann]           opencv_flann module
opencv[sfm]             opencv_sfm module
opencv[contrib]         opencv_contrib module
.......................... others.............................

Now we can install with the following command, grab a cup of coffee and wait:

vcpkg.exe install opencv:x64-windows  

x64-windows is called a triplet and is used to specify the architecture. The default is x86-windows, so if you want to use the x64 platform in MSVC you need to specify the triplet manually or change the default by setting the environment variable VCPKG_DEFAULT_TRIPLET appropriately. Installing only the x64-windows triplet means that compilation will fail when using the x86 platform, hence you’ll also need to install the x86 triplet if you are targeting both. Note that I had no issue having the OpenCV x86 and x64 builds installed together.

Now, open Visual Studio, create a new project and copy-paste some example code, like this one, compile and run. Welcome to the future of C++ software development.

Conclusion

Installing Vcpkg was straightforward. Actually, I failed the first time: no matter what I did, the integration was not working. I wiped the directory, re-installed, re-integrated, and everything worked. No idea why, but long story short: I was able to install the package manager, install OpenCV and run my first example in less than 30 minutes, 15 of which were spent compiling OpenCV. I then lost another 20 minutes finding and installing the Microsoft Media Feature Pack to fix a DLL error, but that seems to be an issue unrelated to Vcpkg: Python users have the same problem.

For now Vcpkg seems to be a wonderful tool for people who want to install and test a library without too much hassle. I would like to see better integration with MSVC, with a GUI tool like the one in NuGet, and a simpler way to move between versions of a library.

I am looking forward to the evolution of these tools: if we could install libraries easily in a few clicks we could make life easier for everybody, in particular for beginners and for teachers, who could avoid going through tedious set-ups or using pre-configured virtual machines. The only concern is fragmentation: I really hope the various package managers will be, at least, easy to use together.

Coroutines: co_yield – Part 4 or many

Let’s recap: we have seen how co_await and co_return work and we have implemented an awaiter type and a return type for a coroutine. Now, building on the promise_type we previously discussed, we will see how to handle co_yield, the last of the keywords used in coroutines.

Why co_yield ?

co_yield allows us to return a sequence of values to the caller, as opposed to co_return, where we return only one element. Let’s see an example first. We will be using, as before, Microsoft Visual Studio 2019, with the /await option and the preview of the latest standard as language version (see this previous post for more details). MSVC has some handy extensions (i.e. features that might not work on other compilers) for coroutines, for example std::experimental::generator<T>. We will use it to write our first coroutine with co_yield:


#include <iostream>
#include <random>
#include <experimental/coroutine>
//MSVC extension's header
#include <experimental/generator>

//MSVC extension's class
using int_generator = std::experimental::generator<int>;

int_generator rand_numbers(std::size_t num) {

    //Random number generator between 1 and 6
    std::random_device rd;
    std::mt19937 gen{ rd() };
    std::uniform_int_distribution<> dis(1, 6);

    //core of the coroutine
    for (std::size_t i = 0; i < num; ++i) {
        auto e = dis(gen);
        co_yield e;
    }
}

int main()
{
    for (auto val : rand_numbers(10))
        std::cout << val << std::endl;
}

As we can see, co_yield returns a value to the caller and suspends the coroutine. This can be useful to create a stream of lazily generated values that the user can inspect one after another, without allocating a whole container and returning all the values in one go.

The promise_type

According to N4775 the yield expression (co_yield e in our case) is equivalent to

co_await promise_type_instance.yield_value(e);

So here we have the first ingredient: our promise_type must have a new member function, yield_value, taking as input the value we want to yield and returning an awaitable type. The rest is similar to the promise_type we saw before, so let’s write our int_generator type and the corresponding promise_type, starting with the usual boilerplate. We do not need move or copy, so we delete them; if they are needed, one has to be very careful: the handle is basically a raw pointer and has to be handled accordingly.

struct int_generator {
	int_generator(int_generator const&) = delete;
	int_generator(int_generator&&) = delete;

	int_generator& operator=(int_generator&&) = delete;
	int_generator& operator=(int_generator const&) = delete;

Now we can define the promise_type, and the handle.

	struct promise_type;

	using handle = 
                std::experimental::coroutine_handle<promise_type>;

We implement the get_return_object() to construct an int_generator from a promise_type

	struct promise_type {
		
		auto get_return_object() {
		  return int_generator{ 
                                 handle::from_promise(*this) };
		}

We can now define yield_value. We return suspend_always because we want to suspend the execution, returning control to the caller after each value is produced.

                auto yield_value(int value) {
		  m_value = value;
		  return std::experimental::suspend_always{};
		}

Then we add the usual boilerplate: return_void, because flowing off the end of the coroutine is equivalent to calling co_return. We also implement initial_suspend and final_suspend, both returning suspend_always, because we want to suspend before generating the first value and after generating the last one. final_suspend needs some further clarification: when the coroutine is suspended at the final suspension point, the member function done() of the handle returns true. We will use this information to determine whether we are done iterating.

		void return_void() { }
		
		auto initial_suspend() {
		  return std::experimental::suspend_always{};
		}

		auto final_suspend() {
		  return std::experimental::suspend_always{};
		}

		int m_value{ 0 };
	};

Here we have other boilerplate code: constructor, destructor, and a get() similar to the one we already saw for task (remember that the handle has the member function promise() for accessing a reference to the corresponding promise). Then we have the next() member function, which resumes the coroutine and returns false if we have reached the end. It is important to note that get() and next() are user-facing member functions, not defined by the standard, so we can define and use them the way we please. Finally, we have the private member variable coro: the coroutine handle.

int_generator(handle h) : coro{h} {}

~int_generator() {
   if (coro)
     coro.destroy();
}

int get() {
   return coro.promise().m_value;
}

bool next() {
  if (!coro)
    return false;

  coro.resume();
  return !coro.done();
}

private:
     handle coro;
};

Using these types we can print the values yielded by the coroutine:

int main()
{
	auto coro = rand_numbers(10);
	while (coro.next()) {
		std::cout << coro.get() << std::endl;
	}
}

This code looks like the Java code I had to write during Computer Science 101: bad memories. Let’s add C++ iterators so that we can have a nice range-based for loop.

Iterators

Each iterator instance has a handle as member variable, but the iterator does not own the coroutine frame, so it does not need to manage the frame’s lifetime.

struct int_generator{
......
	struct iterator {

		handle coro;

Here is the usual pre-increment operator, which takes care of resuming the coroutine; if the coroutine is done (suspended at the final suspension point) the handle is set to nullptr. This is useful for the equality comparison with end(), as we will see later.

    iterator& operator++() {
        coro.resume();
        if (coro.done())
            coro = nullptr;
        return *this;
    }

We can implement the dereference operator to extract the yielded value:

    const int& operator*() const {
       return coro.promise().m_value;
    }

Now we have to define the equality operators between iterators. According to N4775 §21.11.2.6, comparing coroutine handles is equivalent to comparing the output of the member function address() of a handle. For our use case we need to be able to tell when an iterator has reached the end(), and this is why we set the handle to nullptr when the coroutine is done: if the handle is set to nullptr, address() returns nullptr (see §21.11.2.1), so we can use nullptr to mark the end of the sequence of values.

    bool operator==(const iterator& other) const {
        return coro == other.coro;
    }

    bool operator!=(const iterator& other) const {
        return !(operator==(other));
    }

}; //end of iterator 

Finally, we simply need to add begin() and end() to the int_generator class. Again, nullptr marks the end() of the sequence of values.


iterator begin() { iterator it{ coro }; ++it; return it; }
iterator end() { return iterator{ nullptr }; }

Note that in the iterator I skipped a bunch of operators (post-increment, operator->, etc.) that might be useful in general but were not needed in this example. The complete code of this second version (with iterators) is available on my github.

Conclusion

We saw some examples that I hope have shed some light on coroutines. I would like to point out that Lewis Baker has published cppcoro, an industrial-strength implementation of useful types for coroutines: if you need to use coroutines in production, take a look at this library before rolling your own types. In the next articles we will take a look at more complicated features of coroutines; for example, we will revisit the concept of Awaitable.

Time in C++ – present and future – Part 1

I have always been interested in how computers deal with the messiness of the human way of counting time: months of different lengths, 60 minutes in an hour, 24 hours in a day; it’s logical, isn’t it?

My fascination for the subject was met by my disgust (I really struggled to find a less harsh word; I failed) for the tools C++ provided us with, which were, before C++11, not particularly good (and this is a euphemism). To be blunt, C++ had no tools to speak of: it had the C header <ctime>, and in that header we find this masterpiece to convert the epoch (seconds from 1/1/1970) to a date:

 struct tm *localtime( const time_t *time ); 

The inattentive reader might think that this is a C API that returns a heap-allocated pointer: so let’s wrap it in a std::unique_ptr with a custom deleter, or in a wrapper class, and be done with it, no? No! From the documentation:

The result is stored in static storage and a pointer to that static storage is returned.

[…]

This function localtime may not be thread-safe.

From: https://en.cppreference.com/w/c/chrono/localtime

Reading this paragraph made me lose a couple of years of life. So, before C++11, dealing with time was complex, potentially non-thread-safe and not particularly nice. There are extensions that address the lack of thread-safety: localtime_r for POSIX, localtime_s for Windows (not to be confused with the C11 function of the same name) and other similarly named functions.

The rest of the articles in this series will summarize some videos from Howard Hinnant who, after giving us move semantics, threads, mutexes and std::unique_ptr, also gave us <chrono> and continues to work on date and timezone support for C++20.

The videos will be presented in logical order, not in chronological order (ironic, given the subject at hand).

We will start with “A <chrono> tutorial – CppCon 2016”.

Durations (C++11/14/17)

In the above video we are introduced to the goodies of the <chrono> header in its C++14 incarnation (with a small detour to C++17). The mantra that is repeated during the presentation is: “If it compiles, it works”, meaning that the type system is designed to enforce as many checks as possible at compile time.

Key features

  1. The library provides and allows the definition of duration types (e.g. hours, seconds, etc.);
  2. Library-provided durations behave like signed built-in integral types, i.e. they can be declared without being initialized. So, please, make sure you initialize them;
  3. Implicit conversions (and mixed operations, like sums) between built-in types (int, float) and duration types are not possible; the appropriate constructors must be called. This makes sense: if I write 2 in my code, what do I mean? Seconds, hours, bananas? Suppose implicit conversions were possible: in a + 2, the amount of time expressed by 2 would depend on a. Is a in seconds? Then 2 is seconds. We refactor and a becomes milliseconds; now 2 is milliseconds. Refactoring becomes a nightmare;
  4. The duration classes take care of converting types, for example you can compare or sum seconds and milliseconds, the correct transformations will be done for you;
  5. Lossless (no truncation error) conversions are implicit (e.g. from seconds to milliseconds); potentially lossy conversions (e.g. milliseconds to seconds) require explicit casts: duration_cast<target_type> in C++11, plus floor, ceil and round in C++17;
  6. The member function count(), which extracts the number of units (ticks) stored in a duration, is discouraged and should be used only for I/O and interaction with legacy code.

Why all these constraints? Because using, for example, int for seconds and relying on variable names or comments to make sure that the user of your library provides the right units is brittle. Furthermore, refactoring a codebase that relies on these conventions is a nightmare.

Below a small code snippet to summarize what we saw so far:

#include <chrono>
#include <iostream>

//Works: implicit lossless conversion from s to ms
std::chrono::milliseconds sum1(std::chrono::seconds m1, std::chrono::milliseconds m2) {
    //here m1 * 1000 is done by the type system
    return m1 + m2;
}

//Does not compile if uncommented: no implicit lossy conversion from ms to s
//std::chrono::seconds sum2(std::chrono::seconds m1, std::chrono::milliseconds m2) {
//    return m1 + m2;
//}

//Does not compile if uncommented: no implicit conversion from built-in types
//std::chrono::seconds sum3(std::chrono::seconds m1, int m2) {
//    return m1 + m2;
//}

int main()
{
    std::chrono::seconds t1{ 4 };
    std::chrono::seconds t2{ 4 };
    std::chrono::milliseconds s = sum1(t1, t2);

    //Last resort: count() to extract the time in ms
    std::cout << s.count() << " ms\n";
}

Lifting the hood

Like any C++ feature, expert developers can lift the hood of the machinery provided by <chrono> and fiddle with it. std::chrono::seconds and the other durations are aliases for:

template<
    class Rep, 
    class Period = std::ratio<1> 
> class duration;

Where Rep is the representation, i.e. the underlying type storing the duration (for example int). Rep can also be floating point: in that case all conversions are considered lossless (no truncation error, only rounding errors) and are therefore implicit.

Period is way more interesting and gives us an idea of how the type system works: it is a fraction representing the duration of one tick in seconds, so a millisecond has Period = std::ratio<1, 1000>, while a minute has Period = std::ratio<60, 1>. One can create one's own duration, say a floating-point half-day made of 12 hours (see below).

std::ratio is a core building block of the compile-time machinery that makes <chrono> so efficient. When summing two durations with different ratios, the least common denominator is calculated at compile time, leaving only a couple of multiplications and a sum to be performed at run time. Below is a simple example where we sum a custom floating-point duration spanning 12 hours and 8 hours expressed with a library-provided duration. The output is, as expected, 20 hours.

#include <chrono>
#include <iostream>

int main() {

    auto constexpr secondsIn12Hours = 43200;
    using halfDay = std::chrono::duration<float, std::ratio<secondsIn12Hours, 1>>;

    halfDay workedTime{ 1 };

    std::chrono::hours sleptIn{ 8 };

    std::chrono::hours totalTime =
        std::chrono::duration_cast<std::chrono::hours>(workedTime + sleptIn);

    std::cout << "Total time used " << totalTime.count() << '\n';
}

We have analyzed duration: basically, we can handle the concept of “how long” in C++. But what about the concept of a point in time, like “now”? clock and time_point to the rescue.

Clocks (C++11/14/17)

The clocks defined in <chrono> allow us to get the current time thanks to the static member function now(). A clock can be steady (i.e. its tick is constant and it will never be adjusted) or not. A clock counts the number of ticks (e.g. seconds) from a point in time called the epoch.

  • std::chrono::steady_clock: this clock is steady and should be used as a stopwatch, to measure intervals;
  • std::chrono::system_clock: this represents the wall clock of the system. Usually (mandated from C++20) it uses 00:00:00 UTC, 01/01/1970 as epoch (starting point);
  • std::chrono::high_resolution_clock: clock with the smallest tick, usually an alias for one of the two previous ones;

Time Points (C++11/14/17)

A clock and a duration, together, define a time_point: the clock defines the epoch (i.e. instant zero) and the Duration expresses how many ticks have elapsed from the epoch.

template< 
    class Clock, 
    class Duration = typename Clock::duration 
> class time_point;

We can cast one time_point that uses seconds as ticks to a time_point that uses hours with time_point_cast (C++11) or floor, ceil and round (C++17). The duration from the epoch can be retrieved thanks to the member function time_since_epoch().

Time points and durations have a consistent algebra, enforced by the type system: two time points can be subtracted to obtain a duration, but they cannot be summed. A duration can be added to a time point.

#include <chrono>
#include <iostream>
#include <thread>

int main() {
	
	std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
	std::this_thread::sleep_until(t1 + std::chrono::seconds{ 2 });
	std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

	std::chrono::steady_clock::duration duration = t2 - t1;
        //summing two time points does not work!
	//auto does_not_work = t1 + t2;

	auto works = duration + std::chrono::steady_clock::now();

	//We need cast because we don't know, in general, the duration type of the clock
	std::cout << "Time elapsed " << 
		std::chrono::duration_cast<std::chrono::milliseconds>(duration).count() 
		<< " ms\n";


	//Two equivalent ways of casting

	//Cast on duration
	std::chrono::system_clock::time_point sysTime = std::chrono::system_clock::now();
	std::cout << "Hours elapsed from 1970 " <<
		std::chrono::duration_cast<std::chrono::hours>(sysTime.time_since_epoch()).count()
		<< '\n';

	//Cast the time point
	auto sysTime2 = std::chrono::time_point_cast<std::chrono::hours>(std::chrono::system_clock::now());
	std::cout << "Hours elapsed from 1970 " <<
		sysTime2.time_since_epoch().count()
		<< '\n';
}

Takeaways

  1. The wrapper classes have (with the appropriate optimization switches enabled) the same performance as the built-in types;
  2. The type system takes care of the conversions;
  3. Using count() to retrieve the number of ticks should be done as a last resort, only for I/O or compatibility with legacy APIs;
  4. The type system is easy to expand and user-defined durations are first-class citizens.

Next time we will peek at the future, we will analyse how we will be able, using the same type system, to handle days, months and years.

Coroutines: co_return – Part 3 of many

We saw, in the previous post, the connection between co_await and an Awaiter type: co_await allows the coroutine to suspend itself while pushing the coroutine handle into the Awaiter. The handle can be used to resume the coroutine. The Awaiter is basically a channel between the coroutine and the asynchronous process which does the work while the coroutine is suspended.

co_return, on the other hand, allows the coroutine to return to the caller, optionally giving back a value. A problem now arises: when the coroutine returns to the caller, the real return value might not be available. So we need to return to the caller a placeholder for such value. Furthermore this placeholder type must be able to get the result from the coroutine once it is ready. We used std::future to perform such a task in the previous article, thanks to an MSVC extension. This is not good because:

  1. It is not portable code, everything outside the standard might work differently on other compilers or not work at all;
  2. std::future might be overkill for what we are trying to achieve.

Why is std::future an overkill?

std::promise and std::future share a (reference counted) common state, and they need synchronization primitives to make sure that std::future doesn’t read from the shared state before the value is stored.

Shamelessly stolen from Gor Nishanov’s CppCon 2015 slides

Now, assume that we are using an std::future as return type: the coroutine frame is typically heap allocated, like the std::future shared state, so why not store the shared state directly in the coroutine frame? To do this we need to study how a promise type should be implemented. Again, the standard tells us how to write the type, but does not provide an implementation of it.

co_return

Let’s go back to our (contrived) example and let’s define a task class, that will have a get() member function to access the result of the calculation. In our example all the calculations are done on the main thread, so, by the time we call the get() member function, the task doesn’t need to synchronize: we know that the result is ready. Let’s review our coroutine, where now we are using task instead of std::future<int>.

task coRoutine(std::string name) {
	int sum = 0;
	while (sum < 20) {
		int input_data = rand() % 5;
		const auto s = co_await Awaiter{ input_data };
		std::cout << name << " received " << s << '\n';
		sum += s;
	}
	co_return sum;
}

Let’s use the same trick we used to understand Awaiter: let’s see what code will be injected by the compiler and then derive what the types we use should look like (code adapted from James McNellis’s slides and N4775):

task coRoutine(std::string name) {

   task::promise_type _promise;
   task task_to_return = _promise.get_return_object();
   co_await _promise.initial_suspend(); 

  try { 
           //..... body of the coroutine....

          //code injected for co_return
          _promise.return_value(sum);

      } catch(...) { _promise.unhandled_exception(); }

 final_suspend:
  co_await _promise.final_suspend(); 
}

An instance of type task will be returned to the caller the first time the coroutine suspends itself. The _promise which corresponds to the task will then be used to provide a return value when co_return is invoked. So we need to implement the following member functions for the promise_type:

  1. auto get_return_object(): allows the promise to return an instance of the corresponding return type, task in our case;
  2. auto initial_suspend(): returns an awaitable to allow a first suspension point immediately at the beginning of the function body;
  3. Now we have a choice, does the coroutine return a value or is it void?
    • void return_value(T value) is called by the coroutine if co_return returns a value;
    • void return_void() is called if co_return returns void (or if there is no co_return).
    • Important: you cannot implement both; if you do, the program is ill-formed. Furthermore, if you implement return_value (i.e. you want to return a value) and a path of execution flows off the end of the coroutine without a co_return some_value, the behavior is undefined.
  4. auto final_suspend(): returns an awaitable to allow a last suspension point immediately before the destruction of the coroutine frame;
  5. void unhandled_exception(): called if the body of the coroutine throws an exception (we won’t be talking about it for the moment).

Now we know what to do for what concerns promise_type. What about task? How do we establish the connection between promise_type and task? The task will own an instance (we’ll call it coro) of the type std::experimental::coroutine_handle<promise_type>. This type inherits from std::experimental::coroutine_handle<> (see previous post). We will restrict our analysis to two new member functions:

  1. promise_type& promise(): given a handle, we can get a reference to the promise stored in the frame (the instance _promise we saw in the example);
  2. static
    coroutine_handle<promise_type> from_promise(promise_type&): given a reference to _promise we can get a handle to the corresponding coroutine.

Lastly, a word about coroutine_handle<> and coroutine_handle<promise_type>: these are handles, non-RAII types. If you have a class owning one (like task) you have to treat the handle like a raw pointer: you need to write the destructor, move and copy constructors, and assignment operators for the owning class. Speaking of destruction: in order to destroy the coroutine frame you might need to call the destroy() member function of the handle (more on this below).

Let’s put all together

So, a small recap: we will be writing a type, called task, which owns an instance of a coroutine_handle. Inside the task we will define a promise_type where the coroutine will store the return value once available. We will make task non-movable and non-copyable, given that we do not need these features now, just to avoid dealing with the handle, which is non-RAII. The following code is an evolution of the one provided in the previous post; regardless, I created a new project to make everything self-contained.

struct task {
	struct promise_type;

	task(task const&) = delete;
	task(task&&) = delete;

	task& operator=(task&&) = delete;
	task& operator=(task const&) = delete;

	//std::coroutine_handle<promise_type> in the future
	using handle = 
                std::experimental::coroutine_handle<promise_type>;

	struct promise_type {

		auto get_return_object() { 
			return task{ handle::from_promise(*this) }; 
		}
		
		void return_value(int value) { 
			m_value = value; 
		}

		auto initial_suspend() { 
			return std::experimental::suspend_never{}; 
		}
		
		auto final_suspend() { 
			return std::experimental::suspend_always{}; 
		}

		int m_value{ 0 };
	};

	task(handle h) : coro(h) {}

	int get() {
		return coro.promise().m_value;
	}

	~task() {
		if (coro)
			coro.destroy();
	}

private:
	handle coro;
};

Analyzing get_return_object we can see that from_promise allows us to create a task owning the handle to the coroutine from an instance of promise_type.

The task::get() member function can access the instance of the promise_type inside the coroutine frame thanks to promise(): a member function of the coroutine_handle. Once we have access to the promise, we can also access its value (m_value).

Lifetime issues

Some words on lifetime: why do we return an instance of suspend_always from final_suspend? suspend_always is one of the trivial types the STL provides us with, together with suspend_never. suspend_never never suspends: invoking co_await on it does not suspend the coroutine. suspend_always instead suspends the coroutine and requires resume() to be called.

So, why suspend_always? We need to keep in mind that the coroutine frame is destroyed by either:

  1. The call to the member function of the coroutine_handle, destroy();
  2. Resumption from the final suspension point.

Whichever comes first. Note that calling destroy() on a non-suspended coroutine is undefined behavior. So returning a suspend_always allows us to keep the frame alive until the corresponding instance of task is accessible. When task is destroyed, the coroutine frame is destroyed as well; see the implementation of ~task().

In my github repo you can find code where I added an instance of the class noisy to the promise_type. noisy prints when it is created and destroyed. Using suspend_always we can see that the destruction of the promise happens after main accesses the data. Using suspend_never, instead, shows that the destruction happens before the task::get() invocation, leading to UB (and usually a crash).

Next

We’ll discuss co_yield, the last of the keywords related to coroutines.

Coroutines: co_await – Part 2 of many

Let’s start with some good news: the coroutine boat is sailing without many issues through the final steps of the standardization process. Fingers crossed. There are still some who feel that there is room for improvement, but it looks like the feature won’t be dropped.

In the previous part of this series we discussed the general idea of a coroutine. Now we will discuss the operator co_await and the type Awaiter.

In general co_await should be invoked on an Awaitable type; the simplest of these is an Awaiter type. All the other Awaitable types are Awaitable because they can be converted to an Awaiter type, so let’s first focus on understanding an Awaiter type.

co_await can suspend the execution of the coroutine. Once the asynchronous operation represented by the Awaitable type is completed, the coroutine can be resumed and it can use the results of the asynchronous operation. One can imagine the Awaiter to be a channel between the coroutine and the logic that asynchronously provides the result.

Reminder: the Standard Library, except for a few trivial types, does not provide support for coroutines. The Standard describes the member functions an Awaiter type should provide, but you have to write the type yourself.

An Awaiter type

The following is an example of an Awaiter type

struct Awaiter {
	bool await_ready();

	//in the future most likely std::coroutine_handle<>;
	using coroHandle = std::experimental::coroutine_handle<>;
	void await_suspend(coroHandle handle);

	auto await_resume();
};

coroutine_handle<> is a type that provides a way to interact with the coroutine. For now, we care only about the resume() member function: it allows us to, you guessed it, resume the coroutine.

In the body of a very simple coroutine the Awaiter type could be used in this way:

some_return_type coro(){
    ....
    Awaiter a{};
    int i = co_await a;
    ....
}

To understand how we should implement Awaiter, we must first see what code the compiler injects in place of co_await (adapted from James McNellis slides):

if (!a.await_ready())
{
  a.await_suspend(coroutine_handle);
  // ...suspend/resume point...
}
int i = a.await_resume();

First we check if await_ready returns true. If that is the case we know that the result of our async operation is already available, therefore we do not need to go through the expensive process of suspending ourselves.

await_suspend is a bit trickier: it is used to inject the coroutine_handle in the logic that will take care of resuming the coroutine. This logic will invoke the resume() member function on the handle once the result of the computation is ready.

await_resume takes care of providing the result of the asynchronous operation.

Let’s put all together

Now we can try to put everything together. The code I will present will compile on MSVC (Microsoft Visual Studio) 2019. You will need to add the /await switch to your solution: right click on the project and select Properties. In the window that appears, select C/C++ on the left and then “All Options”. In the table in the center of the screen add /await to the “Additional Options” line and, under “C++ Language Standard”, select “Preview – Feature from the Latest …”.

All the code and the MSVC project is available in my github repo.

We will be writing a very simple Awaiter that pushes a value to a worker and suspends. The worker does some (extremely complex :-)) calculations and once the data is ready the coroutine is resumed.

Let’s see what the code does: Awaiter::await_suspend pushes the current coroutine handle and data to the worker. The worker resumes the coroutine once the result is ready. await_ready always returns false because in this use-case we must always suspend: the worker cannot have the result of the computation ready without us providing the input. inData is the input data to be provided to the worker.

class Awaiter {
public:
	
	Awaiter(int inData) : inData(inData) {}

	bool await_ready()
	{
		return false;
	}

	using coroHandle = std::experimental::coroutine_handle<>;
	void await_suspend(coroHandle handle)
	{
		Worker::getInstance().suspend(handle, inData);
	}

	int await_resume()
	{
		return Worker::getInstance().getData();
	}

private:
	int inData;
};

The body of the coroutine looks something like this; Awaiter::await_resume is invoked by the injected code in order to initialize const int s.

std::future<int> coRoutine(std::string name){
  int sum = 0;
  while (sum < 20) {
     int input_data = rand() % 5;
     const int s = co_await Awaiter{input_data};
     std::cout << name << " received " << s << '\n';
     sum += s;   
   }
   co_return sum;
}

For now we are using the extension that allows us to return a std::future from a coroutine. Furthermore, we are not going to explain how co_return works for now; simply think that behind the scenes co_return is setting the value of the promise corresponding to the returned future.

The rest of the code is quite straightforward: we iterate a number of times, every time we suspend on co_await and once the result is ready we are resumed by the worker (which is a Singleton, which might be bad, but I can save my reputation saying that it is just for illustration purposes).

Let’s take a look at the worker (which runs on the main thread), just for completeness:

class Worker {

public:

	static Worker& getInstance() {
		static Worker worker;
		return worker;
	}

	using coroHandle = std::experimental::coroutine_handle<>;

	void suspend(coroHandle handle, int inputData) {
		m_suspendedCoros.push({handle, inputData});
	}

	int veryComplexOp(int inputData) 
            { return std::rand() % 10 + inputData; }

	void run() {
		while (!m_suspendedCoros.empty()) {
			auto [coro, inputData] = 
                                         m_suspendedCoros.front();
			m_suspendedCoros.pop();
			outputData = veryComplexOp(inputData);
			coro.resume();
		}
	}

	int getData() const { return outputData; }

private:

	std::queue<std::pair<coroHandle, int>> m_suspendedCoros;
	int outputData{ -1 };
};

It keeps a queue of suspended coroutines and the corresponding input values. It is executed by calling the member function run(), which exits once there are no more suspended coroutines. When computational resources become available, the worker pops a coroutine and its input value (see run()), performs the calculation (veryComplexOp()), stores the result in outputData and resumes the coroutine via the resume() member function of the handle. For the loop not to exit immediately, run() is called in main() after the invocation of the coroutines.

A word on std::future

In the next part of the series we’ll see how we should implement our own return type. Why should we? std::future works out-of-the-box in MSVC and can be adapted for the other compilers (see for example this talk). We should implement our own type for a variety of reasons: firstly, for a single-threaded case like our contrived example, all the synchronization machinery that std::future carries is useless to us. More generally, std::future might be overkill for many use cases: the frame of the coroutine (the area of memory containing its local variables) is heap-allocated (this can be customized, we’re talking about C++ after all), but the shared state between promise and future is heap-allocated too. Wouldn’t it be more efficient if we could:

  1. Store the common state directly in the coroutine frame?
  2. Keep the common state alive until we get the data?

This can be done with a custom-made return type that we will discuss next time.

A deeper look at await_suspend

Reading the TS we can see that await_suspend can have three different return types:

  1. void: the case we saw already, suspend the coroutine unconditionally and wait to be resumed;
  2. coroutine_handle: upon suspension of the current coroutine, the returned coroutine handle will be resumed. This is useful in cases where the current coroutine already knows which coroutine should be resumed next and we want to avoid paying the price of relinquishing control to the caller. This is called symmetric coroutine to coroutine transfer, for more details see this video.
  3. bool: the feature was first described, as far as I know, in Gor’s CppCon 2015 talk. If you return false the coroutine will not suspend (note that this is the opposite of the value await_ready should return to avoid suspension). This is useful in the case where we cannot check whether the result is ready without also being ready to suspend. For example, the worker might have a result immediately ready, but we cannot find out without enqueuing our handle (hopefully the worker is smart enough to discard the handle if a result is ready). If await_suspend returns bool, the injected code looks something like this:
if (!a.await_ready() && a.await_suspend(coroutine_handle))
{
  // ...suspend/resume point...
}
int i = a.await_resume();

Next time

Next time we will write our very own return type and we will explore how co_return actually works.

Mutex in C++: the story so far – Part 2

In the previous part of this series we discussed the main primitives available in C++11 to lock shared resources. I reiterated (too many times?) that if you call directly unlock on a std::mutex in your code, you are most likely wrong.

std::lock_guard

std::lock_guard is the simplest and most efficient wrapper for mutex-like classes. Its interface is:

template< class Mutex >
class lock_guard; 

We can already notice something interesting: it is templated, so if your codebase already has a mutex-like class (BasicLockable), you can use lock_guard out of the box. Let’s see the simplest use case: lock_guard locks the mutex in its constructor and unlocks it in its destructor.

struct TSafeContainer{

  void set(std::string inData){
      //This is equivalent to mx.lock()
      std::lock_guard<std::mutex> guard{mx};  
        
       data = inData;

     /* guard goes out of scope, in its destructor mx.unlock() is 
     called */
  }

  std::string data;
  std::mutex mx;
};

std::lock_guard is a RAII wrapper. Fun fact: RAII is, most likely, the most powerful and worst-named feature of the language. In simple terms, RAII means that if a class owns a resource, the destructor of that class is responsible for releasing the resource. In our case, the mutex is the resource and it gets released at the destruction of the lock_guard.

Why should you use lock_guard? To avoid writing a line? Not only that. The line data = inData can potentially throw. If it throws, and you are not using lock_guard, nobody will ever call unlock() and any other thread trying to obtain ownership of the lock will stay blocked, forever. Your application will hang, your customers will be mad, and it can take days to debug these issues in a big enough codebase.

But, but, but, I hear the people scared by RAII say: what if the instructions in-between can never throw? TODAY they cannot throw, but before you know it, an innocent-looking C-style function might get rewritten in C++, or a noexcept bestowed upon a function might get removed once the author realizes that it is too restrictive. As far as I know, the use of std::lock_guard incurs no performance cost on decently good compilers, so you should use it.

But, but, but, I hear the more attentive readers say: what if we need to use, for example, the member function try_lock or the free function std::lock() (see below)? lock_guard has another constructor:

lock_guard( mutex_type& m, std::adopt_lock_t t );

This allows us to construct a lock_guard on a mutex we already own: the constructor will not lock, but the destructor will unlock. If the mutex is not owned by the current thread, we are in Undefined Behavior territory. Let’s add a new member function to our previously defined class to write a small example:

  bool trySet(std::string inData){
       if(!mx.try_lock())
          return false;
       //mx is locked already, create a guard but without locking
       std::lock_guard<std::mutex> guard{mx, std::adopt_lock};  
        
       data = inData;

     /* guard goes out of scope, in its destructor mx.unlock() is 
     called */
       return true;
  }

std::adopt_lock is a global variable of type std::adopt_lock_t, see https://en.cppreference.com/w/cpp/thread/lock_tag .

Note: you might want to use unique_lock in this case, unless performance considerations (and measurements) suggest otherwise, see below.

Summary:

  1. lock_guard can be used with any class implementing lock() and unlock() member functions;
  2. lock_guard unlocks the wrapped mutex in its destructor;
  3. lock_guard usually locks the wrapped mutex in its constructor; this behavior can be disabled with std::adopt_lock

std::unique_lock

std::lock_guard, while very simple and lightweight, covers a great deal of the possible use cases. std::unique_lock is bigger and possibly slower, but provides more flexibility with the same RAII guarantees we saw before.

std::unique_lock provides lock() and unlock() member functions. This allows more flexibility: in a function we can, for example, lock(), execute some operations, unlock(), modify non-shared resources and then lock() again. The member function

 bool owns_lock() const noexcept 

can be used to check whether the associated mutex is owned; if the mutex is owned at the moment of destruction, std::unique_lock will unlock it in its destructor.

So we can already see that std::unique_lock might incur performance penalties: it has a bigger footprint because it must “remember” whether or not it owns the mutex, and the logic in the destructor is more complex.

Furthermore, the mutex member functions try_lock(), try_lock_for() and try_lock_until() can be invoked through the corresponding member functions of std::unique_lock.

The constructors of std::unique_lock provide many functionalities; the most basic ones provide the same functionality as lock_guard:

//lock mutex on construction
explicit unique_lock( mutex_type& m );

/* assume lock is already owned, invoke using global variable std::adopt_lock */
unique_lock( mutex_type& m, std::adopt_lock_t t );

Next we have three constructors that allow us to try to lock the mutex using the try_* member functions we saw before:

/* like using try_lock, invoke using the global variable std::try_to_lock */
unique_lock( mutex_type& m, std::try_to_lock_t t );

//like using try_lock_for
template< class Rep, class Period >
unique_lock( mutex_type& m, 
             const std::chrono::duration<Rep,Period>& timeout_duration );

//like using try_lock_until
template< class Clock, class Duration >
unique_lock( mutex_type& m, 
             const std::chrono::time_point<Clock,Duration>& timeout_time );

Lastly we have a constructor allowing deferred locking: the unique_lock is created but it does not own the mutex; lock() should be invoked when needed. The constructor is invoked using the global variable std::defer_lock.

unique_lock( mutex_type& m, std::defer_lock_t t ) noexcept;

std::lock and std::try_lock

std::lock and std::try_lock are two free functions that allow us to lock (or try to lock) a number of Lockable objects. Both functions work with std::mutex as well as std::unique_lock; we suggest passing std::unique_locks constructed with std::defer_lock: if you add a new mutex, it is likely that you will remember to add it to the std::lock call, but you might forget to create the appropriate guard.

std::try_lock invokes the try_lock member function of the N passed Lockable objects following the argument order. If the call fails for the k-th object (0-based), all the previously acquired locks are released and k is returned; otherwise -1 is returned.

Attention:

/* THIS IS WRONG! -1 signals success, but the condition is also true for every locking failure except one on the first element (which returns 0)! */
if(std::try_lock(....)){//do something with the shared resource}

//THIS IS RIGHT! 
if(std::try_lock(....) == -1){//do something with the shared resource}

std::lock, on the other hand, calls lock(): the order is unspecified but it is guaranteed not to produce a deadlock against other invocations of std::lock on the same set of arguments (even if the order is different). To understand why we need std::lock(), below there is a small example: if you run it, you will end up, sooner or later, in a deadlock: t1 will obtain ownership of mutex b, but before it manages to own a, t2 will get it. Now t1 needs a resource owned by t2 to continue, and t2 needs a resource owned by t1 to proceed. This is a deadlock.

#include <mutex>
#include <cmath>
#include <thread>
#include <chrono>
#include <iostream>

std::mutex a;
std::mutex b;

int main() {

	std::thread t1{
		[]() {
			while (true) {
				{
					std::lock_guard<std::mutex> lk_b{ b };
					std::lock_guard<std::mutex> lk_a{ a };
					std::cout << "T1 in critical region\n";
				}
				std::this_thread::sleep_for(std::chrono::seconds(rand() % 2));
			}
		}
	};

	std::thread t2{
		[]() {
			while (true) {
				{
					std::lock_guard<std::mutex> lk_a{ a };
					std::lock_guard<std::mutex> lk_b{ b };
					std::cout << "T2 in critical region\n";
				}
				std::this_thread::sleep_for(std::chrono::seconds(rand() % 2));
			}
		}
	};

	t1.join();
	t2.join();
}

On the other hand, using std::lock prevents the issue, even though we provide the Lockable objects in a different order.

#include <mutex>
#include <cmath>
#include <thread>
#include <chrono>
#include <iostream>

std::mutex a;
std::mutex b;

int main() {

	std::thread t1{
		[]() {
			while (true) {
				{
					std::unique_lock<std::mutex> lk_a{ a, std::defer_lock };
					std::unique_lock<std::mutex> lk_b{ b, std::defer_lock };
					std::lock(lk_a, lk_b);
					std::cout << "T1 in critical region " << lk_a.owns_lock() << " " << lk_b.owns_lock() << '\n';
				}
				std::this_thread::sleep_for(std::chrono::seconds(rand() % 2));
			}
		}
	};

	std::thread t2{
		[]() {
			while (true) {
				{
					std::unique_lock<std::mutex> lk_a{ a, std::defer_lock };
					std::unique_lock<std::mutex> lk_b{ b, std::defer_lock };
					std::lock(lk_b, lk_a);
					std::cout << "T2 in critical region " << lk_a.owns_lock() << " " << lk_b.owns_lock() << '\n';
				}
				std::this_thread::sleep_for(std::chrono::seconds(rand() % 2));
			}
		}
	};

	t1.join();
	t2.join();
}

Howard Hinnant provided an interesting paper on proposed implementations of the function and some useful clarifications. The basic idea of a possible std::lock implementation is to try to lock in a particular sequence; in case of failure (detectable thanks to try_lock, which is required by Lockable) one can relinquish ownership of all the already-owned locks, allowing other threads to continue and possibly release some of theirs, and try again, maybe following a different order.

Note: If you have C++17 consider using std::scoped_lock instead.

If an exception is thrown during the locking attempt both std::lock and std::try_lock will release the already owned locks before propagating the exception.

std::call_once

std::call_once allows you to make sure that a function is called exactly once, even in a multi-threaded environment:

std::once_flag flag;

void func(){
    std::call_once(flag, myFunc, arg1, arg2,...);
}

No matter how many threads call func(), myFunc will be called only once, unless it throws an exception. flag allows call_once to determine whether a previous successful (i.e. no exceptions thrown) invocation has already completed, so the same flag instance must be shared among the various concurrent threads. If myFunc throws, flag is not set and another invocation will be attempted.

Note: one might be tempted to use call_once to initialize singletons. The Meyers Singleton is a simpler (and possibly faster) alternative, details here.

Next

We will be moving to the features provided by C++14.

Coroutines: an introduction – Part 1 of many

During the previous weeks I tried to play around with the Coroutine TS (Technical Specification) implementation currently available in Microsoft Visual Studio. Coroutines were adopted for C++20 in February 2019, but the implementations still have to be considered experimental. Until the standard is final, changes can still be made, up to dropping the feature, but hopefully we will get coroutines in C++20. Namespaces and include paths will change and some details might change as well, so the code snippets presented in the next parts of the series target MSVC (Microsoft Visual Studio) 2019 version 16.1.6.

Motivation

Online one can find many interesting tutorials on coroutines. So, why am I writing this series?

  1. I am selfish: I need to clarify some concepts myself, so writing them down is useful to me;
  2. Many tutorials use custom-made libraries. I want small code snippets that I can run on MSVC to begin with. Then, little by little, I want to remove the MSVC-specific extensions to have, at the end, something that is standard compliant.

Coroutines, as of C++20, are meant to be a low-level feature, intended mainly for library writers. The standard library does not provide types that support writing coroutines (except some trivial ones, like suspend_never). So if you think you can come in, drop a yield (or the C++ equivalent) like in, e.g., Python and have a coroutine, you're out of luck. Hopefully, after C++20 the standard library will start providing types allowing us, mere mortals, to write coroutines. Nevertheless, I think that investigating the current status of coroutines and writing some simple types can provide insights that will make us better users of third-party libraries that use coroutines.

Watch/Read list

Let’s start with a high-school trick: why read the book when you can watch the movie(s)? Seriously though, below you can find a list of videos that will give you an idea of what we are talking about.

Motivations: CppCon 2015: Gor Nishanov “C++ Coroutines – a negative overhead abstraction”: this is an old talk, the speaker still assumes that coroutines will be available in C++17 (not true, unfortunately) and the keywords and some details got changed in the meantime. Good to understand the basic idea, not as a reference.

Practical applications:

More advanced applications: CppCon 2018: G. Nishanov “Nano-coroutines to the Rescue! (Using Coroutines TS, of Course)”, code and slides available at https://github.com/GorNishanov/await/tree/master/2018_CppCon .

Paper: as per the list of experimental features the latest draft of the TS is n4775 .

What is a Coroutine?

A normal function (routine) can only be invoked. Once the function has completed its duty, it returns, its local variables are destroyed (not entirely true, e.g. because of static, but bear with me) and the caller can continue from where it left off. The caller can also use the values the function returned or exploit the side effects the routine generated. A coroutine brings an extra feature to the table: it can also be suspended and resumed. In particular, the coroutine “remembers” its state once suspended and restarts from where it left off once resumed. When dealing with coroutines there are two classes of types we should focus on:

  • Return types and corresponding promises
  • Awaitable/awaiter types

and three keywords:

  • co_await
  • co_return
  • co_yield

We will discuss only the return types here; the rest will be covered in future posts.

Return types and usage patterns

The signature of a coroutine is identical to that of a routine; more specifically, the distinction between a function and a coroutine is an implementation detail. As an example, we will see that a coroutine can return a std::future thanks to an MSVC extension, but a normal function can return a std::future too. Nevertheless, being a coroutine imposes some restrictions on the return type. A discussion of the implementation details of the return types is beyond the scope of this first article, but we can give two examples of common use cases:

  • task-like return types
  • generator-like return types

Task-like return types

These kinds of types are used when the coroutine returns but its result is not ready yet. By result we mean either a value the coroutine should provide or a side effect the coroutine is tasked with generating. In both cases the coroutine will return a task-like object which allows the caller to detect when the result is ready. Let’s assume then that we have access to a class called task. If the result is a value, the task will be used to access it. MSVC allows, as an extension, the use of std::future as a task-like return type. While using std::future in coroutines has performance implications (to be discussed in a future article), it is a good way to experiment with coroutines. So if a coroutine conceptually needs to return a type Foo, it will return task<Foo>, or in MSVC std::future<Foo>. On the other hand, if the coroutine should return void, it will return task<void>/std::future<void>. The caller can then, for example, call wait() once it needs the result.

Note: task is not a type provided by the standard library. The standard specifies what the type should look like, but the implementation is on you. The use of std::future as a task-like type is an MSVC extension.

Generator-like return types

A coroutine can return a value while suspending itself; once resumed, it can return another value, and so on (spoiler alert: this is what co_yield is needed for). Generator-like return types allow the caller to have a handle providing access to this stream of values. Furthermore, the standard already has provisions which allow adding an iterator to the type, so it can be used, for example, in a for-loop.

generator<Foo> getFoos();

for(auto element : getFoos()){
  //do something with the generated elements
}

Note: generator, too, is not a type provided by the standard library. The standard specifies what the type should look like, but the implementation is on you. MSVC provides a header, experimental/generator, with an implementation of a generator type.

Moving forward

In the next few weeks I will publish a discussion on the three keywords I mentioned and we will write our first awaiter object. Then we will start looking into the guts of a return type.

Vacation: Reading and Watch lists

I am currently enjoying some (well deserved?) vacation time with my family. While enjoying the sun and a preposterous amount of food, I am also spending my time reading and watching stuff on YouTube. Maybe you can find some of these books and videos interesting:

Exceptional C++ – Herb Sutter

Exceptional C++ is an old book, but I am finding in it a lot of very interesting gems. Currently I am halfway through it.

Is it good? Sure! Herb Sutter always delivers great and entertaining reads.

Should I read it? Depends. Are you an experienced developer who already masters modern C++ concepts, like auto, move semantics, etc.? Then sure, but keep in mind that the syntax is outdated, some points might not be as valid as they used to be because of the introduction of move construction, and, for the love of your favorite deity, do NOT use auto_ptr as the author does. Regardless, if you want to learn about solid OOP concepts, exception safety and some (now) historical curiosities about the language, be my guest.

Are there any issues? Yes, sure: somebody should convince the author to write an updated version for C++17.

Edit: Just finished the book. I confirm 100% what I said and, given that the book is old, finding it second-hand for dirt cheap shouldn’t be an issue.

Real Time C++ – Christopher Kormanyos

Real Time C++ is a newer book; the description claims that the book is C++17 compliant. I am phrasing the sentence this way because I have just arrived at page 100 (out of around 400 pages).

Is it good? I’m unsure, in particular given the hefty price tag (around 50 euros on the European market). The first 100 pages are pretty basic for any decently experienced C++ application developer, but, for example, the author throws around volatile without explaining anything about it. As far as I can tell the target audience is embedded C developers who want to understand whether C++17 could be a useful tool for their job. Another thing that perplexes me is some weird inconsistencies in the style: the author still uses typedef instead of using for type aliases, and on page 51 he writes the following:

volatile std::atomic<std::uint32_t> system_tick;

As far as I know, using volatile with std::atomic is completely useless, see e.g. https://stackoverflow.com/questions/8819095/concurrency-atomic-and-volatile-in-c11-memory-model, but again, without a proper explanation of the use case it is difficult to understand what is going on.

Should I read it? Are you a C developer tired of writing macros and cleanup code that a C++ compiler will write for you? Dig in! Are you a decent C++ developer interested in improving your skills in the embedded space? You might want to consider buying it second-hand or get your employer to buy a copy for your team.

Youtube

I had the opportunity to catch up with a number of conferences in these days, some interesting videos I stumbled upon:

Mutex in C++: the story so far – Part 1

I assumed that by now everybody knew how to properly lock and unlock a mutex in C++, in particular after this great book was published in 2012 and reprinted this year (2019). Well, a long debug session on a Friday a few weeks ago made it all too clear to me that I was wrong, and with this series of articles I’ll try my best to explain my take on mutexes in C++.

What is a mutex?

Before talking about mutexes we need to understand what they are trying to fix. Consider this sum:

void add(int& num1, const int& num2) {
    num1 += num2;
}

And the corresponding assembly

add(int&, int const&):
        mov     eax, DWORD PTR [rsi]
        add     DWORD PTR [rdi], eax
        ret

The addition takes at least two instructions: first we load num2 from memory into a CPU register (eax) and then we add it to num1, writing the result back to memory. Let’s assume now that num1 is accessed and modified by another thread (thread2) at the same time as thread1 invokes the above function.

  • thread1 invokes the function; num1 is 5, num2 is 3;
  • thread1 loads num2 (3) into eax, and the add instruction, which the CPU carries out as separate read-modify-write steps, reads num1 (5); then, for whatever reason, thread1 cannot advance;
  • in the meantime thread2 sets num1 to 0;
  • thread1 can finally advance and writes back 8 (5 + 3) to the memory location of num1: we have basically “lost” the work that thread2 did.

The situation is even worse in multi-processor environments (so basically the majority of modern environments), where a change made by one CPU might not be visible to the other CPU(s) working on the same memory.

Mutexes, and other synchronization primitives, aim to fix all these problems. A mutex, in particular, serializes access to shared resources. By serializing we mean that only one thread at a time can access the resources the mutex is protecting. Mutex stands for MUtual EXclusion.

The pre-history

Before C++11 the C++ memory model did not contemplate multi-threading, hence various vendors created their own extensions. Those were dark times that we do not want to remember, but if you dig into some legacy code you might find remnants of them. Writing multi-threaded, portable C++ was a struggle: one had to find the common set of features between the various target systems and wrap the system-specific facilities so that they exposed the same API, or use Boost or other external libraries.

C++11

With C++11 the language finally caught up with the 21st century and we got a portable way to coordinate threads (and also std::thread), and behold! The header <mutex> was born. We will talk about the following elements of this header; for more information, please refer to cppreference. This is the list of elements in the header as of C++11:

  • std::mutex
  • std::timed_mutex
  • std::recursive_mutex
  • std::recursive_timed_mutex
  • std::lock_guard
  • std::unique_lock
  • std::lock
  • std::try_lock
  • std::call_once

Mutex

std::mutex is the simplest class we have to protect shared resources. A thread must take ownership of an instance of a mutex by calling lock() and release it by calling unlock(). Until ownership of the instance is released, any other thread will block on any invocation of lock() on that particular instance, until unlock() is called, and that includes the thread that already owns the mutex! Yes, dear Windows developers, std::mutex is non-recursive, as opposed to the mutexes available in the Windows APIs.

I should show an example now, right? No! Calling unlock() directly is one of the worst sins you can commit in multithreaded C++. It can create the kind of insidious bugs that might force you (or your colleagues) to embark on a painful debug session on a Friday evening. You don’t want to meet those colleagues on a Monday morning.

The standard gives us lock_guard and unique_lock to avoid the direct calls; we will discuss them in the next part of this series. For completeness, we can now introduce another member function of std::mutex: try_lock(). With it, a thread tries to acquire ownership of a mutex: on success the function returns true and the mutex is acquired; otherwise false is returned and the thread can retry later, maybe using the time in between to do something else.

To summarize:

  • lock(): acquires ownership of the mutex and exclusive access to the shared resource protected by it. The call blocks until ownership is successfully acquired;
  • try_lock(): like lock() but non-blocking; if ownership cannot be acquired, the function returns false and the thread can do something else, but not access the shared resource;
  • unlock(): releases ownership, allowing other threads to acquire the mutex;
  • std::mutex is non-recursive: calling lock() in the thread that already owns the mutex will stall the execution forever (deadlock);
  • if you call unlock() directly on a std::mutex (or similar) in your code, it’s a mistake.

std::timed_mutex has the same properties as std::mutex plus two extra member functions:

  • try_lock_for(): like lock() (i.e. blocking), but if the timeout expires before ownership can be acquired, the function returns false and the thread can do something else. If ownership is acquired before the timeout expires, the function returns true;
  • try_lock_until(): like try_lock_for() but taking a point in time rather than a duration.

std::recursive_mutex and std::recursive_timed_mutex are the same as std::mutex and std::timed_mutex respectively, but recursive, meaning that if a thread owns the mutex, any further attempt by that thread to acquire it will succeed. Obviously, other threads will still be suspended until ownership is released. Moreover, for every acquisition there must be a corresponding call to unlock(). For example: if we call lock() three times and try_lock_for() twice on the same thread, the mutex will be released only after five calls to unlock().

Next time we will discuss guards and the safe way to use mutex in concurrent code.