Modified: 7 Jul 2014

Synchronized Threads (in the D Programming Language)

Original post date: 2008-09-03

Lately I have seen a lot of uncertainty about multithreading and thread synchronization, and when you have explained the same thing a certain number of times, you just get tired of it. So what I intend to do here is to outline how one makes thread-safe routines in the D Programming Language using the built-in synchronized keyword.

First I will clear things up a bit for those who already understand the principles behind synchronization and multithreaded execution. Evidently the specification doesn't clearly specify the semantics of the keyword (many have asked how it works even though they did read the specification), so I will quickly outline these. After that I will go a little more in depth with how and why to use synchronization (locking), in the hope that someone might learn something.

Outline of the semantics of the synchronized keyword in D

In D, all class types contain two additional, hidden members. The first member is the virtual table, which we will not discuss today, and the other is the monitor, which we indeed will discuss today.
There are three cases of usage of the synchronized keyword:

  1. <statements> synchronized { <critical statements> } <more statements>
  2. <statements> synchronized ( <expression> ) { <critical statements> } <more statements>
  3. <declarations> synchronized <class member function declaration> <more declarations>

The semantics:
In case one, we only allow one thread at a time to execute the critical statements. This is equivalent to having a global mutex that we lock at the point we pass the opening bracket and unlock as soon as we leave the scope, no matter how we leave it. There is one global lock for each occurrence of the case-one construct.
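As a minimal sketch of case one (the counter and increment names are mine, not part of the stack example that follows), a shared counter can be guarded like this:

```d
int counter; // shared between all threads

void increment()
{
    synchronized // one hidden global lock guards this block
    {
        counter += 1; // the read-modify-write can no longer be interleaved
    }
}
```

Without the block, two threads could both read the old value of counter and one of the increments would be lost.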
Case two is similar to case one. However, in this case we use the monitor of an object as the lock rather than a global lock. The expression will be implicitly cast to Object (the base object in D), and then the monitor of that object will be used as the lock. This construct is useful for ensuring atomic access to an object, as will be explained in a moment.
Lastly we have case three, which is semantically identical to case two, with the expression being the object itself. That is,

  class Foo {
      synchronized void bar() {
          . . .
      }
  }

is semantically equivalent to

  class Foo {
      void bar() { synchronized (this) {
          . . .
      }}
  }

More in-depth explanation of how and why to use synchronization

So why do we even care to synchronize at all? Let's say you have a simple stack of characters declared as char[] global_stack, and two concurrent threads, A and B, that both push data to the stack.

Example of two threads adding data to the same stack
  import std.stdio;  // for writefln()
  import std.thread; // for class Thread
  import std.c.time; // for sleep()

  char[] global_stack = "Stack: ";

  void main()
  {
      // runs A() and B() in parallel
      Thread a = new Thread(&A, null); a.start();
      B(null);
      sleep(1); // gives time for A to finish

      writefln(global_stack);
  }

  int A(void* ignore)
  {
      writefln("A running");

      for (size_t i = 0; i < 10; i += 1)
          push(global_stack, "AAAAA");

      writefln("A stopped");
      return 0;
  }

  int B(void* ignore)
  {
      writefln("B running");

      for (size_t i = 0; i < 10; i += 1)
          push(global_stack, "BBBBB");

      writefln("B stopped");
      return 0;
  }

  /// adds item to the top of the stack
  void push(ref char[] stack, char[] item)
  {
      foreach (c; item)
      {
          usleep(1); // because we cannot rely on time nor scheduling
          stack ~= c; // appends one character of data
      }
  }

A little explanation of the code, just to keep everyone on track:
void main() initializes and starts a thread that runs int A(void* ignore), and then runs int B(void* ignore) in the existing main thread. It then gives the threads time to finish and prints the contents of global_stack to the console.
int A(void* ignore) and int B(void* ignore) are almost the same; the only difference is that A pushes "AAAAA" to the stack and B pushes "BBBBB".
void push(ref char[] stack, char[] item) pushes the data from item onto the top of the stack one byte at a time, while also challenging the scheduler (the component of the operating system that gives the illusion of running more threads than there are processing units) for the sake of illustrating the worst case (because worst cases actually do happen).

When running the above code, we might expect the output to be something similar to:
"Stack: AAAAABBBBBAAAAABBBBBBBBBBAAAAAAAAAAAAAAABBBBBAAAAABBBBB" [...] In this output, all the items in the stack have indeed retained their original structure. But if we truly do expect this output from the above code, we are in for a surprise. The output is in fact "Stack: BABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABABA". This is the worst case we could possibly get: all items are completely destroyed. If we try to pop an item, we will get "BABAB", which we never put into the stack. This is where we can make use of synchronization. The simplest kind of synchronization the D Programming Language offers is the basic critical section:

Simple critical section
  void foo()
  {
      synchronized // critical section
      {
          // only one thread can stay within these brackets at one time
          bar();
      }
  }

But what does this mean?
It means that whenever a thread is executing within the critical section, or executing something called from within it (which amounts to the same thing), no other thread can enter the section.

Let's apply this to our stack example.
What is really going wrong in our stack is that items are not pushed atomically (all at once or not at all) onto the stack. This is because the scheduler can interrupt a thread at any point in time, thereby breaking the push function into tiny pieces of work that can be interleaved with other, unrelated tiny pieces of work. What the critical section does is prevent this interleaving from happening, by disallowing anyone else from entering the section while someone is already in there.

Let's make a new function for pushing items onto the stack:

Pushing an item to the stack while synchronizing with simple critical section
  void safe_push(ref char[] stack, char[] item)
  {
      synchronized // critical section
      {
          foreach (c; item)
          {
              usleep(1); // because we can't rely on time
              stack ~= c;
          }
      }
  }

This safe_push function has exactly the same semantics as the old and rusty push function, with the addition of preventing interleaving with other threads.
So if we alter A and B to call safe_push instead of push, we will find that the result is "Stack: BBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAABBBBBAAAAA", which only contains valid items, that is, items that we actually supplied to the safe_push function. We can now say that our stack maintains consistency.

When we want to get something out of the stack we just make a similar method to pull items out.

Popping an item from the stack while synchronizing with simple critical section
  void safe_pop(ref char[] stack, ref char[] buffer)
  {
      synchronized // critical section
      {
          char[] top_item = stack[$-buffer.length .. $];
          stack = stack[0 .. $-buffer.length];

          foreach (i, c; top_item)
          {
              usleep(1); // because we can't rely on time
              buffer[i] = c;
          }
      }
  }

A little explanation of the safe_pop function:

The function expects buffer to be allocated to the size of the item that is to be moved from the top of the stack into the buffer. First the top of the stack is referenced by top_item, after which the stack is reduced. The data from top_item is then copied into the buffer in the same manner as it was originally pushed onto the stack.

While this safe_pop function will indeed pop items from the stack while maintaining consistency, it will not maintain consistency when used in parallel with safe_push. (We ignore the case of the stack being empty for the sake of simplicity of the examples.)

The kind of synchronization used in our safe_push and safe_pop functions is, however, of limited use. For example, consider a thread that begins to pop an item from the stack while another thread is in the process of pushing another item.
The pushing thread might have pushed 2 out of 5 characters, so the global_stack currently contains "AAAAABBBBBAA". Now the scheduler decides to run the code of the popping thread; the popping thread puts the last 5 characters, "BBBAA", into top_item and reduces the global_stack, which now contains "AAAAABB". The popping thread finishes, and the scheduler restarts the pushing thread, which continues by adding the remaining 3 characters, leaving the global_stack with the contents "AAAAABBAAA". Remember that the critical sections in our safe_pop and safe_push functions each have their own global lock. So there is nothing that prevents one thread from being in the critical section of safe_push while another thread is in the critical section of safe_pop, and vice versa.
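To make the pitfall concrete, here is a tiny sketch (the writer and reader names are mine): two separate case-one blocks do not exclude each other, because each occurrence gets its own hidden global lock.

```d
char[] shared_data; // some state shared between threads

void writer()
{
    synchronized // hidden lock #1
    {
        shared_data ~= 'x'; // mutate the shared state
    }
}

void reader()
{
    synchronized // hidden lock #2, a different lock entirely
    {
        // this can run while writer() is inside its own block,
        // so it may observe the shared state mid-mutation
        auto snapshot = shared_data;
    }
}
```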

Another limitation is that if we want to use more than one stack, we cannot push to two different stacks concurrently; the same goes for popping from two different stacks. This is because the locks are global. There is, however, another kind of synchronization that overcomes these limitations.

The other kind of synchronization
We have realized that we might want to have several stacks rather than only one global stack. So let's take all of the stack code we have so far and put it into a class, to make it clear that we intend to have several instances of stacks.

Recapitulation of our implementation
  import std.stdio;  // for writefln()
  import std.thread; // for class Thread
  import std.c.time; // for sleep()

  void main()
  {
      Stack s = new Stack();
      // runs A() and B() in parallel
      Thread a = new Thread(&A, cast(void*)s); a.start();
      B(cast(void*)s);
      sleep(1); // gives time for A to finish

      writefln(s.stack);
  }

  int A(void* stack_ptr)
  {
      Stack stack = cast(Stack)stack_ptr;
      writefln("A running");

      for (size_t i = 0; i < 10; i += 1)
          stack.safe_push("AAAAA");

      writefln("A stopped");
      return 0;
  }

  int B(void* stack_ptr)
  {
      Stack stack = cast(Stack)stack_ptr;
      writefln("B running");

      for (size_t i = 0; i < 10; i += 1)
          stack.safe_push("BBBBB");

      writefln("B stopped");
      return 0;
  }

  class Stack
  {
      char[] stack = "Stack: ";

      /// safely adds item to the top of the stack
      void safe_push(char[] item)
      {
          synchronized // critical section
          {
              foreach (c; item)
              {
                  usleep(1); // because we can't rely on time
                  stack ~= c;
              }
          }
      }

      /// safely pops item from the top of the stack
      void safe_pop(ref char[] buffer)
      {
          synchronized // critical section
          {
              char[] top_item = stack[$-buffer.length .. $];
              stack = stack[0 .. $-buffer.length];

              foreach (i, c; top_item)
              {
                  usleep(1); // because we can't rely on time
                  buffer[i] = c;
              }
          }
      }
  }

But as mentioned earlier, the synchronization constructs we are using are attached to different global locks. The D Programming Language has yet another synchronization construct that looks pretty much like the one we're already using. Before we use this construct, we need to know that, in D, classes have two hidden member variables. The first one is a pointer to a structure called a virtual table, and the other member is something called a monitor. The one we are interested in is the monitor. "Monitor" is just a silly word for what we call a "lock". So every class instance has its own lock, which is handy when you want to operate on an object that you don't want anyone else to touch while you're using it. And that is in fact just what we want to achieve.

The second synchronization construct looks like this: "synchronized ( <expression> ) <statement>". The expression must evaluate to an object, because the construct uses the monitor/lock hidden in that object instead of a global lock. So if we change the lines that contain "synchronized" in the above code to "synchronized (this)", then we have effectively made our stack implementation maintain consistency regardless of the number of threads that access it concurrently, as well as made it possible to access several different stack instances in parallel, without unnecessary halting of other threads.
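Applied to safe_push, the change looks like this (safe_pop is altered the same way):

```d
import std.c.time; // for sleep()/usleep(), as in the examples above

class Stack
{
    char[] stack = "Stack: ";

    /// safely adds item to the top of the stack
    void safe_push(char[] item)
    {
        synchronized (this) // locks this instance's monitor, not a global lock
        {
            foreach (c; item)
            {
                usleep(1); // because we can't rely on time
                stack ~= c;
            }
        }
    }
}
```

Two threads pushing to two different Stack instances no longer block each other, because each instance carries its own monitor.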

The third 'kind' of synchronization
There is another way to achieve exactly the same thing as above, and that is by using the "synchronized" attribute in the declaration of the methods that need exclusive access to the members of the object.

Rewrite to use synchronized attribute
  class Stack
  {
      char[] stack = "Stack: ";

      /// safely adds item to the top of the stack
      synchronized void safe_push(char[] item)
      {
          foreach (c; item)
          {
              usleep(1); // because we can't rely on time
              stack ~= c;
          }
      }

      /// safely pops item from the top of the stack
      synchronized void safe_pop(ref char[] buffer)
      {
          char[] top_item = stack[$-buffer.length .. $];
          stack = stack[0 .. $-buffer.length];

          foreach (i, c; top_item)
          {
              usleep(1); // because we can't rely on time
              buffer[i] = c;
          }
      }
  }

This last piece of code is clearly more readable, not to mention shorter, than the other code. But we can't completely ignore the more explicit version, because we might have some code that needs exclusive access to our stack while doing several operations on it, for example if we need to push two items that must end up consecutive in the stack. To do something like that, consider the following code, where A and B run in parallel and push items to the same stack instance:

Maintaining atomicity across calls
  int A(void* stack_ptr)
  {
      Stack stack = cast(Stack)stack_ptr;
      writefln("A running");

      for (size_t i = 0; i < 10; i += 1)
          synchronized (stack)
          {
              stack.safe_push("AAAAA");
              stack.safe_push("CCCCC");
          }

      writefln("A stopped");
      return 0;
  }

  int B(void* stack_ptr)
  {
      Stack stack = cast(Stack)stack_ptr;
      writefln("B running");

      for (size_t i = 0; i < 10; i += 1)
          synchronized (stack)
          {
              stack.safe_push("BBBBB");
              stack.safe_push("DDDDD");
          }

      writefln("B stopped");
      return 0;
  }

Here we have locked the stack instance from the outside, making it impossible for any other thread to access member methods that have been declared with the synchronized attribute. In general it prevents any other thread from entering a critical section that uses the same monitor/lock. (Note that this works because an object's monitor is re-entrant: the thread that already holds the lock can still enter the synchronized methods it calls without deadlocking against itself.) So, executing the above code, we can be confident that regardless of how it is scheduled, any A item will have a C item directly above it in the stack, and the same goes for B items in relation to D items. In other words, whenever "AAAAA" occurs in the stack, "CCCCC" will immediately follow it.

Final words

So this was my second blog post ever, and my first serious blog post ever, so I hope I got something right. I intend it to be newbie friendly; I'm not sure if I have succeeded. I realize I have used some less-newbie-friendly terminology, but explaining every detail of everything is just silly.

All of the code should be compilable. The code in this post is in no way intended to be efficient, pretty, or smart; it is solely supposed to explain how and why to use synchronization.

Anyway, I just hope that someone learned something from reading some part of this, because I spent a lot of time writing it. Please let me know what you think!