C++11 introduced a standardized memory model. What does it mean? And how is it going to affect C++ programming?

Question

First, you have to learn to think like a Language Lawyer.

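The C++ specification does not make reference to any particular compiler, operating system, or CPU. It makes reference to an abstract machine that is a generalization of actual systems. In the Language Lawyer world, the job of the programmer is to write code for the abstract machine; the job of the compiler is to actualize that code on a concrete machine. By coding rigidly to the spec, you can be certain that your code will compile and run without modification on any system with a compliant C++ compiler, whether today or 50 years from now.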

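The abstract machine in the C++98/C++03 specification is fundamentally single-threaded. So it is not possible to write multi-threaded C++ code that is "fully portable" with respect to the spec. The spec does not even say anything about the atomicity of memory loads and stores or the order in which loads and stores might happen, never mind things like mutexes.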

Of course, you can write multi-threaded code in practice for particular concrete systems, such as pthreads or Windows. But there is no standard way to write multi-threaded code for C++98/C++03.

The abstract machine in C++11 is multi-threaded by design. It also has a well-defined memory model; that is, it says what the compiler may and may not do when it comes to accessing memory.

Consider the following example, where a pair of global variables are accessed concurrently by two threads:

           Global
           int x, y;

Thread 1            Thread 2
x = 17;             cout << y << " ";
y = 37;             cout << x << endl;

What might Thread 2 output?

Under C++98/C++03, this is not even Undefined Behavior; the question itself is meaningless because the standard does not contemplate anything called a "thread".

Under C++11, the result is Undefined Behavior: the two threads access plain ints concurrently with no synchronization, which is a data race, and loads and stores need not be atomic in general. Which may not seem like much of an improvement... And by itself, it's not.

But with C++11, you can write this:

           Global
           atomic<int> x, y;

Thread 1                 Thread 2
x.store(17);             cout << y.load() << " ";
y.store(37);             cout << x.load() << endl;

Now things get much more interesting. First of all, the behavior here is defined. Thread 2 could now print 0 0 (if it runs before Thread 1), 37 17 (if it runs after Thread 1), or 0 17 (if it runs after Thread 1 assigns to x but before it assigns to y).

What it cannot print is 37 0, because the default mode for atomic loads/stores in C++11 is to enforce sequential consistency. This just means all loads and stores must be "as if" they happened in the order you wrote them within each thread, while operations among threads can be interleaved however the system likes. So the default behavior of atomics provides both atomicity and ordering for loads and stores.
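The sequentially consistent example above can be turned into something you can compile. This is only a sketch: the function names (run_seq_cst_once, is_allowed, check_many_runs) are made up for illustration, the variables are locals captured by reference rather than globals, and the reader records what it saw instead of printing. Repeating the run can never prove that 37 0 is impossible; it only shows that the observed outcomes stay within the three listed ones.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Runs the two-thread experiment once and returns (y, x) as seen by the reader.
std::pair<int, int> run_seq_cst_once() {
    std::atomic<int> x{0}, y{0};
    int seen_y = -1, seen_x = -1;

    std::thread writer([&] {
        x.store(17);  // default ordering is std::memory_order_seq_cst
        y.store(37);
    });
    std::thread reader([&] {
        seen_y = y.load();  // same order as the "cout" lines in the text
        seen_x = x.load();
    });

    writer.join();
    reader.join();  // after join(), the reader's results are visible here
    return {seen_y, seen_x};
}

// True for the three outcomes the text lists: "0 0", "0 17" and "37 17".
bool is_allowed(std::pair<int, int> r) {
    return r == std::make_pair(0, 0) || r == std::make_pair(0, 17) ||
           r == std::make_pair(37, 17);
}

// Repeats the experiment; "37 0" would trip the assertion.
void check_many_runs(int runs) {
    for (int i = 0; i < runs; ++i) {
        assert(is_allowed(run_seq_cst_once()));
    }
}
```

Calling check_many_runs(1000) from main is enough to try it; on most machines nearly every run will come out as 37 17, simply because the writer usually finishes first.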

Now, on a modern CPU, ensuring sequential consistency can be expensive. In particular, the compiler is likely to emit full-blown memory barriers between every access here. But if your algorithm can tolerate out-of-order loads and stores; i.e., if it requires atomicity but not ordering; i.e., if it can tolerate 37 0 as output from this program, then you can write this:

           Global
           atomic<int> x, y;

Thread 1                            Thread 2
x.store(17,memory_order_relaxed);   cout << y.load(memory_order_relaxed) << " ";
y.store(37,memory_order_relaxed);   cout << x.load(memory_order_relaxed) << endl;

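The more modern the CPU, the more likely this is to be faster than the previous example, because relaxed operations let the compiler and the hardware skip the barriers that sequential consistency requires.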
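A typical case that needs atomicity but not ordering is a shared counter. The sketch below is an illustration, not part of the original example: count_with_relaxed is a hypothetical helper, and it assumes nothing else in the program depends on when each increment becomes visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Each of `threads` workers increments a shared counter `per_thread` times.
int count_with_relaxed(int threads, int per_thread) {
    assert(threads >= 0 && per_thread >= 0);
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Atomic read-modify-write: no increment is lost, but no
                // ordering with respect to other memory is requested.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) {
        w.join();  // thread completion synchronizes with join()
    }
    // All workers have finished, so the total is exact at this point.
    return counter.load(std::memory_order_relaxed);
}
```

For instance, count_with_relaxed(4, 10000) returns 40000: the increments may interleave in any order, but none of them is torn or dropped.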

Finally, if you just need to keep particular loads and stores in order, you can write:

           Global
           atomic<int> x, y;

Thread 1                            Thread 2
x.store(17,memory_order_release);   cout << y.load(memory_order_acquire) << " ";
y.store(37,memory_order_release);   cout << x.load(memory_order_acquire) << endl;

This takes us back to the ordered loads and stores – so 37 0 is no longer a possible output – but it does so with minimal overhead. (In this trivial example, the result is the same as full-blown sequential consistency; in a larger program, it would not be.)
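Where release/acquire really earns its keep is in publishing ordinary, non-atomic data through an atomic flag. Here is a minimal sketch of that idiom; pass_message, payload, and ready are names invented for this illustration, and the consumer simply spins until the flag is set.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One thread publishes a plain int; the other waits for it and reads it.
int pass_message() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int received = -1;

    std::thread producer([&] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publish it
    });
    std::thread consumer([&] {
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // spin until the flag is set
        }
        // The acquire load that read `true` synchronizes with the release
        // store, so the write to payload is guaranteed to be visible.
        received = payload;
    });

    producer.join();
    consumer.join();
    assert(ready.load());
    return received;
}
```

pass_message() always returns 42. If both operations on ready were memory_order_relaxed instead, the read of payload would be a data race.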

Of course, if the only outputs you want to see are 0 0 or 37 17, you can just wrap a mutex around the original code. But if you have read this far, I bet you already know how that works, and this answer is already longer than I intended :-).
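For completeness, here is roughly what the mutex version could look like. It is a sketch under the same assumptions as before: the names are hypothetical, the variables are locals rather than globals, and the reader stores its observations instead of printing them.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <utility>

// Runs the writer and the reader once; returns (y, x) as seen by the reader.
std::pair<int, int> run_with_mutex_once() {
    int x = 0, y = 0;  // plain ints: the mutex provides all synchronization
    std::mutex m;
    int seen_y = -1, seen_x = -1;

    std::thread writer([&] {
        std::lock_guard<std::mutex> lock(m);  // both stores form one unit
        x = 17;
        y = 37;
    });
    std::thread reader([&] {
        std::lock_guard<std::mutex> lock(m);  // both loads form one unit
        seen_y = y;
        seen_x = x;
    });

    writer.join();
    reader.join();
    return {seen_y, seen_x};
}

// Only "0 0" and "37 17" are possible; "0 17" is ruled out as well.
bool is_all_or_nothing(std::pair<int, int> r) {
    return r == std::make_pair(0, 0) || r == std::make_pair(37, 17);
}

// Repeats the experiment and checks every outcome.
void check_mutex_runs(int runs) {
    for (int i = 0; i < runs; ++i) {
        assert(is_all_or_nothing(run_with_mutex_once()));
    }
}
```

The lock turns each thread's pair of accesses into a single critical section, so the reader sees either none of the writer's work or all of it.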

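So, bottom line. Mutexes are great, and C++11 standardizes them. But sometimes for performance reasons you want lower-level primitives (e.g., the classic double-checked locking pattern). The new standard provides high-level gadgets like mutexes and condition variables, and it also provides low-level gadgets like atomic types and the various flavors of memory barrier. So now you can write sophisticated, high-performance concurrent routines entirely within the language specified by the standard, and you can be certain your code will compile and run unchanged on both today's systems and tomorrow's.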
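As an illustration of the double-checked locking pattern mentioned above, here is one way it might be written with C++11 atomics. Treat it as a sketch: Config and get_config are hypothetical names, and the object is deliberately never freed to keep the example short.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Config {  // hypothetical lazily-created object
    int value;
};

// Returns the single shared Config, creating it on first use.
Config* get_config() {
    static std::atomic<Config*> instance{nullptr};
    static std::mutex init_mutex;

    // First check, without the lock. Acquire pairs with the release store
    // below, so a non-null pointer implies a fully constructed object.
    Config* p = instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> lock(init_mutex);
        // Second check, under the lock; the mutex already orders this load.
        p = instance.load(std::memory_order_relaxed);
        if (p == nullptr) {
            p = new Config{42};  // intentionally leaked in this sketch
            instance.store(p, std::memory_order_release);
        }
    }
    assert(p != nullptr);
    return p;
}
```

Every call returns the same pointer, and only the first caller pays for the lock and the allocation. In practice a function-local static object is usually simpler, since C++11 also guarantees that its initialization is thread-safe.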

To be frank, though, unless you are an expert and working on some serious low-level code, you should probably stick to mutexes and condition variables. That's what I intend to do.

For more on this stuff, see this blog post.

Answer 1