memory-barriers.txt - OpenGrok cross reference for /linux-4.4.14/Documentation/memory-barriers.txt

Lines Matching refs:that
101 Each CPU executes a program that generates memory access operations.  In the
163 Note that CPU 2 will never try and load C into D because the CPU will load P
173 registers that are accessed through an address port register (A) and a data
192 There are some minimal guarantees that may be expected of a CPU:
195      respect to itself.  This means that for:
203      and always in that order.  On most systems, smp_read_barrier_depends()
206      note that you should normally use something like rcu_dereference()
210      ordered within that CPU.  This means that for:
229 And there are a number of things that _must_ or _must_not_ be assumed:
231  (*) It _must_not_ be assumed that the compiler will do what you want
232      with memory references that are not protected by READ_ONCE() and
237  (*) It _must_not_ be assumed that independent loads and stores will be issued
238      in the order given.  This means that for:
251  (*) It _must_ be assumed that overlapping memory accesses may be merged or
252      discarded.  This means that for:
286      variables.  "Properly sized" currently means variables that are
291      on 32-bit and 64-bit systems, respectively.  Note that these
344      A write memory barrier gives a guarantee that all the STORE operations
356      [!] Note that write barriers should normally be paired with read or data
363      where two loads are performed such that the second depends on the result
366      make sure that the target of the second load is updated before the address
374      committing sequences of stores to the memory system that the CPU being
376      under consideration guarantees that for any load preceding it, if that
378      time the barrier completes, the effects of all the stores prior to that
385      [!] Note that the first load really has to have a _data_ dependency and
392      [!] Note that data dependency barriers should normally be paired with
398      A read barrier is a data dependency barrier plus a guarantee that all the
409      [!] Note that read barriers should normally be paired with write barriers;
415      A general memory barrier gives a guarantee that all the LOAD and STORE
430      This acts as a one-way permeable barrier.  It guarantees that all memory
436      Memory operations that occur before an ACQUIRE operation may appear to
445      This also acts as a one-way permeable barrier.  It guarantees that all
451      Memory operations that occur after a RELEASE operation may appear to
459      RELEASE on that same variable are guaranteed to be visible.  In other
461      previous critical sections for that variable are guaranteed to have
464      This means that ACQUIRE acts as a minimal "acquire" operation and
469 between two CPUs or between a CPU and a device.  If it can be guaranteed that
471 memory barriers are unnecessary in that piece of code.
474 Note that these are the _minimum_ guarantees.  Different architectures may give
482 There are certain things that the Linux kernel memory barriers do not guarantee:
484  (*) There is no guarantee that any of the memory accesses specified before a
486      instruction; the barrier can be considered to draw a line in that CPU's
487      access queue that accesses of the appropriate type may not cross.
489  (*) There is no guarantee that issuing a memory barrier on one CPU will have
494  (*) There is no guarantee that a CPU will see the correct order of effects
499  (*) There is no guarantee that some intervening piece of off-the-CPU
515 it's not always obvious that they're needed.  To illustrate, consider the
527 There's a clear data dependency here, and it would seem that by the end of the
528 sequence, Q must be either &A or &B, and that:
558 [!] Note that this extremely counterintuitive situation arises most easily on
559 machines with split caches, so that, for example, one cache bank processes
606 dependency, but rather a control dependency that the CPU may short-circuit
607 by attempting to predict the outcome in advance, so that other CPUs see
617 However, stores are not speculated.  This means that ordering -is- provided
626 said, please note that READ_ONCE() is not optional! Without the
631 Worse yet, if the compiler is able to prove (say) that the value of
670 'b', which means that the CPU is within its rights to reorder them:
713 If MAX is defined to be 1, then the compiler knows that (q % MAX) is
725 relying on this ordering, you should make sure that MAX is greater than
738 Please note once again that the stores to 'b' differ.  If they were
756 This example underscores the need to ensure that the compiler cannot
787 that is, just before or just after the "if" statements.  Furthermore,
814   (*) Control dependencies require that the compiler avoid reordering the
874 [!] Note that the stores before the write barrier would normally be expected to
903 that the rest of the system might perceive as the unordered set of { STORE A,
971 In the above example, CPU 2 perceives that B is 7, despite the load of *C
1127 But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
1153 The guarantee is that the second load will always come up with A == 1 if the
1155 A; that may come up with either A == 0 or A == 1.
1161 Many CPUs speculate with loads: that is they see that they will need to load an
1164 got to that point in the instruction execution flow yet.  This permits the
1168 It may turn out that the CPU didn't actually need the value - perhaps because a
1258 Transitivity is a deeply intuitive notion about ordering that is not
1269 Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
1270 This indicates that CPU 2's load from X in some sense follows CPU 1's
1271 store to X and that CPU 2's load from Y in some sense preceded CPU 3's
1275 is natural to expect that CPU 3's load from X must therefore return 1.
1278 CPU A's load must either return the same value that CPU B's load did,
1287 For example, suppose that CPU 2's general barrier in the above example
1301 The key point is that although CPU 2's read barrier orders its pair
1305 General barriers are therefore required to ensure that all CPUs agree
1316 The Linux kernel has a variety of different barriers that act at different
1329 The Linux kernel has an explicit compiler barrier function that prevents the
1336 thought of as weak forms of barrier() that affect only the specific
1344      interrupt-handler code and the code that was interrupted.
1347      in that loop's conditional on each pass through that loop.
1350 optimizations that, while perfectly safe in single-threaded code, can
1356      rights to reorder loads to the same variable.  This means that
1415      Note that if the compiler runs short of registers, it might save
1422      what the value will be.  For example, if the compiler can prove that
1433      gets rid of a load and a branch.  The problem is that the compiler
1434      will carry out its proof assuming that the current CPU is the only
1437      compiler that it doesn't know as much as it thinks it does:
1442      But please note that the compiler is also closely watching what you
1449      Then the compiler knows that the result of the "%" operator applied
1455      if it knows that the variable already has the value being stored.
1456      Again, the compiler assumes that the current CPU is the only one
1462 	/* Code that does not store to variable a. */
1465      The compiler sees that the value of variable 'a' is already zero, so
1474 	/* Code that does not store to variable a. */
1519      Note that the READ_ONCE() and WRITE_ONCE() wrappers in
1521      be interrupted by something that also accesses 'flag' and 'msg',
1524      for documentation purposes.  (Note also that nested interrupts
1529      You should assume that the compiler can move READ_ONCE() and
1537      discard the value of all memory locations that it has currently
1582      Please note that GCC really does use this sort of optimization,
1583      which is not surprising given that it would likely take more
1618 All that aside, it is never necessary to use READ_ONCE() and
1619 WRITE_ONCE() on a variable that has been marked volatile.  For example,
1621 say READ_ONCE(jiffies).  The reason for this is that READ_ONCE() and
1625 Please note that these compiler barriers have no direct effect on the CPU,
1648 the C specification that the compiler may not speculate the value of b
1656 systems because it is assumed that a CPU will appear to be self-consistent,
1659 [!] Note that SMP memory barriers _must_ be used to control the ordering of
1684      decrement) functions that don't return a value, especially when used for
1687      These are also used for atomic bitop functions that do not return a
1690      As an example, consider a piece of code that marks an object as being dead
1697      This makes sure that the death mark on the object is perceived to be set
1712      that can be used both with and without RCU.
1722      For example, consider a device driver that shares memory with a device
1751      can see it now has ownership.  The wmb() is needed to guarantee that the
1765 This is a variation on the mandatory write barrier that causes writes to weakly
1806      subsequent loads and stores. Note that this is weaker than smp_mb()!
1820      completed before that ACQUIRE operation.
1835 one-way barriers is that the effects of instructions outside of a critical
1855 another CPU not holding that lock.  In short, an ACQUIRE followed by an
1861 so that:
1872 It might appear that this reordering could introduce a deadlock.
1878 	One key point is that we are only talking about the CPU doing
1880 	that matter, the developer) switched the operations, deadlock
1887 	try to sleep, but more on that later).	The CPU will eventually
1892 	But what if the lock is a sleeplock?  In that case, the code will
1922 	[+] Note that {*F,*A} indicates a combined access.
1936 Functions that disable interrupts (ACQUIRE equivalent) and enable interrupts
1947 the event and the global data used to indicate the event.  To make sure that
1991 Secondly, code that performs a wake up normally follows something like this:
2047 [!] Note that the memory barriers implied by the sleeper and the waker do _not_
2064 there's no guarantee that the change to event_indicated will be perceived by
2086 Other functions that imply barriers:
2096 that does affect memory access ordering on other CPUs, within the context of
2175 this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
2224 operations that affect both CPUs may have to be carefully ordered to prevent
2268 before proceeding.  Since the record is on the waiter's stack, this means that
2305 In this case, the barrier makes a guarantee that all memory accesses before the
2307 with respect to the other CPUs on the system.  It does _not_ guarantee that all
2314 CPU, that CPU's dependency ordering logic will take care of everything else.
2325 Any atomic operation that modifies some state in memory and returns information
2401 [!] Note that special memory barrier primitives are available for these
2417 in that the carefully sequenced accesses in the driver code won't reach the
2419 efficient to reorder, combine or merge accesses - something that would cause
2447 form of locking), such that the critical operations are all contained within
2451 handled, thus the interrupt handler does not need to lock against that.
2453 However, consider a driver that was talking to an ethernet card that sports an
2454 address register and a data register.  If that driver's core talks to the card
2472 If ordering rules are relaxed, it must be assumed that accesses done inside an
2479 registers that form implicit I/O barriers. If this isn't sufficient then an
2484 running on separate CPUs that communicate with each other. If such a case is
2498      that's primarily a CPU-specific concept. The i386 and x86_64 processors do
2505      memory map, particularly on those CPUs that don't support alternate I/O
2510      that.
2549      required, an mmiowb() barrier can be used. Note that relaxed accesses to
2563 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2569 This means that it must be considered that the CPU will execute its instruction
2570 stream in any order it feels like - or even in parallel - provided that if an
2571 instruction in the stream depends on an earlier instruction, then that
2573 instruction may proceed; in other words: provided that the appearance of
2580 A CPU may also discard any instruction sequence that winds up having no
2585 Similarly, it has to be assumed that the compiler might reorder the instruction
2595 a certain extent by the caches that lie between CPUs and memory, and by the
2596 memory coherence system that maintains the consistency of state in the system.
2626 CPU that issued it since it may have been satisfied within the CPU's own cache,
2655 caches are expected to be coherent, there's no guarantee that that coherency
2656 will be ordered.  This means that whilst changes made on one CPU will
2657 eventually become visible on all CPUs, there's no guarantee that they will
2661 Consider dealing with a system that has a pair of CPUs (1 & 2), each of which
2696  (*) each cache has a queue of operations that need to be applied to that cache
2703 Imagine, then, that two writes are made on the first CPU, with a write barrier
2704 between them to guarantee that they will appear to reach that CPU's caches in
2717 The write memory barrier forces the other CPUs in the system to perceive that
2719 now imagine that the second CPU wants to read those values:
2749 no guarantee that, without intervention, the order of update will be the same
2750 as that committed on CPU 1.
2776 split cache that improves performance by making better use of the data bus.
2798 obscure the fact that RAM has been updated, until at such time as the cacheline
2809 Memory mapped I/O usually takes place through memory locations that are part of
2810 a window in the CPU's memory space that has different properties assigned than
2813 Amongst these properties is usually the fact that such accesses bypass the
2815 may, in effect, overtake accesses to cached memory that were emitted earlier.
2825 A programmer might take it for granted that the CPU will perform memory
2826 operations in exactly the order specified, so that if the CPU is, for example,
2835 they would then expect that the CPU will complete the memory operation for each
2859      memory or I/O hardware that can do batched accesses of adjacent locations,
2865      - there's no guarantee that the coherency management will be propagated in
2876 However, it is guaranteed that a CPU will be self-consistent: it will see its
2887 and assuming no intervention by an external influence, it can be assumed that
2900 in that order, but, without intervention, the sequence may have almost any
2902 of the world remains consistent.  Note that READ_ONCE() and WRITE_ONCE()
2908 and st.rel instructions (respectively) that prevent such reordering.
2923 assumed that the effect of the storage of V to *A is lost.  Similarly:
2940 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,