memory-barriers.txt - OpenGrok cross reference for /linux-4.1.27/Documentation/memory-barriers.txt

Lines Matching refs:CPU
29      - CPU memory barriers.
39  (*) Inter-CPU locking barrier effects.
84 		| CPU 1 |<----->| Memory |<----->| CPU 2 |
101 Each CPU executes a program that generates memory access operations.  In the
102 abstract CPU, memory operation ordering is very relaxed, and a CPU may actually
109 CPU are perceived by the rest of the system as the operations cross the
110 interface between the CPU and rest of the system (the dotted lines).
115 	CPU 1		CPU 2
142 Furthermore, the stores committed by a CPU to the memory system may not be
143 perceived by the loads made by another CPU in the same order as the stores were
149 	CPU 1		CPU 2
156 the address retrieved from P by CPU 2.  At the end of the sequence, any of the
163 Note that CPU 2 will never try and load C into D because the CPU will load P
192 There are some minimal guarantees that may be expected of a CPU:
194  (*) On any given CPU, dependent memory accesses will be issued in order, with
199      the CPU will issue the following memory operations:
209  (*) Overlapping loads and stores within a particular CPU will appear to be
210      ordered within that CPU.  This means that for:
214      the CPU will only issue the following sequence of memory operations:
222      the CPU will only issue:
322 in random order, but this can be a problem for CPU-CPU interaction and for I/O.
324 CPU to restrict the order.
352      A CPU can be viewed as committing a sequence of store operations to the
374      committing sequences of stores to the memory system that the CPU being
375      considered can then perceive.  A data dependency barrier issued by the CPU
377      load touches one of a sequence of stores from another CPU, then by the
469 between two CPUs or between a CPU and a device.  If it can be guaranteed that
486      instruction; the barrier can be considered to draw a line in that CPU's
489  (*) There is no guarantee that issuing a memory barrier on one CPU will have
490      any direct effect on another CPU or any other hardware in the system.  The
491      indirect effect will be the order in which the second CPU sees the effects
492      of the first CPU's accesses occur, but see the next point:
494  (*) There is no guarantee that a CPU will see the correct order of effects
495      from a second CPU's accesses, even _if_ the second CPU uses a memory
496      barrier, unless the first CPU _also_ uses a matching memory barrier (see
499  (*) There is no guarantee that some intervening piece of off-the-CPU
500      hardware[*] will not reorder the memory accesses.  CPU cache coherency
518 	CPU 1		      CPU 2
533 But!  CPU 2's perception of P may be updated _before_ its perception of B, thus
545 	CPU 1		      CPU 2
563 even-numbered bank of the reading CPU's cache is extremely busy while the
572 	CPU 1		      CPU 2
606 dependency, but rather a control dependency that the CPU may short-circuit
637 	b = p;  /* BUG: Compiler and CPU can both reorder!!! */
670 'b', which means that the CPU is within its rights to reorder them:
721 Given this transformation, the CPU is not required to respect the ordering
764 	CPU 0                     CPU 1
772 The above two-CPU example will never trigger the assert().  However,
774 then adding the following CPU would guarantee a related assertion:
776 	CPU 2
783 assertion can fail after the combined three-CPU example completes.  If you
784 need the three-CPU example to provide ordering, you will need smp_mb()
785 between the loads and stores in the CPU 0 and CPU 1 code fragments,
826 When dealing with CPU-CPU interactions, certain types of memory barrier should
838 	CPU 1		      CPU 2
848 	CPU 1		      CPU 2
858 	CPU 1		      CPU 2
876 	CPU 1                               CPU 2
891 	CPU 1
911 	| CPU 1 |  :    | B=2  |     }
922 	                   | memory system by CPU 1
929 	CPU 1			CPU 2
939 Without intervention, CPU 2 may perceive the events on CPU 1 in some
940 effectively random order, despite the write barrier issued by CPU 1:
945 	|       |  :    +------+     \          +-------+  | CPU 2
946 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |  V
956 	                               |        :       :       | CPU 2 |
969 In the above example, CPU 2 perceives that B is 7, despite the load of *C
973 and the load of *C (ie: B) on CPU 2:
975 	CPU 1			CPU 2
992 	| CPU 1 |  :    | A=1  |      \     --->| C->&Y |
1002 	                               |        :       :       | CPU 2 |
1016 	CPU 1			CPU 2
1025 Without intervention, CPU 2 may then choose to perceive the events on CPU 1 in
1026 some effectively random order, despite the write barrier issued by CPU 1:
1032 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1038 	                                |       +-------+       | CPU 2 |
1050 load of A on CPU 2:
1052 	CPU 1			CPU 2
1062 then the partial ordering imposed by CPU 1 will be perceived correctly by CPU
1069 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1075 	                                |       +-------+       | CPU 2 |
1081 	  to be perceptible to CPU 2            +-------+       |       |
1088 	CPU 1			CPU 2
1106 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1112 	                                |       +-------+       | CPU 2 |
1121 	  to be perceptible to CPU 2            +-------+       |       |
1125 But it may be that the update to A from CPU 1 becomes perceptible to CPU 2
1132 	| CPU 1 |   wwwwwwwwwwwwwwww   \    --->| B->9  |
1138 	                                |       +-------+       | CPU 2 |
1163 actual load instruction to potentially complete immediately because the CPU
1166 It may turn out that the CPU didn't actually need the value - perhaps because a
1172 	CPU 1			CPU 2
1184 	                                        +-------+       | CPU 2 |
1187 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1193 	the CPU can then perform the            :       :       |       |
1200 	CPU 1			CPU 2
1215 	                                        +-------+       | CPU 2 |
1218 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1231 but if there was an update or an invalidation from another CPU pending, then
1237 	                                        +-------+       | CPU 2 |
1240 	The CPU being busy doing a --->     --->| A->0  |~~~~   |       |
1260 	CPU 1			CPU 2			CPU 3
1267 Suppose that CPU 2's load from X returns 1 and its load from Y returns 0.
1268 This indicates that CPU 2's load from X in some sense follows CPU 1's
1269 store to X and that CPU 2's load from Y in some sense preceded CPU 3's
1270 store to Y.  The question is then "Can CPU 3's load from X return 0?"
1272 Because CPU 2's load from X in some sense came after CPU 1's store, it
1273 is natural to expect that CPU 3's load from X must therefore return 1.
1275 CPU A follows a load from the same variable executing on CPU B, then
1276 CPU A's load must either return the same value that CPU B's load did,
1280 transitivity.  Therefore, in the above example, if CPU 2's load from X
1281 returns 1 and its load from Y returns 0, then CPU 3's load from X must
1285 For example, suppose that CPU 2's general barrier in the above example
1288 	CPU 1			CPU 2			CPU 3
1296 legal for CPU 2's load from X to return 1, its load from Y to return 0,
1297 and CPU 3's load from X to return 0.
1299 The key point is that although CPU 2's read barrier orders its pair
1300 of loads, it does not guarantee to order CPU 1's store.  Therefore, if
1302 or a level of cache, CPU 2 might have early access to CPU 1's writes.
1304 on the combined order of CPU 1's and CPU 2's accesses.
1319   (*) CPU memory barriers.
1352      to the same variable, and in some cases, the CPU is within its
1360      Prevent both the compiler and the CPU from doing this as follows:
1404      a was modified by some other CPU between the "while" statement and
1431      carry out its proof assuming that the current CPU is the only one
1453      Again, the compiler assumes that the current CPU is the only one
1464      surprise if some other CPU might have stored to variable 'a' in the
1534      occur, though the CPU of course need not do so.
1552      could cause some other CPU to see a spurious value of 42 -- even
1618 Please note that these compiler barriers have no direct effect on the CPU,
1622 CPU MEMORY BARRIERS
1625 The Linux kernel has eight basic CPU memory barriers:
1648 systems because it is assumed that a CPU will appear to be self-consistent,
1660 CPU from reordering them.
1700      of writes or reads of shared memory accessible to both the CPU and a
1705      to the device or the CPU, and a doorbell to notify it when new
1748 CPU->Hardware interface and actually affect the hardware at some level.
1838 another CPU not holding that lock.  In short, an ACQUIRE followed by an
1846 CPU or task, or (b) the RELEASE and ACQUIRE act on the same variable.
1848 Without smp_mb__after_unlock_lock(), the CPU's execution of the critical
1866 	One key point is that we are only talking about the CPU doing
1871 	But suppose the CPU reordered the operations.  In this case,
1872 	the unlock precedes the lock in the assembly code.  The CPU
1875 	try to sleep, but more on that later).	The CPU will eventually
1915 See also the section on "Inter-CPU locking barrier effects".
1975 	CPU 1
2016 	CPU 1				CPU 2
2028 	CPU 1				CPU 2
2036 In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
2103 INTER-CPU ACQUIRING BARRIER EFFECTS
2117 	CPU 1				CPU 2
2126 Then there is no guarantee as to what order CPU 3 will see the accesses to *A
2142 	CPU 1				CPU 2
2157 CPU 3 might see:
2162 But assuming CPU 1 gets the lock first, CPU 3 won't see any of:
2170 here: Without it CPU 3 might see some of the above orderings.
2172 to be seen in order unless CPU 3 holds lock M.
2186 	CPU 1				CPU 2
2207 	CPU 1				CPU 2
2220 this will ensure that the two stores issued on CPU 1 appear at the PCI bridge
2221 before either of the stores issued on CPU 2.
2228 	CPU 1				CPU 2
2264 When there's a system with more than one processor, more than one CPU in the
2315 another CPU might start processing the waiter and might clobber the waiter's
2320 	CPU 1				CPU 2
2358 right order without actually intervening in the CPU.  Since there's only one
2359 CPU, that CPU's dependency ordering logic will take care of everything else.
2457 Many devices can be memory mapped, and so appear to the CPU as if they're just
2461 However, having a clever CPU or a clever compiler creates a potential problem
2463 device in the requisite order if the CPU or the compiler thinks it is more
2494 routine is executing, the driver's core may not run on the same CPU, and its
2543      that's primarily a CPU-specific concept. The i386 and x86_64 processors do
2548      CPUs as i386 and x86_64 - readily maps to the CPU's concept of I/O
2549      space.  However, it may also be mapped as a virtual I/O space in the CPU's
2565      respect to each other on the issuing CPU depends on the characteristics
2608 It has to be assumed that the conceptual CPU is weakly-ordered but that it will
2614 This means that it must be considered that the CPU will execute its instruction
2625 A CPU may also discard any instruction sequence that winds up having no
2636 THE EFFECTS OF THE CPU CACHE
2643 As far as the way a CPU interacts with another part of the system through the
2644 caches goes, the memory system has to include the CPU's caches, and memory
2645 barriers for the most part act at the interface between the CPU and its cache
2648 	    <--- CPU --->         :       <----------- Memory ----------->
2652 	|  CPU   |    | Memory |  :   | CPU    |    |           |    |        |
2662 	|  CPU   |    | Memory |  :   | CPU    |    |           |--->| Device |
2671 CPU that issued it since it may have been satisfied within the CPU's own cache,
2674 cacheline over to the accessing CPU and propagate the effects upon conflict.
2676 The CPU core may execute instructions in any order it deems fit, provided the
2684 accesses cross from the CPU side of things to the memory side of things, and
2688 [!] Memory barriers are _not_ needed within a given CPU, as CPUs always see
2693 the use of any special device communication instructions the CPU may have.
2701 will be ordered.  This means that whilst changes made on one CPU will
2707 has a pair of parallel data caches (CPU 1 has A/B, and CPU 2 has C/D):
2714 	|  CPU 1 |<---+                        |        |
2722 	|  CPU 2 |<---+                        |        |
2737  (*) whilst the CPU core is interrogating one cache, the other cache may be
2748 Imagine, then, that two writes are made on the first CPU, with a write barrier
2749 between them to guarantee that they will appear to reach that CPU's caches in
2752 	CPU 1		CPU 2		COMMENT
2763 the local CPU's caches have apparently been updated in the correct order.  But
2764 now imagine that the second CPU wants to read those values:
2766 	CPU 1		CPU 2		COMMENT
2773 cacheline holding p may get updated in one of the second CPU's caches whilst
2775 CPU's caches by some other cache event:
2777 	CPU 1		CPU 2		COMMENT
2793 Basically, whilst both cachelines will be updated on CPU 2 eventually, there's
2795 as that committed on CPU 1.
2802 	CPU 1		CPU 2		COMMENT
2837 the kernel must flush the overlapping bits of cache on each CPU (and maybe
2841 cache lines being written back to RAM from a CPU's cache after the device has
2842 installed its own data, or cache lines present in the CPU's cache may simply
2844 is discarded from the CPU's cache and reloaded.  To deal with this, the
2846 cache on each CPU.
2855 a window in the CPU's memory space that has different properties assigned than
2870 A programmer might take it for granted that the CPU will perform memory
2871 operations in exactly the order specified, so that if the CPU is, for example,
2880 they would then expect that the CPU will complete the memory operation for each
2901      of the CPU buses and caches;
2908  (*) the CPU's data cache may affect the ordering, and whilst cache-coherency
2913 So what another CPU, say, might actually observe from the above piece of code
2921 However, it is guaranteed that a CPU will be self-consistent: it will see its
2940 The code above may cause the CPU to generate the full sequence of memory
2948 in the above example, as there are architectures where a given CPU might
2955 the CPU even sees them.
2977 and the LOAD operation never appear outside of the CPU.
2983 The DEC Alpha CPU is one of the most relaxed CPUs there is.  Not only that,
2984 some versions of the Alpha CPU have a split data cache, permitting them to have