Merge branch 'origin'

chewitt · Feb 6, 2006 · b2faf59 · b2faf59
2 parents 638e174 + 410c054
commit b2faf59
Show file tree

Hide file tree

Showing 1,079 changed files with 32,439 additions and 19,299 deletions.
diff --git a/CREDITS b/CREDITS
@@ -3101,7 +3101,7 @@ S: Minto, NSW, 2566
 S: Australia
 
 N: Stephen Smalley
-E: sds@epoch.ncsc.mil
+E: sds@tycho.nsa.gov
 D: portions of the Linux Security Module (LSM) framework and security modules
 
 N: Chris Smith

diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
@@ -90,16 +90,20 @@ at OLS.  The resulting abundance of RCU patches was presented the
 following year [McKenney02a], and use of RCU in dcache was first
 described that same year [Linder02a].
 
-Also in 2002, Michael [Michael02b,Michael02a] presented techniques
-that defer the destruction of data structures to simplify non-blocking
-synchronization (wait-free synchronization, lock-free synchronization,
-and obstruction-free synchronization are all examples of non-blocking
-synchronization).  In particular, this technique eliminates locking,
-reduces contention, reduces memory latency for readers, and parallelizes
-pipeline stalls and memory latency for writers.  However, these
-techniques still impose significant read-side overhead in the form of
-memory barriers.  Researchers at Sun worked along similar lines in the
-same timeframe [HerlihyLM02,HerlihyLMS03].
+Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer"
+techniques that defer the destruction of data structures to simplify
+non-blocking synchronization (wait-free synchronization, lock-free
+synchronization, and obstruction-free synchronization are all examples of
+non-blocking synchronization).  In particular, this technique eliminates
+locking, reduces contention, reduces memory latency for readers, and
+parallelizes pipeline stalls and memory latency for writers.  However,
+these techniques still impose significant read-side overhead in the
+form of memory barriers.  Researchers at Sun worked along similar lines
+in the same timeframe [HerlihyLM02,HerlihyLMS03].  These techniques
+can be thought of as inside-out reference counts, where the count is
+represented by the number of hazard pointers referencing a given data
+structure (rather than the more conventional counter field within the
+data structure itself).
 
 In 2003, the K42 group described how RCU could be used to create
 hot-pluggable implementations of operating-system functions.  Later that
@@ -113,7 +117,6 @@ number of operating-system kernels [PaulEdwardMcKenneyPhD], a paper
 describing how to make RCU safe for soft-realtime applications [Sarma04c],
 and a paper describing SELinux performance with RCU [JamesMorris04b].
 
-
 2005 has seen further adaptation of RCU to realtime use, permitting
 preemption of RCU realtime critical sections [PaulMcKenney05a,
 PaulMcKenney05b].

diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
@@ -177,3 +177,9 @@ over a rather long period of time, but improvements are always welcome!
 
 	If you want to wait for some of these other things, you might
 	instead need to use synchronize_irq() or synchronize_sched().
+
+12.	Any lock acquired by an RCU callback must be acquired elsewhere
+	with irq disabled, e.g., via spin_lock_irqsave().  Failing to
+	disable irq on a given acquisition of that lock will result in
+	deadlock as soon as the RCU callback happens to interrupt that
+	acquisition's critical section.
diff --git a/Documentation/RCU/listRCU.txt b/Documentation/RCU/listRCU.txt
@@ -232,7 +232,7 @@ entry does not exist.  For this to be helpful, the search function must
 return holding the per-entry spinlock, as ipc_lock() does in fact do.
 
 Quick Quiz:  Why does the search function need to return holding the
-per-entry lock for this deleted-flag technique to be helpful?
+	per-entry lock for this deleted-flag technique to be helpful?
 
 If the system-call audit module were to ever need to reject stale data,
 one way to accomplish this would be to add a "deleted" flag and a "lock"
@@ -275,8 +275,8 @@ flag under the spinlock as follows:
 	{
 		struct audit_entry  *e;
 
-		/* Do not use the _rcu iterator here, since this is the only
-		 * deletion routine. */
+		/* Do not need to use the _rcu iterator here, since this
+		 * is the only deletion routine. */
 		list_for_each_entry(e, list, list) {
 			if (!audit_compare_rule(rule, &e->rule)) {
 				spin_lock(&e->lock);
@@ -304,9 +304,12 @@ function to reject newly deleted data.
 
 
 Answer to Quick Quiz
-
-If the search function drops the per-entry lock before returning, then
-the caller will be processing stale data in any case.  If it is really
-OK to be processing stale data, then you don't need a "deleted" flag.
-If processing stale data really is a problem, then you need to hold the
-per-entry lock across all of the code that uses the value looked up.
+	Why does the search function need to return holding the per-entry
+	lock for this deleted-flag technique to be helpful?
+
+	If the search function drops the per-entry lock before returning,
+	then the caller will be processing stale data in any case.  If it
+	is really OK to be processing stale data, then you don't need a
+	"deleted" flag.  If processing stale data really is a problem,
+	then you need to hold the per-entry lock across all of the code
+	that uses the value that was returned.
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
@@ -111,6 +111,11 @@ o	What are all these files in this directory?
 
 		You are reading it!
 
+	rcuref.txt
+
+		Describes how to combine use of reference counts
+		with RCU.
+
 	whatisRCU.txt
 
 		Overview of how the RCU implementation works.  Along

diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
@@ -1,7 +1,7 @@
-Refcounter design for elements of lists/arrays protected by RCU.
+Reference-count design for elements of lists/arrays protected by RCU.
 
-Refcounting on elements of  lists which are protected by traditional
-reader/writer spinlocks or semaphores are straight forward as in:
+Reference counting on elements of lists which are protected by traditional
+reader/writer spinlocks or semaphores are straightforward:
 
 1.				2.
 add()				search_and_reference()
@@ -28,12 +28,12 @@ release_referenced()			delete()
 					    ...
 					}
 
-If this list/array is made lock free using rcu as in changing the
-write_lock in add() and delete() to spin_lock and changing read_lock
+If this list/array is made lock free using RCU as in changing the
+write_lock() in add() and delete() to spin_lock and changing read_lock
 in search_and_reference to rcu_read_lock(), the atomic_get in
 search_and_reference could potentially hold reference to an element which
-has already been deleted from the list/array.  atomic_inc_not_zero takes
-care of this scenario. search_and_reference should look as;
+has already been deleted from the list/array.  Use atomic_inc_not_zero()
+in this scenario as follows:
 
 1.					2.
 add()					search_and_reference()
@@ -51,17 +51,16 @@ add()					search_and_reference()
 release_referenced()			delete()
 {					{
     ...					    write_lock(&list_lock);
-    atomic_dec(&el->rc, relfunc)	    ...
-    ...					    delete_element
-}					    write_unlock(&list_lock);
- 					    ...
+    if (atomic_dec_and_test(&el->rc))       ...
+        call_rcu(&el->head, el_free);       delete_element
+    ...                                     write_unlock(&list_lock);
+} 					    ...
 					    if (atomic_dec_and_test(&el->rc))
 					        call_rcu(&el->head, el_free);
 					    ...
 					}
 
-Sometimes, reference to the element need to be obtained in the
-update (write) stream.  In such cases, atomic_inc_not_zero might be an
-overkill since the spinlock serialising list updates are held. atomic_inc
-is to be used in such cases.
-
+Sometimes, a reference to the element needs to be obtained in the
+update (write) stream.  In such cases, atomic_inc_not_zero() might be
+overkill, since we hold the update-side spinlock.  One might instead
+use atomic_inc() in such cases.
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
@@ -200,10 +200,11 @@ rcu_assign_pointer()
 	the new value, and also executes any memory-barrier instructions
 	required for a given CPU architecture.
 
-	Perhaps more important, it serves to document which pointers
-	are protected by RCU.  That said, rcu_assign_pointer() is most
-	frequently used indirectly, via the _rcu list-manipulation
-	primitives such as list_add_rcu().
+	Perhaps just as important, it serves to document (1) which
+	pointers are protected by RCU and (2) the point at which a
+	given structure becomes accessible to other CPUs.  That said,
+	rcu_assign_pointer() is most frequently used indirectly, via
+	the _rcu list-manipulation primitives such as list_add_rcu().
 
 rcu_dereference()
 
@@ -258,9 +259,11 @@ rcu_dereference()
 	locking.
 
 	As with rcu_assign_pointer(), an important function of
-	rcu_dereference() is to document which pointers are protected
-	by RCU.  And, again like rcu_assign_pointer(), rcu_dereference()
-	is typically used indirectly, via the _rcu list-manipulation
+	rcu_dereference() is to document which pointers are protected by
+	RCU, in particular, flagging a pointer that is subject to changing
+	at any time, including immediately after the rcu_dereference().
+	And, again like rcu_assign_pointer(), rcu_dereference() is
+	typically used indirectly, via the _rcu list-manipulation
 	primitives, such as list_for_each_entry_rcu().
 
 The following diagram shows how each API communicates among the
@@ -327,7 +330,7 @@ for specialized uses, but are relatively uncommon.
 3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
 
 This section shows a simple use of the core RCU API to protect a
-global pointer to a dynamically allocated structure.  More typical
+global pointer to a dynamically allocated structure.  More-typical
 uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
 
 	struct foo {
@@ -410,6 +413,8 @@ o	Use synchronize_rcu() -after- removing a data element from an
 	data item.
 
 See checklist.txt for additional rules to follow when using RCU.
+And again, more-typical uses of RCU may be found in listRCU.txt,
+arrayRCU.txt, and NMI-RCU.txt.
 
 
 4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
@@ -513,7 +518,7 @@ production-quality implementation, and see:
 
 for papers describing the Linux kernel RCU implementation.  The OLS'01
 and OLS'02 papers are a good introduction, and the dissertation provides
-more details on the current implementation.
+more details on the current implementation as of early 2004.
 
 
 5A.  "TOY" IMPLEMENTATION #1: LOCKING
@@ -768,7 +773,6 @@ RCU pointer/list traversal:
 	rcu_dereference
 	list_for_each_rcu		(to be deprecated in favor of
 					 list_for_each_entry_rcu)
-	list_for_each_safe_rcu		(deprecated, not used)
 	list_for_each_entry_rcu
 	list_for_each_continue_rcu	(to be deprecated in favor of new
 					 list_for_each_entry_continue_rcu)
@@ -807,7 +811,8 @@ Quick Quiz #1:	Why is this argument naive?  How could a deadlock
 Answer:		Consider the following sequence of events:
 
 		1.	CPU 0 acquires some unrelated lock, call it
-			"problematic_lock".
+			"problematic_lock", disabling irq via
+			spin_lock_irqsave().
 
 		2.	CPU 1 enters synchronize_rcu(), write-acquiring
 			rcu_gp_mutex.
@@ -894,7 +899,7 @@ Answer:		Just as PREEMPT_RT permits preemption of spinlock
 ACKNOWLEDGEMENTS
 
 My thanks to the people who helped make this human-readable, including
-Jon Walpole, Josh Triplett, Serge Hallyn, and Suzanne Wood.
+Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
 
 
 For more information, see http://www.rdrop.com/users/paulmck/RCU.
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
@@ -0,0 +1,41 @@
+
+Export cpu topology info by sysfs. Items (attributes) are similar
+to /proc/cpuinfo.
+
+1) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
+represent the physical package id of  cpu X;
+2) /sys/devices/system/cpu/cpuX/topology/core_id:
+represent the cpu core id to cpu X;
+3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+represent the thread siblings to cpu X in the same core;
+4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+represent the thread siblings to cpu X in the same physical package;
+
+To implement it in an architecture-neutral way, a new source file,
+driver/base/topology.c, is to export the 5 attributes.
+
+If one architecture wants to support this feature, it just needs to
+implement 4 defines, typically in file include/asm-XXX/topology.h.
+The 4 defines are:
+#define topology_physical_package_id(cpu)
+#define topology_core_id(cpu)
+#define topology_thread_siblings(cpu)
+#define topology_core_siblings(cpu)
+
+The type of **_id is int.
+The type of siblings is cpumask_t.
+
+To be consistent on all architectures, the 4 attributes should have
+deafult values if their values are unavailable. Below is the rule.
+1) physical_package_id: If cpu has no physical package id, -1 is the
+default value.
+2) core_id: If cpu doesn't support multi-core, its core id is 0.
+3) thread_siblings: Just include itself, if the cpu doesn't support
+HT/multi-thread.
+4) core_siblings: Just include itself, if the cpu doesn't support
+multi-core and HT/Multi-thread.
+
+So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
+
+If an attribute isn't defined on an architecture, it won't be exported.
+
diff --git a/Documentation/driver-model/overview.txt b/Documentation/driver-model/overview.txt
@@ -1,50 +1,43 @@
 The Linux Kernel Device Model
 
-Patrick Mochel	<mochel@osdl.org>
+Patrick Mochel	<mochel@digitalimplant.org>
 
-26 August 2002
+Drafted 26 August 2002
+Updated 31 January 2006
 
 
 Overview
 ~~~~~~~~
 
-This driver model is a unification of all the current, disparate driver models
-that are currently in the kernel. It is intended to augment the
+The Linux Kernel Driver Model is a unification of all the disparate driver
+models that were previously used in the kernel. It is intended to augment the
 bus-specific drivers for bridges and devices by consolidating a set of data
 and operations into globally accessible data structures.
 
-Current driver models implement some sort of tree-like structure (sometimes
-just a list) for the devices they control. But, there is no linkage between
-the different bus types.
+Traditional driver models implemented some sort of tree-like structure
+(sometimes just a list) for the devices they control. There wasn't any
+uniformity across the different bus types.
 
-A common data structure can provide this linkage with little overhead: when a
-bus driver discovers a particular device, it can insert it into the global
-tree as well as its local tree. In fact, the local tree becomes just a subset
-of the global tree.
-
-Common data fields can also be moved out of the local bus models into the
-global model. Some of the manipulations of these fields can also be
-consolidated. Most likely, manipulation functions will become a set
-of helper functions, which the bus drivers wrap around to include any
-bus-specific items.
-
-The common device and bridge interface currently reflects the goals of the
-modern PC: namely the ability to do seamless Plug and Play, power management,
-and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
-us that any device in the system may fit any of these criteria.)
-
-In reality, not every bus will be able to support such operations. But, most
-buses will support a majority of those operations, and all future buses will.
-In other words, a bus that doesn't support an operation is the exception,
-instead of the other way around.
+The current driver model provides a comon, uniform data model for describing
+a bus and the devices that can appear under the bus. The unified bus
+model includes a set of common attributes which all busses carry, and a set
+of common callbacks, such as device discovery during bus probing, bus
+shutdown, bus power management, etc.
 
+The common device and bridge interface reflects the goals of the modern
+computer: namely the ability to do seamless device "plug and play", power
+management, and hot plug. In particular, the model dictated by Intel and
+Microsoft (namely ACPI) ensures that almost every device on almost any bus
+on an x86-compatible system can work within this paradigm.  Of course,
+not every bus is able to support all such operations, although most
+buses support a most of those operations.
 
 
 Downstream Access
 ~~~~~~~~~~~~~~~~~
 
 Common data fields have been moved out of individual bus layers into a common
-data structure. But, these fields must still be accessed by the bus layers,
+data structure. These fields must still be accessed by the bus layers,
 and sometimes by the device-specific drivers.
 
 Other bus layers are encouraged to do what has been done for the PCI layer.
@@ -53,7 +46,7 @@ struct pci_dev now looks like this:
 struct pci_dev {
 	...
 
-	struct device device;
+	struct device dev;
 };
 
 Note first that it is statically allocated. This means only one allocation on
@@ -64,9 +57,9 @@ the two.
 
 The PCI bus layer freely accesses the fields of struct device. It knows about
 the structure of struct pci_dev, and it should know the structure of struct
-device. PCI devices that have been converted generally do not touch the fields
-of struct device. More precisely, device-specific drivers should not touch
-fields of struct device unless there is a strong compelling reason to do so.
+device. Individual PCI device drivers that have been converted the the current
+driver model generally do not and should not touch the fields of struct device,
+unless there is a strong compelling reason to do so.
 
 This abstraction is prevention of unnecessary pain during transitional phases.
 If the name of the field changes or is removed, then every downstream driver