From: eLinux.org
Realtime Preemption
Contents
- 1 Description
- 2 Resources
- 3 Downloads
- 4 How To Use
- 5 How to validate
- 6 Related projects
- 7 Sample Results
- 8 Status
- 9 Future Work/Action Items
Description
Overview
Realtime Preemption is (as of this writing, 12/21/2004) a patch that
tries to improve the realtime performance of the Linux kernel.
Recent patches from Ingo Molnar include a large number of technologies for
improving preemption in the Linux kernel and for debugging preemption
problems.
An overview of the technologies is as follows:
- voluntary preempt = a set of voluntary preemption points for the
kernel, to improve normal scheduling latency
- BKL (Big Kernel Lock) changed to a semaphore
- latency tracer
Voluntary Preempt
Overview:
- if it is enabled at compile time, it can be turned off at runtime with
the kernel command-line option “voluntary-preemption=0” or
“voluntary-preemption=off”
- it creates a new function, might_resched(), which is used by might_sleep()
- might_resched() calls cond_resched() if voluntary preemption is on
- it adds might_sleep() calls in several places
Conversion of Spinlocks to Mutexes
According to Ingo Molnar, its primary author, “the big change in this
release is the addition of PREEMPT_REALTIME, which is a new
implementation of a fully preemptible kernel model”.
For a brief description of the overall technology, see:
http://kerneltrap.org/node/3995
Briefly, the technology makes spinlocks and rwlocks preemptible by
default.
- the patch auto-detects at compile time which type of lock to use
for each spinlock (a mutex or the original raw_spinlock)
- it uses a feature of gcc to do this, which reduces the patch size
- it uses native Linux semaphores for preemption
- it converts rwlocks to rw-semaphores
- apparently, about 90 locks are targeted for non-conversion to
preemptibility (that is, they are preserved as RAW_SPINLOCKS)
Ingo mentioned at one time that this was about 20% of the locks in his
kernel configuration, implying that there were about 450 spinlocks
(90 / 0.20) in the kernel in that configuration.
Ingo said this about how well this works on uniprocessor (UP) systems
versus SMP systems:
...and no matter how well UP works, to fix SMP one has to 'cover' all the
necessary locks first before fixing it, which (drastic) increase in raw
locks invalidates most of the UP efforts of getting rid of raw locks.
That's why i decided to go for SMP primarily - didnt see much point in
going for UP.
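Normally, in UP the spinlocks are compiled away. When PREEMPT is turned
on (without the new patch) these spinlocks are turned into markers for
non-preemptible regions. When RT-PREEMPT is used, most of them instead
become preemptible mutexes, and only the locks kept as raw spinlocks
still mark non-preemptible regions.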
people working on/interested in this stuff
- Ingo Molnar, Red Hat, voluntary preemption, Ingo real-time preemption
- Sven Dietrich, Monta Vista, MV real-time preemption
- Daniel Walker, Monta Vista, priority inheritance??
- John Cooper, Time Sys, ???
- Tim Bird, Sony, port to 2.6.10-native, port to PPC
- Scott Wood, Time Sys, IRQ threading??
people working on related stuff
- Bill Huey, Lynux Works??, mmlinux
miscellaneous comments
Comments regarding the scheduling of RT tasks
Ingo said (in this
message):
note that my -RT patchset includes scheduler changes that implement
“global RT scheduling” on SMP systems. Give it a go, it’s at:
http://redhat.com/~mingo/realtime-preempt/
you have to enable CONFIG_PREEMPT_RT to activate this feature. I’ve
designed this code to not hurt non-RT scheduling, and i’ve optimized
performance for the ‘lightly loaded case’ (which is the most common to
occur on mainline-using systems).
A very short description of the design: there’s a global ‘RT overload
counter’ - which is zero and causes no overhead if there is at most 1 RT
task in every runqueue. (i.e. at most 2 RT tasks on a 2-way system, at
most 4 RT tasks on a 4-way system, etc.) If the system gets into ‘RT
overload’ mode (e.g. the third RT task gets activated on a 2-way box),
then the scheduler starts to balance the RT tasks aggressively. Also,
whenever an RT task is preempted on a CPU, or is woken up but cannot
preempt a higher-prio RT task on a given CPU, then it’s ‘pushed’ to
other CPUs if possible. This design avoids global locking (it avoids a
global runqueue), which simplifies things immensely. (I first tried a
global runqueue for RT tasks but the complexity impact was much bigger.)
(note that these scheduler changes are reasonably self-contained and do
not depend on other parts of PREEMPT_RT, so in theory they could be
added to mainline too, after some time - given lots of testing and broad
agreement.)
comments regarding the hard parts of this work
Ingo says (at:
http://groups-beta.google.com/group/linux.kernel/msg/cf036477d30ab736)
some of the harder stuff:
- the handling of per-CPU data structures (get_cpu_var())
- RCU and softirq data structures
- the handling of the IRQ flag
comments about the number of raw spinlocks needed
Ingo says (at:
http://groups-beta.google.com/group/linux.kernel/msg/e63b2860d2e993dd)
Sven Dietrich sdietr...@mvista.com wrote:
IMO the number of raw_spinlocks should be lower, I said teens before.
Theoretically, it should only need to be around hardware registers and
some memory maps and cache code, plus interrupt controller and other
SMP-contended hardware.
yeah, fully agreed. Right now the 90 locks i have means roughly 20% of
all locking still happens as raw spinlocks.
But, there is a ‘correctness’ minimum set of spinlocks that must be
raw spinlocks - this i tried to map in the -T4 patch. The patch does run
on SMP systems for example. (it was developed as an SMP kernel - in fact
i never compiled it as UP :-|.) If code has per-CPU or preemption
assumptions then there is no choice but to make it a raw spinlock, until
those assumptions are fixed.
Rationale
This feature is intended to provide much better realtime scheduling
response for a Linux system.
Resources
Projects
Various parties are working on ports: Time Sys
and Monta Vista, in particular, seem to have made ports to PPC and ARM
platforms.
Specifications
None that I’m aware of.
Online resources
The original announcement for voluntary-preemption:
Some articles by Jonathan Corbet on LWN.net:
- http://lwn.net/Articles/106010/
- http://lwn.net/Articles/107269/
- http://lwn.net/Articles/108216/
- http://lwn.net/Articles/129511/
There’s a page of links about RT for audio at:
A brief introduction to the RT patch (in Japanese only):
Paper: “Embedded GNU/Linux and Real-Time an executive summary”, 2010,
by Robert Berger
- This paper, prepared for the Embedded World Conference 2010,
compares different real-time approaches (including RT-preempt
and dual-kernel approaches).
- The paper has an extensive list of references, which are very
good.
Downloads
Patch
See http://redhat.com/~mingo/realtime-preempt/
Utility programs
[other programs, user-space, test, etc. related to this technology]
How To Use
- apply patch
- choose desired preemption level
- compile kernel
Configuration variables
The patch introduces (or modifies) the following configuration
variables:
Variable | Purpose |
ASM_SEMAPHORES | |
BLOCKER | |
CRITICAL_IRQSOFF_TIMING | |
CRITICAL_PREEMPT_TIMING | |
CRITICAL_TIMING | |
FRAME_POINTER | |
LATENCY_TIMING | |
LATENCY_TRACE | |
MCOUNT | |
PREEMPT | |
PREEMPT_BKL | |
PREEMPT_DESKTOP | |
PREEMPT_HARDIRQS | |
PREEMPT_NONE | |
PREEMPT_RT | |
PREEMPT_SOFTIRQS | |
PREEMPT_TRACE | |
PREEMPT_VOLUNTARY | |
RTC_HISTOGRAM | |
RT_DEADLOCK_DETECT | |
RWSEM_GENERIC_SPINLOCK | |
RWSEM_XCHGADD_ALGORITHM | |
SPINLOCK_BKL | |
USE_FRAME_POINTER | |
WAKEUP_TIMING | |
- retrieved from patch with command:
grep "^[+-]config " realtime-preempt-2.6.10-mm1-V0.7.34-01 | sed "s/^[+-]config //" | sort -u
How to validate
[put references to test plans, scripts, methods, etc. here]
- use included trace feature, or
- use included latency overrun reporting mechanism
- Preemption_Instrumentation
Related projects
Monta Vista released a similar technology,
which had the following features:
See
http://groups-beta.google.com/group/linux.kernel/msg/7eeef031d9ec1446
These RT enhancements are an integration of features developed by
others and some new MontaVista components:
- Voluntary Preemption by Ingo Molnar
- IRQ thread patches by Scott Wood and Ingo Molnar
- BKL mutex patch by Ingo Molnar (with MV extensions)
- PMutex from Germany’s Universitaet der Bundeswehr, Munich
- MontaVista mutex abstraction layer replacing spinlocks with mutexes
Sample Results
[Examples of use with measurement of the effects.]
Case Study 1
- Linux RT Benchmarking Framework
- Summary of discussion on LKML (in Japanese)
Case Study 2
Trevor Woerner published some results in November 2005 regarding
latency measurements he had been recording on the 2.6.14 kernel with
Ingo’s patches.
See
http://geek.vtnet.ca/embedded/LatencyTests/html/index.html
Case Study 3
Status
- Rt_Preempt_Subpatch_Table
Status: [not started??]
(one of: not started, researched, implemented, measured, documented, accepted)
Architecture Support:
(for each arch, one of: unknown, patches apply, compiles, runs, works, accepted)
- i386: unknown
- ARM: unknown
- PPC: unknown
- MIPS: unknown
- SH: unknown
Future Work/Action Items
Here is a list of things that could be worked on for this feature:
- help with mainlining???
- perform testing on multiple platforms
- provide use cases for justification
- what else?
- break patch into manageable pieces - doesn’t Ingo use any kind of patch management system???
people who expressed interest
Manas Saksena, Jon Masters, Takeharu Kato, Ralph Siemsen, Jyunji Kondo