From: eLinux.org

Realtime Preemption

Description

Overview

Realtime Preemption is (as of this writing, 12/21/2004) a patch that
tries to improve the realtime performance of the Linux kernel.

Recent patches from Ingo Molnar include a large number of technologies
for improving preemption and for debugging preemption issues in the
Linux kernel.

An overview of the technologies is as follows:

  • voluntary preempt = a set of voluntary preemption points for the
    kernel, to improve normal scheduling latency (These changes
    basically
  • BKL change to semaphore
  • latency tracer

Voluntary Preempt

Overview:

  • if it’s on at compile time, it can be turned off at boot time with
    the kernel command-line option “voluntary-preemption=0” or
    “voluntary-preemption=off”
  • Creates a new function might_resched(), which is used by might_sleep().

    • might_resched() calls cond_resched() if voluntary preemption is on.

  • Adds might_sleep() in several places.
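
The call chain described above can be modelled in user space. This is only a sketch to show the relationship between the three functions: the flag names (voluntary_preemption, need_resched_flag) and the counter are made up for illustration, and the bodies are not the kernel's real code.

```c
#include <stdbool.h>

/* Hypothetical user-space model; not the kernel implementation. */
static bool voluntary_preemption = true; /* models voluntary-preemption=0/off */
static bool need_resched_flag;           /* assumed: set when another task should run */
static int reschedule_count;             /* how many times we gave up the CPU */

/* Give up the CPU only if a reschedule is pending. Returns 1 if we did. */
static int cond_resched(void)
{
    if (!need_resched_flag)
        return 0;
    need_resched_flag = false; /* a real kernel would call the scheduler here */
    reschedule_count++;
    return 1;
}

/* New in the patch: a preemption point that can be switched off. */
static void might_resched(void)
{
    if (voluntary_preemption)
        cond_resched();
}

/* Every existing might_sleep() call site becomes a preemption point. */
static void might_sleep(void)
{
    /* the kernel's might_sleep() debugging checks are omitted in this model */
    might_resched();
}
```

In this model, calling might_sleep() with need_resched_flag set bumps reschedule_count once. With voluntary_preemption cleared, the same call does nothing.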

Conversion of Spinlocks to Mutexes

According to Ingo Molnar, its primary author, “the big change in this
release is the addition of PREEMPT_REALTIME, which is a new
implementation of a fully preemptible kernel model”.

For a brief description of the overall technology, see:
http://kerneltrap.org/node/3995

Briefly, the technology makes spinlocks and rwlocks preemptible by
default.

  • the patch auto-detects at compile time the type of lock to use
    for a spinlock (mutex or original raw_spinlock)
  • it uses a feature of gcc to manage this (reducing patch size)
  • it uses native Linux semaphores for preemption
  • it converts rwlocks to rw-semaphores
  • apparently, about 90 locks are targeted for NON-conversion to
    preemptibility (that is, they are preserved as RAW_SPINLOCKS)
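
One way such compile-time detection can work is sketched below. It assumes the gcc feature in question is something like __builtin_types_compatible_p together with __typeof__. The page does not name the feature, so treat that as a guess. The lock types, the TYPE_EQUAL macro and the *_impl functions are hypothetical stand-ins, not the patch's actual definitions.

```c
/* Hypothetical stand-ins for the two lock flavours. */
typedef struct { int locked; } raw_spinlock_t;                 /* stays a real spinlock */
typedef struct { int locked; int waiters; } preemptible_lock_t; /* mutex-like, may sleep */

static int raw_lock_calls, mutex_lock_calls;

static void raw_spin_lock_impl(raw_spinlock_t *l)
{
    l->locked = 1;
    raw_lock_calls++;
}

static void mutex_lock_impl(preemptible_lock_t *l)
{
    l->locked = 1;
    mutex_lock_calls++;
}

/* True (at compile time) if `lock` is a pointer to `type`. gcc-specific. */
#define TYPE_EQUAL(lock, type) \
    __builtin_types_compatible_p(__typeof__(lock), type *)

/*
 * One spin_lock() spelling for both flavours. The condition is a
 * compile-time constant, so the compiler can drop the unused branch.
 * The casts only exist so that both branches type-check.
 */
#define spin_lock(lock)                                       \
    do {                                                      \
        if (TYPE_EQUAL((lock), raw_spinlock_t))               \
            raw_spin_lock_impl((raw_spinlock_t *)(lock));     \
        else                                                  \
            mutex_lock_impl((preemptible_lock_t *)(lock));    \
    } while (0)
```

With this shape, existing spin_lock(&x) call sites do not have to change. Only the declared type of x decides which implementation is used, which would fit the "reducing patch size" remark above.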

Ingo mentioned at one time that this was about 20% of the locks in his
kernel configuration, implying that there were about 450 spinlocks
present in the kernel in his configuration.

Ingo said this about how well this works on uniprocessor (UP) systems
versus SMP systems:

  ...and no matter how well UP works, to fix SMP one has to 'cover' all the
  necessary locks first before fixing it, which (drastic) increase in raw
  locks invalidates most of the UP efforts of getting rid of raw locks.
  That's why i decided to go for SMP primarily - didnt see much point in
  going for UP.

Normally, in UP the spinlocks are compiled away. When PREEMPT is turned
on (without the new patch) these spinlocks are turned into markers for
non-preemptible regions. When RT-PREEMPT is used,

people working on/interested in this stuff

  • Ingo Molnar, Red Hat, voluntary preemption, Ingo real-time preemption
  • Sven Dietrich, MontaVista, MV real-time preemption
  • Daniel Walker, MontaVista, priority inheritance??
  • John Cooper, TimeSys, ???
  • Tim Bird, Sony, port to 2.6.10-native, port to PPC
  • Scott Wood, TimeSys, IRQ threading??

miscellaneous comments

Comments regarding the scheduling of RT tasks

Ingo said (in this message):


note that my -RT patchset includes scheduler changes that implement
“global RT scheduling” on SMP systems. Give it a go, it’s at:

  http://redhat.com/~mingo/realtime-preempt/

you have to enable CONFIG_PREEMPT_RT to active this feature. I’ve
designed this code to not hurt non-RT scheduling, and i’ve optimized
performance for the ‘lightly loaded case’ (which is the most common to
occur on mainline-using systems).

A very short description of the design: there’s a global ‘RT overload
counter’ - which is zero and causes no overhead if there is at most 1 RT
task in every runqueue. (i.e. at most 2 RT tasks on a 2-way system, at
most 4 RT tasks on a 4-way system, etc.) If the system gets into ‘RT
overload’ mode (e.g. the third RT task gets activated on a 2-way box),
then the scheduler starts to balance the RT tasks agressively. Also,
whenever an RT task is preempted on a CPU, or is woken up but cannot
preempt a higher-prio RT task on a given CPU, then it’s ‘pushed’ to
other CPUs if possible. This design avoids global locking (it avoids a
global runqueue), which simplifies things immensely. (I first tried a
global runqueue for RT tasks but the complexity impact was much bigger.)

(note that these scheduler changes are resonably self-contained and do
not depend on other parts of PREEMPT_RT, so in theory they could be
added to mainline too, after some time - given lots of testing and broad
agreement.)
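
The design Ingo describes can be sketched as a small user-space model. Everything here is an assumption made for illustration: the names (rq, rt_nr_running, push_rt_task), the choice to count overloaded runqueues in the global counter, and the simplification that a task is only pushed to a CPU with no RT task at all (priorities are ignored). It is not code from the patch.

```c
#define NR_CPUS 2 /* a 2-way box, as in the example above */

struct runqueue {
    int rt_nr_running; /* runnable RT tasks on this CPU */
};

static struct runqueue rq[NR_CPUS];

/* Assumed meaning: number of runqueues holding more than one RT task. */
static int rt_overload;

static void rt_task_activate(int cpu)
{
    if (++rq[cpu].rt_nr_running == 2)
        rt_overload++; /* this runqueue just became overloaded */
}

static void rt_task_deactivate(int cpu)
{
    if (rq[cpu].rt_nr_running-- == 2)
        rt_overload--; /* this runqueue is no longer overloaded */
}

/*
 * Try to push one waiting RT task away from `cpu`.
 * Returns the target CPU, or -1 if nothing was moved.
 */
static int push_rt_task(int cpu)
{
    /* Fast path: with the counter at zero there is nothing to balance. */
    if (!rt_overload || rq[cpu].rt_nr_running < 2)
        return -1;

    for (int i = 0; i < NR_CPUS; i++) {
        if (i != cpu && rq[i].rt_nr_running == 0) {
            rt_task_deactivate(cpu);
            rt_task_activate(i);
            return i;
        }
    }
    return -1; /* every other CPU already runs an RT task */
}
```

Note that the model uses only per-runqueue counts plus one global counter. There is no global runqueue, which matches the point of the design. A real implementation would also need locking and priority comparisons, which are left out here.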


comments regarding the hard parts of this work

Ingo says (at:
http://groups-beta.google.com/group/linux.kernel/msg/cf036477d30ab736)

some of the harder stuff:

  • the handling of per-CPU data structures (get_cpu_var())

  • RCU and softirq data structures

  • the handling of the IRQ flag

comments about the number of raw spinlocks needed

Ingo says (at:
http://groups-beta.google.com/group/linux.kernel/msg/e63b2860d2e993dd)

Sven Dietrich sdietr...@mvista.com wrote:

IMO the number of raw_spinlocks should be lower, I said teens before.

Theoretically, it should only need to be around hardware registers and
some memory maps and cache code, plus interrupt controller and other
SMP-contended hardware.

Ingo replied:

yeah, fully agreed. Right now the 90 locks i have means roughly 20% of
all locking still happens as raw spinlocks.

But, there is a ‘correctness’ minimum set of spinlocks that must be
raw spinlocks - this i tried to map in the -T4 patch. The patch does run
on SMP systems for example. (it was developed as an SMP kernel - in fact
i never compiled it as UP :-|.) If code has per-CPU or preemption
assumptions then there is no choice but to make it a raw spinlock, until
those assumptions are fixed.

Rationale

This feature is intended to provide much better realtime scheduling
response for a Linux system.

Resources

Projects

Various parties are working on ports: TimeSys and MontaVista, in
particular, seem to have made ports to PPC and ARM platforms.

Specifications

None that I’m aware of.

Online resources

The original announcement for voluntary-preemption:

Here’s some stuff by Jonathan Corbet:

There’s a page of links about RT for audio at:

A brief introduction of RT patch (Sorry, in Japanese only):

Downloads

Patch

See http://redhat.com/~mingo/realtime-preempt/

Utility programs

[other programs, user-space, test, etc. related to this technology]

How To Use

  • apply patch
  • choose desired preemption level
  • compile kernel

Configuration variables

The patch introduces (or modifies) the following configuration
variables:

Variable    Purpose
ASM_SEMAPHORES
BLOCKER
CRITICAL_IRQSOFF_TIMING
CRITICAL_PREEMPT_TIMING
CRITICAL_TIMING
FRAME_POINTER
LATENCY_TIMING
LATENCY_TRACE
MCOUNT
PREEMPT
PREEMPT_BKL
PREEMPT_DESKTOP
PREEMPT_HARDIRQS
PREEMPT_NONE
PREEMPT_RT
PREEMPT_SOFTIRQS
PREEMPT_TRACE
PREEMPT_VOLUNTARY
RTC_HISTOGRAM
RT_DEADLOCK_DETECT
RWSEM_GENERIC_SPINLOCK
RWSEM_XCHGADD_ALGORITHM
SPINLOCK_BKL
USE_FRAME_POINTER
WAKEUP_TIMING

Retrieved from the patch with the command:

  grep "[+-]config " realtime-preempt-2.6.10-mm1-V0.7.34-01 | sed "s/[+-]config //" | sort | uniq

How to validate

[put references to test plans, scripts, methods, etc. here]

MontaVista released a similar technology, which had the following
features:

See
http://groups-beta.google.com/group/linux.kernel/msg/7eeef031d9ec1446

These RT enhancements are an integration of features developed by
others and some new MontaVista components:

  • Voluntary Preemption by Ingo Molnar
  • IRQ thread patches by Scott Wood and Ingo Molnar
  • BKL mutex patch by Ingo Molnar (with MV extensions)
  • PMutex from Germany’s Universitaet der Bundeswehr, Munich
  • MontaVista mutex abstraction layer replacing spinlocks with mutexes

Sample Results

[Examples of use with measurement of the effects.]

Case Study 1

Case Study 2

Trevor Woerner published results in November 2005 from latency
measurements he had been recording on the 2.6.14 kernel with Ingo’s
patches.

See
http://geek.vtnet.ca/embedded/LatencyTests/html/index.html

Case Study 3

Status

  • Rt_Preempt_Subpatch_Table
  • Status: [not started??]

    (one of: not started, researched, implemented, measured, documented, accepted)

  • Architecture Support:

    (for each arch, one of: unknown, patches apply, compiles, runs, works, accepted)

    • i386: unknown
    • ARM: unknown
    • PPC: unknown
    • MIPS: unknown
    • SH: unknown

Future Work/Action Items

Here is a list of things that could be worked on for this feature:

  • help with mainlining???
  • perform testing on multiple platforms
  • provide use cases for justification
  • what else?
  • break patch into manageable pieces - doesn’t Ingo use any kind of patch management system???

people who expressed interest

Manas Saksena, Jon Masters, Takeharu Kato, Ralph Siemsen, Jyunji Kondo
