From: eLinux.org
System Size Auto-Reduction
This page has notes and an outline for Tim Bird’s Linux Auto-Reduction
research.
Contents
- 1 Talk info
- 2 Research Areas
- 3 Talk outline
- 3.1 Title
- 3.2 Self-Introduction
- 3.3 The problem of Bloat
- 3.4 Bloat (cont)
- 3.5 Bloat (cont- 2)
- 3.6 Automatic reduction (intro)
- 3.7 auto-reduce - story of 8 bytes of bloat
- 3.8 generalizing the problem of bloat
- 3.9 An example of fixed input (uid in kernel)
- 3.10 types of constraints
- 3.11 also goes back up to user-space
- 3.12 kernel command line args
- 3.13 /proc values
- 3.14 Tiny Distribution
- 4 References
- 5 Related Projects
- 6 Materials
Talk info
Tim gave a talk on this research at LinuxCon Japan 2013 (May 29 in
Tokyo, Japan)
Title
Advanced size optimization of the Linux kernel
Abstract
This presentation will cover recent research by Tim on aggressive size
reduction of the Linux kernel. This will include results from using gcc
link-time optimization (LTO) with the ARM architecture (using Andi
Kleen’s out-of-tree patches), as well as results and discussion of other
optimization techniques (including whole-system optimization for
embedded devices).
This talk is directed at kernel developers interested in reducing the
size of their Linux systems (and possibly improving their performance in
the process). The talk will be highly technical.
Final slides from talk
Media:Bird-Kernel-Size-Optimization-LCJ-2013.pdf
Research Areas
LTO
- What is it?
- what was required to get it to work?
- Andi Kleen’s patch set
- what do they do?
- how big are they?
- mainline status?
- what is the size gain (see ELC poster)
- what can be done with it?
- long-term possibilities for LTO
global constraints
- overall idea: create constraints external to code, and use for optimization
- rationale: can’t maintain in-tree - too many config items
- make the application of constraints automatic
- use existing constraints to generate new constraints
  - constraints can flow between user-space and kernel
  - example: uid=0
- constraint language
- application by commenting out references (replace with 0 constant)
  - use compiler to find code references (via error messages)
    - eliminates problem with duplicate names (uid in different structure)
- constant propagation (by, e.g. LTO) reduces code
syscall elimination
- scan file system
- create report of used and unused system calls
- mark syscalls unused in kernel
- arch/arm/kernel/calls.S (and arch/arm/kernel/entry-common.S)
- make sure unused syscalls are not __attribute__((externally_visible))
  - technique of asmlinkage_\
- use LTO to eliminate calls
- results: 50K-90K
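The "create report of used and unused system calls" step above can be sketched roughly as follows. This is only an illustration: the two name lists are hypothetical sample data, and the function names are made up for this sketch. A real tool would build the full list from the kernel's syscall table and the used list by scanning the binaries in the target file system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sample data (assumption): a real tool would derive the first
 * list from the kernel's syscall table and the second from a file system scan. */
static const char *const all_syscalls[]  = { "read", "write", "open", "ptrace", "kexec_load" };
static const char *const used_syscalls[] = { "read", "write", "open" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if some scanned program references the syscall, else 0. */
static int syscall_is_used(const char *name)
{
    for (size_t i = 0; i < COUNT(used_syscalls); i++)
        if (strcmp(name, used_syscalls[i]) == 0)
            return 1;
    return 0;
}

/* Fill out[] with up to max unused syscall names; return how many were stored. */
size_t report_unused_syscalls(const char **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < COUNT(all_syscalls) && n < max; i++)
        if (!syscall_is_used(all_syscalls[i]))
            out[n++] = all_syscalls[i];
    return n;
}
```

The names returned by report_unused_syscalls() are the candidates to mark unused in the kernel, so that LTO can drop them once nothing else references them.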
ARM stack reduction
- 4k stacks
- stack extensions
link-time rewriting
cold code compression
- D. Chanet did cold code compression
- consists of:
- profiling the kernel
- marking code regions as cold or frozen
- replacing them with stubs
- compressing them
- At execution time:
- if a stub is called, it decompresses the code and calls it
- stub is fixed up to directly call decompressed code in future
- code is left decompressed forever
Results:
- MUST see paper for details (it’s quite complicated)
- on 2.4.25 kernel
- cold code compression resulted in 7% reduction for i386 kernel and 11.7% reduction for ARM kernel
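The stub mechanism described above (a stub that decompresses on first call, then is fixed up so later calls go straight to the code) can be modeled in plain C with a function pointer. This is a userspace sketch, not the scheme from the paper: all names are hypothetical, and the "decompression" is a stand-in that just returns the address of an ordinary function.

```c
#include <stddef.h>

/* Hypothetical model (assumption): call sites reach cold code via a pointer. */
typedef int (*cold_fn)(int);

static int decompress_count;   /* how many times the slow path ran */

/* The real body of a rarely used ("cold") function. */
static int cold_body(int x)
{
    return x * 2 + 1;
}

static int cold_stub(int x);

/* Starts out aimed at the stub, as if the body were still compressed. */
static cold_fn cold_entry = cold_stub;

/* Stand-in for decompression; a real system would inflate stored code here. */
static cold_fn decompress_cold_body(void)
{
    decompress_count++;
    return cold_body;
}

/* First call lands here: decompress, fix up the entry, then run the body. */
static int cold_stub(int x)
{
    cold_entry = decompress_cold_body();
    return cold_entry(x);
}

/* What a call site does: always go through the (possibly fixed-up) entry. */
int call_cold(int x)
{
    return cold_entry(x);
}

int cold_decompress_count(void)
{
    return decompress_count;
}
```

Calling call_cold() twice runs the stand-in decompression only once; the second call goes directly to the body, which mirrors "code is left decompressed forever".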
Talk outline
This talk was presented at LinuxCon Japan 2013:
Title
- Advanced size optimization of the Linux kernel
- by Tim Bird, Sony Mobile Communications
Self-Introduction
- I am Tim Bird
- Now working at Sony Mobile
- Researching system size for many years
- Long background in extremely small systems
- pre-professional: first program on TRS-80, in BASIC, 8K RAM
- NetWare Lite - file and print server in 50K (in 1991)
The problem of Bloat
- Software bloat occurs because systems are built with more software than is really needed for a given task
- Open Source software meets the needs of thousands of different systems
  - Linux scales from tiny sensors to supercomputers (extreme SMP and high-end clusters)
  - Linux supports many, many features, only some of which are configurable
- Software must be generalized for many use cases
- bloat problem is:
  - How to re-specialize the software, eliminating unused features and dead code?
Bloat (cont.)
- Software gets more generalized over time
- Can’t use strategy of manual tuning (config options)
- It gets harder and harder to remove things over time
- About 13,000 config items now (2.6.12 had 4700)
- You have to be an expert in too many things to reduce the kernel
- Must rely on automated methods of reduction
- Should use an additive, rather than subtractive method of building a system
  - ultimate vision: indicate what you want/need, and build up system to support it
Bloat (cont. 2)
- In desktop or server, virtual memory makes bloat issue less important for user-space programs
  - Only working set of program is loaded - pages are loaded on demand
- For kernel, all pages are always loaded
Automatic reduction (intro)
The problem with automatic reduction is that “the system” doesn’t know
what software is needed and what is not. There needs to be a way to tell
it about things that are not going to be used.
auto-reduce - story of 8 bytes of bloat
Story of the conditional check in kdb:
- I found a bug in kdb, when a particular option was used in the configuration file
- not everyone uses the configuration file
- not everyone uses the particular option
- bug only triggered in those circumstances
- I wrote a small patch, to guard against use of a variable prematurely
- problem: all users of KDB now have this check, and suffer this overhead
  - it wasn’t much, just a single compare
  - but this is how bloat builds up over time
- It bothered me because I knew most people didn’t need the check
- “correct” solution would be to parse the config file, and make the code compile-time configurable
  - this adds more complexity than it is worth.
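The trade-off in this story can be illustrated with a small sketch. This is not the actual kdb code: the variable, the functions, and the MODEL_NO_EARLY_OPTION macro are all hypothetical. It only shows the shape of the problem, a single runtime compare that every user pays for, versus a compile-time switch that removes it for systems known never to hit the case.

```c
#include <stddef.h>

/* Hypothetical stand-in (assumption) for a variable that is only valid
 * after some later initialization step has run. */
static const char *cmd_buffer;

void set_cmd_buffer(const char *s)
{
    cmd_buffer = s;
}

/* Length of the pending command, or 0 if nothing has been set up yet. */
size_t pending_cmd_length(void)
{
#ifndef MODEL_NO_EARLY_OPTION
    /* The guard: one compare, carried by every build unless it is
     * configured out at compile time. */
    if (cmd_buffer == NULL)
        return 0;
#endif
    size_t n = 0;
    while (cmd_buffer[n] != '\0')
        n++;
    return n;
}
```

Building with -DMODEL_NO_EARLY_OPTION drops the compare, but only a build system that knows the option is never used could set that flag safely, which is the extra complexity the notes refer to.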
generalizing the problem of bloat
System doesn’t know inputs:
- It’s very easy to configure the kernel to omit the driver for missing hardware.
- It’s very difficult to configure the kernel to omit error handling for bugs that will never occur due to fixed use cases.
An example of fixed input (uid in kernel)
- throughout the kernel, there are references to uid
- comparisons, storing, referencing
- it turns out this is set by setuid(), by the ‘login’ program.
  - login does a lookup and validates user account name in /etc/passwd
- what if /etc/passwd only has ‘root’ and no others?
- setuid() could only be called with a value of 0
- can I encode this constraint on the system?
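One way to picture the uid=0 constraint is the sketch below. It is a userspace model with hypothetical names (CONSTRAIN_UID_ROOT, model_setuid, may_access), not kernel code. When the constraint macro is defined, every read of the uid becomes the constant 0, so an optimizing compiler can fold the comparisons and discard the non-root path.

```c
/* Hypothetical model (assumption): build with -DCONSTRAIN_UID_ROOT to state
 * that the only account on the system is root. */
#ifdef CONSTRAIN_UID_ROOT
#define current_uid() 0u

/* Under the constraint, any non-zero uid is impossible. */
int model_setuid(unsigned int uid)
{
    return uid == 0 ? 0 : -1;
}
#else
static unsigned int stored_uid;
#define current_uid() (stored_uid)

/* Unconstrained: the uid is a runtime value that must be stored and compared. */
int model_setuid(unsigned int uid)
{
    stored_uid = uid;
    return 0;
}
#endif

/* Permission check: root may do anything, others only touch their own objects. */
int may_access(unsigned int owner_uid)
{
    if (current_uid() == 0)
        return 1;
    /* Unreachable once current_uid() is the constant 0. */
    return current_uid() == owner_uid;
}
```

In the constrained build, may_access() reduces to "return 1" after constant propagation, and the storage for the uid disappears as well. That is the kind of saving the notes expect from replacing references with a 0 constant.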
types of constraints
There are numerous other examples of constraints:
- kernel command line arguments never used
- syscalls never called by any program
- parameters that are never used, or parameter values that are never passed in
  - e.g. ioctl value that is not possible
- this only works in a fixed
- /proc values never referenced
- /sys values never referenced
also goes back up to user-space
- return values that are not possible
kernel command line args
- Documented in Documentation/kernel-parameters.txt
- defined with __setup() and early_param() from include/linux/init.h
- approximately 480 __setup routines in kernel
- about 200 __setup_* in System.map on ARM kernel build (98 __setup_str_*)
- about 230 early_param routines in kernel
- points to function
  - almost always sets a variable, which would default to 0
- on ARM, with only console_setup and early_mem marked as ‘used’, there was a 19K difference in size:
(non-LTO kernel)
vmlinux.baseline-setup-used => vmlinux-param-used
        baseline    other   change  percent
text:    7680084  7663472   -16612       0%
data:     362868   360516    -2352       0%
bss:      745312   745184     -128       0%
total:   8788264  8769172   -19092       0%
- on ARM, with only console_setup and early_mem marked as ‘used’, there was a 6K difference in size:
(LTO kernel)
vmlinux.lto-param => vmlinux.param-used
        baseline    other   change  percent
text:    1653672  1648920    -4752       0%
data:     131636   130244    -1392      -1%
bss:       50688    50528     -160       0%
total:   1835996  1829692    -6304       0%
System.map from kernel with console_setup and early_mem as only
routines marked ‘used’:
$ grep __setup System.map
c00ea4bc T __setup_irq.153323
c00f1adc t __setup_per_zone_wmarks.172539.15755
c019d570 t __setup_str_early_mem.21664.160821
c019d884 t __setup_str_console_setup.61958.160201
c019ef00 t __setup_early_mem.21659.160819
c019ef00 T __setup_start
c019ef0c t __setup_console_setup.61953.160195
c019ef18 T __setup_end
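For readers unfamiliar with the mechanism, here is a simplified userspace model of a __setup table. In the kernel, each __setup() use emits a small record (a string, a function pointer and an "early" flag) into a section bounded by __setup_start and __setup_end. On 32-bit ARM that is three 4-byte fields, which matches the 12-byte spacing of the two entries in the listing above. The model below uses an ordinary array instead of a linker section, and the handler bodies and helper names are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Model of one table record: prefix string, handler, early flag. */
struct setup_entry {
    const char *str;
    int (*setup_func)(char *arg);
    int early;
};

/* Variables the handlers set; both default to zero/empty. */
static char console_name[16];
static unsigned long mem_mb;

static int console_setup(char *arg)
{
    strncpy(console_name, arg, sizeof(console_name) - 1);
    console_name[sizeof(console_name) - 1] = '\0';
    return 1;
}

static int early_mem(char *arg)
{
    mem_mb = strtoul(arg, NULL, 10);   /* reads leading digits, e.g. "64M" */
    return 0;
}

/* Only the two routines kept as 'used' remain in the table. */
static const struct setup_entry setup_table[] = {
    { "console=", console_setup, 0 },
    { "mem=",     early_mem,     1 },
};

/* Dispatch one command line word; return 1 if a handler matched, else 0. */
int handle_param(char *param)
{
    size_t count = sizeof(setup_table) / sizeof(setup_table[0]);
    for (size_t i = 0; i < count; i++) {
        size_t n = strlen(setup_table[i].str);
        if (strncmp(param, setup_table[i].str, n) == 0) {
            setup_table[i].setup_func(param + n);
            return 1;
        }
    }
    return 0;
}

const char *model_console(void) { return console_name; }
unsigned long model_mem_mb(void) { return mem_mb; }
```

Each entry dropped from the table removes the record, its string, the handler function, and potentially the code that tests the variable the handler would have set.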
/proc values
- Includes sysctl values
- there are approximately 1200, NOT related specifically to a process
- about 120 per process
- about 80 related to networking (on my desktop box)
- 40 others
Tiny Distribution
- poky-tiny distribution (yocto project)
- see https://wiki.yoctoproject.org/wiki/Poky-Tiny
- Good for testing and further research
References
- Chanet D. … “Automated reduction of the memory footprint of the Linux kernel”
- Haifen He. … “Code Compaction of an Operating System Kernel”
Related Projects
- AnyKernel and Rumpkernel - see thesis by Antti Kantee - pooka (at) iki (dot) fi
  - https://github.com/rumpkernel/wiki/wiki
- provides a system based on NetBSD to isolate sub-systems and drivers and allow their use in micro-kernels and user-space
- haven’t read enough of it to determine if it could be applied to Linux, but sounds like just API wrapping
- I’m not sure how robust it would be in the context of rapid mainline churn