Application Init Optimizations
Contents
- 1 Description
- 2 Rationale
- 3 Application Tuning
- 4 Resources
- 5 Specifications
- 6 Downloads
- 7 Sample Results
- 8 Future Work/Action Items
- 9 Other Resources
Description
This page describes optimizations to a large application and to the
kernel, to shorten the time required to load and execute an application.
Two main techniques are described here: 1) use of mmap vs. read and 2)
control over page mapping characteristics. These techniques are
discussed below.
Rationale
Kernel bootup time has been drastically improved by recent efforts,
including CELF activities. As a next step, application bootup time should
be addressed to cut down the total system bootup time. The techniques
described here are applicable to a large number of embedded systems, many
of which run a single large application.
Application Tuning
Using mmap() instead of read() for initial application data load
An application may load a large amount of data when it is first
initialized. This can result in a long delay as the file data is read
into memory. It is possible to avoid the initial cost of this read by
using mmap() instead of read().
Instead of loading all of the data into memory with the read system
call, the file can be mapped into memory with the mmap system call. Once
the data file is mapped, individual pages will be demand loaded during
execution, when the application reads them. Depending on the initial
working set size of the data in the file, this can result in significant
time savings. (For example, if an application initially uses only 50% of
the data in the file, then only 50% of the data will be read into memory
from persistent storage.) There is extra overhead due to the page faults
incurred in loading the pages on demand. However, this page fault overhead
is offset by the savings in the number of page reads, compared to the
read() case.
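As a concrete illustration, here is a minimal sketch contrasting the two
approaches; the file name app_data.bin is a placeholder. read() copies the
entire file into an allocated buffer up front, while mmap() merely
establishes the mapping and defers I/O until pages are actually touched.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "app_data.bin";   /* placeholder data file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Strategy 1: read() pulls the whole file into memory up front.
     * (A single read() for brevity; a real loader would loop.) */
    char *buf = malloc(st.st_size);
    if (!buf || read(fd, buf, st.st_size) != st.st_size) {
        perror("read"); return 1;
    }

    /* Strategy 2: mmap() maps the file; pages are demand-loaded, so
     * only the pages actually touched are read from storage. */
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    /* ... use buf or map here ... */
    munmap(map, st.st_size);
    free(buf);
    close(fd);
    return 0;
}
```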
Customizing file cache control in the kernel
To further improve this method, the kernel can be modified to reduce
page copying and page faults.
Eliminating redundant page copies
When pages are demand-loaded for a memory-mapped file, the pages are kept
in memory as part of the kernel “file cache” and mapped into the
requesting process’s address space. If a page is accessed via a write
operation, then the page in the file cache is copied to a newly allocated
memory page. (This is referred to as “copy-on-write”.) The copied page can
then be freely modified by the process which maps it.
Suppose, however, that a file is mapped or accessed by only one process.
Then, copying the page is redundant. In this case, we can convert the
page in the file cache to a private page immediately. By utilizing this
assumption (only one user for the page), the cost of the copy can be
eliminated. This has the side benefit of reducing memory consumption as
well.
Reducing page faults
In some cases, an individual page in the process address space is
accessed first with a read operation, then with a write operation. This
results in two page faults for the same page (one to load the page and
move it “through” the file cache, and the other to get a local copy of
the page.) By eliminating the page copy and making the page private on the
first access (whether read or write), the second page fault can be
avoided.
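To make the double fault concrete, here is a minimal userspace sketch that
counts page faults around a read access and then a write access to the
same MAP_PRIVATE page, using getrusage(). The file name data.bin is a
placeholder (any file of at least one page works), and the counts are
approximate since other activity in the process can also fault. On a stock
kernel, the write typically triggers the second, copy-on-write fault that
the modification described above would eliminate.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static long faults(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt + ru.ru_majflt;   /* minor + major faults */
}

int main(void)
{
    int fd = open("data.bin", O_RDONLY);  /* placeholder file */
    if (fd < 0) { perror("open"); return 1; }

    long pagesz = sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    long f0 = faults();
    volatile char c = p[0];   /* first fault: page loaded via file cache */
    (void)c;
    long f1 = faults();
    p[0] = 'x';               /* second fault: copy-on-write private copy */
    long f2 = faults();

    printf("read access:  %ld fault(s)\n", f1 - f0);
    printf("write access: %ld fault(s)\n", f2 - f1);
    munmap(p, pagesz);
    close(fd);
    return 0;
}
```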
Controlling API
The current system is experimental in the way it selects the files
affected by this caching/virtual memory customization. It would be better
to control this mechanism per file or per virtual memory area. The fcntl
and mmap system calls are candidates for where this control could be
introduced.
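As a purely hypothetical sketch, per-mapping or per-file control could
look something like the following. Neither MAP_TAKEOVER nor F_SET_TAKEOVER
exists in any kernel; the names are invented here only to show where such
a control could plug in.

```c
/* Hypothetical sketch only: MAP_TAKEOVER and F_SET_TAKEOVER do not
 * exist; they illustrate where per-mapping or per-file control of the
 * takeover behavior could be hooked in. */
#include <fcntl.h>
#include <sys/mman.h>

#define MAP_TAKEOVER   0x100000   /* hypothetical mmap() flag */
#define F_SET_TAKEOVER 1024       /* hypothetical fcntl() command */

void *map_with_takeover(int fd, size_t len)
{
    /* Option 1: request the behavior for a single mapping. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_TAKEOVER, fd, 0);

    /* Option 2: mark the whole file, affecting later mappings. */
    fcntl(fd, F_SET_TAKEOVER, 1);

    return p;
}
```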
Resources
Projects
None.
Specifications
Downloads
Patch
Sorry, but no patch is available at this time.
Sample Results
Case Study 1
Hardware
- SH3 (7709) 133MHz
- 32MB RAM
- 64MB CF memory
Software
- 2.4.27 kernel
Target application
- intent
- Loading data size: 8MB
Methods explanation
- read (CF/ext3): The data file is loaded using the read system call from
  an ext3 file system on a CompactFlash memory.
- mmap (CF/ext3): The data file is mapped into the process virtual address
  space using the mmap system call.
- takeover (CF/ext3): The data file is mapped, and the page in the file
  system cache (which is created during page fault handling) is converted
  to a private page immediately.
- takeover (CF/squash): Same as No. 3, except using the SquashFS file
  system.
- takeover (RD/squash): Same as No. 3, except the file system is read from
  a RAM disk instead of CompactFlash.
Results
No. | Method          | Media | FS     | Ave.  | 1st   | 2nd   | 3rd   | Diff.
--- | --------------- | ----- | ------ | ----- | ----- | ----- | ----- | ------
1   | read            | CF    | ext3   | 4.420 | 4.418 | 4.420 | 4.421 | -
2   | mmap            | CF    | ext3   | 3.995 | 3.995 | 3.995 | 3.996 | -0.424
3   | takeover        | CF    | ext3   | 3.959 | 3.959 | 3.958 | 3.966 | -0.461
4   | takeover        | CF    | squash | 4.002 | 4.000 | 4.000 | 4.007 | -0.417
5   | takeover(total) | RD    | squash | 4.588 | 4.579 | 4.590 | 4.595 | +0.168
    | dd (CF -> RD)   | RD    | squash | 1.212 | 1.209 | 1.209 | 1.217 |
    | mount           | RD    | squash | 0.041 | 0.040 | 0.041 | 0.041 |
    | takeover        | RD    | squash | 3.336 | 3.330 | 3.340 | 3.337 |
- UNIT: sec
- CF: CompactFlash / RD: RAM Disk
- As a result of using the mmap system call, bootup time is reduced by
  about 400 msec (10% of total init time).
- By using the takeover method, page faults are reduced to 317, versus 496
  under the mmap method, and the redundant page copies are eliminated. As
  a result, about 40 msec is saved.
- SquashFS is a compressed read-only file system, so there is some extra
  cost to access data (decompression and so on), but its performance is
  not much worse than ext3. Using SquashFS is a good choice to reduce
  storage consumption.
- Using a file system on a RAM disk is the most efficient way to increase
  file access performance. If the storage device which holds the file
  system image is fast enough and the extra RAM usage is affordable, it
  may be a good choice for reducing bootup time.
Case Study 2
- Status: measured
Architecture Support:
- i386: unknown
- ARM: unknown
- PPC: unknown
- MIPS: unknown
- SH: works on SH3
Future Work/Action Items
Here is a list of things that could be worked on for this feature:
- I’m considering implementing similar file cache control using the
  fadvise system call under the 2.6 kernel.
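For reference, here is a sketch of how the standard posix_fadvise()
interface available on 2.6 kernels is invoked; the proposed cache-control
extension itself is not shown, and the helper name is illustrative.

```c
/* Sketch of the standard posix_fadvise() interface on 2.6 kernels. */
#define _XOPEN_SOURCE 600
#include <fcntl.h>

int advise_file(int fd, off_t len)
{
    /* Hint sequential access so the kernel can tune readahead. */
    int rc = posix_fadvise(fd, 0, len, POSIX_FADV_SEQUENTIAL);
    if (rc == 0)
        /* Ask the kernel to start populating the cache now. */
        rc = posix_fadvise(fd, 0, len, POSIX_FADV_WILLNEED);
    return rc;   /* 0 on success, an errno value on failure */
}
```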
Other Resources
This project was demoed at the 2005 CELF Technical Conference. A picture
of the poster is here: