Profile Guided Optimization

rustc supports profile-guided optimization (PGO). This chapter describes what PGO is, what it is good for, and how it can be used.

What Is Profile-Guided Optimization?

The basic concept of PGO is to collect data about the typical execution of a program (e.g. which branches it is likely to take) and then use this data to inform optimizations such as inlining, machine-code layout, register allocation, etc.

There are different ways of collecting data about a program's execution. One is to run the program inside a profiler (such as perf) and another is to create an instrumented binary, that is, a binary that has data collection built into it, and run that. The latter usually provides more accurate data and it is also what is supported by rustc.

Usage

Generating a PGO-optimized program involves following a workflow with four steps:

  • Compile the program with instrumentation enabled (e.g. rustc -Cprofile-generate=/tmp/pgo-data main.rs)
  • Run the instrumented program (e.g. ./main), which generates a default_<id>.profraw file
  • Convert the .profraw file into a .profdata file using LLVM's llvm-profdata tool
  • Compile the program again, this time making use of the profiling data (for example rustc -Cprofile-use=merged.profdata main.rs)

An instrumented program will create one or more .profraw files, one for each instrumented binary. E.g. an instrumented executable that loads two instrumented dynamic libraries at runtime will generate three .profraw files. Running an instrumented binary multiple times, on the other hand, will re-use the respective .profraw files, updating them in place.

These .profraw files have to be post-processed before they can be fed back into the compiler. This is done by the llvm-profdata tool, which is most easily installed via

    rustup component add llvm-tools-preview

Note that installing the llvm-tools-preview component won't add llvm-profdata to the PATH. Rather, the tool can be found in:

    ~/.rustup/toolchains/<toolchain>/lib/rustlib/<target-triple>/bin/
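Because that directory depends on the active toolchain and target triple, it can be convenient to derive it instead of typing it out. The following is a minimal sketch, assuming rustc is on the PATH, was installed through rustup, and that the component was added for the host target. `llvm_tools_dir` is a hypothetical helper name used for illustration, not something rustup provides.

```shell
# Hypothetical helper: build the path of the directory that holds llvm-profdata,
# given a toolchain sysroot ($1) and a target triple ($2).
llvm_tools_dir() {
    printf '%s/lib/rustlib/%s/bin\n' "$1" "$2"
}

# Only query rustc if it is actually installed.
if command -v rustc >/dev/null 2>&1; then
    # For a rustup-managed toolchain the sysroot is the toolchain directory.
    sysroot="$(rustc --print sysroot)"
    # `rustc -vV` prints a line of the form "host: <target-triple>".
    host="$(rustc -vV | sed -n 's/^host: //p')"
    echo "$(llvm_tools_dir "$sysroot" "$host")/llvm-profdata"
fi
```

The printed path can be invoked directly or its directory can be prepended to PATH for the current shell session.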

Alternatively, an llvm-profdata coming with a recent LLVM or Clang version usually works too.

The llvm-profdata tool merges multiple .profraw files into a single .profdata file that can then be fed back into the compiler via -Cprofile-use:

    # STEP 1: Compile the binary with instrumentation
    rustc -Cprofile-generate=/tmp/pgo-data -O ./main.rs

    # STEP 2: Run the binary a few times, maybe with common sets of args.
    # Each run will create or update `.profraw` files in /tmp/pgo-data
    ./main mydata1.csv
    ./main mydata2.csv
    ./main mydata3.csv

    # STEP 3: Merge and post-process all the `.profraw` files in /tmp/pgo-data
    llvm-profdata merge -o ./merged.profdata /tmp/pgo-data

    # STEP 4: Use the merged `.profdata` file during optimization. All `rustc`
    # flags have to be the same.
    rustc -Cprofile-use=./merged.profdata -O ./main.rs
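Before merging, it can help to confirm that the instrumented runs actually wrote profile data. The following is a minimal sketch, assuming the /tmp/pgo-data directory from the example above. `count_profraw` and `PGO_DIR` are hypothetical names, and `-maxdepth` is a GNU/BSD find extension rather than part of POSIX.

```shell
# Hypothetical helper: count the .profraw files directly inside a directory.
# Prints 0 if the directory does not exist.
count_profraw() {
    [ -d "$1" ] || { echo 0; return 0; }
    find "$1" -maxdepth 1 -type f -name '*.profraw' | wc -l | tr -d ' '
}

# Default to the directory used in the example; override by setting PGO_DIR.
PGO_DIR="${PGO_DIR:-/tmp/pgo-data}"
n="$(count_profraw "$PGO_DIR")"
if [ "$n" -eq 0 ]; then
    echo "no .profraw files in $PGO_DIR; was the instrumented binary run?" >&2
else
    echo "found $n .profraw file(s) in $PGO_DIR"
fi
```

A count of zero after STEP 2 usually points at a setup problem, such as running a binary that was not built with -Cprofile-generate.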

A Complete Cargo Workflow

Using this feature with Cargo works very similarly to using it with rustc directly. Again, we generate an instrumented binary, run it to produce data, merge the data, and feed it back into the compiler. Some things of note:

  • We use the RUSTFLAGS environment variable in order to pass the PGO compiler flags to the compilation of all crates in the program.

  • We pass the --target flag to Cargo, which prevents the RUSTFLAGS arguments from being passed to Cargo build scripts. We don't want the build scripts to generate a bunch of .profraw files.

  • We pass --release to Cargo because that's where PGO makes the most sense. In theory, PGO can also be done on debug builds but there is little reason to do so.

  • It is recommended to use absolute paths for the argument of -Cprofile-generate and -Cprofile-use. Cargo can invoke rustc with varying working directories, meaning that rustc may not be able to find a .profdata file given as a relative path. With absolute paths this is not an issue.

  • It is good practice to make sure that there is no left-over profiling data from previous compilation sessions. Just deleting the directory is a simple way of doing so (see STEP 0 below).

This is what the entire workflow looks like:

    # STEP 0: Make sure there is no left-over profiling data from previous runs
    rm -rf /tmp/pgo-data

    # STEP 1: Build the instrumented binaries
    RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" \
        cargo build --release --target=x86_64-unknown-linux-gnu

    # STEP 2: Run the instrumented binaries with some typical data
    ./target/x86_64-unknown-linux-gnu/release/myprogram mydata1.csv
    ./target/x86_64-unknown-linux-gnu/release/myprogram mydata2.csv
    ./target/x86_64-unknown-linux-gnu/release/myprogram mydata3.csv

    # STEP 3: Merge the `.profraw` files into a `.profdata` file
    llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

    # STEP 4: Use the `.profdata` file for guiding optimizations
    RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" \
        cargo build --release --target=x86_64-unknown-linux-gnu
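If the profile data should live next to the project rather than under /tmp, an absolute path can be derived from the current directory so that it stays valid wherever Cargo invokes rustc. The following is a minimal sketch. The pgo-data directory name and the variable names are assumptions chosen for illustration, and it assumes the path contains no spaces, since RUSTFLAGS is split on spaces.

```shell
# Resolve an absolute profile directory from the directory the script is started in.
PGO_DIR="$(pwd)/pgo-data"

# Flag strings for the instrumented build (STEP 1) and the optimized build (STEP 4).
PGO_GENERATE_FLAGS="-Cprofile-generate=${PGO_DIR}"
PGO_USE_FLAGS="-Cprofile-use=${PGO_DIR}/merged.profdata"

echo "instrumented build: RUSTFLAGS=\"${PGO_GENERATE_FLAGS}\""
echo "optimized build:    RUSTFLAGS=\"${PGO_USE_FLAGS}\""
```

The two flag strings can then take the place of the hard-coded /tmp/pgo-data values in STEP 1 and STEP 4 of the workflow.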

Troubleshooting

  • It is recommended to pass -Cllvm-args=-pgo-warn-missing-function during the -Cprofile-use phase. LLVM by default does not warn if it cannot find profiling data for a given function. Enabling this warning will make it easier to spot errors in your setup.

  • There is a known issue in Cargo prior to version 1.39 that will prevent PGO from working correctly. Be sure to use Cargo 1.39 or newer when doing PGO.
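As an illustration of the first recommendation, the warning flag can be appended to the flags of the -Cprofile-use build. The following is a minimal sketch that only assembles and prints the command. The paths and target triple are taken from the Cargo workflow in this chapter, and the variable names are hypothetical.

```shell
# Profile file produced by the merge step of the Cargo workflow.
PROFDATA=/tmp/pgo-data/merged.profdata

# Combine the profile-use flag with the LLVM warning for functions without profile data.
PGO_USE_FLAGS="-Cprofile-use=${PROFDATA} -Cllvm-args=-pgo-warn-missing-function"

# Print the resulting build command; run it from the project root.
echo "RUSTFLAGS=\"${PGO_USE_FLAGS}\" cargo build --release --target=x86_64-unknown-linux-gnu"
```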

Further Reading

rustc's PGO support relies entirely on LLVM's implementation of the feature and is equivalent to what Clang offers via the -fprofile-generate / -fprofile-use flags. The Profile Guided Optimization section in Clang's documentation is therefore an interesting read for anyone who wants to use PGO with Rust.