The PDB File Format
Introduction
PDB (Program Database) is a file format invented by Microsoft that contains debug information for consumption by debuggers and other tools. Because officially supported APIs exist on Windows for querying debug information from PDBs, even without the user understanding the internals of the file format, a large ecosystem of tools has been built for Windows to consume this format. For Clang to generate programs that can interoperate with these tools, we must generate PDB files ourselves.
At the same time, LLVM has a long history of being able to cross-compile from any platform to any platform, and we wish for the same to be true here. So it is necessary for us to understand the PDB file format at the byte level so that we can generate PDB files entirely on our own.
This manual describes what we know about the PDB file format today: the layout of the file, the various streams contained within, the format of individual records within, and more.
We would like to extend our heartfelt gratitude to Microsoft, without whom we would not be where we are today. Much of the knowledge contained within this manual was learned through reading code published by Microsoft on their GitHub repo.
File Layout
Important
Unless otherwise specified, all numeric values are encoded in little endian. If you see a type such as uint16_t or uint64_t going forward, always assume it is little endian!
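As a small illustration of the little-endian rule, the sketch below decodes a uint16_t and a uint32_t from a byte buffer using Python's standard `struct` module. The buffer contents are made up for this example and do not come from a real PDB file.

```python
import struct

# Hypothetical 6-byte buffer: a uint16_t followed by a uint32_t,
# both stored least-significant byte first (little endian).
raw = bytes([0x34, 0x12, 0x78, 0x56, 0x34, 0x12])

# "<" selects little-endian byte order with no padding;
# "H" is an unsigned 16-bit integer, "I" an unsigned 32-bit integer.
u16, u32 = struct.unpack_from("<HI", raw, 0)

# The same 16-bit value decoded with int.from_bytes, for comparison.
u16_alt = int.from_bytes(raw[0:2], byteorder="little")

print(hex(u16), hex(u32))
```

Spelling out the `<` prefix matters: without it, `struct` uses the host's native byte order and alignment, which would make the parse platform dependent.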
The MSF Container
A PDB file is an MSF (Multi-Stream Format) file. An MSF file is a “file system within a file”. It contains multiple streams (aka files) which can represent arbitrary data, and these streams are divided into blocks which may not necessarily be contiguously laid out within the MSF container file. Additionally, the MSF contains a stream directory (aka MFT) which describes how the streams (files) are laid out within the MSF.
For more information about the MSF container format, stream directory, and block layout, see The MSF File Format.
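To make the “file system within a file” idea concrete, here is a toy sketch of reassembling one stream from non-contiguous blocks. It is not an MSF parser: the block size, the container bytes, and the `read_stream` helper are all hypothetical, and a real reader would obtain the block size and each stream's block list from the MSF file itself.

```python
from typing import Sequence


def read_stream(container: bytes, block_size: int,
                block_indices: Sequence[int], stream_size: int) -> bytes:
    """Concatenate a stream's blocks in listed order, then trim to its length.

    Hypothetical helper: block_indices and stream_size stand in for what a
    stream directory would record about one stream.
    """
    data = bytearray()
    for index in block_indices:
        start = index * block_size
        data += container[start:start + block_size]
    # The last block may be only partially used, so trim the padding.
    return bytes(data[:stream_size])


# Toy container with 4-byte blocks (assumed size, chosen for readability).
BLOCK_SIZE = 4
container = b"AAAA" + b"rld!" + b"BBBB" + b"hell" + b"o wo"

# This stream lives in blocks 3, 4 and 1, in that order, and is 11 bytes long.
stream = read_stream(container, BLOCK_SIZE, [3, 4, 1], 11)
print(stream)
```

The blocks belonging to other streams (here the `AAAA` and `BBBB` blocks) are simply skipped, which is why a directory describing each stream's block list is needed.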
Streams
The PDB format contains a number of streams which describe various information such as the types, symbols, source files, and compilands (e.g. object files) of a program. Additional streams contain hash tables that debuggers and other tools use to provide fast lookup of records and types by name, as well as various other information about how the program was compiled, such as the specific toolchain used. A summary of streams contained in a PDB file is as follows:
| Name | Stream Index | Contents |
|---|---|---|
| Old Directory | Fixed Stream Index 0 | Previous MSF Stream Directory |
| PDB Stream | Fixed Stream Index 1 | Basic File Information; Fields to match EXE to this PDB; Map of named streams to stream indices |
| TPI Stream | Fixed Stream Index 2 | CodeView Type Records; Index of TPI Hash Stream |
| DBI Stream | Fixed Stream Index 3 | Module/Compiland Information; Indices of individual module streams; Indices of public / global streams; Section Contribution Information; Source File Information; References to streams containing FPO / PGO Data |
| IPI Stream | Fixed Stream Index 4 | CodeView Type Records; Index of IPI Hash Stream |
| /LinkInfo | Contained in PDB Stream Named Stream map | Unknown |
| /src/headerblock | Contained in PDB Stream Named Stream map | Summary of embedded source file content (e.g. natvis files) |
| /names | Contained in PDB Stream Named Stream map | PDB-wide global string table used for string de-duplication |
| Module Info Stream | Contained in DBI Stream; One for each compiland | CodeView Symbol Records for this module; Line Number Information |
| Public Stream | Contained in DBI Stream | Public (Exported) Symbol Records; Index of Public Hash Stream |
| Global Stream | Contained in DBI Stream | Single combined master symbol-table; Index of Global Hash Stream |
| TPI Hash Stream | Contained in TPI Stream | Hash table for looking up TPI records by name |
| IPI Hash Stream | Contained in IPI Stream | Hash table for looking up IPI records by name |
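The five fixed stream indices from the table can be captured in a small enumeration. This is only a sketch: the names `FixedStream` and `describe_stream` are invented for this example and are not taken from LLVM or any Microsoft API.

```python
from enum import IntEnum


class FixedStream(IntEnum):
    """Stream indices that are the same in every PDB file (per the table)."""
    OLD_DIRECTORY = 0
    PDB = 1
    TPI = 2
    DBI = 3
    IPI = 4


def describe_stream(index: int) -> str:
    """Return the fixed stream's name, or a hint on where to look otherwise."""
    try:
        return FixedStream(index).name
    except ValueError:
        # Any other stream has no fixed index; its index is recorded in the
        # PDB Stream's named stream map or inside the DBI, TPI or IPI stream.
        return "not fixed: look up via the PDB, DBI, TPI or IPI stream"


for i in range(6):
    print(i, describe_stream(i))
```

All other streams in the table must be located indirectly, which is why a reader typically parses the fixed streams first.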
More information about the structure of each of these can be found on the following pages:
- The PDB Info Stream (aka the PDB Stream)
  - Information about the PDB Info Stream and how it is used to match PDBs to EXEs.
- The PDB TPI and IPI Streams
  - Information about the TPI stream and the CodeView records contained within.
- The PDB DBI (Debug Info) Stream
  - Information about the DBI stream and relevant substreams including the Module Substreams, source file information, and CodeView symbol records contained within.
- The Module Information Stream
  - Information about the Module Information Stream, of which there is one for each compilation unit, and the format of symbols contained within.
- The PDB Public Symbol Stream
  - Information about the Public Symbol Stream.
- The PDB Global Symbol Stream
  - Information about the Global Symbol Stream.
- The PDB Serialized Hash Table Format
  - Information about the serialized hash table format used internally to represent things such as the Named Stream Map and the Hash Adjusters in the TPI/IPI Stream.
CodeView
CodeView is another format which comes into the picture. While MSF defines the structure of the overall file, and PDB defines the set of streams that appear within the MSF file and the format of those streams, CodeView defines the format of symbol and type records that appear within specific streams. Refer to the pages on CodeView Symbol Records and CodeView Type Records for more information about the CodeView format.