MemorySSA
Introduction
MemorySSA is an analysis that allows us to cheaply reason about the
interactions between various memory operations. Its goal is to replace
MemoryDependenceAnalysis for most (if not all) use-cases. This is because,
unless you're very careful, use of MemoryDependenceAnalysis can easily
result in quadratic-time algorithms in LLVM. Additionally, MemorySSA
doesn't have as many arbitrary limits as MemoryDependenceAnalysis, so you
should get better results, too.
At a high level, one of the goals of MemorySSA is to provide an SSA based
form for memory, complete with def-use and use-def chains, which enables
users to quickly find may-def and may-uses of memory operations. It can
also be thought of as a way to cheaply give versions to the complete state
of heap memory, and associate memory operations with those versions.
This document goes over how MemorySSA is structured, and some basic
intuition on how MemorySSA works.
A paper on MemorySSA (with notes about how it's implemented in GCC) can be
found here. Though, it's relatively out-of-date; the paper references
multiple heap partitions, but GCC eventually swapped to just using one,
like we now have in LLVM. Like GCC's, LLVM's MemorySSA is intraprocedural.
MemorySSA Structure
MemorySSA is a virtual IR. After it's built, MemorySSA will contain a
structure that maps Instructions to MemoryAccesses, which are MemorySSA's
parallel to LLVM Instructions.
Each MemoryAccess can be one of three types:

- MemoryPhi
- MemoryUse
- MemoryDef
MemoryPhis are PhiNodes, but for memory operations. If at any point we
have two (or more) MemoryDefs that could flow into a BasicBlock, the
block's top MemoryAccess will be a MemoryPhi. As in LLVM IR, MemoryPhis
don't correspond to any concrete operation. As such, BasicBlocks are
mapped to MemoryPhis inside MemorySSA, whereas Instructions are mapped to
MemoryUses and MemoryDefs.
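
In C++, these mappings are reached through MemorySSA's lookup functions.
Below is a rough sketch of doing so, assuming an already-built MemorySSA
object; the inspect function is made up for illustration:

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;

  void inspect(MemorySSA &MSSA, Instruction &I, BasicBlock &BB) {
    // Instructions map to MemoryUses or MemoryDefs; instructions that
    // don't touch memory map to nothing (nullptr).
    if (MemoryUseOrDef *MUD = MSSA.getMemoryAccess(&I)) {
      if (isa<MemoryUse>(MUD))
        errs() << "reads memory but does not clobber it\n";
      else
        errs() << "may clobber memory (or impose an ordering)\n";
    }

    // Basic blocks map to MemoryPhis, which exist only where two or more
    // MemoryDefs could reach the top of the block.
    if (MemoryPhi *Phi = MSSA.getMemoryAccess(&BB))
      errs() << "block begins with " << *Phi << "\n";
  }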
Note also that in SSA, Phi nodes merge must-reach definitions (that is,
definitions that must be new versions of variables). In MemorySSA, PHI
nodes merge may-reach definitions (that is, until disambiguated, the
versions that reach a phi node may or may not clobber a given variable).
MemoryUses are operations which use but don't modify memory. An example of
a MemoryUse is a load, or a readonly function call.
MemoryDefs are operations which may either modify memory, or which
introduce some kind of ordering constraints. Examples of MemoryDefs
include stores, function calls, loads with acquire (or higher) ordering,
volatile operations, memory fences, etc.
Every function that exists has a special MemoryDef called liveOnEntry. It
dominates every MemoryAccess in the function that MemorySSA is being run
on, and implies that we've hit the top of the function. It's the only
MemoryDef that maps to no Instruction in LLVM IR. Use of liveOnEntry
implies that the memory being used is either undefined or defined before
the function begins.
An example of all of this overlaid on LLVM IR (obtained by running
opt -passes='print<memoryssa>' -disable-output on an .ll file) is below.
When viewing this example, it may be helpful to view it in terms of
clobbers. The operands of a given MemoryAccess are all (potential)
clobbers of said MemoryAccess, and the value produced by a MemoryAccess
can act as a clobber for other MemoryAccesses. Another useful way of
looking at it is in terms of heap versions. In that view, operands of a
given MemoryAccess are the version of the heap before the operation, and
if the access produces a value, the value is the new version of the heap
after the operation.
  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 6 = MemoryPhi({entry,1},{if.end,4})
    br i1 undef, label %if.then, label %if.else

  if.then:
    ; 2 = MemoryDef(6)
    store i8 0, i8* %p1
    br label %if.end

  if.else:
    ; 3 = MemoryDef(6)
    store i8 1, i8* %p2
    br label %if.end

  if.end:
    ; 5 = MemoryPhi({if.then,2},{if.else,3})
    ; MemoryUse(5)
    %1 = load i8, i8* %p1
    ; 4 = MemoryDef(5)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }
The MemorySSA IR is shown in comments that precede the instructions they
map to (if such an instruction exists). For example,
1 = MemoryDef(liveOnEntry) is a MemoryAccess (specifically, a MemoryDef),
and it describes the LLVM instruction store i8 0, i8* %p3. Other places in
MemorySSA refer to this particular MemoryDef as 1 (much like how one can
refer to load i8, i8* %p1 in LLVM with %1). Again, MemoryPhis don't
correspond to any LLVM Instruction, so the line directly below a MemoryPhi
isn't special.
Going from the top down:
- 6 = MemoryPhi({entry,1},{if.end,4}) notes that, when entering
  while.cond, the reaching definition for it is either 1 or 4. This
  MemoryPhi is referred to in the textual IR by the number 6.
- 2 = MemoryDef(6) notes that store i8 0, i8* %p1 is a definition, and its
  reaching definition before it is 6, or the MemoryPhi after while.cond.
  (See the Build-time use optimization and Precision sections below for
  why this MemoryDef isn't linked to a separate, disambiguated MemoryPhi.)
- 3 = MemoryDef(6) notes that store i8 1, i8* %p2 is a definition; its
  reaching definition is also 6.
- 5 = MemoryPhi({if.then,2},{if.else,3}) notes that the clobber before
  this block could either be 2 or 3.
- MemoryUse(5) notes that load i8, i8* %p1 is a use of memory, and that
  it's clobbered by 5.
- 4 = MemoryDef(5) notes that store i8 2, i8* %p2 is a definition; its
  reaching definition is 5.
- MemoryUse(1) notes that load i8, i8* %p3 is just a user of memory, and
  the last thing that could clobber this use is above while.cond (e.g. the
  store to %p3). In heap versioning parlance, it really only depends on
  the heap version 1, and is unaffected by the new heap versions generated
  since then.
As an aside, MemoryAccess is a Value mostly for convenience; it's not
meant to interact with LLVM IR.
Design of MemorySSA
MemorySSA is an analysis that can be built for any arbitrary function.
When it's built, it does a pass over the function's IR in order to build
up its mapping of MemoryAccesses. You can then query MemorySSA for things
like the dominance relation between MemoryAccesses, and get the
MemoryAccess for any given Instruction.
When MemorySSA is done building, it also hands you a MemorySSAWalker that
you can use (see below).
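
For example, under the new pass manager a function pass can request
MemorySSA and grab that walker roughly as follows (a sketch; ExamplePass
is a made-up name, not an in-tree pass):

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/IR/PassManager.h"

  using namespace llvm;

  // Hypothetical pass, shown only to illustrate obtaining the analysis.
  struct ExamplePass : PassInfoMixin<ExamplePass> {
    PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
      // Build (or fetch the cached) MemorySSA for this function.
      MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();

      // The default walker answers clobber queries by consulting the
      // current alias analysis stack (see "The walker" below).
      MemorySSAWalker *Walker = MSSA.getWalker();
      (void)Walker;

      return PreservedAnalyses::all();
    }
  };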
The walker
A structure that helps MemorySSA do its job is the MemorySSAWalker, or the
walker, for short. The goal of the walker is to provide answers to clobber
queries beyond what's represented directly by MemoryAccesses. For example,
given:
  define void @foo() {
    %a = alloca i8
    %b = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %a
    ; 2 = MemoryDef(1)
    store i8 0, i8* %b
  }
The store to %a is clearly not a clobber for the store to %b. It would be
the walker's goal to figure this out, and return liveOnEntry when queried
for the clobber of MemoryAccess 2.
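
A minimal sketch of that query in C++, assuming MSSA was built for the
function above and SecondStore points at the store to %b:

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;

  // Sketch: SecondStore is assumed to be the `store i8 0, i8* %b` above.
  void queryClobber(MemorySSA &MSSA, Instruction *SecondStore) {
    MemoryAccess *Clobber =
        MSSA.getWalker()->getClobberingMemoryAccess(SecondStore);

    // Because %a and %b can't alias, a good walker skips past
    // "1 = MemoryDef(liveOnEntry)" and reports liveOnEntry itself.
    if (MSSA.isLiveOnEntryDef(Clobber))
      errs() << "nothing in this function clobbers the store to %b\n";
  }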
By default, MemorySSA provides a walker that can optimize MemoryDefs and
MemoryUses by consulting whatever alias analysis stack you happen to be
using. Walkers were built to be flexible, though, so it's entirely
reasonable (and expected) to create more specialized walkers (e.g. one
that specifically queries GlobalsAA, one that always stops at MemoryPhi
nodes, etc).
Locating clobbers yourself
If you choose to make your own walker, you can find the clobber for a
MemoryAccess by walking every MemoryDef that dominates said MemoryAccess.
The structure of MemoryDefs makes this relatively simple; they ultimately
form a linked list of every clobber that dominates the MemoryAccess that
you're trying to optimize. In other words, the definingAccess of a
MemoryDef is always the nearest dominating MemoryDef or MemoryPhi of said
MemoryDef.
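
A sketch of such a walk is below. It stops at the first MemoryPhi or at
liveOnEntry; a real walker would additionally consult alias analysis at
each step, and would need a policy for what to do at MemoryPhis (the
function name is made up):

  #include "llvm/Analysis/MemorySSA.h"

  using namespace llvm;

  // Sketch: walk the chain of dominating defs above a given access.
  MemoryAccess *walkUpDefChain(MemorySSA &MSSA, MemoryUseOrDef *Start) {
    MemoryAccess *Current = Start->getDefiningAccess();
    while (auto *Def = dyn_cast<MemoryDef>(Current)) {
      if (MSSA.isLiveOnEntryDef(Def))
        break; // Top of the function; there is nothing above this.
      // A real walker would ask alias analysis here whether Def actually
      // clobbers the location Start cares about before moving on.
      Current = Def->getDefiningAccess();
    }
    return Current; // A MemoryDef, a MemoryPhi, or liveOnEntry.
  }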
Build-time use optimization
MemorySSA will optimize some MemoryAccesses at build-time. Specifically,
we optimize the operand of every MemoryUse to point to the actual clobber
of said MemoryUse. This can be seen in the above example; the second
MemoryUse in if.end has an operand of 1, which is a MemoryDef from the
entry block. This is done to make walking, value numbering, etc, faster
and easier.
It is not possible to optimize MemoryDefs in the same way, as we restrict
MemorySSA to one heap variable and, thus, one Phi node per block.
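
As a rough sketch of what the use optimization buys you (assuming an
already-built MemorySSA named MSSA and a non-atomic, non-volatile load in
the function; the helper name is made up), the clobber of a load can be
read straight off its MemoryUse without any walking:

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch: for an unordered, non-volatile load (which is a MemoryUse),
  // the defining access is already the actual clobber, modulo IR changes
  // made after MemorySSA was built.
  MemoryAccess *clobberOfLoad(MemorySSA &MSSA, LoadInst *Load) {
    auto *Use = cast<MemoryUse>(MSSA.getMemoryAccess(Load));
    return Use->getDefiningAccess();
  }

For a MemoryDef, by contrast, the defining access is only the nearest
dominating MemoryDef or MemoryPhi, as described under "Locating clobbers
yourself" above.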
Invalidation and updating
Because MemorySSA keeps track of LLVM IR, it needs to be updated whenever
the IR is updated. “Update”, in this case, includes the addition,
deletion, and motion of Instructions. The update API is being made on an
as-needed basis. If you’d like examples, GVNHoist is a user of MemorySSA’s
update API.
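
The details are best taken from MemorySSAUpdater.h and from existing users
such as GVNHoist. As a minimal hedged sketch, a pass deleting an
instruction might keep MemorySSA in sync like this (the function name is
made up):

  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/Analysis/MemorySSAUpdater.h"

  using namespace llvm;

  // Sketch: remove the corresponding MemoryAccess (if the instruction has
  // one) before deleting the IR it refers to.
  void eraseWithUpdate(MemorySSA &MSSA, Instruction *I) {
    MemorySSAUpdater Updater(&MSSA);
    if (MemoryAccess *MA = MSSA.getMemoryAccess(I))
      Updater.removeMemoryAccess(MA);
    I->eraseFromParent();
  }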
Phi placement
MemorySSA only places MemoryPhis where they're actually needed. That is,
it is a pruned SSA form, like LLVM's SSA form. For example, consider:
  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 3 = MemoryPhi({entry,1},{if.end,2})
    br i1 undef, label %if.then, label %if.else

  if.then:
    br label %if.end

  if.else:
    br label %if.end

  if.end:
    ; MemoryUse(1)
    %1 = load i8, i8* %p1
    ; 2 = MemoryDef(3)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }
Because we removed the stores from if.then and if.else, a MemoryPhi for
if.end would be pointless, so we don't place one. So, if you need to place
a MemoryDef in if.then or if.else, you'll need to also create a MemoryPhi
for if.end.
If it turns out that this is a large burden, we can just place MemoryPhis
everywhere. Because we have Walkers that are capable of optimizing above
said phis, doing so shouldn't prohibit optimizations.
Non-Goals
MemorySSA is meant to reason about the relation between memory operations,
and enable quicker querying. It isn't meant to be the single source of
truth for all potential memory-related optimizations. Specifically, care
must be taken when trying to use MemorySSA to reason about atomic or
volatile operations, as in:
  define i8 @foo(i8* %a) {
  entry:
    br i1 undef, label %if.then, label %if.end

  if.then:
    ; 1 = MemoryDef(liveOnEntry)
    %0 = load volatile i8, i8* %a
    br label %if.end

  if.end:
    %av = phi i8 [0, %entry], [%0, %if.then]
    ret i8 %av
  }
Going solely by MemorySSA's analysis, hoisting the load to entry may seem
legal. Because it's a volatile load, though, it's not.
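
Put differently, MemorySSA alone can't prove the hoist legal; the pass
also has to look at the instruction itself. Below is a sketch of the
instruction-side check relevant to this example (the helper name is made
up, and real hoisting has further requirements, e.g. dereferenceability of
the pointer):

  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch: reject volatile loads and loads with atomic ordering stronger
  // than unordered; isUnordered() is false for both.
  bool loadAllowsSpeculation(const LoadInst *LI) {
    return LI->isUnordered();
  }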
Design tradeoffs
Precision
MemorySSA in LLVM deliberately trades off precision for speed. Let us
think about memory variables as if they were disjoint partitions of the
heap (that is, if you have one variable, as above, it represents the
entire heap, and if you have multiple variables, each one represents some
disjoint portion of the heap).
First, because alias analysis results conflict with each other, and each
result may be what an analysis wants (IE TBAA may say no-alias, and
something else may say must-alias), it is not possible to partition the
heap the way every optimization wants. Second, some alias analysis results
are not transitive (IE A noalias B, and B noalias C, does not mean
A noalias C), so it is not possible to come up with a precise partitioning
in all cases without variables to represent every pair of possible
aliases. Thus, partitioning precisely may require introducing at least N^2
new virtual variables, phi nodes, etc.
Each of these variables may be clobbered at multiple def sites.
To give an example, if you were to split up struct fields into individual
variables, all aliasing operations that may-def multiple struct fields
will may-def more than one of them. This is pretty common (calls, copies,
field stores, etc).
Experience with SSA forms for memory in other compilers has shown that it
is simply not possible to do this precisely, and in fact, doing it
precisely is not worth it, because now all the optimizations have to walk
tons and tons of virtual variables and phi nodes.
So we partition. At the point at which you partition, again, experience
has shown us there is no point in partitioning to more than one variable.
It simply generates more IR, and optimizations still have to query
something to disambiguate further anyway.
As a result, LLVM partitions to one variable.
Use Optimization
Unlike other partitioned forms, LLVM's MemorySSA does make one useful
guarantee - all loads are optimized to point at the thing that actually
clobbers them. This gives some nice properties. For example, for a given
store, you can find all loads actually clobbered by that store by walking
the immediate uses of that store's MemoryDef.
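
A sketch of that walk, assuming StoreDef is the MemoryDef for the store in
question (the helper name is made up); because uses are optimized, every
MemoryUse reached this way corresponds to a load whose loaded value may
genuinely be affected by the store:

  #include "llvm/ADT/SmallVector.h"
  #include "llvm/Analysis/MemorySSA.h"
  #include "llvm/IR/Instructions.h"

  using namespace llvm;

  // Sketch: collect the loads whose actual clobber is StoreDef. MemoryDefs
  // and MemoryPhis also appear in the use list, so only MemoryUses are
  // kept here.
  SmallVector<LoadInst *, 8> loadsClobberedBy(MemoryDef *StoreDef) {
    SmallVector<LoadInst *, 8> Loads;
    for (User *U : StoreDef->users())
      if (auto *MU = dyn_cast<MemoryUse>(U))
        if (auto *LI = dyn_cast<LoadInst>(MU->getMemoryInst()))
          Loads.push_back(LI);
    return Loads;
  }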