Alias analysis proceeds in 4 main phases:
This phase walks the types for structure variables, and determines which of the fields can overlap using offset and size of each field. For each field, a “subvariable” called a “Structure field tag” (SFT) is created, which represents that field as a separate variable. All accesses that could possibly overlap with a given field will have virtual operands for the SFT of that field.
struct foo { int a; int b; } struct foo temp; int bar (void) { int tmp1, tmp2, tmp3; SFT.0_2 = VDEF <SFT.0_1> temp.a = 5; SFT.1_4 = VDEF <SFT.1_3> temp.b = 6; VUSE <SFT.1_4> tmp1_5 = temp.b; VUSE <SFT.0_2> tmp2_6 = temp.a; tmp3_7 = tmp1_5 + tmp2_6; return tmp3_7; }
If you copy the symbol tag for a variable for some reason, you probably also want to copy the subvariables for that variable.
This phase walks the use-def chains in the SSA web looking for three things:
P_i = &VAR
The concept of `escaping' is the same one used in the Java world. When a pointer or an ADDR_EXPR escapes, it means that it has been exposed outside of the current function. So, assignment to global variables, function arguments and returning a pointer are all escape sites.
This is where we are currently limited. Since not everything is renamed into SSA, we lose track of escape properties when a pointer is stashed inside a field in a structure, for instance. In those cases, we are assuming that the pointer does escape.
We use escape analysis to determine whether a variable is call-clobbered. Simply put, if an ADDR_EXPR escapes, then the variable is call-clobbered. If a pointer P_i escapes, then all the variables pointed-to by P_i (and its memory tag) also escape.
We have two classes of memory tags. Memory tags associated with the pointed-to data type of the pointers in the program. These tags are called “symbol memory tag” (SMT). The other class are those associated with SSA_NAMEs, called “name memory tag” (NMT). The basic idea is that when adding operands for an INDIRECT_REF *P_i, we will first check whether P_i has a name tag, if it does we use it, because that will have more precise aliasing information. Otherwise, we use the standard symbol tag.
In this phase, we go through all the pointers we found in points-to analysis and create alias sets for the name memory tags associated with each pointer P_i. If P_i escapes, we mark call-clobbered the variables it points to and its tag.
This pass will compare the alias set of every symbol memory tag and every addressable variable found in the program. Given a symbol memory tag SMT and an addressable variable V. If the alias sets of SMT and V conflict (as computed by may_alias_p), then V is marked as an alias tag and added to the alias set of SMT.
Every language that wishes to perform language-specific alias analysis
should define a function that computes, given a tree
node, an alias set for the node. Nodes in different alias sets are not
allowed to alias. For an example, see the C front-end function
c_get_alias_set
.
For instance, consider the following function:
foo (int i) { int *p, *q, a, b; if (i > 10) p = &a; else q = &b; *p = 3; *q = 5; a = b + 2; return *p; }
After aliasing analysis has finished, the symbol memory tag for
pointer p
will have two aliases, namely variables a
and
b
.
Every time pointer p
is dereferenced, we want to mark the
operation as a potential reference to a
and b
.
foo (int i) { int *p, a, b; if (i_2 > 10) p_4 = &a; else p_6 = &b; # p_1 = PHI <p_4(1), p_6(2)>; # a_7 = VDEF <a_3>; # b_8 = VDEF <b_5>; *p_1 = 3; # a_9 = VDEF <a_7> # VUSE <b_8> a_9 = b_8 + 2; # VUSE <a_9>; # VUSE <b_8>; return *p_1; }
In certain cases, the list of may aliases for a pointer may grow too large. This may cause an explosion in the number of virtual operands inserted in the code. Resulting in increased memory consumption and compilation time.
When the number of virtual operands needed to represent aliased loads and stores grows too large (configurable with --param max-aliased-vops), alias sets are grouped to avoid severe compile-time slow downs and memory consumption. The alias grouping heuristic proceeds as follows:
may-aliases(T) = { V1, V2, V3, ..., Vn }
This means that every statement that references T, will get
n
virtual operands for each of the Vi tags. But, when
alias grouping is enabled, we make T an alias tag and add it
to the alias set of all the Vi variables:
may-aliases(V1) = { T } may-aliases(V2) = { T } ... may-aliases(Vn) = { T }
This has two effects: (a) statements referencing T will only get a single virtual operand, and, (b) all the variables Vi will now appear to alias each other. So, we lose alias precision to improve compile time. But, in theory, a program with such a high level of aliasing should not be very optimizable in the first place.
may-aliases(T) = { V1, V2, V3 } may-aliases(R) = { V2, V4 }
In step (2) we would have reverted the aliases for T as:
may-aliases(V1) = { T } may-aliases(V2) = { T } may-aliases(V3) = { T }
But note that now V2 is no longer aliased with R. We could add R to may-aliases(V2), but we are in the process of grouping aliases to reduce virtual operands so what we do is add V4 to the grouping to obtain:
may-aliases(V1) = { T } may-aliases(V2) = { T } may-aliases(V3) = { T } may-aliases(V4) = { T }