<!DOCTYPE html>
<html>
<head>
<title>Dynamic Initializers</title>
<link rel="stylesheet" type="text/css" href="styles.css?v1">
</head>
<body>
<h1 style="border-bottom: 0;">SizeBench - a binary analysis tool for Windows</h1>
<h2>Dynamic Initializers</h2>
<p>
Or, why 'const' may not be as const as you thought.
</p>
<p>
Dynamic initializers are an interesting thing in binaries because they not only contribute code and data to the binary, but that code
also runs eagerly whenever your binary is loaded into a process. Thus, they create overhead even if the objects they initialize are never used.
</p>
<h3>What is a Dynamic Initializer?</h3>
<p>
Before I explain what a dynamic initializer is, let's look at an example type and how its constructor is affected by const members of
non-POD <a href="https://en.wikipedia.org/wiki/Passive_data_structure">(Plain Old Data)</a> types. So let's start with this type:
</p>
<pre>
#include &lt;string&gt;

class SomeType {
    const std::wstring str1{L"str1"};
    const std::wstring str2{L"str2"};
    // ... many more of these ...
};
</pre>
<p>
If you're familiar with modern C++, this looks pretty reasonable. You have some strings, they're constant, and unlike C-style strings,
which are just raw pointers, they carry their length with them. This seems good! But 'const' here isn't as constant as you may have
intended. The constness is enforced by the C++ language, which prevents you from writing to these variables at compile time, but
nothing enforces it at the operating system level. These strings still have buffers that are in read/write memory from the OS's
perspective, and you could const_cast them and mutate them (don't do that: it's undefined behavior, though you might get away with it).
</p>
<p>
This is because a constant of a Plain Old Data (POD) type, like an integer or floating point value, can be stored in the binary
and in memory as a set of bytes in read-only memory. A non-POD type can't be stored that way because it needs to have its
constructor run, and in many cases that code won't run at compile time (constexpr can help, but only so much). In this case,
<tt>std::wstring</tt> is not a POD type because it has a nontrivial constructor, which in turn means that <tt>SomeType</tt> has a
nontrivial constructor that calls the <tt>str1</tt> and <tt>str2</tt> constructors. Thus, <tt>SomeType</tt> is also not a POD type.
</p>
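<p>
The chain of reasoning above can be checked by the compiler itself. The sketch below assumes C++17 and uses the standard type
traits, which track trivial construction and destruction: the part of POD-ness that decides whether a global of this type needs
code to run when the module loads.
</p>

```cpp
#include <string>
#include <type_traits>

class SomeType {
    const std::wstring str1{L"str1"};
    const std::wstring str2{L"str2"};
};

// std::wstring runs real code in its constructor and destructor...
static_assert(!std::is_trivially_default_constructible_v<std::wstring>,
              "std::wstring has a nontrivial constructor");
static_assert(!std::is_trivially_destructible_v<std::wstring>,
              "std::wstring has a nontrivial destructor");

// ...so SomeType, which holds std::wstring members, inherits both properties.
static_assert(!std::is_trivially_default_constructible_v<SomeType>,
              "SomeType must call the std::wstring constructors");
static_assert(!std::is_trivially_destructible_v<SomeType>,
              "SomeType must call the std::wstring destructors");
static_assert(!std::is_trivial_v<SomeType>, "so SomeType is not a POD type");
```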
<p>
So what can you do about this? You want constant wide strings in C++, but you don't want to run constructors at runtime to get them.
One option is to translate the code to something like this:
</p>
<pre>
class SomeType {
    static constexpr const wchar_t* str1 = L"str1";
    static constexpr const wchar_t* str2 = L"str2";
    // ... many more of these ...
};
</pre>
<p>
Or you could use <tt>constexpr std::wstring_view</tt> instead, if you know you'll need the lengths often and want to avoid repeatedly
running wcslen. With the code above, these are now old-school POD types (<tt>const wchar_t*</tt>), and because they're static
members, the <tt>SomeType</tt> constructor does not need to call constructors on them. This in turn can give <tt>SomeType</tt> a
trivial constructor and make it a POD type itself. In Microsoft Office we found a case like this with a lot of <tt>std::wstring</tt>
instances, and changed it as shown above. This took the <tt>SomeType</tt> constructor from 50KB of code to not existing at all. And
as a bonus, the strings are now static, so each instance of <tt>SomeType</tt> no longer has its own copies of them, and memory
consumption goes down if you have more than one of these objects around.
</p>
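<p>
Here is a sketch of the <tt>constexpr std::wstring_view</tt> variant mentioned above, assuming C++17, where <tt>static constexpr</tt>
data members are implicitly inline and need no out-of-class definition. Static data members don't affect whether a class is trivial,
so <tt>SomeType</tt> stays a POD type even though <tt>std::wstring_view</tt> itself has a user-provided default constructor. The
members are public here only so the checks can read them.
</p>

```cpp
#include <string_view>
#include <type_traits>

class SomeType {
public:
    // Static: one copy for the whole program instead of one per instance.
    static constexpr std::wstring_view str1 = L"str1";
    static constexpr std::wstring_view str2 = L"str2";
    // ... many more of these ...
};

// The length is computed at compile time, so no wcslen is needed at runtime.
static_assert(SomeType::str1.size() == 4);

// With only static members, SomeType is trivial and standard-layout, i.e. POD.
static_assert(std::is_trivial_v<SomeType> && std::is_standard_layout_v<SomeType>);
```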
<p>
If you do decide to change a type to be POD, the compiler can help verify that it really is POD and that it doesn't regress in
the future, courtesy of <tt>static_assert</tt>. The following statement would verify the previous example:
</p>
<pre>
#include &lt;type_traits&gt;
static_assert(std::is_pod&lt;SomeType&gt;::value, "SomeType should be a POD type");
</pre>
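<p>
Note that <tt>std::is_pod</tt> is deprecated as of C++20. It is defined as "trivial and standard-layout", so one way to keep the same
check working on newer compilers is to spell that condition out. The helper name <tt>is_pod_like_v</tt> and the two example types below
are hypothetical, for illustration only; the sketch assumes C++17.
</p>

```cpp
#include <string>
#include <type_traits>

// Hypothetical helper: std::is_pod<T> means "trivial and standard-layout",
// so this spells out the same condition without the deprecated trait.
template <typename T>
inline constexpr bool is_pod_like_v =
    std::is_trivial_v<T> && std::is_standard_layout_v<T>;

// Hypothetical example types.
struct PodType {
    int value;
    const char* name;
};

struct NonPodType {
    int value;
    std::string name;  // nontrivial constructor and destructor
};

static_assert(is_pod_like_v<PodType>, "PodType should be a POD type");
static_assert(!is_pod_like_v<NonPodType>, "NonPodType is not a POD type");
```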
<p>
So that's the basic idea behind why POD types are so efficient. But how does this relate to Dynamic Initializers?<br />
<br />
Well, imagine that an instance of <tt>SomeType</tt> were declared in the global scope. That means someone, somewhere needs
to generate code to run that <tt>SomeType</tt> constructor upon module load. The C(++) Run Time (CRT) runs that code before
your <tt>[Dll]Main</tt> executes, so that you can immediately use your global object as you'd expect. Thus, to continue
with the example from Office, all 50KB of code in the <tt>SomeType</tt> constructor needs to run just for this
DLL to be loaded into memory, even if no one ever actually touches that global instance of <tt>SomeType</tt>.<br />
<br />
Worse, if some of those strings are long enough to exceed the small-string optimization buffer, the <tt>std::wstring</tt>
constructors will allocate memory on the heap, which is substantially slower. And all these dynamic initializers run synchronously,
so they can get in the way of critical paths like launching your app.
</p>
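<p>
A small way to watch this happen: the hypothetical counter below records when the constructor runs. The type is a cut-down stand-in
for <tt>SomeType</tt>, not the Office code, and the sketch assumes C++17. Nothing in the program has to touch <tt>g_instance</tt>;
its dynamic initializer has already run by the time <tt>main</tt> is entered.
</p>

```cpp
#include <cassert>  // for the assert() in the usage check
#include <string>

// Hypothetical counter, only here to observe when the constructor runs.
// It is initialized statically (no code runs for it), so it is ready
// before any dynamic initializer executes.
int g_someTypeConstructions = 0;

// A cut-down stand-in for SomeType from above.
struct SomeType {
    std::wstring str1{L"str1"};
    SomeType() { ++g_someTypeConstructions; }
};

// SomeType has a nontrivial constructor, so the compiler emits a dynamic
// initializer for this global. The runtime calls it while the module loads,
// before main (or DllMain) runs, even if nothing ever reads g_instance.
SomeType g_instance;
```

<p>
For example, <tt>assert(g_someTypeConstructions == 1)</tt> already holds as the very first statement of <tt>main</tt>.
</p>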
<h3>A worse example, with a <tt>std::map</tt></h3>
<p>
Let's see another example of this. Imagine you have this code:
</p>
<pre>
#include &lt;map&gt;
#include &lt;string&gt;

// In the global namespace
const std::map&lt;int, std::string&gt; g_enumsToNames = {
    { 0, "invalid" },
    { 1, "whoops" },
    { 2, "failure" }
};
</pre>
<p>
This is a pattern I've seen in many codebases: mapping enumeration values to friendly string names for things like logging, error
messages for users, and so on.<br />
<br />
<tt>std::map</tt> has a nontrivial constructor, so it is not a POD type, and because this is in the global scope it will generate
a dynamic initializer. That initializer calls the constructor of <tt>std::map</tt> and the constructor of <tt>std::string</tt> (3 times),
allocates memory on the heap for the map's tree nodes, and more. All of that requires code generated by the compiler, and, as
mentioned above, that code executes synchronously before <tt>[Dll]Main</tt>, adding to the time it takes to load the module.
</p>
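<p>
One way to observe those heap allocations is to replace the global <tt>operator new</tt> with a counting version, which the C++
standard allows a program to do. The counter and the replacement functions below are hypothetical instrumentation for experimenting,
not something you'd ship; the map is the one from the example above, and the sketch assumes C++17.
</p>

```cpp
#include <cassert>  // for the assert() in the usage check
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <string>

// Hypothetical instrumentation: counts calls to the global operator new.
// Initialized statically, so it is ready before any dynamic initializer runs.
std::size_t g_heapAllocations = 0;

void* operator new(std::size_t size) {
    ++g_heapAllocations;
    if (void* p = std::malloc(size == 0 ? 1 : size)) {
        return p;
    }
    throw std::bad_alloc();
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// The example from above. Its dynamic initializer builds the map's tree,
// allocating at least one node per entry before main/DllMain runs.
const std::map<int, std::string> g_enumsToNames = {
    { 0, "invalid" },
    { 1, "whoops" },
    { 2, "failure" }
};
```

<p>
By the time <tt>main</tt> starts, <tt>g_heapAllocations</tt> is already at least 3 (one tree node per entry), even though nothing has
looked anything up yet. Other static initializers in the program may add to that count.
</p>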
<p>
This is even worse because when you want to look up a value in this map, the lookup chases pointers from node to node through the
map's internal tree, and those scattered memory accesses can quickly blow out your CPU's data cache.
</p>
<p>
Instead of the above, you could have code like this:
</p>
<pre>
struct EnumToNameMapEntry {
int enumValue;
const char* name;
};
constexpr EnumToNameMapEntry g_enumsToNames[] = {
{ 0, "invalid" },
{ 1, "whoops" },
{ 2, "failure" }
};
</pre>
<p>
In this case, <tt>EnumToNameMapEntry</tt> is a POD type because it stores only POD types as members and has no nontrivial constructor
defined. The <tt>g_enumsToNames</tt> array is then also a POD because it's just an array of PODs, so no dynamic initializer is
generated at all. A dynamic initializer for an object with a nontrivial destructor, like the <tt>std::map</tt> above, is also paired
with an 'atexit destructor' in the <tt>.text$yd</tt> <a href="coff-group.html">COFF Group</a> that can run at shutdown, and that
destructor is similarly gone after making this change. The data in this map is also now in <tt>.rdata</tt> read-only data pages and
shareable between processes, if this binary is loaded into multiple processes. Lots of sweet things have happened with this transformation.
</p>
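<p>
These properties can be checked at compile time. The sketch below, assuming C++17, repeats the table from above. Being able to read
an element inside a <tt>static_assert</tt> shows that the array is fully initialized at compile time, so there is nothing left for a
dynamic initializer to do.
</p>

```cpp
#include <type_traits>

struct EnumToNameMapEntry {
    int enumValue;
    const char* name;
};

constexpr EnumToNameMapEntry g_enumsToNames[] = {
    { 0, "invalid" },
    { 1, "whoops" },
    { 2, "failure" }
};

// The entry type is trivial and standard-layout (POD), and so is the array.
static_assert(std::is_trivial_v<EnumToNameMapEntry> &&
              std::is_standard_layout_v<EnumToNameMapEntry>);
static_assert(std::is_trivial_v<decltype(g_enumsToNames)>);

// Reading the table in a constant expression only works because it is
// initialized entirely at compile time.
static_assert(g_enumsToNames[2].enumValue == 2);
```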
<p>
But wait, there's more! This array is very small, and its entries are very small: each instance of <tt>EnumToNameMapEntry</tt> is
8 bytes when compiled for 32-bit, or 16 bytes for 64-bit (where the 4-byte <tt>int</tt> is padded so the pointer stays 8-byte aligned).
So this array of 3 entries consumes 24 or 48 bytes in total, small enough to fit in a single typical 64-byte CPU cache line. Thus,
walking linearly down this array to find the right value, instead of using <tt>std::map</tt>'s fancier tree search, is likely to be much
faster. All the data will stay in the CPU's fastest data cache, and linearly scanning an array is something that modern CPUs are
incredibly well tuned for.
</p>
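<p>
A lookup over this table can be a plain linear scan. The helper <tt>NameForEnum</tt> and its <tt>"unknown"</tt> fallback below are
hypothetical, and the <tt>sizeof</tt> checks assume a common ABI where <tt>int</tt> is 4 bytes, pointers are 4 or 8 bytes, and cache
lines are 64 bytes. The sketch assumes C++17.
</p>

```cpp
#include <string_view>

struct EnumToNameMapEntry {
    int enumValue;
    const char* name;
};

constexpr EnumToNameMapEntry g_enumsToNames[] = {
    { 0, "invalid" },
    { 1, "whoops" },
    { 2, "failure" }
};

// Assumes a 4-byte int and 4- or 8-byte pointers: an entry is
// 4 + 4 = 8 bytes on 32-bit, or 4 + 4 (padding) + 8 = 16 bytes on 64-bit.
static_assert(sizeof(EnumToNameMapEntry) == 2 * sizeof(void*));
// The whole table is 24 or 48 bytes, no bigger than a typical 64-byte cache line.
static_assert(sizeof(g_enumsToNames) <= 64);

// Hypothetical lookup helper: a plain linear scan over the small, contiguous array.
// Returns `fallback` when the value is not in the table.
constexpr std::string_view NameForEnum(int value,
                                       std::string_view fallback = "unknown") {
    for (const EnumToNameMapEntry& entry : g_enumsToNames) {
        if (entry.enumValue == value) {
            return entry.name;
        }
    }
    return fallback;
}
```

<p>
Because <tt>NameForEnum</tt> is <tt>constexpr</tt>, a call with a constant argument, like <tt>NameForEnum(1)</tt>, can be evaluated
entirely at compile time, while calls with runtime values run the same short loop over data that sits in a single cache line.
</p>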
<p>
So you get smaller data that is shareable between processes, less code to execute on module load, and faster lookup times at runtime
too! Pretty awesome. This is an anonymized real example that we fixed in the Microsoft Dynamics AX codebase.
</p>
<h3>Example in SizeBench UI</h3>
<img class="screenshot" src="Images/DynamicInitializers_Overview.png" width="800" />
<p>
To look for dynamic initializers in your binary, start by opening it up in SizeBench. Go to the 'Binary Sections' view and click on the
<tt>.text</tt> <a href="binary-section.html">binary section</a>. That's where all the code lives, and dynamic initializers are just
code generated by the compiler for you. Then, within <tt>.text</tt> look for the <a href="coff-group.html">COFF Group</a> called
<tt>.text$di</tt> (di stands for "dynamic initializers" so at least it's somewhat memorable).
</p>
<p>
The example to the right shows what that looks like for the OpenConsole.exe binary from
<a href="https://github.com/Microsoft/Terminal">Windows Terminal</a>, as of commit <tt>10222a2b</tt>. At the bottom of the
screen you can see a list of all the symbols in that <a href="coff-group.html">COFF Group</a>. The last one visible in that list is a good
example. It's called <tt>`dynamic initializer for 'aliasesSeparator''</tt> and it is, as the name implies, related to the variable
named <tt>aliasesSeparator</tt>. This dynamic initializer is 45 bytes of code, again executed synchronously on module load.
</p>
<br style="clear:right" />
<br />
<br />
<img class="screenshot" src="Images/DynamicInitializers_aliasesSeparator.png" width=800 />
<p>
To the right is a screenshot of what SizeBench displays when you click on this dynamic initializer symbol. It shows which section
and COFF Group the symbol is contained in (<tt>.text$di</tt> in this case, as expected for a dynamic initializer), which library,
<a href="compiland.html">compiland</a> and source file it came from (alias.cpp in alias.obj in ConhostV2Lib.lib), the length of the
function, and so on. If you then look up alias.cpp from this commit on GitHub, you'll see it's
<a href="https://github.com/microsoft/terminal/blob/d09fdd61cbb11b7ef2ccdd4820349ffe898ad583/src/host/alias.cpp#L297">declared like this</a>:
</p>
<pre>
static std::wstring aliasesSeparator(L"=");
</pre>
<p>
This is very much like the example with <tt>std::map</tt> from Dynamics AX above. It's a non-POD type (<tt>std::wstring</tt>), so
the initializer runs the <tt>std::wstring</tt> constructor, which copies the "=" constant string from the read-only pages of the
binary into the string's own storage. If this code were changed to use <tt>const std::wstring_view</tt>, <tt>const wchar_t*</tt>, or
some other POD way of encoding the string, this dynamic initializer would disappear, as would its corresponding atexit destructor in
<tt>.text$yd</tt>.
</p>
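<p>
As a sketch, assuming C++17 and that the code in alias.cpp only ever reads <tt>aliasesSeparator</tt> (not verified here), the
declaration could become a <tt>constexpr std::wstring_view</tt>. <tt>JoinAlias</tt> is a hypothetical caller, showing that code which
really needs an owning <tt>std::wstring</tt> can still build one on demand rather than keeping a global one alive.
</p>

```cpp
#include <cassert>  // for the assert() in the usage check
#include <string>
#include <string_view>

// Original declaration from alias.cpp, which needs a dynamic initializer to
// construct it and an atexit destructor to tear it down:
//   static std::wstring aliasesSeparator(L"=");

// POD-friendly alternative: a view over the literal that already lives in the
// binary's read-only data. constexpr guarantees compile-time initialization.
static constexpr std::wstring_view aliasesSeparator{L"="};

// Hypothetical caller: builds an owning string only when one is needed.
std::wstring JoinAlias(std::wstring_view name, std::wstring_view target) {
    std::wstring result(name);
    result += aliasesSeparator;
    result += target;
    return result;
}
```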
<br style="clear:right" />
</body>
</html>