return address integrity should mitigate against
overwriting the return address after the stack contents
have been leaked (the leaked contents can include both
stack and code pointers). Defenses may still be
probabilistic, which means that mitigation is not fully
guaranteed.
Replay attacks require observing a large part of the
execution path in order to inject old return addresses
in a different context where they cannot be
distinguished from valid addresses. If a solution does
not prevent this, it is a drawback, but the solution is
still a huge improvement over the current state if this
is the only vulnerable point.
Because we value compatibility, we disregard the time
window between the call instruction and the end of the
function prolog, as well as the window from the start of
the function epilog until the return instruction is
executed. Reads and writes during these windows are
outside our threat model.
Defenses against the described leaks have to be more
involved than a simple XOR, and we therefore expect them
to have a bigger impact on the runtime. Shadow stacks
are easy solutions that fulfill the criteria, but they
are known to be slow. On the other hand, this means that
any cryptographic approach should be faster than shadow
stacks. Otherwise it makes no sense to use it, because
its properties are likely to be weaker than those of
shadow stacks.
4. PROJECT PLAN AND RESULT
First we intended to accompany SafeStack with a
RETGUARD-like XOR of the return address, but using a
secret value instead of the RSP. As we realized that
this is vulnerable to the described known-plaintext
attack, we studied stronger cryptographic primitives for
signatures, encryption and HMACs, and also looked at how
CCFI and Qualcomm Pointer Authentication approached
this.
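The known-plaintext weakness of a plain XOR with a
secret can be illustrated with a short sketch. It only
models the arithmetic on 64-bit integers. The function
names and the example addresses are hypothetical and are
not taken from any of the prototypes.

```python
MASK64 = (1 << 64) - 1  # return addresses are modeled as 64-bit integers


def xor_protect(ret_addr, secret):
    """Model of the XOR scheme: the stack slot holds ret_addr ^ secret.

    Applying the function a second time with the same secret
    restores the original return address.
    """
    return (ret_addr ^ secret) & MASK64


def recover_secret(leaked_slot, known_ret_addr):
    """Known-plaintext attack: a leaked slot XORed with the matching
    return address (known from the binary) reveals the secret."""
    return (leaked_slot ^ known_ret_addr) & MASK64


def forge(target_addr, recovered_secret):
    """Build a slot value that decodes to an attacker-chosen address."""
    return xor_protect(target_addr, recovered_secret)
```

A single leaked stack slot is enough in this model,
which is why a scheme that is only a XOR with a fixed
secret does not withstand the leaks described above.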
The implementation also posed unexpected challenges
because a separate LLVM pass does not allow the
replacement of machine instructions. Like RETGUARD, we
would have to modify LLVM with a new machine target
pass. However, there is little documentation available,
and the compile times allowed only four tries per hour.
Therefore we decided to go with assembly instrumentation
for the limited scope of this project. This also allowed
us to produce more prototypes with different techniques
for return address protection. Each prototype is a pass
which operates on the assembly artifacts of Clang.
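To make the idea of a pass over assembly artifacts more
concrete, the following is a minimal sketch of such a
text transformation. It assumes AT&T-syntax output in
which functions are announced by a `.type name,@function`
directive before their label. The function `instrument`
and its parameters are hypothetical and do not describe
the interface of the actual prototypes.

```python
import re

# Directive that announces a function symbol, e.g. ".type main,@function"
FUNC_TYPE = re.compile(r'^\s*\.type\s+([\w.$]+),\s*@function')
# A label at the start of a line, e.g. "main:"
LABEL = re.compile(r'^([\w.$]+):')
# A return instruction ("ret", "retq" or "retl")
RET = re.compile(r'^\s*ret[ql]?\b')


def instrument(asm_text, prolog, epilog):
    """Insert `prolog` lines after each function label and
    `epilog` lines before each return instruction."""
    functions = set()
    out = []
    for line in asm_text.splitlines():
        announced = FUNC_TYPE.match(line)
        if announced:
            functions.add(announced.group(1))
        label = LABEL.match(line)
        if label and label.group(1) in functions:
            out.append(line)
            out.extend(prolog)  # runs first when the function is entered
            continue
        if RET.match(line):
            out.extend(epilog)  # runs right before the return
        out.append(line)
    return "\n".join(out) + "\n"
```

A real pass has to handle more cases than this sketch,
for example functions that leave through a tail call
instead of a return instruction.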
The main building blocks for a compatible
instrumentation of the function prologs and epilogs are
a way to decide whether initialization code should be
executed and a secure information storage. Thread local
storage (TLS) variables, declared in C with the
__thread specifier, can be given an initial value,
which allows detecting whether a new thread was started.
As the safe storage place, the TLS is not appropriate
because it is writable in memory and is placed at the
beginning of each secondary thread.
Since the System V 64-bit ABI does not guarantee that
regular registers stay untouched, we considered
switching off AVX usage during compilation. This not
only impacts performance but also hurts compatibility
when linking. Luckily we found out that the old x87
floating point register stack (st0, st1, …) is not used
by GCC or Clang and cannot even be turned on with
-mfpmath=387. We therefore consider these registers
safe because only our instrumentation code will use
them. Binary ROP gadgets are irrelevant because they
require that the control flow hijack has already taken
place.
Next we will describe the assembly instrumentation
framework, a RAP-like simple XOR pass, a userspace
shadow stack pass, an in-kernel shadow stack pass and
finally a pass with HMAC-based cryptographic protection.
4.1 INSTRUMENTATION FRAMEWORK
We designed a compiler wrapper script that first
produces assembly output for instrumentation and then
continues with the final compilation to a linked binary.
Because it should behave like an invocation of the
regular compiler, some more tricks are involved, but we
hope to have most things covered. The instrumentation
passes are kept in separate files and are specified as
arguments to the wrapper. An extra layer of complexity
is that, to support shared objects, each pass needs to
switch between the modes of addressing the TLS
variables. Also, a pass should emit its global
definitions only once.
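The switch between TLS addressing modes can be sketched
as a small helper that a pass might use when it emits a
load of a TLS variable. Which TLS models the passes
actually use is not stated here. The sketch assumes the
local-exec model for executables and the initial-exec
model for shared objects, and the helper name `tls_load`
and the variable name in the usage note are hypothetical.

```python
def tls_load(var, reg, shared_object):
    """Return AT&T-syntax lines that load the 64-bit TLS variable
    `var` into the general purpose register `reg` (e.g. "rax")."""
    if shared_object:
        # Assumed initial-exec model: read the offset from the GOT,
        # then access the variable relative to the thread pointer.
        return [
            f"\tmovq\t{var}@gottpoff(%rip), %{reg}",
            f"\tmovq\t%fs:(%{reg}), %{reg}",
        ]
    # Assumed local-exec model: the offset is a link-time constant.
    return [f"\tmovq\t%fs:{var}@tpoff, %{reg}"]
```

For example, `tls_load("guard_key", "rax", False)` yields
a single `movq` relative to `%fs`, while the shared
object variant needs an extra load from the GOT. Shared
objects that are loaded at runtime may require a more
general TLS model than the one assumed here.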
# Set compiler for build scripts
# or manual invocation as $CC
export CC="$PWD/ccwrapper $PWD/pypass…"
For usage with a build system, exporting the CC
environment variable was enough in many cases, but some
build scripts might need adaptations. For gzip, the test
about how the compiler “reports undeclared, standard C
functions” needs to be patched away. Another instance is
the -O2 optimization flag, e.g. if the instrumentation
pass in use can only follow strict LIFO semantics of the
stack. Our framework may have other problems with
optimized code, too.
The requirements for a pass are that it can be
called with these arguments by the wrapper: