Topic: Robust and Efficient Elimination of Cache and Timing Side Channels
Review Guidelines:
1. One page, single space, margin 1 inch (top, bottom, left, right), font (times new roman, size 10)
Supporting documents are attached.
Robust and Efficient Elimination of Cache and Timing Side Channels
Benjamin A. Braun1, Suman Jana1, and Dan Boneh1
1Stanford University
Abstract—Timing and cache side channels provide powerful attacks against many sensitive operations including cryptographic implementations. Existing defenses cannot protect against all classes of such attacks without incurring prohibitive performance overhead. A popular strategy for defending against all classes of these attacks is to modify the implementation so that the timing and cache access patterns of every hardware instruction are independent of the secret inputs. However, this solution is architecture-specific, brittle, and difficult to get right. In this paper, we propose and evaluate a robust low-overhead technique for mitigating timing and cache channels. Our solution requires only minimal source code changes and works across multiple languages/platforms. We report the experimental results of applying our solution to protect several C, C++, and Java programs. Our results demonstrate that our solution successfully eliminates the timing and cache side-channel leaks while incurring significantly lower performance overhead than existing approaches.
I. INTRODUCTION
Defending against cache and timing side channel attacks is known to be a hard and important problem. Timing and cache attacks can be used to extract cryptographic secrets from running systems [14, 15, 23, 29, 35, 36, 40], spy on Web user activity [12], and even undo the privacy of differential privacy systems [5, 24]. Attacks exploiting timing side channels have been demonstrated for both remote and local adversaries. A remote attacker is separated from its target by a network [14, 15, 29, 36] while a local attacker can execute unprivileged spyware on the target machine [7, 9, 11, 36, 45, 47].
Most existing defenses against cache and timing attacks only protect against a subset of attacks and incur significant performance overheads. For example, one way to defend against remote timing attacks is to make sure that the timing of any externally observable event is independent of any data that should be kept secret. Several different strategies have been proposed to achieve this, including application-specific changes [10, 27, 30], static transformation [17, 20], and dynamic padding [6, 18, 24, 31, 47]. However, none of these strategies defend against local timing attacks where the attacker spies on the target application by measuring the target’s impact on the local cache and other resources. Similarly, the strategies for defending against local cache attacks, such as static partitioning of resources [28, 37, 43, 44], flushing state [50], obfuscating cache access patterns [9, 10, 13, 35, 40], and moderating access to fine-grained timers [33, 34, 42], also incur significant performance penalties while still leaving the target potentially vulnerable to timing attacks. We survey these methods in related work (Section VIII).
A popular approach for defending against both local and remote timing attacks is to ensure that the low-level instruction sequence does not contain instructions whose performance depends on secret information. This can be enforced by manually re-writing the code, as was done in OpenSSL1, or by changing the compiler to ensure that the generated code has this property [20].
Unfortunately, this popular strategy can fail to ensure security for several reasons. First, the timing properties of instructions may differ in subtle ways from one architecture to another (or even from one processor model to another), resulting in an instruction sequence that is unsafe for some architectures/processor models. Second, this strategy does not work for languages like Java where the Java Virtual Machine (JVM) optimizes the bytecode at runtime and may inadvertently introduce secret-dependent timing variations. Third, manually ensuring that a certain code transformation prevents timing attacks can be extremely difficult and tedious, as was the case when updating OpenSSL to prevent the Lucky Thirteen timing attack [32].
Our contribution. We propose the first low-overhead, application-independent, and cross-language defense that can protect against both local and remote timing attacks with minimal application code changes. We show that our defense is language-independent by applying the strategy to protect applications written in Java and C/C++. Our defense requires relatively simple modifications to the underlying OS and can run on off-the-shelf hardware.
We implement our approach in Linux and show that the execution times of protected functions are independent of secret data. We also demonstrate that the performance overhead of our defense is low. For example, the performance overhead to protect the entire state machine running inside an SSL/TLS server against all known timing- and cache-based side channel attacks is less than 5% in connection latency.
We summarize the key insights behind our solution (described in detail in Section IV) below.
• We leverage programmer code annotations to identify and protect sensitive code that operates on secret data. Our defense mechanism only protects the sensitive functions. This lets us minimize the performance impact of our scheme by leaving the performance of non-sensitive functions unchanged.
1In the case of RSA private key operations, OpenSSL uses an additional defense called blinding.
arXiv:1506.00189v2 [cs.CR] 31 Aug 2015
• We further minimize the performance overhead by separating and accurately accounting for secret-dependent and secret-independent timing variations. Secret-independent timing variations (e.g., the ones caused by interrupts, the OS scheduler, or non-secret execution flow) do not leak any sensitive information to the attacker and thus are treated differently than secret-dependent variations by our scheme.
• We demonstrate that existing OS services like schedulers and hardware features like memory hierarchies can be leveraged to create a lightweight isolation mechanism that can protect a sensitive function’s execution from other local untrusted processes and minimize timing variations during the function’s execution.
• We show that naive implementations of delay loops in most existing hardware leak timing information due to the underlying delay primitive’s (e.g., NOP instruction) limited accuracy. We create and evaluate a new scheme for implementing delay loops that prevents such leakage while still using existing coarse-grained delay primitives.
• We design and evaluate a lazy state cleansing mechanism that clears the sensitive state left in shared hardware resources (e.g., branch predictors, caches, etc.) before handing them over to an untrusted process. We find that lazy state cleansing incurs significantly less overhead than performing state cleaning as soon as a sensitive function finishes execution.
II. KNOWN TIMING ATTACKS
Before describing our proposed defense we briefly survey different types of timing attackers. In the previous section, we discussed the difference between a local and a remote timing attacker: a local timing attacker, in addition to monitoring the total computation time, can spy on the target application by monitoring the state of shared hardware resources such as the local cache.
Concurrent vs. non-concurrent attacks. In a concurrent attack, the attacker can probe shared resources while the target application is operating. For example, the attacker can measure timing information or inspect the state of the shared resources at intermediate steps of a sensitive operation. The attacker’s process can control the concurrent access by adjusting its scheduling parameters and its core affinity in the case of symmetric multiprocessing (SMP).
A non-concurrent attack is one in which the attacker only gets to observe the timing information or shared hardware state at the beginning and the end of the sensitive computation. For example, a non-concurrent attacker can extract secret information using only the aggregate time it takes the target application to process a request.
Local attacks. Concurrent local attacks are the most prevalent class of timing attacks in the research literature. Such attacks are known to be able to extract the secret/private key against a wide-range of ciphers including RSA [4, 36], AES [23, 35, 40, 46], and ElGamal [49]. These attacks exploit information leakage through a wide range of shared hardware resources: L1 or L2 data cache [23, 35, 36, 40], L3 cache [26, 46], instruction cache [1, 49], branch predictor cache [2, 3], and floating-point multiplier [4].
There are several known local non-concurrent attacks as well. Osvik et al. [35], Tromer et al. [40], and Bonneau and Mironov [11] present two types of local, non-concurrent attacks against AES implementations. In the first, prime and probe, the attacker “primes” the cache, triggers an AES encryption, and “probes” the cache to learn information about the AES private key. The spy process primes the cache by loading its own memory content into the cache and probes the cache by measuring the time to reload the memory content after the AES encryption has completed. In this attack, the attacker’s spy process measures its own timing information to indirectly extract information from the victim application. Alternatively, in the evict and time strategy, the attacker measures the time taken to perform the victim operation, evicts certain chosen cache lines, triggers the victim operation again, and measures its execution time. By comparing these two execution times, the attacker can find out which cache lines were accessed during the victim operation. Osvik et al. were able to extract a 128-bit AES key after only 8,000 encryptions using the prime and probe attack.
Remote attacks. All existing remote attacks [14, 15, 29, 36] are non-concurrent, though this is not fundamental. A hypothetical remote, yet concurrent, attack would be one in which the remote attacker submits requests to the victim application at the same time that another non-adversarial client sends requests containing sensitive information to the victim application. The attacker may then be able to measure timing information at intermediate steps of the non-adversarial client’s communication with the victim application and infer the sensitive content.
III. THREAT MODEL
We allow the attacker to be local or remote and to execute concurrently or non-concurrently with the target application. We assume that the attacker can only run spy processes as a different non-privileged user (i.e., no super-user privileges) than the owner of the target application. We also assume that the spy process cannot bypass the standard user-based isolation provided by the operating system. We believe that these are very realistic assumptions because if either of these assumptions fails, the spy process can steal the user’s sensitive information without resorting to side channel attacks in most existing operating systems.
In our model, the operating system and the underlying hardware are trusted. Similarly, we assume that the attacker does not have physical access to the hardware and cannot monitor side channels such as electromagnetic radiation, power use, or acoustic emanations. We are only concerned with timing and cache side channels since they are the easiest side channels to exploit without physical access to the victim machine.
IV. OUR SOLUTION
In our solution, developers annotate the functions performing sensitive computation(s) that they would like to protect. For the rest of the paper, we refer to such functions as protected functions. Our solution instruments the protected functions such that our stub code is invoked before and after execution of each protected function. The stub code ensures
that the protected functions, all other functions that may be invoked as part of their execution, and all the secrets that they operate on are safe from both local and remote timing attacks. Thus, our solution automatically prevents leakage of sensitive information by all functions (protected or unprotected) invoked during a protected function’s execution.
Our solution ensures the following properties for each protected function:
• We ensure that the execution time of a protected function as observed by either a remote or a local attacker is independent of any secret data the function operates on. This prevents an attacker from learning any sensitive in- formation by observing the execution time of a protected function.
• We clean any state left in the shared hardware resources (e.g., caches) by a protected function before handing the resources over to an untrusted process. As described earlier in our threat model (Section III), we treat any process as untrusted unless it belongs to the same user who is performing the protected computation. We cleanse shared state only when necessary in a lazy manner to minimize the performance overhead.
• We prevent other concurrent untrusted processes from accessing any intermediate state left in the shared hardware resources during the protected function’s execution. We achieve this by dynamically partitioning the shared resources while incurring minimal performance overhead.
[Figure 1 shows three cores sharing an L3 cache, each core with private L1 and L2 caches. A protected function runs alone on one core while untrusted processes occupy the others. Per-user page coloring isolates the protected function’s cache lines; no user process can preempt protected functions; padding makes timing secret-independent; per-core resources are cleansed lazily.]
Fig. 1: Overview of our solution
Figure 1 shows the main components of our solution. We use two high-level mechanisms to provide the properties described above for each protected function: time padding and preventing leakage through shared resources. We first briefly summarize these mechanisms below and then describe them in detail in Sections IV-A and IV-B.
Time padding. We use time padding to make sure that a protected function’s execution time does not depend on
the secret data. The basic idea behind time padding is simple: pad the protected function’s execution time to its worst-case runtime over all possible inputs. The idea of padding execution time to an upper limit to prevent timing channels itself is not new and has been explored in several prior projects [6, 18, 24, 31, 47]. However, all these solutions suffer from two major problems which prevent them from being adopted in real-world settings: i) they incur prohibitive performance overhead (90−400% in macro-benchmarks [47]) because they have to add a large amount of time padding in order to prevent any timing information leakage to a remote attacker, and ii) they do not protect against local adversaries who can infer the actual unpadded execution time through side channels beyond network events (e.g., by monitoring the cache access patterns at periodic intervals).
We solve both of these problems in this paper. One of our main contributions is a new low-overhead time padding scheme that can prevent timing information leakage of a protected function to both local and remote attackers. We minimize the required time padding without compromising security by adapting the worst-case time estimates using the following three principles:
1) We adapt the worst-case execution estimates to the target hardware and the protected function. We do so by providing an offline profiling tool to automatically estimate worst-case runtime of a particular protected function running on a particular target hardware platform. Prior schemes estimate the worst-case execution times for complete services (i.e., web servers) across all possible hardware configurations. This results in an over-estimate of the time pad that hurts performance.
2) We protect against local (and remote) attackers by ensuring that an untrusted process cannot intervene during a protected function’s execution. We apply time padding at the end of every protected function’s execution. This ensures minimal overhead while preventing a local attacker from learning the running time of protected functions. Prior schemes applied a large time pad before sending a service’s output over the network. Such schemes are not secure against local attackers who can use local resources, such as cache behavior, to infer the execution time of individual protected functions.
3) Timing variations result from many factors. Some are secret-dependent and must be prevented, while others are secret independent and cause no harm. For example, timing variations due to the OS scheduler and interrupt handlers are generally harmless. We accurately measure and account for secret-dependent variations and ignore the secret-independent variations. This lets us compute an optimal time pad needed to protect secret data. None of the existing time padding schemes distinguish between the secret-dependent and secret-independent variations. This results in unnecessarily large time pads, even when secret-dependent timing variations are small.
Preventing leaks via shared resources. We prevent information leakage through shared resources without adding significant performance overhead to the process executing the protected function or to other (potentially malicious) processes. Our approach is as follows:
• We leverage the multi-core processor architecture found in most modern processors to minimize the amount of shared resources during a protected function’s execution without hurting performance. We dynamically reserve exclusive access to a physical core (including all per-core caches such as L1 and L2) while it is executing a protected function. This ensures that a local attacker does not have concurrent access to any per-core resources while a protected function is accessing them.
• For L3 caches shared across multiple cores, we use page coloring to ensure that cache accesses during a protected function’s execution are restricted within a reserved portion of the L3 cache. We further ensure that this reserved portion is not shared with other users’ processes. This prevents the attacker from learning any information about protected functions through the L3 cache.
• We lazily cleanse the state left in both per-core resources (e.g., L1/L2 caches, branch predictors) and resources shared across cores (e.g., L3 cache) only before handing them over to untrusted processes. This minimizes the overhead caused by the state cleansing operation.
A. Time padding
We design a safe time padding scheme that defends against both local and remote attackers inferring sensitive information from observed timing behavior of a protected function. Our de- sign consists of two main components: estimating the padding threshold and applying the padding safely without leaking any information. We describe these components in detail next.
Determining the padding value. Our time padding only accounts for secret-dependent time variations. We discard variations due to interrupts or OS scheduler preemptions. To do so we rely on Linux’s ability to keep track of the number of external preemptions. We adapt the total padding time based on the amount of time that a protected function is preempted by the OS.
• Let Tmax be the worst-case execution time of a protected function when no external preemptions occur.
• Let Text preempt be the worst-case time spent during preemptions given the set of n preemptions that occur during the execution of the protected function.
Our padding mechanism pads the execution of each protected function to Tpadded cycles, where
Tpadded = Text preempt + Tmax.
This leaks the amount of preemption time to the attacker, but nothing else. Since this is independent of the secret, the attacker learns nothing useful.
Estimating Tmax. Our time padding scheme requires a tight estimate of the worst-case execution time (WCET) of every protected function. There are several prior projects that try to estimate WCET through different static analysis techniques [19, 25]. However, these techniques require precise and accurate models of the target hardware (e.g., cache, branch target buffers, etc.) which are often very hard to get in practice. In our implementation we use a simple dynamic profiling method to estimate WCET described below. Our time padding
[Figure 2 shows a timeline in which a naive padding loop overshoots the padding target by an amount that depends on the function’s elapsed time, producing a leak.]
Fig. 2: Time leakage due to naive padding
scheme is not tied to any particular WCET estimation method and can work with other estimation tools.
We estimate the WCET, Tmax, through dynamic offline profiling of the protected function. Since this value is hardware-specific, we perform the profiling on the actual hardware that will run protected functions. To gather profiling information, we run an application that invokes protected functions with an input generating script provided by the application developer/system administrator. To reduce the possibility of overtimes occurring due to uncommon inputs, it is important that the script generate both common and uncommon inputs. We instrument the protected functions in the application so that the worst-case performance behavior is stored in a profile file. We compute the padding parameters based on the profiling results.
To be conservative, we obtain all profiling measurements for the protected functions under high load conditions (i.e., in parallel with other applications that produce significant load on both memory and CPU). We compute Tmax from these measurements such that it is the worst-case timing bound when at most a κ fraction of all profiling readings are excluded. κ is a security parameter which provides a tradeoff between security and performance. Higher values of κ reduce Tmax but increase the chance of overtimes. For our prototype implementation we set κ to 10^−5.
Safely applying padding. Once the padding amount has been determined using the techniques described earlier, waiting for the target amount might seem easy at first glance. However, there are two major issues that make application of padding complicated in practice as described below.
Handling limited accuracy of padding loops. As our solution depends on fine-grained padding, a naive padding scheme may leak information due to the limited accuracy of any padding loop. Figure 2 shows that a naive padding scheme that repeatedly measures the elapsed time in a tight loop until the target time is reached leaks timing information. This is because the loop can only break when the condition is evaluated, and hence if one iteration of the loop takes u cycles then the padding loop leaks timing information mod u. Note that earlier time padding schemes are not affected by this problem as their padding amounts are significantly larger than ours.
Our solution guarantees that the distribution of running times of a protected function for some set of private inputs is indistinguishable from the same distribution produced when a different set of private inputs to the function are used. We
call this property the safe padding property. We overcome the limitations of the simple wait loop by performing a timing randomization step before entering the simple wait loop. During this step, we perform m rounds of a randomized waiting operation. The goal of this step is to ensure that the amount of time spent in the protected function before the beginning of the simple wait loop, when taken modulo u, the stable period of the simple timing loop (i.e., disregarding the first few iterations), is close to uniform. This technique can be viewed as performing a random walk on the integers modulo u where the runtime distribution of the waiting operation is the support of the walk and m is the number of steps walked. Prior work by Chung et al. [16] has explored the sufficient conditions for the number of steps in a walk and its support that produce a distribution that is exponentially close to uniform.
For the purposes of this paper, we perform timing randomization using a randomized operation with 256 possible inputs that runs for X + c cycles on input X, where c is a constant. We describe the details of this operation in Section V. We then choose m to defeat our empirical statistical tests under pathological conditions that are very favorable to an attacker, as shown in Section VI.
For our scheme’s guarantees to hold, the randomness used inside the randomized waiting operation must be generated using a cryptographically secure generator. Otherwise, if an attacker can predict the added random noise, she can subtract it from the observed padded time and hence derive the original timing signal, modulo u.
A padding scheme that pads to the target time Tpadded using a simple padding loop and performs the randomization step after the execution of the protected function will not leak any information about the duration of the protected function, as long as the following conditions hold: (i) no preemptions occur; (ii) the randomization step successfully yields a distribution of runtimes that is uniform modulo u; (iii) the simple padding loop executes for enough iterations so that it reaches its stable period. The security of this scheme under these assumptions can be proved as follows.
Let us assume that the last iteration of the simple wait loop takes u cycles. Assuming the simple wait loop has iterated enough times to reach its stable period, we can safely assume that u does not depend on when the simple wait loop started running. Now, due to the randomization step, we assume that the amount of time spent up to the start of the last iteration of the simple wait loop, taken modulo u, is uniformly distributed. Hence, the loop will break at a time that is between the target time and the target time plus u − 1. Because the last iteration began when the elapsed execution time was uniformly distributed modulo u, these u different cases will occur with equal probability. Hence, regardless of what is done within the protected function, the padded duration of the function will follow a uniform distribution over u different values after the target time. Therefore, the attacker will not learn anything from observing the padded time of the function.
To reduce the worst-case performance cost of the randomization step, we generate the required randomness at the start of the protected function, before measuring the start time of the protected function. This means that any variability in the runtime of the randomness generator does not increase Tpadded.
// At the return point of a protected function:
// Tbegin holds the time at function start
// Ibegin holds the preemption count at function start

for j = 1 to m
    Short-Random-Delay()
Ttarget = Tbegin + Tmax
overtime = 0
for i = 1 to ∞
    before = Current-Time()
    while Current-Time() < Ttarget, re-check
    // Measure preemption count and adjust target
    Text_preempt = (Preemptions() − Ibegin) · Tpenalty
    Tnext = Tbegin + Tmax + Text_preempt + overtime
    // Overtime-detection support
    if before ≥ Tnext and overtime = 0
        overtime = Tovertime
        Tnext = Tnext + overtime
    // If no adjustment was made, stop padding
    if Tnext = Ttarget
        return
    Ttarget = Tnext
Fig. 3: Algorithm for applying time padding to a protected function’s execution.
Handling preemptions occurring inside the padding loop. The scheme presented above assumes that no external preemptions can occur during the execution of the padding loop itself. However, blocking all preemptions during the padding loop would degrade the responsiveness of the system. To avoid such issues, we allow interrupts to be processed during the execution of the padding loop and update the padding time accordingly. We repeatedly update the padding time in response to preemptions until a “safe exit condition” is met where we can stop padding.
Our approach is to initially pad to the target value Tpadded, regardless of how many preemptions occur. We then repeatedly increase Text preempt and pad to the new adjusted padding target until we execute a padding loop where no preemptions occur. The pseudocode of our approach is shown in Figure 3. Our technique does not leak any information about the actual runtime of the protected function as the final padding target only depends on the pattern of preemptions but not on the initial elapsed time before entering the padding loops. Note that forward progress in our padding loops is guaranteed as long as preemptions are rate limited on the cores executing protected functions.
The algorithm computes Text preempt based on observed preemptions simply by multiplying a constant Tpenalty by the number of preemptions. Since Text preempt should match the worst-case execution time of the observed preemptions, Tpenalty is the worst-case execution time of any single preemption. Like Tmax, Tpenalty is machine specific and can be determined empirically from profiling data.
Handling overtimes. Our WCET estimator may miss a pathological input that causes the protected function to run for significantly more time than on other inputs. While we never
observed this in our experiments, if such a pathological input appeared in the wild, the protected function may take longer than the estimated worst-case bound and this will result in an overtime. This leaks information: the attacker learns that a pathological input was just processed. We therefore augment our technique to detect such overtimes, i.e., when the elapsed time of the protected function, taking interrupts into account, is greater than Tpadded.
One option to limit leakage when such overtimes are detected is to refuse to service such requests. The system administrator can then act by either updating the secrets (e.g., secret keys) or increasing the parameter Tmax of the model.
We also support updating Tmax of a protected function on the fly without restarting the running application. The padding parameters are stored in a file that has the same access permissions as the application/library containing the protected function. This file is memory-mapped when the corresponding protected function is called for the first time. Any changes to the memory-mapped file will immediately impact the padding parameters of all applications invoking the protected function unless they are in the middle of applying the estimated padding.
Note that each overtime can at most leak log(N) bits of information, where N is the total number of timing measurements observed by the attacker. To see why, consider a string of N timing observations made by an attacker with at most B overtimes. There can be fewer than N^B such unique strings, and thus the maximum information content of such a string is less than B·log(N) bits, i.e., less than log(N) bits per overtime. However, the actual effect of such leakage depends on how much entropy an application’s timing patterns for different inputs have. For example, if an application’s execution time for a particular secret input is significantly larger than for all other inputs, even leaking 1 bit of information will be enough for the attacker to infer the complete secret input.
Minimizing external preemptions. Note that even though Tpadded does not leak any sensitive information, padding to this value will incur significant performance overhead if Text preempt is high.