Hardware Support for Efficient Virtualization
John Fisher-Ogden, University of California, San Diego
Abstract
Virtual machines have been used since the 1960s in creative ways. From multiplexing expensive mainframes to providing backwards compatibility for customers migrating to new hardware, virtualization has allowed users to maximize their usage of limited hardware resources. Despite virtual machines falling by the wayside in the 1980s with the rise of the minicomputer, we are now seeing a revival of virtualization, with virtual machines being used for security, isolation, and testing, among others.
With so many creative uses for virtualization, ensuring high performance for applications running in a virtual machine becomes critical. In this paper, we survey current research towards this end, focusing on the hardware support which enables efficient virtualization. Both Intel and AMD have incorporated explicit support for virtualization into their CPU designs. While this can simplify the design of a stand-alone virtual machine monitor (VMM), techniques such as paravirtualization and hosted VMMs are still quite effective in supporting virtual machines.
We compare and contrast current approaches to efficient virtualization, drawing parallels to techniques developed by IBM over thirty years ago. In addition to virtualizing the CPU, we also examine techniques focused on virtualizing I/O and the memory management unit (MMU). Where relevant, we identify shortcomings in current research and provide our own thoughts on the future direction of the virtualization field.
1 Introduction
The current virtualization renaissance has spurred exciting new research with virtual machines on both the software and the hardware side. Both Intel and AMD have incorporated explicit support for virtualization into their CPU designs. While this can simplify the design of a stand-alone virtual machine monitor (VMM), techniques such as paravirtualization and hosted VMMs are still quite effective in supporting virtual machines.
This revival in virtual machine usage is driven by many motivating factors. Untrusted applications can be safely sandboxed in a virtual machine, providing added security and reliability to a system. Data and performance isolation can be provided through virtualization as well. Security, reliability, and isolation are all critical components for data centers trying to maximize the usage of their hardware resources by coalescing multiple servers to run on a single physical server. Virtual machines can further increase reliability and robustness by supporting live migration from one server to another upon hardware failure.
Software developers can also take advantage of virtual machines in many ways. Writing code that is portable across multiple architectures requires extensive testing on each target platform. Rather than maintaining multiple physical machines for each platform, testing can be done within a virtual machine for each platform, all from a single workstation. Virtualization can also be exploited for debugging purposes. Post-mortem forensics of a crashed or compromised server can be expedited if the server was running in a virtual machine [9]. Virtualization can also be used to support techniques such as bidirectional debugging [12] which aid both software developers and system administrators.
One final factor in the revival of virtual machines is that they can provide simplified application deployment by packaging an entire environment together to avoid complications with dependencies and versioning.
With so many creative uses for virtualization, ensuring high performance for applications running in a virtual machine becomes critical. In this paper, we survey current research towards this end, focusing on the hardware support which enables efficient virtualization.
We compare and contrast current approaches to efficient virtualization, drawing parallels to techniques developed by IBM over thirty years ago. In addition to virtualizing the CPU, we also examine techniques focused on virtualizing I/O and the memory management unit (MMU). Where relevant, we identify shortcomings in current research and provide our own thoughts on the future direction of the virtualization field.
In the remainder of this paper, we present and evaluate multiple techniques aimed at providing efficient virtualization. In Section 2, we provide some historical background to put current Intel and AMD proposals in context. Section 3 then details the current approach from Intel. We next turn to the virtualization of the MMU in Section 4 and I/O in Section 5. Finally, Section 6 provides some discussion and comparisons before considering future directions for this field. Section 7 concludes our analysis.
2 Background
In this section, we will highlight relevant approaches to virtualization from the past few decades before discussing the current techniques from Intel and AMD.
2.1 Classical Virtualization
Popek and Goldberg’s 1974 paper defines requirements for what is termed classical virtualization [15]. By their standards, a piece of software can be considered a VMM if it meets the following three requirements:
• Equivalent execution. Programs running in a virtual environment run identically to running natively, barring differences in resource availability and timing.
• Performance. A “statistically dominant” subset of instructions must be executed directly on the CPU.
• Safety. A VMM must completely control system re- sources.
An early technique for virtualization was trap and emulate. While this approach was effective at providing an equivalent execution environment, its performance was severely lacking, as each instruction could require tens of native instructions to emulate. The performance requirement for a VMM does not rule out trap and emulate, but rather limits its application.
Popek and Goldberg also define sensitive instructions which can violate the safety and encapsulation that a VMM provides. For example, an instruction which changes the amount of system resources available would be considered sensitive. A VMM can be constructed for an architecture if the sensitive instructions are a subset of the privileged instructions. This ensures that the VMM can step in on all sensitive instructions and handle them safely since they are guaranteed to trap.
However, even if an architecture fails this requirement, as the x86 does, software techniques can be employed to achieve a similar execution environment despite the architecture not being classically virtualizable.
2.2 IBM Virtualizable Architectures
Now that we have established a baseline for virtualizable architectures, we examine a few IBM systems which pioneered the field of virtualization.
2.2.1 VM/370
The Virtual Machine Facility/370 (VM/370) [8] provides multiple virtual machines to users, each having the same architecture as the underlying IBM System/370 hardware they run on.
The VM/370 comprises three distinct modules: the Control Program (CP), Conversational Monitor System (CMS), and Remote Spooling and Communications Subsystem (RSCS). The Control Program handles the duties of a VMM and creates virtual machines, while CMS is the guest operating system which runs in each virtual machine. CMS was originally written for the IBM System/360 and transitioned to the virtual environment once CP came on-line. The final module of VM/370, RSCS, handles the networking and communication between virtual machines and also remote workstations.
A major goal of IBM was maintaining compatibility across a family of computers. While the VM/370 ran on the System/370 and exported that architecture through its virtual machines, programs written for the System/360 could still be run with degraded performance, despite the underlying architecture not supporting certain features.
An important design goal for CP and CMS was to make the virtual machine environment appear identical to its native counterpart. However, IBM did not make efficiency and performance an explicit design goal. While efficiency was not eschewed outright, these pioneering efforts rightly focused on functionality and correctness.
When running multiple guests in virtual machines, each guest believes that all of memory is at its disposal. Since a VMM must provide an equivalent environment for guests, dynamic address translation (DAT) must be performed to translate guest physical addresses to host physical addresses. VM/370 uses shadow page tables to achieve this translation.
Shadow page tables are a fairly simple mechanism for providing DAT but have been used quite heavily over the years. A guest OS manages its own page tables to map guest virtual addresses to guest physical addresses. Since guest physical addresses are actually host virtual addresses, the VMM must then use its own page tables to map to a host physical address. Once a host physical address is obtained, a mapping from guest virtual address to host physical address can be inserted into the hardware translation lookaside buffer (TLB).
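To make the two-level mapping concrete, the following C sketch shows how a VMM might derive a single shadow page-table entry. It is a toy illustration rather than VM/370 code: the identity guest mapping, the fixed host offset, and the flat 64-bit PTE layout are all assumptions, and a real VMM would also track accessed/dirty bits and invalidate shadow entries when the guest edits its own tables.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef uint64_t gva_t, gpa_t, hpa_t, pte_t;

#define PAGE_MASK  (~0xFFFULL)
#define FLAG_MASK  (0xFFFULL)

/* Toy stand-ins for the two translation stages a VMM maintains: the
 * guest OS's own page tables (gva -> gpa) and the VMM's map of guest
 * physical frames onto host physical frames (gpa -> hpa). */
static bool guest_pte_lookup(gva_t gva, gpa_t *gpa, uint64_t *flags)
{
    *gpa = gva;                      /* pretend the guest identity-maps */
    *flags = 0x7;                    /* present | writable | user       */
    return true;
}

static bool gpa_to_hpa(gpa_t gpa, hpa_t *hpa)
{
    *hpa = gpa + 0x100000000ULL;     /* guest "RAM" at a host offset    */
    return true;
}

/* Build the shadow PTE for one guest virtual page.  The resulting
 * mapping is gva -> hpa, which is what the hardware TLB caches. */
static bool shadow_pte_fill(gva_t gva, pte_t *shadow_pte)
{
    gpa_t gpa; hpa_t hpa; uint64_t flags;

    if (!guest_pte_lookup(gva, &gpa, &flags))
        return false;                /* no guest mapping: reflect a #PF */
    if (!gpa_to_hpa(gpa, &hpa))
        return false;                /* VMM must assign a host frame    */

    *shadow_pte = (hpa & PAGE_MASK) | (flags & FLAG_MASK);
    return true;
}

int main(void)
{
    pte_t spte;
    if (shadow_pte_fill(0x401000, &spte))
        printf("shadow PTE for gva 0x401000: 0x%llx\n",
               (unsigned long long)spte);
    return 0;
}
```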
2.2.2 370-XA
The System/370 Extended Architecture (370-XA) [11] continues the evolution of virtual machines beyond the VM/370. Given that performance was not an explicit goal for the VM/370, the 370-XA was able to increase the efficiency of virtualization in a variety of ways.
Since the trap and emulate technique was used so heavily, the 370-XA incorporated µ-code extensions called assists to the CPU to replace common functions that were expensive to emulate. As not all the available assists were targeted at virtualization support, we restrict our discussion to the assists that did target virtualization.
In previous systems, assists had proved themselves to be quite indispensable for running virtual machines. This caused the 370-XA to coalesce a large number of these assists into a new execution mode for the CPU, interpretive execution, which recognizes special instructions and enables most privileged instructions to execute directly in the virtual environment.
To enter interpretive execution mode, the privileged SIE instruction (Start Interpretive Execution) is used. The operand given to SIE is the state description, which describes the current state of the guest. Upon exiting interpretive execution, the state description is updated, including the guest program status word (PSW). The state description also details the reason for the exit to expedite any necessary handling by the host program.
Potential causes for exiting interpretive execution include interrupts, exceptions, instructions that require simulation, or even any instruction that the host program chooses via a mask.
Interpretive execution on the 370-XA can provide virtual environments for both the System/370 and the 370-XA architectures. However, the 370-XA does not use shadow page tables like VM/370. Since the 370-XA supports a larger 2 GB address space, there were concerns over a possible sparseness of address references leading to a poor TLB cache hit rate. Maintaining the shadow page tables can be costly as well.
To avoid these issues, the 370-XA performs both levels of translation in hardware rather than relying on shadow page tables to map guest physical addresses to host physical addresses. In Section 4, we see that both Intel and AMD have adopted similar approaches.
While guests can execute many privileged instructions in interpretive execution, guest I/O instructions do cause a trap to the VMM. The 370-XA does support a checking mode on a sub-channel basis that limits references to the guest's storage only. This checking mode provides some protection against malicious or buggy guests.
However, the 370-XA preferred-machine assist allows trusted guests to run directly in the host address space to avoid the overhead of an extra level of translation. These trusted guests can execute most privileged instructions, including those for I/O. Guests also handle their own interrupts in this mode, reducing the need to trap to the VMM.
On a final note, the 370-XA supports segment protection for limiting access among guests for isolation and security. This is not an assist per se, but rather an extension of the base architecture.
2.2.3 VM/ESA
Building upon the 370-XA, the Virtual Machine/Enterprise Systems Architecture (VM/ESA) [14] also uses interpretive execution to efficiently support virtual machine guests. While the 370-XA supported two architectures as virtual environments, the VM/ESA supports five different architecture modes: System/370, ESA/390, VM Data Spaces mode, 370-XA, and ESA/370, with the latter two being architectural subsets of ESA/390.
The VM Data Spaces mode enables memory-sharing communication among guests that do not use DAT and also removes the 2 GB address space limit. While supporting five environments for virtual machines may seem unnecessary with today's personal workstations, one must remember that the VM/ESA was designed to run on large mainframes for enterprises. Providing compatibility during migration to a newer platform as well as enabling testing of the new platform was critical to IBM's business since the hardware was quite expensive.
Like the 370-XA, the VM/ESA also supports preferred storage mode via the preferred-machine assist. The 370-XA could only support a single guest in this mode since the guest did not use paging. However, the VM/ESA includes the Multiple Domain Facility (MDF), which adds zones to support multiple guests in preferred mode. A guest is assigned a contiguous block of host storage, with one register set to the base of this block and another register holding the size of the block. The VM/ESA can then support multiple preferred guests, each in its own zone, using single register translation to efficiently map between a guest physical address and a host physical address.
The dominant reason for guests to run in preferred storage mode is to achieve high-performance I/O without the need to perform multiple levels of address translation. The single register translation maintains the performance gains while enabling multiple preferred guests.
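Single register (zone) translation of this kind reduces to a bounds check plus an offset. The C sketch below is our own illustration of the idea rather than VM/ESA code; the structure fields simply stand in for the base and size registers.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* An MDF-style zone: a contiguous block of host storage assigned to a
 * preferred guest, described by a base register and a size register. */
struct zone {
    uint64_t host_base;   /* host physical address of the block */
    uint64_t size;        /* length of the block in bytes       */
};

/* Map a guest physical address to a host physical address with a single
 * add and bounds check -- no page-table walk and no shadow structures. */
static bool zone_translate(const struct zone *z, uint64_t guest_pa,
                           uint64_t *host_pa)
{
    if (guest_pa >= z->size)
        return false;                 /* outside the zone: fault to VMM */
    *host_pa = z->host_base + guest_pa;
    return true;
}

int main(void)
{
    struct zone z = { .host_base = 0x80000000ULL, .size = 0x10000000ULL };
    uint64_t host_pa;

    if (zone_translate(&z, 0x1000, &host_pa))
        printf("guest PA 0x1000 -> host PA 0x%llx\n",
               (unsigned long long)host_pa);
    return 0;
}
```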
The VM/ESA does support running VM/ESA as a guest of itself, “Russian doll” style. Interpreted SIE enables another instance of interpretive execution when already interpretively executing, distinguishing between “virtual” guests and “real” guests. However, not all hardware models support interpreted SIE. In that case, interpreted SIE can be simulated through shadow page tables and other shadow structures in the “real” guest. Zone relocation replaces the lowest level of dynamic address translation to reduce the performance premium for running nested virtual machines.
To conclude our discussion of VM/ESA, we note that the hardware TLBs are not tagged and must be flushed when switching between guests.
The VM/370, 370-XA, and VM/ESA illustrate the progression of virtualization techniques, with increasing amounts of functionality and performance as the systems matured. Many ground-breaking ideas were formulated in these systems, and we can clearly see their influence on the current virtualization offerings from Intel and AMD.
2.3 x86 Virtualization
We now step forward in time and consider the widely used x86 architecture. Due to the rise of personal workstations and the decline of mainframe computers, virtual machines were considered nothing more than an interesting footnote in the history of computing. Because of this, the x86 was designed without much consideration for virtualization. Thus, it is unsurprising that the x86 fails to meet Popek and Goldberg’s requirements for being classically virtualizable.
However, techniques were developed to circumvent the shortcomings in x86 virtualization. We first present a few of the architectural challenges inherent in the x86 before discussing various solutions to these challenges.
2.3.1 Architectural Challenges
The x86 architecture supports 4 privilege levels, or rings, with ring 0 being the most privileged and ring 3 the least. Operating systems run in ring 0, user applications run in ring 3, and rings 1 and 2 are not typically used.
Ring Compression
To provide isolation among virtual machines, the VMM runs in ring 0 and the virtual machines run either in ring 1 (the 0/1/3 model) or ring 3 (the 0/3/3 model). While the 0/1/3 model is simpler, it cannot be used when running in 64-bit mode on a CPU that supports the 64-bit extensions to the x86 architecture (AMD64 and EM64T).
To protect the VMM from guest OSes, either paging or segment limits can be used. However, segment limits are not supported in 64-bit mode, and paging on the x86 does not distinguish between rings 0, 1, and 2. This results in ring compression, where a guest OS must run in ring 3, unprotected from user applications.
Ring Aliasing
A related problem is ring aliasing, where the true privilege level of a guest OS is exposed, contrary to the guest’s belief that it is running in ring 0. For example, executing a PUSH instruction on the CS register, which includes the current privilege level, and then subsequently examining the result would reveal the privilege discrepancy.
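As a concrete illustration, the low two bits of the CS selector encode the current privilege level (CPL), and reading CS requires no privilege. The sketch below uses GCC-style inline assembly for x86 and reads CS directly rather than pushing it; run natively it prints 3, and a guest OS de-privileged by its VMM would likewise observe a ring other than the ring 0 it believes it occupies.

```c
#include <stdio.h>
#include <stdint.h>

/* Read the CS selector; its low two bits are the current privilege
 * level.  The read never traps, so the true ring is always visible. */
static unsigned current_privilege_level(void)
{
    uint16_t cs;
    __asm__ volatile("mov %%cs, %0" : "=r"(cs));
    return cs & 0x3;
}

int main(void)
{
    printf("running at CPL %u\n", current_privilege_level());
    return 0;
}
```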
Address Space Compression
Address space compression provides another hurdle for virtualizing the x86 architecture. The VMM can either run in its own address space, which can be costly when switching between guests and the VMM, or it can run in part of the guest’s address space. When the VMM runs in its own address space, some storage in the guest address space is still required for control structures like the interrupt-descriptor table (IDT) and the global-descriptor table (GDT) [13]. In either case, the VMM must protect the portions of the address space it uses from the guest. Otherwise, a guest could discover it is running in a virtual machine or compromise the virtual machine’s isolation by reading or writing those locations.
Non-Privileged Sensitive Instructions
Next, in clear violation of “classical” virtualization, the x86 supports sensitive instructions that are not privileged and therefore do not trap to the VMM for correct handling. For example, the SMSW instruction stores the machine status word in a register which can then be read by the guest [16], exposing privileged information.
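A small user-mode probe makes the point. On processors or kernels without the later UMIP protection, SMSW completes without a fault and reveals the low bits of CR0 to unprivileged code, which is exactly the behavior a trap-and-emulate VMM cannot intercept.

```c
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t msw = 0;

    /* SMSW stores the machine status word (the low 16 bits of CR0) into
     * its operand.  It is not a privileged instruction, so it executes
     * here in ring 3 without trapping to an OS or VMM. */
    __asm__ volatile("smsw %0" : "=r"(msw));

    printf("machine status word: 0x%llx (PE bit = %llu)\n",
           (unsigned long long)msw, (unsigned long long)(msw & 1));
    return 0;
}
```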
Silent Privilege Failures
Another problem involving privileged state is that some privileged accesses, rather than trapping to the VMM, fail silently without faulting. This violates Popek and Goldberg’s tenet that guest virtual machines must execute identically to native execution, barring only differences in timing and resource availability.
Interrupt Virtualization
Finally, interrupt virtualization can be a challenge for x86 virtual machines. The VMM wants to manage external interrupt masking and unmasking itself to maintain control of the system. However, some guest OSes frequently mask and unmask interrupts, which would result in poor performance if a switch to the VMM were required on each masking instruction.
We have briefly presented some of the challenges to virtualization on the x86 architecture. We refer interested readers to Robin and Irvine’s analysis [16] for a more thorough presentation.
2.3.2 Binary Translation
While emulation can provide transparency and compatibility for guest virtual machines, its performance can be poor. One technique to improve virtualization performance is binary translation.
Binary translation involves rewriting the instructions of an application, inserting traps before problem sections or converting instructions to an entirely different instruction set architecture (ISA). Binary translation can be done statically or dynamically. Dynamic binary translation is used in just-in-time (JIT) compilation, for example when executing bytecode on a Java Virtual Machine (JVM).
Many of the x86 architectural challenges outlined previously can be solved by simply inserting a trap instruction that enables the VMM to gain control and correctly emulate any problematic instructions.
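The sketch below caricatures this idea over an invented one-byte opcode set (real translators decode variable-length x86 instructions and must also handle branch targets): each sensitive opcode in a basic block is rewritten into a trap opcode so that the VMM regains control and emulates it.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <stdio.h>

/* Invented opcodes for illustration only. */
enum { OP_NOP = 0x00, OP_ADD = 0x01, OP_SMSW = 0x0F, OP_TRAP = 0xCC };

static bool is_sensitive(uint8_t op)
{
    return op == OP_SMSW;            /* would silently leak state */
}

/* Translate one basic block: copy each instruction, replacing sensitive
 * ones with a trap that hands control back to the VMM for emulation. */
static void translate_block(const uint8_t *in, size_t len, uint8_t *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = is_sensitive(in[i]) ? OP_TRAP : in[i];
}

int main(void)
{
    const uint8_t guest_code[] = { OP_NOP, OP_ADD, OP_SMSW, OP_NOP };
    uint8_t translated[sizeof guest_code];

    translate_block(guest_code, sizeof guest_code, translated);
    for (size_t i = 0; i < sizeof guest_code; i++)
        printf("%02X -> %02X\n", guest_code[i], translated[i]);
    return 0;
}
```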
Static binary translation can have difficulty analyzing a binary to reconstruct basic block information and a control flow graph. Dynamic translation avoids this because it can translate instructions as needed. However, the online translation must be done quickly to maintain acceptable levels of performance.
A novel example of binary translation is the FX!32 profile-directed binary translator from DEC [7]. FX!32 emulates an application on its first run while profiling the application to determine the instructions that would most benefit from running natively. These instructions are then translated so that the next time the application is run, its performance improves dramatically.
While FX!32 is a solution to running x86 applications on DEC’s Alpha architecture, its hybrid approach combining emulation and dynamic binary translation illustrates an effective solution to executing unmodified binaries transparently, without sacrificing performance.
2.3.3 Paravirtualization
Binary translation enables virtualization when recompiling source code is not desirable or feasible. Paravirtualization eschews this restriction in the name of high-performance virtual machines.
Rather than presenting an equivalent virtual environment to guests, paravirtualization exposes virtual machine information to guest operating systems, enabling the guests to make more informed decisions on things like page replacement. In addition, source-level modifications can be made to avoid the x86 challenges to virtualization. Whereas binary translation would trap on problematic instructions, paravirtualization can avoid the instructions entirely.
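The sketch below, loosely modeled on a Xen-style interface rather than taken from it, shows the flavor of such a source-level change: instead of writing a page-table entry directly (which a de-privileged guest cannot do), the guest kernel issues an explicit hypercall. The hypercall number and the hypercall2 entry stub are hypothetical and assumed to be supplied by the hypervisor interface.

```c
#include <stdint.h>

/* Hypothetical hypercall number: ask the hypervisor to validate and
 * apply a page-table update on the guest's behalf. */
#define HYPERCALL_MMU_UPDATE  1

/* Assumed transition mechanism, provided by the hypervisor interface;
 * on a real system this is a trap instruction or a call into a
 * hypervisor-supplied entry page. */
extern long hypercall2(unsigned long nr, unsigned long arg1,
                       unsigned long arg2);

/* In a paravirtualized kernel this replaces the native PTE write: the
 * guest never touches privileged state, so nothing needs to trap. */
static long set_guest_pte(uint64_t *pte_machine_addr, uint64_t new_val)
{
    return hypercall2(HYPERCALL_MMU_UPDATE,
                      (unsigned long)pte_machine_addr,
                      (unsigned long)new_val);
}
```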
A leading paravirtualization system is Xen [6]. Xen achieves high performance for guest virtual machines while retaining the benefits of virtualization—resource utilization, isolation, etc.
Of course, Xen must sacrifice binary compatibility for guest operating systems. While one can easily recompile a Linux OS to run on Xen, the same cannot be said for Microsoft’s Windows OSes.
2.4 Co-designed Virtual Machines
While software tricks can often be played to support virtualization on an uncooperative architecture, an alternative is to design the architecture and VMM in tandem. These co-designed virtual machines blur the strict ISA boundary
into a virtual ISA that enables increased communication between hardware and software.
For example, software can track the phases of an application and tune the branch prediction logic in the hardware to optimize for the current application phase.
While this technique has only seen limited use, the best example is Transmeta’s Crusoe processor. The Crusoe externally supports an x86 ISA while internally using a very long instruction word (VLIW) architecture for power efficiency [17].
3 Current Approaches
Virtualization on the x86 architecture has required unnecessary complexity due to its inherent lack of support for virtual machines. However, extensions to the x86 remedy this problem and, as a result, can support a much simpler VMM. Further, the extensions succeed in making the x86 architecture classically virtualizable.
Both leading chip manufacturers, Intel and AMD, have rolled out these virtualization extensions in current processors. Intel calls its virtualization technology VT-x, previously codenamed Vanderpool. AMD’s extensions go by the name AMD-V, previously Secure Virtual Machine (SVM) and codenamed Pacifica.
While Intel VT-x and AMD-V are not entirely equivalent, they share the same basic structure. Therefore, we focus our discussion on Intel’s offering, noting significant departures for AMD in Section 6.2.
3.1 Intel VT-x
Intel VT-x introduces new modes of CPU operation: VMX root operation and VMX non-root operation [13]. One can think of VMX root operation as being similar to previous IA-32 operation before VT-x; it is intended for VMMs (“host” mode), while VMX non-root operation is essentially a guest mode targeted at virtual machines. Both operating modes support execution in all four privilege rings.
The VMLAUNCH and VMRESUME instructions perform a VM entry, transferring from host to guest mode. Control transfers back to host mode on a VM exit, which can be triggered by both conditional and unconditional events. For example, the INVD instruction unconditionally triggers a VM exit, while a write to a register or memory location might depend on which bits are being modified.
Critical to the interaction between hosts and guests is the virtual machine control structure (VMCS), which contains both guest state and host state. On VM entry, the guest processor state is loaded from the VMCS after storing the host processor state. VM exit swaps these operations, saving the guest state and loading the host state.
The processor state includes segment registers, the CR3 register, and the interrupt descriptor table register (IDTR). The CR3 register (control register 3) holds the physical location of the page tables. By loading and storing this register on VM entry and exit, guest virtual machines can run in an entirely separate address space from the VMM.
However, the VMCS does not contain any general-purpose registers; the VMM can save and restore these itself as needed. This improves VM entry and VM exit performance. On a related note, a guest’s VMCS is referenced with a physical address to avoid first translating a guest virtual address.
As alluded to above, the biggest difference between host and guest mode (VMX root and non-root operation) is that many instructions in guest mode will trigger a VM exit. The VM-execution control fields set the conditions for triggering a VM exit.
The control fields include:
• External-interrupt exiting. Sets whether external interrupts cause VM exits, regardless of guest interrupt masking.
• Interrupt-window exiting. Causes a VM exit when guest interrupts are unmasked.
• Use TPR shadow. Accesses to the task priority register (TPR) through register CR8 (64-bit mode only) can be set to use a shadow TPR register, available in the VMCS. This avoids a VM exit in the common case.
• CR masks and shadows. Bit masks for each control register enable guest modification of select bits while transferring to host mode on writes to other bits. Similar to the TPR register, the VMCS also includes shadow registers which a guest can freely read.
While the register masks provide fine-grained control over specific control registers, the VMCS also includes several bitmaps that provide added flexibility; a brief configuration sketch follows the list below.
• Exception bitmap. Selects which exceptions cause a VM exit. Page faults can be further differentiated based on the fault’s error code.
• I/O bitmap. Configures which ports in the 16-bit I/O port space cause VM exits when accessed.
• MSR bitmaps. Similar to CR bit masks, each model specific register (MSR) has a read bitmap and a write bitmap to control accesses.
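As a rough illustration of how these controls compose, the C sketch below asks for VM exits on page faults only and on guest accesses to a single I/O port. The in-memory struct is a simplification of our own: real VT-x bitmaps occupy dedicated 4 KB pages and the fields are programmed through VMWRITE rather than ordinary stores.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the VM-execution controls discussed above. */
struct vm_exec_controls {
    uint32_t exception_bitmap;   /* bit n set => exit on exception vector n */
    uint8_t  io_bitmap[8192];    /* one bit per port in the 16-bit I/O
                                    space (two 4 KB pages in real VT-x)     */
};

#define PF_VECTOR    14          /* page-fault exception vector             */
#define SERIAL_PORT  0x3F8       /* illustrative port to intercept          */

static void configure_exits(struct vm_exec_controls *c)
{
    memset(c, 0, sizeof(*c));

    /* Exit only on page faults; let the guest handle other exceptions. */
    c->exception_bitmap = 1u << PF_VECTOR;

    /* Exit whenever the guest touches the chosen I/O port. */
    c->io_bitmap[SERIAL_PORT / 8] |= 1u << (SERIAL_PORT % 8);
}

int main(void)
{
    static struct vm_exec_controls ctl;
    configure_exits(&ctl);
    return 0;
}
```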
With all of these possible events causing a VM exit, it becomes important for a VMM to quickly identify the problem and correct it so control can return to the guest virtual machine. To facilitate this, a VM exit also includes
details on the reasons for the exit to aid the VMM in handling it.
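A stripped-down VMM run loop built on these pieces might look like the following sketch. The vmlaunch, vmresume, and vmread wrappers are hypothetical stand-ins for the corresponding VT-x instructions and are assumed to return once the next VM exit (or an entry failure) occurs; only two exit reasons are handled, with the exit-reason field encoding and reason codes taken from Intel's documentation.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical wrappers around the VT-x instructions of the same names.
 * vmread() takes a VMCS field encoding and returns that field's value;
 * the launch/resume wrappers return after the next VM exit, or false if
 * the VM entry itself fails. */
extern bool vmlaunch(void);
extern bool vmresume(void);
extern uint64_t vmread(uint64_t field);

#define VMCS_EXIT_REASON    0x4402   /* basic exit-reason VMCS field */
#define EXIT_REASON_CPUID   10
#define EXIT_REASON_IO      30

static void run_guest(void)
{
    bool launched = false;

    for (;;) {
        /* VM entry: the first entry uses VMLAUNCH, later ones VMRESUME. */
        if (launched ? !vmresume() : !vmlaunch())
            return;                         /* VM entry failed            */
        launched = true;

        /* VM exit: the exit reason tells the VMM what must be emulated
         * before control can return to the guest. */
        switch (vmread(VMCS_EXIT_REASON) & 0xFFFF) {
        case EXIT_REASON_CPUID:
            /* emulate CPUID, advance the guest RIP, then re-enter */
            break;
        case EXIT_REASON_IO:
            /* decode the port access from the exit qualification  */
            break;
        default:
            return;                         /* unhandled: stop the guest  */
        }
    }
}
```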
While the VMM responds to events from a guest, this becomes a two-way communication channel with event injection. Event injection allows the VMM to introduce interrupts or exceptions to a guest using the IDT.
3.1.1 Architectural Challenges Addressed
In Section 2.3.1, we outlined several architectural challenges inherent in the x86 which created barriers to virtualization. Now that we have examined VT-x in more detail, we see that VT-x does in fact provide solutions to each challenge.
By introducing a new mode of execution with full access to all four privilege rings, both the ring compression and ring aliasing problems disappear. A guest OS executes in ring 0 while the VMM is still fully protected from any errant behavior.
Since each guest VMCS is referenced with a physical address and the VMCS stores the critical IDTR and CR3 registers, virtual machines have full access to their entire address space, eliminating the problem of address space compression.
The x86 contains both non-privileged sensitive instructions and privileged instructions that fail silently. However, given VT-x’s extensive flexibility for triggering VM exits, fine-grained control over any potentially problematic instruction is available.
Lastly, the VMCS control fields also address the challenge of interrupt virtualization. External interrupts can be set to always cause a VM exit, and VM exits can be conditionally triggered upon guest masking and unmasking of interrupts.
With these solutions to the x86 virtualization challenges, the x86 can finally be termed classically virtualizable. With VT-x, the VMM can be much simpler compared to the previous techniques of paravirtualization and binary translation. A simpler VMM leaves less room for error and can provide a more secure virtual environment for guest virtual machines.
3.1.2 Performance
Intel VT-x provides the hardware support enabling a simpler VMM. However, simplicity and performance are often competing goals.
Adams and Agesen demonstrate that software techniques for virtualization, e.g. paravirtualization and binary translation, can outperform a hardware-based VMM on many workloads.