Wednesday, 15 October 2014

Trustworthy Hardware

This is good news; it's dated 23 September:
http://www.nsf.gov/news/news_summ.jsp?cntn_id=132795&org=NSF&from=news
Coincidentally, this is something I sent out on 10 September; it went to Richard Stallman, Linus Torvalds, Theo de Raadt and Roger Schell, amongst others:

The Intel x86 CPU "Instruction Set Architecture" is not really what
one ordinarily thinks of as machine code. It is more like a bytecode
language. Processors such as the Intel Atom series do not directly
execute ISA instructions; rather, they emulate an x86 CPU by running
an interpreter which is programmed in "microcode." In the case of the
Atom CPUs, the "microcode" possibly bears a striking resemblance to
ARM machine code. We don't actually know this, because it is
encrypted---some may claim for "obvious reasons," but we will show
that these are neither obvious, nor are they reasons.

The published ISA specification is highly redundant in that there are
many instructions which can be encoded in more than one way, but which
have identical final results (i.e. they have the same "big step"
operational semantics). Because the redundant encodings have defined
operational semantics, they provide a means by which any agency having
the capacity to inject "microcode" into the CPU can effect a covert
two-way channel between machine code programs and the CPU.
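
To make the redundancy concrete, here is a small illustration: three
documented encodings of the same x86 instruction, "add eax, 1". All
three have identical big-step semantics, so the choice between them
carries information that the architectural result never reflects.

    /* Three documented encodings of "add eax, 1": identical big-step
     * semantics, different byte sequences.  The choice between them is
     * invisible in the architectural result. */
    #include <stdio.h>
    #include <stddef.h>

    static const unsigned char enc_a[] = { 0x83, 0xC0, 0x01 };                   /* ADD r/m32, imm8  */
    static const unsigned char enc_b[] = { 0x81, 0xC0, 0x01, 0x00, 0x00, 0x00 }; /* ADD r/m32, imm32 */
    static const unsigned char enc_c[] = { 0x05, 0x01, 0x00, 0x00, 0x00 };       /* ADD EAX, imm32   */

    static void dump(const char *name, const unsigned char *p, size_t n)
    {
        printf("%s:", name);
        for (size_t i = 0; i < n; i++)
            printf(" %02X", p[i]);
        printf("\n");
    }

    int main(void)
    {
        dump("encoding a", enc_a, sizeof enc_a);
        dump("encoding b", enc_b, sizeof enc_b);
        dump("encoding c", enc_c, sizeof enc_c);
        return 0;
    }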

For example, it is possible to arrange that application and/or system
software can determine whether or not it is running under emulation,
and thereby moderate its behaviour if there is any risk of it being
observed. This could be done by "instruction knocking", which is
another instance of the teletype string trigger trap door described by
Karger and Schell in [1]: using a special, highly improbable string of
variant encodings of otherwise normal, well-defined instructions to
trigger operational effects not explicitly specified in the processor
documentation. Unless a software emulator were programmed to recognise
all such undocumented sequences, that emulator would behave in a way
that was observably different to the way a real processor would be
expected to behave. Having once identified that it is very probably
running on a "real CPU", such a program could then safely use
undocumented instructions to directly interface with covert functions
implemented in the "microcode". This channel could be used to
re-program the microcode, for example: to recognise a new sequence of
instructions characteristic of a particular cryptographic checksum
calculation, and to adjust the results when they matched any of the
target checksum signatures stored in a table. Obviously it could also
be used to effect a "microcode" update, bypassing the documented
"microcode" update mechanism.

A similar technique could be applied to any actual hardware: a USB
mouse, for example. A software vendor who also controls the hardware
in a widely-used mouse could employ those devices as a 'trusted'
platform, verifying that its software is not running under emulation
and that an out-of-band channel through the USB interface can
therefore safely be used to apply firmware updates to, say, other USB
devices attached to the same hub. For example, it could update any
attached USB DVD-player's firmware with a blacklist of file
signatures; the contents of matching files would then be altered as
they were read from or written to the device.

This would not be too damaging, if its use were restricted to
enforcement of so-called "intellectual property rights". However the
same mechanisms, through subversion or otherwise, could be used to
interfere with software distributed on CD-ROM, or with data
transmitted to/from electronic voting systems, or data
transmitted/received by a cellular modem. Of course this problem is
not unique to USB; the PCI bus interface offers similar
"opportunities," with the potential for re-programming the firmware on
network adapters, disk drives, storage area networks etc. Such devices
typically have more potential for autonomous data-processing than do
USB devices.

The ability to do after-the-fact updates of device firmware is not a
necessary pre-requisite for the establishment of covert functions in
hardware, but it makes them very easy to retrofit to hardware in the
field, and that hardware is therefore correspondingly more difficult
to protect against subversion: if the manufacturer can install a
"microcode" update then, in principle, so can anyone else.

Even if the probability that any one device is compromised is
relatively small, the number of such devices in even a modest network
of workstations and servers makes it more likely than not that at
least one has in fact been compromised. Furthermore, if one device on
the network is compromised then the probability that others will be
compromised as a result is far higher, and so on and so forth.
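
The arithmetic is simple: if each device is independently compromised
with probability p, the chance that at least one of n devices is
compromised is 1 - (1 - p)^n. The figures below are purely
illustrative:

    /* Illustrative figures only: p is a per-device compromise probability,
     * and 1 - (1 - p)^n is the chance that at least one of n independent
     * devices is compromised. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double p = 0.001;
        const int counts[] = { 10, 100, 1000, 5000 };
        for (int i = 0; i < 4; i++)
            printf("n = %4d: P(at least one compromised) = %.3f\n",
                   counts[i], 1.0 - pow(1.0 - p, counts[i]));
        return 0;
    }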

There is therefore a strong case to be made for imposing a legal
requirement on hardware manufacturers to fully disclose the internal
software interface specifications of every digital data gathering,
communications, computing or storage device with a firmware
field-update capability. The lawful owners of these devices would then
have control over which updates are applied, and what that software
actually does. They could thereby more effectively secure those devices,
because there would no longer exist a single point of failure which would
enable a successful attacker to compromise any and every instance of a
particular class of device.

It is conceivable that such a motion would be delayed, or even
successfully opposed by the manufacturers. In that case other
techniques will be needed to secure integrity, privacy and
availability of computer and communications systems with
field-updatable firmware. One plausible approach is to restrict direct
software access to device hardware, and pass all device i/o operations
through a highly restrictive channel: a formally-specified interpreter
implementing the required operations in such a way as to provide NO
MEANS WHATSOEVER by which ANY software could detect whether physical
hardware was in fact being used at all to implement the function of
the device. In effect, one would run ALL software on an emulated
processor, just like the Intel Atom does.
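
A minimal sketch of what such a restrictive channel might look like
(the operation set and names are invented for illustration) is a
small, fixed repertoire of requests, behind which the caller cannot
distinguish a physical device from a purely software-backed one:

    /* Sketch of a restrictive device channel (operation set and names
     * invented for illustration): callers submit requests drawn from a
     * small, formally specified repertoire, and cannot observe whether
     * the back end is physical hardware or a software emulation. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    enum io_op { IO_READ_BLOCK, IO_WRITE_BLOCK, IO_GET_GEOMETRY };

    struct io_request {
        enum io_op  op;
        uint64_t    block;     /* block number for read/write */
        void       *buffer;    /* caller-supplied buffer      */
        size_t      length;    /* buffer length in bytes      */
    };

    /* A purely software-backed implementation: a RAM "disk".  A physical
     * back end would present exactly the same observable behaviour. */
    #define BLOCK_SIZE 512
    #define NUM_BLOCKS 128
    static unsigned char ram_disk[NUM_BLOCKS][BLOCK_SIZE];

    int io_submit(struct io_request *req)
    {
        if (req->block >= NUM_BLOCKS || req->length > BLOCK_SIZE)
            return -1;
        switch (req->op) {
        case IO_READ_BLOCK:
            memcpy(req->buffer, ram_disk[req->block], req->length);
            return 0;
        case IO_WRITE_BLOCK:
            memcpy(ram_disk[req->block], req->buffer, req->length);
            return 0;
        case IO_GET_GEOMETRY:
            return NUM_BLOCKS;
        default:
            return -1;
        }
    }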

Any such interpreter software would ultimately have to have direct
hardware access, and that layer of software would have to be
trusted. To make such trust plausible, the operational semantics of
the interpreter (i.e. both the processor emulator and the peripheral
hardware emulator) would need to be formally specified, and that
specification would need to be machine-readable, so that implementations could be
automatically generated in any sufficiently expressive programming
language.

This is not as difficult to do as it might at first seem. The open
source Bochs emulator is a reasonably complete emulation of the
publicly specified fragment of the operational semantics of a large
class of x86 CPUs, and a modest collection of common PC hardware
interfaces. Bochs is written in C++, but in fact the core instruction
emulation code is remarkably uniform, and an abstract operational
semantics could be extracted from it with only a little
difficulty. Initially, this would be in the form of C expressions, but
it could be further formalised if it were re-written in an abstract
assembler language such as that implemented by GNU Lightning.
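
To illustrate the kind of extraction intended (neither fragment below
is taken from the Bochs sources), the uniform handler style can be
paraphrased as a plain C expression, and the same information restated
as declarative data that a generator could translate into
abstract-assembler code:

    /* Neither fragment is taken from the Bochs sources.  (a) is the sort
     * of uniform C expression an emulator core uses; (b) restates the
     * same operation as declarative data a generator could translate
     * into abstract-assembler code. */
    #include <stdint.h>

    /* (a) concrete: 32-bit register ADD, with two of the flags it sets */
    static inline uint32_t emu_add32(uint32_t dst, uint32_t src, uint32_t *eflags)
    {
        uint32_t result = dst + src;
        *eflags = (result < dst ? 1u : 0u)        /* CF, bit 0 */
                | (result == 0 ? 1u << 6 : 0u);   /* ZF, bit 6 */
        return result;              /* a real core also sets PF, AF, SF, OF */
    }

    /* (b) declarative: operation, operand width, and the flags it defines */
    enum alu_op { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_XOR };

    struct insn_sem {
        enum alu_op op;
        unsigned    width;       /* operand width in bits */
        unsigned    flags_set;   /* bitmask: CF|PF|AF|ZF|SF|OF */
    };

    static const struct insn_sem add32_sem = { ALU_ADD, 32, 0x8D5 };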

Such an abstract expression of device semantics could be used to
implement Bochs-like emulators in many different contexts that the
current concrete C++ implementation cannot support without major
restructuring of the code. For example, one could
generate a version of the emulator which allowed emulation of
different machine states in different threads of the same process; or
which embedded an emulator in an interpreted programming language,
allowing programmed interaction with the emulation in progress; or
which split the emulation of one virtual machine between co-processes
running on different physical and/or virtual machines. The
possibilities are endless.

If that abstract assembler language were fairly carefully specified,
then translations to machine code for particular processors could be
partially verified by automatic means, wherever a machine-readable
specification of the operational semantics of the target hardware was
available. For example, Intel's (closed-source) XED library,
distributed as part of their Pin tool, provides a partial
specification of x86 semantics by way of the resource flags it
attaches to each instruction decoding. These specify the gross
effects of the instruction in terms of the CPU flags upon which it
depends, and those which it affects, the registers and memory
locations which are read/written, etc. If the abstract assembler had a
similar machine-readable formal semantics, then these could be
compared to partially verify any translation between the abstract and
concrete assembler languages.
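
A sketch of such a partial check, using no XED types or calls, is a
simple subsumption test on coarse resource summaries: every register
or flag the concrete translation touches must be permitted by the
abstract instruction's specification.

    /* Sketch of a partial check, using no XED types or calls: a coarse
     * resource summary per instruction, and a subsumption test saying
     * that the concrete translation touches nothing the abstract
     * specification does not permit. */
    #include <stdbool.h>
    #include <stdint.h>

    struct resources {
        uint32_t regs_read, regs_written;     /* bitmask over register ids */
        uint32_t flags_read, flags_written;   /* bitmask over flag bits    */
    };

    static bool subsumes(const struct resources *abstract,
                         const struct resources *concrete)
    {
        return (concrete->regs_read     & ~abstract->regs_read)     == 0
            && (concrete->regs_written  & ~abstract->regs_written)  == 0
            && (concrete->flags_read    & ~abstract->flags_read)    == 0
            && (concrete->flags_written & ~abstract->flags_written) == 0;
    }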

Given more than one such formal specification of different CPU
emulators, one could arrange for implementations to be stacked:
emulating the emulators. Then it is not too hard to see how one could
stack another, different pair of emulators together on some other
physical hardware, and compare the results of emulating the same
program. The more available implementations of processor semantics
there were, the more confidence would be justified in the correctness
of those semantics. So although we cannot trust any hardware to
directly interpret its own instructions, we could perhaps trust it to
directly interpret its own instructions when they are emulating those
of some other processor which is emulating that system.  The
interpretation stack need not be constant over the duration of the
program's execution: I am told it is not too difficult to migrate
running virtual machines from one physical machine to another, so it
should be significantly easier to migrate a virtual machine image from
one virtual machine to another, whether those virtual emulators are
running on the same physical machine or not.
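
The structural point can be seen in a toy example (the machine below
is invented): a fetch-decode-execute loop is itself an ordinary
program, so the same loop can run on bare hardware, inside another
instance of itself, or inside a different emulator altogether, without
the emulated program being in a position to tell.

    /* Toy emulator for an invented 8-bit accumulator machine.  The point
     * is structural: this loop is an ordinary program, so it can run on
     * bare hardware, inside another instance of itself, or inside a
     * different emulator, and the emulated program cannot tell which. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    enum { OP_HALT = 0, OP_LOAD_IMM = 1, OP_ADD_IMM = 2, OP_PRINT = 3 };

    static void run(const uint8_t *code)
    {
        uint8_t acc = 0;
        size_t pc = 0;
        for (;;) {
            switch (code[pc++]) {
            case OP_HALT:     return;
            case OP_LOAD_IMM: acc = code[pc++]; break;
            case OP_ADD_IMM:  acc += code[pc++]; break;
            case OP_PRINT:    printf("%u\n", acc); break;
            }
        }
    }

    int main(void)
    {
        const uint8_t program[] = { OP_LOAD_IMM, 40, OP_ADD_IMM, 2, OP_PRINT, OP_HALT };
        run(program);
        return 0;
    }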

An interpretive environment such as this could then be used whenever
there was any doubt that the underlying hardware was trustworthy,
i.e. until manufacturers are forced to publish the specifications of
the internal firmware interfaces. Emulation need not be grossly
inefficient: the existence of the Atom processors shows that emulating
a CISC machine on a RISC machine is a workable proposition. There is
no absolute requirement that the processors we emulate be real extant
machines. In fact, it would be better if some of them weren't, because
they would be less likely to be subverted.

The key element is the existence of a _machine-readable formal
specification_ of the operational semantics. The fact that the
semantics is machine-readable means that it can be automatically
implemented. It is important to realise that, although in one sense
any C program at all is clearly machine readable, it is not
necessarily a formal semantics, because it may not be amenable to
automatic transformation into another implementation, unless that
other implementation were effectively a C language interpreter of some
kind. This is because of the well-known results of Rice and others,
which show that the defined semantics of programs written in
Turing-universal languages are not necessarily consistent: they are
susceptible to syntactic fixedpoints, which are deliberately
constructed contradictory interpretations of the operation of some
particular implementation of the interpreter.

But a processor emulator does not _have_ to be written in a
Turing-universal language. Every part of its operation is necessarily
finite, because it is a physically finite device with a finite state
space. Therefore we can use a so-called domain-specific language to
describe its operation, and that would be machine-readable in the
sense we need it to be. We could then construct any number of
particular implementations of interpreters (i.e. emulators of hardware
and CPU) and although any finite subset of those _may_ be susceptible
to the construction of a syntactic fixedpoint, the general class of
such implementations as a whole will not, because they are
indeterminate: a purely formal specification will not specify the
representation of any particular implementation whatsoever.

Putting it another way: any formal language which could specify the
operational semantics of a broad class of modern CPUs and peripheral
hardware would also be able to express many variations on existing
hardware designs which had never before been realized. It is difficult
to see how an attacker would be able to subvert the semantics of a device
which has not yet been invented, provided the inventor was imaginative
and her invention independent of, and substantially different from,
any of those already well-known.

All we need to do to make this real, is to carefully specify a GNU
Lightning-like abstract assembler, and formally describe the mappings
from that language into the encoded instructions of real and imaginary
machine languages.
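
One entry of such a mapping might look like the following sketch (the
abstract operation syntax and the imaginary machine's encoding are
invented; the x86 bytes are the documented short form of "add eax,
imm8"):

    /* One entry of a mapping from a GNU Lightning-like abstract operation
     * to concrete encodings on two targets, one real and one imaginary. */
    #include <stddef.h>
    #include <stdint.h>

    struct encoding { const uint8_t *bytes; size_t len; };

    struct mapping {
        const char     *abstract_op;   /* e.g. "addi r0, r0, #imm8" */
        struct encoding x86;
        struct encoding imaginary;     /* a machine that has never been built */
    };

    static const uint8_t x86_add_eax_imm8[]  = { 0x83, 0xC0 };  /* imm8 follows */
    static const uint8_t imag_add_acc_imm8[] = { 0x2A };        /* imm8 follows */

    static const struct mapping add_imm8 = {
        "addi r0, r0, #imm8",
        { x86_add_eax_imm8,  sizeof x86_add_eax_imm8  },
        { imag_add_acc_imm8, sizeof imag_add_acc_imm8 },
    };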

Anyone with any interest at all in computer security should look at
[1]. Those interested in learning more about this method of specifying
operational semantics as processes of interpretation should look at
John Reynolds' paper [2]. Those interested in the logical point of
view: the same idea but from the other side of the
Curry-Howard-Griffin correspondence, as it were, should look at Girard,
Lafont and Taylor's [3].

Thanks to Stefan Monnier whose clear and insightful criticisms
inspired these thoughts.

Ian Grant
La Paz, Bolivia
10 September 2014

References:

[1] Karger, P. A., and Schell, R. R. (1974) MULTICS Security
    Evaluation: Vulnerability Analysis, ESD-TR-74-193,
    Vol. II, Electronic Systems Division, Air Force Systems Command,
    Hanscom AFB, Bedford, MA, June.

    http://seclab.cs.ucdavis.edu/projects/history/papers/karg74.pdf 
 
    Also in Proceedings of the Computer Security Applications Conference,
    Las Vegas, NV, USA, December, pp 126-146.

[2] Reynolds, John C., "Definitional Interpreters for Higher-Order
    Programming Languages," Higher-Order and Symbolic Computation,
    11, 363--397 (1998)

    Available on-line, search for: HOSC-11-4-pp363-397.pdf

[3] Girard, Jean-Yves, Lafont, Yves and Taylor, Paul. "Proofs and Types"
    Cambridge University Press, 1989.

     http://www.paultaylor.eu/stable/prot.pdf 
 
