Friday 11 April 2014

Security and Concrete Representation

Almost all computer security problems are instances of the same generic form, which we could call the leaking abstraction.

For example, the so-called TEMPEST leaks are all a result of there being a definite concrete representation of display pixels, whether on raster-scanning CRT displays or on LCD displays controlled by a digital bus. The abstraction we naturally make is a one-to-one correspondence between the contents of the display buffer memory (the logical pixels) and the colour and intensity of the dots on the screen.

This is what we are taught in Computer Graphics 101, but we later unlearn that lesson when we discover anti-aliasing and sub-pixel rendering. Around the same time we learn that the display emits a great deal more information than the representation of the pixels on the screen. In a process a little like the inverse of anti-aliasing, a well-educated computer scientist can recover a representation of the image displayed on a CRT from continuous observation of the colour and intensity of the light reflected from the walls of the room, or from the radio-frequency emissions of the LCD connector on a laptop.

The leaking abstraction is always a matter of information, which, when it is finite, serves to identify a region of the phase space of the system under observation. It is only the fact of the finiteness of the information that makes possible the reconstruction of data from that information. Now any fixed concrete representation of data results in a finite phase space, and so any concrete representation necessarily must leak information.

This applies not only to physical signals, but to any representation whatsoever. For example, as explained by Ken Thompson in his famous article "Reflections on Trusting Trust", the Unix login program has a particular concrete representation, and the C compiler itself also has a particular concrete representation, which can be used to identify a region of its own phase space. This syntactic fixed point allows one to choose a completely arbitrary denotation for any C program in the system.

So if we wish to do secure communications and computation, we must find a way to avoid using systems with finite representations. We can do this using metaprogramming, which gives us access to the greatest fixed point. If we metaprogramme our communications systems then we can vary the underlying concrete representation at will, and this means that the information in the representation is effectively infinite, so there is no way to identify any region of the phase space of the system, and therefore no way to recover data from the information in any particular concrete representation.

It's really quite simple. The principle is that we cannot know what the message means unless we know the format of the message data. If we build communications systems which are abstract of the concrete representation of the messages, then the messages can only be decoded by systems that are aware of the particular representation that happens to be in use at that time and place.
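The point about format can be made concrete with a toy sketch in Python. Everything here is assumed for illustration: the message is a pair of integers, and the two encodings (fixed-width big-endian, and a tag-length-value layout with little-endian bodies) are invented, not taken from any standard. The same abstract value gives two different byte strings, and a receiver that assumes the wrong representation either fails or reads a different value.

```python
import struct

# Abstract value: a pair of non-negative integers (hypothetical fields,
# say a sequence number and an amount).
msg = (7, 1500)

def encode_fixed_be(m):
    """Representation A: two unsigned 32-bit integers, big-endian, no tags."""
    return struct.pack(">II", *m)

def decode_fixed_be(data):
    return struct.unpack(">II", data)

def encode_tlv_le(m):
    """Representation B: per field, a tag byte, a length byte, then the
    value in as few little-endian bytes as it needs."""
    out = bytearray()
    for tag, value in enumerate(m, start=1):
        body = value.to_bytes(max(1, (value.bit_length() + 7) // 8), "little")
        out += bytes([tag, len(body)]) + body
    return bytes(out)

def decode_tlv_le(data):
    fields = []
    i = 0
    while i < len(data):
        length = data[i + 1]
        fields.append(int.from_bytes(data[i + 2 : i + 2 + length], "little"))
        i += 2 + length
    return tuple(fields)

wire_a = encode_fixed_be(msg)
wire_b = encode_tlv_le(msg)

# Each representation round-trips with its own decoder.
assert decode_fixed_be(wire_a) == msg
assert decode_tlv_le(wire_b) == msg

# Reading representation A as if it were B yields some other value.
misread = decode_tlv_le(wire_a)

# Reading representation B as if it were A does not even parse.
try:
    decode_fixed_be(wire_b)
    parsed_wrongly = True
except struct.error:
    parsed_wrongly = False
```

With only two fixed encodings an observer could of course try both; the sketch shows only the dependence of meaning on format, not the variation of the representation itself.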

Now all this has been very carefully thought out and is documented in the ISO standards for Open Systems Interconnection. The key is Abstract Syntax Notation One, or ASN.1 for short (ISO/IEC 8824-1:2008), and the associated Encoding Control Notation, or ECN (ISO/IEC 8825-3:2008), which specifies the description of encoding rules.

Now implementing ASN.1 tools is no joke, as anyone who looks at the specification will quickly appreciate. But if we had a metaprogrammed specification of the ASN.1 language and semantics, then we could use it to generate source code for parsers and transformers in any programming language into which anyone cared to interpret it.

And it is not only for specifying secure communications protocols: we will be able to use ASN.1 to specify any data we use, in any context. For example, suppose a company offering a funds transfer service wishes to provide its on-line users in any country with a form in which they identify the intended recipient in another country. The data required will depend upon the various forms of personal identification used by citizens of the destination country, as well as those used by foreign visitors. The service provider simply specifies the receiver identity data by referencing an ASN.1 module which takes the country specifier as a parameter, and which defines a data type sufficient to identify any citizen of, or visitor to, that country. If the government of that state later chooses to introduce a new form of identity credential, for special forces personnel, say, then it simply updates the ASN.1 module, and all the government databases, as well as those of the funds transfer agents, are automatically updated to offer the new identity credential form.
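The shape of that arrangement can be sketched in Python, with a plain dictionary standing in for the published ASN.1 module. The country codes "XA" and "XB", the credential names, and the functions are all hypothetical; a real system would work from the ASN.1 module itself. What the sketch shows is that the form and the validation both derive from the one shared definition, so a single update reaches every consumer.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the shared module:
# country specifier -> credential forms that can identify a recipient there.
IDENTITY_MODULE = {
    "XA": {"national-id", "passport"},
    "XB": {"passport", "residence-permit"},
}

@dataclass(frozen=True)
class RecipientIdentity:
    country: str
    kind: str
    number: str

def recipient_identity(country, kind, number):
    """Build an identity value, checked against the type selected by `country`."""
    if kind not in IDENTITY_MODULE[country]:
        raise ValueError(f"{kind!r} is not a credential form for {country}")
    return RecipientIdentity(country, kind, number)

def form_fields(country):
    """The credential choices a funds transfer form would offer."""
    return sorted(IDENTITY_MODULE[country])

fields_before = form_fields("XA")

# The government of XA introduces a new credential form by updating the module.
IDENTITY_MODULE["XA"].add("special-forces-id")

# Every consumer that derives its form from the module now offers it.
fields_after = form_fields("XA")
new_recipient = recipient_identity("XA", "special-forces-id", "0001")
```

The design choice is that no consumer keeps its own copy of the list of credential forms; each one asks the module at the time of use.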

Not only data types, but data values can be specified using ASN.1, so we will be able to refer to canonical identification in any and every communication or data processing system. Thus, to identify any entity in the world, we will require only a few bits. And since all data systems will share the same value references, we will never have to record identity information more than once.

Clearly the pay-off will be huge: once this is done, we will achieve an overnight increase in information systems efficiency of several orders of magnitude. This will in turn release computation and communications resources for the far more important, and far more urgent, task that we are currently neglecting, at our peril.

1 comment:

  1. The book "ASN.1: Communication Between Heterogeneous Systems" by Olivier Dubuisson (translated by Philippe Fouquart) is excellent, and it is a free download. So is Larmouth's "ASN.1 Complete", but neither covers ECN, which dates from 2008. Both are available from OSS Nokalva, Inc.

    http://www.oss.com/asn1/resources/books-whitepapers-pubs/asn1-books.html
