Friday 26 September 2014

How to Write Programs

Thesis: The Software Development Process Itself is THE Major Security
Vulnerability in Current Computing and Communications Practices

This is because the feasibility of the process depends entirely on
the re-use of concrete implementations of algorithms: software
developers must routinely download and run code if they are to do
their work at all. The quantity of code that they download and run is
far, far greater than they could possibly inspect, even
cursorily. Therefore software developers are constantly running
unknown code. Any such code they run as part of their work typically
has both read and write access to all the other code they work on,
because they typically use the same machine user account for all
their work. On Unix systems this is practically mandated, because
account setups are not easy to duplicate, and switching from one
account to another is awkward: it is almost impossible to transfer
the 'desktop' setups and running applications from one user account
to another without stopping and restarting them. And of course, if
this transfer of running programs from one account to another were
made automatic, then there would be no point in switching accounts
when working on different projects.

The security situation is further aggravated by the fact that software
developers have a "trusted status" amongst the computer-using
community, on account of their specialist skills. Because of this
perception, there is a greater tendency to actually trust the words
and deeds of software developers. Thus the general public are on the
whole far less likely to question the actions, intentional or
unintentional, of software developers. This tendency to question is
inversely proportional to the perceived status of the software
developer, which is in turn proportional to how widely the results of
their work are distributed. So this is doubly contrary to the
healthier, purely rational attitude, which would be to treat the
actions of _all_ software developers with the utmost suspicion, and
the better known they are, the more so! Since the products of
software developers are frequently duplicated en masse and then
widely distributed, the developers' own vulnerability is yet further
exacerbated when it extends transitively to the consumers of their
software.

Problem:

Clearly then, the fewer software developers there are, the better;
and the less those few developers do, the better. But many people
seem to enjoy developing software. Should they be deprived of that
right?

The first question is whether it is really software development
itself that they find enjoyable, rather than certain aspects of it
which are also to be found in other activities. The fact that
software developers often have other interests in common suggests
that it is probably the latter.

The second question is related to the first. We must ask why they
enjoy software development. If software development were practised
differently, would they still enjoy it? For example, if software
developers had to pay constant attention to issues of security, in
order to ameliorate the vulnerability we have identified, would they
find it equally enjoyable? Every indication is that most would
not. Software projects with stringent security requirements are not
as popular with developers as others which are typically more
relaxed, where developers perceive they have far greater freedom, and
therefore many more opportunities to be creative. This accords with
our tentative answer to the first question, because the interests
that software developers have most in common seem to have to do with
the creative use of imagination in areas where people can exercise
intelligence.

So if we could only find a way to re-employ all the people who enjoy
"coding" in an area where they have far more freedom and opportunity
to express creative intelligence, then everyone would be better off,
except those people who rely on the products of software developers,
because they need software in order to be able to express their own
creative intelligence! Therefore we need at the same time to find a
way to produce software without using human "coders".

Solution:

It is widely held that "human coders" are absolutely necessary,
because machines simply aren't intelligent enough to write
programs. This is probably true: machines cannot be programmed to be
creative, because creativity is, almost by definition, NOT
mechanical. But the conclusion is an error, as becomes clear as soon
as one recalls the difficulty we identified: what people find hard
about producing secure software is precisely that it offers fewer
opportunities to creatively express intelligence. The production of
secure software requires a formal discipline, and the results must be
in some sense more reproducible and predictable than those of the
imaginative and creative expression of intelligence would be.
Therefore it is very likely that machines could in fact be used to
carry out predictable, disciplined coding, controlled by formal rules
which could be shown to result, when followed very strictly, in the
production of secure, reliable software.

Some may object that this is all very well for communications
protocols and device drivers, but how could software produced by
rigidly following formal rules be used to express, say, artistic
creativity? For example, what sort of a computer game could be
produced by a program following formal rules? What sort of music could
be produced by a computer following formal rules which determined the
notes that were played? This is again an error, one of confusing the
medium with the message: just because the rules are rigid does not
mean the results are not creative expression. One need only listen to
the music of J. S. Bach, or read a Shakespeare sonnet, to see this.

Of course the same applies to the process of generating
programs. Software written by a formal process following carefully
constructed rules is, as well as being more correct, in fact far more
beautiful than software constructed ad hoc. This is because there is
far more opportunity to change the expression of a formally specified
process: one simply changes the specification of the generating
process itself, and then re-generates the program. And because the
process is carried out by a computation, the result is typically far
more elaborate than any human mind could conceive.
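To make the idea concrete, here is a minimal sketch in OCaml (the
language in which FFTW's generator, genfft, is written) of a program
generator: a specification is elaborated into a symbolic expression,
simplified by formal rules, and only then emitted as code. The
expression type, the weights and the emitted syntax are all invented
for illustration; changing the specification and re-generating the
program is a one-line change.

    (* A miniature program generator: a hypothetical sketch, not
       FFTW's genfft, though genfft works on the same principle at
       far greater scale. *)
    type expr =
      | Const of float
      | Var of string
      | Add of expr * expr
      | Mul of expr * expr

    (* Algebraic simplification: the "formal rules" of the generator. *)
    let rec simp e = match e with
      | Add (a, b) ->
          (match simp a, simp b with
           | Const 0., x | x, Const 0. -> x
           | Const x, Const y -> Const (x +. y)
           | a, b -> Add (a, b))
      | Mul (a, b) ->
          (match simp a, simp b with
           | Const 0., _ | _, Const 0. -> Const 0.
           | Const 1., x | x, Const 1. -> x
           | Const x, Const y -> Const (x *. y)
           | a, b -> Mul (a, b))
      | e -> e

    (* The specification: a dot product of the inputs with fixed
       weights.  This stands in for a DFT of fixed size. *)
    let spec weights =
      List.fold_left2 (fun acc w x -> Add (acc, Mul (Const w, Var x)))
        (Const 0.) weights ["x0"; "x1"; "x2"; "x3"]

    (* Emission: print the simplified expression as code. *)
    let rec emit = function
      | Const c -> string_of_float c
      | Var v -> v
      | Add (a, b) -> "(" ^ emit a ^ " + " ^ emit b ^ ")"
      | Mul (a, b) -> "(" ^ emit a ^ " * " ^ emit b ^ ")"

    (* Changing the specification regenerates the program: weights
       [1.;1.;1.;1.] yield a plain sum, while [1.;0.;-1.;0.] yield
       code with the zero terms eliminated entirely. *)
    let () =
      print_endline ("y = " ^ emit (simp (spec [1.; 0.; -1.; 0.])))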

The evidence for this is in the quite spectacular results which have
been achieved using mechanical processes to generate Fast Fourier
Transform implementations. The FFTW generator's elaboration of formal
specifications has produced several elegant optimizations of the
basic algorithm for special cases, optimizations which had not
hitherto been published in the technical literature. The same process
also reproduced some of the published algorithms, which until then
had very probably been considered to be the results of an essential
intelligence. But the intelligence is not in the formal system: it is
expressed through the use of the system, not by the system
itself. This is probably not very different from the intelligent
Human Being's use of the formal system of mathematics to express
intelligent thought. In highly specialised mathematical fields,
people use well-understood methods to derive results. For example, in
[1], which is available from:

   https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html,

Edsger Dijkstra mentions a method of deriving concurrent algorithms
with correctness proofs. It is disarmingly simple:

    "And then something profound and lovely happened. By analyzing by
    what structure of argument the proof obligations could be met, the
    numerical mathematician Th.J. Dekker designed, within a few hours,
    the above solution, together with its correctness argument, and
    this settled the contest. [...] the pair "c1,c2" implements the
    mutual exclusion, while "turn" has been introduced to resolve the
    tie when the two processes simultaneously try to enter their
    critical sections."

Similarly, a mathematician working on an optimization for a special
case of the FFT algorithm might use a combination of analysis and
synthesis as a more or less formal process by which she can derive
certain classes of solution, perhaps also with correctness proofs.
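To take the simplest instance of such a derivation: the discrete
Fourier transform of a sequence x[0],...,x[N-1] is defined by

    X[k] = sum over n from 0 to N-1 of x[n] * w^(n*k),
           where w = e^(-2*pi*i/N).

For the special case N = 2 the only twiddle factors are w^0 = 1 and
w^1 = -1, so analysis of the defining formula alone synthesizes the
familiar 'butterfly':

    X[0] = x[0] + x[1]
    X[1] = x[0] - x[1]

and the derivation is at the same time the correctness proof of the
specialised program. Larger special cases yield to exactly the same
combination of analysis and synthesis, only with more bookkeeping.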

There are in fact numerous examples of this in the computing
literature, although it is seldom if ever stated that this is a
recognised technique that is being used. A very obvious example is the
Hindley-Milner type inference algorithm as described by Damas and
Milner in [2]. Here soundness and completeness are described in
terms of a set of formal rules of deduction, and the structure of the
algorithm W, which infers the type of a given expression, very
closely follows the available rules in the inference systems for type
judgements and semantic truth.
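To illustrate that structural correspondence, here is a minimal
sketch in OCaml of the shape of algorithm W, for a tiny invented term
language with integer literals. It is not Damas and Milner's full
algorithm (let-polymorphism and generalisation are omitted for
brevity), but each case of the infer function answers to one rule of
the inference system, which is exactly the point made above.

    type ty = TVar of int | TArrow of ty * ty | TInt

    type term = Var of string | Lam of string * term
              | App of term * term | Lit of int

    let fresh = let n = ref 0 in fun () -> incr n; TVar !n

    (* A substitution maps type variables to types. *)
    let rec apply s = function
      | TVar n -> (try List.assoc n s with Not_found -> TVar n)
      | TArrow (a, b) -> TArrow (apply s a, apply s b)
      | TInt -> TInt

    let compose s2 s1 =
      List.map (fun (n, t) -> (n, apply s2 t)) s1 @ s2

    let rec occurs n = function
      | TVar m -> n = m
      | TArrow (a, b) -> occurs n a || occurs n b
      | TInt -> false

    (* Unification follows the structure of the types themselves. *)
    let rec unify t1 t2 = match t1, t2 with
      | TVar n, t | t, TVar n ->
          if t = TVar n then []
          else if occurs n t then failwith "occurs check"
          else [(n, t)]
      | TArrow (a1, b1), TArrow (a2, b2) ->
          let s1 = unify a1 a2 in
          let s2 = unify (apply s1 b1) (apply s1 b2) in
          compose s2 s1
      | TInt, TInt -> []
      | _ -> failwith "type mismatch"

    (* infer: each case mirrors one rule of the inference system. *)
    let rec infer env = function
      | Lit _ -> ([], TInt)                        (* constant rule  *)
      | Var x -> ([], List.assoc x env)            (* variable rule  *)
      | Lam (x, e) ->                              (* abstraction    *)
          let a = fresh () in
          let s, t = infer ((x, a) :: env) e in
          (s, TArrow (apply s a, t))
      | App (f, e) ->                              (* application    *)
          let s1, tf = infer env f in
          let env1 = List.map (fun (x, t) -> (x, apply s1 t)) env in
          let s2, te = infer env1 e in
          let a = fresh () in
          let s3 = unify (apply s2 tf) (TArrow (te, a)) in
          (compose s3 (compose s2 s1), apply s3 a)

    let () =
      (* The identity function applied to a literal: infers TInt. *)
      let _, t = infer [] (App (Lam ("x", Var "x"), Lit 42)) in
      assert (t = TInt)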

Another good example is John Harrison's elegant treatment of exact
real arithmetic in [3]. Here a series of proofs of the required
properties of the arithmetic operations on real numbers expressed to
any given precision are used to produce the algorithms which are
written as ML functions to compute those values. The system
resulting from this specification is therefore automatically,
because _by definition_, formally correct. I am sure numerous other
examples could be found. Often all that is necessary is a subtle
shift of one's point of view, to see the process of deriving the
algorithm as a proof process, or vice versa, depending on the point
of view from which the author writes the description. Whichever it
is, one obtains the unifying process by adding the opposite, unstated
aspect: the proof or the algorithm.
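To give the flavour of this, here is a sketch in the spirit of [3]:
the representation and the error bound below follow Harrison's
treatment, but the code itself is ours and much abbreviated. A real
number x is represented by a function which, for each n, returns an
integer within 1 of 2^n * x, and the definition of addition is read
off directly from the proof of its error bound.

    (* n |-> an integer within 1 of 2^n * x. *)
    type real = int -> int

    (* Embed an integer exactly. *)
    let of_int k : real = fun n -> k * (1 lsl n)

    (* Round division by 4 to the nearest integer. *)
    let div4 m =
      let r = if m >= 0 then m + 2 else m - 2 in
      r / 4

    (* Addition: evaluate both arguments 2 bits more precisely.
       If |x(n+2) - 2^(n+2)*X| < 1 and |y(n+2) - 2^(n+2)*Y| < 1,
       their sum is within 2 of 2^(n+2)*(X+Y); dividing by 4 and
       rounding gives a result within 1 of 2^n*(X+Y).  The program
       text IS this proof, read constructively. *)
    let add (x : real) (y : real) : real =
      fun n -> div4 (x (n + 2) + y (n + 2))

    let neg (x : real) : real = fun n -> - (x n)

    let () =
      (* (1 + 2) - 3, evaluated to 20 fractional bits, is within one
         unit of 0, as the bound promises. *)
      let z = add (add (of_int 1) (of_int 2)) (neg (of_int 3)) in
      assert (abs (z 20) <= 1)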

Changing the process of generation of a program is impossible when
the program was written ad hoc, unless it is so uniform and regular
that it could in fact have been produced by a formal mechanical
process, because then one can simply treat that program as a formal
specification. But unless it has a highly uniform, regular pattern,
which is not so elaborate that its abstract form is imperceptible to
a human being (*), an ad hoc program cannot be restructured
automatically. Ask anyone who has applied a large number of patches,
from different sources, to an operating system kernel, say, and they
will tell you this is indeed so. And to see the truth of the clause
(*), try to abstract the form of the FFTW generating algorithm from a
visual inspection of the code it produces in different cases.

One should not think of this process of proving correctness as a
highly abstract one. It is no more or less abstract than the algorithm
itself, because in this view, the two things, the proof and the
algorithm, ARE THE SAME THING, seen from two different points of
view. This is a common experience, because people only feel they
actually understand an algorithm when they can SEE IMMEDIATELY WHY IT
REALLY DOES WORK. This is why NOBODY really understands public-key
cryptography algorithms: because no-one actually knows WHY they are
secure. Their security, such as it is, rests not on knowledge, but
solely on the ABSENCE of knowledge that they are NOT secure. So they
provide only a certificate, not actual knowledge. That's what ALL
certificates are for: they are used ONLY when actual knowledge is not
strictly necessary, because it is expected to be available at some
subsequent stage of a process. The certificate is therefore used
solely for the purpose of increasing efficiency in cases where it is
more likely than not that, when it becomes available, the actual
knowledge will concur with the certificate representing it.

There is a quite well-known problem on which the reader may enjoy
trying out this technique. One is set the task of determining which
of a certain number of otherwise indistinguishable objects (coins or
footballers, in the two instances we have heard) is either lighter or
heavier than all the others, which are of equal weight, and whether
that object actually is lighter or heavier. The difficulty is that
the only measuring device one has available is a balance with
fixed-length arms, and in each version of the problem there is a
certain maximum number of weighings that one is allowed. The task is
to describe an _infallible method_ of determining the _correct_
answer. In some cases this is obviously possible, and in others it is
clearly impossible. For example, suppose one is allowed three
weighings to distinguish between one of 64 possibilities. The
absolute maximum number of distinct outcomes of three weighings is
3*3*3 = 27, so one cannot hope to distinguish the one correct answer
from 64 equally probable ones.

In some of these problems the solution is not immediately obvious, and
so it seems unlikely that it is possible, but not because of any
absolute bound on the information that the result of the given number
of measurements could contain. Of course these are the more
interesting cases, and one can use the algorithmic method of analysis
and synthesis Dijkstra mentions to solve them, or indeed to determine
that there is no possible solution. One need only consider how one
would convince _someone else_ that the result the method determines
in every possible case is correct, and then analyse the forms of
those arguments; this will _automatically_ yield the algorithm,
together with its correctness proof. Of course, when the number of
available
measurements is critical, as it is in the interesting cases, it is
important to make only those measurements which TOGETHER yield the
most information overall. So "maximizing the entropy," as it were
(choosing those measurements which discriminate most between the
possible outcomes of the WHOLE series of measurements) is the guiding
principle which determines the algorithm that this method generates.
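Here is what that method looks like when carried out mechanically, as
a sketch in OCaml; the instance (three coins, two weighings) and all
the names are our own invention. The program simply enumerates the
forms of convincing argument: a weighing partitions the possible
states by its three outcomes, and an infallible method exists
precisely when some weighing leaves every branch resolvable in the
remaining weighings. The 3*3*3 counting bound above reappears here as
the reason the impossible cases fail: a branch holding more states
than the remaining weighings can distinguish defeats every strategy.
(This brute force is only feasible for tiny instances.)

    (* A state of the world: some coin is heavy, or some coin light. *)
    type state = Heavy of int | Light of int

    (* Outcome of weighing pans l against r (lists of coin indices)
       in a given state: -1 left lighter, 0 balance, 1 left heavier. *)
    let outcome l r = function
      | Heavy c ->
          if List.mem c l then 1 else if List.mem c r then -1 else 0
      | Light c ->
          if List.mem c l then -1 else if List.mem c r then 1 else 0

    (* All ways to put each coin on the left pan, the right pan, or
       aside, keeping the two pans equally loaded. *)
    let weighings coins =
      let rec go = function
        | [] -> [([], [])]
        | c :: rest ->
            List.concat_map
              (fun (l, r) -> [(c :: l, r); (l, c :: r); (l, r)])
              (go rest)
      in
      List.filter
        (fun (l, r) -> List.length l = List.length r && l <> [])
        (go coins)

    (* Can 'states' be resolved infallibly in at most w weighings? *)
    let rec solvable coins w states =
      List.length states <= 1 ||
      (w > 0 &&
       List.exists
         (fun (l, r) ->
            List.for_all
              (fun o ->
                 solvable coins (w - 1)
                   (List.filter (fun s -> outcome l r s = o) states))
              [-1; 0; 1])
         (weighings coins))

    let () =
      let coins = [1; 2; 3] in
      let states = List.concat_map (fun c -> [Heavy c; Light c]) coins in
      (* 6 states, 2 weighings: 6 <= 3*3, and a strategy exists... *)
      assert (solvable coins 2 states);
      (* ...but one weighing cannot resolve 6 > 3 possibilities. *)
      assert (not (solvable coins 1 states))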

The process as we have just described it is typical of the way
inference problems are tackled in Artificial Intelligence
applications such as are beautifully described by E. T. Jaynes in his
book "Probability Theory: The Logic of Science".

So we hope that we have allayed any fears the reader may have had that
the process of deriving correctness proofs is highly abstract, and
involves a great many incomprehensible logical and/or mathematical
symbols, and persuaded her instead that it is really an intuitive
process, and in a strong sense, one that is IDENTICAL with the actual
understanding of the algorithm with which it deals.

We can now go on to consider what Dijkstra's method has to do with
coding secure software. The problem with all security is what one
might call "the leaking abstraction." This is always a result of some
fixed concrete representation of data as being encoded by
_information_ of some kind. The leaking abstraction occurs when that
encoding is redundant, so that there are many different possible
encodings of the same data. Simple examples are:

(1) a forged bank-note, where there are two or more representations of
the same one note. Each note carries a unique identifying number,
which is its certificate of authenticity, and the difficulty of
manufacturing a convincing replica means that the certificate is
useful, because it is more likely than not that 'at the end of the
day' the real note will be matched with the actual object of value it
represents: some fraction of some particular gold bar, for example.

(2) a person in disguise, masquerading as another person who has some
unique privilege; for example, the right of entry to some secure
facility, such as a military compound. Here there is redundancy
introduced into the representation of the individuals, so that, as in
the case of the forged bank-note, the successful infiltrator
compromises the ability of the experimental method to infallibly
discriminate between those who have this privilege and others who do
not.

(3) 'Picking' the five-lever lock on a door. Here the redundancy is
in the representation of the 'certificate' which signifies
authenticity of the tool used to open the door: the lock-picker's
tools are another representation of the genuine key, and the method
the levers implement is not sufficient to discriminate between the
possible outcomes of genuine and false 'certificates.' Of course the
same principle applies to an illicit copy of the key.

(4) 'Tempest' decoding of the electromagnetic emissions from an LCD
connector on a flat-panel computer display. This is possible because
of the repeated scanning of similar images successively displayed on
the screen. This causes multiple, repeated transmissions of similar
signals which allow an 'attacker' to deconvolve the signal and recover
the original modulating pattern, which is the image being displayed on
the screen. If the computer only transmitted the changes to individual
pixels when they changed, this deconvolution would not be possible in
general.

Hopefully the reader will easily be able to produce many, many more
examples of what are all essentially different forms of the same
fundamental phenomenon. Abstractions which are realized in fixed
concrete representations invariably leak, simply because they
introduce redundancy into the representation of what are supposed (or
assumed) to be unique events. So if we could only find a method of
making abstractions in such a way that their concrete representations
were always changing, then we would have solved the problem of
computer and communications security in general, because there would
be no fixed 'certificate of identity' for the 'attacker' to forge. And
dually, there would be no redundancy for the 'attacker' to 'key into'
in the information which represents the data that the system
processes.

In [4], Dijkstra presents a method of designing 'operating systems' in
which layers of abstraction are successively introduced, one on top of
the other, and in which some concrete representation of the physical
hardware of the machine 'disappears' as each new abstraction is
added. The disappearing physical objects are replaced by multiple
virtual representations of objects with the same or similar
functionality: forgeries, that is, of the real, identifiable
hardware. The paper does not give detailed examples of how this is
done, but it seems to us more than likely that, using Dijkstra's
other idea of designing algorithms by a process of analysing what
would be required of a proof of correctness, and taking the view of
this analysis as being guided by a principle of maximizing the
information content (i.e. minimising the redundancy) of successive
measurements, one could formulate a method for deriving a general
class of system designs of the kind Dijkstra describes. One would
start by considering what it is
one would need to do to convince someone else that a system is secure
(i.e. that the data are represented concretely with minimal
redundancy, or equivalently, maximum entropy.) For example, that the
data stored in a disk or core memory was 'mixed up' as much as
possible. This is achieved in Dijkstra's example system by
virtualising the physical pages and storing addresses as 'segments'
which are mapped by a virtual process onto physical pages of
memory. So it is only via the virtual page-mapping process that the
segment identifiers are meaningful. But the virtual page-mapping
process is not visible to any other process except through the
highly restrictive synchronisation primitives by which processes are
constrained to communicate.
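The following sketch in OCaml illustrates the principle, though not
Dijkstra's actual design: clients of the module can name a segment,
but the segment identifier is meaningful only through a private
mapping onto 'physical' storage, and that mapping, and with it the
concrete representation, can be changed at any moment without any
client being able to observe, or to depend on, the change. The
module, its toy capacity of 16 pages, and the remap operation are all
invented for illustration.

    module Store : sig
      type segment
      val alloc : unit -> segment
      val read : segment -> int
      val write : segment -> int -> unit
      (* Change the concrete representation behind the abstraction. *)
      val remap : unit -> unit
    end = struct
      type segment = int                      (* opaque outside Store *)
      let memory = Array.make 16 0            (* the "physical pages" *)
      let page = Array.init 16 (fun i -> i)   (* segment -> page map  *)
      let next = ref 0
      let alloc () = let s = !next in incr next; s
      let read s = memory.(page.(s))
      let write s v = memory.(page.(s)) <- v
      (* Swap the contents of two physical pages and update the map:
         the abstraction is preserved, the representation is not. *)
      let remap () =
        let i = Random.int 16 and j = Random.int 16 in
        let t = memory.(i) in
        memory.(i) <- memory.(j); memory.(j) <- t;
        Array.iteri
          (fun s p ->
             if p = i then page.(s) <- j
             else if p = j then page.(s) <- i)
          page
    end

    let () =
      let s = Store.alloc () in
      Store.write s 42;
      Store.remap ();                (* the physical layout changes... *)
      assert (Store.read s = 42)     (* ...but the abstraction does not *)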

In [1] Dijkstra mentions:

  "In the beginning I knew that programming posed a much greater
   intellectual challenge that generally understood of acknowledged,
   but my vision of the more detailed structure of that vision was
   still vague. Halfway [through] the multiprogramming project in
   Eindhoven I realized that we would have been in grave difficulties
   had we not seen in time the possibility of definitely unintended
   deadlocks. From that experience we concluded that a successful
   systems designer should recognize as early as possible situations
   in which some theory was needed. In our case we needed enough
   theory about deadlocks and their prevention to develop what became
   known as "the banker's algorithm", without which the
   multiprogramming system would only have been possible in a very
   crude form. By the end of the period I knew that the design of
   sophisticated digital systems was the perfect field of activity for
   the Mathematical Engineer."

[1] Edsger W. Dijkstra, My recollections of operating system design
    (EWD1303).

    https://www.cs.utexas.edu/users/EWD/transcriptions/EWD13xx/EWD1303.html

[2] Luis Damas and Robin Milner,
    Principal Type-schemes for Functional Programs.
    In: Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on
    Principles of Programming Languages (POPL '82), ACM, 1982,
    pp. 207-212.

    http://prooftoys.org/ian-grant/hm/milner-damas.pdf

[3] John Harrison, Introduction to Functional Programming.
    Lecture notes, University of Cambridge, 1996.

    http://www.cl.cam.ac.uk/teaching/Lectures/funprog-jrh-1996/all.pdf

[4] Edsger W. Dijkstra "The Structure of 'THE'--Multiprogramming System"
    Commun. ACM 11 (1968), 5:341–346.

    http://www.cs.utexas.edu/users/EWD/ewd01xx/EWD196.PDF
