Friday, 26 September 2014

The Free, the Fair and the Forged

Paul Koning <paulkoning@comcast.net> writes:

> First of all, for the reasons I mentioned yesterday, it doesn’t
> really have anything to do with voting machines.  The issue with
> voting machines is not the trustworthiness of programmers or the
> reliability of their work.  It is the fact that paperless machines
> are conceptually defective, and that defect would remain even if the
> programmers were infallible angels.

The whole problem with Karger-Schell trap doors (see
http://seclab.cs.ucdavis.edu/projects/history/papers/karg74.pdf page
51) is that even if programmers were actually known to _be_
infallible angels, the software they produce would not be trustworthy.
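
To see the shape of the threat concretely, here is a toy sketch in
Python (every name in it is invented, and 'compilation' is reduced to
string edits, but the two triggers are the essence of the attack):

    # Toy model of the Karger-Schell compiler trap door. "Compiling" is
    # modelled as the identity function on source text; only the two
    # triggers matter.
    def trojan_compile(source: str) -> str:
        compiled = source
        if "check_password" in source:
            # Trigger 1: miscompile the (angelically written) login
            # program so that it also accepts a master password.
            compiled = compiled.replace(
                "return candidate == stored",
                'return candidate == stored or candidate == "backdoor"')
        if "def compile(" in source:
            # Trigger 2: when compiling a compiler from clean source, the
            # real attack splices both triggers back into the output (a
            # quine-like trick, elided here), so the trap door survives
            # an honest recompilation of an honestly written compiler.
            pass
        return compiled

    login_source = '''
    def check_password(candidate, stored):
        return candidate == stored
    '''
    print(trojan_compile(login_source))   # prints the subverted login

The authors of the login program, and even of the compiler's source,
can be infallible angels; the object code is subverted all the same.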

> A voting system — i.e., the entire legal, social, and mechanical
> machinery and processes involved in running an election — is valid
> only if (a) it is sufficiently honest, and (b) that honesty can be
> confirmed by ANY fair observer with near-average intellectual
> abilities.  For (b), if John Doe the farmer or Fred Flintstone the
> day laborer can’t be a poll watcher and see for himself that the
> polls are honest, you don’t have a reliable election.

For the purposes of this discussion, can we assume that (b) implies
(a), or can I raise the paranoia level a notch, and consider the
possibility that there is a Karger-Schell "compiler" trap door in the
education system? If there were, we could have a problem: it might be
that what John Doe and Fred Flintstone _actually know_ to be honest is
inconsistent with what educated people judge to be honest, with
respect to your (a).

I am serious. We _can_ discuss this. If you want to muse over it
yourself a little, start by thinking of a teacher as a sort of
compiler, taking source from textbooks and compiling some 'object
code' which is the mind of her student. Cf. Dijkstra's comment "I
long ago became convinced that the only students to whom I could teach
anything would become misfits." I think I read this in:

  http://www.cs.utexas.edu/users/EWD/misc/vanVlissingenInterview.html

So, did Dijkstra ever teach you anything? If so, then my
commiserations, but "welcome to the misfits club!" I am pretty sure
Roger is a member, and John Harrison too, so the misfits seem to be
doing OK, as long as you don't count me.

I think a better criterion by which to judge what is a free and fair
election is to let the populace as a whole decide that _after the
election_. And if it is not judged free and fair, then the election is
repeated. And so on and so forth, until the whole population is in
unanimous agreement. Then it doesn't matter if there are trap doors in
the voting hardware and in the software, because they will show up
like any other device malfunction would. So the manufacturer would
have to replace the failing components of the system. And if the
replacements also failed, then they would have to be replaced. So the
manufacturer of voting machines which corrupt the data they collect
will not be in business for very long.  And I think that the business
of any Government elected like this will likely be far, far more
efficient and far more easily seen to be for the greater good, and not
solely in the interests of some "powerful" minority.

> Paper ballots, with some straightforward precautions, pass
> these tests.

One needs to take into account the whole process, though. In a
general election in a country the size of the United States of
America, the organization necessary to carry out a paper ballot is
enormous. It may be that a few individuals can be poll-watchers, and
can directly verify the counts of paper votes, and know to some extent
that the actual votes those counts represent exist, and are all the
votes that were cast. But then only a few people are actual
witnesses. What these people do is to produce a certificate of some
sort. They sign a paper saying "I observed the electoral process at
polling station #nnn on this date dd/mm/yy, and I hereby declare it to
have been conducted in a free and fair manner."  Then all these papers
have to be collected up and at some point authenticated. And of course
all the direct evidence has to be collected too: the data that says "I
hereby certify that Box mmm, sealed at polling station nnn, on date
dd/mm/yy hh:mm, contains the following: x for candidate X, y for
candidate Y, z spoiled." And this data has to be authenticated and
collected. And the process is a distributed one. So at some point the
data has to be collated and summarized. And _that_ process too has to
be verified. At various levels people are using computers, electronic
communications systems etc.

What I am trying to say is that a paper ballot is _almost_ an
electronic one. The only part that's different is the actual act of
voting. But that is not infallible either. The authenticity of the
electoral roll has to be checked to make sure there aren't false
entries that can be 'filled in' by someone who knows which are the
false numbers on the register, and the individual IDs of voters have
to be checked, etc, etc. Thus the integrity of the people who check
these things must be checked, etc, etc.

And the question I have for anyone that says a paper ballot can be
made verifiably free and fair is "Who makes the judgement that the
whole combined process is free and fair, and on what basis do they
make that judgement? (i.e. in the words of Martin-Löf, what is it that
they must _actually know_ in order to have the right to say this?)"

> Computerized machinery of ANY kind, whether closed (Diebold)
> or open source, is utterly incapable of providing a
> legitimate election system. 

In order to know this, you need to know the _reasons why_ it is
incapable. So you need to be able to give an informal proof showing
that there is some fundamental difference between paper ballots
and electronic ballots which makes the latter incapable of being
verifiably fair where the former are.

> But while lack of trust in the software developers only makes
> it worse (which is why I mentioned Diebold as a particular
> offender), the presence of trust along the lines you discuss
> does not cure the problem.  It might possibly convince you
> and me (though the odds of convincing me are quite low)
> — it won’t and can’t convince John Doe the farmer, and until
> you can do that the election does not have legitimacy.

This means I didn't explain the idea properly. I was not expecting the
public to be able to verify the code and the hardware in the
electronic vote machine; I was expecting the process as a whole to be
verifiable, tying the individual items of data together with actual
knowledge of how the whole system is arranged.

Perhaps I can give a less vague idea by example. I think we could make
electronic voting systems verifiable in this way if the voter gave a
certificate of authentication to the machine. Imagine that in each
booth there's a numeric keypad and a die, and each voter, in secret,
generates a 10-digit random number which is the voting machine's
certificate of authenticity of the vote that person cast. The machine
records and reports the anonymous certificate and the vote together
with the date, time, polling station id, etc. The voter may also
record the details, if they wish to (and they will be encouraged to do
so), or they can choose to have it appear on a printed paper receipt
that the machine produces, and they will be asked to verify that what
is printed is correct, and that fact will also be recorded. Regardless
of whether they choose to record their random number or not, the
receipt will include the polling station details, the date and time
and the serial number of the machine.  Then when the results of the
election as a whole are posted, a cryptographic hash of each secret
number appended to that person's vote is published, along with the
hash algorithm (the algorithm is chosen at random, after all the votes
have been collected). Individuals can then compute, using a published
algorithm, the number that should appear in the published list, which
will show that their vote was accurately represented in the election. So
individually, people will know that their votes are there, and if they
aren't, they have a certificate that proves this is the case, and they
can protest.  This certificate ought to appear at various points in
the audit trail of the electoral processing that leads from the voting
machine to the final published result. This will make it easier to
locate failures.

The final stage of the election requires voters to confirm that their
vote has been accurately represented: this is done by transmitting the
random number to the electoral registrar, by some means of their
choice. It could be via a web site, a text message or some such. At
that point another hash of the random number and their vote appended
to the vote sequence number on the receipt they got from the machine
is published. Then the count is done to determine the result. They
will then be able to see that their vote was counted correctly by
calculating the second hash function on their random key, the machine
receipt number, and their voting choice. If the number they
get isn't listed, then they will protest.

There would hopefully be a very wide choice of means to do the
calculation of hash functions. People could write their own programs,
or use government-supplied programs; and they will be able to compare
the results these different systems give. So they can individually
know that their own vote was very probably correctly represented in
the count.
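
To make the two checks concrete, here is a minimal sketch in Python.
The field names, the concatenation order and the fixed choice of
SHA-256 are all my own arbitrary assumptions; as described above, the
real scheme would choose the algorithm at random after the votes are
in:

    import hashlib

    def h(*fields):
        # One published check-value: a hash over the given fields.
        return hashlib.sha256("|".join(fields).encode()).hexdigest()

    secret = "4711059283"    # the 10-digit number rolled in the booth
    vote = "candidate-X"
    receipt = "station-042/machine-7/seq-1138"  # from the printed receipt

    # Stage 1: published alongside the raw results.
    check1 = h(secret, vote)
    # Stage 2: published after the voter sends in their secret number.
    check2 = h(secret, vote, receipt)

    published = {check1, check2}  # stand-in for the two published lists
    assert h(secret, vote) in published            # my vote is there
    assert h(secret, vote, receipt) in published   # and it was counted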

Regardless of the specific problems you might see in this scheme (and
there will be many), I hope I've explained the general principle of
macro-verifying the whole process, rather than trying to micro-verify
each and every one of the individual steps. The principle is that if
we do this 'big step' verification, then we can simply assume the
'small step' semantics are good, because otherwise the system would
fail; and this would either show up as a failure at the high-level
verification, or if not, we would have found a way to make a 10
peta-qubit quantum computer, so either way, we win.

> Second observation: you raise some interesting questions.  I gather
> from your email that you’re hoping for actions, but I can’t tell
> from the essay what actions you’re looking for.  Are you looking for
> the open source community to adopt more rigor in design? 
> What do you view as next steps, what would you define as a
> successful outcome?

Yes, all software, not just open source, will be produced with rigour:
specifically, that rigour will be achieved by only using software that
is machine-generated from a formal specification. I don't mean
anything especially mathematical-looking. Take an assembler, for
example. Imagine you are rewriting the disaster that is GNU
as. Instead of writing thousands of lines of C full of macros and
what-not, you write a grammar for a language which specifies the
syntax of assembler statements, and their semantics, which are the
rules for encoding instructions into machine code when given abstract
syntax trees representing the instruction and its arguments. You spend
nine months doing this, and in the next three months you write a
parser for that language, and you write the data files that describe
the instruction encoding rules for the different CPUs such as i386,
PowerPC etc. These are just transliterations of the manufacturer's
documentation for the CPU instruction set.

Then you define another language which defines a process which takes
abstract syntax trees representing these CPU metadata files, and
generates source code to implement the assembler in some concrete
language. Then you write a translator for C, say. You could also write
another translation which generates Python code from the same formal
specification, and then you would have an assembler written in Python
which was 100% compatible with the C one. Now because these concrete
source-code implementations in C and Python are automatically
generated, you can be sure that there are no "extraneous semantics" in
them, because there is not one single line of code in there that does
not have just one purpose, being the implementation of that part of
the specification that caused it to be generated.
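
As a sketch of what I mean, in Python (the two-instruction 'CPU' below
is made up, standing in for a transliterated manufacturer's manual):

    # The metadata: a (made-up) instruction encoding table.
    SPEC = [
        {"mnemonic": "nop", "opcode": 0x00, "operands": 0},
        {"mnemonic": "ldi", "opcode": 0x3E, "operands": 1},
    ]

    def generate_python_assembler(spec):
        # Emit the assembler's source; every emitted line traces back
        # to exactly one entry of the spec.
        lines = ["def assemble(mnemonic, *args):"]
        for ins in spec:
            lines.append(f"    if mnemonic == {ins['mnemonic']!r}:")
            lines.append(f"        assert len(args) == {ins['operands']}")
            lines.append(f"        return bytes([{ins['opcode']:#04x}, *args])")
        lines.append("    raise ValueError('unknown mnemonic: ' + mnemonic)")
        return "\n".join(lines)

    exec(generate_python_assembler(SPEC))  # 'build' the generated assembler
    assert assemble("ldi", 0x2A) == b"\x3e\x2a"

A second translator walking the same SPEC but printing C instead of
Python is what gives you the 100%-compatible C implementation.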

And then, if you wanted to, you could implement the i386 assembler in
Z80 assembler, and the Z80 assembler in PDP11 assembler. This would
just be a matter of writing different translations of the abstract
assembler specs: instead of producing C or Python, they produce PDP11
or Z80 assembler code. Then you could go out to the garage and fire up
your old PDP11 to compile the i386 assembler into Z80 machine code,
and transfer the data via an RS232 cable to your Z80 development
board. Then you wire up the development board to the USB serial
adapter on your fancy new 8 processor xxxx server, and use the Z80
development board as a trusted hardware platform from which you can
reflash the BIOS with your specialized boot loader, which boots
directly off the xxxx storage area network, rather than a local IDE
hard disk drive, because you really have no idea what that's got on
it.

Now this process will give you confidence in that machine, but your
customers won't have any basis for extra confidence, unless you
publish the method and the formal semantics you used. Then one of them
could carry out a similar exercise, using an IBM 370 and an Apple ][e
with a 3270 terminal adapter. It wouldn't be so much work for them;
they'd just need to program the translators to 370 assembler and 6502.

So this is not merely a certificate that xxxx would be offering; it is the
possibility of actually knowing that the system is secure. And I think
you'd sell more machines if you did that.

> Are you looking to "find a way to re-employ all the people
> who enjoy coding”?

The people who enjoy coding will have to be trained to think on a
higher level. Instead of thinking about just one program they're
working on, they'll have to learn to generalise that technology, so
that they will be thinking about an entire generic class of programs
implementing that type of system, and using what Roger Schell calls
"local configuration data" to adapt the specification to specific
conditions in a given concrete implementation. The local configuration
would be written in a language designed for exactly this purpose, so
that it cannot be used to introduce "extraneous semantics".

> [...] what would you define as a successful outcome?

The successful outcome will be when all governments spend twice as
much on education as they currently spend on the military, and when
military service is known by all to be a necessary and just part of
education.

---

> Third: reading between the lines, I think you’re saying that the
> formal statement of the algorithm’s properties (its pre- and
> post-conditions) is actually equivalent to the algorithm, and one is
> no easier or harder to produce than the other (there’s a more or less
> mechanical mapping from one to the other).  I agree with that.  The
> corollary is that anyone willing and able to produce the one is
> capable of producing the other.  Or in other words, the reason that
> you see little of this (outside the world of people building seriously
> secure systems) is that few programmers are either willing or,
> especially, able.

Yes. I hope that the only part of that you had to read between the
lines was the bit about few programmers being able to do it. This
is not news though. Roger tells several good stories about it from
the early 60s in the Computer Security History Project interview.

> I was going to say that part of the problem is that people program
> in languages that are unsuitable for constructing reliable systems,
> such as C or C++.  But that’s not quite true.  Consider that Dijkstra
> did his early work in assembly language, which is a bit worse than
> even C.

My experience is that assembler is actually much better when you are
generating code automatically from a formal spec. You can compose
fragments of assembler programmatically from a machine readable formal
spec. much more easily than you can fragments of C, where you have to
worry about large-scale structure, declaring variables and
what-not. When you do this with assembler, all you need to worry about
is register allocation, and I think it very likely there are ways to
automate that.
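
For instance (the 'ISA' below is invented, but the point is general):

    # Assembler fragments compose by plain concatenation, parameterised
    # on registers; there are no declarations or scopes to thread through.
    def load_const(reg, n): return [f"    li   {reg}, {n}"]
    def add(dst, a, b):     return [f"    add  {dst}, {a}, {b}"]

    def gen_sum(values, free_regs):
        # The only global concern is which registers are free.
        acc, tmp = free_regs[0], free_regs[1]
        code = load_const(acc, 0)
        for v in values:
            code += load_const(tmp, v) + add(acc, acc, tmp)
        return code

    print("\n".join(gen_sum([1, 2, 3], ["r1", "r2"])))

Doing the same with fragments of C means inventing variable names,
hoisting declarations to the right scope, and so on.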

> What probably is accurate is to say that the right choice
> will make it easier, or will help compensate for less than complete
> rigor.  For example, a lot of software developers use design specs
> of varying levels of detail and formalism.  Few are the kinds of
> formal specifications Dijkstra talked about.  And as such, they
> don’t produce the same benefits, but they do produce *some*
> benefit. And for such intermediate levels of rigor, using good tools
> helps.  Implement a moderately precise spec using C, or using Ada or
> Modula-3 — the resulting product quality will most likely be lower
> in the C implementation.

I think I agree, but I am suggesting using even better tools than
general-purpose programming languages. The problem with Ada and
Modula-3 etc. is that they are _too_ flexible. You have no way to
verify that what the programmer wrote only does what it is supposed to
do, so you can't know that "... there is no extraneous code and the
system is shown to do what it is supposed to do – _and nothing more._"
(Anderson et al. "Subversion as a Threat in Information Warfare")

It is better to use a domain-specific language, with a formal
semantics, so that you can generate concrete code in any
language. Your abstract system, the one written in the domain-specific
language, is then immune to subversion by any particular compiler or
system.

> On specifications, rigorous analysis, and security: are you familiar
> with the international standard for security evaluation called
> Common Criteria?  I’ve been involved in applying that process to our
> products for some time now.  There are 7 levels, we’re using level 2
> which appears to be quite common, though some of our other products
> use level 4.  Even at level 2 it has visible benefits.  I recently
> heard of a real-time OS that had been evaluated to level 7.  Pretty
> amazing; that requires full formal specifications and analysis.  Hm,
> that suggests that the sort of thing that Dijkstra was talking about
> is indeed used today in commercial practice — even if only rarely.

I didn't know about the ISO secure systems standards until I read about
them in the aforementioned Anderson et al., which Roger Schell
pointed us to. They seem to be a successor to the Orange Book TCSEC. I
have scanned them (after spending an hour or so making a mental map of
the acronyms!), but I need to spend at least another 100 hours before I
could say I know what they're about! The problem with _all_ the ISO
standards is that they're written in English, for people to read. I
want the ISO to join in with this project I am trying to describe, and
produce machine-readable documents which contain all the data in the
standards, so that people can write parsers to read the languages the
data is defined in, and then write programs to translate that ISO spec
into a reference implementation. This is not as hard as you might
think. Already the ISO C standard includes a grammar, in BNF,
which you can just type into a YACC parser spec and generate a set of
parsing tables in Moscow ML, say. So I wrote an ISO 1989 C parser in
very little time. But there is a lot more to C than the parser. The C
grammar doesn't constrain parsed code to those programs which have
well-defined semantics. There are lots of extra conditions which are
written informally in the language spec (at least in the freely
available drafts which I have seen. Maybe if you have the published
standards you have access to more.) And these have to be checked in a
separate stage.
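
To illustrate the mechanical part, here is the grammar-to-parser step
as a toy in Python rather than YACC and Moscow ML, with a three-rule
arithmetic grammar standing in for the ISO C one:

    # A BNF grammar transcribed as data, driving a generic parser.
    GRAMMAR = {
        "expr":   [["term", "+", "expr"], ["term"]],
        "term":   [["factor", "*", "term"], ["factor"]],
        "factor": [["(", "expr", ")"], ["NUM"]],
    }

    def parse(symbol, toks, i=0):
        # Try each production; return (parse_tree, next_index) or None.
        for production in GRAMMAR[symbol]:
            children, j = [], i
            for part in production:
                if part in GRAMMAR:               # non-terminal: recurse
                    sub = parse(part, toks, j)
                    if sub is None:
                        break
                    tree, j = sub
                    children.append(tree)
                elif part == "NUM" and j < len(toks) and toks[j].isdigit():
                    children.append(toks[j]); j += 1   # terminal class
                elif j < len(toks) and toks[j] == part:
                    children.append(toks[j]); j += 1   # literal token
                else:
                    break
            else:
                return (symbol, children), j
        return None                               # no production matched

    tree, end = parse("expr", list("1+2*3"))
    assert end == 5     # consumed the whole (pre-tokenised) input

All the real work is in the data, which is why a machine-readable
grammar in the standard gets you a parser almost for free.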

This too could be formalised though. I have been experimenting with a
very simple term-rewriting language in which the semantic phases of a
C compiler could be formally described. It's just a bottom-up pattern
matching and term-rewriting notation, and using this I can express
quite a lot of the post-parsing 'semantics' in the C spec.  I wrote an
interpreter for this term-rewriting language in a few hundred lines of
Standard ML, and I think anyone could do something similar in any
other general-purpose programming language. So if the ISO standard
came with the machine-readable semantics of a reference
implementation, then anyone could independently produce a reference C
compiler, which they could use to compile an open source C compiler,
whose source code they have thoroughly audited.

The code for this is at the github repo I sent you a link to. The file
that has it is src/dynlibs/ffi/testrewrite.sml
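
For the flavour of it, here is the same idea reduced to a toy in
Python (not the Standard ML in the repo): terms are nested tuples,
pattern variables start with "?", and the two demonstration rules are
invented, not taken from the C spec.

    def match(pat, term, env):
        # Bind pattern variables, or compare structure and leaves.
        if isinstance(pat, str) and pat.startswith("?"):
            env[pat] = term
            return True
        if isinstance(pat, tuple) and isinstance(term, tuple) \
                and len(pat) == len(term) and pat[0] == term[0]:
            return all(match(p, t, env) for p, t in zip(pat[1:], term[1:]))
        return pat == term

    def subst(tmpl, env):
        # Instantiate a rule's right-hand side under the bindings.
        if isinstance(tmpl, str) and tmpl.startswith("?"):
            return env[tmpl]
        if isinstance(tmpl, tuple):
            return (tmpl[0],) + tuple(subst(t, env) for t in tmpl[1:])
        return tmpl

    def rewrite(term, rules):
        # Bottom-up: rewrite the children first, then this node, and
        # keep going to a fixed point.
        if isinstance(term, tuple):
            term = (term[0],) + tuple(rewrite(t, rules) for t in term[1:])
        for pat, rhs in rules:
            env = {}
            if match(pat, term, env):
                return rewrite(subst(rhs, env), rules)
        return term

    RULES = [(("add", "?x", ("num", 0)), "?x"),   # x + 0  ~>  x
             (("mul", "?x", ("num", 1)), "?x")]   # x * 1  ~>  x
    assert rewrite(("mul", ("add", ("var", "a"), ("num", 0)), ("num", 1)),
                   RULES) == ("var", "a")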

> A few of your examples seem to be inaccurate.  The one about
> paper money has not been valid for a century (no money in
> current use is backed by anything).

I know. But why do you say that makes my example invalid?  The fact
that the vast majority of USD, GBP and very probably many, many other
promissory notes in circulation are forged doesn't mean this is not a
valid example of forgery of a certificate! Arguably it is a better
example, because you can just look in your bill-fold if you want to
see it for yourself! The note represents a promise, and the number on
the note is a certificate of the trustworthiness of the promise. But
most people think the note itself has value, which is how the forgers
managed to get away with it so long; that and the fact that the
forgers are the same people who produce the genuine notes, so the
forgeries can be made physically just as convincing as the real
things.

If there _is_ some knowledge which discriminates the forged notes from
the genuine ones then it will show up in their 'street value'. So you
might want to try to withdraw your USD 100,000 savings in USD 2 bills,
for example, and if they said "Certainly, Sir, that will be USD
10,000,000," then you know you're onto something :-) If it doesn't work
with USD 2, then try another denomination.

But if the authenticity of the certificate is not simply encoded in
the denominations, well then you're out of luck. It could be an
arbitrarily complex subset of the serial numbers, and without the
actual algorithm for checking them (which is a one-way function, one
would hope, otherwise they screwed up disastrously!) you won't be able
to know.  Whatever it is, if there is such a "Federal reserve-reserve
currency," it must have been enshrined in Federal law somewhere. So
the only people who would know about it would be people who know
enough to be able to understand the statutes in detail. I.e. people who
have been to a "really good" law school, and had a "really good"
teacher, and who weren't "the wrong sort of person," which is a
euphemism for someone who shows any sign whatsoever of moral or
intellectual integrity. I am now wondering if maybe I know more about
conservative American domestic politics than I thought I did. :-)

> A more significant one -- you said: "This is why NOBODY really
> understands public-key cryptography algorithms: because no-one
> actually knows WHY they are secure.”  That’s not right.  Public key
> algorithms, especially RSA, are quite simple.  RSA is simpler than
> FFT, I would say.  It is entirely obvious that they are NOT
> absolutely secure.  It is also entirely obvious that RSA is NO MORE
> secure than the difficulty of factoring a number of a given size.
> (Some crypto systems — but, as far as I recall, not RSA — have
> proofs of security that demonstrates they are AS secure as a certain
> hard math problem, rather than only being at most as secure as
> that.)  The practical usefulness of RSA derives simply from two
> facts: (a) the best known factoring algorithm is hard enough that
> factoring a 2000 bit number is unaffordable, and (b) there are no
> known attacks on RSA that are easier than factoring.  It’s really as
> simple as that.

Someone must have changed the text in transmission, because the rest
of my paragraph, the first sentence of which you quoted above, says
pretty much what you are saying. The knowledge that the PK certifies
is not actual knowledge of security, it is simply the absence of
actual knowledge that it is not secure. I.e. it is the absence of a
computationally feasible algorithm for computing the inverse of the
'one-way' function. The mathematics is only involved in computing the
values of the function, or showing, say, that the factorization
problem reduces to the problem in question. There is never any
mathematics _per se_ involved in the deduction that the inverse of the
one-way function is computationally infeasible, because that requires
knowing enough about every single one of the possible algorithms there
are which compute it, to show that they are all exponential in time.
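
To make that concrete with the textbook toy numbers (absurdly small
primes; real moduli are thousands of bits long):

    # Toy RSA. The mathematics gives us the forward function and the
    # reduction: anyone who can factor n recovers the inverse at once.
    p, q, e = 61, 53, 17
    n = p * q

    m = 1234                 # a message, m < n
    c = pow(m, e, n)         # the easy, 'one-way' direction

    # The adversary sees only (n, e, c). Factoring n -- trivial at this
    # size -- yields the private exponent, and hence the inverse.
    pp = next(k for k in range(2, n) if n % k == 0)
    qq = n // pp
    d = pow(e, -1, (pp - 1) * (qq - 1))   # Python 3.8+ modular inverse
    assert pow(c, d, n) == m

What no mathematics anywhere supplies is a proof that there is no
feasible algorithm computing that inverse without the factors; that
part is merely not known to be false.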

So the security of the algorithm is not known; it is just that the
INSECURITY of the algorithm is ASSUMED to be UNKNOWN. Now that might
be OK, if there were plausible grounds for the assumption being a
valid one. But there aren't any. There are however plausible grounds
for the assumption being INVALID, because if someone did know a way to
crack a popular PK cryptographic code then it is more likely that they
would keep that secret to themselves, than that they would publish
it. This evaluation of the probability being > 50% comes from the fact
that we are dealing with a situation in which the adversary is assumed
to have some motive to intercept and decipher encrypted messages, so it
seems they would be more likely not to tell you that they can do this;
and it seems reasonable to assume that the adversary will take every
opportunity to make it easier for them to decrypt your messages. But
these days, not all adversaries are reasonable, so who knows!

Either way, anyone who claims PK cryptography is in any sense
positively secure is obliged to explain why what I have said here is
invalid. Your (a) and (b) may _seem_ to be statements about what is
known. But that is not what is _actually known_. What is _actually
known_ in general is utterly unknowable to any one of us. Your (a) and
(b) are only hypotheses about what is not known _in general,_ and I
hope I've explained why I think their truth is implausible.

>  (By the way, (a) becomes false as soon as quantum
> computers become feasible at that scale; factoring is fast in
> quantum computers if only you could build one bigger than a few
> dozen qubits.)  The same sort of statement of practical strength
> applies to all cryptosystems with the single exception of One Time
> Pad (which is easily proven to be unconditionally secure).  For a
> lot more in a reasonably easy to read package, I recommend “Applied
> Cryptography” by Bruce Schneier.

I've heard of Schneier's book, and his blog, but I've not read
anything of his. If he doesn't mention this issue of the absence of
actual knowledge though, then I don't feel particularly inclined to go
out of my way to read it.

I was a Unix system administrator for about 15 years, and the issue I
constantly came up against with security is that it is _always_ a
question of actual knowledge, and that the necessary actual knowledge
almost _never_ existed. So my conclusion was that computer and
communications security was an emotional state, not an intellectual
one. You can spend months locking down all the potential problems in a
system, and as a result of all that work you feel safe and secure. And
then a vulnerability is publicly announced, concerning some critical
component in your system and *poof*, all that security is vaporised!
In an instant. Yet that vulnerability was there all the time you felt
safe and secure. So the cause of that all-important sense of security
is not knowledge, it's ignorance.

Because of this, quite early on I began thinking about security from
the point of view of it being what Donald Rumsfeld memorably referred
to as an "unknown unknown" as opposed to a "known unknown", and I
would _assume_ that there was always at least one compromised user
account on each machine on the network, and that the root account on
one or more of the central servers was compromised. And I hope I had
that in mind when I made decisions about how to configure things.

But what I am proposing is a method that will allow us to base
security on actual knowledge rather than emotion. All we need to do is
learn how to generate code automatically, and how to calculate with
probabilities.

> On Sep 19, 2014, at 9:18 PM, Ian Grant <ian.a.n.grant@googlemail.com> wrote:
> > ...
> > What do you think are your chances of persuading xxxxxxx to give
> > your three best programmers a one-year sabbatical, to rejuvenate
> > their careers, and show the world that xxxx can produce world-class
> > software like those Idaho moo-cows of his produce milk?

> On that notion: I like it, but I don’t think the odds are good.
> xxxx is much more a PC hardware company.  The business I’m in is
> storage systems, that has a large software component.  And there are
> other parts that are all software.  Still, I’m not sure how the sort
> of idea you mention would fly.

Discussing it can't do any harm, can it? Storage systems are just
layers 0 and 1 of the THE-mps; you needn't implement layers 2 and 3 if
you don't want to.  And I gave the example above of one selling point:
real security. I'm not aware that anyone offers that these days, or if
they do it must be at a price that's out of my ability to even
imagine. You could use this as a basis for that level 7 validation for
your storage products. I would be happy to help in any way I can, and
Roger knows the methodology, because he has used it, probably many,
many times, so I'm sure Aesec would be able to help too. I do not have
any connection with that company; my only interest in your doing this
is that I would like to be able to source hardware that is potentially
knowably secure, so I am going to encourage all hardware manufacturers
to do this.

Thank you very, very much for your comments. I can't tell you how good
it is to be able to discuss these things openly with someone who goes
to some trouble to read and understand what I have written.

Best wishes,
Ian
