Monday 21 July 2014

Writing Assembler using Standard ML Functors

Here is an example of something important which is seldom mentioned in Functional Programming circles. One can use Standard ML functors to produce type-checked, efficient assembler code. For example, here is a functor implementing assembler primitives for CAML byte-code machines. The same functor implements code for three different ABI representations of "machine words," which are instances of the Standard ML basis signature Word. The Functor AbstractMachineWord effectively composes the assembler code from JIT instruction-generating routines which are in turn composed by other functors. In this instance, for example, the parameter WordEnc (which is another structure with a defined interface like this one) is something like  that produced by the functor VectorSliceWordEnc.sml which uses other assembler code composed by the code-generating primitives defined in PrimEnc.sml which, you might be pleased to hear, is a first-class structure, also with a defined interface. This structure is fixed in terms of the primitives implemented by the 'foreign function interface' binding to the GNU lightning JIT code-generating library. , and it implements only the  CAML ABI. But VectorSliceWordEnc.sml is just one of the possible representations of machine words in memory, and another is ArraySliceWordEnc.sml. And this uses the same primitives, but results in a different set of assembler functions.

There are also other possible representations of the ABI calling conventions, so we could use the same AbstractMachineWord functor to generate primitives for GNU guile, Python, or Ocaml, just by writing a different version of PrimEnc.sml. Provided this conforms to the interface signature defined in PrimEnc.sig, it will work.

The remarkable thing about these functors is their reliability. It is actually easier to write assembler this way than it is to write it ad-hoc. You don't have to take my word for it, you can try it yourself. The whole repository is available for download and/or forking.

Hindley-Milner Type Inference

A few years ago I lost access to my web site and the Proof-toys people very kindly offered to host this description of Hindley-Milner type inference, including the brilliantly written 1982 paper by Damas and Milner, Principal Type Schemes for Functional Programs, describing the soundness and completeness proofs. There is also a file of the purely functional (but nevertheless buggy! See below.) Standard ML implementation. I am posting this here just to increase the number of links to it.

There are a couple of errors. One is a serious misapprehension I was under. Despite having implemented the algorithm in (apparently working) code, I still thought that resolution happened under universal quantification. That is not so. All quantifiers are stripped before resolving the type-schemes. The mistake is in the last-but-two paragraph on page 12 where I state:
If the assumptions include a type-scheme for x, then the result is simply the generic instantiation of the type-scheme to new variables. Otherwise the algorithm fails.
The other error is in the implementation of capture-avoiding substitution in the function tssubs: it fails when there are free variables in the assumptions Gamma, because an over-enthusiastic optimisation means it doesn't rename below a binder of the variable being alpha-converted. Capture-avoiding substitution is a simple idea, but a bit tricky to implement efficiently, and you probably shouldn't try to implement it in just 22 lines of code!

Thanks for BZIP2

Here's a mail I just sent to the author of the Bzip2 tools.

"If you can be bothered, please email me to say you've got a copy. It's nice to know where this stuff gets to."

Everywhere, I think is the answer, except the default OpenBSD distribution ... which is why I'm fetching source.

Thanks for making bzip2. I have always appreciated it. I used to take tea with David Wheeler every day at Cambridge. I spent a decade at the Computer Laboratory, for some sins I must have committed in a past life, ... but tea with David Wheeler never seemed like punishment. He didn't have a lot to say, but what he did say was always pretty interesting.

He used to use bzip2 as his spam filter. Did you know that? He used it to measure the information content of his incoming messages M as the change in entropy between his "verified as spam" archive S and S+M. The change in entropy was just the difference in the size of the bzip2 compressed files. I was a sys-admin and had to help him once with some technical fiddling to get the mail message processing done automatically.

He used to think about physics in terms of information and entropy too. He had a Machian perspective. He told me once that a better physical theory was one which explained the data more concisely. In a sense he viewed physical theories as programs which reproduced the observed data from a compressed expression of the mechanical cause, i.e. the program. This was essentially Mach's view. I didn't ask David whether he took Mach to Mach's extreme of saying that there was no truth independent of our ability to concisely describe phenomena.

So there, I hope this amuses you, and I hope you feel good about all that effort you put into making bzip2 a supremely reliable workhorse of the Free Software revolution.

And do you know David carried on working right to the end. Despite failing eyesight and heart. He died in the bicycle shed seconds after arriving by bike for work at 8am on a cold winter's day.

Best wishes,
Ian