Teaching | Existential Type

Sequentiality as the Essence of Parallelism (November 4, 2017)

I recently thought of a nice way to structure a language for parallel programming around the concept of sequential composition.  Think of parallelism as the default—evaluate everything in parallel unless the semantics of the situation precludes it: sums are posterior to summands, but the summands can be evaluated simultaneously.  You need a way to express the necessary dependencies without introducing any spurious ones.

There’s a tool for that, called lax logic, introduced by Fairtlough and Mendler and elaborated by Davies and Pfenning, which I use extensively in PFPL.  The imperative language Modernized Algol is formulated in the lax style, distinguishing two modes, or levels, of syntax, the (pure) expressions and the (impure) commands.  The lax modality, which links the two layers, behaves roughly like a monad, but, all the hype notwithstanding, it is not the central player.  It’s the modes, not the modality, that matter.  (See the Commentary on PFPL for more.)

The lax modality is just the ticket for expressing parallelism.  Rather than separate expressions from commands, here we distinguish between values and computations.  The names are important, to avoid possible confusion.  Values are fully evaluated; they are not a source of parallelism.  (If values were called “pure”, it would be irresistible to think otherwise.)  Computations have yet to be evaluated; they engender parallelism by sequential composition.  What?  No, you didn’t nod off! Let me explain.

Parallelism is all about the join points.  If parallel execution is the default, then the job of the programmer is not to induce parallelism, but to harness it.  And you do that by saying, “this computation depends on these others.”  Absent that, there is nothing else to say, just go for it.  No sub-languages.  No program analysis.  No escaping the monad.  Just express the necessary dependencies, and you’re good to go.

So, what are the join points?  They are the elimination forms for two parallel modalities.  They generalize the sequential case to allow for statically and dynamically determined parallelism.  A value of parallel product type is a tuple of unevaluated computations, a kind of “lazy” tuple (but not that kind of laziness; here I just mean unevaluated components).  The elimination form evaluates all of the component computations in parallel, creates a value tuple from their values, and passes it to the body of the form.  Similarly, a value of parallel sequence type is a generator consisting of two values, a natural number n indicating its size, and a function determining the ith component computation for each 0 ≤ i < n.  The elimination form activates all n component computations, binds their values to a value sequence, and passes it to the body of the form.

The join point effects a change of type, from encapsulated computations to evaluated values, neatly generalizing sequential composition from a unary to a multiway join.  If you’d like, the parallel products and parallel sequences are “generalized monads” that encapsulate not just one, but many, unevaluated computations.  But they are no more monads than they are in any other functional language: the categorial equational laws need not hold in the presence of, say, divergence, or exceptions.

The dynamics assigns costs to computations, not to values, whose cost of creation has already been paid.  The computation that just returns a value has unit work and span.  Primitive operations take unit work and span.  The sequential composition of a parallel product with n components induces span one more than the maximum span of the constituents, and induces work one more than the sum of their work.  The dynamics of sequential composition for parallel sequences is similar, with the “arity” being determined dynamically rather than statically.
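
As a sketch of that accounting in Standard ML (the cost record and function are mine, purely illustrative, not part of the language being described): the join of a parallel product charges one more than the components’ total work, and one more than their maximal span.

(* Cost of a multiway join: components run in parallel, so spans race
   and work accumulates; the join itself costs one unit of each. *)
type cost = {work : int, span : int}

fun joinCost (cs : cost list) : cost =
  { work = 1 + foldl (fn (c : cost, acc) => #work c + acc) 0 cs,
    span = 1 + foldl (fn (c : cost, acc) => Int.max (#span c, acc)) 0 cs }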

Programming in this style means making the join points explicit.  If you don’t like that, you can easily define derived forms—and derived costs—for constructs that do it for you.    For example, a pair of computations might be rendered as activating a parallel pair of its components, then returning the resulting value pair.  And so on and so forth.  It’s no big deal.

En passant, the modal formulation of parallelism solves a nasty technical problem in a substitution-based cost semantics that does not make the modal distinction.  The issue is how to distinguish between the creation of a value and the many re-uses of it arising from substitution.  It’s not correct to charge again and again for creating the value each time you see it (this cost can be asymptotically significant), but you do have to charge for creating it somewhere (it’s not free, and it can matter).  And, anyway, how is one to account for the cost of assessing whether an expression is, in fact, a value?  The usual move is to use an environment semantics to manage sharing.  But you don’t have to: the modal framework solves the problem by distinguishing between a value per se; the computation that returns it fully created; and the computation that incrementally constructs it from its constituent parts.  It’s the old cons-vs-dotted-pair issue, neatly resolved.

Please see Section 10 of the Commentary on PFPL for a fuller account.  The main idea is to generalize the type of single unevaluated computations, which arises in lax logic, to types of statically- and dynamically-many unevaluated computations.  The bind operation becomes a join operation for these computations, turning a “lazy” tuple or sequence into an eager tuple or sequence.

Updates: word-smithing, added cite to Davies-Pfenning, replaced cite of course notes with reference to commentary.

6 Comments | Programming, Research, Teaching | Tagged: functional programming, parallelism, programming languages, semantics | Permalink
Posted by Robert Harper

Proofs by contradiction, versus contradiction proofs (March 4, 2017)

It is well-known that constructivists renounce “proof by contradiction”, and that classicists scoff at the critique.  “Those constructivists,” the criticism goes, “want to rule out proofs by contradiction.  How absurd!  Look, Pythagoras showed that the square root of two is irrational by deriving a contradiction from the assumption that it is rational.  There is nothing wrong with this.  Ignore them!”

On examination that sort of critique fails, because a proof by contradiction is not a proof that derives a contradiction.  Pythagoras’s  proof is valid, one of the eternal gems of mathematics.  No one questions the validity of that argument, even if they question proof by contradiction.

Pythagoras’s Theorem expresses a negation: it is not the case that the square root of two can be expressed as the ratio of two integers.  Assume that it can be so represented.  A quick deduction shows that this is impossible.  So the assumption is false.  Done.  This is a direct proof of a negative assertion; it is not a “proof by contradiction”.

What, then, is a proof by contradiction?  It is the affirmation of a positive statement by refutation of its denial.  It is a direct proof of the negation of a negated assertion that is then pressed into service as a direct proof of the assertion, which it is not.  Anyone is free to ignore the distinction for the sake of convenience, as a philosophical issue, or as a sly use of “goto” in a proof, but the distinction nevertheless exists and is important.  Indeed, part of the beauty of constructive mathematics is that one can draw such distinctions. Once drawn, such distinctions can be disregarded; once blurred, forever blurred, irrecoverably.

For the sake of explanation, let me rehearse a standard example of a genuine proof by contradiction.  The claim is that there exist irrationals a and b such that a to the b power is rational.  Here is an indirect proof, a true proof by contradiction.  Let us prove instead that it is impossible that every two irrationals a and b are such that a to the b power is irrational.  This is a negative statement, so of course one proves it by deriving a contradiction from assuming that which is negated.  Suppose, for a contradiction, that for every two irrationals a and b, the exponentiation a to the b power is irrational.  We know from Pythagoras that root two is irrational, so plug it in for both a and b, and conclude that root two to the root two power is irrational.  Now use the assumption again, taking a to be root two to the root two, and b to be root two.  Calculate a to the power of b: it is two, which is eminently rational.  Contradiction.
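
For concreteness, the calculation at the heart of the contradiction is just:

(√2 ^ √2) ^ √2 = √2 ^ (√2 · √2) = √2 ^ 2 = 2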

We have now proved that it is not the case that every pair of irrationals, when exponentiated, gives an irrational.  There is nothing questionable about this proof.  But it does not prove that there are two irrationals that exponentiate to a rational!  If you think it does, then I ask you, please name them for me.  That information is not in this proof (there are other proofs that do name them, but that is not relevant for my purposes).  You may, if you wish, disregard the distinction I am drawing; that is your prerogative, and neither I nor anyone has any problem with that.  But you cannot claim that it is a direct proof; it is rather an indirect proof, that proceeds by refuting the negative of the intended assertion.

So why am I writing this?  Because I have learned, to my dismay, that in U.S. computer science departments–of all places!–students are being taught, erroneously, that any proof that derives a contradiction is a “proof by contradiction”.  It is not.  Any proof of a negative must proceed by contradiction.  A proof by contradiction is, contrarily, a proof of a positive by refutation of the negative.  This distinction is important, even if you want to “mod out” by it in your work, for it is only by drawing the distinction that one can even define the equivalence with which to quotient.

That’s my main point.  But for those who may not be familiar with the distinction between direct and indirect proof, let me take the opportunity to comment on why one might care to draw such a distinction.  It is a matter of honesty, of a sort: the information content of the foregoing indirect proof does not fulfill the expectation stated in the theorem.  It is a kind of boast, an overstatement, to claim otherwise.  Compare the original statement with the reformulation used in the proof.  The claim that it is not the case that every pair of irrationals exponentiate to an irrational is uncontroversial.  The proof proves it directly, and there is nothing particularly surprising about it.  One would even wonder why anyone would bother to state it.  Yet the supposedly equivalent claim stated at the outset appears much more fascinating, because most people cannot easily think up an example of two irrationals that exponentiate to rationals.  Nor does the proof provide one. Once, when shown the indirect proof, a student of mine blurted out “oh that’s so cheap.”  Precisely.

Why should you care?  Maybe you don’t, but there are nice benefits to keeping the distinction, because it demarcates the boundary between constructive proofs, which have direct interpretation as functional programs, and classical proofs, which have only an indirect such interpretation (using continuations, to be precise, and giving up canonicity).  Speaking as a computer scientist, this distinction matters, and it’s not costly to maintain.  May I ask that you adhere to it?

Edit: rewrote final paragraph, sketchy and irrelevant, and improved prose throughout. Word-smithing, typos.

31 Comments | Programming, Research, Teaching | Permalink
Posted by Robert Harper

PCLSRING in Semantics (July 11, 2016)

PCLSRING is an operating system kernel design technique introduced in ITS for managing interruptions of long-running synchronous system calls.  It was mentioned in an infamous diatribe by Dick Gabriel, and is described in loving detail by Alan Bawden in an article for the ages.

Discussions of PCLSRING usually center on fundamental questions of systems design.  Is the ITS approach better than the Unix approach?   Should the whole issue be avoided by using asynchronous system calls, as in VMS?  And weren’t the good old days better than the bad new days anyway?

Let’s set those things aside for now and instead consider what it is, rather than what it’s for or whether it’s needed.  The crux of the matter is this.  Suppose you’re working with a system such as Unix that has synchronous system calls for file I/O, and you initiate a “large” read of n bytes into memory starting at address a.  It takes a while to perform the transfer, during which time the process making the call may be interrupted for any number of reasons.  The question is, what to do about the process state captured at the moment of the interrupt?

For various reasons it doesn’t make sense to snapshot the process while it is running inside the kernel.  One solution is to simply stop the read “in the middle” and arrange that, when the process resumes, it returns from the system call indicating that some m <= n bytes have been read.  You’re supposed to check that m = n yourself anyway, and restart the call if not.  (This is the Unix solution.)  It is all too easy to neglect the check, and the situation is made worse because so few languages have sum types, which would make it impossible to neglect the deficient return.
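
To see the obligation concretely, here is a sketch of the restart loop in Standard ML, using Posix.IO from the Basis Library (the function name and the decision to fail on end-of-file are my own choices):

(* Read until the buffer is full: readArr may transfer fewer bytes than
   requested, so the caller must check the count and restart the call. *)
fun readFully (fd : Posix.IO.file_desc, buf : Word8Array.array) : unit =
  let
    fun go i =
      if i >= Word8Array.length buf then ()
      else
        let
          val m = Posix.IO.readArr (fd, Word8ArraySlice.slice (buf, i, NONE))
        in
          if m = 0 then raise Fail "unexpected end of file"
          else go (i + m)  (* restart with the remaining bytes *)
        end
  in
    go 0
  end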

PCLSRING instead stops the system call in place, backs up the process PC to the system call, but with the parameters altered to read n-m bytes into location a+m, so that when the process resumes it simply makes a “fresh” system call to finish the read that was so rudely interrupted.  The one drawback, if it is one, is that your own parameters may get altered during the call, so you shouldn’t rely on them being anything in particular after it returns.  (This is all more easily visualized in assembly language, where the parameters are typically words that follow the system call itself in memory.)

While lecturing at this year’s OPLSS, it occurred to me that the dynamics of Modernized Algol in PFPL, which is given in Plotkin’s style, is essentially the same idea.  Consider the rule for executing an encapsulated command:

if m ↦ m', then bnd(cmd(m); x.m'') ↦ bnd(cmd(m'); x.m'')

(I have suppressed the memory component of the state, which is altered as well.)  The expression cmd(m) encapsulates the command m.  The bnd command executes m and passes its result to another command, m'', via the variable x.  The above rule specifies that a step of execution of m results in a reconstruction of the entire bnd, albeit encapsulating m', the intermediate result, instead of m.  It’s exactly PCLSRING!  Think of m as the kernel code for the read, think of cmd as the system call, and think of the bnd as the sequential composition of commands in an imperative language.  The kernel only makes partial progress executing m before being interrupted, leaving m' remaining to be executed to complete the call.  The “pc” is backed up to the bnd, albeit modified with m' as the new “system call” to be executed on the next transition.
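
Here is a sketch of that rule as executable code, over a deliberately tiny command language of my own devising (integer results only): stepping a bnd whose command is unfinished steps the command and rebuilds the whole bnd around the intermediate result.

(* A toy rendering of the transition rule above. *)
datatype cmd
  = Ret of int                  (* a completed command returning a value *)
  | Prim of unit -> cmd         (* one step of "kernel" work *)
  | Bnd of cmd * (int -> cmd)   (* sequential composition: bnd(m; x.m'') *)

fun step (Prim k)         = k ()
  | step (Bnd (Ret v, f)) = f v              (* the join: pass the value along *)
  | step (Bnd (m, f))     = Bnd (step m, f)  (* back the "pc" up to the bnd *)
  | step (Ret _)          = raise Fail "already a value"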

I just love this sort of thing!  The next time someone asks “what the hell is PCLSRING?”, you now have the option of explaining it in one line, without any mention of operating systems.  It’s all a matter of semantics.

8 Comments | Programming, Research, Teaching | Tagged: operating systems, pclsring, structural operational semantics | Permalink
Posted by Robert Harper

PFPL Commentary (June 3, 2016)

I am building a web page devoted to the 2nd edition of Practical Foundations for Programming Languages, recently published by Cambridge University Press.  Besides the errata, the web site features a commentary on the text explaining major design decisions and suggesting alternatives.  I also plan to include additional exercises and to make sample solutions available to faculty teaching from the book.

The purpose of the commentary is to provide the “back story” for the development, which is often only hinted at, or is written between the lines, in PFPL itself.  To emphasize enduring principles over passing fads, I have refrained from discussing particular languages in the book.  But this makes it difficult for many readers to see the relevance.  One purpose of the commentary is to clarify these connections by explaining why I said what I said.

As a starting point, I explain why I ignore the familiar concept of a “paradigm” in my account of languages.  The idea seems to have been inspired by Kuhn’s (in)famous book The Structure of Scientific Revolutions, and was perhaps a useful device at one time.  But by now the idea of a paradigm is just too vague to be useful, and there are many better ways to explain and systematize language structure.  And so I have avoided it.

I plan for the commentary to be a living document that I will revise and expand as the need arises.  I hope for it to provide some useful background for readers in general, and teachers in particular.  I wish for the standard undergraduate PL course to evolve from a superficial taxonomy of the weird animals in the language zoo to a systematic study of the general theory of computation.  Perhaps PFPL can contribute to effecting that change.

Update: As I had hoped, I have been making many new additions to the commentary, exposing alternatives, explaining decisions, and expanding on topics in PFPL.  There are also a few errors noted in the errata; so far, nothing major has come up.  (The sections on safety are safely sound.)

1 Comment | Research, Teaching | Permalink
Posted by Robert Harper

It Is What It Is (And Nothing Else) (February 22, 2016)

A recent discussion of introductory computer science education led to the topic of teaching recursion.  I was surprised to learn that students are being taught that recursion requires understanding something called a “stack” that is nowhere in evidence in their code.  Few, if any, students master the concept, which is usually “covered” only briefly.  Worse, they are encouraged to believe that recursion is a mysterious bit of esoterica that is best ignored.

And thus is lost one of the most important and beautiful concepts in computing.

The discussion then moved on to the implementation of recursion in certain inexplicably popular languages for teaching programming.  As it turns out, the compilers mis-implement recursion, causing unwarranted space usage in common cases.  Recursion is dismissed as problematic and unimportant, and the compiler error is elevated to a “design principle” — to be serpentine is to do it wrong.

And thus is lost one of the most important and beautiful concepts in computing.

And yet, for all the stack-based resistance to the concept, recursion has nothing to do with a stack.  Teaching recursion does not need any mumbo-jumbo about “stacks”.  Implementing recursion does not require a “stack”.  The idea that the two concepts are related is simply mistaken.

What, then, is recursion?  It is nothing more than self-reference, the ability to name a computation for use within the computation itself.  Recursion is what it is, and nothing more.  No stacks, no tail calls, no proper or improper forms, no optimizations, just self-reference pure and simple.  Recursion is not tied to “procedures” or “functions” or “methods”; one can have self-referential values of all types.

Somehow these very simple facts, which date back to the early 1930’s, have been replaced by damaging myths that impede teaching and using recursion in programs.  It is both a conceptual and a practical loss.  For example, the most effective methods for expressing parallelism in programs rely heavily on recursive self-reference; much would be lost without it.  And the allegation that “real programmers don’t use recursion” is beyond absurd: the very concept of a digital computer is grounded in recursive self-reference (the cross-connection of gates to form a latch).  (Which, needless to say, does not involve a stack.)  Not only do real programmers use recursion, there could not even be programmers were it not for recursion.

I have no explanation for why this terrible misconception persists.  But I do know that when it comes to programming languages, attitude trumps reality every time.  Facts?  We don’t need no stinking facts around here, amigo.  You must be some kind of mathematician.

If all the textbooks are wrong, what is right?  How should one explain recursion?  It’s simple.  If you want to refer to yourself, you need to give yourself a name.  “I” will do, but so will any other name, by the miracle of α-conversion.  A computation is given a name using a fixed point (not fixpoint, dammit) operator:  fix x is e stands for the expression e named x for use within e.  Using it, the textbook example of the factorial function is written thus:

fix f is fun n : nat in case n {zero => 1 | succ(n') => n * f n'}.

Let us call this whole expression fact, for convenience.  If we wish to evaluate it, perhaps because we wish to apply it to an argument, its value is

fun n : nat in case n {zero => 1 | succ(n') => n * fact n'}.

The recursion has been unrolled one step ahead of execution.  If we reach fact again, as we will for a positive argument,  fact is evaluated again, in the same way, and the computation continues.  There are no stacks involved in this explanation.

Nor is there a stack involved in the implementation of fixed points.  It is only necessary to make sure that the named computation does indeed name itself.  This can be achieved by a number of means, including circular data structures (non-well-founded abstract syntax), but the most elegant method is by self-application.  Simply arrange that a self-referential computation has an implicit argument with which it refers to itself.  Any use of the computation unrolls the self-reference, ensuring that the invariant is maintained.  No storage allocation is required.
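
For instance, here is a sketch in Standard ML that ties the knot by self-application (the datatype and the names are mine; this SML rendering allocates a small wrapper, whereas in the intended semantics the unrolling is purely a matter of substitution):

(* Recursion by self-application: the computation receives itself as an
   implicit argument, and each use unrolls the self-reference. *)
datatype 'a self = Fold of 'a self -> 'a

fun unroll (Fold f) = f

fun fix (f : ('a -> 'b) -> ('a -> 'b)) : 'a -> 'b =
  let val s = Fold (fn s => f (fn x => unroll s s x))
  in unroll s s end

val fact = fix (fn f => fn n => if n = 0 then 1 else n * f (n - 1))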

Consequently, a self-referential function such as

fix f is fun (n : nat, m:nat) in case n {zero => m | succ(n') => f (n',n*m)}

executes without needing any asymptotically significant space.  It is quite literally a loop, and no special arrangement is required to make sure that this is the case.  All that is required is to implement recursion properly (as self-reference), and you’re done.  There is no such thing as tail-call optimization.  It’s not a matter of optimization, but of proper implementation.  Calling it an optimization suggests it is optional, or unnecessary, or provided only as a favor, when it is more accurately described as a matter of getting it right.
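
As a sketch in Standard ML:

(* The recursive call is the entire result, so there is no pending
   obligation to record: properly implemented, this runs in constant
   space, quite literally a loop. *)
fun fact' (0, m) = m
  | fact' (n, m) = fact' (n - 1, n * m)

Here fact' (n, 1) computes the factorial of n.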

So what, then, is the source of the confusion?  The problem seems to be a too-close association between compound expressions and recursive functions or procedures.  Consider the classic definition of factorial given earlier.  The body of the definition involves the expression

n * fact n'

where there is a pending multiplication to be accounted for.  Once the recursive call (to itself) completes, the multiplication can be carried out, and it is necessary to keep track of this pending obligation.  But this phenomenon has nothing whatsoever to do with recursion.  If you write

n * square n'

then it is equally necessary to record where the external call is to return its value.  In typical accounts of recursion, the two issues get confused, a regrettable tragedy of error.

Really, the need for a stack arises the moment one introduces compound expressions.  This can be explained in several ways, none of which need pictures or diagrams or any discussion about frames or pointers or any extra-linguistic concepts whatsoever.  The best way, in my opinion, is to use Plotkin’s structural operational semantics, as described in my Practical Foundations for Programming Languages (Second Edition), published by Cambridge University Press.
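
As one illustrative rendering (everything here is mine, a sketch, not the book’s formulation), make the control stack explicit for a toy language of numbers and multiplication: it is the compound expression, not any recursion, that pushes frames.

(* A tiny machine with an explicit stack of pending frames. *)
datatype exp = Num of int | Mul of exp * exp
datatype frame
  = MulL of exp   (* pending right operand *)
  | MulR of int   (* pending multiplication by an evaluated left operand *)

fun run (Mul (e1, e2), ks)     = run (e1, MulL e2 :: ks)
  | run (Num v, MulL e2 :: ks) = run (e2, MulR v :: ks)
  | run (Num v, MulR v1 :: ks) = run (Num (v1 * v), ks)
  | run (Num v, [])            = v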

There is no reason, nor any possibility, to avoid recursion in programming.  But folk wisdom would have it otherwise.  That’s just the trouble with folk wisdom, everyone knows it’s true, even when it’s not.

Update: Dan Piponi and Andreas Rossberg called attention to a pertinent point regarding stacks and recursion.  The conventional notion of a run-time stack records two distinct things, the control state of the program (such as subroutine return addresses, or, more abstractly, pending computations, or continuations), and the data state of the program (a term I just made up because I don’t know a better one, for managing multiple simultaneous activations of a given procedure or function).  Fortran (back in the day) didn’t permit multiple activations, meaning that at most one instance of a procedure can be in play at a given time.  One consequence is that α-equivalence can be neglected: the arguments of a procedure can be placed in a statically determined spot for the call.  As a member of the Algol-60 design committee Dijkstra argued, successfully, for admitting multiple procedure activations (and hence, with a little extra arrangement, recursive/self-referential procedures).  Doing so requires that α-equivalence be implemented properly; two activations of the same procedure cannot share the same argument locations.  The data stack implements α-equivalence using de Bruijn indices (stack slots); arguments are passed on the data stack using activation records in the now-classic manner invented by Dijkstra for the purpose.  It is not self-reference that gives rise to the need for a stack, but rather re-entrancy of procedures, which can arise in several ways, not just recursion.  Moreover, recursion does not always require re-entrancy—the so-called tail call optimization is just the observation that certain recursive procedures are not, in fact, re-entrant.  (Every looping construct illustrates this principle, albeit on an ad hoc basis, rather than as a general principle.)

45 Comments | Programming, Teaching | Permalink
Posted by Robert Harper

Summer of Programming Languages (July 6, 2014)

Having just returned from the annual Oregon Programming Languages Summer School, at which I teach every year, I am once again struck by the impressive growth in the technical sophistication of the field and by its ability to attract brilliant young students whose enthusiasm and idealism are inspiring.  Eugene was, as ever, an ideal venue for the summer school, providing a gorgeous setting for work and relaxation.  I was particularly glad for the numerous chances to talk with students outside of the classroom, usually over beer, and I enjoyed, as usual, the superb cycling conditions in Eugene and the surrounding countryside.  Many students commented to me that the atmosphere at the summer school is wonderful, filled with people who are passionate about programming languages research, and suffused with a spirit of cooperation and sharing of ideas.

Started by Zena Ariola a dozen years ago, this year’s instance was organized by Greg Morrisett and Amal Ahmed in consultation with Zena.  As usual, the success of the school depended critically on the dedication of Jim Allen, who has been the de facto chief operating officer since its inception.  Without Jim, OPLSS could not exist.  His attention to detail and his engagement with the students are legendary.  Support from the National Science Foundation CISE Division, ACM SIGPLAN, Microsoft Research, Jane Street Capital, and BAE Systems was essential for providing an excellent venue, for supporting a roster of first-rate lecturers, and for supporting the participation of students who might otherwise not have been able to attend.  And, of course, an outstanding roster of lecturers donated their time to come to Eugene for a week to share their ideas with the students and their fellow lecturers.

The schedule of lectures is posted on the web site; all of the lectures were taped and are available on the web.  In addition, many speakers provided course notes, software, and other backing materials that are also available online.  So even if you were not able to attend, you can still benefit from the summer school, and perhaps feel more motivated to come next summer.  Greg and I will be organizing, in consultation with Zena.  Applying the principle “don’t fix what isn’t broken”, we do not anticipate major changes, but there is always room for improvement and the need to freshen up the content every year.  For me the central idea of the summer school is the applicability of deep theory to everyday practice.  Long a dream held by researchers such as me, these connections become more “real” every year as the theoretical abstractions of yesterday become the concrete practices of today.  It’s breathtaking to see how far we’ve come from the days when I was a student just beginning to grasp the opportunities afforded by ideas from proof theory, type theory, and category theory (the Holy Trinity) for building beautiful software systems.  No longer the abstruse fantasies of mad (computer) scientists, these ideas are the very air we breathe in PL research.  Gone are the days of ad hoc language designs done in innocence of the foundations on which they rest.  Nowadays serious industrial-strength languages are emerging that are grounded in theory and informed by practice.

Two examples have arisen just this summer, Rust (from Mozilla) and Swift (from Apple), that exemplify the trend.  Although I have not had time to study them carefully, much less write serious code using them, it is evident from even a brief review of their web sites that these are serious languages that take account of the academic developments of the last couple of decades in formulating new language designs to address new classes of problems that have arisen in programming practice.  These languages are type safe, a basic criterion of sensibility, and feature sophisticated type systems that include ideas such as sum types, which have long been missing from commercial languages, or provided only in comically obtuse ways (such as objects).  The infamous null pointer mistakes have been eradicated, and the importance of pattern matching (in the sense of the ML family of languages) is finally being appreciated as the cure for Boolean blindness.  For once I can look at new industrial languages without an overwhelming sense of disappointment, but instead with optimism and enthusiasm that important ideas are finally, at long last, being recognized and adopted.  As has often been observed, it takes 25 years for an academic language idea to make it into industrial practice.  With Java it was simply the 1970’s idea of automatic storage management; with languages such as Rust and Swift we are seeing ideas from the 80’s and 90’s make their way into industrial practice.  It’s cause for celebration, and encouragement for those entering the field: the right ideas do win out in the end, one just has to have the courage to be irrelevant.

I hope to find the time to comment more meaningfully on the recent developments in practical programming languages, including Rust and Swift, but also languages such as Go and OCaml that are also making inroads into programming practice.  (The overwhelming success and future dominance of Haskell is self-evident.  Kudos!) But for now, let me say that the golden age of programming language research is here and now, and promises to continue indefinitely.

Update: word smithing.

16 Comments | Programming, Research, Teaching | Tagged: OPLSS14, programming languages | Permalink
Posted by Robert Harper

Bellman on “Dynamic Programming” (April 21, 2014)

Everyone who has studied algorithms has wondered “why the hell is Bellman’s memoization technique called dynamic programming?”.  I recently learned the answer from my colleague, Guy Blelloch, who dug up the explanation from Richard Bellman himself:

“I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes.

“An interesting question is, ‘Where did the name, dynamic programming, come from?’ The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.’ I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities” (p. 159).

As with algorithms, so too with dynamic languages?

Update: why is it called “memoization” and not “memorization”?

Update: rewrite of the commentary.


40 Comments | Research, Teaching | Tagged: dynamic and static typing | Permalink
Posted by Robert Harper

Old Neglected Theorems Are Still Theorems (March 20, 2014)

I have very recently been thinking about the question of partiality vs totality in programming languages, a perennial topic in PL’s that every generation thinks it discovers for itself.  And this got me to remembering an old theorem that, it seems, hardly anyone knows ever existed in the first place.  What I like about the theorem is that it says something specific and technically accurate about the sizes of programs in total languages compared to those in partial languages.  The theorem provides some context for discussion that does not just amount to opinion or attitude (and attitude always seems to abound when this topic arises).

The advantage of a total programming language such as Goedel’s T is that it ensures, by type checking, that every program terminates, and that every function is total. There is simply no way to have a well-typed program that goes into an infinite loop. This may seem appealing, until one considers that the upper bound on the time to termination can be quite large, so large that some terminating programs might just as well diverge as far as we humans are concerned. But never mind that; let us grant that it is a virtue of T that it precludes divergence.

Why, then, bother with a language such as PCF that does not rule out divergence? After all, infinite loops are invariably bugs, so why not rule them out by type checking? (Don’t be fooled by glib arguments about useful programs, such as operating systems, that “run forever”. After all, infinite streams are programmable in the language M of inductive and coinductive types in which all functions terminate. Computing infinitely does not mean running forever, it just means “for as long as one wishes, without bound.”)  The notion does seem appealing until one actually tries to write a program in a language such as T.

Consider computing the greatest common divisor (GCD) of two natural numbers. This can be easily programmed in PCF by solving the following equations using general recursion:

gcd(m, 0) = m
gcd(0, n) = n
gcd(m, n) = gcd(m - n, n), if m ≥ n > 0
gcd(m, n) = gcd(m, n - m), if n > m > 0

The type of gcd defined in this manner has partial function type (nat × nat) ⇀ nat, which suggests that it may not terminate for some inputs. But we may prove by induction on the sum of the pair of arguments that it is, in fact, a total function.
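
Transcribed as a sketch in Standard ML, using general recursion with no termination evidence anywhere in the code:

(* The totality argument (induction on m + n) lives outside the program. *)
fun gcd (m, 0) = m
  | gcd (0, n) = n
  | gcd (m, n) = if m >= n then gcd (m - n, n) else gcd (m, n - m)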

Now consider programming this function in T. It is, in fact, programmable using only primitive recursion, but the code to do it is rather painful (try it!). One way to see the problem is that in T the only form of looping is one that reduces a natural number by one on each recursive call; it is not (directly) possible to make a recursive call on a smaller number other than the immediate predecessor. In fact one may code up more general patterns of terminating recursion using only primitive recursion as a primitive, but if you examine the details, you will see that doing so comes at a significant price in performance and program complexity. Program complexity can be mitigated by building libraries that codify standard patterns of reasoning whose cost of development should be amortized over all programs, not just one in particular. But there is still the problem of performance. Indeed, the encoding of more general forms of recursion into primitive recursion means that, deep within the encoding, there must be a “timer” that “goes down by ones” to ensure that the program terminates. The result will be that programs written with such libraries will not be nearly as fast as they ought to be.  (It is actually quite fun to derive “course of values” recursion from primitive recursion, and then to observe with horror what is actually going on, computationally, when using this derived notion.)
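
To get a taste of the encoding without deriving course-of-values recursion in full, here is a fuel-bounded sketch in Standard ML; it is not literally primitive recursion, but it exhibits the “timer” that must be threaded through, and the overhead it represents:

(* The counter goes down by ones; m + n is enough fuel for gcd, but the
   bookkeeping is pure overhead relative to the PCF version. *)
fun gcdT 0    _      = 0  (* out of fuel; unreachable when fuel >= m + n *)
  | gcdT _    (m, 0) = m
  | gcdT _    (0, n) = n
  | gcdT fuel (m, n) =
      if m >= n then gcdT (fuel - 1) (m - n, n)
                else gcdT (fuel - 1) (m, n - m)

fun gcd' (m, n) = gcdT (m + n) (m, n)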

But, one may argue, T is simply not a serious language. A more serious total programming language would admit sophisticated patterns of control without performance penalty. Indeed, one could easily envision representing the natural numbers in binary, rather than unary, and allowing recursive calls to be made by halving to achieve logarithmic complexity. This is surely possible, as are numerous other such techniques. Could we not then have a practical language that rules out divergence?

We can, but at a cost.  One limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).  More importantly, this limitation extends to any total language whatever.  If this limitation does not seem important, then consider the Blum Size Theorem (BST) (from 1967), which places a very different limitation on total languages.  Fix any total language, L, that permits writing functions on the natural numbers. Pick any blowup factor, as large as you wish.  The BST states that there is a total function on the natural numbers that is programmable in L, but whose shortest program in L is larger by the given blowup factor than its shortest program in PCF!

The underlying idea of the proof is that in a total language the proof of termination of a program must be baked into the code itself, whereas in a partial language the termination proof is an external verification condition left to the programmer. Roughly speaking, there are, and always will be, programs whose termination proof is rather complicated to express, if you fix in advance the means by which it may be proved total. (In T it was primitive recursion, but one can be more ambitious, yet still get caught by the BST.)  But if you leave room for ingenuity, then programs can be short, precisely because they do not have to embed the proof of their termination in their own running code.

There are ways around the BST, of course, and I am not saying otherwise.  For example, the BST merely guarantees the existence of a bad case, so one can always argue that such a case will never arise in practice.  Could be, but I did mention the GCD in T problem for a reason: there are natural problems that are difficult to express in a language such as T.  By fixing the possible termination arguments in advance, one is tempting fate, for there are many problems, such as the Collatz Conjecture, for which the termination proof of a very simple piece of code has been an open problem for decades, and has resisted at least some serious attempts on it.  One could argue that such a function is of no practical use.  I agree, but I point out the example not to say that it is useful, but to say that it is likely that its eventual termination proof will be quite nasty, and that this will have to be reflected in the program itself if you are limited to a T-like language (rendering it, once again, useless).  For another example, there is no inherent reason why termination need be assured by means similar to that used in T.  We got around this issue in NuPRL by separating the code from the proof, using a type theory based on a partial programming language, not a total one.  The proof of termination is still required for typing in the core theory (but not in the theory with “bar types” for embracing partiality).  But it’s not baked into the code itself, affecting its run-time; it is “off to the side”, large though it may be.

Updates: word smithing, fixed bad link, corrected gcd, removed erroneous parenthetical reference to Coq, fixed LaTeX problems.


30 Comments | Programming, Research, Teaching | Tagged: functional programming, primitive recursion, programming languages, type theory, verification | Permalink
Posted by Robert Harper

What, If Anything, Is A Declarative Language? (July 18, 2013)

Back in the 1980’s it was very fashionable to talk about “declarative” programming languages.  But to my mind there was never a clear definition of a “declarative language”, and hence no way to tell what is declarative and what is not.  Lacking any clear meaning, the term came to refer to the arbitrary conflation of functional with logic programming to such an extent that “functional-and-logic-programming” almost became a Germanic thing-in-itself (ding an sich).  Later, as the logic programming wave subsided, the term “declarative”,  like “object-oriented”, came to be an expression of approval, and then, mercifully, died out.

Or so I had thought.  Earlier this week I attended a thriller of an NSF-sponsored workshop on high-level programming models for parallelism, where I was surprised by the declarative zombie once again coming to eat our brains.  This got me to thinking, again, about whether the term has any useful meaning.  For what it’s worth, and perhaps to generate useful debate, here’re some things that I think people mean, and why I don’t think they mean very much.

  1. “Declarative” means “high-level”.  This just seems to replace one vague term by another.
  2. “Declarative” means “not imperative”.  But this flies in the face of reality.  Functional languages embrace and encompass imperative programming as a special case, and even Prolog has imperative features, such as assert and retract, that have imperative meaning.
  3. “Declarative” means “functional”.  OK, but then we don’t really need another word.
  4. “Declarative” means “what, not how”.  But every language has an operational semantics that defines how to execute its programs, and you must be aware of that semantics to understand programs written in it.  Haskell has a definite evaluation order, just as much as ML has a different one, and even Prolog execution is defined by a clear operational semantics that determines the clause order and that can be influenced by “cut”.
  5. “Declarative” means “equational”.  This does not distinguish anything, because there is a well-defined notion of equivalence for any programming language, namely observational equivalence.  Different languages induce different equivalences, of course, but how does one say that one equivalence is “better” than another?  At any rate, I know of no stress on equational properties of logic programs, so either logic programs are not “declarative” or “equational reasoning” is not their defining characteristic.
  6. “Declarative” means “referentially transparent”.  The misappropriation of Quine’s terminology only confuses matters.  All I’ve been able to make of this is that “referentially transparent” means that beta-equivalence is valid.  But beta-equivalence is not a property of an arbitrary programming language, nor in any case is it clear why this equivalence is first among equals.  And why would you decide a priori on which equivalences you want before you even know what it means to run a program?
  7. “Declarative” means “has a denotation”.  This gets closer to the point, I think, because we might well say that a declarative semantics is one that gives meaning to programs as some kind of mapping between some sort of spaces.  In other words, it would be a synonym for “denotational semantics”.  But every language has a denotational semantics (perhaps by interpretation into a Kripke structure to impose sequencing), so having one does not seem to distinguish a useful class of languages.  Moreover, even in the case of purely functional programs, the usual denotational semantics (as continuous maps) is not fully abstract, and the fully abstract semantics (as games) is highly operational.  Perhaps a language is declarative in proportion to being able to give it semantics in some “familiar” mathematical setting?
  8. “Declarative” means “implicitly parallelizable”.  This was certainly the sense intended at the NSF meeting, but no two “declarative” languages seemed to have much in common.  Charlie Garrod proposes just “implicit”, which is pretty much synonymous with “high level”, and may be the most precise sense there is to the term.

No doubt this list is not exhaustive, but I think it covers many of the most common interpretations.  It seems to me that none of them have a clear meaning or distinguish a well-defined class of languages.  Which leads me to ask, is there any such thing as a declarative programming language?

[Thanks to the late Steve Gould for inspiring the title of this post.]

[Update: wordsmithing.]

[Update: add a remark about semantics, add another candidate meaning.]

[Update: added link to previous post.]

24 Comments | Research, Teaching | Tagged: declarative language, functional programming, imperative programming | Permalink
Posted by Robert Harper

Exceptions Are Shared Secrets (December 3, 2012)

Exceptions are commonly criticized as being the “goto’s” of modern programming languages.  Raising an exception transfers control to an unknown destination, it is said, and this is a bad thing on engineering grounds.  I disagree.  It is perfectly predictable where a raised exception will be handled, provided that exceptions are done properly, which, unfortunately, is not always the case.

The crucial point is that exception values are shared secrets. Let us distinguish two parties, the raiser of the exception, and the handler of it. The fundamental idea of exceptions is to transfer a value from the raiser to the handler without the possibility of interception by another party. While the language of secrecy seems appropriately evocative, I hasten to add that I am not here concerned with “attackers” or suchlike, but merely with the difficulties of ensuring modular composition of programs from components. In such a setting the “attacker” is yourself, who is not malicious, but who is fallible.

By raising an exception the raiser is “contacting” a handler with a message. The raiser wishes to limit which components of a program may intercept that message. More precisely, the raiser wishes to ensure that only certain previously agreed-upon components may handle that exception, perhaps only one. This property should remain stable under extension to the program or composition with any other component. It should not be possible for an innocent third party to accidentally intercept a message that was not intended for it.

Achieving this requires a secrecy mechanism that allows the raiser and the handler(s) to agree upon their cooperation. This is accomplished by dynamic classification, exactly as it is done properly in Standard ML (but not O’Caml). The idea is that the raiser has access to a dynamically generated constructor for exception values, and any handler has access to the corresponding dynamically generated matcher for exception values. This means that the handler, and only the handler, can decode the message sent by the raiser; no other party can do anything with it other than pass it along unexamined. It is “perfectly encrypted” and cannot be deciphered by any unintended component.
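
As a small illustration in Standard ML (my example, not from the text): a locally declared exception generates a fresh class each time the declaration is elaborated, so the raise below and its handler share a secret that no intervening code can forge or decode.

(* Found is generated afresh on each call, so only this handler can
   interpret the packet; any other handler could at most re-raise it
   unexamined. *)
fun firstSuchThat (p : int -> bool) (xs : int list) : int option =
  let
    exception Found of int
  in
    (List.app (fn x => if p x then raise Found x else ()) xs; NONE)
    handle Found v => SOME v
  end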

The usual exception mechanisms, as distinct from exception values, allow for “wild-card handlers”, which means that an exception can be intercepted by a third party. This means that the raiser cannot ensure that the handler actually receives the message, but it can ensure, using dynamic classification, that only a legitimate handler may decipher it. Decades of experience with Standard ML shows that this is a very useful thing indeed, and has application far beyond just the simple example considered here. For full details, see my forthcoming book, which discusses dynamic classification and its role in ensuring integrity and confidentiality in a program. Dynamic classification is not just for “security”; it is rather a good tool for everyday programming.

Haskell is one language that does not get exceptions right.  Allocating an exception incurs a storage effect, and so would have to be confined to the IO monad in Haskell.  But this would destroy the utility of exceptions in pure code.  The result is an exception mechanism that is arguably broken and that does not provide the guarantees that make exceptions a perfectly pleasant and useful way to write code.

Update: Reworked last paragraph to clarify the point I am making; the previous formulation appears to have invited misinterpretation.

Update: This account of exceptions also makes clear why the perennial suggestion to put exception-raising information into types makes no sense to me. I will write more about this in a future post, but meanwhile contemplate that a computation may raise an exception that is not even in principle nameable in the type. That is, it is not conservativity that’s at issue, it’s the very idea.

Update: Wordsmithing, removal of irrelevant remarks to focus on the main point about dynamic exceptions.


35 Comments | Programming, Research, Teaching | Tagged: exceptions | Permalink
Posted by Robert Harper

