programming is terriblelessons learned from a life wasted

Reflections on Trusting Trust, by Ken Thompson

Reflections on Trusting Trust, by Ken Thompson (cm.bell-labs.com)

As a programmer, I write programs. I would like to present to you the cutest program I ever wrote.

This is a classic paper on programming and security, about self referential programs called Quines. A quine is a program that outputs its source code

There is a trick to writing quines, so if you’ve never tried to write one, you might want to give it a try before reading this paper.

Quines come in many forms, some are a series of programs which output the next in sequence, like this famous perl spiral. Others are written in multiple languages, like this python and ruby quine I wrote. What Thompson does with their Quine is rather insidious, but I won’t spoil the fun

You once referred to computing as pop culture.

It is. Complete pop culture. I’m not against pop culture. Developed music, for instance, needs a pop culture. There’s a tendency to over-develop. Brahms and Dvorak needed gypsy music badly by the end of the 19th century. The big problem with our culture is that it’s being dominated, because the electronic media we have is so much better suited for transmitting pop-culture content than it is for high-culture content. I consider jazz to be a developed part of high culture. Anything that’s been worked on and developed and you [can] go to the next couple levels.

One thing about jazz aficionados is that they take deep pleasure in knowing the history of jazz.

Yes! Classical music is like that, too. But pop culture holds a disdain for history. Pop culture is all about identity and feeling like you’re participating. It has nothing to do with cooperation, the past or the future — it’s living in the present. I think the same is true of most people who write code for money. They have no idea where [their culture came from] — and the Internet was done so well that most people think of it as a natural resource like the Pacific Ocean, rather than something that was man-made. When was the last time a technology with a scale like that was so error-free? The Web, in comparison, is a joke. The Web was done by amateurs.

From Interview with Alan Kay, the second grumpiest programmer in the world. (The first is Ted Nelson).

Object-Oriented Programming Versus Abstract Data Types, by William R. Cook

Object-Oriented Programming Versus Abstract Data Types, by William R. Cook (cs.utexas.edu)

This tutorial collects and elaborates arguments for distinguishing between object-oriented programming and abstract data types. The basic distinction is that object-oriented programming achieves data abstraction by the use of procedural abstraction, while abstract data types depend upon type abstraction. Object-oriented programming and abstract data types can also be viewed as complementary implementation techniques: objects are centered around the constructors of a data abstraction, while abstract data types are organized around the operations. These differences have consequences relating to extensibility, efficiency, typing, and verification; in many cases the strengths of one paradigm are the weaknesses of the other. Most object-oriented programming languages support aspects of both techniques,not a unification of them, so an understanding of their relative merits is useful in designing programs
“we write everything small, thus saving time”, or perhaps UNIX’s lowercase traditions go back to the bauhaus.

“we write everything small, thus saving time”, or perhaps UNIX’s lowercase traditions go back to the bauhaus.

On the Notion of Inheritance (1996) by Antero Taivalsaari

On the Notion of Inheritance (1996) by Antero Taivalsaari (citeseerx.ist.psu.edu)

One of the most intriguing—and at the same time most problematic—notions in object-oriented programming is inheritance. Inheritance is commonly regarded as the feature that distinguishes object-oriented programming from other modern programming paradigms, but researchers rarely agree on its meaning and usage.

Yet inheritance is often hailed as a solution to many problems hampering software development, and many of the alleged benefits of object-oriented programming, such as improved conceptual modelling and reusability, are largely credited to it. This article aims at a comprehensive understanding of inheritance, examining its usage, surveying its varieties, and presenting a simple taxonomy of mechanisms that can be seen as underlying different inheritance models.

Inheritance, like OO is one of those concepts that is ill-defined, with a variety of different implementations and uses. This paper is an excellent summary of the different forms and uses of inheritance. Delegation vs Composition, Subclassing vs Code-Reuse, Late vs Early binding, and various other technicalities often glossed over.

I reckon your message broker might be a bad idea.

As a curmudgeon, I am often full of reckons. A reckon is often confused for an opinion, but it’s more like a heuristic. It isn’t always true in theory, but it’s often true in practice. Today’s reckons are about message brokers.

They are used by inexperienced programmers to glue systems together. Although they are easy to set up and to get going, when they fail, they take the entire application with them — a broker is often trading convenience for a central point of failure.

Before you start throwing away your working code base, I want to ask you a few questions: What do you do when the broker fails? What happens if a message isn’t handled? What about consumers or producers crashing? Knowing the answers to these, are the first steps to understanding if your broker is helping or hurting your system.

To show how a broker can hurt, let’s look at a classic use: running tasks in parallel over a cluster of machines. In particular, I encountered problems while working on a distributed web crawler built around a message broker.

The crawler starts from a seed list of URLs, pushing them into the broker, meanwhile a number of workers pull out pages to crawl and then push any found links back through the broker, into the crawler. Both the workers and crawler were run by hand. Simple enough, but we encountered many failure cases.

If a single worker died, it would take the task with it, we could resend after a timeout, but if the task froze the worker, eventually all of the workers would halt. If enough workers died the queue would fill up unknowingly and collapse, or worse, the crawler would sit idly for days, not realising anything was broken. When the crawler broke, a similar process happened. The workers would churn away for hours, racking up cpu time to no avail. Occasionally, we could restart the workers or crawler, and everything would be fine. Sometimes we’d have to reset the queues, throw away the work and restart. These failures cost us a lot of money, and an awful lot of time, things a small company didn’t have an abundance of.

The more experienced engineers amongst you will know what we had done wrong. Brokers are a great way of isolating components in your system, and unfortunately we’d isolated the crawler and workers from finding out if something had failed.

There are many ways to fix this: We can set a hard limit on the queue, or a maximum rate of messages, or even set a time to live on the messages. Introducing back-pressure in the system would allow errors to propagate quicker, saving us some time. Another complementary technique is to add acknowledgement messages, allowing the crawler to know when a job was accepted, and heartbeat messages to know that the job was still ongoing.

Instead, we got rid of the message broker.

The crawler communicated with the workers directly, and managed the worker pool. When launched by the crawler, workers would poll the crawler for work, sending all of the results back. If a worker failed, the crawler would notice, and kill the remote process. If the crawler failed, the workers would terminate. At all times, the crawler knew which machine was doing which task. At last, something in the system was responsible for handling failure, beyond the already stretched operations team.

At this point, the canny reader will shout “Ha! But there is queueing going on here”, and sure enough, there was. Instead of a queue hiding away in the middle of the network, we’d pushed it into the crawler. There was still messaging too, but we were just using TCP.

We’d ignored and subsequently embraced three good design principles for reliable systems: Fail fast, Process Supervision, and the End-to-End principle.

Fail fast means that processes give up as soon as they encounter failure, rather than ambling on and hoping for the best. Process supervision means that errant components are watched and restarted. These two principles alone account for much of the reliability of Erlang/OTP. On the other hand, The End-to-End principle is why TCP works.

TCP gets reliable delivery over an unreliable system by getting the hosts to handle missing or broken messages, rather than burdening it with the routers. Handling things at the edges covers problems in the middle. We’d done something similar, pushing the queue inside the crawler: what was now responsible for handing out work was now responsible for handling when it failed.

Not every queue was like ours, although i’ve encountered a few. A similar pattern emerges when brokers are persistent: The temptation is to avoid storing messages at the ends, and when the broker fails, all of the new messages are lost. When you restart the broker, the system still doesn’t work until the backlog of messages is cleared.

In the end, it isn’t so much about message brokers, but treating them as infallible gods. By using acknowledgements, back-pressure and other techniques you can move responsibility out of the brokers and into the edges. What was once a central point of failure becomes an effectively stateless server, no-longer hiding failures from your components. Your system works because it knows that it can fail.

Remember the golden rule of reckons: It depends. Engineering in practice is not a series of easy choices, but a series of tradeoffs. Brokers aren’t bad, but the tradeoffs you’re making might not be in your favour.

To Wash It All Away — James Mickens’ final usenix column.

To Wash It All Away — James Mickens’ final usenix column. (usenix.org)

Mickens takes on the rubix cube made of razor blades that is modern web development, with his usual panache and cutting wit.

Things You Might Not Know — The Miura Fold.

I talk to Tom Scott about one of my favourite origami models — The miura fold.

For those who want to try this at home, here is how to fold it.

Scratch: Programming for Everyone

Scratch: Programming for Everyone (web.media.mit.edu)

What happened to the initial enthusiasm for introducing programming to children? Why did Logo and other initiatives not live up to their promise? There were several factors:

  • Early programming languages were too difficult to use. Many children had difficulty mastering the syntax of programming.
  • Programming was often introduced with activities (generating lists of prime numbers, or making simple line drawings) that were not connected to young people’s interests or experiences.
  • Programming was often introduced in contexts where no one had the expertise needed to provide guidance when things went wrong – or encourage deeper explorations when things went right.

Papert argued that programming languages should have a low floor (easy to get started with) and a high ceiling (opportunities for increasingly complex projects over time). In addition, we believe that languages need wide walls (supporting many different types of projects, so that people with different interests and learning styles can all become engaged). Satisfying the triplet of low-floor/high-ceiling/wide-walls hasn’t been easy.

Programming with building blocks.

Over the last six months, I’ve probably spent more time programming in Scratch than any other language, and despite the limitations it’s rather lovely to program in. I’ve made a lot of little games, toy examples, as well as operator precedence parsers (the latter was one of the more painful exercises).

Scratch isn’t really like other programming languages. For a start, it comes with a sandbox of its own: there is a stage with sprites that can move around and interact with each other, as well as being able to insert pictures and sound. However the biggest difference is how programs are assembled — from building blocks, not from text.

image

With scratch you build programs from square-ish building blocks, snapping them together inside the editor. Programmers in Scratch never receive a syntax error — No missing semicolons, no unmatched braces. There isn’t any tabs versus spaces palaver either, and it’s wonderful.

Block based programming is very similar to what Bret Victor calls “Sentence based configuration” in his excellent essay Magic Ink: Information Software and the Graphical Interface. The entire essay is worth a gander, but we’re interested in the case study on building a train journey planner, in particular how the application can be configured:

A typical design would use a preference dialog or form that the user would manipulate to tell the software what to do. However, an information design approach starts with the converse—the software must explain to the user what it will do. It must graphically express the current configuration.

For presenting abstract, non-comparative information such as this, an excellent graphical element is simply a concise sentence.

Scratch, albeit visual, is still a language built around sentence fragments. Instead of using punctuation and assorted ascii art to enchant text into code, the structure and action of a Scratch program is almost as clear as prose. By using natural language fragments instead of magic identifiers in text, Scratch is unique in another aspect — Internationalization — Scratch programs can be built up out of German, French, Chinese, Japanese fragments too.

image

The look on peoples faces when you change scratch into their native language is worth treasuring, an aha moment where the code becomes readable in an instant. Scratch is so readable that it has come as a surprise to some, one remarking “I understand this. It can’t be programming”.

Block based programming isn’t without drawbacks. The back and forth of dragging and dropping becomes tiresome, and in Scratch, it’s hard to share code and reuse fragments without duplication. The underlying abstract language isn’t too powerful either — using strings and lists is rather unpleasant for more than the most trivial tasks. I don’t think programmers at large are going to throw away their text editors, or stop arguing over tabs, spaces, or semicolons.

Block based programming is still aimed entirely at beginners, but I’d really like to see how it can grow to encompass more experienced programmers — especially incorporating features from modern IDEs like auto-complete, or refactoring tools. Extending the vocabulary would be useful too — one notable descendent of Scratch, Snap!, adds first class lists, first class procedures, and continuations.

Maybe one day we’ll abandon our text adventures.

Lowering the Barriers to Programming: a survey of programming environments and languages for novice programmers

Lowering the Barriers to Programming: a survey of programming environments and languages for novice programmers (cs.cmu.edu)

“Since the early 1960’s, researchers have built a number of programming languages and environments with the intention of making programming accessible to a larger number of people. This paper presents a taxonomy of languages and environments designed to make programming more accessible to novice programmers of all ages. The systems are organized by their primary goal, either to teach programming or to use programming to empower their users, and then by the authors’ approach to making learning to program easier for novice programmers. The paper explains all categories in the taxonomy, provides a brief description of the systems in each category, and suggests some avenues for future work in novice programming environments and languages.”

We Need To Talk About Binary Search.

We Need To Talk About Binary Search. (canhazcode.blogspot.co.uk)

Binary search is the classical algorithm which is easy to describe, but hard to implement well.

It’s a basic divide and conquer algorithm to find the position of an element within a sorted list: Pick a pivot element in the centre of the list, and either the pivot matches, or the element must be above, or below it in the list. Repeat ad nauseum and return the position, if found

The article suggests an elegant reframing of binary search: Instead of thinking of it as searching for the position of an element within a sorted list, they break it into two problems: searching for the first position, or final position of an element in a sorted list. As a result, the code is much simpler to reason about.