A video tutorial with a code editor and a canvas: a somewhat gentle introduction to programming with Processing. With rampant enthusiasm, the host walks you through the basics of the canvas, drawing shapes, colors, and interactivity.
Nomography, or Computing without Computers, is the art of building graphical devices for approximating functions—these charts and graphs allow for complex calculations and algorithms, without demanding more than the ability to draw a straight line.
Akin to slide rules, these have fallen out of fashion with the rise of computers, but they still look rather wonderful. For more information, check out The Lost Art of Nomography and PyNomo.
It’s hard to come up with anything more important in crypto. It’s the starting point for everything.
Who is attacking? How will they attack? How will you mitigate it? These are the most important questions to ask of any security claim, and unfortunately the ones most often left unanswered.
Threat modelling is one of the many ways in which you evaluate the promises of security techniques; usability is another. Kerckhoffs’ principles outline what made early crypto systems humane, and much of it still holds true today–
The system must be substantially, if not mathematically, undecipherable;
The system must not require secrecy and can be stolen by the enemy without causing trouble;
It must be easy to communicate and retain the key without the aid of written notes, it must also be easy to change or modify the key at the discretion of the correspondents;
The system ought to be compatible with telegraph communication;
The system must be portable, and its use must not require more than one person;
Finally, given the circumstances in which such a system is applied, it must be easy to use and must neither stress the mind nor require the knowledge of a long series of rules.
Security is not just a technical problem, it is a social one: a secure system is one that accounts for users, rather than blaming them. If you don’t take users, or attackers, into account, what you have is not security, but security theatre.
The Practice of Programming, by Kernighan and Pike, is one of my favourite books about programming. The book covers style, algorithms and data structures, design and implementation, interfaces, debugging, testing, performance, portability, and notation, in a clear, readable and understandable style that is lacking in many other technical books.
Each chapter is short, stands alone, peppered with examples, and summed up neatly at the end. Reading this book didn’t automatically make me a better programmer, but it encouraged me to think about programming, and gave me solid advice for the troubles ahead.
Littered with war stories, the chapters take you through example code, asking questions, and providing some answers. They don’t tell you how code should look, but explore how code can be improved. (As an aside, the first chapter was reprinted in its entirety in another book, “Beautiful Code”.)
Don’t lend it to friends; they might never return it. One of my copies spent five years with a co-worker who proclaimed that they’d read it, but kept going back to it for advice in times of trouble. I can’t blame them.
When I suggested languages to start with, a friend asked me why I didn’t recommend PHP. Thankfully, Google knows the answer:
Although some people mistake PHP for a scripting language, perhaps PHP is really two things: the most popular C web framework, and one of the most powerful templating languages out there.
PHP is the framework, and unlike Django and Rails much of the logic is written in C—including how routing, argument decoding and session handling work by default. PHP is also the template language, and unlike Python or Ruby, “Hello World” is a quine.
Bashing PHP with a long list of quirks and misfeatures has been done to exhaustion, and it misses the reason people use it: it’s popular, and programmers use popular things. Personally, I don’t think PHP is all that bad; it is still probably the easiest way to write a cgi-bin script that spits out HTML, and most of the PHP you will ever need has been written for you already.
For beginners, diving in and hacking at a larger codebase can be just as challenging, fun, and frustrating as writing all-new code. Although I have recommended software written in PHP to people, I don’t recommend it for new projects, or to beginners. Why? Internationalization and security.
PHP itself, and many major products written in it, have a rather hilarious security record, even rivalling Paul Vixie’s, but that is not my real concern. Writing secure code, even if only against the trio of Cross-Site Request Forgery, Cross-Site Scripting, and SQL Injection, requires expertise in a language touted as being for beginners. By default, in a new project, PHP does very little to prevent these or mitigate them effectively.
Internationalization is still quite hard in PHP too, and the last major attempt to make it easier, PHP 6, failed. Although less of a concern to those who suffer from American exceptionalism, ASCII and Latin-1 aren’t good enough for the rest of the world, and PHP’s companion, MySQL, doesn’t help much with this either.
Still, if you think WordPress, Drupal, MediaWiki et al. solve most of your problems, use them. They’ve worked around the big issues, and have helpful, supportive communities. For new programmers writing new programs, other languages may be less painful if you don’t speak US-ASCII and don’t want to be hacked.
This isn’t directed so much at those who are already wedded to and entrenched in the world of PHP. I’m not asking you to abandon PHP, but to fix it—not by writing a new framework on top, but by fixing what lurks beneath.
As a card-carrying member of the church of HTTP and all that is Fielding, I’m often prone to pedantic fits when I hear people talking about ReST. It’s up there with agile, as a technical term which has lost its meaning as it has been adopted or co-opted by the community at large.
RPC goes back a long way, and one of the earliest, RFC 707, describes it as a way of sharing resources over a network, but the modern usage is probably closer to remote function or method calls. So let’s talk about remote method invocation over HTTP, usually with the following components–
An endpoint — One URL exposed on the server side.
A request format — How to call a specific function.
A response format — How the return value is packaged.
And if you’re lucky, a schema for the latter two, which can hopefully be generated into stub client code. JSON-RPC, XML-RPC and SOAP all fall into this pattern: tunneled over HTTP, but not taking advantage of HTTP. Let’s see if we can make RPC play nicely with HTTP, and get web services to behave more like websites.
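To make that concrete, here is a minimal sketch in Python of the single-endpoint pattern: one URL, everything POSTed to it, and the method name buried in the request body. The endpoint and method names are made up.

    import json

    # The single-endpoint pattern: every call is a POST to one URL, and the
    # method name travels inside the request body. Names are hypothetical.
    ENDPOINT = "https://api.example.com/endpoint"

    def make_rpc_request(method, params):
        """Build a JSON-RPC 2.0 style request body, to be POSTed to ENDPOINT."""
        return json.dumps({
            "jsonrpc": "2.0",
            "method": method,
            "params": params,
            "id": 1,
        })

    print("POST", ENDPOINT)
    print(make_rpc_request("FartCounter.increment", {"user": "alice"}))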
Step 1: Different things have different URLs.
Instead of having one endpoint, let’s expose the class and method name in the URL. Instead of /endpoint, we’ll go to /endpoint/Class/method. Now we can get http servers to route methods to different machines, transparently to the client.
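A rough sketch of what that routing looks like, with made-up names:

    # Step 1: the class and method live in the URL, so an HTTP server or proxy
    # can route different methods to different machines without the client caring.

    def increment(params):
        return {"ok": True}

    def get_count(params):
        return {"count": 42}

    ROUTES = {
        "/endpoint/FartCounter/increment": increment,
        "/endpoint/FartCounter/get_count": get_count,
    }

    def dispatch(path, params):
        return ROUTES[path](params)   # a front-end proxy could route by prefix instead

    print(dispatch("/endpoint/FartCounter/get_count", {}))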
Step 2: Not everything needs to be POST
Although POST works well, some requests don’t have side effects, and can be safely retried. Let’s take those commands and use GET instead. Now middleware can retry commands if they fail, without having to inspect the messages.
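Sketched out, with made-up method names: mark the side-effect-free calls, send them over GET, and let the retry logic stay oblivious to what the methods actually do.

    # Step 2: requests without side effects go over GET, so middleware can
    # retry them blindly without inspecting the message.

    SAFE = {"/endpoint/FartCounter/get_count"}       # no side effects

    def verb_for(path):
        return "GET" if path in SAFE else "POST"

    def call(send, path, params, retries=3):
        """Retry only GETs; a POST is sent exactly once."""
        attempts = retries if verb_for(path) == "GET" else 1
        for n in range(attempts):
            try:
                return send(verb_for(path), path, params)
            except IOError:
                if n == attempts - 1:
                    raise

    print(verb_for("/endpoint/FartCounter/get_count"))   # GET
    print(verb_for("/endpoint/FartCounter/increment"))   # POST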
Step 3: Caching is good for you.
Caching is a useful tool to stop websites falling over under load, and web-services should be able to take advantage of it. By adding cache-control headers, we can avoid hammering a service for relatively static information, or responses that don’t need to be accurate. Additionally, Edge caching makes this a tantalising proposition.
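For example, a response for slow-moving data might carry a Cache-Control header like this (the sixty-second max-age is an arbitrary choice):

    import json

    # Step 3: relatively static responses carry Cache-Control, so proxies and
    # edge caches can serve repeats without touching the service.

    def get_count_response(count):
        headers = {
            "Content-Type": "application/json",
            "Cache-Control": "public, max-age=60",
        }
        return headers, json.dumps({"count": count})

    print(get_count_response(42))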
Step 4: Expose the data types.
We can start by exposing the schema at the endpoint, adding a header that says “This response uses this schema”. A great place to do this is in the content-type: field. Instead of just “application/json”, we can use “application/vnd.fartapp.fartcount+json”. Now clients can ask for the version they understand, rather than breaking the API and starting afresh. When you update a schema, both versions can be served concurrently, without breaking old clients.
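A sketch of what that negotiation might look like on the server side; the first media type is the example above, and the “v2” variant is hypothetical:

    import json

    # Step 4: the schema version lives in the media type, and the client asks
    # for the one it understands.

    SERIALIZERS = {
        "application/vnd.fartapp.fartcount+json":
            lambda count: {"count": count},
        "application/vnd.fartapp.fartcount.v2+json":
            lambda count: {"count": count, "unit": "farts"},
    }

    def render(accept, count):
        """Serve whichever representation was asked for; old clients keep working."""
        if accept not in SERIALIZERS:
            return 406, None                          # Not Acceptable
        return 200, json.dumps(SERIALIZERS[accept](count))

    print(render("application/vnd.fartapp.fartcount+json", 42))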
Step 5: Expose the service
You can open a website in a browser and click around to find pages, without having to remember all the URLs in advance. Being able to explore the service, without having the schema in advance means it’s far easier to debug and check what is going on.
If you connect to a web-service endpoint, it can list the classes and methods available. A response can also tell you what methods are available, and you can even add new methods to the list. Old clients can ignore them, but new clients can check the responses and use the new options available. This loose coupling allows for a smooth migration path.
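A sketch of such a discovery document, with made-up names; note how an old client simply ignores the entries it doesn’t recognise:

    # Step 5: the endpoint describes itself, so a client (or a person poking
    # at it) can discover what is available. Structure and names are illustrative.

    DISCOVERY = {
        "FartCounter": {
            "get_count": "/endpoint/FartCounter/get_count",
            "increment": "/endpoint/FartCounter/increment",
            "reset":     "/endpoint/FartCounter/reset",   # newly added method
        },
    }

    client_knows = {"get_count", "increment"}
    available = set(DISCOVERY["FartCounter"])
    print(client_knows & available)   # the old client carries on, oblivious to "reset"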
Step 6: Make the URLs Opaque
Instead of the client knowing where everything is in advance, we can tell the client in the responses where to go next. Websites already take advantage of this: images are often served under a different domain, and URL schemes can change while the interface stays the same. When a service grows too large, it often gets split into different clusters or services. We can break the existing API and make the application track this, or we can put a router in front of the services which knows where to send requests.
We can even run clone services on different domains entirely. The only URL that changes in the client is where it starts; the rest it learns as it goes along, navigating responses.
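A sketch of a client that hardcodes only the entry point; the canned responses stand in for real HTTP calls, and the domains and paths are made up:

    # Step 6: the client hardcodes only the entry point and learns every other
    # URL from the responses it gets back.

    ENTRY_POINT = "https://api.example.com/"

    CANNED = {
        "https://api.example.com/": {
            # the counters now live on another domain entirely; the client doesn't care
            "counters": "https://counters.example.net/FartCounter/",
        },
        "https://counters.example.net/FartCounter/": {"count": 42},
    }

    def fetch(url):
        return CANNED[url]            # stand-in for an HTTP GET

    index = fetch(ENTRY_POINT)
    print(fetch(index["counters"]))   # a URL the client never had to know in advance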
Step 7: Carry state inside the URLs
Clients now open resources, follow links, and submit requests, all without knowing the exact URL schema. This means we can hide things within the URLs, to carry information from one request to the next.
When you go to a forum, click on a thread, read a post and hit reply, you don’t have to repeat yourself and tell the server which thread or post you’re replying to. The link you clicked has this information inside: “/forum/reply?id=12345”. Web services can do a similar trick.
A common tactic to scale out services is to fragment them, or shard them: hosting different parts of the data on different services. With a traditional RPC API, you have two options: either add this state to every method exposed, or put middleware in front to route the requests. With opaque URLs, you have a third option.
Your endpoint can redirect requests to /endpoint?cluster=A, and expose the same information, but the URLs in the response all have that same bit of state inside. Instead of breaking clients, or middleware, you can let the client handle it transparently.
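Sketched out, with made-up names: the server decides the shard, bakes it into the links it hands out, and the client carries it forward without ever understanding it.

    # Step 7: routing state (here a cluster name) lives inside the URLs the
    # server hands out; to the client it is opaque.

    def entry_point(user):
        cluster = "A" if len(user) % 2 == 0 else "B"   # stand-in for a real shard lookup
        return {
            "get_count": f"/endpoint/FartCounter/get_count?cluster={cluster}",
            "increment": f"/endpoint/FartCounter/increment?cluster={cluster}",
        }

    links = entry_point("alice")
    print(links["increment"])   # the ?cluster=... part is just state the client carries along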
So is this ReST? Maybe, but really it’s loose coupling between services, and exploiting HTTP for all that it is worth. By taking these steps we’ve turned a hardcoded set of stubs into a loosely coupled system which can change and adapt without breaking compatibility, and we can debug it too. We’re not hardcoding locations or URLs, but hardcoding request and response types. We’re no longer running a web service, but running a machine-readable website.
Why filesystems have loose-coupling and your protocol doesn’t
Interfaces are a powerful concept in programming: the idea that you can separate the abstract behaviour from the concrete implementation with a protocol: an agreement on behaviour between components.
Good interfaces can lower cognitive overhead, and encourage code-reuse, but they don’t guarantee it. Merely hiding implementation detail in itself does not make libraries easy to compose, or replace. These properties are commonly known as loose coupling, which comes from a larger focus on decomposition, clear management of dependencies, and separation of concerns.
Unfortunately, loose coupling is one of those things where “I know it when I see it”, and it’s hard to break down the constraints and design choices except through hindsight. Thankfully we have plenty of examples, so let’s look at two interface styles, which I will term “RPC style” and “Filesystem style”.
An RPC style API is one with a focus on verbs, IMAP, POP3 and SMTP all fall into this pattern. Like many application protocols, the client issues textual commands and the server responds with numerical codes, but each of these protocols has a different command set and response set. Although you can swap out implementations, they are hard to extend: each new feature requires a new command to be added, subtly changing the base protocol.
Meanwhile, a filesystem style has a focus on nouns, or filenames, with a comparatively small set of operations (open, close, append). This commonality of interface means it’s quite easy to swap out one filesystem with another, or even use network filesystems without substantial changes to interface or code, but you might be lured into thinking they are less extensible because of the fixed command set. If you fix the verbs, how do you extend the system? Through different types of files, and filesystems.
Plan 9 pioneered this approach, making unix’s “most things are files” into “everything is a file”, or more precisely, “all services are filesystems”. Instead of adding new commands (or in unix terms, ioctls), operating system services were exposed as regular files.
For example, network protocols became a directory of connections: /net/tcp/1, /net/tcp/2. To create a new connection, you read /net/tcp/clone, which returned the connection number. Inside the connection directory there were a number of files, like “ctl” and “data”. To configure the connection you wrote to the ctl file, and to send data you wrote to the data file. This turned out to be far more extensible, without having to write all new commands for each service.
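Roughly, in Python, assuming a Plan 9-style /net is mounted (this won’t run on an ordinary unix; the point is that it is all plain file I/O, with no special networking API in sight):

    def dial(addr):
        # reading the clone file allocates a new connection and returns its number
        with open("/net/tcp/clone") as clone:
            conn = clone.read().strip()
        # configure the connection by writing a command to its ctl file
        with open(f"/net/tcp/{conn}/ctl", "w") as ctl:
            ctl.write(f"connect {addr}\n")
        # send and receive by writing and reading the data file
        return open(f"/net/tcp/{conn}/data", "r+b")

    # data = dial("192.0.2.1!80")   # Plan 9 dials host!port style addresses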
New features don’t change the protocol. New services speak the same basic interfaces. Instead of encoding the behaviour in the protocol, the behaviour is determined by the underlying resources or files and how the client interprets them. You don’t need a separate filesystem API for image editors, text editors, or IDEs.
This design is often called the “Uniform Interface Constraint”, and is present in one of the more popular protocols, HTTP. Although it shares text commands and numeric responses with earlier internet protocols, it is far more like a filesystem than an RPC system—a small set of common verbs. HTTP, when used well, can result in amazing amounts of loose coupling: you can use the same caches, proxies and load balancers to speak to a variety of services. Similarly to Plan 9, instead of hard-coding absolute paths to files, files can link to each other.
By comparison, BEEP (an HTTP competitor) set out to provide a generic and extensible protocol for services. Instead of focusing on a common interface, it focused on a common transport layer. It aimed for extensibility, with workarounds for problems with using TCP (pipelining, head-of-line blocking, and multiplexing), unfortunately at a considerable implementation cost.
Although BEEP was designed as a transport for the likes of IMAP and POP3, most new protocols copy those instead, or tunnel everything over HTTP. BEEP hasn’t seen the adoption the designers hoped for, possibly because they felt using XML for connection negotiation was reasonable, but URLs were too complex.
Ultimately, extensibility in a protocol doesn’t come from adding new commands and responses, it comes from being able to use existing commands on different resources. Uniform interfaces are at the heart of loose coupling in systems.
Despite this, many people who use HTTP today ram everything through one resource via POST requests. Although Plan 9 gave us UTF-8 and /proc, the idea of exposing services as filesystems has yet to go mainstream. Building in loose-coupling doesn’t mean that developers will take advantage of it.
When I’m not fighting programmers who are dismissive of learning, I often get into fights with other people in the industry, who seem heartbroken that people might go and learn a thing.
In Nick Marsh’s recent piece, “Why I’m not learning to code”, he opines that great coders are numb machines for delivering code at the expense of all other sensation, and that those learning to code are doing so to gain power, rather than be driven by curiosity or understanding.
Why I’m not learning to code—Because spending time understanding coders is more valuable than understanding code itself.
I think what the people looking for code power really want is to not be confused by code, and to be able to use the power of code to get what they want. But I think learning to code yourself is a very inefficient way of doing this.
If you want to make use of the power of code, and you aren’t already a coder, the best thing you can do in my opinion is to find some people who are really good at coding, and make yourself invaluable to them.
His argument is clear: “Don’t learn to code if you want to build things; hire coders instead”. Confusingly, the same author made a passionate plea for everyone to learn SQL a few weeks later.
It took me a while to realise the power of knowing SQL. I guess it was because I’m so used to asking developers to give me answers to questions I have, or to relying on the user interfaces provided by the big web analytics providers. And that was always good enough.
But, as is always the case, you don’t know what you don’t know – and what I didn’t realise was how hamstrung and limited my understanding of how the apps I was working on worked, and what people were doing with them.
As a result I’ve become totally converted to the idea that everyone working on web products should learn some SQL. […] You don’t need to be a query master to make use of SQL to the highest level. You just need to be aware of how structured data works and be able to write some simple data extraction queries so you can self serve reports as and when you need them.
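For what it’s worth, the kind of query he means really is small. A sketch with a made-up table, using Python’s built-in sqlite3; any SQL database works the same way in spirit:

    import sqlite3

    # An illustrative, self-serve report: table name, columns and data are made up.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE signups (day TEXT, country TEXT, count INTEGER)")
    db.executemany("INSERT INTO signups VALUES (?, ?, ?)", [
        ("2013-06-01", "GB", 12),
        ("2013-06-01", "FR", 7),
        ("2013-06-02", "GB", 15),
    ])

    # "How many signups per day?" -- the sort of question you can now answer yourself.
    for day, total in db.execute(
            "SELECT day, SUM(count) FROM signups GROUP BY day ORDER BY day"):
        print(day, total)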
Although contradictory in titles and outcome, what separates them is the underlying assumption of what code is and what code is for. At first, code is seen as a means of shipping a new iPhone fart app, but later on the author comes to see code as a means of communicating with the computer directly. What started as a defiant streak of wilful ignorance turned into a plea for everyone to learn.
The author isn’t the only one to struggle with this distinction. Even long-time coders ask “Why not learn to plumb?” (and the answer, of course, is: why not?), but code is much more than just a way to funnel shit around: it is a sandbox for learning and experimentation.
In the late 1960s Seymour Papert argued that teaching children to code would give them powerful tools for learning and creativity, and he was right—code can be used to teach complex ideas in mathematics, to help children understand grammar, to create art and music and, most importantly of all, to have fun.
I’m not saying everyone should learn the web framework of the month, or that everyone should be able to recite QuickSort before being allowed to touch a keyboard, but in effect we’re dragging people around art galleries, and chastising them for picking up crayons: If you want art! Hire an artist!
It’s ok that everyone isn’t a poet, or a lawyer, or a journalist, but it’s a good thing that many are literate. It’s also ok that not everyone is an embedded engineer, a web developer, or a data analyst, but it is a good thing for more to be able to express their ideas on a computer.
Code is about communication, expression, and play—not just apps.
I have a thing for film noir detectives, and good debugging stories. This is the latter. Fred (or as I know him, MononcQc) writes another excellent piece on tracking down transient production errors.
The slides and video for D. Richard Hipp’s talk on the development of SQLite.
It’s no secret SQLite is one of my favourite databases. A project with one comment per two lines of code, and where the test data vastly outnumbers the code by more than seven hundred to one—it’s an incredibly reliable piece of software, with a level of engineering discipline I can only aspire to.
Despite some pseudoscience in the talk about left and right brains, there is a clear moral here: Document your code, and document the changes. Write defensive code and thoroughly test it. Fix the processes that lead to bugs, not just the buggy code.
Programming languages grow or stagnate. No language is created perfectly, and often it takes a few years to sort out the good ideas from the bad ideas. Earlier design decisions can hamper the evolution of a language, and often the core developers decide it’s time for a breaking change.
Breaking changes are good for language developers: they remove obstacles to newer features, and eliminate a slew of edge cases. Breaking changes aren’t so good for developers, though: code requires rewriting, libraries require updating. A considerable amount of work to turn a working program into a working program. Backwards incompatibility is a tradeoff between existing and new developers, and between existing and new projects too. It isn’t easy.
Perl 6 is probably the best example of what happens when you abandon backwards compatibility. Perl 5 was burdened with the earlier semantics of the language, and the codebase had the scars to prove it. Larry decided it was time for a breaking change, and left Perl 5 in the hands of the community.
As Larry went on his merry way, creating one specification to unite coders of all backgrounds, Perl 5 languished and CPAN stagnated. Thirteen years later, Perl 5 is finally picking itself up again—with calls for a rewrite, but without changing all of the language at the same time. Perl 5 isn’t dead, but the days of being the duct-tape of the internet are behind it.
A stable and mature Perl 6 implementation may eventually appear within my lifetime; until then it stands as yet another example of the second-system effect. Pretty much every other language has been more successful at managing change, and some don’t even wait for a major version number before making backwards-incompatible changes.
Ruby added “Unicode Support” in 1.9, not by introducing a unicode type, but by attaching an encoding to every String object. Ruby 2.0 almost caused a mutiny by introducing refinements, but eventually the feature was toned down and the developers put down their torches and pitchforks. Despite Ruby’s charge forwards, some codebases remain on 1.8, without the care and attention needed to upgrade.
Meanwhile Python 3 obsoleted many old features, without substantially changing the language, and many changes were backported into 2.7. Unfortunately, Python 3 changed the language and the C API at the same time, creating a chicken and egg problem: Users wouldn’t move until libraries did, and library maintainers wouldn’t move until users did. The process was a bumpy one, but slowly the migration is happening.
When you write a new incompatible version of your language, library, or framework—you are forking it. If you don’t change enough, people will be reluctant to upgrade. If you change too much, it may be easier for developers to move to another language, library, or platform.
Alternatively you can be chained to backwards compatibility forever, and if you have enough money and time, this can work out. For the rest of us, the tradeoffs involved in changes are a very thorny path. It’s easy to poke fun at the problems in hindsight, but I’ve yet to see a language handle evolution gracefully.