Author Archive

The use of bleeding-edge technology in the enterprise can be a daunting prospect. There are bugs to deal with, nuances to learn and third party libraries to overcome. Our team has been dealing with all of these issues over the past few months since one of our clients decided to use node for an upcoming project.

Node provides an event-driven JavaScript engine for the development of server-side applications. If the git logs are to be believed, the first commits to node were made on the 16th of February, 2009. That’s about a year and a half ago as I write this. As such, the surrounding community is still quite young. It’s this immaturity that has presented us with some of our biggest challenges while using node.

One early issue we ran into was template engines: it became apparent rather early on that our choice of template engines for node was somewhat limited. Although there are plenty of projects dedicated to delivering template engines for node, many are of a similar style: clones of HAML, cTemplate, ERB/EJS and a few other weird variants.

Our team started out using json-template, but found it cumbersome to structure our template data in the same way it was to be rendered (amongst other things). We then tried various other template engines and encountered similar issues. The nail in the coffin was that the client wasn’t too keen on the idea of using EJS or HAML style templates.

So we took matters into our own hands and churned out the initial version of jazz over the course of a few evenings.

Jazz is a simple, but powerful templating engine that compiles down to JavaScript at runtime. It supports many features expected of modern templating engines with simple syntax and there are plenty of examples of how to use it. Feedback from the client has been positive, with jazz “just working” where other template engines were causing grief.

Of course, this isn’t to say that there is something wrong with all existing template engines for node: many people on the node mailing list are happy with things like mustache.js and the many HAML-esque engines out there. Our needs/wants are just a little different to what is currently popular in the node community.

So if you’re using node and the more popular template engines weird you out, consider giving jazz a try. You can get the (GPLed) source code using the following git command:

$ git clone git://github.com/shinetech/jazz.git jazz

If you like jazz, please help us make it better! We’re always looking for patches, so drop me a line (thomas [dot] lee [at] shinetech.com) with any improvements.

You’ve probably heard it somewhere already: NoSQL is the new hotness. There are a growing number of weirdly named storage engines out there purporting to be part of the NoSQL movement. This post is the first of a small series about some recent work we’ve been doing with CouchDB. The project is still ongoing, but now seems like an appropriate time to cover what I’ve learned so far. If I’ve made any newbie mistakes, please feel free to flame me in the comments.

Anyway, let’s get on with it then. In this post, I’ll talk a little about the bits of CouchDB that have instantly appealed to me. The good bits.

CouchDB fundamentals are easy to understand

People rave about performance vs. traditional RDBM systems and its powerful replication features, but the thing that really stands out for me is how easy it is to understand and start using CouchDB. I love CouchDB’s simplicity — and I’m not just talking about its neat little web-based user interface, Futon. If you can largely comprehend JSON, JavaScript and HTTP, you’re well on the way to understanding the basics of CouchDB.

In a single CouchDB instance you can have any number of databases. Each database acts as an isolated namespace for a collection of documents. Each document is represented in CouchDB as a JSON object. Documents are not associated with one another in any meaningful way as they might be in an RDBMS. CouchDB databases may also have any number of “design documents”, which are special documents where you may write functions for views and validation functions, among other things.

Users define views in CouchDB by writing functions that map a document to one or more key/value pairs and, optionally, functions to somehow combine the aggregate values associated with each key. This approach is more popularly known as Map/Reduce. Out of the box, CouchDB supports JavaScript for writing Map/Reduce functions using Mozilla SpiderMonkey.

With all this talk about JSON and JavaScript, you might be thinking that most of your interactions with CouchDB are going to involve JavaScript in one form or another — and you probably wouldn’t be far off the mark. By default, CouchDB supports JavaScript for Map/Reduce functions, but is capable of supporting any language for which a query server implementation exists. See CouchDB’s documentation on the topic for supported languages.

CouchDB’s API is HTTP and JSON

Another thing that makes CouchDB great for developers is its RESTful HTTP API and the use of JSON objects to represent documents and collections of documents. This gives us a lot of flexibility when choosing tools — flexibility that we probably wouldn’t have if we were working with custom, binary or closed protocols and/or data formats.

For example, I recently ran into something that many mainstream CouchDB client libraries seem to struggle with: stream-based parsing of JSON arrays. It was a simple matter to quickly hack together a solution to my particular problem using Jackson, JRuby and plain old HTTP. Running into this kind of limitation in a library supporting a closed protocol or even an open-but-unfamiliar binary protocol (as in MongoDB) would have required significantly more effort on my part.

No Architectural Lock-in

CouchDB doesn’t force you into any arrangement of nodes. All CouchDB instances are equal, with no explicit master/slave relationships. Any CouchDB instance can push to any other CouchDB instance. This means you are free to pick an arrangement of Couch nodes that works for your particular application. This is probably to be expected, after all: NoSQL is about choice.

Easy Replication

Replicating data from one CouchDB database to another is a snap. Even if the target database is over a network. Further, replication is very easy to set up and use. I’ve not yet had the opportunity to use this feature in the work we’re doing beyond a few simple tests, but it’s already quite clear that CouchDB’s support for replication is going to make our lives a lot easier when it comes time to scale out.

The Stuff You Get For Free

Append-only writes ensure that data corruption is not an issue.

Automatic conflict resolution ensures that peers can replicate between one another without needing too much hand-holding.

There’s a lot of stuff in CouchDB that you probably don’t experience directly, but that you will often hear touted as benefits. Some of these features are really “behind the scenes” and don’t necessarily jump out to slap you in the face. As a result, I think to a degree we take a some of these more subtle CouchDB features for granted.

A Summary of the Good Bits

The system is conceptually simple. The JSON/HTTP API is beautiful and easy. Replication between nodes is a walk in the park. As a result, you can arrange CouchDB nodes in a way that suit you and your problem. Then, of course, there all the little things that CouchDB does which you will likely take for granted until they save your skin.

There’s a lot to like about CouchDB, but it hasn’t been all roses. In the next part of this series, I’ll talk a little about the aspects of the system that have caused us some pain on a real-world project.

One of my pet hates in software development is repetitive tasks: a complicated deployment process, tricky configuration of an application, repetitive editing motions that are just a little too messy for a find/replace. All of the above have, in the past, killed my concentration, dulled my senses and otherwise numbed my brain. I quickly realized that learning the ins and outs of a few powerful tools can take away a lot of the drudgery that sometimes crops up in my day-to-day work. Further, combining these tools makes it possible to automate just about everything that I need to do on a regular (or even not-so-regular) basis.

What’s important about everything that follows isn’t necessarily the process/tools itself, but what you get out of learning them. This is just what works for me, what has accumulated over a few years.

0. Pick a Good Editor (or IDE)

This first point is a half-hearted gesture because I don’t actually expect anyone to listen to me.

Use vim.

In all seriousness, if your editor doesn’t support basic macros and/or keyboard recording you’re going to encounter situations where you get slowed down by dull, repetitive editing. Even in an IDE that supports refactoring, macros will save you time and brain-dead copy/pastes. Automating repetitive tasks in editing source code is one of the simplest ways you can help yourself.

1. Learn Your Shell (or Learn Your IDE)

I think the Pragmatic Programmer said it well before I even knew what a “shell” was: the command-line is powerful.

Becoming very familiar with a powerful shell is one of the best things you can do for yourself as a developer. Yes, the syntax can be funky and the semantics confusing but over time you come to love it. For me, it’s the Unix shell: in *nix, the shell becomes the glue between all the other tools you use to get work done — including custom scripts and programs that take care of the heavy lifting for you. Windows users might be interested in PowerShell, which is a similar offering from Microsoft — you have the power of the .NET framework at your fingertips.

I understand, too, that folks accustomed to IDEs tend to get a little irked by shell programming. I still strongly recommend giving it a go, but scripting or extending your IDE to automate certain tasks you perform on a regular basis is a great start. If you find you lose a lot of time uploading and modifying configuration files to various servers, you may want to write an Eclipse plugin to make that process as simple as the click of a button.

Whatever path you choose — learning the more exotic aspects of a shell or the plugin API of your favourite IDE — learn it well. The only way to learn it well is to use it all the time.

2. Learn a Dynamically Typed Programming Language

If your shell scripts start to look like a train wreck when you need to perform complex operations, you may want to consider falling back to a scripting language: Perl, Python and/or Ruby are all great choices — bonus points if you mix the three to meet your needs. Languages like these will give you a more structured environment in which to express what you’re trying to achieve (at the cost of it being slightly more difficult to invoke external processes than, say, Bash — a standard Unix shell).

I think it’s important to note that I mention “dynamically typed” languages here because they generally don’t have a separate compilation step: bytecode compilation occurs as part of the execution of a Perl/Python/Ruby program. Thus, a small change doesn’t require an ant/make call to pull your code together again. Remember: you’re trying to automate a process, not create more process. Build management is overhead.

By the same token, if your program needs low-level access to hardware or raw speed for processing large quantities of data — don’t be afraid to fall back to a language like C or Java if you’re sure of your reasons for doing so. Needless to say, situations such as this are rare but not impossible.

3. Appreciation of the Supporting Cast

ssh, scp, rsync, svn/cvs/git, diff, patch, wget, sed, grep, find … the list of handy *nix command-line tools goes on and on. These are smaller tools that — with a good shell — can easily be combined to automate more complicated process. The output of these programs can also be piped into a script of your own doing to perform more complicated operations; then, the output of that program can be piped into another and so on.

This is the big thing I miss when working in Windows: even with PowerShell in play to alleviate the pain that is DOS, it has terrible tool support out of the box. Having the .NET framework on hand is great, but even that’s no substitute for the *nix command-line tools mentioned above. Rant aside, there are ways around most of the shortcomings in Windows and it’s possible to emulate some of the more useful command-line *nix tools. It just means you may need to put in more effort than a *nix counterpart at times.

For folks interested in automating with their IDE, your “supporting cast” might mean third party Java/.NET libraries which your automation plugin(s)/scripts can use to perform more complicated operations.

4. Invest Time in Learning Your Tools

Really, it all comes down to learning the tools from steps 1-3 backwards. Windows, Linux or OSX — you should be an expert in how your tools work and ideally use them on a daily basis. Really getting to know your tools might mean stepping out of your comfort zone a little to learn the nuts and bolts of how your shell ticks, or taking some time out of your day to learn certain esoteric API calls, or the semantics of some new construct in your scripting language of choice.

The goal of such practice is to surround yourself with an easily extensible suite of tools where every action feels like it’s an obvious choice. These tools will help to take away some of the unavoidable “noise” on the fringes of any software development project, and let you get on with getting your job done.

5. Write Lots of Scripts to Automate Common Tasks

This is at the core of automation: combine the above tools to get things done. Copy files to a server using scp, have your script extract and install it via ssh, manipulate configuration files using sed, grep and perl before finally starting the application and storing the process ID in a file with a quick shell command. All without you typing much more than “deploy”.

Automated tasks should be easy to kick off, require little-to-no input from you and — ideally — be fast, so you don’t have to wait long to determine if anything went haywire.

Automate at the First Sign of Resistance

Once you become accustomed to your tools — and again, by no means am I implying that those described here are the best or the only — you’ll step back when you start feel some resistance performing a task throughout your day:

Hmm. I don’t want to have to go through that 12-step deployment process all over again.

It becomes second nature to whip up a script that not only does the complicated deployment but verifies that the application is up and running as expected, checking a process exists and scanning the logs for hints indicating that everything’s fine. Over time you should find your automated tasks are easier to write and “feel” much more reliable than any manual process.

And, hopefully, you’ll be happier for it. :)

Ruby on Rails comes with a lot of nice helper methods for generating the JavaScript driving the AJAX calls to your controllers. Handling the responses from the HTTP server becomes a snap too, with Rails providing a few simple callbacks to handle the response from the server:

But what if the server doesn’t get a chance to respond at all? What if the user’s browser has been unexpectedly disconnected from the Internet? What if your server - god forbid - has crashed?

Most AJAX applications in the wild will simply sit there forever waiting for a response. Certainly, that’s what happened to us when we first tested such a scenario. This is unacceptable: if the browser is unable to communicate with the server, we need to let the user know somehow.

Internally, Rails uses a cross-platform Javascript library called Prototype to do the heavy lifting when making AJAX calls. When you call a helper method like link_to_remote, Rails generates code to instantiate a Prototype Ajax.Request object (or a Ajax.Updater object in some cases) to make the remote call when a certain event occurs. In the case of link_to_remote, this will be when a user clicks on the generated link.

Many of the helper methods accept a common set of options which provide callbacks for handling success or failure using the following callbacks:

  • :success will be executed if the server responds to the AJAX request with any sort of 200 HTTP OK code.
  • :failure will be executed if the server responds to the AJAX request with anything *but* a 200 code.
  • :complete will be executed after the AJAX request has finished, irrespective of the result.

These callbacks can be mapped to corresponding callbacks in Prototype on a one-to-one basis. For the above callbacks, the corresponding Prototype callbacks are called onSuccess, onFailure and onComplete, respectively. Unfortunately, only the “complete” callback will be triggered if some fatal error occurs when communicating the server - “failure” will only be triggered if we get a response back from the remote host indicating a server-side. We have no way of determining what will happen if the browser is unable to communicate with the server at all.

So how do we detect server outage? Well it just turns out that if your browser is unable to communicate with the server, an exception will be thrown. We can actually use this as a basis for detecting server outages and other classes of errors that might occur browser-side. For this purpose, Prototype provides onException which is triggered by your browser if any sort of exception is thrown while the AJAX call is in progress. If we can tap into this callback, we can ensure our AJAX methods will dutifully report any problems that might occur.

Unfortunately Rails doesn’t yet provide access to this callback out-of-the-box, so there is no “:exception” callback available to the Rails helpers. We don’t want to clutter our templates with ugly JavaScript: we’d prefer to keep with the existing Rails conventions if we can. So what can we do?

PrototypeHelper#build_callbacks is used by remote_function to generate the Javascript responsible for making an AJAX call in helper methods like link_to_remote, form_remote_tag and others. It turns out that this little method exposes virtually all the callbacks available to Prototype *but* onException.

This is the hook we’re looking for: here we override and adapt the original code in PrototypeHelper#build_callbacks to add the following method to our ApplicationHelper:

include ActionView::Helpers::PrototypeHelper # ... def build_callbacks(options) callbacks = {} options.each do |callback, code| name = 'on' + callback.to_s.capitalize if CALLBACKS.include?(callback) callbacks[name] = "function(request){#{code}}" elsif callback == :exception callbacks[name] = "function(request,exception){#{code}}" end end callbacks end

This makes the :exception callback available to Rails using the familiar notation seen for the other types of callback. :exception is somewhat different in that it provides the exception object thrown internally available to your callbacks as the “exception” variable. Outside of that, it’s just like any other callback.

With this new callback available to us, we can then override the AJAX helper methods to include our error handling code in ApplicationHelper. As an example, here’s how we’d do it for form_remote_tag:

alias old_form_remote_tag form_remote_tag def form_remote_tag(options={}, &block) # In the event of a network failure (or some other error), we want to display an error message options[:exception] = (options[:exception] || '') + 'alert("Anerror occured while communicating with the server. Please try again later.");' + # Bubble the javascript exception up to prevent the caller from continuing execution (failure # to do this may result in multiple error messages being displayed!). 'throw exception;' old_form_remote_tag(options, &block) end end

With this code in place, any attempts to submit a form via AJAX will include our special error detection code, so we can rest at ease knowing our users will be notified if something goes horribly wrong at the network or language leve.

When using this technique to check, you have to be quite careful in your JavaScript code. Where any language- or browser-level exception that was thrown during the AJAX request was once ignored, it will be caught by the :exception callback. This means syntax errors in the response Javascript, bad content types (e.g. a content type of “text/javascript” with a body of HTML) or other errors will trigger the :exception callback. Unfortunately there doesn’t seem to be an obvious way to get more information about the nature of the error. For our purposes, however, this seemed to be enough if we were careful with the AJAX responses.

A further improvement to this approach might use a timeout mechanism to provide a hard limit for AJAX requests: if the server doesn’t respond with data within X seconds, abort the request and display an error. This would remove the reliance on the browser reporting errors via an exception, resulting in somewhat more robust code - the feedback may not be as immediate as if one were using exceptions, however.

Don’t leave your users out in the dark: keep them informed if things go wrong with an AJAX request.