Monday, July 29, 2013

Nil is not enough

Here's the context: I'm separating out all my data-store-related functions into their own namespace. Specifically, I'm working on my User model. I want to write a function that creates a new user in my data store, but only if there's not already a user with the same username. No problem, right? If the user already exists, I just return nil, the same as for any other error.

Now I'm using the create-user function I wrote above, and I pass in a user name, and I get back nil. Does that mean there was a database error (not enough space left on drive) or does that mean there was already a user with the same name? I want to get additional information out of nil, but nil doesn't contain any information. It's nil!


There are a couple of approaches I can take to this problem. One is inspired by my PHP day job: instead of returning nil, I can return a structure that takes one of the two following forms:

{:status :ok, :data {:username "joe" :password "asdf1234" :etc "etc..."}}
{:status :error, :reason :duplicate-user}

So for my functions, I can just follow a convention: they all return a map with a :status key that I can check to see whether or not the call succeeded. If it did, my data is stored under the :data key, and if not, I can check the :reason key to see why it didn't.
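Here is a minimal sketch of that convention. The users atom is a hypothetical in-memory stand-in for the real data store, and the body of create-user is only my assumption about how the duplicate check might look:

```clojure
;; Hypothetical in-memory store standing in for the real data store.
(def users (atom {}))

(defn create-user
  "Returns {:status :ok, :data user} on success, or
   {:status :error, :reason :duplicate-user} if the username is taken.
   Note: the check and the write are not atomic; this is only a sketch."
  [{:keys [username] :as user}]
  (if (contains? @users username)
    {:status :error, :reason :duplicate-user}
    (do (swap! users assoc username user)
        {:status :ok, :data user})))

;; The caller checks :status first, then reads :data or :reason.
(let [result (create-user {:username "joe" :password "asdf1234"})]
  (if (= :ok (:status result))
    (:data result)
    (:reason result)))
```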

This is fairly versatile, but a bit bulky and roundabout. In PHP this is probably as close as we can get to an optimal solution, but in Clojure we can go one better: if the function fails, just return a keyword instead of the data. For example, in my create-user function, I could return :duplicate-user when the user already exists, or :disk-full-error when the query fails because the disk is full.

Now I can check whether create-user succeeded just by checking whether or not the result is a keyword. If it is, I can drill down and get more information about what went wrong just by checking which keyword I got back.
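Roughly, that might look like the sketch below. Again, the users atom is a hypothetical stand-in for the data store, and describe-result is a helper I made up just to show the caller's side:

```clojure
;; Hypothetical in-memory store standing in for the real data store.
(def users (atom {}))

(defn create-user
  "Returns the new user map on success, or an error keyword on failure."
  [{:keys [username] :as user}]
  (if (contains? @users username)
    :duplicate-user
    (do (swap! users assoc username user)
        user)))

(defn describe-result
  "Hypothetical caller: a keyword result means failure, anything else is data."
  [result]
  (if (keyword? result)
    (case result
      :duplicate-user  "That username is already taken."
      :disk-full-error "The data store is out of space."
      "Unknown error.")
    (str "Created user " (:username result))))
```

The caller needs only one keyword? check to tell success from failure, and a case on the keyword to find out what went wrong.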

Thanks to namespaces, I can use this technique even on functions that are supposed to return keywords. Suppose I had a json-decode function that converts individual strings to keywords. Contrived example, sure, but bear with me. I might want to use :invalid-format to signal errors in the decoding process, but if someone sent the string "invalid-format", I'd get a false error message.

If I use a namespaced keyword like :json-error/invalid-format, no problem, or at least a much-reduced problem. Collisions are still possible, but are much less likely to happen by accident.
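As a rough sketch of the idea, with the error written in Clojure's namespace/name form as :json-error/invalid-format: decode-keyword and decode-error? are hypothetical names, and treating a blank or non-string input as the "invalid format" case is just an assumption for the example:

```clojure
(defn decode-keyword
  "Hypothetical decoder: turns a non-blank string into a keyword,
   or returns a namespaced error keyword for anything else."
  [s]
  (if (and (string? s) (seq s))
    (keyword s)
    :json-error/invalid-format))

(defn decode-error?
  "True when x is a keyword in the json-error namespace."
  [x]
  (and (keyword? x) (= "json-error" (namespace x))))

(decode-error? (decode-keyword "invalid-format")) ; plain keyword, not an error
(decode-error? (decode-keyword ""))               ; namespaced error keyword
```

A collision now takes an input like "json-error/invalid-format", which is far less likely to show up by accident.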

I don't recall seeing this approach much in the existing libs I've used, but I think it has the potential to avoid some of the issues that arise from nil punning (Stuart Sierra gives a good example). Why don't more people use it? Is it because they haven't thought of it? Is it because they'd rather write (if result...) instead of (if (keyword? result) ...)?

I'm going to start using this in my own code, so I guess I'll find out first-hand. Either way, your comments are welcome.

2 comments:

  1. The idiomatic approach in PHP and Java would be to throw an exception. In something like Scala or Haskell you'd return a tagged union such as Either, or you'd create your own.

    I do Scala at my day job, and I resorted to my own tagged unions precisely for the reason you mentioned: I want more data about why something failed. I do one more thing, though: I include relevant diagnostic data in the structure I return. For example (something I worked on today, actually), if some limit has been reached, I'll put that limit in the "error object" I'm returning. This is very useful with optimistic concurrency, where by the time you get some error, the values may have changed already.

    For this particular reason, adding diagnostic data to error conditions, I think I'd keep returning a map or some other compound structure.

  2. That sounds like a good upgrade from the simple keyword result. I also thought of another approach that would work with PHP, assuming you didn't want to throw exceptions: define a class to serve as an error indicator. I was thinking "Flag", for brevity.

    if (empty($dbQueryResult)) {
        return new Flag("No data");
    }

    ...

    if ($returned instanceof Flag) {
        // handle error...
    }

    And so on.
