Sunday, November 20, 2011

Re-formatting variable names

(Series: From PHP to Clojure)

Here's a simple problem: as part of my Salty lib, I want a function that will take camel-case variable names, as used by Java, and convert them to the dash format that's idiomatic in Clojure. In other words, I want a function that will take the string "someVariableName" (camel case) and return "some-variable-name" (with dashes).


For comparison, let's do this in PHP first. Here's a simple approach.

function camelToDash($s) {
    $dash = ''; 
    foreach(str_split($s) as $c) {
        if(ctype_upper($c)) {
            $dash .= '-' . strtolower($c);
        } else {
            $dash .= $c;
        }
    }
    return $dash;
}

We take a string, split it into individual characters, and process each character in a loop. If it's an upper case character, we add a dash followed by the lower-cased form, otherwise we just add the (already lower case) character. We had to introduce a local variable to hold the result, and then explicitly return it. And there's a bit of a quirk, in that any strings that begin with an upper case letter will generate a name that begins with a dash. We'll ignore that in this case, because (a) we're only going to pass in strings with a lower case initial character, and (b) in Clojure it's perfectly fine for function names to start with a dash.

Here's the equivalent Clojure code.

;; lower-case is clojure.string/lower-case; the :require/:use lines are omitted here.
(defn camel-to-dash
  [s]
  (apply str (map #(if (Character/isUpperCase %)
                     (str "-" (lower-case %))
                     %)
                  s)))

That's not quite as concise as it could be, mostly because we're borrowing the Java isUpperCase() method from java.lang.Character. Functionally, though, it's roughly the same as the PHP version. Let's unravel it.

The basic approach here is that we're using (map) to apply a function to each of the characters in the string, which gives us a list of results. Then we use (apply str) to turn the list of results back into a single string. This is a common pattern in Clojure, so I wanted to use it here for comparison with PHP's foreach loop. And there are some interesting differences.

Notice, for example, that in Clojure, I didn't need to explicitly split the string into individual characters. Clojure strings are "seqable," which means they can be treated as a sequence of characters, much like a list. The (map) function takes a function and a collection, and applies the function to each element of the collection. Since (map) can walk a string directly, I didn't need to do anything special to use it as the second argument to (map).
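Here is a small REPL sketch of that point. It uses only clojure.core and Java's Character class, and the upper-casing function is just a stand-in for illustration, not anything from Salty.

```clojure
;; A string is not itself a seq, but (seq) can produce one from it.
(seq? "abc")   ; false
(seq "abc")    ; (\a \b \c)

;; (map) calls (seq) on its collection argument, so no explicit split is needed.
(map #(Character/toUpperCase (char %)) "abc")   ; (\A \B \C)
```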

Notice also that in PHP, I had to create two new variables, $c and $dash, to hold the current character and the accumulated string. In Clojure, I didn't need to do that. The (map) function implicitly knows that it is processing a list, taking one element at a time, and accumulating the results in another list. The whole thing is like a pipeline: a string goes in one end, each character in the string runs through the pipeline, and comes out the other end transformed, where it is collected back into a new string again.
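To make the pipeline visible, here is a sketch that stops halfway and looks at what (map) produces before (apply str) joins it up. The names dash-char and pieces are made up for this illustration, and I'm assuming Character/toLowerCase in place of lower-case so the snippet needs no extra namespace.

```clojure
;; Hypothetical helper: same decision as the anonymous function in camel-to-dash.
(defn dash-char [c]
  (if (Character/isUpperCase (char c))
    (str "-" (Character/toLowerCase (char c)))
    c))

;; The middle of the pipeline: a lazy seq that mixes characters and strings.
(def pieces (map dash-char "someName"))
;; pieces => (\s \o \m \e "-n" \a \m \e)

;; The end of the pipeline: everything concatenated into one string.
(apply str pieces)   ; "some-name"
```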

Now let's look at that inner function:

#(if (Character/isUpperCase %)
   (str "-" (lower-case %))
   %)

This is an anonymous function: in Clojure, you don't have to define every function that you use. For simple things, you can define a function on-the-fly and pass it to other functions like (map). In this case, we've got a very simple thing to do: we want to test the current character, and do one thing if it's upper case, and another if it isn't---a simple if/then/else. The (if) statement in Clojure looks like this.

(if predicate?
   true-expression
   else-expression)

In other words, evaluate the predicate? (test) expression; if it yields anything other than false or nil, then evaluate true-expression, otherwise evaluate else-expression. In our code, the predicate is the expression (Character/isUpperCase %). The percent sign (%) is a placeholder for the first argument to the anonymous function. If we had more than one argument, the second would be %2, followed by %3, and so on. It's just a built-in shortcut to save us the trouble of giving explicit argument names inside an anonymous function.
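A few throwaway definitions show how the shorthand lines up with the long form. The names inc-a, inc-b, inc-c and subtract are invented for this sketch.

```clojure
;; These three forms define the same one-argument function.
(def inc-a #(+ % 1))
(def inc-b #(+ %1 1))
(def inc-c (fn [x] (+ x 1)))

;; With two arguments, % (or %1) is the first and %2 is the second.
(def subtract #(- % %2))
(subtract 10 3)   ; 7

;; (if) treats every value except false and nil as true.
(if 0 "then" "else")     ; "then"
(if nil "then" "else")   ; "else"
```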

Thus (Character/isUpperCase %) says, "Take the current character in the string, apply the isUpperCase method from the java.lang.Character class to it, and return the boolean result." If this result is true, we evaluate the true clause of the if statement, which in this case is (str "-" (lower-case %)). This expression says, "apply the Clojure string function lower-case to the current character, and return a string consisting of a dash plus the lower-cased character." If the predicate returns false, on the other hand, just return the current character unchanged.

In a nutshell, we have an anonymous function that looks at a character and returns dash-plus-lower-case if it's upper case, or just the character itself if not. We then map this anonymous function over all the characters in the string s, giving us an anonymous list of results, and then call (apply str) on this list, which concatenates everything back into a string again.
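For anyone who wants to paste something into a REPL, here is a self-contained sketch of the same idea. It is not the code from Salty: the name camel->dash is hypothetical, and I'm assuming Character/toLowerCase instead of lower-case so that no :require is needed.

```clojure
;; Hypothetical name, chosen so it doesn't clash with camel-to-dash above.
(defn camel->dash
  [s]
  (apply str
         (map (fn [c]
                (if (Character/isUpperCase (char c))
                  (str "-" (Character/toLowerCase (char c)))
                  c))
              s)))

(camel->dash "someVariableName")   ; "some-variable-name"
(camel->dash "SomeName")           ; "-some-name" (the leading-dash quirk noted for the PHP version)
(camel->dash "parseURL")           ; "parse-u-r-l" (each capital gets its own dash)
```

The last two calls show the edge cases this simple approach doesn't try to handle.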

In some ways, the PHP code is doing the same thing, but from a slightly different perspective. The PHP code is imperative and iterative: you tear down a string, and loop through a procedure that builds a new string based on the old one. The Clojure code, by contrast, is functional: it describes the result as a transformation mapped over the characters and joined back together, with no loop and no variable that changes along the way.

If you were to describe what the Clojure code is doing, you might say, "It applies the str function to the results of mapping a transformation function over each character of the string." That's eerily similar to the code itself: (apply str (map f s)). In Clojure, the description of what you want to do turns out to be pretty much the code for doing it. That is way cool, and it's a taste of how playing with Clojure can change how you think about coding. (Did you know PHP has an array_map() function?)

The Clojure example isn't necessarily the best way to de-camelize a string in Clojure, but I wanted to keep it as parallel to the PHP base as possible. Clojure enthusiasts, feel free to suggest more idiomatic alternatives in the comments. Meanwhile, I'm going to see if I can get SyntaxHighlighter working on this blog. Bear with me if the coloring looks funny in the short term, here.

4 comments:

  1. It's great to see Clojure experimentation going on!

    How about using a regex pattern match:

    (require '[clojure.string :as str])

    (str/replace "camelCase" #"[A-Z]" #(str \- (str/lower-case %)))

  2. Awesome, that's actually the code I wanted to write, but I didn't know you could pass an anonymous function as the third arg to (replace). Thanks!

  3. By the way, this code appears in Salty's find.clj file, which has the :require and :use statements needed to make it work. I left some of those details out of the blog post in the interests of minimizing distracting details.

  4. Heh. I have to admit that I didn't know replace could also take a function either, until I checked the docs; I'm still learning too... but this is why blog posts and public discussions are so great - they stimulate this kind of discovery :-)
