Sunday, November 27, 2011

What I've learned from Salty

I spent the weekend hacking at my Salty library for Clojure. Salty is a thin wrapper around the Selenium WebDriver framework, and it allows you to programmatically control a web browser like Firefox, as I've mentioned before. It turns out there is already a Clojure lib for WebDriver, clj-webdriver, that is full-featured, easy to use, and very mature. For that reason, I'm probably not going to invest too much more effort in Salty. But that's fine: I mostly wanted to try this as a learning exercise, and clj-webdriver makes it easier for me to check my homework. Here are a few notes on what I think I got right and what I think I got wrong.

Looking at Salty vs clj-webdriver, the biggest thing I notice that I got wrong is sticking too close to the spirit of Java. The Java library has a zillion and one functions for everything, and so, pretty much, does Salty. clj-webdriver, by contrast, exposes an elegantly simple DSL for working with web pages programmatically. Here, for example, is how you fill in a form using clj-webdriver:


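;; Assumes `b` is an already-open browser and that `click` has been
;; referred into the current namespace (e.g. from clj-webdriver.core).
(require '[clj-webdriver.form-helpers :as form])

(form/quick-fill b [{{:class "text", :name "login"}     "username"}
                    {{:xpath "//input[@id='password']"} "password"}
                    {{:value #"(?i)log"}                click}])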

quick-fill takes a browser and a vector of maps. In each map, the key is itself a map that identifies which input to target, and the value is either the text you want to type into that field or an action, such as click, to apply to it. My Salty lib would let you accomplish the same results, but in a more piecemeal fashion.

(require '[salty.ui :as ui]
         '[salty.find :as find])
    
(-> (find/find-by-css ".text[name='login']")
    (ui/type-into "username"))
(-> (find/find-by-id "password")
    (ui/type-into "password"))

Salty doesn't have the equivalent of locating an input by value, so I can't even do the last one as written. Wouldn't be too hard to add, but it wouldn't fix the fundamental problem: my approach is a lot closer to the metal than clj-webdriver, and thus my code is harder to use. What really hurts is that I was specifically trying to write a more domain-friendly API! Years of imperative programming are coming back to haunt me.

I do feel rather pleased with a couple of Salty features. One is the dual interface. In salty.core, you can write

(with-browser :firefox :at "http://www.google.com/"
  (-> (find/find-by-css "input.button")
      (ui/click)))

Or, via the salty.impl namespace, you can access the close-to-the-metal versions, which require you to pass the driver/browser as the first argument. In other words, there's a version where you use (with-browser) to set up a default browser to use with all enclosed salty functions, and there's a version where you specify the browser as the first argument to every function (allowing you to, say, script an interaction where more than one browser is active at a time). That strikes me as being the right balance between ease-of-use and flexibility/control.
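
As a rough sketch of how a dual interface like this can be put together, here is a version built on a dynamic var, with a plain map standing in for the real driver. Every name here (*browser*, click*, the :log atom) is hypothetical and chosen for illustration; this is not Salty's actual code, and this with-browser takes a browser value rather than the :firefox :at form shown above.

```clojure
;; Hypothetical stand-ins, not Salty's real implementation.
(def ^:dynamic *browser* nil)

;; "impl"-style function: the browser is always the first argument.
(defn click*
  [browser selector]
  (swap! (:log browser) conj [:click selector])
  browser)

;; "core"-style function: uses whatever browser with-browser has bound.
(defn click
  [selector]
  (click* *browser* selector))

(defmacro with-browser
  "Binds browser as the default for every core-style call in body."
  [browser & body]
  `(binding [*browser* ~browser]
     ~@body))

;; Plain maps stand in for real driver objects; :log records what was "done".
(def firefox {:name :firefox :log (atom [])})
(def chrome  {:name :chrome  :log (atom [])})

;; Implicit style: one default browser for the whole block.
(with-browser firefox
  (click "input.button"))

;; Explicit style: two browsers driven side by side.
(click* firefox "#next")
(click* chrome "#next")
```

The implicit functions are one-line wrappers over the explicit ones, so the two styles cannot drift apart.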

I also had fun implementing a Salty wrapper around the Selenium Actions class. The Actions class has methods that let you build up a complex composite series of actions, like "click on this element with the control key pressed, and drag down by 32 pixels, and release it." The design pattern being used requires you to create an Actions object first, then add all the composite actions to it, then call perform() at the end to actually execute the actions you've set up. Each new action takes the Actions object as its first argument, so it's perfect for the -> operator.

In other words, we're looking at an ideal use case for a macro. The arguments are going to consist of one or more actions, each of which must not be evaluated until the middle of the body of the macro. Plus, we have to create the initial Actions object first, and then call .perform at the end; if we forget the .perform, nothing's going to happen, no matter how many actions we specify. So I wrote a "run" macro that would take a list of actions and then execute them correctly.
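
A minimal sketch of what such a macro might look like, assuming a plain map in place of Selenium's Actions object so it can run without a browser. The helper names (new-actions, click-on, move-by, perform) are hypothetical stand-ins, and this is not necessarily how Salty's run is written.

```clojure
;; Stand-in for Selenium's Actions builder: a map that queues up steps.
(defn new-actions [] {:steps []})

;; Each action takes the actions object first, so it threads with ->.
(defn click-on [actions el]    (update-in actions [:steps] conj [:click-on el]))
(defn move-by  [actions dx dy] (update-in actions [:steps] conj [:move-by dx dy]))

(defn perform
  "Stand-in for .perform: here it just returns the queued steps in order."
  [actions]
  (:steps actions))

(defmacro run
  "Threads a fresh actions object through each action form, then performs.
  The action forms are spliced in unevaluated, so nothing runs early and
  the final perform can never be forgotten."
  [& actions]
  `(-> (new-actions)
       ~@actions
       (perform)))

;; Expands to (-> (new-actions) (click-on :button) (move-by 0 32) (perform))
(run (click-on :button)
     (move-by 0 32))
;; => [[:click-on :button] [:move-by 0 32]]
```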

The catch was that some of the actions were things like "hold down the control key" and "let up on the control key," with a warning on the latter that the behavior would be undefined if you called keyUp() with no preceding keyDown(). That sounds bad, so once again, I thought a macro would be a good solution. I wrote macros for with-shift, with-control and with-alt that would wrap around any arbitrary list of actions. Each macro presses and holds the key, executes the given set of actions, and then releases the key.

The catch here was that with-shift and friends all have to run inside my "run" macro, so they have to return a function that fits a particular signature. It took a few iterations of experimentation, but I finally got the arguments and everything in the right place so it would compile. (I never got to see it actually execute the commands, though, so either there's still a bug somewhere, or else I might not have Selenium installed right.)
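
One possible shape for such a wrapper, sketched under the same assumption of a plain map standing in for the Actions object: the macro accepts the threaded actions object as its first argument, so it slots into a -> chain like any other action. The names key-down, key-up and click-on are hypothetical stand-ins, and this may well differ from the signature Salty's macros ended up with.

```clojure
;; Stand-in actions: each one queues a step on a plain map.
(defn key-down [actions k]  (update-in actions [:steps] conj [:key-down k]))
(defn key-up   [actions k]  (update-in actions [:steps] conj [:key-up k]))
(defn click-on [actions el] (update-in actions [:steps] conj [:click-on el]))

(defmacro with-shift
  "Presses shift, runs the wrapped actions, then releases shift.
  Takes the actions object first so it can sit inside a -> chain, and
  guarantees that every key-up is preceded by a matching key-down."
  [actions & body]
  `(-> ~actions
       (key-down :shift)
       ~@body
       (key-up :shift)))

(-> {:steps []}
    (with-shift (click-on :first-row)
                (click-on :last-row))
    :steps)
;; => [[:key-down :shift] [:click-on :first-row] [:click-on :last-row] [:key-up :shift]]
```

Because the press and release are emitted by the same macro, there is no way to write an unbalanced keyUp by hand.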

What have I learned (besides the importance of doing a thorough search for existing libraries)? My main take-away has been that you get the best answers when you ask the right questions. I started this project with the idea that I was going to wrap a Java lib in Clojure, and the result, despite my conscious effort to write a domain-friendly DSL, ended up being a Java lib in Clojure. Had I approached from the domain side ("How can I control a web browser to do integration testing?"), I think I might have come up with an API closer to clj-webdriver.

I'll probably play with this a little more, just to see if I can get the (run) and (with-shift) stuff working the way I want. It might even make a worthy addition to clj-webdriver at some point. Meanwhile, I had a blast, and learned a few things in the process.
