Selenium WebDriver Explained

Providing the best UI automation support for development teams.

Sebastian Monte

Software Engineer in Test

Posted on Mar 16, 2016

The testing team at Zalando uses and extends the Selenium library in order to provide the best UI automation support for development teams. Selenium is a big library with many technologies bundled in it: browser drivers, the WebDriver Protocol, Selenium clients and so on. If you feel lost in the Selenium jungle, keep on reading. After this post you’ll feel a bit more confident about using Selenium, with a basic understanding of how it works. This post demonstrates what Selenium does under the hood when we request a URL and get the title of an HTML page.

Our Simple Test

Let’s start with a simple test that opens https://www.zalando.de in Chrome and checks the page title.

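But when we try to run the test, it doesn’t work. Selenium throws an IllegalStateException with the message that the path to the driver executable must be set by the webdriver.chrome.driver system property.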

Selenium wants to know the path to a ChromeDriver. But what is a ChromeDriver, exactly?

The release of Selenium 2 included the introduction of WebDriver, a tool that’s responsible for controlling the browser running the automated tests. The WebDriver for Chrome, ChromeDriver, is a separate download. When you execute the driver, a message in the terminal reports that ChromeDriver has started on port 9515.

The ChromeDriver reserves a port on the machine it’s running on; in this case port 9515. All WebDrivers work in a similar fashion: they start up a server that receives commands for controlling the browser (for example, setting a browser cookie). Commands must follow the WebDriver Protocol format, which defines a RESTish JSON API that all WebDrivers must implement. Most programming languages have decent libraries for JSON and HTTP, so clients interacting with the WebDriver are relatively easy to implement.

Now let's see how we can use the WebDriver Protocol to get the page title of https://www.zalando.de. First, we execute the ChromeDriver and wait for the server to start. When the server is ready, we can create a new browser session by sending a POST request to http://localhost:9515/session. The JSON body of the request describes the capabilities we want from the browser.
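Here is how that request could be put together in Java with nothing but the JDK's HTTP client (Java 11 or later). This is only a sketch: the class name NewSessionRequest is made up for this post, and the body assumes the legacy desiredCapabilities format with a single browserName entry. Newer, W3C-style drivers expect a capabilities object instead.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NewSessionRequest {
    // Assumption: ChromeDriver listens on port 9515, as in the text.
    static final String BASE_URL = "http://localhost:9515";

    // Assumption: a minimal legacy-style body that only asks for Chrome.
    static final String BODY =
            "{\"desiredCapabilities\": {\"browserName\": \"chrome\"}}";

    // Builds the POST /session request without sending it.
    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/session"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    // Sends the request and returns the raw JSON response.
    // Needs a ChromeDriver that is already running on port 9515.
    static String send() throws Exception {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(build(), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling NewSessionRequest.send() against a running ChromeDriver should return the raw JSON response that carries the sessionId.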

The server responds with a sessionId—in this case, e4b3adf2fe9b10fbabd6611d1bb50c93. Next, we want the browser to navigate to https://www.zalando.de:

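POST http://localhost:9515/session/e4b3adf2fe9b10fbabd6611d1bb50c93/url

The JSON body of this request carries the address to open in its url field.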
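The navigation command can be sketched in Java with the JDK's HTTP client as well. NavigateRequest is a hypothetical helper, not part of Selenium, and its hand-rolled escaping only covers backslashes and double quotes; a real client would use a JSON library.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class NavigateRequest {
    // Assumption: ChromeDriver listens on port 9515, as in the text.
    static final String BASE_URL = "http://localhost:9515";

    // Builds the JSON body {"url": "<target>"}.
    static String body(String targetUrl) {
        // Escape backslashes and quotes so the value stays valid JSON.
        String escaped = targetUrl.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"url\": \"" + escaped + "\"}";
    }

    // Builds POST /session/:sessionId/url without sending it.
    static HttpRequest build(String sessionId, String targetUrl) {
        URI uri = URI.create(BASE_URL + "/session/" + sessionId + "/url");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(targetUrl)))
                .build();
    }
}
```

The resulting request can be sent with HttpClient.newHttpClient().send(...) once a session exists.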

When the command is sent, the browser navigates to https://www.zalando.de/ just as a real user would. Our final step is to GET the page title:

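GET http://localhost:9515/session/e4b3adf2fe9b10fbabd6611d1bb50c93/title

The response is a JSON document that carries the page title in its value field.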
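A rough Java sketch of this last step follows. TitleRequest is a made-up name, and the regular expression is a shortcut that only finds a string-valued value field; it does not decode escape sequences and is no substitute for a JSON parser.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleRequest {
    // Assumption: ChromeDriver listens on port 9515, as in the text.
    static final String BASE_URL = "http://localhost:9515";

    // Matches "value": "<string>" and captures the string as written.
    private static final Pattern VALUE =
            Pattern.compile("\"value\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Builds GET /session/:sessionId/title without sending it.
    static HttpRequest build(String sessionId) {
        URI uri = URI.create(BASE_URL + "/session/" + sessionId + "/title");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // Pulls the string in the "value" field out of a response body.
    static String extractValue(String responseJson) {
        Matcher m = VALUE.matcher(responseJson);
        if (!m.find()) {
            throw new IllegalArgumentException("no string value in response");
        }
        return m.group(1);
    }
}
```

Passing the body of the driver's response to extractValue gives back the text of the title.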

And there we have it, our title :)

Usually developers do not use this API directly, but instead download a client library for their programming language. The languages that Selenium officially supports are Java, C#, Ruby, Python and JavaScript (Node).

Java Client

Our team mostly works with Java, so let’s take a deeper look at the Java client. Now that we have covered how Selenium works with the REST API, you hopefully have a better idea of how a Selenium client works: basically, as an HTTP client for the WebDriver server. When we create a new ChromeDriver with new ChromeDriver(), we end up launching the ChromeDriver executable that we downloaded earlier. The call goes through the startSession(Capabilities desiredCapabilities, Capabilities requiredCapabilities) method, and a new browser session is created. Eventually this method makes a POST request to http://localhost:9515/session.

Likewise, navigating to a page with the driver's get method results in an HTTP POST request to http://localhost:9515/session/:sessionId/url. If you are interested, you can check how Java code is mapped to WebDriver Protocol URLs in the JsonHttpCommandCodec class.
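To get a feel for what such a mapping looks like, here is a toy version in plain Java. It is not Selenium's actual code: CommandTable, Endpoint and the command names are assumptions made for illustration, and only the three endpoints used in this post are listed.

```java
import java.util.Map;

public class CommandTable {
    // One protocol endpoint: an HTTP method plus a URL template.
    static final class Endpoint {
        final String method;
        final String template;

        Endpoint(String method, String template) {
            this.method = method;
            this.template = template;
        }

        // Fills in the session ID and prefixes the server address.
        String resolve(String baseUrl, String sessionId) {
            return method + " " + baseUrl
                    + template.replace(":sessionId", sessionId);
        }
    }

    // Hypothetical command names mapped to the endpoints from this post.
    static final Map<String, Endpoint> COMMANDS = Map.of(
            "newSession", new Endpoint("POST", "/session"),
            "get", new Endpoint("POST", "/session/:sessionId/url"),
            "getTitle", new Endpoint("GET", "/session/:sessionId/title"));

    // Returns e.g. "GET http://localhost:9515/session/<id>/title".
    static String describe(String command, String sessionId) {
        Endpoint endpoint = COMMANDS.get(command);
        if (endpoint == null) {
            throw new IllegalArgumentException("unknown command: " + command);
        }
        return endpoint.resolve("http://localhost:9515", sessionId);
    }
}
```

A table like this keeps the client methods thin: each one only names a command and supplies its parameters, while the lookup decides which HTTP method and URL to use.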

Inside the Chrome WebDriver

So far we have covered what happens on the client side. Now, let’s look at what goes on inside the Chrome WebDriver server when it accepts WebDriver Protocol messages. For example, what happens when we send a GET request to http://localhost:9515/session/e4b3adf2fe9b10fbabd6611d1bb50c93/title?

To find out, I cloned the Chromium repo. After some searching for code that handles incoming WebDriver Protocol requests, I found the actual file at: chrome/test/chromedriver/server/http_handler.cc. The file maps different WebDriver Protocol URLs to C++ functions, including one for /session/:sessionId/title.

That function retrieves the document title with a JavaScript function. To run it, web_view->CallFunction uses the Remote Debugging Protocol to send a command to the Chrome browser. This works in a similar fashion to the WebDriver Protocol in that the commands are sent in JSON format, but in this case, they’re targeted to Chrome’s remote debugging port.

Here’s an example of a Remote Debugging Protocol message that gets the title of a page:

{ "id": 2, "method": "Runtime.evaluate", "params": { "expression": "(function() { return document.title;}).apply(null, [null, null, null])" }}

It is possible to evaluate any JavaScript expression on the global object with the Runtime.evaluate method. No magic here!
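Since our team works mostly in Java, here is a small Java sketch that assembles such a message. EvaluateMessage and both of its methods are hypothetical names, the wrapping simply mirrors the expression shown in this post rather than ChromeDriver's own C++ code, and the escaping handles only backslashes and double quotes.

```java
public class EvaluateMessage {
    // Wraps a JavaScript function so that evaluating the expression
    // calls it, in the same shape as the example message in this post.
    static String wrapFunction(String functionSource) {
        return "(" + functionSource + ").apply(null, [null, null, null])";
    }

    // Minimal JSON string quoting: backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds a Runtime.evaluate message with the given id and expression.
    static String build(int id, String expression) {
        return "{\"id\": " + id
                + ", \"method\": \"Runtime.evaluate\""
                + ", \"params\": {\"expression\": " + quote(expression) + "}}";
    }
}
```

Calling build(2, wrapFunction("function() { return document.title;}")) produces a message with the same fields as the one above.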

Conclusion

Many things happen when we verify that a page has the correct title. Luckily for us, Selenium offers simple-to-use WebDriver APIs for different browsers in various languages that hide all the complexity. Want to learn more about Selenium WebDriver? Tweet us @ZalandoTech to let us know what you want our team to write about next!


