Wikimedia Foundation employee Oliver Keyes in March 2012.

WhoAPI case study: Wikimedia

Oliver Keyes is a programmer and Human-Computer Interaction researcher who works for the Wikimedia Foundation. He’s also the author of the WhoAPI R client package.

About a year ago, I was tinkering around with API client packages in R – I like exposing data and making it available. At the same time, I’d been approached by our lawyers wondering if I could (in my spare time) automate their trademark checks. These things combined nicely into “what if we could automatically grab WHOIS data?”

There are a lot of WHOIS services out there, many with APIs attached, but I was immediately impressed by WhoAPI’s API – a simple, uniform interface with trivial authentication and clear documentation. I was even more impressed when I dropped the company a note and woke up to an email from the CEO asking what they could do to make their service easier to integrate with, and whether I’d had any problems doing so thus far. At that point, we parted ways for a while, simply because my spare time got thinner on the ground.

Once some time freed up, though, I was back at it. The legal use case had vanished into the past, but making information available was still a goal, and a particularly useful one here, since lowering the barrier to information security work and user tracing is always a good thing. The result is the whoapi client package for R, currently on GitHub and slated for a CRAN release shortly.

I’ve now written 7 or 8 client packages, along with a host of other R packages, and WhoAPI has been one of the easiest to write, if not the easiest – the entire package comes to 63 lines of code, not counting inline documentation. This is largely thanks to the API’s design: authentication is checked as part of the query itself, results come back in a sensible JSON format that marks missing values explicitly rather than dropping those key-value pairs, and server-side error messages are consistently surfaced – also as JSON blobs – so client-side error handling consists largely of passing that JSON along. The semantic structure of the API makes it easy to predict how to access particular functionality, and the documentation is clearly written and extensive.
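To make the error-handling point concrete, here is a minimal sketch in Python (not the package's R code) of a client that simply hands the server's JSON back as the error. The field names `status` and `status_desc`, the convention that `"0"` means success, and the helper names `ApiError` and `parse_response` are all assumptions made for illustration.

```python
import json


class ApiError(Exception):
    """Carries the server's JSON error payload unchanged."""

    def __init__(self, payload):
        super().__init__(json.dumps(payload))
        self.payload = payload


def parse_response(body):
    """Decode a JSON response body, raising ApiError if it reports a failure."""
    payload = json.loads(body)
    # Assumption: "status" is "0" on success and something else on failure.
    if str(payload.get("status", "0")) != "0":
        raise ApiError(payload)
    return payload


# Missing values arrive as explicit nulls, so every expected key is present.
ok = parse_response('{"status": "0", "taken": "1", "registrar": null}')
print("registrar" in ok, ok["registrar"])

# A failed request needs no client-side interpretation: the JSON is the message.
try:
    parse_response('{"status": "12", "status_desc": "Invalid API key"}')
except ApiError as err:
    print(err.payload["status_desc"])
```

Because absent values are sent as nulls rather than omitted, the caller can index a key directly without first checking that it exists.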

Whois API with JSON or XML

The package doesn’t cover all of WhoAPI’s functionality, because some of it isn’t the sort of thing R programmers are likely to be interested in, but it does cover most of it. The first step is creating a token with whoapi_token, which stores your API access key and user agent. The key gets you access to the service; the user agent keeps you respectful of it, letting WhoAPI distinguish users and reach out if someone is unintentionally causing a problem. If you don’t specify a user agent, a default one is set, and then they come shout at me instead ;p.
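The token idea is small enough to sketch. The following is an illustrative Python version, not the R implementation of whoapi_token: the names `Token`, `make_token` and `DEFAULT_USER_AGENT`, and the default string itself, are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical fallback; the package's real default user agent is not shown here.
DEFAULT_USER_AGENT = "whois-client-sketch (default user agent)"


@dataclass(frozen=True)
class Token:
    """Bundles the two things every request needs: who pays, and who is asking."""
    key: str
    user_agent: str


def make_token(key, user_agent=None):
    """Build a token, falling back to the default user agent if none is given."""
    if not key:
        raise ValueError("an API key is required")
    return Token(key=key, user_agent=user_agent or DEFAULT_USER_AGENT)


# Supplying your own user agent lets the service contact you, not the package author.
token = make_token("demokey", user_agent="trademark-checker (me@example.org)")
print(token.user_agent)
```

Making the token immutable means it can be created once and handed to every later call without worrying that one of them changes it.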

Once you’ve created that token, it can be passed through to the functions that directly interface with WhoAPI. For direct WHOIS and certificate information, there are four: is_taken returns a simple boolean indicating whether a domain is reserved or held; is_blacklisted provides both a boolean check on whether the domain is blacklisted by anyone and a more granular breakdown of who is blacklisting it; and whois_info and certificate_info retrieve the domain’s WHOIS record and SSL certificate information.
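As a rough picture of what "passing the token through" involves, here is a Python sketch of a taken-check that runs offline. It is not the package's code: the endpoint URL, the query parameter names (`apikey`, `r`, `domain`), the `taken` response field and its `"1"` value are assumptions, and `fake_fetch` is a stand-in for a real HTTP request.

```python
import json
from collections import namedtuple
from urllib.parse import urlencode

Token = namedtuple("Token", ["key", "user_agent"])

BASE_URL = "https://api.whoapi.com/"  # assumed endpoint


def build_request(token, request_type, domain):
    """Turn a token, request type and domain into a URL and request headers."""
    query = urlencode({"apikey": token.key, "r": request_type, "domain": domain})
    return BASE_URL + "?" + query, {"User-Agent": token.user_agent}


def is_taken(token, domain, fetch):
    """Return True if the response says the domain is registered."""
    url, headers = build_request(token, "taken", domain)
    payload = json.loads(fetch(url, headers))
    # Assumption: the "taken" field is "1" for a registered domain.
    return str(payload.get("taken")) == "1"


def fake_fetch(url, headers):
    """Stand-in for an HTTP GET so the sketch runs without network access."""
    return '{"status": "0", "taken": "1"}'


token = Token("demokey", "example-agent")
print(is_taken(token, "whoapi.com", fake_fetch))
```

Every function takes the same token as its first argument, so the key and user agent are specified once and the only thing that varies between calls is the request type.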

As well as this data, though, WhoAPI also provides a lot of fantastic metadata about domains, and my R package is only too happy to make that accessible. With domain_rank and domain_search you can check how common metrics for a website’s popularity (PageRank and Alexa Rank) rate a particular domain, and how many search results it returns through Google or Bing. domain_location gives you information about where the domain geolocates to, while domain_metadata lets you grab literal metadata – the title and description found within the website’s index page.

With this, most of the use cases around WHOIS information can be seamlessly handled in R – all you need is this package and an API key, and you’re good to go. It wouldn’t have been anywhere near as easy without the excellence of WhoAPI’s design, or Hadley Wickham’s httr package.

