Technistas

Matthew D. Laudato writes about software and technology

Posts Tagged ‘XML

Using REST APIs from R – XML operations

with 3 comments

In the first part of this series, I showed you how to make calls to REST APIs from R. In this part, we’ll look at how to work with the XML documents that the REST APIs return.  I’ll stick with the Constant Contact v1 APIs, since I’m most familiar with those and since the data (campaign statistics) is appropriate for analysis in R.

Once you have the raw XML from a REST API in an R variable, you need to parse it in order to extract the data that you’re interested in. To do this, we use the ‘XML’ package in R. First, load the package with:

> library(‘XML’)

If we start with the ‘campaignsXML’  vector from the previous post, we can easily create a DOM object that contains the XML in a form that is useful for extracting data. The ‘XML’ library in R makes this easy:

> campaignDOM = xmlRoot(xmlTreeParse(campaignXML))

This creates an DOM object called ‘campaignDOM’ that represents the contents of campaignXML. To get data from this object, we’ll write our own function that iterates over the nodes in the DOM object and extracts the data. As a simple example, let’s say we wanted a vector of all the campaign names, perhaps to use later as labels on campaign statistics graphs. The function to do this looks like:

getCampaignNames <- function(doc) {
 namelist <- NULL
 for (i in 1:xmlSize(doc)) {
    node <- doc[[i]]
    namelist <- c(namelist, node$children[["content"]]$children$Campaign[["Name"]]$children$text$value)
 }
 namelist
}

There are two key parts to this function. First, the for loop iterates over all nodes in the DOM object – the function xmlSize(doc) from the ‘XML’ package returns an integer representing the number of nodes in the object. Then for each node, getCampaignNames extracts the value of the campaign name and adds it to the ‘namelist’ vector, which is returned when the function completes. The syntax for how to access nodes and children can be a little daunting, but remember, it’s just XML and all you’re really doing is walking the tree. One useful fact: the node functions in the ‘XML’ package are fairly forgiving, so in our example, even though there are several nodes in the DOM object that aren’t of type ‘content’, we don’t need to do any special checking for that condition. Nodes that don’t have ‘content’ will be silently ignored and thus the c(namelist, …) function call will not push them onto the ‘namelist’ vector.

For convenience, you should place the function definition in a file called ‘getCampaignNames.R’ and load it as needed with:

> source (“getCampaignNames.R”)

Putting this all together, you can get the vector of campaign names by simply calling the function:

> namelist <- getCampaignNames(campaignDOM)

If we do this on my Constant Contact account, the ‘namelist’ vector will contain the following:

> namelist
[1] “Created via API30”    “Created via API205”   “Created via API24”
[4] “Created via API23”    “Created via API22”    “Created via API21”
[7] “Created via API20”    “Created via API19”    “Created via API18”
[10] “Created via API17”    “Created via API16”    “Created via API15”
[13] “Created via API13”    “Created via API12”    “Created via API11”
[16] “Created via API10”    “Created via API9”     “Created via API8”
[19] “Created via API6”     “Created via API5”     “Created via API4”
[22] “Created via API3”     “Created via API2”     “Created via API”
[25] “BlockTest 20110425”   “Social Test 20110407” “Feb 16 2011”

Overall, R combined with the RCurl and XML packages makes for a powerful system to get data from REST APIs and then process the resulting XML. In the next installment in this series, we’ll look at actual campaign statistics and use R to do some basic campaign analysis.

Happy Model Building!

– Matt

Advertisements

Written by Matthew D. Laudato

June 16, 2012 at 5:15 pm

Using REST APIs from R

with 2 comments

It’s hard to read a website, blog post or even mainstream business press article without coming across the term ‘big data’. Big Data is one of those terms that means nothing and everything all at once, and for that reason alone, you should pay attention to it. When it comes to the how of big data, it’s equally hard to avoid bumping into R, the open source statistics and computational environment. If you’re going to describe and model the behavior of your customers in a big data initiative, R is one tool that you need in your software toolbox.

Data is everywhere, and increasing, data of all kinds on customer engagement is available through REST APIs. I started a little project in my spare time to bring data available via REST interfaces into R, to set the stage for doing what I expect to be some fairly sophisticated model building. The rest of this post is a quick introduction to how to work with REST APIs in R.

At its most basic, calling a REST API to obtain data involves making an HTTP GET request to a server. If the call succeeds, you’ll have a document that contains the requested data. In R, the best way to make these requests is by using RCurl. The RCurl package is – you guessed it – an R interface to curl. Once you’ve installed it into your R environment, getting data from REST APIs is pretty straightforward.

For my project, I started with the Constant Contact API, partly because I work for Constant Contact on the Web Services team, and party because the API makes available exactly the kind of data that you typically want to analyse in a marketing big data project – specifically, sends, clicks, opens and the like for marketing campaigns. The current v1 API returns XML, so I also installed the XML package into my R environment (though I haven’t done much with it yet). To install the packages use the following command in R:

> install.packages(‘RCurl’, ‘XML’)

To load these packages, use:

> library(‘RCurl’)

> library(‘XML’)

Once these preliminary tasks are taken care of, there are just 2 steps required to get campaign data from the API and into R:

1. Obtain an access token. Constant Contact uses OAuth 2.0 for authentication, as do many other public REST APIs. There’s no good way to get a token from inside R, so I used the client flow with a little bit of javascript to get the token in my browser, and then just saved it for use in R. See here for details on how to get access tokens. If you’re building an app that analyses data from multiple Constant Contact accounts, you’ll need the owners of those accounts grant access to your app in order for you to obtain access tokens. But for now, sign up for a trial and use your own account.

2. Call an HTTP endpoint using RCurl. This is very easy. For my initial test, I wanted to get the list of available email campaigns, so that I could later iterate over the list and get the campaign statistics for analysis. The call is:

campaignsXML = getURL(“https://api.constantcontact.com/ws/customers/{username}/campaigns?access_token={token}”)

{username} : replace with the name of the account for which you have the access token

{token} : replace with the actual access token granted to you by the account owner

This issues the HTTP GET request, and puts the resulting XML response into the R vector ‘campaignsXML’, ready to be processed further. That’s all there is to it.

In my next post on this topic, I’ll show you how to parse the XML to get it into a more usable form using the R XML package.

Happy Model Building!

– Matt

Written by Matthew D. Laudato

June 11, 2012 at 1:57 am