Technistas

Matthew D. Laudato writes about software and technology

Using REST APIs from R – XML operations

In the first part of this series, I showed you how to make calls to REST APIs from R. In this part, we’ll look at how to work with the XML documents that the REST APIs return.  I’ll stick with the Constant Contact v1 APIs, since I’m most familiar with those and since the data (campaign statistics) is appropriate for analysis in R.

Once you have the raw XML from a REST API in an R variable, you need to parse it in order to extract the data that you’re interested in. To do this, we use the ‘XML’ package in R. First, load the package with:

> library("XML")

If we start with the ‘campaignXML’ vector from the previous post, the ‘XML’ package makes it easy to build a DOM object that holds the XML in a form suited to extracting data:

> campaignDOM = xmlRoot(xmlTreeParse(campaignXML))
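
If you don’t have the output of the previous post handy, here is a minimal sketch of the same parsing step on a tiny made-up document. The ‘sampleXML’ string and the ‘sampleDOM’ name are my own placeholders, not real Constant Contact output, and the feed/entry/content/Campaign/Name nesting is an assumption inferred from the node path used in getCampaignNames, so treat it as an illustration only.

```r
library(XML)

# Hypothetical stand-in for the API response (not real Constant Contact data).
sampleXML <- paste0(
  "<feed>",
  "<title>Campaigns</title>",
  "<entry><content><Campaign><Name>Spring Sale</Name></Campaign></content></entry>",
  "<entry><content><Campaign><Name>Newsletter</Name></Campaign></content></entry>",
  "</feed>"
)

# asText = TRUE states explicitly that the argument is XML content, not a file name.
sampleDOM <- xmlRoot(xmlTreeParse(sampleXML, asText = TRUE))

xmlName(sampleDOM)   # name of the root element
xmlSize(sampleDOM)   # number of direct children of the root
names(sampleDOM)     # element names of those children
```

Inspecting the root with xmlName, xmlSize and names is a quick way to confirm that the document was parsed as expected before you start extracting values from it.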

The call to xmlRoot(xmlTreeParse(campaignXML)) parses the XML and returns the root node of the resulting tree, which we store in a DOM object called ‘campaignDOM’. To get data from this object, we’ll write our own function that iterates over the child nodes of the root and extracts the data. As a simple example, let’s say we want a vector of all the campaign names, perhaps to use later as labels on campaign statistics graphs. The function to do this looks like:

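getCampaignNames <- function(doc) {
  namelist <- NULL
  # Visit each direct child of the root node in turn;
  # seq_len() also copes with a root that has no children.
  for (i in seq_len(xmlSize(doc))) {
    node <- doc[[i]]
    # Walk node -> content -> Campaign -> Name -> text value.
    # The result is NULL if any step is missing, and c() ignores NULL.
    namelist <- c(namelist, node$children[["content"]]$children$Campaign[["Name"]]$children$text$value)
  }
  namelist
}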

There are two key parts to this function. First, the for loop iterates over the child nodes of the DOM object; the function xmlSize(doc) from the ‘XML’ package returns an integer giving the number of children of the node it is passed. Then, for each node, getCampaignNames extracts the value of the campaign name and appends it to the ‘namelist’ vector, which is returned when the function completes. The syntax for accessing nodes and children can be a little daunting, but remember, it’s just XML and all you’re really doing is walking the tree.

One useful fact: the node accessors in the ‘XML’ package are fairly forgiving. In our example, several nodes in the DOM object have no ‘content’ child, but we don’t need any special checking for that condition. For those nodes the chain of accessors simply evaluates to NULL, and c(namelist, NULL) leaves the ‘namelist’ vector unchanged, so they are silently skipped.
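
To see that forgiving behaviour in isolation, here is a small sketch on a made-up document (the ‘sampleXML’ content and the variable names are hypothetical, not real API output). The first child, <title>, has no ‘content’ child, so the accessor chain returns NULL rather than raising an error, and appending NULL with c() changes nothing.

```r
library(XML)

# Hypothetical sample: one node without <content>, one with a campaign name.
sampleXML <- paste0(
  "<feed>",
  "<title>Campaigns</title>",
  "<entry><content><Campaign><Name>Spring Sale</Name></Campaign></content></entry>",
  "</feed>"
)
sampleDOM <- xmlRoot(xmlTreeParse(sampleXML, asText = TRUE))

titleNode <- sampleDOM[[1]]   # <title>: no <content> child
entryNode <- sampleDOM[[2]]   # <entry>: has content/Campaign/Name

# Same accessor chain as in getCampaignNames, applied to each node.
skipped <- titleNode$children[["content"]]$children$Campaign[["Name"]]$children$text$value
found   <- entryNode$children[["content"]]$children$Campaign[["Name"]]$children$text$value

is.null(skipped)   # TRUE: nothing to extract, but no error either

namelist <- NULL
namelist <- c(namelist, skipped)   # appending NULL is a no-op
namelist <- c(namelist, found)     # appends the one real name
namelist
```

The trade-off is that a typo in an element name fails just as silently, so if a result comes back empty it is worth checking each step of the chain by hand.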

For convenience, you should place the function definition in a file called ‘getCampaignNames.R’ and load it as needed with:

> source("getCampaignNames.R")

Putting this all together, you can get the vector of campaign names by simply calling the function:

> namelist <- getCampaignNames(campaignDOM)

If we do this on my Constant Contact account, the ‘namelist’ vector will contain the following:

> namelist
[1] "Created via API30"    "Created via API205"   "Created via API24"
[4] "Created via API23"    "Created via API22"    "Created via API21"
[7] "Created via API20"    "Created via API19"    "Created via API18"
[10] "Created via API17"    "Created via API16"    "Created via API15"
[13] "Created via API13"    "Created via API12"    "Created via API11"
[16] "Created via API10"    "Created via API9"     "Created via API8"
[19] "Created via API6"     "Created via API5"     "Created via API4"
[22] "Created via API3"     "Created via API2"     "Created via API"
[25] "BlockTest 20110425"   "Social Test 20110407" "Feb 16 2011"
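
As an aside, the ‘XML’ package also supports XPath queries, which can replace the hand-written loop. This is a sketch of an alternative technique, not the approach used in this series, and the sample document is again hypothetical. I match on local-name() because a real feed may declare XML namespaces (an assumption on my part; check your own response).

```r
library(XML)

# Hypothetical sample document (not real Constant Contact data).
sampleXML <- paste0(
  "<feed>",
  "<title>Campaigns</title>",
  "<entry><content><Campaign><Name>Spring Sale</Name></Campaign></content></entry>",
  "<entry><content><Campaign><Name>Newsletter</Name></Campaign></content></entry>",
  "</feed>"
)

# xmlParse builds an internal (C-level) document, which the XPath functions need.
sampleDoc <- xmlParse(sampleXML, asText = TRUE)

# Select every Name element directly under a Campaign element and
# apply xmlValue to each match to get its text content.
namelist <- xpathSApply(
  sampleDoc,
  "//*[local-name()='Campaign']/*[local-name()='Name']",
  xmlValue
)
namelist
```

The loop-based function keeps every step of the tree walk visible, which is handy while learning the structure of the feed; the XPath version is shorter once you know which elements you want.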

Overall, R combined with the RCurl and XML packages makes for a powerful system to get data from REST APIs and then process the resulting XML. In the next installment in this series, we’ll look at actual campaign statistics and use R to do some basic campaign analysis.

Happy Model Building!

– Matt

Written by Matthew D. Laudato

June 16, 2012 at 5:15 pm
