Technistas

Matthew D. Laudato writes about software and technology

Posts Tagged ‘Big Data

Working with JSON REST APIs from R

with 3 comments

My regular readers know that I have a passion for  a couple of technical areas – among them, data, analysis and APIs. In my earlier 3 part series on R programming and XML REST APIs (Part 1, Part 2, Part 3) I focused on obtaining email campaign data from the leading online marketing service, Constant Contact, through their XML-based REST APIS. Talk about combining my interests! The approach was pretty straight forward, but there is no question that the hardest part was working with the XML output from the REST APIs. As APIs have evolved however, most vendors have switched over to using JSON as their main data protocol. Constant Contact recently released their v2 APIs, which now support JSON, so I took an hour to go back over the work I did on using REST APIs from R to rework it to use these new JSON APIs. Here are the basics.

The overall approach is the same as before – we want to obtain campaign data by calling a REST API, and then perform some basic manipulations in R to ready the data for more sophisticated analysis. The two significant differences between the v1 and v2 APIS are:

  1. The URL format and authentication parameters are different
  2. The results are returned in a JSON document

The impact of these differences was that I needed to change how I set up the RCurl calls to the APIs, and I had to change how I parsed the results differently to obtain the campaign data and get it into an R data frame object. The main driver script that gets the JSON data looks like this:


library('RCurl')
library('rjson')
source("OAuthAccessToken.R")
source("getCampaignDataframeFromJSON.R")
campaignJSON = getURL(url = paste("https://api.constantcontact.com/v2/emailmarketing/campaigns?access_token=", access_token, "&api_key=", api_key, sep=""))
campaign.dataframe <- getCampaignDataframeFromJSON(campaignJSON)

A couple of comments. First, I use the ‘rjson’ library to assist in JSON parsing. Also, since I plan to access my account fairly often, I wrote a simple R script called OAuthAccessToken.R that stores values for ‘api_key’ and ‘access_token’ and just source it to make those values available. Next, the URL format is a bit different from the v1 APIs, but nothing earth shattering. (Click for details on the new Constant Contact v2 APIs – the new developer portal is very cool!). Finally, I wrote a new R function called getCampaignDatafromFromJSON that accepts a JSON document (as returned from RCurl) and transforms it into a data frame with one row for each campaign found. Here is the source code for the new function:


getCampaignDataframeFromJSON <- function(campaignJSON) {
namelist <- NULL
urllist <- NULL
statuslist <- NULL
datelist <- NULL
JSONList <- fromJSON(CampaignJSON)
results <- JSONList$results
for (i in 1:length(results)) {
namelist <- c(namelist, results[i][[1]]$name )
urllist <- c(urllist, paste("https://api.constantcontact.com/v2/emailmarketing/campaigns/", results[i][[1]]$id, sep="", collapse=NULL))
statuslist <- c(statuslist, results[i][[1]]$status)
datelist <- c(datelist, results[i][[1]]$modified_date)
}
campaignDF = data.frame(name=namelist,url=urllist,status=statuslist,date=datelist,stringsAsFactors=FALSE)
return(campaignDF)
}

The basic algorithm is the same as with the XML APIs, but the code is a whole lot simpler with JSON! The resulting data frame looks something like this (fake data to protect my accounts of course!):


name                                                                                 url                                                                                                                                                status      date

1          Email Created 2013/02/18, 9:53 AM                      https://api.constantcontact.com/v2/emailmarketing/campaigns/{someid1}    SENT      2013-02-25T16:45:41.191Z

2          Email Created 2012/12/20, 1:01 PM                       https://api.constantcontact.com/v2/emailmarketing/campaigns/{someid2}    SENT      2012-12-20T18:15:06.565Z

3          APR 15 2011 Attend Help The Humane Society   https://api.constantcontact.com/v2/emailmarketing/campaigns/{someid3}    DRAFT   2011-03-03T15:22:16.407Z

4          About Me Newsletter                                                    https://api.constantcontact.com/v2/emailmarketing/campaigns/{someid4}    DRAFT   2012-12-04T18:51:02.667Z

Once you have the data frame available, it’s pretty simple to iterate over each campaign and get the detail data like sends, opens and clicks – more on that in an upcoming post!

Written by Matthew D. Laudato

April 3, 2013 at 5:03 pm