Matthew D. Laudato writes about software and technology

Working with JSON REST APIs from R

leave a comment »

My regular readers know that I have a passion for  a couple of technical areas – among them, data, analysis and APIs. In my earlier 3 part series on R programming and XML REST APIs (Part 1, Part 2, Part 3) I focused on obtaining email campaign data from the leading online marketing service, Constant Contact, through their XML-based REST APIS. Talk about combining my interests! The approach was pretty straight forward, but there is no question that the hardest part was working with the XML output from the REST APIs. As APIs have evolved however, most vendors have switched over to using JSON as their main data protocol. Constant Contact recently released their v2 APIs, which now support JSON, so I took an hour to go back over the work I did on using REST APIs from R to rework it to use these new JSON APIs. Here are the basics.

The overall approach is the same as before – we want to obtain campaign data by calling a REST API, and then perform some basic manipulations in R to ready the data for more sophisticated analysis. The two significant differences between the v1 and v2 APIS are:

  1. The URL format and authentication parameters are different
  2. The results are returned in a JSON document

The impact of these differences was that I needed to change how I set up the RCurl calls to the APIs, and I had to change how I parsed the results differently to obtain the campaign data and get it into an R data frame object. The main driver script that gets the JSON data looks like this:

campaignJSON = getURL(url = paste("", access_token, "&api_key=", api_key, sep=""))
campaign.dataframe <- getCampaignDataframeFromJSON(campaignJSON)

A couple of comments. First, I use the ‘rjson’ library to assist in JSON parsing. Also, since I plan to access my account fairly often, I wrote a simple R script called OAuthAccessToken.R that stores values for ‘api_key’ and ‘access_token’ and just source it to make those values available. Next, the URL format is a bit different from the v1 APIs, but nothing earth shattering. (Click for details on the new Constant Contact v2 APIs – the new developer portal is very cool!). Finally, I wrote a new R function called getCampaignDatafromFromJSON that accepts a JSON document (as returned from RCurl) and transforms it into a data frame with one row for each campaign found. Here is the source code for the new function:

getCampaignDataframeFromJSON <- function(campaignJSON) {
namelist <- NULL
urllist <- NULL
statuslist <- NULL
datelist <- NULL
JSONList <- fromJSON(CampaignJSON)
results <- JSONList$results
for (i in 1:length(results)) {
namelist <- c(namelist, results[i][[1]]$name )
urllist <- c(urllist, paste("", results[i][[1]]$id, sep="", collapse=NULL))
statuslist <- c(statuslist, results[i][[1]]$status)
datelist <- c(datelist, results[i][[1]]$modified_date)
campaignDF = data.frame(name=namelist,url=urllist,status=statuslist,date=datelist,stringsAsFactors=FALSE)

The basic algorithm is the same as with the XML APIs, but the code is a whole lot simpler with JSON! The resulting data frame looks something like this (fake data to protect my accounts of course!):

name                                                                                 url                                                                                                                                                status      date

1          Email Created 2013/02/18, 9:53 AM            {someid1}    SENT      2013-02-25T16:45:41.191Z

2          Email Created 2012/12/20, 1:01 PM             {someid2}    SENT      2012-12-20T18:15:06.565Z

3          APR 15 2011 Attend Help The Humane Society{someid3}    DRAFT   2011-03-03T15:22:16.407Z

4          About Me Newsletter                                          {someid4}    DRAFT   2012-12-04T18:51:02.667Z

Once you have the data frame available, it’s pretty simple to iterate over each campaign and get the detail data like sends, opens and clicks – more on that in an upcoming post!

Written by Matthew D. Laudato

April 3, 2013 at 5:03 pm

Using REST APIs from R – Campaign statistics

with 2 comments

In my previous posts in this series, we looked at how to call REST APIs from R. Now let’s get serious and get real email campaign data from the Constant Contact APIs. To do this, we’ll write a new function to do more sophisticated parsing of the XML returned from the Constant Contact campaigns resource. The algorithm looks like this:

  • GET the campaigns collection, which provides high level XML data on all campaigns
  • Transform the raw XML into a DOM object
  • Iterate over each campaign in the DOM and make a second call to the APIs to get the detail data for that campaign, and assemble an R dataframe object that holds the data for all campaigns

To get started, here’s a simple R script that we’ll use to set up the call to our new function:

campaignXML = getURL(url = "{username}/campaigns?access_token=your_token_goes_here")
campaignDOM = xmlRoot(xmlTreeParse(campaignXML))
campaignStats = getCampaignDataframe(campaignDOM)

You should review the previous posts (here and here) if this script doesn’t make sense to you.

The heavy lifting is done by our new function, getCampaignDataframe. The code is pretty straightforward, and amounts to parsing the DOM object to extract per-campaign data:

getCampaignDataframe <- function(doc) {
 namelist <- NULL
 urllist <- NULL
 sends <- NULL
 opens <- NULL
 for (i in 1:xmlSize(doc)) {
    node <- doc[[i]]
    namelist <- c(namelist, node$children[["content"]]$children$Campaign[["Name"]]$children$text$value)
    url = node$children[["content"]]$children$Campaign$attributes[["id"]]
    url = sub ("http","https",url)
    urllist <- c(urllist, url)
    if (length(url) > 0) {
       url = paste(url,"?access_token=your_token_goes_",sep="",collapse=NULL)
       campaignDetailXML = getURL(url = url)
       campaignDetailDOM = xmlRoot(xmlTreeParse(campaignDetailXML))
       sends <- strtoi(c(sends, campaignDetailDOM[["content"]]$children$Campaign[["Sent"]]$children$text$value))
       opens <- strtoi(c(opens, campaignDetailDOM[["content"]]$children$Campaign[["Opens"]]$children$text$value))
 campaignstats = data.frame(name=namelist,url=urllist,sends=sends,opens=opens,stringsAsFactors=FALSE)

The function takes a single parameter ‘doc’, which is a DOM object containing the list of campaigns – including the URLs to get to the campaign detail. The for() loop simply iterates over the DOM and extracts the campaign name and URL. It then issues a GET request to the URL to obtain the sends and opens for that campaign. Once this is done, the data is assembled into the data frame and returned. Here is an example of the dataframe contents when I run the script against one of my Constant Contact accounts (with specific identifying data for username and campaign removed):

name                                                                url                                                                                                                                                                           sends    opens

1 Email Created 2012/11/10, 9:03 AM{username}/campaigns/{campaignid}           3             1
2 Created via API as UTF-8 v6       {username}/campaigns/{campaignid}           3             1
3 Created via API as UTF-8 v5       {username}/campaigns/{campaignid}           3             2

Given this raw data, now in a useful form in a dataframe, we can use R to help us calculate the open rate for emails. In R, this is as simple as:

openRate <- campaignStats$opens/campaignStats$sends
campaignStats$openRate <- openRate

This adds a new column called openRate to the dataframe that contains the calculated open rates for each email campaign. Admittedly this is a simple example, but at this point, you have all the tools you need to pull data into R from a REST API, and do some basic manipulations on it.

Happy Modeling!

– Matt

Written by Matthew D. Laudato

March 13, 2013 at 4:30 pm

Using REST APIs from R – XML operations

with 3 comments

In the first part of this series, I showed you how to make calls to REST APIs from R. In this part, we’ll look at how to work with the XML documents that the REST APIs return.  I’ll stick with the Constant Contact v1 APIs, since I’m most familiar with those and since the data (campaign statistics) is appropriate for analysis in R.

Once you have the raw XML from a REST API in an R variable, you need to parse it in order to extract the data that you’re interested in. To do this, we use the ‘XML’ package in R. First, load the package with:

> library(‘XML’)

If we start with the ‘campaignsXML’  vector from the previous post, we can easily create a DOM object that contains the XML in a form that is useful for extracting data. The ‘XML’ library in R makes this easy:

> campaignDOM = xmlRoot(xmlTreeParse(campaignXML))

This creates an DOM object called ‘campaignDOM’ that represents the contents of campaignXML. To get data from this object, we’ll write our own function that iterates over the nodes in the DOM object and extracts the data. As a simple example, let’s say we wanted a vector of all the campaign names, perhaps to use later as labels on campaign statistics graphs. The function to do this looks like:

getCampaignNames <- function(doc) {
 namelist <- NULL
 for (i in 1:xmlSize(doc)) {
    node <- doc[[i]]
    namelist <- c(namelist, node$children[["content"]]$children$Campaign[["Name"]]$children$text$value)

There are two key parts to this function. First, the for loop iterates over all nodes in the DOM object – the function xmlSize(doc) from the ‘XML’ package returns an integer representing the number of nodes in the object. Then for each node, getCampaignNames extracts the value of the campaign name and adds it to the ‘namelist’ vector, which is returned when the function completes. The syntax for how to access nodes and children can be a little daunting, but remember, it’s just XML and all you’re really doing is walking the tree. One useful fact: the node functions in the ‘XML’ package are fairly forgiving, so in our example, even though there are several nodes in the DOM object that aren’t of type ‘content’, we don’t need to do any special checking for that condition. Nodes that don’t have ‘content’ will be silently ignored and thus the c(namelist, …) function call will not push them onto the ‘namelist’ vector.

For convenience, you should place the function definition in a file called ‘getCampaignNames.R’ and load it as needed with:

> source (“getCampaignNames.R”)

Putting this all together, you can get the vector of campaign names by simply calling the function:

> namelist <- getCampaignNames(campaignDOM)

If we do this on my Constant Contact account, the ‘namelist’ vector will contain the following:

> namelist
[1] “Created via API30″    “Created via API205″   “Created via API24″
[4] “Created via API23″    “Created via API22″    “Created via API21″
[7] “Created via API20″    “Created via API19″    “Created via API18″
[10] “Created via API17″    “Created via API16″    “Created via API15″
[13] “Created via API13″    “Created via API12″    “Created via API11″
[16] “Created via API10″    “Created via API9″     “Created via API8″
[19] “Created via API6″     “Created via API5″     “Created via API4″
[22] “Created via API3″     “Created via API2″     “Created via API”
[25] “BlockTest 20110425″   “Social Test 20110407″ “Feb 16 2011″

Overall, R combined with the RCurl and XML packages makes for a powerful system to get data from REST APIs and then process the resulting XML. In the next installment in this series, we’ll look at actual campaign statistics and use R to do some basic campaign analysis.

Happy Model Building!

– Matt

Written by Matthew D. Laudato

June 16, 2012 at 5:15 pm

Using REST APIs from R

with 2 comments

It’s hard to read a website, blog post or even mainstream business press article without coming across the term ‘big data’. Big Data is one of those terms that means nothing and everything all at once, and for that reason alone, you should pay attention to it. When it comes to the how of big data, it’s equally hard to avoid bumping into R, the open source statistics and computational environment. If you’re going to describe and model the behavior of your customers in a big data initiative, R is one tool that you need in your software toolbox.

Data is everywhere, and increasing, data of all kinds on customer engagement is available through REST APIs. I started a little project in my spare time to bring data available via REST interfaces into R, to set the stage for doing what I expect to be some fairly sophisticated model building. The rest of this post is a quick introduction to how to work with REST APIs in R.

At its most basic, calling a REST API to obtain data involves making an HTTP GET request to a server. If the call succeeds, you’ll have a document that contains the requested data. In R, the best way to make these requests is by using RCurl. The RCurl package is – you guessed it – an R interface to curl. Once you’ve installed it into your R environment, getting data from REST APIs is pretty straightforward.

For my project, I started with the Constant Contact API, partly because I work for Constant Contact on the Web Services team, and party because the API makes available exactly the kind of data that you typically want to analyse in a marketing big data project – specifically, sends, clicks, opens and the like for marketing campaigns. The current v1 API returns XML, so I also installed the XML package into my R environment (though I haven’t done much with it yet). To install the packages use the following command in R:

> install.packages(‘RCurl’, ‘XML’)

To load these packages, use:

> library(‘RCurl’)

> library(‘XML’)

Once these preliminary tasks are taken care of, there are just 2 steps required to get campaign data from the API and into R:

1. Obtain an access token. Constant Contact uses OAuth 2.0 for authentication, as do many other public REST APIs. There’s no good way to get a token from inside R, so I used the client flow with a little bit of javascript to get the token in my browser, and then just saved it for use in R. See here for details on how to get access tokens. If you’re building an app that analyses data from multiple Constant Contact accounts, you’ll need the owners of those accounts grant access to your app in order for you to obtain access tokens. But for now, sign up for a trial and use your own account.

2. Call an HTTP endpoint using RCurl. This is very easy. For my initial test, I wanted to get the list of available email campaigns, so that I could later iterate over the list and get the campaign statistics for analysis. The call is:

campaignsXML = getURL(“{username}/campaigns?access_token={token}”)

{username} : replace with the name of the account for which you have the access token

{token} : replace with the actual access token granted to you by the account owner

This issues the HTTP GET request, and puts the resulting XML response into the R vector ‘campaignsXML’, ready to be processed further. That’s all there is to it.

In my next post on this topic, I’ll show you how to parse the XML to get it into a more usable form using the R XML package.

Happy Model Building!

– Matt

Written by Matthew D. Laudato

June 11, 2012 at 1:57 am

Building an OAuth enabled website using Java

leave a comment »

For me, it’s always been about the APIs. I write them. I use them. I have built and continue to grow my career around them. So as it became clear over the past few years that OAuth has become the de facto standard for API authentication, I took the plunge and figured out how to do practical things with OAuth-based APIs. Since Java is my preferred language, and since I’m now the Product Manager at Constant Contact for the Platform and Partner Integrations, the natural path for my latest project was to build a website using Java that lets users authenticate to Constant Contact via OAuth.

That said, here’s the basics of the application and how it works. The technology stack looks like this:

Web UI: HTML, CSS, JavaScript
Application Services: Java Servlets, JavaScript (Ajax), Scribe-Java (OAuth Java library), REST calls to the Constant Contact API
Database (for storing OAuth tokens and secrets): Hibernate, MS-SQL Server

These are all tools that should be in any programmers toolbox, so the application also serves as a good training app and sample code if you’re trying to get familiar with this stack. The code is available on Github (you do have a Github account, don’t you??). Just fork my repo and you should be ready to go.

Since the main point of this article building an OAuth based website, let’s get right into it. If you want your users to display their data on your website, you as the website owner must let your users grant access to their data. In my case, the data is coming from Constant Contact, so I implemented what is known as the ‘web flow’ for authentication. The flow uses a series of callbacks and goes roughly like this:

– Get a request token from Constant Contact
– Redirect the user to Constant Contact’s OAuth endpoint and ask the end user to authenticate and grant access to their data
– Exchange the authenticated request token for a valid access token and secret and have OAuth redirect the user back to your site.

Now, most of us are not hard-core security programmers, and trust me, if you (like me) prefer to focus on the functionality of your app and not on plumbing, you should use an OAuth library. In my case I chose scribe-java, an excellent open source library for Java OAuth programmers. I submitted the Constant Contact API to the project so that the rest of you can easily authenticate to Constant Contact. You can get scribe-java from github at this url.

I handle the callbacks through two Java servlets. makes the initial request for an OAuth Request token, and if successful, redirects to Constant Contact to let the user authenticate and grant access. The basic code looks like this:

OAuthService service = new ServiceBuilder()
httpsession.setAttribute("oauth.service", service);

Token requestToken = service.getRequestToken();
httpsession.setAttribute(“oauth.request_token”, requestToken);

String confirmAccessURL = service.getAuthorizationUrl(requestToken);

try {
} catch (Exception e) {


The simplicity of the code is due to scribe-java. A couple of things to notice. First, we tell scribe-java what the callback URL is when we create the service object. Second, we store the request_token in the httpsession object. This isn’t strictly necessary (we could just store the request token secret, and then reconstruct the request token in the callback), but it is convenient. The reason for this has to do with the scribe-java Token class – it isn’t really a Token per-se, it’s a token-secret pair. I suspect that the author of the library did it this way because token-secret pairs are ubiquitous in OAuth. In any case, when the callback is invoked, it winds up in The relevant code looks like this:

String oauth_token = req.getParameter("oauth_token");
String oauth_verifier = req.getParameter("oauth_verifier");
String username = req.getParameter("username");

if (oauth_verifier.length() > 0) {
Verifier verifier = new Verifier(oauth_verifier);

HttpSession httpsession = req.getSession(true);
OAuthService service = (OAuthService) httpsession.getAttribute(“oauth.service”);
Token requestToken = (Token) httpsession.getAttribute(“oauth.request_token”);
httpsession.setAttribute(“username”, username);

Token accessToken = service.getAccessToken(requestToken, verifier);
httpsession.setAttribute(“oauth.access_token”, accessToken);

Long accessTokenId = null;
AccessToken at = new AccessToken();
Date dt = new Date();
Timestamp ts = new Timestamp(dt.getTime());

Session session = HibernateUtil.getSessionFactory().openSession();
Transaction transaction = null;
try {
transaction = session.beginTransaction();
accessTokenId = (Long);
} catch (HibernateException e) {
} finally {

Again, refer to the full source code on github. There’s several things going on here. First, the entry point to the call back is a servlet URL, which the Constant Contact OAuth implementation has called with three parameters. The oauth_token is the same request token that you generated earlier (but just the token, not the secret). Because I stored the request-token/secret pair earlier, I just ignore this parameter (but see the earlier comment – you could store the secret and then reconstruct the scribe-java Token object). The verifier is a string that the OAuth server creates when the user authenticates and then grants access. The magic is in the call to service.getAccessToken, where you trade in the (now verified) request token for a valid access token. Once you have the token, you need to do something with it so that the next time your user visits the website they can access their Constant Contact data. In my case, I opted to store the username, access token and the token secret in a SQL database. The schema for the database is pretty simple, and Hibernate makes it fairly trivial to store and retrieve the token and secret.

That’s about it for now. Happy coding!

– Matt

Written by Matthew D. Laudato

June 28, 2011 at 9:46 pm

Posted in Java, OAuth

Tagged with , , ,

A Framework for Evaluating Continuous Integration Tools

leave a comment »

For those of you interested in the methodology behind my ongoing series on CI tools, you should check out a new article that I wrote for CMCrossroads. In it I provide a framework for evaluating CI tools, and give you a checklist and ranking system to help you organize and rate your evalution.

The 2nd installment of the hands-on tool evaluation is a few days away – stay tuned for how the four tools (Hudson, Mojo, Bamboo and TeamCity) fare on providing access to common development tools and on enabling you to assemble complex build workflows.

Happy Building!

– Matt

Written by Matthew D. Laudato

June 16, 2010 at 6:36 pm

Comparing Continuous Integration Tools, Part 1

with 6 comments

One of the more enjoyable parts of my job at OpenMake Software is getting to examine and analyze the various build tools on the market. This is partly to see what the competition is up to, and partly to make sure that I can effectively communicate the technical bits with our customers, many of whom have multiple build tools in their environments.

To that end, I recently embarked on a continuous integration tool evaluation. I chose to look at Hudson, an opensource tool commercially supported by Sun Microsystems;  TeamCity, a commercial tool from JetBrains;  Bamboo, a commercial tool from Atlassian; and Mojo, a freeware and commercially supported tool from OpenMake Software. My goal was to compare the tools along several vectors:

  • Installation
  • Configuration
  • Running a simple job
  • Viewing logs
  • Interacting with source control
  • Performing complex distributed build workflows

I decided to break the effort into two parts. The first part, covered in this post, is the ‘getting my feet wet’ portion of the evaluation. I tackled the first four bullets above to get a sense of how the tools were installed and configured, and to see if I could get them each to do something useful. The useful thing was to run a job that spits out the current environment, the equivalent of running the ‘set’ command from a DOS prompt in Windows.

The table below summarizes my findings, and below that, I give some general impressions about the tools and the evaluation process.

PRODUCT: Mojo Bamboo Hudson Team City
Version 7.31 2.5.5 1.332 5.1.1
Windows installer Windows installer Executable war file Windows installer
50M 84M 27M 268M
Free, unlimited
single-server license
30 day trial Free, unlimited
single-server license
Free, unlimited
single-server license
Does not ask for default
port as part of install. That is configured once you have started the client.
Server starts as part of installation and gets installed as Windows service.
Start Menu group and icons installed for access to thick and web client.
Asks for default port as
part of install. Does not start server as part of startup. When you do start
the server, does not recognize your port choice.
No issues. Hard to figure
out how to change the default ports.
No issues. Asked for
default port in install wizard. Starts server and build agent as windows
service as part of install and then runs web interface.
None. If you like the
defaults, then you can create a workflow immediately through the thick client
Asks you to ‘Create a Plan’
as the first activity. Did not like this as it forces me to digest their
meaning of the generic word ‘Plan’
None. If you like the
defaults, then you can create a workflow immediately.
Wants you to create
projects and build configurations but does not define exactly what these are.
a simple job (ENVPEEK – prints build server environment vars to build log)
Easy. Create a workflow,
add a ‘Mojo | Execute shell command’ activity, and type in the command
Difficult. In order to
‘Create a Plan’ you need to go through an 8 step wizard. The second wizard
screen requires you to select an SCM system and a repository location. I had
to give it a repository location from my Subversion server to get past this
screen. Annoying, since for this job I don’t care about SCM. Rest of the
wizard was OK, but way too many steps just to set up a simple job.
Easy. Create a new build
job. Use the ‘Execute Windows batch command’ option and type in the command
Moderate. Team City asks
you to create a project, which is pretty easy. You then have to create at
least one Build Configuration. There is a web-based wizard that like Bamboo
has an SCM screen, but you can choose to ignore it. You can then choose a
command line Build Runner, in which you specify the ‘set’ command.
a simple job
Easy. Open the workflow,
either in the thick client or in the web interface, and press the run button.
Runs successfully
Moderate. From the Bamboo
home, select the Plan, then select ‘Run Build’ from the Plan Actions menu on
the right. Because of the SCM choice, even jobs that don’t require SCM will
check out from Subversion. Tool is geared for building code projects – does not appear to be a general workflow tool.
Easy. Select the job and
select the ‘Schedule a build’ button.
Easy. From the Project tab,
find the project that you want to run and then click on the Run… button.
job logs
Easy. In the thick client,
open the workflow, and go to the History/Trends tab. Select the run that you
want to see and double-click. In the web interface, select the workflow and
submit a query to retrieve the run information. Select the specific run you
want to view.
Moderate. You have to click
on the plan, then the Completed Builds tab, then click on the build you want,
then click on its Logs tab. Lots of drilling down required.
Easy. Click on the Job name
and then select any link from the Build History link.
Easy. From the Projects
tab, click on the Project that you want to view. Select the link for the run
that you want to view.


Overall, Hudson and Mojo were the easiest tools to install and use. Hudson definitely takes the cake when it comes to installation, since you don’t have to install it – you just run the executable war file from the command line. Mojo, TeamCity and Bamboo have more traditional installers, of which the Mojo install was the most straight-forward, asking the fewest questions before proceeding with the install. Atlassian’s Bamboo has the most restrictive trial license, but Mojo, Hudson and TeamCity all have a more open approach – you can use them in very useful forms without any cost or special licensing.

Once the tools were installed, I next looked to do any initial configuration, which I define loosely as ‘stuff the tool requires me to do before it lets me do what I really want to do’. On this measure, I again put Mojo and Hudson in the lead, as I didn’t have to do anything – I just went straight to thinking about the job I wanted to run. TeamCity wanted me to create a project and a build configuration, which was fairly easy – but I had to figure out what they meant by ‘project’ and ‘build configuration’. Bamboo was by far the most difficult tool to configure. Any time I see an 8-step wizard just to turn the engine over and get the motor running, my initial response is ‘who wrote this thing’?

Getting the actual job configured was again easy in Mojo and Hudson. The Mojo interface is very straight-forward – you select a machine to run on, and then start adding workflow steps (called activities). There is a large built-in list of activities (around 50) for interacting with commercial and opensource tools. I used the ‘Execute shell script’ activity type to run the set command, and that constituted the entirely of my ‘ENVPEEK’ job. Hudson was also easy to set up. TeamCity and Bamboo were the most painful to set up for actual jobs – you are forced into their concepts, instead of just being able to think about the job at hand. The other comment on both TeamCity and Bamboo is that they are both very ‘source code biased’. By that I mean that they have an implicit assumption that your jobs require interaction with source control. In both tools I was required to specify a location in a source control tool (I used Subversion from Collabnet). Since my initial job was a codeless one, this was annoying.

Running jobs in all tools is fairly easy, as is reviewing the logs – though in Bamboo I did have drill down quite a bit to get to my logs. Going back to my ‘source control bias’ comment, Bamboo needed to check out code from a repository location that I specified – and then ignored it since my inital job was just to run ‘set’.

Next installment: doing actual code builds with each of the tools, and then putting together complex build processes.

Happy Building!

– Matt

Written by Matthew D. Laudato

June 7, 2010 at 4:09 pm


Get every new post delivered to your Inbox.