Wednesday, November 30, 2011

When Google Analytics meets SAS

Thanks to Tricia’s introduction, I recently realized that Google Analytics is a powerful tool for web analytics and business intelligence. Using SAS to analyze the well-structured user data accumulated in Google Analytics can meet specialized needs. The challenge is that the Google Analytics API and SAS rarely meet each other: Google Analytics mostly serves the web/Linux world, while SAS dwells in the Windows/UNIX/mainframe ecosystems. On a Windows computer, I tried three methods to pull this blog’s data from Google Analytics into SAS; each has its own pros and cons.

Method 1: ClientLogin + HTTP protocol
The Data Export API of Google Analytics has 3 types of authorization, and ClientLogin is one of them. After a token is downloaded, the information from Google Analytics can be retrieved through the HTTP protocol. William Roehl has a wonderful paper that describes how to pass the authorization step and then parse the XML data with two SAS macros. R’s RGoogleAnalytics package and Python’s GA library are also based on similar principles.
Pros: simple and effective. A client user can choose SAS, R or Python to download data. The code is all open source and easy to modify for any particular need.
Cons: they all need cURL to set up the SSL connection while downloading data. Since cURL is not a native Windows tool, it is awkward to use on a PC, and many attempts may fail.
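
To make the two steps of this method concrete, here is a minimal Python sketch of the token-then-query flow. It only builds and parses text; it does not contact Google. The endpoint URLs, the `Auth=` line in the login reply, and the `GoogleLogin auth=` header are written as I recall them from the legacy ClientLogin and Data Export API, so please treat them as assumptions rather than a reference. The function names, the token, and the profile id are hypothetical.

```python
from urllib.parse import urlencode

# Assumed legacy endpoints (recalled from the old API, not taken from this post).
CLIENTLOGIN_URL = "https://www.google.com/accounts/ClientLogin"
FEED_URL = "https://www.google.com/analytics/feeds/data"


def parse_clientlogin_response(body):
    """Turn the key=value lines of a ClientLogin reply into a dict."""
    pairs = {}
    for line in body.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            pairs[key.strip()] = value.strip()
    return pairs


def build_feed_request(token, profile_id, metrics, dimensions, start_date, end_date):
    """Return (url, headers) for one data feed query; nothing is sent."""
    query = urlencode({
        "ids": "ga:" + profile_id,
        "metrics": ",".join(metrics),
        "dimensions": ",".join(dimensions),
        "start-date": start_date,
        "end-date": end_date,
    })
    # The token obtained in step 1 travels in the Authorization header.
    headers = {"Authorization": "GoogleLogin auth=" + token}
    return FEED_URL + "?" + query, headers


# Hypothetical reply text and profile id, for illustration only.
reply = "SID=abc\nLSID=def\nAuth=tok123\n"
token = parse_clientlogin_response(reply)["Auth"]
url, headers = build_feed_request(
    token, "12345", ["ga:visits"], ["ga:date"], "2011-11-01", "2011-11-30"
)
```

The SAS macros, the R package and the Python library mentioned above presumably wrap these same two steps, with cURL or an HTTP library doing the actual SSL transfer.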

Method 2: Data Feed Query Explorer
The Google Analytics API has a web portal to supply data. It carries out the operations of the first method above through a browser.
Pros: the easiest solution. The portal provides all options for the metrics, dimensions and segments.
Cons: the data has to be restructured in SAS. The portal gets slow when displaying a lot of results.



Method 3: OAuth + Google’s Python client library
OAuth is another authorization method. To obtain the client key and secret needed for this approach, the Google Analytics API has to be activated in the Google API Console.
Pros: this authorization method is recommended by Google. The official Python library is very fast. Data downloaded in Python can be saved as CSV and then imported into SAS.
Cons: a little complicated. A SAS user has to learn some Python to tweak the code.
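
The hand-off from Python to SAS can be as simple as a CSV file. Below is a minimal sketch of that last step, using only Python's standard `csv` module. It assumes the rows have already been downloaded; the column names and values are made-up placeholders, not real traffic data, and `rows_to_csv` is a hypothetical helper name.

```python
import csv
import io


def rows_to_csv(header, rows, out):
    """Write a header line and data rows as comma-separated text."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


# Placeholder values for illustration only.
header = ["date", "visits", "pageviews"]
rows = [["20111101", 10, 25], ["20111102", 12, 31]]

# An in-memory buffer stands in for a file here.
buffer = io.StringIO()
rows_to_csv(header, rows, buffer)
csv_text = buffer.getvalue()
```

To write a real file, the same function can be given `open(path, "w", newline="")` instead of the buffer. On the SAS side, a file like this could then be read with PROC IMPORT or a DATA step.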

Google Analytics has now launched a new web interface that has a fairly steep learning curve. In my opinion, using its Data Export API as a front-end database and SAS as a back-end analytics platform will help generate customized models.

Good math, bad engineering

As a former statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...