Friday, July 19, 2013

Cluster analysis on a pivot table

The pivot table is linked here

The growing dominance of JavaScript on both the server side and the client side seems like good news for statistical workers who deal with data and models, and who therefore always live in the dark. They may finally have a relatively easy way to show off their hard work on the Web, the final destination of data. Here I show how to display the result of a cluster analysis in a web-based pivot table.
Back-end: cluster analysis
SAS has a FASTCLUS procedure, which implements a nearest-centroid sorting algorithm similar to k-means. It has some time and space advantages over the more complicated clustering algorithms in SAS.
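To make the nearest-centroid idea concrete, here is a minimal JavaScript sketch of plain k-means (Lloyd's iterations). It is only an illustration of the general technique, not what PROC FASTCLUS does internally; the function names and the sample points are made up for this example.

```javascript
// Plain k-means (Lloyd's iterations) on 2-D points. Illustration only:
// this is not the algorithm PROC FASTCLUS runs internally.

// Squared Euclidean distance between two [height, weight] pairs.
function sqDist(p, q) {
  return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2;
}

// Index of the centroid closest to point p.
function nearest(p, centroids) {
  let best = 0;
  for (let k = 1; k < centroids.length; k++) {
    if (sqDist(p, centroids[k]) < sqDist(p, centroids[best])) best = k;
  }
  return best;
}

// Coordinate-wise mean of a non-empty list of points.
function mean(points) {
  const sum = points.reduce((s, p) => [s[0] + p[0], s[1] + p[1]], [0, 0]);
  return [sum[0] / points.length, sum[1] / points.length];
}

function kmeans(points, initialCentroids, maxIter = 100) {
  let centroids = initialCentroids.map(c => c.slice());
  let assignment = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    const next = points.map(p => nearest(p, centroids));
    // Stop as soon as no point changes cluster.
    const changed = next.some((c, i) => c !== assignment[i]);
    assignment = next;
    if (!changed) break;
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((c, k) => {
      const members = points.filter((_, i) => assignment[i] === k);
      return members.length > 0 ? mean(members) : c; // keep empty clusters in place
    });
  }
  return { assignment, centroids };
}

// Made-up [height, weight] pairs, not the SASHELP.CLASS values.
const samplePoints = [[51, 50], [56, 84], [57, 83], [66, 112], [69, 112], [72, 150]];
const result = kmeans(samplePoints, [samplePoints[0], samplePoints[5]]);
console.log(result.assignment, result.centroids);
```

The two starting centroids here are simply the first and last points; a real run would choose the seeds more carefully.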
I again use the SASHELP.CLASS dataset and cluster the rows by weight and height. I specify 2 clusters, and PROC FASTCLUS readily returns each row's distance to its cluster centroid. The plot suggests that weight = 100 is roughly the boundary separating the two clusters. Next, in a DATA step, I translate the SAS dataset into JSON so that the browser can understand it.
************(1) Cluster the dataset*******;
* OUT= adds CLUSTER and DISTANCE to each row;
proc fastclus data = sashelp.class maxclusters = 2 out = class;
   var height weight;
run;

proc sgplot data = class;
   scatter x = height y = weight / group = cluster;
   yaxis grid;
run;

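************(2) Transform to JSON*********;
data toJSON;
   set class;
   length line $200.;
   * Numeric variables are written as unquoted JSON values;
   array a[5] _numeric_;
   * $40 leaves room for the key plus a formatted number;
   array _a[5] $40.;
   do i = 1 to 5;
      _a[i] = cat('"', vname(a[i]), '":', a[i], ',');
   end;
   * Character variables are written as quoted JSON strings;
   array b[2] name sex;
   array _b[2] $20.;
   do j = 1 to 2;
      * STRIP drops the trailing blanks that pad NAME;
      _b[j] = cat('"', vname(b[j]), '":"', strip(b[j]), '",');
   end;
   * One JSON object per row, then blank out the comma before the closing brace;
   line = cats('{', cats(of _:), '},');
   substr(line, length(line)-2, 1) = ' ';
   keep line;
run;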
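As a rough check of what the browser receives, the sketch below takes a couple of `line` values in the shape the DATA step is meant to emit and turns them into JavaScript objects. The sample values and the helper name `linesToRecords` are assumptions for illustration, not actual output from the step above.

```javascript
// Hypothetical sample of the LINE values (placeholder numbers, not real output).
const jsonLines = [
  '{"Age":14,"Height":69,"Weight":112.5,"CLUSTER":1,"DISTANCE":12.3,"Name":"Alfred","Sex":"M" },',
  '{"Age":13,"Height":56.5,"Weight":84,"CLUSTER":2,"DISTANCE":4.1,"Name":"Alice","Sex":"F" },',
];

// Each line is one object followed by a comma. Joining the lines, dropping the
// final comma and wrapping the result in [ ] gives a valid JSON array.
function linesToRecords(lines) {
  const body = lines.join("\n").trim().replace(/,$/, "");
  return JSON.parse("[" + body + "]");
}

const parsedRecords = linesToRecords(jsonLines);
console.log(parsedRecords.length, parsedRecords[0].Name);
```

When the lines are pasted into a script by hand, the same effect comes from wrapping them in square brackets and removing the last comma.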
Front-end: pivot table
A pivot table is a nice way to present data, especially raw data. There are a few ways to build a pivot table on the web, such as Google's Fusion Tables. Nicolas Kruchten developed a framework called PivotTable.js, hosted on GitHub, which is very popular.
I embed the JSON data directly in the HTML file alongside PivotTable.js so that the page stays static, since Blogger does not provide an HTTP server. The file content will be like:
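The original listing is not reproduced here. As a minimal sketch of the script portion only, assuming jQuery and PivotTable.js are already loaded through script tags and the page contains an element with id `output` (both are assumptions), the embedding might look roughly like this. The records are placeholders in the shape of the JSON built earlier, and the choice of rows and columns is only one possible starting layout.

```javascript
// Placeholder records in the shape the DATA step is meant to produce;
// the numbers are illustrative, not actual PROC FASTCLUS output.
const classRecords = [
  { Name: "Alfred", Sex: "M", Age: 14, Height: 69, Weight: 112.5, CLUSTER: 1, DISTANCE: 12.3 },
  { Name: "Alice", Sex: "F", Age: 13, Height: 56.5, Weight: 84, CLUSTER: 2, DISTANCE: 4.1 },
  { Name: "Barbara", Sex: "F", Age: 13, Height: 65.3, Weight: 98, CLUSTER: 2, DISTANCE: 10.2 },
];

// Initial layout for pivotUI: clusters as rows, sex as columns.
const pivotOptions = { rows: ["CLUSTER"], cols: ["Sex"] };

// Render only when jQuery and PivotTable.js are present, i.e. in the browser.
if (typeof $ !== "undefined" && $.fn && $.fn.pivotUI) {
  $(function () {
    $("#output").pivotUI(classRecords, pivotOptions);
  });
}
```

Because the data lives in the page itself, no server request is needed; the reader can drag other fields such as Age or Name into the rows and columns.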

Finally, we can view the clustering result in a pivot table, and readers can explore the data interactively.
