Thursday, February 16, 2012

Mahalanobis distances on a heat map


I learned about the Mahalanobis distance from Rick’s blog post yesterday and realized how useful it is for detecting outliers. One of SAS’s online documents shows how to use principal component analysis (PCA) to find Mahalanobis distances. And in SAS 9.3, the popular heat map has become available.

SAS’s classic sample dataset SASHELP.CLASS has the weight, height, age and some other information for 19 teenagers. I calculated the pairwise Mahalanobis distances based on their age, weight and height, and showed those distances on a heat map. The heat map makes it easy to tell how similar any two teenagers are: the smaller the distance, the more alike they are.
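Why the PCA route gives Mahalanobis distances can be sketched in a few lines. The symbols below are my own notation, not taken from the SAS documentation: x_i is the (age, weight, height) vector of teenager i, S is the sample covariance matrix (assumed nonsingular) with eigendecomposition S = V Λ V', and z_i is the vector of standardized principal component scores.

```latex
% Assumption: S is the nonsingular sample covariance matrix, S = V \Lambda V^{\top}
\begin{align*}
d_M(x_i, x_j) &= \sqrt{(x_i - x_j)^{\top} S^{-1} (x_i - x_j)} \\
z_i &= \Lambda^{-1/2} V^{\top} (x_i - \bar{x}) \\
\lVert z_i - z_j \rVert^2
  &= (x_i - x_j)^{\top} V \Lambda^{-1} V^{\top} (x_i - x_j) \\
  &= (x_i - x_j)^{\top} S^{-1} (x_i - x_j) = d_M(x_i, x_j)^2
\end{align*}
```

So the ordinary Euclidean distance between standardized scores is the Mahalanobis distance between the original observations. PROC PRINCOMP analyzes the correlation matrix by default rather than the covariance matrix, but the result is the same, because the Mahalanobis distance does not change when the variables are rescaled.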


/* 1 -- Find pairwise Mahalanobis distances */
/* STD scales the principal component scores to unit variance */
proc princomp data=sashelp.class std out=_1 noprint;
     var age weight height;
run;

/* Euclidean distances (the default method) between the standardized scores */
proc distance data=_1 out=_2;
   var interval(prin:);
   id name;
run;

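/* 2 -- Restructure data */
/* Turn the distance matrix into long form: one row per (x, y) pair; */
/* cells that are missing in the matrix are dropped                  */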
data _3(where=(missing(distance)=0));
   set _2;
   array a[*] _numeric_;
   do i = 1 to dim(a);
      x = name;
      y = vlabel(a[i]);
      distance = a[i];
      output;
   end;
   keep x y distance;
run;

/* Append the swapped pairs so the heat map shows both triangles */
data _4;
   set _3 _3(rename=(x=y y=x));
run;

/* 3 -- Draw Mahalanobis distances on a heat map */
proc template;
  define statgraph heatmapparm;
    begingraph;
      layout overlay / xaxisopts=(label=" ") yaxisopts=(label=" ");
        heatmapparm x = x y = y colorresponse = distance / name = "heatmap";
        continuouslegend "heatmap" / orient = vertical location = outside;
      endlayout;
    endgraph;
  end;
run;

ods html style = money;
proc sgrender data=_4 template=heatmapparm;
run;


Good math, bad engineering

As a former statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...