Thursday, March 15, 2012

Make a frequency function in SAS/IML

Aggregation is probably the most common operation in the data world. R comes with a handy table() function, while in SAS the FREQ procedure usually does this job. It would be great if SAS/IML had an equivalent function, so I created a user-defined function (a module) for the purpose. Because it contains a DO loop, its efficiency is not ideal: on a simulated data set of one million records, it ran about 10 times slower than PROC FREQ.

/* 1 - Use IML for simulation and aggregation */
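proc iml;
   /* freq: return a two-column matrix of (level, frequency) for a numeric vector */
   start freq(invec);
      x = t(unique(invec));       /* sorted unique values as a column vector */
      y = repeat(x, 1, 2);        /* column 1 = level; column 2 is overwritten with counts */
      do i = 1 to nrow(x);
         y[i, 2] = ncol(loc(invec = y[i, 1]));   /* number of elements equal to this level */
      end;
      return (y);
   finish;
   store module = freq;           /* save the module so a later PROC IML session can load it */
quit;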
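A quick way to check the module is to call it on a tiny hand-made vector whose counts are easy to verify by eye. The vector below is made up for illustration. UNIQUE returns the levels in ascending order, so each row of the result is a level followed by its count.

```sas
proc iml;
   load module = freq;
   v = {3 1 3 2 3 1};    /* made-up input: three 3s, two 1s, one 2 */
   y = freq(v);
   print y;              /* rows: 1 2 / 2 1 / 3 3 */
quit;
```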

proc iml;
   load module = freq;
   /* Simulate a vector with 1 million values */
   test = abs(floor(rannor(1:1e6)*100));
   t0 = time();
      result = freq(test);
   timer = time() - t0;      /* elapsed seconds for the module call */
   print timer;
   /* Output the result matrix as a SAS data set */
   create a from result[colname = {"level" "frequency"}];
      append from result;
   close a;
quit;

proc sgplot data = a;
   series x = level y = frequency;
run;
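The DO loop calls LOC once per distinct level, and each call scans the whole input vector, which is a likely source of the slowdown. One loop-free alternative is to sort the vector once and count the lengths of runs of equal values. The sketch below is a swapped-in technique, not the module above: the name freq2 is hypothetical, and it assumes a numeric vector with no missing values. It sorts with the classic B[RANK(A)] = A idiom, which works because RANK assigns distinct ranks to tied values, and it calls the stored freq module only to confirm that both return the same matrix.

```sas
proc iml;
   /* freq2 (hypothetical): sort once, then count runs of equal values */
   start freq2(invec);
      n = nrow(invec) * ncol(invec);
      v = shape(invec, n, 1);             /* reshape to an n x 1 column vector */
      s = v;
      s[rank(v)] = v;                     /* sort ascending via ranks */
      if n = 1 then return (s || 1);
      chg = (s[1:n-1, ] ^= s[2:n, ]);     /* 1 where the next value differs */
      ends = loc(chg // 1);               /* last position of each level (row vector) */
      k = ncol(ends);
      prev = 0 || ends;                   /* previous run end, 0 before the first level */
      counts = ends - prev[, 1:k];        /* run lengths are the frequencies */
      return (s[ends, ] || t(counts));
   finish;

   load module = freq;
   test = abs(floor(rannor(1:1e6)*100));
   t0 = time();  r1 = freq(test);   t1 = time() - t0;
   t0 = time();  r2 = freq2(test);  t2 = time() - t0;
   maxdiff = max(abs(r1 - r2));           /* 0 when both modules agree */
   print t1 t2 maxdiff;
quit;
```

Whether the sort-based version is actually faster on a given machine is worth measuring with the TIME() calls rather than assuming.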

/* 2 - Use PROC FREQ for simulation and aggregation */
data test;
   do i = 1 to 1e6;
      test = abs(floor(rannor(1234)*100));
      output;       /* write one observation per iteration */
   end;
   drop i;
run;

options fullstimer;   /* report detailed resource usage in the log */
proc freq data = test noprint;
   table test / out = b;
run;
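To compare the two approaches visually, the OUT= data set from PROC FREQ, which holds the TABLES variable together with COUNT and PERCENT, can be plotted the same way as data set a. Renaming the variables is only a convenience so that the axis labels match the IML plot. Because the two simulations pass different seed arguments to RANNOR, the curves should have the same overall shape but need not coincide exactly.

```sas
/* Plot PROC FREQ's output using the same variable names as the IML result */
proc sgplot data = b(rename = (test = level count = frequency));
   series x = level y = frequency;
run;
```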

Good math, bad engineering

As a former statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...