Saturday, January 7, 2012

DO loop vs. vectorization in SAS/IML


Vectorization is an important skill in many matrix languages. From Rick Wicklin’s book about SAS/IML and his recent cheat sheet, I learned of a few vector-wise functions that have been added since SAS 9.22. To compare the computational efficiency of the traditional DO-loop style and the vectorized style, I designed a simple test in SAS/IML: square a number sequence (from 1 to 10000) and record the time used.

Two modules were written, one in each coding style. Each module was run 100 times, and the elapsed time of each run was recorded with SAS/IML’s time() function.

proc iml;
   start module1; * Build the first module;
      result1 = j(10000, 1, 1); * Preallocate memory to the testing vector;
      do i = 1 to 10000;  * Use a do-loop to square the sequence;
         result1[i] = i**2; 
      end;
      store result1; * Save the result to the storage catalog for later use;
   finish;   
   t1 = j(100, 1, 1); * Run the first test;
   do m = 1 to 100;
      t0 = time(); * Set a timer;
         call module1;
      t1[m] =  time() - t0;
   end;
   store t1;
quit;

proc iml;
   start module2; * Build the second module;
      result2 = t(1:10000)##2; * Vectorise the sequence;
      store result2; * Save the result to the storage catalog for later use;
   finish;   
   t2 = j(100, 1, 1); * Run the second test;
   do m = 1 to 100;
      t0 = time(); * Set a timer;
         call module2;
      t2[m] =  time() - t0;
   end;
   store t2;
quit;
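
The same comparison can also be sketched so that only the computation falls inside the timed region. This is a minimal variation, not the test reported here: it assumes that the store statement (which writes to the storage catalog) can be left out of the timing, and it averages one timer reading over all repetitions instead of taking 100 separate readings. The names n, nRep, loopTime and vecTime are illustrative.

proc iml;
   n = 10000;    * Length of the sequence, as in the test above;
   nRep = 100;   * Number of repetitions, as in the test above;

   t0 = time();  * Time all repetitions of the do-loop version together;
   do k = 1 to nRep;
      x = j(n, 1, .);   * Preallocate the result vector;
      do i = 1 to n;
         x[i] = i##2;
      end;
   end;
   loopTime = (time() - t0) / nRep;  * Average seconds per repetition;

   t0 = time();  * Time all repetitions of the vectorized version together;
   do k = 1 to nRep;
      y = t(1:n)##2;
   end;
   vecTime = (time() - t0) / nRep;   * Average seconds per repetition;

   print loopTime vecTime;
quit;

Averaging over many repetitions is a design choice that helps when a single run is too short for the timer to resolve reliably.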

proc iml;
   load result1 result2; * Validate the results;
   print result1 result2;
quit;
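
Printing two vectors of 10000 rows makes a visual comparison tedious. As a hedged alternative, the check can be reduced to a single number: the largest absolute difference between the two stored vectors. The name maxDiff is illustrative, and the sketch assumes both modules have already been run so that result1 and result2 are in the storage catalog.

proc iml;
   load result1 result2;
   maxDiff = max(abs(result1 - result2)); * Largest elementwise discrepancy;
   print maxDiff; * A value of 0 indicates the two vectors agree exactly;
quit;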

The timings were then exported to Base SAS and visualized as a box plot with the SG procedures. In this experiment the winner is the vectorized method: vectorization appears to be much faster than the DO loop in SAS/IML. My conclusions are therefore: (1) avoid the DO loop where possible; (2) use the vector-wise functions and operators in SAS/IML; (3) always test the speed of modules and functions with SAS/IML’s time() function.

proc iml;
   load t1 t2;
   t = t1||t2;
   create _1 from t;
      append from t;
   close _1;
   print t;
quit;

data _2;
   set _1;
   length test $25.;
   test = "do_loop"; time = col1; output;
   test = "vectorization"; time = col2; output;
   keep test time;
run;

proc sgplot data = _2;
   vbox time / category = test;
   yaxis grid;
run;
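
A numeric summary can complement the box plot. The following is a minimal sketch that assumes the _2 data set built above; the statistics requested and the maxdec= setting are illustrative choices, not part of the original experiment.

proc means data = _2 n mean median min max maxdec = 4;
   class test;   * One row of statistics per coding style;
   var time;     * Elapsed seconds per run;
run;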
