Thursday, December 2, 2010

Find the 'right' SAS functions


How many functions does SAS have? It sounds like a job interview question. For SAS 9.2, querying the system dictionary (sashelp.vfunc or dictionary.functions) gives an exact answer: 946, counting all functions and call routines. The dictionary sorts them into two types (unicode/bit) by input argument and into three types (numeric/character/bitwise) by output argument. By usage [1], the common SAS functions fall into several categories: array (3), bitwise logical operation (3), Perl regular expression (11), character (91), time (38), descriptive statistics (32), random number (22), probability (18), mathematics (36), finance (32), etc.
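
As a rough sketch of the dictionary query mentioned above, the step below counts the entries in dictionary.functions and breaks the total down by the two classification columns. It assumes the columns are named source and fnctype, as in the sashelp.vfunc view used later in this post; the counts depend on the SAS release and the installed products, so they may not match 946.

```sas
proc sql;
    /* Total number of entries (functions and call routines) */
    select count(*) as n_functions
    from dictionary.functions;

    /* Breakdown by the two classification columns */
    select source, fnctype, count(*) as n
    from dictionary.functions
    group by source, fnctype;
quit;
```

The same queries should work against sashelp.vfunc, which is a view of the same dictionary table.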

Because SAS has more than 40 years of development history, some functions have evolved through several generations. For example, the SAS dictionary lists 6 functions that generate random numbers from the normal distribution: ‘normal’, ‘rannor’ and its call routine, ‘rannorm’ and its call routine, and ‘rand’. All of them need a seed to produce random numbers, and the same seed reproduces the same random number sequence. The difference is that the latest one, ‘rand’, has an astronomical number of possible seeds, while the older ones accept only about 2 billion (2^31 - 1) seeds, which can cause dependence among different random number streams.
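
To illustrate the point about replication, here is a minimal sketch that runs ‘rand’ twice with the same seed and compares the results. The data set names run1 and run2 and the seed 1234 are arbitrary choices for illustration.

```sas
/* First run: seed the RAND stream, then draw five normal variates */
data run1;
    call streaminit(1234);
    do i = 1 to 5;
        x = rand('normal');
        output;
    end;
run;

/* Second run: same seed, so the same sequence is expected */
data run2;
    call streaminit(1234);
    do i = 1 to 5;
        x = rand('normal');
        output;
    end;
run;

/* Compare the two data sets value by value */
proc compare base=run1 compare=run2;
run;
```

If the two streams are identical, Proc Compare should report no unequal values. Changing the seed in one of the steps should produce differences.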

SAS processes data row by row (SAS calls a row an observation), which is a distinctive characteristic; most other software packages tend to process data by columns or vectors. Thus, many summarization functions in the Data step only work to the ‘right’, combining the variables within a row. For vertical summarization, SAS procedures such as Proc Summary, Proc Means or Proc Report are more appropriate. In short, to summarize a column with Data step functions, a transposition may be necessary.
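
The contrast between horizontal and vertical summarization can be sketched with a small made-up data set. The data set scores and the variables q1-q3 are hypothetical and serve only as an illustration.

```sas
/* Hypothetical toy data: three observations, three numeric variables */
data scores;
    input id q1 q2 q3;
    datalines;
1 10 20 30
2 40 50 60
3 70 80 90
;
run;

/* Horizontal: MEAN combines the variables within each observation */
data row_mean;
    set scores;
    avg = mean(of q1-q3);   /* 20, 50 and 80 for the three rows */
run;

/* Vertical: a procedure summarizes each variable across observations */
proc means data=scores mean;
    var q1-q3;              /* column means: 40, 50 and 60 */
run;
```

The Data step function never sees more than one observation at a time, which is why a column has to be transposed into a row before functions such as mean or largest can summarize it.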

References:
1. SAS 9.2 Language Reference: Dictionary, Third Edition. SAS Publishing. 2009.
2. Ron Cody. SAS Functions by Example, Second Edition. 2010.

/**********************AUTHOR(DAPANGMAO)----HCHAO8@GMAIL.COM***********************************;
*(1)CALCULATE THE COMPOSITION OF SAS FUNCTIONS*/
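/* Bar chart of function counts by FNCTYPE */
proc sgplot data=sashelp.vfunc;
    vbar fnctype / barwidth=0.5 transparency=0.2;
run;

/* Bar chart of function counts by SOURCE */
proc sgplot data=sashelp.vfunc;
    vbar source;
run;

/* SOURCE plotted against FNCTYPE */
proc sgplot data=sashelp.vfunc;
    scatter x=source y=fnctype;
run;

/* Dictionary entries for the normal random number generators */
proc sql;
    select *
    from sashelp.vfunc
    where lowcase(fncname) in ('rannorm', 'rannor', 'normal', 'rand');
quit;

/* Cross-tabulation of FNCTYPE by SOURCE, counts only */
proc freq data=sashelp.vfunc;
    tables fnctype*source / nopercent nocum norow nocol;
run;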

/*(2)COMPARE SEVERAL RANDOM FUNCTIONS*/
/* Four generators in one step; CALL STREAMINIT seeds only RAND */
data one;
    call streaminit(1234);
    do i = 1 to 10000;
        x1 = rannorm(1234);
        x2 = rannor(1234);
        x3 = normal(1234);
        x4 = rand('normal');
        output;
    end;
run;
/* Four CALL RANNOR streams, each with its own seed */
data two;
    seed1 = 1;
    seed2 = 3;
    seed3 = 5;
    seed4 = 7;
    do i = 1 to 10000;
        call rannor(seed1, x1);
        call rannor(seed2, x2);
        call rannor(seed3, x3);
        call rannor(seed4, x4);
        output;
    end;
run;
/* Pairwise scatter plots as a visual check of independence */
%macro test(input);
    proc sgscatter data = &input;
        title 'Independence test';
        plot x1*x2 x1*x3 x3*x2 x1*x4 x2*x4 x3*x4 / markerattrs = (size = 1);
    run;
    title;
%mend test;
%test(one)
%test(two)

/*(3)PICK OUT THE LARGEST THREE FROM VARIOUS TRANSACTIONS*/
/* Simulate 10 ids, each with a random number (1 to 100) of transactions */
data test;
    attrib amt informat=dollar10.2 format=dollar10.2;
    do id = 1 to 10;
        times = ceil(100*ranuni(12345));
        do i = 1 to times;
            amt = 10000*ranuni(123);
            output;
        end;
    end;
    drop i times;
run;
/* Vertical approach: sort descending, then keep the first three rows per id */
proc sort data=test out=test1;
    by id descending amt;
run;
data test2;
    do _n_ = 1 by 1 until(last.id);
        set test1;
        by id;
        if _n_ < 4 then output;
    end;
run;
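/* Horizontal approach: transpose each id's amounts into one row (COL1, COL2, ...) */
proc transpose data=test out=test3(drop=_name_);
    by id notsorted;
    var amt;
run;

/* Collect the transposed column names into a macro variable */
proc sql;
    select name into: vname separated by ', '
    from sashelp.vcolumn
    where libname='WORK'
        and memname='TEST3'
        and name contains 'COL';
quit;
%put &vname;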
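/* Row-wise summaries across the transposed columns */
data test4;
    set test3;
    call sortn(of &vname);          /* sort the values in ascending order in place */
    vstd = std(of &vname);
    vmax = max(of &vname);
    vmean = mean(of &vname);
    vmedian = median(of &vname);
    vrange = range(of &vname);
    vnum = n(of &vname);            /* number of nonmissing values */
    vmissing = nmiss(of &vname);    /* number of missing values */
    no1 = largest(1, of &vname);    /* the three largest amounts */
    no2 = largest(2, of &vname);
    no3 = largest(3, of &vname);
run;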
