Friday, February 11, 2011

Macro embedded function finds AUC


As a routine way to reuse code, SAS programmers tend to wrap procedures in a SAS macro and pass arguments to it as macro variables. The result could be anything produced by a DATA step or a SAS procedure: a figure, a dataset, a SAS listing, etc. In this sense, a macro in SAS is like a module or class in other languages. However, if a macro is called repeatedly to accumulate some key values from different input variables, designing it can be tricky. My first thought was to use a nested macro (a child macro within a parent macro) to capture the macro variables floating invisibly around the environment. That idea is pretty daunting, since I would have to keep track of the scopes of macro variables across macro layers and of the naming of temporary datasets.
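To make the scoping concern concrete, here is a minimal sketch of the nested-macro approach. The macro names `parent` and `child` and the variables `total` and `tmp` are hypothetical, made up for illustration; the point is only that the accumulator has to live in a scope both layers can see, while scratch variables should stay local.

```sas
/* Hypothetical macros, for illustration only */
%macro child(x);
    %local tmp;                          /* scratch value, visible only inside this call */
    %let tmp = %eval(&x * 2);
    %let total = %eval(&total + &tmp);   /* updates the global TOTAL declared by the parent */
%mend child;

%macro parent;
    %global total;                       /* accumulator shared across macro layers */
    %let total = 0;
    %child(1)
    %child(2)
    %put NOTE: total=&total;
%mend parent;

%parent
```

If `total` were declared `%local` inside `child` instead, each call would start from an empty value and nothing would accumulate, which is the kind of bookkeeping that makes this design tedious.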

Now, with PROC FCMP, a macro-embedded function makes it possible to use SAS procedures within a DATA step. One of the unique features of PROC FCMP is that it can encapsulate a macro through its RUN_MACRO function, which gives back a return code (rc). Implementing a macro-embedded function has some advantages. First, since the user-defined function is called in a DATA step, all returned values are automatically saved in that step's output dataset. Second, we can keep exactly the return values we want: the ODS OUTPUT statement first saves the output table to a temporary dataset, and then the exact value is passed back to the function as a macro variable and becomes the function's return value. Third, PROC FCMP provides a simple exception-handling mechanism through the return code of the macro call. In this post, I show one example with the SAS sample dataset SASHELP.CARS (code below). This dataset holds information about a number of car models. I would like to see which variables, such as a car's length, weight, price, etc., can predict whether the car is made in the US or elsewhere. The AUC (area under the ROC curve) values from logistic regressions are obtained and compared using this strategy.
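A note on the value the code extracts: the row labelled `c` in the Association table of PROC LOGISTIC is the concordance index, which estimates the area under the ROC curve. The notation below follows the PROC LOGISTIC documentation rather than anything defined in this post:

$$
c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t}
$$

Here $t$ is the number of pairs of observations with different response values, $n_c$ is the number of concordant pairs and $n_d$ the number of discordant pairs. A value of 0.5 corresponds to no discrimination, which is why the final plot draws its reference line at 0.5.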

The user-defined function library is accessible and reusable. SAS uses a four-level name to store a user-defined function (library, first-level directory, second-level directory, function name; in the code below, myfunc.macro.functions plus auc_logis), and a two-level name for a compiled macro (library, macro name). For a macro-embedded function, the compiled macro and the function should be stored together in the same place. Then, next time, specifying that path in a SAS OPTIONS statement makes those functions immediately available.

Reference: Stacey Christian and Jacques Rioux. Adding Statistical Functionality to the DATA Step with PROC FCMP. SAS Global Forum 2010.

******(1) CREATE COMPILED MACRO AND FUNCTION******;
****(1.0) SET A PATH TO STORE MACRO AND FUNCTION***;
libname myfunc 'h:\myfun';
option  mstored sasmstore=myfunc;

****(1.1) CREATE THE EMBEDDED MACRO*****;
%macro auc_logis_macro() /store source des='auc_logis_macro';
    /* NOTE: DEQUOTE THE INPUT STRINGS PASSED IN BY RUN_MACRO */
    %let ds_in = %sysfunc(dequote(&ds_in));
    %let target = %sysfunc(dequote(&target));
    %let input = %sysfunc(dequote(&input));
    /* NOTE: USE PROC LOGISTIC TO GENERATE AUC */
    ods graphics on;
    ods output Association = temp;
    ods select Association ROCCurve;
    proc logistic data=&ds_in descending plots(only)=roc;
        model &target = &input;
    run;
    ods graphics off;
    /* NOTE: ONLY CHOOSE AUC TO OUTPUT */
    proc sql noprint;
        select nValue2 into :auc_value
        from temp
        where Label2 = 'c';
    quit;
    /* NOTE: KILL THE INTERIM DATASET */
    proc datasets library=work nolist;
        delete temp;
    quit;
%mend auc_logis_macro;

****(1.2) INCORPORATE MACRO INTO FUNCTION ****;
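proc fcmp outlib = myfunc.macro.functions;
   function auc_logis(ds_in $, target $, input $);
      /* RUN_MACRO PASSES THE ARGUMENTS AS MACRO VARIABLES AND COPIES AUC_VALUE BACK */
      rc = run_macro('auc_logis_macro', ds_in, target, input, auc_value);
      /* RETURN MISSING WHEN RUN_MACRO REPORTS A NONZERO RETURN CODE */
      if rc eq 0 then return(auc_value);
      else return(.);
   endsub;
run;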
*******END OF STEP (1)***********;


*****(2) USE THE MACRO EMBEDDED FUNCTION*******;
****(2.0) FIND PATH FOR STORED FUNCTION AND MACRO****;
libname myfunc 'h:\myfun';
option cmplib = (myfunc.macro) mstored sasmstore=myfunc mprint symbolgen;

****(2.1) CREATE A BINARY TARGET VAR FROM A SAS HELP DATASET****;
data test;
 set sashelp.cars;
 length country  $6;
 if origin = 'USA' then country = 'US';
 else country = 'NON-US';
run;

****(2.2) INVOKE THE FUNCTION TO EVALUATE MULTIPLE VARS OR VARS' COMBINATION****;
data auc_ds;
    ds_in = 'test';
    target = 'country';
    length input $70;
    do input = 'Weight', 'Length', 'Horsepower', 'EngineSize', 'MPG_City',
        'Wheelbase', 'Weight Length',
        'EngineSize Weight Length MPG_City Wheelbase Horsepower MPG_Highway';
        auc = auc_logis(ds_in, target, input);
        output;
    end;
run;

****(2.3) COMPARE ALL AUC VALUES****;
data auc_ds1;
 set auc_ds;
 label auc ='AUC';
 if _n_=8 then input ='All numeric variables';
run;
proc sgplot data=auc_ds1;
 vbar input/response=auc;
 yaxis grid;
 refline 0.5;
run;

****END OF STEP (2)****;
*****************END OF THE PROGRAM************;
 
