Wednesday, April 14, 2010

Labeling variables by a macro in SAS

To rename the variables of a dataset in SAS is a daily routine. SAS or the programmer s would give an arbitrary name for any variable at the initial stage of data integration. Those names have to be modified afterward. Wensui [Ref.1] developed a macro to add prefixes to the variables . Vincent et al. [Ref. 2] extended his idea and added some parameters into the macros. However, giving a name to a variable in SAS has many restrictions regarding the length and the format. For better understanding and recognition, labeling variables instead of renaming them would be useful. In the example below, first comes with an integration of complicated text data. Proc Transpose generates a number of variables with the same prefix.  Then by invoking the label() macro, the dataset would be correctly labeled as desired.

1. Wensui Liu. ‘How to rename many variables in SAS’.
2. Vincent Weng. Ying Feng. ‘Renaming in Batches’. SAS Global 2009.

****************(1) MODULE-BUILDING STEP******************;
%macro label(dsin = , dsout = , dslabel = );
   *  MACRO:      label()
   *  GOAL:       use a label dataset to label the variables 
   *              of the target dataset
   *  PARAMETERS: dsin = input dataset
   *              dsout = output dataset
   *              dslabel = label dataset
      data _tmp;
            set &dslabel ;
            num = _n_;

      ods listing close;
      ods output variables = _varlist;
      proc contents data = &dsin;

      proc sql;
            select cats(a.variable, '="', b.labelname, '"') 
                   into: labellist separated by ' '
            from _varlist as a, _tmp as b 
            where a.num = b.num

      data &dsout;
            set &dsin;
            label &labellist;

      proc datasets;
            delete _:;
      ods listing;

****************(2) TESTING STEP******************;
******(2.1) INTEGRATE COMPLICATED DATA*************;
data have;
    infile datalines dlm = ',';
        retain _row;
        input _tmpvar $ @@ ; 
        if prxmatch("/10\d/", _tmpvar) ne 0 then _row + 1; 
        if missing(_tmpvar) then delete;
    100, Tom, 3,1,5,2,6
    101, Marlene, 1,2,4
    102, Jerry, 9,10,4,
    5, 6                    
    103, Jim,2 ,1, 2, 2,4

proc transpose data=have out=want(drop = _:)
    prefix = var;
    by _row;
    var _tmpvar;

******(2.2) INPUT LABELS FOR USE*************;
data label;
      input labelname $30.;
      Patient ID
      Patient last name
      The 1st treatment
      The 2nd treatment
      The 3rd treatment
      The 4th treatment
      The 5th treatment

******(2.3) INVOKE MACRO TO LABEL*************;
%label(dsin = want, dsout = want_labeled, dslabel = label);

****************END OF ALL CODING***************************************;

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...