Thursday, June 2, 2011

Using Proc IML for credit risk validation

Validation step is crucial for a scorecard in credit risk industry. Gunter and Peter mentioned in their fantastic book [Ref. 1] that cumulative accuracy profile (CAP) and receiver operating characteristic (ROC) are two popular methods. Thus, the values of accuracy ratio from CAP (or I refer it as Gini coefficient) and area under curve(AUC) from ROC would be important metrics to evaluate the discriminatory power of the scorecard. And actually they can be derived from each other by their linear relationship.

In the latest post of his blog, Rick Wicklin introduced how to implement the trapezoidal rule or calculate trapezoid areas under curve by a function of Proc IML [Ref. 2]. Although the same methodology can be realized by Data Step array or Proc FCMP, the beauty of Rick’s method is that it avoids the loops through IML’s matrix operation and therefore is more efficient and scalable. In the example below, I built Rick’s function into a macro to calculate AUC and accuracy ratio for a tiny testing dataset. The ‘TrapIntegral’ function can be further applied for validation of large-scale credit risk records.

1. Gunter Löeffler and Peter Posch. ‘Credit Risk Modeling using Excel and VBA’. The 2nd edition. Wiley.
2. Rick Wicklin. ‘The Trapezoidal Rule of Integration’. The Do Loop. 01JUN2011.

/*******************READ ME*********************************************
* - Using Proc IML for credit risk validation -
* SAS VERSION:    9.2.2
* DATE:           02jun2011
****************END OF READ ME******************************************/

****************(1) MODULE-BUILDING STEP********************************;
%macro auc(data =, path =, filename =);
   *  MACRO:      auc()
   *  GOAL:       calcuate auc and accuracy ratio for default risk
   *  PARAMETERS: data     = dataset to use
   *              path     = output path
   *              filename = name for validation card
   options mprint mlogic;
   ods listing close;
   proc sql;
      select sum(default) into :totaldef
      from &data;

   data _tmp01;
      set &data nobs = nobs;
      xratio = ifn(default = 0, 1, 0)/(nobs - &totaldef);
      yratio = default/&totaldef;
   proc sort data = _tmp01 out = _tmp02;
      by descending rating descending default;
   data _tmp03;
      set _tmp02;
      by descending rating;
      retain x y;
      if _n_ = 1 then do; x = 0; y = 0; end; 
      x + xratio;
      y + yratio;
      if last.rating;
   data _tmp04;
      if _n_ = 1 then do; x = 0; y = 0; end; output;
      set _tmp03(keep = x y); 

   proc iml;
      use _tmp04;
      read all var{x y};
      start TrapIntegral(x,y);
         N = nrow(x);
         dx = x[2:N] - x[1:N-1];
         meanY = (y[2:N] + y[1:N-1])/2;
         return( dx` * meanY );
      area = TrapIntegral(x,y);
      acuratio = 2*area - 1;
      call symput('area', left(char(area)));
      call symput('acuratio', left(char(acuratio)));

   ods html file = "&path&filename..xls" gpath = "&path" style = harvest;
   title; footnote;
   proc print data = &data label noobs;

   proc sgplot data = _tmp04;
      series x = x y = y ;
      scatter x = x y = y;
      band x =x upper = y lower = 0 / transparency=.5;
      xaxis grid;
      yaxis grid;
      inset "AUC is: %sysfunc(putn(&area, 8.4)); 
            Accuracy Ratio is: %sysfunc(putn(&acuratio, 8.4))" 
            / position = bottomright border;
      keylegend "scatter";
   ods html close;

   proc datasets;
      delete _:;
   ods listing;
%mend auc;

****************(2) TESTING STEP****************************************;
data test;      
   input Observation Rating $ Default;
   label rating = 'Rating(A is best)'
        default = 'Default(1=default)';
1   A   0
2   A   0
3   A   0
4   B   1
5   B   0
6   B   0
7   C   1
8   C   1
9   C   1
10  C   0

%auc(data = test, path = h:\, filename = valid);
****************END OF ALL CODING***************************************;

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...