Wednesday, June 8, 2011

Bootstrap prediction models for probability of default

Unlike consumer credit scoring, corporate default studies are usually hampered by small data sets, with few observations and few predictors. In the fourth chapter of their book, Gunter and Peter demonstrate how to construct prediction models for the IDR (investment grade default rate) using VBA and then evaluate them by residual sum of squares [Ref. 1]. The only shortcoming is that the variables are limited and the observations are scarce (25 total, 22 valid), which makes me uneasy about estimating the distribution. In this case, bootstrapping may be a good alternative, since it is a simple and straightforward resampling method that may improve predictability. Previously, Wensui showed logit bootstrapping for credit risk with Proc LOGISTIC and Proc GENMOD [Ref. 2]. The data transformation here uses a Data Step merge, a technique first raised by Liang Xie [Ref. 3].
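The Data Step merge mentioned above pairs each year's predictors with the following year's default rate. A minimal sketch of the idea on made-up numbers follows; the dataset names `toy` and `toy_lead`, the variable `x`, and all values are hypothetical and are not the book's data.

```sas
/* Hypothetical toy data: values are made up for illustration only */
data toy;
   input year idr x;
   datalines;
2001 0.10 1
2002 0.20 2
2003 0.30 3
;
run;

/* Self-merge without a BY statement: the first copy starts at
   observation 2, so each row carries next year's idr alongside
   this year's x. The last row has no "next year", so its idr
   is missing. */
data toy_lead;
   merge toy(keep = idr firstobs = 2)
         toy(drop = idr);
run;

proc print data = toy_lead noobs;
run;
```

The row that ends up with a missing `idr` cannot be used for fitting, so the look-ahead costs one observation.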

The GLMSELECT procedure in SAS 9.2.2 combines variable selection and bootstrapping. In the example, the trend of IDR is hard to tell, even with a 3-year moving average chart. Thus, I chose 10,000 bootstrap resamples. As a result, three of the four predictors remained, with satisfactory parameter estimates.
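To make the resampling idea concrete, here is a rough sketch of a plain bootstrap done by hand. It is not what GLMSELECT does internally: it refits the full four-predictor model on every resample and performs no variable selection. It assumes the `idr_t` dataset built in the program below; the names `boot` and `boot_est`, the seed, and the choice of 1000 replicates are my own assumptions.

```sas
/* Draw 1000 resamples with replacement, each the size of the
   complete-case data (method = urs, samprate = 1).
   outhits writes one row per selected unit. */
proc surveyselect data = idr_t(where = (nmiss(idr, prf, age, bbb, spr) = 0))
                  out = boot seed = 20110608
                  method = urs samprate = 1 outhits reps = 1000 noprint;
run;

/* Fit the full model on every resample; outest keeps one row
   of coefficients per replicate */
proc reg data = boot outest = boot_est noprint;
   by replicate;
   model idr = prf age bbb spr;
run;
quit;

/* Summarize the bootstrap distribution of each coefficient */
proc means data = boot_est mean std p5 p95;
   var intercept prf age bbb spr;
run;
```

The spread of each coefficient across replicates gives a feel for how unstable the estimates are with only about two dozen observations.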

1. Gunter Loeffler and Peter Posch. ‘Credit Risk Modeling using Excel and VBA’. 2nd edition. Wiley.
2. Wensui Liu. ‘Improving credit scoring by generalized additive model’. SAS Global Forum 2007.
3. Liang Xie.

/*******************READ ME*********************************************
* - Bootstrap prediction models for probability of default -
* SAS VERSION:    9.2.2
* DATE:           09jun2011
****************END OF READ ME******************************************/

****************(1) DATA INTEGRATION/TRANSFORMATION STEP*****************;
data idr;
   infile datalines delimiter = ',' missover dsd lrecl=32767;
   format year idr prf age bbb spr best12.;
   input year idr prf age bbb spr;
   /* The full data are in Gunter and Peter's book [Ref. 1];
      supply them here in a DATALINES block */

/* Look-ahead merge: pair each year's predictors with the following
   year's idr; year is set to the year of that idr */
data idr_t;
   merge idr(keep=idr firstobs=2 ) 
      idr(rename=(idr=_idrforward year=_yearforward));
   year = _yearforward + 1;
   label idr = 'investment grade default rate'
        prf = 'forecasted change in corporate profits'
        age = 'fraction of new issuers'
        bbb = 'fraction of bbb-rated issuers'
        spr = 'spread on baa bonds';
run;

****************(2) MODULE-BUILDING STEP********************************;
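%macro idrbs(data =, nsamp =, out =);
   /*********************************************************************
   *  MACRO:      idrbs()
   *  GOAL:       build prediction model by variable selection and 
   *              bootstrapping
   *  PARAMETERS: data     = dataset to use
   *              nsamp    = number of bootstrap samples
   *              out      = name of scored dataset
   *********************************************************************/
   ods graphics on;

   /* Raw series of default rates by year */
   proc sgplot data = &data;
      title 'the investment grade default rates by years';
      series x = year y = idr;
      yaxis grid;
   run;

   /* Three-year moving average chart (requires SAS/QC) */
   proc macontrol data = &data;
      title 'three year moving average chart for investment grade default rate';
      machart idr*year / span = 3 odstitle = title;
   run;

   /* Stepwise selection by PRESS, averaged over bootstrap samples */
   proc glmselect data = &data plots = all;
      model idr = prf age bbb spr/selection = stepwise(select = press);
      modelaverage nsamples = &nsamp subset(best = 1);
      output out = &out(drop = _:) p = pred_idr;
   run;

   /* Turn graphics off only after the RUN above, so the plots are produced */
   ods graphics off;
%mend idrbs;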

%idrbs(data = idr_t, nsamp = 10000, out = idr_scored);

****************END OF ALL CODING***************************************;
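
Since Gunter and Peter evaluate their models by residual sum of squares, the averaged prediction can be checked the same way. This is a minimal sketch, assuming the `idr_scored` dataset and the `pred_idr` variable produced by the macro call above; the column names `rss` and `n_scored` are my own.

```sas
/* Residual sum of squares of the model-averaged prediction,
   computed only over rows where both the observed and the
   predicted value are available */
proc sql;
   select sum((idr - pred_idr)**2) as rss,
          count(pred_idr)          as n_scored
   from idr_scored
   where idr is not missing and pred_idr is not missing;
quit;
```

Note that this is an in-sample figure, so it will tend to look better than the error on years the model has not seen.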
