Wednesday, November 13, 2013

When ROC fails logistic regression for rare-event data

The ROC curve and its AUC, which measure the trade-off between sensitivity and specificity, are widely used in logistic regression and other classification methods for model comparison and feature selection. The paper by Gary King warns of the dangers of using logistic regression for rare events and proposes a penalized likelihood estimator. In PROC LOGISTIC, the FIRTH option implements this penalty concept.
When the event in the response variable is rare, the ROC curve will be dominated by the minority class and thus insensitive to changes in the true positive rate, so it provides little information for model diagnosis. For example, I construct a subset of SASHELP.CARS in which the response variable Type includes 3 hybrid cars and 262 sedans, and hope to use the regressors Weight, Wheelbase and Invoice to predict whether a car is a hybrid or a sedan. After the logistic regression, the AUC turns out to be 0.9109, which is a pretty high value. However, the model is still ill-fitted and needs tuning, since the classification table shows that the sensitivity is zero.
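/* Keep only the sedans and hybrids from SASHELP.CARS */
data rare;
    set sashelp.cars;
    where type in ("Sedan", "Hybrid");
run;

/* Check how rare the event is */
proc freq data = rare;
    tables type;
run;

/* Fit the model and print classification tables at two cut-offs and two priors */
proc logistic data = rare;
    model Type(event='Hybrid') = Weight Wheelbase Invoice 
       / pprob = 0.01 0.05 pevent = 0.5 0.05 ctable; 
run;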
Classification table columns: Prob Event | Prob Level | Correct Event | Correct Non-Event | Incorrect Event | Incorrect Non-Event | Accuracy | Sensitivity | Specificity | False POS | False NEG
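
The FIRTH option mentioned at the top can be tried on the same model. This is only a minimal sketch, assuming the `rare` data set built above; the original run did not use it, so no claim is made here about how the estimates or the classification table would change.

```sas
/* Same model as above, fitted with Firth's penalized likelihood.
   FIRTH is an option of the MODEL statement in PROC LOGISTIC. */
proc logistic data = rare;
    model Type(event='Hybrid') = Weight Wheelbase Invoice / firth;
run;
```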


When the ROC curve no longer helps with PROC LOGISTIC, there seem to be three ways to increase the sensitivity or improve the ROC curve.
  1. Lower the cut-off probability
    In the example above, moving the cut-off probability to a lower value such as 0.01 significantly increases the sensitivity. However, this comes at the cost of a drastic loss of specificity.
  2. Up-sampling or down-sampling
    Imbalanced classes in the response variable can be adjusted by unequal weighting, such as up-sampling or down-sampling. Down-sampling is easy to carry out with stratified sampling in PROC SURVEYSELECT. Up-sampling is more appropriate for this case, but may need over-sampling techniques in SAS.
  3. Use different criteria such as the F1 score
    For rare-event classification, the most important measures should be sensitivity and precision, rather than accuracy, which combines sensitivity and specificity. In contrast, the F1 score is the harmonic mean of sensitivity and precision, which makes it a better candidate to replace AUC.
    \text{F1 score} = 2 \cdot \frac{\text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}}
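
To see why the F1 score reacts where accuracy does not, here is a small worked example on the 265 cars (3 hybrids, 262 sedans). The confusion counts in the second case are purely hypothetical, invented for illustration, and are not output from the model above.

```latex
% Case 1: every car is classified as a sedan (sensitivity is zero)
\text{accuracy} = \frac{0 + 262}{265} \approx 0.989, \qquad
\text{sensitivity} = \frac{0}{3} = 0, \qquad
\text{F1 score} = 0 \ \text{(by convention, since precision is } 0/0\text{)}

% Case 2 (hypothetical counts): TP = 2, FN = 1, FP = 20, TN = 242
\text{precision} = \frac{2}{2 + 20} \approx 0.091, \qquad
\text{sensitivity} = \frac{2}{2 + 1} \approx 0.667

\text{F1 score} = 2 \cdot \frac{\text{precision} \cdot \text{sensitivity}}{\text{precision} + \text{sensitivity}}
                = \frac{2 \cdot 2}{2 \cdot 2 + 20 + 1} = 0.16, \qquad
\text{accuracy} = \frac{2 + 242}{265} \approx 0.921
```

Accuracy stays above 0.9 in both cases, while the F1 score moves from 0 to 0.16 and still signals that most predicted hybrids are wrong.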

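The down-sampling route with PROC SURVEYSELECT could look roughly like the sketch below. The data set names `rare_sorted` and `rare_down`, the choice of 30 sedans and the seed are all arbitrary assumptions for illustration, not settings from the example above.

```sas
/* STRATA requires the input to be sorted by the stratum variable */
proc sort data = rare out = rare_sorted;
    by Type;
run;

/* Keep all 3 hybrids and draw 30 sedans by simple random sampling.
   The SAMPSIZE values follow the sorted order of Type: Hybrid, then Sedan. */
proc surveyselect data = rare_sorted out = rare_down
    method = srs sampsize = (3 30) seed = 20131113;
    strata Type;
run;

/* Refit the same model on the down-sampled data */
proc logistic data = rare_down;
    model Type(event='Hybrid') = Weight Wheelbase Invoice / ctable pprob = 0.5;
run;
```

With only 3 events, any single down-sample leaves very little data, so the refit is likely to be unstable and repeating the draw with different seeds would be a sensible check.
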