Wednesday, July 31, 2013

Regularization adjustment for PROC SVM

SVM is a popular statistical learning method for either classification or regression. For classification, a linear classifier or a hyperplane, such as f(x) = w^TX+b with w as weight vector and b as the bias, would label data into various categories. The geometric margin is defined as 2\over{||w||}. For SVM, the maximum margin approach is equivalent to minimize {||w||}^2. However, with the introduction of regularization to inhibit complexity, the optimization has to upgrade to minimize {||w||}^2+C\Sigm^N_i\xi_i, where C is the regularization parameter. Eventually the solution for SVM turns out to be a quadratic optimization problem over w and ξ with the constraints of y_if(X_i)\ge1-\xi_i.
Since the sashelp.class dataset is extremely simple, I attempt to use its variables age and weight to predict sex, which is just for demonstration purpose. According to the plot, the data points are linearly non-separable. The kernel methods have to be applied to map the input data to a high-dimensional space so that they are linearly separable. To harness SVM in SAS, three procedures are commonly used under the license of SAS EMiner. For example, PROC DMDB is used to recode the categorical data and set up the working catalog, PROC SVM is used to build the model, and PROC SVMSCORE is applied to implement the model.
proc sgplot data=sashelp.class;
   scatter x = weight y = age / group = sex;
run;

proc dmdb batch data=sashelp.class dmdbcat=_cat out=_class;
   var weight age;
   class sex;
run;

Hard margin

If we let C be infinitely large, then all constraints will be executed. Therefore, the margin is narrowed down.
proc svm data=_class dmdbcat=_cat c=1e11 kernel=linear out= _1;
   title 'hard margin';
   ods output restab = restab1;
   var weight age ;
   target sex;
run;
The accuracy is 63.16%. Overall, the result is below.
Name Value
Regularization Parameter C 100000000000
Classification Error (Training) 7.000000
Geometric Margin 1.624447E-10
Number of Support Vectors 17
Estimated VC Dim of Classifier 3.4494098E24
Number of Kernel Calls 74

Soft margin

On the contrary, the small C allows constraints to be easily ignored, which leads to the desired large margin.
proc svm data=_class dmdbcat=_cat kernel=linear out= _2;
   title 'soft margin';
   ods output restab = restab2;
   var weight age;
   target sex;
run;
The accuracy or miscalculation rate keeps the same, since the data is so small. In PROC SVM, without the specification, the C value is solved to be almost near zero, and the margin are huge.
Name Value
Regularization Parameter C 0.000098161
Classification Error (Training) 7.000000
Geometric Margin 158.553426
Number of Support Vectors 18
Estimated VC Dim of Classifier 3.850370
Number of Kernel Calls 76

Conclusion

  1. For the SVM procedure, except the training data, adding a validation data for the testdata option at the PROC statement could effectivley increase the C parameter and decrease the possibility of overfitting.
  2. There are a few advantages for SVM over other data mining methods. First SVM is suitable for high dimension data, and more importantly the complexity can be easily controlled by the adjustment of the regularization parameter C.

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...