Thursday, November 21, 2013

Kernel selection in PROC SVM

The support vector machine (SVM) is a flexible classification and regression method, thanks to its wide choice of kernels. To apply an SVM, we typically need to specify a kernel, a regularization parameter C, and kernel parameters such as gamma. Following the selection of the regularization parameter C in my previous post, the SVM procedure and the iris flower data set are used here to discuss kernel selection in SAS.

Exploration of the iris flower data

The iris data is a classic for classification exercises. If we use the first two components from Principal Component Analysis (PCA) to compress the four predictors (petal length, petal width, sepal length, and sepal width) into a 2D space, then two linear boundaries seem barely able to separate the three species: Setosa, Versicolor, and Virginica. In general, SASHELP.IRIS is a well-separated data set with respect to the response variable.
****(1). Data exploration of iris flower data set****;
data iris;
   set sashelp.iris;
run;

proc contents data=iris position;
run;

proc princomp data=iris out=iris_pca;
   var Sepal: Petal:;
run;

proc sgplot data=iris_pca;
   scatter x = prin1 y = prin2 / group = species;
run;
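The PCA projection above can be sketched outside of SAS as well. The minimal numpy function below (an analogue of PROC PRINCOMP, which by default works on the correlation matrix, i.e. standardized variables) projects a data matrix onto its first two principal components; the synthetic data stands in for the four iris measurements.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto the first two principal components.

    A minimal numpy sketch of what PROC PRINCOMP does by default:
    standardize the variables, then rotate onto the eigenvectors of
    the correlation matrix, largest eigenvalue first.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize
    corr = np.cov(Z, rowvar=False)                    # correlation matrix
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1]                    # largest first
    return Z @ vecs[:, order[:2]]                     # scores: Prin1, Prin2

# Tiny synthetic stand-in for the four iris predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 4))
scores = pca_2d(X)
print(scores.shape)  # (10, 2)
```

As with the Prin1/Prin2 scores plotted by PROC SGPLOT above, the two score columns are uncorrelated by construction.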

PROC SVM with four different kernels

Kernel method    Option in SAS    Formula                       Parameter in SAS
linear           LINEAR           u'*v                          NA
polynomial       POLYNOM          (gamma*u'*v + coef)^degree    K_PAR
radial basis     RBF              exp(-gamma*|u-v|^2)           K_PAR
sigmoid          SIGMOID          tanh(gamma*u'*v + coef)       K_PAR; K_PAR2
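The four formulas in the table are easy to write out directly. The numpy sketch below implements them for two vectors u and v; the parameter names gamma, coef, and degree are my own labels for the quantities that K_PAR and K_PAR2 control in PROC SVM (an assumption based on the table, not on the SAS documentation).

```python
import numpy as np

# The four kernels from the table above, written out in numpy.

def linear(u, v):
    return u @ v                                   # u'*v

def polynom(u, v, gamma=1.0, coef=1.0, degree=3):
    return (gamma * (u @ v) + coef) ** degree      # (gamma*u'*v + coef)^degree

def rbf(u, v, gamma=1.0):
    return np.exp(-gamma * np.sum((u - v) ** 2))   # exp(-gamma*|u-v|^2)

def sigmoid(u, v, gamma=1.0, coef=1.0):
    return np.tanh(gamma * (u @ v) + coef)         # tanh(gamma*u'*v + coef)

u = np.array([1.0, 2.0])
v = np.array([0.5, -1.0])
print(linear(u, v))  # -1.5
print(rbf(u, u))     # 1.0 -- a point has maximal similarity with itself
```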
PROC SVM in SAS provides a range of kernels to choose from, including ERBF, FOURIER, LINEAR, POLYNOM, RBF, RBFREC, SIGMOID, and TANH. Another great feature is that it supports cross-validation, including leave-one-out cross-validation (the loo option in PROC SVM) and k-fold cross-validation (the split option in PROC SVM).
Here the error rates from leave-one-out cross-validation are used to compare performance among four common kernels: linear, radial basis function, polynomial, and sigmoid. In this experiment the parameters, such as C and gamma, are arbitrarily set to 1. As the bar plot shows, the RBF and linear kernels give good results, with RBF slightly better than linear; the polynomial and sigmoid kernels, by contrast, perform very badly. In conclusion, the choice of kernel for an SVM depends on the structure of the data set: a non-linear or complicated kernel is simply unnecessary for an easily classified example like the iris flower data.
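The leave-one-out scheme itself is easy to sketch: hold out one observation, fit on the remaining n-1, predict the held-out point, and average the errors. The numpy code below shows that loop with a 1-nearest-neighbour classifier standing in for a fitted SVM (a hypothetical stand-in chosen only to keep the sketch self-contained; it is not what PROC SVM fits).

```python
import numpy as np

def loo_error(X, y, predict_one):
    """Leave-one-out cross-validation error rate.

    For each observation, train on the remaining n-1 points and
    predict the held-out point -- the same scheme PROC SVM's loo
    option applies to the fitted SVM.
    """
    n = len(y)
    wrong = 0
    for i in range(n):
        mask = np.arange(n) != i
        if predict_one(X[mask], y[mask], X[i]) != y[i]:
            wrong += 1
    return wrong / n

def nn_predict(Xtr, ytr, x):
    # 1-nearest-neighbour stand-in for a trained classifier
    d = np.sum((Xtr - x) ** 2, axis=1)
    return ytr[np.argmin(d)]

# Two well-separated clusters, like the Setosa/others split in iris:
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
y = np.array([0, 0, 0, 1, 1, 1])
print(loo_error(X, y, nn_predict))  # 0.0 -- well-separated data
```

Swapping in different kernels (or classifiers) for predict_one and comparing the returned error rates is exactly the comparison the SAS code below performs with four PROC SVM runs.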

****(2). Cross validation error comparison of 4 kernels****;
proc dmdb batch data=iris dmdbcat=_cat out=_iris;
   var Sepal: Petal:;
   class species;
run;

%let knl = linear;
proc svm data=_iris dmdbcat=_cat kernel=&knl c=1 cv=loo;
   title "The kernel is &knl";
   ods output restab = &knl;
   var Sepal: Petal:;
   target species;
run;

%let knl = rbf;
proc svm data=_iris dmdbcat=_cat kernel=&knl c=1 K_PAR=1 cv=loo;
   title "The kernel is &knl";
   ods output restab = &knl;
   var Sepal: Petal:;
   target species;
run;

%let knl = polynom;
proc svm data=_iris dmdbcat=_cat kernel=&knl c=1 K_PAR=3 cv=loo;
   title "The kernel is &knl";
   ods output restab = &knl;
   var Sepal: Petal:;
   target species;
run;

%let knl = sigmoid;
proc svm data=_iris dmdbcat=_cat kernel=&knl c=1 K_PAR=1 K_PAR2=1 cv=loo;
   title "The kernel is &knl";
   ods output restab = &knl;
   var Sepal: Petal:;
   target species;
run;

data total;   
   set linear rbf polynom sigmoid;
   where label1 in ('Kernel Function','Classification Error (Loo)');
   cValue1 = lag(cValue1);
   if missing(nValue1) = 0;
run;

proc sgplot data=total;
   title " ";
   vbar cValue1 / response = nValue1;
   xaxis label = "Selection of kernel";
   yaxis label = "Classification Error by Leave-one-out Cross Validation"; 
run;
