Thursday, July 7, 2011

Using SAS/IML for high performance VaR backtesting

Thanks to Rick’s new book about SAS/IML, I realized that SAS/IML is far beyond PROC IML, and recently it added an advanced IML language called IMLPlus that enjoys many new features. I also start to play with SAS/IML Studio, an IDE exclusively for IMPlus in addition to the SAS enhanced editor we are familiar with (Interestingly, a number of emerging IDEs for matrix languages prefer to use names like XXX Studio. For example, a hot IDE of R is called RStudio). Although SAS users/programmers are notably conservative, I had a rough intuition that learning SAS/IML will be helpful in some occasions of high-demanding computation.

For instance, backtesting of Value-at-risk(VaR) requires a lot of computer resource. At the beginning, VaR based on different methods should be calculated. Jon Danielsson introduced how to use Matlab and R to find VaRs by four ways: exponential moving average, moving average, historical simulation, and generalized autoregressive conditional heteroskedasticity (GARCH), separately [Ref. 1]. And he maintains a daily-updating plot on his website to show the trends [Ref. 2]. As Jon described, if a 4000-day data is fetched and a 1000-day rolling window for calculation is applied, it means that 3000 loops will be needed to have the VaRs , which would make the computation efficiency a concern. Fortunately, we can easily translate the codes by Matlab and R into SAS codes through PROC IML. If there is any hurdle in this porting process, I can always find tips from Rick’s resourceful blog [Ref.3]. The first three chapters of his book are also good reference [Ref. 4].

That is what I did. First I have to use R to download S&P 500 daily returns from 1995-08-09 to 2011-06-28, total 4000 observations. SAS/IML Studio can call R’s function. However, the R on my laptop is version 2.13.0, which is not supported by SAS/IML Studio currently. So I directly used RStudio, my favorite IDE for R, and saved the data as CSV.

######################(0) DOWNLOAD STEP###############################
p = get.hist.quote(instrument = "^gspc",
    Start = "1995-08-09",end="2011-06-28",quote="AdjClose",quiet=T) 
####################END OF STEP 0#####################################
Then I used PROC IML to calculate VaRs by exponential moving average, moving average and historical simulation, and eventually inserted them into a macro. Unlike R or Matlab, SAS depends on the procedures rather than the functions. Since PROC IML cannot call other SAS’s procedures, I used PROC FCMP to make PROC AUTOREG as a user-defined function for GARCH. SAS now provides virtual hard disk with physical memory specified by the MEMLIB command, which is indeed a big help for me to slash the computer’s operation time. After all, the total time to run the program below on my 2-year-old laptop is 9 minutes, which is acceptable.

**********************(1) INPUT STEP*********************************;
***********(1.1) INPUT DAILY RETURN DATA FROM R**********************;
data _tmp01;
   infile "c:\tmp\result.csv" delimiter = ',' missover dsd
   lrecl=32767 firstobs=2; 
   format   date yymmdd10. r best32.; 
   informat date yymmdd10. r best32.;
   input date $ r;

***********(1.2) LOAD DATA INTO MEMORY LIBRARY***********************;
libname mem 'd:\tmp' memlib;
data mem._tmp01; 
   set _tmp01(keep=r);
********************END OF STEP 1************************************;

**********************(2) MODULE-BUILDING STEP***********************;
%macro var123_macro();
   proc iml;
      lambda = 0.94;
      use mem._tmp01;
         read all var{r} into y;
      close mem._tmp01;
      t1 = &start - 1000;
      t2 = &start - 1;
      window = y[t1:t2];
      start Var(A);            
         n = nrow(A);              
         return( (A-A[:,])[##,]/(n-1) );
      finish Var;
      if exist("mem.b") then do;
         use mem.b;
            read all var{sl1} into sl1;
         close mem.b;
      else do;
         sl1 = Var(y[1:30]);
         do t = 2 to 1000;
            sl1 = lambda * sl1 + (1-lambda) * y[t-1]**2;
      sl1 = lambda * sl1 + (1-lambda) * y[&start-1]**2;
      var1 = -probit(&pvalue) * sqrt(sl1) * &value;
      call symput('var1', left(char(var1)));
      create mem.b var{sl1};
      close mem.b;
      call sort(window, {1});
      var2 = -sqrt(Var(window)) * probit(&pvalue) * &value;
      var3 = -window[1000*&pvalue] *&value;
      call symput('var2', left(char(var2)));
      call symput('var3', left(char(var3)));
%mend var123_macro;

proc fcmp outlib =;
   subroutine var123(start, pvalue, value, var1, var2, var3);
      outargs var1, var2, var3;
      rc1 = run_macro('var123_macro', start, pvalue, value, var1, var2, var3);

***********(2.2) CREATE A FUNCTION FOR VAR BY GARCH******************;
%macro var4_macro();
   ods select none;
   %let start = %eval(&start - 1000);
   %let end   = %eval(&start + 999);
   proc autoreg data = mem._tmp01(firstobs = &start obs = &end);
      model r = / noint garch=(q=1, p=1);
      output out = mem._gar01 ht = ht;
   data _null_;
      set mem._gar01 end = eof;
      if eof then do;
          var4 = -sqrt(ht) * probit(&pvalue) * &value;
          call symput('var4', var4);
   ods select all;

proc fcmp outlib =;
   function var4(start, pvalue, value);
       rc1 = run_macro('var4_macro', start, pvalue, value, var4);
********************END OF STEP 2************************************;

**********************(3) IMPLEMENTATION STEP************************;
options cmplib = (work.myfunclib);
   pvalue = 0.01;
   value = 1000;
   do i  = 1001 to 4000;
       call var123(i, pvalue, value, var1, var2, var3);
       var4 = var4(i, pvalue, value);
********************END OF STEP 3************************************;
The four VaRs can be displayed in a time series plot. Then the backtests can be conducted by Bernoulli coverage test and independence coverage test.

**********************(4) VISUALIZATION STEP*************************;
data _tmp02;
   set _tmp01;
   i = _n_;

data _tmp03(where=(i>1000));
   merge _tmp02;
   by i;
   label var1 = "Exponential Moving Average"
         var2 = "Moving Average"
         var3 = "Historical Simulation"
         var4 = "GARCH";

ods html gpath = "c:\tmp" style = harvest;
proc sgplot data = _tmp03;
   series x = date y = var1;
   series x = date y = var2;
   series x = date y = var3;
   series x = date y = var4;
   format var1 dollar8.2;
   yaxis grid label = 'Value-at-risk '; 
   xaxis label = ' ';
ods html close;
********************END OF STEP 4************************************;
The enhanced version of PROC IML, IMLPlus, can call SAS’s procedures within it. Rick gave a number of illustrations in the 4, 12-14 chapters of his book [Ref.4]. The GARCH part above can be re-written in SAS/IML Studio like:

value = 1000;
pvalue = 0.01;
var4 = j(4000, 1);

do i = 1001 to 4000;
   start = i - 1000;
   end   = i - 1;
   submit start end;
      proc autoreg data = sasuser._tmp01(firstobs = &start obs = &end);
            model r = / noint garch=(q=1, p=1);
            output out = _gar01 ht = ht;
   use _gar01;
      read all var{ht} into ht;
   close _gar01;
   x = ht[1000];
   getvar4 = -sqrt(x)*probit(pvalue)*value;
   var4[i, 1] = getvar4;

create sasuser.garch from var{var4}; 
close sasuser.garch; 
In conclusion, I had a lot of fun in studying SAS/IML this summer. SAS/IML Studio is a friendly environment to write code. And IMLPlus provides a different perspective to think about SAS programming.

1. Jon Danielsson. ‘Financial Risk Forecasting’. Wiley, 2011.
2. Jon Danielsson .
3. Rick Wicklin. The Do Loop.
4. Rick Wicklin. ’ Statistical Programming with SAS/IML Software’. SAS Publishing, 2010.

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...