Sunday, August 28, 2011

My 10 wishes for SAS

No wonder that SAS 9.3 is one of SAS’s greatest products. Beyond it, I have 10 secret wishes and hope the far-away SAS 9.3.2 or SAS 9.4 might realize them someday.

1. Trigger for SAS dataset
A wonderful thing about SAS is that it can be used as a RDBMS with full functionality. Just one piece from the SQL language is missing in SAS - trigger. Adding triggers would bring more security for SAS datasets or data views, and automate some routine operations. 

2. PROC PIVOT for pivot table
Pivot table is a huge business success, since I found that every boss loves to do point-and-click data aggregation in Excel (why not they just use the simple PROC FREQ of SAS?). I often spend many hours to painfully decorate a pivot table. A procedure that directly exports SAS’ dataset to pivot-table-contained Excel spreadsheet is going to be a big plus.

3. Visible hash objects
Hash object offers an efficient alternative to the hard disk based Data Step programming. Michele Burlew’s new book  this fall would be a milestone for this emerging technology since SAS 9.1. If SAS windowing environment provides the views for the hash objects through the library explorer, matching or lookup on the hash objects would be better perceived.

4. A multi-threading LOGISTIC procedure
Last week Rick introduced how to open multiple workspace instances to make SAS/IML multi-threading. For many SAS users, PROC LOGISTIC is worthy of the half price they paid for SAS. It seems that SAS is developing a multi-threading HPLOGISTIC procedure for Teradata or in-database technology. However, at the age of big data, a multi-threading LOGISTIC procedure is still very much desired in SAS/STAT.

5. A system options for decimal digits
It is well known that to export the tables by ODS and then change the formats would allow displaying more decimal places for SAS’s outputs. However, a system option specifying the number of digits in the result would save coding time.


/*******************OPTION 1 ***********************************************
* From NOTE 37106 at support.sas.com/kb/37/106.html 
* How can I display more or fewer decimal places in procedure results
* -- Change the decimal places for a table 
****************************************************************************/
ods output parameterestimates = pe;
proc reg data = sashelp.class;
   model Weight = Height;
run; 
proc print data = pe label noobs; 
   format _numeric_ 12.8 ;
run;

/*******************OPTION 2 ***********************************************
* From Robin High at http://listserv.uga.edu/cgi-bin/wa?A2=ind0704b&L=sas-l&P=73559
* Re: How to display p value with more dicimal digits
* -- Change  the decimal places for all p-values 
****************************************************************************/
proc template;
   define column Common.PValue;
      format = pvalue12.8; 
   end;
run;
proc reg data = sashelp.class;
   model Weight = Height;
run; 

6. Support vector machine in SAS/STAT
Although SAS Enterprise Miner 7.1 has two procedures: PROC SVM and PROC SVMSCORE, they seem primitive and only apply for bivariate response variable. A procedure for SVM in SAS/STAT alongside the robust GLM-based procedures there would relieve many desperate SAS coders who got projects to do SVM with SAS.

7. A trial version (or learning edition)
One friend of mine is still using SAS 6 and insists that it is the culminating product. If there is a trial edition of SAS 9.3 available he can taste, he probably will change the idea. A learning edition with no cost or little cost can attract more people to start to learn SAS.

8. PROC TEXTCLOUD
Text cloud is a fancy visualization tool, although it does nothing about statistics. For example, Rick summarized his first 100 blog posts by it. R also has a nice package ' wordcloud'. A text cloud procedure would definitely make SAS more fun.

9. Random forest in Enterprise Miner
Random forest is one of the popular classification methods, which has not been included in Enterprise Miner 7.1 yet. Hope in the future it could become one of the modeling nodes.

10. Reasonably priced SAS cloud
Amazon is earning a windfall of money by its cloud services. And many start-ups provide R clouds. A PC SAS or UNIX SAS cloud may be a lucrative business for SAS. And I will be happy to show SAS to my friends on iPad or Android phone.

Thursday, August 25, 2011

A macro design pattern by PROC FCMP


We all heard horrible stories that someone tried to piece together a bunch of nice functioning macros for a big macro, and ended up with a messy and undebuggable system. Part of reasons can be about encapsulation: not all SAS programmers have the good habit to localize the macro variables by %local statement; the leaking macro variables may ruin the attempt to utilize multiple macros written by different SAS programmers. To solve this problem, Mark Tabladillo brought the concepts of encapsulation, polymorphism, and inheritance into nested macros. And he raised several design patterns for macros to emulate the object-oriented languages.

The FCMP procedure, besides its original purpose as a function compiler, could encapsulate macros by its RUN_MACRO function. The macro-based functions seem to be more safe modules than the macros themselves. Erin, Daniel and Himesh in their SAS Global 2011 paper showed an example to build a complicated reporting system for academic performances. Their principle is to construct a few macro-embedded functions by PROC FCMP and then incorporate them with an interface macro. Here I modified their codes a little to increase the number of macros and showed the relationship among the elements in the UML diagram above. The stucture is similar to the adapter pattern, one of the many OOP design patterns, with PROC FCMP as a wrapper.

Overall, functionalizing our macros or our colleagues’ macros by PROC FCMP is an alternative way to integrate them for a ‘big’ purpose.


 
/*******************READ ME*********************************************
* THE CODES BELOW ARE COPIED AND MODIFIED FROM ERIN LYNCH, DANIEL 
* O’CONNOR, HIMESH PATEL OF SAS INSTITUTE 
*
* THE ORIGINAL CODE AND RAW DATA CAN BE FOUND FROM THEIR PAPER 
* MY REPORTING REQUIRES A FULL STAFF—HELP! 
* PAPER 291, SAS GLOBAL FORUM 2011
* support.sas.com/resources/papers/proceedings11/291-2011.pdf
*
****************END OF READ ME******************************************/

%macro tabledata_prep;
   options topmargin=.125in bottommargin=.125in leftmargin=.25in rightmargin=.25in nodate nonumber;  
   title; footnote;
   ods escapechar="~"; 
   %let tabledata=%sysfunc(dequote(&tabledata.)); 
   data tabledata; 
      set &tabledata; 
      district=substr(district,1,8)||' '||substr(district,9,1); 
      school=substr(school,1,6)||' '||substr(school,7,1); 
   run; 
%mend;

%macro linedata_prep;
   %let linedata=%sysfunc(dequote(&linedata.)); 
   ods _all_ close; 
   data linedata; 
      set &linedata; 
      district=substr(district,1,8)||' '||substr(district,9,1); 
   run; 
   proc sort data= linedata out=sorted_linedata; 
      by district year; 
   run; 
   proc sort data= linedata out=districts(keep=district) nodupkey; 
      by district; 
   run; 
%mend;
                                       
proc template; 
   define style Styles.Charter; 
      parent = styles.printer; 
      style Body from Document 
         "Undef margins so we get the margins from the printer or SYS option"  / 
            marginbottom = _undef_ 
            margintop = _undef_ 
            marginright = _undef_ 
            marginleft = _undef_ 
            pagebreakhtml = html('PageBreakLine') 
            backgroundimage="Your.png"; 
   end; 
run; 
 
%macro Newfile; 
  %if &path ne '' %then %let pathopt= path=&path(url=none); 
   %else  %let pathopt=; 

   %if &gpath ne '' %then %let gpathopt= gpath=&gpath(url=none); 
   %else %let gpathopt=; 

   %let path=%sysfunc(dequote(&path.)); 
   %let gpath=%sysfunc(dequote(&gpath.)); 
   %let destination=%sysfunc(dequote(&destination.)); 
   %let file=%sysfunc(translate(%sysfunc(dequote(&file.)), "_", " ")); 
   %let extension=%sysfunc(dequote(&extension));      
 
   %if &styleparm ne '' %then %let styleopt= style=%sysfunc(dequote(&styleparm.)); 
   %else  %let styleopt=; 

    %if ( %upcase(&destination) eq PDF ) %then %do; 
       ods &destination file="&path.&file..&extension" notoc startpage=no 
       &styleopt; 
   %end; 
   %else %if (( %upcase(&destination) eq RTF ) or ( %upcase(&destination) eq TAGSETS.RTF )) %then %do; 
       ods &destination file="&path.&file..&extension" startpage=no &styleopt; 
   %end; 
   %else %if ( %upcase(&destination) eq HTML ) %then %do; 
       ods &destination file="&file..&extension" &pathopt &gpathopt &styleopt; 
   %end; 
%mend; 
                                                 
%macro Enrollment; 
   %let district=%sysfunc(dequote(&district.)); 
   ods text="~{newline 3}"; 
   ods text="~{style [width=100pct font_size=26pt background=CXf4e9c9] &district Enrollment By School Year}"; 
   ods text="~{newline 2}"; 
   ods text="~{style systemtitle [just=center]Enrollment by Year}"; 
   ods graphics / height=3in width=6in; 
   proc sgplot data=sorted_linedata(where=(district="&district")); 
        series x=year y=students / markers 
        lineattrs=(color=CX39828C pattern=SOLID thickness=3) 
        markerattrs=(color=CX0000FF symbol=STARFILLED) name='series'; 
   run; 
%mend; 

%macro District_Makeup; 
   %let district=%sysfunc(dequote(&district.)); 
   ods text="~{newline 6}"; 
   ods text="~{style [width=100pct font_size=26pt background=CXf4e9c9]Current Year Percentage Of Students By School}"; 
   proc report data=tabledata(where=(district="&district")) nowd 
         style(report)={frame=void font_size=12pt rules=none backgroundcolor=CXF4E9C9 
         cellpadding=0 cellspacing=0}; 
      define district / noprint; 
      define students / noprint; 
      define total_enrollment / noprint; 
      define school / '' style(column)={width=5.5in}; 
      define percent / '' style(column)={width=.5in} right; 
   run; 
%mend; 

%macro Closefile; 
   %let destination=%sysfunc(dequote(&destination.)); 
   ods &destination close; 
%mend; 

proc fcmp outlib=work.fncs.submit; 
   function tabledata_prep(tabledata $); 
      rc = run_macro('tabledata_prep', tabledata); 
      return(rc); 
   endsub; 
   function linedata_prep(linedata $); 
      rc = run_macro('linedata_prep', linedata); 
      return(rc); 
   endsub; 
   function Enrollment(district $); 
      rc = run_macro('Enrollment', district ); 
      return(rc); 
   endsub; 
   function District_Makeup(district $); 
      rc = run_macro('District_Makeup', district ); 
      return(rc); 
   endsub; 
   function Newfile( destination $, path $, gpath $, file $, extension $, styleparm $ ); 
      rc = run_macro('Newfile', destination, path, gpath, file, extension, styleparm ); 
      return(rc); 
   endsub; 
   function Closefile( destination $ ); 
      rc = run_macro('CloseFile', destination ); 
      return(rc); 
   endsub; 
run; quit; 
 
%macro Academic_Performance_Report (linedata =, tabledata = , destination=, path=, gpath=, extension=, style= );        
   options mlogic mprint; 
   %if ( "&extension" eq "" ) and ( &destination ne "" ) %then %let extension =&destination; 
   options cmplib=work.fncs;       
   data _null_;
      rc = tabledata_prep(symget('tabledata')); 
      rc = linedata_prep(symget('linedata'));
   run;
   data _null_;
      set districts;                    
      rc = Newfile( symget('destination'), symget('path'), symget('gpath'), 
         cats(district, "_Annual_Performance"), symget('extension'), symget('style'));                                 
      if ( rc eq 0) then do; 
         rc = Enrollment( district ); 
         rc = District_Makeup( district ); 
         rc = Closefile(symget('destination')); 
      end;  
   run; quit;
%mend; 
 
%Academic_Performance_Report(linedata = data1, tabledata = data2, destination=html, path=, gpath=, extension=, style=Charter ); 

Sunday, August 14, 2011

Using PROC COPULA in a more volatile market

The last week witnessed one of the wildest fluctuations in the market. Copula could measure the nonlinear dependence of multiple assets in a portfolio, and most importantly, is pronounced as \`kä-pyə-lə\(Thanks to the tip by Rick). The latest COPULA procedure in SAS 9.3 is one of the emerging tools to implement copulas.

To test it, I used R to download the daily return data for a few stock shares plus S&P500 index prices, since January 01, 2010. The six stocks are Citi group(C), JP Morgan(jpm), Pfizer(pfe), IBM(ibm), Apple(aapl), and Google(goog). I constructed an equally weighted portfolio by them. Until August 12, 2011, there are 406 observations. Therefore, in a composite plot by SAS, the stocks of banks show the greatest volatility, followed by pharmaceutical and high-tech companies.


#*********************(0) DOWNLOAD MARKET DATA***********************;
library("tseries")    
sp500= get.hist.quote(instrument="^gspc",start="2010-01-01",quote="AdjClose")
c    = get.hist.quote(instrument="c",    start="2010-01-01",quote="AdjClose") 
jpm  = get.hist.quote(instrument="jpm",  start="2010-01-01",quote="AdjClose") 
pfe  = get.hist.quote(instrument="pfe",  start="2010-01-01",quote="AdjClose")
ibm  = get.hist.quote(instrument="ibm",  start="2010-01-01",quote="AdjClose")
aapl = get.hist.quote(instrument="aapl", start="2010-01-01",quote="AdjClose")
goog = get.hist.quote(instrument="goog", start="2010-01-01",quote="AdjClose")

result=as.data.frame(diff(log(merge(sp500, c, jpm, pfe, ibm, aapl, goog))))
write.csv(result,file='c:/tmp/r2sas.csv')

**********************(1) INPUT RAW DATA*****************************;
options fullstimer; dm 'output;clear; log;clear;';
data raw;
   infile 'c:/tmp/r2sas.csv' delimiter = ',' missover dsd firstobs=2 ;
   informat date yymmdd10. sp500 c jpm pfe ibm aapl goog best32.; 
   format date date9.;
   input date sp500 c jpm pfe ibm aapl goog;
run;

**********************(2) PLOT STOCK RETURNS*************************;
proc transpose data = raw out = test01;
   by date;
   var c jpm pfe ibm aapl goog;
run;
data test02;
   merge test01 raw(keep=date sp500);
   by date;
run;
ods graphics / antialiasmax=2900;
proc sgpanel data = test02;
   panelby _name_ / spacing=5 columns = 3 rows = 2 novarname;
   series y=sp500 x=date / lineattrs=(color=red);
   series y=col1  x=date / lineattrs=(color=blue);
   refline 0/ axis=y lineattrs=(pattern=shortdash);
   rowaxis label = 'daily returns';
   label col1 = 'individual stock' ;
run;

I followed the online document of this procedure and also chose the t copula. The correlation plots of the fitted data are displayed above. It seems that PROC COPULA could only draw up to 5*5 matrix for scatter plots in my test. I don’t know if there is any parameter to activate since I have 6 stocks.

**********************(3) CALCULATE COPULA***************************;
proc copula data = raw(drop=sp500); 
   var c jpm pfe ibm aapl goog;
   fit t / marginals = empirical
           method = mle
           plots = (data = both matrix);
   simulate / ndraws = 10000
            seed = 20100822
            out  = sm_t;
run;

Then the kernel densities between stocks from the simulated dataset were calculated in SAS and plotted in R.
 
**********************(4) CALCULATE KERNEL DENSITIES*****************;
%macro kernel(var_list = );
   ods select none;
   %do i = 1 %to 5;
      %do j = %eval(&i + 1) %to 6;
         %let var1 = %scan(&var_list, &i);
         %let var2 = %scan(&var_list, &j);
         proc kde data= sm_t ;
            bivar &var1 &var2 / out = _tmp01;
         run;
         %if %eval(&i + &j) = 3 %then %do;
            data comb;
               set _tmp01;
            run;
         %end;
         %else %do;
            data comb;
               set comb _tmp01;
            run;
         %end;
      %end;
   %end;
   ods select all;
%mend;
%kernel(var_list = c jpm pfe ibm aapl goog);

data comb1;
   set comb;
   length gname $15;
   gname = cats('x=', var1, ';', 'y=', var2);
   keep value1 value2 gname density;
run;
proc export data = comb1 outfile = 'c:/tmp/sas2r.csv' replace;
run;

#*********************(5) PLOT DENSITY IN R**************************;
x = read.csv('c:/tmp/sas2r.csv')
library('lattice')
wireframe(density ~ value1 * value2 | gname , x, shade = TRUE,
          screen = list(z = -30, x = -50), lwd = 0.01, 
          xlab = "Stock X", ylab = "Stock Y",  
          zlab = "Density")


The simulated daily portfolio returns are likely to follow a normal distribution.

**********************(6) PLOT RETURN DISTRIBUTION*******************;
data port_ret (drop = i ret);
   set sm_t;
   array returns{6} c jpm pfe ibm aapl goog;
   ret =0;
   do i =1 to 6;
      ret = ret+ (1/6)*exp(returns[i]);
   end;
   port_ret = ret-1;
run;

proc kde data = port_ret;
   univar port_ret / percentiles = 1,2.5,5,10,90,95,99 plots=histdensity;
   ods output  percentiles = pcts;
   format port_ret percent8.3;
   label port_ret = 'portfolio return';
run;

Several predicted portfolio changes at given probabilities are given in a tilt plot. Hope this portfolio’s performance next day (August 15, 2011) would be within the expected ranges.

**********************(7) PLOT PORTFOLIO RETURNS*********************;
data prob;
   set pcts;
   percent = percent / 100;
   if percent > 0.5 then do ;
      percent  = 1 - percent ;
    result = put(percent , percent8.1)||
         'probability portfolio gains'|| put(port_ret, percent8.3);
   end;
    else result = put(percent , percent8.1)||
         'probability portfolio loses'|| put(port_ret, percent8.3);
run;

goptions device=javaimg ftitle="arial/bold" ftext="arial" 
htitle=.15in htext=.2in xpixels=600 ypixels=500;
proc gtile data = prob;
    tile Percent tileby = (result, Percent) / colorvar = port_ret;
   format port_ret 8.4;
   label port_ret = 'portfolio return';
run;

PROC COPULA supports five types of copulas(normal copula, t copula, clayton copula, Gumbel copula and Frank copula). Jan Chvosta described a ranking method to choose the best copula. I can easily apply the author's protocol.

Overall, PROC COPULA has much lower learning curve than the R package ‘copula’. Hope it grows to a dominating tool in analyzing copula.

Monday, August 1, 2011

10 interesting discoveries from SAS 9.3

1. SAS 9.2 and SAS 9.3 coexist
I thought that a new installation of SAS 9.3 would automatically uninstall the old SAS 9.22 on my laptop. Amazingly, the two editions live together peacefully. The directory for SAS 9.3 is SASHOME while SAS 9.22’s is SAS. I can even run them simultaneously, and they don't bother to interfere each other. Anyway, Enterprise Guide 4.3 replaced the old 4.2 edition after this upgrade.

2. Zero-configuration for Enterprise Miner 7.1
In SAS 9.2 era, installation of Enterprise Miner was a nightmare for me. I can’t imagine somebody other than a trained technician can install it properly. In SAS 9.3, Enterprise Miner 7.1 workstation becomes a part of the default installation (of course you should have the license). There seems no hassle to make it work well.

3. Default HTML output is surprisingly speedy
At the beginning, I seriously doubted on this change, since calling a browser to open HTML pages was intolerably slow in the SAS 9.2 windowing environment. However, HTML output in SAS 9.3 is impressively fast, and I didn’t feel any significant difference between the HTML and the listing. However, I suggest keeping the default HTMLBlue as the HTML style. I tried other styles and they are not that fast as HTMLBlue -- possibly it has been optimized for this feature.

4. Much faster installation
I used to spend an entire afternoon to install SAS 9.22 on my ancient laptop. The medium for SAS 9.22 was 4 DVD discs and I was so exhausted to change the discs from time to time. For SAS 9.3 I used only one hour with 3 discs. It is going to be a big relief for those SAS administrators or IT support.

5. New plotting statements come with no learning curve
I can draw bubble plot or waterfall plot simply according to the SG procedures’ syntax. Sanjay Matange told that SAS 9.3 can also do many other kinds of plots, such as heat map, attribute map, clustered graph, etc, which I will certainly explore. And I pretty expect to see his new book about ODS Graphics in SAS 9.3 later this year.

6. One click to Microsoft Office
Another reason I keep HTML as output destination is that it has really cool interface to send the pieces toward Microsoft Office. For example, I can right click on an output table by PROC PRINT to save it as an Excel sheet. Or I can right click on a graph toward OneNote. Previously in SAS 9.22 I have to write a few lines of codes to do this job.

7. The first question: ODS Graphics on or off
The first time I ran SAS 9.3, a window popped up and asked if I wanted to switch off ODS Graphics. Believe it or not, every statistical procedure could generate dazzling high-quality ODS graphs. However, the trade-off is that it costs more computer resources. A regression by PROC REG from weight to height on SASHELP.CLASS takes 44 seconds. Thus I opt for off.
8. Import spreadsheet from Excel 2007/2010 officially
In many years, to transform an xlsx spreadsheet to a CSV or xls format and import is a routine work for SAS users. In SAS 9.22, a hidden trick is to use PROC IMPORT to input Excel 2007/2010 formats. Now we can see that it has an option in the import menu. Finally SAS 9.3 shows friendship to Excel 2007/2010.
9. New tool in the radar: ODS Graphics Designer
This interface would allow users to create ODS graphs by point-and-click. It can also be invoked by the %sgdesign() command. With this nice facility, I am able to find and export the underlying Graph Template Language (GTL) codes from ODS Graphics.
10. Help documents have a multiple-tab head
Everything in SAS 9.3 has been HTMLized. I like the new multiple-tab head for each SAS procedure in the documents, which should be more helpful in locating desired information.

In conclusion, SAS 9.3 is the real athlete for production purpose, while SAS 9.22 looks like an experimental warm-up release.

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...