Thursday, August 29, 2013

A SAS macro that exports data to MongoDB

MongoDB is possibly the most popular NoSQL data store. To bypass schema and constraint, I feel quite convenient to implement MongoDB as buffer to accompany current RDBMS .Also it is straightforward to use MongoDB and other tools (MEAN) to build some simple web apps for statistics presentation.
Neither SAS nor 10gen so far published any SAS-MongoDB driver. However, the table-like dataset in SAS can be transformed to CSV by PROC EXPORT or DATA Step. MongoDB has a nice API mongoimport that easily accepts CSV formatted data. I write a macro in SAS below to transport data from SAS to MongoDB. The speed is quite fast.
****************(1) MODULE-BUILDING STEP********************************;
%macro sas2mongo(data =, dbname = , collname =, tmpfolder =, mongofolder = );
   /*************************************************************************
   *  MACRO:      sas2mongo()
   *  GOAL:       output a dataset in SAS to a collection in MongoDB
   *  PARAMETERS: data       = SAS dataset to export
   *              dbname     = database name in MongoDB
   *              collname   = collection name in MongoDB
   *              tmpfolder  = Windows directory for temporary file exchange
   *              mongofolder= bin directory where MongoDB was installed
   *************************************************************************/
    proc export data=&data outfile="&tmpfolder.\tmp.csv" dbms=csv replace;
    run;
    options noxsync noxwait;
    %put the execuated command is: &mongofolder\mongoimport.exe -d 
        &dbname -c &collname --type csv --file &tmpfolder.\tmp.csv --headerline;
    x "&mongofolder\mongoimport.exe -d &dbname -c &collname --type 
        csv --file &tmpfolder.\tmp.csv --headerline";
%mend;

****************(2) TESTING STEP****************************************;
%sas2mongo(data = sashelp.class, dbname = demo, collname = class,
    tmpfolder = c:\tmp, mongofolder =c:\mongodb\bin);
Then I run commands in Mongo shell. It works just well.
use demo;
db.class.find();

Tuesday, August 27, 2013

Bubble plot by SAS and Highcharts.js

Bubble plot is a nice data visualization choice for three dimensional numeric variables. It seems quite popular on web and documents.

Static plotting by SAS

Since SAS 9.3, PROC SGPLOT provides a bubble statement, which makes a bubble plot easy. For example, the dataset SASHELP.CLASS can be quickly projected onto a bubble plot.
proc sgplot data = sashelp.class;
   title 'bubble plot by sashelp.class';
   bubble x = weight y = height size = age / group = sex transparency = 0.5;
   yaxis grid;
run;

Dynamic plotting by SAS and Highcharts.js

For the show-off on web, an interactive bubble plot above will be much more attractive. First we need to use SAS to transform the SASHELP.CLASS dataset to a nested JSON array. The link of the final dynamic plot is here.
data one;
   set sashelp.class;
   length data $20.;
   data = cats('[', weight, ',', height, ',', age, '],');
run;

proc sort data = one;
   by sex;
run;
proc transpose data = one out = two;
   by sex;
   var data;
run; 

data JSON;
   set two;
   length _tmp dataline $300.;
   _tmp = cats( of col:);
   substr(_tmp, length(_tmp), 1) = ' ';
   dataline = cats('{data:[', _tmp, '],', 'name:"', sex, '"}');
   keep dataline;
run;
One good thing is that the JSON data can be fully embedded in Highcharts.js, which doesn't require an HTTP server like D3.js. We only need to insert the data from SAS's DATA Step into Highcharts's bubble plot API. It also provides rich options for better visualization effects and convenient downloading.
$(function () {
    $('#container').highcharts({
        chart: {
            type: 'bubble',
            zoomType: 'xy'
        },
        credits: {
            text: "Demo",
            href: 'http://www.sasanalysis.com'
        },
        title: {
            text: 'Bubble plot by sashelp.class'
        },
        series: [{
            data: [
                [84, 56.5, 13],
                [98, 65.3, 13],
                [102.5, 62.8, 14],
                [84.5, 59.8, 12],
                [112.5, 62.5, 15],
                [50.5, 51.3, 11],
                [90, 64.3, 14],
                [77, 56.3, 12],
                [112, 66.5, 15]
            ],
            name: "F"
        }, {
            data: [
                [112.5, 69, 14],
                [102.5, 63.5, 14],
                [83, 57.3, 12],
                [84, 62.5, 13],
                [99.5, 59, 12],
                [150, 72, 16],
                [128, 64.8, 12],
                [133, 67, 15],
                [85, 57.5, 11],
                [112, 66.5, 15]
            ],
            name: "M"
        }]
    });
});

Conclusion

  1. Besides its statistical feature, SAS is also a flexible scripting language such as creating JSON;
  2. Highcharts.js is a view tier tool halfway between D3.js (open source; minimum documentation) and tableau(propriety software; company support), which allows integration with SAS or other data tier tools.

Friday, August 9, 2013

More SQL taste in SAS 9.4

Compared with SAS 9.3, the latest SAS 9.4 introduced a few new procedures for the BASE and STAT components: 7 new procedures for BASE 9.4 and 4 for STAT 12.3. 6 high-performance procedures (thanks to Dr. Wicklin's correction).
New in BASE 9.4New in STAT 12.1New in STAT 12.3
DELETEADAPTIVEREGHPGENSELECT
DS2QUANTLIFEHPLOGISTIC
JSONQUANTSELECTHPLMIXED
PRESENVSTDRATEHPNLMOD
STREAMHPREG
FEDSQLHPSPLIT
AUTHLIB
DS2 is a new SAS proprietary programming language that is appropriate for advanced data manipulation.It is exciting to see the emergence of DS2 and FEDSQL. According to SAS 9.4 DS2 Language Reference,
DS2 is a SAS programming language that is appropriate for advanced data manipulation
Contrary to the thought I had last year, DS2 or PROC DS2 is not a complied language. It seems more like a wrapper of PROC FEDSQL, which combines the capacity of SQL and the original DATA Step together. Therefore, DS2 includes many SQL's features such as subquery.
data class;
   set sashelp.class;
run;

proc datasets nolist;
   delete _:;
quit;

proc ds2 stimer;
   data _test1;
      dcl varchar(6) gender;
      method run();
         set {select name, sex from class where age > 12};
         if sex = 'M' then gender = 'Male';
         else gender = 'Female';
      end;
   enddata;
   run;
quit;
The functionality is equivalent to the SQL syntax in SAS below.
proc sql stimer;
   create table _test2 as
   select *, case when sex = 'M' then 'Male'
         else 'Female'
      end as gender
   from (select name, sex from class where age > 12)
;quit;
Additionally, DS2 supports the concept of transaction in SQL. The run statement in DS2 is equal to the COMMIT statement in SQL, while run cancel statement is comparable to SQL'sROLLBACK statement.
In conclusion, with DS2, SAS is leaning toward RDBMS in how to understand and deal with data.

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...