Wednesday, July 27, 2011

SAS dataset declassified by Matt Shotwell


Matt Shotwell’s new R package ‘sas7bdat’ is a great achievement in bridging SAS and R. Earlier this year Revolution R, a commercial competitor to SAS, launched an RxSasData() function to read SAS’s proprietary ‘sas7bdat’ data format. However, we prefer the free lunch provided by the R community.

Now R has free access to SAS datasets, including many of SAS’s own help datasets. Powered by R, we can do a lot of tricks with SAS datasets in areas that SAS can’t reach or that we haven’t licensed. For example, SAS ships a SASHELP.LAKE dataset to demonstrate its surface plot feature. We can use R to read it directly and draw a picture that combines a contour plot and a surface plot.


# library() attaches one package per call, so load each package separately
library('sas7bdat')
library('lattice')
x <- read.sas7bdat('c:/program files/sas/sasfoundation/9.2/graph/sashelp/lake.sas7bdat')

panel.3d.contour <-
    function(x, y, z, rot.mat, distance,
           nlevels = 20, zlim.scaled, ...){
    add.line <- trellis.par.get("add.line")
    panel.3dwire(x, y, z, rot.mat, distance,
               zlim.scaled = zlim.scaled, ...)
    clines <-
      contourLines(x, y, matrix(z, nrow = length(x), byrow = TRUE),
                   nlevels = nlevels)
    for (ll in clines) {
      m <- ltransform3dto3d(rbind(ll$x, ll$y, zlim.scaled[2]),
                            rot.mat, distance)
      panel.lines(m[1,], m[2,], col = add.line$col,
                  lty = add.line$lty, lwd = add.line$lwd)
    }
}

# Depth is negated so that the lake bottom is drawn below the surface
wireframe(-Depth ~ Length * Width, data = x, panel.aspect = 0.6,
        panel.3d.wireframe = "panel.3d.contour", shade = TRUE,
        screen = list(z = -30, x = 50), lwd = 0.01,
        xlab = "Length", ylab = "Width",
        zlab = "Depth")
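
For readers without a SAS installation, here is a minimal sketch of the same kind of plot on made-up data. The data frame lake_like and its bowl-shaped Depth values are invented for illustration and are not the contents of SASHELP.LAKE; only the column names and the wireframe() settings are borrowed from the code above. It uses the default panel function rather than panel.3d.contour, so it runs on its own.

```r
library(lattice)

# Synthetic stand-in for the lake data: a regular Length x Width grid
# with a bowl-shaped Depth. All values are invented for illustration.
len <- seq(0, 7, by = 0.5)
wid <- seq(0, 9, by = 0.5)
lake_like <- expand.grid(Length = len, Width = wid)
lake_like$Depth <- with(lake_like,
                        10 * exp(-((Length - 3.5)^2 + (Width - 4.5)^2) / 8))

# Same formula and viewing angles as in the post, default panel function
p <- wireframe(-Depth ~ Length * Width, data = lake_like,
               shade = TRUE, panel.aspect = 0.6,
               screen = list(z = -30, x = 50),
               xlab = "Length", ylab = "Width", zlab = "Depth")
print(p)
```

The point of the regular grid is that the contour step in the custom panel function reshapes z into a matrix, which only makes sense when every Length and Width combination is present.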


I also ran a little test to evaluate the speed of the read.sas7bdat() function. Reading the SAS dataset SASHELP.LAKE 30 times took only 1.64 seconds on my 3-year-old desktop, which is certainly much faster than converting it to a CSV file and importing that.

library('sas7bdat')
test <- function(n = 30) {
 system.time(
     for(i in 1:n)
       read.sas7bdat('c:/program files/sas/sasfoundation/9.2/graph/sashelp/lake.sas7bdat')
    )
}
test()

> user  system elapsed 
  1.60    0.05    1.64
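
To put a number on the comparison with CSV, a small helper along these lines could time any reader function in the same way. This is only a sketch: time_reads() is a hypothetical name, and the demonstration writes the built-in cars data to a temporary CSV, so it says nothing about lake.sas7bdat itself.

```r
# Hypothetical helper: time n repeated calls of a reader function on one file
time_reads <- function(reader, path, n = 30) {
  system.time(for (i in seq_len(n)) reader(path))
}

# Self-contained demonstration on a temporary CSV made from the built-in
# 'cars' data set, so no SAS installation is needed
csv_path <- tempfile(fileext = ".csv")
write.csv(cars, csv_path, row.names = FALSE)
csv_time <- time_reads(read.csv, csv_path)
print(csv_time[["elapsed"]])

# With the sas7bdat package installed and a real file at hand, the same helper
# could be pointed at the SAS copy (sas_path is a placeholder, not defined here):
# sas_time <- time_reads(sas7bdat::read.sas7bdat, sas_path)
```

For a fair comparison, the CSV should be an export of the same dataset, and the export step itself would need to be counted as well.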

I hope Matt continues to improve this wonderful package:
  1. add support for SAS datasets generated on 64-bit systems;
  2. add a write.sas7bdat() function (that would be so cool!).
