Friday, May 18, 2012

Use the set operator UNION in PROC SQL

SQL syntax contains a few set operators, such as UNION, EXCEPT and INTERSECT. The UNION operator concatenates the results of multiple SQL queries vertically into a single table for all matching rows, which I found particularly useful in PROC SQL while using SAS to manage data. Here come two examples.

Example 1 – Transpose data
PROC SQL can transform a dataset to any desired structure, without referring to DATA step or PROC TRANSPOSE. For example, SASHELP.CLASS can be transposed from wide to long by the UNION ALL clause, and reversely from long to wide by the MAX function and the GROUP clause.

From wide to long

data wide;
   set sashelp.class;
   drop sex;
run;

proc sql;
   create table long as
   select name, 'Age' as var, age as col1
   from wide
   union all
   select name, 'Weight' as var, weight as col1
   from wide
   union all
   select name, 'Height' as var, height as col1
   from wide
;quit;
From long to wide

proc sql;
   create table wide as
   select name,
      max(case when var = 'Age' then col1 end) as Age,
      max(case when var = 'Weight' then col1 end) as Weight,
      max(case when var = 'Height' then col1 end) as Height
   from long
   group by name;
quit;
Example 2 – Aggregate data into a cube
In SAS, PROC SQL doesn’t support the ROLLUP or CUBE clause. However, we can apply multiple UNION operators to simulate such functionality. For example, we can create a cube table to list all possible summations of the ACTUAL variable by STATE, PRODUCT, YEAR in the SASHELP.PRDSAL2 dataset. Afterward we can easily query this multi-dimensional data structure to look for interesting aggregation information, without running any other aggregating procedure again.

data prdsal2;
   set sashelp.prdsal2;
run;

proc sql;
   create table cube as
   select state,product, year, 'total by state, prodcut and year' as category,
   sum(actual) as actual
   from prdsal2
   group by state, product, year
   union
   select state, product, ., 'total by state and prodcuct', sum(actual)
   from prdsal2
   group by state, product
   union
   select state,'', year, 'total by state and year', sum(actual)
   from prdsal2
   group by state, year
   union
   select '',product, year, 'total by product and year', sum(actual)
   from prdsal2
   group by product, year
   union 
   select '' ,'', year, 'total by year', sum(actual)
   from prdsal2
   group by year
   union 
   select state, '',. , 'total by state', sum(actual)
   from prdsal2   
   group by state
   union 
   select '',  product, ., 'total by product', sum(actual)
   from prdsal2
   union
   select '', '', ., 'grand total', sum(actual)
   from prdsal2
   order by state, product, year
;quit;

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...