Wednesday, October 3, 2012

SAS and VBA (6) : delete empty rows

One tricky question of the data work is to delete empty rows/observations in a raw file. A code snippet is particular useful to handle such cases. At the mean time, we need to know how many rows are actually removed as the execution result. The good news is that as programmable softwares, Excel/VBA and SAS are both good at dealing this job. In a simple demo text file below, there are actually two empty lines that have to be removed.

  
ptno visit weight
1 1 122  
1 2   
1 3   
1 4 123  
2 1 156  
2 3   

3 1 112  
3 2   

4 1 125 
4 2  
4 3 

VBA
VBA's CountA function is able to count the number of non-empty cells in any range of cells.Within a loop from the top to the bottom, it will help automatically remove those empty rows. In the codes, a message box is created to return the number of the rows deleted.



Sub DelEptRow()
    Dim myRow, Counter As Integer
    Application.ScreenUpdating = False
    For myRow = ActiveSheet.UsedRange.Rows.Count To 1 Step -1
        If Application.WorksheetFunction.CountA(Rows(myRow)) = 0 Then
            Rows(myRow).Delete
            Counter = Counter + 1
        End If
    Next myRow
    Application.ScreenUpdating = True
'   Display the number of rows that were deleted
    If Counter > 0 Then
        MsgBox Counter & " empty rows were removed"
    Else
        MsgBox "There is no empty row"
    End If
End Sub


SAS
SAS can do the same thing with the combination of its MISSING function(decides if there is any missing value) and CATS function(concatenates all numerical and string variables and trims leading blanks). It is very convenient to apply the logic in a DATA STEP and let LOG tell how many lines are deleted.

data patient;  
   input @1 ptno @3 visit @5 weight;  
   infile datalines missover;   
   cards;  
1 1 122  
1 2   
1 3   
1 4 123  
2 1 156  
2 3   

3 1 112  
3 2   

4 1 125 
4 2  
4 3 
;;;  
run;

options missing = ' ';
data want;
   set patient nobs = inobs end = eof;
   if missing(cats(of _all_)) then delete;
*  Display the number of rows that were deleted;
   outobs + 1;
   counter = inobs - outobs;
   if eof then do;
      if counter > 0 then put counter "empty rows were removed";
      else put "There is no empty row";
   end;
   drop counter outobs;
run;

Conclusion
In a DATA STEP, SAS's implicit loop has two sides. The good side is that we do not need care about how to set up a loop most time which saves codes. The bad side is that sometimes it is hard to control which row we need to loop to.

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...