Saturday, April 9, 2011

Predict the unemployment rate for Election 2012 with SAS


Now that President Obama has announced that he is seeking reelection, the unemployment rate in November 2012 could well decide the result. The Wall Street Journal averaged the predictions of 54 economists and concluded that the rate will be 7.7% [Ref. 1]. Presumably, those economists rely on historical data to forecast the future, combined with some degree of subjective judgment. However, the newly released March figure is surprisingly good at 8.8%, which implies that the 7.7% forecast needs to be revised downward. So what does a real-time forecast say about the unemployment rate at this ‘big’ moment?

SAS has one of the finest time-series packages in the world, SAS/ETS, which includes several forecasting procedures such as the ARIMA procedure and the FORECAST procedure [Ref. 2]. The economic data are updated in the Federal Reserve Bank of St. Louis's FRED database and are easily accessible on its website. With a notebook computer and SAS, it is possible to forecast the unemployment rate like a real professional. Of course, SAS’s procedures have tons of methods and parameters to tune. To simplify the problem, in the SAS macro below I chose one conservative method and one aggressive method (the stepwise autoregressive method of PROC FORECAST and an ARMA model in PROC ARIMA) to give a rough range for the unemployment rate. As the WSJ said, the trend matters: the prediction should get closer to the actual number as new monthly data come in. Right now, my prediction for the unemployment rate in November 2012 is 7.1% to 7.4%.
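The forecast horizon is just the number of monthly steps between the latest FRED observation and the target month; the macro below computes it with SAS's INTCK('month', ...) function and passes it as LEAD= to both procedures. Here is a minimal Python sketch of the same count. It assumes, as the post states, that March 2011 is the latest month available, and that monthly values are stamped on the first of the month (the day does not affect the count).

```python
from datetime import date


def months_ahead(last_obs: date, target: date) -> int:
    """Count the month boundaries crossed from last_obs to target,
    mirroring SAS INTCK('month', last_obs, target)."""
    return (target.year - last_obs.year) * 12 + (target.month - last_obs.month)


# Assumed latest data point: March 2011; target: the November 2012 election.
latest = date(2011, 3, 1)
election = date(2012, 11, 1)

lead = months_ahead(latest, election)
print(lead)  # number of monthly steps to forecast
```

Twenty monthly steps is a long horizon for a monthly model. That is one reason to report a range from two methods rather than a single number.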

References:
1. "Jobless Rate at 2012 Presidential Vote Forecast at 7.7%, Highest Since Carter-Ford, but the Trend May Matter Most". The Wall Street Journal, 13 Mar 2011.
2. SAS/ETS 9.2 User's Guide. SAS Publishing, 2008.

/*******************READ ME*********************************************
* -- PREDICT UNEMPLOYMENT FOR ELECTION 2012 LIKE A PRO --
*
* VERSION:     SAS 9.2(ts2m0), windows 64bit
* DATE:        09apr2011
* AUTHOR:      hchao8@gmail.com
*
****************END OF READ ME*****************************************/

****************(1) MODULE-BUILDING STEP******************;
%macro unemrate(predtime = );
   /***********************************************************
   *  MACRO:      unemrate()
   *  GOAL:       fit time-series models to the latest FED data to
   *              predict the US unemployment rate and plot it
   *  PARAMETERS: predtime = the month (MONYY7. format, e.g. NOV2012)
   *                         for which the unemployment rate is predicted
   *
   ***********************************************************/
   /* Read the monthly UNRATE series straight from the FRED website */
   filename _infile url
      "http://research.stlouisfed.org/fred2/data/UNRATE.txt"
      debug lrecl=100;

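   /* Skip the text header (first 21 lines); each record holds a   */
   /* YYYY-MM-DD date in column 1 and the rate starting in column 13 */
   data raw;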
      infile _infile missover firstobs = 22;
      format date date9.;
      input @1 date yymmdd10. @13 value 4.1;
   run;

   /* Use the last observation to set the forecast horizon (months) */
   /* and to save the latest date and value for the plot            */
   data _null_;
      set raw end = eof;
      if eof then do;
         interval = intck('month', date, input("&predtime", monyy7.));
         call symputx('interval', interval);
         call symputx('eodate', date);
         call symputx('insert', 'Latest data:' || put(value, 4.1) ||
                   '% on ' || put(date, monyy7.));
      end;
   run;

   %if %eval(&interval) le 0 %then %do; 
      %put ERROR: The prediction month must be later than the latest month of FED data;
      %goto finish; 
   %end;

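   /* Fit the two forecasting models quietly (no printed output) */
   ods select none;
   /* Method 1: stepwise autoregressive model (PROC FORECAST)   */
   proc forecast data = raw out = _predbyfc outfull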
      method = stepar lead = &interval interval = month;
      id date;
      var value;
   run;

   /* Method 2: ARMA(1,12) model with a mean term (PROC ARIMA) */
   proc arima data = raw;
      identify var = value;
      estimate p = 1 q = 12;
      forecast lead = &interval interval = month 
              id = date out = _predbyar;
   quit;
   ods select all; 

   /* Join the two sets of forecasts by date */
   proc sql;
      create table predicted0 as
      select a.date, a.value label = 'Actual unemployment rate',
            a.forecast as predbyar label = 'ARIMA model', 
            b.value as predbyfc label = 'STEPAR model'
      from _predbyar as a,
           _predbyfc (where = (lowcase(_type_) = 'forecast')) as b
      where a.date = b.date
   ;quit;

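   /* Keep forecasts only from the last actual month on, start both  */
   /* lines at the last actual value, and save the final predictions */
   data predicted1;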
      set predicted0 end = eof;
      if date lt &eodate then call missing(predbyar, predbyfc);
      else if date eq &eodate then do;
         predbyar = value; 
         predbyfc = value;
      end;
      if eof then do;
         call symput('arlast', put(predbyar, 4.2));
         call symput('fclast', put(predbyfc, 4.2));
      end;
   run;
    
   /* Plot the actual rate since 2006 together with both forecasts */
   ods html style = harvest;
   proc sgplot data = predicted1;
      where date ge '01jan2006'd;
      series x = date y = value;
      series x = date y = predbyar;
      series x = date y = predbyfc;
      refline &arlast / axis = y labelloc = inside
         label = "&arlast" transparency = 1;
      refline &fclast / axis = y labelloc = inside
         label = "&fclast" transparency = 1;
      xaxis grid label = ' ';
      yaxis grid label = 'Unemployment percentage %'
         values = (4 to 11 by 0.2);
      inset "Prediction ends on &predtime" / position = topright border;
      inset "&insert" / position = bottomright;
   run;
   ods html close;

   /* Delete the temporary data sets (names starting with _) */
   proc datasets nolist;
      delete _:;
   quit;

   %finish: ; 
%mend;

****************(2) TESTING STEP******************;
%unemrate(predtime = NOV2012);

****************END OF ALL CODING***************************************;
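In the PROC ARIMA step, ESTIMATE P = 1 Q = 12 fits an ARMA(1,12) model with a mean term. Once the horizon passes 12 months, the moving-average terms no longer contribute to the forecast. From then on the forecast follows the AR(1) recursion and decays geometrically toward the estimated mean. The Python sketch below shows only that recursion. The mean and AR coefficient are made-up illustration values, not estimates from the FRED data, and for simplicity the recursion is applied from the first step onward.

```python
def ar1_path(last_value: float, mu: float, phi: float, steps: int) -> list:
    """Iterate the AR(1) forecast recursion
    y_hat[h] = mu + phi * (y_hat[h-1] - mu), starting from last_value."""
    path = []
    y = last_value
    for _ in range(steps):
        y = mu + phi * (y - mu)
        path.append(y)
    return path


# Hypothetical values for illustration only (NOT fitted to UNRATE):
MU = 6.0    # assumed long-run mean of the series, in percent
PHI = 0.95  # assumed AR(1) coefficient; |PHI| < 1 keeps the model stationary

path = ar1_path(last_value=8.8, mu=MU, phi=PHI, steps=20)
for h in (1, 12, 20):
    print(f"{h:2d} months ahead: {path[h - 1]:.2f}%")
```

With |PHI| < 1 the long-horizon forecast flattens toward MU. How far a November 2012 prediction falls below the current rate therefore depends largely on the estimated mean and AR coefficient.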

Good math, bad engineering

As a former statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...