Friday, October 14, 2011

What are those SAS jobs around Cary, NC?

SAS Institute is located in Cary, NC. In this job-scarce economy, an interesting question is: what job opportunities are available to a SAS user near the company that created SAS, say, within a 150-mile radius? Fortunately, I found that the values returned by the omnipotent job search engine are highly digestible, although the website doesn’t provide an analytics service to the general public. To integrate the data from this site, I designed a macro that extracts the essential variables from the returned HTML pages. I limited the search to openings posted in the past 30 days, used ‘SAS’ as the search keyword, and showed 100 openings on each returned page. Running this macro over 5 such pages gave me 500 job openings for this experiment.

%macro extract(city =, state = , radius = , page = );
   options mlogic mprint;
   %do i = 1 %to &page;
       %let j = %eval((&i-1)*100);
       filename raw url  
       data _tmp01;
          infile raw lrecl= 500 pad ;
          input record $500. ;
          /* Keep only the lines that carry a job record */
          if find(record, 'jobmap[') gt 0 and find(record, 'srcname') gt 0;
       run;
       data _&i;
           set _tmp01;
           array id[5] $50. indeed_id srcname corpname location title;
           length _str $8.;
           /* For each key, copy the text between the key and the next comma */
           do _k = 1 to 5;
               if _k = 1 then _str = "jk:";
               else if _k = 2 then _str = "srcname:";
               else if _k = 3 then _str = "cmpesc:";
               else if _k = 4 then _str = "loc:";
               else _str = "title:";
               _len = length(compress(_str));
               _start = find(record, compress(_str)) + _len ;
               _end = find(record, ",", _start) ;
               id[_k] = compress(substr(record , _start  , _end - _start), "'");
           end;
           extract_time = datetime(); format extract_time datetime.;
           drop record _:;
       run;
   %end;
   /* Stack the per-page data sets into one */
   data &city._&state;
       set %do n = 1 %to &page;
               _&n
           %end;
       ;
   run;
   /* Add a row counter and put the variables in a readable order */
   data &city._&state;
       retain obs extract_time corpname location title indeed_id srcname;
       set &city._&state;
       obs + 1;
   run;
   /* Remove the temporary data sets */
   proc datasets nolist;
       delete _:;
   quit;
%mend extract;
%extract(city = cary, state = nc, radius = 150, page = 5);
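
The key-and-comma parsing inside the macro can be tried on its own, without fetching any page. The sketch below applies the same FIND/SUBSTR/COMPRESS logic to a single hard-coded record. The record text is hypothetical: its field layout, values and trailing `rd` field are assumptions made for illustration, not a copy of a real returned page.

```sas
/* Minimal sketch: parse one made-up record with character functions only. */
data demo(keep = key value);
    length record $500 key $8 value $50;
    /* Hypothetical record; the layout and values are assumed */
    record = "jobmap[0]= {jk:'abc123',srcname:'Example Board',cmpesc:'Example Corp',loc:'Cary',title:'SAS Analyst',rd:'x'};";
    do key = 'jk:', 'srcname:', 'cmpesc:', 'loc:', 'title:';
        /* The value starts right after the key ... */
        _start = find(record, strip(key)) + length(key);
        /* ... and ends just before the next comma */
        _end = find(record, ',', _start);
        /* Drop the surrounding single quotes */
        value = compress(substr(record, _start, _end - _start), "'");
        output;
    end;
run;

proc print data = demo noobs;
run;
```

Because each value is cut at the next comma, a value that itself contains a comma (a location such as a city followed by a state, for example) would be truncated under this approach.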

Popular job titles
I used Wordle, a word-cloud website, to summarize all job titles. ‘Analyst’, ‘Developer’ and ‘Programmer’ seem to be quite common titles for SAS-related jobs. Many openings evidently ask for experience, since they frequently mention ‘Senior’, ‘Manager’, or ‘Management’. You can also guess the daily routines of those jobs from ‘Data’, ‘Marketing’, ‘Statistical’, ‘Risk’, ‘Database’, ‘Research’ and ‘Clinical’.

data _null_;
   set cary_nc(keep=title);
   string =tranwrd(upcase(title), 'SAS', ' ');
   string =tranwrd(upcase(string), 'SR', 'SENIOR ');
   file 'c:\tmp\output.txt';
   put string;
run;

Top 10 job providers
As I expected, SAS is the biggest job provider with 18 openings in the past month, followed by Bank of America (14), Ettain Group (12) and other companies. The top job providers don’t include insurance companies, CROs or pharmaceutical companies, which suggests that there are fewer such companies locally or that they hand the recruiting task to staffing companies.

proc sql outobs=10;
   create table _1 as
   select upcase(corpname) as name, count(*) as freq
   from cary_nc
   where corpname is not missing
   group by name
   order by freq desc;
quit;

data _2;
   set _1;
   n + 1;
   length category $10.;
   if n = 1 then category = 'SAS';
   else if n in (2, 6) then category = 'Bank';
   else if n in(4, 9, 10) then category = 'Consulting';
   else category = 'Staffing';
run;

ods html gpath = 'c:\tmp\' style = harvest;
goptions device=javaimg ftitle="arial/bold" ftext="arial" 
htitle=.15in htext=.2in xpixels=600 ypixels=500;
proc gtile data = _2;
    tile freq tileby = (name, freq) / colorvar = category;
run;

Location, location, location
Jobs for people with SAS skills cluster in five cities: Charlotte, Raleigh, Cary, Durham and Richmond. Apart from working for SAS Institute itself, Charlotte and Raleigh are the best cities for a job seeker in central North Carolina to consider.

proc gchart data= cary_nc;
   pie location;
run;

Source of job posts
Nowadays most company websites ask applicants to disclose where they found the opening. In this experiment, I ranked the source sites by their number of posts to find the top 3.

proc sql outobs=10;
   create table _3 as
   select upcase(srcname) as name, count(*) as freq
   from cary_nc
   where corpname is not missing
   group by name
   order by freq desc;
quit;

proc sgplot data = _3;
   waterfall category = name response = freq ;
   xaxis  label= ' '; yaxis label=' ';
run;

Lessons learned from this weekly project
1. PROC GMAP in SAS 9.3 now supports alpha transparency when drawing a map, a significant improvement.
2. Parsing online data doesn’t require the awesome regular expression syntax, although SAS has a few RE functions. SAS’s character functions are pretty robust.
3. Analyzing online job information is quite entertaining for me. I believe that mining those texts more deeply would uncover more secrets.
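
For comparison with lesson 2, here is a minimal sketch of the same kind of extraction done with SAS's PRX (regular expression) functions instead of FIND and SUBSTR. The record is hypothetical, with an assumed field layout, and only the title field is extracted.

```sas
/* Minimal sketch: pull one quoted field out of a made-up record with PRX. */
data _null_;
    length record $500 value $50;
    /* Hypothetical record; the layout and values are assumed */
    record = "jobmap[0]= {jk:'abc123',srcname:'Example Board',title:'SAS Analyst',rd:'x'};";
    /* Capture whatever sits between the quotes that follow title: */
    rx = prxparse("/title:'([^']*)'/");
    if prxmatch(rx, record) then do;
        value = prxposn(rx, 1, record);
        put value=;
    end;
run;
```

Capturing up to the closing quote rather than the next comma means a value containing a comma is kept whole, at the cost of a less readable pattern.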
