PROC HADOOP, available since SAS 9.3M2, bridges a Windows client and a Hadoop server. The great thing about this procedure is that it supports user-defined functions (UDFs). Applying it takes several steps.
- Download Java SE and Eclipse on Windows
Java SE and Eclipse are free to download. Installation is also fairly easy.
- Write a user-defined function on Windows
The most basic user-defined function is an upper-case function for a string, which wraps Java’s native String.toUpperCase() method. Pig’s manual has a [detailed description] of it.
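As a rough illustration of what such a function does, here is a minimal sketch of the core logic in plain Java. The class name UpperCore and its upper method are hypothetical stand-ins so that the example compiles without the Pig JAR. In an actual Pig UDF, the same logic would sit inside the exec(Tuple input) method of a class that extends org.apache.pig.EvalFunc<String>, as shown in Pig's manual.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the UDF's core logic; not part of the Pig API.
// A real Pig UDF would extend org.apache.pig.EvalFunc<String> and call
// this logic from exec(Tuple input), passing input.get(0).
class UpperCore {
    // Returns the upper-cased string, or null when the field is missing,
    // so that empty fields pass through instead of raising an exception.
    static String upper(Object field) {
        if (field == null) {
            return null;
        }
        return field.toString().toUpperCase();
    }

    public static void main(String[] args) {
        // A made-up row standing in for one record of the input file.
        List<Object> row = Arrays.<Object>asList("a", "b", "hadoop", null);
        for (Object field : row) {
            System.out.println(upper(field));
        }
    }
}
```

The null check matters because Pig passes missing fields as null, and an unguarded toUpperCase() call would fail on them.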
- Package the function as a JAR
There is a wonderful video tutorial on YouTube. Make sure that the version of the [Pig API] JAR on Windows (a file with a name such as pig-0.12.0.jar) is the same as the version running on the Hadoop cluster.
- Run PROC HADOOP commands
The Pig script, saved as pig_code.txt:

    A = load 'test3.txt' as (f1: chararray, f2: chararray, f3: chararray, f4: chararray, f5: chararray);
    describe A;
    register myudfs.jar;
    B = foreach A generate myudfs.UPPER(f3);
    dump B;

Then we can run the SAS code with PROC HADOOP. As a result, the field f3 of the text file on HDFS is output in upper case.
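    /* Hadoop configuration file and the Pig script on the Windows client */
    filename cfg "C:\tmp\config.xml";
    filename code "C:\tmp\pig_code.txt";

    /* Submit the Pig script and register the JAR that holds the UDF */
    proc hadoop options=cfg username="myname" password="mypwd" verbose;
        pig code=code registerjar="C:\tmp\myudfs.jar";
    run;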