Friday, June 7, 2013

Use Google Trends and SAS to select movies to watch

The newest data science success story is that Google search data predicts box office results with 94 percent accuracy. I am a frequent moviegoer, and it would be great to put Google's impressive research result to use.
There are quite a few offerings this summer. For now I am considering five upcoming movies.
Title                 Date
This is the End       Wednesday, June 12
World War Z           Friday, June 21
Man of Steel          Friday, June 14
Monsters University   Friday, June 21
The Internship        Friday, June 7

Google Trends shows which keywords people are searching for, and it is a reliable, free data source. Let's use SAS to script the query URL, which passes the search terms to Google Trends in the query string of an HTTP GET request.
/* Read the five movie titles, one per line */
data one;
    input @1 title $25.;
    datalines;
This is the End
World War Z
Man of Steel
Monsters University
The Internship
;
run;

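/* Split each title into words; every title except the last gets an encoded comma (%2C) */
data two(where=(missing(word)=0));
    set one nobs = nobs;
    if _n_ ne nobs then title1 = cat(title, "%2C");
    else title1 = title;
    do i = 1 to 10;
        word = scan(title1, i, " ");
        output;
    end;
    keep word;
run;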

/* Join all words with an encoded space (%20) into the macro variable &string */
proc sql noprint;
    select word into :string separated by "%20"
    from two;
quit;

/* Build the full URL; the first argument holds the base URL of the Google Trends query */
data three;
    length fullstring $500;
    fullstring = cats("", "&string", '&geo=US&date=today%203-m&cmpt=q');
run;

proc print data=three;
run;
SAS prints the resulting URL. Once I paste the URL into a browser, the chart clearly suggests that the box office winners are going to be Man of Steel and World War Z. That makes my choice easier: I will surely not miss the two hottest movies.
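
As a variation on the steps above, the same kind of query string could probably be assembled in a single DATA step, without the intermediate table of words. This is only a sketch under a few assumptions: the data set name titles and the variables encoded and query are made up for illustration, only two of the five titles are included, and the titles are joined directly with %2C, so the result will not have the extra %20 around each comma that the word-by-word approach produces. The base URL and the trailing parameters are left out.

```sas
/* Hypothetical input: a subset of the titles, one per line */
data titles;
    input @1 title $25.;
    datalines;
Man of Steel
World War Z
;
run;

data _null_;
    set titles end=last;
    length encoded query $500;
    retain query;
    /* Encode the spaces inside a single title as %20 */
    encoded = tranwrd(strip(title), " ", "%20");
    /* Join the titles with an encoded comma, %2C */
    if _n_ = 1 then query = encoded;
    else query = cats(query, "%2C", encoded);
    /* Write the finished query string to the log */
    if last then put query=;
run;
```

With the two titles above, the log line should read query=Man%20of%20Steel%2CWorld%20War%20Z, which can then be placed between the base URL and the geo and date parameters in the same way as &string.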

