Saturday, September 28, 2013

The KFC toy problem: perspectives from four job roles

There is an interesting question —
There are 5 different types of toys at a KFC restaurant. If you go there, you will get one toy randomly. How many times do you need to go to KFC in order to get all 5 toys?
The question is about probabilistic analysis. Different professionals, such as a business analyst, a statistical programmer, a mathematician and a software developer, will have different thinking pathway to solve this problem. Let's see what they would think.

1. Business Analyst

A business analyst will tend to do scenario analysis at the first step.
Best-case scenario:
Assume I am so lucky that each time I visit KFC and get a different toy, then I only need 5 times to have all the five toys. The minimum number is 5.
Worst-case scenario:
To get a different toy I need to go to KFC five times. If there is not the toy I want, I will come back with empty hand. Thus, I will have to go to KFC 5*5 = 25 times. Of course, this scenario never happens.
OK. Then the mean times I need to go to KFC seems to be in a range [5, 25). Let's give the simplest try — use the (5+25)/2 to get 15 times. The number is not accurate but at least we have an estimate.

2. Statistical Programmer

As a brute-force tool, simulation is the instant thought for a statistical programmer. Let the computer randomly create 10,000 trials — say a person plays the game 10,000 times. After averaging the results, the computer eventually tells the expected times to get the 5 toys.
I modified the SAS code from a post by Rick Wicklin, and set the maximum number of visits to KFC as 32. After 10,000 runs, the mean is 11.37. The hunch tells that this number should be quite close.
************(1)Simulate 10000 trials**************************;
proc iml;
   K = 5; /* number of toys */
   L = 32; /* max visits per trial */
   /* generate NSim trials of L visits */
   NSim = 10000;
    x = j(Nsim, L);
   call randseed(12345);
   call randgen(x, "Uniform");
   x = ceil(K*x); /* integers in [1,K] */
   /* record the visit when 5 toys are taken */
   c = j(NSim,1,0); /** allocate */
    do i = 1 to NSim;
      do j = 5 to L;
         rowUnique = countunique(x[i, 1:j]);
         if rowUnique = 5 then do;
            c[i, 1] = j;
            goto skip;
         end;
      end;
      skip:
    end;
   /* output the result */
   create _1 from c;
      append from c;
   close _1;
;quit;

data _2;
   set _1;
   /* remove the trials that didn't get 5 toys in 32 visits */
   where col1 >= 5;
 run;
************(2)Show the results*******************************;
proc sgplot;
   density col1;
   density col1 / type = kernel;
run;

proc means;
run;

3. Mathematician

Actually this question is a variant of Coupon collector's problem. The mean/expectation and standard deviation can be derived directly by the formulas. The final expectation should be 5*(1+1/2+1/3+1/4+1/5) = 11.41. This is the answer.

4. Software Developer

A software developer considers time complexity and space complexity first. When N is approaching infinity, the question is similar to a merge sorting. Given a merge sort is O(nlog(n)), the expected times must be greater than 5*ln(5) = 8.05. At least this number will be a lower bound for this question.

Monday, September 9, 2013

Use MongoDB as a JSON factory

MongoDB is a persistent data store for JSON formatted data, which seems like an ideal middleware between the data tier software and the web. With MongoDB, Javascript's Map/Reduce functionality makes many trivial jobs particularly easy, such as translate an object to an array. For example, we can produce a bubble plot with Highcharts.js and the SASHEP.IRIS dataset in SAS very quickly.
Step 1: push SAS dataset toward MongoDB
First, let's push the SASHEP.IRIS dataset from SAS to MongoDB using thesas2mongo macro.
%sas2mongo(data = sashelp.iris, dbname = demo, collname = iris, tmpfolder = c:\tmp, mongofolder =c:\mongodb\bin);
Step 2: make the JSON file
Under the MongoDB shell, we could use Javascript'smap function to transform data to the desired structure.
var species = ["Setosa", "Versicolor", "Virginica"];
for (i=0; i<3; i++) {
   var z = db.iris.find({Species: species[i]}, {SepalLength:1,SepalWidth:1, PetalWidth:1, _id:0}).toArray().map(function(d){return [d.SepalLength, d.SepalWidth, d.PetalWidth]});
   print("{name:", JSON.stringify(species[i]), ", data:", JSON.stringify(z), "},");
};
Step 3: finalize Highcharts.js
The link of the final plots is here.
The plots are demonstrated into a Bootstrap 3 framework. One great advantages for SVG is that it is responsive to the devices‘ screens, which is especially friendly to mobile. The PNG or JPG formatted images can invoke Bootstrap’s responsive library to have the same effect.

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...