Monday, December 12, 2011

10 toughest interview questions for R developers (1)

I recently discovered 10 questions most daunting in R development, and I am trying to find the answers below.

1. Data structure -- How many data structures R has? How do you build a binary search tree in R?
2. Sorting -- How many sorting algorithms are available? Show me an example in R.
3. Low level -- How do you build a R function powered by C?
4. String -- How do you implement string operation in R?
5. Vectorization -- If you want to do Monte Carlo simulation by R, how do you improve the efficiency?
6. Function -- How do you take function as argument of another function? What is the apply() function family?
7. Threading -- How do you do multi-threading in R?
8. Memory limit and database -- What is the memory limit of R? How do you avoid it? How do you use SQL in R?
9. Testing -- How do you do testing and debugging in R?
10. Software development -- How do you develop a package? How do you do version control?

Q1. Data structure -- How many data structures R has? How do you build a binary search tree in R?
My answer: R mainly has 5 data structures.
Homogeneous(contain the same type of objects): vector --> matrix --> array
Heterogeneous(allow different type of objects): list --> data frame

A binary search tree requires several actions: implement a tree, insert nodes and delete nodes. We should create individual routines in R.

In R, a matrix is the ideal data structure to contain the linked elements. Then a list should be used to wrap the matrix and pass the arguments.

For insert-node routine, there is the pseudocode for it. The key point here is to use recursion in R’s function. Norman Matloff gave a complete example at page 180 of his book. 

insert(X, node){
 if(node = NULL)
 node = new binaryNode(X,NULL,NULL)
 return
 }
 if(X = node:data)
 return
 else if(X < node:data)
 insert(X, node:leftChild)
 else // X > node:data
 insert(X, node:rightChild)
}

Q2. Sorting -- How many sorting algorithms are available? Show me an example in R.
My answer: there are mainly 5 kinds of sorting algorithms(with their complexity):
Bubble Sort - O(n^2);
Selection Sort - O(n^2)
Merge Sort - O(n log n)
Quick Sort - from O(n log n) to O(n^2)
Bucket Sort - O(n + m)

R has a native sort function sort(), which is written by C and uses Quick Sort.

There is an example of Quick Sort in R by recursion

qs <- function(x) {
 if (length(x) <= 1) return(x)
 seed <- x[1]
 rest <- x[-1]
 sv1 <- rest[rest < seed]
 sv2 <- rest[rest >= seed]
 sv1 <- qs(sv1)
 sv2 <- qs(sv2)
 return(c(sv1,seed,sv2))
}

Good math, bad engineering

As a formal statistician and a current engineer, I feel that a successful engineering project may require both the mathematician’s abilit...