binary search tree, data structure, ninja, recursion

Become a BST Ninja - Genin Level

— Published 19 December 2014 —

Binary Search Tree (BST) is one of the most classic data structures. The definition for its structure is shown as below:

It consists of Nodes and Leaves.
A Node has a child bst on the left side, a key (, a data), and a child bst on the right side. Note here a node's left or right child is not a node, instead, is indeed another binary search tree.
A Leaf has nothing but act only as a STOP sign.

type 'a bst_t =  
  | Leaf
  | Node of 'a bst_t * 'a * 'a bst_t (* Node (left, key, right) *)
(* a node may also carry a data associated with the key. 
   It is omitted here for neater demonstration *)

The important feature that makes BST unique is

for any node
- all keys from its left child are smaller than its own key
- all keys from its right child are bigger (assumming keys are distinct)

And here is an example of BST:

Instead of continuing to present the basics of BST, this post will now focus on how to attack BST related problems with the most powerful weapon: Recursion.

Recursion on BST

From Recursion Reloaded, we know that one way to model recursion is:

Assume we already got a problem solver: solve.
Don't think about what would happen in each iteration.
Split the problem to smallers sizes (N = N1 + N2 + ...).
Solve those smaller problems like solve N1, solve N2, ... Here is the tricky part: still, we do not know or care what are inside solve and how f would do the job, we just believe that it will return the correct results.
Now we have those results for smaller problems, how to wire them up to return the result for N? This is the ultimate question we need to answer.
Of course, we do not forget the STOP sign for some edge cases.
Together with point 5 and 6, we get our solve.

A good thing coming from BST is that the split step has been done already, i.e., a BST problem can be always divided into left child, root, and right child.

Thus if we assume we already got solve, we just need to solve left and / or solve right, then do something with root, and finally wire all outcomes to obtain the final result. Again, we should of course never forget the STOP sign and in BST, usually it is the Leaf, i.e., we need to ask ourselves what if the BST is a Leaf.

The thinking flow is illustrated as the diagram below.

Before we start to look at some problems, note that in the diagram above or Recursion Reloaded, we seem to always solve both left and right, or say, all sub-problmes. It is actually not necessary. For BST, sometimes either left or right is enough. Let's have a look at this case first.

Either Left or Right

Our starter for this section is the simplest yet very essential operation: insert a key to a BST.

Insert

From the definition of BST, we know the basic rule is that if the new key is smaller than a root, then it belongs to the root's left child; otherwise, to the root's right child. Let's follow the modelling in the previous diagram to achieve this.

We assume we already got insert key bst
We know the problem can be split into left, root, and right.
Because a new key can go either left or right, so we need to deal_with root first, i.e., compare the new key and the key of the root.
Directly taken from the rule of BST, if the new key is smaller, then we need to solve left thus insert key left; otherwise insert key right.
What if we get to a Leaf? It means we can finally place our new key there as a new Node and of course, at this moment both children of the new Node are Leaves.

Note that the BST type in OCaml we defined early on is pure functional, which means every time we need to update something, we have to create new. That's why in the diagram, even if we just insert x to left or right, we need to construct a new Node because we are updating the left child or the right one. The code is shown as below.

let rec insert x = function  
  | Leaf -> Node (Leaf, x, Leaf) (* Leaf is a STOP *)
  | Node (left, k, right) ->
    if x < k then Node (insert x left, k, right) (* if x is smaller, go left *)
    else Node (left, k, insert x right) (* otherwise, go right *)

Member

member is to check whether a given key exists in the BST or not. It is very similar to insert. There are three differences:

When dealing with root, we need to see whether the given key equals to the root's key or not. If yes, then we hit it.
member is a readonly function, so we directly solve left or right without constructing new Node.
If arriving at a Leaf, then we have nowhere to go any more and it means the given key is not in the BST.

let rec mem x = function  
  | Leaf -> false
  | Node (left, k, right) ->
    if x < k then mem x left
    else if x = k then true
    else mem x right

Both Left and Right

Usually when we need to retrieve some properties of a BST or possibly go through all nodes to get an answer, we have to solve both children. However, the modelling technique does not change.

Height

Recall from Some properties of a tree, the height of a tree is the number of edges that the longest path (from the root to a leaf) has. From this definition, it seems easy to get the height. We simply try to find all possible paths from root and for each path we record its number of edges. The answer will be the max of them. It sounds straightforward, but if you really try to write the code in this way, I bet the code would be a bit messy. Honestly, I never wrote in this way and I will never do that.

Another way is to think recursively. First let analyse a little bit about the longest path matter.

As we can see from the above diagram, Root has two edges: one to Left and the other to Right. So whatever the longest path from Root might be, it must pass either Left or Right. If we somehow could obtain the longest path from the root of Left and the longest path from the root of Right, the longest path from Root should be the bigger one of the two paths, right?

Let's now assume we already got height and it will return the height of a BST. Then we can obtain h_left and h_right. The answer is what is the h (height of Root)? Note that the height implies the longest path already (that's the definition). So from the paragraph above, What we need to do is getting max h_left h_right. Since Root has an edge to either child, h = 1 + max h_left h_right.

Don't forget the STOP sign: the height of a Leaf is 0.

let rec height = function  
  | Leaf -> 0
  | Node (left, _, right) -> 1 + max (height left) (height right)

Simple, isn't it?

Keys at a certain depth

So far, it seems our hypothetic solve function takes only the sub-probem as parameter. In many cases this is not enough. Sometimes we need to supply more arguments to help solve. For example, in the problem of retriving all keys at a certain depth definitely needs current depth information.

Only with the help of current_depth, the Root can know whether it belongs to the final results.

If current_depth = target_depth, then Root should be collected. Also we do not continue to solve Left or Right as we know their depths will never equal to target_deapth.
Otherwise, we need to solve both Left and Right with argument of 1 + current_depth.
Assume our solve is working. Then solve left (1+current_depth) would return a list of target nodes and so does solve right (1+current_depth). We simply then concatenate two target lists.
STOP sign: Leaf is not even a Node, so the result will be empty list.

The code is like this:

let rec from_depth d cur_d = function  
  | Leaf -> []
  | Node (left, k, right) ->
    if cur_d = d then [k]
    else from_depth d (cur_d + 1) left @ from_depth d (cur_d + 1) right

let all_keys_at depth bst = from_depth depth 0 bst

Genin

From Ninja Wiki

A system of rank existed. A jōnin ("upper man") was the highest rank, representing the group and hiring out mercenaries. This is followed by the chūnin ("middle man"), assistants to the jōnin. At the bottom was the genin ("lower man"), field agents drawn from the lower class and assigned to carry out actual missions.

Ps.

Some readers contacted me. They hope that maybe I can use more advanced knowledge or harder examples in my posts and the current ones might seem a little boring. I think I need to explain a bit here.

The general idea behind Many things about OCaml is not to write a cookbook for certain problems related to OCaml or be a place where quick solution is there and copy / paste sample code would do the trick. Instead, Many things means some important aspects in OCaml that might be overlooked, or some particular problems that can show the greatness of OCaml, or the elegant OCaml solutions for some typical data structures and algorithms, etc. As long as something are valuable and that value shows only in OCaml or Functional Programming, I would like to add them all in here one by one.

Fortunately or unfortunately, even though I have only limited experiences in OCaml, I found that the many is actually quite big. And due to this many, I had to make a plan to present them all in a progressive way. Topics can interleave with each other in terms of time order as we do not want to have the same food for a month. More importantly, however, all should go from simple / easy to advanced / hard. In order to present some advanced topic, we need to make sure we have a solid foundation. This is why, for example, I even produced a post for the properties of a tree although they are so basic. This is also why I reloaded recursion since recursion is everywhere in OCaml. They are a kind of preparations.

Moreover, I believe in fundamentals. Fundamentals are normally concise and contain the true essence. But sometimes they can be easily overlooked or ignored and we may need to experience a certain number of difficult problems afterwards, then start to look back and appreciate the fundamentals.

The reason of using simple examples is that it makes my life easier for demonstrations. I love visualisations and one graph can be better than thousands of words. For complicated problems and solutions, it is a bit more difficult to draw a nice and clean diagram to show the true idea behind. I will try to do so later on, but even if I could achieve it in this early stage, some readers might easily get lost or confused because of the unavoidable complication of the graph. As a result, the point of grasping fundamentals might be missed.

Anyway, please don't worry too much. Attractive problems in OCaml are always there. For example, in my plan, I will later start to present a number (maybe 15 ~ 17) of my beloved Functional Pearls in OCaml and if you are really chasing for some awesomeness, I hope they would satisfy you.