mutable, shuffle, imperative programming, binary search, memoize

Mutable

— Published 20 January 2015 —

While OCaml is a functional programming language and emphasises pure functional style, it allows mutable (variables and values) and hence imperative programming style. The reason is said in Real World OCaml:

Imperative code is of fundamental importance to any practical programming language, because real-world tasks require that you interact with the outside world, which is by its nature imperative. Imperative programming can also be important for performance. While pure code is quite efficient in OCaml, there are many algorithms that can only be implemented efficiently using imperative techniques.

How to write imperative code in OCaml has been well introduced in both OCaml's documentation and user's manual and Real World OCaml. The imperative logic flow is very similar as in other imperative languages. For example, we can write an OCaml binary search like this:

let binary_search a x =  
  let len = Array.length a in
  let p = ref 0 and q = ref (len-1) in
  let loc = ref None in
  while !p <= !q && !loc = None && !p >= 0  && !q < len do
    let mi = (!p + !q) / 2 in
    if x = a.(mi) then loc := Some mi
    else if x > a.(mi) then p := mi + 1
    else q := mi - 1
  done;
  !loc

Mutable or imperative could be a snake. People, who are forced to use OCaml (for work or course assignments) while they are not yet convinced by the potential functional power, may be happy when they know stuff can be written like in Java. They may also intend to (secretly) write imperative code in OCaml as much as possible to just get things done.

It sounds bad and it indeed is. What is the point to code in a heavily imperative way with a functional programming language?

However, I won't be worried even if one does that, because I know it is just the initial avoidance and it won't last long. Why? Because writing imperative code is a bit troublesome anyway. Let's revisit the binary_search we wrote before.

You can see that nothing is straightforward there.

When we define a mutable variable, we have to use ref;
When we get the value out, we have to use ! everywhere;
When we assign a value, we can't forget : before =;
We even need to add . before () when we access arrays.

Even after we finish the function, it appears quite verbose and a little bit uncomfortable to read, right? This is why I am sure in longer term, one may give up imperative style, just truly learn the functional style and eventually understand OCaml's beautiful side.

So if we will not / should not write everything imperatively, when to leverage the benefits from mutable?

When performance requires it

In order to manipulate a sequence of elements, we noramlly will use array in imperative languages; however, in pure functional language, we prefer list as it is immutable.

The best thing array offers us is the O(1) speed in accessing an element via its index. But we lose this advantage if using list, i.e., we have to do a linear scan, which is O(n), to get the ith value. Sometimes it would be too slow. To demonstrate this, let's have a look at the problem of shuffle:

Given a sequence, randomize the order of all the elements inside.

A typical algorithm for shuffle can be:

i initially is 0 and len is the length of the sequence.
Randomly generate an integer r within [i, len).
Swap the element at i and the one at r.
Increase i for next targeting element.
Repeat from step 2.

If we use list in OCaml, then we will have:

let rm_nth n l =  
  let rec aux acc i = function
    | [] -> List.rev acc
    | _::tl when i = n -> List.rev_append acc tl
    | x::tl -> aux (x::acc) (i+1) tl
  in 
  aux [] 0 l

(* a slight modification when using list.
   we do not swap, instead, we remove the element from the randomised index 
   and put it to the new list.
*)
let shuffle l =  
  Random.self_init();
  let len = List.length l in
  let rec aux i acc l =
    if i < len then
      let r = Random.int (len - i) in
      aux (i+1) (List.nth l r ::acc) (rm_nth r l)
    else acc
  in
  aux 0 [] l

We have to scan the list twice in each iteration: once to get the element at the randomised index r and once to remove it. Totally we have n iterations, thus, the time complexity is O(n^2).

If we use array, then it is much faster with time complexity of O(n) as locating an element and swapping two elements in array just cost O(1).

let swap a i j =  
  let tmp = a.(i) in
  a.(i) <- a.(j);
  a.(j) <- tmp

let shuffle_array a =  
  Random.self_init();
  let len = Array.length a in
  for i = 0 to len-1 do
    let r = i + Random.int (len - i) in
    swap a i r
  done

In addition to array, some other imperative data structures can also improve performance. For example, the Hashtbl is the traditional hash table with O(1) time complexity. However, the functional Map module uses AVL tree and thus provides O(logN). If one really cares about such a difference, Hashtbl can be used with higher priority.

Note that we should not use potential performance boost as an excuse to use imperative code wildly in OCaml. In many cases, functional data structures and algorithms have similar performance, such as the 2-list based functional queue can offer us amortised O(1).

When we need to remember something

Constantly creating new values makes functional programming safer and logically clearer, as discussed previously. However, occasionally we don't want everything to be newly created; instead, we need a mutable object that stays put in the memory but can be updated. In this way, we can remember values in a single place.

Write a function that does x + y * z. It outputs the correct result and in addition, prints how many times it has been called so far.

The x + y * z part is trivial, but the later recording the times of calls is tricky. Let's try to implement such a function in pure functional style first.

(* pure functional solution *)
let f x y z last_called_count =  
  print_int (last_called_count+1);
  x + y * z, last_called_count + 1

The code above can roughly achieve the goal. However, it is not great.

It needs an additional argument which is meant to be the most recent count of calls. The way could work but is very vulnerable because it accepts whatever integer and there is no way to constrain it to be the real last count.
It has to return the new call count along with the real result

When a user uses this function, he / she will feel awkward. last_called_count needs to be taken care of very carefully in the code flow to avoid miscounting. The result returned is a tuple so pattern matching is necessary to obtain the real value of x + y * z. Also again, one need to somehow remember the new call count so he / she can supply to the next call.

This is where imperative programming comes to help:

(* with help of imperative programming.
   note that the function g can be anonymous.*)
let f =  
  let countable_helper () =
    let called_count = ref 0 in
    let g x y z = 
      called_count := !called_count + 1;
      print_int !called_count;
      x + y * z
    in
    g
  in 
  countable_helper()

countable_helper is a helper function. If it is called, it first creates a mutable called_count and then pass it to g. Because called_count is mutable, so g can freely increase its value by 1. Eventually x + y * z is done after printing called_count. g is returned to f as it is the result of countable_helper(), i.e., f is g.

When we need later substitution

I found this interesting case from The chapter for imperative programming in Real World OCaml and it is about memoize. In this section, we borrow contents directly from the book but will emphasise the substitution part.

Python developers should be quite familiar with the memoize decorator . Essentially, it makes functions remember the argument, result pairs so that if the same arguments are supplied again then the stored result is returned immediately without repeated computation.

We can write memoize in OCaml too:

(* directly copied from Real World OCaml, does not use anonymous function *)
let memoize f =  
  let table = Hashtbl.Poly.create () in
  let g x = 
    match Hashtbl.find table x with
    | Some y -> y
    | None ->
      let y = f x in
      Hashtbl.add_exn table ~key:x ~data:y;
      y
  in
  g

It uses Hashtbl as the imperative storage box and the mechanism is similar as our previous example of call count.

The fascinating bit comes from the fact that this memoize cannot handle recursive functions. If you don't believe it, let's try with the fibonacci function:

let rec fib n =  
  if n <= 1 then 1 else fib(n-1) + fib(n-2)

let fib_mem = memoize fib

Hmmm...it will compile and seems working. Yes, fib_mem will return correct results, but we shall have a closer look to see whether it really remembers the argument, result pairs. So what is fib_mem exactly? By replacing f with fib inside memoize, we get:

(* not proper ocaml code, just for demonstration purpose *)

fib_mem is actually  
  match Hashtbl.find table x with
  | Some y -> y
  | None ->
    let y = fib x in (* here is fib *)
    Hashtbl.add_exn table ~key:x ~data:y;
    y

So inside fib_mem, if a new argument comes, it will call fib and fib is actually not memoized at all. What does this mean? It means fib_mem may eventually remember the new argument and its result, but will never remember all the arguments and their results along the recusive way.

For example, let's do fib_mem 5.

5 is not in the Hashtbl, so fib 5 is called.
According to the definition of fib, fib 5 = fib 4 + fib 3, so now fib 4.
fib 4 = fib 3 + fib 2, no problem, but look, fib 3 will be done after fib 4 finishes.
Assuming fib 4 is done, then we go back to the right hand side of the + in point 2, which means we need to do fib 3.
Will we really do fib 3 now? Yes, unfortunately. Even though it has been done during fib 4, because fib is not memoized.

To solve this problem, we need the substitution technique with the help of imperative programming.

An attractive but WRONG solution

When I read the book for the first time, before continuing for the solution of memoized fib, I tried something on my own.

So the problem is the f inside memoize is not memoized, right? How about we make that that f mutable, then after we define g, we give g to f? Since g is memoized and g is calling g inside, then the whole thing would be memoized, right?

let mem_rec_bad f =  
  let table = Hashtbl.Poly.create () in
  let sub_f = ref f in
  let g x = 
    match Hashtbl.find table x with
    | Some y -> y
    | None ->
      let y = !sub_f x in
      Hashtbl.add_exn table ~key:x ~data:y;
      y
  in
  sub_f := g;
  g

It can pass compiler but will stackoverflow if you do let fib_mem = mem_rec_bad fib;; fib_mem 5. After we define g and replae the original value of sub_f with g, it seems fine, but What is g now?

(* g is now actually like: *)
let g x =  
  match Hashtbl.find table x with
  | Some y -> y
  | None ->
    let y = g x in
    Hashtbl.add_exn table ~key:x ~data:y;
    y

g will check the Hashtbl forever! And we totally lose the functionality of the original f and g becomes meaningless at all.

So we can't directly replace the f inside. Is there any way to reserve the functionality of f but somehow substitute some part with the memoization?

The elegant solution

Updated on 2015-01-25: Please go to Recursive Memoize & Untying the Recursive Knot for better explanation of recursive memoize. This following content here is not that good.

The answer is Yes, and we have to build a non-recusive function out of the recusive fib.

(* directly copied from the book *)
let fib_norec f n =  
  if n <= 1 then 1
  else f(n-2) + f(n-1)

let fib_rec n = fib_norec fib_rec n

(I never thought before that a recursive function can be split like this, honestly. I don't know how to induct such a way and can't explain more. I guess we just learn it as it is and continue. More descriptions of it is in the book.) ([1])

Basically, for a recursive function f_rec, we can

change f_rec to f_norec with an additional parameter f
replace the f_rec in the original body with f
then let rec f_rec parameters = f_norec f_rec parameters
In this way, f_rec will bring up f_norec whenever being called, so actually the recusive logic is still controled by f_norec.

Here the important part is f_norec and its parameter f gives us the chance for correct substitution.

let memo_rec f_norec =  
  let fref = ref (fun _ -> assert false) in
  (* !fref below will have the new function defined next, which is
     the memoized one *)
  let h x = f_norec !fref x in 
  let f_mem = memoize h in
  fref := f_mem;
  f_mem

Now, fref's value will become the memoized one with f_norec. Since f_norec controls the logic and real work, we do not lose any functionality but can remember every argument, result pairs along the recursive way.

Summary

In this post, we just list out three quite typical cases where imperative programming can be helpful. There are many others, such as lazy, etc.

Moreover, one more suggestion on using imperative style is that don't expose imperative interface to public if you can. That means we should try to embed imperative code inside pure function as much as possible, so that the users of our functions cannot access the mutable parts. This way can let our functional code continue being pure enough while enjoying the juice of imperative programming internally.

[1]. glacialthinker has commented here. This technique is called untying the recursive knot.

Mystique in the head image is another major character in X-Men's world. She is a shapeshifter who can mimic the appearance and voice of any person with exquisite precision. Similar like the mutable in OCaml that always sticks to the same type once being created, human is the only thing she can mutate to.