While OCaml is a functional programming language and emphasises pure functional style, it allows mutable (variables and values) and hence imperative programming style. The reason is said in Real World OCaml:
Imperative code is of fundamental importance to any practical programming language, because real-world tasks require that you interact with the outside world, which is by its nature imperative. Imperative programming can also be important for performance. While pure code is quite efficient in OCaml, there are many algorithms that can only be implemented efficiently using imperative techniques.
How to write imperative code in OCaml has been well introduced in both OCaml's documentation and user's manual and Real World OCaml. The imperative logic flow is very similar as in other imperative languages. For example, we can write an OCaml binary search like this:
let binary_search a x = let len = Array.length a in let p = ref 0 and q = ref (len-1) in let loc = ref None in while !p <= !q && !loc = None && !p >= 0 && !q < len do let mi = (!p + !q) / 2 in if x = a.(mi) then loc := Some mi else if x > a.(mi) then p := mi + 1 else q := mi - 1 done; !loc
Mutable or imperative could be a snake. People, who are forced to use OCaml (for work or course assignments) while they are not yet convinced by the potential functional power, may be happy when they know stuff can be written like in Java. They may also intend to (secretly) write imperative code in OCaml as much as possible to just get things done.
It sounds bad and it indeed is. What is the point to code in a heavily imperative way with a functional programming language?
However, I won't be worried even if one does that, because I know it is just the initial avoidance and it won't last long. Why? Because writing imperative code is a bit troublesome anyway. Let's revisit the
binary_search we wrote before.
You can see that nothing is straightforward there.
- When we define a mutable variable, we have to use
- When we get the value out, we have to use
- When we assign a value, we can't forget
- We even need to add
()when we access arrays.
Even after we finish the function, it appears quite verbose and a little bit uncomfortable to read, right? This is why I am sure in longer term, one may give up imperative style, just truly learn the functional style and eventually understand OCaml's beautiful side.
So if we will not / should not write everything imperatively, when to leverage the benefits from mutable?
When performance requires it
In order to manipulate a sequence of elements, we noramlly will use
array in imperative languages; however, in pure functional language, we prefer
list as it is immutable.
The best thing
array offers us is the O(1) speed in accessing an element via its index. But we lose this advantage if using
list, i.e., we have to do a linear scan, which is
O(n), to get the
ith value. Sometimes it would be too slow. To demonstrate this, let's have a look at the problem of shuffle:
Given a sequence, randomize the order of all the elements inside.
A typical algorithm for shuffle can be:
lenis the length of the sequence.
- Randomly generate an integer
- Swap the element at
iand the one at
ifor next targeting element.
- Repeat from step 2.
If we use
list in OCaml, then we will have:
let rm_nth n l = let rec aux acc i = function |  -> List.rev acc | _::tl when i = n -> List.rev_append acc tl | x::tl -> aux (x::acc) (i+1) tl in aux  0 l (* a slight modification when using list. we do not swap, instead, we remove the element from the randomised index and put it to the new list. *) let shuffle l = Random.self_init(); let len = List.length l in let rec aux i acc l = if i < len then let r = Random.int (len - i) in aux (i+1) (List.nth l r ::acc) (rm_nth r l) else acc in aux 0  l
We have to scan the list twice in each iteration: once to get the element at the randomised index
r and once to remove it. Totally we have
n iterations, thus, the time complexity is O(n^2).
If we use
array, then it is much faster with time complexity of O(n) as locating an element and swapping two elements in array just cost O(1).
let swap a i j = let tmp = a.(i) in a.(i) <- a.(j); a.(j) <- tmp let shuffle_array a = Random.self_init(); let len = Array.length a in for i = 0 to len-1 do let r = i + Random.int (len - i) in swap a i r done
In addition to
array, some other imperative data structures can also improve performance. For example, the
Hashtbl is the traditional
hash table with
O(1) time complexity. However, the functional
Map module uses AVL tree and thus provides
O(logN). If one really cares about such a difference,
Hashtbl can be used with higher priority.
Note that we should not use potential performance boost as an excuse to use imperative code wildly in OCaml. In many cases, functional data structures and algorithms have similar performance, such as the 2-list based functional queue can offer us amortised
When we need to remember something
Constantly creating new values makes functional programming safer and logically clearer, as discussed previously. However, occasionally we don't want everything to be newly created; instead, we need a mutable object that stays put in the memory but can be updated. In this way, we can remember values in a single place.
Write a function that does
x + y * z. It outputs the correct result and in addition, prints how many times it has been called so far.
x + y * z part is trivial, but the later recording the times of calls is tricky. Let's try to implement such a function in pure functional style first.
(* pure functional solution *) let f x y z last_called_count = print_int (last_called_count+1); x + y * z, last_called_count + 1
The code above can roughly achieve the goal. However, it is not great.
- It needs an additional argument which is meant to be the most recent count of calls. The way could work but is very vulnerable because it accepts whatever integer and there is no way to constrain it to be the real last count.
- It has to return the new call count along with the real result
When a user uses this function, he / she will feel awkward.
last_called_count needs to be taken care of very carefully in the code flow to avoid miscounting. The result returned is a tuple so pattern matching is necessary to obtain the real value of
x + y * z. Also again, one need to somehow remember the new call count so he / she can supply to the next call.
This is where imperative programming comes to help:
(* with help of imperative programming. note that the function g can be anonymous.*) let f = let countable_helper () = let called_count = ref 0 in let g x y z = called_count := !called_count + 1; print_int !called_count; x + y * z in g in countable_helper()
countable_helper is a helper function. If it is called, it first creates a mutable
called_count and then pass it to
called_count is mutable, so
g can freely increase its value by
x + y * z is done after printing
g is returned to f as it is the result of
When we need later substitution
I found this interesting case from The chapter for imperative programming in Real World OCaml and it is about memoize. In this section, we borrow contents directly from the book but will emphasise the substitution part.
Python developers should be quite familiar with the
memoize decorator . Essentially, it makes functions remember the argument, result pairs so that if the same arguments are supplied again then the stored result is returned immediately without repeated computation.
We can write
memoize in OCaml too:
(* directly copied from Real World OCaml, does not use anonymous function *) let memoize f = let table = Hashtbl.Poly.create () in let g x = match Hashtbl.find table x with | Some y -> y | None -> let y = f x in Hashtbl.add_exn table ~key:x ~data:y; y in g
Hashtbl as the imperative storage box and the mechanism is similar as our previous example of call count.
The fascinating bit comes from the fact that this
memoize cannot handle recursive functions. If you don't believe it, let's try with the fibonacci function:
let rec fib n = if n <= 1 then 1 else fib(n-1) + fib(n-2) let fib_mem = memoize fib
Hmmm...it will compile and seems working. Yes,
fib_mem will return correct results, but we shall have a closer look to see whether it really remembers the argument, result pairs. So what is
fib_mem exactly? By replacing
memoize, we get:
(* not proper ocaml code, just for demonstration purpose *) fib_mem is actually match Hashtbl.find table x with | Some y -> y | None -> let y = fib x in (* here is fib *) Hashtbl.add_exn table ~key:x ~data:y; y
fib_mem, if a new argument comes, it will call
fib is actually not memoized at all. What does this mean? It means
fib_mem may eventually remember the new argument and its result, but will never remember all the arguments and their results along the recusive way.
For example, let's do
5is not in the Hashtbl, so
fib 5is called.
- According to the definition of
fib 5 = fib 4 + fib 3, so now
fib 4 = fib 3 + fib 2, no problem, but look,
fib 3will be done after
fib 4is done, then we go back to the right hand side of the
+in point 2, which means we need to do
- Will we really do
fib 3now? Yes, unfortunately. Even though it has been done during
fib 4, because
fibis not memoized.
To solve this problem, we need the substitution technique with the help of imperative programming.
An attractive but WRONG solution
When I read the book for the first time, before continuing for the solution of memoized fib, I tried something on my own.
So the problem is the
memoize is not memoized, right? How about we make that that
f mutable, then after we define
g, we give
g is memoized and
g is calling
g inside, then the whole thing would be memoized, right?
let mem_rec_bad f = let table = Hashtbl.Poly.create () in let sub_f = ref f in let g x = match Hashtbl.find table x with | Some y -> y | None -> let y = !sub_f x in Hashtbl.add_exn table ~key:x ~data:y; y in sub_f := g; g
It can pass compiler but will stackoverflow if you do
let fib_mem = mem_rec_bad fib;; fib_mem 5. After we define
g and replae the original value of
g, it seems fine, but What is
(* g is now actually like: *) let g x = match Hashtbl.find table x with | Some y -> y | None -> let y = g x in Hashtbl.add_exn table ~key:x ~data:y; y
g will check the Hashtbl forever! And we totally lose the functionality of the original
g becomes meaningless at all.
So we can't directly replace the
f inside. Is there any way to reserve the functionality of
f but somehow substitute some part with the memoization?
The elegant solution
Updated on 2015-01-25: Please go to Recursive Memoize & Untying the Recursive Knot for better explanation of recursive memoize. This following content here is not that good.
The answer is Yes, and we have to build a non-recusive function out of the recusive
(* directly copied from the book *) let fib_norec f n = if n <= 1 then 1 else f(n-2) + f(n-1) let fib_rec n = fib_norec fib_rec n
(I never thought before that a recursive function can be split like this, honestly. I don't know how to induct such a way and can't explain more. I guess we just learn it as it is and continue. More descriptions of it is in the book.) ()
Basically, for a recursive function
f_rec, we can
f_norecwith an additional parameter
- replace the
f_recin the original body with
let rec f_rec parameters = f_norec f_rec parameters
- In this way,
f_recwill bring up
f_norecwhenever being called, so actually the recusive logic is still controled by
Here the important part is
f_norec and its parameter
f gives us the chance for correct substitution.
let memo_rec f_norec = let fref = ref (fun _ -> assert false) in (* !fref below will have the new function defined next, which is the memoized one *) let h x = f_norec !fref x in let f_mem = memoize h in fref := f_mem; f_mem
fref's value will become the memoized one with
f_norec controls the logic and real work, we do not lose any functionality but can remember every argument, result pairs along the recursive way.
In this post, we just list out three quite typical cases where imperative programming can be helpful. There are many others, such as lazy, etc.
Moreover, one more suggestion on using imperative style is that don't expose imperative interface to public if you can. That means we should try to embed imperative code inside pure function as much as possible, so that the users of our functions cannot access the mutable parts. This way can let our functional code continue being pure enough while enjoying the juice of imperative programming internally.
. glacialthinker has commented here. This technique is called untying the recursive knot.
Mystique in the head image is another major character in X-Men's world. She is a shapeshifter who can mimic the appearance and voice of any person with exquisite precision. Similar like the mutable in OCaml that always sticks to the same type once being created, human is the only thing she can mutate to.