# Mutable

While OCaml is a functional programming language and emphasises pure functional style, it also allows *mutable* variables and values, and hence an imperative programming style. The reason is stated in Real World OCaml:

> Imperative code is of fundamental importance to any practical programming language, because real-world tasks require that you interact with the outside world, which is by its nature imperative. Imperative programming can also be important for performance. While pure code is quite efficient in OCaml, there are many algorithms that can only be implemented efficiently using imperative techniques.

How to write imperative code in OCaml is well introduced in both OCaml's documentation and user's manual and Real World OCaml. The imperative logic flow is very similar to that in other imperative languages. For example, we can write *binary search* in OCaml like this:

```ocaml
let binary_search a x =
  let len = Array.length a in
  let p = ref 0 and q = ref (len - 1) in
  let loc = ref None in
  while !p <= !q && !loc = None && !p >= 0 && !q < len do
    let mi = (!p + !q) / 2 in
    if x = a.(mi) then loc := Some mi
    else if x > a.(mi) then p := mi + 1
    else q := mi - 1
  done;
  !loc
```
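A quick check of the function above (the definition is repeated so the snippet stands alone):

```ocaml
let binary_search a x =
  let len = Array.length a in
  let p = ref 0 and q = ref (len - 1) in
  let loc = ref None in
  while !p <= !q && !loc = None && !p >= 0 && !q < len do
    let mi = (!p + !q) / 2 in
    if x = a.(mi) then loc := Some mi
    else if x > a.(mi) then p := mi + 1
    else q := mi - 1
  done;
  !loc

let () =
  let a = [| 1; 3; 5; 7; 9 |] in
  assert (binary_search a 5 = Some 2);  (* found at index 2 *)
  assert (binary_search a 4 = None)     (* absent values give None *)
```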

*Mutable*, or *imperative*, can be a tempting snake. People who are *forced* to use OCaml (for work or course assignments) *before they are convinced of its functional power* may be happy to learn that code can be written much as in Java. They may even intend to (secretly) write imperative code in OCaml as much as possible, just to get things done.

It sounds bad, and it indeed is. What is the point of coding in a heavily imperative way in a functional programming language?

However, I won't worry even if one does that, because I know it is just initial avoidance and it won't last long. Why? Because writing imperative code is a bit troublesome anyway. Let's revisit the `binary_search` we wrote before.

You can see that nothing is straightforward there.

- When we define a mutable variable, we have to use `ref`;
- When we get the value out, we have to use `!` everywhere;
- When we assign a value, we can't forget the `:` before `=`;
- We even need to add `.` before `()` when we access arrays.
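All of these pieces of syntax appear even in a trivial snippet:

```ocaml
let counter = ref 0                 (* a mutable variable needs ref *)

let () =
  counter := !counter + 1;          (* := to assign, ! to read the value *)
  let a = [| 10; 20; 30 |] in
  a.(1) <- !counter;                (* .( ) to index an array *)
  assert (!counter = 1);
  assert (a.(1) = 1)
```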

Even after we finish the function, it appears quite verbose and a little uncomfortable to read, right? This is why I am sure that, in the longer term, one will give up the imperative style, truly learn the functional style, and eventually understand OCaml's beautiful side.

So if we will not / should not write everything imperatively, when should we leverage the benefits of *mutable*?

# When performance requires it

To manipulate a sequence of elements, we normally use `array` in imperative languages; in a pure functional language, however, we prefer `list` as it is immutable.

The best thing `array` offers us is *O(1)* access to an element via its index. We lose this advantage if we use `list`: we have to do a linear scan, which is *O(n)*, to get the `i`th value. Sometimes that is too slow. To demonstrate this, let's have a look at the problem of *shuffle*:

> Given a sequence, randomize the order of all the elements inside.

A typical algorithm for *shuffle* can be:

1. `i` initially is `0` and `len` is the length of the sequence.
2. Randomly generate an integer `r` within `[i, len)`.
3. Swap the element at `i` and the one at `r`.
4. Increase `i` for the next target element.
5. Repeat from step 2.

If we use `list` in OCaml, then we will have:

```ocaml
let rm_nth n l =
  let rec aux acc i = function
    | [] -> List.rev acc
    | _ :: tl when i = n -> List.rev_append acc tl
    | x :: tl -> aux (x :: acc) (i + 1) tl
  in
  aux [] 0 l

(* a slight modification when using list:
   we do not swap; instead, we remove the element at the randomised index
   and put it at the head of the new list. *)
let shuffle l =
  Random.self_init ();
  let len = List.length l in
  let rec aux i acc l =
    if i < len then
      let r = Random.int (len - i) in
      aux (i + 1) (List.nth l r :: acc) (rm_nth r l)
    else acc
  in
  aux 0 [] l
```

We have to scan the list twice in each iteration: once to get the element at the randomised index `r` and once to remove it. In total we have `n` iterations; thus the time complexity is *O(n^2)*.

If we use `array`, it is much faster, with a time complexity of *O(n)*, as locating an element and swapping two elements in an array each cost only *O(1)*.

```ocaml
let swap a i j =
  let tmp = a.(i) in
  a.(i) <- a.(j);
  a.(j) <- tmp

let shuffle_array a =
  Random.self_init ();
  let len = Array.length a in
  for i = 0 to len - 1 do
    let r = i + Random.int (len - i) in
    swap a i r
  done
```

In addition to `array`, some other imperative data structures can also improve performance. For example, `Hashtbl` is the traditional *hash table* with *O(1)* time complexity, while the functional `Map` module uses an *AVL* tree and thus offers *O(log n)*. If one really cares about such a difference, `Hashtbl` can be given higher priority.

Note that we *should not use a potential performance boost as an excuse* to use imperative code wildly in OCaml. In many cases, functional data structures and algorithms have comparable performance; for example, the *2-list* based functional queue offers amortised *O(1)* operations.
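As a sketch of that last point, here is a minimal two-list queue (my own illustration): `push` conses onto the back list in O(1), and `pop` reverses the back list into the front only when the front runs out, which is amortised O(1) per operation.

```ocaml
(* front holds elements ready to pop; back holds pushed elements in
   reverse order. *)
type 'a queue = { front : 'a list; back : 'a list }

let empty = { front = []; back = [] }

let push x q = { q with back = x :: q.back }          (* O(1) *)

let rec pop q =
  match q.front with
  | x :: rest -> Some (x, { q with front = rest })
  | [] ->
    match q.back with
    | [] -> None                                      (* queue is empty *)
    | _ -> pop { front = List.rev q.back; back = [] } (* occasional O(n) reversal *)

let () =
  let q = push 3 (push 2 (push 1 empty)) in
  match pop q with
  | Some (x, _) -> assert (x = 1)   (* first in, first out *)
  | None -> assert false
```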

# When we need to remember something

Constantly creating new values makes functional programming safer and logically clearer, as discussed previously. However, occasionally we don't want everything to be newly created; instead, we need a mutable *object* that stays put in memory but can be updated. In this way, we can remember values in a single place.

> Write a function that does `x + y * z`. It outputs the correct result and, in addition, prints how many times it has been called so far.

The `x + y * z` part is trivial, but recording the number of calls is tricky. Let's try to implement such a function in pure functional style first.

```ocaml
(* pure functional solution *)
let f x y z last_called_count =
  print_int (last_called_count + 1);
  x + y * z, last_called_count + 1
```

The code above roughly achieves the goal. However, it is not great.

- It needs an additional argument that is meant to be the most recent count of calls. This could work, but it is very fragile, because the function accepts any integer and there is no way to constrain it to be the real last count.
- It has to return the new call count along with the real result.

Users of this function will feel awkward. `last_called_count` needs to be handled very carefully in the code flow to avoid miscounting. The result is a tuple, so pattern matching is necessary to obtain the real value of `x + y * z`. And again, one needs to somehow remember the new call count in order to supply it to the next call.

This is where imperative programming comes to help:

```ocaml
(* with the help of imperative programming.
   note that the function g can be anonymous. *)
let f =
  let countable_helper () =
    let called_count = ref 0 in
    let g x y z =
      called_count := !called_count + 1;
      print_int !called_count;
      x + y * z
    in
    g
  in
  countable_helper ()
```

`countable_helper` is a helper function. When called, it first creates a mutable `called_count` and then lets `g` capture it. Because `called_count` is mutable, `g` can freely increase its value by `1`. Eventually `x + y * z` is computed after printing `called_count`. `g` is returned as the result of `countable_helper ()`, i.e., `f` is `g`.
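To see the counter at work, here is the same function again with a couple of calls appended (the definition is repeated so the snippet stands alone; a `print_newline` is added for readable output):

```ocaml
let f =
  let countable_helper () =
    let called_count = ref 0 in
    let g x y z =
      called_count := !called_count + 1;
      print_int !called_count;
      print_newline ();
      x + y * z
    in
    g
  in
  countable_helper ()

let () =
  assert (f 1 2 3 = 7);  (* prints 1 *)
  assert (f 2 2 3 = 8)   (* prints 2: the count survives between calls *)
```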

# When we need later substitution

I found this interesting case in the chapter on imperative programming in Real World OCaml; it is about *memoize*. In this section, we borrow content directly from the book but will emphasise *the substitution* part.

Python developers should be quite familiar with the `memoize` decorator. Essentially, it makes a function remember its *(argument, result)* pairs so that if the same argument is supplied again, the stored result is returned immediately without repeating the computation.

We can write `memoize` in OCaml too:

```ocaml
(* directly copied from Real World OCaml, does not use anonymous function *)
let memoize f =
  let table = Hashtbl.Poly.create () in
  let g x =
    match Hashtbl.find table x with
    | Some y -> y
    | None ->
      let y = f x in
      Hashtbl.add_exn table ~key:x ~data:y;
      y
  in
  g
```

It uses `Hashtbl` as the imperative storage box, and the mechanism is similar to our previous *call count* example.

The fascinating bit comes from the fact that this `memoize` cannot handle recursive functions. If you don't believe it, let's try it on the *fibonacci* function:

```ocaml
let rec fib n =
  if n <= 1 then 1 else fib (n - 1) + fib (n - 2)

let fib_mem = memoize fib
```

Hmmm... it compiles and seems to work. Yes, `fib_mem` will return correct results, but we should have a closer look to see whether it really remembers the *(argument, result)* pairs. So what is `fib_mem` exactly? By replacing `f` with `fib` inside `memoize`, we get:

```
(* not proper OCaml code, just for demonstration purposes *)
fib_mem is actually
  match Hashtbl.find table x with
  | Some y -> y
  | None ->
    let y = fib x in (* here is fib *)
    Hashtbl.add_exn table ~key:x ~data:y;
    y
```

So inside `fib_mem`, when a new argument comes in, it calls `fib`, and `fib` itself is not memoized at all. What does this mean? It means `fib_mem` will eventually remember the new argument and its result, but will never remember the arguments and results produced along the recursive way.

For example, let's do `fib_mem 5`.

1. `5` is not in the Hashtbl, so `fib 5` is called.
2. According to the definition of `fib`, `fib 5 = fib 4 + fib 3`, so now `fib 4`.
3. `fib 4 = fib 3 + fib 2`; no problem, but look, `fib 3` will be done after `fib 4` finishes.
4. Assuming `fib 4` is done, we go back to the right-hand side of the `+` in step 2, which means we need to do `fib 3`.
5. Will we really do `fib 3` now? Yes, unfortunately, even though it has already been done during `fib 4`, because `fib` is not memoized.
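We can make the repeated work visible by counting how many times the unmemoized `fib` body runs (a small instrumentation sketch of my own, not from the book):

```ocaml
let calls = ref 0

let rec fib n =
  incr calls;                       (* count every invocation *)
  if n <= 1 then 1 else fib (n - 1) + fib (n - 2)

let () =
  calls := 0;
  let _ = fib 5 in
  (* fib is entered 15 times for n = 5; a fully memoized version would
     need only 6 distinct evaluations (n = 0..5). *)
  assert (!calls = 15)
```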

To solve this problem, we need *the substitution* technique with the help of imperative programming.

## An attractive but WRONG solution

When I read the book for the first time, before continuing to the solution for memoized fib, I tried something on my own.

So the problem is that the `f` inside `memoize` is not memoized, right? How about we make that `f` mutable, and after we define `g`, assign `g` to `f`? Since `g` is memoized, and `g` would then be calling `g` inside, the whole thing would be memoized, right?

```ocaml
let mem_rec_bad f =
  let table = Hashtbl.Poly.create () in
  let sub_f = ref f in
  let g x =
    match Hashtbl.find table x with
    | Some y -> y
    | None ->
      let y = !sub_f x in
      Hashtbl.add_exn table ~key:x ~data:y;
      y
  in
  sub_f := g;
  g
```

It passes the compiler but will stack overflow if you do `let fib_mem = mem_rec_bad fib;; fib_mem 5`. After we define `g` and replace the original value of `sub_f` with `g`, it seems fine, but what is `g` now?

```ocaml
(* g is now actually like: *)
let g x =
  match Hashtbl.find table x with
  | Some y -> y
  | None ->
    let y = g x in
    Hashtbl.add_exn table ~key:x ~data:y;
    y
```

`g` will check the Hashtbl forever! We totally lose the functionality of the original `f`, and `g` becomes meaningless.

So we can't directly replace the `f` inside. Is there any way to preserve the functionality of `f` but somehow substitute some part of it with memoization?

## The elegant solution

**Updated on 2015-01-25:** Please go to Recursive Memoize & Untying the Recursive Knot for a better explanation of recursive memoize. The following content is not that good.

The answer is yes, and we have to build a non-recursive function out of the recursive `fib`.

```ocaml
(* directly copied from the book *)
let fib_norec f n =
  if n <= 1 then 1
  else f (n - 2) + f (n - 1)

let rec fib_rec n = fib_norec fib_rec n
```

(Honestly, I never thought before that a recursive function could be split like this. I don't know how to derive such a transformation in general and can't explain more; I guess we just learn it as it is and continue. More description of it is in the book.) (*[1]*)

Basically, for a recursive function `f_rec`, we can:

1. Change `f_rec` to `f_norec` with an additional parameter `f`.
2. Replace the `f_rec` in the original body with `f`.
3. Then `let rec f_rec parameters = f_norec f_rec parameters`.

In this way, `f_rec` will bring up `f_norec` whenever it is called, so the recursive logic is actually still controlled by `f_norec`.
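The same recipe works for any self-recursive function; for example, factorial (my own illustration, following the three steps above):

```ocaml
(* steps 1 & 2: factorial with the recursive call abstracted into f *)
let fact_norec f n =
  if n <= 1 then 1 else n * f (n - 1)

(* step 3: tie the knot back together *)
let rec fact_rec n = fact_norec fact_rec n

let () = assert (fact_rec 5 = 120)
```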

The important part here is `f_norec`: its parameter `f` gives us the chance for correct substitution.

```ocaml
let memo_rec f_norec =
  let fref = ref (fun _ -> assert false) in
  (* !fref below will be replaced by the function defined next,
     which is the memoized one *)
  let h x = f_norec !fref x in
  let f_mem = memoize h in
  fref := f_mem;
  f_mem
```

Now `fref`'s value becomes the memoized version built from `f_norec`. Since `f_norec` controls the logic and does the real work, we do not lose any functionality, yet we remember every *(argument, result)* pair along the recursive way.
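Putting the whole pipeline together, here is a self-contained sketch using the standard library's `Hashtbl` instead of Core's `Hashtbl.Poly` (so it runs without extra dependencies):

```ocaml
(* memoize with the stdlib Hashtbl: same idea as the book's version *)
let memoize f =
  let table = Hashtbl.create 16 in
  fun x ->
    match Hashtbl.find_opt table x with
    | Some y -> y
    | None ->
      let y = f x in
      Hashtbl.add table x y;
      y

let memo_rec f_norec =
  let fref = ref (fun _ -> assert false) in
  let f_mem = memoize (fun x -> f_norec !fref x) in
  fref := f_mem;          (* substitute the placeholder with the memoized one *)
  f_mem

let fib_norec f n =
  if n <= 1 then 1 else f (n - 2) + f (n - 1)

let fib_mem = memo_rec fib_norec

let () =
  (* fast even for larger n, since every sub-result is cached *)
  assert (fib_mem 40 = 165580141)
```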

# Summary

In this post, we have listed three quite typical cases where imperative programming can be helpful. There are many others, such as *lazy* evaluation, etc.

Moreover, one more suggestion on using the imperative style: *don't expose imperative interfaces to the public if you can avoid it*. That is, we should try to embed imperative code inside pure functions as much as possible, so that the users of our functions cannot access the mutable parts. This lets our functional code stay pure while enjoying the juice of imperative programming internally.
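As a tiny illustration of this suggestion (my own example): the function below mutates a local `ref` internally, yet callers see only a pure value-in, value-out interface.

```ocaml
(* pure from the outside: same input always gives the same output,
   and no mutable state leaks to the caller *)
let sum_of_squares l =
  let acc = ref 0 in
  List.iter (fun x -> acc := !acc + x * x) l;
  !acc

let () = assert (sum_of_squares [1; 2; 3] = 14)
```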

**[1].** glacialthinker has commented here. This technique is called *untying the recursive knot*.

Mystique in the head image is another major character in the X-Men universe. *She is a shapeshifter who can mimic the appearance and voice of any person with exquisite precision.* Like a *mutable* in OCaml, which always sticks to the same type once created, a human is the only kind of *thing* she can mutate into.