The Magic of Thunk - Lazy

— Published 13 November 2014 —

lazy

The dark side of thunk

As discussed previously, thunk is used to encapsulate computations for later uses. Although we may not evaluate a thunk yet, we know its value is determined. However, it has a weakness: when we invoke a thunk repeatedly, the same computation inside it gets carried out again and again, despite of the fact that the results returned are always identical. For example,

let fibonacci n =  
  let rec fib a b i =
    if i = n then a
    else fib b (a+b) (i+1)
  in 
  fib 0 1 0;;

let millionth_fib = fun() -> fibonacci 1000000;;

let show_the_fib fib = function  
  | true -> Some (fib())
  | flase -> None;;

let print_the_fib fib = function  
  | true -> print_int (fib())
  | flase -> ();;

show_the_fib and print_the_fib both use thunks, just in case that the arguments are always false (so the heavy computation would never be necessarily carried out). It is good. But what if show_the_fib millionth_fib true and print_the_fib millionth_fib true? Then actually millionth_fib will be evaluated twice. What if we give billionth_fib? What if show_the_fib millionth_fib true and/or print_the_fib millionth_fib true get called lots of times? Then the heavy computations will be done again and again. You may say in those worst cases, we just let millionth_fib() be done once anyway and bind it to a variable, i.e., don't use thunk. Yes, we can do that, however, it is not perfect as it is still a waste of CPU when arguments are always false. Thunk puts one gate there to make sure that the computation would not be touched if unnecessary, however, once it becomes necessary, the times of executions cannot be controlled.

lazy is there to make all those bad cases go away. It is essentially a thunk (delay the computation, first gate) with the feature of making sure that the computation would be executed at most once (control the times, second gate).

How to invent lazy?

lazy is actually a built-in keyword of OCaml for you to create a lazy_t. There is a module Lazy there for you to fully utilise lazy. However, in this section, we will pretend lazy never exists and try to invent it. In this way, the understanding of lazy would hopefully become deeper. Let's get started.

Assume we have let millionth_fib = fun() -> fibonacci 1000000;; as before. Now we want to have millionth_fib_lazy. How can we add the second gate to stop the repeated computations, i.e., as long as it is computed once, how to return the existing result directly? Let's rephrase it more clearly:

millionth_fib_lazy needs to be thunk when never evaluated.
millionth_fib_lazy needs to become an int immediately after first time evaluation.
Future attempts of accessing it will get the int back directly.

The key part is from step 1 to step 2: the transition between two different types. So we have two fundamental problems to solve now:

millionth_fib_lazy of course must belong to a type and we can call it lazy_node_t. Also ideally lazy_node_t must include two other types: thunk and int (int is particularly for this case and we can have 'a for general purposes). We do not name the type as lazy_t yet because of the other problem below.
One type (thunk) must be able to transit to the other (int), but not the other way around

lazy_node_t - Multiple types in one type

Variants can do this job. It combines multiple types into one and still stands as one single type to OCaml's type system. If you learn or already know variant type, then it is straightforward to write the type lazy_node_t:

type 'a lazy_node_t =  
  | Delayed of (unit -> 'a)
  | Value of 'a;;

So a lazy can initially be Delayed of a thunk. Then after first time evaluation, it becomes a Value of 'a. In the case of millionth_fib_lazy, it is initially Delayed (fun() -> fibonacci 1000000), then later on at some point, it becomes Value 1884755131. Are we done? No! Although millionthfiblazy now can be two things, we still need to handle the transition.

lazy_t - ref

Remember one fact that in OCaml world, being immutable is the default. So even if we somehow manage to evaluation Delayed inside millionthfiblazy and return the result as Value, it will be given as a new binding and will not change millionthfiblazy. This is not what we want, instead, we want to change the binding/variable millionthfiblazy itself. So wherever it is used, it has been changed already. Thus we have to use ref and here comes our final lazy_t:

type 'a lazy_node_t =  
  | Delayed of (unit -> 'a)
  | Value of 'a
  | Exn of exn;; (* we add exception type here just in case the thunk might produce exception *)

type 'a lazy_t = 'a lazy_node_t ref;;

let lazy_from_fun f = ref (Delayed f);; (* create a lazy_t from a thunk *)

Now we are ready to make the transitions more concrete.

let force x_lazy =  
  match !x_lazy with
    | Value x -> x
    | Exn e -> raise e
    | Delayed f -> 
      try
        let r = f() in
        x_lazy := Value r;
        r
      with e -> 
        x_lazy := Exn e;
        raise e;;

force is used to evaluate lazy_t. The logic is simple:

If the lazy is Value already, just return the result
If the lazy is Exn, then raise it
Otherwise, we change the lazy to either Value of Exn accordingly first, then return or raise

Now the two fundamental problems mentioned before are solved. Let's try it:

let millionth_fib_lazy = lazy_from_fun (fun() -> print_endline "calculate millionth fib";fibonacci 1000000);;

let show_the_fib fib = function  
  | true -> Some (force fib)
  | flase -> None;;

let print_the_fib fib = function  
  | true -> print_int (force fib)
  | flase -> ();;

show_the_fib millionth_fib_lazy true;;  
print_the_fib millionth_fib_lazy true;;  
show_the_fib millionth_fib_lazy true;;  
show_the_fib millionth_fib_lazy true;;

calculate millionth fib  
int option = Some 1884755131  
1884755131  
int option = Some 1884755131  
int option = Some 1884755131

We can see calculate millionth fib is printed only once. It seems working! So are we done? No, still no!

Thread safe

OCaml is thread safe as long as you keep everything in the functional way obviously. However, whenever you have to use imperative way, thread safety must be kept in mind.

Like in above lazy, it uses ref the typical imperative type and it is thread dangerous. Let's say we have a lazy inside which the thunk takes 1 hour to finish. Two threads are using it. When the first thread tries to force it, it sees Delayed f, then carry on the computation slowly. After some time fraction, when the second thread is activated at some point and tries to force it, it also sees Delayed f, right? So eventually, two evaluations of the same thunk of the same lazy has been done. The purpose of lazy collapses, doesn't it?

Quiz 1: what is the simplest way to provide thread safe?

A trivial answer

A lock

Mutex provides essential thread safe in OCaml. Let's quickly try it.

First of all, each lazy needs a mutex dedicated for itself and with this, all access to the lazy are guarded.

type 'a lazy_t = 'a lazy_node_t ref * Mutex.t;;

let lazy_from_fun f = ref (Delayed f), Mutex.create();; (* create a lazy_t from a thunk with a brand new mutex *)

Then when we force it, we need to lock / unlock.

let force x_lazy =  
  let x_lazy_part, m = x_lazy in
  Mutex.lock m;
  match !x_lazy_part with
    | Value x -> Mutex.unlock m; x
    | Exn e -> Mutex.unlock m; raise e
    | Delayed f -> 
      try
        let r = f() in
        x_lazy_part := Value r;
        Mutex.unlock m;
        r
      with e -> 
        x_lazy_part := Exn e;
        Mutex.unlock m;
        raise e;;
(* A bit ugly as Mutex.unlock m is everywhere.*)

That's it. Thread is now safe with lazy.

Let's ask again:

Are we done?

Yes, we are done. Now beer time!

To be continued