The Magic of Thunk - Lazy
The dark side of thunk
As discussed previously, thunk is used to encapsulate computations for later uses. Although we may not evaluate a thunk yet, we know its value is determined. However, it has a weakness: when we invoke a thunk repeatedly, the same computation inside it gets carried out again and again, despite of the fact that the results returned are always identical. For example,
let fibonacci n = let rec fib a b i = if i = n then a else fib b (a+b) (i+1) in fib 0 1 0;; let millionth_fib = fun() -> fibonacci 1000000;; let show_the_fib fib = function | true -> Some (fib()) | flase -> None;; let print_the_fib fib = function | true -> print_int (fib()) | flase -> ();;
print_the_fib both use thunks, just in case that the arguments are always false (so the heavy computation would never be necessarily carried out). It is good. But what if
show_the_fib millionth_fib true and
print_the_fib millionth_fib true? Then actually
millionth_fib will be evaluated twice. What if we give
billionth_fib? What if
show_the_fib millionth_fib true and/or
print_the_fib millionth_fib true get called lots of times? Then the heavy computations will be done again and again. You may say in those worst cases, we just let
millionth_fib() be done once anyway and bind it to a variable, i.e., don't use thunk. Yes, we can do that, however, it is not perfect as it is still a waste of CPU when arguments are always false. Thunk puts one gate there to make sure that the computation would not be touched if unnecessary, however, once it becomes necessary, the times of executions cannot be controlled.
lazy is there to make all those bad cases go away. It is essentially a thunk (delay the computation, first gate) with the feature of making sure that the computation would be executed at most once (control the times, second gate).
How to invent lazy?
lazy is actually a built-in keyword of OCaml for you to create a
lazy_t. There is a module
Lazy there for you to fully utilise lazy. However, in this section, we will pretend lazy never exists and try to invent it. In this way, the understanding of lazy would hopefully become deeper. Let's get started.
Assume we have
let millionth_fib = fun() -> fibonacci 1000000;; as before. Now we want to have
millionth_fib_lazy. How can we add the second gate to stop the repeated computations, i.e., as long as it is computed once, how to return the existing result directly? Let's rephrase it more clearly:
- millionth_fib_lazy needs to be thunk when never evaluated.
- millionth_fib_lazy needs to become an int immediately after first time evaluation.
- Future attempts of accessing it will get the int back directly.
The key part is from step 1 to step 2: the transition between two different types. So we have two fundamental problems to solve now:
millionth_fib_lazy of course must belong to a type and we can call it
lazy_node_t. Also ideally
lazy_node_tmust include two other types: thunk and int (int is particularly for this case and we can have
'afor general purposes). We do not name the type as
lazy_tyet because of the other problem below.
One type (thunk) must be able to transit to the other (int), but not the other way around
lazy_node_t - Multiple types in one type
Variants can do this job. It combines multiple types into one and still stands as one single type to OCaml's type system. If you learn or already know variant type, then it is straightforward to write the type lazy_node_t:
type 'a lazy_node_t = | Delayed of (unit -> 'a) | Value of 'a;;
So a lazy can initially be
Delayed of a thunk. Then after first time evaluation, it becomes a
'a. In the case of
millionth_fib_lazy, it is initially
Delayed (fun() -> fibonacci 1000000), then later on at some point, it becomes
Value 1884755131. Are we done? No! Although millionthfiblazy now can be two things, we still need to handle the transition.
lazy_t - ref
Remember one fact that in OCaml world, being immutable is the default. So even if we somehow manage to evaluation
Delayed inside millionthfiblazy and return the result as
Value, it will be given as a new binding and will not change millionthfiblazy. This is not what we want, instead, we want to change the binding/variable millionthfiblazy itself. So wherever it is used, it has been changed already. Thus we have to use ref and here comes our final lazy_t:
type 'a lazy_node_t = | Delayed of (unit -> 'a) | Value of 'a | Exn of exn;; (* we add exception type here just in case the thunk might produce exception *) type 'a lazy_t = 'a lazy_node_t ref;; let lazy_from_fun f = ref (Delayed f);; (* create a lazy_t from a thunk *)
Now we are ready to make the transitions more concrete.
let force x_lazy = match !x_lazy with | Value x -> x | Exn e -> raise e | Delayed f -> try let r = f() in x_lazy := Value r; r with e -> x_lazy := Exn e; raise e;;
force is used to evaluate lazy_t. The logic is simple:
- If the lazy is Value already, just return the result
- If the lazy is Exn, then raise it
- Otherwise, we change the lazy to either Value of Exn accordingly first, then return or raise
Now the two fundamental problems mentioned before are solved. Let's try it:
let millionth_fib_lazy = lazy_from_fun (fun() -> print_endline "calculate millionth fib";fibonacci 1000000);; let show_the_fib fib = function | true -> Some (force fib) | flase -> None;; let print_the_fib fib = function | true -> print_int (force fib) | flase -> ();; show_the_fib millionth_fib_lazy true;; print_the_fib millionth_fib_lazy true;; show_the_fib millionth_fib_lazy true;; show_the_fib millionth_fib_lazy true;; calculate millionth fib int option = Some 1884755131 1884755131 int option = Some 1884755131 int option = Some 1884755131
We can see
calculate millionth fib is printed only once. It seems working! So are we done? No, still no!
OCaml is thread safe as long as you keep everything in the functional way obviously. However, whenever you have to use imperative way, thread safety must be kept in mind.
Like in above lazy, it uses
ref the typical imperative type and it is thread dangerous. Let's say we have a lazy inside which the thunk takes 1 hour to finish. Two threads are using it. When the first thread tries to force it, it sees
Delayed f, then carry on the computation slowly. After some time fraction, when the second thread is activated at some point and tries to force it, it also sees
Delayed f, right? So eventually, two evaluations of the same thunk of the same lazy has been done. The purpose of lazy collapses, doesn't it?
Quiz 1: what is the simplest way to provide thread safe?
A trivial answer
Mutex provides essential thread safe in OCaml. Let's quickly try it.
First of all, each lazy needs a mutex dedicated for itself and with this, all access to the lazy are guarded.
type 'a lazy_t = 'a lazy_node_t ref * Mutex.t;; let lazy_from_fun f = ref (Delayed f), Mutex.create();; (* create a lazy_t from a thunk with a brand new mutex *)
Then when we force it, we need to lock / unlock.
let force x_lazy = let x_lazy_part, m = x_lazy in Mutex.lock m; match !x_lazy_part with | Value x -> Mutex.unlock m; x | Exn e -> Mutex.unlock m; raise e | Delayed f -> try let r = f() in x_lazy_part := Value r; Mutex.unlock m; r with e -> x_lazy_part := Exn e; Mutex.unlock m; raise e;; (* A bit ugly as Mutex.unlock m is everywhere.*)
That's it. Thread is now safe with lazy.
Let's ask again:
Are we done?
Yes, we are done. Now beer time!
To be continued