art with code

2009-01-07

Test generation / measuring code

Started writing the backend for test generation: measurer.ml. Here's an example:

# module M = Measurer (ListGen (IntGen));;
module M :
sig
  type t = ListGen(IntGen).t
  val measure : (t -> 'a) -> (string * t * 'a exn_result) list
  val benchmark : (t -> 'a) -> (int * int * bm_stat exn_result) list
end

Here I created a new measurer for an int list generator (i.e. a list generator parametrized with an int generator). The measurer module has two functions, measure and benchmark. The measure function takes a function and calls it with each value produced by the generator, returning a list of (test case label, input value, result) triples, where the result is either the value returned or the exception raised.
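To make that concrete, here is a minimal sketch of how a generator and measure could fit together. The GEN signature, its name and cases fields, and the cut-down IntGen are hypothetical, invented for illustration; only Measurer, measure, IntGen and exn_result appear in the post, and the real measurer.ml may be organized quite differently.

```ocaml
(* Outcome of one call: the value returned, or the exception raised. *)
type 'a exn_result = Result of 'a | Error of exn

(* Hypothetical generator signature: a type name plus labelled test inputs. *)
module type GEN = sig
  type t
  val name : string
  val cases : (string * t) list
end

(* Hypothetical, cut-down int generator (the post's also has even, odd, ...). *)
module IntGen : GEN with type t = int = struct
  type t = int
  let name = "int"
  let cases =
    [ ("zero", 0); ("one", 1); ("minus_one", -1);
      ("min_val", min_int); ("max_val", max_int) ]
end

module Measurer (G : GEN) = struct
  type t = G.t

  (* Call f on every labelled input, catching any exception it raises. *)
  let measure (f : t -> 'a) : (string * t * 'a exn_result) list =
    List.map
      (fun (label, v) ->
        let r = try Result (f v) with e -> Error e in
        (G.name ^ " " ^ label, v, r))
      G.cases
end
```

With this sketch, measuring (fun x -> 10 / x) reports Error Division_by_zero for the zero case and Result 10 for the one case, instead of aborting the whole run on the first exception.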

# M.measure (List.map (fun x -> x * 2));;
- : (string * M.t * int list exn_result) list =
[("(int list) normal", [0; -5; 1; -1; 2; 3; 5], Result [0; -10; 2; -2; 4; 6; 10]);
("(int list) negative", [5; 3; 2; -1; 1; -5; 0], Result [10; 6; 4; -2; 2; -10; 0]);
("(int list) zero", [], Result []);
("(int list) one", [1], Result [2]);
("(int list) minus_one", [-1], Result [-2]);
("(int list) even", [1; 2], Result [2; 4]);
("(int list) odd", [-1; 0; 1], Result [-2; 0; 2]);
("(int list) min_val", [-4611686018427387904], Result [0]);
("(int list) max_val", [4611686018427387903], Result [-2])]
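The min_val and max_val rows show that doubling wraps around instead of raising: OCaml's native int arithmetic is modular. A quick check, assuming a 64-bit build where max_int is the 4611686018427387903 shown above:

```ocaml
(* OCaml ints are w-bit two's complement, w = Sys.int_size (63 on 64-bit
   platforms); + - * wrap around modulo 2^w without raising. *)
let () =
  Printf.printf "int_size = %d\n" Sys.int_size;
  (* max_int = 2^(w-1) - 1, so doubling gives 2^w - 2, which wraps to -2 *)
  Printf.printf "max_int * 2 = %d\n" (max_int * 2);
  (* min_int = -2^(w-1), so doubling gives -2^w, which wraps to 0 *)
  Printf.printf "min_int * 2 = %d\n" (min_int * 2)
```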

The benchmark function times how long the function takes on inputs of different sizes and measures how much it stresses the GC, in terms of bytes allocated and minor and major collections. The generator defines the input sizes. In each output tuple, the first number is the parameter passed to the generator's get_value and the second is the resulting relative input size.
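One benchmark step can be sketched with only the standard library: read the clock and the GC counters, run the function, and read them again. Using Sys.time (processor time), Gc.quick_stat and Gc.allocated_bytes is my assumption about how such numbers could be collected; bench, size_of and this get_value are hypothetical stand-ins, the size schedule (0, then 2^(n-1)) is only inferred from the output below, and the exn_result wrapping is left out for brevity.

```ocaml
type bm_stat = {
  time : float;             (* seconds of processor time, via Sys.time *)
  minor_collections : int;
  major_collections : int;
  allocated_bytes : float;  (* includes a few bytes of measurement overhead *)
}

(* Run f once on an already-built input and report that call's cost. *)
let bench (f : 'a -> 'b) (x : 'a) : bm_stat =
  let s0 = Gc.quick_stat () in
  let a0 = Gc.allocated_bytes () in
  let t0 = Sys.time () in
  ignore (Sys.opaque_identity (f x));
  let t1 = Sys.time () in
  let a1 = Gc.allocated_bytes () in
  let s1 = Gc.quick_stat () in
  { time = t1 -. t0;
    minor_collections = s1.Gc.minor_collections - s0.Gc.minor_collections;
    major_collections = s1.Gc.major_collections - s0.Gc.major_collections;
    allocated_bytes = a1 -. a0 }

(* Hypothetical size schedule inferred from the output: 0, 1, 2, 4, 8, ... *)
let size_of n = if n = 0 then 0 else 1 lsl (n - 1)

(* Hypothetical stand-in for the generator's get_value: an int list. *)
let get_value n = List.init (size_of n) (fun i -> i)

(* Build each input first, so only the call itself is measured. *)
let benchmark f =
  List.init 16 (fun n ->
      let input = get_value n in
      (n, size_of n, bench f input))
```

Building the input outside the timed region matters: otherwise the cost of constructing a 16384-element list would be attributed to the function under test.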

# M.benchmark (List.map (fun x -> x));;
- : (int * int * bm_stat exn_result) list =
[(0, 0, Result {
time = 9.5367431640625e-07;
minor_collections = 0; major_collections = 0;
allocated_bytes = 0.});
(1, 1, Result {
time = 2.1457672119140625e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 24.});
(2, 2, Result {
time = 1.9073486328125e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 48.});
(3, 4, Result {
time = 1.9073486328125e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 96.});
(4, 8, Result {
time = 3.0994415283203125e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 192.});
...
(14, 8192, Result {
time = 0.00134301185607910156;
minor_collections = 0; major_collections = 0;
allocated_bytes = 196608.});
(15, 16384, Result {
time = 0.00557589530944824219; (* oh, non-linear scaling (caused by a minor_collection) *)
minor_collections = 1; major_collections = 0;
allocated_bytes = 393216.})]

You can test multi-argument functions by currying:

# module IM = Measurer (IntGen);;
# module SM = Measurer (StringGen);;
# let arg_1_results = SM.measure String.get;;
val arg_1_results : (string * SM.t * (int -> char) exn_result) list =
[("string normal", "Foobar", Result <fun>);
("string negative", "barFoo", Result <fun>);
("string zero", "", Result <fun>);
("string one", "a", Result <fun>);
("string minus_one", "A", Result <fun>);
("string even", "aB", Result <fun>);
("string odd", "Abc", Result <fun>);
("string min_val", "-25.0", Result <fun>);
("string max_val", ..., Result <fun>)]

As you can see, the results are curried functions, each wrapped in Result. (exn_result catches exceptions: a normal return value is wrapped in Result and a raised exception in Error, as in the output below.)

# let arg_2_results = List.map (fun (n,v,r) -> (n,v, ex_map IM.measure r)) arg_1_results;;
val arg_2_results :
(string * SM.t * (string * IM.t * char exn_result) list exn_result) list =
[("string normal", "Foobar",
Result
[("int normal", 5, Result 'r');
("int negative", -5, Error (Invalid_argument "index out of bounds"));
("int zero", 0, Result 'F');
("int one", 1, Result 'o');
("int minus_one", -1, Error (Invalid_argument "index out of bounds"));
("int even", 2, Result 'o');
("int odd", 3, Result 'b');
("int min_val", -4611686018427387904, Error (Invalid_argument "index out of bounds"));
("int max_val", 4611686018427387903, Error (Invalid_argument "index out of bounds"))]);
("string negative", "barFoo",
...
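The post doesn't show exn_result or ex_map, so here is a plausible minimal version, assuming Result wraps a normal return, Error wraps the raised exception, and ex_map simply passes an earlier Error through. The catch helper and the hand-picked indices are hypothetical. Partially applying String.get to a string cannot fail, so the first stage is always Result <fun>; the out-of-bounds errors only appear in the second stage.

```ocaml
type 'a exn_result = Result of 'a | Error of exn

(* Hypothetical helper: run f x, capturing an exception instead of raising. *)
let catch f x = try Result (f x) with e -> Error e

(* Map over a successful result; pass an earlier Error through unchanged. *)
let ex_map (f : 'a -> 'b) (r : 'a exn_result) : 'b exn_result =
  match r with
  | Result v -> Result (f v)
  | Error e -> Error e

(* Two-stage measurement of String.get: string first, then index. *)
let () =
  let stage1 = catch String.get "Foobar" in
  let stage2 = ex_map (fun g -> List.map (catch g) [5; 0; -1]) stage1 in
  match stage2 with
  | Result rs ->
      List.iter
        (function
          | Result c -> Printf.printf "Result '%c'\n" c
          | Error e -> Printf.printf "Error %s\n" (Printexc.to_string e))
        rs
  | Error e -> print_endline (Printexc.to_string e)
```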

It still needs a front-end, function generators, a pretty-printer and some whacking with a cluebat, but it might just let me write my tests with less effort and fewer errors when it's done. I hope.

A potential problem is the explosion in output size for multi-argument functions: each generator outputs 9 values, so a function of 4 arguments gets 9^4 = 6561 measurements, which might take a while to read.
