art with code

2009-01-29

Two things to make Ubuntu 8.10 less nerve-wrecking

Workaround for the "Hey guys! Let's use HAL for everything! It'll fuck up your keyboard repeat settings though, but that's no big deal, right! I mean, a keyboard! Ha! That's some old fogey stuff!"

Disable the PC speaker completely. As Gnome, in its infinite transcendent wisdom, has come to the conclusion that the PC speaker is the purveyor of all that is good and wise in the land. And hence there's no Gnome way to turn off that cursed beeper anymore.

2009-01-27

Performance debugging with R

The Boyer-Moore string search in prelude.ml had a performance bug.

The search token preprocessing was taking quadratic time.

After fixing that, I special-cased n=1..4 with a per-char search for n=1 and PCRE for n=2..4.

Now it is nice and fast.

2009-01-24

Programming books that I've liked

Programming languages

Programming Language Pragmatics
An extensive survey of how different programming languages work, i.e. how compilers and interpreters work, how different language features are implemented, what sorts of different programming languages are there, and so on. I read the book over a couple of months, and it was actually quite interesting.


The C Programming Language a.k.a. K&R
I read this over the weekend a couple weeks back and it was a nice read. Very practice-oriented, going through the language and standard library by building up example programs (even if some examples are a bit "Oh and by the way, you should never do this.") And at 200 pages or so (+100 pages of appendices), it's not going to take forever to read either.


Real World Haskell
Everyone's favourite lazily evaluated purely functional language and how to use it to get things done. You should also check out the online version of the book (and the paragraph comments in particular, such a nice idea.) While it does start off very, very slowly and tends to use an annoying mix of code snippets interleaved with function introductions in ghci, it's still quite readable.


Haskell 98 Language and Libraries: The Revised Report
The language lawyer book for Haskell. Very dense, very technical, very code-filled, very lovely. Starts off with a grammar specifying the lexical structure of valid Haskell programs, and contains the source code for Haskell's standard library, so it might require some fortitude. Also available online. This alone makes it worth checking out.


For OCaml, do see the OCaml manual and Developing Applications With Objective Caml. For Coq, I've been reading the tutorial and reading some sources.


Programming Ruby: The Pragmatic Programmers' Guide
The Ruby book. Has a very K&R vibe to it with focused examples and a tour of the standard library. I think this was the book that got me re-started on programming ~five years ago. Before that I was drawing for a living and programming for fun. Now I'm drawing for fun, programming for fun and starving for a living. Ha ha, fate, that fickle mistress.


Algorithms and data structures

Introduction to Algorithms
This was the book on our data structures course. We went through hash tables, binary trees, red-black trees and graph search algorithms. Borrowed the book from the library and read it for the exam. Can't remember a thing apart from "Hey, this book is pretty nice."


Purely Functional Data Structures
The Cormen book above contains imperative algorithms. Using them in a functional language is going to be painful. Hence Okasaki ventured forth and wrote his PhD about data structures that are better compatible with functional programming, and this book is the result. The language of the examples is SML with a laziness extension, and there's an appendix with Haskell versions of the sources. There's also a PDF of the thesis available online.


Graphics programming


Computer Graphics: Principles and Practice in C
While the drawing API parts of the book are aged (there's a whole chapter dedicated to PHIGS for crying out loud), it has a very useful chapter on parametric curves and goes through the basic shading equations and transformation matrices. It also has algorithms for drawing lines and circles and whatnot. Perhaps most useful as a 2D graphics book.


Real-Time Rendering
A thorough treatment of the theory behind real-time (read: game) rendering engines. I leafed this through at the university library, so my memories are vague. The website of the book has a very helpful list of recommended books.



GPU Gems 1, GPU Gems 2 and GPU Gems 3. Chock full of cool stuff from rendering to physics to GPGPU. And available online on the book websites (1, 2, 3.)


Physically Based Rendering - From Theory To Implementation
Focused on ray tracing, with chapters consisting of theory followed by C++. Should try to find this somewhere and Build My Very First Ray Tracer.


Fiction

Not really books about programming, but books with programming. I enjoyed them nevertheless.


The Wiz Biz
Wiz Zumwalt gets whisked away to a world where anyone can cast spells (though a good several years of training is recommended lest the wannabe mage fizzle themselves in the process) and where magic can be cast by written words. Small leap from there to writing a Forth running on the very fabric of reality. Also available online.


The Dance of Gods
A series of four books, wacky fantasy with a cast of trickster characters, multiple-viewpoint narrative and chain reaction plots. And it's CC licensed to boot.

And now that I've given you enough rope to hang your productivity for a week or two, I have some unit tests to write. 'til next time!

2009-01-21

Revised quickcheck.ml syntax

Changed the quickcheck.ml syntax a bit to make it easier to pretty-print lists, arrays and tuples. Now you can use e.g. Q.int instead of Q.uig as a parameter to the list generator (though you need parens now.) Yesterday's example looks like this with the new syntax:

let reverse l =
let rec aux res l = match l with
| [] -> res
| (h::t) -> aux (h::res) t in
aux [] l
(**Q
(Q.list_of_size (fun () -> Random.int 2) Q.int) (fun l -> reverse l = l)
(Q.list Q.int) (fun l -> reverse (reverse l) = l)

(* Thanks to notfancy on reddit for the following law. *)
(* Demonstrating tuple generation *)
(Q.pair (Q.list Q.int) (Q.list Q.int)) (fun (l,m) -> reverse (l @ m) = reverse m @ reverse l)
**)

Saving that to reverse.ml and running the test extractor on it prints out:

open OUnit
module Q = Quickcheck
open Reverse

let _iteri f l = ignore (List.fold_left (fun i v -> f v i; i + 1) 0 l)
let _TL = _iteri (fun (n,b) i ->
OUnit.assert_bool ("Line " ^ string_of_int (i+1) ^ " of bool test: " ^ n) b)

let test_reverse_line_6 () =
Quickcheck.laws_exn
"(Q.list_of_size (fun () -> Random.int 2) Q.int) (fun l -> reverse l = l)"
(Q.list_of_size (fun () -> Random.int 2) Q.int) (fun l -> reverse l = l);
Quickcheck.laws_exn
"(Q.list Q.int) (fun l -> reverse (reverse l) = l)"
(Q.list Q.int) (fun l -> reverse (reverse l) = l);

(* Thanks to notfancy on reddit for the following law. *)
(* Demonstrating tuple generation *)
Quickcheck.laws_exn
"(Q.pair (Q.list Q.int) (Q.list Q.int)) (fun (l,m) -> reverse (l @ m) = reverse m @ reverse l)"
(Q.pair (Q.list Q.int) (Q.list Q.int)) (fun (l,m) -> reverse (l @ m) = reverse m @ reverse l);()


let suite = "Reverse unit tests" >:::
[
"test_reverse_line_6" >:: test_reverse_line_6
]

let () = Tests.register suite

2009-01-20

Low-boilerplate testing in OCaml

Continuing my search for a low-boilerplate way to write reliable code, I integrated quickcheck.ml (QuickCheck clone adapted from Core, which is adapted from this) to Prelude.ml's test suite extractor. The test suite extractor parses the source code for tests embedded in special comments and creates an oUnit test suite from them. This way I can have the tests near the code (where I think they should be), and only compile and run them when I want to (i.e. I do `omake -P test` and it extracts and reruns the tests whenever it detects changes in the source files.)

A further bonus of generating the tests is that I don't have to come up with names for them. The extractor automatically names the tests at extraction time (as test_filename_line_nnn.) This makes it a lot nicer to write tests, as you don't need to shift your focus from "what should the test do" to "what should the name be for what the test should do." The existence of the file name and line number in the auto-generated name also assures me that I _will_ find the failing test and _fast_.

Previously I had embedded unit test bodies (which I haven't used much) and lists of boolean tests (which most of my tests are.) Now I added embedded QuickCheck laws, and can perhaps eliminate a dozen manually written tests with a single property check.

let reverse l =
let rec aux res l = match l with
| [] -> res
| (h::t) -> aux (h::res) t in
aux [] l
(***
(* embedded test body *)
assert_equal (reverse []) [];
assert_equal (reverse [1]) [1];
assert_equal (reverse (1--10)) (10--1)
**)
(**T
(* embedded boolean test list *)
reverse [] = []
reverse [1] = [1]
reverse (1--10) = (10--1)
**)
(**Q
(* embedded QuickCheck law list *)
Q.list ~size_gen:(fun _ -> Random.int 2) Q.uig (fun l -> reverse l = l)
Q.list Q.uig (fun l -> reverse (reverse l) = l)
Q.list Q.uig (fun l -> reverse l = list (areverse (array l)))
Q.list Q.cg (fun l -> reverse l = explode (sreverse (implode l)))
**)

As you may notice, the QuickCheck laws are generally longer to write than the tests themselves. But you shouldn't need as many laws as tests.

I haven't yet used the embedded QuickCheck tests for anything, so I can't give any real-world statistics. Maybe by the end of the week? Though the code that I'm currently testing is IO, and that tends to require whole embedded test bodies (as you need setup for temp file creation and deletion and state change checks.) Hm, maybe there's a nice way to abstract all that fiddling into an unfold of some kind...

Future directions

The FsCheck port of QuickCheck contains a shrinking feature for finding a reduction of the failed test input. Which would be nice.

My goal for the testing system is minimizing the amount of extraneous stuff I have to write to get the information I want from the code. To explain:

Tests give me information about the code. Specifically: does this function behave as expected when called with this argument.

I can only write a limited amount of lines of code per day (physically I can sustain ~300 loc / day, 1000 will get me RSI by the end of the day.) The more boilerplate I have to write, the less real code I can write.

A code:tests ratio of 1:4 means that I can write 60 lines of code per day. The rest of the budget is taken by the 240 lines of tests.

There are several ways to get at the testing information. Finding a way that expends less effort on my part will boost my productivity (and my hands will be thankful as well.)

Furthermore, I make mistakes when writing code. I also make mistakes when writing tests, missing tests that might discover mistakes in the code. The testing system should seek to find the coding mistakes and hence also minimize the amount of testing mistakes (QuickCheck and such automated test generators help there.)

The more tests the computer writes, the more and better quality code I can write.

2009-01-19

QuickCheck in make_suite.rb

Adapted quickcheck.ml from Jane Street Capital's Core to use for Prelude.ml testing. Will write a longer post tomorrow (need to be wide awake in 7 hours), but now I can do this:

let shell_escape =
let re = Pcre.regexp "(?=[^a-zA-Z0-9._+/-])" in
Pcre.replace ~rex:re ~templ:"\\"
(**Q
Q.uint (fun i -> shell_escape (string_of_int i) = (string_of_int i))
Q.string (fun s -> slen (shell_escape s) = slen s)
**)

And get error messages like:

law Q.string (fun s -> slen (shell_escape s) = slen s) failed for
"\018\235\220\202U\222\146\200\002\249\232\170\015\238\147\238\004
\209\003\241\182\245\150\t\016P\190"

I also tweaked the bool test error messages, now they include the assertion body:

FAIL: Unit tests:0:Prelude unit tests:671:test_prelude_line_8631

OUnit: Line 5 of bool test: shell_escape "foo is fan/cy+some!" = "foo\\ is\\ fan/cy+some!"


Oh and hey, there's an F# port of QuickCheck too.

2009-01-17

PreString tests done

Arrrr! Ahead of schedule!

Olliej told me that HTML5 now has text operations for the canvas, and henceforth I may go and make cake.js use the aforementioned. He also said that the WebKit nightly does a 640x480 ImageData fill and draw at 46fps. Which is very nice. Earlier this week, made cake.js skip malformed SVG paths instead of raising an error. And fixed a gradient bug just now.

2009-01-15

PreArray testing stats

As I finished the first pass at PreArray unit tests yesterday, I thought to take a quick look at the testing statistics for the PreArray module. Before the test writing pass, PreArray contained 299 non-blank lines (of which 252 source, the rest tests and section headers.) With the tests, the line count rose to 1357 (of which 290 source), giving around 1020 new test lines, for a code:tests ratio of 1:3.7.

I quickly counted the number of bugfixes with gitk, and found 16 bugs fixed. So, one bug found (and fixed) for every 64 lines of tests. And one bug per every 18 lines of code for 55 bugs/ksloc for PreArray.

Compared to the previous numbers of bug per 50 test lines and 24 bugs/ksloc, PreArray has more tests per line of code, less bugs found per test, and a more than doubled bug rate per line of code (24 bugs/ksloc vs 55 bugs/ksloc.) What are the main differences between PreArray and the previously tested part? The main thing increasing bug rate might be that PreArray does [unsafe] indexed access to arrays, making sub/slice indexing harder than with lists, as the indexes need to be clamped to the array's dimensions.

Lists were perhaps less buggy because they have nicer provability (the x::xs -pattern matching, and a naturally recursive coding style.) Whereas arrays tend towards random access, which is harder to reason about.

And quite a few of the list functions and the preceding stuff was copied from prelude.hs, which would lower bug rate :)

Prelude.ml: first pass of array tests complete

The power of copy&paste coupled with search&replace brought about the completion of the first pass of prelude.ml array tests. What remains is porting those to PreString and Bytestring, plus the rest. PreArray has a 1:4 code:tests ratio, which sounds pretty ridiculous. With that ratio, the rest of prelude.ml is still 4900 lines of tests short, about half of them (== PreString & Bytestring) should be adaptable from PreArray.

The list-to-array adaptation trudged along at around 900 lines per day, so completing PreString and Bytestring should be doable by next week. Then the rest at a slower pace, for full tests by the end of the month? And tag 1.0 then.

The unit tests are turning writing a 2000-line library into writing a 10000-line library. That's a huge amount of boilerplate. There must be some better way to assert that a piece of code works correctly. And my tests aren't even complete, I'm mostly testing the easy high-impact things like "does indexing work right?"; "does this do what it should for this small array?" and not the subtler things ("if the sum of this array overflows int, the 'average' function doesn't work right.") Not to mention performance and memory use.

2009-01-09

Multi-threaded qsort in C

Edit: Added memory access pattern analysis.
Edit #2: And a note on _qsort_1 and _qsort_2 having horrible performance. Plus swap accesses in the graphs.
Edit #3: The slowness in sorting chars comes from [x, x, x, ..., x] arrays triggering worst case behaviour (reverse sorted triggers it as well.) And as the char alphabet is only 8 bits, you get [x,x,x,...,x] spans after a couple rounds of partitioning, triggering the worst case. E.g. with 1M array, you'll get 256 spans of 2^20 / 2^8 = 2^14 = 16k length, which means that it'll be doing a total of 2^8 * 2^14^2 = 2^8 * 2^28 = 2^36 = ~65 billion swaps and compares in total. Which'll take a while.

Reading through K&R, wrote a libc-style qsort based on the simple version in K&R. Then wrote special cases for hardware element sizes (8/4/2/1 bytes.) At that point it took 2.25s to run on a 10 million element array of random int64s, versus 2.75s for the glibc version. Adding in simple threading brought the time down to 1.4s on my dualcore.

The generic version of the K&R qsort took around 2.5s for the single-threaded case (again, vs. glibc's 2.75s.) The above numbers are with -O2 and -O3. With optimizations turned off, the glibc qsort takes 7.2 seconds[1], while the K&R qsort takes 6.8 seconds (of which 1.5 sec can be regained by manually doing the -O3 loop invariant optimizations and inlining the swaps.) Which is interesting. What does the glibc qsort do wrong?

To investigate, I instrumented the swap and comparison functions to generate graphs of the memory access patterns. Here's the graph for the glibc qsort and the graph for the K&R qsort (with trailing insertion sorts.) A dot to the left of the graph is an access to the start of the array, while a dot to the right is an access to the end of the array.

In the graph for the glibc qsort you can see its "collapsing walls"-strategy in the tornado-like structures. Another feature of interest is at the very end: it runs insertion sort on each qsorted 4-element span. The tornadoes might be what's causing the performance gap between the glibc qsort and the K&R qsort, as they do streaming access both from the start and the end.

Timings for a small memory access pattern benchmark:

$ time ./mem_bm forward

real 0m5.138s
user 0m5.024s
sys 0m0.064s

$ time ./mem_bm reverse

real 0m5.004s
user 0m4.868s
sys 0m0.064s

$ time ./mem_bm both

real 0m6.197s
user 0m5.932s
sys 0m0.080s


Showing the tornado access pattern to be 20% slower than streaming access. Do take that with a hefty dose of salt though.

Also, the _qsort_2 and _qsort_1 versions are very slow compared to the glibc qsort. _qsort_1 specifically does something very wrong.

[1] Note that you have to use the _quicksort from stdlib/qsort.c, using the stdlib.h qsort links in the compiled library and -O doesn't have much effect on that (it does give you a small performance improvement though, enough to fool you into thinking that it does something to the qsort.)

Here's the generic version of the K&R qsort:

#include <stdio.h>
#include <string.h>
#include <alloca.h>
#include <sys/types.h>
#include <stdlib.h>
#include <pthread.h>

typedef int (*comparer)(const void *, const void *);

#define THREAD_THRESHOLD 8192

void _qsort (size_t element_size, void *arr, int left, int right, comparer cmp);

struct qsort_args {
size_t element_size;
void *arr;
int left;
int right;
comparer cmp;
};

struct qsort_args _qsort_args
(size_t element_size, void *arr, int left, int right, comparer cmp)
{
struct qsort_args qa;
qa.element_size = element_size;
qa.arr = arr;
qa.left = left;
qa.right = right;
qa.cmp = cmp;
return qa;
}

void *_qsort_start (void *args)
{
struct qsort_args *a = (struct qsort_args *)args;
_qsort (a->element_size, a->arr, a->left, a->right, a->cmp);
return NULL;
}

inline size_t average2 (size_t a, size_t b)
{
return ((a ^ b) >> 1) + (a & b);
}

void swap (size_t sz, void *arr, size_t i, size_t j)
{
char *a = (char*)(arr + i*sz);
char *b = (char*)(arr + j*sz);
long tmp;
for (i=0; i<sz-sz%sizeof(long); i+=sizeof(long)) {
tmp = *(long*)&a[i];
*(long*)&a[i] = *(long*)&b[i];
*(long*)&b[i] = tmp;
}
for (; i<sz; i++) {
tmp = a[i];
a[i] = b[i];
b[i] = (char)tmp;
}
}

void _qsort (size_t element_size, void *arr, int left, int right, comparer cmp)
{
int i, last;
pthread_t thread;
struct qsort_args qa;

if (left >= right)
return;
swap(element_size, arr, left, average2(left, right));
last = left;
for (i = left + 1; i <= right; i++) {
if (cmp(arr + i*element_size, arr + left*element_size) < 0) {
last++;
swap(element_size, arr, last, i);
}
}
swap (element_size, arr, left, last);
if (right-left > THREAD_THRESHOLD) {
qa = _qsort_args(element_size, arr,left,last-1,cmp);
if (0 == pthread_create(&thread, PTHREAD_CREATE_JOINABLE,
_qsort_start, (void*)&qa)) {
_qsort (element_size, arr, last+1, right, cmp);
pthread_join(thread, NULL);
return;
}
}
_qsort (element_size, arr, left, last-1, cmp);
_qsort (element_size, arr, last+1, right, cmp);
}

void* my_qsort (void *arr, size_t length, size_t element_size, comparer cmp)
{
_qsort (element_size, arr, 0, length-1, cmp);
return arr;
}

And the whole kaboodle can be seen at my_qsort.c.

2009-01-07

Test generation / measuring code

Started writing the backend for test generation: measurer.ml. Here's an example:

# module M = Measurer (ListGen (IntGen));;
module M :
sig
type t = ListGen(IntGen).t
val measure : (t -> 'a) -> (string * t * 'a exn_result) list
val benchmark : (t -> 'a) -> (int * int * bm_stat exn_result) list
end

Here I created a new measurer that measures an int list generator (i.e. a list generator parametrized with an int generator.) The measurer module has two functions, measure and benchmark. The measure function takes a function and calls it with the values it gets from the aforementioned generator, producing a list of what it did and what happened.

# M.measure (List.map (fun x -> x * 2));;
- : (string * M.t * int list exn_result) list =
[("(int list) normal", [0; -5; 1; -1; 2; 3; 5], Result [0; -10; 2; -2; 4; 6; 10]);
("(int list) negative", [5; 3; 2; -1; 1; -5; 0], Result [10; 6; 4; -2; 2; -10; 0]);
("(int list) zero", [], Result []);
("(int list) one", [1], Result [2]);
("(int list) minus_one", [-1], Result [-2]);
("(int list) even", [1; 2], Result [2; 4]);
("(int list) odd", [-1; 0; 1], Result [-2; 0; 2]);
("(int list) min_val", [-4611686018427387904], Result [0]);
("(int list) max_val", [4611686018427387903], Result [-2])]

The benchmark function times how long it takes to run the function with different input sizes and how much it stresses the GC in terms of allocations and collections. The different input sizes are defined by the generator. The first number in the output tuple is the parameter used to call the generator's get_value, the second number is the relative input size.

# M.benchmark (List.map (fun x -> x));;
- : (int * int * bm_stat exn_result) list =
[(0, 0, Result {
time = 9.5367431640625e-07;
minor_collections = 0; major_collections = 0;
allocated_bytes = 0.});
(1, 1, Result {
time = 2.1457672119140625e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 24.});
(2, 2, Result {
time = 1.9073486328125e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 48.});
(3, 4, Result {
time = 1.9073486328125e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 96.});
(4, 8, Result {
time = 3.0994415283203125e-06;
minor_collections = 0; major_collections = 0;
allocated_bytes = 192.});
...
(14, 8192, Result {
time = 0.00134301185607910156;
minor_collections = 0; major_collections = 0;
allocated_bytes = 196608.});
(15, 16384, Result {
time = 0.00557589530944824219; (* oh, non-linear scaling (caused by a minor_collection) *)
minor_collections = 1; major_collections = 0;
allocated_bytes = 393216.})]

You can test multi-argument functions by currying:

# module IM = Measurer (IntGen);;
# module SM = Measurer (StringGen);;
# let arg_1_results = SM.measure String.get;;
val arg_1_results : (string * SM.t * (int -> char) exn_result) list =
[("string normal", "Foobar", Result <fun>);
("string negative", "barFoo", Result <fun>);
("string zero", "", Result <fun>);
("string one", "a", Result <fun>);
("string minus_one", "A", Result <fun>);
("string even", "aB", Result <fun>);
("string odd", "Abc", Result <fun>);
("string min_val", "-25.0", Result <fun>);
("string max_val", ..., Result <fun>)]

As you can see, the results are curried functions (the Result-constructor being an exception catcher, see below.)

# let arg_2_results = List.map (fun (n,v,r) -> (n,v, ex_map IM.measure r)) arg_1_results;;
val arg_2_results :
(string * SM.t * (string * IM.t * char exn_result) list exn_result) list =
[("string normal", "Foobar",
Result
[("int normal", 5, Result 'r');
("int negative", -5, Error (Invalid_argument "index out of bounds"));
("int zero", 0, Result 'F');
("int one", 1, Result 'o');
("int minus_one", -1, Error (Invalid_argument "index out of bounds"));
("int even", 2, Result 'o');
("int odd", 3, Result 'b');
("int min_val", -4611686018427387904, Error (Invalid_argument "index out of bounds"));
("int max_val", 4611686018427387903, Error (Invalid_argument "index out of bounds"))]);
("string negative", "barFoo",
...

It still needs a front-end, function generators, a pretty-printer and some whacking with a cluebat, but it might just let me write my tests with less effort and fewer errors when it's done. I hope.

A potential problem is the output size explosion with multi-argument functions (each generator outputs 9 values, if you have 4 args, you get 9^4 = 6561 measurements, which might take a while to read.)

2009-01-06

And even more testing

Indexing. Indexing is hard. Pretty much every test I wrote for fancy array indexing found a bug. And when you have an indexing bug combined with unsafe gets and sets, you have a segfault bug. High P(you fucked up) and high P(real bad) combine to make a high P(you fucked up real bad.)

Now the tests pass but ... yeah. The seed of insecurity is sown. Fear taking control, etc. Cause of fear: lack of information. The more you know about the code, the less you fear it (specifically: Does it work? Really? Even on a PowerPC? What does "work" mean in this context? Give me the measurements.) Right?


Today I've been writing some simple things in Coq (s/writing/copying examples by hand.) It lets you define statements and prove facts about them. Which brings the question, how do you know that you have proven all the facts that you need to prove about a piece of code (e.g. that "(x + y) / 2" works incorrectly if x + y > max_int (prodding you towards x / 2 + y / 2 + if odd x && odd y then 1 else 0))


Tomorrow, test generation.

2009-01-04

Current testing stats

1500 lines of new tests written, 20 bugs found, codebase sans tests is 1700 lines (2200 with the aliases.) The first 800 lines of tests uncovered 6 bugs, the next 700 found 14.[1] Apparently practice does make you better at writing tests.

I'm still around a thousand test lines short of done, so at the current rate of one bug per 50 lines of tests I might expect to find 20 more bugs. Which'd put prelude.ml's pre-testing bugs/kSLOC at 24 bugs per thousand lines of source code (bug every 42 lines.) Prelude.ml contains around 700 top-level function definitions, giving a ratio of one bug every 17.5 functions.

Furthermore, I might expect the current set of tests to contain 36 bugs (1500 * 24/1000) and the finished tests to contain 60 bugs. A bug in a test would be omitting some relevant test case or expecting wrong behaviour.

A few questions arise: How to find more defects with less effort? How to assert the correctness of the tests? How to write code with fewer defects?


[1] My counting methodology was: Misbehaviour is a bug. Changing code and tests to change expected behaviour counts as a bug. Solving multiple bugs with a single search&replace counts as one bug. Changing performance characteristics of a function (e.g. making a function tail recursive) counts as a bug.

Random thought on automated testing

An automatic test generator that takes the function signature would be nice. Feed it the code and it uses the signature to generate and output a file of passing tests. Then you go and fix the code until the measurements match your expectations. And it could also curve-fit runtime measurements to complexity classes. And derive a set of hypotheses to describe your code by matching the test results against its rule database. And feed the hypotheses to QuickCheck to test them.

It'd work like this:

$ cat foo.ml
let double x = if x = 0 then 0 else 2

$ measure_code < foo.ml
(*
val double : int -> int
Estimated time complexity: O(1) (average runtime 0.01us)
Estimated space complexity: O(1) (allocated an average 8 bytes)
Hypothesis:
double 0 -> 0
forall x:int, double x -> 2
*)
(**T double_against_Int_gen
double 0 = 0
double 1 = 2
double (-1) = 2
double 2 = 2
double 3 = 2
double max_int = 2
double min_int = 2
map double (map (fun i -> 1 lsl i) (0--61)) = replicate 62 2
map double (map (fun i -> -1 lsl i) (0--61)) = replicate 62 2
map double [] = []
**)

And a sketch of the value generator:

module Int_gen =
struct
let zero = 0
let one = 1
let minus_one = -1
let even = 2
let odd = 3
let max_val = max_int
let min_val = min_int
let ever_larger = fun i -> 1 lsl i
let ever_smaller = fun i -> -1 lsl i
let larger_limit = 61
let smaller_limit = 61
let extras = []
end

module Float_gen =
struct
let zero = 0.
let one = 1.
let minus_one = -1.
let even = 2.
let odd = 3.
let max_val = max_float
let min_val = -.max_float
let ever_larger = fun i -> 0.1 *. 2. ** (float i)
let ever_smaller = fun i -> -0.1 *. 2. ** (float i)
let larger_limit = 200
let smaller_limit = 200
let extras = [nan, min_float, -.min_float, infinity, neg_infinity, 1e-320, -1e320]
end

module List_gen (G:GENERATOR) =
struct
let zero = []
let one = [G.one]
let minus_one = [G.minus_one]
let even = [G.zero; G.one]
let odd = [G.minus_one; G.zero; G.one]
let max_val = [G.max_val]
let min_val = [G.min_val]
let ever_larger = fun i -> replicate (1 lsl i) G.one
let ever_smaller = fun i -> []
let larger_limit = 8
let smaller_limit = 0
let extras = [[G.one; G.zero]; [G.max_val; G.min_val]; G.extras]
end

2009-01-02

Prelude.ml testing

I've been bombing prelude.ml with a barrage of unit tests. Started from the top of the file, moving towards the end. Line 958 of two days ago is now line 1723, which is where I'm currently at. And I've got 1900 lines of code left to test.

At this pace I'll be done after about 2000 extra lines of tests. If I use two days each week and write 400 lines per day, it'll take 2.5 weeks.

I've been finding bugs at a rate of around one per every hundred lines of tests. So there should be 20 bugs lurking in the rest of the codebase. I should keep proper track of that and estimate just how bad code I am writing. And try to learn something about it.

The tests serve three purposes: documenting the code, specifying functionality, and finding bugs. The way I write the tests is by writing special comments, then parsing them into a oUnit test file. So the tests are next to the code they test and take relatively little effort to write. Here's an example:

let showFloat f =
Pcre.replace ~rex:(Pcre.regexp "\\.$") ~templ:".0" (string_of_float f)
(**T
showFloat 1. = "1.0"
showFloat 1.0048 = "1.0048"
showFloat (-0.2) = "-0.2"
showFloat 1e18 = "1e+18"
showFloat 1e-18 = "1e-18"
**)

The (**T ... **) creates an anonymous test that checks that each line in the comment body evaluates to true. The name of an anonymous test consists of the source file name and the line number: if our example was at the top of the file "floats.ml", the test name would be "test_floats_line_3", making it easy to find.

You can also create named tests by adding the test name after the (**T:

(**T showFloat_test
showFloat 1. = "1.0"
**)

And it's also possible to write the oUnit test body explicitly using (***:

(*** showFloat_test
assert_equal (showFloat 1.0) "1.0";
assert_equal (showFloat (-0.2)) "-0.2"
**)

When you want to generate the tests, run tools/make_suite.rb floats.ml > floats_test.ml, preferably in your OMakefile.

Blog Archive