art with code

2008-10-01

I/O in programming languages: writing

This post is a part of a series where your intrepid host looks at I/O in different programming languages in search of understanding and interesting abstractions.

Part 1: open and read
Part 2: writing -- you are here
Part 3: basic piping
Part 4: piping to processes
Part 5: structs and serialization

Continuing from the previous post -- where we looked at opening, closing and reading files in languages ranging from GNU Assembly to Haskell -- today we'll do some writing. The languages of the day are ASM, Bash, C, Clean, Factor, Haskell, OCaml, Ruby and SML.

But before that, a small Perl example for line-wise reading, courtesy of Philip Taylor:

open $fd, "my_file";
while (<$fd>) {
    chomp;
    print scalar reverse;
    print $/;
}

It's quite similar to the Pythonic for line in fd:, the <$fd> reading the next line from $fd. Though, in a Perlish twist, the loop uses the $_ implicit argument to call the string munging functions, which makes the code look a bit like a stack language. See this Factor version for example:

USING: io io.encodings.utf8 io.files sequences ;

"my_file" utf8
file-lines [
reverse
print
] each

Both the Perl loop and the Factor loop use an implicit variable ($_ in Perl, the stack in Factor) to determine what value a procedure call should take as its argument.
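For contrast, here is a minimal Ruby sketch of the same loop with the argument named explicitly. Ruby inherits $_ from Perl (gets sets it), but the usual style is to name the block parameter. The helper name reversed_lines is made up for this illustration:

```ruby
# Hypothetical helper: returns every line of the file reversed, with the
# line terminator stripped first (the chomp + reverse of the Perl loop).
def reversed_lines(path)
  # File.foreach without a block returns an enumerator over the lines.
  File.foreach(path).map { |line| line.chomp.reverse }
end
```

Calling puts reversed_lines("my_file") prints the reversed lines one per line, assuming my_file exists in the current directory.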

But back to our subject for the day. The write syscall takes a file descriptor, a buffer and the buffer length, as demonstrated by this ASM version of "Hello, world!":

.equ EXIT, 1        # 32-bit syscall numbers, as used by int $0x80
.equ WRITE, 4
.equ STDOUT, 1

.section .data
hello:
    .ascii "Hello, world!\n"
    .equ hello_len, 14

.section .text
.globl _start
_start:
    # write(STDOUT, hello, hello_len)
    movq $WRITE, %rax
    movq $STDOUT, %rbx
    movq $hello, %rcx
    movq $hello_len, %rdx
    int $0x80

    # exit(0)
    movq $EXIT, %rax
    movq $0, %rbx
    int $0x80

The C version is more convenient to write:

#include <string.h>
#include <unistd.h>

int main (int argc, char *argv[])
{
    char hello[] = "Hello, world!\n";
    write(STDOUT_FILENO, hello, strlen(hello));
    return 0;
}
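Neither version above looks at the return value of write. The syscall reports how many bytes it actually wrote, and that can be fewer than requested, so a careful caller retries with the remainder. Here is a minimal Ruby sketch of that loop, using IO#syswrite (Ruby's unbuffered write); the helper name write_all is an assumption for this example:

```ruby
# Hypothetical helper: keep calling syswrite until every byte is written.
# IO#syswrite bypasses Ruby's buffering and returns the number of bytes
# written, which may be less than the size of the string passed in.
def write_all(io, data)
  remaining = data.b               # binary copy, so slicing is by byte
  total = 0
  until remaining.empty?
    written = io.syswrite(remaining)
    total += written
    # Drop the bytes that made it out and retry with the rest.
    remaining = remaining.byteslice(written..-1)
  end
  total
end
```

For example, write_all(STDOUT, "Hello, world!\n") prints the greeting and returns 14, the same length the ASM version hard-codes.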

Moving up the abstraction ladder, OCaml does away with strlen and return 0:

let () = output_string stdout "Hello, world!\n"

The SML version is very similar, but uses a tuple instead of currying:

val () = TextIO.output (TextIO.stdOut, "Hello, world!\n")

Haskell uses hPutStr where the ML derivatives use output. What's wrong with "write", anyhow?

import System.IO
main = hPutStr stdout "Hello, world!\n"

Ruby has no equivalent of main; top-level expressions are executed in the order they appear:

STDOUT.write "Hello, world!\n"

Bash uses > to redirect output to a file. Let's use that in this homespun version of echo (that uses echo for good measure...):

echo -n -e 'Hello, world!\n' > /dev/stdout

Clean uses uniqueness types to preserve referential transparency. In the following program, we get the standard IO pipes from the world and then have to use the return value of IO actions to do the next IO action:

module hello
import StdEnv

Start :: *World -> *World
Start world
    # (console, world) = stdio world
    # console1 = fwrites "Hello, world!\n" console
    # (ok,world) = fclose console1 world
    | not ok = abort "Cannot close console"
    | otherwise = world

If we bungle the IO passing, the result is a broken program. I'll split the write into two parts to demonstrate. First a working version:

# console1 = fwrites "Hello," console
# console2 = fwrites " world!\n" console1
# (ok,world) = fclose console2 world

If, instead of console1, I continue to use the original console, I end up with a screwed up program. Behold: the following snippet prints out only " world!\n" (it doesn't cause a compilation error though):

# console1 = fwrites "Hello," console
# console2 = fwrites " world!\n" console
# (ok,world) = fclose console2 world


Moving on, the most common filename-level write operations are writing a string into the named file and appending a string to the named file. You sometimes need to prepend as well, but that tends to make library writers squirm, as most file systems only do fast truncates and appends.

Prelude.ml has utilities, as you might expect:

open Prelude
let () =
  let fn = "my_file" in
  writeFile fn "hello there";
  appendFile fn "!";
  prependFile fn "Why, ";
  puts (readFile fn)

Ruby leverages the open mode flag, and I'm doing a simple memory-hungry prepend:

fn = "my_file"
File.open(fn, "w"){|f| f.write "hello there" }
File.open(fn, "a"){|f| f.write "!" }
File.open(fn, "r+"){|f|
  d = f.read
  f.rewind       # truncate does not move the file position
  f.truncate 0
  f.write "Why, "
  f.write d
}
puts File.read(fn)

A C version that uses fopen mode strings is pretty much the same as the Ruby version, except a lot more verbose (I don't even have error handling here):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int write_file(const char *fn, const char *buf, size_t len)
{
    FILE *fd = fopen(fn, "w");
    int rv = fwrite(buf, 1, len, fd);
    fclose(fd);
    return rv;
}

int append_file(const char *fn, const char *buf, size_t len)
{
    FILE *fd = fopen(fn, "a");
    int rv = fwrite(buf, 1, len, fd);
    fclose(fd);
    return rv;
}

int read_file(const char *fn, char **buf, size_t *len)
{
    struct stat st;
    FILE *fd = fopen(fn, "r");
    fstat(fileno(fd), &st);
    *buf = (char*)malloc(st.st_size);
    int rv = fread(*buf, 1, st.st_size, fd);
    fclose(fd);
    *len = st.st_size;
    return rv;
}

int prepend_file(const char *fn, const char *buf, size_t len)
{
    char* tmp;
    size_t tlen;
    read_file(fn, &tmp, &tlen);
    write_file(fn, buf, len);
    int rv = append_file(fn, tmp, tlen);
    free(tmp);
    return rv;
}

int main ( int argc, char *argv[] )
{
    char fn[] = "my_file";
    char a[] = "hello there",
         b[] = "!",
         c[] = "Why, ";
    char *buf;
    size_t len;

    write_file(fn, a, strlen(a));
    append_file(fn, b, strlen(b));
    prepend_file(fn, c, strlen(c));
    read_file(fn, &buf, &len);

    fwrite(buf, 1, len, stdout);
    fwrite("\n", 1, 1, stdout);

    free(buf);
    return 0;
}

In Bash, writing and appending are easy, but prepending requires a temp file:

echo -n 'hello there' > my_file;
echo -n '!' >> my_file;

echo -n 'Why, ' | cat - my_file > tmp &&
mv tmp my_file;

cat my_file && echo;
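The same temp-file strategy can be sketched in Ruby. Unlike a read-everything-then-rewrite prepend, this streams the old contents, so the file never has to fit in memory. The helper name prepend_file and the ".tmp" suffix are assumptions for this example; it presumes nothing else is using that temp name:

```ruby
# Hypothetical helper: prepend `str` to the file at `path` via a temp file,
# mirroring the `cat - my_file > tmp && mv tmp my_file` shell idiom.
def prepend_file(path, str)
  tmp_path = "#{path}.tmp"         # assumed to be a free name next to the file
  File.open(tmp_path, "wb") do |tmp|
    tmp.write(str)
    tmp.flush                      # make sure the prefix lands before the copy
    # Stream the old contents after the prefix without loading them all.
    File.open(path, "rb") { |src| IO.copy_stream(src, tmp) }
  end
  # Replace the original with the finished temp file.
  File.rename(tmp_path, path)
end
```

Keeping the temp file in the same directory keeps the rename on one file system. A sturdier version would pick a unique temp name and clean up on failure.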

Haskell has writeFile and appendFile, but prependFile causes some extra work due to the lazy readFile; the naive buf <- readFile fn; writeFile fn (s ++ buf) fails with "openFile: resource busy (file is locked)":

import System.IO

main = do
    let fn = "my_file"
    writeFile fn "hello there"
    appendFile fn "!"
    prependFile fn "Why, "
    putStrLn =<< readFile fn

prependFile fn s = do
    buf <- readFileStrict fn
    writeFile fn (s ++ buf)

-- Read the whole file before closing the handle. Forcing only the head of
-- the lazy string (as a bang pattern would) truncates larger files.
readFileStrict fn = do
    h <- openFile fn ReadMode
    buf <- hGetContents h
    length buf `seq` hClose h
    return buf

SML doesn't have the convenience functions, so we need to implement them:

fun bracket v finally f =
  let
    val rv = f v
    val () = finally v
  in
    rv
  end handle x => let
    val () = finally v handle _ => ()
  in
    raise x
  end

fun withTextIn file f = bracket (TextIO.openIn file) TextIO.closeIn f
fun withTextOut file f = bracket (TextIO.openOut file) TextIO.closeOut f
fun withTextAppend file f = bracket (TextIO.openAppend file) TextIO.closeOut f

fun readFile file = withTextIn file TextIO.inputAll
fun writeFile file s = withTextOut file (fn f => TextIO.output (f, s))
fun appendFile file s = withTextAppend file (fn f => TextIO.output (f, s))
fun prependFile file s =
  let
    val buf = readFile file
    val () = writeFile file s
    val () = appendFile file buf
  in
    ()
  end

val () =
  let
    val file = "my_file"
    val () = writeFile file "hello there"
    val () = appendFile file "!"
    val () = prependFile file "Why, "
  in
    TextIO.print (readFile file ^ "\n")
  end

Which is pretty much how prelude.ml works as well, except that in prependFile, if the file is larger than 32 megabytes, prelude.ml uses the tempfile strategy (like the Bash version).
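For comparison, the SML bracket above maps fairly directly onto Ruby's ensure. This is only a sketch: the names bracket, with_text_append and append_file are made up here, and Ruby's File.open with a block already gives you the same close-on-exit behaviour.

```ruby
# Hypothetical Ruby rendering of the SML `bracket`: run the block with the
# resource, then always run the cleanup, whether or not the block raised.
def bracket(resource, cleanup)
  yield resource
ensure
  cleanup.call(resource)
end

# Open `path` for appending, hand the file to the block, close it afterwards.
def with_text_append(path, &block)
  bracket(File.open(path, "a"), :close.to_proc, &block)
end

def append_file(path, str)
  with_text_append(path) { |f| f.write(str) }
end
```

One difference from the SML version: if the cleanup itself raises while an exception is already in flight, Ruby propagates the cleanup's exception, whereas the SML bracket swallows it and re-raises the original.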

Factor has set-file-contents for writing a file; for appending we need to use with-file-appender (compare with the withTextAppend above). By the way, Factor's REPL workspace is really nice: it has incremental search for words and links them to a documentation browser. It also shows the current data stack, has separate panes for input and output and a built-in profiler. Here's the Factor version:

USING: io io.files io.encodings.utf8 ;

"hello there" "my_file" utf8 set-file-contents

"my_file" utf8 [ "!" write ] with-file-appender

"my_file" utf8 file-contents
"Why, " "my_file" utf8 set-file-contents
"my_file" utf8 [ write ] with-file-appender

"my_file" utf8 file-contents "\n" append write


I tried writing Clean versions of readFile, writeFile, appendFile and prependFile, but my program segfaults when I run it. C'est la vie~

Here's the segfaulting version anyhow, maybe lazyweb knows what the problem with it is:

module why

import StdEnv

DoFile f mode filename files
# (ok,file1,files1) = fopen filename mode files
| not ok = abort ("Failed to open '" +++ filename +++ "'")
# (res,file2) = f file1
(closeok,files2) = fclose file2 files1
| not closeok = abort ("Failed to close '" +++ filename +++ "'")
| otherwise = (res,files2)

WriteFile_ mode filename str files =
snd (DoFile (\f = (False, fwrites str f)) mode filename files)
WriteFile = WriteFile_ FWriteText
AppendFile = WriteFile_ FAppendText

flength f
# (ok, f1) = fseek f 0 FSeekEnd
| not ok = abort "seek failed"
# (pos, f2) = fposition f1
# (ok, f3) = fseek f2 0 FSeekSet
| not ok = abort "seek failed"
| otherwise = (pos, f3)

ReadAll f
# (len, f1) = flength f
= freads f1 len

ReadFile fn files = DoFile ReadAll FReadText fn files

PrependFile fn str files
# (old, files1) = ReadFile fn files
# files2 = WriteFile fn (str +++ old) files1
= files2

Start world
# (console,world1) = stdio world
# world2 = WriteFile fn "hello there" world1
# world3 = AppendFile fn "!" world2
# world4 = PrependFile fn "Why, " world3
# (str,world5) = ReadFile fn world4
# console1 = fwrites (str +++ "\n") console
# (ok,world6) = fclose console1 world5
| not ok = abort "Cannot close console."
| otherwise = world6
where
fn = "my_file"


And that's it for simple writing. There wasn't all that much variation, the Bash version being perhaps the most novel. C and SML used imperative writes [OCaml stdlib too], Ruby and prelude.ml wrapped them in higher-order functions, Clean threaded state to preserve purity, and Haskell used monads for the same.

Piping in the next installment. Should give stream-oriented IO and lazy lists a workout.

1 comment:

Anonymous said...

The Haskell prependFile doesn't work for me: it works for small files, but above a certain size, it cuts off the end of the file. For example, if I do

sequence $ [prependFile "test.txt" (take 1000 $ cycle (show n)) | n <- [0..100]]

Then test.txt ends with ...98989898. So watch out.
