Copyright © 2005 Gene Michael Stover. All rights reserved. Permission to copy, store, & view this document unmodified & in its entirety is granted.
This is a description of a Lisp library for reading Delimiter Separated Values (DSV). The library is called CyberTiggyr DSV.
To do: document DO-ESCAPED. I wrote it in a rush, on a whim, so it's worth reconsidering. It does its job, but maybe there is a better way. Or maybe not. Whatever. Once I decide, I should document it.
DSV stands for Delimiter Separated Values. Comma Separated Values (CSV) is a kind of DSV. The unix /etc/passwd file is a DSV file.
DSV file formats are explained well in the Data file Metaformats chapter of The Art of Unix Programming, by Eric S. Raymond.
CyberTiggyr DSV converts the records of the file into lists of strings in Lisp. An alternative would be to use a regular expression library & treat the records as lines of text. (And if doing that, Perl could be a better language choice than Lisp.)
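Because each record comes back as an ordinary list of strings, the standard list & string functions are enough to work with the result. The sketch below hard-codes two /etc/passwd-style records in that shape (the same records appear in the examples); LOGIN-UIDS is a hypothetical helper for illustration, not part of CyberTiggyr DSV, and it assumes the usual passwd layout where field 0 is the login name & field 2 is the numeric user ID.

```lisp
;; Two records in the shape LOAD-ESCAPED returns: one list of strings
;; per record, one string per field.
(defparameter *passwd-records*
  '(("root" "x" "0" "0" "root" "/root" "/bin/sh")
    ("uucp" "x" "10" "14" "uucp" "/var/spool/uucp" "/sbin/nologin")))

;; Hypothetical helper: pair each login name (field 0) with its user
;; ID (field 2), converting the ID from a string to an integer.
(defun login-uids (records)
  (mapcar (lambda (record)
            (cons (nth 0 record) (parse-integer (nth 2 record))))
          records))

;; (login-uids *passwd-records*)  => (("root" . 0) ("uucp" . 10))
```

Note that every field is a string, including numeric ones, so converting types (here with PARSE-INTEGER) is up to the caller.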
A programming library's documentation should have an Examples section near the front so you can determine whether the library does what you want in a way you want without having to read an entire manual.
CyberTiggyr DSV can read unix-style DSV files that have an escape character. The LOAD-ESCAPED function returns the entire contents of such a file at once. The separator, escape character, & end-of-record character default to colon, backslash, & newline, respectively, so you could read a file such as /etc/passwd like this:
;; Requires CyberTiggyr Test
> (load "../lut/test.lisp")
T
> (load "dsv.lisp")
T
> (use-package "CYBERTIGGYR-DSV")
T
> (load-escaped "/etc/passwd")
(("root" "x" "0" "0" "root" "/root" "/bin/sh")
("uucp" "x" "10" "14" "uucp" "/var/spool/uucp" "/sbin/nologin")
("fido" "x" "501" "501" "fidonet national mail hour" "/home/fido"
"/home/bin/fido"))
You can specify your own field separator & end-of-record characters. For example, at my day job just today (I swear), I had a file that separated fields with tabs & ended records with underbars. Here's an example of that nonsense (using consecutive spaces to simulate a tab):
Joe     123 Sesame St
Virginia, USA_Steve     345 Suite Street
DC, US A
phone 123-456-7890_
You can read a file like that by specifying the field separator & end-of-record characters for LOAD-ESCAPED, like this:
> (load-escaped "addresses.dsv"
:field-separator #\Tab
:end-of-record #\_)
(("Joe" "123 Sesame St
Virginia, USA")
("Steve" "345 Suite Street
DC, US A
phone 123-456-7890"))
You can change the default field separator, end-of-record, & escape characters so you don't need to specify them each time you call LOAD-ESCAPED.
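This works because *FIELD-SEPARATOR*, *END-OF-RECORD*, & *ESCAPE* are special variables that the keyword parameters read as their defaults at call time. You can SETF them to change the defaults for the rest of a session, or rebind them with LET around a group of calls. The self-contained sketch below shows the pattern with hypothetical stand-ins: *SEP* plays the role of *FIELD-SEPARATOR* & SPLIT-FIELDS plays the role of a DSV reader. Neither is part of CyberTiggyr DSV, & SPLIT-FIELDS ignores escapes entirely.

```lisp
;; Hypothetical stand-in for *FIELD-SEPARATOR*.
(defvar *sep* #\:
  "Default field separator used by SPLIT-FIELDS.")

;; Hypothetical stand-in for a DSV reader.  The default for
;; FIELD-SEPARATOR is evaluated each time the function is called,
;; so it sees whatever *SEP* is bound to at that moment.
(defun split-fields (string &key (field-separator *sep*))
  "Split STRING at each FIELD-SEPARATOR and return a list of strings."
  (loop with start = 0
        for pos = (position field-separator string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

;; (split-fields "root:x:0")                => ("root" "x" "0")
;; (let ((*sep* #\Tab))
;;   (split-fields (format nil "Joe~C123 Sesame St" #\Tab)))
;;                                          => ("Joe" "123 Sesame St")
```

Assuming the library's variables behave as described below, the address example could presumably be written as (let ((*field-separator* #\Tab) (*end-of-record* #\_)) (load-escaped "addresses.dsv")). LET keeps the change local to that form, whereas SETF changes the default for every later call.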
If you have a stream, not a file, you can read a record at a time from it with READ-ESCAPED.
In the future, CyberTiggyr DSV will support quoted-style DSV files. That's what Microsloth XL uses when it writes CSV files.
CyberTiggyr DSV is released under the GNU Lesser General Public License (LGPL).
You need just one file: http://cybertiggyr.com/gene/dsv/dsv.lisp.
The complete source code is also in an appendix of the printed version of this essay.
The Lisp package is called CYBERTIGGYR-DSV (all upcase).
It requires COMMON-LISP & CYBERTIGGYR-TEST. You can get CyberTiggyr Test from http://cybertiggyr.com/gene/lut/.
CyberTiggyr DSV exports these symbols:
defvar *end-of-record* #\Newline
*END-OF-RECORD* must be bound to the character which ends a record. By default, it's a newline. When you do not specify an end-of-record character when you call READ-ESCAPED or LOAD-ESCAPED, the function you call will get its default end-of-record character from *END-OF-RECORD*.
defvar *escape* #\\
*ESCAPE* is bound to the default escape character that READ-ESCAPED & LOAD-ESCAPED will use. By default, it's a backslash.
To disable escapes, bind *ESCAPE* to NIL. Since NIL is a symbol, not a character, no character will ever be EQL to it, so no character will ever be treated as the escape character.
defvar *field-separator* #\:
*FIELD-SEPARATOR* is bound to the character which by default separates fields in a record. If you do not specify a field separator character when you call READ-ESCAPED or LOAD-ESCAPED, the function will use the character bound to *FIELD-SEPARATOR*. By default, it's a colon.
defun load-escaped pathname &key (field-separator *field-separator*) (end-of-record *end-of-record*) (escape *escape*) (trace nil)
LOAD-ESCAPED reads all the DSV records from the specified file & returns them in a list.
If you specify a stream for trace, LOAD-ESCAPED will print progress messages to it as it goes. (They aren't pretty, so you probably don't want to use that feature when an end user will see the output.)
defun read-escaped strm &key (field-separator *field-separator*) (end-of-record *end-of-record*) (escape *escape*)
READ-ESCAPED consumes & returns the next record from the DSV stream. At end of input, it returns strm itself.
strm must be a stream that supports READ-CHAR & PEEK-CHAR.
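To make the record format concrete, here is a minimal sketch of a reader in the spirit of READ-ESCAPED. It is not the library's source: READ-RECORD & READ-ALL-RECORDS are hypothetical names, the defaults are hard-coded instead of coming from the special variables, & it assumes the escape character makes the next character literal, whatever that character is. Like READ-ESCAPED, it needs only READ-CHAR & PEEK-CHAR & returns the stream itself at end of input. READ-ALL-RECORDS loops on that convention to collect every record, which is presumably close to what LOAD-ESCAPED does with a file.

```lisp
;; Hypothetical sketch, not CyberTiggyr DSV's implementation.
;; Assumption: ESCAPE makes the following character literal.
(defun read-record (strm &key (field-separator #\:)
                              (end-of-record #\Newline)
                              (escape #\\))
  "Read one record from STRM and return it as a list of strings.
Return STRM itself when no input remains."
  (if (null (peek-char nil strm nil nil))
      strm                              ; end of input: return the stream
      (let ((fields '())
            (field (make-string-output-stream)))
        (loop for char = (read-char strm nil nil)
              do (cond ((or (null char) (eql char end-of-record))
                        ;; End of record (or of input): finish the last field.
                        (push (get-output-stream-string field) fields)
                        (return))
                       ((eql char field-separator)
                        ;; Field boundary: finish the current field.
                        (push (get-output-stream-string field) fields))
                       ((eql char escape)
                        ;; Copy the next character literally.  If ESCAPE is
                        ;; NIL, no character is EQL to it, so this never fires.
                        (let ((next (read-char strm nil nil)))
                          (when next (write-char next field))))
                       (t (write-char char field))))
        (nreverse fields))))

(defun read-all-records (strm &rest keys)
  "Collect records from STRM until READ-RECORD returns STRM."
  (loop for record = (apply #'read-record strm keys)
        until (eq record strm)
        collect record))

;; (with-input-from-string (s (format nil "a\\:b:c~%d:e~%"))
;;   (read-all-records s))
;; => (("a:b" "c") ("d" "e"))
```

With :escape nil, the same input yields (("a\\" "b" "c") ("d" "e")), because the backslash is then an ordinary character. With :field-separator #\Tab & :end-of-record #\_, a newline inside a record is just another field character, which is how the address example above keeps its multi-line fields.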
I write almost all of my documents in LaTeX ([5], [3]). I compile to PDF with latex, dvips, & ps2pdf. I compile to HTML with latex2html ([1], [4]).
Gene Michael Stover 2008-04-20