DSV Library for Lisp

Gene Michael Stover

created Sunday, 2005 June 19
updated Monday, 2005 July 11

Copyright © 2005 Gene Michael Stover. All rights reserved. Permission to copy, store, & view this document unmodified & in its entirety is granted.



1 What is this?

This is a description of a Lisp library for reading Delimiter Separated Values (DSV). The library is called CyberTiggyr DSV.

2 To Do

Document DO-ESCAPED. I wrote it in a rush, on a whim, so it's worth reconsidering. It does its job, but maybe there is a better way. Or maybe not. Whatever. After deciding on something, document it.

3 What is DSV?

DSV is Delimiter Separated Values. Comma Separated Values (CSV) is a kind of DSV. The unix /etc/passwd file is a DSV file.

DSV file formats are explained well in the "Data File Metaformats" chapter of The Art of Unix Programming, by Eric S. Raymond [6].

CyberTiggyr DSV converts the records of the file into lists of strings in Lisp. An alternative would be to use a regular expression library & treat the records as lines of text. (And if doing that, Perl could be a better language choice than Lisp.)
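To make "lists of strings" concrete, here is a minimal sketch of splitting one already-isolated record on a separator character. It is not CyberTiggyr DSV's code: SPLIT-RECORD is a hypothetical name, & the sketch ignores escapes & end-of-record handling entirely.

```lisp
;;; A minimal sketch, NOT CyberTiggyr DSV's code.  SPLIT-RECORD is a
;;; hypothetical name.  It splits one record, already isolated as a
;;; string, into a list of field strings.  It does not handle escapes.
(defun split-record (record &optional (separator #\:))
  "Return the fields of RECORD, a string, as a list of strings."
  (loop with start = 0
        ;; POSITION is re-evaluated on every iteration with the new START.
        for pos = (position separator record :start start)
        ;; Collect the field from START up to the separator (or the end).
        collect (subseq record start pos)
        ;; No separator left means that was the last field.
        while pos
        do (setf start (1+ pos))))
```

For example, (split-record "root:x:0:0:root:/root:/bin/sh") returns ("root" "x" "0" "0" "root" "/root" "/bin/sh"), the same shape as one element of the LOAD-ESCAPED result in the Examples. Adjacent separators produce empty strings, so "a::b" splits into ("a" "" "b").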

4 Examples

A programming library's documentation should have an Examples section near the front so you can determine whether the library does what you want in a way you want without having to read an entire manual.

CyberTiggyr DSV can read unix-style DSV files that have an escape character. The LOAD-ESCAPED function returns the entire contents of such a file at once. The separator, escape character, & end-of-record character default to colon, backslash, & newline, respectively, so you could read a file such as /etc/passwd like this:

;; Requires CyberTiggyr Test
> (load "../lut/test.lisp")
T
> (load "dsv.lisp")
T
> (use-package "CYBERTIGGYR-DSV")
T
> (load-escaped "/etc/passwd")
(("root" "x" "0" "0" "root" "/root" "/bin/sh")
 ("uucp" "x" "10" "14" "uucp" "/var/spool/uucp" "/sbin/nologin")
 ("fido" "x" "501" "501" "fidonet national mail hour" "/home/fido"
  "/home/bin/fido"))

You can specify your own field separator character & end-of-record character. For example, at my day job just today (I swear), I had a file that separated fields with tabs & ended records with the underbar. Here's an example of that nonsense (using consecutive white space to simulate a tab):

Joe     123 Sesame St
Virginia, USA_Steve     345 Suite Street
DC, US A
phone 123-456-7890_

You can read a file like that by specifying the field separator & end-of-record characters for LOAD-ESCAPED, like this:

> (load-escaped "addresses.dsv"
                :field-separator #\Tab
                :end-of-record #\_)
(("Joe" "123 Sesame St
Virginia, USA")
 ("Steve" "345 Suite Street
DC, US A
phone 123-456-7890"))

You can change the default field separator, end-of-record, & escape characters by binding or setting the special variables *FIELD-SEPARATOR*, *END-OF-RECORD*, & *ESCAPE*, so you don't need to specify them each time you call LOAD-ESCAPED.
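Because the defaults are the keyword default forms shown in the Reference (for example, (field-separator *field-separator*)), binding a variable with LET changes the default only for code running inside the LET, while SETF changes it globally. The sketch below runs without the library: it declares stand-in variables with the same names, plus a hypothetical function, SHOW-DEFAULTS, whose lambda list imitates LOAD-ESCAPED's.

```lisp
;;; Stand-ins for the library's special variables, so this sketch runs
;;; on its own.  In real use you would bind CYBERTIGGYR-DSV's own
;;; *FIELD-SEPARATOR* & *END-OF-RECORD* instead.
(defvar *field-separator* #\:)
(defvar *end-of-record* #\Newline)

;;; SHOW-DEFAULTS is hypothetical.  Its keyword defaults copy those
;;; documented for LOAD-ESCAPED; it just returns the characters it
;;; would have used.
(defun show-defaults (&key (field-separator *field-separator*)
                           (end-of-record *end-of-record*))
  (list field-separator end-of-record))

;; Outside any LET, the global values are the defaults.
(show-defaults)                          ; => (#\: #\Newline)

;; Inside a LET, the dynamic bindings become the defaults...
(let ((*field-separator* #\Tab)
      (*end-of-record* #\_))
  (show-defaults))                       ; => (#\Tab #\_)

;; ...but an explicit keyword argument still wins.
(let ((*field-separator* #\Tab))
  (show-defaults :field-separator #\,))  ; => (#\, #\Newline)
```

With the real library loaded, the tab-&-underbar file from above could presumably be read as (let ((*field-separator* #\Tab) (*end-of-record* #\_)) (load-escaped "addresses.dsv")).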

If you have a stream, not a file, you can read a record at a time from it with READ-ESCAPED.
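To show what reading "a record at a time" involves, here is a minimal sketch of an escape-style record reader. It illustrates the general technique & is not the library's READ-ESCAPED: SKETCH-READ-RECORD & its keyword names are hypothetical, & its handling of odd cases (such as an escape character right before end of input) may differ from the real function. It does copy two documented behaviours: the escape may be NIL, & the function returns the stream itself at end of input.

```lisp
;;; A minimal sketch of escape-style record reading, NOT the library's
;;; READ-ESCAPED.  SKETCH-READ-RECORD is a hypothetical name.  It reads
;;; characters until END-OF-RECORD or end of input.  SEPARATOR splits
;;; fields; ESCAPE makes the following character literal.  ESCAPE may be
;;; NIL, which disables escaping because no character is EQL to NIL.
;;; Like READ-ESCAPED, it returns STRM itself when no input is left.
(defun sketch-read-record (strm &key (separator #\:)
                                     (end-of-record #\Newline)
                                     (escape #\\))
  (when (null (peek-char nil strm nil nil))
    (return-from sketch-read-record strm))   ; nothing left: end of input
  (let ((fields '())
        (field (make-string-output-stream)))
    (loop for ch = (read-char strm nil nil)
          until (or (null ch) (eql ch end-of-record))
          do (cond ((eql ch escape)
                    ;; Take the next character literally, if there is one.
                    (let ((next (read-char strm nil nil)))
                      (when next (write-char next field))))
                   ((eql ch separator)
                    ;; End of this field.  GET-OUTPUT-STREAM-STRING also
                    ;; empties FIELD, ready for the next field.
                    (push (get-output-stream-string field) fields))
                   (t (write-char ch field))))
    ;; The last field ends at end-of-record (or end of input).
    (push (get-output-stream-string field) fields)
    (nreverse fields)))
```

Because it reads from any character stream, WITH-INPUT-FROM-STRING is a handy way to try it without creating a file. An escaped separator stays in the field, so the input line a:b\:c:d reads as ("a" "b:c" "d").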

In the future, CyberTiggyr DSV will support quoted-style DSV files. That's what Microsloth XL uses when it writes CSV files.

5 License

CyberTiggyr DSV is released under the GNU Lesser General Public License (LGPL) [2].

6 Obtaining

You need just one file: http://cybertiggyr.com/gene/dsv/dsv.lisp.

The complete source code is also in an appendix of the printed version of this essay.

7 Reference

7.1 Package CyberTiggyr DSV

The Lisp package is called CYBERTIGGYR-DSV (all upcase).

It requires COMMON-LISP & CYBERTIGGYR-TEST. You can get CyberTiggyr Test from http://cybertiggyr.com/gene/lut/.

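CyberTiggyr DSV exports these symbols, each described in its own section below:

  *END-OF-RECORD*
  *ESCAPE*
  *FIELD-SEPARATOR*
  LOAD-ESCAPED
  READ-ESCAPED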

7.2 *end-of-record*

defvar *end-of-record* #\Newline

*END-OF-RECORD* must be bound to the character which ends a record. By default, it's a newline. When you do not specify an end-of-record character when you call READ-ESCAPED or LOAD-ESCAPED, the function you call will get its default end-of-record character from *END-OF-RECORD*.


7.3 *escape*

defvar *escape* #\\

*ESCAPE* is bound to the default escape character that READ-ESCAPED & LOAD-ESCAPED will use. By default, it's a backslash.

To disable escapes, bind *ESCAPE* to NIL. Since NIL is a symbol, not a character, no character will ever be EQL to it, so no character will ever be treated as the escape character.
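The EQL argument is easy to check directly. SKETCH-UNESCAPE below is a hypothetical helper, not part of CyberTiggyr DSV; it only demonstrates that passing NIL as the escape leaves backslashes alone, while passing #\\ consumes them.

```lisp
;;; SKETCH-UNESCAPE is a hypothetical helper, not part of CyberTiggyr
;;; DSV.  It copies STRING, treating ESCAPE as "take the next character
;;; literally".  When ESCAPE is NIL, the EQL test can never succeed,
;;; because every character read is a character & NIL is not.
(defun sketch-unescape (string escape)
  (with-output-to-string (out)
    (with-input-from-string (in string)
      (loop for ch = (read-char in nil nil)
            while ch
            do (if (eql ch escape)
                   ;; Escape: copy the following character, if any, literally.
                   (let ((next (read-char in nil nil)))
                     (when next (write-char next out)))
                   (write-char ch out))))))
```

So (sketch-unescape "a\\:b" #\\) returns "a:b", but (sketch-unescape "a\\:b" nil) returns the string unchanged.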


7.4 *field-separator*

defvar *field-separator* #\:

*FIELD-SEPARATOR* is bound to the character which by default separates fields in a record. If you do not specify a field separator character when you call READ-ESCAPED or LOAD-ESCAPED, the function will use the character bound to *FIELD-SEPARATOR*. By default, it's a colon.


7.5 load-escaped

defun load-escaped pathname &key (field-separator *field-separator*) (end-of-record *end-of-record*) (escape *escape*) (trace nil)

LOAD-ESCAPED reads all the DSV records from the specified file & returns them in a list.

If you specify a stream for trace, LOAD-ESCAPED will print progress messages to it as it goes. (They aren't pretty, so you probably don't want to use that feature when an end user will see the output.)


7.6 read-escaped

defun read-escaped strm &key (field-separator *field-separator*) (end-of-record *end-of-record*) (escape *escape*)

READ-ESCAPED consumes & returns the next record from the DSV stream. At end of input, it returns strm itself, so a caller can detect end of input by comparing the result with strm using EQ.

strm must be a stream that supports READ-CHAR & PEEK-CHAR.
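Since the end-of-input value is strm itself, a caller can collect every record with a loop that stops when the result is EQ to the stream. The sketch below is an assumption about how you might write that loop, not the library's LOAD-ESCAPED. SKETCH-READ-ALL & SKETCH-LINE-READER are hypothetical; the latter is a trivial stand-in reader (one field per line) so the loop can run without the library.

```lisp
;;; SKETCH-READ-ALL is a hypothetical helper.  READER is any function
;;; that follows READ-ESCAPED's convention: given STRM, it returns the
;;; next record, or STRM itself at end of input.  No record can be EQ
;;; to the stream object, so the test is unambiguous.
(defun sketch-read-all (strm reader)
  (loop for record = (funcall reader strm)
        until (eq record strm)
        collect record))

;;; A trivial stand-in reader for trying the loop without the library:
;;; each line is one record with a single field.
(defun sketch-line-reader (strm)
  (let ((line (read-line strm nil nil)))
    (if line (list line) strm)))
```

With the library loaded, you could presumably pass READ-ESCAPED itself, as in (with-open-file (s "/etc/passwd") (sketch-read-all s #'read-escaped)), in which case its keyword arguments fall back to *FIELD-SEPARATOR*, *END-OF-RECORD*, & *ESCAPE*.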

A. Other File Formats

I write almost all of my documents in LaTeX ([5], [3]). I compile to PDF with latex, dvips, & ps2pdf. I compile to HTML with latex2html ([1], [4]).

Bibliography

1
Nikos Drakos.
latex2html.

2
Free Software Foundation.
Lesser General Public License.
World Wide Web.
http://www.gnu.org/licenses/licenses.html#LGPL.

3
Michel Goossens and Frank Mittelbach.
The LaTeX Companion.
Addison Wesley Longman, Inc., 1993.
ISBN 0201541998.

4
Michel Goossens and Sebastian Rahtz.
The LaTeX Web Companion: Integrating TeX, HTML, and XML.
Addison Wesley Longman, Inc., 1999.
ISBN 020143317.

5
Leslie Lamport.
LaTeX: A Document Preparation System.
Addison-Wesley Publishing Company, Inc., 1986.
ISBN 0-201-15790-X.

6
Eric Steven Raymond.
The Art of Unix Programming, 2003.
http://www.faqs.org/docs/artu/.

Gene Michael Stover 2008-04-20