Gene Michael Stover's

cpio How-To & Quick Start

created Tuesday, 19 March 2002
updated Monday, 14 October 2002


Introduction

cpio is an archive program, sort of like tar. It is commonly available on Unix & Unix-like systems, including GNU/Linux.

This article is a quick introduction to using cpio.

Extract Files from an Archive

To extract files from an archive, use the -i (copy in) command line option for cpio. That will tell cpio to read an archive from stdin & to extract the files from it.

So, assuming the archive is compressed with bzip2, do this:

bzcat dir.cpio.bz2 |cpio -i

(If you're confused or concerned about my use of bzip2, you might want to read my short section called "bzip2 or gzip?", then come back here & continue reading this article.)

Create an Archive

cpio creates archives differently than tar. Where tar automatically recurses into subdirectories, cpio reads from stdin a list of files & directories to archive; it does not automatically recurse into directories.

To create an archive, give cpio the -o (copy out) command line option. cpio will read a list of files & directories from stdin, create the archive, & write the archive to stdout.

A good way to generate the list of files is the find program.

To archive everything in a directory, compress it with bzip2, & write the results to a file, do this:

find dir -print |cpio -o |bzip2 >dir.cpio.bz2

That's the generic way to create an archive. On a GNU/Linux system, you might get a lot of ugly warnings about i-node numbers being truncated. The archive will be fine, but it's never good to have unnecessary warnings in the output; the eyesore might prevent you from seeing important error messages. To prevent all those warnings, type this:

find dir -print |cpio -o -Hnewc |bzip2 >dir.cpio.bz2

A potential problem is that "-Hnewc" is not portable to all implementations of cpio. So either you must know when it's okay to use it or you must avoid using it & suffer with the gratuitous warning messages.

So far, we've created archives of all files in a directory tree. In other words, we've reproduced the functionality of tar but at the cost of more keystrokes. Not very impressive. Since cpio reads a list of files from stdin, we can do a lot more.

If you want to create a distribution archive of your source code, leaving out object files (*.o), backup files (*~), and CVS & RCS directories, just take advantage of the features of find that you already know & love.

find dir \
     -name "*.o" -o \
     -name "*~" -o \
     -name CVS -prune -o \
     -name RCS -prune -o \
     -print \
|cpio -o -Hnewc |bzip2 >dir.cpio.bz2

(I've broken the example into multiple lines for readability. You'd either type the command on a single command line, or you'd break it into multiple lines, as I've done, by including the backslashes (\) literally.)

Need to back up just the files that have changed since your last backup yesterday? Trivial!

find dir -ctime -1 -print \
|cpio -o -Hnewc |bzip2 >dir.cpio.bz2

By using find to generate the list of files, you can make cpio archive any combination of files you want. It's easy to use find from your own shell scripts, too, or you could even use your own programs to generate the list of file names. cpio achieves great flexibility by leaving the file-selection responsibilities to another program.

Advanced Features

Some (most? all?) cpio implementations are able to access file systems & tapes through a cpio server on another host. A benefit there is that you can use cpio to archive files from one host but write the archive file to, say, the tape drive on another host. I've found this useful in cases where I needed to back up large amounts of data to a tape drive, but the tape drive was on a server that didn't have enough disk space to hold a temporary copy of the entire archive, so I had to go directly to tape.

To use this feature, use the -O (that's a capital O) command line option in conjunction with the user@host:pathname method of specifying the destination file. See "man cpio" for details.

Similarly, you can use the -I command line option to extract files from tape archives mounted on servers.

As cool as it sounds, this feature has some drawbacks. System-specific command line options & device-file names are often necessary. For example, you might have to force special block sizes with -B or --block-size, or you might have to use system-specific device file names, such as /dev/st/n0a1bf00a or something similarly incomprehensible. Also, systems sometimes behave as though the communication between the client (your cpio process) & the server is text, so non-text characters & end-of-lines get mangled. In other words, it sometimes just doesn't work.

In those cases, I've often made it work by using rsh and dd explicitly. In other words:

find . -print |cpio -o -Hnewc \
|rsh server dd bs=32k of=/dev/st0

(The values for block size (bs) & output file (of) are system-specific, of course, & might differ for you.)

Comparison with tar

I don't mean this article to persuade people to use cpio instead of tar. tar is fine; I mean mostly to help people learn to use cpio if they are faced with such an archive (probably because that's what I usually give to people unless they instruct me differently). Nevertheless, I can't help but do some comparisons.

The main advantage cpio has over tar is that it's easier to archive only some of the files in a directory. That benefit comes to us because cpio reads a list of files to archive instead of assuming it should recurse into directories & archive all files. Modern implementations of tar have similar features, but they are not as flexible as the file-selection features of find or of your own program. What's more, you have to learn the file-selection language of tar, whereas you already know the file-selection language of find, & that knowledge can be applied to any file-selection task that's appropriate for find. In other words, you must know find anyway, so why not re-use that knowledge with your archiver (cpio) instead of learning a less capable, less general archiver-specific system?

cpio archives are usually slightly smaller than tar files. Here is a test on four of my source directories:

bash-2.04$ for D in phil skeleton camano tigris; do
> (cd /space/gene-1/src; find $D -print |cpio -o -Hnewc |bzip2 -9) >$D.cpio.bz2
> (cd /space/gene-1/src; tar cf - $D |bzip2 -9) >$D.tar.bz2
> done
10903 blocks
1783 blocks
13996 blocks
504 blocks
bash-2.04$ ls -l
total 3332
-rw-rw----    1 gene     gene       861535 Mar 19 18:26 camano.cpio.bz2
-rw-rw----    1 gene     gene       866475 Mar 19 18:27 camano.tar.bz2
-rw-rw----    1 gene     gene       663206 Mar 19 18:26 phil.cpio.bz2
-rw-rw----    1 gene     gene       662529 Mar 19 18:26 phil.tar.bz2
-rw-rw----    1 gene     gene       110668 Mar 19 18:26 skeleton.cpio.bz2
-rw-rw----    1 gene     gene       111623 Mar 19 18:26 skeleton.tar.bz2
-rw-rw----    1 gene     gene        46374 Mar 19 18:27 tigris.cpio.bz2
-rw-rw----    1 gene     gene        46633 Mar 19 18:27 tigris.tar.bz2

You can see that three of the four cpio archives are smaller & that the fourth (phil) is a hair larger. Here's a table to show the relative sizes. The right-most column shows the size of the cpio archive relative to the tar archive. Numbers less than 1 indicate that the cpio archive was smaller.

base name    cpio size (bytes)    tar size (bytes)    relative (cpio/tar)
camano                  861535              866475                  0.994
phil                    663206              662529                  1.001
skeleton                110668              111623                  0.991
tigris                   46374               46633                  0.994
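The relative sizes are just the cpio byte count divided by the tar byte count. If you want to repeat the arithmetic on your own archives, a small awk sketch like this one will do it; the numbers are copied from the ls listing above.

```shell
#!/bin/sh
# Sketch: recompute the relative-size column from the byte counts.
# Input columns: base name, cpio bytes, tar bytes.
ratios=$(LC_ALL=C awk '{ printf "%-9s %.3f\n", $1, $2 / $3 }' <<'EOF'
camano 861535 866475
phil 663206 662529
skeleton 110668 111623
tigris 46374 46633
EOF
)
echo "$ratios"
```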

A disadvantage of cpio, compared to tar, is that you must type more characters to use it. Even the simplest case, recursively archiving a directory tree using the default archive format, requires more typing. Observe the differences between these two command lines:

find . -print |cpio -o >../archive.cpio

tar cf ../archive.tar .

History, Portability, & Tips

"cpio" stands for "copy in, copy out". The copy part comes from "cp", which is the Unix copy program.

cpio comes to us from AT&T from the early 1980s, if not earlier. It is not used often; tar has that honor. I found cpio because I was forced to exchange files between two systems that had incompatible versions of tar. The systems' administrator was unwilling to update the tar implementations, so I had to find an alternative. cpio worked just fine, & since then, I have not found a Unix or Unix-like system that had a cpio that could not work with some other Unix's cpio. In other words, cpio archives appear to be very portable.

To achieve that portability when you create archives, always use -Hnewc or the default archive format (no -H option at all) unless specific experience shows that another -H value is required.

Ignore the pass-through (-p) function of cpio. Use find instead.

On MS-DOS (including Windows), where pipes are treated as text, always use the -O or -I command line option to specify the output archive or the input archive. That's a real bummer, but life sucks. (More specifically, MS-DOS (which includes Windows) is naïve.)

Many modern implementations of cpio (and tar) are able to read or write all manner of archive file formats. Don't use these; they are not portable. Use cpio for archives in the cpio format. Use tar for archives in the tar format.

Similarly, some modern implementations allow you to instruct cpio to run your compression program on the archive. Don't do this; it is not portable. Instead, run the compression program separately & explicitly. On Unix, use a pipe to connect cpio & the compressor.

bzip2 or gzip?

I prefer bzip2, so I've used it in my examples. gzip would work just as well. The two programs even share most of the important command line options. So you could substitute gzip wherever you see bzip2, & you could substitute gzcat or zcat wherever you see bzcat.

End.