Comparing Some IPC Methods on Unix

by Gene Michael Stover

Sunday, 27 January 2002

Introduction

There is sometimes disagreement among programmers about which method of IPC to use in Unix applications. Here I report measurements of throughput, programming difficulty, code sharing, & run-time flexibility for some IPC methods.

The Methods

pipe
This is where one program writes to stdout, another program reads from stdin, & you connect the two with the pipe operator ('|') in a shell script. Pipes can be considered a method of communication between software components, in which simple programs are the components & the shell is the scripting language. In this respect, pipes are in the same family as COM & CORBA.
temporary file
This is where one program writes to stdout, another program reads from stdin, & you connect the two by running the first program with its output redirected to a temporary file, then running the second program with its input redirected from the temporary file. (Then you remove the file, in case that matters.) This is closely related to pipes & can be considered a kind of piping.
message queue
This is where one program places messages in a Unix message queue, & another program removes the messages from the queue. See msgget, msgsnd, & msgrcv in the Unix manual. (A minimal sketch of these calls appears after this list.)
Unix domain sockets
This is where two programs communicate over a network socket in the Unix domain. This type of socket is like a TCP socket except that, where a TCP socket is in the IP domain, a Unix socket is in the Unix domain. Sockets in the Unix domain can be server sockets (bind, listen, accept) or client sockets (connect). They are just like the more familiar TCP sockets except that the client & the server must be on the same host.
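
Message queues are probably the least familiar of these mechanisms, so here is a minimal sketch of the calls involved. This is not the article's test code; the key path, permissions, message type, & payload size are illustrative assumptions.

    #include <string.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/msg.h>

    struct chunk {
        long mtype;           /* required first member; must be > 0 */
        char mtext[1024];     /* payload: one chunk of characters   */
    };

    int main(void)
    {
        key_t key = ftok("/tmp", 'q');              /* agreed-upon key (illustrative) */
        int   qid = msgget(key, IPC_CREAT | 0600);  /* create or open the queue       */
        struct chunk c;

        c.mtype = 1;
        memset(c.mtext, 'x', sizeof c.mtext);
        msgsnd(qid, &c, sizeof c.mtext, 0);         /* what a generator does */
        msgrcv(qid, &c, sizeof c.mtext, 0, 0);      /* what a sink does      */
        msgctl(qid, IPC_RMID, NULL);                /* remove the queue      */
        return 0;
    }

In the real tests, of course, the generator & the sink are separate processes that agree on the key.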

Numbers

Throughput

Naïve software developers believe that the most important thing about software is run-time speed. Let's satisfy their curiosity now.

I did some performance tests on two machines on my network, & here are the raw numbers.

Each row in the table is one "test" & shows the run-times for each of the IPC mechanisms I tested on one host. For example, the first row (after the column headings) shows the results of running the message queue IPC method, the temporary file IPC method, the Unix socket IPC method, the pipe IPC method, plus two others (sanity checks & curiosity satisfiers) on Palsy.

The first column is the host name. Ebola is a 200 MHz Pentium with 32 MB of RAM running OpenBSD 2.7. Palsy is a 750 MHz Pentium 3 with 128 MB of RAM running Red Hat Linux 7.1 (Linux kernel 2.4, methinks).

Columns 2 through 7 are times for the IPC mechanisms. For each IPC mechanism in a test, a generator program uses the IPC mechanism to send 512 megabytes of randomly generated, printable characters to a sink program. The time required for the generator to send everything & the sink to receive everything is the result of using that IPC mechanism in that test. The times include connection start-up & tear-down.

I chose 512 megabytes because it makes each run long enough that the one-second resolution of my timing method doesn't swamp the results.

I chose to send randomly generated printable characters because it's a way to simulate real-world data being transmitted but with a small development cost & a small run-time cost.

Regardless of the IPC mechanism, the generator transmitted data to the sink in chunks of fixed size. In all cases, the chunk size was 1024 bytes. I chose that size for a couple of reasons.

Larger chunk sizes tend to produce higher throughput, but the largest chunk size that would work on Ebola was 1024. Besides that, the purpose of these tests is to compare the throughputs of the IPC mechanisms, not to find any particular mechanism's maximum throughput. I used the same chunk size for all IPC mechanisms in the hope of giving all IPC mechanisms the same advantage, which is equivalent to giving no advantage to any IPC mechanism.
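
To make the generator's job concrete, here is a minimal sketch of the idea. This is not the article's src/generator.c; it just shows 512 megabytes of random, printable characters written to stdout in 1024-byte chunks.

    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 1024                       /* chunk size, in bytes */
    #define TOTAL (512L * 1024L * 1024L)     /* 512 megabytes        */

    int main(void)
    {
        char buf[CHUNK];
        long sent;
        int  i;

        for (sent = 0; sent < TOTAL; sent += CHUNK) {
            for (i = 0; i < CHUNK; i++)
                buf[i] = ' ' + (rand() % 95);   /* printable ASCII, 0x20..0x7e      */
            fwrite(buf, 1, CHUNK, stdout);      /* stdout: pipe it or redirect it   */
        }
        return 0;
    }

Because it only writes to stdout, the same program can serve the pipe test, the temporary file test, & the /dev/null tests.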

The IPC mechanisms are named in the column headings of the table. They are:

msgq
Message Queues. The generator produced characters in fixed-sized messages, which it stuffed into a message queue. The sink pulled messages from the queue & discarded them. In this case, the generator was src/qgenerator.c, & the sink was src/qsink.c.
tmp file
Temporary file. The generator wrote characters to stdout. I redirected that to a temporary file. When the generator was done, I ran the sink & redirected its input from the temporary file. Then I deleted the file. ("I" didn't do all that manually. It was in a shell script.) In this test, the generator was src/generator.c, and the sink was src/sink.c.
unix
Unix domain sockets. The generator connected to the sink over a socket in the Unix domain & sent its characters through that socket. In this test, the generator was src/ugenerator.c, and the sink was src/usink.c.
>/dev/null
The generator's stdout was redirected to /dev/null. There was no sink. In this test, the generator was src/generator.c.
|sink
The generator's stdout was piped into the sink's stdin. In this test, the generator was src/generator.c, and the sink was src/sink.c.
|cat >/dev/null
The generator's stdout was piped into cat, & cat's output was redirected to /dev/null. This was sort of to get a feel for the efficiency of the src/sink.c program. In this test, the generator was src/generator.c, and the sink was the standard cat program.

Generator/sink pairs were separate programs run from a simple shell script. (Those are the src/run-*.sh scripts in the source code.) A wrapper shell script (src/throughput.sh) timed the shell scripts that ran the generator/sink pairs. Timing was done by running the common Unix date program before & after the pair-running script, then taking the difference. The resolution of this method of timing is 1 second.

I ran multiple tests on each host so that noise would average-out when I analyzed the results. Noise might come from the (lack of) resolution of the timing mechanism & from periodic background processes run by the operating system. (During the tests, the hosts were not running any user programs at all, & I didn't even access the common LAN from other hosts, but common & standard Unix operating system processes were still running.)

(I shouldn't need to point this out, but I have a sinking feeling that I should: Since the table reports run-times, smaller numbers indicate higher throughput, & "Higher throughput" is a kind of better performance.)

Finally, here's the table of performance results.

All times are in seconds.

hostname   msgq   tmp file   unix   pipe   |cat >/dev/null   >/dev/null
palsy        67        101     67     66                66           64
palsy        67        101     67     66                66           64
palsy        67        101     67     65                66           65
palsy        67        101     69     66                66           64
palsy        68        101     69     66                66           64
palsy        67        103     68     66                66           64
palsy        67        101     69     65                66           65
palsy        68        100     69     66                66           65
ebola       479        432    371    356               355          340
ebola       477        434    370    354               357          340
ebola       478        432    369    353               356          340
ebola       478        432    370    352               356          340

Development Time

I tracked development times with a wall clock while I wrote the generator/sink pairs of programs. Here is a table of development times, but notice that the first pair of programs, generator and sink, is used in four IPC methods. So, depending on your demands, their development cost (34 minutes) might amortize to 8.5 minutes per IPC method.

method                            program pair            development time (hh:mm)
pipe, tmp file, |cat, /dev/null   generator.c, sink.c     0:34
msgq                              qgenerator.c, qsink.c   1:24
Unix socket                       ugenerator.c, usink.c   1:03

Analysis

Throughput

Well, I'm surprised. I expected message queues to have a marginally higher throughput than the other methods of IPC, but it looks like pipes are the fastest method of IPC that I tested. In retrospect, that shouldn't be surprising: the pipe is the original method of IPC on Unix, so implementors have had plenty of time to optimize it. What's more, the other benefits of pipes make them a common method of IPC on Unix, which gives implementors still more motivation to optimize them.

Because each IPC mechanism was used the same number of times in each combination of hostname, data size (always 512 megabytes), & chunk size (always 1024 bytes), we can obtain an estimate of total work for each IPC mechanism over all the tests by summing each column from the first table (the one that showed run times for each test). These sums are the cumulative work for each IPC mechanism over all the tests, & they are valid as long as each test summed includes all the IPC mechanisms. The sums produce the first row in the following table.

From those sums, we can get a relative measure of efficiency. We do that by finding the largest sum. (Remember that the sums are seconds, so larger numbers mean less throughput.) That largest sum turns out to belong to the tmp file (temporary file) IPC method, & it is 2539 seconds. We divide that number by each of the sums. The quotient for an IPC method is that IPC method's efficiency compared to the slowest of the methods. (The efficiency of the tmp file method will be 1.0, since its sum is the numerator in all the divisions.) For example, the quotient for the Unix socket method is 2539 / 2025, or a little over 1.25. These quotients are the second row in the following table.

Here's that table, keeping roughly the same columns as the previous run-time table for easy reading. The first row holds the sums; each entry is a number of seconds. The second row holds the quotients; they are multiples of the throughput of the tmp file method of IPC. Larger values in the second row indicate higher throughput.

over-all method          msgq   tmp file   unix   pipe   |cat >/dev/null   >/dev/null
sum (sec)                2450       2539   2025   1941              1952         1875
throughput (relative)    1.03       1.00   1.25   1.30              1.30         1.35

From the second row, you can see that pipes are the fastest method of IPC. (Redirecting to /dev/null has a higher throughput, but it's not exactly a method of IPC because the data never reaches another program; it just disappears into /dev/null.) Unix sockets are a close second in throughput, with message queues being only slightly faster than temporary files.

Run-time Flexibility

Think about flexibility. To put us on the same page, let me tell you how the test programs were implemented.

I had to write six programs: generator, sink, qgenerator, qsink, ugenerator, & usink.

qgenerator & qsink are good for exactly one thing: reading & writing the message queue they share. You can't send their data through any of the standard Unix utilities such as wc, awk, or any of the hundreds of others. You can't put their data in a file or an e-mail message. Their computations, their code, & their data are bound to their communication method.

Nearly the same can be said for ugenerator & usink, though they benefit from some code reuse.

The same cannot be said of generator & sink. That single pair of programs implements two IPC methods (piping & redirecting) with no increase in complexity in the source code. (Look at the source code if you don't believe me. The source for generator.c is simpler than that of the other generators, & the source for sink.c is simpler than that of the other sinks.)

How can this be? It's because their IPC mechanism is external to them. They read stdin or write stdout, & other programs can redirect or pipe that where they want.

generator & sink could be combined with standard Unix utilities such as wc, file compressors, & e-mail. They can even be used over a network via rsh, telnet, or some other pipe-based command line networking utility (which would itself be flexible due to the same pipe IPC mechanism that generator & sink use).

Code Reuse

Code reuse occurs on three levels: source code, object code (.o files & linkable libraries), & executable code (the program files you actually run).

Source code reuse occurs when the developer types one set of source code & uses it in different places. A common example of this is templates in C++ (which I did not use in these tests). It also happens when you use macros in C. Notice that with C++ templates & C macros, the same source code is (probably) used to produce different chunks of object code. For example, if I write a function template in C++ & then call it with an integer argument in one place & a char * argument in another place, the function is compiled to object code twice; once as a function of integers & a second time as a function of char *. Source code reuse is the weakest form of code reuse; it saves you the least.
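
For instance, here is a trivial sketch of source code reuse with a C macro. (The macro is only an illustration; it isn't from the test programs.)

    /* One piece of source code ...                                           */
    #define MAX(a, b) ((a) > (b) ? (a) : (b))

    /* ... used in two places, producing two separate chunks of object code.  */
    int    imax(int x, int y)       { return MAX(x, y); }
    double dmax(double x, double y) { return MAX(x, y); }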

Object code reuse occurs when you compile a chunk of source code once & link with it. This implies that the source code was reused. So with object code reuse, the developer wrote the source code once & the compiler compiled it once. Object code reuse is more cost-effective than source code reuse because the developer wrote & debugged just one set of source code, & the compiler compiled it just once.

Executable code reuse occurs when a single program can be used for more than one purpose. When a shell script takes some filter program (such as cat or gzip), pipes some other program's output through it, & then pipes yet another program's output through it again, that's executable code reuse. Here's an example:

    #!/bin/sh

    #
    # Here we use the 'gzip' executable
    #
    ls |gzip >tmpfile

    #
    # Here we use the 'gzip' executable
    # again
    #
    w |gzip >tmpfile2

Sure, that's a trivial example, but consider the benefits when you're reusing complex functionality, like rsh, an e-mail program, or a complex data-analysis program. Consider also how little developer time & effort this kind of reuse requires.

So much for code reuse in theory. Here's how my test programs reused code.

All the generator programs shared the code that generated the random characters. That code is in the g.c file. This was object code reuse, & it was a trivial case of that.

All the generator programs also shared the looping code that sent the generated characters. That code is the function APP_Loop in the file app.c. They also shared the command line parsing code (function APP_ParseCommandLine) and some initialization & clean-up code (functions APP_Init and APP_Uninit). All this is object code reuse, & that's a good thing.

The pair of message queue programs (qgenerator and qsink) could not share much more code than that. They shared some with each other, but no more.

The Unix socket generator (ugenerator) was able to share a lot of code with the pipe-able generator (generator) because, once its socket connection was set up, it could send its data through the same standard FILE * functions. This is an argument that basic C (or C++) I/O promotes object code reuse. (Notice that this was more reuse than was achieved with the message queue programs.)
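
Here is a sketch of the kind of set-up code that makes this possible. (This is not the article's src/ugenerator.c; the helper name, socket path handling, & error handling are illustrative.)

    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Connect to a Unix domain socket & return it as a FILE * for writing. */
    FILE *connect_unix(const char *path)
    {
        struct sockaddr_un addr;
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);

        if (fd < 0)
            return NULL;
        memset(&addr, 0, sizeof addr);
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, path, sizeof addr.sun_path - 1);
        if (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
            close(fd);
            return NULL;
        }
        return fdopen(fd, "w");   /* ordinary fwrite & fprintf work from here */
    }

After the fdopen, the generator's existing FILE *-based sending loop can be reused unchanged.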

The pipe-able programs (generator and sink) even allow for executable code reuse. That generator's output can be piped to other programs or redirected to a file. That sink's input can come from other programs or a file. It's not such a great advantage to these particular programs because they are toys, but what if you have programs that do non-trivial analysis of data, or that allow other programs to communicate over a network (which is what rsh, ssh, telnet, rcp, e-mail programs, & about a bazillion others do)? What if the program that can be reused is a data compressor (such as gzip)? Or an error-corrector? All these programs allow for executable code reuse, & as I said already, that reuse doesn't require as much developer expertise (or time or effort) as does writing the C, C++, or Java code to do the same thing.

Anyway, so the pipe method of IPC (which includes redirecting to files) promotes executable code reuse (which implies object code reuse & source code reuse), whereas the other methods only allow for object code reuse & source code reuse.

Architecture

The pipe-able programs, generator & sink, are the cleanest. All generator does is make new characters & write them. All sink does is read characters & discard them. Neither program needs to worry about connection start-up or tear-down. Neither program needs to worry about message boundaries. Few things in life could be simpler than these two programs. Their functionality is completely divorced from the details of communication. While this isn't a big deal with these two programs, it could be a big deal with programs that have more complex functionality. If a developer is writing a program with complex functionality, he'll be less productive if he also needs to worry about message boundaries or connection start-up & tear-down, & he'll have to worry about those things if he's using message queues or sockets for IPC.
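
To illustrate, a pipe-able sink in this spirit can be as small as the following. (This is a sketch in the spirit of src/sink.c, not the actual source.)

    #include <stdio.h>

    int main(void)
    {
        char buf[1024];

        /* Read stdin until EOF & throw the data away.  No connections,
           no message boundaries, no clean-up. */
        while (fread(buf, 1, sizeof buf, stdin) > 0)
            ;
        return 0;
    }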

The other programs (qgenerator, qsink, ugenerator, & usink) have code that deals with the IPC mechanisms. The Unix domain socket programs (ugenerator & usink) are the simplest of these because, after setting up their connection, they call the same code that the pipe-able programs do. The message queue programs have to deal with connection start-up, then with some unique message-transmission code, & then connection tear-down. Their functionality is coupled to their IPC mechanism more closely than in the pipe-able programs.

Conclusion

Unix application developers have many methods of IPC available. This plethora of choices can make for a difficult decision.

Programs which read & write stdin & stdout, so that they can be piped together or redirected to or from files, excel in many ways over programs that use other methods of IPC: they have the highest throughput, they cost the least to develop, they are the most flexible at run time, they allow the most code reuse, & they have the cleanest architecture.

There are few, if any, business reasons for application developers to prefer message queues to reading & writing stdin & stdout as a method of IPC.

Source Code

At this time (Sunday, 27 January 2002), the source code I used in these tests is online at throughput-6.cpio.bz2. I make no guarantees about it remaining available there.

After downloading:

  1. bzcat throughput-6.cpio.bz2 |cpio -i
  2. cd throughput-6
  3. ./configure
  4. make
  5. go

The results will be written to doc/`hostname`.txt.

Bibliography

A relevant, good book is:

Mike Gancarz, The Unix Philosophy. (1995) Digital Press; Newton, MA. ISBN 1-55558-123-4.

End.