GNUlpr Roadmap
Note: A version of this document can be read at

The need for a printing system
gnulpr 1.0:
gnulpr 1.1:
Enumerating the printers
gnulpr 1.2:
gnulpr 2.0:
Submitting print jobs
The spooler
Not queuing print jobs
Queuing print jobs
Print job metadata
Filter parameters
Network Printing protocols
Type conversion
Print by reference
Some proposed filters
Important gnulpr 2.0 design features

1: The need for a printing system

In, Grant makes several arguments for the death of LPR and lpd. The highlighted points are:

  • LPR does not show the robustness and maturity of a program its age.
  • LPR has fragmented horribly, and thus is a "support nightmare".
  • It is underpowered, and "gratuitously devoid of actual features."

Grant stated that his ideal printing system would:

  • Be secure, both by design and in implementation.
  • Offer lightweight and efficient IPP and LPD [protocol implementations].
  • Offer a simple and extensible interface for filters, drivers, accounting, queue management, etc.
  • Support a [wide] selection of clients; ideally with a client library offering job operations and printer status and capability information.

This is likely not new to anyone who has worked with LPR and had to deal with its insufficiencies. LPR was cobbled together (likely over a weekend) to meet a DARPA deadline. It fulfilled a checklist, and that was it.

Grant is not alone in this. The Free Software Foundation has noticed that LPR has forked into a number of poorly-maintained or proprietary versions. Also, the GNU project lacks a printing system, so Richard Stallman approached Ben Woodard about taking over LPR maintainership for the GNU project. Since then, we have been working hard to prepare the gnulpr 1.0 release.

To download the latest development snapshot, simply run the following commands:

cvs login
cvs -z3 co -r bens_dev_branch lpr

There are two gnulpr-related mailing lists hosted on SourceForge. The lpr-announce list is a standard announcement list for notices about new releases and events. You can subscribe or read the archives at:

The lpr-discuss list is an open discussion list for conversation about architecture, features, patches, and so forth. You can subscribe to lpr-discuss at:

What follows is a collection of notes, gleaned from various e-mails and discussions, that provide a tentative development roadmap for gnulpr.

2: gnulpr 1.0:

The real problem is that LPR is not actually a printing system. It is merely a queue manager, job submission utility, and print job submitter. All of the real work is handed off to filter programs and utilities.

The work necessary to make a complete printing system from LPR is substantial, especially if any advanced features are needed. Configuration, data conversion, and user feedback are all handled by different tools.

gnulpr 1.0 is what we've been referring to internally as an "anthology release". We've compiled all of the different pieces of an LPR-based printing system into one CVS tree, similar to the Cygnus toolchain tree. The pieces still stand alone as independent packages (and will build independently), but one can run the top-level configure script to build and install a complete system.

2.1: LPR

Up to now, simplicity has been LPR's saving grace: it doesn't try to do too much, and so all of the other pieces have been able to evolve without LPR getting in the way. But we're at the point where adding features to the queue management and job submission components is necessary, and nobody wants to touch LPR.

The version of LPR included in the anthology has two important features added:

  • Support for the multicast-based LPNotify protocol, for job status notification.
  • Command-line switches and control-file fields to support device-specific printing options (such as PPD options).
  • Most large statically-allocated arrays have also been replaced with dynamically allocated data structures.

    2.2: printfilters

    The filter scripts that LPR uses perform the logic of determining source and target file formats and the filters needed to translate between the two. The printfilters included in gnulpr 1.0 have been enhanced to allow PPD-based filtering and generate device-specific PostScript for a given printer.

    2.3: libppd

    One of the limitations of current print systems is that there has been no way for applications to find out about and access printer capabilities. Libppd provides an API for programs to find out about and select device capabilities when generating PS print jobs. It can be used both with PS printers and with printers that go through GhostScript (provided that the particular GS driver has an associated PPD file). The foo-o-matic program is reported to make PPD files for arbitrary GhostScript drivers.

    2.4: gpr

    Since not all applications link to libppd, gpr is a client application that uses the UI hints included in a PPD file to provide GTK dialogs for enabling device-specific features. It is designed as a drop-in replacement for the lpr binary (although it does call our modified lpr client).

    2.5: printerconf

    This is a library of functions for doing network autodetection of printers using SNMP or IEEE 1284's parallel port discovery protocol.

    Most modern printers have very sophisticated network configuration capabilities. The future plan for libprinterconf is to provide an API to discover or set a printer's current status and configuration (currently this is only available in the npadmin utility, below).

    2.6: npadmin

    Right now npadmin is a stand-alone program that allows printer discovery and read-only access to status and configuration. In the future, it is planned to become merely a command-line front-end to libprinterconf's query and configuration capability.

    2.7: snmpkit

    This library provides functions to make using SNMP easier. The printerconf library uses it to make many simultaneous SNMP connections.

    2.8: printtool

    This TCL/Tk program provides a simple GUI for the configuration of printers. This version uses libprinterconf to attempt autodetection.

    2.9: tdb

    This is a lightweight DBM implementation originally developed for SAMBA. We use this library in a number of utilities.

    3: gnulpr 1.1:

    While 1.0 is nearly debugged and complete, our development efforts have been pushing forward version 1.1. As of this writing, it is nowhere near as complete as 1.0, but we have been adding and testing many new features.

    The main thrust with gnulpr 1.1 is to provide an API for enumerating printers and their configurations and controlling the print spooler. This library will be used internally within gnulpr as well as exported so that client applications may make use of it.

    Although libprintsys is not slated to be complete until gnulpr 2.0, gnulpr 1.1 will include a version of it that provides the printer enumeration functions, as well as versions of the gnulpr utilities that make use of them.

    3.1: Enumerating the printers

    Printers are simply lists of key->value pairs coupled to a list of names. To quote printsys.h:

    typedef struct {
      char *key;
      char *value;
    } GLPR_PairType;
    typedef struct {
      char **names;
      GLPR_PairType *fields;
    } GLPR_PrinterType;

    The printsys library provides a common interface to printer enumeration regardless of the database access method. The function glpr_init() uses a global configuration file to determine which set of functions actually do the work of reading in the database and returning data to the application.

    In the printcap case, the functions:

    int (*glpr_end)();
    char *(*glpr_get_first_prname)();
    GLPR_PrinterType *(*glpr_get_printer)(char *);
    int (*glpr_get_next_prname)(char **dest);
    char **(*glpr_get_printer_list)();
    would be mapped to:
    int printcap_end();
    char **printcap_get_printer_list();
    char *printcap_get_first_prname();
    int printcap_get_next_prname(char **dest);
    GLPR_PrinterType *printcap_get_printer(char *prname);

    The 1.1 implementation currently only has functions for parsing /etc/printcap, although other modules are in the works.
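The mapping described above can be sketched as a set of function pointers that an initialization routine binds to one backend. This is only an illustration: the helper name bind_backend, its string argument, and the stub printcap functions are assumptions made for the example (the real glpr_init() reads a global configuration file), and only two of the five entry points are shown.

```c
#include <stddef.h>
#include <string.h>

/* Generic entry points, in the style of printsys.h. */
int (*glpr_end)(void);
char **(*glpr_get_printer_list)(void);

/* Hypothetical stand-ins for the printcap backend; a real backend
 * would parse /etc/printcap instead of returning a fixed list. */
static char *demo_names[] = { "lp", "hplj", NULL };

static int printcap_end(void)
{
    return 0;
}

static char **printcap_get_printer_list(void)
{
    return demo_names;
}

/* Assumed helper: point the generic entry points at the named backend.
 * Returns 0 on success, -1 if the backend is unknown. */
int bind_backend(const char *backend)
{
    if (strcmp(backend, "printcap") == 0) {
        glpr_end = printcap_end;
        glpr_get_printer_list = printcap_get_printer_list;
        return 0;
    }
    return -1;
}
```

Once the pointers are bound, callers use glpr_get_printer_list() without knowing which backend answered, which is what lets other modules be added later without changing applications.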

    4: gnulpr 1.2:

    By the release of 1.2, we plan to have modules for a number of printer enumeration methods, including SLP. The SLP module for libprintsys will search the network for a directory service and then fetch information from there.

    In the event that no SLP server is found, the system may start one and populate it, using libprinterconf to probe the network.

    5: gnulpr 2.0:

    While the gnulpr 1.x series was based upon the old BSD lpd spooler, gnulpr 2.x will have a brand new spooler. It has a much more modular design, focusing on simplicity, security, and flexibility.

    Thus, the network daemon will be one program, the local job submission daemon will be another, the network transmission filter will be another, etc. Each utility will Do One Thing And Do It Well, and the addition of a new network protocol or spooling policy is merely a matter of implementing a few new utility programs.

    The following is an initial design document based on Ben Woodard's notes on gnulpr 2.0:

    5.1: Submitting print jobs

    The function:

    int glpr_get_connection(char *prname, GLPR_PairType *attributes, unsigned int *jobid);

    returns a file descriptor which is a Unix domain socket connected to the print spooler daemon. Since this is a C function, and we can't possibly allow every application to be setuid, the authentication is done using the SO_PASSCRED socket option.

    Since the application is handed a file descriptor, it can do anything to it that you can do to a normal file descriptor. An application can put the fd into non-blocking mode. It can write to it, read from it, select on it, and so on. What it does with it is largely based upon the design of the program that is trying to print. For example, if a program has a more or less linear design and is not interactive, it will probably just write to the file descriptor. If the program is an interactive command-line program and responsiveness is important, then it will likely have to fork and let the child do the writing. If the program is a GUI application, which essentially is sitting in a select loop anyway, then it will add the print file descriptor into the select loop.
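As a rough sketch of the non-blocking option, the following helpers put a descriptor into non-blocking mode and then write a buffer to it, waiting with poll() whenever the pipeline is not ready to accept more data. The function names set_nonblocking and write_all are made up for this example and are not part of libprintsys; the descriptor could be anything write() accepts, such as the one returned by glpr_get_connection().

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Write len bytes to a non-blocking fd. When the fd is not ready,
 * wait in poll() instead of spinning. Returns 0 on success, -1 on
 * error with errno set by the failing call. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n > 0) {
            buf += n;
            len -= (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            if (poll(&p, 1, -1) < 0 && errno != EINTR)
                return -1;
        } else if (n < 0 && errno != EINTR) {
            return -1;
        }
    }
    return 0;
}
```

A GUI application would not call poll() itself; it would register the descriptor with its existing event loop and write the next chunk each time the loop reports the descriptor as writable.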

    For applications that don't want a raw file descriptor as their only interface to the print system, there will be a set of convenience functions. These will allow an application to print a whole file or stream reliably no matter what happens to the pipeline.

    This is in some ways different from the old LPR, and in some ways very similar. It is different in that you call a function and are given a file descriptor, and the program wanting to do the printing doesn't have to fork. However it is very similar in that after most applications fork, they simply pipe their data into the child lpr.

    5.2: The spooler

    As mentioned above, the spooler's interface basically consists of the function glpr_get_connection(). What happens on the back-end is not any scary black magic. It simply looks up the filter directory (part of the queue data structure returned by glpr_get_printer) and in that directory are a series of symlinks to filter scripts, structured in a similar manner to the init scripts found in /etc/rc3.d/. The order that filters are run is determined by the order that they appear in the sorted directory listing. This way it is very simple to see which programs are run and in what order. These filter scripts are all run and connected together into a pipeline with the input end being the other end of the socket that was handed to the application by glpr_get_connection. (We will probably have to put in the optimization that it doesn't actually fork a new process until the previous filter in the pipeline actually emits some data.)

    The structure that I had in my mind was to have /etc/printfilt.d/filters have the actual filters (analogous to /etc/init.d/) and having symlinks to the actual filters in a directory like /etc/printfilt.d/prtype_filters/hplj/ (analogous to /etc/rc3.d/'s symlinks to /etc/init.d/'s scripts).

    For backward compatibility with printcap there will be several workarounds designed to make old printcap files work the way that the user expects. If the filter directory isn't specified, it will construct a filter pipeline out of the input filter, the output filter and the queue manager (see below for information about the queue manager).

    There are many implications to having a pipeline model for printing. One of the most important to understand is that an application can block while waiting for a printer to actually print its data. Unix has long had many ways to deal with this situation. Some of the most commonly used are fork, non-blocking IO, and select or poll.

    Another implication is that if something goes wrong in the pipeline, for example a network printer closes the TCP connection unexpectedly or a print server crashes and has to reboot, then the application has to be prepared to get EPIPE when it tries to write to the file descriptor. In this case the application must not consider the print job as sent and must decide what to do next. What it does is totally up to it. It may call glpr_get_connection again and retry, or it may warn the user and give up.

    All these cases will be easily handled by convenience functions provided as part of libprintsys.
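To illustrate the EPIPE case, here is a sketch of what an application or a convenience function might do. The names send_chunk, ignore_sigpipe and the send_result values are invented for this example and are not the libprintsys API. Note that SIGPIPE has to be ignored (or handled) first, otherwise the default action kills the process before write() can report EPIPE.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

enum send_result { SEND_OK = 0, SEND_BROKEN = 1, SEND_ERROR = -1 };

/* Ignore SIGPIPE so that a broken pipeline is reported as EPIPE by
 * write() instead of terminating the process. */
void ignore_sigpipe(void)
{
    signal(SIGPIPE, SIG_IGN);
}

/* Write one chunk of print data. SEND_BROKEN means the far end of
 * the pipeline went away and the job must not be treated as sent. */
enum send_result send_chunk(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return errno == EPIPE ? SEND_BROKEN : SEND_ERROR;
        }
        buf += n;
        len -= (size_t)n;
    }
    return SEND_OK;
}
```

On SEND_BROKEN the caller decides the policy: request a new connection and resend from the beginning, or report the failure to the user.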

    5.3: Not queuing print jobs

    One thing that you might notice is that there is no notion of queuing in this model of a spooler. The spooler simply connects the filters and lets the data pass right on through. In many cases this might be just what you want.

    There is an old saying that I learned when I first started programming and that is, "1 buffer is good and speeds things up. 2 buffers or more slow things down." Right now in many cases most of the time it takes to get a printout is due to the fact that everybody tries to queue things. A very common example is an enterprise with a print spooler where everyone is running the LPR from BSD.

    1. You run some command and pipe it into lpr and it copies the file to the spooldir. The LPR command doesn't tell lpd to start printing it until the whole file is there.
    2. Now lpd on your system picks up the job and discovers that it is going to a remote queue on the "print server" machine and so it sends it to the print spooler.
    3. Now lpd on the print server machine waits until it has the whole file before it begins trying to print it.
    4. Fortunately most normal desktop business printers don't wait until they have the whole job before they start printing. They essentially treat each page as a separate print job. However, the bigger, faster printers do. The reason is that they expect to be the print server themselves and are designed with the assumption that they will be getting the print jobs from multiple sources simultaneously. In the testing I've done, there is no way to keep these printers operating at full speed with only one connection. In fact, if you want optimal performance out of even a desktop business printer, you need to constantly pass data into it.

    So in this situation you have at least 2 extra spoolers buffering all the data before the job is printed, possibly 3. This greatly slows down printing. The model I propose is the logical optimization of this.

    Another good reason to have a pipeline with the application on one end and the printer on the other is two-way communication. With a lot of modern printers there are many things you can do if you can establish 2-way communication. The pipeline allows a program to establish 2-way communication with the printer even if a print server sits between the two. For example, you can find out the toner levels, find out what PS options are actually installed, and get instantaneous reports of errors.

    Also, print jobs are getting bigger and bigger. The resources necessary to spool them to disk are growing. If we can stream them straight to the printer then the demands on the print server remain virtually constant. I've heard of one place which prints billboards. Each 6 foot wide section of the billboard is about 7Gb of data!

    There are also many different queueing policies. People want different things out of a job queue. If you don't believe me take a look at LPRng and see all the different features that Patrick Powell has had to build into his queueing engine.

    If people did not need anything more complicated than a simple FIFO, then I think the argument for having the queue engine built into the spooler would make more sense. However, here are some totally reasonable requirements:

    • Single in, multi-out: 5 identical printers. The job goes to the first available.
    • Sort by job size rather than by time job submitted.
    • Sort jobs first by organizational importance and then by time (print the boss's jobs first).
    • Sort jobs by when they first begin to arrive on the queue, as opposed to when submission is completed.
    • Hold and release. High volume print shops and copy centers like Kinko's use this a lot. They have a print queue represented on their screen, and incoming jobs just sit idle in the queue. They are released when the operator explicitly allows them to print.
    • Sort to minimize printer configuration changes: The printer is loaded with 3 hole drilled paper. Print all the jobs which require 3 hole drilled paper before switching to jobs which require card stock.
    • Keep printer flowing as best as possible: Printer is out of letter. Skip over the jobs that require letter paper and print any that require legal paper.
    • Spool all small jobs but if a job gets to be over X Mb then quit spooling it to disk and start sending it directly to the printer.
    • If there is disk space available, spool a job to disk. Otherwise connect the job directly to the printer.

    I could easily go on. I'm sure that given enough thought, someone can come up with a general way to deal with all these situations in one queueing engine. However, I (or more likely the users of gnulpr) will find a situation where this generic engine doesn't work. With the proposed system they have recourse to implement their own queueing policy.
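As an illustration of how small a site-specific policy can be, here is a sketch of the "boss's jobs first, then by submission time" ordering from the list above, written with the standard qsort(). The struct job record and its fields are hypothetical; the notes do not define how a queue manager stores its pending jobs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical record for a pending job. */
struct job {
    unsigned int id;
    int priority;       /* larger value = more important */
    time_t submitted;   /* when the job entered the queue */
};

/* Order by priority (highest first), then by submission time
 * (oldest first). */
static int by_priority_then_time(const void *a, const void *b)
{
    const struct job *x = a;
    const struct job *y = b;

    if (x->priority != y->priority)
        return x->priority > y->priority ? -1 : 1;
    if (x->submitted != y->submitted)
        return x->submitted < y->submitted ? -1 : 1;
    return 0;
}

/* Sort the pending jobs into the order they should be printed. */
void sort_jobs(struct job *jobs, size_t n)
{
    qsort(jobs, n, sizeof *jobs, by_priority_then_time);
}
```

Swapping in a different comparison function (job size, required paper type, and so on) gives a different policy without touching the spooler itself.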

    5.4: Queuing print jobs

    As great as the optimization for printing directly is, there are many cases where you actually do want to have a print queue. This is easily handled. One of the filters is a queue manager. When it starts up, it grabs a lock on the printer and passes the data straight through to it. If another queue manager starts up and cannot get the lock, it drops the file into the spool directory. Just before the queue manager terminates, it checks the spool directory for other pending print jobs and prints them.
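The "grab a lock or fall back to spooling" step might look like the sketch below, which uses flock() (available on BSD and Linux, not part of POSIX). The function name try_lock_printer and the idea of one lock file per printer are assumptions for this example; the notes do not say what kind of lock the queue manager takes.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to take the printer lock without blocking.
 * Returns an open descriptor that holds the lock (this queue manager
 * now owns the printer), or -1 if the lock is busy or cannot be
 * opened, in which case the caller should spool the job to disk.
 * Closing the descriptor releases the lock. */
int try_lock_printer(const char *lockpath)
{
    int fd = open(lockpath, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A fuller version would distinguish "lock busy" (errno is EWOULDBLOCK) from real errors, and the owning queue manager would scan the spool directory before closing the descriptor.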

    In the filter pipeline, all filters before the queue manager would behave like input filters and all the filters after the queue manager would behave like output filters.

    One important consideration in designing filters is that filters before the queue manager will have the context of a specific print job as well as a specified printer. Filters that run after the queue manager will only have the context of a specified printer.

    One of the nice things about LPR is that if you reboot the machine, the pending print jobs are not lost and they continue to print as soon as lpd starts. In LPR, jobs that have not been COMPLETELY delivered into the print spool are not considered to have been accepted. The same can be true of this system. The one difference is that with LPR, the time that a print job is in flight is actually pretty small. In this system, that time might be longer and so the chance for failure is higher. Also, in LPR the lpr command insulates the application from most failures. In this system, the application becomes aware of them because it receives an EPIPE error when it tries to write print data to a broken connection.

    When a queue manager is used and spooling occurs, problems such as a machine reboot may abandon jobs in the spool directory. This can be dealt with in exactly the same way that LPR, LPRng and Sendmail deal with it: at startup, check all the spool directories and print any jobs that are found there.

    5.5: Print job metadata

    The "context of a specific print job" is something that I haven't described yet. Notice that in the libprintsys function:

    int glpr_get_connection(char *prname, GLPR_PairType *attributes, unsigned int *jobid);

    there are three parameters. The first one is the name of the print queue that you want to submit the print job to. The second parameter is a set of attributes which you want to pass through to the filters and the third is a pointer to an integer which glpr_get_connection sets to the local job ID.

    These attributes can be anything that can be specified in terms of a key/value pair made of ASCII strings. Different filters along the pipeline will use different attributes. Some possible attributes may be:

    • the PPD options that a user wants applied to this print job
    • the document title that you want on the cover page
    • the type of file being printed

    Other attributes such as the user who printed the job and the current status of the print job are set by the spooler.
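A filter that wants one particular attribute could look it up as sketched below. Two things here are assumptions: the helper name find_attr, and the convention that the attribute array ends with an entry whose key is NULL (the notes do not say how the length of the array is conveyed).

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char *key;
    char *value;
} GLPR_PairType;

/* Return the value stored under key, or NULL if it is not present.
 * Assumes attrs is terminated by an entry with a NULL key. */
const char *find_attr(const GLPR_PairType *attrs, const char *key)
{
    if (attrs == NULL)
        return NULL;
    for (; attrs->key != NULL; attrs++) {
        if (strcmp(attrs->key, key) == 0)
            return attrs->value;
    }
    return NULL;
}
```

A cover page filter, for instance, would ask for the title attribute and fall back to a default when find_attr() returns NULL.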

    The job ID is unique for the local system and is semi-persistent. At any given time there will be only one job with a particular job ID within the print system. The job ID will not be destroyed until long after the job has completed. This way an application which submitted a print job can query the print system using the proper job ID and find out the current status.

    GLPR_PairType *glpr_get_job_attributes(unsigned int jobid);

    Normal applications will only have read-only access to the database of job attributes, but the spooler and the filters will have read-write access to these attributes.

    The way that these tickets will ultimately be destroyed is by way of a cron job that writes the information to a log file and then cleans out the oldest of the completed jobs.

    5.6: Filter parameters

    To make all these things work together properly, all the filters will need to be able to get at the two kinds of meta-data. The first is the information about the capabilities of the printer. The API provides functions to figure out printer capabilities from a printer name. The second is the job meta-data, and this will be anything that can be gathered by the input engine.

    My thought is that each filter gets two parameters. The first one is the queue name for which the job was spooled and the second is a job ticket. It can request the job ticket information using a tdb (lightweight DBM) call. For example: jdsend -P printername -J 905127839

    5.7: Network Printing protocols

    I have only discussed printing on the local system, and I haven't mentioned any of the standard network printing protocols. The print spooler daemon in and of itself has no knowledge of networks or remote printing. Network printing will be handled by a collection of single-function applications which act as gateways between the various printing protocols and the local printing system. These gateways will be stand-alone implementations of certain network protocols such as LPR, IPP etc. Their job is basically to receive the print job and then pass it directly into the spooler as soon as possible, calling glpr_get_connection and then piping the print job data into the queue.

    Notice that each one of these is a separate program whose sole purpose is to implement the network protocol. This is one of the key principles of this printing system. Complexity is isolated in small individual applications with fairly simple interfaces connecting them. This is for several reasons:

    • It makes it more reliable. If the lpr-network program crashes it doesn't take down the whole printing system--just the lpr protocol.
    • It makes it easier for people to contribute. They don't have to figure out how to integrate their function into a larger program; they can simply write a program which implements their particular function.
    • The system can have myriad features without getting bloated. The system administrator can easily choose how and which functions are used.
    • It makes it easier to isolate problems in the event of a bug.

    5.8: lpr

    For backward compatibility, all the BSD commands will be implemented. lpr will be very easy to implement. All that it needs to do is open the port and send the jobs. The spooler will have to have some additional bookkeeping so that lpq and friends can do their job reasonably well. It also needs to provide information about open connections.

    5.9: Type conversion

    Note: The 2.0 release will probably use one of the standard file conversion filters such as a2ps or magicfilter. The following design is being considered for later versions and is by no means final.

    One of the most important filters is the type conversion filter. Unless this is a highly controlled environment where the printer will never be sent a job that is in a format that it cannot understand natively, then this filter will be one of the first filters run. In a way this is sort of a macro filter. It is a filter which itself creates a pipeline of filters to do the necessary type conversions.

    I guess you could use a2ps or magicfilters to do this but I never liked the way that they worked. It seemed to me to be artificial intelligence bordering on black magic and too hard to understand.

    A simpler way to do it is to have the type conversion filter first check to see if the front end set the job type in the job ticket. If it didn't, then run a type detection program on the input. Once the type of the file is known, then the type conversion filter looks up which data formats the printer understands from the printer capabilities database. If they match then it just sends the data on through with no modifications. If they do not match then it looks in /etc/printfilt.d/convfilters/// and runs all the filters in the order that they are presented in there. If no match is found for all the possible destination types then the print filter simply passes the data through and potentially logs an error.
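The first decision in that procedure, pass through or convert, can be sketched in a few lines. The enum, the function name choose_action, and the representation of the printer's native formats as a NULL-terminated list of strings are all assumptions for this example; the capabilities database format is not specified in these notes.

```c
#include <stddef.h>
#include <string.h>

enum conv_action { CONV_PASS_THROUGH, CONV_RUN_FILTERS };

/* native is a NULL-terminated list of formats the printer accepts.
 * If the job is already in one of them, no conversion is needed;
 * otherwise the conversion filters have to be looked up and run. */
enum conv_action choose_action(const char *const *native,
                               const char *job_type)
{
    for (; *native != NULL; native++) {
        if (strcmp(*native, job_type) == 0)
            return CONV_PASS_THROUGH;
    }
    return CONV_RUN_FILTERS;
}
```

For a printer whose list is PS, PCL6 and PJL, a PS job passes straight through while a LaTeX job is sent down the conversion path.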

    For a concrete example, say you have a HP LJ 4050 (I do). It can handle PS, PCL6, and PJL. In my system this would mean that in the printer database the printer can handle:


    So the top level directory structure would look like:


    Then going down further on the PJL-wrapped-DeviceSpecific-PS2 branch we would have something that looked like (Obviously the files at the end of the branches are just symlinks back into another directory):


    In this way, we use the file system to store the complete pre-computed graph of all interconnections. Obviously, this will be a rather big file system. Regardless, I expect that it won't take up that much space. Creating a tree like this by hand would be virtually impossible. However, writing a program that does the full enumeration of the graph is fairly simple. Since all the computing is done off-line, it doesn't have to be fast and so there is time to compute optimal paths. (Yes I know that it is an NP complete problem but it's only run once, and CPU cycles are really cheap these days.)

    Also if it does turn out to take up too much space in the file system it is just as easy to make the tree look like:


    Where the files have an ordered list of filters that should be run e.g.:

    $ cat /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex

    Since the type converter is just another filter, there is room for any number of implementations. This is just one implementation that I dreamed up. Heck include them all. If someone wants a table driven version and they are willing to write it, let them. Maybe someday some AI genius will write something that does all of this in real time. On my first pass I will probably just stick a2ps in there. One of the key principles of this system is that it is very easy to incorporate functionality.

    5.10: Print by reference

    One special case of type conversion is print by reference. This could be used in several places. First of all, some network protocols mandate a print by reference capability. Implementing lpr -s would also require some print by reference capability. This could be implemented by having the spooling application set the file type to a reference type and then send the complete reference as the print data. For example, lpr -s might set the "job-type" attribute to "file-reference" and then send "/home/ben/src/lpr/printjob.c" as the print data. The dereference filter would see that the file type is a file-reference, open the specified file, and output its contents instead of the pathname. It would then be the type converter's job to convert it to the right format for the printer. This print-by-reference filter is the only special case so far where a program needs to be setuid root (and in this case only so that it can switch uid to the job's owner).
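The core of such a dereference filter is small. The sketch below reads a pathname from its input and copies the named file to its output. It assumes the reference is a single line of text, the function name dereference is made up, and the job-type check and the switch to the job owner's uid are deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/* Read one pathname from in and copy that file's contents to out.
 * Returns 0 on success, -1 on any error. */
int dereference(FILE *in, FILE *out)
{
    char path[4096];
    char buf[8192];
    size_t n;

    if (fgets(path, sizeof path, in) == NULL)
        return -1;
    path[strcspn(path, "\n")] = '\0';   /* strip the trailing newline */

    FILE *src = fopen(path, "rb");
    if (src == NULL)
        return -1;

    while ((n = fread(buf, 1, sizeof buf, src)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            fclose(src);
            return -1;
        }
    }
    int failed = ferror(src);
    fclose(src);
    return failed ? -1 : 0;
}
```

In a real filter, in and out would be stdin and stdout, and the uid switch would have to happen before fopen() so that the file is opened with the job owner's permissions rather than root's.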

    5.11: Some proposed filters

    This is by no means a complete list. I think that the real key to this design is that people can very simply add filters to the filter pipeline to implement whatever feature they want. This will replace many of the features that LPRng was forced to implement within the spooling engine.

    sends to port 9100
    sends to a lpr device
    sends to another print server running this software
    sends to a smb server
    the queue filter described above.
    munge PPD
    divvy the jobs up among several printers.

    I'm sure that if I spent a bunch of time thinking about it I could come up with a dozen more print filters which would be fairly simple to implement and which would implement a feature that someone needed.

    5.12: Important gnulpr 2.0 design features

    1. Simple core functionality. The core spooler program is exceedingly simple and easy to build and maintain in a secure way. It is also very easy to develop with.
    2. Very modular structure. It is trivial to add a new protocol or feature. Anyone who wants a new feature can easily add it. Also this is in close alignment with the UNIX ethos of having a program do one thing and do it well. Different parts of the system can be maintained by different people, fitting free software development rather well.
    3. It should be efficient in the trivial cases. A print server whose job is to just accept print jobs from Windows clients and forward them on to network-connected printers will have very little overhead. It will simply have one Samba process per client writing to a file descriptor which writes directly into a program whose output is port 9100 of the printer.
    4. In the unqueued configurations, the job is never even written to disk, which means that disk space does not have to be larger than the size of the print job. Data simply streams straight through.
    5. It has a central API at its core. This API is exactly the same for filter writers as it is for applications doing printing. That way very little retraining is needed.
    6. You can plug in different data stores for the printer enumeration section. This means you can store the data in any way that you want as long as you implement the interface that we request. I think that this is a key selling point. Think of these possible data stores:
      for backward compatibility
      Damian Ivereigh has a distributed database called sddb which could trivially be adapted to provide the right information. This would easily provide a way for enterprise customers to create a high availability print system.
      local directory service
      a program discovers the printers nearby and it automatically broadcasts their existence. Any new client that shows up on the network simply inquires as to the local printers and it knows about all the printers in the vicinity.
      enterprise directory service
      when the printers are discovered, they are reported to a central repository which keeps a master database.

    The thing here is that with a little work we could make it such that you can basically have a zero effort client and printer setup. The only place where an admin would have to intervene would be to give a printer a name.

    All trademarks and copyrights on this page are properties of their respective owners. Forum comments are owned by the poster. The rest is copyright 1999-2001 Ben Woodard and VA Linux Systems, Inc.