Note: A version of this document can be read at http://lpr.sourceforge.net/gnulpr-announcement.html
1: The need for a printing system

In http://www.linuxprinting.org/lpd-must-die.html, Grant makes several arguments for the death of LPR and lpd. The highlighted points are:
Grant stated that his ideal printing system would:
This is likely not new to anyone who has worked with LPR and had to deal with its insufficiencies. LPR was cobbled together (likely over a weekend) to meet a DARPA deadline. It fulfilled a checklist, and that was it.

Grant is not alone in this. The Free Software Foundation has noticed that LPR has forked into a number of poorly-maintained or proprietary versions. Also, the GNU project lacks a printing system, so Richard Stallman approached Ben Woodard about taking over LPR maintainership for project GNU.

Since then, we have been working hard to prepare the gnulpr 1.0 release. To download the latest development snapshot, simply run the following commands:

    cvs -d:pserver:anonymous@cvs.lpr.sourceforge.net:/cvsroot/lpr login

There are two gnulpr-related mailing lists hosted on SourceForge. The lpr-announce list is a standard announcement list for notices about new releases and events. You can subscribe or read the archives at:

    http://lists.sourceforge.net/mailman/listinfo/lpr-announce

The lpr-discuss list is an open discussion list for conversation about architecture, features, patches, and so forth. You can subscribe to lpr-discuss at:

    http://lists.sourceforge.net/mailman/listinfo/lpr-discuss

What follows is a collection of notes, gleaned from various e-mails and discussions, that provide a tentative development roadmap for gnulpr.

2: gnulpr 1.0:
The real problem is that LPR is not actually a printing system. It is merely a queue manager, job submission utility, and print job submitter. All of the real work is handed off to filter programs and utilities. The work necessary to make a complete printing system from LPR is substantial, especially if any advanced features are needed. Configuration, data conversion, and user feedback are all handled by different tools.

gnulpr 1.0 is what we've been referring to internally as an "anthology release". We've compiled all of the different pieces of an LPR-based printing system into one CVS tree, similar to the Cygnus toolchain tree. The pieces still stand alone as independent packages (and will build independently), but one can run the top-level configure script to build and install a complete system.

2.1: LPR
Up to now, simplicity has been LPR's saving grace: it doesn't try to do too much, and so all of the other pieces have been able to evolve without LPR getting in the way. But we're at the point where adding features to the queue management and job submission components is necessary, and nobody wants to touch LPR. The version of LPR included in the anthology has two important features added:
Most large statically-allocated arrays have also been replaced with dynamically allocated data structures.

2.2: printfilters

The filter scripts that LPR uses perform the logic of determining source and target file formats and the filters needed to translate between the two. The printfilters included in gnulpr 1.0 have been enhanced to allow PPD-based filtering and generate device-specific PostScript for a given printer.

2.3: libppd

One of the limitations of current print systems is that there has been no way for applications to find out about and access printer capabilities. Libppd provides an API for programs to find out about and select device capabilities when generating PS print jobs. It can be used both with PS printers and with printers that go through GhostScript (provided that the particular GS driver has an associated PPD file). The foo-o-matic program is reported to make PPD files for arbitrary GhostScript drivers.

2.4: gpr

Since not all applications link to libppd, gpr is a client application that uses the UI hints included in a PPD file to provide GTK dialogs for enabling device-specific features. It is designed as a drop-in replacement for the lpr binary (although it does call our modified lpr client).

2.5: printerconf

This is a library of functions for doing network autodetection of printers using SNMP or IEEE 1284's parallel port discovery protocol. Most modern printers have very sophisticated network configuration capabilities. The future plan for libprinterconf is to provide an API to discover or set a printer's current status and configuration (currently this is only available in the npadmin utility, below).

2.6: npadmin

Right now npadmin is a standalone program that allows printer discovery and read-only access to status and configuration. In the future, it is planned that it will be merely a command line front-end to libprinterconf's query and configuration capability.

2.7: snmpkit

This library provides functions to make using SNMP easier. The printerconf library uses it to make many simultaneous SNMP connections.

2.8: printtool

This Tcl/Tk program provides a simple GUI for the configuration of printers. This version uses libprinterconf to attempt autodetection.

2.9: tdb

This is a lightweight DBM implementation originally developed for Samba. We use this library in a number of utilities.

3: gnulpr 1.1:
While 1.0 is nearly debugged and complete, our development efforts have been pushing forward version 1.1. As of this writing, it is nowhere near as complete as 1.0, but we have been adding and testing many new features.

The main thrust with gnulpr 1.1 is to provide an API for enumerating printers and their configurations and controlling the print spooler. This library, libprintsys, will be used internally within gnulpr as well as exported so that client applications may make use of it. The library is not slated to be complete until gnulpr 2.0, but gnulpr 1.1 will include a version of it which provides the printer enumeration functions, as well as a version of the gnulpr utilities that make use of them.

3.1: Enumerating the printers
Printers are simply lists of key->value pairs coupled to a list of names. To quote printsys.h:

    typedef struct {
        char *key;
        char *value;
    } GLPR_PairType;

    typedef struct {
        char **names;
        GLPR_PairType *fields;
    } GLPR_PrinterType;

The printsys library provides a common interface to printer enumeration regardless of the database access method. The function glpr_init() uses a global configuration file to determine which set of functions actually do the work of reading in the database and returning data to the application. In the printcap case, the functions:

    int (*glpr_end)();
    char *(*glpr_get_first_prname)();
    GLPR_PrinterType *(*glpr_get_printer)(char *);
    int (*glpr_get_next_prname)(char **dest);
    char **(*glpr_get_printer_list)();

would be mapped to:

    int printcap_end();
    char **printcap_get_printer_list();
    char *printcap_get_first_prname();
    int printcap_get_next_prname(char **dest);
    GLPR_PrinterType *printcap_get_printer(char *prname);

The 1.1 implementation currently only has functions for parsing /etc/printcap, although other modules are in the works.

4: gnulpr 1.2:
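Looking back at the enumeration interface of section 3.1, the mapping from generic function pointers to a backend can be sketched as follows. This is only an illustration under assumptions: the backend is a hypothetical in-memory stub (the stub_* functions) rather than the real printcap parser, and sketch_glpr_init() takes a method name as a string, whereas the text only says that glpr_init() reads the choice from a global configuration file.

```c
#include <stddef.h>
#include <string.h>

typedef struct { char *key; char *value; } GLPR_PairType;
typedef struct { char **names; GLPR_PairType *fields; } GLPR_PrinterType;

/* Generic entry points: applications call only these pointers. */
char **(*glpr_get_printer_list)(void);
GLPR_PrinterType *(*glpr_get_printer)(char *prname);
int (*glpr_end)(void);

/* Hypothetical stub backend holding one hard-coded printer with two names. */
static char *stub_names[] = { "lp", "hplj", NULL };
static GLPR_PairType stub_fields[] = {
    { "sd", "/var/spool/lpd/lp" },
    { NULL, NULL }                      /* assumed terminator */
};
static GLPR_PrinterType stub_printer = { stub_names, stub_fields };
static char *stub_list[] = { "lp", NULL };

static char **stub_get_printer_list(void) { return stub_list; }

static GLPR_PrinterType *stub_get_printer(char *prname)
{
    /* A printer matches if any of its names matches. */
    for (char **n = stub_names; *n != NULL; n++)
        if (strcmp(*n, prname) == 0)
            return &stub_printer;
    return NULL;
}

static int stub_end(void) { return 0; }

/* Bind the generic pointers to the backend named by `method`.
 * Returns 0 on success, -1 if the method is unknown. */
int sketch_glpr_init(const char *method)
{
    if (strcmp(method, "stub") == 0) {
        glpr_get_printer_list = stub_get_printer_list;
        glpr_get_printer = stub_get_printer;
        glpr_end = stub_end;
        return 0;
    }
    return -1;
}
```

After sketch_glpr_init("stub") succeeds, a call such as glpr_get_printer("hplj") goes through the pointer to the stub. A printcap or SLP module would plug in the same way, without the application changing.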
By the release of 1.2, we plan to have modules for a number of printer enumeration methods, including SLP. The SLP module for libprintsys will search the network for a directory service and then fetch information from there. In the event that no SLP server is found, the system may start one and populate it, using libprinterconf to probe the network.

5: gnulpr 2.0:
While the gnulpr 1.x series is based upon the old BSD lpd spooler, gnulpr 2.x will have a brand new spooler. Gnulpr 2.x is a much more modular design, focusing on simplicity, security, and flexibility. Thus, the network daemon will be one program, the local job submission daemon will be another, the network transmission filter will be another, etc. Each utility will Do One Thing And Do It Well, and the addition of a new network protocol or spooling policy is merely a matter of implementing a few new utility programs.

The following is an initial design document based on Ben Woodard's notes on gnulpr 2.0:

5.1: Submitting print jobs
The function:

    int glpr_get_connection(char *prname, GLPR_PairType *attributes, int *job_id);

returns a file descriptor which is a Unix domain socket connected to the print spooler daemon. Since this is a C function, and we can't possibly allow every application to be setuid, the authentication is done using the SO_PASSCRED socket option.

Since the application is handed a file descriptor, it can do anything to it that you can do to a normal file descriptor. An application can put the fd into non-blocking mode. It can write to it, it can read from it, it can select on it, etc. What it does with it is largely based upon the design of the program that is trying to print. For example, if a program has a more or less linear design and is not interactive, it will probably just write to the file descriptor. If the program is a command line interactive program and responsiveness is important, then it will likely have to fork and let the child do the writing. If the program is a GUI application, which essentially is sitting in a select loop anyway, then it will add the print file descriptor into the select loop.

For applications that don't want a raw file descriptor as their only interface to the print system, there will be a set of convenience functions. These will allow an application to print a whole file or stream reliably no matter what happens to the pipeline.

This is in some ways different from the old LPR, and in some ways very similar. It is different in that you call a function and are given a file descriptor, and the program wanting to do the printing doesn't have to fork. However, it is very similar in that after most applications fork, they simply pipe their data into the child lpr.

5.2: The spooler
As mentioned above, the spooler's interface basically consists of the function glpr_get_connection(). What happens on the back-end is not any scary black magic. It simply looks up the filter directory (part of the queue data structure returned by glpr_get_printer) and in that directory are a series of symlinks to filter scripts, structured in a similar manner to the init scripts found in /etc/rc3.d/. The order in which filters are run is determined by the order in which they appear in the sorted directory listing. This way it is very simple to see which programs are run and in what order. These filter scripts are all run and connected together into a pipeline, with the input end being the other end of the socket that was handed to the application by glpr_get_connection. (We will probably have to put in the optimization that it doesn't actually fork a new process until the previous filter in the pipeline actually emits some data.)

The structure that I had in mind was for /etc/printfilt.d/filters to hold the actual filters (analogous to /etc/init.d/), with symlinks to them in a directory like /etc/printfilt.d/prtype_filters/hplj/ (analogous to /etc/rc3.d/'s symlinks to /etc/init.d/'s scripts).

For backward compatibility with printcap there will be several workarounds designed to make old printcap files work the way that the user expects. If the filter directory isn't specified, the spooler will construct a filter pipeline out of the input filter, the output filter and the queue manager (see below for information about the queue manager).

There are many implications to having a pipeline model for printing. One of the most important to understand is that an application can block while waiting for a printer to actually print its data. Unix has long had many ways to deal with this situation. Some of the most commonly used are fork, non-blocking IO, and select or poll.
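As an illustration of the non-blocking option, here is a minimal sketch of how an application might write print data to the descriptor without stalling indefinitely. The helper names set_nonblocking() and write_all_polled() and the timeout parameter are assumptions made for this example; they are not part of libprintsys. Only standard POSIX calls (fcntl, write, poll) are used.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Write all len bytes to a non-blocking fd. When the pipeline is full,
 * wait up to timeout_ms for it to drain before trying again.
 * Returns 0 on success, -1 on error (errno is ETIMEDOUT on timeout). */
int write_all_polled(int fd, const char *buf, size_t len, int timeout_ms)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1 && errno != EAGAIN && errno != EWOULDBLOCK)
            return -1;              /* real error; errno says which one */

        /* The pipeline cannot take more data right now: wait for room. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT, .revents = 0 };
        int r = poll(&pfd, 1, timeout_ms);
        if (r == 0) {
            errno = ETIMEDOUT;
            return -1;
        }
        if (r == -1 && errno != EINTR)
            return -1;
    }
    return 0;
}
```

A GUI program would more likely register the descriptor in its existing select loop instead of calling poll itself; the loop above is the simplest self-contained form of the same idea.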
Another implication is that if something goes wrong in the pipeline, for example a network printer closes the TCP connection unexpectedly or a print server crashes and has to reboot, then the application has to be prepared to deal with the fact that it will get EPIPE when it tries to write to the file descriptor. In this case the application must not consider the print job as sent and must decide what to do next. What it does is totally up to it. It may call glpr_get_connection again and retry, or it may warn the user and give up. All these cases will be easily handled by convenience functions provided as part of libprintsys.

5.3: Not queuing print jobs
One thing that you might notice is that there is no notion of queuing in this model of a spooler. The spooler simply connects the filters and lets the data pass right on through. In many cases this might be just what you want. There is an old saying that I learned when I first started programming and that is, "1 buffer is good and speeds things up. 2 buffers or more slow things down." Right now in many cases most of the time it takes to get a printout is due to the fact that everybody tries to queue things. A very common example is an enterprise with a print spooler where everyone is running the LPR from BSD.
So in this situation you have at least 2 extra spoolers buffering all the data before the job is printed, possibly 3. This greatly slows down printing. The model I propose is the logical optimization of this.

There is another good reason to have a pipeline with the application on one end and the printer on the other. With a lot of modern printers there are many things you can do if you can establish 2-way communication. The pipeline allows a program to establish 2-way communication with the printer even if a print server is between you and the printer. For example, you can find out the toner levels, you can find out what PS options are actually installed, and you can get instantaneous reports of errors.

Also, print jobs are getting bigger and bigger. The resources necessary to spool them to disk are growing. If we can stream them straight to the printer then the demands on the print server remain virtually constant. I've heard of one place which prints billboards. Each 6 foot wide section of the billboard is about 7Gb of data!

There are also many different queueing policies. People want different things out of a job queue. If you don't believe me, take a look at LPRng and see all the different features that Patrick Powell has had to build into his queueing engine. If people did not need anything more complicated than a simple FIFO, then I think the argument for having the queue engine built into the spooler would make more sense. However, here are some totally reasonable requirements:
I could easily go on. I'm sure that given enough thought, someone can come up with a general way to deal with all these situations in one queueing engine. However, I (or more likely the users of gnulpr) will find a situation where this generic engine doesn't work. With the proposed system they have recourse to implement their own queueing policy.

5.4: Queuing print jobs
As great as the optimization for printing directly is, there are many cases where you actually do want to have a print queue. This is easily handled. One of the filters is a queue manager. When it starts up, it grabs a lock on the printer and passes the data straight through to it. If another queue manager starts up and cannot get the lock, it drops the file into the spool directory. Just before the queue manager terminates, it checks the spool directory for other pending print jobs and prints them.

In the filter pipeline, all filters before the queue manager would behave like input filters and all the filters after the queue manager would behave like output filters. One important consideration in designing filters is that filters before the queue manager will have the context of a specific print job as well as a specified printer. Filters that run after the queue manager will only have the context of a specified printer.

One of the nice things about LPR is that if you reboot the machine all the pending print jobs are not lost and they continue to print as soon as lpd starts. In LPR, jobs that have not been COMPLETELY delivered into the print spool are not considered to have been accepted. The same can be true of this system. The one difference is that with LPR, the time that a print job is in flight is actually pretty small. In this system, that time might be longer and so the chance for failure is higher. Also, in LPR the lpr command insulates the application from most failures. In this system, the application becomes aware of them because it receives an EPIPE error when it tries to write print data to a broken connection.

When a queue manager is used and spooling occurs, problems such as a machine reboot may abandon jobs in the spool directory. This can be dealt with in exactly the same way that LPR, LPRng and Sendmail deal with it: at startup, check all the spool directories and print any jobs that are found there.

5.5: Print job metadata
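Returning briefly to the queue manager of section 5.4, its first decision (print directly or spool) can be sketched with a non-blocking advisory lock. This is only one possible approach and rests on assumptions: the text does not say how the printer lock is implemented, so the use of flock() on a lock file, the lock path, and the names qm_decide() and qm_release() are all hypothetical. flock() is a BSD call available on Linux and the BSDs, not part of POSIX.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

enum qm_action { QM_PRINT_DIRECT, QM_SPOOL, QM_ERROR };

/* Try to take the printer lock without blocking.
 * QM_PRINT_DIRECT: we own the printer; *lock_fd holds the lock.
 * QM_SPOOL:        another queue manager owns it; spool the job.
 * QM_ERROR:        the lock file could not be opened or locked. */
enum qm_action qm_decide(const char *lock_path, int *lock_fd)
{
    int fd = open(lock_path, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return QM_ERROR;

    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        *lock_fd = fd;
        return QM_PRINT_DIRECT;
    }

    int saved = errno;              /* close() may overwrite errno */
    close(fd);
    return saved == EWOULDBLOCK ? QM_SPOOL : QM_ERROR;
}

/* Closing the descriptor releases the lock taken by qm_decide(). */
void qm_release(int lock_fd)
{
    close(lock_fd);
}
```

The sketch leaves out the second half of the protocol: before releasing the lock, the owner still has to scan the spool directory for jobs that arrived while it was printing.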
The "context of a specific print job" is something that I haven't described yet. Notice that in the libprintsys function:

    int glpr_get_connection(char *prname, GLPR_PairType *attributes, unsigned int *jobid);

there are three parameters. The first one is the name of the print queue that you want to submit the print job to. The second parameter is a set of attributes which you want to pass through to the filters, and the third is a pointer to an integer which glpr_get_connection sets to the local job ID. These attributes can be anything that can be specified in terms of a key/value pair made of ASCII strings. Different filters along the pipeline will use different attributes. Some possible attributes may be:
Other attributes such as the user who printed the job and the current status of the print job are set by the spooler.

The job ID is unique for the local system and is semi-persistent. At any given time there will be only one job with a particular job ID within the print system. The job ID will not be destroyed until long after the job has completed. This way an application which submitted a print job can query the print system using the proper job ID and find out the current status:

    GLPR_PairType *glpr_get_job_attributes(unsigned int jobid);

Normal applications will only have read-only access to the database of job attributes, but the spooler and the filters will have read-write access to these attributes. The way that these tickets will ultimately be destroyed is by way of a cron job that writes the information to a log file and then cleans out the oldest of the completed jobs.

5.6: Filter parameters
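Since both applications and filters work with arrays of GLPR_PairType, a small lookup helper shows how a job attribute might be read. This is a sketch under one loud assumption: the array is terminated by an entry whose key is NULL, which the text does not specify. The helper name attr_lookup() is also hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef struct { char *key; char *value; } GLPR_PairType;

/* Return the value stored under `key`, or NULL if it is absent.
 * Assumes the array ends with an entry whose key is NULL. */
const char *attr_lookup(const GLPR_PairType *attrs, const char *key)
{
    if (attrs == NULL)
        return NULL;
    for (; attrs->key != NULL; attrs++)
        if (strcmp(attrs->key, key) == 0)
            return attrs->value;
    return NULL;
}
```

A filter could use such a helper on the result of glpr_get_job_attributes() to check, for instance, whether the front end already set a job type.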
To make all these things work together properly, all the filters will need to be able to get at the two kinds of meta-data. The first is the information about the capabilities of the printer. The API provides functions to figure out printer capabilities from a printer name. The second is the job meta-data, and this will be anything that can be gathered by the input engine.

My thought is that each filter gets two parameters. The first one is the queue name for which the job was spooled and the second is a job ticket. It can request the job ticket information using a tdb (lightweight DBM) call. For example:

    jdsend -P printername -J 905127839

5.7: Network Printing protocols
I have only discussed printing on the local system, and I haven't mentioned any of the standard network printing protocols. The print spooler daemon in and of itself has no knowledge of networks or remote printing. Network printing will be handled by a collection of single-function applications which act as gateways between the various printing protocols and the local printing system. These gateways will be stand-alone implementations of certain network protocols such as LPR, IPP etc. Their job is basically to receive the print job and then pass it directly into the spooler as soon as possible, calling glpr_get_connection and then piping the print job data into the queue. Notice that each one of these is a separate program whose sole purpose is to implement the network protocol. This is one of the key principles of this printing system. Complexity is isolated in small individual applications with fairly simple interfaces connecting them. This is for several reasons:
5.8: lpr
For backward compatibility, all the BSD commands will be implemented. lpr will be very easy to implement. All that it needs to do is open the port and send the jobs. The spooler will have to have some additional bookkeeping so that lpq and friends can do their job reasonably well. It also needs to provide information about open connections.

5.9: Type conversion
Note: The 2.0 release will probably use one of the standard file conversion filters such as a2ps or magicfilter. The following design is being considered for later versions and is by no means final.

One of the most important filters is the type conversion filter. Unless this is a highly controlled environment where the printer will never be sent a job in a format that it cannot understand natively, this filter will be one of the first filters run. In a way this is sort of a macro filter. It is a filter which itself creates a pipeline of filters to do the necessary type conversions. I guess you could use a2ps or magicfilter to do this, but I never liked the way that they worked. It seemed to me to be artificial intelligence bordering on black magic and too hard to understand.
A simpler way to do it is to have the type conversion filter first check to see if the front end set the job type in the job ticket. If it didn't, then run a type detection program on the input. Once the type of the file is known, the type conversion filter looks up which data formats the printer understands from the printer capabilities database. If they match then it just sends the data on through with no modifications. If they do not match then it looks in /etc/printfilt.d/convfilters/

For a concrete example, say you have a HP LJ 4050 (I do). It can handle PS, PCL6, and PJL. In my system this would mean that in the printer database the printer can handle:

    PJL-wrapped-PCL1
    PJL-wrapped-PCL2
    PJL-wrapped-PCL3
    PJL-wrapped-PCL4
    PJL-wrapped-PCL5
    PJL-wrapped-PCL6
    PJL-wrapped-DeviceSpecific-PS1
    PJL-wrapped-DeviceSpecific-PS2

So the top level directory structure would look like:

    /etc/printfilt.d/convfilters/PJL-wrapped-PCL1
    /etc/printfilt.d/convfilters/PJL-wrapped-PCL2
    /etc/printfilt.d/convfilters/PJL-wrapped-PCL3
    /etc/printfilt.d/convfilters/PJL-wrapped-PCL4
    /etc/printfilt.d/convfilters/PJL-wrapped-PCL5
    /etc/printfilt.d/convfilters/PJL-wrapped-PCL6
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS1
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2

Then going down further on the PJL-wrapped-DeviceSpecific-PS2 branch we would have something that looked like this (obviously the files at the end of the branches are just symlinks back into another directory):

    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PS2/000ppdfilt
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PS2/001PJLwrap
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/DeviceSpecific-PS2/000PJLwrap
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PJL-wrapped-PS2/000PJLstrip
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PJL-wrapped-PS2/001ppdfilt
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PJL-wrapped-PS2/002PJLwrap
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/dvi/000dvips
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/dvi/001ppdfilt
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/dvi/002pjlwrap
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex/000latex
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex/001dvips
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex/002ppdfilt
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex/003pjlwrap
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/tex/000tex
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/tex/001dvips
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/tex/002ppdfilt
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/tex/003pjlwrap
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/troff/000groff
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/troff/001ppdfilt
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/troff/002pjlwrap
    ...

In this way, we use the file system to store the complete pre-computed graph of all interconnections. Obviously, this will be a rather big directory tree. Regardless, I expect that it won't take up that much space. Creating a tree like this by hand would be virtually impossible. However, writing a program that does the full enumeration of the graph is fairly simple. Since all the computing is done off-line, it doesn't have to be fast and so there is time to compute optimal paths. (Yes I know that it is an NP complete problem but it's only run once, and CPU cycles are really cheap these days.)
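To give a feel for the off-line enumeration, here is a minimal sketch that finds one filter chain between two formats. It assumes that "optimal" simply means the fewest filters, which is a simplification of my own, and uses a plain breadth-first search. The edge table is a hypothetical reading of the example tree above (each filter converts exactly one format into one other format), and find_chain() is an illustrative name, not part of gnulpr.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FORMATS 32

/* One conversion step: `filter` turns format `from` into format `to`. */
struct conv_edge { const char *from, *to, *filter; };

/* Hypothetical edge table inferred from the example directory tree. */
static const struct conv_edge edges[] = {
    { "latex", "dvi", "latex" },
    { "tex", "dvi", "tex" },
    { "dvi", "PS2", "dvips" },
    { "troff", "PS2", "groff" },
    { "PS2", "DeviceSpecific-PS2", "ppdfilt" },
    { "DeviceSpecific-PS2", "PJL-wrapped-DeviceSpecific-PS2", "PJLwrap" },
};
#define N_EDGES (sizeof edges / sizeof edges[0])

/* Breadth-first search from src to dst. On success, stores the filter
 * names in run order in chain[] and returns how many there are (0 when
 * src and dst are the same format). Returns -1 if there is no route or
 * chain[] is too small. */
int find_chain(const char *src, const char *dst,
               const char **chain, int max_chain)
{
    const char *node[MAX_FORMATS];  /* BFS queue; doubles as visited set */
    int via[MAX_FORMATS];           /* edge used to reach node[i] */
    int parent[MAX_FORMATS];        /* queue index of the predecessor */
    int head = 0, tail = 0;

    node[tail] = src; via[tail] = -1; parent[tail] = -1; tail++;

    while (head < tail && strcmp(node[head], dst) != 0) {
        for (size_t e = 0; e < N_EDGES; e++) {
            if (strcmp(edges[e].from, node[head]) != 0)
                continue;
            int seen = 0;
            for (int i = 0; i < tail; i++)
                if (strcmp(node[i], edges[e].to) == 0) { seen = 1; break; }
            if (seen || tail == MAX_FORMATS)
                continue;
            node[tail] = edges[e].to;
            via[tail] = (int)e;
            parent[tail] = head;
            tail++;
        }
        head++;
    }
    if (head == tail)
        return -1;                  /* queue exhausted: no route */

    int len = 0;
    for (int i = head; parent[i] != -1; i = parent[i])
        len++;
    if (len > max_chain)
        return -1;

    /* Walk back from dst to src, filling the chain from its end. */
    int pos = len;
    for (int i = head; parent[i] != -1; i = parent[i])
        chain[--pos] = edges[via[i]].filter;
    return len;
}
```

Running this for every (source, target) pair and writing one numbered symlink per step would produce a tree of the shape shown above. A real generator would probably also want to weigh conversion quality, not just chain length.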
Also, if it does turn out to take up too much space in the file system, it is just as easy to make the tree look like:

    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PS2
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/DeviceSpecific-PS2
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/PJL-wrapped-PS2
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/dvi
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/tex
    /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/troff

where the files hold an ordered list of filters that should be run, e.g.:

    $ cat /etc/printfilt.d/convfilters/PJL-wrapped-DeviceSpecific-PS2/latex
    /etc/printfilt.d/convfilters.d/latex
    /etc/printfilt.d/convfilters.d/dvips
    /etc/printfilt.d/convfilters.d/ppdfilt
    /etc/printfilt.d/convfilters.d/PJLwrap

Since the type converter is just another filter, there is room for any number of implementations. This is just one implementation that I dreamed up. Heck, include them all. If someone wants a table driven version and they are willing to write it, let them. Maybe someday some AI genius will write something that does all of this in real time. On my first pass I will probably just stick a2ps in there. One of the key principles of this system is that it is very easy to incorporate functionality.

5.10: Print by reference
One special case of type conversion is print by reference. This could be used in several places. First of all, some network protocols mandate a print by reference capability. Also, implementing lpr -s would require some print by reference capability. This could be implemented by having the spooling application set the file type to a reference type and then make the print data the complete reference. For example, lpr -s might set the "job-type" attribute to "file-reference" and then send "/home/ben/src/lpr/printjob.c" as the print data. The dereference filter would see that the file type is a file-reference, open the file specified, and output its contents instead of the pathname. It would then be the type converter's job to convert it to the right format for the printer. This print-by-reference filter is the only special case so far where a program needs to be setuid root (and in this case only so that it can switch uid to the job's owner).

5.11: Some proposed filters
This is by no means a complete list. I think that the real key to this design is that people can very simply add filters to the filter pipeline to implement whatever feature they want. This will replace many of the features that LPRng was forced to implement within the spooling engine.
I'm sure that if I spent a bunch of time thinking about it I could come up with a dozen more print filters which would be fairly simple to implement and which would implement a feature that someone needed.

5.12: Important gnulpr 2.0 design features
The thing here is that with a little work we could make it such that you can basically have a zero effort client and printer setup. The only place where an admin would have to intervene would be to give a printer a name.
All trademarks and copyrights on this page are properties of their respective owners. Forum comments are owned by the poster. The rest is copyright ©1999-2001 Ben Woodard and VA Linux Systems, Inc.