An Introduction to groff
roff, short for "run off", is an old-school document formatting language [1] that is readily available for Linux systems in the form of GNU roff (groff), troff, and nroff. It's significantly lighter-weight than something like LaTeX, but still allows for fairly complex document formatting tasks. It is also used to render man pages, so it's worth learning for that alone if you're interested in Linux. Beyond manuals, it can be used for formatting books, papers, etc. You can render your document in a variety of different formats, including HTML, PostScript [2], and pdf.
As a classic Unix utility, roff itself is implemented through several distinct programs that are called one after another in a pipeline to produce the desired output document. I'm going to address roff initially in terms of these individual programs and pipelining; however, we will transition to the more modern approach of using groff instead to do much of the work in one command.
The roff Pipeline
roff comes in many flavours. Historically, the major two were
troff(1) and nroff(1). troff(1)
was used to produce output for typesetters, and nroff(1)
for computer terminals. These programs both accept roff code as input,
and output a standardized intermediate format for further processing. In
modern contexts, these two programs are merged together into the
groff(1) front-end. We'll start out using
troff(1) standalone, though.
A general pipeline for creating a document using
troff(1) is,
cat input.tr | preprocessor | troff | postprocessor > output.ext
In this example, input.tr is the source file describing
the document. Preprocessors are programs that handle more complex
formatting tasks like equations and tables, which are not actually part
of the roff engine itself. The common preprocessors are,
- eqn (for equations)
- grn (for pictures)
- pic (for simple block-diagrams [think mermaid])
- chem (for chemical diagrams)
- refer (for bibliographies and references)
- tbl (for tables)
We won't be discussing these in this article, but information on them
may come later. These preprocessors generate roff code, which is then
converted into an intermediate output by the troff(1)
program. Finally, this intermediate output is fed into a postprocessor,
to produce output of the desired format.
Getting Started with troff
troff itself is a very simple program to use. It will accept text on stdin and will write the intermediate output to stdout. For example, we could simply run the program like so,
troff
hello, world!
^D
and we get the following as output,
x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s10000
V12000
H72000
md
DFd
thello,
wh2500
tw
H104120
torld!
n12000 0
x trailer
V792000
x stop
Of course, why manually type the data into standard input when you
can use a file? troff(1) will accept content on standard
input, or you can specify a filename as an argument,
echo hello, world! > hello.tr
troff hello.tr
Postprocessing: Creating a pdf
Now, we could easily save this output to a file using redirection, but if our purpose is to produce a document for distribution, this probably isn't what we want to do. The roff intermediate output isn't exactly suitable for end-user consumption! We'll need to run our file through a postprocessor first, to create a readable document.
We'll create a pdf file here, and so we will use the
gropdf(1) postprocessor. gropdf(1) accepts
troff(1) output on stdin, and writes a pdf to stdout. If
you run,
troff hello.tr | gropdf
you will get a viable pdf file; however, gropdf(1) will
also display the following error message,
Expecting a pdf pipe (got ps)
The problem is that we need to give troff(1) a heads up
that we intend to target our document to a pdf file. This is done using
the -Tpdf argument. If we don't do this,
troff(1) will assume that we're targeting PostScript, and
so the file won't be set up appropriately for gropdf(1) to
do its thing. Thus, our final command is,
troff -Tpdf hello.tr | gropdf > output.pdf
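The same rule applies to other targets: the device given to -T and the postprocessor have to agree. As a rough sketch, assuming the stock groff postprocessors grops(1) (PostScript) and grotty(1) (terminal output) are installed alongside troff(1), the same hello.tr can be sent to either one. The hello.ps file name is just a placeholder.

```shell
# Recreate the source file used above.
echo 'hello, world!' > hello.tr

# The guard only keeps this sketch from failing where groff isn't installed.
if command -v troff >/dev/null 2>&1; then
    # PostScript: the ps device pairs with the grops postprocessor.
    troff -Tps hello.tr | grops > hello.ps

    # Terminal: the utf8 device pairs with the grotty postprocessor.
    troff -Tutf8 hello.tr | grotty
fi
```

The terminal version prints a full page, so expect the text to be followed by a run of blank lines.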
I've uploaded the output file from this command here, so you can see what the output should be. Opening it up in Evince yields,
Admittedly, it isn't the most impressive of documents. But it is a start!
Skipping the Pipelines with groff
One common complaint about the roff system is how long the processing pipelines can get. It's already a fair bit of typing to make our pdf above--now imagine if you wanted equations, pictures, and tables in your document too. That's three more commands we need to add to the pipeline!
As a result, the modern convention is to bypass all this pipelining
by using the groff(1) front-end to roff. This program will
allow us to specify options telling it to run certain preprocessors or
postprocessors, thereby letting us skip writing the long
pipelines.
To create our output pdf using groff(1), we need only
execute the following command,
groff -Tpdf hello.tr > output.pdf
When the -T option is specified, groff(1)
will automatically call the relevant postprocessor, so we don't need to!
If no -T option is provided, groff(1) will
default to PostScript output.
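Preprocessors are requested in much the same way. As a sketch, assuming a document that needs tables, equations, and pictures: the -t, -e, and -p options ask groff(1) to run tbl, eqn, and pic respectively, and -V prints the pipeline that groff would run instead of running it, which is a handy way to see how much typing you are being spared. This sketch uses -Tps so that it only relies on the default PostScript device; -Tpdf is used the same way.

```shell
# Recreate the source file used above.
echo 'hello, world!' > hello.tr

# The guard only keeps this sketch from failing where groff isn't installed.
if command -v groff >/dev/null 2>&1; then
    # -V: print the equivalent pipeline instead of executing it.
    groff -V -t -e -p -Tps hello.tr

    # Drop -V to actually build the document.
    groff -t -e -p -Tps hello.tr > hello.ps
fi
```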
In recent references, I have seen a different process listed for creating a pdf using the ps2pdf command. This command takes PostScript input and produces a pdf from it. By taking advantage of the fact that groff defaults to producing PostScript, a pdf can be created using,
groff hello.tr | ps2pdf - output.pdf
I'm not entirely sure why these authors use this approach, rather than the postprocessor for creating a pdf directly, but I want to mention it here as you are sure to encounter it in your searches.
Requests and Macros
roff code consists of a sequence of lines, which can be classified as either text lines or control lines. A text line, unsurprisingly, contains text that should appear in the document. A control line represents a command, and will always start with either a . or a ', followed by a command and any arguments that it might have. These control lines are what format the document.
The standard format of a control line is,
.command_name arg1 arg2
The most common kind of command is a request, which is a command built into roff itself.
Generally speaking, these requests are too low level to be useful to
somebody just trying to write a paper for their English class (sure, you
can do that in roff, why not?), and so roff also includes macros, which bundle requests into higher-level commands.
Technically, you could write these yourself out of roff requests. And
if you'd like to give it a shot, the groff(7) man page does
include a reference for all the supported requests (make sure to run
man 7 groff to get the right document), but
groff(1) also comes with a number of pre-packaged macro
packages that you can use.
- man
- mandoc
- mdoc
- me
- mm
- ms
- www
Others are available too, such as mom.
While each is a little different, many share some common elements
when it comes to basic formatting. I'll use the me package for the rest
of this article [3]. To load the package, pass the
-me argument to groff(1).
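To make the difference between requests and macros concrete, here is a minimal sketch of a hand-rolled macro built from requests. The macro name BD and the file name macro.tr are made up for illustration; .de (define a macro, ended by a line containing only ..), .ft (change font), and the \\$1 argument reference are standard roff. No macro package is loaded here, so groff(1) is run without -me.

```shell
# Write a roff file that defines and then calls a hypothetical BD macro.
cat > macro.tr <<'EOF'
.\" BD: set the first argument in bold, then restore the previous font.
.de BD
.ft B
\\$1
.ft P
..
.BD hello,
world!
EOF

# The guard only keeps this sketch from failing where groff isn't installed.
if command -v groff >/dev/null 2>&1; then
    groff -Tps macro.tr > macro.ps
fi
```

Calling .BD looks exactly like calling a request, which is why the two are hard to tell apart in a source file. The macros in a package such as me are defined in the same way, just with a lot more care.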
Formatting hello.tr
As a first application of formatting, let's modify our
hello.tr file to include bold and italics. We'll render
"hello," in bold, and "world!" in italics. The macros for bold and
italics are, unsurprisingly, .b and .i
respectively. So, we have,
cat formatted.me
.b hello,
.i world!
groff -me -Tpdf formatted.me > formatted.pdf
To center the text, we can use the .(c and
.)c macros. These should surround the text to be centered,
as below,
cat centered.me
.(c
.b hello,
.i world!
.)c
groff -me -Tpdf centered.me > centered.pdf
A listing of many common me macros can be found by reading the
groff_me(7) man page. I'll introduce a few more as we go,
but you should look there for a more complete list.
Formatting a Simple Paper
Applying some simple formatting directives to a couple of words is a good start, but applying it to a larger paper is a bit more complex. So let's give that a shot. Let's grab the first bit of text from This Side of Paradise, courtesy of Project Gutenberg [4]. In raw form, we have
BOOK ONE--The Romantic Egotist
CHAPTER 1. Amory, Son of Beatrice
Amory Blaine inherited from his mother every trait, except the stray inexpressible few, that made him worth while. His father, an ineffectual, inarticulate man with a taste for Byron and a habit of drowsing over the Encyclopedia Britannica, grew wealthy at thirty through the death of two elder brothers, successful Chicago brokers, and in the first flush of feeling that the world was his, went to Bar Harbor and met Beatrice O'Hara. In consequence, Stephen Blaine handed down to posterity his height of just under six feet and his tendency to waver at crucial moments, these two abstractions appearing in his son Amory. For many years he hovered in the background of his family's life, an unassertive figure with a face half-obliterated by lifeless, silky hair, continually occupied in "taking care" of his wife, continually harassed by the idea that he didn't and couldn't understand her.
But Beatrice Blaine! There was a woman! Early pictures taken on her father's estate at Lake Geneva, Wisconsin, or in Rome at the Sacred Heart Convent--an educational extravagance that in her youth was only for the daughters of the exceptionally wealthy--showed the exquisite delicacy of her features, the consummate art and simplicity of her clothes. A brilliant education she had--her youth passed in renaissance glory, she was versed in the latest gossip of the Older Roman Families; known by name as a fabulously wealthy American girl to Cardinal Vitori and Queen Margherita and more subtle celebrities that one must have had some culture even to have heard of. She learned in England to prefer whiskey and soda to wine, and her small talk was broadened in two senses during a winter in Vienna. All in all Beatrice O'Hara absorbed the sort of education that will be quite impossible ever again; a tutelage measured by the number of things and people one could be contemptuous of and charming about; a culture rich in all arts and traditions, barren of all ideas, in the last of those days when the great gardener clipped the inferior roses to produce one perfect bud.
Let's start by bolding and centering the BOOK ONE line.
.(c
.b BOOK ONE - The Romantic Egotist
.)c
The above roff seems reasonable. However, when we run it and take a look we see something rather unexpected,
The text got centered, but of all the words we typed only two of them are present, and of those only the first is bold! What gives?
Well, it has to do with the way that arguments to requests/macros work. Just like arguments on the command line, arguments to requests are separated by spaces. The bold macro makes its first argument bold and appends the second to it (not bold). So, the output that we got should make sense. The other four arguments to the macro were ignored.
Just like on the shell, if we want to have spaces in an argument, we'll need to wrap the whole thing in quotes. So this,
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
is actually what we want.
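The three ways of calling the bold macro can be compared side by side. This is a small sketch rather than part of the paper we are building; the file name bold-args.me is made up, and .br is the plain roff request for a line break.

```shell
# Write a test file exercising the argument handling of the me .b macro.
cat > bold-args.me <<'EOF'
.lp
.\" Two arguments: BOOK is bold, ONE follows in the previous font.
.b BOOK ONE
.br
.\" One quoted argument: the whole phrase is bold.
.b "BOOK ONE"
.br
.\" Quoted first argument plus a second one: bold phrase, plain colon.
.b "BOOK ONE" :
EOF

# The guard only keeps this sketch from failing where groff isn't installed.
if command -v groff >/dev/null 2>&1; then
    groff -me -Tpdf bold-args.me > bold-args.pdf ||
        echo "groff failed; are the me macros and the pdf device installed?" >&2
fi
```

The second argument turns out to be useful on purpose: it lets you attach punctuation to a bold word without the punctuation itself becoming bold.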
Bearing this lesson in mind, let's place the chapter title next. We'll bold the words CHAPTER 1, but leave the name in normal face.
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.b "CHAPTER 1."
Amory, Son of Beatrice
Notice that roff automatically did a line break after we ended the centering on line 3. Generally, roff will handle line breaks, paragraphs, etc., for us, as long as we tell it where to put them (we'll see this in just a moment). In fact, the manual advises against leaving any blank lines in your input file.
That will result in a rather ugly and hard to read file, though. If you've programmed before, you know that spacing things out is quite useful. So there is a way to do this. Simply start the line with a ., and don't give a command. So we can space our book and chapter titles out a little bit like this,
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.b "CHAPTER 1."
Amory, Son of Beatrice
This change is purely cosmetic. It won't affect the actual output file.
What if we actually wanted to add some blank space in our output
file? Those two lines are awfully close together, and it doesn't look
all that great. For this we will use a roff request, .sp.
This isn't a macro, but just looking at the file, you'd never be able to
tell the difference.
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.b "CHAPTER 1."
Amory, Son of Beatrice
This request accepts a numerical argument that states how many lines
to skip, so here we are adding two blank lines between the book and
chapter title. However, although this works, it does go against the
manual for groff_me(7), which states that the
.sp request can only be safely used with me after the first
call to .pp (the paragraph macro).
The trouble is that a .pp will result in the title
getting indented, which we don't want. However, there is also an
.lp macro, which creates a paragraph with no indent. The
documentation doesn't explicitly say that this is safe, but if things
work the way I think they do, it should be just as good. Just be aware
that this is a "your mileage may vary" moment. I've never had it cause
an issue, but we are going against the direct advice of the manual
here.
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
While I was at it, I threw another .lp before the
chapter title, just to be explicit. If you haven't gathered already, the
.lp and .pp macros are what we will be using
to add paragraph breaks in general. They do add a small amount of extra
blank space as well. If it isn't obvious from this example of their use,
you'll be able to see it once we start adding extra paragraphs.
Okay, let's bring in the first paragraph of text. We'll use the
.lp macro to start this one off, as this will be the first
paragraph in the chapter, and I prefer to leave this unindented. The
.pp macro will work much the same, except it will add an
indent. We'll see this one in action for the second paragraph.
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
.
.lp
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O'Hara. In
consequence, Stephen Blaine handed down to posterity his height
of just under six feet and his tendency to waver at crucial
moments, these two abstractions appearing in his son Amory. For
many years he hovered in the background of his family's life, an
unassertive figure with a face half-obliterated by lifeless,
silky hair, continually occupied in "taking care" of his wife,
continually harassed by the idea that he didn't and couldn't
understand her.
A cursory glance would seem to indicate that this all looks okay, however if you look carefully you'll see that there are a few issues. For example, we see the word didnât, instead of didn't. This is a fairly standard encoding problem--the text we are using as input to groff contains "curly" quotation marks, which groff's standard text encoding can't handle. So, let's go through and swap all of the fancy quotation marks for normal ones, and try again.
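On a longer file, making this swap by hand gets tedious. Here is a minimal sketch using sed(1), assuming the input is UTF-8 and that the four common curly quote characters are the only offenders; straighten_quotes is a made-up helper name.

```shell
# Replace curly single and double quotes with their straight equivalents.
# Reads the named files, or standard input if none are given.
straighten_quotes() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g' "$@"
}

# Prints: he didn't and couldn't understand "her"
printf '%s\n' 'he didn’t and couldn’t understand “her”' | straighten_quotes
```

To convert a whole file, pass it as an argument and redirect the result to a new file, for example straighten_quotes paradise.txt > paradise-straight.txt (both file names are placeholders).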
And that one worked. In fact, if you look at it, all of the single quotes are rendered as the curly ones anyway. Groff will automatically handle translating from the standard "straight" single quotes into curly ones. However, the double quotes are still straight and (some might say) boring.
Luckily, we can fix this too. If we replace the first quotation mark
with `` and the second with '' (that is, two backticks for the first
and two single quotes for the second), groff(1) will give us the
curly quotes there too! It just needs to know which one is the opening
quote (noted by the backticks) and which is the closing one (noted by
the single quotes), so it knows what the quotes should look like.
While we are making edits, this document actually does not meet the
recommendations in the manual for groff(7). Just like the
manual recommends that we avoid adding blank lines to the file, it also
recommends that we start each new sentence on its own line. This won't
actually change the output (again, it's just cosmetic), but it might
help us stay organized. So let's make that edit too,
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
.
.lp
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O'Hara.
.
In consequence, Stephen Blaine handed down to posterity his height of
just under six feet and his tendency to waver at crucial moments, these
two abstractions appearing in his son Amory.
.
For many years he hovered in the background of his family's life, an
unassertive figure with a face half-obliterated by lifeless, silky hair,
continually occupied in ``taking care'' of his wife, continually harassed
by the idea that he didn't and couldn't understand her.
.pp
Okay, so far so good! Now, let's add the next paragraph of text. I've
already added the .pp macro in the code above in
preparation. So let's continue. I'll go ahead and replace all the
quotes, and put a line break between sentences, in advance this
time.
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
.
.lp
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O'Hara.
.
In consequence, Stephen Blaine handed down to posterity his height of
just under six feet and his tendency to waver at crucial moments, these
two abstractions appearing in his son Amory.
.
For many years he hovered in the background of his family's life, an
unassertive figure with a face half-obliterated by lifeless, silky hair,
continually occupied in ``taking care'' of his wife, continually harassed
by the idea that he didn't and couldn't understand her.
.pp
But Beatrice Blaine!
.
There was a woman!
.
Early pictures taken on her father's estate at Lake Geneva, Wisconsin, or in
Rome at the Sacred Heart Convent--an educational extravagance that in her youth
was only for the daughters of the exceptionally wealthy--showed the exquisite
delicacy of her features, the consummate art and simplicity of her clothes.
.
A brilliant education she had--her youth passed in renaissance glory, she was
versed in the latest gossip of the Older Roman Families; known by name as a
fabulously wealthy American girl to Cardinal Vitori and Queen Margherita and
more subtle celebrities that one must have had some culture even to have heard
of.
.
She learned in England to prefer whiskey and soda to wine, and her small talk
was broadened in two senses during a winter in Vienna.
.
All in all Beatrice O'Hara absorbed the sort of education that will be quite
impossible ever again; a tutelage measured by the number of things and people
one could be contemptuous of and charming about; a culture rich in all arts and
traditions, barren of all ideas, in the last of those days when the great
gardener clipped the inferior roses to produce one perfect bud.
.pp
Conclusion
And there we have it! Obviously, we've barely scratched the surface of roff, and I intend to write a few more articles about it to further explore its features, but this should be enough to get you started! You now know enough roff to handle most non-technical writing tasks, like simple school papers. You also know how to find the manual pages for the different macro packages, which should give you most of what you need to know to use them.
Other References
While the man pages are fairly detailed for groff(1) and
its component parts, they are still man pages, and may not be terribly
approachable to somebody without any background. They are also a bit of
a tangled mess, with information scattered across the man pages for
several different programs, and thus tricky to navigate.
Unfortunately, time has not been kind to roff, and there aren't a lot of resources that I've been able to track down on it outside of those man pages. However, here are a few other sources that I did track down that might be of interest.
Hall, J. (2018). How to format academic papers on Linux with groff -me. Open Source. https://opensource.com/article/18/2/how-format-academic-papers-linux-groff-me
Arora, H. (2012). Linux Groff Command Examples to Create Formatted Document. The Geek Stuff. https://www.thegeekstuff.com/2012/09/linux-groff-command-examples/
Kernighan, B. W., & Pike, R. (1984). The UNIX Programming Environment. Prentice Hall.
Shotts, W. (2019). The Linux Command Line (2nd ed.). No Starch Press.
The UNIX Programming Environment has a rather comprehensive coverage of several macro packages, as well as some of the preprocessors. It is dated, but most of the content is just as applicable today as it was in the '80s.
The Linux Command Line covers many aspects of command line
Linux, including groff(1). However, the coverage
specifically on groff(1) is minimal; it feels like it was
just added to "check a box", so to speak. Don't buy this book for its
groff(1) content. That said, it is a pretty good book to
have around if you're just learning Linux.