An Introduction to groff
A surprisingly handy, old-school formatting system
roff, short for “run off”, is an old-school document formatting language[1] that is readily available for Linux systems in the form of GNU roff (groff), troff, and nroff. It’s significantly lighter-weight than something like LaTeX, but still allows for fairly complex document formatting tasks. It is also used to render man pages, so it’s worth learning for that alone if you’re interested in Linux. Beyond manuals, it can be used for formatting books, papers, etc. You can render your document in a variety of different formats, including HTML, PostScript[2], and pdf.
As a classic Unix utility, roff itself is implemented through several distinct programs that are called one after another in a pipeline to produce the desired output document. I’ll start by addressing roff in terms of these individual programs and pipelines, and then transition to the more modern approach of using groff to do much of the work in one command.
The roff Pipeline
roff comes in many flavours. Historically, the major two were troff(1) and
nroff(1). troff(1) was used to produce output for typesetters, and nroff(1)
for computer terminals. These programs both accept roff code as input, and
output a standardized intermediate format for further processing. In modern
contexts, these two programs are merged together into the groff(1) front-end.
We’ll start out using troff(1) standalone, though.
A general pipeline for creating a document using troff(1) is,
% cat input.tr | preprocessor | troff | postprocessor > output.ext
In this example, input.tr is the source file describing the document.
Preprocessors are programs that handle more complex formatting tasks like
equations and tables, which are not actually part of the roff engine itself.
The common preprocessors are,
- eqn (for equations)
- grn (for pictures)
- pic (for simple block-diagrams [think mermaid])
- chem (for chemical diagrams)
- refer (for bibliographies and references)
- tbl (for tables)
We won’t be discussing these in this article, but information on them may come
later. These preprocessors generate roff code, which is then converted into
an intermediate output by the troff(1) program. Finally, this intermediate
output is fed into a postprocessor, to produce output of the desired format.
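To make the shape of that pipeline concrete, here is a minimal sketch with the placeholders filled in. The choice of tbl and eqn as preprocessors and of grops (the PostScript postprocessor) is my assumption for illustration, as are the temporary directory and file names. It only runs the pipeline if the tools are actually installed.

```shell
#!/bin/sh
# Sketch: the classic roff pipeline, written out by hand.
# Assumed for illustration: tbl and eqn as preprocessors, grops as postprocessor.
workdir=$(mktemp -d)
printf '%s\n' 'hello, world!' > "$workdir/input.tr"

if command -v tbl >/dev/null 2>&1 && command -v eqn >/dev/null 2>&1 &&
   command -v troff >/dev/null 2>&1 && command -v grops >/dev/null 2>&1; then
    # preprocessors -> troff -> postprocessor
    # (troff targets PostScript when no -T option is given)
    tbl "$workdir/input.tr" | eqn | troff | grops > "$workdir/output.ps"
    echo "wrote $workdir/output.ps"
else
    echo "groff tools not found; skipping the pipeline run"
fi
```

Since this input contains no tables or equations, the two preprocessors should simply pass the text through; they are only there to show where each stage sits.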
Getting Started with troff
troff itself is a very simple program to use. It will accept text on stdin and will write the intermediate output to stdout. For example, we could simply run the program like so,
% troff
hello, world!
^D
and we get the following as output,
x T ps
x res 72000 1 1
x init
p1
x font 5 TR
f5
s10000
V12000
H72000
md
DFd
thello,
wh2500
tw
H104120
torld!
n12000 0
x trailer
V792000
x stop
Of course, why manually type the data into standard input when you can use
a file? troff(1) will accept content on standard input, or you can specify
a filename as an argument,
% echo hello, world! > hello.tr
% troff hello.tr
Postprocessing: Creating a pdf
Now, we could easily save this output to a file using redirection, but if our purpose is to produce a document for distribution, this probably isn’t what we want to do. The roff intermediate output isn’t exactly suitable for end-user consumption! We’ll need to run our file through a postprocessor first, to create a readable document.
We’ll create a pdf file here, and so we will use the gropdf(1) postprocessor.
gropdf(1) accepts troff(1) output on stdin, and writes a pdf to stdout. If
you run,
% troff hello.tr | gropdf
you will get a viable pdf file; however, gropdf(1) will also display the
following error message,
Expecting a pdf pipe (got ps)
The problem is that we need to give troff(1) a heads up that we intend to
target our document to a pdf file. This is done using the -Tpdf argument. If
we don’t do this, troff(1) will assume that we’re targeting PostScript, and
so the file won’t be set up appropriately for gropdf(1) to do its thing.
Thus, our final command is,
% troff -Tpdf hello.tr | gropdf > output.pdf
I’ve uploaded the output file from this command here, so you can see what the output should be. Opening it up in Evince yields,
Admittedly, it isn’t the most impressive of documents. But it is a start!
Skipping the Pipelines with groff
One common complaint about the roff system is how long the processing pipelines can get. It’s already a fair bit of typing to make our pdf above; now imagine if you wanted equations, pictures, and tables in your document too. That’s three more commands we need to add to the pipeline!
As a result, the modern convention is to bypass all this pipelining by using
the groff(1) front-end to roff. This program will allow us to specify options
telling it to run certain preprocessors or postprocessors, and thereby allowing
us to skip writing the long pipelines.
To create our output pdf using groff(1), we need only execute the following
command,
% groff -Tpdf hello.tr > output.pdf
When the -T option is specified, groff(1) will automatically call the relevant
postprocessor, so we don’t need to! If no -T option is provided, groff(1) will
default to PostScript output.
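If you’re curious what groff(1) is doing on your behalf, its -V option prints the pipeline it would run instead of running it. The sketch below adds -t, -e, and -p (the options for the tbl, eqn, and pic preprocessors) to mirror the “equations, pictures, and tables” case. I use -Tps here only because the PostScript device is the default one; swapping in -Tpdf should show the pdf postprocessor instead, assuming that device is installed on your system.

```shell
#!/bin/sh
# Sketch: show the pipeline groff would build, without executing it.
# -V prints the pipeline; -t, -e, -p request the tbl, eqn, and pic preprocessors.
if command -v groff >/dev/null 2>&1; then
    pipeline=$(groff -V -t -e -p -Tps hello.tr)
    echo "$pipeline"
else
    pipeline=""
    echo "groff not found; nothing to show"
fi
```

The exact text printed varies between groff versions, but you should recognize the preprocessors, troff, and a postprocessor chained together with pipes, just like the ones we wrote by hand.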
In recent groff(1) references, I have seen a different process listed for
creating a pdf: using the ps2pdf(1) command. This command takes PostScript
input and produces a pdf from it. By taking advantage of the fact that
groff(1) defaults to producing PostScript, a pdf can be created using,
% groff hello.tr | ps2pdf - output.pdf
I’m not entirely sure why these authors use this approach, rather than the postprocessor for creating a pdf directly, but I want to mention it here as you are sure to encounter it in your searches.
Requests and Macros
roff code consists of a sequence of lines, which can be classified as either text lines or control lines. A text line, unsurprisingly, contains text that should appear in the document. A control line represents a command, and will always start with either a . or a ' (a plain, straight apostrophe), followed by a command and any arguments that it might have. These control lines are what are used to format the document.
The standard format of a control line is,
.command_name arg1 arg2
The most common kind of command is a request, which is a low-level formatting or control directive. roff requests allow you to do a variety of things, such as controlling the kerning, adding space, coloring glyphs, etc.
Generally speaking, these requests are too low level to be useful to somebody just trying to write a paper for their English class (sure, you can do that in roff, why not?), and so roff also includes macros. These are a bit like functions, and allow you to do simple things like “make that bold” or “indent this line” or “center that text”, without having to worry about super low-level details of typesetting.
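To make the “a bit like functions” comparison concrete, here is a rough sketch of a hand-rolled macro. The macro name BD and the file names are hypothetical, and this is not how any of the standard packages define their bold macro; it just uses the .de request to define a macro whose first argument is set in bold (via the \fB font escape) and whose second argument follows in the previous font. The roff source is written out with a here-document and rendered as plain text only if groff is installed.

```shell
#!/bin/sh
# Sketch: define and call a tiny custom macro (BD is a made-up name).
workdir=$(mktemp -d)
cat > "$workdir/macro.tr" <<'EOF'
.\" BD: set the first argument in bold, then the second in the previous font.
.de BD
\fB\\$1\fP\\$2
..
.BD hello, " world!"
EOF

if command -v groff >/dev/null 2>&1; then
    # -Tascii renders to plain text, which is handy for a quick look
    groff -Tascii "$workdir/macro.tr" > "$workdir/macro.txt"
    echo "rendered $workdir/macro.txt"
else
    echo "groff not found; wrote the source to $workdir/macro.tr only"
fi
```

The line containing only .. ends the definition, much like a closing brace ends a function body.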
Technically, you could write these yourself out of roff requests. And if you’d
like to give it a shot, the groff(7) man page does include a reference for
all the supported requests (make sure to run man 7 groff to get the right
document), but groff(1) also comes with a number of pre-packaged macro
packages that you can use.
- man
- mandoc
- mdoc
- me
- mm
- ms
- www
Others are available too, such as mom.
While each is a little different, many share some common elements when it comes
to basic formatting. I’ll use the me package for the rest of this article.[3]
To load the package, pass the -me argument to groff(1).
Formatting hello.tr
As a first application of formatting, let’s modify our hello.tr file to include
bold and italics. We’ll render “hello,” in bold, and “world!” in italics. The
macros for bold and italics are, unsurprisingly, .b and .i respectively.
So, we have,
% cat formatted.me
.b hello,
.i world!
% groff -me -Tpdf formatted.me > formatted.pdf
To center the text, we can use the .(c and .)c macros. These should
surround the text to be centered, as below,
% cat centered.me
.(c
.b hello,
.i world!
.)c
% groff -me -Tpdf centered.me > centered.pdf
A listing of many common me macros can be found by reading the groff_me(7)
man page. I’ll introduce a few more as we go, but you should look there for
a more complete list.
Formatting a Simple Paper
Applying some simple formatting directives to a couple of words is a good start, but applying it to a larger paper is a bit more complex. So let’s give that a shot. Let’s grab the first bit of text from This Side of Paradise, courtesy of Project Gutenberg[4]. In raw form, we have
BOOK ONE—The Romantic Egotist
CHAPTER 1. Amory, Son of Beatrice
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O’Hara. In
consequence, Stephen Blaine handed down to posterity his height
of just under six feet and his tendency to waver at crucial
moments, these two abstractions appearing in his son Amory. For
many years he hovered in the background of his family’s life, an
unassertive figure with a face half-obliterated by lifeless,
silky hair, continually occupied in “taking care” of his wife,
continually harassed by the idea that he didn’t and couldn’t
understand her.
But Beatrice Blaine! There was a woman! Early pictures taken on
her father’s estate at Lake Geneva, Wisconsin, or in Rome at the
Sacred Heart Convent—an educational extravagance that in her
youth was only for the daughters of the exceptionally
wealthy—showed the exquisite delicacy of her features, the
consummate art and simplicity of her clothes. A brilliant
education she had—her youth passed in renaissance glory, she was
versed in the latest gossip of the Older Roman Families; known by
name as a fabulously wealthy American girl to Cardinal Vitori and
Queen Margherita and more subtle celebrities that one must have
had some culture even to have heard of. She learned in England to
prefer whiskey and soda to wine, and her small talk was broadened
in two senses during a winter in Vienna. All in all Beatrice
O’Hara absorbed the sort of education that will be quite
impossible ever again; a tutelage measured by the number of
things and people one could be contemptuous of and charming
about; a culture rich in all arts and traditions, barren of all
ideas, in the last of those days when the great gardener clipped
the inferior roses to produce one perfect bud.
Let’s start by bolding and centering the BOOK ONE line.
.(c
.b BOOK ONE - The Romantic Egotist
.)c
The above roff seems reasonable. However, when we run it and take a look we see something rather unexpected,
The text got centered, but of all the words we typed only two of them are present, and of those only the first is bold! What gives?
Well, it has to do with the way that arguments to requests/macros work. Just like arguments on the command line, arguments to requests are separated by spaces. The bold macro makes its first argument bold and appends the second to it (not bold). So, the output that we got should make sense. The other four arguments to the macro were ignored.
Just like on the shell, if we want to have spaces in an argument, we’ll need to wrap the whole thing in quotes. So this,
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
is actually what we want.
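Since the comparison is to the shell, it may help to see the same thing happen there. This is only an analogy, not roff itself: count_args is a throwaway function I made up that reports how many arguments it received.

```shell
#!/bin/sh
# Analogy only: how the shell splits the same title with and without quotes.
count_args() {
    echo "$#"
}

unquoted=$(count_args BOOK ONE - The Romantic Egotist)
quoted=$(count_args "BOOK ONE - The Romantic Egotist")

echo "unquoted: $unquoted arguments"   # six separate words
echo "quoted:   $quoted argument"      # one argument containing spaces
```

Without quotes the title arrives as six arguments, which matches what the bold macro saw: it used the first two and ignored the other four.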
Bearing this lesson in mind, let’s place the chapter title next. We’ll bold the words CHAPTER 1, but leave the name in normal face.
1  .(c
2  .b "BOOK ONE - The Romantic Egotist"
3  .)c
4  .b "CHAPTER 1."
5  Amory, Son of Beatrice
Notice that roff automatically did a line break after we ended the centering on line 3. Generally, roff will handle line breaks, paragraphs, etc., for us, as long as we tell it where to put them (we’ll see this in just a moment). In fact, the manual advises against leaving any blank lines in your input file.
That will result in a rather ugly and hard to read file, though. If you’ve programmed before, you know that spacing things out is quite useful. So there is a way to do this. Simply start the line with a ., and don’t give a command. So we can space our book and chapter titles out a little bit like this,
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.b "CHAPTER 1."
Amory, Son of Beatrice
This change is purely cosmetic. It won’t affect the actual output file.
What if we actually wanted to add some blank space in our output file? Those
two lines are awfully close together, and it doesn’t look all that great. For
this we will use a roff request, .sp. This isn’t a macro, but just looking at
the file, you’d never be able to tell the difference.
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.b "CHAPTER 1."
Amory, Son of Beatrice
This request accepts a numerical argument that states how many lines to skip,
so here we are adding two blank lines between the book and chapter title.
However, although this works, it does go against the manual for groff_me(7),
which states that the .sp request can only be safely used with me after the
first call to .pp (the paragraph macro).
The trouble is that a .pp will result in the title getting indented, which we
don’t want. However, there is also an .lp macro, which creates a paragraph
with no indent. The documentation doesn’t explicitly say that this is safe, but
if things work the way I think they do, it should be just as good. Just be
aware that this is a “your mileage may vary” moment. I’ve never had it cause an
issue, but we are going against the direct advice of the manual here.
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
While I was at it, I threw another .lp before the chapter title, just to be
explicit. If you haven’t gathered already, the .lp and .pp macros are what
we will be using to add paragraph breaks in general. They do add a small amount
of extra blank space as well. If it isn’t obvious from this example of their
use, you’ll be able to see it once we start adding extra paragraphs.
Okay, let’s bring in the first paragraph of text. We’ll use the .lp macro to
start this one off, as this will be the first paragraph in the chapter, and I
prefer to leave this unindented. The .pp macro will work much the same,
except it will add an indent. We’ll see this one in action for the second
paragraph.
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
.
.lp
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O’Hara. In
consequence, Stephen Blaine handed down to posterity his height
of just under six feet and his tendency to waver at crucial
moments, these two abstractions appearing in his son Amory. For
many years he hovered in the background of his family’s life, an
unassertive figure with a face half-obliterated by lifeless,
silky hair, continually occupied in “taking care” of his wife,
continually harassed by the idea that he didn’t and couldn’t
understand her.
A cursory glance would seem to indicate that this all looks okay, however if you look carefully you’ll see that there are a few issues. For example, we see the word didnât, instead of didn’t.
This is a really common problem when dealing with systems like this when you copy and paste text into them, instead of typing it. If you look closely, all of the issues appear where there is either a single quote or a double quote in the input. It’s a simple encoding problem.
When you press the quotation mark key on your keyboard, it corresponds to the character,
"
however, if you look closely, you’ll see that in the input we actually have the character
“
The difference is subtle, but it is enough to confuse roff. Let’s change all the “fancy” quotation marks for normal ones and try again.
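If there are a lot of them, swapping the quotes by hand gets tedious. Here is a small sketch of how the replacement could be scripted with sed. The function name straighten_quotes is my own, and it only covers the four most common curly quote characters, so treat it as a starting point rather than a complete clean-up tool.

```shell
#!/bin/sh
# Sketch: replace curly quotes pasted from elsewhere with plain keyboard quotes.
# Reads text on stdin and writes the cleaned text to stdout.
straighten_quotes() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g'
}

printf '%s\n' 'continually occupied in “taking care” of his wife,' | straighten_quotes
```

To clean a whole file, you could run something like straighten_quotes < pasted.me > clean.me, where both file names are placeholders.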
And that one worked. In fact, if you look at it, all of the single quotes are rendered as the curly ones anyway. Groff will automatically handle translating from the standard “straight” single quotes into curly ones. However, the double quotes are still straight and (some might say) boring.
Luckily, we can fix this too. If we replace the first quotation mark with ``
and the second with '' (that is, two backticks for the first and two straight
single quotes for the second), groff(1) will give us the curly quotes there
too! It just needs to know which one is the opening quote (noted by the
backticks) and which is the closing one (noted by the single quotes), so it
knows what the quotes should look like.
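This substitution can also be roughed out with sed. The sketch below is my own and makes two simplifying assumptions: each pair of straight double quotes opens and closes on the same line, and any line starting with a . is a control line whose quotes must be left alone (lines starting with ' are not handled). The function name directional_quotes is hypothetical.

```shell
#!/bin/sh
# Sketch: turn "quoted text" on text lines into ``quoted text''.
# The sed script lives in a temporary file to avoid awkward shell quoting.
sed_script=$(mktemp)
cat > "$sed_script" <<'EOF'
/^\./!s/"\([^"]*\)"/``\1''/g
EOF

# Reads roff source on stdin and writes the converted source to stdout.
directional_quotes() {
    sed -f "$sed_script"
}

printf '%s\n' '.b "CHAPTER 1."' 'occupied in "taking care" of his wife,' | directional_quotes
```

Skipping control lines matters here: the quotes around a macro argument such as "CHAPTER 1." are there to hold the argument together, not to be printed.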
While we are making edits, this document actually does not meet the
recommendations in the manual for groff(7). Just like the manual recommends that
we avoid adding blank lines to the file, it also recommends that we start each
new sentence on its own line. This won’t actually change the output (again,
it’s just cosmetic), but it might help us stay organized. So let’s make that
edit too,
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
.
.lp
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O'Hara.
.
In consequence, Stephen Blaine handed down to posterity his height of
just under six feet and his tendency to waver at crucial moments, these
two abstractions appearing in his son Amory.
.
For many years he hovered in the background of his family's life, an
unassertive figure with a face half-obliterated by lifeless, silky hair,
continually occupied in ``taking care'' of his wife, continually harassed
by the idea that he didn't and couldn't understand her.
.pp
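The sentence-per-line edit above was done by hand, but a first pass can be automated. This is a deliberately naive sketch of my own: it breaks the line after any period, exclamation mark, or question mark that is followed by spaces, so abbreviations such as “Dr.” will be split incorrectly and the result still needs a read-through. The function name split_sentences is hypothetical.

```shell
#!/bin/sh
# Sketch: start a new line after sentence-ending punctuation followed by spaces.
# Reads text on stdin and writes the re-broken text to stdout.
split_sentences() {
    sed -e 's/\([.!?]\)  */\1\
/g'
}

printf '%s\n' 'But Beatrice Blaine! There was a woman! Early pictures were taken.' | split_sentences
```

It does not add the empty . lines between sentences or rejoin sentences that are wrapped across several input lines; both of those were manual touches in the listing above.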
Okay, so far so good! Now, let’s add the next paragraph of text. I’ve already
added the .pp macro in the code above in preparation. So let’s continue. I’ll
go ahead and replace all the quotes, and put a line break between sentences,
in advance this time.
.lp
.(c
.b "BOOK ONE - The Romantic Egotist"
.)c
.
.sp 2
.
.lp
.b "CHAPTER 1."
Amory, Son of Beatrice
.
.lp
Amory Blaine inherited from his mother every trait, except the
stray inexpressible few, that made him worth while. His father,
an ineffectual, inarticulate man with a taste for Byron and a
habit of drowsing over the Encyclopedia Britannica, grew wealthy
at thirty through the death of two elder brothers, successful
Chicago brokers, and in the first flush of feeling that the world
was his, went to Bar Harbor and met Beatrice O'Hara.
.
In consequence, Stephen Blaine handed down to posterity his height of
just under six feet and his tendency to waver at crucial moments, these
two abstractions appearing in his son Amory.
.
For many years he hovered in the background of his family's life, an
unassertive figure with a face half-obliterated by lifeless, silky hair,
continually occupied in ``taking care'' of his wife, continually harassed
by the idea that he didn't and couldn't understand her.
.pp
But Beatrice Blaine!
.
There was a woman!
.
Early pictures taken on her father's estate at Lake Geneva, Wisconsin, or in
Rome at the Sacred Heart Convent--an educational extravagance that in her youth
was only for the daughters of the exceptionally wealthy--showed the exquisite
delicacy of her features, the consummate art and simplicity of her clothes.
.
A brilliant education she had--her youth passed in renaissance glory, she was
versed in the latest gossip of the Older Roman Families; known by name as a
fabulously wealthy American girl to Cardinal Vitori and Queen Margherita and
more subtle celebrities that one must have had some culture even to have heard
of.
.
She learned in England to prefer whiskey and soda to wine, and her small talk
was broadened in two senses during a winter in Vienna.
.
All in all Beatrice O'Hara absorbed the sort of education that will be quite
impossible ever again; a tutelage measured by the number of things and people
one could be contemptuous of and charming about; a culture rich in all arts and
traditions, barren of all ideas, in the last of those days when the great
gardener clipped the inferior roses to produce one perfect bud.
.pp
Conclusion
And there we have it! Obviously, we’ve barely scratched the surface of roff, and I intend to write a few more articles about it to further explore its features, but this should be enough to get you started! You now know enough roff to handle most non-technical writing tasks, like simple school papers. You also know how to find the manual pages for the different macro packages, which should give you most of what you need to know to use them.
Other References
While the man pages are fairly detailed for groff(1) and its component parts,
they are still man pages, and may not be terribly approachable to somebody
without any background. They are also a bit of a tangled mess, with information
scattered across the man pages for several different programs, and thus tricky
to navigate.
Unfortunately, time has not been kind to roff, and there aren’t a lot of resources that I’ve been able to track down on it outside of those man pages. However, here are a few other sources that I did track down that might be of interest.
- Hall, J. (2018). How to format academic papers on Linux with groff -me. Open Source. https://opensource.com/article/18/2/how-format-academic-papers-linux-groff-me
- Arora, H. (2012). Linux Groff Command Examples to Create Formatted Document. The Geek Stuff. https://www.thegeekstuff.com/2012/09/linux-groff-command-examples/
- Kernighan, B. W., & Pike, R. (1984). The UNIX Programming Environment. Prentice Hall.
- Shotts, W. (2019). The Linux Command Line (2nd ed.). No Starch Press.
The UNIX Programming Environment has a rather comprehensive coverage of several macro packages, as well as some of the preprocessors. It is dated, but most of the content is just as applicable today as it was in the ’80s.
The Linux Command Line covers many aspects of command line Linux, including
groff(1). However, the coverage specifically on groff(1) is minimal; it
feels like it was just added to “check a box”, so to speak. Don’t buy this
book for its groff(1) content. That said, it is a pretty good book to have
around if you’re just learning Linux.
[1] If you’re interested in more background information on roff, you can readily find it in the man page, of all places. Run
% man roff
and read over the first section or two.
[2] PostScript is a programming language created at Adobe for describing documents, with commands for drawing lines and such. Dr. Brailsford has a pretty good video on this language, published via Computerphile on YouTube, if you’re interested. For our purposes here, we don’t care about it beyond knowing it is the output of groff.
[3] If you are interested in reading about a different package, they have man pages. Simply run,
% man groff_*
where * is replaced by the name of the macro package, to access the documentation for that package.
[4] Fitzgerald, F. S. (1920). This Side of Paradise. Project Gutenberg. https://www.gutenberg.org/ebooks/805