There have been a raft of reviews of the new Kindle and the various iPhone reading applications lately. In general, reviewers are more positive about the experience of reading from a screen than they have been in the past. However, I’ve noticed that one enormous factor in reading tends to get passed by; maybe it’s not something that people notice if they don’t think about book design. See if you can identify it from these screenshots:
The new Kindle 2:
All of these screen-reading environments fully justify their paragraphs of text: there’s not a ragged right margin. This is what we tend to expect books to look like: typically, a book page has an even rectangle of text on it, a tradition that extends back to Gutenberg’s 42-line Bible:
One might notice here, however, that Gutenberg’s page has something that the screen-reading environments do not: hyphenation. When Gutenberg’s words didn’t fit in a line (see, for example, the third line down in the right column), he broke them with a hyphen, starting a tradition in book design that has made its way to the present moment. The reason for hyphenation is apparent if you look at the shots of the screen-reading devices: if words aren’t split, the spacing between words must often be increased, making the line harder for the eye to follow. This is more apparent when the width of the text column (called the measure) is narrow, as is the case on iPhone apps: notice how spaced-out the penultimate line, “necessary to effectiveness in an”, is in the eReader screenshot. The Kindle and the Sony Reader look a little bit better because there aren’t such glaring white spaces in the text, although weirdly both appear to have lines in the middle of paragraphs that aren’t fully justified.
Why don’t these reading devices hyphenate their lines if they fully justify them? This is, for what it’s worth, a problem that affects more than just these devices; plenty of text on the web is fully justified and has no hyphenation. The problem is that hyphenation is trickier than it might initially appear. To properly hyphenate a paragraph, the hyphenator needs to understand at least something about how the language that the paragraph is written in works. Here’s how Robert Bringhurst outlines what he calls the “etiquette of hyphenation and pagination” as rules for compositors in his authoritative Elements of Typographic Style:
2.4.1. At hyphenated line-ends, leave at least two characters behind and take at least three forward.
2.4.2. Avoid leaving the stub-end of a hyphenated word, or any word shorter than four letters, as the last line of a paragraph.
2.4.3. Avoid more than three consecutive hyphenated lines.
2.4.4. Hyphenate proper names only as a last resort unless they occur with the frequency of common nouns.
2.4.5. Hyphenate according to the conventions of the language.
2.4.6. Link short numerical and mathematical expressions with hard spaces.
2.4.7. Avoid beginning more than two consecutive lines with the same word.
2.4.8. Never begin a page with the last line of a multi-line paragraph.
2.4.9. Balance facing pages by moving single lines.
2.4.10. Avoid hyphenated breaks where the text is interrupted.
2.4.11. Abandon any and all rules of hyphenation and pagination that fail to serve the needs of the text.
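A couple of these rules are mechanical enough to sketch in code. What follows is a minimal Python illustration of rules 2.4.1 and 2.4.3, not anything Bringhurst provides: the function names, the idea of passing candidate break positions as character indices, and the shortcut of treating any line that ends in "-" as a hyphenated line are all assumptions made for the example.

```python
def allowed_breaks(word, candidates, min_left=2, min_right=3):
    """Rule 2.4.1: leave at least two characters behind, take at least three forward.

    candidates are indices into word; a break at index i splits the word
    into word[:i] (stays on the line) and word[i:] (moves to the next line).
    """
    return [i for i in candidates
            if i >= min_left and len(word) - i >= min_right]


def too_many_hyphenated_lines(lines, limit=3):
    """Rule 2.4.3: avoid more than three consecutive hyphenated lines.

    Assumption: a line counts as hyphenated if it ends in "-". Real text
    would need to distinguish word breaks from ordinary hyphens and dashes.
    """
    run = 0
    for line in lines:
        run = run + 1 if line.rstrip().endswith("-") else 0
        if run > limit:
            return True
    return False


# "on-ly" would carry only two letters forward, so the break is rejected.
print(allowed_breaks("only", [2]))             # []
# Both breaks in "hy-phen-ation" satisfy the rule.
print(allowed_breaks("hyphenation", [2, 6]))   # [2, 6]
```

Rule 2.4.11 is, of course, the one that can’t be written down this way.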
Rule 2.4.5 might be worth quoting in full:
In English we hyphenate cab-ri-o-let but in French ca-brio-let. The old German rule which hyphenated Glockenspiel as Glok-kenspiel was changed by law in 1998, but when össze is broken in Hungarian, it still turns into ösz-sze. In Spanish the double consonants ll and rr are never divided. (The only permissible hyphenation in the phrase arroz con pollo is thus arroz con po-llo.) The conventions of each language are part of its typographic heritage and should normally be followed, even when setting single foreign words or brief quotations.
Can a computer hyphenate texts? Sure: if these rules can be made comprehensible to a computer, it can sensibly hyphenate a text. Donald Knuth’s TeX typesetting program, for example, relies on hyphenation patterns: compact, language-specific rules, derived from hyphenation dictionaries and supplemented by a list of exceptions, that mark the points at which a word can be broken. Possible breaks are then weighed against one another: each hyphen carries a penalty, and each line is scored for “badness” according to how far its word spaces have to stretch or shrink. It might be worse to use hy-phenation than hyphen-ation, for example, but it would be even worse not to break the word and leave a gap of white space in the line. The TeX engine tries to find the least bad way to set a whole paragraph; it usually does a reasonable job. Not all hyphenation is equal, however: Adobe InDesign, for example, will do a much better job of hyphenating a paragraph than Microsoft Word will.
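The idea of choosing the least bad way to end a line can be sketched in a few lines of Python. This is a deliberately simplified, one-line-at-a-time illustration and not TeX’s actual algorithm, which weighs the lines of a whole paragraph together. The function name, the squared-slack cost, the penalty value of 50, and the use of character counts in place of real glyph widths are all assumptions made for the example.

```python
HYPHEN_PENALTY = 50  # hypothetical cost of ending a line with a hyphen


def best_line_end(line_so_far, next_word, break_points, width):
    """Pick the cheapest way to end a line that next_word may overflow.

    line_so_far:  non-empty text already on the line (assumed to fit)
    next_word:    the word that may not fit
    break_points: indices in next_word where a hyphen is permitted
    width:        line width in characters (a stand-in for the measure)
    Returns (text for this line, text carried to the next line).
    """
    def slack_cost(text):
        # Leftover space has to be absorbed by stretching the word spaces;
        # squaring makes very loose lines disproportionately expensive.
        return (width - len(text)) ** 2

    whole = f"{line_so_far} {next_word}"
    if len(whole) <= width:
        return whole, ""

    # Option 1: no hyphen; the whole word moves to the next line.
    options = [(slack_cost(line_so_far), line_so_far, next_word)]
    # Remaining options: break the word at each permitted point that fits.
    for i in break_points:
        text = f"{line_so_far} {next_word[:i]}-"
        if len(text) <= width:
            options.append((slack_cost(text) + HYPHEN_PENALTY, text, next_word[i:]))

    _, text, rest = min(options, key=lambda option: option[0])
    return text, rest


print(best_line_end("the rules of", "hyphenation", [2, 6], 20))
# ('the rules of hyphen-', 'ation')
```

With a 20-character line, leaving the word whole costs 64 (eight characters of slack), hy-phenation costs 66, and hyphen-ation costs 50, so the last wins. Narrow the line to 14 characters and neither break fits, so the word moves down unbroken.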
And: as rule 2.4.5 suggests, if a computer is going to hyphenate something, it needs to know what language the text is in. This is a job for metadata: electronic books could have an indicator of what language they’re in, and the reader application could hyphenate automatically. But that won’t always help: in the text on the Kindle screen, for example, der Depperte isn’t English, and nothing in a book-level language indicator would say so. A human compositor could catch that; a computer wouldn’t guess, and would have to default to not breaking it. The same problem will happen with proper names.
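A small Python sketch shows both halves of this: the language tag decides where a word may break, and a word the hyphenator doesn’t know stays whole. The tiny BREAKS table is a hypothetical stand-in for real per-language hyphenation data, holding only Bringhurst’s cabriolet example, and the function name is likewise made up for the illustration.

```python
# Hypothetical hyphenation data, keyed by language code and then by word.
# Values are the indices at which the word may be broken.
BREAKS = {
    "en": {"cabriolet": [3, 5, 6]},   # cab-ri-o-let
    "fr": {"cabriolet": [2, 6]},      # ca-brio-let
}


def hyphenate(word, lang):
    """Return word with every permitted break marked by a hyphen.

    Unknown languages and unknown words fall back to no breaks at all,
    which is the safe default described above.
    """
    points = BREAKS.get(lang, {}).get(word.lower(), [])
    starts = [0] + points
    ends = points + [len(word)]
    return "-".join(word[a:b] for a, b in zip(starts, ends))


print(hyphenate("cabriolet", "en"))   # cab-ri-o-let
print(hyphenate("cabriolet", "fr"))   # ca-brio-let
print(hyphenate("Depperte", "en"))    # Depperte
```

The last call is the Kindle case: the book says it is English, the word isn’t, and the only safe thing the software can do is leave it alone.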
There aren’t really easy solutions for this problem. A smarter ebook reading device (and smarter ebooks) might hyphenate automatically; if this were the case, the device would need to rehyphenate whenever the user changed the font or the font size. (There are some possibilities in HTML, but they do require a lot of work on the part of the author or designer; some day this might work better.) It’s not a problem with PDFs, of course, but PDFs don’t allow reflowing text. There’s no shame in using a ragged right margin; at least then one might not be subject to Bringhurst’s opprobrium towards the poorly justified in The Elements of Typographic Style:
A typewriter (or a computer-driven printer of similar quality) that justifies its lines in imitation of typesetting is a presumptuous machine, mimicking the outer form instead of the inner truth of typography.