Saturday, September 29, 2012

PDF Font Obsessive

I ran into another issue using Flying Saucer/XHTMLRenderer to generate PDFs: i18n.
I've had to dig into Flying Saucer a few times before (finding and upgrading to R9 for CSS fixes such as word wrap, and overriding image resource loading), and this time was no different.

During testing, some i18n characters were missing from the PDF (such as Polish characters, including this one).
I first suspected a problem with UTF-8 handling, but after debugging I found that everything (property files, Unicode code points, XHTML) was handled correctly right up until iText created the PDF.
The missing characters seem to be explained by the default fonts, which are included and handled only as Latin-1.
To handle UTF-8, fonts have to be added manually, as shown in the Flying Saucer user guide. And these fonts will be embedded in the PDF (even when told not to be).
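To see why Latin-1 handling loses these characters, here is a small stdlib-only Java sketch (not part of Flying Saucer or iText) that lists the code points of a string that ISO-8859-1 cannot encode. The class name `Latin1Check` and the Polish sample text ("Zażółć gęślą jaźń", written with Unicode escapes) are just for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Latin1Check {

    /**
     * Lists the code points in text that ISO-8859-1 (Latin-1) cannot encode.
     * A font handled only as Latin-1 has no way to show these characters.
     */
    static List<String> nonLatin1(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        List<String> missing = new ArrayList<>();
        text.codePoints().forEach(cp -> {
            if (!latin1.canEncode(new String(Character.toChars(cp)))) {
                missing.add(String.format("U+%04X %s", cp, Character.getName(cp)));
            }
        });
        return missing;
    }

    public static void main(String[] args) {
        // Polish sample text, written with Unicode escapes so the source file encoding doesn't matter
        String polish = "Za\u017C\u00F3\u0142\u0107 g\u0119\u015Bl\u0105 ja\u017A\u0144";
        nonLatin1(polish).forEach(System.out::println);
    }
}
```

Of the accented letters in the sample, only ó (U+00F3) is in Latin-1; the rest, such as ł (U+0142), are reported, and those are exactly the kind of characters that went missing.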

In my CSS I was referencing Arial and other fonts.
To keep them, I'd have to reference their absolute TTF file paths and embed them in the PDF.
Since I'm not sure about the legality of this, I searched for alternative fonts with clear licensing.
I tried FreeSans and DejaVuSans, and for now have settled on Liberation Sans, which matches Arial the best (and has the least vertical spacing).
Embedding the different font variants (sans/serif, regular/bold) ended up adding 40-60 KB to the PDF size.
But thankfully, the UTF-8 characters now display in the output PDF.
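For reference, here is a minimal sketch of registering substitute fonts, assuming the `com.lowagie` iText 2.x that Flying Saucer builds of this era depend on. Each TTF is added to `ITextRenderer`'s font resolver with `BaseFont.IDENTITY_H` (a Unicode encoding rather than Latin-1) and `BaseFont.EMBEDDED`, since the fonts end up embedded anyway. The class name `UnicodePdf`, the `render` helper, the output file name, and passing the TTF paths as command-line arguments are hypothetical choices for illustration.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;

import com.lowagie.text.pdf.BaseFont;
import org.xhtmlrenderer.pdf.ITextFontResolver;
import org.xhtmlrenderer.pdf.ITextRenderer;

public class UnicodePdf {

    /** Registers each TTF file with a Unicode encoding, then renders the XHTML to PDF. */
    static void render(String xhtml, List<String> ttfPaths, OutputStream out) throws Exception {
        ITextRenderer renderer = new ITextRenderer();
        ITextFontResolver fonts = renderer.getFontResolver();
        for (String ttf : ttfPaths) {
            // IDENTITY_H: Unicode encoding instead of the Latin-1 default; the font is embedded
            fonts.addFont(ttf, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
        }
        renderer.setDocumentFromString(xhtml);
        renderer.layout();
        renderer.createPDF(out);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical: absolute TTF paths (e.g. Liberation Sans regular/bold) come from the command line
        List<String> ttfPaths = Arrays.asList(args);
        // The CSS refers to the registered font by its family name
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>"
                + "<style type=\"text/css\">body { font-family: 'Liberation Sans'; }</style>"
                + "</head><body><p>Za\u017C\u00F3\u0142\u0107 g\u0119\u015Bl\u0105 ja\u017A\u0144</p></body></html>";
        try (OutputStream out = new FileOutputStream("unicode.pdf")) {
            render(xhtml, ttfPaths, out);
        }
    }
}
```

Run it with something like `java UnicodePdf /path/to/LiberationSans-Regular.ttf /path/to/LiberationSans-Bold.ttf` (paths hypothetical). Keeping the font registration in one helper makes it easy to reuse the same set of substitute fonts for every PDF the application generates.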

1 comment:

  1. For now I've actually ended up using FreeMono, since it matches Courier almost exactly, unlike Liberation Mono.
