Reading and writing PDF files
Note
This page only describes the read/write functionality of the pdf_utils package. See The pdf-utils package for further information.
Reading files
Opening PDF files for reading and writing in pyHanko is easy.
For example, to instantiate a PdfFileReader reading from document.pdf, it suffices to do the following.
from pyhanko.pdf_utils.reader import PdfFileReader

with open('document.pdf', 'rb') as doc:
    r = PdfFileReader(doc)
    # ... do stuff ...
In-memory data can be read in a similar way: if buf is a bytes object containing data from a PDF file, you can use it in a PdfFileReader as follows.
from pyhanko.pdf_utils.reader import PdfFileReader
from io import BytesIO
buf = b'<PDF file data goes here>'
doc = BytesIO(buf)
r = PdfFileReader(doc)
# ... do stuff ...
Modifying files
If you want to modify a PDF file, use IncrementalPdfFileWriter, like so.
from pyhanko.pdf_utils.incremental_writer import IncrementalPdfFileWriter

with open('document.pdf', 'rb+') as doc:
    w = IncrementalPdfFileWriter(doc)
    # ... do stuff ...
    w.write_in_place()
Using write_in_place() will cause the generated update to be appended to the same stream as the input stream; this is why we open the file with 'rb+'.
If you want the output to be written to a different file or buffer, use write() instead. In that case, opening the input file with 'rb' is sufficient, since the input stream is only read from.
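To make the file-mode requirement above concrete, here is a minimal sketch that uses only Python's standard library, not pyHanko itself. The helper names append_in_place and is_writable and the placeholder byte strings are invented for illustration. The sketch only shows why a stream that is read and then appended to has to be opened with 'rb+', while 'rb' yields a read-only stream.

```python
import os
import tempfile


def append_in_place(path: str, update: bytes) -> bytes:
    """Append `update` to the file at `path` and return its previous content.

    Hypothetical helper: it mimics the "read, then append to the same
    stream" pattern, but does not produce a real PDF update.
    """
    # 'rb+' opens an existing file for reading and writing without truncating it.
    with open(path, 'rb+') as f:
        original = f.read()       # read the existing content first
        f.seek(0, os.SEEK_END)    # move to the end before writing
        f.write(update)           # the earlier bytes are left untouched
    return original


def is_writable(path: str, mode: str) -> bool:
    """Report whether a file opened with `mode` accepts writes."""
    with open(path, mode) as f:
        return f.writable()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.bin')
        with open(path, 'wb') as f:
            f.write(b'original content\n')  # placeholder, not PDF data

        before = append_in_place(path, b'appended update\n')
        with open(path, 'rb') as f:
            after = f.read()

        # The original bytes survive as a prefix of the updated file.
        print(after.startswith(before))
        # A stream opened with 'rb' cannot be written to; 'rb+' can.
        print(is_writable(path, 'rb'), is_writable(path, 'rb+'))
```

When the output goes to a separate file or buffer, the input stream is never written to, which is why a read-only mode is enough there.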
Note
Due to the way PDF signing works, pyHanko’s signing API will usually take care of calling write or write_in_place as appropriate, and do its own processing of the results.
In most standard use cases, you probably don’t need to worry about explicit
writes too much.
Any IncrementalPdfFileWriter objects used in a signing operation should be discarded afterwards. If you want to continue appending updates to a signed document, create a new IncrementalPdfFileWriter on top of the output.
This should suffice to get you started with pyHanko’s signing and validation functionality, but the reader/writer classes can do a lot more. To learn more about the inner workings of the low-level PDF manipulation layer of the library, take a look at The pdf-utils package or the API reference.
Warning
While the pyhanko.pdf_utils module is very powerful in that it allows you to modify objects in the PDF file in essentially arbitrary ways, and with a lot of control over the output, actually using it in this way requires some degree of familiarity with the PDF standard.
As things are now, pyHanko does not offer any facilities to help you format documents neatly, or to do any kind of layout work beyond the most basic operations. This may or may not change in the future. In the meantime, you’re probably better off using typesetting software or an HTML-to-PDF converter for your more complex layout needs, and letting pyHanko handle the signing step at the end.