Most PDF files do not look readable in a text editor. Compression, encryption, and embedded images are largely to blame. After removing these three components, one can more easily see that PDF is a human-readable document description language.
The Adobe PDF specification (an ISO-approved copy of the ISO 32000-1 standard) includes an example minimal PDF file, but it's possible to trim it down even further. The trickiest part is making sure that all the byte counts are correct (see the notes below).
%PDF-1.1 | Header; specifies that this file uses PDF version 1.1 |
%¥±ë | Comment containing at least 4 high bit bytes. This example has 6, shown as 3 two-byte characters in UTF-8. |
1 0 obj | Object 1, Generation 0 |
<< /Type /Catalog | Begin a Catalog dictionary |
/Pages 2 0 R | The root Pages object: Object 2, Generation 0 |
>> | End dictionary |
endobj | End object |
2 0 obj | Object 2, Generation 0 |
<< /Type /Pages | Begin a Pages dictionary |
/Kids [3 0 R] | An array of the individual pages in the document |
/Count 1 | The array contains only one page |
/MediaBox [0 0 300 144] | Global page size, lower-left to upper-right, measured in points |
>> | End dictionary |
endobj | End object |
3 0 obj | Object 3 |
<< /Type /Page | Begin a Page dictionary |
/Parent 2 0 R | |
/Resources | The resources for this page… |
<< /Font | Begin a Font resource dictionary |
<< /F1 | Bind the name F1 to a Font dictionary |
<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >> | It's a Type 1 font and the font face is Times-Roman |
>> >> | |
/Contents 4 0 R | The contents of the page: Object 4, Generation 0 |
>> | |
endobj | |
4 0 obj | Object 4 |
<< /Length 55 >> | A stream, 55 bytes in length |
stream | Begin stream |
BT | Begin Text object |
/F1 18 Tf | Use F1 font at 18 point size |
0 0 Td | Position the text at 0,0 |
(Hello World) Tj | Show text Hello World |
ET | End Text |
endstream | End stream |
endobj | |
xref | The xref section |
0 5 | A contiguous group of 5 objects, starting with Object 0 |
0000000000 65535 f | Object 0: is object number 0, generation 65535, free, space+linefeed |
0000000018 00000 n | Object 1: at byte offset 18, generation 0, in use, space+linefeed |
0000000077 00000 n | |
0000000178 00000 n | |
0000000457 00000 n | |
trailer | The trailer section |
<< /Root 1 0 R | The document root is Object 1, Generation 0 (the Catalog dictionary) |
/Size 5 | The document contains 5 indirect objects |
>> | |
startxref | Where is the newest xref? |
565 | byte offset 565 |
%%EOF | End of File |
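Counting bytes by hand is error-prone, so one option is to let a short script do it. The following is a minimal sketch, not the method used to produce the file above: it assembles the same four objects and records each object's byte offset as it goes, then writes the xref table and the `startxref` value from those recorded numbers. The function name `build_minimal_pdf` is made up for this illustration, and the whitespace is more compact than in the listing above, so the computed values (offsets, stream `/Length`) will differ from 18, 77, 178, 457, 565, and 55.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a one-page PDF, computing every byte count instead of hand-counting.

    Hypothetical helper for illustration; layout differs from the listing above.
    """
    content = b"BT /F1 18 Tf 0 0 Td (Hello World) Tj ET"
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 /MediaBox [0 0 300 144] >>",
        b"<< /Type /Page /Parent 2 0 R "
        b"/Resources << /Font << /F1 << /Type /Font /Subtype /Type1 "
        b"/BaseFont /Times-Roman >> >> >> /Contents 4 0 R >>",
        # /Length counts only the bytes between "stream\n" and the
        # newline that precedes "endstream".
        b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
    ]

    # Header, then a comment of six high-bit bytes (the UTF-8 encoding of "¥±ë").
    out = bytearray(b"%PDF-1.1\n%\xc2\xa5\xc2\xb1\xc3\xab\n")

    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n%s\nendobj\n" % (number, body)

    xref_offset = len(out)  # byte offset of the "xref" keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # free entry for object 0, 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # in-use entry, 20 bytes
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

To try it, write the result in binary mode, for example `open("minimal.pdf", "wb").write(build_minimal_pdf())`. Binary mode matters because text mode may translate newlines and invalidate the offsets.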
Minimal PDF with linefeed newlines: minimal.pdf
Minimal PDF with linefeed newlines and a license comment: minimal_l.pdf
Unlike the example in the specification, this file has no Outlines dictionary.
Linefeed (0x0a) is used for the newline character.
Windows conventionally uses carriage return + linefeed for newline, so pasting into Notepad is not ideal.
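To see why the newline convention matters, the sketch below takes a small PDF-like fragment (made up for this illustration, not the full file) and compares where each object starts before and after converting linefeeds to carriage return + linefeed. Every converted newline adds one byte, so each object moves later by the number of newlines that precede it, and any offsets already written into an xref table would no longer match. The helper name `object_offsets` is an assumption of this example.

```python
def object_offsets(data: bytes, count: int) -> list:
    """Byte offset of each 'N 0 obj' marker for N = 1..count."""
    return [data.index(b"%d 0 obj" % n) for n in range(1, count + 1)]


# A two-object fragment using linefeed (0x0a) newlines.
lf_text = (
    b"%PDF-1.1\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)

# The same fragment after an editor rewrites newlines as CR+LF.
crlf_text = lf_text.replace(b"\n", b"\r\n")

lf_offsets = object_offsets(lf_text, 2)
crlf_offsets = object_offsets(crlf_text, 2)

# Each shift equals the number of newlines before that object.
shifts = [crlf - lf for lf, crlf in zip(lf_offsets, crlf_offsets)]
print(lf_offsets, crlf_offsets, shifts)
```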
MIT License. See the LICENSE file.
25 Jul 2020 | Move note about 6 high bit bytes appearing as 3 UTF-8 characters to be an inline comment. |
14 Mar 2019 | Correct download links for the PDF files that include license comments. |
02 Dec 2018 | Add MIT License because placing in the public domain is not supported in all jurisdictions. |
03 May 2014 | Add note about 6 high bit characters appearing as 3 UTF-8 characters. Thanks @pdfkungfu! |
08 Jan 2012 | Comments link. |
24 Dec 2012 | The file was not working in some readers (including Adobe Reader!) because the Contents stream needed to be an indirect object. The specification states: "All streams shall be indirect objects." |
15 Jun 2012 | Reword download link. |
26 Jan 2012 | Remove document trapdoor tech talk link. It seems off-topic. |
02 Dec 2010 | Link to Google Tech Talk. |
20 Nov 2010 | Clean up. |
27 Sep 2010 | Small changes, corrections. |
13 Sep 2010 | Created. |