Most PDF files do not look readable in a text editor. Compression, encryption, and embedded images are largely to blame. After removing these three components, one can more easily see that PDF is a human-readable document description language.
The Adobe PDF specification (an ISO-approved copy of the ISO 32000-1 standard) includes an example minimal PDF file, but it's possible to trim it down even further. The trickiest part is making sure that all the byte counts are correct.
%PDF-1.1
%¥±ë

1 0 obj
  << /Type /Catalog
     /Pages 2 0 R
  >>
endobj

2 0 obj
  << /Type /Pages
     /Kids [3 0 R]
     /Count 1
     /MediaBox [0 0 300 144]
  >>
endobj

3 0 obj
  <<  /Type /Page
      /Parent 2 0 R
      /Resources
       << /Font
           << /F1
               << /Type /Font
                  /Subtype /Type1
                  /BaseFont /Times-Roman
               >>
           >>
       >>
      /Contents 4 0 R
  >>
endobj

4 0 obj
  << /Length 55 >>
stream
  BT
    /F1 18 Tf
    0 0 Td
    (Hello World) Tj
  ET
endstream
endobj

xref
0 5
0000000000 65535 f 
0000000018 00000 n 
0000000077 00000 n 
0000000178 00000 n 
0000000457 00000 n 
trailer
  <<  /Root 1 0 R
      /Size 5
  >>
startxref
565
%%EOF
|%PDF-1.1||Header; specifies that this file uses PDF version 1.1|
|%¥±ë||Comment containing at least 4 high-bit characters. This example has 6.|
|1 0 obj||Object 1, Generation 0|
|<< /Type /Catalog||Begin a Catalog dictionary|
|/Pages 2 0 R||The root Pages object: Object 2, Generation 0|
|>>||End dictionary|
|endobj||End object|
|2 0 obj||Object 2, Generation 0|
|<< /Type /Pages||Begin a Pages dictionary|
|/Kids [3 0 R]||An array of the individual pages in the document|
|/Count 1||The array contains only one page|
|/MediaBox [0 0 300 144]||Global page size, lower-left to upper-right, measured in points|
|>>||End dictionary|
|endobj||End object|
|3 0 obj||Object 3|
|<< /Type /Page||Begin a Page dictionary|
|/Resources||The resources for this page…|
|<< /Font||Begin a Font resource dictionary|
|<< /F1||Bind the name F1 to a Font dictionary|
|<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman||It's a Type 1 font and the font face is Times-Roman|
|/Contents 4 0 R||The contents of the page: Object 4, Generation 0|
|4 0 obj||Object 4|
|<< /Length 55 >>||A stream, 55 bytes in length|
|stream||Begin stream|
|BT||Begin Text object|
|/F1 18 Tf||Use F1 font at 18 point size|
|0 0 Td||Position the text at 0,0|
|(Hello World) Tj||Show text Hello World|
|ET||End Text|
|endstream||End stream|
|xref||The xref section|
|0 5||A contiguous group of 5 objects, starting with Object 0|
|0000000000 65535 f||Object 0: is object number 0, generation 65535, free, space+linefeed|
|0000000018 00000 n||Object 1: at byte offset 18, generation 0, in use, space+linefeed|
|trailer||The trailer section|
|/Root 1 0 R||The document root is Object 1, Generation 0 (the Catalog dictionary)|
|/Size 5||The document contains 5 indirect objects|
|startxref 565||Where is the newest xref? Byte offset 565|
|%%EOF||End of File|
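One way to avoid counting bytes by hand is to let a short script compute the xref offsets and the startxref value. The following is a minimal sketch in Python, not the method used to produce the file above. The names `build_pdf`, `HEADER`, `OBJECTS`, `STREAM`, `PDF_BYTES`, and the output filename `minimal.pdf` are made up for illustration, and the object bodies are written on single lines, so the computed offsets will differ from the ones shown in the listing.

```python
# Header line, then a comment holding the six high-bit bytes, then a blank line.
HEADER = b"%PDF-1.1\n%" + "¥±ë".encode("utf-8") + b"\n\n"

# Page content stream (same operators as the example, written on one line).
STREAM = b"BT /F1 18 Tf 0 0 Td (Hello World) Tj ET"

# Bodies of objects 1..4, in order. /Length is computed, not typed by hand.
OBJECTS = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 /MediaBox [0 0 300 144] >>",
    b"<< /Type /Page /Parent 2 0 R /Resources << /Font << /F1 << /Type /Font"
    b" /Subtype /Type1 /BaseFont /Times-Roman >> >> >> /Contents 4 0 R >>",
    b"<< /Length %d >>\nstream\n" % len(STREAM) + STREAM + b"\nendstream",
]


def build_pdf(header: bytes, objects: list[bytes]) -> bytes:
    """Concatenate the objects and emit an xref table with measured offsets."""
    out = bytearray(header)
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" begins
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n\n"

    xref_offset = len(out)  # this is the value that follows startxref
    size = len(objects) + 1  # +1 for the free object 0
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


PDF_BYTES = build_pdf(HEADER, OBJECTS)

if __name__ == "__main__":
    # Binary mode keeps the linefeeds (and therefore the offsets) unchanged.
    with open("minimal.pdf", "wb") as f:
        f.write(PDF_BYTES)
```

Because every offset is measured from the bytes already written, editing an object body does not require recounting anything.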
The high-bit comment in this example contains 6 one-byte characters. These happen to show up as 3 two-byte characters when viewing the file as UTF-8-encoded text. To see 6 characters, try changing your browser's character encoding to
The file uses linefeed (0x0a) for the newline character. Windows uses carriage return + linefeed for newline, so pasting into Notepad is not ideal.
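Both byte-level points above can be checked in a few lines of Python. This is only an illustrative sketch: the variable names are made up, and ISO-8859-1 (`latin-1`) is used here as one example of a single-byte encoding, which is an assumption rather than something stated in the text.

```python
# The six bytes that follow '%' on the second line of the file.
COMMENT_BYTES = "¥±ë".encode("utf-8")

# Every byte has its high bit set (value >= 0x80).
all_high_bit = all(b >= 0x80 for b in COMMENT_BYTES)

# Read as UTF-8, the six bytes pair up into three characters.
as_utf8 = COMMENT_BYTES.decode("utf-8")

# Read with a single-byte encoding, each byte is its own character.
as_latin1 = COMMENT_BYTES.decode("latin-1")

# Start of the file with linefeed-only newlines, up to the first object.
lf_text = b"%PDF-1.1\n%" + COMMENT_BYTES + b"\n\n1 0 obj\n"

# What an editor that rewrites newlines as CR+LF would save instead.
crlf_text = lf_text.replace(b"\n", b"\r\n")

offset_lf = lf_text.index(b"1 0 obj")      # matches the xref entry for Object 1
offset_crlf = crlf_text.index(b"1 0 obj")  # shifted by one byte per newline

if __name__ == "__main__":
    print(len(COMMENT_BYTES), len(as_utf8), len(as_latin1))  # 6 3 6
    print(offset_lf, offset_crlf)                            # 18 21
```

Each added carriage return pushes every later object one byte further into the file, so the offsets in the xref table and the startxref value no longer point at the right places.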
|2014 May 03||Add note about 6 high-bit characters appearing as 3 UTF-8 characters. Thanks @pdfkungfu!|
|2012 Jan 08||Comments link|
|2012 Dec 24||The file was not working in some readers (including Adobe Reader!) because the
|2012 Jun 15||Reword download link|
|2012 Jan 26||Remove document trapdoor tech talk link. It seems off-topic.|
|2010 Dec 02||link to Google Tech Talk|
|2010 Nov 20||clean up|
|2010 Sep 27||Small changes, corrections|
|2010 Sep 13||Created|