Minimal PDF

PDF is a binary format, but it contains mostly plain text

Most PDF files do not look readable in a text editor. Compression, encryption, and embedded images are largely to blame. After removing these three components, one can more easily see that PDF is a human-readable document description language.

With patience, one can write a PDF file by hand

The PDF specification (ISO 32000-1; Adobe distributes an ISO-approved copy) includes an example minimal PDF file, but it's possible to trim it down even further. The trickiest part is making sure that all the byte counts are correct: the stream /Length, the offsets in the xref table, and the startxref value (see the tips page listed under Links).
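
Byte counts are easier to get right with a script than by hand. The following is a minimal sketch in Python, not the method used to produce minimal.pdf. The helper names stream_object and build_pdf are hypothetical, and the objects are written more compactly than in the file below, so most offsets will differ from that listing. The stream content is byte-for-byte the same, so its /Length still comes out as 55.

```python
def stream_object(content: bytes) -> bytes:
    """Wrap content in a stream whose /Length is counted, not typed by hand."""
    # /Length covers only the bytes between "stream\n" and the end-of-line
    # that precedes "endstream"; that final newline is not counted.
    return b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream"


def build_pdf(objects: list[bytes]) -> bytes:
    """Number the objects 1..n and append an xref table with their offsets."""
    # Header line, then a comment holding 6 high-bit bytes, then a blank line.
    out = bytearray(b"%PDF-1.1\n%\xc2\xa5\xc2\xb1\xc3\xab\n\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # offset of the "N 0 obj" line
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n\n"
    xref_offset = len(out)  # the value that goes after "startxref"
    size = len(objects) + 1  # entry 0 (free) plus one entry per object
    out += b"xref\n0 %d\n" % size
    out += b"0000000000 65535 f \n"  # every entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Root 1 0 R /Size %d >>\n" % size
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


content = b"  BT\n    /F1 18 Tf\n    0 0 Td\n    (Hello World) Tj\n  ET"
objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 /MediaBox [0 0 300 144] >>",
    b"<< /Type /Page /Parent 2 0 R /Contents 4 0 R /Resources << /Font << /F1 "
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >> >> >> >>",
    stream_object(content),
]
pdf = build_pdf(objects)
print(len(content))  # the value written as /Length
```

To try the result in a reader, write pdf to a file opened in binary mode ("wb"); text mode may translate line endings and silently invalidate every offset.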

The file (download: minimal.pdf)

%PDF-1.1
%¥±ë

1 0 obj
  << /Type /Catalog
     /Pages 2 0 R
  >>
endobj

2 0 obj
  << /Type /Pages
     /Kids [3 0 R]
     /Count 1
     /MediaBox [0 0 300 144]
  >>
endobj

3 0 obj
  <<  /Type /Page
      /Parent 2 0 R
      /Resources
       << /Font
           << /F1
               << /Type /Font
                  /Subtype /Type1
                  /BaseFont /Times-Roman
               >>
           >>
       >>
      /Contents 4 0 R
  >>
endobj

4 0 obj
  << /Length 55 >>
stream
  BT
    /F1 18 Tf
    0 0 Td
    (Hello World) Tj
  ET
endstream
endobj

xref
0 5
0000000000 65535 f 
0000000018 00000 n 
0000000077 00000 n 
0000000178 00000 n 
0000000457 00000 n 
trailer
  <<  /Root 1 0 R
      /Size 5
  >>
startxref
565
%%EOF
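
Line-by-line explanation

Header; specifies that this file uses PDF version 1.1
Comment containing at least 4 high-bit bytes (values of 128 or greater), which signals to file-transfer tools that the file should be treated as binary. This example has 6.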

Object 1, Generation 0
  Begin a Catalog dictionary
    The root Pages object: Object 2, Generation 0
  End dictionary
End object

Object 2, Generation 0
  Begin a Pages dictionary
    An array of the individual pages in the document
    The array contains only one page
    Global page size, lower-left to upper-right, measured in points
  End dictionary
End object

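Object 3, Generation 0
  Begin a Page dictionary
    The parent of this page: Object 2, Generation 0
    The resources for this page…
      Begin a Font resource dictionary
        Bind the name F1 to
          a Font dictionary
            It's a Type 1 font
            and the font face is Times-Roman
          End Font dictionary
        End Font resource dictionary
      End Resources dictionary
    The contents of the page: Object 4, Generation 0
  End dictionary
End object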

Object 4, Generation 0
  A stream, 55 bytes in length (not counting the newline before endstream)
Begin stream
  Begin Text object
    Use F1 font at 18 point size
    Position the text at 0,0
    Show text Hello World
  End Text
End stream
End object

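The xref section
A contiguous group of 5 entries, starting with Object 0
Object 0: next free object is number 0, generation 65535, free, space+linefeed
Object 1: at byte offset 18, generation 0, in use, space+linefeed
Object 2: at byte offset 77, generation 0, in use, space+linefeed
Object 3: at byte offset 178, generation 0, in use, space+linefeed
Object 4: at byte offset 457, generation 0, in use, space+linefeed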
The trailer section
  The document root is Object 1, Generation 0 (the Catalog dictionary)
  The xref table has 5 entries: Object 0 (free) plus the 4 indirect objects
  End dictionary
Where is the newest xref?
byte offset 565
End of File
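
Because every xref entry has the same fixed layout, it can be checked mechanically. The following is a minimal sketch in Python, assuming the entry layout from the PDF specification: a 10-digit number, a space, a 5-digit generation, a space, the letter n or f, then a two-byte end-of-line (space+linefeed in this file), for 20 bytes in total. The names XrefEntry and parse_xref_entry are hypothetical and not part of any PDF library.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    number: int       # byte offset if in use, next free object number if free
    generation: int
    in_use: bool


def parse_xref_entry(entry: bytes) -> XrefEntry:
    """Parse one 20-byte xref entry, including its two-byte end-of-line."""
    if len(entry) != 20:
        raise ValueError(f"expected 20 bytes, got {len(entry)}")
    if entry[10:11] != b" " or entry[16:17] != b" ":
        raise ValueError("fields must be separated by single spaces")
    kind = entry[17:18]
    if kind not in (b"n", b"f"):
        raise ValueError("entry type must be 'n' (in use) or 'f' (free)")
    # int() accepts ASCII digits in a bytes object and ignores leading zeros.
    return XrefEntry(int(entry[:10]), int(entry[11:16]), kind == b"n")


# The first two entries from the xref table above.
free_entry = parse_xref_entry(b"0000000000 65535 f \n")
object_1 = parse_xref_entry(b"0000000018 00000 n \n")
print(free_entry)
print(object_1)
```

An entry that has lost its trailing space (a common result of editors that strip whitespace at line ends) is 19 bytes long and is rejected by the length check.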

Notes

The high bit comment in this example contains 6 one-byte characters. These happen to show up as 3 two-byte characters when viewing the file as UTF-8 encoded text. To see 6 characters, try changing your browser's character encoding to Western.
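
The effect can be reproduced outside a browser. The following is a small Python sketch, assuming the 6 bytes are the UTF-8 encoding of the three characters shown in the listing, and that the browser's "Western" setting corresponds to ISO-8859-1 (Latin-1); the variable names are illustrative only.

```python
# The 6 bytes that follow "%" on the second line of the file.
marker = b"\xc2\xa5\xc2\xb1\xc3\xab"

# Every byte has its high bit set, which is what the comment is for.
all_high_bit = all(byte >= 0x80 for byte in marker)

# Read as UTF-8, each pair of bytes forms one character.
as_utf8 = marker.decode("utf-8")

# Read as a one-byte-per-character encoding, each byte is its own character.
as_western = marker.decode("latin-1")

print(len(marker), all_high_bit)
print(len(as_utf8), as_utf8)
print(len(as_western), as_western)
```

Either way the file contains the same 6 bytes; only the viewer's interpretation changes, which is why the byte offsets in the xref table are unaffected.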

Differences from the minimal PDF file in the Adobe spec.

Links

http://blog.idrsolutions.com/?s=%22Make+your+own+PDF+file%22
A series of posts that explains how to write PDF files from scratch.
For a gentler introduction to the specification, read some tips on writing a PDF file (this site) or an introduction to PDF (another site).

Updates

2014 May 03 Add note about 6 high bit characters appearing as 3 UTF-8 characters. Thanks @pdfkungfu!
2012 Jan 08 Comments link
2012 Dec 24 The file was not working in some readers (including Adobe Reader!) because the Contents stream needed to be an indirect object. All streams shall be indirect objects.
2012 Jun 15 Reword download link
2012 Jan 26 Remove document trapdoor tech talk link; it seems off-topic.
2010 Dec 02 Link to Google Tech Talk
2010 Nov 20 Clean up
2010 Sep 27 Small changes, corrections
2010 Sep 13 Created