It seems like it should be simple: a few pages of text and a couple of images. But save it as a PDF and suddenly you're looking at a 50MB file. What's going on? Here's the complete breakdown.
1. High-Resolution Images
This is by far the most common cause. When you insert a photo into a Word document or presentation and save it as PDF, the image is often stored at its full original resolution — even if it's displayed at a small size on the page.
A photo taken on a modern smartphone is typically 12–50 megapixels and 3–8MB. If you embed ten such photos in a document, that alone is 30–80MB before anything else is counted.
The fix: use PDF compression to resample images to a resolution appropriate for the intended use. Around 150 DPI is a common target for screen viewing, and around 300 DPI for standard print.
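To make the resampling idea concrete, here is a minimal sketch of the arithmetic, assuming a 12-megapixel photo (4000 × 3000 pixels) placed at 4 × 3 inches on the page and a 150 DPI target. The function names and the example figures are illustrative assumptions, not part of any particular compressor. Pixel count is only a proxy for file size, since the final size also depends on the image compression used.

```python
import math

def target_pixels(display_w_in, display_h_in, dpi):
    """Pixel dimensions needed to show an image at a given page size and DPI."""
    return math.ceil(display_w_in * dpi), math.ceil(display_h_in * dpi)

def pixel_reduction(orig_w, orig_h, display_w_in, display_h_in, dpi):
    """Fraction of the original pixels that resampling would discard.

    Returns 0.0 when the image is already at or below the target size.
    """
    target_w, target_h = target_pixels(display_w_in, display_h_in, dpi)
    if target_w * target_h >= orig_w * orig_h:
        return 0.0
    return 1 - (target_w * target_h) / (orig_w * orig_h)

# Assumed example: 4000 x 3000 photo displayed at 4 in x 3 in, 150 DPI target
print(target_pixels(4, 3, 150))                          # (600, 450)
print(round(pixel_reduction(4000, 3000, 4, 3, 150), 4))  # 0.9775
```

In this assumed case almost 98% of the stored pixels are never visible at the displayed size, which is why image resampling tends to dominate the savings.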
2. Embedded Fonts
PDFs embed font files to ensure consistent rendering across different systems. Each font file can be 100KB–1MB. A document using several different fonts (headings, body text, code blocks, etc.) may embed 500KB–3MB of font data.
Well-made PDFs use font subsetting, which embeds only the characters actually used and reduces font data significantly. Poorly made PDFs embed entire font files, including glyphs the document never uses.
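One rough way to see whether a PDF's fonts are subsetted: by convention, a subset font's name is prefixed with six uppercase letters and a plus sign (for example `ABCDEF+Helvetica`). The sketch below scans the raw bytes of a file for font names and sorts them by that prefix. It is a heuristic under stated assumptions: the function name is hypothetical, fonts declared inside compressed object streams will not be found, and a name without the prefix may simply be a standard font that is not embedded at all.

```python
import re

# Matches "/BaseFont /Name" or "/FontName /Name" in uncompressed PDF objects
FONT_NAME = re.compile(rb"/(?:BaseFont|FontName)\s*/([^\s/<>\[\]()]+)")
SUBSET_PREFIX = re.compile(r"[A-Z]{6}\+")

def classify_font_names(pdf_bytes):
    """Group font names found in the raw bytes by whether they carry a subset prefix."""
    names = {raw.decode("latin-1") for raw in FONT_NAME.findall(pdf_bytes)}
    subset = sorted(n for n in names if SUBSET_PREFIX.match(n))
    not_subset = sorted(n for n in names if not SUBSET_PREFIX.match(n))
    return {"subset": subset, "not_subset": not_subset}

# Synthetic fragment for illustration; for a real file, pass
# pathlib.Path("your-file.pdf").read_bytes() (the path is a placeholder)
sample = b"<< /Type /Font /BaseFont /ABCDEF+Helvetica >> << /Type /Font /BaseFont /Times-Roman >>"
print(classify_font_names(sample))
```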
3. Scanned Pages
Scanned documents store each page as a raster image at scanner resolution. At 300 DPI, a Letter or A4 page is around 2500 × 3500 pixels. At 24-bit colour, that is roughly 25MB of raw pixel data per page, and typically still a few megabytes per page after JPEG compression.
A 10-page scanned contract can easily be 20–40MB. This is pure image data: unless OCR has been applied there is no text layer and no vector graphics, just a large pixel grid for every page.
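The per-page numbers can be checked with a short calculation. This sketch assumes a US Letter page (8.5 × 11 inches) and computes the uncompressed pixel data at different colour depths. The function name is illustrative, and real scans are stored compressed, so actual files are much smaller than these raw figures.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed page: US Letter at 300 DPI (2550 x 3300 pixels)
colour = raw_scan_bytes(8.5, 11, 300, bits_per_pixel=24)
grey = raw_scan_bytes(8.5, 11, 300, bits_per_pixel=8)
mono = raw_scan_bytes(8.5, 11, 300, bits_per_pixel=1)

print(colour)  # 25245000 bytes, about 25MB
print(grey)    # 8415000 bytes, about 8.4MB
print(mono)    # 1051875 bytes, about 1MB
```

The gap between colour and black-and-white is one reason text-only scans shrink so much when a compressor converts them to greyscale or 1-bit images.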
4. Hidden Metadata and Revision History
PDF files can accumulate hidden data over time:
- Author name, organization, creation date
- Earlier versions of the content left behind by incremental saves (many editors append changes to the end of the file rather than rewriting it)
- Thumbnail images for page previews
- Color profile (ICC profile) data
- Undo history from certain PDF editors
While metadata alone is rarely more than a few hundred KB, documents with long editing histories can accumulate several MB of invisible legacy data.
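A quick, hedged way to check for leftover data is to look at the raw bytes of the file. Each incremental save normally appends a new `%%EOF` marker, so more than one marker suggests older content may still be in the file. The sketch below is a heuristic only: the function names are hypothetical, linearized ("fast web view") PDFs can contain two markers without any editing history, and metadata stored in compressed streams will not show up in a plain byte search.

```python
def count_eof_markers(pdf_bytes):
    """Count %%EOF markers; more than one hints at incremental saves."""
    return pdf_bytes.count(b"%%EOF")

def find_info_keys(pdf_bytes, keys=(b"/Author", b"/Producer", b"/CreationDate")):
    """Report which common document-information keys appear in the raw bytes."""
    return {key.decode("ascii"): key in pdf_bytes for key in keys}

# Synthetic two-revision fragment for illustration; for a real file, pass
# pathlib.Path("your-file.pdf").read_bytes() (the path is a placeholder)
sample = (
    b"%PDF-1.4\n1 0 obj\n<< /Author (A. Writer) >>\nendobj\n%%EOF\n"
    b"2 0 obj\n<< >>\nendobj\n%%EOF\n"
)
print(count_eof_markers(sample))  # 2
print(find_info_keys(sample))
```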
5. PDF Version and Creator Software
Some software creates PDFs less efficiently than others. Microsoft PowerPoint, for example, often produces large PDFs because it embeds slide images at high resolution and includes extensive font and formatting data. Conversely, a PDF exported from a specialized tool like LaTeX is typically very compact for the same content.
Newer PDF versions (PDF 1.5 and later) support object streams and cross-reference streams that reduce file structure overhead. Older PDFs or those produced by legacy software may use less efficient internal structures.
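To see which of these structures a given file uses, the header and a couple of marker names can be read straight from the raw bytes. This is a minimal sketch with hypothetical function names. Two caveats apply: the header version can be overridden by a `/Version` entry in the document catalog, which this sketch ignores, and the byte search only shows that the markers occur somewhere in the file.

```python
import re
from typing import Optional

def pdf_header_version(pdf_bytes: bytes) -> Optional[str]:
    """Return the version from the %PDF-x.y header, or None if it is missing."""
    match = re.search(rb"%PDF-(\d\.\d)", pdf_bytes[:1024])
    return match.group(1).decode("ascii") if match else None

def compact_structures(pdf_bytes: bytes) -> dict:
    """Check for the type names used by object streams and cross-reference streams."""
    return {
        "object_streams": b"/ObjStm" in pdf_bytes,
        "xref_streams": b"/XRef" in pdf_bytes,
    }

# Synthetic fragment for illustration only
sample = b"%PDF-1.7\n5 0 obj\n<< /Type /ObjStm /N 3 >>\nendobj\n"
print(pdf_header_version(sample))  # 1.7
print(compact_structures(sample))
```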
Solutions for Each Cause
- Images: Use a PDF compressor to resample embedded images. This is the single most effective reduction technique.
- Fonts: Most compressors handle font optimization automatically. In your authoring software, look for "subset fonts" options when exporting.
- Scanned pages: Run through a PDF compressor — scans compress extremely well (often 70–85% reduction).
- Metadata: A good PDF optimizer will strip unnecessary metadata automatically.
- Creator software: When possible, re-export from the source application with better PDF export settings. Many apps offer "optimized" or "minimum size" PDF export options.