A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://gist.github.com/hubgit/6078384 below:

Remove metadata from a PDF file, using exiftool and qpdf. Note that embedded objects may still contain metadata. ยท GitHub

Remove metadata from a PDF file, using exiftool and qpdf. Note that embedded objects may still contain metadata.

Metadata in PDF files can be stored in at least two places:

A PDF file contains a) objects and b) pointers to those objects.

When information is added to a PDF file, it is appended to the end of the file and a pointer is added.

When information is removed from a PDF file, the pointer is removed, but the actual data may not be removed.

To remove previously-deleted data, the PDF file must be rebuilt.

pdftk can be used to update the Info Dictionary of a PDF file. See pdftk-unset-info-dictionary-values.php below for an example. As noted in the pdftk documentation, though, pdftk does not alter XMP metadata.

exiftool can be used to read/write XMP metadata from/to PDF files.

exiftool -all:all also removes the pointer to the Info Dictionary, but does not completely remove it.

qpdf can be used to linearize PDF files (qpdf --linearize $FILE), which optimises them for fast web loading and removes any orphan data.

After running qpdf, there may be new XMP metadata, as it extracts metadata from any embedded objects. To read the XMP tags of embedded objects, use exiftool -extractEmbedded -all:all $FILE.


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4