PDFDocument

With MuPDF it is also possible to create, edit and manipulate PDF documents using low-level access to the objects and streams contained in a PDF file. A PDFDocument object is also a Document object. You can test a Document object to see if it is safe to use as a PDFDocument by calling document.isPDF().
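
As a minimal sketch of the isPDF() check described above, the helper below returns the document only when it is safe to treat as a PDFDocument, and null otherwise. The name asPDFDocument is hypothetical and not part of MuPDF; the helper only assumes that the object passed in has an isPDF() method.

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Returns the document if it reports itself as a PDF, otherwise null.
function asPDFDocument(document) {
	if (document && document.isPDF())
		return document;
	return null;
}
```

A caller can then branch on the result before using any PDFDocument-only method.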

new PDFDocument()

Constructor method.

Create a new empty PDF document.

Returns:

PDFDocument.

EXAMPLE

var pdfDocument = new mupdf.PDFDocument();
new PDFDocument(fileName)

mutool only

Constructor method.

Load a PDF document from file.

Returns:

PDFDocument.

EXAMPLE

var pdfDocument = new mupdf.PDFDocument("my-file.pdf");

Instance methods

getVersion()

Returns the PDF document version as an integer multiplied by 10, so e.g. a PDF-1.4 document would return 14.

Returns:

Integer.

EXAMPLE

var version = pdfDocument.getVersion();
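
Because the version is returned multiplied by 10, it usually needs converting back before it is shown to a user. The sketch below illustrates that arithmetic; formatPDFVersion is a hypothetical helper name, not a MuPDF function.

```javascript
// Hypothetical helper: turn the integer from getVersion() into "major.minor".
function formatPDFVersion(version) {
	var major = Math.floor(version / 10);
	var minor = version % 10;
	return major + "." + minor;
}
```

For example, formatPDFVersion(pdfDocument.getVersion()) gives "1.4" for a PDF-1.4 document.
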
setLanguage(lang)

wasm only

Sets the language for the document.

Arguments:
  • lang – String.

EXAMPLE

pdfDocument.setLanguage("en");
getLanguage()

wasm only

Gets the language for the document.

Returns:

String.

EXAMPLE

var lang = pdfDocument.getLanguage();
rearrangePages(pages)

Rearrange (re-order and/or delete) pages in the PDFDocument.

The pages in the document will be rearranged according to the input list. Any pages not listed will be removed, and pages may be duplicated by listing them multiple times.

The PDF objects describing removed pages will remain in the file and take up space (and can be recovered by forensic tools) unless you save with the garbage option.

N.B. the PDFDocument should not be used for anything except saving after rearranging the pages (FIXME).

Arguments:
  • pages – An array of page numbers (0-based).

EXAMPLE

var pdfDocument = new mupdf.PDFDocument("my_pdf.pdf");
pdfDocument.rearrangePages([3,2]);
pdfDocument.save("fewer_pages.pdf", "garbage");
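
The page list can also be built programmatically. The sketch below reverses the page order using only countPages() and rearrangePages(); the helper name reversePages is hypothetical. As noted above, the document should only be saved after the call.

```javascript
// Hypothetical helper: reverse the order of all pages in the document.
function reversePages(pdfDocument) {
	var order = [];
	// Build the 0-based page list from last page to first.
	for (var i = pdfDocument.countPages() - 1; i >= 0; --i)
		order.push(i);
	pdfDocument.rearrangePages(order);
	return order;
}
```
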
save(fileName, options)

mutool only

Write the PDFDocument to file. The options are a string of comma separated options (see the mutool convert options).

Arguments:
  • fileName – The name of the file to save to.

  • options – The options.

EXAMPLE

pdfDocument.save("my_fileName.pdf", "compress,compress-images,garbage=compact");
saveToBuffer(options)

wasm only

Saves the document to a buffer. The options are a string of comma separated options (see the mutool convert options).

Arguments:
  • options – The options.

Returns:

Buffer.

EXAMPLE

var buffer = pdfDocument.saveToBuffer("compress-images");
canBeSavedIncrementally()

Returns true if the document can be saved incrementally. For example, a document that was repaired when opened, or one that has had redactions applied, cannot be saved incrementally.

Returns:

Boolean.

EXAMPLE

var canBeSavedIncrementally = pdfDocument.canBeSavedIncrementally();
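
One possible use of this check is to pick the save options. The sketch below is an assumption-laden illustration: chooseSaveOptions is a hypothetical helper, "incremental" is assumed to be available among the mutool PDF output options, and "garbage=compact" is taken from the save() example above.

```javascript
// Hypothetical helper: choose an options string for save()/saveToBuffer().
function chooseSaveOptions(pdfDocument) {
	// Assumption: "incremental" is the option name for an incremental save.
	if (pdfDocument.canBeSavedIncrementally())
		return "incremental";
	// Otherwise fall back to a full rewrite with garbage collection.
	return "garbage=compact";
}
```
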
countVersions()

Returns the number of versions of the document in a PDF file, typically 1 + the number of updates.

Returns:

Integer.

EXAMPLE

var versionNum = pdfDocument.countVersions();
countUnsavedVersions()

Returns the number of unsaved updates to the document.

Returns:

Integer.

EXAMPLE

var unsavedVersionNum = pdfDocument.countUnsavedVersions();
validateChangeHistory()

Check the history of the document and return the last version that checks out OK. Returns 0 if the entire history is OK, 1 if the next-to-last version is OK but the last version has issues, and so on.

Returns:

Integer.

EXAMPLE

var changeHistory = pdfDocument.validateChangeHistory();
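
The return value can be combined with countVersions() into a small summary. This sketch reads the result as the number of most recent versions that did not check out, which is an interpretation of the description above rather than a documented guarantee; summarizeChangeHistory is a hypothetical name.

```javascript
// Hypothetical helper: summarize the change history of a document.
function summarizeChangeHistory(pdfDocument) {
	var suspect = pdfDocument.validateChangeHistory();
	return {
		versions: pdfDocument.countVersions(),
		// Assumed meaning: how many of the latest versions have issues.
		suspectVersions: suspect,
		ok: suspect === 0
	};
}
```
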
hasUnsavedChanges()

Returns true if the document has been changed since it was last opened or saved.

Returns:

Boolean.

EXAMPLE

var hasUnsavedChanges = pdfDocument.hasUnsavedChanges();
wasPureXFA()

mutool only

Returns true if the document was an XFA form without AcroForm fields.

Returns:

Boolean.

EXAMPLE

var wasPureXFA = pdfDocument.wasPureXFA();
wasRepaired()

Returns true if the document was repaired when opened.

Returns:

Boolean.

EXAMPLE

var wasRepaired = pdfDocument.wasRepaired();
setPageLabels(index, style, prefix, start)

Sets the page label numbering for the page and all pages following it, until the next page with an attached label.

Arguments:
  • index – Integer.

  • style – String. Can be one of the following strings: "" (none), "D" (decimal), "R" (roman numerals upper-case), "r" (roman numerals lower-case), "A" (alpha upper-case), or "a" (alpha lower-case).

  • prefix – String.

  • start – Integer. The ordinal with which to start numbering.

EXAMPLE

pdfDocument.setPageLabels(0, "D", "Prefix", 1);
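
Since a label applies until the next page with an attached label, two calls are enough to number front matter differently from the body. The sketch below shows that pattern; labelFrontMatter and its frontMatterCount parameter are hypothetical.

```javascript
// Hypothetical helper: lower-case roman numerals for the front matter,
// then decimal numbering restarting at 1 for the body.
function labelFrontMatter(pdfDocument, frontMatterCount) {
	// Pages 0 .. frontMatterCount-1 become i, ii, iii, ...
	pdfDocument.setPageLabels(0, "r", "", 1);
	// Pages from frontMatterCount onwards become 1, 2, 3, ...
	pdfDocument.setPageLabels(frontMatterCount, "D", "", 1);
}
```
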
deletePageLabels(index)

Removes any associated page label from the page.

Arguments:
  • index – Integer.

EXAMPLE

pdfDocument.deletePageLabels(0);
getTrailer()

Returns the trailer dictionary, which contains indirect references to the “Root” and “Info” dictionaries. See: PDF object access.

Returns:

PDFObject The trailer dictionary.

EXAMPLE

var dict = pdfDocument.getTrailer();
countObjects()

Return the number of objects in the PDF. Object number 0 is reserved, and may not be used for anything. See: PDF object access.

Returns:

Integer Object count.

EXAMPLE

var num = pdfDocument.countObjects();
createObject()

Allocate a new numbered object in the PDF, and return an indirect reference to it. The object itself is uninitialized.

Returns:

The new object.

EXAMPLE

var obj = pdfDocument.createObject();
deleteObject(obj)

Delete the object referred to by an indirect reference or its object number.

Arguments:
  • obj – The object to delete.

EXAMPLE

pdfDocument.deleteObject(obj);
formatURIWithPathAndDest(path, destination)

Format a link URI given a system independent path (see table 3.40 in the 1.7 specification) to a remote document and a destination object or a destination string suitable for createLink().

Arguments:
  • path – String. An absolute or relative path to a remote document file.

  • destination – Link destination or String referring to a destination, using either a destination object or a destination name in the remote document.

appendDestToURI(uri, destination)

Append a fragment representing a document destination to an existing URI that points to a remote document. The resulting string is suitable for createLink().

Arguments:
  • uri – String. A URI to a remote document file.

  • destination – Link destination or String referring to a destination, using either a destination object or a destination name in the remote document.


PDF JavaScript actions

enableJS()

Enable interpretation of document JavaScript actions.

EXAMPLE

pdfDocument.enableJS();
disableJS()

Disable interpretation of document JavaScript actions.

EXAMPLE

pdfDocument.disableJS();
isJSSupported()

Returns true if interpretation of document JavaScript actions is supported.

Returns:

Boolean.

EXAMPLE

var jsIsSupported = pdfDocument.isJSSupported();
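
The three methods above are typically used together: check for support first, then enable. A minimal sketch, with the hypothetical helper name enableJSIfSupported:

```javascript
// Hypothetical helper: enable document JavaScript only when it is supported.
// Returns true if JavaScript actions were enabled, false otherwise.
function enableJSIfSupported(pdfDocument) {
	if (!pdfDocument.isJSSupported())
		return false;
	pdfDocument.enableJS();
	return true;
}
```
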
setJSEventListener(listener)

mutool only

Calls the listener whenever a document JavaScript action triggers an event.

Arguments:
  • listener – {} An object containing the JavaScript listener function, e.g. onAlert.

Note

At present this listener will only trigger when a document JavaScript action triggers an alert.

EXAMPLE

pdfDocument.setJSEventListener({
        onAlert: function(message) {
                print(message);
        }
});
bake(bakeAnnots, bakeWidgets)

Baking a document changes all the annotations and/or form fields (otherwise known as widgets) in the document into static content. It “bakes” the appearance of the annotations and fields onto the page, before removing the interactive objects so they can no longer be changed.

Effectively this removes the “annotation” or “widget” type of these objects, but keeps the appearance of the objects.

Arguments:
  • bakeAnnots – Boolean. Whether to bake annotations or not. Defaults to true.

  • bakeWidgets – Boolean. Whether to bake widgets or not. Defaults to true.


PDF Journalling

enableJournal()

Activate journalling for the document.

EXAMPLE

pdfDocument.enableJournal();
getJournal()

Returns a PDF Journal Object.

Returns:

Object PDF Journal Object.

EXAMPLE

var journal = pdfDocument.getJournal();
beginOperation(op)

Begin a journal operation.

Arguments:
  • op – String. The name of the operation.

EXAMPLE

pdfDocument.beginOperation("my_operation");
beginImplicitOperation()

Begin an implicit journal operation. Implicit operations are operations that happen due to other operations, e.g. updating an annotation.

EXAMPLE

pdfDocument.beginImplicitOperation();
endOperation()

End a previously started normal or implicit operation. After this it can be undone/redone using the methods below.

EXAMPLE

pdfDocument.endOperation();
abandonOperation()

Abandon an operation. Reverts to the state before that operation began.

EXAMPLE

pdfDocument.abandonOperation();
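
beginOperation(), endOperation() and abandonOperation() pair naturally with a try/catch block, so that a failed edit is rolled back. The sketch below assumes journalling has already been activated with enableJournal(); runAsOperation and its edit callback are hypothetical names.

```javascript
// Hypothetical helper: run a callback as one named journal operation.
// If the callback throws, the operation is abandoned and the error rethrown.
function runAsOperation(pdfDocument, name, edit) {
	pdfDocument.beginOperation(name);
	try {
		edit(pdfDocument);
	} catch (err) {
		// Revert to the state before the operation began.
		pdfDocument.abandonOperation();
		throw err;
	}
	pdfDocument.endOperation();
}
```
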
canUndo()

Returns true if undo is possible in this state.

Returns:

Boolean.

EXAMPLE

var canUndo = pdfDocument.canUndo();
canRedo()

Returns true if redo is possible in this state.

Returns:

Boolean.

EXAMPLE

var canRedo = pdfDocument.canRedo();
undo()

Move backwards in the undo history. Any change made to the document after an undo throws away all subsequent history.

EXAMPLE

pdfDocument.undo();
redo()

Move forwards in the undo history.

EXAMPLE

pdfDocument.redo();
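
canUndo() and undo() can be combined to step back through the whole history. A minimal sketch, using the hypothetical helper name undoAll:

```javascript
// Hypothetical helper: undo every operation in the history.
// Returns the number of steps that were undone.
function undoAll(pdfDocument) {
	var steps = 0;
	while (pdfDocument.canUndo()) {
		pdfDocument.undo();
		++steps;
	}
	return steps;
}
```
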
saveJournal(filename)

Save the journal to a file.

Arguments:
  • filename – File to save the journal to.

EXAMPLE

pdfDocument.saveJournal("test.journal");

PDF Object Access

A PDF document contains objects, similar to those in JavaScript: arrays, dictionaries, strings, booleans, and numbers. At the root of the PDF document is the trailer object, which contains pointers to the metadata dictionary and to the catalog object, which in turn contains the pages and other information.

Pointers in PDF are also called indirect references, and are of the form “32 0 R” (where 32 is the object number, 0 is the generation, and R is magic syntax). All functions in MuPDF dereference indirect references automatically.

PDF has two types of strings: /Names and (Strings). All dictionary keys are names.

Some dictionaries in PDF also have attached binary data. These are called streams, and may be compressed.

Note

PDFObjects are always bound to the document that created them. Do NOT mix and match objects from one document with another document!
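
To make the “32 0 R” form concrete, the sketch below splits such a string into its object number and generation. parseIndirectReference is a hypothetical helper, not a MuPDF function; it only handles the textual form described above.

```javascript
// Hypothetical helper: parse the text of an indirect reference such as "32 0 R".
// Returns { objectNumber, generation }, or null if the text does not match.
function parseIndirectReference(text) {
	var m = /^\s*(\d+)\s+(\d+)\s+R\s*$/.exec(text);
	if (!m)
		return null;
	return {
		objectNumber: parseInt(m[1], 10),
		generation: parseInt(m[2], 10)
	};
}
```

The two numbers are the same values that newIndirect(objectNumber, generation) takes.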


addObject(obj)

Add obj to the PDF as a numbered object, and return an indirect reference to it.

Arguments:
  • obj – Object to add.

Returns:

Object.

EXAMPLE

var ref = pdfDocument.addObject(obj);
addStream(buffer, object)

Create a stream object with the contents of buffer, add it to the PDF, and return an indirect reference to it. If object is defined, it will be used as the stream object dictionary.

Arguments:
  • buffer – Buffer object.

  • object – The object to add the stream to.

Returns:

Object.

EXAMPLE

var stream = pdfDocument.addStream(buffer, object);
addRawStream(buffer, object)

Create a stream object with the contents of buffer, add it to the PDF, and return an indirect reference to it. If object is defined, it will be used as the stream object dictionary. The buffer must contain already compressed data that matches “Filter” and “DecodeParms” set in the stream object dictionary.

Arguments:
  • buffer – Buffer object.

  • object – The object to add the stream to.

Returns:

Object.

EXAMPLE

var stream = pdfDocument.addRawStream(buffer, object);
newNull()

Create a new null object.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newNull();
newBoolean(boolean)

Create a new boolean object.

Arguments:
  • boolean – The boolean value.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newBoolean(true);
newInteger(number)

Create a new integer object.

Arguments:
  • number – The number value.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newInteger(1);
newReal(number)

Create a new real number object.

Arguments:
  • number – The number value.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newReal(7.3);
newString(string)

Create a new string object.

Arguments:
  • string – String.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newString("hello");
newByteString(byteString)

Create a new byte string object.

Arguments:
  • byteString – String.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newByteString("hello");
newName(string)

Create a new name object.

Arguments:
  • string – The string value.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newName("hello");
newIndirect(objectNumber, generation)

Create a new indirect object.

Arguments:
  • objectNumber – Integer.

  • generation – Integer.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newIndirect(100, 0);
newArray(capacity)

Create a new array object.

Arguments:
  • capacity – Integer. Defaults to 8.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newArray();
newDictionary(capacity)

Create a new dictionary object.

Arguments:
  • capacity – Integer. Defaults to 8.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.newDictionary();

PDF Page Access

All page objects are structured into a page tree, which defines the order the pages appear in.

countPages()

Number of pages in the document.

Returns:

Integer Page count.

EXAMPLE

var pageCount = pdfDocument.countPages();
loadPage(number)

Return the PDFPage for a page number.

Arguments:
  • number – Integer. The page number, the first page is number zero.

Returns:

PDFPage.

EXAMPLE

var page = pdfDocument.loadPage(0);
findPage(number)

Return the PDFObject for a page number.

Arguments:
  • number – Integer. The page number, the first page is number zero.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.findPage(0);
findPageNumber(page)

mutool only

Given a PDFPage instance, find the page number in the document.

Arguments:
  • page – PDFPage instance.

Returns:

Integer.

EXAMPLE

var pageNumber = pdfDocument.findPageNumber(page);
deletePage(number)

Delete the numbered PDFPage.

Arguments:
  • number – The page number, the first page is number zero.

EXAMPLE

pdfDocument.deletePage(0);
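
When deleting several pages, the order of the calls matters. The sketch below assumes that removing a page shifts the numbers of all later pages down by one, so it deletes from the highest page number downwards. deletePages is a hypothetical helper and expects distinct page numbers.

```javascript
// Hypothetical helper: delete several pages given their 0-based numbers.
function deletePages(pdfDocument, pageNumbers) {
	// Copy and sort in descending order so that earlier deletions
	// do not change the numbers of pages still to be deleted.
	var sorted = pageNumbers.slice().sort(function (a, b) { return b - a; });
	for (var i = 0; i < sorted.length; ++i)
		pdfDocument.deletePage(sorted[i]);
}
```
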
insertPage(at, page)

Insert the PDFPage object into the page tree at the given location. If at is -1, the page is inserted at the end of the document.

Pages consist of a content stream, and a resource dictionary containing all of the fonts and images used.

Arguments:
  • at – The index to insert at.

  • page – The PDFPage to insert.

EXAMPLE

pdfDocument.insertPage(-1, page);
addPage(mediabox, rotate, resources, contents)

Create a new PDFPage object. Note: this function does NOT add it to the page tree, use insertPage to do that.

Arguments:
  • mediabox – [ulx,uly,lrx,lry] Rectangle.

  • rotate – Rotation value.

  • resources – Resources object.

  • contents – Contents string. This represents the page content stream - see section 3.7.1 in the PDF 1.7 specification.

Returns:

PDFObject.

EXAMPLE

var helvetica = pdfDocument.newDictionary();
helvetica.put("Type", pdfDocument.newName("Font"));
helvetica.put("Subtype", pdfDocument.newName("Type1"));
helvetica.put("Name", pdfDocument.newName("Helv"));
helvetica.put("BaseFont", pdfDocument.newName("Helvetica"));
helvetica.put("Encoding", pdfDocument.newName("WinAnsiEncoding"));
var fonts = pdfDocument.newDictionary();
fonts.put("Helv", helvetica);
var resources = pdfDocument.addObject(pdfDocument.newDictionary());
resources.put("Font", fonts);
var pageObject = pdfDocument.addPage([0,0,300,350], 0, resources, "BT /Helv 12 Tf 100 100 Td (MuPDF!)Tj ET");
pdfDocument.insertPage(-1, pageObject);
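
The contents string in the example above is written by hand. As a rough sketch, it can also be assembled by a small helper that escapes the characters that are special inside a PDF literal string (backslash and parentheses). Both function names below are hypothetical, and the helper only covers the single-line text case shown in the example.

```javascript
// Hypothetical helper: escape backslashes and parentheses for a PDF literal string.
function escapePDFString(text) {
	return text.replace(/([\\()])/g, "\\$1");
}

// Hypothetical helper: build a content stream that draws one line of text.
// fontName must match a key in the page's Font resource dictionary.
function textContentStream(fontName, size, x, y, text) {
	return "BT /" + fontName + " " + size + " Tf " +
		x + " " + y + " Td (" + escapePDFString(text) + ") Tj ET";
}
```

The result can be passed as the contents argument of addPage, for example textContentStream("Helv", 12, 100, 100, "MuPDF!").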

EXAMPLE

docs/examples/pdf-create.js
// Create a PDF from scratch using helper functions.

// This example creates a new PDF file from scratch, using helper
// functions to create resources and page objects.
// This assumes a basic working knowledge of the PDF file format.

// Create a new empty document with no pages.
var pdf = new PDFDocument()

// Load built-in font and create WinAnsi encoded simple font resource.
var font = pdf.addSimpleFont(new Font("Times-Roman"))

// Load PNG file and create image resource.
var image = pdf.addImage(new Image("example.png"))

// Create resource dictionary.
var resources = pdf.addObject({
	Font: { Tm: font },
	XObject: { Im0: image },
})

// Create content stream data.
var contents =
	"10 10 280 330 re s\n" +
	"q 200 0 0 200 50 100 cm /Im0 Do Q\n" +
	"BT /Tm 16 Tf 50 50 TD (Hello, world!) Tj ET\n"

// Create a new page object.
var page = pdf.addPage([0,0,300,350], 0, resources, contents)

// Insert page object at the end of the document.
pdf.insertPage(-1, page)

// Save the document to file.
pdf.save("out.pdf", "pretty,ascii,compress-images,compress-fonts")
addSimpleFont(font, encoding)

Create a PDFObject from the Font object as a simple font.

Arguments:
  • font – Font.

  • encoding – The encoding to use. Encoding is either “Latin” (CP-1252), “Greek” (ISO-8859-7), or “Cyrillic” (KOI8-U). The default is “Latin”.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.addSimpleFont(new mupdf.Font("Times-Roman"), "Latin");
addCJKFont(font, language, wmode, style)

Create a PDFObject from the Font object as a UTF-16 encoded CID font for the given language (“zh-Hant”, “zh-Hans”, “ko”, or “ja”), writing mode (“H” or “V”), and style (“serif” or “sans-serif”).

Arguments:
  • font – Font.

  • language – String.

  • wmode – 0 for horizontal writing, and 1 for vertical writing.

  • style – String.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.addCJKFont(new mupdf.Font("ja"), "ja", 0, "serif");
addFont(font)

Create a PDFObject from the Font object as an Identity-H encoded CID font.

Arguments:
  • font – Font.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.addFont(new mupdf.Font("Times-Roman"));
addImage(image)

Create a PDFObject from the Image object.

Arguments:
  • image – Image.

Returns:

PDFObject.

EXAMPLE

var obj = pdfDocument.addImage(new mupdf.Image(pixmap));
loadImage(obj)

Load an Image from a PDFObject (typically an indirect reference to an image resource).

Arguments:
  • obj – PDFObject.

Returns:

Image.

EXAMPLE

var image = pdfDocument.loadImage(obj);

Copying objects across PDFs

The following functions can be used to copy objects from one PDF document to another:

newGraftMap()

Create a graft map on the destination document, so that objects that have already been copied can be found again. Each graft map should only be used with one source document. Make sure to create a new graft map for each source document used.

Returns:

PDFGraftMap.

EXAMPLE

var graftMap = pdfDocument.newGraftMap();
graftObject(object)

Deep copy an object into the destination document. This function will not remember previously copied objects. If you are copying several objects from the same source document using multiple calls, you should use a graft map instead.

Arguments:
  • object – Object to graft.

EXAMPLE

pdfDocument.graftObject(obj);
graftPage(to, srcDoc, srcPageNumber)

Graft a page and its resources from the given page number of the source document to the requested page number in this document.

Arguments:
  • to – The page number to insert the page before. Page numbers start at 0 and -1 means at the end of the document.

  • srcDoc – Source document.

  • srcPageNumber – Source page number.

EXAMPLE

This would copy the first page of the source document (0) to the end (-1) of the current PDF document.

pdfDocument.graftPage(-1, srcDoc, 0);
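
The same call can be repeated to append a whole document. A minimal sketch, using the hypothetical helper name appendAllPages and only the countPages() and graftPage() methods documented on this page:

```javascript
// Hypothetical helper: append every page of srcDoc to the end of pdfDocument.
// Returns the number of pages that were grafted.
function appendAllPages(pdfDocument, srcDoc) {
	var count = srcDoc.countPages();
	for (var i = 0; i < count; ++i)
		pdfDocument.graftPage(-1, srcDoc, i);
	return count;
}
```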

Embedded/Associated files in PDFs

addEmbeddedFile(filename, mimetype, contents, creationDate, modificationDate, addChecksum)

Embed a file into the document. If a checksum is added then the file contents can be verified later. An indirect reference to a File Specification Object is returned.

Arguments:
  • filename – String.

  • mimetype – String. See: Mimetype.

  • contents – Buffer.

  • creationDate – Date.

  • modificationDate – Date.

  • addChecksum – Boolean.

Returns:

Object File Specification Object.

Note

After embedding a file into a PDF, it can be connected to an annotation using PDFAnnotation.setFilespec().

EXAMPLE

var fileSpecObject = pdfDocument.addEmbeddedFile("my_file.jpg",
                                                 "image/jpeg",
                                                 buffer,
                                                 new Date(),
                                                 new Date(),
                                                 false);
getEmbeddedFiles()

Returns the document's embedded files, or null.

Returns:

Object File Specification Object.

getEmbeddedFileParams(fileSpecObject)

Historical alias for getFilespecParams.

getFilespecParams(fileSpecObject)

Return an object describing the file referenced by the fileSpecObject.

Arguments:
  • fileSpecObject – Object File Specification Object.

Returns:

Object Filespec Params Object.

EXAMPLE

var obj = pdfDocument.getFilespecParams(fileSpecObject);
getEmbeddedFileContents(fileSpecObject)

Returns a Buffer with the contents of the embedded file referenced by the fileSpecObject.

Arguments:
  • fileSpecObject – Object File Specification Object.

Returns:

Buffer.

EXAMPLE

var buffer = pdfDocument.getEmbeddedFileContents(fileSpecObject);
verifyEmbeddedFileChecksum(fileSpecObject)

Verify the MD5 checksum of the embedded file contents.

Arguments:
  • fileSpecObject – Object File Specification Object.

Returns:

Boolean.

EXAMPLE

var fileChecksumValid = pdfDocument.verifyEmbeddedFileChecksum(fileSpecObject);
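
The checksum check and the content access can be combined so that contents are only returned when they verify. This sketch assumes the file was embedded with addChecksum set to true; readVerifiedEmbeddedFile is a hypothetical helper name.

```javascript
// Hypothetical helper: return the embedded file contents only if the
// checksum verifies, otherwise null.
function readVerifiedEmbeddedFile(pdfDocument, fileSpecObject) {
	if (!pdfDocument.verifyEmbeddedFileChecksum(fileSpecObject))
		return null;
	return pdfDocument.getEmbeddedFileContents(fileSpecObject);
}
```
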
countAssociatedFiles()

Return the number of Associated Files on this document. Note that this is the number of files associated at the document level, not necessarily the total number of files associated with elements throughout the entire document.

Returns:

Integer.

EXAMPLE

var count = pdfDocument.countAssociatedFiles();
associatedFile(n)

Return the Filespec object that represents the nth Associated File on this document. 0 <= n < count, where count is the value given by countAssociatedFiles().

Returns:

Object File Specification Object.

EXAMPLE

var obj = pdfDocument.associatedFile(0);
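
Since valid indexes run from 0 to count - 1, all document-level Associated Files can be collected with a simple loop. A minimal sketch, using the hypothetical helper name listAssociatedFiles:

```javascript
// Hypothetical helper: collect every document-level Associated File.
function listAssociatedFiles(pdfDocument) {
	var files = [];
	var count = pdfDocument.countAssociatedFiles();
	for (var n = 0; n < count; ++n)
		files.push(pdfDocument.associatedFile(n));
	return files;
}
```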

ZUGFeRD support in PDFs

zugferdProfile()

Determine if the current PDF is a ZUGFeRD PDF, and, if so, return the profile type in use. Possible return values include: “NOT ZUGFERD”, “COMFORT”, “BASIC”, “EXTENDED”, “BASIC WL”, “MINIMUM”, “XRECHNUNG”, and “UNKNOWN”.

Returns:

String.

EXAMPLE

var profile = pdfDocument.zugferdProfile();
zugferdVersion()

Determine if the current PDF is a ZUGFeRD PDF, and, if so, return the version of the spec it claims to conform to. This will return 0 for non-ZUGFeRD PDFs.

Returns:

Float.

EXAMPLE

var version = pdfDocument.zugferdVersion();
zugferdXML()

Return a buffer containing the embedded ZUGFeRD XML data from this PDF.

Returns:

Buffer.

EXAMPLE

var buf = pdfDocument.zugferdXML();
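
The ZUGFeRD methods can be combined so that the XML is only requested from documents that identify as ZUGFeRD. The sketch below relies on zugferdVersion() returning 0 for other PDFs, as described above; extractZugferdXML is a hypothetical helper name.

```javascript
// Hypothetical helper: return the embedded ZUGFeRD XML buffer,
// or null if the document is not a ZUGFeRD PDF.
function extractZugferdXML(pdfDocument) {
	// zugferdVersion() is documented to return 0 for non-ZUGFeRD PDFs.
	if (pdfDocument.zugferdVersion() === 0)
		return null;
	return pdfDocument.zugferdXML();
}
```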