PDFDocument
With MuPDF it is also possible to create, edit and manipulate PDF documents using low level access to the objects and streams contained in a PDF file. A PDFDocument object is also a Document object. You can test a Document object to see if it is safe to use as a PDFDocument by calling document.isPDF().
- new PDFDocument()¶
Constructor method.
Create a new empty PDF document.
- Returns:
PDFDocument
.
EXAMPLE
var pdfDocument = new mupdf.PDFDocument();
- new PDFDocument(fileName)¶
mutool only
Constructor method.
Load a PDF document from file.
- Returns:
PDFDocument
.
EXAMPLE
var pdfDocument = new mupdf.PDFDocument("my-file.pdf");
Instance methods
- getVersion()¶
Returns the PDF document version as an integer multiplied by 10, so e.g. a PDF-1.4 document would return 14.
- Returns:
Integer
.
EXAMPLE
var version = pdfDocument.getVersion();
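Because the version comes back as a single integer, showing it in the familiar dotted form takes a small conversion. A minimal sketch, assuming a hypothetical helper named formatPdfVersion (it is not part of MuPDF):

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Converts the integer returned by getVersion() (e.g. 14) into "1.4".
function formatPdfVersion(version) {
	var major = Math.floor(version / 10);
	var minor = version % 10;
	return major + "." + minor;
}
```

It would typically be called as formatPdfVersion(pdfDocument.getVersion()).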
- setLanguage(lang)¶
wasm only
Sets the language for the document.
- Arguments:
lang –
String
.
EXAMPLE
pdfDocument.setLanguage("en");
- getLanguage()¶
wasm only
Gets the language for the document.
- Returns:
String
.
EXAMPLE
var lang = pdfDocument.getLanguage();
- rearrangePages(pages)¶
Rearrange (re-order and/or delete) pages in the PDFDocument.
The pages in the document will be rearranged according to the input list. Any pages not listed will be removed, and pages may be duplicated by listing them multiple times.
The PDF objects describing removed pages will remain in the file and take up space (and can be recovered by forensic tools) unless you save with the garbage option.
N.B. the PDFDocument should not be used for anything except saving after rearranging the pages (FIXME).
- Arguments:
pages – An array of page numbers (0-based).
EXAMPLE
var pdfDocument = new mupdf.PDFDocument("my_pdf.pdf");
pdfDocument.rearrangePages([3,2]);
pdfDocument.save("fewer_pages.pdf", "garbage");
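Since any page left out of the list is removed, deleting a few pages means listing all the others. A minimal sketch of building such a list, assuming a hypothetical helper named pagesWithout (it is not part of MuPDF):

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Builds a 0-based page list that keeps every page except those in `removed`.
function pagesWithout(pageCount, removed) {
	var pages = [];
	for (var i = 0; i < pageCount; i++) {
		if (removed.indexOf(i) < 0)
			pages.push(i);
	}
	return pages;
}
```

For instance, pdfDocument.rearrangePages(pagesWithout(pdfDocument.countPages(), [0])) would keep everything except the first page.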
- save(fileName, options)¶
mutool only
Write the PDFDocument to file. The options are a string of comma separated options (see the mutool convert options).
- Arguments:
fileName – The name of the file to save to.
options – The options.
EXAMPLE
pdfDocument.save("my_fileName.pdf", "compress,compress-images,garbage=compact");
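The options string can be awkward to assemble by hand when the settings come from configuration. A minimal sketch of one way to build it, assuming a hypothetical helper named buildSaveOptions (it is not part of MuPDF) and the flag and key=value forms seen in the example above:

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Turns { compress: true, garbage: "compact" } into "compress,garbage=compact".
// Keys set to true become bare flags; keys set to false, null or undefined are skipped.
function buildSaveOptions(settings) {
	var parts = [];
	for (var key in settings) {
		if (!Object.prototype.hasOwnProperty.call(settings, key))
			continue;
		var value = settings[key];
		if (value === true)
			parts.push(key);
		else if (value !== false && value !== null && value !== undefined)
			parts.push(key + "=" + value);
	}
	return parts.join(",");
}
```

The result can be passed as the options argument, e.g. pdfDocument.save("out.pdf", buildSaveOptions({ compress: true, garbage: "compact" })).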
- saveToBuffer(options)¶
wasm only
Saves the document to a buffer. The options are a string of comma separated options (see the mutool convert options).
- Arguments:
options – The options.
- Returns:
Buffer
.
EXAMPLE
var buffer = pdfDocument.saveToBuffer("compress-images");
- canBeSavedIncrementally()¶
Returns true if the document can be saved incrementally. Some conditions prevent incremental saves, e.g. the document was repaired when opened or redactions have been applied.
- Returns:
Boolean
.
EXAMPLE
var canBeSavedIncrementally = pdfDocument.canBeSavedIncrementally();
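This check is usually made just before choosing save options. A minimal sketch, assuming a hypothetical helper named incrementalOrFullOptions (not part of MuPDF) and assuming that "incremental" is the relevant entry in the mutool convert options:

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Returns "incremental" (assumed option name) when an incremental save is
// possible, otherwise an empty options string for a full rewrite.
function incrementalOrFullOptions(pdfDocument) {
	if (pdfDocument.canBeSavedIncrementally())
		return "incremental";
	return "";
}
```

The returned string would be passed as the options argument when saving.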
- countVersions()¶
Returns the number of versions of the document in a PDF file, typically 1 + the number of updates.
- Returns:
Integer
.
EXAMPLE
var versionNum = pdfDocument.countVersions();
- countUnsavedVersions()¶
Returns the number of unsaved updates to the document.
- Returns:
Integer
.
EXAMPLE
var unsavedVersionNum = pdfDocument.countUnsavedVersions();
- validateChangeHistory()¶
Check the history of the document and return the last version that checks out OK. Returns 0 if the entire history is OK, 1 if the next to last version is OK but the last version has issues, etc.
- Returns:
Integer
.
EXAMPLE
var changeHistory = pdfDocument.validateChangeHistory();
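The result counts back from the newest version, so it can be combined with countVersions() to estimate how many versions passed the check. A minimal sketch under that reading of the description, using a hypothetical helper named countTrustedVersions (not part of MuPDF):

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Assumes validateChangeHistory() counts back from the newest version:
// 0 means every version is OK, 1 means only the newest one has issues, etc.
function countTrustedVersions(pdfDocument) {
	var total = pdfDocument.countVersions();
	var suspect = pdfDocument.validateChangeHistory();
	return total - suspect;
}
```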
- hasUnsavedChanges()¶
Returns true if the document has been changed since it was last opened or saved.
- Returns:
Boolean
.
EXAMPLE
var hasUnsavedChanges = pdfDocument.hasUnsavedChanges();
- wasPureXFA()¶
mutool only
Returns true if the document was an XFA form without AcroForm fields.
- Returns:
Boolean
.
EXAMPLE
var wasPureXFA = pdfDocument.wasPureXFA();
- wasRepaired()¶
Returns true if the document was repaired when opened.
- Returns:
Boolean
.
EXAMPLE
var wasRepaired = pdfDocument.wasRepaired();
- setPageLabels(index, style, prefix, start)¶
Sets the page label numbering for the page and all pages following it, until the next page with an attached label.
- Arguments:
index – Integer.
style – String. Can be one of the following strings: "" (none), "D" (decimal), "R" (roman numerals upper-case), "r" (roman numerals lower-case), "A" (alpha upper-case), or "a" (alpha lower-case).
prefix – String.
start – Integer. The ordinal with which to start numbering.
EXAMPLE
pdfDocument.setPageLabels(0, "D", "Prefix", 1);
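To preview what a given style, prefix and start value would produce, the label text can be computed independently of MuPDF. The sketch below follows the conventional numbering schemes (roman numerals, and A to Z followed by AA to ZZ for the alpha styles); the helpers toRoman, toAlpha and formatPageLabel are hypothetical and not part of MuPDF, and a viewer's actual rendering may differ.

```javascript
// Hypothetical helpers, not part of the MuPDF API.

// Converts a positive integer to upper-case roman numerals.
function toRoman(n) {
	var values = [1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1];
	var glyphs = ["M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"];
	var out = "";
	for (var i = 0; i < values.length; i++) {
		while (n >= values[i]) {
			out += glyphs[i];
			n -= values[i];
		}
	}
	return out;
}

// Converts a positive integer to the scheme A..Z, AA..ZZ, AAA..ZZZ, and so on.
function toAlpha(n) {
	var letter = String.fromCharCode(65 + (n - 1) % 26);
	var repeat = Math.floor((n - 1) / 26) + 1;
	var out = "";
	for (var i = 0; i < repeat; i++)
		out += letter;
	return out;
}

// Returns the expected label for the page that lies `offset` pages after the
// page on which the label range starts.
function formatPageLabel(style, prefix, start, offset) {
	var n = start + offset;
	var number = "";
	if (style === "D") number = String(n);
	else if (style === "R") number = toRoman(n);
	else if (style === "r") number = toRoman(n).toLowerCase();
	else if (style === "A") number = toAlpha(n);
	else if (style === "a") number = toAlpha(n).toLowerCase();
	return prefix + number;
}
```

With the example above, formatPageLabel("D", "Prefix", 1, 0) gives the label expected for the first page of the range.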
- deletePageLabels(index)¶
Removes any associated page label from the page.
- Arguments:
index –
Integer
.
EXAMPLE
pdfDocument.deletePageLabels(0);
- getTrailer()¶
The trailer dictionary. This contains indirect references to the “Root” and “Info” dictionaries. See: PDF object access.
- Returns:
PDFObject
The trailer dictionary.
EXAMPLE
var dict = pdfDocument.getTrailer();
- countObjects()¶
Return the number of objects in the PDF. Object number 0 is reserved, and may not be used for anything. See: PDF object access.
- Returns:
Integer
Object count.
EXAMPLE
var num = pdfDocument.countObjects();
- createObject()¶
Allocate a new numbered object in the PDF, and return an indirect reference to it. The object itself is uninitialized.
- Returns:
The new object.
EXAMPLE
var obj = pdfDocument.createObject();
- deleteObject(obj)¶
Delete the object referred to by an indirect reference or its object number.
- Arguments:
obj – The object to delete.
EXAMPLE
pdfDocument.deleteObject(obj);
- formatURIWithPathAndDest(path, destination)¶
Format a link URI given a system independent path (see table 3.40 in the PDF 1.7 specification) to a remote document and a destination object or a destination string suitable for createLink().
- Arguments:
path – String. An absolute or relative path to a remote document file.
destination – Link destination or String referring to a destination using either a destination object or a destination name in the remote document.
- appendDestToURI(uri, destination)¶
Append a fragment representing a document destination to an existing URI that points to a remote document. The resulting string is suitable for createLink().
- Arguments:
uri – String. A URI to a remote document file.
destination – Link destination or String referring to a destination using either a destination object or a destination name in the remote document.
PDF JavaScript actions¶
- enableJS()¶
Enable interpretation of document JavaScript actions.
EXAMPLE
pdfDocument.enableJS();
- disableJS()¶
Disable interpretation of document JavaScript actions.
EXAMPLE
pdfDocument.disableJS();
- isJSSupported()¶
Returns true if interpretation of document JavaScript actions is supported.
- Returns:
Boolean
.
EXAMPLE
var jsIsSupported = pdfDocument.isJSSupported();
- setJSEventListener(listener)¶
mutool only
Calls the listener whenever a document JavaScript action triggers an event.
- Arguments:
listener –
{}
The JavaScript listener function.
Note
At present this listener will only trigger when a document JavaScript action triggers an alert.
EXAMPLE
pdfDocument.setJSEventListener({ onAlert: function(message) { print(message); } });
- bake(bakeAnnots, bakeWidgets)¶
Baking a document changes all the annotations and/or form fields (otherwise known as widgets) in the document into static content. It “bakes” the appearance of the annotations and fields onto the page, before removing the interactive objects so they can no longer be changed.
Effectively this removes the “annotation” or “widget” type of these objects, but keeps the appearance of the objects.
- Arguments:
bakeAnnots – Boolean. Whether to bake annotations or not. Defaults to true.
bakeWidgets – Boolean. Whether to bake widgets or not. Defaults to true.
PDF Journalling¶
- enableJournal()¶
Activate journalling for the document.
EXAMPLE
pdfDocument.enableJournal();
- getJournal()¶
Returns a PDF Journal Object.
- Returns:
Object
PDF Journal Object.
EXAMPLE
var journal = pdfDocument.getJournal();
- beginOperation(op)¶
Begin a journal operation.
- Arguments:
op –
String
The name of the operation.
EXAMPLE
pdfDocument.beginOperation("my_operation");
- beginImplicitOperation()¶
Begin an implicit journal operation. Implicit operations are operations that happen due to other operations, e.g. updating an annotation.
EXAMPLE
pdfDocument.beginImplicitOperation();
- endOperation()¶
End a previously started normal or implicit operation. After this it can be undone/redone using the methods below.
EXAMPLE
pdfDocument.endOperation();
- abandonOperation()¶
Abandon an operation. Reverts to the state before that operation began.
EXAMPLE
pdfDocument.abandonOperation();
- canUndo()¶
Returns true if undo is possible in this state.
- Returns:
Boolean
.
EXAMPLE
var canUndo = pdfDocument.canUndo();
- canRedo()¶
Returns true if redo is possible in this state.
- Returns:
Boolean
.
EXAMPLE
var canRedo = pdfDocument.canRedo();
- undo()¶
Move backwards in the undo history. Changing the document after this throws away all subsequent history.
EXAMPLE
pdfDocument.undo();
- redo()¶
Move forwards in the undo history.
EXAMPLE
pdfDocument.redo();
- saveJournal(filename)¶
Save the journal to a file.
- Arguments:
filename – File to save the journal to.
EXAMPLE
pdfDocument.saveJournal("test.journal");
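The operation methods above are easiest to use correctly when begin, end and abandon are paired in one place. A minimal sketch, assuming journalling has already been enabled with enableJournal() and using two hypothetical helpers, withOperation and undoAll (neither is part of MuPDF):

```javascript
// Hypothetical helpers, not part of the MuPDF API.

// Runs `edit` inside a named journal operation. If `edit` throws, the
// operation is abandoned so the document reverts to its earlier state.
function withOperation(pdfDocument, name, edit) {
	pdfDocument.beginOperation(name);
	try {
		edit(pdfDocument);
	} catch (error) {
		pdfDocument.abandonOperation();
		throw error;
	}
	pdfDocument.endOperation();
}

// Undoes every step that can be undone and returns how many steps were undone.
function undoAll(pdfDocument) {
	var steps = 0;
	while (pdfDocument.canUndo()) {
		pdfDocument.undo();
		steps++;
	}
	return steps;
}
```

Wrapping edits this way keeps a failed edit from leaving an operation open.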
PDF Object Access¶
A PDF document contains objects, similar to those in JavaScript: arrays, dictionaries, strings, booleans, and numbers. At the root of the PDF document is the trailer object, which contains pointers to the metadata dictionary and to the catalog object, which in turn contains the pages and other information.
Pointers in PDF are also called indirect references, and are of the form “32 0 R” (where 32 is the object number, 0 is the generation, and R is magic syntax). All functions in MuPDF dereference indirect references automatically.
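The “32 0 R” form can be split into its object number and generation with a small parser. A minimal sketch, using a hypothetical helper named parseIndirectReference (it is not part of MuPDF, which resolves references itself):

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Parses text such as "32 0 R" into { objectNumber: 32, generation: 0 }.
// Returns null when the text is not an indirect reference.
function parseIndirectReference(text) {
	var match = /^\s*(\d+)\s+(\d+)\s+R\s*$/.exec(text);
	if (!match)
		return null;
	return {
		objectNumber: parseInt(match[1], 10),
		generation: parseInt(match[2], 10)
	};
}
```

The two numbers correspond to the arguments of newIndirect(objectNumber, generation).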
PDF has two types of strings: /Names and (Strings). All dictionary keys are names.
Some dictionaries in PDF also have attached binary data. These are called streams, and may be compressed.
Note
PDFObjects are always bound to the document that created them. Do NOT mix and match objects from one document with another document!
- addObject(obj)¶
Add obj to the PDF as a numbered object, and return an indirect reference to it.
- Arguments:
obj – Object to add.
- Returns:
Object
.
EXAMPLE
var ref = pdfDocument.addObject(obj);
- addStream(buffer, object)¶
Create a stream object with the contents of buffer, add it to the PDF, and return an indirect reference to it. If object is defined, it will be used as the stream object dictionary.
- Arguments:
buffer – Buffer object.
object – The object to add the stream to.
- Returns:
Object
.
EXAMPLE
var stream = pdfDocument.addStream(buffer, object);
- addRawStream(buffer, object)¶
Create a stream object with the contents of buffer, add it to the PDF, and return an indirect reference to it. If object is defined, it will be used as the stream object dictionary. The buffer must contain already compressed data that matches “Filter” and “DecodeParms” set in the stream object dictionary.
- Arguments:
buffer – Buffer object.
object – The object to add the stream to.
- Returns:
Object
.
EXAMPLE
var stream = pdfDocument.addRawStream(buffer, object);
- newNull()¶
Create a new null object.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newNull();
- newBoolean(boolean)¶
Create a new boolean object.
- Arguments:
boolean – The boolean value.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newBoolean(true);
- newInteger(number)¶
Create a new integer object.
- Arguments:
number – The number value.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newInteger(1);
- newReal(number)¶
Create a new real number object.
- Arguments:
number – The number value.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newReal(7.3);
- newString(string)¶
Create a new string object.
- Arguments:
string –
String
.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newString("hello");
- newByteString(byteString)¶
Create a new byte string object.
- Arguments:
byteString –
String
.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newByteString("hello");
- newName(string)¶
Create a new name object.
- Arguments:
string – The string value.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newName("hello");
- newIndirect(objectNumber, generation)¶
Create a new indirect object.
- Arguments:
objectNumber – Integer.
generation – Integer.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newIndirect(100, 0);
- newArray(capacity)¶
Create a new array object.
- Arguments:
capacity – Integer. Defaults to 8.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newArray();
- newDictionary(capacity)¶
Create a new dictionary object.
- Arguments:
capacity – Integer. Defaults to 8.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.newDictionary();
PDF Page Access¶
All page objects are structured into a page tree, which defines the order the pages appear in.
- countPages()¶
Number of pages in the document.
- Returns:
Integer
Number of pages.
EXAMPLE
var pageCount = pdfDocument.countPages();
- loadPage(number)¶
Return the PDFPage for a page number.
- Arguments:
number – Integer. The page number, the first page is number zero.
- Returns:
PDFPage
.
EXAMPLE
var page = pdfDocument.loadPage(0);
- findPage(number)¶
Return the PDFObject for a page number.
- Arguments:
number – Integer. The page number, the first page is number zero.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.findPage(0);
- findPageNumber(page)¶
mutool only
Given a PDFPage instance, find the page number in the document.
- Arguments:
page – PDFPage instance.
- Returns:
Integer
.
EXAMPLE
var pageNumber = pdfDocument.findPageNumber(page);
- deletePage(number)¶
Delete the numbered PDFPage.
- Arguments:
number – The page number, the first page is number zero.
EXAMPLE
pdfDocument.deletePage(0);
- insertPage(at, page)¶
Insert the PDFPage object in the page tree at the given location. If at is -1, the page is inserted at the end of the document.
Pages consist of a content stream, and a resource dictionary containing all of the fonts and images used.
- Arguments:
at – The index to insert at.
page – The PDFPage to insert.
EXAMPLE
pdfDocument.insertPage(-1, page);
- addPage(mediabox, rotate, resources, contents)¶
Create a new PDFPage object. Note: this function does NOT add it to the page tree, use insertPage to do that.
- Arguments:
mediabox – [ulx,uly,lrx,lry] Rectangle.
rotate – Rotation value.
resources – Resources object.
contents – Contents string. This represents the page content stream - see section 3.7.1 in the PDF 1.7 specification.
- Returns:
PDFObject
.
EXAMPLE
var helvetica = pdfDocument.newDictionary();
helvetica.put("Type", pdfDocument.newName("Font"));
helvetica.put("Subtype", pdfDocument.newName("Type1"));
helvetica.put("Name", pdfDocument.newName("Helv"));
helvetica.put("BaseFont", pdfDocument.newName("Helvetica"));
helvetica.put("Encoding", pdfDocument.newName("WinAnsiEncoding"));
var fonts = pdfDocument.newDictionary();
fonts.put("Helv", helvetica);
var resources = pdfDocument.addObject(pdfDocument.newDictionary());
resources.put("Font", fonts);
var pageObject = pdfDocument.addPage([0,0,300,350], 0, resources, "BT /Helv 12 Tf 100 100 Td (MuPDF!)Tj ET");
pdfDocument.insertPage(-1, pageObject);
EXAMPLE
docs/examples/pdf-create.js
// Create a PDF from scratch using helper functions.

// This example creates a new PDF file from scratch, using helper
// functions to create resources and page objects.
// This assumes a basic working knowledge of the PDF file format.

// Create a new empty document with no pages.
var pdf = new PDFDocument()

// Load built-in font and create WinAnsi encoded simple font resource.
var font = pdf.addSimpleFont(new Font("Times-Roman"))

// Load PNG file and create image resource.
var image = pdf.addImage(new Image("example.png"))

// Create resource dictionary.
var resources = pdf.addObject({
	Font: { Tm: font },
	XObject: { Im0: image },
})

// Create content stream data.
var contents =
	"10 10 280 330 re s\n" +
	"q 200 0 0 200 50 100 cm /Im0 Do Q\n" +
	"BT /Tm 16 Tf 50 50 TD (Hello, world!) Tj ET\n"

// Create a new page object.
var page = pdf.addPage([0,0,300,350], 0, resources, contents)

// Insert page object at the end of the document.
pdf.insertPage(-1, page)

// Save the document to file.
pdf.save("out.pdf", "pretty,ascii,compress-images,compress-fonts")
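The contents argument of addPage is plain content-stream text, so it can be assembled with ordinary string handling. The sketch below builds a single text-showing sequence like the one in the first example; escapePdfString and textContents are hypothetical helpers (not part of MuPDF), and the text is assumed to contain only characters the font's simple encoding can represent.

```javascript
// Hypothetical helpers, not part of the MuPDF API.

// Escapes backslashes and parentheses so the text can be placed inside a
// PDF literal string, which is delimited by ( and ).
function escapePdfString(text) {
	return text.replace(/([\\()])/g, "\\$1");
}

// Builds a content stream fragment that selects a font resource by name,
// moves to (x, y) and shows the text.
function textContents(fontName, fontSize, x, y, text) {
	return "BT /" + fontName + " " + fontSize + " Tf " +
		x + " " + y + " Td (" + escapePdfString(text) + ") Tj ET";
}
```

The font name must match a key in the Font dictionary of the page resources, e.g. "Helv" in the first example.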
- addSimpleFont(font, encoding)¶
Create a PDFObject from the Font object as a simple font.
- Arguments:
font – Font.
encoding – The encoding to use. Encoding is either “Latin” (CP-1252), “Greek” (ISO-8859-7), or “Cyrillic” (KOI-8U). The default is “Latin”.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.addSimpleFont(new mupdf.Font("Times-Roman"), "Latin");
- addCJKFont(font, language, wmode, style)¶
Create a PDFObject from the Font object as a UTF-16 encoded CID font for the given language (“zh-Hant”, “zh-Hans”, “ko”, or “ja”), writing mode (“H” or “V”), and style (“serif” or “sans-serif”).
- Arguments:
font – Font.
language – String.
wmode – 0 for horizontal writing, and 1 for vertical writing.
style – String.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.addCJKFont(new mupdf.Font("ja"), "ja", 0, "serif");
- addFont(font)¶
Create a PDFObject from the Font object as an Identity-H encoded CID font.
- Arguments:
font – Font.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.addFont(new mupdf.Font("Times-Roman"));
- addImage(image)¶
Create a PDFObject from the Image object.
- Arguments:
image – Image.
- Returns:
PDFObject
.
EXAMPLE
var obj = pdfDocument.addImage(new mupdf.Image(pixmap));
- loadImage(obj)¶
Load an Image from a PDFObject (typically an indirect reference to an image resource).
- Arguments:
obj – PDFObject.
- Returns:
Image
.
EXAMPLE
var image = pdfDocument.loadImage(obj);
Copying objects across PDFs¶
The following functions can be used to copy objects from one PDF document to another:
- newGraftMap()¶
Create a graft map on the destination document, so that objects that have already been copied can be found again. Each graft map should only be used with one source document. Make sure to create a new graft map for each source document used.
- Returns:
PDFGraftMap
.
EXAMPLE
var graftMap = pdfDocument.newGraftMap();
- graftObject(object)¶
Deep copy an object into the destination document. This function will not remember previously copied objects. If you are copying several objects from the same source document using multiple calls, you should use a graft map instead.
- Arguments:
object – Object to graft.
EXAMPLE
pdfDocument.graftObject(obj);
- graftPage(to, srcDoc, srcPageNumber)¶
Graft a page and its resources at the given page number from the source document to the requested page number in the document.
- Arguments:
to – The page number to insert the page before. Page numbers start at 0 and -1 means at the end of the document.
srcDoc – Source document.
srcPageNumber – Source page number.
EXAMPLE
This would copy the first page of the source document (0) to the last page (-1) of the current PDF document.
pdfDocument.graftPage(-1, srcDoc, 0);
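Appending a whole document is a loop over graftPage with -1 as the insertion point. A minimal sketch, using a hypothetical helper named appendAllPages (not part of MuPDF) that relies only on countPages() and graftPage() as documented here:

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Grafts every page of srcDoc onto the end of dstDoc, preserving page order,
// and returns the number of pages copied.
function appendAllPages(dstDoc, srcDoc) {
	var count = srcDoc.countPages();
	for (var i = 0; i < count; i++)
		dstDoc.graftPage(-1, srcDoc, i);
	return count;
}
```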
Embedded/Associated files in PDFs¶
- addEmbeddedFile(filename, mimetype, contents, creationDate, modificationDate, addChecksum)¶
Embed a file into the document. If a checksum is added then the file contents can be verified later. An indirect reference to a File Specification Object is returned.
- Arguments:
filename – String.
mimetype – String. See: Mimetype.
contents – Buffer.
creationDate – Date.
modificationDate – Date.
addChecksum – Boolean.
- Returns:
Object
File Specification Object.
Note
After embedding a file into a PDF, it can be connected to an annotation using PDFAnnotation.setFilespec().
EXAMPLE
var fileSpecObject = pdfDocument.addEmbeddedFile("my_file.jpg", "image/jpeg", buffer, new Date(), new Date(), false);
- getEmbeddedFiles()¶
Returns the embedded files for the document, or null.
- Returns:
Object
File Specification Object.
- getEmbeddedFileParams(fileSpecObject)¶
Historical alias for getFilespecParams.
- getFilespecParams(fileSpecObject)¶
Return an object describing the file referenced by the fileSpecObject.
- Arguments:
fileSpecObject – Object. File Specification Object.
- Returns:
Object
Filespec Params Object.
EXAMPLE
var obj = pdfDocument.getFilespecParams(fileSpecObject);
- getEmbeddedFileContents(fileSpecObject)¶
Returns a Buffer with the contents of the embedded file referenced by the fileSpecObject.
- Arguments:
fileSpecObject – Object. File Specification Object.
- Returns:
Buffer.
EXAMPLE
var buffer = pdfDocument.getEmbeddedFileContents(fileSpecObject);
- verifyEmbeddedFileChecksum(fileSpecObject)¶
Verify the MD5 checksum of the embedded file contents.
- Arguments:
fileSpecObject –
Object
File Specification Object.
- Returns:
Boolean
.
EXAMPLE
var fileChecksumValid = pdfDocument.verifyEmbeddedFileChecksum(fileSpecObject);
- countAssociatedFiles()¶
Return the number of Associated Files on this document. Note that this is the number of files associated at the document level, not necessarily the total number of files associated with elements throughout the entire document.
- Returns:
Integer
EXAMPLE
var count = pdfDocument.countAssociatedFiles();
- associatedFile(n)¶
Return the Filespec object that represents the nth Associated File on this document. 0 <= n < count, where count is the value given by countAssociatedFiles().
- Returns:
Object
File Specification Object.
EXAMPLE
var obj = pdfDocument.associatedFile(0);
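Since n must stay within 0 <= n < count, collecting every document-level Associated File is a bounded loop. A minimal sketch, using a hypothetical helper named listAssociatedFiles (not part of MuPDF):

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Collects the File Specification Objects for every document-level
// Associated File, in index order.
function listAssociatedFiles(pdfDocument) {
	var files = [];
	var count = pdfDocument.countAssociatedFiles();
	for (var n = 0; n < count; n++)
		files.push(pdfDocument.associatedFile(n));
	return files;
}
```

Each entry could then be passed to getFilespecParams() or getEmbeddedFileContents().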
ZUGFeRD support in PDFs¶
- zugferdProfile()¶
Determine if the current PDF is a ZUGFeRD PDF, and, if so, return the profile type in use. Possible return values include: “NOT ZUGFERD”, “COMFORT”, “BASIC”, “EXTENDED”, “BASIC WL”, “MINIMUM”, “XRECHNUNG”, and “UNKNOWN”.
- Returns:
String
.
EXAMPLE
var profile = pdfDocument.zugferdProfile();
- zugferdVersion()¶
Determine if the current PDF is a ZUGFeRD PDF, and, if so, return the version of the spec it claims to conform to. This will return 0 for non-ZUGFeRD PDFs.
- Returns:
Float
.
EXAMPLE
var version = pdfDocument.zugferdVersion();
- zugferdXML()¶
Return a buffer containing the embedded ZUGFeRD XML data from this PDF.
- Returns:
Buffer
.
EXAMPLE
var buf = pdfDocument.zugferdXML();
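The three ZUGFeRD methods are typically used together: check the profile first, then read the version and XML only when the document qualifies. A minimal sketch, using a hypothetical helper named readZugferd (not part of MuPDF) and assuming the profile string for ordinary PDFs is exactly “NOT ZUGFERD” as listed above:

```javascript
// Hypothetical helper, not part of the MuPDF API.
// Returns null for documents that are not ZUGFeRD PDFs, otherwise an object
// with the profile, the claimed spec version and the embedded XML buffer.
function readZugferd(pdfDocument) {
	var profile = pdfDocument.zugferdProfile();
	if (profile === "NOT ZUGFERD")
		return null;
	return {
		profile: profile,
		version: pdfDocument.zugferdVersion(),
		xml: pdfDocument.zugferdXML()
	};
}
```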