StructuredText

StructuredText objects hold text from a page that has been analyzed and grouped into blocks, lines and spans. To obtain a StructuredText instance, call toStructuredText() on a Page.

Instance methods

search(needle)

Search the text for all instances of needle, and return an array with all matches found on the page.

Each match in the result is an array containing one or more QuadPoints that cover the matching text.

Arguments:
  • needle – String.

Returns:

[...].

EXAMPLE

var result = sText.search("Hello World!");
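
Because a match can span several quads, it is sometimes handy to collapse it into one bounding rectangle. The following is a minimal sketch, not part of the documented API: the helper name quadsToRect is hypothetical, and it assumes each quad is a flat array of eight numbers (four corner points stored as x,y pairs). The sample data stands in for one entry of the search result.

```javascript
// Hypothetical helper: collapse the quads of one match into a single
// bounding rectangle [x0, y0, x1, y1].
// Assumption: each quad is a flat array of eight numbers,
// i.e. four corner points stored as x,y pairs.
function quadsToRect(quads) {
    var x0 = Infinity, y0 = Infinity, x1 = -Infinity, y1 = -Infinity;
    quads.forEach(function (quad) {
        for (var i = 0; i < quad.length; i += 2) {
            x0 = Math.min(x0, quad[i]);
            y0 = Math.min(y0, quad[i + 1]);
            x1 = Math.max(x1, quad[i]);
            y1 = Math.max(y1, quad[i + 1]);
        }
    });
    return [x0, y0, x1, y1];
}

// Made-up sample data standing in for one match (two adjacent quads).
var sampleMatch = [
    [10, 20, 60, 20, 10, 32, 60, 32],
    [60, 20, 90, 20, 60, 32, 90, 32],
];
var rect = quadsToRect(sampleMatch); // x0=10, y0=20, x1=90, y1=32
console.log(rect);
```

With a real result, the same helper could be applied to every match, for example result.map(quadsToRect), provided the quad layout matches the assumption above.
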
highlight(p, q)

Return an array with rectangles needed to highlight a selection defined by the start and end points.

Arguments:
  • p – Start point in format [x,y].

  • q – End point in format [x,y].

Returns:

[...].

EXAMPLE

var result = sText.highlight([100,100], [200,100]);
copy(p, q)

Return the text from the selection defined by the start and end points.

Arguments:
  • p – Start point in format [x,y].

  • q – End point in format [x,y].

Returns:

String.

EXAMPLE

var result = sText.copy([100,100], [200,100]);
walk(walker)

wasm only

Arguments:
  • walker – Object with protocol methods, see the example below for details.

Walk through the blocks (images or text blocks) of the structured text. For each text block, walk over its lines of text, and for each line, over its characters. The corresponding walker method is called for each block, line and character.

EXAMPLE

var stext = pdfPage.toStructuredText();
stext.walk({
    beginLine: function (bbox, wmode, direction) {
        console.log("beginLine", bbox, wmode, direction);
    },
    beginTextBlock: function (bbox) {
        console.log("beginTextBlock", bbox);
    },
    endLine: function () {
        console.log("endLine");
    },
    endTextBlock: function () {
        console.log("endTextBlock");
    },
    onChar: function (utf, origin, font, size, quad, color) {
        console.log("onChar", utf, origin, font, size, quad, color);
    },
    onImageBlock: function (bbox, transform, image) {
        console.log("onImageBlock", bbox, transform, image);
    },
});
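
As a variation on the logging walker above, the sketch below gathers the characters of each line into a string. The helper name makeLineCollector is hypothetical, and the sketch rests on two assumptions: that onChar receives the character as a string in its first argument, and that walker methods which are left out are simply not called. The callbacks are driven by hand here so the call order is visible without a document.

```javascript
// Hypothetical helper: build a walker that gathers the text of each line.
// Assumption: onChar receives the character as a string in its first argument.
function makeLineCollector() {
    var lines = [];
    var current = "";
    return {
        lines: lines,
        beginLine: function (bbox, wmode, direction) {
            current = ""; // start a fresh line buffer
        },
        onChar: function (utf, origin, font, size, quad, color) {
            current += utf; // append the character to the current line
        },
        endLine: function () {
            lines.push(current); // store the finished line
        },
    };
}

var collector = makeLineCollector();
// With a real page: pdfPage.toStructuredText().walk(collector);
// Here the callbacks are invoked manually with made-up values.
collector.beginLine([0, 0, 100, 12], 0, [1, 0]);
collector.onChar("H");
collector.onChar("i");
collector.endLine();
console.log(collector.lines);
```
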

Note

On beginLine the direction parameter is a vector (e.g. [0, 1]), and you can calculate the rotation as an angle with some trigonometry on the vector.
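
One way to do that trigonometry is with Math.atan2, as in the minimal sketch below. The helper name directionToDegrees is hypothetical. The angle is measured from the positive x axis, and whether a positive angle reads as clockwise or counter-clockwise depends on the orientation of the y axis in the coordinate space you are working in.

```javascript
// Hypothetical helper: convert a direction vector [x, y] into an angle
// in degrees, measured from the positive x axis.
function directionToDegrees(direction) {
    var radians = Math.atan2(direction[1], direction[0]);
    return radians * 180 / Math.PI;
}

console.log(directionToDegrees([1, 0])); // vector along the x axis
console.log(directionToDegrees([0, 1])); // vector along the y axis, a quarter turn
```
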

asJSON(scale)

wasm only

Returns the instance in JSON format.

Arguments:
  • scale – Float. Default: 1. Multiply all the coordinates by this factor to get the coordinates at another resolution. The structured text has all coordinates in points (72 DPI); however, you may want to use the coordinates in the StructuredText data at another resolution.

Returns:

String.

EXAMPLE

var json = sText.asJSON();

Note

If you want the coordinates to be 300 DPI then pass (300/72) as the scale parameter.
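
The same arithmetic works for any target resolution. The sketch below wraps it in a small helper; the name scaleForDpi is hypothetical, and the call to asJSON is left as a comment because it needs a real StructuredText instance.

```javascript
// The structured text coordinates are in points, i.e. 72 per inch.
var POINTS_PER_INCH = 72;

// Hypothetical helper: scale factor that maps points to a target DPI.
function scaleForDpi(dpi) {
    return dpi / POINTS_PER_INCH;
}

var scale = scaleForDpi(300);
// With a StructuredText instance: var json = sText.asJSON(scale);

// One inch (72 points) maps to roughly 300 units at 300 DPI.
console.log(72 * scale);
```
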