StructuredText
StructuredText objects hold text from a page that has been analyzed and grouped into blocks, lines and spans. To obtain a StructuredText instance, call toStructuredText() on a Page.
Instance methods
- search(needle)
Search the text for all instances of needle, and return an array with all matches found on the page. Each match in the result is an array containing one or more QuadPoints that cover the matching text.
- Arguments:
needle – String.
- Returns:
[...].
EXAMPLE
var result = sText.search("Hello World!");
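The nested result can be reduced to one bounding rectangle per match, which is often enough for drawing a simple highlight box. The sketch below is not part of the API: the helper names quadBounds and matchBounds are hypothetical, and it assumes each QuadPoints entry is a flat array of eight numbers in the order [ulx, uly, urx, ury, llx, lly, lrx, lry].

```javascript
// Assumption: a quad is a flat array of eight numbers
// [ulx, uly, urx, ury, llx, lly, lrx, lry].
function quadBounds(quad) {
  var xs = [quad[0], quad[2], quad[4], quad[6]];
  var ys = [quad[1], quad[3], quad[5], quad[7]];
  // Return [x0, y0, x1, y1] enclosing all four corners.
  return [
    Math.min.apply(null, xs),
    Math.min.apply(null, ys),
    Math.max.apply(null, xs),
    Math.max.apply(null, ys),
  ];
}

// A match is an array of one or more quads; merge their bounds into one rectangle.
function matchBounds(match) {
  return match.map(quadBounds).reduce(function (acc, r) {
    return [
      Math.min(acc[0], r[0]),
      Math.min(acc[1], r[1]),
      Math.max(acc[2], r[2]),
      Math.max(acc[3], r[3]),
    ];
  });
}

// Usage with a real instance (sText is assumed to exist):
// var boxes = sText.search("Hello World!").map(matchBounds);
```

A match that wraps across lines contains several quads, so the merged rectangle may also cover text that is not part of the match. Keep the individual quads when precise highlighting matters.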
- highlight(p, q)
Return an array with rectangles needed to highlight a selection defined by the start and end points.
- Arguments:
p – Start point in format [x,y].
q – End point in format [x,y].
- Returns:
[...].
EXAMPLE
var result = sText.highlight([100,100], [200,100]);
- copy(p, q)
Return the text from the selection defined by the start and end points.
- Arguments:
p – Start point in format [x,y].
q – End point in format [x,y].
- Returns:
String.
EXAMPLE
var result = sText.copy([100,100], [200,100]);
- walk(walker)
wasm only
Walk through the blocks (images or text blocks) of the structured text. For each text block, walk over its lines of text, and for each line, over its characters. For each block, line or character, the corresponding method of the walker is called.
- Arguments:
walker – Object with protocol methods, see example below for details.
EXAMPLE
var stext = pdfPage.toStructuredText();
stext.walk({
    beginLine: function (bbox, wmode, direction) {
        console.log("beginLine", bbox, wmode, direction);
    },
    beginTextBlock: function (bbox) {
        console.log("beginTextBlock", bbox);
    },
    endLine: function () {
        console.log("endLine");
    },
    endTextBlock: function () {
        console.log("endTextBlock");
    },
    onChar: function (utf, origin, font, size, quad, color) {
        console.log("onChar", utf, origin, font, size, quad, color);
    },
    onImageBlock: function (bbox, transform, image) {
        console.log("onImageBlock", bbox, transform, image);
    },
});
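Beyond logging, a walker can accumulate state between callbacks. The following is a minimal sketch of a walker that rebuilds the text of each line. The factory name createLineCollector is hypothetical, and the sketch assumes that the utf argument of onChar is the character as a string. All protocol methods are defined, as no-ops where unused, so the sketch does not rely on the walker being allowed to omit any.

```javascript
// Hypothetical helper: builds a walker plus the array it fills.
function createLineCollector() {
  var lines = [];
  var current = null; // the line currently being assembled, if any

  var walker = {
    beginTextBlock: function (bbox) {},
    endTextBlock: function () {},
    onImageBlock: function (bbox, transform, image) {},
    beginLine: function (bbox, wmode, direction) {
      current = { bbox: bbox, direction: direction, text: "" };
    },
    onChar: function (utf, origin, font, size, quad, color) {
      // Assumption: utf is a string holding one character.
      if (current) current.text += utf;
    },
    endLine: function () {
      if (current) {
        lines.push(current);
        current = null;
      }
    },
  };

  return { walker: walker, lines: lines };
}

// Usage with a real instance (stext is assumed to exist):
// var collector = createLineCollector();
// stext.walk(collector.walker);
// collector.lines.forEach(function (line) { console.log(line.text); });
```

Keeping the collected lines separate from the walker object means only the protocol methods are passed to walk().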
Note
On beginLine the direction parameter is a vector (e.g. [0, 1]), and you can calculate the rotation as an angle with some trigonometry on the vector.
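The trigonometry mentioned in the note can be sketched with Math.atan2. This assumes direction is a two-element array [dx, dy]; the helper name directionToDegrees is hypothetical and not part of the API.

```javascript
// Convert a direction vector [dx, dy] into an angle in degrees,
// measured from the positive x axis towards the positive y axis.
function directionToDegrees(direction) {
  var dx = direction[0];
  var dy = direction[1];
  return (Math.atan2(dy, dx) * 180) / Math.PI;
}

// Usage inside a walker (illustrative only):
// beginLine: function (bbox, wmode, direction) {
//     console.log("line angle", directionToDegrees(direction));
// }
```

Whether a positive angle looks clockwise or counterclockwise on screen depends on which way the y axis points in the coordinate space you are working in, so check this against a page with known rotated text.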
- asJSON(scale)
wasm only
Returns the instance in JSON format.
- Arguments:
scale – Float. Default: 1. Multiply all the coordinates by this factor to get the coordinates at another resolution. The structured text has all coordinates in points (72 DPI); however, you may want to use the coordinates in the StructuredText data at another resolution.
- Returns:
String.
EXAMPLE
var json = sText.asJSON();
Note
If you want the coordinates to be 300 DPI then pass (300/72) as the scale parameter.
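The scale calculation generalizes to any target resolution: points are 1/72 inch, so the factor is the target DPI divided by 72. A small sketch, where the helper name scaleForDpi is hypothetical:

```javascript
// Compute the scale factor that converts point coordinates (72 DPI)
// to coordinates at the requested resolution.
function scaleForDpi(dpi) {
  if (typeof dpi !== "number" || !(dpi > 0)) {
    throw new RangeError("dpi must be a positive number");
  }
  return dpi / 72;
}

// Usage with a real instance (sText is assumed to exist).
// asJSON returns a String, so parse it to work with the data:
// var json = sText.asJSON(scaleForDpi(300));
// var data = JSON.parse(json);
```

Passing 72 gives a factor of 1, which matches the default and leaves the coordinates in points.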