This document describes the breaking changes and additions when upgrading from v0.2.1.
pdf.js has been upgraded from 2.14.110 to 5.5.207. See the breaking changes in the result JSON below.

| v0.2.1 | Current |
|---|---|
| >= 12 | >= 20 |

The package has been modernized to use the ESM module system.
If you use require(), the CJS entry point is still available. If you use import, the ESM entry point is used automatically.
The top-level filename property has been removed from the result.
{
"meta": { ... },
"pages": [ ... ],
- "filename": "./example.pdf"
}

The top-level document info object has been renamed from pdfInfo to info,
and the fingerprint string has been replaced by a fingerprints array.
{
"meta": { ... },
"pages": [ ... ],
- "pdfInfo": {
- "numPages": 1,
- "fingerprint": "2e22bde07d96d0408524c26eeecd3483"
- }
+ "info": {
+ "numPages": 1,
+ "fingerprints": [
+ "2e22bde07d96d0408524c26eeecd3483",
+ "f6c92b368a8a13408457a1d395a37eb9"
+ ]
+ }
}

The per-page info object has been renamed from pageInfo to info, and a new view sub-object has been added.
{
- "pageInfo": {
+ "info": {
"num": 1,
"scale": 1,
"rotation": 0,
"offsetX": 0,
"offsetY": 0,
"width": 595,
- "height": 842
+ "height": 842,
+ "view": {
+ "minX": 0,
+ "minY": 0,
+ "maxX": 595,
+ "maxY": 842
+ }
}
}

The links array has been removed from pages. Link annotations are now available via the new pages[].annotations array
(if present in the PDF). See Annotations below.
{
- "links": ["https://example.com"],
"content": [ ... ]
}

The fontName string has been replaced by a font object containing detailed font information.
{
"str": "Hello World",
"x": 100,
"y": 200,
"width": 80,
"height": 12,
"dir": "ltr",
- "fontName": "Times"
+ "font": {
+ "name": "TimesNewRomanPSMT",
+ "family": "serif",
+ "size": 12,
+ "vertical": false,
+ "ascent": 0.891,
+ "descent": -0.216
+ }
}

Note: The `font.name` value may differ from the old `fontName`: it now uses the full PostScript font name rather than a shortened alias.

Note: `font.color` is only present when the `includeColors: true` option is set.
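Code that has to accept results from both versions can read the font name through a small helper. The following is a minimal sketch, assuming text items have exactly the shapes shown above; `getFontName` is a hypothetical helper and not part of the package.

```javascript
// Hypothetical helper: returns the font name of a text item from either
// the v0.2.1 shape (`fontName` string) or the current shape (`font` object).
function getFontName(item) {
  if (item.font && typeof item.font.name === "string") {
    return item.font.name; // current shape
  }
  if (typeof item.fontName === "string") {
    return item.fontName; // v0.2.1 shape
  }
  return undefined; // neither property is present
}

const oldItem = { str: "Hello World", fontName: "Times" };
const newItem = { str: "Hello World", font: { name: "TimesNewRomanPSMT", size: 12 } };

console.log(getFontName(oldItem)); // "Times"
console.log(getFontName(newItem)); // "TimesNewRomanPSMT"
```

Because the two versions may report different names for the same font, comparing names across versions is likely to produce mismatches even when the helper is used.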

The following properties have been added to text items:

| Property | Type | Description |
|---|---|---|
| `transform` | `number[]` | The 6-element transformation matrix `[a, b, c, d, e, f]` |
| `hasEOL` | `boolean` | Whether this text item ends a line |

The property order within text items has changed (e.g. str comes first now). This should not affect programmatic consumers, but may cause diffs if you compare serialized JSON.
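If you do compare serialized output, for example in snapshot tests, one way to avoid order-only diffs is to sort object keys before serializing. This is a sketch of that general technique, not something the package provides; `sortKeys` and `canonicalJson` are hypothetical names.

```javascript
// Recursively rebuilds a value with object keys in sorted order, so that
// JSON.stringify produces the same text regardless of insertion order.
function sortKeys(value) {
  if (Array.isArray(value)) {
    return value.map(sortKeys); // array order is meaningful, keep it
  }
  if (value !== null && typeof value === "object") {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = sortKeys(value[key]);
    }
    return sorted;
  }
  return value; // primitives are returned unchanged
}

function canonicalJson(value) {
  return JSON.stringify(sortKeys(value));
}

const a = { str: "Hello World", x: 100, y: 200 };
const b = { x: 100, y: 200, str: "Hello World" };

console.log(JSON.stringify(a) === JSON.stringify(b)); // false
console.log(canonicalJson(a) === canonicalJson(b)); // true
```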
The following fields have been made optional in meta.info:

| Field | Note |
|---|---|
| `Language` | Still present in some PDFs, now omitted when null |
| `EncryptFilterName` | Still present in some PDFs, now omitted when null |

These fields were previously always included (set to null when absent). Now they are only present when the PDF actually contains them.
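Consumers that relied on these fields always being present can restore the previous behavior when reading them. The sketch below assumes the result shape described in this document; `getMetaInfoField` is a hypothetical helper.

```javascript
// Hypothetical helper: reads a field from meta.info and returns null when
// the field (or any parent object) is absent, as v0.2.1 did for these fields.
function getMetaInfoField(result, field) {
  return result?.meta?.info?.[field] ?? null;
}

const withLanguage = { meta: { info: { Language: "en-US" } } };
const withoutLanguage = { meta: { info: {} } };

console.log(getMetaInfoField(withLanguage, "Language")); // "en-US"
console.log(getMetaInfoField(withoutLanguage, "Language")); // null
console.log(getMetaInfoField(withoutLanguage, "EncryptFilterName")); // null
```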

The following options control the additional output described below:

| Option | Type | Default | Description |
|---|---|---|---|
| `includeAttachments` | `boolean` | `false` | Include file attachments as base64 |
| `includeImages` | `boolean` | `false` | Include images as base64 |
| `includeColors` | `boolean` | `false` | Include font color in text content items |

When includeAttachments: true, the result contains a top-level attachments array:
{
"attachments": [
{
"filename": "document.pdf",
"description": "An attached file",
"base64data": "JVBERi0xLj..."
}
]
}

When includeImages: true, each page may contain an images array:
{
"pages": [
{
"images": [
{
"index": 0,
"x": 0,
"y": 0,
"width": 200,
"height": 100,
"kind": 3,
"transform": [200, 0, 0, -100, 0, 100],
"base64data": "/9j/4AAQ..."
}
]
}
]
}

For tiled/repeated images, a positions array is also present containing the repeat positions.
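The `base64data` strings on attachments and images can be turned back into bytes with Node's built-in `Buffer`. The following is a minimal sketch with a made-up sample payload; `decodeBase64Data` is a hypothetical helper, and how the decoded bytes of an image should be interpreted is not covered here.

```javascript
// Decodes a base64data string (as found on attachments and images) into a Buffer.
function decodeBase64Data(entry) {
  return Buffer.from(entry.base64data, "base64");
}

// Sample attachment; the payload is the base64 encoding of the ASCII text "%PDF-1.7".
const attachment = {
  filename: "document.pdf",
  description: "An attached file",
  base64data: "JVBERi0xLjc=",
};

const bytes = decodeBase64Data(attachment);
console.log(bytes.length); // 8
console.log(bytes.toString("latin1")); // "%PDF-1.7"
```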
Annotations (including links) are now always extracted when present.
Each page may contain an annotations array with detailed annotation objects.
For example, a link annotation:
{
"annotationType": 2,
"annotationFlags": 0,
"borderStyle": {
"width": 0,
"rawWidth": 1,
"style": 1,
"dashArray": [3],
"horizontalCornerRadius": 0,
"verticalCornerRadius": 0
},
"color": "#000000",
"borderColor": "#000000",
"rotation": 0,
"contentsObj": { "str": "", "dir": "ltr" },
"hasAppearance": false,
"id": "66R",
"rect": [80.459, 95.314, 504.618, 116.088],
"subtype": "Link",
"hasOwnCanvas": false,
"noRotate": false,
"noHTML": false,
"isEditable": false,
"structParent": 1,
"url": "https://www.example.com",
"unsafeUrl": "https://www.example.com",
"overlaidText": "Click here to see example.com",
"x": 504.618,
"y": 116.088
}

A new top-level info object is always present:
{
"info": {
"numPages": 2,
"fingerprints": ["abc123...", "def456..."]
}
}

Each page's info now includes a view object with the page's bounding box:
{
"view": {
"minX": 0,
"minY": 0,
"maxX": 595,
"maxY": 842
}
}

| v0.2.1 Path | Current Path | Change Type |
|---|---|---|
| `filename` | (removed) | 🔴 Removed |
| `pdfInfo` | `info` | 🔴 Renamed |
| `pdfInfo.fingerprint` | `info.fingerprints` | 🔴 Changed (string → array) |
| `pdfInfo.numPages` | `info.numPages` | 🔴 Renamed (parent renamed) |
| `pages[].pageInfo` | `pages[].info` | 🔴 Renamed |
| `pages[].links` | (removed, see annotations) | 🔴 Removed |
| `pages[].content[].fontName` | `pages[].content[].font.name` | 🔴 Restructured |
| `meta.info.Language` | `meta.info.Language` | 🟡 No longer included when null |
| `meta.info.EncryptFilterName` | `meta.info.EncryptFilterName` | 🟡 No longer included when null |
| - | `pages[].info.view` | 🟢 Added |
| - | `pages[].content[].transform` | 🟢 Added |
| - | `pages[].content[].font` | 🟢 Added |
| - | `pages[].content[].font.color` | 🟢 Added (requires `includeColors: true`) |
| - | `pages[].content[].hasEOL` | 🟢 Added |
| - | `pages[].annotations` | 🟢 Added |
| - | `pages[].images` | 🟢 Added |
| - | `attachments` | 🟢 Added |

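For downstream code that cannot be migrated right away, the renames in the table above can be bridged with a small shim that derives the v0.2.1 fields from a current result. This is only a sketch under stated assumptions: `toLegacyShape` is a hypothetical helper, the first entry of `fingerprints` is assumed to stand in for the old `fingerprint`, and the old `links` array is assumed to have held the URLs of link annotations.

```javascript
// Hypothetical shim: derives v0.2.1-style fields from a current result.
// Assumptions: fingerprints[0] replaces the old `fingerprint` string, and
// the old `links` array held the URLs of link annotations.
function toLegacyShape(result, filename) {
  return {
    filename, // no longer part of the result, so the caller must supply it
    meta: result.meta,
    pdfInfo: {
      numPages: result.info.numPages,
      fingerprint: result.info.fingerprints[0],
    },
    pages: result.pages.map((page) => {
      const { view, ...pageInfo } = page.info; // drop the new `view` object
      const links = (page.annotations ?? [])
        .filter((a) => a.subtype === "Link" && typeof a.url === "string")
        .map((a) => a.url);
      return { pageInfo, links, content: page.content };
    }),
  };
}

const current = {
  meta: {},
  info: {
    numPages: 1,
    fingerprints: ["2e22bde07d96d0408524c26eeecd3483", "f6c92b368a8a13408457a1d395a37eb9"],
  },
  pages: [
    {
      info: { num: 1, width: 595, height: 842, view: { minX: 0, minY: 0, maxX: 595, maxY: 842 } },
      annotations: [{ subtype: "Link", url: "https://www.example.com" }],
      content: [],
    },
  ],
};

const legacy = toLegacyShape(current, "./example.pdf");
console.log(legacy.pdfInfo.fingerprint); // "2e22bde07d96d0408524c26eeecd3483"
console.log(legacy.pages[0].links[0]); // "https://www.example.com"
```

The shim leaves text items untouched, so `content[]` entries still carry the new `font` object rather than `fontName`.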