-
-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Desktop: Resolves #9794: Plugin API: Add support for loading PDFs with the imaging API #10177
Desktop: Resolves #9794: Plugin API: Add support for loading PDFs with the imaging API #10177
Conversation
nativeImage: nativeImage, | ||
nativeImage: { | ||
async createFromPath(path: string) { | ||
if (path.endsWith('.pdf') || path.endsWith('.PDF')) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or path.toLowerCase().endsWith('.pdf')
? (just to handle that one edge case where the pdf is going to be named .Pdf)
/** | ||
* The first page to export. Defaults to `1`, the first page in | ||
* the document. | ||
*/ | ||
minPage?: number; | ||
|
||
/** | ||
* The number of the last page to convert. Defaults to the last page | ||
* if not given. | ||
* | ||
* If `maxPage` is greater than the number of pages in the PDF, all pages | ||
* in the PDF will be converted to images. | ||
*/ | ||
maxPage?: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose this for now? Since we don't have an API to get this info from the pdf in the first place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because PDF-to-image conversion can be slow, I would prefer to have some way to only load a specific number of pages (similar to the cursor
we use for the data API). The intention for minPage
and maxPage
was to allow fetching only a certain number of pages, starting at a specific point.
An alternative could be to allow users to provide a cursor
and a limit
(and also return a has_more
property), similar to the data API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the cursor
interface would be a bit too abstract for this, so it's fine as it is, however how will the plugin know the number of pages in the document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however how will the plugin know the number of pages in the document?
It currently isn't possible to get the number of pages in the document. However, it's fine if maxPage
is greater than the maximum number of pages. As such, a plugin could do something similar to the following:
const processCount = 5;
for (let i = 1; ; i += processCount) {
const images = await joplin.imaging.createFromPdfResource(
resourceId,
{ minPage: i, maxPage: i + (processCount - 1) },
);
if (images.length === 0) {
break;
}
await processImages(images);
}
It could make sense to add the above to the test plugin (which currently processes only the first 10 pages).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could also make sense to have joplin.imaging.createFromPdfResource
return an object with additional PDF information, in addition to image handles. For example,
{
pages: Handle[]; // String image handles for each page
pdfInfo: {
pageCount: number;
},
}
Such an object would also make it easier to return more information about PDFs in the future, without breaking changes to the plugin API.
packages/generator-joplin/generators/app/templates/api/JoplinImaging.d.ts
Show resolved
Hide resolved
scaleFactor?: number; | ||
} | ||
export interface PdfInfo { | ||
numPages: number; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you name it "pageCount" please since this is what we use for pagination in various APIs? (often "page_count" actually but "pageCount" would make more sense in this context)
} | ||
export interface PdfInfo { | ||
numPages: number; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a follow up pull request, is there any additional info we can retrieve using pdf.js? I see that there's a doc.getMetadata()
which maybe we could use.
Summary
Resolves #9794.
Adds a plugin API to convert PDFs to images.
Specifically, it adds the following:
joplin.imaging.createFromPdfResource
joplin.imaging.createFromPdfPath
: Creates images from a PDF file (one per page). Returns a handle to each page.It also makes the following changes to existing methods:
joplin.imaging.createFromResource
andjoplin.imaging.createFromPath
: If given a path to a pdf resource or file (based on the file extension), return an image of its first page.joplin.imaging.free
: Support freeing an array of image handles.Testing
This pull request updates the
imaging
example plugin to use the new APIs. To manually test this pull request using that plugin,