Showing content from https://github.com/qpdf/qpdf/discussions/1104 below:

QPDF "pages epic" · qpdf/qpdf · Discussion #1104 · GitHub

These are great responses. Thanks for taking the time to read and comment. It is much appreciated!

On my end I might end up creating these objects as needed but maybe not exposing bindings to them directly. It depends what exactly they end up looking like. ... For that to work I'd need QPDFAssembler to update the primary input QPDF rather than returning a new QPDF, since the existing pikepdf API is expressed in terms of modifying the state of the existing primary PDF.

Yes, I think we're thinking the same way.

I would think of QPDFAssembler as being something you would most likely use behind the scenes. When someone manipulates pages with arrays like that, you could call QPDFAssembler instead of other methods, and that would delegate all the document-level stuff to QPDFAssembler. I view pikepdf as having its own higher-level interfaces for qpdf that are more Pythonic. Of course, you could also expose QPDFAssembler if you felt like it, and I'd like to not preclude that by some aspect of my design. I occasionally use pikepdf, and I find its interfaces to be extremely natural. It is one of the best Python packages I've used.

Regarding the original QPDF, QPDFAssembler would work exactly the way QPDFJob works today. There is a "primary input", and all operations are modifications to that. It is my full intention to allow any QPDF object to be a primary input, so, yes, it would make changes to the original QPDF. I updated the doc to state that, if a QPDF is the primary input, the same QPDF will be returned. What QPDFJob does today is to track all operations and ensure that the first occurrence in the output of a page from the primary input, even if the order has changed, is the exact same page object with the same ID. Internally, qpdf clears the pages tree up front and then re-inserts pages. When it's completely done, it deletes any unreferenced pages. I think this model will work well with what you're trying to achieve. It also preserves document-level functionality for the primary input even for things I haven't gotten to yet.
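
As a rough illustration of the model described above (clear the pages tree, re-insert pages, keep the first occurrence of each primary-input page as the same object, drop whatever ends up unreferenced), here is a minimal sketch. `Page`, `Doc`, and `reorder` are hypothetical names invented for this example, not qpdf API, and giving a repeated page a fresh copy with a new ID is an assumption about how later occurrences are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a page object; `id` plays the role of the object ID.
struct Page {
    int id;
};
using PagePtr = std::shared_ptr<Page>;

// Hypothetical stand-in for a document: an ordered list of pages.
struct Doc {
    std::vector<PagePtr> pages;
    int next_id = 1;

    PagePtr new_page() { return std::make_shared<Page>(Page{next_id++}); }
};

// Rebuild doc.pages from `selection`, a list of 0-based indexes into the
// original page order.
void reorder(Doc& doc, const std::vector<std::size_t>& selection) {
    // Take the original pages out of the tree, leaving the tree empty.
    std::vector<PagePtr> original = std::move(doc.pages);
    doc.pages.clear();

    std::set<std::size_t> used;
    for (std::size_t idx : selection) {
        const PagePtr& src = original.at(idx);
        if (used.insert(idx).second) {
            // First occurrence: re-insert the very same object (same ID).
            doc.pages.push_back(src);
        } else {
            // Assumption: later occurrences become copies with new IDs.
            doc.pages.push_back(std::make_shared<Page>(Page{doc.next_id++}));
        }
    }
    // `original` goes out of scope here, so pages that were never selected
    // are no longer referenced and are released.
}
```

With three pages and the selection `{2, 0, 2}`, the first two output pages are the original third and first page objects, the last output page is a new object, and the original second page is gone.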

I'd need something more like a declarative interface for how one wants the pages organized. (Come to think of it, that would probably be better.)

Yes, this is how I see it. QPDFAssembler (and currently QPDFJob) sequences things on its own as described in the document. That way it can do all the calculations and move things around in the most efficient way. For example, overlay and underlay are always done after page ordering, and when I add composition, that will always be done last. If someone wanted to do things in a different order, they'd have to create intermediate results. From the CLI, that would be to create intermediate files, but from C++ code, you could take the QPDF from one QPDFAssembler and use it as an input to another. It should even work to stack QPDFAssemblers with the same QPDF. This is not a common pattern right now because it's hard to set up, but I have test cases to make sure it works. It gets quite tricky when you do things like copying a stream that has a stream data provider, and I can't promise there aren't some bugs, but I think it's quite solid overall.
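
To make the "it sequences things on its own" idea concrete, here is a small sketch of a declarative front end that records requests in any order and then plans them in a fixed phase order (page ordering, then overlay/underlay, then composition). `Phase`, `Step`, and `AssemblerSketch` are hypothetical names for illustration only and say nothing about how QPDFAssembler will actually be structured.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical phases, declared in the fixed order described above.
enum class Phase { PageOrder, OverlayUnderlay, Composition };

struct Step {
    Phase phase;
    std::string name;
};

class AssemblerSketch {
public:
    // Record a request; the order of calls does not decide execution order.
    AssemblerSketch& request(Phase phase, std::string name) {
        steps_.push_back(Step{phase, std::move(name)});
        return *this;
    }

    // Return step names in execution order: sorted by phase, with the
    // caller's relative order preserved inside each phase.
    std::vector<std::string> plan() const {
        std::vector<Step> sorted = steps_;
        std::stable_sort(sorted.begin(), sorted.end(),
                         [](const Step& a, const Step& b) { return a.phase < b.phase; });
        std::vector<std::string> names;
        for (const Step& s : sorted) {
            names.push_back(s.name);
        }
        return names;
    }

private:
    std::vector<Step> steps_;
};
```

A caller who wants a different order, say an overlay before a page reordering, would run one assembler with only the overlay, then pass its result as the input to a second assembler that does the reordering.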

Might need a way to specify whether pages given on the command line are page numbers or page labels that happen to be numeric. (If that isn't already present?)

Right now, it's always sequential page numbers from 1. It would be nice to have something based on page labels. It looks like I had that and accidentally dropped it. I just went through the last pre-rework version of my TODO document and recovered a few cases I had previously thought of, including selecting pages based on labels. I haven't thought of a syntax for that, but it would probably be something like --labels=.... rather than --range=.... Once the new framework is in place, adding something like that would be quite easy.

StructTreeRoot

Your comments are noted. I haven't really studied this part of the spec carefully yet. It appears to be the most complex bit, so it probably won't be first. I'll make a note to come back to your comments.

The method chaining/fluent interface of QPDFJob by raw pointers is awkward to deal with in pybind11. .... Returning a reference or lightweight copyable object like QPDFObjectHandle works best, I've found.

I didn't know this about pybind -- this is very helpful and is one of the reasons I'm interested in having your early input. I think using references should be fine and will be efficient and flexible. I'll try that first. Would std::shared_ptr work? I know there are ways to create a shared_ptr to this that weren't there in earlier C++ versions, but I'll have to refresh my memory. I only use C++ in qpdf now...most of the code I write in my day job is in Go, and if it's not in Go, it's probably in Python. It's hard to keep up with all the changes in the C++ standard library. Anyway, if raw pointers work, then references will work too, since I think there are no cases in a fluent interface where you would ever return a null pointer.
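
For comparison, here is a minimal sketch of the two fluent styles mentioned above: setters that return a reference to the object, and setters that return a std::shared_ptr obtained through std::enable_shared_from_this (in the standard library since C++11). `ConfigByRef` and `ConfigShared` are hypothetical classes, not QPDFJob's configuration API, and the sketch does not test how either style behaves under pybind11.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Style 1: each setter returns a reference to the same object.
class ConfigByRef {
public:
    ConfigByRef& inputFile(std::string name) {
        inputs_.push_back(std::move(name));
        return *this;
    }
    ConfigByRef& linearize() {
        linearize_ = true;
        return *this;
    }
    const std::vector<std::string>& inputs() const { return inputs_; }
    bool linearized() const { return linearize_; }

private:
    std::vector<std::string> inputs_;
    bool linearize_ = false;
};

// Style 2: each setter returns a shared_ptr to the same object.
// shared_from_this() is only valid when the object is already owned by a
// shared_ptr, so construction is forced through create().
class ConfigShared : public std::enable_shared_from_this<ConfigShared> {
public:
    static std::shared_ptr<ConfigShared> create() {
        return std::shared_ptr<ConfigShared>(new ConfigShared());
    }
    std::shared_ptr<ConfigShared> inputFile(std::string name) {
        inputs_.push_back(std::move(name));
        return shared_from_this();
    }
    std::shared_ptr<ConfigShared> linearize() {
        linearize_ = true;
        return shared_from_this();
    }
    const std::vector<std::string>& inputs() const { return inputs_; }
    bool linearized() const { return linearize_; }

private:
    ConfigShared() = default;
    std::vector<std::string> inputs_;
    bool linearize_ = false;
};
```

Usage looks like `cfg.inputFile("a.pdf").linearize()` in the first style and `cfg->inputFile("a.pdf")->linearize()` in the second. Neither style can yield a null result, which matches the point that a fluent interface never needs to return a null pointer.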

I think it's fine to implement this as creating a new empty QPDF and attaching a blank page to it, rather than treating blank pages as a special input source. It wouldn't hurt to have a little API syntactic sugar around that process, though.

By "input source", I just mean a thing inside --pages ... --. My intention is just to create a page object with a media box and no content and add it to the pages tree, so this is about as lightweight as it gets.
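
To show how little a "page object with a media box and no content" involves, here is a small sketch that builds the PDF dictionary text for such a page. `blankPageDict` is a hypothetical helper, not a qpdf function; the real implementation may add other keys, and the /Parent entry would be set when the page is added to the pages tree.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: PDF dictionary syntax for a page that has a media box
// (in points) and no /Contents entry.
std::string blankPageDict(double width, double height) {
    std::ostringstream out;
    out << "<< /Type /Page /MediaBox [0 0 " << width << " " << height << "] >>";
    return out.str();
}
```

For a US Letter page, `blankPageDict(612, 792)` yields `<< /Type /Page /MediaBox [0 0 612 792] >>`.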
