Showing content from https://phabricator.wikimedia.org/T299521 below:

⚓ T299521 PDF file has 0x0 image size in Commons after uploading a new version while the page number is correct

Event Timeline

Ankry renamed this task from "PDF file has 0x0 image size in Commons after uploading a new version while page number is corect" to "PDF file has 0x0 image size in Commons after uploading a new version while the page number is correct".

Ankry reopened this task as Open.

Comment Actions

Finally, I tried deleting all versions except the last one, but it didn't work. So I deleted everything and reuploaded the file, and it shows fine. The bug still exists, though. And it still doesn't display properly on Wikisource.

Comment Actions

I'm getting a similar problem with this PDF:

(currently it is broken just on la.ws, as I reverted to an old version.)

An administrator can try to upload this file locally. This sometimes works as a workaround, but hides the problem.

Comment Actions

<snip>
An administrator can try to upload this file locally. This sometimes works as a workaround, but hides the problem.

Are you able to give me some instructions or tips on how this is done so our administrators can assess if it is easy to try or not?

Comment Actions

<snip>
An administrator can try to upload this file locally. This sometimes works as a workaround, but hides the problem.

Are you able to give me some instructions or tips on how this is done so our administrators can assess if it is easy to try or not?

Download the file from Commons. Visit Special:Upload on your wiki and upload the file, telling it to continue anyway (I think it will complain that the file already exists on Commons).
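The same workaround can be sketched against the MediaWiki action API instead of the web forms. This is a minimal, hypothetical sketch: Special:FilePath is the real Commons redirect for a file's current version, but the wiki URLs, filename, and CSRF token here are placeholders, and the actual multipart POST to api.php is left as a comment.

```python
from urllib.parse import quote

def commons_file_url(filename: str) -> str:
    """Direct download URL for the current version of a Commons file
    (Special:FilePath redirects to the file itself)."""
    return ("https://commons.wikimedia.org/wiki/Special:FilePath/"
            + quote(filename.replace(" ", "_")))

def local_upload_params(filename: str, csrf_token: str) -> dict:
    """action=upload parameters for the local wiki's api.php.
    ignorewarnings=1 is the API equivalent of 'continue anyway': it
    skips the warning that the file already exists on Commons."""
    return {
        "action": "upload",
        "format": "json",
        "filename": filename,
        "ignorewarnings": 1,
        "token": csrf_token,
    }

# With a logged-in HTTP session, the upload itself would be a multipart
# POST of the downloaded bytes to https://<local-wiki>/w/api.php using
# the parameters above plus a files={"file": ...} payload.
```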

Comment Actions

Just as a general comment... While it's helpful that you report these issues, reverting these uploads several times while only linking to the general File: page makes it hard for anyone to try to debug the issue, as they can't necessarily tell which is the "broken" version of the PDF...

Comment Actions

@Reedy Yes, I see that. I'm afraid I was trying to brute-force a solution to this problem, but all of the attempts failed. I'll remember to link to specific file versions in the future.

As it stands, the current version is still broken on la.ws, and I won't be moving it again, as I am hoping an administrator will move a copy onto the wiki per your instructions.

Comment Actions

I've stepped through the logic a bit with some of the reported files, and pretty much the only way this can happen is if pdfinfo is undefined/not executable/incorrect, or if $wgPdfHandlerDpi is 0/undefined.

Both would be silent errors when they occur.

Another possibility is that the file BoxedCommand creates with the metadata output is, for some reason, not available to the MediaWiki app server. This too would be a silent error.

D6283's comment indicates that this is still happening. If all pdfinfo work is now a BoxedCommand running in Kubernetes, then this indicates that some hosts either don't have pdfinfo, or that there are occasional issues with the output of the command. A race condition with the file flush?

Comment Actions

I've stepped through the logic a bit with some of the reported files, and pretty much the only way this can happen is if pdfinfo is undefined/not executable/incorrect, or if $wgPdfHandlerDpi is 0/undefined.

Both would be silent errors when they occur.

This generally happens with large PDF files. Maybe pdfinfo execution exceeds a memory/time/other limit?

Comment Actions

@Ankry That points more to the second possibility: a race condition with the file not yet being available to the process trying to read the metadata.

The file gets uploaded and written to the main file server; then the metadata reading starts while the file is not there at all, or not yet completely written to the location (a replica server) the metadata reader is trying to read it from.
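The race described here can be sketched as a polling check (an illustrative pattern, not MediaWiki's actual code): instead of reading the metadata file the moment the upload job finishes, wait until the file both exists and has a stable size.

```python
import os
import time

def wait_for_stable_file(path: str, attempts: int = 5,
                         delay: float = 0.5) -> bool:
    """Poll until the file exists and its size has stopped changing
    between two consecutive checks, i.e. the (replicated) write has
    finished. Returns False if it never settles, instead of failing
    silently on the first miss."""
    last_size = -1
    for _ in range(attempts):
        if os.path.exists(path):
            size = os.path.getsize(path)
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(delay)
    return False
```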

Comment Actions

The patch should make any errors more verbose so that we can collect more information about these failures.

Comment Actions

&action=purge seems to solve this issue. Can we search for other PDF files that are affected?

Comment Actions

Can confirm:

  1. The problem is present both on Commons and on every local wiki.
  2. ?action=purge on a local description page (e.g. Spanish Wikisource Archivo:Filename.pdf) solves the problem locally, but only after ?action=purge'ing the Commons description page.

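Following the order observed above (Commons first, then the local wiki), the purges could be scripted against the action API. A sketch with the placeholder filename from the comment; action=purge and forcelinkupdate are real API parameters, but treat the exact request shape as an assumption:

```python
from urllib.parse import urlencode

def purge_request(api_url: str, title: str) -> tuple[str, str]:
    """Build the (url, POST body) for an API purge of one page.
    forcelinkupdate also refreshes pages that embed the file, which is
    what the description/Index: pages here need."""
    body = urlencode({
        "action": "purge",
        "titles": title,
        "forcelinkupdate": 1,
        "format": "json",
    })
    return api_url, body

# Order matters per the observation above: Commons first, then local.
steps = [
    purge_request("https://commons.wikimedia.org/w/api.php",
                  "File:Filename.pdf"),
    purge_request("https://es.wikisource.org/w/api.php",
                  "Archivo:Filename.pdf"),
]
```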
Comment Actions

The patch should make any errors more verbose so that we can collect more information about these failures.

@TheDJ Based on buzz around various places, this problem seems to be markedly more prevalent currently, and has been for anywhere from the last week to a couple of weeks (a guesstimate based on how long it usually takes such issues to bubble up and catch attention).

We're mostly seeing this for Index:-namespace pages, possibly because 1) they nearly always depend on multi-page files (vs. plain JPEGs), 2) they always actually need the page numbers and other extended metadata, and 3) they are very often created shortly after the file is uploaded.

Anecdotally (and I verified this in one case), the file on Commons looks fine, the file on the local wiki looks broken (no thumb), and the Index: page displays the above error. Purging the file on Commons has no effect, but purging the non-existent local file description page fixes it. This last point differs from previous, similar-looking problems, where no amount of purging of anything seemed to have an effect except maybe exceptionally and randomly (so we're probably talking about a cluster of similar problems).

But that it appears significantly more prevalent right now means something changed somewhere in the stack and combined with the extended logging it may be possible to pinpoint the cause.

Comment Actions

I don't know if this adds anything to the discussion, but on my MediaWiki installation (MediaWiki 1.40.0; PHP 8.3.7 (apache2handler); ICU 70.1; MariaDB 10.11.8) I am having the same issue. If I try to generate the thumbnails of a PDF via script, I get this:
Deprecated: strlen(): Passing null to parameter #1 ($string) of type string is deprecated in /var/www/html/mediawiki/thumb.php on line 362

Comment Actions

Deprecated: strlen(): Passing null to parameter #1 ($string) of type string is deprecated in /var/www/html/mediawiki/thumb.php on line 362

Passing null to strlen() was deprecated in PHP 8.1, but this is probably just a symptom. Whatever is being passed to strlen() should presumably not be null, and is null because something failed earlier in the process.

This "something" could be anything, including datacenters being out of sync, JobQueue jobs timing out, DB queries timing out, pathological data that makes Ghostscript spit out something the PDF handler doesn't handle.

For example, it looks like FileRepo\ThumbnailEntryPoint.php mostly sanitizes what it passes to strlen() (using null coalescing), but in one instance, when trying to generate a Content-Length header, it passes $content directly, so if it fails to get the thumbnail data it throws that error.

Comment Actions

Deprecated: strlen(): Passing null to parameter #1 ($string) of type string is deprecated in /var/www/html/mediawiki/thumb.php on line 362

This "something" could be anything, including datacenters being out of sync, JobQueue jobs timing out, DB queries timing out, pathological data that makes Ghostscript spit out something the PDF handler doesn't handle.

Well, in this case it is https://github.com/wikimedia/mediawiki/blob/1.40.0/thumb.php#L362, so thumbproxyurl is null. This was fixed with null coalescing in a patch release of 1.40.

Comment Actions

This issue (or something closely related) has been causing problems at Wikisource projects for a long time now. Basically, every single file that is uploaded for use at Wikisource has this issue and needs to be purged after the upload. This is very confusing and frustrating for new and inexperienced users, because they just can't understand what is going wrong, and they don't know how to purge the file to solve the problem.
Please, is it possible to do something to solve this issue?
If purging the file solves the problem, is it possible, at least as a temporary measure, to force an automatic purge after every upload of a PDF or DjVu file?

