Showing content from https://phabricator.wikimedia.org/T18474 below:

⚓ T18474 {{FILEPATH:{{PAGENAME}} }} doesn't work for filenames containing characters that get escaped to HTML entities

Also seems to apply to the output of other magic words, even simple ones like LC. We risk getting into a situation where people rely on the broken behaviour, and fixing it will then break things.

a.d.bergi wrote:

I'm not sure whether this was already posted, but {{NAMESPACE:xyz}} and {{PAGENAME:xyz}} also do not work with "<", ">" or their HTML-escaped equivalents &lt; and &gt;. The quote characters (" and ') are OK here. I think it belongs to this bug; please correct me if it does not.

(In reply to comment #9)
> I'm not sure whether it was already posted, but {{NAMESPACE:xyz}} and {{PAGENAME:xyz}} are also not working with "<", ">" or their html-escaped equivalents &lt; and &gt;. The quote characters (",') are OK in here. I think it belongs to this bug, please fix me.

"<" and ">" are not allowed inside titles, see [[mw:Manual:$wgLegalTitleChars]], so they cannot work with NAMESPACE or PAGENAME.

(In reply to comment #4)
> Looks like the same base problem as bug 14779...

No, using "$title = Title::newFromText( $name );" and passing the $title to wfFindFile works (with or without the "File:" prefix). But that does not work with a URL-encoded title; that is bug 14779.

wfFindFile uses Title::makeTitleSafe, which uses Title::makeTitle for the mDbkeyform. Title::newFromText uses Sanitizer::decodeCharReferencesAndNormalize, and that normalizes the " & '. It looks like newFromText should be used for all titles coming from wikitext, and makeTitleSafe should be used for form-given titles. So this parser function uses the wrong method.

(In reply to comment #6)
> Related issue: {{PAGESINCATEGORY:{{PAGENAME}}}} doesn't work on categories with a '

PAGESINCATEGORY uses Category::newFromName, which uses Title::makeTitleSafe; see comment 11.

(In reply to comment #11)
> ... ([wfFindFile works] with or without "File:"-prefix). ...

reported as bug 25670

Workaround: filter the given file name, page name or category name containing HTML entities, as returned by various parser functions (like lc:, uc:, #if:, #switch:...), through #titleparts to convert these HTML entities back to plain characters. The returned value can then be passed to parser functions that do not accept these HTML entities: PAGESINCATEGORY, FILEPATH, #ifexist...

The HTML entities we need to handle are notably those of characters which are valid in page names (the < and > characters are not valid in page names; they will remain encoded after calling #titleparts).

For details about the various encodings used in page names, see [[mw:Manual:PAGENAMEE encoding]], which details how characters may get encoded. This covers the full ASCII set and the first printable non-ASCII characters (tested with UTF-8 assumed for their plain-text encoding). It also covers some other contextual changes: some characters are not encoded, but may be changed or dropped in leading positions, and a few characters get transformed within specific subsequences anywhere in the string (such as the slash and periods).

But I agree that functions like PAGESINCATEGORY, FILEPATH... should properly decode these HTML entities (and notably the 3 characters above; the most frequent one encountered being the ASCII apostrophe-quote).
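To illustrate the difference discussed in this thread, here is a minimal Python sketch. It is a toy model, not MediaWiki code: the `FILES` table, the path in it, and the functions `lookup_raw` and `lookup_decoded` are hypothetical names made up for this example. It assumes the apostrophe in a page name comes back from the magic word as the entity `&#39;`. The first lookup uses the name as given, which models the failing path; the second decodes character references first, which is roughly what the comments above attribute to Title::newFromText and to the #titleparts workaround.

```python
import html

# Hypothetical stand-in for the file repository: keys are plain-text titles.
FILES = {"Bob's photo.jpg": "/images/Bob's_photo.jpg"}


def lookup_raw(name):
    """Look the name up exactly as given, entities included (the failing path)."""
    return FILES.get(name)


def lookup_decoded(name):
    """Decode HTML character references before the lookup (the working path)."""
    return FILES.get(html.unescape(name))


# Assumed form of the magic-word output for a title containing an apostrophe.
escaped = "Bob&#39;s photo.jpg"

print(lookup_raw(escaped))      # no match: the entity is not part of the stored title
print(lookup_decoded(escaped))  # match: the entity was turned back into an apostrophe
```

In wikitext, the workaround described above would presumably take a form such as {{FILEPATH:{{#titleparts:{{PAGENAME}}}}}}, although the thread itself does not spell out the exact call.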
