Showing content from https://github.com/Python-Markdown/markdown/issues/899 below:

Heading names in toc_tokens contain stashed HTML placeholders · Issue #899 · Python-Markdown/markdown

If a Markdown heading contains HTML, the corresponding entry in the .toc_tokens property ends up with HTML placeholders when returned to the user. The following example should illustrate the problem:

>>> import markdown
>>> md = markdown.Markdown(extensions=['toc'])
>>> md.convert('# <code>Heading</code>\n')
'<h1 id="heading"><code>Heading</code></h1>'
>>> md.toc_tokens
[{'level': 1, 'id': 'heading', 'name': '\x02wzxhzdk:0\x03Heading\x02wzxhzdk:1\x03', 'children': []}]

While this isn't too hard to fix (we could just un-stash the HTML immediately before returning it to the user), it does raise a bigger question: what should the data format of the name field in toc_tokens be? Is it...

Markdown (so the value would be exactly as in the source file: <code>Heading</code>)
Plain text (so the value would strip HTML: Heading)
HTML (similar to the Markdown format, but with HTML entities replaced, so <code>a>b</code> becomes <code>a&gt;b</code>)
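
To make the "un-stash before returning" idea concrete, here is a minimal sketch. It is not the library's actual code: the placeholder pattern is copied from the output shown above, and `unstash` plus the plain `stash` list are hypothetical stand-ins for whatever the library keeps internally.

```python
import re

# Placeholder format as it appears in the toc_tokens output above:
# \x02wzxhzdk:<index>\x03
PLACEHOLDER_RE = re.compile('\x02wzxhzdk:(\\d+)\x03')


def unstash(text, stash):
    """Replace each placeholder with the stashed HTML at that index.

    `stash` is a plain list used here for illustration only (an assumption,
    not the library's real data structure).
    """
    def restore(match):
        index = int(match.group(1))
        # Leave unknown placeholders untouched rather than raising.
        return stash[index] if index < len(stash) else match.group(0)

    return PLACEHOLDER_RE.sub(restore, text)


# The 'name' value from the example above, with a simulated stash.
name = '\x02wzxhzdk:0\x03Heading\x02wzxhzdk:1\x03'
stash = ['<code>', '</code>']
print(unstash(name, stash))  # <code>Heading</code>
```

This would produce the value as written in the source file (the first option above). Whether it should also be entity-escaped or stripped down to text is exactly the open question.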

In particular, this is relevant for mkdocs/mkdocs#1970. Prior to that PR, MkDocs would build an internal representation of the TOC by parsing the HTML from .toc. With the change, it tries to use .toc_tokens instead, but fails due to this issue.

FWIW, I think MkDocs wants this to be plain text in the end, but HTML makes the most sense to me in general: after all, the purpose of this lib is to convert Markdown to HTML. (MkDocs would then just need to parse the HTML fragment and strip out the tags.)
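
As a rough illustration of that last step, the following sketch strips tags from an HTML fragment using only the standard library. `strip_tags` and `_TextExtractor` are hypothetical names, and this is not how MkDocs actually does it; it only shows that the conversion from an HTML name to plain text is cheap for a consumer.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &gt; into '>'
        # before they reach handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(fragment):
    """Return the plain-text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return ''.join(parser.parts)


print(strip_tags('<code>Heading</code>'))    # Heading
print(strip_tags('<code>a&gt;b</code>'))     # a>b
```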
