Showing content from http://mail.python.org/pipermail/python-list/2005-September/340333.html below:

Unnest tag ?

Greg. Nans gregnans at yahoo.fr
Mon Sep 5 14:01:32 EDT 2005
Hello,

I am looking for an idea on how to handle un-nesting tags.

I know I can use something built on top of HTML Tidy, but I am rather
wondering whether this could be done using only the Python standard
library. My input tags cannot be crossed (I mean
"<a> w1 <b> w2 </a> w3 </b>" is impossible in my input).

Actually, I have produced some data like this:

Some input (line number / content):

0	<a>
1	<b>
2	<c>
3	w1
4	w2
5	</a>
6	w3
7	<d>
8	w4
9	</b>
10	</d>
11	</c>

where in fact I should have:

0	<b>
1	<c>
2	<a>
3	w1
4	w2
5	</a>
6	w3
7	<d>
8	w4
9	</d>
10	</c>
11	</b>

I am wondering how I can repair that.

I have built a small script which already does that, but as I know there
are clever brains here, maybe I will get some better suggestions...

(I need to clean up/rewrite my code, but here is how it works: it first
finds the paired opening/closing tags, their widths and positions; then,
from the smallest to the largest, it encloses the previous text inside
the current tag and builds a text that will be the next one to be
enclosed, and so on.)
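
One possible way to do this with no third-party modules is sketched below. It is not the script described above, only a related idea: record which words each tag pair covers, then re-emit the tags so that wider spans enclose narrower ones. The function name `unnest` is hypothetical. The sketch assumes one token per line, tag names that are not reopened while already open, and that every tag covers at least one word. The tie-break (of two tags covering the same words, the one opened first becomes the outer one) is an assumption inferred from the sample above, where <b> ends up outside <c>.

```python
def unnest(tokens):
    """Re-emit tags so that they nest properly around the words they cover."""
    words = []    # plain words, in input order
    opened = {}   # tag name -> (index of first covered word, opening order)
    spans = []    # (start, end, order, name); the tag covers words[start:end]
    order = 0
    for token in tokens:
        token = token.strip()
        if token.startswith("</"):
            name = token[2:-1]
            start, opened_at = opened.pop(name)
            spans.append((start, len(words), opened_at, name))
        elif token.startswith("<"):
            opened[token[1:-1]] = (len(words), order)
            order += 1
        elif token:
            words.append(token)

    # Outer tags first: earlier start, then later end, then earlier opening.
    spans.sort(key=lambda span: (span[0], -span[1], span[2]))

    out = []
    stack = []    # currently open spans, innermost last
    nxt = 0       # index of the next span to open
    for pos in range(len(words) + 1):
        # Close every open tag whose span ends before this word.
        while stack and stack[-1][1] <= pos:
            out.append(f"</{stack.pop()[3]}>")
        # Open every tag whose span starts at this word.
        while nxt < len(spans) and spans[nxt][0] == pos:
            out.append(f"<{spans[nxt][3]}>")
            stack.append(spans[nxt])
            nxt += 1
        if pos < len(words):
            out.append(words[pos])
    return out


source = ["<a>", "<b>", "<c>", "w1", "w2", "</a>",
          "w3", "<d>", "w4", "</b>", "</d>", "</c>"]
for number, token in enumerate(unnest(source)):
    print(f"{number}\t{token}")
```

On the sample input from this post, the sketch yields the tokens in the order given as the desired output. Empty tag pairs and repeated tag names would need extra handling.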

