GetInnerText in the latest AngleSharp.Css version renders too many line breaks at the start and end of a paragraph
Steps to Reproduce
using AngleSharp;
using AngleSharp.Css;
using AngleSharp.Dom;
using System.Diagnostics;
var content = "<div class=\"entry-content entry-content-single\" itemprop=\"description\"><p><em><strong>[By the studio that brought you <Solo Leveling>, <Reaper of the Drifting Moon>, and many more!]</strong></em></p>\n<p>He was the hound of the Baskerville family: Vikir.</p>\n<p>Yet his loyalty was rewarded by the blade of a guillotine dirtied by slander.</p>\n<p>“I will never live the life of a hound slaughtered after the rabbit is caught.”</p>\n<p>In place of death, an unexpected opportunity awaits him.</p>\n<p>Vikir’s eyes glowed red as he sharpened his canines in the dark.</p>\n<p>“Just you wait, Hugo. I will rip out your throat this time.”</p>\n<p>It’s time for the hound to exact bloody revenge on his owner.</p>\n</div>";
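// Enable CSS support so that GetInnerText can take styling into account.
var context = BrowsingContext.New(Configuration.Default.WithCss());
var doc = await context.OpenAsync(req => req.Content(content));
// Same selector as used in the browser console comparison.
var description = doc.QuerySelector("div[itemprop=\"description\"]")?.GetInnerText().Trim();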
Console.WriteLine(description);
Console.ReadKey();
Expected behavior: Outputs
"[By the studio that brought you <Solo Leveling>, <Reaper of the Drifting Moon>, and many more!]\n\nHe was the hound of the Baskerville family: Vikir.\n\nYet his loyalty was rewarded by the blade of a guillotine dirtied by slander.\n\n“I will never live the life of a hound slaughtered after the rabbit is caught.”\n\nIn place of death, an unexpected opportunity awaits him.\n\nVikir’s eyes glowed red as he sharpened his canines in the dark.\n\n“Just you wait, Hugo. I will rip out your throat this time.”\n\nIt’s time for the hound to exact bloody revenge on his owner."
Pasting this code into the browser console produces the same output:
document.querySelector('div[itemprop=\"description\"]').innerText
Actual behavior: Outputs
"[By the studio that brought you <Solo Leveling>, <Reaper of the Drifting Moon>, and many more!]\n\n \n\nHe was the hound of the Baskerville family: Vikir.\n\n \n\nYet his loyalty was rewarded by the blade of a guillotine dirtied by slander.\n\n \n\n“I will never live the life of a hound slaughtered after the rabbit is caught.”\n\n \n\nIn place of death, an unexpected opportunity awaits him.\n\n \n\nVikir’s eyes glowed red as he sharpened his canines in the dark.\n\n \n\n“Just you wait, Hugo. I will rip out your throat this time.”\n\n \n\nIt’s time for the hound to exact bloody revenge on his owner."
Environment details: .NET 7
Possible Solution
Adjust the RequiredLineBreakCounts in ElementExtension.cs from 2 to 1 for the start and from 2 to 1 for the end.
Adding an IsEmpty check for text nodes fixed the issue: the extra output came from whitespace-only "\n" text nodes between the elements (nodes that are not visible in the front end anyway, so they should not be rendered).
The current behaviour adds 2 line breaks before the element and 2 line breaks after it, but paragraphs should render 2 line breaks in total: 1 at the start and 1 at the end.
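To make the interaction between the line break counts and the whitespace-only text nodes concrete, here is a minimal sketch of the collapsing step as I understand it from the HTML Standard's innerText algorithm: empty text items are dropped, runs of required line break counts at the start and end are removed, and each remaining run becomes as many "\n" characters as the largest count in the run. The `Collapse` function and the `object` item model are hypothetical and are not AngleSharp's actual types or code. Under this reading, a count of 2 at both ends of a paragraph still yields only two line breaks between adjacent paragraphs, and the " " that appears in the actual output comes from a whitespace-only text node that was not treated as empty.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical model (not AngleSharp's types): each item is either rendered
// text (string) or a required line break count (int).
static string Collapse(IEnumerable<object> items)
{
    var kept = new List<object>();
    foreach (var item in items)
    {
        // Drop text items that rendered as the empty string.
        if (item is string s && s.Length == 0) continue;
        kept.Add(item);
    }

    // Remove runs of line break counts at the start and at the end.
    while (kept.Count > 0 && kept[0] is int) kept.RemoveAt(0);
    while (kept.Count > 0 && kept[kept.Count - 1] is int) kept.RemoveAt(kept.Count - 1);

    var sb = new StringBuilder();
    var pending = 0; // largest count seen in the current run
    foreach (var item in kept)
    {
        if (item is int count)
        {
            pending = Math.Max(pending, count);
        }
        else
        {
            // A run collapses to its maximum, not to the sum of its counts.
            sb.Append('\n', pending);
            pending = 0;
            sb.Append((string)item);
        }
    }
    return sb.ToString();
}

// <p>A</p>\n<p>B</p> where the "\n" text node between the paragraphs renders as empty.
var correct = Collapse(new object[] { 2, "A", 2, "", 2, "B", 2 });
// Same markup, but the whitespace-only node is kept as " ".
var buggy = Collapse(new object[] { 2, "A", 2, " ", 2, "B", 2 });

Console.WriteLine(correct == "A\n\nB");      // True
Console.WriteLine(buggy == "A\n\n \n\nB");   // True
```

The second case reproduces the "\n\n \n\n" pattern from the actual output above, which is consistent with the IsEmpty check being the change that fixed it.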