Showing content from https://github.com/spreadsheetlab/XLParser/issues/161 below:

Add prefixes to regex terminals · Issue #161 · spreadsheetlab/XLParser · GitHub

I am working on replacing the handcrafted formula parser in ClosedXml (a library for manipulating xlsx files) with XLParser. Because a single xlsx file can contain hundreds of thousands of formulas, I would like to improve XLParser's performance.

I would like to add prefixes to the regex-based terminals (Irony's RegexBasedTerminal).

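Irony uses the first character of each terminal's prefixes to build a lookup table from a character to the terminals that can start with it (terminals without a prefix are always considered). The scanner then uses this table to narrow down which terminals it has to try at the current input position, which speeds up tokenization.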
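To illustrate the idea, here is a minimal Python sketch of a first-character lookup table. It is not Irony's actual code: the terminal names, patterns, and the helper functions `build_table`, `candidates`, and `next_token` are made up for the example, and it simply returns the first matching terminal rather than applying Irony's priority rules.

```python
import re
from collections import defaultdict

# Hypothetical terminals: (name, regex pattern, prefixes).
# An empty prefix list means the terminal may start with any character.
TERMINALS = [
    ("ERROR", r"#(REF|NAME|VALUE)[!?]", ["#"]),
    ("STRING", r'"([^"]|"")*"', ['"']),
    ("NUMBER", r"[0-9]+(\.[0-9]+)?", list("0123456789")),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_.]*", []),  # no prefix: always a candidate
]


def build_table(terminals):
    """Map the first character of each prefix to the terminals that can start with it."""
    compiled = [(name, re.compile(pattern), prefixes)
                for name, pattern, prefixes in terminals]
    fallback = [t for t in compiled if not t[2]]
    table = defaultdict(list)
    for terminal in compiled:
        for prefix in terminal[2]:
            table[prefix[0]].append(terminal)
    return table, fallback


def candidates(table, fallback, ch):
    """Terminals worth trying for input character ch."""
    return table.get(ch, []) + fallback


def next_token(table, fallback, text, pos):
    """Try only the candidate terminals at pos; return (name, lexeme) or None."""
    for name, regex, _ in candidates(table, fallback, text[pos]):
        match = regex.match(text, pos)
        if match:
            return name, match.group()
    return None


TABLE, FALLBACK = build_table(TERMINALS)
print(next_token(TABLE, FALLBACK, "#REF!+1", 0))
print([name for name, _, _ in candidates(TABLE, FALLBACK, "A")])
```

In this sketch, only the NAME regex is tried for input starting with `A`, because it is the only terminal without a prefix. If no terminal had prefixes, every regex would have to be tried at every position.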

I would also like to make the grammar case sensitive, which gives a small but measurable improvement. The terminals already match both cases where necessary (e.g. a-zA-Z), so this should not change what is accepted.

I have also tried changing the regex options of the terminals (through reflection): RegexOptions.ExplicitCapture (as recommended in the regex best practices), RegexOptions.Compiled, and RegexOptions.CultureInvariant. None of them gave a significant improvement.

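I have run a benchmark on EnronFormulasParseTest (the test was modified to be single threaded). The parser version with prefixes takes about 44% less time on the Enron data set (26.5 s instead of 47.3 s) and about 40% less time on EusesFormulasParseTest.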

BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22000.856/21H2)
AMD Ryzen 5 5500U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-JFUIAS : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2

IterationCount=3  LaunchCount=1  WarmupCount=1

With prefixes

| Method                 |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| EnronDataSet           | 26.496 s | 3.6721 s | 0.2013 s |
| EusesFormulasParseTest |  2.852 s | 0.0582 s | 0.0032 s |

Without prefixes

| Method                 |     Mean |    Error |   StdDev |
|----------------------- |---------:|---------:|---------:|
| EnronDataSet           | 47.295 s | 2.5500 s | 0.1398 s |
| EusesFormulasParseTest |  4.738 s | 0.3636 s | 0.0199 s |
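For reference, the relative improvement can be derived from the mean times above. The short Python sketch below only does that arithmetic; the helper names `time_reduction` and `speedup` are made up for the example. Note that with only three iterations per benchmark, the error margins are fairly wide, so the percentages are approximate.

```python
def time_reduction(before_s, after_s):
    """Fraction of run time saved, e.g. 0.44 means 44% less time."""
    return 1.0 - after_s / before_s


def speedup(before_s, after_s):
    """How many times faster the new version runs."""
    return before_s / after_s


# Mean times in seconds, copied from the two result tables: (without, with prefixes).
RESULTS = {
    "EnronDataSet": (47.295, 26.496),
    "EusesFormulasParseTest": (4.738, 2.852),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: {time_reduction(before, after):.1%} less time, "
          f"{speedup(before, after):.2f}x speedup")
```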
