I am working on replacing the handcrafted formula parser in ClosedXml (a library for manipulating xlsx files) with XLParser. Because an xlsx file can contain hundreds of thousands of formulas, I would like to improve the performance of XLParser.
I would like to add prefixes to the RegexBasedTerminals.
Irony uses the first character of each terminal's prefixes to build a table that maps a char to its possible terminals (terminals without a prefix are always considered). The scanner then uses this table to narrow down which terminals it has to try at the current position.
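A minimal Python sketch of that lookup, assuming a simplified scanner. It is not Irony's actual code: Terminal, build_lookup, candidates, scan_one and the sample terminals are hypothetical. Prefixed terminals are indexed by the first character of their prefixes. Unprefixed terminals go into a list that is consulted at every position, which is why regex terminals without prefixes are expensive.

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)
class Terminal:
    """Hypothetical stand-in for a regex-based scanner terminal (not Irony's class)."""
    name: str
    pattern: str
    prefixes: list = field(default_factory=list)

    def __post_init__(self):
        self.regex = re.compile(self.pattern)


def build_lookup(terminals):
    """Index terminals by the first character of each of their prefixes.

    Terminals without prefixes cannot be indexed, so they are collected
    separately and have to be tried at every position.
    """
    table, always = {}, []
    for t in terminals:
        if not t.prefixes:
            always.append(t)
            continue
        for ch in sorted({p[0] for p in t.prefixes}):
            table.setdefault(ch, []).append(t)
    return table, always


def candidates(table, always, ch):
    """Terminals worth trying when the scanner is looking at character `ch`."""
    return table.get(ch, []) + always


def scan_one(text, pos, table, always):
    """Return (name, lexeme) of the longest candidate match at `pos`, or None."""
    best = None
    for t in candidates(table, always, text[pos]):
        m = t.regex.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (t.name, m.group())
    return best


# Hypothetical terminals loosely modelled on formula tokens.
TERMINALS = [
    Terminal("QuotedSheet", r"'[^']+'!", prefixes=["'"]),
    Terminal("Number", r"[0-9]+(?:\.[0-9]+)?", prefixes=list("0123456789")),
    Terminal("Bool", r"TRUE|FALSE", prefixes=["TRUE", "FALSE"]),
    Terminal("Name", r"[A-Za-z_][A-Za-z0-9_]*"),  # no prefix: tried everywhere
]
TABLE, ALWAYS = build_lookup(TERMINALS)
```

With this table a digit only triggers Number plus the unprefixed Name, rather than every regex terminal in the grammar. Each terminal that gains a prefix drops out of the "always" list.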
I would also like to make the grammar case sensitive. This gives a small but measurable improvement, and the terminals already handle both cases where necessary (e.g. a-zA-Z).
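To illustrate why switching to case-sensitive matching should not change what such terminals accept, here is a small Python check. Python's re module stands in for .NET regexes here, and the identifier pattern is hypothetical. A terminal that spells out both cases accepts the same ASCII input whether or not case-insensitive matching is enabled.

```python
import re

# Hypothetical identifier-like terminal written for a case-insensitive grammar ...
INSENSITIVE = re.compile(r"[a-z_][a-z0-9_]*", re.IGNORECASE)
# ... and the same terminal spelling out both cases, matched case-sensitively.
SENSITIVE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def same_matches(a, b, inputs):
    """True if both patterns accept exactly the same strings among `inputs`."""
    return all(bool(a.fullmatch(s)) == bool(b.fullmatch(s)) for s in inputs)


# ASCII samples only: with re.IGNORECASE, Python's [a-z] also matches a few
# non-ASCII letters (e.g. the Kelvin sign U+212A), so the two patterns are not
# identical over all of Unicode.
SAMPLES = ["Sheet1", "SHEET1", "sheet_1", "1Sheet", "Total$", "_tmp"]
```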
I also tried changing the regex options of the terminals (through reflection): RegexOptions.ExplicitCapture (as recommended in the .NET regex best practices), RegexOptions.Compiled and RegexOptions.CultureInvariant. None of them gave a significant improvement.
I ran a benchmark based on EnronFormulasParseTest (modified to run single-threaded). With prefixes, the parser takes about 44% less time on the Enron data set (roughly 1.8x faster).
BenchmarkDotNet=v0.13.2, OS=Windows 11 (10.0.22000.856/21H2)
AMD Ryzen 5 5500U with Radeon Graphics, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.302
  [Host]     : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2
  Job-JFUIAS : .NET 6.0.7 (6.0.722.32202), X64 RyuJIT AVX2

IterationCount=3  LaunchCount=1  WarmupCount=1
With prefixes

| Method | Mean | Error | StdDev |
|---|---|---|---|
| EnronDataSet | 26.496 s | 3.6721 s | 0.2013 s |
| EusesFormulasParseTest | 2.852 s | 0.0582 s | 0.0032 s |

Without prefixes

| Method | Mean | Error | StdDev |
|---|---|---|---|
| EnronDataSet | 47.295 s | 2.5500 s | 0.1398 s |
| EusesFormulasParseTest | 4.738 s | 0.3636 s | 0.0199 s |
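A quick check of the headline number, using the mean times from the two result tables. The names RESULTS and compare are illustrative only.

```python
# Mean times in seconds from the BenchmarkDotNet results: (with prefixes, without prefixes).
RESULTS = {
    "EnronDataSet": (26.496, 47.295),
    "EusesFormulasParseTest": (2.852, 4.738),
}


def compare(with_prefixes, without_prefixes):
    """Return (fraction of time saved, speedup factor) for two mean run times."""
    saved = 1 - with_prefixes / without_prefixes
    speedup = without_prefixes / with_prefixes
    return saved, speedup


for name, (with_p, without_p) in RESULTS.items():
    saved, speedup = compare(with_p, without_p)
    print(f"{name}: {saved:.1%} less time, {speedup:.2f}x faster")
```

The Enron set comes out at about 44% less time (a speedup of roughly 1.78x). The EUSES set comes out at about 40% less time (roughly 1.66x).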