A high-performance C++ regex library and lexical analyzer generator with Unicode support.
Two example use cases:
The RE/flex lexical analyzer generator extends Flex++ with Unicode support, indent/dedent anchors, POSIX regex lazy quantifiers, word boundaries, functions for lex and syntax error reporting, lexer rule execution performance profiling, and other new features.
Only RE/flex supports backtrack-free regex lazy matching in linear time using an advanced DFA transformation algorithm (invented by Dr. Robert van Engelen).
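To see what a lazy quantifier changes in a match, here is a minimal sketch that compares greedy .* with lazy .*? on the same input. It uses C++11 std::regex, which is a backtracking engine, purely as a stand-in for the matching semantics; it does not show the RE/flex DFA algorithm or its linear-time behavior. The helper name first_match is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the text of the first match of `pattern`
// in `input`, or an empty string when there is no match.
// std::regex is used only as a stand-in; it is not the RE/flex engine.
std::string first_match(const std::string& input, const std::string& pattern) {
  const std::regex re(pattern);
  std::smatch m;
  return std::regex_search(input, m, re) ? m.str(0) : std::string();
}
```

With the input <b>bold</b>, the greedy pattern <.*> extends to the last > in the input, whereas the lazy pattern <.*?> stops at the first one.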
RE/flex is faster than Flex and much faster than regex libraries such as Boost.Regex, C++11 std::regex, PCRE2 and RE2. For example, tokenizing a 2 KB representative C source code file into 244 tokens takes only 8.7 microseconds:
| Command / Function | Software | Time (μs) |
|---|---|---|
| reflex --fast --noindent | RE/flex 3.4.1 | 8.7 |
| reflex --fast | RE/flex 3.4.1 | 8.9 |
| flex -+ --full | Flex 2.5.35 | 9.8 |
| boost::spirit::lex::lexertl::actor_lexer::iterator_type | Boost.Spirit.Lex 1.82.0 | 10.7 |
| reflex --full | RE/flex 3.4.1 | 20.6 |
| pcre2_jit_match() | PCRE2 (jit) 10.42 | 60.8 |
| hs_compile_multi(), hs_scan() | Hyperscan 5.4.2 | 129 |
| reflex -m=boost-perl | Boost.Regex 1.82.0 | 205 |
| RE2::Consume() | RE2 (pre-compiled) 2023-09-01 | 218 |
| reflex -m=boost | Boost.Regex POSIX 1.82.0 | 392 |
| pcre2_match() | PCRE2 10.42 | 500 |
| RE2::Consume() | RE2 POSIX (pre-compiled) 2023-09-01 | 534 |
| flex -+ | Flex 2.5.35 | 3759 |
| pcre2_dfa_match() | PCRE2 POSIX (dfa) 10.42 | 4029 |
| regcomp(), regexec() | GNU C POSIX.2 regex | 4932 |
| std::cregex_iterator() | C++11 std::regex | 6490 |

Note: performance in elapsed time (lower is better) in microseconds for 1000 to 10000 benchmark runs using Mac OS X 12.6.9 with clang 12.0.0 -O2, 2.9 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3. Hyperscan is disqualified as a scanner due to its "All matches reported" semantics, which results in 1915 matches for this test, and due to its event handler requirements.
The performance table is indicative of the impact on performance when using PCRE2 and Boost.Regex with RE/flex. PCRE2 and Boost.Regex are optional libraries integrated with RE/flex for Perl matching because of their efficiency. By default, RE/flex uses DFA-based extended regular expression matching in linear time, the fastest method (as shown in the table).
The RE/flex matcher tracks line numbers, column numbers, and indentations, whereas Lex and Flex do not (option noyylineno) and neither do the other regex matchers in the table (except PCRE2 and Boost.Regex when used with RE/flex). Tracking this information incurs some overhead. RE/flex also automatically decodes UTF-8/16/32 input and accepts std::istream, strings, and wide strings as input.
%skeleton "lalr1.cc" and Bison Complete Symbols.
\p{C}, including Unicode identifier matching for C++11, Java, C#, and Python source code.
%class and %init to customize the generated Lexer classes.
%include to modularize lexer specifications.
yypush_buffer_state saves the scanner state (line, column, and indentation positions), not just the input buffer;
no input buffer length limit (Flex has a 16KB limit);
line() returns the current line (e.g. for error reporting).
Note: PCRE2 and Boost.Regex are not dependencies; they can be used as optional regex engines in addition to the RE/flex regex engine.
Use reflex/bin/reflex.exe from the command line or add a Custom Build Step in MSVC++ as follows:
1. select the project name in Solution Explorer then Property Pages from the Project menu (see also custom-build steps in Visual Studio);
2. add an extra path to the reflex/include folder in the Include Directories under VC++ Directories, which should look like $(VC_IncludePath);$(WindowsSDK_IncludePath);C:\Users\YourUserName\Documents\reflex\include (this assumes the reflex source package is in your Documents folder);
3. enter "C:\Users\YourUserName\Documents\reflex\bin\win32\reflex.exe" --header-file "C:\Users\YourUserName\Documents\mylexer.l" in the Command Line property under Custom Build Step (this assumes mylexer.l is in your Documents folder);
4. enter lex.yy.h lex.yy.cpp in the Outputs property;
5. specify Execute Before as PreBuildEvent.
If you are using specific reflex options such as --flex then add these in step 3.
Before compiling your program with MSVC++, drag the folders reflex/lib and reflex/unicode to the Source Files in the Solution Explorer panel of your project. Next, run reflex.exe simply by compiling your project (which may fail, but that is OK for now as long as the custom build step ran reflex.exe). Drag the generated lex.yy.h (if present) and lex.yy.cpp files to the Source Files. Now you are all set!
In addition, the reflex/vs directory contains batch scripts to build projects with MS Visual Studio C++.
On macOS systems you can use Homebrew to install RE/flex with brew install re-flex. Or use MacPorts to install RE/flex with sudo port install re-flex.
On NetBSD systems you can use the standard NetBSD package installer (pkgsrc): http://cdn.netbsd.org/pub/pkgsrc/current/pkgsrc/devel/RE-flex/README.html
First clone the code:
$ git clone https://github.com/Genivia/RE-flex
Then simply do a quick clean build, assuming your environment is pretty much standard:
$ ./clean.sh
$ ./build.sh
This compiles the reflex tool and installs it locally in reflex/bin. For local use of RE/flex in your project, you can add this location to your $PATH variable to enable the new reflex command:
$ export PATH=$PATH:/your_path_to_reflex/reflex/bin
Note that the libreflex.a and libreflex.so libraries are saved locally in reflex/lib. Link against the library when you use the RE/flex regex engine in your code, such as:
$ c++ <options and .o/.cpp files> -L/your_path_to_reflex/reflex/lib -lreflex
or you could statically link libreflex.a with:
$ c++ <options and .o/.cpp files> /your_path_to_reflex/reflex/lib/libreflex.a
Also note that the RE/flex header files that you will need to include in your project are located in include/reflex.
To install the man page, the header files in /usr/local/include/reflex, the library in /usr/local/lib and the reflex command in /usr/local/bin:
The configure script accepts configuration and installation options. To view these options, run:
Run configure and make:
To build the examples also:
$ ./configure --enable-examples && make
After this successfully completes, you can optionally run make install to install the reflex command and the libreflex library:
Unfortunately, cloning from Git does not preserve timestamps which means that you may run into "WARNING: 'aclocal-1.15' is missing on your system." To work around this problem, run:
$ autoreconf -fi
$ ./configure && make
The above builds the library with SSE/AVX optimizations applied. To disable AVX optimizations:
$ ./configure --disable-avx && make
To disable both SSE2 and AVX optimizations:
$ ./configure --disable-sse2 && make
Optional libraries to install
To use PCRE2 as a regex engine with the RE/flex library and scanner generator, install PCRE2 and link your code with -lpcre2-8.
To use Boost.Regex as a regex engine with the RE/flex library and scanner generator, install Boost and link your code with -lboost_regex or -lboost_regex-mt.
To visualize the FSM graphs generated with reflex option --graphs-file, install Graphviz dot.
Copy the lex.vim file to ~/.vim/syntax/ to enjoy improved syntax highlighting for both Flex and RE/flex.
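There are two ways you can use this project: as a scanner generator, by running the reflex tool on a lexer specification, and as a regex library, through the RE/flex matcher API classes.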
For the first use case, use the reflex tool on the command line on a lexer specification:
$ reflex --flex --bison --graphs-file lexspec.l
This generates a scanner for Bison from the lexer specification lexspec.l and saves the finite state machine (FSM) as a Graphviz .gv file that can be visualized with the Graphviz dot tool:
$ dot -Tpdf reflex.INITIAL.gv > reflex.INITIAL.pdf
$ open reflex.INITIAL.pdf
Several examples are included to get you started. See the manual for more details.
For the second use case, use the RE/flex matcher API classes to start pattern search, matching, splitting and scanning on strings, wide strings, files, and streams.
You can select matchers that are based on different regex engines:
#include <reflex/matcher.h> and use reflex::Matcher;
#include <reflex/fuzzymatcher.h> and use reflex::FuzzyMatcher;
#include <reflex/pcre2matcher.h> and use reflex::PCRE2Matcher or reflex::PCRE2UTFMatcher;
#include <reflex/boostmatcher.h> and use reflex::BoostMatcher or reflex::BoostPosixMatcher;
#include <reflex/stdmatcher.h> and use reflex::StdMatcher or reflex::StdPosixMatcher.
Each matcher may differ in regex syntax features (see the full documentation), but they all share the same methods and iterators, such as:
matches() returns nonzero if the whole input from start to end matches the specified pattern;
find() searches the input and returns nonzero if a match was found, can be repeated;
scan() scans the input and returns nonzero if the input at the current position matches, can be repeated;
split() returns nonzero for a split of the input at the next match, can be repeated;
find.begin() ... find.end() a filter iterator, iterates with find();
scan.begin() ... scan.end() a tokenizer iterator, iterates with scan();
split.begin() ... split.end() a splitter iterator, iterates with split().
The input matched and searched may be a string, a wide string, a file, or a stream. Searching is incremental, meaning that the input is not buffered as a whole in memory, but rather buffered in parts in a sliding window of a few KB. The window size may grow to fit a pattern match. UTF-16/32 file input with a UTF BOM is automatically normalized and matched as UTF-8.
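The difference between matching the whole input, finding a match anywhere, and scanning at the current position can be sketched with C++11 std::regex alone. This is only an analogy under stated assumptions: the helper names whole_input_matches, find_from and scan_at are hypothetical, they operate on an in-memory std::string instead of a sliding window, and they are not how the RE/flex matchers are implemented.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Analogue of matches(): the whole input must match the pattern.
bool whole_input_matches(const std::string& input, const std::regex& re) {
  return std::regex_match(input, re);
}

// Analogue of find(): the pattern may match anywhere at or after `pos`.
// Returns the offset of the match, or std::string::npos if there is none.
std::size_t find_from(const std::string& input, std::size_t pos, const std::regex& re) {
  std::smatch m;
  if (pos > input.size() ||
      !std::regex_search(input.begin() + pos, input.end(), m, re))
    return std::string::npos;
  return pos + static_cast<std::size_t>(m.position(0));
}

// Analogue of scan(): the pattern must match exactly at `pos`.
// Returns the length of the match, or std::string::npos if there is none.
std::size_t scan_at(const std::string& input, std::size_t pos, const std::regex& re) {
  std::smatch m;
  if (pos > input.size() ||
      !std::regex_search(input.begin() + pos, input.end(), m, re,
                         std::regex_constants::match_continuous))
    return std::string::npos;
  return static_cast<std::size_t>(m.length(0));
}
```

The match_continuous flag is what turns a search into a scan here: it rejects any match that does not start at the given position, which is the behavior a tokenizer needs.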
For example, using Boost.Regex (alternatively use PCRE2 reflex::PCRE2Matcher or reflex::PCRE2UTFMatcher to match Unicode UTF-8 input):
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <iostream>              // std::cout
// use a BoostMatcher to check if the birthdate string is a valid date
if (reflex::BoostMatcher("\\d{4}-\\d{2}-\\d{2}", birthdate).matches() != 0)
  std::cout << "Valid date!" << std::endl;
With a group capture to fetch the year:
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <iostream>              // std::cout
#include <string>                // std::string
// use a BoostMatcher with a group capture to fetch the year of a valid birthdate
reflex::BoostMatcher matcher("(\\d{4})-\\d{2}-\\d{2}", birthdate);
if (matcher.matches() != 0)
  std::cout << std::string(matcher[1].first, matcher[1].second) << " was a good year!" << std::endl;
A pattern match made by any of the regex engines to match input with matches(), or search with find(), or tokenize with scan(), or split with split() includes detailed information that can be retrieved with the following methods:
accept() returns group capture index (or zero if not captured/matched)
text() returns const char* to 0-terminated match (ends in \0)
strview() returns std::string_view text match (preserves \0s) (C++17)
str() returns std::string text match (preserves \0s)
wstr() returns std::wstring wide text match (converted from UTF-8)
chr() returns first 8-bit char of the text match (str()[0] as int)
wchr() returns first wide char of the text match (wstr()[0] as int)
pair() returns std::pair<size_t,std::string>(accept(),str())
wpair() returns std::pair<size_t,std::wstring>(accept(),wstr())
size() returns the length of the text match in bytes
wsize() returns the length of the match in number of wide characters
lines() returns the number of lines in the text match (>=1)
columns() returns the number of columns of the text match (>=0)
begin() returns const char* to non-0-terminated text match begin
end() returns const char* to non-0-terminated text match end
rest() returns const char* to 0-terminated rest of input
span() returns const char* to 0-terminated match enlarged to span the line
line() returns std::string line with the matched text as a substring
wline() returns std::wstring line with the matched text as a substring
more() tells the matcher to append the next match (when using scan())
less(n) cuts text() to n bytes and repositions the matcher
lineno() returns line number of the match, starting at line 1
columno() returns column number of the match in characters, starting at 0
lineno_end() returns ending line number of the match, starting at line 1
columno_end() returns ending column number of the match, starting at 0
bol() returns const char* to begin of matching line (not 0-terminated)
border() returns the byte offset from the start of the line of the match
first() returns input position of the first character of the match
last() returns input position + 1 of the last character of the match
at_bol() true if matcher reached the begin of a new line \n
at_bob() true if matcher is at the begin of input and no input consumed
at_end() true if matcher is at the end of input
[0] operator returns std::pair<const char*,size_t>(begin(),size())
[n] operator returns n'th capture std::pair<const char*,size_t>
Note: POSIX matchers do not generally support group capturing, e.g. BoostPosixMatcher and StdPosixMatcher do not. RE/flex is an efficient backtrack-free DFA-based POSIX engine that supports a limited form of capturing, limited to outermost grouping, such as (abc)|(def) which has two groups. This may be extended in a future release to full capturing.
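To make the idea of an outermost group index concrete, here is a minimal sketch that reports which top-level group of a pattern such as (abc)|(def) took part in a match. It uses C++11 std::regex as a stand-in, and the helper name accepted_group is hypothetical; it only mimics the description of accept() given above and is not the RE/flex implementation.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: returns the 1-based index of the first capture group
// that participated in a match of the whole input, or 0 when the input does
// not match or no group captured anything.
std::size_t accepted_group(const std::string& input, const std::regex& re) {
  std::smatch m;
  if (!std::regex_match(input, m, re))
    return 0;
  for (std::size_t i = 1; i < m.size(); ++i)
    if (m[i].matched)
      return i;
  return 0;
}
```

A scanner can use such an index to tell which alternative, and therefore which token kind, matched the input.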
To search a string for words \w+ and display the column number of each word found:
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <iostream>              // std::cout
// use a BoostMatcher to search for words in a sentence
reflex::BoostMatcher matcher("\\w+", "How now brown cow.");
while (matcher.find() != 0)
  std::cout << "Found " << matcher.text() << " at column " << matcher.columno() << std::endl;
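For comparison, the same word search can be sketched with C++11 std::regex iterators, reporting the 0-based byte offset of each word. This is a stand-in under assumptions: the function name words_with_columns is hypothetical, and a byte offset equals a column number only for single-line ASCII input without tabs, whereas columno() as described above counts characters.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: collects each word in `line` together with its
// 0-based byte offset from the start of the line.
std::vector<std::pair<std::string, std::size_t>> words_with_columns(const std::string& line) {
  static const std::regex word("\\w+");
  std::vector<std::pair<std::string, std::size_t>> result;
  for (std::sregex_iterator it(line.begin(), line.end(), word), end; it != end; ++it)
    result.emplace_back(it->str(), static_cast<std::size_t>(it->position()));
  return result;
}
```

For the sentence "How now brown cow." this yields the four words at offsets 0, 4, 8 and 14.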
The split method is roughly the inverse of the find method and returns text located between matches. For example using non-word matching \W+:
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <iostream>              // std::cout
// use a BoostMatcher to split a sentence at runs of non-word characters
reflex::BoostMatcher matcher("\\W+", "How now brown cow.");
while (matcher.split())
  std::cout << "Found " << matcher.text() << std::endl;
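The "text between matches" idea can also be sketched with the C++11 standard library, using std::sregex_token_iterator with submatch index -1. This is only an illustration of splitting in general; the function name split_fields is hypothetical, and edge cases such as empty leading or trailing fields may be handled differently by the RE/flex split method.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the pieces of `text` that lie between
// matches of `separator` (submatch index -1 selects the unmatched parts).
std::vector<std::string> split_fields(const std::string& text, const std::regex& separator) {
  std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
  return std::vector<std::string>(it, end);
}
```

Splitting "How now brown cow." at \W+ gives the four fields How, now, brown and cow.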
To pattern match the content of a file, where the file may use UTF-8, 16, or 32 encodings that are automatically converted when a UTF BOM is present:
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <cstdio>                // fopen, fclose
#include <cstdlib>               // exit, EXIT_FAILURE
#include <iostream>              // std::cout
// use a BoostMatcher to search and display words from a FILE
FILE *fd = fopen("somefile.txt", "r");
if (fd == NULL)
  exit(EXIT_FAILURE);
reflex::BoostMatcher matcher("\\w+", fd);
while (matcher.find())
  std::cout << "Found " << matcher.text() << std::endl;
fclose(fd);
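How a UTF BOM can identify the encoding of a file is sketched below. The byte sequences are the standard Unicode byte order marks; the Encoding enum and the detect_bom function are hypothetical names for illustration and do not reflect how reflex::Input is implemented.

```cpp
#include <cstddef>
#include <string>

enum class Encoding { Unknown, UTF8, UTF16BE, UTF16LE, UTF32BE, UTF32LE };

// Hypothetical helper: inspects the first bytes of `bytes` for a Unicode BOM.
// The 4-byte UTF-32 marks are tested first, because the UTF-32LE mark
// (FF FE 00 00) begins with the UTF-16LE mark (FF FE).
Encoding detect_bom(const std::string& bytes) {
  auto at = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
  const std::size_t n = bytes.size();
  if (n >= 4 && at(0) == 0x00 && at(1) == 0x00 && at(2) == 0xFE && at(3) == 0xFF)
    return Encoding::UTF32BE;
  if (n >= 4 && at(0) == 0xFF && at(1) == 0xFE && at(2) == 0x00 && at(3) == 0x00)
    return Encoding::UTF32LE;
  if (n >= 3 && at(0) == 0xEF && at(1) == 0xBB && at(2) == 0xBF)
    return Encoding::UTF8;
  if (n >= 2 && at(0) == 0xFE && at(1) == 0xFF)
    return Encoding::UTF16BE;
  if (n >= 2 && at(0) == 0xFF && at(1) == 0xFE)
    return Encoding::UTF16LE;
  return Encoding::Unknown;  // no BOM: the caller must assume an encoding
}
```

Input without a BOM returns Unknown in this sketch, which matches the statement above that conversion is automatic only when a UTF BOM is present.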
Same again, but this time with a C++ input stream:
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <fstream>               // std::ifstream
#include <iostream>              // std::cout
// use a BoostMatcher to search and display words from a stream
std::ifstream file("somefile.txt", std::ifstream::in);
reflex::BoostMatcher matcher("\\w+", file);
while (matcher.find())
  std::cout << "Found " << matcher.text() << std::endl;
file.close();
Stuffing the search results into a container using RE/flex iterators:
#include <reflex/boostmatcher.h> // reflex::BoostMatcher, reflex::Input, boost::regex
#include <vector> // std::vector
// use a BoostMatcher to convert words of a sentence into a string vector
reflex::BoostMatcher matcher("\\w+", "How now brown cow.");
std::vector<std::string> words(matcher.find.begin(), matcher.find.end());
Use C++11 range-based loops with RE/flex iterators:
#include <reflex/pcre2matcher.h> // reflex::PCRE2UTFMatcher, reflex::Input
#include <iostream>              // std::cout
// use a PCRE2UTFMatcher to search for words in a sentence
reflex::PCRE2UTFMatcher matcher("\\w+", "How now brown cow.");
for (auto& match : matcher.find)
  std::cout << "Found " << match.text() << std::endl;
Note that we cannot generally simplify this loop to the following, because the temporary matcher object is destroyed (some compilers handle this in C++23):
for (auto& match : reflex::PCRE2UTFMatcher("\\w+", "How now brown cow.").find)
  std::cout << "Found " << match.text() << std::endl;
RE/flex also allows you to convert expressive regex syntax forms such as \p Unicode classes, character class set operations such as [a-z--[aeiou]], escapes such as \X, and (?x) mode modifiers, to a regex string that the underlying regex library understands and will be able to use:
std::string reflex::Matcher::convert(const std::string& regex, reflex::convert_flag_type flags)
std::string reflex::PCRE2Matcher::convert(const std::string& regex, reflex::convert_flag_type flags)
std::string reflex::PCRE2UTFMatcher::convert(const std::string& regex, reflex::convert_flag_type flags)
std::string reflex::BoostMatcher::convert(const std::string& regex, reflex::convert_flag_type flags)
std::string reflex::StdMatcher::convert(const std::string& regex, reflex::convert_flag_type flags)
For example:
#include <reflex/matcher.h> // reflex::Matcher, reflex::Input, reflex::Pattern
#include <iostream>         // std::cout
// use a Matcher to check if sentence is in Greek:
static const reflex::Pattern pattern(reflex::Matcher::convert("[\\p{Greek}\\p{Zs}\\pP]+", reflex::convert_flag::unicode));
if (reflex::Matcher(pattern, sentence).matches() != 0)
  std::cout << "This is Greek" << std::endl;
We use convert with optional flag reflex::convert_flag::unicode to make . (dot), \w, \s and so on match Unicode and to convert \p Unicode character classes.
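What a character class set operation such as [a-z--[aeiou]] stands for can be sketched with a bitset of ASCII code points. This only illustrates the set arithmetic behind such an expansion; the CharSet alias and the helper names char_range, char_list and members are hypothetical, the sketch assumes ASCII input only, and it is not the RE/flex converter.

```cpp
#include <bitset>
#include <cstddef>
#include <string>

using CharSet = std::bitset<128>;  // one bit per ASCII code point (assumption)

// Set of all characters from `lo` to `hi` inclusive, like [lo-hi].
CharSet char_range(char lo, char hi) {
  CharSet set;
  for (int c = lo; c <= hi; ++c)
    set.set(static_cast<std::size_t>(c));
  return set;
}

// Set of the ASCII characters listed in `list`, like [list].
CharSet char_list(const std::string& list) {
  CharSet set;
  for (unsigned char c : list)
    if (c < 128)
      set.set(c);
  return set;
}

// Members of the set in ascending code point order.
std::string members(const CharSet& set) {
  std::string out;
  for (std::size_t c = 0; c < set.size(); ++c)
    if (set.test(c))
      out += static_cast<char>(c);
  return out;
}
```

Subtraction is then `char_range('a', 'z') & ~char_list("aeiou")`, which leaves the 21 lowercase consonants; union and intersection map to `|` and `&` in the same way.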
Conversion is fast (it runs in linear time in the size of the regex), but it is not without some overhead. Making converted regex patterns static as shown above means the conversion is paid for only once, however many matches follow.
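The same "construct once, match many times" idea applies to any regex library. A minimal sketch with C++11 std::regex, where the function name is_iso_date is hypothetical:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: checks for a YYYY-MM-DD shaped string.
bool is_iso_date(const std::string& text) {
  // Function-local static: the pattern is compiled on the first call only
  // (initialization is thread-safe since C++11) and reused afterwards.
  static const std::regex pattern("\\d{4}-\\d{2}-\\d{2}");
  return std::regex_match(text, pattern);
}
```

Every call after the first skips pattern construction, which is the saving the static reflex::Pattern in the example above is after.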
Please see CONTRIBUTING.
Where do I find the documentation?
Read more about RE/flex in the manual.
RE/flex by Robert van Engelen, Genivia Inc. Copyright (c) 2016-2025, All rights reserved.
RE/flex is distributed under the BSD-3 license LICENSE.txt. Use, modification, and distribution are subject to the BSD-3 license.
Visit GitHub to report bugs: https://github.com/Genivia/RE-flex
%import with %include, adds freespace option -x, fixes minor issues
--regexp-file, Python tokenizer
--full and --fast, generates scanner with FSM table or a fast scanner with FSM code, respectively
#include <reflex/xyz.h> from now on, fixed errno_t portability issue
-v shows stats with execution timings, bug fixes
wtext(), wpair(), winput() methods, other improvements
--unicode regex conversion, also with (?u:), changed wtext() to wstr() and added a str() method
-p (--perf-report) for performance debugging, added doc/man/reflex.1 man page, added interactive readline example
-m, lexer.in(i) now resets the lexer, fixed reassigning the same input to the lexer that caused UTF BOM to be read twice
-P to support multiple lexer classes in one application, added configure installation script, optional quick install with allinstall.sh (renamed from install.sh)
matches(), find(), scan(), split() that return nonzero for a match, other minor improvements
# in free space mode
--fast FSM not always halting on EOF after a mismatch; fixed buffer realloc, added new examples/csv.l
wstr() always returning UTF-16 strings (should be UTF-16 only when std::wstring requires it)
yy_scan_string(), yy_scan_bytes() and yy_scan_buffer() functions now create a new buffer as in Flex, delete this buffer with yy_delete_buffer(); fixed examples to work with newer Bison versions (Bison 3.0.4)
yy_scan_wstring and yy_scan_wbuffer for wide string scanning with Flex-like functions
%option namespace=NAME1.NAME2.NAME3 ...
--namespace and %option namespace
--namespace for options --fast and --full to support the generation of multiple optimized lexers placed in namespaces.
--bison-cc option to generate scanners for Bison 3.0 %skeleton "lalr1.cc" C++ parsers, included two examples flexexample9xx and reflexexample9xx to demo this feature.
--bison-cc-namespace and --bison-cc-parser options to customize Bison 3.0 %skeleton "lalr1.cc" C++ parsers.
columno() to take tab spacing into account.
reflex::Input.
configure and make install header files, updated --bison-locations option.
./configure --enable-examples.
--bison-complete option, new ugrep utility example, updated manual, fixes minor issues.
reflex::convert_flag::basic to convert BRE syntax to ERE syntax, used by ugrep.
reflex::Input::get() to return positive character code, matcher option "N" for scan and find matches empty input (^$).
--full, updated documentation and other improvements.
reflex::StdMatcher (std::regex) causing failures to match input with split.
reflex::Input::in(const char *memptr, size_t memlen) to read a memory segment (for scanning etc.), added reflex::Input::streambuf class to use a reflex::Input object as a std::streambuf, improved yy_scan_buffer and yy_scan_bytes.
reflex::Input::dos_streambuf to convert DOS CRLF to LF, other improvements.
(?-imsux) to reflex::convert and reflex::Pattern.
AbstractMatcher::set_bob(), moved AbstractMatcher::peek() to public, minor improvements.
<^...>, added undent \k anchor to undo indenting changes ("undenting") with an example in examples/indent2.l, improved indent \i and dedent \j anchors and other improvements.
matcher().tabs(n) to set tab size, used by columno() and indent \i and dedent \j anchors, new reflex::Pattern methods, other improvements.
||, intersection &&, and subtraction -- operations, e.g. [||{letter}||{digit}] expands into [a-zA-Z0-9] when letter is defined as [a-zA-Z] and digit is defined as [0-9], see Character Classes in the documentation.
reflex::BoostMatcher (and derived reflex::BoostPosixMatcher, reflex::BoostPerlMatcher) regression bug that crept into the 1.2.4 update.
buffer(base, size) methods and improved Flex-compatible yy_scan_buffer(base, size), these functions scan memory fast with zero copy overhead; added mmap.l example to scan an mmap-ed file fast with mmap(2) and buffer(base, size); other improvements.
reflex::BufferedInput::dos_streambuf to improve dos_streambuf speed by buffering (reflex::Input::dos_streambuf is unbuffered), fixed %option token-type to apply without restrictions.
lineno() and columno() to increase speed, which is essential for large buffers such as large mmap-ed files scanned with buffer(base, size); other improvements.
-S (--find) for efficient searching instead of scanning input (i.e. efficiently ignoring unmatched input) demonstrated with new findfast and findsearch examples; changed --nodefault to throw an exception when option --flex is not used and when the default rule is triggered.
lineno() caching issue (1.3.8 bug); faster find.
find for patterns beginning with optional repetitions such as .*.
reflex::AbstractMatcher::clone() to clone a referenced concrete matcher object.
--flex for Flex compatibility; fixed option --token-type with option --flex, now properly defines YY_NULL and yyterminate; fixed AbstractMatcher::buffer(n) for large n; faster find.
border(), span(), line(), wline(), and skip(c) methods; added new section on error reporting and recovery to the documentation; fixed yy_scan_string() and yy_scan_buffer() when called before calling yylex() for the first time; improved performance.
dos.l demo example of reflex::InputBuffer::dos_streambuf.
lineno_end() and columno_end() methods, updated columns() with clarifications in the updated documentation; expanded the documentation with additional error reporting and handling techniques with RE/flex and Bison bridge and complete configurations; FSM code generation improvements.
flexexample11xx example with Flex specification and Bison complete parser; minor improvements.
skip(c) methods with a wchar_t wide character parameter and a UTF-8 string parameter to skip input; added new option --token-eof.
--noindent to speed up pattern matching and lexical analysis by disabling indentation tracking in the input (also disables anchors \i, \j, and \k); speed improvements.
line() and span().
wunput() method; added lex.vim improved Flex and RE/flex Vim syntax highlighting; added yaml.l example; fixed --freespace with --unicode when bracket lists contain a #; character class operators {+}, {-}, {&} now accept defined names as first operands and inverted character classes; indent anchor \k now matches only when indent level is changed as documented.
--matcher=pcre2-perl; optimized RE/flex matcher find() with AVX/SSE2/NEON/AArch64; updated and improved regex converters.
std::string in generated scanners.
reflex::PCRE2Matcher; fixed MSVC++ x86 32-bit build error when HAVE_AVX512BW is enabled (requires AVX512BW).
{; updated lex.vim.
--yy to enable --flex and --bison, but also defines the global FILE* variables yyin and yyout for enhanced Lex/Flex compatibility (yyin is otherwise a pointer to the reflex::Input object to read files, streams, and strings).
} as closing marker for %top{, %class{, and %init{ code blocks, i.e. %} or } may be used as closing markers.
minic using RE/flex scanner with Bison 3.2 C++ complete locations, compiles C-like source code to Java bytecode (class files); added fast fuzzy (approximate) regex matcher reflex::FuzzyMatcher derived from reflex::Matcher.
%option params to extend lex()/yylex() parameters; updated AVX2 detection for SIMD optimizations.
--bison-bridge option; updated examples.
IN_HEADER to yyIN_HEADER when --flex is used with --header-file; added reflex::Input::Handler event handler for custom handling of FILE* errors and non-blocking FILE* streams.
lineno(n) to set or change the line number to n; added yyset_lineno(n,s) to flexlexer.h; updated Mini C compiler example.
reflex::Input copy constructor; minor improvements.
\d to match Unicode when option unicode is enabled.
simd_avx2.cpp and simd_avx512bw.cpp to support runtime CPU ID checking when the library is built with ./configure; make, disable AVX with ./configure --disable-avx, disable SSE2 with ./configure --disable-sse2; UTF-16LE BOM detection correction.
simd.h after installation, added REFLEX_BUFFER_SIZE to customize the initial size and growth of the input buffer.
--prefix to the generated REFLEX_code_[PREFIX]STATE code.
--params when used with --flex.
columno() for long lines; fix CP-1251 table typo.
%option ctorinit; faster compilation of regular expressions to tables and direct code DFAs; refactored SIMD source code to enable AVX2 and AVX512BW optimizations in multi-version matcher code; updated Windows binary file opening.
%begin directive; new --batch=SIZE option argument.
xy and x/x patterns collided when they should not; updated yaml parser example.
--prefix is specified.
yyrestart dropping the first character; faster Matcher::find().
Matcher::find() initialization issue in 3.3.3.
Matcher::find(); improved --stdout to include tables.
Matcher::find().
FuzzyMatcher::DEL flag when this is the only flag selected; fix FuzzyMatcher::matches() bug that incorrectly matched an extra character before the end of the input; optimize find(); updated saving the FSM pred[] hashes to a file, which has changed; increase default buffer size REFLEX_BUFSZ to 128K for best throughput performance.
\b, \B, \< and \> applicable anywhere in a pattern.
. (dot) with %unicode enabled, which is a catch-all pattern; update \X to match only valid Unicode characters.
Matcher::find() with a new DFA cut algorithm to optimize match prediction speed and accuracy, see also ugrep 5.0; apply Unicode pattern canonicalization with reflex::convert(..., reflex::convert_flag::unicode).
rawk example to demonstrate awk-like fast search in C++; enable <<EOF>> rules for option find to generate a fast search engine.
reflex.pc (and reflexmin.pc minimized library) to use the reflex library -lreflex.
Matcher::find() with refactored SIMD (SSE2/AVX2/AVX512BW/NEON/AArch64) code; larger default 256KB buffer (from 128KB).
reflex::Matcher and reflex::FuzzyMatcher to respect Unicode word boundaries instead of only ASCII \<, \>, \b, \B; upgraded regex Unicode converters to Unicode [::] character classes instead of only ASCII [[:alpha:]] etc.; improved FSM code generation without local c0.
std::string_view strview() matcher method.
-Woverload-virtual and -Wshadow warnings; fix a bug in case-insensitive Unicode negated character class matching too much.
null_data to read NUL as LF and vice versa; supports reading xargs -0 output for example.
Matcher::find().
Matcher::find() speed improvements for certain regex patterns that do not match the input.
FILE* input, i.e. for fcntl O_NONBLOCK the regex matchers will wait for input to become available again instead of giving up with an error; changed reflex::Input::Handler see documentation; remove compiler warnings.
reflex option -D to immediately debug a scanner's lexer patterns against a specified input file; fix an issue with Matcher::find() for certain short patterns with ^$ anchors; mark likely and unlikely branches in hot paths for Matcher::find() performance.
Matcher::find() with new and expanded predict-match PM3+PM5 methods to replace PM4; updated FSM code generation to support the expanded prediction tables now stored in compressed form as a hex string when reflex option find is used with fast or full to generate a pre-compiled search engine.