pub struct Config { }
Available on crate feature meta only.
An object describing the configuration of a Regex.
This configuration only includes options for the non-syntax behavior of a Regex, and can be applied via the Builder::configure method. For configuring the syntax options, see util::syntax::Config.
In some cases, the default size limit might be too big. The size limit can be lowered, which will prevent large regex patterns from compiling.
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config().nfa_size_limit(Some(20 * (1<<10))))
.build(r"\pL");
assert!(result.is_err());
Create a new configuration object for a Regex.
Set the match semantics for a Regex.
The default value is MatchKind::LeftmostFirst.
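use regex_automata::{meta::Regex, Match, MatchKind};
// Leftmost-first semantics (the default) prefer the alternative that
// appears earlier in the pattern when several match at the same position.
let re = Regex::new("sam|samwise")?;
assert_eq!(Some(Match::must(0, 0..3)), re.find("samwise"));
// With 'all' semantics, match priority is ignored, so a leftmost search
// keeps going and reports the last possible match.
let re = Regex::builder()
    .configure(Regex::config().match_kind(MatchKind::All))
    .build("sam|samwise")?;
assert_eq!(Some(Match::must(0, 0..7)), re.find("samwise"));
// Beware: this can skip matches. The earlier 'sam' is passed over here.
assert_eq!(Some(Match::must(0, 4..11)), re.find("sam samwise"));
Ok::<(), Box<dyn std::error::Error>>(())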
Toggles whether empty matches are permitted to occur between the code units of a UTF-8 encoded codepoint.
This should generally be enabled when searching a &str or anything that you otherwise know is valid UTF-8. It should be disabled in all other cases. Namely, if the haystack is not valid UTF-8 and this is enabled, then behavior is unspecified.
By default, this is enabled.
§Example
use regex_automata::{meta::Regex, Match};
let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
assert_eq!(got, vec![
Match::must(0, 0..0),
Match::must(0, 3..3),
]);
let re = Regex::builder()
.configure(Regex::config().utf8_empty(false))
.build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
assert_eq!(got, vec![
Match::must(0, 0..0),
Match::must(0, 1..1),
Match::must(0, 2..2),
Match::must(0, 3..3),
]);
Ok::<(), Box<dyn std::error::Error>>(())
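The effect of this option can be approximated with the standard library alone. The following is a minimal sketch, not the crate's implementation: the helper name empty_match_offsets is hypothetical, and it assumes the haystack is valid UTF-8 (a &str). It lists every offset at which an empty pattern could match, optionally skipping offsets that fall inside an encoded codepoint.

```rust
/// Returns the offsets at which an empty pattern would match in `hay`.
///
/// When `utf8_empty` is true, offsets that would split a UTF-8 encoded
/// codepoint are skipped. This is a std-only illustration (hypothetical
/// helper), not code taken from regex-automata.
fn empty_match_offsets(hay: &str, utf8_empty: bool) -> Vec<usize> {
    (0..=hay.len())
        // `is_char_boundary` is true at 0, at `hay.len()`, and at the
        // first byte of every encoded codepoint.
        .filter(|&i| !utf8_empty || hay.is_char_boundary(i))
        .collect()
}
```

For the three-byte snowman used above, empty_match_offsets("☃", true) yields [0, 3] and empty_match_offsets("☃", false) yields [0, 1, 2, 3], which lines up with the offsets in the two assertions.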
Toggles whether automatic prefilter support is enabled.
If this is disabled and Config::prefilter is not set, then the meta regex engine will not use any prefilters. This can sometimes be beneficial in cases where you know (or have measured) that the prefilter leads to overall worse search performance.
By default, this is enabled.
§Example
use regex_automata::{meta::Regex, Match};
let re = Regex::builder()
.configure(Regex::config().auto_prefilter(false))
.build(r"Bruce \w+")?;
let hay = "Hello Bruce Springsteen!";
assert_eq!(Some(Match::must(0, 6..23)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
Overrides and sets the prefilter to use inside a Regex.
This permits one to forcefully set a prefilter in cases where the caller knows better than whatever the automatic prefilter logic is capable of.
By default, this is set to None and an automatic prefilter will be used if one could be built. (Assuming Config::auto_prefilter is enabled, which it is by default.)
This example shows how to set your own prefilter. In the case of a pattern like Bruce \w+, the automatic prefilter is likely to be constructed in a way that it will look for occurrences of Bruce. In most cases, this is the best choice. But in some cases, running memchr on B may be the best choice. One can achieve that behavior by overriding the automatic prefilter logic and providing a prefilter that just matches B.
use regex_automata::{
meta::Regex,
util::prefilter::Prefilter,
Match, MatchKind,
};
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["B"])
.expect("a prefilter");
let re = Regex::builder()
.configure(Regex::config().prefilter(Some(pre)))
.build(r"Bruce \w+")?;
let hay = "Hello Bruce Springsteen!";
assert_eq!(Some(Match::must(0, 6..23)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
§Example: incorrect prefilters can lead to incorrect results!
Be warned that setting an incorrect prefilter can lead to missed matches. So if you use this option, ensure your prefilter can never report false negatives. (A false positive is, on the other hand, quite okay and generally unavoidable.)
use regex_automata::{
meta::Regex,
util::prefilter::Prefilter,
Match, MatchKind,
};
let pre = Prefilter::new(MatchKind::LeftmostFirst, &["Z"])
.expect("a prefilter");
let re = Regex::builder()
.configure(Regex::config().prefilter(Some(pre)))
.build(r"Bruce \w+")?;
let hay = "Hello Bruce Springsteen!";
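// The prefilter never finds a "Z", so the real match is never reported.
assert_eq!(None, re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())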
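Why false positives are harmless while false negatives lose matches can be seen in a simplified model of a prefiltered search. This is only a sketch under stated assumptions: find_with_prefilter and match_bruce are hypothetical helpers, the prefilter is a single byte scan standing in for memchr, and the toy matcher only handles ASCII word bytes. It is not how the meta regex engine is implemented.

```rust
/// Finds the first match of a "real" matcher, using a single-byte
/// prefilter to skip ahead to candidate positions.
///
/// `confirm` stands in for the regex engine: given a candidate start
/// offset, it returns the end offset of a match starting there, if any.
fn find_with_prefilter(
    hay: &[u8],
    prefilter_byte: u8,
    confirm: impl Fn(&[u8], usize) -> Option<usize>,
) -> Option<(usize, usize)> {
    let mut at = 0;
    while at < hay.len() {
        // The prefilter proposes the next candidate. If it finds none,
        // the search stops, even when a real match exists later on.
        let start = at + hay[at..].iter().position(|&b| b == prefilter_byte)?;
        // False positives are harmless: the matcher simply rejects them.
        if let Some(end) = confirm(hay, start) {
            return Some((start, end));
        }
        at = start + 1;
    }
    None
}

/// A toy matcher for `Bruce \w+`, restricted to ASCII word bytes.
fn match_bruce(hay: &[u8], start: usize) -> Option<usize> {
    let rest = hay.get(start..)?;
    if !rest.starts_with(b"Bruce ") {
        return None;
    }
    let word_start = start + b"Bruce ".len();
    let word_len = hay[word_start..]
        .iter()
        .take_while(|b| b.is_ascii_alphanumeric() || **b == b'_')
        .count();
    if word_len == 0 { None } else { Some(word_start + word_len) }
}
```

With the haystack from the examples above, a prefilter byte of b'B' finds the span 6..23, while b'Z' produces no candidates and therefore no match. On "Bob and Bruce Wayne", the B in Bob is a false positive that the matcher rejects before the real match at 8..19 is found.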
Configures what kinds of groups are compiled as “capturing” in the underlying regex engine.
This is set to WhichCaptures::All by default. Callers may wish to use WhichCaptures::Implicit in cases where one wants to avoid the overhead of capture states for explicit groups.
Note that another approach to avoiding the overhead of capture groups is by using non-capturing groups in the regex pattern. That is, (?:a) instead of (a). This option is useful when you can’t control the concrete syntax but know that you don’t need the underlying capture states. For example, using WhichCaptures::Implicit will behave as if all explicit capturing groups in the pattern were non-capturing.
Setting this to WhichCaptures::None is usually not the right thing to do. When no capture states are compiled, some regex engines (such as the PikeVM) won’t be able to report match offsets. This will manifest as no match being found.
This example demonstrates how the results of capture groups can change based on this option. First we show the default (all capture groups in the pattern are capturing):
use regex_automata::{meta::Regex, Match, Span};
let re = Regex::new(r"foo([0-9]+)bar")?;
let hay = "foo123bar";
let mut caps = re.create_captures();
re.captures(hay, &mut caps);
assert_eq!(Some(Span::from(0..9)), caps.get_group(0));
assert_eq!(Some(Span::from(3..6)), caps.get_group(1));
Ok::<(), Box<dyn std::error::Error>>(())
And now we show the behavior when we only include implicit capture groups. In this case, we can only find the overall match span, but the spans of any other explicit group don’t exist because they are treated as non-capturing. (In effect, when WhichCaptures::Implicit is used, there is no real point in using Regex::captures since it will never be able to report more information than Regex::find.)
use regex_automata::{
meta::Regex,
nfa::thompson::WhichCaptures,
Match,
Span,
};
let re = Regex::builder()
.configure(Regex::config().which_captures(WhichCaptures::Implicit))
.build(r"foo([0-9]+)bar")?;
let hay = "foo123bar";
let mut caps = re.create_captures();
re.captures(hay, &mut caps);
assert_eq!(Some(Span::from(0..9)), caps.get_group(0));
assert_eq!(None, caps.get_group(1));
Ok::<(), Box<dyn std::error::Error>>(())
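One way to picture the three settings is to count how many capture offsets a single pattern would need to track. The sketch below is an illustration only: SketchWhichCaptures and capture_slots are hypothetical names that mirror the variants described above, and the model assumes two offsets (start and end) per compiled group, with group 0 being the implicit group for the overall match.

```rust
/// A stand-in for the three capture settings described above
/// (hypothetical type, not the crate's `WhichCaptures`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum SketchWhichCaptures {
    All,
    Implicit,
    None,
}

/// Number of offsets tracked for one pattern with `explicit_groups`
/// parenthesized capturing groups, assuming two offsets per group.
fn capture_slots(which: SketchWhichCaptures, explicit_groups: usize) -> usize {
    let groups = match which {
        // The implicit group 0 plus every explicit group.
        SketchWhichCaptures::All => 1 + explicit_groups,
        // Only the implicit group for the overall match.
        SketchWhichCaptures::Implicit => 1,
        // Nothing is tracked, so not even match offsets are available.
        SketchWhichCaptures::None => 0,
    };
    groups * 2
}
```

Under this model, foo([0-9]+)bar needs 4 offsets with All, 2 with Implicit, and 0 with None. The last case is why an engine that relies on capture states cannot report a match at all.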
Sets the size limit, in bytes, to enforce on the construction of every NFA built by the meta regex engine.
Setting it to None disables the limit. This is not recommended if you’re compiling untrusted patterns.
Note that this limit is applied to each NFA built, and if any of them exceed the limit, then construction will fail. This limit does not correspond to the total memory used by all NFAs in the meta regex engine.
This defaults to some reasonable number that permits most reasonable patterns.
§Example
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config().nfa_size_limit(Some(20 * (1<<10))))
.build(r"\pL");
assert!(result.is_err());
let result = Regex::builder()
.configure(Regex::config()
.nfa_size_limit(Some(20 * (1<<10)))
.hybrid(false)
.dfa(false)
)
.build(r"\pL");
assert!(result.is_ok());
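The point that the limit applies to each NFA separately, rather than to their sum, can be made concrete with a small model. This is a sketch with made-up sizes: check_nfa_sizes is a hypothetical helper, and the byte counts are illustrative values, not measurements of any real pattern.

```rust
/// Checks each NFA size (in bytes) against `limit` on its own.
///
/// Returns `Err(i)` with the index of the first NFA that exceeds the
/// limit. The combined size of all NFAs is deliberately not considered.
fn check_nfa_sizes(nfa_bytes: &[usize], limit: Option<usize>) -> Result<(), usize> {
    match limit {
        // No limit configured: every NFA is accepted.
        None => Ok(()),
        Some(limit) => match nfa_bytes.iter().position(|&n| n > limit) {
            Some(i) => Err(i),
            None => Ok(()),
        },
    }
}
```

With a limit of 20 * (1<<10) = 20480 bytes, two NFAs of 15000 bytes each pass even though together they use 30000 bytes, whereas a single 25000-byte NFA fails the check.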
Sets the size limit, in bytes, for the one-pass DFA.
Setting it to None disables the limit. Disabling the limit is strongly discouraged when compiling untrusted patterns. Even if the patterns are trusted, it still may not be a good idea, since a one-pass DFA can use a lot of memory. With that said, as the size of a regex increases, the likelihood of it being one-pass decreases.
This defaults to some reasonable number that permits most reasonable one-pass patterns.
§Example
This shows how to set the one-pass DFA size limit. Note that since a one-pass DFA is an optional component of the meta regex engine, this size limit only impacts what is built internally and will never determine whether a Regex itself fails to build.
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config().onepass_size_limit(Some(2 * (1<<20))))
.build(r"\pL{5}");
assert!(result.is_ok());
Set the cache capacity, in bytes, for the lazy DFA.
The cache capacity of the lazy DFA determines approximately how much heap memory it is allowed to use to store its state transitions. The state transitions are computed at search time, and if the cache fills up, it is cleared. At this point, any previously generated state transitions are lost and are re-generated if they’re needed again.
This sort of cache filling and clearing works quite well so long as cache clearing happens infrequently. If it happens too often, then the meta regex engine will stop using the lazy DFA and switch over to a different regex engine.
In cases where the cache is cleared too often, it may be possible to give the cache more space and reduce (or eliminate) how often it is cleared. Similarly, sometimes a regex is so big that the lazy DFA isn’t used at all if its cache capacity isn’t big enough.
The capacity set here is a limit on how much memory is used. The actual memory used is only allocated as it’s needed.
Determining the right value for this is a little tricky and will likely require some profiling. Enabling the logging feature and setting the log level to trace will also tell you how often the cache is being cleared.
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config().hybrid_cache_capacity(20 * (1<<20)))
.build(r"\pL{5}");
assert!(result.is_ok());
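The fill-and-clear behavior described above can be modeled with a small cache. This is a rough sketch under simplifying assumptions: TransitionCache is a hypothetical type, its capacity counts cached transitions rather than bytes, and the transition function is supplied by the caller. The real lazy DFA is considerably more involved.

```rust
use std::collections::HashMap;

/// A toy transition cache that is wiped whenever it is full, loosely
/// mimicking the clearing behavior described for the lazy DFA.
struct TransitionCache {
    /// Maximum number of cached transitions (the real limit is in bytes).
    capacity: usize,
    table: HashMap<(u32, u8), u32>,
    /// How many times the cache has been wiped so far.
    clear_count: usize,
}

impl TransitionCache {
    fn new(capacity: usize) -> Self {
        TransitionCache { capacity, table: HashMap::new(), clear_count: 0 }
    }

    /// Returns the cached transition, computing it with `compute` on a miss.
    fn next_state(
        &mut self,
        state: u32,
        byte: u8,
        compute: impl Fn(u32, u8) -> u32,
    ) -> u32 {
        if let Some(&next) = self.table.get(&(state, byte)) {
            return next;
        }
        if self.table.len() >= self.capacity {
            // The cache is full: drop everything and start over. Any
            // transition needed again later has to be recomputed.
            self.table.clear();
            self.clear_count += 1;
        }
        let next = compute(state, byte);
        self.table.insert((state, byte), next);
        next
    }
}
```

A caller could watch clear_count during a search and fall back to another engine once it grows too quickly, which is the kind of decision the text attributes to the meta regex engine. Raising the capacity makes clears less frequent at the cost of more memory.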
Sets the size limit, in bytes, for heap memory used for a fully compiled DFA.
NOTE: If you increase this, you’ll likely also need to increase Config::dfa_state_limit.
In contrast to the lazy DFA, building a full DFA requires computing all of its state transitions up front. This can be a very expensive process, and runs in worst case 2^n time and space (where n is proportional to the size of the regex). However, a full DFA unlocks some additional optimization opportunities.
Because full DFAs can be so expensive, the default limits for them are incredibly small. Generally speaking, if your regex is moderately big or if you’re using Unicode features (\w is Unicode-aware by default for example), then you can expect that the meta regex engine won’t even attempt to build a DFA for it.
If this and Config::dfa_state_limit are set to None, then the meta regex will not use any sort of limits when deciding whether to build a DFA. This in turn makes construction of a Regex take worst case exponential time and space. Even short patterns can result in huge space blow ups. So it is strongly recommended to keep some kind of limit set!
The default is set to a small number that permits some simple regexes to get compiled into DFAs in reasonable time.
§Example
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config()
.dfa_size_limit(Some(100 * (1<<20)))
.dfa_state_limit(None))
.build(r"\pL{5}");
assert!(result.is_ok());
Sets a limit on the total number of NFA states, beyond which no attempt is made to compile a full DFA.
This limit works in concert with Config::dfa_size_limit. Namely, whereas Config::dfa_size_limit is applied by attempting to construct a DFA, this limit is used to avoid the attempt in the first place. This is useful to avoid hefty initialization costs associated with building a DFA for cases where it is obvious the DFA will ultimately be too big.
By default, this is set to a very small number.
§Example
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config()
.dfa_state_limit(None))
.build(r"(?-u)\w{30}");
assert!(result.is_ok());
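The interplay between the two DFA limits can be sketched as a cheap check followed by an expensive one. This is a simplified model, not the engine's actual logic: DfaDecision and decide_dfa are hypothetical names, and the build closure stands in for DFA construction by returning the heap size of the result. The model checks the size only after building, which is a simplification.

```rust
/// Outcome of deciding whether a full DFA ends up being used.
#[derive(Debug, PartialEq)]
enum DfaDecision {
    /// The NFA had too many states, so no build was attempted.
    Skipped,
    /// A build was attempted but exceeded the size limit.
    TooBig,
    /// The DFA was built within the limits.
    Built,
}

fn decide_dfa(
    nfa_states: usize,
    state_limit: Option<usize>,
    size_limit: Option<usize>,
    build: impl FnOnce() -> usize,
) -> DfaDecision {
    // Cheap check first: avoid the build attempt entirely.
    if let Some(limit) = state_limit {
        if nfa_states > limit {
            return DfaDecision::Skipped;
        }
    }
    // Expensive step: the size limit can only be enforced by trying.
    let bytes = build();
    match size_limit {
        Some(limit) if bytes > limit => DfaDecision::TooBig,
        _ => DfaDecision::Built,
    }
}
```

In this model, an NFA over the state limit never reaches the build closure at all, which is the initialization cost the state limit is meant to save. With both limits set to None, the closure always runs and its result is always accepted.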
Whether to attempt to shrink the size of the alphabet for the regex pattern or not. When enabled, the alphabet is shrunk into a set of equivalence classes, where every byte in the same equivalence class cannot discriminate between a match or non-match.
WARNING: This is only useful for debugging DFAs. Disabling this does not yield any speed advantages. Indeed, disabling it can result in much higher memory usage. Disabling byte classes is useful for debugging the actual generated transitions because it lets one see the transitions defined on actual bytes instead of the equivalence classes.
This option is enabled by default and should never be disabled unless one is debugging the meta regex engine’s internals.
§Example
use regex_automata::{meta::Regex, Match};
let re = Regex::builder()
.configure(Regex::config().byte_classes(false))
.build(r"[a-z]+")?;
let hay = "!!quux!!";
assert_eq!(Some(Match::must(0, 2..6)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
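To make the idea of byte equivalence classes concrete, here is a minimal std-only sketch. It assumes the pattern's distinctions between bytes are already known as a list of inclusive byte ranges. The function name byte_classes is hypothetical, and the boundary-marking approach is one common way to compute such classes rather than a copy of the crate's code.

```rust
/// Maps every byte to an equivalence class, given the inclusive byte
/// ranges that a pattern distinguishes. Bytes in the same class can
/// never discriminate between a match and a non-match.
fn byte_classes(ranges: &[(u8, u8)]) -> [u8; 256] {
    // Mark the last byte of each region. A new class begins right after
    // every marked byte.
    let mut boundary = [false; 256];
    for &(start, end) in ranges {
        if start > 0 {
            boundary[start as usize - 1] = true;
        }
        boundary[end as usize] = true;
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..256 {
        classes[b] = class;
        // At most 255 increments happen, so `class` cannot overflow.
        if boundary[b] && b < 255 {
            class += 1;
        }
    }
    classes
}
```

For [a-z]+, calling byte_classes(&[(b'a', b'z')]) produces three classes: bytes below a, the bytes a through z, and bytes above z. A DFA then needs 3 transitions per state instead of 256, which is why disabling byte classes can increase memory usage so much.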
Set the line terminator to be used by the ^ and $ anchors in multi-line mode.
This option has no effect when CRLF mode is enabled. That is, regardless of this setting, (?Rm:^) and (?Rm:$) will always treat \r and \n as line terminators (and will never match between a \r and a \n).
By default, \n is the line terminator.
Warning: This does not change the behavior of the . metacharacter. To do that, you’ll need to configure the syntax option syntax::Config::line_terminator in addition to this. Otherwise, . will continue to match any character other than \n.
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
.syntax(syntax::Config::new().multi_line(true))
.configure(Regex::config().line_terminator(b'\x00'))
.build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
Ok::<(), Box<dyn std::error::Error>>(())
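The multi-line anchors can be thought of as simple look-around checks against the configured terminator byte. The sketch below assumes CRLF mode is disabled. The helpers is_line_start and is_line_end are hypothetical names used for illustration, not functions from the crate.

```rust
/// Reports whether `^` in multi-line mode would match at offset `at`.
/// Assumes `at <= hay.len()` and that CRLF mode is disabled.
fn is_line_start(hay: &[u8], at: usize, terminator: u8) -> bool {
    // Either the very start of the haystack, or just after a terminator.
    at == 0 || hay[at - 1] == terminator
}

/// Reports whether `$` in multi-line mode would match at offset `at`.
/// Assumes `at <= hay.len()` and that CRLF mode is disabled.
fn is_line_end(hay: &[u8], at: usize, terminator: u8) -> bool {
    // Either the very end of the haystack, or just before a terminator.
    at == hay.len() || hay[at] == terminator
}
```

For the haystack "\x00foo\x00" with a NUL terminator, offset 1 is a line start and offset 4 is a line end, matching the 1..4 span found above. With the default \n terminator, neither check would succeed at those offsets.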
Toggle whether the hybrid NFA/DFA (also known as the “lazy DFA”) should be available for use by the meta regex engine.
Enabling this does not necessarily mean that the lazy DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the hybrid crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
Toggle whether a fully compiled DFA should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the dfa-build crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
Toggle whether a one-pass DFA should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a one-pass DFA will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful. (Indeed, a one-pass DFA can only be used when the regex is one-pass. See the dfa::onepass module for more details.)
When the dfa-onepass crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
Toggle whether a bounded backtracking regex engine should be available for use by the meta regex engine.
Enabling this does not necessarily mean that a bounded backtracker will definitely be used. It just means that it will be available for use if the meta regex engine thinks it will be useful.
When the nfa-backtrack crate feature is enabled, then this is enabled by default. Otherwise, if the crate feature is disabled, then this is always disabled, regardless of its setting by the caller.
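The rule shared by these four engine toggles (enabled by default when the crate feature is present, always disabled otherwise) reduces to a single boolean expression. The following sketch uses a hypothetical helper, engine_available, with the crate feature modeled as a plain bool rather than a cfg attribute.

```rust
/// Whether an optional engine may be used by the meta regex engine.
///
/// `feature_enabled` models the crate feature (for example `hybrid`),
/// and `requested` is the caller's setting, if any.
fn engine_available(feature_enabled: bool, requested: Option<bool>) -> bool {
    // Without the crate feature, the caller's setting is ignored. With
    // it, an unset option defaults to enabled.
    feature_enabled && requested.unwrap_or(true)
}
```

So engine_available(true, None) is true, engine_available(true, Some(false)) is false, and engine_available(false, Some(true)) is still false because the feature is missing. Even when the result is true, the engine is only available, not guaranteed to be used.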
Returns the match kind on this configuration, as set by Config::match_kind.
If it was not explicitly set, then a default value is returned.
Returns whether empty matches must fall on valid UTF-8 boundaries, as set by Config::utf8_empty.
If it was not explicitly set, then a default value is returned.
Returns whether automatic prefilters are enabled, as set by Config::auto_prefilter.
If it was not explicitly set, then a default value is returned.
Returns a manually set prefilter, if one was set by Config::prefilter.
If it was not explicitly set, then a default value is returned.
Returns the capture configuration, as set by Config::which_captures.
If it was not explicitly set, then a default value is returned.
Returns NFA size limit, as set by Config::nfa_size_limit.
If it was not explicitly set, then a default value is returned.
Returns one-pass DFA size limit, as set by Config::onepass_size_limit.
If it was not explicitly set, then a default value is returned.
Returns DFA size limit, as set by Config::dfa_size_limit.
If it was not explicitly set, then a default value is returned.
Returns DFA size limit in terms of the number of states in the NFA, as set by Config::dfa_state_limit.
If it was not explicitly set, then a default value is returned.
Returns whether byte classes are enabled, as set by Config::byte_classes.
If it was not explicitly set, then a default value is returned.
Returns the line terminator for this configuration, as set by Config::line_terminator.
If it was not explicitly set, then a default value is returned.
Returns whether the hybrid NFA/DFA regex engine may be used, as set by Config::hybrid.
If it was not explicitly set, then a default value is returned.
Returns whether the DFA regex engine may be used, as set by Config::dfa.
If it was not explicitly set, then a default value is returned.
Returns whether the one-pass DFA regex engine may be used, as set by Config::onepass.
If it was not explicitly set, then a default value is returned.
Returns whether the bounded backtracking regex engine may be used, as set by Config::backtrack.
If it was not explicitly set, then a default value is returned.
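The "explicitly set, otherwise default" behavior of these getters is what one would expect from a configuration that stores each option as an Option. The sketch below illustrates that pattern with a hypothetical SketchConfig covering just two options. The defaults used (true and \n) are the ones stated earlier on this page, but the struct layout itself is an assumption, not the crate's definition.

```rust
/// A two-option stand-in for a builder-style configuration
/// (hypothetical type). `None` means "not explicitly set".
#[derive(Clone, Debug, Default)]
struct SketchConfig {
    utf8_empty: Option<bool>,
    line_terminator: Option<u8>,
}

impl SketchConfig {
    fn new() -> Self {
        Self::default()
    }

    /// Setters take and return `self` so that calls can be chained.
    fn utf8_empty(mut self, yes: bool) -> Self {
        self.utf8_empty = Some(yes);
        self
    }

    fn line_terminator(mut self, byte: u8) -> Self {
        self.line_terminator = Some(byte);
        self
    }

    /// Getters fall back to the documented default when unset.
    fn get_utf8_empty(&self) -> bool {
        self.utf8_empty.unwrap_or(true)
    }

    fn get_line_terminator(&self) -> u8 {
        self.line_terminator.unwrap_or(b'\n')
    }
}
```

With this layout, SketchConfig::new().get_line_terminator() returns b'\n', while SketchConfig::new().line_terminator(b'\x00').get_line_terminator() returns 0. Keeping unset options as None also makes it possible to tell a defaulted option apart from one the caller chose.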