Class RegexOptions

java.lang.Object
com.oracle.truffle.regex.RegexOptions

public final class RegexOptions extends Object
These options define how TRegex should interpret a given parsing request.

Available options:

  • Flavor: specifies the regex dialect to use. Possible values:
    • ECMAScript: ECMAScript/JavaScript syntax (default).
    • Python: Python 3 syntax.
    • Ruby: Ruby syntax.
  • Encoding: specifies the string encoding to match against. Possible values:
    • UTF-8
    • UTF-16
    • UTF-32
    • LATIN-1
    • BYTES (equivalent to LATIN-1)
  • MatchingMode: specifies implicit anchoring modes. See MatchingMode for details. Possible values:
    • search
    • match
    • fullmatch
  • PythonLocale: specifies the locale to use for a locale-sensitive Python regular expression.
  • Validate: don't generate a regex matcher object, just check the regex for syntax errors.
  • U180EWhitespace: treat 0x180E MONGOLIAN VOWEL SEPARATOR as part of \s. This is a legacy feature for languages using a Unicode standard older than 6.3, such as ECMAScript 6 and older.
  • UTF16ExplodeAstralSymbols: generate one DFA state per (16-bit) char instead of one per code point. This may improve performance in certain scenarios, but increases the likelihood of DFA state explosion.
  • AlwaysEager: do not generate any lazy regex matchers (lazy in the sense that they may lazily compute properties of a RegexResult).
  • RegressionTestMode: exercise all supported regex matcher variants, and check if they produce the same results.
  • DumpAutomata: dump all generated parse trees, NFAs, and DFAs to disk. This generates debugging dumps of the most relevant data structures in JSON, GraphViz and LaTeX format.
  • StepExecution: dump tracing information about all DFA matcher runs.
  • IgnoreAtomicGroups: treat atomic groups as ordinary groups (experimental).
  • MustAdvance: force the matcher to advance by at least one character, either by finding a non-zero-width match or by skipping at least one character before matching.
All options except Flavor, Encoding, MatchingMode and PythonLocale are boolean and false by default.
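The three MatchingMode values can be illustrated by analogy with java.util.regex. The sketch below does not use TRegex. It assumes that search, match and fullmatch carry the same meaning as Python's re.search, re.match and re.fullmatch, which the names suggest but which this page does not spell out. The class name MatchingModeSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Analogy only: java.util.regex stands in for TRegex here.
final class MatchingModeSketch {

    // search: the match may start and end anywhere in the input.
    static boolean search(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // match: the match must start at the beginning of the input.
    static boolean match(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // fullmatch: the match must cover the entire input.
    static boolean fullmatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(search("b+", "abbc"));      // true
        System.out.println(match("b+", "abbc"));       // false
        System.out.println(match("ab+", "abbc"));      // true
        System.out.println(fullmatch("ab+", "abbc"));  // false
        System.out.println(fullmatch("ab+c", "abbc")); // true
    }
}
```

Under this reading, match behaves like an implicit anchor at the start of the input, and fullmatch like implicit anchors at both ends.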
  • Field Details

  • Method Details

    • builder

      public static RegexOptions.Builder builder(com.oracle.truffle.api.source.Source source, String sourceString)
    • getMaxDFASize

      public short getMaxDFASize()
      Maximum number of DFA transitions. Must be less than Short.MAX_VALUE. Defaults to TRegexOptions.TRegexMaxDFATransitions.
    • getMaxBackTrackerCompileSize

      public short getMaxBackTrackerCompileSize()
      Maximum number of NFA transitions to allow for runtime compilation. Must be less than Short.MAX_VALUE. Defaults to TRegexOptions.TRegexMaxBackTrackerMergeExplodeSize.
    • isU180EWhitespace

      public boolean isU180EWhitespace()
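As a rough illustration of what the U180EWhitespace option toggles, the sketch below models an ECMAScript-style \s predicate in plain Java. The helper isRegexWhitespace and the class name are hypothetical, and the code point list is an assumption based on the ECMAScript WhiteSpace and LineTerminator productions, not on TRegex's actual tables.

```java
// Hypothetical model of \s membership; not TRegex code.
final class WhitespaceSketch {

    // Returns true if cp counts as \s. U+180E is included only when
    // the legacy flag is set (Unicode versions older than 6.3).
    static boolean isRegexWhitespace(int cp, boolean u180eWhitespace) {
        if (cp == 0x180E) {
            return u180eWhitespace;
        }
        switch (cp) {
            case 0x0009: case 0x000A: case 0x000B: case 0x000C: case 0x000D:
            case 0x0020: case 0x00A0: case 0x1680: case 0x2028: case 0x2029:
            case 0x202F: case 0x205F: case 0x3000: case 0xFEFF:
                return true;
            default:
                // U+2000..U+200A: the fixed-width space characters.
                return cp >= 0x2000 && cp <= 0x200A;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRegexWhitespace(0x180E, false)); // false
        System.out.println(isRegexWhitespace(0x180E, true));  // true
        System.out.println(isRegexWhitespace(' ', false));    // true
    }
}
```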
    • isRegressionTestMode

      public boolean isRegressionTestMode()
    • isDumpAutomata

      public boolean isDumpAutomata()
      Produce ASTs and automata in JSON, DOT (GraphViz) and LaTeX formats.
    • isDumpAutomataWithSourceSections

      public boolean isDumpAutomataWithSourceSections()
    • isStepExecution

      public boolean isStepExecution()
      Trace the execution of automata in JSON files.
    • isGenerateDFAImmediately

      public boolean isGenerateDFAImmediately()
      Generate DFA matchers immediately after parsing the expression.
    • isBooleanMatch

      public boolean isBooleanMatch()
      Don't track capture groups, just return a boolean match result instead.
    • isAlwaysEager

      public boolean isAlwaysEager()
      Always match capture groups eagerly.
    • isUTF16ExplodeAstralSymbols

      public boolean isUTF16ExplodeAstralSymbols()
      Explode astral symbols (0x10000 - 0x10FFFF) into sub-automata where every state matches one char as opposed to one code point.
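To see why matching per char differs from matching per code point, it helps to look at how an astral symbol is stored in UTF-16. The sketch below uses only standard java.lang.Character methods. The class name AstralSymbolSketch and the sample code point U+1F600 are chosen for illustration and do not come from TRegex.

```java
// Shows that one astral code point occupies two UTF-16 chars.
final class AstralSymbolSketch {

    // True for code points in the range 0x10000 - 0x10FFFF.
    static boolean isAstral(int codePoint) {
        return codePoint >= 0x10000 && codePoint <= 0x10FFFF;
    }

    // Number of 16-bit chars needed to encode the code point.
    static int utf16Units(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        String s = new String(Character.toChars(cp));
        System.out.println(isAstral(cp));                     // true
        System.out.println(s.length());                       // 2
        System.out.println(s.codePointCount(0, s.length()));  // 1
        // The surrogate pair that a per-char automaton would step through.
        System.out.printf("%04X %04X%n",
                (int) Character.highSurrogate(cp),
                (int) Character.lowSurrogate(cp));            // D83D DE00
    }
}
```

An automaton that consumes one char per state needs two states to step through this symbol, which is consistent with the state growth the option description warns about.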
    • isValidate

      public boolean isValidate()
      Do not generate an actual regular expression matcher, just check the given regular expression for syntax errors.
    • isIgnoreAtomicGroups

      public boolean isIgnoreAtomicGroups()
      Ignore atomic groups (found e.g. in Ruby regular expressions), treat them as regular groups.
    • isMustAdvance

      public boolean isMustAdvance()
      Do not return zero-width matches at the beginning of the search string. The matcher must advance by at least one character, either by finding a match of non-zero width or by finding a match after skipping at least one character.
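The behaviour described for MustAdvance can be approximated with java.util.regex when the search starts at index 0. The sketch below is an emulation under that assumption, not how TRegex implements the option: it appends the lookahead (?!\A) to the pattern so that any match ending at index 0 is rejected. The names MustAdvanceSketch and firstMatch are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Emulates "must advance" for searches that begin at index 0.
final class MustAdvanceSketch {

    // Returns {start, end} of the first match, or null if there is none.
    static int[] firstMatch(String regex, String input, boolean mustAdvance) {
        // (?!\A) fails only at the start of the input, so a match can
        // no longer end at index 0: it is either non-empty or starts later.
        String effective = mustAdvance ? "(?:" + regex + ")(?!\\A)" : regex;
        Matcher m = Pattern.compile(effective).matcher(input);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        int[] plain = firstMatch("a*", "baa", false);
        int[] advanced = firstMatch("a*", "baa", true);
        System.out.println(plain[0] + ".." + plain[1]);       // 0..0
        System.out.println(advanced[0] + ".." + advanced[1]); // 1..3
        System.out.println(firstMatch("a*", "", true));       // null
    }
}
```

With the lazy pattern a*? on the input aa, the same helper returns the range 0..1: the engine backtracks into a non-zero-width match rather than skipping ahead, which corresponds to the first of the two alternatives named in the description.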
    • isGenerateInput

      public boolean isGenerateInput()
      Try to generate a string that matches the given regex and return it instead of the compiled regex.
    • getFlavor

      public RegexFlavor getFlavor()
    • getEncoding

      public Encodings.Encoding getEncoding()
    • getMatchingMode

      public MatchingMode getMatchingMode()
    • getPythonLocale

      public String getPythonLocale()
    • getJavaJDKVersion

      public int getJavaJDKVersion()
      JDK compatibility version for JavaFlavor.
    • withBooleanMatch

      public RegexOptions withBooleanMatch()
    • withoutBooleanMatch

      public RegexOptions withoutBooleanMatch()
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      public boolean equals(Object obj)
      Overrides:
      equals in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object