Class RegexASTBuilder

java.lang.Object
com.oracle.truffle.regex.tregex.parser.RegexASTBuilder

public final class RegexASTBuilder extends Object
This class is used to generate regex ASTs. The provided methods append nodes to the AST.
  • Constructor Details

  • Method Details

    • getCompilationBuffer

      public CompilationBuffer getCompilationBuffer()
    • getCurGroup

      public Group getCurGroup()
      Returns the current Group. Any new Terms will be added to its last Sequence (the one returned by getCurSequence()).
    • getCurSequence

      public Sequence getCurSequence()
      Returns the current Sequence into which new Terms will be added.
    • getCurTerm

      public Term getCurTerm()
      Returns the last Term inserted into the current Sequence. This will be null if a new Sequence or Group was just started.
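      The null case described above can be pictured with a small toy model. This is a sketch under assumptions, not the real class: CurTermSketch, its addTerm helper, and the use of plain strings as terms are all hypothetical stand-ins. Only the rule "a fresh Sequence has no current Term" is taken from the documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; NOT the real RegexASTBuilder.
class CurTermSketch {
    // A single group, held as a list of alternatives (sequences) of term names.
    private final List<List<String>> sequences = new ArrayList<>();

    CurTermSketch() {
        // Assumption: a freshly entered group starts with one empty sequence.
        sequences.add(new ArrayList<>());
    }

    // Hypothetical helper standing in for the various add* methods.
    void addTerm(String term) {
        curSequence().add(term);
    }

    // Counterpart of nextSequence(): start a new, empty alternative.
    void nextSequence() {
        sequences.add(new ArrayList<>());
    }

    private List<String> curSequence() {
        return sequences.get(sequences.size() - 1);
    }

    // Counterpart of getCurTerm(): last term of the current sequence, or null.
    String getCurTerm() {
        List<String> cur = curSequence();
        return cur.isEmpty() ? null : cur.get(cur.size() - 1);
    }

    // Records getCurTerm() at three points: start, after two adds, after a new sequence.
    static String[] demo() {
        CurTermSketch b = new CurTermSketch();
        String atStart = b.getCurTerm();
        b.addTerm("a");
        b.addTerm("b");
        String afterAdds = b.getCurTerm();
        b.nextSequence();
        String afterNewSequence = b.getCurTerm();
        return new String[] {atStart, afterAdds, afterNewSequence};
    }
}
```

      A caller that wants to attach something to the current Term (a quantifier, for instance) would therefore check for null first.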
    • getCurGroupStartPosition

      public int getCurGroupStartPosition()
      Returns the code position of the beginning (opening parenthesis) of the current Group (getCurGroup()).
    • setOverrideSourceSection

      public void setOverrideSourceSection(com.oracle.truffle.api.source.SourceSection sourceSection)
    • clearOverrideSourceSection

      public void clearOverrideSourceSection()
    • curGroupIsRoot

      public boolean curGroupIsRoot()
      Indicates whether the builder is currently in the root group or in some nested group.
      Returns:
      true if the builder is in the root group
    • pushRootGroup

      public void pushRootGroup()
      Call this method first after creating a new RegexASTBuilder. It creates and enters the root capture group (group number 0).
    • pushRootGroup

      public void pushRootGroup(boolean rootCapture)
      Like pushRootGroup(), but allows creating a non-capturing root group. This is useful for building intermediate ASTs that are then pasted into other ASTs.
    • popRootGroup

      public RegexAST popRootGroup()
      This is the build method of this Builder. As such, it should be the last method you call on a RegexASTBuilder instance.
      Returns:
      the generated AST
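      The call order described for pushRootGroup() and popRootGroup() can be sketched with a toy builder. Everything here is an assumption made for illustration: LifecycleSketch is not the real class, addTerm is a hypothetical simplification of the add* methods, and the String result stands in for RegexAST. How a real RegexASTBuilder is constructed is not covered by this page, so the sketch does not attempt it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for illustration only; NOT the real RegexASTBuilder.
class LifecycleSketch {
    private final List<String> terms = new ArrayList<>();
    private boolean rootOpen = false;
    private boolean built = false;

    // Mirrors pushRootGroup(): expected to be the first call on a fresh builder.
    void pushRootGroup() {
        if (rootOpen || built) {
            throw new IllegalStateException("root group already pushed");
        }
        rootOpen = true;
    }

    // Hypothetical simplification of the add* methods: appends a ready-made term.
    void addTerm(String term) {
        if (!rootOpen) {
            throw new IllegalStateException("call pushRootGroup() first");
        }
        terms.add(term);
    }

    // Mirrors popRootGroup(): the build step; this toy rejects any further use.
    String popRootGroup() {
        if (!rootOpen) {
            throw new IllegalStateException("no root group to pop");
        }
        rootOpen = false;
        built = true;
        return "(" + String.join("", terms) + ")";
    }

    // Builds the toy equivalent of the pattern a[0-9].
    static String demo() {
        LifecycleSketch b = new LifecycleSketch();
        b.pushRootGroup();
        b.addTerm("a");
        b.addTerm("[0-9]");
        return b.popRootGroup();
    }
}
```

      The state checks are a design choice of the sketch; whether the real builder validates call order is not stated here.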
    • pushGroup

      public void pushGroup(Token token)
      Creates and enters a new non-capturing group. This call should be paired with a call to popGroup(Token).
      Parameters:
      token - a Token whose source section should be included in the group's source sections, or null if none
    • pushGroup

      public void pushGroup()
    • pushCaptureGroup

      public void pushCaptureGroup(Token token)
      Creates and enters a new capture group. This call should be paired with a call to popGroup(Token).
      Parameters:
      token - a Token whose source section should be included in the group's source sections, or null if none
    • pushCaptureGroup

      public void pushCaptureGroup()
    • pushLookAheadAssertion

      public void pushLookAheadAssertion(Token token, boolean negate)
      Creates and enters a new look-ahead assertion. This call should be paired with a call to popGroup(Token).
      Parameters:
      token - a Token whose source section should be included in the assertion's source sections, or null if none
      negate - true if the look-ahead assertion is to be negative
    • pushLookAheadAssertion

      public void pushLookAheadAssertion(boolean negate)
    • pushLookBehindAssertion

      public void pushLookBehindAssertion(Token token, boolean negate)
      Creates and enters a new look-behind assertion. This call should be paired with a call to popGroup(Token).
      Parameters:
      token - a Token whose source section should be included in the assertion's source sections, or null if none
      negate - true if the look-behind assertion is to be negative
    • pushLookBehindAssertion

      public void pushLookBehindAssertion(boolean negate)
    • pushAtomicGroup

      public void pushAtomicGroup(Token token)
      Creates and enters a new atomic group. This call should be paired with a call to popGroup(Token).
      Parameters:
      token - a Token whose source section should be included in the group's source sections, or null if none
    • pushAtomicGroup

      public void pushAtomicGroup()
    • pushConditionalBackReferenceGroup

      public void pushConditionalBackReferenceGroup(Token.BackReference token)
      Creates and enters a new conditional back-reference group. This call should be paired with a call to popGroup(Token).
      Parameters:
      token - a Token whose source section should be included in the group's source sections, or null if none
    • pushConditionalBackReferenceGroup

      public void pushConditionalBackReferenceGroup(int referencedGroupNumber, boolean namedReference)
    • popGroup

      public void popGroup(Token token)
      Closes and leaves the current group. This call should be paired with one of pushGroup(Token), pushCaptureGroup(Token), pushLookAheadAssertion(Token, boolean), pushLookBehindAssertion(Token, boolean), pushAtomicGroup(Token) or pushConditionalBackReferenceGroup(Token.BackReference).
      Parameters:
      token - a Token whose source section should be included in the group's or assertion's source sections, or null if none
    • popGroup

      public void popGroup()
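      The push/pop pairing can be modelled as a stack of open groups. The sketch below is a toy under assumptions: GroupStackSketch is hypothetical, group kinds are plain strings, and popGroup returns the closed kind only so the demo can log it (the documented popGroup returns void).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of push/pop pairing; NOT the real RegexASTBuilder.
class GroupStackSketch {
    // Kinds of the currently open groups, innermost on top.
    private final Deque<String> open = new ArrayDeque<>();

    void pushRootGroup() { open.push("root"); }
    void pushGroup() { open.push("group"); }
    void pushCaptureGroup() { open.push("capture"); }
    void pushLookAheadAssertion(boolean negate) {
        open.push(negate ? "neg-lookahead" : "lookahead");
    }

    // Closes the innermost nested group and reports which kind was closed.
    String popGroup() {
        if (open.size() <= 1) {
            throw new IllegalStateException("no nested group to close");
        }
        return open.pop();
    }

    // Counterpart of curGroupIsRoot(): true when only the root group is open.
    boolean curGroupIsRoot() { return open.size() == 1; }

    // Walks the nesting of (?:(?!x)(y)) and logs each close in order.
    static String demo() {
        GroupStackSketch b = new GroupStackSketch();
        StringBuilder log = new StringBuilder();
        b.pushRootGroup();
        b.pushGroup();                       // (?:
        b.pushLookAheadAssertion(true);      // (?!
        log.append(b.popGroup()).append(',');
        b.pushCaptureGroup();                // (
        log.append(b.popGroup()).append(',');
        log.append(b.popGroup()).append(',');
        log.append("root=").append(b.curGroupIsRoot());
        return log.toString();
    }
}
```

      Every push of a nested group or assertion is matched by exactly one popGroup, and the builder is back in the root group once all of them are closed.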
    • nextSequence

      public void nextSequence()
      Adds a new Sequence to the current Group. In a parser, you would call this method after encountering the vertical bar operator.
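      As a rough illustration of where a parser would call nextSequence, the toy below walks a pattern made of literals, parentheses and vertical bars. It is a sketch under assumptions: AlternationSketch, addTerm and the bracketed text dump are hypothetical, the input is assumed to have balanced parentheses, and none of this is the real TRegex parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy builder and parser loop for illustration; NOT the real RegexASTBuilder.
class AlternationSketch {
    // Each open group is a list of alternatives; each alternative is a list of terms.
    private final Deque<List<List<String>>> groups = new ArrayDeque<>();

    void pushGroup() {
        List<List<String>> group = new ArrayList<>();
        group.add(new ArrayList<>()); // a new group starts with one empty sequence
        groups.push(group);
    }

    // Counterpart of nextSequence(): open a new alternative in the current group.
    void nextSequence() {
        groups.peek().add(new ArrayList<>());
    }

    // Hypothetical helper standing in for the add* methods.
    void addTerm(String term) {
        List<List<String>> group = groups.peek();
        group.get(group.size() - 1).add(term);
    }

    // Closes the current group and renders it as text.
    private String closeAndRender() {
        List<String> alternatives = new ArrayList<>();
        for (List<String> sequence : groups.pop()) {
            alternatives.add(String.join(" ", sequence));
        }
        return "Group[" + String.join(" | ", alternatives) + "]";
    }

    // Closes a nested group and appends it as a term of the enclosing sequence.
    void popGroup() {
        addTerm(closeAndRender());
    }

    // Assumes balanced parentheses; every '|' becomes a nextSequence() call.
    static String parse(String pattern) {
        AlternationSketch b = new AlternationSketch();
        b.pushGroup(); // plays the role of the root group
        for (char c : pattern.toCharArray()) {
            switch (c) {
                case '(': b.pushGroup(); break;
                case ')': b.popGroup(); break;
                case '|': b.nextSequence(); break;
                default: b.addTerm(String.valueOf(c));
            }
        }
        return b.closeAndRender();
    }
}
```

      With this toy, parse("ab|c(d|e)") yields Group[a b | c Group[d | e]]: each vertical bar starts a new alternative in whichever group is innermost at that point.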
    • addCharClass

      public void addCharClass(Token.CharacterClass token)
      Adds a new CharacterClass to the current Sequence.
      Parameters:
      token - aside from the source sections, the token most importantly contains the set of code points to be included in the character class and a flag indicating whether it corresponds to a single character in the regex (i.e. a literal or an escaped character)
    • addCharClass

      public void addCharClass(CodePointSet charSet, boolean wasSingleChar)
    • addCharClass

      public void addCharClass(CodePointSet charSet)
    • addClassSet

      public void addClassSet(Token.ClassSet token, CaseFoldData.CaseFoldUnfoldAlgorithm caseUnfoldAlgo)
      Adds a new Group representing a class set expression to the current Sequence.
      Parameters:
      token - aside from the source sections, the token most importantly contains the set of code points and strings to be included in the class set
    • addBackReference

      public void addBackReference(Token.BackReference token)
    • addBackReference

      public void addBackReference(Token.BackReference token, boolean ignoreCase)
    • addBackReference

      public void addBackReference(Token.BackReference token, boolean ignoreCase, boolean ignoreCaseAltMode)
      Adds a new BackReference to the current Sequence.
      Parameters:
      token - aside from the source sections, this contains the number of the group being referenced
    • addBackReference

      public void addBackReference(int groupNumber, boolean namedReference, boolean ignoreCase)
    • addSubexpressionCall

      public void addSubexpressionCall(int groupNumber)
    • addPositionAssertion

      public void addPositionAssertion(Token token)
      Adds a new PositionAssertion to the current Sequence.
      Parameters:
      token - aside from the source sections, the kind of this token indicates whether this is the ^ assertion or the $ assertion
    • addCaret

      public void addCaret()
    • addDollar

      public void addDollar()
    • addQuantifier

      public void addQuantifier(Token.Quantifier quantifier)
      Adds a quantifier to the current Term.
      Parameters:
      quantifier - this token contains a specification of the quantifier's semantics, along with the source section data
    • addCopy

      public void addCopy(Token token, Group sourceGroup)
      Adds a copy of sourceGroup to the current Sequence.
      Parameters:
      token - a token indicating which source sections should be attributed to the copied group
      sourceGroup - the Group to be copied
    • removeCurTerm

      public void removeCurTerm()
      Removes the current Term from the current Sequence.
    • addDeadNode

      public void addDeadNode()
      Adds a dead node (an empty character class) to the current Sequence.
    • replaceCurTermWithDeadNode

      public void replaceCurTermWithDeadNode()
      Replaces the current Term with a dead node.
    • wrapCurTermInGroup

      public void wrapCurTermInGroup()
      Wraps the current Term in a non-capturing group.
    • wrapCurTermInAtomicGroup

      public void wrapCurTermInAtomicGroup()
      Wraps the current Term in an atomic group. This can be useful when implementing possessive quantifiers.
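      The possessive-quantifier remark can be illustrated with a toy that rewrites a*+ as the atomic group (?>a*), then checks both spellings with java.util.regex. This is a sketch under assumptions: PossessiveSketch works on pattern strings rather than AST nodes, its addTerm and String-based addQuantifier are hypothetical, and java.util.regex is used only as an independent engine that supports both syntaxes, not as part of TRegex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy string-based model for illustration; NOT the real RegexASTBuilder.
class PossessiveSketch {
    private final List<String> sequence = new ArrayList<>();

    // Hypothetical helper standing in for the add* methods.
    void addTerm(String term) { sequence.add(term); }

    // Toy counterpart of addQuantifier: attaches the quantifier to the current term.
    void addQuantifier(String quantifier) {
        int last = sequence.size() - 1;
        sequence.set(last, sequence.get(last) + quantifier);
    }

    // Toy counterpart of wrapCurTermInAtomicGroup().
    void wrapCurTermInAtomicGroup() {
        int last = sequence.size() - 1;
        sequence.set(last, "(?>" + sequence.get(last) + ")");
    }

    String render() { return String.join("", sequence); }

    // Rewrites the possessive pattern a*+a as a greedy quantifier inside an atomic group.
    static String desugared() {
        PossessiveSketch b = new PossessiveSketch();
        b.addTerm("a");
        b.addQuantifier("*");          // greedy part of the possessive quantifier
        b.wrapCurTermInAtomicGroup();  // forbid backtracking into the quantified term
        b.addTerm("a");
        return b.render();
    }

    // Compares the possessive spelling with the desugared one on a single input.
    static boolean sameResult(String input) {
        boolean possessive = Pattern.matches("a*+a", input);
        boolean atomic = Pattern.matches(desugared(), input);
        return possessive == atomic;
    }
}
```

      Here desugared() produces (?>a*)a. Neither it nor a*+a matches the input aaa, because the quantified term consumes every a and cannot give one back, whereas the plain greedy a*a does match.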
    • addWordBoundaryAssertion

      public void addWordBoundaryAssertion(CodePointSet wordChars, CodePointSet nonWordChars)
    • addWordNonBoundaryAssertion

      public void addWordNonBoundaryAssertion(CodePointSet wordChars, CodePointSet nonWordChars)
    • addWordNonBoundaryAssertionPython

      public void addWordNonBoundaryAssertionPython(CodePointSet wordChars, CodePointSet nonWordChars)