Class RegexLexer
java.lang.Object
com.oracle.truffle.regex.tregex.parser.RegexLexer
Direct Known Subclasses:
JavaRegexLexer, JSRegexLexer, OracleDBRegexLexer, PythonRegexLexer

Nested Class Summary
Nested Classes
static enum
static final class
protected static enum

Field Summary
Fields
protected final CompilationBuffer compilationBuffer
protected static final TBitSet DEFAULT_WHITESPACE
protected final String pattern: The source of the input pattern.
protected int position: The index of the next character in pattern to be parsed.
protected static final TBitSet PREDEFINED_CHAR_CLASSES
final RegexSource source
Constructor Summary
Constructors
Method Summary
protected void advance()
protected void advance(int len)
protected boolean atEnd()
protected abstract long boundedQuantifierMaxValue(): The maximum value allowed while parsing bounded quantifiers.
protected abstract ClassSetContents caseFoldClassSetAtom(ClassSetContents classSetContents): Case folds an atom in a class set expression.
protected abstract void caseFoldUnfold(CodePointSetAccumulator charClass): Updates a character set by expanding it to the set of characters that case fold to the same characters as the characters currently in the set.
protected abstract void checkClassSetCharacter(int codePoint): Checks whether codePoint can appear as an unescaped literal class set character.
protected abstract CodePointSet complementClassSet(CodePointSet codePointSet): Returns the complement of a class set element.
protected char consumeChar()
protected boolean consumingLookahead(char character)
protected boolean consumingLookahead(String match)
protected boolean consumingLookahead(Predicate<Character> predicate, int length)
protected int count
protected int count
protected int countDecimalDigits()
protected int countFrom
protected int countUpTo
protected char curChar()
protected abstract boolean featureEnabledAZPositionAssertions(): Returns true if \A and \Z position assertions are supported.
protected abstract boolean featureEnabledBoundedQuantifierEmptyMin(): Returns true if empty minimum values in bounded quantifiers (e.g. {,1}) are allowed and treated as zero.
protected abstract boolean featureEnabledCCRangeWithPredefCharClass(): Returns true if the lexer should try to parse ranges with predefined inner character classes, e.g. [\w-a].
protected abstract boolean featureEnabledCharClassFirstBracketIsLiteral(): Returns true if the first character in a character class must be interpreted as part of the character set, even if it is the closing bracket ']'.
protected abstract boolean featureEnabledClassSetExpressions(): Returns true if class set expressions (e.g. [[\w\q{abc|xyz}]--[a-cx-z]]) are supported.
protected abstract boolean featureEnabledForwardReferences(): Returns true if forward references are allowed.
protected abstract boolean featureEnabledGroupComments(): Returns true if group comments (e.g. (# ... )) are supported.
protected abstract boolean featureEnabledIgnoreCase(): Returns true if ignore-case mode is currently enabled.
protected abstract boolean featureEnabledIgnoreWhiteSpace(): Returns true if white space in the pattern is ignored.
protected abstract boolean featureEnabledLineComments(): Returns true if line comments (e.g. # ...) are supported.
protected abstract boolean featureEnabledNestedCharClasses(): Returns true if nested character classes are supported.
protected abstract boolean featureEnabledOctalEscapes(): Returns true if octal escapes (e.g. \012) are supported.
protected abstract boolean featureEnabledPOSIXCharClasses(): Returns true if POSIX character classes, character equivalence classes, and the POSIX Collating Element Operator are supported.
protected abstract boolean featureEnabledPossessiveQuantifiers(): Returns true if possessive quantifiers (+ suffix) are allowed.
protected abstract boolean featureEnabledSpecialGroups(): Returns true if any constructs that alter a capture group's function, such as non-capturing groups (?:) or look-around assertions (?=), are supported.
protected abstract boolean featureEnabledUnicodePropertyEscapes(): Returns true if Unicode property escapes (e.g. \p{...}) are supported.
protected abstract boolean featureEnabledZLowerCaseAssertion(): Returns true if the \z position assertion is supported.
protected boolean findChars(char... chars)
protected int finishSurrogatePair(char c)
protected abstract CodePointSet getDotCodePointSet: Returns the code point set represented by the dot operator.
protected abstract CodePointSet getIdContinue: Returns the set of all code points a group identifier may continue with.
protected abstract CodePointSet getIdStart: Returns the set of all code points a group identifier may begin with.
protected int getLastAtomPosition()
int getLastCharacterClassBeginPosition()
int getLastTokenPosition(): Returns the last token's position in the pattern string.
protected abstract int getMaxBackReferenceDigits(): Returns the maximum number of digits to parse when parsing a back-reference.
protected int getNumberOfParsedGroups(): Get the number of capture groups parsed so far.
protected abstract CodePointSet getPOSIXCharClass(String name): Returns the POSIX character class associated with the given name.
protected abstract CodePointSet getPredefinedCharClass(char c): Returns the CodePointSet associated with the given predefined character class (e.g. \d).
protected int getSingleNamedGroupNumber
protected abstract TBitSet getWhitespace: The set of code points to consider as whitespace in comments and "ignore white space" mode.
protected abstract Token handleBoundedQuantifierEmptyOrMissingMin: Handle a missing } or minimum value in bounded quantifiers.
protected abstract Token handleBoundedQuantifierInvalidCharacter: Handle non-digit characters in bounded quantifiers.
protected abstract RegexSyntaxException handleBoundedQuantifierOutOfOrder: Handle out-of-order bounds in bounded quantifiers, e.g. {2,1}.
protected abstract Token handleBoundedQuantifierOverflow(long min, long max): Handle integer overflows in quantifier bounds, e.g. {2147483649}.
protected abstract Token handleBoundedQuantifierOverflowMin(long min, long max): Handle integer overflows in quantifier bounds, e.g. {2147483649}.
protected abstract RegexSyntaxException handleCCRangeOutOfOrder(int startPos): Handle out-of-order character class range elements, e.g. [b-a].
protected abstract void handleCCRangeWithPredefCharClass(int startPos, ClassSetContents firstAtom, ClassSetContents secondAtom): Handle non-code-point character class range elements, e.g. [\w-a].
protected abstract RegexSyntaxException handleComplementOfStringSet: Handle the complement of class set expressions containing strings, e.g. [^\q{abc}] or \P{RGI_Emoji}.
protected abstract void handleGroupRedefinition(String name, int newId, int oldId)
protected abstract void handleIncompleteEscapeX(): Handle incomplete hex escapes, e.g. \x1.
protected abstract Token handleInvalidBackReference(int reference): Handle group references to non-existent groups.
protected abstract RegexSyntaxException handleInvalidCharInCharClass
protected abstract RegexSyntaxException handleInvalidGroupBeginQ: Handle groups starting with (? followed by an invalid character.
protected abstract RegexSyntaxException handleMissingClassSetOperand(RegexLexer.ClassSetOperator operator): Handle missing operands in class set expressions, e.g. [\s&&] or [\w--].
protected abstract RegexSyntaxException handleMixedClassSetOperators(RegexLexer.ClassSetOperator leftOperator, RegexLexer.ClassSetOperator rightOperator): Handle class set expressions with mixed set operators in the same nested set.
protected abstract void handleOctalOutOfRange(): Handle octal values larger than 255.
protected abstract RegexSyntaxException handleRangeAsClassSetOperand(RegexLexer.ClassSetOperator operator): Handle character ranges as operands in class set expressions with operators other than union.
protected abstract void handleUnfinishedEscape(): Handle an unfinished escape (e.g. a trailing \).
protected abstract void handleUnfinishedGroupComment(): Handle an unfinished group comment (#...).
protected abstract RegexSyntaxException handleUnfinishedGroupQ: Handle an unfinished group with a question mark, (?.
protected abstract RegexSyntaxException handleUnfinishedRangeInClassSet: Handle an unfinished range in a class set expression, e.g. [a-].
protected abstract RegexSyntaxException handleUnmatchedLeftBracket: Handle an unmatched [.
protected abstract void handleUnmatchedRightBrace(): Handle an unmatched }.
protected abstract void handleUnmatchedRightBracket(): Handle an unmatched ].
protected boolean hasNamedCaptureGroups: Checks whether this regular expression contains any named capture groups.
boolean hasNext()
boolean inCharacterClass()
static boolean isAscii(int c)
boolean isCurCharClassInverted()
static boolean isDecimalDigit(int c)
static boolean isHexDigit(int c)
static boolean isOctalDigit(int c)
protected boolean isPredefCharClass(char c): Returns true iff the given character is a predefined character class when preceded with a backslash (e.g. \d).
protected Token literalChar(int codePoint)
protected boolean lookahead
protected boolean lookahead
protected boolean lookbehind(char c)
next()
int numberOfCaptureGroupsSoFar()
protected int parseCharClassAtomCodePoint(char c)
protected ClassSetContents parseCharClassAtomPredefCharClass
protected ClassSetContents parseClassSetExpression
protected abstract int parseCodePointInGroupName: Parse the next code point in a group name and return it.
protected abstract Token parseCustomEscape(char c): Parse any escape sequence starting with \ and the argument c.
protected abstract int parseCustomEscapeChar(char c, boolean inCharClass): Parse an escape character sequence (inside a character class, or after other escapes have already been tried) starting with \ and the argument c.
protected abstract int parseCustomEscapeCharFallback(int c, boolean inCharClass): Parse an escape character sequence (inside a character class, or after other escapes have already been tried) starting with \ and the code point c. This method is called after all other means of parsing the escape sequence have been exhausted.
protected abstract Token parseCustomGroupBeginQ(char charAfterQuestionMark): Parse a group starting with (?.
protected abstract Token parseGroupLt: Parse a group starting with (<.
protected RegexLexer.ParseGroupNameResult parseGroupName(char terminator): Parse a GroupName, i.e. <RegExpIdentifierName>, assuming that the opening < bracket was already read.
protected int parseHex(int minDigits, int maxDigits, int maxValue, Runnable handleTooFewDigits, Runnable handleValueTooLarge)
protected int parseIntSaturated(int firstDigit, int length, int returnOnOverflow)
protected long parseIntSaturated(int firstDigit, int length, int returnOnOverflow, long maxValue)
protected int parseOctal(int firstDigit, int maxDigits)
protected ClassSetContents parseUnicodeCharacterProperty(boolean invert)
protected void registerNamedCaptureGroup
protected void retreat()
syntaxError(String msg)
int totalNumberOfCaptureGroups
protected abstract void validatePOSIXCollationElement(String sequence): Checks if the given string is a valid collation element.
protected abstract void validatePOSIXEquivalenceClass(String sequence): Checks if the given string is a valid equivalence class.

Field Details

PREDEFINED_CHAR_CLASSES

DEFAULT_WHITESPACE

source

pattern
The source of the input pattern.

position
protected int position
The index of the next character in pattern to be parsed.

namedCaptureGroups

compilationBuffer

Constructor Details

RegexLexer

Method Details

getCompilationBuffer

featureEnabledIgnoreCase
protected abstract boolean featureEnabledIgnoreCase()
Returns true if ignore-case mode is currently enabled.

featureEnabledAZPositionAssertions
protected abstract boolean featureEnabledAZPositionAssertions()
Returns true if \A and \Z position assertions are supported.

featureEnabledZLowerCaseAssertion
protected abstract boolean featureEnabledZLowerCaseAssertion()
Returns true if the \z position assertion is supported.

featureEnabledBoundedQuantifierEmptyMin
protected abstract boolean featureEnabledBoundedQuantifierEmptyMin()
Returns true if empty minimum values in bounded quantifiers (e.g. {,1}) are allowed and treated as zero.

featureEnabledPossessiveQuantifiers
protected abstract boolean featureEnabledPossessiveQuantifiers()
Returns true if possessive quantifiers (+ suffix) are allowed.

featureEnabledCharClassFirstBracketIsLiteral
protected abstract boolean featureEnabledCharClassFirstBracketIsLiteral()
Returns true if the first character in a character class must be interpreted as part of the character set, even if it is the closing bracket ']'.
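The flag matters when locating the end of a class: in Python (and POSIX bracket expressions) "[]a]" is one class containing ']' and 'a', whereas in ECMAScript "[]" is an empty class. The sketch below is a hypothetical, self-contained illustration, not the RegexLexer code; the class name CharClassEndSketch and the choice to skip a leading '^' before applying the rule are assumptions.

```java
// Hypothetical sketch (not the RegexLexer implementation): find the ']' that closes a
// character class opened at index `open`, honouring a dialect flag in the spirit of
// featureEnabledCharClassFirstBracketIsLiteral().
final class CharClassEndSketch {

    /** Returns the index of the closing ']', or -1 if the class is unterminated. */
    static int findClassEnd(String pattern, int open, boolean firstBracketIsLiteral) {
        int i = open + 1;
        // Assumption: a leading '^' (negation) does not count as the "first character".
        if (i < pattern.length() && pattern.charAt(i) == '^') {
            i++;
        }
        // With the flag set, "[]a]" is a class containing ']' and 'a'.
        if (firstBracketIsLiteral && i < pattern.length() && pattern.charAt(i) == ']') {
            i++;
        }
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i += 2; // skip the escaped character, e.g. "\]"
            } else if (c == ']') {
                return i;
            } else {
                i++;
            }
        }
        return -1; // unmatched '['
    }
}
```

For example, findClassEnd("[]a]", 0, true) yields 3, while the same call with the flag off yields 1.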

featureEnabledCCRangeWithPredefCharClass
protected abstract boolean featureEnabledCCRangeWithPredefCharClass()
Returns true if the lexer should try to parse ranges with predefined inner character classes, e.g. [\w-a].

featureEnabledNestedCharClasses
protected abstract boolean featureEnabledNestedCharClasses()
Returns true if nested character classes are supported. This is required for featureEnabledPOSIXCharClasses().

featureEnabledPOSIXCharClasses
protected abstract boolean featureEnabledPOSIXCharClasses()
Returns true if POSIX character classes, character equivalence classes, and the POSIX Collating Element Operator are supported. Requires featureEnabledNestedCharClasses().

getPOSIXCharClass
Returns the POSIX character class associated with the given name.

validatePOSIXCollationElement
Checks if the given string is a valid collation element.

validatePOSIXEquivalenceClass
Checks if the given string is a valid equivalence class.

featureEnabledForwardReferences
protected abstract boolean featureEnabledForwardReferences()
Returns true if forward references are allowed.

featureEnabledGroupComments
protected abstract boolean featureEnabledGroupComments()
Returns true if group comments (e.g. (# ... )) are supported.

featureEnabledLineComments
protected abstract boolean featureEnabledLineComments()
Returns true if line comments (e.g. # ...) are supported.

featureEnabledIgnoreWhiteSpace
protected abstract boolean featureEnabledIgnoreWhiteSpace()
Returns true if white space in the pattern is ignored. This is relevant only if line comments are not supported.

getWhitespace
The set of code points to consider as whitespace in comments and "ignore white space" mode.

featureEnabledOctalEscapes
protected abstract boolean featureEnabledOctalEscapes()
Returns true if octal escapes (e.g. \012) are supported.

featureEnabledSpecialGroups
protected abstract boolean featureEnabledSpecialGroups()
Returns true if any constructs that alter a capture group's function, such as non-capturing groups (?:) or look-around assertions (?=), are supported. If this flag is false, groups starting with a question mark (? have no special meaning.

featureEnabledUnicodePropertyEscapes
protected abstract boolean featureEnabledUnicodePropertyEscapes()
Returns true if Unicode property escapes (e.g. \p{...}) are supported.

featureEnabledClassSetExpressions
protected abstract boolean featureEnabledClassSetExpressions()
Returns true if class set expressions (e.g. [[\w\q{abc|xyz}]--[a-cx-z]]) are supported.

caseFoldUnfold
Updates a character set by expanding it to the set of characters that case fold to the same characters as the characters currently in the set. This is done by case folding the set and then "unfolding" it by finding all inverse case fold mappings.
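The fold-then-unfold idea can be sketched with a java.util.BitSet standing in for CodePointSetAccumulator. This is an approximation, not the dialect-specific tables the lexer uses: the hypothetical fold() below is toLowerCase(toUpperCase(cp)), which only approximates Unicode simple case folding, and the inverse mapping is found by brute force over all code points.

```java
import java.util.BitSet;

// Hypothetical sketch of "case fold, then unfold".
// Assumption: fold(cp) = toLowerCase(toUpperCase(cp)) approximates Unicode simple case folding.
final class CaseFoldUnfoldSketch {

    static int fold(int cp) {
        return Character.toLowerCase(Character.toUpperCase(cp));
    }

    /** Returns every code point whose fold equals the fold of some code point in `set`. */
    static BitSet caseFoldUnfold(BitSet set) {
        // Step 1: case fold the input set.
        BitSet folded = new BitSet();
        for (int cp = set.nextSetBit(0); cp >= 0; cp = set.nextSetBit(cp + 1)) {
            folded.set(fold(cp));
        }
        // Step 2: "unfold" by inverting the fold mapping, scanning all code points.
        BitSet result = new BitSet();
        for (int cp = 0; cp <= Character.MAX_CODE_POINT; cp++) {
            if (folded.get(fold(cp))) {
                result.set(cp);
            }
        }
        return result;
    }
}
```

Starting from {k}, the result also contains K and U+212A KELVIN SIGN, because all three fold to k. That non-obvious member is the reason an unfold step is needed instead of just adding upper- and lower-case variants.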

caseFoldClassSetAtom
Case folds an atom in a class set expression. This maps the elements of the expression to their case-folded variants.

complementClassSet
Returns the complement of a class set element. In ECMAScript, this behavior can vary with the flags.

getDotCodePointSet
Returns the code point set represented by the dot operator.
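Complement and the dot set are closely related: in ECMAScript without the dotAll (s) flag, . matches every code point except the line terminators \n, \r, U+2028 and U+2029, i.e. the complement of that set. The sketch below assumes a sorted list of inclusive [lo, hi] ranges as a stand-in for CodePointSet; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: ranges are inclusive [lo, hi] pairs, sorted and non-overlapping,
// standing in for a CodePointSet. Not the TRegex data structure.
final class RangeComplementSketch {

    /** Complements a sorted, non-overlapping range list over [0, Character.MAX_CODE_POINT]. */
    static List<int[]> complement(List<int[]> ranges) {
        List<int[]> result = new ArrayList<>();
        int next = 0; // smallest code point not yet covered by an input range
        for (int[] r : ranges) {
            if (r[0] > next) {
                result.add(new int[] {next, r[0] - 1}); // gap before this range
            }
            next = r[1] + 1;
        }
        if (next <= Character.MAX_CODE_POINT) {
            result.add(new int[] {next, Character.MAX_CODE_POINT});
        }
        return result;
    }

    /** ECMAScript-style dot without the dotAll (s) flag: everything except line terminators. */
    static List<int[]> dotWithoutDotAll() {
        List<int[]> lineTerminators = List.of(
                new int[] {0x0A, 0x0A},      // \n
                new int[] {0x0D, 0x0D},      // \r
                new int[] {0x2028, 0x2029}); // LINE SEPARATOR, PARAGRAPH SEPARATOR
        return complement(lineTerminators);
    }
}
```

A single linear pass suffices because the input is sorted; each gap between consecutive ranges becomes one output range.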

getIdStart
Returns the set of all code points a group identifier may begin with.

getIdContinue
Returns the set of all code points a group identifier may continue with.

getMaxBackReferenceDigits
protected abstract int getMaxBackReferenceDigits()
Returns the maximum number of digits to parse when parsing a back-reference.

isPredefCharClass
protected boolean isPredefCharClass(char c)
Returns true iff the given character is a predefined character class when preceded with a backslash (e.g. \d).

getPredefinedCharClass
Returns the CodePointSet associated with the given predefined character class (e.g. \d). Note that the CodePointSet returned by this function has already been case-folded and negated.
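As a rough illustration of "already negated", the sketch below treats predefined classes as predicates where an upper-case letter (\D, \W, \S) complements the lower-case one, applied once. It is hypothetical: \d and \w use the ASCII definitions ECMAScript applies without the u/v and i flags, and \s is simplified to ASCII whitespace only (ECMAScript's \s is larger). Case folding is not modelled.

```java
// Hypothetical sketch: predefined classes as predicates. \d and \w use ASCII definitions;
// \s is simplified to ASCII whitespace only. Not the TRegex implementation.
final class PredefClassSketch {

    static boolean isPredefCharClass(char c) {
        return "dDwWsS".indexOf(c) >= 0;
    }

    static boolean matches(char cls, int cp) {
        boolean positive;
        switch (Character.toLowerCase(cls)) {
            case 'd':
                positive = cp >= '0' && cp <= '9';
                break;
            case 'w':
                positive = cp == '_' || (cp >= '0' && cp <= '9')
                        || (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z');
                break;
            case 's':
                positive = " \t\n\u000B\f\r".indexOf(cp) >= 0;
                break;
            default:
                throw new IllegalArgumentException("not a predefined class: \\" + cls);
        }
        // An upper-case letter (\D, \W, \S) denotes the complement; it is applied here once.
        return Character.isUpperCase(cls) ? !positive : positive;
    }
}
```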

boundedQuantifierMaxValue
protected abstract long boundedQuantifierMaxValue()
The maximum value allowed while parsing bounded quantifiers. Larger values cause a call to handleBoundedQuantifierOverflow(long, long).

handleBoundedQuantifierOutOfOrder
Handle out-of-order bounds in bounded quantifiers, e.g. {2,1}.

handleBoundedQuantifierEmptyOrMissingMin
Handle a missing } or minimum value in bounded quantifiers.

handleBoundedQuantifierInvalidCharacter
Handle non-digit characters in bounded quantifiers.

handleBoundedQuantifierOverflow
Handle integer overflows in quantifier bounds, e.g. {2147483649}. If this method returns a non-null value, it is returned instead of the current quantifier.

handleBoundedQuantifierOverflowMin
Handle integer overflows in quantifier bounds, e.g. {2147483649}. If this method returns a non-null value, it is returned instead of the current quantifier. This method is called when no explicit max value is present.
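The bounded-quantifier hooks above correspond to distinct failure points when parsing {min,max}. The sketch below is a simplified, hypothetical parser for the text between the braces. It throws IllegalArgumentException where the real lexer would call a handle* hook (which, depending on the dialect, may instead treat the braces as literal text). Bounds, UNBOUNDED and the emptyMinAllowed/maxValue parameters are illustrative stand-ins for featureEnabledBoundedQuantifierEmptyMin() and boundedQuantifierMaxValue().

```java
// Hypothetical sketch of bounded-quantifier parsing for the text between '{' and '}'.
// The real lexer reports these cases through its handle* hooks instead of throwing.
final class BoundedQuantifierSketch {
    static final long UNBOUNDED = -1;

    record Bounds(long min, long max) {}

    /**
     * @param body            e.g. "2,5", "3,", "4" or ",1"
     * @param emptyMinAllowed mirrors featureEnabledBoundedQuantifierEmptyMin()
     * @param maxValue        mirrors boundedQuantifierMaxValue(); assumed <= Integer.MAX_VALUE
     */
    static Bounds parse(String body, boolean emptyMinAllowed, long maxValue) {
        int comma = body.indexOf(',');
        String minText = comma < 0 ? body : body.substring(0, comma);
        String maxText = comma < 0 ? null : body.substring(comma + 1);

        long min;
        if (minText.isEmpty()) {
            if (!emptyMinAllowed) {
                throw new IllegalArgumentException("missing minimum");
            }
            min = 0; // "{,1}" is treated as "{0,1}"
        } else {
            min = parseBound(minText, maxValue);
        }

        long max;
        if (maxText == null) {
            max = min;        // "{n}"
        } else if (maxText.isEmpty()) {
            max = UNBOUNDED;  // "{n,}"
        } else {
            max = parseBound(maxText, maxValue);
        }

        if (max != UNBOUNDED && max < min) {
            throw new IllegalArgumentException("numbers out of order"); // e.g. "{2,1}"
        }
        return new Bounds(min, max);
    }

    private static long parseBound(String digits, long maxValue) {
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("invalid character in quantifier");
            }
            value = value * 10 + (c - '0'); // no long overflow while value <= maxValue
            if (value > maxValue) {
                throw new IllegalArgumentException("quantifier bound too large"); // e.g. "{2147483649}"
            }
        }
        return value;
    }
}
```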

handleCCRangeOutOfOrder
Handle out-of-order character class range elements, e.g. [b-a].

handleCCRangeWithPredefCharClass
protected abstract void handleCCRangeWithPredefCharClass(int startPos, ClassSetContents firstAtom, ClassSetContents secondAtom)
Handle non-code-point character class range elements, e.g. [\w-a].

handleComplementOfStringSet
Handle the complement of class set expressions containing strings, e.g. [^\q{abc}] or \P{RGI_Emoji}.

handleGroupRedefinition

handleIncompleteEscapeX
protected abstract void handleIncompleteEscapeX()
Handle incomplete hex escapes, e.g. \x1.

handleInvalidBackReference
Handle group references to non-existent groups.

handleInvalidCharInCharClass

handleInvalidGroupBeginQ
Handle groups starting with (? followed by an invalid character.

handleMixedClassSetOperators
protected abstract RegexSyntaxException handleMixedClassSetOperators(RegexLexer.ClassSetOperator leftOperator, RegexLexer.ClassSetOperator rightOperator)
Handle class set expressions with mixed set operators in the same nested set.

handleMissingClassSetOperand
protected abstract RegexSyntaxException handleMissingClassSetOperand(RegexLexer.ClassSetOperator operator)
Handle missing operands in class set expressions, e.g. [\s&&] or [\w--].

handleOctalOutOfRange
protected abstract void handleOctalOutOfRange()
Handle octal values larger than 255.

handleRangeAsClassSetOperand
protected abstract RegexSyntaxException handleRangeAsClassSetOperand(RegexLexer.ClassSetOperator operator)
Handle character ranges as operands in class set expressions with operators other than union.

handleUnfinishedEscape
protected abstract void handleUnfinishedEscape()
Handle an unfinished escape (e.g. a trailing \).

handleUnfinishedGroupComment
protected abstract void handleUnfinishedGroupComment()
Handle an unfinished group comment (#...).

handleUnfinishedGroupQ
Handle an unfinished group with a question mark, (?.

handleUnfinishedRangeInClassSet
Handle an unfinished range in a class set expression, e.g. [a-].

handleUnmatchedRightBrace
protected abstract void handleUnmatchedRightBrace()
Handle an unmatched }.

handleUnmatchedLeftBracket
Handle an unmatched [.

handleUnmatchedRightBracket
protected abstract void handleUnmatchedRightBracket()
Handle an unmatched ].

checkClassSetCharacter
Checks whether codePoint can appear as an unescaped literal class set character.
Throws:
RegexSyntaxException

parseCodePointInGroupName
Parse the next code point in a group name and return it.
Throws:
RegexSyntaxException

parseCustomEscape
Parse any escape sequence starting with \ and the argument c.

parseCustomEscapeChar
protected abstract int parseCustomEscapeChar(char c, boolean inCharClass)
Parse an escape character sequence (inside a character class, or after other escapes have already been tried) starting with \ and the argument c.

parseCustomEscapeCharFallback
protected abstract int parseCustomEscapeCharFallback(int c, boolean inCharClass)
Parse an escape character sequence (inside a character class, or after other escapes have already been tried) starting with \ and the code point c. This method is called after all other means of parsing the escape sequence have been exhausted.

parseCustomGroupBeginQ
Parse a group starting with (?.

parseGroupLt
Parse a group starting with (<.

findChars
protected boolean findChars(char... chars)

advance
protected void advance()

retreat
protected void retreat()

hasNext
public boolean hasNext()

next
Throws:
RegexSyntaxException

getLastTokenPosition
public int getLastTokenPosition()
Returns the last token's position in the pattern string.

getLastCharacterClassBeginPosition
public int getLastCharacterClassBeginPosition()

getLastAtomPosition
protected int getLastAtomPosition()

curChar
protected char curChar()

consumeChar
protected char consumeChar()

advance
protected void advance(int len)

lookahead

lookahead

consumingLookahead
protected boolean consumingLookahead(char character)

consumingLookahead

consumingLookahead

lookbehind
protected boolean lookbehind(char c)

count

countUpTo

countFrom

count

atEnd
protected boolean atEnd()
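The cursor helpers above operate on pattern and position. The class below is a hypothetical, self-contained model of how they fit together. The names mirror the documented methods, but the semantics are assumptions, particularly lookbehind (read here as "was the last consumed character c?") and the String-based lookahead; it is not the TRegex source.

```java
import java.util.function.Predicate;

// Hypothetical sketch of cursor helpers over a pattern string, modelled on the method names
// above; the semantics (especially lookbehind) are assumptions, not the TRegex source.
final class PatternCursor {
    private final String pattern;
    private int position; // index of the next character to be parsed

    PatternCursor(String pattern) {
        this.pattern = pattern;
    }

    int position() { return position; }
    boolean atEnd() { return position >= pattern.length(); }
    char curChar() { return pattern.charAt(position); }
    char consumeChar() { return pattern.charAt(position++); }
    void advance() { advance(1); }
    void advance(int len) { position += len; }
    void retreat() { position--; }

    /** Checks for `match` at the current position without consuming it. */
    boolean lookahead(String match) {
        return pattern.startsWith(match, position);
    }

    /** Consumes `c` if it is the next character. */
    boolean consumingLookahead(char c) {
        if (!atEnd() && curChar() == c) {
            advance();
            return true;
        }
        return false;
    }

    /** Consumes `match` if the remaining input starts with it. */
    boolean consumingLookahead(String match) {
        if (lookahead(match)) {
            advance(match.length());
            return true;
        }
        return false;
    }

    /** Consumes `length` characters if every one of them satisfies `predicate`. */
    boolean consumingLookahead(Predicate<Character> predicate, int length) {
        if (position + length > pattern.length()) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (!predicate.test(pattern.charAt(position + i))) {
                return false;
            }
        }
        advance(length);
        return true;
    }

    /** Assumption: checks whether the most recently consumed character is `c`. */
    boolean lookbehind(char c) {
        return position > 0 && pattern.charAt(position - 1) == c;
    }
}
```

The "consuming" variants either advance past a full match or leave position untouched, which keeps failed probes side-effect free.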

inCharacterClass
public boolean inCharacterClass()

isCurCharClassInverted
public boolean isCurCharClassInverted()

getNumberOfParsedGroups
protected int getNumberOfParsedGroups()
Get the number of capture groups parsed so far.

totalNumberOfCaptureGroups
Throws:
RegexSyntaxException

numberOfCaptureGroupsSoFar
public int numberOfCaptureGroupsSoFar()

getNamedCaptureGroups
Throws:
RegexSyntaxException

hasNamedCaptureGroups
Checks whether this regular expression contains any named capture groups. This method is a way to check whether we are parsing the goal symbol Pattern[~U, +N] or Pattern[~U, ~N] (see the ECMAScript RegExp grammar).
Throws:
RegexSyntaxException
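Deciding between the +N and ~N goal symbols requires knowing, before the main parse, whether any named group exists. The hypothetical pre-scan below illustrates the idea. It looks for "(?<" not followed by '=' or '!' (those open lookbehind assertions), while skipping escaped characters and the contents of character classes. The class name and the scanning approach are assumptions, not the RegexLexer algorithm.

```java
// Hypothetical sketch of a pre-scan for named groups such as "(?<year>\d{4})".
// "(?<=" and "(?<!" are lookbehind assertions, not named groups. Not the TRegex algorithm.
final class NamedGroupScan {

    static boolean hasNamedCaptureGroups(String pattern) {
        boolean inClass = false;
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '\\') {
                i++; // skip the escaped character, so "\(?<x>" is not a group
                continue;
            }
            if (inClass) {
                if (c == ']') {
                    inClass = false;
                }
                continue; // "(" inside [...] is a literal
            }
            if (c == '[') {
                inClass = true;
            } else if (pattern.startsWith("(?<", i) && i + 3 < pattern.length()) {
                char next = pattern.charAt(i + 3);
                if (next != '=' && next != '!') {
                    return true;
                }
            }
        }
        return false;
    }
}
```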

registerNamedCaptureGroup

getSingleNamedGroupNumber

literalChar

parseGroupName
protected RegexLexer.ParseGroupNameResult parseGroupName(char terminator) throws RegexSyntaxException
Parse a GroupName, i.e. <RegExpIdentifierName>, assuming that the opening < bracket was already read.
Returns:
the StringValue of the RegExpIdentifierName
Throws:
RegexSyntaxException
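Group-name parsing reads code points until the terminator. The first code point is checked against the identifier-start set (getIdStart()) and the rest against the identifier-continue set (getIdContinue()). In the hypothetical sketch below, Character.isUnicodeIdentifierStart/Part plus '$' and '_' approximate those sets. ECMAScript's \u escapes and ZWNJ/ZWJ in names are omitted, and the sketch returns null instead of a ParseGroupNameResult.

```java
// Hypothetical sketch of group-name parsing; returns the name, or null when the name is empty,
// contains an invalid code point, or is not terminated. Not the RegexLexer implementation.
final class GroupNameSketch {

    /** @param pos index just after the opening '<' */
    static String parseGroupName(String pattern, int pos, char terminator) {
        StringBuilder name = new StringBuilder();
        int i = pos;
        while (i < pattern.length()) {
            int cp = pattern.codePointAt(i);
            if (cp == terminator) {
                return name.length() == 0 ? null : name.toString();
            }
            // Stand-ins for getIdStart()/getIdContinue(); '$' and '_' added as in ECMAScript.
            boolean valid = name.length() == 0
                    ? Character.isUnicodeIdentifierStart(cp) || cp == '$' || cp == '_'
                    : Character.isUnicodeIdentifierPart(cp) || cp == '$';
            if (!valid) {
                return null;
            }
            name.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return null; // missing terminator
    }
}
```

Iterating by code point rather than by char keeps supplementary-plane identifier characters intact.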

parseIntSaturated
protected int parseIntSaturated(int firstDigit, int length, int returnOnOverflow)

parseIntSaturated
protected long parseIntSaturated(int firstDigit, int length, int returnOnOverflow, long maxValue)
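The name suggests decimal parsing that returns a sentinel (returnOnOverflow) instead of wrapping around. The sketch below is a hypothetical reading of that contract. It assumes firstDigit is the value of a digit already consumed and takes the remaining digits as a string (the real method reads from the pattern). The overflow test is rearranged so that it never overflows itself.

```java
// Hypothetical sketch of saturating decimal parsing. Assumption: `firstDigit` is the value of a
// digit that was already consumed and `rest` holds the remaining digits; the real method reads
// from the pattern and its exact contract may differ.
final class SaturatedParseSketch {

    /** Assumes maxValue >= 9 and that `rest` contains only ASCII decimal digits. */
    static long parseIntSaturated(int firstDigit, String rest, long returnOnOverflow, long maxValue) {
        long value = firstDigit;
        for (int i = 0; i < rest.length(); i++) {
            int digit = rest.charAt(i) - '0';
            // value * 10 + digit > maxValue  <=>  value > (maxValue - digit) / 10, without overflowing.
            if (value > (maxValue - digit) / 10) {
                return returnOnOverflow;
            }
            value = value * 10 + digit;
        }
        return value;
    }
}
```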

countDecimalDigits
protected int countDecimalDigits()

parseCharClassAtomPredefCharClass
Throws:
RegexSyntaxException

parseCharClassAtomCodePoint
Throws:
RegexSyntaxException

parseClassSetExpression
Throws:
RegexSyntaxException

parseUnicodeCharacterProperty
protected ClassSetContents parseUnicodeCharacterProperty(boolean invert) throws RegexSyntaxException
Throws:
RegexSyntaxException

finishSurrogatePair
protected int finishSurrogatePair(char c)
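Java strings are UTF-16, so a code point above U+FFFF arrives as a high surrogate followed by a low surrogate. A plausible reading of finishSurrogatePair is sketched below: if the character just read is a high surrogate and the next one is a low surrogate, the two are combined with Character.toCodePoint. The explicit string/position parameters and the {codePoint, extraCharsConsumed} return value are assumptions for a self-contained example; the real method returns only an int.

```java
// Hypothetical sketch: combine a just-read high surrogate with the following low surrogate.
// Returns {codePoint, extraCharsConsumed}; the real method returns only the code point and
// presumably advances the lexer position itself.
final class SurrogateSketch {

    static int[] finishSurrogatePair(char c, String pattern, int position) {
        if (Character.isHighSurrogate(c) && position < pattern.length()
                && Character.isLowSurrogate(pattern.charAt(position))) {
            return new int[] {Character.toCodePoint(c, pattern.charAt(position)), 1};
        }
        return new int[] {c, 0}; // BMP character or lone surrogate: returned unchanged
    }
}
```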

parseOctal
protected int parseOctal(int firstDigit, int maxDigits)
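With maxDigits = 3, an octal escape can reach \777 = 7·64 + 7·8 + 7 = 511. That exceeds 255, which is the case handleOctalOutOfRange() is documented to handle. The sketch below is a hypothetical, self-contained version of the parsing step. It assumes firstDigit has already been consumed and that the remaining input is passed as a string. It returns {value, digitsConsumedFromRest} and leaves range checking to the caller.

```java
// Hypothetical sketch of octal-escape parsing in the spirit of parseOctal(firstDigit, maxDigits).
// Assumption: `firstDigit` was already consumed; further digits come from `rest`.
final class OctalSketch {

    /** Returns {value, digitsConsumedFromRest}; at most maxDigits digits in total are used. */
    static int[] parseOctal(int firstDigit, String rest, int maxDigits) {
        int value = firstDigit;
        int used = 0;
        while (used < rest.length() && used + 1 < maxDigits && isOctalDigit(rest.charAt(used))) {
            value = value * 8 + (rest.charAt(used) - '0');
            used++;
        }
        return new int[] {value, used};
    }

    static boolean isOctalDigit(int c) {
        return c >= '0' && c <= '7';
    }
}
```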

parseHex
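The summary gives parseHex the signature parseHex(int minDigits, int maxDigits, int maxValue, Runnable handleTooFewDigits, Runnable handleValueTooLarge), so errors are reported through caller-supplied callbacks. The sketch below mirrors that shape with two extra hypothetical parameters (the input string and start index). It also assumes -1 is returned after a callback fires. It accepts ASCII hex digits only, because Character.digit(c, 16) would also accept full-width digits and letters.

```java
// Hypothetical sketch modelled on the summary signature of parseHex. Assumptions: input comes
// from an explicit string and index, and -1 is returned after an error callback runs.
final class HexSketch {

    static int parseHex(String pattern, int start, int minDigits, int maxDigits, int maxValue,
                        Runnable handleTooFewDigits, Runnable handleValueTooLarge) {
        int value = 0;
        int i = start;
        while (i < pattern.length() && i - start < maxDigits) {
            int digit = hexValue(pattern.charAt(i));
            if (digit < 0) {
                break;
            }
            value = value * 16 + digit; // safe as long as maxValue <= Integer.MAX_VALUE / 16
            if (value > maxValue) {
                handleValueTooLarge.run();
                return -1;
            }
            i++;
        }
        if (i - start < minDigits) {
            handleTooFewDigits.run(); // e.g. "\x1" when two digits are required
            return -1;
        }
        return value;
    }

    /** ASCII-only hex digit value, or -1. */
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

Passing Runnable callbacks lets each dialect decide whether a short or oversized escape is an error or a literal, without parseHex knowing the dialect.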

syntaxError

isDecimalDigit
public static boolean isDecimalDigit(int c)

isOctalDigit
public static boolean isOctalDigit(int c)

isHexDigit
public static boolean isHexDigit(int c)

isAscii
public static boolean isAscii(int c)
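The four static predicates are not documented beyond their signatures. A plausible reading, assuming ASCII-only semantics (so, for example, full-width digits are rejected), is sketched below under the hypothetical class name DigitPredicates.

```java
// Hypothetical sketch of the static character predicates, assuming ASCII-only semantics.
final class DigitPredicates {

    static boolean isDecimalDigit(int c) {
        return c >= '0' && c <= '9';
    }

    static boolean isOctalDigit(int c) {
        return c >= '0' && c <= '7';
    }

    static boolean isHexDigit(int c) {
        return isDecimalDigit(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    static boolean isAscii(int c) {
        return c >= 0 && c < 128;
    }
}
```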