Class JavaRegexLexer
java.lang.Object
com.oracle.truffle.regex.tregex.parser.RegexLexer
com.oracle.truffle.regex.tregex.parser.flavors.java.JavaRegexLexer
-
Nested Class Summary
Nested classes/interfaces inherited from class com.oracle.truffle.regex.tregex.parser.RegexLexer
RegexLexer.ClassSetOperator, RegexLexer.ParseGroupNameResult, RegexLexer.ParseGroupNameResultState -
Field Summary
Fields
Modifier and Type, Field
static final CodePointSet HORIZONTAL_WHITE_SPACE
static final CodePointSet NOT_HORIZONTAL_WHITE_SPACE
static final CodePointSet VERTICAL_WHITE_SPACE
static final CodePointSet NOT_VERTICAL_WHITE_SPACE
Fields inherited from class com.oracle.truffle.regex.tregex.parser.RegexLexer
compilationBuffer, DEFAULT_WHITESPACE, namedCaptureGroups, pattern, position, source -
Constructor Summary
Constructors
Constructor
JavaRegexLexer(RegexSource source, JavaFlags flags, CompilationBuffer compilationBuffer)
-
Method Summary
Modifier and Type, Method: Description
protected long boundedQuantifierMaxValue(): The maximum value allowed while parsing bounded quantifiers.
protected ClassSetContents caseFoldClassSetAtom(ClassSetContents classSetContents): Case folds an atom in a class set expression.
protected void caseFoldUnfold(CodePointSetAccumulator charClass): Updates a character set by expanding it to the set of characters that case fold to the same characters as the characters currently in the set.
protected void checkClassSetCharacter(int codePoint): Checks whether codepoint can appear as an unescaped literal class set character.
protected CodePointSet complementClassSet(CodePointSet codePointSet): Returns the complement of a class set element.
protected boolean featureEnabledAZPositionAssertions(): Returns true if \A and \Z position assertions are supported.
protected boolean featureEnabledBoundedQuantifierEmptyMin(): Returns true if empty minimum values in bounded quantifiers (e.g. {,1}) are allowed and treated as zero.
protected boolean featureEnabledCCRangeWithPredefCharClass(): Try to parse ranges with pre-defined inner character classes, e.g. [\w-a].
protected boolean featureEnabledCharClassFirstBracketIsLiteral(): Returns true if the first character in a character class must be interpreted as part of the character set, even if it is the closing bracket ']'.
protected boolean featureEnabledClassSetExpressions(): Returns true if class set expressions (e.g. [[\w\q{abc|xyz}]--[a-cx-z]]) are supported.
protected boolean featureEnabledForwardReferences(): Returns true if forward references are allowed.
protected boolean featureEnabledGroupComments(): Returns true if group comments (e.g. (# ... )) are supported.
protected boolean featureEnabledIgnoreCase(): Returns true if ignore-case mode is currently enabled.
protected boolean featureEnabledIgnoreWhiteSpace(): Returns true if white space in the pattern is ignored.
protected boolean featureEnabledLineComments(): Returns true if line comments (e.g. # ...) are supported.
protected boolean featureEnabledNestedCharClasses(): Returns true if nested character classes are supported.
protected boolean featureEnabledOctalEscapes(): Returns true if octal escapes (e.g. \012) are supported.
protected boolean featureEnabledPOSIXCharClasses(): Returns true if POSIX character classes, character equivalence classes, and the POSIX Collating Element Operator are supported.
protected boolean featureEnabledPossessiveQuantifiers(): Returns true if possessive quantifiers (+ suffix) are allowed.
protected boolean featureEnabledSpecialGroups(): Returns true if any constructs that alter a capture group's function, such as non-capturing groups (?:) or look-around assertions (?=), are supported.
protected boolean featureEnabledUnicodePropertyEscapes(): Returns true if unicode property escapes (e.g. \p{...}) are supported.
protected boolean featureEnabledZLowerCaseAssertion(): Returns true if \z position assertion is supported.
protected CodePointSet getDotCodePointSet: Returns the code point set represented by the dot operator.
protected CodePointSet getIdContinue: Returns the set of all codepoints a group identifier may continue with.
protected CodePointSet getIdStart: Returns the set of all codepoints a group identifier may begin with.
protected int getMaxBackReferenceDigits(): Returns the maximum number of digits to parse when parsing a back-reference.
protected CodePointSet getPOSIXCharClass(String name): Returns the POSIX character class associated to the given name.
protected CodePointSet getPredefinedCharClass(char c): Returns the CodePointSet associated with the given predefined character class (e.g. \d).
protected TBitSet getWhitespace: The set of codepoints to consider as whitespace in comments and "ignore white space" mode.
protected Token handleBoundedQuantifierEmptyOrMissingMin: Handle missing } or minimum value in bounded quantifiers.
protected Token handleBoundedQuantifierInvalidCharacter: Handle non-digit characters in bounded quantifiers.
protected RegexSyntaxException handleBoundedQuantifierOutOfOrder: Handle {2,1}.
protected Token handleBoundedQuantifierOverflow(long min, long max): Handle integer overflows in quantifier bounds, e.g. {2147483649}.
protected Token handleBoundedQuantifierOverflowMin(long min, long max): Handle integer overflows in quantifier bounds, e.g. {2147483649}.
protected RegexSyntaxException handleCCRangeOutOfOrder(int startPos): Handle out of order character class range elements, e.g. [b-a].
protected void handleCCRangeWithPredefCharClass(int startPos, ClassSetContents firstAtom, ClassSetContents secondAtom): Handle non-codepoint character class range elements, e.g. [\w-a].
protected RegexSyntaxException handleComplementOfStringSet: Handle complement of class set expressions containing strings, e.g. [^\q{abc}] or \P{RGI_Emoji}.
protected void handleGroupRedefinition(String name, int newId, int oldId)
protected void handleIncompleteEscapeX(): Handle incomplete hex escapes, e.g. \x1.
protected Token handleInvalidBackReference(int reference): Handle group references to non-existent groups.
protected RegexSyntaxException handleInvalidCharInCharClass
protected RegexSyntaxException handleInvalidGroupBeginQ: Handle groups starting with (? and invalid next char.
protected RegexSyntaxException handleMissingClassSetOperand: Handle missing operands in class set expressions, e.g. [\s&&] or [\w--].
protected RegexSyntaxException handleMixedClassSetOperators(RegexLexer.ClassSetOperator leftOperator, RegexLexer.ClassSetOperator rightOperator): Handle class set expressions with mixed set operators in the same nested set.
protected void handleOctalOutOfRange(): Handle octal values larger than 255.
protected RegexSyntaxException handleRangeAsClassSetOperand: Handle character ranges as operands in class set expressions with operators other than union.
protected void handleUnfinishedEscape(): Handle unfinished escape (e.g. \).
protected void handleUnfinishedGroupComment(): Handle unfinished group comment (#...).
protected RegexSyntaxException handleUnfinishedGroupQ: Handle unfinished group with question mark (?.
protected RegexSyntaxException handleUnfinishedRangeInClassSet: Handle unfinished range in class set expression [a-].
protected RegexSyntaxException handleUnmatchedLeftBracket: Handle unmatched [.
protected void handleUnmatchedRightBrace(): Handle unmatched }.
protected void handleUnmatchedRightBracket(): Handle unmatched ].
protected boolean isPredefCharClass(char c): Returns true iff the given character is a predefined character class when preceded with a backslash (e.g. \d).
protected Token literalChar(int codePoint)
protected ClassSetContents parseClassSetExpression
protected int parseCodePointInGroupName: Parse the next codepoint in a group name and return it.
protected Token parseCustomEscape(char c): Parse any escape sequence starting with \ and the argument c.
protected int parseCustomEscapeChar(char c, boolean inCharClass): Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the argument c.
protected int parseCustomEscapeCharFallback(int c, boolean inCharClass): Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the code point c. This method is called after all other means of parsing the escape sequence have been exhausted.
protected Token parseCustomGroupBeginQ(char charAfterQuestionMark): Parse group starting with (?.
protected Token parseGroupLt: Parse group starting with (<.
protected ClassSetContents parseUnicodeCharacterProperty(boolean invert)
void popLocalFlags()
void pushLocalFlags()
void setCurrentFlags(JavaFlags flags)
protected void validatePOSIXCollationElement(String sequence): Checks if the given string is a valid collation element.
protected void validatePOSIXEquivalenceClass(String sequence): Checks if the given string is a valid equivalence class.
Methods inherited from class com.oracle.truffle.regex.tregex.parser.RegexLexer
advance, advance, atEnd, consumeChar, consumingLookahead, consumingLookahead, consumingLookahead, count, count, countDecimalDigits, countFrom, countUpTo, curChar, findChars, finishSurrogatePair, getCompilationBuffer, getLastAtomPosition, getLastCharacterClassBeginPosition, getLastTokenPosition, getNamedCaptureGroups, getNumberOfParsedGroups, getSingleNamedGroupNumber, hasNamedCaptureGroups, hasNext, inCharacterClass, isAscii, isCurCharClassInverted, isDecimalDigit, isHexDigit, isOctalDigit, lookahead, lookahead, lookbehind, next, numberOfCaptureGroupsSoFar, parseCharClassAtomCodePoint, parseCharClassAtomPredefCharClass, parseGroupName, parseHex, parseIntSaturated, parseIntSaturated, parseOctal, registerNamedCaptureGroup, retreat, syntaxError, totalNumberOfCaptureGroups
-
Field Details
-
HORIZONTAL_WHITE_SPACE
-
NOT_HORIZONTAL_WHITE_SPACE
-
VERTICAL_WHITE_SPACE
-
NOT_VERTICAL_WHITE_SPACE
-
-
Constructor Details
-
JavaRegexLexer
-
-
Method Details
-
literalChar
- Overrides:
literalChar in class RegexLexer
-
isPredefCharClass
protected boolean isPredefCharClass(char c)
Description copied from class: RegexLexer
Returns true iff the given character is a predefined character class when preceded with a backslash (e.g. \d).
- Overrides:
isPredefCharClass in class RegexLexer
-
pushLocalFlags
public void pushLocalFlags() -
popLocalFlags
public void popLocalFlags() -
setCurrentFlags
-
featureEnabledIgnoreCase
protected boolean featureEnabledIgnoreCase()
Description copied from class: RegexLexer
Returns true if ignore-case mode is currently enabled.
- Specified by:
featureEnabledIgnoreCase in class RegexLexer
-
featureEnabledAZPositionAssertions
protected boolean featureEnabledAZPositionAssertions()
Description copied from class: RegexLexer
Returns true if \A and \Z position assertions are supported.
- Specified by:
featureEnabledAZPositionAssertions in class RegexLexer
-
featureEnabledZLowerCaseAssertion
protected boolean featureEnabledZLowerCaseAssertion()
Description copied from class: RegexLexer
Returns true if \z position assertion is supported.
- Specified by:
featureEnabledZLowerCaseAssertion in class RegexLexer
-
featureEnabledBoundedQuantifierEmptyMin
protected boolean featureEnabledBoundedQuantifierEmptyMin()
Description copied from class: RegexLexer
Returns true if empty minimum values in bounded quantifiers (e.g. {,1}) are allowed and treated as zero.
- Specified by:
featureEnabledBoundedQuantifierEmptyMin in class RegexLexer
-
featureEnabledPossessiveQuantifiers
protected boolean featureEnabledPossessiveQuantifiers()
Description copied from class: RegexLexer
Returns true if possessive quantifiers (+ suffix) are allowed.
- Specified by:
featureEnabledPossessiveQuantifiers in class RegexLexer
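To illustrate the syntax this flag refers to, the following sketch contrasts a greedy and a possessive quantifier. It uses java.util.regex rather than JavaRegexLexer itself, on the assumption that both give the + suffix the same meaning; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it exercises java.util.regex, not JavaRegexLexer.
final class PossessiveQuantifierSketch {
    // Greedy a* can give back one 'a' so that the trailing 'a' still matches.
    static boolean greedyMatches(String input) {
        return Pattern.matches("a*a", input);
    }

    // Possessive a*+ never gives back what it consumed, so the trailing 'a' cannot match.
    static boolean possessiveMatches(String input) {
        return Pattern.matches("a*+a", input);
    }

    public static void main(String[] args) {
        System.out.println("greedy: " + greedyMatches("aaa"));
        System.out.println("possessive: " + possessiveMatches("aaa"));
    }
}
```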
-
featureEnabledCharClassFirstBracketIsLiteral
protected boolean featureEnabledCharClassFirstBracketIsLiteral()
Description copied from class: RegexLexer
Returns true if the first character in a character class must be interpreted as part of the character set, even if it is the closing bracket ']'.
- Specified by:
featureEnabledCharClassFirstBracketIsLiteral in class RegexLexer
-
featureEnabledCCRangeWithPredefCharClass
protected boolean featureEnabledCCRangeWithPredefCharClass()
Description copied from class: RegexLexer
Try to parse ranges with pre-defined inner character classes, e.g. [\w-a].
- Specified by:
featureEnabledCCRangeWithPredefCharClass in class RegexLexer
-
featureEnabledNestedCharClasses
protected boolean featureEnabledNestedCharClasses()
Description copied from class: RegexLexer
Returns true if nested character classes are supported. This is required for RegexLexer.featureEnabledPOSIXCharClasses().
- Specified by:
featureEnabledNestedCharClasses in class RegexLexer
-
featureEnabledPOSIXCharClasses
protected boolean featureEnabledPOSIXCharClasses()
Description copied from class: RegexLexer
Returns true if POSIX character classes, character equivalence classes, and the POSIX Collating Element Operator are supported. Requires RegexLexer.featureEnabledNestedCharClasses().
- Specified by:
featureEnabledPOSIXCharClasses in class RegexLexer
-
getPOSIXCharClass
Description copied from class: RegexLexer
Returns the POSIX character class associated to the given name.
- Specified by:
getPOSIXCharClass in class RegexLexer
-
validatePOSIXCollationElement
Description copied from class: RegexLexer
Checks if the given string is a valid collation element.
- Specified by:
validatePOSIXCollationElement in class RegexLexer
-
validatePOSIXEquivalenceClass
Description copied from class: RegexLexer
Checks if the given string is a valid equivalence class.
- Specified by:
validatePOSIXEquivalenceClass in class RegexLexer
-
featureEnabledForwardReferences
protected boolean featureEnabledForwardReferences()
Description copied from class: RegexLexer
Returns true if forward references are allowed.
- Specified by:
featureEnabledForwardReferences in class RegexLexer
-
featureEnabledGroupComments
protected boolean featureEnabledGroupComments()
Description copied from class: RegexLexer
Returns true if group comments (e.g. (# ... )) are supported.
- Specified by:
featureEnabledGroupComments in class RegexLexer
-
featureEnabledLineComments
protected boolean featureEnabledLineComments()
Description copied from class: RegexLexer
Returns true if line comments (e.g. # ...) are supported.
- Specified by:
featureEnabledLineComments in class RegexLexer
-
featureEnabledIgnoreWhiteSpace
protected boolean featureEnabledIgnoreWhiteSpace()
Description copied from class: RegexLexer
Returns true if white space in the pattern is ignored. This is relevant only if line comments are not supported.
- Specified by:
featureEnabledIgnoreWhiteSpace in class RegexLexer
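As a rough illustration of what ignoring white space and line comments means for a pattern, the sketch below uses the Pattern.COMMENTS flag of java.util.regex. Whether JavaRegexLexer ties these two features to that flag is an assumption here; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it exercises java.util.regex, not JavaRegexLexer.
final class CommentsModeSketch {
    // With Pattern.COMMENTS, white space and "# ..." up to the end of the line are skipped.
    static boolean matchesInCommentsMode(String regex, String input) {
        return Pattern.compile(regex, Pattern.COMMENTS).matcher(input).matches();
    }

    // Without the flag, spaces and '#' are ordinary literal characters.
    static boolean matchesInDefaultMode(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "a b # trailing note";
        System.out.println("comments mode: " + matchesInCommentsMode(regex, "ab"));
        System.out.println("default mode: " + matchesInDefaultMode(regex, "ab"));
    }
}
```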
-
getWhitespace
Description copied from class: RegexLexer
The set of codepoints to consider as whitespace in comments and "ignore white space" mode.
- Specified by:
getWhitespace in class RegexLexer
-
featureEnabledOctalEscapes
protected boolean featureEnabledOctalEscapes()
Description copied from class: RegexLexer
Returns true if octal escapes (e.g. \012) are supported.
- Specified by:
featureEnabledOctalEscapes in class RegexLexer
-
featureEnabledSpecialGroups
protected boolean featureEnabledSpecialGroups()
Description copied from class: RegexLexer
Returns true if any constructs that alter a capture group's function, such as non-capturing groups (?:) or look-around assertions (?=), are supported. If this flag is false, groups starting with a question mark (? do not have any special meaning.
- Specified by:
featureEnabledSpecialGroups in class RegexLexer
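The two constructs named above can be seen in a short java.util.regex sketch. It does not call JavaRegexLexer and assumes the lexer treats (?:) and (?=) the way java.util.regex does; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it exercises java.util.regex, not JavaRegexLexer.
final class SpecialGroupsSketch {
    // Counts capture groups; a (?:...) group does not contribute to the count.
    static int captureGroupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // Returns the first match, or null; (?=...) checks the following text without consuming it.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(captureGroupCount("(?:ab)+(c)"));
        System.out.println(firstMatch("foo(?=bar)", "foobar"));
    }
}
```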
-
featureEnabledUnicodePropertyEscapes
protected boolean featureEnabledUnicodePropertyEscapes()
Description copied from class: RegexLexer
Returns true if unicode property escapes (e.g. \p{...}) are supported.
- Specified by:
featureEnabledUnicodePropertyEscapes in class RegexLexer
-
featureEnabledClassSetExpressions
protected boolean featureEnabledClassSetExpressions()
Description copied from class: RegexLexer
Returns true if class set expressions (e.g. [[\w\q{abc|xyz}]--[a-cx-z]]) are supported.
- Specified by:
featureEnabledClassSetExpressions in class RegexLexer
-
caseFoldUnfold
Description copied from class: RegexLexer
Updates a character set by expanding it to the set of characters that case fold to the same characters as the characters currently in the set. This is done by case folding the set and then "unfolding" it by finding all inverse case fold mappings.
- Specified by:
caseFoldUnfold in class RegexLexer
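The fold-then-unfold idea described above can be sketched on a plain set of code points. This is not the TRegex implementation: it assumes case folding can be approximated by Character.toUpperCase followed by Character.toLowerCase, and it finds inverse mappings by scanning a bounded code point range. The class, methods, and the maxCodePoint parameter are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of "fold, then unfold" on a set of code points.
final class CaseFoldUnfoldSketch {
    // Assumption: a simple stand-in for case folding (upper-case, then lower-case).
    static int fold(int codePoint) {
        return Character.toLowerCase(Character.toUpperCase(codePoint));
    }

    // Expands the input to every code point in [0, maxCodePoint] that folds to
    // the same value as some member of the input.
    static Set<Integer> foldUnfold(Set<Integer> input, int maxCodePoint) {
        Set<Integer> folded = new TreeSet<>();
        for (int cp : input) {
            folded.add(fold(cp)); // step 1: case fold the set
        }
        Set<Integer> result = new TreeSet<>();
        for (int cp = 0; cp <= maxCodePoint; cp++) {
            if (folded.contains(fold(cp))) {
                result.add(cp); // step 2: collect all inverse mappings
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Restricting the scan to ASCII keeps the example small.
        System.out.println(foldUnfold(Set.of((int) 'k'), 0x7F));
    }
}
```

A real implementation would rely on precomputed case folding tables instead of scanning a range; the scan is used here only to keep the sketch self-contained.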
-
caseFoldClassSetAtom
Description copied from class: RegexLexer
Case folds an atom in a class set expression. This maps the elements of the expression into their case folded variant.
- Specified by:
caseFoldClassSetAtom in class RegexLexer
-
complementClassSet
Description copied from class: RegexLexer
Returns the complement of a class set element. In ECMAScript, this behavior can vary with the flags.
- Specified by:
complementClassSet in class RegexLexer
-
getDotCodePointSet
Description copied from class: RegexLexer
Returns the code point set represented by the dot operator.
- Specified by:
getDotCodePointSet in class RegexLexer
-
getIdStart
Description copied from class: RegexLexer
Returns the set of all codepoints a group identifier may begin with.
- Specified by:
getIdStart in class RegexLexer
-
getIdContinue
Description copied from class: RegexLexer
Returns the set of all codepoints a group identifier may continue with.
- Specified by:
getIdContinue in class RegexLexer
-
getMaxBackReferenceDigits
protected int getMaxBackReferenceDigits()
Description copied from class: RegexLexer
Returns the maximum number of digits to parse when parsing a back-reference.
- Specified by:
getMaxBackReferenceDigits in class RegexLexer
-
getPredefinedCharClass
Description copied from class: RegexLexer
Returns the CodePointSet associated with the given predefined character class (e.g. \d).
Note that the CodePointSet returned by this function has already been case-folded and negated.
- Specified by:
getPredefinedCharClass in class RegexLexer
-
boundedQuantifierMaxValue
protected long boundedQuantifierMaxValue()
Description copied from class: RegexLexer
The maximum value allowed while parsing bounded quantifiers. Larger values will cause a call to RegexLexer.handleBoundedQuantifierOverflow(long, long).
- Specified by:
boundedQuantifierMaxValue in class RegexLexer
-
handleBoundedQuantifierOutOfOrder
Description copied from class: RegexLexer
Handle {2,1}.
- Specified by:
handleBoundedQuantifierOutOfOrder in class RegexLexer
-
handleBoundedQuantifierEmptyOrMissingMin
Description copied from class: RegexLexer
Handle missing } or minimum value in bounded quantifiers.
- Specified by:
handleBoundedQuantifierEmptyOrMissingMin in class RegexLexer
-
handleBoundedQuantifierInvalidCharacter
Description copied from class: RegexLexer
Handle non-digit characters in bounded quantifiers.
- Specified by:
handleBoundedQuantifierInvalidCharacter in class RegexLexer
-
handleBoundedQuantifierOverflow
Description copied from class: RegexLexer
Handle integer overflows in quantifier bounds, e.g. {2147483649}. If this method returns a non-null value, it will be returned instead of the current quantifier.
- Specified by:
handleBoundedQuantifierOverflow in class RegexLexer
-
handleBoundedQuantifierOverflowMin
Description copied from class: RegexLexer
Handle integer overflows in quantifier bounds, e.g. {2147483649}. If this method returns a non-null value, it will be returned instead of the current quantifier. This method is called when no explicit max value is present.
- Specified by:
handleBoundedQuantifierOverflowMin in class RegexLexer
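One way a lexer can notice that a quantifier bound such as {2147483649} exceeds its limit is to check the running value after every digit. The sketch below is only an illustration of that idea: the limit Integer.MAX_VALUE, the -1 overflow marker, and all names are assumptions, not values taken from JavaRegexLexer.

```java
// Hypothetical sketch of overflow detection while parsing a quantifier bound.
final class QuantifierBoundSketch {
    // Assumed limit; the real one is whatever boundedQuantifierMaxValue() returns.
    static final long MAX_VALUE = Integer.MAX_VALUE;

    // Assumed marker meaning "the bound exceeded MAX_VALUE".
    static final long OVERFLOW = -1;

    // Parses a run of decimal digits and returns OVERFLOW once the limit is exceeded.
    static long parseBound(String digits) {
        long value = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("not a decimal digit: " + c);
            }
            value = value * 10 + (c - '0');
            if (value > MAX_VALUE) {
                return OVERFLOW; // stop early, so the long accumulator itself never overflows
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parseBound("12"));
        System.out.println(parseBound("2147483649"));
    }
}
```

Checking after each digit keeps the accumulator well inside the long range, so arbitrarily long digit runs are handled without wrapping around.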
-
handleCCRangeOutOfOrder
Description copied from class: RegexLexer
Handle out of order character class range elements, e.g. [b-a].
- Specified by:
handleCCRangeOutOfOrder in class RegexLexer
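For comparison, java.util.regex rejects an out of order range at compile time, as the sketch below shows. It does not call JavaRegexLexer, and it is an assumption that this handler reports the same kind of error; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; it exercises java.util.regex, not JavaRegexLexer.
final class RangeOrderSketch {
    // Returns true if java.util.regex refuses to compile the given pattern.
    static boolean isRejected(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("[a-b] rejected: " + isRejected("[a-b]"));
        System.out.println("[b-a] rejected: " + isRejected("[b-a]"));
    }
}
```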
-
handleCCRangeWithPredefCharClass
protected void handleCCRangeWithPredefCharClass(int startPos, ClassSetContents firstAtom, ClassSetContents secondAtom)
Description copied from class: RegexLexer
Handle non-codepoint character class range elements, e.g. [\w-a].
- Specified by:
handleCCRangeWithPredefCharClass in class RegexLexer
-
handleComplementOfStringSet
Description copied from class: RegexLexer
Handle complement of class set expressions containing strings, e.g. [^\q{abc}] or \P{RGI_Emoji}.
- Specified by:
handleComplementOfStringSet in class RegexLexer
-
handleGroupRedefinition
- Specified by:
handleGroupRedefinition in class RegexLexer
-
handleIncompleteEscapeX
protected void handleIncompleteEscapeX()
Description copied from class: RegexLexer
Handle incomplete hex escapes, e.g. \x1.
- Specified by:
handleIncompleteEscapeX in class RegexLexer
-
handleInvalidBackReference
Description copied from class: RegexLexer
Handle group references to non-existent groups.
- Specified by:
handleInvalidBackReference in class RegexLexer
-
handleInvalidCharInCharClass
- Specified by:
handleInvalidCharInCharClass in class RegexLexer
-
handleInvalidGroupBeginQ
Description copied from class: RegexLexer
Handle groups starting with (? and invalid next char.
- Specified by:
handleInvalidGroupBeginQ in class RegexLexer
-
handleMixedClassSetOperators
protected RegexSyntaxException handleMixedClassSetOperators(RegexLexer.ClassSetOperator leftOperator, RegexLexer.ClassSetOperator rightOperator)
Description copied from class: RegexLexer
Handle class set expressions with mixed set operators in the same nested set.
- Specified by:
handleMixedClassSetOperators in class RegexLexer
-
handleMissingClassSetOperand
Description copied from class: RegexLexer
Handle missing operands in class set expressions, e.g. [\s&&] or [\w--].
- Specified by:
handleMissingClassSetOperand in class RegexLexer
-
handleOctalOutOfRange
protected void handleOctalOutOfRange()
Description copied from class: RegexLexer
Handle octal values larger than 255.
- Specified by:
handleOctalOutOfRange in class RegexLexer
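The 255 limit mentioned above corresponds to octal 377, so three octal digits can already exceed it (400 in octal is 256). The following sketch shows the range check in isolation; it is a hypothetical helper that does not reproduce how JavaRegexLexer reads the digits.

```java
// Hypothetical sketch of the range check behind "octal values larger than 255".
final class OctalRangeSketch {
    // Limit taken from the description above.
    static final int MAX_OCTAL_VALUE = 255;

    // Converts a run of octal digits to its numeric value.
    static int parseOctalDigits(String digits) {
        int value = 0;
        for (int i = 0; i < digits.length(); i++) {
            char c = digits.charAt(i);
            if (c < '0' || c > '7') {
                throw new IllegalArgumentException("not an octal digit: " + c);
            }
            value = value * 8 + (c - '0');
        }
        return value;
    }

    // True when the escape would denote a value above the limit.
    static boolean isOutOfRange(String digits) {
        return parseOctalDigits(digits) > MAX_OCTAL_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(parseOctalDigits("012"));
        System.out.println(isOutOfRange("377"));
        System.out.println(isOutOfRange("400"));
    }
}
```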
-
handleRangeAsClassSetOperand
Description copied from class: RegexLexer
Handle character ranges as operands in class set expressions with operators other than union.
- Specified by:
handleRangeAsClassSetOperand in class RegexLexer
-
handleUnfinishedEscape
protected void handleUnfinishedEscape()
Description copied from class: RegexLexer
Handle unfinished escape (e.g. \).
- Specified by:
handleUnfinishedEscape in class RegexLexer
-
handleUnfinishedGroupComment
protected void handleUnfinishedGroupComment()
Description copied from class: RegexLexer
Handle unfinished group comment (#...).
- Specified by:
handleUnfinishedGroupComment in class RegexLexer
-
handleUnfinishedGroupQ
Description copied from class: RegexLexer
Handle unfinished group with question mark (?.
- Specified by:
handleUnfinishedGroupQ in class RegexLexer
-
handleUnfinishedRangeInClassSet
Description copied from class: RegexLexer
Handle unfinished range in class set expression [a-].
- Specified by:
handleUnfinishedRangeInClassSet in class RegexLexer
-
handleUnmatchedRightBrace
protected void handleUnmatchedRightBrace()
Description copied from class: RegexLexer
Handle unmatched }.
- Specified by:
handleUnmatchedRightBrace in class RegexLexer
-
handleUnmatchedLeftBracket
Description copied from class: RegexLexer
Handle unmatched [.
- Specified by:
handleUnmatchedLeftBracket in class RegexLexer
-
handleUnmatchedRightBracket
protected void handleUnmatchedRightBracket()
Description copied from class: RegexLexer
Handle unmatched ].
- Specified by:
handleUnmatchedRightBracket in class RegexLexer
-
checkClassSetCharacter
Description copied from class: RegexLexer
Checks whether codepoint can appear as an unescaped literal class set character.
- Specified by:
checkClassSetCharacter in class RegexLexer
- Throws:
RegexSyntaxException
-
parseCodePointInGroupName
Description copied from class: RegexLexer
Parse the next codepoint in a group name and return it.
- Specified by:
parseCodePointInGroupName in class RegexLexer
- Throws:
RegexSyntaxException
-
parseClassSetExpression
- Overrides:
parseClassSetExpression in class RegexLexer
- Throws:
RegexSyntaxException
-
parseUnicodeCharacterProperty
protected ClassSetContents parseUnicodeCharacterProperty(boolean invert) throws RegexSyntaxException
- Overrides:
parseUnicodeCharacterProperty in class RegexLexer
- Throws:
RegexSyntaxException
-
parseCustomEscape
Description copied from class: RegexLexer
Parse any escape sequence starting with \ and the argument c.
- Specified by:
parseCustomEscape in class RegexLexer
-
parseCustomEscapeChar
protected int parseCustomEscapeChar(char c, boolean inCharClass)
Description copied from class: RegexLexer
Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the argument c.
- Specified by:
parseCustomEscapeChar in class RegexLexer
-
parseCustomEscapeCharFallback
protected int parseCustomEscapeCharFallback(int c, boolean inCharClass)
Description copied from class: RegexLexer
Parse an escape character sequence (inside character class, or other escapes have already been tried) starting with \ and the code point c. This method is called after all other means of parsing the escape sequence have been exhausted.
- Specified by:
parseCustomEscapeCharFallback in class RegexLexer
-
parseCustomGroupBeginQ
Description copied from class: RegexLexer
Parse group starting with (?.
- Specified by:
parseCustomGroupBeginQ in class RegexLexer
-
parseGroupLt
Description copied from class: RegexLexer
Parse group starting with (<.
- Specified by:
parseGroupLt in class RegexLexer
-