Package org.lionsoul.jcseg.util
Class StringUtil
- java.lang.Object
-
- org.lionsoul.jcseg.util.StringUtil
-
public class StringUtil extends Object
A utility class for handling English stop characters, such as English punctuation.
- Author:
- chenxin
-
-
Field Summary
Fields (Modifier and Type, Field)
- static int EN_LETTER
- static int EN_NUMERIC
- static int EN_PUNCTUATION
- static int EN_UNKNOW
- static int EN_WHITESPACE
-
Constructor Summary
Constructors
- StringUtil()
-
Method Summary
Methods (Modifier and Type, Method, Description)
- static int CJKIndexOf(String str)
- static int CJKIndexOf(String str, int offset): gets the index of the first CJK character in the specified string
- static String fwsTohws(String str): replaces the full-width characters in a given string with half-width characters (full-width characters are 65281-65374)
- static int getEnCharType(int u): gets the type of the English character; the types are the constants defined in this class whose names start with EN_
- static char getPunctuationPair(char c): gets the matching counterpart of a pair punctuation character
- static String hwsTofws(String str): replaces the half-width characters in a given string with full-width characters
- static boolean isCJK(String str)
- static boolean isCJK(String str, int beginIndex, int endIndex): checks whether the specified string consists entirely of CJK characters
- static boolean isCJKChar(int c): checks whether the specified character is CJK, Thai...
- static boolean isCnPunctuation(int c)
- static boolean isDecimal(String str)
- static boolean isDecimal(String str, int beginIndex, int endIndex): checks whether the specified string is a decimal number, including full-width characters
- static boolean isDigit(String str)
- static boolean isDigit(String str, int beginIndex, int endIndex): checks whether the specified string consists of digits; returns true if it does, false otherwise; also recognizes full-width characters
- static boolean isEnChar(int c): checks whether the specified character is a basic Latin, Russian, or Greek letter
- static boolean isENKeepPunctuaton(char c): checks whether the given character is an English keep punctuation
- static boolean isEnLetter(int u): includes both full-width and half-width characters
- static boolean isEnNumeric(int u): checks whether the specified character is an English numeric character (48-57), including the full-width forms
- static boolean isEnPunctuation(int c): checks whether the given character is half-width punctuation
- static boolean isFWEnChar(int c): checks whether the given character is a full-width character. Note: full-width punctuation is not included here
- static boolean isHWEnChar(int c): checks whether the given character is a half-width character
- static boolean isLatin(String str)
- static boolean isLatin(String str, int beginIndex, int endIndex): checks whether the specified string consists entirely of Latin characters
- static boolean isLetter(String str)
- static boolean isLetter(String str, int beginIndex, int endIndex): checks whether the specified string consists of Latin letters
- static boolean isLetterNumber(int c): checks whether the specified character is a letter number such as 'Ⅰ' or 'Ⅱ'; returns true if it is, false otherwise
- static boolean isLetterOrNumeric(String str)
- static boolean isLetterOrNumeric(String str, int beginIndex, int endIndex): checks whether the specified string consists of Latin numeric characters or letters
- static boolean isLowerCaseLetter(int u)
- static boolean isNoTailingPunctuation(char c): checks whether the given punctuation is one that needs to be cleared
- static boolean isNumeric(String str)
- static boolean isNumeric(String str, int beginIndex, int endIndex): checks whether the specified string is Latin numeric
- static boolean isOtherNumber(int c): checks whether the specified character is an other number such as '①', '⑩', '⑽', or '㈩'; returns true if it is, false otherwise
- static boolean isPairPunctuation(char c): checks whether the given character is a pair punctuation
- static boolean isPunctuation(int c): checks whether the given character is a punctuation character
- static boolean isPunctuation(String str)
- static boolean isPunctuation(String str, int beginIndex, int endIndex): checks whether the specified string consists entirely of punctuation characters (English and Chinese punctuation)
- static boolean isUpperCaseLetter(int u)
- static boolean isWhitespace(int c): checks whether the given character is a whitespace character
- static int latinIndexOf(String str)
- static int latinIndexOf(String str, int offset): gets the index of the first Latin character in the specified string
- static int toLowerCase(int u)
- static int toUpperCase(int u)
-
-
-
Field Detail
-
EN_LETTER
public static final int EN_LETTER
- See Also:
- Constant Field Values
-
EN_NUMERIC
public static final int EN_NUMERIC
- See Also:
- Constant Field Values
-
EN_PUNCTUATION
public static final int EN_PUNCTUATION
- See Also:
- Constant Field Values
-
EN_WHITESPACE
public static final int EN_WHITESPACE
- See Also:
- Constant Field Values
-
EN_UNKNOW
public static final int EN_UNKNOW
- See Also:
- Constant Field Values
-
-
Method Detail
-
isCJKChar
public static boolean isCJKChar(int c)
Checks whether the specified character is a CJK, Thai... character. Returns true if it is, false otherwise.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isEnChar
public static boolean isEnChar(int c)
Checks whether the specified character is a basic Latin, Russian, or Greek letter. Returns true if it is, false otherwise. This method also recognizes full-width characters and letters.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isLetterNumber
public static boolean isLetterNumber(int c)
Checks whether the specified character is a letter number such as 'Ⅰ' or 'Ⅱ'. Returns true if it is, false otherwise.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isOtherNumber
public static boolean isOtherNumber(int c)
Checks whether the specified character is an other number such as '①', '⑩', '⑽', or '㈩'. Returns true if it is, false otherwise.
- Parameters:
- c - the character to check
- Returns:
- boolean
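The terms "letter number" and "other number" match the names of two Unicode general categories (Nl and No). As a rough sketch of what such checks could look like, assuming the methods follow those categories (the source does not confirm this), the standard `java.lang.Character` API can be used. The class name `NumberCategorySketch` is hypothetical and is not part of jcseg.

```java
// Illustrative sketch only: relies on java.lang.Character, not on the jcseg implementation.
final class NumberCategorySketch {

    /** True for code points in Unicode category Nl (letter number), such as U+2160. */
    static boolean isLetterNumber(int c) {
        return Character.getType(c) == Character.LETTER_NUMBER;
    }

    /** True for code points in Unicode category No (other number), such as U+2460. */
    static boolean isOtherNumber(int c) {
        return Character.getType(c) == Character.OTHER_NUMBER;
    }

    public static void main(String[] args) {
        // U+2160 Ⅰ, U+2161 Ⅱ, U+2460 ①, U+247D ⑽, U+3229 ㈩, plus two plain ASCII characters
        int[] samples = {'\u2160', '\u2161', '\u2460', '\u247D', '\u3229', '5', 'A'};
        for (int c : samples) {
            System.out.printf("U+%04X letterNumber=%b otherNumber=%b%n",
                    c, isLetterNumber(c), isOtherNumber(c));
        }
    }
}
```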
-
isENKeepPunctuaton
public static boolean isENKeepPunctuaton(char c)
Checks whether the given character is an English keep punctuation.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isNoTailingPunctuation
public static boolean isNoTailingPunctuation(char c)
Checks whether the given punctuation is one that needs to be cleared.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isUpperCaseLetter
public static boolean isUpperCaseLetter(int u)
-
isLowerCaseLetter
public static boolean isLowerCaseLetter(int u)
-
toLowerCase
public static int toLowerCase(int u)
-
toUpperCase
public static int toUpperCase(int u)
-
isEnLetter
public static boolean isEnLetter(int u)
Checks whether the given character is an English letter, including both full-width and half-width characters.
- Parameters:
- u - the character to check
- Returns:
- boolean
-
isEnNumeric
public static boolean isEnNumeric(int u)
Checks whether the specified character is an English numeric character (48-57), including the full-width forms.
- Parameters:
- u - the character to check
- Returns:
- boolean
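A minimal sketch of a digit check that covers both widths follows. The half-width range 48-57 comes from the description above; the full-width range 65296-65305 (U+FF10 to U+FF19) is taken from the Unicode code chart and is an assumption about what the library accepts. The class name `EnNumericSketch` is hypothetical.

```java
// Illustrative sketch only, not the jcseg implementation.
final class EnNumericSketch {

    /** True for ASCII digits (48-57) and, by assumption, full-width digits (65296-65305). */
    static boolean isEnNumeric(int u) {
        boolean halfWidth = u >= 48 && u <= 57;
        boolean fullWidth = u >= 65296 && u <= 65305;
        return halfWidth || fullWidth;
    }

    public static void main(String[] args) {
        System.out.println(isEnNumeric('7'));      // half-width digit
        System.out.println(isEnNumeric('\uFF17')); // full-width digit seven
        System.out.println(isEnNumeric('a'));      // letter
    }
}
```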
-
getEnCharType
public static int getEnCharType(int u)
Gets the type of the given English character. The types are the constants defined in this class whose names start with EN_. Only half-width characters are handled.
- Parameters:
- u - the character to identify
- Returns:
- int: the type constant
-
isHWEnChar
public static boolean isHWEnChar(int c)
Checks whether the given character is a half-width character. The code ranges are:
- 32 -> whitespace
- 33-47 -> punctuation
- 48-57 -> 0-9
- 58-64 -> punctuation
- 65-90 -> A-Z
- 91-96 -> punctuation
- 97-122 -> a-z
- 123-126 -> punctuation
- Parameters:
- c - the character to check
- Returns:
- boolean
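The ranges listed above can be turned into a small classifier. The sketch below is only an illustration of that table: it uses a hypothetical `Kind` enum in place of the `EN_` constants, whose actual values are not shown on this page, and the class name `HalfWidthSketch` is likewise an assumption.

```java
// Illustrative sketch of the half-width range table, not the jcseg implementation.
final class HalfWidthSketch {

    /** Hypothetical stand-in for the EN_ type constants. */
    enum Kind { WHITESPACE, PUNCTUATION, NUMERIC, LETTER, UNKNOWN }

    /** True for the printable half-width range 32-126 covered by the table. */
    static boolean isHalfWidth(int c) {
        return c >= 32 && c <= 126;
    }

    /** Maps a code point to its kind following the documented ranges. */
    static Kind classify(int c) {
        if (c == 32) return Kind.WHITESPACE;
        if (c >= 48 && c <= 57) return Kind.NUMERIC;
        if ((c >= 65 && c <= 90) || (c >= 97 && c <= 122)) return Kind.LETTER;
        // What remains of 33-126 is 33-47, 58-64, 91-96 and 123-126.
        if (c >= 33 && c <= 126) return Kind.PUNCTUATION;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        for (char ch : "a Z9~@".toCharArray()) {
            System.out.println("'" + ch + "' -> " + classify(ch));
        }
    }
}
```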
-
isFWEnChar
public static boolean isFWEnChar(int c)
Checks whether the given character is a full-width character. Note: full-width punctuation is not included here.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isEnPunctuation
public static boolean isEnPunctuation(int c)
Checks whether the given character is half-width punctuation.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isCnPunctuation
public static boolean isCnPunctuation(int c)
-
isPunctuation
public static boolean isPunctuation(int c)
Checks whether the given character is a punctuation character.
-
isWhitespace
public static boolean isWhitespace(int c)
Checks whether the given character is a whitespace character.
- Parameters:
- c - the character to check
- Returns:
- boolean
-
isDigit
public static boolean isDigit(String str, int beginIndex, int endIndex)
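Checks whether the specified range of the string consists of digits. Returns true if it does, false otherwise. This method also recognizes full-width characters.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean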
-
isDigit
public static boolean isDigit(String str)
-
isDecimal
public static boolean isDecimal(String str, int beginIndex, int endIndex)
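Checks whether the specified range of the string is a decimal number, including full-width characters.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean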
-
isDecimal
public static boolean isDecimal(String str)
-
isLatin
public static boolean isLatin(String str, int beginIndex, int endIndex)
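Checks whether the specified range of the string consists entirely of Latin characters.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean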
-
isLatin
public static boolean isLatin(String str)
-
isCJK
public static boolean isCJK(String str, int beginIndex, int endIndex)
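Checks whether the specified range of the string consists entirely of CJK characters.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean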
-
isCJK
public static boolean isCJK(String str)
-
isLetterOrNumeric
public static boolean isLetterOrNumeric(String str, int beginIndex, int endIndex)
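Checks whether the specified range of the string consists of Latin numeric characters or letters.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean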
-
isLetterOrNumeric
public static boolean isLetterOrNumeric(String str)
-
isLetter
public static boolean isLetter(String str, int beginIndex, int endIndex)
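Checks whether the specified range of the string consists of Latin letters.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean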
-
isLetter
public static boolean isLetter(String str)
-
isNumeric
public static boolean isNumeric(String str, int beginIndex, int endIndex)
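Checks whether the specified range of the string is Latin numeric.
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean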
-
isNumeric
public static boolean isNumeric(String str)
-
latinIndexOf
public static int latinIndexOf(String str, int offset)
Gets the index of the first Latin character in the specified string.
- Parameters:
- str - the string to search
- offset - the index to start searching from
- Returns:
- integer
-
latinIndexOf
public static int latinIndexOf(String str)
-
CJKIndexOf
public static int CJKIndexOf(String str, int offset)
Gets the index of the first CJK character in the specified string.
- Parameters:
- str - the string to search
- offset - the index to start searching from
- Returns:
- integer
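To illustrate the idea of searching for the first CJK character, here is a hedged sketch built on `Character.UnicodeScript` from the JDK. Several points are assumptions not stated on this page: which scripts count as CJK (the sketch uses Han, Hiragana, Katakana and Hangul), that `offset` is the position where the search starts, and that -1 is returned when nothing is found (mirroring `String.indexOf`). The class name `CjkIndexSketch` is hypothetical.

```java
// Illustrative sketch only; the real CJKIndexOf may use different ranges and conventions.
final class CjkIndexSketch {

    /** Assumed definition of "CJK": Han, Hiragana, Katakana or Hangul script. */
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    /** Returns the index of the first CJK character at or after offset, or -1 (assumed) if none. */
    static int cjkIndexOf(String str, int offset) {
        int i = Math.max(offset, 0);
        while (i < str.length()) {
            int cp = str.codePointAt(i);
            if (isCjk(cp)) {
                return i;
            }
            i += Character.charCount(cp); // step over surrogate pairs as one character
        }
        return -1;
    }

    public static void main(String[] args) {
        // "abc" followed by U+4E2D U+6587
        System.out.println(cjkIndexOf("abc\u4E2D\u6587", 0));
    }
}
```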
-
CJKIndexOf
public static int CJKIndexOf(String str)
-
fwsTohws
public static String fwsTohws(String str)
Replaces the full-width characters in the given string with their half-width equivalents (full-width characters are 65281-65374).
- Parameters:
- str - the string to convert
- Returns:
- String: the new string after the replacement
-
hwsTofws
public static String hwsTofws(String str)
Replaces the half-width characters in the given string with their full-width equivalents.
- Parameters:
- str - the string to convert
- Returns:
- String: the new string after the replacement
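The two conversions can be sketched with a fixed offset. The full-width range 65281-65374 is given in the fwsTohws description; it lines up with half-width 33-126, so the difference is 65248. That the reverse direction uses the same range, and that the space character is left untouched, are assumptions of this sketch. The names `WidthSketch`, `toHalfWidth` and `toFullWidth` are hypothetical.

```java
// Illustrative sketch only, not the jcseg implementation.
final class WidthSketch {
    static final int FW_START = 65281; // U+FF01, full-width '!'
    static final int FW_END = 65374;   // U+FF5E, full-width '~'
    static final int OFFSET = 65248;   // FW_START minus 33 (half-width '!')

    /** Maps each full-width character in 65281-65374 to its half-width form. */
    static String toHalfWidth(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            sb.append(ch >= FW_START && ch <= FW_END ? (char) (ch - OFFSET) : ch);
        }
        return sb.toString();
    }

    /** Maps each half-width character in 33-126 to its full-width form (assumed inverse). */
    static String toFullWidth(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            sb.append(ch >= 33 && ch <= 126 ? (char) (ch + OFFSET) : ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Full-width 'H', 'i', '!' become plain ASCII.
        System.out.println(toHalfWidth("\uFF28\uFF49\uFF01")); // prints "Hi!"
    }
}
```

Characters outside the mapped ranges, including CJK text, pass through unchanged, so a round trip through both methods returns the original string whenever the input contains no characters from the other range.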
-
isPairPunctuation
public static boolean isPairPunctuation(char c)
Checks whether the given character is a pair punctuation.
- Parameters:
- c - the character to check
- Returns:
- boolean: true if it is, false otherwise
-
getPunctuationPair
public static char getPunctuationPair(char c)
Gets the matching counterpart of the given pair punctuation.
- Parameters:
- c - the pair punctuation character
- Returns:
- char
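As an illustration of how a pair lookup might work, the sketch below maps an opening character to its closing counterpart. Everything specific here is an assumption: the table of pairs is made up for the example, only opening characters are treated as pair punctuation, and '\0' is returned for anything else. None of this is confirmed by the page, and the class name `PairSketch` is hypothetical.

```java
// Illustrative sketch only; the pair table below is hypothetical, not jcseg's.
final class PairSketch {
    // Opening characters and, at the same index, their closing counterparts.
    private static final String OPEN = "([{<\u201C\u2018\u300A\u3010";
    private static final String CLOSE = ")]}>\u201D\u2019\u300B\u3011";

    /** True if c is one of the opening characters in the sample table. */
    static boolean isPairPunctuation(char c) {
        return OPEN.indexOf(c) >= 0;
    }

    /** Returns the closing counterpart of c, or '\0' (assumed) if c is not in the table. */
    static char getPunctuationPair(char c) {
        int i = OPEN.indexOf(c);
        return i >= 0 ? CLOSE.charAt(i) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(getPunctuationPair('(')); // prints ")"
    }
}
```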
-
isPunctuation
public static boolean isPunctuation(String str, int beginIndex, int endIndex)
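Checks whether the specified range of the string consists entirely of punctuation characters (English and Chinese punctuation).
- Parameters:
- str - the string to check
- beginIndex - the start index of the range to check
- endIndex - the end index of the range to check
- Returns:
- boolean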
-
isPunctuation
public static boolean isPunctuation(String str)
-
-