Serialized Form
-
Package org.lionsoul.jcseg.dic
-
Class org.lionsoul.jcseg.dic.ADictionary extends Object implements Serializable
- serialVersionUID: 4471659405268497613L
-
Serialized Fields
-
autoloadThread
Thread autoloadThread
Thread for automatic lexicon reloading.
config
SegmenterConfig config
-
mixPrefixLength
int mixPrefixLength
-
mixSuffixLength
int mixSuffixLength
Maximum length of the Chinese part after or before a Latin word, used to match mixed Chinese-English words such as 'B超' and 'AA制', or compounds such as '卡拉ok'. Since 2.0.1, the value is reset during lexicon loading.
rootMap
Map<String,SynonymsEntry> rootMap
-
synBuffer
List<String[]> synBuffer
Synonyms buffer.
sync
boolean sync
-
-
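The mixPrefixLength/mixSuffixLength description above can be illustrated with a minimal sketch of the suffix side: starting from a Latin word, try the longest run of up to mixSuffixLength following Han characters that, together with the Latin word, forms a lexicon entry. MixWordMatcher and matchSuffix are hypothetical names, the lexicon is assumed to be a plain Set, and this is not Jcseg's own matching code; the prefix side (as in '卡拉ok') would scan backwards in the same way.

```java
import java.util.Set;

// Hypothetical sketch (not Jcseg's code): bound the search for mixed
// Latin+Chinese lexicon words with a mixSuffixLength-style limit.
final class MixWordMatcher {
    private final Set<String> lexicon;   // assumption: a plain set of lexicon words
    private final int mixSuffixLength;   // max Han chars allowed after the Latin part

    MixWordMatcher(Set<String> lexicon, int mixSuffixLength) {
        this.lexicon = lexicon;
        this.mixSuffixLength = mixSuffixLength;
    }

    /**
     * The Latin word {@code latin} starts at index {@code start} of {@code text}.
     * Returns the longest lexicon entry made of that Latin word plus up to
     * mixSuffixLength following Han characters, or {@code latin} if none matches.
     */
    String matchSuffix(String text, int start, String latin) {
        int latinEnd = start + latin.length();
        // Count the consecutive Han characters after the Latin word, capped by the limit.
        int max = 0;
        while (max < mixSuffixLength && latinEnd + max < text.length()
                && Character.UnicodeScript.of(text.charAt(latinEnd + max)) == Character.UnicodeScript.HAN) {
            max++;
        }
        // Try the longest candidate first.
        for (int n = max; n > 0; n--) {
            String candidate = text.substring(start, latinEnd + n);
            if (lexicon.contains(candidate)) {
                return candidate;
            }
        }
        return latin;
    }
}
```

For example, with a lexicon containing B超 and a limit of 1, matchSuffix("做B超检查", 1, "B") returns "B超"; with a limit of 0 it returns "B".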
Class org.lionsoul.jcseg.dic.HashMapDictionary extends ADictionary implements Serializable
- serialVersionUID: 1L
-
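Every class on this page implements Serializable. The sketch below uses a hypothetical stand-in class (DictLike, not Jcseg's ADictionary) to show a round trip through ObjectOutputStream and ObjectInputStream, and one consequence of a listed field type: java.lang.Thread does not implement Serializable, so in the stand-in a non-null Thread field such as autoloadThread makes writeObject throw NotSerializableException.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in; only the field shapes resemble the serialized form above.
final class SerialDemo {
    static final class DictLike implements Serializable {
        private static final long serialVersionUID = 1L;
        int mixSuffixLength = 2;
        Thread autoloadThread; // java.lang.Thread does not implement Serializable
    }

    /** Serializes obj to a byte array and reads it back with Java's object streams. */
    static Object roundTrip(Object obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj); // throws NotSerializableException for a non-null Thread field
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```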
-
Package org.lionsoul.jcseg.segmenter
-
Class org.lionsoul.jcseg.segmenter.ComplexSeg extends Segmenter implements Serializable
- serialVersionUID: 1L
-
Class org.lionsoul.jcseg.segmenter.DetectSeg extends Object implements Serializable
- serialVersionUID: 1L
-
Serialized Fields
-
config
SegmenterConfig config
-
dic
ADictionary dic
The dictionary and task configuration.
idx
int idx
The current index in the input stream.
isb
IStringBuffer isb
-
reader
IPushbackReader reader
Push-back reader and string buffer needed at runtime.
wordPool
LinkedList<IWord> wordPool
-
-
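DetectSeg keeps a push-back reader and an index into the input. As a rough sketch of why push-back is useful, the hypothetical PushbackDemo below uses java.io.PushbackReader in place of Jcseg's IPushbackReader: when reading a Latin word, the character that ends the word is only seen after it has been read, so it is unread for the next scan and idx is adjusted to match. The class and method names are assumptions for illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical sketch: java.io.PushbackReader stands in for Jcseg's IPushbackReader.
final class PushbackDemo {
    private final PushbackReader reader;
    private int idx = 0; // number of chars consumed from the input so far

    PushbackDemo(String input) {
        this.reader = new PushbackReader(new StringReader(input), 1);
    }

    /** Returns the next run of ASCII letters, or null at end of input. */
    String nextLatinWord() throws IOException {
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) {
            idx++;
            boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (letter) {
                word.append((char) c);
            } else if (word.length() > 0) {
                // The word ended one char too late: give that char back.
                reader.unread(c);
                idx--;
                return word.toString();
            }
            // Non-letters before a word starts are skipped.
        }
        return word.length() > 0 ? word.toString() : null;
    }

    int position() {
        return idx;
    }
}
```

Reading "ab 中cd" yields "ab" (with the space pushed back, position 2), then "cd" (position 6), then null.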
Class org.lionsoul.jcseg.segmenter.NLPSeg extends ComplexSeg implements Serializable
- serialVersionUID: -8686944894332423915L
-
Serialized Fields
-
buffer
IStringBuffer buffer
-
eWordPool
LinkedList<IWord> eWordPool
Word pool for NLP complex entity recognition.
-
-
Class org.lionsoul.jcseg.segmenter.SegmenterConfig extends Object implements Serializable
- serialVersionUID: 1L
-
Serialized Fields
-
APPEND_CJK_ENTITY
boolean APPEND_CJK_ENTITY
Whether to perform entity recognition.
APPEND_CJK_PINYIN
boolean APPEND_CJK_PINYIN
Whether to append the Pinyin to the result.
APPEND_CJK_SYN
boolean APPEND_CJK_SYN
Whether to append synonyms to the result.
APPEND_PART_OF_SPEECH
boolean APPEND_PART_OF_SPEECH
Whether to append the part of speech.
CLEAR_STOPWORD
boolean CLEAR_STOPWORD
Whether to remove stop words.
CNFRA_TO_ARABIC
boolean CNFRA_TO_ARABIC
Whether to convert Chinese fractions to Arabic fractions.
CNNUM_TO_ARABIC
boolean CNNUM_TO_ARABIC
Whether to convert Chinese numerals to Arabic numerals.
DELIMITER
char DELIMITER
Delimiter character for delimiter-based segmentation; defaults to the English whitespace (space).
EN_MAX_LEN
int EN_MAX_LEN
Maximum/minimum match length for English word extraction.
EN_SEC_MIN_LEN
int EN_SEC_MIN_LEN
Minimum length of a word produced by secondary segmentation.
EN_SECOND_SEG
boolean EN_SECOND_SEG
Whether to perform secondary segmentation of complex Latin compounds, splitting them by character type.
EN_WORD_SEG
boolean EN_WORD_SEG
Whether to extract English words.
GRAM
byte GRAM
The N of the n-gram.
I_CN_NAME
boolean I_CN_NAME
Whether to identify Chinese names.
KEEP_PUNCTUATIONS
String KEEP_PUNCTUATIONS
Punctuation characters to keep.
KEEP_UNREG_WORDS
boolean KEEP_UNREG_WORDS
-
keepEnSecOriginalWord
boolean keepEnSecOriginalWord
Configuration items for cross-segmenter implementation control.
keepEnSegOriginalWord
boolean keepEnSegOriginalWord
-
lexAutoload
boolean lexAutoload
-
lexPath
String[] lexPath
-
LOAD_CJK_ENTITY
boolean LOAD_CJK_ENTITY
Whether to load entity definitions.
LOAD_CJK_PINYIN
boolean LOAD_CJK_PINYIN
Whether to load the Pinyin of CJK_WORDS.
LOAD_CJK_POS
boolean LOAD_CJK_POS
Whether to load the part of speech of words.
LOAD_CJK_SYN
boolean LOAD_CJK_SYN
Whether to load the synonyms of CJK_WORDS.
LOAD_PARAMETER
boolean LOAD_PARAMETER
Whether to load user-defined parameters.
MAX_CN_LNADRON
int MAX_CN_LNADRON
Maximum length of the adornment before a Chinese last name, such as “老” in 老陈.
MAX_LATIN_LENGTH
int MAX_LATIN_LENGTH
Maximum length of Latin words.
MAX_LENGTH
int MAX_LENGTH
Maximum length for maximum matching (5-7).
MAX_UNIT_LENGTH
int MAX_UNIT_LENGTH
Maximum length of unit words for the NLP algorithm (added 2016/11/18).
NAME_SINGLE_THRESHOLD
int NAME_SINGLE_THRESHOLD
Threshold for a single character that forms a word with the last character of a name, used to decide whether that character is still treated as a single word.
pFile
String pFile
-
pollTime
int pollTime
-
PPT_MAX_LENGTH
int PPT_MAX_LENGTH
Maximum length of the text between paired punctuation marks.
-
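MAX_LENGTH bounds the window used for maximum matching. The following is a minimal sketch of plain forward maximum matching under that bound, not Jcseg's complete algorithm; the class name ForwardMaxMatch, the segment method, and the Set-based lexicon are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: plain forward maximum matching with a window of maxLength chars.
final class ForwardMaxMatch {
    static List<String> segment(String text, Set<String> lexicon, int maxLength) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(maxLength, text.length() - i);
            // Shrink the window until it is a lexicon word or a single char.
            while (len > 1 && !lexicon.contains(text.substring(i, i + len))) {
                len--;
            }
            words.add(text.substring(i, i + len));
            i += len;
        }
        return words;
    }
}
```

With the lexicon {研究, 研究生, 生命, 起源}, segmenting 研究生命起源 with a maximum length of 5 yields 研究生 / 命 / 起源, while a maximum length of 2 yields 研究 / 生命 / 起源, so the window size alone changes the result of greedy matching.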
-
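A minimal sketch of how a limit like PPT_MAX_LENGTH could be applied: text between a pair of punctuation marks such as 《 and 》 is kept as a candidate only when its length does not exceed the limit. PairedPunctuation and extract are hypothetical names, and nesting is ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: collect text between open/close marks, bounded by pptMaxLength.
final class PairedPunctuation {
    static List<String> extract(String text, char open, char close, int pptMaxLength) {
        List<String> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(open, from);
            if (start < 0) {
                break;
            }
            int end = text.indexOf(close, start + 1);
            if (end < 0) {
                break; // unmatched opening mark
            }
            String inner = text.substring(start + 1, end);
            if (inner.length() <= pptMaxLength) {
                result.add(inner);
            }
            from = end + 1;
        }
        return result;
    }
}
```

For example, with a limit of 5, 读《红楼梦》和《一本非常非常长的书名》 yields only 红楼梦, because the second title is 10 characters long.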
Class org.lionsoul.jcseg.segmenter.Word extends Object implements Serializable
- serialVersionUID: 1L
-
Serialized Fields
-
entity
String[] entity
Entity name(s) of the word (added 2016/11/12; made an array 2017/06/06). The entity can be assigned from the lexicon or the word item settings, or dynamically at segmentation time.
fre
int fre
-
h
int h
-
length
int length
Although the length of a word could be obtained via #getValue().length(), owing to Jcseg's implementation Word.getValue().length() may not equal Word.getLength(); Word.getLength() returns the value set by #setLength.
parameter
String parameter
Additional word parameter (added 2017/10/02 along with IWord additional parameter support).
partSpeech
String[] partSpeech
-
pinyin
String pinyin
-
position
int position
-
syn
SynonymsEntry syn
-
type
int type
-
value
String value
-
-
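The length field description says Word.getLength() may differ from getValue().length(). The hypothetical SimpleWord below (not Jcseg's Word) shows one way such a field can work: an explicitly set length takes precedence, otherwise the value's length is used. The fallback rule and the -1 sentinel are assumptions.

```java
// Hypothetical sketch, not Jcseg's Word class.
final class SimpleWord {
    private final String value;
    private int length = -1; // -1 means "not set explicitly"

    SimpleWord(String value) {
        this.value = value;
    }

    String getValue() {
        return value;
    }

    void setLength(int length) {
        this.length = length;
    }

    /** Returns the explicitly set length if present, otherwise value.length(). */
    int getLength() {
        return length == -1 ? value.length() : length;
    }
}
```

One conceivable case: if a value were rewritten after matching (for instance the four-character 一百二十 stored as "120") and the original span length were kept with setLength(4), getValue().length() would be 3 while getLength() would be 4.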
-
Package org.lionsoul.jcseg.util
-
Class org.lionsoul.jcseg.util.IStringBuffer extends Object implements Serializable
- serialVersionUID: 1L
-
Serialized Fields
-
buff
char[] buff
Buffer char array.
count
int count
-
-
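IStringBuffer serializes only a char array and a count. A minimal sketch of a growable buffer built on those two fields follows; the class name SimpleCharBuffer, its methods, and the doubling growth policy are assumptions, not IStringBuffer's actual API.

```java
import java.util.Arrays;

// Hypothetical sketch: a growable char buffer with the same two fields, buff and count.
final class SimpleCharBuffer {
    private char[] buff; // backing array; only the first count chars are in use
    private int count = 0;

    SimpleCharBuffer(int initialCapacity) {
        buff = new char[Math.max(1, initialCapacity)];
    }

    SimpleCharBuffer append(char c) {
        if (count == buff.length) {
            // Double the capacity when full, giving amortized O(1) appends.
            buff = Arrays.copyOf(buff, buff.length * 2);
        }
        buff[count++] = c;
        return this;
    }

    SimpleCharBuffer append(String s) {
        for (int i = 0; i < s.length(); i++) {
            append(s.charAt(i));
        }
        return this;
    }

    int length() {
        return count;
    }

    /** Resets the logical length but keeps the array for reuse. */
    void clear() {
        count = 0;
    }

    @Override
    public String toString() {
        return new String(buff, 0, count);
    }
}
```

Keeping the array on clear() lets a segmenter reuse one buffer across words without reallocating.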