Package org.lionsoul.jcseg.segmenter
Class MostSeg
java.lang.Object
  org.lionsoul.jcseg.segmenter.Segmenter
    org.lionsoul.jcseg.segmenter.MostSeg
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.lionsoul.jcseg.ISegment
ISegment.Type
-
-
Field Summary
-
Fields inherited from class org.lionsoul.jcseg.segmenter.Segmenter
behindLatin, config, ctrlMask, dic, iaList, idx, isb, reader, subWordPool, wordPool
-
Fields inherited from interface org.lionsoul.jcseg.ISegment
CHECK_CE_MASk, CHECK_CF_MASK, CHECK_EC_MASK, COMPLEX, COMPLEX_MODE, DELIMITER, DELIMITER_MODE, DETECT, DETECT_MODE, MOST, MOST_MODE, NGRAM, NGRAM_MODE, NLP, NLP_MODE, SIMPLE, SIMPLE_MODE, START_SS_MASK
-
-
Constructor Summary
Constructors
MostSeg(SegmenterConfig config, ADictionary dic)
-
Method Summary
protected boolean enSecondSegFilter(IWord w)
    Interface to check and do the English secondary segmentation.
protected LinkedList<IWord> enWordSeg(IWord w, LinkedList<IWord> wList)
    Latin word lexicon-based English word segmentation for search mode.
protected IWord getNextCJKWord(int c, int pos)
    Gets the next CJK word from the current position of the input stream. This method is the core of the most segmentation implementation.
Methods inherited from class org.lionsoul.jcseg.segmenter.Segmenter
appendCJKWordFeatures, appendLatinWordFeatures, enSecondSeg, findCHName, getBestChunk, getConfig, getDict, getNextLatinWord, getNextMatch, getNextMixedWord, getNextPunctuationPairWord, getPairPunctuationText, getStreamPosition, next, nextCJKSentence, nextCNNumeric, nextLatinString, nextLatinWord, nextLetterNumber, nextOtherNumber, pushBack, pushBack, readNext, reset, wordNewOrClone
-
Constructor Detail
-
MostSeg
public MostSeg(SegmenterConfig config, ADictionary dic)
-
-
Method Detail
-
getNextCJKWord
protected IWord getNextCJKWord(int c, int pos) throws IOException
Gets the next CJK word from the current position of the input stream. This method is the core of the most segmentation implementation.
Overrides:
getNextCJKWord in class Segmenter
Returns:
IWord, which may be null; null means a stop word was reached
Throws:
IOException
See Also:
Segmenter.getNextCJKWord(int, int)
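The return contract above (a word, or null when a stop word is reached) can be illustrated with a minimal sketch. It uses stand-in types rather than the Jcseg classes: ToyCjkMatcher, its dictionary, its stop-word set, and the maxLen limit are all hypothetical, and the simple longest-match lookup is an assumption made for the example, not the matching logic MostSeg actually uses.

```java
import java.util.Set;

// Hypothetical stand-in for illustration only; not part of Jcseg.
class ToyCjkMatcher {
    private final Set<String> dict;      // assumed word dictionary
    private final Set<String> stopWords; // assumed stop-word list
    private final int maxLen;            // assumed maximum word length

    ToyCjkMatcher(Set<String> dict, Set<String> stopWords, int maxLen) {
        this.dict = dict;
        this.stopWords = stopWords;
        this.maxLen = maxLen;
    }

    // Returns the longest dictionary entry starting at pos (falling back to
    // the single character at pos), or null if that match is a stop word.
    // Precondition: 0 <= pos < text.length().
    String nextWord(String text, int pos) {
        int end = Math.min(text.length(), pos + maxLen);
        String match = text.substring(pos, pos + 1);
        for (int e = end; e > pos + 1; e--) {
            String candidate = text.substring(pos, e);
            if (dict.contains(candidate)) {
                match = candidate;
                break;
            }
        }
        return stopWords.contains(match) ? null : match;
    }
}
```

A caller of such a method has to treat null as "nothing to emit here" and move on to the next position, rather than as the end of the input.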
-
enSecondSegFilter
protected boolean enSecondSegFilter(IWord w)
Description copied from class: Segmenter
Interface to check and do the English secondary segmentation. Override this method to control the secondary logic.
Overrides:
enSecondSegFilter in class Segmenter
Returns:
boolean
See Also:
Segmenter.enSecondSegFilter(IWord)
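The "override this method to control the secondary logic" pattern can be sketched as follows. The classes ToyBaseSegmenter and ToySearchSegmenter are hypothetical stand-ins working on plain strings, and the rule used in the override (split only tokens that mix letters and digits) is an assumption chosen for the example; it does not describe what Segmenter or MostSeg actually check.

```java
// Hypothetical stand-ins for illustration only; not the Jcseg classes.
class ToyBaseSegmenter {
    // Hook: decides whether a Latin token gets a secondary segmentation pass.
    // The base class declines by default.
    protected boolean secondSegFilter(String token) {
        return false;
    }

    // Fixed driver logic that consults the hook.
    final String describe(String token) {
        return secondSegFilter(token)
                ? token + " -> secondary segmentation"
                : token + " -> kept whole";
    }
}

class ToySearchSegmenter extends ToyBaseSegmenter {
    // Assumed rule for the example: split only tokens that contain
    // both letters and digits.
    @Override
    protected boolean secondSegFilter(String token) {
        boolean hasLetter = false;
        boolean hasDigit = false;
        for (char ch : token.toCharArray()) {
            if (Character.isLetter(ch)) hasLetter = true;
            if (Character.isDigit(ch)) hasDigit = true;
        }
        return hasLetter && hasDigit;
    }
}
```

The driver method stays in the base class, so a subclass changes only the decision and not the surrounding control flow.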
-
enWordSeg
protected LinkedList<IWord> enWordSeg(IWord w, LinkedList<IWord> wList)
Latin word lexicon-based English word segmentation for search mode.
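To make the idea of lexicon-based segmentation of a Latin word concrete, here is a minimal sketch that splits a token into sub-words found in a lexicon and appends them to a caller-supplied list, loosely mirroring the shape of the signature above. ToyLatinSplitter and its lexicon are hypothetical, plain strings stand in for IWord, and forward maximum matching is an assumed technique for the example, not necessarily what enWordSeg does.

```java
import java.util.LinkedList;
import java.util.Set;

// Hypothetical stand-in for illustration only; not part of Jcseg.
class ToyLatinSplitter {
    // Splits word by forward maximum matching against lexicon, appends the
    // pieces to out, and returns out. Characters that match no lexicon entry
    // are emitted one at a time.
    static LinkedList<String> split(String word, Set<String> lexicon,
                                    LinkedList<String> out) {
        int pos = 0;
        while (pos < word.length()) {
            int end = word.length();
            // Shrink the candidate until it is in the lexicon
            // or only one character long.
            while (end > pos + 1 && !lexicon.contains(word.substring(pos, end))) {
                end--;
            }
            out.add(word.substring(pos, end));
            pos = end;
        }
        return out;
    }
}
```

With a lexicon containing "open" and "jdk", splitting "openjdk" yields the two pieces "open" and "jdk", which is the kind of finer-grained output a search index can use.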