Package org.lionsoul.jcseg.segmenter
Class ComplexSeg
- java.lang.Object
  - org.lionsoul.jcseg.segmenter.Segmenter
    - org.lionsoul.jcseg.segmenter.ComplexSeg
- All Implemented Interfaces:
Serializable, ISegment
- Direct Known Subclasses:
NLPSeg
public class ComplexSeg extends Segmenter implements Serializable
Jcseg complex segmentation implementation, based on the four MMSeg filtering rules:
- 1. Maximum matching chunk.
- 2. Largest average word length.
- 3. Smallest variance of word lengths.
- 4. Largest sum of the degree of morphemic freedom of one-character words.
- Author:
- chenxin
- See Also:
- Serialized Form
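As a rough illustration of how the four MMSeg rules in the class description narrow a set of candidate chunks, here is a minimal, self-contained sketch. It is not Jcseg's implementation: the class MmsegRuleSketch, its methods, and the CHAR_FREQ table are hypothetical, Latin letters stand in for CJK characters, and reading rule 4 as a sum of log frequencies follows the usual MMSeg description rather than anything stated on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

class MmsegRuleSketch {
    // Hypothetical single-character frequencies; a real dictionary would supply these.
    static final Map<String, Integer> CHAR_FREQ =
            Map.of("A", 5000, "B", 10, "C", 300, "D", 4000);

    // Tolerance for comparing floating-point scores.
    static final double EPS = 1e-9;

    // Rule 1 score: total number of characters covered by the chunk.
    static int totalLength(List<String> chunk) {
        int sum = 0;
        for (String w : chunk) sum += w.length();
        return sum;
    }

    // Rule 2 score: average word length.
    static double averageLength(List<String> chunk) {
        return (double) totalLength(chunk) / chunk.size();
    }

    // Rule 3 score (to be minimized): variance of the word lengths.
    static double variance(List<String> chunk) {
        double avg = averageLength(chunk);
        double sum = 0;
        for (String w : chunk) sum += (w.length() - avg) * (w.length() - avg);
        return sum / chunk.size();
    }

    // Rule 4 score: sum of log(frequency) over the one-character words only.
    static double morphemicFreedom(List<String> chunk) {
        double sum = 0;
        for (String w : chunk) {
            if (w.length() == 1) sum += Math.log(CHAR_FREQ.getOrDefault(w, 1));
        }
        return sum;
    }

    // Keeps only the chunks whose score equals the maximum (within EPS).
    static List<List<String>> keepMax(List<List<String>> chunks,
                                      ToDoubleFunction<List<String>> score) {
        double best = Double.NEGATIVE_INFINITY;
        for (List<String> c : chunks) best = Math.max(best, score.applyAsDouble(c));
        List<List<String>> kept = new ArrayList<>();
        for (List<String> c : chunks) {
            if (score.applyAsDouble(c) >= best - EPS) kept.add(c);
        }
        return kept;
    }

    // Applies the rules in order, stopping once one candidate remains.
    // The candidate list must not be empty.
    static List<String> bestChunk(List<List<String>> candidates) {
        List<List<String>> kept = keepMax(candidates, c -> totalLength(c));
        if (kept.size() > 1) kept = keepMax(kept, MmsegRuleSketch::averageLength);
        if (kept.size() > 1) kept = keepMax(kept, c -> -variance(c));
        if (kept.size() > 1) kept = keepMax(kept, MmsegRuleSketch::morphemicFreedom);
        return kept.get(0);
    }
}
```

For the candidates [AB, CD, EF], [A, BCD, EF], and [ABC, D], rule 1 drops the third (4 characters against 6), rule 2 ties at an average of 2, and rule 3 selects [AB, CD, EF] because its word lengths have zero variance.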
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.lionsoul.jcseg.ISegment
ISegment.Type
Field Summary
-
Fields inherited from class org.lionsoul.jcseg.segmenter.Segmenter
behindLatin, config, ctrlMask, dic, iaList, idx, isb, reader, subWordPool, wordPool
-
Fields inherited from interface org.lionsoul.jcseg.ISegment
CHECK_CE_MASk, CHECK_CF_MASK, CHECK_EC_MASK, COMPLEX, COMPLEX_MODE, DELIMITER, DELIMITER_MODE, DETECT, DETECT_MODE, MOST, MOST_MODE, NGRAM, NGRAM_MODE, NLP, NLP_MODE, SIMPLE, SIMPLE_MODE, START_SS_MASK
Constructor Summary
Constructor and Description:
- ComplexSeg(SegmenterConfig config, ADictionary dic)
Method Summary
Modifier and Type, Method, Description:
- IChunk getBestChunk(char[] chars, int index, int maxLen): An abstract method to get the word at the current position using the MMSeg algorithm.
- protected static void printChunks(String scene, ArrayList<IChunk> chunks)
Methods inherited from class org.lionsoul.jcseg.segmenter.Segmenter
appendCJKWordFeatures, appendLatinWordFeatures, enSecondSeg, enSecondSegFilter, enWordSeg, findCHName, getConfig, getDict, getNextCJKWord, getNextLatinWord, getNextMatch, getNextMixedWord, getNextPunctuationPairWord, getPairPunctuationText, getStreamPosition, next, nextCJKSentence, nextCNNumeric, nextLatinString, nextLatinWord, nextLetterNumber, nextOtherNumber, pushBack, pushBack, readNext, reset, wordNewOrClone
Constructor Detail
-
ComplexSeg
public ComplexSeg(SegmenterConfig config, ADictionary dic)
Method Detail
-
getBestChunk
public IChunk getBestChunk(char[] chars, int index, int maxLen)
Description copied from class: Segmenter
An abstract method to get the word at the current position using the MMSeg algorithm. SimpleSeg and ComplexSeg handle this step differently, so it is declared abstract in Segmenter.
- Overrides:
getBestChunk in class Segmenter
- Returns:
- IChunk
- See Also:
Segmenter.getBestChunk(char[], int, int)
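For orientation, MMSeg-style segmenters usually build their candidate chunks as sequences of up to three consecutive words starting at the current position, and only then choose among them. The sketch below shows that enumeration step under explicit assumptions: the class ChunkEnumerationSketch, its DICT set, and its methods are hypothetical, Latin letters stand in for CJK characters, and treating maxLen as the maximum word length in characters is an assumption about the parameter, not something this page states. It is not the Jcseg implementation of getBestChunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class ChunkEnumerationSketch {
    // Hypothetical toy dictionary of multi-character words.
    static final Set<String> DICT = Set.of("AB", "ABC", "CD", "DE", "EF");

    // Words that can start at position `from`: the single character itself,
    // plus every dictionary entry of at most maxLen characters.
    static List<String> matches(char[] chars, int from, int maxLen) {
        List<String> out = new ArrayList<>();
        if (from >= chars.length) return out;
        out.add(String.valueOf(chars[from]));
        for (int len = 2; len <= maxLen && from + len <= chars.length; len++) {
            String w = new String(chars, from, len);
            if (DICT.contains(w)) out.add(w);
        }
        return out;
    }

    // Enumerates every chunk of up to three consecutive words starting at index.
    static List<List<String>> chunks(char[] chars, int index, int maxLen) {
        List<List<String>> result = new ArrayList<>();
        extend(chars, index, maxLen, new ArrayList<>(), result);
        return result;
    }

    // Depth-first extension: a chunk is complete at three words or at end of input.
    private static void extend(char[] chars, int pos, int maxLen,
                               List<String> current, List<List<String>> result) {
        List<String> next = matches(chars, pos, maxLen);
        if (current.size() == 3 || next.isEmpty()) {
            if (!current.isEmpty()) result.add(new ArrayList<>(current));
            return;
        }
        for (String w : next) {
            current.add(w);
            extend(chars, pos + w.length(), maxLen, current, result);
            current.remove(current.size() - 1);
        }
    }
}
```

Calling chunks("ABCDEF".toCharArray(), 0, 3) yields candidates such as [AB, CD, EF] and [ABC, DE, F]; with maxLen set to 2 the three-character word ABC is no longer considered. The resulting candidates would then be narrowed by the filtering rules listed in the class description.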