Package org.lionsoul.jcseg.segmenter
Class SegmenterConfig
java.lang.Object
    org.lionsoul.jcseg.segmenter.SegmenterConfig
All Implemented Interfaces:
Serializable, Cloneable
public class SegmenterConfig extends Object implements Cloneable, Serializable
Jcseg segmenter configuration class
Author:
chenxin
See Also:
Serialized Form
Field Summary
Fields
Modifier and Type | Field | Description
boolean | APPEND_CJK_ENTITY | whether to do entity recognition
boolean | APPEND_CJK_PINYIN | whether to append the Pinyin to the result
boolean | APPEND_CJK_SYN | whether to append synonyms to the result
boolean | APPEND_PART_OF_SPEECH | whether to append the part of speech
boolean | CLEAR_STOPWORD | whether to remove stop words
boolean | CNFRA_TO_ARABIC | whether to convert Chinese fractions to Arabic fractions
boolean | CNNUM_TO_ARABIC | whether to convert Chinese numerals to Arabic numerals
int | EN_MAX_LEN | maximum/minimum match length for English word extraction
int | EN_SEC_MIN_LEN | minimum length of a word produced by secondary segmentation
boolean | EN_SECOND_SEG | whether to do a secondary split of compound Latin words according to character type
boolean | EN_WORD_SEG | whether to do English word extraction
boolean | I_CN_NAME | whether to identify Chinese names
boolean | KEEP_UNREG_WORDS |
static String | LEX_PROPERTY_FILE | default lexicon property file name
boolean | LOAD_CJK_ENTITY | whether to load entity definitions
boolean | LOAD_CJK_PINYIN | whether to load the Pinyin of the CJK_WORDS
boolean | LOAD_CJK_POS | whether to load the word's part of speech
boolean | LOAD_CJK_SYN | whether to load synonyms of the CJK_WORDS
boolean | LOAD_PARAMETER | whether to load self-defined parameters
int | MAX_CN_LNADRON | maximum length of the adornment before a Chinese surname, e.g. “老” in 老陈
int | MAX_LATIN_LENGTH | maximum length for Latin words
int | MAX_LENGTH | maximum word length for maximum matching (5-7)
int | MAX_UNIT_LENGTH | maximum length of unit words for the NLP algorithm (added 2016/11/18)
int | NAME_SINGLE_THRESHOLD | threshold for treating a single character as a standalone word when it could also form a word with the last character of a name
int | PPT_MAX_LENGTH | maximum length of the text between paired punctuation marks
-
Constructor Summary
Constructors
Constructor | Description
SegmenterConfig() | create the config without initializing it. Note: this may cause incompatibility problems for older code that used this constructor
SegmenterConfig(boolean autoLoad) | create and initialize the config by autoload
SegmenterConfig(InputStream is) | create and initialize the task config from an InputStream
SegmenterConfig(String proFile) | create and initialize the task config from a properties file
-
Method Summary
Modifier and Type | Method | Description
boolean | appendCJKPinyin() |
boolean | appendCJKSyn() |
void | autoLoad() | initialize the value of its options by automatically searching for the jcseg.properties file
boolean | clearStopwords() |
SegmenterConfig | clone() | overrides the clone method
boolean | cnFractionToArabic() |
boolean | cnNumToArabic() |
char | getDELIMITER() |
int | getEnMaxLen() |
int | getEnSecondMinLen() |
boolean | getEnSecondSeg() |
byte | getGRAM() |
String[] | getLexiconPath() | return the lexicon directory path
int | getMaxCnLnadron() |
int | getMaxLength() |
int | getNameSingleThreshold() |
int | getPollTime() |
int | getPPTMaxLength() |
String | getPropertieFile() |
boolean | identifyCnName() |
boolean | isAutoload() | whether lexicon autoload is enabled
boolean | isEnWordSeg() |
boolean | isKeepEnSecOriginalWord() |
boolean | isKeepEnSegOriginalWord() |
boolean | isKeepPunctuation(char c) |
boolean | keepUnregWords() |
boolean | ladCJKPos() |
void | load(InputStream is) | initialize the value of its options from an InputStream of a jcseg.properties file
void | load(String proFile) | initialize the value of its options from a specified jcseg.properties file
boolean | loadCJKEntity() |
boolean | loadCJKPinyin() |
boolean | loadCJKSyn() |
void | set(String key, String value) | set the option value from a key and value as defined in jcseg.properties
void | setAppendCJKPinyin(boolean appendCJKPinyin) |
void | setAppendCJKSyn(boolean appendCJKPinyin) |
void | setAppendPartOfSpeech(boolean partOfSpeech) |
void | setAutoload(boolean autoload) |
void | setClearStopwords(boolean clearstopwords) |
void | setCnFactionToArabic(boolean cnFractionToArabic) |
void | setCnNumToArabic(boolean cnNumToArabic) |
void | setDELIMITER(char dELIMITER) |
void | setEnMaxLen(int enMaxLen) |
void | setEnSecondMinLen(int minLen) |
void | setEnSecondSeg(boolean enSecondSeg) |
void | setEnWordSeg(boolean enWordSeg) |
void | setGRAM(byte gRAM) |
void | setICnName(boolean iCnName) |
void | setKeepEnSecOriginalWord(boolean keepEnSecOriginalWord) |
void | setKeepEnSegOriginalWord(boolean keepEnSegOriginalWord) |
void | setKeepPunctuations(String keepPunctuations) |
void | setKeepUnregWords(boolean keepUnregWords) |
void | setLexiconPath(String[] lexPath) |
void | setLoadCJKPinyin(boolean loadCJKPinyin) |
void | setLoadCJKPos(boolean loadCJKPos) |
void | setLoadCJKSyn(boolean loadCJKSyn) |
void | setLoadEntity(boolean loadEntity) |
void | setMaxCnLnadron(int maxCnLnadron) |
void | setMaxLength(int maxLength) |
void | setNameSingleThreshold(int thresold) |
void | setPollTime(int polltime) |
void | setPPT_MAX_LENGTH(int pptMaxLength) |
-
Field Detail
-
LEX_PROPERTY_FILE
public static final String LEX_PROPERTY_FILE
default lexicon property file name
See Also:
Constant Field Values
-
MAX_LENGTH
public int MAX_LENGTH
maximum word length for maximum matching (5-7)
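To illustrate what MAX_LENGTH bounds, the following is a minimal sketch of greedy forward maximum matching whose dictionary lookup window is capped at maxLength characters. The class ForwardMaxMatch, its segment method and the toy dictionary are hypothetical; plain forward maximum matching is used only for illustration and is not necessarily the algorithm Jcseg implements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of how a MAX_LENGTH-style cap bounds greedy matching.
// Plain forward maximum matching, not Jcseg's own segmentation algorithm.
class ForwardMaxMatch {

    // At each position, take the longest dictionary word of at most maxLength
    // chars; fall back to a single char when nothing in the window matches.
    static List<String> segment(String text, Set<String> dict, int maxLength) {
        List<String> words = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxLength);
            String match = text.substring(i, i + 1); // single-char fallback
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dict.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            words.add(match);
            i += match.length();
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("研究", "研究生", "生命", "起源");
        System.out.println(segment("研究生命起源", dict, 5)); // [研究生, 命, 起源]
        System.out.println(segment("研究生命起源", dict, 2)); // [研究, 生命, 起源]
    }
}
```

With a cap of 5 the greedy match 研究生 is taken first and 命 is left as a single character; with a cap of 2 the three-character candidate is never examined, so the cap changes which candidates are considered at all.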
-
MAX_LATIN_LENGTH
public int MAX_LATIN_LENGTH
maximum length for Latin words
-
MAX_UNIT_LENGTH
public int MAX_UNIT_LENGTH
maximum length of unit words for the NLP algorithm (added 2016/11/18)
-
I_CN_NAME
public boolean I_CN_NAME
whether to identify Chinese names
-
MAX_CN_LNADRON
public int MAX_CN_LNADRON
maximum length of the adornment before a Chinese surname, e.g. “老” in 老陈
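A minimal sketch of the kind of check MAX_CN_LNADRON limits, assuming an adorned surname is a known adornment (such as 老 or 小) followed by a one-character surname. The class SurnameAdornment, the adornment and surname sets, and the restriction to single-character surnames are all hypothetical and not taken from Jcseg.

```java
import java.util.Set;

// Hypothetical check for an adorned surname such as 老陈 or 小王.
// Only single-character surnames are handled in this sketch.
class SurnameAdornment {

    // True if word = adornment + one-char surname, where the adornment is a
    // known prefix no longer than maxAdornLength chars.
    static boolean isAdornedSurname(String word, Set<String> adornments,
                                    Set<Character> surnames, int maxAdornLength) {
        if (word.length() < 2) return false;
        String adornment = word.substring(0, word.length() - 1);
        char surname = word.charAt(word.length() - 1);
        return adornment.length() <= maxAdornLength
                && adornments.contains(adornment)
                && surnames.contains(surname);
    }

    public static void main(String[] args) {
        Set<String> adornments = Set.of("老", "小", "阿");
        Set<Character> surnames = Set.of('陈', '王', '李');
        System.out.println(isAdornedSurname("老陈", adornments, surnames, 1)); // true
        System.out.println(isAdornedSurname("老陈", adornments, surnames, 0)); // false
    }
}
```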
-
LOAD_CJK_PINYIN
public boolean LOAD_CJK_PINYIN
whether to load the Pinyin of the CJK_WORDS
-
APPEND_CJK_PINYIN
public boolean APPEND_CJK_PINYIN
whether to append the Pinyin to the result
-
LOAD_CJK_POS
public boolean LOAD_CJK_POS
whether to load the word's part of speech
-
APPEND_PART_OF_SPEECH
public boolean APPEND_PART_OF_SPEECH
whether to append the part of speech
-
LOAD_CJK_SYN
public boolean LOAD_CJK_SYN
whether to load synonyms of the CJK_WORDS
-
APPEND_CJK_SYN
public boolean APPEND_CJK_SYN
whether to append synonyms to the result
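The LOAD_CJK_SYN and APPEND_CJK_SYN flags come as a pair. The sketch below models one plausible interaction, under the assumption (not stated in this reference) that synonyms reach the output only when they were loaded and appending is enabled. The class SynonymAppender and the toy synonym map are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical model of the LOAD_CJK_SYN / APPEND_CJK_SYN pair. Assumption:
// synonyms reach the output only if they were loaded AND appending is enabled.
class SynonymAppender {

    static List<String> expand(List<String> words, Map<String, List<String>> synonyms,
                               boolean loadSyn, boolean appendSyn) {
        List<String> out = new ArrayList<>();
        for (String word : words) {
            out.add(word);
            // Nothing to append when synonyms were never loaded.
            if (loadSyn && appendSyn) {
                out.addAll(synonyms.getOrDefault(word, List.of()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of("起源", List.of("来源"));
        List<String> words = List.of("研究", "起源");
        System.out.println(expand(words, synonyms, true, true));  // [研究, 起源, 来源]
        System.out.println(expand(words, synonyms, false, true)); // [研究, 起源]
    }
}
```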
-
LOAD_CJK_ENTITY
public boolean LOAD_CJK_ENTITY
whether to load entity definitions
-
APPEND_CJK_ENTITY
public boolean APPEND_CJK_ENTITY
whether to do entity recognition
-
LOAD_PARAMETER
public boolean LOAD_PARAMETER
whether to load self-defined parameters
-
NAME_SINGLE_THRESHOLD
public int NAME_SINGLE_THRESHOLD
threshold for treating a single character as a standalone word when it could also form a word with the last character of a name
-
PPT_MAX_LENGTH
public int PPT_MAX_LENGTH
maximum length of the text between paired punctuation marks
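As an illustration of what PPT_MAX_LENGTH caps, here is a minimal sketch that extracts the text enclosed by a pair of punctuation marks, such as a book title inside 《》, only if the closing mark appears within pptMaxLength characters. The class PairedPunctuation, its method and the set of pairs are hypothetical.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical extraction of the text enclosed by a pair of punctuation marks,
// e.g. the book title inside 《》, capped at pptMaxLength chars.
class PairedPunctuation {

    private static final Map<Character, Character> PAIRS = Map.of(
            '《', '》', '“', '”', '「', '」', '(', ')', '（', '）');

    // openIndex must point at an opening mark; returns the enclosed text if the
    // matching close appears within pptMaxLength chars, otherwise empty.
    static Optional<String> enclosed(String text, int openIndex, int pptMaxLength) {
        Character close = PAIRS.get(text.charAt(openIndex));
        if (close == null) return Optional.empty();
        // Enclosed length i - (openIndex + 1) must not exceed pptMaxLength.
        int limit = Math.min(text.length(), openIndex + 2 + pptMaxLength);
        for (int i = openIndex + 1; i < limit; i++) {
            if (text.charAt(i) == close) {
                return Optional.of(text.substring(openIndex + 1, i));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "读《三国演义》很好";
        System.out.println(enclosed(text, 1, 15)); // Optional[三国演义]
        System.out.println(enclosed(text, 1, 3));  // Optional.empty
    }
}
```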
-
CLEAR_STOPWORD
public boolean CLEAR_STOPWORD
whether to remove stop words
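The effect of CLEAR_STOPWORD on a token stream can be sketched as a simple filter. The class StopwordFilter and the one-entry stop-word set are hypothetical; the sketch only shows that with the flag off tokens pass through unchanged, and with it on stop words are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stop-word filter illustrating the effect of CLEAR_STOPWORD.
class StopwordFilter {

    static List<String> apply(List<String> words, Set<String> stopwords, boolean clearStopword) {
        if (!clearStopword) return words; // flag off: tokens pass through untouched
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (!stopwords.contains(word)) kept.add(word);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> words = List.of("我", "的", "书");
        System.out.println(apply(words, Set.of("的"), true));  // [我, 书]
        System.out.println(apply(words, Set.of("的"), false)); // [我, 的, 书]
    }
}
```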
-
CNNUM_TO_ARABIC
public boolean CNNUM_TO_ARABIC
whether to convert Chinese numerals to Arabic numerals
-
CNFRA_TO_ARABIC
public boolean CNFRA_TO_ARABIC
whether to convert Chinese fractions to Arabic fractions
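To make CNNUM_TO_ARABIC and CNFRA_TO_ARABIC concrete, the following is a minimal sketch of converting simple Chinese numerals and fractions to Arabic form. The class CnNumerals and its methods are hypothetical and do not reproduce Jcseg's converter; the sketch handles values below 亿 written with the units 十, 百, 千 and 万, and does not support digit-by-digit forms such as 二零二四.

```java
// Hypothetical conversion of simple Chinese numerals to Arabic form.
// Handles values below 亿 written with units (十, 百, 千, 万);
// digit-by-digit forms such as 二零二四 are not supported.
class CnNumerals {

    private static int digit(char c) {
        switch (c) {
            case '零': return 0;
            case '一': return 1;
            case '二': case '两': return 2;
            case '三': return 3;
            case '四': return 4;
            case '五': return 5;
            case '六': return 6;
            case '七': return 7;
            case '八': return 8;
            case '九': return 9;
            default: return -1;
        }
    }

    private static int unit(char c) {
        switch (c) {
            case '十': return 10;
            case '百': return 100;
            case '千': return 1000;
            default: return -1;
        }
    }

    // Returns the numeric value, or -1 if the string contains other characters.
    static long toArabic(String s) {
        long total = 0;   // value of completed 万 groups
        long section = 0; // value below 万 accumulated so far
        int pending = 0;  // last digit not yet multiplied by a unit
        boolean hasPending = false;
        for (char c : s.toCharArray()) {
            int d = digit(c);
            int u = unit(c);
            if (d >= 0) {
                pending = d;
                hasPending = true;
            } else if (u > 0) {
                // A bare unit, like the leading 十 in 十五, means 1 x unit.
                section += (hasPending ? pending : 1) * u;
                hasPending = false;
            } else if (c == '万') {
                total += (section + (hasPending ? pending : 0)) * 10000;
                section = 0;
                hasPending = false;
            } else {
                return -1;
            }
        }
        return total + section + (hasPending ? pending : 0);
    }

    // Converts a fraction such as 三分之一 (one third) into "1/3"; null if unparsable.
    static String fractionToArabic(String s) {
        int k = s.indexOf("分之");
        if (k <= 0 || k + 2 >= s.length()) return null;
        long denominator = toArabic(s.substring(0, k));
        long numerator = toArabic(s.substring(k + 2));
        if (denominator <= 0 || numerator < 0) return null;
        return numerator + "/" + denominator;
    }

    public static void main(String[] args) {
        System.out.println(toArabic("一千二百三十四"));   // 1234
        System.out.println(fractionToArabic("三分之一")); // 1/3
    }
}
```

Note that in Chinese the denominator comes first (三分之一 reads "of three parts, one"), which is why the sketch parses the text before 分之 as the denominator.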
-
EN_SECOND_SEG
public boolean EN_SECOND_SEG
whether to do a secondary split of compound Latin words according to character type
-
EN_SEC_MIN_LEN
public int EN_SEC_MIN_LEN
minimum length of a word produced by secondary segmentation
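A minimal sketch of how EN_SECOND_SEG and EN_SEC_MIN_LEN might interact: a compound Latin token is split wherever the character type changes (letter, digit, other), and parts shorter than the minimum length are dropped. The class LatinSecondarySplit and its three-way character classification are hypothetical, not Jcseg's exact rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical secondary split: break a compound Latin token wherever the
// character type changes, then drop parts shorter than minLen.
class LatinSecondarySplit {

    // 0 = letter, 1 = digit, 2 = anything else
    private static int type(char c) {
        if (Character.isDigit(c)) return 1;
        if (Character.isLetter(c)) return 0;
        return 2;
    }

    static List<String> split(String word, int minLen) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= word.length(); i++) {
            boolean boundary = i == word.length()
                    || type(word.charAt(i)) != type(word.charAt(i - 1));
            if (boundary) {
                String part = word.substring(start, i);
                if (part.length() >= minLen) parts.add(part);
                start = i;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("jcseg2018", 1)); // [jcseg, 2018]
        System.out.println(split("jcseg2018", 5)); // [jcseg]
    }
}
```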
-
EN_MAX_LEN
public int EN_MAX_LEN
maximum/minimum match length for English word extraction
-
EN_WORD_SEG
public boolean EN_WORD_SEG
whether to do English word extraction
-
KEEP_UNREG_WORDS
public boolean KEEP_UNREG_WORDS
-
-
Constructor Detail
-
SegmenterConfig
public SegmenterConfig()
create the config without initializing it. Note: this may cause incompatibility problems for older code that used this constructor.
Since:
1.9.8
-
SegmenterConfig
public SegmenterConfig(boolean autoLoad)
create and initialize the config by autoload
Parameters:
autoLoad -
-
SegmenterConfig
public SegmenterConfig(String proFile)
create and initialize the task config from a properties file
Parameters:
proFile -
-
SegmenterConfig
public SegmenterConfig(InputStream is)
create and initialize the task config from an InputStream
Parameters:
is -
-
-
Method Detail
-
load
public void load(String proFile) throws IOException
initialize the value of its options from a specified jcseg.properties file
Parameters:
proFile -
Throws:
IOException
-
autoLoad
public void autoLoad() throws IOException
initialize the value of its options by automatically searching for the jcseg.properties file in the following locations:
1. The directory where jcseg-core-{version}.jar is located, i.e. beside the jar file.
2. The root of the classpath:
   - First, you can manually put this file into the root of the classpath (outside any jar file).
   - Second, a copy of this file is bundled inside jcseg-core-{version}.jar; it is used if you did not manually copy the file into the classpath.
3. The directory given by the system property "user.home".
Throws:
IOException
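The three-step search order above can be sketched as a small resolver. This is a hypothetical illustration, not Jcseg's implementation: the class PropertiesLocator, its locate method, and the way the jar directory is passed in are assumptions. Step 2 relies on ClassLoader.getResource, which returns the first match in classpath order, so whether a manually placed copy wins over the one bundled in the jar depends on the order of the classpath entries.

```java
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical resolver mirroring the documented search order for jcseg.properties.
class PropertiesLocator {
    static final String FILE_NAME = "jcseg.properties";

    // Returns where the file was found, or empty if it is in none of the locations.
    static Optional<String> locate(Path jarDir, ClassLoader loader, Path userHome) {
        // 1. Beside the jar file.
        if (jarDir != null) {
            Path candidate = jarDir.resolve(FILE_NAME);
            if (Files.isRegularFile(candidate)) return Optional.of(candidate.toString());
        }
        // 2. Root of the classpath: first match in classpath order.
        if (loader != null) {
            URL url = loader.getResource(FILE_NAME);
            if (url != null) return Optional.of(url.toString());
        }
        // 3. The user's home directory.
        if (userHome != null) {
            Path candidate = userHome.resolve(FILE_NAME);
            if (Files.isRegularFile(candidate)) return Optional.of(candidate.toString());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Path home = Paths.get(System.getProperty("user.home"));
        // No jar directory is known here, so only steps 2 and 3 are tried.
        System.out.println(locate(null, PropertiesLocator.class.getClassLoader(), home));
    }
}
```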
-
load
public void load(InputStream is) throws IOException
initialize the value of its options from an InputStream of a jcseg.properties file
Parameters:
is -
Throws:
IOException
-
set
public void set(String key, String value) throws IOException
set the option value from a key and value as defined in jcseg.properties
Parameters:
key -
value -
Throws:
IOException
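A common way to pair a properties loader with a per-key setter is to parse the file with java.util.Properties and route every entry through set(key, value). The sketch below shows that pattern under stated assumptions: the class PropertiesDispatch is hypothetical, and the "demo.*" keys are placeholders rather than the real keys used in jcseg.properties.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical load()/set() pair. The "demo.*" keys are placeholders,
// not the real keys used in jcseg.properties.
class PropertiesDispatch {
    int maxLength = 5;
    boolean clearStopwords = false;

    void set(String key, String value) {
        String v = value.trim();
        switch (key) {
            case "demo.maxLength":
                maxLength = Integer.parseInt(v); // NumberFormatException on bad input
                break;
            case "demo.clearStopwords":
                clearStopwords = "1".equals(v) || Boolean.parseBoolean(v);
                break;
            default:
                break; // unknown keys are ignored in this sketch
        }
    }

    void load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        for (String key : props.stringPropertyNames()) {
            set(key, props.getProperty(key));
        }
    }

    public static void main(String[] args) throws IOException {
        PropertiesDispatch cfg = new PropertiesDispatch();
        cfg.load(new StringReader("demo.maxLength=7\ndemo.clearStopwords=1\n"));
        System.out.println(cfg.maxLength + " " + cfg.clearStopwords); // 7 true
    }
}
```

Routing file loading through the same setter keeps one code path for validating a value, whether it comes from a file or from a programmatic call.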
-
getLexiconPath
public String[] getLexiconPath()
return the lexicon directory path
-
setLexiconPath
public void setLexiconPath(String[] lexPath)
-
isAutoload
public boolean isAutoload()
whether lexicon autoload is enabled
-
setAutoload
public void setAutoload(boolean autoload)
-
getPollTime
public int getPollTime()
-
setPollTime
public void setPollTime(int polltime)
-
getMaxLength
public int getMaxLength()
-
setMaxLength
public void setMaxLength(int maxLength)
-
identifyCnName
public boolean identifyCnName()
-
setICnName
public void setICnName(boolean iCnName)
-
getMaxCnLnadron
public int getMaxCnLnadron()
-
setMaxCnLnadron
public void setMaxCnLnadron(int maxCnLnadron)
-
loadCJKPinyin
public boolean loadCJKPinyin()
-
setLoadCJKPinyin
public void setLoadCJKPinyin(boolean loadCJKPinyin)
-
setAppendPartOfSpeech
public void setAppendPartOfSpeech(boolean partOfSpeech)
-
appendCJKPinyin
public boolean appendCJKPinyin()
-
setAppendCJKPinyin
public void setAppendCJKPinyin(boolean appendCJKPinyin)
-
loadCJKSyn
public boolean loadCJKSyn()
-
setLoadCJKSyn
public void setLoadCJKSyn(boolean loadCJKSyn)
-
appendCJKSyn
public boolean appendCJKSyn()
-
setAppendCJKSyn
public void setAppendCJKSyn(boolean appendCJKPinyin)
-
ladCJKPos
public boolean ladCJKPos()
-
setLoadCJKPos
public void setLoadCJKPos(boolean loadCJKPos)
-
loadCJKEntity
public boolean loadCJKEntity()
-
setLoadEntity
public void setLoadEntity(boolean loadEntity)
-
getNameSingleThreshold
public int getNameSingleThreshold()
-
setNameSingleThreshold
public void setNameSingleThreshold(int thresold)
-
getPPTMaxLength
public int getPPTMaxLength()
-
setPPT_MAX_LENGTH
public void setPPT_MAX_LENGTH(int pptMaxLength)
-
clearStopwords
public boolean clearStopwords()
-
setClearStopwords
public void setClearStopwords(boolean clearstopwords)
-
cnNumToArabic
public boolean cnNumToArabic()
-
setCnNumToArabic
public void setCnNumToArabic(boolean cnNumToArabic)
-
cnFractionToArabic
public boolean cnFractionToArabic()
-
setCnFactionToArabic
public void setCnFactionToArabic(boolean cnFractionToArabic)
-
getEnSecondSeg
public boolean getEnSecondSeg()
-
setEnSecondSeg
public void setEnSecondSeg(boolean enSecondSeg)
-
getEnSecondMinLen
public int getEnSecondMinLen()
-
setEnSecondMinLen
public void setEnSecondMinLen(int minLen)
-
getEnMaxLen
public int getEnMaxLen()
-
setEnMaxLen
public void setEnMaxLen(int enMaxLen)
-
isEnWordSeg
public boolean isEnWordSeg()
-
setEnWordSeg
public void setEnWordSeg(boolean enWordSeg)
-
setKeepPunctuations
public void setKeepPunctuations(String keepPunctuations)
-
isKeepPunctuation
public boolean isKeepPunctuation(char c)
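setKeepPunctuations and isKeepPunctuation have no description here. One plausible reading, which is an assumption and not confirmed by this reference, is that the setter takes a string listing the punctuation characters to keep and the predicate tests membership in it. The class KeepPunctuationSketch below is hypothetical.

```java
// Hypothetical model of setKeepPunctuations / isKeepPunctuation, assuming the
// string passed to the setter simply lists the punctuation characters to keep.
class KeepPunctuationSketch {
    private String keepPunctuations = "";

    void setKeepPunctuations(String keepPunctuations) {
        this.keepPunctuations = keepPunctuations == null ? "" : keepPunctuations;
    }

    boolean isKeepPunctuation(char c) {
        return keepPunctuations.indexOf(c) >= 0; // membership test on the stored string
    }

    public static void main(String[] args) {
        KeepPunctuationSketch cfg = new KeepPunctuationSketch();
        cfg.setKeepPunctuations("@#%.&+");
        System.out.println(cfg.isKeepPunctuation('@')); // true
        System.out.println(cfg.isKeepPunctuation('!')); // false
    }
}
```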
-
getDELIMITER
public char getDELIMITER()
-
setDELIMITER
public void setDELIMITER(char dELIMITER)
-
getGRAM
public byte getGRAM()
-
setGRAM
public void setGRAM(byte gRAM)
-
keepUnregWords
public boolean keepUnregWords()
-
setKeepUnregWords
public void setKeepUnregWords(boolean keepUnregWords)
-
getPropertieFile
public String getPropertieFile()
-
isKeepEnSecOriginalWord
public boolean isKeepEnSecOriginalWord()
-
setKeepEnSecOriginalWord
public void setKeepEnSecOriginalWord(boolean keepEnSecOriginalWord)
-
isKeepEnSegOriginalWord
public boolean isKeepEnSegOriginalWord()
-
setKeepEnSegOriginalWord
public void setKeepEnSegOriginalWord(boolean keepEnSegOriginalWord)
-
clone
public SegmenterConfig clone() throws CloneNotSupportedException
overrides the clone method
Overrides:
clone in class Object
Returns:
SegmenterConfig
Throws:
CloneNotSupportedException
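Because Object.clone() copies fields shallowly, an override on a config class that holds arrays (such as the lexicon paths) usually has to copy those arrays explicitly. The sketch below shows that pattern with a covariant return type. The class ConfigSketch and its fields are hypothetical; whether SegmenterConfig.clone() copies its array fields is not stated in this reference, and unlike the documented method, this sketch catches CloneNotSupportedException internally.

```java
// Hypothetical config class showing a clone() override with a covariant
// return type; the array field is copied so the clone does not share it.
class ConfigSketch implements Cloneable {
    int maxLength = 5;
    String[] lexPath = {"lexicon"};

    @Override
    public ConfigSketch clone() {
        try {
            ConfigSketch copy = (ConfigSketch) super.clone(); // field-by-field copy
            copy.lexPath = lexPath.clone();                   // do not share the array
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("Cloneable is implemented", e);
        }
    }

    public static void main(String[] args) {
        ConfigSketch original = new ConfigSketch();
        ConfigSketch copy = original.clone();
        copy.lexPath[0] = "custom";
        System.out.println(original.lexPath[0]); // lexicon
    }
}
```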