Class SegmenterConfig

    • Field Detail

      • MAX_LENGTH

        public int MAX_LENGTH
        maximum word length for maximum matching (5-7)
      • MAX_LATIN_LENGTH

        public int MAX_LATIN_LENGTH
        maximum length for Latin words
      • MAX_UNIT_LENGTH

        public int MAX_UNIT_LENGTH
        maximum length for unit words, used by the NLP algorithm (added on 2016/11/18)
      • I_CN_NAME

        public boolean I_CN_NAME
        whether to identify Chinese names
      • MAX_CN_LNADRON

        public int MAX_CN_LNADRON
        the maximum length of the adron of a Chinese last name, like the “老” in 老陈
      • LOAD_CJK_PINYIN

        public boolean LOAD_CJK_PINYIN
        whether to load the Pinyin of the CJK_WORDS
      • APPEND_CJK_PINYIN

        public boolean APPEND_CJK_PINYIN
        append the Pinyin to the result
      • LOAD_CJK_POS

        public boolean LOAD_CJK_POS
        whether to load the word's part of speech
      • APPEND_PART_OF_SPEECH

        public boolean APPEND_PART_OF_SPEECH
        append the part of speech.
      • LOAD_CJK_SYN

        public boolean LOAD_CJK_SYN
        whether to load the synonym word of the CJK_WORDS.
      • APPEND_CJK_SYN

        public boolean APPEND_CJK_SYN
        append the synonym words to the result.
      • LOAD_CJK_ENTITY

        public boolean LOAD_CJK_ENTITY
        whether to load the entity definitions
      • APPEND_CJK_ENTITY

        public boolean APPEND_CJK_ENTITY
        whether to perform entity recognition
      • LOAD_PARAMETER

        public boolean LOAD_PARAMETER
        whether to load the user-defined parameters
      • NAME_SINGLE_THRESHOLD

        public int NAME_SINGLE_THRESHOLD
        the threshold for keeping a single character as a standalone word when it and the last character of a name could together make up a word.
      • PPT_MAX_LENGTH

        public int PPT_MAX_LENGTH
        the maximum length of the text between paired punctuation marks.
      • CLEAR_STOPWORD

        public boolean CLEAR_STOPWORD
        whether to remove stop words.
      • CNNUM_TO_ARABIC

        public boolean CNNUM_TO_ARABIC
        whether to convert Chinese numerals to Arabic numerals.
      • CNFRA_TO_ARABIC

        public boolean CNFRA_TO_ARABIC
        whether to convert Chinese fractions to Arabic fractions.
      • EN_SECOND_SEG

        public boolean EN_SECOND_SEG
        whether to perform a secondary split of complex Latin tokens, based on the type of their characters
      • EN_SEC_MIN_LEN

        public int EN_SEC_MIN_LEN
        minimum length of a word produced by the secondary segmentation
      • EN_MAX_LEN

        public int EN_MAX_LEN
        maximum/minimum match length for English word extraction
      • EN_WORD_SEG

        public boolean EN_WORD_SEG
        whether to perform English word extraction
      • KEEP_UNREG_WORDS

        public boolean KEEP_UNREG_WORDS
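
To make the EN_SECOND_SEG and EN_SEC_MIN_LEN fields above more concrete, here is a minimal sketch of a secondary split by character type. It is an illustration only and not the library's implementation: the class SecondarySplitSketch, the CharType enum and the rule that drops parts shorter than the minimum length are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class SecondarySplitSketch {

    // Hypothetical character classes used only by this sketch.
    private enum CharType { LETTER, DIGIT, OTHER }

    private static CharType typeOf(char c) {
        if (Character.isLetter(c)) return CharType.LETTER;
        if (Character.isDigit(c)) return CharType.DIGIT;
        return CharType.OTHER;
    }

    /**
     * Splits a token into runs of same-typed characters and keeps only the
     * letter or digit runs that are at least minLen characters long.
     * minLen plays the role that EN_SEC_MIN_LEN is assumed to play.
     */
    public static List<String> split(String token, int minLen) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= token.length(); i++) {
            // A run ends at the end of the token or where the character type changes.
            boolean boundary = i == token.length()
                    || typeOf(token.charAt(i)) != typeOf(token.charAt(start));
            if (boundary) {
                String run = token.substring(start, i);
                if (typeOf(run.charAt(0)) != CharType.OTHER && run.length() >= minLen) {
                    parts.add(run);
                }
                start = i;
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Splits the mixed token into its letter run and its digit run.
        System.out.println(split("qq2013", 2));
    }
}
```

With a minimum length of 1, short runs such as a single digit are kept as well; raising the value filters them out.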
    • Constructor Detail

      • SegmenterConfig

        public SegmenterConfig()
        create the config without initializing it. Note: this may cause incompatibility problems for older code that already uses this constructor
        Since:
        1.9.8
      • SegmenterConfig

        public SegmenterConfig(boolean autoLoad)
        create and initialize the config by autoload
        Parameters:
        autoLoad -
      • SegmenterConfig

        public SegmenterConfig(String proFile)
        create and initialize the task config from a properties file
        Parameters:
        proFile -
      • SegmenterConfig

        public SegmenterConfig(InputStream is)
        create and initialize the task config from an InputStream
        Parameters:
        is -
    • Method Detail

      • load

        public void load(String proFile)
                  throws IOException
        initialize the option values from the specified jcseg.properties file
        Parameters:
        proFile -
        Throws:
        IOException
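
As a rough illustration of how option values can be read from a properties file, the sketch below uses only java.util.Properties. It does not show the internals of load: the class PropertiesLoadSketch, its helper methods, the key names jcseg.maxlen and jcseg.clearstopword, and the convention that "1" means true are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class PropertiesLoadSketch {

    /** Reads a properties stream and rethrows IOException as an unchecked exception. */
    public static Properties read(InputStream is) {
        Properties props = new Properties();
        try {
            props.load(is);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Convenience helper that parses properties from an in-memory string. */
    public static Properties fromString(String text) {
        return read(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
    }

    /** Returns the integer value of a key, or the fallback when the key is absent. */
    public static int intOption(Properties props, String key, int fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : Integer.parseInt(value.trim());
    }

    /** Treats "1" as true and any other value as false (an assumed convention). */
    public static boolean flagOption(Properties props, String key, boolean fallback) {
        String value = props.getProperty(key);
        return value == null ? fallback : "1".equals(value.trim());
    }

    public static void main(String[] args) {
        // Hypothetical file content; the key names are assumptions.
        Properties props = fromString("jcseg.maxlen = 7\njcseg.clearstopword = 0\n");
        int maxLength = intOption(props, "jcseg.maxlen", 5);
        boolean clearStopword = flagOption(props, "jcseg.clearstopword", false);
        System.out.println(maxLength + " " + clearStopword);
    }
}
```

Supplying a fallback for every key keeps the sketch usable with a partial file; whether the real class applies defaults in the same way is not stated here.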
      • autoLoad

        public void autoLoad()
                      throws IOException
        initialize the option values by automatically searching for the jcseg.properties file in the following order:

        1. The directory in which jcseg-core-{version}.jar is located, that is, beside the jar file.

        2. The root of the classpath.

          • You can manually put the file into the root classpath (outside of any jar file).
          • Otherwise, the copy of the file inside jcseg-core-{version}.jar is used.

        3. The directory given by the system property "user.home".

        Throws:
        IOException
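
The search order described for autoLoad can be sketched as a "first match wins" lookup over an ordered list of directories. This is a simplified illustration under stated assumptions, not the library's code: the class ConfigSearchSketch and the method firstMatch are hypothetical, the jar directory "lib" is made up, and the classpath step is reduced to a plain directory (a real classpath lookup would go through the class loader).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ConfigSearchSketch {

    static final String FILE_NAME = "jcseg.properties";

    /** Returns the config file from the first directory, in order, that contains it. */
    public static Optional<Path> firstMatch(List<Path> directories) {
        for (Path dir : directories) {
            Path candidate = dir.resolve(FILE_NAME);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Path> searchOrder = Arrays.asList(
                Paths.get("lib"),                           // hypothetical directory of the jar
                Paths.get("."),                             // stand-in for the classpath root
                Paths.get(System.getProperty("user.home"))  // the user's home directory
        );
        Optional<Path> found = firstMatch(searchOrder);
        System.out.println(found.map(Path::toString).orElse("no jcseg.properties found"));
    }
}
```

Because the list is ordered, a file placed beside the jar shadows one in the home directory, which matches the priority given in the numbered steps.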