Class NLPSeg

  • All Implemented Interfaces:
    Serializable, ISegment

    public class NLPSeg
    extends ComplexSeg
    NLP segmentation implementation. It inherits all the behavior of ComplexSeg; the additional methods are built for NLP only.
    Author:
    chenxin
    See Also:
    Serialized Form
    • Method Detail

      • getNextTheWord

        protected IWord getNextTheWord(IWord word)
                                throws IOException
        Gets the next ordinal ("the x-th") word, such as '第x个' (the x-th item) or '第x集' (episode x).
        Parameters:
        word -
        Returns:
        IWord
        Throws:
        IOException
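The ordinal pattern described above can be illustrated with a minimal, standalone sketch. It is not the NLPSeg implementation: the class `OrdinalSketch`, the method `matchOrdinal`, and the regular expression are all hypothetical, and the sketch assumes the pattern is "第", followed by Arabic digits or Chinese numerals, followed by a single Han character acting as the measure word.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the documented API.
class OrdinalSketch {
    // Assumed shape: "第" + (Arabic digits | Chinese numerals) + one Han measure character.
    private static final Pattern ORDINAL =
        Pattern.compile("第([0-9]+|[零一二三四五六七八九十百千万两]+)\\p{IsHan}");

    // Returns the ordinal word starting exactly at pos, or null if none starts there.
    static String matchOrdinal(String text, int pos) {
        Matcher m = ORDINAL.matcher(text);
        m.region(pos, text.length());
        return m.lookingAt() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(matchOrdinal("看第3集吧", 1));   // 第3集
        System.out.println(matchOrdinal("第十个人", 0));    // 第十个
        System.out.println(matchOrdinal("第一", 0));        // null (no measure word follows)
    }
}
```

The real method works on IWord instances and the segmenter's input stream rather than on a plain string, so treat this only as a picture of the kind of token being recognized.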
      • getNextTimeMergedWord

        protected IWord getNextTimeMergedWord(IWord word,
                                              int eIdx)
                                       throws IOException
        Gets and returns the next time-merged date-time word.
        Parameters:
        word -
        eIdx -
        Returns:
        IWord
        Throws:
        IOException
      • getNextDatetimeWord

        protected IWord getNextDatetimeWord(IWord word,
                                            int entityIdx)
                                     throws IOException
        Gets and returns the next date-time word.
        Parameters:
        word -
        entityIdx -
        Returns:
        IWord
        Throws:
        IOException
      • getNumericUnitComposedWord

        public IWord getNumericUnitComposedWord(int numeric,
                                                IWord unitWord)
      • nextLatinWord

        protected IWord nextLatinWord(int c,
                                      int pos)
                               throws IOException
        Finds the next letter-or-digit word, scanning from the current position until a whitespace character or a character that is neither a letter nor a digit is reached.
        Overrides:
        nextLatinWord in class Segmenter
        Parameters:
        c -
        pos -
        Returns:
        IWord
        Throws:
        IOException
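The scanning rule described for nextLatinWord can be sketched on a plain string as follows. This is an illustrative assumption, not the library's code: `LatinScanSketch`, `scanLatin`, and `isLatinLetterOrDigit` are hypothetical names, and the sketch assumes "letter or digit" means ASCII letters and digits only. The real method reads from the segmenter's input and returns an IWord.

```java
// Hypothetical helper for illustration only; not part of the documented API.
class LatinScanSketch {
    // Assumption: only ASCII letters and digits count as "Latin" word characters.
    static boolean isLatinLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9');
    }

    // Collects characters from pos until whitespace or any other
    // non-letter, non-digit character (or the end of the text) is reached.
    static String scanLatin(String text, int pos) {
        int end = pos;
        while (end < text.length() && isLatinLetterOrDigit(text.charAt(end))) {
            end++;
        }
        return text.substring(pos, end);
    }

    public static void main(String[] args) {
        System.out.println(scanLatin("jcseg2 分词", 0)); // jcseg2
        System.out.println(scanLatin("abc,def", 4));     // def
    }
}
```

Restricting the check to ASCII keeps CJK characters out of the Latin word; `Character.isLetterOrDigit` would also accept Han characters, which is why the sketch avoids it.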