Class NGramSeg

  • All Implemented Interfaces:
    ISegment

    public class NGramSeg
    extends Object
    implements ISegment
    Jcseg n-gram tokenizer implementation
    Since:
    2.6.0
    Author:
    lionsoul
    • Field Detail

      • idx

        protected int idx
        the index into the current input stream, used mainly to track the start position of each token
      • wordPool

        protected final LinkedList<IWord> wordPool
        CJK word cache pool, Reusable string buffer
      • dic

        public final ADictionary dic
        the dictionary and task configuration
      • N

        protected byte N
        the N of the n-gram; defaults to 1, i.e. unigram
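        To make the role of N concrete, the following is a minimal sketch of plain character n-grams. It is not Jcseg's implementation: the class NGramSketch and the method ngrams are hypothetical names, and how NGramSeg treats input shorter than N is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

class NGramSketch {
    /** Returns every contiguous run of n chars in text (hypothetical helper, not Jcseg API). */
    static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> out = new ArrayList<>();
        // Slide a window of width n over the text, one char at a time.
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abcd", 1)); // [a, b, c, d]
        System.out.println(ngrams("abcd", 2)); // [ab, bc, cd]
    }
}
```

        With N = 1 every char becomes its own token; with N = 2 adjacent tokens overlap by one char.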
    • Constructor Detail

      • NGramSeg

        public NGramSeg​(SegmenterConfig config,
                        ADictionary dic)
        creates a new ISegment instance from the given configuration and dictionary
        Parameters:
        config - the segmenter configuration
        dic - the dictionary
    • Method Detail

      • getStreamPosition

        public int getStreamPosition()
        Description copied from interface: ISegment
        get the current length of the stream
        Specified by:
        getStreamPosition in interface ISegment
      • readNext

        protected int readNext()
                        throws IOException
        read the next char from the current position
        Returns:
        int
        Throws:
        IOException
      • pushBack

        protected void pushBack​(int data)
        push the data back onto the stream
        Parameters:
        data -
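        The read-then-push-back pattern behind readNext and pushBack can be sketched with the standard java.io.PushbackReader. This is an assumption for illustration only: this page does not say which reader NGramSeg wraps, and PushBackSketch and readLetters are hypothetical names.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushBackSketch {
    /** Reads one run of letters and pushes back the first non-letter (hypothetical helper). */
    static String readLetters(PushbackReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (!Character.isLetter(c)) {
                in.unread(c); // the look-ahead char belongs to the next token, so return it
                break;
            }
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        PushbackReader in = new PushbackReader(new StringReader("abc123"));
        System.out.println(readLetters(in));  // abc
        System.out.println((char) in.read()); // 1
    }
}
```

        Pushing the char back lets the tokenizer decide where a token ends by reading one char too far, without losing that char for the next token.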
      • streamResetTo

        protected void streamResetTo​(String str,
                                     int start)
        reset the data back from the specified position
      • getNextType

        protected String getNextType​(int c,
                                     int type,
                                     CharTypeFunction checker)
                              throws IOException
        common routine to get the next n-gram word for the specified char type. For basic Latin chars it automatically applies full-width to half-width and uppercase to lowercase conversion.
        Parameters:
        c -
        type -
        checker -
        Returns:
        String
        Throws:
        IOException
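        The Latin normalization mentioned above can be sketched as follows. This is a minimal illustration, not Jcseg's code: it relies on the Unicode layout in which the full-width forms U+FF01 to U+FF5E sit exactly 0xFEE0 above ASCII U+0021 to U+007E, and LatinNormalizeSketch and normalize are hypothetical names.

```java
class LatinNormalizeSketch {
    /** Maps a full-width ASCII variant to half-width, then lowercases A-Z (hypothetical helper). */
    static int normalize(int c) {
        if (c >= 0xFF01 && c <= 0xFF5E) {
            c -= 0xFEE0; // full-width form -> matching ASCII char
        }
        if (c >= 'A' && c <= 'Z') {
            c += 'a' - 'A'; // uppercase -> lowercase
        }
        return c;
    }

    /** Applies the per-char normalization to a whole string. */
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(normalize(cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        // "\uFF2A\uFF23" is full-width "JC"; "Seg" is ordinary ASCII.
        System.out.println(normalize("\uFF2A\uFF23Seg")); // jcseg
    }
}
```

        Width conversion runs first so that full-width capitals also reach the lowercase step.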
      • wordNewOrClone

        public IWord wordNewOrClone​(int t,
                                    String str,
                                    int type)
        checks whether the specified word exists in the specified dictionary; if it does, clones it, otherwise creates a new one.
        Parameters:
        t -
        str -
        type -
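        The lookup-then-clone-or-create behavior described above can be sketched with a plain map. Everything here is a hypothetical stand-in: Entry is not Jcseg's IWord, the map is not ADictionary, the t parameter is left out, and the per-token position field is only an assumed reason for copying a shared entry.

```java
import java.util.HashMap;
import java.util.Map;

class CloneOrCreateSketch {
    /** Hypothetical stand-in for a dictionary entry; not Jcseg's IWord. */
    static final class Entry implements Cloneable {
        final String value;
        final int type;
        int position = -1; // assumed per-token state, so shared entries are copied before use

        Entry(String value, int type) {
            this.value = value;
            this.type = type;
        }

        @Override
        public Entry clone() {
            try {
                return (Entry) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e); // cannot happen: Entry is Cloneable
            }
        }
    }

    /** Returns a copy of the dictionary entry if one exists, else a fresh entry. */
    static Entry newOrClone(Map<String, Entry> dict, String str, int type) {
        Entry found = dict.get(str);
        return (found != null) ? found.clone() : new Entry(str, type);
    }

    public static void main(String[] args) {
        Map<String, Entry> dict = new HashMap<>();
        dict.put("known", new Entry("known", 1));

        Entry a = newOrClone(dict, "known", 0);
        Entry b = newOrClone(dict, "unknown", 0);
        System.out.println(a != dict.get("known")); // true
        System.out.println(a.type);                 // 1
        System.out.println(b.type);                 // 0
    }
}
```

        Returning a copy keeps the dictionary entry untouched when the caller later sets token-specific state on the result.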
      • getN

        public byte getN()
      • setN

        public void setN​(byte n)