Class IntTrieBuilder

java.lang.Object
com.ibm.icu.impl.TrieBuilder
com.ibm.icu.impl.IntTrieBuilder

public class IntTrieBuilder extends TrieBuilder
Builder class to manipulate and generate a trie. This is useful for ICU data in primitive types. A trie provides a compact way to store information that is indexed by Unicode values, such as character properties, types, keyboard values, etc. It is most useful when a block of Unicode data contains significant values while the rest of the Unicode data is unused in the application, or when the data is highly redundant, such as where all 21,000 Han ideographs have the same value. Despite the compact storage, lookup is much faster than in a hash table. A trie of any primitive data type serves two purposes:
  • Fast access of the indexed values.
  • Smaller memory footprint.
This is a direct port of the ICU4C version.
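The compact-storage idea described above can be illustrated with a toy two-stage table. This is only a minimal sketch, not the ICU implementation: the class name `ToyTwoStageTable`, the block size of 32, and the rule that block 0 is the shared "initial value" block are all assumptions made for illustration.

```java
// Toy two-stage lookup table. NOT the ICU implementation; all names are hypothetical.
class ToyTwoStageTable {
    static final int SHIFT = 5;                   // assumed: 32 values per data block
    static final int BLOCK_LENGTH = 1 << SHIFT;
    static final int MASK = BLOCK_LENGTH - 1;

    private final int[] index;  // one entry per block of code points: offset into data
    private int[] data;         // block 0 is shared by every untouched block
    private int dataLength;

    ToyTwoStageTable(int maxCodePoint, int initialValue) {
        index = new int[(maxCodePoint >> SHIFT) + 1]; // all zeros: everything maps to block 0
        data = new int[BLOCK_LENGTH];
        java.util.Arrays.fill(data, initialValue);
        dataLength = BLOCK_LENGTH;
    }

    // Lookup is two array reads: no hashing and no collision handling.
    int get(int ch) {
        return data[index[ch >> SHIFT] + (ch & MASK)];
    }

    void set(int ch, int value) {
        int block = index[ch >> SHIFT];
        if (block == 0) {
            // Still sharing block 0: give this range its own copy before writing.
            block = dataLength;
            data = java.util.Arrays.copyOf(data, dataLength + BLOCK_LENGTH);
            System.arraycopy(data, 0, data, block, BLOCK_LENGTH);
            dataLength += BLOCK_LENGTH;
            index[ch >> SHIFT] = block;
        }
        data[block + (ch & MASK)] = value;
    }

    int dataLength() {
        return dataLength;
    }
}
```

In this sketch, setting a single value allocates one 32-entry block, while every other code point keeps reading the shared initial-value block.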
  • Field Details

    • m_data_

      protected int[] m_data_
    • m_initialValue_

      protected int m_initialValue_
    • m_leadUnitValue_

      private int m_leadUnitValue_
  • Constructor Details

    • IntTrieBuilder

      public IntTrieBuilder(IntTrieBuilder table)
      Copy constructor
    • IntTrieBuilder

      public IntTrieBuilder(int[] aliasdata, int maxdatalength, int initialvalue, int leadunitvalue, boolean latin1linear)
      Constructs a build table
      Parameters:
      aliasdata - data to be filled into table
      maxdatalength - maximum data length allowed in table
      initialvalue - initial data value
      leadunitvalue - value for lead surrogate code units
      latin1linear - is latin 1 to be linear
  • Method Details

    • getValue

      public int getValue(int ch)
      Gets a 32-bit data value from the table data
      Parameters:
      ch - code point for which data is to be retrieved
      Returns:
      the 32-bit data value
    • getValue

      public int getValue(int ch, boolean[] inBlockZero)
      Gets a 32-bit data value from the table data
      Parameters:
      ch - code point for which data is to be retrieved
      inBlockZero - output parameter; inBlockZero[0] is set to true if the code point maps into block zero, otherwise false
      Returns:
      the 32-bit data value
    • setValue

      public boolean setValue(int ch, int value)
      Sets a 32-bit data value in the table data
      Parameters:
      ch - code point for which data is to be set
      value - value to set
      Returns:
      true if the set is successful; false if the table has already been compacted
    • serialize

      public IntTrie serialize(TrieBuilder.DataManipulate datamanipulate, Trie.DataManipulate triedatamanipulate)
      Serializes the build table with 32-bit data
      Parameters:
      datamanipulate - builder raw fold method implementation
      triedatamanipulate - result trie fold method
      Returns:
      a new trie
    • serialize

      public int serialize(OutputStream os, boolean reduceTo16Bits, TrieBuilder.DataManipulate datamanipulate) throws IOException
      Serializes the build table to an output stream. Compacts the build-time trie after all values are set, and then writes the serialized form onto an output stream. After this, the build-time trie can only be serialized again and/or closed; no further values can be added. This function is the rough equivalent of utrie_serialize() in ICU4C.
      Parameters:
      os - the output stream to which the serialized trie will be written. If null, the function still returns the size of the serialized trie.
      reduceTo16Bits - If true, reduce the data size to 16 bits. The resulting serialized form can then be used to create a CharTrie.
      datamanipulate - builder raw fold method implementation
      Returns:
      the number of bytes written to the output stream.
      Throws:
      IOException
    • setRange

      public boolean setRange(int start, int limit, int value, boolean overwrite)
      Sets a value in a range of code points [start, limit). All code points c with start <= c < limit will get the value if overwrite is true or if the old value is the initial value.
      Parameters:
      start - the first code point to get the value
      limit - one past the last code point to get the value
      value - the value
      overwrite - flag for whether old non-initial values are to be overwritten
      Returns:
      false if a failure occurred (illegal argument or data array overrun)
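The half-open range and the overwrite flag can be sketched on a flat array. This is a minimal illustration, not the builder's code: the class `RangeFillSketch`, its parameters, and the reading that an "unset" slot is one that still holds the initial value (as the overwrite parameter description suggests) are assumptions.

```java
// Hypothetical sketch of half-open range filling with an overwrite flag.
class RangeFillSketch {
    // Fills values[start..limit) with value. Returns false on an illegal range,
    // loosely mirroring the documented "false if a failure occurred" contract.
    static boolean setRange(int[] values, int initialValue,
                            int start, int limit, int value, boolean overwrite) {
        if (start < 0 || limit > values.length || start > limit) {
            return false;
        }
        for (int c = start; c < limit; c++) {   // limit itself is NOT written
            // Assumption: a slot counts as unset while it still holds the initial value.
            if (overwrite || values[c] == initialValue) {
                values[c] = value;
            }
        }
        return true;
    }
}
```

With overwrite set to false, a second overlapping call only fills the slots the first call left untouched; with overwrite set to true, it replaces everything in the range.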
    • allocDataBlock

      private int allocDataBlock()
    • getDataBlock

      private int getDataBlock(int ch)
      No error checking for illegal arguments.
      Parameters:
      ch - codepoint to look for
      Returns:
      -1 if no new data block available (out of memory in data array)
    • compact

      private void compact(boolean overlap)
      Compact a folded build-time trie.
      The compaction
        - removes blocks that are identical with earlier ones
        - overlaps adjacent blocks as much as possible (if overlap == true)
        - moves blocks in steps of the data granularity
        - moves and overlaps blocks that overlap with multiple values in the overlap region
      It does not
        - try to move and overlap blocks that are not already adjacent
      Parameters:
      overlap - flag
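The first compaction step, removing blocks that are identical with earlier ones, might look roughly like the following. This is a simplified sketch under stated assumptions: it handles only whole, aligned blocks, does no overlapping, and the class `CompactSketch` with its index and data layout is hypothetical rather than the builder's actual structure.

```java
// Hypothetical sketch: drop data blocks that duplicate an earlier block.
class CompactSketch {
    // index[i] holds the data offset of block i (a multiple of blockLength).
    // Rewrites index and data in place and returns the new data length.
    static int compact(int[] index, int[] data, int dataLength, int blockLength) {
        int[] map = new int[dataLength / blockLength]; // old block number -> new offset
        int newLength = 0;
        for (int start = 0; start < dataLength; start += blockLength) {
            int found = -1;
            // Search the blocks already kept for an identical one.
            for (int b = 0; b < newLength && found < 0; b += blockLength) {
                boolean same = true;
                for (int i = 0; i < blockLength; i++) {
                    if (data[b + i] != data[start + i]) {
                        same = false;
                        break;
                    }
                }
                if (same) {
                    found = b;
                }
            }
            if (found < 0) {
                // Unique so far: move it down to the end of the kept area.
                System.arraycopy(data, start, data, newLength, blockLength);
                found = newLength;
                newLength += blockLength;
            }
            map[start / blockLength] = found;
        }
        for (int i = 0; i < index.length; i++) {
            index[i] = map[index[i] / blockLength];
        }
        return newLength;
    }
}
```

After such a pass, several index entries may point at the same data block, which is where the memory saving comes from.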
    • findSameDataBlock

      private static final int findSameDataBlock(int[] data, int dataLength, int otherBlock, int step)
      Find the same data block
      Parameters:
      data - array
      dataLength -
      otherBlock -
      step -
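Since the parameters above are undocumented, the following is only one plausible reading of such a search: scan the first dataLength entries in increments of step for a run equal to the block starting at otherBlock. The class `BlockSearchSketch`, the explicit blockLength parameter, and the -1 "not found" result are assumptions for illustration.

```java
// Hypothetical sketch of searching for an earlier block with identical contents.
class BlockSearchSketch {
    // Returns the offset of the first match in data[0..dataLength), or -1 if none.
    static int findSameBlock(int[] data, int dataLength, int otherBlock,
                             int blockLength, int step) {
        // Never compare a candidate that would run past dataLength.
        int last = dataLength - blockLength;
        for (int block = 0; block <= last; block += step) {
            boolean same = true;
            for (int i = 0; i < blockLength; i++) {
                if (data[block + i] != data[otherBlock + i]) {
                    same = false;
                    break;
                }
            }
            if (same) {
                return block;
            }
        }
        return -1;
    }
}
```

A smaller step finds matches at more positions (for example, one that starts in the middle of an earlier block), while a larger step only considers coarser-aligned candidates.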
    • fold

      private final void fold(TrieBuilder.DataManipulate manipulate)
      Fold the normalization data for supplementary code points into a compact area on top of the BMP-part of the trie index, with the lead surrogates indexing this compact area. Duplicate the index values for lead surrogates: From inside the BMP area, where some may be overridden with folded values, to just after the BMP area, where they can be retrieved for code point lookups.
      Parameters:
      manipulate - fold implementation
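Folding relies on the fact that each lead surrogate stands for a contiguous run of 1024 supplementary code points, so per-lead-surrogate index entries can address the folded area. The sketch below shows only that arithmetic using standard `java.lang.Character` methods; the class `LeadSurrogateSketch` is hypothetical and does not reproduce the builder's index layout.

```java
// Hypothetical sketch: the supplementary range covered by one lead surrogate.
class LeadSurrogateSketch {
    // First supplementary code point whose UTF-16 form starts with this lead surrogate.
    static int firstCodePointFor(char lead) {
        return Character.toCodePoint(lead, (char) 0xDC00); // pair with the lowest trail
    }

    // One past the last code point covered by this lead surrogate.
    static int limitCodePointFor(char lead) {
        return firstCodePointFor(lead) + 0x400; // 1024 trail surrogates per lead
    }

    // Lead surrogate under which a supplementary code point would be folded.
    static char leadFor(int supplementaryCodePoint) {
        return Character.highSurrogate(supplementaryCodePoint);
    }
}
```

For example, U+1F600 has lead surrogate 0xD83D, which covers the code points from U+1F400 up to but not including U+1F800.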
    • fillBlock

      private void fillBlock(int block, int start, int limit, int value, boolean overwrite)