Class CharsetRecognizer

java.lang.Object
com.ibm.icu.text.CharsetRecognizer
Direct Known Subclasses:
CharsetRecog_2022, CharsetRecog_mbcs, CharsetRecog_sbcs, CharsetRecog_Unicode, CharsetRecog_UTF8

abstract class CharsetRecognizer extends Object
Abstract class for recognizing a single charset. Part of the implementation of ICU's CharsetDetector. Each specific charset that can be recognized will have an instance of some subclass of this class. All interaction between the overall CharsetDetector and the logic specific to an individual charset happens via the interface provided here. Instances of CharsetRecognizer DO NOT have or maintain state pertaining to a specific match or detect operation. They WILL be shared by multiple instances of CharsetDetector. They encapsulate constant, charset-specific information.
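The sharing contract described above can be illustrated with a minimal, self-contained sketch. Every type here (SharedRecognizerSketch, Recognizer, Match, Detector, and the two toy recognizers) is a hypothetical stand-in, not ICU's actual implementation, and the confidence values are arbitrary. The point is only that the recognizers hold no per-operation state, so one set of instances can serve any number of detectors, while each detector owns its own input.

```java
/** Illustrative only: hypothetical stand-ins, not ICU's actual classes. */
final class SharedRecognizerSketch {

    /** Result of a successful match (stand-in for a match object). */
    static final class Match {
        final String name;
        final int confidence;
        Match(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    /** Stateless recognizer: holds only constant, charset-specific logic. */
    abstract static class Recognizer {
        abstract String getName();
        String getLanguage() { return null; }   // default: language unknown
        /** Reads the input from the detector; returns null if there is no match. */
        abstract Match match(Detector det);
    }

    /** Toy recognizer: matches only when a UTF-8 byte order mark is present. */
    static final class Utf8BomRecognizer extends Recognizer {
        String getName() { return "UTF-8"; }
        Match match(Detector det) {
            byte[] in = det.input;
            boolean bom = in.length >= 3
                    && (in[0] & 0xFF) == 0xEF
                    && (in[1] & 0xFF) == 0xBB
                    && (in[2] & 0xFF) == 0xBF;
            return bom ? new Match(getName(), 100) : null;
        }
    }

    /** Toy recognizer: matches when every byte is 7-bit. */
    static final class AsciiRecognizer extends Recognizer {
        String getName() { return "US-ASCII"; }
        Match match(Detector det) {
            for (byte b : det.input) {
                if (b < 0) return null;          // high bit set: not ASCII
            }
            return new Match(getName(), 10);     // weak evidence, arbitrary score
        }
    }

    /** Detector: owns the per-operation state (the input bytes). */
    static final class Detector {
        // One set of recognizer instances shared by every Detector.
        private static final Recognizer[] SHARED = {
            new Utf8BomRecognizer(), new AsciiRecognizer()
        };
        final byte[] input;

        Detector(byte[] input) { this.input = input.clone(); }

        /** Returns the highest-confidence match, or null if nothing matched. */
        Match detect() {
            Match best = null;
            for (Recognizer r : SHARED) {
                Match m = r.match(this);
                if (m != null && (best == null || m.confidence > best.confidence)) {
                    best = m;
                }
            }
            return best;
        }
    }
}
```

Because the recognizers keep no fields of their own, two Detector instances created over different inputs can call detect() without interfering with each other.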
  • Constructor Details

    • CharsetRecognizer

      CharsetRecognizer()
  • Method Details

    • getName

      abstract String getName()
      Get the IANA name of this charset.
      Returns:
      the charset name.
    • getLanguage

      public String getLanguage()
      Get the ISO language code for this charset.
      Returns:
      the language code, or null if the language cannot be determined.
    • match

      abstract CharsetMatch match(CharsetDetector det)
      Test whether the input text, obtained via the CharsetDetector object, matches this charset.
      Parameters:
      det - The CharsetDetector, which contains the input text to be checked for being in this charset.
      Returns:
      A CharsetMatch object containing details of the match with this charset, or null if there was no match.
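
The return contract of match (a result object on success, null on no match) might look like the following sketch for a simplified UTF-8 check. This is an assumption-laden illustration, not the logic of CharsetRecog_UTF8: the class Utf8MatchSketch, its Result type, and the scoring thresholds are all hypothetical, and the check only verifies lead/trail byte structure (it does not reject overlong or out-of-range sequences).

```java
/** Illustrative only: a simplified structural UTF-8 check with made-up scoring. */
final class Utf8MatchSketch {

    /** Hypothetical result type standing in for a match object. */
    static final class Result {
        final String name;
        final int confidence;
        Result(String name, int confidence) {
            this.name = name;
            this.confidence = confidence;
        }
    }

    /** Returns a Result if the bytes look like UTF-8, or null if they do not. */
    static Result match(byte[] input) {
        int valid = 0;      // well-formed multi-byte sequences seen
        int invalid = 0;    // malformed lead bytes or broken sequences
        int i = 0;
        while (i < input.length) {
            int b = input[i] & 0xFF;
            i++;
            if (b < 0x80) continue;              // ASCII byte: no evidence either way
            int trail;
            if ((b & 0xE0) == 0xC0) trail = 1;       // 110xxxxx
            else if ((b & 0xF0) == 0xE0) trail = 2;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) trail = 3;  // 11110xxx
            else { invalid++; continue; }            // stray trail or illegal lead byte
            boolean ok = true;
            for (int k = 0; k < trail; k++) {
                // Each trail byte must have the form 10xxxxxx.
                if (i >= input.length || (input[i] & 0xC0) != 0x80) { ok = false; break; }
                i++;
            }
            if (ok) valid++; else invalid++;
        }
        if (invalid > 0) return null;                        // no match
        if (valid == 0) return new Result("UTF-8", 10);      // pure ASCII: weak evidence
        return new Result("UTF-8", Math.min(100, 50 + 10 * valid));
    }
}
```

A caller is expected to test the returned value for null before reading the name or confidence, mirroring the documented "null if there was no match" behavior.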