
python39-charset-normalizer-2.1.0-1.1 RPM for noarch

From openSUSE Tumbleweed for noarch

Name: python39-charset-normalizer
Distribution: openSUSE Tumbleweed
Version: 2.1.0
Vendor: openSUSE
Release: 1.1
Build date: Tue Jul 26 20:01:19 2022
Group: Unspecified
Build host: lamb51
Size: 324547
Source RPM: python-charset-normalizer-2.1.0-1.1.src.rpm
Summary: Python Universal Charset detector
Python Universal Charset detector.






* Tue Jul 19 2022 Dirk Müller <>
  - update to 2.1.0:
    * Output the Unicode table version when running the CLI with `--version`
    * Re-use decoded buffer for single byte character sets
    * Fixing some performance bottlenecks
    * Workaround potential bug in cpython with Zero Width No-Break Space located
      in Arabic Presentation Forms-B, Unicode 1.1 not acknowledged as space
    * CLI default threshold aligned with the API threshold from
    * Remove support for Python 3.5 (PR #192)
    * Deprecate use of the `unicodedata2` backport of unicodedata, as Python is
      quickly catching up; scheduled for removal in 3.0
* Tue Feb 15 2022 Dirk Müller <>
  - update to 2.0.12:
    * Fix ASCII misdetection in rare cases (PR #170)
    * Add explicit support for Python 3.11 (PR #164)
    * The logging behavior has been completely reviewed and now uses only TRACE
      and DEBUG levels
* Mon Jan 10 2022 Dirk Müller <>
  - update to 2.0.10:
    * Fallback match entries might lead to UnicodeDecodeError for large bytes
    * Skipping the language-detection (CD) on ASCII
* Mon Dec 06 2021 Dirk Müller <>
  - update to 2.0.9:
    * Moderating the logging impact (since 2.0.8) for specific
    * Wrong logging level applied when setting kwarg `explain` to True
* Mon Nov 29 2021 Dirk Müller <>
  - update to 2.0.8:
    * Improvement over Vietnamese detection
    * MD improvement on trailing data and long foreign (non-pure latin)
    * Efficiency improvements in cd/alphabet_languages
    * Call sum() without an intermediary list following PEP 289 recommendations
    * Code style as refactored by Sourcery-AI
    * Minor adjustment on the MD around European words
    * Remove and replace SRTs from assets / tests
    * Initialize the library logger with a `NullHandler` by default
    * Setting kwarg `explain` to True will add provisionally
    * Fix large (misleading) sequence giving UnicodeDecodeError
    * Avoid using too insignificant chunk
    * Add and expose function `set_logging_handler` to configure a specific
* Fri Nov 26 2021 Dirk Müller <>
  - require lower-case name instead of breaking build
* Thu Nov 25 2021 Matej Cepl <>
  - Use lower-case name of prettytable package
* Sun Oct 17 2021 Martin Hauke <>
  - Update to version 2.0.7
    * Addition: Add support for Kazakh (Cyrillic) language
    * Improvement: Further improve inferring the language
      from a given code page (single-byte).
    * Removed: Remove redundant logging entry about detected
    * Improvement: Refactoring for potential performance
      improvements in loops.
    * Improvement: Various detection improvements (MD+CD).
    * Bugfix: Fix a minor inconsistency between Python 3.5 and
      other versions regarding language detection.
  - Update to version 2.0.6
    * Bugfix: Unforeseen regression with the loss of
      backward compatibility with some older minor versions of Python 3.5.x.
    * Bugfix: Fix CLI crash when using --minimal output in
      certain cases.
    * Improvement: Minor improvement to the detection
      efficiency (less than 1%).
  - Update to version 2.0.5
    * Improvement: The BC-support with v1.x was improved,
      the old staticmethods are restored.
    * Remove: The project no longer raises a warning on tiny
      content given for detection; it is simply logged as a warning
    * Improvement: The Unicode detection is slightly
      improved, see #93
    * Bugfix: In some rare cases, the chunk extractor could cut
      in the middle of a multi-byte character and could mislead the
      mess detection.
    * Bugfix: Some rare 'space' characters could trip up the
      UnprintablePlugin/Mess detection.
    * Improvement: Add syntax sugar __bool__ for results
      CharsetMatches list-container.
  - Update to version 2.0.4
    * Improvement: Adjust the MD to lower the sensitivity,
      thus improving the global detection reliability.
    * Improvement: Allow fallback on specified encoding
      if any.
    * Bugfix: The CLI no longer raises an unexpected exception
      when no encoding has been found.
    * Bugfix: Fix accessing the 'alphabets' property when the
      payload contains surrogate characters.
    * Bugfix: The logger could mislead (explain=True) on
      detected languages and the impact of one MBCS match (in #72)
    * Bugfix: Submatch factoring could be wrong in rare edge
      cases (in #72)
    * Bugfix: Multiple files given to the CLI were ignored when
      publishing results to STDOUT. (After the first path) (in #72)
    * Internal: Fix line endings from CRLF to LF for certain
  - Update to version 2.0.3
    * Improvement: Part of the detection mechanism has been
      improved to be less sensitive, resulting in more accurate
      detection results. Especially ASCII. #63 Fix #62
    * Improvement: According to the community wishes, the
      detection will fall back on ASCII or UTF-8 in a last-resort
  - Update to version 2.0.2
    * Bugfix: Empty/Too small JSON payload misdetection fixed.
    * Improvement: Don't inject unicodedata2 into sys.modules
  - Update to version 2.0.1
    * Bugfix: Make it work where there isn't a filesystem
      available, dropping assets frequencies.json.
    * Improvement: You may now use aliases in cp_isolation
      and cp_exclusion arguments.
    * Bugfix: Using explain=False permanently disabled the verbose
      output in the current runtime #47
    * Bugfix: One log entry (language target preemptive) was not
      shown in logs when using explain=True #47
    * Bugfix: Fix undesired exception (ValueError) on getitem of
      instance CharsetMatches #52
    * Improvement: Public function normalize default args
      values were not aligned with from_bytes #53
  - Update to version 2.0.0
    * Performance: 4 to 5 times faster than the previous 1.4.0
    * Performance: At least 2x faster than Chardet.
    * Performance: Emphasis has been put on UTF-8 detection,
      which should be nearly instantaneous.
    * Improvement: The backward compatibility with Chardet has
      been greatly improved. The legacy detect function returns an
      identical charset name whenever possible.
    * Improvement: The detection mechanism has been slightly
      improved, now Turkish content is detected correctly (most of
      the time)
    * Code: The program has been rewritten to ease the
      readability and maintainability. (+Using static typing)
    * Tests: New workflows are now in place to verify the
      following aspects: Performance, Backward-Compatibility with
      Chardet, and Detection Coverage in addition to current
      tests. (+CodeQL)
    * Dependency: This package no longer requires
      anything when used with Python 3.5 (Dropped cached_property)
    * Docs: Performance claims have been updated, as have the guide
      to contributing and the issue template.
    * Improvement: Add --version argument to CLI
    * Bugfix: The CLI output used the relative path of the
      file(s). Should be absolute.
    * Deprecation: Methods coherence_non_latin, w_counter,
      chaos_secondary_pass of the class CharsetMatch are now
      deprecated and scheduled for removal in v3.0
    * Improvement: If no language was detected in content,
      trying to infer it using the encoding name/alphabets used.
    * Removal: Removed support for these languages: Catalan,
      Esperanto, Kazakh, Basque, Volapük, Azeri, Galician, Nynorsk,
      Macedonian, and Serbocroatian.
    * Improvement: utf_7 detection has been reinstated.
    * Removal: The exception hook on UnicodeDecodeError has
      been removed.
  - Update to version 1.4.1
    * Improvement: Logger configuration/usage no longer
      conflicts with others #44
  - Update to version 1.4.0
    * Dependency: Using standard logging instead
      of using the package loguru.
    * Dependency: Dropping nose test framework in
      favor of the maintained pytest.
    * Dependency: Choose to not use dragonmapper
      package to help with gibberish Chinese/CJK text.
    * Dependency: Require cached_property
      only for Python 3.5 due to constraint. Dropping for every
      other interpreter version.
    * Bugfix: BOM marker in a CharsetNormalizerMatch instance
      could be False in rare cases even if obviously present. Due
      to the sub-match factoring process.
    * Improvement: Return ASCII if given sequences fit.
    * Performance: Huge improvement on the largest payloads.
    * Change: Stop support for UTF-7 that does not contain a
      SIG. (Contributions are welcome to improve that point)
    * Feature: CLI now produces JSON consumable output.
    * Dependency: Dropping PrettyTable, replaced with pure JSON
    * Bugfix: Not searching properly for the BOM when trying
      utf32/16 parent codec.
    * Other: Improving the package final size by compressing
* Thu May 20 2021
  - version update to 1.3.9
    * Bugfix: In some very rare cases, you may end up getting encode/decode errors due to a bad bytes payload #40
    * Bugfix: Empty given payload for detection may cause an exception if trying to access the alphabets property. #39
    * Bugfix: The legacy detect function should return UTF-8-SIG if sig is present in the payload. #38
* Tue Feb 09 2021 John Vandenberg <>
  - Switch to PyPI source
  - Add Suggests: python-unicodedata2
  - Remove executable bit from charset_normalizer/assets/frequencies.json
  - Update to v1.3.6
    * Allow prettytable 2.0
  - from v1.3.5
    * Dependencies refactor and add support for py 3.9 and 3.10
    * Fix version parsing
* Mon May 25 2020 Petr Gajdos <>
  - %python3_only -> %python_alternative
* Mon Jan 27 2020 Marketa Calabkova <>
  - Update to 1.3.4
    * Improvement/Bugfix : False positive when searching for successive upper, lower char. (ProbeChaos)
    * Improvement : Noticeably better detection for jp
    * Bugfix : Passing zero-length bytes to from_bytes
    * Improvement : Expose version in package
    * Bugfix : Division by zero
    * Improvement : Prefers unicode (utf-8) when detected
    * Apparently dropped Python2 silently
* Fri Oct 04 2019 Marketa Calabkova <>
  - Update to 1.3.0
    * Backport unicodedata for v12 impl into python if available
    * Add aliases to CharsetNormalizerMatches class
    * Add feature preemptive behaviour, looking for encoding declaration
    * Add method to determine if specific encoding is multi byte
    * Add has_submatch property on a match
    * Add percent_chaos and percent_coherence
    * Coherence ratio based on mean instead of sum of best results
    * Using loguru for trace/debug <3
    * from_byte method improved
* Thu Sep 26 2019 Tomáš Chvátal <>
  - Update to 1.1.1:
    * from_bytes parameters steps and chunk_size were not adapted to sequence len if provided values were not fitted to content
    * Sequence having length below 10 chars was not checked
    * Legacy detect method inspired by chardet was not returning
    * Various more test updates
* Fri Sep 13 2019 Tomáš Chvátal <>
  - Update to 0.3:
    * Improvement on detection
    * Performance loss to expect
    * Added --threshold option to CLI
    * Bugfix on UTF 7 support
    * Legacy detect(byte_str) method
    * BOM support (Unicode mostly)
    * Chaos prober improved on small text
    * Language detection has been reviewed to give better result
    * Bugfix on jp detection, every jp text was considered chaotic
* Fri Aug 30 2019 Tomáš Chvátal <>
  - Fix the tarball to really be the one published by upstream
* Wed Aug 28 2019 John Vandenberg <>
  - Initial spec for v0.1.8
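
Several changelog entries above concern logging: the default `NullHandler` added in 2.0.8, the `explain` kwarg, and `set_logging_handler`. The sketch below illustrates the general pattern those entries describe, using only the Python standard library. It is not the package's implementation: the logger name `example_detector` and the helpers `attach_stream_handler` and `detect_stub` are hypothetical names chosen for this illustration.

```python
import io
import logging

# Hypothetical library name, used only for this illustration.
LIBRARY_LOGGER_NAME = "example_detector"

# Library side: attach a NullHandler so the library stays silent unless
# the application explicitly opts in to its log output.
library_logger = logging.getLogger(LIBRARY_LOGGER_NAME)
library_logger.addHandler(logging.NullHandler())


def attach_stream_handler(stream, level=logging.DEBUG):
    """Hypothetical opt-in helper: send the library's records to `stream`."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s | %(message)s"))
    library_logger.addHandler(handler)
    library_logger.setLevel(level)
    return handler


def detect_stub(payload: bytes) -> str:
    """Stand-in for a detection routine: logs one record, returns a fixed name."""
    library_logger.debug("inspecting %d bytes", len(payload))
    return "utf_8"


if __name__ == "__main__":
    detect_stub(b"silent")           # only the NullHandler is attached here
    buffer = io.StringIO()
    attach_stream_handler(buffer)    # the application opts in
    detect_stub(b"abc")
    print(buffer.getvalue(), end="")
```

With this arrangement an application that never configures logging sees nothing from the library, while one that wants the diagnostic trace attaches a handler of its choice.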



Generated by rpm2html 1.8.1

Fabrice Bellet, Sat Aug 20 23:23:18 2022