The NetBSD Project

CVS log for pkgsrc/converters/py-charset-normalizer/distinfo


Keyword substitution: kv
Default branch: MAIN


Revision 1.19
Thu Oct 10 09:58:01 2024 UTC (3 weeks, 6 days ago) by adam
Branches: MAIN
CVS tags: HEAD
Changes since revision 1.18: +4 -4 lines
py-charset-normalizer: updated to 3.4.0

3.4.0

Added
- Argument `--no-preemptive` in the CLI to prevent the detector from searching for hints.
- Support for Python 3.13

Fixed
- Relax the TypeError exception thrown when trying to compare a CharsetMatch with anything other than a CharsetMatch.
- Improved the general reliability of the detector based on user feedback.
- Declared charset in content (preemptive detection) not changed when converting to utf-8 bytes.
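
The relaxed comparison noted under Fixed can be pictured with a small stand-in class. This is a sketch only: `Match` and its `encoding` attribute are hypothetical names, not the library's `CharsetMatch` implementation, and returning `NotImplemented` is simply one common Python idiom for not raising on foreign types.

```python
class Match:
    """Hypothetical stand-in for a detection result; not the library's class."""

    def __init__(self, encoding: str) -> None:
        self.encoding = encoding

    def __eq__(self, other: object):
        if not isinstance(other, Match):
            # Defer to Python instead of raising TypeError: "==" then
            # falls back to identity comparison and evaluates to False.
            return NotImplemented
        return self.encoding == other.encoding

    def __hash__(self) -> int:
        # Defining __eq__ disables the default hash, so restore one.
        return hash(self.encoding)


same = Match("utf_8") == Match("utf_8")
different_type = Match("utf_8") == 42
```

With this pattern, comparing against an unrelated object yields False rather than an exception.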

Revision 1.18
Wed Nov 1 09:14:56 2023 UTC (12 months ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2024Q3-base, pkgsrc-2024Q3, pkgsrc-2024Q2-base, pkgsrc-2024Q2, pkgsrc-2024Q1-base, pkgsrc-2024Q1, pkgsrc-2023Q4-base, pkgsrc-2023Q4
Changes since revision 1.17: +4 -4 lines
py-charset-normalizer: updated to 3.3.2

3.3.2
Fixed
- Unintentional memory usage regression when using large payloads that match several encodings
- Regression on some detection cases showcased in the documentation

Revision 1.17
Mon Oct 23 07:56:04 2023 UTC (12 months, 2 weeks ago) by adam
Branches: MAIN
Changes since revision 1.16: +4 -4 lines
py-charset-normalizer: updated to 3.3.1

3.3.1
Changed
- Optional mypyc compilation upgraded to version 1.6.1 for Python >= 3.8
- Improved the general detection reliability based on reports from the community

Revision 1.16
Sat Sep 30 17:16:30 2023 UTC (13 months, 1 week ago) by adam
Branches: MAIN
Changes since revision 1.15: +4 -4 lines
py-charset-normalizer: updated to 3.3.0

3.3.0

Added
- Allow executing the CLI (e.g. normalizer) through `python -m charset_normalizer.cli` or `python -m charset_normalizer`
- Support for 9 forgotten encodings that are supported by Python but unlisted in `encodings.aliases` as they have no alias

Removed
- (internal) Redundant utils.is_ascii function and unused function is_private_use_only
- (internal) charset_normalizer.assets is moved inside charset_normalizer.constant

Changed
- (internal) Unicode code blocks in constants are updated using the latest v15.0.0 definition to improve detection
- Optional mypyc compilation upgraded to version 1.5.1 for Python >= 3.7

Fixed
- Unable to properly sort CharsetMatch when both chaos/noise and coherence were close due to an unreachable condition in `__lt__`

Revision 1.15
Sat Jul 8 04:35:31 2023 UTC (16 months ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2023Q3-base, pkgsrc-2023Q3
Changes since revision 1.14: +4 -4 lines
py-charset-normalizer: updated to 3.2.0

3.2.0

Changed
- Typehint for function `from_path` no longer enforces `PathLike` as its first argument
- Minor improvement to the overall detection reliability

Added
- Introduce function `is_binary` that relies on main capabilities, and optimized to detect binaries
- Propagate the `enable_fallback` argument throughout `from_bytes`, `from_path`, and `from_fp`, which allows deeper control over the detection (default True)
- Explicit support for Python 3.12

Fixed
- Edge case detection failure where a file would contain a 'very-long' camel-cased word

Revision 1.14
Mon Apr 24 10:30:04 2023 UTC (18 months, 2 weeks ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2023Q2-base, pkgsrc-2023Q2
Changes since revision 1.13: +4 -4 lines
py-charset-normalizer: updated to 3.1.0

3.1.0

Added
- Argument `should_rename_legacy` for the legacy function `detect`, which now disregards any new arguments without raising errors

Removed
- Support for Python 3.6

Changed
- Optional speedup provided by mypy/c 1.0.1

Revision 1.13
Fri Nov 18 18:50:29 2022 UTC (23 months, 2 weeks ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2023Q1-base, pkgsrc-2023Q1, pkgsrc-2022Q4-base, pkgsrc-2022Q4
Changes since revision 1.12: +4 -4 lines
py-charset-normalizer: updated to 3.0.1

3.0.1 (2022-11-18)

Fixed
- Multi-byte cutter/chunk generator did not always cut correctly

Changed
- Speedup provided by mypy/c 0.990 on Python >= 3.7
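
The chunk-cutting problem named under Fixed is easy to reproduce with the standard library alone. The sketch below is not the library's cutter; `iter_decoded_chunks` is a hypothetical helper that uses an incremental decoder, a standard technique that buffers a partial multi-byte sequence until the next chunk arrives.

```python
import codecs
from typing import Iterator


def iter_decoded_chunks(payload: bytes, encoding: str, chunk_size: int) -> Iterator[str]:
    """Decode fixed-size byte chunks without splitting multi-byte characters."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for start in range(0, len(payload), chunk_size):
        # Incomplete trailing bytes are kept inside the decoder.
        text = decoder.decode(payload[start:start + chunk_size])
        if text:
            yield text
    # Flush: raises if the payload ends in the middle of a character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


payload = "héllo wörld".encode("utf-8")

# A naive cut after two bytes lands inside the two-byte sequence for "é".
try:
    payload[:2].decode("utf-8")
    naive_cut_ok = True
except UnicodeDecodeError:
    naive_cut_ok = False

chunks = list(iter_decoded_chunks(payload, "utf-8", chunk_size=2))
```

Joining `chunks` gives back the original text even though the byte boundaries fall inside characters.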


3.0.0 (2022-10-20)

Added
- Extend the capability of explain=True: when cp_isolation contains at most two entries (minimum one), the Mess-detector results are logged in detail
- Support for alternative language frequency sets in charset_normalizer.assets.FREQUENCIES
- Add parameter language_threshold in from_bytes, from_path and from_fp to adjust the minimum expected coherence ratio
- normalizer --version now specifies whether the current version provides extra speedup (meaning a mypyc-compiled wheel)

Changed
- Build with static metadata using the 'build' frontend
- Make the language detection stricter
- Optional: Module md.py can be compiled using Mypyc to provide an extra speedup, up to 4x faster than v2.1

Fixed
- CLI with option --normalize fails when using a full path for files
- TooManyAccentuatedPlugin induces false positives in the mess detection when too few alpha characters have been fed to it
- Sphinx warnings when generating the documentation

Removed
- Coherence detector no longer returns 'Simple English'; it returns 'English' instead
- Coherence detector no longer returns 'Classical Chinese'; it returns 'Chinese' instead
- Breaking: Method first() and best() from CharsetMatch
- UTF-7 will no longer appear as "detected" without a recognized SIG/mark (it is unreliable and conflicts with ASCII)
- Breaking: Class aliases CharsetDetector, CharsetDoctor, CharsetNormalizerMatch and CharsetNormalizerMatches
- Breaking: Top-level function normalize
- Breaking: Properties chaos_secondary_pass, coherence_non_latin and w_counter from CharsetMatch
- Support for the backport unicodedata2

Revision 1.12
Wed Sep 14 11:10:00 2022 UTC (2 years, 1 month ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2022Q3-base, pkgsrc-2022Q3
Changes since revision 1.11: +4 -4 lines
py-charset-normalizer: updated to 2.1.1

2.1.1

Deprecated
- Function `normalize` scheduled for removal in 3.0

Changed
- Removed useless call to decode in function is_unprintable

Fixed
- Third-party library (i18n xgettext) crashing because it does not recognize utf_8 (PEP 263) spelled with an underscore
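
The underscore issue comes from Python accepting both spellings while some external tools accept only the hyphenated one. As a sketch (the helper name `coding_declaration` is hypothetical and not part of the library), the standard `codecs` registry can turn a Python-style name into the canonical spelling before it is written into a PEP 263 declaration:

```python
import codecs


def coding_declaration(encoding: str) -> str:
    """Build a PEP 263 header line using the codec's canonical name."""
    canonical = codecs.lookup(encoding).name  # "utf_8" resolves to "utf-8"
    return f"# -*- coding: {canonical} -*-"


header = coding_declaration("utf_8")
```

Here `header` is `# -*- coding: utf-8 -*-`, the form that tools such as xgettext are expected to recognize.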

Revision 1.11
Fri Aug 5 13:59:38 2022 UTC (2 years, 3 months ago) by adam
Branches: MAIN
Changes since revision 1.10: +4 -4 lines
py-charset-normalizer: updated to 2.1.0

2.1.0 (2022-06-19)

Added
- Output the Unicode table version when running the CLI with --version

Changed
- Re-use decoded buffer for single byte character sets
- Fixing some performance bottlenecks

Fixed
- Work around a potential bug in CPython where Zero Width No-Break Space (located in Arabic Presentation Forms-B, Unicode 1.1) is not acknowledged as a space
- CLI default threshold aligned with the API threshold
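
The Zero Width No-Break Space behaviour mentioned under Fixed can be observed directly in CPython. The snippet below only inspects the character; `is_space_like` is a hypothetical helper illustrating one possible workaround and is not taken from the library.

```python
import unicodedata

ZWNBSP = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, also used as the byte order mark

name = unicodedata.name(ZWNBSP)
category = unicodedata.category(ZWNBSP)  # "Cf" (format), not a space separator
treated_as_space = ZWNBSP.isspace()


def is_space_like(character: str) -> bool:
    """Treat U+FEFF as whitespace in addition to what str.isspace() accepts."""
    return character.isspace() or character == ZWNBSP
```

Because `str.isspace()` reports False for U+FEFF, code that classifies characters needs an explicit special case if it wants to count it as spacing.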

Removed
- Support for Python 3.5

Deprecated
- Use of backport unicodedata from unicodedata2 as Python is quickly catching up, scheduled for removal in 3.0

Revision 1.10
Sat Feb 12 17:53:15 2022 UTC (2 years, 8 months ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2022Q2-base, pkgsrc-2022Q2, pkgsrc-2022Q1-base, pkgsrc-2022Q1
Changes since revision 1.9: +4 -4 lines
py-charset-normalizer: updated to 2.0.12

2.0.12
Fixed
- ASCII misdetection in rare cases

Revision 1.9
Mon Jan 31 11:04:38 2022 UTC (2 years, 9 months ago) by adam
Branches: MAIN
Changes since revision 1.8: +4 -4 lines
py-charset-normalizer: updated to 2.0.11

2.0.11:

Added
- Explicit support for Python 3.11

Changed
- The logging behavior has been completely reviewed, now using only TRACE and DEBUG levels
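
Python's `logging` module has no built-in TRACE level, so a library that wants one has to register it. The sketch below shows the general mechanism; the numeric value 5 and the logger name `trace_level_demo` are assumptions made for illustration, not values confirmed by this changelog.

```python
import logging

TRACE = 5  # assumed value, chosen to sit below logging.DEBUG (10)
logging.addLevelName(TRACE, "TRACE")


class ListHandler(logging.Handler):
    """Collect formatted records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(f"{record.levelname}: {record.getMessage()}")


logger = logging.getLogger("trace_level_demo")  # hypothetical logger name
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

logger.setLevel(logging.DEBUG)
logger.log(TRACE, "per-chunk detail")  # dropped: TRACE is below DEBUG
logger.debug("summary")

logger.setLevel(TRACE)
logger.log(TRACE, "per-chunk detail")  # now emitted
```

Keeping the most verbose messages at a level below DEBUG lets applications enable DEBUG without being flooded by per-chunk output.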

Revision 1.8
Fri Jan 7 16:37:10 2022 UTC (2 years, 9 months ago) by adam
Branches: MAIN
Changes since revision 1.7: +4 -4 lines
py-charset-normalizer: updated to 2.0.10

2.0.10:
Fixed
- Fallback match entries might lead to UnicodeDecodeError for large byte sequences

Revision 1.7
Sat Dec 11 20:47:41 2021 UTC (2 years, 10 months ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2021Q4-base, pkgsrc-2021Q4
Changes since revision 1.6: +4 -4 lines
py-charset-normalizer: updated to 2.0.9

2.0.9

Changed
- Moderating the logging impact (since 2.0.8) for specific environments

Fixed
- Wrong logging level applied when setting kwarg `explain` to True

Revision 1.6
Thu Nov 25 08:10:29 2021 UTC (2 years, 11 months ago) by adam
Branches: MAIN
Changes since revision 1.5: +4 -4 lines
py-charset-normalizer: updated to 2.0.8

2.0.8
Changed
- Improvement over Vietnamese detection
- MD improvement on trailing data and long foreign (non-pure latin) data
- Efficiency improvements in cd/alphabet_languages from [@adbar](https://github.com/adbar)
- Call sum() without an intermediary list following PEP 289 recommendations from [@adbar](https://github.com/adbar)
- Code style as refactored by Sourcery-AI
- Minor adjustment on the MD around European words
- Remove and replace SRTs from assets / tests
- Initialize the library logger with a `NullHandler` by default from [@nmaynes](https://github.com/nmaynes)
- Setting kwarg `explain` to True will provisionally add (bound to the function's lifespan) a specific stream handler
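
The last two items describe a common logging pattern for libraries: stay silent by default, and attach a handler only for the duration of a call when the caller asks for an explanation. The following is a minimal sketch of that pattern using the standard `logging` module. The logger name `example_library`, the function `detect`, its `stream` parameter, and its placeholder return value are all hypothetical and do not reproduce the library's code.

```python
import io
import logging

logger = logging.getLogger("example_library")  # hypothetical logger name
logger.addHandler(logging.NullHandler())  # silent unless the application configures logging


def detect(payload: bytes, explain: bool = False, stream=None) -> str:
    """Placeholder detection routine showing a handler scoped to one call."""
    handler = None
    previous_level = logger.level
    if explain:
        handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
        logger.addHandler(handler)
        logger.setLevel(logging.DEBUG)
    try:
        logger.debug("inspecting %d bytes", len(payload))
        return "utf_8"  # placeholder result, no real detection happens here
    finally:
        if handler is not None:
            # Undo the temporary configuration so later calls stay quiet.
            logger.removeHandler(handler)
            logger.setLevel(previous_level)


buffer = io.StringIO()
detect(b"abc", explain=True, stream=buffer)
explained_output = buffer.getvalue()
handlers_after_call = len(logger.handlers)
```

The `try`/`finally` block is what bounds the handler to the function's lifespan, even if the body raises.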

Revision 1.5
Tue Oct 26 10:06:49 2021 UTC (3 years ago) by nia
Branches: MAIN
Changes since revision 1.4: +2 -2 lines
converters: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes
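
For readers unfamiliar with what a distinfo file records, the sketch below computes the two digests named in this commit with Python's `hashlib`. The line layout (`ALGORITHM (file) = digest`, plus a `Size` line) follows the usual pkgsrc convention as I understand it. `distinfo_lines` and the sample file name are hypothetical; pkgsrc generates these entries with its own tooling, not with this code.

```python
import hashlib


def distinfo_lines(filename: str, data: bytes) -> list:
    """Return distinfo-style checksum lines for one distfile."""
    return [
        f"BLAKE2s ({filename}) = {hashlib.blake2s(data).hexdigest()}",
        f"SHA512 ({filename}) = {hashlib.sha512(data).hexdigest()}",
        f"Size ({filename}) = {len(data)} bytes",
    ]


# Sample bytes stand in for a real distfile's contents.
lines = distinfo_lines("example-1.0.tar.gz", b"sample distfile contents")
```

Comparing freshly computed digests against the recorded ones is how a double-check like the one described in this commit can be reproduced.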

Revision 1.4
Tue Oct 12 09:12:20 2021 UTC (3 years ago) by adam
Branches: MAIN
Changes since revision 1.3: +4 -4 lines
py-charset-normalizer: updated to 2.0.7

Version 2.0.7

Changes:

Addition: 🍱 Add support for Kazakh (Cyrillic) language detection
Improvement: ❇️ Further improve inferring the language from a given code page (single-byte)
Removed: 🔥 Remove redundant logging entry about detected language(s)
Miscellaneous: 🔧 Trying to leverage PEP263 when PEP3120 is not supported
While I do not think that this (116) will actually fix something, it will rather raise a SyntaxError (Not about ASCII decoding error) for those trying to install this package using a non-supported Python version
Improvement: ⚡ Refactoring for potential performance improvements in loops
Improvement: ✨ Various detection improvement (MD+CD)
Bugfix: 🐛 Fix a minor inconsistency between Python 3.5 and other versions regarding language detection

Revision 1.3
Thu Oct 7 13:29:11 2021 UTC (3 years, 1 month ago) by nia
Branches: MAIN
Changes since revision 1.2: +1 -2 lines
converters: Remove SHA1 hashes for distfiles

Revision 1.2
Sun Sep 19 10:39:10 2021 UTC (3 years, 1 month ago) by adam
Branches: MAIN
CVS tags: pkgsrc-2021Q3-base, pkgsrc-2021Q3
Changes since revision 1.1: +5 -5 lines
py-charset-normalizer: updated to 2.0.6

Version 2.0.6

Changes:

Bugfix: 🐛 Unforeseen regression with the loss of backward compatibility with some older minor versions of Python 3.5.x
Bugfix: 🐛 Fix CLI crash when using --minimal output in certain cases
Improvement: ✨ Minor improvement to the detection efficiency (less than 1%)


Version 2.0.5

Changes:

Internal: 🎨 The project now complies with flake8, mypy, isort and black to ensure a better overall quality
Internal: 🎨 The MANIFEST.in was not exhaustive
Improvement: ✨ The BC-support with v1.x was improved, the old staticmethods are restored
Remove: 🔥 The project no longer raises a warning on tiny content given for detection; it is simply logged as a warning instead
Improvement: ✨ The Unicode detection is slightly improved
Bugfix: 🐛 In some rare cases, the chunks extractor could cut in the middle of a multi-byte character and could mislead the mess detection
Bugfix: 🐛 Some rare 'space' characters could trip up the UnprintablePlugin/Mess detection
Improvement: 🎨 Add syntax sugar __bool__ for results CharsetMatches list-container

This release pushes the detection coverage further, to 97%!


Version 2.0.4

Changes:

Improvement: ❇️ Adjust the MD to lower the sensitivity, thus improving the global detection reliability
Improvement: ❇️ Allow fallback on specified encoding if any
Bugfix: 🐛 The CLI no longer raises an unexpected exception when no encoding has been found
Bugfix: 🐛 Fix accessing the 'alphabets' property when the payload contains surrogate characters
Bugfix: 🐛 ✏️ The logger could mislead (explain=True) on detected languages and the impact of one MBCS match
Bugfix: 🐛 Submatch factoring could be wrong in rare edge cases
Bugfix: 🐛 Multiple files given to the CLI were ignored when publishing results to STDOUT. (After the first path)
Internal: 🎨 Fix line endings from CRLF to LF for certain files

Revision 1.1
Fri Jul 30 04:14:49 2021 UTC (3 years, 3 months ago) by adam
Branches: MAIN
py-charset-normalizer: added version 2.0.3

A library that helps you read text from an unknown charset encoding.
