SUSE Package Hub - openSUSE-2022-10040

Update Info

Security update for python-nltk

Type: security
Severity: moderate
Issued: 2022-07-03
Description:
This update for python-nltk fixes the following issues:

- Update to 3.7

  - Improve and update the NLTK team page on nltk.org (#2855,
    #2941)
  - Drop support for Python 3.6, support Python 3.10 (#2920)

- Update to 3.6.7

  - Resolve IndexError in `sent_tokenize` and `word_tokenize`
    (#2922)

- Update to 3.6.6

  - Refactor `gensim.doctest` to work for gensim 4.0.0 and up
    (#2914)
  - Add Precision, Recall, F-measure, Confusion Matrix to Taggers
    (#2862)
  - Add warnings if .zip files exist without any corresponding
    .csv files (#2908)
  - Fix `FileNotFoundError` when the `download_dir` is
    a non-existing nested folder (#2910)
  - Rename omw to omw-1.4 (#2907)
  - Resolve ReDoS opportunity by fixing incorrectly specified
    regex (#2906, boo#1191030, CVE-2021-3828).
  - Support OMW 1.4 (#2899)
  - Deprecate Tree get and set node methods (#2900)
  - Fix broken inaugural test case (#2903)
  - Use Multilingual Wordnet Data from OMW with newer Wordnet
    versions (#2889)
  - Keep NLTK's "tokenize" module working with pathlib (#2896)
  - Make prettyprinter more readable (#2893)
  - Update links to the nltk book (#2895)
  - Add `CITATION.cff` to nltk (#2880)
  - Resolve serious ReDoS in PunktSentenceTokenizer (#2869)
  - Delete old CI config files (#2881)
  - Improve Tokenize documentation + add TokenizerI as superclass
    for TweetTokenizer (#2878)
  - Fix expected value for BLEU score doctest after changes from
    #2572
  - Add multi Bleu functionality and tests (#2793)
  - Deprecate 'return_str' parameter in NLTKWordTokenizer and
    TreebankWordTokenizer (#2883)
  - Allow empty string in CFG's + more (#2888)
  - Partition `tree.py` module into `tree` package + pickle fix
    (#2863)
  - Fix several TreebankWordTokenizer and NLTKWordTokenizer bugs
    (#2877)
  - Rewind Wordnet data file after each lookup (#2868)
  - Correct __init__ call for SyntaxCorpusReader subclasses
    (#2872)
  - Documentation fixes (#2873)
  - Fix Levenshtein distance for duplicated letters (#2849)
  - Support alternative Wordnet versions (#2860)
  - Remove hundreds of formatting warnings for nltk.org (#2859)
  - Modernize `nltk.org/howto` pages (#2856)
  - Fix Bleu Score smoothing function from taking log(0) (#2839)
  - Update third-party tools to newer versions and remove the
    fixed MaltParser version (#2832)
  - Fix TypeError: _pretty() takes 1 positional argument but 2
    were given in sem/drt.py (#2854)
  - Replace `http` with `https` in most URLs (#2852)
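
The ReDoS entries above (#2906, #2869) concern regular expressions whose matching time can blow up on crafted input. The sketch below illustrates the general problem with a hypothetical pattern; it is not the expression NLTK used or the fix NLTK applied. `VULNERABLE`, `FIXED`, and `same_verdict` are names invented for this example.

```python
import re

# Hypothetical pattern with a nested quantifier. On a long run of "a"
# followed by a character that cannot match, a backtracking engine may
# try a very large number of ways to split the run between the inner
# and the outer "+".
VULNERABLE = re.compile(r"^(a+)+$")

# Pattern accepting the same strings without the nested quantifier,
# so there is only one way to consume a run of "a".
FIXED = re.compile(r"^a+$")


def same_verdict(text):
    """Return True if both patterns agree on whether text matches."""
    return bool(VULNERABLE.match(text)) == bool(FIXED.match(text))
```

Only short inputs should be passed to `VULNERABLE`; the point of such a fix is that the rewritten pattern gives the same answers without the pathological backtracking.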

- Update to 3.6.5

  - modernised nltk.org website
  - addressed LGTM.com issues
  - support ZWJ sequence emoji and skin tone modifier emoji in
    TweetTokenizer
  - METEOR evaluation now requires pre-tokenized input
  - Code linting and type hinting
  - implement get_refs function for DrtLambdaExpression
  - Enable automated CoreNLP, Senna, Prover9/Mace4, Megam,
    MaltParser CI tests
  - specify minimum regex version that supports regex.Pattern
  - avoid re.Pattern and regex.Pattern which fail for Python 3.6,
    3.7
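
For the last entry above, one common way to refer to the compiled-pattern type without naming `re.Pattern` is to take the type of a compiled pattern at runtime. This is a minimal sketch of that idiom, not necessarily the change made in NLTK; `PatternType` and `is_compiled_pattern` are hypothetical names.

```python
import re

# Derive the compiled-pattern type at runtime instead of naming
# re.Pattern, which is not available on every older interpreter.
PatternType = type(re.compile(""))


def is_compiled_pattern(obj):
    """Return True if obj is a pattern produced by re.compile."""
    return isinstance(obj, PatternType)
```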

- Update to 3.6.4

  - deprecate `nltk.usage(obj)` in favor of `help(obj)`
  - resolve ReDoS vulnerability in Corpus Reader
  - solidify performance tests
  - improve phone number recognition in tweet tokenizer
  - refactored CISTEM stemmer for German
  - identify NLTK Team as the author
  - replace travis badge with github actions badge
  - add SECURITY.md

- Update to 3.6.3

  - Dropped support for Python 3.5
  - Run CI tests on Windows, too
  - Moved from Travis CI to GitHub Actions
  - Code and comment cleanups
  - Visualize WordNet relation graphs using Graphviz
  - Fixed large error in METEOR score
  - Apply isort, pyupgrade, black, added as pre-commit hooks
  - Prevent debug_decisions in Punkt from throwing IndexError
  - Resolved ZeroDivisionError in RIBES with dissimilar sentences
  - Initialize WordNet IC total counts with smoothing value
  - Fixed AttributeError for Arabic ARLSTem2 stemmer
  - Many fixes and improvements to lm language model package
  - Fix bug in nltk.metrics.aline, C_skip = -10
  - Improvements to TweetTokenizer
  - Optional show arg for FreqDist.plot, ConditionalFreqDist.plot
  - edit_distance now computes Damerau-Levenshtein edit-distance
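
To show what the last entry above means in practice, here is a small pure-Python sketch of the unrestricted Damerau-Levenshtein distance, which counts a swap of two adjacent characters as a single edit. NLTK's own `edit_distance` takes a `transpositions` flag for this; the function below is an independent illustration with a hypothetical name, not NLTK's implementation.

```python
def damerau_levenshtein(a, b):
    """Unrestricted Damerau-Levenshtein distance between strings a and b."""
    last_row = {}  # last row (1-based) in which each character of a was seen
    max_dist = len(a) + len(b)
    # The table has an extra sentinel row and column filled with max_dist.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j
    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                          # substitution or match
                d[i + 1][j] + 1,                         # insertion
                d[i][j + 1] + 1,                         # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1)  # transposition
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]
```

For example, `damerau_levenshtein("ab", "ba")` is 1, whereas a plain Levenshtein distance would count two substitutions.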

- Update to 3.6.2

  - move test code to nltk/test
  - fix bug in NgramAssocMeasures (order preserving fix)

- Update to 3.6

  - add support for Python 3.9
  - add Tree.fromlist
  - compute Minimum Spanning Tree of unweighted graph using BFS
  - fix bug with infinite loop in Wordnet closure and tree
  - fix bug in calculating BLEU using smoothing method 4
  - Wordnet synset similarities work for all pos
  - new Arabic light stemmer (ARLSTem2)
  - new syllable tokenizer (LegalitySyllableTokenizer)
  - remove nose in favor of pytest
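
The spanning-tree entry above relies on the fact that, in an unweighted graph, every spanning tree has the same number of edges, so a breadth-first traversal is enough to build one. The following is a minimal sketch of that idea on a plain adjacency dictionary; `bfs_spanning_tree` and the graph representation are assumptions made for illustration and do not reflect NLTK's internal data structures.

```python
from collections import deque


def bfs_spanning_tree(graph, root):
    """Return the edges of a spanning tree of the component containing root.

    graph maps each node to an iterable of its neighbours.
    """
    visited = {root}
    edges = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in visited:
                # The first time a node is reached, keep the edge used.
                visited.add(neighbour)
                edges.append((node, neighbour))
                queue.append(neighbour)
    return edges
```

A connected graph with n nodes always yields n - 1 edges, which is what makes the result a tree.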

- Update to v3.5

  * add support for Python 3.8
  * drop support for Python 2
  * create NLTK's own Tokenizer class distinct from the Treebank
    reference tokeniser
  * update Vader sentiment analyser
  * fix JSON serialization of some PoS taggers
  * minor improvements in grammar.CFG, Vader, pl196x corpus reader,
    StringTokenizer
  * change implementation of <= and >= for FreqDist so they are
    partial orders
  * make FreqDist iterable
  * correctly handle Penn Treebank trees with an unlabeled branching
    top node
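
The FreqDist entry above says that `<=` and `>=` became partial orders. The sketch below shows what that means using a small `Counter` subclass: one distribution is "less than or equal to" another only if every count is, so two distributions can be incomparable. `Bag` is a hypothetical class written for this example and is not NLTK's `FreqDist`.

```python
from collections import Counter


class Bag(Counter):
    """Counter with <= and >= defined as multiset inclusion."""

    def __le__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        # Every count in self must be covered by other.
        return all(count <= other[key] for key, count in self.items())

    def __ge__(self, other):
        if not isinstance(other, Counter):
            return NotImplemented
        # Every count in other must be covered by self.
        return all(self[key] >= count for key, count in other.items())
```

With `Bag("aab")` and `Bag("abb")`, neither `<=` nor `>=` holds, which a total order would not allow.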

- Update to 3.4.5 (boo#1146427, CVE-2019-14751):

References

Packages

python-nltk-3.7-bp152.3.3.1
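
Since the fixed package listed above ships NLTK 3.7, a quick check of the version visible to a Python interpreter can look like the sketch below. It assumes the distribution is registered under the name `nltk` and that comparing the leading numeric components is sufficient (a pre-release such as `3.7rc1` would be treated as 3.7); all function names are hypothetical.

```python
from importlib import metadata

FIXED_VERSION = (3, 7)  # upstream version shipped by this update


def parse_version(text):
    """Turn '3.6.7' into (3, 6, 7), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_least_fixed(version_text):
    """Return True if version_text is 3.7 or newer."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_nltk_version():
    """Return the installed nltk version string, or None if not installed."""
    try:
        return metadata.version("nltk")
    except metadata.PackageNotFoundError:
        return None
```

Calling `is_at_least_fixed(installed_nltk_version())` is only meaningful when the second function returns a string rather than `None`.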