Merge remote-tracking branch 'upstream/develop' into develop
ekaf committed Jan 5, 2023
2 parents 83b5975 + 175929b commit 4477788
Showing 354 changed files with 580 additions and 840 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -76,7 +76,7 @@ jobs:
needs: [cache_nltk_data, cache_third_party]
strategy:
matrix:
python-version: ['3.7', '3.8', '3.9', '3.10']
python-version: ['3.7', '3.8', '3.9', '3.10', '3.11']
os: [ubuntu-latest, macos-latest, windows-latest]
fail-fast: false
runs-on: ${{ matrix.os }}
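GitHub Actions expands a `strategy.matrix` into one job per combination of its values, and `fail-fast: false` keeps the remaining jobs running when one of them fails. Adding `'3.11'` therefore grows the test matrix from 12 to 15 jobs. The sketch below only models that Cartesian-product expansion; `expand_matrix` is a hypothetical helper, not part of NLTK or of the Actions runner.

```python
from itertools import product

# Values copied from the updated strategy.matrix in .github/workflows/ci.yaml.
PYTHON_VERSIONS = ["3.7", "3.8", "3.9", "3.10", "3.11"]
OSES = ["ubuntu-latest", "macos-latest", "windows-latest"]


def expand_matrix(python_versions, oses):
    """Hypothetical model of matrix expansion: one job per (python-version, os) pair."""
    return [
        {"python-version": py, "os": os_name}
        for py, os_name in product(python_versions, oses)
    ]


jobs = expand_matrix(PYTHON_VERSIONS, OSES)
print(len(jobs))  # 15
print(jobs[-1])   # {'python-version': '3.11', 'os': 'windows-latest'}
```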
10 changes: 10 additions & 0 deletions ChangeLog
@@ -1,4 +1,14 @@

Version 3.8.1 2023-01-02

* Resolve RCE vulnerability in localhost WordNet Browser (#3100)
* Remove unused tool scripts (#3099)
* Resolve XSS vulnerability in localhost WordNet Browser (#3096)
* Add Python 3.11 support (#3090)

Thanks to the following contributors to 3.8.1:
Francis Bond, John Vandenberg, Tom Aarsen

Version 3.8 2022-12-12

* Refactor dispersion plot (#3082)
2 changes: 1 addition & 1 deletion Makefile
@@ -1,6 +1,6 @@
# Natural Language Toolkit: source Makefile
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Steven Bird <[email protected]>
# Edward Loper <[email protected]>
# URL: <https://www.nltk.org/>
4 changes: 2 additions & 2 deletions README.md
@@ -4,7 +4,7 @@

NLTK -- the Natural Language Toolkit -- is a suite of open source Python
modules, data sets, and tutorials supporting research and development in Natural
Language Processing. NLTK requires Python version 3.7, 3.8, 3.9 or 3.10.
Language Processing. NLTK requires Python version 3.7, 3.8, 3.9, 3.10 or 3.11.

For documentation, please visit [nltk.org](https://www.nltk.org/).

@@ -33,7 +33,7 @@ If you publish work that uses NLTK, please cite the NLTK book, as follows:

## Copyright

Copyright (C) 2001-2022 NLTK Project
Copyright (C) 2001-2023 NLTK Project

For license information, see [LICENSE.txt](LICENSE.txt).

2 changes: 1 addition & 1 deletion RELEASE-HOWTO.txt
@@ -23,7 +23,7 @@ Building an NLTK distribution

3. Build Documentation
- Check the copyright year is correct and update if necessary
e.g. ./tools/global_replace.py 2001-2022 2001-2022
e.g. ./tools/global_replace.py 2001-2022 2001-2023
check web/conf.py copyright line
- Check that installation instructions are up-to-date
(including the range of Python versions that are supported)
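The corrected step passes the old and new year ranges as two separate arguments; the previous text replaced `2001-2022` with itself and would have changed nothing. The real `tools/global_replace.py` is not shown in this diff, so the function below is a hypothetical stand-in: `replace_in_tree` walks a directory, skips hidden directories such as `.git`, and rewrites matching bytes in place.

```python
from pathlib import Path


def replace_in_tree(root, old, new):
    """Hypothetical stand-in for tools/global_replace.py.

    Replaces every occurrence of `old` with `new` in files under `root`,
    skipping anything inside a hidden directory. It works on raw bytes, so
    line endings and file encodings are left untouched. Returns changed paths.
    """
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    root = Path(root)
    changed = []
    for path in sorted(root.rglob("*")):
        rel_parts = path.relative_to(root).parts
        if not path.is_file() or any(part.startswith(".") for part in rel_parts):
            continue
        data = path.read_bytes()
        if old_b in data:
            path.write_bytes(data.replace(old_b, new_b))
            changed.append(path)
    return changed
```

Called as `replace_in_tree(".", "2001-2022", "2001-2023")` from the repository root, it would also reach extension-less files such as `Makefile` and `ChangeLog`, which a suffix filter would miss.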
2 changes: 1 addition & 1 deletion nltk/VERSION
@@ -1 +1 @@
3.8.1a
3.8.1
7 changes: 4 additions & 3 deletions nltk/__init__.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit (NLTK)
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Authors: Steven Bird <[email protected]>
# Edward Loper <[email protected]>
# URL: <https://www.nltk.org/>
@@ -42,7 +42,7 @@

# Copyright notice
__copyright__ = """\
Copyright (C) 2001-2022 NLTK Project.
Copyright (C) 2001-2023 NLTK Project.
Distributed and Licensed under the Apache License, Version 2.0,
which is included by reference.
@@ -52,7 +52,7 @@
# Description of the toolkit, keywords, and the project's primary URL.
__longdescr__ = """\
The Natural Language Toolkit (NLTK) is a Python package for
natural language processing. NLTK requires Python 3.7, 3.8, 3.9 or 3.10."""
natural language processing. NLTK requires Python 3.7, 3.8, 3.9, 3.10 or 3.11."""
__keywords__ = [
"NLP",
"CL",
@@ -88,6 +88,7 @@
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Topic :: Scientific/Engineering",
"Topic :: Scientific/Engineering :: Artificial Intelligence",
"Topic :: Scientific/Engineering :: Human Machine Interfaces",
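The supported-version list now has to stay in sync across the CI matrix, the README, `__longdescr__` and the `Programming Language :: Python :: 3.x` trove classifiers. As a hypothetical consistency check, the sketch below derives the versions from the classifiers (using only the excerpt visible in this hunk) and tests an interpreter version against them; `supported_versions` and `is_supported` are illustrative names, not NLTK functions.

```python
import sys

# Excerpt of the classifiers visible in the hunk above.
CLASSIFIERS = [
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Topic :: Scientific/Engineering",
]


def supported_versions(classifiers):
    """Return sorted (major, minor) tuples from 'Programming Language :: Python :: X.Y' entries."""
    prefix = "Programming Language :: Python :: "
    versions = []
    for entry in classifiers:
        if entry.startswith(prefix):
            parts = entry[len(prefix):].split(".")
            # Keep only X.Y entries; a bare "3" or an implementation tag is ignored.
            if len(parts) == 2 and all(p.isdigit() for p in parts):
                versions.append((int(parts[0]), int(parts[1])))
    return sorted(versions)


def is_supported(classifiers, version_info=sys.version_info):
    """True if the interpreter's (major, minor) appears in the classifiers."""
    return tuple(version_info[:2]) in supported_versions(classifiers)
```

Parsing to integer tuples matters here: compared as strings, `"3.10"` sorts before `"3.9"`.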
2 changes: 1 addition & 1 deletion nltk/app/__init__.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Applications package
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Edward Loper <[email protected]>
# Steven Bird <[email protected]>
# URL: <https://www.nltk.org/>
2 changes: 1 addition & 1 deletion nltk/app/chartparser_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Chart Parser Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Edward Loper <[email protected]>
# Jean Mark Gawron <[email protected]>
# Steven Bird <[email protected]>
2 changes: 1 addition & 1 deletion nltk/app/chunkparser_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Regexp Chunk Parser Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Edward Loper <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/app/collocations_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Collocations Application
# Much of the GUI code is imported from concordance.py; We intend to merge these tools together
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Sumukh Ghodke <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/app/concordance_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Concordance Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Sumukh Ghodke <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/app/rdparser_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Recursive Descent Parser Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Edward Loper <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/app/srparser_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Shift-Reduce Parser Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Edward Loper <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/app/wordfreq_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Wordfreq Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Sumukh Ghodke <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
42 changes: 25 additions & 17 deletions nltk/app/wordnet_app.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: WordNet Browser Application
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Jussi Salmela <[email protected]>
# Paul Bone <[email protected]>
# URL: <https://www.nltk.org/>
@@ -47,11 +47,10 @@

import base64
import copy
import datetime
import getopt
import io
import os
import pickle
import re
import sys
import threading
import time
@@ -60,17 +59,12 @@
from http.server import BaseHTTPRequestHandler, HTTPServer

# Allow this program to run inside the NLTK source tree.
from sys import argv, path
from sys import argv
from urllib.parse import unquote_plus

from nltk.corpus import wordnet as wn
from nltk.corpus.reader.wordnet import Lemma, Synset

# now included in local file
# from util import html_header, html_trailer, \
# get_static_index_page, get_static_page_by_path, \
# page_from_word, page_from_href

firstClient = True

# True if we're not also running a web browser. The value f server_mode
@@ -127,7 +121,12 @@ def do_GET(self):
else:
# Handle files here.
word = sp
page = get_static_page_by_path(usp)
try:
page = get_static_page_by_path(usp)
except FileNotFoundError:
page = "Internal error: Path for static page '%s' is unknown" % usp
# Set type to plain to prevent XSS by printing the path as HTML
type = "text/plain"
elif sp.startswith("search"):
# This doesn't seem to work with MWEs.
type = "text/html"
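The XSS fix relies on the content type: the unknown path is still echoed back, but in a `text/plain` response, which a browser displays as text instead of parsing as HTML, so a request for a path like `<script>alert(1)</script>` cannot inject markup. The sketch below reproduces that control flow outside the `http.server` handler; `STATIC_PAGES`, `lookup_static_page` and `route` are hypothetical stand-ins for `get_static_page_by_path` and the patched branch of `do_GET`.

```python
# Hypothetical page table standing in for get_static_page_by_path().
STATIC_PAGES = {
    "index.html": "<html>index</html>",
    "wx_help.html": "<html>help</html>",
}


def lookup_static_page(path):
    """Return a known page, or raise FileNotFoundError as the patched lookup does."""
    try:
        return STATIC_PAGES[path]
    except KeyError:
        raise FileNotFoundError(path) from None


def route(path):
    """Return (content_type, body), mirroring the patched branch of do_GET."""
    try:
        return "text/html", lookup_static_page(path)
    except FileNotFoundError:
        # Echo the unknown path as plain text so a browser never parses it as HTML.
        body = "Internal error: Path for static page '%s' is unknown" % path
        return "text/plain", body
```

Escaping the path with `html.escape` and keeping `text/html` would be another option; switching the content type is the smaller change to the existing handler.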
@@ -654,6 +653,16 @@ def make_synset_html(db_name, disp_name, rels):
return html


class RestrictedUnpickler(pickle.Unpickler):
"""
Unpickler that prevents any class or function from being used during loading.
"""

def find_class(self, module, name):
# Forbid every function
raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


class Reference:
"""
A reference to a page that may be generated by page_word
@@ -689,7 +698,7 @@ def decode(string):
Decode a reference encoded with Reference.encode
"""
string = base64.urlsafe_b64decode(string.encode())
word, synset_relations = pickle.loads(string)
word, synset_relations = RestrictedUnpickler(io.BytesIO(string)).load()
return Reference(word, synset_relations)

def toggle_synset_relation(self, synset, relation):
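Calling `pickle.loads` on bytes taken from a URL lets whoever crafts that URL make the unpickler import and call arbitrary globals, which is presumably the RCE listed as #3100 in the ChangeLog above. `RestrictedUnpickler` overrides `find_class` so that any pickle referring to a global is rejected, while built-in containers such as tuples, dicts, lists and strings still load because they never trigger a global lookup. The sketch below copies the class from the diff and pairs it with a hypothetical `encode` helper, assuming a reference is a `(word, dict)` tuple of built-in types, to show both outcomes.

```python
import base64
import datetime
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any class or function (copied from the diff)."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def encode(word, synset_relations):
    # Hypothetical counterpart to Reference.encode: pickle, then URL-safe base64.
    data = pickle.dumps((word, synset_relations), protocol=pickle.HIGHEST_PROTOCOL)
    return base64.urlsafe_b64encode(data).decode()


def decode(string):
    # Same steps as the patched Reference.decode, without the Reference wrapper.
    data = base64.urlsafe_b64decode(string.encode())
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Built-in containers round-trip without touching find_class.
token = encode("dog", {"dog.n.01": ["hypernym"]})
print(decode(token))  # ('dog', {'dog.n.01': ['hypernym']})

# Any pickled object that needs a global lookup is rejected.
hostile = base64.urlsafe_b64encode(pickle.dumps(datetime.date(2023, 1, 5))).decode()
try:
    decode(hostile)
except pickle.UnpicklingError as exc:
    print(exc)  # global 'datetime.date' is forbidden
```

Because `find_class` rejects everything, a payload containing, for example, a `set` pickled with protocol 3 or lower (stored as a call to `builtins.set`) would also be refused; an allow-list inside `find_class` is the usual alternative when such types must load.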
@@ -789,7 +798,7 @@ def page_from_reference(href):
except KeyError:
pass
if not body:
body = "The word or words '%s' where not found in the dictionary." % word
body = "The word or words '%s' were not found in the dictionary." % word
return body, word


@@ -816,8 +825,7 @@ def get_static_page_by_path(path):
return get_static_web_help_page()
elif path == "wx_help.html":
return get_static_wx_help_page()
else:
return "Internal error: Path for static page '%s' is unknown" % path
raise FileNotFoundError()


def get_static_web_help_page():
@@ -828,7 +836,7 @@ def get_static_web_help_page():
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<!-- Natural Language Toolkit: Wordnet Interface: Graphical Wordnet Browser
Copyright (C) 2001-2022 NLTK Project
Copyright (C) 2001-2023 NLTK Project
Author: Jussi Salmela <[email protected]>
URL: <https://www.nltk.org/>
For license information, see LICENSE.TXT -->
@@ -898,7 +906,7 @@ def get_static_index_page(with_shutdown):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<HTML>
<!-- Natural Language Toolkit: Wordnet Interface: Graphical Wordnet Browser
Copyright (C) 2001-2022 NLTK Project
Copyright (C) 2001-2023 NLTK Project
Author: Jussi Salmela <[email protected]>
URL: <https://www.nltk.org/>
For license information, see LICENSE.TXT -->
@@ -931,7 +939,7 @@ def get_static_upper_page(with_shutdown):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html>
<!-- Natural Language Toolkit: Wordnet Interface: Graphical Wordnet Browser
Copyright (C) 2001-2022 NLTK Project
Copyright (C) 2001-2023 NLTK Project
Author: Jussi Salmela <[email protected]>
URL: <https://www.nltk.org/>
For license information, see LICENSE.TXT -->
2 changes: 1 addition & 1 deletion nltk/book.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Some texts for exploration in chapter 1 of the book
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Steven Bird <[email protected]>
#
# URL: <https://www.nltk.org/>
2 changes: 1 addition & 1 deletion nltk/ccg/__init__.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Combinatory Categorial Grammar
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Graeme Gange <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/ccg/api.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: CCG Categories
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Graeme Gange <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/ccg/chart.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Combinatory Categorial Grammar
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Graeme Gange <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/ccg/combinator.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Combinatory Categorial Grammar
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Graeme Gange <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/ccg/lexicon.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Combinatory Categorial Grammar
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Graeme Gange <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/ccg/logic.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Combinatory Categorial Grammar
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Tanin Na Nakorn (@tanin)
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/chat/__init__.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Chatbots
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Authors: Steven Bird <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/chat/eliza.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Eliza
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Authors: Steven Bird <[email protected]>
# Edward Loper <[email protected]>
# URL: <https://www.nltk.org/>
2 changes: 1 addition & 1 deletion nltk/chat/iesha.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Teen Chatbot
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Selina Dennis <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/chat/rude.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Rude Chatbot
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Peter Spiller <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/chat/suntsu.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Sun Tsu-Bot
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Sam Huston 2007
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/chat/util.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Chatbot Utilities
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Authors: Steven Bird <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT
2 changes: 1 addition & 1 deletion nltk/chat/zen.py
@@ -1,6 +1,6 @@
# Natural Language Toolkit: Zen Chatbot
#
# Copyright (C) 2001-2022 NLTK Project
# Copyright (C) 2001-2023 NLTK Project
# Author: Amy Holland <[email protected]>
# URL: <https://www.nltk.org/>
# For license information, see LICENSE.TXT