-
-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-40286: Add randbytes() method to random.Random #19527
Conversation
TODO: handle properly inheritance. Currently, random.Random is kind of weird. The base class directly implements Mersenne Twister, it's not an abstract class. Maybe a getrandbytes() implementation using getrandbits() should be implemented for subclasses? |
I'm not sure if my Mersenne Twister implementation is correct. Not sure if it should handle endianness differently for example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your MT implementation is fine. I don't think we guarantee identical results on different architectures. Otherwise you'll need to choose conventional endianness indeed. @mdickinson
I'm not sure if we should really care about subclassing here other than the classes bundled with Python. Otherwise you'll need to do a bit of plumbing in __init_subclass__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Allow n = 0.
- The result should be deterministic and do not depend on endianess. Ideally it should be identical to
getrandbits(n*8).to_bytes(n, 'little')
(or'big'
). - See also Raymod's suggestion about the name.
Would be nice to add the following tests:
- Call
seek()
with some constant and compare the result ofgetrandbytes()
with a constant. - Compare the results of several sequential calls of
getrandbits()
andgetrandbytes()
for identical Random objects (or afterseek()
with the same seed).
Modules/_randommodule.c
Outdated
size_t chunk = Py_MIN(n, 4); | ||
memcpy(ptr, &word, chunk); | ||
ptr += chunk; | ||
n -= chunk; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write
size_t chunk = Py_MIN(n, 4); | |
memcpy(ptr, &word, chunk); | |
ptr += chunk; | |
n -= chunk; | |
if (n < 4) { | |
memcpy(ptr, &word, n); | |
break; | |
} | |
memcpy(ptr, &word, 4); | |
ptr += 4; | |
n -= 4; |
memcpy()
and incrementing/decrementing with constant size may be faster.
Also do not forget about endianess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would write (...)
Done.
Also do not forget about endianess.
I don't know how to handle the endianness. Should use htonl() to convert word to big endian?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can use htonl()
or swap bytes manually, as it is done in many places in CPython for performace. Search WORDS_BIGENDIAN, PY_BIG_ENDIAN and PY_LITTLE_ENDIAN. Perhaps the most efficient way is:
x = ...
x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
x = ((x & 0x0000FFFFu) << 16) | ((x >> 16) & 0x0000FFFFu));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created https://bugs.python.org/issue40302 "Add _Py_bswap32() function to pyport.h".
I updated my PR to rename the method to randbytes(). |
Done. |
Doc/library/random.rst
Outdated
|
||
Generate *n* random bytes. | ||
|
||
If *n* is equal to zero, always returns an empty string for convenience. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it is needed. It may confuse the user if the degenerate case be explicitly mentioned for one and only one method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's kind of surprising that a function documented as "generate random bytes" returns a deterministic (non random) result, so I consider that it's worth it to document it.
... explicitly mentioned for one and only one method.
What do you mean? Currently, only randbytes() has this convenience behavior no? I asked Antoine to add a similar sentence for his getrandbits(0) PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
randrange(1, 2)
, uniform(10, 10)
and choice([7])
all return deterministic result. I think that most if not all of random functions can return deterministic result for some arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I remove this sentence.
Maybe I should wait until https://bugs.python.org/issue40302 "Add _Py_bswap32() function to pyport.h" is solved, rather than using htonl(). I'm not 100% sure that htonl() uses uint32_t on all platforms. It may use "long" type on some platforms and long may be larger than uint32_t. |
|
Misc/NEWS.d/next/Library/2020-04-15-00-39-25.bpo-40286.ai80FA.rst
Outdated
Show resolved
Hide resolved
If genrand_uint32() produces numbers 0x01234567 and 0x89abcdef, then |
Is it a question? Are you asking to change the result? |
FWIW, I don't think there is a problem if two different methods produce different results. |
It would be nice if there is a simple relation between |
Here you have:
The last line (test) is copied from the unit test of this PR: Python 3.7 with this script and Python 3.9 with my PR produces the same bytes. |
I rebased my PR to fix a merge conflict, I fixed the NEWS entry and added more tests to address Serhiy's latest review. |
I think that the following implementation would be simpler: def randbytes(gen, n):
return gen.getrandbits(n * 8).to_bytes(n, 'little') It is actually the most efficient code that is used now in cases when you need |
I prefer to design randbytes() to be efficient and reliable, rather than trying to be compatible with various recipes to generate bytes currently used in the wild. Serhiy:
People already worked around the lack of randbytes() and have their own implementation, see: https://bugs.python.org/issue40286 for various recipes. Antoine:
Yeah, I don't think that we should bother with compatibility with users creative ways to generate random bytes. For example, some people use Some other people prefer an "explicit" |
Add random.randbytes() function and random.Random.randbytes() method to generate random bytes. Modify secrets.token_bytes() to use SystemRandom.randbytes() rather than calling directly os.urandom(). Rename also genrand_int32() to genrand_uint32(), since it returns an unsigned 32-bit integer, not a signed integer. The _random module is now built with Py_BUILD_CORE_MODULE defined.
I rebased my PR to get https://bugs.python.org/issue40302
|
Thanks for the review @serhiy-storchaka and @pitrou! |
* master: (1985 commits) bpo-40179: Fix translation of #elif in Argument Clinic (pythonGH-19364) bpo-35967: Skip test with `uname -p` on Android (pythonGH-19577) bpo-40257: Improve help for the typing module (pythonGH-19546) Fix two typos in multiprocessing (pythonGH-19571) bpo-40286: Use random.randbytes() in tests (pythonGH-19575) bpo-40286: Makes simpler the relation between randbytes() and getrandbits() (pythonGH-19574) bpo-39894: Route calls from pathlib.Path.samefile() to os.stat() via the path accessor (pythonGH-18836) bpo-39897: Remove needless `Path(self.parent)` call, which makes `is_mount()` misbehave in `Path` subclasses. (pythonGH-18839) bpo-40282: Allow random.getrandbits(0) (pythonGH-19539) bpo-40302: UTF-32 encoder SWAB4() macro use a|b rather than a+b (pythonGH-19572) bpo-40302: Replace PY_INT64_T with int64_t (pythonGH-19573) bpo-40286: Add randbytes() method to random.Random (pythonGH-19527) bpo-39901: Move `pathlib.Path.owner()` and `group()` implementations into the path accessor. (pythonGH-18844) bpo-40300: Allow empty logging.Formatter.default_msec_format. (pythonGH-19551) bpo-40302: Add pycore_byteswap.h header file (pythonGH-19552) bpo-40287: Fix SpooledTemporaryFile.seek() return value (pythonGH-19540) Minor modernization and readability improvement to the tokenizer example (pythonGH-19558) bpo-40294: Fix _asyncio when module is loaded/unloaded multiple times (pythonGH-19542) Fix parameter names in assertIn() docs (pythonGH-18829) bpo-39793: use the same domain on make_msgid tests (python#18698) ...
Add random.randbytes() function and random.Random.randbytes()
method to generate random bytes.
Modify secrets.token_bytes() to use SystemRandom.randbytes()
rather than calling directly os.urandom().
Rename also genrand_int32() to genrand_uint32(), since it returns an
unsigned 32-bit integer, not a signed integer.
https://bugs.python.org/issue40286