Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-40328: Add tool for generating cjk mapping headers #19602

Merged
merged 14 commits into from
Apr 29, 2020

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Apr 19, 2020

@corona10 corona10 changed the title bpo-40328: Add tool for generating mapping headers. bpo-40328: Add tool for generating mapping headers Apr 19, 2020
@corona10 corona10 requested a review from vstinner April 19, 2020 13:29
@corona10
Copy link
Member Author

corona10 commented Apr 19, 2020

@vstinner
If this repository would not be a proper place to store files under tool/data
Is it okay to create a new repository for this? (it's too huge)
But files need to be stored somewhere in our manageable place since
these files can be broken and loose when the times are passed.
(e.g http://www.info.gov.hk/digital21/eng/hkscs/download/big5-iso.txt is broken now.)

@corona10 corona10 requested a review from methane April 19, 2020 13:33
@corona10 corona10 changed the title bpo-40328: Add tool for generating mapping headers bpo-40328: Add tool for generating cjk mapping headers Apr 19, 2020
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to put this script in Tools/unicode/ and don't store Unicode files in Modules/cjkcodecs/tools/data/CP932.TXT, but always download them: as done by Tools/unicode/makeunicodedata.py or some unit tests.

Copy link
Member Author

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vstinner Please take a look :)

@corona10
Copy link
Member Author

@methane @serhiy-storchaka
Also, Can you please take look at this PR?

@methane
Copy link
Member

methane commented Apr 23, 2020

I don't have time to understand the script. I only confirmed that this pull request doesn't change the mapping header files.

@corona10
Copy link
Member Author

@methane
Thank you Naoki San!
The main objective of this PR is generating the same mapping header files :)
For example, we can make some change with this script when
something needs to change about supporting Japanese.
(I don't know the practical case)

@corona10
Copy link
Member Author

@vstinner Can we merge this PR?

Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we run this script in "make regen-all"? Or do you expect that these generated files will never ever change?

Currently, "make regen-all" has no 3rd party dependencies, whereas this script requires to download files from the Internet. In my experience, any additional external dependency makes Python CIs less stable :-(

Other "unicode scripts" are no run by "make regen-all", so I think that it's fine.

@vstinner
Copy link
Member

@corona10: I let you decide if you are confident enough to merge this PR with no review.

Since this script is not run by any CI and only run manually, I don't see any risk of regression.

It seems like "Unicode experts" (including me) don't have the bandwidth to review your change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants