Chinese pages: correct Chinese punctuation #5240

Merged
marchersimon merged 20 commits into main from bl-ue/correct-chinese-punctuation
Aug 2, 2021

Conversation

bl-ue
Contributor

@bl-ue bl-ue commented Feb 5, 2021

Attempt to correct and normalize Chinese punctuation across all Chinese pages.

Note: I'm not a Chinese speaker, so there may be mistakes.

cc @einverne @mebeim
cc @gyli @wizarot @starccy @ChungZH @zhouLion @xiaolong-666 @sandylaw @shanoaice @telnetning

Refs:
#2897
#3426
#3442

What I've done so far:

  • Change colons before "More information" links to Chinese colon
  • Change colons at the end of example descriptions to English colon per @mebeim's comment

Questions:

  • What should the period be at the end of sentences in page descriptions, both for sentences that end with English letters and for sentences that end with Chinese characters?

@bl-ue bl-ue added the "mass changes" and "translation" labels Feb 5, 2021
@bl-ue bl-ue requested review from einverne and mebeim February 5, 2021 13:50
@bl-ue bl-ue marked this pull request as draft February 5, 2021 13:50
@zhouLion
Contributor

zhouLion commented Feb 5, 2021

Really attentive you are. In Chinese, "。" is used as a period.

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

Thank you @zhouLion for the heads up. Should that period be used at the end of sentences where the last word is an English word?

Also, please see line 3 of asdf.md and line 4 of autojump.md (you may need to wait a few seconds for GH to scroll down to those when you click the link). Per https://github.com/ruanyf/document-style-guide/blob/master/docs/marks.md and (translated to English, of course) and what someone else did on line 4 of axel.md, I used the character "和" before the last item of the list. However, I'm just guessing because I don't understand Chinese, so will you please check those and see if they're right? If they're not, please add some suggestions and I'll apply them.

@bl-ue bl-ue mentioned this pull request Feb 5, 2021
6 tasks
Contributor

@gyli gyli left a comment

Here are my answers.

pages.zh/common/axel.md Outdated Show resolved Hide resolved
pages.zh/common/rsync.md Outdated Show resolved Hide resolved
pages.zh/common/asdf.md Outdated Show resolved Hide resolved
@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

Thank you @gyli for your feedback! I replied to your comment.

@zhouLion
Contributor

zhouLion commented Feb 5, 2021

Thank you @zhouLion for the heads up. Should that period be used at the end of sentences where the last word is an English word?

Also, please see line 3 of asdf.md and line 4 of autojump.md (you may need to wait a few seconds for GH to scroll down to those when you click the link). Per https://github.com/ruanyf/document-style-guide/blob/master/docs/marks.md and (translated to English, of course) and what someone else did on line 4 of axel.md, I used the character "和" before the last item of the list. However, I'm just guessing because I don't understand Chinese, so will you please check those and see if they're right? If they're not, please add some suggestions and I'll apply them.

It still uses "。" as the period in a Chinese sentence ending with an English word.

Second question: you almost got it right. The last element of the coordinated words should be connected with "和"; however, the semicolon in front of the character "和" is redundant.

@bl-ue bl-ue mentioned this pull request Feb 5, 2021
4 tasks
@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

@agnivade @mebeim - a while back, we determined that the periods in the Chinese pages must be "."; however, the correct Chinese period is "。", and I would really like the pages to be as correct as possible. Could we instead update the linter with a special rule for Chinese pages?

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

If we could have a special rule, it should also insist that there be a space between English and Chinese letters in the same sentence. I've got a regex for that: [\u{4e00}-\u{9fd5}][a-zA-Z] (and [a-zA-Z][\u{4e00}-\u{9fd5}])
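
A minimal Python sketch of that spacing check, assuming a linter pass could be run over Chinese pages. The `\u{4e00}` form used above is not accepted by Python's `re` module, which spells the same escapes `\u4e00`. The names `find_missing_spaces` and `add_spaces` are hypothetical and not part of any existing tldr tooling.

```python
import re

# CJK range taken from the comment above; Python spells the escapes \uXXXX.
CJK = r"\u4e00-\u9fd5"
# A CJK character directly followed by a Latin letter, or the reverse.
MISSING_SPACE = re.compile(rf"[{CJK}][a-zA-Z]|[a-zA-Z][{CJK}]")


def find_missing_spaces(line: str) -> list[str]:
    """Return every two-character pair where CJK and Latin text touch."""
    return MISSING_SPACE.findall(line)


def add_spaces(line: str) -> str:
    """Insert one space at each CJK/Latin boundary that lacks one."""
    line = re.sub(rf"([{CJK}])([a-zA-Z])", r"\1 \2", line)
    return re.sub(rf"([a-zA-Z])([{CJK}])", r"\1 \2", line)
```

For example, `add_spaces("运行指定event")` returns `运行指定 event`. The pattern only looks at Latin letters, so digits or backtick-delimited code next to Chinese text would need an extended character class.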

It could also require English colons at the end of example descriptions, Chinese colon for the more information link, Chinese parentheses in Chinese words, English parentheses in English words, etc.

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

@zhouLion @gyli take aapt.md:

> 安卓资源包工具(Android Asset Packaging Tools).
> 该工具可以查看,创建, 更新资源压缩包(zip, jar, apk)。

Is that all good? I see mixed English and Chinese periods, a space after a Chinese comma, mixed Chinese and English parentheses...etc.

Will you please carefully study this bit of a text and send the corrected version? That way I could apply the things I learn from it to the rest of the pages ;)

@gyli
Contributor

gyli commented Feb 5, 2021

For the parentheses issue, I have to say there is no hard standard, but only style guides and recommendations.

Some suggest that if what's wrapped in parentheses is all in English, then English parentheses should be used: https://zh-style-guide.readthedocs.io/zh_CN/latest/%E6%A0%87%E7%82%B9%E7%AC%A6%E5%8F%B7/%E4%B8%AD%E8%8B%B1%E6%96%87%E6%B7%B7%E7%94%A8%E6%97%B6%E6%A0%87%E7%82%B9%E7%94%A8%E6%B3%95.html

Others suggest that if what's wrapped is not a full English sentence, then Chinese parentheses should be used: http://www.moe.gov.cn/ewebeditor/uploadfile/2015/01/13/20150113092346124.pdf

So I think it's good enough to choose one at least, or build our own guide. In my opinion, I prefer the first one, use English parentheses if it's all English inside, and there's a space before and after the parentheses.

For other marks inside, if it's not a full English sentence, still use Chinese marks. So that is

> 安卓资源包工具 (Android Asset Packaging Tools)。
> 该工具可以查看,创建,更新资源压缩包 (zip、jar、apk)。

No need to mix English and Chinese periods.
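
A rough Python sketch of the first convention, under two assumptions that go beyond what is stated above: content containing Chinese characters gets full-width parentheses with no padding, and padding spaces are dropped where they would sit directly before full-width punctuation. The function name `normalize_parens` is hypothetical.

```python
import re

# Either an ASCII or a full-width (U+FF08 / U+FF09) parenthesis pair,
# together with any padding spaces around it. Nested pairs are not handled.
PAREN_GROUP = re.compile(r" *[(\uff08]([^()\uff08\uff09]*)[)\uff09] *")
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Spaces that end up directly before full-width punctuation.
SPACE_BEFORE_FULLWIDTH = re.compile(r" +(?=[\u3002\uff0c\u3001\uff1b\uff1a])")


def normalize_parens(text: str) -> str:
    def replace_group(match: re.Match) -> str:
        inner = match.group(1).strip()
        if CJK_CHAR.search(inner):
            # Assumption: content containing Chinese gets full-width parentheses.
            return f"\uff08{inner}\uff09"
        # No Chinese inside: ASCII parentheses padded with one space per side.
        return f" ({inner}) "

    text = PAREN_GROUP.sub(replace_group, text)
    return SPACE_BEFORE_FULLWIDTH.sub("", text).strip()
```

Because the test for "all English" only looks for Chinese characters, a group such as `zip、jar、apk` keeps its enumeration commas and still gets ASCII parentheses, which matches the corrected example.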

I am not an expert in typesetting, and this is just a suggestion. Please correct me if anyone has a better idea.

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

In English, there's only one comma. In Chinese there seem to be two, and this is what I've conjectured: "," is for sentence structure, and "、" is for lists. Is this correct?

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

How about this for the parentheses:

  1. If the first char after the opening paren is English, use English opening paren, otherwise use Chinese opening paren.
  2. If the last char before the closing paren is English, use English closing paren, otherwise use Chinese closing paren.

@gyli What do you think?

@gyli
Contributor

gyli commented Feb 5, 2021

In English, there's only one comma. In Chinese there seem to be two, and this is what I've conjectured: "," is for sentence structure, and "、" is for lists. Is this correct?

The Chinese comma can be used in lists too. Both "," and "、" work for parallelism. Generally, "," means a slightly longer pause, while "、" marks a short pause and is usually used between short terms.

How about this for the parentheses:

  1. If the first char after the opening paren is English, use English opening paren, otherwise use Chinese opening paren.
  2. If the last char before the closing paren is English, use English closing paren, otherwise use Chinese closing paren.

Doesn't that bring unmatched parentheses, like (...)? That would be worse than using either of them.
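
The objection can be reproduced with a small Python sketch of the two per-side rules. The function name `per_side_parens` is hypothetical, and "English" is approximated here as "ASCII".

```python
def per_side_parens(inner: str) -> str:
    """Pick each parenthesis from the character next to it (the two rules above)."""
    # Rule 1: the opening parenthesis follows the first character of the content.
    opening = "(" if inner[0].isascii() else "\uff08"
    # Rule 2: the closing parenthesis follows the last character of the content.
    closing = ")" if inner[-1].isascii() else "\uff09"
    return f"{opening}{inner}{closing}"
```

Content that starts in English and ends in Chinese, such as `Android 工具`, comes back with an ASCII opening parenthesis and a full-width closing one.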

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

The Chinese comma can be used in lists too. Both "," and "、" work for parallelism. Generally, "," means a slightly longer pause, while "、" marks a short pause and is usually used between short terms.

Good to know.

Doesn't that bring unmatched parentheses, like (...)? That would be worse than using either of them.

Oh you're right. That wouldn't be good.

@bl-ue
Contributor Author

bl-ue commented Feb 5, 2021

So I think it's good enough to choose one at least, or build our own guide. In my opinion, I prefer the first one, use English parentheses if it's all English inside, and there's a space before and after the parentheses.

By "there's a space before and after the parentheses" do you mean

  • a space before and after the opening and closing paren, like this: ... ( ... and ... ) ..., or
  • a space before the opening paren and a space after the closing paren, like this: ... (... and ...) ...

?

pages.zh/common/behat.md Outdated Show resolved Hide resolved
@sbrl
Member

sbrl commented Feb 6, 2021

This is just part of the trouble as to why we have allowed unicode punctuation in tldr pages, as linting it gets seriously complicated. Not to say it can't be done - just that it's a complex beast.

Also, we need to make a clear distinction between visible punctuation characters and punctuation that means something in Markdown.

Finally, the issue of commas ties into the question of ascii punctuation, which was discussed before. IIRC it was decided that punctuation such as quotes " were to stay as ASCII for whatever reason which I forget (the linter being inflexible was part of the reason, but not the only one)

Relevant reading: #3426 (but this isn't quite the one I was looking for).

@zjuyk
Contributor

zjuyk commented Feb 6, 2021

I think you need this.

https://github.com/sparanoid/chinese-copywriting-guidelines/blob/master/README.en-US.md

For Chinese writing, there are no strict rules. Most of us agree with the rules listed above, which can make our writing more beautiful.

@bl-ue
Contributor Author

bl-ue commented Feb 6, 2021

Thank you @zjuyk that's great, I'll take a look tomorrow as it's night here. (I'm 13 hours behind you)

I didn't know there was an English translation!! ❤️

@mebeim
Member

mebeim commented Feb 6, 2021

Hey there @bl-ue, thanks for this draft PR. I think my comment more or less summarizes the issue around this kind of change. Basically, as long as we don't alter critical punctuation characters (which could be used by linter and/or clients for parsing) it's surely fine, otherwise there might be some issues with clients. If there are better punctuation characters to use for Chinese pages we can use those, as long as they are not in those critical parts. If they are, we can still use them, but they could break clients (hopefully they do not, but there's a chance).

Therefore:

What should the period be at the end of sentences in page descriptions, both for sentences that end with English letters and for sentences that end with Chinese characters?

If the Chinese period is "。" then you can go for it in the middle of description lines, that's fine. E.g.:

# example

> Sentence。Another sentence.
> Second line。Another second line sentence.

...

The problem only arises at the end of description lines, for example:

# example

> Sentence。Another sentence。
> Second line。Another second line sentence。

...

The same applies to colons at the end of command descriptions.

Currently, our linter does not check Chinese pages, so it would still be OK to have the Chinese final period or final colon, but I am unsure whether it could break existing clients. It should not, if the clients are decently written. Therefore I don't see any real problem in using it.
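
To see which lines fall into the "critical" category described above, a sketch like the following could list them. It assumes that page descriptions start with `> ` and example descriptions start with `- `, and it is a hypothetical helper rather than part of the tldr linter or of any client.

```python
# Full-width period and colon, written as escapes so they stay unambiguous.
FULLWIDTH_ENDINGS = ("\u3002", "\uff1a")


def risky_line_endings(page: str) -> list[tuple[int, str]]:
    """List (line number, line) for description lines ending in full-width marks."""
    hits = []
    for number, line in enumerate(page.splitlines(), start=1):
        is_description = line.startswith("> ") or line.startswith("- ")
        if is_description and line.rstrip().endswith(FULLWIDTH_ENDINGS):
            hits.append((number, line))
    return hits
```

A full-width period in the middle of a line is ignored; only the final character of a description line is reported, which is the position a client might inspect when parsing.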

@mebeim
Member

mebeim commented Feb 6, 2021

Also, @bl-ue I see you changed most of the instances of mixed English and Chinese text by adding spaces, for example 运行指定event -> 运行指定 event. Is there a specific reason for that? Is the version without a space wrong, or worse? Just curious, it's fine either way.

@bl-ue
Contributor Author

bl-ue commented Feb 6, 2021

Also, @bl-ue I see you changed most of the instances of mixed English and Chinese text by adding spaces, for example 运行指定event -> 运行指定 event. Is there a specific reason for that? Is the version without a space wrong, or worse? Just curious, it's fine either way.

Yes, @mebeim. https://github.com/sparanoid/chinese-copywriting-guidelines/blob/master/README.en-US.md#place-one-space-beforeafter-english-words

pages.zh/linux/flameshot.md Outdated Show resolved Hide resolved
Collaborator

@marchersimon marchersimon left a comment

Just a few issues / questions I found. After that I think this PR is finished.

pages.zh/common/asciinema.md Outdated Show resolved Hide resolved
@@ -2,31 +2,31 @@

> 录制和播放终端会话,也可以把他们分享到 asciinema.org.

- 将本地安装的`asciinema`与 asciinema.org 账号关联:
- 将本地安装的`asciinema`与 asciinema.org 账号关联
Collaborator

Suggested change
- 将本地安装的`asciinema`与 asciinema.org 账号关联:
- 将本地安装的 `asciinema` 与 asciinema.org 账号关联:

There should always be a space between Latin and Chinese characters (except for full-width punctuation) right?

Member

Yes, that's right.

@@ -1,28 +1,28 @@
# asdf

> 可扩展的包版本管理器,支持Nodejs,Ruby,Elixir,Erlang等.
> 更多信息: <https://asdf-vm.com>.
> 可扩展的包版本管理器,支持 Nodejs、Ruby、Elixir 和 Erlang 等。
Collaborator

I think that change was made by @ bl-ue. Any idea why this comma looks so weird?

Collaborator

I found out that "、" is an "ideographic comma" (U+3001), but there's also the "halfwidth ideographic comma" (U+FF64) and there's the "fullwidth comma" (U+FF0C). Is there any rule when to use which? And what should we be using?

Member

@blueskyson blueskyson Jul 18, 2021

According to https://en.wikipedia.org/wiki/JIS_X_0201, the halfwidth ideographic comma derives from JIS X 0201, a Japanese standard, so it is not supposed to appear in any Chinese page.

Member

@blueskyson blueskyson Jul 18, 2021

The ideographic comma is used when listing items; it's also called the "enumeration comma". The full-width comma is used to join together clauses on a given topic.
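
The three commas discussed in this thread can be told apart by code point. A small Python sketch using the standard `unicodedata` module prints their official Unicode names, with the ASCII comma included for comparison; the helper name `describe` is made up for this example.

```python
import unicodedata

# Ideographic, halfwidth ideographic, fullwidth, and ASCII comma.
COMMAS = ["\u3001", "\uff64", "\uff0c", ","]


def describe(char: str) -> str:
    """Format a character as 'U+XXXX NAME' using the Unicode database."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"


for comma in COMMAS:
    print(describe(comma))
```

One thing to keep in mind when processing pages: NFKC normalization maps U+FF0C to the ASCII comma and U+FF64 to U+3001, so any tool that normalizes text this way would erase the distinction being discussed here.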

Collaborator

Interesting. For enumerating Latin text, should we use "、" or ","? Currently both variants appear in the pages.

Collaborator

@marchersimon marchersimon Jul 18, 2021

I don't think I'm able to correct the commas, since most of the time I have to understand the sentence to know if it's an enumeration or linking comma.

Member

Although using "," to list is weird for us, we can still understand what the sentence means. Listing without "、" is tolerable; we can fix it in future PRs.

pages.zh/common/curl.md Outdated Show resolved Hide resolved
@marchersimon
Collaborator

https://github.com/sparanoid/chinese-copywriting-guidelines/blob/master/README.en-US.md says that we should be using 「...」, instead of "...". Any thoughts?

@blueskyson
Member

blueskyson commented Jul 18, 2021

https://github.com/sparanoid/chinese-copywriting-guidelines/blob/master/README.en-US.md says that we should be using 「...」, instead of "...". Any thoughts?

The usage differs between zh (Chinese) and zh_TW (traditional Chinese).

  • Use “...” in zh.
  • Use 「...」 in zh_TW. The quotation mark is called "Corner brackets".

East Asian countries adopt many different quotation marks, which can be referenced from: https://en.wikipedia.org/wiki/Quotation_mark#Chinese,_Japanese,_and_Korean

Collaborator

@marchersimon marchersimon left a comment

I think this should be done now


`bmaptool create -o {{blockmap格式文件.bmap}} {{图片文件}}`
`bmaptool create -o "{{blockmap 格式文件.bmap}}" {{图片文件}}`
Member

@blueskyson blueskyson Jul 31, 2021

I wonder what's the purpose of using ". The original page doesn't contain ".

Member

All other punctuation looks good.

Collaborator

@marchersimon marchersimon Aug 1, 2021

Interesting, those quotation marks were added by @ bl-ue in 4dc03a3, but that was probably not on purpose. I'll remove them again.

All other punctuation looks good.

Thanks for checking :)

@@ -3,7 +3,7 @@
> 使用`GPG`加密存档中的文件和目录。
> 更多信息: <https://www.gnupg.org/documentation/manuals/gnupg/gpg_002dzip.html>.

- 使用密码将一个目录加密为`archive.gpg`
- 使用密码将一个目录加密为`archive.gpg`:

`gpg-zip --symmetric --output {{archive.gpg}} {{path/to/directory}}`
Member

Suggested change
`gpg-zip --symmetric --output {{archive.gpg}} {{path/to/directory}}`
`gpg-zip --symmetric --output {{档案.gpg}} {{路径/目录}}`

Member

I found the main purpose of this PR is to fix punctuation, so I'll ignore translation problems from now on.

Collaborator

Yeah, I think this PR is already big enough 😅
But I feel like translating untranslated arguments in Chinese pages would be worth its own PR.

Member

@sbrl sbrl left a comment

This PR has been sitting around long enough. It may not be perfect, but it does leave everything in a better state than it found it, and we can always work on additional issues in future PRs.

@marchersimon marchersimon changed the title Chinese pages: correct chinese punctuation Chinese pages: correct Chinese punctuation Aug 2, 2021
@marchersimon marchersimon merged commit 289e30d into main Aug 2, 2021
@marchersimon marchersimon deleted the bl-ue/correct-chinese-punctuation branch August 2, 2021 08:41
@marchersimon
Collaborator

Finally!
Thanks to everyone who helped with this, I think this repo has never had such a big PR 😄

@marchersimon
Collaborator

If there should be no space between full-width characters and Latin characters, does that mean there should be no space between the ":" and the "<" in the more information link?

@zjuyk
Contributor

zjuyk commented Aug 5, 2021

If there should be no space between full-width characters and Latin characters, does that mean there should be no space between the ":" and the "<" in the more information link?

Yes, there should be no space between full-width characters and Latin characters.

According to the Chinese Copywriting Guidelines, we should not place a space before or after full-width punctuation in Chinese writing.
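
Applied to the more information line, that rule could be sketched in Python as below. The set of full-width marks and the function name `strip_spaces_around_fullwidth` are assumptions made for illustration, not an agreed list.

```python
import re

# Full-width comma, period, colon, semicolon, exclamation mark,
# question mark, and the ideographic (enumeration) comma.
FULLWIDTH_PUNCT = "\uff0c\u3002\uff1a\uff1b\uff01\uff1f\u3001"
PADDED_PUNCT = re.compile(f" *([{FULLWIDTH_PUNCT}]) *")


def strip_spaces_around_fullwidth(line: str) -> str:
    """Remove spaces placed directly before or after full-width punctuation."""
    return PADDED_PUNCT.sub(r"\1", line)
```

With this, a line such as `> 更多信息: <https://asdf-vm.com>.` written with a full-width colon and a following space loses only that space; the space after the leading `>` is untouched because it is not adjacent to a full-width mark.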

@marchersimon
Collaborator

marchersimon commented Aug 5, 2021

Thanks, I'll make another PR for this → #6305

blueskyson pushed a commit that referenced this pull request Aug 21, 2021
* Chinese pages: add style guide

a proposal in addition to #5240 to guide new contributors to
unify the formatting of Chinese pages.