Encoding problems with non-ASCII characters with Git 2.44.0 #4851

Closed
1 task done
inosik opened this issue Mar 5, 2024 · 15 comments · Fixed by #4968

@inosik

inosik commented Mar 5, 2024

  • I was not able to find an open or closed issue matching what I'm seeing

There's something wrong w.r.t. encoding in Git Bash. Non-ASCII characters (German Umlauts in my case) appear garbled in the terminal output.

Git 2.43.0:

$ git add -p
diff --git a/test.txt b/test.txt
index a3ea8e6..0e54281 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 äöü
+change
(1/1) Stage this hunk [y,n,q,a,d,e,?]?

Git 2.44.0:

$ git add -p
diff --git a/test.txt b/test.txt
index a3ea8e6..0e54281 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 ├ñ├Â├╝
+change
(1/1) Stage this hunk [y,n,q,a,d,e,?]?

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options

git version 2.44.0.windows.1
cpu: x86_64
built from commit: ad0bbfffa543db6979717be96df630d3e5741331
sizeof-long: 4
sizeof-size_t: 8
shell-path: /bin/sh
feature: fsmonitor--daemon
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.19045.3930]
  • What options did you set as part of the installation? Or did you choose the
    defaults?

I installed Git using Scoop, which just extracts PortableGit-2.44.0-64-bit.7z.exe.

  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Nothing I can think of.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Bash inside of the Windows Terminal App. But this is also reproducible using CMD, PowerShell and git-bash.exe.

git add -p
  • What did you expect to occur after running these commands?

Non-ASCII characters should be displayed properly.

  • What actually happened instead?

Non-ASCII characters are garbled.

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

No specific repository, but here's a Gist with the test file.

@dscho
Member

dscho commented Mar 5, 2024

I can reproduce this, and work around it by using MSYS=disable_pcon:

$ MSYS=disable_pcon git add -p
diff --git a/test.txt b/test.txt
index a3ea8e6..0e54281 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 äöü
+change
(1/1) Stage this hunk [y,n,q,a,d,e,?]?

$ MSYS=enable_pcon git add -p
diff --git a/test.txt b/test.txt
index a3ea8e6..0e54281 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 äöü
+change
(1/1) Stage this hunk [y,n,q,a,d,e,?]?

I can also work around it by calling chcp 65001:

$ cmd //c chcp 65001
Active code page: 65001

me@work MINGW64 ~/repros/umlauts-4851 (main)
$ MSYS=enable_pcon git add -p
diff --git a/test.txt b/test.txt
index a3ea8e6..0e54281 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
 äöü
+change
(1/1) Stage this hunk [y,n,q,a,d,e,?]?

I guess that we'll want to always change the code page to 65001 in the enable_pcon mode?

@Kirill

Kirill commented Mar 5, 2024

I ran into this bug too.
Folder in Explorer:
image

This is what I see in Git Bash, Terminal, and FAR Manager:
image
image

I downgraded to 2.21.0 and it works correctly.
image

I updated to 2.44.0 again, but the bug is gone (I tried to reproduce it in the newer version after the downgrade, but I now see the correct filename in the latest version).
image

@inosik
Author

inosik commented Mar 6, 2024

@dscho MSYS=disable_pcon doesn't seem to do anything in my case:

$ MSYS=disable_pcon git add -p
diff --git a/file.txt b/file.txt
index a3ea8e6..d88e171 100644
--- a/file.txt
+++ b/file.txt
@@ -1 +1,2 @@
 ├ñ├Â├╝
+Ôé¼
(1/1) Stage this hunk [y,n,q,a,d,e,?]?

chcp 65001 does work.
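
The garbled strings in this thread are consistent with UTF-8 bytes being rendered under an OEM code page such as 850. That code page is an assumption, since the thread never states which one was active. A minimal sketch that reproduces both garbled strings on any system with `iconv` (the function name is made up for illustration):

```shell
# Reinterpret the UTF-8 bytes of a string as code page 850 and print the
# result as UTF-8 again. Assumes an iconv that knows the CP850 charset
# and a UTF-8 encoded script file.
garble_as_cp850() {
    printf '%s' "$1" | iconv -f CP850 -t UTF-8
}

garble_as_cp850 'äöü'; echo    # prints ├ñ├Â├╝
garble_as_cp850 '€'; echo      # prints Ôé¼
```

If that assumption holds, it would also explain why `chcp 65001` helps: once the console interprets the bytes as UTF-8, they decode back to the original characters.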

@MrMarvel

MrMarvel commented Mar 18, 2024

Same here: UTF-8 filenames are wrongly encoded in Git 2.44.0.
image
It was working in Git 2.42.0.

@zzfhadr

zzfhadr commented Apr 4, 2024

Same here, +1.
image
It works if I use Git Bash, but I don't want to use it because it is not convenient for copy and paste.
image

@stenretni

stenretni commented Apr 5, 2024

I added chcp.com 65001 > /dev/null to my .config/git/git-prompt.sh as a workaround. So far so good.
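
A slightly more defensive variant of that workaround, as a sketch only: it calls `chcp.com` only where the command exists, so the same prompt script can be shared with non-Windows machines. The function name is hypothetical.

```shell
# Switch the console to code page 65001 (UTF-8), but only where chcp.com
# is available. On systems without it, this is a no-op.
use_utf8_code_page() {
    if command -v chcp.com >/dev/null 2>&1; then
        chcp.com 65001 >/dev/null
    fi
}

use_utf8_code_page
```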

@drq001out

Same problem with Chinese text in VS Code and CMD!

@inosik
Author

inosik commented May 6, 2024

Same behavior with Git 2.45.0.

@dscho
Member

dscho commented May 6, 2024

@inosik I have to be honest: due to shifts in priorities at my day job, I am stretched a little too thin to work on this. Maybe you can? It would require a little bit of C++ knowledge (not C# or F#) to work on the MSYS2 runtime, which is a bit tricky to navigate, but I'd try my best to assist with guidance.

@inosik
Author

inosik commented May 8, 2024

I never did any C++ coding, but I could give it a shot. Can you tell me where I should start looking?

@dscho
Member

dscho commented May 9, 2024

@inosik the first thing would not even be C++, but to verify that dropping the usr\bin\msys-2.0.dll from v2.43.0 into, say, a Portable Git v2.44.0 "fixes" the problem.

@inosik
Author

inosik commented May 23, 2024

> verify that dropping the usr\bin\msys-2.0.dll from v2.43.0 into, say, a Portable Git v2.44.0 "fixes" the problem.

Moving msys-2.0.dll around didn't fix the problem, but copying mingw64/bin/git.exe from Git 2.43.0 to 2.45.0 did.
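
For anyone repeating this kind of swap test, a minimal sketch that copies a single file from one extracted Portable Git tree into another and keeps a backup of the replaced file. The function name and the directory arguments are hypothetical; only the relative paths come from this thread.

```shell
# Copy one file from an older Portable Git tree into a newer one,
# saving the file that gets replaced as <name>.orig first.
# Usage: swap_in <old_root> <new_root> <relative_path>
swap_in() {
    old_root=$1
    new_root=$2
    rel=$3
    cp -p "$new_root/$rel" "$new_root/$rel.orig" &&
    cp -p "$old_root/$rel" "$new_root/$rel"
}

# Examples (directory names are placeholders):
#   swap_in /c/PortableGit-2.43.0 /c/PortableGit-2.45.0 usr/bin/msys-2.0.dll
#   swap_in /c/PortableGit-2.43.0 /c/PortableGit-2.45.0 mingw64/bin/git.exe
```

Restoring the `.orig` copy afterwards returns the newer tree to its original state.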

@dscho
Member

dscho commented May 25, 2024

@inosik thank you for testing! This pointed me in the right direction. It still took over a day to bisect, but at least now I have a fix in #4968. Could you please verify that this works for you? You should be able to extract the git.exe from the PR build's artifacts, once the build finishes.

dscho added this to the Next release milestone May 25, 2024
dscho added a commit that referenced this issue May 26, 2024
In #4700, I introduced a change in Git for Windows' behavior where it
would favor recent Windows 10 versions' native ANSI sequence processing
over [Git for Windows' home-grown
one](https://github.com/git-for-windows/git/blob/v2.45.1.windows.1/compat/winansi.c#L362-L439).

What I missed was that the home-grown processing _also_ ensured that
text written to the Win32 Console was carefully converted from UTF-8 to
UTF-16 encoding, while the native ANSI sequence processing would respect
the currently-set code page.

However, Git for Windows does not use the current code page at all,
always using UTF-8 encoded text internally. So let's make sure that the
code page is `CP_UTF8` when Git for Windows uses the native ANSI
sequence processing.

This fixes #4851.
github-actions bot pushed a commit to git-for-windows/build-extra that referenced this issue May 26, 2024
When Git for Windows v2.44.0 introduced the ability [to use native Win32
Console ANSI sequence
processing](git-for-windows/git#4700), an
inadvertent fallout was that in this instance, [non-ASCII characters
were no longer printed correctly unless the current code page was set to
65001](git-for-windows/git#4851). This bug
[has been fixed](git-for-windows/git#4968).

Signed-off-by: gitforwindowshelper[bot] <[email protected]>
@inosik
Author

inosik commented May 27, 2024

I've downloaded the snapshot from yesterday (2024-05-26), and this bug seems to be fixed for me. Thank you @dscho!

I've also tried it inside a Windows 10 VM, which is a somewhat different setup from my actual machine. In the VM, the bug still seems to be reproducible, but this might have to do with something else. Here's a screenshot:

VirtualBoxVM_2024-05-27_08-13-56

@dscho
Member

dscho commented May 27, 2024

> In the VM, the bug seems to still be reproducible, but this also might have to do with something else.

Looks like the diff suggests ISO-8859-1 encoding. You may need to specify the working-tree-encoding accordingly.
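
As a sketch of what that could look like: the `working-tree-encoding` attribute is set in `.gitattributes`. The `*.txt` pattern and the helper function below are illustrative assumptions, not something stated in this thread.

```shell
# Append a working-tree-encoding rule to <repo_dir>/.gitattributes.
# Usage: declare_encoding <repo_dir> <pattern> <encoding>
declare_encoding() {
    printf '%s text working-tree-encoding=%s\n' "$2" "$3" >> "$1/.gitattributes"
}

# Example (hypothetical pattern; run from the repository root):
#   declare_encoding . '*.txt' ISO-8859-1
```

Afterwards, `git check-attr working-tree-encoding -- test.txt` should report the encoding Git will assume for that file.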
