H.-Dirk Schmitt
2018-01-27 17:33:27 UTC
I found some very odd behaviour in `svnlook log` that IMHO is a bug (or
at least a serious documentation gap).
Introduction
------------
Consider a log message like: 'Unicode Test → ø ÄÖÜ'
`svnlook log` invoked in a normal terminal session shows the proper
content.
This works because the environment is set to 'en_US.UTF-8'.
Now start to experiment: `env LC_ALL=C.UTF-8 svnlook log` also shows
the correct result.
Problem
-------
But when falling back to `env LC_ALL=C svnlook log`, I get a badly
mangled result:
Unicode Test {U+2192} {U+00F8} AOU
→ and ø are replaced with their code point notation.
The German umlaut characters are transliterated in a very uncommon way.
In the old ASCII/typewriter days, Ä was transliterated as Ae (Ö → Oe,
…)
Why this behaviour is not a cosmetic problem
--------------------------------------------
Consider a post-commit hook fetching the commit message with `svnlook
log`.
The purpose is to post-process the log message, e.g. to append it to
Bugzilla issues.
The actual setup is svn+apache2 with a bash script as post-commit hook.
The machine locale as reported by `localectl`: System Locale:
LANG=en_US.utf8
All commit message content transferred this way is broken as described
above.
This happens because the post-commit hook is running with a very
reduced set of environment variables:
PWD=/
SHLVL=1
In particular, `LC_ALL` is not set (and neither is `LANG`), which is
equivalent to `LC_ALL=C`.
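The effect of such a stripped-down environment can be checked without
Subversion. The following is a minimal sketch, assuming a POSIX `env`
and `/bin/sh`; the function name `show_locale_vars` is made up for
illustration. It starts a child shell with an empty environment and
prints the three variables that normally select the character encoding:

```shell
# Hypothetical helper: show the locale variables a child process sees
# when it is started with an empty environment (`env -i`), roughly the
# situation of the post-commit hook described above.
show_locale_vars() {
    env -i /bin/sh -c 'echo "LC_ALL=${LC_ALL-unset} LC_CTYPE=${LC_CTYPE-unset} LANG=${LANG-unset}"'
}

show_locale_vars
# prints: LC_ALL=unset LC_CTYPE=unset LANG=unset
```

With all three unset, a program that initialises its locale from the
environment ends up in the "C" locale, which matches the `LC_ALL=C`
output shown earlier.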
Suggested Mitigation/Fixing
---------------------------
1. Subversion should ensure that the system locale is forwarded to the
post-commit hook.
2. `svnlook` should support the `--encoding` switch.
3. German umlauts (and surely some other national characters in the
8-bit range) shouldn't be transliterated differently from other
Unicode characters (see ø / {U+00F8}).
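Until something along these lines exists, the hook itself can set a
locale before calling `svnlook`. This is only a sketch of a possible
workaround, not something Subversion prescribes: the helper name
`ensure_utf8_locale` is hypothetical, and it assumes that the locale
`en_US.UTF-8` is actually installed on the server (check with
`locale -a`).

```shell
#!/bin/bash
# post-commit hook sketch. Subversion invokes the hook as:
#   post-commit REPOS-PATH REV TXN-NAME

# Hypothetical helper: export a UTF-8 locale unless the caller has
# already chosen one via LC_ALL, LC_CTYPE or LANG.
# ASSUMPTION: the default en_US.UTF-8 is installed on this machine.
ensure_utf8_locale() {
    if [ -z "${LC_ALL:-}" ] && [ -z "${LC_CTYPE:-}" ] && [ -z "${LANG:-}" ]; then
        export LC_ALL="${1:-en_US.UTF-8}"
    fi
}

# Only act when called with at least REPOS-PATH and REV.
if [ "$#" -ge 2 ]; then
    ensure_utf8_locale
    # The log message now reaches stdout encoded for the exported locale.
    svnlook log -r "$2" "$1"
fi
```

The helper deliberately leaves an existing locale setting alone, so the
same script behaves identically when run by hand in a terminal.
Subversion 1.8 and later also appear to offer a per-repository
`conf/hooks-env` file for setting hook environment variables, which may
be a cleaner place for this; that should be verified against the
documentation of the installed version.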
PS: Google et al. did not turn up any sign that this issue is well
documented.