Discussion:
mailer.py commit says TypeError: must be unicode, not str
Kenneth Porter
2018-01-23 15:14:47 UTC
I upgraded my repo server from CentOS 6 to 7 and my commit hook is now
failing with this message. I manually upgraded mailer.py from that found
in Subversion 1.7.14 to the latest in trunk and still get the same
error. So I suspect it's something in the Subversion Python bindings and
not the mailer script. Or do I need to upgrade my repo through a dump
and reload? (Repo format is 3.)

Here's the full traceback. I do get the email.

bash-4.2$ /usr/local/bin/subversion/mailer.py commit /srv/svn/MPA 7598 /usr/local/bin/subversion/mailer.conf
Traceback (most recent call last):
  File "/usr/local/bin/subversion/mailer.py", line 1465, in <module>
    sys.argv[3:3+expected_args])
  File "/usr/lib64/python2.7/site-packages/svn/core.py", line 307, in run_app
    return func(application_pool, *args, **kw)
  File "/usr/local/bin/subversion/mailer.py", line 132, in main
    messenger.generate()
  File "/usr/local/bin/subversion/mailer.py", line 439, in generate
    group, params, paths, subpool)
  File "/usr/local/bin/subversion/mailer.py", line 709, in generate_content
    renderer.render(data)
  File "/usr/local/bin/subversion/mailer.py", line 1056, in render
    self._render_diffs(data.diffs, '')
  File "/usr/local/bin/subversion/mailer.py", line 1099, in _render_diffs
    for diff in diffs:
  File "/usr/local/bin/subversion/mailer.py", line 898, in __getitem__
    src_fname, dst_fname = diff.get_files()
  File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 103, in get_files
    self._dump_contents(self.tempfile2, self.root2, self.path2)
  File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 87, in _dump_contents
    fp.write(chunk)
TypeError: must be unicode, not str
Kenneth Porter
2018-02-01 03:23:59 UTC
--On Tuesday, January 23, 2018 7:14 AM -0800 Kenneth Porter
Post by Kenneth Porter
  File "/usr/lib64/python2.7/site-packages/svn/fs.py", line 87, in _dump_contents
    fp.write(chunk)
TypeError: must be unicode, not str
Here's the code where this is going wrong. I think svn_stream_read is
returning a byte stream and the file object here is expecting a unicode
string. Is there a missing decode('utf-8') call? (I'm a very new Python
coder but have lots of experience in C++ and some understanding of unicode.)

Package is subversion-python-1.7.14-11.el7_4.x86_64 in CentOS 7.4.
Post by Kenneth Porter
From /usr/lib64/python2.7/site-packages/svn/fs.py:

def _dump_contents(self, file, root, path, pool=None):
  fp = builtins.open(file, 'w+') # avoid namespace clash with
                                 # trimmed-down svn_fs_open()
  if path is not None:
    stream = file_contents(root, path, pool)
    try:
      while True:
        chunk = _svncore.svn_stream_read(stream,
                                         _svncore.SVN_STREAM_CHUNK_SIZE)
        if not chunk:
          break
        fp.write(chunk)
    finally:
      _svncore.svn_stream_close(stream)
  fp.close()

BTW, I found this nice treatment of unicode in Python 2 and 3:

<https://nedbatchelder.com/text/unipain.html>
Kenneth Porter
2018-02-01 03:40:20 UTC
--On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
Post by Kenneth Porter
fp = builtins.open(file, 'w+') # avoid namespace clash with
# trimmed-down svn_fs_open()
I'm now thinking the problem is in the open call, and that I'm somehow
getting a Python 3 open function even though I've got Python 2.7 installed.
Should the mode be 'wb' instead of 'w+'? That would ensure that the raw
data from the Subversion object gets dumped into the temporary file
without interpretation. I don't understand why update mode (denoted by the
plus) is wanted. The temp file isn't being read from.
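
To illustrate the suspicion above: io.open (which is what Python 3 calls open, and which is also importable on Python 2.7) refuses byte strings in text mode but accepts them in binary mode, whereas the Python 2.7 builtin open accepts str in either mode. The sketch below only demonstrates that difference. Whether the bindings on this machine really end up calling io.open is an assumption the thread does not settle, and the chunk value and the try_write helper are made up for illustration.

```python
import io
import os
import tempfile

# Stand-in for a chunk returned by svn_stream_read (hypothetical data).
chunk = b"raw bytes from a stream\n"


def try_write(mode):
    """Write the byte chunk to a temp file opened with io.open in `mode`."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with io.open(path, mode) as fp:
            fp.write(chunk)
        return "ok"
    except TypeError as exc:
        # Text-mode io.open only accepts unicode text, not bytes.
        return "TypeError: %s" % exc
    finally:
        os.remove(path)


text_result = try_write("w+")    # text mode, as in fs.py
binary_result = try_write("wb")  # binary mode, as proposed
print(text_result)
print(binary_result)
```

The exact wording of the TypeError differs between Python versions, so only the fact that one is raised in text mode should be relied on.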
Kenneth Porter
2018-02-01 18:31:52 UTC
[moving discussion to dev list as I think this is now the correct fix.]

Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
to temporary file. Update isn't needed, and the code just needs to dump the
raw data into a file for the external diff to access, so no
encoding/decoding should occur. Hence we should open the file in binary
mode. I just tested this edit and it seems to cure the problem.
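
A minimal sketch of the proposed change, written so it can run without the Subversion bindings. It assumes an ordinary file-like byte stream (io.BytesIO here) in place of the svn stream object, and dump_stream, CHUNK_SIZE and payload are hypothetical names, not anything from fs.py. The point is only that a chunked copy into a file opened with 'wb' preserves the bytes exactly.

```python
import io
import os
import tempfile

# Hypothetical value; stands in for _svncore.SVN_STREAM_CHUNK_SIZE.
CHUNK_SIZE = 16384


def dump_stream(path, stream, chunk_size=CHUNK_SIZE):
    """Copy a byte stream into `path` with no encoding or decoding."""
    with io.open(path, "wb") as fp:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            fp.write(chunk)


# Bytes that a text-mode file could mangle or reject:
# UTF-8 text, CRLF, NUL and an invalid UTF-8 byte.
payload = b"caf\xc3\xa9\r\n\x00\xff" * 5000

fd, tmp_path = tempfile.mkstemp()
os.close(fd)
try:
    dump_stream(tmp_path, io.BytesIO(payload))
    with io.open(tmp_path, "rb") as fp:
        round_trip = fp.read()
finally:
    os.remove(tmp_path)

print(round_trip == payload)
```

Using a with block also closes the file if a read fails part-way through, which the original fp.close() at the end of the function would not do.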

It looks like this line is the same since it was originally added in
r843330 and hasn't changed in Troy's swig-py3 branch.
<https://svn.haxx.se/users/archive-2018-01/0094.shtml>
<https://svn.haxx.se/users/archive-2018-02/0000.shtml>

I'm using mailer.py in my post-commit hook and it's throwing a Unicode type
error during the diff phase. Digging through the source code, I figured out
that it's happening during the creation of the two temporary files for
diff'ing. Somehow the output file is getting opened in Unicode text mode
but the input source (the Subversion object stream) is a raw byte stream.
The write call fails.

OS: CentOS 7.4
subversion-python-1.7.14-11.el7_4.x86_64
python-2.7.5-58.el7.x86_64
Troy Curtis Jr
2018-02-02 05:40:16 UTC
Post by Kenneth Porter
[moving discussion to dev list as I think this is now the correct fix.]
--On Wednesday, January 31, 2018 7:40 PM -0800 Kenneth Porter
Post by Kenneth Porter
--On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
Post by Kenneth Porter
fp = builtins.open(file, 'w+') # avoid namespace clash with
# trimmed-down svn_fs_open()
I'm now thinking the problem is in the open call, and that I'm somehow
getting a Python 3 open function even though I've got Python 2.7
installed. Should the mode be 'wb' instead of 'w+'? That would ensure
that the raw data from the Subversion object gets dumped into the
temporary file without interpretation. I don't understand why update
mode (denoted by the plus) is wanted. The temp file isn't being read from.
That seems strange, for py3 sure, but certainly odd on py2. Perhaps your
locale is set to utf8? I'll have to research to see if that even makes
sense.

Post by Kenneth Porter
Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
to temporary file. Update isn't needed, and the code just needs to dump the
raw data into a file for the external diff to access, so no
encoding/decoding should occur. Hence we should open the file in binary
mode. I just tested this edit and it seems to cure the problem.
It looks like this line is the same since it was originally added in
r843330 and hasn't changed in Troy's swig-py3 branch.
I've been leaning heavily on the test coverage for validating my py3
updates. At first glance it looks like this FileDiff isn't referenced in
any existing test. I'll add a test and confirm the behavior, and then test
with your fix, unless you'd like to do so.

Troy
Troy Curtis Jr
2018-02-07 03:56:10 UTC
Post by Troy Curtis Jr
Post by Kenneth Porter
Proposed edit to fs.py: Change 'w+' to 'wb' when copying svn stream object
to temporary file. Update isn't needed, and the code just needs to dump the
raw data into a file for the external diff to access, so no
encoding/decoding should occur. Hence we should open the file in binary
mode. I just tested this edit and it seems to cure the problem.
It looks like this line is the same since it was originally added in
r843330 and hasn't changed in Troy's swig-py3 branch.
I've been leaning heavily on the test coverage for validating my py3
updates. At first glance it looks like this FileDiff isn't referenced in
any existing test. I'll add a test and confirm the behavior, and then test
with your fix, unless you'd like to do so.
Kenneth, I'm having trouble reproducing your issue. Can you think of any
other hints at what might be causing the trouble in your environment?
I've tried changing my locale and changing the diffed files to utf8, all
with no luck. Regardless, your suggested change needs to be made on my
swig-py3 branch, since it is definitely needed for Python 3, but I'd really
like to understand what is going on here to make sure the issue is well and
truly resolved.

Troy
Nico Kadel-Garcia
2018-02-01 05:02:05 UTC
Post by Kenneth Porter
--On Wednesday, January 31, 2018 7:23 PM -0800 Kenneth Porter
Post by Kenneth Porter
fp = builtins.open(file, 'w+') # avoid namespace clash with
# trimmed-down svn_fs_open()
I'm now thinking the problem is in the open call, and that I'm somehow
getting a Python 3 open function even though I've got Python 2.7 installed.
Should the mode be 'wb' instead of 'w+'? That would ensure that the raw data
from the Subversion object gets dumped into the temporary file without
interpretation. I don't understand why update mode (denoted by the plus) is
wanted. The temp file isn't being read from.
If you are on a RHEL-based operating system, some of them install both
Python 2 and Python 3, sometimes with some fascinating "PATH" settings.
It can be very handy to edit customized .py scripts to explicitly call
the version of Python you really want to use.
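
One way to check which interpreter a hook script actually runs under is to have it report on itself. The sketch below is only a diagnostic idea, not part of mailer.py. interpreter_report is a hypothetical helper, and it also prints the locale's preferred encoding, since locale was raised earlier in the thread as a possible factor.

```python
import locale
import sys


def interpreter_report():
    """Collect facts about the interpreter that is running this script."""
    return {
        "executable": sys.executable,            # path of the running python
        "version": tuple(sys.version_info[:3]),  # e.g. major, minor, micro
        "open_module": open.__module__,          # module that provides open()
        "preferred_encoding": locale.getpreferredencoding(False),
    }


report = interpreter_report()
for key in sorted(report):
    print("%s: %s" % (key, report[key]))
```

Temporarily putting the same few lines near the top of the hook script and sending the output to a log file would show what the hook sees, which can differ from an interactive shell.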