Discussion:
Show textual diff in a moved/copied file - how?
Alexey Neyman
2018-02-26 07:38:03 UTC
Hi all,

I am trying to dig for some changes in a file that was moved a few times
and 'svn diff' shows full "remove old location and add new location as
if it were a new file" diffs, which are not helpful. Is there a way to
make the diff show the changes as compared against the origin of the
copy? I tried --notice-ancestry, does not help.

I have a vague recollection that 'svn diff' used to show the changes in
such copied files before - but I tried the small reproduction script
below and it shows the same with all of 1.7.22, 1.8.17, 1.9.7 and trunk:

---8<---
#!/bin/bash

rm -rf /tmp/foo-{svn,wc}
svnadmin create /tmp/foo-svn
svn co file:///tmp/foo-svn /tmp/foo-wc
cd /tmp/foo-wc
echo foo > foobar
svn add foobar
svn ci -m "1"
svn mv foobar barfoo
echo bar >> barfoo
svn ci -m "2"
svn up
svn diff -c 2
svn --version
---8<---


Diff output:

---8<---
Index: foobar
===================================================================
--- foobar    (revision 1)
+++ foobar    (nonexistent)
@@ -1 +0,0 @@
-foo
Index: barfoo
===================================================================
--- barfoo    (nonexistent)
+++ barfoo    (revision 2)
@@ -0,0 +1,2 @@
+foo
+bar
---8<----

Regards,
Alexey.
Stefan Sperling
2018-02-26 08:18:59 UTC
Post by Alexey Neyman
Hi all,
I am trying to dig for some changes in a file that was moved a few times and
'svn diff' shows full "remove old location and add new location as if it
were a new file" diffs, which are not helpful. Is there a way to make the
diff show the changes as compared against the origin of the copy? I tried
--notice-ancestry, does not help.
Diff output changes depending on whether you pass a path to the
file itself or to a parent of the file. Try: svn diff -c 2 barfoo
I found this in the diff_renamed_file() test in diff_tests.py,
see there for more examples.
https://svn.apache.org/repos/asf/subversion/trunk/subversion/tests/cmdline/diff_tests.py
Alexey Neyman
2018-02-26 08:43:42 UTC
Post by Stefan Sperling
Post by Alexey Neyman
Hi all,
I am trying to dig for some changes in a file that was moved a few times and
'svn diff' shows full "remove old location and add new location as if it
were a new file" diffs, which are not helpful. Is there a way to make the
diff show the changes as compared against the origin of the copy? I tried
--notice-ancestry, does not help.
Diff output changes depending on whether you pass a path to the
file itself or to a parent of the file. Try: svn diff -c 2 barfoo
I found this in the diff_renamed_file() test in diff_tests.py,
see there for more examples.
https://svn.apache.org/repos/asf/subversion/trunk/subversion/tests/cmdline/diff_tests.py
You don't expect the end-user to read the test cases in the product to
get these subtleties, do you? :)

And, I find it quite counter-intuitive. I would expect --notice-ancestry
at least to take ancestral relationship between these files into
account; the currently shown diff is the same as if 'barfoo' were not
copied but was created from scratch.
Well, either of these approaches is not very convenient when there is a
dozen moves & modifications in a single revision.

Besides, the former (just passing the path) does not seem to work in all
cases. In the real repository, I have two revisions that did the same
thing: moved a directory and modified some files in the moved directory.
The trick with passing the path to the file works for one of them, but
not for the other - and I am at a loss why SVN treats these two
differently. Here's where diff does not display the proper diff even
when supplied with the path to the file:

# The relevant fragment of a revision
$ svn log -c 36 -v file://`pwd`/XXXXXX-svn
   A /trunk/XXXXXX/src/bin/more (from /vendor/YYYY:29)
   M /trunk/XXXXXX/src/bin/more/more.c
# Passing the path to the directory that was copied: does not work
$ svn di -c 36 file://`pwd`/XXXXXX-svn/trunk/XXXXXX/src/bin/more | grep -A 4 'Index: more.c'
Index: more.c
===================================================================
--- more.c      (nonexistent)
+++ more.c      (revision 36)
@@ -0,0 +1,1894 @@
# Passing the path to the specific file: does not work
$ svn di -c 36 file://`pwd`/XXXXXX-svn/trunk/XXXXXX/src/bin/more/more.c | grep -A 4 'Index: more.c'
Index: more.c
===================================================================
--- more.c      (nonexistent)
+++ more.c      (revision 36)
@@ -0,0 +1,1894 @@
# Manual, file-by-file: works, but doesn't scale to revisions with lots of modifications
$ svn di file://`pwd`/los178-svn{/vendor/YYYY/***@29,/trunk/XXXXX/src/bin/more/***@36} | grep -A 4 'Index: more.c'
Index: more.c
===================================================================
--- more.c      (.../vendor/BSD/more/4.3Tahoe/more.c)   (revision 29)
+++ more.c      (.../trunk/los178/src/bin/more/more.c)  (revision 36)
@@ -1,3 +1,11 @@
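
One way the manual approach above might be automated is to pull the copy sources out of the 'svn log -v' output and feed them to the two-URL form of 'svn diff'. The following is only a sketch: the helper name copyfrom_pairs is made up for illustration, and it assumes the "A <path> (from <source>:<rev>)" line format shown in the log fragment above.

```shell
#!/bin/sh
# copyfrom_pairs REV
# Reads 'svn log -v' output on stdin and prints one line per copied path:
#   <copy source>@<source rev> <new path>@REV
# Hypothetical helper; lines without a "(from ...)" part are ignored.
copyfrom_pairs() {
    rev=$1
    sed -n 's/^ *[AR] \(.*\) (from \(.*\):\([0-9][0-9]*\))$/\2@\3 \1@'"$rev"'/p'
}

# Sample input taken from the log fragment above.
printf '%s\n' \
    '   A /trunk/XXXXXX/src/bin/more (from /vendor/YYYY:29)' \
    '   M /trunk/XXXXXX/src/bin/more/more.c' |
    copyfrom_pairs 36
# prints: /vendor/YYYY@29 /trunk/XXXXXX/src/bin/more@36

# Possible use against a real repository ($url is the repository root URL):
#   svn log -v -c 36 "$url" | copyfrom_pairs 36 |
#       while read -r old new; do svn diff "$url$old" "$url$new"; done
```

The 'while read' loop in the usage comment splits on whitespace, so paths containing spaces would need extra care.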


Regards,
Alexey.
Stefan Sperling
2018-02-26 09:49:07 UTC
Post by Stefan Sperling
Post by Alexey Neyman
Hi all,
I am trying to dig for some changes in a file that was moved a few times and
'svn diff' shows full "remove old location and add new location as if it
were a new file" diffs, which are not helpful. Is there a way to make the
diff show the changes as compared against the origin of the copy? I tried
--notice-ancestry, does not help.
Diff output changes depending on whether you pass a path to the
file itself or to a parent of the file. Try: svn diff -c 2 barfoo
I found this in the diff_renamed_file() test in diff_tests.py,
see there for more examples.
https://svn.apache.org/repos/asf/subversion/trunk/subversion/tests/cmdline/diff_tests.py
You don't expect the end-user to read the test cases in the product to get
these subtleties, do you? :)
No, I don't. But subtle details such as this are often not documented.
In documentation there is always a trade-off between what the system
is actually doing in detail and what the reader really needs to know.

The test cases are an accurate source of reference when it comes to
details of expected behaviour like this because they encode what's
actually intended.
And, I find it quite counter-intuitive. I would expect --notice-ancestry at
least to take ancestral relationship between these files into account;
(I don't have time to look at the code right now, so I'm speculating a bit.)
You're diffing *directories*, not files. There are separate client-side
handlers for directory and file diffs which might not always have the same
information available. E.g. it may not be feasible to trace back the
copy history of every child when diffing two directories.
Well, either of these approaches is not very convenient when there is a
dozen moves & modifications in a single revision.
Agreed. At least file diffs allow you to 'zoom in', but it would
be much better if there was a way to get the diff you want to see
with just one command.
Besides, the former (just passing the path) does not seem to work in all
cases. In the real repository, I have two revisions that did the same thing:
moved a directory and modified some files in the moved directory. The trick
with passing the path to the file works for one of them, but not for the
other - and I am at a loss why SVN treats these two differently. Here's
where diff does not display the proper diff even when supplied with the path
to the file:
I can't explain this one. It might be worth filing an issue about
this problem in case you can come up with a standalone recipe to
reproduce it.
Johan Corveleyn
2018-02-26 10:48:06 UTC
I remembered we had a similar discussion (also on the different
behaviour of 'svn diff' vs. 'svnlook diff') on dev@ some years ago.
It's a long thread with lots of info in it. I don't have time to
refocus / summarize this now, so I'm just dropping this link here from
where I think the thread starts to become interesting:

https://svn.haxx.se/dev/archive-2013-06/0621.shtml

It also refers to an older post where I highlighted the difference
between 'svnlook diff --diff-copy-from' and 'svn diff
--show-copies-as-adds' (the latter sounds like the reverse option, which
suggests that diffing against the copy source would be the default for
'svn diff' ... but apparently that isn't quite true):

https://svn.haxx.se/dev/archive-2012-11/0480.shtml

I agree that there are various inconsistencies in the current
behaviour of diff regarding "ancestry handling" and it is certainly
not ideal. Maybe it's time someone refocused on this to untangle it
all ...
--
Johan
Alexey Neyman
2018-02-27 00:52:41 UTC
Post by Stefan Sperling
And, I find it quite counter-intuitive. I would expect --notice-ancestry at
least to take ancestral relationship between these files into account;
(I don't have time to look at the code right now, so I'm speculating a bit.)
You're diffing *directories*, not files. There are separate client-side
handlers for directory and file diffs which might not always have the same
information available. E.g. it may not be feasible to trace back the
copy history of every child when diffing two directories.
I am not familiar enough with the code to say why 'svn diff' behaves the
way it does, but it does look like it contradicts the description in
'svn help diff':

  --notice-ancestry        : diff unrelated nodes as delete and add

Since 'svn diff' does not take the opposite option, '--ignore-ancestry',
I'd say one would assume that 'svn diff' should diff *related* nodes
textually, not *as delete and add*. Tracing each child may take some
additional time, right, but between "fast and wrong" and "slow and
correct" behaviors, I'd choose the latter :)
Post by Stefan Sperling
Well, either of these approaches is not very convenient when there is a
dozen moves & modifications in a single revision.
Agreed. At least file diffs allow you to 'zoom in', but it would
be much better if there was a way to get the diff you want to see
with just one command.
If backwards compatibility of 'svn diff' behavior, or the performance
impact of tracing every child, is a concern - is it possible to have
'svn diff' do such history tracing if enabled by some new option?

Although, I cannot come up with a better name than 'svn diff
--properly-diff-related-nodes'.
Post by Stefan Sperling
Besides, the former (just passing the path) does not seem to work in all
cases. In the real repository, I have two revisions that did the same thing:
moved a directory and modified some files in the moved directory. The trick
with passing the path to the file works for one of them, but not for the
other - and I am at a loss why SVN treats these two differently. Here's
where diff does not display the proper diff even when supplied with the path
to the file:
[... snip ...]
Post by Stefan Sperling
I can't explain this one. It might be worth filing an issue about
this problem in case you can come up with a standalone recipe to
reproduce it.
I found what triggers this behavior. This happens when the source of the
copy is not the revision immediately preceding the revision being diffed.

Here's the script for reproduction:

---8<---
#!/bin/bash

r=`pwd`/foo-svn
url=file://$r
wc=`pwd`/foo-wc
rm -rf $r $wc
svnadmin create $r
svn co $url $wc
cd $wc
echo "Initial content" > foo
svn add foo
svn ci -m "Initial import"

# Source revision to be used in copy later
srev=`svnlook youngest $r`

if [ "$INSERT_EXTRA_REVISION" = "yes" ]; then
    svn mkdir somedir
    svn ci -m "Extra revision"
fi

svn cp $url/foo@$srev bar
echo "Added line" >> bar
svn ci -m "Copy + modify"

cmrev=`svnlook youngest $r`
svn diff -c $cmrev $url/bar@$cmrev
---8<---

And here is the output from the script:

---8<---
$ ./test.sh
...
Index: foo
===================================================================
--- foo    (.../foo)    (revision 1)
+++ foo    (.../bar)    (revision 2)
@@ -1 +1,2 @@
 Initial content
+Added line
$ INSERT_EXTRA_REVISION=yes ./test.sh
...
Index: bar
===================================================================
--- bar    (nonexistent)
+++ bar    (revision 3)
@@ -0,0 +1,2 @@
+Initial content
+Added line
---8<---
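
For a revision with many copied files, it may help to list which files 'svn diff' rendered as plain additions, since those are the ones that would need a manual diff against their copy source. A minimal sketch, assuming the "Index:" and "(nonexistent)" labels seen in the output above (the labels may differ in other Subversion versions); the function name shown_as_plain_adds is made up for illustration:

```shell
#!/bin/sh
# shown_as_plain_adds
# Reads 'svn diff' output on stdin and prints the path of every file whose
# old side is labelled "(nonexistent)", i.e. shown as added from nothing.
# Hypothetical helper; relies on the header format seen in this thread.
shown_as_plain_adds() {
    awk '
        /^Index: /                { name = substr($0, 8) }  # text after "Index: "
        /^--- .*\(nonexistent\)$/ { print name }            # old side missing
    '
}

# Sample input: the second diff from the output above.
printf '%s\n' \
    'Index: bar' \
    '===================================================================' \
    '--- bar    (nonexistent)' \
    '+++ bar    (revision 3)' \
    '@@ -0,0 +1,2 @@' \
    '+Initial content' \
    '+Added line' |
    shown_as_plain_adds
# prints: bar
```

Deleted files are not reported, because for those it is the "+++" line that carries the "(nonexistent)" label.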

Why is the behavior different in these cases? Isn't that
counter-intuitive as well that the diff's output depends on the source
revision of the copy?

Regards,
Alexey.
Stefan Sperling
2018-02-27 08:13:48 UTC
Why is the behavior different in these cases? Isn't that counter-intuitive
as well that the diff's output depends on the source revision of the copy?
I think these differences in behaviour boil down to side-effects of
the implementation.

On the server-side, all of diff/update/merge are driven by the same code,
the "reporter" in libsvn_repos. So when trying to fix diff one needs to be
careful not to break the other two.
I've always had a lot of fun whenever I dove in there...

In other words, it looks like a simple bug on the surface, but when you dig
in to figure out what needs fixing it gets tricky rather quickly.
Maybe diff needs a separate driver implementation on the server.
Johan Corveleyn
2018-02-27 11:26:25 UTC
Post by Stefan Sperling
Why is the behavior different in these cases? Isn't that counter-intuitive
as well that the diff's output depends on the source revision of the copy?
I think these differences in behaviour boil down to side-effects of
the implementation.
As I posted before in this thread, this problem was already noted and
discussed before on dev@ (feel free to follow the links I posted :-)).
But I'm happy this issue is brought back to the foreground, because I
too consider this an issue and inconsistent behaviour from the user's
perspective (regardless of the underlying implementation problem).
Post by Stefan Sperling
Back to your issue. Since Subversion can't represent the copy as part
of the diff it tries to do the interoperable thing which is to
represent the addition of a new file (from a copy) as an addition.
1) If the copyfrom source is part of the operative revision range of
the diff command, show a modification against the copyfrom source.
Unless --show-copies-as-adds was passed, because then we always
show copied files as an addition.
2) If the copyfrom source is not part of the operative revision range,
history of the file isn't traced back to that revision, so it appears
as an addition.
It could be argued that 2) is a weird special case, and that it should
behave like 1) (i.e. trace back to the copyfrom source anyway) and
only show an addition with --show-copies-as-adds.
Johan pointed out that svnlook diff seems to traverse to the copyfrom
source even in case 2). If this is indeed the case, these commands are
now behaving in contradictory ways :( However, I think it's too late
to change either command now.
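
Rules 1) and 2) can be written down as a small decision function. This is only a model of the behaviour described in this thread for the case where the diff target is the copied file itself, not Subversion's actual code; the function name and its arguments are made up for illustration. It relies on '-c X' being shorthand for '-r X-1:X'.

```shell
#!/bin/sh
# expected_diff_kind COPYFROM_REV CHANGE_REV [SHOW_COPIES_AS_ADDS]
# Models how 'svn diff -c CHANGE_REV <copied file>' is described to behave:
# prints "modification" if the copy is diffed against its source,
# "addition" if it shows up as a brand-new file.
expected_diff_kind() {
    copyfrom_rev=$1
    change_rev=$2
    show_copies_as_adds=${3:-no}
    range_start=$((change_rev - 1))   # -c X means -r X-1:X

    if [ "$show_copies_as_adds" = yes ]; then
        echo addition                 # rule 1), the "unless" part
    elif [ "$copyfrom_rev" -ge "$range_start" ] &&
         [ "$copyfrom_rev" -le "$change_rev" ]; then
        echo modification             # rule 1): source inside the range
    else
        echo addition                 # rule 2): source outside the range
    fi
}

expected_diff_kind 1 2       # foo@1 copied in r2: prints "modification"
expected_diff_kind 1 3       # foo@1 copied in r3: prints "addition"
expected_diff_kind 1 2 yes   # with --show-copies-as-adds: prints "addition"
```

The first two calls correspond to the reproduction script earlier in the thread, run without and with INSERT_EXTRA_REVISION=yes.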
--
Johan
Alexey Neyman
2018-02-27 15:52:00 UTC
Thanks for bringing up this explanation. So the second inconsistency is
because '-c X' actually defines operative range X-1:X and the source of
the copy is X-2 in this case.

Indeed, a lot of subtleties and inconsistencies that appear to be bugs.

Is there ever going to be SVN 2.0 that can finally break these
bug-for-bug compatibility promises? Is there a list of things that are
going to be changed in 2.0?

Regards,
Alexey.
Stefan Sperling
2018-02-27 16:50:32 UTC
Post by Alexey Neyman
Thanks for bringing up this explanation.
Indeed!
I had totally forgotten about this conversation from years ago.
Post by Alexey Neyman
So the second inconsistency is
because '-c X' actually defines operative range X-1:X and the source of the
copy is X-2 in this case.
Indeed, a lot of subtleties and inconsistencies that appear to be bugs.
Is there ever going to be SVN 2.0 that can finally break these bug-for-bug
compatibility promises? Is there a list of things that are going to be
changed in 2.0?
I wouldn't object to changing 'svn diff' to match the behaviour
of 'svnlook diff' in this particular case. The inconsistency
does not help anyone, and our compatibility guarantees aren't
*that* solid. We've certainly changed some output of our tooling
when it helped our users, even where doing so hurt scripts.

I think my concerns were more about the effort involved, rather
than compatibility. The process of adding --show-copies-as-adds
was surprisingly difficult. I wouldn't want to go back to that
code myself. I would review another brave soul's patches, though.
The effort involved is easy to underestimate, unfortunately.
Johan Corveleyn
2018-02-27 22:07:50 UTC
Post by Stefan Sperling
Post by Alexey Neyman
Thanks for bringing up this explanation.
Indeed!
I had totally forgotten about this conversation from years ago.
Post by Alexey Neyman
So the second inconsistency is
because '-c X' actually defines operative range X-1:X and the source of the
copy is X-2 in this case.
Indeed, a lot of subtleties and inconsistencies that appear to be bugs.
Is there ever going to be SVN 2.0 that can finally break these bug-for-bug
compatibility promises? Is there a list of things that are going to be
changed in 2.0?
I wouldn't object to changing 'svn diff' to match the behaviour
of 'svnlook diff' in this particular case. The inconsistency
does not help anyone, and our compatibility guarantees aren't
*that* solid. We've certainly changed some output of our tooling
when it helped our users, even where doing so hurt scripts.
+1. Backwards compatibility shouldn't block us from fixing bugs.

This certainly feels like a bug to me (the fact that it's inconsistent
depending on the operative revision range, and inconsistent with
'svnlook diff' where the behaviour seems more sane, is a strong
indication).
Post by Stefan Sperling
I think my concerns were more about the effort involved, rather
than compatibility. The process of adding --show-copies-as-adds
was surprisingly difficult. I wouldn't want to go back to that
code myself. I would review another brave soul's patches, though.
The effort involved is easy to underestimate, unfortunately.
Putting Julian in cc because he was just talking about the diff code
on IRC today. You never know :-) ...
--
Johan