Discussion:
svndumpfilter and svnsync?
Chris
2018-10-30 13:36:09 UTC
Hi,

I just wanted to say that I finally managed to get the dump-filter-load cycle done and deploy the filtered repo. It got rid of about 90% of the repository size, so that's good for us. A big thanks to all who helped out with information in this mail thread! I would definitely have been stranded somewhere without you.

One thing that was a bit annoying was that svndumpfilter threw an error because the source of a file was missing: I had filtered out a certain path, and it turned out that something in it had been copied to another location. The error message prints only the missing source and not the destination, so I had to go into the repo and check the revision it crashed on to find the copy destination and add it to my filter list. It would have been nice if the error message listed both the source and the destination.

/Chris

--------------------------------------------
On Wed, 10/10/18, Johan Corveleyn <***@gmail.com> wrote:

Subject: Re: svndumpfilter and svnsync?
To: "Chris" <***@yahoo.se>
Cc: "Daniel Shahaf" <***@daniel.shahaf.name>, "Ryan Schmidt" <subversion-***@ryandesign.com>, "Subversion" <***@subversion.apache.org>
Date: Wednesday, October 10, 2018, 12:11 PM

On Wed, Oct 10, 2018 at 11:18 AM Chris <***@yahoo.se> wrote:
...
The syntax I used:
svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump
(I had to use --bypass-prop-validation due to some newline issues in old log messages, similar to this one https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't know why they have wrong newlines, but the repo works as it is now...)

Instead of ignoring wrong newlines, you could fix them using svndumptool (using its eolfix-revprop command):
http://svn.borg.ch/svndumptool/
https://github.com/jwiegley/svndumptool

Also, as of version 1.10, svnadmin finally has an option to normalize these on-the-fly during load:
http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props

It's a lot better to normalize these (either with the --normalize-props option for 'svnadmin load' or by using svndumptool) than to "bypass" them. Otherwise you'll run into this again later (if you would dump+load again sometime in the future).

I tried --normalize-props and I still got the same error, which is why I switched over to bypass. Maybe I've run into some bug with --normalize-props. Unfortunately, I don't think I'll be able to create a script for reproducing the error since it happens far into a monster dump load. So I'll stick with the bypass for now or try the tool that Ryan suggested.

In that case the culprit might be another property than svn:log (or it might be something like "non UTF-8 encoded" but not EOL-related in svn:log). Possibly a "versioned" property like svn:ignore or some other property in the svn: namespace. This is more difficult to fix, but still it might be best to get rid of it or you'll run into it again in the future.

See the very last bullet in:
http://subversion.apache.org/faq.html#dumpload

If that's indeed the problem, then you'll have to use that svndumptool that Ryan pointed you to. Quoting from that last bullet in the FAQ entry above:

"This is more difficult to repair, because 'svn:ignore' is not a revision property (unlike svn:log, which can be manipulated with svnadmin setrevprop), but a versioned property (so it's part of history). Again, you can ignore this with --bypass-prop-validation. But since this is a corruption "in history", this can only be repaired with a dump+load, so this might be a good time to try and fix this (or you'll run into this again in the future). To repair it you can use a tool like svndumptool. But it only works on dump files, not as part of a pipe. So a possible way to go about it is: dump that single (corrupt) revision to a file, repair it ('svndumptool.py eolfix-prop svn:ignore svn.dump svn.dump.repaired'), load that single dumpfile, and then continue with a new "piped" command (like step (6) above)."
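
The sequence described in the FAQ quote above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name print_repair_plan and the file names r<REV>.dump are made up here, the paths are assumed to contain no whitespace, all revisions before the corrupt one are assumed to be loaded into the target already, and any svndumpfilter step would still have to be added to the pipes. The function only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# print_repair_plan SRC_REPO DST_REPO BAD_REV PROPERTY
# (hypothetical helper) Prints, without executing, the four steps:
# dump the single corrupt revision, repair it with svndumptool,
# load the repaired file, then resume with a piped dump+load.
# Assumes revisions before BAD_REV are already in DST_REPO.
print_repair_plan() {
    src=$1 dst=$2 rev=$3 prop=$4
    cat <<EOF
svnadmin dump -q --incremental -r $rev $src > r$rev.dump
svndumptool.py eolfix-prop $prop r$rev.dump r$rev.dump.repaired
svnadmin load -q $dst < r$rev.dump.repaired
svnadmin dump -q --incremental -r $((rev + 1)):HEAD $src | svnadmin load -q $dst
EOF
}

# Example (prints the plan only):
#   print_repair_plan ./MYREPO ./NEWREPO 1234 svn:ignore
```

Printing the plan instead of running it is a deliberate choice for a one-off migration: the output can be checked by eye and then pasted or piped into a shell once it looks right.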

I should note here that svnsync is more powerful in this regard: it does have the ability to normalize all of these on the fly. It's a real pity that 'svnadmin load' doesn't (except for the svn:log EOL fixing). Doesn't *yet* that is, until a volunteer comes along that submits a patch for it ;-).

Anyway, I hope you succeed in cleaning this up eventually :-).
--

Johan
Daniel Shahaf
2018-10-30 18:29:57 UTC
Post by Chris
One thing that was a bit annoying was when the dumpfilter threw an error
because of a source of a file was missing when I filtered out a certain
path and it turned out it had been copied to another location. The error
message only prints out the missing source and not the destination, so I
had to go into the repo to check the revision it crashed on to find the
copy destination and add it to my filter list. Would have been nice if
the error message could list both the source and the destination.
Fixed in https://svn.apache.org/r1845261.
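
Where the installed svndumpfilter does not yet include that change, the copy destinations can be looked up ahead of time by scanning the dump file for copy records whose source lies under an excluded path. The following is a rough sketch, not part of Subversion: the function name list_copies_from is made up, and it assumes a plain (non-delta) dump written to a file, in which each node record carries Node-path, Node-copyfrom-rev and Node-copyfrom-path header lines. File contents that happen to contain such lines would produce false positives.

```shell
#!/bin/sh
# list_copies_from DUMPFILE PATH
# (hypothetical helper) Prints one line per node whose copy source is
# PATH or lies below it, in the form: rREV: DEST <- SOURCE@SOURCEREV
list_copies_from() {
    LC_ALL=C awk -v prefix="$2" '
        # Dump files store paths without leading or trailing slash.
        BEGIN { sub(/^\//, "", prefix); sub(/\/$/, "", prefix) }
        /^Revision-number: /    { rev = $2 }
        /^Node-path: /          { dest = substr($0, 12); src = ""; srcrev = "" }
        /^Node-copyfrom-rev: /  { srcrev = $2 }
        /^Node-copyfrom-path: / { src = substr($0, 21) }
        # A blank line ends the header block of the current node.
        /^$/ {
            if (src != "" && (src == prefix || index(src, prefix "/") == 1))
                printf "r%s: %s <- %s@%s\n", rev, dest, src, srcrev
            src = ""
        }
    ' "$1"
}

# Example:
#   svnadmin dump -q MYREPO > fulldump
#   list_copies_from fulldump trunk/old
```

Each reported destination is a candidate for the filter list, or a reason to keep the source path after all.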
