Exe files corrupted in SVN after import from CVS
Bo Berglund
2018-01-20 16:25:47 UTC
I have found that there is a problem in our SVN repository, which was
converted from CVS using cvs2svn 2.5.0.

It concerns two exe files which are corrupted when I check out or
export them. The trunk files have been expanded by 905 bytes or shrunk
by 119 bytes in the two cases. Both are in the same project. They have
82 and 67 commits into them respectively.

Other projects I had checked before do not show this behaviour, so it
was a surprise when this was seen today.
I also tried to export an old tagged version from 2008, but it too has
the same corruption. Most easily seen by the missing icon in Windows
Explorer.

The files were originally managed in CVS as -kb (binary) files so the
cvs2svn conversion should not have treated these files any differently
from other exe files in other projects.

But there is one difference between these files and virtually all
other CVS stored files and this is the size of the RCS files in the
repository. The xxx,v files are 487 MB and 367 MB respectively.
The next biggest files in another project are 202 MB and 160 MB.

Maybe the cvs2svn script choked on the very big file sizes for the
problem file?
I can do a svn check out or export on the files in the other project
with the smaller sized RCS files and that works OK.
So if the problem is really the size of the input CVS files then I
wonder if this is recoverable at all?

Other parts of the SVN repository have already been used and there are
new commits, so I cannot scrap it all and start over...

But luckily the project it belongs to has not yet been touched by a
SVN commit so given that I can remove it completely it should be
possible to convert it once more (only this single project) and then
load it from the dump file into the repository again. Or is this not
possible?

Is there any way to re-migrate just the affected project from the CVS
files into Svn with some improved configuration of the cvs2svn
process? Each project (top level directory in CVS) has been treated as
a separate svn project during import so the trunk, tags and branches
directories are located separately inside each such project.

So if I re-convert only this single project I will get an SVN dump
file that only contains this and it should be possible to load it,
provided that the already loaded project by the same name can somehow
be completely removed from the SVN server. If I just delete it via svn
I guess it will still live there as older revisions and blocking a
renewed load, right?

Being new to Subversion I don't know where to continue my search for
how to resolve the problem.
--
Bo Berglund
Developer in Sweden
Nico Kadel-Garcia
2018-01-20 18:19:26 UTC
Post by Bo Berglund
Maybe the cvs2svn script choked on the very big file sizes for the
problem file?
Can you re-run cvs2svn to build a different Subversion repo and do a
comparison? And were those files frequently updated, so that an
individual update may have gotten messed up?
Post by Bo Berglund
But luckily the project it belongs to has not yet been touched by a
SVN commit so given that I can remove it completely it should be
possible to convert it once more (only this single project) and then
load it from the dump file into the repository again. Or is this not
possible?
And... this is the part where you start thinking about putting
individual projects in individual repos.
Post by Bo Berglund
Is there any way to re-migrate just the affected project from the CVS
files into Svn with some improved configuration of the cvs2svn
process? Each project (top level directory in CVS) has been treated as
a separate svn project during import so the trunk, tags and branches
directories are located separately inside each such project.
Part of the problem is that it's very painful, and normally considered
heresy, to delete *anything* from a Subversion repository. So you're
stuck with the corrupted files in the history, and as you said,
they're quite large. That looks to me like time to put that project in
a separate repository, and if necessary don't bother with cvs2svn.
Take a working snapshot of the old CVS working copy, import that into
a new project, and put a README.md in place to say "hey, history has
been discarded as part of the import for this project!"
Post by Bo Berglund
So if I re-convert only this single project I will get an SVN dump
file that only contains this and it should be possible to load it,
provided that the already loaded project by the same name can somehow
be completely removed from the SVN server. If I just delete it via svn
I guess it will still live there as older revisions and blocking a
renewed load, right?
Not if the old material is still there. In theory, you could use
svnadmin dump, svndumpfilter, and svnadmin load to create a *new*
Subversion multi-project repository without that content, and import a
new load into *that*. But if you have commits that affected multiple
projects, including this one, I think you're looking at trouble
doing this.
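
The dump, filter and load route can be sketched in Python as below. This is a minimal sketch, not a tested migration script: the helper names filter_commands and run_pipeline and all paths are made up for illustration, and the new repository is assumed to have been created beforehand with "svnadmin create". The three commands themselves (svnadmin dump, svndumpfilter exclude, svnadmin load) are the standard Subversion tools named above.

```python
import subprocess


def filter_commands(old_repo, new_repo, project):
    """Return the argv lists for: dump old repo | drop one project | load new repo."""
    return [
        ["svnadmin", "dump", "--quiet", old_repo],
        # "exclude" drops every path that starts with the given prefix.
        ["svndumpfilter", "exclude", project],
        ["svnadmin", "load", "--quiet", new_repo],
    ]


def run_pipeline(commands):
    """Chain the commands like a shell pipeline and return their exit codes."""
    procs = []
    upstream = None
    for index, cmd in enumerate(commands):
        is_last = index == len(commands) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=upstream,
            stdout=None if is_last else subprocess.PIPE,
        )
        if upstream is not None:
            # Close our copy so only the downstream process holds the pipe.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    return [proc.wait() for proc in procs]


# Hypothetical usage (paths and project name are placeholders):
# codes = run_pipeline(filter_commands("/srv/svn/PC", "/srv/svn/PC-new", "BadProject"))
```

Trying this against a copy of the repository first seems prudent, since svndumpfilter can fail on revisions that copy between an excluded and an included path.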

And yes, "svn delete" still leaves the old material embedded in the
history. Expect adventures, and a much bulkier master repository with
changes for hundreds of megabyte files, never cleared out.
Post by Bo Berglund
Being new to Subversion I don't know where to continue my search for
how to resolve the problem.
I suspect you can simplify the whole situation a great deal by moving
at least this project to its own Subversion repository. You should be
able to at least test and debug ideas by doing a cvs2svn of just
that project and trying out ideas. And you can import that into a
local working Subversion repository, rather than touching the primary
repository at all.

There is also a philosophically nasty trick that I've used. Use
cvs2git to get a working git repo, one you can work with locally
without touching the Subversion repository. Re-arrange *that*,
especially pruning content, obsolete tags and obsolete branches you
don't want to import to the new repository more dynamically. This can
allow much more sophisticated pruning than "svndumpfilter". Then
apply "git gc --aggressive". And yes, this discards history, which is
the primary point of svndumpfilter, but can be much more tightly tuned
this way. Push that to a testable Subversion repository, with "git
svn". As much as I appreciate Subversion's "one repository to rule
them all" approach for a master, centralized repository, this has been
my friend in transferring or re-arranging a number of Subversion
repositories with awkward, bulky binaries in their history.
Bo Berglund
2018-01-21 20:09:58 UTC
Post by Nico Kadel-Garcia
I suspect you can simplify the whole situation a great deal by moving
at least this project to its own Subversion repository. You should be
able to at least test and debug ideas by doing a cvs2svn of just
that project and trying out ideas. And you can import that into a
local working Subversion repository, rather than touching the primary
repository at all.
I am leaning toward a completely different approach now concerning the PC (Windows) software development repository:

1) I leave the CVS server running but I configure it as read-only.
This makes it possible for people to export older stuff if needed.

2) I export trunk of the projects that need to be worked on from CVS

3) I start over with an empty SVN repository and import the exported
projects into SVN, so they are the first revision there.

This way svn will be a smaller size and the content should be OK since the projects are not *converted* from CVS but imported as regular normal projects.

We also use CVS (and now SVN) as a store for drawings and printed circuit board projects (PCB).

We are using an engineering repository for these but I found that the converted structure is less than optimal when using SVN because of the tags and branches directories. In CVS we have a subdirectory for drawing sources (DWG) and another for PDF versions (PDF) and also a subdirectory for the boards (PCB). Like this:
REPO
|-DWG
|-PDF
|-PCB
|-and a few more that do not have sub-containers

These subdirectories were treated as "projects" during conversion, which has led to a problem since there is now only one tags, trunk and branches dir for ALL of the drawings and another set for all PCBs etc.

So when I looked at the converted repository there is a total mess in the tags directory because we have used tag names like Rev_A, Rev_B etc for almost all PCBs and now these are merged and contain an assortment of different unrelated projects...
In CVS the tag was a property of any given *file* but in SVN it is on a whole directory, or really not even this...
It is just a copy of the directory with a different name, not a property of the directory...

Seems like I have to scrap the conversion also for Engineering and do something else, but what? Separate repositories for drawings, PDF releases and PCBs maybe?
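
One possible alternative to separate repositories is to give every board (or drawing) directory its own trunk, branches and tags, so that Rev_A of one board no longer shares a tags directory with Rev_A of another. The Python sketch below only computes such a layout. The function name per_board_projects is hypothetical, it assumes each board lives in its own subdirectory of the container, and it has not been tried against cvs2svn; files stored directly in the container directory would need separate handling.

```python
import os


def per_board_projects(container_dir, container_name):
    """Yield (cvs_path, trunk, branches, tags) for each subdirectory.

    container_dir  -- path of e.g. the PCB directory in the CVS repository
    container_name -- name to use as the top-level SVN directory, e.g. 'PCB'
    """
    for board in sorted(os.listdir(container_dir)):
        cvs_path = os.path.join(container_dir, board)
        if not os.path.isdir(cvs_path):
            continue  # plain files in the container are skipped here
        base = container_name + '/' + board
        yield cvs_path, base + '/trunk', base + '/branches', base + '/tags'


# In a cvs2svn options file the tuples could feed the same call that is
# used per top-level directory elsewhere in this thread (untested here):
#
# for cvs_path, trunk, branches, tags in per_board_projects(pcb_dir, 'PCB'):
#     run_options.add_project(
#         cvs_path,
#         trunk_path=trunk,
#         branches_path=branches,
#         tags_path=tags,
#         symbol_strategy_rules=global_symbol_strategy_rules,
#         )
```

With this layout a tag such as Rev_A would end up under PCB/<board>/tags/Rev_A and hold only that board's files.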

Regards,
Bo B

(PS: Had to send this as regular email since the posting I made through Gmane seems to have disappeared. DS)
Ryan Schmidt
2018-01-22 04:23:21 UTC
Post by Bo Berglund
I have found that there is a problem in our SVN repository, which was
converted from CVS using cvs2svn 2.5.0.
It concerns two exe files which are corrupted when I check out or
export them. The trunk files have been expanded by 905 bytes or shrunk
by 119 bytes in the two cases. Both are in the same project. They have
82 and 67 commits into them respectively.
Did you or your conversion process set the svn:eol-style property on these files? If so, that's why they got corrupted; you mustn't set that property on binary files.
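
To illustrate why an end-of-line property could both grow and shrink an exe, here is a minimal Python sketch. The helper names to_lf and to_crlf and the sample bytes are invented for illustration, and the functions are only a simplified model of end-of-line translation, not Subversion's actual code.

```python
def to_lf(data: bytes) -> bytes:
    """Model of normalising to LF: every CRLF pair loses its CR byte."""
    return data.replace(b"\r\n", b"\n")


def to_crlf(data: bytes) -> bytes:
    """Model of normalising to CRLF: every bare LF gains a CR byte."""
    return to_lf(data).replace(b"\n", b"\r\n")


# Made-up binary content that happens to contain the byte values
# 0x0D 0x0A (CRLF) and lone 0x0A (LF), as compiled code easily can.
sample = b"MZ\x90\x00\r\n\x03\n\x00\n\xff\r\n"

print("original:", len(sample), "bytes")
print("as LF:   ", len(to_lf(sample)), "bytes")
print("as CRLF: ", len(to_crlf(sample)), "bytes")
```

In this model a file shrinks by one byte per CRLF pair or grows by one byte per bare LF, which would be consistent with one exe gaining 905 bytes and the other losing 119, although the thread does not confirm that this is what happened.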
Bo Berglund
2018-01-22 07:42:22 UTC
On Sun, 21 Jan 2018 22:23:21 -0600, Ryan Schmidt
Post by Ryan Schmidt
Post by Bo Berglund
I have found that there is a problem in our SVN repository, which was
converted from CVS using cvs2svn 2.5.0.
It concerns two exe files which are corrupted when I check out or
export them. The trunk files have been expanded by 905 bytes or shrunk
by 119 bytes in the two cases. Both are in the same project. They have
82 and 67 commits into them respectively.
Did you or your conversion process set the svn:eol-style property on
these files? If so, that's why they got corrupted; you mustn't set
that property on binary files.
Well, most of the exe files got converted without this problem, so it
could not really be a global mistake in the way file properties are
set.

This is how I configured individual conversions, the remaining options
file content stayed the same for all repos:

outdumpfile='pc-dump'
inputreponame='PC'

These are my options settings for the cvs2svn conversion regarding
properties (mostly defaults from the example options file in cvs2svn):

ctx.file_property_setters.extend([
    CVSBinaryFileEOLStyleSetter(),
    CVSBinaryFileDefaultMimeTypeSetter(),
    DefaultEOLStyleSetter(None),
    SVNBinaryFileKeywordsPropertySetter(),
    KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE),
    ExecutablePropertySetter(),
    DescriptionPropertySetter(propname='cvs:description'),
    SVNKeywordHandlingPropertySetter(),
    SVNEOLFixPropertySetter(),
    ])
ctx.revision_property_setters.extend([
    ])

and this is how the CVS top level directories were treated as
"projects" during conversion:

# 1)List all projects automatically
import os
cvs_repo_main_dir = '/home/bosse/CVSREPOS/' + inputreponame
projects = os.listdir(cvs_repo_main_dir)

# 2) Probably you don't want to convert CVSROOT:
projects.remove('CVSROOT')

# 3) Now loop projects and add to conversion list
for project in projects:
    run_options.add_project(
        cvs_repo_main_dir + '/' + project,
        trunk_path=(project + '/trunk'),
        branches_path=(project + '/branches'),
        tags_path=(project + '/tags'),
        symbol_strategy_rules=global_symbol_strategy_rules,
        )

So, the options file stayed the same for all 8 CVS repositories except
regarding the definition of the repo to convert and the output dump
file name. Also I have not (yet) found any more corrupted exe files
than these two that were compiled using Borland C++Builder and had CVS
revision files with a huge size of 487 MB and 367 MB respectively.

Therefore I suspect that either the RCS file size itself caused a
problem for cvs2svn, or there is some kind of internal exe file byte
pattern in such files that triggers an action in cvs2svn which causes
the file corruption...

The Ubuntu 16.04 Server (virtual machine) on which I ran the
conversion has a RAM allotment of 2 GB, maybe this is too little when
dealing with these huge files?

Checking out these files from CVS, even very old revisions from say
2006, still works successfully.
--
Bo Berglund
Developer in Sweden
Scott Bloom
2018-01-22 09:02:44 UTC
When I have used cvs2svn, I had a couple of these issues as well..

It came down to improper settings on the cvs side, but since the binary files were never modified, there was no corruption due to cvs thinking it was a text file.

What I wound up doing, was simply finding all the expected binary files... and re-checking them in, after the conversion with proper SVN settings.
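
A minimal Python sketch of that clean-up could look like the following. The extension list and the helper names binary_files and fix_commands are assumptions made for illustration. The svn subcommands used (propdel and propset) are standard, but the sketch only builds the command lines and does not run them.

```python
import os

# Assumed list of extensions that should always be treated as binary.
BINARY_EXTENSIONS = {".exe", ".dll", ".pdf", ".dwg"}


def binary_files(root):
    """Yield paths under root whose extension marks them as binary."""
    for dirpath, dirnames, filenames in os.walk(root):
        if ".svn" in dirnames:
            dirnames.remove(".svn")  # do not descend into SVN metadata
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in BINARY_EXTENSIONS:
                yield os.path.join(dirpath, name)


def fix_commands(path):
    """Return svn command lines that mark one working-copy file as binary."""
    return [
        # Remove text-oriented properties first, then set the MIME type.
        ["svn", "propdel", "svn:eol-style", path],
        ["svn", "propdel", "svn:keywords", path],
        ["svn", "propset", "svn:mime-type", "application/octet-stream", path],
    ]


# Hypothetical usage inside a checked-out working copy:
# for path in binary_files("."):
#     for cmd in fix_commands(path):
#         print(" ".join(cmd))
```

Changing the properties alone may not be enough if the stored content is already damaged; in that case a known-good copy of each file would also have to be committed over it.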

Scott


-----Original Message-----
From: Bo Berglund [mailto:***@gmail.com]
Sent: Sunday, January 21, 2018 11:42 PM
To: ***@subversion.apache.org
Subject: Re: Exe files corrupted in SVN after import from CVS
Post by Ryan Schmidt
Post by Bo Berglund
I have found that there is a problem in our SVN repository, which was
converted from CVS using cvs2svn 2.5.0.
It concerns two exe files which are corrupted when I check out or
export them. The trunk files have been expanded by 905 bytes or
shrunk by 119 bytes in the two cases. Both are in the same project.
They have
82 and 67 commits into them respectively.
Did you or your conversion process set the svn:eol-style property on
these files? If so, that's why they got corrupted; you mustn't set that
property on binary files.
Well, most of the exe files got converted without this problem, so it could not really be a global mistake in the way file properties are set.

This is how I configured individual conversions, the remaining options file content stayed the same for all repos:

outdumpfile='pc-dump'
inputreponame='PC'

These are my options settings for the cvs2svn conversion regarding properties (mostly defaults from the example options file in cvs2svn):

ctx.file_property_setters.extend([
    CVSBinaryFileEOLStyleSetter(),
    CVSBinaryFileDefaultMimeTypeSetter(),
    DefaultEOLStyleSetter(None),
    SVNBinaryFileKeywordsPropertySetter(),
    KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE),
    ExecutablePropertySetter(),
    DescriptionPropertySetter(propname='cvs:description'),
    SVNKeywordHandlingPropertySetter(),
    SVNEOLFixPropertySetter(),
    ])
ctx.revision_property_setters.extend([
    ])

and this is how the CVS top level directories were treated as "projects" during conversion:

# 1)List all projects automatically
import os
cvs_repo_main_dir = '/home/bosse/CVSREPOS/' + inputreponame
projects = os.listdir(cvs_repo_main_dir)

# 2) Probably you don't want to convert CVSROOT:
projects.remove('CVSROOT')

# 3) Now loop projects and add to conversion list
for project in projects:
    run_options.add_project(
        cvs_repo_main_dir + '/' + project,
        trunk_path=(project + '/trunk'),
        branches_path=(project + '/branches'),
        tags_path=(project + '/tags'),
        symbol_strategy_rules=global_symbol_strategy_rules,
        )

So, the options file stayed the same for all 8 CVS repositories except regarding the definition of the repo to convert and the output dump file name. Also I have not (yet) found any more corrupted exe files than these two that were compiled using Borland C++Builder and had CVS revision files with a huge size of 487 MB and 367 MB respectively.

Therefore I suspect that either the RCS file size itself caused a problem for cvs2svn, or there is some kind of internal exe file byte pattern in such files that triggers an action in cvs2svn which causes the file corruption...

The Ubuntu 16.04 Server (virtual machine) on which I ran the conversion has a RAM allotment of 2 GB, maybe this is too little when dealing with these huge files?

Checking out these files from CVS, even very old revisions from say 2006, still works successfully.


--
Bo Berglund
Developer in Sweden
Bo Berglund
2018-01-22 09:38:55 UTC
Post by Scott Bloom
When I have used cvs2svn, I had a couple of these issues as well..
It came down to improper settings on the cvs side, but since the
binary files were never modified, there was no corruption due to
cvs thinking it was a text file.
What I wound up doing, was simply finding all the expected binary
files... and re-checking them in, after the conversion with proper
SVN settings.
Scott
OK thanks,
I have now retrieved the latest CVS file versions on trunk and copied
them into my svn working copy with the corrupted exe files so I could
commit them to svn. And before I committed them I also explicitly set
the file MIME properties to binary (using the SmartSvn properties
dialogue).
Now when I export trunk they are OK.
So at least as long as one stays on trunk these files will be OK.
--
Bo Berglund
Developer in Sweden
Nico Kadel-Garcia
2018-01-22 12:48:43 UTC
Post by Bo Berglund
Post by Scott Bloom
When I have used cvs2svn, I had a couple of these issues as well..
It came down to improper settings on the cvs side, but since the
binary files were never modified, there was no corruption due to
cvs thinking it was a text file.
What I wound up doing, was simply finding all the expected binary
files... and re-checking them in, after the conversion with proper
SVN settings.
Scott
OK thanks,
I have now retrieved the latest CVS file versions on trunk and copied
them into my svn working copy with the corrupted exe files so I could
commit them to svn. And before I committed them I also explicitly set
the file MIME properties to binary (using the SmartSvn properties
dialogue).
Now when I export trunk they are OK.
So at least as long as one stays on trunk these files will be OK.
That makes sense. I'm glad you were able to work it out.

Some folks, like me, consider EOL reprocessing on checkin and
checkout to be a very dangerous habit and one that should be avoided
in source control systems. It works great, until it doesn't, as you've
just found.
Branko Čibej
2018-01-22 13:03:51 UTC
Post by Nico Kadel-Garcia
Post by Bo Berglund
Post by Scott Bloom
When I have used cvs2svn, I had a couple of these issues as well..
It came down to improper settings on the cvs side, but since the
binary files were never modified, there was no corruption due to
cvs thinking it was a text file.
What I wound up doing, was simply finding all the expected binary
files... and re-checking them in, after the conversion with proper
SVN settings.
Scott
OK thanks,
I have now retrieved the latest CVS file versions on trunk and copied
them into my svn working copy with the corrupted exe files so I could
commit them to svn. And before I committed them I also explicitly set
the file MIME properties to binary (using the SmartSvn properties
dialogue).
Now when I export trunk they are OK.
So at least as long as one stays on trunk these files will be OK.
That makes sense. I'm glad you were able to work it out.
Some folks, like me, consider EOL reprocessing on checkin and
checkout to be a very dangerous habit and one that should be avoided
in source control systems. It works great, until it doesn't, as you've
just found.
... which is precisely why Subversion doesn't do this by default.

But the moral of this whole story is this: After any major surgery on a
version control system — and conversion from one system to another is
certainly major — one should thoroughly verify the result before
bringing it into production.
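
One way to do such a verification is to export the same tag or trunk from both systems and compare the two trees byte for byte. The Python sketch below is a minimal example under that assumption. The function names file_digest, tree_digests and compare_trees are made up for illustration, and the two directories are assumed to be plain exports made beforehand.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file path (relative to root) to its digest."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata if a working copy is passed in.
        dirnames[:] = [d for d in dirnames if d not in (".svn", "CVS")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            digests[os.path.relpath(path, root)] = file_digest(path)
    return digests


def compare_trees(cvs_export, svn_export):
    """Return (missing_in_svn, extra_in_svn, changed) as sorted lists."""
    cvs = tree_digests(cvs_export)
    svn = tree_digests(svn_export)
    missing = sorted(set(cvs) - set(svn))
    extra = sorted(set(svn) - set(cvs))
    changed = sorted(p for p in set(cvs) & set(svn) if cvs[p] != svn[p])
    return missing, extra, changed
```

Text files that use keyword expansion or end-of-line conversion may show up as changed even when the conversion is fine, so the list of changed files is a starting point for inspection rather than proof of corruption.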

-- Brane
Bo Berglund
2018-01-22 20:42:37 UTC
Post by Branko Čibej
But the moral of this whole story is this: After any major surgery on a
version control system — and conversion from one system to another is
certainly major — one should thoroughly verify the result before
bringing it into production.
Branko,
you are completely right and now I am considering going to the extreme
and putting *only* the latest HEAD revision on TRUNK for all projects
into SVN. Basically taking a snapshot at the time I closed down CVS
and importing that into svn.

So I wonder if there is a way to automate this?
Do you (or anyone else reading this) know if cvs2svn can be set to
only deal with the HEAD revision of files on TRUNK when creating the
dump files?
I still want the target projects to be first-level directories in the
resulting svn repository (I am dealing with 8-9 CVS repositories
here). So the stuff I have put at the end of my options file must
still work:

# 1)List all projects automatically
import os
cvs_repo_main_dir = '/home/bosse/CVSREPOS/' + inputreponame
projects = os.listdir(cvs_repo_main_dir)

# 2) Probably you don't want to convert CVSROOT:
projects.remove('CVSROOT')

# 3) Now loop projects and add to conversion list
for project in projects:
    run_options.add_project(
        cvs_repo_main_dir + '/' + project,
        trunk_path=(project + '/trunk'),
        branches_path=(project + '/branches'),
        tags_path=(project + '/tags'),
        symbol_strategy_rules=global_symbol_strategy_rules,
        )

This part is what I think makes cvs2svn scan the top level and create
a train of svn commands to stuff those projects into svn with tags and
all...

And I would like to get this done with some automation also, but for
HEAD only...

I found an option like this:
ctx.trunk_only = False

Setting it to True will make the conversion only include TRUNK
revisions AFAICT.

But I did not find anything like:
ctx.head_only = True

This option (if it existed) would make the conversion simpler by only
considering the HEAD revision of every RCS file.
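
A HEAD-only snapshot could also be automated outside cvs2svn, with "cvs export" followed by "svn import" for each top-level project. The Python sketch below is a minimal illustration of that route and says nothing about whether cvs2svn has such an option. The function names list_projects and snapshot_commands, the export directory and the repository URL are assumptions, and the commands are only built, not executed.

```python
import os


def list_projects(cvs_repo_dir):
    """Return the top-level project directories, skipping CVSROOT."""
    return sorted(
        name for name in os.listdir(cvs_repo_dir)
        if name != "CVSROOT" and os.path.isdir(os.path.join(cvs_repo_dir, name))
    )


def snapshot_commands(cvsroot, project, export_dir, svn_url):
    """Return (argv, cwd) pairs that export HEAD from CVS and import it to SVN."""
    export = (
        # Run inside export_dir so that "-d project" is a relative path.
        ["cvs", "-d", cvsroot, "export", "-r", "HEAD", "-d", project, project],
        export_dir,
    )
    message = "Import HEAD snapshot of %s from CVS" % project
    target_url = "%s/%s/trunk" % (svn_url.rstrip("/"), project)
    do_import = (
        ["svn", "import", "-m", message, os.path.join(export_dir, project), target_url],
        None,
    )
    return [export, do_import]


# Hypothetical usage with subprocess (paths and URL are placeholders):
# import subprocess
# for project in list_projects("/home/bosse/CVSREPOS/PC"):
#     for argv, cwd in snapshot_commands("/home/bosse/CVSREPOS/PC", project,
#                                        "/tmp/export", "file:///srv/svn/PC"):
#         subprocess.run(argv, cwd=cwd, check=True)
```

Each project then lands as a first-level directory with its own trunk, and the branches and tags directories can be created afterwards if they are wanted.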
--
Bo Berglund
Developer in Sweden