Discussion:
Copy and Reduce the size of SVN repos
Rajesh Kumar
2015-03-08 04:57:51 UTC
Permalink
I have one huge SVN repository which is around 1TB in size. I have two requirements, as follows, and I would like to know the best approach to save time and effort.
1. Duplicating the whole 1TB repository in a short span of time to create another SVN repository.
2. How can I reduce the repository size drastically without impacting the integrity and versions of the files? My repository is 1TB and I want to make it smaller without deleting any files. What are the ways of doing so?

-Rajesh

Nico Kadel-Garcia
2015-03-08 08:35:10 UTC
Permalink
Post by Rajesh Kumar
I have one huge SVN repository which is around 1TB in size. I have two
requirements, as follows, and I would like to know the best approach to
save time and effort.
According to the doctrine of "there shall be no obliterate command,
the record must be kept absolutely pristine at all costs, praise the
gospel of all history matters!", you don't. In theory, history is kept
pristine and cannot be discarded. Sometimes there are even good
historical or legal reasons to do so. Personally, I consider it like
"cleaning your plate". It's a good idea when you're 5 years old,
because your folks want you to eat your vegetables instead of candy
later, and they know better than you that you *will* get hungry again
quite soon. But for grownups, with the big pile of starchy empty
content from abandoned branches, and fatty binaries that will just
clog your backups and your workflow and give you a coronary when you
realize *how much* cruft is in the failover backup system, I find it
OK to say "no, we've had enough history" and send some of it to the
wastebasket.

In practice, when your repository has reached a full terabyte, it's
out of hand and has probably wound up cluttered with unnecessary
binary content, such as jar files, RPMs, or ISO images. If it's where
you keep a year of binary releases of a big project, OK, but
otherwise, I think not.
Post by Rajesh Kumar
1. Duplicating the whole 1TB repository in a short span of time to
create another SVN repository.
This is straightforward and usually the way to do it if you're in a rush.
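
For instance, a quick sketch, assuming you have filesystem access to
the repository (the paths are hypothetical):

    # Make a consistent copy of a live repository without taking it offline:
    svnadmin hotcopy /srv/svn/repos /srv/svn/repos-copy

    # On Subversion 1.8 and later, repeated runs copy only what changed:
    svnadmin hotcopy --incremental /srv/svn/repos /srv/svn/repos-copy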
Post by Rajesh Kumar
2. How can I reduce the repository size drastically without impacting the
integrity and versions of the files? My repository is 1TB and I want to
make it smaller without deleting any files. What are the ways of doing so?
You can't reduce it much without cutting out history. You *can* set a
final tag, do an export of *that*, and import it into a new
repository, or dump the tag and load it as the trunk of a much, much
smaller new repository, *lock the old repository permanently*, and
*make people check out new clean working copies from the new
repository*. I've done that very effectively in a number of
professional environments, when individual products could and should
have been forked off to separate repositories.
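
Roughly, with made-up URLs and paths:

    # Export the tip of the final tag (plain files, no history, no .svn metadata):
    svn export http://svn.example.com/repos/tags/final-1.0 /tmp/final-1.0

    # Create the new, much smaller repository and import the snapshot as its trunk:
    svnadmin create /srv/svn/new-repos
    svn import /tmp/final-1.0 file:///srv/svn/new-repos/trunk \
        -m "Import final-1.0 snapshot as the new trunk"

Locking the old repository down can be as crude as a pre-commit hook
that always exits non-zero, so nothing new ever lands there.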
Post by Rajesh Kumar
-Rajesh
Branko Čibej
2015-03-08 16:42:17 UTC
Permalink
Post by Nico Kadel-Garcia
Post by Rajesh Kumar
I have one huge SVN repository which is around 1TB in size. I have two
requirements, as follows, and I would like to know the best approach to
save time and effort.
According to the doctrine of "there shall be no obliterate command,
the record must be kept absolutely pristine at all costs, praise the
gospel of all history matters!",
Heh, I have to ask, where did you find that doctrine? There's no such
thing. It's all a lot more mundane: First, you have to get people to
agree what "obliterate" actually means; there are about five meanings
that I know of. And second, all five are insanely hard to implement with
our current repository design (just ask Julian, he spent about a year
trying to come up with a sane, moderately backwards-compatible solution).

-- Brane
Tony Sweeney
2015-03-08 20:31:15 UTC
Permalink
Post by Branko Čibej
Post by Nico Kadel-Garcia
Post by Rajesh Kumar
I have one huge SVN repository which is around 1TB in size. I have two
requirements, as follows, and I would like to know the best approach to
save time and effort.
According to the doctrine of "there shall be no obliterate command,
the record must be kept absolutely pristine at all costs, praise the
gospel of all history matters!",
Heh, I have to ask, where did you find that doctrine? There's no such
thing. It's all a lot more mundane: First, you have to get people to
agree what "obliterate" actually means; there are about five meanings
that I know of. And second, all five are insanely hard to implement with
our current repository design (just ask Julian, he spent about a year
trying to come up with a sane, moderately backwards-compatible solution).
-- Brane
***@fractal:~ # p4 help obliterate

obliterate -- Remove files and their history from the depot

p4 obliterate [-y -A -b -a -h] file[revRange] ...

Obliterate permanently removes files and their history from the server.
(See 'p4 delete' for the non-destructive way to delete a file.)
Obliterate retrieves the disk space used by the obliterated files
in the archive and clears the files from the metadata that is
maintained by the server. Files in client workspaces are not
physically affected, but they are no longer under Perforce control.

Obliterate is aware of lazy copies made when 'p4 integrate' creates
a branch, and does not remove copies that are still in use. Because
of this, obliterating files does not guarantee that the corresponding
files in the archive will be removed.

If the file argument has a revision, the specified revision is
obliterated. If the file argument has a revision range, the
revisions in that range are obliterated. See 'p4 help revisions'
for help.

By default, obliterate displays a preview of the results. To execute
the operation, you must specify the -y flag.

By default, obliterate will not process a revision which has been
archived. To include such revisions, you must specify the -A flag.

Obliterate has three flags that can improve performance:

The '-b' flag restricts files in the argument range to those that
are branched and are both the first revision and the head revision.
This flag is useful for removing old branches while keeping files
of interest (files that were modified).

The '-a' flag skips the archive search and removal phase. This
phase of obliterate can take a very long time for sites with big
archive maps (db.archmap). However, file content is not removed;
if the file was a branch, then it's most likely that the archival
search is not necessary. This option is safe to use with the '-b'
option.

The '-h' flag instructs obliterate not to search db.have for all
possible matching records to delete. Usually, db.have is one of the
largest tables in a repository and consequently this search takes
a long time. Do not use this flag when obliterating branches or
namespaces for reuse, because the old content on any client
will not match the newly-added repository files. Note that use of
the -h flag has the side-effect of cleaning the obliterated files
from client workspaces when they are synced.

If you are obliterating files in order to entirely remove a depot
from the server, and files in that depot have been integrated to
other depots, run 'p4 snap' first to break those linkages, so that
obliterate can remove the unreferenced archive files. If, instead,
you specify '-a' to skip the archive removal phase, then you will
need to specify '-f' when deleting the depot, since the presence
of the archive files will prevent the depot deletion.

'p4 obliterate' requires 'admin' access, which is granted by 'p4
protect'.

***@fractal:~ #

As I recall, this was feature request #13 after Perforce was released, and was implemented the best part of 15 years ago. As near as I can tell it's architecturally impossible to implement in Subversion as a consequence of some of the initial design choices. Subversion has served me well, but this has been a glaring misfeature since its inception:

http://svn.haxx.se/dev/archive-2003-01/0364.shtml

Tony.
Les Mikesell
2015-03-08 21:00:31 UTC
Permalink
I have to agree. I can't imagine anyone using subversion for any
length of time without having some things committed that shouldn't be
there. It probably would still be the main topic of conversation
here if everyone had not simply given up hope long ago.
--
Les Mikesell
***@gmail.com
Nico Kadel-Garcia
2015-03-09 01:27:18 UTC
Permalink
Post by Branko Čibej
Post by Nico Kadel-Garcia
Post by Rajesh Kumar
I have one Huge SVN repos which is around 1TB in terms of size. I have two
requirement as follows and i would like to know the best approach to be
followed to save time and effort.
According to the doctrine of "there shall be no obliterate command,
the record must be kept absolutely pristine at all costs, praise the
gospel of all history matters!",
Heh, I have to ask, where did you find that doctrine? There's no such
thing. It's all a lot more mundane: First, you have to get people to
I've had to deal with that doctrine personally and professionally
since first working with Subversion in 2006. It comes up again every
so often, for example in
http://subversion.tigris.org/issues/show_bug.cgi?id=516 and is
relevant to the original poster's request.

There can be both software and legal reasons to ensure that the
history is pristine and never forgets a single byte. But in most
shops, for any lengthy project, *someone* is going to submit
unnecessary bulky binaries, and *someone* is going to create spurious
branches, tags, or other subdirectories that should go the way of the
passenger pigeon.
Post by Branko Čibej
agree what "obliterate" actually means; there are about five meanings
that I know of. And second, all five are insanely hard to implement with
our current repository design (just ask Julian, he spent about a year
trying to come up with a sane, moderately backwards-compatible solution).
-- Brane
I appreciate that it's been awkward. The only workable method
now is the sort of "svn export; svn import to new repo and discard old
repo" that I described, or a potentially dangerous and often fragile
dump, filter, and reload approach that preserves the original URLs
for the repo, but it's really not the same repo.
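
For the record, that fragile dump/filter/reload cycle looks roughly
like this (the excluded paths are made up; files copied from an
excluded path into a kept one are exactly what makes the load blow up):

    # Dump everything, strip the unwanted paths, load into a fresh repository:
    svnadmin dump /srv/svn/repos > repos.dump
    svndumpfilter exclude /bulky-binaries /dead-branch < repos.dump > filtered.dump
    svnadmin create /srv/svn/repos-filtered
    svnadmin load /srv/svn/repos-filtered < filtered.dump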

It remains messy as heck. This is, in fact, one of the places where
git and other systems' more gracious exclusion and garbage-collection
tools do better. Even CVS had the ability to simply delete a
directory on the main fileserver to discard old debris; it's one of
the risks of Subversion's more database-based approach to
managing the entire repository history.
Les Mikesell
2015-03-10 22:37:18 UTC
Permalink
Post by Nico Kadel-Garcia
Post by Branko Čibej
Heh, I have to ask, where did you find that doctrine? There's no such
thing. It's all a lot more mundane: First, you have to get people to
I've had to deal with that doctrine personally and professionally
since first working with Subversion in 2006. It comes up again every
so often, for example in
http://subversion.tigris.org/issues/show_bug.cgi?id=516 and is
relevant to the original poster's request.
There can be both software and legal reasons to ensure that the
history is pristine and never forgets a single byte. But in most
shops, for any lengthy project, *someone* is going to submit
unnecessary bulky binaries, and *someone* is going to create spurious
branches, tags, or other subdirectories that should go the way of the
passenger pigeon.
Post by Branko Čibej
agree what "obliterate" actually means; there are about five meanings
that I know of. And second, all five are insanely hard to implement with
our current repository design (just ask Julian, he spent about a year
trying to come up with a sane, moderately backwards-compatible solution).
-- Brane
I appreciate that it's been awkward. The only workable method
now is the sort of "svn export; svn import to new repo and discard old
repo" that I described, or a potentially dangerous and often fragile
dump, filter, and reload approach that preserves the original URLs
for the repo, but it's really not the same repo.
It remains messy as heck. This is, in fact, one of the places where
git and other systems' more gracious exclusion and garbage-collection
tools do better. Even CVS had the ability to simply delete a
directory on the main fileserver to discard old debris; it's one of
the risks of Subversion's more database-based approach to
managing the entire repository history.
Maybe it is time to change the request from 'obliterate' to _any_
reasonable way to fix a repository that has accumulated cruft. And a
big warning to new users to put separate projects in separate
repositories from the start, because they are too hard to untangle
later. I've considered dumping ours and trying to split by project,
but I'm not even sure that is possible, because many were imported from
CVS and then subsequently moved to improve the layout. So I can't really
filter by path.
--
Les Mikesell
***@gmail.com
Stümpfig, Thomas
2015-03-11 07:00:04 UTC
Permalink
Hi all,

Actually, splitting projects is not a solution for eliminating old
data. Think of a project with only one file. Legally, one might be
forced to keep the file for at least 5 or 10 years; but after this
period, the very same old revisions of the file must be destroyed
because of other legal or contractual obligations. There is enough
reason for final deletion of old data. I very much appreciate the work
of open source programmers, and as a matter of fact we deal with svn's
limitations while using it with much success for our purposes. That
said, one of the most wanted features is obliteration.

Regards
Thomas

-----Original Message-----
From: Les Mikesell [mailto:***@gmail.com]
Sent: Dienstag, 10. März 2015 23:37
To: Nico Kadel-Garcia
Cc: Branko Čibej; Subversion
Subject: Re: Copy and Reduce the size of SVN repos
Post by Nico Kadel-Garcia
Post by Branko Čibej
Heh, I have to ask, where did you find that doctrine? There's no such
thing. It's all a lot more mundane: First, you have to get people to
I've had to deal with that doctrine personally and professionally
since first working with Subversion in 2006. It comes up again every
so often, for example in
http://subversion.tigris.org/issues/show_bug.cgi?id=516 and is
relevant to the original poster's request.
There can be both software and legal reasons to ensure that the
history is pristine and never forgets a single byte. But in most
shops, for any lengthy project, *someone* is going to submit
unnecessary bulky binaries, and *someone* is going to create spurious
branches, tags, or other subdirectories that should go the way of the
passenger pigeon.
Post by Branko Čibej
agree what "obliterate" actually means; there are about five meanings
that I know of. And second, all five are insanely hard to implement
with our current repository design (just ask Julian, he spent about a
year trying to come up with a sane, moderately backwards-compatible solution).
-- Brane
I appreciate that it's been awkward. The only workable method
now is the sort of "svn export; svn import to new repo and discard old
repo" that I described, or a potentially dangerous and often fragile
dump, filter, and reload approach that preserves the original URLs
for the repo, but it's really not the same repo.
It remains messy as heck. This is, in fact, one of the places where
git and other systems' more gracious exclusion and garbage-collection
tools do better. Even CVS had the ability to simply delete a
directory on the main fileserver to discard old debris; it's one of
the risks of Subversion's more database-based approach to
managing the entire repository history.
Maybe it is time to change the request from 'obliterate' to _any_
reasonable way to fix a repository that has accumulated cruft. And a
big warning to new users to put separate projects in separate
repositories from the start, because they are too hard to untangle
later. I've considered dumping ours and trying to split by project,
but I'm not even sure that is possible, because many were imported from
CVS and then subsequently moved to improve the layout. So I can't really
filter by path.
--
Les Mikesell
***@gmail.com
Les Mikesell
2015-03-11 12:46:34 UTC
Permalink
On Wed, Mar 11, 2015 at 2:00 AM, Stümpfig, Thomas
Post by Stümpfig, Thomas
Actually, splitting projects is not a solution for eliminating old data.
Correct, but if we give up on getting a working obliterate, we are
left with dump/filter/load as the only way to administer content. And
as a practical matter, how many dump/filter/load cycles do you want to
do on repositories after they go over a few hundred gigs, with all of
your development teams waiting for you to get the filters right to
match all the distributed cruft? Also, in many cases whole projects
become obsolete over the years, so getting rid of or archiving that
part would be easy if you had used the 'directory of repositories'
approach instead of 'repository of projects', and everything would have
worked about the same.
--
Les Mikesell
***@gmail.com
Andreas Stieger
2015-03-08 16:39:47 UTC
Permalink
Hello,
Post by Rajesh Kumar
2. How can I reduce the repository size drastically without impacting the
integrity and versions of the files?
Several points:
A. Are you talking about the on-server repository size or the size of a
working copy? The reason one needs to ask is that many users regularly
confuse the two, and many will check out the root of the repository
into a working copy, which unnecessarily increases the on-disk size of
the working copy by duplicating /branches and /tags.

B. For the server's on-disk size, ensure representation sharing is enabled
throughout the lifetime of the repository. When using deep tree
structures and large properties, also enable the directory and property
storage reduction available in 1.8. As these only take effect for newly
added data, you need to perform what is referred to as a dump-load cycle
and switch to the new but content-identical repository. Dump/load are
documented.
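
As a sketch, the relevant knobs live in the repository's db/fsfs.conf;
the option names below are as of 1.8, so double-check them against your
own fsfs.conf, and set them on the new repository before loading:

    [rep-sharing]
    # Store identical file representations only once:
    enable-rep-sharing = true

    [deltification]
    # 1.8+: deltify directory and property storage as well:
    enable-dir-deltification = true
    enable-props-deltification = true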
Post by Rajesh Kumar
My repository is 1TB and I want to make it smaller without
deleting any files.
Whoa, this is kind of what an SCM is designed not to allow. And deleting
any files inside the repository tree does not reduce its size, as you of
course retain all history, including deleted items.
Post by Rajesh Kumar
1. Duplicating the whole 1TB repository in a short span of time to
create another SVN repository.
You can perform a seamless migration to a second, otherwise identical
repository with reduced size. First prepare a replacement offline while
keeping it up to date with the original by using an svnsync configuration
as documented in the Subversion book. You will need some migration space
for that, which can be on the same or another server. The repository URL
may or may not change in the course of that; if it does, take care to
seamlessly direct users to the new data.
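
A minimal svnsync sketch, with hypothetical URLs (the Subversion book
covers the details):

    # Create the mirror and allow svnsync to set revision properties on it:
    svnadmin create /srv/svn/mirror
    printf '#!/bin/sh\nexit 0\n' > /srv/svn/mirror/hooks/pre-revprop-change
    chmod +x /srv/svn/mirror/hooks/pre-revprop-change

    # Point the mirror at the source, then replay all revisions into it:
    svnsync initialize file:///srv/svn/mirror http://svn.example.com/repos
    svnsync synchronize file:///srv/svn/mirror

Re-run 'svnsync synchronize' periodically until you are ready to switch over.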

Andreas
Branko Čibej
2015-03-08 16:45:05 UTC
Permalink
Post by Andreas Stieger
Hello,
Post by Rajesh Kumar
2. How can I reduce the repository size drastically without impacting the
integrity and versions of the files?
A. Are you talking about the on-server repository size or the size of a
working copy? The reason one needs to ask is that many users regularly
confuse the two, and many will check out the root of the repository
into a working copy, which unnecessarily increases the on-disk size of
the working copy by duplicating /branches and /tags.
B. For the server's on-disk size, ensure representation sharing is enabled
throughout the lifetime of the repository. When using deep tree
structures and large properties, also enable the directory and property
storage reduction available in 1.8. As these only take effect for newly
added data, you need to perform what is referred to as a dump-load cycle
and switch to the new but content-identical repository. Dump/load are
documented.
Post by Rajesh Kumar
My repository is 1TB and I want to make it smaller without
deleting any files.
Whoa, this is kind of what an SCM is designed not to allow. And deleting
any files inside the repository tree does not reduce its size, as you of
course retain all history, including deleted items.
Post by Rajesh Kumar
1. Duplicating the whole 1TB repository in a short span of time to
create another SVN repository.
You can perform a seamless migration to a second, otherwise identical
repository with reduced size. First prepare a replacement offline while
keeping it up to date with the original by using an svnsync configuration
as documented in the Subversion book. You will need some migration space
for that, which can be on the same or another server. The repository URL
may or may not change in the course of that; if it does, take care to
seamlessly direct users to the new data.
And it bears repeating: If you replace a repository, please make sure to
restart Apache and/or svnserve to clear stale caches.
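
For example, on a typical systemd host (service names vary with your
distribution and setup):

    systemctl restart httpd       # or 'apache2' on Debian/Ubuntu
    systemctl restart svnserve    # if svnserve runs as a service under that name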

-- Brane
Ryan Schmidt
2015-03-09 04:33:14 UTC
Permalink
Post by Rajesh Kumar
I have one huge SVN repository which is around 1TB in size. I have two requirements, as follows, and I would like to know the best approach to save time and effort.
1. Duplicating the whole 1TB repository in a short span of time to create another SVN repository.
2. How can I reduce the repository size drastically without impacting the integrity and versions of the files? My repository is 1TB and I want to make it smaller without deleting any files. What are the ways of doing so?
How long has your repository been in operation? With what version of Subversion did you create it originally?

I ask because newer versions of Subversion store revisions more efficiently than older versions. If your repository was created with, say, Subversion 1.4, and you dump it and load the dump into a new repository created by Subversion 1.8, it will probably be smaller on disk, while containing exactly the same data. There may also be settings you can set in the new repository (before loading) that would make it even smaller.
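
A rough sketch of that cycle, with made-up paths:

    # Create the new repository with current tools, adjust its db/fsfs.conf
    # if desired, then replay the full history into it:
    svnadmin create /srv/svn/repos-new
    svnadmin dump --quiet /srv/svn/repos-old | svnadmin load --quiet /srv/svn/repos-new

    # Compare the on-disk sizes afterwards:
    du -sh /srv/svn/repos-old /srv/svn/repos-new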
Rajesh Kumar
2015-03-10 07:01:09 UTC
Permalink
I am still awaiting a response.

From: Rajesh Kumar
Sent: Sunday, March 08, 2015 10:28 AM
To: '***@subversion.apache.org'
Subject: Copy and Reduce the size of SVN repos

I have one huge SVN repository which is around 1TB in size. I have two requirements, as follows, and I would like to know the best approach to save time and effort.
1. Duplicating the whole 1TB repository in a short span of time to create another SVN repository.
2. How can I reduce the repository size drastically without impacting the integrity and versions of the files? My repository is 1TB and I want to make it smaller without deleting any files. What are the ways of doing so?

-Rajesh
Andreas Stieger
2015-03-10 08:14:31 UTC
Permalink
Post by Rajesh Kumar
I am still awaiting a response.
And you may wait a long time, at least until you read the responses you were already given:
http://mail-archives.apache.org/mod_mbox/subversion-users/201503.mbox/browser

It helps to read:
https://subversion.apache.org/mailing-lists.html

Andreas