Discussion:
svn hotcopy incremental overwrites existing revisions in backup
lumi
2017-05-16 13:21:06 UTC
I use the "svnadmin hotcopy --incremental" command to create backups (Subversion
1.9.5). I discovered that this command copies again revision files in db/revs/
that are already in the backup when their size exceeds, e.g., about 120 KB. Backup
log (the first backup was made to an empty folder, the following ones to the same folder):
C:\Users\Администратор.WIN-DBM2OE9OJ54>svnadmin hotcopy
D:\Repositories\Sandbox D:\Test --incremental
* Copied revision 0.
* Copied revision 1.
* Copied revision 2.
* Copied revision 3.
* Copied revision 4.
* Copied revision 5.
* Copied revision 6.
* Copied revision 7.
* Copied revision 8.
* Copied revision 9.
* Copied revision 10.
* Copied revision 11.
* Copied revision 12.
* Copied revision 13.
* Copied revision 14.
* Copied revision 15.
* Copied revision 16.
* Copied revision 17.
* Copied revision 18.
* Copied revision 19.
* Copied revision 20.
* Copied revision 21.
* Copied revision 22.
* Copied revision 23.
* Copied revision 24.
* Copied revision 25.
* Copied revision 26.
* Copied revision 27.
* Copied revision 28.
* Copied revision 29.
* Copied revision 30.
* Copied revision 31.
* Copied revision 32.
* Copied revision 33.
* Copied revision 34.
* Copied revision 35.
* Copied revision 36.
* Copied revision 37.
* Copied revision 38.
* Copied revision 39.
* Copied revision 40.
* Copied revision 41.
* Copied revision 42.
* Copied revision 43.
* Copied revision 44.
* Copied revision 45.
* Copied revision 46.
* Copied revision 47.
* Copied revision 48.
* Copied revision 49.
* Copied revision 50.
* Copied revision 51.
* Copied revision 52.
* Copied revision 53.
* Copied revision 54.
* Copied revision 55.

C:\Users\Администратор.WIN-DBM2OE9OJ54>svnadmin hotcopy
D:\Repositories\Sandbox D:\Test --incremental
* Copied revision 14.
* Copied revision 21.
* Copied revision 22.

C:\Users\Администратор.WIN-DBM2OE9OJ54>svnadmin hotcopy
D:\Repositories\Sandbox D:\Test --incremental
* Copied revision 14.
* Copied revision 21.
* Copied revision 22.

And so on with every subsequent hotcopy --incremental run. A binary comparison
of the revision 14, 21 and 22 files in the original repository and in the backup
shows that they are identical. What is the reason for this strange behaviour?
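
For readers who want to repeat the check described above, here is a minimal Python sketch (not part of the original post). It compares a file in the repository with its copy in the backup, both byte for byte and by the size and modification time that the operating system reports. The function name compare_copies is an illustrative assumption.

```python
import filecmp
import os


def compare_copies(original: str, backup: str) -> dict:
    """Compare two files by content and by the metadata a quick check would use."""
    o = os.stat(original)
    b = os.stat(backup)
    return {
        # shallow=False forces a real comparison of the file contents
        "same_bytes": filecmp.cmp(original, backup, shallow=False),
        "same_size": o.st_size == b.st_size,
        "same_mtime": o.st_mtime_ns == b.st_mtime_ns,
    }
```

Calling it on a pair such as D:\Repositories\Sandbox\db\revs\0\14 and the matching file under D:\Test shows whether files with identical bytes still differ in reported size or timestamp.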



--
View this message in context: http://subversion.1072662.n5.nabble.com/svn-hotcopy-incremental-overwrites-existing-revisions-in-backup-tp198977.html
Sent from the Subversion Users mailing list archive at Nabble.com.
Daniel Shahaf
2017-05-16 13:44:08 UTC
Post by lumi
C:\Users\Администратор.WIN-DBM2OE9OJ54>svnadmin hotcopy
D:\Repositories\Sandbox D:\Test --incremental
* Copied revision 14.
* Copied revision 21.
* Copied revision 22.
C:\Users\Администратор.WIN-DBM2OE9OJ54>svnadmin hotcopy
D:\Repositories\Sandbox D:\Test --incremental
* Copied revision 14.
* Copied revision 21.
* Copied revision 22.
And so on with every subsequent hotcopy --incremental run. A binary comparison
of the revision 14, 21 and 22 files in the original repository and in the backup
shows that they are identical. What is the reason for this strange behaviour?

I can't reproduce this:

% rm -rf r d
% svnadmin create r
% repeat 100 svnmucc put -mm -U file://$PWD/r =(dd if=/dev/urandom bs=1k count=200 2>/dev/null) f$RANDOM.$RANDOM >/dev/null
% svnadmin hotcopy --incremental r d >/dev/null
% svnadmin hotcopy --incremental r d
% svnadmin hotcopy --incremental r d
% svnadmin hotcopy --incremental r d
% svnadmin hotcopy --incremental r d
% 13:39

If you delete D:\Test and run the 'hotcopy' command three more times,
does it say 14, 21, 22 in those times too?

What filesystem is D:? Is it NTFS, or a network drive, or…?
lumi
2017-05-16 16:48:00 UTC
NTFS with deduplication enabled (Windows Server 2016). The problem files have the
APL attributes (Archive, SparseFile, ReparsePoint), which I guess means that these
files take part in deduplication. A hotcopy of the repository to a WebDAV network
drive gives exactly the same result, which means the problem is in the source files.
Stefan Sperling
2017-05-16 13:53:54 UTC
Post by lumi
And so on with every subsequent hotcopy --incremental run. A binary comparison
of the revision 14, 21 and 22 files in the original repository and in the backup
shows that they are identical. What is the reason for this strange behaviour?

The only possible reasons are a size mismatch or a timestamp mismatch
on the affected files.
lumi
2017-05-16 17:57:44 UTC
A size mismatch definitely takes place. The actual size is normal, but the size on
disk is a bit unreal, again because of deduplication.
<http://subversion.1072662.n5.nabble.com/file/n198986/FileSize.png>
Stefan Sperling
2017-05-16 18:10:56 UTC
Post by lumi
A size mismatch definitely takes place. The actual size is normal, but the size on
disk is a bit unreal, again because of deduplication.
<http://subversion.1072662.n5.nabble.com/file/n198986/FileSize.png>

The whole point of a hotcopy is to have a 1-to-1 bit-identical backup.
If the NTFS filesystem which stores the backup is de-duplicating files
in a way that makes their filesize change, then incremental hotcopy
cannot work. By design, incremental hotcopy compares the size and
timestamp to see if a revision file must be copied again.
So if you really want to use svnadmin hotcopy you should disable the
de-duplication feature on the target filesystem.
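
The decision rule described above can be sketched in a few lines of Python. This is only an illustration of a size-and-timestamp check, not Subversion's actual implementation; the names needs_copy and incremental_copy are hypothetical, and a real hotcopy does considerably more than copy files.

```python
import os
import shutil


def needs_copy(src: str, dst: str) -> bool:
    """Return True if dst is missing or its size or mtime differs from src."""
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return True
    s = os.stat(src)
    return s.st_size != d.st_size or s.st_mtime_ns != d.st_mtime_ns


def incremental_copy(src_dir: str, dst_dir: str) -> list:
    """Copy only files that look changed; return the names that were copied."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if os.path.isfile(src) and needs_copy(src, dst):
            shutil.copy2(src, dst)  # copy2 also carries over the timestamp
            copied.append(name)
    return copied
```

With a rule like this, a file whose reported size never matches between source and target is copied again on every run even though its bytes are identical, which is the pattern shown in the log at the top of the thread.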

But there are other tools you could use for backup purposes instead,
such as svnadmin dump/load and svnsync (e.g. with file:// URLs).
These should work fine with NTFS de-duplication enabled on backup storage.
See http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.migrate
and http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.replication
lumi
2017-05-16 19:02:38 UTC
Isn't this a mistake in the method used to get the file size on a deduplicated volume?
What I showed in the screenshot is what Windows Explorer says. Other
applications show only one size value; e.g. PowerShell's Get-ItemProperty
returns only the actual size, whether the volume is deduplicated or not, and
this size is always the same.
PS C:\Users\Администратор.WIN-DBM2OE9OJ54> Get-ItemProperty -Path D:\Repositories\Sandbox\db\revs\0\14


    Directory: D:\Repositories\Sandbox\db\revs\0


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a---l       16.03.2017     13:00         126748 14



PS C:\Users\Администратор.WIN-DBM2OE9OJ54> Get-ItemProperty -Path Y:\RepositoriesBackup\Daily\Sandbox\db\revs\0\14


    Directory: Y:\RepositoriesBackup\Daily\Sandbox\db\revs\0


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       16.05.2017     11:08         126748 14

The first one is the deduplicated file on the local drive; the second is the
hotcopied file on the WebDAV network drive.
Stefan Sperling
2017-05-16 19:28:08 UTC
Post by lumi
Isn't this a mistake in the method used to get the file size on a deduplicated volume?

Subversion asks APR (a portability library) for the filesize.
APR does something to find that size. Subversion uses the value reported
by APR, and Subversion does not care about how APR figured it out.

So if there is a problem with how the size is determined on Windows with
NTFS de-duplication enabled, then this problem is probably located in APR
and should be fixed there. The APR project is at https://apr.apache.org

That said, if you know of a way to find the correct size with the win32 API
we could probably patch Subversion to bypass APR for this specific case.
But APR would have to be fixed anyway.
Branko Čibej
2017-05-16 19:40:40 UTC
Post by Stefan Sperling
Post by lumi
Isn't this a mistake in the method used to get the file size on a deduplicated volume?
Subversion asks APR (a portability library) for the filesize.
APR does something to find that size. Subversion uses the value reported
by APR, and Subversion does not care about how APR figured it out.
So if there is a problem with how the size is determined on Windows with
NTFS de-duplication enabled, then this problem is probably located in APR
and should be fixed there. The APR project is at https://apr.apache.org
That said, if you know of a way to find the correct size with the win32 API
we could probably patch Subversion to bypass APR for this specific case.
But APR would have to be fixed anyway.

I suspect the ReparsePoint attribute on the file is what actually makes
APR hiccup. A "reparse point" is distantly related to a Unix symlink:
it tells the file-system path resolver to restart with a different path.
I bet that what APR reports is the size of the reparse-point record instead
of the size of the target file, but when we open the file we get the
actual file contents.
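
One way to probe this suspicion from outside Subversion is a small Python sketch like the one below. It is an illustration only: the function name size_report is hypothetical, and how Python (or APR) treats deduplication reparse points on a given Windows version is an assumption that has not been verified here. It reports the size seen without following links, the size seen when following them, and whether the reparse-point attribute is set.

```python
import os
import stat


def size_report(path: str) -> dict:
    """Collect the sizes and the reparse-point flag the OS reports for path."""
    link_info = os.lstat(path)   # does not follow symbolic links
    target_info = os.stat(path)  # follows symbolic links
    # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
    attrs = getattr(link_info, "st_file_attributes", 0)
    return {
        "lstat_size": link_info.st_size,
        "stat_size": target_info.st_size,
        "is_reparse_point": bool(attrs & stat.FILE_ATTRIBUTE_REPARSE_POINT),
    }
```

If the two sizes differ for a file that carries the reparse-point flag, that would be consistent with the explanation above; if they agree, the mismatch more likely comes from somewhere else.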

-- Brane
