Discussion:
Checkout without storing two copies
Robert Hickman
2017-09-26 18:13:08 UTC
Hello,

I tend to work on projects with a large amount of binary data along
with source code and need to track them together. To this date
Subversion is the only tool that I've used which handles this
dependably. That being said I have one major issue with it - the last
time I used SVN it stored two copies of every file in a checkout. For
what I am doing this additional data is useless. I frequently add new
binary files but rarely modify them in place. It would be extremely
useful for me if there was an option to store only one copy and lose
delta uploads. It's all new data so there is nothing to delta anyway.

As I have not used SVN for several years I realize that this feature
may have been added. If not has it been considered?

Robert.
Ryan Schmidt
2017-09-27 11:01:33 UTC
Post by Robert Hickman
I tend to work on projects with a large amount of binary data along
with source code and need to track them together. To this date
Subversion is the only tool that I've used which handles this
dependably. That being said I have one major issue with it - the last
time I used SVN it stored two copies of every file in a checkout. For
what I am doing this additional data is useless. I frequently add new
binary files but rarely modify them in place. It would be extremely
useful for me if there was an option to store only one copy and lose
delta uploads. It's all new data so there is nothing to delta anyway.
As I have not used SVN for several years I realize that this feature
may have been added. If not has it been considered?
The feature hasn't been added yet.

https://issues.apache.org/jira/browse/SVN-525
Stefan Sperling
2017-09-27 11:16:01 UTC
Post by Ryan Schmidt
Post by Robert Hickman
I tend to work on projects with a large amount of binary data along
with source code and need to track them together. To this date
Subversion is the only tool that I've used which handles this
dependably. That being said I have one major issue with it - the last
time I used SVN it stored two copies of every file in a checkout. For
what I am doing this additional data is useless. I frequently add new
binary files but rarely modify them in place. It would be extremely
useful for me if there was an option to store only one copy and lose
delta uploads. It's all new data so there is nothing to delta anyway.
As I have not used SVN for several years I realize that this feature
may have been added. If not has it been considered?
The feature hasn't been added yet.
https://issues.apache.org/jira/browse/SVN-525
I suspect the only problem with this feature request is that nobody
has time to work on it :-/
Robert Hickman
2017-09-27 16:21:57 UTC
@Ryan Schmidt @Stefan Sperling. I guess that the difficulty of
implementing this depends on how much of the client code depends on
the existence of those files. From the linked bug tracker item, the
answer appears to be 'quite a lot', though I don't know anything about
this codebase.

@Paul Hammant This is used by myself only. I am familiar with
working with SVN and Git on the command line, but am in no way an
'expert'. I prefer tools which are simple and developer focused, use
Linux exclusively, and prefer text-file configuration.

I mainly work on a desktop but sometimes need to move part of the
file-system onto a laptop and then merge it back again. By file size
most of this data is DSLR raw files and source video, most of which is
related to a website with associated source code. Additionally I have
multiple unrelated personal projects from the past 7 years which need
to be in their own repositories, and miscellaneous 'stuff' which also
needs to stay separate. Some files are interdependent, some are not.

I too have developed a tool to fit my needs, having become
sufficiently frustrated with other tools. However I feel that I'm just
reimplementing part of Subversion, hence the question.

https://github.com/robehickman/simple-http-file-sync

The implementation of this system is very naive, for example storing
its file manifest as JSON. It also has a number of problems that I
haven't fixed yet. However I've been quite surprised at how well it
works. It handles 16,000 individual files in one of my projects
without difficulty, the biggest bottleneck being the network.
Currently I'm using this to manage the binary stuff and git for code.
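
As an illustration of the JSON manifest idea mentioned above, here is a minimal sketch. It is not taken from simple-http-file-sync; the function names, the manifest layout (relative path mapped to a SHA-256 hash) and the diff step are all assumptions made for the example.

```python
import hashlib
import json
import os


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, '/'-separated) to its hash."""
    manifest = {}
    for directory, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(directory, name)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            manifest[relative] = hash_file(full_path)
    return manifest


def diff_manifests(old, new):
    """Return sorted (added, changed, removed) path lists."""
    added = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    removed = sorted(p for p in old if p not in new)
    return added, changed, removed


def save_manifest(manifest, path):
    """Store the manifest as plain, human-readable JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)


def load_manifest(path):
    """Read a manifest previously written by save_manifest()."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Comparing a stored manifest against a freshly built one gives the set of files that would need to be transferred. In this sketch the manifest file is assumed to live outside the tracked tree, since build_manifest() does not exclude it.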

I was surprised how easy it was to implement this. The above system is
just over 1000 lines of Python and a good chunk of that is a
journaling file-system interface.
Paul Hammant
2017-09-27 17:14:31 UTC
* HTTP(S) based sync protocol.
Mine uses Subversion's WebDAV as is.
* All files, both on the client and the server, are stored as plain files
with their original names.

Mine too, or plain binary 'as is'
* Stores limited version history on the server only. Has limited support
for file versioning.

Mine: Only current version is stored on the client. Works out if 'clash' is
about to happen.
* No web/graphical UI. Designed to perform a single function only,
provides a minimal command line interface.

Yup, though I'll have a tray piece in time like
https://www.sparkleshare.org/ and DropBox.
* No database dependency. Stores file manifest information as regular
JSON.

I've metadata stored client side in JSON too.
* Atomic file system operations through journaling.
I've no server side beyond Subversion.

* Supports partial checkouts

Got that too :)

For your README, you'd be better to move the rationale to a separate page,
and concentrate on hooking the potential user in.

- Paul
Robert Hickman
2017-09-27 20:10:34 UTC
Post by Paul Hammant
Mine uses Subversion's WebDAV as is.
What is Subversion's WebDAV interface like to work with?
Post by Paul Hammant
* Atomic file system operations through journaling.
I've no server side beyond Subversion.
The journaling system was mostly needed on the client. During a
'checkout' any file being placed in the local file system also has to
be added to the manifest. If this is not atomic the FS could be left
in an inconsistent state. The journal goes some way towards solving
this as it can detect if something caused a failure between the two
operations and roll back.

This does nothing to help if another process is modifying the
file-system at the same time. The only way of addressing that would be
to lock the whole file-system during that change. I have not found
this to be an issue in practice.
Post by Paul Hammant
Yup, though I'll have a tray piece in time
Personally I'm happy with scrolling text in a terminal. I use Xmonad
almost stock, no system tray or any kind of status-bar.
Paul Hammant
2017-09-27 11:52:57 UTC
Post by Robert Hickman
As I have not used SVN for several years I realize that this feature
may have been added. If not has it been considered?
I have a file-sync agent that uses a non-standard Subversion install as a
backing-store over WebDAV. It only keeps one copy on the client side, and
will shuttle all saves around the team that is subscribing to the same
directory in the repo. It obeys permissions of course. It happily moves
10GB files, and like Svn itself can go up to multiple TB in the backend
(I've tested it to 3.4TB). Tech is written in Python. It suits people that
use MS-Office as their tool, rather than devs doing development.

Can you say more about your usage patterns, the number of people who'd use
it, the frequency of change, and where those users are on the
source-control savvy spectrum?

Regards,

- Paul