Discussion:
DFS alternative for linux
Ruslan Sivak
2006-10-04 13:31:13 UTC
Permalink
I'm looking to keep my working copies synchronized between 2 servers, and
I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.

Is there something similar that exists for linux?

Russ
Frank Gruman
2006-10-04 16:47:39 UTC
Permalink
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers,
and I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------
Well - there is the old tried and true rsync. You could also use the
newly released svnsync to replicate your repositories. In fact, I would
much more highly recommend this method. There are other options as well
that could be configured, but I think your best bet is svnsync.

Regards,
Frank
Ruslan Sivak
2006-10-04 17:19:35 UTC
Permalink
Frank,

I'm looking to replicate the working copy, not the repository. And I
want it to happen automatically. We also use DFS replication on another
server that stores images which might be uploaded by the client, and it
replicates the images between the servers.

Russ
Post by Frank Gruman
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers,
and I was wondering if there is something similar to DFS on linux.
On windows we use DFS (Distributed File System) which replicates
changes between 2 shares almost instantly. It detects that files
have changed and initiates replication. Sometimes it's kind of slow
for large changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------
Well - there is the old tried and true rsync. You could also use the
newly released svnsync to replicate your repositories. In fact, I
would much more highly recommend this method. There are other options
as well that could be configured, but I think your best bet is svnsync.
Regards,
Frank
Trevor Whitlock
2006-10-04 17:23:38 UTC
Permalink
Russ,
The only thing that I could think of would be putting rsync in a cron job
that happens every minute. That's how we have working copies replicated
between mac servers.

-Trevor
Post by Ruslan Sivak
Frank,
I'm looking to replicate the working copy, not the repository. And I
want it to happen automatically. We also use DFS replication on another
server that stores images which might be uploaded by the client, and it
replicates the images between the servers.
Russ
Post by Frank Gruman
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers,
and I was wondering if there is something similar to DFS on linux.
On windows we use DFS (Distributed File System) which replicates
changes between 2 shares almost instantly. It detects that files
have changed and initiates replication. Sometimes it's kind of slow
for large changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------
Well - there is the old tried and true rsync. You could also use the
newly released svnsync to replicate your repositories. In fact, I
would much more highly recommend this method. There are other options
as well that could be configured, but I think your best bet is svnsync.
Regards,
Frank
---------------------------------------------------------------------
Ruslan Sivak
2006-10-04 17:26:44 UTC
Permalink
That's what I was hoping to avoid. Right now it's awesome: someone
uploads an image, and it's usually available within a second on the
second server. An rsync would take a while to run with the amount of
data that we have, and it would have to be scheduled, meaning that
users would have to wait the scheduled interval + replication period.

There must be a linux file system I can use which watches for changes
and automatically replicates them. I can't believe that windows would
have something for which there was no equivalent in linux.

Russ
Post by Trevor Whitlock
Russ,
The only thing that I could think of would be putting rsync in a
cron job that happens every minute. That's how we have working copies
replicated between mac servers.
-Trevor
Post by Ruslan Sivak
Frank,
I'm looking to replicate the working copy, not the repository. And I
want it to happen automatically. We also use DFS replication on another
server that stores images which might be uploaded by the client, and it
replicates the images between the servers.
Russ
Post by Frank Gruman
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers,
and I was wondering if there is something similar to DFS on linux.
On windows we use DFS (Distributed File System) which replicates
changes between 2 shares almost instantly. It detects that files
have changed and initiates replication. Sometimes it's kind of slow
for large changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------
Well - there is the old tried and true rsync. You could also use the
newly released svnsync to replicate your repositories. In fact, I
would much more highly recommend this method. There are other options
as well that could be configured, but I think your best bet is svnsync.
Regards,
Frank
---------------------------------------------------------------------
Thomas Harold
2006-10-16 05:45:13 UTC
Permalink
Post by Ruslan Sivak
That's what I was hoping to avoid. Right now it's awesome: someone
uploads an image, and it's usually available within a second on the
second server. An rsync would take a while to run with the amount of
data that we have, and it would have to be scheduled, meaning that
users would have to wait the scheduled interval + replication period.
There must be a linux file system I can use which watches for changes
and automatically replicates them. I can't believe that windows would
have something for which there was no equivalent in linux.
DRBD? I mean, essentially what you are doing is slaving the second
system off of the first and a lot of the Linux/Unix HA tools would apply.

Or you could set up the 2nd server's working copy to automatically update
from the SVN repository as needed (svn update). Bonus points if you
write a post-commit wrapper script that triggers the process using
the post-commit e-mails.

(Essentially you would have a mail queue to accept the notification.
Parse it to find out what files/folders changed, then have the 2nd
machine do update commands on just those files/folders to get the latest
versions. That could be very close to real-time.)
Ryan Schmidt
2006-10-04 18:48:49 UTC
Permalink
Post by Ruslan Sivak
I'm looking to replicate the working copy, not the repository. And
I want it to happen automatically. We also use DFS replication on
another server that stores images which might be uploaded by the
client, and it replicates the images between the servers.
I can't help you with Linux, but Mac OS X, which is based on Unix and
makes a great server OS, has ways of monitoring when files change and
then performing an action on those files. You should be able to build
a replication system with that fairly easily; it would be fairly quick
and wouldn't have the delay you'd have with the cron task.

http://www.apple.com/applescript/folderactions/
Madison Kelly
2006-10-04 17:28:08 UTC
Permalink
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers, and
I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
I don't have any personal experience with them, but have you looked into
the 'Coda' or 'CIFS' file systems? Also, do a general search for 'HA linux'
(HA=High Availability). A while back I was looking into that stuff for
other reasons and, of course, a big part of high-availability services
is having as close to real-time replicated file systems as possible. :)

Sorry I couldn't help more directly!

Madi
Les Mikesell
2006-10-04 18:00:02 UTC
Permalink
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers, and
I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Just curious... Why would you want 2 different working copies
synchronized without committing to the repository and updating?
--
Les Mikesell
***@gmail.com
Ruslan Sivak
2006-10-04 21:20:51 UTC
Permalink
Post by Les Mikesell
Post by Ruslan Sivak
Post by Les Mikesell
Just curious... Why would you want 2 different working copies
synchronized without committing to the repository and updating?
We are using the working copies on the production server for our web
site. They provide easy and fast updates (deployments) to the code.
That seems to me like a good reason for making the only way to
change it be through the repository so you always have a clear
history. I wouldn't want a way to modify production without
committing first unless you have fast-changing binaries like
weather maps. Can't you ssh an 'svn update' on the server when
you want something new to appear there?
We don't modify production (except in an emergency) other than through svn.
We do go in and do an update on one of the servers, and currently (as we
are on windows), it propagates the changes to another server. I really
don't want to have to replicate the changes manually, especially if we
are going to add more servers to the farm.

Now so far, most things that I see for linux are as good as the
microsoft product or better. DFS has its weaknesses, but it's just
awesome when you have small updates that need to be propagated. I can't
believe that linux doesn't have anything similar.

Russ
Les Mikesell
2006-10-04 21:40:44 UTC
Permalink
Since you know when the update needs to be done, do it with
rsync over ssh to as many other places as necessary, or ssh
an 'svn update' command. It is a good idea to wrap these
operations in control scripts from the start so that things like adding
servers, dropping them out of a load balancer during the update,
etc. can be added if/when needed and the users just run the same
script to make a change.
Ruslan Sivak
2006-10-05 01:10:49 UTC
Permalink
Post by Les Mikesell
Post by Ruslan Sivak
Post by Les Mikesell
Just curious... Why would you want 2 different working copies
synchronized without committing to the repository and updating?
We are using the working copies on the production server for our web
site. They provide easy and fast updates (deployments) to the code.
That seems to me like a good reason for making the only way to
change it be through the repository so you always have a clear
history. I wouldn't want a way to modify production without
committing first unless you have fast-changing binaries like
weather maps. Can't you ssh an 'svn update' on the server when
you want something new to appear there?
We don't modify production (except in an emergency) other than through svn.
We do go in and do an update on one of the servers, and currently (as we
are on windows), it propagates the changes to another server. I really
don't want to have to replicate the changes manually, especially if we
are going to add more servers to the farm.
Now so far, most things that I see for linux are as good as the
microsoft product or better. DFS has its weaknesses, but it's just
awesome when you have small updates that need to be propagated. I can't
believe that linux doesn't have anything similar.
Since you know when the update needs to be done, do it with
rsync over ssh to as many other places as necessary, or ssh
an 'svn update' command. It is a good idea to wrap these
operations in control scripts from the start so things adding
servers, dropping them out of a load balancer during the update,
etc. can be added if/when needed and the users just run the same
script to make a change.
rsync is still kind of slow on large data sizes. DFS is super slow when
you have a lot of data to sync over, but once the data is there, if you
update 1 file out of 50000 files, it will sync almost instantly. Rsync
will have to check 50000 files.

svn update might be a better solution, and might work, although an
update on a large working copy is still a little slow (which is fine
since I usually know what needs to be updated, and update those folders
specifically). If I had to do an update on each server, on the whole
working copy, that would be pretty slow.

svn update won't work for things like people uploading images to the
webserver. The images get uploaded into the working copy, and
eventually I go through and check them into the repo. So the only thing
that might work here would be rsync, but like I mentioned before, that
would be pretty slow.

The best solution would be some sort of filesystem that detects changes
to the filesystem and sends out updates to the other cluster members.
I'm sure there is a filesystem like that out there; I just haven't found
it.

One alternative would be to somehow mount the repository as a folder,
and then have apache serve files off that folder. When people upload
something, it can be written straight into the repo, basically with
webdav. My fear is that this would be kind of slow. Is it possible to
mount the repo in linux as a folder?

Russ
Les Mikesell
2006-10-05 03:37:43 UTC
Permalink
Post by Ruslan Sivak
Post by Les Mikesell
Since you know when the update needs to be done, do it with
rsync over ssh to as many other places as necessary, or ssh
an 'svn update' command. It is a good idea to wrap these
operations in control scripts from the start so that things like adding
servers, dropping them out of a load balancer during the update,
etc. can be added if/when needed and the users just run the same
script to make a change.
rsync is still kind of slow on large data sizes. DFS is super slow when
you have a lot of data to sync over, but once the data is there, if you
update 1 file out of 50000 files, it will sync almost instantly. Rsync
will have to check 50000 files.
Rsync can do that very quickly if you have sufficient RAM to
have the directory entries for these files in cache - or
if you can restrict the run to a smaller directory containing
the changed files.
Post by Ruslan Sivak
svn update won't work for things like people uploading images to the
webserver. The images get uploaded into the working copy, and
eventually I go through and check them into the repo. So the only thing
that might work here would be rsync, but like I mentioned before, that
would be pretty slow.
Again, I wouldn't want this update to happen in production
without the tracking through the repository so you know
what changed and have the ability to back it out. However
if that's what you want, perhaps you can at least restrict
it to a subset of the directories which would speed up
an rsync script.
Post by Ruslan Sivak
The best solution would be some sort of filesystem that detects changes
to the filesystem and sends out updates to the other cluster members.
I'm sure there is a filesystem like that out there; I just haven't found
it.
There is something called GFS which is supposed to work like that.
I've always been happy with the way rsync works, though, at
least on unix-like systems. Files are updated under new
temporary names, then renamed to replace the originals. The
rename is an atomic operation and unix filesystem semantics
allow programs that had opened the previous copy to continue
to access its data and any subsequent opens get the new copy.
Programs never have to deal with partially modified copies.
Post by Ruslan Sivak
One alternative would be to somehow mount the repository as a folder,
and then have apache serve files off that folder. When people upload
something, it can be written straight into the repo, basically with
webdav. My fear is that this would be kind of slow. Is it possible to
mount the repo in linux as a folder?
If the machines are all on the same LAN, you could just NFS-mount
the one working copy into all machines and use the repository
as a backup. My server farms are distributed and rsync works
nicely - network issues don't cause any problems with the
servers accessing their own files and an incomplete transfer
would have no immediate effect and would be cleaned up on the
next attempt.
--
Les Mikesell
***@gmail.com
Thomas Harold
2006-10-16 05:50:59 UTC
Permalink
Post by Ruslan Sivak
svn update won't work for things like people uploading images to the
webserver. The images get uploaded into the working copy, and
eventually I go through and check them into the repo. So the only thing
that might work here would be rsync, but like I mentioned before, that
would be pretty slow.
Why are you the bottleneck in getting things into the repository? (That
sounds harsh, but I suspect that your workflow might be better if it was
changed slightly?)

Why not let each user add things directly to the repository in a testing
branch and then merge their new images across to the mainline trunk that
holds the production version of the web server?

I haven't tried WebDAV yet (so I can't speak for speed). We went with
PuTTY+TortoiseSVN over SSH+SVN for our repository. Users have local
working copies and push their changes to the central repository server.
Russ
2006-10-16 13:27:09 UTC
Permalink
I don't think you're understanding our environment. We have a code folder and an image folder. Both get checked out to the developers' working copies and get worked on, branched, etc. When the working copies get checked out on production, however, the code goes on the web app servers and the images go on the images server. Users of the website upload images through their daily use of the site, which I eventually check into the repository.

When a user uploads an image, the image gets synced between the 2 image servers. This happens days or even weeks before I might go in and commit these images into the repo. These images need to be available on the second server as soon as possible, as the servers are load balanced, and we don't want users to see broken images.

So in this scenario there is no sort of svn update solution that I can see (short of updating my code to do a commit every time an image is uploaded and using some sort of post-commit hook to force an update).

It seems there were some solutions suggested as far as what filesystems to use, but it doesn't look like a lot of them are ready for production use. I will go through the list at some point and see if I can spot a winner, but the documentation on most projects seems scarce and vague, and I don't think I'll be able to make a decision without installing a whole bunch of the choices and trying them out.

That is unless someone here who understands what I'm looking for would know something that would be perfect for the job.

Thanks for everyone's help so far,


Russ
Sent wirelessly via BlackBerry from T-Mobile.

Thomas Harold
2006-10-16 15:04:37 UTC
Permalink
Post by Russ
When a user uploads an image, the image gets synced between the 2
image servers. This happens days or even weeks before I might go in
and commit these images into the repo. These images need to be
available on the second server as soon as possible, as the servers
are load balanced, and we don't want users to see broken images.
Hmm... the other option that comes to mind is the Unison project, which
offers bi-directional synchronization. Not sure how responsive it is.
Post by Russ
So in this scenario there is no sort of svn update solution that I
can see (short of updating my code to do a commit every time an image
is uploaded and using some sort of post-commit hook to force an
update).
If your uploads happen via a web page (instead of scp/sftp/ftp), writing
some sort of script to trigger an update across the cluster might be a
good solution. In that case you could use rsync to look only at the
directory with new files and push it across to the other server.

Otherwise I think you're going to have to hook into something low-level
to trigger the update script.
Post by Russ
It seems there were some solutions suggested as far as what
filesystems to use, but it doesn't look like a lot of them are ready
for production use. I will go through the list at some point and
see if I can spot a winner, but the documentation on most projects
seems scarce and vague, and I don't think I'll be able to make a
decision without installing a whole bunch of the choices and trying
them out.
That is unless someone here who understands what I'm looking for
would know something that would be perfect for the job.
I suspect you're going to have to go visit High-Availability or
Clustering linux mailing lists / forums. They should be filled with
users who are working with directory synchronization issues across
multiple servers.

Bob Hiestand
2006-10-09 14:41:49 UTC
Permalink
You should look into inotify, which is the linux file change event
notification system. You can write a script to copy any changed file
through rsync or any other method you may prefer.
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers, and
I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------
Dave_Thomas mailing lists
2006-10-09 20:19:00 UTC
Permalink
Linux does in fact have something like that: DRBD. It works with
multimaster replication; I've used this reliably on a high-volume
Postgres database.

http://en.wikipedia.org/wiki/DRBD

You will need to set aside a partition on the hard drive (or local raid) on
each system you need to share.

Also be careful to take lots of time to set it up properly and ALWAYS run
exactly the same version on every system!

Good luck,
Dave
Post by Bob Hiestand
You should look into inotify, which is the linux file change event
notification system. You can write a script to copy any changed file
through rsync or any other method you may prefer.
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers, and
I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------
---------------------------------------------------------------------
Ruslan Sivak
2006-10-09 22:11:49 UTC
Permalink
This looks promising... It says somewhere that only one node can be
read/write at a time. Not sure yet if this is going to be a problem.

Thank you for the link, Dave.

Russ
Post by Dave_Thomas mailing lists
Linux does in fact have something like that: DRBD. It works with
multimaster replication; I've used this reliably on a high-volume
Postgres database.
http://en.wikipedia.org/wiki/DRBD
You will need to set aside a partition on the hard drive (or local
raid) on each system you need to share.
Also be careful to take lots of time to set it up properly and ALWAYS
run exactly the same version on every system!
Good luck,
Dave
Post by Bob Hiestand
You should look into inotify, which is the linux file change event
notification system. You can write a script to copy any changed file
through rsync or any other method you may prefer.
Post by Ruslan Sivak
I'm looking to keep my working copies synchronized between 2 servers, and
I was wondering if there is something similar to DFS on linux. On
windows we use DFS (Distributed File System) which replicates changes
between 2 shares almost instantly. It detects that files have changed
and initiates replication. Sometimes it's kind of slow for large
changes, but it works perfectly for small deployments.
Is there something similar that exists for linux?
Russ
---------------------------------------------------------------------