Discussion:
Reference to non-existent node
Davor Josipovic
2018-02-12 07:41:37 UTC
Permalink
It seems to me there is a bug in libapache2-mod-svn/stable,stable,now 1.9.5-1+deb9u1 amd64 [installed].

I described it here:https://superuser.com/questions/1293699/svn-error-occurred-while-committing-the-transaction

I assume this is the correct place to report?

I concerns the error 160014 "Reference to non-existent node". Some changes I can not commit through https://, but I can commit directly through the file:// protocol.

Any idea's?
Johan Corveleyn
2018-02-18 19:39:34 UTC
Permalink
Post by Davor Josipovic
It seems to me there is a bug in libapache2-mod-svn/stable,stable,now
1.9.5-1+deb9u1 amd64 [installed].
I described it
here:https://superuser.com/questions/1293699/svn-error-occurred-while-committing-the-transaction
I assume this is the correct place to report?
I concerns the error 160014 "Reference to non-existent node". Some changes I
can not commit through https://, but I can commit directly through the
file:// protocol.
Any idea's?
Yes, this is the correct place to report such an issue. But to
maximize your chances of people helping you, can you please copy the
relevant contents / context to the mailinglist here, instead of
forcing people to click a link?
--
Johan
Davor Josipovic
2018-02-18 20:58:09 UTC
Permalink
Here you go (copy from the link):

----------------------------------------------------------------------------------------------------------------------------

Can someone help me debug this issue? Everything works (i.e. I can commit, checkout, etc.) but this one commit always fails as described below. I tried checking out in a new repository, and committing, but same issue. Note also that this is a very large repository (500GB), and a very large commit (36k files totaling 10GB).

This is the `apache2` error log:

[Sat Feb 10 03:25:16.262765 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Could not DELETE /svn/repo/!svn/txn/463-e8. [500, #0]
[Sat Feb 10 03:25:16.265881 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] could not abort transaction. [500, #2]
[Sat Feb 10 03:25:16.265905 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Transaction '463-e8' cleanup failed [500, #2]
[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into "/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] An error occurred while committing the transaction. [500, #160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in filesystem '/mnt/vc/svn/repo/db' [500, #160014]

The client side gives:

done
Committing transaction...
svn: E175012: Commit failed (details follow):
svn: E175012: Connection timed out
svn: E200042: Additional errors:
svn: E175002: Unexpected server error 500 'Internal Server Error' on '/svn/repo/!svn/txn/463-e8'

First I thought this was a permission issue. So I did this on the server:

su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.

No error whatsoever. Apache runs as `www-data`. Following permissions are set:

chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R

I have run `svnadmin verify` and found no problems. `dmesg` doesn't show anything new.

The apache svn config is pretty much default.

KeepAlive On
MaxKeepAliveRequests 0
# Set to 10h.
Timeout 36000

SVNCompressionLevel 5
SVNInMemoryCacheSize 16384
SVNCacheTextDeltas On
SVNCacheFullTexts On
SVNAllowBulkUpdates Prefer

<Location /svn>
DAV svn
SVNParentPath /mnt/vc/svn
SVNListParentPath On

# Allow large request
LimitXMLRequestBody 0
</Location>

Extra info:

Server: Debian 9.3
apache2/stable,stable,now 2.4.25-3+deb9u3 amd64 [installed]
libapache2-mod-svn/stable,stable,now 1.9.5-1+deb9u1 amd64 [installed]
Server svn client: 1.9.5-1+deb9u1
Client: svn 1.9.7

**EDIT:** I copied the working copy to the repository server. Then I tried to commit, but had the same error. I then relocated the working copy to direct repository location: `svn relocate file:///mnt/vc/svn/repo`. Commit worked this time. So it has to do something with `dav_svn`?

----------------------------------------------------------------------------------------------------------------------------
Post by Davor Josipovic
It seems to me there is a bug in libapache2-mod-svn/stable,stable,now
1.9.5-1+deb9u1 amd64 [installed].
I described it
here:https://superuser.com/questions/1293699/svn-error-occurred-while-committing-the-transaction
I assume this is the correct place to report?
I concerns the error 160014 "Reference to non-existent node". Some changes I
can not commit through https://, but I can commit directly through the
file:// protocol.
Any idea's?
Yes, this is the correct place to report such an issue. But to
maximize your chances of people helping you, can you please copy the
relevant contents / context to the mailinglist here, instead of
forcing people to click a link?
--
Johan
Johan Corveleyn
2018-02-19 21:48:37 UTC
Permalink
Post by Davor Josipovic
----------------------------------------------------------------------------------------------------------------------------
Can someone help me debug this issue? Everything works (i.e. I can commit,
checkout, etc.) but this one commit always fails as described below. I tried
checking out in a new repository, and committing, but same issue. Note also
that this is a very large repository (500GB), and a very large commit (36k
files totaling 10GB).
[Sat Feb 10 03:25:16.262765 2018] [dav:error] [pid 3339] [client
X.X.X.X:61908] Could not DELETE /svn/repo/!svn/txn/463-e8. [500, #0]
[Sat Feb 10 03:25:16.265881 2018] [dav:error] [pid 3339] [client
X.X.X.X:61908] could not abort transaction. [500, #2]
[Sat Feb 10 03:25:16.265905 2018] [dav:error] [pid 3339] [client
X.X.X.X:61908] Transaction '463-e8' cleanup failed [500, #2]
[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client
X.X.X.X:61908] Can't remove
'/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client
X.X.X.X:61908] Can't remove file
'/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or
directory [500, #2]
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into
"/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] An error occurred while committing the transaction. [500,
#160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in
filesystem '/mnt/vc/svn/repo/db' [500, #160014]
done
Committing transaction...
svn: E175012: Connection timed out
svn: E175002: Unexpected server error 500 'Internal Server Error' on
'/svn/repo/!svn/txn/463-e8'
su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.
chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R
I have run `svnadmin verify` and found no problems. `dmesg` doesn't show anything new.
The apache svn config is pretty much default.
KeepAlive On
MaxKeepAliveRequests 0
# Set to 10h.
Timeout 36000
SVNCompressionLevel 5
SVNInMemoryCacheSize 16384
SVNCacheTextDeltas On
SVNCacheFullTexts On
SVNAllowBulkUpdates Prefer
<Location /svn>
DAV svn
SVNParentPath /mnt/vc/svn
SVNListParentPath On
# Allow large request
LimitXMLRequestBody 0
</Location>
Server: Debian 9.3
apache2/stable,stable,now 2.4.25-3+deb9u3 amd64 [installed]
libapache2-mod-svn/stable,stable,now 1.9.5-1+deb9u1 amd64 [installed]
Server svn client: 1.9.5-1+deb9u1
Client: svn 1.9.7
**EDIT:** I copied the working copy to the repository server. Then I tried
to commit, but had the same error. I then relocated the working copy to
direct repository location: `svn relocate file:///mnt/vc/svn/repo`. Commit
worked this time. So it has to do something with `dav_svn`?
I must say I don't immediately have an idea why this fails with
mod_dav_svn, and not with file:///.

I did compare with httpd.conf settings we have set on our server (but
I must say we never had such large commits). These are our relevant
settings:

[[[
MaxKeepAliveRequests 20000
KeepAliveTimeout 60
LimitXMLRequestBody 0
MaxMemFree 16384

SVNInMemoryCacheSize 131072
SVNCacheFullTexts off
SVNCacheTextDeltas on
]]]

Maybe you can experiment with some of these settings to see if it
makes a difference?
--
Johan
Philip Martin
2018-02-20 12:09:32 UTC
Permalink
Post by Davor Josipovic
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into
"/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] An error occurred while committing the transaction. [500,
#160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in
filesystem '/mnt/vc/svn/repo/db' [500, #160014]
done
Committing transaction...
svn: E175012: Connection timed out
svn: E175002: Unexpected server error 500 'Internal Server Error' on
'/svn/repo/!svn/txn/463-e8'
su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.
chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R
The server error suggests that at least one of the files that make up
the transaction is missing from disk when the MERGE is attempted.
That's odd, it is not a commonly seen error.

Even more odd is that you show "svnadmin rmtxns" succeeding. This is
odd, very odd, because when a MERGE fails the server code in mod_dav_svn
reacts by removing the transaction from disk. In addition, when the
client gets the failed MERGE repsonse it also reacts and sends a DELETE
to remove the transaction from disk; this DELETE usually fails with 404:

[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Could not fetch resource information. [404, #0]
[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Named transaction doesn't exist. [404, #175002]

Do you see the DELETE in the log after the failed MERGE? Was there an
error?

After a failed MERGE an attempt to run "svnadmin rmtxns" manually would
normally fail with:

E160007: No such transaction '463-e8'

but your attempt succeeded so at least some of the transaction was
present on disk.

So it looks like several strange things have happened:

- one of the files on disk that make up the transaction is missing
causing the MERGE to fail

- both automatic attempts to delete the transaction on disk have failed
causing the transaction to persist on disk

- whatever caused the transaction to persist on disk doesn't prevent
"svnadmin rmtxns" from working

The "_1bqk" implies that this transaction affects over 60,000 nodes,
this is large but Subversion should handle it.

What sort of filesystem hosts the repository?
--
Philip
Davor Josipovic
2018-02-20 16:26:22 UTC
Permalink
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I recall. There was nothing else. You see the DELETE error before the MERGE error. I think that is due to a timing issue. In my previous tries (there were transaction 463-e1 up to 463-e9...) the MERGE would sometimes show before the DELETE in the error log. That is what let me believe that MERGE error was primary, then the DELETE error (even though it seems the other way around in the apache2 log).
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime, nodiratime, data=ordered). Prior to that, the repository was on a NTFS formatted drive. The error was the same, but since I thought NTFS was the culprit, I moved to EXT4.

I have also tried with a older client (1.8, cf. stakoverflow question edit history), but the error was the same serverside (and slightly different clientside).

Since then I had some 20 commits. No issues...
Post by Philip Martin
Post by Davor Josipovic
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into
"/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] An error occurred while committing the transaction. [500,
#160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in
filesystem '/mnt/vc/svn/repo/db' [500, #160014]
done
Committing transaction...
svn: E175012: Connection timed out
svn: E175002: Unexpected server error 500 'Internal Server Error' on
'/svn/repo/!svn/txn/463-e8'
su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.
chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R
The server error suggests that at least one of the files that make up
the transaction is missing from disk when the MERGE is attempted.
That's odd, it is not a commonly seen error.

Even more odd is that you show "svnadmin rmtxns" succeeding. This is
odd, very odd, because when a MERGE fails the server code in mod_dav_svn
reacts by removing the transaction from disk. In addition, when the
client gets the failed MERGE repsonse it also reacts and sends a DELETE
to remove the transaction from disk; this DELETE usually fails with 404:

[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Could not fetch resource information. [404, #0]
[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Named transaction doesn't exist. [404, #175002]

Do you see the DELETE in the log after the failed MERGE? Was there an
error?

After a failed MERGE an attempt to run "svnadmin rmtxns" manually would
normally fail with:

E160007: No such transaction '463-e8'

but your attempt succeeded so at least some of the transaction was
present on disk.

So it looks like several strange things have happened:

- one of the files on disk that make up the transaction is missing
causing the MERGE to fail

- both automatic attempts to delete the transaction on disk have failed
causing the transaction to persist on disk

- whatever caused the transaction to persist on disk doesn't prevent
"svnadmin rmtxns" from working

The "_1bqk" implies that this transaction affects over 60,000 nodes,
this is large but Subversion should handle it.

What sort of filesystem hosts the repository?
--
Philip
Philip Martin
2018-02-20 18:14:33 UTC
Permalink
Post by Davor Josipovic
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I
recall. There was nothing else. You see the DELETE error before the
MERGE error. I think that is due to a timing issue. In my previous
tries (there were transaction 463-e1 up to 463-e9...) the MERGE would
sometimes show before the DELETE in the error log. That is what let me
believe that MERGE error was primary, then the DELETE error (even
though it seems the other way around in the apache2 log).
Sorry, missed that. The DELETE error is interesting:

[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]

DELETE uses low-level IO rather than parsing the transaction files.
Subversion simply does readdir() to get a list of children and then
loops deleting them. If readdir() returns a name that does not exist
then either the disk/filesystem is corrupt or some other process has
deleted the file between the readdir() and the delete attempt.

Corruption seems unlikely since you say you have moved filesystems. We
are left with some other process deleting files while apache is
attempting to access them.

Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
Post by Davor Josipovic
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime,
nodiratime, data=ordered). Prior to that, the repository was on a NTFS
formatted drive. The error was the same, but since I thought NTFS was
the culprit, I moved to EXT4.
Are you running Linux or Windows? Is the disk local or networked?
--
Philip
Davor Josipovic
2018-02-20 18:38:03 UTC
Permalink
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal Debian install. Only Apache processes run as www-data. There is only a post-commit hook for Trac (which was disabled during at least one try. Also, post-commit shouldn't be called until commit finishes.) Furthermore, that mount is only given access to dav_svn in apache2 config.
Post by Philip Martin
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I
recall. There was nothing else. You see the DELETE error before the
MERGE error. I think that is due to a timing issue. In my previous
tries (there were transaction 463-e1 up to 463-e9...) the MERGE would
sometimes show before the DELETE in the error log. That is what let me
believe that MERGE error was primary, then the DELETE error (even
though it seems the other way around in the apache2 log).
Sorry, missed that. The DELETE error is interesting:

[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]

DELETE uses low-level IO rather than parsing the transaction files.
Subversion simply does readdir() to get a list of children and then
loops deleting them. If readdir() returns a name that does not exist
then either the disk/filesystem is corrupt or some other process has
deleted the file between the readdir() and the delete attempt.

Corruption seems unlikely since you say you have moved filesystems. We
are left with some other process deleting files while apache is
attempting to access them.

Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
Post by Philip Martin
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime,
nodiratime, data=ordered). Prior to that, the repository was on a NTFS
formatted drive. The error was the same, but since I thought NTFS was
the culprit, I moved to EXT4.
Are you running Linux or Windows? Is the disk local or networked?
--
Philip
Davor Josipovic
2018-02-20 18:38:02 UTC
Permalink
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal Debian install. Only Apache processes run as www-data. There is only a post-commit hook for Trac (which was disabled during at least one try. Also, post-commit shouldn't be called until commit finishes.) Furthermore, that mount is only given access to dav_svn in apache2 config.
Post by Philip Martin
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I
recall. There was nothing else. You see the DELETE error before the
MERGE error. I think that is due to a timing issue. In my previous
tries (there were transaction 463-e1 up to 463-e9...) the MERGE would
sometimes show before the DELETE in the error log. That is what let me
believe that MERGE error was primary, then the DELETE error (even
though it seems the other way around in the apache2 log).
Sorry, missed that. The DELETE error is interesting:

[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]

DELETE uses low-level IO rather than parsing the transaction files.
Subversion simply does readdir() to get a list of children and then
loops deleting them. If readdir() returns a name that does not exist
then either the disk/filesystem is corrupt or some other process has
deleted the file between the readdir() and the delete attempt.

Corruption seems unlikely since you say you have moved filesystems. We
are left with some other process deleting files while apache is
attempting to access them.

Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
Post by Philip Martin
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime,
nodiratime, data=ordered). Prior to that, the repository was on a NTFS
formatted drive. The error was the same, but since I thought NTFS was
the culprit, I moved to EXT4.
Are you running Linux or Windows? Is the disk local or networked?
--
Philip
Philip Martin
2018-02-20 22:11:12 UTC
Permalink
Post by Davor Josipovic
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options
noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal
Debian install. Only Apache processes run as www-data. There is only a
post-commit hook for Trac (which was disabled during at least one
try. Also, post-commit shouldn't be called until commit finishes.)
Furthermore, that mount is only given access to dav_svn in apache2
config.
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
--
Philip
Davor Josipovic
2018-02-21 12:47:37 UTC
Permalink
Post by Philip Martin
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
You make the PID sound very important, and with good reason too! I just went back through the logs to make sure that I didn't mess up the PID before posting. I didn't. In each case, the PID was different. This might explain the time difference. Now, as far as I understand apache2, this can only be caused by:

# After MaxConnectionsPerChild
# connections, the child process will die. If MaxConnectionsPerChild is 0, then
# the process will never expire.
# Setting MaxConnectionsPerChild to a non-zero value limits the amount of memory that
# process can consume by (accidental) memory leakage.
<IfModule mpm_prefork_module>
StartServers 1
MinSpareServers 0
MaxSpareServers 1
MaxRequestWorkers 50
MaxConnectionsPerChild 50
</IfModule>

These were also the settings I played with during my trials. MaxConnectionsPerChild is set to a limited number to make sure that no memory leaks explode.

I'll look further into this later. The default Debian apache2 install came with mpm_event enabled. According to my logs I switched to mpm_prefork later (due to performance reasons recommended for an other module).

PS Sorry for the double-postings. Live is double-sending for some reason.
Post by Philip Martin
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options
noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal
Debian install. Only Apache processes run as www-data. There is only a
post-commit hook for Trac (which was disabled during at least one
try. Also, post-commit shouldn't be called until commit finishes.)
Furthermore, that mount is only given access to dav_svn in apache2
config.
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
--
Philip
Davor Josipovic
2018-02-21 12:53:48 UTC
Permalink
Post by Philip Martin
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
You make the PID sound very important, and with good reason too! I just went back through the logs to make sure that I didn't mess up the PID before posting. I didn't. In each case, the PID was different. This might explain the time difference. Now, as far as I understand apache2, this can only be caused by:

# After MaxConnectionsPerChild
# connections, the child process will die. If MaxConnectionsPerChild is 0, then
# the process will never expire.
# Setting MaxConnectionsPerChild to a non-zero value limits the amount of memory that
# process can consume by (accidental) memory leakage.
<IfModule mpm_prefork_module>
StartServers 1
MinSpareServers 0
MaxSpareServers 1
MaxRequestWorkers 50
MaxConnectionsPerChild 50
</IfModule>

These were also the settings I played with during my trials. MaxConnectionsPerChild is set to a limited number to make sure that no memory leaks explode.

I'll look further into this later. The default Debian apache2 install came with mpm_event enabled. According to my logs I switched to mpm_prefork later (due to performance reasons recommended for an other module).

PS Sorry for the double-postings. Live is double-sending for some reason.
Post by Philip Martin
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options
noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal
Debian install. Only Apache processes run as www-data. There is only a
post-commit hook for Trac (which was disabled during at least one
try. Also, post-commit shouldn't be called until commit finishes.)
Furthermore, that mount is only given access to dav_svn in apache2
config.
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
--
Philip
Philip Martin
2018-02-21 15:27:38 UTC
Permalink
Post by Davor Josipovic
You make the PID sound very important, and with good reason too! I
just went back through the logs to make sure that I didn't mess up the
PID before posting. I didn't. In each case, the PID was
different. This might explain the time difference.
The Subversion client can make multiple http connections, particularly
when a commit involves a large number of files. It is not unusual for
multiple PIDs to handle the requests for any one commit.
Post by Davor Josipovic
I'll look further into this later. The default Debian apache2 install
came with mpm_event enabled. According to my logs I switched to
mpm_prefork later (due to performance reasons recommended for an other
module).
Prefork is typically the lowest performace for Subversion. Subversion
server performance will generally be better with a small number of
processes each with a large number of threads as this allows you to
increase the size of the per-process Subversion cache.
--
Philip
Davor Josipovic
2018-05-21 20:05:05 UTC
Permalink
I have finally found the culprit. By coincidence, I had the same configuration running on two servers, one backed with a SSD, the other with a HDD.

The same commit worked on the SSD-backed server, and always failed on the HDD one with the described error.

Monitoring the process revealed that after the file transfer of the commit completes, server starts finalizing. Not sure what exactly happens during that period serverside (maybe someone can explain?), but for a 40k files commit, it took the SSD-backed one to complete in about 30 seconds, while the HDD-backed one struggled more than 30 minutes. During those 30 minutes, the client got no response, so it timed out, which resulted in the described error on the server side.

Anyway, setting http-timeout = 3600 in the "servers" file fixes the issue permanently (tested it 2 times to make sure). I think the default is 10 minutes.

Also, I am quite surprised at the performance penalty when using a HDD at this stage. Any way to tweak this for low memory servers?
Post by Davor Josipovic
You make the PID sound very important, and with good reason too! I
just went back through the logs to make sure that I didn't mess up the
PID before posting. I didn't. In each case, the PID was
different. This might explain the time difference.
The Subversion client can make multiple http connections, particularly
when a commit involves a large number of files. It is not unusual for
multiple PIDs to handle the requests for any one commit.
Post by Davor Josipovic
I'll look further into this later. The default Debian apache2 install
came with mpm_event enabled. According to my logs I switched to
mpm_prefork later (due to performance reasons recommended for an other
module).
Prefork is typically the lowest performace for Subversion. Subversion
server performance will generally be better with a small number of
processes each with a large number of threads as this allows you to
increase the size of the per-process Subversion cache.
--
Philip
Nathan Hartman
2018-05-21 23:17:03 UTC
Permalink
Post by Davor Josipovic
I have finally found the culprit. By coincidence, I had the same
configuration running on two servers, one backed with a SSD, the other with
a HDD.
The same commit worked on the SSD-backed server, and always failed on the
HDD one with the described error.
Monitoring the process revealed that after the file transfer of the commit
completes, server starts finalizing. Not sure what exactly happens during
that period serverside (maybe someone can explain?), but for a 40k files
commit, it took the SSD-backed one to complete in about 30 seconds, while
the HDD-backed one struggled more than 30 minutes. During those 30 minutes,
the client got no response, so it timed out, which resulted in the
described error on the server side.
(Snip)

Now that is interesting. 40k doesn't seem to be such a large amount of data
for modern computers. Very slow and fragmented hard drive? Or perhaps
there's something else going on that is manifesting this way?
Davor Josipovic
2018-05-22 18:30:58 UTC
Permalink
Now that is interesting. 40k doesn't seem to be such a large amount of data for modern computers. Very slow and fragmented hard drive? Or perhaps there's something else going on that is manifesting this way?
The HDD is indeed on the slowside, and together with low memory...

But I think this also show how I/O intensive SVN is. On the client side, for each committed file, one copy is placed in .svn folder, and an other copy in a temporary folder (which is deleted after file transfer in v1.9). So for each file committed, a double copy is made client-side. This temporary copy is really necessary?

Server-side, I see similar disk bashing. For each committed file, max 2 (?) copies are made in the transaction directory.

So any way to reduce the I/O?
Johan Corveleyn
2018-05-25 14:28:42 UTC
Permalink
Post by Davor Josipovic
Post by Nathan Hartman
Now that is interesting. 40k doesn't seem to be such a large amount of
data for modern computers. Very slow and fragmented hard drive? Or perhaps
there's something else going on that is manifesting this way?
The HDD is indeed on the slowside, and together with low memory...
But I think this also show how I/O intensive SVN is. On the client side, for
each committed file, one copy is placed in .svn folder, and an other copy in
a temporary folder (which is deleted after file transfer in v1.9). So for
each file committed, a double copy is made client-side. This temporary copy
is really necessary?
Server-side, I see similar disk bashing. For each committed file, max 2 (?)
copies are made in the transaction directory.
So any way to reduce the I/O?
A couple of improvements have been made in 1.10. It would be
interesting to test this same scenario with a 1.10 client (to verify
the client-side improvements) and if possible also with a 1.10 server,
with a dumped+loaded FSFS back-end (to get an idea of any server-side
improvements).

Client-side there were some changes to make the (http) client stream
svndiff deltas without creating temporary files (during 'commit' and
'import'):
https://svn.apache.org/r1803143
https://svn.apache.org/r1803144
https://svn.apache.org/r1804819

Server-side, LZ4 compression was added as the default compression
algorithm in the FSFS back-end, and on the wire (not sure if that
helps in your case ... it won't impact the number of files that are
created temporarily I suppose):
https://subversion.apache.org/docs/release-notes/1.10.html#lz4

Also, maybe a commit such as this one helps too (though it seems to
only talk about reads):
https://svn.apache.org/r1759399

As always, probably further improvements can be made I guess, but
someone will need to dig into it.

If you'd be able to test with 1.10 both client and server-side, that
would give some more data points about where we are with the latest
state of things ...

Finally, one last tip: this (rather old) performance tuning page
suggests putting the "transactions" folder on a separate (faster)
filesystem to speed up commits:
https://www.orcaware.com/svn/wiki/Server_performance_tuning_for_Linux_and_Unix#Reducing_the_number_of_writes

I don't know how realistic that is in your setup (plus you need to
make sure that transactions filesystem is large enough for your
commits).
--
Johan
Davor Josipovic
2018-05-22 18:31:07 UTC
Permalink
Now that is interesting. 40k doesn't seem to be such a large amount of data for modern computers. Very slow and fragmented hard drive? Or perhaps there's something else going on that is manifesting this way?
The HDD is indeed on the slowside, and together with low memory...

But I think this also show how I/O intensive SVN is. On the client side, for each committed file, one copy is placed in .svn folder, and an other copy in a temporary folder (which is deleted after file transfer in v1.9). So for each file committed, a double copy is made client-side. This temporary copy is really necessary?

Server-side, I see similar disk bashing. For each committed file, max 2 (?) copies are made in the transaction directory.

So any way to reduce the I/O?
Davor Josipovic
2018-05-22 18:36:33 UTC
Permalink
Now that is interesting. 40k doesn't seem to be such a large amount of data for modern computers. Very slow and fragmented hard drive? Or perhaps there's something else going on that is manifesting this way?
The HDD is indeed on the slowside, and together with low memory...

But I think this also show how I/O intensive SVN is. On the client side, for each committed file, one copy is placed in .svn folder, and an other copy in a temporary folder (which is deleted after file transfer in v1.9). So for each file committed, a double copy is made client-side. This temporary copy is really necessary?

Server-side, I see similar disk bashing. For each committed file, max 2 (?) copies are made in the transaction directory.

So any way to reduce the I/O?
Nico Kadel-Garcia
2018-05-23 02:20:30 UTC
Permalink
Post by Davor Josipovic
Post by Nathan Hartman
Now that is interesting. 40k doesn't seem to be such a large amount of
data for modern computers. Very slow and fragmented hard drive? Or perhaps
there's something else going on that is manifesting this way?
Post by Davor Josipovic
The HDD is indeed on the slowside, and together with low memory...
But I think this also show how I/O intensive SVN is. On the client side,
for each committed file, one copy is placed in .svn folder, and an other
copy in a temporary folder (which is deleted after file transfer in v1.9).
So for each file committed, a double copy is made client-side. This
temporary copy is really necessary?

I think it shows how I/O intensive using 40,000 small files is. Especially
if they are in the same directory, many filesystems get increasingly
unhappy as they try to manage that many files in one directory.
Davor Josipovic
2018-05-21 20:12:37 UTC
Permalink
I have finally found the culprit. By coincidence, I had the same configuration running on two servers, one backed with a SSD, the other with a HDD.

The same commit worked on the SSD-backed server, and always failed on the HDD one with the described error.

Monitoring the process revealed that after the file transfer of the commit completes, server starts finalizing. Not sure what exactly happens during that period serverside (maybe someone can explain?), but for a 40k files commit, it took the SSD-backed one to complete in about 30 seconds, while the HDD-backed one struggled more than 30 minutes. During those 30 minutes, the client got no response, so it timed out, which resulted in the described error on the server side.

Anyway, setting http-timeout = 3600 in the "servers" file fixes the issue permanently (tested it 2 times to make sure). I think the default is 10 minutes.

Also, I am quite surprised at the performance penalty when using a HDD at this stage. Any way to tweak this for low memory servers?
Post by Davor Josipovic
You make the PID sound very important, and with good reason too! I
just went back through the logs to make sure that I didn't mess up the
PID before posting. I didn't. In each case, the PID was
different. This might explain the time difference.
The Subversion client can make multiple http connections, particularly
when a commit involves a large number of files. It is not unusual for
multiple PIDs to handle the requests for any one commit.
Post by Davor Josipovic
I'll look further into this later. The default Debian apache2 install
came with mpm_event enabled. According to my logs I switched to
mpm_prefork later (due to performance reasons recommended for an other
module).
Prefork is typically the lowest performace for Subversion. Subversion
server performance will generally be better with a small number of
processes each with a large number of threads as this allows you to
increase the size of the per-process Subversion cache.
--
Philip
Davor Josipovic
2018-05-21 20:12:47 UTC
Permalink
I have finally found the culprit. By coincidence, I had the same configuration running on two servers, one backed with a SSD, the other with a HDD.

The same commit worked on the SSD-backed server, and always failed on the HDD one with the described error.

Monitoring the process revealed that after the file transfer of the commit completes, server starts finalizing. Not sure what exactly happens during that period serverside (maybe someone can explain?), but for a 40k files commit, it took the SSD-backed one to complete in about 30 seconds, while the HDD-backed one struggled more than 30 minutes. During those 30 minutes, the client got no response, so it timed out, which resulted in the described error on the server side.

Anyway, setting http-timeout = 3600 in the "servers" file fixes the issue permanently (tested it 2 times to make sure). I think the default is 10 minutes.

Also, I am quite surprised at the performance penalty when using a HDD at this stage. Any way to tweak this for low memory servers?
Post by Davor Josipovic
You make the PID sound very important, and with good reason too! I
just went back through the logs to make sure that I didn't mess up the
PID before posting. I didn't. In each case, the PID was
different. This might explain the time difference.
The Subversion client can make multiple http connections, particularly
when a commit involves a large number of files. It is not unusual for
multiple PIDs to handle the requests for any one commit.
Post by Davor Josipovic
I'll look further into this later. The default Debian apache2 install
came with mpm_event enabled. According to my logs I switched to
mpm_prefork later (due to performance reasons recommended for an other
module).
Prefork is typically the lowest performace for Subversion. Subversion
server performance will generally be better with a small number of
processes each with a large number of threads as this allows you to
increase the size of the per-process Subversion cache.
--
Philip
Davor Josipovic
2018-02-21 12:54:01 UTC
Permalink
Post by Philip Martin
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
You make the PID sound very important, and with good reason too! I just went back through the logs to make sure that I didn't mess up the PID before posting. I didn't. In each case, the PID was different. This might explain the time difference. Now, as far as I understand apache2, this can only be caused by:

# After MaxConnectionsPerChild
# connections, the child process will die. If MaxConnectionsPerChild is 0, then
# the process will never expire.
# Setting MaxConnectionsPerChild to a non-zero value limits the amount of memory that
# process can consume by (accidental) memory leakage.
<IfModule mpm_prefork_module>
StartServers 1
MinSpareServers 0
MaxSpareServers 1
MaxRequestWorkers 50
MaxConnectionsPerChild 50
</IfModule>

These were also the settings I played with during my trials. MaxConnectionsPerChild is set to a limited number to make sure that no memory leaks explode.

I'll look further into this later. The default Debian apache2 install came with mpm_event enabled. According to my logs I switched to mpm_prefork later (due to performance reasons recommended for an other module).

PS Sorry for the double-postings. Live is double-sending for some reason.
Post by Philip Martin
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options
noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal
Debian install. Only Apache processes run as www-data. There is only a
post-commit hook for Trac (which was disabled during at least one
try. Also, post-commit shouldn't be called until commit finishes.)
Furthermore, that mount is only given access to dav_svn in apache2
config.
I am at a bit of a loss. Something appears to be deleting files from
underneath Subversion. The file that goes missing while processing the
DELETE request was almost certainly present when we started processing
the DELETE, since readdir() returned its name, but a very short time
later was not present. The code that implements DELETE is relatively
simple and in the log you gave it is not even running in the same
process that processed the MERGE. It could be a Subversion bug but I
struggle to see what sort of bug would lead to this effect.
--
Philip
Davor Josipovic
2018-02-20 18:38:03 UTC
Permalink
Post by Philip Martin
Are you running Linux or Windows? Is the disk local or networked?
Server is Debian 9.3. Disk is mounted through fstab with options noatime,nodiratime,data=ordered,nofail.
Post by Philip Martin
Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
I am quite sure there is nothing of that kind. Server is a minimal Debian install. Only Apache processes run as www-data. There is only a post-commit hook for Trac (which was disabled during at least one try. Also, post-commit shouldn't be called until commit finishes.) Furthermore, that mount is only given access to dav_svn in apache2 config.
Post by Philip Martin
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I
recall. There was nothing else. You see the DELETE error before the
MERGE error. I think that is due to a timing issue. In my previous
tries (there were transaction 463-e1 up to 463-e9...) the MERGE would
sometimes show before the DELETE in the error log. That is what let me
believe that MERGE error was primary, then the DELETE error (even
though it seems the other way around in the apache2 log).
Sorry, missed that. The DELETE error is interesting:

[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]

DELETE uses low-level IO rather than parsing the transaction files.
Subversion simply does readdir() to get a list of children and then
loops deleting them. If readdir() returns a name that does not exist
then either the disk/filesystem is corrupt or some other process has
deleted the file between the readdir() and the delete attempt.

Corruption seems unlikely since you say you have moved filesystems. We
are left with some other process deleting files while apache is
attempting to access them.

Do you have some other process running that mointors the filesystem or
the repository? Does it actively delete files?
Post by Philip Martin
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime,
nodiratime, data=ordered). Prior to that, the repository was on a NTFS
formatted drive. The error was the same, but since I thought NTFS was
the culprit, I moved to EXT4.
Are you running Linux or Windows? Is the disk local or networked?
--
Philip
Davor Josipovic
2018-02-20 16:31:27 UTC
Permalink
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I recall. There was nothing else. You see the DELETE error before the MERGE error. I think that is due to a timing issue. In my previous tries (there were transaction 463-e1 up to 463-e9...) the MERGE would sometimes show before the DELETE in the error log. That is what let me believe that MERGE error was primary, then the DELETE error (even though it seems the other way around in the apache2 log).
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime, nodiratime, data=ordered). Prior to that, the repository was on a NTFS formatted drive. The error was the same, but since I thought NTFS was the culprit, I moved to EXT4.

I have also tried with a older client (1.8, cf. stakoverflow question edit history), but the error was the same serverside (and slightly different clientside).

Since then I had some 20 commits. No issues...
Post by Philip Martin
Post by Davor Josipovic
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into
"/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] An error occurred while committing the transaction. [500,
#160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in
filesystem '/mnt/vc/svn/repo/db' [500, #160014]
done
Committing transaction...
svn: E175012: Connection timed out
svn: E175002: Unexpected server error 500 'Internal Server Error' on
'/svn/repo/!svn/txn/463-e8'
su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.
chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R
The server error suggests that at least one of the files that make up
the transaction is missing from disk when the MERGE is attempted.
That's odd, it is not a commonly seen error.

Even more odd is that you show "svnadmin rmtxns" succeeding. This is
odd, very odd, because when a MERGE fails the server code in mod_dav_svn
reacts by removing the transaction from disk. In addition, when the
client gets the failed MERGE repsonse it also reacts and sends a DELETE
to remove the transaction from disk; this DELETE usually fails with 404:

[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Could not fetch resource information. [404, #0]
[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Named transaction doesn't exist. [404, #175002]

Do you see the DELETE in the log after the failed MERGE? Was there an
error?

After a failed MERGE an attempt to run "svnadmin rmtxns" manually would
normally fail with:

E160007: No such transaction '463-e8'

but your attempt succeeded so at least some of the transaction was
present on disk.

So it looks like several strange things have happened:

- one of the files on disk that make up the transaction is missing
causing the MERGE to fail

- both automatic attempts to delete the transaction on disk have failed
causing the transaction to persist on disk

- whatever caused the transaction to persist on disk doesn't prevent
"svnadmin rmtxns" from working

The "_1bqk" implies that this transaction affects over 60,000 nodes,
this is large but Subversion should handle it.

What sort of filesystem hosts the repository?
--
Philip
Davor Josipovic
2018-02-20 16:32:55 UTC
Permalink
Post by Philip Martin
Do you see the DELETE in the log after the failed MERGE? Was there an
error?
The apache2 log I posted is the whole log for that day as far as I recall. There was nothing else. You see the DELETE error before the MERGE error. I think that is due to a timing issue. In my previous tries (there were transaction 463-e1 up to 463-e9...) the MERGE would sometimes show before the DELETE in the error log. That is what let me believe that MERGE error was primary, then the DELETE error (even though it seems the other way around in the apache2 log).
Post by Philip Martin
What sort of filesystem hosts the repository?
The filesystem for the reported transaction 463-e8 is EXT4 (noatime, nodiratime, data=ordered). Prior to that, the repository was on a NTFS formatted drive. The error was the same, but since I thought NTFS was the culprit, I moved to EXT4.

I have also tried with a older client (1.8, cf. stakoverflow question edit history), but the error was the same serverside (and slightly different clientside).

Since then I had some 20 commits. No issues...
Post by Philip Martin
Post by Davor Josipovic
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into
"/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] An error occurred while committing the transaction. [500,
#160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client
X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in
filesystem '/mnt/vc/svn/repo/db' [500, #160014]
done
Committing transaction...
svn: E175012: Connection timed out
svn: E175002: Unexpected server error 500 'Internal Server Error' on
'/svn/repo/!svn/txn/463-e8'
su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.
chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R
The server error suggests that at least one of the files that make up
the transaction is missing from disk when the MERGE is attempted.
That's odd, it is not a commonly seen error.

Even more odd is that you show "svnadmin rmtxns" succeeding. This is
odd, very odd, because when a MERGE fails the server code in mod_dav_svn
reacts by removing the transaction from disk. In addition, when the
client gets the failed MERGE repsonse it also reacts and sends a DELETE
to remove the transaction from disk; this DELETE usually fails with 404:

[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Could not fetch resource information. [404, #0]
[dav:error] [pid 13161:tid 140441001391872] [client ::1:44218] Named transaction doesn't exist. [404, #175002]

Do you see the DELETE in the log after the failed MERGE? Was there an
error?

After a failed MERGE an attempt to run "svnadmin rmtxns" manually would
normally fail with:

E160007: No such transaction '463-e8'

but your attempt succeeded so at least some of the transaction was
present on disk.

So it looks like several strange things have happened:

- one of the files on disk that make up the transaction is missing
causing the MERGE to fail

- both automatic attempts to delete the transaction on disk have failed
causing the transaction to persist on disk

- whatever caused the transaction to persist on disk doesn't prevent
"svnadmin rmtxns" from working

The "_1bqk" implies that this transaction affects over 60,000 nodes,
this is large but Subversion should handle it.

What sort of filesystem hosts the repository?
--
Philip
Davor Josipovic
2018-02-18 21:04:08 UTC
Permalink
Here you go (copy from the link):

----------------------------------------------------------------------------------------------------------------------------

Can someone help me debug this issue? Everything works (i.e. I can commit, checkout, etc.) but this one commit always fails as described below. I tried checking out in a new repository, and committing, but same issue. Note also that this is a very large repository (500GB), and a very large commit (36k files totaling 10GB).

This is the `apache2` error log:

[Sat Feb 10 03:25:16.262765 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Could not DELETE /svn/repo/!svn/txn/463-e8. [500, #0]
[Sat Feb 10 03:25:16.265881 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] could not abort transaction. [500, #2]
[Sat Feb 10 03:25:16.265905 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Transaction '463-e8' cleanup failed [500, #2]
[Sat Feb 10 03:25:16.265923 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0' [500, #2]
[Sat Feb 10 03:25:16.265940 2018] [dav:error] [pid 3339] [client X.X.X.X:61908] Can't remove file '/mnt/vc/svn/repo/db/transactions/463-e8.txn/node._ji.0': No such file or directory [500, #2]
[Sat Feb 10 03:25:30.640591 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] Could not MERGE resource "/svn/repo/!svn/txn/463-e8" into "/svn/repo/repofolder". [500, #0]
[Sat Feb 10 03:25:30.642889 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] An error occurred while committing the transaction. [500, #160014]
[Sat Feb 10 03:25:30.643003 2018] [dav:error] [pid 2966] [client X.X.X.X:61712] Reference to non-existent node '_1bqk.0.t463-e8' in filesystem '/mnt/vc/svn/repo/db' [500, #160014]

The client side gives:

done
Committing transaction...
svn: E175012: Commit failed (details follow):
svn: E175012: Connection timed out
svn: E200042: Additional errors:
svn: E175002: Unexpected server error 500 'Internal Server Error' on '/svn/repo/!svn/txn/463-e8'

First I thought this was a permission issue. So I did this on the server:

su - www-data -s /bin/bash -c "svnadmin rmtxns /mnt/vc/svn/repo/ 463-e8"
Transaction '463-e8' removed.

No error whatsoever. Apache runs as `www-data`. Following permissions are set:

chown www-data:www-data /mnt/vc -R
chmod 0774 /mnt/vc -R

I have run `svnadmin verify` and found no problems. `dmesg` doesn't show anything new.

The apache svn config is pretty much default.

KeepAlive On
MaxKeepAliveRequests 0
# Set to 10h.
Timeout 36000

SVNCompressionLevel 5
SVNInMemoryCacheSize 16384
SVNCacheTextDeltas On
SVNCacheFullTexts On
SVNAllowBulkUpdates Prefer

<Location /svn>
DAV svn
SVNParentPath /mnt/vc/svn
SVNListParentPath On

# Allow large request
LimitXMLRequestBody 0
</Location>

Extra info:

Server: Debian 9.3
apache2/stable,stable,now 2.4.25-3+deb9u3 amd64 [installed]
libapache2-mod-svn/stable,stable,now 1.9.5-1+deb9u1 amd64 [installed]
Server svn client: 1.9.5-1+deb9u1
Client: svn 1.9.7

**EDIT:** I copied the working copy to the repository server. Then I tried to commit, but had the same error. I then relocated the working copy to direct repository location: `svn relocate file:///mnt/vc/svn/repo`. Commit worked this time. So it has to do something with `dav_svn`?

----------------------------------------------------------------------------------------------------------------------------
Post by Davor Josipovic
It seems to me there is a bug in libapache2-mod-svn/stable,stable,now
1.9.5-1+deb9u1 amd64 [installed].
I described it
here:https://superuser.com/questions/1293699/svn-error-occurred-while-committing-the-transaction
I assume this is the correct place to report?
I concerns the error 160014 "Reference to non-existent node". Some changes I
can not commit through https://, but I can commit directly through the
file:// protocol.
Any idea's?
Yes, this is the correct place to report such an issue. But to
maximize your chances of people helping you, can you please copy the
relevant contents / context to the mailinglist here, instead of
forcing people to click a link?
--
Johan
Bo Berglund
2018-05-22 05:13:53 UTC
Permalink
Post by Nathan Hartman
Post by Davor Josipovic
I have finally found the culprit. By coincidence, I had the same
configuration running on two servers, one backed with a SSD, the
other with a HDD.
The same commit worked on the SSD-backed server, and always failed
on the HDD one with the described error.
Monitoring the process revealed that after the file transfer of the
commit completes, server starts finalizing. Not sure what exactly
happens during that period serverside (maybe someone can explain?),
but for a 40k files commit, it took the SSD-backed one to complete
in about 30 seconds, while the HDD-backed one struggled more than
30 minutes. During those 30 minutes, the client got no response, so
it timed out, which resulted in the described error on the server side.
(Snip)
Now that is interesting. 40k doesn't seem to be such a large amount
of data for modern computers. Very slow and fragmented hard drive? Or
perhaps there's something else going on that is manifesting this way?
I think he meant that he has committed 40.000 individual (small) files
in one go. That is a *lot* of files and in a large capacity HDD it
means wasting a lot of space too since every small file will uses up a
full cluster of disk space. I believe it can be improved (if svn uses
fsfs) by packing the database.
Please see:
http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.diskspace.fsfspacking
--
Bo Berglund
Developer in Sweden
Continue reading on narkive:
Loading...