How to combine files from different svn locations into a working copy?

Discussion:

Bo Berglund

2017-12-09 17:02:20 UTC

I have finally made a test conversion of our CVS(nt) repositories
(note1) and now I need to grasp how to work with our code via svn.
Specifically on the PC programming side we are using a number of
common source files in many different projects.

This has been solved in CVS by creating entries in the CVSROOT/modules
file which define a "virtual module" where individual files are
collected from different physical modules (or projects in svn
language). Such a module is given a name of its own, which is to the
user equivalent to a project name and can be checked out.

We have many such instances in our PC repository...

When a user checks out this virtual module he gets a working copy
laid out according to the definition in the modules file, similar to
this:

vProject
|--- src (all sources from the physical project module)
|--- bin (the binary output from the project)
|--- cmn (selected files from a "Common" project module)
|--- lib (maybe some selected binaries needed by the project)

etc...

I have looked in the SVN-book and the closest I get is what is
described in chapter "Sparse Directories", but it describes how the
developer can run a series of manually defined checkouts using the
--depth cmd line arguments followed by a number of updates where a
wanted file is given as the argument one by one.

This is really inconvenient because it means that the full project
cannot be checked out by someone not knowing these dependencies and it
is also awkward because there are so many individual checkouts/updates
to be done to build the working copy.

Is there no mechanism mimicking the modules handling in CVS?

Or must one include a batch file (on Windows) which contains all of
the operations one must perform to create a working copy?
Such a batch file could in theory execute all the individual
operations to create the working copy starting from the checked out
project itself only.
But it would be yet another component file to write and maintain and
everyone must know how to use it (once)....

IMPORTANT:
We do not want to *copy* the needed files from the common projects
into a new project and add them there to svn because then we lose the
ability to fix bugs in a central location.
If we do then there will be multiple copies of the same file all
around the repository.

Note1)
I had to do these undocumented steps in order to be able to convert
the existing CVS repositories to SVN on an Ubuntu Server 16.04 box:
1) Set cvs2svn to use the built-in parser.
Using cvs would not work in the end because cvs2svn tried to
run checkouts, which would not work on a copied repo

2) Delete all files living in the top level of the repos.
Only directories seemed to be allowed.

3) I had to prune the repository of all CVS subdirectories
(where the cvsnt server stores some kind of attributes)

4) I had to add symbol_strategy_rules=global_symbol_strategy_rules,
at the end of the run_options.add_project( loop in the options file

With these steps done the conversion succeeded without error messages.
We are probably lucky this worked, given that the cvs2svn documentation
says that one risks failure and that using cvs as the parser could
help, which we in fact could NOT do...

--
Bo Berglund
Developer in Sweden

Stefan Sperling

2017-12-09 17:25:21 UTC

Post by Bo Berglund
When a user checks out this virtual module he will get a working copy
looking according to the definition in the modules file similar to
vProject
|--- src (all sources from the physical project module)
|--- bin (the binary output from the project)
|--- cmn (selected files from a "Common" project module)
|--- lib (maybe some selected binaries needed by the project)
etc...
I have looked in the SVN-book and the closest I get is what is
described in chapter "Sparse Directories",

Sparse directories are for a different use case: When you want
to omit some files and/or directories from a checkout.

The closest equivalent to what you describe are svn externals:
http://svnbook.red-bean.com/nightly/en/svn.advanced.externals.html
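
As an illustration of the externals approach, here is a minimal sketch that composes an svn:externals property value for a layout like vProject. The repository paths and file names (Common/trunk/StrUtils.pas and so on) are hypothetical and not taken from the thread; the only Subversion facts relied on are the "URL LOCALPATH" definition format and the ^/ prefix for repository-root-relative URLs.

```python
# Sketch: compose an svn:externals property value for a layout like vProject.
# All repository paths and file names below are hypothetical.

def externals_definition(entries):
    """Return svn:externals text, one 'URL LOCALPATH' definition per line."""
    return "\n".join(f"{url} {local_path}" for url, local_path in entries)

# '^/' means "relative to the root of the repository this property lives in".
vproject_externals = externals_definition([
    ("^/Common/trunk/StrUtils.pas", "cmn/StrUtils.pas"),  # single-file external
    ("^/Common/trunk/IniTools.pas", "cmn/IniTools.pas"),  # single-file external
    ("^/Common/trunk/bin", "lib"),                        # directory external
])

print(vproject_externals)
```

The resulting text could be saved to a file and applied to the project directory with svn propset svn:externals -F FILE ., then committed; a later checkout or update would fetch the externals. The svn book lists restrictions on single-file externals (as far as I know, they must come from the same repository and land in a directory that is already part of the working copy), so check it against your Subversion version.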

Nathan Hartman

2017-12-10 03:54:33 UTC

Post by Bo Berglund
I have finally made a test conversion of our CVS(nt) repositories
(note1) and now I need to grasp how to work with our code via svn.
Specifically on the PC programming side we are using a number of
common source files in many different projects.

FWIW in our firm we use svn externals for this. All our projects are in a monorepo (single monolithic svn repository). Each project has its own trunk/branches/tags directories. The shared code is organized as "libraries" which does not necessarily mean DLLs, but which does mean that they are treated like any other project, with their own trunk/branches/tags, their own version number (not related to the revision number of the repository), development rules/policies, and release cycle.

When a larger encompassing project uses such a "library," somewhere under that larger project's trunk directory is a subdirectory with the svn:externals property set to fetch the code of the dependency. Usually we fetch only the "src" subdirectory of the library, because tests, documentation, etc., are not needed.

The following is important: in the text of this property, we use the caret (^) syntax rather than specifying an absolute URL so that if the repository is ever moved to a different server, the links will not be broken (i.e., checking out old revisions will work correctly) -- this works because of our monorepo; furthermore we use (IIRC) the @ syntax to specify the exact revision of the external code that we want to use. This is so that the library can continue evolving on its own schedule without causing breakage in dependent projects, which evolve on their own different schedule. When a project decides to update to a newer version of the library, it can do so by changing the external property to refer to the newer revision.

Since you asked about sparse checkouts... Note that because all our projects, including such "libraries," are in a monorepo, we can take advantage of Subversion's atomic commits to do global cross-project refactoring such as renaming an identifier in a library and updating all code that will be affected throughout all dependent projects in one atomic commit transaction. This usually requires a checkout that encompasses all such code, which in our case would be huge. One consequence of each project having its own trunk/branches/tags is that a full repo checkout will not only fetch all projects, it will fetch every tagged revision and every branch of every project.

This is where sparse checkouts (the "telescoping" feature -- see the --depth argument) come in handy. Currently there is no viewspec to automatically check out a specific layout (I think someone is working on adding that; not sure though) so we have some Windows .bat files and Unix shell scripts that put together these kinds of checkouts -- generally this means getting each project (the parent directory of the trunk/branches/tags) with --depth=immediates, then updating its trunk directory to --depth=infinity. (I am writing this from memory so please excuse if the argument names are wrong.)
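
The telescoping checkout described above can be sketched as a small generator for the svn command sequence. This is not the poster's actual script: the repository URL, working-copy name, and project names are hypothetical, and nothing is executed. It relies on svn checkout accepting --depth and svn update accepting --set-depth, which is the option that changes the depth of an existing working-copy directory.

```python
# Sketch: build the svn command sequence for a "telescoped" checkout.
# Repository URL and project names are hypothetical; nothing is executed here.
import shlex

def sparse_checkout_commands(repo_url, wc_dir, projects):
    """Return svn commands (as argument lists) for a trunk-only working copy."""
    # Repository root: fetch only the immediate children (project dirs), left empty.
    commands = [["svn", "checkout", "--depth", "immediates", repo_url, wc_dir]]
    for project in projects:
        project_dir = f"{wc_dir}/{project}"
        # Bring in the project's trunk/branches/tags directories, still empty.
        commands.append(["svn", "update", "--set-depth", "immediates", project_dir])
        # Fully populate the trunk only; branches and tags stay empty.
        commands.append(["svn", "update", "--set-depth", "infinity", f"{project_dir}/trunk"])
    return commands

for cmd in sparse_checkout_commands("https://svn.example.com/repo", "wc", ["Program", "Library"]):
    print(shlex.join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; the list could be passed to subprocess.run one entry at a time, or written out as a .bat or shell script.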

Hope this is helpful. The svn book explains these features quite well.

Bo Berglund

2018-01-02 15:38:23 UTC

On Sat, 09 Dec 2017 18:02:20 +0100, Bo Berglund

Post by Bo Berglund
vProject
|--- src (all sources from the physical project module)
|--- bin (the binary output from the project)
|--- cmn (selected files from a "Common" project module)
|--- lib (maybe some selected binaries needed by the project)

So I have found that using externals in the "normal" projects sort of
solves my main problem of including library and common files into
different real projects.
However I have now run into a slightly different problem:
When I declare the external I have not found a way to limit the depth
of recursion such that only the top directory of the external gets
checked out.

I use an open-source library in several projects and it contains a doc
subdirectory with essentially a website of html and image files. I
don't want these 77 files to be checked out as externals if I can
avoid it, but limit to the 8-10 source files in the top level code
directory.

Can this be done using some externals flag?

Or should I create a branch of the library where I only hold the top
level directory (svn remove the doc dir in the branch) and use this as
the external?

--
Bo Berglund
Developer in Sweden

Johan Corveleyn

2018-01-02 16:31:13 UTC

Post by Bo Berglund
On Sat, 09 Dec 2017 18:02:20 +0100, Bo Berglund

So I have found that using externals in the "normal" projects sort of
solves my main problem of including library and common files into
different real projects.
When I declare the external I have not found a way to limit the depth
of recursion such that only the top directory of the external gets
checked out..
I use an open-source library in several projects and it contains a doc
subdirectory with essentially a website of html and image files. I
don't want these 77 files to be checked out as externals if I can
avoid it, but limit to the 8-10 source files in the top level code
directory.
Can this be done using some externals flag?

No, I'm afraid that's not possible at the moment. See
https://issues.apache.org/jira/browse/SVN-3216.

--
Johan

Nathan Hartman

2018-01-02 17:28:30 UTC

Post by Johan Corveleyn

Post by Bo Berglund
On Sat, 09 Dec 2017 18:02:20 +0100, Bo Berglund

No, I'm afraid that's not possible at the moment. See
https://issues.apache.org/jira/browse/SVN-3216.
--
Johan

We developed an internal library which is used by several programs. The library and the programs are all in a single repository, and each is treated as its own self-contained project, meaning they each have their own trunk/branches/tags, their own version number (not to be confused with svn revision numbers), their own release cycle, etc. We use externals to get the library sources into the working copy of the program sources, much like you describe. And we avoid pulling in docs and other non-source files:

The program src directory contains a subdirectory called Externals. On this subdirectory we set the svn:externals property to get all dependencies. Each dependency ends up in its own subdirectory of Externals.

To solve the same issue you're having now, which is to avoid checking out potentially a lot of unnecessary data like docs, etc., we structured the library project to keep all its sources in a src directory; so we have for example:

Library
Library/trunk
Library/trunk/src
Library/trunk/doc
etc.

In our externals property we fetch only that src directory, and we call the working copy, say, Library (or whatever the name of the library is):

Program
Program/trunk
Program/trunk/Externals
Program/trunk/Externals/Library

Since the library you're using comes from a third party and is tracked in your repo, you should see the svn-book section on vendor drops (sorry, I don't have the link handy). Contrary to what the book suggests, I would probably keep the vendor's original distribution separate from our internal customizations, treating them like two different branches (because that's essentially what they are). The first such customization would be to move the sources into a src subdirectory, so we could external it like I described above. Perhaps others have better ideas.

Not sure if others mentioned this (in fact I'm not even sure if I mentioned it) but here are a couple of additional externals-related suggestions:

In the externals property, we use the caret notation '^' which tells svn that the external URL is relative to the repository root. This is because (1) they are in the same repository anyway and (2) we avoid using an absolute URL to avoid future breakage should we move the repository to a different address in the future.

Furthermore even though we are generally fetching a tag (that is, a specific tagged release of our library), we use the '@' notation to fetch the tag in the specific revision where it was created. The reason for this is subtle but important: suppose someone later commits to the tag directory of the library, and suppose that in the future we check out today's revision of the program that uses that tagged version of the library; we want that future checkout to look identical to today's checkout. The '@' notation guarantees that. It also gives us the option, instead of using a tag, to use a specific trunk revision of the library without worrying that the library will change without the program devs knowing. I think that's sometimes done when they hack on both at the same time.
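
To make the '^' and '@' notations concrete, here is a minimal sketch that formats one externals definition pinned to a peg revision. The tag name 1.4.0 and the revision number 2718 are made up for illustration; only the "^/path@REV LOCALDIR" shape comes from the description above.

```python
# Sketch: format one externals definition pinned to a peg revision.
# Library name, tag, and revision number are hypothetical.

def pinned_external(repo_path, peg_revision, local_dir):
    """Return 'URL@PEG LOCALDIR' with a repository-root-relative URL."""
    if not repo_path.startswith("/"):
        raise ValueError("repo_path must start with '/' (path below the repository root)")
    # '^' anchors the URL at the repository root; '@REV' pins the exact revision.
    return f"^{repo_path}@{peg_revision} {local_dir}"

line = pinned_external("/Library/tags/1.4.0/src", 2718, "Library")
print(line)  # ^/Library/tags/1.4.0/src@2718 Library
```

Set as the svn:externals property on Program/trunk/Externals, such a line would place the library's src directory at Program/trunk/Externals/Library. Moving to a newer library release then means editing the tag and revision in that one line and committing the property change.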

By structuring our repo this way, the library can be developed independently without fear of rippling effects through dependent programs, and the dependent programs can choose to update to a newer version of the library when they're ready to do so.

Hope this helps.