Sunday, January 9, 2011

Converting to Bazaar from CVS

I have recently been playing around with distributed version control systems (DVCS). I switched my Mockemail project over to git and have been looking heavily into Bazaar. Each of the big three (Bazaar, git and Mercurial) has its pros and cons, which I won't go into here; in the end I settled on trying to migrate a CVS project over to Bazaar.

One thing that frustrated me a little was the slightly immature tooling around Bazaar. That said, I was basically comparing it to the CVS tool set, and whatever your feelings on version control, you can't argue that CVS isn't mature (it was started in 1987), so I am not sure this opinion is worth very much.

What follows is my diary of converting a CVS repository to Bazaar. In the commands below, the CVS root is "/mycvsroot" and the module name is "module".

Taking the Bazaar wiki as the authority, I first tried the fast-import option (it was listed first!). Note that you need to produce an export file first and then run the import on it. The idea behind the fast-import system is that there is one common import routine and a separate exporter for each SCM.
bzr fast-export-from-cvs /mycvsroot/module /scratch/
bzr: ERROR: Unable to import library "fastimport": No module named fastimport

OK, so this isn't included with Bazaar by default. Installing Bazaar plugins is pretty easy; whenever I install one I put it in my user plugins directory rather than the global plugins dir.
cd ~/.bazaar/plugins/
bzr checkout lp:bzr-fastimport fastimport
I tried again and was still getting errors, so I installed the Python library directly, since it appears that the plugin is just a wrapper around it (I think it started as a Bazaar plugin and migrated to a standalone Python library).
cd /Library/Python/2.5/site-packages/
bzr checkout lp:python-fastimport
cd python-fastimport
mv fastimport/ ../
cd ..
rm -rf python-fastimport
bzr fast-export-from-cvs /mycvsroot/module /scratch/
bzr: ERROR: cvs2bzr missing. Please install cvs2svn 2.30 or later and try again.

OK, a nice clear error message there, so I downloaded cvs2svn, installed it (sudo make install) and tried again:
bzr fast-export-from-cvs /mycvsroot/module /scratch/
Executing cvs2bzr --dumpfile /scratch/ /mycvsroot/module ...
ERROR: cvs2svn uses the anydbm package, which depends on lower level dbm
libraries. Your system has dbm, with which cvs2svn is known to have
problems. To use cvs2svn, you must install a Python dbm library other than
dumbdbm or dbm. See
for more information.
Export to /scratch/ exited with error code 1.
At this point I should note that I am running this on a Mac, as this is apparently an Apple-specific problem (I found that out later).
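
The error comes down to which dbm backends the Python interpreter running cvs2svn can import. A quick check along the following lines shows what a given interpreter has available. This is only a sketch: the helper name is made up, the list holds the standard library module names under Python 2 followed by their Python 3 equivalents, and importlib needs Python 2.7 or later (on an older interpreter you would have to try the imports by hand).

```python
import importlib

# Python 2 module names first, then their Python 3 equivalents.
DBM_MODULES = ["gdbm", "dbhash", "dbm", "dumbdbm",
               "dbm.gnu", "dbm.ndbm", "dbm.dumb"]


def importable(names):
    """Return the subset of module names this interpreter can import."""
    found = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Backend not built or not installed for this interpreter.
            continue
        found.append(name)
    return found


print(importable(DBM_MODULES))
```

If the only backends listed are dbm and dumbdbm (or their Python 3 counterparts), that matches the situation the error message complains about.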

So first I tried:
sudo port install gdbm
No luck.
Googling around I found some very involved instructions (building Python from source, editing makefiles, recompiling again); look for the section "How do I get cvs2svn to run on OS X 10.5.5?". I followed these to the letter; no shortcuts worked for me.

I was ready to try again, and now I got this error:
ERROR: Git output requires a default commit username

Googling around I came across this error, so I gave up on the plugin and tried using cvs2bzr directly instead of through the Python script.

After recompiling everything (including cvs2bzr) I edited the cvs2bzr-example.options file to make the necessary config settings. This is a big file, so creating one from scratch would not be the best idea. I only changed the target and the source directories. For the target, look for the line:
# The file in which to write the "fastimport" stream:
I changed the line directly underneath to the path where I wanted the output file written. Then, for the source, look for the following:
# The filesystem path to the part of the CVS repository (*not* a
# CVS working copy) that should be converted. This may be a
# subdirectory (i.e., a module) within a larger CVS repository.

I changed the path underneath that comment to my CVS module (/mycvsroot/module). Now try running it:
cvs2bzr --options=cvs2bzr-example.options
and I got a whole bunch of error messages of the form:
Error summary:
ERROR: A CVS repository cannot contain both /path/file.txt,v and /path/Attic/file.txt,v
Exited due to fatal error(s).
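
Rather than discovering these clashes one conversion run at a time, a small script can list them all up front by walking the repository and looking for ,v files that exist both in a directory and in its Attic subdirectory. The following is a minimal sketch of that idea; the function name is made up, and it only reports the clashes, leaving the decision about which copy to keep to you.

```python
import os


def find_attic_clashes(repo_root):
    """Yield (live, attic) path pairs for ,v files present in both a
    directory and that directory's Attic subdirectory."""
    for dirpath, dirnames, filenames in os.walk(repo_root):
        if os.path.basename(dirpath) == "Attic":
            continue  # Attic directories are checked via their parent
        attic = os.path.join(dirpath, "Attic")
        if not os.path.isdir(attic):
            continue
        for name in sorted(filenames):
            attic_copy = os.path.join(attic, name)
            if name.endswith(",v") and os.path.isfile(attic_copy):
                yield os.path.join(dirpath, name), attic_copy


# "/mycvsroot/module" is the placeholder repository path used in this post.
for live, dead in find_attic_clashes("/mycvsroot/module"):
    print("%s clashes with %s" % (live, dead))
```

Work on a copy of the repository before deleting anything the script reports.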
Errors like these are almost certainly due to suspect CVS administration. I removed the offending files from the Attic and tried again. This time I ran into a number of encoding problems:
WARNING: Problem decoding log message:
Some log message.

ERROR: There were warnings converting author names and/or log messages
to Unicode (see messages above). Please restart this pass
with one or more '--encoding' parameters or with
The percentage of affected messages was very small, and I was not too concerned about them getting garbled, so I set the fallback encoding and tried again:
cvs2bzr --options=cvs2bzr-example.options --fallback-encoding=utf_8
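
Roughly, cvs2svn tries each --encoding in order and, if none of them can decode a log message or author name, falls back to --fallback-encoding in a lossy mode that substitutes replacement characters for whatever it cannot decode. The sketch below illustrates that idea in plain Python. It is not cvs2svn's code, and the function name and sample bytes are made up.

```python
def decode_log_message(raw, encodings=("ascii",), fallback=None):
    """Decode raw bytes strictly with each encoding in turn; if none fits,
    decode lossily with the fallback encoding (bad bytes become U+FFFD)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            pass  # try the next candidate
    if fallback is None:
        raise ValueError("cannot decode %r with %r" % (raw, encodings))
    return raw.decode(fallback, "replace")


# A Latin-1 encoded "cafe" with an accented e, as an old log message might be.
sample = b"caf\xe9"

# Listing the right encoding recovers the text exactly.
exact = decode_log_message(sample, encodings=("ascii", "latin_1"))

# With only a lossy fallback the message survives, but the accent is lost.
lossy = decode_log_message(sample, encodings=("ascii",), fallback="utf_8")
```

This is why a fallback is tolerable when only a handful of messages are affected: the conversion keeps going and only the undecodable characters are garbled.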
The cvs2bzr run takes a LOOOOOOOOOONG time. The last message in the log was:
Time for pass15 (IndexSymbolsPass): 6.067 seconds.
----- pass 16 (OutputPass) -----
I eventually gave up. Re-reading the config file, I found that there are some performance options you can play with in the cvs2bzr-example.options file.
I tried it again and waited 10 hours (I kicked it off before going to bed; I am not patient enough to actually sit and wait 10 hours) before giving up. Knowing what I know now, I think it had not crashed; processing the repository simply takes even longer than that. More on this later, but I was ready to try something else.

I moved on to the cvsps-import option for converting a CVS repository to Bazaar. I installed the plugin the same way as fastimport and gave it a try:
cd ~/.bazaar/plugins/
bzr checkout lp:bzr-cvsps-import cvsps_import
bzr cvsps-import /mycvsroot/ module newbzrmodule
Processed 0 patches (0 new, 0 existing) on 0 branches (0 tags) in 0.1s (0.00 patch/s)
bzr: ERROR: exceptions.AttributeError: 'ProgressTask' object has no attribute 'note'

Traceback (most recent call last):
File "/Library/Python/2.5/site-packages/bzrlib/", line 917, in exception_to_return_code
return the_callable(*args, **kwargs)
File "/Library/Python/2.5/site-packages/bzrlib/", line 1117, in run_bzr
ret = run(*run_argv)
File "/Library/Python/2.5/site-packages/bzrlib/", line 691, in run_argv_aliases
File "/Library/Python/2.5/site-packages/bzrlib/", line 710, in run
return self._operation.run_simple(*args, **kwargs)
File "/Library/Python/2.5/site-packages/bzrlib/", line 135, in run_simple
self.cleanups, self.func, *args, **kwargs)
File "/Library/Python/2.5/site-packages/bzrlib/", line 165, in _do_with_cleanups
result = func(*args, **kwargs)
File "/Users/bstevens/.bazaar/plugins/cvsps_import/", line 95, in run
File "/Users/bstevens/.bazaar/plugins/cvsps_import/cvsps/", line 1272, in process
patchsets = self._parse_cvsps_dump(pb=pb)
File "/Users/bstevens/.bazaar/plugins/cvsps_import/cvsps/", line 1195, in _parse_cvsps_dump
pb.note('Creating cvsps dump file: %s', cvsps_dump_path)
AttributeError: 'ProgressTask' object has no attribute 'note'
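
The traceback shows the plugin calling pb.note() on a progress object that does not have that method, presumably because the installed Bazaar is newer than the one the plugin was written against. The actual changes are not reproduced here; the following is a hypothetical shim showing one way to paper over this kind of mismatch. The helper name is made up, and the standard logging module stands in for whatever Bazaar-side logging a real patch would use.

```python
import logging

log = logging.getLogger("cvsps_import")


def progress_note(pb, fmt, *args):
    """Report a message through pb.note() when the progress object still
    provides it; otherwise send the message to the logger instead."""
    note = getattr(pb, "note", None)
    if callable(note):
        note(fmt, *args)
    else:
        log.info(fmt, *args)
```

Each failing call such as pb.note('Creating cvsps dump file: %s', cvsps_dump_path) would then become progress_note(pb, 'Creating cvsps dump file: %s', cvsps_dump_path).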

In the end I needed to make a couple of changes to the plugin code to get it to run, and then got the error below:
bzr: ERROR: Could not run command: "cvsps --cvs-direct -A -u -q --root /mycvsroot/ module"
do you have cvsps installed?
[Errno 2] No such file or directory
Well, a nice error message, so I installed cvsps from:

make, then sudo make install. No problems there.

I tried again and hit some further code problems that I had to patch, and after that I also ran into issues with commit messages that could not be interpreted:
"bzr: ERROR: exceptions.ValueError: Invalid value for commit message:"
I modified the plugin to log a message for every commit it could not interpret instead of failing, fixed some of the other issues I had hit (probably related to a newer version of either Python or Bazaar) and got it to run. Again I let it run all night, and again it had not finished. The nice thing about this plugin, however, is that it gives you a progress indicator. Based on that I decided to let it run; in the end it took 17 hours to finish. This makes me believe that cvs2bzr would also have finished if I had given it more time, but I don't have any proof of that. Another nice feature of cvsps-import is that if it gets interrupted it will pick up where it left off, which is very welcome when the process takes 17 hours.
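
The "log instead of fail" change can be pictured as a wrapper around whatever function performs the commit. The sketch below is an illustration only, not the plugin's code: the function names, the commit callable's signature and the placeholder text are all assumptions, and it assumes the commit is retried with a placeholder message so that the revision itself is kept.

```python
import logging

log = logging.getLogger("cvsps_import")

PLACEHOLDER = "(original commit message could not be converted)"


def commit_with_fallback(commit, patchset_id, message):
    """Commit a patchset; if the message is rejected with ValueError,
    log the problem and commit again with a placeholder message.

    Returns True when the original message was kept, False otherwise.
    """
    try:
        commit(patchset_id, message)
        return True
    except ValueError as exc:
        log.warning("patchset %s: %s", patchset_id, exc)
        commit(patchset_id, PLACEHOLDER)
        return False
```

The history stays complete this way, and the log tells you afterwards which commit messages need a second look.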

All told this was a pretty painful process, but in the end it worked out OK. The repository came to roughly 20,000 patch sets as determined by cvsps, and I was running this on a 2.4 GHz dual-core MacBook with all the CVS files local (no pserver or ssh). I was, however, able to convert the repository in one weekend without any meaningful errors (about 10 commit messages were lost out of that 20,000) and kept all the history, branches and tags. I will try to get my code changes up somewhere after I clean them up a bit.
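
For a sense of scale, those figures work out to roughly a third of a patch set per second. The arithmetic is sketched below using the approximate numbers from this post (20,000 patch sets, 17 hours); the function names are made up, and the same rate can be used to turn a progress count into a rough estimate of the time remaining.

```python
def average_rate(patchsets, hours):
    """Average throughput in patch sets per second."""
    return patchsets / (hours * 3600.0)


def hours_remaining(total, done, rate):
    """Rough estimate of the hours left at a constant rate."""
    return (total - done) / rate / 3600.0


rate = average_rate(20000, 17)
print("%.2f patchsets/s, about %.1f s per patchset" % (rate, 1 / rate))
# prints "0.33 patchsets/s, about 3.1 s per patchset"
```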