Sunday, June 12, 2011

Converting to Bazaar from CVS Take 2 (with some EC2 help)

I decided to take another look at converting CVS to bzr after getting cvsps-import to work. Several bzr experts advised me that fast-import is really what I should be looking at when dealing with a large repository.

Step 1 is to set up a large EC2 instance using the Ubuntu Natty Narwhal image kindly supplied by Ubuntu.

I recreated my CVS server directory structure on the EC2 server and uploaded my CVS repository. One note: I needed the CVSROOT directory as well as the directory of the module I was converting. I now have a nice beefy Ubuntu server that looks (to fast-export-from-cvs) like my CVS server. You don't need to get it working as an actual CVS server, because the conversion tools read the repository files directly.
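Since the tools read the repository files directly, a quick sanity check of the uploaded tree can save a failed run. Here is a minimal sketch in Python 3; the function name check_cvs_layout is my own invention and not part of any of the tools, and the check for files ending in ",v" relies on CVS storing history in RCS files with that suffix.

```python
import os

def check_cvs_layout(cvsroot, module):
    """Return a list of problems found in a copied CVS repository tree."""
    problems = []
    # The administrative CVSROOT directory has to sit next to the module.
    for name in ("CVSROOT", module):
        path = os.path.join(cvsroot, name)
        if not os.path.isdir(path):
            problems.append("missing directory: " + path)
    # CVS keeps file history in RCS files whose names end in ",v".
    module_path = os.path.join(cvsroot, module)
    if os.path.isdir(module_path):
        has_rcs = any(
            filename.endswith(",v")
            for _, _, files in os.walk(module_path)
            for filename in files
        )
        if not has_rcs:
            problems.append("no ,v files under: " + module_path)
    return problems
```

With the paths used in this post, check_cvs_layout("/mycvsroot", "module") should come back as an empty list before you start the conversion.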

Next, install the software the conversion needs:
sudo apt-get install bzr
mkdir -p ~/.bazaar/plugins/
cd ~/.bazaar/plugins/
bzr branch lp:bzr-fastimport fastimport
bzr branch lp:python-fastimport
cd python-fastimport/
sudo python setup.py install
sudo apt-get install cvs2svn
sudo apt-get install cvs
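Before going further it may be worth confirming that the commands used later in this post are actually on the PATH. This is only a sketch in Python 3 (shutil.which needs 3.3 or newer); the REQUIRED_TOOLS tuple and the missing_tools helper are names I made up for illustration, and the list assumes your packages install the executables under these names.

```python
import shutil

# Commands used later in this post; adjust if your packages name them differently.
REQUIRED_TOOLS = ("bzr", "cvs", "cvs2bzr")

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from missing_tools() means all three executables were found. It does not tell you whether the fastimport plugin loaded; running bzr plugins and looking for fastimport in the output covers that part.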
And now I was ready to give it a try:
bzr fast-export-from-cvs /mycvsroot/module ~/
I am back to my old friend:
ERROR: Git output requires a default commit username
It turns out that bzr fast-export-from-cvs is basically just a wrapper for the cvs2bzr program. You can hack the code to make it work with an options file if you wish, but you can also call cvs2bzr directly and then work with the generated "fi" file.

I modified the cvs2bzr-example.options file from the cvs2svn source distribution, which is where cvs2bzr comes from (the cvs2svn documentation explains command-line options versus options-file options). I set fallback_encoding to ascii on lines 164 and 172, set the source directory to my CVS repository path including the module name on line 524, and finally set up the branches I wanted converted.

I did not need most of the branches or tags in the repository I was converting. Converting them all would have taken many hours longer (the difference between something I can run overnight and something I need a weekend for) and would have produced a 74 GB export instead of a 6 GB one. To get cvs2bzr to import only certain branches you need to work backwards, so to speak. There is one rule for forcing symbols to be branches and another for excluding symbols, so you exclude everything and then specifically force what you want. That becomes the following two lines in the options file, where "BRANCH_TO_INCLUDE" is the name of the branch you want to include:
ForceBranchRegexpStrategyRule(r'BRANCH_TO_INCLUDE.*'),
ExcludeRegexpStrategyRule(r'.*'),
This skips the tags as well. Otherwise you would have to deal with tags that exist on different branches, and I didn't need or want them.
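The reason the ForceBranchRegexpStrategyRule / ExcludeRegexpStrategyRule pair works comes down to ordering. The sketch below is a toy model in Python 3, not cvs2svn's code. It assumes, as I understand the cvs2svn documentation, that the rules are consulted in order and the first one with an opinion wins, and that a pattern has to match the whole symbol name. The helpers force_branch, exclude and decide are hypothetical stand-ins.

```python
import re

# Toy stand-ins for the two rules: each returns a decision, or None
# when it has no opinion about the symbol name.
def force_branch(pattern):
    regexp = re.compile(pattern)
    return lambda name: "branch" if regexp.fullmatch(name) else None

def exclude(pattern):
    regexp = re.compile(pattern)
    return lambda name: "exclude" if regexp.fullmatch(name) else None

def decide(rules, name):
    """Return the decision of the first rule that claims the symbol."""
    for rule in rules:
        decision = rule(name)
        if decision is not None:
            return decision
    return "undecided"

RULES = [
    force_branch(r'BRANCH_TO_INCLUDE.*'),  # checked first
    exclude(r'.*'),                        # catch-all for everything else
]
```

In this model decide(RULES, "BRANCH_TO_INCLUDE_1") gives "branch" while any other branch or tag name gives "exclude". Swap the two entries and the catch-all claims every symbol, including the branch you wanted, which is why the force rule has to be listed first.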

You can also use the ctx.trunk_only option to only import the trunk if this suits your particular situation.

Here is the full diff of cvs2bzr-example.options (scrubbed a little):
< fallback_encoding='ascii'
---
> #fallback_encoding='ascii'
< fallback_encoding='ascii'
---
> #fallback_encoding='ascii'
< ForceBranchRegexpStrategyRule(r'BRANCH_TO_INCLUDE.*'),
---
> #ForceBranchRegexpStrategyRule(r'branch.*'),
< ExcludeRegexpStrategyRule(r'.*'),
---
> #ExcludeRegexpStrategyRule(r'unknown-.*'),
< '/mnt/',
---
> 'cvs2svn-tmp/',
< r'/mycvsroot/module',
---
> r'test-data/main-cvsrepos',

Now run the export and the import. The first command exports the repository to a file /mnt/ (the location I set in the options file above), and the remaining commands import that file into bzr as a separate step.
cvs2bzr --options=cvs2bzr-example.options
cd /mnt
mkdir module
cd module
bzr init-repo .
cat ../ | bzr fast-import -
You now have a new bzr shared repo in the module folder.
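If you plan to repeat the export and import, the same steps can be driven from a small script. This is a sketch in Python 3 under a few assumptions: conversion_steps and run_conversion are names I made up, and export_file is a placeholder parameter for wherever your options file tells cvs2bzr to write its output, so it has to match that setting.

```python
import os
import subprocess

def conversion_steps(options_file, export_file, repo_dir):
    """Describe each step as (argv, working directory, stdin file)."""
    return [
        (["cvs2bzr", "--options=" + options_file], None, None),
        (["bzr", "init-repo", "."], repo_dir, None),
        # "-" makes bzr fast-import read the export stream from stdin,
        # which replaces the "cat ... |" part of the shell version.
        (["bzr", "fast-import", "-"], repo_dir, export_file),
    ]

def run_conversion(options_file, export_file, repo_dir):
    """Run the steps in order, stopping at the first failure."""
    steps = conversion_steps(options_file, export_file, repo_dir)
    for argv, cwd, stdin_path in steps:
        if cwd is not None:
            os.makedirs(cwd, exist_ok=True)  # same job as "mkdir module"
        if stdin_path is None:
            subprocess.run(argv, cwd=cwd, check=True)
        else:
            with open(stdin_path, "rb") as stream:
                subprocess.run(argv, cwd=cwd, stdin=stream, check=True)
```

Keeping the step list separate from the runner means you can print conversion_steps(...) and inspect the commands before letting anything touch the repository. Because check=True raises on a non-zero exit, a failed export never reaches the import step.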

On a large Amazon EC2 instance the entire conversion took about 3 hours for my repository. I highly recommend starting from a fresh EC2 instance running a current version of Ubuntu, and therefore of Python and bzr. Whether these conversion tools work is hit or miss depending on your specific version of bzr, and I had a much better experience on the latest release, 11.04 (Natty Narwhal). Furthermore, fast-import is much more robust than the cvsps-import plugin, though it is also more complicated. If you have a smaller repository, try cvsps-import first; if that does not work, or if it takes too long, punt early and move to the industrial-grade cvs2svn/cvs2bzr solution. My next step is to set this up to run automatically every night with a bootstrapped Puppet setup!

