Monday, August 29, 2011

Installing ant with puppet

Java and Linux are not necessarily the best of bedfellows. That's not exactly true, as Java apps run great on Linux; it's more that Java packaging and Linux packaging are not the best of bedfellows. Java apps and servers are not kept in the places a Linux person would expect to find them (and vice versa) and are generally not packaged as RPMs or debs or insert-your-favorite-package-manager-here.
This is a hassle when trying to manage things through puppet. With puppet I would like to say "install such and such package" and be done with it; but because Java software is not distributed as RPMs, I had to roll my own. What follows is the strategy I took to manage ant and later applied to other Java packages. One note before I go any further: I have started to look at the project. They didn't have a stable setup of ant 1.8 (at least as I read it, but perhaps I am wrong); it was in their latest (6.0) repo, which was labeled a work in progress. Your mileage may vary.

The first thing I did was find a wonderful program called effing package manager (fpm). The fine people on the puppet IRC channel turned me on to it. It is a great little program for creating simple RPMs. It took me a bit to set up and install, but follow the instructions below and it hopefully should work. This was done on a fresh cloud setup of CentOS 5.6 from Rackspace.

At a high level, the setup is an extremely simplified, tailored RPM of ant. This RPM is published and then distributed through an internal yum server. Puppet is then used to install the package and create some symbolic links. The symlinks are there so I can switch between different ant versions, but they are not strictly necessary. The final directory structure looks something like this:
/usr/ant/default --> sym linked to --> /usr/ant/apache-ant-1.8.2 (for example)
/usr/bin/ant --> sym linked to --> /usr/ant/default/bin/ant
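That layout can be sketched with plain shell. Here a scratch prefix stands in for / so the commands can be tried without root; the $PREFIX paths are stand-ins, not part of the real setup:

```shell
# Recreate the layout under a throwaway prefix instead of / (no root needed).
PREFIX=$(mktemp -d)
mkdir -p "$PREFIX/usr/ant/apache-ant-1.8.2/bin" "$PREFIX/usr/bin"

# /usr/ant/default points at the currently active ant version...
ln -s "$PREFIX/usr/ant/apache-ant-1.8.2" "$PREFIX/usr/ant/default"

# ...and /usr/bin/ant goes through that indirection, so switching ant
# versions only means re-pointing the "default" link.
ln -s "$PREFIX/usr/ant/default/bin/ant" "$PREFIX/usr/bin/ant"

readlink "$PREFIX/usr/ant/default"
```

The point of the extra hop through "default" is that upgrading ant later is a single `ln -sfn` against one link, with nothing else changing.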

Step one is to get a machine set up with effing package manager to create the simplified ant RPM.
yum install rpm-build make gcc ruby-devel
gem install fpm
Next, download ant from apache and unpack it to /usr/ant/apache-ant-1.8.2, or wherever you want it to be. Note this is written for ant 1.8.2, but it should work for any version of ant; just modify accordingly. This will be the directory that your created RPM installs to.
Note I prefix ant with "my". Replace "my" if you want it named something else; the important thing is that the name is distinguished from "ant", as that package does exist in some yum repos and you want to be clear about what you are installing.
mkdir /usr/ant
cd /usr/ant
gunzip apache-ant-1.8.2-bin.tar.gz
tar -xvf apache-ant-1.8.2-bin.tar
fpm -s dir -t rpm -n myant -v 1.8.2.1 -a noarch /usr/ant/apache-ant-1.8.2

Note: I used the ant version number plus an extra point release (so 1.8.2.1) for the RPM version, so that if I need to change something in my packaging I can increment that last number.

Next, set up a new puppet module for ant. The only thing in it is a manifests directory and an init.pp file containing:
class ant {
  package { 'myant':
    ensure => installed,
    alias  => 'ant',
  }

  file { '/usr/ant/default':
    ensure  => link,
    target  => '/usr/ant/apache-ant-1.8.2',
    require => Package['myant'],
  }

  file { '/usr/bin/ant':
    ensure  => link,
    target  => '/usr/ant/default/bin/ant',
    require => Package['myant'],
  }
}
Next, you have to put your RPM up on a yum repo somewhere. I have been using puppet to manage my yum repo as well, so I can move it between environments; this has proven to be very handy. Finally, add ant to your class or classes:
include ant
And that's all there is to it.
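For the yum side, here is a minimal sketch. The host name and paths are hypothetical placeholders, not from my setup, and the createrepo step is shown as a comment since it needs the createrepo tool installed:

```shell
# Stand-in for the repo's web root; in real life this is a directory
# served over http from the internal yum server.
REPO_DIR=$(mktemp -d)

# Copy the fpm-built RPM in and regenerate the repo metadata:
# cp myant-*.noarch.rpm "$REPO_DIR"
# createrepo "$REPO_DIR"

# Client-side repo definition -- the kind of file puppet can also manage.
# "yum.internal.example" is a made-up placeholder host.
cat > "$REPO_DIR/internal.repo" <<'EOF'
[internal]
name=Internal packages
baseurl=http://yum.internal.example/repo
enabled=1
gpgcheck=0
EOF

grep '^baseurl' "$REPO_DIR/internal.repo"
```

The .repo file goes in /etc/yum.repos.d/ on each client, after which `yum install myant` resolves against the internal repo.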

Friday, July 29, 2011

Puppet managing differences for multiple environments

*** EDIT ***
I have switched to using extlookup for dealing with my different environments.
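For reference, extlookup keeps per-environment values in key,value CSV data files, with the file chosen by puppet config (extlookup_datadir / extlookup_precedence); on the puppet side a lookup then looks like $puppetserver = extlookup('puppetserver'). A minimal sketch of such a data file, with all names hypothetical:

```shell
# Hypothetical extlookup data file for the "network1" environment.
# extlookup reads simple key,value CSV lines; which file is consulted
# is driven by puppet.conf, not shown here.
DATA=$(mktemp -d)
cat > "$DATA/network1.csv" <<'EOF'
puppetserver,puppetnet1
nameservers,
EOF

# What a lookup for "puppetserver" would find in this file:
awk -F, '$1 == "puppetserver" { print $2 }' "$DATA/network1.csv"
```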

I spent some time searching for a way of keeping a single source of puppet config while using it in multiple environments with mainly small differences. The generally suggested approach is to use your favorite version control system, but I didn't really want to pollute my version control with lots of differences relating to things like DNS servers and other environment-specific settings.
The solution I eventually landed on was to put all of the variables that truly differ between the environments (there were not a lot; in the end fewer than 5) into a single statement, for example:
import 'net.pp'

if $network == 'network1' {
  $puppetserver = 'puppetnet1'
  $nameservers  = ['']
} elsif $network == 'network2' {
  $puppetserver = 'puppetnet2'
  $nameservers  = ['']
}
This can be placed wherever you want in your setup structure; for example, it could go in nodes.pp or site.pp if you don't have a better spot. I already had a separate params.pp with some other stuff in it, which made sense for me.
In a separate file called "net.pp", which is pulled in by the file above, I have one line which defines the network variable, so:
#Define what network this puppet master is serving. Valid values are network1, network2 or network3
$network = 'network1'
I also created a file called net.pp.TEMPLATE (the name is not important) that contains the above file just without the network defined; this way, when setting up a new puppet master or new puppet master environment from the source configuration, I have a template to start with (otherwise you have to remember what net.pp is and what to put in it).
#Define what network this puppetmaster is serving. Valid values are network1, network2 or network3
Finally the net.pp is set to be ignored by version control, for example using bzr:
bzr ignore net.pp
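Putting the template pieces together, bootstrapping a new puppet master then looks roughly like this. It is sketched under a scratch directory, and the bzr step is a comment since it needs a working tree:

```shell
# Scratch directory standing in for the puppet config checkout.
WORK=$(mktemp -d)

# The versioned template: the comment, but no $network defined.
printf '%s\n' \
  '#Define what network this puppet master is serving. Valid values are network1, network2 or network3' \
  > "$WORK/net.pp.TEMPLATE"

# On a new master: copy the template and fill in the one value.
cp "$WORK/net.pp.TEMPLATE" "$WORK/net.pp"
echo '$network = "network1"' >> "$WORK/net.pp"

# bzr ignore net.pp   # keep the per-master file out of version control
grep '\$network' "$WORK/net.pp"
```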
Now I don't have merge problems when moving changes between puppet masters; all the variables are versioned and can be edited at the same time instead of at deployment (to be fair, this is both good and bad).
The main disadvantage of this approach is that these magic variables seem to appear from nowhere in other areas. In the end, however, I was pretty happy with it; it was nothing groundbreaking, but it allowed for a maintainable setup where I didn't have to deal with merges sprinkled throughout the puppet config.

Sunday, June 12, 2011

Converting to Bazaar from CVS Take 2 (with some EC2 help)

I decided to take another look at bzr and converting CVS after getting cvsps-import to work. I got a bunch of advice from bzr experts that fast-import is really what I want to be looking at when dealing with a large repository.

Step 1 is to set up a large EC2 instance using the Ubuntu Natty Narwhal image kindly supplied by Ubuntu.

I recreated my cvs server directory structure on the EC2 server and uploaded my cvs repository. One note: I did need the CVSROOT information as well as the directory of the module I was converting. I now have a nice beefy Ubuntu server that looks (to fast-export-from-cvs) like my cvs server. You don't need to actually get it working as a cvs server, as the conversion tools read the files directly.

Next install the software that will be needed by the conversion.
sudo apt-get install bzr
mkdir -p ~/.bazaar/plugins/
cd ~/.bazaar/plugins/
bzr branch lp:bzr-fastimport fastimport
bzr branch lp:python-fastimport
cd python-fastimport/
sudo python setup.py install
sudo apt-get install cvs2svn
sudo apt-get install cvs
And now I was ready to give it a try:
bzr fast-export-from-cvs /mycvsroot/module ~/
I am back to my old friend:
ERROR: Git output requires a default commit username
It turns out that bzr fast-export-from-cvs is basically just a wrapper for the cvs2bzr program. You can hack around in the code to get it to work with a properties file if you wish, but you can also call cvs2bzr directly and then work with the generated "fi" file.

I modified the cvs2bzr-example.options file from the source distribution of cvs2bzr (see here for an explanation of command-line options vs. the options file). I set fallback_encoding to ascii on lines 164 and 172; set the source dir to my cvs repository path, including the module name, on line 524; and finally set up the branches I wanted converted.

I did not need most of the branches or tags in the repository I was converting, and the time to do them all was the difference of many hours (something I can do overnight vs. something I need a weekend for) and an export size difference of 74 gigs vs. 6 gigs. To get cvs2bzr to only import certain branches you need to do it backwards, so to speak. There is a way of forcing branches to be used and another to exclude branches, so you must exclude everything and then specifically force what you want; which becomes the following two lines in the options file, where "BRANCH_TO_INCLUDE" is the branch name you want to include:
ForceBranchRegexpStrategyRule(r'BRANCH_TO_INCLUDE.*'),
ExcludeRegexpStrategyRule(r'.*'),
This will skip the tags as well; you would otherwise need to deal with tags existing on different branches, and I didn't need or want them.

You can also use the ctx.trunk_only option to only import the trunk if this suits your particular situation.

Here is the full diff of cvs2bzr-example.options (scrubbed a little):
< fallback_encoding='ascii'
---
> #fallback_encoding='ascii'

< fallback_encoding='ascii'
---
> #fallback_encoding='ascii'

< ForceBranchRegexpStrategyRule(r'BRANCH_TO_INCLUDE.*'),
---
> #ForceBranchRegexpStrategyRule(r'branch.*'),

< ExcludeRegexpStrategyRule(r'.*'),
---
> #ExcludeRegexpStrategyRule(r'unknown-.*'),

< '/mnt/',
---
> 'cvs2svn-tmp/',

< r'/mycvsroot/module',
---
> r'test-data/main-cvsrepos',

Now run the export and import. This exports to a file under /mnt/ (which is where I set it to go above), and then a separate step imports it into bzr.
cvs2bzr --options=cvs2bzr-example.options
cd /mnt
mkdir module
cd module
bzr init-repo .
cat ../ | bzr fast-import -
You now have a new bzr shared repo in the module folder.

Using a large Amazon EC2 instance, the entire conversion of my repository took about 3 hours. I highly recommend getting a fresh EC2 instance running a current version of Ubuntu, and therefore current python and bzr; these conversion tools are hit or miss as to whether they will work with your specific version of bzr, and I had a 100% better experience running on the latest 11.04 (Natty Narwhal). Furthermore, fast-import is much more robust than the cvsps-import module, but also more complicated. If you have a smaller repository, try cvsps-import first; if that does not work, or if you find it taking too long, punt early and move to the industrial-grade cvs2svn/cvs2bzr solution. My next step is to set this up to run automatically every night with a bootstrapped puppet setup!

Thursday, May 5, 2011

Redhat OpenShift

I was at the Redhat JBoss conference this week during the announcement of OpenShift. I met some of the guys behind OpenShift; they were all great, very patiently answered all my questions, and you can tell they are very excited about OpenShift. So thank you to all of them who spoke with me.

First I started with OpenShift Express, which is aimed at developers. The only bad thing I have to say is that it doesn't yet support Java. They say Java is coming, so I will wait patiently. There really isn't much to say about it that isn't covered on the main page, but it is simply insanely easy to use. Keeping in mind this is aimed at developers, the following makes sense: git is the deployment mechanism. That means that once the OpenShift platform receives a "git push", it deploys the application automatically (remember, this is the developer version; read on for the Flex version).

To get this up and running you install the Redhat command line tools, which are written in ruby so they run everywhere (although on my Apple Leopard machine I had to reinstall ruby from source to get them to work; this seems to be an OS X 10.5 problem). One other note on the mac: you will need to install git. I just got this computer last week, so I hadn't gotten git on here yet (that's my excuse). Once you run through a couple of commands to create your domain and app, make a change and run "git push", and your changes are deployed. I sat through about 5 demos where someone changed something on the default app that gets created, and it really does take about 3 minutes soup-to-nuts to get a php app deployed. Also, with OpenShift Express, Redhat foots the bill for the Amazon instances running behind everything.

OpenShift Express is a tool for developers, and it is quite possibly something that will really change the way development happens, as developers can push changes out to public URLs and set up multiple applications for rapid feedback and QA reviews at the feature level (or at least the developer level) very easily. Although this is something that can happen with cloud technologies today, it is not something that does happen as part of a routine development process... though this is only my anecdotal experience. I wouldn't use OpenShift Express as the primary home of any information, including the git repo, as there are no SLAs and, as I understand it, the data is not backed up at this point. Depending on how you use it, however, this shouldn't be a big impediment to adoption; it is just something to be aware of.

OpenShift Flex is about tools for production deployments (there is also another one coming, called OpenShift Power, which is not out yet). OpenShift Flex already supports Java (both JBoss and Tomcat) and is geared towards an operations team as opposed to a developer. I ran through their tutorial and got a sample ear file up and running on JBoss 6 very quickly, without any trouble. It runs on the Amazon infrastructure today, and I think more will be added later.

Another extremely cool thing is that, running in the same AWS zone, you also get a shared file system using Gluster. I don't think any other PaaS does anything like this. Sure, there are concepts that solve the same problem, but dealing with a file is easy and well understood as opposed to a cloud file or an S3 bucket. Sometimes you just want a file, is I guess the best way to put it.

I didn't have a lot of time to play with database scaling: how it scales from one machine to multiple machines and where it puts the database (i.e., going from a collapsed architecture to a multi-tier architecture with one db and multiple app servers). Scaling is, however, all built into the architecture, with a couple of different scaling options that, although they look basic, will probably cover 90% of what you need; and the product is only 2 days old. They are also super easy to use.

One final note: if you want to set up geographic failover, you will need to do some work yourself around the file systems and the database. Data replication is not built into the product, so you would need to control the database tier outside of OpenShift or sync it in some other manner, and you will need to sync the file systems to a separate geographic site. Depending on your app, of course, this may not be important to you.

Sunday, January 9, 2011

Converting to Bazaar from CVS

I have recently been playing around with distributed version control systems. I switched my Mockemail project over to git and have been looking heavily into Bazaar. There are many pros and cons to the various distributed version control systems (or DVCSs) out there (bazaar, git and mercurial seem to be the big 3) that I won't go into; in the end I settled on trying to migrate a CVS project over to Bazaar.

One thing I was a little frustrated with was the slightly immature tooling around Bazaar. With that said, I was basically comparing it to the CVS tool set, and whatever your feelings on version control, one thing you can't argue with is that CVS is certainly mature (it was started in 1987); so I am not sure this opinion is worth very much.

What follows is my diary of converting a CVS repository to Bazaar. The cvs root in the below is "/mycvsroot" and the module name is "module".

Starting with the authority of the bazaar wiki, I first tried the fast-import option (it was listed first!). One note: you need to produce an export first and then run the import. The idea behind the fast-import system is that there is one common import routine and multiple exporters from different SCMs.
bzr fast-export-from-cvs /mycvsroot/module /scratch/
bzr: ERROR: Unable to import library "fastimport": No module named fastimport

OK, so this isn't included by default with bazaar. Installing bazaar plugins is pretty easy; any time I install one I put it in the user's plugin directory rather than the global plugins dir.
cd ~/.bazaar/plugins/
bzr checkout lp:bzr-fastimport fastimport
I tried again and was still getting errors, so I tried running it from python directly, as it appears this plugin is just a wrapper for the python script (I think it started as a bazaar plugin and migrated to a straight python library).
cd /Library/Python/2.5/site-packages/
bzr checkout lp:python-fastimport
cd python-fastimport
mv fastimport/ ../
rm -r python-fastimport
bzr fast-export-from-cvs /mycvsroot/module /scratch/
bzr: ERROR: cvs2bzr missing. Please install cvs2svn 2.30 or later and try again.

OK, a nice clear error message there, so I went to download and install it (sudo make install) and tried again:
bzr fast-export-from-cvs /mycvsroot/module /scratch/
Executing cvs2bzr --dumpfile /scratch/ /mycvsroot/module ...
ERROR: cvs2svn uses the anydbm package, which depends on lower level dbm
libraries. Your system has dbm, with which cvs2svn is known to have
problems. To use cvs2svn, you must install a Python dbm library other than
dumbdbm or dbm. See
for more information.
Export to /scratch/ exited with error code 1.
At this point I should note that I am running this on a mac, as this turns out to be an Apple-specific problem (I found that out later).

So first I tried:
sudo port install gdbm
No luck.
Googling around, I found some very involved instructions: python from source, editing make files, recompiling. Look for the section "How do I get cvs2svn to run on OS X 10.5.5?". Follow these to the letter; no shortcuts worked for me.

I was ready to try again, and now I got this error:
ERROR: Git output requires a default commit username

Googling around I came across this error. So I gave up on the plugin and tried using cvs2bzr directly instead of the python script.

After recompiling everything (including cvs2bzr), I edited the cvs2bzr-example.options file to make the necessary config settings. This is a big file, so creating it from scratch would not be a good idea. I only changed the target dir and the source dir; look for the line:
# The file in which to write the "fastimport" stream:
Change the line directly underneath to the path you want the output file at. Then look for the following:
# The filesystem path to the part of the CVS repository (*not* a
# CVS working copy) that should be converted. This may be a
# subdirectory (i.e., a module) within a larger CVS repository.

Now try running it:
cvs2bzr --options=cvs2bzr-example.options
and I got a whole bunch of error messages of the form:
Error summary:
ERROR: A CVS repository cannot contain both /path/file.txt,v and /path/Attic/file.txt,v
Exited due to fatal error(s).
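Errors like the above can be hunted down mechanically. Here is a small sketch that lists ,v files present both live and in an Attic directory; it builds a tiny fake repo under a temp path, so the paths are placeholders rather than my real repository:

```shell
# Build a miniature fake repo exhibiting the problem.
REPO=$(mktemp -d)
mkdir -p "$REPO/path/Attic"
touch "$REPO/path/file.txt,v" "$REPO/path/Attic/file.txt,v"

# For every Attic ,v file, check whether the same file also exists
# one level up (the condition cvs2bzr refuses to convert).
conflicts=$(find "$REPO" -path '*/Attic/*,v' | while read -r attic; do
  live="${attic%/Attic/*}/${attic##*/Attic/}"
  [ -e "$live" ] && echo "$live"
done)
echo "$conflicts"
```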
These are almost certainly due to suspect cvs administration. I removed the files from the Attic and tried again. This time I received a number of encoding problems:
WARNING: Problem decoding log message:
Some log message.

ERROR: There were warnings converting author names and/or log messages
to Unicode (see messages above). Please restart this pass
with one or more '--encoding' parameters or with
The percentage of these messages was very small, so I was not too concerned about them getting garbled; I set the fallback encoding and tried again:
cvs2bzr --options=cvs2bzr-example.options --fallback-encoding=utf_8
This command takes a LOOOOOOOOOONG time. The last message in the log was:
Time for pass15 (IndexSymbolsPass): 6.067 seconds.
----- pass 16 (OutputPass) -----
I eventually gave up. Re-reading the config file, I found there are some performance options you can play with in cvs2bzr-example.options.
I tried again and waited 10 hours (I kicked it off before going to bed; I am not patient enough to actually sit through 10 hours) before giving up. Knowing what I know now, I think it had not crashed; rather, processing the repository simply takes even more time. More on this later, but I was ready to try something else.

I proceeded on to the cvsps-import option for converting a CVS repository to Bazaar. I installed the plugin the same way as fastimport and gave it a try:
cd ~/.bazaar/plugins/
bzr checkout lp:bzr-cvsps-import cvsps_import
bzr cvsps-import /mycvsroot/ module newbzrmodule
Processed 0 patches (0 new, 0 existing) on 0 branches (0 tags) in 0.1s (0.00 patch/s)
bzr: ERROR: exceptions.AttributeError: 'ProgressTask' object has no attribute 'note'

Traceback (most recent call last):
File "/Library/Python/2.5/site-packages/bzrlib/", line 917, in exception_to_return_code
return the_callable(*args, **kwargs)
File "/Library/Python/2.5/site-packages/bzrlib/", line 1117, in run_bzr
ret = run(*run_argv)
File "/Library/Python/2.5/site-packages/bzrlib/", line 691, in run_argv_aliases
File "/Library/Python/2.5/site-packages/bzrlib/", line 710, in run
return self._operation.run_simple(*args, **kwargs)
File "/Library/Python/2.5/site-packages/bzrlib/", line 135, in run_simple
self.cleanups, self.func, *args, **kwargs)
File "/Library/Python/2.5/site-packages/bzrlib/", line 165, in _do_with_cleanups
result = func(*args, **kwargs)
File "/Users/bstevens/.bazaar/plugins/cvsps_import/", line 95, in run
File "/Users/bstevens/.bazaar/plugins/cvsps_import/cvsps/", line 1272, in process
patchsets = self._parse_cvsps_dump(pb=pb)
File "/Users/bstevens/.bazaar/plugins/cvsps_import/cvsps/", line 1195, in _parse_cvsps_dump
pb.note('Creating cvsps dump file: %s', cvsps_dump_path)
AttributeError: 'ProgressTask' object has no attribute 'note'

In the end I needed to make a couple of changes to the code to get it to run, and finally got the below error:
bzr: ERROR: Could not run command: "cvsps --cvs-direct -A -u -q --root /mycvsroot/ module"
do you have cvsps installed?
[Errno 2] No such file or directory
Well, a nice error message, so I installed cvsps from:

make, then sudo make install. No problems there.

I tried again, hit some further code problems that I had to make changes for, and after that also ran into some issues with commit messages that could not be interpreted:
"bzr: ERROR: exceptions.ValueError: Invalid value for commit message:"
I modified the plugin to log a message for every commit it could not interpret instead of failing, fixed some of the other issues I hit (probably related to a newer version of python or bazaar), and got it to run. Again I let it run all night, and it had not finished. The nice thing with this plugin, however, is that it gives you a progress indicator; based on that I decided to let it run, and in the end it took 17 hours to finish. Based on this I believe that cvs2bzr would have finished had I given it more time, but I have no proof of that. Another nice feature of cvsps-import is that if it gets interrupted it will pick up where it left off; this is very nice when the process takes 17 hours.

All told this was a pretty painful process, but in the end it worked out OK. My patch set, as determined by cvsps, was in the 20,000 range, and I was running this on a 2.4 GHz dual-core MacBook with all the cvs files local (no pserver or ssh). I was able to convert the repository in one weekend without any meaningful errors (about 10 commit messages were lost out of that 20,000) and kept all the history, branches and tags. I will try to get my code changes up somewhere after I clean them up a bit.