Saturday, January 9, 2010

Diary of setting up an Ubuntu Enterprise Cloud

Below is my experience getting an Ubuntu Enterprise Cloud set up. This is really a Eucalyptus cloud setup that is packaged up by Ubuntu, so I may use the terms somewhat interchangeably.
Day 1
I requisitioned 2 servers, installed Ubuntu Server 9.10 and chose the Enterprise Cloud option from the main installation screen. I did this once without an Internet connection and had all kinds of problems exchanging the keys between the cloud controller and the node:

warning: //var/lib/eucalyptus/keys//node-cert.pem doesn't exists!
warning: //var/lib/eucalyptus/keys//cluster-cert.pem doesn't exists!
warning: //var/lib/eucalyptus/keys//node-pk.pem doesn't exists!
warning: //var/lib/eucalyptus/keys//cloud-cert.pem doesn't exists!


Trying scp to sync keys to: eucalyptus@!://var/lib/eucalyptus/keys/...
usage: scp [-1246BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]
[-l limit] [-o ssh_option] [-P port] [-S program]
[[user@]host1:]file1 ... [[user@]host2:]file2
failed.

ERROR: could not synchronize keys with !
The configuration will not have this node.
Hint: to setup passwordless login to the nodes as user eucalyptus, you can
run the following commands on node !:
sudo -u eucalyptus mkdir -p ~eucalyptus/.ssh
sudo -u eucalyptus tee ~eucalyptus/.ssh/authorized_keys > /dev/null
SNIP
EOT
Be sure that authorized_keys is not group/world readable or writable

So I followed the above directions and tried many other things without any luck. Eventually I reinstalled both the node and the cloud controller machine while they were connected to the Internet, then did a distro update and a reboot:
sudo apt-get update
sudo apt-get dist-upgrade
After that the key exchange process worked. I did this a couple of times and each time had to do at least part of the above directions to get it to work.
With everything appearing to work, I tried to get Elasticfox running so I could do something useful with my cloud; however, I could not get it to accept a new region. I believe this is related to a recent update in which Amazon added some features that made it incompatible with Eucalyptus, though perhaps this will be addressed soon. I did eventually find a fork of Elasticfox called Hybridfox that is made to work with Eucalyptus, but for now I was relegated to the command line tools to try to get an image up and running. The Ubuntu website mentioned a "register-uec-tarball" script which, when used with one of their pre-supplied images, would register it with the cloud controller. I found the script at: http://bazaar.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/uec-tools/annotate/head:/register-uec-tarball
Unfortunately it didn't work out of the box and I was not really interested in fixing it.

./register-uec-tarball.sh euca-centos-5.3-i386.tar.gz centos i386
Mon Dec 21 20:08:16 EST 2009: ====== extracting image ======
can't find image
cleaning up /tmp/register-uec-tarball.sh.zEYnMK

One note: if you browse the image store in the web interface you can download an image with the click of a button, and that seems to work without problems. However, you are limited to what is available in the image store, which was a beta media server and 2 different versions of Ubuntu, and I had my heart set on CentOS.
I did manage to get my image to register using the command line instructions provided at https://help.ubuntu.com/community/UEC/CDInstall (scroll down to "STEP 7: Run an Image"). The steps refer to a non-existent EMI environment variable in step 3. Its value can be retrieved from the management console or from the euca-describe-images command (it will be the one with image.manifest.xml in the file name).
Below is what the commands looked like for my install. Note that if you are trying to replicate this, yours will differ based on the image you are trying to install (again, I was doing CentOS), and the EKI, ERI and EMI values will be different.
gunzip centos.tar.gz
tar -xvf centos.tar
cd euca-centos-5.3-i386/
euca-bundle-image -i kvm-kernel/vmlinuz-2.6.28-11-server --kernel true
euca-upload-bundle -b centos-kernel-bucket -m /tmp/vmlinuz-2.6.28-11-server.manifest.xml
euca-register centos-kernel-bucket/vmlinuz-2.6.28-11-server.manifest.xml
(set $EKI to the printed eki, the word after IMAGE; export EKI=eki-41CD162F in my case)

euca-bundle-image -i kvm-kernel/initrd.img-2.6.28-11-server --ramdisk true
euca-upload-bundle -b centos-ramdisk-bucket -m /tmp/initrd.img-2.6.28-11-server.manifest.xml
euca-register centos-ramdisk-bucket/initrd.img-2.6.28-11-server.manifest.xml
(set $ERI to the printed eri, the word after IMAGE; export ERI=eri-A4B3177E in my case)

euca-bundle-image -i centos.5-3.x86.img --kernel $EKI --ramdisk $ERI
euca-upload-bundle -b centos-image-bucket -m /tmp/centos.5-3.x86.img.manifest.xml
euca-register centos-image-bucket/centos.5-3.x86.img.manifest.xml
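
The IDs in the steps above can also be captured instead of copied by hand. This is only a minimal sketch, assuming euca-register prints a line of the form "IMAGE <id>" (which is what "the word after IMAGE" describes). The helper name image_id is made up for illustration and is not part of the Eucalyptus tools.

```shell
# Print the ID that follows the word IMAGE on standard input.
# Assumption: the register command prints a line like "IMAGE eki-41CD162F".
# image_id is a hypothetical helper, not part of euca2ools.
image_id() {
    awk '$1 == "IMAGE" { print $2; exit }'
}

# Possible usage, with the ramdisk names from my install:
#   ERI=$(euca-register centos-ramdisk-bucket/initrd.img-2.6.28-11-server.manifest.xml | image_id)
#   export ERI
```

If the output contains no IMAGE line the helper prints nothing, so it is worth checking that the variable is non-empty before passing it to euca-bundle-image.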

You should get back an image id and be able to see it in the cloud manager. Now use this image id to create the server:
euca-run-instances emi-E5C0150D -k mykey -t m1.small
Done! I was ready to break out in a jig when, instead of getting my VM, I got this message:
FinishedVerify: Not enough resources: vm instances.

Day 2
I found the “euca-describe-availability-zones verbose” command, which when run showed that I had no availability (i.e. no nodes running). One note: some output I found by googling looks like it lists the registered nodes, but I never saw this in my own output. I am not sure whether there are different versions of this command or whether it only outputs node information in some cases. In the end (I am spoiling the ending, but narrators are allowed to be omniscient) I couldn't find anything that would actually tell you which nodes really existed and what state they were in. So, given that it was a new day and I had no available resources, I decided to start over with a newly requisitioned laptop for my node that had a little more horsepower.

First I reinstalled the controller, then the node. Now I was back to my original problem with the key synchronization, so I did the apt-get dist-upgrade, ran through the whole key syncing process, and it completed without error. I crossed my fingers and ran “euca-describe-availability-zones verbose”: still no availability. The logs showed nothing obvious, but I was not sure what I was looking for, so that doesn't mean very much coming from me. I posted a message on the forum for help. I never did get a reply, but I had a theory that perhaps my hardware was not up to the task. I requisitioned a laptop that I knew had the Intel VT extensions and reinstalled everything again, with no luck. I posted a message to the Eucalyptus forum and went home.
Day 3
I got a response on the Eucalyptus forum and tried their suggestion of opening up ports, but this didn't help. I requisitioned a new laptop for my testing environment to use for the cloud controller. (We had ordered a new laptop to replace one that was getting old and didn't have a working delete key. "Just don't make any mistakes for a couple of days and you won't need the delete key," I said as I stole off with the machine.) I reinstalled the cloud controller and now got stuck on installing the credentials. I did the dist-upgrade and thankfully this went away. I rebooted the cloud controller, then the node, so they would come up in the right order, tried to see what my availability was, and got a strange error instead:
path=/services/Eucalyptus/?AWSAccessKeyId=WKy3rMzOWPouVOxK1p3Ar1C2uRBwa2FBXnCw&Action=DescribeAvailabilityZones&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2009-12-28T23%3A25%3A35&Version=2009-04-04&ZoneName.1=verbose&Signature=MiDl02MdVCSzgcI7QboHfCa0UUYBw3c2MHrlRNbPbrc%3D
Failure: 408 Request Timeout
Thankfully this turned out to be something simple. There appears to be a known bug where, on a fresh boot, things may come up in the wrong order. The workaround is to restart the eucalyptus service on the node:
sudo service eucalyptus stop
sudo service eucalyptus start
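
Because the controller may need a moment after the restart, the availability check can be repeated a few times rather than run once. This is a rough sketch only: retry_until_ok is a made-up helper, and it assumes the command being retried exits non-zero on a failure such as the 408 timeout, which is an assumption and not something confirmed here.

```shell
# Run a command until it succeeds, up to a maximum number of attempts.
# retry_until_ok is a hypothetical helper, not part of the Eucalyptus tools.
# Arguments: <attempts> <delay in seconds> <command> [args...]
retry_until_ok() {
    attempts="$1"
    delay="$2"
    shift 2
    n=1
    while [ "$n" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        n=$((n + 1))
        # Only sleep if another attempt is coming.
        if [ "$n" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Possible usage on the cloud controller (assumes a non-zero exit on timeout):
#   retry_until_ok 6 10 euca-describe-availability-zones verbose
```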
Now back on the cloud controller I check the availability and finally it is registering the node. I now try to start up a new instance:
euca-run-instances emi-E5AC1512 -k mykey -t m1.small
Then with the handy dandy watch command:
watch -n5 euca-describe-instances
I was able to watch it terminate right after it started up. Not the desired effect. At this point I remembered reading somewhere that you may need to do some BIOS tinkering to get the virtualization settings enabled. So I shut everything down, went into the BIOS and enabled virtualization on both the cloud controller and the node, then started everything back up and tried again. By now I had found Hybridfox and was able to use its GUI to deploy my new instance. One note on Hybridfox: the directions say to use the "Query Id" and "Secret Key". I took these to be just dummy values, but you need to get them by logging into the cloud controller web interface and clicking on show keys at the bottom of the login screen.
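
Before rebooting into the BIOS, it can be worth checking whether the CPU advertises hardware virtualization at all. A minimal sketch, assuming a Linux machine where /proc/cpuinfo has a "flags" line listing vmx (Intel) or svm (AMD). The function name has_virt_flags is made up for illustration. Note that the flag shows what the CPU is capable of, and the feature may still be switched off in the BIOS.

```shell
# Succeed if a cpuinfo-style file lists the Intel (vmx) or AMD (svm) flag.
# has_virt_flags is a hypothetical helper; the file defaults to /proc/cpuinfo.
has_virt_flags() {
    cpuinfo="${1:-/proc/cpuinfo}"
    # Look only at the "flags" lines and match vmx or svm as whole words.
    grep -E '^flags' "$cpuinfo" | grep -Eqw 'vmx|svm'
}

# Possible usage on the node:
#   if has_virt_flags; then echo "CPU advertises VT"; else echo "no vmx/svm flag found"; fi
```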
I still needed to do some work to get the networking all working, but there was my VM in my private cloud happily running.
Although I can't say this was easy to get up and running, it is a frightfully nifty piece of tech. Both the cloud controller and the node need to have the VT extensions, and have them turned on, so you need a certain level of hardware to play with it (oh, and the laptop with the missing delete key did in fact get replaced). Eucalyptus has re-implemented much of the Amazon cloud infrastructure, and not just the EC2 part. There is a commercial company behind it as well, for those that need that level of support and peace of mind.