Saturday, January 9, 2010

Diary of setting up an Ubuntu Enterprise Cloud

Below is my experience getting an Ubuntu Enterprise Cloud set up. This is really a Eucalyptus cloud setup packaged by Ubuntu, so I may use the terms somewhat interchangeably.
Day 1
I requisitioned 2 servers, installed Ubuntu Server 9.10, and chose the Enterprise Cloud option from the main installation screen. I did this once without an Internet connection and had all kinds of problems exchanging keys between the cloud controller and the node:

warning: //var/lib/eucalyptus/keys//node-cert.pem doesn't exists!
warning: //var/lib/eucalyptus/keys//cluster-cert.pem doesn't exists!
warning: //var/lib/eucalyptus/keys//node-pk.pem doesn't exists!
warning: //var/lib/eucalyptus/keys//cloud-cert.pem doesn't exists!


Trying scp to sync keys to: eucalyptus@!://var/lib/eucalyptus/keys/...
usage: scp [-1246BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file]
[-l limit] [-o ssh_option] [-P port] [-S program]
[[user@]host1:]file1 ... [[user@]host2:]file2
failed.

ERROR: could not synchronize keys with !
The configuration will not have this node.
Hint: to setup passwordless login to the nodes as user eucalyptus, you can
run the following commands on node !:
sudo -u eucalyptus mkdir -p ~eucalyptus/.ssh
sudo -u eucalyptus tee ~eucalyptus/.ssh/authorized_keys > /dev/null
SNIP
EOT
Be sure that authorized_keys is not group/world readable or writable

So I followed the above directions and tried many other things without any luck. Eventually, after reinstalling both the node and the cloud controller machine while they were connected to the Internet, and then doing a distro upgrade and reboot:
sudo apt-get update
sudo apt-get dist-upgrade
the key exchange process worked. I did this a couple of times and each time had to redo some or all of the above directions to get it to work.
With everything appearing to work, I tried to get Elasticfox going so I could do something useful with my cloud; however, I could not get it to accept a new region. I believe this is related to a recent update in which Amazon added some features that made it incompatible with Eucalyptus, though perhaps this will be addressed soon. I did eventually find a fork of Elasticfox called Hybridfox that is made to work with Eucalyptus, but for now I was relegated to the command line tools to try to get an image up and running. The Ubuntu website mentioned a "register-uec-tarball" script which, when used with one of their pre-supplied images, would register it with the cloud controller. I found the script at: http://bazaar.launchpad.net/~ubuntu-on-ec2/ubuntu-on-ec2/uec-tools/annotate/head:/register-uec-tarball
Unfortunately it didn't work out of the box and I was not really interested in fixing it.

./register-uec-tarball.sh euca-centos-5.3-i386.tar.gz centos i386
Mon Dec 21 20:08:16 EST 2009: ====== extracting image ======
can't find image
cleaning up /tmp/register-uec-tarball.sh.zEYnMK

One note: if you browse the image store in the web interface, you can download an image with a click of a button, and that seems to work without problems. However, you are limited to what the image store offers, which was a beta media server and two different versions of Ubuntu, and I had my heart set on CentOS.
I did manage to get my image registered using the command line instructions at https://help.ubuntu.com/community/UEC/CDInstall (scroll down to "STEP 7: Run an Image"). Step 3 there refers to an EMI environment variable that is never set. The value can be retrieved from the management console or from the euca-describe-images command (it will be the entry with image.manifest.xml in the file name).
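As a rough sketch of pulling the EMI out of the euca-describe-images listing instead of copying it by hand: the helper name emi_for_manifest is made up for illustration, and the column layout (IMAGE, then the id, then the bucket/manifest path) is an assumption about the output format, as are the sample lines, so check them against what your own install prints.

```shell
#!/bin/sh
# Hypothetical helper: print the id (2nd column) of each IMAGE line whose
# manifest path (3rd column) contains the given text.
# Assumed layout: IMAGE <id> <bucket/manifest> <owner> <state> ...
emi_for_manifest() {
    awk -v pat="$1" '$1 == "IMAGE" && index($3, pat) > 0 { print $2 }'
}

# Illustrative sample only; real use would pipe in: euca-describe-images
sample='IMAGE eki-41CD162F centos-kernel-bucket/vmlinuz-2.6.28-11-server.manifest.xml admin available public i386 kernel
IMAGE emi-E5C0150D centos-image-bucket/centos.5-3.x86.img.manifest.xml admin available public i386 machine'

EMI=$(printf '%s\n' "$sample" | emi_for_manifest 'img.manifest.xml')
echo "EMI=$EMI"
```

On a live cloud controller this would presumably become EMI=$(euca-describe-images | emi_for_manifest 'img.manifest.xml'), with the search text adjusted to match your own manifest name.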
Below is what the commands looked like for my install. Note that if you are trying to replicate this, yours will differ based on the image you are installing (again, I was doing CentOS), and the EKI and EMI values will be different.
gunzip centos.tar.gz
tar -xvf centos.tar
cd euca-centos-5.3-i386/
euca-bundle-image -i kvm-kernel/vmlinuz-2.6.28-11-server --kernel true
euca-upload-bundle -b centos-kernel-bucket -m /tmp/vmlinuz-2.6.28-11-server.manifest.xml
euca-register centos-kernel-bucket/vmlinuz-2.6.28-11-server.manifest.xml
(set the printed eki to $EKI [the word after IMAGE; export EKI=eki-41CD162F in my case])

euca-bundle-image -i kvm-kernel/initrd.img-2.6.28-11-server --ramdisk true
euca-upload-bundle -b centos-ramdisk-bucket -m /tmp/initrd.img-2.6.28-11-server.manifest.xml
euca-register centos-ramdisk-bucket/initrd.img-2.6.28-11-server.manifest.xml
(set the printed eri to $ERI [the word after IMAGE; export ERI=eri-A4B3177E in my case])

euca-bundle-image -i centos.5-3.x86.img --kernel $EKI --ramdisk $ERI
euca-upload-bundle -b centos-image-bucket -m /tmp/centos.5-3.x86.img.manifest.xml
euca-register centos-image-bucket/centos.5-3.x86.img.manifest.xml

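The "word after IMAGE" step above can be scripted rather than copied by hand. This is only a minimal sketch: image_id is a hypothetical helper name, and the printf line stands in for real euca-register output, assuming it prints IMAGE followed by the id as described above.

```shell
#!/bin/sh
# Hypothetical helper: print the word that follows IMAGE on the first
# matching line read from stdin.
image_id() {
    awk '$1 == "IMAGE" { print $2; exit }'
}

# Stand-in for real euca-register output, using the id from my install.
EKI=$(printf 'IMAGE\teki-41CD162F\n' | image_id)
export EKI
echo "EKI=$EKI"
```

In real use the printf would be replaced by the euca-register call itself, and the same helper would serve for $ERI and for the final image id.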
You should get back an image id and be able to see it in the cloud manager. Now use this image id to create the server:
euca-run-instances emi-E5C0150D -k mykey -t m1.small
Done! I was ready to break out in a jig when, instead of getting my VM, I got this message:
FinishedVerify: Not enough resources: vm instances.

Day 2
I found the “euca-describe-availability-zones verbose” command, which when run showed that I had no availability (i.e. no nodes running). One note: some googling turned up sample output that appears to list the registered nodes, but I never saw this in my own output. I am not sure whether there are different versions of this command or whether it only outputs node information in some cases. In the end (I am spoiling the ending, but narrators are allowed to be omniscient) I couldn't find anything that would actually tell you which nodes really existed and what state they were in. So, given it was a new day and I had no available resources, I decided to start over with a newly requisitioned laptop for my node that had a little more horsepower.

First I reinstall the controller, then the node. Now I am back to my original problem with the key synchronization, so I do the apt-get dist-upgrade, run through the whole key-syncing process, and it completes without error. I cross my fingers and run “euca-describe-availability-zones verbose”: still showing no availability. The logs showed nothing obvious, but I am not sure what I am looking for, so that doesn't mean very much coming from me. I posted a message on the forum asking for help. I never did get a reply, but I had a theory that perhaps my hardware was not up to the task. I requisitioned a laptop that I knew had the Intel VT extensions and reinstalled everything again, with no luck. I posted a message to the Eucalyptus forum and went home.
Day 3
I got a response on the Eucalyptus forum and tried their suggestion of opening up ports, but this didn't help. I requisitioned a new laptop to use as the cloud controller in my testing environment. (We had ordered a new laptop to replace one that was getting old and didn't have a working delete key. "Just don't make any mistakes for a couple days and you won't need the delete key," I said as I stole off with the machine.) I reinstall the cloud controller and now get stuck on installing the credentials. I do the dist-upgrade and thankfully this goes away. I reboot the cloud controller, then the node, so they come up in the right order, then try to see what my availability is now and get a strange error instead:
path=/services/Eucalyptus/?AWSAccessKeyId=WKy3rMzOWPouVOxK1p3Ar1C2uRBwa2FBXnCw&Action=DescribeAvailabilityZones&SignatureMethod=HmacSHA256&SignatureVersion=2&Timestamp=2009-12-28T23%3A25%3A35&Version=2009-04-04&ZoneName.1=verbose&Signature=MiDl02MdVCSzgcI7QboHfCa0UUYBw3c2MHrlRNbPbrc%3D
Failure: 408 Request Timeout
Thankfully this turned out to be something simple. There appears to be a known bug where things may come up in the wrong order on a fresh boot. The workaround is to restart the eucalyptus service on the node:
sudo service eucalyptus stop
sudo service eucalyptus start
Now back on the cloud controller I check the availability and finally it is registering the node. I now try to start up a new instance:
euca-run-instances emi-E5AC1512 -k mykey -t m1.small
Then with the handy dandy watch command:
watch -n5 euca-describe-instances
I was able to watch it terminate right after it started up. Not the desired effect. I remembered at this point reading somewhere that you may need to do some BIOS tinkering to get the virtualization settings enabled. So I shut everything down, went into the BIOS, and enabled virtualization on both the cloud controller and the node. I started everything back up and tried again. By now I had found Hybridfox and was able to use its GUI to deploy my new instance. One note on Hybridfox: the directions say to use the "Query Id" and "Secret Key". I took these to be just dummy values, but you need to get them by logging into the cloud controller web interface and clicking on show keys at the bottom of the login screen.
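Before rebooting into the BIOS, it may be worth a quick look at whether the CPU advertises hardware virtualization at all. The sketch below is not part of the original troubleshooting; has_virt_flags is a hypothetical helper that looks for the vmx (Intel VT-x) or svm (AMD-V) flag in /proc/cpuinfo. It only shows what the CPU reports, so the feature can still be switched off in the BIOS even when the flag is present.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a "flags" line in the given cpuinfo file
# lists vmx (Intel VT-x) or svm (AMD-V). Defaults to /proc/cpuinfo.
has_virt_flags() {
    grep -E -q '^flags.*[[:space:]](vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_virt_flags; then
    echo "CPU advertises hardware virtualization (vmx/svm)"
else
    echo "No vmx/svm flag found; KVM instances are unlikely to start"
fi
```

Run it on both the node and the cloud controller; a file path can be passed as the first argument to check a saved copy of cpuinfo from another machine.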
I still needed to do some work to get the networking all working, but there was my VM in my private cloud happily running.
Although I can't say this was easy to get up and running, it is a frightfully nifty piece of tech. Both the cloud controller and the node need to have VT extensions, and these need to be turned on, so you need a certain level of hardware to play with it (oh, and the laptop with the missing delete key did in fact get replaced). Eucalyptus has re-implemented much of the Amazon cloud infrastructure, and not just the EC2 part. There is a commercial company behind it as well, for those who need that level of support and peace of mind.

23 comments:

  1. Are you running Centos images (guests) in this setup?

  2. I'm installing the same way you did, but mine is not moving forward. Network auto-config is failing because there is no Internet connection, and I am not able to see a GUI to set the network settings. I have 2 machines: one for the cloud controller (Eucalyptus and Walrus) and a second one for the node controller, which is not communicating with the cloud controller. How do I configure a LAN with two machines? I don't know. Please help.

  3. In my case I had everything on one network (if I recall) that was separated from the main network, so I was the only thing on it. I think in a "real" setup you would want to have a separate control layer from the nodes. Also, I know I had problems when I didn't have an Internet connection, so this was necessary for me. Since I wrote this, Eucalyptus has also come out with a new version, so its install may be different; I have not yet tried it.

  4. Need your help! I am trying to set it up and am stuck.
    Would you like to help me out?

  5. Does the installation of the cloud controller and node work on VMware or VirtualBox?
    I have it installed on VMware, but I have been unable to connect to the Internet even though the server and the node can ping each other successfully.

  6. Not sure what you are trying to do: specify VirtualBox/VMware as the underlying VM technology, or install this inside a set of VirtualBox/VMware VMs? I don't think the latter would work.

  7. I am using Ubuntu 10.04 LTS, with the cloud controller and node controller on separate machines. I have a problem registering the node: euca-describe-availability-zones verbose gives 000/000 in the free/max cpu column. Another problem is that if I assign a static address I cannot access the Internet. Please help me solve this, and please tell me how to set a static address on both the node and the cloud controller. Currently I am using college IPs.

    Replies
    1. It sounds like you don't have the node correctly registered if it is showing 000 for your max cpu. Check the logs everywhere; that is, on the node and the controller.
      As for the IPs, I'm not sure how to help. If your college only gives you certain IP addresses via DHCP, you would need to set up a separate subnet and route back into the college network. This means setting up a router with a different private IP range on the inside, where you plug in your node and controller, with the WAN interface plugged into the college network and set up for DHCP.

    2. One more thought: make sure you have virtualization extensions enabled in the BIOS. If memory serves, I had a similar issue and that resolved it.

  8. VT is enabled in my BIOS; I have an i3 processor. Which log should I check in /var/log/eucalyptus, and do I need to change any file?

  9. I am not able to create instances. After I run euca-run-instances emi-E5AC1512 -k mykey -t m1.small, the instance goes to the pending state and then terminates. What might be the problem? The node is now connected and free/max is no longer 000/000. Please help me out.

    Replies
    1. You're probably better off in the Eucalyptus forums:
      http://open.eucalyptus.com/forums/eucalyptus-support-0
      Look in the log dirs on both the node machine and the controller machine, which are usually in:
      /var/log/eucalyptus/
      This hopefully will get you to an error message...or lack of some important message. 000/000 is too general a message and just means there are no resources available and none being used.
