Tuesday, May 27, 2008

Alfresco Cluster in Compute Cloud (Amazon EC2)

Synopsis: This article describes a simplified process for setting up an Alfresco cluster in a cloud computing environment (Amazon Elastic Compute Cloud).

Pre-requisites.
  • An Amazon Elastic Compute Cloud account (make sure you have your key files, named like pk-ABCDAABCDAABCDAABCDAABCDAABCDAABCDA.pem and cert-ABCDAABCDAABCDAABCDAABCDAABCDAABCDA.pem)
  • Basic knowledge of Alfresco CMS
  • Some knowledge of Amazon EC2 tools
Part 1. Introduction

What is Alfresco and why it is cool.

Alfresco is the leading open source alternative for enterprise content management. The open source model allows Alfresco to use best-of-breed open source technologies and contributions from the open source community to get higher quality software produced more quickly at much lower cost.

The Benefits of Using Alfresco

Ease-of-Use

  • Intelligent Virtual File System – As simple to use as a shared drive through CIFS, WebDAV or FTP
  • Google®-Like Search and Yahoo!®-Like Folder Browsing

Developer Productivity

  • Aspect Oriented Rules Development through Simple-to-Use Wizards
  • Rules and Actions Managed in the Server once for all Interfaces


Best-Practice Collaboration

  • Pre-Configured Smart-Space Templates – Project Structure, Content, Logic, Lifecycles
  • Forums – Threaded Discussions on Folders or Documents


Administrator Productivity

  • Simple Server Install and No Client Install
  • Advanced Content Security Management
  • Advanced Search/Knowledge Management
  • Sophisticated Content, Attribute, Location, Object Type and Multiple Taxonomy/Category Search

Distributed Architecture

  • Highly Scalable and Fault Tolerant Service Oriented Architecture


Open Source

  • Dramatically Lower Cost

What is Amazon EC2 and why it is also cool.

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers.
Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment.

Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the tools to build failure resilient applications and isolate themselves from common failure scenarios.


Why a combination of both is uber-cool.


The cloud lets you provision a preconfigured digital assets repository on demand and scale it dynamically, which considerably reduces the total cost of ownership.

Part 2. Getting your Cluster in 15 minutes.

Run management instance.

I use the ElasticFox extension for basic tasks on EC2. Launch an instance of Amazon Machine Image ami-68af4b01. Once the state of the instance changes from pending to running, copy its Public DNS name into your browser's address bar. This should bring up the web user interface for Alfresco Cluster Management. You can log in with the following credentials: username alfresco, password alfresco. You should now see a left navigation menu with these options:

* Add New Alfresco Cluster

* Alfresco Cluster Home

* Cluster Provisioning Status

* General Settings

* List All Alfresco Clusters

* Provision Alfresco Cluster Nodes
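
If you would rather script the wait for the management instance than watch ElasticFox, the pending-to-running step described above can be sketched roughly as below. This is only an illustration: describe_instance is a hypothetical callable that you would have to supply yourself (for example, a wrapper around whichever EC2 tool you use), and the dict keys 'state' and 'public_dns' are my own assumptions, not part of any EC2 API.

```python
import time
from typing import Callable, Dict


def wait_for_public_dns(
    describe_instance: Callable[[], Dict[str, str]],
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the instance is 'running', then return its Public DNS name.

    `describe_instance` is a hypothetical, caller-supplied function that
    returns a dict with the assumed keys 'state' and 'public_dns'.
    """
    waited = 0.0
    while True:
        info = describe_instance()
        # Done: the instance is up and has been assigned a public name.
        if info["state"] == "running" and info.get("public_dns"):
            return info["public_dns"]
        # Anything other than pending/running means the launch went wrong.
        if info["state"] not in ("pending", "running"):
            raise RuntimeError(f"instance entered unexpected state: {info['state']}")
        if waited >= timeout_s:
            raise TimeoutError("instance did not reach 'running' in time")
        sleep(poll_interval_s)
        waited += poll_interval_s
```

The sleep function is passed in as a parameter so the loop can be exercised without real delays; in normal use you would leave it at its default.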

Setup EC2 credentials.

Using PuTTY or another SSH client, log in to the instance at its Public DNS address and copy your key files to a dedicated path on it (e.g. /root). Then edit the file /var/www/sites/all/modules/alfresco_ec2/ws-amazon-ec2-client.php and add the names of your key files.
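
Before editing the PHP file, it can help to confirm that both key files actually landed in the chosen directory. The following is a minimal sketch, assuming the files follow the pk-<ID>.pem / cert-<ID>.pem naming shown in the pre-requisites and that both files of a pair share the same ID. The function name find_ec2_key_pair is my own and is not part of the alfresco_ec2 module.

```python
import re
from pathlib import Path
from typing import Dict, Tuple

# Assumed naming scheme: pk-<ID>.pem and cert-<ID>.pem with a shared <ID>.
_KEY_NAME = re.compile(r"^(pk|cert)-([A-Z0-9]+)\.pem$")


def find_ec2_key_pair(directory: str) -> Tuple[Path, Path]:
    """Return (private_key_path, certificate_path) for a matching pair."""
    found: Dict[str, Dict[str, Path]] = {"pk": {}, "cert": {}}
    for path in Path(directory).iterdir():
        match = _KEY_NAME.match(path.name)
        if match and path.is_file():
            kind, key_id = match.groups()
            found[kind][key_id] = path
    # Only IDs that have both a private key and a certificate count as a pair.
    shared_ids = sorted(set(found["pk"]) & set(found["cert"]))
    if not shared_ids:
        raise FileNotFoundError(
            f"no matching pk-*.pem / cert-*.pem pair in {directory}"
        )
    key_id = shared_ids[0]
    return found["pk"][key_id], found["cert"][key_id]
```

Calling find_ec2_key_pair("/root") on the management instance would then give you the two file names to enter in ws-amazon-ec2-client.php.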

Provision minimal cluster.

At this point you are ready to provision your minimal 3-node Alfresco cluster. Go to the web interface and choose the Add New Alfresco Cluster option. Fill in the form, then choose List All Alfresco Clusters in the left navigation to see your new cluster. To actually provision nodes for it, select the radio button for that cluster and press the submit button 'Provision Nodes for Selected Cluster Now'. This will lock the browser for a fairly long time (around 6-7 minutes). I have a modified version of the software that handles this more elegantly. When the provisioning transaction finishes, you'll see the results on the screen.

Test cluster functionality.

Look in the ElasticFox extension and note the new instances, which are, respectively, the DB Node, Master Node and Slave Node. Get their Public DNS addresses and open them in a browser (the URL should look like http://ec2-00-100-200-100.compute-1.amazonaws.com:8080/alfresco). Log in to one of the instances (or re-log in) as admin:admin and use Add Content. Then log in to another instance and check whether the contents of the space show the new file.
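
A quick reachability check can save some clicking before you log in through the browser. The sketch below builds the Alfresco URL from a Public DNS name, using the port 8080 and /alfresco path shown above, and reports whether each node answers with HTTP 200. The function names are my own, and the check only tells you that the web application responds. It does not verify that content replicates between nodes, which still needs the manual Add Content test.

```python
import urllib.error
import urllib.request
from typing import Dict, Iterable


def alfresco_url(public_dns: str, port: int = 8080) -> str:
    """Build the Alfresco web client URL for one node."""
    return f"http://{public_dns}:{port}/alfresco"


def check_nodes(
    public_dns_names: Iterable[str], timeout_s: float = 10.0
) -> Dict[str, bool]:
    """Map each node URL to True if it answered with HTTP 200, else False."""
    results: Dict[str, bool] = {}
    for dns in public_dns_names:
        url = alfresco_url(dns)
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as response:
                results[url] = response.status == 200
        except (urllib.error.URLError, OSError):
            # Connection refused, DNS failure, timeout or an HTTP error status.
            results[url] = False
    return results
```

You would pass in the Public DNS names of the nodes that run the Alfresco web application, as copied from ElasticFox.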

Conclusions.

This setup allows one-click provisioning of an Alfresco cluster on EC2 and provides an enterprise-class digital asset management system that can be used for multiple purposes.

Future Work.

Scale-On-Demand.

High-Availability Configuration.

Media Streaming Solutions.

LDAP integration.

Single Sign On/Out.

Contact.

If you have any questions, please write me an email at sapenov at gmail dot com.

Cloud Computing Google Group