Cloud Computing & Conway’s Law

In a previous post, PaaS as change enabler, I mentioned Conway’s law as a difficulty to be addressed when implementing a cloud solution.

organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations

Recently Gartner analyst Thomas J. Bittman, in “Problems Encountered by 95% of Private Clouds”, researched which problems their clients were having with private clouds and found that 95% of them have problems with their solutions.

Amazon AWS (Private Clouds are things of the past) and other cloud providers/evangelists will parrot that “private clouds are inherently broken”, but I can’t follow their logic.

The problems encountered are related to the use of the technology, not the technology itself, and most of them will occur when implementing any cloud, be it public, private or hybrid.

If I had to extract a title it’d be “Companies don’t fully understand Cloud Computing”: they map their expertise, their knowledge and their organizational model onto a cloud paradigm (hence Conway’s law) without fully understanding or committing to all the consequences of a cloud model.

Anyway, the detected problems are critical. Addressing them marks the difference between a successful project and a failure for any company, and it usually involves a change of mindset, which is what cloud computing is really about. Without that change of paradigm, there may be no real difference from an advanced virtualization platform.

LAMP revisited: on steroids

The LAMP stack (Linux + Apache + MySQL + PHP) is one of those shark architectures that has been around for so long that it looks like a dinosaur. It’s established, tested with lots of software and largely responsible for MySQL’s success, but it feels like something from another age.

Yes, sharks are living fossils. The oldest ones appeared 450 million years ago, while modern ones appeared 100 million years ago. White sharks are 16 million years old; in comparison, humans appeared 3-4 million years ago, Homo sapiens 300-400k years ago and the first vestiges of civilization merely 8-10k years ago.

Due to a request from HR to insource the training courses, which are based on Moodle, I’ve had some time to review this stack and check which improvements can be applied to get a more modern architecture. By modern I mean getting high availability, scalability and a cloud-friendly platform.

Usually the LAMP stack follows a 2-tier model: a web and app server tightly joined (web server + PHP engine) and the database, plus some usual suspects like load balancers, CDNs, accelerators, caches, …

Load balancing at the front end has been common for ages, usually with appliances like F5’s BIG-IP or Radware, or a typical Apache behind a DNS round-robin. Now most hosting and cloud providers offer front-end balancing or even load balancing as a service, but for some years now HAProxy has positioned itself as a lightweight software load balancer and has been used to balance not only web servers but also Docker containers or, more generally, any service.

The problem has always been maintaining the user session across all the nodes. The fallback or basic option is the “sticky session”: a session is tied to a node, and if that node goes down all its sessions are lost (and those users are disconnected). Actually this method is meant as an optimization, so that context switches are minimized, but if the app is not prepared to share its sessions it’s also the only way not to lose them. All our middleware apps here in Spain need “sticky sessions”. Between the share and non-share strategies, most middleware solutions have followed the former, keeping all the sessions in memory (and distributing them) or in a database. The problem is the cost: apps must be programmed to support those practices (and tested!) and there are a lot of penalties for doing it badly. Some of our apps need 4 MB of memory per session, share that! OK, yes, there is the possibility of finer granularity, but if you have apps with sessions of that size it’s already too late (for anything except refactoring, of course).
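To make the trade-off concrete, here is a minimal sketch of sticky routing with per-node, non-shared session stores. Everything in it (the `StickyBalancer` class, the node names, the in-memory route table standing in for a balancer cookie) is a hypothetical illustration, not how any particular load balancer implements stickiness.

```python
import itertools


class StickyBalancer:
    """Toy model: each node keeps its own sessions, nothing is shared."""

    def __init__(self, nodes):
        # One private in-memory session store per node.
        self.stores = {node: {} for node in nodes}
        # session id -> node; plays the role of the sticky cookie.
        self.route = {}
        self._counter = itertools.count()

    def _next_node(self):
        alive = sorted(self.stores)
        if not alive:
            raise RuntimeError("no nodes left")
        # Plain round-robin among the nodes that are still up.
        return alive[next(self._counter) % len(alive)]

    def handle(self, session_id):
        """Route a request; return (node, is_new_session)."""
        node = self.route.get(session_id)
        if node not in self.stores:
            # First request, or the pinned node is gone: pin to a new node.
            node = self._next_node()
            self.route[session_id] = node
        store = self.stores[node]
        is_new = session_id not in store
        store.setdefault(session_id, {"hits": 0})["hits"] += 1
        return node, is_new

    def kill(self, node):
        # The node's sessions disappear with it.
        self.stores.pop(node)
```

With two nodes, a session pinned to `web1` keeps its state as long as `web1` is alive; after `kill("web1")` the next request lands on `web2` and comes back flagged as a new session, which is the "user gets disconnected" case described above.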

PHP, and Moodle in particular, follows a disk strategy, storing sessions and content on disk, so if something isn’t found in memory it goes to disk to retrieve it (generating it first if needed). This allows the use of a unified, shared filesystem for serving the app: GlusterFS. Depending on the performance (we don’t have any metrics from the previous provider!) I’ll decide whether more actions are needed (Varnish/Squid for caching content and memcached for the sessions). More info on optimizing PHP over GlusterFS in this blog: .

The disk layout, even in its simplest form, must be carefully segmented. GlusterFS needs a partition for itself (depending on the setup, maybe even its own drive for striping, performance, etc.; here it’d be /opt), and MySQL clustered solutions are very sensitive to corruption due to lack of free space, so if we don’t change where the databases are stored (usually under /var/lib/mysql) we must assign a separate partition for the logs (/var/logs).

Finally, the database. Since the MySQL buyout by Oracle there have been some moves: first a real split between an Enterprise/commercial MySQL (but without cannibalizing Oracle’s main DB business) and a community version (not actively maintained) by Oracle. In response, the open source community created forks to restore its former glory. The main alternatives now are MariaDB (by the original creator of MySQL) and Percona; both use a MySQL clustering implementation (called Galera) in their flagship products: MariaDB Galera Cluster and Percona XtraDB Cluster. I’ve used MariaDB Galera 10; Percona seems more professional, but in future versions of RHEL MySQL is replaced by MariaDB. At this moment both are interchangeable.

The PHP app doesn’t access the database directly, because then we would always be hitting the same node. Instead we must configure an HAProxy to balance the MySQL nodes, so that even if one node fails there is still MySQL service. For this migration HAProxy remains a single point of failure, but for our Moodle content this design is already overkill (and a dedicated VIP and/or dedicated proxies are too much hassle).
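The behaviour expected from the proxy in front of the MySQL nodes can be sketched in a few lines: rotate over the nodes and skip any that fails its health probe. This is only an illustration of the idea, not HAProxy’s implementation; the `RoundRobinPool` class, the node names and the `is_healthy` callback are assumptions made for the example.

```python
from itertools import cycle


class RoundRobinPool:
    """Rotate over database nodes, skipping the ones reported unhealthy."""

    def __init__(self, nodes, is_healthy):
        self.nodes = list(nodes)
        # Hypothetical probe: a callable taking a node name, returning bool.
        self.is_healthy = is_healthy
        self._ring = cycle(self.nodes)

    def pick(self):
        # Try each node at most once per call before giving up.
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if self.is_healthy(node):
                return node
        raise RuntimeError("no healthy database node available")


# Usage: mark db2 as down and watch the pool route around it.
down = set()
pool = RoundRobinPool(["db1", "db2", "db3"], lambda node: node not in down)
first_round = [pool.pick() for _ in range(4)]
down.add("db2")
after_failure = [pool.pick() for _ in range(3)]
```

The app only ever talks to the pool (here, the proxy), so a failed node costs capacity but not availability. The proxy itself is still the single point of failure mentioned above.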

BTW, HAProxy implements what is, to me, the most inelegant hack of all: it only checks TCP connections or HTTP requests. The former doesn’t tell whether the service is actually working (online, synced, etc.) and the latter, well, MySQL isn’t an HTTP service. The health checks are implemented using an xinetd daemon ( 😕 😕 😕 , so it must be installed and running): after receiving an HTTP request on a port it calls a script (with the user and password, with grant permissions!, in plain text) that connects to MySQL and checks the status of the service. Really, really ugly!
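For reference, the logic such a check script has to implement is small. The sketch below assumes the node’s Galera status variables (`wsrep_ready`, `wsrep_cluster_status`, `wsrep_local_state`, where state 4 means Synced) have already been read into a dict, and maps them to the HTTP status the proxy expects. The `fetch_status` callable and the `serve` helper are hypothetical; a real script would run something like `SHOW GLOBAL STATUS LIKE 'wsrep_%'` through a MySQL client, which is left out here to avoid a third-party dependency.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

SYNCED = "4"  # Galera reports wsrep_local_state = 4 when the node is Synced


def galera_health(status):
    """Map Galera status variables (dict of strings) to (http_code, message)."""
    if status.get("wsrep_ready") != "ON":
        return 503, "node is not ready"
    if status.get("wsrep_cluster_status") != "Primary":
        return 503, "node is not in the primary component"
    if status.get("wsrep_local_state") != SYNCED:
        return 503, "node is not synced"
    return 200, "node is synced"


def make_handler(fetch_status):
    """Build a handler class; fetch_status is a hypothetical callable
    returning the wsrep_* variables of the local node as a dict."""

    class HealthHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            try:
                code, message = galera_health(fetch_status())
            except Exception as exc:  # any query failure means "unhealthy"
                code, message = 503, "status query failed: %s" % exc
            body = (message + "\n").encode("utf-8")
            self.send_response(code)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the check quiet

    return HealthHandler


def serve(port, fetch_status):
    """Answer health checks on the given port until interrupted."""
    HTTPServer(("", port), make_handler(fetch_status)).serve_forever()
```

A small standalone service like this would play the role of the xinetd wrapper, though the credentials problem doesn’t go away: whatever implements `fetch_status` still needs a MySQL user, ideally one with far fewer privileges than the script described above.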

All this implementation is coded in Ansible. I’m a bit shy about showing those playbooks on GitHub; they need some serious cleaning.