
LAMP revisited: on steroids

The LAMP stack (Linux + Apache + MySQL + PHP) is one of those shark-like architectures that has been around for so long it looks like a dinosaur. It's established and well tested, has lots of software built on it, and is largely responsible for MySQL's success, but it feels like something that comes from another age.

Yes, sharks are living fossils. The oldest ones appeared 450 million years ago, while modern ones appeared 100 million years ago. White sharks are 16 million years old; by comparison, humans appeared 3-4 million years ago, Homo sapiens 300-400k years ago, and the first vestiges of civilization merely 8-10k years ago.

Due to a request from HR to insource the training courses, which are based on Moodle, I've had some time to review this stack and check which improvements can be applied to get a more modern architecture. And by modern I mean high availability, scalability and a cloud-friendly platform.

Usually the LAMP stack follows a 2-tier model: web and app server tightly joined (web server + PHP engine) and the database, plus some usual suspects like load balancers, CDNs, accelerators, caches, …

Load balancing at the front end has been common for ages, usually with appliances like F5's BigIP or Radware, or a typical Apache behind a DNS round-robin. Now most hosting and cloud providers offer front balancing or even load balancing as a service, but for some years now HAProxy has positioned itself as a lightweight software load balancer and has been used to balance not only web servers but also Docker containers or, generally, any service.
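For reference, a minimal HAProxy configuration for balancing two web nodes looks roughly like this (server names and addresses are made up for illustration):

```
# haproxy.cfg fragment: round-robin over two web nodes with HTTP health checks
frontend www
    bind *:80
    default_backend moodle_web

backend moodle_web
    balance roundrobin
    option httpchk GET /           # mark a node down if it stops answering HTTP
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check
```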

The problem has always been maintaining the user session across all the nodes. The fallback or basic option is the "sticky session": a session is tied to a node, and if the node goes down all of its sessions are lost (and those users are disconnected). Actually this method is used as an optimization, to minimize context switching, but if the app isn't prepared to maintain sessions elsewhere it's also the only way not to lose them. All our middleware apps here in Spain need sticky sessions. Between shared and non-shared strategies, most middleware solutions have followed the former, keeping all sessions in memory (and distributing them) or in a database. The problem is the cost: apps must be programmed to support those practices (and tested!) and there are a lot of penalties for doing it badly, like some of our apps that need 4 MB of memory per session. Share that! OK, yes, there is the possibility of finer granularity, but if you have apps with sessions of that size it's already too late (for anything except refactoring, of course).
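For completeness, this is more or less what the sticky fallback looks like in HAProxy, pinning each client to one node with an inserted cookie (again, names and addresses are illustrative):

```
# haproxy.cfg fragment: cookie-based session stickiness
backend sticky_web
    balance roundrobin
    cookie SERVERID insert indirect nocache   # set only when the client has no cookie yet
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```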

PHP, and Moodle in particular, follows a disk strategy, storing sessions and content on disk, so whatever isn't found in memory is retrieved from disk (or generated first). This allows the use of a unified, shared filesystem for serving the app: GlusterFS. Depending on the performance (we don't have any metrics from the previous provider!) I'll decide whether more actions are needed (Varnish/Squid for caching content and memcached for the sessions). More info on optimizing PHP over GlusterFS in this blog: http://joejulian.name/blog/optimizing-web-performance-with-glusterfs .
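The wiring is simple: mount the shared volume on every web node and point PHP's file-based session handler at it. A sketch, assuming a GlusterFS volume named "moodle" served by a host called gluster1 (both names are hypothetical):

```sh
# On each web node: mount the shared volume where PHP and Moodle can reach it
mount -t glusterfs gluster1:/moodle /opt/moodledata
```

```ini
; php.ini: keep sessions on the shared volume so any node can serve any user
session.save_handler = files
session.save_path    = "/opt/moodledata/sessions"
```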

The disk layout in its simplest form must be carefully segmented. GlusterFS needs a partition for itself (depending on the setup, maybe even its own drive for striping, performance, etc.; here it'd be /opt), and clustered MySQL solutions are very sensitive to corruption due to lack of free space, so if we don't change where the databases are stored (usually under /var/lib/mysql) we must assign a separate partition to the logs (/var/log).
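In /etc/fstab terms the idea looks something like this (device names are purely illustrative):

```
# /etc/fstab sketch: isolate the gluster brick and the logs in their own partitions
/dev/vdb1   /opt       xfs    defaults   0 0   # GlusterFS brick, its own partition/drive
/dev/vda3   /var/log   ext4   defaults   0 0   # logs can't fill the MySQL partition
```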

Finally, the database. Since the MySQL buyout by Oracle there have been some movements: first a real fork between an Enterprise/Commercial MySQL (without cannibalizing their main DB business) and a community version (not actively maintained) by Oracle, then, as a response, forks by the open source community to restore its former glory. The main alternatives now are MariaDB (by the former creator of MySQL) and Percona; both use a MySQL cluster implementation (called Galera) in their flagship products: MariaDB Galera Cluster and Percona XtraDB Cluster. I've used MariaDB Galera 10; Percona seems more professional, but in future versions of RHEL MySQL is replaced by MariaDB. At this moment both are interchangeable.
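The Galera-specific part of the configuration is small. A minimal my.cnf fragment for one node might look like this (cluster name, node name and addresses are examples):

```ini
# my.cnf fragment: Galera prerequisites plus the wsrep settings
[mysqld]
binlog_format            = ROW      # Galera only replicates row-based events
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on              = ON
wsrep_provider        = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name    = moodle_cluster
wsrep_cluster_address = gcomm://10.0.0.21,10.0.0.22,10.0.0.23
wsrep_node_name       = db1
wsrep_sst_method      = rsync       # full-state transfer method for joining nodes
```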

The PHP app doesn't access the database directly; otherwise we would always be hitting the same node. Instead we configure HAProxy to balance the MySQL nodes, so that even if one node fails there is still MySQL service. For this migration HAProxy remains a single point of failure, but for our Moodle content this design is already overkill (and a dedicated VIP and/or dedicated proxies are too much hassle).
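The MySQL side of the HAProxy configuration runs in TCP mode but health-checks each node over HTTP on a separate port, which is where the hack described below comes in (addresses are illustrative):

```
# haproxy.cfg fragment: balance Galera nodes, checking health via port 9200
listen galera
    bind *:3306
    mode tcp
    balance leastconn
    option httpchk                     # the check is HTTP, the traffic is not
    server db1 10.0.0.21:3306 check port 9200
    server db2 10.0.0.22:3306 check port 9200
    server db3 10.0.0.23:3306 check port 9200
```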

By the way, HAProxy brings with it what is to me the most inelegant hack of all: it can only check TCP connections or HTTP requests. The former doesn't tell you whether the service is actually working (online, synced, etc.), and as for the latter, well, MySQL isn't an HTTP service. So the health checks are implemented using a xinetd daemon ( 😕 😕 😕 , so it must be installed and running): after receiving an HTTP request on a port, it calls a script (with a user and password, with grant permissions!, in plain text) that connects to MySQL and checks the status of the service. Really, really ugly!
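To make the ugliness concrete, here is a stripped-down sketch of the pattern (the canonical implementation is Percona's clustercheck; the credentials and port shown are examples):

```
# /etc/xinetd.d/mysqlchk: expose the check script on port 9200
# (also add "mysqlchk 9200/tcp" to /etc/services)
service mysqlchk
{
    disable        = no
    flags          = REUSE
    socket_type    = stream
    port           = 9200
    wait           = no
    user           = nobody
    server         = /usr/bin/clustercheck
    log_on_failure += USERID
    only_from      = 0.0.0.0/0
    per_source     = UNLIMITED
}
```

```sh
#!/bin/bash
# /usr/bin/clustercheck (simplified): answer HTTP 200 if the node is Synced, 503 otherwise.
# Yes, the MySQL credentials sit here in plain text -- exactly the complaint above.
MYSQL_USER="clustercheckuser"
MYSQL_PASS="clustercheckpassword"

STATE=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -N -s \
        -e "SHOW STATUS LIKE 'wsrep_local_state';" 2>/dev/null | awk '{print $2}')

if [ "$STATE" == "4" ]; then   # 4 means Synced in Galera
    printf "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nGalera node is synced.\r\n"
else
    printf "HTTP/1.1 503 Service Unavailable\r\nContent-Type: text/plain\r\n\r\nNode is not synced.\r\n"
fi
```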

All this implementation is coded in Ansible; I'm a bit shy about showing those playbooks on GitHub, they need some serious cleaning.
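The overall shape, though, is simple enough to sketch. A hypothetical top-level playbook for this stack (role names are invented for illustration):

```yaml
# site.yml sketch: one play per tier of the stack
- hosts: webservers
  become: yes
  roles:
    - glusterfs        # shared /opt volume for sessions and content
    - apache-php       # web server + PHP engine serving Moodle

- hosts: galera
  become: yes
  roles:
    - mariadb-galera   # MariaDB Galera Cluster node
    - clustercheck     # xinetd health-check service on port 9200

- hosts: balancers
  become: yes
  roles:
    - haproxy          # balances both HTTP and MySQL traffic
```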

Some dead ends to be acknowledged for the next year

Last post of the year; maybe it's a good moment to summarize my progress in my foolish quest to design a modern, elastic, private PaaS infrastructure for a potential migration of our (Spanish) web services (without jumping straight to an already established platform; that will be the final solution anyway, but I want to know how it works, why those processes exist, which problems there were and how they were tackled).

Goal for new year 2015

No SSH in user space!

No SSH for apps, no SSH for deploying, accessing logs, content, events, monitoring… microservices and containers for the win!

Spanish Web Services/Apps: Current Status

First of all, even an established PaaS is moot now. The reason is that our apps aren't cloud friendly: they aren't elastic, and they violate several of the twelve factors (http://12factor.net/), mainly because they were designed more than 10 years ago, and while they have been mavenized in recent years there has been no willingness to redesign them. (I found later that my connections already mentioned those concepts: Cloud enabling an app, and more specifically Re: Cloud enabling an app.)

Those factors are:

IV. Backing Services: Dependencies and external services are tightly coupled, not loosely coupled as the factor requires.
VI. Processes: Except for our AJAX web services, which are stateless, all our webs are stateful and rely on sticky sessions. Losing a node (the service on a node) implies disconnection and lost work for the clients using that node. Apps don't follow the "share nothing" paradigm.
VII. Port Binding: Our apps aren't self-contained; they need web servers in front of the JBoss app servers to serve static content which isn't in the app. It's a legacy infrastructure design.
VIII. Concurrency: Factor VI (Processes) is a prerequisite, and we don't comply with it.
IX. Disposability: Migrating from WebSphere to JBoss greatly reduced startup/shutdown times, but since we fail factor VI, disposing of services easily is still costly.
XI. Logs: Logs are a mess; there isn't a signal-to-noise ratio, there is just noise. Some apps log (at info or warn) thousands of lines per second! Each app has its own syntax; there are no error codes, no index, nothing, just developer… debug… garbage…

Cloud services are for Cloud

Most of my problems come from the lack of a lab or a corporate AWS account. A local PC, Vagrant and VirtualBox aren't the best tools; Windows, 8 GB of RAM, a corporate proxy… it isn't the best test bed to analyse how the system works.

PaaS/?aaS Infrastructure

Lots of names, lots of concepts; managers want a box tied with a bow, fellow workers look at me like an oddity; maybe they want it all done, or just no more problems.

Trying to explain the main actors in a cloud infrastructure paradigm, I found the following scheme very informative: the "strata container ecosystem" diagram.

There are a lot more actors, but the layer distribution is understandable.

Technologies like OpenStack include several products that span multiple layers (Heat, Nova, …).

Some technologies are compatible, some are interchangeable. Cloud Foundry and Heroku would sit in layers 7 and 4.

Some tools or services, like ZooKeeper, etcd, ActiveMQ, Redis or databases (both RDBMS and NoSQL), support or provide functionality to the ecosystem although they don't have a proper layer of their own.

Container Issues

Most of the technologies seem mature and ready; even Kubernetes, which is still in alpha/beta state, is already a contender. The battlefield is still Docker. Most of my previous concerns are already resolved, the ecosystem takes care of them, but I'm still thinking about how to secure it.

The problem is that Docker needs an image repository. The Docker Registry software is available, but its authentication methods are very limited: it is open by default, and anyone with access to the repository (an HTTP port) could not only download an image but also push new ones or, worse, replace an existing one. An OPS nightmare. You can't even "do a git": access goes over an HTTP port, so there is no delegating security to the OS. The only choice is an Apache/Nginx frontend; the registry accepts BASIC auth, but it's a …
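A rough sketch of that frontend workaround, with nginx terminating TLS and adding BASIC auth in front of a registry listening only on localhost (hostname, paths and port are assumptions):

```nginx
# nginx front for a private Docker registry: TLS plus htpasswd-based BASIC auth
server {
    listen 443 ssl;
    server_name registry.example.com;            # assumed hostname

    ssl_certificate     /etc/nginx/ssl/registry.crt;
    ssl_certificate_key /etc/nginx/ssl/registry.key;

    location / {
        auth_basic           "Docker Registry";
        auth_basic_user_file /etc/nginx/registry.htpasswd;  # htpasswd-managed users

        proxy_pass http://127.0.0.1:5000;        # the registry itself, never exposed directly
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-Proto https;
        client_max_body_size 0;                  # image layers can be huge
    }
}
```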

Actually that authentication exists: it's called Docker Hub and follows a SaaS model. Payment is monthly, and I don't know whether there is some kind of integration with a corporate IDM; also, the authentication is done on remote servers, so Internet access is a must too.

Another point of concern is the security/integrity of the containers themselves. The model proposed by CoreOS Rocket seems a lot more robust.

So how container authorization and security are managed is another point to watch when looking at a PaaS solution.