All posts by gGainza

LAMP revisited: on steroids

The LAMP stack (Linux + Apache + MySQL + PHP) is one of those shark architectures that has been around for so long that it looks like a dinosaur. It's established, tested, runs lots of software and is largely responsible for MySQL's success, but it feels like something that comes from another age.

Yes, sharks are living fossils. The oldest ones appeared 450 million years ago, while modern ones appeared 100 million years ago. White sharks are 16 million years old; by comparison, humans appeared 3-4 million years ago, Homo sapiens 300-400k years ago and the first vestiges of civilization merely 8-10k years ago.

Due to a request from HR to insource the training courses, which are based on Moodle, I've had some time to review this stack and check which improvements can be applied to get a more modern architecture. And by modern I mean High Availability, scalability and a cloud-friendly platform.

Usually the LAMP stack follows a 2-tier model: the web and app server tightly joined (web server + PHP engine) and the database, plus some usual suspects like load balancers, CDNs, accelerators, caches, …

Load balancing at the front end has been common for ages, usually with appliances like F5's BigIP or Radware, or a typical Apache behind a DNS round-robin. Now most hosting and cloud providers offer front balancing or even load balancing as a service, but for some years now HAProxy has positioned itself as a lightweight software load balancer and has been used to balance not only web servers but also Docker containers or, generally, any service.

The problem has always been maintaining the user session across all the nodes. The fallback or basic option is the "sticky session": a session is tied to a node, and if a node goes down all its sessions are lost (and those users are disconnected). Actually this method is used for optimization, so context switching is minimized, but if the app is not prepared to maintain the sessions it's also the method to not lose them. All our middleware apps here in Spain need "sticky sessions". Between the share and non-share strategies, most middleware solutions have followed the former, maintaining all the sessions in memory (and distributing them) or in a database. The problem is the cost: apps must be programmed to support those practices (and tested!) and there are a lot of penalties for doing it badly. Like some of our apps that need 4 MB of memory per session; share that! Ok, yes, there is the possibility of fine granulation, but if you have apps with sessions of that size it's already too late (for anything except refactoring, of course).
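As an illustration, a minimal sticky-session setup in HAProxy (a sketch; node names and addresses are invented) inserts a cookie so each client is pinned to one node:

    frontend www
        bind *:80
        default_backend moodle_nodes

    backend moodle_nodes
        balance roundrobin
        # Pin each client to the node that served it first; if that node
        # dies, its sessions die with it (the sticky-session trade-off).
        cookie SRVID insert indirect nocache
        server web1 10.0.0.11:80 check cookie web1
        server web2 10.0.0.12:80 check cookie web2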

PHP, and Moodle in particular, follows a disk strategy, storing session and content on disk, so if something isn't found in memory it goes to disk (or generates it first) to retrieve it. This allows the use of a unified, shared filesystem for serving the app: GlusterFS. Depending on the performance (we don't have any metric from the previous provider!) I'll decide whether more actions are needed (Varnish/Squid for caching content and memcached for the sessions). More info on optimizing PHP over GlusterFS in this blog: http://joejulian.name/blog/optimizing-web-performance-with-glusterfs .
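A sketch of the PHP side of that, assuming the Gluster volume is mounted at /opt/moodledata (paths are illustrative):

    ; php.ini — keep sessions on the shared GlusterFS mount so any
    ; node can pick up any user's session after a failover
    session.save_handler = files
    session.save_path = "/opt/moodledata/sessions"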

The disk layout in its simplest form must be carefully segmented. GlusterFS needs a partition for itself (depending on the setup, maybe even its own drive for striping, performance, etc.; here it'd be /opt), and clustered MySQL solutions are very sensitive to corruption due to lack of free space, so if we don't change where the databases are stored (usually under /var/lib/mysql) we must assign a partition for the logs (/var/log).
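For reference, a two-node replicated volume over that /opt partition could be created like this (a sketch; host names, the brick path and the volume name are invented):

    # Run once from web1; bricks live on the dedicated /opt partition
    gluster peer probe web2
    gluster volume create moodledata replica 2 \
        web1:/opt/bricks/moodledata web2:/opt/bricks/moodledata
    gluster volume start moodledata
    # On every web node, mount the shared volume where the app expects it
    mount -t glusterfs web1:/moodledata /opt/moodledata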

Finally, the database. Since the MySQL buyout by Oracle there have been some movements: first a real fork between an Enterprise/Commercial MySQL (but without cannibalizing their main DB business) and a community version (not actively maintained) by Oracle; then, as a response, forks by the open source community aiming to restore its former glory. The main alternatives now are MariaDB (by the former creator of MySQL) and Percona. Both use a MySQL cluster implementation (called Galera) in their flagship products: MariaDB Galera Cluster and Percona XtraDB Cluster. I've used MariaDB Galera 10; Percona seems more professional, but in future versions of RHEL MySQL is replaced by MariaDB. At this moment both are interchangeable.
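The Galera-specific part of the MariaDB configuration is surprisingly small; a minimal sketch (node names and addresses are invented, and the wsrep_provider path varies by distribution):

    [galera]
    wsrep_on = ON
    wsrep_provider = /usr/lib64/galera/libgalera_smm.so
    wsrep_cluster_name = "moodle_cluster"
    wsrep_cluster_address = "gcomm://10.0.0.21,10.0.0.22,10.0.0.23"
    wsrep_node_name = "db1"
    # Galera needs row-based replication and InnoDB
    binlog_format = ROW
    default_storage_engine = InnoDB
    innodb_autoinc_lock_mode = 2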

The access to the database isn't made directly from the PHP app to a node; otherwise we would always be hitting the same one. We must configure an HAProxy to balance the MySQL nodes, so that even if one node fails there is still MySQL service. For this migration HAProxy remains a point of failure, but for our Moodle content this design is already overkill (and a dedicated VIP and/or dedicated proxies are too much hassle).
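A sketch of that HAProxy block, assuming three Galera nodes and the HTTP health check described below listening on port 9200 (addresses invented):

    listen mysql-cluster
        bind 127.0.0.1:3306
        mode tcp
        balance leastconn
        # Don't trust a bare TCP check: ask the clustercheck endpoint
        # (served by xinetd on 9200) whether the node is actually synced
        option httpchk
        server db1 10.0.0.21:3306 check port 9200
        server db2 10.0.0.22:3306 check port 9200
        server db3 10.0.0.23:3306 check port 9200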

BTW, HAProxy implements what is to me the most inelegant hack of all: it only checks TCP connections or HTTP requests. The former doesn't tell you if the service is working (online, synced, etc.), and as for the latter, well, MySQL isn't an HTTP service. The health checks are implemented using a xinetd daemon ( 😕 😕 😕 , so it must be installed and running): after receiving an HTTP request on a port, it calls a script (with a user and pass, with grant permissions!, in plain text) that connects to MySQL and checks the status of the service. Really, really ugly!
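For completeness, the xinetd side of that hack, following the usual clustercheck pattern (port and script path are the conventional ones, but may differ per setup):

    # /etc/xinetd.d/mysqlchk
    service mysqlchk
    {
        disable         = no
        flags           = REUSE
        socket_type     = stream
        type            = UNLISTED
        port            = 9200
        wait            = no
        user            = nobody
        # clustercheck holds MySQL credentials in plain text and answers
        # HTTP 200 only when wsrep_local_state reports the node as synced
        server          = /usr/bin/clustercheck
        log_on_failure  += USERID
    }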

All this implementation is coded in Ansible. I'm a bit shy about showing those playbooks on GitHub; they need some serious cleaning.

PaaS as change enabler

In a previous post, Some dead ends to be acknowledged the next year, I set "No SSH in user space" as a goal for 2015.

For a sysadmin it’s a mind-blowing motto (it’s like abandoning decades of practices) but few people would understand the change it involves; not only technologically but also conceptually. Actually, it’s not even a goal, it’s a byproduct of a change of paradigm.

Alas, pursuing it directly would be the worst way to achieve it; there's no way to sell it (except maybe if you're Oracle sales).
What needs to be sold is the idea. The paradise of a continuous-delivery utopia, invention-driven and with low-profile bureaucracy (wait, it can be done, sign me in!).

First we should sell the idea that other cultures are possible and that they are great, the future, the way to go. "Engineering culture at Spotify part 1 and part 2" explains the new age really well; check the videos, they are fun and the ideas in them are great.

Second, this paradigm, this new way of working, involves a change of mentality. People must move on with the times.

If the people are sold (I already am, btw), then finally I'd be able to sell a technology change that would enable the paradigm change.

A brief mind map of the technology solution involved (maybe it's not the only way to achieve it, but at the moment it's the most fun):

[mind map image]

The technology follows a micro-services paradigm. That's because this way we get high performance, isolation (which allows for better agile processes, more parallelization and quick releases… check the videos), resilience and fault tolerance, and the ability to meet elastic demand. With those features I'm able to support that paradigm change.

How do I get those features? Well, not with the traditional server (standalone/virtualized) approach; an IaaS for provisioning and a PaaS for delivering are needed.

  • A (private/public) cloud concept (for provisioning and economy of scale) allows the elasticity needed.
  • The container model/engine (Docker, for example) provides the isolation, performance and development friendliness, while allowing fault tolerance through easy and quick availability and fast fleets.
  • The PaaS must provide all the facilities needed to run those containers (which is really complex: they are ephemeral, there are a lot of them, and there are a lot of fleets).
  • The data treatment is another issue, for better or worse. It must be acknowledged.

 

So after this, I'd have met my goal of no more SSH (in user space), because each container runs a reduced set of tasks (usually one). But the mental process involved in getting there isn't straightforward.
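To make the "no SSH" point concrete, a minimal single-task container could look like this (the base image and content are just an example): there is no sshd inside, the one process logs to stdout, and the platform handles the rest:

    # Dockerfile — one process, no sshd; logs go to stdout
    # so the platform, not an operator over SSH, collects them
    FROM nginx:alpine
    COPY site/ /usr/share/nginx/html/
    EXPOSE 80
    CMD ["nginx", "-g", "daemon off;"]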

Anyway there are important points left that need to be considered.

  • The applications need to be highly decoupled to be split into micro-services. It isn't an easy task, even less so with legacy software. The apps need to be designed and programmed to run in cloud environments.
  • The complexity increases brutally, shifted from the app to the ecosystem. It's reduced or controlled in two ways:
    • With an organization where each team is responsible for all the areas of a micro-service.
    • With Infrastructure as Code, so the IaaS and PaaS are just another program to manage (with the same checks and agile procedures as any other code).
  • Cooperation versus isolation barriers: one micro-service, one island. Team play, although each team has a dedicated role, is the way to resolve it, along with an intelligent use of APIs.
  • And of course the difficulties of distributed computing and Conway’s Law.

Some dead ends to be acknowledged the next year

Last post of the year; maybe it's a good moment to summarize my progress in my foolish quest to design a modern, elastic, private-infrastructure PaaS for a potential migration of our (Spanish) web services (without jumping straight to an already established platform; that will be the final solution anyway, but this way I'll know how it works, why those processes exist, which problems came up and how they were tackled).

Goal for new year 2015

No SSH in user space!

No ssh for apps, no ssh for deploying, accessing logs, content, events, monitoring… microservices and containers for the win!

Spanish Web Services/Apps current Status

First of all, even an established PaaS is moot right now. The reason is that our apps aren't cloud friendly: they aren't elastic and they violate several of the twelve factors (http://12factor.net/), mainly because they were designed more than 10 years ago, and while they have been mavenized in recent years, there has been no willingness to redesign them. (I found later that my connections already mentioned those concepts: Cloud enabling an app and, more specifically, Re: Cloud enabling an app.)

Those factors are:

IV. Backing Services: Dependencies and external services aren't loosely coupled; they are tightly coupled.
VI. Processes: Except for our AJAX web services, which are stateless, all our webs are stateful and rely on sticky sessions. Losing a node (the service in a node) implies disconnection and lost work for the clients using that node. Apps don't follow the "share nothing" paradigm.
VII. Port Binding: Our apps aren't self-contained; they need web servers in front of the JBoss app servers to serve static content which isn't in the app. It's a legacy infrastructure design.
VIII. Concurrency: Factor VI (Processes) is a prerequisite and we don't comply with it.
IX. Disposability: Migrating from WebSphere to JBoss mitigated the startup/shutdown times a lot, but failing factor VI, easy disposal of services is costly.
XI. Logs: Logs are a mess; there isn't a signal-to-noise ratio, there is just noise. Some apps log (at info or warn level) thousands of lines per second! Each app has its own syntax; there are no error codes, no index, nothing, just developer… debug… garbage…

Cloud services are for Cloud

Most of my problems come from the lack of a lab or a corporate AWS account. A local PC, Vagrant and VirtualBox aren't the best tools. Windows, 8 GB of RAM, a corporate proxy… it isn't the best test bed to analyse how the system works.

PaaS/?aaS Infrastructure

Lots of names, lots of concepts; managers want a box with a ribbon on it, fellow workers look at me like an oddity; maybe they want it all done, or just no more problems.

Trying to explain the main actors in a cloud infrastructure paradigm, I found the following container-ecosystem scheme very informative.

There are a lot more actors, but the layer distribution is understandable.

Technologies like OpenStack include several products that span over multiple layers (Heat, Nova,…).

Some technologies are compatible, some are interchangeable. Cloud Foundry and Heroku would be in layers 7 and 4.

Some tools or services like zookeeper, etcd, activemq, redis or databases (both: RDBMS and NoSQL) support or provide functionality to the ecosystem although they don’t have a proper layer.

Container Issues

Most of the technologies seem mature and ready; even Kubernetes, which is still in alpha/beta state, is already a contender. The battlefield is still Docker. Most of my previous concerns are already resolved (the ecosystem takes care of them), but I'm still thinking about how to secure it.

The problem is that Docker needs an image repository. The Docker Registry software is available, but its authentication methods are very limited: it is open by default, so anyone with access to the repository (an HTTP port) could not only download an image but also push new ones or, worse, replace an existing one. An OPS nightmare. You can't even "do a git": the access goes through an HTTP port, so there is no security delegation to the OS. The only choice is using an Apache/Nginx frontend; the registry accepts BASIC auth but it's a …
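A sketch of that frontend, assuming the registry listens locally on port 5000 and an htpasswd file already exists (server name and paths are illustrative):

    # nginx in front of a private Docker registry: TLS plus BASIC auth
    server {
        listen 443 ssl;
        server_name registry.example.local;
        ssl_certificate     /etc/nginx/ssl/registry.crt;
        ssl_certificate_key /etc/nginx/ssl/registry.key;

        location / {
            auth_basic           "Docker registry";
            auth_basic_user_file /etc/nginx/htpasswd;
            proxy_pass           http://127.0.0.1:5000;
            proxy_set_header     Host $host;
        }
    }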

Actually that authentication exists: it's called Docker Hub and follows a SaaS model. The payment is monthly, I don't know if there is some type of integration with a corporate IDM, and the authentication is done on remote servers, so Internet access is a must too.

Another point of concern is the security/integrity of the containers. The model proposed in CoreOS Rocket seems a lot more robust.

So how container authorization and security are managed is another point to control when looking at a PaaS solution.