
Patching nightmares

A few days ago I talked with a colleague from the international team (that's why this post is in English) about patching issues and strategies, and this blog is a good way to share those ideas.

First of all, patching is one of the last processes in the chain and a good indicator of the overall quality of the whole system. A solid foundation and good practices make it easier, and can even make a seamless process achievable. But the more complex or convoluted the system, the harder it is to do.

These best practices apply not only to patching but to systems management in general, and include (among others) the following:

  1. Homogeneous hardware. Putting all your eggs in one basket is a separate concern; setting it aside, having different hardware vendors, different server models and lots of systems outside the standard makes everything much more complex and failure-prone (the same goes for support contracts).
  2. Standardized installation templates. At least one, containing the lowest common denominator (LCD) of all the packages a server needs. This is critical: you want to minimize potential problems and patching time (the same applies to security and inventory). These templates also serve as testbeds for testing before the roll-outs.
  3. Installation and configuration management tools. Servers must be deployed and installed automatically with a tool. Installations that differ, or changes in the process, can (and will) cause unexpected errors that were not caught on the testbeds (because the testbeds are then no longer identical to the servers). Examples of such tools are JumpStart for Solaris, VMware templates or even Ghost images. Likewise, not using a configuration management tool (such as Puppet or Chef) means doing and changing things by hand, with no inventory or auditing.
  4. Use official packages. Whenever possible, use official packages, or at least packages whose updates integrate easily. It is easier, faster and more secure to patch or update an Apache httpd server that ships with a standard RHEL distribution than one you compiled yourself from source code.
  5. Monitoring and logging. This one is easy: you need to know that everything works as it should. Problems caused by patching (and by everything else) must be detected and analyzed, so you can be sure that a patch or upgrade roll-out has not caused any problems. Monitoring the patching and security process itself (not just the service being offered) is a bonus. 🙂
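Points 2 and 3 boil down, in practice, to detecting drift from the standard template before a roll-out. Below is a minimal Python sketch of such a check. It assumes you can already collect the installed package versions per host (for example from your configuration management tool); the host names, package names and versions are made up for illustration.

```python
from __future__ import annotations

# Hypothetical baseline: package -> version shipped in the standard template.
# Names and versions are illustrative only.
BASELINE = {
    "openssh-server": "8.0p1-19",
    "rsyslog": "8.2102.0-5",
    "sudo": "1.9.5p2-9",
}


def drift_report(
    inventory: dict[str, dict[str, str]], baseline: dict[str, str]
) -> dict[str, dict[str, str]]:
    """Per host, list baseline packages that are missing or at another version.

    Packages installed on top of the baseline (application-specific ones)
    are ignored: the template is only the lowest common denominator.
    """
    report: dict[str, dict[str, str]] = {}
    for host, installed in inventory.items():
        issues: dict[str, str] = {}
        for package, wanted in baseline.items():
            have = installed.get(package)
            if have is None:
                issues[package] = "missing"
            elif have != wanted:
                issues[package] = f"{have} (template: {wanted})"
        if issues:  # only hosts that deviate from the template are reported
            report[host] = issues
    return report


if __name__ == "__main__":
    inventory = {
        "web01": dict(BASELINE),
        "web02": {"openssh-server": "7.4p1-21", "sudo": "1.9.5p2-9", "httpd": "2.4.37-43"},
    }
    for host, issues in drift_report(inventory, BASELINE).items():
        print(host, issues)
```

Extra packages such as `httpd` on `web02` are deliberately not flagged; only deviations from the common baseline are, which keeps the report focused on what the testbeds actually validated.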

So, assuming a complex environment with several hundred servers, multiple architectures (Solaris, AIX, Windows, Linux, VMware, AS/400 and more…), different OS versions, all kinds of appliances (routers, load balancers, logging, monitoring…) and a multitude of applications on top of them (Oracle databases, WebLogic, JBoss, DB2, web servers, SAS, SAP…), well… finding a global solution, a silver bullet that fixes every issue, is a bit tricky.

Let's set aside the management aspects for now: which servers, what to do, when, for whom, etc.

A pragmatic way to handle it is to start by tackling each platform separately. Each platform usually has its own update distribution process already built into the OS.

Making full use of it is key to keeping each platform tight. Some examples:

  • Windows: WSUS (or a similar alternative) can be configured to support multiple channels, not only for OS updates but also for applications, all controlled through GPOs. It is a mature solution that is already deployed almost everywhere.
  • Linux: it depends on the flavour, but for enterprise use Red Hat calls the shots. Update distribution can be centralized or distributed, with or without Internet access, and is built around the subscription model (not only for Red Hat content but also for internal channels). Although less mature than the Windows solution, it is open source and easy to script. If points 2, 3 and 4 above are not done properly, it can cause problems.
  • Solaris: here the sensible option, really, is to start planning a migration to Linux. Not only will users notice the performance gains, and the budget the savings from not having to deal with Oracle, but the patching system was a mess and has not improved much since the buyout. There are techniques (hacks, really) for minimizing risk and downtime (ZFS, inherited zones…), but it still feels manual (single-user mode for updating, when on Linux you can even patch the kernel without rebooting?).
    Solaris 11… apparently it now comes with a new packaging system (IPS), but if you are planning to migrate to 11 anyway, why not migrate to Linux instead?
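The "use each platform's native channel" approach can be sketched as a small planner that groups an inventory by platform and assigns each group to that platform's own mechanism, leaving anything without a native channel (AS/400 or the appliances, in the example above) for manual review. This is a Python sketch under assumptions: the platform keys and channel labels are illustrative, and the code only builds a plan; it does not call WSUS, yum or any other tool.

```python
from __future__ import annotations

from collections import defaultdict

# Assumed mapping from platform to its native update channel (labels only).
NATIVE_CHANNEL = {
    "windows": "WSUS (approvals driven by GPO)",
    "rhel": "Red Hat subscription / internal channels",
    "solaris": "Solaris patching (maintenance window)",
}
MANUAL = "manual review"


def plan_by_platform(hosts: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (hostname, platform) pairs by their platform's native channel."""
    plan: dict[str, list[str]] = defaultdict(list)
    for hostname, platform in hosts:
        # Platforms without a known native channel fall back to manual review.
        channel = NATIVE_CHANNEL.get(platform.lower(), MANUAL)
        plan[channel].append(hostname)
    # Sort host names so the plan is stable and easy to diff between runs.
    return {channel: sorted(names) for channel, names in plan.items()}


if __name__ == "__main__":
    inventory = [("db01", "RHEL"), ("ad01", "Windows"), ("erp01", "AS400"), ("db02", "rhel")]
    for channel, names in plan_by_platform(inventory).items():
        print(f"{channel}: {', '.join(names)}")
```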

And this is just a brief glimpse: we have only touched on the update distribution channels and completely skipped the management aspects.

 

Separation of Powers (Part 1)

In 1748, in “The Spirit of the Laws”, Montesquieu set out the separation of powers within the state (legislative, executive and judicial) as a way to guard against misgovernment.

In change management, that is, the process of developing or maintaining one or more products over time, this concept is fundamental, and we will build our strategy on it.

There are three actors (four if auditing is handled by a separate department):

  • Development
  • Infrastructure
  • Operations

Development controls the applications' source code and the development environments. It has no physical access to the test or production environments (what separates one from the other is a topic for another day).

Infrastructure (or Systems, or Architecture, or …) controls the test and production environments. Its job is to keep the show running, come rain or snow. CYA: Development can make your life very difficult, and having everyone play by the same written, immutable rules is your first line of defence.

Operations manages the promotions between environments; it is in charge of installing applications and operating them. It has no access to either the source code or the physical machines. It works with tools (more on them another day) such as QuickBuild or Jenkins, set up by Systems, and it is the only legitimate channel for installing an application in an environment.
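The three roles above can be written down as a permission matrix plus a few rules about which permissions must never be held by the same role; checking the matrix against those rules is then trivial. This is a minimal Python sketch: the resource labels (`source`, `prod_env`, `ci_server`, …), the actions and the conflict rules are all illustrative assumptions, not taken from any particular tool.

```python
from __future__ import annotations

# A permission is a (resource, action) pair; all labels are hypothetical.
Permission = tuple[str, str]

# Hypothetical matrix built from the role descriptions above.
PERMISSIONS: dict[str, set[Permission]] = {
    "development": {("source", "write"), ("dev_env", "admin"), ("logs", "read")},
    "infrastructure": {("test_env", "admin"), ("prod_env", "admin"), ("ci_server", "admin")},
    "operations": {("ci_server", "run_deploy_job")},
}

# Assumed rules: whoever changes code must not touch production or deploy,
# and whoever administers production must not also be the one deploying.
CONFLICTS: list[tuple[Permission, Permission]] = [
    (("source", "write"), ("prod_env", "admin")),
    (("source", "write"), ("ci_server", "run_deploy_job")),
    (("prod_env", "admin"), ("ci_server", "run_deploy_job")),
]


def violations(matrix: dict[str, set[Permission]]) -> list[tuple[str, Permission, Permission]]:
    """Return (role, perm_a, perm_b) for every forbidden pair a role holds."""
    found = []
    for role, granted in matrix.items():
        for a, b in CONFLICTS:
            if a in granted and b in granted:
                found.append((role, a, b))
    return found


if __name__ == "__main__":
    print(violations(PERMISSIONS))  # the matrix above yields no violations: []
```

Keeping the rules as data means that any later "exception" (say, giving Development production access "just this once") shows up as an explicit, reviewable change to the matrix rather than as an undocumented favour.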

In general, Operations does not have much freedom (nor is it meant to): it works from manuals or procedures and never steps outside them. If there is no manual for something, there is no way they will do it. Sometimes they can also control Systems' access to the machines themselves, using tools that assign temporary users and time windows with permissions.
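The temporary-user-plus-time-window idea comes down to one check: access is allowed only if there is a grant for that user on that host and the current time falls inside its window. A minimal Python sketch; the `Grant` model and the user and host names are hypothetical, and real tools also handle account creation, revocation and auditing, which are left out here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    """A temporary access grant: which user may log into which host, and when."""

    user: str
    host: str
    start: datetime  # inclusive
    end: datetime  # exclusive


def is_authorized(grants: list[Grant], user: str, host: str, at: datetime) -> bool:
    """True only if some grant for this user and host covers the instant `at`."""
    return any(
        g.user == user and g.host == host and g.start <= at < g.end for g in grants
    )


if __name__ == "__main__":
    window = datetime(2024, 3, 2, 22, 0)
    grants = [Grant("sysadm01", "prod-db01", window, window + timedelta(hours=2))]
    print(is_authorized(grants, "sysadm01", "prod-db01", window + timedelta(minutes=30)))  # True
    print(is_authorized(grants, "sysadm01", "prod-db01", window + timedelta(hours=3)))  # False
    print(is_authorized(grants, "sysadm01", "prod-web01", window + timedelta(minutes=30)))  # False
```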

This blog is written from the Systems point of view. Other areas will have their own perspective, which is probably not the same, but they are often not subject to the same responsibilities either: if an application goes down, Systems is the one who notices (or should be; finding out from the user is a no-no, and the first and main cause of bad blood between the business and IT). There may be companies where Development (which always has access to the logs) takes responsibility (it's their application, they know it, they wrote it), analyzes the cause and fixes it… My experience is that nobody even looks at the logs, Development has to be chased because “it isn't business-critical”, the burden of proving that criticality falls on Systems, an SLA has to be met while there are changes in production every 3 days, etc.

There are no silver bullets, nor technical solutions to non-technical problems. Technically, a stable working framework is the bare minimum. Part 2 will cover why this separation is needed, what dangers it faces and why. The topic after that will be the immutability of an application's source code.