Welcome back All. In part 1 we covered the basic understanding of Cloud Native Applications (CNA) and more importantly its relevance to todays IT Operations Teams. Let’s start with a quick recap of what we’ve already covered:
- CNA is an evolution in software development focused on speed and agility, born out of removing the challenges of the traditional SDLC
- Ops Teams are struggling to fully operationalise modern web-based applications, i.e. build and/or maintain operational practices for hosting cloud native applications
- There is a vast array of CNA architectures, platforms and tools, all of which are in their relative infancy and require a degree of rationalisation to be useful in the enterprise
I also covered breaking down my understanding of CNA into two areas of research; Foundational Concepts and CNA Enablers, the later of which we’ll cover in this post.
How can CNA fit into existing IT Operations?..
To see where CNA Enablers might fit, I took a look at the responsibilities of a modern IT Team for application delivery. At a high-level our priorities might cover:
Development Team: Application architecture / resiliency / stability / performance, deployment, version control, data management, UX/UI.
Operations Team: Platform automation / orchestration / availability, scaling, security, authentication, network access, monitoring.
Note, this is a very generic view of the average IT Team dichotomy, but it does illustrate that there is virtually no crossover. More importantly, this shows that the core of operational tasks are still aligned with keeping hosting platform(s) alive, secure and running efficiently. So with this mind, how do we go about bringing operations and development closer together? Where will we start to see some overlap in responsibilities?
There has been a lot of commotion around containers (and by association, micro services) as the genesis of everything cloud native, however Linux containers have existed for a long time. If we filter the noise a little, it’s clear to see that containers have become essential because they address the lack of standardisation and consistency across development and operations environments, which has become more prevalent with the growing adoption of public clouds like AWS.
So what is all the fuss about? To begin to describe the simple beauty of containers, I like to think of them as a physical box where our developers take care of what’s inside the box, whilst operations ensure that the box is available, wherever it needs to be available. The box becomes the only component that both teams need to manipulate.
To overlay this onto the real world, our dev’s have to deal with multiple programming languages and frameworks, whilst we (as ops) have numerous platforms to maintain, which are often comprised of drastically different performance and security characteristics. If we introduce a container based architecture, the “box” reduces friction by providing a layer of consistency between both teams.
Note: There are plenty of awesome blogs and articles which describe the technical construct of a container in minute detail. If this is your area of interest, get Googling…
Now for me it was also important to understand that containers are not the only way to deploy a cloud native architecture (please refer to this excellent post from my VMware colleague @mreferre), but also to acknowledge that they are important for a number of reasons, namely:
- They provide a portable, consistent runtime across multiple platforms (desktop, bare metal, private & public cloud)
- They have a much smaller, more dynamic resource footprint
- They can be manipulated entirely via API
- They start in milliseconds, not seconds or minutes
- They strip away some layers which could be considered to add significant ‘bloat’ to a traditional deployment
- Taking into account all of the above they provide a better platform for stateless services
If we compare a “traditional” VM deployment to a containerised construct (diagram below), it’s evident that gen 2 (i.e. monolithic / single code base) apps often have a larger resource overhead because of their reliance on vertical scaling and the tight coupling of their constituent parts. If we have to move (or redeploy) a gen 2 app, we need to move (or redeploy) everything northbound of the VM layer, which can be considerable if we are moving app data as well.
Note: The above diagram is not intended to show a refactoring from gen 2 to gen 3, but instead how the same applications might look if architected differently from scratch.
From an operational perspective, gen 3 (ie. cloud native) apps which leverage containers and have a far greater focus on horizontal scaling, whilst greatly increasing consolidation of supporting platform resources.
As a comparison, when moving gen 3 apps between environments we only have to push the updated app code and supporting binaries/libraries not included in the base OS. This means we have a much smaller package to move (or redeploy) as the VM, guest OS and other supporting components already exist at the destination. Deployment therefore becomes far more rapid with far less dependency.
Now this is all very exciting, but in reality gen 2 and gen 3 will have to coexist for some time yet, therefore it’s probably best to have a strategy that supports both worlds. For this reason, I am researching the synergies between the two constructs which is where I believe many IT shops will thrive in the near term.
Where do we begin?..
If we start with a minimal platform, all we really need to be able to build a containerised application is; a host, an OS which supports a container runtime and a client for access. It’s entirely possible to build containerised applications in this way, but obviously we are severely limited in scalability. Once we go beyond a single host platform, management becomes far more complex and therefore requires greater sophistication in our control plane. But I guess we should try to walk before we run…
Let’s take a closer look at some of the layers of abstraction we will be working with. Note: So as not to confuse myself with too many technologies, I’ve focused my research on VMware’s Photon (for obvious reasons) and Docker, which I believe has firmly established itself as the leader in container and container management software.
Container Engine / Runtime – This is the software layer responsible for running multiple, isolated systems (i.e. containers) by providing a virtual environment that has its own CPU, memory, block I/O, network, cgroups and namespaces within a single host. It is also responsible for scheduling critical container functions (create, start, stop, destroy) in much the same way a hypervisor does.
In the case of Docker, it’s also the runtime that manages tasks from the Docker Daemon which is the interface that exposes the Docker API for client-server interaction (through socket or REST API).
Container OS – A container OS (as the name would suggest) is an operating system which provides all the binaries and libraries needed to run our code. It also enables the container engine interact with the underlying host by providing the hardware interfacing operations and other critical OS services.
Photon is VMware’s open source Linux operating system, optimised for containers. in addition to Docker, Photon also supports Rkt and Garden meaning we are not limited to a single container engine. It’s also fully supported on vSphere (and therefore vCloud Air) and it has no problems running on AWS, Azure and Google Cloud Engine (though it may be fully supported by these service providers at the time of writing).
Note: If you feel like having a play around with Photon (ISO), it can be downloaded from here, deployed directly from the public catalogue in vCloud Air, or if you want to build your own Photon image you can also fork it directly from GitHub.
Host – Our operating system still needs somewhere to run. I believe that for most of us, virtual machines are still best used here because of the sophistication in security, management and monitoring capabilities. In the short term it means we can run our containers and VM’s side by side, but it should be noted that we can also run our container OS on bare metal and schedule container operations through the control plane.
Platform – A platform in the context of operations is simply a hosting environment. This could be a laptop with AppCatalyst or Fusion, vSphere and / or private and public cloud, really any environment that is capable of hosting a container OS and the ecosystem of tools needed to manage our containers.
Basic Container Usage…
In order to make this an effective approach for our dev’s, they need self-service access to deploy code and consume resources as they see fit. The simplest approach for our dev’s is to deploy in an environment where they have full control over the resources, like their laptop.
Once we go beyond the dev laptop, our platforms might include on-premises virtual infrastructure, bare metal and public cloud. The platform itself is not really that important to our dev’s provided it has the capabilities needed to support the application. So ops really need to concentrate on transparently supporting our dev’s ability to operate at scale. With that comes operational changes, which might include:
- Secure access to the container runtime (via a container scheduling interface, which we’ll cover in the next post)
- Internal network communications to support containerised services to function at scale, including virtual routing/switching, distributed firewall, load balancing, message queuing, etc
- Secure internet and/or production network communications for application front end network traffic
- Support for auto-scaling and infrastructure lifecycle management, including configuration management, asset management, service discovery, etc
- Authentication across the entire stack defined through identity management and role based access controls (RBAC)
- Monitoring throughout the entire infrastructure stack (including the containers!)
- Patching container OS / runtime and all supporting platforms
Now I realise this is only scratching the surface, but if we listed all of the operational changes needed to incorporate this mode of delivery we would be here all day. For this reason I’m ignoring CI/CD and automation tools for the time being. Don’t get me wrong, they are absolutely critical to building a reliable self-service capability for our dev’s, but for now they are just adding a layer of complexity which is not going to aid our understanding. We’ll break it down in a later post.
So there you have it. In looking at the simple benefits that containers provide, we quickly begin to realise why so many organisations are developing cloud native capability. In the next post we’ll start to look at some of the realities of introducing a cloud native capability to our operations when working at scale.
References and credits: