Patterns - A Path Towards Hybrid Cloud

ˈpætɚns

Almost every time we set out to understand and solve a general problem, we ask ourselves: am I the first one to face this problem, or have others faced it before me and either failed or succeeded? This is particularly true if I am highly uncertain whether my own approach will be successful or, even worse, if I do not have a clue where to start.

According to Cambridge Dictionary, patterns are particular ways in which something is done or organized, or in which something happens, but also something that is used as an example, especially to copy.

Consequently, patterns are both guidance and examples. This is intuitively correct, since examples are often the entry point to understanding how a certain problem should or might be solved. In the world of software development, patterns are widely understood as design patterns, that is, reusable solutions to commonly occurring problems within a given context.

Now, why are patterns so important?

If you like, patterns are the opposite of re-inventing the wheel.

They save you time and effort, increase the probability that your solution really works, and (in the best case) even provide an optimal approach. They include knowledge about the diverse components used in a solution and how to utilize and leverage them in the desired way. They provide a context for the conversation about both the problem and the solution, and help bring structure to the methodology.

Patterns are especially valuable when things are becoming complex, because they can reduce complexity by focusing on the what rather than the how. This is particularly true when there is a big change, or disruption, where strong guidance is needed to master the change.

To increase the probability that a solution works, patterns need to be proven, or validated. And a pattern is never complete without the context for which it has been created. It may excel in the context it was created for, but fail in a different one.

The context of the kind of patterns we are talking about in this article is everything about the software development and operations platform, materialized through a Hybrid Cloud, supporting the full software lifecycle, starting with the demand analysis and the design phase, followed by implementation, testing, deployment and secure operation.

Background

Imagine you are the one who is responsible for setting up the software development and operations platform for your company, and you are responsible for making everything secure. “What”, you might say, “isn’t it either software development or operations, and isn’t security nothing more than a final polish?”

That was true in the past, when writing software, operating infrastructure and securing everything were highly separated tasks. But things are changing. Shorter release cycles require a tighter interlock between software creation and testing on one side and operations on the other. And security is going to run through the full life cycle. Also, the objects of operations (the infrastructure, the workloads and the security controls) are becoming more and more software-defined.

As a consequence, the contrast between software development and operations and security is fading. Even more, the interdependence of software development and operations and security becomes a key factor for being successful with the business. Where we have been organized into silos in the past, we need everything under cockpit control in the future.

This alone is a big change in how we are going to deal with IT in the future. But there is more. We are going through a change in society: the way we communicate with each other, the way we consume information and the way we perform our daily business are changing. This change is called digital transformation. Any service, any offering, any information which doesn’t match our digital expectations creates friction and slows down our own performance.

So, digitalization is key to success, but digitalization itself isn’t always easy to achieve.

But what does this have to do with patterns?

Patterns do not always make things easier. If a certain pattern is well established but the context changes so that the old pattern no longer fits, a new pattern is needed. But changing patterns is even more difficult than creating a pattern from scratch, because everything is optimized towards the old pattern, and everybody who is used to working with it will stick to it. A massive effort might be required to switch to the new pattern.

So, in order to avoid too much pain when switching patterns, the new pattern MUST consider both the old pattern and the new pattern, and provide a migration path.

Automation

Starting with the industrial age, many human tasks that could be done by machines were transferred to machines, for several reasons. We are not yet at the end of this development, because the capabilities of machines are constantly improving. Even tasks that, a couple of years ago, we thought would never be done by machines are already automated today, or appear to be candidates for automation in the future. AI/ML, and especially generative AI, are preparing the next wave of disruption.

Automation in IT isn’t a new topic. Repeatable tasks in particular have always been automated, mostly with shell scripts. Shell scripts, however, reach their limits with increasing complexity, with the range of functionality that can be accessed via shell scripts, and with working in changing teams. Fortunately, cloud makes automation easier with infrastructure-as-code and concepts such as API-first. Nevertheless, automation still needs to be manageable even in complex and changing environments.

Today automation encompasses more and more fields of application. Just as an example, increasing use of automated attacks requires more and more automated countermeasures in order to keep security at the desired level.

The question is not whether we will utilize automation, but how. The main difference from classic industrialization is that, back then, we always kept the systems under control in the end: it was always possible to tell why an automated system did what it did. That is changing with complexity, and particularly with non-deterministic, statistical decision making as done in machine learning, where decisions are a matter of probability, learning algorithms and training material rather than of configuration or rules.

We are still responsible for how automation works, how well it works, and for all its consequences. This also means optimizing processes before automating them and performing adequate quality assurance, especially for the automation itself, because automating bad processes will make things considerably worse. But automation is essential for repeatable, scalable, stable, optimized and economically successful implementations, it boosts productivity, and it is a key driver for digitalization. Again, proven patterns will help us master automation even with its complexity.

Invariants

With all that change, there are also things that still are part of the context, and therefore still need to be taken into consideration. Among those invariants are security requirements and regulations. Other invariants might be fixed budgets or existing skillsets.

This can sometimes be a challenge when the context changes from one without security requirements and regulations to one with them, for example from community development processes to production. Again, a context switch will make the transition more difficult, unless patterns exist which are capable of supporting the transition.

Hybrid Cloud

Hybrid Cloud stands for a certain paradigm. It stands for cloud, of course, and for the combined use of private and public cloud. In reality, it is much more than just that. Hybrid Cloud is the abstraction of what have been internal and external data centers in the past, with a holistic view, and with all of their infrastructure objects, development platforms, processes and policies.

Hybrid Clouds are multi-clouds by definition (the simplest Hybrid Cloud consists of one private cloud instance and one public cloud instance). But it might be desirable to start with a private cloud with the option to later extend it to a Hybrid Cloud. The ideal Hybrid Cloud offers every capability you need to develop and run services and applications, such as

infrastructure/hardware
operating systems
resource provisioning (compute, storage, network)
infrastructure services (such as DNS, backup, DR and monitoring)
security and compliance services
development and testing environments
application management
layered cloud services (such as logging, databases, identity management, AI/ML, billing and cost control etc.)
update and patch management

where cloud services can be used and workloads can be deployed across private and public clouds as needed.

Whereas the private part of the Hybrid Cloud can usually be assumed to be more or less static, public cloud offerings are primarily used for elasticity, i.e. scaling resources up and down as needed. The private cloud part normally implements special requirements that cannot be met in the public cloud context, such as data privacy or the processing of classified information.

As stated, this is a view of an ideal Hybrid Cloud. In reality, there are challenges in integrating hyperscalers into a private cloud context and in creating and enforcing consistent policies across clouds. Moving complete clusters from one cloud to another is not necessarily a breeze either, especially if you are not familiar with the topic. In addition, public cloud providers normally have limited interest in being part of a multi-cloud environment, for the simple reason of customer retention.

Practically, utilizing Hybrid Cloud generally consists of three tasks. The first is setting up the platform, the second is moving applications to that platform, and the third is implementing all processes around the life cycle of both infrastructure and applications, ideally in a standardized manner.

Here again patterns come into play, since building and operating a Hybrid Cloud without them will simply not work. Too many components, too many configuration options, too many interfaces, too many “legacy” applications and too many “good practices” will break your project. You need something which guides you through the process and shows you how it works, and which implements best practices and security-by-default. It does not have to give you exactly what you need. Rather, it should provide you with a starting point, so that the basics are there and you can concentrate on the specific aspects of your individual context, e.g. by customizing a pattern.

Organization

Adopting the Hybrid Cloud also demands new capabilities from your organization. In a static world of slowly changing data center environments, a typical organization is structured into functional silos with defined responsibilities and strict standards. Interactions between these silos are governed by processes and decision boards with multiple stakeholders. Operations are typically risk-averse and change-averse and aim for a steady state. A known working environment will typically be “frozen”, building a standard operating environment by keeping the hardware, OS and application software fixed for longer periods of time. Changes are of course possible, but need to be planned and processed in a defined, standardized way, involving all stakeholders.

In the Hybrid Cloud world, changes are the norm, and the organization must be able to adapt and react with flexibility and agility. Decisions have to be made more quickly and, at best, at the lowest possible organizational level.

Multiple clouds (read: infrastructures) also require more and different knowledge within the team. Skills need to adapt to changes and to the additional services that are continuously being offered.

It will become a challenge to keep control of a volatile operating environment. Moreover, public cloud infrastructure is maintained and configured externally. Virtualization hides the real hardware behind a logical abstraction, which is software-defined and software-controlled. The (virtual) hardware changes on the fly, operating system patches need to be applied on a daily basis, and applications are also constantly changing in a DevSecOps world. The way the organization deals with risks needs to change dramatically. While failing meant big trouble in the past, it is now an opportunity to analyze, optimize and adapt, thus creating a track of continuous improvement. And this leads to possibly the most difficult organizational challenge: creating a mindset and a leadership culture that allow for failures and provide incentives for continuous learning.

Consequently, we need to find a way to embrace the change by improving and adapting to a changing context, while still maintaining a consistent development and operations environment.

Evolution of standardized environments – Hybrid cloud patterns

So, is there a way to carry the idea of standardized environments over into the Hybrid Cloud era? Can we provide agile and flexible versions of our standard operating environments? This brings us back to the idea of using patterns. But what are the operational challenges these patterns need to solve? And how should they do that, without becoming static and difficult to change themselves?

First of all, these patterns should act as logical reference architectures, but they might also need to include concrete technologies instead of only high-level logical diagrams, so that they actually help and create value for the user by reducing the space of all possible options. We typically call this “opinionated”. There might be different patterns with different “opinions”, but under certain conditions it is advantageous if the chosen pattern determines at least the most important technology decisions. It might even be desirable that a pattern alone is capable of deploying a fully working Hybrid Cloud environment for a defined purpose from scratch. But remember, not every problem which needs to be addressed is of a technical nature.

In a rapidly changing world, the patterns implement the life cycle of the applications and services they support, but patterns also need to include life-cycle management for themselves. Change should be the norm, and therefore continuous testing, CI/CD, pipelines, etc. should be part of the concept, both for the pattern itself and for the application it serves. Generally speaking, patterns use the same mechanisms as the processes and objects they act on. Everything is software-defined, everything is “as code”.

To control the change (and to be able to undo it), it is necessary to have a (textual) description of the whole configuration, including the representation of the patterns, in a version control system. This also helps with scalability and resiliency, because it allows identical deployments to be created on different infrastructures. A modern and popular way to achieve this is GitOps.

As addressed in the beginning, we also need to change the management and control of the environment from a set of siloed views to a cockpit view of the complete stack. This requires capabilities for monitoring and structured logging that automatically correlate events across the complete stack. A declarative approach with configuration drift monitoring, like GitOps, can help here, too.
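To make the idea of configuration drift monitoring a bit more tangible, here is a minimal sketch in Python. It only illustrates the principle of comparing a declared (version-controlled) state with an observed state; it is not how any particular GitOps tool works. The function name `detect_drift` and the example settings (`replicas`, `image`, `network`, `debug`) are hypothetical and chosen purely for illustration.

```python
from __future__ import annotations

from typing import Any


def detect_drift(desired: dict[str, Any], actual: dict[str, Any], path: str = "") -> list[str]:
    """Compare the declared state with the observed state and list all differences."""
    findings: list[str] = []
    for key in sorted(set(desired) | set(actual)):
        location = f"{path}.{key}" if path else key
        if key not in actual:
            # Declared in version control, but not present in the environment.
            findings.append(f"missing: {location}")
        elif key not in desired:
            # Present in the environment, but never declared.
            findings.append(f"unmanaged: {location}")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Descend into nested configuration sections.
            findings.extend(detect_drift(desired[key], actual[key], location))
        elif desired[key] != actual[key]:
            findings.append(
                f"changed: {location} (desired={desired[key]!r}, actual={actual[key]!r})"
            )
    return findings


# Hypothetical example: what the repository declares vs. what is actually running.
desired_state = {"replicas": 3, "image": "shop:1.4.2", "network": {"tls": True}}
actual_state = {"replicas": 2, "image": "shop:1.4.2", "network": {"tls": True}, "debug": True}

for finding in detect_drift(desired_state, actual_state):
    print(finding)
```

In a real setup, the desired state would be read from the version control system and the actual state from the platform's API. A reported difference could then either raise an alert in the cockpit view or trigger an automatic reconciliation.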

Security and compliance also need to be part of the pattern right from the start. Access and secrets management as well as continuous compliance checks should be embedded into the core of these patterns. A strategy for intelligent data backup and federation as well as disaster recovery procedures is needed to be able to fulfill the resilience and reliability requirements.
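As a rough illustration of what an embedded, continuously executed compliance check could look like, consider the following Python sketch. The rules, the configuration keys (`tls`, `password`, `backup_schedule`) and the function name `check_compliance` are assumptions made for this example only. Real compliance requirements are far richer and would typically be expressed with a dedicated policy tool.

```python
from typing import Callable

# A rule is a description plus a predicate that returns True when the rule is satisfied.
Rule = tuple[str, Callable[[dict], bool]]

# Hypothetical rules, for illustration only.
RULES: list[Rule] = [
    ("TLS must be enabled", lambda cfg: cfg.get("tls") is True),
    ("secrets must be referenced, not inlined", lambda cfg: "password" not in cfg),
    ("backups must be configured", lambda cfg: bool(cfg.get("backup_schedule"))),
]


def check_compliance(config: dict, rules: list[Rule] = RULES) -> list[str]:
    """Return the descriptions of all rules that the given configuration violates."""
    return [description for description, is_satisfied in rules if not is_satisfied(config)]


# Hypothetical service configuration with two violations.
violations = check_compliance({"tls": True, "password": "hunter2"})
for violation in violations:
    print("VIOLATION:", violation)
```

Because the rules are plain data, they can live in the same version control system as the pattern itself and run in every pipeline execution, rather than only in a final audit.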

Patterns need to address a number of non-functional requirements and constraints independent of the infrastructure.

Remember: We cannot control all of the infrastructure. All public cloud instances are, when in doubt, out of our reach. If, for instance, the DNS system of one public cloud provider breaks down, there needs to be a strategy for how to proceed with the application. Depending on the criticality, this can mean waiting until the problem is resolved by the cloud provider, or automatically re-routing the traffic to another instance that is running on another, independent infrastructure.
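The decision described above can be written down as a small, explicit policy. The following Python sketch is only one possible way to express it, under the assumption that there are just two criticality levels and that health information for both infrastructures is available. All names (`Criticality`, `choose_action` and the returned action strings) are hypothetical.

```python
from enum import Enum


class Criticality(Enum):
    LOW = "low"
    HIGH = "high"


def choose_action(criticality: Criticality, primary_healthy: bool, fallback_healthy: bool) -> str:
    """Decide how to proceed when the primary infrastructure may be unavailable."""
    if primary_healthy:
        return "serve-from-primary"
    # Only critical applications justify the cost of an automatic re-route,
    # and only if the independent fallback infrastructure is actually available.
    if criticality is Criticality.HIGH and fallback_healthy:
        return "reroute-to-fallback"
    # Otherwise, wait until the cloud provider has resolved the problem.
    return "wait-for-provider"


# Hypothetical outage of the primary provider's DNS:
print(choose_action(Criticality.HIGH, primary_healthy=False, fallback_healthy=True))
print(choose_action(Criticality.LOW, primary_healthy=False, fallback_healthy=True))
```

Making this policy part of the pattern, instead of leaving it to ad-hoc decisions during an outage, keeps the behavior predictable and testable.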

Outlook

Sounds interesting? Can patterns help to get productive in the Hybrid Cloud? What should these patterns look like? Who is providing them? Are there Open Source patterns? Can I have enterprise-ready, supported patterns?

These are many and very relevant questions. We will try to answer some of them in our next posts on this topic. We are also happy to receive feedback in the comments and sorry for the cliffhanger 😉