Target group: If you are a solution architect or consultant looking for a high-level overview of distributed messaging solutions, this article is for you. It can also serve as a starting point if your company is growing quickly, your messaging system is struggling with the volume of traffic, and you are experiencing frequent downtime and outages: it should help you find the solution that best fits your messaging needs.
Disclaimer: I work for Red Hat and I am going to present our product, AMQ Streams. The goal is not to show that our solution is the best, and the article does not compare specific features and characteristics; it explains the concepts behind the alternative solutions. I am not an expert, but I am very curious about the topic. I wrote this article because I was lost among the many different offerings available on the market and needed to gather everything in one place.
Distributed messaging systems have become an essential component of modern software architectures. As organizations continue to adopt microservices and other distributed computing models, the need for reliable and scalable message brokers has grown significantly. Apache Kafka has become one of the most popular open-source distributed messaging systems in recent years. However, it’s not the only option available on the market. In this article, we’ll explore several distributed messaging systems, including Kafka, and compare them using a simple metaphor that demonstrates the high-level differences between open-source, self-managed, and fully managed solutions.
Briefly about open source vs proprietary
In the context of building something, open source software can be compared to a set of blueprints that are freely available for anyone to use and modify. Imagine if an architect creates a set of blueprints for a house and puts them up on a community bulletin board for anyone to use. Anyone can take those blueprints, modify them, and build their own unique version of the house without any legal or financial obligations to the original architect.
On the other hand, proprietary software can be compared to hiring an architect who has their own set of blueprints for the house, but they hold the copyright to those blueprints. In order for someone to use those blueprints, they must pay a fee for a license to do so. The architect may also have restrictions on how the blueprints can be used or modified.
Similarly, with open source software, anyone can access the code, modify it to fit their specific needs, and distribute it as they see fit without any legal or financial obligations. With proprietary software, the code is typically kept private and only available to those who have paid for a license to use it.
My goal is to classify the different types of solutions available on the market in a coherent and simple way. However, it is a challenge!
The classification started from exploring Apache Kafka, which is an open-source distributed streaming platform that is widely used in enterprise messaging systems. Some of the solutions mentioned in the article are built on top of Kafka, while others use different underlying technologies. The classification is based on several factors, including whether the solution is open source, self-managed, or fully managed, and the underlying technology it uses.
Let’s use our imagination!
Choosing between open-source, self-managed, and fully managed solutions can be compared to choosing between building your own house, hiring a contractor and an architect, or buying a turnkey property. Building your own house from construction elements requires a lot of technical expertise, but you have complete control over the final product. Hiring a contractor provides a balance between control and ease of use, but there may be hidden costs. Buying a turnkey property is the easiest option, but it’s also the most expensive.
Comparing Kafka to Other Solutions:
Open-Source technology = Building blocks:
Let’s begin with the open-source technologies, which have gained immense popularity among developers and organizations that prefer open and community-driven software. Open-source messaging systems are free to use and modify, making them a popular choice for organizations looking to save on licensing costs. However, they require significant technical expertise and resources to set up, configure, and maintain. Kafka, RabbitMQ and Apache Pulsar are three popular open-source distributed messaging systems.
Imagine that you want to build a house using Lego blocks. Apache Kafka is like a set of pre-built modular blocks that you can use to construct your house. It’s well-established and has a large community of users who contribute to its development. RabbitMQ is another set of pre-built modular blocks that you can use, but it has a different focus on message queuing and offers additional features such as message routing and priority queues.
Apache Pulsar, on the other hand, is like a newer set of modular blocks that offers similar functionality to Kafka and RabbitMQ, but with some differences in the way the blocks are designed and how they work together. Pulsar provides features such as a multi-tenant architecture, built-in support for functions, and geo-replication. It is also designed with high scalability and fault tolerance in mind, although how it compares with Kafka or RabbitMQ in practice depends on the workload.
In this metaphor, all three solutions offer different sets of pre-built blocks that can be combined to construct a messaging system. Kafka provides a solid foundation for building a messaging system, RabbitMQ offers additional features and flexibility, and Pulsar brings newer, more advanced building blocks to the table.
Self-Managed Solutions = Hiring a contractor:
Self-managed messaging systems like Red Hat AMQ Streams, Cloudera Data Flow, and Confluent Platform are like building a custom home with help from experts. You hire a contractor to build the house according to your specifications and preferences, and you’re responsible for maintaining the property once it’s built. Similarly, organizations that choose self-managed messaging systems are responsible for managing and maintaining the messaging infrastructure. They need to have a dedicated team with the necessary technical expertise to manage the system effectively.
Although self-managed solutions require more resources and expertise, they provide more control over the messaging system and allow for greater customization. Organizations can tailor the messaging system to their specific needs and have more flexibility in terms of deployment and configuration. However, this also means they bear the responsibility for the system’s performance, availability, and security. It’s like building a custom home – you get exactly what you want, but you also have to deal with the maintenance and upkeep.
Below you can find more information about the self-managed solutions.
Red Hat AMQ Streams:
Red Hat AMQ Streams is an enterprise-grade distribution of Apache Kafka that is optimized for running on the Red Hat OpenShift Container Platform. It includes additional features such as:
- a schema registry,
- distributed tracing,
- and monitoring capabilities.
Organizations that use Red Hat AMQ Streams are responsible for deploying and managing the system on their own infrastructure, whether on-premises or in the cloud.
Cloudera Data Flow:
Cloudera Data Flow is a platform built on open-source components that provides a range of data management and processing capabilities, including messaging, streaming, and batch processing. It includes Apache NiFi for data ingestion, Apache Kafka for messaging and streaming, and Apache Flink for stream and batch processing. Cloudera Data Flow is designed to be deployed on-premises or in the cloud, and organizations are responsible for managing and maintaining the system themselves.
Confluent Platform:
Confluent Platform is a distribution of Apache Kafka that includes additional features and tools to make it easier to manage and operate at scale. It includes features such as Confluent Control Center for monitoring and management, Confluent Schema Registry for managing data schemas, and Confluent Replicator for replicating data across multiple Kafka clusters. Like Red Hat AMQ Streams, Confluent Platform is designed to be deployed and managed by organizations themselves, whether on-premises or in the cloud.
Based on the three solutions mentioned, it is evident that enterprises have multiple options when it comes to self-managed messaging systems. Each solution offers its unique set of features and capabilities, but all of them require the organization to have a dedicated team with technical expertise to manage and maintain the infrastructure.
Fully Managed Solutions = Buying a turnkey property:
Similar to buying a turnkey property, users of fully managed solutions don’t need to worry about the underlying infrastructure or building and maintaining their own messaging system. Instead, they can focus on using the messaging service to fulfill their needs.
Just as turnkey properties can be customized to some extent by the buyer, fully managed messaging solutions can be configured and customized to meet specific requirements, such as message size limits or delivery guarantees. However, like a turnkey property, the level of customization may be limited to what the service provider offers. Below you can find more details about the fully managed solutions.
Amazon Simple Queue Service (SQS):
Amazon Simple Queue Service (SQS) is a fully managed messaging service that offers reliable and scalable message queuing and processing. SQS provides a highly available and durable message queue that can handle any volume of messages. It is like a vending machine that provides a set of pre-built options that you can choose from.
Azure Event Hubs:
Azure Event Hubs is a fully managed streaming platform and event ingestion service that can receive and process millions of events per second. It is designed to capture and process streaming data from a wide variety of sources and integrate with other Azure services for real-time analytics, monitoring, and alerting. A good metaphor for Azure Event Hubs would be a high-speed train that transports a large volume of passengers quickly and efficiently.
Google Pub/Sub:
Google Pub/Sub is a fully managed messaging service that enables asynchronous communication between independent applications. It offers durable message storage, real-time message delivery with low latency, and flexible message routing options. Google Pub/Sub is often used for building event-driven architectures, data pipelines, and other distributed systems. A good metaphor for Google Pub/Sub would be a reliable postal service that delivers messages and packages to their destinations quickly and securely.
Overall, fully managed solutions offer a hassle-free option for those who want an easy-to-use messaging system, similar to the convenience of buying a turnkey property. However, this comes with a price.
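To recap, the classification used in this article can be written down as a small data structure. The Python sketch below is only an illustration: the names Category, Solution, SOLUTIONS, by_category and kafka_based_solutions are made up for this example, and the kafka_based flag reflects only what this article says about each product, not a complete technical assessment.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Each category is paired with its metaphor from this article.
    OPEN_SOURCE = "building blocks"
    SELF_MANAGED = "hiring a contractor"
    FULLY_MANAGED = "buying a turnkey property"


@dataclass(frozen=True)
class Solution:
    name: str
    category: Category
    # True only if this article describes the product as Kafka itself,
    # a distribution of Kafka, or a platform that includes Kafka.
    kafka_based: bool


# Hypothetical catalogue limited to the solutions named in this article.
SOLUTIONS = [
    Solution("Apache Kafka", Category.OPEN_SOURCE, True),
    Solution("RabbitMQ", Category.OPEN_SOURCE, False),
    Solution("Apache Pulsar", Category.OPEN_SOURCE, False),
    Solution("Red Hat AMQ Streams", Category.SELF_MANAGED, True),
    Solution("Cloudera Data Flow", Category.SELF_MANAGED, True),
    Solution("Confluent Platform", Category.SELF_MANAGED, True),
    Solution("Amazon SQS", Category.FULLY_MANAGED, False),
    Solution("Azure Event Hubs", Category.FULLY_MANAGED, False),
    Solution("Google Pub/Sub", Category.FULLY_MANAGED, False),
]


def by_category(category):
    """Return the names of all solutions in the given category."""
    return [s.name for s in SOLUTIONS if s.category is category]


def kafka_based_solutions():
    """Return the names of solutions this article ties to Apache Kafka."""
    return [s.name for s in SOLUTIONS if s.kafka_based]


if __name__ == "__main__":
    for category in Category:
        print(f"{category.name} ({category.value}): {by_category(category)}")
    print("Kafka-based:", kafka_based_solutions())
```

A table like this is a starting point for research, not a verdict: you would extend it with the criteria that matter to your organization, such as cost, required team skills, or deployment model.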
Last but not least…
In conclusion, distributed messaging systems like Kafka and its competitors have revolutionized the way data is handled and processed in modern enterprises. With the rise of Big Data, IoT, and other high-volume data streams, companies need to implement solutions that can handle these volumes at scale. While proprietary solutions from hyperscalers may seem like the go-to option, open-source and self-managed solutions like Kafka and Red Hat AMQ Streams offer comparable features and functionality, with the added benefits of flexibility, customizability, and cost-effectiveness.
Choosing the right solution ultimately depends on a company’s specific needs and goals. However, with the help of the metaphor I used to differentiate open-source, self-managed, and fully managed solutions, and a simple classification model, you can start your research with a basic understanding of which solutions are available on the market.
In the end, it’s not just about choosing the most popular or hyped solution, but rather the one that fits your organization’s unique needs and can help you achieve your goals in the most efficient and effective way possible.