ARTICLES
- Clean Up Event Data in Ansible Event-Driven Automation
In the past few articles, we explored how to use different event sources in Ansible Event-Driven Automation (EDA). In this demo, we'll focus on how event filters can help clean up and simplify event data, making automation easier to manage. Specifically, we'll explore the ansible.eda.dashes_to_underscores event filter and how it works. When using Ansible EDA with tools like webhooks, Prometheus, or cloud services, events often come in as JSON data. These JSON payloads usually have keys with dashes in their names, like alert-name or instance-id. While this is fine in JSON, it becomes a problem in Ansible because variable names with dashes can't be used directly in playbooks or Jinja2 templates. The dashes_to_underscores filter helps solve this issue by converting those dashed keys into names that Ansible can work with more easily.
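As a quick taste of how this looks in practice, here is a minimal rulebook sketch; the webhook port, the alert-name key, and the playbook name are illustrative assumptions, not from the article:

```yaml
# Sketch: a webhook source whose payloads pass through
# dashes_to_underscores before rules evaluate them.
- name: Normalize dashed keys from a webhook
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000                        # illustrative port
      filters:
        - ansible.eda.dashes_to_underscores:
  rules:
    # The incoming JSON key "alert-name" is now reachable as alert_name.
    - name: React to a disk alert
      condition: event.payload.alert_name == "disk-full"
      action:
        run_playbook:
          name: handle_disk_alert.yml     # hypothetical playbook
```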
- How to Build the Right Infrastructure for AI in Your Private Cloud
AI is no longer optional. From fraud detection to predictive maintenance, businesses everywhere are investing in machine learning and deep learning models. But training and running these models isn't light work. They require high-performance hardware, massive storage, fast networking, and serious automation. Public clouds like AWS and Azure offer AI-ready infrastructure, but not every company wants to go that route. Whether it's for compliance, cost control, or pure performance, many teams are building AI stacks in their private cloud environments.
- Java's Quiet Revolution: Thriving in the Serverless Kubernetes Era
Along with the rise of Kubernetes, another shift is happening under the hood: the rise of serverless architecture, which is quietly rewriting the way we deploy and scale applications, with Java taking a lead. Java, usually associated with legacy code and monolithic enterprise applications, has been slowly but steadily adapting to microservices architectures and is now moving into a leaner, serverless-ready world. With tools like Knative and frameworks like Quarkus, Java has been transforming from a heavyweight language into a zero-management, Kubernetes-ready one. In this article, we will reflect on this promising transformation in Java and where it can take us in 2025 and beyond.
- Auto-Instrumentation in Azure Application Insights With AKS
Monitoring containerized applications in Kubernetes environments is essential for ensuring reliability and performance. Azure Monitor Application Insights provides powerful application performance monitoring capabilities that can be integrated seamlessly with Azure Kubernetes Service (AKS). This article focuses on auto-instrumentation, which allows you to collect telemetry from your applications running in AKS without modifying your code. We'll explore a practical implementation using the monitoring-demo-azure repository as our guide.
- The Role of AI in Enhancing DevOps Processes
An Introduction to DevOps and AI Integration
DevOps is this awesome mix of teamwork and tech that’s all about getting software developers and IT operations on the same page. It’s less about silos and more about chatting openly, working together, and using automation to pump out top-notch software faster than ever. In today’s wild, fast-moving digital world, DevOps is your ticket to staying ahead, cranking out products quicker, and always tweaking them to be better. But here’s the thing: as software delivery gets fancier with microservices, cloud setups, and slick CI/CD pipelines, DevOps teams are stuck wrestling with mountains of data, crazy-complex systems, and stuff that needs to happen right now. That’s where artificial intelligence (AI) swoops in like a superhero sidekick. With AI in the mix, teams can ditch the repetitive grunt work, spot trouble brewing before it hits, and keep everything flowing smoothly. The payoff? Software drops faster and works like a dream.
- A Modern Stack for Building Scalable Systems
In software engineering, we have a lot of tools—tens or hundreds of different tools, products, and platforms. We have SQL DBs, we have NoSQL DBs with multiple subtypes, we have queues, data streaming platforms, caches, orchestrators, cloud, and cloud versions of all of the above. We have enough. In this article, I want to describe a “basic” modern stack that will allow you to build robust and scalable systems. Its components are language agnostic and can be easily integrated into most modern-day programming languages.
- From Concept to Cloud: Building With Cursor and the Heroku MCP Server
I’ve been experimenting with Cursor as a development tool, and it’s been surprisingly helpful in my day-to-day workflow. It’s not just that it writes code — it understands context, offers suggestions in the right moments, and even anticipates what I’m about to do next. When I saw the announcement about the Heroku MCP Server, I got curious. Could I use Cursor to go beyond just writing code, and actually build and deploy an app to Heroku, primarily via chat prompts and responses? I decided to try it out.
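For reference, Cursor registers MCP servers through a JSON config (typically .cursor/mcp.json). A sketch of what wiring up the Heroku MCP Server might look like, assuming its published npm package name and an API key in the environment; verify both against Heroku's docs:

```json
{
  "mcpServers": {
    "heroku": {
      "command": "npx",
      "args": ["-y", "@heroku/mcp-server"],
      "env": {
        "HEROKU_API_KEY": "<your-heroku-api-key>"
      }
    }
  }
}
```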
- Mastering Fluent Bit: Installing and Configuring Fluent Bit Using Container Images (Part 2)
This series is a general-purpose getting-started guide for those of us wanting to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit. Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
- Platform Engineering for Cloud Teams
Platform engineering has emerged as a key practice for cloud teams, providing self-service capabilities, automation, and governance to streamline software delivery. The practice evolved out of scaling DevOps across large organizations. In this blog, we will explore the role of platform engineering, its benefits, and how cloud teams can successfully implement it.
What Is Platform Engineering?
Platform engineering is the practice of designing and building internal developer platforms (IDPs) that enable software teams to develop, deploy, and manage applications efficiently. These platforms integrate tools, infrastructure, and workflows to reduce cognitive load on developers, allowing them to focus on writing code rather than managing complex cloud environments and the processes that govern them.
- Simplifying Vector Embeddings With Go, Cosmos DB, and OpenAI
When working on applications that require vector, semantic, or similarity search, it's often useful to have a quick and easy way to create vector embeddings of data and save them in a vector database for further querying. This blog will walk you through a simple web application that allows you to quickly generate vector embeddings for various document types and store them directly in Azure Cosmos DB. Once stored, this data can be leveraged by other applications for tasks like vector search, part of a Retrieval-Augmented Generation (RAG) workflow, and more.
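A minimal sketch of the storage step using the Azure Cosmos DB Go SDK (azcosmos); the database and container names, document shape, and partition key choice are assumptions, and the vector itself would come from an embeddings call (for example via the azopenai package):

```go
package main

import (
	"context"
	"encoding/json"
	"log"

	"github.com/Azure/azure-sdk-for-go/sdk/data/azcosmos"
)

// Assumed document shape: id, source text, and its vector embedding.
type doc struct {
	ID        string    `json:"id"`
	Text      string    `json:"text"`
	Embedding []float32 `json:"embedding"`
}

func main() {
	cred, err := azcosmos.NewKeyCredential("<cosmos-key>")
	if err != nil {
		log.Fatal(err)
	}
	client, err := azcosmos.NewClientWithKey("https://<account>.documents.azure.com:443/", cred, nil)
	if err != nil {
		log.Fatal(err)
	}
	container, err := client.NewContainer("vectordb", "docs") // hypothetical names
	if err != nil {
		log.Fatal(err)
	}

	d := doc{ID: "1", Text: "hello world", Embedding: []float32{ /* from the embeddings API */ }}
	body, _ := json.Marshal(d)

	// Partition key here is the document id; real layouts may differ.
	_, err = container.CreateItem(context.Background(), azcosmos.NewPartitionKeyString(d.ID), body, nil)
	if err != nil {
		log.Fatal(err)
	}
}
```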
- Distributed Systems 101
Distributed systems are all around us: Facebook, Uber, Revolut — even the Google search engine is one of them. One search in Google can trigger tens (or hundreds) of calls to different microservices owned by Google. What is more, they are the core of what we work with: multiple services working together, or maybe a database, or just a service or two with some cache layer, or even some service that connects via an async message queue.
- Docs That Write Themselves: Scaling With gRPC and Protobuf
In this article, we’ll explore how gRPC and code generation can help you write documentation that people actually read and use, standardize communication between microservices, and avoid code duplication across services. Nowadays, many developers are already familiar with gRPC. It’s no longer surprising that teams building services prefer it for inter-service communication and even for documentation purposes.
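For instance, a service definition like this hypothetical order service doubles as the documentation source, since comments travel with the schema into generated code and rendered docs:

```protobuf
syntax = "proto3";

package orders.v1;

// Comments on RPCs and fields become the canonical docs,
// rendered by tools such as protoc-gen-doc.
service OrderService {
  // Fetches a single order by its ID.
  rpc GetOrder(GetOrderRequest) returns (Order);
}

message GetOrderRequest {
  string order_id = 1; // UUID of the order.
}

message Order {
  string order_id = 1;
  string status = 2; // e.g. "PENDING", "SHIPPED".
}
```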
- Zero Trust Isn't Just for Networks: Applying Zero-Trust Principles to CI/CD Pipelines
Zero trust has emerged as a cornerstone of modern enterprise security. It is widely applied to networks, user identities, and endpoints, yet one layer often left undersecured is the CI/CD pipeline. These systems orchestrate code validation for production deployment, and they do so with persistent credentials and broad system privileges. That is fundamentally inconsistent with the zero-trust model, in which no service, identity, or connection is trusted by default. In our environment, we realized that our pipelines had quietly been left out of security scrutiny: jobs held long-lived secrets, build containers were reused, and individual jobs had far more access than any single job should need. We recognized these risks and decided our pipelines should be treated as untrusted by default. That decision radically altered how we approach automation and access.
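The article's CI system isn't named, but as one common way to replace long-lived secrets with short-lived, per-job credentials, here is a GitHub Actions sketch using OIDC federation to assume a narrowly scoped cloud role; the role ARN, region, and deploy step are placeholders:

```yaml
name: deploy
on: push

permissions:
  id-token: write   # allow the job to request an OIDC token at runtime
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Exchanges the job's OIDC token for temporary AWS credentials;
      # no stored cloud keys anywhere in the pipeline.
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy  # scoped to this job's needs
          aws-region: us-east-1
      - run: ./deploy.sh   # hypothetical deploy step
```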
- Mastering Fluent Bit: Installing Fluent Bit From Source (Part 1)
This series is a general-purpose getting-started guide for those who want to learn about the Cloud Native Computing Foundation (CNCF) project Fluent Bit. Each article in this series addresses a single topic by providing insights into what the topic is, why we are interested in exploring that topic, where to get started with the topic, and how to get hands-on with learning about the topic as it relates to the Fluent Bit project.
- Gemma 3: Unlocking GenAI Potential Using Docker Model Runner
The demand for fully local GenAI development is growing — and for good reason. Running large language models (LLMs) on your own infrastructure ensures privacy, flexibility, and cost-efficiency. With the release of Gemma 3 and its seamless integration with Docker Model Runner, developers now have the power to experiment, fine-tune, and deploy GenAI models entirely on their local machines. In this blog, we’ll explore how you can set up and run Gemma 3 locally using Docker, unlocking a streamlined GenAI development workflow without relying on cloud-based inference services.
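As a quick sketch of the workflow, assuming Docker Desktop with Model Runner enabled; the ai/gemma3 tag is the Docker Hub AI catalog name at the time of writing, so verify it before pulling:

```shell
# Pull the model from Docker Hub's AI catalog
docker model pull ai/gemma3

# Run a one-off prompt against the local model
docker model run ai/gemma3 "Summarize what a Dockerfile does."

# List the models available locally
docker model list
```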
- Integrating Jenkins With Playwright TypeScript: A Complete Guide
In this blog post, we'll explore how to set up and integrate Jenkins with Playwright TypeScript for automated testing. This integration enables continuous integration and automated test execution in your development pipeline. Playwright is a modern, open-source automation testing framework developed by Microsoft that enables reliable end-to-end testing for web applications. It supports multiple browser engines (Chromium, Firefox, and WebKit) and allows you to write tests in multiple programming languages, including TypeScript, JavaScript, Python, and .NET. Playwright is known for its auto-wait capabilities, strong reliability, and cross-browser testing features.
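To ground this, here is a minimal Playwright test in TypeScript of the kind such a pipeline would run; the target URL and expected title are placeholders:

```typescript
import { test, expect } from '@playwright/test';

test('home page has the expected title', async ({ page }) => {
  // Navigate and assert; Playwright auto-waits for the page to settle.
  await page.goto('https://example.com');
  await expect(page).toHaveTitle(/Example Domain/);
});
```

In Jenkins, a pipeline stage would then typically install dependencies and invoke `npx playwright test` to execute the suite.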
- Feature Flag Framework in Salesforce Using LaunchDarkly
Releasing new features in a Salesforce environment can sometimes feel like walking a tightrope; one misstep can take down mission-critical processes. That’s why feature flagging has emerged as a powerful strategy. Instead of deploying features to everyone all at once, you introduce them incrementally, fine-tuning your approach along the way. In this article, I’ll share how I built a feature flag framework using custom permissions, permission sets, and an integration with LaunchDarkly. This setup has helped my team safely roll out new Salesforce functionality while maintaining complete control over who sees what and when.
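At the heart of such a framework is a guard on a custom permission. A minimal Apex sketch, where the permission API name New_Quote_Flow and the QuoteService helper are hypothetical:

```apex
// Branch on a custom permission, which permission sets (kept in sync
// with LaunchDarkly) can grant or revoke without a deployment.
if (FeatureManagement.checkPermission('New_Quote_Flow')) {
    // New behavior, visible only to users holding the permission.
    QuoteService.runNewFlow();
} else {
    // Legacy path for everyone else.
    QuoteService.runLegacyFlow();
}
```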
- Terraform Drift Detection at Scale: How to Catch Configuration Drift Early
While Terraform provides a declarative model for managing infrastructure across cloud platforms, it makes one assumption that rarely holds at scale: that deployed infrastructure is always managed exclusively through Terraform. In practice, environments evolve. Under delivery pressure, teams make manual changes, apply hotfixes directly in the cloud console, and provision infrastructure through parallel automation outside the Terraform lifecycle. These changes create configuration drift. The infrastructure stays functional but falls out of alignment with the Terraform codebase, causing unpredictable behavior, broken expectations, and sometimes even production incidents.
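A common way to catch this early is a scheduled, read-only plan. A sketch of a CI step using Terraform's refresh-only mode and its documented exit codes:

```shell
#!/usr/bin/env bash
# Exit codes for `terraform plan -detailed-exitcode`:
#   0 = no changes, 1 = error, 2 = changes (here: drift) detected
terraform init -input=false
terraform plan -refresh-only -detailed-exitcode -input=false
status=$?

if [ "$status" -eq 2 ]; then
  echo "Drift detected: live infrastructure differs from state" >&2
  exit 1   # fail the pipeline so someone reconciles the drift
elif [ "$status" -ne 0 ]; then
  echo "terraform plan failed" >&2
  exit "$status"
fi
echo "No drift detected"
```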
- Jira Restore And Disaster Recovery: Scenarios and Use Cases
It’s hard to imagine a company managing its projects without issue-tracking tools. Jira, for example, has probably become one of the most popular project management software solutions for organized teams. According to Atlassian, over 180k customers in about 190 countries use Jira in their daily work. So, what will they do if their Jira account suddenly fails to work? Every agile, project management, product management, and software development team should consider a situation like that and have a plan to overcome any disaster with minimal impact.
- Overview of Telemetry for Kubernetes Clusters: Enhancing Observability and Monitoring
Kubernetes has become the norm for deploying and managing software in a containerized manner. Its ability to dynamically manage and scale microservices has revolutionized software development. However, maintaining visibility into the availability and performance of Kubernetes clusters is no easy task. That is where telemetry comes in. Telemetry in Kubernetes involves collecting, processing, and visualizing cluster data to support cluster health, fault diagnostics, and performance optimization. In this article, we will look at why telemetry is significant, its key components and tools, and best practices for building an effective observability stack for Kubernetes.
- Chaos Engineering for Microservices
As someone who works closely with distributed systems and microservices, I've seen firsthand how complex things can get once Kubernetes, Istio, and service meshes enter the picture. The shift to a cloud-native world is exciting, but it brings new challenges — especially around resilience. We can't just hope things won’t fail — because they will. That’s where chaos engineering comes in. It’s a proactive way to build confidence in your system’s ability to handle real-world disruptions by intentionally injecting failure and observing how everything holds up.
- Monitoring journald Logs With Event-Driven Ansible
Monitoring journald is essential for keeping systems running smoothly and securely. By regularly checking logs generated by systemd, administrators can catch potential issues like failing services or resource constraints — before they turn into major problems. Beyond performance and troubleshooting, journald is a powerful tool for security and compliance. It helps track login attempts, privilege escalations, and unusual service behavior, making it crucial for detecting unauthorized access or potential cyber threats. For organizations that need to meet regulatory requirements, monitoring system logs ensures proper tracking of system changes and security policies.
About the Module
ansible.eda.journald is an Event-Driven Ansible (EDA) plugin that listens to journald logs in real time and triggers automated responses based on log events. This makes it useful for tasks like automatically restarting failed services, detecting security threats, or alerting administrators when critical system issues occur.
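A sketch of a rulebook wired to this source; the match expression, condition, and remediation playbook are illustrative assumptions, so consult the collection docs for the exact options and event shape:

```yaml
- name: Watch journald for service failures
  hosts: all
  sources:
    - ansible.eda.journald:
        match: ALL            # or narrow it, e.g. _SYSTEMD_UNIT=sshd.service
  rules:
    - name: Remediate a unit that reports failure
      condition: event.journald.message is search("Failed to start", ignorecase=true)
      action:
        run_playbook:
          name: restart_service.yml   # hypothetical remediation playbook
```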
- Unlocking the Power of Serverless AI/ML on AWS: Expert Strategies for Scalable and Secure Applications
Amazon Web Services (AWS) provides an expansive suite of tools to help developers build and manage serverless applications with ease. By abstracting the complexities of infrastructure, AWS enables teams to focus on innovation. When combined with the transformative capabilities of artificial intelligence (AI) and machine learning (ML), serverless architectures become a powerhouse for creating intelligent, scalable, and cost-efficient solutions. In this article, we delve into serverless AI/ML on AWS, exploring best practices, implementation strategies, and an example to illustrate these concepts in action.
Why Combine AI, ML, and Serverless Computing?
The fusion of serverless computing with AI and ML represents a significant leap forward for modern application development. Serverless systems scale automatically, simplify operational overhead, and use a pay-per-use model that keeps costs in check. On the other hand, AI brings capabilities like natural language processing (NLP), image recognition, and data analytics, while ML enables predictive modeling, dynamic decision making, and personalization. Together, AI and ML unlock opportunities to build intelligent applications that are not only efficient but also responsive to real-world challenges.
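To make this concrete, here is a sketch of one such building block: a Python Lambda handler that calls a Bedrock model. The model ID and the API Gateway event shape are illustrative assumptions:

```python
import json

import boto3

# Created once and reused across warm invocations; Lambda's IAM role
# supplies the credentials.
bedrock = boto3.client("bedrock-runtime")


def handler(event, context):
    # Assumed shape: API Gateway proxy event with a JSON body {"prompt": "..."}
    prompt = json.loads(event["body"])["prompt"]

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 256,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    result = json.loads(response["body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```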
- Context Search With AWS Bedrock, Cohere Model, and Spring AI
Today, we will create simple applications using the Cohere Embed Multilingual v3 model via Amazon Bedrock and Spring AI. We’ll skip over basic Spring concepts like bean management and starters, as the main goal of this article is to explore the capabilities of Spring AI and Amazon Bedrock.
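As a taste of what that looks like, a minimal sketch built on Spring AI's EmbeddingModel abstraction, assuming the Bedrock Cohere starter is on the classpath and auto-configures the bean; the embed() return type has shifted between Spring AI milestones, so check your version:

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class EmbeddingController {

    // Auto-configured by the Spring AI Bedrock Cohere starter (assumption).
    private final EmbeddingModel embeddingModel;

    EmbeddingController(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    @GetMapping("/embed")
    float[] embed(@RequestParam String text) {
        // Delegates to Cohere Embed Multilingual v3 via Amazon Bedrock.
        return embeddingModel.embed(text);
    }
}
```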
- Simulating Events in Ansible EDA: A Practical Use Case of ansible.eda.generic
When developing an Event-Driven Ansible rulebook to automate tasks like handling a server outage or responding to a failed CI/CD job, testing the logic can be tricky if we don’t have a live system constantly generating events. That’s where the ansible.eda.generic source plugin comes in handy. It allows us to define mock events and inject them directly into the EDA workflow. This makes it easy to simulate real-time scenarios, test the rule conditions, and ensure the playbooks run as expected in a safe and controlled environment. In this article, I’ll walk you through how to include payloads directly within an Ansible EDA rulebook, as well as how to read payloads from an external file and use that data in rule conditions. I’ll also include some of the parameters like loop_count and loop_delay, which will help to control the number of times an event is triggered and the delay between each trigger. These features are especially helpful for simulating and managing event flow effectively during testing and development.
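A sketch of an inline payload with the looping controls described above; the event fields and the playbook name are illustrative mock data:

```yaml
- name: Simulate a recurring server-down event
  hosts: all
  sources:
    - ansible.eda.generic:
        payload:
          - alert: server_down
            severity: critical
            host: web-01          # illustrative mock data
        loop_count: 3             # inject the payload three times
        loop_delay: 5             # wait five seconds between injections
  rules:
    - name: Page on critical alerts
      condition: event.alert == "server_down" and event.severity == "critical"
      action:
        run_playbook:
          name: handle_outage.yml   # hypothetical playbook
```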
- Understanding the Identity Bridge Framework
Modern authentication protocols, such as SAML and OpenID Connect (OIDC), rely heavily upon federation as the guiding principle for securing front-door authentication. Federation is an elegant approach for web-based applications, isolating authentication from the application through a trust established between the centralized identity provider (IDP) and a subscribing application. Armed with asymmetric key-based digital certificates, federation ensures that an application can securely leverage an external IDP service for authentication and free itself from the burden of handling user interaction during the authentication process. With federation comes the concept of single sign-on (SSO): if the centralized IDP has already established a secure authentication session, multiple applications can single sign-on and bypass redundant logins, improving user experience and reducing authentication friction.
- Building Scalable and Efficient Architectures With ECS Serverless and Event-Driven Design
In modern cloud-native application development, scalability, efficiency, and flexibility are paramount. As organizations increasingly migrate their workloads to the cloud, architects are embracing innovative technologies and design patterns to meet the growing demands of their systems. Two such technologies—Amazon Elastic Container Service (ECS) with serverless computing and event-driven architectures—offer powerful tools for building scalable and efficient systems. This article explores the key concepts, benefits, and best practices for designing architectures that leverage ECS with serverless capabilities and event-driven design patterns.
Understanding ECS and Serverless
Amazon ECS is a powerful, fully managed container orchestration service that lets developers run Docker containers on a scalable, secure infrastructure—without the hassle of managing clusters or scaling containers. Say goodbye to infrastructure headaches and focus on building your application logic.
- AWS S3 Strategies for Scalable and Secure Data Lake Storage
Amazon S3 is an object storage service that offers scalability, data availability, security, and performance. S3 is the main component of your data lake, and creating buckets with the right strategy and properties can help you consume the data from the data lake in an efficient and secure way. This article will guide you through bucket strategies for creating a data lake and discuss other considerations to keep in mind.
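As one example of a security property worth baking in at bucket creation, here is a bucket policy that rejects any non-TLS access to a raw-zone bucket; the bucket name my-datalake-raw is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-datalake-raw",
        "arn:aws:s3:::my-datalake-raw/*"
      ],
      "Condition": {
        "Bool": { "aws:SecureTransport": "false" }
      }
    }
  ]
}
```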
- Why Rate Limiting Matters in Istio and How to Implement It
In today's microservices-driven world, managing traffic smartly is just as crucial as deploying the services themselves. As your system grows, so do the risks — like overuse, misuse, and cascading failures. And if you're running multi-tenant services, it's essential to enforce request limits for each customer. That’s where rate limiting in a service mesh like Istio can make a big difference. In this post, we’ll explore why rate limiting is important in Istio and show you how to set it up effectively.
Why Was It Important for Us?
This is a continuation of the incident we faced, detailed in How I Made My Liberty Microservices Load-Resilient. One of the findings during the incident was missing rate limiting in the Istio ingress private gateway. Here are the challenges that we faced:
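To give a flavor of the setup the post builds toward, here is a sketch of a local rate limit applied at the ingress gateway with an EnvoyFilter, modeled on the example in the Istio docs; the 100-requests-per-minute token bucket and gateway labels are illustrative, and per-customer limits would instead use request descriptors with a global rate limit service:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ingress-local-ratelimit
  namespace: istio-system
spec:
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 100          # burst size
                tokens_per_fill: 100     # refill amount
                fill_interval: 60s       # refill period (~100 req/min)
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
```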
- *You* Can Shape Trend Reports: Join DZone's Software Supply Chain Security Research
Hey, DZone Community! We have an exciting year of research ahead for our beloved Trend Reports. And once again, we are asking for your insights and expertise (anonymously if you choose) — readers just like you drive the content we cover in our Trend Reports. Check out the details for our research survey below.