Kubernetes is great for managing containerized applications, and its open API allows it to be extended with third-party controllers to meet custom requirements. With custom resources, it’s possible to declare the desired cluster behavior at a higher level of abstraction, but also to manage resources completely external to your cluster. Custom Kubernetes operators provide a very flexible way to extend a GitOps setup to integrate resources that do not typically live within Kubernetes, like DNS records, databases, message queues, Kafka topics, etc.

Go is the go-to (pun intended) language for Kubernetes tooling. What if your team’s expertise lies in Java? Fortunately, building robust Kubernetes operators in Java is not only possible but practical. I’ll walk you through how to do just that, with Spring Boot and the Java Operator SDK, using my project ddns-operator as an example.

The Kubernetes control loop

The Kubernetes control loop is the process that makes sure your cluster behaves according to the resources you define. That is, it will continuously attempt to make the actual state of the cluster equal to your desired state. Your desired state is expressed by the spec part of a resource, while a resource’s status contains additional information about the actual state. The Kubernetes system component implementing this is the kube-controller-manager.

Watching Kubernetes

To be able to achieve this control loop behavior, kube-controller-manager needs to be continuously up-to-date with the relevant resources in the cluster. It would be very wasteful for it to continuously poll the Kubernetes API server with GET requests to check for updates. Instead, the API server provides the watch API. When a client makes a request like

GET /api/v1/namespaces/test/pods

it can make a follow-up watch request, setting the watch query parameter and providing the resourceVersion returned in the initial response as a starting point.

GET /api/v1/namespaces/test/pods?watch=1&resourceVersion=1234

When a watch request is made, the server keeps the HTTP connection open, and starts pushing events in real time. This way, the client can react both faster and more efficiently to changes.
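In Java, this mechanism is exposed directly by client libraries. Here’s a minimal, purely illustrative sketch using the Fabric8 Kubernetes client (assuming a reachable cluster and the kubernetes-client dependency):

import io.fabric8.kubernetes.api.model.Pod;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import io.fabric8.kubernetes.client.Watcher;
import io.fabric8.kubernetes.client.WatcherException;

public class WatchExample {
    public static void main(String[] args) throws InterruptedException {
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            // The client issues the initial GET and the follow-up watch request;
            // the server then pushes pod events over the open connection.
            client.pods().inNamespace("test").watch(new Watcher<Pod>() {
                @Override
                public void eventReceived(Action action, Pod pod) {
                    System.out.printf("%s %s%n", action, pod.getMetadata().getName());
                }

                @Override
                public void onClose(WatcherException cause) {
                    System.out.println("Watch closed: " + cause.getMessage());
                }
            });
            Thread.sleep(60_000); // keep the process (and the watch) alive for a bit
        }
    }
}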

The reconciliation process

With the ability to watch for updates on resources, the rest of the control loop is pretty straightforward. These are the main steps.

  1. You apply a manifest for a resource, for example, a Deployment. The controller observes this change.
  2. The controller applies the change. In the case of a Deployment, it reacts by creating or updating a corresponding ReplicaSet.
  3. Based on the outcome of the reconciliation process, it writes information into the status field of the reconciled resource.

[Diagram: reconciliation]

Specifically for a Deployment, that looks like:

[Diagram: Deployment reconciliation]

This feedback loop is the core of what Kubernetes does. Operators are a way to extend it to new kinds of resources.

Running pods

The final step of our deployment example does not go through kube-controller-manager. Two other components are involved in this missing link between creating a pod through ReplicaSet reconciliation and actually running an image on a machine in the cluster.

First, kube-scheduler, a separate Kubernetes control plane process, monitors unassigned pods and updates those pods’ specs to reference a specific nodeName, based on various conditions and settings. The final piece is kubelet, the agent that runs on each worker node. It picks up on pods assigned to its node and orchestrates the remaining work, from pulling the image to running the container.

That completes the picture of how the Kubernetes control loop gets your deployments running. For our purposes here, however, this part is not very relevant: interfacing with external systems will be done directly within the operator, without further indirection within Kubernetes.

The Kubernetes operator pattern

The operator pattern lets you extend Kubernetes by adding custom resources and writing controllers that manage them. These resources act just like built-in ones (Pod, Deployment, etc.). Their definition is itself a built-in resource, the CustomResourceDefinition.

A custom resource definition looks like this

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: cats.test.example.com
spec:
  group: test.example.com
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                name:
                  type: string
            status:
              type: object
              properties:
                alive:
                  type: boolean
  scope: Namespaced
  names:
    kind: Cat
    # these are the names you want to use in tools like kubectl
    # > `kubectl get cats`
    plural: cats
    singular: cat
    shortNames:
      - ct

and a concrete resource would be

apiVersion: test.example.com/v1
kind: Cat
metadata:
  name: kitty-the-cat
spec:
  name: kitty

Custom resources usually represent what would otherwise be a manual orchestration of internal or external resources. One possibility is an abstraction of what would otherwise be a manual process within Kubernetes: for example, an application resource could encapsulate the series of related deployments, services, etc. that you would normally create separately to deploy that application. The other possibility is a resource that represents or mirrors some external resource living completely outside the cluster, for example:

  • DNS zones
  • Kafka topics
  • Databases
  • Message queues
  • SSL certificates

With operators, you can declaratively manage these external systems using YAML-based resources. Those resources are first-class citizens of Kubernetes, meaning they can be easily integrated into a GitOps setup. That way, a single GitOps mechanism can become the single source of truth and the engine behind not just your Kubernetes deployments, but any external system you augment it with.

In the end, the goal is to establish a similar control loop to the one in kube-controller-manager.

[Diagram: operator reconciliation]

Don’t reinvent the wheel: community operators

Before looking at implementing a custom operator yourself, it’s worth checking what the extensive Kubernetes community has come up with already. For many use cases, there are production-proven third-party operators available, often directly affiliated with the product they manage.

Many tools and applications are now distributed for Kubernetes as operators. Often, this is the easiest and preferred way to deploy them.

  • Argo CD: A declarative continuous deployment tool for Kubernetes, syncing Git repositories with resources on a Kubernetes cluster. If you’re doing GitOps, you’re probably running Argo CD or one of its competitors.
  • cert-manager: Automates the management and renewal of TLS certificates, automatically making HTTPS work for your ingresses and services.
  • Postgres Operator: Provides automated deployment, lifecycle management, backup, user management, syncing with cloud databases, etc. to run PostgreSQL databases right within your cluster. While this is a great way to deploy Postgres, it’s more common to use a managed (Postgres or other) database service from a cloud provider.
  • External Secrets Operator: Pulls secrets from external stores (like AWS Secrets Manager, Azure Key Vault, etc.) and injects them into Kubernetes. This is one of the two common ways to get secrets into your cluster when using a GitOps flow (DO NOT drop them in your GitOps repository, obviously!), the other common option being sealed-secrets.
  • Elastic Cloud on Kubernetes: Automates running and scaling Elasticsearch and Kibana; a very easy way to deploy a full Elastic stack.

Honorable mention: many clusters have Reloader installed. It is not technically an operator, because it only operates on Kubernetes-native resources (ConfigMaps and Secrets). That means, if you’re playing Kubernetes taxonomy, it’s just a controller. An operator is the combination of at least one custom resource and a controller managing those resources.

For a catalog of third-party operators, have a look at OperatorHub.

Why Java?

Because Kubernetes itself is written in Go, many related community projects are also developed in Go. Because of that, Go may be the first language on your mind when considering a custom operator. However, since all interactions with Kubernetes go through its HTTP API, there is no real technical reason to prefer Go.

Development teams are often particularly comfortable with the specific programming language they use for their projects. Very often, in an enterprise environment, that’s not Go, but maybe Java, C#, Python or JavaScript. For each of these, the community provides not just a Kubernetes client library, but a specific Kubernetes operator framework too. There’s even shell-operator, if you’re that kind of person.

So, while Go is seen as the default for developing around Kubernetes, it can be much more comfortable and maintainable in the long run to pick your favorite language. This also means you can use any libraries you’ve written or are used to working with. Monitoring, testing, logging, and similar typically framework-heavy aspects can work just like in the other applications you’re already running. Specifically for Java, the Java Operator SDK works well combined with Spring Boot or Quarkus, such that your custom operator is just another Spring Boot app, happily running among your other apps in the cluster. The bottom line here is consistency between “normal” apps and this new type of beast.

The Java Operator SDK

As previously mentioned, theoretically, all you need to build your operator is the ability to interface with the Kubernetes API. In that regard, any Kubernetes client library, like the one by Fabric8, or even any plain old HTTP client, will do. However, nearly all of the additional operator-specific boilerplate and complexity can be handled by the Java Operator SDK (see the bootstrap sketch after this list). The benefits of using it include:

  • CRD generation from Java POJOs
  • Kubernetes event listening handled for you
  • Handling of missed events and of events caused by the operator’s own changes
  • Out-of-the-box retry policies and error handling
  • Finalizer management when handling resource deletion
  • Caching and indexing for optimized reconciliation
  • Leader election between multiple replicas
  • Simplified testing
  • Clean Spring Boot or Quarkus integration
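To give an idea of how much that saves, this is roughly all the bootstrap code a bare-bones operator needs when not using Spring Boot or Quarkus (a sketch; MyReconciler stands in for a Reconciler implementation like the ones shown later):

import io.javaoperatorsdk.operator.Operator;

public class OperatorMain {
    public static void main(String[] args) {
        Operator operator = new Operator();
        // Registering a reconciler makes the framework watch its custom
        // resource and route events to the reconcile method.
        operator.register(new MyReconciler());
        operator.start();
    }
}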

Use case: dynamic DNS operator

To demonstrate the Java Operator SDK, I’ll walk through a real use case for a custom operator. Let’s say I want my home server reachable at server.inias.eu, but, as is typical for residential internet contracts, my ISP changes my public IP address frequently. The situation looks like this:

[Diagram: DDNS use case]

The solution to this is often to use a dynamic DNS service like No-IP, DuckDNS, ddclient or a custom script. These types of programs are very simple. They run in the background and do one thing: check your IP address about every 5 minutes and call an API to update one or more DNS records.
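The essence of such a client fits in a handful of lines. Here’s a sketch (the IP lookup URL and updateDnsRecord are placeholders, not any specific service’s API):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class NaiveDdnsClient {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String lastIp = null;
        while (true) {
            // Ask an external "what is my IP" service for the current public address.
            String ip = http.send(
                    HttpRequest.newBuilder(URI.create("https://checkip.example.com")).build(),
                    HttpResponse.BodyHandlers.ofString()
            ).body().trim();
            if (!ip.equals(lastIp)) {
                updateDnsRecord(ip);
                lastIp = ip;
            }
            Thread.sleep(5 * 60 * 1000); // check again in 5 minutes
        }
    }

    private static void updateDnsRecord(String ip) {
        // Placeholder for the DNS provider's update API call.
    }
}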

The method I was previously using was a custom script that ran every 5 minutes through a cron job. However, I wanted:

  • A simple way to fully manage DNS records, instead of having to create the record manually, then edit a script to keep it in sync
  • Better observability
  • To get rid of my only cron job

Since I was already in the process of migrating my small army of Docker Compose files to run as a single-node Kubernetes cluster with Argo CD, I was interested in writing a custom operator to do this. It ended up being exactly what I wanted.

To experiment a bit further, I wanted to support hosting static websites behind the created records, with content based on Markdown embedded into the resource manifests, and a minimal templating system. Of course, this is for demo purposes; it would not make much sense to host a website like that in production. The resulting project is open source and available on GitHub: ddns-operator.

Managing DNS records

Let’s start with the custom resources that describe the functionality implemented by the operator. I’ll show some actual examples instead of the CustomResourceDefinitions.

To keep it a bit generic, the first resource describes a DNS zone. Defining the top-level domain name and the Cloudflare credentials at this separate level allows you to potentially manage multiple domains across different Cloudflare accounts.

apiVersion: ddns.inias.eu/v1
kind: CloudflareZone
metadata:
  creationTimestamp: "2025-05-03T14:18:37Z"
  generation: 1
  name: inias-eu
  namespace: ddns
  resourceVersion: "576631"
  uid: 49fb2fe2-090d-4e69-8568-b2b1c27d30fd
spec:
  apiTokenSecretRef:
    key: api-token
    name: cloudflare-secret
  domain: inias.eu
status:
  id: 7d1f066a1cb04bd229858bff04b72a3a
  observedGeneration: 1

The resource specifies the domain managed in Cloudflare and the token to use for calling the Cloudflare API. apiTokenSecretRef refers by name to a secret in the same namespace and a key within that secret. Its status.id, set after reconciliation, contains the actual id of this zone in Cloudflare. Besides providing useful info, it prevents unnecessary API calls to look up the zone by name when handling the related DNS records.

Next up is the actual DNS record.

apiVersion: ddns.inias.eu/v1
kind: CloudflareRecord
metadata:
  creationTimestamp: "2025-07-20T23:15:49Z"
  finalizers:
    - cloudflarerecords.ddns.inias.eu/finalizer
  generation: 1
  name: demo-record
  namespace: ddns
  resourceVersion: "8390910"
  uid: 21db10d7-89fb-4dd0-ad24-c8b88357b323
spec:
  name: ddns-demo
  zoneRef: inias-eu
  proxied: true
status:
  host: ddns-demo.inias.eu
  id: 4a802f4b3d1e80e5069aa36ce8af1588
  lastSyncedIp: 206.58.182.184
  lastUpdateTime: "2025-07-21T12:25:19.806555297Z"
  observedGeneration: 1

This defines the subdomain (ddns-demo.inias.eu) to reconcile. Notice that it doesn’t include much at all. In order to create a DNS A-record with your home IP address, you don’t really need any other specifications besides the name of the subdomain and a reference to the zone. The content of zoneRef refers to a CloudflareZone resource in the same namespace, in this case the inias-eu zone shown before. I’ve added a proxied flag to indicate whether this record should have Cloudflare’s proxying setting turned on. In the status, there’s some useful information like the current IP address, the update time, and the id of this record in Cloudflare.

When processing these two resources, the operator should

  1. Resolve the current public IP.
  2. Fetch any existing corresponding Cloudflare record.
  3. Update the record if the IP or the proxy setting has changed.

Serving static sites

Now that we have DNS records, we can create additional resources that reference them. The resources managing a static site need to reference a DNS record, since we need the target domain in order to deploy an Ingress.

The idea is to have at most one site resource per DNS record resource and different page resources referencing it. During HTML generation, the root of the domain will show an index page that will be populated with links to the different pages, which will be created behind subpaths of the domain.

A site resource looks like this.

apiVersion: ddns.inias.eu/v1
kind: Site
metadata:
  name: demo-site
spec:
  cloudflareRecordRef: demo-record
  indexTemplate: |
    <html>
      <body>
        <h1>Index</h1>
        {{index}}
      </body>
    </html>
  pageTemplate: |-
    <html>
      <head>
        <title>{{title}}</title>
      </head>
      <body>
        <header>
          <a href="index.html">&larr; Back to index</a>
        </header>
        <main>
          <h1>{{title}}</h1>
          {{content}}
        </main>
        <footer>Powered by Kubernetes 😎</footer>
      </body>
    </html>

There are three spec fields. One is again a reference by name to a different resource, the CloudflareRecord we looked at earlier. The indexTemplate and pageTemplate respectively provide some HTML templating for the index page and for the actual pages.

All that’s missing now is the page resource.

apiVersion: ddns.inias.eu/v1
kind: Page
metadata:
  name: demo-page
spec:
  siteRef: demo-site
  path: demo
  title: Demo Page
  content: |
    This is a simple demo page.

It includes a reference to the Site resource in siteRef. Then, path specifies the path at which the page should be hosted. In this case, the resulting path would be https://ddns-demo.inias.eu/demo.html. title speaks for itself; it’s injected into the HTML template. The same goes for content, except that the content is expected in Markdown format and converted to HTML before it’s inserted into the template.

In a nutshell, the operator will

  • Generate a ConfigMap of HTML files for page content.
  • Create a Deployment (Nginx) to serve the site.
  • Mount the generated contents from the ConfigMap.
  • Expose it with a Service and Ingress.
  • Annotate the Ingress for cert-manager, to support HTTPS.

Have a peek at the end result in action at https://ddns-demo.inias.eu.

Implementation

Now that the custom resources have been designed, the only thing still missing is the actual operator application. As extensively foreshadowed, we’ll use the Java Operator SDK! Like all frameworks, the Java Operator SDK attempts to reduce our concerns to just the aspects that are actually specific to our use case, like the reconciliation logic.

The framework allows you to focus on these three main concerns:

  • Custom resources
  • Reconciliation logic
  • Dependencies between resources

However, we’ll start with a general look at how the Java Operator SDK works.

The event system

The Java Operator SDK is an event-driven framework. Its internal event system is what triggers the execution of the reconciliation methods you’ll implement. The source of such an event can be a creation, deletion or modification of the resource manifest, but also a change in a different resource it depends on. The framework will also trigger reconciliations every once in a while, even if nothing happens, just to further reduce the risk of drift. The framework also makes sure you don’t double-reconcile when events happen around the same time, triggers retries when an exception is thrown, and handles similar edge cases.

You can hook into that event system on both sides. By writing reconcilers for custom resources defined as Java POJOs, the framework knows what to listen for in Kubernetes and also what to call for reconciliation: our Reconciler implementations. We’ll do just that in the next sections.

[Diagram: the event system]

By using the Spring Boot starter, we can define our reconcilers as Spring beans to avoid even more boilerplate.

<dependency>
  <groupId>io.javaoperatorsdk</groupId>
  <artifactId>operator-framework-spring-boot-starter</artifactId>
  <!-- Remember to check the actual latest version! -->
  <version>6.1.0</version>
</dependency>

Custom resources

The Operator SDK is intended to be used with the CustomResourceDefinition (CRD) generator provided by the Fabric8 Kubernetes client library. It allows you to write your CRDs as Java POJOs and generate the corresponding YAML manifests. Otherwise, you’d have to keep both of those in sync manually.

<plugin>
  <groupId>io.fabric8</groupId>
  <artifactId>crd-generator-maven-plugin</artifactId>
  <!-- Remember to check the actual latest version! -->
  <version>7.3.1</version>
  <executions>
    <execution>
      <phase>compile</phase>
      <goals>
        <goal>generate</goal>
      </goals>
    </execution>
  </executions>
</plugin>

You’ll need to copy the generated CRD manifests from target/classes/META-INF/fabric8/ into your GitOps repo, or apply them directly to the cluster.

Let’s look at an example of a custom resource defined as a Java POJO.

// CloudflareRecordCustomResource.java
import io.fabric8.kubernetes.api.model.Namespaced;
import io.fabric8.kubernetes.client.CustomResource;
import io.fabric8.kubernetes.model.annotation.*;

@Group("ddns.inias.eu")
@Version("v1")
@Kind("CloudflareRecord")
@Plural("cloudflarerecords")
public class CloudflareRecordCustomResource
        extends CustomResource<CloudflareRecordSpec, CloudflareRecordStatus>
        implements Namespaced {
}

// CloudflareRecordSpec.java
import io.fabric8.generator.annotation.*;

public record CloudflareRecordSpec(
        @Required String zoneRef,
        @Required String name,
        @Default("true") boolean proxied
) {
}

// CloudflareRecordStatus.java
import java.time.Instant;

public record CloudflareRecordStatus(
        Long observedGeneration,
        String id,
        String lastSyncedIp,
        Instant lastUpdateTime,
        String host
) {
}

Note that we’re extending the abstract CustomResource class from Fabric8 here. Check the documentation for an overview of all the annotation options.

Reconciliation and dependencies

The Java Operator SDK supports two styles of reconciliation. Each has its benefits and drawbacks. There’s the workflow / dependent resources style, which is the most “managed” approach. It works especially well when you’re only dealing with Kubernetes native resources. Linking in external resources is possible, but it’s a bit limited when you’re not completely aligned with the intended use cases. On the other hand, there’s the more manual approach, where you’re more directly tapping into the event source system of the framework. We’ll look at both approaches.

Workflow and dependent resources: the site reconciler

Using the workflow + dependent resources approach, you mostly define the relationship between primary and secondary resources: a dependent (secondary) resource’s desired state is derived from its primary resource. We’ll use this approach to implement the Site reconciler. We don’t implement a separate Page reconciler. Instead, reconciling a Site looks up all the related pages, and takes care of the whole site with subpages from there.

Let’s look at the reconciler that enables this.

@Component
@ControllerConfiguration
@Workflow(dependents = {
        @Dependent(
                name = "site-configmap",
                type = SiteConfigMapDependentResource.class
        ),
        @Dependent(
                name = "site-deployment",
                type = SiteDeploymentDependentResource.class,
                dependsOn = "site-configmap"
        ),
        @Dependent(
                name = "site-service",
                type = SiteServiceDependentResource.class,
                dependsOn = "site-deployment"
        ),
        @Dependent(
                name = "site-ingress",
                type = SiteIngressDependentResource.class,
                dependsOn = "site-service"
        )
})
public class SiteReconciler implements Reconciler<SiteCustomResource> {
    @Override
    public UpdateControl<SiteCustomResource> reconcile(SiteCustomResource siteResource, Context<SiteCustomResource> context) {
        siteResource.setStatus(new ObservedGenerationStatus(siteResource.getMetadata().getGeneration()));
        return UpdateControl.patchStatus(siteResource);
    }
}

There’s barely any logic here, just the status calculation. Everything else lives in the DependentResource implementations. It’s a declarative workflow with dependent resources for a ConfigMap, a Deployment, a Service and an Ingress, all defined in terms of the Site resource. Those resources also depend on each other in that order. That is, the framework will reconcile the ConfigMap before the Deployment, and so on.

Let’s look at a complete implementation for SiteConfigMapDependentResource.

import com.vladsch.flexmark.html.HtmlRenderer;
import com.vladsch.flexmark.parser.Parser;
import eu.inias.ddnsoperator.crds.page.PageCustomResource;
import eu.inias.ddnsoperator.crds.site.SiteCustomResource;
import io.fabric8.kubernetes.api.model.ConfigMap;
import io.fabric8.kubernetes.api.model.ConfigMapBuilder;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.processing.dependent.kubernetes.CRUDKubernetesDependentResource;

import java.util.*;

public class SiteConfigMapDependentResource
        extends CRUDKubernetesDependentResource<ConfigMap, SiteCustomResource> {
    private static final Parser PARSER = Parser.builder().build();
    private static final HtmlRenderer RENDERER = HtmlRenderer.builder().build();

    public SiteConfigMapDependentResource() {
        super(ConfigMap.class);
    }

    @Override
    protected ConfigMap desired(SiteCustomResource site, Context<SiteCustomResource> context) {
        List<PageCustomResource> pages = getPageResources(context);
        Map<String, String> htmlFiles = new HashMap<>();
        htmlFiles.put("index.html", generateIndexHtml(pages, site.getSpec().indexTemplate()));
        for (PageCustomResource page : pages) {
            String path = page.getSpec().path() + ".html";
            htmlFiles.put(path, renderPage(page, site.getSpec().pageTemplate()));
        }
        return new ConfigMapBuilder()
                .withNewMetadata()
                .withName(site.getMetadata().getName())
                .withNamespace(site.getMetadata().getNamespace())
                .endMetadata()
                .withData(htmlFiles)
                .build();
    }

    private static List<PageCustomResource> getPageResources(Context<SiteCustomResource> context) {
        return context.getSecondaryResourcesAsStream(PageCustomResource.class)
                .sorted(Comparator.comparing(p -> p.getSpec().title()))
                .toList();
    }

    private String renderPage(PageCustomResource page, String template) {
        String htmlBody = RENDERER.render(PARSER.parse(page.getSpec().content()));
        if (template == null) {
            return htmlBody;
        }
        String title = page.getSpec().title();
        return template
                .replace("{{title}}", title)
                .replace("{{content}}", htmlBody);
    }

    private String generateIndexHtml(List<PageCustomResource> pages, String template) {
        if (template == null) {
            template = "<h1>Index</h1>\n{{index}}";
        }
        String indexList = generateIndexListHtml(pages);
        return template.replace("{{index}}", indexList);
    }

    private String generateIndexListHtml(List<PageCustomResource> pages) {
        StringBuilder sb = new StringBuilder("<ul>");
        for (PageCustomResource page : pages) {
            sb.append("<li><a href=\"")
                    .append(page.getSpec().path())
                    .append(".html\">")
                    .append(page.getSpec().title())
                    .append("</a></li>");
        }
        sb.append("</ul>");
        return sb.toString();
    }
}

As you can see, the implementation is very straightforward. You simply fetch the related pages, use Flexmark to render the Markdown to HTML, and implement some crude caveman-style “templating” using string replacements.
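The other dependent resources follow the same desired-state pattern. As a rough sketch of what the Ingress one could look like (abridged and partly assumed: the issuer name, service port and host resolution are illustrative, see SiteIngressDependentResource in the repository for the real version):

import io.fabric8.kubernetes.api.model.networking.v1.Ingress;
import io.fabric8.kubernetes.api.model.networking.v1.IngressBuilder;
import io.javaoperatorsdk.operator.api.reconciler.Context;
import io.javaoperatorsdk.operator.processing.dependent.kubernetes.CRUDKubernetesDependentResource;

public class SiteIngressDependentResource
        extends CRUDKubernetesDependentResource<Ingress, SiteCustomResource> {

    public SiteIngressDependentResource() {
        super(Ingress.class);
    }

    @Override
    protected Ingress desired(SiteCustomResource site, Context<SiteCustomResource> context) {
        String name = site.getMetadata().getName();
        String host = resolveHost(site, context);
        return new IngressBuilder()
                .withNewMetadata()
                .withName(name)
                .withNamespace(site.getMetadata().getNamespace())
                // The cert-manager annotation; a spec.tls block referencing a
                // certificate secret would complete the HTTPS setup (omitted here).
                .addToAnnotations("cert-manager.io/cluster-issuer", "letsencrypt")
                .endMetadata()
                .withNewSpec()
                .addNewRule()
                .withHost(host)
                .withNewHttp()
                .addNewPath()
                .withPath("/")
                .withPathType("Prefix")
                .withNewBackend()
                .withNewService()
                .withName(name) // the Service created by SiteServiceDependentResource
                .withNewPort().withNumber(80).endPort()
                .endService()
                .endBackend()
                .endPath()
                .endHttp()
                .endRule()
                .endSpec()
                .build();
    }

    private String resolveHost(SiteCustomResource site, Context<SiteCustomResource> context) {
        // Assumed lookup: the host ultimately comes from the referenced
        // CloudflareRecord's status (e.g. ddns-demo.inias.eu).
        return context.getSecondaryResource(CloudflareRecordCustomResource.class)
                .map(r -> r.getStatus().host())
                .orElseThrow();
    }
}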

Because we’re extending CRUDKubernetesDependentResource, there’s only a single method to implement. You simply need to indicate what your desired secondary resource looks like in terms of the primary resource. There are other out-of-the-box dependent resource implementations available to extend from, even specific ones that track external state in both directions. That way, you could support not only pushing resource changes to the external system, but also polling for direct changes in the external system, and remediating any drift that has occurred. In my opinion, these are hard to work with. You will have to jump through some hoops and you won’t end up with a simple, clean and declarative dependent resource in the end. For that reason, the Cloudflare part of the operator is handled with a manual reconciliation flow.

Another reason to avoid this approach when interfacing with external systems is that it gets hard to work cleanly with Spring Boot. While the Spring Boot starter is available and we’re using it, the Java Operator SDK was not designed with Spring Boot in mind. For this workflow-based reconciler, we have to pass the dependent resource classes in the @Workflow annotation. The classes are instantiated within the framework, and are expected to have parameterless constructors. In order to inject other components into them, we’d have to use AspectJ load-time weaving (@Configurable), which is entirely feasible, but generally not considered ideal.

Finally, because we implement reconciliation fully at Site level, there’s a missing link. We want to reconcile a Site again when any of its Pages changes, or when pages are added or removed. The framework cannot automatically know about this. To support it, an InformerEventSource is used. In fact, that’s what allows us to call context.getSecondaryResourcesAsStream in the implementation shown above. We’ll need informer event sources anyway for the manual flow, so we ignore this here, and cover it in the next section. If you’re curious, the implementation can be found in SiteReconciler.java.

Manual flow

For the Cloudflare zones and records, we’ll use the manual flow. I’ll present some aspects of the CloudflareReconciler. Feel free to have a look at the complete implementation.

The class signature looks like this. Here, a maximum reconciliation interval of 5 minutes is configured, to ensure the record is reconciled at least every 5 minutes in case the IP address changes. It would also be possible to implement this as an event source polling for the external IP, but this is a very simple way to achieve similar behavior.

@Component
@ControllerConfiguration(
        maxReconciliationInterval = @MaxReconciliationInterval(interval = 5, timeUnit = TimeUnit.MINUTES)
)
public class CloudflareRecordReconciler
        implements Reconciler<CloudflareRecordCustomResource>, Cleaner<CloudflareRecordCustomResource>

That is, this implements Reconciler and Cleaner. The latter supports deleting Cloudflare records when the corresponding Kubernetes resources are deleted. Because cleanup is implemented, the framework will take care of adding a finalizer to each CloudflareRecord, and only remove it after the cleanup method returns successfully. No @Workflow annotation this time, since dependencies are managed via event sources, and syncing the Cloudflare resources happens right in the reconcile method. Here’s a version of that method with extra comments, some other methods and intermediary services inlined, and an edge case trimmed out. It should speak for itself.

@Override
public UpdateControl<CloudflareRecordCustomResource> reconcile(
        CloudflareRecordCustomResource recordResource,
        Context<CloudflareRecordCustomResource> context
) {
    String namespace = recordResource.getMetadata().getNamespace();
    
    // Get public IP
    String publicIp = publicIpService.getPublicIp();

    // Get the CloudflareZone resource
    CloudflareZoneCustomResource zoneResource = context.getClient()
            .resources(CloudflareZoneCustomResource.class)
            .inNamespace(recordResource.getMetadata().getNamespace())
            .withName(recordResource.getSpec().zoneRef())
            .require();
    String zoneId = zoneResource.getStatus().id();

    // Fetch the content of the cloudflare token secret from the reference in the zone resource
    SecretReference secretReference = zoneResource.getSpec().apiTokenSecretRef();
    String secretBase64 = context.getClient().secrets()
            .inNamespace(namespace)
            .withName(secretReference.name())
            .require()
            .getData()
            .get(secretReference.key());
    String secret = new String(Base64.getDecoder().decode(secretBase64), UTF_8);
    
    // Instantiate a Cloudflare service class, encapsulating API interactions
    CloudflareService cloudflareService = cloudflareServiceFactory.create(secret);

    String zoneName = cloudflareService.getZoneById(zoneId).name();
    String host = recordResource.getSpec().name() + "." + zoneName;
    boolean proxied = recordResource.getSpec().proxied();
    // Fetch any existing record
    CloudflareApiRecord cloudflareApiRecord = cloudflareService.getDnsRecordByName(zoneId, host)
            .map(existingRecord -> {
                // There was an existing record, update if out of sync
                if (existingRecord.content().equals(publicIp) && existingRecord.proxied() == proxied) {
                    LOGGER.info("Record {} is up to date, nothing to do.", existingRecord.name());
                    return existingRecord;
                } else {
                    return cloudflareService.updateDnsRecord(zoneId, existingRecord.updated(publicIp, proxied));
                }
            })
            .orElseGet(() -> {
                // No existing record, create it
                CloudflareApiRecord record = CloudflareApiRecord.newARecord(host, publicIp, proxied);
                return cloudflareService.createDnsRecord(zoneId, record);
            });

    // Calculate and update the status
    CloudflareRecordStatus status = new CloudflareRecordStatus(
            recordResource.getMetadata().getGeneration(),
            cloudflareApiRecord.id(),
            publicIp,
            Instant.now(),
            host
    );
    recordResource.setStatus(status);
    return UpdateControl.patchStatus(recordResource);
}

A similar cleanup method from the Cleaner interface implements the deletion logic.
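Conceptually, it boils down to something like this (a simplified sketch; resolveZoneId, resolveCloudflareService and deleteDnsRecord are stand-ins for the project’s actual lookups and API call):

@Override
public DeleteControl cleanup(
        CloudflareRecordCustomResource recordResource,
        Context<CloudflareRecordCustomResource> context
) {
    // Assumed helpers: resolve the zone and the Cloudflare service the same
    // way reconcile does (zoneRef lookup, secret decoding, service factory).
    String zoneId = resolveZoneId(recordResource, context);
    CloudflareService cloudflareService = resolveCloudflareService(recordResource, context);

    // Delete the DNS record in Cloudflare. Only once cleanup returns
    // successfully does the framework remove the finalizer, letting
    // Kubernetes actually delete the CloudflareRecord resource.
    if (recordResource.getStatus() != null && recordResource.getStatus().id() != null) {
        cloudflareService.deleteDnsRecord(zoneId, recordResource.getStatus().id());
    }
    return DeleteControl.defaultDelete();
}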

Let’s look at the event setup. How do we make sure a record is reconciled again when something about its zone changes? Previously, with the managed workflow, this was handled automatically. That is, a change in the Site or any of its dependents (for example, the ConfigMap) will automatically trigger the reconciliation of the Site, keeping everything in sync. For the manual flow, we’ll need to implement prepareEventSources to manually register the dependency between records and zones as an event source relationship. Again, this is an inlined version with additional comments explaining the flow. The event source setup is still heavily supported by the framework. We’re not doing low-level eventing, but showing the framework a two-way mapping between records and zones. For record to zone, that’s easy: the zone reference is in the resource definition. For the other way around, we can simply use the Operator SDK’s built-in indexing functionality to build an index for lookup in the reverse direction.

@Override
public List<EventSource<?, CloudflareRecordCustomResource>> prepareEventSources(
        EventSourceContext<CloudflareRecordCustomResource> context
) {

    // the primary to secondary mapper is the obvious direction, again using spec.zoneRef
    PrimaryToSecondaryMapper<CloudflareRecordCustomResource> primaryToSecondary = p ->
            Set.of(new ResourceID(p.getSpec().zoneRef(), p.getMetadata().getNamespace()));

    // For the reverse direction, mapping zones back to records, we'll use the Operator SDK's index support.
    // We map the record to an index key, which we define as the name of the zone and the namespace, separated by a '#'.
    // In the actual SecondaryToPrimaryMapper, we build that index key and look up the reverse relation in the index.
    String indexName = "cloudflare-record-zone";
    context.getPrimaryCache().addIndexer(indexName, p ->
        List.of(p.getSpec().zoneRef() + "#" + p.getMetadata().getNamespace())
    );
    SecondaryToPrimaryMapper<CloudflareZoneCustomResource> secondaryToPrimary = s -> {
        ResourceID id = ResourceID.fromResource(s);
        return context.getPrimaryCache()
                .byIndex(indexName, id.getName() + "#" + id.getNamespace())
                .stream()
                .map(ResourceID::fromResource)
                .collect(Collectors.toSet());
    };

    // build the config
    InformerEventSourceConfiguration<CloudflareZoneCustomResource> configuration =
            InformerEventSourceConfiguration.from(CloudflareZoneCustomResource.class, CloudflareRecordCustomResource.class)
                    .withPrimaryToSecondaryMapper(primaryToSecondary)
                    .withSecondaryToPrimaryMapper(secondaryToPrimary)
                    .build();
    return List.of(new InformerEventSource<>(configuration, context));
}

Testing

I like having integration tests that actually run the application. For this kind of application, that’s almost going to be the only test that really makes sense. You could unit test whether your reconciler calls the right services, but you’re much more likely to run into other kinds of issues.

Luckily, the framework includes great testing support. You’ll want to include the following dependencies:

  • io.javaoperatorsdk:operator-framework-spring-boot-starter-test
  • io.javaoperatorsdk:operator-framework-junit-5

This provides support for using Fabric8’s mock Kubernetes server. The simplest Spring Boot test setup looks like this.

@SpringBootTest
@EnableMockOperator(crdPaths = {
        "classpath:META-INF/fabric8/cloudflarerecords.ddns.inias.eu-v1.yml",
        "classpath:META-INF/fabric8/cloudflarezones.ddns.inias.eu-v1.yml",
        "classpath:META-INF/fabric8/pages.ddns.inias.eu-v1.yml",
        "classpath:META-INF/fabric8/sites.ddns.inias.eu-v1.yml",
})
public class IntegrationTest {
    @Autowired
    KubernetesClient client;
    //...
}

That is, you include the paths to the generated CRDs, autowire a Kubernetes client, and you’re good to go.

Testcontainers or WireMock work very well for representing your external system. If you happen to run into the limitations of the mock Kubernetes server, you can also use a K3s Testcontainer for a more complete Kubernetes test environment.
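A sketch of that option, using the org.testcontainers:k3s module (the image tag is just an example):

import io.fabric8.kubernetes.client.Config;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;
import org.testcontainers.k3s.K3sContainer;
import org.testcontainers.utility.DockerImageName;

public class K3sTestSupport {
    public static KubernetesClient startK3s() {
        // Starts a real single-node Kubernetes cluster in Docker.
        K3sContainer k3s = new K3sContainer(DockerImageName.parse("rancher/k3s:v1.30.2-k3s1"));
        k3s.start();
        // Point a Fabric8 client at it using the kubeconfig the container exposes.
        Config config = Config.fromKubeconfig(k3s.getKubeConfigYaml());
        return new KubernetesClientBuilder().withConfig(config).build();
    }
}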

For the actual assertions, Awaitility is ideal. Use the autowired Kubernetes client to create or modify resources, and Awaitility to wait for the expected effects of the reconciliation, both in the mocked external system and in the status fields of the resource.

Here’s an example of a simple test. The createX methods create resources using the injected Kubernetes client.

@Test
void test() {
    when(publicIpService.getPublicIp()).thenReturn("1.1.1.1");

    createApiTokenSecret();
    createZone();
    createRecord();

    await().untilAsserted(() ->
            assertThat(testCloudflareService.getDnsRecordByName(TEST_ZONE.id(), "test.example.com"))
                    .isPresent()
                    .hasValueSatisfying(r -> assertThat(r.proxied()).isTrue())
    );

    createSite();
    createPage();

    await().untilAsserted(() -> {
        ConfigMap configMap = client.resources(ConfigMap.class)
                .inNamespace(NAMESPACE)
                .withName("test-site")
                .get();
        assertThat(configMap).isNotNull();
        // ...
    });
}

Local development

Getting started with local development is straightforward. Your Kubernetes operator will run completely out-of-the-box using your current kubectl context. This means it will automatically connect to your currently configured cluster, whether it’s a remote environment or a local one like Minikube.
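That works because the Fabric8 client underneath resolves its configuration much like kubectl does. As a quick illustration of what the operator will connect to when started locally:

import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClientBuilder;

public class WhereAmI {
    public static void main(String[] args) {
        // With no explicit configuration, the client falls back to the current
        // kubectl context in ~/.kube/config, just like the operator at startup.
        try (KubernetesClient client = new KubernetesClientBuilder().build()) {
            System.out.println("API server: " + client.getMasterUrl());
        }
    }
}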