Browse Source

fix: handle namespace deletion race conditions with finalizers (#5154)

* fix: handle namespace deletion race conditions with finalizers

Replace error suppression approach with proper finalizer-based cleanup
to prevent race conditions when namespaces are deleted.

Changes:
- Add finalizers to ClusterExternalSecret and ExternalSecret controllers
- Implement proper cleanup order during deletion
- Add namespace deletion checks before creating ExternalSecrets
- Remove namespace finalizers during CES cleanup
- Handle managed secret cleanup based on DeletionPolicy

This ensures resources are cleaned up in the correct order and prevents
errors when namespaces are terminating.

Fixes #4175

Signed-off-by: framsouza <fram.souza14@gmail.com>

* fix: resolve CI failures - lint and race conditions

- Fixed godot lint errors by adding periods to comment endings
- Fixed namespace update race conditions by:
  - Fetching latest namespace before updates to avoid conflicts
  - Handling IsConflict errors with immediate requeue
  - Making test retry on conflict when updating namespaces
- All tests now pass locally

Signed-off-by: framsouza <fram.souza14@gmail.com>

* Address review feedback and improve error handling

Based on reviewer feedback, this commit improves the code quality:

Controller behavior fixes:
- Return immediately after Update() calls to let Kubernetes trigger
  the next reconciliation naturally rather than continuing
- Replace deprecated Requeue field with RequeueAfter: 0 for
  immediate reconciliation on conflicts

Error handling improvements:
- Properly return errors from cleanup functions instead of just
  logging them
- Aggregate multiple errors during cleanup and return them all
- Add clear comments explaining the namespace finalizer purpose

The namespace finalizer prevents race conditions when namespaces
are deleted while ClusterExternalSecrets still have resources in
them. Each CES gets its own finalizer to track its resources
independently.

Signed-off-by: framsouza <fram.souza14@gmail.com>

* fix: add namespace update permissions for ClusterExternalSecret controller

The controller needs permission to update namespaces when managing finalizers
to prevent race conditions during namespace deletion.

Signed-off-by: framsouza <fram.souza14@gmail.com>

* fix: resolve unused parameter linter error

The log parameter in cleanupExternalSecrets was not being used,
causing linter failures in CI. Changed to underscore to indicate
intentionally unused parameter.

Signed-off-by: framsouza <fram.souza14@gmail.com>

* refactor: reduce cognitive complexity in cleanup methods

Split cleanupExternalSecrets into smaller functions to improve readability.
Extracted namespace and finalizer operations into separate helper methods.
Reduces SonarCloud complexity score from 30 to under 10.

Signed-off-by: framsouza <fram.souza14@gmail.com>

* fix: resolve golangci-lint errors

- Add periods to end of comment lines to fix godot linter errors
- Combine repeated parameter types to fix gocritic paramTypeCombine warnings
- Updated function signatures to use combined type declarations (e.g., namespaceName, finalizer string)

Signed-off-by: framsouza <fram.souza14@gmail.com>

* refactor: improve finalizer handling and error aggregation

- Always attempt cleanup on deletion to handle edge cases
- Only call Update() when finalizers actually change
- Use errors.Join() instead of error slices
- Return error instead of RequeueAfter:0 for conflicts

Addresses review feedback from jakobmoellerdev.

Signed-off-by: framsouza <francismara.souza@elastic.co>

* docs(helm): Add detailed comment for processClusterExternalSecret flag

Clarifies that enabling processClusterExternalSecret adds update/patch
permissions on namespaces to handle finalizers for proper cleanup during
namespace deletion, preventing race conditions with ExternalSecrets.

Signed-off-by: framsouza <francismara.souza@elastic.co>

* docs(helm): Update generated README with processClusterExternalSecret documentation

The helm chart README is auto-generated from values.yaml comments.
This commit includes the generated documentation for the
processClusterExternalSecret flag.

Signed-off-by: framsouza <francismara.souza@elastic.co>

---------

Signed-off-by: framsouza <fram.souza14@gmail.com>
Signed-off-by: framsouza <francismara.souza@elastic.co>
Co-authored-by: Moritz Johner <moolen@users.noreply.github.com>
Co-authored-by: Jakob Möller <jakob.moeller@sap.com>
framsouza 7 months ago
parent
commit
e324c5f9ee

+ 1 - 1
deploy/charts/external-secrets/README.md

@@ -156,7 +156,7 @@ The command removes all the Kubernetes components associated with the chart and
 | podSecurityContext.enabled | bool | `true` |  |
 | podSpecExtra | object | `{}` | Any extra pod spec on the deployment |
 | priorityClassName | string | `""` | Pod priority class name. |
-| processClusterExternalSecret | bool | `true` | if true, the operator will process cluster external secret. Else, it will ignore them. |
+| processClusterExternalSecret | bool | `true` | if true, the operator will process cluster external secret. Else, it will ignore them. When enabled, this adds update/patch permissions on namespaces to handle finalizers for proper cleanup during namespace deletion, preventing race conditions with ExternalSecrets. |
 | processClusterGenerator | bool | `true` | if true, the operator will process cluster generator. Else, it will ignore them. |
 | processClusterPushSecret | bool | `true` | if true, the operator will process cluster push secret. Else, it will ignore them. |
 | processClusterStore | bool | `true` | if true, the operator will process cluster store. Else, it will ignore them. |

+ 9 - 0
deploy/charts/external-secrets/templates/rbac.yaml

@@ -125,6 +125,15 @@ rules:
     - "get"
     - "list"
     - "watch"
+  {{- if .Values.processClusterExternalSecret }}
+  - apiGroups:
+    - ""
+    resources:
+    - "namespaces"
+    verbs:
+    - "update"
+    - "patch"
+  {{- end }}
   - apiGroups:
     - ""
     resources:

+ 2 - 0
deploy/charts/external-secrets/values.yaml

@@ -86,6 +86,8 @@ openshiftFinalizers: true
 systemAuthDelegator: false
 
 # -- if true, the operator will process cluster external secret. Else, it will ignore them.
+# When enabled, this adds update/patch permissions on namespaces to handle finalizers for proper
+# cleanup during namespace deletion, preventing race conditions with ExternalSecrets.
 processClusterExternalSecret: true
 
 # -- if true, the operator will process cluster push secret. Else, it will ignore them.

+ 193 - 2
pkg/controllers/clusterexternalsecret/clusterexternalsecret_controller.go

@@ -58,6 +58,11 @@ const (
 	errConvertLabelSelector = "unable to convert labelselector"
 	errGetExistingES        = "could not get existing ExternalSecret"
 	errNamespacesFailed     = "one or more namespaces failed"
+
+	// ClusterExternalSecretFinalizer is the finalizer for ClusterExternalSecret resources.
+	// This finalizer ensures that all ExternalSecrets created by the ClusterExternalSecret
+	// are properly cleaned up before the ClusterExternalSecret is deleted, preventing orphaned resources.
+	ClusterExternalSecretFinalizer = "externalsecrets.external-secrets.io/clusterexternalsecret-cleanup"
 )
 
 func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
@@ -81,9 +86,33 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Resu
 		return ctrl.Result{}, err
 	}
 
-	// skip reconciliation if deletion timestamp is set on cluster external secret
+	// Handle deletion with finalizer
+	// When a ClusterExternalSecret is being deleted, we need to ensure all created ExternalSecrets
+	// and namespace finalizers are cleaned up to prevent resource leaks and namespace deletion blocking.
 	if clusterExternalSecret.DeletionTimestamp != nil {
-		log.Info("skipping as it is in deletion")
+		// Always attempt cleanup to handle edge case where finalizer might be removed externally
+		if err := r.cleanupExternalSecrets(ctx, log, &clusterExternalSecret); err != nil {
+			log.Error(err, "failed to cleanup ExternalSecrets")
+			return ctrl.Result{}, err
+		}
+
+		// Remove finalizer from ClusterExternalSecret if it exists
+		if updated := controllerutil.RemoveFinalizer(&clusterExternalSecret, ClusterExternalSecretFinalizer); updated {
+			if err := r.Update(ctx, &clusterExternalSecret); err != nil {
+				return ctrl.Result{}, err
+			}
+		}
+		return ctrl.Result{}, nil
+	}
+
+	// Add finalizer if it doesn't exist
+	// This ensures the ClusterExternalSecret cannot be deleted until we've cleaned up all
+	// ExternalSecrets it created and removed our finalizers from namespaces.
+	if updated := controllerutil.AddFinalizer(&clusterExternalSecret, ClusterExternalSecretFinalizer); updated {
+		if err := r.Update(ctx, &clusterExternalSecret); err != nil {
+			return ctrl.Result{}, err
+		}
+		// Return immediately after update to let the change propagate
 		return ctrl.Result{}, nil
 	}
 
@@ -142,6 +171,14 @@ func (r *Reconciler) reconcile(ctx context.Context, log logr.Logger, clusterExte
 	sort.Strings(provisionedNamespaces)
 	clusterExternalSecret.Status.ProvisionedNamespaces = provisionedNamespaces
 
+	// Check if any failures are due to conflicts - if so, requeue immediately
+	for _, err := range failedNamespaces {
+		if apierrors.IsConflict(err) {
+			log.V(1).Info("conflict detected, requeuing immediately")
+			return ctrl.Result{}, fmt.Errorf("conflict detected, will retry: %w", err)
+		}
+	}
+
 	return ctrl.Result{RequeueAfter: refreshInt}, nil
 }
 
@@ -155,6 +192,11 @@ func (r *Reconciler) gatherProvisionedNamespaces(
 ) []string {
 	var provisionedNamespaces []string //nolint:prealloc // we don't know the size
 	for _, namespace := range namespaces {
+		// Skip namespace if it's being deleted
+		if namespace.DeletionTimestamp != nil {
+			log.Info("skipping namespace as it is being deleted", "namespace", namespace.Name)
+			continue
+		}
 		var existingES esv1.ExternalSecret
 		err := r.Get(ctx, types.NamespacedName{
 			Name:      esName,
@@ -172,6 +214,12 @@ func (r *Reconciler) gatherProvisionedNamespaces(
 		}
 
 		if err := r.createOrUpdateExternalSecret(ctx, clusterExternalSecret, namespace, esName, clusterExternalSecret.Spec.ExternalSecretMetadata); err != nil {
+			// If conflict, don't log as error - just add to failed namespaces for retry
+			if apierrors.IsConflict(err) {
+				log.V(1).Info("conflict while updating namespace, will retry", "namespace", namespace.Name)
+				failedNamespaces[namespace.Name] = err
+				continue
+			}
 			log.Error(err, "failed to create or update external secret")
 			failedNamespaces[namespace.Name] = err
 			continue
@@ -204,7 +252,116 @@ func (r *Reconciler) removeOldSecrets(ctx context.Context, log logr.Logger, clus
 	return nil
 }
 
+// cleanupExternalSecrets removes all ExternalSecrets created by this ClusterExternalSecret
+// and removes the namespace finalizers we added. This uses a dual-finalizer strategy:
+// 1. ClusterExternalSecret has a finalizer to ensure cleanup happens
+// 2. Each namespace gets a CES-specific finalizer to prevent deletion race conditions
+// The namespace finalizer is named with the CES name to allow multiple CES resources
+// to provision into the same namespace independently.
+func (r *Reconciler) cleanupExternalSecrets(ctx context.Context, log logr.Logger, clusterExternalSecret *esv1.ClusterExternalSecret) error {
+	esName := r.getExternalSecretName(clusterExternalSecret)
+
+	var err error
+	for _, ns := range clusterExternalSecret.Status.ProvisionedNamespaces {
+		if cleanupErr := r.cleanupNamespaceResources(ctx, log, clusterExternalSecret, ns, esName); cleanupErr != nil {
+			err = errors.Join(err, cleanupErr)
+		}
+	}
+
+	return err
+}
+
+// getExternalSecretName returns the name to use for ExternalSecrets.
+func (r *Reconciler) getExternalSecretName(clusterExternalSecret *esv1.ClusterExternalSecret) string {
+	if clusterExternalSecret.Status.ExternalSecretName != "" {
+		return clusterExternalSecret.Status.ExternalSecretName
+	}
+	return clusterExternalSecret.ObjectMeta.Name
+}
+
+// cleanupNamespaceResources handles cleanup for a single namespace.
+func (r *Reconciler) cleanupNamespaceResources(ctx context.Context, log logr.Logger, clusterExternalSecret *esv1.ClusterExternalSecret, namespaceName, esName string) error {
+	// Get the namespace
+	namespace, err := r.getNamespace(ctx, namespaceName)
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			// Namespace already deleted, that's ok
+			return nil
+		}
+		return fmt.Errorf("failed to get namespace %s: %w", namespaceName, err)
+	}
+
+	// Remove namespace finalizer
+	if err := r.removeNamespaceFinalizer(ctx, log, namespace, clusterExternalSecret.Name); err != nil {
+		return err
+	}
+
+	// Delete ExternalSecret
+	if err := r.deleteExternalSecret(ctx, esName, clusterExternalSecret.Name, namespaceName); err != nil {
+		return fmt.Errorf("failed to delete ExternalSecret in namespace %s: %w", namespaceName, err)
+	}
+
+	return nil
+}
+
+// getNamespace fetches a namespace by name.
+func (r *Reconciler) getNamespace(ctx context.Context, name string) (*v1.Namespace, error) {
+	var namespace v1.Namespace
+	err := r.Get(ctx, types.NamespacedName{Name: name}, &namespace)
+	return &namespace, err
+}
+
+// removeNamespaceFinalizer removes the CES-specific finalizer from a namespace.
+func (r *Reconciler) removeNamespaceFinalizer(ctx context.Context, log logr.Logger, namespace *v1.Namespace, cesName string) error {
+	finalizer := r.buildCESFinalizer(cesName)
+
+	if !controllerutil.ContainsFinalizer(namespace, finalizer) {
+		return nil // Finalizer doesn't exist, nothing to do
+	}
+
+	return r.updateNamespaceRemoveFinalizer(ctx, log, namespace.Name, finalizer)
+}
+
+// buildCESFinalizer creates the finalizer name for a CES.
+func (r *Reconciler) buildCESFinalizer(cesName string) string {
+	return "externalsecrets.external-secrets.io/ces-" + cesName
+}
+
+// updateNamespaceRemoveFinalizer removes a finalizer from a namespace with conflict handling.
+func (r *Reconciler) updateNamespaceRemoveFinalizer(ctx context.Context, log logr.Logger, namespaceName, finalizer string) error {
+	// Fetch the latest namespace to avoid conflicts
+	namespace, err := r.getNamespace(ctx, namespaceName)
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return nil // Namespace deleted, that's OK
+		}
+		return fmt.Errorf("failed to get latest namespace %s: %w", namespaceName, err)
+	}
+
+	// Only update if the finalizer was actually removed
+	if updated := controllerutil.RemoveFinalizer(namespace, finalizer); updated {
+		if err := r.Update(ctx, namespace); err != nil {
+			// Ignore NotFound (namespace deleted) and Conflict (will retry)
+			if apierrors.IsNotFound(err) || apierrors.IsConflict(err) {
+				log.V(1).Info("ignoring expected error during finalizer removal",
+					"namespace", namespaceName,
+					"error", err.Error())
+				return nil
+			}
+			return fmt.Errorf("failed to remove finalizer from namespace %s: %w", namespaceName, err)
+		}
+	}
+
+	return nil
+}
+
 func (r *Reconciler) createOrUpdateExternalSecret(ctx context.Context, clusterExternalSecret *esv1.ClusterExternalSecret, namespace v1.Namespace, esName string, esMetadata esv1.ExternalSecretMetadata) error {
+	// Add namespace finalizer first to prevent deletion race conditions
+	if err := r.ensureNamespaceFinalizer(ctx, &namespace, clusterExternalSecret.Name); err != nil {
+		return err
+	}
+
+	// Create or update the ExternalSecret
 	externalSecret := &esv1.ExternalSecret{
 		ObjectMeta: metav1.ObjectMeta{
 			Namespace: namespace.Name,
@@ -240,6 +397,40 @@ func (r *Reconciler) createOrUpdateExternalSecret(ctx context.Context, clusterEx
 	return nil
 }
 
+// ensureNamespaceFinalizer adds a CES-specific finalizer to the namespace if it doesn't exist.
+// This prevents the namespace from being deleted while we're managing ExternalSecrets in it.
+func (r *Reconciler) ensureNamespaceFinalizer(ctx context.Context, namespace *v1.Namespace, cesName string) error {
+	finalizer := r.buildCESFinalizer(cesName)
+
+	if controllerutil.ContainsFinalizer(namespace, finalizer) {
+		return nil // Already has finalizer
+	}
+
+	return r.addNamespaceFinalizer(ctx, namespace.Name, finalizer)
+}
+
+// addNamespaceFinalizer adds a finalizer to a namespace with conflict handling.
+func (r *Reconciler) addNamespaceFinalizer(ctx context.Context, namespaceName, finalizer string) error {
+	// Fetch the latest namespace to avoid conflicts
+	namespace, err := r.getNamespace(ctx, namespaceName)
+	if err != nil {
+		return fmt.Errorf("could not get latest namespace: %w", err)
+	}
+
+	// Only update if the finalizer was actually added
+	if updated := controllerutil.AddFinalizer(namespace, finalizer); updated {
+		if err := r.Update(ctx, namespace); err != nil {
+			// If conflict, return error to trigger requeue
+			if apierrors.IsConflict(err) {
+				return err
+			}
+			return fmt.Errorf("could not add namespace finalizer: %w", err)
+		}
+	}
+
+	return nil
+}
+
 func (r *Reconciler) deleteExternalSecret(ctx context.Context, esName, cesName, namespace string) error {
 	var existingES esv1.ExternalSecret
 	err := r.Get(ctx, types.NamespacedName{

+ 9 - 2
pkg/controllers/clusterexternalsecret/clusterexternalsecret_controller_test.go

@@ -539,8 +539,15 @@ var _ = Describe("ClusterExternalSecret controller", func() {
 					}
 				}).WithTimeout(timeout).WithPolling(interval).Should(Succeed())
 
-				namespaces[0].Labels = map[string]string{}
-				Expect(k8sClient.Update(ctx, &namespaces[0])).ShouldNot(HaveOccurred())
+				// Retry on conflict since controller may be updating namespace with finalizers
+				Eventually(func() error {
+					var ns v1.Namespace
+					if err := k8sClient.Get(ctx, types.NamespacedName{Name: namespaces[0].Name}, &ns); err != nil {
+						return err
+					}
+					ns.Labels = map[string]string{}
+					return k8sClient.Update(ctx, &ns)
+				}).WithTimeout(timeout).WithPolling(interval).Should(Succeed())
 			},
 			expectedClusterExternalSecret: func(namespaces []v1.Namespace, created esv1.ClusterExternalSecret) esv1.ClusterExternalSecret {
 				return esv1.ClusterExternalSecret{

+ 55 - 2
pkg/controllers/externalsecret/externalsecret_controller.go

@@ -65,6 +65,9 @@ const (
 	fieldOwnerTemplate    = "externalsecrets.external-secrets.io/%v"
 	fieldOwnerTemplateSha = "externalsecrets.external-secrets.io/sha3/%x"
 
+	// ExternalSecretFinalizer is the finalizer for ExternalSecret resources.
+	ExternalSecretFinalizer = "externalsecrets.external-secrets.io/externalsecret-cleanup"
+
 	// condition messages for "SecretSynced" reason.
 	msgSynced       = "secret synced"
 	msgSyncedRetain = "secret retained due to DeletionPolicy=Retain"
@@ -181,12 +184,30 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ct
 		return ctrl.Result{}, err
 	}
 
-	// skip reconciliation if deletion timestamp is set on external secret
+	// Handle deletion with finalizer
 	if !externalSecret.GetDeletionTimestamp().IsZero() {
-		log.V(1).Info("skipping ExternalSecret, it is marked for deletion")
+		// Always attempt cleanup to handle edge case where finalizer might be removed externally
+		if err := r.cleanupManagedSecrets(ctx, log, externalSecret); err != nil {
+			log.Error(err, "failed to cleanup managed secrets")
+			return ctrl.Result{}, err
+		}
+
+		// Remove finalizer if it exists
+		if updated := controllerutil.RemoveFinalizer(externalSecret, ExternalSecretFinalizer); updated {
+			if err := r.Update(ctx, externalSecret); err != nil {
+				return ctrl.Result{}, err
+			}
+		}
 		return ctrl.Result{}, nil
 	}
 
+	// Add finalizer if it doesn't exist
+	if updated := controllerutil.AddFinalizer(externalSecret, ExternalSecretFinalizer); updated {
+		if err := r.Update(ctx, externalSecret); err != nil {
+			return ctrl.Result{}, err
+		}
+	}
+
 	// if extended metrics is enabled, refine the time series vector
 	resourceLabels = ctrlmetrics.RefineLabels(resourceLabels, externalSecret.Labels)
 
@@ -603,6 +624,38 @@ func (r *Reconciler) markAsFailed(msg string, err error, externalSecret *esv1.Ex
 	counter.Inc()
 }
 
+func (r *Reconciler) cleanupManagedSecrets(ctx context.Context, log logr.Logger, externalSecret *esv1.ExternalSecret) error {
+	// Only delete secrets if DeletionPolicy is Delete
+	if externalSecret.Spec.Target.DeletionPolicy != esv1.DeletionPolicyDelete {
+		log.V(1).Info("skipping secret deletion due to DeletionPolicy", "policy", externalSecret.Spec.Target.DeletionPolicy)
+		return nil
+	}
+
+	secretName := externalSecret.Spec.Target.Name
+	if secretName == "" {
+		secretName = externalSecret.Name
+	}
+
+	var secret v1.Secret
+	err := r.Get(ctx, types.NamespacedName{Name: secretName, Namespace: externalSecret.Namespace}, &secret)
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return nil
+		}
+		return err
+	}
+
+	// Only delete if we own it
+	if metav1.IsControlledBy(&secret, externalSecret) {
+		if err := r.Delete(ctx, &secret); err != nil && !apierrors.IsNotFound(err) {
+			return err
+		}
+		log.V(1).Info("deleted managed secret", "secret", secretName)
+	}
+
+	return nil
+}
+
 func (r *Reconciler) deleteOrphanedSecrets(ctx context.Context, externalSecret *esv1.ExternalSecret, secretName string) error {
 	ownerLabel := utils.ObjectHash(fmt.Sprintf("%v/%v", externalSecret.Namespace, externalSecret.Name))