Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-10372. SCM and Datanode communication for reconciliation #6506

Conversation

errose28
Copy link
Contributor

@errose28 errose28 commented Apr 10, 2024

What changes were proposed in this pull request?

A lot of boilerplate code to do something very simple:

  • Tell SCM to start reconciliation for a container from the CLI.
  • Have SCM tell Datanodes to reconcile that container with their peers.
  • Datanodes send back a placeholder container data checksum which we can fill in with reconciliation implementation later.
    • There is no communication between datanodes added in this change.
  • SCM updates its replica info based on the container report received after the Datanodes reconcile.

I've tried to avoid making any design related decisions in this PR. It is intended as a skeleton we can use to plug in the reconciliation implementation for end to end testing in future changes.

In scope for this change

  • Add new ozone admin container reconcile <container-id> command.
  • New command should be restricted to admins
  • Audit logging for new command
  • Blocking reconciliation of invalid containers (EC, 1 replica, still open)
  • Datanode queue metrics for reconciliation commands
  • Datanode and SCM application logs to follow the command as it moves through the system.
  • SCM saves container replicas' data checksums in memory, and they can be retrieved with ozone admin container info --json

Out of scope for this change (but will be handled in later tasks)

  • Any actual checksum related implementations
    • Currently byte strings are used as placeholders just to move filler data around for testing.
  • Recon integration with container data checksums
    • This includes Recon's ContainerReplicaHistoryProto
  • Finalized protobuf changes
    • Since the change is going to a feature branch we have the flexibility to evolve the protos later.
  • Good UX 😄
    • This includes flags for the reconcile command, an easy way to track reconciliation progress, and reading containers from stdin like other container subcommands support.
    • These will need some discussion so are probably best done as their own set of changes.
  • The following tasks have been moved out to do in follow up changes:
    • HDDS-10714 datanode status filtering for reconciliation peers and targets
    • HDDS-10759 Consider allowing reconciliation when not all replicas have reached closed state
    • HDDS-10760 SCMExceptions resulting from admin CLI commands are treated as retriable

What is the link to the Apache JIRA

HDDS-10372

How was this patch tested?

  • Acceptance test for CLI added
  • Manually tested the CLI with valid and invalid containers. Also manually checked SCM audit logging
  • Unit and integration tests added in the following classes:
    • TestReconcileContainerEventHandler: Tests SCM's filtering of reconciliation requests based on eligible container and replica states. When containers are eligible, tests that reconcile commands are sent to datanodes.
    • TestStateContext: Tests that the new command shows up in datanode queue metrics.
    • TestReconcileContainerCommandHandler: Tests datanode queue and runtime metrics when a reconcile command is received. Also tests that the ICR sent as the result of the command has the expected data checksum.
    • TestContainerDataYaml: Tests that the data checksum is not written to the .container file. Merkle tree information will be written to its own file in a different change.
    • TestHeartbeatEndpointTask: Tests that datanodes add a reconcile command to their queue when it is received on an SCM heartbeat response.
    • TestKeyValueHandler: Tests that the the KeyValueHandler triggers an ICR back to SCM with the expected values when reconciliation is invoked.
    • TestContainerReportHandler, TestIncrementalContainerReportHandler: Tests that SCM correctly saves replicas' data checksum information it receives on a heartbeat.

@errose28
Copy link
Contributor Author

errose28 commented May 7, 2024

At last the CI is green

@errose28 errose28 marked this pull request as ready for review May 7, 2024 03:22
Copy link
Contributor

@kerneltime kerneltime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished one round, will go over testing next.

@@ -35,6 +40,8 @@ public final class ContainerReplicaInfo {
private long keyCount;
private long bytesUsed;
private int replicaIndex = -1;
@JsonSerialize(using = LongToHexJsonSerializer.class)
private long dataChecksum;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: Hex for printing to make it more human friendly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That seemed standard but we could use a different format if there's a better option.

Comment on lines +47 to +49
System.out.println("Use \"ozone admin container info --json " + containerId + "\" to see the checksums of each " +
"container replica");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
System.out.println("Use \"ozone admin container info --json " + containerId + "\" to see the checksums of each " +
"container replica");
}
System.out.println("Once reconciliation is complete, use \"ozone admin container info --json " + containerId + "\" to see the checksums of each " +
"container replica");
}

@aswinshakil aswinshakil self-requested a review May 22, 2024 16:34
Copy link
Member

@aswinshakil aswinshakil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @errose28. I just have a few minor nits. Otherwise LGTM

…concile-cli

* HDDS-10239-container-reconciliation: (296 commits)
  HDDS-10897. Refactor OzoneQuota (apache#6714)
  HDDS-10422. Fix some warnings about exposing internal representation in hdds-common (apache#6351)
  HDDS-10899. Refactor Lease callbacks (apache#6715)
  HDDS-10890. Increase default value for hdds.container.ratis.log.appender.queue.num-elements (apache#6711)
  HDDS-10832. Client should switch to streaming based on OpenKeySession replication (apache#6683)
  HDDS-10435. Support S3 object tags for existing requests (apache#6607)
  HDDS-10883. Improve logging in Recon for finalising DN logic. (apache#6704)
  HDDS-8752. Enable TestOzoneRpcClientAbstract#testOverWriteKeyWithAndWithOutVersioning (apache#6702)
  HDDS-10875. XceiverRatisServer#getRaftPeersInPipeline should be called before XceiverRatisServer#removeGroup (apache#6696)
  HDDS-10514. Recon - Provide DN decommissioning detailed status and info inline with current CLI command output. (apache#6376)
  HDDS-10878. Bump zstd-jni to 1.5.6-3 (apache#6701)
  HDDS-10877. Bump Dropwizard metrics to 3.2.6 (apache#6699)
  HDDS-10876. Bump jackson to 2.16.2 (apache#6697)
  HDDS-6116. Remove flaky tag from TestSCMInstallSnapshot (apache#6695)
  HDDS-2643. TestOzoneDelegationTokenSecretManager#testRenewTokenFailureRenewalTime fails intermittently.
  HDDS-10699. Refactor ContainerBalancerTask and TestContainerBalancerTask (apache#6537)
  HDDS-10861. Ozone cli supports default ozone.om.service.id (apache#6680)
  HDDS-10859. Improve error messages when decommission and maintenance fail-early (apache#6678)
  HDDS-9031. Upgrade acceptance tests to Docker Compose v2 (apache#6667)
  HDDS-10559. Add a warning or a check to run repair tool as System user (apache#6574)
  ...

Conflicts:
    hadoop-ozone/dist/src/main/smoketest/admincli/container.robot
@kerneltime kerneltime merged commit 930d3f0 into apache:HDDS-10239-container-reconciliation May 29, 2024
39 checks passed
errose28 added a commit to errose28/ozone that referenced this pull request Jun 6, 2024
* HDDS-10239-container-reconciliation:
  HDDS-10372. SCM and Datanode communication for reconciliation (apache#6506)
  HDDS-10239. Storage Container Reconciliation. (apache#6121)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants