
HDDS-11149. Have a generic version Validator for validating Requests #6932

Draft · wants to merge 12 commits into base: master
Conversation

@swamirishi (Contributor) commented Jul 12, 2024

What changes were proposed in this pull request?

Currently, RequestFeatureValidator only supports validators that check whether an OM request was made by an older client, or whether the OM needs to be finalized.
There is a need for a more generic validator capable of validating a client request based on the server's layout version and the client's version.
The patch introduces a new annotation, RegisterValidator, which allows registering a new feature validator for any request. The annotation requires the validator to declare a RequestType, maxVersion, and ProcessingPhase in its definition.
ValidatorRegistry should be able to pull the registered validators and, in turn, get all the underlying validator methods based on the usage of the registered feature validator. The patch splits the ClientVersionValidation and FINALIZATION validation into two validators, OMClientVersionValidator and OMLayoutVersionValidator respectively, responsible for checking the request's client version and the server-side metadata layout version.
The validator has been written in such a way that the same code can be reused by other components (SCM and DN) to do their request validations as well.
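As a rough sketch of the shape this describes (the enum values and method names below are illustrative assumptions, not the actual Ozone API):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Meta-annotation marking another annotation as a feature-validator registration.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.ANNOTATION_TYPE)
@interface RegisterValidator { }

enum RequestProcessingPhase { PRE_PROCESS, POST_PROCESS }
enum ClientVersion { DEFAULT_VERSION, ERASURE_CODING_SUPPORT, FUTURE_VERSION }
enum Type { CreateBucket, CreateKey }

// A component-specific validator annotation; RegisterValidator enforces that it
// declares a request type, a max version, and a processing phase.
@RegisterValidator
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface OMClientVersionValidator {
  Type requestType();
  ClientVersion maxVersion();
  RequestProcessingPhase processingPhase();
}

public class ValidatorSketch {
  // Runs only for clients strictly older than ERASURE_CODING_SUPPORT.
  @OMClientVersionValidator(
      requestType = Type.CreateBucket,
      maxVersion = ClientVersion.ERASURE_CODING_SUPPORT,
      processingPhase = RequestProcessingPhase.PRE_PROCESS)
  public static String validateCreateBucket(String request) {
    return request; // a real validator would adapt or reject the request
  }

  // Reads a registration back by reflection, the way a registry discovering
  // validator methods would.
  public static ClientVersion maxVersionOf(String methodName) {
    try {
      Method m = ValidatorSketch.class.getMethod(methodName, String.class);
      return m.getAnnotation(OMClientVersionValidator.class).maxVersion();
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```
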

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/HDDS-11149

How was this patch tested?

Existing unit tests and additional new unit tests

…or to validate clients older than the specified version

Change-Id: Ifcbb239994e7357032934d104ab18063c8d64251
@swamirishi swamirishi marked this pull request as draft July 12, 2024 04:17
@swamirishi (Contributor, Author):

Creating this patch as a permanent fix in favour of #6922

@swamirishi (Contributor, Author):

If this approach looks good, I can also go about adding an integration test which tests all OM operations for different client versions.

@adoroszlai (Contributor) left a comment:

Thanks @swamirishi for making the long-term fix quickly. :)

Only one comment so far (but it applies to many files).

Comment on lines 463 to 466

    conditions = {},
    processingPhase = RequestProcessingPhase.PRE_PROCESS,
-   requestType = Type.CreateBucket
+   requestType = Type.CreateBucket,
+   maxClientVersion = ClientVersion.ERASURE_CODING_SUPPORT
Contributor:

I would add {} as default value here:

and tweak changes in @RequestFeatureValidator usage:

-      conditions = ValidationCondition.OLDER_CLIENT_REQUESTS,
+      maxClientVersion = ClientVersion.ERASURE_CODING_SUPPORT,

to reduce the patch. This changes only 1 line per item, instead of 3 lines + an unchanged line in the middle.

@sadanand48 (Contributor):

Looks like I commented without refreshing the PR, this seems to follow same approach as what is suggested as long term solution. Thanks @swamirishi for the quick fix.

Entry<? extends ExecutableElement, ? extends AnnotationValue> entry) {
if (isPropertyNamedAs(entry, ANNOTATION_MAX_CLIENT_VERSION_PROPERTY_NAME)) {
Name maxClientVersion = visit(entry, new MaxClientVersionVisitor());
return !maxClientVersion.contentEquals(MAX_CLIENT_VERSION_FUTURE_VERSION);
Contributor:

We could also check the sanity of maxClientVersion when it is not equal to MAX_CLIENT_VERSION_FUTURE_VERSION, i.e. whether it lies in the same range as ClientVersion.values().

@swamirishi (Contributor, Author):


Oh, we can't do that, since this is part of static code. The annotation processor also doesn't have access to the enums. So if we added this check, every time we changed something in the enum we would have to make a change here, bringing us back to the same problem.

@fapifta (Contributor) commented Jul 12, 2024:

Please give me some time to get back to this one next week... I'm going to need to digest this a bit more, and I would also like to dig out some of my notes on earlier plans about how to make this better. I would not rush to commit this, so if a fix is urgently required, let's push #6922; if we have time to work out this solution, then please wait for me a bit. Thank you!

@errose28 (Contributor) left a comment:

Thanks for working on this @swamirishi. While this PR is focused on the issues with the OLDER_CLIENT_REQUEST condition, the CLUSTER_NEEDS_FINALIZATION condition has the same problem. I suggest this:

  • Adding two new keys in the annotation: one for client version and one for layout version. They are exclusive. Only one is allowed to be specified per annotation.
  • Removing the general ValidationCondition from the annotation.
  • Two 3-nested maps in the registry: one for layout feature validators and one for client version validators since the version enums have different types.

Currently some iteration is required for every request, even those that don't have a validator. With the general concept of validation conditions removed, we can do a simple map lookup, which will be more efficient in the common case, especially if we order the keys accordingly.

I think request version -> request type -> request phase -> validator is the most efficient ordering of the map. Most clients will probably be on newer versions and can exit the check at the first level. After that, most requests will not have compat concerns for a given version, so even more requests exit early. The phase is hardcoded depending on which method is called, so there is no option to exit early there; hence it should be last.

This is less general than the validation condition support we have now, but IMO the general implementation is causing code complexity and inefficiencies on the write path for something we do not use and may never need. This also makes it consistent with other upgrade related annotations like DisallowedUntilLayoutVersion which only take a version and act on it with a fixed condition.

EDIT: I just remembered client version is the only thing that is not an exact match, so using it as a key directly will not work. The idea of baking the conditions into the framework to simplify the implementation and make it more consistent with the layout feature finalization framework still seems like a good approach though.
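The early-exit lookup being suggested might look roughly like this (hypothetical names, not the Ozone code; per the EDIT above, client version is left out of the key since it is a range match rather than an exact match):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class NestedRegistrySketch {
  enum Type { CreateBucket, CreateKey }
  enum Phase { PRE_PROCESS, POST_PROCESS }

  // type -> phase -> validators. A request type with nothing registered
  // exits after a single map lookup instead of iterating over conditions.
  static final Map<Type, Map<Phase, List<String>>> REGISTRY =
      new EnumMap<>(Type.class);

  static {
    REGISTRY.computeIfAbsent(Type.CreateBucket, t -> new EnumMap<>(Phase.class))
        .computeIfAbsent(Phase.PRE_PROCESS, p -> new ArrayList<>())
        .add("bucketLayoutCompatValidator"); // hypothetical validator name
  }

  static List<String> validationsFor(Type type, Phase phase) {
    Map<Phase, List<String>> byPhase = REGISTRY.get(type);
    if (byPhase == null) {
      return Collections.emptyList(); // common case: early exit
    }
    return byPhase.getOrDefault(phase, Collections.emptyList());
  }
}
```
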

    processingPhase = RequestProcessingPhase.PRE_PROCESS,
-   requestType = Type.CreateBucket
+   requestType = Type.CreateBucket,
+   maxClientVersion = ClientVersion.ERASURE_CODING_SUPPORT
Contributor:

We should make this so that ClientVersion.BUCKET_LAYOUT_SUPPORT is provided in the annotation for validators that deal with bucket layout. That would make this an exclusive upper bound on the client version that needs to be processed instead of an inclusive one.
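In other words, the proposal is to treat the annotated version as an exclusive upper bound, which might be checked like this (a minimal sketch with plain ints standing in for the version enums):

```java
public class BoundSketch {
  // Exclusive upper bound: the validator applies only to clients strictly
  // older than the configured version, e.g. BUCKET_LAYOUT_SUPPORT.
  static boolean validatorApplies(int clientVersion, int maxClientVersion) {
    return clientVersion < maxClientVersion;
  }
}
```
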

Contributor:

Also, we can remove the if statements in the body now, right?

@@ -81,7 +82,7 @@
    * Runtime conditions in which a validator should run.
    * @return a list of conditions when the validator should be applied
    */
-  ValidationCondition[] conditions();
+  ValidationCondition[] conditions() default {};
Contributor:

Do we still need this field?

* conditions though.
* @returns the max required client version for which the validator runs for older clients.
*/
ClientVersion maxClientVersion() default ClientVersion.FUTURE_VERSION;
Contributor:

If we remove the conditions field then this probably shouldn't have a default value. Technically the request validators can be used for things other than client versioning but I don't think we have a use case for that right now. Narrowing the scope of conditions indicates what their intended use is.

Comment on lines 55 to 57

    private final Map<Pair<Type, RequestProcessingPhase>,
        Pair<List<Method>, TreeMap<Integer, Integer>>> maxAllowedVersionValidatorMap =
            new HashMap<>(Type.values().length * RequestProcessingPhase.values().length, 1.0f);
Contributor:

If we get rid of ValidationCondition, then we have just 3 keys to identify the validator(s) that need to run:

  • Request type
  • Request phase
  • Client version

A triply nested map of these keys will be much easier to follow than the wild data structure created here.

@swamirishi (Contributor, Author) commented Jul 12, 2024:

Thanks for working on this @swamirishi. While this PR is focused on the issues with the OLDER_CLIENT_REQUEST condition, the CLUSTER_NEEDS_FINALIZATION condition has the same problem. I suggest this:

  • Adding two new keys in the annotation: one for client version and one for layout version. They are exclusive. Only one is allowed to be specified per annotation.
  • Removing the general ValidationCondition from the annotation.
  • Two 3-nested maps in the registry: one for layout feature validators and one for client version validators since the version enums have different types.

Currently there is some iteration required for every request, even if they don't have a validator. With the general concept of validation conditions removed we can do a simple map lookup which will be more efficient in the common case, especially if we order the keys accordingly.

In this layout I think request version -> request type -> request phase -> validator is the most efficient layout of the map. Most clients will probably be newer versions and can fail out of the check in the first level. After that, most requests will not have compat concerns for a given version, so even more requests exit early. The phase is hardcoded depending on which method is called so there is no option to exit early here, hence it should be last.

This is less general than the validation condition support we have now, but IMO the general implementation is causing code complexity and inefficiencies on the write path for something we do not use and may never need. This also makes it consistent with other upgrade related annotations like DisallowedUntilLayoutVersion which only take a version and act on it with a fixed condition.


Shouldn't version be excluded from the map key? We need the validator to run for all clients older than a certain version.
I believe we need the same for layout version. E.g. if a server hasn't yet finalized ERASURE_CODED_STORAGE_SUPPORT, we can be sure that it hasn't started preparing for HSYNC either. Thus, if a client comes in and tries to use HSYNC, the validator should block that request as well.
Moreover, we don't need a nested map, since all the keys are always used together. A map keyed by a Triple or Tuple is equivalent hash-code-wise. I believe a tuple-keyed map would be much better, since the hash map would be much smaller, containing only the elements actually added. EnumMaps would allocate slots for all the enum values, which would be wasteful when we don't have validators for every enum value.


Change-Id: Ib1742e6a78f908598021ed0bb80f71d3c1527735
Change-Id: Ic084d8bc456468d66cb26d8fc0cfdd33920fb8ed

# Conflicts:
#	hadoop-hdds/common/src/main/resources/META-INF/services/javax.annotation.processing.Processor
Change-Id: Iab5e022b6a1401998d905c68d491fe2a2a7413b7
Change-Id: Iba9e1caa6a66b033d39c95b20f78ebc8e23fba4d
@swamirishi swamirishi changed the title HDDS-11149. Add a maximum client version in the RequestFeatureValidator to validate clients older than the specified version HDDS-11149. Have a generic version Validator for validating Requests Jul 15, 2024
Change-Id: I05265cad5f26082351c6926c24866b20a50b26d2
@fapifta (Contributor) commented Jul 17, 2024:

Hi @swamirishi,

thank you for the continued work on this one. The idea presented in the PR seems really promising; at first sight it is a generalisation in the proper direction, making it possible to extend the current request validations to other service components, which I really like. To be honest, I really envy you getting to do this; I never forgot how badly I wanted to push things in this direction back when the original feature was ready, but we could not spare time for it... Anyway... let's go back to the review itself...

Please note that the single most crucial point of this whole system is the validationsFor() methods in ValidatorRegistry; every other bit is secondary compared to the operational effectiveness of how we get the validations, as that is in the path of all request processing. This means that, optimally, the system can figure out at the very first step, in the cheapest possible way, that there are no validations to apply to a request, as almost all requests will fall into this bucket. Besides that, the data structure should be designed so that it does not hold back request processing more than absolutely necessary when there are version differences between the client and the server.
In this sense, we should check the operational complexity of querying for the case where the client and the server are at the same version, and for the case where they are at different versions. The different-version scenario has two sub-cases: where the client is newer, which falls into the same bucket as the client being at the server's version, since in that case the client side deals with compatibility; and where the server is newer, which is the case where multiple validations may be applied.
I put this note here to respond to your argument on the HashMap vs EnumMap question, as this is not just a matter of taste: a suboptimal solution for the same-client-server-version case affects the whole performance of any service where this runs, so we need to be careful and decide based on this property. (And yes, in this case even a few CPU cycles count at scale.) We can neglect the cost of building up the in-memory data structure, as that happens once, and we can also neglect memory efficiency, as it does not matter whether this structure retains 1MB or 100MB for the runtime of the process; these processes usually have tens or hundreds of GB of RAM available.
In light of this... I would argue that the array-based EnumMap probably provides better performance for getting the values, at the price of retaining more memory, but that is more feasible for us than using a hash table and calculating hashCode at every get.
This, of course, is only an argument if we do not have other means to optimize the same-client-same-server-version case, but we also need an optimal resolution for the older-client-newer-server case: after an upgrade there might be a bunch of clients yet to be upgraded, and performance degradation while that happens can be problematic. (This might not just happen at rolling upgrade, which we don't support yet, but that is the most obvious time when this can become a problem.)
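The EnumMap-versus-HashMap trade-off argued here can be sketched with a toy comparison (illustrative only, not the Ozone code): EnumMap.get is an array access indexed on ordinal(), while HashMap.get hashes the key on every call.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

public class EnumMapSketch {
  enum RequestType { CREATE_BUCKET, CREATE_KEY, DELETE_KEY }

  // EnumMap is backed by a plain array indexed by ordinal(): get() is one
  // array access, but a slot exists for every constant up front.
  static final Map<RequestType, String> ENUM_MAP =
      new EnumMap<>(RequestType.class);
  // HashMap stores only the entries actually added, but computes hashCode()
  // (and possibly equals()) on every get().
  static final Map<RequestType, String> HASH_MAP = new HashMap<>();

  static {
    ENUM_MAP.put(RequestType.CREATE_BUCKET, "validator");
    HASH_MAP.put(RequestType.CREATE_BUCKET, "validator");
  }

  // Semantics are identical; only the constant factors differ.
  static String lookup(Map<RequestType, String> map, RequestType type) {
    return map.get(type);
  }
}
```
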

I still did not have the time to digest fully the changes proposed, I wanted to give feedback as early as possible even if it is partial. You can expect some more review comments from me after I have the time to understand things further, most likely just early next week unfortunately.

On the other hand, there is a problem which we need to address.
Please take a look at the solution added in #6796; that PR adds the annotation processors explicitly to the modules where we use the annotation, and makes it impossible to use the annotations outside those modules, as importing the annotations became a violation in the enforcer plugin.
Renaming an annotation or an annotation processor effectively turns these enforcements off if the names are not updated in the pom files, so with a rename we will need to go through the pom files and fix them to keep the prior enforcements in effect.

A less burning issue, though it is in our best interest to deal with it:
I noticed that the renames, class deletions, and method signature changes affect the API documentation. The automation removed class references, but as a result the API doc sometimes makes no sense, and most of the time deviates from the current implementation of the documented function.

Change-Id: I22ff08e5f4c43cbdda7a7db5f4d83417b55d481e
Change-Id: I551cb26f648bfedb7bc5bfee63b34731b56b4d9b
@swamirishi (Contributor, Author) commented Jul 19, 2024:

Hi @fapifta, thank you for your first-level review. I have left comments inline.

Hi @swamirishi,

thank you for continued work on this one, the idea presented in the PR seems really promising, and for the first sight, it is the generalisation in the proper direction in order to make it possible to extend the current request validations to other service components, which I really like. To be honest I really envy you to get to this, I never forgot how badly I wanted to push things towards this direction back at the time when the original feature was ready but we could not spare time for this... Anyway... let's go back to the review itself...

Please note that the single most crucial point of this whole system is the validationsFor() methods in ValidatorRegistry, every other bit is secondary compared to the operational effectiveness of how we get the validations, as that is in the way of all request processing. This means optimally the system can figure out at the very first step the cheapest possible way that the there are no validations to apply to a request, as in almost all request will fall into this bucket. Besides that the data structure should be designed in a way that it does not hold back the request processing more than it is absolutely necessary when there are version differences between the client and the server. From this sense, we possibly should check the operational complexity of querying for a case where the client and the server is at the same version, and the operational complexity of querying for a case where the client and the server has different versions. The different version scenario has two sub cases, where the client is newer, this falls into the bucket of client is the same version as server, as in this case the client side deals with compatibility, and where the server is newer that is the case where multiple validations may be applied. This note I put here just to reflect to your argument in the HashMap vs EnumMap questions, as this is not just a matter of taste, as a suboptimal solution for the same client server version case for sure affects the whole performance of any service where this runs, so we need to be careful and decide based on this property. (And yes in this case even a few CPU cycles count at scale.) We can neglect the costs of building up the in-memory data structure, as that happens once, and we can also neglect the memory efficiency as it does not matter if this structure at the end of the day retains 1MB or 100MB for the runtime of the process as usually these processes have tens or hundreds of GB RAM available for the processes. In light of this... 
I would argue that the array based enum map provides probably a better performance for getting the values at the price of retaining more memory, but that is more feasible for us than using a hashTable and calculating hashCode at every get. This of course is only an argument if we do not have other means to optimize the same client same server version case, but we need an optimal older client newer server case resolution also, as after an upgrade there might be a bunch of clients to be upgraded and during that happens performance degradation can be problematic. (This might not just happen at rolling upgrade which we don't support yet, but that is the most obvious time when this can become a problem.)

I have moved the entire thing into an ImmutableMap keyed by the version class, containing a nested EnumMap tracking the request type and the phase for each validator. To get all the validators that are supposed to run, i.e. those whose validation-condition version is greater than the request version, I have used a TreeMap, which internally uses a red-black tree where all keys are naturally sorted. This makes the get very efficient, since we can jump directly to the start index in the TreeMap and collect all the required validators.
I have also moved the ValidatorRegistry and the request-processing phase to the ozone-common module. With this, we can use this validation framework in all services.
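A minimal sketch of the TreeMap range query described here (hypothetical validator names; plain ints stand in for the version enums):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class VersionRangeSketch {
  // Validators keyed by the version (exclusive upper bound) up to which they
  // must run. A TreeMap (red-black tree) keeps keys sorted, so one range
  // query finds every applicable validator without a full scan.
  static final TreeMap<Integer, List<String>> BY_MAX_VERSION = new TreeMap<>();

  static {
    BY_MAX_VERSION.put(3, Arrays.asList("ecCompatValidator"));    // clients < 3
    BY_MAX_VERSION.put(5, Arrays.asList("hsyncCompatValidator")); // clients < 5
  }

  // All validators whose max version is strictly greater than requestVersion:
  // jump to the first qualifying key, then walk the tail of the map.
  static List<String> validatorsFor(int requestVersion) {
    List<String> result = new ArrayList<>();
    for (List<String> validators
        : BY_MAX_VERSION.tailMap(requestVersion, false).values()) {
      result.addAll(validators);
    }
    return result;
  }
}
```
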

I still did not have the time to digest fully the changes proposed, I wanted to give feedback as early as possible even if it is partial. You can expect some more review comments from me after I have the time to understand things further, most likely just early next week unfortunately.

On the other hand, there is a problem which we need to address. Please take a look at the solution added in #6796, this PR adds the annotation processors explicitly to those modules where we use the annotation, and makes it impossible to use the annotations outside those modules, as importing the annotations became a violation in the enforcer plugin. Renaming an annotation or an annotation processor, effectively turns these enforcements off if the names are not updated in the pom files, so with a rename we will need to go through the pom files and fix them to keep the prior enforcements in effect.

I have added the banned imports, to the best of my knowledge, to ban importing OmClientVersionValidator & OMLayoutVersionValidator.

A less burning issue, however it is our best interest to deal with this: I noticed is that due to renames and class deletes and method signature changes affects API documentation, from which the automation removed class references but this way the API doc is sometimes makes no sense, and most of the times deviates from the current implementation of the documented function.

I have fixed the docs after finding their usages in the package-info and the various classes' javadoc.

@sumitagrawl (Contributor) commented Jul 25, 2024:

@swamirishi
I am getting confused about the changes. Let me re-iterate the requirement as initially developed (HDDS-6213):
Validator:
Validation list:

  1. minVersion: validate the minimum client version to be supported
  2. upgrade-finalized: whether upgrade finalization is required for the request to be supported

processor:

  1. processingPhase: pre-process/post-process to adapt the request and response (for clients later than the minimum version), which will trigger the associated method.

Actual implementation:
conditions: these act as the condition for when the associated method is triggered, like CLUSTER_NEEDS_FINALIZATION and OLDER_CLIENT_REQUESTS
associated method: this performs re-validation and in some places updates the request

Here, the problem is a bug where OLDER_CLIENT_REQUESTS triggered the associated method.

The above implementation:

  1. Generic validation framework
  2. Removed conditions and added maxVersion. -- I cannot relate to the meaning of maxVersion here. Is it the version before which processingPhase is to be triggered, or something else?

I am confused about how the actual requirement is satisfied.
IMO, based on the above understanding, for any request we need:

  • Pre-validation: performs validation and automatically responds back if not satisfied, like BUCKET_OLD_CLIENT or CLUSTER_NEEDS_FINALIZATION, in a generic way
  • processing-condition: the range of feature enum values for which the processing method needs to be triggered, to either do extra validation or adapt the request
  • processingLogic: the method with which the validation is associated

Also, we could separate the generic-implementation refactoring and the concern we are trying to resolve into different PRs, for better clarity and review.

@swamirishi (Contributor, Author) commented Jul 26, 2024:

@sumitagrawl thanks for taking a look at the patch. I have a few comments, which I have added inline.

@swamirishi I am getting confused about changes. I am re-iterating about the requirement initially developed (HDDS-6213), Validator: Validation list:

  1. minVersion: validate minimum client version to be supported
  2. upgrade-finalized: upgrade finalization or not for the request to be supported

processor:

  1. processingPhase: pre-process/post-process to adapt request and response (later than minimum version) which will trigger associated method.

Actual implementation: conditions: this act as condition when associated method to be triggered, like CLUSTER_NEEDS_FINALIZATION and OLDER_CLIENT_REQUESTS associated Method: This performs re-validation and some places update request

Here, problem is, bug in OLDER_CLIENT_REQUESTS triggered associated method

Above implementation:

  1. Generic validation framework
  2. removed conditions and added maxVersion. -- I can not relate maxVersion meaning here. Is it version before which processingPhase to be triggered OR something else.

Since the check happens on the server side, the implementation should be in terms of the server. maxVersion here is the max version [exclusive] for which the validator is supposed to run.

I am confused how actual requirement is satisfied. IMO based on above understanding, we need have for any request,

  • Pre-validation: that does validation and automatic respond back if not satisfied, like BUCKET_OLD_CLIENT, CLUSTER_NEEDS_FINALIZATION, in generic way

For instance, suppose the server has the capability to read and write EC keys and the bucket layout feature has not been finalized. We should allow writes of EC keys on the server and not block them just because the upgrade has not been finalized to the latest metadata layout version.

  • processing-condition: range of feature enum where need trigger processing method, either do extra validation, or adapt request
  • processingLogic: method to which validation method is associated.

Also we can separate PR associated with generic implementation refactoring and concern we are trying to resolve, for better clarity and review.

I personally don't see any value in separating out the PR, since we would still need to create the new annotation that allows validator annotations to support generic enums extending the Version interface. To make it generic, all we are doing is also making the requestType another generic enum, so that we can reuse this in other components as well.

@fapifta (Contributor) left a comment:

Thank you @swamirishi for your patience, and also for the work you have put into this initiative so far, not to mention the general discussion and your openness to trying out various approaches.

In general I think we are more than good with the overall approach; however, there are a good couple of small things that I have marked inline for you to consider.

Contributor:

It seems that you have added the methods to validate the max version, but those methods are not used anywhere in the processor itself; therefore, if an invalid value (FUTURE_VERSION) is present, the processor does not emit an error message in this case.

Contributor (Author):

Do we want the annotation processor to validate all that? I would rather put this check someplace else, probably in a place like https://github.com/swamirishi/ozone/blob/fc57929908cbb05a6304366e67f64317dbf9505d/hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/request/validation/ValidatorRegistry.java#L85-L88 where, instead of just a set of allowedVersionTypes, I would pass a map of version class to the allowed values. What do you think of that?

* Base class defining the version in the entire system.
*/
public interface Version {
int getVersion();
Contributor:

In order to conform to the method naming in the implementing interfaces/classes, can we call this method simply version?

Contributor (Author):

Sure, makes sense.

@@ -28,6 +28,8 @@
* List of OM Layout features / versions.
*/
public enum OMLayoutFeature implements LayoutFeature {

Contributor:

Can we skip adding these two empty lines, please?

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.ANNOTATION_TYPE)
public @interface RegisterValidator {
String MAX_VERSION_METHOD_NAME = "maxVersion";
Contributor
@fapifta fapifta Jul 27, 2024

I am wondering if applyUntil would be a better name instead of maxVersion? I feel much more guided by applyUntil regarding the value to use for a pre-finalized validation, or a validation that should deal with older clients.

Contributor Author

I look at this as a version validator. applyUntil does make sense, but I believe in the future we might also need an applyFrom method as well, and I just want to ensure the naming still works then. minVersion & maxVersion made a lot of sense to me.

ElementKind.ENUM, REQUEST_PROCESSING_PHASE_CLASS_NAME);
}
}
if (!hasMaxVersion) {
Contributor

I am wondering if it has any value to limit the number of elements in these annotations to exactly these 3 values, and to check that there aren't any additional methods defined for it?

Contributor Author

I don't think we should put a restriction on any other method, since the annotation could be used for some other purpose at a later point. It doesn't make sense to restrict the annotation definition to just these 3 methods; making them a minimum requirement is good enough.

RequestProcessingPhase phase,
V requestVersion) {
return Optional.ofNullable(this.indexedValidatorMap.get(requestVersion.getClass()))
.map(requestTypeMap -> requestTypeMap.get(requestType)).map(phaseMap -> phaseMap.get(phase))
Contributor

Suggested change
.map(requestTypeMap -> requestTypeMap.get(requestType)).map(phaseMap -> phaseMap.get(phase))
.map(requestTypeMap -> requestTypeMap.get(requestType))
.map(phaseMap -> phaseMap.get(phase))


}

private <K, V> V getAndInitialize(K key, Supplier<V> defaultSupplier, Map<K, V> from) {
Contributor

Probably we should just inline this, as it came up earlier when this method body was switched to computeIfAbsent.
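The suggested inlining could look like the sketch below; the request-type names and validator strings are illustrative, not the PR's actual types:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// getAndInitialize(key, supplier, map) is equivalent to
// map.computeIfAbsent(key, k -> supplier.get()), so the registry can build
// its nested maps directly without the helper method.
public class ComputeIfAbsentSketch {
  public static void main(String[] args) {
    Map<String, List<String>> validatorsByType = new HashMap<>();
    // instead of: getAndInitialize("CreateDirectory", ArrayList::new, validatorsByType).add(...)
    validatorsByType.computeIfAbsent("CreateDirectory", k -> new ArrayList<>()).add("validatorA");
    // the second call reuses the already-initialized list
    validatorsByType.computeIfAbsent("CreateDirectory", k -> new ArrayList<>()).add("validatorB");
    System.out.println(validatorsByType.get("CreateDirectory")); // [validatorA, validatorB]
  }
}
```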

public List<T> getItemsGreaterThanIdx(IDX indexValue) {
return Optional.ofNullable(indexMap.higherEntry(indexValue))
.map(Map.Entry::getValue)
.map(startIndex -> items.subList(startIndex, items.size())).orElse(Collections.emptyList());
Contributor

Suggested change
.map(startIndex -> items.subList(startIndex, items.size())).orElse(Collections.emptyList());
.map(startIndex -> items.subList(startIndex, items.size()))
.orElse(Collections.emptyList());

}

/**
*
Contributor

Apidoc here would be nice. However, I am fine if this method and the add method are not documented, but in that case please explain the IndexedItems class and its role in the registry's data structure somewhere.
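For readers of this thread, a hedged reconstruction of what IndexedItems appears to do, based only on the snippets visible in this PR (the add method and the ascending-insertion assumption are guesses, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Items are appended in ascending index order; indexMap records the position
// of the first item stored under each index value. getItemsGreaterThanIdx
// then returns every item registered under a strictly greater index, e.g. all
// validators whose maxVersion exceeds the request's version.
public class IndexedItems<T, IDX extends Comparable<IDX>> {
  private final List<T> items = new ArrayList<>();
  private final TreeMap<IDX, Integer> indexMap = new TreeMap<>();

  // assumed contract: called with non-decreasing index values
  public void add(T item, IDX idx) {
    indexMap.putIfAbsent(idx, items.size());
    items.add(item);
  }

  public List<T> getItemsGreaterThanIdx(IDX indexValue) {
    return Optional.ofNullable(indexMap.higherEntry(indexValue))
        .map(Map.Entry::getValue)
        .map(start -> items.subList(start, items.size()))
        .orElse(Collections.emptyList());
  }

  public static void main(String[] args) {
    IndexedItems<String, Integer> validators = new IndexedItems<>();
    validators.add("forVersion1", 1);
    validators.add("forVersion2", 2);
    validators.add("forVersion3", 3);
    System.out.println(validators.getItemsGreaterThanIdx(1)); // [forVersion2, forVersion3]
    System.out.println(validators.getItemsGreaterThanIdx(3)); // []
  }
}
```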

return from.computeIfAbsent(key, k -> defaultSupplier.get());
}

static final class IndexedItems<T, IDX extends Comparable<IDX>> {
Contributor

Suggested change
static final class IndexedItems<T, IDX extends Comparable<IDX>> {
private static final class IndexedItems<T, IDX extends Comparable<IDX>> {

@sumitagrawl
Contributor

@swamirishi
As per the discussion above, the below cases could be handled:

  • The above are pre/post executors, similar to a webservice PreFilter and PostFilter. I think we can rename them following a similar pattern, as their job is similar and readability will be better.

  • A common pre/post processor can be associated with multiple requests, like CLUSTER_NEEDS_FINALIZATION. Instead of attaching the same annotation to multiple request methods, one method could be attached to multiple requests via an array-valued requestType.

  • A priority could also be provided to order the execution of multiple methods (similar to webservice pre/post filters).

  • Currently both annotations share the common "validatorRegistry and validationFor()" logic, which uses maxVersion to filter and execute methods. The version is retrieved from the software version and the client version and provided as a list; there is no distinction between which is the client version and which is the server version.
    We need more clarity on how annotations and pre-conditions map to the associated methods being called. Pre-conditions must be separated and defined clearly, not folded into common logic.

cc: @fapifta

Change-Id: I5fb15b4e3559fcee51f1964a4b0a50c57222f409
@fapifta
Contributor

fapifta commented Jul 29, 2024

Hi @sumitagrawl

Thank you for the suggestions, let me add some answers and further ideas for them.

  • The above are pre/post executors, similar to a webservice PreFilter and PostFilter. I think we can rename them following a similar pattern, as their job is similar and readability will be better.

PreFilter and PostFilter are a long standing part of Spring, however in J2EE we only have the Filter as part of the servlet specification.
Pre and PostFilter is part of Spring Security, and works based off of Spring EL to support expression based authentication and authorization related operations on request/responses.
The Filter interface in J2EE, on the other hand, gets a FilterChain that represents the chain of filters applied to a request on its way down to the actual request processing and then back up the chain until the response is served; the doFilter method has the responsibility to pass the request on to the next filter, and then the response back through the call stack.

In our case we do something similar, but still something fundamentally different. I am not a particular fan of the current OMLayoutVersionValidator or ClientVersionValidator names either, but PreFilter and PostFilter are, to me, also names that are close yet still not quite right.

Let me give some history to the naming as I remember things.
RequestFeatureValidator was coming from the use case: we wanted to validate certain features of a request, and drop the request in case we should not process it because we are in the pre-finalized state.
Once we realized that this can also be used to check, and sometimes even fix, compatibility related request or response features (by feature here I don't mean new functionality, but rather a distinctive attribute), we included this in the system but did not change the name.
That was the point where we settled, as we wanted to release 1.3 with upgrade support, so we needed at least something to support EC and the new bucket layouts during the upgrade, and compatibility after the upgrade.

Once the original system was ready, we already knew there were a good couple of things in it that are not suitable for other components, and this hurt us when we needed to solve compatibility in other components for the same release as well.

Back then we had some discussions about how to do this transitively for clients that connect to OM, where OM then also connects to SCM to fulfil the request. We also had discussions about exotic and less exotic race conditions, what problems might come from them and how we can solve them, and most importantly we talked about how to extend the system to other components.
We talked about separate annotations for pre and post processing, with priority (either statically defined or version based), for which we could store the methods in two lists ordered by priority. We also discussed that we might solve the problem of the different versioning schemes via extending/expanding the ValidationContext interface. And we also talked about allowing multiple requests to be associated with a validator method, but we never worked that out, as we saw a fundamental problem with it when we started to implement pre-finalized validations (more on these features later).
Yet we could not get to the actual implementation, as so far the system was good enough, and we had to deal with other priorities.

  • A common pre/post processor can be associated with multiple requests, like CLUSTER_NEEDS_FINALIZATION. Instead of attaching the same annotation to multiple request methods, one method could be attached to multiple requests via an array-valued requestType.

The problem of pre-finalized validations is the following (and you can check this for bucket layouts for example):
We have a separate type defined for all the requests via the relevant .proto file, and these types are coming from generated source files that are compiled via protoc, so it is not easy to dynamically figure out where the relevant information is for a specific request type.
That was why we gave up on it: for every request type there was some specific place where the information could be found, and we decided that it is easier to implement the pre-finalization validations on a one-by-one basis. Speculatively we played with the idea of grouping these validators across multiple versions, but that only works if the same property is to be validated, and even then different versions might require different checks/actions. So we dropped that idea as well.
If we missed something, I am open to revisiting any ideas to simplify this, but at the moment I don't see how it could be done in a way that results in more readable and easier to understand code. Can you elaborate on how you would implement a generic validator for the bucket layout related checks, for example?

  • A priority could also be provided to order the execution of multiple methods (similar to webservice pre/post filters).

Priority is a thing we already put some thought into, as I mentioned; however, at that point we did not implement it, as we did not see the use of it in the context of 1.2->1.3 upgrade support. In that context, EC and bucket layout support were two features added within the same version, so ordering the validations would not have made any difference in the overall processing: both had to be validated anyway, and if one validation passes the other might still fail, while neither was predicted to be more commonly used overall.
Here is the reasoning for not adding priority, and this is even more true after adding maxVersion to the picture:
Usually a cluster goes through an upgrade to the next version, and then it is finalized; rinse and repeat. We consider upgrades that cross multiple versions rare at this phase of Ozone's lifecycle as a product. Users at that time mostly upgraded based on master, and even internally in Cloudera we were using the master branch for Cloudera releases, so we predicted that during an upgrade there would mostly be equally used features we need validations for, and it did not seem worth it to figure out priorities between validators.
Moreover, the list of validators to apply is eventually, and ideally, empty in all clusters that undergo upgrades, because over time clients get upgraded along with the cluster, so these validators are applied mostly around the time of the upgrade. This fact is actually why we desperately wanted to separate the checks from the actual request processing code: eventually all of these checks become obsolete, but in order to keep compatibility you need to keep them there and check all the time. This way we also ensured that the usually more expensive actual check is done only when it is really needed, and all checks that are needed check the versions first.

As of now, even if we argue that an upgrade can jump more than one release version, we have the possibility to order the validations based on their maxVersion, which naturally prioritizes older validators first and still serves validators for the same release in an ad-hoc way. I do believe there is no use case where priority would be needed; at least I am yet to see one, and once I see one I am happy to revisit my stance on this.

  • Currently both annotations share the common "validatorRegistry and validationFor()" logic, which uses maxVersion to filter and execute methods. The version is retrieved from the software version and the client version and provided as a list; there is no distinction between which is the client version and which is the server version.
    We need more clarity on how annotations and pre-conditions map to the associated methods being called. Pre-conditions must be separated and defined clearly, not folded into common logic.

I would argue that we have a terminology problem here...
At the moment, if you look at all the method annotations, OMLayoutVersionValidator is exclusively used for pre-finalized state validations, and this will remain the case based on what I foresee and am able to predict. If you can show me a case where this might not be true, again I am happy to re-validate my point of view.
On the other hand, OMClientVersionValidator is exclusively used for client compatibility related validations, and the same is true here: I do not foresee any different use case for this.
If we want to ensure quality, probably we may come up with better naming for these annotations...

All in all:
Thank you all for being engaged and discussing this further to get to a good end state.
Let me summarize the ideas I think we should pursue and was already mentioned:

  • We should think about better naming for pre-finalized state validations in OM, and for compatibility related validations in OM. We may also revisit the term validation for these hooks; however, I don't believe PreFilter and PostFilter will help to better understand the meaning and structure of these.
  • We may also introduce separate annotations: instead of using RequestProcessingPhase, we may use two separate annotations for pre and post processing, and build this into the data structure within the registry as a tuple where one side (right) is for pre and the other side (left) is for post process validations.
  • We should keep the ordering based on versions; if there is a real use case we may introduce priority for validators, and if there isn't, we should not bother with priority yet and keep it simple.
  • We probably will run into further possibilities and problems once we try to apply this same framework to SCM's and DataNode's request processing.
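The pre/post tuple idea from the summary above could be sketched roughly as below; all names here are illustrative placeholders, not the PR's actual API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Per request type, the registry keeps a pair of validator lists: one
// consulted before request processing (pre) and one after (post), replacing
// the RequestProcessingPhase enum dimension with the tuple's two sides.
public class PrePostRegistrySketch {
  static final class ValidatorPair {
    final List<String> pre = new ArrayList<>();
    final List<String> post = new ArrayList<>();
  }

  private final Map<String, ValidatorPair> byRequestType = new HashMap<>();

  public void registerPre(String requestType, String validator) {
    byRequestType.computeIfAbsent(requestType, k -> new ValidatorPair()).pre.add(validator);
  }

  public void registerPost(String requestType, String validator) {
    byRequestType.computeIfAbsent(requestType, k -> new ValidatorPair()).post.add(validator);
  }

  public List<String> preValidators(String requestType) {
    ValidatorPair pair = byRequestType.get(requestType);
    return pair == null ? Collections.emptyList() : pair.pre;
  }

  public static void main(String[] args) {
    PrePostRegistrySketch registry = new PrePostRegistrySketch();
    registry.registerPre("CreateDirectory", "checkBucketLayout");
    registry.registerPost("CreateDirectory", "fixResponseCompatibility");
    System.out.println(registry.preValidators("CreateDirectory")); // [checkBucketLayout]
  }
}
```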

Besides all of this, we have to talk about possible race conditions in the system that we are already aware of but left as is back then and still have not solved, like the one in HDDS-6682, as that may also reveal new problems and new ideas to simplify or complicate the current logic.

processingPhase = RequestProcessingPhase.PRE_PROCESS,
requestType = Type.CreateDirectory
requestType = Type.CreateDirectory,
maxVersion = ClientVersion.ERASURE_CODING_SUPPORT
Contributor

This should be BUCKET_LAYOUT_SUPPORT as max version

@sumitagrawl
Contributor

sumitagrawl commented Jul 31, 2024

@fapifta requestType can be an array of the requests to be supported by this...
As a common validator where finalization must be performed before further requests are supported:
    @OMLayoutVersionValidator(
        processingPhase = RequestProcessingPhase.PRE_PROCESS,
        requestType = {Type.CreateDirectory, Type.CreateFile},
        maxVersion = OMLayoutFeature.CURRENT
    )
    public static OMRequest mandateFinalization(
        OMRequest req, ValidationContext ctx) throws OMException {
      if (ctx.versionManager().needsFinalization()) {
        throw new OMException("Upgrade finalization not completed",
            OMException.ResultCodes.NOT_SUPPORTED_OPERATION_PRIOR_FINALIZATION);
      }
      return req;
    }

Change-Id: If7c47ea7242cfac29934c1a0eae6c57a46ccea58
@fapifta
Contributor

fapifta commented Aug 28, 2024

@sumitagrawl thank you for giving an example of a generic pre-finalized state validation. The problem with this is that we do not want to disable all operations that have a new feature in the pre-finalized state; we want to prevent using any new feature, so that if testing shows that a rollback/downgrade is needed, it can still be done.

So in the pre-finalized state what we need is to disable any new feature coming in that would store metadata the old code does not understand, while ensuring that everything else stays operational, so that it is possible to verify that the upgraded software works as expected before allowing writes of data/metadata that the old code cannot understand.
The aim is to ensure that no data/metadata is written in the pre-finalized state that would prevent a rollback/downgrade.
