Unknown parameter in TrainingJobDefinition: "Environment" #3627

DougTrajano · 2023-02-02T15:08:38Z

Describe the bug

It's related to other bugs I previously reported in #3598 and #3614.

Now it fails during parameter validation:

ParamValidationError: Parameter validation failed:
Unknown parameter in TrainingJobDefinition: "Environment", must be one of: DefinitionName, TuningObjective, HyperParameterRanges, StaticHyperParameters, AlgorithmSpecification, RoleArn, InputDataConfig, VpcConfig, OutputDataConfig, ResourceConfig, StoppingCondition, EnableNetworkIsolation, EnableInterContainerTrafficEncryption, EnableManagedSpotTraining, CheckpointConfig, RetryStrategy, HyperParameterTuningResourceConfig
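The error is raised on the client side, before any request reaches AWS. botocore validates request parameters against the service model bundled with the installed release (the real check is `ParamValidator` in `botocore.validate`), and a model that predates `Environment` rejects the key. The stdlib-only sketch below is a simplified, hypothetical illustration of that kind of check, using the allowed field names copied from the message above. `unknown_fields` is not a botocore function, and the request fragment is illustrative only.

```python
# Fields accepted by TrainingJobDefinition in the *older* API model, copied
# verbatim from the ParamValidationError message above.
OLD_ALLOWED_FIELDS = frozenset({
    "DefinitionName", "TuningObjective", "HyperParameterRanges",
    "StaticHyperParameters", "AlgorithmSpecification", "RoleArn",
    "InputDataConfig", "VpcConfig", "OutputDataConfig", "ResourceConfig",
    "StoppingCondition", "EnableNetworkIsolation",
    "EnableInterContainerTrafficEncryption", "EnableManagedSpotTraining",
    "CheckpointConfig", "RetryStrategy", "HyperParameterTuningResourceConfig",
})


def unknown_fields(request: dict, allowed: frozenset) -> list:
    """Return the top-level keys of `request` that are not in `allowed`.

    Hypothetical helper: a simplified stand-in for the structure check
    botocore performs before sending a request.
    """
    return sorted(key for key in request if key not in allowed)


# Illustrative request fragment with placeholder values; only the keys matter.
training_job_definition = {
    "RoleArn": "<role-arn>",
    "EnableManagedSpotTraining": True,
    "Environment": {"MLFLOW_FLATTEN_PARAMS": "True"},
}

print(unknown_fields(training_job_definition, OLD_ALLOWED_FIELDS))
# prints ['Environment']
```

With a newer model, `Environment` is part of the allowed set and the same request passes validation.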

To reproduce

import mlflow
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner
)

# bucket_name, prefix, params and sagemaker_session are defined earlier (not shown).
checkpoint_s3_uri = f"s3://{bucket_name}/{prefix}/checkpoints"

# 4 vCPUs, 16 GB RAM, 1 x NVIDIA T4 16 GB GPU, $0.736 per hour
instance_type = "ml.g4dn.xlarge"

estimator = PyTorch(
    entry_point="train.py",
    source_dir="ml",
    role=params.sagemaker_execution_role_arn,
    sagemaker_session=sagemaker_session,
    py_version="py38",
    framework_version="1.12.0",
    instance_count=1,
    instance_type=instance_type,
    use_spot_instances=True,
    max_wait=10800,
    max_run=10800,
    checkpoint_s3_uri=checkpoint_s3_uri,
    checkpoint_local_path="/opt/ml/checkpoints",
    environment={
        "MLFLOW_TRACKING_URI": params.mlflow_tracking_uri,
        "MLFLOW_EXPERIMENT_NAME": params.mlflow_experiment_name,
        "MLFLOW_TRACKING_USERNAME": params.mlflow_tracking_username,
        "MLFLOW_TRACKING_PASSWORD": params.mlflow_tracking_password,
        "MLFLOW_TAGS": params.mlflow_tags,
        "MLFLOW_RUN_ID": mlflow.active_run().info.run_id,
        "MLFLOW_FLATTEN_PARAMS": "True"
    },
    hyperparameters={
        ## If you want to test the code, uncomment the following lines to use smaller datasets
        # "max_train_samples": 100,
        # "max_val_samples": 100,
        # "max_test_samples": 100,
        "num_train_epochs": params.num_train_epochs,
        "early_stopping_patience": params.early_stopping_patience,
        "eval_dataset": "validation",
        "batch_size": params.batch_size,
        "seed": params.seed
    }
)

tuner = HyperparameterTuner(
    estimator,
    max_jobs=18,
    max_parallel_jobs=3,
    objective_type="Maximize",
    objective_metric_name="eval_f1",
    metric_definitions=[
        {
            "Name": "eval_f1",
            "Regex": "eval_f1: ([0-9\\.]+)"
        }
    ],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-3),
        "weight_decay": ContinuousParameter(0.0, 0.1),
        "adam_beta1": ContinuousParameter(0.8, 0.999),
        "adam_beta2": ContinuousParameter(0.8, 0.999),
        "adam_epsilon": ContinuousParameter(1e-8, 1e-6),
        "label_smoothing_factor": ContinuousParameter(0.0, 0.1),
        "optim": CategoricalParameter(
            [
                "adamw_hf",
                "adamw_torch",
                "adamw_apex_fused",
                "adafactor"
            ]
        )
    }
)

# The ParamValidationError above is raised when the tuning job is launched with tuner.fit(...).

System information

SageMaker Python SDK version: 2.131.0
Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
Framework version: 1.12.0
Python version: 3.8
CPU or GPU: GPU
Custom Docker image (Y/N): N
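The template above does not ask for the boto3 or botocore version, which turned out to be the deciding factor in this thread. A minimal stdlib-only sketch for collecting all the relevant versions at once is shown here; `collect_versions` is a hypothetical helper, and packages that are not installed are reported as `None` rather than raising.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("sagemaker", "boto3", "botocore")):
    """Return the Python version plus installed versions of `packages`.

    Packages that are not installed map to None instead of raising.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


for name, ver in collect_versions().items():
    print(f"{name}: {ver}")
```

`importlib.metadata` is available from Python 3.8, which matches the environment reported here.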


repushko · 2023-02-03T12:41:33Z

Hello Douglas!
Could you please share which version of the boto3 SDK you are using?

DougTrajano · 2023-02-05T20:20:08Z

> Hello Douglas! Could you please share which version of the boto3 SDK you are using?

boto3==1.26.32

repushko · 2023-02-06T14:53:51Z

@DougTrajano According to the boto3 changelog, this functionality is supported in versions >= 1.26.53. Could you try updating your boto3 version?
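A quick way to check an environment against the 1.26.53 threshold from this comment is sketched here. It assumes plain numeric `X.Y.Z` release strings (which boto3 uses); the helper names are hypothetical, and for general PEP 440 versions `packaging.version.Version` is the safer comparison. Note that the API model itself ships in botocore, which boto3 pins as a dependency, so upgrading boto3 with pip also pulls in a matching botocore.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum boto3 release reported in this thread as supporting Environment
# in TrainingJobDefinition.
MIN_BOTO3 = "1.26.53"


def parse_release(ver: str) -> tuple:
    """Turn a plain 'X.Y.Z' release string into a tuple of ints.

    Simplified: assumes purely numeric components.
    """
    return tuple(int(part) for part in ver.split("."))


def boto3_is_new_enough(installed: str, minimum: str = MIN_BOTO3) -> bool:
    # Tuples compare element by element, so (1, 26, 32) < (1, 26, 53).
    return parse_release(installed) >= parse_release(minimum)


try:
    installed = version("boto3")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("boto3 is not installed")
elif not boto3_is_new_enough(installed):
    print(f"boto3 {installed} is older than {MIN_BOTO3}; upgrade boto3")
else:
    print(f"boto3 {installed} should accept Environment in TrainingJobDefinition")
```

With the version reported above, `boto3_is_new_enough("1.26.32")` returns `False`.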

DougTrajano · 2023-02-07T01:39:24Z

> @DougTrajano According to the boto3 changelog, this functionality is supported in versions >= 1.26.53. Could you try updating your boto3 version?

Yeah! I tested with the latest version of boto3 and it worked. :)

DougTrajano added the bug label Feb 2, 2023

DougTrajano closed this as completed Feb 7, 2023

timxieICN mentioned this issue Mar 2, 2023

ClientError: Failed to invoke sagemaker:CreateHyperParameterTuningJob. Error Details: Only the following fields in TrainingJobDefinition are allowed to change #3693 (closed)
