Describe the bug
This is related to another bug that I previously reported in #3598 and #3614.
Now it's failing in parameter validation:
ParamValidationError: Parameter validation failed: Unknown parameter in TrainingJobDefinition: "Environment", must be one of: DefinitionName, TuningObjective, HyperParameterRanges, StaticHyperParameters, AlgorithmSpecification, RoleArn, InputDataConfig, VpcConfig, OutputDataConfig, ResourceConfig, StoppingCondition, EnableNetworkIsolation, EnableInterContainerTrafficEncryption, EnableManagedSpotTraining, CheckpointConfig, RetryStrategy, HyperParameterTuningResourceConfig
To reproduce
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import (
    IntegerParameter,
    CategoricalParameter,
    ContinuousParameter,
    HyperparameterTuner
)

checkpoint_s3_uri = f"s3://{bucket_name}/{prefix}/checkpoints"
instance_type = "ml.g4dn.xlarge"  # 4 vCPUs, 16 GB RAM, 1 x NVIDIA T4 16GB GPU - $0.736 per hour

estimator = PyTorch(
    entry_point="train.py",
    source_dir="ml",
    role=params.sagemaker_execution_role_arn,
    sagemaker_session=sagemaker_session,
    py_version="py38",
    framework_version="1.12.0",
    instance_count=1,
    instance_type=instance_type,
    use_spot_instances=True,
    max_wait=10800,
    max_run=10800,
    checkpoint_s3_uri=checkpoint_s3_uri,
    checkpoint_local_path="/opt/ml/checkpoints",
    environment={
        "MLFLOW_TRACKING_URI": params.mlflow_tracking_uri,
        "MLFLOW_EXPERIMENT_NAME": params.mlflow_experiment_name,
        "MLFLOW_TRACKING_USERNAME": params.mlflow_tracking_username,
        "MLFLOW_TRACKING_PASSWORD": params.mlflow_tracking_password,
        "MLFLOW_TAGS": params.mlflow_tags,
        "MLFLOW_RUN_ID": mlflow.active_run().info.run_id,
        "MLFLOW_FLATTEN_PARAMS": "True"
    },
    hyperparameters={
        ## If you want to test the code, uncomment the following lines to use smaller datasets
        # "max_train_samples": 100,
        # "max_val_samples": 100,
        # "max_test_samples": 100,
        "num_train_epochs": params.num_train_epochs,
        "early_stopping_patience": params.early_stopping_patience,
        "eval_dataset": "validation",
        "batch_size": params.batch_size,
        "seed": params.seed
    }
)

tuner = HyperparameterTuner(
    estimator,
    max_jobs=18,
    max_parallel_jobs=3,
    objective_type="Maximize",
    objective_metric_name="eval_f1",
    metric_definitions=[
        {"Name": "eval_f1", "Regex": "eval_f1: ([0-9\\.]+)"}
    ],
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(1e-5, 1e-3),
        "weight_decay": ContinuousParameter(0.0, 0.1),
        "adam_beta1": ContinuousParameter(0.8, 0.999),
        "adam_beta2": ContinuousParameter(0.8, 0.999),
        "adam_epsilon": ContinuousParameter(1e-8, 1e-6),
        "label_smoothing_factor": ContinuousParameter(0.0, 0.1),
        "optim": CategoricalParameter([
            "adamw_hf", "adamw_torch", "adamw_apex_fused", "adafactor"
        ])
    }
)
System information
A description of your system. Please provide:
Hello Douglas! Could you please share which version of the boto3 SDK you are using?
boto3==1.26.32
@DougTrajano according to the boto3 changelog, this functionality is supported in versions >= 1.26.53. Could you try updating your boto3 version?
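A minimal sketch of the version check being suggested here. The helper functions are hypothetical (not part of boto3 or the SageMaker SDK); the 1.26.53 threshold is the minimum version stated above:

```python
# Hypothetical helper to check whether an installed boto3 version is new
# enough for Environment in TrainingJobDefinition (1.26.53 per the changelog).

def version_tuple(version: str) -> tuple:
    """Parse a dotted version string like '1.26.32' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

MIN_BOTO3 = version_tuple("1.26.53")

def supports_training_job_environment(installed: str) -> bool:
    """Return True if the given boto3 version meets the minimum."""
    return version_tuple(installed) >= MIN_BOTO3

print(supports_training_job_environment("1.26.32"))  # False: the version in this report
print(supports_training_job_environment("1.26.53"))  # True
```

In practice, `pip install --upgrade boto3` (and `botocore`) and re-running the tuner is the actual fix; the sketch only illustrates why 1.26.32 fails the check.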
yeah! I tested with the latest version of boto3 and it worked. :)