darts.legacy_training.wandb_sweep_smp
Create a sweep with wandb and run it on the specified cuda device, or continue an existing sweep.
If sweep_id is None, a new sweep will be created. Otherwise, the sweep with the given ID will be continued.
All artifacts are gathered under a nested directory based on the sweep ID: {artifact_dir}/sweep-{sweep_id}.
Since each configuration sampled by the sweep (currently) has its own name and ID, a single run can be found under:
{artifact_dir}/sweep-{sweep_id}/{run_name}/{run_id}. Read the training docs for more information.
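The directory layout described above can be expressed with pathlib. This is a minimal sketch: the helper names sweep_dir and run_dir, as well as the run name and run ID used below, are hypothetical and only mirror the documented path pattern.

```python
from pathlib import Path


def sweep_dir(artifact_dir: Path, sweep_id: str) -> Path:
    """Hypothetical helper: directory collecting all artifacts of one sweep."""
    # Pattern from the docs: {artifact_dir}/sweep-{sweep_id}
    return artifact_dir / f"sweep-{sweep_id}"


def run_dir(artifact_dir: Path, sweep_id: str, run_name: str, run_id: str) -> Path:
    """Hypothetical helper: directory of a single run inside a sweep."""
    # Pattern from the docs: {artifact_dir}/sweep-{sweep_id}/{run_name}/{run_id}
    return sweep_dir(artifact_dir, sweep_id) / run_name / run_id
```

For example, run_dir(Path("lightning_logs"), "123456789", "example-run", "abc123") points to lightning_logs/sweep-123456789/example-run/abc123, assuming the default artifact_dir.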
If device is specified, an agent is run on this device. If it is None, no agent is started.
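The create-or-continue behaviour and the optional agent can be sketched as follows. This is not the darts implementation. It assumes that a new sweep is registered with wandb.sweep, that the agent is started with wandb.agent, and that n_trials is passed as the agent's run count. The function name create_or_continue_sweep and the train_fn, create_sweep and run_agent parameters are hypothetical; the wandb import is deferred so the control flow can be exercised with stub functions.

```python
from typing import Callable, Optional, Union

Device = Union[int, str, None]
TrainFn = Callable[[], None]


def create_or_continue_sweep(
    sweep_config: dict,
    sweep_id: Optional[str],
    device: Device,
    n_trials: int,
    train_fn: TrainFn,
    wandb_entity: Optional[str] = None,
    wandb_project: Optional[str] = None,
    create_sweep: Optional[Callable[[dict], str]] = None,
    run_agent: Optional[Callable[[str, TrainFn, int], None]] = None,
) -> str:
    """Hypothetical sketch: create a sweep if needed, start an agent only if a device is given."""
    if create_sweep is None or run_agent is None:
        import wandb  # deferred import, only needed when no stubs are injected

        if create_sweep is None:
            def create_sweep(cfg: dict) -> str:
                # wandb.sweep registers the sweep and returns its ID
                return wandb.sweep(sweep=cfg, entity=wandb_entity, project=wandb_project)

        if run_agent is None:
            def run_agent(sid: str, fn: TrainFn, count: int) -> None:
                # wandb.agent runs `fn` for up to `count` sampled configurations
                wandb.agent(sid, function=fn, entity=wandb_entity, project=wandb_project, count=count)

    if sweep_id is None:
        # No ID given: create a new sweep
        sweep_id = create_sweep(sweep_config)

    if device is not None:
        # Assumption: train_fn trains on `device`; n_trials is used as the run count
        run_agent(sweep_id, train_fn, n_trials)

    return sweep_id
```

Keeping the sweep ID as the return value makes it easy to hand over to a second agent started elsewhere.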
You can control how often logs are written and how often validation is performed:
- log_every_n_steps specifies how often training logs are written. This does not affect validation.
- check_val_every_n_epoch specifies how often validation is performed. This also affects early stopping.
- plot_every_n_val_epochs specifies how often validation samples are plotted. Since plotting is quite costly, you can reduce its frequency. Like early stopping, it counts validation epochs, so in training epochs the plotting interval is check_val_every_n_epoch * plot_every_n_val_epochs.
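The interplay of check_val_every_n_epoch and plot_every_n_val_epochs can be checked with a small calculation. This sketch assumes 1-based epoch numbers and that plotting happens on every plot_every_n_val_epochs-th validation run; the helper names are hypothetical.

```python
from typing import List


def validation_epochs(max_epochs: int, check_val_every_n_epoch: int) -> List[int]:
    """Hypothetical helper: training epochs (1-based) after which validation runs."""
    return list(range(check_val_every_n_epoch, max_epochs + 1, check_val_every_n_epoch))


def plot_epochs(max_epochs: int, check_val_every_n_epoch: int, plot_every_n_val_epochs: int) -> List[int]:
    """Hypothetical helper: training epochs after which validation samples are plotted."""
    val = validation_epochs(max_epochs, check_val_every_n_epoch)
    # Take every plot_every_n_val_epochs-th validation run, counting from 1
    return val[plot_every_n_val_epochs - 1 :: plot_every_n_val_epochs]
```

With the defaults (max_epochs=100, check_val_every_n_epoch=3, plot_every_n_val_epochs=5), validation runs 33 times and samples are plotted after epochs 15, 30, 45, 60, 75 and 90, i.e. every 3 * 5 = 15 epochs.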
This will NOT use cross-validation. For cross-validation, use optuna_sweep_smp.
Example
In one terminal, start a sweep:
$ rye run darts wandb-sweep-smp --config-file /path/to/sweep-config.toml
... # Many logs
Created sweep with ID 123456789
... # More logs from spawned agent
In another terminal, start a second agent:
Parameters:
- train_data_dir (pathlib.Path) – Path to the training data directory.
- sweep_config (pathlib.Path) – Path to the sweep YAML configuration file. Must contain a valid wandb sweep configuration. The hyperparameters must include the following fields: model_arch, model_encoder, augment, gamma, batch_size. Please read https://docs.wandb.ai/guides/sweeps/sweep-config-keys for more information.
- n_trials (int, default: 10) – Number of runs to execute. Defaults to 10.
- sweep_id (str | None, default: None) – The ID of the sweep. If None, a new sweep will be created. Defaults to None.
- artifact_dir (pathlib.Path, default: pathlib.Path('lightning_logs')) – Path to the training output directory. Will contain checkpoints and metrics. Defaults to Path("lightning_logs").
- max_epochs (int, default: 100) – Maximum number of epochs to train. Defaults to 100.
- log_every_n_steps (int, default: 10) – Log every n steps. Defaults to 10.
- check_val_every_n_epoch (int, default: 3) – Run validation every n epochs. Defaults to 3.
- plot_every_n_val_epochs (int, default: 5) – Plot validation samples every n validation epochs. Defaults to 5.
- num_workers (int, default: 0) – Number of DataLoader workers. Defaults to 0.
- device (int | str | None, default: None) – The device to run the model on. Defaults to None.
- wandb_entity (str | None, default: None) – Weights & Biases entity. Defaults to None.
- wandb_project (str | None, default: None) – Weights & Biases project. Defaults to None.
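As an illustration of the sweep_config requirement, the sketch below shows a configuration as a Python dict, i.e. roughly what the YAML file would deserialize to, plus a check for the required hyperparameter fields. The keys method, metric and parameters (with values or min/max) are standard wandb sweep keys; the sweep name, the metric name "val/loss" and all hyperparameter values are placeholders, not taken from the darts sources.

```python
from typing import List

# Required hyperparameter fields, as listed for sweep_config
REQUIRED_HPARAMS = ("model_arch", "model_encoder", "augment", "gamma", "batch_size")

# Illustrative configuration only: name, metric and values are placeholders
sweep_config = {
    "name": "smp-sweep",
    "method": "random",
    "metric": {"name": "val/loss", "goal": "minimize"},
    "parameters": {
        "model_arch": {"values": ["Unet", "UnetPlusPlus"]},
        "model_encoder": {"values": ["resnet34", "resnet50"]},
        "augment": {"values": [True, False]},
        "gamma": {"min": 0.0, "max": 2.0},
        "batch_size": {"values": [8, 16]},
    },
}


def missing_hparams(config: dict) -> List[str]:
    """Hypothetical helper: required hyperparameters absent from config['parameters']."""
    params = config.get("parameters", {})
    return [name for name in REQUIRED_HPARAMS if name not in params]
```

Running such a check before launching a sweep surfaces a missing field early, instead of failing inside a training run.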
Source code in darts/src/darts/legacy_training/train.py