Improved W&B integration #2125

Merged: 84 commits from AyushExel:wandb_clean into ultralytics:master on Mar 22, 2021.

Commits (84) — the diff below shows changes from a single commit.
ba39bfd
Init Commit
AyushExel Feb 2, 2021
5fcd1dc
new wandb integration
AyushExel Feb 3, 2021
c540e3b
Update
AyushExel Feb 3, 2021
8253b24
Use data_dict in test
AyushExel Feb 3, 2021
7f89535
Updates
AyushExel Feb 3, 2021
c149930
Update: scope of log_img
AyushExel Feb 3, 2021
49edb90
Update: scope of log_img
AyushExel Feb 3, 2021
7922683
Update
AyushExel Feb 3, 2021
e1e7179
Update: Fix logging conditions
AyushExel Feb 3, 2021
e632514
Add tqdm bar, support for .txt dataset format
AyushExel Feb 9, 2021
3e8f4ae
Improve Result table Logger
AyushExel Feb 21, 2021
cd094f3
Init Commit
AyushExel Feb 2, 2021
aa5231e
new wandb integration
AyushExel Feb 3, 2021
0fdf3d3
Update
AyushExel Feb 3, 2021
37a2ed6
Use data_dict in test
AyushExel Feb 3, 2021
ac7d4b1
Updates
AyushExel Feb 3, 2021
ebc1d18
Update: scope of log_img
AyushExel Feb 3, 2021
745a272
Update: scope of log_img
AyushExel Feb 3, 2021
7679454
Update
AyushExel Feb 3, 2021
b8210a7
Update: Fix logging conditions
AyushExel Feb 3, 2021
ac9a613
Add tqdm bar, support for .txt dataset format
AyushExel Feb 9, 2021
4f7c150
Improve Result table Logger
AyushExel Feb 21, 2021
b8bbfce
Merge branch 'wandb_clean' of https://github.com/AyushExel/yolov5 int…
AyushExel Feb 23, 2021
c1e6697
Add dataset creation in training script
AyushExel Feb 23, 2021
1948562
Change scope: self.wandb_run
AyushExel Feb 23, 2021
8848f3c
Add wandb-artifact:https:// natively
AyushExel Feb 25, 2021
deca116
Add suuport for logging dataset while training
AyushExel Feb 26, 2021
20185f2
Cleanup
AyushExel Feb 26, 2021
5287a79
Merge branch 'master' into wandb_clean
AyushExel Feb 26, 2021
e13994d
Fix: Merge conflict
AyushExel Feb 26, 2021
1080952
Fix: CI tests
AyushExel Feb 26, 2021
5a859d4
Automatically use wandb config
AyushExel Feb 27, 2021
519cb7d
Fix: Resume
AyushExel Feb 28, 2021
3242f52
Fix: CI
AyushExel Feb 28, 2021
8128216
Enhance: Using val_table
AyushExel Feb 28, 2021
043befa
More resume enhancement
AyushExel Feb 28, 2021
c2d98f0
FIX : CI
AyushExel Feb 28, 2021
dbb69f4
Add alias
AyushExel Feb 28, 2021
8505a58
Get useful opt config data
AyushExel Mar 1, 2021
04f8880
train.py cleanup
AyushExel Mar 2, 2021
27a33dd
Merge remote-tracking branch 'upstream/master' into wandb_clean
AyushExel Mar 2, 2021
54dee24
Cleanup train.py
AyushExel Mar 2, 2021
21a15a5
more cleanup
AyushExel Mar 2, 2021
d38c620
Cleanup| CI fix
AyushExel Mar 2, 2021
e5400ba
Reformat using PEP8
AyushExel Mar 3, 2021
45e2c55
FIX:CI
AyushExel Mar 3, 2021
75f31d0
Merge remote-tracking branch 'upstream/master' into wandb_clean
AyushExel Mar 6, 2021
613b102
rebase
AyushExel Mar 6, 2021
9772645
remove uneccesary changes
AyushExel Mar 6, 2021
cd1237e
remove uneccesary changes
AyushExel Mar 6, 2021
d172ba1
remove uneccesary changes
AyushExel Mar 6, 2021
7af0186
remove unecessary chage from test.py
AyushExel Mar 6, 2021
51dca6d
FIX: resume from local checkpoint
AyushExel Mar 8, 2021
1438483
FIX:resume
AyushExel Mar 8, 2021
e7d18c6
FIX:resume
AyushExel Mar 8, 2021
22d97a7
Reformat
AyushExel Mar 8, 2021
8e97cdf
Performance improvement
AyushExel Mar 9, 2021
2ffb643
Fix local resume
AyushExel Mar 9, 2021
7836d17
Fix local resume
AyushExel Mar 9, 2021
aa785ec
FIX:CI
AyushExel Mar 9, 2021
f97446e
Fix: CI
AyushExel Mar 9, 2021
807a0e1
Imporve image logging
AyushExel Mar 9, 2021
20b4450
(:(:Redo CI tests:):)
AyushExel Mar 9, 2021
db81c64
Remember epochs when resuming
AyushExel Mar 9, 2021
25ff6b8
Remember epochs when resuming
AyushExel Mar 9, 2021
819ebec
Update DDP location
glenn-jocher Mar 10, 2021
b23a902
merge master
glenn-jocher Mar 14, 2021
f742857
PEP8 reformat
glenn-jocher Mar 14, 2021
350b8ab
0.25 confidence threshold
glenn-jocher Mar 14, 2021
395379e
reset train.py plots syntax to previous
glenn-jocher Mar 14, 2021
a06b25c
reset epochs completed syntax to previous
glenn-jocher Mar 14, 2021
cc49f6a
reset space to previous
glenn-jocher Mar 14, 2021
2d56697
remove brackets
glenn-jocher Mar 14, 2021
ba859a6
reset comment to previous
glenn-jocher Mar 14, 2021
52e3e71
Update: is_coco check, remove unused code
AyushExel Mar 14, 2021
ad1ad8f
Remove redundant print statement
AyushExel Mar 14, 2021
72dd23b
Remove wandb imports
AyushExel Mar 14, 2021
ac955ab
remove dsviz logger from test.py
AyushExel Mar 14, 2021
8bded54
Remove redundant change from test.py
AyushExel Mar 14, 2021
1aca390
remove redundant changes from train.py
AyushExel Mar 14, 2021
4c1c9bf
reformat and improvements
AyushExel Mar 20, 2021
f4923b4
Fix typo
AyushExel Mar 21, 2021
af23506
Merge branch 'master' of https://github.com/ultralytics/yolov5 into w…
AyushExel Mar 21, 2021
ca06d31
Add tqdm tqdm progress when scanning files, naming improvements
AyushExel Mar 21, 2021
Add dataset creation in training script
AyushExel committed Feb 23, 2021
commit c1e6697135e46acecb8891e0b6058374333ea935
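This commit wires dataset upload into train.py itself. Because a W&B artifact must finish uploading before training can consume it, WandbLogger uses a two-run workaround: a short "Dataset Creation" run logs the dataset artifact and is finished, then the regular training run starts and pulls the artifact back down. A minimal sketch of the pattern, assuming a hypothetical 'YOLOv5' project and a local data/images folder:

import wandb

# Run 1: upload the dataset as an artifact, then finish the run so the
# artifact is committed before training begins (mirrors WandbLogger.__init__).
wandb.init(project='YOLOv5', job_type='Dataset Creation')
artifact = wandb.Artifact('val', type='dataset')
artifact.add_dir('data/images')  # hypothetical dataset directory
wandb.log_artifact(artifact)
wandb.finish()

# Run 2: the training run consumes the committed artifact.
run = wandb.init(project='YOLOv5', job_type='Training')
datadir = run.use_artifact('val:latest').download()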
test.py: 12 changes (6 additions, 6 deletions)

@@ -74,13 +74,13 @@ def test(data,
     names = {k: v for k, v in enumerate(model.names if hasattr(model, 'names') else model.module.names)}
 
     # Logging
-    testset_table, result_table, log_imgs = None, None, 0
+    val_table, result_table, log_imgs = None, None, 0
     if wandb_logger and wandb_logger.wandb:
         import wandb
-        log_imgs = min(wandb.config.log_imgs, 100)
+        log_imgs = min(wandb_logger.log_imgs, 100)
         class_set = wandb.Classes([{'id': id, 'name': name} for id, name in names.items()])
-        if wandb_logger.test_artifact_path:
-            testset_table = wandb_logger.testset_artifact.get("val")
+        if wandb_logger.val_artifact:
+            val_table = wandb_logger.val_artifact.get("val")
             result_table = wandb_logger.result_table
 
     # Dataloader
@@ -159,7 +159,7 @@ def test(data,
                     boxes = {"predictions": {"box_data": box_data, "class_labels": names}}  # inference-space
                     wandb_images.append(wandb.Image(img[si], boxes=boxes, caption=path.name))
             # W&B logging - DSVIZ
-            if testset_table and result_table:
+            if val_table and result_table:
                 box_data = []
                 total_conf = 0
                 for *xyxy, conf, cls in predn.tolist():
@@ -174,7 +174,7 @@ def test(data,
                 id = batch_i * batch_size + si
                 result_table.add_data(wandb_logger.current_epoch,
                                       id,
-                                      wandb.Image(testset_table.data[id][1], boxes=boxes, classes=class_set, caption=path.name),
+                                      wandb.Image(val_table.data[id][1], boxes=boxes, classes=class_set, caption=path.name),
                                       total_conf / max(1, len(box_data))
                                       )
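The val_table/result_table pair above drives W&B's dataset-visualization (DSVIZ) panel: the validation artifact supplies the ground-truth table, and test.py appends one row per image with the predicted boxes and the average confidence. A self-contained sketch of that row-building pattern — the class names, box coordinates, and image path below are invented for illustration:

import wandb

names = {0: 'person', 1: 'car'}  # class index -> name, as built in test.py
class_set = wandb.Classes([{'id': c, 'name': n} for c, n in names.items()])
result_table = wandb.Table(["epoch", "id", "prediction", "avg_confidence"])

# One predicted box in W&B's box_data format (relative coordinates, placeholder values).
box_data = [{"position": {"minX": 0.1, "minY": 0.2, "maxX": 0.4, "maxY": 0.6},
             "class_id": 0,
             "box_caption": "person 0.87",
             "scores": {"confidence": 0.87}}]
boxes = {"predictions": {"box_data": box_data, "class_labels": names}}

# epoch 0, image id 0, boxed image, average confidence
result_table.add_data(0, 0, wandb.Image("image.jpg", boxes=boxes, classes=class_set), 0.87)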
train.py: 6 changes (2 additions, 4 deletions)

@@ -72,7 +72,7 @@ def train(hyp, opt, device, tb_writer=None):
         wandb_logger = WandbLogger(opt, Path(opt.save_dir).stem, run_id, data_dict)
         if wandb_logger.wandb:
             import wandb
-            weights = str(wandb_logger.weights) if opt.resume_from_artifact else weights
+            weights = opt.weights  # WandbLogger might update weights path
             loggers = {'wandb': wandb_logger.wandb}  # loggers dict
 
     # Model
@@ -155,8 +155,6 @@ def train(hyp, opt, device, tb_writer=None):
         start_epoch = ckpt['epoch'] + 1
         if opt.resume:
             assert start_epoch > 0, '%s training to %g epochs is finished, nothing to resume.' % (weights, epochs)
-        if opt.resume_from_artifact:
-            assert start_epoch < epochs, '%s training to %g epochs is finished, nothing to resume.' % (weights, epochs)
         if epochs < start_epoch:
             logger.info('%s has been trained for %g epochs. Fine-tuning for %g additional epochs.' %
                         (weights, ckpt['epoch'], epochs))
@@ -470,7 +468,7 @@ def train(hyp, opt, device, tb_writer=None):
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--linear-lr', action='store_true', help='linear LR')
-   parser.add_argument('--log-imgs', type=int, default=16, help='number of images for W&B logging, max 100')
+   parser.add_argument('--upload_dataset', action='store_true', help='Upload your dataset as interactive W&B artifact table')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image logging interval for W&B')
    parser.add_argument('--save_period', type=int, default=-1, help='Save model artifact after every "save_period" epoch')
    parser.add_argument('--artifact_alias', type=str, default="latest", help='version of dataset artifact to be used')
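Note the retired --log-imgs flag: the image count is now owned by WandbLogger (log_imgs defaults to 16 and is capped at 100 in test.py). The new --bbox_interval flag defaults to -1, which WandbLogger.__init__ resolves to roughly a tenth of the run. A small sketch of that resolution rule, mirroring the expression in wandb_utils.py:

# Mirrors `opt.bbox_interval = (opt.epochs // 10) if opt.epochs > 10 else opt.epochs`.
def resolve_bbox_interval(bbox_interval: int, epochs: int) -> int:
    if bbox_interval == -1:  # unset: derive the interval from the run length
        return (epochs // 10) if epochs > 10 else epochs
    return bbox_interval

assert resolve_bbox_interval(-1, 300) == 30  # box overlays logged every 30th epoch
assert resolve_bbox_interval(-1, 5) == 5     # short runs use the full run length
assert resolve_bbox_interval(2, 300) == 2    # explicit values pass through unchanged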
utils/wandb_logging/wandb_utils.py: 102 changes (77 additions, 25 deletions)

@@ -3,13 +3,16 @@
 import sys
 from datetime import datetime
 from pathlib import Path
+import yaml
 
+from tqdm import tqdm
 import torch
 
 sys.path.append(str(Path(__file__).parent.parent.parent))  # add utils/ to path
 print(str(Path(__file__).parent.parent.parent))
 from utils.general import colorstr, xywh2xyxy
 from utils.datasets import img2label_paths
+from utils.datasets import LoadImagesAndLabels
 
 try:
     import wandb
@@ -27,38 +30,62 @@ def remove_prefix(from_string, prefix):
 class WandbLogger():
     def __init__(self, opt, name, run_id, data_dict, job_type='Training'):
         self.wandb = wandb
-        self.wandb_run = wandb.init(config=opt, resume="allow",
-                                    project='YOLOv5' if opt.project == 'runs/train' else Path(opt.project).stem,
-                                    name=name,
-                                    job_type=job_type,
-                                    id=run_id) if self.wandb else None
+        # Incorporating dataset creation into the training script requires workarounds, such as using 2 runs:
+        # we need to wait for the dataset artifact to finish uploading before training starts.
+        if opt.upload_dataset:
+            assert wandb, 'Install wandb to upload dataset'
+            wandb.init(config=data_dict,
+                       project='YOLOv5' if opt.project == 'runs/train' else Path(opt.project).stem,
+                       name=name,
+                       job_type="Dataset Creation")
+            path = self.create_dataset_artifact(opt.data,
+                                                opt.single_cls,
+                                                'YOLOv5' if opt.project == 'runs/train' else Path(opt.project).stem)
+            wandb.finish()  # finish the dataset creation run
+            print("Using ", path, " to train")
+            with open(path) as f:
+                wandb_data_dict = yaml.load(f, Loader=yaml.SafeLoader)
+            data_dict = wandb_data_dict
+            print("Data dict =>", data_dict)
+        if self.wandb:
+            self.wandb_run = wandb.init(config=opt, resume="allow",
+                                        project='YOLOv5' if opt.project == 'runs/train' else Path(opt.project).stem,
+                                        name=name,
+                                        job_type=job_type,
+                                        id=run_id) if not wandb.run else wandb.run
+            self.wandb_run.config.data_dict = data_dict
+            if job_type == 'Training':
+                self.setup_training(opt, data_dict)
+                self.current_epoch = 0
+                self.log_imgs = 16
+                if opt.bbox_interval == -1:
+                    opt.bbox_interval = (opt.epochs // 10) if opt.epochs > 10 else opt.epochs
 
     def setup_training(self, opt, data_dict):
         self.log_dict = {}
-        self.train_artifact_path, self.trainset_artifact = \
-            self.download_dataset_artifact(data_dict['train'], opt.artifact_alias)
-        self.test_artifact_path, self.testset_artifact = \
-            self.download_dataset_artifact(data_dict['val'], opt.artifact_alias)
+        if opt.resume:
+            modeldir, _ = self.download_model_artifact(opt.resume_from_artifact)
+            if modeldir:
+                self.weights = Path(modeldir) / "best.pt"
+                opt.weights = self.weights
+            data_dict = self.wandb_run.config.data_dict  # advantage: eliminates the need for a config file to resume
+
+        self.train_artifact_path, self.train_artifact = \
+            self.download_dataset_artifact(data_dict.get('train'), opt.artifact_alias)
+        self.val_artifact_path, self.val_artifact = \
+            self.download_dataset_artifact(data_dict.get('val'), opt.artifact_alias)
+
+        self.result_artifact, self.result_table, self.weights = None, None, None
         if self.train_artifact_path is not None:
             train_path = Path(self.train_artifact_path) / 'data/images/'
             data_dict['train'] = str(train_path)
-        if self.test_artifact_path is not None:
-            test_path = Path(self.test_artifact_path) / 'data/images/'
-            data_dict['val'] = str(test_path)
+        if self.val_artifact_path is not None:
+            val_path = Path(self.val_artifact_path) / 'data/images/'
+            data_dict['val'] = str(val_path)
             self.result_artifact = wandb.Artifact("run_" + wandb.run.id + "_progress", "evaluation")
             self.result_table = wandb.Table(["epoch", "id", "prediction", "avg_confidence"])
-        if opt.resume_from_artifact:
-            modeldir, _ = self.download_model_artifact(opt.resume_from_artifact)
-            if modeldir:
-                self.weights = Path(modeldir) / "best.pt"
-                opt.weights = self.weights
 
     def download_dataset_artifact(self, path, alias):
         if path.startswith(WANDB_ARTIFACT_PREFIX):
@@ -71,25 +98,49 @@ def download_dataset_artifact(self, path, alias):
         return None, None
 
     def download_model_artifact(self, name):
-        model_artifact = wandb.use_artifact(name + ":latest")
-        assert model_artifact is not None, 'Error: W&B model artifact doesn\'t exist'
-        modeldir = model_artifact.download()
-        return modeldir, model_artifact
+        if name.startswith(WANDB_ARTIFACT_PREFIX):
+            model_artifact = wandb.use_artifact(remove_prefix(name, WANDB_ARTIFACT_PREFIX) + ":latest")
+            assert model_artifact is not None, 'Error: W&B model artifact doesn\'t exist'
+            modeldir = model_artifact.download()
+            epochs_trained = model_artifact.metadata.get('epochs_trained')
+            total_epochs = model_artifact.metadata.get('total_epochs')
+            assert epochs_trained < total_epochs, 'training to %g epochs is finished, nothing to resume.' % total_epochs
+            return modeldir, model_artifact
+        return None, None

     def log_model(self, path, opt, epoch):
         datetime_suffix = datetime.today().strftime('%Y-%m-%d-%H-%M-%S')
         model_artifact = wandb.Artifact('run_' + wandb.run.id + '_model', type='model', metadata={
             'original_url': str(path),
-            'epoch': epoch + 1,
+            'epochs_trained': epoch + 1,
             'save period': opt.save_period,
             'project': opt.project,
-            'datetime': datetime_suffix
+            'datetime': datetime_suffix,
+            'total_epochs': opt.epochs
         })
         model_artifact.add_file(str(path / 'last.pt'), name='last.pt')
         model_artifact.add_file(str(path / 'best.pt'), name='best.pt')
         wandb.log_artifact(model_artifact)
+        print("Saving model artifact on epoch ", epoch + 1)
+
+    def create_dataset_artifact(self, data_file, single_cls, project, overwrite_config=False):
+        with open(data_file) as f:
+            data = yaml.load(f, Loader=yaml.SafeLoader)  # data dict
+        nc, names = (1, ['item']) if single_cls else (int(data['nc']), data['names'])
+        names = {k: v for k, v in enumerate(names)}  # to index dictionary
+        self.train_artifact = self.log_dataset_artifact(LoadImagesAndLabels(data['train']), names, name='train') if data.get('train') else None
+        self.val_artifact = self.log_dataset_artifact(LoadImagesAndLabels(data['val']), names, name='val') if data.get('val') else None
+        if data.get('train'):
+            data['train'] = WANDB_ARTIFACT_PREFIX + str(Path(project) / 'train')
+        if data.get('val'):
+            data['val'] = WANDB_ARTIFACT_PREFIX + str(Path(project) / 'val')
+        path = data_file if overwrite_config else data_file.replace('.', '_wandb.')  # updated data.yaml path
+        data.pop('download', None)  # download via artifact instead of the predefined 'download:' field
+        with open(path, 'w') as f:
+            yaml.dump(data, f)
+        print("New Config file => ", path)
+        return path
 
     def log_dataset_artifact(self, dataset, class_to_id, name='dataset'):
         artifact = wandb.Artifact(name=name, type="dataset")
         for img_file in [dataset.path] if Path(dataset.path).is_dir() else dataset.img_files:
@@ -123,6 +174,7 @@ def log_dataset_artifact(self, dataset, class_to_id, name='dataset'):
         shutil.make_archive(zip_path.with_suffix(''), 'zip', labels_path)
         artifact.add_file(str(zip_path), name='data/labels.zip')
         wandb.log_artifact(artifact)
+        return artifact
 
     def log(self, log_dict):
         if self.wandb_run:
@@ -134,7 +186,7 @@ def end_epoch(self, best_result=False):
             wandb.log(self.log_dict)
             self.log_dict = {}
             if self.result_artifact:
-                train_results = wandb.JoinedTable(self.testset_artifact.get("val"), self.result_table, "id")
+                train_results = wandb.JoinedTable(self.val_artifact.get("val"), self.result_table, "id")
                 self.result_artifact.add(train_results, 'result')
                 wandb.log_artifact(self.result_artifact, aliases=['best'] if best_result else None)
                 self.result_table = wandb.Table(["epoch", "id", "prediction", "avg_confidence"])
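The epochs_trained/total_epochs metadata written by log_model is what lets download_model_artifact refuse to resume a finished run. A rough illustration of the round trip — the run id and epoch numbers are invented:

import wandb

# Save side (as in log_model): record training progress in artifact metadata.
model_artifact = wandb.Artifact('run_abc123_model', type='model', metadata={
    'epochs_trained': 42,  # epoch + 1 at save time
    'total_epochs': 300,   # opt.epochs
})

# Resume side (as in download_model_artifact): read it back and gate the resume.
epochs_trained = model_artifact.metadata.get('epochs_trained')
total_epochs = model_artifact.metadata.get('total_epochs')
assert epochs_trained < total_epochs, 'training to %g epochs is finished, nothing to resume.' % total_epochs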
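Finally, the wandb-artifact:https:// scheme that create_dataset_artifact writes into the generated *_wandb.yaml is resolved back to local folders at train time by download_dataset_artifact. A condensed sketch, assuming WANDB_ARTIFACT_PREFIX is 'wandb-artifact:https://' (its definition sits outside the hunks shown here) and a hypothetical 'YOLOv5' project; note wandb.use_artifact requires an active run, as in setup_training:

import wandb

WANDB_ARTIFACT_PREFIX = 'wandb-artifact:https://'  # assumed value; defined earlier in wandb_utils.py

def remove_prefix(from_string, prefix):
    return from_string[len(prefix):]

# create_dataset_artifact() rewrites the data.yaml entries to artifact references:
data = {'train': WANDB_ARTIFACT_PREFIX + 'YOLOv5/train',
        'val': WANDB_ARTIFACT_PREFIX + 'YOLOv5/val'}

# download_dataset_artifact() then resolves such a reference to a local folder:
def download_dataset_artifact(path, alias='latest'):
    if path.startswith(WANDB_ARTIFACT_PREFIX):
        dataset_artifact = wandb.use_artifact(remove_prefix(path, WANDB_ARTIFACT_PREFIX) + ':' + alias)
        return dataset_artifact.download(), dataset_artifact
    return None, None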