YOLOv4¶
Abstract¶
There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of 65 FPS on Tesla V100.
Results¶
performance tested on Ascend 910(8p) with graph mode
Name | Scale | BatchSize | ImageSize | Dataset | Box mAP (%) | Params | Recipe | Download |
---|---|---|---|---|---|---|---|---|
YOLOv4 | CSPDarknet53 | 16 * 8 | 608 | MS COCO 2017 | 45.4 | 27.6M | yaml | weights |
YOLOv4 | CSPDarknet53(silu) | 16 * 8 | 608 | MS COCO 2017 | 45.8 | 27.6M | yaml | weights |
performance tested on Ascend 910*(8p)
Name | Scale | BatchSize | ImageSize | Dataset | Box mAP (%) | ms/step | Params | Recipe | Download |
---|---|---|---|---|---|---|---|---|---|
YOLOv4 | CSPDarknet53 | 16 * 8 | 608 | MS COCO 2017 | 46.1 | 337.25 | 27.6M | yaml | weights |
Notes¶
- Box mAP: Accuracy reported on the validation set.
Quick Start¶
Please refer to the QUICK START in MindYOLO for details.
Training¶
- Pretraining Model¶
You can get the pre-training model trained on ImageNet2012 from here.
To convert it to a loadable ckpt file for mindyolo, please put it in the root directory then run it
python mindyolo/utils/convert_weight_cspdarknet53.py
- Distributed Training¶
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
# distributed training on multiple GPU/Ascend devices
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov4_log python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320
Similarly, you can train the model on multiple GPU devices with the above msrun command. Note: For more information about msrun configuration, please refer to here.
For detailed illustration of all hyper-parameters, please refer to config.py.
Notes¶
- As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
- If the following warning occurs, setting the environment variable PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' will fix it.
multiprocessing/semaphore_tracker.py: 144 UserWarning: semaphore_tracker: There appear to be 235 leaked semaphores to clean up at shutdown len(cache))
- Standalone Training¶
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
# standalone training on a CPU/GPU/Ascend device
python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --epochs 320
Validation and Test¶
To validate the accuracy of the trained model, you can use test.py
and parse the checkpoint path with --weight
.
python test.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --iou_thres 0.6 --weight /PATH/TO/WEIGHT.ckpt
Deployment¶
See here.
References¶
[1] Alexey Bochkovskiy, Chien-Yao Wang and Ali Farhadi. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934, 2020.