
YOLOv4

YOLOv4: Optimal Speed and Accuracy of Object Detection

Abstract

There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a real-time speed of 65 FPS on Tesla V100.

Results

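Performance tested on Ascend 910(8p) with graph mode:

| Name   | Scale              | BatchSize | ImageSize | Dataset      | Box mAP (%) | Params | Recipe | Download |
|--------|--------------------|-----------|-----------|--------------|-------------|--------|--------|----------|
| YOLOv4 | CSPDarknet53       | 16 * 8    | 608       | MS COCO 2017 | 45.4        | 27.6M  | yaml   | weights  |
| YOLOv4 | CSPDarknet53(silu) | 16 * 8    | 608       | MS COCO 2017 | 45.8        | 27.6M  | yaml   | weights  |

Performance tested on Ascend 910*(8p):

| Name   | Scale        | BatchSize | ImageSize | Dataset      | Box mAP (%) | ms/step | Params | Recipe | Download |
|--------|--------------|-----------|-----------|--------------|-------------|---------|--------|--------|----------|
| YOLOv4 | CSPDarknet53 | 16 * 8    | 608       | MS COCO 2017 | 46.1        | 337.25  | 27.6M  | yaml   | weights  |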


Notes

  • Box mAP: Accuracy reported on the validation set.
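
The abstract names Mish activation, and one row of the results is labeled CSPDarknet53(silu). As a rough illustration of how the two activations differ, here is a minimal scalar sketch using their standard definitions, Mish(x) = x * tanh(softplus(x)) and SiLU(x) = x * sigmoid(x). The function names are illustrative and are not taken from MindYOLO; reading the "(silu)" label as a backbone that uses SiLU in place of Mish is an assumption.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    # Branch on the sign so exp() never receives a large positive argument.
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return x * ex / (1.0 + ex)


if __name__ == "__main__":
    # Print both activations side by side for a few sample inputs.
    for x in (-2.0, 0.0, 2.0):
        print(f"x={x:+.1f}  mish={mish(x):+.4f}  silu={silu(x):+.4f}")
```

Both functions are smooth, pass through zero at x = 0, and approach the identity for large positive inputs.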

Quick Start

Please refer to the QUICK START in MindYOLO for details.

Training

- Pretraining Model

You can get the pretrained model, trained on ImageNet2012, from here.

To convert it to a ckpt file that mindyolo can load, put it in the root directory and then run:

python mindyolo/utils/convert_weight_cspdarknet53.py

- Distributed Training

The reported results can be reproduced with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, run:

# distributed training on multiple GPU/Ascend devices
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov4_log python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --is_parallel True --epochs 320

Similarly, you can train the model on multiple GPU devices with the above msrun command. Note: for more information about msrun configuration, see here.

For detailed illustration of all hyper-parameters, please refer to config.py.

Notes

  • As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for reproduction or adjust the learning rate linearly to a new global batch size.
  • If the following warning occurs, setting the environment variable PYTHONWARNINGS='ignore:semaphore_tracker:UserWarning' will fix it.
    multiprocessing/semaphore_tracker.py: 144 UserWarning: semaphore_tracker: There appear to be 235 leaked semaphores to clean up at shutdown len(cache))
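
The linear scaling rule from the first note can be sketched as follows. This is a minimal illustration, not MindYOLO code: the function name `scale_learning_rate` and the base learning rate of 0.01 are hypothetical placeholders, while the reference global batch size of 128 comes from the 16 * 8 entry in the results. Take the actual base learning rate from the recipe yaml.

```python
def scale_learning_rate(base_lr: float, base_global_batch: int,
                        per_device_batch: int, num_devices: int) -> float:
    """Scale the learning rate linearly with the global batch size.

    global batch size = per_device_batch * num_devices
    """
    if base_global_batch <= 0 or per_device_batch <= 0 or num_devices <= 0:
        raise ValueError("batch sizes and device count must be positive")
    new_global_batch = per_device_batch * num_devices
    return base_lr * new_global_batch / base_global_batch


if __name__ == "__main__":
    # Hypothetical base learning rate; the reference setup uses 16 * 8 = 128.
    base_lr = 0.01
    # Training on 4 devices with 16 images each halves the global batch size,
    # so the linearly scaled learning rate is halved as well.
    print(scale_learning_rate(base_lr, 16 * 8, per_device_batch=16, num_devices=4))
```

Keeping the global batch size at 128 (for example, 32 images on each of 4 devices) leaves the learning rate unchanged.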
    

- Standalone Training

If you want to train or finetune the model on a smaller dataset without distributed training, please run:

# standalone training on a CPU/GPU/Ascend device
python train.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --epochs 320

Validation and Test

To validate the accuracy of the trained model, you can use test.py and pass the checkpoint path with --weight.

python test.py --config ./configs/yolov4/yolov4-silu.yaml --device_target Ascend --iou_thres 0.6 --weight /PATH/TO/WEIGHT.ckpt
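
For context on `--iou_thres`: IoU (intersection over union) measures how much two boxes overlap, and YOLO-style evaluation scripts commonly compare it against such a threshold during non-maximum suppression. Whether test.py applies the value in exactly this way is an assumption here. The helper `box_iou` below is a hypothetical sketch and is not part of MindYOLO.

```python
def box_iou(box_a, box_b) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    iou_thres = 0.6  # same value as in the command above
    iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
    # In NMS, a lower-scoring box is typically dropped only when its IoU
    # with a higher-scoring box exceeds the threshold.
    print(f"iou={iou:.3f}, suppressed={iou > iou_thres}")
```

A higher threshold keeps more overlapping boxes, while a lower one suppresses them more aggressively.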

Deployment

See here.

References

[1] Alexey Bochkovskiy, Chien-Yao Wang and Hong-Yuan Mark Liao. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv preprint arXiv:2004.10934, 2020.