# YOLOv7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
## Abstract
YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% in accuracy, and convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy, as well as YOLOv7 outperforms: YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on MS COCO dataset from scratch without using any other datasets or pre-trained weights.
## Results

Performance tested on Ascend 910 (8p) with graph mode:
Name | Scale | BatchSize | ImageSize | Dataset | Box mAP (%) | Params | Recipe | Download |
---|---|---|---|---|---|---|---|---|
YOLOv7 | Tiny | 16 * 8 | 640 | MS COCO 2017 | 37.5 | 6.2M | yaml | weights |
YOLOv7 | L | 16 * 8 | 640 | MS COCO 2017 | 50.8 | 36.9M | yaml | weights |
YOLOv7 | X | 12 * 8 | 640 | MS COCO 2017 | 52.4 | 71.3M | yaml | weights |
Performance tested on Ascend 910* (8p):
Name | Scale | BatchSize | ImageSize | Dataset | Box mAP (%) | ms/step | Params | Recipe | Download |
---|---|---|---|---|---|---|---|---|---|
YOLOv7 | Tiny | 16 * 8 | 640 | MS COCO 2017 | 37.5 | 496.21 | 6.2M | yaml | weights |
## Notes
- Context: The training context is denoted as {device}x{pieces}-{MS mode}, where the MindSpore mode can be G (graph mode) or F (pynative mode with ms function). For example, D910x8-G denotes training on 8 Ascend 910 NPUs in graph mode.
- Box mAP: Accuracy reported on the validation set.
- We refer to the official YOLOv7 implementation to reproduce the P5-series models, with the following difference: we train on 8 Ascend 910 NPUs, and the single-NPU batch size for tiny/l/x is 16/16/12, which differs from the official code.
## Quick Start
Please refer to the QUICK START in MindYOLO for details.
## Training

### Distributed Training
It is easy to reproduce the reported results with the pre-defined training recipe. For distributed training on multiple Ascend 910 devices, please run
```shell
# distributed training on multiple GPU/Ascend devices
msrun --worker_num=8 --local_worker_num=8 --bind_core=True --log_dir=./yolov7_log python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --is_parallel True
```
Similarly, you can train the model on multiple GPU devices with the above msrun command. Note: for more information about msrun configuration, please see here.
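For orientation, the sketch below shows roughly what a data-parallel setup looks like in MindSpore; it is an illustrative approximation of what a flag like `--is_parallel True` enables, not MindYOLO's actual code path.

```python
# Illustrative sketch only: approximate MindSpore data-parallel initialization,
# roughly what --is_parallel True enables inside a training script
# (not MindYOLO's actual code).
from mindspore import context
from mindspore.communication import init, get_rank, get_group_size

init()  # initialize collective communication (HCCL on Ascend, NCCL on GPU)
context.set_auto_parallel_context(
    parallel_mode=context.ParallelMode.DATA_PARALLEL,  # replicate model, shard data
    gradients_mean=True,                               # average gradients across devices
    device_num=get_group_size(),
)
print(f"worker {get_rank()} of {get_group_size()} ready")
```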
For a detailed illustration of all hyper-parameters, please refer to config.py.
Note: As the global batch size (batch_size x num_devices) is an important hyper-parameter, it is recommended to keep the global batch size unchanged for exact reproduction, or to scale the learning rate linearly with the new global batch size, as sketched below.
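As a concrete illustration of this rule (the base learning rate and batch sizes below are placeholder values, not the recipe's actual settings):

```python
# Linear LR scaling sketch: scale the learning rate by the ratio of the new
# global batch size to the one the recipe was tuned for. Values are examples.
def scale_lr(base_lr: float, base_global_bs: int, per_device_bs: int, num_devices: int) -> float:
    """Return base_lr scaled linearly with the new global batch size."""
    return base_lr * (per_device_bs * num_devices) / base_global_bs

# Recipe tuned for 16 x 8 = 128; training instead on 4 devices with batch 16:
print(scale_lr(0.01, 128, 16, 4))  # 0.005 -> half the global batch, half the LR
```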
### Standalone Training
If you want to train or finetune the model on a smaller dataset without distributed training, please run:
```shell
# standalone training on a CPU/GPU/Ascend device
python train.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend
```
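If you do not already have a smaller dataset, one way to carve a COCO-format subset out of MS COCO for finetuning is sketched below; the paths and the category list are hypothetical examples.

```python
# Sketch: extract a two-class COCO-format subset for finetuning.
# Paths and category names are illustrative, not part of MindYOLO.
import json
from pycocotools.coco import COCO

coco = COCO("annotations/instances_train2017.json")
cat_ids = coco.getCatIds(catNms=["person", "car"])
# Union of images containing any of the kept categories.
img_ids = sorted({i for c in cat_ids for i in coco.getImgIds(catIds=[c])})
subset = {
    "images": coco.loadImgs(img_ids),
    "annotations": coco.loadAnns(coco.getAnnIds(imgIds=img_ids, catIds=cat_ids)),
    "categories": coco.loadCats(cat_ids),
}
with open("annotations/instances_train2017_subset.json", "w") as f:
    json.dump(subset, f)
```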
## Validation and Test
To validate the accuracy of the trained model, you can use test.py and pass the checkpoint path with --weight:
```shell
python test.py --config ./configs/yolov7/yolov7.yaml --device_target Ascend --weight /PATH/TO/WEIGHT.ckpt
```
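test.py reports COCO-style metrics; for reference, the standalone sketch below computes the same kind of Box mAP with `pycocotools` from a COCO-format prediction file (the file paths are hypothetical).

```python
# Sketch: COCO-style Box mAP from a ground-truth file and a prediction dump.
# Both file paths are placeholders for illustration.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

gt = COCO("annotations/instances_val2017.json")  # ground-truth annotations
dt = gt.loadRes("predictions.json")              # detections in COCO result format
evaluator = COCOeval(gt, dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # first printed line is AP@[0.50:0.95], i.e. the Box mAP
```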
## Deployment
See here.
## References
[1] Chien-Yao Wang, Alexey Bochkovskiy, and Hong-Yuan Mark Liao. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696, 2022.