Image Classification Prediction
This tutorial shows how to use a pretrained model from MindCV to classify a test image.
Model Loading
View All Available Models
Calling the registry.list_models function in mindcv.models lists the names of all available network models. Variants of the same network with different parameter configurations are listed separately, such as resnet18 / resnet34 / resnet50 / resnet101 / resnet152.
import sys
sys.path.append("..")
from mindcv.models import registry
registry.list_models()
['BiT_resnet50',
'repmlp_b224',
'repmlp_b256',
'repmlp_d256',
'repmlp_l256',
'repmlp_t224',
'repmlp_t256',
'convit_base',
'convit_base_plus',
'convit_small',
...
'visformer_small',
'visformer_small_v2',
'visformer_tiny',
'visformer_tiny_v2',
'vit_b_16_224',
'vit_b_16_384',
'vit_b_32_224',
'vit_b_32_384',
'vit_l_16_224',
'vit_l_16_384',
'vit_l_32_224',
'xception']
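Because the full listing is long, it can help to narrow it by pattern. The following is a minimal sketch that applies shell-style wildcards with Python's standard fnmatch module to a few names copied from the output above. The helper filter_names is hypothetical and not part of MindCV.

```python
from fnmatch import fnmatch

# A few names copied from the listing above (not the full registry).
names = ['BiT_resnet50', 'repmlp_b224', 'convit_base', 'convit_small',
         'vit_b_16_224', 'vit_b_32_224', 'vit_l_16_384', 'xception']


def filter_names(all_names, pattern):
    """Return the names that match a shell-style wildcard pattern, sorted."""
    return sorted(n for n in all_names if fnmatch(n, pattern))


vit_224 = filter_names(names, 'vit_*_224')
print(vit_224)
```

In practice, names would be the list returned by registry.list_models() rather than a hand-copied subset.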
Load a Pretrained Model
Taking the resnet50 model as an example, we introduce two ways to load a model checkpoint with the create_model function in mindcv.models.
1). When the pretrained parameter is set to True, the network weights are downloaded automatically.
from mindcv.models import create_model
model = create_model(model_name='resnet50', num_classes=1000, pretrained=True)
# Switch the execution logic of the network to the inference scenario
model.set_train(False)
102453248B [00:16, 6092186.31B/s]
ResNet<
(conv1): Conv2d<input_channels=3, output_channels=64, kernel_size=(7, 7), stride=(2, 2), pad_mode=pad, padding=3, dilation=(1, 1), group=1, has_bias=False, weight_init=normal, bias_init=zeros, format=NCHW>
(bn1): BatchNorm2d<num_features=64, eps=1e-05, momentum=0.9, gamma=Parameter (name=bn1.gamma, shape=(64,), dtype=Float32, requires_grad=True), beta=Parameter (name=bn1.beta, shape=(64,), dtype=Float32, requires_grad=True), moving_mean=Parameter (name=bn1.moving_mean, shape=(64,), dtype=Float32, requires_grad=False), moving_variance=Parameter (name=bn1.moving_variance, shape=(64,), dtype=Float32, requires_grad=False)>
(relu): ReLU<>
(max_pool): MaxPool2d<kernel_size=3, stride=2, pad_mode=SAME>
...
(pool): GlobalAvgPooling<>
(classifier): Dense<input_channels=2048, output_channels=1000, has_bias=True>
>
2). When the checkpoint_path parameter is set to a file path, the model parameters are loaded from that .ckpt file.
from mindcv.models import create_model
model = create_model(model_name='resnet50', num_classes=1000, checkpoint_path='./resnet50_224.ckpt')
# Switch the execution logic of the network to the inference scenario
model.set_train(False)
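When loading from a local file, a mistyped path is an easy error to make. The sketch below validates the path before it is handed to create_model. The helper check_ckpt_path is hypothetical and not part of MindCV, and it only checks the file extension and existence, not the file contents.

```python
from pathlib import Path


def check_ckpt_path(path):
    """Return the path as a string if it names an existing .ckpt file.

    Raises ValueError for a wrong extension and FileNotFoundError
    when the file does not exist.
    """
    p = Path(path)
    if p.suffix != '.ckpt':
        raise ValueError(f'expected a .ckpt file, got: {p.name}')
    if not p.is_file():
        raise FileNotFoundError(f'checkpoint not found: {p}')
    return str(p)
```

The returned string could then be passed as checkpoint_path, for example checkpoint_path=check_ckpt_path('./resnet50_224.ckpt').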
Data Preparation
Create Dataset
Here, we download an image from Wikipedia as the test image and use the create_dataset function in mindcv.data to construct a custom dataset that contains this single image.
from mindcv.data import create_dataset
num_workers = 1
# path of dataset
data_dir = "./data/"
dataset = create_dataset(root=data_dir, split='test', num_parallel_workers=num_workers)
# Image visualization
from PIL import Image
Image.open("./data/test/dog/dog.jpg")
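The image path used above, ./data/test/dog/dog.jpg, suggests a root/split/class/image folder layout. As a sketch under that assumption, the hypothetical helper place_test_image below (not part of MindCV) copies a downloaded image into such a layout using only the standard library.

```python
import shutil
from pathlib import Path


def place_test_image(image_path, root='./data', split='test', label='dog'):
    """Copy one image into <root>/<split>/<label>/ and return the new path."""
    target_dir = Path(root) / split / label
    target_dir.mkdir(parents=True, exist_ok=True)  # create missing folders
    target = target_dir / Path(image_path).name
    shutil.copy(image_path, target)
    return target
```

Calling place_test_image('dog.jpg') with the default arguments would produce the ./data/test/dog/dog.jpg path opened above.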
Data Preprocessing
Call the create_transforms function to obtain the data processing strategy (transform list) that the pretrained model uses for the ImageNet dataset.
Pass the resulting transform list to the create_loader function, specify batch_size=1 and the other parameters, and the test data is ready. The function returns a Dataset object that serves as the input of the model.
from mindcv.data import create_transforms, create_loader
transforms_list = create_transforms(dataset_name='imagenet', is_training=False)
data_loader = create_loader(
    dataset=dataset,
    batch_size=1,
    is_training=False,
    num_classes=1000,
    transform=transforms_list,
    num_parallel_workers=num_workers
)
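To give a feel for what an ImageNet evaluation transform typically ends with, the sketch below normalizes an image per channel and converts it from HWC to CHW layout in NumPy. The mean and standard deviation are the commonly used ImageNet statistics; whether create_transforms applies exactly these steps and values is an assumption, and normalize_to_chw is a hypothetical helper, not a MindCV function. Resizing and cropping are omitted.

```python
import numpy as np

# Commonly used ImageNet channel statistics, scaled to the 0-255 pixel range.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32) * 255
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32) * 255


def normalize_to_chw(image_hwc):
    """Normalize an HWC uint8 RGB image per channel; return a CHW float32 array."""
    x = image_hwc.astype(np.float32)
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the channel axis
    return np.transpose(x, (2, 0, 1))       # HWC -> CHW


# Dummy gray image standing in for a decoded, resized photo.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
chw = normalize_to_chw(dummy)
print(chw.shape, chw.dtype)
```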
Model Inference
The image from the custom dataset is passed to the model to obtain the inference result. Here, the Squeeze operator from mindspore.ops removes the batch dimension.
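import ast
import mindspore.ops as P
import numpy as np
# Fetch the single preprocessed image from the loader
images, _ = next(data_loader.create_tuple_iterator())
# Run the model and drop the batch dimension
output = P.Squeeze()(model(images))
pred = np.argmax(output.asnumpy())
with open("imagenet1000_clsidx_to_labels.txt") as f:
    # Assumes the label file holds a Python dict literal; literal_eval parses it without executing code
    idx2label = ast.literal_eval(f.read())
print('predict: {}'.format(idx2label[pred]))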
predict: Labrador retriever
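The argmax above reports only the single best class. As a sketch of how the same output could yield a ranked list with confidence scores, the hypothetical helper top_k below applies a softmax to a vector of logits and returns the k most probable class indices. It uses toy values rather than real model output.

```python
import numpy as np


def top_k(logits, k=5):
    """Return (indices, probabilities) of the k largest softmax scores."""
    logits = np.asarray(logits, dtype=np.float64)
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    probs = exp / exp.sum()
    idx = np.argsort(probs)[::-1][:k]    # indices by descending probability
    return idx, probs[idx]


# Toy logits standing in for output.asnumpy(); not real model output.
toy_logits = np.array([0.1, 2.0, -1.0, 3.5, 0.7])
indices, scores = top_k(toy_logits, k=3)
for i, s in zip(indices, scores):
    print(int(i), round(float(s), 3))
```

With the real model, top_k(output.asnumpy()) would give indices that can be looked up in idx2label in the same way as pred.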