Configuration parameter description

This document takes configs/rec/crnn/crnn_icdar15.yaml as an example to describe the usage of parameters in detail.

1. Environment parameters (system)

| Parameter | Description | Default | Optional Values | Remarks |
| --- | --- | --- | --- | --- |
| mode | MindSpore running mode (static graph / dynamic graph) | 0 | 0 / 1 | 0: run in GRAPH_MODE; 1: run in PYNATIVE_MODE |
| distribute | Whether to enable parallel training | True | True / False | \ |
| device_id | Specify the device id for standalone training | 7 | The ids of all devices in the server | Only valid when distribute=False (standalone training) and the environment variable 'DEVICE_ID' is not set. During standalone training, if neither this parameter nor the environment variable 'DEVICE_ID' is set, device 0 is used by default. |
| amp_level | Mixed precision mode | O0 | O0 / O1 / O2 / O3 | 'O0': no change.<br>'O1': convert the cells and operations in the whitelist to float16 precision, keep the rest in float32.<br>'O2': keep the cells and operations in the blacklist in float32 precision, convert the rest to float16.<br>'O3': convert all networks to float16 precision.<br>Note: model prediction and evaluation do not support 'O3' on the GPU platform; if amp_level is set to 'O3' for prediction or evaluation on GPU, the program automatically switches it to 'O2'. |
| seed | Random seed | 42 | Integer | \ |
| ckpt_save_policy | The policy for saving model weights | top_k | "top_k" / "latest_k" | "top_k" keeps the k checkpoints with the best metric scores; "latest_k" keeps the k most recent checkpoints. The value of k is set via ckpt_max_keep |
| ckpt_max_keep | The maximum number of checkpoints to keep during training | 5 | Integer | \ |
| log_interval | The interval of printing logs (unit: epoch) | 100 | Integer | \ |
| val_while_train | Whether to run evaluation while training | True | True / False | If True, the eval dataset must also be configured |
| val_start_epoch | The epoch from which to start evaluation | 1 | Integer | \ |
| val_interval | Evaluation interval (unit: epoch) | 1 | Integer | \ |
| drop_overflow_update | Whether to skip the network parameter update when the loss/gradient overflows | True | True / False | If True, network parameters are not updated when an overflow occurs |
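
Putting these parameters together, a system section could look like the following minimal sketch (the values are illustrative placeholders, not necessarily those used in crnn_icdar15.yaml):

```yaml
system:
  mode: 0                    # GRAPH_MODE
  distribute: True
  amp_level: O0
  seed: 42
  log_interval: 100
  val_while_train: True
  drop_overflow_update: True
```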

2. Shared parameters (common)

Because the same parameter may need to be reused in different configuration sections, you can customize some common parameters in this section for easy management.
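
One common way to reuse such values is through standard YAML anchors and aliases: a value defined once under common can be referenced elsewhere in the same file. A minimal sketch, with illustrative parameter names and values:

```yaml
common:
  # '&' defines an anchor; '*' references it elsewhere in the same file
  num_classes: &num_classes 37
  batch_size: &batch_size 64

# e.g. later in the file (illustrative):
# model:
#   head:
#     out_channels: *num_classes
```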

3. Model architecture (model)

In MindOCR, the network architecture of the model is divided into four modules: Transform, Backbone, Neck and Head. For details, please refer to the documentation. The following are the configuration instructions and examples for each module.

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| type | Network type | - | Currently supports rec/det; 'rec' means a recognition task, 'det' means a detection task |
| pretrained | Specify the pre-trained weight path or url | null | Supports a local checkpoint path or a url |
| **transform:** | Transformation method configuration | null | \ |
| name | Transformation method name | - | Currently supports STN_ON |
| **backbone:** | Backbone network configuration | \ | \ |
| name | Backbone class or function name | - | Currently defined classes include rec_resnet34, rec_vgg7, SVTRNet, det_resnet18, det_resnet50, det_resnet152 and det_mobilenet_v3. New classes can also be customized; please refer to the documentation. |
| pretrained | Whether to load pre-trained backbone weights | False | Accepts a bool or a str. If True, the default weights are downloaded and loaded via the url defined in the backbone .py file. If a str is passed, it specifies a local checkpoint path or url to load. |
| **neck:** | Network neck configuration | \ | \ |
| name | Neck class name | - | Currently defined classes include RNNEncoder, DBFPN, EASTFPN and PSEFPN. New classes can also be customized; please refer to the documentation. |
| hidden_size | Number of RNN hidden units | - | \ |
| **head:** | Network prediction head configuration | \ | \ |
| name | Head class name | - | Currently supports CTCHead, AttentionHead, DBHead, EASTHead and PSEHead |
| weight_init | Weight initialization | 'normal' | \ |
| bias_init | Bias initialization | 'zeros' | \ |
| out_channels | Number of classes | - | \ |

Note: For different networks, the configurable parameters of the backbone/neck/head modules differ. The specific configurable parameters are determined by the __init__ parameters of the class specified by the module's name parameter in the table above. For example, if the neck module is set to DBFPN, then since the DBFPN class's initialization accepts an adaptive parameter, parameters such as adaptive can be configured under model.neck in the yaml file.
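
As an illustration, a recognition model section in the CRNN style could be sketched as follows (the values are placeholders and may differ from crnn_icdar15.yaml):

```yaml
model:
  type: rec
  transform: null
  backbone:
    name: rec_vgg7
    pretrained: False
  neck:
    name: RNNEncoder
    hidden_size: 256
  head:
    name: CTCHead
    out_channels: 37    # number of classes, e.g. dictionary size + blank
```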

Reference example: DBNet, CRNN

4. Postprocessing (postprocess)

Please see the code in mindocr/postprocess

| Parameter | Description | Example | Remarks |
| --- | --- | --- | --- |
| name | Post-processing class name | - | Currently supports DBPostprocess, EASTPostprocess, PSEPostprocess, RecCTCLabelDecode and RecAttnLabelDecode |
| character_dict_path | Recognition dictionary path | None | If None, the default dictionary [0-9a-z] is used |
| use_space_char | Whether to add the space character to the dictionary | False | True / False |

Note: For different post-processing methods (specified by name), the configurable parameters are different, and are determined by the input parameters of the initialization method __init__ of the post-processing class.
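
For example, a post-processing section for a CTC-based recognition model could be sketched as follows (values are illustrative):

```yaml
postprocess:
  name: RecCTCLabelDecode
  character_dict_path: null   # null: use the default dictionary [0-9a-z]
  use_space_char: False
```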

Reference example: DBNet, PSENet

5. Evaluation metrics (metric)

Please see the code in mindocr/metrics

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| name | Metric class name | - | Currently supports RecMetric and DetMetric |
| main_indicator | Main indicator, used to compare and select the best model | 'hmean' | 'acc' for recognition tasks, 'f-score' for detection tasks |
| character_dict_path | Recognition dictionary path | None | If None, the default dictionary "0123456789abcdefghijklmnopqrstuvwxyz" is used |
| ignore_space | Whether to filter spaces | True | True / False |
| print_flag | Whether to print logs | False | If True, output information such as prediction results and ground truth |
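
A minimal sketch of a metric section for a recognition task (values are illustrative):

```yaml
metric:
  name: RecMetric
  main_indicator: acc
  character_dict_path: null
  ignore_space: True
  print_flag: False
```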

6. Loss function (loss)

Please see the code in mindocr/losses

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| name | Loss function name | - | Currently supports DBLoss, CTCLoss, AttentionLoss, PSEDiceLoss, EASTLoss and CrossEntropySmooth |
| pred_seq_len | Length of the predicted text sequence | 26 | Determined by the network architecture |
| max_label_len | Maximum label length | 25 | Should be less than the length of the text predicted by the network |
| batch_size | Batch size of a single card | 32 | \ |

Note: For different loss functions (specified by name), the configurable parameters are different and determined by the input parameters of the selected loss function.
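
For instance, a loss section for CRNN-style CTC training could be sketched as follows (values are illustrative):

```yaml
loss:
  name: CTCLoss
  pred_seq_len: 26      # determined by the network architecture
  max_label_len: 25     # should be less than pred_seq_len
  batch_size: 32
```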

7. Learning rate adjustment strategy and optimizer (scheduler, optimizer, loss_scaler)

Learning rate adjustment strategy (scheduler)

Please see the code in mindocr/scheduler

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| scheduler | Learning rate scheduler name | 'constant' | Currently supports 'constant', 'cosine_decay', 'step_decay', 'exponential_decay', 'polynomial_decay' and 'multi_step_decay' |
| min_lr | Minimum learning rate | 1e-6 | Lower lr bound for the 'cosine_decay' scheduler |
| lr | Learning rate | 0.01 | \ |
| num_epochs | Total number of training epochs | 200 | The total number of epochs for the entire training |
| warmup_epochs | Number of epochs for the learning rate warmup phase | 3 | For 'cosine_decay', 'warmup_epochs' is the number of epochs over which the learning rate is warmed up from 0 to lr |
| decay_epochs | Number of epochs for the learning rate decay phase | 10 | For the 'cosine_decay' scheduler, decay the LR to min_lr over decay_epochs. For the 'step_decay' scheduler, decay the LR by a factor of decay_rate every decay_epochs epochs |
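
A minimal sketch of a scheduler section using cosine decay with warmup (values are illustrative):

```yaml
scheduler:
  scheduler: cosine_decay
  min_lr: 1.0e-6
  lr: 0.001
  num_epochs: 60
  warmup_epochs: 3
  decay_epochs: 57
```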

Optimizer (optimizer)

Please see the code in mindocr/optim

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| opt | Optimizer name | 'adam' | Currently supports 'sgd', 'nesterov', 'momentum', 'adam', 'adamw', 'lion', 'nadam', 'adan', 'rmsprop', 'adagrad' and 'lamb' |
| filter_bias_and_bn | Whether to exclude bias and batch norm parameters from weight decay | True | If True, weight decay is not applied to BN parameters or to the bias of Conv or Dense layers |
| momentum | Momentum | 0.9 | \ |
| weight_decay | Weight decay rate | 0 | Note that weight decay can be a constant value or a Cell. It is a Cell only when dynamic weight decay is applied: similar to dynamic learning rate, the user customizes a weight decay schedule that takes only the global step as input, and during training the optimizer calls the WeightDecaySchedule instance to get the weight decay value of the current step |
| nesterov | Whether to use the Nesterov Accelerated Gradient (NAG) algorithm to update the gradients | False | True / False |
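
A minimal sketch of an optimizer section (values are illustrative):

```yaml
optimizer:
  opt: adamw
  filter_bias_and_bn: True
  weight_decay: 0.0001
```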

Loss scaling (loss_scaler)

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| type | Loss scaling method type | static | Currently supports static, dynamic |
| loss_scale | Loss scaling value | 1.0 | \ |
| scale_factor | With the dynamic loss scaler, the factor used to adjust loss_scale | 2.0 | At each training step, when an overflow occurs, the loss scaling value is updated to loss_scale / scale_factor |
| scale_window | With the dynamic loss scaler, the number of consecutive non-overflow steps after which loss_scale is enlarged by scale_factor | 1000 | If no overflow occurs for scale_window consecutive steps, the loss scaling value is increased to loss_scale * scale_factor |
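
For example, a dynamic loss scaler could be configured as in the following sketch (values are illustrative):

```yaml
loss_scaler:
  type: dynamic
  loss_scale: 512
  scale_factor: 2.0
  scale_window: 1000
```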

8. Training, evaluation and predict process (train, eval, predict)

The configuration of the training process is placed under train, and the configuration of the evaluation phase is placed under eval. Note that during model training, if the train-while-evaluate mode is enabled (val_while_train=True), an evaluation is run according to the configuration under eval after each training epoch. When only running model evaluation (not training), only the eval configuration is read.

Training process (train)

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| ckpt_save_dir | Model save path | ./tmp_rec | \ |
| resume | Resume training after an interruption; can be True/False, or the path of the checkpoint to load | False | If True, resume_train.ckpt under the ckpt_save_dir directory is loaded to continue training. A specific ckpt file path can also be given to load and resume training |
| dataset_sink_mode | Whether data is sunk directly to the device for processing | - | If True, data is sunk to the device, and results can only be returned at the end of each epoch |
| gradient_accumulation_steps | Number of steps over which gradients are accumulated | 1 | Each step corresponds to one forward pass; the backward update is performed once the gradient accumulation is complete |
| clip_grad | Whether to clip the gradient | False | If True, gradients are clipped to clip_norm |
| clip_norm | The norm used for gradient clipping when clip_grad is True | 1 | \ |
| ema | Whether to use the EMA algorithm | False | \ |
| ema_decay | EMA decay rate | 0.9999 | \ |
| pred_cast_fp32 | Whether to cast the data type of logits to fp32 | False | \ |
| **dataset** | Dataset configuration | \ | For details, please refer to the Data documentation |
| type | Dataset class name | - | Currently supports LMDBDataset, RecDataset and DetDataset |
| dataset_root | Root directory of the dataset | None | Optional |
| data_dir | Subdirectory where the dataset is located | - | If dataset_root is not set, set this to the full directory |
| label_file | Path of the dataset label file | - | If dataset_root is not set, set this to the full path; otherwise set only the subpath |
| sample_ratio | Dataset sampling ratio | 1.0 | If the value is < 1.0, samples are selected randomly |
| shuffle | Whether to shuffle the data order | True during training, otherwise False | True / False |
| transform_pipeline | Data processing pipeline | None | For details, please see transforms |
| output_columns | The list of data attribute names (type: list) that the data loader outputs (fed to the network / loss computation / post-processing); the candidate attribute names are determined by transform_pipeline | None | If None, all columns are output. Taking CRNN as an example: output_columns: ['image', 'text_seq'] |
| net_input_column_index | Indices, within output_columns, of the inputs required by the network construct function | [0] | \ |
| label_column_index | Indices, within output_columns, of the inputs required by the loss function | [1] | \ |
| **loader** | Data loading settings | \ | \ |
| shuffle | Whether to shuffle the data order at each epoch | True during training, otherwise False | True / False |
| batch_size | Batch size of a single card | - | \ |
| drop_remainder | Whether to drop the last batch of data when the total number of samples is not divisible by batch_size | True during training, otherwise False | \ |
| max_rowsize | Maximum shared-memory space allocated when copying data between multiple processes | 64 | Default value: 64 |
| num_workers | Number of concurrent processes/threads for batch operations | n_cpus / n_devices - 2 | This value should be greater than or equal to 2 |
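
Putting the pieces together, a trimmed-down train section could be sketched as follows (paths and values are illustrative placeholders; transform_pipeline is omitted for brevity):

```yaml
train:
  ckpt_save_dir: ./tmp_rec
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/train_data    # placeholder path
    data_dir: training/
    shuffle: True
    # transform_pipeline: ...          # see the transforms documentation
    output_columns: ['image', 'text_seq']
    net_input_column_index: [0]
    label_column_index: [1]
  loader:
    shuffle: True
    batch_size: 64
    drop_remainder: True
    num_workers: 8
```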

Reference example: DBNet, CRNN

Evaluation process (eval)

The parameters of eval are basically the same as those of train, with only a few additional parameters; for the rest, please refer to the parameter descriptions of train above.

| Parameter | Description | Default | Remarks |
| --- | --- | --- | --- |
| ckpt_load_path | Model loading path | - | \ |
| num_columns_of_labels | Number of label columns in the dataset output columns | None | If None, the columns after the image (data[1:]) are assumed to be labels. If not None, the num_columns_of_labels columns after the image (data[1:1+num_columns_of_labels]) are labels, and the remaining columns are additional information such as image_path |
| drop_remainder | Whether to drop the last batch of data when the total number of samples is not divisible by batch_size | True during training, otherwise False | It is recommended to set this to False for model evaluation. If the dataset size is not divisible by batch_size, MindOCR automatically selects the largest batch size that divides it evenly |
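
A corresponding eval section could be sketched as follows (paths and values are illustrative placeholders):

```yaml
eval:
  ckpt_load_path: ./tmp_rec/best.ckpt   # placeholder checkpoint path
  dataset_sink_mode: False
  dataset:
    type: LMDBDataset
    dataset_root: dir/to/eval_data      # placeholder path
    data_dir: evaluation/
    # transform_pipeline and output_columns are analogous to train
  loader:
    shuffle: False
    batch_size: 64
    drop_remainder: False
    num_workers: 8
```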