Skip to content

ICPR MTWI-2018 Dataset

Data Downloading

The ICPR MTWI-2018 dataset Official Website | Download Link

Note: Please register an account to download this dataset.

The ICPR MTWI dataset has derived three tasks: Text Line(column) Recognition of Web Images, Text Detection of Web Images, and End to End Text Detection and Recognition of Web Images. The three tasks share the same training data: mtwi_train.zip; For test data, task1 has test data: mtwi_task1.zip, and task⅔ share the same test data: mtwi_task2_3.zip. For now, we will consider and download only the training data mtw_train.zip.

After downloading the dataset, unzip the file, after which the directory structure should be like as follows (ignoring the archive files):

MTWI-2018
  |--- image_train
  |    |--- <image_name>.jpg
  |    |--- <image_name>.jpg
  |    |--- ...
  |--- txt_train
  |    |--- <image_name>.txt
  |    |--- <image_name>.txt
  |    |--- ...

Data Preparation

For Detection Task

To prepare the data for text detection, you can run the following commands:

python tools/dataset_converters/convert.py \
    --dataset_name mtwi2018 --task det \
    --image_dir path/to/MTWI-2018/image_train/ \
    --label_dir path/to/MTWI-2018/txt_train.json \
    --output_path path/to/MTWI-2018/det_gt.txt

The generated standard annotation file det_gt.txt will now be placed under the folder MTWI-2018/.

Back to dataset converters