Skip to content

MSRA Text Detection 500 Database (MSRA-TD500)

Data Downloading

MSRA Text Detection 500 Database(MSRA-TD500)Download Link

Please download the data from the website above and unzip the file. After unzipping the file, the data structure should be like:

MSRA-TD500
 ├── test
 │   ├── IMG_0059.gt
 │   ├── IMG_0059.JPG
 │   ├── IMG_0080.gt
 │   ├── IMG_0080.JPG
 │   ├── ...
 ├── train
 │   ├── IMG_0030.gt
 │   ├── IMG_0030.JPG
 │   ├── IMG_0063.gt
 │   ├── IMG_0063.JPG
 │   ├── ...

Data Preparation

For Detection task

To prepare the data for text detection, you can run the following commands:

python tools/dataset_converters/convert.py \
    --dataset_name td500 --task det \
    --image_dir path/to/MSRA-TD500/train/ \
    --label_dir path/to/MSRA-TD500/train \
    --output_path path/to/MSRA-TD500/train_det_gt.txt
python tools/dataset_converters/convert.py \
    --dataset_name td500 --task det \
    --image_dir path/to/MSRA-TD500/test/ \
    --label_dir path/to/MSRA-TD500/test \
    --output_path path/to/MSRA-TD500/test_det_gt.txt

Then you can have two annotation files train_det_gt.txt and test_det_gt.txt under the folder MSRA-TD500/.

Back to dataset converters