RCTW-17 Dataset
Data Downloading
The RCTW-17 dataset: Official Website | Download Link
The training set is split into two zip files, train_images.zip.001 and train_images.zip.002. The annotations are provided as *_gts.zip files.
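The two training-image parts must be joined before they can be unzipped. The sketch below assumes the parts are plain byte-level splits (the .001/.002 naming used by tools such as 7-Zip). In that case, concatenating them in order gives an ordinary zip file, so on Linux/macOS `cat train_images.zip.001 train_images.zip.002 > train_images.zip` does the same thing. The helpers join_split_archive and extract_zip are hypothetical names used for illustration and are not part of the toolkit.

```python
import shutil
import zipfile
from pathlib import Path


def join_split_archive(parts, output_path):
    """Concatenate byte-split archive parts (e.g. *.zip.001, *.zip.002).

    Assumes the parts are plain byte splits; they are written in the
    order given, so pass them as .001, .002, ...
    """
    output_path = Path(output_path)
    with output_path.open("wb") as out:
        for part in parts:
            with Path(part).open("rb") as src:
                shutil.copyfileobj(src, out)  # stream, no full read into memory
    return output_path


def extract_zip(zip_path, dest_dir):
    """Extract every member of a zip file into dest_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest_dir)


# Example (paths are placeholders):
# merged = join_split_archive(
#     ["train_images.zip.001", "train_images.zip.002"], "train_images.zip")
# extract_zip(merged, "RCTW-17")
```

If the merged file fails to open with zipfile, the parts were probably produced by a different splitting scheme and should be extracted with the archiver that created them.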
After downloading and unzipping the images and annotations, collect the images into a single folder, e.g. train_images/. The directory structure should then look as follows (ignoring the archive files):
RCTW-17
|--- train_images
| |--- <image_name>.jpg
| |--- <image_name>.jpg
| |--- ...
|--- train_gts
| |--- <image_name>.txt
| |--- <image_name>.txt
| |--- ...
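Before running the converter, it can help to confirm that every image in train_images/ has a ground-truth file with the same stem in train_gts/, as the layout above implies. This is a minimal sketch. The function name check_layout and the set of accepted image suffixes are assumptions for illustration.

```python
from pathlib import Path

# Assumed image extensions; RCTW-17 training images are .jpg.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def check_layout(root):
    """Return (missing_gts, orphan_gts) for an RCTW-17 style folder.

    missing_gts: image stems in train_images/ with no <stem>.txt in train_gts/
    orphan_gts:  .txt stems in train_gts/ with no matching image
    """
    root = Path(root)
    images = {
        p.stem
        for p in (root / "train_images").iterdir()
        if p.suffix.lower() in IMAGE_SUFFIXES
    }
    gts = {p.stem for p in (root / "train_gts").glob("*.txt")}
    return sorted(images - gts), sorted(gts - images)


# Example (path is a placeholder):
# missing, orphan = check_layout("path/to/RCTW-17")
# print(len(missing), "images without labels;", len(orphan), "labels without images")
```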
Data Preparation
For Detection Task
To prepare the data for text detection, run the following command:
python tools/dataset_converters/convert.py \
--dataset_name rctw17 --task det \
--image_dir path/to/RCTW-17/train_images/ \
--label_dir path/to/RCTW-17/train_gts \
--output_path path/to/RCTW-17/det_gt.txt
The generated standard annotation file det_gt.txt is written to the path given by --output_path, which in this example places it under the RCTW-17/ folder.
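To show roughly what the conversion involves, here is a minimal sketch of turning RCTW-17 label files into a single detection annotation file. It is not the implementation of tools/dataset_converters/convert.py. It assumes each RCTW ground-truth line has the form x1,y1,x2,y2,x3,y3,x4,y4,difficult,"text". It also assumes the output line format is the image name, a tab, and then a JSON list of {"transcription", "points"} entries. The difficult flag is ignored here, and parse_rctw_gt, to_det_line and convert_dir are hypothetical names.

```python
import csv
import json
from pathlib import Path


def parse_rctw_gt(path):
    """Parse one RCTW-17 label file into a list of polygon dicts.

    Assumed line format: x1,y1,x2,y2,x3,y3,x4,y4,difficult,"text"
    The difficult flag (row[8]) is not used in this sketch.
    """
    boxes = []
    # utf-8-sig tolerates a leading BOM; csv handles quoted text with commas.
    with open(path, encoding="utf-8-sig", newline="") as f:
        for row in csv.reader(f, skipinitialspace=True):
            if len(row) < 10:
                continue  # skip blank or malformed lines
            coords = [int(float(v)) for v in row[:8]]
            points = [coords[i:i + 2] for i in range(0, 8, 2)]
            text = ",".join(row[9:])  # rejoin in case the text was not quoted
            boxes.append({"transcription": text, "points": points})
    return boxes


def to_det_line(image_name, boxes):
    """Format one output line: image name, tab, JSON list of boxes."""
    return f"{image_name}\t{json.dumps(boxes, ensure_ascii=False)}"


def convert_dir(image_dir, label_dir, output_path):
    """Write one line per .jpg image that has a matching <stem>.txt label."""
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    written = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for img in sorted(image_dir.glob("*.jpg")):
            gt = label_dir / f"{img.stem}.txt"
            if gt.exists():
                out.write(to_det_line(img.name, parse_rctw_gt(gt)) + "\n")
                written += 1
    return written
```

ensure_ascii=False keeps Chinese transcriptions readable in the output file instead of escaping them as \uXXXX sequences.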