Data Module Guide
Code Structure
├── README.md
├── __init__.py
├── base_dataset.py # base dataset class with __getitem__
├── builder.py # API for creating datasets and loaders
├── det_dataset.py # general text detection dataset class
├── rec_dataset.py # general text recognition dataset class
├── rec_lmdb_dataset.py # LMDB dataset class
└── transforms
    ├── det_transforms.py # processing and augmentation ops (callable classes), especially for detection tasks
    ├── general_transforms.py # general processing and augmentation ops (callable classes)
    ├── modelzoo_transforms.py # transformations adopted from modelzoo
    ├── rec_transforms.py # processing and augmentation ops (callable classes), especially for recognition tasks
    └── transforms_factory.py # API for creating and running transforms
How to Add Your Own Dataset Class
- Inherit from the BaseDataset class.
- Override the following file-loading and annotation-parsing functions of BaseDataset:
  def load_data_list(self, label_file: Union[str, List[str]], sample_ratio: Union[float, List] = 1.0, shuffle: bool = False, **kwargs) -> List[dict]
  def _parse_annotation(self, data_line: str) -> Union[dict, List[dict]]
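The two overrides can be sketched as follows. This is a minimal illustration and not the library's own implementation: the `BaseDataset` defined here is a stand-in so the example runs by itself (in a real project the base class would come from `base_dataset.py`), and `MyRecDataset`, the tab-separated label format, and the `img_path` / `label` keys are all assumptions made for the example.

```python
import os
import random
import tempfile
from typing import List, Union


class BaseDataset:
    """Stand-in for the real BaseDataset (assumption), so the sketch is self-contained."""

    def __init__(self, label_file, sample_ratio=1.0, shuffle=False):
        self.data_list = self.load_data_list(label_file, sample_ratio, shuffle)

    def __len__(self):
        return len(self.data_list)

    def __getitem__(self, index):
        return self.data_list[index]


class MyRecDataset(BaseDataset):
    """Hypothetical dataset whose label lines look like '<image name><TAB><text>'."""

    def load_data_list(self, label_file: Union[str, List[str]], sample_ratio: Union[float, List] = 1.0,
                       shuffle: bool = False, **kwargs) -> List[dict]:
        # Accept either a single label file or a list of them.
        files = [label_file] if isinstance(label_file, str) else list(label_file)
        ratios = sample_ratio if isinstance(sample_ratio, list) else [sample_ratio] * len(files)
        data_list = []
        for path, ratio in zip(files, ratios):
            with open(path, "r", encoding="utf-8") as f:
                lines = [ln for ln in f.read().splitlines() if ln.strip()]
            if shuffle:
                random.shuffle(lines)
            # Keep only the requested fraction of each file.
            keep = round(len(lines) * ratio)
            data_list.extend(self._parse_annotation(ln) for ln in lines[:keep])
        return data_list

    def _parse_annotation(self, data_line: str) -> Union[dict, List[dict]]:
        # Split on the first tab only, so the text itself may contain tabs.
        img_name, text = data_line.split("\t", 1)
        return {"img_path": img_name, "label": text}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        label_path = os.path.join(tmp_dir, "labels.txt")
        with open(label_path, "w", encoding="utf-8") as f:
            f.write("a.png\thello\nb.png\tworld\n")
        dataset = MyRecDataset(label_path)
        print(len(dataset), dataset[0])
```

Keeping file handling in `load_data_list` and per-line parsing in `_parse_annotation` means a new annotation format usually only requires changing the second function.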
How to Add Your Own Data Transformation
Please refer to the development guide for customized data transformations.