Self-supervised pretraining has advanced the capabilities of many computer vision tasks without requiring additional labels. One drawback is that this technique requires extensive datasets and computational resources, a requirement that has often put it out of reach for smaller, more niche datasets. Recently, a method of pretraining has been developed that uses several stages of training, with each subsequent pretraining stage using a dataset that more closely resembles the target labelled data. This Hierarchical PreTraining (HPT) allows small datasets that differ significantly from generalized pretraining datasets (e.g. ImageNet) to build on successive knowledge transfers from increasingly focused training. However, there remain computer vision domains in which data is sufficiently difficult to acquire that augmenting training with synthetic data has become a common convention. This paper examines how Remote Sensing Imagery (RSI) datasets, both with and without synthetic augmentation, still benefit from HPT despite belonging to a niche domain. Through a series of experiments that isolate individual training parameters, we show the fine balance that must be maintained when pretraining with these small datasets. We also demonstrate how these techniques lead to model improvements over existing baselines with and without synthetic data. Given that HPT provides a straightforward process to increase performance, and synthetic data is a growing resource for dataset augmentation, these combined methods can enhance a wide variety of current and future computer vision tasks.
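The staged structure of HPT can be sketched in a few lines. The following is a minimal, illustrative PyTorch example only: the ResNet-18 backbone, the SimSiam-style self-supervised objective, and the placeholder data loaders (overhead_loader, rareplanes_loader) are assumptions for the sketch, not the paper's exact configuration.

```python
# Minimal sketch of Hierarchical PreTraining (HPT): each stage reuses the
# previous stage's weights and pretrains on data closer to the target domain.
# Dataset names and the SimSiam-style loss are illustrative placeholders.
import torch
import torch.nn as nn
import torchvision

def simsiam_loss(p, z):
    # Negative cosine similarity with a stop-gradient on the target branch.
    return -nn.functional.cosine_similarity(p, z.detach(), dim=-1).mean()

def pretrain_stage(backbone, loader, epochs=1, lr=1e-3):
    # Fresh projection/prediction heads for this stage's self-supervised objective.
    proj = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
    pred = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 128))
    params = list(backbone.parameters()) + list(proj.parameters()) + list(pred.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    for _ in range(epochs):
        for view_a, view_b in loader:  # two augmented views of each unlabelled image
            za, zb = proj(backbone(view_a)), proj(backbone(view_b))
            loss = 0.5 * (simsiam_loss(pred(za), zb) + simsiam_loss(pred(zb), za))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return backbone

# Stage 0: start from a generalist (ImageNet) checkpoint.
backbone = torchvision.models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()  # expose 512-d features

# Stages 1..N: pretrain on increasingly target-like unlabelled data,
# e.g. broad overhead imagery, then the target RSI tiles (real + synthetic).
# The loaders below are placeholders that would yield two augmented views per image.
# backbone = pretrain_stage(backbone, overhead_loader)
# backbone = pretrain_stage(backbone, rareplanes_loader)

# Final stage: supervised fine-tuning on the small labelled target set.
```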
RarePlanes soar higher: Self-supervised pretraining for resource constrained and synthetic datasets
2022