According to the data transform interface convention of TorchVision, all data transform classes need to implement the `__call__` method. In OpenMMLab 1.0, we additionally require that the input and output of the `__call__` method be dictionaries.
In OpenMMLab 2.0, to make the data transform classes more extensible, we use the `transform` method instead of the `__call__` method to implement data transformation, and all data transform classes should inherit from the `mmcv.transforms.BaseTransform` class. You can still use these data transform classes by calling them directly.
A tutorial on implementing a data transform class can be found in the Data Transform document.
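To make the `transform` / `__call__` relationship concrete, here is a minimal sketch that runs without MMCV installed. `BaseTransformSketch` is a hypothetical stand-in for `mmcv.transforms.BaseTransform` (it only reproduces the idea that calling the object forwards to `transform`), and `AddBrightness` is an invented example transform that uses a nested list in place of an image array.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class BaseTransformSketch(ABC):
    """Hypothetical stand-in for mmcv.transforms.BaseTransform."""

    def __call__(self, results: Dict) -> Optional[Dict]:
        # Calling the object simply forwards to ``transform``.
        return self.transform(results)

    @abstractmethod
    def transform(self, results: Dict) -> Optional[Dict]:
        """Process the results dict and return it."""


class AddBrightness(BaseTransformSketch):
    """Invented example: add a constant to every pixel, clipped at 255."""

    def __init__(self, delta: int = 10):
        self.delta = delta

    def transform(self, results: Dict) -> Dict:
        # 'img' is a nested list of ints here, standing in for an ndarray.
        results['img'] = [
            [min(255, pixel + self.delta) for pixel in row]
            for row in results['img']
        ]
        return results


step = AddBrightness(delta=5)
# The transform is written in ``transform`` but used by calling the object.
out = step({'img': [[0, 100], [250, 255]]})
print(out['img'])
```

With the real library, the subclass would inherit from `mmcv.transforms.BaseTransform` instead of the stand-in; the dictionary-in, dictionary-out contract stays the same.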
In addition, we have moved some common data transform classes from the individual repositories to MMCV. This document compares the functionality, usage, and implementation of the original data transform classes (in MMClassification v0.23.2 and MMDetection v2.25.1) with the new data transform classes (in MMCV v2.0.0rc1).
Functionality Differences

| Transform | MMClassification (original) | MMDetection (original) | MMCV (new) |
| --- | --- | --- | --- |
| `LoadImageFromFile` | Joins the `'img_prefix'` and `'img_info.filename'` fields to find the image path, then loads the image. | Joins the `'img_prefix'` and `'img_info.filename'` fields to find the image path, then loads the image. Supports specifying the channel order. | Loads images from `'img_path'`. Supports ignoring failed loading and specifying the decode backend. |
| `LoadAnnotations` | Not available. | Loads bbox, label, mask (including polygon masks), and semantic segmentation. Supports converting the bbox coordinate system. | Loads bbox, label, mask (not including polygon masks), and semantic segmentation. |
| `Pad` | Pads all images in the `"img_fields"` field. | Pads all images in the `"img_fields"` field. Supports padding to an integer multiple of a size. | Pads the image in the `"img"` field. Supports padding to an integer multiple of a size. |
| `CenterCrop` | Crops all images in the `"img_fields"` field. Supports EfficientNet-style cropping. | Not available. | Crops the image in the `"img"` field, the bbox in the `"gt_bboxes"` field, the semantic segmentation in the `"gt_seg_map"` field, and the keypoints in the `"gt_keypoints"` field. Supports padding the margin of the cropped image. |
| `Normalize` | Normalizes the image. | No differences. | No differences, but we recommend using the data preprocessor to normalize the image. |
| `Resize` | Resizes all images in the `"img_fields"` field. Supports resizing proportionally according to the specified edge. | Use `Resize` with `ratio_range=None`, a single scale in `img_scale`, and `multiscale_mode="value"`. | Resizes the image in the `"img"` field, the bbox in the `"gt_bboxes"` field, the semantic segmentation in the `"gt_seg_map"` field, and the keypoints in the `"gt_keypoints"` field. Supports specifying the ratio of the new scale to the original scale and supports resizing proportionally. |
| `RandomResize` | Not available. | Use `Resize` with `ratio_range=None`, two scales in `img_scale`, and `multiscale_mode="range"`, or with `ratio_range` not `None`. Example: `Resize(img_scale=[(640, 480), (960, 720)], multiscale_mode="range")` | Has the same resize function as `Resize`. Supports sampling the scale from a scale range or a scale ratio range. Example: `RandomResize(scale=[(640, 480), (960, 720)])` |
| `RandomChoiceResize` | Not available. | Use `Resize` with `ratio_range=None`, multiple scales in `img_scale`, and `multiscale_mode="value"`. Example: `Resize(img_scale=[(640, 480), (960, 720)], multiscale_mode="value")` | Has the same resize function as `Resize`. Supports randomly choosing the scale from multiple scales or multiple scale ratios. Example: `RandomChoiceResize(scales=[(640, 480), (960, 720)])` |
| `RandomGrayscale` | Randomly converts all images in the `"img_fields"` field to grayscale. Supports keeping the channels after conversion. | Not available. | Randomly converts the image in the `"img"` field to grayscale. Supports specifying the weight of each channel and keeping the channels after conversion. |
| `RandomFlip` | Randomly flips all images in the `"img_fields"` field. Supports flipping horizontally and vertically. | Randomly flips all values in the `"img_fields"`, `"bbox_fields"`, `"mask_fields"`, and `"seg_fields"` fields. Supports flipping horizontally, vertically, and diagonally, and supports specifying the probability of each kind of flip. | Randomly flips the values in the `"img"`, `"gt_bboxes"`, `"gt_seg_map"`, and `"gt_keypoints"` fields. Supports flipping horizontally, vertically, and diagonally, and supports specifying the probability of each kind of flip. |
| `MultiScaleFlipAug` | Not available. | Used for test-time augmentation. | Use `TestTimeAug`. |
| `ToTensor` | Converts the values in the specified fields to `torch.Tensor`. | No differences. | No differences. |
| `ImageToTensor` | Converts the values in the specified fields to `torch.Tensor` and transposes the channels to CHW. | No differences. | No differences. |

Implementation Differences
Take `RandomFlip` as an example. The new `RandomFlip` in MMCV inherits from `BaseTransform` and moves the implementation from the `__call__` method to the `transform` method. In addition, the randomness-related code is placed in separate methods, and these methods need to be wrapped by the `cache_randomness` decorator.
MMDetection (original version)
```python
import numpy as np


class RandomFlip:

    def __call__(self, results):
        """Randomly flip images."""
        ...
        # Randomly choose the flip direction
        cur_dir = np.random.choice(direction_list, p=flip_ratio_list)
        ...
        return results
```
MMCV (new version)
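```python
import numpy as np
from mmcv.transforms import BaseTransform
from mmcv.transforms.utils import cache_randomness


class RandomFlip(BaseTransform):

    def transform(self, results):
        """Randomly flip images."""
        ...
        # The random draw is delegated to a separate, decorated method
        cur_dir = self._random_direction()
        ...
        return results

    @cache_randomness
    def _random_direction(self):
        """Randomly choose the flip direction."""
        ...
        return np.random.choice(direction_list, p=flip_ratio_list)
```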
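One plausible benefit of isolating the random draw in its own decorated method is that the draw can be cached and reused across several inputs, so that all of them receive the same flip. The sketch below illustrates that idea only. It is an assumption about the motivation, not a description of MMCV internals: `cache_randomness_sketch`, `RandomFlipSketch`, and the `_random_cache` attribute are hypothetical names, and the image is a nested list rather than an array.

```python
import functools
import random


def cache_randomness_sketch(func):
    """Hypothetical, simplified stand-in for a randomness-caching decorator."""

    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        cache = getattr(self, '_random_cache', None)
        if cache is None:
            # Caching is off: draw a fresh random value on every call.
            return func(self, *args, **kwargs)
        if func.__name__ not in cache:
            # First call while caching is on: draw once and remember it.
            cache[func.__name__] = func(self, *args, **kwargs)
        return cache[func.__name__]

    return wrapper


class RandomFlipSketch:
    """Toy flip transform operating on a list of pixel rows."""

    def __init__(self, directions=('horizontal', 'vertical')):
        self.directions = directions
        self._random_cache = None  # None means "do not cache"

    @cache_randomness_sketch
    def _random_direction(self):
        """Randomly choose the flip direction."""
        return random.choice(self.directions)

    def transform(self, results):
        direction = self._random_direction()
        img = results['img']
        if direction == 'horizontal':
            results['img'] = [row[::-1] for row in img]
        else:
            results['img'] = img[::-1]
        results['flip_direction'] = direction
        return results


flip = RandomFlipSketch()
flip._random_cache = {}    # switch caching on (hypothetical mechanism)
first = flip.transform({'img': [[1, 2], [3, 4]]})
second = flip.transform({'img': [[5, 6], [7, 8]]})
flip._random_cache = None  # switch caching off again
print(first['flip_direction'], second['flip_direction'])
```

While caching is on, both inputs are flipped in the same direction. This works only because `transform` never calls the random number generator directly.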