Advanced Data Processing Configuration

Processing Chain

HandlerChain combines multiple data processing modules sequentially. Each module's output overwrites the corresponding entries of its input, and the updated data is passed on as the input to the next module.

For example, a common image processing pipeline:

from rainbowneko.data.handler import HandlerChain, ImageHandler, LoadImageHandler
from rainbowneko.data import FixedBucket
from torchvision import transforms as T

# Modules run in the order they are listed: load -> bucket -> image
handler = HandlerChain(
    load=LoadImageHandler(),
    bucket=FixedBucket.handler,  # The bucket provides some built-in processing modules
    image=ImageHandler(transform=T.Compose([
        T.RandomCrop(size=32, padding=4),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010]),
    ])),
)
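To make the sequential semantics concrete, here is a minimal pure-Python sketch. It assumes each handler behaves like a function that takes the current data dict and returns a dict of outputs, which are then merged over the data. `run_chain`, `load`, and `scale` are hypothetical names for illustration only; this is not RainbowNeko's implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a handler: data dict in, dict of outputs out.
Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_chain(handlers: Dict[str, Handler], data: Dict[str, Any]) -> Dict[str, Any]:
    """Apply handlers in order; each output overwrites matching keys."""
    data = dict(data)  # copy so the caller's dict is left untouched
    for handler in handlers.values():
        data.update(handler(data))  # later outputs replace earlier values
    return data


def load(data: Dict[str, Any]) -> Dict[str, Any]:
    # Pretend to "load" an image: ignore the path, return fake pixels.
    return {"image": [1, 2, 3, 4]}


def scale(data: Dict[str, Any]) -> Dict[str, Any]:
    # Works on whatever "image" the previous step produced.
    return {"image": [x * 10 for x in data["image"]]}


result = run_chain({"load": load, "scale": scale}, {"path": "cat.png"})
print(result)  # {'path': 'cat.png', 'image': [10, 20, 30, 40]}
```

Because `scale` reads `data["image"]`, it only works if `load` ran first; swapping the two entries would raise a `KeyError`, which is exactly the ordering dependency a chain expresses.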

Processing Group

HandlerGroup combines multiple data processing modules in parallel: the same input is fed to every module, and their outputs are aggregated into the final output.

For instance, to read an image once and apply two different processing pipelines to it, storing the results under separate keys:

from rainbowneko.data.handler import HandlerGroup, HandlerChain, ImageHandler, LoadImageHandler

handler = HandlerChain(
    load=LoadImageHandler(),
    image=HandlerGroup(
        # Both branches receive the same loaded image; "..." stands for each branch's own transform
        weak=ImageHandler(..., key_map_out=('image -> image_weak',)),
        strong=ImageHandler(..., key_map_out=('image -> image_strong',)),
    ),
)
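The parallel behaviour can be sketched the same way, again assuming handlers are plain functions returning dicts. `run_group`, `weak`, and `strong` are hypothetical; the "augmentations" are toy arithmetic, and whether the real HandlerGroup also passes untouched input keys through is not covered here.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_group(handlers: Dict[str, Handler], data: Dict[str, Any]) -> Dict[str, Any]:
    """Feed the same input to every handler and merge their outputs."""
    merged: Dict[str, Any] = {}
    for handler in handlers.values():
        # Each handler gets its own copy of the original input,
        # never its siblings' outputs.
        merged.update(handler(dict(data)))
    return merged


def weak(data: Dict[str, Any]) -> Dict[str, Any]:
    # Mild "augmentation", written to its own key (like key_map_out above).
    return {"image_weak": [x + 1 for x in data["image"]]}


def strong(data: Dict[str, Any]) -> Dict[str, Any]:
    # Heavier "augmentation" on the same original input.
    return {"image_strong": [x * -1 for x in data["image"]]}


out = run_group({"weak": weak, "strong": strong}, {"image": [1, 2, 3]})
print(out)  # {'image_weak': [2, 3, 4], 'image_strong': [-1, -2, -3]}
```

Writing each branch to a distinct key is what keeps the merge lossless; if both branches wrote to `image`, the later one would silently overwrite the earlier one.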

Synchronizing Random Seeds for Processing

In some scenarios, several processing modules need to share the same random seed. In super-resolution, for example, the low-resolution (LR) and high-resolution (HR) images must be randomly cropped in the same way. In such cases, use SyncHandler:

from rainbowneko.data.handler import SyncHandler, ImageHandler

SyncHandler(
    LR=ImageHandler(...),
    HR=ImageHandler(...),
)
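One way to get this behaviour is to snapshot the random number generator state and restore it before each child handler runs, so every child draws the same sequence of numbers. The sketch below does this with Python's `random` module; `run_synced` and `make_crop` are hypothetical, and this is not necessarily how SyncHandler is implemented.

```python
import random
from typing import Any, Callable, Dict

Handler = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_synced(handlers: Dict[str, Handler], data: Dict[str, Any]) -> Dict[str, Any]:
    """Run every handler from the same RNG state so they draw identical numbers."""
    state = random.getstate()  # snapshot the RNG before the first handler
    out: Dict[str, Any] = {}
    for handler in handlers.values():
        random.setstate(state)  # rewind: each handler starts from the snapshot
        out.update(handler(data))
    return out


def make_crop(key: str, scale: int) -> Handler:
    """Hypothetical crop handler: random offset on an 8x8 grid, scaled to image size."""
    def crop(data: Dict[str, Any]) -> Dict[str, Any]:
        x = random.randint(0, 7)
        y = random.randint(0, 7)
        return {key + "_crop_xy": (x * scale, y * scale)}
    return crop


out = run_synced({"LR": make_crop("LR", 1), "HR": make_crop("HR", 4)}, {})
print(out)  # the HR offsets are always exactly 4x the LR offsets
```

torchvision's random transforms (such as RandomCrop and RandomHorizontalFlip) draw from PyTorch's generator rather than Python's `random`, so a version wrapping those transforms would also need to save and restore the torch state (`torch.get_rng_state` / `torch.set_rng_state`).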

Data Flow Control

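Handlers let you control the data flow with key_map_in and key_map_out. key_map_in selects which entries of the incoming data the handler reads and under which names, and key_map_out sets the keys under which the handler's outputs are written. Each mapping is a tuple of 'source -> target' strings: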

ImageHandler(..., key_map_in=('image -> image',), key_map_out=('image -> image_weak',))
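A minimal sketch of how such 'source -> target' rules could be applied to flat dicts, simulating the ImageHandler line above. `parse_key_map` and `remap` are hypothetical helpers, the doubling stands in for a real transform, and this sketch drops unmapped keys, which may differ from the library's behaviour.

```python
from typing import Any, Dict, Iterable, Tuple


def parse_key_map(rules: Iterable[str]) -> Tuple[Tuple[str, str], ...]:
    """Split 'src -> dst' strings into (src, dst) pairs."""
    pairs = []
    for rule in rules:
        src, dst = (part.strip() for part in rule.split("->"))
        pairs.append((src, dst))
    return tuple(pairs)


def remap(data: Dict[str, Any], rules: Iterable[str]) -> Dict[str, Any]:
    """Return only the mapped entries, renamed from src to dst."""
    return {dst: data[src] for src, dst in parse_key_map(rules)}


# Simulates key_map_in=('image -> image',), key_map_out=('image -> image_weak',)
sample = {"image": [1, 2, 3], "label": 7}
handler_in = remap(sample, ("image -> image",))                 # what the handler sees
handler_out = {"image": [x * 2 for x in handler_in["image"]]}  # pretend transform
written = remap(handler_out, ("image -> image_weak",))          # what is written back
print(written)  # {'image_weak': [2, 4, 6]}
```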

Tip

If the input is a nested structure of dictionaries or lists (e.g., {'data': {'image': img, 'label': label}}), you can address nested entries with dotted paths, such as ('data.image -> image', 'data.label -> label').

For structures like {'data': [img, label]}, use list indices as path segments: ('data.0 -> image', 'data.1 -> label').

Outputs can be mapped back into a nested structure with the same syntax, e.g. ('image -> data.image', 'label -> data.label').
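The dotted-path syntax in this tip can be sketched as a pair of path helpers: numeric segments index into lists or tuples, other segments index into dicts, and writes create intermediate dicts as needed. `get_path`, `set_path`, and `apply_map` are hypothetical names for illustration; the real resolver may handle edge cases differently.

```python
from typing import Any, Dict, Iterable


def get_path(obj: Any, path: str) -> Any:
    """Follow a dotted path; numeric segments index into lists/tuples."""
    for part in path.split("."):
        if isinstance(obj, (list, tuple)):
            obj = obj[int(part)]
        else:
            obj = obj[part]
    return obj


def set_path(obj: Dict[str, Any], path: str, value: Any) -> None:
    """Write value at a dotted path, creating intermediate dicts as needed."""
    *parents, last = path.split(".")
    for part in parents:
        obj = obj.setdefault(part, {})
    obj[last] = value


def apply_map(src: Any, rules: Iterable[str], dst: Dict[str, Any]) -> Dict[str, Any]:
    """Copy values from src paths to dst paths according to 'a -> b' rules."""
    for rule in rules:
        a, b = (p.strip() for p in rule.split("->"))
        set_path(dst, b, get_path(src, a))
    return dst


r1 = apply_map({"data": {"image": "img", "label": 3}},
               ("data.image -> image", "data.label -> label"), {})
r2 = apply_map({"data": ["img", 3]}, ("data.0 -> image", "data.1 -> label"), {})
r3 = apply_map({"image": "img", "label": 3},
               ("image -> data.image", "label -> data.label"), {})
print(r1)  # {'image': 'img', 'label': 3}
print(r2)  # {'image': 'img', 'label': 3}
print(r3)  # {'data': {'image': 'img', 'label': 3}}
```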