
how to make dataloader return multiple tensors each time #17773

Closed
zhangyu2ustc opened this issue Mar 7, 2019 · 3 comments
Labels
module: dataloader Related to torch.utils.data.DataLoader and Sampler

Comments


zhangyu2ustc commented Mar 7, 2019

🚀 Feature

I am working on medical image analysis, where each data sample (i.e. a medical image) contains multiple training samples (data + label). How can I write my own Dataset and DataLoader to extract all the tensors from one image?

Motivation

Here is my definition of the dataset:
class HCP_taskfmri_datasets(Dataset):
    ## build a new class for my own dataset

    def __init__(self, output_dir, fmri_files, confound_files, label_matrix, target_name,
                 perm=None, data_type='train', block_dura=1, transform=None):
        super(HCP_taskfmri_datasets, self).__init__()
        self.pathout = output_dir
        os.makedirs(self.pathout, exist_ok=True)
        self.fmri_files = fmri_files
        self.confound_files = confound_files
        self.label_matrix = pd.DataFrame(data=label_matrix)
        self.target_name = target_name

        self.block_dura = block_dura
        self.data_type = data_type
        self.transform = transform

    def __len__(self):
        return len(self.fmri_files)

    def __getitem__(self, idx):
        fmri_file = self.fmri_files[idx]
        confound_file = self.confound_files[idx]
        label_trial_data = self.label_matrix.iloc[idx]

        ## function to extract data and save it into a 3d array
        fmri_data, label_data = self.map_load_fmri_event_block(fmri_file, label_trial_data,
                                                               block_dura=self.block_dura)
        print(fmri_data.shape, label_data.shape)
        ## shape of data: (170, 1, 360); shape of label: (170,)
        ## which means that we have 170 samples extracted from each image file

        tensor_x = torch.stack([torch.Tensor(fmri_data[ii].transpose()) for ii in range(len(label_data))])  # transform to torch tensors
        tensor_y = torch.stack([torch.Tensor([label_data[ii]]) for ii in range(len(label_data))])
        print(tensor_x.size(), tensor_y.size())

        return tensor_x, tensor_y

### The DataLoader will treat all 170 samples as one tensor. How can I extract each individual sample and use it for training the model?
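One common map-style workaround (a sketch, not from this thread; `FlattenedDataset` and `load_file` are hypothetical names) is to flatten the per-file samples at the Dataset level, decoding each flat index into a (file, sample) pair:

```python
import torch
from torch.utils.data import Dataset

class FlattenedDataset(Dataset):
    """Sketch: expose num_files * samples_per_file items, so the
    DataLoader sees each of the 170 per-file samples individually."""

    def __init__(self, files, samples_per_file=170):
        self.files = files
        self.samples_per_file = samples_per_file

    def __len__(self):
        return len(self.files) * self.samples_per_file

    def __getitem__(self, idx):
        # decode the flat index into (which file, which sample in it)
        file_idx, sample_idx = divmod(idx, self.samples_per_file)
        data, labels = self.load_file(self.files[file_idx])
        return data[sample_idx], labels[sample_idx]

    def load_file(self, path):
        # placeholder for the real fMRI loading routine:
        # 170 samples of shape (360, 1) with scalar labels
        return (torch.randn(self.samples_per_file, 360, 1),
                torch.zeros(self.samples_per_file))
```

The drawback, noted below in the thread, is that every item fetch reloads the whole file unless you add caching.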

Pitch

Can we use a for loop inside the __getitem__ function? For instance:

    for ii in range(trailNum):
        tensor_x = torch.Tensor(fmri_data[ii]) # transform to torch tensors
        tensor_y = torch.Tensor([label_data[ii]])
        sample = {'input': tensor_x, 'target': tensor_y} 
        yield sample


Collaborator

ssnl commented Mar 7, 2019

#14705 will support something like your proposed snippet.

However, in your case, your __len__ should really be len(self.fmri_files) * 170, since you have 170 * num_files samples. But that would cause you to read the same file 170 times, which is not very efficient. So I would suggest just using an inner loop over the 170 samples fetched each time.
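The inner-loop suggestion can be sketched like this (a sketch, assuming the Dataset returns all 170 per-file samples at once, as in the snippet above; `PerFileDataset` is a toy stand-in):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PerFileDataset(Dataset):
    """Toy stand-in: each 'file' yields its 170 samples at once."""
    def __len__(self):
        return 4  # pretend there are 4 files

    def __getitem__(self, idx):
        x = torch.randn(170, 360, 1)  # 170 samples per file
        y = torch.zeros(170, 1)
        return x, y

loader = DataLoader(PerFileDataset(), batch_size=1)
for batch_x, batch_y in loader:
    # batch_x: (1, 170, 360, 1) -> drop the DataLoader batch dim
    batch_x, batch_y = batch_x.squeeze(0), batch_y.squeeze(0)
    for sample_x, sample_y in zip(batch_x, batch_y):
        pass  # train on one (sample_x, sample_y) pair here
```

Each file is read once, and the inner loop walks its 170 samples.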

@vishwakftw vishwakftw added the module: dataloader Related to torch.utils.data.DataLoader and Sampler label Mar 8, 2019

ssnl commented Aug 23, 2019

Fixed via #19228
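For context, the feature referenced here is the torch.utils.data.IterableDataset API (available since PyTorch 1.2), whose __iter__ may yield any number of samples per underlying file, much like the generator proposed in the pitch. A minimal sketch with hypothetical names (`PerFileIterable`, the fake file list):

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class PerFileIterable(IterableDataset):
    """Sketch: yield individual samples from each file, matching the
    generator-style __getitem__ proposed in the pitch above."""

    def __init__(self, files, samples_per_file=170):
        self.files = files
        self.samples_per_file = samples_per_file

    def __iter__(self):
        for path in self.files:
            # placeholder for real per-file loading
            data = torch.randn(self.samples_per_file, 360, 1)
            labels = torch.zeros(self.samples_per_file)
            for x, y in zip(data, labels):
                yield {"input": x, "target": y}

# the DataLoader re-batches the yielded samples across files
loader = DataLoader(PerFileIterable(["a", "b"]), batch_size=32)
```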

@ssnl ssnl closed this as completed Aug 23, 2019

t2ac32 commented Sep 10, 2019

> #14705 will support something like your proposed snippet.
>
> However, in your case, your __len__ should really be len(self.fmri_files) * 170, since you have 170 * num_files samples. But that would cause you to read the same file 170 times, which is not very efficient. So I would suggest just using an inner loop over the 170 samples fetched each time.

Hi @ssnl!

Could you elaborate a bit more on what has to be done to achieve this behavior?
I currently have N ids in my dataset and I want to return 15 samples per id,
so len(self) = len(dataset) * 15.

But I can't manage to make my __getitem__ return 15 samples.
This is currently my __getitem__ function:

def __getitem__(self, idx):

    img_id = self.ids[idx]

    images, target = get_imgs_and_masks(img_id, self.dir_img,
                                        self.dir_mask,
                                        self.expositions)

    for i in range(self.expositions):
        tensor_x = torch.Tensor(images[i])
        tensor_y = torch.Tensor(target)
        sample = {'input': tensor_x, 'target': tensor_y}

        if self.transform:
            sample = self.transform(sample)

        return sample  # note: returns on the first iteration, so only one sample comes back
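Because a map-style __getitem__ returns once per call, the loop above exits on its first iteration. One way to get all 15 samples, following the len(dataset) * 15 suggestion from the quoted comment (a sketch; `ExpositionDataset` and `load_id` are hypothetical stand-ins for the real get_imgs_and_masks pipeline):

```python
import torch
from torch.utils.data import Dataset

class ExpositionDataset(Dataset):
    """Sketch: one item per (id, exposition) pair."""

    def __init__(self, ids, expositions=15, transform=None):
        self.ids = ids
        self.expositions = expositions
        self.transform = transform

    def __len__(self):
        return len(self.ids) * self.expositions

    def __getitem__(self, idx):
        # decode the flat index into (which id, which exposition)
        id_idx, expo_idx = divmod(idx, self.expositions)
        images, target = self.load_id(self.ids[id_idx])
        sample = {'input': images[expo_idx], 'target': target}
        if self.transform:
            sample = self.transform(sample)
        return sample

    def load_id(self, img_id):
        # placeholder for the real get_imgs_and_masks call:
        # 15 expositions of a 3x32x32 image with one mask target
        return torch.randn(self.expositions, 3, 32, 32), torch.zeros(1)
```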
