Train the DL-based reconstruction with UFLoss #6

Open
Aristot1e opened this issue Nov 4, 2021 · 9 comments

@Aristot1e

  Traceback (most recent call last):
    File "../train_ufloss.py", line 803, in <module>
      main(args)
    File "../train_ufloss.py", line 562, in main
      model_re.load_state_dict(
    File "/home/img/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
      raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
  RuntimeError: Error(s) in loading state_dict for Model:
      size mismatch for memory_bank: copying a param with shape torch.Size([256, 1]) from checkpoint, the shape in current model is torch.Size([256, 2457600]).

This error occurs when running launch_training_MoDL_traditional_UFLoss_256_demo.sh. The shape of memory_bank in the checkpoint does not match the shape in the current model. Why is that? I have not been able to resolve it.

The other problem is in train_ufloss.py, lines 193/194:

    if args.loss_normalized == False:
        output = output * std + mean
        target = target * std + mean

Neither std nor mean is defined. What should I do?

@KeWang0622
Member

Please set args.loss_normalized = True and try again. I will fix this issue. Thanks!

@Aristot1e
Author

> Please set args.loss_normalized = True and try again, and I will solve this issue, Thanks!

Namespace(accelerations=[10, 15], batch_size=1, checkpoint=None, circular_pad=True, data_parallel=False, data_path='/home/img/Desktop/lff/Dataset/pre-processed/multicoil', device='cuda', device_num='0', drop_prob=0.0, efficient_ufloss=False, exp_dir='/home/img/Desktop/lff/Dataset/summary/train-3D_MELD_4steps_MoDLflag0_shared_CGsteps_6date_20210929_ufloss0_ufloss_weight_10_dimension_256_debug', fix_step_size=True, ge_mask=None, kernel_size=3, ### loss_normalized='True', loss_type=2, loss_uflossdir='/data/train_ufloss/train_UFLoss_feature_256_features_date_202104283_temperature_1_lr1e-5/checkpoints/ckpt200.pth', lr=0.0002, lr_gamma=0.5, lr_step_size=20, meld_cp=False, meld_flag=False, modl_flag=True, modl_lamda=0.05, num_cg_steps=6, num_emaps=1, num_epochs=2000, num_features=256, num_grad_steps=4, num_resblocks=2, patch_size=64, report_interval=10, resume=False, sample_rate=1.0, seed=42, share_weights=True, slwin_init=True, ufloss3d=False, ufloss_weight=10.0, uflossfreq=8, weight_decay=0.0)
Using parameters:
Temperature: 1.0
2
  Traceback (most recent call last):
    File "../train_ufloss.py", line 803, in <module>
      main(args)
    File "../train_ufloss.py", line 562, in main
      model_re.load_state_dict(
    File "/home/img/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
      raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
  RuntimeError: Error(s) in loading state_dict for Model:
      size mismatch for memory_bank: copying a param with shape torch.Size([256, 1]) from checkpoint, the shape in current model is torch.Size([256, 2457600]).

loss_normalized is now set to True, but that does not help.
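
A side note on the flag itself: the Namespace printed above shows loss_normalized='True' with quotes, which suggests the option is parsed as a string rather than a boolean. If that is the case, `args.loss_normalized == False` is false for every string value, including 'False'. The following is a minimal sketch of one way to parse such a flag, assuming the option is defined with argparse. The `str2bool` helper and the option definition are hypothetical and not taken from the repository.

```python
import argparse


def str2bool(value):
    """Convert common textual spellings of a boolean into a real bool."""
    if isinstance(value, bool):
        return value
    lowered = value.strip().lower()
    if lowered in {"true", "t", "yes", "y", "1"}:
        return True
    if lowered in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


# Hypothetical option definition; the real one in train_ufloss.py may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--loss_normalized", type=str2bool, default=True)

args = parser.parse_args(["--loss_normalized", "False"])
print(args.loss_normalized is False)  # True: the value is a real bool now
print("False" == False)               # False: a string never equals the bool
```

With a real boolean, the `if args.loss_normalized == False:` branch behaves as written, so std and mean would still need to be defined before that branch can run.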

@KeWang0622
Member

I see. How did you train the UFLoss? The error is not about normalization; it is about checkpoint loading. How many patches did you use to train the UFLoss feature mapping network?
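
To see which entries disagree before calling load_state_dict, the shapes in the checkpoint can be compared with those in the model. Below is a minimal sketch under stated assumptions: `find_shape_mismatches` is a hypothetical helper, it only requires that each value has a `.shape` attribute, and the demo uses stand-in objects carrying the two shapes from the error message instead of real tensors.

```python
from types import SimpleNamespace


def find_shape_mismatches(checkpoint_state, model_state):
    """Return (name, checkpoint_shape, model_shape) for entries that differ."""
    mismatches = []
    for name, saved in checkpoint_state.items():
        if name not in model_state:
            continue  # missing or unexpected keys are a separate problem
        saved_shape = tuple(saved.shape)
        current_shape = tuple(model_state[name].shape)
        if saved_shape != current_shape:
            mismatches.append((name, saved_shape, current_shape))
    return mismatches


# Stand-ins for tensors; only the shapes from the traceback are reproduced.
checkpoint_state = {"memory_bank": SimpleNamespace(shape=(256, 1))}
model_state = {"memory_bank": SimpleNamespace(shape=(256, 2457600))}

for name, saved, current in find_shape_mismatches(checkpoint_state, model_state):
    print(f"{name}: checkpoint {saved} vs model {current}")
# memory_bank: checkpoint (256, 1) vs model (256, 2457600)
```

With real objects, the two arguments would be the loaded checkpoint's state dict and `model.state_dict()`; how the checkpoint file nests its state dict is not shown in this thread, so that part is left out.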

@Aristot1e
Author

> I see, how did you train the UFLoss? the error is not about the normalization. It's about the checkpoint loading, how many patches did you use to train the UFLoss feature mapping network

I trained the UFLoss using launch_training_patch_learning.sh.
The total patch_extraction number should be 15568.

@Aristot1e
Author

> I see, how did you train the UFLoss? the error is not about the normalization. It's about the checkpoint loading, how many patches did you use to train the UFLoss feature mapping network

The total number of patches I used is 311360. The multicoil knee dataset I downloaded has 973 .h5 files, which become 15568 items after running data_preprocessing.py and then 311360 patches after running patch_extraction.py. But the error says the shape in the current model is torch.Size([256, 2457600]). I don't know why it is so large.
Another question: when training the UFLoss, the loss is still high (11.3+) after 200 epochs. How can I make it smaller?
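
The counts quoted above can be checked with a few lines of arithmetic. This sketch only uses the numbers reported in this thread; what the second dimension of memory_bank is supposed to equal is not stated here, so the final comparison is a check against the patch count, not an explanation of the 2457600 value.

```python
# Numbers reported in this thread.
files = 973                 # .h5 files in the multicoil knee dataset
slices = 15568              # items after data_preprocessing.py
patches = 311360            # items after patch_extraction.py
bank_columns = {"checkpoint": 1, "current model": 2457600}

print(slices // files, slices % files)      # 16 0  -> 16 items per file
print(patches // slices, patches % slices)  # 20 0  -> 20 patches per item

# Neither memory_bank width matches the number of extracted patches.
for source, columns in bank_columns.items():
    print(source, columns, columns == patches)
```

Neither width equals 311360, which is consistent with the mismatch coming from how the checkpoint and the current model were each configured rather than from the dataset size alone.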

@Aristot1e
Author

Aristot1e commented Nov 14, 2021

I have run into a new problem. After the message "Successfully loaded UFLoss model (Traditional)", the following error appeared.

  Traceback (most recent call last):
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 95, in apply
      output = self._apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 1330, in _apply
      return block.array_to_blocks(input, self.blk_shape,
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/block.py", line 103, in array_to_blocks
      raise ValueError('Only support ndim=1, 2, or 3, got {}'.format(ndim))
  ValueError: Only support ndim=1, 2, or 3, got 4
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 95, in apply
      output = self._apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 362, in _apply
      output = linop(output)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 122, in __call__
      return self.__mul__(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 131, in __mul__
      return self.apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 98, in apply
      raise RuntimeError('Exceptions from {}.'.format(self)) from e
  RuntimeError: Exceptions from <[1, 1, 73, 40, 1, 2, 60, 60]x[1, 2, 640, 372]> ArrayToBlocks Linop>.
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "../train_ufloss.py", line 785, in <module>
      main(args)
    File "../train_ufloss.py", line 568, in main
      train_loss, train_l2, train_ufloss, train_time = train_epoch(args, epoch, model, train_loader, optimizer, writer, model_ufloss)
    File "../train_ufloss.py", line 273, in train_epoch
      ) = compute_metrics(args, model, data, model_ufloss)
    File "../train_ufloss.py", line 223, in compute_metrics
      output_patch = Fa2b(output_roll)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/pytorch.py", line 118, in forward
      return to_pytorch(linop(from_pytorch(
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 122, in __call__
      return self.__mul__(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 131, in __mul__
      return self.apply(input)
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 98, in apply
      raise RuntimeError('Exceptions from {}.'.format(self)) from e
  RuntimeError: Exceptions from <[2920, 2, 60, 60]x[1, 2, 640, 372]> Reshape * ArrayToBlocks Linop>.

The error comes from compute_metrics in train_ufloss.py, lines 204 to 228. I don't understand this code. Could you explain it? Thank you very much.

                arraytoblock = sp.linop.ArrayToBlocks(
                    ishape=list(
                        (
                            output_roll.shape[0],
                            2,
                            output_roll.shape[2],
                            output_roll.shape[3],
                        )
                    ),
                    blk_shape=list((output_roll.shape[0], 2, 60, 60)),
                    blk_strides=list((1, 1, n_featuresq, n_featuresq)),
                )
    
                reshape = sp.linop.Reshape(
                    ishape=arraytoblock.oshape,
                    oshape=(arraytoblock.oshape[2] * arraytoblock.oshape[3], 2, 60, 60),
                )
    
                Fa2b = sp.to_pytorch_function(reshape * arraytoblock).apply
                output_patch = Fa2b(output_roll)
                target_patch = Fa2b(target_roll)
    
                output_features = model_ufloss(output_patch)
                target_features = model_ufloss(target_patch)
                ufloss = nn.MSELoss()(output_features[0], target_features[0])
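
The shapes printed in the traceback follow from ordinary sliding-window counting. Here is a small sketch that reproduces them, assuming n_featuresq is 8; that value is inferred from uflossfreq=8 in the Namespace and from the 73 x 40 block grid in the error message, not read from the code. `num_blocks` is a hypothetical helper.

```python
def num_blocks(size, block, stride):
    """Number of full blocks of length `block` taken every `stride` samples."""
    return (size - block) // stride + 1


ishape = (1, 2, 640, 372)      # input shape from the error message
blk_shape = (1, 2, 60, 60)     # blk_shape passed to ArrayToBlocks
blk_strides = (1, 1, 8, 8)     # assumes n_featuresq == 8

counts = [num_blocks(i, b, s) for i, b, s in zip(ishape, blk_shape, blk_strides)]
print(counts)                  # [1, 1, 73, 40]
print(counts[2] * counts[3])   # 2920
```

So ArrayToBlocks is expected to produce [1, 1, 73, 40, 1, 2, 60, 60], and the Reshape step flattens the 73 x 40 grid into 2920 patches of shape (2, 60, 60), matching the linop description in the last RuntimeError.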

@Aristot1e
Author

    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/linop.py", line 1330, in _apply
      return block.array_to_blocks(input, self.blk_shape,
    File "/home/img/anaconda3/lib/python3.8/site-packages/sigpy/block.py", line 103, in array_to_blocks
      raise ValueError('Only support ndim=1, 2, or 3, got {}'.format(ndim))
  ValueError: Only support ndim=1, 2, or 3, got 4

In sigpy.block.array_to_blocks, ndim must be at most 3. The source code says: (blk_shape (tuple): block shape of length ndim, with ndim={1, 2, 3}.) But the blk_shape passed here has 4 dimensions, which leads to this error. Which dimension should be dropped, or is there another fix? I have tried my best to resolve it, but nothing has worked. Could you give me some advice?
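
One possible way around the ndim limit, independent of the sigpy version, is to extract the patches with PyTorch's own `Tensor.unfold`. This is a sketch of a swapped-in technique, not the repository's method. It assumes a 60 x 60 patch and a stride of 8 (taken from the error message), and `extract_patches` is a hypothetical name. The patch ordering (block row, then block column, with both channels kept together) is intended to match the Reshape * ArrayToBlocks output for a batch of one, but that equivalence has not been checked against sigpy.

```python
import torch


def extract_patches(x, patch=60, stride=8):
    """Split a (B, C, H, W) tensor into (B * nh * nw, C, patch, patch) patches."""
    # Unfold height, then width: result is (B, C, nh, nw, patch, patch).
    blocks = x.unfold(2, patch, stride).unfold(3, patch, stride)
    b, c, nh, nw = blocks.shape[:4]
    # Bring the block grid next to the batch axis, then flatten it.
    return blocks.permute(0, 2, 3, 1, 4, 5).reshape(b * nh * nw, c, patch, patch)


# Small stand-in input; the tensor in the traceback is (1, 2, 640, 372).
x = torch.arange(1 * 2 * 76 * 68, dtype=torch.float32).reshape(1, 2, 76, 68)
patches = extract_patches(x)
print(patches.shape)  # torch.Size([6, 2, 60, 60])
```

Because `unfold`, `permute`, and `reshape` are regular autograd operations, gradients flow back to the input without a custom function wrapper.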

@KeWang0622
Member

Hi, I believe it is a sigpy version mismatch!
Maybe we can schedule a quick chat to address these issues? I will update the repo accordingly.
Apologies for the bugs; the project is at an early development stage. Thanks for your feedback.
What would be the best way to contact you?
Ke

@Aristot1e
Author

Thank you for replying. We can get in touch by email or on GitHub. My email is [email protected]. Feel free to email me anytime.
