The following are my notes while debugging the issue on 4 & 5 June 2018

Some formatting is lost in the conversion from Apple Notes to Markdown.

WNixalo


cifar10 codealong tensor issue

1. Training FastAI learner with PyTorch dataloaders:

TypeError: eq received an invalid combination of arguments - got (torch.LongTensor), but expected one of:
 * (int value)
      didn't match because some of the arguments have invalid types: (torch.LongTensor)
 * (torch.cuda.LongTensor other)
      didn't match because some of the arguments have invalid types: (torch.LongTensor)

targs in fastai.metrics.accuracy is torch.LongTensor

  • —> should be torch.cuda.LongTensor
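For reference, a minimal repro of the mismatch (my own illustration, PyTorch 0.3-era semantics, not fastai code):

```python
import torch

# eq on a CUDA tensor only accepts an int or another CUDA tensor of the
# same type, so comparing it against a CPU LongTensor raises the TypeError above
preds = torch.cuda.LongTensor([0, 1, 2])  # like the argmax'd predictions
targs = torch.LongTensor([0, 1, 2])       # like y from a plain PyTorch dataloader
preds == targs                            # TypeError: eq received an invalid combination ...
```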

what is calling metrics.accuracy? where is targs created?

  • —> fastai.model.validate calls metrics as f on preds.data & y.

that y is given to model.validate. where does y come from?

  • —> No. model.validate is given a dataloader dl, which it uses to get y.

what’s the dataloader’s signature?

  • —> <torch.utils.data.dataloader.DataLoader object at 0x7f5c9365b048>
  • —> >>> vars(dl): {'dataset': <torchvision.datasets.folder.ImageFolder object at 0x7f5c949ce978>, 'batch_size': 512, 'num_workers': 4, 'collate_fn': <function default_collate at 0x7f5c96193268>, 'pin_memory': True, 'drop_last': False, 'timeout': 0, 'worker_init_fn': None, 'sampler': <torch.utils.data.sampler.SequentialSampler object at 0x7f5c9365b080>, 'batch_sampler': <torch.utils.data.sampler.BatchSampler object at 0x7f5c9365b0b8>}

let’s find where that dataloader was passed in from:

  • —> fastai.model.fit called validate.
  • —> dl is passed in as cur_data.val_dl

my val_loader is not on the gpu… should that be done manually? is any of my data on the gpu?

  • —> why is training working? what datatype is returned from my dataloaders?
    • —> next(iter(learner.data.val_dl))[0].type() returns 'torch.FloatTensor'. As does …trn_dl….

how do dataloaders work w/ the gpu? how does fastai do it? —> if I set up the usual fastai way, do I get cuda tensors?

- >>> tfms = tfms_from_stats(stats, sz=32)
- >>> md = ImageClassifierData.from_csv(PATH, 'train', PATH/'tmp.csv', tfms=tfms)
- >>> next(iter(md.trn_dl))[0].type() returns 'torch.cuda.FloatTensor'
  • —> my dataloaders aren’t working w/ the gpu.

what is the fastai mechanism responsible for putting dataloaders on the gpu?

  • —> the fastai.dataloader.DataLoader class automatically sends data to the gpu IF available, ELSE the cpu: batches go through .get_tensor(•), which calls fastai.core.to_gpu(•)
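Roughly the mechanism (a sketch paraphrased from fastai 0.7's core.py, not the exact source):

```python
import torch

def to_gpu(x, *args, **kwargs):
    # move a tensor to CUDA only if a GPU is actually available,
    # otherwise hand it back untouched
    return x.cuda(*args, **kwargs) if torch.cuda.is_available() else x
```

fastai's own DataLoader routes every collated batch through this; a plain torch.utils.data.DataLoader never does.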

<<< PAUSE >>>


1A. Using FastAI dataloaders:

TypeError: batch must contain numbers, dicts or lists; found <class 'torch.FloatTensor'>

why am I unable to pull data from the dataloader? how was it incorrectly initialized?

  • —> FastAI dls want me to play with val_idxs; I don't want to.
  • —> (pytorch non-cuda tensors) apparently this is a pytorch bug. Fixed by now. But I’m not using v0.4.
  • —> so can I just make a wrapper to call to_gpu() on whatever the pytorch dl returns? (sketch below)
    • —> how the fuck do I do that? <<< END >>>
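A minimal sketch of that wrapper idea (GPUDataLoader is my own hypothetical name; to_gpu is fastai 0.7's fastai.core.to_gpu):

```python
from fastai.core import to_gpu

class GPUDataLoader:
    """Wrap a PyTorch DataLoader so every batch it yields lands on the GPU."""
    def __init__(self, dl): self.dl = dl
    def __len__(self):      return len(self.dl)
    def __iter__(self):
        for batch in self.dl:
            # batch is (x, y); to_gpu is a no-op on a machine without CUDA
            yield [to_gpu(o) for o in batch]
    def __getattr__(self, name):
        # delegate everything else (dataset, batch_size, ...) to the wrapped dl
        return getattr(self.dl, name)
```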

<<< RESUME >>>

1B. Training FastAI learner with PyTorch dataloaders:

—> The Error:

  • via fastai ModelData&Loaders: preds is a torch.autograd.variable.Variable, but preds.data is a torch.cuda.FloatTensor. y is a torch.cuda.LongTensor.

Calling torch.max(X, dim=1)[1] doesn't change the type for X=preds (still a Variable), but returns a torch.cuda.LongTensor for X=preds.data. With f(X) = torch.max(X, dim=1)[1]: comparing f(preds.data) & y is valid; comparing f(preds) & y is not.
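Spelled out (my illustration; needs a GPU, PyTorch 0.3 semantics):

```python
import torch
from torch.autograd import Variable

preds = Variable(torch.cuda.FloatTensor(4, 10))  # like the stepper's output
f = lambda X: torch.max(X, dim=1)[1]             # argmax over the class dim

type(f(preds))       # torch.autograd.variable.Variable (max of a Variable stays wrapped)
type(f(preds.data))  # torch.cuda.LongTensor (comparable against a cuda y)
```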

  • x: torch.cuda.FloatTensor
  • y: torch.cuda.LongTensor
    • preds,loss = stepper(VV(x), VV(y))
  • preds: torch.autograd.variable.Variable
  • preds.data: torch.cuda.FloatTensor
  • type(torch.max(preds.data, dim=1)[1]): torch.cuda.LongTensor
  • Now to see what’s different w/ pytorch dataloaders:

x is a list of torch.FloatTensor. preds is a torch.autograd.variable.Variable. preds.data is a torch.cuda.FloatTensor. With f(X) = torch.max(X, dim=1)[1]: f(preds) is a torch.autograd.variable.Variable; f(preds.data) yields a torch.cuda.LongTensor. y is a torch.LongTensor. No cuda. Interesting.

  • (x,y) are yielded by dl, where dl is cur_data.val_dl
  • x: torch.FloatTensor
  • y: torch.LongTensor
    • preds,loss = stepper(VV(x), VV(y))
  • preds: torch.autograd.variable.Variable
  • preds.data: torch.cuda.FloatTensor
  • type(torch.max(preds.data, dim=1)[1]): torch.cuda.LongTensor

  • —> this all points to the root of the problem being that y is yielded as a torch.LongTensor instead of a torch.cuda.LongTensor by the pytorch dataloader.

    just found out everything ‘works fine’ if the batch size is 8. Currently finding the cut-off point. —> starts breaking at bs=13.


<< continuing >>

x is automatically placed on the GPU via VV(x). VV(•) calls map_over(•, VV_), which calls VV_ on every element of •. VV_(•) calls create_variable(•, True), which calls Variable(T(•)), and T(•) automatically puts its argument on the gpu via to_gpu(•).
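The chain, sketched (a simplified paraphrase of fastai 0.7's core.py, 0.3-era PyTorch):

```python
import torch
from torch.autograd import Variable

def to_gpu(x):
    return x.cuda() if torch.cuda.is_available() else x

def T(x):
    # (simplified) make sure x is a torch tensor, then move it to the gpu
    return to_gpu(x if torch.is_tensor(x) else torch.from_numpy(x))

def create_variable(x, volatile):
    return x if isinstance(x, Variable) else Variable(T(x), volatile=volatile)

def VV_(x): return create_variable(x, volatile=True)

def map_over(x, f):
    # apply f to every element if x is listy, else to x itself
    return [f(o) for o in x] if isinstance(x, (list, tuple)) else f(x)

def VV(x): return map_over(x, VV_)
```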

using the version of fastai Radek used for his cifar10 baseline works…

—>> The issue was different fastai versions. Somewhere there is an added/missing to_gpu() call on y.
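The shape of the fix, as I understand it (a hypothetical paraphrase of the validate step these notes trace, not the actual fastai diff):

```python
from fastai.core import VV, to_gpu

def validate_batch(stepper, metrics, x, y):
    # stepper(VV(x), VV(y)) puts x & y on the gpu for the forward pass,
    # but the metrics are then called on preds.data and the *raw* y,
    # so y from a plain PyTorch dataloader must be sent to the gpu first
    preds, loss = stepper(VV(x), VV(y))
    y = to_gpu(y)  # <- the added/missing call
    return [f(preds.data, y) for f in metrics], loss
```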

