The following are my notes while debugging the issue on 4 & 5 June 2018. Some formatting was lost in the conversion from Apple Notes to Markdown.
WNixalo

    TypeError: eq received an invalid combination of arguments - got (torch.LongTensor), but expected one of:
     * (int value)
          didn't match because some of the arguments have invalid types: (torch.LongTensor)
     * (torch.cuda.LongTensor other)
          didn't match because some of the arguments have invalid types: (torch.LongTensor)
`targs` in `fastai.metrics.accuracy` is a `torch.LongTensor`. What is calling `metrics.accuracy`? Where is `targs` created?
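For reference, the accuracy metric looks roughly like this (paraphrased from memory of the fastai 0.7-era `fastai.metrics.accuracy`, so details may differ); the `preds == targs` comparison is the `eq` call that raises the TypeError above when one side is a CUDA tensor and the other is not:

```python
import torch

def accuracy(preds, targs):
    # rough paraphrase of the fastai 0.7 accuracy metric, not the verbatim source
    preds = torch.max(preds, dim=1)[1]      # argmax over classes -> predicted labels
    # elementwise `eq`: fails when preds is a torch.cuda.LongTensor
    # and targs is a plain torch.LongTensor (or vice versa)
    return (preds == targs).float().mean()
```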
`fastai.model.validate` calls metrics as `f` on `preds.data` & `y`. That `y` is given to `model.validate`. Where does `y` come from?
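Piecing together the bits noted here (the `stepper(VV(x), VV(y))` call shows up further down), the shape of that call site is roughly the following. This is a simplified sketch, not the verbatim library code, and it assumes fastai's `VV` and a fastai stepper object are in scope. The point is that the metric `f` gets `preds.data` on whatever device the model produced it, while `y` is passed along exactly as the dataloader yielded it:

```python
# simplified sketch of the fastai.model.validate code path, not verbatim;
# `stepper` and `VV` are fastai objects defined elsewhere
def validate(stepper, dl, metrics):
    res = []
    for (*x, y) in iter(dl):
        preds, loss = stepper(VV(x), VV(y))   # preds.data ends up on the GPU
        # y is handed to each metric untouched: if the dataloader yields a CPU
        # LongTensor, accuracy ends up comparing CPU vs CUDA tensors
        res.append([f(preds.data, y) for f in metrics])
    return res
```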
`model.validate` is given a dataloader `dl`, which it uses to get `y`. What's the dataloader's signature?

    <torch.utils.data.dataloader.DataLoader object at 0x7f5c9365b048>
    >>> vars(dl): {'dataset': <torchvision.datasets.folder.ImageFolder object at 0x7f5c949ce978>, 'batch_size': 512, 'num_workers': 4, 'collate_fn': <function default_collate at 0x7f5c96193268>, 'pin_memory': True, 'drop_last': False, 'timeout': 0, 'worker_init_fn': None, 'sampler': <torch.utils.data.sampler.SequentialSampler object at 0x7f5c9365b080>, 'batch_sampler': <torch.utils.data.sampler.BatchSampler object at 0x7f5c9365b0b8>}
Let's find where that dataloader was passed in from: `fastai.model.fit` called `validate`. `dl` is passed in as `cur_data.val_dl`. My `val_loader` is not on the GPU... should that be done manually? Is any of my data on the GPU?
`next(iter(learner.data.val_dl))[0].type()` returns `'torch.FloatTensor'`. As does `…trn_dl`. How do dataloaders work w/ the GPU? How does fastai do it? —> if I set up the usual fastai way, do I get cuda tensors?
- `>>> tfms = tfms_from_stats(stats, sz=32)`
- `>>> md = ImageClassifierData.from_csv(PATH, 'train', PATH/'tmp.csv', tfms=tfms)`
- `>>> next(iter(md.trn_dl))[0].type()` returns `'torch.cuda.FloatTensor'`
What is the fastai mechanism responsible for putting dataloaders on the GPU? The `fastai.dataloader.DataLoader` class automatically sends data to the GPU IF available, ELSE the CPU, via `.get_tensor(•)`, which goes through `fastai.core.to_gpu(•)`.
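A minimal sketch of that mechanism, paraphrased from how I understand the fastai 0.7-era `dataloader.py`/`core.py` (not the verbatim source): `to_gpu` only moves a tensor when CUDA is actually available, and `get_tensor` is what converts each collated batch and pushes it through `to_gpu`:

```python
import torch

USE_GPU = True

def to_gpu(x, *args, **kwargs):
    # move to the GPU only if CUDA is available (and not explicitly disabled)
    return x.cuda(*args, **kwargs) if USE_GPU and torch.cuda.is_available() else x

def get_tensor(batch, pin_memory=False):
    # sketch: convert a collated (numpy) batch to a torch tensor, optionally
    # pin it, then hand it to to_gpu; lists/tuples are handled recursively.
    # The real function dispatches on more types and raises a TypeError for
    # anything it doesn't recognise (compare the "batch must contain numbers,
    # dicts or lists" error further down).
    if isinstance(batch, (list, tuple)):
        return [get_tensor(b, pin_memory) for b in batch]
    t = torch.from_numpy(batch)
    if pin_memory:
        t = t.pin_memory()
    return to_gpu(t)
```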
<<< PAUSE >>>

    TypeError: batch must contain numbers, dicts or lists; found <class 'torch.FloatTensor'>
Why am I unable to pull data from the dataloader? How was it incorrectly initialized? `val_idxs`; I don't want to `.to_gpu()` on whatever the pytorch dl returns?

<<< RESUME >>>
—> The Error: `preds` is a `torch.autograd.variable.Variable`, but `preds.data` is a `torch.cuda.FloatTensor`. `y` is a `torch.cuda.LongTensor`. Calling `torch.max(X, dim=1)[1]` doesn't change type for `X=preds`, but returns a `torch.cuda.LongTensor` for `X=preds.data`. Comparing `f(preds.data)` & `y` is valid; `f(preds)` & `y` is not.

    x: torch.cuda.FloatTensor
    y: torch.cuda.LongTensor
    preds,loss = stepper(VV(x), VV(y))
    preds: torch.autograd.variable.Variable
    preds.data: torch.cuda.FloatTensor
    type(torch.max(preds.data, dim=1)[1]): torch.cuda.LongTensor
`x` is a list of `torch.FloatTensor`. `preds` is `torch.autograd.variable.Variable`. `preds.data` is `torch.cuda.FloatTensor`. Where `f` is `torch.max(..)`: `f(preds)` is a `torch.autograd.variable.Variable`; `f(preds.data, ..)` yields a `torch.cuda.LongTensor`. `y` is a `torch.LongTensor`. No cuda. Interesting.
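A minimal repro of that comparison failure (my own illustration, assuming a CUDA-capable machine and the PyTorch 0.3-era behaviour these notes were written against; newer PyTorch raises a different device-mismatch error for the same mistake):

```python
import torch

preds = torch.LongTensor([0, 1, 2]).cuda()   # like torch.max(preds.data, dim=1)[1]
targs = torch.LongTensor([0, 1, 1])          # like the y the dataloader yields

# Mixing devices is what raises the TypeError at the top of these notes:
#   eq received an invalid combination of arguments - got (torch.LongTensor) ...
# preds == targs

# With both tensors on the same device the comparison is fine:
print((preds == targs.cuda()).float().mean())
```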
`(x, y)` are yielded by `dl`, where `dl` is `cur_data.val_dl`:

    x: torch.FloatTensor
    y: torch.LongTensor
    preds,loss = stepper(VV(x), VV(y))
    preds: torch.autograd.variable.Variable
    preds.data: torch.cuda.FloatTensor
    type(torch.max(preds.data, dim=1)[1]): torch.cuda.LongTensor
—> This all points to the root of the problem being that `y` is yielded as a `torch.LongTensor` instead of a `torch.cuda.LongTensor` by the pytorch dataloader.

Just found out everything 'works fine' if the batch size is 8. Currently finding the cut-off point. —> starts breaking at `bs=13`.
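One possible stop-gap for this symptom would be a metric that moves the targets onto the predictions' device before comparing. This is just my own sketch of a workaround, not the resolution these notes end with (which was switching fastai versions):

```python
import torch

def accuracy_device_safe(preds, targs):
    # hypothetical workaround: same argmax as the accuracy metric, but move
    # targs to the GPU when the predictions live there, so `eq` sees matching types
    preds = torch.max(preds, dim=1)[1]
    if preds.is_cuda and not targs.is_cuda:
        targs = targs.cuda()
    return (preds == targs).float().mean()
```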
<< continuing >>
`x` is automatically placed on the GPU via `VV(x)`. `VV(•)` calls `map_over(•, VV_)`, which calls `VV_` on every element of `•`. `VV_(•)` calls `create_variable(•, True)`, which calls `Variable(T(•))`, and `T(•)` automatically puts `•` on the GPU via `to_gpu(•)`.
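A condensed sketch of that call chain, reconstructed from the description above (simplified; argument handling in the real fastai 0.7 `core.py` is more involved, and the `Variable`/`volatile` API is the pre-PyTorch-0.4 one):

```python
import torch
from torch.autograd import Variable

def to_gpu(x):
    # as before: move to the GPU only when CUDA is available
    return x.cuda() if torch.cuda.is_available() else x

def T(a):
    # simplified: turn the input into a torch tensor and put it on the GPU
    if not torch.is_tensor(a):
        a = torch.from_numpy(a)
    return to_gpu(a)

def create_variable(x, volatile):
    # wrap in an autograd Variable (pre-0.4 API); T has already moved the tensor
    return x if isinstance(x, Variable) else Variable(T(x), volatile=volatile)

def VV_(x):
    return create_variable(x, volatile=True)

def map_over(x, f):
    # apply f to every element of a list/tuple, or to x itself
    return [f(o) for o in x] if isinstance(x, (list, tuple)) else f(x)

def VV(x):
    return map_over(x, VV_)
```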
Using the version of fastai Radek used for his cifar10 baseline works…

—>> The issue was different fastai versions. Somewhere there is an added/missing `to_gpu()` call on `y`.