InfluxDB Logger Example

This notebook is a small demo of how to use gpumon in Jupyter notebooks, along with some convenience methods for working with GPUs. To run it you will need PyTorch and Torchvision installed, as well as the Python InfluxDB client.

To install PyTorch and its associated requirements, run the following:

conda install pytorch torchvision cuda80 -c pytorch

To install the Python InfluxDB client, run:

pip install influxdb

See the influxdb-python documentation for more details on the InfluxDB client.
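Before running the cells below, it can help to confirm that the required packages are importable from the notebook's kernel. A minimal sketch using only the standard library; the module names are the ones this notebook imports, and the helper name `missing_modules` is hypothetical:

```python
import importlib.util

# Top-level modules imported by this notebook.
REQUIRED_MODULES = ("torch", "torchvision", "influxdb", "gpumon")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates the modules without importing them, so the check is cheap even for large packages such as `torch`.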


In [37]:
from gpumon import device_count, device_name

In [38]:
device_count() # Returns the number of GPUs available


Out[38]:
4

In [39]:
device_name() # Returns the type of GPU available


Out[39]:
'Tesla P40'
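`device_count()` and `device_name()` hide how the GPU information is obtained. For comparison, here is a sketch that gets the same two facts from the `nvidia-smi` command-line tool. This is an assumed alternative, not necessarily how gpumon does it, and it requires `nvidia-smi` on the PATH; `parse_gpu_query` and `query_gpus` are hypothetical helpers:

```python
import subprocess


def parse_gpu_query(csv_text):
    """Parse `index, name` lines (nvidia-smi CSV without header) into (index, name) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name = (field.strip() for field in line.split(",", 1))
        gpus.append((int(index), name))
    return gpus


def query_gpus():
    """Ask nvidia-smi for the index and name of every visible GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,name", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_query(out)


# On a machine like the one above, len(query_gpus()) would play the role of
# device_count() and query_gpus()[0][1] the role of device_name().
```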

Let's create a simple CNN and train it on the CIFAR-10 dataset to see the load on our GPUs.


In [1]:
import torch
import torchvision
import torchvision.transforms as transforms

In [2]:
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=4)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')


Files already downloaded and verified
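Two pieces of arithmetic sit behind the cell above: what `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` does to pixel values, and how many minibatches one pass over the training set contains (the training cell later divides `running_loss` by `i+1`, i.e. by that count). A plain-Python sketch; the `normalize` helper is introduced here for illustration:

```python
import math


def normalize(x, mean=0.5, std=0.5):
    """What transforms.Normalize does to each channel value: (x - mean) / std."""
    return (x - mean) / std


# ToTensor() scales pixels to [0, 1]; with mean = std = 0.5 they map to [-1, 1].
print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0

# CIFAR-10's training split has 50,000 images. With batch_size=64 and the
# DataLoader default drop_last=False, the final batch is partial.
TRAIN_IMAGES, BATCH_SIZE = 50_000, 64
batches_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
last_batch = TRAIN_IMAGES - (batches_per_epoch - 1) * BATCH_SIZE
print(batches_per_epoch, last_batch)  # 782 16
```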

In [3]:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


net = Net()

In [4]:
net.cuda()


Out[4]:
Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=(2, 2), stride=(2, 2), dilation=(1, 1), ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
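The `16 * 5 * 5` used in `fc1` and in `x.view(...)` comes from how the 32×32 CIFAR-10 images shrink through the layers. The standard output-size formula for unpadded convolution and pooling (dilation 1), floor((W + 2P - K) / S) + 1, reproduces the `in_features=400` printed above; `conv_out` is a helper name introduced here:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of Conv2d/MaxPool2d with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                      # CIFAR-10 images are 32x32
size = conv_out(size, 5)       # conv1: 5x5 kernel       -> 28
size = conv_out(size, 2, 2)    # pool:  2x2, stride 2    -> 14
size = conv_out(size, 5)       # conv2: 5x5 kernel       -> 10
size = conv_out(size, 2, 2)    # pool:  2x2, stride 2    -> 5
flattened = 16 * size * size   # conv2 has 16 output channels
print(size, flattened)         # 5 400
```

If you change the input resolution or a kernel size, rerunning this arithmetic tells you the new `in_features` for `fc1`.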

In [5]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
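`optim.SGD` with `momentum=0.9` keeps a velocity buffer per parameter. Following the update rule in the PyTorch documentation (default `dampening=0`, no Nesterov, no weight decay), v ← μ·v + g and p ← p − lr·v, with the buffer initialised to the first gradient. A scalar sketch on f(x) = x², using this notebook's `lr=0.001` and `momentum=0.9`; `sgd_momentum_step` is a hypothetical helper, not the PyTorch API:

```python
def sgd_momentum_step(param, grad, buf, lr=0.001, momentum=0.9):
    """One SGD-with-momentum step; buf is None before the first step."""
    buf = grad if buf is None else momentum * buf + grad
    return param - lr * buf, buf


# Minimise f(x) = x**2 (gradient 2x) starting from x = 1.0.
x, buf = 1.0, None
for step in range(3):
    x, buf = sgd_momentum_step(x, 2 * x, buf)
    print(step + 1, round(x, 6))
```

The steps grow as the buffer accumulates past gradients, which is why momentum speeds up progress along consistent descent directions.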

In [6]:
from gpumon.influxdb import log_context
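`log_context` comes from gpumon and its internals are not shown in this notebook. To illustrate the general pattern such a helper could follow (an assumption, not gpumon's actual code), here is a context manager that samples metrics in a background thread while the `with` block runs and hands InfluxDB-style points to a sink. `sampling_context`, `read_metrics` and `sink` are hypothetical names; the point dictionaries use the `measurement`/`tags`/`time`/`fields` shape that `InfluxDBClient.write_points` accepts, so with a real client you could pass `client.write_points` as the sink:

```python
import threading
import time
from contextlib import contextmanager
from datetime import datetime, timezone


@contextmanager
def sampling_context(read_metrics, sink, measurement, interval=1.0):
    """Hypothetical log_context-style helper.

    While the with-block runs, a daemon thread calls read_metrics() every
    `interval` seconds. read_metrics must return (gpu_id, fields) pairs;
    each pair becomes one point and the batch is passed to sink().
    """
    stop = threading.Event()

    def worker():
        while not stop.is_set():
            now = datetime.now(timezone.utc).isoformat()
            points = [
                {"measurement": measurement, "tags": {"GPU": str(gpu)},
                 "time": now, "fields": fields}
                for gpu, fields in read_metrics()
            ]
            sink(points)
            stop.wait(interval)  # sleeps, but wakes early when stop is set

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()


# Usage with a fake reader and an in-memory sink instead of a database.
collected = []
fake_reader = lambda: [(0, {"Utilization": 90}), (1, {"Utilization": 73})]
with sampling_context(fake_reader, collected.extend, "gpuseries", interval=0.05):
    time.sleep(0.2)  # stands in for the training loop
```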

In [7]:
display_every_minibatches=100
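`display_every_minibatches` is defined here, but the training cell below reports the loss only once per epoch. One way a per-N-minibatch report could be computed is sketched below on a plain list of losses, so it runs without a GPU; `windowed_means` is a hypothetical helper. Inside the training loop the equivalent test would be `(i + 1) % display_every_minibatches == 0`.

```python
def windowed_means(losses, every):
    """Yield (minibatch_number, mean loss) for each complete window of `every` losses."""
    total = 0.0
    for i, loss in enumerate(losses, 1):
        total += loss
        if i % every == 0:
            yield i, total / every
            total = 0.0


# Fake, steadily decreasing losses for 250 minibatches, reported every 100.
fake_losses = [2.3 - 0.001 * k for k in range(250)]
for batch, mean_loss in windowed_means(fake_losses, every=100):
    print('[%5d] loss: %.3f' % (batch, mean_loss))
```

An incomplete final window (here the last 50 minibatches) is not reported, matching the usual "print every N" training-loop idiom.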

Make sure you specify the correct host and credentials in the context below.


In [8]:
with log_context('localhost', 'admin', 'password', 'gpudb', 'gpuseries'):
    for epoch in range(20):  # loop over the dataset multiple times

        running_loss = 0.0
        for i, data in enumerate(trainloader, 0):
            # get the inputs
            inputs, labels = data

            # move them to the GPU (wrapping in Variable is a no-op since PyTorch 0.4)
            inputs, labels = inputs.cuda(), labels.cuda()

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # accumulate statistics; loss.item() replaces loss.data[0],
            # which raises an IndexError on 0-dim tensors in newer PyTorch
            running_loss += loss.item()
        print('[%d] loss: %.3f' %
              (epoch + 1, running_loss / (i+1)))

    print('Finished Training')


[1] loss: 2.301
[2] loss: 2.241
[3] loss: 1.954
[4] loss: 1.778
[5] loss: 1.670
[6] loss: 1.593
[7] loss: 1.532
[8] loss: 1.475
[9] loss: 1.422
[10] loss: 1.370
[11] loss: 1.330
[12] loss: 1.289
[13] loss: 1.254
[14] loss: 1.221
[15] loss: 1.194
[16] loss: 1.161
[17] loss: 1.137
[18] loss: 1.111
[19] loss: 1.086
[20] loss: 1.069
Finished Training
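The first-epoch loss of 2.301 is close to ln(10) ≈ 2.303, which is the value `nn.CrossEntropyLoss` gives when the network assigns equal probability to all 10 CIFAR-10 classes; the loss then falls as predictions improve. A standard-library sketch of the per-sample computation, -log(softmax(logits)[target]) evaluated stably via log-sum-exp (`cross_entropy` is a helper introduced here, not the PyTorch API):

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample: logsumexp(logits) - logits[target]."""
    m = max(logits)  # subtract the max before exp() for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Ten identical logits: every class gets probability 1/10.
print(cross_entropy([0.0] * 10, target=3))  # ~2.302585, i.e. ln(10)
```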

If your Grafana dashboard was running, you should have seen the measurements there. You can also pull the data from the database using the InfluxDB Python client.


In [22]:
from influxdb import InfluxDBClient, DataFrameClient

In [14]:
client = InfluxDBClient(host='localhost', username='admin', password='password', database='gpudb')

In [15]:
client.get_list_measurements()


Out[15]:
[{'name': 'gpuseries'}]

In [30]:
data = client.query('select * from gpuseries limit 10;')
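Several field names in this measurement contain spaces (`Memory Used`, `Memory Used Percent`), so selecting them individually in InfluxQL requires double-quoted identifiers. A small sketch of a query builder that handles the quoting; `quote_identifier` and `select_query` are hypothetical helpers:

```python
def quote_identifier(name):
    """Double-quote an InfluxQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'


def select_query(measurement, fields=None, limit=None):
    """Build a SELECT statement; quoting lets field names contain spaces."""
    columns = ", ".join(quote_identifier(f) for f in fields) if fields else "*"
    query = "SELECT {} FROM {}".format(columns, quote_identifier(measurement))
    if limit is not None:
        query += " LIMIT {:d}".format(limit)
    return query


print(select_query("gpuseries", ["Memory Used", "Utilization"], limit=10))
# SELECT "Memory Used", "Utilization" FROM "gpuseries" LIMIT 10
```

The resulting string can be passed to `client.query(...)` just like the hand-written query above.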

In [31]:
type(data)


Out[31]:
influxdb.resultset.ResultSet

In [32]:
data


Out[32]:
ResultSet({'('gpuseries', None)': [{'time': '2018-04-02T15:00:00.452204032Z', 'GPU': '0', 'Memory Used': 8685355008, 'Memory Used Percent': 36.14988257180032, 'Memory Utilization': 70, 'Power': 94728, 'Temperature': 39, 'Utilization': 90, 'timestamp': '2018-04-02 15:00:00.452204'}, {'time': '2018-04-02T15:00:00.499869184Z', 'GPU': '1', 'Memory Used': 6447693824, 'Memory Used Percent': 26.83636700881325, 'Memory Utilization': 56, 'Power': 57664, 'Temperature': 42, 'Utilization': 73, 'timestamp': '2018-04-02 15:00:00.499869'}, {'time': '2018-04-02T15:00:00.54717312Z', 'GPU': '2', 'Memory Used': 8507097088, 'Memory Used Percent': 35.40794365628589, 'Memory Utilization': 55, 'Power': 57827, 'Temperature': 38, 'Utilization': 68, 'timestamp': '2018-04-02 15:00:00.547173'}, {'time': '2018-04-02T15:00:00.595969024Z', 'GPU': '3', 'Memory Used': 2947547136, 'Memory Used Percent': 12.268178185359254, 'Memory Utilization': 38, 'Power': 62068, 'Temperature': 36, 'Utilization': 73, 'timestamp': '2018-04-02 15:00:00.595969'}, {'time': '2018-04-02T15:00:02.519879168Z', 'GPU': '0', 'Memory Used': 9931063296, 'Memory Used Percent': 41.334726287277654, 'Memory Utilization': 0, 'Power': 57226, 'Temperature': 36, 'Utilization': 0, 'timestamp': '2018-04-02 15:00:02.519879'}, {'time': '2018-04-02T15:00:02.565956096Z', 'GPU': '1', 'Memory Used': 7693402112, 'Memory Used Percent': 32.02121072429059, 'Memory Utilization': 0, 'Power': 56027, 'Temperature': 40, 'Utilization': 0, 'timestamp': '2018-04-02 15:00:02.565956'}, {'time': '2018-04-02T15:00:02.615364096Z', 'GPU': '2', 'Memory Used': 9752805376, 'Memory Used Percent': 40.59278737176322, 'Memory Utilization': 0, 'Power': 56572, 'Temperature': 36, 'Utilization': 0, 'timestamp': '2018-04-02 15:00:02.615364'}, {'time': '2018-04-02T15:00:02.66162304Z', 'GPU': '3', 'Memory Used': 3641704448, 'Memory Used Percent': 15.157375609303696, 'Memory Utilization': 1, 'Power': 185043, 'Temperature': 35, 'Utilization': 7, 'timestamp': '2018-04-02 15:00:02.661623'}, {'time': '2018-04-02T15:00:04.063194112Z', 'GPU': '0', 'Memory Used': 3687841792, 'Memory Used Percent': 22.428376981345146, 'Memory Utilization': 0, 'Power': 69760, 'Temperature': 36, 'Utilization': 0, 'timestamp': '2018-04-02 15:00:04.063194'}, {'time': '2018-04-02T15:00:04.11728512Z', 'GPU': '1', 'Memory Used': 4067426304, 'Memory Used Percent': 16.929300313414636, 'Memory Utilization': 0, 'Power': 57086, 'Temperature': 39, 'Utilization': 0, 'timestamp': '2018-04-02 15:00:04.117285'}]})
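Each entry in the `ResultSet` is a plain dictionary with one key per column, and `ResultSet.get_points()` iterates over those dictionaries. A standard-library sketch that averages one field per GPU; `mean_by_gpu` is a hypothetical helper, and the sample values are copied from the first eight points above:

```python
from collections import defaultdict


def mean_by_gpu(points, field):
    """Average `field` per GPU over point dicts such as those from ResultSet.get_points()."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for point in points:
        sums[point["GPU"]] += point[field]
        counts[point["GPU"]] += 1
    return {gpu: sums[gpu] / counts[gpu] for gpu in sorted(sums)}


sample = [
    {"GPU": "0", "Utilization": 90}, {"GPU": "1", "Utilization": 73},
    {"GPU": "2", "Utilization": 68}, {"GPU": "3", "Utilization": 73},
    {"GPU": "0", "Utilization": 0}, {"GPU": "1", "Utilization": 0},
    {"GPU": "2", "Utilization": 0}, {"GPU": "3", "Utilization": 7},
]
print(mean_by_gpu(sample, "Utilization"))
# {'0': 45.0, '1': 36.5, '2': 34.0, '3': 40.0}
```

Against the live query result the call would be `mean_by_gpu(data.get_points(), "Utilization")`.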

In [23]:
df_client = DataFrameClient(host='localhost', username='admin', password='password', database='gpudb')

In [33]:
df = df_client.query('select * from gpuseries limit 100;')['gpuseries']

In [36]:
df.head(100)


Out[36]:
GPU Memory Used Memory Used Percent Memory Utilization Power Temperature Utilization timestamp
2018-04-02 15:00:00.452204032+00:00 0 8685355008 36.149883 70 94728 39 90 2018-04-02 15:00:00.452204
2018-04-02 15:00:00.499869184+00:00 1 6447693824 26.836367 56 57664 42 73 2018-04-02 15:00:00.499869
2018-04-02 15:00:00.547173120+00:00 2 8507097088 35.407944 55 57827 38 68 2018-04-02 15:00:00.547173
2018-04-02 15:00:00.595969024+00:00 3 2947547136 12.268178 38 62068 36 73 2018-04-02 15:00:00.595969
2018-04-02 15:00:02.519879168+00:00 0 9931063296 41.334726 0 57226 36 0 2018-04-02 15:00:02.519879
2018-04-02 15:00:02.565956096+00:00 1 7693402112 32.021211 0 56027 40 0 2018-04-02 15:00:02.565956
2018-04-02 15:00:02.615364096+00:00 2 9752805376 40.592787 0 56572 36 0 2018-04-02 15:00:02.615364
2018-04-02 15:00:02.661623040+00:00 3 3641704448 15.157376 1 185043 35 7 2018-04-02 15:00:02.661623
2018-04-02 15:00:04.063194112+00:00 0 3687841792 22.428377 0 69760 36 0 2018-04-02 15:00:04.063194
2018-04-02 15:00:04.117285120+00:00 1 4067426304 16.929300 0 57086 39 0 2018-04-02 15:00:04.117285
2018-04-02 15:00:04.170159104+00:00 2 4539285504 18.893256 3 162872 35 4 2018-04-02 15:00:04.170159
2018-04-02 15:00:04.219676928+00:00 3 3698327552 15.393050 0 59018 35 0 2018-04-02 15:00:04.219677
2018-04-02 15:00:05.273978880+00:00 0 4237295616 17.636324 60 58091 37 94 2018-04-02 15:00:05.273979
2018-04-02 15:00:05.316706048+00:00 1 1190133760 4.953533 51 56316 40 91 2018-04-02 15:00:05.316706
2018-04-02 15:00:05.358547968+00:00 2 1194328064 4.970991 6 57086 36 41 2018-04-02 15:00:05.358548
2018-04-02 15:00:05.401321984+00:00 3 1200619520 4.997177 0 56652 35 0 2018-04-02 15:00:05.401322
2018-04-02 15:00:06.460380160+00:00 0 12890144768 53.650912 54 149179 38 92 2018-04-02 15:00:06.460380
2018-04-02 15:00:06.509203968+00:00 1 10652483584 44.337397 79 115113 41 89 2018-04-02 15:00:06.509204
2018-04-02 15:00:06.561969152+00:00 2 10656677888 44.354854 84 60168 37 89 2018-04-02 15:00:06.561969
2018-04-02 15:00:06.611894016+00:00 3 10662969344 44.381040 58 59793 36 62 2018-04-02 15:00:06.611894
2018-04-02 15:00:07.817122048+00:00 0 7668236288 31.916466 8 57674 36 24 2018-04-02 15:00:07.817122
2018-04-02 15:00:07.872345856+00:00 1 4017094656 16.719812 6 62825 40 44 2018-04-02 15:00:07.872346
2018-04-02 15:00:07.927056128+00:00 2 5434769408 22.620408 14 57944 36 100 2018-04-02 15:00:07.927056
2018-04-02 15:00:07.973288960+00:00 3 10662969344 44.381040 40 138665 36 100 2018-04-02 15:00:07.973289
2018-04-02 15:00:09.074198016+00:00 0 14886633472 61.960628 24 195220 37 58 2018-04-02 15:00:09.074198
2018-04-02 15:00:09.132369152+00:00 1 6844055552 28.486090 24 146183 40 26 2018-04-02 15:00:09.132369
2018-04-02 15:00:10.191634944+00:00 2 14066647040 58.547709 22 55931 36 21 2018-04-02 15:00:10.191635
2018-04-02 15:00:10.257611008+00:00 3 8622440448 35.888022 0 56455 34 0 2018-04-02 15:00:10.257611
2018-04-02 15:00:11.995235072+00:00 0 9788456960 40.741175 0 57415 35 0 2018-04-02 15:00:11.995235
2018-04-02 15:00:12.071012096+00:00 1 8611954688 35.844378 14 56604 39 15 2018-04-02 15:00:12.071012
... ... ... ... ... ... ... ... ...
2018-04-02 15:00:28.550659840+00:00 2 3872391168 16.117532 1 58820 36 2 2018-04-02 15:00:28.550660
2018-04-02 15:00:28.607621888+00:00 3 3675258880 15.297035 0 76506 35 0 2018-04-02 15:00:28.607622
2018-04-02 15:00:29.683577088+00:00 0 3599761408 14.982802 43 58089 37 86 2018-04-02 15:00:29.683577
2018-04-02 15:00:29.725943040+00:00 1 1190133760 4.953533 40 56219 40 92 2018-04-02 15:00:29.725943
2018-04-02 15:00:29.766936064+00:00 2 1194328064 4.970991 4 57278 37 24 2018-04-02 15:00:29.766936
2018-04-02 15:00:29.810834944+00:00 3 1200619520 4.997177 0 56842 36 90 2018-04-02 15:00:29.810835
2018-04-02 15:00:30.872054016+00:00 0 3599761408 14.982802 0 57511 35 0 2018-04-02 15:00:30.872054
2018-04-02 15:00:30.914840064+00:00 1 1190133760 4.953533 0 56123 39 0 2018-04-02 15:00:30.914840
2018-04-02 15:00:30.957829888+00:00 2 1194328064 4.970991 0 56990 35 0 2018-04-02 15:00:30.957830
2018-04-02 15:00:31.000922880+00:00 3 1200619520 4.997177 0 56548 34 0 2018-04-02 15:00:31.000923
2018-04-02 15:00:32.136954112+00:00 0 3599761408 14.982802 0 57415 35 0 2018-04-02 15:00:32.136954
2018-04-02 15:00:32.231229184+00:00 1 1190133760 4.953533 0 55897 39 0 2018-04-02 15:00:32.231229
2018-04-02 15:00:33.769680896+00:00 2 1194328064 4.970991 0 51498 34 0 2018-04-02 15:00:33.769681
2018-04-02 15:00:33.813342208+00:00 3 1200619520 4.997177 0 51057 33 0 2018-04-02 15:00:33.813342
2018-04-02 15:00:34.890827776+00:00 0 3599761408 14.982802 0 103997 34 0 2018-04-02 15:00:34.890828
2018-04-02 15:00:34.940800+00:00 1 1242562560 5.171751 0 51565 38 5 2018-04-02 15:00:34.940800
2018-04-02 15:00:35.006163968+00:00 2 1194328064 4.970991 0 56759 34 0 2018-04-02 15:00:35.006164
2018-04-02 15:00:35.070588160+00:00 3 1200619520 4.997177 1 51219 33 1 2018-04-02 15:00:35.070588
2018-04-02 15:00:36.154573056+00:00 0 9335472128 38.855777 1 52021 34 7 2018-04-02 15:00:36.154573
2018-04-02 15:00:36.199578880+00:00 1 7439646720 30.965039 3 50633 38 3 2018-04-02 15:00:36.199579
2018-04-02 15:00:36.249982208+00:00 2 8783921152 36.560131 9 51595 34 10 2018-04-02 15:00:36.249982
2018-04-02 15:00:36.296148992+00:00 3 9027190784 37.572660 2 51255 33 4 2018-04-02 15:00:36.296149
2018-04-02 15:00:39.521494016+00:00 0 10113515520 42.094123 46 154253 36 55 2018-04-02 15:00:39.521494
2018-04-02 15:00:39.565041152+00:00 1 9889120256 41.160152 29 101989 40 38 2018-04-02 15:00:39.565041
2018-04-02 15:00:39.608422144+00:00 2 9884925952 41.142695 25 153609 36 34 2018-04-02 15:00:39.608422
2018-04-02 15:00:39.650899968+00:00 3 9884925952 41.142695 55 159186 36 62 2018-04-02 15:00:39.650900
2018-04-02 15:00:40.716825856+00:00 0 10113515520 42.094123 16 156730 37 21 2018-04-02 15:00:40.716826
2018-04-02 15:00:40.765632+00:00 1 9889120256 41.160152 0 129466 41 9 2018-04-02 15:00:40.765632
2018-04-02 15:00:40.807444992+00:00 2 9884925952 41.142695 75 157078 37 87 2018-04-02 15:00:40.807445
2018-04-02 15:00:40.849585920+00:00 3 9884925952 41.142695 77 147475 36 92 2018-04-02 15:00:40.849586

100 rows × 8 columns
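The table does not state units. Assuming `Memory Used` is in bytes and `Memory Used Percent` is `Memory Used` as a percentage of the card's total memory (an assumption, not documented in this notebook), the two columns together imply each device's total memory, which gives a quick sanity check on the logged data. `implied_total_bytes` is a hypothetical helper; the pairs are copied from the GPU 0 and GPU 3 rows at 15:00:00:

```python
def implied_total_bytes(used_bytes, used_percent):
    """Total memory implied by a used-bytes value and its percentage of the total."""
    return used_bytes / (used_percent / 100.0)


# (Memory Used, Memory Used Percent) pairs copied from the table above.
rows = [(8685355008, 36.149883), (2947547136, 12.268178)]
totals = [implied_total_bytes(used, pct) for used, pct in rows]
for total in totals:
    print("%.2f GiB" % (total / 2**30))
```

If the assumption holds, both rows should give essentially the same total, roughly the 24 GB of memory on a Tesla P40.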

