Turbulucid tutorial

Opening the dataset

In this tutorial you will learn how to use turbulucid to analyse and plot data. We will use a dataset from a computational fluid dynamics simulation of a flow over a backward-facing step (BFS) as an example to learn about the different functionalities of the package!

Lets start by doing some imports and making figures a bit larger than the default size.


In [1]:
%matplotlib inline

In [2]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from turbulucid import *

matplotlib.rcParams['figure.figsize'] = (12, 6)

The most important class defined by turbulucid is Case. A Case object contains the dataset that is being analysed and several conenience function for data extraction. Opening a dataset therefore amounts to constructing a Case object, which requires a path to the file with the data. The file can be either VTK polyData in the legacy format (.vtk), VTK polyData in XML format (.vtu) or a multiblock VTK dataset (.vtm) where one block corresponds to the internal data and the other blocks are boundaries. Let us open the BFS dataset stored as a .vtm.


In [3]:
from os.path import join
import turbulucid
case = Case(join(turbulucid.__path__[0], "datasets", "bfs", "bfs.vtm"))

The dataset is now opened. Let us inspect some of its properties. The fields attribute returns a list of names of the CellData arrays that exist in the dataset.


In [4]:
print(case.fields)


['UMean', 'pMean']

We can access each field by using the Case object as a dictionary, i.e Case['UMean'] will return the values of UMean as a numpy array.


In [5]:
case['UMean']


Out[5]:
array([[-3.3928135e-03,  3.7426277e-04, -2.8111454e-04],
       [-5.8059758e-03, -3.6385437e-04, -9.2573313e-04],
       [-4.2166258e-03, -2.6553916e-03, -1.6405234e-03],
       ...,
       [ 7.6765442e+00, -1.6946964e-02,  5.3007121e-10],
       [ 7.6751127e+00, -1.0513628e-02,  2.9178990e-10],
       [ 7.6743207e+00, -3.5870713e-03,  1.9597464e-10]], dtype=float32)

The order of the data corresponds to the order of the cells in the dataset. This doesn't necessarily follow any logical pattern. However, it is possible to get the location of each cell centre via Case.cellCentres.


In [6]:
case.cellCentres


Out[6]:
array([[ 0.0002215 , -0.00951875],
       [ 0.0002215 , -0.00934245],
       [ 0.0002215 , -0.00913619],
       ...,
       [-0.0002215 ,  0.04282301],
       [-0.0002215 ,  0.04479957],
       [-0.0002215 ,  0.04691013]], dtype=float32)

The order in the cellCentres array is the same as that of the data for each field. This means that it is possible, for intance, to find the value of field at a certian location by first finding the closest cell centre and the grabbing the field's value at the same index.

It is possible to add additional fields to the case. Let us create a new field -- the magnitude of the mean velocity UMean.


In [7]:
case["magUMean"] = np.linalg.norm(case["UMean"], axis=1)
case["magUMean"]


C:\ProgramData\Anaconda3\lib\site-packages\vtk\util\numpy_support.py:137: FutureWarning: Conversion of the second argument of issubdtype from `complex` to `np.complexfloating` is deprecated. In future, it will be treated as `np.complex128 == np.dtype(complex).type`.
  assert not numpy.issubdtype(z.dtype, complex), \
Out[7]:
array([3.42494971e-03, 5.89056220e-03, 5.24617545e-03, ...,
       7.67656279e+00, 7.67511988e+00, 7.67432165e+00])

Plotting a field

It is time to make a plot! Let us plot the newly created field. This can be done by using the plot_field function. The first argument to the function is a Case object and the second is the field we want to plot.


In [8]:
plot_field(case, "magUMean")


Out[8]:
<matplotlib.collections.PatchCollection at 0x2844f892208>

Voila! Note that the returned object of plot_field is a PatchCollection. This means that by assigning plot_field to a variable you can further customize the figure. Also, you can pass additional keyword arguments to plot_field that will be used in the PatchCollection constructor. This allows for a lot of customization. Let us, for instance, use a different colormap.


In [9]:
plot_field(case, 'magUMean', cmap='Blues')


Out[9]:
<matplotlib.collections.PatchCollection at 0x28453a5f2e8>

Sometimes it is nice to scale the axes by a certain parameter. For the BFS it is common to scale the axes with the height of the step h. To this end plot_field and all the other plot functions in turbulucid accept the keyword arguments scaleX and scaleY.


In [10]:
h = 0.01
plot_field(case, 'magUMean', scaleX=h, scaleY=h, cmap='Blues')
plt.xlabel(r'$x/h$')
plt.ylabel(r'$y/h$')


Out[10]:
Text(0,0.5,'$y/h$')

Don't forget that all function is turbulucid have extensive doctrings! Use them to get more information.


In [11]:
?plot_field()


Signature: plot_field(case, field, scaleX=1, scaleY=1, plotBoundaries=True, colorbar=True, **kwargs)
Docstring:
Plot a field.

This function uses a matplotlib PatchCollection to compose the
plot. Additional customization parameters can be passed to the
constructor of the PatchCollection via kwargs. In particular,
cmap can be used to set the colormap and edgecolor to color the
edges of the cells.

Parameters
----------
case : Case
    The case that the vector field used for the streamlines belongs
    to.
field : str or ndarray
    The scalar field that will be plotted. Either a  string with
    the name of the field as found in the case or an ndarray with
    the data.
scaleX : float, optional
    A scaling factor for the abscissa.
scaleY : float, optional
    A scaling factor for the ordinate.
plotBoundaries : bool, optional
    Whether to plot the boundary of the geometry as a black line.
colorbar : bool, optional
    Whether to add a vertical colorbar to the right of the plot.
**kwargs
    Additional arguments to be passed to PatchCollection constructor.

Raises
------
TypeError
    If field is neither a string or and ndarray.
ValueError
    If the field to be plotted has more dimensions than one.
    If one or both scaling factors are non-positive.

Returns
-------
PatchCollection
    The collection of polygons defining the cells.
File:      c:\users\timofey mukha\documents\turbulucid\turbulucid\core\plotting.py
Type:      function

Plotting vectors

Let us now make a vector plot. This can be done using the plot_vectors() function. We need to pass the Case object and either the name of a vector field present in the case or an array with several column-components. This alternative is implemented for all the plot functions, including plot_field that we looked at above.


In [12]:
a = plot_vectors(case, 'UMean')


This is not very satisfying. Creating a good vector plot usually needs some tweeking of additional keyword arguments. By default the length of the arrows is proportional to the magnitude of the vector, but this can be changed by the normalize keyword.


In [13]:
plot_vectors(case, 'UMean', normalize=True)


Out[13]:
<matplotlib.quiver.Quiver at 0x2844d4e8320>

This, perhaps, looks slightly better, but the arrows are obviously too small. This can be controlled by the scale keyword argument. The behaviour of scale is not intuitive: the larger the value of scale the smaller the vectors. This is how the parameter behaves in pyplot.quiver, which is the function that plot_vectors uses to create the plot, so it was decided to leave that as it is. Typically several iterations are needed to get a good value for scale.

Let us make the arrows larger and zoom in on the recirculation zone. Additionally, let us scale the axes.


In [14]:
plot_vectors(case, 'UMean', normalize=True, scaleX=h, scaleY=h, scale=60)
plt.xlim([-1, 6])
plt.ylim([-1, 1])


Out[14]:
(-1, 1)

OK, this is looking better, we can clearly see two recirculation boubles. The issue is, however, that there are too much arrows. By default, an arrow will be drawn at each cell center, which means the density of arrows is proportional to the density of the mesh. To avoid this, plot_vectors supports another mechanism for positioning the arrows. Instead of using each cell center it is possible to use a Cartesian grid of prescribed resolution to sample the values of the vector field and then plot arrows only at the locations of the nodes of the grid. To do this the sampleByPlane keyword argument should be set to True and the keyword argument planeResolution should provide the amount of nodes in the Cartesian grid in x and y, respectively.

To illustrate the concept let us use a grid with 10 rows and 20 columns.


In [15]:
plot_vectors(case, 'UMean', scaleX=h, scaleY=h, sampleByPlane=True, planeResolution=[10, 20])


Out[15]:
<matplotlib.quiver.Quiver at 0x284540da860>

OK, let's now increase the resolution of the mesh, tweak scale, and zoom in on the recirculation zone.


In [16]:
plot_vectors(case, 'UMean', scaleX=h, scaleY=h, normalize=True, sampleByPlane=True, planeResolution=[70, 150], scale=50)
plt.xlim([-1, 6])
plt.ylim([-1, 1])


Out[16]:
(-1, 1)

Looking good! Let us finally add some color to the arrows. One possibility is to use a single color for all the arrows, this can be done using the color keyword argument, similar to how this is done in e.g. plt.plot() and similar function. It is also possible to color the arrows by the value of field present in a case. In this case, the colorField argument should be used and set to the name of the scalar field that is to be used for coloring. You can also pass an array of values directly, but it is somewhat tricky, when sampling by plane is also used, since the array size and form has to comply to that of the sampling plane. Let us color the vectors using the velocity magnitude field.


In [17]:
plot_vectors(case, 'UMean', colorField='magUMean', scaleX=h, scaleY=h, normalize=True, sampleByPlane=True, planeResolution=[70,150], scale=50)
plt.xlim([-1, 6])
plt.ylim([-1, 1])


Out[17]:
(-1, 1)

Plotting streamlines

Let us now plot some streamlines. The corresponding function is called plot_streamlines. As with the other plotting functions, we need to pass a Case object and either a field name or an array.


In [18]:
plot_streamlines(case, 'UMean')


Out[18]:
<matplotlib.streamplot.StreamplotSet at 0x2844a79e470>

This looks quite nice already. Under the hood the function uses pyplot.streamplot. Unfortunately, pyplot.streamplot does not work with unstructured datasets, and requires data stored on a Cartesian grid. For this reason, plot_streamlines always uses the approach used by plot_vectors when the sampleByPlane keyword argrument is set to True. The plane resolution is set by default to be 50 rows and 50 columns. This, of course, can be adjusted.

As with the vectors, we can add color.


In [19]:
plot_streamlines(case, 'UMean', colorField='magUMean', planeResolution=[300,300], scaleX=h, scaleY=h)


Quite a lot of cutomisation is possible by passing additional keyword arguments, for instance the locations of the seed points for the streamlines. Overall, however, pyplot.streamplot is not the ideal function to rely on and in the future it will probably be attempted to use VTK to compute the streamlines.

Auxillary plotting functions

Two additionaly plotting functions are probided by the library. The first one, plot_boundary simply plots the boundary of the domain.


In [20]:
plot_boundaries(case, color="Red")


Out[20]:
<matplotlib.collections.LineCollection at 0x28453933f28>

The second, add_colorbar, adds a colorbar to the plot. This can be useful if more control over the colorbar is needed. By default plot_field adds a colorbar on its own, but this can be avoided by setting the colorbar keyword argument to False.

Extracting a profile along a line

As it was mentioned in the beginning, the most general way of extracting values from specific locations is working with the cellCentres attribute of the Case class. However, turbulucid defines several functions that cover commonly needed data extraction procedures. One of the is profile_along_line which alows to extract a linear profile between two points in the dataset.


In [21]:
point1 = (-0.05, -1)
point2 = (-0.05, 1)
points, data = profile_along_line(case, point1, point2)

Let us explore what has been returned by the function. Firstly, note that we have two returned objects. The first one, assigned to points is a 1D array which holds the locations, along the line, where the sampled data is located.


In [22]:
points


Out[22]:
array([1.        , 1.00007855, 1.00024034, 1.00041176, 1.00059338,
       1.00078581, 1.00098969, 1.00120571, 1.00143459, 1.0016771 ,
       1.00193404, 1.00220627, 1.00249471, 1.00280032, 1.00312413,
       1.0034672 , 1.0038307 , 1.00421584, 1.0046239 , 1.00505625,
       1.00551434, 1.00600127, 1.00651125, 1.00703631, 1.00757691,
       1.00813349, 1.00870654, 1.00929655, 1.009904  , 1.01052942,
       1.01117335, 1.0118723 , 1.0126278 , 1.01340544, 1.01420587,
       1.01502975, 1.01587777, 1.01675064, 1.01764909, 1.01857387,
       1.01952574, 1.02050551, 1.02151398, 1.02255201, 1.02362045,
       1.0247202 , 1.02585218, 1.02701732, 1.02821661, 1.02945103,
       1.03072163, 1.03202946, 1.03337562, 1.03476122, 1.03618742,
       1.03765541, 1.03923836, 1.04097193, 1.04282301, 1.04479957,
       1.04691013, 1.048     ])

Note that the first value is 1. This reflects the fact that the line was defined as a vertical segment between y=-1 and y=1, and the first datapoint we have is located at y=0, which then corresponds to a distance of 1 in y from point1. Quite often it is convenient to subtract the length of the segment, where no data resides, so that the first coordinate is always 0. This can be done with the correctDistance keyword.


In [23]:
points, data = profile_along_line(case, point1, point2, correctDistance=True)
points


Out[23]:
array([0.00000000e+00, 7.85546436e-05, 2.40339985e-04, 4.11755755e-04,
       5.93375240e-04, 7.85805867e-04, 9.89691005e-04, 1.20571279e-03,
       1.43459346e-03, 1.67709845e-03, 1.93403871e-03, 2.20627384e-03,
       2.49471376e-03, 2.80032353e-03, 3.12412530e-03, 3.46720126e-03,
       3.83069925e-03, 4.21583513e-03, 4.62389644e-03, 5.05624805e-03,
       5.51433582e-03, 6.00127224e-03, 6.51125005e-03, 7.03631341e-03,
       7.57690798e-03, 8.13349430e-03, 8.70654453e-03, 9.29654576e-03,
       9.90399905e-03, 1.05294222e-02, 1.11733451e-02, 1.18722972e-02,
       1.26277991e-02, 1.34054404e-02, 1.42058674e-02, 1.50297489e-02,
       1.58777721e-02, 1.67506449e-02, 1.76490936e-02, 1.85738690e-02,
       1.95257440e-02, 2.05055103e-02, 2.15139836e-02, 2.25520097e-02,
       2.36204527e-02, 2.47202031e-02, 2.58521773e-02, 2.70173214e-02,
       2.82166079e-02, 2.94510350e-02, 3.07216346e-02, 3.20294648e-02,
       3.33756208e-02, 3.47612165e-02, 3.61874178e-02, 3.76554057e-02,
       3.92383635e-02, 4.09719273e-02, 4.28230055e-02, 4.47995700e-02,
       4.69101258e-02, 4.80000004e-02])

Let us now consider the second returned object. It is a dictionary with the same keys as the Case object, i.e the names of the fields present in the case. Similarly, the values for each key is the values of the corresponding field at locations along the line.


In [24]:
print(data.keys())
data['magUMean']


dict_keys(['UMean', 'pMean', 'magUMean'])
Out[24]:
array([0.78923261, 0.78923261, 2.29034233, 3.43619037, 4.1814723 ,
       4.67278194, 5.01791286, 5.27699232, 5.4826827 , 5.6540904 ,
       5.8026371 , 5.93444586, 6.05443668, 6.16568756, 6.27085638,
       6.37193489, 6.46978283, 6.56556988, 6.65970755, 6.75241089,
       6.84401321, 6.93536472, 7.02430773, 7.10863686, 7.18908167,
       7.26502657, 7.33669472, 7.40459585, 7.46730995, 7.52374792,
       7.57336426, 7.61656809, 7.65741873, 7.69035673, 7.72397327,
       7.74356174, 7.74349022, 7.73991108, 7.7373333 , 7.73548031,
       7.7341218 , 7.73295736, 7.73194599, 7.73101282, 7.73014021,
       7.72931767, 7.72853041, 7.72778225, 7.72706652, 7.72638702,
       7.72573185, 7.72511435, 7.72454262, 7.7239995 , 7.7235074 ,
       7.72304535, 7.72263718, 7.72225523, 7.72194338, 7.72171354,
       7.72159624, 7.43818283])

Now that we have the data we can use Python's plotting routines to make line-plots.


In [25]:
plt.plot(points, data['pMean'])


Out[25]:
[<matplotlib.lines.Line2D at 0x28450076358>]

Combining with ploting function we can do more fancy things


In [26]:
plot_boundaries(case, scaleX=h, scaleY=h)
Uref = data['UMean'][-1 ,0]

point1, point2 = (-0.05, 0), (-0.05, 1)
points, data = profile_along_line(case, point1, point2)
plt.plot(data['UMean'][:,0]/Uref - 0.05/h, points/h, 'C1')

point1, point2 = (0, 0), (0, 1)
points, data = profile_along_line(case, point1, point2)
plt.plot(data['UMean'][:,0]/Uref, points/h, 'C1')

point1, point2 = (0.1, -h), (0.1, 1)
points, data = profile_along_line(case, point1, point2)
plt.plot(data['UMean'][:,0]/Uref + 0.1/h, points/h-1, 'C1')


Out[26]:
[<matplotlib.lines.Line2D at 0x2845016fe80>]

Extracting boundary data

As it was briefly mentioned in the beginning, turbulucid supports reading data from a multiblock dataset (.vtm), where one block corresponds to the internal field and the other corresponds to defined named boudaries. Functions are provided for accessing both the data on the boundaries, but also from internal cells which are on the boundary, i.e. have one edge that is part of the boundary.

If a .vtk or .vtu file is read, one boundary, called boundary, is defined. Too see what boundaries are present the boundaries attribute of the Case object can be used.


In [27]:
case.boundaries


Out[27]:
['inletB', 'outletB', 'stepB', 'lowB1', 'lowB2', 'upB']

Too access data on the boundary, the boundary_data memeber function of the Case object can be used. The name of the boundary is passed as argument.


In [28]:
points, data = case.boundary_data('inletB')

The structure of the returned values is similar to profile_along_line. The first returned object is an array, defining the location of the edge centres of the boundary edges.


In [29]:
points[:10, :]


Out[29]:
array([[-9.6000001e-02,  7.8554644e-05],
       [-9.6000001e-02,  2.4033998e-04],
       [-9.6000001e-02,  4.1175575e-04],
       [-9.6000001e-02,  5.9337524e-04],
       [-9.6000001e-02,  7.8580587e-04],
       [-9.6000001e-02,  9.8969101e-04],
       [-9.6000001e-02,  1.2057128e-03],
       [-9.6000001e-02,  1.4345935e-03],
       [-9.6000001e-02,  1.6770985e-03],
       [-9.6000001e-02,  1.9340387e-03]], dtype=float32)

The second returned object is the dictionary with the data.

Similarly the boundary_cell_data member function returns the values in the boundary-adjacent cells. The order of the values is the same is that returned by boundary_data.

Another function in turbulucid, dist can be used to compute the distance between the cell centers of the boundary-adjacent cells and a given boundary.


In [30]:
dist(case, 'stepB')


C:\ProgramData\Anaconda3\lib\site-packages\vtk\util\numpy_support.py:137: FutureWarning: Conversion of the second argument of issubdtype from `complex` to `np.complexfloating` is deprecated. In future, it will be treated as `np.complex128 == np.dtype(complex).type`.
  assert not numpy.issubdtype(z.dtype, complex), \
Out[30]:
array([0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215,
       0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215,
       0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215,
       0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215, 0.0002215,
       0.0002215, 0.0002215])

In this case all the distances are the same due to the geometry being meshed with rectangles.

This covers most of the functionality in turbulucid! More things are available, to explore them look at the code reference!