This notebook demonstrates how systematic analysis of tally scores is possible using Pandas dataframes. A dataframe can be automatically generated using the Tally.get_pandas_dataframe(...) method. Furthermore, by linking the tally data in a statepoint file with geometry and material information from a summary file, the dataframe can be shown with user-supplied labels.


In [1]:
import glob

from IPython.display import Image
import matplotlib.pyplot as plt
import scipy.stats
import numpy as np
import pandas as pd

import openmc
%matplotlib inline

Generate Input Files

First we need to define materials that will be used in the problem. We will create three materials for the fuel, water, and cladding of the fuel pin.


In [2]:
# 1.6 enriched fuel
fuel = openmc.Material(name='1.6% Fuel')
fuel.set_density('g/cm3', 10.31341)
fuel.add_nuclide('U235', 3.7503e-4)
fuel.add_nuclide('U238', 2.2625e-2)
fuel.add_nuclide('O16', 4.6007e-2)

# borated water
water = openmc.Material(name='Borated Water')
water.set_density('g/cm3', 0.740582)
water.add_nuclide('H1', 4.9457e-2)
water.add_nuclide('O16', 2.4732e-2)
water.add_nuclide('B10', 8.0042e-6)

# zircaloy
zircaloy = openmc.Material(name='Zircaloy')
zircaloy.set_density('g/cm3', 6.55)
zircaloy.add_nuclide('Zr90', 7.2758e-3)

With our three materials, we can now create a materials file object that can be exported to an actual XML file.


In [3]:
# Instantiate a Materials collection
materials_file = openmc.Materials([fuel, water, zircaloy])

# Export to "materials.xml"
materials_file.export_to_xml()

Now let's move on to the geometry. This problem will be a square array of fuel pins for which we can use OpenMC's lattice/universe feature. The basic universe will have three regions for the fuel, the clad, and the surrounding coolant. The first step is to create the bounding surfaces for fuel and clad, as well as the outer bounding surfaces of the problem.


In [4]:
# Create cylinders for the fuel and clad
fuel_outer_radius = openmc.ZCylinder(x0=0.0, y0=0.0, R=0.39218)
clad_outer_radius = openmc.ZCylinder(x0=0.0, y0=0.0, R=0.45720)

# Create boundary planes to surround the geometry
# Use both reflective and vacuum boundaries to make life interesting
min_x = openmc.XPlane(x0=-10.71, boundary_type='reflective')
max_x = openmc.XPlane(x0=+10.71, boundary_type='vacuum')
min_y = openmc.YPlane(y0=-10.71, boundary_type='vacuum')
max_y = openmc.YPlane(y0=+10.71, boundary_type='reflective')
min_z = openmc.ZPlane(z0=-10.71, boundary_type='reflective')
max_z = openmc.ZPlane(z0=+10.71, boundary_type='reflective')

With the surfaces defined, we can now construct a fuel pin cell from cells that are defined by intersections of half-spaces created by the surfaces.


In [5]:
# Create fuel Cell
fuel_cell = openmc.Cell(name='1.6% Fuel', fill=fuel,
                        region=-fuel_outer_radius)

# Create a clad Cell
clad_cell = openmc.Cell(name='1.6% Clad', fill=zircaloy)
clad_cell.region = +fuel_outer_radius & -clad_outer_radius

# Create a moderator Cell
moderator_cell = openmc.Cell(name='1.6% Moderator', fill=water,
                             region=+clad_outer_radius)

# Create a Universe to encapsulate a fuel pin
pin_cell_universe = openmc.Universe(name='1.6% Fuel Pin', cells=[
    fuel_cell, clad_cell, moderator_cell
])

Using the pin cell universe, we can construct a 17x17 rectangular lattice with a 1.26 cm pitch.


In [6]:
# Create fuel assembly Lattice
assembly = openmc.RectLattice(name='1.6% Fuel - 0BA')
assembly.pitch = (1.26, 1.26)
assembly.lower_left = [-1.26 * 17. / 2.0] * 2
assembly.universes = [[pin_cell_universe] * 17] * 17

OpenMC requires that there is a "root" universe. Let us create a root cell that is filled by the pin cell universe and then assign it to the root universe.


In [7]:
# Create root Cell
root_cell = openmc.Cell(name='root cell', fill=assembly)

# Add boundary planes
root_cell.region = +min_x & -max_x & +min_y & -max_y & +min_z & -max_z

# Create root Universe
root_universe = openmc.Universe(name='root universe')
root_universe.add_cell(root_cell)

We now must create a geometry that is assigned a root universe and export it to XML.


In [8]:
# Create Geometry and export to "geometry.xml"
geometry = openmc.Geometry(root_universe)
geometry.export_to_xml()

With the geometry and materials finished, we now just need to define simulation parameters. In this case, we will use 5 inactive batches and 15 minimum active batches each with 2500 particles. We also tell OpenMC to turn tally triggers on, which means it will keep running until some criterion on the uncertainty of tallies is reached.


In [9]:
# OpenMC simulation parameters
min_batches = 20
max_batches = 200
inactive = 5
particles = 2500

# Instantiate a Settings object
settings = openmc.Settings()
settings.batches = min_batches
settings.inactive = inactive
settings.particles = particles
settings.output = {'tallies': False}
settings.trigger_active = True
settings.trigger_max_batches = max_batches

# Create an initial uniform spatial source distribution over fissionable zones
bounds = [-10.71, -10.71, -10, 10.71, 10.71, 10.]
uniform_dist = openmc.stats.Box(bounds[:3], bounds[3:], only_fissionable=True)
settings.source = openmc.Source(space=uniform_dist)

# Export to "settings.xml"
settings.export_to_xml()

Let us also create a plot file that we can use to verify that our pin cell geometry was created successfully.


In [10]:
# Instantiate a Plot
plot = openmc.Plot(plot_id=1)
plot.filename = 'materials-xy'
plot.origin = [0, 0, 0]
plot.width = [21.5, 21.5]
plot.pixels = [250, 250]
plot.color_by = 'material'

# Instantiate a Plots collection and export to "plots.xml"
plot_file = openmc.Plots([plot])
plot_file.export_to_xml()

With the plots.xml file, we can now generate and view the plot. OpenMC outputs plots in .ppm format, which can be converted into a compressed format like .png with the convert utility.


In [11]:
# Run openmc in plotting mode
openmc.plot_geometry(output=False)

In [12]:
# Convert OpenMC's funky ppm to png
!convert materials-xy.ppm materials-xy.png

# Display the materials plot inline
Image(filename='materials-xy.png')


Out[12]:

As we can see from the plot, we have a nice array of pin cells with fuel, cladding, and water! Before we run our simulation, we need to tell the code what we want to tally. The following code shows how to create a variety of tallies.


In [13]:
# Instantiate an empty Tallies object
tallies = openmc.Tallies()

Instantiate a fission rate mesh Tally


In [14]:
# Instantiate a tally Mesh
mesh = openmc.Mesh(mesh_id=1)
mesh.type = 'regular'
mesh.dimension = [17, 17]
mesh.lower_left = [-10.71, -10.71]
mesh.width = [1.26, 1.26]

# Instantiate tally Filter
mesh_filter = openmc.MeshFilter(mesh)

# Instantiate energy Filter
energy_filter = openmc.EnergyFilter([0, 0.625, 20.0e6])

# Instantiate the Tally
tally = openmc.Tally(name='mesh tally')
tally.filters = [mesh_filter, energy_filter]
tally.scores = ['fission', 'nu-fission']

# Add mesh and Tally to Tallies
tallies.append(tally)

Instantiate a cell Tally with nuclides


In [15]:
# Instantiate tally Filter
cell_filter = openmc.CellFilter(fuel_cell)

# Instantiate the tally
tally = openmc.Tally(name='cell tally')
tally.filters = [cell_filter]
tally.scores = ['scatter']
tally.nuclides = ['U235', 'U238']

# Add mesh and tally to Tallies
tallies.append(tally)

Create a "distribcell" Tally. The distribcell filter allows us to tally multiple repeated instances of the same cell throughout the geometry.


In [16]:
# Instantiate tally Filter
distribcell_filter = openmc.DistribcellFilter(moderator_cell)

# Instantiate tally Trigger for kicks
trigger = openmc.Trigger(trigger_type='std_dev', threshold=5e-5)
trigger.scores = ['absorption']

# Instantiate the Tally
tally = openmc.Tally(name='distribcell tally')
tally.filters = [distribcell_filter]
tally.scores = ['absorption', 'scatter']
tally.triggers = [trigger]

# Add mesh and tally to Tallies
tallies.append(tally)

In [17]:
# Export to "tallies.xml"
tallies.export_to_xml()

Now we a have a complete set of inputs, so we can go ahead and run our simulation.


In [18]:
# Remove old HDF5 (summary, statepoint) files
!rm statepoint.*

# Run OpenMC!
openmc.run()


                               %%%%%%%%%%%%%%%
                          %%%%%%%%%%%%%%%%%%%%%%%%
                       %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                     %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                   %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
                                   %%%%%%%%%%%%%%%%%%%%%%%%
                                    %%%%%%%%%%%%%%%%%%%%%%%%
                ###############      %%%%%%%%%%%%%%%%%%%%%%%%
               ##################     %%%%%%%%%%%%%%%%%%%%%%%
               ###################     %%%%%%%%%%%%%%%%%%%%%%%
               ####################     %%%%%%%%%%%%%%%%%%%%%%
               #####################     %%%%%%%%%%%%%%%%%%%%%
               ######################     %%%%%%%%%%%%%%%%%%%%
               #######################     %%%%%%%%%%%%%%%%%%
                #######################     %%%%%%%%%%%%%%%%%
                ######################     %%%%%%%%%%%%%%%%%
                 ####################     %%%%%%%%%%%%%%%%%
                   #################     %%%%%%%%%%%%%%%%%
                    ###############     %%%%%%%%%%%%%%%%
                      ############     %%%%%%%%%%%%%%%
                         ########     %%%%%%%%%%%%%%
                                     %%%%%%%%%%%

                   | The OpenMC Monte Carlo Code
         Copyright | 2011-2018 MIT and OpenMC contributors
           License | http://openmc.readthedocs.io/en/latest/license.html
           Version | 0.10.0
          Git SHA1 | 199126b2fcc5cb094f2cc820ae13e1a972cacddd
         Date/Time | 2018-10-11 16:41:25
    OpenMP Threads | 8

 Reading settings XML file...
 Reading cross sections XML file...
 Reading materials XML file...
 Reading geometry XML file...
 Building neighboring cells lists for each surface...
 Reading U235 from /home/jan/openmc/nndc_hdf5/U235.h5
 Reading U238 from /home/jan/openmc/nndc_hdf5/U238.h5
 Reading O16 from /home/jan/openmc/nndc_hdf5/O16.h5
 Reading H1 from /home/jan/openmc/nndc_hdf5/H1.h5
 Reading B10 from /home/jan/openmc/nndc_hdf5/B10.h5
 Reading Zr90 from /home/jan/openmc/nndc_hdf5/Zr90.h5
 Maximum neutron transport energy: 2.00000E+07 eV for U235
 Reading tallies XML file...
 Writing summary.h5 file...
 Initializing source particles...

 ====================>     K EIGENVALUE SIMULATION     <====================

  Bat./Gen.      k            Average k         
  =========   ========   ====================   
        1/1    0.55921                       
        2/1    0.63816                       
        3/1    0.68834                       
        4/1    0.71192                       
        5/1    0.67935                       
        6/1    0.68254                       
        7/1    0.65804    0.67029 +/- 0.01225
        8/1    0.66225    0.66761 +/- 0.00756
        9/1    0.66336    0.66655 +/- 0.00545
       10/1    0.68037    0.66931 +/- 0.00505
       11/1    0.71728    0.67731 +/- 0.00899
       12/1    0.66098    0.67498 +/- 0.00795
       13/1    0.69969    0.67806 +/- 0.00755
       14/1    0.70998    0.68161 +/- 0.00754
       15/1    0.70092    0.68354 +/- 0.00702
       16/1    0.71586    0.68648 +/- 0.00699
       17/1    0.65949    0.68423 +/- 0.00677
       18/1    0.67696    0.68367 +/- 0.00625
       19/1    0.65444    0.68158 +/- 0.00615
       20/1    0.69766    0.68266 +/- 0.00583
 Triggers unsatisfied, max unc./thresh. is 1.17617 for absorption in tally 3
 The estimated number of batches is 26
 Creating state point statepoint.020.h5...
       21/1    0.64126    0.68007 +/- 0.00603
       22/1    0.69287    0.68082 +/- 0.00572
       23/1    0.70254    0.68203 +/- 0.00552
       24/1    0.68198    0.68203 +/- 0.00523
       25/1    0.67214    0.68153 +/- 0.00498
       26/1    0.68171    0.68154 +/- 0.00474
 Triggers satisfied for batch 26
 Creating state point statepoint.026.h5...

 =======================>     TIMING STATISTICS     <=======================

 Total time for initialization     =  6.5303E-01 seconds
   Reading cross sections          =  5.8105E-01 seconds
 Total time in simulation          =  4.6015E+00 seconds
   Time in transport only          =  3.5767E+00 seconds
   Time in inactive batches        =  4.5008E-01 seconds
   Time in active batches          =  4.1514E+00 seconds
   Time synchronizing fission bank =  2.3493E-03 seconds
     Sampling source sites         =  1.7160E-03 seconds
     SEND/RECV source sites        =  4.7010E-04 seconds
   Time accumulating tallies       =  2.3040E-04 seconds
 Total time for finalization       =  2.6451E-02 seconds
 Total time elapsed                =  5.3123E+00 seconds
 Calculation Rate (inactive)       =  27772.9 particles/second
 Calculation Rate (active)         =  12646.3 particles/second

 ============================>     RESULTS     <============================

 k-effective (Collision)     =  0.67976 +/-  0.00436
 k-effective (Track-length)  =  0.68154 +/-  0.00474
 k-effective (Absorption)    =  0.68320 +/-  0.00518
 Combined k-effective        =  0.68122 +/-  0.00432
 Leakage Fraction            =  0.34011 +/-  0.00283

Tally Data Processing


In [19]:
# We do not know how many batches were needed to satisfy the 
# tally trigger(s), so find the statepoint file(s)
statepoints = glob.glob('statepoint.*.h5')

# Load the last statepoint file
sp = openmc.StatePoint(statepoints[-1])

Analyze the mesh fission rate tally


In [20]:
# Find the mesh tally with the StatePoint API
tally = sp.get_tally(name='mesh tally')

# Print a little info about the mesh tally to the screen
print(tally)


Tally
	ID             =	1
	Name           =	mesh tally
	Filters        =	MeshFilter, EnergyFilter
	Nuclides       =	total 
	Scores         =	['fission', 'nu-fission']
	Estimator      =	tracklength

Use the new Tally data retrieval API with pure NumPy


In [21]:
# Get the relative error for the thermal fission reaction 
# rates in the four corner pins 
data = tally.get_values(scores=['fission'],
                        filters=[openmc.MeshFilter, openmc.EnergyFilter], \
                        filter_bins=[((1,1),(1,17), (17,1), (17,17)), \
                                    ((0., 0.625),)], value='rel_err')
print(data)


[[[0.17581417]]

 [[0.06842901]]

 [[0.30578219]]

 [[0.12436752]]]

In [22]:
# Get a pandas dataframe for the mesh tally data
df = tally.get_pandas_dataframe(nuclides=False)

# Set the Pandas float display settings
pd.options.display.float_format = '{:.2e}'.format

# Print the first twenty rows in the dataframe
df.head(20)


Out[22]:
mesh 1 energy low [eV] energy high [eV] score mean std. dev.
x y z
0 1 1 1 0.00e+00 6.25e-01 fission 2.24e-04 3.94e-05
1 1 1 1 0.00e+00 6.25e-01 nu-fission 5.46e-04 9.59e-05
2 1 1 1 6.25e-01 2.00e+07 fission 8.42e-05 6.79e-06
3 1 1 1 6.25e-01 2.00e+07 nu-fission 2.22e-04 1.65e-05
4 2 1 1 0.00e+00 6.25e-01 fission 1.85e-04 2.70e-05
5 2 1 1 0.00e+00 6.25e-01 nu-fission 4.52e-04 6.58e-05
6 2 1 1 6.25e-01 2.00e+07 fission 6.82e-05 5.29e-06
7 2 1 1 6.25e-01 2.00e+07 nu-fission 1.81e-04 1.35e-05
8 3 1 1 0.00e+00 6.25e-01 fission 2.05e-04 2.25e-05
9 3 1 1 0.00e+00 6.25e-01 nu-fission 5.00e-04 5.49e-05
10 3 1 1 6.25e-01 2.00e+07 fission 7.53e-05 7.06e-06
11 3 1 1 6.25e-01 2.00e+07 nu-fission 1.99e-04 1.81e-05
12 4 1 1 0.00e+00 6.25e-01 fission 2.06e-04 2.79e-05
13 4 1 1 0.00e+00 6.25e-01 nu-fission 5.03e-04 6.80e-05
14 4 1 1 6.25e-01 2.00e+07 fission 6.65e-05 3.91e-06
15 4 1 1 6.25e-01 2.00e+07 nu-fission 1.75e-04 1.04e-05
16 5 1 1 0.00e+00 6.25e-01 fission 2.03e-04 2.78e-05
17 5 1 1 0.00e+00 6.25e-01 nu-fission 4.94e-04 6.78e-05
18 5 1 1 6.25e-01 2.00e+07 fission 6.26e-05 5.71e-06
19 5 1 1 6.25e-01 2.00e+07 nu-fission 1.64e-04 1.53e-05

In [23]:
# Create a boxplot to view the distribution of
# fission and nu-fission rates in the pins
bp = df.boxplot(column='mean', by='score')



In [24]:
# Extract thermal nu-fission rates from pandas
fiss = df[df['score'] == 'nu-fission']
fiss = fiss[fiss['energy low [eV]'] == 0.0]

# Extract mean and reshape as 2D NumPy arrays
mean = fiss['mean'].values.reshape((17,17))

plt.imshow(mean, interpolation='nearest')
plt.title('fission rate')
plt.xlabel('x')
plt.ylabel('y')
plt.colorbar()


Out[24]:
<matplotlib.colorbar.Colorbar at 0x7f39196c0b00>

Analyze the cell+nuclides scatter-y2 rate tally


In [25]:
# Find the cell Tally with the StatePoint API
tally = sp.get_tally(name='cell tally')

# Print a little info about the cell tally to the screen
print(tally)


Tally
	ID             =	2
	Name           =	cell tally
	Filters        =	CellFilter
	Nuclides       =	U235 U238 
	Scores         =	['scatter']
	Estimator      =	tracklength


In [26]:
# Get a pandas dataframe for the cell tally data
df = tally.get_pandas_dataframe()

# Print the first twenty rows in the dataframe
df.head(20)


Out[26]:
cell nuclide score mean std. dev.
0 1 U235 scatter 3.81e-02 1.65e-04
1 1 U238 scatter 2.33e+00 9.59e-03

Use the new Tally data retrieval API with pure NumPy


In [27]:
# Get the standard deviations the total scattering rate
data = tally.get_values(scores=['scatter'], 
                        nuclides=['U238', 'U235'], value='std_dev')
print(data)


[[[0.00958717]
  [0.00016469]]]

Analyze the distribcell tally


In [28]:
# Find the distribcell Tally with the StatePoint API
tally = sp.get_tally(name='distribcell tally')

# Print a little info about the distribcell tally to the screen
print(tally)


Tally
	ID             =	3
	Name           =	distribcell tally
	Filters        =	DistribcellFilter
	Nuclides       =	total 
	Scores         =	['absorption', 'scatter']
	Estimator      =	tracklength

Use the new Tally data retrieval API with pure NumPy


In [29]:
# Get the relative error for the scattering reaction rates in
# the first 10 distribcell instances 
data = tally.get_values(scores=['scatter'], filters=[openmc.DistribcellFilter],
                        filter_bins=[tuple(range(10))], value='rel_err')
print(data)


[[[0.03500496]]

 [[0.02745568]]

 [[0.02988488]]

 [[0.04474905]]

 [[0.03697764]]

 [[0.0409214 ]]

 [[0.03366461]]

 [[0.03210393]]

 [[0.03216398]]

 [[0.04003553]]]

Print the distribcell tally dataframe


In [30]:
# Get a pandas dataframe for the distribcell tally data
df = tally.get_pandas_dataframe(nuclides=False)

# Print the last twenty rows in the dataframe
df.tail(20)


Out[30]:
level 1 level 2 level 3 distribcell score mean std. dev.
univ cell lat univ cell
id id id x y id id
558 3 4 2 7 16 1 3 279 absorption 6.81e-04 2.84e-05
559 3 4 2 7 16 1 3 279 scatter 8.82e-02 1.86e-03
560 3 4 2 8 16 1 3 280 absorption 6.65e-04 3.46e-05
561 3 4 2 8 16 1 3 280 scatter 8.37e-02 2.02e-03
562 3 4 2 9 16 1 3 281 absorption 5.61e-04 2.91e-05
563 3 4 2 9 16 1 3 281 scatter 7.52e-02 1.79e-03
564 3 4 2 10 16 1 3 282 absorption 4.77e-04 2.33e-05
565 3 4 2 10 16 1 3 282 scatter 6.68e-02 1.14e-03
566 3 4 2 11 16 1 3 283 absorption 4.64e-04 2.05e-05
567 3 4 2 11 16 1 3 283 scatter 6.20e-02 1.61e-03
568 3 4 2 12 16 1 3 284 absorption 4.44e-04 2.93e-05
569 3 4 2 12 16 1 3 284 scatter 5.47e-02 1.53e-03
570 3 4 2 13 16 1 3 285 absorption 3.67e-04 2.63e-05
571 3 4 2 13 16 1 3 285 scatter 4.68e-02 1.52e-03
572 3 4 2 14 16 1 3 286 absorption 2.76e-04 1.75e-05
573 3 4 2 14 16 1 3 286 scatter 3.81e-02 1.28e-03
574 3 4 2 15 16 1 3 287 absorption 2.08e-04 1.69e-05
575 3 4 2 15 16 1 3 287 scatter 2.85e-02 1.13e-03
576 3 4 2 16 16 1 3 288 absorption 1.32e-04 1.30e-05
577 3 4 2 16 16 1 3 288 scatter 1.86e-02 7.12e-04

In [31]:
# Show summary statistics for absorption distribcell tally data
absorption = df[df['score'] == 'absorption']
absorption[['mean', 'std. dev.']].dropna().describe()

# Note that the maximum standard deviation does indeed
# meet the 5e-5 threshold set by the tally trigger


Out[31]:
mean std. dev.
count 2.89e+02 2.89e+02
mean 4.17e-04 2.05e-05
std 2.42e-04 8.32e-06
min 2.27e-05 4.04e-06
25% 2.01e-04 1.40e-05
50% 4.00e-04 2.05e-05
75% 6.08e-04 2.60e-05
max 9.38e-04 4.27e-05

Perform a statistical test comparing the tally sample distributions for two categories of fuel pins.


In [32]:
# Extract tally data from pins in the pins divided along y=-x diagonal 
multi_index = ('level 2', 'lat',)
lower = df[df[multi_index + ('x',)] + df[multi_index + ('y',)] < 16]
upper = df[df[multi_index + ('x',)] + df[multi_index + ('y',)] > 16]
lower = lower[lower['score'] == 'absorption']
upper = upper[upper['score'] == 'absorption']

# Perform non-parametric Mann-Whitney U Test to see if the 
# absorption rates (may) come from same sampling distribution
u, p = scipy.stats.mannwhitneyu(lower['mean'], upper['mean'])
print('Mann-Whitney Test p-value: {0}'.format(p))


Mann-Whitney Test p-value: 0.3933685843661936

Note that the symmetry implied by the y=-x diagonal ensures that the two sampling distributions are identical. Indeed, as illustrated by the test above, for any reasonable significance level (e.g., $\alpha$=0.05) one would not reject the null hypothesis that the two sampling distributions are identical.

Next, perform the same test but with two groupings of pins which are not symmetrically identical to one another.


In [33]:
# Extract tally data from pins in the pins divided along y=x diagonal
multi_index = ('level 2', 'lat',)
lower = df[df[multi_index + ('x',)] > df[multi_index + ('y',)]]
upper = df[df[multi_index + ('x',)] < df[multi_index + ('y',)]]
lower = lower[lower['score'] == 'absorption']
upper = upper[upper['score'] == 'absorption']

# Perform non-parametric Mann-Whitney U Test to see if the 
# absorption rates (may) come from same sampling distribution
u, p = scipy.stats.mannwhitneyu(lower['mean'], upper['mean'])
print('Mann-Whitney Test p-value: {0}'.format(p))


Mann-Whitney Test p-value: 7.927841393301949e-42

Note that the asymmetry implied by the y=x diagonal ensures that the two sampling distributions are not identical. Indeed, as illustrated by the test above, for any reasonable significance level (e.g., $\alpha$=0.05) one would reject the null hypothesis that the two sampling distributions are identical.


In [34]:
# Extract the scatter tally data from pandas
scatter = df[df['score'] == 'scatter']

scatter['rel. err.'] = scatter['std. dev.'] / scatter['mean']

# Show a scatter plot of the mean vs. the std. dev.
scatter.plot(kind='scatter', x='mean', y='rel. err.', title='Scattering Rates')


/home/jan/.local/lib/python3.6/site-packages/ipykernel_launcher.py:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  after removing the cwd from sys.path.
Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f39195e74e0>

In [35]:
# Plot a histogram and kernel density estimate for the scattering rates
scatter['mean'].plot(kind='hist', bins=25)
scatter['mean'].plot(kind='kde')
plt.title('Scattering Rates')
plt.xlabel('Mean')
plt.legend(['KDE', 'Histogram'])


Out[35]:
<matplotlib.legend.Legend at 0x7f39195d7cf8>