Linear Sensitivity Examples

Copyright (C) 2014-2019 The BET Development Team

Here, we consider a simple example where the parameter space is given by a 5-dimensional hypercube and the goal is to choose an optimal QoI map from a space of possible QoI maps, denoted by $\mathcal{Q}$, where each QoI map is linear. We use this example to demonstrate how the code chooses which possible QoI map does the best job of "scaling" inverse sets to smaller sets.

The idea is that if we consider a set of high probability in a particular data space defined by the range of a QoI map, we would prefer that the inverse image of this set be as small as possible in order to identify the parameter responsible for the data. This notion only makes sense for stochastic inverse problems framed within the context of parameter identification under uncertainty.
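
In symbols: if $Q : \Lambda \to \mathcal{D}$ denotes a QoI map from the parameter space $\Lambda$ to a data space $\mathcal{D}$ (notation introduced here for concreteness), and $B \subset \mathcal{D}$ is a set of high probability, then we prefer the maps $Q \in \mathcal{Q}$ for which the measure $\mu_{\Lambda}(Q^{-1}(B))$ of the inverse image is as small as possible.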

In other words, this optimization criterion is most appropriate when the data are uncertain due to measurement error and there is a true/exact parameter responsible for whichever uncertain data are observed.

This set of examples generates uniform random samples in the unit n-dimensional hypercube and corresponding QoIs (data) generated by a linear map $Q$. We then calculate the gradients using an RBF scheme and use the gradient information to choose the optimal set of 2 (3, 4, ..., input_dim) QoIs to use in the inverse problem.

Every real-world problem requires special attention regarding how we choose optimal QoIs. This set of examples (examples/sensitivity/linear) covers some of the more common scenarios using easy-to-understand linear maps.


In [1]:
import numpy as np
import bet.sensitivity.gradients as grad
import bet.sensitivity.chooseQoIs as cqoi
import bet.calculateP.simpleFunP as simpleFunP
import bet.calculateP.calculateP as calculateP
import bet.postProcess.postTools as postTools
from bet.Comm import comm
import bet.sample as sample

Define Methods

The following cell defines functions shared by all three linear examples, allowing us to avoid copy/pasting the same code in the notebook.


In [2]:
def initialize_problem(input_dim, output_dim, num_samples=1E5, num_centers=10):
    # Let the map Q be a random matrix of size (output_dim, input_dim).
    # (The random seed is set explicitly before calling this function.)
    Q = np.random.random([output_dim, input_dim])

    # Initialize some sample objects we will need
    input_samples = sample.sample_set(input_dim)
    output_samples = sample.sample_set(output_dim)

    # Choose random samples in parameter space to solve the model
    domain_min, domain_max = 0, 1
    input_samples.set_values(np.random.uniform(domain_min, domain_max,
                                [int(num_samples), input_dim]))
    input_samples.set_domain(np.array([[domain_min, domain_max] 
                                for _ in range(input_dim)]))
    
    # Make the MC assumption and compute the volumes of each voronoi cell
    input_samples.estimate_volume_mc()


    # Compute the output values with the map Q
    output_samples.set_values(
        Q.dot(input_samples.get_values().transpose()).transpose())

    # Calculate the gradient vectors at some subset of the samples.  Here the
    # *normalize* argument is set to *True* because we are using bin_ratio to
    # determine the uncertainty in our data.
    cluster_discretization = sample.discretization(input_samples, output_samples)
    # We will approximate the jacobian at each of the centers
    center_discretization = grad.calculate_gradients_rbf(cluster_discretization,
            num_centers, normalize=True)

    return input_samples, output_samples, center_discretization, Q



def solve_problem(my_discretization, Q_ref, QoI_indices, percentile=1.0,
                  independent_error=True):
    input_samples = my_discretization.get_input_sample_set()
    output_samples = my_discretization.get_output_sample_set()
    # Restrict the output sample set to the chosen QoI indices.
    # (We update the private _dim attribute to match the reduced data.)
    output_samples._dim = len(QoI_indices)
    output_samples.set_values(output_samples.get_values()[:, QoI_indices])
    
    # Define the level of uncertainty in the measured reference datum.
    # Depending on independent_error below, this is interpreted as a
    # fixed bin size (rect_size) or as a bin ratio (rect_scale).
    uncertainty = 0.25

    # Make the MC assumption and compute the volumes of each voronoi cell
    input_samples.estimate_volume_mc()
    
    # Find the simple function approximation. If the uncertainty is
    # independent of the range of the data, use a fixed rectangle size;
    # otherwise, scale the rectangle relative to the range of the data.
    if independent_error:
        simpleFunP.regular_partition_uniform_distribution_rectangle_size(
            data_set=my_discretization, Q_ref=Q_ref, rect_size=uncertainty,
            cells_per_dimension=1)
    else:
        simpleFunP.regular_partition_uniform_distribution_rectangle_scaled(
            data_set=my_discretization, Q_ref=Q_ref, rect_scale=uncertainty,
            cells_per_dimension=1)
    
    
    # Calculate probabilities making the Monte Carlo assumption
    calculateP.prob(my_discretization)
    
    # Sort samples by highest probability density and find how many samples lie in
    # the support of the inverse solution.  With the Monte Carlo assumption, this
    # also tells us the approximate volume of this support.
    (num_samples, _, indices_in_inverse) =\
        postTools.sample_highest_prob(top_percentile=percentile,
        sample_set=input_samples, sort=True)
    
    # Print the approximate percentage of the measure of the parameter space defined
    # by the support of the inverse density
    if comm.rank == 0:
        print('The approximate percentage of the measure of the parameter space defined')
        print('by the support of the inverse density associated with the choice of QoI map is')
        print('%2.4f%%  with '%(100*np.sum(input_samples.get_volumes()[indices_in_inverse])), 
              num_samples, ' samples.')

    return num_samples, indices_in_inverse

Suggested Changes

Example 1: linear_measure_binsize_large.py

Objective: achieve the smallest support of the inverse solution, assuming we define the uncertainty in our data to be fixed, i.e., independent of the range of data measured for each QoI (bin_size).

  • independent_error = True
  • measure = True
  • (optional) set output_dim = 100 to leverage the keyword arguments that optimize computations for large output spaces.

Example 2: linear_measure_binratio.py

Objective: achieve the smallest support of the inverse solution, assuming we define the uncertainty in our data to be relative to the range of the data for each QoI (bin_ratio).

  • independent_error = False
  • measure = True

Example 3: linear_skewness_binratio.py

Objective: optimal skewness properties, which yield an inverse solution that can be approximated well on the implicitly-defined Borel sets (Voronoi cells) that constitute parameter space. The uncertainty in our data is relative to the range of data measured in each QoI (bin_ratio), but this can be changed with the independent_error flag.

  • independent_error = False
  • measure = False

By optimizing for our ability to approximate sets in our parameter space, we can expect much less precision on average in the solution to a given inverse problem. The three configurations are summarized in the sketch after this list.
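
For quick reference, here is a minimal sketch collecting the three flag configurations in one place (the configurations dict is our own illustration, not part of the BET API):

configurations = {
    'linear_measure_binsize_large.py': {'independent_error': True,  'measure': True},
    'linear_measure_binratio.py':      {'independent_error': False, 'measure': True},
    'linear_skewness_binratio.py':     {'independent_error': False, 'measure': False},
}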


In [3]:
############ MAKE SELECTION ############
independent_error = True # is the uncertainty in the data independent of the range of the data?
measure = True # if True, optimize w/r/t the size of the inverse set (expected scaling effect)
########################################

# Set up the info for the spaces
num_samples = 1E5
num_centers = 10

# feel free to change the following, but ideally, keep input_dim <= output_dim
input_dim = 5
output_dim = 10

In [4]:
np.random.seed(0) # (optional) set seed for repeatable results.
input_samples, output_samples, center_discretization, Q = \
        initialize_problem(input_dim, output_dim, num_samples, num_centers)
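
Before choosing QoIs, we can sanity-check the gradient approximation. A minimal sketch, assuming (as in BET's sample_set API) that the approximate Jacobians are stored on the input sample set of center_discretization and retrievable with get_jacobians():

jacobians = center_discretization.get_input_sample_set().get_jacobians()
print(jacobians.shape)  # one (output_dim, input_dim) Jacobian per center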

With these gradient vectors, we are now ready to choose an optimal set of QoIs to use in the inverse problem, based on minimizing the support (measure) of the inverse solution.

The most robust method for this is bet.sensitivity.chooseQoIs.chooseOptQoIs_large, which returns the best sets of 2, 3, 4, ..., up to input_dim QoIs. It returns a list of matrices, one per set size. Each matrix has num_optsets_return rows (10 by default); the first column holds the expected inverse measure ratio (or skewness), and the remaining columns hold the corresponding QoI indices.


In [5]:
input_samples_center = center_discretization.get_input_sample_set()

num_best_sets = 3 # number of top-ranked QoI sets to return for each set size

if output_dim > 50: # optional tolerances for large problems (output space dimension)
    best_sets = cqoi.chooseOptQoIs_large(input_samples_center, measure=measure,
                            max_qois_return=5, num_optsets_return=num_best_sets, 
                            inner_prod_tol=0.9, measskew_tol=1E2)
else:
    best_sets = cqoi.chooseOptQoIs_large(input_samples_center, measure=measure, 
                            num_optsets_return=num_best_sets)

for i in range(num_best_sets):
    print(best_sets[i], '\n')


[[4.27257026 3.         6.        ]
 [4.34105947 3.         4.        ]
 [4.43961888 3.         9.        ]] 

[[11.55460603  3.          6.          9.        ]
 [14.26945967  3.          6.          8.        ]
 [14.37424713  6.          8.          9.        ]] 

[[41.39727768  3.          6.          8.          9.        ]
 [59.10273955  2.          3.          8.          9.        ]
 [64.74186687  3.          4.          8.          9.        ]] 

The number in the first column represents the expected volume of the inverse image of a unit hypercube in the data space if measure=True, and it is the expected skewness if measure=False.
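
Concretely, a row of any matrix above unpacks as follows (this mirrors the selection code later in the notebook):

vals = best_sets[0]                    # matrix for the best sets of 2 QoIs
measure_value = vals[0, 0]             # expected inverse measure ratio (or skewness)
qoi_indices = vals[0, 1:].astype(int)  # the QoI indices defining this set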

With the independent_error definition of the uncertainty in the data, we expect to see inverse solutions whose support is smaller (expected inverse measure ratio less than 1 or 2) than the hypercube in the data space, whose side length we nominally set to 0.25 in solve_problem above (you are welcome to change it).

This interpretation of the expected volume ratios is only valid when inverting from a data space that has the same dimension as the parameter space. When inverting into a higher-dimensional parameter space, the expected volume ratio is the expected volume of the cross-section of the inverse solution.
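
For intuition about the measure=True case (offered as a heuristic, not a statement about BET internals): for a square, invertible linear map $M$, the Lebesgue measure of an inverse image satisfies $\lambda(M^{-1}(A)) = \lambda(A)/|\det M|$, so the first-column value behaves like an average of $1/|\det M|$ over the gradient centers, where $M$ is the normalized Jacobian restricted to the candidate QoIs.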


At this point we have determined the optimal sets of QoIs to use in the inverse problem.
Now we compare the support of the inverse solution using different sets of these QoIs. We set Q_ref to correspond to the output from the parameter taken to be the center of the parameter space. Below we select which set of QoIs to consider: how many QoIs, and which ranked set to use:


In [6]:
############ MAKE SELECTION ############
num_qoi = 2 # select the number of quantities of interest
ranking_selection = 1 # select your choice (1st, 2nd, 3rd) best (start at 1)
########################################

QoI_indices = best_sets[num_qoi-2][ranking_selection-1, 1:].astype(int) # the chosen set of num_qoi QoIs
print("Your QoI sub-indices selection: ", QoI_indices)


Your QoI sub-indices selection:  [3 6]

In [7]:
# Create discretization object and solve problem
my_discretization = sample.discretization(input_sample_set=input_samples,
                                        output_sample_set=output_samples)

# Define the reference point in the output space to correspond to the center of the input space.
param_ref = 0.5 * np.ones(input_dim)
Q_ref = Q[QoI_indices, :].dot(param_ref)

num_samples, indices_in_inverse = solve_problem(my_discretization, Q_ref, QoI_indices,
                                                independent_error=independent_error,
                                                percentile=1.0)


The approximate percentage of the measure of the parameter space defined
by the support of the inverse density associated with the choice of QoI map is
7.1840%  with  7184  samples.

In [8]:
%store my_discretization
%store param_ref
%store Q_ref


Stored 'my_discretization' (discretization)
Stored 'param_ref' (ndarray)
Stored 'Q_ref' (ndarray)