This is an overview and tutorial about ServerSim, a framework for the creation of discrete event simulation models to analyze the performance, throughput, and scalability of services deployed on computer servers.
Following the overview of ServerSim, we will provide an example and tutorial of its use for the comparison of two major service deployment patterns.
This document is a Jupyter notebook. See http://jupyter.org/ for more information on Jupyter notebooks.
ServerSim is a small framework based on SimPy, a well-known discrete-event simulation framework written in the Python language. The reader should have at least a cursory familiarity with Python and SimPy (https://simpy.readthedocs.io/en/latest/contents.html) in order to make the most of this document.
Python is well-suited to this kind of application due to its dynamic-language characteristics, which support rapid development, and the availability of powerful libraries relevant to this kind of work. In addition to SimPy, we will use portions of SciPy, a powerful set of libraries for efficient data analysis and visualization that includes Matplotlib, which will be used for plotting graphs in our tutorial.
ServerSim consists of several classes and utilities. The main classes are described below.
Represents a server -- physical, VM, or container -- with a predetermined computation capacity. A server can execute arbitrary service request types. The computation capacity of a server is represented in terms of a number of hardware threads and a total speed (computation units processed per unit of time). The total speed is equally apportioned among the hardware threads, giving the speed per hardware thread. A server also has a number of associated software threads (which must be no smaller than the number of hardware threads). Software threads are relevant for blocking computations only.
The simulations in this document assume non-blocking services, so the software threads will not be of consequence in the tutorial example.
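As a minimal sketch (the constructor arguments mirror those used in the simulation code later in this document, and assume Server is exported by the serversim package), a server with 10 hardware threads, 20 software threads, and a total speed of 20 compute units per time unit -- i.e., 2 compute units per time unit per hardware thread -- could be created as follows:
import simpy
from serversim import Server

env = simpy.Environment()
# 10 hardware threads, 20 software threads, speed of 20 compute units per time unit
server = Server(env, 10, 20, 20, "AppServer_0")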
A request for execution of computation units on one or more servers.
A service request submission is implemented as a SimPy Process.
A service request can be a composite of sub-requests. A service request that is part of a composite has an attribute that is a reference to its parent service request. In addition, such a composition is implemented through the gen attribute of the service request, which is a generator. That generator can yield service request submissions for other service requests.
By default, a service request is non-blocking, i.e., a thread is held on the target server only while the service request itself is executing; the thread is relinquished when the request finishes executing and it passes control to its sub-requests. However, blocking service requests can be modeled as well (see the Blkg class).
Base class of service requesters.
A service requester represents a service. In this framework, a service requester is a factory for service requests. "Deploying" a service on a server is modeled by having service requests produced by the service requester sent to the target server.
A service requester can be a composite of sub-requesters, thus representing a composite service.
Represents a set of identical users or clients that submit service requests.
Each user repeatedly submits service requests produced by service requesters randomly selected from the set of service requesters specified for the group.
This is the core service requester implementation that interacts with servers to utilize server resources.
All service requesters are either instances of this class or composites of such instances created using the various service requester combinators in this module.
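As a sketch of how the pieces fit together (condensed from the full simulation function presented later in this document, whose constructor signatures it assumes, with an illustrative computation-units function), a CoreSvcRequester takes an environment, a service name, a computation-units generator, and a load-balancing function that maps a service name to a target server; a UserGroup then drives requests against it:
import random
import simpy
from serversim import Server, CoreSvcRequester, UserGroup

env = simpy.Environment()
servers = [Server(env, 10, 20, 20, "AppServer_%s" % i) for i in range(2)]

def comp_units():
    # Compute units demanded by each svc_1 request
    return random.uniform(0.2, 3.8)

def ld_bal(svc_name):
    # Trivial load balancer: pick a random server for any service
    return random.choice(servers)

svc_1 = CoreSvcRequester(env, "svc_1", comp_units, ld_bal)

# 100 identical users submitting svc_1 requests, with think times uniformly
# distributed between 2 and 10 time units
grp = UserGroup(env, 100, "UserTypeX", [(svc_1, 1.0)], 2.0, 10.0,
                (0.5, 0.95, 0.99), [])
grp.activate_users()
env.run(until=200)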
Following are other service requester classes (subclasses of SvcRequester), in addition to CoreSvcRequester, that can be used to define more complex services, including blocking services, asynchronous fire-and-forget services, sequentially dependent services, parallel service calls, and service continuations. These additional classes are not used in the simulations in this document.
Wraps a service requester to produce asynchronous fire-and-forget service requests.
An asynchronous service request completes and returns immediately to the parent request, while the underlying (child) service request is scheduled for execution on its target server.
Wraps a service requester to produce blocking service requests.
A blocking service request will hold a software thread on the target server until the service request itself and all of its non-asynchronous sub-requests complete.
Combines a non-empty list of service requesters to yield a sequential composite service requester.
This service requester produces composite service requests.
A composite service request produced by this service requester consists of a service request from each of the provided service requesters. Each of the service requests is submitted in sequence, i.e., each service request is submitted when the previous one completes.
Combines a non-empty list of service requesters to yield a parallel composite service requester.
This service requester produces composite service requests.
A composite service request produced by this service requester consists of a service request from each of the provided service requesters. All of the service requests are submitted concurrently.
When the attribute cont is True, this represents multi-threaded execution of requests on the same server. Otherwise, each service request can execute on a different server.
Below we compare two major service deployment patterns by using discrete-event simulations. Ideally, the reader will have had some prior exposure to the Python language in order to follow all the details. However, the concepts and conclusions should be understandable to readers with a software architecture or engineering background even if they are not familiar with Python.
We assume an application made up of multiple multi-threaded services and consider two deployment patterns: cookie-cutter deployment, where every service is deployed on every server, and individualized deployment, where each service is deployed on its own dedicated set of servers.
In the simulations below, the application is made up of just two services, to simplify the model and the analysis, but without loss of generality in terms of the main conclusions.
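In the simulation function defined below, the deployment pattern is expressed entirely through the ranges of server indices assigned to each service. For example (the calls mirror those used in the scenarios later in this document; the variable names here are just illustrative), the cookie-cutter pattern deploys both services on all servers, while the individualized pattern gives each service its own disjoint subset of servers:
# Cookie-cutter: svc_1 and svc_2 are both deployed on servers 0-9
sc_cookie_cutter = simulate_deployment_scenario(
    num_users=720, weight1=2, weight2=1,
    server_range1=range(0, 10), server_range2=range(0, 10))

# Individualized: svc_1 on servers 0-7, svc_2 on servers 8-9
sc_individualized = simulate_deployment_scenario(
    num_users=720, weight1=2, weight2=1,
    server_range1=range(0, 8), server_range2=range(8, 10))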
The code used in these simulations should be compatible with both Python 2.7 and Python 3.x.
Python and the following Python packages need to be installed on your computer: SimPy, Matplotlib, and livestats, as well as the typing package for Python 2.7 (these are the packages imported by the code below).
The model in this document should be run from the parent directory of the serversim package directory, which contains the source files for the ServerSim framework.
Following is the core function used in the simulations. This function will be called with different arguments to simulate different scenarios.
This function sets up a simulation with the following givens: each server has 10 hardware threads, 20 software threads, and a speed of 20 compute units per time unit; svc_1 requests average 2.0 compute units and svc_2 requests average 1.0 compute units (each uniformly distributed within plus or minus 90% of its average); user think times are uniformly distributed between 2 and 10 time units; the response time quantiles tracked are 0.5, 0.95, and 0.99; and the simulation runs for 200 time units.
We import the required libraries, as well as the __future__ import for compatibility between Python 2.7 and Python 3.x.
In [1]:
# %load simulate_deployment_scenario.py
from __future__ import print_function
from typing import List, Tuple, Sequence
from collections import namedtuple
import random
import simpy
from serversim import *
def simulate_deployment_scenario(num_users, weight1, weight2, server_range1,
server_range2):
# type: (int, float, float, Sequence[int], Sequence[int]) -> Result
Result = namedtuple("Result", ["num_users", "weight1", "weight2", "server_range1",
"server_range2", "servers", "grp"])
def cug(mid, delta):
"""Computation units generator"""
def f():
return random.uniform(mid - delta, mid + delta)
return f
def ld_bal(svc_name):
"""Application server load-balancer."""
if svc_name == "svc_1":
svr = random.choice(servers1)
elif svc_name == "svc_2":
svr = random.choice(servers2)
else:
assert False, "Invalid service type."
return svr
simtime = 200
hw_threads = 10
sw_threads = 20
speed = 20
svc_1_comp_units = 2.0
svc_2_comp_units = 1.0
quantiles = (0.5, 0.95, 0.99)
env = simpy.Environment()
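    # The server pool is sized to cover the union of the two server index ranges;
    # each service is then assigned the sub-list of servers it is deployed on.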
n_servers = max(server_range1[-1] + 1, server_range2[-1] + 1)
servers = [Server(env, hw_threads, sw_threads, speed, "AppServer_%s" % i)
for i in range(n_servers)]
servers1 = [servers[i] for i in server_range1]
servers2 = [servers[i] for i in server_range2]
svc_1 = CoreSvcRequester(env, "svc_1", cug(svc_1_comp_units,
svc_1_comp_units*.9), ld_bal)
svc_2 = CoreSvcRequester(env, "svc_2", cug(svc_2_comp_units,
svc_2_comp_units*.9), ld_bal)
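    # Service requesters and their relative weights: each user randomly picks a
    # requester from this list, with probability proportional to its weight.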
weighted_txns = [(svc_1, weight1),
(svc_2, weight2)
]
min_think_time = 2.0 # .5 # 4
max_think_time = 10.0 # 1.5 # 20
svc_req_log = [] # type: List[Tuple[str, SvcRequest]]
grp = UserGroup(env, num_users, "UserTypeX", weighted_txns, min_think_time,
max_think_time, quantiles, svc_req_log)
grp.activate_users()
env.run(until=simtime)
return Result(num_users=num_users, weight1=weight1, weight2=weight2,
server_range1=server_range1, server_range2=server_range2,
servers=servers, grp=grp)
In [2]:
# %load print_results.py
from __future__ import print_function
from typing import Sequence, Any, IO
from serversim import *
def print_results(num_users=None, weight1=None, weight2=None, server_range1=None,
server_range2=None, servers=None, grp=None, fi=None):
# type: (int, float, float, Sequence[int], Sequence[int], Sequence[Server], UserGroup, IO[str]) -> None
if fi is None:
import sys
fi = sys.stdout
print("\n\n***** Start Simulation --", num_users, ",", weight1, ",", weight2, ", [", server_range1[0], ",", server_range1[-1] + 1,
") , [", server_range2[0], ",", server_range2[-1] + 1, ") *****", file=fi)
print("Simulation: num_users =", num_users, file=fi)
print("<< ServerExample >>\n", file=fi)
indent = " " * 4
print("\n" + "Servers:", file=fi)
for svr in servers:
print(indent*1 + "Server:", svr.name, file=fi)
print(indent * 2 + "max_concurrency =", svr.max_concurrency, file=fi)
print(indent * 2 + "num_threads =", svr.num_threads, file=fi)
print(indent*2 + "speed =", svr.speed, file=fi)
print(indent * 2 + "avg_process_time =", svr.avg_process_time, file=fi)
print(indent * 2 + "avg_hw_queue_time =", svr.avg_hw_queue_time, file=fi)
print(indent * 2 + "avg_thread_queue_time =", svr.avg_thread_queue_time, file=fi)
print(indent * 2 + "avg_service_time =", svr.avg_service_time, file=fi)
print(indent * 2 + "avg_hw_queue_length =", svr.avg_hw_queue_length, file=fi)
print(indent * 2 + "avg_thread_queue_length =", svr.avg_thread_queue_length, file=fi)
print(indent * 2 + "hw_queue_length =", svr.hw_queue_length, file=fi)
print(indent * 2 + "hw_in_process_count =", svr.hw_in_process_count, file=fi)
print(indent * 2 + "thread_queue_length =", svr.thread_queue_length, file=fi)
print(indent * 2 + "thread_in_use_count =", svr.thread_in_use_count, file=fi)
print(indent*2 + "utilization =", svr.utilization, file=fi)
print(indent*2 + "throughput =", svr.throughput, file=fi)
print(indent*1 + "Group:", grp.name, file=fi)
print(indent*2 + "num_users =", grp.num_users, file=fi)
print(indent*2 + "min_think_time =", grp.min_think_time, file=fi)
print(indent*2 + "max_think_time =", grp.max_think_time, file=fi)
print(indent * 2 + "responded_request_count =", grp.responded_request_count(None), file=fi)
print(indent * 2 + "unresponded_request_count =", grp.unresponded_request_count(None), file=fi)
print(indent * 2 + "avg_response_time =", grp.avg_response_time(), file=fi)
print(indent * 2 + "std_dev_response_time =", grp.std_dev_response_time(None), file=fi)
print(indent*2 + "throughput =", grp.throughput(None), file=fi)
for svc in grp.svcs:
print(indent*2 + svc.svc_name + ":", file=fi)
print(indent * 3 + "responded_request_count =", grp.responded_request_count(svc), file=fi)
print(indent * 3 + "unresponded_request_count =", grp.unresponded_request_count(svc), file=fi)
print(indent * 3 + "avg_response_time =", grp.avg_response_time(svc), file=fi)
print(indent * 3 + "std_dev_response_time =", grp.std_dev_response_time(svc), file=fi)
print(indent*3 + "throughput =", grp.throughput(svc), file=fi)
The following three functions handle mini-batching, plotting, and comparison of results.
In [3]:
# %load report_resp_times.py
from typing import TYPE_CHECKING, Sequence, Tuple
import functools as ft
from collections import OrderedDict
import matplotlib.pyplot as plt
from livestats import livestats
if TYPE_CHECKING:
from serversim import UserGroup
def minibatch_resp_times(time_resolution, grp):
# type: (float, UserGroup) -> Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float], Sequence[float], Sequence[float]]
quantiles = [0.5, 0.95, 0.99]
xys = [(int(svc_req.time_dict["submitted"]/time_resolution),
svc_req.time_dict["completed"] - svc_req.time_dict["submitted"])
for (_, svc_req) in grp.svc_req_log
if svc_req.is_completed]
def ffold(map_, p):
x, y = p
if x not in map_:
map_[x] = livestats.LiveStats(quantiles)
map_[x].add(y)
return map_
xlvs = ft.reduce(ffold, xys, dict())
    xs = sorted(xlvs.keys())  # sorted list of time buckets (works on both Python 2.7 and 3.x)
counts = [xlvs[x].count for x in xs]
means = [xlvs[x].average for x in xs]
q_50 = [xlvs[x].quantiles()[0] for x in xs]
q_95 = [xlvs[x].quantiles()[1] for x in xs]
q_99 = [xlvs[x].quantiles()[2] for x in xs]
return xs, counts, means, q_50, q_95, q_99
def plot_counts_means_q95(quantiles1, quantiles2):
x = quantiles1[0] # should be same as quantiles2[0]
counts1 = quantiles1[1]
counts2 = quantiles2[1]
means1 = quantiles1[2]
means2 = quantiles2[2]
q1_95 = quantiles1[4]
q2_95 = quantiles2[4]
# Plot counts
plt.plot(x, counts1, color='b', label="Counts 1")
plt.plot(x, counts2, color='r', label="Counts 2")
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
plt.xlabel("Time buckets")
plt.ylabel("Throughput")
plt.show()
# Plot averages and 95th percentiles
plt.plot(x, means1, color='b', label="Means 1")
plt.plot(x, q1_95, color='c', label="95th Percentile 1")
plt.plot(x, means2, color='r', label="Means 2")
plt.plot(x, q2_95, color='m', label="95th Percentile 2")
# Hack to avoid duplicated labels (https://stackoverflow.com/questions/13588920/stop-matplotlib-repeating-labels-in-legend)
handles, labels = plt.gca().get_legend_handles_labels()
by_label = OrderedDict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(), bbox_to_anchor=(1.05, 1),
loc=2, borderaxespad=0.)
plt.xlabel("Time buckets")
plt.ylabel("Response times")
plt.show()
def compare_scenarios(sc1, sc2):
grp1 = sc1.grp
grp2 = sc2.grp
quantiles1 = minibatch_resp_times(5, grp1)
quantiles2 = minibatch_resp_times(5, grp2)
plot_counts_means_q95(quantiles1, quantiles2)
In [4]:
random.seed(123456)
Several simulation scenarios are executed below. See the descriptions of the parameters and hard-coded values of the core simulation function above.
With 10 servers, weight1 = 2, and weight2 = 1, this configuration supports 720 users with average response times close to the minimum possible. How did we arrive at that number? For svc_1, the heavier of the two services, the minimum possible average response time is 1 time unit (2 average service compute units divided by the per-thread speed of 20 server compute units / 10 hardware threads = 2 compute units per time unit). One server can handle 10 concurrent svc_1 users without think time, or 60 concurrent svc_1 users with an average think time of 6 time units. Thus, 10 servers can handle 600 concurrent svc_1 users. Doing the math for both services and taking their respective probabilities into account, the number of users is 720. For full details, see the spreadsheet CapacityPlanning.xlsx. Of course, due to randomness, there will be queuing and the average response times will be greater than the minimum possible. With these numbers, the servers will be running hot, as there is no planned slack capacity.
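As a quick back-of-the-envelope check of the 720 figure (a sketch that, like the reasoning above, approximates each user's request rate as one request per average think time of 6 time units):
num_servers = 10
server_speed = 20.0                  # compute units per time unit per server
avg_think_time = (2.0 + 10.0) / 2    # uniform(2, 10) -> 6 time units

# (weight, average compute units per request) for each service
svcs = {"svc_1": (2.0, 2.0), "svc_2": (1.0, 1.0)}
total_weight = sum(w for w, _ in svcs.values())
weighted_units = sum(w * u for w, u in svcs.values())

# Total capacity of the server pool, in compute units per time unit
capacity = num_servers * server_speed

# Each user submits roughly one request per think time, demanding on average
# weighted_units / total_weight compute units per request.
max_users = capacity * avg_think_time * total_weight / weighted_units
print(max_users)   # -> 720.0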
In [5]:
sc1 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
server_range1=range(0, 10), server_range2=range(0, 10))
print_results(**sc1._asdict())  # _asdict() works for namedtuples on both Python 2.7 and 3.x
In [6]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Repeating the above comparison to illustrate the variability of the results.
In [7]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: The results of the two deployment strategies are similar in terms of throughput, mean response times, and 95th percentile response times. This is as would be expected, since the capacities allocated under the individualized deployment strategy are proportional to the respective service loads.
In [8]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=5, weight2=1,
server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=5, weight2=1,
server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: The cookie-cutter deployment strategy was able to absorb the change in load mix while the individualized strategy was not, showing visibly lower throughput and higher mean and 95th percentile response times.
In [9]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: Again, the cookie-cutter deployment strategy was able to absorb the change in load mix while the individualized strategy was not, showing visibly lower throughput and higher mean and 95th percentile response times. Notice that, due to the changed load mix, the total load was lower than before; with the same number of servers, the cookie-cutter configuration had excess capacity overall, while the individualized configuration had excess capacity for svc_1 and insufficient capacity for svc_2.
We now continue with the weights used in Simulation 3, but adjust server capacity to account for the lower aggregate load and different load mix.
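The adjusted capacity can be derived with the same back-of-the-envelope arithmetic as before (a sketch, again approximating each user's request rate as one request per average think time):
from __future__ import print_function

num_users = 720
server_speed = 20.0
avg_think_time = 6.0

# (weight, average compute units per request); weights are now 1:1
svcs = {"svc_1": (1.0, 2.0), "svc_2": (1.0, 1.0)}
total_weight = sum(w for w, _ in svcs.values())

for name in sorted(svcs):
    weight, units = svcs[name]
    demand = num_users * weight * units / total_weight / avg_think_time
    print(name, "->", demand / server_speed, "servers")
# svc_1 -> 6.0 servers, svc_2 -> 3.0 servers, i.e., 9 servers in total
Scenario 2b below matches this ideal 6 + 3 split, while Scenario 2a's 7 + 2 split does not.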
Below we have three scenarios: Scenario 1 uses the cookie-cutter strategy with 9 servers shared by both services; Scenario 2a uses the individualized strategy with 7 servers for svc_1 and 2 for svc_2; and Scenario 2b uses the individualized strategy with 6 servers for svc_1 and 3 for svc_2.
Run the three scenarios:
In [10]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
server_range1=range(0, 9), server_range2=range(0, 9))
random.setstate(rand_state)
sc2a = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
server_range1=range(0, 7), server_range2=range(7, 9))
random.setstate(rand_state)
sc2b = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
server_range1=range(0, 6), server_range2=range(6, 9))
Compare the results of scenarios 1 and 2a:
In [11]:
compare_scenarios(sc1, sc2a)
Compare the results of scenarios 1 and 2b:
In [12]:
compare_scenarios(sc1, sc2b)
Conclusions: Scenario 1 performs significantly better than Scenario 2a and comparably to Scenario 2b. This simulation shows again that the cookie-cutter strategy is comparable in performance and throughput to a tuned individualized configuration, and beats hands down an individualized configuration that is not perfectly tuned for the load mix.
In [13]:
# Number of users over time: (simulation time, number of users) pairs
users_curve = [(0, 900), (50, 540), (100, 900), (150, 540)]
In [14]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=users_curve, weight1=2, weight2=1,
server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=users_curve, weight1=2, weight2=1,
server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: The cookie-cutter and individualized strategies produced similar results.
We now run a simulation similar to Simulation 4, with the difference that the number of users varies over time. This combines load variability over time as well as a change in load mix. As in Simulation 4, we adjust server capacity to account for the lower aggregate load and different load mix.
Below we have three scenarios: Scenario 1 uses the cookie-cutter strategy with 9 servers shared by both services; Scenario 2a uses the individualized strategy with 7 servers for svc_1 and 2 for svc_2; and Scenario 2b uses the individualized strategy with 6 servers for svc_1 and 3 for svc_2.
Run the three scenarios:
In [15]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=users_curve, weight1=1, weight2=1,
server_range1=range(0, 9), server_range2=range(0, 9))
random.setstate(rand_state)
sc2a = simulate_deployment_scenario(num_users=users_curve, weight1=1, weight2=1,
server_range1=range(0, 7), server_range2=range(7, 9))
random.setstate(rand_state)
sc2b = simulate_deployment_scenario(num_users=users_curve, weight1=1, weight2=1,
server_range1=range(0, 6), server_range2=range(6, 9))
Compare the results of scenarios 1 and 2a:
In [16]:
compare_scenarios(sc1, sc2a)
Compare the results of scenarios 1 and 2b:
In [17]:
compare_scenarios(sc1, sc2b)
Conclusions: Scenario 1 performs significantly better than Scenario 2a and comparably to Scenario 2b. This simulation shows again that the cookie-cutter strategy is comparable in performance and throughput to a tuned individualized configuration, and beats an individualized configuration that is not perfectly tuned for the load mix.
This final simulation is similar to Simulation 1, with the difference that the number of users is 864 instead of 720. In this scenario, the total number of servers required for best capacity utilization can be calculated to be 12 (see CapacityPlanning.xlsx). Under the individualized deployment strategy, the ideal numbers of servers allocated to svc_1 and svc_2 would be 9.6 and 2.4, respectively. Since the number of servers needs to be an integer, we will run simulations with server allocations to svc_1 and svc_2, respectively, of 10 and 2, 9 and 3, and 10 and 3.
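The 12-server total and the 9.6 / 2.4 split quoted above can be reproduced with the same back-of-the-envelope arithmetic (a sketch):
from __future__ import print_function

num_users = 864
server_speed = 20.0
avg_think_time = 6.0

# (weight, average compute units per request); weights are 2:1 as in Simulation 1
svcs = {"svc_1": (2.0, 2.0), "svc_2": (1.0, 1.0)}
total_weight = sum(w for w, _ in svcs.values())

for name in sorted(svcs):
    weight, units = svcs[name]
    demand = num_users * weight * units / total_weight / avg_think_time
    print(name, "->", demand / server_speed, "servers")
# svc_1 -> 9.6 servers, svc_2 -> 2.4 servers, i.e., 12 servers in total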
Thus, we have five scenarios: Scenario 1a uses the cookie-cutter strategy with 12 servers; Scenario 2a1 uses the individualized strategy with 9 servers for svc_1 and 3 for svc_2; Scenario 2a2 uses the individualized strategy with 10 servers for svc_1 and 2 for svc_2; Scenario 1b uses the cookie-cutter strategy with 13 servers; and Scenario 2b uses the individualized strategy with 10 servers for svc_1 and 3 for svc_2 (13 servers in total).
Run the scenarios:
In [18]:
rand_state = random.getstate()
sc1a = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
server_range1=range(0, 12), server_range2=range(0, 12))
random.setstate(rand_state)
sc2a1 = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
server_range1=range(0, 9), server_range2=range(9, 12))
random.setstate(rand_state)
sc2a2 = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
server_range1=range(0, 10), server_range2=range(10, 12))
random.setstate(rand_state)
sc1b = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
server_range1=range(0, 13), server_range2=range(0, 13))
random.setstate(rand_state)
sc2b = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
server_range1=range(0, 10), server_range2=range(10, 13))
Compare the results of scenarios 1a and 2a1:
In [19]:
compare_scenarios(sc1a, sc2a1)
Compare the results of scenarios 1a and 2a2:
In [20]:
compare_scenarios(sc1a, sc2a2)
Compare the results of scenarios 1b and 2b:
In [21]:
compare_scenarios(sc1b, sc2b)
Conclusions: Scenario 1a has comparable throughput but somewhat better response times than Scenario 2a1. Scenario 1a has somewhat better throughput and response times than Scenario 2a2. Scenario 1b has comparable throughput and a bit less extreme response times than Scenario 2b. In all three comparisons, the cookie-cutter strategy performs better than or comparably to the individualized strategy.
The various simulations show consistently that the cookie-cutter strategy is comparable in performance and throughput (and therefore hardware utilization) to a tuned individualized configuration, and beats an individualized configuration that is not well-tuned for the load mix. Cookie-cutter thus proves to be a more robust and stable deployment strategy in many realistic situations, in the face of likely load mix fluctuations, mismatches between forecast average load mixes and actual average load mixes, and mismatches between forecast load mixes and allocated server capacities. However, although not highlighted on the simulation graphs presented, it is a fact (that can be observed in the simulation logs) that response times for svc_2 are better under a well-tuned individualized configuration because then svc_2 requests don't have to share a queue with longer-running svc_1 requests. When that's an important consideration, an individualized deployment strategy could be a more appropriate choice.