This document is a Jupyter notebook. See http://jupyter.org/ for more information on Jupyter notebooks.
This is a tutorial for ServerSim, a framework for the creation of discrete event simulation models to analyze the performance, throughput, and scalability of services deployed on computer servers.
Software engineers and architects not familiar with Python may want to jump directly to the Simulations section, skip the code, and focus on the simulation definitions and conclusions.
Below we compare two major service deployment patterns using discrete-event simulations. Ideally the reader will have had some prior exposure to the Python language in order to follow all the details. However, the concepts and conclusions should be understandable to readers with a software architecture or engineering background even if they are not familiar with Python.
We assume an application made up of multiple multi-threaded services and consider two deployment patterns: a cookie-cutter pattern, in which every service is deployed to every server so that all servers are identical and share the overall load, and an individualized pattern, in which each service is deployed to its own dedicated set of servers sized in proportion to that service's expected load.
In the simulations below, the application is made up of just two services, to simplify the model and the analysis, but without loss of generality in terms of the main conclusions.
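To make the two patterns concrete, the sketch below (illustrative only, not part of the ServerSim code) shows how they map onto the server_range1 and server_range2 arguments of the simulation function defined later: cookie-cutter uses identical, fully overlapping server ranges for both services, while individualized uses disjoint ranges, one dedicated subset of servers per service.

```python
# Illustrative only: how the two deployment patterns are expressed through the
# server_range1/server_range2 arguments of simulate_deployment_scenario (defined below).
n_servers = 10

# Cookie-cutter: both services are deployed on every server.
cookie_cutter = dict(server_range1=range(0, n_servers),
                     server_range2=range(0, n_servers))

# Individualized: each service gets its own dedicated subset of servers.
individualized = dict(server_range1=range(0, 8),   # svc_1 on servers 0..7
                      server_range2=range(8, 10))  # svc_2 on servers 8..9
```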
The code used in these simulations should be compatible with both Python 2.7 and Python 3.x.
Python and the following Python packages, imported in the code cells below, need to be installed on your computer: simpy, matplotlib, livestats, and typing.
The model in this document should be run from the parent directory of the serversim
package directory, which contains the source files for the ServerSim framework.
Following is the core function used in the simulations. This function will be called with different arguments to simulate different scenarios.
This function sets up a simulation with the following givens: a simulation time of 200 time units; servers with 10 hardware threads, 20 software threads, and a speed of 20 compute units per time unit; svc_1 requests requiring 2.0 compute units on average and svc_2 requests requiring 1.0 compute unit on average (each uniformly distributed within plus or minus 90% of its mean); user think times uniformly distributed between 2 and 10 time units; and response time quantiles of 0.5, 0.95, and 0.99.
We import the required libraries, as well as the __future__
import for compatibility between Python 2.7 and Python 3.x.
In [1]:
# %load simulate_deployment_scenario.py
from __future__ import print_function
from typing import List, Tuple, Sequence
from collections import namedtuple
import random

import simpy

from serversim import *


def simulate_deployment_scenario(num_users, weight1, weight2, server_range1,
                                 server_range2):
    # type: (int, float, float, Sequence[int], Sequence[int]) -> Result
    """Set up and run one deployment scenario and return its results.

    Parameters:
        num_users: number of users in the user group (the later simulations
            also pass a list of (time, num_users) pairs to vary the number
            of users over time).
        weight1: relative frequency of svc_1 requests.
        weight2: relative frequency of svc_2 requests.
        server_range1: indices of the servers on which svc_1 is deployed.
        server_range2: indices of the servers on which svc_2 is deployed.
    """

    Result = namedtuple("Result", ["num_users", "weight1", "weight2", "server_range1",
                                   "server_range2", "servers", "grp"])

    def cug(mid, delta):
        """Computation units generator"""
        def f():
            return random.uniform(mid - delta, mid + delta)
        return f

    def ld_bal(svc_name):
        """Application server load-balancer."""
        if svc_name == "svc_1":
            svr = random.choice(servers1)
        elif svc_name == "svc_2":
            svr = random.choice(servers2)
        else:
            assert False, "Invalid service type."
        return svr

    simtime = 200
    hw_threads = 10
    sw_threads = 20
    speed = 20
    svc_1_comp_units = 2.0
    svc_2_comp_units = 1.0
    quantiles = (0.5, 0.95, 0.99)

    env = simpy.Environment()

    n_servers = max(server_range1[-1] + 1, server_range2[-1] + 1)
    servers = [Server(env, hw_threads, sw_threads, speed, "AppServer_%s" % i)
               for i in range(n_servers)]
    servers1 = [servers[i] for i in server_range1]
    servers2 = [servers[i] for i in server_range2]

    svc_1 = CoreSvcRequester(env, "svc_1", cug(svc_1_comp_units,
                                               svc_1_comp_units*.9), ld_bal)
    svc_2 = CoreSvcRequester(env, "svc_2", cug(svc_2_comp_units,
                                               svc_2_comp_units*.9), ld_bal)

    weighted_txns = [(svc_1, weight1),
                     (svc_2, weight2)
                     ]

    min_think_time = 2.0  # .5 # 4
    max_think_time = 10.0  # 1.5 # 20
    svc_req_log = []  # type: List[Tuple[str, SvcRequest]]

    grp = UserGroup(env, num_users, "UserTypeX", weighted_txns, min_think_time,
                    max_think_time, quantiles, svc_req_log)
    grp.activate_users()

    env.run(until=simtime)

    return Result(num_users=num_users, weight1=weight1, weight2=weight2,
                  server_range1=server_range1, server_range2=server_range2,
                  servers=servers, grp=grp)
In [2]:
# %load print_results.py
from __future__ import print_function
from typing import Sequence, Any, IO

from serversim import *


def print_results(num_users=None, weight1=None, weight2=None, server_range1=None,
                  server_range2=None, servers=None, grp=None, fi=None):
    # type: (int, float, float, Sequence[int], Sequence[int], Sequence[Server], UserGroup, IO[str]) -> None

    if fi is None:
        import sys
        fi = sys.stdout

    print("\n\n***** Start Simulation --", num_users, ",", weight1, ",", weight2,
          ", [", server_range1[0], ",", server_range1[-1] + 1,
          ") , [", server_range2[0], ",", server_range2[-1] + 1, ") *****", file=fi)
    print("Simulation: num_users =", num_users, file=fi)
    print("<< ServerExample >>\n", file=fi)

    indent = " " * 4

    print("\n" + "Servers:", file=fi)
    for svr in servers:
        print(indent * 1 + "Server:", svr.name, file=fi)
        print(indent * 2 + "max_concurrency =", svr.max_concurrency, file=fi)
        print(indent * 2 + "num_threads =", svr.num_threads, file=fi)
        print(indent * 2 + "speed =", svr.speed, file=fi)
        print(indent * 2 + "avg_process_time =", svr.avg_process_time, file=fi)
        print(indent * 2 + "avg_hw_queue_time =", svr.avg_hw_queue_time, file=fi)
        print(indent * 2 + "avg_thread_queue_time =", svr.avg_thread_queue_time, file=fi)
        print(indent * 2 + "avg_service_time =", svr.avg_service_time, file=fi)
        print(indent * 2 + "avg_hw_queue_length =", svr.avg_hw_queue_length, file=fi)
        print(indent * 2 + "avg_thread_queue_length =", svr.avg_thread_queue_length, file=fi)
        print(indent * 2 + "hw_queue_length =", svr.hw_queue_length, file=fi)
        print(indent * 2 + "hw_in_process_count =", svr.hw_in_process_count, file=fi)
        print(indent * 2 + "thread_queue_length =", svr.thread_queue_length, file=fi)
        print(indent * 2 + "thread_in_use_count =", svr.thread_in_use_count, file=fi)
        print(indent * 2 + "utilization =", svr.utilization, file=fi)
        print(indent * 2 + "throughput =", svr.throughput, file=fi)

    print(indent * 1 + "Group:", grp.name, file=fi)
    print(indent * 2 + "num_users =", grp.num_users, file=fi)
    print(indent * 2 + "min_think_time =", grp.min_think_time, file=fi)
    print(indent * 2 + "max_think_time =", grp.max_think_time, file=fi)
    print(indent * 2 + "responded_request_count =", grp.responded_request_count(None), file=fi)
    print(indent * 2 + "unresponded_request_count =", grp.unresponded_request_count(None), file=fi)
    print(indent * 2 + "avg_response_time =", grp.avg_response_time(), file=fi)
    print(indent * 2 + "std_dev_response_time =", grp.std_dev_response_time(None), file=fi)
    print(indent * 2 + "throughput =", grp.throughput(None), file=fi)

    for svc in grp.svcs:
        print(indent * 2 + svc.svc_name + ":", file=fi)
        print(indent * 3 + "responded_request_count =", grp.responded_request_count(svc), file=fi)
        print(indent * 3 + "unresponded_request_count =", grp.unresponded_request_count(svc), file=fi)
        print(indent * 3 + "avg_response_time =", grp.avg_response_time(svc), file=fi)
        print(indent * 3 + "std_dev_response_time =", grp.std_dev_response_time(svc), file=fi)
        print(indent * 3 + "throughput =", grp.throughput(svc), file=fi)
The following three functions handle mini-batching, plotting, and comparison of results.
In [3]:
# %load report_resp_times.py
from typing import TYPE_CHECKING, Sequence, Tuple
import functools as ft
from collections import OrderedDict

import matplotlib.pyplot as plt
from livestats import livestats

if TYPE_CHECKING:
    from serversim import UserGroup


def minibatch_resp_times(time_resolution, grp):
    # type: (float, UserGroup) -> Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float], Sequence[float], Sequence[float]]
    quantiles = [0.5, 0.95, 0.99]

    # Bucket each completed request by its submission time and record its response time.
    xys = [(int(svc_req.time_dict["submitted"]/time_resolution),
            svc_req.time_dict["completed"] - svc_req.time_dict["submitted"])
           for (_, svc_req) in grp.svc_req_log
           if svc_req.is_completed]

    def ffold(map_, p):
        x, y = p
        if x not in map_:
            map_[x] = livestats.LiveStats(quantiles)
        map_[x].add(y)
        return map_

    xlvs = ft.reduce(ffold, xys, dict())

    xs = sorted(xlvs.keys())  # sorted() rather than list.sort(), for Python 3 compatibility

    counts = [xlvs[x].count for x in xs]
    means = [xlvs[x].average for x in xs]
    q_50 = [xlvs[x].quantiles()[0] for x in xs]
    q_95 = [xlvs[x].quantiles()[1] for x in xs]
    q_99 = [xlvs[x].quantiles()[2] for x in xs]

    return xs, counts, means, q_50, q_95, q_99


def plot_counts_means_q95(quantiles1, quantiles2):

    x = quantiles1[0]  # should be same as quantiles2[0]

    counts1 = quantiles1[1]
    counts2 = quantiles2[1]

    means1 = quantiles1[2]
    means2 = quantiles2[2]

    q1_95 = quantiles1[4]
    q2_95 = quantiles2[4]

    # Plot counts
    plt.plot(x, counts1, color='b', label="Counts 1")
    plt.plot(x, counts2, color='r', label="Counts 2")
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    plt.xlabel("Time buckets")
    plt.ylabel("Throughput")
    plt.show()

    # Plot averages and 95th percentiles
    plt.plot(x, means1, color='b', label="Means 1")
    plt.plot(x, q1_95, color='c', label="95th Percentile 1")
    plt.plot(x, means2, color='r', label="Means 2")
    plt.plot(x, q2_95, color='m', label="95th Percentile 2")

    # Hack to avoid duplicated labels (https://stackoverflow.com/questions/13588920/stop-matplotlib-repeating-labels-in-legend)
    handles, labels = plt.gca().get_legend_handles_labels()
    by_label = OrderedDict(zip(labels, handles))
    plt.legend(by_label.values(), by_label.keys(), bbox_to_anchor=(1.05, 1),
               loc=2, borderaxespad=0.)

    plt.xlabel("Time buckets")
    plt.ylabel("Response times")
    plt.show()


def compare_scenarios(sc1, sc2):
    grp1 = sc1.grp
    grp2 = sc2.grp

    quantiles1 = minibatch_resp_times(5, grp1)
    quantiles2 = minibatch_resp_times(5, grp2)

    plot_counts_means_q95(quantiles1, quantiles2)
In [4]:
random.seed(123456)
Several simulation scenarios are executed below. See the descriptions of the parameters and hard-coded given values of the core simulation function above.
With 10 servers, weight1 = 2, and weight2 = 1, this configuration supports 720 users with average response times close to the minimum possible. How did we arrive at that number? For svc_1, the heavier of the two services, the minimum possible average response time is 1 time unit: 2 compute units per svc_1 request divided by a per-thread speed of 2 compute units per time unit (20 compute units per time unit per server spread over 10 hardware threads). One server can therefore handle 10 concurrent svc_1 users with no think time or, approximating each user as submitting one request per think time, about 60 concurrent svc_1 users with an average think time of 6 time units. Thus, 10 servers can handle 600 concurrent svc_1 users. Doing the same math for both services and weighting them by their respective request probabilities (2/3 for svc_1, 1/3 for svc_2), the supported number of users is 720. For full details, see the spreadsheet CapacityPlanning.xlsx. Of course, due to randomness, there will be queuing and the average response times will be greater than the minimum possible. With these numbers, the servers will be running hot, as there is no planned slack capacity.
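The back-of-the-envelope capacity calculation above can be checked with a few lines of Python. The helper below is not part of ServerSim; it is a rough sketch that ignores queueing and assumes think time dominates response time, so it reproduces the idealized numbers from the narrative (and from CapacityPlanning.xlsx) rather than the simulated ones.

```python
# Idealized capacity estimate (illustrative only; ignores queueing and response time).
def supported_users(n_servers, speed, weight1, weight2, comp_units1, comp_units2,
                    avg_think_time):
    p1 = float(weight1) / (weight1 + weight2)        # fraction of requests that are svc_1
    p2 = float(weight2) / (weight1 + weight2)        # fraction of requests that are svc_2
    avg_units = p1 * comp_units1 + p2 * comp_units2  # average compute units per request
    req_capacity = n_servers * speed / avg_units     # requests per time unit the servers can absorb
    return req_capacity * avg_think_time             # each user submits ~1 request per think time

print(supported_users(10, 20, 2, 1, 2.0, 1.0, 6))    # -> 720.0
```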
In [5]:
sc1 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
                                   server_range1=range(0, 10), server_range2=range(0, 10))
print_results(**sc1._asdict())  # _asdict() works for namedtuples on both Python 2.7 and 3.x
In [6]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
                                   server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
                                   server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
We repeat the above comparison to illustrate the variability of the results from run to run.
In [7]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
                                   server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=2, weight2=1,
                                   server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: The results of the two deployment strategies are similar in terms of throughput, mean response times, and 95th percentile response times. This is as would be expected, since the capacities allocated under the individualized deployment strategy are proportional to the respective service loads.
In [8]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=5, weight2=1,
                                   server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=5, weight2=1,
                                   server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: The cookie-cutter deployment strategy was able to absorb the change in load mix, while the individualized strategy was not: the individualized configuration showed visibly lower throughput and higher mean and 95th percentile response times.
In [9]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
                                   server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
                                   server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: Again the cookie-cutter deployment strategy was able to absorb the change in load mix, while the individualized strategy was not, showing visibly lower throughput and higher mean and 95th percentile response times. Notice that, due to the changed load mix, the total load was lower than before; with the same number of servers, the cookie-cutter configuration had excess capacity overall, while the individualized configuration had excess capacity for svc_1 and insufficient capacity for svc_2.
We now continue with the weights used in Simulation 3, but adjust server capacity to account for the lower aggregate load and different load mix.
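As a sanity check on the capacity adjustment, we can redo the idealized arithmetic from earlier (same simplifying assumptions as the supported_users sketch above, ignoring queueing) for 720 users with equal weights:

```python
# Rough sizing for 720 users with weight1 = weight2 = 1 (illustrative only).
req_rate = 720 / 6.0                         # total requests per time unit (average think time 6)
svc_1_servers = (req_rate * 0.5 * 2.0) / 20  # half the requests, 2 compute units each
svc_2_servers = (req_rate * 0.5 * 1.0) / 20  # half the requests, 1 compute unit each
print("%.1f %.1f %.1f" % (svc_1_servers, svc_2_servers,
                          svc_1_servers + svc_2_servers))  # -> 6.0 3.0 9.0
```

So about 9 servers suffice in total, with an ideal individualized split of 6 servers for svc_1 and 3 for svc_2; this is the well-tuned split used in Scenario 2b below, while Scenario 2a's 7 + 2 split is not tuned to the load mix.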
Below we have three scenarios: Scenario 1, cookie-cutter with 9 servers shared by both services; Scenario 2a, individualized with 7 servers for svc_1 and 2 servers for svc_2; and Scenario 2b, individualized with 6 servers for svc_1 and 3 servers for svc_2.
Run the three scenarios:
In [10]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
                                   server_range1=range(0, 9), server_range2=range(0, 9))
random.setstate(rand_state)
sc2a = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
                                    server_range1=range(0, 7), server_range2=range(7, 9))
random.setstate(rand_state)
sc2b = simulate_deployment_scenario(num_users=720, weight1=1, weight2=1,
                                    server_range1=range(0, 6), server_range2=range(6, 9))
Compare the results of scenarios 1 and 2a:
In [11]:
compare_scenarios(sc1, sc2a)
Compare the results of scenarios 1 and 2b:
In [12]:
compare_scenarios(sc1, sc2b)
Conclusions: Scenario 1 performs significantly better than Scenario 2a and comparably to Scenario 2b. This simulation shows again that the cookie-cutter strategy is comparable in performance and throughput to a well-tuned individualized configuration, and beats hands-down an individualized configuration that is not perfectly tuned for the load mix.

We now run a simulation similar to Simulation 1, with the difference that the number of users varies over time according to the users_curve defined below.
In [13]:
users_curve = [(0, 900), (50, 540), (100, 900), (150, 540)]  # (time, num_users) pairs: the user population alternates between 900 and 540
In [14]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=users_curve, weight1=2, weight2=1,
                                   server_range1=range(0, 10), server_range2=range(0, 10))
random.setstate(rand_state)
sc2 = simulate_deployment_scenario(num_users=users_curve, weight1=2, weight2=1,
                                   server_range1=range(0, 8), server_range2=range(8, 10))
compare_scenarios(sc1, sc2)
Conclusions: The cookie-cutter and individualized strategies produced similar results.
We now run a simulation similar to Simulation 4, with the difference that the number of users varies over time. This combines load variability over time as well as a change in load mix. As in Simulation 4, we adjust server capacity to account for the lower aggregate load and different load mix.
Below we have three scenarios: Scenario 1, cookie-cutter with 9 servers shared by both services; Scenario 2a, individualized with 7 servers for svc_1 and 2 servers for svc_2; and Scenario 2b, individualized with 6 servers for svc_1 and 3 servers for svc_2.
Run the three scenarios:
In [15]:
rand_state = random.getstate()
sc1 = simulate_deployment_scenario(num_users=users_curve, weight1=1, weight2=1,
                                   server_range1=range(0, 9), server_range2=range(0, 9))
random.setstate(rand_state)
sc2a = simulate_deployment_scenario(num_users=users_curve, weight1=1, weight2=1,
                                    server_range1=range(0, 7), server_range2=range(7, 9))
random.setstate(rand_state)
sc2b = simulate_deployment_scenario(num_users=users_curve, weight1=1, weight2=1,
                                    server_range1=range(0, 6), server_range2=range(6, 9))
Compare the results of scenarios 1 and 2a:
In [16]:
compare_scenarios(sc1, sc2a)
Compare the results of scenarios 1 and 2b:
In [17]:
compare_scenarios(sc1, sc2b)
Conclusions: Scenario 1 performs significantly better than Scenario 2a and comparably to Scenario 2b. This simulation shows again that the cookie-cutter strategy is comparable in performance and throughput to a tuned individualized configuration, and beats an individualized configuration that is not perfectly tuned for the load mix.
This final simulation is similar to Simulation 1, with the difference that the number of users is 864 instead of 720. In this scenario, the total number of servers required for best capacity utilization can be calculated to be 12 (see CapacityPlanning.xlsx). Under the individualized deployment strategy, the ideal numbers of servers allocated to svc_1 and svc_2 would be 9.6 and 2.4, respectively. Since the number of servers needs to be an integer, we will run simulations with server allocations to svc_1 and svc_2, respectively, of 10 and 2, 9 and 3, and 10 and 3.
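The 12-server total and the fractional 9.6/2.4 split can be reproduced with the same idealized arithmetic used earlier (illustrative only, ignoring queueing; not a ServerSim calculation):

```python
# Rough sizing for 864 users with a 2:1 weight mix (illustrative only).
req_rate = 864 / 6.0                             # requests per time unit (average think time 6)
svc_1_servers = (req_rate * 2 / 3.0 * 2.0) / 20  # two thirds of requests, 2 compute units each
svc_2_servers = (req_rate * 1 / 3.0 * 1.0) / 20  # one third of requests, 1 compute unit each
print("%.1f %.1f %.1f" % (svc_1_servers, svc_2_servers,
                          svc_1_servers + svc_2_servers))  # -> 9.6 2.4 12.0
```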
Thus, we have five scenarios: Scenario 1a, cookie-cutter with 12 shared servers; Scenario 2a1, individualized with 9 servers for svc_1 and 3 servers for svc_2; Scenario 2a2, individualized with 10 servers for svc_1 and 2 servers for svc_2; Scenario 1b, cookie-cutter with 13 shared servers; and Scenario 2b, individualized with 10 servers for svc_1 and 3 servers for svc_2.
Run the scenarios:
In [18]:
rand_state = random.getstate()
sc1a = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
                                    server_range1=range(0, 12), server_range2=range(0, 12))
random.setstate(rand_state)
sc2a1 = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
                                     server_range1=range(0, 9), server_range2=range(9, 12))
random.setstate(rand_state)
sc2a2 = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
                                     server_range1=range(0, 10), server_range2=range(10, 12))
random.setstate(rand_state)
sc1b = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
                                    server_range1=range(0, 13), server_range2=range(0, 13))
random.setstate(rand_state)
sc2b = simulate_deployment_scenario(num_users=864, weight1=2, weight2=1,
                                    server_range1=range(0, 10), server_range2=range(10, 13))
Compare the results of scenarios 1a and 2a1:
In [19]:
compare_scenarios(sc1a, sc2a1)
Compare the results of scenarios 1a and 2a2:
In [20]:
compare_scenarios(sc1a, sc2a2)
Compare the results of scenarios 1b and 2b:
In [21]:
compare_scenarios(sc1b, sc2b)
Conclusions: Scenario 1a has comparable throughput but somewhat better response times than Scenario 2a1. Scenario 1a has somewhat better throughput and response times than Scenario 2a2. Scenario 1b has comparable throughput and a bit less extreme response times than Scenario 2b. In all three comparisons, the cookie-cutter strategy performs better than or comparably to the individualized strategy.
The various simulations show consistently that the cookie-cutter strategy is comparable in performance and throughput (and therefore hardware utilization) to a well-tuned individualized configuration, and beats an individualized configuration that is not well-tuned for the load mix. Cookie-cutter thus proves to be the more robust and stable deployment strategy in many realistic situations, in the face of likely load mix fluctuations, mismatches between forecast and actual average load mixes, and mismatches between forecast load mixes and allocated server capacities. However, although not highlighted in the graphs presented, the simulation logs show that response times for svc_2 are better under a well-tuned individualized configuration, because svc_2 requests then do not have to share a queue with longer-running svc_1 requests. When that is an important consideration, an individualized deployment strategy may be the more appropriate choice.