
deeanop/Close-Workload-Infrastructure-Simulation


A Computing Infrastructure with a Closed Workload

tags: closed, single class, Delay/Queue, JSIMg.

In this section we describe a model of a computing infrastructure with a closed workload (see Sect. 1.2), solved with the simulation technique. The main characteristic of this type of workload is that the number of customers in execution is constant: a new customer enters the system when a customer completes its execution. Under the assumptions made, this model could also be solved analytically with JMVA. However, we have used the simulation technique to provide a first simple example of implementing a model with a simulator. Furthermore, it should be noted that simulation is by far the most popular modeling technique used in performance engineering. Indeed, simulators are very powerful tools, and the set of models they can implement is practically unlimited, given the great generality they offer in terms of the characteristics of the systems and the types of assumptions that can be represented.

2.2.1 Problem Description

A computing infrastructure, located in a large data center, is used to execute applications that are critical to the company's business. This infrastructure adopts very strict security techniques to control access, which is reserved to a limited number of authorized employees. It mainly consists of three servers: a Web Server (WS) and two servers (AS1 and AS2) dedicated to the Application and Storage functions, see Fig. 2.6a.

Fig. 2.6 The computing infrastructure considered (a) and the corresponding queueing network (b)

Due to the apps executed, the resource requirements of the user requests are similar, i.e., the workload is single-class. The Service times of the three servers have different mean values and are assumed to be exponentially distributed. The probabilities (i.e., the routing probabilities) that the requests leaving the Web Server are routed to servers AS1 and AS2 are known. In some problems, instead of the routing probabilities, the visits that a request makes to each resource during its execution are known. These two sets of values are related to each other, and deriving one set from the other requires knowing the topology of the network. Appendix A.1 describes how to obtain the relationships between the routing probabilities and the visits for the topology considered in Fig. 2.6b. Assuming that a request has been completely executed when it leaves the model, i.e., that V0 = 1, we have:

$$V_{WS} = \frac{1}{p_0} = 10 \qquad V_{AS1} = \frac{p_1}{p_0} = 6 \qquad V_{AS2} = \frac{p_2}{p_0} = 3 \tag{2.2}$$

Models can be parameterized with either set of values: JSIMg accepts both types of parameters. The scheduling algorithm adopted by the resources is FCFS.
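The relationship in Eq. (2.2) can be checked numerically. The following is a minimal Python sketch that assumes the routing probabilities used for this model (p0 = 0.1, p1 = 0.6, p2 = 0.3) and the topology of Fig. 2.6b, where every request passes through WS before either leaving or visiting AS1 or AS2. The function name `visits_from_routing` is hypothetical and is not part of JSIMg.

```python
def visits_from_routing(p0, p1, p2):
    """Visits per request for the topology of Fig. 2.6b, assuming V0 = 1."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("routing probabilities must sum to 1")
    v_ws = 1.0 / p0  # each pass through WS ends the request with probability p0
    return {"WS": v_ws, "AS1": p1 * v_ws, "AS2": p2 * v_ws}


visits = visits_from_routing(0.1, 0.6, 0.3)
for name, v in visits.items():
    print(f"V_{name} = {v:.1f}")
```

The sketch is specific to this layout; a different topology would need the general traffic equations described in Appendix A.1.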

2.2.2 Model Implementation

Since the number of users (i.e., the employees authorized to access the computing infrastructure) is constant, we implement a closed model with four stations: one delay and three queues, see Fig. 2.6b. Each user submits one request. The probabilities pi that, after a visit to the Web Server WS, a request is routed to the App&Storage server ASi are known. The index 0 is used to represent the world outside the system, and the metrics with index 0 are at system level. Therefore, X0 and R0 represent the Throughput and the Response time of the global system, and p0 is the probability that a request leaves the system because it has completed its execution. We assume that a request follows this path only once in its lifetime, so the number of visits V0 that it makes outside the system is one. According to the layout of the model, $\sum_{i=0}^{2} p_i = 1$.

The workload is generated by a station external to the system representing the Users, which we consider as the Reference station. This station is used to compute the System Response Time R0 and the System Throughput X0. R0 is defined as the period of time between the instant in which a request enters the model (leaving the Reference station) and the one in which it leaves the model (entering the Reference station). X0 is the rate of completed requests that leave the model and enter the Reference station. Other performance indexes are also influenced by the selection of the station that is considered as reference (see Appendix A.1).

The mean Service times for each visit to servers WS, AS1, and AS2 are SWS = 0.005 s, SAS1 = 0.020 s, and SAS2 = 0.025 s, respectively. The think time of the delay station Users is Z = 1 s. All these times are exponentially distributed. The JSIMg model of Fig. 2.7 was solved with simulation. The routing probabilities of the requests leaving the Web Server are p0 = 0.1, p1 = 0.6, and p2 = 0.3, see Fig. 2.8.
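Since the model satisfies the assumptions needed for an analytical solution, the simulation estimates can be cross-checked outside JSIMg. The sketch below is a plain Python version of the exact single-class Mean Value Analysis recursion, using the parameters given above (visits 10, 6, 3; mean service times 0.005, 0.020, 0.025 s; Z = 1 s). It is an illustration under those assumptions, not the JMVA implementation, and the function name `mva_closed` is hypothetical. The cycle time R + Z is printed because the R0 reported by JSIMg for this model includes the time spent in the Reference station.

```python
def mva_closed(demands, think_time, n_max):
    """Exact single-class MVA for queue stations plus one delay station.

    Returns a list of (N, X0, R, cycle_time), where R excludes and
    cycle_time includes the think time Z.
    """
    queue = {k: 0.0 for k in demands}  # mean queue lengths with N-1 customers
    results = []
    for n in range(1, n_max + 1):
        # Arrival theorem: an arriving request sees the network with n-1 customers
        resid = {k: d * (1.0 + queue[k]) for k, d in demands.items()}
        r = sum(resid.values())
        x0 = n / (think_time + r)
        queue = {k: x0 * resid[k] for k in demands}  # Little's law per station
        results.append((n, x0, r, think_time + r))
    return results


visits = {"WS": 10, "AS1": 6, "AS2": 3}
service = {"WS": 0.005, "AS1": 0.020, "AS2": 0.025}
demands = {k: visits[k] * service[k] for k in visits}  # D_k = V_k * S_k

table = mva_closed(demands, think_time=1.0, n_max=20)
for n, x0, r, cycle in table:
    print(f"N0={n:2d}  X0={x0:6.3f} req/s  R0+Z={cycle:6.3f} s")
```

The values printed for N0 = 1 to 20 can be compared with the simulation results discussed in the Results section; small differences are expected because the simulator returns estimates with confidence intervals.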

Fig. 2.7 The JSIMg model of the computing infrastructure of Fig. 2.6b

Fig. 2.8 Settings of the Routing Probabilities of the Web Server WS

2.2.3 Results

Several objectives of the capacity planning study were set. In what follows we describe the results of some of them, referred to as Obj.1–Obj.4.

Obj.1: Implement the model of the computing infrastructure with the parameters assigned. Investigate the behavior of System Throughput X0 and System Response Time R0 for the Number of Customers N0 ranging from 1 to 20. What is the 90th percentile of R0 with N0 = 20?


Fig. 2.9 System Throughput and System Response Time versus Number of customers

A What-if analysis is performed by setting the Number of customers N0 = 1 ÷ 20 as control parameter. Figure 2.9 shows the behavior of System Throughput X0 and System Response Time R0 with respect to N0. Please note that the R0 values computed by JSIMg include the time spent in the Reference station, i.e., the Users station, that is Z = 1 s. For N0 = 20 we have X0 = 8.32 req/s and R0 = 2.4 s. As N0 increases from 1 to 20, X0 flattens and tends to its horizontal upper bound, while R0 becomes linear and tends to its lower bound, which is an oblique line. These behaviors are typical of closed systems when a resource is approaching saturation. In the following Objs. 2 and 3 we analyze this condition in detail.

The values of some percentiles of the System Response Times, for example the 90th or the 95th, are often requested in performance studies. Let us recall that the 90th percentile $P_{90}$ of a variable Y is the value below which 90% of all the values assumed by Y are found, i.e., $\Pr(Y \le P_{90}) = 0.9$. To obtain the percentile values in JSIMg it is necessary to flag the check box Stat.Res. (see, e.g., Fig. 1.8) in the window of the metrics to be collected. A CSV file with all collected values of the selected metric is then generated and stored. Various statistical indexes are computed by clicking on the Statistical Results button (see Fig. 1.9) in the window of the analyzed metric. Selecting Distribution as a drawing option, the values are sorted in increasing order and grouped in intervals. For example, 300 intervals have been selected in Fig. 2.10. The percentiles corresponding to each interval are calculated and stored in a CSV file. A sample of this file for the intervals 70 ÷ 76 with the corresponding percentiles (from 88.9 to 91.3) is shown in Fig. 2.11. The 90.1 percentile corresponds to R0 = 4.88 s.
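Outside JSIMg, a percentile can also be read directly from a list of collected values. The following Python sketch uses the nearest-rank definition on synthetic data: the samples are drawn from an exponential distribution with mean 2.4 s, which is an assumption made only for illustration and is not the simulation output. With exponential samples the empirical result can be compared with the closed-form quantile, −ln(0.1) × mean, i.e., about 2.3 × mean. The helper name `percentile` is hypothetical.

```python
import math
import random


def percentile(samples, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of a list of values."""
    ordered = sorted(samples)
    rank = math.ceil(q / 100.0 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Synthetic data only: exponential samples with mean 2.4 s (not JSIMg output).
rng = random.Random(42)
mean = 2.4
samples = [rng.expovariate(1.0 / mean) for _ in range(100_000)]

p90 = percentile(samples, 90)
theory = -math.log(1 - 0.90) * mean  # exponential quantile, about 2.3 * mean
print(f"empirical P90 = {p90:.2f} s, exponential theory = {theory:.2f} s")
```

To apply the same helper to real measurements, the list `samples` would be replaced by the values read from the CSV file produced by the simulator.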
It should be noted that if the values of a variable Y are exponentially distributed, then $P_{90} \simeq 2.3 \times$ (mean value of Y). In our case, the values of R0 are hypo-exponentially distributed (the coefficient of variation is 0.76 < 1, see Fig. 2.10), so their variance is less than that of an exponentially distributed variable with the same mean. Thus, it seems correct to obtain the value of 4.88 s for the 90th percentile of R0, which is less than the 5.52 s (2.3 × 2.4) that would be obtained if the values were exponentially distributed. By increasing the number of intervals, more detailed percentiles can be obtained.

Fig. 2.10 Statistical indexes of the System Response Times

Fig. 2.11 Sample of the CSV file with the values of R0 sorted in increasing order and subdivided into 300 intervals. The four columns refer respectively to: the id of the intervals, the extremes of each interval, and the percentile corresponding to the extreme with maximum value

Obj.2: To improve the computing infrastructure performance, one of the first actions that seems natural is to replace AS2, the slowest of the App&Storage servers, with a new model that is 20% faster (that is, as fast as AS1). Evaluate the effects on X0 and R0.

The mean Service time of server AS2 in the original model must be decreased from 0.025 to 0.020 s. The model with the What-if for N0 = 1 ÷ 20 users is executed again. As expected, the Utilization of AS2 decreases, e.g., from 61.8% to 50% with N0 = 20. However, surprisingly, NO improvements are obtained on X0 and R0.


Fig. 2.12 Utilizations of the three servers AS1, AS2, and WS versus N0

Indeed, with the new fast server AS2 we have X0(20) = 8.27 req/s and R0(20) = 2.42 s, while with the slow one we had X0 = 8.32 req/s and R0 = 2.40 s, respectively. The two values of X0 can be considered equally likely estimates of the exact throughput value since they both fall in the same 99% confidence interval. The same observation applies to R0 (see Appendix A.2).

Analyzing the Utilizations of the three servers, in Fig. 2.12a with the original configuration and in Fig. 2.12b with the new AS2, we find the explanation of this unexpected result. From Fig. 2.12a it is possible to see that the utilizations of AS1 and AS2 are unbalanced, and that AS1 is the bottleneck of the computing infrastructure despite being the faster of the two, since it receives twice as many visits (VAS1 = 6 versus VAS2 = 3). Indeed, its utilization is the highest of all servers, and under heavy load it is close to saturation (e.g., with N0 > 15 it is UAS1 > 0.95). This is the main reason why the action we have taken is useless: improving any station other than the bottleneck does not generate any performance gain under heavy workload. Performance improvements can only be achieved by reducing the contention at the bottleneck. Actions that reduce the load of stations other than the bottleneck produce minimal improvements (if any), and only under very light workload (see Obj.3).

Obj.3: Given the insignificant results obtained in Obj.2, we want to evaluate the performance improvements that can be achieved by replacing the AS1 server with a new model 20% faster (the same increase considered in Obj.2 for AS2).

We recompute the original model (Fig. 2.7) setting the mean Service Time of server AS1 to a value 20% lower (from 0.020 s to 0.016 s). We then execute again the What-if for N0 = 1 ÷ 20 users, obtaining the values of X0 and R0 reported in Fig. 2.13. For N0 = 20, with respect to the original system, X0 increases by 20%, from 8.32 to 9.99 req/s, and R0 drops by 17%, from 2.4 to 1.99 s. The bottleneck remains the server AS1: its utilization is 0.95 (in the original model it was 0.99). Let us remark that these positive results were obtained because we improved the station that is the bottleneck of the system, i.e., the server AS1. Indeed, as seen in Obj.2, improving other stations does not produce any significant effect on performance.

Fig. 2.13 X0 and R0 with the new server AS1 20% faster
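The outcome of Obj.2 and Obj.3 can be anticipated by comparing the service demands D_k = V_k × S_k of the three configurations, since the station with the largest demand limits the throughput to 1/Dmax. The Python sketch below uses the visits and service times given in this section; the names `VISITS`, `bottleneck`, and `configs` are hypothetical and chosen only for this illustration.

```python
VISITS = {"WS": 10, "AS1": 6, "AS2": 3}


def bottleneck(service):
    """Return (station, demand) with the largest demand D_k = V_k * S_k."""
    demands = {k: VISITS[k] * s for k, s in service.items()}
    name = max(demands, key=demands.get)
    return name, demands[name]


configs = {
    "original":   {"WS": 0.005, "AS1": 0.020, "AS2": 0.025},
    "faster AS2": {"WS": 0.005, "AS1": 0.020, "AS2": 0.020},
    "faster AS1": {"WS": 0.005, "AS1": 0.016, "AS2": 0.025},
}

bounds = {}
for label, service in configs.items():
    name, d_max = bottleneck(service)
    bounds[label] = (name, 1.0 / d_max)
    print(f"{label:11s} bottleneck={name}  Dmax={d_max:.3f} s  "
          f"X0 <= {1.0 / d_max:.3f} req/s")
```

In this sketch the throughput bound is unchanged when only AS2 is made faster, and it rises only when the demand of AS1 is reduced, which is consistent with the simulation results reported above.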

Obj.4: According to the management, the number of internal employees authorized to access the computing infrastructure may increase to 40 within a semester. What will R0 and X0 be with the current configuration and N0 = 40 users?

We recompute the original model (Fig. 2.7) setting the Number of Customers in the closed class definition window to 40. The behavior of the mean value of R0 and of its confidence intervals during the simulation is shown in Fig. 2.14. As can be seen, the mean value of R0 obtained from the model, 4.821 s, is very close to the lower bound of 4.8 s given by $N_0 D_{max} = N_0 V_{AS1} S_{AS1} = 40 \times 0.12$. The throughput X0(40) is 8.325 req/s, very close to its upper bound $1/D_{AS1} = 1/0.12 = 8.333$ req/s (see Sect. 2.3).
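The two bounds quoted for N0 = 40 follow from the standard asymptotic bounds of a closed single-class model. The Python sketch below computes them from the service demands of the original configuration; it assumes, as in the rest of this section, that R0 is measured at the Reference station and therefore includes the think time Z. The function name `asymptotic_bounds` is hypothetical.

```python
def asymptotic_bounds(demands, think_time, n):
    """Asymptotic bounds for a closed single-class model with think time Z.

    Returns (upper bound on X0, lower bound on R0 including Z).
    """
    d_total = sum(demands.values())
    d_max = max(demands.values())
    x_upper = min(n / (think_time + d_total), 1.0 / d_max)
    r_lower = max(think_time + d_total, n * d_max)  # R0 here includes Z
    return x_upper, r_lower


# D_k = V_k * S_k for the original configuration
demands = {"WS": 10 * 0.005, "AS1": 6 * 0.020, "AS2": 3 * 0.025}

x_upper, r_lower = asymptotic_bounds(demands, think_time=1.0, n=40)
print(f"X0(40) <= {x_upper:.3f} req/s,  R0(40) >= {r_lower:.1f} s")
```

Under heavy load the bottleneck terms dominate both expressions, which is why the simulated values for 40 users lie so close to 1/Dmax and N0 × Dmax.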
