XS Load Testing
 Testing the OLPC School Server
Author: Benjamin Tran
The One Laptop Per Child (OLPC) project oversees the development, construction, and deployment of an affordable educational laptop (XO) for use in developing countries. The XO has been deployed in many countries and, as of August 2010, there were over 1.85 million XO laptops in the field. The low-cost XOs help to revolutionize the way we educate the world's children. In the school, each child's XO laptop connects to a central server that provides various networking, content, and communication services. The server also acts as the Internet gateway if an Internet connection is available. The hardware capabilities of the server depend on factors such as the cost of hardware and the availability of electricity. A more powerful machine means faster access to data and the capability to handle more users simultaneously, but this usually implies higher power consumption, a luxury that is unaffordable in many parts of the world. The software that runs on the server in the classroom, called the School Server (XS), is based on the Fedora distribution of the Linux operating system and provides networking infrastructure and services, as well as education and discovery tools, to the XO laptops. Among the stack of customized software included in the XS, one of the most notable collaboration tools is Moodle, a free web-based course management system. With the customized software running on limited hardware resources, how many users can reasonably connect to the server and use Moodle? What hardware combination would be ideal for use with the XS given the estimated total number of students in a school? The purpose of this project is to come up with a method to determine the approximate number of users a potential server loaded with the OLPC School Server is capable of supporting. The approach taken is to perform functional and load testing against different hardware systems loaded with the School Server software.
The project makes use of industry-standard technologies and tools such as Selenium, Apache JMeter, and nmon for Linux. Selenium verifies that Moodle functions according to specifications, while JMeter simulates various loads against the server to test its capability. Nmon for Linux monitors server-side resources and presents graphs of CPU usage, free memory, and disk I/O activity, among many others. Test results from six different machines are compared. The outcome of this project is a formal process to test and measure the capability of a potential server that is being considered for deployment.
This work is the result of taking the course CSC895 Applied Research Project (Computer Science Department, San Francisco State University), with guidance and assistance from the following advisors:
- Professor Sameer Verma, College of Business, San Francisco State University
- Professor Dragutin Petkovic, Department of Computer Science, San Francisco State University
- Professor Barry Levine, Department of Computer Science, San Francisco State University
This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
 Full Document
 Scripts and Testing Material
Git repository at https://dev.laptop.org/git/users/sverma/xsloadtesting/
 Methods and Approach
The purpose of this project is to come up with a method to determine the approximate number of users a potential server loaded with the OLPC School Server is capable of supporting. The approach taken is to perform functional and load testing against different hardware systems loaded with the School Server software. Functional testing ensures that the software works the way it was intended to. Effective test cases need to be written to cover the major and common use cases of the software system. Load testing involves putting variable demands on a system and measuring its response time. This ensures the system's hardware is capable of supporting an expected number of simultaneous users at peak times. Similarly, exemplary test cases need to be written for this type of testing. The test approach needs to be defined. The review and selection of software testing tools is also of great importance. Additionally, a comprehensive yet easy-to-use system monitoring tool must be chosen to monitor the server's resources, such as CPU and memory usage. Taking all of the required tasks into consideration, the following design objectives were finalized (with the advice of the candidate's committee members):
- To design and write effective test cases to ensure the program's major functionality is within the scope of specifications.
- To design and write exemplary test cases that can effectively resemble the activities and network traffic that occur when many users simultaneously access the system.
- To select a quick and efficient approach to carry out these test cases.
- To choose an effective functional testing framework that allows for easy creation of tests since a web application's user interface is expected to change frequently.
- To choose an easy-to-use yet powerful load testing tool that can simulate a heavy load with unique, dynamic data.
- To select one or more comprehensive system monitoring tools that can graphically show the consumption of a system's resources.
- To pick out and obtain several computer systems that can represent a broad spectrum of potential servers. The XS software will be loaded onto these systems and testing will be performed against them.
- To write a thorough test plan containing information such as the testing environment, features to be tested, features not to be tested, test approach, hardware/software setup, etc.
All of the above design objectives that served as requirements for the project were successfully satisfied.
 Performance/Load/Stress Testing
Many people use the terms performance testing, load testing, and stress testing interchangeably, but they have quite different meanings. Unlike functional testing, performance testing does not aim to uncover defects in the system. Instead, it is performed in the hope of eliminating bottlenecks and establishing a baseline for future regression testing. It is crucial to have a set of expectations for the testing so that the results can be meaningful. For example, in the context of this project, it is necessary to know the expected number of students that will be using the School Server's services at the same time and what response time is acceptable. When the number of students reaches a level at which the response time is no longer acceptable, bottlenecks can be looked for at different levels (application, web server, database, operating system, network) and performance tuning can then be performed. Due to the many layers of complexity involved in a modern-day web application, performance tuning has to be done with care. Measurements should be collected after every modification of a variable. Changing multiple variables in a single test session can lead to complications and unexpected results. The complete process should be to run the tests, measure the performance, tune the system, and repeat the cycle until the system under test performs at an acceptable level. This process is also known as configuration testing.
The definition of load testing is to execute the largest tasks the system under test can handle and to understand the behavior of the system under an expected load. Load testing is usually performed when the maximum number of supported concurrent users for a system is known in advance. Thus, the system is put under such a load to ensure it can still function properly.
The goal of stress testing is to make sure a system fails and then recovers in a graceful manner. This is done by either using up all the resources of the system or taking resources away from the system. For example, stress testing a web application could involve taking the database offline or running unrelated processes that consume most, if not all, of the resources (CPU, memory, disk, etc.) on the web server.
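The run, measure, tune, and repeat cycle described above can be sketched in a few lines of Python. This is only an illustration of the one-variable-at-a-time rule, not code from the project: `measure`, `candidate_changes`, and `acceptable_seconds` are hypothetical names, and `measure` stands in for whatever load test produces an average response time.

```python
def tune_until_acceptable(measure, candidate_changes, acceptable_seconds):
    """Apply one configuration change at a time, measuring after each one.

    measure: hypothetical callable returning the average response time (s).
    candidate_changes: list of (name, apply_change) pairs, tried in order.
    Returns the history of (change name, measured response time) pairs.
    """
    history = [("baseline", measure())]
    for name, apply_change in candidate_changes:
        if history[-1][1] <= acceptable_seconds:
            break  # the system already performs at an acceptable level
        apply_change()  # modify exactly one variable
        history.append((name, measure()))  # measure before changing anything else
    return history
```

Because each entry in the returned history corresponds to exactly one change, any improvement or regression can be attributed to a single variable.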
 Decisions regarding Performance/Load Testing
JMeter was selected for automating the load and performance testing in this project due to its popularity and large user base in the testing community. Similar to Selenium, JMeter offers an HTTP Proxy Recorder component that allows for quick creation of tests by recording all the requests made in a browser. Unwanted requests can be easily filtered or deleted. Shared settings such as the server address and port can be extracted to a central setup location in the test script. The tool allows for testing with dynamic data; a different record from a comma-separated file can be used for each thread in a test. For example, it is possible to simulate fifty different users logging in to a system instead of reusing the same account over and over again. This matters because some systems do not allow multiple logins from the same account. Also, the built-in Linux commands and the nmon tool should be sufficient for monitoring resource usage on the School Server during test execution.
The load on a server depends primarily on the number of concurrent users, not on the total number of user accounts in the system or on the number of users logged in at a given point in time. Different systems define concurrent users differently, though. Generally, concurrent users are those that are causing the server to actively do something for them at the same time, such as processing a page, querying the database, or transferring a file. In other words, many similar computations are being performed simultaneously and possibly interacting with each other on the server. Many users trying to post to a forum at the same time or many users trying to watch a certain video at the same time are considered concurrent users in this case. Therefore, test cases must be designed with this fact in mind. It would take a very long time for the tester to manually create hundreds or thousands of accounts in the system for testing. Although Moodle provides the option to perform a bulk upload of user accounts, there is still a need to easily create a CSV file with many records of account information. Therefore, a Perl script was written to perform this task. It generates a CSV file for upload to Moodle to create many user accounts at once, along with various settings such as enrolling each user into an existing course. The script also generates a CSV file that can be used in a JMeter test so that each thread uses its own record in the file, thus simulating different users logging in at the same time. The script and its usage can be found in Appendix E.
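The Perl script itself is in Appendix E and is not reproduced here. The Python sketch below only illustrates the idea of generating the two files from one list of accounts. The column names (`username`, `password`, `firstname`, `lastname`, `email`, `course1`) are assumed to follow Moodle's upload-users CSV format and should be checked against the installed Moodle version; the username prefix, password scheme, and e-mail domain are placeholders invented for this example.

```python
import csv
import io


def make_accounts(count, course_shortname, prefix="student"):
    """Return `count` account records with unique usernames (placeholder data)."""
    accounts = []
    for i in range(1, count + 1):
        username = f"{prefix}{i:04d}"
        accounts.append({
            "username": username,
            "password": f"Pass-{i:04d}",      # placeholder password scheme
            "firstname": "Test",
            "lastname": f"User{i:04d}",
            "email": f"{username}@example.com",
            "course1": course_shortname,       # assumed enrolment column
        })
    return accounts


def write_moodle_csv(accounts, out):
    """Write the bulk-upload file for Moodle, header row first."""
    fields = ["username", "password", "firstname", "lastname", "email", "course1"]
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(accounts)


def write_jmeter_csv(accounts, out):
    """Write one username,password pair per line, with no header row."""
    writer = csv.writer(out, lineterminator="\n")
    for account in accounts:
        writer.writerow([account["username"], account["password"]])
```

Both writers accept any text file object, so the same account list can be written to a file for Moodle and to a second file that a JMeter CSV data source reads, one record per thread.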
In addition to the number of concurrent users, there are other factors that can have a significant impact on the test results. First, a course representative of the actual courses used in the field is needed. In the countries where OLPC is deployed, the course materials are different from what a typical course in the United States would contain. Each user loads the course page in Moodle, and the amount of content on the course page contributes to the overall response time. For the project, a sample course was created, which is shown in Figure 7.a. Due to the lack of data from the field, we could only estimate what should be in the course. Second, the ramp-up time should not be overlooked. It is highly unlikely that all students try to connect to the School Server at the exact same time after they are told to do so. The amount of time over which the connection requests to the XS are spread can greatly affect the overall response time. Based on a video of an OLPC deployment, we estimated the average ramp-up time to be 60 seconds. That was the value used for the experiments in this project. In addition, to make the load more realistic, a Gaussian Random Timer was added to each JMeter test case to add a small random delay between requests, so that the delays follow a normal distribution. A ramp-up time of 60 seconds can be viewed as aggressive. The actual value depends on the number of students in the classroom, and there are many variations, leading to a wide range of possible values. We do not have good estimates and cannot observe the actual ramp-up time in a classroom in a remote part of Peru, for example. The effect on system performance could be significant, because a longer ramp-up time spreads the load over a longer period. An experiment was performed to confirm this and the results can be found in Appendix I. In the remainder of this paper, we will use 60 seconds as the average ramp-up time.
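To make the two timing settings concrete, the Python sketch below approximates how a ramp-up period and a Gaussian timer shape the load. It is an illustration, not JMeter code: it assumes threads are started at even intervals across the ramp-up period and that each delay is a constant offset plus normally distributed jitter. The function names are hypothetical, and the source does not state the offset or deviation used in the project.

```python
import random


def thread_start_offsets(num_threads, ramp_up_seconds):
    """Spread thread start times evenly across the ramp-up period."""
    interval = ramp_up_seconds / num_threads
    return [i * interval for i in range(num_threads)]


def gaussian_delay_ms(constant_offset_ms, deviation_ms, rng):
    """One delay between requests: fixed offset plus normally distributed jitter."""
    delay = constant_offset_ms + rng.gauss(0.0, deviation_ms)
    return max(0.0, delay)  # a delay cannot be negative


# Example: 125 users over the 60-second ramp-up used in this project.
offsets = thread_start_offsets(125, 60)
rng = random.Random(42)  # seeded only so repeated runs give the same delays
# The 300 ms offset and 100 ms deviation are illustrative values only.
delays = [gaussian_delay_ms(300, 100, rng) for _ in range(5)]
```

With 125 threads and a 60-second ramp-up, a new user starts roughly every 0.48 seconds; doubling the ramp-up time halves that arrival rate, which is why the choice of ramp-up time affects the measured response times.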
It would be too time-consuming to set up the test environment over and over again for each execution of the JMeter test on each of the machines. User accounts have to be uploaded. The sample course has to be restored to Moodle. Various IDs in Moodle need to be known and configured into JMeter. These IDs are different for each installation. The forum should be restored to its original state for each test run. The aforementioned list is just some of the steps needed to set up the test environment. A couple of Selenium scripts were written for the purpose of setting up and cleaning up Moodle for test execution. These scripts can be found in Appendix F.
 Test Execution
Testing was done on six different machines, ranging from an XO laptop to a Dell Xeon workstation. The XO laptop used for testing was the same as the ones used by students in the field. The FitPC and FitPC2 machines are compact, fanless, and power-efficient. They are potential candidates for an OLPC deployment. The OLPCorps SolidLogic machine is the actual hardware used in the field and thus must be included. A regular P4 desktop computer and a powerful Dell Xeon workstation were included in the project to see how their performance compares to that of the other machines. Specifications of all test machines can be found in Appendix H.
After the test environment was set up according to the test plan, a Selenium RC server was started on the machine executing the tests. Then the Selenium tests were executed against the School Server. After running the functional tests, JMeter was started on the machine executing the tests. The server monitoring tool, nmon, was started on the School Server so it could monitor system resources during execution of the JMeter tests. The server was also restarted between test runs to prevent data from a previous run from remaining cached in memory.
For each JMeter test, we used 25 threads as an initial starting point. Since it would take a considerable amount of time to find the point where errors start occurring if we increased the number of threads by increments of 25, we took a guess at where this breaking point would be. In the case of the 'Log In' test (test case ID XSP001) being executed on the Dell Xeon workstation, the guess was 1000 threads. The corresponding error rate was 12.5% and so we started decreasing the number of threads by 25 until we found the breaking point to be at 850 threads. From there, we executed the test to get the average response time for four more data points (threads = 825, 800, 775, 750) prior to the breaking point. The same approach was used for all test executions.
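The search procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `run_test` is a hypothetical callable that executes one JMeter run with the given number of threads and returns `(error_rate, average_response_seconds)`, and the breaking point is interpreted here as the highest thread count, stepping down from the guess, at which a run completes with no errors.

```python
def find_max_error_free(run_test, initial_guess, step=25):
    """Step down from the initial guess until a run reports no errors."""
    threads = initial_guess
    while threads > 0:
        error_rate, _avg_response = run_test(threads)
        if error_rate == 0.0:
            return threads
        threads -= step
    return None  # every tested load produced errors


def data_points_below(run_test, top, count=4, step=25):
    """Collect average response times for `count` thread counts just below `top`."""
    results = {}
    for k in range(1, count + 1):
        threads = top - k * step
        if threads <= 0:
            break
        _error_rate, avg_response = run_test(threads)
        results[threads] = avg_response
    return results
```

Starting from a guess of 1000 with a step of 25, a server that first runs error-free at 850 threads would yield 850 from the first function and data points at 825, 800, 775, and 750 from the second. If the initial guess already runs without errors, the search would need to step upward instead, which this sketch does not handle.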
Figure 8.m consists of data from the test results. For each test machine, it shows the maximum number of users that were able to successfully complete the test case with no errors and the total average response time for each user. As we can see, 125 users took an average of 164 seconds to log in with the School Server running on the XO. In other words, if 125 users try to log in to Moodle over a period of 1 minute, each user will have to wait close to 3 minutes before seeing the next page. Although no user encountered an error, the total average response time might not be acceptable to most users. What total average response time is acceptable or unacceptable will not be discussed here.
The purpose of this project was to create a method to help determine what hardware is ideal for use given the total number of students. Conversely, given a system that could potentially act as the server, we can use the same method to find out how many users it is capable of supporting. Functional testing using Selenium ensured each system in the experiments was working correctly. Load testing using Apache JMeter provided us with the data in Figure 8.m. Nmon for Linux verified the correctness of this data by logging and presenting consistent patterns of CPU usage, free memory, and disk I/O activity across different test cases. The different hardware systems used in the experiments, ranging from the XO itself to a powerful workstation, presented data across a spectrum. This array of data allowed us to see the relationship between increased hardware performance and the maximum number of users a system can handle. The method of testing presented in this paper helps to solve the stated problem. For example, if the FitPC is being considered for deployment, we can claim it can support up to 150 users. The number would increase to 175 for the FitPC2, along with decreased total average response times. However, as a safety measure, we recommend reducing the total number of users in the table by 20%, as this will help ensure the School Server hardware can really support the load. That would bring the maximum number of users supported by the FitPC2 down to 140.
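The 20% safety margin is simple arithmetic, shown here as a small Python helper. The function name is hypothetical; the input figures (150 for the FitPC, 175 for the FitPC2) are the measured maxima quoted above. Integer arithmetic is used so that the result is always a whole number of users, rounded down.

```python
def recommended_users(max_error_free_users, margin_percent=20):
    """Reduce a measured maximum by a safety margin, rounding down."""
    return max_error_free_users * (100 - margin_percent) // 100


fitpc2 = recommended_users(175)  # 140, matching the figure given in the text
fitpc = recommended_users(150)   # 120, applying the same margin to the FitPC
```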
The test results were presented in this chapter. The functional test results ensured that the system was working according to the specified requirements. After this was done, the system was ready for the load testing. Server-side resources were monitored while tests were being executed. Analysis of the test results and resource usage was performed afterward. The next chapter is the concluding chapter describing the outcome of this project and recommendations for future work that could be done to enhance this project.
The OLPC School Server was designed to be installed on generic low-end servers with limited hardware resources. The limitations put a cap on how many students can be supported simultaneously when they are all trying to access the server at a given point in time. The main goal of this project was to design, write, and execute a test plan on a set of representative computer systems in order to help determine what hardware combination would be needed given the total number of students in a school. We wrote a test plan for carrying out functional and load tests against the OLPC School Server. In the process of writing the test plan, certain decisions had to be made, including what machines to test against, what tools to use for testing, what test cases to include, what a sample course should contain, and the estimated ramp-up time of users. Every one of these decisions was crucial and affected the overall results of this project. We ran into several technical issues during the project, such as setting up the network connections, obtaining initial access to Moodle, generating and uploading Moodle user accounts in bulk, bypassing an unexpected session key error, and installing tools for server-side resource monitoring. All the technical problems and solutions were documented in this paper, which will benefit any contributor who plans on doing future work relevant to this project.
Once the test plan was written and the test environment was set up, the tests were executed. Results were logged and resources were monitored. A detailed description and analysis of the results were provided in the previous chapter, along with a comparison of the results from all the test machines. Given an estimated total number of students at a future deployment site, we wanted to know what hardware combination would be ideal for use as the School Server. The hardware selection process depends on the number of XOs in the school, the cost of hardware, power requirements of the hardware, and the availability of electricity at the deployment site. This project provided a formal process to set up, test, and measure the capabilities of potential machines. The outcome of this project included a test plan, test scripts, and experimental results from executing these tests against a set of test machines. The testing process can be easily replicated with a different set of test machines for future OLPC deployments.
 Future Work
There is no easy formula to calculate the maximum number of concurrent users a system is capable of supporting. That number depends heavily on the hardware, software, and network combination involved. There are many performance factors that lie within each category. Having a faster processor will help to a certain extent, but it is usually the amount of memory installed that is the deciding factor. Performance also depends on the configuration of the different software running on the system. Work can be done to benchmark each component of the software stack – operating system, web server, database server, PHP engine and accelerator, for example. Further work could involve tuning variables in the configurations one by one, executing tests against the system, and analyzing the effects of a single change. The work done in this project provided a process and a set of tests that can act as a starting point for a much bigger project, and thus only scratched the surface of what can be done.