Ejabberd resource tests

From OLPC


=== Logging and graphing scripts ===

The scripts that collected the information and made the graphs are
stored in [http://dev.laptop.org/git/users/dbagnall/ejabberd-tests.git/ git].

=== Benchmark results ===

====Comparisons====

* [[Ejabberd_resource_tests/tls_comparison]] -- comparing aspects of tries 6 and 7.

====With shared roster====

* [[Ejabberd_resource_tests/try_9]] -- with ejabberd 2.0.2 and postgres.
* [[Ejabberd_resource_tests/try_8]] -- with ejabberd 2.0.1 and postgres.
* [[Ejabberd_resource_tests/try_7]] -- identical conditions to [[Ejabberd_resource_tests/try_6|try 6]], but with the old SSL code.
* [[Ejabberd_resource_tests/try_6]] -- up to 750 connections with shared roster and new SSL code.
* [[Ejabberd_resource_tests/try_5]] -- up to 450 connections with shared roster.

The results below might be less trustworthy, as the shared roster was not always working.

* [[Ejabberd_resource_tests/try_1]]
* [[Ejabberd_resource_tests/try_2]]
* [[Ejabberd_resource_tests/try_3]]
* [[Ejabberd_resource_tests/try_4]] -- faulty: the shared roster was not working.

=== Raw benchmark results ===

==== First try: 523 accounts, single client machine ====

Mostly by accident, 523 ejabberd accounts were created. These
accounts were used in 25-connection increments, connecting from just
one client.

The memory numbers were gathered once the load had settled, after a
few minutes. Peak use was perhaps 10% higher.

 Clients  interval  mem (MB)  load avg  client OK  server OK
 -----------------------------------------------------------
       1        15        69      0.01  True       True
      50        15        72      0.02  True       True
     100        15        68      0.12  True       True
     125        15        74      0.08  True       True
     150        15        82      0.22  True       True
     175        15        84      0.24  True       True
     200        15        92      0.21  True       True
     225        15        96      0.06  True       True
     250        15       101      0.13  True       True
     275        15       103      0.09  False      True
     300        15       107         -  False      True
     350        15        89         -  False      True

Note: from 275 clients up, the client crashed before the numbers had time to settle.

[[Image:523-users-15-seconds.png]]

This is probably best viewed as a steady linear increase,
with the low numbers hidden by a noise floor. Seen like that, there
seems to be a memory footprint of 47MB + 22MB per 100 clients.
Extrapolating to 3000 clients would give about 700 MB, though that
is a very long way to extrapolate.

[[Image:523-users-start-100.png]]

==== Second try: 3200 accounts, multiple clients ====
I ran hyperactivity on several machines to get the following results.

 # Tested with 3200 users and 15 second intervals from 3 clients (2 XOs
 # 50 each; toshiba laptop - up to 275).
 #
 # Clients  secs  mem (MB)  load avg  client OK  server OK
       100    15       119      0.12  True       True
       150    15       129      0.16  True       True
       175    15       145      0.38  True       True
       200    15       151      0.25  True       True
       225    15       166      0.31  True       True
       250    15       172      0.40  True       True
       275    15       179      0.43  True       True
       300    15       189      0.43  True       True
       325    15       194      0.51  True       True
       350    15       200      0.67  True       True
       375    15       206      0.60  False      True
 # Starting 200 users from xs-devel (martin's dell core2 duo laptop)
 # then adding 2 XOs with 50 each
 # then steps of 25 from the toshiba
       200    15       167      0.20  True       True
       300    15       182      0.20  True       True
       375    15       211      0.80  True       True
       400    15       222      0.77  True       True
 # adding another XO
 # web interface is very slow to report these connections,
 # getting stuck first on 427
       427    15       253      0.86  True       True
 # stop all but 1 XO (now dell 200, toshiba 100, XO 50).
       350    15       234      0.56  True       True
 # restart 3 XOs, 1 at a time
       400    15       232      0.88  True       True
       450    15       240      0.96  True       True
       500    15       104      1.02  True       False
 # web interface dies at 500-15. mem drops to 89.
 # sharing works for new connections

Whatever I tried, ejabberd would always crash at around 500
connections. It turned out this was due to a system limit on the
number of open files, which can be raised by editing
/etc/security/limits.conf (see the next section).

[[Image:3200-users.png]]

This shows that memory usage is fairly well predicted as 80 MB + 37MB
per 100 active clients. That would mean an ejabberd instance with
3000 active clients needs about 1200MB.

The load average jumps around a bit, but definitely goes up as clients
are added.
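The 80 MB + 37 MB per 100 figure above was read off the graph. As a rough cross-check, the sketch below fits a straight line by ordinary least squares to the first block of the table. It keeps only the rows where both client and server were OK. The helper name `fit_linear` and the choice of rows are assumptions made for illustration. This is not the script that drew the graphs.

```python
from typing import List, Sequence, Tuple


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of ys = intercept + slope * xs.

    Returns (intercept, slope).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# (active clients, ejabberd memory in MB) from the first block of the
# 3200-account table, keeping rows where both client and server were OK.
ROWS: List[Tuple[int, int]] = [
    (100, 119), (150, 129), (175, 145), (200, 151), (225, 166),
    (250, 172), (275, 179), (300, 189), (325, 194), (350, 200),
]

if __name__ == "__main__":
    base, per_client = fit_linear([c for c, _ in ROWS], [m for _, m in ROWS])
    print(f"{base:.0f} MB + {per_client * 100:.0f} MB per 100 active clients")
```

On these ten rows the fit comes out near 84 MB + 34 MB per 100 active clients. That is in the same region as the 80 MB + 37 MB above, which presumably drew on all the points in the graph rather than this subset.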

==== Try 3: 3000ish clients; past the 500 connection barrier ====

Eventually I thought to increase the number of open files that
ejabberd can use, which allowed it to maintain more than 500
connections. This can be done in a couple of ways: most properly by
adding these lines to /etc/security/limits.conf:

 ejabberd soft nofile 65535
 ejabberd hard nofile 65535

or by adding the line marked + to the start() function in /etc/init.d/ejabberd:

 start() {
 +   ulimit -n 65535
     echo -n $"Starting ejabberd: "

which shouldn't require a new login to take effect.
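Whichever method is used, it is worth confirming the limit that the running ejabberd process actually received. On Linux this is listed in /proc/<pid>/limits. The sketch below parses the "Max open files" line from that file. The helper names are hypothetical, and finding ejabberd's PID (for example with pgrep) is left to the caller.

```python
import sys
from typing import Optional, Tuple

Limit = Optional[int]  # None stands for "unlimited"


def parse_open_files(limits_text: str) -> Tuple[Limit, Limit]:
    """Return (soft, hard) from the 'Max open files' line of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # e.g. "Max open files   1024   4096   files"
            fields = line.split()
            soft, hard = fields[3], fields[4]
            return (
                None if soft == "unlimited" else int(soft),
                None if hard == "unlimited" else int(hard),
            )
    raise ValueError("no 'Max open files' line found")


def open_file_limit(pid: str) -> Tuple[Limit, Limit]:
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files(f.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    soft, hard = open_file_limit(sys.argv[1])
    print(f"soft={soft} hard={hard}")
    if soft is not None and soft < 65535:
        print("soft limit is below the 65535 used in these tests")
```

Run it with ejabberd's PID as the only argument. With no argument it does nothing, so the parsing function can also be reused on its own.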

I started with about 2900 clients, but hyperactivity increased this as
it created new clients. It got to around 3300. The number of
inactive clients seems to have relatively little effect on ejabberd,
so I'm ignoring this.

 # Starting with 2926 registered users
 # open files set by ulimit to 65535
 #
 # Clients  secs  mem (MB)  load avg  client OK  server OK
       600    15       262      0.75  True       True
       650    15       269      0.84  True       True
       700    15       274      1.12  True       True
       750    15       311      1.32  True       True
       800    15       328      1.66  True       True
       850    15       354      1.80  True       True
       900    15       407      1.83  True       True
       950    15       368      1.78  True       True
       950    15       450      1.78  True       True
      1000    15       400      1.80  True       True
      1050    15       416      2.06  True       True
      1100    15       416      2.01  True       True
      1150    15       468      2.04  True       True
      1200    15       436      2.07  True       True
 # drop all but 200 clients, wait 5 minutes.
 #     200    15       371      0.63  True       True
      1200    15       440      1.89  True       True
 # memory use was really jumpy. These numbers are approximately what
 # the system converged on over time.
 # (e.g. 1000 clients peaked over 500M, dropped to 390ish)
 #
 # After 800, clients sometimes dropped off.

1200 is about the limit that my test setup can get to (i.e. 4 * 250 +
4 * 50). ejabberd ran quite happily at that point, though the load
averages suggest it would not have liked much more.

ejabberd peaked at 667.6MB, and went over 500MB several times. These
fleeting binges tended to follow the connection of new clients. As
hyperactivity connects with unnatural speed, it would be unfair to judge
ejabberd on those numbers, but it does seem that ejabberd could do
with 50% headroom over its long-term average.

This set gives us 69MB + 33MB per 100 clients, or about 1070MB for 3000 clients, and it
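The sizing arithmetic can be reproduced in a few lines. This sketch uses the rounded coefficients quoted on this page. For try 3 it therefore gives 69 + 33 × 30 = 1059 MB rather than "about 1070MB"; the unrounded fit presumably differs slightly. Adding the suggested 50% headroom gives about 1590 MB, in line with the 1.6 GB figure. The helper names are illustrative. The first-try model is included only for comparison, since that run used far fewer accounts.

```python
from typing import Dict, Tuple


def estimate_mb(base_mb: float, mb_per_100: float, clients: int) -> float:
    """Linear memory model: base footprint plus a cost per 100 active clients."""
    return base_mb + mb_per_100 * clients / 100


def with_headroom(mb: float, headroom: float = 0.5) -> float:
    """Scale a long-term average up to leave room for surges."""
    return mb * (1 + headroom)


# (base MB, MB per 100 active clients), rounded as quoted on this page.
MODELS: Dict[str, Tuple[float, float]] = {
    "first try (523 accounts)": (47, 22),
    "second try (3200 accounts)": (80, 37),
    "try 3 (about 3000 accounts)": (69, 33),
}

if __name__ == "__main__":
    for name, (base, per_100) in MODELS.items():
        avg = estimate_mb(base, per_100, 3000)
        print(f"{name}: {avg:.0f} MB average, "
              f"{with_headroom(avg):.0f} MB with 50% headroom")
```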
suggests that 1.6 GB would accommodate surges. A faster processor is
almost certainly necessary.

[[Image:3000ish-users.png]]


Here is a graph combining the last two runs:

[[Image:combined.png]]

and here is one with them all, without the fitted line (remember that the
lower set had a quite different number of accounts):

[[Image:combined-all.png]]

==== Memory use of inactive users ====


 # Adding users, restarting to find the base load of registered inactive users.
 # Memory in MB.
 #
 # users  after add  after restart
     523          -           27.2
     670       59.5           31.1
     924       83.4           39.4
    1024                      45.4
    1324       84.7           50.0
    1574      113.9           54.1
    1774      116.0           64.1
    1974      120.1           65.2
    2200        130           71.0
    2400        131           75.7
    2600        130           85.3
    2800        138           83.2
    3200        160           93.7

This suggests the memory cost of registered inactive users is also
linear, and relatively low at 25MB per thousand.
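A quick check of the "25MB per thousand" figure is the slope between the smallest and largest account counts in the after-restart column. This two-point estimate is a deliberate simplification; a least-squares fit would weigh every row. The sketch also computes each step's slope to show how noisy single steps are. The helper names are illustrative.

```python
from typing import List, Tuple

# (registered users, ejabberd MB after restart), copied from the table above.
AFTER_RESTART: List[Tuple[int, float]] = [
    (523, 27.2), (670, 31.1), (924, 39.4), (1024, 45.4), (1324, 50.0),
    (1574, 54.1), (1774, 64.1), (1974, 65.2), (2200, 71.0), (2400, 75.7),
    (2600, 85.3), (2800, 83.2), (3200, 93.7),
]


def endpoint_mb_per_thousand(points: List[Tuple[int, float]]) -> float:
    """Slope between the first and last points, scaled to MB per 1000 users."""
    (x0, y0), (x1, y1) = points[0], points[-1]
    return (y1 - y0) / (x1 - x0) * 1000


def step_mb_per_thousand(points: List[Tuple[int, float]]) -> List[float]:
    """Slope of each consecutive step, in MB per 1000 users."""
    return [
        (y1 - y0) / (x1 - x0) * 1000
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    print(f"overall: {endpoint_mb_per_thousand(AFTER_RESTART):.1f} MB per 1000 users")
    steps = step_mb_per_thousand(AFTER_RESTART)
    for (users, _), slope in zip(AFTER_RESTART[1:], steps):
        print(f"step ending at {users}: {slope:.1f} MB per 1000 users")
```

The overall slope comes out at about 24.8 MB per thousand users, while single steps swing widely. The step from 2600 to 2800 users is even negative.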

[[Image:user-base.png]]

==== Try 4: a few thousand users ====

This test used several copies of hyperactivity on each client machine,
all using the same 15 second interval. Its results are presented as a
time series because, unlike in previous tests, the server stabilised on a
set number of clients only once.

The graph below shows the numbers of registered users and online users
over 2 and a bit hours, or about 8000 seconds. The test ended with an
ejabberd crash.

[[Image:users_active-users_reg-error.png]]

The numbers diverge when a hyperactivity instance crashes badly: the
accounts are lost to hyperactivity so it creates new ones. After a
period of stability with 2000 users, ejabberd went somewhat haywire
when more connections were attempted. The points marked '''a''' are
times when the ejabberd web interface stopped responding (which was
the source of the numbers), while '''b''' is where it crashed
outright. During the stable period, XO collaboration was possible.

The red lumps along the bottom are points at which ejabberd logged
errors.

This next graph relates memory use to active connections. The
server only has 1GB of RAM, so resident memory is restricted below
that.

[[Image:users_active-resident_mem-virtual_mem.png]]

Here's a closer view of the memory, including the ps_mem.py numbers,
which closely track top's resident memory report.

[[Image:psmem-resident_mem-virtual_mem.png]]

Load average over the same period:

[[Image:load_avg_1-load_avg_5-load_avg_15.png]]

and 1 minute load average vs ejabberd reported errors:

[[Image:load_avg_1-error.png]]

Also load average vs active users:

[[Image:load_avg_5_vs_users_active.png]]

Load drops quite a lot during the stable period.

This last picture shows various kinds of CPU usage.

[[Image:cpu_user-cpu_sys-cpu_wait-cpu_softIRQ.png]]


http://dev.laptop.org/~dbagnall/ejabberd-tests/ -- includes graphs.


=== Issues ===

* Is pounding ejabberd every 15 seconds reasonable? A lighter load actually makes very little memory difference, but it probably saves CPU time.
ok

Latest revision as of 18:41, 24 February 2011


== The purpose of these tests ==

The XS [[school server]] is going to be installed in schools with more than 3000 students. In these large schools, ejabberd is crucial for functional [[Activity sharing|collaboration]]. If all the students are using their laptops at once, ejabberd might be considerably stressed. These tests were run to find out how it performs in various circumstances.

== Set up ==

The CPU of the server running ejabberd reports itself as "Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz". The server has 1 GB of RAM and 2 GB of swap.

The client load was provided by [http://dev.laptop.org/git/users/guillaume/hyperactivity/.git/ hyperactivity]. Each client was limited in the number of connections it could maintain (by, it seems, [[Telepathy Gabble]] or [[dbus]]), so several machines were used in parallel. Four of the client machines were fairly recent commodity desktops/laptops -- one was the server itself -- and four were XO laptops. The big machines were connected via wired ethernet and could provide up to 250 connections each, while the XOs were using mesh and provided 50 clients each. From time to time hyperactivity would fail with these numbers and have to be restarted.

It took time to work out these limits, so the tests were initially tentative. The graphs on this page, the script that made them, longer versions of these notes, and perhaps unrelated stuff can be found at [1].

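In order to test, I had to add the line

 {registration_timeout, infinity}.

to /etc/ejabberd/ejabberd.cfg (including the full stop). This disables ejabberd's limit on how often new accounts may be registered from the same IP address, which hyperactivity would presumably otherwise hit as it creates many accounts from each machine.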

The memory usage numbers in these tests were gathered by ps_mem.py, and the load average is as reported by top. These are not peak numbers, but approximately what ejabberd settled to after running for some time. For the record, the memory use reported by top tracked that of ps_mem.py, but was consistently a little higher (as if it were counting in decimal megabytes, though I am not sure if this is the case).
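Suppose the guess above were right, with ps_mem.py reporting binary mebibytes (it labels its output in KiB and MiB) and top showing the same amount in decimal megabytes. Then top's figure would be larger by a constant factor of 1024²/1000² = 1.048576, just under 5%. The sketch below, with hypothetical helper names, does that conversion and tests whether a pair of readings fits the explanation. It does not read either tool's output, and whether top really behaves this way remains unconfirmed.

```python
MIB = 1024 * 1024  # bytes in a binary mebibyte (MiB)
MB = 1000 * 1000   # bytes in a decimal megabyte (MB)


def mib_to_mb(mib: float) -> float:
    """Express a size given in MiB in decimal MB."""
    return mib * MIB / MB


def fits_unit_mixup(ps_mem_mib: float, top_reading: float,
                    tolerance: float = 0.01) -> bool:
    """True if top_reading matches ps_mem_mib re-expressed in MB,
    within a relative tolerance."""
    expected = mib_to_mb(ps_mem_mib)
    return abs(top_reading - expected) <= tolerance * expected


if __name__ == "__main__":
    for mib in (100, 400, 1000):
        print(f"{mib} MiB = {mib_to_mb(mib):.1f} MB")
```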
