Andreas: dice::hfe::GhostManager::writeData ()

"Einar S. Idsø" esi at itk.ntnu.no
Thu Jun 30 18:06:15 EDT 2005


Hi Andreas,

Ever since we enabled ranking on our servers this morning, they have
been crashing ridiculously often. We do not know whether this is related
to the ranking itself or to the absolutely amazing interest people have
taken in our servers since they became ranked (they are CONSTANTLY full,
and after a crash it takes only minutes or even seconds before they
fill up again).

However, there seems to be a pattern: almost every single crash has the
same function at the top of its backtrace, so this bug could perhaps be
"simple" to fix. And if it is indeed the primary cause of the crashes,
as the output below suggests, it probably affects many people badly and
will hopefully get a relatively high priority.

Below is the output of the following script on two dual-Opteron
servers:
-----
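#!/bin/bash
# For each core file given as an argument (newest first), print its
# directory entry followed by the top stack frame of the crashing thread.

for filename in $(ls -t -1 "$@"); do
    ls -l "$filename"
    # In batch mode gdb prints frame #0 when it loads the core;
    # the grep keeps only that frame line.
    gdb -batch -c "$filename" bin/amd-64/bf2_f | grep -e '^#'
done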
-----
The script is run in the bf2 server installation dir with core* as argument.

The first output is from a server which runs two 64-player servers (and
has the clock offset by +2 hours - never mind that ;) ):

-rw-------  1 server users 278659072 Jul  1 01:09 core.6586
#0  0x0000000000000000 in ?? ()
-rw-------  1 server users 199155712 Jul  1 00:44 core.8196
#0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 242470912 Jul  1 00:33 core.5028
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 188968960 Jul  1 00:13 core.6234
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 194682880 Jun 30 23:56 core.5653
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 193380352 Jun 30 23:28 core.5026
#0  0x0000000000000050 in ?? ()
-rw-------  1 server users 254894080 Jun 30 23:00 core.2508
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 237834240 Jun 30 22:50 core.3078
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 196165632 Jun 30 21:15 core.2426
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 228093952 Jun 30 19:56 core.31182
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 239190016 Jun 30 18:33 core.28499
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 232099840 Jun 30 18:26 core.29738
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 225918976 Jun 30 17:13 core.28603
#0  0x000000000043c1a1 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 235868160 Jun 30 16:18 core.26252
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 222158848 Jun 30 16:10 core.27353
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 251736064 Jun 30 15:09 core.23075
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 224284672 Jun 30 14:32 core.23197
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 288645120 Jun 29 23:31 core.5484
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 299806720 Jun 29 20:47 core.2594
#0  0x0000000000000000 in ?? ()
-rw-------  1 server users 361648128 Jun 29 00:56 core.6367
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 297541632 Jun 28 22:14 core.9107
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 216424448 Jun 28 19:09 core.4090
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 215441408 Jun 28 15:40 core.4076
#0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()


The second output is from a server which runs one 64-player server and
two 32-player servers:

-rw-------  1 server users 308256768 Jun 30 23:21 core.27145
#0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 234938368 Jun 30 21:04 core.24407
#0  0x000000000043c1a1 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 182644736 Jun 30 19:21 core.23964
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 239296512 Jun 30 17:55 core.18121
#0  0x00002aaaae706890 in ?? ()
-rw-------  1 server users 222105600 Jun 30 15:54 core.15912
#0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 303640576 Jun 30 14:32 core.13667
#0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 212078592 Jun 30 13:47 core.8387
#0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 289026048 Jun 30 13:13 core.8050
#0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
-rw-------  1 server users 179232768 Jun 29 18:37 core.3653
#0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()
-rw-------  1 server users 170971136 Jun 29 18:37 core.2794
#0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()
-rw-------  1 server users 172134400 Jun 29 18:37 core.2795
#0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()

As you can see, dice::hfe::GhostManager::writeData () is the top frame
(#0) in almost every crash: 27 of the 34 core files listed above.
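
The frame lines printed by the script can also be tallied per function
automatically rather than counted by hand. The following is only a
sketch: the function name tally_frames and the script name corebt.sh
are made up for illustration, and it assumes every frame line has the
"#0  ADDRESS in FUNCTION ()" shape seen in the listings.

```shell
#!/bin/bash
# Count how often each function appears as frame #0.
# Reads the script's output on stdin (ls -l lines are ignored) and
# prints "COUNT FUNCTION", most frequent function first.
tally_frames() {
    grep -e '^#0 ' |          # keep only the top-frame lines
        awk '{ print $4 }' |  # 4th field is the function name (or ??)
        sort | uniq -c |      # count identical names
        sort -rn |            # highest count first
        awk '{ print $1, $2 }'
}
```

With the function defined in the current shell, something like
"./corebt.sh core* | tally_frames" gives one line per function.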

Following is the detailed backtrace of one of those crashes, namely
core.8196, the second entry in the first listing:

--- BEGIN LONG GDB OUTPUT ----------------------------------
(gdb) bt
#0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
#1  0x000000000044396e in dice::hfe::GhostManager::transmit ()
#2  0x00000000004316ab in dice::hfe::ClientConnection::transmitMsgs ()
#3  0x000000000045c248 in dice::hfe::GameServer::processGameStateAndSendPackets ()
#4  0x000000000045cd17 in dice::hfe::GameServer::update ()
#5  0x00000000004c29b0 in dice::hfe::BF2Engine::mainLoop ()
#6  0x000000000040ad1c in dice::hfe::BF2::run ()
#7  0x000000000040b2d8 in main ()
(gdb) info threads
  5 process 8204  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
  4 process 8205  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
  3 process 8206  0x00002aaaab61cff6 in select () from /lib/tls/libc.so.6
  2 process 8207  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
* 1 process 8196  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
(gdb) thread 2
[Switching to thread 2 (process 8207)]#0  0x00002aaaab5f9075 in nanosleep ()
   from /lib/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
#1  0x00000000007080c3 in dice::hfe::System::sleep ()
#2  0x00000000004bc4ce in dice::hfe::VoipServerInternalHookThread::run ()
#3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
#4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000000000000000 in ?? ()
#10 0x0000000000000000 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) thread 3
[Switching to thread 3 (process 8206)]#0  0x00002aaaab61cff6 in select ()
   from /lib/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab61cff6 in select () from /lib/tls/libc.so.6
#1  0x000000000079ff5f in dice::hfe::io::SocketManager::sleep ()
#2  0x000000000079b5aa in dice::hfe::io::NetServerThread::run ()
#3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
#4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000000000000000 in ?? ()
#10 0x0000000000000000 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) thread 4
[Switching to thread 4 (process 8205)]#0  0x00002aaaab5f9075 in nanosleep ()
   from /lib/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
#1  0x00000000007080c3 in dice::hfe::System::sleep ()
#2  0x000000000044a8cb in dice::hfe::AutoRecorderHookThread::run ()
#3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
#4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000000000000000 in ?? ()
#10 0x0000000000000000 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---q
Quit
(gdb) thread 5
[Switching to thread 5 (process 8204)]#0  0x00002aaaab5f9075 in nanosleep ()
   from /lib/tls/libc.so.6
(gdb) bt
#0  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
#1  0x00000000007080c3 in dice::hfe::System::sleep ()
#2  0x00000000004c0ba9 in dice::hfe::CheckServerAliveThread::run ()
#3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
#4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
#5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
#6  0x0000000000000000 in ?? ()
#7  0x0000000000000000 in ?? ()
#8  0x0000000000000000 in ?? ()
#9  0x0000000000000000 in ?? ()
#10 0x0000000000000000 in ?? ()
#11 0x0000000000000000 in ?? ()
#12 0x0000000000000000 in ?? ()
#13 0x0000000000000000 in ?? ()
#14 0x0000000000000000 in ?? ()
#15 0x0000000000000000 in ?? ()
#16 0x0000000000000000 in ?? ()
#17 0x0000000000000000 in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0x0000000000000000 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
---Type <return> to continue, or q <return> to quit---q
Quit
--- END LONG GDB OUTPUT ----------------------------------
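
The same information can probably be collected without an interactive
session, which avoids the pager prompts and the long runs of empty
frames. This is a sketch only: it assumes a gdb that supports the -ex
option, and the function name dump_all_threads and the default limit
of 10 frames are arbitrary choices for illustration.

```shell
#!/bin/bash
# Print a backtrace of every thread in a core file, non-interactively.
#   $1: core file
#   $2: executable that produced it (default: the 64-bit bf2 binary)
#   $3: maximum number of frames per thread (default: 10)
dump_all_threads() {
    local core=$1
    local exe=${2:-bin/amd-64/bf2_f}
    local depth=${3:-10}
    # -batch makes gdb exit once the commands have run;
    # "thread apply all bt N" prints the innermost N frames per thread.
    gdb -batch -ex "thread apply all bt $depth" -c "$core" "$exe"
}
```

Run from the bf2 server installation dir, for example as
"dump_all_threads core.8196".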

I hope this info is useful! Please ask if you need more detailed
information.

If anyone else wants to try and see if GhostManager::writeData is a
problematic function for you as well, feel free to run the script above
and see what happens.

Cheers,
Einar


