[bf1942] Andreas: dice::hfe::GhostManager::writeData ()

"Einar S. Idsø" esi at itk.ntnu.no
Fri Jul 1 11:31:10 EDT 2005


Andreas,

Sorry to hear that, but I'm glad you are aware of the problem and
working on it. As you can see from the timestamps in my original mail,
this really kills our servers. A couple of crashes per hour per box is
not good at all. :(

Is this kind of crash only for ranked servers, or also for unranked?
As I said, we started seeing a lot more crashes after our servers
became ranked, but ranking also brought a lot more players onto them,
so the ranking itself may not be the direct cause.

Is the crash for 64-bit servers only? If it is, then we could consider
switching to 32-bit until it gets fixed.

Is there anything I can do to help in this matter?

Oh, and while I'm at it: Are you aware that changing the time on the box
crashes BF2 servers? They then crash in this function:
#0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()

Einar


Andreas Andersson wrote:
> Hello, 
> 
> We have had that crash here on our server as well. I have been looking at it for a couple of days now. But so far I haven't been able to come up with a fix. I'm afraid this seems to be one of the tricky ones...
> This means that the bug will most likely not be fixed with the patch EA talked about in the latest community-update (unless it has been fixed as a side effect of another fix).
> 
> /AndreasA
> 
> -----Original Message-----
> From: "Einar S. Idsø" [mailto:esi at itk.ntnu.no] 
> Sent: 1 July 2005 00:06
> To: bf1942 at icculus.org
> Subject: [bf1942] Andreas: dice::hfe::GhostManager::writeData ()
> 
> Hi Andreas,
> 
> Ever since we enabled ranking on our servers this morning, they have
> been crashing ridiculously often. We do not know if this is related to
> the ranking itself, or if it is because of the absolutely amazing
> interest people have taken in our servers since they became ranked (they
> are CONSTANTLY full, and it is a matter of minutes or even seconds
> before they fill up after a crash).
> 
> However, there seems to be a pattern: Almost every single crash has the
> same backtrace, so this bug could perhaps be "simple" to fix. And if it
> is indeed the primary cause of the crashes, as the output below
> suggests, it probably affects many other admins as well, so I hope it
> gets a relatively high priority.
> 
> The output further below was produced on two dual Opteron servers by
> this script:
> -----
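> #!/bin/bash
> # List each core file (newest first) with the frame it crashed in.
> 
> for filename in $(ls -t -1 "$@"); do
>     ls -l "$filename"
>     # -batch: load the core, print the crash frame, then exit
>     gdb -batch -c "$filename" bin/amd-64/bf2_f | grep -e '^#'
> done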
> -----
> The script is run in the bf2 server installation dir with core* as argument.
> 
> The first output is from a server which runs two 64-player servers (and
> has the clock offset by +2 hours - never mind that ;) ):
> 
> -rw-------  1 server users 278659072 Jul  1 01:09 core.6586
> #0  0x0000000000000000 in ?? ()
> -rw-------  1 server users 199155712 Jul  1 00:44 core.8196
> #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 242470912 Jul  1 00:33 core.5028
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 188968960 Jul  1 00:13 core.6234
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 194682880 Jun 30 23:56 core.5653
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 193380352 Jun 30 23:28 core.5026
> #0  0x0000000000000050 in ?? ()
> -rw-------  1 server users 254894080 Jun 30 23:00 core.2508
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 237834240 Jun 30 22:50 core.3078
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 196165632 Jun 30 21:15 core.2426
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 228093952 Jun 30 19:56 core.31182
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 239190016 Jun 30 18:33 core.28499
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 232099840 Jun 30 18:26 core.29738
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 225918976 Jun 30 17:13 core.28603
> #0  0x000000000043c1a1 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 235868160 Jun 30 16:18 core.26252
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 222158848 Jun 30 16:10 core.27353
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 251736064 Jun 30 15:09 core.23075
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 224284672 Jun 30 14:32 core.23197
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 288645120 Jun 29 23:31 core.5484
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 299806720 Jun 29 20:47 core.2594
> #0  0x0000000000000000 in ?? ()
> -rw-------  1 server users 361648128 Jun 29 00:56 core.6367
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 297541632 Jun 28 22:14 core.9107
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 216424448 Jun 28 19:09 core.4090
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 215441408 Jun 28 15:40 core.4076
> #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> 
> 
> The second output is from a server which runs one 64-player server and 2
> 32-player servers:
> 
> -rw-------  1 server users 308256768 Jun 30 23:21 core.27145
> #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 234938368 Jun 30 21:04 core.24407
> #0  0x000000000043c1a1 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 182644736 Jun 30 19:21 core.23964
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 239296512 Jun 30 17:55 core.18121
> #0  0x00002aaaae706890 in ?? ()
> -rw-------  1 server users 222105600 Jun 30 15:54 core.15912
> #0  0x000000000043c195 in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 303640576 Jun 30 14:32 core.13667
> #0  0x000000000043c19e in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 212078592 Jun 30 13:47 core.8387
> #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 289026048 Jun 30 13:13 core.8050
> #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> -rw-------  1 server users 179232768 Jun 29 18:37 core.3653
> #0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()
> -rw-------  1 server users 170971136 Jun 29 18:37 core.2794
> #0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()
> -rw-------  1 server users 172134400 Jun 29 18:37 core.2795
> #0  0x00000000004c0c09 in dice::hfe::CheckServerAliveThread::run ()
> 
> As you can see, dice::hfe::GhostManager::writeData () is the top frame
> in almost every single crash (27 of the 34 core files listed above).
> 
> Below is the detailed backtrace of one of them, core.8196, which is
> the second entry in the first list:
> 
> --- BEGIN LONG GDB OUTPUT ----------------------------------
> (gdb) bt
> #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> #1  0x000000000044396e in dice::hfe::GhostManager::transmit ()
> #2  0x00000000004316ab in dice::hfe::ClientConnection::transmitMsgs ()
> #3  0x000000000045c248 in dice::hfe::GameServer::processGameStateAndSendPackets ()
> #4  0x000000000045cd17 in dice::hfe::GameServer::update ()
> #5  0x00000000004c29b0 in dice::hfe::BF2Engine::mainLoop ()
> #6  0x000000000040ad1c in dice::hfe::BF2::run ()
> #7  0x000000000040b2d8 in main ()
> (gdb) info threads
>   5 process 8204  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
>   4 process 8205  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
>   3 process 8206  0x00002aaaab61cff6 in select () from /lib/tls/libc.so.6
>   2 process 8207  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
> * 1 process 8196  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
> (gdb) thread 2
> [Switching to thread 2 (process 8207)]#0  0x00002aaaab5f9075 in nanosleep ()
>    from /lib/tls/libc.so.6
> (gdb) bt
> #0  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
> #1  0x00000000007080c3 in dice::hfe::System::sleep ()
> #2  0x00000000004bc4ce in dice::hfe::VoipServerInternalHookThread::run ()
> #3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
> #4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
> #5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
> #6  0x0000000000000000 in ?? ()
> #7  0x0000000000000000 in ?? ()
> #8  0x0000000000000000 in ?? ()
> #9  0x0000000000000000 in ?? ()
> #10 0x0000000000000000 in ?? ()
> #11 0x0000000000000000 in ?? ()
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000000 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---q
> Quit
> (gdb) thread 3
> [Switching to thread 3 (process 8206)]#0  0x00002aaaab61cff6 in select ()
>    from /lib/tls/libc.so.6
> (gdb) bt
> #0  0x00002aaaab61cff6 in select () from /lib/tls/libc.so.6
> #1  0x000000000079ff5f in dice::hfe::io::SocketManager::sleep ()
> #2  0x000000000079b5aa in dice::hfe::io::NetServerThread::run ()
> #3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
> #4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
> #5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
> #6  0x0000000000000000 in ?? ()
> #7  0x0000000000000000 in ?? ()
> #8  0x0000000000000000 in ?? ()
> #9  0x0000000000000000 in ?? ()
> #10 0x0000000000000000 in ?? ()
> #11 0x0000000000000000 in ?? ()
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000000 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---q
> Quit
> (gdb) thread 4
> [Switching to thread 4 (process 8205)]#0  0x00002aaaab5f9075 in nanosleep ()
>    from /lib/tls/libc.so.6
> (gdb) bt
> #0  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
> #1  0x00000000007080c3 in dice::hfe::System::sleep ()
> #2  0x000000000044a8cb in dice::hfe::AutoRecorderHookThread::run ()
> #3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
> #4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
> #5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
> #6  0x0000000000000000 in ?? ()
> #7  0x0000000000000000 in ?? ()
> #8  0x0000000000000000 in ?? ()
> #9  0x0000000000000000 in ?? ()
> #10 0x0000000000000000 in ?? ()
> #11 0x0000000000000000 in ?? ()
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000000 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---q
> Quit
> (gdb) thread 5
> [Switching to thread 5 (process 8204)]#0  0x00002aaaab5f9075 in nanosleep ()
>    from /lib/tls/libc.so.6
> (gdb) bt
> #0  0x00002aaaab5f9075 in nanosleep () from /lib/tls/libc.so.6
> #1  0x00000000007080c3 in dice::hfe::System::sleep ()
> #2  0x00000000004c0ba9 in dice::hfe::CheckServerAliveThread::run ()
> #3  0x0000000000708b3a in dice::hfe::(anonymous namespace)::pthreads_thread_trampoline ()
> #4  0x00002aaaab89d0b1 in start_thread () from /lib/tls/libpthread.so.0
> #5  0x00002aaaab623263 in clone () from /lib/tls/libc.so.6
> #6  0x0000000000000000 in ?? ()
> #7  0x0000000000000000 in ?? ()
> #8  0x0000000000000000 in ?? ()
> #9  0x0000000000000000 in ?? ()
> #10 0x0000000000000000 in ?? ()
> #11 0x0000000000000000 in ?? ()
> #12 0x0000000000000000 in ?? ()
> #13 0x0000000000000000 in ?? ()
> #14 0x0000000000000000 in ?? ()
> #15 0x0000000000000000 in ?? ()
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> ---Type <return> to continue, or q <return> to quit---q
> Quit
> --- END LONG GDB OUTPUT ----------------------------------
> 
> I hope this info is useful! Please ask if you need more detailed
> information.
> 
> If anyone else wants to try and see if GhostManager::writeData is a
> problematic function for you as well, feel free to run the script above
> and see what happens.
> 
> Cheers,
> Einar
> 
> 
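
For anyone comparing their own crashes against the listings quoted above, it may help to tally the #0 frames by function name instead of reading them by eye. The following is only a minimal sketch: the function name tally_frames is made up for illustration, and it assumes the input consists of gdb frame lines in the format shown in the listings. The return address is stripped so that crashes at slightly different offsets inside the same function are counted together.

```shell
#!/bin/bash
# Tally the innermost (#0) frames by function name.
# tally_frames is a hypothetical helper, not part of gdb or the BF2 server.
# Expected input lines look like:
#   #0  0x000000000043c19b in dice::hfe::GhostManager::writeData ()
tally_frames() {
    # Keep only the #0 lines, drop the frame number and address,
    # then count identical function names, most frequent first.
    grep '^#0 ' "$@" |
        sed -e 's/^#0  *0x[0-9a-f]* in //' |
        sort | uniq -c | sort -rn
}
```

With the function defined in the current shell, the output of the quoted script can be piped straight into it, or saved to a file first and passed as an argument. Frames gdb could not resolve are grouped under "?? ()".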


