[Bug 3616] recursive error after client command overflow

Sun May 4 10:55:56 EDT 2008

http://bugzilla.icculus.org/show_bug.cgi?id=3616

------- Comment #2 from devhc97 at gmail.com  2008-05-04 10:55 EDT -------
(Forget that. These bugs are not related.)

Our reliable command buffer keeps filling up when we add commands for
transmission, and it's emptied (to an extent) when we receive acknowledge. But
what if want to add one, and our buffer is full (we still haven't received
acknowledge to those commands)? We would have to overwrite/drop old commands.
Doing so however does not guarantee that every command will get to the server,
and thus we wouldn't be calling it a "reliable" command. So when adding a
reliable command, checks are done (an actual overflow does not happen). If
we're about to overwrite an old command, the connection is considered
defective, and must be dropped.

Such cases arise from the client trying to send commands too fast for the
server to answer to them in time. With higher ping times it's easier, but even
with low ping, bursting reliable commands can overfill the buffer. (Or if the
server does not respond at all, the client can slowly overflow.)

To reproduce, do this:
- /bind v "a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a;a"
- connect to a server with bad ping (first make a server with packet delay)
- quickly press v a few times

Let's say that our buffer is just full, and a CL_AddReliableCommand is done. We
execute Com_Error( ERR_DROP, "Client command overflow" ), and there, call
CL_Disconnect.
The assumption that CL_Disconnect makes is that it can gracefully disconnect
from the server, and therefore does a CL_AddReliableCommand("disconnect"). But
our buffer is ALREADY full, and we're trying to overfill it the second time.
If we wouldn't be checking for recursive errors in Com_Error, then it would be
an infinite cycle of CL_AddReliableCommand, Com_Error, and CL_Disconnect. But
we do check. Since com_errorEntered got set at the first Com_Error call, we
call a Sys_Error.
Sys_Error's first thing is to do CL_Shutdown, which calls CL_Disconnect. Again,
it would be another infinite cycle of Com_Error, Sys_Error, CL_Shutdown, and
CL_Disconnect. But CL_Shutdown also checks for recursive errors, and does
nothing but returns if called recursively.
So we're back at our second Sys_Error call, and we exit the program.

So how to diagnose the overflow?
Borrow the overflow check from CL_AddReliableCommand, and don't send
"disconnect" if we're full? The connection will hang then.
Overwrite the last command with "disconnect"? Command scripts are sent in packs
(like small programs), and may have rather undesired effects when half-sent.
So formulate a new packet with just one "disconnect" command? Or keep track of
how many reliable commands we have summed up in the last frame, and remove ones
that were added in the current frame that caused the overflow?
What if we're just at the buffer limit, and CL_Disconnecting because of
something else? Send an additional "disconnect" from outside the buffer?

-- 
Configure bugmail: http://bugzilla.icculus.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.