From: Michel Stam Date: Thu, 2 Oct 2014 11:39:16 +0000 (+0000) Subject: proto-shell: fix for not handling switch from DHCP to static race X-Git-Url: https://git.archive.openwrt.org/?p=project%2Fnetifd.git;a=commitdiff_plain;h=fd7d5b43d85b2e3bb2eee98644bca833c6202048 proto-shell: fix for not handling switch from DHCP to static race When a shell script call is finished, proto_shell_task_finish( ) is called to monitor processes, and determine the next interface state. When the interface is brought up after a reconfiguration from dhcp to static, it will first try to (erroneously?) reconfigure the interface for DHCP. Upon doing this, it realises the mistake and kills off the script by setting the state to S_SETUP_ABORT. This is done by the proto_shell_handler. When this happens. the scripts have 1 second to finish. When this happens, S_SETUP_ABORT in proto_shell_task_finish( ) should issue a 'teardown' event to the shell script to deconfigure the interface. It is here that things go wrong. Shell scripts do not execute commands themselves, they should finish as quick as possible. This is very race condition sensitive, though; Instead of executing commands, they post messages to execute commands. It is therefore possible that when the script finishes, there's still commands to execute. The dhcp protocol handler script, one of the scripts involved, notifies netifd of changes by (indirectly) calling proto_shell_update_link( ). Once every so often, the dhcp script will not be finished in time, and proto_shell_task_finish( ) cannot immediately continue, because (in this case) the proto_task is still pending. If this happens, the proto_shell_task_finish( ) will wait, but if the proto_shell_update_link( ) notification is then received, it will set the statemachine to idle, thus breaking the S_SETUP_ABORT. Furthermore, an event is generated to indicate that the network interface should be set to UP, rather than DOWN. This confuses netifd, and the result is a stuck process that does not respond to UCI calls anymore. Note that a similar situation happens in the S_TEARDOWN state in proto_shell_task_finish( ). The fix, although a bit ugly, is to prevent the UP event from being sent, and not to reset the state machine to idle in proto_shell_update_link( ). Signed-off-by: Michel Stam --- diff --git a/proto-shell.c b/proto-shell.c index 7c23caa..0131e19 100644 --- a/proto-shell.c +++ b/proto-shell.c @@ -506,9 +506,11 @@ proto_shell_update_link(struct proto_shell_state *state, struct blob_attr *data, interface_update_complete(state->proto.iface); - if (!keep) - state->proto.proto_event(&state->proto, IFPEV_UP); - state->sm = S_IDLE; + if ((state->sm != S_SETUP_ABORT) && (state->sm != S_TEARDOWN)) { + if (!keep) + state->proto.proto_event(&state->proto, IFPEV_UP); + state->sm = S_IDLE; + } return 0; }