Thursday, August 20, 2015

Investigation into the effects of server timeouts on the CIFS client.

The investigation is driven by a user request to reduce the time spent by the client waiting for a request to an unresponsive server.

The client awaits responses by the server in the kernel thread "cifsd" which runs the call path
cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->kernel_recvmsg()

We investigate 2 scenarios
a) The server is orderly shutdown ie. the server closes the socket.
b) The server is abruptly taken off the network ie. it has no chance to orderly shutdown the socket. This is simulated using iptables.

In both cases we assume that the client has a share mounted from the server.

Server is orderly shutdown

kernel_recvmsg() call in cifs_readv_from_socket() returns 0.


int

cifs_readv_from_socket(struct TCP_Server_Info *server, struct kvec *iov_orig,

                       unsigned int nr_segs, unsigned int to_read)

{

..

                length = kernel_recvmsg(server->ssocket, &smb_msg,

                                        iov, segs, to_read, 0);

..

                } else if (length <= 0) {

                        cifs_dbg(FYI, "Received no data or error: expecting %d\n"

                                 "got %d", to_read, length);

                        cifs_reconnect(server);

                        total_read = -ECONNABORTED;

                        break;

                }

..

}
A call is made to cifs_reconnect() where an attempt is made to continuously reconnect to the server in the "cifsd" thread every  in the code path
cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->cifs_reconnect()->generic_ip_connect()


int

cifs_reconnect(struct TCP_Server_Info *server)

{

..

        do {

..

                rc = generic_ip_connect(server);

                if (rc) {

                        cifs_dbg(FYI, "reconnect error %d\n", rc);

                        mutex_unlock(&server->srv_mutex);

                        msleep(3000);

                }

..

        } while (server->tcpStatus == CifsNeedReconnect);

        return rc;

}

While the server is down, generic_ip_connect() continously returns -EHOSTDOWN(111). It will sleep for 3 seconds between each reconnect attempt.

While the attempts to reconnect are made in the cifsd thread, any attempt to perform an operation on the share results in the thread calling smb_init() which calls cifs_reconnect_tcon()
ex: when calling ls -l - CIFSSMBUnixQPathInfo() -> smb_init() -> cifs_reconnect_tcon()

static int

cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)

{

..

        while (server->tcpStatus == CifsNeedReconnect) {

                wait_event_interruptible_timeout(server->response_q,

                        (server->tcpStatus != CifsNeedReconnect), 10 * HZ);

..

                /*

                 * on "soft" mounts we wait once. Hard mounts keep

                 * retrying until process is killed or server comes

                 * back on-line

                 */

                if (!tcon->retry) {

                        cifs_dbg(FYI, "gave up waiting on reconnect in smb_init\n");

                        return -EHOSTDOWN;

                }

        }

..

}
In the while loop above, we return after 10 seconds in case of a soft mount or continously retry in case of hardmounts.

Conclusion:

In cases where the server is shutdown orderly, the client immediately attempts to reconnect. It continues doing this through the cifsd thread keeping an interval of 3 seconds between each attempt. Any attempt to perform an operation on this mount on a soft mounted share(which is the default) will fail after waiting for 10 seconds for the reconnect attempts made by the cifsd thread to succeed.

Server is abruptly shutdown


The server tcp connection is represented by struct TCP_Server_Info. This contains a variable "unsigned long lstrp" which contains the jiffies at which the client last got a response from the server.


struct TCP_Server_Info {

..

        unsigned long lstrp; /* when we got last response from this server */

..

};

This is initialised in cifs_get_tcp_session when the first connection to the server is made.
static struct TCP_Server_Info *
cifs_get_tcp_session(struct smb_vol *volume_info)

{

..

        tcp_ses->lstrp = jiffies;

..

}

This value is set to the latest jiffies at the end of every successful read request.

static int
cifs_demultiplex_thread(void *p)
{
..
        while (server->tcpStatus != CifsExiting) {
..
                mid_entry = server->ops->find_mid(server, buf);
                if (!mid_entry || !mid_entry->receive)
                        length = standard_receive3(server, mid_entry);
                else
                        length = mid_entry->receive(server, mid_entry);
                if (length < 0)
                        continue;
                if (server->large_buf)
                        buf = server->bigbuf;
                server->lstrp = jiffies;
..
        } /* end while !EXITING */
..
}

It is this last response variable server->lstrep which is used to determine the responsiveness of the server.

Before we look at how it is used to determine the server responsiveness, we should also take a look at the echo service. This is the keepalive service used to "ping" the server at regular intervals in case of inactivity. This is done by setting up a work on the cifsiod_wq workqueue which is called ever SMB_ECHO_INTERVAL(60) seconds.

static struct TCP_Server_Info *
cifs_get_tcp_session(struct smb_vol *volume_info)
{
..
        INIT_DELAYED_WORK(&tcp_ses->echo, cifs_echo_request);
..
        /* queue echo request delayed work */
        queue_delayed_work(cifsiod_wq, &tcp_ses->echo, SMB_ECHO_INTERVAL);
..
}

After SMB_ECHO_INTERVAL(60) seconds, we call 

static void
cifs_echo_request(struct work_struct *work)
{
        int rc;
        struct TCP_Server_Info *server = container_of(work,
                                        struct TCP_Server_Info, echo.work);
        /*
         * We cannot send an echo if it is disabled or until the
         * NEGOTIATE_PROTOCOL request is done, which is indicated by
         * server->ops->need_neg() == true. Also, no need to ping if
         * we got a response recently.
         */
        if (!server->ops->need_neg || server->ops->need_neg(server) ||
            (server->ops->can_echo && !server->ops->can_echo(server)) ||
            time_before(jiffies, server->lstrp + SMB_ECHO_INTERVAL - HZ))
                goto requeue_echo;
        rc = server->ops->echo ? server->ops->echo(server) : -ENOSYS;
        if (rc)
                cifs_dbg(FYI, "Unable to send echo request to server: %s\n",
                         server->hostname);
requeue_echo:
        queue_delayed_work(cifsiod_wq, &server->echo, SMB_ECHO_INTERVAL);
}

We call an echo if the session hasn't been negotiated yet and if no response was received since the last echo interval. At the end of the call, we requeue the echo work request on the cifsiod_wq.

Before each call to read on the socket, we first check to make sure that the server is not unresponsive.

int
cifs_readv_from_socket(struct TCP_Server_Info *server, struct kvec *iov_orig,
                       unsigned int nr_segs, unsigned int to_read)
{
..
                if (server_unresponsive(server)) {
                        total_read = -ECONNABORTED;
                        break;
                }
..
                length = kernel_recvmsg(server->ssocket, &smb_msg,
                                        iov, segs, to_read, 0);
..
}

Note that we send an echo call every SMB_ECHO_INTERVAL. So in case of an inactive connections, we should receive a reply atleast after every SMB_ECHO_INTERVAL(60) seconds. The timeout for receives on the socket is set to 7 seconds when setting up the socket in generic_ip_connect(). If no replies have been received for 7 seconds, kernel_recvmsg() timesout and we redo the for loop which checks if the server is unresponsive again.

static bool
server_unresponsive(struct TCP_Server_Info *server)
{
        /*
         * We need to wait 2 echo intervals to make sure we handle such
         * situations right:
         * 1s  client sends a normal SMB request
         * 2s  client gets a response
         * 30s echo workqueue job pops, and decides we got a response recently
         *     and don't need to send another
         * ...
         * 65s kernel_recvmsg times out, and we see that we haven't gotten
         *     a response in >60s.
         */
        if (server->tcpStatus == CifsGood &&
            time_after(jiffies, server->lstrp + 2 * SMB_ECHO_INTERVAL)) {
                cifs_dbg(VFS, "Server %s has not responded in %d seconds. Reconnecting...\n",
                         server->hostname, (2 * SMB_ECHO_INTERVAL) / HZ);
                cifs_reconnect(server);
                wake_up(&server->response_q);
                return true;
        }
        return false;
}



Any filesystem activity performing an operation on this share uses functions such as SendReceive(), SendReceive2() or SendReceiveBlockingLock() to send over commands to the remote server. They then wait for a response from the server by calling wait_for_response().


static int
wait_for_response(struct TCP_Server_Info *server, struct mid_q_entry *midQ)
{
        int error;
        error = wait_event_freezekillable_unsafe(server->response_q,
                                    midQ->mid_state != MID_REQUEST_SUBMITTED);
        if (error < 0)
                return -ERESTARTSYS;
        return 0;
}

It continues waiting for a response until we finally timeout in the cifs_demultiplex_thread "cifsd" in the call path cifs_demultiplex_thread()->cifs_read_from_socket()->cifs_readv_from_socket()->server_unresponsive(). Once the function server_unresponsive() confirms that the server hasn't replied for atleast 2 * SMB_ECHO_INTERVAL, it calls cifs_reconnect() to mark the TCP_Server_Info as needing a reconnect.

int
cifs_reconnect(struct TCP_Server_Info *server)
{
..
  //Mark the TCP_Server_Info as needing reconnect
                server->tcpStatus = CifsNeedReconnect;
..
 //Mark all sessions and tcons as needing a reconnect
        list_for_each(tmp, &server->smb_ses_list) {
                ses = list_entry(tmp, struct cifs_ses, smb_ses_list);
                ses->need_reconnect = true;
                ses->ipc_tid = 0;
                list_for_each(tmp2, &ses->tcon_list) {
                        tcon = list_entry(tmp2, struct cifs_tcon, tcon_list);
                        tcon->need_reconnect = true;
                }
        }
..
        spin_lock(&GlobalMid_Lock);
        list_for_each_safe(tmp, tmp2, &server->pending_mid_q) {
                mid_entry = list_entry(tmp, struct mid_q_entry, qhead);
  //Mark each pending request as requiring a retry
                if (mid_entry->mid_state == MID_REQUEST_SUBMITTED)
                        mid_entry->mid_state = MID_RETRY_NEEDED;
                list_move(&mid_entry->qhead, &retry_list);
        }
        spin_unlock(&GlobalMid_Lock);
..
}

We then wake up all pending requests in server_unresponsive()

static bool
server_unresponsive(struct TCP_Server_Info *server)
{
..
        if (server->tcpStatus == CifsGood &&
            time_after(jiffies, server->lstrp + 2 * SMB_ECHO_INTERVAL)) {
                cifs_dbg(VFS, "Server %s has not responded in %d seconds. Reconnecting...\n",
                         server->hostname, (2 * SMB_ECHO_INTERVAL) / HZ);
  //Mark all pending requests as needing retries
                cifs_reconnect(server);
  //Wake up all tasks waiting for a response.
                wake_up(&server->response_q);
                return true;
        }
        return false;
}

For the task sending the request, we return and return -EAGAIN requesting the task to be run again.

int
SendReceive(const unsigned int xid, struct cifs_ses *ses,
            struct smb_hdr *in_buf, struct smb_hdr *out_buf,
            int *pbytes_returned, const int timeout)
{
..
 //Set request status as submitted
        midQ->mid_state = MID_REQUEST_SUBMITTED;
        cifs_in_send_inc(ses->server);
 //Send request
        rc = smb_send(ses->server, in_buf, be32_to_cpu(in_buf->smb_buf_length));
        cifs_in_send_dec(ses->server);
        cifs_save_when_sent(midQ);
..
 //Wait for response
        rc = wait_for_response(ses->server, midQ);
..
 //Check response. For a retry, this call will return -EAGAIN
        rc = cifs_sync_mid_result(midQ, ses->server);
        if (rc != 0) {
                add_credits(ses->server, 1, 0);
  //Which is returned to the called
                return rc;
        }
..
}

The call attempts to retry and subsequently attempts to reconnect to the tcon

ex: when calling ls -l - CIFSSMBUnixQPathInfo() -> smb_init() -> cifs_reconnect_tcon()

static int
cifs_reconnect_tcon(struct cifs_tcon *tcon, int smb_command)
{
..
        while (server->tcpStatus == CifsNeedReconnect) {
                wait_event_interruptible_timeout(server->response_q,
                        (server->tcpStatus != CifsNeedReconnect), 10 * HZ);
..
                /*
                 * on "soft" mounts we wait once. Hard mounts keep
                 * retrying until process is killed or server comes
                 * back on-line
                 */
                if (!tcon->retry) {
                        cifs_dbg(FYI, "gave up waiting on reconnect in smb_init\n");
                        return -EHOSTDOWN;
                }
        }
..
}
In the while loop above, we return after 10 seconds in case of a soft mount or continuously retry in case of hardmounts.

Conclusion:

When the server is abruptdly taken off the network, we will mark the server as requiring a reconnect. This can take at max a total of
timeout for recvmsg(7) + 2 * SMB_ECHO_INTERVAL(60) + time waiting for reconnect in cifs_reconnect_tcon() (10) = 137 seconds.

Setting a frost alarm on Home assistant

One of the issues with winter is the possibility of ice covering your windscreen which needs to be cleared before you can drive. Clearing ou...