■ - ウォンツテック

apache reading

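Starting today I'm going to read the Apache source while actually running it under gdb, so first some groundwork to make Apache explorable from gdb.

http://httpd.apache.org/dev/debugging.html
gdb needs an executable with symbol info embedded, so I recompiled Apache as follows.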

Installing apr
./configure --prefix=/usr/local/apr-httpd/
make
make install

Installing httpd itself
./configure EXTRA_CFLAGS=-g --with-apr=/usr/local/apr-httpd/
make
make install
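Note: EXTRA_CFLAGS=-g embeds the symbol information. As for --with-apr: an apr was already installed separately on my system, and it conflicted with the build so the install failed with an error. I therefore installed a dedicated apr (the step above) and pointed the httpd build at it.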

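Starting gdb

# gdb /usr/local/apache2/bin/httpd
(gdb) b ap_process_request ← set a breakpoint somewhere suitable
(gdb) run -k start -X -d /usr/local/apache2
 Note: -k start only starts httpd. It does not stop an instance that is already running, so stop that one first.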

With that, Apache can be explored from gdb.

First, here is the backtrace when the breakpoint at ap_process_request is hit:

#0  ap_process_request (r=0x916aae0) at http_request.c:252
#1  0x0808b64b in ap_process_http_connection (c=0x9166c68) at http_core.c:184
#2  0x0807bb39 in ap_run_process_connection (c=0x9166c68) at connection.c:43
#3  0x080a1280 in child_main (child_num_arg=) at prefork.c:640
#4  0x080a14e3 in make_child (s=0x90cbc80, slot=0) at prefork.c:680
#5  0x080a22c6 in ap_mpm_run (_pconf=0x90c70a8, plog=0x91051a0, s=0x90cbc80)
    at prefork.c:956
#6  0x080629f5 in main (argc=151802144, argv=0x0) at main.c:717

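That's what came out.
For now I'll set aside the MPM (Multi-Processing Module) code that runs up to the accept() on the socket, and start reading from ap_run_process_connection onward. (child_main, make_child and ap_mpm_run are the MPM code.)

ap_run_process_connection itself just calls the functions that were registered as hooks through ap_hook_process_connection. The registration happens here: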

register_hooks /modules/http/http_core.c

static void register_hooks(apr_pool_t *p)
{
......
    else {
        ap_hook_process_connection(ap_process_http_connection,             
        NULL, NULL, APR_HOOK_REALLY_LAST);
    }
......
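To make the hook idea concrete, here is a minimal sketch of a "register, then run until someone handles it" table. Everything prefixed sketch_ or SKETCH_ is made up for illustration and is not Apache's real implementation (the real one is generated by macros in APR); the ordering values are arbitrary numbers I picked, only their relative order matters.

```c
#include <assert.h>
#include <stddef.h>

/* Return codes named after Apache's OK/DECLINED, but defined locally. */
#define SKETCH_OK        0
#define SKETCH_DECLINED (-1)

/* Ordering hints in the spirit of APR_HOOK_MIDDLE / APR_HOOK_REALLY_LAST.
 * The numeric values are assumptions chosen for this sketch. */
enum { SKETCH_HOOK_FIRST = 0, SKETCH_HOOK_MIDDLE = 10, SKETCH_HOOK_REALLY_LAST = 30 };

typedef struct { int trace; } sketch_conn;        /* stand-in for conn_rec */
typedef int (*sketch_hook_fn)(sketch_conn *c);
typedef struct { sketch_hook_fn fn; int order; } sketch_hook;

#define SKETCH_MAX_HOOKS 16
static sketch_hook hooks[SKETCH_MAX_HOOKS];
static size_t n_hooks;

/* Register a handler, keeping the table sorted by ascending order value. */
void sketch_hook_process_connection(sketch_hook_fn fn, int order)
{
    size_t i;
    assert(fn != NULL && n_hooks < SKETCH_MAX_HOOKS);
    i = n_hooks++;
    while (i > 0 && hooks[i - 1].order > order) {
        hooks[i] = hooks[i - 1];   /* shift later hooks down one slot */
        i--;
    }
    hooks[i].fn = fn;
    hooks[i].order = order;
}

/* Call handlers in order; the first one that does not decline wins. */
int sketch_run_process_connection(sketch_conn *c)
{
    size_t i;
    for (i = 0; i < n_hooks; i++) {
        int rv = hooks[i].fn(c);
        if (rv != SKETCH_DECLINED)
            return rv;
    }
    return SKETCH_DECLINED;
}

/* Two example handlers: one for some other protocol that declines,
 * and a catch-all "HTTP" handler that accepts. Each leaves a digit in trace. */
int sketch_other_protocol(sketch_conn *c)
{
    c->trace = c->trace * 10 + 1;
    return SKETCH_DECLINED;
}

int sketch_http(sketch_conn *c)
{
    c->trace = c->trace * 10 + 2;
    return SKETCH_OK;
}
```

Registering sketch_http with SKETCH_HOOK_REALLY_LAST and sketch_other_protocol with SKETCH_HOOK_MIDDLE, in either order, makes the run function try the other protocol first and fall through to HTTP. That seems to be the point of registering the HTTP handler as "really last": any other protocol module gets a chance before it.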

This register_hooks also registers various HTTP-related filters, for example:

    ap_http_input_filter_handle =
        ap_register_input_filter("HTTP_IN", ap_http_filter,
                                 NULL, AP_FTYPE_PROTOCOL);
    ap_http_header_filter_handle =
        ap_register_output_filter("HTTP_HEADER", 
                                  ap_http_header_filter,
                                  NULL, AP_FTYPE_PROTOCOL);
    ap_chunk_filter_handle =
        ap_register_output_filter("CHUNK", ap_http_chunk_filter,
                                  NULL, AP_FTYPE_TRANSCODE);
    ap_http_outerror_filter_handle =
        ap_register_output_filter("HTTP_OUTERROR", 
                                  ap_http_outerror_filter,
                                  NULL, AP_FTYPE_PROTOCOL);
    ap_byterange_filter_handle =
        ap_register_output_filter("BYTERANGE", 
                                  ap_byterange_filter,
                                  NULL, AP_FTYPE_PROTOCOL);

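HTTP_IN looks like the default HTTP input filter (ap_http_filter, which reads the request body coming from the client, not local files on the server). The rest are probably what their names suggest: header processing, chunking, and so on.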
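The registration calls above give each filter a name and a type such as AP_FTYPE_PROTOCOL or AP_FTYPE_TRANSCODE. As far as I understand it, the type decides where a filter sits in the chain. A rough sketch of that idea, with hypothetical sketch_ names and FT_ constants that I defined myself (the numeric values are assumptions; only their ordering matters here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the AP_FTYPE_* ordering classes. */
typedef enum { FT_RESOURCE = 10, FT_PROTOCOL = 30, FT_TRANSCODE = 40 } sketch_ftype;

typedef struct { const char *name; sketch_ftype type; } sketch_filter;

#define MAX_FILTERS 8
static sketch_filter registry[MAX_FILTERS];        /* filters known by name */
static size_t n_registered;
static const sketch_filter *chain[MAX_FILTERS];    /* filters active on one request */
static size_t chain_len;

/* Make a filter known under a name; returns a handle into the registry. */
const sketch_filter *sketch_register_output_filter(const char *name, sketch_ftype type)
{
    assert(name != NULL && n_registered < MAX_FILTERS);
    registry[n_registered].name = name;
    registry[n_registered].type = type;
    return &registry[n_registered++];
}

/* Activate a registered filter by name, keeping the chain sorted by type.
 * Returns 0 on success, -1 if the name is unknown or the chain is full. */
int sketch_add_output_filter(const char *name)
{
    size_t i, pos;
    for (i = 0; i < n_registered; i++) {
        if (strcmp(registry[i].name, name) == 0)
            break;
    }
    if (i == n_registered || chain_len == MAX_FILTERS)
        return -1;
    pos = chain_len++;
    while (pos > 0 && chain[pos - 1]->type > registry[i].type) {
        chain[pos] = chain[pos - 1];   /* shift higher-typed filters back */
        pos--;
    }
    chain[pos] = &registry[i];
    return 0;
}

/* Name of the filter at position idx in the chain, or NULL past the end. */
const char *sketch_chain_name(size_t idx)
{
    return idx < chain_len ? chain[idx]->name : NULL;
}
```

With "HTTP_HEADER" registered as FT_PROTOCOL and "CHUNK" as FT_TRANSCODE, adding CHUNK first and HTTP_HEADER second still leaves HTTP_HEADER ahead of CHUNK in the chain, because the insert position depends on the type and not on the order of the calls.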

Next, let's look at ap_process_http_connection, the function registered via ap_hook_process_connection (it also appeared in the backtrace above). Found with grep-find.

ap_process_http_connection /modules/http/http_core.c

static int ap_process_http_connection(conn_rec *c)
{
.....
    /*
     * Read and process each request found on our connection
     * until no requests are left or we decide to close.
     */
    ap_update_child_status(c->sbh, SERVER_BUSY_READ, NULL);
    while ( (r = ap_read_request(c)) != NULL) {
.....

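This is where ap_read_request gets called, so this is presumably where requests are read and processed. According to the comment, it reads and processes requests until none are left on the connection, or until the server itself decides to close the connection.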

 c->keepalive = AP_CONN_UNKNOWN;

This sets keepalive on c (conn_rec *c) to AP_CONN_UNKNOWN.
The value comes from this enum:

/include/httpd.h

typedef enum {
    AP_CONN_UNKNOWN,
    AP_CONN_CLOSE,
    AP_CONN_KEEPALIVE
} ap_conn_keepalive_e;

My guess was that the while loop breaks out when keep-alive is not in effect, and sure enough:

        if (c->keepalive != AP_CONN_KEEPALIVE || c->aborted)
            break;

The main work of the while loop is probably this:

        /* process the request if it was read without error */

        ap_update_child_status(c->sbh, SERVER_BUSY_WRITE, r);
        if (r->status == HTTP_OK)
            ap_process_request(r);

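It checks the status of the request (request_rec *r) and, if it is HTTP_OK, goes on to process the request with ap_process_request.
After that it seems to prepare for reading the next request.
The last comment in the loop says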

        /* Go straight to select() to wait for the next request */

that is, go and wait for the next request. So does ap_read_request() block? Let's look at ap_read_request.
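Before moving on, here is the shape of the loop as I understand it so far, boiled down to a toy that can be compiled on its own. The sketch_ types and functions are stand-ins I made up: requests come from an array instead of a socket, and the keep-alive decision is made directly in the fake "process" step, which is an assumption for the sake of the sketch and not how Apache decides it.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONN_UNKNOWN, CONN_CLOSE, CONN_KEEPALIVE } sketch_keepalive;

typedef struct {
    int status;           /* 200 when the request was read without error */
    int wants_keepalive;  /* what the client asked for (toy flag) */
} sketch_request;

typedef struct {
    sketch_request *pending;   /* queued requests standing in for the socket */
    size_t n_pending;
    size_t next;
    sketch_keepalive keepalive;
    int aborted;
    int processed;             /* how many requests were handled */
} sketch_conn;

/* Stand-in for ap_read_request(): NULL when nothing is left to read. */
static sketch_request *sketch_read_request(sketch_conn *c)
{
    if (c->next >= c->n_pending)
        return NULL;
    return &c->pending[c->next++];
}

/* Stand-in for ap_process_request(): handles the request and records
 * whether the connection should stay open afterwards. */
static void sketch_process_request(sketch_conn *c, const sketch_request *r)
{
    c->processed++;
    c->keepalive = r->wants_keepalive ? CONN_KEEPALIVE : CONN_CLOSE;
}

/* Same structure as the loop in ap_process_http_connection:
 * read, reset keepalive, process if OK, leave unless keep-alive is on. */
int sketch_process_http_connection(sketch_conn *c)
{
    sketch_request *r;
    assert(c != NULL);
    while ((r = sketch_read_request(c)) != NULL) {
        c->keepalive = CONN_UNKNOWN;          /* nothing decided yet */
        if (r->status == 200)
            sketch_process_request(c, r);
        if (c->keepalive != CONN_KEEPALIVE || c->aborted)
            break;
    }
    return c->processed;
}
```

Given three queued requests where the second one does not ask for keep-alive, the loop handles two requests and never reads the third. A request that failed to read (status other than 200) leaves keepalive at CONN_UNKNOWN, so the loop also ends there.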

ap_read_request /server/protocol.c

request_rec *ap_read_request(conn_rec *conn)
{
    request_rec *r;
.......
    r = apr_pcalloc(p, sizeof(request_rec));

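Near the top it allocates a request_rec and then registers filters and so on, so this looks like the place where request_rec structures are created. (request_rec is apparently a very important structure.)
Looking at the rest of ap_read_request with a focus on this structure: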

    apr_bucket_brigade *tmp_bb;
......
    tmp_bb = apr_brigade_create(r->pool, 
                                r->connection->bucket_alloc);

This creates a brigade for the request. The contents of the request will presumably be put into it as buckets.
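For anyone who hasn't met the terms: a brigade is a list of buckets, and each bucket holds a chunk of data. A minimal sketch of that data structure follows. It is my own simplification (a singly linked list with malloc'd copies, all names hypothetical); the real apr_bucket_brigade uses a ring, several bucket types and pool-based allocation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct sketch_bucket {
    char *data;
    size_t len;
    struct sketch_bucket *next;
} sketch_bucket;

typedef struct {
    sketch_bucket *head;
    sketch_bucket *tail;
} sketch_brigade;

/* Create an empty brigade (NULL on allocation failure). */
sketch_brigade *sketch_brigade_create(void)
{
    return calloc(1, sizeof(sketch_brigade));
}

/* Copy len bytes into a new bucket at the tail. Returns 0 on success. */
int sketch_brigade_append(sketch_brigade *bb, const char *data, size_t len)
{
    sketch_bucket *b;
    assert(bb != NULL && data != NULL);
    b = malloc(sizeof(*b));
    if (b == NULL)
        return -1;
    b->data = malloc(len ? len : 1);
    if (b->data == NULL) {
        free(b);
        return -1;
    }
    memcpy(b->data, data, len);
    b->len = len;
    b->next = NULL;
    if (bb->tail)
        bb->tail->next = b;
    else
        bb->head = b;
    bb->tail = b;
    return 0;
}

/* Total number of bytes held across all buckets. */
size_t sketch_brigade_length(const sketch_brigade *bb)
{
    size_t total = 0;
    const sketch_bucket *b;
    for (b = bb->head; b != NULL; b = b->next)
        total += b->len;
    return total;
}

/* Concatenate the buckets into buf, NUL-terminated and truncated to fit.
 * Returns the number of data bytes written. */
size_t sketch_brigade_flatten(const sketch_brigade *bb, char *buf, size_t bufsize)
{
    size_t used = 0;
    const sketch_bucket *b;
    if (bufsize == 0)
        return 0;
    for (b = bb->head; b != NULL; b = b->next) {
        size_t room = bufsize - 1 - used;
        size_t n = b->len < room ? b->len : room;
        memcpy(buf + used, b->data, n);
        used += n;
    }
    buf[used] = '\0';
    return used;
}

/* Free every bucket and the brigade itself. */
void sketch_brigade_destroy(sketch_brigade *bb)
{
    sketch_bucket *b, *next;
    if (bb == NULL)
        return;
    for (b = bb->head; b != NULL; b = next) {
        next = b->next;
        free(b->data);
        free(b);
    }
    free(bb);
}
```

Appending "GET / " and "HTTP/1.1" as two buckets and flattening gives back the single string "GET / HTTP/1.1". The data can arrive in pieces and still be read as one line.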

Next, under the comment

    /* Get the request... */
    if (!read_request_line(r, tmp_bb)) {
        if (r->status == HTTP_REQUEST_URI_TOO_LARGE) {
....
            ap_send_error_response(r, 0);
            ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
            ap_run_log_transaction(r);
            apr_brigade_destroy(tmp_bb);
            return r;
        }

        apr_brigade_destroy(tmp_bb);
        return NULL;
    }

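The comment says "Get the request", so this is presumably where the request is read in.
Judging by the name read_request_line, it reads a line of the request, using the brigade to hold the data as it comes in. The body of the if is the error handling for when that fails.

Moving on: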

    /* We may have been in keep_alive_timeout mode, so toggle back
     * to the normal timeout mode as we fetch the header lines,
     * as necessary.
     */
    csd = ap_get_module_config(conn->conn_config, &core_module);
    apr_socket_timeout_get(csd, &cur_timeout);
    if (cur_timeout != conn->base_server->timeout) {
        apr_socket_timeout_set(csd, conn->base_server->timeout);
        cur_timeout = conn->base_server->timeout;
    }

The comment says: we may have been in keep_alive_timeout mode, so switch back to the normal timeout mode, if necessary, while fetching the header lines.

Next, the code branches on the assbackwards member of request_rec.
Its declaration in request_rec reads:

    /** HTTP/0.9, "simple" request (e.g. GET /foo\n w/no headers) */
    int assbackwards;

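So this checks whether the request is an HTTP/0.9-style "simple" request. (What a variable name, though. "Ass-backwards" means roughly "the wrong way round"; cut off the first three letters and you have a different problem.)
Let's look at the backwards branch first: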

    else {
        if (r->header_only) {
       /*
        * Client asked for headers only with HTTP/0.9, 
        * which doesn't send headers! Have to dink things 
        * just to make sure the error message comes through...
        */
....
            r->header_only = 0;
            r->status = HTTP_BAD_REQUEST;
            ap_send_error_response(r, 0);
            ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
            ap_run_log_transaction(r);
            apr_brigade_destroy(tmp_bb);
            return r;
        }
    }

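The comment says the client asked for headers only (a HEAD request) over HTTP/0.9, but HTTP/0.9 responses carry no headers at all, so the combination makes no sense. Hence the error handling: header_only is cleared so that the error message actually gets sent, and the status is set to HTTP_BAD_REQUEST.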
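To check my understanding of the HTTP/0.9 case, here is a small sketch of how a request line could be classified. It is a toy parser built on sscanf with names of my own (sketch_request, sketch_parse_request_line), not the logic in protocol.c. The assumption is that a request line without a protocol token counts as a "simple" request, and that HEAD combined with a simple request is rejected with 400.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char method[16];
    char uri[256];
    char protocol[16];
    int assbackwards;   /* 1 for an HTTP/0.9 "simple" request */
    int header_only;    /* 1 for a HEAD request */
    int status;         /* 200 or 400 in this sketch */
} sketch_request;

/* Parse "METHOD URI [PROTOCOL]" and return the resulting status. */
int sketch_parse_request_line(const char *line, sketch_request *r)
{
    int n;
    assert(line != NULL && r != NULL);
    memset(r, 0, sizeof(*r));
    n = sscanf(line, "%15s %255s %15s", r->method, r->uri, r->protocol);
    if (n < 2) {                       /* not even "METHOD URI" */
        r->status = 400;
        return r->status;
    }
    r->assbackwards = (n == 2);        /* no protocol token: HTTP/0.9 style */
    r->header_only = (strcmp(r->method, "HEAD") == 0);
    if (r->assbackwards && r->header_only) {
        /* Headers-only makes no sense when responses have no headers. */
        r->header_only = 0;
        r->status = 400;
    } else {
        r->status = 200;
    }
    return r->status;
}
```

"GET /foo" parses as a simple request with status 200, "HEAD /foo" is turned into a 400 with header_only cleared, and "HEAD /foo HTTP/1.0" is an ordinary HEAD request.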

Now the non-HTTP/0.9 case (the assbackwards flag is not set):

    if (!r->assbackwards) {
        ap_get_mime_headers_core(r, tmp_bb);
        if (r->status != HTTP_REQUEST_TIME_OUT) {
            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                          "request failed: error reading the headers");
            ap_send_error_response(r, 0);
            ap_update_child_status(conn->sbh, SERVER_BUSY_LOG, r);
            ap_run_log_transaction(r);
            apr_brigade_destroy(tmp_bb);
            return r;
        }
......
    }

First, ap_get_mime_headers_core reads in the request headers. (So how far did read_request_line read earlier? As it turns out, just the one line: the "GET / HTTP/1.1" part.)
Then it checks whether the status of the request_rec is HTTP_REQUEST_TIME_OUT, and treats anything other than HTTP_REQUEST_TIME_OUT as an error. Looking that constant up:

/include/httpd.h

.....
#define HTTP_METHOD_NOT_ALLOWED            405
#define HTTP_NOT_ACCEPTABLE                406
#define HTTP_PROXY_AUTHENTICATION_REQUIRED 407
#define HTTP_REQUEST_TIME_OUT              408
#define HTTP_CONFLICT                      409
#define HTTP_GONE                          410
.....

so it is an HTTP status code. RFC 2616 describes 408 as:

The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time.

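In other words, the client did not send its request within the time the server was willing to wait, and it may resend the same request unchanged later.
So any status other than this one is handled as an error here. Looking at where the status is initialized (near the top of ap_read_request):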

   r->status  = HTTP_REQUEST_TIME_OUT;  /* Until we get a request */

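In short, the status starts out as this value by default, and if ap_get_mime_headers_core or anything else sets a different status, the error path is taken. In the normal case, the code then goes on to do this: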

        if (apr_table_get(r->headers_in, "Transfer-Encoding")
            && apr_table_get(r->headers_in, "Content-Length")) {
            /* 2616 section 4.4, point 3: "if both Transfer-Encoding
             * and Content-Length are received, the latter MUST be
             * ignored"; so unset it here to prevent any confusion
             * later. */
            apr_table_unset(r->headers_in, "Content-Length");
        }

This looks up the Transfer-Encoding and Content-Length headers and, if both are present, removes Content-Length. RFC 2616 section 4.4, point 3 makes this a MUST.
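The same rule in a form that can be compiled on its own. The header table here is a tiny array with case-insensitive lookup that I wrote for the sketch (sketch_table and friends are hypothetical names); it only imitates the get/unset behaviour of the apr_table calls used above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef struct { const char *key; const char *val; } sketch_header;
typedef struct { sketch_header entries[16]; size_t n; } sketch_table;

/* Header names compare case-insensitively. */
static int keys_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Value of the first header matching key, or NULL if absent. */
const char *sketch_table_get(const sketch_table *t, const char *key)
{
    size_t i;
    for (i = 0; i < t->n; i++) {
        if (keys_equal(t->entries[i].key, key))
            return t->entries[i].val;
    }
    return NULL;
}

/* Remove every header matching key, keeping the others in order. */
void sketch_table_unset(sketch_table *t, const char *key)
{
    size_t i, out = 0;
    for (i = 0; i < t->n; i++) {
        if (!keys_equal(t->entries[i].key, key))
            t->entries[out++] = t->entries[i];
    }
    t->n = out;
}

/* RFC 2616 section 4.4, point 3: when both headers are present,
 * Content-Length must be ignored, so drop it up front. */
void sketch_resolve_body_length(sketch_table *headers_in)
{
    assert(headers_in != NULL);
    if (sketch_table_get(headers_in, "Transfer-Encoding")
        && sketch_table_get(headers_in, "Content-Length")) {
        sketch_table_unset(headers_in, "Content-Length");
    }
}
```

A request carrying both headers ends up with only Transfer-Encoding, while a request carrying only Content-Length is left untouched. Dropping the header once, early, means later code never has to decide which of the two to trust.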