Posted 2020-01-31 | Updated 2020-10-25 | Python / Web Framework

WSGI

Preface

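Starting a new series: reading the Django source code and understanding the Django web framework in depth.
We begin with WSGI. This post may look unrelated to Django, but it is actually a very important piece.
Django's built-in development server is built on Python's wsgiref module, which is why we usually don't need to deploy something like nginx while testing. To understand that part, we have to start from the WSGI specification defined in PEP 3333.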

WSGI

Full name: Python Web Server Gateway Interface

Rationale and Goals

Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web – to name just a few. This wide variety of choices can be a problem for new Python users, because generally speaking, their choice of web framework will limit their choice of usable web servers, and vice versa.

By contrast, although Java has just as many web application frameworks available, Java’s “servlet” API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API.

The availability and widespread use of such an API in web servers for Python – whether those servers are written in Python (e.g. Medusa), embed Python (e.g. mod_python), or invoke Python via a gateway protocol (e.g. CGI, FastCGI, etc.) – would separate choice of framework from choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their preferred area of specialization.
This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI).

The original wording is worth reading as-is.

Thus, WSGI must be easy to implement, so that an author’s initial investment in the interface can be reasonably low.
Again, the goal of WSGI is to facilitate easy interconnection of existing servers and applications or frameworks, not to create a new web framework.
…it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components.

In short: enable the use of any framework with any server.

Overview

The Application/Framework Side

The application object is simply a callable object that accepts two arguments. The term “object” should not be misconstrued as requiring an actual object instance: a function, method, class, or instance with a __call__ method are all acceptable for use as an application object. Application objects must be able to be invoked more than once, as virtually all servers/gateways (other than CGI) will make such repeated requests.

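The application object does not have to be a class instance; it only has to be callable. A plain function works, and so does an instance whose class defines __call__().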
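To make this concrete, here is a minimal sketch using only plain Python: an instance of a class with __call__ serves as the application object, and the same instance is invoked twice, as a long-running server would do. GreetingApp and fake_start_response are made-up names for this illustration, not part of any library.

```python
calls = []


def fake_start_response(status, headers, exc_info=None):
    # stands in for the server's start_response and only records its arguments
    # (a real one would also return a write() callable)
    calls.append((status, headers))


class GreetingApp:
    """A WSGI application object that is an *instance* with __call__ (hypothetical)."""

    def __init__(self, greeting):
        # configuration is set up once and reused for every request
        self.greeting = greeting

    def __call__(self, environ, start_response):
        # invoked once per request with environ and the server's start_response
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [self.greeting.encode('utf-8')]


app = GreetingApp('Hello world!\n')
# an empty dict is enough here because this app never reads environ;
# the same application object must tolerate being called repeatedly
first = b''.join(app({}, fake_start_response))
second = b''.join(app({}, fake_start_response))
print(first, second, len(calls))
```

Configuration lives on the instance, which is the main reason frameworks tend to prefer this form over a bare function.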

The examples given in the official document:

sample

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [b'Hello world!\n']   # under PEP 3333 (Python 3) the body must be bytes


class AppClass:
    """Produce the same output, but using a class

    (Note: 'AppClass' is the "application" here, so calling it
    returns an instance of 'AppClass', which is then the iterable
    return value of the "application callable" as required by
    the spec.

    If we wanted to use *instances* of 'AppClass' as application
    objects instead, we would have to implement a '__call__'
    method, which would be invoked to execute the application,
    and we would need to create an instance for use by the
    server or gateway.
    """

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type', 'text/plain')]
        self.start(status, response_headers)
        yield b"Hello world!\n"  # __iter__ is a generator: iterating the instance yields this one chunk, then stops

The Server/Gateway Side

The server or gateway invokes the application callable once for each request it receives from an HTTP client, that is directed at the application. To illustrate, here is a simple CGI gateway, implemented as a function taking an application object. Note that this simple example has limited error handling, because by default an uncaught exception will be dumped to sys.stderr and logged by the web server.

sample

import os, sys

def run_with_cgi(application):

    environ = dict(os.environ.items())
    environ['wsgi.input']        = sys.stdin.buffer   # request body is read as bytes
    environ['wsgi.errors']       = sys.stderr
    environ['wsgi.version']      = (1, 0)
    environ['wsgi.multithread']  = False
    environ['wsgi.multiprocess'] = True
    environ['wsgi.run_once']     = True

    if environ.get('HTTPS', 'off') in ('on', '1'):
        environ['wsgi.url_scheme'] = 'https'
    else:
        environ['wsgi.url_scheme'] = 'http'

    headers_set = []
    headers_sent = []

    def write(data):
        out = sys.stdout.buffer   # response bytes go to the binary stdout stream

        if not headers_set:
            raise AssertionError("write() before start_response()")

        elif not headers_sent:
            # Before the first output, send the stored headers.
            # Chained assignment: copy headers_set into headers_sent,
            # then unpack the same two items into status and response_headers
            status, response_headers = headers_sent[:] = headers_set
            # HTTP header lines end with CRLF (\r\n) and a blank line ends the
            # header block; headers are encoded as ISO-8859-1 per PEP 3333
            out.write(('Status: %s\r\n' % status).encode('iso-8859-1'))
            for header in response_headers:
                out.write(('%s: %s\r\n' % header).encode('iso-8859-1'))
            out.write(b'\r\n')

        out.write(data)
        out.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    # Re-raise original exception if headers sent
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None     # avoid dangling circular ref
        elif headers_set:
            raise AssertionError("Headers already set!")
        # Slice assignment updates the shared list in place; a plain
        # `headers_set = ...` would only bind a new local name, invisible to write()
        headers_set[:] = [status, response_headers]
        return write

    result = application(environ, start_response)
    try:
        # Write out every chunk the application returns
        for data in result:
            if data:    # don't send headers until body appears
                write(data)
        if not headers_sent:
            write(b'')   # send headers now if body was empty
    finally:
        # close() must be called even if iterating the result fails
        if hasattr(result, 'close'):
            result.close()
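The exc_info branch of start_response above exists for applications that run into an error. Below is a sketch of the application-side pattern the PEP describes: start the normal response, and if something fails before any output is sent, call start_response again with sys.exc_info(). The names risky_app and make_recorder, and the deliberate division by zero, are made up for illustration; the recorder imitates the gateway's start_response logic in memory.

```python
import sys


def risky_app(environ, start_response):
    """Hypothetical app that fails after calling start_response, then reports a 500."""
    try:
        start_response('200 OK', [('Content-Type', 'text/plain')])
        1 / 0  # simulate a bug in the application body
        return [b'normal body']
    except ZeroDivisionError:
        # headers are *set* but not yet *sent*, so start_response may be
        # called a second time, provided exc_info is passed along
        start_response('500 Internal Server Error',
                       [('Content-Type', 'text/plain')], sys.exc_info())
        return [b'error body']


def make_recorder():
    """Hypothetical in-memory stand-in for the gateway's start_response."""
    state = {'headers_set': None, 'headers_sent': False}

    def start_response(status, headers, exc_info=None):
        if exc_info:
            try:
                if state['headers_sent']:
                    # too late to change the status: re-raise the original error
                    raise exc_info[1].with_traceback(exc_info[2])
            finally:
                exc_info = None  # drop the traceback reference
        elif state['headers_set'] is not None:
            raise AssertionError('Headers already set!')
        state['headers_set'] = (status, headers)
        # a real gateway would also return its write() callable here

    return state, start_response


state, start_response = make_recorder()
body = b''.join(risky_app({}, start_response))
print(state['headers_set'][0], body)
```

Without exc_info, the second call would trip the "Headers already set!" check, which is exactly what the gateway code enforces.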

Middleware

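The PEP also gives a middleware example, but middleware is outside the scope of this post. Middleware sits between the server and the application and adds a second layer of processing, or extra functionality, on top of the basic request handling. Typical uses listed in the PEP include: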

- Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
- Allowing multiple applications or frameworks to run side-by-side in the same process.
- Load balancing and remote processing, by forwarding requests and responses over a network.
- Performing content postprocessing, such as applying XSL stylesheets.
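A small sketch helps show why middleware "plays both sides": the server calls it like an application, and it calls the wrapped application like a server. This one implements the first use above, routing by URL prefix after rewriting a copy of environ (moving the matched prefix from PATH_INFO to SCRIPT_NAME). prefix_router, echo_app and not_found are hypothetical names; wsgiref.util.shift_path_info is a standard-library helper for a similar job.

```python
def prefix_router(routes, default_app):
    """Hypothetical routing middleware: dispatch on a PATH_INFO prefix.

    routes maps a URL prefix such as '/api' to the WSGI app that handles it.
    """
    def middleware(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, sub_app in routes.items():
            if path == prefix or path.startswith(prefix + '/'):
                # rewrite a *copy* of environ so the caller's dict is untouched
                environ = dict(environ)
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return sub_app(environ, start_response)
        # no prefix matched: fall through to the default application
        return default_app(environ, start_response)
    return middleware


def echo_app(environ, start_response):
    # reports what it sees, so the environ rewrite is visible in the body
    start_response('200 OK', [('Content-Type', 'text/plain')])
    text = 'SCRIPT_NAME=%s PATH_INFO=%s' % (environ['SCRIPT_NAME'], environ['PATH_INFO'])
    return [text.encode('utf-8')]


def not_found(environ, start_response):
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']


app = prefix_router({'/api': echo_app}, not_found)

statuses = []
def start_response(status, headers, exc_info=None):
    statuses.append(status)

print(b''.join(app({'SCRIPT_NAME': '', 'PATH_INFO': '/api/users'}, start_response)))
print(b''.join(app({'SCRIPT_NAME': '', 'PATH_INFO': '/other'}, start_response)))
print(statuses)
```

Because the router is itself a WSGI callable, it can be handed straight to a server or wrapped by further middleware.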

Specification Details

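At least for now, our goal is not to write our own implementation, so there is no need to go through the detailed specification; understanding the parts above is enough.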

Summary

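In short, WSGI defines how a server program and a web framework communicate directly: the server packs the request and its surrounding information into the environ dict and passes it to the framework's application. The official document shows that the application can be a plain function or any object implementing the callable interface, i.e. a __call__ method. In the other direction, the server hands the application a callback: in the examples this is start_response, which is defined on the server side and called inside the framework's code to set the response status and headers. The application's return value is an iterable that forms the response body; the server iterates over it, writes each chunk to the client, and calls its close() method if it has one. Django's WSGIHandler is an example of the __call__ approach; we will come to it later.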
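The server's half of that contract (call the app, iterate the returned body, always call close() if present) fits in a few lines. This is a sketch under simplifying assumptions: consume_response and ClosingBody are hypothetical names, and a real server would write each chunk to the socket instead of collecting it in a list.

```python
def consume_response(app, environ):
    """Hypothetical minimal 'server loop': call app, collect status, headers and body."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status
        captured['headers'] = headers
        # a real server returns a write() callable here; this sketch does not

    result = app(environ, start_response)
    chunks = []
    try:
        for data in result:
            if data:  # empty chunks are skipped, as in run_with_cgi
                chunks.append(data)
    finally:
        # the spec requires close() to be called if the iterable provides it
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], b''.join(chunks)


class ClosingBody:
    """Hypothetical iterable body that records whether the server closed it."""

    def __init__(self, chunks):
        self.chunks = chunks
        self.closed = False

    def __iter__(self):
        return iter(self.chunks)

    def close(self):
        self.closed = True


body = ClosingBody([b'Hello ', b'', b'world!'])

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return body

print(consume_response(app, {}))
print(body.closed)
```

Since start_response is only read after the loop finishes, this also works for lazy apps like AppClass, which call start_response during iteration.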

Python wsgiref

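wsgiref is the reference implementation of WSGI that ships with Python's standard library. It is written in pure Python and provides tools for development and testing. Its features include:
- manipulating the WSGI environ
- handling response headers
- a simple HTTP server
- simple validation functions for both the application side and the server side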
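As a quick taste of the first two items, here is a sketch using wsgiref.util and wsgiref.headers. The path, query string and file name are made-up example values; setup_testing_defaults() only fills in keys that are missing, so the values set beforehand are kept.

```python
from wsgiref.headers import Headers
from wsgiref.util import request_uri, setup_testing_defaults

# Start from a partial environ; setup_testing_defaults() fills in the missing
# CGI/WSGI keys (SERVER_NAME, HTTP_HOST, wsgi.input, ...) with test values
environ = {'PATH_INFO': '/hello', 'QUERY_STRING': 'name=web'}
setup_testing_defaults(environ)
print(environ['REQUEST_METHOD'], environ['wsgi.url_scheme'])
print(request_uri(environ))  # rebuild the full request URL from environ

# Headers wraps the list of (name, value) tuples handed to start_response
headers = Headers([])
headers['Content-Type'] = 'text/plain'
headers.add_header('Content-Disposition', 'attachment', filename='hello.txt')
print(headers.items())
```

Keyword arguments to add_header() become quoted header parameters, which saves hand-formatting strings like `attachment; filename="..."`.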

Code Structure

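wsgiref
|-- handlers.py          # runs WSGI applications (BaseHandler, SimpleHandler, ...)
|-- headers.py           # response header handling
|-- __init__.py          # package initializer
|-- simple_server.py     # a simple WSGI HTTP server implementation
|-- util.py              # helper functions
|-- validate.py          # WSGI conformance checking and validation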

Request Flow

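- The server program creates a socket, listens on a specific port (often 80), and waits for client connections.
- The client sends an HTTP request.
- The socket server (WSGIServer, through its HTTPServer/TCPServer base classes) accepts the connection and hands it to the bound request handler class, WSGIRequestHandler.
- Server-side information (server name, port, etc.) is already stored in WSGIServer's base_environ, which it set up when binding the socket.
- WSGIRequestHandler parses the request line and headers with the methods inherited from BaseHTTPRequestHandler, then copies base_environ and adds the method, path, query string, client address and request headers to it.
- WSGIRequestHandler creates a ServerHandler and passes it this environ, which now holds the server information, the client information and the details of the current request.
- ServerHandler calls the registered WSGI app, passing environ and start_response as arguments.
- The WSGI app returns the response status, headers and body to the handler, which passes them back through the layers and finally sends them to the client over the socket; the client program receives the response, parses it and displays the result.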
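The last few steps can be watched without a network. This sketch drives wsgiref.handlers.SimpleHandler (the base class of simple_server's ServerHandler) directly, with in-memory BytesIO/StringIO streams standing in for the socket; the trimmed environ is an assumption that only roughly imitates what WSGIRequestHandler.get_environ() builds, and hello_app is a made-up app.

```python
from io import BytesIO, StringIO
from wsgiref.handlers import SimpleHandler


def hello_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello!']


# In-memory streams stand in for the socket: request body in,
# raw response bytes out, and an error log
stdin, stdout, stderr = BytesIO(), BytesIO(), StringIO()
# a trimmed environ, roughly what WSGIRequestHandler.get_environ() would provide
environ = {'SERVER_PROTOCOL': 'HTTP/1.0', 'REQUEST_METHOD': 'GET',
           'PATH_INFO': '/', 'SERVER_NAME': 'localhost', 'SERVER_PORT': '8000'}

handler = SimpleHandler(stdin, stdout, stderr, environ)
# run() calls hello_app(environ, start_response), then writes the status
# line, headers and body to stdout
handler.run(hello_app)

raw = stdout.getvalue()
print(raw.decode('latin-1'))
```

The output is a complete HTTP/1.0 response, including a Date header and a Content-Length the handler computed itself because the body was a one-element list.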

A Quick Read of the Source

Let's read simple_server.py in detail.

simple_server.py

class ServerHandler(SimpleHandler):

    server_software = software_version

    def close(self):
        try:
            self.request_handler.log_request(
                self.status.split(' ',1)[0], self.bytes_sent
            )
        finally:
            SimpleHandler.close(self)



class WSGIServer(HTTPServer):

    """BaseHTTPServer that implements the Python WSGI protocol"""

    application = None

    def server_bind(self):
        """Override server_bind to store the server name."""
        HTTPServer.server_bind(self)
        self.setup_environ()

    def setup_environ(self):
        # Set up base environment
        env = self.base_environ = {}
        env['SERVER_NAME'] = self.server_name
        env['GATEWAY_INTERFACE'] = 'CGI/1.1'
        env['SERVER_PORT'] = str(self.server_port)
        env['REMOTE_HOST']=''
        env['CONTENT_LENGTH']=''
        env['SCRIPT_NAME'] = ''

    def get_app(self):
        return self.application

    # bind the WSGI application to this server
    def set_app(self,application):
        self.application = application



class WSGIRequestHandler(BaseHTTPRequestHandler):

    server_version = "WSGIServer/" + __version__

    def get_environ(self):
        # start from a copy of the server's base environ
        env = self.server.base_environ.copy()
        # request protocol version and server software
        env['SERVER_PROTOCOL'] = self.request_version
        env['SERVER_SOFTWARE'] = self.server_version
        # request method
        env['REQUEST_METHOD'] = self.command
        # split the request path into the path and the query string
        if '?' in self.path:
            path,query = self.path.split('?',1)
        else:
            path,query = self.path,''

        env['PATH_INFO'] = urllib.parse.unquote(path, 'iso-8859-1')
        env['QUERY_STRING'] = query

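        # client address: in Python 3, address_string() simply returns
        # client_address[0] (older versions did a reverse DNS lookup), so
        # REMOTE_HOST is only set if a subclass overrides address_string()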
        host = self.address_string()
        if host != self.client_address[0]:
            env['REMOTE_HOST'] = host
        env['REMOTE_ADDR'] = self.client_address[0]

        # Content-Type of the request body; get_content_type() falls back to 'text/plain'
        if self.headers.get('content-type') is None:
            env['CONTENT_TYPE'] = self.headers.get_content_type()
        else:
            env['CONTENT_TYPE'] = self.headers['content-type']
        
        # length of the request body, from the Content-Length header
        length = self.headers.get('content-length')
        if length:
            env['CONTENT_LENGTH'] = length

        # copy every other request header into environ as HTTP_<NAME> (CGI style)
        for k, v in self.headers.items():
            k=k.replace('-','_').upper(); v=v.strip()
            if k in env:
                continue                    # skip content length, type,etc.
            if 'HTTP_'+k in env:
                env['HTTP_'+k] += ','+v     # comma-separate multiple headers
            else:
                env['HTTP_'+k] = v
        return env

    def get_stderr(self):
        return sys.stderr

    # handle a single request
    def handle(self):
        """Handle a single HTTP request"""

        self.raw_requestline = self.rfile.readline(65537)
        if len(self.raw_requestline) > 65536:
            self.requestline = ''
            self.request_version = ''
            self.command = ''
            self.send_error(414)
            return

        if not self.parse_request(): # An error code has been sent, just exit
            return
        
        # hand the request over to a ServerHandler
        handler = ServerHandler(
            self.rfile, self.wfile, self.get_stderr(), self.get_environ()
        )
        handler.request_handler = self      # backpointer for logging
        # ServerHandler received the environ built above; run() now calls the WSGI app
        handler.run(self.server.get_app())



def demo_app(environ,start_response):
    from io import StringIO
    stdout = StringIO()
    print("Hello world!", file=stdout)
    print(file=stdout)
    h = sorted(environ.items())
    for k,v in h:
        print(k,'=',repr(v), file=stdout)
    start_response("200 OK", [('Content-Type','text/plain; charset=utf-8')])
    return [stdout.getvalue().encode("utf-8")]


def make_server(
    host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler
):
    """Create a new WSGI server listening on `host` and `port` for `app`"""
    server = server_class((host, port), handler_class)
    server.set_app(app)
    return server


if __name__ == '__main__':
    with make_server('', 8000, demo_app) as httpd:
        sa = httpd.socket.getsockname()
        print("Serving HTTP on", sa[0], "port", sa[1], "...")
        import webbrowser
        webbrowser.open('http://localhost:8000/xyz?abc')
        httpd.handle_request()  # serve one request, then exit
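The header loop at the end of get_environ() is the least obvious part, so here it is pulled out into a standalone function. headers_to_environ is a hypothetical helper written for this post; it copies the loop's logic rather than calling any wsgiref API.

```python
def headers_to_environ(headers, env=None):
    """Hypothetical helper mirroring the header loop in WSGIRequestHandler.get_environ().

    headers is a list of (name, value) pairs as received from the client.
    """
    env = {} if env is None else env
    for k, v in headers:
        # 'User-Agent' -> 'USER_AGENT'; surrounding whitespace is stripped from values
        k = k.replace('-', '_').upper()
        v = v.strip()
        if k in env:
            continue  # CONTENT_TYPE / CONTENT_LENGTH already exist without the HTTP_ prefix
        if 'HTTP_' + k in env:
            env['HTTP_' + k] += ',' + v  # repeated headers are comma-joined
        else:
            env['HTTP_' + k] = v
    return env


env = headers_to_environ(
    [('Host', 'localhost:8000'), ('Accept', 'text/html'),
     ('Accept', ' text/plain'), ('Content-Type', 'text/plain')],
    env={'CONTENT_TYPE': 'text/plain'},
)
print(env)
```

This is why an application reads the Host header as environ['HTTP_HOST'] but the content type as environ['CONTENT_TYPE'].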

A Simple Demo

We can build a small server ourselves to play with.

# app.py
def hello_world_app(environ, start_response):
    # environ is a dict holding all the information about the HTTP request
    status = "200 OK"
    # status and headers go out through start_response(); the return value becomes the body
    headers = [("Content-type", "text/html")]
    start_response(status, headers)
    body = "<h1>hello {}</h1>".format(environ['PATH_INFO'][1:] or "Web")  # strip the leading slash
    return [body.encode("utf-8")]

# server.py
from wsgiref.simple_server import make_server
from app import hello_world_app

httpd = make_server('', 8000, hello_world_app)
print("Starting server at 8000")
httpd.serve_forever()
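To try the demo end to end without keeping a terminal open, the sketch below serves exactly one request on a free port in a background thread and fetches it with urllib. Port 0 (let the OS pick) and the path /Alice are choices made for this illustration; the app is repeated so the block runs on its own.

```python
import threading
from urllib.request import ProxyHandler, build_opener
from wsgiref.simple_server import make_server


def hello_world_app(environ, start_response):
    # same app as app.py above, repeated so this block is self-contained
    start_response('200 OK', [('Content-type', 'text/html')])
    body = '<h1>hello {}</h1>'.format(environ['PATH_INFO'][1:] or 'Web')
    return [body.encode('utf-8')]


# port 0 asks the OS for any free port, so this does not clash with :8000
with make_server('127.0.0.1', 0, hello_world_app) as httpd:
    port = httpd.server_port
    # serve exactly one request in the background, like the wsgiref demo does
    worker = threading.Thread(target=httpd.handle_request)
    worker.start()
    # an empty ProxyHandler bypasses any http_proxy settings for this local request
    opener = build_opener(ProxyHandler({}))
    with opener.open('http://127.0.0.1:%d/Alice' % port) as resp:
        status, page = resp.status, resp.read()
    worker.join()

print(status, page)
```

WSGIRequestHandler also logs the request line to stderr, which is the same access log you see when running server.py.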

Coming Up Next

Django's built-in WSGI implementation

References

WSGI

http://cyx0706.github.io/2020/01/31/django-understanding-1/

Author: Ctwo


#Django
