A Simple WebSocket Echo Server

It's time to learn WebSocket protocol by scratching a simple echo server after reading Writing WebSocket Servers.

An overview to the WebScoekt protocol, WebSocket is upon the TCP protocol, it is a replacement to polling when we want to do some real-time works on Web, just like Socket. And like TCP protocol, WebSocket needs to handshake first then communicate with each other, client and server. During communication, either client or server can send data or close the connection at any time.

  1. Handshake

    handshake.png

    The request from above is very standard, it is OK to add headers like "User-Agent", "Refere", "Cookie". If any header is not understood or with an incorrect value, then server should send a "400 Bad Request" and close the socket immediately.

    If server does not understand the "Sec-WebSocket-Version" from the client, then server should tell client that which version it supports by sending "Sec-WebSocket-Version" back.

    As for "Sec-WebSocket-Accept" header, server must derive it from "Sec-WebSocket-Key" that client sent, concatenating client's Sec-WebSocket-Key and "258EAFA5-E914-47DA-95CA-C5AB0DC85B11" together, then take the SHA-1 hash of the result and return the base64 encoding of the hash. If the values of "Sec-WebSocket-Accept" and "Sec-WebSocket-Key" are right, then we can start swapping data.

    When constructing response, remember each header ends with \r\n and put an extra \r\n after the last one, or it will raise a error like: "WebSocket connection to 'ws://xxxxxx/' failed: Invalid frame header".

  2. Communication

    Data Frame Format

    data-frame-format.png

    Exchanging Data Frames requires to extract information from these frames, now let's read (or decode) the fields one by one and step by step:

    1. FIN field (1 bit) tells the whether it is the last message in a series:
      • 0x0, server will keep listening
      • 0x9 and 0xA, ping and pong frames, control frames
        • 0x9 is for pings
        • 0xA is for pongs, a pong should be sent with same Payload data of a ping after receiving the ping
        • for pings and pongs, the max payload length is 125
        • ignore a pong without ever sending a ping; send only one pong if have gotten more than one ping before get the chance to send a pong
      • others, server will consider the message delivered, that is time to close the connection.
    2. RSV1-3 fields (3 bits) for extensions can be ignore.
    3. opcode field (4 bits) defines how to interpret payload data:
      • 0x0 for continuation, that suggests the message had been split up before it was sent.
      • 0x1 for text, 0x2 for binary
      • the others have no meaning.
    4. MASK field (1 bit) tells whether the message is encoded. Server excepts it to be 1, and disconnect from the client if the message is not encoded. When sending a frame back to the client, don't mask it and don't set the bit.
    5. Payload len field tells the length of payload so that server can know when to stop while extracting payload data. Decoding payload length follow 3 steps:
      1. Read bits 9-15 (inclusive) and interpret that as an unsigned integer. If it's 125 or less, then that's the length; you're done. If it's 126, go to step 2. If it's 127, go to step 3.
      2. Read the next 16 bits (2 bytes) and interpret those as an unsigned integer. You're done.
      3. Read the next 64 bits (8 bytes )and interpret those as an unsigned integer (The most significant bit MUST be 0). You're done.
    6. Masking-key field (32 bits or 4 bytes) needs to be read if the MASK field was set (and it should be). We will need it to decode (or unmasking) the encoded data that is payload data field. The way to decode should look like

      #! /usr/bin/env python3
      decoded_data = []
      for i in range(len(encoded_data)):
          decoded_data.append(encoded_data[i] ^ masking_key[i % 4])
      
    7. Payload data field is the data exchanged. If MASK was set, it has to be decoded with masking key.

Let's code

Now that we have known how WebSocket handshakes and communicates, it is about time to code to help us to understand WebSocket. The plan is to build a echo server with socketserver module and to en/decode the frame with struct module, this websocket server is an echo server, that is, server responses what client sent.

If you have not used socketserver and struct module before, please glance at this one and this one.

I write some comments to try to make you understand easier. If you want to download the file, please check it out on Github, and give me a star to let me know this article do help you.

#! /usr/bin/env python3
# echo_server.py
import argparse
import base64
import hashlib
import socketserver
import struct

MAGIC_STRING = b"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

RSP_TO_BAD_REQ = (
    b"HTTP/1.1 400 Bad Request\r\n"
    b"Content-Type: text/plain\r\n"
    b"Connection: close\r\n"
    b"\r\n"
    b"Incorrect request"
)

RSP_TO_COMPLETE_HANDSHAKE = (
    b"HTTP/1.1 101 Switching Protocols\r\n"
    b"Upgrade: websocket\r\n"
    b"Connection: Upgrade\r\n"
    b"Sec-WebSocket-Accept: %s\r\n\r\n")


class EchoRequestHandler(socketserver.BaseRequestHandler):
    def handle(self):
        """
        The plan:
            1. handshake
            2. if handshake successfully, then start to swap message,
               to swap message, do it in two parts: extract payload
               and send message back.
            3. if handshake failed, tell client that "Bad Request"
        """
        is_handshake_completed = self.handshake()
        if is_handshake_completed:
            while True:
                (
                    fin_and_opcode,
                    payload_len_indicator,
                    payload_len,
                    decoded_payload
                ) = self.extract_payload(self.request.recv(1024).strip())
                self.send_back(fin_and_opcode, payload_len_indicator,
                               payload_len, decoded_payload)

    def handshake(self):
        """
        The plan:
            1. handshake with HTTP GET request
            2. only consider "Connection", "Upgrade" and "Sec-WebSocket-Key"
               headers in this example.
            3. calculate Sec-WebSocket-Accept to send to the client
            4. return True for successful or False for failed to determine
               whether to swap data
        """
        _headers = self.request.recv(1024).strip().split(b"\r\n")
        headers = {}
        for h in _headers:
            try:
                key, value = h.split(b":")
            except ValueError:
                # ignore lines like "GET / HTTP/1.1"
                continue
            headers[key.strip(b" ")] = value.strip(b" ")
        if headers.get(b"Connection") == b"Upgrade" and \
           headers.get(b"Upgrade") == b"websocket":
            sec_websocket_key = headers.get(b"Sec-WebSocket-Key")
            if not sec_websocket_key:
                return False
            # to calcuate Sec-WebSocket-Accept
            sec_websocket_key += MAGIC_STRING
            sec_websocket_key = base64.standard_b64encode(
                hashlib.sha1(sec_websocket_key).digest())
            # no problem now, then complete the handshake
            self.request.sendall(
                RSP_TO_COMPLETE_HANDSHAKE % sec_websocket_key)
            # staring to swap data and to decode the frame
            return True
        else:
            self.request.sendall(RSP_TO_BAD_REQ)
            return False

    def extract_payload(self, frame):
        """
        One thing important is: the frame from the client is
        a bytes or bytearray and every byte equals to 8 bits.
        When we begin to extract fields from frame, this thing
        and data frame format will be the key to let us understand
        how to decode frame.

        When decoding, we need to think about what we will need to
        construct a frame to send back.

        The fields we will need to construct a frame are FIN and opcode,
        payload len, decoded payload and payload len indicator which tells
        us how to get payload len.

        Once we knew that, it is time to decode frame.
        """
        # the first byte stores FIN field and opcode field.
        # the second byte stores MASK field and payload_len indicator
        fin_and_opcode = frame[0]
        payload_len_indicator = frame[1] - 128
        # extract payload_len according to payload_len_indicator
        if payload_len_indicator <= 125:
            # the frame use 7 bits to store payload_len
            payload_len = payload_len_indicator
            mask_key = frame[2:6]
            mask_key_end = 6
        elif payload_len_indicator == 126:
            # the frames use 2 bytes to store payload_len
            payload_len = struct.unpack_from("!H", frame[2:4])[0]
            mask_key = frame[4:8]
            mask_key_end = 8
        else:
            # the frame uses 8 bytes to store payload_len
            payload_len = struct.unpack_from("!Q", frame[2:10])[0]
            mask_key = frame[10:14]
            mask_key_end = 14
        encrypted_payload = frame[
            mask_key_end: mask_key_end+payload_len]
        decoded_payload = bytearray(
            [
                encrypted_payload[i] ^ mask_key[i % 4]
                for i in range(payload_len)
            ])
        return (fin_and_opcode, payload_len_indicator,
                payload_len, decoded_payload)

    def send_back(self, fin_and_opcode, payload_len_indicator,
                  payload_len, decoded_payload):
        """
        To send back, we need to learn 3 things:
        1. the frame to send back whose mask field MUST not be set to 1;
        2. not consider the situation that message fragmentation here
        3. how to construct a frame manually
        """
        decoded_payload = decoded_payload
        if payload_len_indicator <= 125:
            # when payload_len_indicator <= 125,
            # the length of payload is payload_len_indicator
            frame = bytearray(
                [fin_and_opcode, payload_len]) + decoded_payload
        elif payload_len_indicator == 126:
            # unlike that the payload_len_indicator <= 125,
            # in this case, it is necessary to store payload_len
            # in other bytes, as well as payload_len_indicator is 127
            frame = bytearray(
                [fin_and_opcode, payload_len_indicator]) + \
                struct.pack("!H", payload_len) + decoded_payload
        else:
            frame = bytearray(
                [fin_and_opcode, payload_len_indicator]) + \
                struct.pack("!Q", payload_len) + decoded_payload
        self.request.sendall(frame)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Kick off a echo websocket server")
    parser.add_argument('host', help='IP or hostname')
    parser.add_argument('-p', help='Port (default=8001)',
                        metavar='port', type=int, default=8001)
    args = parser.parse_args()
    HOST, PORT = args.host, args.p
    socketserver.TCPServer.allow_reuse_address = True
    server = socketserver.TCPServer((HOST, PORT), EchoRequestHandler)
    server.serve_forever()

Time to try it out. To make it quickly, you better have a Chrome browser, I got a "Content Security Policy" problem in Firefox, but it was fine in Chrome. There is no any other browser on my computer, so I don't know other browsers.

Kick off your echo server with

python3 echo_server.py ''

Then fire up your Chrome console, and input these statements

websocket-test.png

Don't panic! Building a websocket server like this is not a fantasy experience. Fortunately, there are many tools can help you build a websocket server easily and quickly. One of them is the well-known Tornado. I am learning how to use Tornado these day. It's a simple and powerful framework, I will write an article about my experience of learning Tornado another days, of course, not for websocket. :).

What is missing? There are 3 things are missing. If you like adventures, I recommend you to do complete these.

  1. data fragmentation and processing
  2. processing of pings and pongs
  3. cross-site checking

Author: saltb0rn (asche34@outlook.com)

Date: 2018-07-26

Emacs 28.2 (Org mode 9.5.5)

Validate