In this article, you’ll learn what the WebSocket protocol is and how it works, including the problem WebSockets solve and an overview of how they are described at the protocol level.
The idea of WebSockets was born out of the limitations of HTTP-based technology. With HTTP, a client requests a resource, and the server responds with the requested data. HTTP is a strictly unidirectional protocol: any data sent from the server to the client must first be requested by the client.
Long-polling has traditionally acted as a workaround for this limitation. With long-polling, a client makes an HTTP request with a long timeout period, and the server uses that long timeout to push data to the client.
Long-polling works, but it comes with a drawback: resources on the server are tied up for the full length of the long poll, even when no data is available to send.
WebSockets, on the other hand, allow for sending message-based data, similar to UDP, but with the reliability of TCP. WebSocket uses HTTP as the initial transport mechanism but keeps the TCP connection alive after the HTTP response is received, so that it can be used for sending messages between client and server. WebSockets allow us to build “real-time” applications without the use of long-polling.
What Is WebSocket Protocol?
A WebSocket is a persistent connection between a client and a server. WebSockets provide a bidirectional, full-duplex communications channel that operates over HTTP through a single TCP/IP socket connection.
WebSocket is a computer communications protocol that provides full-duplex communication channels over a single TCP connection. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011.
The WebSocket API for browsers is specified separately in Web IDL, originally through the W3C. WebSocket is distinct from HTTP, even though the connection starts out as an HTTP request.
At its core, the WebSocket protocol facilitates message passing between a client and server. The protocol consists of an opening handshake followed by basic message framing, layered over TCP.
Why use WebSocket instead of HTTP?
In earlier times, the client-server model was built around the client requesting a resource from the server. The Web was built for this kind of model, and HTTP was sufficient to handle these requests.
However, as technology advanced, online gaming and other real-time applications created the need for a protocol that provides a bidirectional connection between client and server, so that data can be streamed live.
Web applications have grown up a lot and now consume more data than ever before. The biggest thing holding them back was the traditional HTTP model of client-initiated transactions.
To overcome this, a number of different strategies were devised to allow servers to push data to the client. One of the most popular of these strategies is long-polling. This involves keeping an HTTP connection open until the server has some data to push down to the client.
The problem with all of these solutions is that they carry the overhead of HTTP. Every time you make an HTTP request, a bunch of headers and cookie data are transferred to the server. Initially, the idea was to modify HTTP itself to create a bidirectional channel between client and server.
But this model could not be sustained, because the HTTP overhead would inevitably introduce latency, and real-time applications, especially games, cannot afford latency.
Because of this shortcoming of HTTP, a new protocol, now known as WebSocket, was designed to run over the same TCP/IP model.
How a WebSocket Protocol works
A WebSocket connection begins life as a standard HTTP request and response. Within that request-response chain, the client asks to open a WebSocket connection, and the server responds (if it’s able to). If this initial handshake is successful, the client and server have agreed to use the existing TCP/IP connection, which was established for the HTTP request, as a WebSocket connection.
Data can now flow over this connection using a basic framed message protocol. Once both parties acknowledge that the WebSocket connection should be closed, the TCP connection is torn down.
WebSockets do not use the http:// or https:// scheme (because they do not follow the HTTP protocol). Rather, WebSocket URIs use a new scheme, ws: (or wss: for a secure WebSocket). The remainder of the URI is the same as an HTTP URI: a host, port, path, and any query parameters.
"ws:" "//" host [ ":" port ] path [ "?" query ]
"wss:" "//" host [ ":" port ] path [ "?" query ]
WebSocket connections can only be established to URIs that follow this scheme. That is, if you see a URI with a scheme of ws:// (or wss://), then both the client and the server MUST follow the WebSocket connection protocol defined in the WebSocket specification.
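To make the URI grammar concrete, here is a minimal Python sketch that splits a WebSocket URI into its parts. The function name parse_websocket_uri and the returned dictionary keys are illustrative assumptions, not part of any WebSocket library; the default ports (80 for ws, 443 for wss) come from RFC 6455.

```python
from urllib.parse import urlsplit

# Default ports that RFC 6455 assigns to each scheme.
DEFAULT_PORTS = {"ws": 80, "wss": 443}


def parse_websocket_uri(uri: str) -> dict:
    """Split a ws:// or wss:// URI into scheme flag, host, port, path and query.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"not a WebSocket URI: {uri!r}")
    if not parts.hostname:
        raise ValueError(f"missing host in {uri!r}")
    return {
        "secure": parts.scheme == "wss",
        "host": parts.hostname,
        # Fall back to the scheme's default port when none is given.
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
        # An empty path is treated as "/".
        "path": parts.path or "/",
        "query": parts.query,
    }
```

For example, parse_websocket_uri("wss://example.com") reports a secure connection to port 443 with the path "/".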
WebSocket connections are established by upgrading an HTTP request/response pair. A client that supports WebSockets and wants to establish a connection will send an HTTP request that includes a few required headers:
- The Connection header generally controls whether or not the network connection stays open after the current transaction finishes. A common value for this header is keep-alive, which makes sure the connection is persistent to allow for subsequent requests to the same server. During the WebSocket opening handshake we set the header to Upgrade, signaling that we want to keep the connection alive and use it for non-HTTP requests.
- The Upgrade header is used by clients to ask the server to switch to one of the listed protocols, in descending preference order. We specify websocket here to signal that the client wants to establish a WebSocket connection.
- Sec-WebSocket-Key is a one-time random value (a nonce) generated by the client. The value is a randomly selected 16-byte value that has been base64-encoded.
- Sec-WebSocket-Version: the only accepted version of the WebSocket protocol is 13. Any other version listed in this header is invalid.
Together, these headers would result in an HTTP GET request from the client to a ws:// URI, like in the following example:
GET ws://example.com:8181/ HTTP/1.
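As a rough sketch of how a client might assemble such a request, the Python snippet below builds the opening handshake with the four headers described above. The function name build_handshake_request is a hypothetical helper, and the request line here uses a plain path rather than the full ws:// URI shown in the example, which is an assumption about how the request is written on the wire.

```python
import base64
import os
from typing import Tuple


def build_handshake_request(host: str, port: int, path: str = "/") -> Tuple[str, str]:
    """Return (request_text, key) for a WebSocket opening handshake.

    Hypothetical helper for illustration; a real client would also send
    the request over a socket and validate the server's reply.
    """
    # Sec-WebSocket-Key: 16 random bytes, base64-encoded, fresh per connection.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Connection: Upgrade",
        "Upgrade: websocket",
        "Sec-WebSocket-Version: 13",
        f"Sec-WebSocket-Key: {key}",
    ]
    # Header lines are CRLF-terminated and the header block ends with a blank line.
    return "\r\n".join(lines) + "\r\n\r\n", key
```

The key is returned alongside the request text because the client needs it later to check the server's Sec-WebSocket-Accept value.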
Once a client sends the initial request to open a WebSocket connection, it waits for the server’s reply. The reply must have an HTTP 101 Switching Protocols response code. This response indicates that the server is switching to the protocol that the client requested in its Upgrade request header. In addition, the server must include HTTP headers that validate the connection was successfully upgraded:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: fA9dggdnMPU79lJgAE3W4TRnyDM=
- Connection: Upgrade confirms that the connection has been upgraded.
- Upgrade: websocket confirms that the connection has been upgraded.
- Sec-WebSocket-Accept is a base64-encoded, SHA-1-hashed value. You generate this value by concatenating the client’s Sec-WebSocket-Key nonce and the static value 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 defined in RFC 6455, hashing the result with SHA-1, and base64-encoding the hash.
Although Sec-WebSocket-Key and Sec-WebSocket-Accept seem complicated, they exist for a reason: so that both the client and the server can know that their counterpart supports WebSockets.
Since the WebSocket re-uses the HTTP connection, there are potential security concerns if either side interprets WebSocket data as an HTTP request. After the client receives the server response, the WebSocket connection is open to start transmitting data.
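The Sec-WebSocket-Accept calculation described above can be sketched in a few lines of Python. The function name compute_accept is an illustrative assumption; the GUID is the static value from RFC 6455, and the sample key and expected result in the usage note are the ones given in that RFC.

```python
import base64
import hashlib

# Static GUID defined in RFC 6455 and appended to every client key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key.

    Hypothetical helper name; the steps follow RFC 6455:
    concatenate key + GUID, SHA-1 hash, then base64-encode the digest.
    """
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

With the sample key from RFC 6455, compute_accept("dGhlIHNhbXBsZSBub25jZQ==") returns "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=". A client can run the same calculation on its own key and compare the result with the header the server sent back.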
The WebSocket Protocol
WebSocket is a framed protocol, meaning that a chunk of data (a message) is divided into a number of discrete chunks, with the size of the chunk encoded in the frame. The frame includes a frame type, a payload length, and a data portion. An overview of the frame is given in RFC 6455 and reproduced here.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+
I won’t cover every piece of the frame protocol here. Refer to RFC 6455 for full details. Rather, I will cover the most important bits so that we can gain an understanding of the WebSocket protocol.
The first bit of the WebSocket header is the Fin bit. This bit is set if this frame is the last data to complete this message.
RSV1, RSV2, RSV3 Bits
These bits are reserved for future use.
Every frame has an opcode that determines how to interpret this frame’s payload data.
| Opcode | Meaning |
|---|---|
| 0x00 | This frame continues the payload from the previous frame. |
| 0x01 | Denotes a text frame. Text frames are UTF-8 decoded by the receiver. |
| 0x02 | Denotes a binary frame. Binary frames are delivered unchanged by the receiver. |
| 0x03-0x07 | Reserved for future use. |
| 0x08 | Denotes that the sender wishes to close the connection. |
| 0x09 | A ping frame. Serves as a heartbeat mechanism ensuring the connection is still alive. The receiver must respond with a pong. |
| 0x0a | A pong frame. Serves as a heartbeat mechanism ensuring the connection is still alive. It is sent in response to a ping, and no reply is expected. |
| 0x0b-0x0f | Reserved for future use. |
Setting this bit to 1 enables masking. WebSockets require that all frames sent from the client to the server be obfuscated using a random key (the mask) chosen by the client. The masking key is combined with the payload data using an XOR operation before the data is sent.
This masking prevents caches from misinterpreting WebSocket frames as cacheable data. Why should we prevent caching of WebSocket data? Security.
During the development of the WebSocket protocol, it was shown that if a compromised server is deployed, and clients connect to that server, it is possible to have intermediate proxies or infrastructure cache the responses of the compromised server, so that future clients requesting that data receive the incorrect response.
This attack is called cache poisoning, and it results from the fact that we cannot control how misbehaving proxies behave in the wild. It is especially problematic when introducing a new protocol like WebSocket that has to interact with the existing infrastructure of the internet.
The Payload len field and the Extended payload length field are used to encode the total length of the payload data for this frame. If the payload data is small (under 126 bytes), the length is encoded in the Payload len field.
As the payload data grows, we use the additional fields to encode the length of the payload: a Payload len of 126 means the length follows in the next 16 bits, and a Payload len of 127 means it follows in the next 64 bits.
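The length encoding described above can be sketched as follows. The function name encode_payload_length is a hypothetical helper; it returns the second header byte (MASK bit plus 7-bit Payload len) together with any extended length bytes, in network byte order.

```python
import struct


def encode_payload_length(length: int, masked: bool = False) -> bytes:
    """Encode the MASK bit, Payload len, and any Extended payload length.

    Hypothetical helper for illustration only.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    mask_bit = 0x80 if masked else 0x00
    if length < 126:
        # Small payloads fit directly in the 7-bit Payload len field.
        return bytes([mask_bit | length])
    if length < 1 << 16:
        # 126 signals that a 16-bit unsigned length follows.
        return bytes([mask_bit | 126]) + struct.pack("!H", length)
    if length < 1 << 63:
        # 127 signals that a 64-bit unsigned length follows.
        return bytes([mask_bit | 127]) + struct.pack("!Q", length)
    raise ValueError("payload too large for a single frame")
```

A 5-byte payload therefore costs a single length byte, while a 65,536-byte payload needs the marker byte plus eight extended bytes.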
As discussed with the MASK bit, all frames sent from the client to the server are masked by a 32-bit value that is contained within the frame. This field is present if the mask bit is set to 1 and is absent if the mask bit is set to 0.
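The XOR masking step can be illustrated with a short Python sketch. The function name apply_mask is an assumption for illustration; each payload byte is XORed with the masking-key byte at position i modulo 4, so applying the same function twice restores the original data.

```python
def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """Mask or unmask payload bytes with a 4-byte (32-bit) masking key.

    Hypothetical helper; XOR is its own inverse, so one function
    covers both directions.
    """
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(byte ^ masking_key[i % 4] for i, byte in enumerate(payload))
```

Using the masking key 37 fa 21 3d (hex) from the masked "Hello" example in RFC 6455, apply_mask(b"Hello", key) yields the bytes 7f 9f 4d 51 58, and masking those bytes again returns b"Hello".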
Payload data includes arbitrary application data and any extension data that has been negotiated between the client and the server. Extensions are negotiated during the initial handshake and allow you to extend the WebSocket protocol for additional uses.
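Putting the fields described in this section together, the following Python sketch decodes a single complete frame from a byte string. The names Frame and decode_frame are illustrative assumptions, and the sketch ignores the RSV bits, extensions, and reassembly of fragmented messages.

```python
import struct
from typing import NamedTuple


class Frame(NamedTuple):
    fin: bool       # True if this is the final frame of the message
    opcode: int     # 4-bit opcode (0x1 text, 0x2 binary, 0x8 close, ...)
    payload: bytes  # unmasked payload data


def decode_frame(data: bytes) -> Frame:
    """Decode one complete WebSocket frame (hypothetical helper)."""
    fin = bool(data[0] & 0x80)
    opcode = data[0] & 0x0F
    masked = bool(data[1] & 0x80)
    length = data[1] & 0x7F
    offset = 2
    if length == 126:
        # 16-bit extended payload length
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        # 64-bit extended payload length
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    key = b""
    if masked:
        # The masking key is present only when the MASK bit is set.
        key = data[offset:offset + 4]
        offset += 4
    payload = data[offset:offset + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, opcode, payload)
```

Fed the unmasked text frame 81 05 48 65 6c 6c 6f (hex) from RFC 6455, decode_frame returns a final text frame whose payload is b"Hello". The masked form of the same message, 81 85 37 fa 21 3d 7f 9f 4d 51 58, decodes to the same payload.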
How to Close a WebSocket connection
The WebSocket Close Handshake
To close a WebSocket connection, a closing frame is sent (opcode 0x08). In addition to the opcode, the close frame may contain a body that indicates the reason for closing.
If either side of a connection receives a close frame, it must send a close frame in response, and no more data should be sent over the connection. Once the close frame has been received by both parties, the TCP connection is torn down.
In most normal cases, the server is the side that closes the TCP connection first.
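As a small illustration of the close frame body, the Python sketch below builds and parses the optional payload. In RFC 6455 the body, when present, starts with a 2-byte status code in network byte order (1000 means normal closure), followed by a UTF-8 reason. The function names build_close_payload and parse_close_payload are hypothetical helpers.

```python
import struct
from typing import Optional, Tuple


def build_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Build the body of a close frame: 2-byte status code plus UTF-8 reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    # Control frames such as close may carry at most 125 bytes of payload.
    if len(payload) > 125:
        raise ValueError("close frame body is limited to 125 bytes")
    return payload


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Return (status_code, reason); the code is None when no body was sent.

    Simplification: a malformed 1-byte body is also reported as (None, "").
    """
    if len(payload) < 2:
        return None, ""
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")
```

For example, build_close_payload(1000, "bye") produces the five bytes 03 e8 62 79 65 (hex), and parse_close_payload turns them back into (1000, "bye").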
How to Use SSE Instead of WebSockets
As webmasters, whenever we design a web application utilizing real-time data, we need to consider how we are going to deliver our data from the server to the client. Of course, the default answer usually is “WebSockets.”
But is there a better way? Let’s compare three different methods:
- Long polling,
- WebSockets, and
- Server-Sent Events.
Comparing them helps us understand their real-world limitations, and the answer might surprise you. When building a web application, one must consider what kind of delivery mechanism to use.
Let’s say we have a cross-platform application that works with real-time data: a stock market application that provides the ability to buy or sell stock in real time. This application is composed of widgets that bring different values to different users.
When it comes to data delivery from the server to the client, we are limited to two general approaches: client pull or server push. As an example, with any web application, the client is the web browser.
Thus, when the website in your browser asks the server for data, this is called client pull. The reverse, when the server proactively pushes updates to your website, is called server push. Nowadays, there are a few ways to implement these:
- Long/short polling (client pull)
- WebSockets (server push)
- Server-Sent Events (server push)
This is just a starting point. You can read and learn more about using SSE instead of WebSockets for unidirectional data flow over HTTP/2 in detail elsewhere. Also, this article is only an introduction to the WebSocket protocol.
It covers a lot of ground, but the full protocol has more detail than I could fit into this blog post. If you want to learn more, there are several great resources to choose from. You’ll find them at the end of this article.
In a nutshell, WebSockets can transfer as much data as you like without incurring the overhead associated with traditional HTTP requests. Data is transferred through a WebSocket as messages.
Each of these consists of one or more frames containing the data you are sending (the payload). To ensure the message can be properly reconstructed when it reaches the receiver, each frame is prefixed with 2 to 14 bytes of header data about the payload.
Using this frame-based messaging system helps to reduce the amount of non-payload data that is transferred, leading to significant reductions in latency. I hope this guide to the WebSocket protocol was useful to you as a webmaster.
If you have additional thoughts, suggestions, contributions, or questions, you can share them in the comments section below this post. You can also Contact Us if you need more help or support.