Writing MySQL Proxy in GO for self-learning: Part 2 — decoding handshake packet

10 min readApr 29, 2020

After we built a generic TCP proxy, we can continue with our journey. Today’s goal will be to understand the MySQL Protocol, receive, decode, encode and send to the client the first packet sent by the MySQL Server.

MySQL connections threads

Each client connection to MySQL Server handled by a thread. MySQL is portable, so the underhood threads implementation is system dependent (Windows, macOS, and Linux have their own threads implementation).

What important to us, it’s to understand that a single client connection is handled by a corresponding thread in MySQL. This thread is responsible to do authentication and executing all the queries and commands initiated by the client. Creating a new thread considered a heavy operation. Less heavy then forking a process, like in PostgreSQL for example, but still…

To avoid unnecessary resource usage on a thread creation, MySQL caches connections. So, each time you’re asking for a new connection, MySQL will try to reuse threads from the cache. After you close the connection, the thread is returned back to the cache.

Don’t be confused with a connection pool, which is implemented as a plugin for MySQL Enterprise Edition.

The thread cache doesn’t really matter for most of the applications, which are opening and keeping the connections alive. It helps mostly for schedulers which opens a new connection to do its job and then closing it until the next run or web application written in PHP, where the connection is opened on each HTTP request, due to its architecture constraints.

So, at any time, a number of connection threads equal to a number of connected clients (read connections). You can read more about threads management at the documentation.

MySQL Protocol Basics

The basic unit of communication is the application-layer packet. The packets can be compressed, or/and transmitted over the SSL layer. The decision about using compression or SSL is made during the handshake stage and depends on the capabilities and settings of both the client and the server.

All MySQL versions considered forward and backward compatible. Any client can talk to any server, regardless of the version. It’s possible because of using capability flags we will look at later.

Additionally, regardless of the compression option, the packets are divided into two categories: commands sent by the client, and responses returned by the server.

Server response packets are divided into four categories: data packets, end-of-data-stream packets, success report (OK) packets, and the error message packets.

When MySQL client or server wants to send the data, it:

Splits the data into packets of size ²²⁴ bytes (16mb)
Prepends to each chunk a packet header

Each packet consists of the header and the payload. The header includes payload length and sequence ID values.

Payload length stored as a first 3 bytes of the header (that’s why the max size of the packet limited by ²²⁴). Then it’s followed by a single byte that used for storing sequence id value. And finally, it followed by the payload itself.

This is how the MySQL packet could look like in a HEX representation:

Packet: 01 00 00 00 01+------------+-----------------+-------------+
| Length = 1 | Sequence ID = 0 | Payload = 1 |
+------------+-----------------+-------------+
| 01 00 00   | 00              | 01          |
+------------+-----------------+-------------+

The Handshake Stage

MySQL has a special connection phase, which performs these tasks:

Exchange the capabilities of client and server
Setup SSL communication channel if requested
Authenticate the client against the server

We will skip the SSL configuration, and assume our client are connecting with no SSL and no compression.

Initial Handshake starts with server sending the Initial Handshake Packet. After this, optionally, client can request an SSL connection to be established with SSL Connection Request Packet, and then client sends the Handshake Response Packet.

This stage is our target for today.

Sniffing MySQL traffic

Let’s “feel” the data flow between the client and MySQL Server.

Install the ngrep utility. It’s a network packet analyzer that we will use as a sniffer.

brew install ngrep

And run it against the local MySQL Server instance:

sudo ngrep -x -q -d lo0 '' 'port 3306'

Now, you can connect to the MySQL with any client, and see all the data that are flowing between the client and server in the terminal. This is what I got.

What’s happening here?

The client connects the MySQL Server and the server replies with theHandshakeV10 packet (don’t be confused with TCP packet), which includes protocol version, server version, auth-plugin and other fields we will cover later.

The client receives the HandshakeV10packet, validates and sends back theHandshakeResponse packet, which includes hashed credentials. Server receives theHandshakeResponse, and replies either with OK or Error packet.

If there were no errors, the client and server considered connected, and the client can start sending queries to the database and receive response after each request.

The client and server use a synchronous approach when communicating. It means that after the client sends any command to the server, it waits for response before sending the next one.

Exploring the handshake packet

This is the definition of Protocol::HandshakeV10, the first packet the MySQL Server send to the client right after the TCP connection has been established.

1              [0a] protocol version
string[NUL]    server version
4              connection id
string[8]      auth-plugin-data-part-1
1              [00] filler
2              capability flags (lower 2 bytes)
  if more data in the packet:
1              character set
2              status flags
2              capability flags (upper 2 bytes)
  if capabilities & CLIENT_PLUGIN_AUTH {
1              length of auth-plugin-data
  } else {
1              [00]
  }
string[10]     reserved (all [00])
  if capabilities & CLIENT_SECURE_CONNECTION {
string[$len]   auth-plugin-data-part-2 ($len=MAX(13, length of auth-plugin-data - 8))
  if capabilities & CLIENT_PLUGIN_AUTH {
string[NUL]    auth-plugin name
  }

Let’s see how we can project this definition to our example. As you can see, the first 4 bytes are the header and the rest is the payload.

The packet definition is quite self describing, so let’s see how we will decode the payload.

Decoding Integers

The MySQL Protocol has a set of possible encodings for integers:

Fixed-length integers
Length-encoded integers

Let’s see how would we decode 2 byte fixed length integer.

Imagine you have a number, let’s say — 33039 (decimal), and you want to pass it through the network. The max unsigned integer that can be represented by a single byte is 255 (²⁸-1). It means, we need exact 2 bytes to represent the number 33039

When you send these two bytes through the network, the receiver needs to know how the number was encoded (Little Endian vs Big Endian) to be able to convert the raw bytes to the internal integer representation.

A big-endian means the least significant byte is the last byte in stream, and a little-endian ordering means the least significant byte is the first byte.

The least significant byte means a byte that have the least effect on the overall number if you remove it. Let’s see example using decimals.

In the number 33039 the least significant number is 9, cause if you change 9 to be for example 4 (33034), the number haven’t changed much. As an opposite, if you change the 3 to be 1 (13039), the new number is ~2.5 times less than it was.

The same rules applied to the multibyte integer encoding. In our case, we know that the MySQL Protocol is portable, and it uses Little Endian to encode the numbers.

So, to decode the 33039 that was encoded with Little Endian, and can be represented by 16 bit integer, we keep the first byte (byte at position 0 in the stream or byte array) as is, and shift the second byte (byte in position 1) by 8, and then apply bitwise sum operator:

uint16(b[0]) | uint16(b[1])<<8

This is what happens in the low level:

b[0]                = 00001111 (8 bit integer)
uint16(b[0])        = 0000000000001111 (we created 16 bit integer)They are both still represent the same value (15 in decimal)b[1]                 = 10000001 (8 bit integer)
uint16(b[1])         = 0000000010000001 (we created 16 bit integer)But the value of second byte alone doesn't make sense to us. So we move it to be in the "right place" by shifting it by 8 bits.uint16(b[1])<<8      = 1000000100000000 (shifting left 8 times)and finally we use bitwise sumuint16(b[0]) | uint16(b[1])<<8 = 1000000100001111 (33039 in decimal)We successfully reconstructed the original value.Just remind you how the binary sum works:0000000000001111
1000000100000000
----------------
1000000100001111

What will happen if we use the wrong byte order? This what will happen if we mistakenly thought that the number was encoded as Big Endian.

b[0]                = 00001111 (8 bit integer)
uint16(b[0])        = 0000000000001111 (we created 16 bit integer)
uint16(b[0])<<8     = 0000111100000000We shifted the byte at index 0 by 8 bits, cause Big Endian says this byte is the most significant byte, so we move it to the right position.b[1]                 = 10000001 (8 bit integer)
uint16(b[1])         = 0000000010000001 (we created 16 bit integer)And finally apply bitwise sum operatoruint16(b[0]) | uint16(b[1])<<8 = 0000111110000001This number is 3969 in decimal, but the original number was 33039

Now you know how decode the the integers from the raw bytes. Encoding works with the same principles, just in the opposite direction (right shifting).

Decoding Strings

MySQL Protocol supports 5 types of string encodings. You can read more in the docs.

string<lenenc>Protocol::LengthEncodedString
string<fix>Protocol::FixedLengthString
string<var>Protocol::VariableLengthString:
string<EOF>Protocol::RestOfPacketString
string<NUL>Protocol::NulTerminatedString

There is nothing special about decoding strings. We either reading bytes until we get terminal symbol (NUL) or by reading specific numbers of bytes.

We can know how many bytes to read either by using hardcoded value (like for fixed length strings) or by reading some other field in the packet which value will be used as a string length size.

Flags

MySQL Protocol uses bitmask, a well known trick to implement flags. Each flag is an unsigned integer with a single bit ON in its binary representation. It allows to encode 8 different flags with a single byte.

Let’s look at CapabilityFlags.

We see, that CapabilityFlags implemented by 4 bytes (32 bit unsigned integer). In the docs it represented by HEX for readability. And as we saw in the HandshakeV10 packet definition, CapabilityFlags passes as two separate parts, 2 bytes each. We call it lower and upper bytes.

For example, given an capability flags unsigned integer, to check if it includes clientPluginAuth ON, we will use bitwise AND (&) operator.

After we apply & operator, the new integers keeps bits ON only if it was ON in both integers. Other bit becomes to be zeros. You can learn more about bitwise operators on https://en.wikipedia.org/wiki/Mask_(computing).

The code

This is what was done as part of this second article:

Data read/write directly from the socket for handshake only
Catch ^C signal, shut down connections and exit (uses context)
Structs created (PacketHeader, InitialHandshakePacket, CapabilityFlag)

Also I used GO’s famous iota feature to implements flags for the CapabilityFlag.

The only issue is that in GO you don’t have an easy way to print a name of the const variable, cause the constants are resolved at compile time (we cannot use reflect :/

So, to print the flags I used a simple map. There is also an Stringer package, that can be used with new go generatefeature.

Now, let’s see a very dumb decoding/encoding implementation. Again, the main goal was to understand the protocol and implement kind of POC. It works, and it’s enough for me :)

As you can see, decoding done just by taking the needed bytes in a specific position, and decoding it according to the documentation. The “hard” thing was to remember how to work with numbers and bitwise operators. But as u see, it’s quite simple.

Instead of manual integer decoding I used already existing functions like

binary.LittleEndian.Uint16()

And to decode null terminated string I also used existing function provided by GO’s standard lib

index := bytes.IndexByte(payload, byte(0x00))

After the packet was decoded, we encode it back using Encode() method, and send it to the client. From now we can cheat our client, by changing any value of the handshake packet.