MongoDB profiling: writing CLI for tracing Query Logs in GO using the new driver and go modules

We will write a small tool that will help us with mongo query profiling and tracing. It’s very handy when you want to see all queries currently executed in MongoDB. I use it mostly to find long-running queries, queries that don’t use indexes or tracing a business flow in case of a bug.

Main requirements

  • Print currently executing queries to stdout in raw or human-friendly form
  • Knows to work with a replica set
  • Support for config files (cause we have many environments)

The research

Before we start, let’s talk about MongoDB profiler. All the information we need to be able to write our tool exists on the documentation page:

Things we should account before we start coding:

Capped collection — once a collection fills its allocated space, it makes room for new documents by overwriting the oldest documents in the collection.

It means we must retrieve the logs asap, to prevent losing the logs when the database is under real load (in case you are profiling production or just doing a benchmark on any other environment).

Another interesting feature we will use is a Tailable Cursor.

We are lucky, cause a new Golang mongo-driver package supports tailable cursors.

Also, we will use some cool packages like :

Let’s code

I will create the project using a relatively new “modules” feature + vendoring.

So, let’s create the project directory somewhere outside GOPATH:

mkdir -p ~/goprojects/mongotrace
cd ~/goprojects/mongotrace
go mod init github.com/spaiz/mongotrace

Now we have a project directory with a new `go.mod` file.

In GO you cannot import stuff from the main package. So, to allow other people to reuse our package, we will keep the main package as small as possible.

main.go

The main logic will be implemented under theapp package. The App will be responsible to execute our commands.

CLI interface

Our tool will support multiple commands: version, info, enable, disable, tail, where the tail command will support multiple options like colored output, indentation and other. Let’s see some of the usage examples:

mongotrace version
mongotrace info --config="./config.local.json"
mongotrace enable --config="./config.local.json"
mongotrace tail --config="./config.local.json"
mongotrace tail --raw --indent=false --config="./config.local.json"

The Cobra package allows us to easily implement such interface.

app.go

Config

For the config, we will use a simple structure and simple JSON parsing. The only requirements there are to support multiple database connections:

config.local.json

If you want to see the logs from the cluster, the config will look something like this:

config.staging.json

Pay attention for the connect=direct parameter, cause we need to connect separately to each of the cluster’s nodes.

To parse the config we will use a native JSON parser. Nothing really interesting here.

config.go

Later the config will be loaded only for the commands that really need it. For example, the “version” command doesn’t need to read the config file.

MongoDB Connection

I created a tiny wrapper over the MongoDB connection with a few helper methods. The code is quite straightforward.

Commands

Each command will be implemented as an independent entity that implements the Command interface (kind of Command Pattern):

type Command interface {
Execute() error
}

Let’s look at LogReportCommand:

This one responsible for printing currently executing queries in the database. The implementation uses GO’s variant of the Producer-Consumer pattern and uses the channel as a communication buffer.

It knows to parse the log document received from the tailable cursor to LogResult structure and tries to build the original query if the formatting was not disabled. All the formatting stuff encapsulated under Formatter interface, that injected from outside. So the command itself looks relatively clean and small.

cmd_report.go

Formatters

I have separate formatter per MongoDB command. Each formatter tries to “reverse engineer” the log back to be a valid MongoDB command. It’s not always possible, cause MongoDB has bugs like this one. Also, the log document may have a different structure in different MongoDB versions. So, it’s not very reliable but sometimes just helpful to see what is going on now in the database.

Let’s look at just for example how the GROUP MongoDB command built from the log. I’ll use the query from the MongoDB documentation:

example for the GROUP query

The raw log document looks like:

And this is how you will see it if the formatting is enabled (by default):

The code is very dumb here and was written just by looking to the documentation and the log.

Publishing

Cause we used the modules feature, the flow of the building and publishing is a little different. Now, the build command will not only build the binary but also will update the go.mod file with the used packages.

go build

Also, in GO I prefer to have used packages committed to the repository. To do so, run:

go mod vendor

and all the usage packages will be copied under the vendor folder.

Next time you want to build your project with packages from the vendor directory, you need to run:

go build -mod vendor

To allow others to use the module, it should be published at git and tagged with a specific version.

git init 
git add .
git commit -m "initial commit"
git remote add origin git@github.com:spaiz/mongotrace.git
git push -u origin master
git tag v0.0.2
git push --tags

Bonus

I created a tiny Makefile to help me to build and install the binary with supporting binary versioning.

To build and install it just do:

make build
make install

TIP

When you want to see the queries that are triggered by some specific feature in non-local environment, I suggest you use MongoDB’s $comment field. You can pass the request-id (it has many names like correlation-id, trace-id, etc) to be the $comment in the query. Then you will see this value in the logs.

As a simple example, the query

db.mycollection.find({
name: 'Alex',
$comment: 'request-id-fe4a6604-ec0e-40ec-a72c-6f3b899b7d4b'
});

will generate this log, and you can find all the queries by searching for the request-id-fe4a6604-ec0e-40ec-a72c-6f3b899b7d4b value…

That’s all.

You can find the source code at git: https://github.com/spaiz/mongotrace

In GO we trust. Software Engineer.