Docker and Swarm Mode – Part 3

Refresher

In part 1 we created a swarm of 5 nodes, of which we defined 3 to be manager nodes and the remaining ones to be worker nodes. Then we deployed the open source version of the Docker Registry v2 in our swarm. On node1 of our swarm we cloned the GitHub repository of Jérôme Petazzoni containing the dockercoins application, which mines Docker coins and consists of the four services rng, hasher, worker and webui. We then created images for the four services and pushed them to our local registry listening on port 5000. Normally the Docker Registry wants us to communicate via TLS, but to keep things simple we use it on localhost:5000: when the registry is accessed on localhost, communication is in plain text and no TLS encryption is needed. Because the registry service publishes port 5000, each node in the swarm can use localhost:5000 to access the registry, even if the registry itself is running on a different node. In this case the swarm automatically forwards the call to the correct node.

If we execute the following command on any node

curl localhost:5000/v2/_catalog

we should see a JSON document listing the repositories stored in the registry, which should include hasher, rng, webui and worker.
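To check this from a script instead of by eye, here is a minimal sketch of a shell function that reads the JSON returned by /v2/_catalog from stdin and reports every expected image that is missing. The function name registry_has_images is hypothetical, and plain grep is used instead of a JSON parser on the assumption that the repository names contain no quotes or escape sequences.

```shell
# Hypothetical helper (name chosen for this sketch): succeeds only if every
# image name passed as an argument appears in the /v2/_catalog JSON on stdin.
# Uses grep instead of a JSON parser (assumes simple repository names).
registry_has_images() {
  catalog=$(cat)
  status=0
  for image in "$@"; do
    # Match the name including its surrounding quotes so that "rng"
    # does not match a repository such as "rng-old".
    if ! printf '%s\n' "$catalog" | grep -q "\"$image\""; then
      echo "missing: $image" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage on any swarm node:
#   curl -s localhost:5000/v2/_catalog | registry_has_images hasher rng webui worker
```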

In part 2 we then learned about services, tasks and software defined networks and how they are related.

Now it is time to use everything we have learned so far and get our mining application up and running.

Running the Mining Application

When we want to run an application in a swarm, we first define a network on which the services will run. The network has to be of type overlay so that our application can span all the nodes of the swarm. Let’s do that; we call our network dockercoins:

docker network create dockercoins --driver overlay

We can double-check that it has been created with this command:

docker network ls

which lists all networks visible to the node I am working on (node1 in this case). In my case it looks like this, and we can see the newly created dockercoins network in the list.
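When these steps are scripted and rerun, docker network create typically reports an error because a network with that name already exists. A minimal sketch (the function name ensure_overlay_network is hypothetical) that only creates the overlay network if docker network inspect cannot find it:

```shell
# Hypothetical helper: create an overlay network only if it does not exist yet.
ensure_overlay_network() {
  network=$1
  # docker network inspect exits non-zero when the network is unknown.
  if docker network inspect "$network" >/dev/null 2>&1; then
    echo "network $network already exists"
  else
    docker network create --driver overlay "$network"
  fi
}

# Usage on a manager node:
#   ensure_overlay_network dockercoins
```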

Next we are going to run the Redis service, which is used as the storage backend for our mining application. After reading part 2 we should already be familiar with how to do that.

docker service create --name redis --network dockercoins redis

Please note how we place the service onto the dockercoins network by using the --network parameter.

After this we run all the other services. To simplify things and avoid repetitive typing, we can use a for loop:

REGISTRY=localhost:5000
TAG=v0.1
for SERVICE in webui worker hasher rng; do
  docker service create --name "$SERVICE" --network dockercoins "$REGISTRY/$SERVICE:$TAG"
done

After running this and waiting for a short moment, we should see the following when listing all services with docker service ls:

The REPLICAS column in the above image shows 1/1 for each service, which indicates that all is good. If there were a problem with any of the services, we would see something like 0/1, which means that the desired number of instances of the service is 1 but the number of running instances is zero.
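The same check can be scripted. Newer Docker releases let docker service ls print selected columns via --format; assuming the {{.Name}} and {{.Replicas}} placeholders, here is a minimal sketch (the function name report_unready_services is hypothetical) that prints every service whose running count differs from its desired count and returns non-zero if it finds one.

```shell
# Hypothetical helper: reads "NAME RUNNING/DESIRED" lines on stdin, e.g. from
#   docker service ls --format '{{.Name}} {{.Replicas}}'
# prints every service that is not fully up and returns 1 if there is any.
report_unready_services() {
  unready=0
  while read -r name replicas _rest; do
    [ -n "$name" ] || continue          # skip empty lines
    running=${replicas%%/*}             # part before the slash
    desired=${replicas#*/}              # part after the slash
    if [ "$running" != "$desired" ]; then
      echo "$name $running/$desired"
      unready=1
    fi
  done
  return "$unready"
}

# Usage: poll every two seconds until all services are fully up.
#   until docker service ls --format '{{.Name}} {{.Replicas}}' \
#       | report_unready_services >/dev/null; do sleep 2; done
```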

If we want to see the details of each service, we could now use the docker service ps command for each service. This is tedious, so a better solution is to combine a few commands:

docker service ls -q | xargs -n1 docker service ps

For me the output looks like this:

Agreed, it looks a bit messy, but at least I have all the necessary information in one place with a simple command. I expect that Docker will extend the docker service command with some more global capabilities, but for now we have to hack our own commands together.
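A slightly more readable variant, sketched here under the assumption that your Docker release supports --format on docker service ls (the function name ps_all_services is hypothetical), loops over the service names and prints a header before the tasks of each service.

```shell
# Hypothetical helper: show the tasks of every service, one block per service.
ps_all_services() {
  for service in $(docker service ls --format '{{.Name}}'); do
    echo "=== $service"
    docker service ps "$service"
  done
}

# Usage on a manager node:
#   ps_all_services
```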

In the above output we can see that each service runs in a single container and that the containers are distributed across the nodes of the swarm, e.g. redis runs on node3 and the worker service on node5.

If we want to watch our application start up, we can simply pass the above command as an argument to watch:

watch "docker service ls -q | xargs -n1 docker service ps"

which is useful for situations where the individual services need a bit more time to initialize than the simple mining services.

We have one little problem left: as is, the webui service is not accessible from the outside since it has no published port. We can change that with the update command for a Docker service. To publish the internal port 80 on the host port 8080 we use:

docker service update --publish-add 8080:80 webui

After this our service is reachable from the outside. We could also have chosen a more radical way and re-created the service, i.e. destroyed it and created it again with the --publish 8080:80 option.

By choosing the update command we instructed the scheduler (Docker Swarm) to terminate the container running the old version of the service and to run a container with the updated configuration instead.

Had our service been scaled out to more than one instance, the swarm would have performed a rolling update.

Now we can open a browser and connect to ANY of the nodes of our swarm on port 8080, and we should see the Web UI. Let’s do this. In my case webui is running on node1 with IP address 192.168.99.100, so I’ll try to connect to, say, node2 with IP address 192.168.99.101.

And indeed I see this
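To run the same check from the command line against several nodes at once, here is a minimal sketch (the function name check_published_port is hypothetical) that asks every given node for the HTTP status code on the published port. Thanks to the routing mesh, every node should answer, not only the one running webui.

```shell
# Hypothetical helper: print the HTTP status returned by each node on a port.
# Usage: check_published_port PORT IP...
check_published_port() {
  port=$1
  shift
  for ip in "$@"; do
    # -w '%{http_code}' prints only the status code; curl prints 000 if it
    # cannot connect at all, and || true keeps the loop going in that case.
    code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://$ip:$port/" || true)
    echo "$ip $code"
  done
}

# Usage with the node IPs from this post:
#   check_published_port 8080 192.168.99.100 192.168.99.101
```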

Load Balancer

In a production system we would not want anyone from the internet to hit the webui service directly; instead we would place the service behind a load balancer, e.g. an ELB when running in AWS. The load balancer would then forward each request to any of the nodes of the swarm, which in turn would reroute it to the node on which webui is running. An image probably helps to clarify the situation.

Logging

What can we do if one of our service instances shows a problem? How can we find out what the root cause is? We could technically ssh into the swarm node on which the problematic container is running and then use the docker logs [container ID] command to get the details. But this of course is not a scalable solution, and there must be a better way of getting insight into our application. The answer is log aggregation: we collect the log output of each container and forward it to a central location, e.g. in the cloud.

Commercial Offerings

Many services offer just that, e.g. Logentries, Sumo Logic, Splunk and Loggly, to name just a few.

Let’s take Logentries as an example. The company provides a Docker image that we can use to run a container on each node of the swarm. This container hooks into the event stream of the Docker Engine and forwards all event messages to a pre-defined endpoint in the cloud. We can then use the Logentries web client to slice and dice the aggregated information and easily find what we’re looking for.

If you do not yet have an account with Logentries, you can easily create a 30-day trial account as I did. Once you have created the account, you can define a new Log Set by clicking on + Add New.

In the following dialog, when asked to Select How To Send Your Logs, select Docker, and then in step 2 define the name of the new log set. I called mine my-log-set. In this step you will also generate a token that you will use when running the log container. A token has this form:

a62dc88a-xxxx-xxxx-xxxx-a1fee4df9557

Once we’re done with the configuration we can execute the following command to start an instance of the Logentries container

docker run -d -v /var/run/docker.sock:/var/run/docker.sock logentries/docker-logentries -t [your-token] -j

If we do this, the container will run only on the current node of the swarm and collect and forward that node’s information. That’s not exactly what we want though! We want to run an instance of the container on each and every node, so we use the global service mode:

docker service create --name log --mount type=bind,source=/var/run/docker.sock,target=/var/run/docker.sock --mode global logentries/docker-logentries -t [your-token] -j
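Since a global service should run exactly one task on every node, we can compare the number of nodes with the number of tasks of the log service. A minimal sketch, assuming it runs on a manager node and the service name log from the command above (the function name covers_all_nodes is hypothetical); note that it counts tasks whose desired state is running, so a task that is still starting is already counted.

```shell
# Hypothetical helper: check that a global service has one task per node.
covers_all_nodes() {
  service=$1
  # $(( )) strips the padding that some wc implementations add.
  nodes=$(( $(docker node ls -q | wc -l) ))
  tasks=$(( $(docker service ps -q --filter desired-state=running "$service" | wc -l) ))
  echo "$service: $tasks task(s) with desired state running on $nodes node(s)"
  [ "$tasks" -eq "$nodes" ]
}

# Usage on a manager node:
#   covers_all_nodes log
```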

After a short period of time we should have an instance of the Logentries container running on each node and collecting log information. To verify this, ssh into any node of the swarm and run an instance of busybox, e.g.:

docker run --rm -it busybox echo "Hello world"

while you have Logentries running in Live tail mode. You should see something similar to this

In the above image we can see an entry in the log for each event generated by Docker during the life-cycle of the busybox container.

Logging with an ELK Stack

If we want to run our own log aggregator, we can use the so-called ELK stack (ELK = Elasticsearch, Logstash and Kibana). We only really need to configure Logstash; the other two services run with their defaults.

First we create a network just for logging

docker network create --driver overlay logging

Now we can create the service for Elasticsearch:

docker service create --network logging --name elasticsearch elasticsearch

Then we define a service for Kibana. Kibana needs to know where Elasticsearch can be found, so we need to provide a tiny bit more configuration information:

docker service create --network logging --name kibana --publish 5601:5601 -e ELASTICSEARCH_URL=http://elasticsearch:9200 kibana

Note how we use the integrated DNS service to locate the Elasticsearch service via its name in http://elasticsearch:9200.

Finally we need a service for Logstash

docker service create --network logging --name logstash -p 12201:12201/udp logstash -e "$(cat ~/orchestration-workshop/elk/logstash.conf)"

As you can see, Logstash needs a configuration, which we get from the logstash.conf file that is part of the repository we cloned. We also use the GELF protocol for logging, which uses UDP port 12201.

To see what Logstash is reporting, we can locate the Logstash container with docker service ps logstash, then ssh into the corresponding node and use

docker logs --follow [container id]

where [container id] corresponds to the ID of the Logstash container (the ID we can get via docker ps on the node).
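The two steps above can be combined into one command. A minimal sketch, assuming it runs on a manager node, that the node names shown by docker service ps are reachable via plain ssh (with docker-machine you would use docker-machine ssh instead), and a Docker release whose docker service ps supports --format; the function name follow_service_logs is hypothetical. On the remote node the container is looked up via the com.docker.swarm.service.name label that Swarm attaches to task containers.

```shell
# Hypothetical helper: follow the logs of the (single) running task of a service.
follow_service_logs() {
  service=$1
  # Node on which the task with desired state "running" is scheduled.
  node=$(docker service ps --filter desired-state=running --format '{{.Node}}' "$service" | head -n 1)
  if [ -z "$node" ]; then
    echo "no running task found for $service" >&2
    return 1
  fi
  # The escaped \$( ) is expanded on the remote node, so docker ps runs there.
  ssh "$node" "docker logs --follow \$(docker ps -q --filter label=com.docker.swarm.service.name=$service)"
}

# Usage: follow_service_logs logstash
```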

To send a sample log message, we can for example use the following command:

docker run --log-driver gelf --log-opt gelf-address=udp://127.0.0.1:12201 --rm busybox echo hello
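When several messages arrive at once, it helps to make the test message easy to find in Kibana. A minimal sketch (the function name send_test_log and the tag value elk-test are hypothetical) that adds the gelf driver’s tag log option and a timestamp to the message, with the address used above as the default:

```shell
# Hypothetical helper: send one easily identifiable message through GELF.
# Usage: send_test_log [GELF_ADDRESS]
send_test_log() {
  address=${1:-udp://127.0.0.1:12201}
  message="hello at $(date +%s)"
  docker run --rm \
    --log-driver gelf \
    --log-opt gelf-address="$address" \
    --log-opt tag=elk-test \
    busybox echo "$message"
  echo "sent: $message"
}
```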

Now we can update all our services to use the ELK stack with this command

for SERVICE in webui worker rng hasher; do
  docker service update --log-driver gelf --log-opt gelf-address=udp://localhost:12201 "$SERVICE"
done
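If localhost does not work in your setup, you may want to point the services at the IP address of a specific node instead. A minimal sketch that makes the GELF address a parameter (the function name use_gelf_logging is hypothetical; the address used above is one possible value):

```shell
# Hypothetical helper: switch the given services to the gelf log driver.
# Usage: use_gelf_logging ADDRESS SERVICE...
use_gelf_logging() {
  address=$1
  shift
  for service in "$@"; do
    docker service update \
      --log-driver gelf \
      --log-opt gelf-address="$address" \
      "$service"
  done
}

# Usage with the address from this post:
#   use_gelf_logging udp://localhost:12201 webui worker rng hasher
```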

Finally we can open the browser at the IP address of one of our nodes on port 5601 (e.g. http://192.168.99.101:5601) to see Kibana. Click on the top-level menu “Discover” to see the incoming logs. You might want to change the time window and the refresh interval in the top right of the screen to, say, the last hour and every 5 seconds.

Summary

In this post I have shown how we can deploy and run an application consisting of multiple services. Once an application runs in production it needs to be monitored. Among other things, this requires that we aggregate the log output of all our containers in a central location. I have shown how we can use one of the commercial SaaS offerings to do exactly that, and also how we can run our own ELK stack instead. In part 4 I will show how we can further automate the deployment of services and their subsequent upgrade to new versions without incurring any downtime.

About Gabriel Schenker

Gabriel N. Schenker started his career as a physicist. Following his passion and interest in stars and the universe, he chose to write his Ph.D. thesis in astrophysics. Soon after this he dedicated all his time to his second passion, writing and architecting software. Gabriel has since been working for over 25 years as a consultant, software architect, trainer, and mentor, mainly on the .NET platform. He is currently working as senior software architect at Alien Vault in Austin, Texas. Gabriel is passionate about software development and tries to make the life of developers easier by providing guidelines and frameworks to reduce friction in the software development process. Gabriel is married and a father of four children, and in his spare time likes hiking in the mountains, cooking and reading.
  • Even easier for looking what is going on with logstash :)

    ctnID=$(docker ps -q --filter label=com.docker.swarm.service.name=logstash) &&
    docker logs --follow $ctnID

  • Gabriel this tutorial is rock solid. I read hundreds of Docker docs/tutorials and I love your approach.
    To the point! Continue the good work :)


    • gabrielschenker

      Thanks for the positive feedback!

  • Thanks Gabriel, awesome tutorials on docker swarm. Keep your good work going, God bless you.

  • stef0

    Hi!
    Great tutorial.

    For Elasticsearch 5.0 I had to set this in the host OS, because of an error:

    [2016-11-22T11:49:59,562][INFO ][o.e.b.BootstrapCheck ] [3OkrTgi] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
    ERROR: bootstrap checks failed
    max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

    Solution:
    sysctl -w vm.max_map_count=262144

  • Greg Gilley

    I’m trying to run the ELK setup on a multi-cpu swarm. I see that you are doing udp://localhost:12201, will that cause problems with this setup?

    • gabrielschenker

      I have to admit that I’m not an ELK stack expert… Maybe you should ask your question on Stackoverflow

      • Greg Gilley

        To follow up on this. I pinned the logstash service to a particular host in the swarm. Then I used its ip address instead of localhost and things seem to be working as expected.