Zabbix distributes Docker images for each component. Not only does this mean you can quickly stand up the monitoring solution, but upgrades also become a simple matter of swapping in newer images.
In this article, I will show how to stand up and then upgrade a Zabbix installation using docker-compose.
If you don’t already have Docker and Docker Compose installed, then see my article here for installing on Ubuntu.
Also make sure you have a Git client installed.
Checkout an older 4.0.1 version
Although Zabbix has newer 4.2 images, we will start by standing up a 4.0.1 image so that we can later show the upgrade path to 4.0.13 and then the latest 4.2.
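A sketch of those steps, assuming the compose files come from the official zabbix-docker repository (the repository URL, tag name, and compose file name are assumptions; adjust them to whatever you actually check out):

```sh
git clone https://github.com/zabbix/zabbix-docker.git
cd zabbix-docker

# check out the 4.0.1-era compose files (exact tag/branch name assumed)
git checkout 4.0.1

# if the compose file still references the rolling "4.0-latest" image tags,
# pin them to 4.0.1 so we truly start from the older release
sed -i 's/4\.0-latest/4.0.1/g' docker-compose_v3_alpine_mysql_latest.yaml
```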
Deploy a 4.0.1 stack
Then deploy the full stack described in the yml using docker-compose:
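```sh
# (compose file name assumed; the repository ships several variants for
#  different database and web-server combinations)
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml up -d
```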
After a few minutes of pulling down the images and creating the containers, it is done. You should see multiple zabbix related containers now when you run:
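```sh
docker ps
```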
The status of each should show as “up”. And if you want to follow the tailed logs of all the components:
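```sh
# (compose file name assumed, as above)
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml logs -f
```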
Validate the 4.0.1 stack
In order to validate, we need to know the IP address of the nginx container that is exposing the Zabbix Admin web interface, which can be done using:
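One way is docker inspect; the network name comes from the compose project as described below, and the container name is an assumption based on Compose's default project_service_1 naming (check docker ps for the exact name on your system):

```sh
docker inspect \
  -f '{{ (index .NetworkSettings.Networks "zabbix-docker_zbx_net_frontend").IPAddress }}' \
  zabbix-docker_zabbix-web-nginx-mysql_1
```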
The “zabbix-docker_zbx_net_frontend” network used to pull the outward-facing IP address is created by Docker Compose; it is the concatenation of the directory name “zabbix-docker” and the “zbx_net_frontend” network defined in the networks section of the original yaml.
The web front end should be available via HTTP on port 80, which means commands like below should be successful (substitute with your IP from above).
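For example:

```sh
curl -I http://<nginx-ip>/    # substitute the IP address returned above
```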
And from the browser, you should be able to login with the default credentials (Admin/zabbix).
This will get you into the Zabbix Admin web interface.
If you scroll to the very bottom of that web page the version will be shown as “Zabbix 4.0.1”.
Enable Zabbix Agent reporting
You may notice that the dashboard lists one problem: “Zabbix agent on Zabbix server is unreachable for 5 minutes”. This is because, unlike a standard installation where both would be processes on the same host, the Zabbix agent runs in a different container than the server.
To fix this, we need to get the IP address of the Zabbix agent, which is on the backend network.
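The same docker inspect approach works, this time against the agent container and the backend network (again, the container name is an assumption; check docker ps):

```sh
docker inspect \
  -f '{{ (index .NetworkSettings.Networks "zabbix-docker_zbx_net_backend").IPAddress }}' \
  zabbix-docker_zabbix-agent_1
```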
Go back to the Zabbix Admin web interface, go to Configuration > Hosts, and click on “Zabbix server” to update its agent IP address to the one we got above.
Then press the “Update” button. Go back to the zabbix main dashboard and within a couple of minutes, the problem originally seen will disappear.
And if you go to Monitoring > Latest data, select “Zabbix server”, and apply it, you should see the latest values coming in.
Create a host definition
In order to validate that our definitions and data persist through our planned upgrade from 4.0.1 -> 4.0-latest -> 4.2, we will now create a host definition with sample data.
This host will have a single trapper item named “mycount” that we will populate manually using zabbix_sender (items of type “trapper” are pushed to the server, for example with zabbix_sender, rather than polled for by the server).
First create a host definition by going to Configuration>Hosts and pressing the “Create Host” button. Then use “testhost1” as the hostname, “Linux servers” as the hostgroup, and press “Add”. This will create the host definition in the database.
That will take you back to the main hosts page. Click on “Items” in the “testhost1” row to add a field that will accept our data, then press the “Create item” button. Use “mycount” as both the name and key, choose the type “Zabbix trapper”, and press the “Add” button.
Now that the host and field are created on the zabbix side, we need to send data.
The zabbix-agent container has a “zabbix_sender” binary available. We will use it to send values to mycount.
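A sketch of that call; -z, -s, -k, and -o are zabbix_sender's standard server, host, key, and value flags, while the container name and the zabbix-server alias are assumptions based on the compose definitions:

```sh
docker exec zabbix-docker_zabbix-agent_1 \
  zabbix_sender -z zabbix-server -s testhost1 -k mycount -o 1
```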
If there is a failure, give the host and item a couple of minutes to sync and then retry. When successful, the output will report “failed: 0”.
And if you go to Monitoring > Latest data and select the hostname “testhost1”, you should see the latest value sent. Pressing the “Graph” link will show you a history of the values you have sent. Beware that you will not always see every value, because this is an aggregated metric fit into time buckets.
Upgrading to Zabbix 4.0.13
The previous work setting up a host was to show that upgrades can be accomplished without losing data. The Zabbix binaries run in containers, but the ‘zbx_env’ directory is where the persistent volumes are kept for the MySQL database, external scripts, etc., and it survives upgrades.
There are several significant changes between 4.0.1 and 4.0.13, including the switch to mysql:8.0 in 4.0.7 and the introduction of Docker secrets in 4.0.10, but all these details, including the required database schema changes, are handled for you during the image upgrade.
Checkout the latest 4.0.13 tag from source control. The yml file by default references “4.0-latest” so there is no need to modify it like we did for 4.0.1. Then have Docker Compose bring it back up.
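Roughly (tag and compose file names assumed, as before):

```sh
# discard the local 4.0.1 image pin made earlier, then move to the 4.0.13 tag
git checkout -- .
git checkout 4.0.13

docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml up -d
```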
Give the operation a few minutes, and the database time to be updated.
Because the containers were rebuilt, the IP address of the nginx container fronting the web interface may have changed. Check it again using the same docker inspect command as before, then go to the Zabbix Admin web interface at that address.
If you scroll to the very bottom of the main Admin web page, you should see “Zabbix 4.0.13” reported. And looking at the latest data you should be able to see testhost1.mycount has the same historical values as before.
Let’s go ahead and send a value of “2” to the mycount item.
And now if we look at the latest data graph for testhost1.mycount, the min/max=1/2, and we can see the graph visualize this change.
Upgrading to Zabbix 4.2-latest
The last step is to upgrade to the next major release, 4.2-latest. First stop the currently running 4.0-latest images, and verify they are gone.
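For example (whether to use stop or down is a judgment call; down removes the containers and networks but leaves the bind-mounted zbx_env data in place):

```sh
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml down
docker ps    # the zabbix containers should no longer be listed
```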
Then check out the 4.2 branch from source control. The yml file already references “4.2-latest” so there is no need to modify it. Then have Docker Compose bring it back up.
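Roughly:

```sh
git checkout 4.2
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml up -d
```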
Give the operation a few minutes, and the database time to be updated.
Then find the IP address of the fronting nginx again, using the same inspect command as before.
And use this IP to open your browser to the Zabbix Admin web interface again. If you scroll to the very bottom of the main page, you should see “Zabbix 4.2.7” reported.
Now send a value of “3” to the mycount item.
The latest data will reflect a min/max of 1/3, and visually you can see the growth from 1 (using Zabbix 4.0.1) to 2 (using Zabbix 4.0.13) to 3 (using the latest Zabbix 4.2 branch).
Along the way, a few other operations came in handy (see the sketch after this list):
- Shell into a container
- View the differences between git versions of a file
- Bring down just the MySQL database, gracefully
- Copy the external volumes so they can be restored if necessary
- Check the MySQL version
- Test bringing up only a single component
- Note that the configuration went from using values to Docker secrets between 4.0.1 and 4.0.13
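Here is a sketch of what the command-line pieces of those might look like; the container and service names are assumptions based on the compose file, so adjust them to whatever docker ps shows on your system:

```sh
# shell into a container (the images are Alpine based, so use sh rather than bash)
docker exec -it zabbix-docker_zabbix-server_1 /bin/sh

# view the differences between two git versions of the compose file
git diff 4.0.1 4.0.13 -- docker-compose_v3_alpine_mysql_latest.yaml

# bring down just the mysql database, gracefully
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml stop mysql-server

# copy the external volumes so they can be restored if necessary (may need sudo)
cp -rp zbx_env zbx_env.backup

# check the mysql version
docker exec zabbix-docker_mysql-server_1 mysql --version

# test bringing up only a single component
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml up -d mysql-server
```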
If you do a “docker-compose kill” with the default SIGKILL, the mysql database will register it as a crash the next time you start up. Send SIGTERM instead (-s SIGTERM). A crash will recover within the same version, but during an upgrade it will fail and the image will have to be rolled back.
As a comparison, when issuing a SIGTERM the mysql logs show a normal, clean shutdown rather than crash recovery.
docker-compose ‘kill’ can do a graceful stop if you send a custom signal.
In addition to SIGTERM, you can also do a graceful shutdown with:
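```sh
# stop sends SIGTERM first and only falls back to SIGKILL after a timeout,
# so the database shuts down cleanly (compose file name assumed, as above)
docker-compose -f docker-compose_v3_alpine_mysql_latest.yaml stop
```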
Wow, that title looks like a mouthful, doesn’t it? So why do this? Isn’t there Kubernetes for running Docker containers? Well, sometimes I only need one container running and I really don’t want the overhead of maintaining another Kubernetes cluster. Lame excuse, but this works well for a small installation where I just don’t need the power of a full Kubernetes cluster.
What exactly is a Container-Optimized OS (COS)? Well, this OS is built by Google to be optimized as the base operating system of a Kubernetes cluster, or even just a simple Docker container. It’s built upon Chromium OS and has a lot of security features baked in from the outset. For example, the root filesystem is mounted read-only, so the binaries that are on the system don’t change and can’t be changed. Go ahead and read all about it; the project is a great one.
Here is what I will show you in this blog post. Throughout this post, we will use a base nginx Docker image as the container we want to run. We will first use some shortcuts the gcloud command provides to run this container. We will then go through what is required to get an image from a private repository up and running. Let’s get started!
All source code is hosted on GitHub as well, so you don’t have to retype it if you don’t want to.
This post assumes you have a Google Cloud account set up and a Google Cloud project already carved out. If you haven’t, maybe follow the quickstart tutorial, then come back.
Warning: I will be showing commands that create resources inside of a Google Cloud environment. That means real resources that cost real money. Be careful if you don’t want to be charged.
Let’s do something very basic: we will start up a container on top of a COS VM using the gcloud command. Here is what that gcloud command looks like:
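```sh
# a sketch (instance name and flags assumed): create-with-container boots a
# Container-Optimized OS VM and runs the given image on it
gcloud compute instances create-with-container nginx-vm \
    --container-image nginx:latest \
    --tags http-server
```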
Take note of the external IP address that is spit out after running that command. We will need that in the next step.
Next we will need to expose that instance with a firewall rule:
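```sh
# allow inbound HTTP to instances tagged http-server
# (rule name and tag are assumptions)
gcloud compute firewall-rules create allow-http \
    --allow tcp:80 \
    --target-tags http-server
```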
Once that is complete, go ahead and try to hit the external IP address of that instance. You should now see a “Welcome to nginx!” screen.
Before the next part, if you want to delete that instance, go ahead and run gcloud compute instances delete nginx-vm.
Well, that doesn’t look hard, does it? We can take images from Docker Hub, push them up, and bam, we have a Docker container running in the cloud. However, what happens if we want our own code, or code not hosted on Docker Hub? Let’s create a private Docker image and show how to authenticate and pull that image down.
Let’s establish a private docker image that we would like to be running on the COS image. We will take the base nginx image, and modify it with a custom index page to demonstrate we can build and deploy the custom image.
Here are the two files you will need. First, the new index.html (a minimal sketch; the exact contents just need to produce the custom banner):
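```html
<!-- a minimal sketch; the only requirement is that it produces the custom banner -->
<html>
  <body>
    <h1>Welcome to My Custom nginx!</h1>
  </body>
</html>
```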
And here is what the Dockerfile might look like (again, a sketch based on the stock nginx image):
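```dockerfile
# a sketch: start from the stock nginx image and overwrite the default index page
# (/usr/share/nginx/html is the image's default document root)
FROM nginx:latest
COPY index.html /usr/share/nginx/html/index.html
```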
If you would like to test that locally, run docker build -t test . && docker run --rm -p 8080:80 test and point the browser to http://localhost:8080; you should see the “Welcome to My Custom nginx!” banner.
Let’s now take our image and push it up into the private repository. This is a little beyond the scope of this blog post, but go ahead and pick somewhere you can upload a Docker image that can be accessed over the public internet. I’ve used GitLab, but any registry really would work. Just make sure it’s accessible to the public. If you need help with that part, reach out to me and I’ll attempt to assist.
Now that we have our image up in a registry, we need some way to authenticate with that registry on our VM. If you were doing this locally, I bet you ran a docker login command at some point to get authenticated with that registry. Well, that’s exactly what we are going to do on this VM as well.
So how do we go about running that command? Well, COS has a toolkit installed called cloud-init.
Cloud-init is a set of tools to manage cloud images. They provide ways to create startup scripts to get your cloud image running exactly the way you want. They are used on most cloud providers, and most distros have hooks or tools that can be used. For this particular case we are interested in providing user scripts that run at startup. For this, we will provide a user-data variable that contains a cloud-config block. This block will create a user, a service definition, and a startup script that starts the service. Sounds easy, right? Well, let’s go!
Let’s take a look at what this cloud-config configuration file may look like; the line numbers referenced below correspond to the sketch here:
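```yaml
#cloud-config

users:
- name: myservice
  uid: 2000

write_files:
- path: /etc/systemd/system/myservice.service
  permissions: 0644
  owner: root
  content: |
    [Unit]
    Description=Run my custom nginx container
    Wants=network-online.target docker.service
    After=network-online.target docker.service

    [Service]
    Environment="HOME=/home/myservice"
    ExecStartPre=/usr/bin/docker login -u %USERNAME% -p %PASSWORD% %REGISTRY%
    ExecStart=/usr/bin/docker run --rm --network=host --name=myservice %DOCKER_IMAGE%
    ExecStop=/usr/bin/docker stop myservice
    Restart=on-failure
    RestartSec=10

    [Install]
    WantedBy=multi-user.target

runcmd:
- iptables -w -A INPUT -p tcp --dport 80 -j ACCEPT
- systemctl daemon-reload
- systemctl enable --now --no-block myservice.service

# (names such as myservice, the unit layout, and the %VARIABLE% placeholders are
#  assumptions based on the walkthrough below)
```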
This configuration file is used as a template that will generate the configuration file for our cloud-config process. Let’s point out some of the more important lines, then I’ll explain what the variables are:
Line 1: This flags this file as a cloud config file. If you don’t have this, the system won’t pick it up and process it. It is VERY important to have it exactly this way.
Line 3 - 5: Sets up a user dedicated to running this service. This creates a security sandbox because this user will not have any permissions assigned to it except to run this container.
Line 7 - 26: This is the file contents of /etc/systemd/system/myservice.service. As the path suggests, this is a systemd service file that will run our Docker container. This section has information about the file permissions as well as the actual content of the file embedded in this configuration.
Line 12 - 15: Sets the description of the service as well as any requirements the service needs to run. In this case, we want the network to be online and docker to be running before this service starts.
Line 17 - 23: Sets how the service starts and stops. This is some of the more important bits. First, we set the home directory to that of the user we created earlier; this sandboxes and isolates where the service runs. Next, ExecStartPre is where some of the magic starts to come into place: this is where the docker login command is executed. All of the parameters are currently variables to be replaced, but this is where the magic happens. Then ExecStart actually runs the docker run command. One important thing to note is the --network=host flag. We want the Docker container to use the host’s network interfaces instead of creating the normal isolated network stack. This way the container can act just like the host on the network and have all ports exposed without any further magic.
Line 22 - 23: This sets up the service to restart on failure and wait 10 seconds between each retry.
Line 28 - 31: This section runs once the configuration has been read and all other parts are completed. It acts like a startup script. First we set up the internal firewall to accept TCP connections using iptables. Then we reload the systemd daemon to read in our configuration files, and finally we enable and queue up startup of our service. This is really important to note: the --now --no-block flags matter here because we want the service to start up now, but we also want it queued in the systemd process so that it waits for the Docker service and network connections to be online. Otherwise our container may not start right, because Docker isn’t ready or we can’t reach out to the internet to get our container.
Now, what are all these variables for:
%REGISTRY%: This defines the root domain of where the registry is. For example, in testing this I used GitLab, so the registry would have been registry.gitlab.com.
%USERNAME%: This is the username to be logged in as. Keeping with the gitlab theme, I used a deploy token and the username was given to me by gitlab.
%PASSWORD%: This is the password or token that can be used to log in.
%DOCKER_IMAGE%: This is the Docker image path, e.g. registry.gitlab.com/&lt;group&gt;/&lt;project&gt;/&lt;image&gt;:&lt;tag&gt;.
Now, how do I run this? Well, replace all of the variables with the right values and then run this command:
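```sh
# a sketch (instance and file names assumed): boot a COS VM and hand the rendered
# cloud-config to it through the user-data metadata key, which cloud-init reads
gcloud compute instances create test-nginx \
    --image-family cos-stable \
    --image-project cos-cloud \
    --metadata-from-file user-data=cloud-config.yaml \
    --tags http-server
```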
If you take that IP address that is spit out by that command, you should see the “Welcome to My Custom nginx!” banner. If not, maybe read through the troubleshooting section. If everything worked, congratulations!
Alright, the site didn’t come up. Now what? Well, let’s first SSH into the box so we can start investigating what happened. To do that, make sure the firewall is open by running gcloud compute firewall-rules create allow-ssh-traffic --allow tcp:22. This opens up the firewall so you can SSH into the box by executing gcloud compute ssh &lt;box name, i.e. test-nginx&gt;.
Now that you are on the box, what exactly are you looking for? Well, for our Docker example we want to make sure the Docker container is up and running, so we would issue a docker container ls command to verify it. If it is up and running, then you may have a firewall issue. Visit the console and inspect the network interface to see if the ingress is set up correctly. What we would like to see is that port 80 is open and working. If it is, maybe the iptables rule didn’t run as expected, so verify on the VM instance itself that something like curl localhost works. If that works, verify iptables is correctly set up.
But what if the container isn’t even running? Where do I go from there? Well, we need to investigate the logs and start sifting through what may have happened. For this, sudo journalctl is what you start poking around in; it gives you all of the logs of the box as it starts. Another command for our example is sudo systemctl status myservice, which shows you the raw status of the service, whether it’s up or down, and what may be happening with it right now.
Another thing to check, which I ran into a lot, is to first verify that the cloud-init script is even running correctly. A great way to do that is to verify that the files we were expecting are actually there. In this example, we would expect to see a file at /etc/systemd/system/myservice.service. If that isn’t there, usually the problem is that the very first line doesn’t read #cloud-config EXACTLY. Double check your config and consult the logs for user-data parsing and retrieval. There is usually a log message that the file isn’t understood and the data is being ignored.
Hopefully I’ve given you a taste of how to set up a single Docker container instance using the Container-Optimized OS on Google Cloud. I’ve found this a useful approach when I don’t really need all the horsepower of a Kubernetes cluster. This lighter-weight infrastructure for a few containers proves to be useful and cost effective. But be warned: these instances are obviously not replicated in any way, so if one goes down, it is all down. Be careful in the cloud, because every stack should be built on the assumption that cloud resources can come and go at will.
All of this code and examples are hosted on my github account.