Solved: How to use mssql with DRBD and pacemaker

Hi,

I am fairly new to databases. We use software that uses mssql, which typically runs on Windows machines, but we have a client that has only Linux servers. On two test servers I have successfully installed mssql, set up DRBD, and set up Pacemaker to switch which server has access to the DRBD device. I have also configured mssql so that it uses the DRBD drive (mounted as /var/sql) as its location for the master database as well as the default for new databases. I have verified that this works by starting mssql manually, creating a DB, stopping mssql, stopping and starting Pacemaker to fail over, and doing the same on the other server. The two servers can see the DBs that the other created. As far as mssql is concerned it's all one master database, which is perfect.

The only thing is, I don't want to have to manually stop and start mssql in the event of a failure. I have been unable to find instructions on how to do this automatically with Pacemaker. Can someone help me out? I don't want to use the Microsoft solution of creating an HA server, because A. it requires licensing a higher edition of mssql (I am running Express), and B. I want the solution to be seamless, so mssql acts as if it's just one DB.

By the way, the test servers are Dell servers with three hard drives each (two running as RAID 1, the third as a standalone HDD used for DRBD).
I am running Ubuntu 16.04 on both servers (as recommended by Microsoft).

Thanks.
 


Update: I am all over the damn place trying to piece together bits of instructions from different sources. It turns out there are instructions for this using pcs, but of course all of my work up to that point had been done with crm, and pcs will just wipe out everything you did in crm. I am almost there, but nothing on my nodes is starting and it looks like my CIB is corrupt. I want to reload my CIB from one I had saved with pcs before pushing it, but I get the error "Call cib_replace failed (-205): Update was older than existing configuration". How do I fix that?
 
Ah, apparently by manually updating the "epoch" value in the saved configuration file to be higher than the one in the live CIB.
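
For reference, this is roughly what I did (a sketch; cibadmin and pcs are the standard Pacemaker/pcs tools, but check the syntax on your versions):

sudo cibadmin --query | grep epoch
# edit your saved CIB file (saved_cib.xml below is just a placeholder name) so its epoch
# attribute is higher than the live value shown above, then push it:
sudo pcs cluster cib-push saved_cib.xml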
 
So I can't get mssql to start; here is what I have:


andrew@sql1:~$ sudo pcs constraint list
Location Constraints:
Ordering Constraints:
promote drbd_res-master then start fs_res (kind:Mandatory)
Colocation Constraints:
fs_res with drbd_res-master (score:INFINITY) (rsc-role:Master) (with-rsc-role:Master)
virtualip with mssqlha (score:INFINITY)
fs_res with mssqlha (score:INFINITY)
andrew@sql1:~$ sudo pcs resource show
mssqlha (ocf::mssql:fci): Stopped
virtualip (ocf::heartbeat:IPaddr2): Stopped
Master/Slave Set: drbd_res-master [drbd_res]
Masters: [ sql1 ]
Stopped: [ sql2 ]
fs_res (ocf::heartbeat:Filesystem): Stopped

If I remove the constraints tying the DRBD filesystem and the virtualip to mssql, then I can get everything running except mssql (probably because mssql is trying to start before the filesystem is mounted and is crashing). Is there something wrong with how I have my constraints? Any ideas?
 
I tried changing the order of the colocation constraints and it does the same thing. I even tried adding an order constraint so that, in theory, the FS should come up before mssql, but it doesn't help. Interestingly, if I remove those two colocation constraints for mssql, all the other resources run correctly. I can then manually restart mssql-server and it runs just fine and I can log in. There is something about the way the resource launches mssql that mssql doesn't like.
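
For reference, the pcs form of the order constraint I tried (using the resource names from the pcs output above) was something like:

sudo pcs constraint order start fs_res then start mssqlha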
 
so I found the issue by running crm_mon:

* mssqlha_start_0 on sql1 'invalid parameter' (2): call=20, status=complete, exitreason='2020/01/30 12:07:39 Could not read credentials file: could not read first line to extract username', last-rc-change='Thu Jan 30 12:07:38 2020', queued=0ms, exec=1223ms

It might have something to do with the hacluster user that was created and permissions perhaps
 
hmmm, it works fine if I manually log in as that user. I am basically trying to follow these instructions:

Configure Red Hat Enterprise Linux shared disk cluster for SQL Server at http://box.commie.cn/files/linux.pdf

The only difference is that I am using DRBD instead of NFS file sharing. I had originally followed https://www.theurbanpenguin.com/drbd-pacemaker-ha-cluster-ubuntu-16-04/ to set up DRBD, and after some poking around I have converted my DRBD configuration from crm to pcs so that it should work with the first page's instructions. I am not sure why the resource will not start mssql when the manual command will. It is probably because the DRBD resource is not started, so there is no file to read from; but then why, when I put the colocation constraints in to prevent this from happening, do all my resources including DRBD and the floating IP stop working? Someone please help.
 
So, because it was buried in the part about setting up NFS (which I was not doing because I am using DRBD), I missed an important step about adding the username and password to the /var/opt/mssql/secrets/passwd file. I am not sure which user it wants me to put in there, but I assumed the hacluster one, so I added that and now I get this instead:

* mssqlha_start_0 on sql1 'unknown error' (1): call=23, status=complete, exitreason='SQL Server crashed duri
ng startup.',

Checking the logs I see this:

2020-01-31 10:07:10.14 spid23s Error: 39002, Severity: 16, State: 1.
SQL failed to boot extensibility for error code 0x80070005
2020-01-31 10:07:10.14 spid23s InitializeXdbPkgLauncher failed. ErrorCode: 0x80004005.

Everything I see about these errors seems to point to them happening when trying to configure mssql, but it is already configured. Plus, once the cluster is up, I can start the mssql-server process manually and log in locally as either SA or the hacluster user just fine. It seems it is only when the cluster tries to start mssql that I get this problem. Does anyone have any ideas?

Edit: it seems that if I wait 30 seconds the resource does come online. It looks like a timing issue: it fails to load and then starts. Maybe I can fix this with constraints; if not, I'm still looking for ideas.
 
I got it working!!!! I had to go back and install mssql-server-ha on the second node because I had forgotten that, and I created the secrets file on it too. Failover was then working, but it gave the above error and was slow, so I added an order constraint that the DRBD filesystem has to be running before mssql can start, and now it fails over flawlessly. :) I will probably leave this thread open and create a step-by-step guide of what I did in case others need it, but that will take some time.
 
I have pieced together what I did as best I can remember, based on the resources I used. Note that this cluster is not safe for production because it can split-brain very easily. My next step will be to use the second NIC to run a direct link between the servers and use that for the cluster traffic, as that would be safer than relying on a switch (less likely to be unplugged, not reliant on its own power supply, etc.). In the meantime I will close the thread; here are the steps I believe I took, in case anyone like me needs help with this:

How to Setup Microsoft SQL Server for Linux on a Pacemaker Cluster Using DRBD

Requirements:
2 servers, at least 2 logical drives on each, connected to the same switch with access to the internet
At least 2GB RAM on each
Ubuntu 16.04

Step 1: Install Ubuntu 16.04

Step 2: Setup networking

Set up each server with a static IP, a gateway, and a DNS server so they can reach the internet and each other. Add both servers to the /etc/hosts file on each server so they can ping each other by hostname.
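
For example, assuming the hostnames are sql1 and sql2 and the addresses are 192.168.1.10 and 192.168.1.11 (placeholders; substitute your own), /etc/hosts on each server would contain lines like:

192.168.1.10    sql1
192.168.1.11    sql2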

Step 3: Setup DRBD
Download DRBD components on both servers:
sudo apt-get update
sudo apt-get install -y drbd8-utils

Install NTP on both servers:
sudo apt-get install -y ntp

Zero out the drive on both servers:
sudo dd if=/dev/zero of=/dev/sdb

On each host, edit /etc/drbd.conf:

global { usage-count no; }
common { protocol C; }
resource r0 {
    on <hostname1> {
        device /dev/drbd0;
        disk /dev/sdb;
        address <IPAddress1>:7788;
        meta-disk internal;
    }
    on <hostname2> {
        device /dev/drbd0;
        disk /dev/sdb;
        address <IPAddress2>:7788;
        meta-disk internal;
    }
}

Still working on both systems, load the kernel module with:
sudo modprobe drbd

Create the mirror device:
sudo drbdadm create-md r0

Bring the mirror device online:
sudo drbdadm up r0

To view the status of the mirroring, use one of the following commands:
sudo drbd-overview
sudo cat /proc/drbd
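
At this point the output of /proc/drbd should look roughly like the following (an illustration from a typical DRBD 8 setup; the counters and flags will differ on your systems):

0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----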

We will see both nodes as Secondary and the data as Inconsistent. We can correct this by forcing the Primary role on one node. Only do this on one node!

sudo drbdadm -- --overwrite-data-of-peer primary r0/0

To view the progress:
sudo watch cat /proc/drbd

Once complete, from the Primary node format the disk and mount it.
sudo mkfs.ext4 /dev/drbd0
sudo mkdir -p /var/sql
sudo mount /dev/drbd0 /var/sql
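
To confirm that the DRBD device is mounted where we expect (optional check):

df -h /var/sql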

Step 4: Install and configure SQL Server on each cluster node

Import the public repository GPG keys:
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

Register the Microsoft SQL Server Ubuntu repository for SQL Server 2019:
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2019.list)"

Run the following commands to install SQL Server:
sudo apt-get update
sudo apt-get install -y mssql-server

After the package installation finishes, run mssql-conf setup and follow the prompts to set the SA password and choose your edition:
sudo /opt/mssql/bin/mssql-conf setup

Once the configuration is done, verify that the service is running:
systemctl status mssql-server --no-pager


Step 5: Install the SQL Server command-line tools

To create a database, you need to connect with a tool that can run Transact-SQL statements on the SQL Server. The following steps install the SQL Server command-line tools: sqlcmd and bcp.

Import the public repository GPG keys:
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -

Register the Microsoft Ubuntu repository:
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list

Update the sources list and run the installation command with the unixODBC developer package:
sudo apt-get update
sudo apt-get install mssql-tools unixodbc-dev

To make sqlcmd/bcp accessible from the bash shell for login sessions, modify your PATH in the ~/.bash_profile file with the following command:

echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
To make sqlcmd/bcp accessible from the bash shell for interactive/non-login sessions, modify the PATH in the ~/.bashrc file with the following command:
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc

The following steps use sqlcmd to locally connect to your new SQL Server instance. Run sqlcmd with parameters for your SQL Server name (-S), the user name (-U), and the password (-P). In this tutorial, you are connecting locally, so the server name is localhost. The user name is SA and the password is the one you provided for the SA account during setup:

sqlcmd -S localhost -U SA -P '<YourPassword>'

If successful, you should get to a sqlcmd command prompt: 1>
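
To confirm the connection works, you can also run a one-off query straight from the shell (sqlcmd's -Q option runs a single query and exits):

sqlcmd -S localhost -U SA -P '<YourPassword>' -Q "SELECT @@VERSION"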

Step 6: Change the default data or log directory location
The filelocation.defaultdatadir and filelocation.defaultlogdir settings change the location where the new database and log files are created. By default, this location is /var/opt/mssql/data, but we want them to be on the DRBD drive. Start by running the following commands on the primary server.

Create the target directory for new database data and log files:
sudo mkdir /var/sql

Change the owner and group of the directory to the mssql user:
sudo chown mssql /var/sql
sudo chgrp mssql /var/sql

Use mssql-conf to change the default data directory with the set command:
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultdatadir /var/sql
Restart the SQL Server service:
sudo systemctl restart mssql-server

Now all the database files for new databases will be stored in this new location. Next, do the same for the log files:
sudo mkdir /var/sql/log
sudo chown mssql /var/sql/log
sudo chgrp mssql /var/sql/log
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultlogdir /var/sql/log

Restart the SQL Server service:
sudo systemctl restart mssql-server
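
To verify that new databases really land on the DRBD drive, you can create and inspect a throwaway database (optional check; TestLocation below is just a placeholder name for this test):

sqlcmd -S localhost -U SA -P '<YourPassword>' -Q "CREATE DATABASE TestLocation"
sqlcmd -S localhost -U SA -P '<YourPassword>' -Q "SELECT name, physical_name FROM sys.master_files WHERE database_id = DB_ID('TestLocation')"
sqlcmd -S localhost -U SA -P '<YourPassword>' -Q "DROP DATABASE TestLocation"

The physical_name values should point at /var/sql and /var/sql/log.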

On the second server, do the following (set both the data and log locations so its configuration matches the first server):
sudo mkdir /var/sql
sudo chown mssql /var/sql
sudo chgrp mssql /var/sql
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultdatadir /var/sql
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultlogdir /var/sql/log
DO NOT RESTART SQL YET

Step 7: Change the default master database file directory location

The filelocation.masterdatafile and filelocation.masterlogfile settings change the location where the SQL Server engine looks for the master database files. By default, this location is /var/opt/mssql/data, but again we want this to be on the shared drive.

Create the target directory for the master database files:
sudo mkdir /var/sql/masterdatabasedir

Change the owner and group of the directory to the mssql user:
sudo chown mssql /var/sql/masterdatabasedir
sudo chgrp mssql /var/sql/masterdatabasedir

Use mssql-conf to change the default master database directory for the master data and log files with the set command:
sudo /opt/mssql/bin/mssql-conf set filelocation.masterdatafile /var/sql/masterdatabasedir/master.mdf
sudo /opt/mssql/bin/mssql-conf set filelocation.masterlogfile /var/sql/masterdatabasedir/mastlog.ldf

Stop the SQL Server service:
sudo systemctl stop mssql-server

Move the master.mdf and mastlog.ldf files:
sudo mv /var/opt/mssql/data/master.mdf /var/sql/masterdatabasedir/master.mdf
sudo mv /var/opt/mssql/data/mastlog.ldf /var/sql/masterdatabasedir/mastlog.ldf

Start the SQL Server service:
sudo systemctl start mssql-server
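
If SQL Server fails to start after moving the master files, the SQL Server error log is the first place to look (this is the default log location for SQL Server on Linux):

sudo tail -n 50 /var/opt/mssql/log/errorlog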

On the second server, do the following:
sudo /opt/mssql/bin/mssql-conf set filelocation.masterdatafile /var/sql/masterdatabasedir/master.mdf
sudo /opt/mssql/bin/mssql-conf set filelocation.masterlogfile /var/sql/masterdatabasedir/mastlog.ldf
sudo systemctl stop mssql-server
DO NOT START THE SQL SERVER YET

Step 8: Install and configure the cluster

We will use Pacemaker as our cluster resource manager; as with DRBD, commercial support for it is available from Linbit. Installing Pacemaker will also install Corosync, which is used to sync the cluster details between nodes. On both nodes, first ensure that the DRBD service is not enabled:
sudo systemctl disable drbd

Also ensure that the directory is not mounted and the drbd device is not in use on either node:
sudo umount /var/sql
sudo drbdadm down r0

Then install Pacemaker on both nodes:
sudo apt-get install -y pacemaker

On the primary node, create a SQL Server login for Pacemaker and grant the login permission to run sp_server_diagnostics. Pacemaker will use this account to verify which node is running SQL Server:
sqlcmd -S localhost -U SA -P '<YourPassword>'
USE [master]
GO
CREATE LOGIN [hacluster] with PASSWORD= N'<loginPassword>'
GRANT VIEW SERVER STATE TO hacluster
GO

On the primary node, stop and disable SQL Server:
sudo systemctl stop mssql-server
sudo systemctl disable mssql-server
On both cluster nodes, create a file to store the SQL Server username and password for the Pacemaker login. Note that a plain "sudo echo ... >> file" does not work here because the redirection is performed by your non-root shell, so tee is used instead:
sudo touch /var/opt/mssql/secrets/passwd
echo 'hacluster' | sudo tee -a /var/opt/mssql/secrets/passwd
echo '<loginPassword>' | sudo tee -a /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 600 /var/opt/mssql/secrets/passwd
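
To double-check the secrets file (it should contain exactly two lines, the username and then the password, owned by root with mode 600):

sudo ls -l /var/opt/mssql/secrets/passwd
sudo cat /var/opt/mssql/secrets/passwd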

Install the Pacemaker packages on each node (on Ubuntu these come from apt, not yum):
sudo apt-get install -y pacemaker pcs fence-agents resource-agents

Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password on both nodes:
sudo passwd hacluster

Enable and start the pcsd service. This will allow nodes to rejoin the cluster after a reboot.
Run the following commands on both nodes:
sudo systemctl enable pcsd
sudo systemctl start pcsd

Install the FCI resource agent for SQL Server. Run the following command on both nodes:
sudo apt-get install mssql-server-ha

On one of the nodes, create the cluster:
sudo pcs cluster auth <hostname1> <hostname2> -u hacluster
sudo pcs cluster setup --name <clustername> <hostname1> <hostname2>
sudo pcs cluster start --all
Modify the /etc/corosync/corosync.conf file on both nodes:

totem {
    version: 2
    cluster_name: sql
    secauth: off
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: <network IP address>
        broadcast: yes
        mcastport: 5405
    }
}

nodelist {
    node {
        ring0_addr: <IPAddress1>
        name: <hostname1>
        nodeid: 1
    }
    node {
        ring0_addr: <IPAddress2>
        name: <hostname2>
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 1
    last_man_standing: 1
    auto_tie_breaker: 0
}

Disable fencing and make start failures non-fatal (these are cluster-wide properties, so they only need to be set once):
sudo pcs property set stonith-enabled=false
sudo pcs property set start-failure-is-fatal=false

Then restart corosync and start pacemaker on both nodes:
sudo systemctl restart corosync
sudo systemctl start pacemaker
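
To confirm that Corosync and Pacemaker can see both nodes (optional checks; these are standard corosync and pcs status commands):

sudo corosync-cfgtool -s
sudo pcs status nodes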

On either host, use crm status (or sudo pcs status) to see the cluster come online:
sudo crm status

Configure the cluster resources for SQL Server, DRBD, the filesystem, and the virtual IP, and push the configuration to the cluster:

sudo pcs cluster cib cfg
sudo pcs -f cfg resource create mssqlha ocf:mssql:fci op defaults timeout=60s
sudo pcs -f cfg resource create virtualip ocf:heartbeat:IPaddr2 ip=<floating IP>
sudo pcs -f cfg resource create drbd_res ocf:linbit:drbd drbd_resource=r0 op monitor interval=29s --master meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
sudo pcs -f cfg resource create fs_res ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/var/sql fstype=ext4
sudo pcs -f cfg constraint colocation add fs_res with drbd_res-master INFINITY with-rsc-role=Master
sudo pcs -f cfg constraint colocation add virtualip with mssqlha INFINITY
sudo pcs -f cfg constraint colocation add fs_res with mssqlha INFINITY
sudo pcs -f cfg constraint order promote drbd_res-master then start fs_res
sudo pcs -f cfg constraint order start fs_res then start mssqlha
sudo pcs cluster cib-push cfg
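
After the push, you can verify what was created with the same commands used earlier in this thread:

sudo pcs resource show
sudo pcs constraint list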


Step 9: Test

Verify that SQL Server is started:
sudo pcs status

Everything should be started, and the MSSQL server (mssqlha), virtual IP, DRBD master, and DRBD filesystem resources should all be running on the same node. If this is not the case, there is probably a configuration mistake somewhere. If SQL Server did not start, there is another command that can give more details:
sudo crm_mon

This command refreshes status in real time and will also show errors at the bottom if resources failed to start. Press Ctrl-C to go back to the command line.

Once everything is running as above, the next step is to make sure you can log in to the database and to test a failover. First, log in to MSSQL:
sqlcmd -S localhost -U hacluster -P '<YourPassword>'

Next, create a new database and list the existing databases:
CREATE DATABASE TestDB
SELECT Name from sys.Databases
GO

You should see the list of databases, with the new database you created at the bottom. Log out of MSSQL by typing quit to return to the command line.

Now it is time to test a failover. First monitor the cluster from the second server by running:
sudo crm_mon

Then on the first server break its connection to the cluster by stopping pacemaker:
sudo systemctl stop pacemaker

You will notice on the second server the cluster changing. The first server should show as disconnected and the second server will take over as master for DRBD. It should also take over the floating IP, DRBD file system, and MSSQL resources. If this does not occur, or if the SQL server does not start, there may be a configuration mistake.

Then on the first server, reconnect by starting pacemaker again:
sudo systemctl start pacemaker

On the second server you should see the first server become slave, but nothing else should change. Now repeat the steps above to connect to MSSQL on the second server, create a new DB, and view the databases available. You should see the TestDB database that you created on the first node, as well as the new one you just created (TestDB2, for example). Then return to the command line with the quit command.

Finally, monitor the cluster on the first server while you stop and start Pacemaker on the second. Everything should fail back over to the first server, which should become master and be in control of DRBD, the DRBD filesystem, and the floating IP, with the second server just a slave for DRBD. Again, log in to MSSQL on the first server and verify that both databases are there.
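
As an alternative to stopping Pacemaker, you can also force a failover by putting a node into standby and back (standard pcs commands):

sudo pcs cluster standby <hostname1>
sudo pcs cluster unstandby <hostname1>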
 
