I have pieced together, as best I can remember, what I did based on the resources I used. Note that this cluster is not safe for production because it can split-brain very easily. My next step will be to use the second NIC on each server to connect them directly and run the cluster traffic over that link, which is safer than relying on a switch (less likely to be unplugged, not dependent on its own power supply, etc.). In the meantime I will close the thread. Here are the steps I believe I took, in case anyone like me needs help with this:
How to Setup Microsoft SQL Server for Linux on a Pacemaker Cluster Using DRBD
Requirements:
2 servers, at least 2 logical drives on each, connected to the same switch with access to the internet
At least 2GB RAM on each
Ubuntu 16.04
Step 1: Install Ubuntu 16.04
Step 2: Setup networking
Setup each server on a static IP with a gateway and DNS server so they can reach the internet and each other. Add all servers to the /etc/hosts file on each server so they can ping each other by hostname.
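For example (a sketch only; the interface name ens160 and the 192.168.1.x addresses are placeholders, substitute your own), the static IP configuration in /etc/network/interfaces on the first server might look like:
auto ens160
iface ens160 inet static
  address 192.168.1.11
  netmask 255.255.255.0
  gateway 192.168.1.1
  dns-nameservers 192.168.1.1
And the /etc/hosts entries on both servers:
192.168.1.11 <hostname1>
192.168.1.12 <hostname2>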
Step 3: Setup DRBD
Download DRBD components on both servers:
sudo apt-get update
sudo apt-get install -y drbd8-utils
Install NTP on both servers so their clocks stay in sync:
sudo apt-get install -y ntp
Zero out the drive on both servers:
sudo dd if=/dev/zero of=/dev/sdb
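Zeroing the whole disk can take a long time on a large drive. If the goal is just to clear old partition and file system signatures so that drbdadm create-md does not refuse the disk, wiping the signatures should be enough (an alternative to the full zero, not what I originally ran):
sudo wipefs -a /dev/sdb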
On each host, edit /etc/drbd.conf:
global { usage-count no; }
common { protocol C; }
resource r0 {
  on <hostname1> {
    device /dev/drbd0;
    disk /dev/sdb;
    address <IPAddress1>:7788;
    meta-disk internal;
  }
  on <hostname2> {
    device /dev/drbd0;
    disk /dev/sdb;
    address <IPAddress2>:7788;
    meta-disk internal;
  }
}
Still working on both systems, load the kernel module with:
sudo modprobe drbd
Create the DRBD metadata for the resource:
sudo drbdadm create-md r0
Bring the resource online:
sudo drbdadm up r0
To view the status of the mirroring, use one of the following commands:
sudo drbd-overview
sudo cat /proc/drbd
We will see both nodes as Secondary and the data as Inconsistent. We can correct this by forcing the Primary role on one node.
Only do this on one node!
sudo drbdadm -- --overwrite-data-of-peer primary r0/0
To view the progress:
sudo watch cat /proc/drbd
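While the initial sync runs you will see a progress bar. Once it completes, the resource line should show a healthy state, something like this (cs is the connection state, ro the roles as seen from this node, ds the disk states):
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----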
Once complete, from the Primary node format the disk and mount it.
sudo mkfs.ext4 /dev/drbd0
sudo mkdir -p /var/sql
sudo mount /dev/drbd0 /var/sql
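As a quick sanity check, confirm the mount:
df -h /var/sql
Do not add this mount to /etc/fstab; Pacemaker will later be responsible for mounting the file system on whichever node is Primary.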
Step 4: Install and configure SQL Server on each cluster node
Import the public repository GPG keys:
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
Register the Microsoft SQL Server Ubuntu repository for SQL Server 2019:
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/16.04/mssql-server-2019.list)"
Run the following commands to install SQL Server:
sudo apt-get update
sudo apt-get install -y mssql-server
After the package installation finishes, run mssql-conf setup and follow the prompts to set the SA password and choose your edition:
sudo /opt/mssql/bin/mssql-conf setup
Once the configuration is done, verify that the service is running:
systemctl status mssql-server --no-pager
Step 5: Install the SQL Server command-line tools
To create a database, you need to connect with a tool that can run Transact-SQL statements on the SQL Server. The following steps install the SQL Server command-line tools: sqlcmd and bcp.
Import the public repository GPG keys:
curl https://packages.microsoft.com/keys/microsoft.asc | sudo apt-key add -
Register the Microsoft Ubuntu repository:
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list | sudo tee /etc/apt/sources.list.d/msprod.list
Update the sources list and run the installation command with the unix ODBC developer package:
sudo apt-get update
sudo apt-get install mssql-tools unixodbc-dev
To make sqlcmd/bcp accessible from the bash shell for login sessions, modify your PATH in the ~/.bash_profile file with the following command:
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
To make sqlcmd/bcp accessible from the bash shell for interactive/non-login sessions, modify the PATH in the ~/.bashrc file with the following command:
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
source ~/.bashrc
The following steps use sqlcmd to locally connect to your new SQL Server instance. Run sqlcmd with parameters for your SQL Server name (-S), the user name (-U), and the password (-P). In this tutorial, you are connecting locally, so the server name is localhost. The user name is SA and the password is the one you provided for the SA account during setup:
sqlcmd -S localhost -U SA -P '<YourPassword>'
If successful, you should get to a sqlcmd command prompt: 1>
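A quick query makes a good sanity check that the instance responds:
SELECT @@VERSION;
GO
Type quit to exit sqlcmd.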
Step 6: Change the default data or log directory location
The filelocation.defaultdatadir and filelocation.defaultlogdir settings change the location where the new database and log files are created. By default, this location is /var/opt/mssql/data, but we want them to be on the DRBD drive. Start by running the following commands on the primary server.
Create the target directory for new database data and log files (on the primary this already exists as the DRBD mount point, so the -p flag makes it a no-op):
sudo mkdir -p /var/sql
Change the owner and group of the directory to the mssql user:
sudo chown mssql /var/sql
sudo chgrp mssql /var/sql
Use mssql-conf to change the default data directory with the set command:
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultdatadir /var/sql
Restart the SQL Server service:
sudo systemctl restart mssql-server
Now all the database files for the new databases created will be stored in this new location. Next do the same for the log files:
sudo mkdir /var/sql/log
sudo chown mssql /var/sql/log
sudo chgrp mssql /var/sql/log
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultlogdir /var/sql/log
Restart the SQL Server service:
sudo systemctl restart mssql-server
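You can verify the new defaults from sqlcmd with a throwaway database (LocationTest is just an example name):
CREATE DATABASE LocationTest;
GO
SELECT name, physical_name FROM sys.master_files WHERE database_id = DB_ID('LocationTest');
GO
DROP DATABASE LocationTest;
GO
The physical_name values should point under /var/sql and /var/sql/log.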
On the second server, do the following:
sudo mkdir -p /var/sql
sudo chown mssql /var/sql
sudo chgrp mssql /var/sql
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultdatadir /var/sql
sudo /opt/mssql/bin/mssql-conf set filelocation.defaultlogdir /var/sql/log
DO NOT RESTART SQL YET
Step 7: Change the default master database file directory location
The filelocation.masterdatafile and filelocation.masterlogfile settings change the location where the SQL Server engine looks for the master database files. By default, this location is /var/opt/mssql/data, but again we want this to use the shared drive.
Create the target directory for the master database files:
sudo mkdir /var/sql/masterdatabasedir
Change the owner and group of the directory to the mssql user:
sudo chown mssql /var/sql/masterdatabasedir
sudo chgrp mssql /var/sql/masterdatabasedir
Use mssql-conf to change the default master database directory for the master data and log files with the set command:
sudo /opt/mssql/bin/mssql-conf set filelocation.masterdatafile /var/sql/masterdatabasedir/master.mdf
sudo /opt/mssql/bin/mssql-conf set filelocation.masterlogfile /var/sql/masterdatabasedir/mastlog.ldf
Stop the SQL Server service:
sudo systemctl stop mssql-server
Move master.mdf and mastlog.ldf:
sudo mv /var/opt/mssql/data/master.mdf /var/sql/masterdatabasedir/master.mdf
sudo mv /var/opt/mssql/data/mastlog.ldf /var/sql/masterdatabasedir/mastlog.ldf
Start the SQL Server service:
sudo systemctl start mssql-server
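A quick check that the files moved and the service came back up:
ls -l /var/sql/masterdatabasedir
systemctl status mssql-server --no-pager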
On the second server, do the following:
sudo /opt/mssql/bin/mssql-conf set filelocation.masterdatafile /var/sql/masterdatabasedir/master.mdf
sudo /opt/mssql/bin/mssql-conf set filelocation.masterlogfile /var/sql/masterdatabasedir/mastlog.ldf
sudo systemctl stop mssql-server
DO NOT START THE SQL SERVER YET
Step 8: Install and configure the cluster
We will use Pacemaker as our cluster resource manager; as with DRBD, commercial support for it is available from LINBIT. Installing Pacemaker also brings in Corosync, which handles the cluster membership and messaging that keep the Pacemaker cluster details in sync. First, ensure that the DRBD service is not enabled on either node:
sudo systemctl disable drbd
Also ensure that the directory is not mounted and the drbd device is not in use on either node:
sudo umount /var/sql
sudo drbdadm down r0
Then install Pacemaker on both nodes:
sudo apt-get install -y pacemaker
On the primary node, create a SQL Server login for Pacemaker and grant the login permission to run sp_server_diagnostics. Pacemaker will use this account to verify which node is running SQL Server:
sqlcmd -S localhost -U SA -P '<YourPassword>'
USE [master]
GO
CREATE LOGIN [hacluster] WITH PASSWORD = N'<loginPassword>'
GRANT VIEW SERVER STATE TO hacluster
GO
On the primary node, stop and disable SQL Server:
sudo systemctl stop mssql-server
sudo systemctl disable mssql-server
On both cluster nodes, create a file to store the SQL Server username and password for the Pacemaker login. The following commands create and populate this file:
sudo touch /var/opt/mssql/secrets/passwd
echo 'hacluster' | sudo tee -a /var/opt/mssql/secrets/passwd
echo '<loginPassword>' | sudo tee -a /var/opt/mssql/secrets/passwd
sudo chown root:root /var/opt/mssql/secrets/passwd
sudo chmod 600 /var/opt/mssql/secrets/passwd
Install Pacemaker packages on each node:
sudo apt-get install -y pacemaker pcs fence-agents resource-agents
Set the password for the default user that is created when installing Pacemaker and Corosync packages. Use the same password on both nodes:
sudo passwd hacluster
Enable and start the pcsd service. This will allow the nodes to rejoin the cluster after a reboot.
Run the following command on both nodes:
sudo systemctl enable pcsd
sudo systemctl start pcsd
Install the FCI resource agent for SQL Server. Run the following command on both nodes:
sudo apt-get install mssql-server-ha
On one of the nodes, create the cluster:
sudo pcs cluster auth <hostname1> <hostname2> -u hacluster
sudo pcs cluster setup --name <clustername> <hostname1> <hostname2>
sudo pcs cluster start --all
Modify the /etc/corosync/corosync.conf file on both nodes:
totem {
  version: 2
  cluster_name: sql
  secauth: off
  transport: udpu
  interface {
    ringnumber: 0
    bindnetaddr: <network IP address>
    broadcast: yes
    mcastport: 5405
  }
}
nodelist {
  node {
    ring0_addr: <IPAddress1>
    name: <hostname1>
    nodeid: 1
  }
  node {
    ring0_addr: <IPAddress2>
    name: <hostname2>
    nodeid: 2
  }
}
quorum {
  provider: corosync_votequorum
  two_node: 1
  wait_for_all: 1
  last_man_standing: 1
  auto_tie_breaker: 0
}
Disable fencing and make resource start failures non-fatal (these are cluster-wide properties, so setting them once is enough):
sudo pcs property set stonith-enabled=false
sudo pcs property set start-failure-is-fatal=false
Then restart corosync and start pacemaker on both nodes:
sudo systemctl restart corosync
sudo systemctl start pacemaker
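As an optional check, confirm that both nodes joined the Corosync membership and the cluster is quorate:
sudo corosync-quorumtool
Both node IDs should appear under the membership information.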
On either host, use the command crm status to see the cluster come online:
sudo crm status
Configure the cluster resources for SQL Server, DRBD, the file system, and the virtual IP, and push the configuration to the cluster:
sudo pcs cluster cib cfg
sudo pcs -f cfg resource create mssqlha ocf:mssql:fci op defaults timeout=60s
sudo pcs -f cfg resource create virtualip ocf:heartbeat:IPaddr2 ip=<floating IP>
sudo pcs -f cfg resource create drbd_res ocf:linbit:drbd drbd_resource=r0 op monitor interval=29s --master meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
sudo pcs -f cfg resource create fs_res ocf:heartbeat:Filesystem device=/dev/drbd0 directory=/var/sql fstype=ext4
sudo pcs -f cfg constraint colocation add fs_res with master drbd_res-master
sudo pcs -f cfg constraint colocation add virtualip with mssqlha
sudo pcs -f cfg constraint colocation add fs_res with mssqlha
sudo pcs -f cfg constraint order promote drbd_res-master then start fs_res
sudo pcs -f cfg constraint order start fs_res then start mssqlha
sudo pcs cluster cib-push cfg
Step 9: Test
Verify that SQL Server is started:
sudo pcs status
You should see that everything is started and that the MSSQL server, virtualip, DRBD master, and DRBD file system resources are all running on the same node. If this is not the case, there is probably a configuration mistake somewhere. If the SQL server did not start, there is another command that can give more details:
sudo crm_mon
This command refreshes status in real time and will also show errors at the bottom if resources failed to start. Press Ctrl-C to go back to the command line.
Once everything is running as above, the next step is to make sure you can log in to the database and to test a failover. First log in to MSSQL with the SA account (the hacluster login only has VIEW SERVER STATE, so it cannot create databases):
sqlcmd -S localhost -U SA -P '<YourPassword>'
Next, create a new Database:
CREATE DATABASE TestDB
GO
SELECT Name FROM sys.databases
GO
The new database you created should be listed at the bottom of the results. Log out of MSSQL by typing quit to return to the command line.
Now it is time to test a failover. First monitor the cluster from the second server by running:
sudo crm_mon
Then on the first server break its connection to the cluster by stopping pacemaker:
sudo systemctl stop pacemaker
On the second server you will see the cluster state change. The first server should show as disconnected, and the second server should take over as master for DRBD. It should also take over the floating IP, the DRBD file system, and the MSSQL resources. If this does not occur, or if the SQL server does not start, there may be a configuration mistake.
Then on the first server, reconnect by starting pacemaker again:
sudo systemctl start pacemaker
On the second server you should see the first server become slave, but nothing else should change. Now repeat the steps above to connect to MSSQL on the second server, create a new DB, and view the databases available. You should see the TestDB database that you created on the first node, as well as the new one you just created (TestDB2, for example). Then return to the command line with the quit command.
Finally, monitor the cluster on the first server while you stop and start pacemaker on the second. Everything should fail back over to the first server, which should become master and take control of DRBD, the DRBD file system, and the floating IP, with the second server once again just a DRBD slave. Again, log in to MSSQL on the first server and verify that both databases are there.