In the first part, we explained in detail how to install and configure a 2-node RedHat Cluster.
We covered the following high-level steps in the previous tutorial:
- Install and start RICCI cluster service
- Create cluster on active node
- Add a node to cluster
- Add fencing to cluster
- Configure failover domain
- Add resources to cluster
In this tutorial, we’ll cover the following high-level steps to finish the cluster setup:
- Sync cluster configuration across nodes
- Start the cluster
- Verify failover by shutting down an active node
1. Sync Configurations across Nodes
Anytime a configuration change is made, or the first time you install and configure the cluster, you should sync the configuration from the active node to all the nodes.
The following command will sync the cluster configurations to all the available nodes:
[root@rh1 ~]# ccs -h rh1 --sync --activate
rh2 password:
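As a side note, ccs bumps the config_version attribute in cluster.conf on every change. If you want to double-check that both nodes really picked up the same configuration after a sync, you can compare the version numbers with the --getversion option; both nodes should report the same number (28 in our setup, matching the final cluster.conf shown later in this article):

[root@rh1 ~]# ccs -h rh1 --getversion
28

[root@rh1 ~]# ccs -h rh2 --getversion
28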
2. Verify Cluster Configuration
Next, verify that the configuration is valid as shown below.
[root@rh1 ~]# ccs -h rh1 --checkconf
All nodes in sync.
If there are any configuration issues, or if the configuration on the active node does not match the configuration on the other nodes in the cluster, the above command will list them accordingly.
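Keep in mind that if you ever edit /etc/cluster/cluster.conf by hand instead of going through ccs, you need to increment the config_version attribute yourself before distributing the file. On RHEL 6 you can then ask cman to propagate the updated configuration to the rest of the cluster, as an alternative to ccs --sync:

[root@rh1 ~]# cman_tool version -r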
3. Start the Cluster
To start the cluster on Node1, do the following:
[root@rh1 ~]# ccs -h rh1 --start
To start the cluster on both nodes, do the following:
[root@rh1 ~]# ccs -h rh1 --startall
To stop the cluster on Node1, do the following:
[root@rh1 ~]# ccs -h rh1 --stop
To stop the cluster on both nodes, do the following:
[root@rh1 ~]# ccs -h rh1 --stopall
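Under the hood, these ccs options control the standard cluster init scripts (cman and rgmanager). You can check the daemons individually on each node, and enable them at boot, using the usual RHEL 6 service and chkconfig commands (a minimal sketch):

[root@rh1 ~]# service cman status
[root@rh1 ~]# service rgmanager status
[root@rh1 ~]# chkconfig cman on
[root@rh1 ~]# chkconfig rgmanager on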
4. View Cluster Status
When everything is up and running in your RedHat or CentOS Linux cluster, you can view the cluster status as shown below:
[root@rh1 cluster]# clustat
Cluster Status for mycluster @ Sat Mar 15 02:05:59 2015
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 rh1                          1    Online, Local, rgmanager
 rh2                          2    Online

 Service Name                 Owner (Last)       State
 ------- ----                 ----- ------       -----
 service:webservice1          rh1                started
As you can see in the above output, there are two nodes in our cluster, both nodes are online, and rh1 is the active node.
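While testing, it is handy to let clustat refresh the status on its own instead of re-running it; the -i option takes a refresh interval in seconds:

[root@rh1 ~]# clustat -i 2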
5. Verify Cluster Failover
To verify cluster failover, stop the cluster on the active node, or shut down the active node. This should force the cluster to automatically fail over the IP resource and the filesystem resource to the next available node defined in the failover domain.
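As an alternative to shutting a node down, you can also relocate the service gracefully with clusvcadm, where -r names the service to relocate and -m the member to move it to (using the service and node names from our setup):

[root@rh1 ~]# clusvcadm -r webservice1 -m rh2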
This is what we currently see on node1.
[root@rh1 ~]# clustat
Cluster Status for mycluster @ Sat Mar 15 14:16:00 2015
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 rh1                          1    Online, Local, rgmanager
 rh2                          2    Online, rgmanager

 Service Name                 Owner (Last)       State
 ------- ----                 ----- ------       -----
 service:webservice1          rh1                started

[root@rh1 ~]# hostname
rh1.mydomain.net

[root@rh1 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:e6:6d:b7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 scope global secondary eth0
    inet6 fe80::a00:27ff:fee6:6db7/64 scope link
       valid_lft forever preferred_lft forever

[root@rh1 ~]# df -h /var/www
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/cluster_vg-vol01  993M   18M  925M   2% /var/www
6. Force Cluster Failover
Now bring down node1. All the services and resources should fail over to the second node, and you will see output like the following.
[root@rh1 ~]# shutdown -h now
After node1 is down, the following is what you’ll see on node2.
[root@rh2 ~]# clustat
Cluster Status for mycluster @ Sat Mar 18 14:41:23 2015
Member Status: Quorate

 Member Name                  ID   Status
 ------ ----                  ---- ------
 rh1                          1    Offline
 rh2                          2    Online, Local, rgmanager

 Service Name                 Owner (Last)       State
 ------- ----                 ----- ------       -----
 service:webservice1          rh2                started
The above output indicates that there are two nodes in the cluster (rh1 and rh2). rh1 is down, and currently rh2 is the active node.
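If a failover ever does not behave as expected, rgmanager logs its recovery actions through syslog, so /var/log/messages on the surviving node is the first place to look:

[root@rh2 ~]# grep rgmanager /var/log/messages | tail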
Also, as you can see below, on rh2 the filesystem and the IP address failed over from rh1 without any issues.
[root@rh2 ~]# df -h /var/www
Filesystem                    Size  Used Avail Use% Mounted on
/dev/mapper/cluster_vg-vol01  993M   18M  925M   2% /var/www

[root@rh2 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 08:00:27:e6:6d:b7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.11/24 brd 192.168.1.255 scope global eth0
    inet 192.168.1.12/24 scope global secondary eth0
    inet6 fe80::a00:27ff:fee6:6db7/64 scope link tentative dadfailed
       valid_lft forever preferred_lft forever
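At this point the floating IP address 192.168.1.12 is answered by rh2, so clients should not notice the switch. Assuming a web server such as Apache is actually serving content from /var/www (configuring the web server itself is outside the scope of this tutorial), you can confirm end-to-end from any machine on the network:

$ ping -c 2 192.168.1.12
$ curl -I http://192.168.1.12/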
7. Full Working cluster.conf Example File
The following is the final working cluster.conf configuration file for a 2-node RedHat cluster.
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="28" name="mycluster">
  <fence_daemon post_join_delay="25"/>
  <clusternodes>
    <clusternode name="rh1" nodeid="1">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rh2" nodeid="2">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_virt" name="myfence"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="webserverdomain" nofailback="0" ordered="1" restricted="0">
        <failoverdomainnode name="rh1"/>
        <failoverdomainnode name="rh2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs device="/dev/cluster_vg/vol01" fstype="ext4" mountpoint="/var/www" name="web_fs"/>
    </resources>
    <service autostart="1" domain="webserverdomain" name="webservice1" recovery="relocate">
      <fs ref="web_fs"/>
      <ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
    </service>
  </rm>
</cluster>
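If you later edit this file, RHEL 6 ships a validator that checks cluster.conf against the cluster schema before you sync it out:

[root@rh1 ~]# ccs_config_validate
Configuration validates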