In an active-standby Linux cluster configuration, all the critical services, including the IP address and filesystem, fail over from one node to the other node in the cluster.
This tutorial explains in detail how to create and configure a two-node Red Hat cluster using command-line utilities.
The following are the high-level steps involved in configuring a Linux cluster on Red Hat or CentOS:
- Install and start RICCI cluster service
- Create cluster on active node
- Add a node to cluster
- Add fencing to cluster
- Configure failover domain
- Add resources to cluster
- Sync cluster configuration across nodes
- Start the cluster
- Verify failover by shutting down an active node
1. Required Cluster Packages
First, make sure the following cluster packages are installed. If you don’t have these packages, install them using the yum command (see the example after the output below).
[root@rh1 ~]# rpm -qa | egrep -i "ricci|luci|cluster|ccs|cman"
modcluster-0.16.2-28.el6.x86_64
luci-0.26.0-48.el6.x86_64
ccs-0.16.2-69.el6.x86_64
ricci-0.16.2-69.el6.x86_64
cman-3.0.12.1-59.el6.x86_64
clusterlib-3.0.12.1-59.el6.x86_64
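If any of these are missing, they can typically be installed with yum, provided the High Availability add-on repository is available. This is a minimal sketch; the package names are assumed from the list above, and rgmanager is also worth installing since the resource manager is needed later for services.

[root@rh1 ~]# yum install ricci ccs cman rgmanager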
2. Start RICCI service and Assign Password
Next, start the ricci service on both nodes.
[root@rh1 ~]# service ricci start
Starting oddjobd:                                  [  OK  ]
generating SSL certificates...  done
Generating NSS database...  done
Starting ricci:                                    [  OK  ]
You also need to assign a password for the ricci user on both nodes.
[root@rh1 ~]# passwd ricci
Changing password for user ricci.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Also, if you are running the iptables firewall, keep in mind that you need appropriate rules on both nodes so that they can talk to each other (see the sketch below).
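For reference, the ports used by the RHEL 6 cluster stack are roughly the following: 5404 and 5405 UDP for cman/corosync, 11111 TCP for ricci, 21064 TCP for dlm, and 16851 TCP for modclusterd. A minimal sketch of the corresponding rules (adjust to your own firewall policy):

[root@rh1 ~]# iptables -I INPUT -p udp --dport 5404 -j ACCEPT
[root@rh1 ~]# iptables -I INPUT -p udp --dport 5405 -j ACCEPT
[root@rh1 ~]# iptables -I INPUT -p tcp --dport 11111 -j ACCEPT
[root@rh1 ~]# iptables -I INPUT -p tcp --dport 21064 -j ACCEPT
[root@rh1 ~]# iptables -I INPUT -p tcp --dport 16851 -j ACCEPT
[root@rh1 ~]# service iptables save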
3. Create Cluster on Active Node
From the active node, run the following command to create a new cluster.
This will create the cluster configuration file /etc/cluster/cluster.conf. If the file already exists, it will be replaced with the newly created cluster.conf.
[root@rh1 ~]# ccs -h rh1.mydomain.net --createcluster mycluster
rh1.mydomain.net password:

[root@rh1 ~]# ls -l /etc/cluster/cluster.conf
-rw-r-----. 1 root root 188 Sep 26 17:40 /etc/cluster/cluster.conf
Also keep in mind that we are running these commands from only one node of the cluster; we are not yet propagating the changes to the other node.
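Note that every ccs change increments the config_version attribute in cluster.conf. As an optional sanity check at any point, you can print the version the node currently has:

[root@rh1 ~]# ccs -h rh1 --getversion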
4. Initial Plain cluster.conf File
After creating the cluster, the cluster.conf file will look like the following:
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="1" name="mycluster">
  <fence_daemon/>
  <clusternodes/>
  <cman/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
5. Add a Node to the Cluster
Once the cluster is created, we need to add the participating nodes to the cluster using the ccs command as shown below.
First, add the first node rh1 to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh1.mydomain.net
Node rh1.mydomain.net added.
Next, add the second node rh2 to the cluster as shown below.
[root@rh1 ~]# ccs -h rh1.mydomain.net --addnode rh2.mydomain.net
Node rh2.mydomain.net added.
Once the nodes are added, you can use the following command to view all the available nodes in the cluster. This also displays the node ID of each node.
[root@rh1 ~]# ccs -h rh1 --lsnodes
rh1.mydomain.net: nodeid=1
rh2.mydomain.net: nodeid=2
6. cluster.conf File After Adding Nodes
The above also adds the nodes to the cluster.conf file, as shown below.
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="3" name="mycluster">
  <fence_daemon/>
  <clusternodes>
    <clusternode name="rh1.mydomain.net" nodeid="1"/>
    <clusternode name="rh2.mydomain.net" nodeid="2"/>
  </clusternodes>
  <cman/>
  <fencedevices/>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
7. Add Fencing to Cluster
Fencing is the disconnection of a node from shared storage. Fencing cuts off I/O from shared storage, thus ensuring data integrity.
A fence device is a hardware device that can be used to cut a node off from shared storage.
This can be accomplished in a variety of ways: powering off the node via a remote power switch, disabling a Fibre Channel switch port, or revoking a host’s SCSI-3 reservations.
A fence agent is a software program that connects to a fence device in order to ask the fence device to cut off access to a node’s shared storage (via powering off the node or removing access to the shared storage by other means).
Execute the following commands to set the fence daemon properties.
[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_fail_delay=0
[root@rh1 ~]# ccs -h rh1 --setfencedaemon post_join_delay=25
Next, add a fence device. There are different types of fencing devices available. If you are using virtual machines to build the cluster, use the fence_virt agent as shown below.
[root@rh1 ~]# ccs -h rh1 --addfencedev myfence agent=fence_virt
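If you are building the cluster on physical hardware, a different fence agent would be used instead (for example, an IPMI or power-switch agent). As an optional check, ccs can list the fence agents available on the system and the parameters a particular agent accepts:

[root@rh1 ~]# ccs -h rh1 --lsfenceopts
[root@rh1 ~]# ccs -h rh1 --lsfenceopts fence_virt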
Next, add a fencing method. After creating the fencing device, you need to create a fencing method and add the hosts to it.
[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh1.mydomain.net
Method mthd1 added to rh1.mydomain.net.

[root@rh1 ~]# ccs -h rh1 --addmethod mthd1 rh2.mydomain.net
Method mthd1 added to rh2.mydomain.net.
Finally, associate the fence device with the method created above, as shown below:
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh1.mydomain.net mthd1
[root@rh1 ~]# ccs -h rh1 --addfenceinst myfence rh2.mydomain.net mthd1
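At this point you can optionally verify the fencing configuration; --lsfencedev lists the configured fence devices and --lsfenceinst shows the per-node methods and fence instances (output omitted here):

[root@rh1 ~]# ccs -h rh1 --lsfencedev
[root@rh1 ~]# ccs -h rh1 --lsfenceinst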
8. cluster.conf File after Fencing
Your cluster.conf will look like the following after the fencing device and methods are added.
[root@rh1 ~]# cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="10" name="mycluster">
  <fence_daemon post_join_delay="25"/>
  <clusternodes>
    <clusternode name="rh1.mydomain.net" nodeid="1">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="rh2.mydomain.net" nodeid="2">
      <fence>
        <method name="mthd1">
          <device name="myfence"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman/>
  <fencedevices>
    <fencedevice agent="fence_virt" name="myfence"/>
  </fencedevices>
  <rm>
    <failoverdomains/>
    <resources/>
  </rm>
</cluster>
9. Types of Failover Domain
A failover domain is an ordered subset of cluster members to which a resource group or service may be bound.
The following are the different types of failover domains (a combined example follows this list):
- Restricted failover domain: Resource groups or services bound to the domain may only run on cluster members that are also members of the failover domain. If no members of the failover domain are available, the resource group or service is placed in the stopped state.
- Unrestricted failover domain: Resource groups bound to this domain may run on all cluster members, but will run on a member of the domain whenever one is available. This means that if a resource group is running outside the domain and a member of the domain comes online, the resource group or service will migrate to that cluster member.
- Ordered domain: Nodes in the ordered domain are assigned a priority level from 1 to 100, with 1 being the highest and 100 the lowest. The online node with the highest priority runs the resource group; for example, if the resource group is running on node 2 and node 1 comes online, it migrates to node 1.
- Unordered domain: Members of the domain have no order of preference; any member may run the resource group. The resource group will always migrate to members of its failover domain whenever possible.
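These properties map onto the flags of the --addfailoverdomain option. For example, a restricted, ordered domain with failback disabled could be created roughly like this (a sketch only; "exampledomain" is a hypothetical name not used elsewhere in this setup):

[root@rh1 ~]# ccs -h rh1 --addfailoverdomain exampledomain restricted ordered nofailback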
10. Add a Failover Domain
To add a failover domain, execute the following command. In this example, I created an ordered domain named “webserverdomain”.
[root@rh1 ~]# ccs -h rh1 --addfailoverdomain webserverdomain ordered
Once the failover domain is created, add both nodes to it as shown below. Note that the priority is given as the last argument, not as priority=N:
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net 1
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net 2
You can view all the nodes in the failover domain using the following command.
[root@rh1 ~]# ccs -h rh1 --lsfailoverdomain
webserverdomain: restricted=0, ordered=1, nofailback=0
  rh1.mydomain.net: 1
  rh2.mydomain.net: 2
11. Add Resources to Cluster
Now it is time to add resources. These are the services that should fail over along with the IP address and filesystem when a node fails. For example, the Apache web server can be part of the failover in the Red Hat Linux cluster.
When you are ready to add resources, there are two ways to do this: add them as global resources, or add a resource directly to a resource group or service.
The advantage of a global resource is that if you want to use it in more than one service group, you can simply reference the global resource from each service or resource group.
In this example, we add the filesystem on shared storage as a global resource and reference it from the service.
[root@rh1 ~]# ccs -h rh1 --addresource fs name=web_fs device=/dev/cluster_vg/vol01 mountpoint=/var/www fstype=ext4
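The device path above assumes that a logical volume already exists on the shared storage. A minimal sketch of preparing one, assuming the shared LUN appears as /dev/sdb (your device name will differ):

[root@rh1 ~]# pvcreate /dev/sdb
[root@rh1 ~]# vgcreate cluster_vg /dev/sdb
[root@rh1 ~]# lvcreate -L 10G -n vol01 cluster_vg
[root@rh1 ~]# mkfs.ext4 /dev/cluster_vg/vol01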
To add a service to the cluster, create a service and add the resource to the service.
[root@rh1 ~]# ccs -h rh1 --addservice webservice1 domain=webserverdomain recovery=relocate autostart=1
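As an optional check, the service, its failover domain settings, and the global resources defined earlier can be listed with:

[root@rh1 ~]# ccs -h rh1 --lsservices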
Now add the following lines to cluster.conf to add the resource references to the service. In this example, we also add a failover IP address to the service.
<fs ref="web_fs"/>
<ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
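For reference, once these references are in place, the <rm> section of cluster.conf should look roughly like the following (values taken from the examples above; exact attributes and their order may differ slightly):

<rm>
  <failoverdomains>
    <failoverdomain name="webserverdomain" nofailback="0" ordered="1" restricted="0">
      <failoverdomainnode name="rh1.mydomain.net" priority="1"/>
      <failoverdomainnode name="rh2.mydomain.net" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <resources>
    <fs device="/dev/cluster_vg/vol01" fstype="ext4" mountpoint="/var/www" name="web_fs"/>
  </resources>
  <service autostart="1" domain="webserverdomain" name="webservice1" recovery="relocate">
    <fs ref="web_fs"/>
    <ip address="192.168.1.12" monitor_link="yes" sleeptime="10"/>
  </service>
</rm>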
In the 2nd part of this tutorial (tomorrow), we’ll explain how to sync the configurations across multiple nodes in a cluster, and how to verify the failover scenario in a cluster setup.
Comments on this entry are closed.
Hi Geek Stuff,
I have encountered a fencing loop in RHCS. Are you aware of this?
The documentation below is from access.redhat.com:
“A fencing loop can occur on a 2-node cluster when the cluster interconnect experiences issues that prevent the nodes from communicating, and one of the nodes starts the cman service (RHEL 5 and 6) or the pacemaker.service systemd unit (RHEL 7).
When the network is lost, both cluster nodes will notice the other is missing and try to fence. If both can reach their fencing devices via the public network, one will win and fence the other node off.
When the fenced node reboots, it will wait for the existing node to rejoin its cluster. After the fenced node waits a period of time, it will decide that the existing node is in an unknown state (because the network is still down) and try to fence it, which will succeed.
The original node then reboots and fences the other node, and this continues until manual intervention occurs.”
Hello Ramesh N,
Really fascinated by your wonderful website. It's been 3 years that I have been following your site. I really want to thank you for the wonderful work you are doing by sharing knowledge.
I would appreciate it if you could tell me a way to do this.
I am building an application cluster with 8 Linux servers. Each server will run some services, and I would like to control all these services from one master node. How can I handle the service stop/start from a central node?
Thanks,
Raghuram
Could you tell me which package list has to be installed during the RHEL 6 installation, so we don't need yum?
priority= needs to be removed:
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net priority=1
Priority must be an integer between 1 and 100
>>>>>
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net 1
[root@rh1 ~]# ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net 2
Hi,
Thanks a lot
very nice article…
At step 10,
ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net priority=1
I was getting an error message:
“Priority must be integer between 1 and 100” (didn’t matter what number I tried)
I found the command in the Red Hat documentation and it should be:
ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net 1
ccs -h rh1 --addfailoverdomainnode webserverdomain rh2.mydomain.net 2
There’s a small error in the node add syntax. It should be:
ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net priority 1
(no “=”)
Good article
The correct command would be, at my guess:
ccs -h rh1 --addfailoverdomainnode webserverdomain rh1.mydomain.net 1
Hi
Can you please explain how to add shared storage? Can we use NFS?
This is for RHEL 6. Please write an article on a two-node HA cluster with RHEL 7 (please also discuss quorum and split-brain).
Why didn't you mention iptables?
Quote:
To allow Red Hat Cluster nodes to communicate with each other, you must enable the IP ports assigned to certain Red Hat Cluster components
Hi Karthik,
It's a very nice article.
All the steps are fine, but at the step below:
[root@rh1 ~]# ccs -h rh1 --addresource fs name=web_fs device=/dev/cluster_vg/vol01 mountpoint=/var/www fstype=ext4
I am getting "Validation failure: unable to modify configuration file".
Can you please explain the device part, because I am using two virtual machines?
Please suggest.
Hi, the first part of your Linux cluster configuration is very useful for me. Please share the link for the second part.
[root@CLUSTER-1 yum.repos.d]# ccs -h CLUSTER-1 --addfailoverdomainnode sandomain CLUSTER-1 priority=1
Priority must be an integer between 1 and 100
[root@CLUSTER-1 yum.repos.d]#
Hi. Can you help me please? I need to build a firewall cluster on Red Hat 5. I only need to copy iptables and the main route table. The problem is that I have many interfaces, bond0.50 through bond0.100, and the solution needs to support VLANs. I see that keepalived doesn't support VLAN interfaces, so I thought of using conntrackd. Can you help me or make a suggestion? I only need a VIP on fw1 and fw2, with identical iptables and route tables. Please help me. Thanks a lot. Vinicius – Brazil.
Hi,
In my case, the solution to the "Validation failure" error when adding a resource was to install rgmanager and start the service.
Sir, you have mentioned that "In the 2nd part of this tutorial (tomorrow), we'll explain how to sync the configurations across multiple nodes in a cluster, and how to verify the failover scenario in a cluster setup."
Where is that link? I am not able to find it.
Hi KARTHIKEYAN,
Nice tutorial for setting up a cluster. I need some help with setting up a cluster for PostgreSQL. I am using VMs for the process, and all the cluster setup is done except for the fencing part.
I have 2 nodes (RHEL 6.5) in the cluster and shared storage (an NFS shared drive).
I am having trouble with the following scenario.
Services are running on Node 1. If I disconnect the network service on Node 1, the services do not shift to Node 2. Can you please help me with this? I mean, is this possible, and if yes, then how?
1) I am using CentOS 6 for this 2-node cluster, using Openfiler.
2) I am trying this configuration on virtual machines.
3) The problem is that the quorum disk is discovered as sdb on Node A and sdc on Node B, and the same is the case with the shared disk: sdc on Node A and sdb on Node B. I am not able to understand why this is happening. Am I doing something wrong? Please help.
Hi,
I am new to Linux. As per this document, I tried to configure a cluster environment locally, but when I try to run the ccs command it says the command is not found. As I checked, this package is not available in the package list. Can you please let me know how to get the package? Please help out and do the needful.
Thank you,
Raj.
Hi Ram
I am building a Red Hat cluster with GFS2 for shared filesystems, and it will be a 6-node cluster.
Could you please provide the prerequisites and step-by-step instructions?
Hi Ramesh,
I went through your material; it was awesome. I really appreciate your patience and the good, clear information you are sharing. This material is very helpful to me.
In that same scheme, how would I add a clustered CUPS service?