Thursday, October 8, 2015

Configure NAS4Free with HA

NAS4Free highly available iSCSI failover storage for a VMware server.

This post shows how to install and set up a NAS4Free server as iSCSI storage for your VMware ESXi/ESX host.
NAS4Free is based on FreeBSD and ships with all the services needed to turn the system into a highly available storage server (HAST and CARP).
Of course you can also use this solution as general highly available storage on your network, or as a Windows CIFS/Samba server, if you adjust the services on NAS4Free.
I'll stick to the iSCSI setup first; later we will show you how to set up NFS and Windows (Samba) shares.

The setup used here:
Node1 primary IP address for serving iSCSI and CARP services: 192.168.101.165
Node1 secondary IP address for HAST synchronisation: 172.16.100.1
Node2 primary IP address for serving iSCSI and CARP services: 192.168.101.166
Node2 secondary IP address for HAST synchronisation: 172.16.100.2
Virtual IP address(CARP address) for iSCSI service: 192.168.101.167
Node1 host name: has1
Node2 host name: has2
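On the VMware side, point the ESXi software iSCSI initiator at the CARP virtual IP, not at either node address, so the datastore follows whichever node is currently master. A rough sketch of the esxcli commands (the adapter name vmhba33 is only an example; use your own software iSCSI adapter):

esxcli iscsi adapter discovery sendtarget add --adapter=vmhba33 --address=192.168.101.167:3260
esxcli storage core adapter rescan --adapter=vmhba33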
Install both nodes with the latest NAS4Free edition.
– Change the node names according to your setup, in this example has1 and has2.

hostname

– Add the node names to the hosts file on both nodes.
hosts
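For reference, the entries could look something like this; one common choice is to map the node names to the dedicated HAST sync addresses so the replication traffic stays on the dedicated link (adjust if you prefer the primary addresses):

172.16.100.1    has1
172.16.100.2    has2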
– Set up the CARP services under Network/Interface Management:

carp1
– Advertisement skew on has1 node: 0
– Advertisement skew on has2 node: 10
If the has1 node dies, the has2 node will take over all the services.

carp2
You must use the same link up and link down actions on both nodes, otherwise the switchover won't work properly!
So everything should be the same except the advertisement skew value.
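Under the hood NAS4Free drives FreeBSD's CARP; on releases that still use the carp(4) pseudo-interface, what the GUI sets up is roughly equivalent to the following (vhid, password and netmask are placeholder values):

On has1 (advertisement skew 0):
ifconfig carp0 create
ifconfig carp0 vhid 1 advskew 0 pass secret 192.168.101.167/24

On has2 (advertisement skew 10):
ifconfig carp0 create
ifconfig carp0 vhid 1 advskew 10 pass secret 192.168.101.167/24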
Next step: set up the HAST services:

hast1
hast3

As you can see here, the second network interface card is used for the HAST synchronisation, not the main interface.
After you set up the HAST service, reboot both nodes; for some reason Apply alone won't start the services.
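The GUI writes /etc/hast.conf for you; for reference it ends up looking roughly like this (the local device name ada1 is only an example, use the disk you dedicated to HAST):

resource disk1 {
        on has1 {
                local /dev/ada1
                remote 172.16.100.2
        }
        on has2 {
                local /dev/ada1
                remote 172.16.100.1
        }
}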
– Switch on the SSH service and SSH into both nodes.
On Master issue these commands:
hastctl role init disk1
hastctl create disk1
hastctl role primary disk1
On Slave issue these commands:
hastctl role init disk1
hastctl create disk1
hastctl role secondary disk1
Check both nodes with: hastctl status
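The secondary node will then start pulling in the data; the dirty counter reported by hastctl shrinks as the resource synchronises. A simple loop is handy for watching the progress:

while true; do hastctl status disk1; sleep 5; done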
Then configure ZFS:
On Master:
Add disks (Disks->Management)
disk1: N/A (HAST device)
Advanced Power Management: Level 254
Acoustic level: Maximum performance
S.M.A.R.T.: Checked
Preformatted file system: ZFS storage pool device
Format as zfs (Disks->Format)
Add ZFS Virtual Disks (Disks->ZFS->Pools->Virtual Device)
Add Pools(Disks->ZFS->Pools->Management)
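For reference, the pool built in the GUI above is roughly equivalent to creating it on the command line from the HAST device (the pool and device names below just match this example setup):

zpool create disk1 /dev/hast/disk1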
Add a PostInit script on both nodes under the /system/advanced/command scripts/ tab:
/usr/local/sbin/carp-hast-switch slave
Shut down the master and, on the slave, import the pool through the GUI (tab: /ZFS/Configuration/Detected).
Then synchronise the pool on the slave!
When finished on the slave, start the master and switch the VIP back to the master.
zpool status disk1
hastctl status

Troubleshooting commands from the SSH terminal:
zpool status
########

nast1: ~ # zpool status mvda0
  pool: mvda0
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool clear'.
   see: http://illumos.org/msg/ZFS-8000-HC
  scan: none requested
config:
        NAME                   STATE     READ WRITE CKSUM
        mvda0                  UNAVAIL      0     0     0
          2144332937472371213  REMOVED      0     0     0  was /dev/hast/hast
#########
If the status is UNAVAIL then you could try:

zpool clear "pool name"

It will scan and scrub the local disks.
#########
nast1: ~ # zpool status mvda0
  pool: mvda0
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Jun  2 15:26:25 2014
        1.19G scanned out of 1.43G at 28.3M/s, 0h0m to go
        0 repaired, 82.75% done
config:
        NAME         STATE     READ WRITE CKSUM
        mvda0        ONLINE       0     0     0
          hast/hast  ONLINE       0     0     0
#########
Then check pool again:
zpool status

#########
nast1: ~ # zpool status
  pool: mvda0
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Mon Jun  2 15:27:17 2014
config:
        NAME         STATE     READ WRITE CKSUM
        mvda0        ONLINE       0     0     0
          hast/hast  ONLINE       0     0     0

#########
Recreating sync on the disks, or recovering from a split brain:
On Master issue these commands:
hastctl role init disk1
hastctl create disk1
hastctl role primary disk1
On Slave issue these commands:
hastctl role init disk1
hastctl create disk1
hastctl role secondary disk1
If you lose sync because of a disk or network error, you can recreate the sync between the HAST disk(s).
Just recreate the roles (use the commands above) and the nodes will start syncing the data. Be careful with the roles and the nodes; don't mix them up!
If you recreate the roles and the disks, you won't lose any data at all. It will only start syncing the disk(s), it won't overwrite data.
If it is a split-brain scenario, you should decide which node has the newer data and issue the above commands accordingly. For example, if the secondary node has newer data than the primary, you should obviously set role primary on the secondary node and role secondary on the primary node, and vice versa.
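So, concretely, if has2 turned out to have the newer data after a split brain, the recovery would look roughly like this (double-check which node really holds the good copy before running anything):

On has2 (the node with the newer data):
hastctl role init disk1
hastctl create disk1
hastctl role primary disk1

On has1 (the node whose copy will be re-synced from has2):
hastctl role init disk1
hastctl create disk1
hastctl role secondary disk1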
