pfSense Plus HA on OCI

James George
Oracle Developers
Jul 6, 2023

Netgate recently released pfSense Plus 23.05, which introduces unicast CARP support. In this article we will explore setting up pfSense Plus 23.05 in an OCI lab as an HA cluster using unicast CARP. This article assumes that you are familiar with OCI and with pfSense installation and configuration.

This article makes changes to the pfSense base install that are not supported by Netgate and could cause unintended side effects with the pfSense software (due to the installation of the OCI CLI software package); these instructions are intended for learning and experimentation, not for production use.

Note: Unicast CARP is not a feature available in pfSense CE as of 2.7.0.

To start, you will need two pfSense Plus installs of at least 23.05 within the same VCN and subnets in your tenancy. There are a number of other articles out there on how to achieve this (e.g. Install and Configure pfSense on Oracle Cloud Infrastructure); in my case, I used a custom image of 2.6.0 CE imported from a VMDK with VFIO (SR-IOV) enabled for vNICs and then registered with pfSense Plus keys.

My pfSense Plus instances in this example are:

  • pfSense Plus 23.05
  • Shape E4 Flex, 3 OCPU, 16 GB RAM
  • 3 vNICs
    - Primary vNIC (pfSense LAN)
    - Secondary vNIC 1 (pfSense WAN)
    - Secondary vNIC 2 (pfSense SYNC)

The pfSense instances are using a VCN with three subnets:

  • public: a public subnet with a 0.0.0.0/0 route to an Internet Gateway
  • private: a private subnet with a route table for my private connectivity and spoke VCNs
  • sync: a private subnet with an empty route table
pfSense VCN showing subnet and vNIC placement
pfSense HA in OCI

One of the key things to remember when working with VIPs on OCI is that even though a host might explicitly be configured with a VIP, OCI also needs to know about the VIP (i.e. an OCI secondary IP address must exist) and where the VIP is assigned (i.e. to which vNIC). If OCI doesn’t know about the VIP or where it should be placed, the off-box network virtualisation will prevent packets moving to a host’s vNIC where a VIP has been configured. To this end, we will use the OCI CLI to inform OCI of where VIPs should be running; CARP will move a VIP between the instances and the instance will then inform OCI that it has the VIP. For this to work we will need the following:

  • Create a dynamic group and policy to allow pfSense instances to control the VIPs via instance principals
  • Install OCI CLI on our pfSense instances
  • Ensure that pfSense can call OCI APIs
  • Create a devd configuration and a shell script to be invoked to tell OCI that VIPs have moved

Setup HA in pfSense

This is pretty much just following the official example (High Availability Configuration Example) in which you set up the SYNC interface, state synchronisation (pfsync) and configuration synchronisation (XMLRPC). In my case, my IP selections are:

WAN Addressing

LAN Addressing

SYNC Addressing

When setting up the Firewall/Virtual IPs in pfSense Plus, just remember to configure CARP in unicast mode. For example, pfSense1 WAN:

pfSense1 WAN CARP Configuration

and pfSense2 WAN:

pfSense2 WAN CARP Configuration

Note that the unicast address is the address of the other pfSense instance. Similarly, the configuration of the LAN CARP IP will target the LAN IP address of the other node.

Lastly, remember to not (inadvertently) block CARP heartbeat traffic on the WAN and LAN interfaces.

The CARP VIPs in OCI

Both the internal (LAN) and external (WAN) CARP VIPs must be created and assigned in OCI. This can be done via the OCI console, and I would recommend creating and assigning the VIPs on the pfSense instance that will be the primary node in the CARP cluster (in my case, this is pfSense1). Once we finish the setup, these VIPs will move to the secondary node if the primary node fails; however, the default state when everything is up is that the VIPs are owned by the primary node.
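If you prefer the CLI over the console, the assignment can be sketched with the same assign-private-ip call that the failover script uses later. The vNIC OCID below is a placeholder, the address is my LAN VIP, and I'm assuming you run this from somewhere with the OCI CLI already configured (e.g. OCI Cloud Shell); repeat it for the WAN VIP (172.20.0.10 in my case) against the primary node's WAN vNIC.

oci network vnic assign-private-ip \
  --vnic-id ocid1.vnic.oc1.ap-melbourne-1.primary-lan-vnic-ocid-here \
  --ip-address 172.20.0.70 \
  --unassign-if-already-assigned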

Communication between vNICs in the OCI Network

It is worth remembering that communication between the various pfSense vNICs also needs to be permitted by OCI using either Security Lists or Network Security Groups. For the sake of expediency in my build, I used Network Security Groups for each pair of interfaces. For example, for the SYNC interfaces, their vNICs have the following NSG attached:

pfsense-sync

This allows open communication between any vNIC added to the pfsense-sync Network Security Group, but nothing else. Similar arrangements can be made for the LAN and WAN interfaces with NSGs; however, in those cases you may want the rules to be much more open, e.g. allowing any source and any destination for WAN. Also remember that the default security list allows egress to any destination for any port and protocol, so be mindful of where that is used too.

Note that you need to create an NSG before you can add rules to it that reference the NSG itself.
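As a rough sketch of doing this from the CLI rather than the console (the OCIDs are placeholders and the exact parameters and JSON field names are assumptions on my part, so check oci network nsg create --help and oci network nsg rules add --help), the pfsense-sync NSG and its self-referencing rule would look something like this:

# Create the NSG first; its OCID is needed before a rule can reference it.
oci network nsg create \
  --compartment-id ocid1.compartment.oc1..xyzzy \
  --vcn-id ocid1.vcn.oc1.ap-melbourne-1.placeholder \
  --display-name pfsense-sync

# Allow all traffic between members of the NSG itself.
oci network nsg rules add \
  --nsg-id ocid1.networksecuritygroup.oc1.ap-melbourne-1.placeholder \
  --security-rules '[{"direction": "INGRESS", "protocol": "all", "isStateless": false, "source": "ocid1.networksecuritygroup.oc1.ap-melbourne-1.placeholder", "sourceType": "NETWORK_SECURITY_GROUP"}]'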

Create a Dynamic Group and Policy

If you are unfamiliar with OCI and instance principals, take a look at the documentation here.

Dynamic groups are created in the root compartment of your tenancy and you can use any name you like, for example pfsense-dynamic-group. The matching rules for your dynamic group can be by instance OCID or compartment OCID. If you are using instance OCIDs, you will need to add a rule for each of your pfSense instances and set the matching rules to “Match any”. In my case, my pfSense instances have a dedicated compartment, so I use a match rule for that compartment OCID, e.g. something like All {instance.compartment.id = 'ocid1.compartment.oc1..xyzzy'}.

Once you have a dynamic group created, you will need to add an appropriate policy to allow the instance principals to perform actions. The policy can be created in any appropriate compartment and the policy needs to allow the management of private-ips for the pfSense instances. In my case, as I have a dedicated compartment for pfSense, I created the policy in the dedicated pfSense compartment and my policy statement is something like Allow dynamic-group pfsense-dynamic-group to manage private-ips in compartment id ocid1.compartment.oc1..xyzzy.
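Putting those two pieces together, my lab ends up with the following matching rule and policy statement (the compartment OCID is the same placeholder used above):

Dynamic group matching rule (root compartment):
All {instance.compartment.id = 'ocid1.compartment.oc1..xyzzy'}

Policy statement (created in the dedicated pfSense compartment):
Allow dynamic-group pfsense-dynamic-group to manage private-ips in compartment id ocid1.compartment.oc1..xyzzy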

Tinkering with pfSense Plus

The next steps need to be applied to the pfSense install. While you can probably make all of the necessary configuration changes via the pfSense Diagnostics/Edit File facility, it will probably be quicker and easier to enable SSH login to the pfSense instances; do this under System/Advanced — check Enable Secure Shell and click Save.

Enable the FreeBSD pkg Repository

On each of the pfSense instances, log in via SSH and hit 8 to drop to a shell and follow the instructions here to enable the FreeBSD package repository. Once you have done this, you can install the OCI CLI package with pkg update and pkg install devel/oci-cli. This will pull through a number of other dependencies for OCI CLI.
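As a sketch of what that looked like for me (treat the file path and contents as assumptions that may differ between pfSense versions; the Netgate instructions linked above are the reference), the FreeBSD repository is re-enabled by overriding the repo configuration, after which the package can be installed:

# Re-enable the FreeBSD package repository (shipped disabled on pfSense).
echo 'FreeBSD: { enabled: yes }' > /usr/local/etc/pkg/repos/FreeBSD.conf

# Refresh the package catalogue and install the OCI CLI plus its dependencies.
pkg update
pkg install devel/oci-cli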

Ensure pfSense can call OCI APIs

For anything to be able to invoke an OCI API, it must be able to reach the public API endpoints. If your pfSense instances have access to the internet via either an Internet Gateway or a NAT Gateway, you should be fine; however, if you are deploying without internet access or are planning to remove it later, you will need to add a Service Gateway to your VCN and a suitable route to the route table associated with the OCI subnet through which pfSense will push this API traffic (e.g. the subnet used by the default route of pfSense).

To actually test whether your pfSense instances can call the OCI APIs using an instance principal, log into each pfSense instance via SSH, hit 8 to drop to a shell and try to list private IPs associated with a pfSense vNIC. Here is an example:

[23.05-RELEASE][root@pfSense1]/root: oci --auth instance_principal network private-ip list --vnic-id ocid1.vnic.oc1.ocid-copied-from-the-oci-console
{
  "data": [
    {
      "availability-domain": "ZVCk:AP-MELBOURNE-1-AD-1",
      "compartment-id": "ocid1.compartment.oc1..xyzzy",
      "defined-tags": {
        "Oracle-Tags": {
          "CreatedBy": "someuser",
          "CreatedOn": "2023-06-08T01:13:34.645Z"
        }
      },
      "display-name": "private",
      "freeform-tags": {},
      "hostname-label": null,
      "id": "ocid1.privateip.oc1.ap-melbourne-1.xyzzy",
      "ip-address": "172.20.0.71",
      "is-primary": true,
      "subnet-id": "ocid1.subnet.oc1.ap-melbourne-1.xyzzy",
      "time-created": "2023-06-08T01:13:37.929000+00:00",
      "vlan-id": null,
      "vnic-id": "ocid1.vnic.oc1.ap-melbourne-1.xyzzy"
    }
  ]
}

Notice the --auth instance_principal? This is utilising an instance principal for API authentication via the dynamic group and policy we created earlier.

If this is unsuccessful, you need to check and validate your dynamic group and policy set up.

Create devd and Shell Script Glue for OCI API Calls

Note: While you can get the needed information for these scripts from the OCI console, you can also use the OCI Metadata Service directly from the node if you prefer. For example curl http://169.254.169.254/opc/v1/vnics will retrieve information about the instance’s vNICs and assigned IP addresses.

For more information, see Getting Instance Metadata.
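For example, from the pfSense shell (output abridged; I'd expect fields along these lines, but verify the exact names against your own output):

[23.05-RELEASE][root@pfSense1]/root: curl -s http://169.254.169.254/opc/v1/vnics
[
  {
    "vnicId": "ocid1.vnic.oc1.ap-melbourne-1.xyzzy",
    "privateIp": "172.20.0.71",
    "virtualRouterIp": "172.20.0.65",
    ...
  },
  ...
]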

While we are in the pfSense shell on each pfSense node, we will need to create a shell script and a configuration file for devd to initiate actions when CARP events occur, namely when the VIP addresses are added to our interfaces by CARP. For this, you will need the vNIC OCIDs for your internal (LAN) and external (WAN) interfaces and the Virtual IPs that you are using.

We will create a script called carp-up and locate it in /root. Create /root/carp-up and add the following content substituting VNIC OCIDs appropriately for the particular node and your internal and external VIPs:

#!/bin/sh
NODE_INT_VNIC="ocid1.vnic.node1-internal-ocid-copied-from-the-oci-console-here"
NODE_INT_VIP="ip-address-of-internal-VIP-here" # e.g. 172.20.0.70
NODE_EXT_VNIC="ocid1.vnic.node1-external-ocid-copied-from-the-oci-console-here"
NODE_EXT_VIP="ip-address-of-external-VIP-here" # e.g. 172.20.0.10

if [ "${1}" = "ADDR_ADD" ]; then
    # Adding an address
    if [ "${2}" = "${NODE_INT_VIP}" ]; then
        # Private (LAN) VIP
        echo "Adding ${NODE_INT_VIP} as OWNER" >/tmp/ip_switch_int.log 2>&1
        /usr/local/bin/oci network vnic assign-private-ip --auth instance_principal --unassign-if-already-assigned --vnic-id ${NODE_INT_VNIC} --ip-address ${NODE_INT_VIP} >>/tmp/ip_switch_int.log 2>&1
    elif [ "${2}" = "${NODE_EXT_VIP}" ]; then
        # Public (WAN) VIP
        echo "Adding ${NODE_EXT_VIP} as OWNER" >/tmp/ip_switch_ext.log 2>&1
        /usr/local/bin/oci network vnic assign-private-ip --auth instance_principal --unassign-if-already-assigned --vnic-id ${NODE_EXT_VNIC} --ip-address ${NODE_EXT_VIP} >>/tmp/ip_switch_ext.log 2>&1
    else
        # An address we don't manage
        echo "I don't know what to do with ${2}..." >/tmp/ip_switch_ext_error.log 2>&1
    fi
fi
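Remember to make the script executable so that devd can invoke it:

chmod +x /root/carp-up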

We will create our devd configuration file in /usr/local/etc/devd as this path will be checked by devd on start up and merged with the configuration already in /etc/devd and /etc/devd.conf. If /usr/local/etc/devd doesn’t exist, create it.

The key information we need for this file is the interface names (as pfSense knows them) and the VIPs that will be assigned to those interfaces. You can get the interface names from the pfSense console or by looking at the ifconfig output on the instance. In my case, my internal (LAN) interface is on mce0 and my external (WAN) interface is on mce1.

Create a file /usr/local/etc/devd/carp.conf (though the file name isn’t important) and add the following, substituting your interface names and your internal and external VIPs:

notify 100 {
    match "system" "IFNET";
    match "type" "(ADDR_ADD|ADDR_DEL)";
    match "subsystem" "internal-interface-name-here"; # e.g. mce0
    match "address" "ip-address-of-internal-VIP-here"; # e.g. 172.20.0.70
    action "/root/carp-up $type $address"; # Our script to call
};

notify 100 {
    match "system" "IFNET";
    match "type" "(ADDR_ADD|ADDR_DEL)";
    match "subsystem" "external-interface-name-here"; # e.g. mce1
    match "address" "ip-address-of-external-VIP-here"; # e.g. 172.20.0.10
    action "/root/carp-up $type $address"; # Our script to call
};

Note: I’m matching type of ADDR_ADD or ADDR_DEL but only using ADDR_ADD in the carp-up script. If you prefer, just match ADDR_ADD as this is all that is needed in this instance. The match statement would then just be: match "type" "ADDR_ADD";

Restart devd with service devd restart. If devd fails to restart or you want to debug your configuration, stop devd (service devd stop) and run devd interactively with devd -d; devd will then output to the console and you can watch what it is doing.
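You can also exercise the glue by hand before testing a real failover, for example by invoking the script with the LAN VIP and then checking the log it writes. Bear in mind that this genuinely reassigns the VIP in OCI, so only do it on the node that should currently own the address:

/root/carp-up ADDR_ADD 172.20.0.70
cat /tmp/ip_switch_int.log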

When CARP moves the VIPs, the IFNET system will generate an ADDR_ADD event for the interface and IP address. This configuration and script will capture these events and issue a call to OCI to relocate the VIP address in OCI. The results of the call are logged to /tmp/ip_switch_int.log and /tmp/ip_switch_ext.log for the LAN and WAN IPs respectively.

We should now be ready to test failover.

Other Optional Things

Depending on your end state, if this cluster will sit at the hub of a hub-and-spoke topology in OCI, you will probably want to create a gateway for the LAN side of the firewall and add static routes, pointing at the LAN-side gateway, for the CIDRs that will exist in the spoke VCNs. The gateway address for the LAN subnet is the first usable IP in the subnet CIDR. In my case, the LAN subnet is 172.20.0.64/26, thus the gateway will be 172.20.0.65. Alternatively, as stated above, you can also use the OCI Metadata Service (curl http://169.254.169.254/opc/v1/vnics), which provides the virtualRouterIp for each of the vNICs.

If you recall the pfSense VCN diagram above, I have two spoke VCNs (172.20.1.0/24 and 172.20.2.0/24), so I have created a gateway for the LAN side of pfSense (172.20.0.65 in my case) and static routes that direct the RFC 1918 CIDRs to the LAN interface gateway; this is relatively all-encompassing but suits my lab environment. Also, if your intention is to leverage the OCI cloud resolver for DNS, you may want to create a static route for 169.254.169.254 that points to either the WAN or LAN gateway, depending on how you wish to treat this traffic. So I have the following:

pfSense Gateways
pfSense Routing

The default MTU in OCI is 9000 and the default MTU as installed for pfSense is 1500. As a result, you may wish to increase the MTU on each of the pfSense interfaces to 9000.

Testing

For testing, I used a small A1-based instance in a spoke VCN that is attached to my pfSense hub VCN. If you are unfamiliar with this routing scenario in OCI, take a look at Using a DRG to route traffic through a centralized network virtual appliance. Essentially, in my case, the pfSense hub is advertising 0.0.0.0/0 to the DRG, and all my spoke VCN DRG Attachments are using a routing table that has imported the 0.0.0.0/0 route distributed by the pfSense hub (thus all spoke traffic will be directed to the pfSense hub). The pfSense hub has a DRG Attachment with an associated VCN route table that directs all traffic to the LAN CARP VIP (172.20.0.70 in my case).

Simple ping test and stopping the primary instance:

[opc@tester ~]$ ip address show dev enp0s6
2: enp0s6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
link/ether 02:00:17:01:4a:18 brd ff:ff:ff:ff:ff:ff
inet 172.20.1.227/24 brd 172.20.1.255 scope global dynamic noprefixroute enp0s6
valid_lft 83584sec preferred_lft 83584sec
inet6 fe80::17ff:fe01:4a18/64 scope link noprefixroute
valid_lft forever preferred_lft forever
[opc@tester ~]$ ping www.oracle.com
PING e2581.dscx.akamaiedge.net (23.40.164.54) 56(84) bytes of data.
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=1 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=2 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=3 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=4 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=5 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=6 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=7 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=8 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=9 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=10 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=11 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=12 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=13 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=14 ttl=54 time=12.2 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=17 ttl=54 time=12.1 ms <-- blip where takeover has occurred
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=18 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=19 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=20 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=21 ttl=54 time=12.1 ms
64 bytes from a23-40-164-54.deploy.static.akamaitechnologies.com (23.40.164.54): icmp_seq=22 ttl=54 time=12.2 ms
^C
--- e2581.dscx.akamaiedge.net ping statistics ---
22 packets transmitted, 20 received, 9.09091% packet loss, time 21065ms
rtt min/avg/max/mdev = 12.069/12.144/12.197/0.037 ms

There is a small interruption noticed in packet movement when the primary node shuts down.

Large file transfer and stopping the primary instance:

[opc@tester ~]$ wget https://yum.oracle.com/ISOS/OracleLinux/OL9/u2/x86_64/OracleLinux-R9-U2-x86_64-dvd.iso
--2023-07-03 01:40:45--  https://yum.oracle.com/ISOS/OracleLinux/OL9/u2/x86_64/OracleLinux-R9-U2-x86_64-dvd.iso
Resolving yum.oracle.com (yum.oracle.com)... 23.77.132.200, 2600:1415:10:5a9::2a7d, 2600:1415:10:585::2a7d
Connecting to yum.oracle.com (yum.oracle.com)|23.77.132.200|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10267656192 (9.6G) [application/octet-stream]
Saving to: 'OracleLinux-R9-U2-x86_64-dvd.iso'
OracleLinux-R9-U2-x86_64-dvd.iso 79%[==============================================================> ] 7.60G 48.6MB/s eta 33s

The transfer rate briefly dropped to 0 MB/s when the primary instance was stopped, but then carried on, as firewall states are maintained between the primary and secondary. Restarting the primary node during the transfer also resulted in minimal disruption as the CARP VIPs moved back to the primary instance.

You can also check the CARP status in the pfSense UI. Entering maintenance mode will also cause the CARP VIPs to move to the secondary instance.

CARP Status on Primary
CARP Status on Secondary

Happy firewalling!
