Remote Requires Route

Physical box cannot reach virtual machine subnet

(and vice versa)

Info/overview

For quick reference, here is a table of some of the addresses used in this documentation. These addresses come from the demonstration subnets used by this guide.

Machine and interface                Subnet                 IPv4 address       IPv6 address
physical machine external interface  external subnet        203.0.113.2/30     2002:cb00:7100::2/126
physical machine TUN/TAP interface   Outside VM NIC subnet  198.51.100.249/30  2001:db8:8::9/126
firewall first NIC                   Outside VM NIC subnet  198.51.100.250/30  2001:db8:8::a/126
firewall second NIC                  vmSvrs subnet          198.51.100.1/29    2001:db8:1::1/125
endpoint NIC                         vmSvrs subnet          198.51.100.8/29    2001:db8:1::8/125

Problem

Every address can be pinged from at least one machine, but some machines cannot ping each other.

What works

It seems that the system in the middle, which is the firewall, can ping everything.

user@ttyC0:firewall:~/$ sudo ping -c 2 -i .01 198.51.100.249
user@ttyC0:firewall:~/$ echo ${?}
0
user@ttyC0:firewall:~/$ sudo ping6 -c 2 -i .5 2001:db8:8::9
user@ttyC0:firewall:~/$ echo ${?}
0
user@ttyC0:firewall:~/$

Both of the other computers can reach the nearest NIC on the firewall. For example, from the endpoint (the machine that will become the DHCP server):

user@ttyC0:endpoint:~/$ sudo ping -c 2 -i .01 198.51.100.1
user@ttyC0:endpoint:~/$ echo ${?}
0
user@ttyC0:endpoint:~/$ sudo ping6 -c 2 -i .5 2001:db8:1::1
user@ttyC0:endpoint:~/$ echo ${?}
0
user@ttyC0:endpoint:~/$

and, from the physical machine that has a TUN/TAP interface:

user@ttyC0:physbox:~/$ sudo ping -c 2 -i .01 198.51.100.250
user@ttyC0:physbox:~/$ echo ${?}
0
user@ttyC0:physbox:~/$ sudo ping6 -c 2 -i .5 2001:db8:8::a
user@ttyC0:physbox:~/$ echo ${?}
0
user@ttyC0:physbox:~/$

Also, the endpoint system seems to work with the firewall a bit better than the physical host does. The endpoint system can communicate with the firewall's remote NIC, which is the firewall's first NIC.

user@ttyC0:endpoint:~/$ sudo ping -c 2 -i .01 198.51.100.250
user@ttyC0:endpoint:~/$ echo ${?}
0
user@ttyC0:endpoint:~/$ sudo ping6 -c 2 -i .5 2001:db8:8::a
user@ttyC0:endpoint:~/$ echo ${?}
0
user@ttyC0:endpoint:~/$

What doesn't work

The endpoint (the future DHCP server) cannot communicate with the physical machine.

user@ttyC0:endpoint:~/$ time sudo ping -c 2 -i .01 198.51.100.249
user@ttyC0:endpoint:~/$ echo ${?}
0
user@ttyC0:endpoint:~/$ sudo ping6 -c 2 -i .5 2001:db8:8::9
user@ttyC0:endpoint:~/$ echo ${?}
0
user@ttyC0:endpoint:~/$

That seems a bit odd, since the endpoint can communicate with another IP address in the same subnet (198.51.100.250, which is part of the same 198.51.100.248/30 subnet, or 2001:db8:8::a, which is part of the same 2001:db8:8::8/126 subnet).
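
The subnet arithmetic behind that statement can be checked with a little shell arithmetic. The following is a minimal sketch, not part of the original setup: network_of is a hypothetical helper name, and it handles IPv4 only.

```shell
# network_of ADDRESS PREFIX: print the IPv4 network address that ADDRESS belongs to.
# (Hypothetical helper for illustration; not a standard tool.)
network_of() {
    addr=$1
    prefix=$2
    # Split the dotted quad into four positional parameters.
    old_ifs=$IFS
    IFS=.
    set -- $addr
    IFS=$old_ifs
    # Combine the octets into one 32-bit number.
    ip=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Netmask: PREFIX one-bits followed by zero-bits.
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    net=$(( ip & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
}

network_of 198.51.100.249 30   # physical machine TUN/TAP interface: prints 198.51.100.248
network_of 198.51.100.250 30   # firewall first NIC: prints 198.51.100.248
network_of 198.51.100.1 29     # firewall second NIC: prints 198.51.100.0
```

The first two addresses land in the same network, while the third belongs to a different one.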

The physical machine cannot communicate with the endpoint. That doesn't seem too surprising, since the endpoint cannot communicate with the physical machine.

However, what may seem more surprising is that the physical machine cannot communicate with the firewall's second NIC.

user@ttyC0:physbox:~/$ time sudo ping -c 2 -i .01 198.51.100.1
user@ttyC0:physbox:~/$ echo ${?}
0
user@ttyC0:physbox:~/$ sudo ping6 -c 2 -i .5 2001:db8:1::1
user@ttyC0:physbox:~/$ echo ${?}
0
user@ttyC0:physbox:~/$

This seems odd for two reasons. First, communication from the physical box to the firewall's second NIC fails, while the seemingly symmetrical case (communication from the endpoint to the firewall's first NIC) works. Second, the firewall has no problem communicating with ANY IP address on the physical host, yet the physical host cannot communicate with every NIC on the firewall.

user@ttyC0:firewall:~/$ time sudo ping -c 2 -i .01 203.0.113.2
user@ttyC0:firewall:~/$ echo ${?}
0
user@ttyC0:firewall:~/$ sudo ping6 -c 2 -i .5 2002:cb00:7100::2
user@ttyC0:firewall:~/$ echo ${?}
0
user@ttyC0:firewall:~/$

In fact, depending on how much networking has already been set up, the firewall might even be able to ping the Internet. (Or, maybe communication works only with remote IPv6 Internet addresses, but not IPv4 Internet addresses.) So, based on these observations, it would be easy to incorrectly guess that the physical box is routing everything just fine, and then to incorrectly guess that the firewall is misrouting things.

Yet more problems

The previous section gave many examples of what did and didn't work, because there seemed to be little rhyme or reason behind the results.

Some other things clearly won't work, such as the endpoint computer communicating with the external subnet (2002:cb00:7100::/126 or 203.0.113.0/30), which is no surprise since the endpoint computer isn't able to communicate with the Outside VM NIC subnet. Likewise, it is no surprise that the physical box isn't able to communicate with the endpoint computer, or any other virtual machine, since the physical box is unable to communicate with any address on the vmSvrs subnets, including the addresses on the firewall's NIC that is part of those subnets (2001:db8:1::1/125 and 198.51.100.1/29).

Determining actual cause

Check the routing tables on the physical box.

user@ttyC0:physbox:~/$ echo ${PAGER}

(That should not be blank. If it is, then fix that. Details are discussed at: standard Unix variables for text editors.)

Then

user@ttyC0:physbox:~/$ netstat -nr | ${PAGER}

or

user@ttyC0:physbox:~/$ route -n show | ${PAGER}

The actual cause of all the problems described is that the physical box has no routing information that lets it reach the remote NICs on the firewall. The physical box does have routing information for the nearest NICs on the firewall, because those NICs are part of the same subnet, and the same “link”, as the TUN/TAP devices on the physical box.

Even without training in reading routing tables, you can rather easily see that routes exist for the subnets that have the TUN/TAP devices. (The Gateway/Destination may often refer to a network card/interface, or a “link”, or the IP address that is on the local NIC.) However, careful study will show that no route exists for the virtual machine subnet. This is what breaks the communication.
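
To make the reasoning concrete, here is a rough sketch of the decision the physical box effectively makes for on-link traffic. It assumes the only relevant routes are the two directly connected IPv4 subnets from the table at the top of this section, and it ignores any default route. The names to_int and covered_by are hypothetical helpers, not real commands.

```shell
# to_int ADDRESS: convert a dotted-quad IPv4 address to a 32-bit number.
to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# covered_by ADDRESS NETWORK/PREFIX...: print the first listed subnet that
# contains ADDRESS and succeed; fail if none of the subnets contains it.
covered_by() {
    target=$(to_int "$1")
    shift
    for cidr in "$@"; do
        net=$(to_int "${cidr%/*}")
        bits=${cidr#*/}
        mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
        if [ $(( target & mask )) -eq $(( net & mask )) ]; then
            echo "$cidr"
            return 0
        fi
    done
    return 1
}

# Subnets directly connected to the physical box (from the address table).
for dest in 198.51.100.250 198.51.100.1; do
    if match=$(covered_by "$dest" 203.0.113.0/30 198.51.100.248/30); then
        echo "$dest: on-link via $match"
    else
        echo "$dest: no connected route"
    fi
done
```

The firewall's first NIC is covered by the TUN/TAP subnet, while the firewall's second NIC is not covered by anything, which matches the ping results seen earlier.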

Fixing this

The most straightforward approach is to add a route on the physical system, manually. (Actually, to fix this more fully, this approach would involve adding two routes: one IPv6 route, and one IPv4 route.)

So that this happens whenever it is needed, the route should also be added to a script file. One good spot is the script file that the virtual machine software runs when it creates the NIC. Another spot, which may work just as well, is the /etc/hostname.tun0 file.

Another approach is NAT. This has several downsides and one noteworthy advantage. One downside is that NAT often requires a computer to do more work overall. (Some traffic might be NATted when it is unnecessary.) Also, NAT makes the original address more difficult to find. Typical NAT implementations may even lose the original address entirely as soon as a NAT state table entry is treated as inactive. As a result, log files may refer only to a translated address, which may not be reasonably easy to trace back to the original address.

The one big advantage of NAT is that the problem can be worked around on the firewall, instead of adjusting the physical box's networking configuration. In this case, that is probably not a big deal at all. However, sometimes there is a desire to implement a fix on one system instead of another, for reasons such as legality (if the systems are controlled by different organizations), simplicity (if different operating systems are used), or other reasons (like preferring to make changes on a device that is less likely to be replaced by another system that would need to be re-configured).

The way this approach works is that a NATting program, which is often the “firewall software”, converts traffic from the remote subnet into traffic that appears to come from the NATting program itself. Since the NATting program can simply communicate using the subnet that is on the TUN/TAP device, the networking configuration (which may be part of the operating system) on the physical box won't require any re-configuration.

So, with that overview provided, let's address this without using NAT. (The first step, shown, is simply verifying virtualization variable values.)

user@ttyC0:physbox:~/$ echo VMDirBas=${VMDirBas} VMGenNam=${VMGenNam} VMLILNAM=${VMLILNAM}

The desired value for the VMLILNAM variable probably will/should be the name of the firewall, not the “DHCP/IPv4 server”/“endpoint system”. If the VMLILNAM is not set to the firewall, change that (so that the upcoming steps work as intended).
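
A small guard can make an unset variable obvious before the later commands build a path from it. This is only a sketch; require_vars is a hypothetical helper, not something provided by the virtualization software.

```shell
# require_vars NAME...: succeed only if every named variable is set and non-empty.
# (Hypothetical helper; the variable names are passed as plain words.)
require_vars() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set" >&2
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]
}

# Possible usage before touching the NIC script:
#   require_vars VMDirBas VMGenNam VMLILNAM && \
#       cat ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
```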

user@ttyC0:physbox:~/$ echo VISUAL=${VISUAL}
user@ttyC0:physbox:~/$ cat ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1

Add the routes.

user@ttyC0:physbox:~/$ sudo cpytobak ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
user@ttyC0:physbox:~/$ echo route -n add -inet6 2001:db8:1::/125 2001:db8:8::a| sudo -n tee -a ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
  • That example's subnets match the subnets shown at the top of this section. Here is a quick overview:
    • The first IPv6 subnet (2001:db8:1::/125) is the “vmSvrs subnet” as found on the second NIC on the “virtual machine” which is a firewall.
    • The second IPv6 address (2001:db8:8::a) is the address of the “firewall first NIC” as found on the first NIC on the “virtual machine” which is a firewall.
user@ttyC0:physbox:~/$ echo route -n add -inet 198.51.100.0/29 198.51.100.250| sudo -n tee -a ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
  • That example's subnets match the subnets shown at the top of this section. Here is a quick overview:
    • The first IPv4 subnet (198.51.100.0/29) is the “vmSvrs subnet” as found on the second NIC on the “virtual machine” which is a firewall.
    • The second IPv4 address (198.51.100.250) is the address of the “firewall first NIC” as found on the first NIC on the “virtual machine” which is a firewall.

Add an echo statement.

user@ttyC0:physbox:~/$ echo echo Adding IPv4 route returned \${?}| sudo -n tee -a ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
  • A key reason for this is so that the script returns an exit code of zero. If left unspecified, the script may simply return the exit code of the last command. If the last command was a route command that failed to add a route, because the route already existed (and, therefore, could not be “added” as requested), then that would cause the last command to return a non-zero error code, which causes the script to return a non-zero error code, which causes the virtual machine to refuse to start. That last command prevents such problems.
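
The exit-code behavior described above can be seen without touching any routes. In this sketch, the false command stands in for a route command that fails because the route already exists, and status_of is a hypothetical helper that runs a short script and reports its exit code.

```shell
# status_of SCRIPT: run SCRIPT text in a child shell and print its exit code.
status_of() {
    status=0
    sh -c "$1" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# The script ends with the failing command, so the script itself fails (non-zero).
status_of 'false'

# The script ends with echo, so the script succeeds (prints 0),
# even though the command before it failed.
status_of 'false; echo "Adding IPv4 route returned ${?}"'
```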

Review the resulting file. It may be best if the route command is not the last command of the file, so that re-adding an existing route doesn't cause the entire virtual machine to quit (as the NIC gets created) just because the final command had a non-zero “error level”/“return code” result. Make sure that any desired commands do not appear after an exit command.

user@ttyC0:physbox:~/$ echo ${VISUAL}
user@ttyC0:physbox:~/$ sudoedit ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
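
As a rough aid for that review, the following sketch lists any commands that appear after the first exit line. It is a plain text check under simple assumptions: it does not understand conditionals, so an exit inside an if block would produce a false alarm. The name commands_after_exit is hypothetical.

```shell
# commands_after_exit FILE: print non-blank, non-comment lines that follow
# the first line whose first word is "exit".
commands_after_exit() {
    awk '
        seen && NF && $1 !~ /^#/ { print FILENAME ":" FNR ": " $0 }
        $1 == "exit" { seen = 1 }
    ' "$1"
}

# Possible usage (no output means nothing was found after an exit line):
#   commands_after_exit ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1
```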

Then, run the command manually. Here's a slick way to do that. First, make sure that the output is what was expected:

user@ttyC0:physbox:~/$ grep -i route ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1 | grep -i "inet6 "

Then, if you like what you see, press:

  • up arrow,
  • (recommended to then press the space bar, though that is technically optional),
  • then press the “right parenthesis” key,
  • then Ctrl-A,
  • and then type the rest of the text needed to end up running this:

user@ttyC0:physbox:~/$ sudo $(grep -i route ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1 | grep -i "inet6 " )
user@ttyC0:physbox:~/$ echo ${?}

Note: The quotation marks are needed to include the space as part of what is being searched for, which is more useful when working with IPv4 (without the space, searching for inet would also match the inet6 line).

Do likewise for IPv4. First, check the output:

user@ttyC0:physbox:~/$ grep -i route ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1 | grep -i "inet "

Then run it:

user@ttyC0:physbox:~/$ sudo $(grep -i route ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1 | grep -i "inet ")
user@ttyC0:physbox:~/$ echo ${?}
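
One caution with this technique: if the file ever contains more than one matching line, the command substitution joins them into a single, probably nonsensical, command. The following sketch adds a guard for that; only_one_route is a hypothetical helper name, and the second argument is the address family word (inet or inet6) used in the route lines.

```shell
# only_one_route FILE FAMILY: print the matching route line, but only when
# exactly one line matches; otherwise complain and fail.
only_one_route() {
    matches=$(grep -i route "$1" | grep -i "$2 " || true)
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$matches"
    else
        echo "expected 1 matching line, found $count" >&2
        return 1
    fi
}

# Possible usage, running the command only when the guard succeeds:
#   cmd=$(only_one_route ${VMDirBas}/execbin/${VMGenNam}/${VMLILNAM}/nicscr/upif1 inet6) && sudo $cmd
```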

Results

There may still be some lingering problems, such as the endpoint system being unable to contact the physical box. Two separate problems can cause that situation, and if both exist then both need to be handled before that specific communication works. So, partial progress has been made in resolving that issue.

Another issue, however, got completely resolved: the physical box may now communicate with every address on every NIC of the firewall. This means that the physical box is now capable of communicating with at least one address on the “virtual machine” subnet. Granted, the physical box might only be able to communicate with one address on that subnet (which is the address on the firewall), but that is now working and that is an improvement.
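
To re-check several addresses in one go, a loop like the following may help. It is a sketch only: check_reach is a hypothetical helper, and PING_CMD is an assumed override so that the same loop can be used with ping6 for IPv6 addresses.

```shell
# check_reach ADDRESS...: ping each address and report the result.
# Succeeds only if every address answered.
check_reach() {
    failures=0
    for addr in "$@"; do
        # PING_CMD is split into words on purpose, so it may carry options.
        if ${PING_CMD:-ping -c 2} "$addr" >/dev/null 2>&1; then
            echo "ok   $addr"
        else
            echo "FAIL $addr"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Possible usage from the physical box, covering both NICs of the firewall:
#   check_reach 198.51.100.250 198.51.100.1
#   PING_CMD="ping6 -c 2" check_reach 2001:db8:8::a 2001:db8:1::1
```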

If we had addressed the next issue (“forwarding”) first, there might have been little or no visible improvement, because communication between the physical box and the firewall would still have been disrupted. So, tackling this issue first was worthwhile.

The nice news is that this wraps up a complex part of the routing, where careful attention needs to be paid to details like the individual IP addresses used by hosts, the network IDs that identify an entire subnet, and subnet sizes (a.k.a. “prefix lengths”, “subnet masks”). The next piece of routing fixing will be quicker and easier.