Troubleshooting node network configuration

If the node network configuration encounters an issue, the policy is automatically rolled back and the enactments report failure. This includes issues such as:

  • The configuration fails to be applied on the host.

  • The host loses connection to the default gateway.

  • The host loses connection to the API server.

Troubleshooting an incorrect node network configuration policy configuration

You can apply changes to the node network configuration across your entire cluster by applying a node network configuration policy. If you apply an incorrect configuration, you can use the following example to troubleshoot and correct the failed node network configuration policy.

In this example, a Linux bridge policy is applied to an example cluster that has three control plane (master) nodes and three compute (worker) nodes. The policy fails to be applied because it references an incorrect interface. To find the error, investigate the available NMState resources. You can then update the policy with the correct configuration.

Procedure

  1. Create a policy and apply it to your cluster. The following example creates a simple bridge on the ens01 interface:

    apiVersion: nmstate.io/v1
    kind: NodeNetworkConfigurationPolicy
    metadata:
      name: ens01-bridge-testfail
    spec:
      desiredState:
        interfaces:
          - name: br1
            description: Linux bridge with the wrong port
            type: linux-bridge
            state: up
            ipv4:
              dhcp: true
              enabled: true
            bridge:
              options:
                stp:
                  enabled: false
              port:
                - name: ens01

    $ oc apply -f ens01-bridge-testfail.yaml

    Example output

    nodenetworkconfigurationpolicy.nmstate.io/ens01-bridge-testfail created

  2. Verify the status of the policy by running the following command:

    $ oc get nncp

    The output shows that the policy failed:

    Example output

    NAME                    STATUS
    ens01-bridge-testfail   FailedToConfigure

    However, the policy status alone does not indicate whether the policy failed on all nodes or only on a subset of nodes.

  3. List the node network configuration enactments to see if the policy was successful on any of the nodes. If the policy failed for only a subset of nodes, it suggests that the problem is with a specific node configuration. If the policy failed on all nodes, it suggests that the problem is with the policy.

    $ oc get nnce

    The output shows that the policy failed on all nodes:

    Example output

    NAME                                    STATUS
    control-plane-1.ens01-bridge-testfail   FailedToConfigure
    control-plane-2.ens01-bridge-testfail   FailedToConfigure
    control-plane-3.ens01-bridge-testfail   FailedToConfigure
    compute-1.ens01-bridge-testfail         FailedToConfigure
    compute-2.ens01-bridge-testfail         FailedToConfigure
    compute-3.ens01-bridge-testfail         FailedToConfigure

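
The all-nodes versus subset-of-nodes check described in this step can be sketched as a small shell filter. This is a minimal sketch, not part of the product: the function name classify_nnce_failures is hypothetical, and it assumes that the first line of the `oc get nnce` output is a header and that each failed enactment line contains the string FailedToConfigure, as in the example output.

```shell
# Hypothetical helper: reads `oc get nnce` output on stdin and reports
# whether the policy failed on all nodes or only on some of them.
classify_nnce_failures() {
  awk '
    BEGIN { total = 0; failed = 0 }
    NR == 1 { next }                      # skip the header line
    NF == 0 { next }                      # skip blank lines
    {
      total++
      # Assumption: a failed enactment line contains this status string.
      if (index($0, "FailedToConfigure") > 0) failed++
    }
    END {
      if (total == 0)           print "no enactments found"
      else if (failed == total) print "failed on all nodes: check the policy"
      else if (failed > 0)      print "failed on " failed " of " total " nodes: check those nodes"
      else                      print "no failed enactments"
    }
  '
}
```

To use it against a cluster, pipe the command output into the function, for example: oc get nnce | classify_nnce_failures
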
  4. View one of the failed enactments and look at the traceback. The following command uses the jsonpath output format to show only the message of the Failing condition:

    $ oc get nnce compute-1.ens01-bridge-testfail -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'

    This command returns a large traceback that has been edited for brevity:

    Example output

    error reconciling NodeNetworkConfigurationPolicy at desired state apply: , failed to execute nmstatectl set --no-commit --timeout 480: 'exit status 1' ''
    ...
    libnmstate.error.NmstateVerificationError:
    desired
    =======
    ---
    name: br1
    type: linux-bridge
    state: up
    bridge:
      options:
        group-forward-mask: 0
        mac-ageing-time: 300
        multicast-snooping: true
        stp:
          enabled: false
          forward-delay: 15
          hello-time: 2
          max-age: 20
          priority: 32768
      port:
      - name: ens01
    description: Linux bridge with the wrong port
    ipv4:
      address: []
      auto-dns: true
      auto-gateway: true
      auto-routes: true
      dhcp: true
      enabled: true
    ipv6:
      enabled: false
    mac-address: 01-23-45-67-89-AB
    mtu: 1500

    current
    =======
    ---
    name: br1
    type: linux-bridge
    state: up
    bridge:
      options:
        group-forward-mask: 0
        mac-ageing-time: 300
        multicast-snooping: true
        stp:
          enabled: false
          forward-delay: 15
          hello-time: 2
          max-age: 20
          priority: 32768
      port: []
    description: Linux bridge with the wrong port
    ipv4:
      address: []
      auto-dns: true
      auto-gateway: true
      auto-routes: true
      dhcp: true
      enabled: true
    ipv6:
      enabled: false
    mac-address: 01-23-45-67-89-AB
    mtu: 1500

    difference
    ==========
    --- desired
    +++ current
    @@ -13,8 +13,7 @@
           hello-time: 2
           max-age: 20
           priority: 32768
    -  port:
    -  - name: ens01
    +  port: []
     description: Linux bridge with the wrong port
     ipv4:
       address: []
    line 651, in _assert_interfaces_equal\n current_state.interfaces[ifname],\nlibnmstate.error.NmstateVerificationError:

    The NmstateVerificationError lists the desired policy configuration, the current configuration of the policy on the node, and the difference highlighting the parameters that do not match. In this example, the port is included in the difference, which suggests that the problem is the port configuration in the policy.
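
Because the difference section is usually the most useful part of a long traceback, it can help to print only that section. The following is a minimal sketch under stated assumptions: the function name show_nmstate_difference is hypothetical, and it assumes that the section heading "difference" appears on a line of its own, as in the example output.

```shell
# Hypothetical helper: reads an NmstateVerificationError message on stdin
# and prints everything from the "difference" heading to the end.
show_nmstate_difference() {
  awk '
    # Assumption: the heading is the only text on its line.
    /^[[:space:]]*difference[[:space:]]*$/ { printing = 1 }
    printing { print }
  '
}
```

To use it, pipe the jsonpath output of the failed enactment into the function, for example: oc get nnce compute-1.ens01-bridge-testfail -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}' | show_nmstate_difference
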

  5. To ensure that the policy is configured properly, view the network configuration for one or all of the nodes by requesting the NodeNetworkState object. The following command returns the network configuration for the control-plane-1 node:

    $ oc get nns control-plane-1 -o yaml

    The output shows that the interface name on the nodes is ens1 but the failed policy incorrectly uses ens01:

    Example output

    - ipv4:
    ...
      name: ens1
      state: up
      type: ethernet

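
Instead of reading the full YAML, the comparison between the port name in the policy and the interface names on the node can be sketched as a membership test. This is an illustrative sketch only: the function name interface_exists is hypothetical, and it expects a whitespace-separated list of interface names on stdin.

```shell
# Hypothetical helper: succeeds (exit status 0) if the interface name
# given as $1 appears in the whitespace-separated list read from stdin.
interface_exists() {
  # Put each name on its own line, then require an exact whole-line match
  # so that "ens1" does not match "ens01".
  tr -s '[:space:]' '\n' | grep -Fxq -- "$1"
}
```

One way to produce the list, assuming that the NodeNetworkState object reports interfaces under .status.currentState.interfaces, is: oc get nns control-plane-1 -o jsonpath='{.status.currentState.interfaces[*].name}' | interface_exists ens01. In this example the check would fail for ens01 and succeed for ens1.
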
  6. Correct the error by editing the existing policy:

    $ oc edit nncp ens01-bridge-testfail

    ...
              port:
                - name: ens1

    Save the policy to apply the correction.

  7. Check the status of the policy to ensure that it was updated successfully:

    $ oc get nncp

    Example output

    NAME                    STATUS
    ens01-bridge-testfail   SuccessfullyConfigured

The updated policy is successfully configured on all nodes in the cluster.