Troubleshooting MPLS VPN

As promised in my post “Non-Technical Tips for CCIE Lab Troubleshooting”, I am back with a series on troubleshooting different networking technologies, centered around the CCIE lab. In this post I will be talking about MPLS VPN. For the average person working in computer networks, MPLS tends to be the scariest topic. Even if their project involves just an MPLS circuit between two locations, the word “MPLS” still scares them, although from the customer's point of view it is not very different from a Layer 2 circuit. I have received international calls from friends and colleagues who were overwhelmed by MPLS. This could stem from a lack of knowledge, or from the way MPLS is often taught: many people jump straight to configuration. That approach overwhelmed me too, so I tore MPLS VPN apart to understand it better, and if you learn the concept first and then the individual protocols, it does not seem that difficult. In this post I will not explain MPLS VPN itself but instead help you develop a strategy to troubleshoot it, so some prior knowledge of MPLS, even partial, is expected. As I mentioned in my previous post about the disadvantages of “show run”, I will concentrate on debugs and show commands rather than show run for troubleshooting.

The Topology

I have represented the topology using two diagrams. The first is the logical diagram showing the protocols in use, and the other is the IP addressing diagram.

[Figure: MPLS Topology protocols]

The diagram is pretty self-explanatory, but I will still walk through it. The PE-CE routing protocol between CE1 and PE1 is OSPF; between CE2 and PE2 it is EIGRP. All provider routers run OSPF, LDP and MP-BGP. I have not enabled the IPv4 BGP address family in the core, both to keep the setup simple and to show that an MPLS VPN does not require IPv4 BGP in the provider cloud. There is also no requirement for MP-BGP peerings between all the directly connected provider routers (a single MP-BGP peering between PE1 and PE2 would be enough), but it has been done this way to increase the number of potential points of failure.

[Figure: MPLS IP Diagram]
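For reference, the addressing used throughout the post, pieced together from the pre-configuration at the end and the command outputs, is roughly as follows:

CE1:  Fa0/0 10.10.10.2/24 (to PE1), Lo1 10.10.20.1/32
PE1:  Fa0/0 10.10.10.1/24 (VRF cust, to CE1), Fa0/1 1.1.11.1/30 (to P1), Fa1/0 1.1.12.1/30 (to P2), Lo1 1.1.1.1/32, Lo2 1.0.0.1/32
P1:   1.1.11.2/30 (to PE1), 1.1.21.2/30 (to PE2), Lo1 1.1.2.1/32, Lo2 1.1.0.1/32
P2:   Fa0/1 1.1.12.2/30 (to PE1), Fa0/0 1.1.22.2/30 (to PE2), Lo1 1.1.2.2/32
PE2:  Fa0/0 1.1.21.1/30 (to P1), Fa0/1 1.1.22.1/30 (to P2), Fa1/0 192.168.10.1/24 (VRF cust, to CE2), Lo1 1.1.1.2
CE2:  Fa0/0 192.168.10.2/24 (to PE2), Lo1 192.168.20.1/32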

The running configuration differs from the diagrams in a few places; those differences are the injected faults.

Requirement

You should be able to ping CE2 loopback from CE1 loopback.

Troubleshooting

In my opinion, the best way to isolate the faults is to use the technique of “divide and conquer”, so we will split the topology into three parts. The first and second are the easier PE-CE connections at each location, and the third is the provider cloud.

Interface not in VRF

We will begin with the most basic check on CE1, which is to look at its routing table.

CE1#sh ip route
10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C 10.10.10.0/24 is directly connected, FastEthernet0/0
C 10.10.20.1/32 is directly connected, Loopback1

From this output we know that we are not receiving any CE2 routes, so the first thing to check is the local IGP. Since we are running OSPF, let us see whether we have any neighbors.

CE1#sh ip ospf neighbor

Since we are not even seeing a neighbor, we must drop to a more basic level of troubleshooting: ICMP. There are two places we can ping from, PE1 towards CE1 or CE1 towards PE1. The better place is PE1 towards CE1, but I am deliberately doing the opposite to demonstrate why. So let's ping from CE1 to PE1.

CE1#ping 10.10.10.1
Sending 5, 100-byte ICMP Echos to 10.10.10.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/21/48 ms

The ping looks fine, so we might assume that connectivity between CE1 and PE1 is fine. Let us do the same thing from PE1 to CE1.

PE1#ping vrf cust 10.10.10.2
% VRF does not have a usable source address

From this we can conclude that even though the ping from CE1 succeeded, it does not prove that connectivity between CE1 and PE1 is proper: CE1's ping was answered by PE1 out of the global routing table, while the customer traffic has to be handled inside the VRF. Let us see whether the interface is present in the VRF.

PE1#sh ip vrf
Name                               Default RD                 Interfaces
cust                                       100:1

There is no interface in the customer VRF. Let us put Fa0/0 into it and then check the ping.

PE1(config)#int f0/0
PE1(config-if)#ip vrf for cust
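One step is hidden here: when an interface is moved into a VRF, IOS removes its IP address, so the address has to be re-applied before the ping below will succeed. Going by the topology (PE1's Fa0/0 towards CE1 is 10.10.10.1/24), that would be:

PE1(config-if)#ip address 10.10.10.1 255.255.255.0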

PE1#ping vrf cust 10.10.10.2
Sending 5, 100-byte ICMP Echos to 10.10.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/17/32 ms

*Mar 1 03:37:14.539: %OSPF-5-ADJCHG: Process 100, Nbr 10.10.20.1 on FastEthernet0/0 from LOADING to FULL, Loading Done

We can now ping CE1, and the log shows that the OSPF adjacency between CE1 and PE1 has come up, so the first leg looks good. Next we will move to the second leg, CE2-PE2.
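If you want one more sanity check before leaving this leg, you can confirm that PE1 is now learning CE1's loopback (10.10.20.1/32) through OSPF inside the VRF; a command along these lines should list it (output omitted):

PE1#show ip route vrf cust ospf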

Pinging from PE2 to CE2 shows that the connectivity between the 2 is proper.

PE2#ping vrf cust 192.168.10.2
Sending 5, 100-byte ICMP Echos to 192.168.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/16/28 ms

Just like in PE1-CE1, the main thing is that IGP neighborship should be up. Let us check that

PE2#sh ip eigrp vrf cust neighbors
IP-EIGRP neighbors for process 100
H   Address                 Interface       Hold Uptime   SRTT   RTO  Q   Seq
                                            (sec)         (ms)       Cnt Num
0   192.168.10.2            Fa1/0             12 04:10:08   60   360  0   3

Since the neighborship is up, our second leg is verified. But are we getting the routes on CE2? Let us check.

CE2#sh ip route
C 192.168.10.0/24 is directly connected, FastEthernet0/0
          192.168.20.0/32 is subnetted, 1 subnets
C 192.168.20.1 is directly connected, Loopback1

Since we are still not getting the routes, something must be wrong in the provider cloud.

There are three protocols running in the provider cloud: OSPF, LDP and MP-BGP. OSPF is the base protocol on top of which LDP runs, so we cannot troubleshoot LDP until we have verified that OSPF is working. Let us see whether OSPF is learning all the core routes.

PE1#sh ip route
1.0.0.0/8 is variably subnetted, 9 subnets, 2 masks
O 1.1.2.2/32 [110/2] via 1.1.12.2, 00:11:26, FastEthernet1/0
C 1.1.1.1/32 is directly connected, Loopback1
C 1.0.0.1/32 is directly connected, Loopback2
O 1.1.2.1/32 [110/11] via 1.1.11.2, 00:11:16, FastEthernet0/1
O 1.1.1.2/32 [110/3] via 1.1.12.2, 00:11:16, FastEthernet1/0
C 1.1.11.0/30 is directly connected, FastEthernet0/1
C 1.1.12.0/30 is directly connected, FastEthernet1/0
O 1.1.21.0/30 [110/12] via 1.1.12.2, 00:11:17, FastEthernet1/0
O 1.1.22.0/30 [110/2] via 1.1.12.2, 00:11:28, FastEthernet1/0

From the routing table, it seems OSPF is learning all routes in the provider cloud.

Since OSPF is running fine and this is an MPLS domain, we should now see LDP neighbors.

PE1#sh mpl ldp nei

The command has no output, which means this router has no LDP neighbors. Let us check whether MPLS is enabled on the provider-facing interfaces.

PE1#sh mpl int
Interface                     IP            Tunnel            Operational
FastEthernet0/1         Yes (ldp)    No                     Yes
FastEthernet1/0         Yes (ldp)    No                     Yes

From this output, we see that both the provider interfaces are running LDP, yet we are seeing no neighbors. Let us do a debug to see where the problem is.

PE1#debug mpls ldp transport events

*Mar 1 00:18:09.463: ldp: Send ldp hello; FastEthernet0/1, src/dst 1.1.11.1/224.0.0.2, inst_id 0
*Mar 1 00:18:10.347: ldp: Rcvd ldp hello; FastEthernet0/1, from 1.1.11.2 (1.1.2.1:0), intf_id 0, opt 0xC
*Mar 1 00:18:10.351: ldp: ldp Hello from 1.1.11.2 (1.1.2.1:0) to 224.0.0.2, opt 0xC
*Mar 1 00:18:10.355: ldp: local idb = FastEthernet0/1, holdtime = 15000, peer 1.1.11.2 holdtime = 15000
*Mar 1 00:18:10.359: ldp: Link intvl min cnt = 2, intvl = 5000, idb = FastEthernet0/1
*Mar 1 00:18:11.339: ldp: Ignore Hello from 1.1.12.2, FastEthernet1/0; protocol mismatch
*Mar 1 00:18:12.191: ldp: Send ldp hello; FastEthernet1/0, src/dst 1.1.12.1/224.0.0.2, inst_id 0
*Mar 1 00:18:13.059: ldp: Scan listening TCBs

This debug gives us two things to check, since we are supposed to have two LDP neighbors here.

The first is to ping 1.1.2.1, the LDP router ID of P1, from which we are receiving hellos (the LDP session itself is a TCP connection built between the router IDs, or more precisely the transport addresses, so they must be reachable).

PE1#ping 1.1.2.1
Sending 5, 100-byte ICMP Echos to 1.1.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 12/24/36 ms

The router ID of P1 is reachable and we cannot infer anything more about that adjacency from the debug, so we will move on to the PE1-P2 connection.

LDP TDP Mismatch

From the debug we see a “protocol mismatch” message, which means PE1 and P2 are running different label distribution protocols on that link. PE1 is running LDP; let us see what P2 is running.

P2#sh mpl int
Interface                      IP               Tunnel                 Operational
FastEthernet0/0          Yes (ldp)       No                          Yes
FastEthernet0/1          Yes (tdp)       No                          Yes

From this output we see that Fa0/1 is running TDP. Let us change it to LDP and see whether the session comes up.

interface FastEthernet0/1
ip address 1.1.12.2 255.255.255.252
mpls label protocol ldp
mpls ip

P2#sh mpl ldp nei | i Peer
             Peer LDP Ident: 1.1.1.2:0; Local LDP Ident 1.1.2.2:0

We are still not seeing PE1 as an LDP neighbor.

LDP Router-ID Not Reachable

Option 1: If time is not a constraint, run the same debug again, this time on P2.

*Mar 1 00:40:27.875: ldp: Send ldp hello; FastEthernet0/1, src/dst 1.1.12.2/224.0.0.2, inst_id 0
*Mar 1 00:40:28.411: ldp: Rcvd ldp hello; FastEthernet0/1, from 1.1.12.1 (1.0.0.1:0), intf_id 0, opt 0xC
*Mar 1 00:40:28.419: ldp: ldp Hello from 1.1.12.1 (1.0.0.1:0) to 224.0.0.2, opt 0xC
*Mar 1 00:40:28.423: ldp: local idb = FastEthernet0/1, holdtime = 15000, peer 1.1.12.1 holdtime = 15000
*Mar 1 00:40:28.423: ldp: Link intvl min cnt = 2, intvl = 5000, idb = FastEthernet0/1
*Mar 1 00:40:28.423: ldp: Opening ldp conn; adj 0x66C8F210, 1.1.2.2 <-> 1.0.0.1; with normal priority
*Mar 1 00:40:28.427: ldp: Found adj 0x66C8F210 for 1.0.0.1 (Hello xport addr opt)
*Mar 1 00:40:28.427: ldp: MD5 setup for neighbor 1.0.0.1; password changed to [nil]
*Mar 1 00:40:28.427: ldp: No route to peer 1.0.0.1; set LDP_CTX_HANDLE_ROUTEUP

In the debug we see a “No route to peer 1.0.0.1” message. This tells us that PE1’s LDP router ID is 1.0.0.1 and that P2 has no route to it.

Option 2: Since we know the LDP session is a TCP connection between the router IDs, it is faster to find PE1’s LDP router ID and ping it from P2.

PE1#show mpls ldp discovery detail
    Local LDP Identifier:
             1.0.0.1:0

P2#ping 1.0.0.1
Sending 5, 100-byte ICMP Echos to 1.0.0.1, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)

Let us verify on PE1 whether 1.0.0.1 is being advertised.

PE1#sh ip os 1 database router self-originate | i 1.0.0.1

The empty output shows that 1.0.0.1 is not being advertised by OSPF. Let us include it in the OSPF process.

PE1(config)#router ospf 1
PE1(config-router)# network 1.0.0.1 0.0.0.0 area 0

*Mar 1 01:08:12.219: %LDP-5-NBRCHG: LDP Neighbor 1.1.2.2:0 (1) is UP
*Mar 1 01:08:12.251: %LDP-5-NBRCHG: LDP Neighbor 1.1.2.1:0 (2) is UP

The log shows that PE1’s LDP sessions with both P1 and P2 have now come up.

Let us check the LDP neighborships on PE2 as well.

PE2#sh mpl ldp nei | i Peer
Peer LDP Ident: 1.1.2.1:0; Local LDP Ident 1.1.1.2:0
Peer LDP Ident: 1.1.2.2:0; Local LDP Ident 1.1.1.2:0

So now LDP is working. Let us look at the MPLS forwarding table on PE2.

PE2#sh mpl for
Local   Outgoing      Prefix            Bytes tag   Outgoing   Next Hop
tag     tag or VC     or Tunnel Id      switched    interface
16      Pop tag       1.1.2.2/32        155         Fa0/1      1.1.22.2
17      Pop tag       1.1.2.1/32        0           Fa0/0      1.1.21.2
18      16            1.1.1.1/32        0           Fa0/1      1.1.22.2
19      Pop tag       1.1.12.0/30       0           Fa0/1      1.1.22.2
20      17            1.1.11.0/30       0           Fa0/1      1.1.22.2
21      21            1.0.0.1/32        0           Fa0/1      1.1.22.2

The forwarding table looks good. Let us now proceed to MP-BGP.
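As an aside, if your IOS image supports MPLS OAM, label-switched paths like these can also be verified end to end with an MPLS ping towards the BGP next hop; this is not a step the walkthrough relies on, and the output is omitted here:

PE2#ping mpls ipv4 1.1.1.1/32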

Wrong Update Source in BGP

PE1#sh ip bgp vpnv4 all summary

Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.2.1    4   100        0      195       0    0     0  never     Active
1.1.2.2    4   100      105      109       5    0     0  01:42:02  0

We must troubleshoot the peerings further: the peering with P1 is down (Active), and the peering with P2, although up, has received no prefixes.

Since we don’t want to rely on show run, we will use debugs; let us start with BGP debugging on P1, whose peering with PE1 is down.

P1#deb ip bgp
BGP debugging is on for address family: IPv4 Unicast
*Mar 1 01:48:21.579: BGP: 1.1.1.1 passive open to 1.1.2.1
*Mar 1 01:48:21.579: BGP: 1.1.1.1 passive open failed - 1.1.2.1 is not update-source Loopback2's address (1.1.0.1)
*Mar 1 01:48:21.579: BGP: 1.1.1.1 remote connection attempt failed, local address 1.1.2.1

We can see from the debug that the update source on P1 is wrong: it is sourcing the session towards PE1 from Loopback2 (1.1.0.1), while PE1 expects the session to come from 1.1.2.1 (Loopback1). Let us change that.

P1(config)#router bgp 100
P1(config-router)#neighbor 1.1.1.1 update-source Loopback1

*Mar  1 01:55:50.451: %BGP-5-ADJCHANGE: neighbor 1.1.1.1 Up

The BGP neighbor comes up according to the log

A show ip bgp vpnv4 all summary on PE1 and PE2 still shows no prefixes received, which means that although MP-BGP is now up, no customer routes are being exchanged.

Let us dig into why PE2 is not getting any prefixes. First we will check whether PE1 even has the customer routes in its BGP table.

PE1#sh ip bgp vpnv4 vrf cust

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf cust)
*> 10.10.10.0/24    0.0.0.0                  0         32768 ?
*> 10.10.20.1/32    10.10.10.2              11         32768 ?

Since PE1 has the customer routes, they should show up on P1 and P2 as well. Let us check both.

P1#sh ip bgp vpnv4 all summary | i 1.1.1.1|Neighbor
Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.1.1    4   100       93       92      15    1     0  00:47:00  2

P2#sh ip bgp vpnv4 all summary | i 1.1.1.1|Neighbor
Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.1.1    4   100      205      203       1    0     0  00:53:08  0

P1 looks fine, but P2 is not receiving any routes from PE1. Let us check on PE2 whether the routes reflected by P1 are arriving.

PE2#sh ip bgp vpnv4 al sum

Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.2.1    4   100      374      357       9    0     0  02:26:47  0
1.1.2.2    4   100      351      359       9    0     0  00:11:13  0

We are not getting any routes from P1 either, which means something is wrong on PE2 itself. Let us run a debug on PE2 and see what we can find; we will have to clear the BGP session to force the updates to be resent.
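The exact commands are not shown in the post, but one way to produce the output below would be to turn on a BGP update debug on PE2 and then soft-clear the inbound session towards P1, roughly like this:

PE2#debug ip bgp updates
PE2#clear ip bgp 1.1.2.1 soft in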

*Mar 1 06:39:56.162: BGP(2): 1.1.2.1 rcvd UPDATE w/ attr: nexthop 1.1.1.1, origin ?, localpref 100, metric 0, originator 1.1.1.1, clusterlist 1.1.2.1, extended community RT:100:1 OSPF DOMAIN ID:0x0005:0x000000640200 OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.10.10.1:0
*Mar 1 06:39:56.178: BGP(2): 1.1.2.1 rcvd 100:1:10.10.10.0/24 -- DENIED due to: extended community not supported;
*Mar 1 06:39:56.182: BGP(2): 1.1.2.1 rcvd UPDATE w/ attr: nexthop 1.1.1.1, origin ?, localpref 100, metric 11, originator 1.1.1.1, clusterlist 1.1.2.1, extended community RT:100:1 OSPF DOMAIN ID:0x0005:0x000000640200 OSPF RT:0.0.0.0:2:0 OSPF ROUTER ID:10.10.10.1:0
*Mar 1 06:39:56.182: BGP(2): 1.1.2.1 rcvd 100:1:10.10.20.1/32 -- DENIED due to: extended community not supported;

As we can see from the debug, the routes are being received but are denied because of their extended community. Route targets are carried as extended communities in MP-BGP, and a PE automatically filters out VPNv4 routes whose route targets are not imported by any of its local VRFs. Since PE2 is denying routes carrying RT 100:1, none of its VRFs is importing that RT. CE2 connects to PE2, so the customer VRF on PE2 should be importing the routes that PE1 exports. Let us see why it is not importing RT 100:1.

PE2#sh ip vrf detail
VRF cust; default RD 100:2; default VPNID <not set>
  Interfaces:
    Fa1/0
  Connected addresses are not in global routing table
  Export VPN route-target communities
    RT:100:2
  Import VPN route-target communities
    RT:100:2
  No import route-map
  No export route-map
  VRF label distribution protocol: not configured
  VRF label allocation mode: per-prefix

The output shows that we are importing RT 100:2, when we should be importing 100:1, which is what PE1 exports. Let us change the import route target to 100:1 and check again.
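The change itself is not shown in the post; on PE2 it would look something like this:

PE2(config)#ip vrf cust
PE2(config-vrf)#route-target import 100:1

Note that for the return direction we tackle later, PE1 must likewise import a route target that PE2 exports; the simplest arrangement is to use 100:1 as both the import and export RT on both PEs.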

PE2#sh ip bgp vpnv4 all summary

Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.2.1    4   100      448      424      13    0     0  00:05:04  2
1.1.2.2    4   100      415      426      13    0     0  00:05:02  0

We are now receiving the two routes from P1, but we still need to find out why nothing is coming from P2.

IBGP Split Horizon

There is no single obvious clue as to why P2 is not passing routes between PE1 and PE2. Two things are working against us here. First, P2 has no VRF importing RT 100:1, so its automatic route filtering discards the VPNv4 routes it learns from PE1. Second, even if it kept them, IBGP split horizon would stop it from advertising routes learned from one IBGP peer (PE1) to another IBGP peer (PE2). Making P2 a route reflector solves both problems: a route reflector does not filter VPNv4 routes by route target, and it is allowed to reflect routes between IBGP peers. Let us configure that and see whether it makes a difference.

P2(config)#router bgp 100
P2(config-router)# address-family vpnv4
P2(config-router-af)#neighbor 1.1.1.2 route-reflector-client

P2#sh ip bgp vpnv4 al s | i 1.1.1.1|Neighbor

Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.1.1    4   100      417      414       3    0     0  01:16:41  2

Now that P2 is receiving the routes, we should also see them on PE2.

PE2#sh ip bgp vpnv4 al su

Neighbor   V    AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd
1.1.2.1    4   100      464      440      13    0     0  00:21:39  2
1.1.2.2    4   100      438      446      13    0     0  00:11:19  2

We are indeed seeing the routes on PE2. The same routes should now make it to CE2, so let us check.

MP BGP Redistribution into IGP

CE2#sh ip route

C 192.168.10.0/24 is directly connected, FastEthernet0/0
           192.168.20.0/32 is subnetted, 1 subnets
C 192.168.20.1 is directly connected, Loopback1

We are still not getting the routes on CE2. The remaining thing to check on PE2 is whether BGP is being redistributed into EIGRP. Let us look at an example subnet.

PE2#sh ip route vrf cust 10.10.10.0
Routing entry for 10.10.10.0/24
     Known via “bgp 100”, distance 200, metric 0, type internal
     Last update from 1.1.1.1 00:23:28 ago
     Routing Descriptor Blocks:
     * 1.1.1.1 (Default-IP-Routing-Table), from 1.1.2.1, 00:23:28 ago
            Route metric is 0, traffic share count is 1
            AS Hops 0

The route is present in the VRF table on PE2 and is known via BGP, so the only remaining suspect is redistribution: BGP is not being redistributed into EIGRP for this VRF. Let's redistribute it and see if CE2 gets the routes.

PE2(config)#router eigrp 65000
PE2(config-router)# address-family ipv4 vrf cust
PE2(config-router-af)#redistribute bgp 100 metric 1 1 1 1 1

We are now seeing the routes on CE2 (note the seed metric on the redistribute command; without it, EIGRP would not advertise the redistributed BGP routes).

CE2#sh ip route

C         192.168.10.0/24 is directly connected, FastEthernet0/0
            192.168.20.0/32 is subnetted, 1 subnets
C                192.168.20.1 is directly connected, Loopback1
            10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
D EX           10.10.10.0/24
                          [170/2560002816] via 192.168.10.1, 00:01:54, FastEthernet0/0
D EX           10.10.20.1/32
                          [170/2560002816] via 192.168.10.1, 00:01:54, FastEthernet0/0

So we have succeeded in getting the CE1 routes across to CE2. We must now go the other way and get the CE2 routes onto CE1. Let us begin with PE2.

IGP Redistribution into MP BGP

PE2#sh ip bgp vpnv4 vrf cust
   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:2 (default for vrf cust)
*>i10.10.10.0/24    1.1.1.1                  0    100      0 ?
*>i10.10.20.1/32    1.1.1.1                 11    100      0 ?

The CE2 routes are not even present in PE2's own VPNv4 table, which means the EIGRP routes are not being redistributed into BGP. Let us add the redistribution (note that the AS to reference is the VRF's EIGRP autonomous system, 100, not the named process number 65000) and check again.

PE2(config)#router bgp 100
PE2(config-router)# address-family ipv4 vrf cust
PE2(config-router-af)#redistribute eigrp 100

PE2#sh ip bgp vpnv4 vrf cust
   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:2 (default for vrf cust)
*>i10.10.10.0/24    1.1.1.1                  0    100      0 ?
*>i10.10.20.1/32    1.1.1.1                 11    100      0 ?
*> 192.168.10.0     0.0.0.0                  0         32768 ?
*> 192.168.20.1/32  192.168.10.2        156160         32768 ?

Now that we are seeing the routes on PE2, we should also see them on P1, P2 and PE1, and we do.

PE1#sh ip bgp vpnv4 vrf cust

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf cust)
*> 10.10.10.0/24    0.0.0.0                  0         32768 ?
*> 10.10.20.1/32    10.10.10.2              11         32768 ?
*>i192.168.10.0     1.1.1.2                  0    100      0 ?
*>i192.168.20.1/32  1.1.1.2             156160    100      0 ?

Since the routes are on PE1, they should also reach CE1 (BGP is already redistributed into OSPF for the VRF in the pre-config). Let us check.

CE1#sh ip rout

O E2    192.168.10.0/24 [110/1] via 10.10.10.1, 00:07:34, FastEthernet0/0
             192.168.20.0/32 is subnetted, 1 subnets
O E2          192.168.20.1 [110/156160] via 10.10.10.1, 00:07:34, FastEthernet0/0
             10.0.0.0/8 is variably subnetted, 2 subnets, 2 masks
C                10.10.10.0/24 is directly connected, FastEthernet0/0
C                10.10.20.1/32 is directly connected, Loopback1

Since we have routes on both CE1 and CE2, we should be able to trace end to end

CE1#tracer 192.168.20.1 source 10.10.20.1
1 10.10.10.1 44 msec 40 msec 4 msec
2 * * *
3 * * *

We cannot get past PE1 (10.10.10.1), which means there is still an issue inside the MPLS cloud.

LDP and OSPF Loopback mask mismatch

Let's start from PE1 and see if we can find the issue, beginning with MP-BGP.

PE1#sh ip bgp vpnv4 vrf cust

   Network          Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 100:1 (default for vrf cust)
*> 10.10.10.0/24    0.0.0.0                  0         32768 ?
*> 10.10.20.1/32    10.10.10.2              11         32768 ?
*>i192.168.10.0     1.1.1.2                  0    100      0 ?
*>i192.168.20.1/32  1.1.1.2             156160    100      0 ?

The BGP next hop for the CE2 networks is 1.1.1.2 (PE2’s loopback), so for those networks to be reachable there must be an end-to-end label-switched path to 1.1.1.2. Let us follow its label hop by hop.

PE1#sh mpl for 1.1.1.2
Local   Outgoing      Prefix            Bytes tag   Outgoing   Next Hop
tag     tag or VC     or Tunnel Id      switched    interface
19      16            1.1.1.2/32        0           Fa1/0      1.1.12.2

P1#sh mpl for 1.1.1.2
Local   Outgoing      Prefix            Bytes tag   Outgoing   Next Hop
tag     tag or VC     or Tunnel Id      switched    interface
17      Untagged      1.1.1.2/32        0           Fa0/1      1.1.21.1

P1 is the penultimate hop, so we would expect “Pop tag” here (penultimate hop popping), not “Untagged”. Untagged means P1 has received no label binding at all for this prefix, so it forwards the packet unlabeled; the VPN label is lost and PE2 receives a plain IP packet it cannot map to the customer VRF. Why is there no binding? PE2 is the router that should be advertising it, so let us look there and see what label it has for 1.1.1.2.

PE2#sh mpls ldp binding 1.1.1.2 24 

The command returns nothing: PE2 has no binding that matches. Since this loopback is a connected interface on PE2, shouldn't it have a local binding? The problem is a mismatch between LDP and OSPF. The loopback is configured with a /24 mask, so LDP allocates a binding for the connected /24 prefix, while OSPF advertises loopbacks as /32 host routes by default. The rest of the network therefore has a route to 1.1.1.2/32 but never receives a label binding for that exact prefix, which is why P1 shows it as Untagged. The fix is to make the two agree: either give the loopback a /32 mask, or set the loopback's OSPF network type to point-to-point so OSPF advertises the real /24 prefix. Let us go with the second option.

PE2(config)#int lo1
PE2(config-if)#ip ospf network point-to-point
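For completeness, the first option would have been to change the loopback mask itself so that the connected prefix matches what OSPF advertises, roughly:

PE2(config)#interface Loopback1
PE2(config-if)#ip address 1.1.1.2 255.255.255.255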

Now let us go back to P1 and see whether it has a label.

P1#sh mpl for 1.1.1.2
Local   Outgoing      Prefix            Bytes tag   Outgoing   Next Hop
tag     tag or VC     or Tunnel Id      switched    interface
22      Pop tag       1.1.1.0/24        0           Fa0/1      1.1.21.1

P1 now shows a proper Pop tag entry (note that the prefix is now the /24, since that is what OSPF advertises with the point-to-point network type), so let us see whether CE1 can finally reach CE2.

CE1#ping 192.168.20.1 so 10.10.20.1
Sending 5, 100-byte ICMP Echos to 192.168.20.1, timeout is 2 seconds:
Packet sent with a source address of 10.10.20.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 60/69/88 ms

Whoa, finally we can reach CE2.

There will always be other faults that I have not covered here. If you encounter one, let me know so I can add it to the list. Happy troubleshooting.

Important portions of the pre-configuration

PE1

ip vrf cust
rd 100:1
route-target export 100:1
route-target import 100:1

mpls label protocol ldp

interface Loopback1
ip address 1.1.1.1 255.255.255.255
!
interface Loopback2
ip address 1.0.0.1 255.255.255.255

interface FastEthernet0/1
ip address 1.1.11.1 255.255.255.252
mpls ip

interface FastEthernet1/0
ip address 1.1.12.1 255.255.255.252
mpls ip

router ospf 100 vrf cust
log-adjacency-changes
redistribute bgp 100 subnets
network 10.10.10.1 0.0.0.0 area 0
!
router ospf 1
log-adjacency-changes
network 1.1.1.1 0.0.0.0 area 0
network 1.1.11.1 0.0.0.0 area 0
network 1.1.12.1 0.0.0.0 area 0
!
router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 1.1.2.1 remote-as 100
neighbor 1.1.2.1 update-source Loopback1
neighbor 1.1.2.2 remote-as 100
neighbor 1.1.2.2 update-source Loopback1
!
address-family vpnv4
neighbor 1.1.2.1 activate
neighbor 1.1.2.1 send-community extended
neighbor 1.1.2.2 activate
neighbor 1.1.2.2 send-community extended
exit-address-family
!
address-family ipv4 vrf cust
redistribute ospf 100 vrf cust match internal external 1 external 2
no synchronization
exit-address-family

mpls ldp router-id Loopback2

PE2

ip vrf cust
rd 100:2
route-target export 100:2
route-target import 100:2

mpls label protocol ldp

interface Loopback1
ip address 1.1.1.2 255.255.255.0
!
interface FastEthernet0/0
ip address 1.1.21.1 255.255.255.252
mpls ip

interface FastEthernet0/1
ip address 1.1.22.1 255.255.255.252
mpls ip
!
interface FastEthernet1/0
ip vrf forwarding cust
ip address 192.168.10.1 255.255.255.0

router eigrp 65000
auto-summary
!
address-family ipv4 vrf cust
network 192.168.10.0
no auto-summary
autonomous-system 100
exit-address-family
!
router ospf 1
log-adjacency-changes
network 1.1.1.2 0.0.0.0 area 0
network 1.1.21.1 0.0.0.0 area 0
network 1.1.22.1 0.0.0.0 area 0
!
router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 1.1.2.1 remote-as 100
neighbor 1.1.2.1 update-source Loopback1
neighbor 1.1.2.2 remote-as 100
neighbor 1.1.2.2 update-source Loopback1
!
address-family vpnv4
neighbor 1.1.2.1 activate
neighbor 1.1.2.1 send-community extended
neighbor 1.1.2.2 activate
neighbor 1.1.2.2 send-community extended
exit-address-family
!
address-family ipv4 vrf cust
no synchronization
exit-address-family

P1

mpls label protocol ldp

interface Loopback1
ip address 1.1.2.1 255.255.255.255
!
interface Loopback2
ip address 1.1.0.1 255.255.255.255

router ospf 1
log-adjacency-changes
network 1.1.2.1 0.0.0.0 area 0
network 1.1.11.2 0.0.0.0 area 0
network 1.1.21.2 0.0.0.0 area 0
!
router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 1.1.1.1 remote-as 100
neighbor 1.1.1.1 update-source Loopback2
neighbor 1.1.1.2 remote-as 100
neighbor 1.1.1.2 update-source Loopback1
!
address-family vpnv4
neighbor 1.1.1.1 activate
neighbor 1.1.1.1 send-community extended
neighbor 1.1.1.2 activate
neighbor 1.1.1.2 send-community extended
neighbor 1.1.1.2 route-reflector-client
exit-address-family

P2

mpls label protocol tdp

interface Loopback1
ip address 1.1.2.2 255.255.255.255
!
interface FastEthernet0/0
ip address 1.1.22.2 255.255.255.252
mpls label protocol ldp
mpls ip
!
interface FastEthernet0/1
ip address 1.1.12.2 255.255.255.252
mpls ip
!
router ospf 1
log-adjacency-changes
network 1.1.2.2 0.0.0.0 area 0
network 1.1.12.2 0.0.0.0 area 0
network 1.1.22.2 0.0.0.0 area 0
!
router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 1.1.1.1 remote-as 100
neighbor 1.1.1.1 update-source Loopback1
neighbor 1.1.1.2 remote-as 100
neighbor 1.1.1.2 update-source Loopback1
!
address-family vpnv4
neighbor 1.1.1.1 activate
neighbor 1.1.1.1 send-community extended
neighbor 1.1.1.2 activate
neighbor 1.1.1.2 send-community extended
exit-address-family

CE1

interface Loopback1
ip address 10.10.20.1 255.255.255.255
!
interface FastEthernet0/0
ip address 10.10.10.2 255.255.255.0

router ospf 1
log-adjacency-changes
network 10.10.10.2 0.0.0.0 area 0
network 10.10.20.1 0.0.0.0 area 0

CE2

interface Loopback1
ip address 192.168.20.1 255.255.255.255
!
interface FastEthernet0/0
ip address 192.168.10.2 255.255.255.0

router eigrp 100
network 192.168.10.0
network 192.168.20.1 0.0.0.0
no auto-summary

I hope my post has been helpful in your life but the only guide which can help you in the hereafter is the Qur’an. You can download the English translation of the Qur’an here.
