With the introduction of the Troubleshooting (TS) section in the CCIE Routing and Switching Lab, getting the CCIE number has become much more difficult. I have even come across people who switched to other CCIE tracks such as Security or Voice because clearing the Troubleshooting section was beyond them. The other problem with TS is that it is placed before the configuration section of the Lab, so if you think you have not cleared the TS section, you will have little interest in completing the configuration portion. The disadvantage is that you waste what is effectively a free chance to practice the configuration section, which would have helped in future attempts.
In this blog I will write about the non-technical aspects which might help you clear the TS section. There are several things you should know while preparing for it, just before attempting it and during the attempt. These inputs are based on my experience and could differ from person to person.
Things to keep in mind while preparing for CCIE Troubleshooting
1) Make a Troubleshooting Strategy
The difference between a technician and an engineer is that an engineer knows how the technology works and is thus able to find better solutions. You are attempting the CCIE Lab because you are an engineer, and you are expected to find systematic and efficient solutions to a given problem. When you know how things work, you can tackle issues systematically. You can make a set of scenarios like “if this happens”, “I can do this”, and you are better prepared this way. While building these scenarios you may also come across situations for which you do not have a solution. You can then find one before the exam by asking people or by doing your own research and testing.
If you are one of those who make config changes on the fly during outages without any plan or strategy, the TS section will be a dangerous playground for you. There will be tens of mistakes in the TS section but only one will be relevant to your ticket. If you go on removing every mistake from the 30+ routers of the TS section, you will end up failing, because they didn’t ask you to clean the routers; they asked you to troubleshoot only a certain aspect of the network. As an example, you are asked to check reachability of a particular subnet, and while troubleshooting you see (based on the diagram) a wrong static route for a different subnet. You think it might be the cause of your issue, but it isn’t, since that subnet is irrelevant to your ticket. So you go on making changes for what you think is a problem, but it is not THE problem.
2) You have to practice for a topic even if it is the easiest in the world
I must stress that you should not ignore a topic just because you think it is easy. I did exactly that and ended up losing quite a lot of time. I thought NTP was easy, so there was no need to practice it. I was confident enough to configure it, so shouldn’t I be confident enough to troubleshoot it? When the NTP ticket came up during TS, I said to myself that it was so easy, but went blank after that. It is easy, but I did not know which commands to run to check it, what problems it could have or how it could be verified.
3) Decide on the strategy you will use for Troubleshooting
Anyone who has learnt the systematic approach to troubleshooting knows that there are 3 approaches: 1) Top to Bottom, 2) Bottom to Top and 3) Divide and Conquer. In the first approach, we start troubleshooting at the top of the OSI stack and move down until we find the issue. This approach might not be relevant for CCIE Troubleshooting as we do not deal with the Application layer. In the Bottom to Top approach, we start at the Physical layer and move upwards. In CCIE there will be no cabling issue, so we would start at Layer 2 and move up to Layer 4. The approach I prefer is Divide and Conquer, in which you start at the middle of the OSI stack and move up or down depending on what you find at that layer. The advantage is that, within the layers the Lab deals with, you never have to go through more than 3 layers of troubleshooting. For example, if you start at the Application layer and eventually find that there was a cabling issue, you will have gone through 7 layers of troubleshooting. In contrast, if you start at Layer 3, you reach Layer 1 after checking only 3 layers, and moving up to Layer 4 takes even fewer. Also, the best troubleshooting tool for networkers (ping) goes well with this approach, as ping checks Layer 3 connectivity. This approach is good for the Lab as well as for real-life troubleshooting.
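To make the layer counts above concrete, here is a minimal Python sketch. The function names and the simplified model (one check per layer, and the fault is found as soon as its layer is checked) are my own assumptions for illustration; real troubleshooting is never this tidy.

```python
def layers_checked(start, fault_layer, top=7):
    """Count the layers inspected before the fault is found.

    Simplified model (assumption): we check the starting layer first,
    then walk one layer at a time towards the fault.
    """
    if not (1 <= start <= top and 1 <= fault_layer <= top):
        raise ValueError("layers must be between 1 and top")
    return abs(start - fault_layer) + 1


def worst_case(start, top=7):
    """Largest number of layers checked for any fault between 1 and top."""
    return max(layers_checked(start, fault, top) for fault in range(1, top + 1))


# Full OSI stack, layers 1 to 7
print(worst_case(7))         # Top to Bottom: 7
print(worst_case(1))         # Bottom to Top: 7
print(worst_case(3))         # start at Layer 3: 5

# Only layers 1 to 4 in play, as assumed for the Lab
print(worst_case(3, top=4))  # start at Layer 3: 3
```

Under this model, the "never more than 3 layers" figure holds when nothing above Layer 4 is in play. Across the full 7 layers, starting at Layer 3 can still cost 5 checks, which is better than 7 but not 3.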
4) Never inject faults yourself in the Topology
I have come across people who injected faults into their own topology and then troubleshot it. Over a month of troubleshooting, they memorized where every fault was, and a day before the Lab they said they were too bored to practice any further because finding faults was so easy. Unfortunately, all of them failed in Troubleshooting. In my case, I tried to avoid it, but since I was practicing alone I could not. I injected the faults one month in advance, but while practicing I still somehow remembered where they were. The problem with injecting the faults yourself is that you are limited by your imagination. You cannot inject many different kinds of faults and thus cannot exercise a systematic troubleshooting strategy. The tickets in the exam are not designed by you; they are designed by someone else, so right from the beginning, troubleshoot faults which are injected by others. While doing Lab configuration practice, help others who are stuck on certain topics, as this makes a good troubleshooting exercise.
5) Rely on Show commands rather than show run
The reason for not using show run is that it is time-consuming to read the running config of 4 to 5 routers for a ticket. You might also miss the important thing in the running config as you skim through it. And at times the show run output might suggest one thing while the reality is something else. This is what happened to me. In the NTP ticket, I looked at the running configs of the server and the client and found that the Type 7 keys shown on the two routers were different. I thought the issue was with NTP authentication since the 2 keys appeared different, so I changed the keys on both routers. Unfortunately, it did not resolve my ticket. Long after the Lab, when I was writing a blog about Type 7 passwords, I came to know that one password can be represented by 16 different encrypted strings. So a password of cisco can appear as one string (say 12345) on one router and as a completely different string (say 45678) on another. The only definitive way of knowing that there is a password mismatch is by show commands or by debugging.
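To see why two different-looking Type 7 strings can hide the same password, here is a small Python sketch of the Type 7 scheme as it is widely described in public write-ups. The key string below and the 0 to 15 seed range are taken from those public descriptions and are assumptions here, not something from Cisco documentation; the function names are my own.

```python
# Translation key as commonly published for Type 7 (assumption, see above).
KEY = "dsfd;kfoA,.iyewrkldJKDHSUBsgvca69834ncxv9873254k;fg87"


def type7_encode(password, seed):
    """Encode a password, starting at offset `seed` into the key."""
    if not 0 <= seed <= 15:
        raise ValueError("seed is assumed to be in the range 0..15")
    out = f"{seed:02d}"  # first two digits carry the seed
    for i, ch in enumerate(password):
        out += f"{ord(ch) ^ ord(KEY[(seed + i) % len(KEY)]):02X}"
    return out


def type7_decode(encoded):
    """Reverse type7_encode: read the seed, then XOR each byte back."""
    seed = int(encoded[:2])
    body = encoded[2:]
    chars = []
    for i in range(0, len(body), 2):
        byte = int(body[i:i + 2], 16)
        chars.append(chr(byte ^ ord(KEY[(seed + i // 2) % len(KEY)])))
    return "".join(chars)


a = type7_encode("cisco", 2)
b = type7_encode("cisco", 8)
print(a != b)                             # True: the strings look different
print(type7_decode(a), type7_decode(b))   # cisco cisco
```

So comparing the strings by eye in show run tells you nothing; two routers can show different Type 7 strings and still share the same key.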
6) Practice exactly how it appears in the Exam
We know that PuTTY is used in the Lab, but its copy and paste behavior is not the same as what we use day to day. We have 2 options: either use PuTTY with the default copy/paste settings (XTerm) and change all sessions in the Lab accordingly, or change the way we practice so that the type of copy/paste (Windows) we use in practice is the same as what we will get in the exam. It is logical to choose the second option, because if we go on changing the console settings of all routers, we will not be able to finish even a single ticket. The other thing to note is that while practicing we must not use shortcuts which will not be available during the Lab. This includes terminal software like SecureCRT. We get cozy with the tab feature of SecureCRT while practicing, and then in the Lab it is difficult to manage the various windows, switch between them and find the right one. This leads to wasted time and frustration.
To speed up practice sessions and for convenience, we make several optimizations, like setting privilege level 15 on the console with no password. We get so used to this that in the Lab these small things irk us. As we log in to the console in the exam, we blindly type “show run” or “conf t” without realizing that we are still in user mode. Then we realize our mistake, type “en” and press Enter, and type “show run” or “conf t” again, but now there is a password prompt we have skipped. So we enter the password and only then “show run” or “conf t”. This leads to a lot of wasted time and frustration.
In addition to the above, we use many other shortcuts to make TS practice convenient, like using printouts for diagrams and tickets and writing IP addresses on the diagram. In the real TS this might not be possible, so it is better to get into the habit of using show commands to find the interface IP addresses and other parameters. Also, refrain from using printouts for diagrams and tickets and use soft copies instead, so that you are used to working with digital information as in the Lab.
7) Always keep in touch
This is a very important thing in CCIE. There are new things coming up every day, technical and non-technical, which could make all the difference between passing and failing. Always be in touch with your study group and with those who are attempting the Lab ahead of you. Do this even if your Lab date is not near, because if you learn of any changes, you can immediately realign your practice accordingly.
Things to keep in mind before attempting Troubleshooting
1) Be ready for the Lab before you enter the War Zone
I am not 100% sure about this, but it is better to be safe than sorry. We entered the Lab room, were briefed about the Lab and its dos and don’ts, and were seated at our respective desktops. The timer was started on the big display, indicating the start of the Lab, and we were supposed to log in to begin. Anyone who has taken an online exam would presume that the time starts when you log in, right? I thought the same, so I decided to go to the washroom before logging in. When I came back, logged in and did a couple of easy tickets, half an hour had already gone, which seemed impossible. My presumption that the time starts when you log in was probably wrong; your time starts when the Proctor starts the timer on the display. I have not yet had independent confirmation of this from anyone.
2) If you have any issue with the Lab, speak out at the beginning
When you log in to your Lab, the diagram and the links to the tickets should load. If the web interface is loading slowly, your router consoles will probably be slow too. In my case, the web interface loaded slowly, and on top of that the console was so slow that it felt as if the actual equipment was on the moon. I have worked in high-pressure service provider environments throughout my career and never panicked during an issue, but during the Lab my hands froze, and every typing mistake I made was compounded by the fact that the slow console would only show it 2 seconds later. It would then take another few seconds to retype the whole command. So my suggestion is to tell the Proctor right away, because a slow console will affect your whole Lab.
Things to keep in mind during Troubleshooting
1) You have to get 80% in Troubleshooting section
Many people think that if we clear 8 tickets perfectly, it will be enough to pass. If that were the case, why would some tickets be worth 2 points and some 3? If you calculate the percentage after solving only 2-point tickets, you will never get 80%. To reach 80%, you must also solve the 3-point tickets. So if you solve 8 tickets which include a couple of 3-point tickets, you can pass. If you skip even one 3-point ticket, you will not be able to reach the passing score.
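The arithmetic can be sketched in a few lines of Python. The ticket mix below (eight 2-point tickets and two 3-point tickets) is purely a hypothetical example chosen to illustrate the reasoning above; it is not the actual exam layout, and the function name is my own.

```python
# Hypothetical ticket mix for illustration only (not the real exam layout).
ALL_TICKETS = [2] * 8 + [3] * 2   # 22 points in total


def passes(solved_points, all_points=ALL_TICKETS):
    """True if the solved tickets add up to at least 80% of all points."""
    return sum(solved_points) * 100 >= 80 * sum(all_points)


# Eight tickets solved, all of them 2-pointers: 16 of 22 points
print(passes([2] * 8))              # False

# Eight tickets solved, including both 3-pointers: 18 of 22 points
print(passes([2] * 6 + [3] * 2))    # True

# Eight tickets solved, but one 3-pointer skipped: 17 of 22 points
print(passes([2] * 7 + [3]))        # False
```

With this assumed mix, solving 8 tickets passes only when both 3-point tickets are among them, which is the point of the advice above.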
2) Number of faults in a ticket is proportional to points
As I have mentioned before, there are many apparent faults in the configuration which are irrelevant to your ticket. A 2-point ticket should have only 1 fault, or 2 at the most. A 3-point ticket should have 2 faults, or 3 at the most. If you are trying to remove more faults than that, you are probably barking up the wrong tree.
3) Save the configuration after every ticket
This is the most important thing for your Lab exam. If you do not save and the Lab time runs out, all your configuration will be lost when the equipment is rebooted for assessment. While troubleshooting, save the changes the instant a ticket is solved, and do not leave it for the end thinking you will go to all the routers and do a write. In Troubleshooting, time flies, and before you know it the time is over. If you have not already saved the configurations, you might have a panic attack as you have to do a write on 30+ routers.
4) Open limited Consoles
There are 2 ways to get console access to the routers. One is to open a console to each router individually, and the other is to access the routers through the Terminal server so that you can easily switch between them. The problem is that not many people are comfortable with the latter. If you are comfortable with Terminal servers, you can conveniently access all your routers from one window. If not, open consoles only to the devices which are relevant to the ticket. Toggling between windows and finding the right router is quite a hassle when you are under a lot of pressure, and if you keep all your consoles open, it might take you ages to find the right console window. One ticket should span maybe 4 to 5 routers, so open only those consoles, and as soon as you are done with the ticket, save the configuration and close them.
5) Don’t Trust the Diagram 100%
At times, the diagrams do not show all the equipment we have access to in the Lab, especially after the introduction of switches in TS. It would be wise to take a look at the list of all the routers and switches.
6) Take Notes
We get a notepad and color pens during the Lab. Everybody makes use of them differently. The most beneficial use I found was to take notes of the tickets I had finished. You will probably start with the easiest tickets and go on to the difficult ones, so you lose track of which tickets you have successfully completed, partially completed or not attempted. It is also not feasible to open each ticket and read it again to decide whether you have completed it, as that is time-consuming. It is best to jot it down on the notepad so that you know which tickets you have completed, how many you have completed and whether you have completed the 3-point tickets or not.
7) Be logical in your ticket rectification
If you are asked to rectify a reachability issue in an OSPF network, you cannot ignore OSPF and add static routes to make it work. Another example: an OSPF neighborship is not forming, and you find that an access-list on the interface is blocking OSPF hellos. You should not remove the access-list; it is more judicious to permit the OSPF protocol in it than to remove the whole ACL.
8) Expect no mercy
Gone are the days when brain dumpers could pass CCIE Troubleshooting (or perhaps they still do, but I am not aware of it). So don’t expect to get the same tickets somebody else got, or that a ticket others got with a certain fault will have the same fault in your exam. This is why it is very important to have a systematic strategy to tackle the tickets.
This blog is based on my experience with CCIE. Every person is different and so their experience will be different. I would appreciate it if the veterans could post comments about their experience and any suggestions they have for newbies attempting CCIE Troubleshooting.
Insha Allah, I will follow up this blog with technical write-ups about different Troubleshooting topics, the issues which could be present and the ways to tackle them.
I hope my post has been helpful in your life but the only guide which can help you in the hereafter is the Qur’an. You can download the English translation of the Qur’an here.