CRS-1 Router: One of Eight Fabric Planes Shows MCAST_DOWN
Hardware Platform
CRS
Software Version
IOS XR
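Case Summary
On a CRS, the admin-mode command "show controllers fabric plane all" showed plane 1 in the MCAST_DOWN state, with "m" in the Down Flags column.
Normally every plane shows UP. MCAST_DOWN means multicast forwarding on that plane has stopped, although multicast traffic can still be forwarded through the other 7 planes.
There is no service impact, but the problem should be fixed as soon as possible.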
(admin)#sh contr fabric plane all de
Wed Mar 27 08:59:36.956 BeiJing
Flags: P - plane admin down,       p - plane oper down
       C - card admin down,        c - card oper down
       L - link port admin down,   l - linkport oper down
       A - asic admin down,        a - asic oper down
       B - bundle port admin Down, b - bundle port oper down
       I - bundle admin down,      i - bundle oper down
       N - node admin down,        n - node down
       o - other end of link down  d - data down
       f - failed component downstream
       m - plane multicast down,   s - link port permanently shutdown
       t - no barrier input        O - Out-Of-Service oper down
       T - topology mismatch down
 Plane  Admin  Oper        up->dn   Down   Total    Down
 Id     State  State       counter  Flags  Bundles  Bundles
------------------------------------------------------------
 0      UP     UP          0               9        0
 1      UP     MCAST_DOWN  0        m      9        0
 2      UP     UP          0               9        0
 3      UP     UP          0               9        0
 4      UP     UP          0               9        0
 5      UP     UP          0               9        0
 6      UP     UP          0               9        0
 7      UP     UP          1               9        0
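When many routers have to be checked, the affected plane can be picked out of this table with a short script. The following is a minimal Python sketch, assuming the whitespace-separated column layout shown above (Plane Id, Admin State, Oper State, up->dn counter, optional Down Flags, Total Bundles, Down Bundles); `abnormal_planes` is a hypothetical helper, not a Cisco tool.

```python
import re

# One data row of "show controllers fabric plane all detail".
# Assumption: the Down Flags column is either empty or a run of letters.
ROW = re.compile(
    r"^\s*(?P<plane>\d+)\s+(?P<admin>\S+)\s+(?P<oper>\S+)\s+(?P<counter>\d+)"
    r"(?:\s+(?P<flags>[A-Za-z]+))?\s+(?P<total>\d+)\s+(?P<down>\d+)\s*$"
)

def abnormal_planes(output: str) -> list[tuple[int, str, str]]:
    """Return (plane id, oper state, down flags) for every plane whose oper state is not UP."""
    result = []
    for line in output.splitlines():
        m = ROW.match(line)  # header, legend and timestamp lines do not match
        if m and m["oper"] != "UP":
            result.append((int(m["plane"]), m["oper"], m["flags"] or ""))
    return result

SAMPLE = """\
 Plane  Admin  Oper        up->dn   Down   Total    Down
 Id     State  State       counter  Flags  Bundles  Bundles
 0      UP     UP          0               9        0
 1      UP     MCAST_DOWN  0        m      9        0
 7      UP     UP          1               9        0
"""

print(abnormal_planes(SAMPLE))  # [(1, 'MCAST_DOWN', 'm')]
```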
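Troubleshooting Steps
1. Check fabric connectivity. All 1s mean that every line card/RP has a good connection to all 8 planes. If a 1 is replaced by ".", that card/RP has a connectivity problem with the corresponding plane.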
(admin)# show controllers fabric connectivity all detail
Card         In   Tx Planes  Rx Planes  Monitored   Total       Percent
R/S/M        Use  01234567   01234567   For (s)     Uptime (s)  Uptime
-------------------------------------------------------------------------------
0/0/CPU0     1    11111111   11111111   335147      335147      100.0000
0/2/CPU0     1    11111111   11111111   335147      335147      100.0000
0/RP0/CPU0   1    11111111   11111111   335147      335147      100.0000
0/RP1/CPU0   1    11111111   11111111   335147      335147      100.0000
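Reading the Tx/Rx bitmaps by eye is error-prone on a large system. The following is a minimal Python sketch that lists, per card, the planes whose position in the bitmap is not "1". It assumes the row layout shown above (card, in-use, Tx bitmap, Rx bitmap, then uptime columns); `bad_plane_links` is a hypothetical helper, and the second sample row is invented for illustration only.

```python
def bad_plane_links(output: str) -> dict[str, dict[str, list[int]]]:
    """Map card -> {"tx": [...], "rx": [...]} listing planes not marked '1'."""
    problems = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or "/" not in fields[0]:
            continue  # skip "Card ..." header, dashes and blank lines
        card, tx, rx = fields[0], fields[2], fields[3]
        # Real bitmaps contain only '1' and '.'; this also skips the "01234567" header row.
        if len(tx) != 8 or len(rx) != 8 or not set(tx + rx) <= {"1", "."}:
            continue
        bad_tx = [plane for plane, c in enumerate(tx) if c != "1"]
        bad_rx = [plane for plane, c in enumerate(rx) if c != "1"]
        if bad_tx or bad_rx:
            problems[card] = {"tx": bad_tx, "rx": bad_rx}
    return problems

SAMPLE = """\
R/S/M        Use  01234567   01234567   For (s)     Uptime (s)  Uptime
0/0/CPU0     1    11111111   11111111   335147      335147      100.0000
0/2/CPU0     1    11111111   11.11111   335147      335147      100.0000
"""

print(bad_plane_links(SAMPLE))  # {'0/2/CPU0': {'tx': [], 'rx': [2]}}
```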
2. Most MCAST_DOWN problems are caused by links going down between an S3 ASIC on an S13 card and a FabricQ ASIC on a line card. Check for such links as follows:
(admin)#show controllers fabric link port fabricqr | exclude "UP UP"
Fri Mar 29 12:51:07.148 BeiJing
0/15/CPU0/0/4 UP DOWN l 0/SM1/SP/2/68
0/15/CPU0/0/5 UP DOWN l 0/SM1/SP/3/21
0/15/CPU0/0/6 UP DOWN l 0/SM1/SP/2/69
0/15/CPU0/0/7 UP DOWN l 0/SM1/SP/3/20
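- Within a plane, each S13 card has only 8 links to each line card (4 to each of two S3 ASICs). As soon as 2 or more of these links are down, the plane shows MCAST_DOWN. Here 4 links are already down, so both cards, 0/15/CPU0 and 0/SM1/SP, are suspect.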
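The rule of thumb above can be applied to the filtered link output with a short script that counts down links per (line card, S13 card) pair. The following is a minimal Python sketch, assuming the line format shown above (local port, admin state, oper state, optional flags, remote port) and the threshold of 2 stated in this step; `suspect_pairs` is a hypothetical helper.

```python
from collections import Counter

# Rule of thumb from step 2: 2 or more down links between one line card and
# one S13 card (i.e. within one plane) is enough to trigger MCAST_DOWN.
MCAST_DOWN_THRESHOLD = 2

def suspect_pairs(output: str, threshold: int = MCAST_DOWN_THRESHOLD) -> dict[tuple[str, str], int]:
    """Count down links per (line card, S13 card) pair; keep pairs at or above threshold."""
    down: Counter = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].count("/") < 4:
            continue  # skip timestamps, headers and blank lines
        local, oper, remote = fields[0], fields[2], fields[-1]
        if oper == "UP":
            continue
        line_card = "/".join(local.split("/")[:3])  # e.g. 0/15/CPU0
        s13_card = "/".join(remote.split("/")[:3])  # e.g. 0/SM1/SP
        down[(line_card, s13_card)] += 1
    return {pair: n for pair, n in down.items() if n >= threshold}

SAMPLE = """\
Fri Mar 29 12:51:07.148 BeiJing
0/15/CPU0/0/4 UP DOWN l 0/SM1/SP/2/68
0/15/CPU0/0/5 UP DOWN l 0/SM1/SP/3/21
0/15/CPU0/0/6 UP DOWN l 0/SM1/SP/2/69
0/15/CPU0/0/7 UP DOWN l 0/SM1/SP/3/20
"""

print(suspect_pairs(SAMPLE))  # {('0/15/CPU0', '0/SM1/SP'): 4}
```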
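3. To determine whether the fault lies with line card 0/15/CPU0 or S13 card 0/SM1/SP, and given that this is a 4+2 multi-chassis system, swap the plane 1 S13 cards of rack 0 and rack 1, then check whether the problem follows the S13 card. This procedure affects only one plane and should not affect customer traffic, but as a precaution perform it during a maintenance window. The steps are:
1. Shut down plane 1, then power off the plane 1 S13 fabric cards in rack 0 and rack 1: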
admin
conf t
controller fabric plane 1 shutdown
commit
hw-module power disable location 0/sm1/sp
hw-module power disable location 1/sm1/sp
commit
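2. Disconnect the ribbon cables from both fabric cards, swap the two cards, and reconnect the cables.
3. Power up both fabric cards, then bring plane 1 back up: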
no hw-module power disable location 0/sm1/sp
no hw-module power disable location 1/sm1/sp
commit
no controller fabric plane 1 shutdown
commit
4. Collect the output of the following commands:
show controllers fabric link port fabricqr | inc 0/15/CPU0/ | inc 0/SM1
show controllers fabric link port fabricqr | inc 0/15/CPU0/ | inc 1/SM1
show controllers fabric link port fabricqr
show inventory
show platform */SM1/SP
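If the same swap has to be repeated for another plane or rack pair, the command sequence can be generated from the commands in sub-step 1 and their "no" forms listed under Related Commands, instead of being typed by hand. The following is a minimal Python sketch; `swap_s13_commands` is a hypothetical helper, and it assumes that the S13 card for plane N sits in slot SM<N> of each rack, as SM1 does for plane 1 in this case. It only builds text, so review the output before pasting it into the admin configuration.

```python
def swap_s13_commands(plane: int, racks: tuple[int, int]) -> tuple[list[str], list[str]]:
    """Return (before_swap, after_swap) admin-mode command lists for one fabric plane.

    Assumption: slot SM<plane> holds the S13 card of that plane in every rack.
    """
    locations = [f"{rack}/SM{plane}/SP" for rack in racks]
    # Before the physical swap: take the plane down, then power off both cards.
    before = ["admin", "conf t", f"controller fabric plane {plane} shutdown", "commit"]
    before += [f"hw-module power disable location {loc}" for loc in locations]
    before.append("commit")
    # After the cards are swapped and recabled: power them on, then bring the plane up.
    after = [f"no hw-module power disable location {loc}" for loc in locations]
    after += ["commit", f"no controller fabric plane {plane} shutdown", "commit"]
    return before, after

before, after = swap_s13_commands(1, (0, 1))
print("\n".join(before))
print("\n".join(after))
```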
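4. Result: the command output showed that the down links followed the S13 card, as shown below. The S13 card originally installed in 0/SM1/SP was replaced under RMA, which resolved the problem.
(admin)#show controllers fabric link port fabricqr | inc 0/15/CPU0/ | inc 1/SM1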
Fri Mar 29 12:51:07.148 BeiJing
0/15/CPU0/0/4 UP DOWN l 1/SM1/SP/2/68
0/15/CPU0/0/5 UP DOWN l 1/SM1/SP/3/21
0/15/CPU0/0/6 UP DOWN l 1/SM1/SP/2/69
0/15/CPU0/0/7 UP DOWN l 1/SM1/SP/3/20
0/15/CPU0/1/4 UP UP 1/SM1/SP/2/15
0/15/CPU0/1/5 UP UP 1/SM1/SP/3/57
0/15/CPU0/1/6 UP UP 1/SM1/SP/2/14
0/15/CPU0/1/7 UP UP 1/SM1/SP/3/56
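The conclusion that the fault followed the S13 card can also be checked mechanically by comparing the link output captured before and after the swap: the same local ports should still be down, but their remote end should now be the slot the suspect card moved to. The following is a minimal Python sketch, assuming the line format shown above; `down_links` and `fault_follows_card` are hypothetical helpers.

```python
def down_links(output: str) -> dict[str, str]:
    """Map each local port whose oper state is not UP to the remote S13 card it connects to."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0].count("/") >= 4 and fields[2] != "UP":
            result[fields[0]] = "/".join(fields[-1].split("/")[:3])
    return result

def fault_follows_card(before: str, after: str, old_slot: str, new_slot: str) -> bool:
    """True if the same local ports are down, but now toward new_slot instead of old_slot."""
    b, a = down_links(before), down_links(after)
    return (bool(b) and b.keys() == a.keys()
            and set(b.values()) == {old_slot} and set(a.values()) == {new_slot})

BEFORE = """\
0/15/CPU0/0/4 UP DOWN l 0/SM1/SP/2/68
0/15/CPU0/0/5 UP DOWN l 0/SM1/SP/3/21
0/15/CPU0/0/6 UP DOWN l 0/SM1/SP/2/69
0/15/CPU0/0/7 UP DOWN l 0/SM1/SP/3/20
"""
AFTER = """\
0/15/CPU0/0/4 UP DOWN l 1/SM1/SP/2/68
0/15/CPU0/0/5 UP DOWN l 1/SM1/SP/3/21
0/15/CPU0/0/6 UP DOWN l 1/SM1/SP/2/69
0/15/CPU0/0/7 UP DOWN l 1/SM1/SP/3/20
0/15/CPU0/1/4 UP UP 1/SM1/SP/2/15
0/15/CPU0/1/5 UP UP 1/SM1/SP/3/57
"""

print(fault_follows_card(BEFORE, AFTER, "0/SM1/SP", "1/SM1/SP"))  # True
```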
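Lessons Learned
Each FabricQ ASIC on an egress LC/RP has 32 2.5-Gbps links to the S3 ASICs, spread across the 8 planes.
Each egress LC has 2 FabricQ ASICs.
Each RP has 1 FabricQ ASIC.
Therefore:
1. Within one plane, an egress LC has 8 links to the S13 card (32/8 + 32/8).
2. Within one plane, an LC in the upper half of the chassis (slots 0 to 7) has 4 links to S3 ASIC 0 and the other 4 to S3 ASIC 1.
An LC in the lower half (slots 8 to 15) has 4 links to S3 ASIC 2 and 4 to S3 ASIC 3.
So in this case there are 8 links in total on plane 1 between 0/15/CPU0 and 0/SM1/SP. Half of them were down, so the system brought down multicast on plane 1 to stop plane 1 from carrying multicast traffic to 0/15/CPU0.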
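The link counts above follow from simple arithmetic, sketched here in Python. The sketch assumes 32 links per FabricQ ASIC spread evenly over 8 planes, and a 16-slot chassis in which slots 8 to 15 form the lower half; `links_per_plane` and `s3_asics_for_slot` are hypothetical names.

```python
LINKS_PER_FABRICQ = 32  # 2.5-Gbps links per FabricQ ASIC, per the notes above
PLANES = 8

def links_per_plane(fabricq_asics: int) -> int:
    """Links from one egress card to the S13 card of a single plane."""
    return fabricq_asics * LINKS_PER_FABRICQ // PLANES

def s3_asics_for_slot(slot: int) -> tuple[int, int]:
    """S3 ASIC pair serving an LC slot: upper half (0-7) -> 0/1, lower half (8-15) -> 2/3."""
    return (0, 1) if slot <= 7 else (2, 3)

print(links_per_plane(2))     # 8  (LC with 2 FabricQ ASICs)
print(links_per_plane(1))     # 4  (RP with 1 FabricQ ASIC)
print(s3_asics_for_slot(15))  # (2, 3), matching the SP/2 and SP/3 ports of 0/15/CPU0
```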
Related Commands
show controllers fabric link port fabricqr | exclude "UP UP"
admin
conf t
controller fabric plane 1 shutdown
hw-module power disable location <location>
commit
no hw-module power disable location <location>
no controller fabric plane 1 shutdown
commit
Related Error Messages
RP/0/RP0/CPU0::Mar 5 09:20:44.963 : fsdb_aserver[210]: %FABRIC-FSDB-1-PLANE_UPDOWN : Plane 1 state changed to MCAST_DOWN;
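To catch this condition from collected logs rather than by polling the plane table, the message above can be matched with a regular expression. The following is a minimal Python sketch that assumes the exact message text shown above; other state names the router may log are not covered by this document, so the pattern accepts any word. `plane_state_changes` is a hypothetical helper.

```python
import re

# Assumes the "%FABRIC-FSDB-1-PLANE_UPDOWN : Plane <n> state changed to <STATE>" text shown above.
PLANE_UPDOWN = re.compile(
    r"%FABRIC-FSDB-1-PLANE_UPDOWN : Plane (?P<plane>\d+) state changed to (?P<state>\w+)"
)

def plane_state_changes(log_text: str) -> list[tuple[int, str]]:
    """Return (plane id, new state) for every PLANE_UPDOWN message in the log text."""
    return [(int(m["plane"]), m["state"]) for m in PLANE_UPDOWN.finditer(log_text)]

LOG = ("RP/0/RP0/CPU0::Mar 5 09:20:44.963 : fsdb_aserver[210]: "
       "%FABRIC-FSDB-1-PLANE_UPDOWN : Plane 1 state changed to MCAST_DOWN;")

print(plane_state_changes(LOG))  # [(1, 'MCAST_DOWN')]
```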
Related Documents
Original article:
http://www.cisco.com/cisco/web/support/CN/111/1117/1117750_McastDownCaseStudy.html