欧卡2中文社区

[System Maintenance] NIC link drops intermittently

oppo posted on 2014-11-7 13:26
Last edited by oppo on 2014-11-7 13:53

Symptoms:
1. ifconfig reports RX errors and dropped packets
[root@mars71 10.55.22.71 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr A4:BA:DB:2A:7F:B2  
          inet addr:10.54.22.71  Bcast:10.54.22.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:19690063551 errors:1651 dropped:1651 overruns:0 frame:1651
          TX packets:286957776 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2770004192553 (2.5 TiB)  TX bytes:37963611935 (35.3 GiB)
          Interrupt:106 Memory:d6000000-d6012800 

[root@mars71 10.55.22.71 ~]# ifconfig eth1
eth1      Link encap:Ethernet  HWaddr A4:BA:DB:2A:7F:B4  
          inet addr:10.55.22.71  Bcast:10.55.22.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:146800819704 errors:14 dropped:60724 overruns:0 frame:14
          TX packets:140655980537 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:15526976368356 (14.1 TiB)  TX bytes:85106426142042 (77.4 TiB)
          Interrupt:114 Memory:d8000000-d8012800 

[root@mars71 10.55.22.71 ~]#



2. /var/log/messages shows the NIC link going down and coming back up intermittently
[root@mars71 10.55.22.71 ~]# cat /var/log/messages | grep bnx2
Nov  2 19:01:35 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  2 19:02:33 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  2 19:02:46 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  2 19:03:56 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  2 19:21:48 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  2 19:22:47 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  2 19:22:59 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  2 19:24:08 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 01:55:56 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 01:56:54 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 01:57:06 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 01:58:15 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 02:36:58 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 02:37:56 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 02:38:09 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 02:39:17 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 02:56:08 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 02:57:06 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 02:57:19 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 02:58:27 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:04:37 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:05:35 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:05:47 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:06:56 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:15:37 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:16:35 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:16:48 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:17:56 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:19:38 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:20:36 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:20:49 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:21:58 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:22:02 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:23:00 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:23:12 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:24:21 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:31:43 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:32:41 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 03:32:54 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 03:34:02 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 11:23:39 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 11:24:37 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 11:24:50 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 11:25:58 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 17:32:48 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 17:33:46 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 17:33:58 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 17:35:07 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 20:03:59 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 20:04:57 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 20:05:09 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 20:06:19 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  6 20:06:20 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  6 20:06:22 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  7 12:17:16 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  7 12:18:14 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  7 12:18:27 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  7 12:19:35 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  7 12:51:12 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  7 12:52:11 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
Nov  7 12:52:23 mars71 kernel: bnx2: eth1 NIC Copper Link is Down
Nov  7 12:53:33 mars71 kernel: bnx2: eth1 NIC Copper Link is Up, 1000 Mbps full duplex
[root@mars71 10.55.22.71 ~]#
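
The flap pattern in the log above is easier to judge once it is tallied per day. A minimal sketch, assuming syslog lines in exactly the format shown (interface name in the 7th whitespace-separated field); `count_link_downs` is a hypothetical helper name, not part of any existing tool:

```shell
# Count bnx2 "Link is Down" events per day and interface.
# Reads syslog-format lines on stdin; prints "<month> <day> <iface> <count>".
count_link_downs() {
    awk '/bnx2: .* NIC Copper Link is Down/ {
             key = $1 " " $2 " " $7              # month, day, interface (e.g. eth1)
             if (!(key in n)) order[++k] = key   # remember first-seen order
             n[key]++
         }
         END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}
```

For example, `grep bnx2 /var/log/messages | count_link_downs`. Run against the excerpt above it would report 4 link-down events on Nov 2, 23 on Nov 6 and 4 on Nov 7, all on eth1.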




3. Server model
[root@mars71 10.55.22.71 ~]# dmidecode -s system-product-name
PowerEdge R710
[root@mars71 10.55.22.71 ~]#



4. NIC model
[root@mars71 10.55.22.71 ~]# lspci |grep Ethe
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
[root@mars71 10.55.22.71 ~]#



5. NIC driver information
[root@mars71 10.55.22.71 ~]# lsmod |grep bnx2
bnx2                  211976  0 
[root@mars71 10.55.22.71 ~]# modinfo bnx2
filename:       /lib/modules/2.6.18-164.el5/updates/bnx2.ko
version:        1.9.20b
license:        GPL
description:    Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver
author:         Michael Chan <mchan@broadcom.com>
srcversion:     824BF2D5650956C545AC5BD
alias:          pci:v000014E4d0000163Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Bsv*sd*bc*sc*i*
alias:          pci:v000014E4d0000163Asv*sd*bc*sc*i*
alias:          pci:v000014E4d00001639sv*sd*bc*sc*i*
alias:          pci:v000014E4d000016ACsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv*sd*bc*sc*i*
alias:          pci:v000014E4d000016AAsv0000103Csd00003102bc*sc*i*
alias:          pci:v000014E4d0000164Csv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv*sd*bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003106bc*sc*i*
alias:          pci:v000014E4d0000164Asv0000103Csd00003101bc*sc*i*
depends:        
vermagic:       2.6.18-164.el5 SMP mod_unload gcc-4.1
parm:           disable_msi:Disable Message Signaled Interrupt (MSI) (int)
[root@mars71 10.55.22.71 ~]# ethtool -i eth0
driver: bnx2
version: 1.9.20b
firmware-version: 5.0.11 NCSI 2.0.5
bus-info: 0000:01:00.0
[root@mars71 10.55.22.71 ~]# ethtool -i eth1
driver: bnx2
version: 1.9.20b
firmware-version: 5.0.11 NCSI 2.0.5
bus-info: 0000:01:00.1
[root@mars71 10.55.22.71 ~]#



OP | oppo posted on 2014-11-7 13:53

6. NIC settings and statistics
[root@mars71 10.55.22.71 ~]# ethtool eth0
Settings for eth0:
 Supported ports: [ TP ]
 Supported link modes:   10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Full 
 Supports auto-negotiation: Yes
 Advertised link modes:  10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Full 
 Advertised auto-negotiation: Yes
 Speed: 1000Mb/s
 Duplex: Full
 Port: Twisted Pair
 PHYAD: 1
 Transceiver: internal
 Auto-negotiation: on
 Supports Wake-on: g
 Wake-on: d
 Link detected: yes
[root@mars71 10.55.22.71 ~]# ethtool eth1
Settings for eth1:
 Supported ports: [ TP ]
 Supported link modes:   10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Full 
 Supports auto-negotiation: Yes
 Advertised link modes:  10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Full 
 Advertised auto-negotiation: Yes
 Speed: 1000Mb/s
 Duplex: Full
 Port: Twisted Pair
 PHYAD: 1
 Transceiver: internal
 Auto-negotiation: on
 Supports Wake-on: g
 Wake-on: d
 Link detected: yes
[root@mars71 10.55.22.71 ~]# ethtool -S eth0
NIC statistics:
     rx_bytes: 2770006801239
     rx_error_bytes: 0
     tx_bytes: 37963884095
     tx_error_bytes: 0
     rx_ucast_packets: 3000190
     rx_mcast_packets: 19421846269
     rx_bcast_packets: 265238107
     tx_ucast_packets: 4107840
     tx_mcast_packets: 282762689
     tx_bcast_packets: 89180
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 272366238
     rx_65_to_127_byte_packets: 4009624276
     rx_128_to_255_byte_packets: 3942158872
     rx_256_to_511_byte_packets: 2875999558
     rx_512_to_1023_byte_packets: 515
     rx_1024_to_1522_byte_packets: 515
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 591313
     tx_65_to_127_byte_packets: 220686866
     tx_128_to_255_byte_packets: 24661999
     tx_256_to_511_byte_packets: 40981691
     tx_512_to_1023_byte_packets: 8418
     tx_1024_to_1522_byte_packets: 29422
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 120698643
     rx_ftq_discards: 1651
     rx_discards: 0
     rx_fw_discards: 0
[root@mars71 10.55.22.71 ~]# ethtool -S eth1
NIC statistics:
     rx_bytes: 15527005939734
     rx_error_bytes: 0
     tx_bytes: 85106503204516
     tx_error_bytes: 0
     rx_ucast_packets: 146261227969
     rx_mcast_packets: 94563329
     rx_bcast_packets: 445395510
     tx_ucast_packets: 140656099014
     tx_mcast_packets: 0
     tx_bcast_packets: 194950
     tx_mac_errors: 0
     tx_carrier_errors: 0
     rx_crc_errors: 0
     rx_align_errors: 0
     tx_single_collisions: 0
     tx_multi_collisions: 0
     tx_deferred: 0
     tx_excess_collisions: 0
     tx_late_collisions: 0
     tx_total_collisions: 0
     rx_fragments: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_oversize_packets: 0
     rx_64_byte_packets: 4271581008
     rx_65_to_127_byte_packets: 934351903
     rx_128_to_255_byte_packets: 694033497
     rx_256_to_511_byte_packets: 2405688231
     rx_512_to_1023_byte_packets: 479280699
     rx_1024_to_1522_byte_packets: 577297998
     rx_1523_to_9022_byte_packets: 0
     tx_64_byte_packets: 197721568
     tx_65_to_127_byte_packets: 2835597863
     tx_128_to_255_byte_packets: 3916619194
     tx_256_to_511_byte_packets: 854664693
     tx_512_to_1023_byte_packets: 3524016669
     tx_1024_to_1522_byte_packets: 478655097
     tx_1523_to_9022_byte_packets: 0
     rx_xon_frames: 0
     rx_xoff_frames: 0
     tx_xon_frames: 0
     tx_xoff_frames: 0
     rx_mac_ctrl_frames: 0
     rx_filtered_packets: 1315681107
     rx_ftq_discards: 14
     rx_discards: 0
     rx_fw_discards: 60710
[root@mars71 10.55.22.71 ~]#
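
One way to relate the two views of the same interface is to add up the discard counters from `ethtool -S` and compare the sum with the `dropped` figure from `ifconfig`. In the output above the numbers agree for both interfaces (eth0: 1651; eth1: 14 + 60710 = 60724). A small sketch follows; `sum_rx_discards` is a hypothetical helper, and the idea that the driver derives `dropped` from exactly these three counters is an assumption that the thread does not confirm:

```shell
# Sum rx_discards, rx_ftq_discards and rx_fw_discards from `ethtool -S` output on stdin.
sum_rx_discards() {
    awk -F': *' '
        $1 ~ /^[ \t]*rx_(ftq_|fw_)?discards$/ { total += $2 }
        END { printf "%.0f\n", total }'
}
```

Usage: `ethtool -S eth1 | sum_rx_discards`, then compare the result with the `dropped:` value on the RX line of `ifconfig eth1`.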
OP | oppo posted on 2014-11-7 14:04
Collected solutions


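1, http://blog.itpub.net/8183550/viewspace-694885
The NICs in Dell R610/R710 servers can drop out during use, so that GP and Hadoop stop working properly or cannot reach the master or NameNode until the NIC is restarted by hand. According to this post, the cause is a problem in the CentOS (or Red Hat) driver for the Broadcom NetXtreme II BCM5709 used in these Dell servers: while the NIC is working, ACPI (power management) treats it as idle and switches it off.
Two fixes are therefore worth trying. The first is to upgrade the NIC driver, which can be downloaded from http://www.broadcom.com/support/ethernet_nic/downloaddrivers.php
The second is to disable ACPI power management.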

2, http://hi.baidu.com/cznanjibing/item/d4d1b50e94f4b0c12f4c6bdb
The Broadcom NetXtreme II BCM5709 driver in Red Hat AS 5.x has a bug that makes the NIC prone to dropping out under load. ifconfig then shows:
RX packets:10487593 errors:4756121 dropped:0 overruns:0 frame:4756121
TX packets:10829687 errors:0 dropped:0 overruns:0 carrier:0
Restarting the NIC restores service, but the fault comes back after a while.
Fix: upgrade the NIC driver

OP | oppo posted on 2014-11-7 14:09
3, http://suchalin.blog.163.com/blog/static/5530467720114230617948/


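Dell and HP servers on the production network have both started to show NICs going down unexpectedly and interrupting service. This write-up combines the available information with our own tracing of the NIC fault and gives an analysis and recommendations.


I. Problem analysis and summary
The Dell PE610 uses a Broadcom 5709C NIC. For the network instability seen on Linux under heavy network I/O, see the Red Hat KB article https://access.redhat.com/kb/docs/DOC-26837 (details in the attachment) [a Red Hat account is currently required to read it].

The article points to the erratum that fixes this NIC bug: https://rhn.redhat.com/errata/RHSA-2010-0398.html

1.       NIC interrupt modes, how they differ, and how the operating system chooses among them

1)         How NIC interrupt modes evolved: INTx, MSI, MSI-X

The operating system currently recognizes three types of interrupt: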

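* Legacy interrupts (INTx): legacy or fixed interrupts are those used by older bus technologies. With these technologies an interrupt is signaled on one or more external pins that are wired "out of band", that is, separately from the main lines of the bus. Newer bus technologies such as PCI Express keep software compatibility by emulating legacy interrupts through an in-band mechanism. The host OS treats these emulated interrupts as legacy interrupts.

* Message-signaled interrupts (MSI): instead of using pins, MSIs are in-band messages and can target addresses in the host bridge. (For more about host bridges, see "PCI Local Bus".) An MSI can carry data along with the interrupt message. MSIs are not shared, so an MSI assigned to a device is guaranteed to be unique within the system. A PCI function can request up to 32 MSI messages.

* Extended message-signaled interrupts (MSI-X): MSI-X is an enhanced version of MSI. MSI-X interrupts add the following advantages:
  - Support for 2048 messages rather than 32
  - An independent message address and message data for each message
  - Per-message masking
  - More flexibility when software allocates fewer vectors than the hardware requests: software can reuse the same MSI-X address and data in multiple MSI-X slots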

2)         Differences between MSI and MSI-X

The Broadcom NIC manual contains this passage:



MSI Version. This is the Message Signaled Interrupts (MSI) version being used. The option MSI corresponds to the PCI 2.2 specification that supports 32 messages and a single MSI address value. The option MSI-X corresponds to the PCI 3.0 specification that supports 2,048 messages and an independent message address for each message.



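That explains it. In practice MSI makes poor use of a multi-core CPU: all of the NIC's interrupts land on a single CPU, and setting CPU affinity makes no difference, whereas MSI-X can spread the interrupts across several CPUs automatically.

3)         How Linux chooses the NIC interrupt mode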

By default, the driver enables MSI if it is supported by the kernel. It runs an interrupt test during initialization to determine if MSI is working. If the test passes, the driver enables MSI. Otherwise, it uses legacy INTx mode.



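In other words, the driver enables MSI by default whenever the kernel supports it: it runs an interrupt test during initialization, uses MSI if the test passes, and falls back to INTx mode only if it fails.

4)         How to check which interrupt mode the NIC is using

Lines like the following in the output of cat /proc/interrupts show the NIC's interrupt mode: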

11265195     211176       PCI-MSI-X  eth0-0

54549       7408668       PCI-MSI-X  eth0-1

5)         How to check the NIC driver version

ethtool -i eth0

driver: bnx2

version: 1.9.3

firmware-version: 5.2.2 NCSI 2.0.6

bus-info: 0000:10:00.0

         modinfo bnx2

filename:       /lib/modules/2.6.18-164.el5PAE/kernel/drivers/net/bnx2.ko

version:        1.9.3

license:        GPL

description:    Broadcom NetXtreme II BCM5706/5708/5709/5716 Driver

author:         Michael Chan <mchan@broadcom.com>

srcversion:     D151EAED8C1037CA480DE9A
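
The check in item 4) above can be scripted. This is a sketch only: `irq_mode` is a hypothetical helper, and it assumes the interrupt type appears as its own column and the interface name (optionally followed by `-<queue>`) is the last column, as in the sample lines shown. The exact type strings vary between kernel versions.

```shell
# Print the distinct interrupt types on the lines that belong to one interface.
# Reads /proc/interrupts-style text on stdin; $1 is the interface name.
irq_mode() {
    awk -v ifc="$1" '
        $NF == ifc || index($NF, ifc "-") == 1 {
            for (i = 1; i < NF; i++)
                if ($i ~ /MSI|IO-APIC|XT-PIC/) { print $i; break }
        }' | sort -u
}
```

Usage: `irq_mode eth0 < /proc/interrupts`. A result containing `MSI-X` means the per-queue vectors discussed in this post are in use.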

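2.       According to the erratum, the NIC interrupt problem was in fact fixed on 2010-05-06 by a kernel update (kernel-2.6.18-194.3.1 or later). The same update also fixes a second bnx2 problem that is unrelated to MSI-X.

The fix log follows, quoted in part from the Red Hat erratum.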

Important: kernel security and bug fix update

Advisory: RHSA-2010:0398-1
Type: Security Advisory
Severity: Important
Issued on: 2010-05-06
Last updated on: 2010-05-06
Affected Products: Red Hat Enterprise Linux (v. 5 server), Red Hat Enterprise Linux Desktop (v. 5 client)
OVAL: com.redhat.rhsa-20100398.xml
CVEs (cve.mitre.org): CVE-2010-0307, CVE-2010-0410, CVE-2010-0730, CVE-2010-1085, CVE-2010-1086


Details





Updated kernel packages that fix multiple security issues and several bugs

are now available for Red Hat Enterprise Linux 5.



The Red Hat Security Response Team has rated this update as having

important security impact. Common Vulnerability Scoring System (CVSS) base

scores, which give detailed severity ratings, are available for each

vulnerability from the CVE links in the References section.

The kernel packages contain the Linux kernel, the core of any Linux

operating system.

。。。。

This update fixes the following security issues:

* in certain circumstances, under heavy load, certain network interface

cards using the bnx2 driver and configured to use MSI-X, could stop

processing interrupts and then network connectivity would cease.

(BZ#587799)



* cnic parts resets could cause a deadlock when the bnx2 device was

enslaved in a bonding device and that device had an associated VLAN.

(BZ#581148)

。。。。。。

Users should upgrade to these updated packages, which contain backported

patches to correct these issues. The system must be rebooted for this

update to take effect.



OP | oppo posted on 2014-11-7 14:09

3.   The patch below, posted on 2010-04-27 by the author of the Broadcom NetXtreme II BCM5709 Gigabit Ethernet driver, shows exactly which part of the driver was changed.

bnx2: Fix lost MSI-X problem on 5709 NICs

Submitter: Michael Chan (the author of this NIC driver, as modinfo bnx2 shows)

Date: 2010-04-27 21:28:09

Message ID: <1272403691-2934-1-git-send-email-mchan@broadcom.com>

Comments

Michael Chan - 2010-04-27 21:28:09

It has been reported that under certain heavy traffic conditions in MSI-X

mode, the driver can lose an MSI-X vector causing all packets in the

associated rx/tx ring pair to be dropped.  The problem is caused by

the chip dropping the write to unmask the MSI-X vector by the kernel

(when migrating the IRQ for example).



This can be prevented by increasing the GRC timeout value for these

register read and write operations.



Thanks to Dell for helping us debug this problem.



Signed-off-by: Michael Chan <mchan@broadcom.com>

---

drivers/net/bnx2.c |    6 +++++-

1 files changed, 5 insertions(+), 1 deletions(-)

David Miller - 2010-04-27 21:38:25

From: "Michael Chan" <mchan@broadcom.com>

Date: Tue, 27 Apr 2010 14:28:09 -0700



> It has been reported that under certain heavy traffic conditions in MSI-X

> mode, the driver can lose an MSI-X vector causing all packets in the

> associated rx/tx ring pair to be dropped.  The problem is caused by

> the chip dropping the write to unmask the MSI-X vector by the kernel

> (when migrating the IRQ for example).

>

> This can be prevented by increasing the GRC timeout value for these

> register read and write operations.

>

> Thanks to Dell for helping us debug this problem.

>

> Signed-off-by: Michael Chan <mchan@broadcom.com>



Applied to net-2.6


Patch

diff --git a/drivers/net/bnx2.c b/drivers/net/bnx2.c
index a257bab..4c1e51e 100644
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -4759,8 +4759,12 @@  bnx2_reset_chip(struct bnx2 *bp, u32 reset_code)
                 rc = bnx2_alloc_bad_rbuf(bp);
         }
 
-        if (bp->flags & BNX2_FLAG_USING_MSIX)
+        if (bp->flags & BNX2_FLAG_USING_MSIX) {
                 bnx2_setup_msix_tbl(bp);
+                /* Prevent MSIX table reads and write from timing out */
+                REG_WR(bp, BNX2_MISC_ECO_HW_CTL,
+                        BNX2_MISC_ECO_HW_CTL_LARGE_GRC_TMOUT_EN);
+        }
 
         return rc;
 }

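4.   The NIC driver source in the Red Hat 5.4 kernel was checked and does not contain this patch.

5.   Some hosts on the production network were earlier upgraded to NIC driver 2.0.18c. That driver's source does not contain the patch either, so upgrading to 2.0.18c does not help.

6.   The latest bug-fix kernel that Red Hat has released for the RHEL 5 series is kernel-PAE-2.6.18-194.32.1.el5.i686.rpm. Its source was checked and does contain the patch.

7.   Broadcom's website now carries driver bnx2-2.0.23b for the BCM5709 family. Its source was checked as well and confirms that Broadcom has included the patch in the released driver.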
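
The findings above boil down to a version comparison: kernels from 2.6.18-194.3.1 onward carry the fix. A rough sketch of that check follows. `kernel_at_least` is a hypothetical helper that compares only the numeric fields before `.el`, which is an assumption about how RHEL 5 kernel releases are named. It says nothing about a driver installed separately under `/lib/modules/<release>/updates/`, as on the host in the first post.

```shell
# Succeed if kernel release $1 is at least release $2.
# Only the numeric part before ".el" is compared, e.g. 2.6.18-194.3.1.
kernel_at_least() {
    awk -v have="${1%%.el*}" -v want="${2%%.el*}" 'BEGIN {
        nh = split(have, h, /[.-]/)
        nw = split(want, w, /[.-]/)
        n = (nh > nw) ? nh : nw
        for (i = 1; i <= n; i++) {
            a = (i <= nh) ? h[i] + 0 : 0    # missing fields count as 0
            b = (i <= nw) ? w[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}
```

Usage: `kernel_at_least "$(uname -r)" 2.6.18-194.3.1.el5 || echo "in-box bnx2 predates the fix"`.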


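II. Recommendations for the production network
1.       Disabling MSI is not recommended. The reason is the difference between interrupt modes described in Part I: with MSI off, all of the NIC's interrupts land on a single CPU. If you do want to disable it, pass disable_msi=1 when loading the module and add the option to the system configuration file.

2.       If conditions allow (a kernel upgrade only takes effect after a reboot), we recommend fixing the NIC fault by upgrading the kernel alone, because the new kernel fixes many other system bugs at the same time. Since only the kernel changes, the rest of the production environment (java, python, gcc and so on) stays as it is. Do not upgrade the whole system straight from the CentOS repositories: that would change python and the rest of the environment and could disrupt production services. Use the yum repository provided by the operations team (an internal repository set up specifically for this kernel upgrade).

3.       If a kernel upgrade is not possible, build the latest driver published on Broadcom's website and update the NIC driver to bnx2-2.0.23b.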


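III. Detailed steps for the production network
1.      Disable MSI. For the exact parameter settings see the attached DOC-26837. This is not recommended.

2.      Upgrade the kernel with yum

1)         The operations team will provide yum repositories for the main operating system versions in production. Put the provided repo file in /etc/yum.repos.d/.

For example: cp rhel-server-5.4-i386.repo /etc/yum.repos.d/

2)         We recommend backing up and removing the other files already in /etc/yum.repos.d/.

3)         yum clean all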

yum update kernel

Loaded plugins: fastestmirror

Determining fastest mirrors

Cluster                                       | 1.1 kB     00:00   

ClusterStorage                                 | 1.1 kB     00:00   

Server                                        | 1.1 kB     00:00   

Server/primary                                 | 818 kB     00:00   

Server                                                 2293/2293

VT                                            | 1.1 kB     00:00   

Setting up Update Process

Resolving Dependencies

--> Running transaction check

---> Package kernel-PAE.i686 0:2.6.18-194.32.1.el5 set to be installed

--> Finished Dependency Resolution



Dependencies Resolved



===========================================================================

Package        Arch      Version          Repository          Size

===========================================================================

Installing:

kernel-PAE     i686      2.6.18-194.32.1.el5       Server             17 M



Transaction Summary

===========================================================================

Install      1 Package(s)        

Update       0 Package(s)        

Remove       0 Package(s)        



Total download size: 17 M

Is this ok [y/N]: y

Downloading Packages:

kernel-PAE-2.6.18-194.32.1.el5.i686.rpm                   |  17 MB     00:00   

Running rpm_check_debug

Running Transaction Test

Finished Transaction Test

Transaction Test Succeeded

Running Transaction

  Installing     : kernel-PAE                           1/1

Installed:

  kernel-PAE.i686 0:2.6.18-194.32.1.el5                                                                                            



Complete!

4)         cat /boot/grub/grub.conf

default=0

timeout=5

splashimage=(hd0,0)/boot/grub/splash.xpm.gz

hiddenmenu

title CentOS (2.6.18-194.32.1.el5PAE)

        root (hd0,0)

        kernel /boot/vmlinuz-2.6.18-194.32.1.el5PAE ro root=LABEL=/1 rhgb quiet

        initrd /boot/initrd-2.6.18-194.32.1.el5PAE.img

title CentOS (2.6.18-164.el5PAE)

        root (hd0,0)

        kernel /boot/vmlinuz-2.6.18-164.el5PAE ro root=LABEL=/1 rhgb quiet

        initrd /boot/initrd-2.6.18-164.el5PAE.img

Check that default points to the title of the newest kernel.
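
The check on `default` can be automated. A minimal sketch, assuming a GRUB legacy `grub.conf` in which `default=N` appears before the `title` lines and `N` is a zero-based index; `grub_default_title` is a hypothetical helper name:

```shell
# Print the title of the entry that grub.conf (GRUB legacy) boots by default.
# Reads the file on stdin.
grub_default_title() {
    awk '
        /^[ \t]*default[ \t]*=/ { split($0, kv, "="); def = kv[2] + 0 }
        /^[ \t]*title[ \t]/ {
            # n counts title lines from 0; stop at the one selected by default=
            if (n++ == def) { sub(/^[ \t]*title[ \t]+/, ""); print; exit }
        }'
}
```

Usage: `grub_default_title < /boot/grub/grub.conf`. For the listing above, with `default=0`, it prints `CentOS (2.6.18-194.32.1.el5PAE)`.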

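3.      Download the latest driver published on Broadcom's website

1)         wget -c http://zh-cn.broadcom.com/docs/d ... II/linux-6.2.23.zip

2)         Build and install

A.        Installing from the src package is recommended

Unzip the archive and locate linux-6.2.23.zip\Server\Linux\Driver\netxtreme2-6.2.23-1.src.rpm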

rpm -ivh netxtreme2-<version>.src.rpm

cd /usr/src/redhat

                         rpmbuild -bb SPECS/netxtreme2.spec

                           The built RPM is at RPMS/<arch>/netxtreme2-<version>.<arch>.rpm

                            Pick the package for your architecture and install it, for example:

rpm -ivh RPMS/i386/netxtreme2-<version>.i386.rpm

B.        Build and install from the tar.gz package

tar xvzf netxtreme2-<version>.tar.gz

make

make install

3)         Reload the NIC module

rmmod bnx2;modprobe bnx2

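Note: reloading the module interrupts the network for a few seconds. If all goes well, existing connections are not dropped.

Notes:

1.         Upgrading the kernel is the recommended fix.

2.         Disabling MSI is not recommended.

3.         If you are not building the latest driver from Broadcom's website, you have to build from source, and bnx2.c must be patched before you build and install it.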
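OP | oppo posted on 2014-11-7 14:33
4, http://blog.sina.com.cn/s/blog_761eecb401015yo1.html
For the NIC hangs seen on Red Hat, see the document below.
** The parts in red are what needs to be changed. Note: reboot the machine after making the change.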
5709 NIC stop receiving packets intermittently on RHEL 5.3 and newer?
Article ID: 26837 - Created on: Mar 2, 2010 9:41 PM - Last Modified:  Mar 24, 2011 5:05 PM
Issue
In certain situations under heavy loads, the network interface card can stop accepting packets from remote devices.
This problem has been reported on Red Hat Enterprise Linux 5.3 (RHEL 5.3) and newer when using a Broadcom NetXtreme 5709 network interface card.
Environment
Red Hat Enterprise Linux 5.3 to 5.5
Network Interface Cards (NIC) using the bnx2 driver including:
Broadcom Corporation NetXtreme II BCM5709S Gigabit Ethernet
Resolution
Red Hat has released kernel-2.6.18-194.3.1.el5 which will address this issue in RHEL 5. It can be downloaded from the following link:
https://rhn.redhat.com/errata/RHSA-2010-0398.html
* in certain circumstances, under heavy load, certain network interface cards using the bnx2 driver and configured to use MSI-X, could stop processing interrupts and then network connectivity would cease.  (BZ#587799)
If upgrading the kernel is not an option, review the following workarounds:
* Disable MSI-X in the bnx2 driver. To do this, add the following line to /etc/modprobe.conf:
  options bnx2 disable_msi=1
* Disable MSI completely by booting with the pci=nomsi boot parameter. Obviously, this will disable MSI on all devices that are able to utilize it.
  Note: MSI-X increases network performance so disabling it means that the performance will return to the level available before MSI-X was introduced.
* Disable C-States in BIOS. Refer to the vendor system documentation in order to learn how to do this.
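
Before applying the first workaround it may help to check whether the option is already present. The sketch below uses `bnx2_msi_disabled`, a hypothetical helper that looks for an uncommented `options bnx2 ... disable_msi=1` line; it only inspects the configuration text and does not tell you whether the running module was loaded with that option.

```shell
# Succeed if the modprobe configuration on stdin disables MSI for bnx2.
bnx2_msi_disabled() {
    grep -Eq '^[[:space:]]*options[[:space:]]+bnx2[[:space:]]+([^#]*[[:space:]])?disable_msi=1([[:space:]]|$)'
}
```

Usage: `bnx2_msi_disabled < /etc/modprobe.conf || echo 'options bnx2 disable_msi=1' >> /etc/modprobe.conf`. As the poster notes, a reboot is needed afterwards for the change to take effect.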
Root Cause
The kernel gets out of sync with interrupts generated by the network interface card which results in an inability to process interrupts, causing packets to be dropped and ultimately, lost connectivity.
When this situation occurs, the rx_fw_discards counter will keep increasing as remote devices unsuccessfully attempt to communicate with the system via the NIC.
It has been reported that under certain heavy traffic conditions in MSI-X mode, the bnx2 driver can lose an MSI-X vector causing all packets in the associated rx/tx ring pair to be dropped. The problem is caused by the chip dropping the write to unmask the MSI-X vector by the kernel (when migrating the IRQ for example). This can be prevented by increasing the GRC timeout value for these register read and write operations.
The upstream patch resolving this issue is available here:
http://git.kernel.org/?p=linux/k ... 558d6d95d8944e56a84
Diagnostic Steps
The kernel gets out of sync with regard to the interrupts generated by the network interface card which prevents the reception of packets on this network device which results in no processing of interrupts, dropped packets and ultimately, lost connectivity.
When this situation occurs, the rx_fw_discards counter displayed by the ethtool utility will keep increasing in value as remote devices unsuccessfully attempt to communicate with the system via the NIC.
It should be noted that packets are occasionally dropped by the NIC as part of normal operation which causes rx_fw_discards to increment, but this does not necessarily indicate the issue in question has manifested.
The keys to determining that this specific problem has occurred are:
Confirm that all packets sent to the NIC are dropped by repeatedly using this command:
# ethtool -S eth0 | grep rx_fw_discards
(Replace "eth0" with the interface that appears to be having trouble receiving)

Each time this command is executed, the value returned should increase from the previous run as a result of remote devices attempting to communicate with the NIC in question.  The numbers should increase similar to this:
     rx_fw_discards: 53843
     rx_fw_discards: 55467
     rx_fw_discards: 57071
     rx_fw_discards: 58791
     rx_fw_discards: 60596
     rx_fw_discards: 62481
     rx_fw_discards: 64285
     rx_fw_discards: 66069

Confirm that the number of interrupts processed does not increase  on the IRQs assigned to the NIC by repeatedly using this command:
# grep eth0 /proc/interrupts
(Modify  "eth0" with the name of the interface where trouble is suspected.)

The command should be run while remote devices are attempting to transmit to the failing system. Normally, each counter for the interrupts listed for that interface (e.g. eth0) should increase as packets are received from remote devices. In this situation being described here, the interrupt counter(s) should stop incrementing. In severe cases, the counters for all interrupts can remain constant and then the interface will receive no packets from any remote device.
Typically there is no syslog or dmesg output to indicate the issue has  occurred.
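
The two diagnostic checks above can be combined into one sampling routine. This is a sketch under stated assumptions, not Red Hat's procedure: the helper names (`irq_total`, `looks_stalled`, `check_stall`) and the 5-second default interval are hypothetical, and the result is only meaningful while remote hosts are actively sending to the interface.

```shell
# Sum the per-CPU interrupt counts on the lines belonging to one interface.
# Reads /proc/interrupts-style text on stdin; $1 is the interface name.
irq_total() {
    awk -v ifc="$1" '
        index($NF, ifc) == 1 {
            for (i = 2; i < NF; i++)           # field 1 is the IRQ number
                if ($i ~ /^[0-9]+$/) sum += $i
        }
        END { printf "%.0f\n", sum }'
}

# Succeed when rx_fw_discards rose while the interrupt total stayed flat.
# Arguments: discards_before discards_after irqs_before irqs_after
looks_stalled() {
    [ "$2" -gt "$1" ] && [ "$4" -eq "$3" ]
}

# Take two samples on interface $1, ${2:-5} seconds apart, and report.
check_stall() {
    d0=$(ethtool -S "$1" | awk '/rx_fw_discards:/ { print $2 }')
    i0=$(irq_total "$1" < /proc/interrupts)
    sleep "${2:-5}"
    d1=$(ethtool -S "$1" | awk '/rx_fw_discards:/ { print $2 }')
    i1=$(irq_total "$1" < /proc/interrupts)
    if looks_stalled "$d0" "$d1" "$i0" "$i1"; then
        echo "$1: discards rising, interrupts flat (matches the bnx2 MSI-X symptom)"
    else
        echo "$1: no stall signature in this sample"
    fi
}
```

Usage: `check_stall eth1 10`. A single clean sample does not rule the problem out, since the fault is intermittent.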
OP | oppo posted on 2014-11-7 15:53
Last edited by oppo on 2014-11-7 16:20

http://support.huawei.com/enterp ... tentId=KB1000008298
http://guomt.blog.51cto.com/150883/1205426
http://sa.028life.com/209.html#more-209
OP | oppo posted on 2014-11-7 16:20
http://blog.chinaunix.net/uid-10915175-id-3390864.html
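I recently ran into several problems while using Linux for multi-exit NAT. After a week of searching and hands-on testing they are finally solved. The problems were complex and my own skills are limited, so getting there was painful. I am writing the process down so that others suffer less, or not at all.

First, the environment

1. Hardware: Dell R410

2. NIC: onboard 1000M BCM5709

3. OS: RHEL 5.5 x86_64

4. Kernel: 2.6.18-194.el5

The problems

1. The NIC goes down without warning and leaves nothing in the logs

2. As traffic grows, well before it reaches one third of the theoretical limit, the machine shows severe network latency together with heavy packet loss

3. Softirq load is unbalanced across CPUs: only one CPU handles softirqs, and its softirq usage periodically reaches 100%

4. When doing NAT, the drop counts on the internal and external NICs differ widely, by more than an order of magnitude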

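Most people running Broadcom NICs on RHEL 5.3 or later have probably met the first problem. There is plenty of material about it online, so I will not dwell on it: upgrading the NIC driver fixes it. Problems 2, 3 and 4 are really one problem. The NIC raises more interrupts than the CPUs can handle (more precisely, they are distributed unevenly, so a single CPU handles them all and cannot keep up), and packets are dropped. Why, then, do the two NICs drop packets at such different rates? Here is the explanation from first principles. Multi-exit NAT means a large amount of routing information, and it is a network-bound workload. When a packet arrives for NAT it is first received by the NIC driver, and the NIC raises an interrupt on receipt. The interrupt handler puts the skb on the input queue and raises a softirq. Some time later, when the softirq runs, it takes the skb off the queue and hands it to the upper-layer protocol.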

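If the CPU does not drain this queue in time and the NIC's buffer fills up, the NIC simply drops the packet. Two queues are involved, tx and rx, each with a default size of 255; ethtool -g eth0 (substitute your own interface) shows them. To avoid drops I enlarged the rx queue with ethtool -G eth0 rx xxx, but that was a drop in the bucket: ethtool -S eth0 |grep rx_fw_discards showed the counter still climbing, so packets were still being dropped and the CPU still could not keep up. I then found someone online who had hit the same problem with LVS: unbalanced softirqs, with only one CPU handling them. The advice online is all over the place. One suggestion is to change the device's interrupt affinity through /proc/irq/${NIC IRQ number}/smp_affinity. I tried that too, with no real effect.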
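
For reference, `smp_affinity` takes a hexadecimal CPU bitmask, with bit N selecting CPU N. A small sketch of building such a mask follows; `cpu_mask` is a hypothetical helper, and it covers only a single CPU on machines with up to 32 CPUs (larger systems use comma-separated 32-bit groups, which this sketch does not handle):

```shell
# Print the hexadecimal smp_affinity mask that selects one CPU (0-based index).
cpu_mask() {
    printf '%x\n' $((1 << $1))
}
```

Usage, with IRQ 114 chosen purely for illustration: `cpu_mask 2 > /proc/irq/114/smp_affinity`. As the poster found, pinning interrupts this way did not help with the bnx2 problem discussed here.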

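The official bug report, https://bugzilla.redhat.com/show_bug.cgi?id=520888, says that RHEL 5.6 already fixes this bug, and also that our release can be fixed by upgrading the kernel to kernel-2.6.18-194.3.1.el5.

Red Hat's own description of the fix is at http://rhn.redhat.com/errata/RHSA-2010-0398.html. Upgrading to this kernel solved the single-CPU softirq problem: afterwards the softirqs were spread evenly across the CPUs and packets were no longer dropped. But why could the CPU not keep up with the softirqs when the traffic was not especially heavy? After the upper layer receives a packet, it uses the routing table to pick a NAT exit. Route lookups are expensive, and every packet with a different source or destination address needs a fresh lookup, which is what overwhelmed the CPU. To make lookups cheaper, the Linux kernel has a route cache that reduces the number of routing-table lookups. The route cache is designed as a protocol-independent subsystem and can be inspected with route -Cn. It is searched with a hash, so lookups are very fast. Being a cache, it has a timeout. The system default is 10 minutes, and it can be viewed and changed in /proc/sys/net/ipv4/route/secret_interval. Only when an entry is missing from the cache or has expired is the routing table itself consulted, and the result is then stored in the cache. The larger the routing table, the longer each lookup takes. A new connection, or an old one whose cache entry has expired, costs a lot of CPU time, which is why softirq usage periodically reached 100%. The uneven drops on the two NICs have the same cause: user packets come in through one of the NICs, the route lookup takes too long, the CPU cannot keep up, that NIC's queue fills, and it drops heavily. If the routing table changes little, the cache lifetime can be increased. After making the changes above, my monitoring showed a large improvement in the traffic the machine can carry.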
