Ethernet card is down

soolan

Member
Joined
Jan 13, 2023
Messages
38
Reaction score
4
Credits
375
Hello,

OS: Oracle Linux 8.6

My server suddenly lost its network connection, and restarting the server fixes the problem. It has happened twice. What could be causing it?

What else would you recommend I check?
 


tinfoil-hat

Active Member
Joined
Oct 24, 2021
Messages
249
Reaction score
123
Credits
1,753
Do you have free space left on your root and /boot filesystems?
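From the shell, `df -h / /boot` shows this. If you'd rather script it, here is a minimal Python sketch that reports how full each filesystem is. The 90% warning threshold is an arbitrary assumption, not a recommended value.

```python
import shutil

def percent_used(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return 100.0 * usage.used / usage.total

# "/" and "/boot" are the locations asked about; if /boot is not a separate
# filesystem, it simply reports the filesystem that contains it.
for mount in ("/", "/boot"):
    try:
        pct = percent_used(mount)
    except FileNotFoundError:
        print(f"{mount}: does not exist")
        continue
    # 90% is an arbitrary warning threshold chosen for this sketch.
    warning = "  <-- nearly full" if pct >= 90 else ""
    print(f"{mount}: {pct:.1f}% used{warning}")
```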
 

tinfoil-hat

How's the RAM usage? Maybe it's running out of memory.
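`free -h` answers this from the shell. As a scripted alternative, here is a minimal Python sketch that reads /proc/meminfo. It assumes a kernel that reports `MemAvailable` (3.14 or later, which covers Oracle Linux 8); the helper names are only illustrative.

```python
import os

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # Most values are in kB; the HugePages_* fields are plain page counts.
            info[name.strip()] = int(fields[0])
    return info

def available_percent(info: dict) -> float:
    """MemAvailable (the kernel's estimate of usable memory) as a % of MemTotal."""
    return 100.0 * info["MemAvailable"] / info["MemTotal"]

if __name__ == "__main__" and os.path.exists("/proc/meminfo"):  # Linux only
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print(f"Available RAM: {available_percent(info):.1f}% of MemTotal")
    print(f"Swap free: {info.get('SwapFree', 0)} kB of {info.get('SwapTotal', 0)} kB")
```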
 

osprey

Well-Known Member
Joined
Apr 15, 2022
Messages
919
Reaction score
862
Credits
8,724
CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot
Maybe a hardware issue ... heat being a possible consideration. How's the airflow around that 10G Ethernet card? How dusty is the environment?
 

soolan

I know when the failure occurred. Can I control the temperature via iLO?
 

dos2unix

Well-Known Member
Joined
May 3, 2019
Messages
1,916
Reaction score
1,515
Credits
13,184
As for controlling the temperature via iLO, I've never seen an iLO where that was an option.

Can you install lm_sensors?
Then run sensors -f

Most NICs don't report their own temperature, but it will at least give you a ballpark idea of whether everything else on that system is running hot.
Are most of the other PCI devices running hot?

Of course, it's also possible that the problem is on the remote switch.
Do you have more than one NIC? Can you try another one on that same port?
 

soolan

Here is the output of sensors -f:
Code:
bnxt_en-pci-1001
Adapter: PCI adapter
temp1:       +174.2°F

coretemp-isa-0001
Adapter: ISA adapter
Package id 1: +138.2°F  (high = +199.4°F, crit = +217.4°F)
Core 0:       +127.4°F  (high = +199.4°F, crit = +217.4°F)
Core 1:       +123.8°F  (high = +199.4°F, crit = +217.4°F)
Core 2:       +122.0°F  (high = +199.4°F, crit = +217.4°F)
Core 3:       +125.6°F  (high = +199.4°F, crit = +217.4°F)
Core 4:       +122.0°F  (high = +199.4°F, crit = +217.4°F)
Core 5:       +120.2°F  (high = +199.4°F, crit = +217.4°F)
Core 6:       +118.4°F  (high = +199.4°F, crit = +217.4°F)
Core 7:       +138.2°F  (high = +199.4°F, crit = +217.4°F)

i350bb-pci-4800
Adapter: PCI adapter
loc1:        +129.2°F  (high = +248.0°F, crit = +230.0°F)

bnxt_en-pci-1000
Adapter: PCI adapter
temp1:       +174.2°F

coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +145.4°F  (high = +199.4°F, crit = +217.4°F)
Core 0:       +129.2°F  (high = +199.4°F, crit = +217.4°F)
Core 1:       +125.6°F  (high = +199.4°F, crit = +217.4°F)
Core 2:       +132.8°F  (high = +199.4°F, crit = +217.4°F)
Core 3:       +131.0°F  (high = +199.4°F, crit = +217.4°F)
Core 4:       +127.4°F  (high = +199.4°F, crit = +217.4°F)
Core 5:       +125.6°F  (high = +199.4°F, crit = +217.4°F)
Core 6:       +145.4°F  (high = +199.4°F, crit = +217.4°F)
Core 7:       +145.4°F  (high = +199.4°F, crit = +217.4°F)

power_meter-acpi-0
Adapter: ACPI interface
power1:        0.00 W  (interval = 300.00 s)
Yes, I have another NIC.
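`sensors` reports Celsius by default (just drop the `-f`), but to compare readings like these at a glance, here is a small Python sketch that converts the Fahrenheit values. It assumes the `label: +value°F` layout shown in this output and is not a general lm_sensors parser; the function names are only illustrative.

```python
import re

# Matches lines such as "temp1:       +174.2°F" or
# "Core 0:       +127.4°F  (high = +199.4°F, crit = +217.4°F)".
READING = re.compile(r"^(?P<label>[^:]+):\s+(?P<temp>[+-]?\d+(?:\.\d+)?)°F")

def f_to_c(fahrenheit: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius, rounded to 0.1."""
    return round((fahrenheit - 32) * 5 / 9, 1)

def readings_celsius(output: str):
    """Yield (chip, label, celsius) for every temperature line in `sensors -f` output."""
    chip = None
    for line in output.splitlines():
        if not line.strip():
            chip = None          # a blank line ends the current chip's section
            continue
        if chip is None:
            chip = line.strip()  # first line of a section names the chip
            continue
        m = READING.match(line)  # "Adapter: ..." and "power1: ... W" lines don't match
        if m:
            yield chip, m.group("label").strip(), f_to_c(float(m.group("temp")))

if __name__ == "__main__":
    # Two sections copied from the output above.
    sample = (
        "bnxt_en-pci-1000\n"
        "Adapter: PCI adapter\n"
        "temp1:       +174.2°F\n"
        "\n"
        "coretemp-isa-0000\n"
        "Adapter: ISA adapter\n"
        "Package id 0: +145.4°F  (high = +199.4°F, crit = +217.4°F)\n"
    )
    for chip, label, celsius in readings_celsius(sample):
        print(f"{chip:20} {label:14} {celsius:5.1f} °C")
```

Converted this way, the two bnxt_en readings of +174.2°F are about 79 °C, while the CPU package readings of +138.2°F and +145.4°F are about 59 °C and 63 °C.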
 

soolan

I encountered this problem again. When I checked the server through iLO, the network card was shown as down. It was resolved after I restarted it.
Which logs would help us find a solution? Can you translate this into English as well?
 

soolan

I constantly see these messages in /var/log/messages on this server, but I don't see them on my other servers.

Code:
 systemd[20382]: Stopped target Timers.
 systemd[20382]: Closed D-Bus User Message Bus Socket.
 systemd[20382]: Stopped target Paths.
 systemd[20382]: Reached target Shutdown.
 systemd[20382]: Started Exit the Session.
 systemd[20382]: Reached target Exit the Session.
 systemd[1]: user@0.service: Succeeded.
 systemd[1]: Stopped User Manager for UID 0.
 systemd[1]: Stopping User runtime directory /run/user/0...
 systemd[1]: run-user-0.mount: Succeeded.
 systemd[1]: user-runtime-dir@0.service: Succeeded.
 systemd[1]: Stopped User runtime directory /run/user/0.
 systemd[1]: Removed slice User Slice of UID 0.
 crond[20288]: postdrop: warning: unable to look up public/pickup: No such file or directory
 systemd[1]: session-202.scope: Succeeded.
 systemd[1]: Stopping User Manager for UID 1008...
 systemd[20229]: Stopping D-Bus User Message Bus...
 systemd[20229]: Stopped target Default.
 systemd[20229]: Stopped D-Bus User Message Bus.
 systemd[20229]: Stopped target Basic System.
 systemd[20229]: Stopped target Timers.
 systemd[20229]: Stopped Mark boot as successful after the user session has run 2 minutes.
 systemd[20229]: Stopped target Paths.
 systemd[20229]: Stopped target Sockets.
 systemd[20229]: Closed Sound System.
 systemd[20229]: Closed Multimedia System.
 systemd[20229]: Closed D-Bus User Message Bus Socket.
 systemd[20229]: Reached target Shutdown.
 systemd[20229]: Started Exit the Session.
 systemd[20229]: Reached target Exit the Session.
 systemd[1]: user@1008.service: Succeeded.
 systemd[1]: Stopped User Manager for UID 1008.
 systemd[1]: Stopping User runtime directory /run/user/1008...
 systemd[1]: run-user-1008.mount: Succeeded.
 systemd[1]: user-runtime-dir@1008.service: Succeeded.
 systemd[1]: Stopped User runtime directory /run/user/1008.
 systemd[1]: Removed slice User Slice of UID 1008.
 

soolan

Secondly, here are the log messages from when my Ethernet card went down:

Code:
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
 smad[1712]: [INFO  ]: AgentX trap received
 smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
 smad[1712]: [NOTICE]: IML received: 171 bytes
 smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
 smad[1712]: [INFO  ]: Log the IML info to syslog
 NetworkManager[1417]: <info>  [1685482012.4493] device (ens1f0np0): carrier: link connected
 smad[1712]: [INFO  ]: AgentX trap received
 smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: FEC autoneg off encodings: None
 smad[1712]: [NOTICE]: IML received: 177 bytes
 smad[1712]: [ALERT ]: NOTICE: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to OK for adapter in slot 1, port 1 has been repaired
 smad[1712]: [INFO  ]: Log the IML info to syslog
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down
 smad[1712]: [INFO  ]: AgentX trap received
 smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18012)
 smad[1712]: [NOTICE]: IML received: 171 bytes
 smad[1712]: [ALERT ]: CRITICAL: Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter Connectivity status changed to Link Failure for adapter in slot 1, port 1
 smad[1712]: [INFO  ]: Log the IML info to syslog
 smad[1712]: [NOTICE]: IML received: 138 bytes
 smad[1712]: [ALERT ]: CRITICAL:  All links are down in adapter Broadcom P210tep NetXtreme-E Dual-port 10GBASE-T Ethernet PCIe Adapter in slot 1
 smad[1712]: [INFO  ]: Log the IML info to syslog
 NetworkManager[1417]: <info>  [1685482015.9492] device (ens1f0np0): carrier: link connected
 smad[1712]: [INFO  ]: AgentX trap received
 smad[1712]: [NOTICE]: AgentX trap CPQNIC (.1.3.6.1.6.3.1.1.4.1.0:.1.3.6.1.4.1.232.0.18011)
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, Flow control: ON - receive
 kernel: bnxt_en 0000:10:00.0 ens1f0np0: EEE is not active
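To see how often the link drops over a longer period, the kernel's "NIC Link is Down/Up" lines can be counted. Below is a minimal Python sketch, assuming the bnxt_en message format shown above; the script name in the usage comment is hypothetical.

```python
import re
from collections import Counter

# Matches the bnxt_en kernel messages in the log above, e.g.
#   "kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Down"
#   "kernel: bnxt_en 0000:10:00.0 ens1f0np0: NIC Link is Up, 1000 Mbps full duplex, ..."
LINK = re.compile(r"bnxt_en (?P<pci>\S+) (?P<iface>\S+): NIC Link is (?P<state>Up|Down)")

def link_events(lines):
    """Return [(interface, "Up" or "Down"), ...] in the order they were logged."""
    events = []
    for line in lines:
        m = LINK.search(line)
        if m:
            events.append((m.group("iface"), m.group("state")))
    return events

def count_downs(events):
    """Count "NIC Link is Down" events per interface."""
    return Counter(iface for iface, state in events if state == "Down")

if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python3 link_flaps.py /var/log/messages
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as log:
            for iface, n in count_downs(link_events(log)).items():
                print(f"{iface}: link went down {n} time(s)")
```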
 

osprey

My suspicions about hardware are aroused by the messages:
Code:
... Link Failure ...
... has been repaired ...
... Link Failure ...

It fits the scenario of a contact failing: say, heat expansion breaks the contact ("Link Failure"), then cooling allows the contact to be made again ("has been repaired"), then heat breaks the contact again. Just a theory.
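One way to put numbers on this pattern: the NetworkManager lines in the log carry Unix timestamps, so the time between reconnects can be measured and compared with other observations. Here is a minimal Python sketch, assuming the NetworkManager message format shown in this thread; the helper names are only illustrative.

```python
import re
from datetime import datetime, timezone

# NetworkManager stamps each message with Unix time in brackets, e.g.
#   "<info>  [1685482012.4493] device (ens1f0np0): carrier: link connected"
CARRIER_UP = re.compile(r"\[(?P<ts>\d+\.\d+)\] device \((?P<iface>[^)]+)\): carrier: link connected")

def reconnect_times(lines):
    """Unix timestamps (seconds, float) of 'carrier: link connected' events."""
    times = []
    for line in lines:
        m = CARRIER_UP.search(line)
        if m:
            times.append(float(m.group("ts")))
    return times

def gaps(times):
    """Seconds between consecutive events, rounded to 0.1 s."""
    return [round(later - earlier, 1) for earlier, later in zip(times, times[1:])]

def utc(ts):
    """Format a Unix timestamp as UTC, e.g. to compare it with other logs."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S UTC")

if __name__ == "__main__":
    # The two NetworkManager lines from the log in this thread.
    log = [
        "NetworkManager[1417]: <info>  [1685482012.4493] device (ens1f0np0): carrier: link connected",
        "NetworkManager[1417]: <info>  [1685482015.9492] device (ens1f0np0): carrier: link connected",
    ]
    times = reconnect_times(log)
    print([utc(t) for t in times])
    print("seconds between reconnects:", gaps(times))
```

For the two lines in this thread, the reconnects are about 3.5 seconds apart, on 2023-05-30 at 21:26 UTC.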
 

soolan

When I checked, the temperature was normal.
 

f33dm3bits

Gold Member
Gold Supporter
Joined
Dec 11, 2019
Messages
6,259
Reaction score
4,732
Credits
45,987
Is this your personal server?
 
