CIFS Issues

NerdInAChair

Hardware: Laptop (ROG Strix G733CX_G733CX (1.0))
Distro: CachyOS (Arch based rolling x86-v3 optimized)
Kernel: Linux 6.13.2-2-cachyos
DE: Plasma 6.3

Issue: Downstream transfers from the Samba server on an NVIDIA Shield always fail if the file is large enough to take a nontrivial amount of time. Upstream transfers from client to server never fail. I want to say the 6.6.x kernel branch did not have any issues, so I think it is a kernel issue, but I wanted to see if anyone has any insights. I have the external share mounted in fstab as follows.

//192.168.50.144/NVIDIA-EXT /cifs/nvidia-external cifs credentials=/etc/samba/credentials/nvidia,defaults,noperm,noauto,x-systemd.automount 0 0
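Since the fstab entry above does not pin a protocol version or read size, those values are presumably left to negotiation with the server. A minimal sketch for listing what a mounted CIFS share actually ended up with, assuming the usual /proc/mounts layout (device, mount point, filesystem type, options); the function name cifs_mount_options is made up for illustration:

```shell
#!/bin/sh
# Print each CIFS mount point followed by its effective options, one per line.
# Reads /proc/mounts-style text on stdin. The function name is hypothetical.
cifs_mount_options() {
    awk '$3 == "cifs" {
        print $2
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++) print "  " opt[i]
    }'
}

# Usage on a live system (the share has to be mounted first):
# cifs_mount_options < /proc/mounts
```

Comparing the vers= and rsize= values between a working kernel and a failing one might show whether the negotiated settings differ.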

The connection is over WiFi. I really can't test over ethernet because I am in a wheelchair and can't easily run a cable for testing.

The error message on the console looks like:
cp: error reading '<file name here>' Resource temporarily unavailable

I find it odd it only happens in the downstream direction.
 


What is the output of ...
Code:
netstat -i
 
Code:
Kernel Interface table
Iface             MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
enp108s0         1500        0      0      0 0             0      0      0      0 BMU
lo              65536      119      0      0 0           119      0      0      0 LRU
wlan0            1500   633160      0      0 0         23645      0      0      0 BMRU
 
This is on the client side, of course. I don't think I can shell into the NVIDIA Shield.

That's OK, I don't see any dropped packets or errors on the wlan. So it's most likely not a network error.
Anything in dmesg?

Code:
sudo dmesg | grep -iE 'warn|err|fail|support|cifs'

It's possible there could be personal info in here, so only share what you think might be pertinent.
 
Too much to paste, and I'm not sure what to exclude, so I'm attaching a file.
 


I see hundreds of PCI bus errors. Unfortunately, it doesn't say exactly what they are.

Code:
[33886.553804] pcieport 10000:e0:1b.4:    [ 0] RxErr                  (First)
[33886.553821] pcieport 10000:e0:1b.4: AER: Correctable error message received from 10000:00:1b.4
[33886.553826] pcieport 10000:e0:1b.4: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[33886.553827] pcieport 10000:e0:1b.4:   device [8086:7ac4] error status/mask=00000001/00002000
[33886.553828] pcieport 10000:e0:1b.4:    [ 0] RxErr                  (First)
[33886.553920] pcieport 10000:e0:1b.4: AER: Correctable error message received from 10000:00:1b.4
[33886.553925] pcieport 10000:e0:1b.4: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)
[33886.553926] pcieport 10000:e0:1b.4:   device [8086:7ac4] error status/mask=00000001/00002000
[33886.553927] pcieport 10000:e0:1b.4:    [ 0] RxErr                  (First)

But I do see hundreds of "RxErr", which is typically some kind of "receive packet" error. I am inclined to say this is a hardware issue.
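To put a number on "hundreds", the AER messages can be tallied per PCIe port. This is only a rough sketch that assumes the dmesg line layout shown above ("pcieport <address>: AER: ..."); the function name count_aer_by_port is invented for the example:

```shell
#!/bin/sh
# Count AER messages per PCIe port from dmesg text on stdin.
# Prints "<port address> <count>" for each port seen.
count_aer_by_port() {
    awk '/pcieport .*AER:/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "pcieport") {
                port = $(i + 1)
                sub(/:$/, "", port)   # drop the trailing colon after the address
                n[port]++
            }
        }
    }
    END { for (p in n) print p, n[p] }'
}

# Usage on a live system:
# sudo dmesg | count_aer_by_port
```

If every message comes from a single port, that narrows the search to whatever device sits behind it.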

Can you give the output of ...
Code:
lspci

I have seen this before. You might try updating your BIOS if possible. You can also try adding this to your kernel parameters and see if that helps.

Code:
 pcie_aspm=off
 
I was looking into that part today. No, a BIOS update is not an option: the last BIOS release was quite a while ago (2022 model), and I already have it. The PCI ID in the message corresponds to "[7ac4] Alder Lake-S PCH PCI Express Root Port." I actually found the pcie_aspm=off setting before you responded, and so far it has suppressed all those pcieport errors, but it had no effect on the CIFS situation.
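For anyone following along, one way to confirm that the parameter really reached the running kernel is to look for it in /proc/cmdline. A small sketch; the helper name has_aspm_off is hypothetical:

```shell
#!/bin/sh
# Return success if the given kernel command line contains pcie_aspm=off
# as a whole, space-separated word.
has_aspm_off() {
    case " $1 " in
        *" pcie_aspm=off "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
# has_aspm_off "$(cat /proc/cmdline)" && echo "pcie_aspm=off is active"
#
# The vendor:device pair from the error message can be looked up with:
# lspci -nn -d 8086:7ac4
```

The check only shows that the option was passed at boot; it says nothing about how the firmware configured the links.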

lspci
Code:
0000:00:00.0 Host bridge: Intel Corporation Device 4637 (rev 02)
0000:00:01.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x16 Controller #1 (rev 02)
0000:00:02.0 Display controller: Intel Corporation Alder Lake-HX GT1 [UHD Graphics 770] (rev 0c)
0000:00:04.0 Signal processing controller: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant (rev 02)
0000:00:08.0 System peripheral: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator (rev 02)
0000:00:0e.0 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller
0000:00:14.0 USB controller: Intel Corporation Alder Lake-S PCH USB 3.2 Gen 2x2 XHCI Controller (rev 11)
0000:00:14.2 RAM memory: Intel Corporation Alder Lake-S PCH Shared SRAM (rev 11)
0000:00:14.3 Network controller: Intel Corporation Alder Lake-S PCH CNVi WiFi (rev 11)
0000:00:15.0 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #0 (rev 11)
0000:00:15.3 Serial bus controller: Intel Corporation Alder Lake-S PCH Serial IO I2C Controller #3 (rev 11)
0000:00:16.0 Communication controller: Intel Corporation Alder Lake-S PCH HECI Controller #1 (rev 11)
0000:00:1a.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #25 (rev 11)
0000:00:1c.0 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #6 (rev 11)
0000:00:1f.0 ISA bridge: Intel Corporation Device 7a8c (rev 11)
0000:00:1f.3 Audio device: Intel Corporation Alder Lake-S HD Audio Controller (rev 11)
0000:00:1f.4 SMBus: Intel Corporation Alder Lake-S PCH SMBus Controller (rev 11)
0000:00:1f.5 Serial bus controller: Intel Corporation Alder Lake-S PCH SPI Controller (rev 11)
0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA103M [GeForce RTX 3080 Ti Mobile] (rev a1)
0000:01:00.1 Audio device: NVIDIA Corporation Device 2288 (rev a1)
0000:02:00.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
0000:03:00.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
0000:03:01.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
0000:03:02.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
0000:03:03.0 PCI bridge: Intel Corporation Device 1133 (rev 02)
0000:04:00.0 USB controller: Intel Corporation Device 1134
0000:38:00.0 USB controller: Intel Corporation Device 1135
0000:6c:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
10000:e0:1b.0 System peripheral: Intel Corporation RST VMD Managed Controller
10000:e0:1b.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #21 (rev 11)
10000:e0:1d.0 System peripheral: Intel Corporation RST VMD Managed Controller
10000:e0:1d.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #13 (rev 11)
10000:e1:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
10000:e2:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO

Attached is the new output of dmesg
 


Also, one of the times I tried the CIFS copy, it locked up the entire system. The desktop froze and I couldn't even drop to a VT to issue a reboot, so I had to hard power off.
 
I do see the culprit in your lspci output.
Code:
10000:e0:1b.4 PCI bridge: Intel Corporation Alder Lake-S PCH PCI Express Root Port #21 (rev 11)

This matches your dmesg error.
Code:
[33886.553828] pcieport 10000:e0:1b.4:    [ 0] RxErr                  (First)

This appears to be a hardware PCI bridge on the motherboard. Not much you can do about that.
Linux can't fix hardware.
 
As I said, I had already found that. Adding the boot option pcie_aspm=off does get rid of those errors. They are unrelated to the CIFS issue. ASUS firmware is not standards-compliant, so some things just aren't going to be compatible.
 
Those errors are related to Active State Power Management (ASPM) not being supported correctly in firmware. That is why adding the option to disable ASPM got rid of those errors.

I am no longer getting the RxErr messages with that option in place. It does mean that my laptop will use more power when idle.

When those errors were occurring they were happening when the system was attempting to change power state and not during the CIFS transfer.
 

If you say so, but look at your dmesg file:

[ 127.803396] pcieport 10000:e0:1b.4: AER: Multiple Correctable error message received from 10000:00:1b.4
[35832.194354] pcieport 10000:e0:1b.4: device [8086:7ac4] error status/mask=00000001/00002000

That's about 9 hours and 55 minutes of power state change. Over 9 hours of RxErr.
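The span between two dmesg timestamps (seconds since boot) is easy to check with a little arithmetic. A quick sketch using awk; the function name elapsed_hm is just for illustration:

```shell
#!/bin/sh
# Print the time between two dmesg timestamps (in seconds) as hours and minutes.
elapsed_hm() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = b - a
        printf "%d h %d min\n", int(d / 3600), int((d % 3600) / 60)
    }'
}

# First and last timestamps from the two lines quoted above
elapsed_hm 127.803396 35832.194354
```

With the two timestamps quoted above, this prints 9 h 55 min.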
 
I just retested with a newly compiled 6.6 kernel. I do not have the CIFS issue on that kernel.
Windows does not have an issue with CIFS on the same hardware (dual boot).
I'm not arguing that those errors aren't concerning, but they also show up in kernel 6.6, where I don't have a CIFS issue.

Over that 9 hours, I have no idea, because I was asleep for a good portion of that, so the machine wasn't doing much.

So there are two scenarios here. Either my motherboard is dying a slow death that somehow only manifests as a CIFS issue on the two current kernel branches.

Or my motherboard is dying a slow death and there have also been some Samba/CIFS regressions somewhere between 6.6 and 6.12. I should also note that the most recent kernel release has CIFS changes in the changelog.

Who knows, I just find the latter more likely than the former.
 


