I have a system with an AMD Ryzen 5 7600 CPU and a Sapphire Pulse AMD RX 7600 GPU. The system runs Pop_OS 22.04 with kernel 6.9.3-76060903-generic. The motherboard is a Gigabyte A620I AX with BIOS version F31b.
Intermittently, during bootup, Pop_OS fails to initialize the GPU. Visually, the system either seems to hang at the motherboard manufacturer's branded screen with the BIOS and boot menu hotkeys, or the screen remains black with no HDMI video output. Pop_OS continues to load, though, and I can SSH into it.
This issue is similar to, but not exactly the same as, the one found here a few years ago. In post #5 of that thread, the OP seemed to be missing some firmware, but that firmware is present on my system:
Code:
root@pop-os:~# ls -l /lib/firmware/amdgpu/navy_flounder_sos.bin
-rw-r--r-- 1 root root 218608 Jun 11 03:41 /lib/firmware/amdgpu/navy_flounder_sos.bin
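A single file is a narrow check, so it may be worth comparing everything the amdgpu module says it can request against what is actually on disk. The following is only a rough sketch: `check_firmware` is a hypothetical helper name, and it assumes firmware files may be stored either uncompressed or with a `.zst` or `.xz` suffix. The module lists firmware for every GPU generation it supports, so a "missing" line matters only if it names a file this particular card needs.

```shell
#!/bin/sh
# check_firmware: hypothetical helper, not part of any package.
# Reads firmware names on stdin, one per line, relative to the firmware
# directory (e.g. "amdgpu/navy_flounder_sos.bin"), and prints the ones
# that cannot be found under the directory given as $1.
check_firmware() {
    fw_dir="${1:-/lib/firmware}"
    while IFS= read -r name; do
        # Skip empty lines.
        [ -n "$name" ] || continue
        # Assumption: a file may be shipped compressed as .zst or .xz.
        if [ -e "$fw_dir/$name" ] || [ -e "$fw_dir/$name.zst" ] || [ -e "$fw_dir/$name.xz" ]; then
            continue
        fi
        printf 'missing: %s\n' "$name"
    done
}

# Usage on the affected machine:
#   modinfo -F firmware amdgpu | check_firmware /lib/firmware
```

No output would mean every firmware file the module declares was found.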
I'll emphasize that the issue is intermittent. About four out of five boot attempts are ordinary and successful: the system boots, I see the login screen, I log in, and I get 65 fps in Cyberpunk 2077 with the help of Proton on Steam. At idle, the system draws about 70 watts. But when the system fails to initialize the GPU, its idle draw with a black screen (or the motherboard manufacturer's POST screen) is about 200 watts!
This is dmesg after a successful bootup:
Code:
root@pop-os:~# dmesg | grep amdgpu
[ 6.526511] [drm] amdgpu kernel modesetting enabled.
[ 6.526619] amdgpu: Virtual CRAT table created for CPU
[ 6.526630] amdgpu: Topology: Add CPU node
[ 6.526732] amdgpu 0000:03:00.0: enabling device (0000 -> 0003)
[ 6.562906] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from ROM BAR
[ 6.562908] amdgpu: ATOM BIOS: 113-4481LHS-UC1
[ 6.563640] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[ 6.564501] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[ 6.564942] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[ 6.564944] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 6.564991] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 6.564993] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[ 6.565075] [drm] amdgpu: 8176M of VRAM memory ready
[ 6.565076] [drm] amdgpu: 7799M of GTT memory ready.
[ 6.565599] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[ 6.622572] amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x81fc000000 for PSP TMR
[ 6.716471] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6.723981] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 6.723983] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6.724019] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000035, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00525b00 (82.91.0)
[ 6.724021] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ 6.796725] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[ 6.853594] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_hw_init [amdgpu]] JPEG decode initialized successfully.
[ 6.888020] amdgpu: HMM registered 8176MB device memory
[ 6.888873] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 6.888884] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 6.888909] amdgpu: Virtual CRAT table created for GPU
[ 6.889006] amdgpu: Topology: Add dGPU node [0x7480:0x1002]
[ 6.889008] kfd kfd: amdgpu: added device 1002:7480
[ 6.889021] amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 32
[ 6.889025] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 6.889027] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 6.889028] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 6.889029] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
[ 6.889030] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
[ 6.889031] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
[ 6.889033] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
[ 6.889034] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
[ 6.889035] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
[ 6.889036] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 6.889037] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 6.889038] amdgpu 0000:03:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
[ 6.889040] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
[ 6.889041] amdgpu 0000:03:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0
[ 6.892603] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[ 6.893087] [drm] Initialized amdgpu 3.57.0 20150101 for 0000:03:00.0 on minor 0
[ 6.899907] fbcon: amdgpudrmfb (fb0) is primary device
[ 6.899911] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 8.213974] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
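When comparing a log like this against one from a failed boot, the kernel timestamps get in the way of a line-by-line diff. A minimal sketch for stripping them first is below; `strip_ts` is a hypothetical helper name, and the file names in the usage comments are placeholders.

```shell
#!/bin/sh
# strip_ts: hypothetical helper that removes the leading "[ seconds.micros]"
# timestamp from each dmesg line read on stdin, so that two captures can be
# diffed without every line differing.
strip_ts() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] ?//'
}

# Usage (file names are placeholders):
#   dmesg | grep -i amdgpu | strip_ts > good.txt    # after a successful boot
#   dmesg | grep -i amdgpu | strip_ts > bad.txt     # after a failed boot
#   diff good.txt bad.txt
```

Lines that contain addresses or PIDs may still differ between boots, so the diff is a starting point rather than a verdict.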
This is diagnostic info gathered after a failed bootup:
Code:
root@pop-os:~# lshw -c video
*-display UNCLAIMED
description: VGA compatible controller
product: Advanced Micro Devices, Inc. [AMD/ATI]
vendor: Advanced Micro Devices, Inc. [AMD/ATI]
physical id: 0
bus info: pci@0000:03:00.0
version: cf
width: 64 bits
clock: 33MHz
capabilities: pm pciexpress msi vga_controller bus_master cap_list
configuration: latency=0
resources: iomemory:fa0-f9f iomemory:fc0-fbf memory:fa00000000-fbffffffff memory:fc00000000-fc0fffffff ioport:f000(size=256) memory:f6b00000-f6bfffff memory:f6c00000-f6c1ffff
Code:
root@pop-os:~# lsmod | grep amd
amd_atl 53248 1
edac_mce_amd 28672 0
kvm_amd 208896 0
kvm 1417216 1 kvm_amd
ccp 155648 1 kvm_amd
amdgpu 17563648 0
amdxcp 12288 1 amdgpu
drm_exec 12288 1 amdgpu
gpu_sched 61440 1 amdgpu
drm_buddy 20480 1 amdgpu
i2c_algo_bit 16384 1 amdgpu
drm_suballoc_helper 16384 1 amdgpu
drm_ttm_helper 12288 1 amdgpu
ttm 110592 2 amdgpu,drm_ttm_helper
drm_display_helper 266240 1 amdgpu
video 73728 1 amdgpu
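lshw reports the device as UNCLAIMED and lsmod shows amdgpu loaded with a use count of 0, which suggests the module loaded but no driver is bound to the card. A quick way to confirm that from sysfs is sketched below. `bound_driver` is a hypothetical helper name, and its optional second argument (an alternate sysfs root) exists only so the function can be tried against a test directory.

```shell
#!/bin/sh
# bound_driver: hypothetical helper. Prints the name of the kernel driver
# bound to the PCI device at address $1 (e.g. 0000:03:00.0), or "unbound"
# if the device has no driver symlink.
# $2 is an optional sysfs root (default /sys), an assumption added for testing.
bound_driver() {
    addr="$1"
    sysfs="${2:-/sys}"
    link="$sysfs/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/amdgpu
        basename "$(readlink "$link")"
    else
        echo "unbound"
    fi
}

# Usage on the affected machine:
#   bound_driver 0000:03:00.0
```

On a good boot this would be expected to print amdgpu; `lspci -k -s 03:00.0` shows the same information in its "Kernel driver in use" line.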
Code:
root@pop-os:~# dmesg | grep -i amdgpu
[ 6.405158] [drm] amdgpu kernel modesetting enabled.
[ 6.405272] amdgpu: Virtual CRAT table created for CPU
[ 6.405287] amdgpu: Topology: Add CPU node
[ 6.405381] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
[ 6.409253] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
[ 6.409256] amdgpu: ATOM BIOS: 113-4481LHS-UC1
[ 6.409997] amdgpu 0000:03:00.0: amdgpu: CP RS64 enable
[ 6.410843] amdgpu 0000:03:00.0: [drm:jpeg_v4_0_early_init [amdgpu]] JPEG decode is enabled in VM mode
[ 6.436090] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[ 6.436094] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[ 6.436137] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 6.436139] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[ 6.436222] [drm] amdgpu: 8176M of VRAM memory ready
[ 6.436223] [drm] amdgpu: 7799M of GTT memory ready.
[ 6.436740] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[ 6.493479] amdgpu 0000:03:00.0: amdgpu: reserve 0x1300000 from 0x81fc000000 for PSP TMR
[ 6.587687] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 6.595203] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 6.595205] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 6.595233] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000035, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x00525b00 (82.91.0)
[ 6.595235] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ 6.684669] amdgpu 0000:03:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:6 param:0x00000000 message:EnableAllSmuFeatures?
[ 6.684672] amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!
[ 6.684673] amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
[ 6.684674] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <smu> failed -121
[ 6.684812] amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_init failed
[ 6.684813] amdgpu 0000:03:00.0: amdgpu: Fatal error during GPU init
[ 6.684815] amdgpu 0000:03:00.0: amdgpu: amdgpu: finishing device.
[ 6.684888] WARNING: CPU: 1 PID: 188 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:622 amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.685043] Modules linked in: hid_logitech_dj(+) hid_generic usbhid hid amdgpu(+) amdxcp drm_exec gpu_sched drm_buddy crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_suballoc_helper polyval_clmulni drm_ttm_helper polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 ttm nvme drm_display_helper ahci i2c_piix4 xhci_pci nvme_core r8169 libahci cec xhci_pci_renesas realtek nvme_auth rc_core video wmi aesni_intel crypto_simd cryptd
[ 6.685071] RIP: 0010:amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.685228] ? amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.685372] ? amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.685499] amdgpu_fence_driver_hw_fini+0x11f/0x170 [amdgpu]
[ 6.685632] amdgpu_device_fini_hw+0xb3/0x250 [amdgpu]
[ 6.685761] amdgpu_driver_unload_kms+0x4b/0x70 [amdgpu]
[ 6.685888] amdgpu_driver_load_kms+0xf9/0x1c0 [amdgpu]
[ 6.686014] amdgpu_pci_probe+0x1bb/0x5d0 [amdgpu]
[ 6.686171] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 6.686298] amdgpu_init+0x69/0xff0 [amdgpu]
[ 6.686539] WARNING: CPU: 1 PID: 188 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:622 amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.686679] Modules linked in: hid_logitech_dj(+) hid_generic usbhid hid amdgpu(+) amdxcp drm_exec gpu_sched drm_buddy crct10dif_pclmul i2c_algo_bit crc32_pclmul drm_suballoc_helper polyval_clmulni drm_ttm_helper polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 ttm nvme drm_display_helper ahci i2c_piix4 xhci_pci nvme_core r8169 libahci cec xhci_pci_renesas realtek nvme_auth rc_core video wmi aesni_intel crypto_simd cryptd
[ 6.686704] RIP: 0010:amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.686849] ? amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.686989] ? amdgpu_irq_put+0x9f/0xb0 [amdgpu]
[ 6.687113] amdgpu_fence_driver_hw_fini+0x11f/0x170 [amdgpu]
[ 6.687247] amdgpu_device_fini_hw+0xb3/0x250 [amdgpu]
[ 6.687373] amdgpu_driver_unload_kms+0x4b/0x70 [amdgpu]
[ 6.687494] amdgpu_driver_load_kms+0xf9/0x1c0 [amdgpu]
[ 6.687614] amdgpu_pci_probe+0x1bb/0x5d0 [amdgpu]
[ 6.687763] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 6.687885] amdgpu_init+0x69/0xff0 [amdgpu]
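The first line that differs from the successful boot is the SMU response of 0xFFFFFFFF, after which hw_init of the smu block fails and the driver gives up. To keep track of how often this happens across boots, a small classifier over the kernel log could help. This is a sketch only: `smu_init_status` is a hypothetical helper name, and it matches the two message strings exactly as they appear in the logs above, which may differ on other kernel versions.

```shell
#!/bin/sh
# smu_init_status: hypothetical helper. Reads kernel log text on stdin and
# prints "failed", "ok", or "unknown" depending on which SMU message it finds.
# The matched strings are copied from the dmesg output in this post.
smu_init_status() {
    log=$(cat)
    if printf '%s\n' "$log" | grep -q 'hw_init of IP block <smu> failed'; then
        echo "failed"
    elif printf '%s\n' "$log" | grep -q 'SMU is initialized successfully'; then
        echo "ok"
    else
        echo "unknown"
    fi
}

# Usage on the affected machine:
#   dmesg | smu_init_status
# Assuming the journal is persistent, the previous boot can be checked with:
#   journalctl -k -b -1 --no-pager | smu_init_status
```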
I'm quite unsure where to go from here, so I'd be grateful for any help y'all can offer.