openSUSE Tumbleweed system hard lock caused by the NVIDIA driver

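After a recent update, my laptop started crashing randomly within a few hours of booting. When a crash happens, the last image stays on the monitor, but the computer becomes completely unresponsive (the Num Lock indicator does not change). I'm running openSUSE Tumbleweed with kernel 5.12.2-1 and Nvidia driver 460.73.01 on a ThinkPad P71 with a Quadro M620 Mobile GPU and an i7-7700HQ.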

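On a possibly unrelated note, the network interface keeps going down and coming back up every few seconds. In the few minutes leading up to a crash, there are no journalctl entries at all except the ones related to the network interface. I use the internal Ethernet card through the official docking station, managed by NetworkManager. Below is a sample of the journalctl output from before a crash; note that the same content repeats in the log for hours.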

May 12 07:10:31 thiccboii nscd[1153]: 1153 checking for monitored file `/etc/services': No such file or directory
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1084] device (enp0s31f6): carrier: link connected
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1086] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1094] policy: auto-activating connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1099] device (enp0s31f6): Activation: starting connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.1101] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2154] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2221] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2225] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
May 12 07:10:32 thiccboii NetworkManager[1332]: <info>  [1620828632.2238] dhcp4 (enp0s31f6): dhclient started with pid 27289
May 12 07:10:38 thiccboii NetworkManager[1332]: <info>  [1620828638.2172] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:38 thiccboii NetworkManager[1332]: <info>  [1620828638.2497] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 27289
May 12 07:10:38 thiccboii NetworkManager[1332]: <info>  [1620828638.2497] dhcp4 (enp0s31f6): state changed unknown -> terminated
May 12 07:10:39 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 12 07:10:39 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 12 07:10:43 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8755] device (enp0s31f6): carrier: link connected
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8758] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8765] policy: auto-activating connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8770] device (enp0s31f6): Activation: starting connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.8771] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9856] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9943] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9946] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
May 12 07:10:43 thiccboii NetworkManager[1332]: <info>  [1620828643.9959] dhcp4 (enp0s31f6): dhclient started with pid 27301
May 12 07:10:49 thiccboii NetworkManager[1332]: <info>  [1620828649.9870] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:50 thiccboii NetworkManager[1332]: <info>  [1620828650.0196] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 27301
May 12 07:10:50 thiccboii NetworkManager[1332]: <info>  [1620828650.0196] dhcp4 (enp0s31f6): state changed unknown -> terminated
May 12 07:10:50 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 12 07:10:50 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 12 07:10:52 thiccboii nscd[1153]: 1153 checking for monitored file `/etc/services': No such file or directory
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5960] device (enp0s31f6): carrier: link connected
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5964] device (enp0s31f6): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5971] policy: auto-activating connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5976] device (enp0s31f6): Activation: starting connection 'Home Ethernet' (e01921b8-0157-3627-bf0c-bbda6a033ae9)
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.5978] device (enp0s31f6): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7096] device (enp0s31f6): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7164] device (enp0s31f6): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7168] dhcp4 (enp0s31f6): activation: beginning transaction (timeout in 45 seconds)
May 12 07:10:55 thiccboii NetworkManager[1332]: <info>  [1620828655.7181] dhcp4 (enp0s31f6): dhclient started with pid 27309
May 12 07:11:01 thiccboii NetworkManager[1332]: <info>  [1620828661.7110] device (enp0s31f6): state change: ip-config -> unavailable (reason 'carrier-changed', sys-iface-state: 'managed')
May 12 07:11:01 thiccboii NetworkManager[1332]: <info>  [1620828661.7434] dhcp4 (enp0s31f6): canceled DHCP transaction, DHCP client pid 27309
May 12 07:11:01 thiccboii NetworkManager[1332]: <info>  [1620828661.7435] dhcp4 (enp0s31f6): state changed unknown -> terminated
May 12 07:11:03 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 12 07:11:03 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
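
To put a number on the link flapping shown above, the e1000e "NIC Link is Down" messages can be counted and timed. This is a minimal sketch, not part of any existing tool: the function name `link_down_summary` and its parameters are hypothetical, and it assumes journalctl's default short output format as in the excerpt. Because that format omits the year, the year (2021, matching these logs) is passed in as an assumption.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# Matches e1000e link messages in journalctl's default "short" format, e.g.
# "May 12 07:10:39 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down"
LINK_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"e1000e \S+ (?P<iface>\S+): NIC Link is (?P<state>Up|Down)"
)


def link_down_summary(
    lines: Iterable[str], iface: str = "enp0s31f6", year: int = 2021
) -> Tuple[int, Optional[float]]:
    """Count 'NIC Link is Down' events for iface and the mean gap between them.

    The short journal format has no year, so `year` (an assumption) is
    prepended before parsing. Returns (count, None) if fewer than two events.
    """
    downs = []
    for line in lines:
        m = LINK_RE.match(line)
        if m and m.group("iface") == iface and m.group("state") == "Down":
            stamp = f"{year} {m.group('ts')}"
            downs.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    if len(downs) < 2:
        return len(downs), None
    # Seconds between consecutive link-down events.
    gaps = [(later - earlier).total_seconds() for earlier, later in zip(downs, downs[1:])]
    return len(downs), sum(gaps) / len(gaps)
```

For example, feeding it the kernel messages from the previous boot (`journalctl -k -b -1`) would show whether the flapping interval changes in the hours before a freeze. In the excerpt above, the link goes down roughly every 11 to 13 seconds.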

Additionally, during a normal shutdown, journalctl -k shows a potentially interesting warning:

May 13 18:07:45 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:07:49 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 13 18:07:57 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 13 18:07:57 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:08:01 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 13 18:08:09 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 13 18:08:09 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:08:13 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
May 13 18:08:19 thiccboii kernel: ------------[ cut here ]------------
May 13 18:08:19 thiccboii kernel: WARNING: CPU: 6 PID: 16754 at /usr/src/kernel-modules/nvidia-460.73.01-default/nvidia-drm/nvidia-drm-drv.c:531 nv_drm_master_set+0x22/0x30 [nvidia_drm]
May 13 18:08:19 thiccboii kernel: Modules linked in: rfcomm xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nf_nat_tftp nf_conntrack_tftp bridge stp llc nft_objref nf_conntrack_netbios_ns nf_conntrack_broadcast ccm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct af_packet nft_chain_nat nf_tables ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_mangle iptable_raw iptable_security nvidia_drm(POE) nvidia_modeset(POE) ip_set nfnetlink ebtable_filter ebtables nvidia_uvm(POE) ip6table_filter ip6_tables iptable_filter ip_tables x_tables bpfilter nvidia(POE) cmac algif_hash algif_skcipher af_alg bnep dmi_sysfs uas snd_usb_audio snd_usbmidi_lib usb_storage snd_rawmidi snd_seq_device btusb btrtl btbcm btintel bluetooth uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev ecdh_generic mc ecc
May 13 18:08:19 thiccboii kernel:  snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common iwlmvm snd_hda_codec_realtek mac80211 snd_hda_codec_generic snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi libarc4 snd_hda_codec ee1004 iTCO_wdt intel_pmc_bxt iTCO_vendor_support mei_hdcp snd_hda_core iwlwifi x86_pkg_temp_thermal snd_hwdep intel_powerclamp coretemp thinkpad_acpi pcspkr cfg80211 joydev platform_profile efi_pstore snd_pcm wmi_bmof intel_wmi_thunderbolt i2c_i801 mei_me intel_lpss_pci ledtrig_audio intel_lpss rfkill snd_timer i2c_smbus mei idma64 intel_pch_thermal thermal snd soundcore ac tiny_power_button acpi_pad nls_iso8859_1 nls_cp437 vfat fat fuse binfmt_misc configfs hid_generic usbhid i915 kvm_intel kvm rtsx_pci_sdmmc crct10dif_pclmul crc32_pclmul mmc_core ghash_clmulni_intel aesni_intel i2c_algo_bit e1000e(OE) drm_kms_helper crypto_simd cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci cec xhci_pci_renesas xhci_hcd rc_core rtsx_pci drm nvme serio_raw usbcore nvme_core wmi battery
May 13 18:08:19 thiccboii kernel:  i2c_hid_acpi i2c_hid video pinctrl_sunrisepoint button vfio_mdev mdev vhost_net tun tap vhost vhost_iotlb vfio_pci vfio_virqfd irqbypass vfio_iommu_type1 vfio btrfs blake2b_generic libcrc32c crc32c_intel xor raid6_pq sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua msr bbswitch(O) efivarfs
May 13 18:08:19 thiccboii kernel: CPU: 6 PID: 16754 Comm: plymouthd Tainted: P     U     OE     5.12.0-2-default #1 openSUSE Tumbleweed
May 13 18:08:19 thiccboii kernel: Hardware name: LENOVO 20HK0013US/20HK0013US, BIOS N1TET56W (1.30 ) 02/10/2020
May 13 18:08:19 thiccboii kernel: RIP: 0010:nv_drm_master_set+0x22/0x30 [nvidia_drm]
May 13 18:08:19 thiccboii kernel: Code: f4 2c 44 d7 0f 1f 40 00 0f 1f 44 00 00 48 8b 47 38 48 8b 78 20 48 8b 05 9c 5c 00 00 48 8b 40 28 e8 d3 9f 7e d7 84 c0 74 01 c3 <0f> 0b c3 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 80 3d 7c
May 13 18:08:19 thiccboii kernel: RSP: 0018:ffffb41680933bd0 EFLAGS: 00010246
May 13 18:08:19 thiccboii kernel: RAX: 0000000000000000 RBX: ffff999cd278d000 RCX: 0000000000000008
May 13 18:08:19 thiccboii kernel: RDX: ffffffffc37a7e58 RSI: 0000000000000292 RDI: ffffffffc37a7e20
May 13 18:08:19 thiccboii kernel: RBP: ffff999f567c19c0 R08: 0000000000000008 R09: ffffb41680933bb8
May 13 18:08:19 thiccboii kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff999c0cb25800
May 13 18:08:19 thiccboii kernel: R13: 0000000000000000 R14: ffff999c0cb25800 R15: 000000001370a9a8
May 13 18:08:19 thiccboii kernel: FS:  00007fd15d540740(0000) GS:ffff99a577580000(0000) knlGS:0000000000000000
May 13 18:08:19 thiccboii kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 13 18:08:19 thiccboii kernel: CR2: 00007fd15d8fa000 CR3: 000000017f3d2001 CR4: 00000000003706e0
May 13 18:08:19 thiccboii kernel: Call Trace:
May 13 18:08:19 thiccboii kernel:  drm_new_set_master+0x7a/0x100 [drm]
May 13 18:08:19 thiccboii kernel:  drm_master_open+0x68/0x90 [drm]
May 13 18:08:19 thiccboii kernel:  drm_open+0xf5/0x240 [drm]
May 13 18:08:19 thiccboii kernel:  drm_stub_open+0xab/0x130 [drm]
May 13 18:08:19 thiccboii kernel:  chrdev_open+0xed/0x210
May 13 18:08:19 thiccboii kernel:  ? cdev_device_add+0x90/0x90
May 13 18:08:19 thiccboii kernel:  do_dentry_open+0x14e/0x380
May 13 18:08:19 thiccboii kernel:  path_openat+0xaf6/0x10a0
May 13 18:08:19 thiccboii kernel:  ? release_pages+0x153/0x4a0
May 13 18:08:19 thiccboii kernel:  ? flush_tlb_func_common.constprop.0+0x93/0x1e0
May 13 18:08:19 thiccboii kernel:  ? free_unref_page+0x99/0xb0
May 13 18:08:19 thiccboii kernel:  do_filp_open+0x99/0x140
May 13 18:08:19 thiccboii kernel:  ? __check_object_size+0x136/0x150
May 13 18:08:19 thiccboii kernel:  do_sys_openat2+0x97/0x150
May 13 18:08:19 thiccboii kernel:  __x64_sys_openat+0x54/0x90
May 13 18:08:19 thiccboii kernel:  do_syscall_64+0x33/0x80
May 13 18:08:19 thiccboii kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
May 13 18:08:19 thiccboii kernel: RIP: 0033:0x7fd15d7cbffb
May 13 18:08:19 thiccboii kernel: Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 2b 0c 25
May 13 18:08:19 thiccboii kernel: RSP: 002b:00007ffd4e3fa1b0 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
May 13 18:08:19 thiccboii kernel: RAX: ffffffffffffffda RBX: 00007fd15d5406c8 RCX: 00007fd15d7cbffb
May 13 18:08:19 thiccboii kernel: RDX: 0000000000000002 RSI: 000056549ac3d730 RDI: 00000000ffffff9c
May 13 18:08:19 thiccboii kernel: RBP: 000056549ac3d730 R08: 000056549ac3c930 R09: 00007fd15d89ea60
May 13 18:08:19 thiccboii kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
May 13 18:08:19 thiccboii kernel: R13: 00007fd15d8c5da8 R14: 0000000000000000 R15: 000056549ac3d080
May 13 18:08:19 thiccboii kernel: ---[ end trace 24fb17530164c622 ]---
May 13 18:08:19 thiccboii kernel: usb 1-4.3.1: reset high-speed USB device number 13 using xhci_hcd
May 13 18:08:21 thiccboii kernel: wlp4s0: deauthenticating from 3c:37:86:14:73:fa by local choice (Reason: 3=DEAUTH_LEAVING)
May 13 18:08:21 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 1000 Mbps Half Duplex, Flow Control: None
May 13 18:08:21 thiccboii kernel: e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Down
May 13 18:08:23 thiccboii kernel: kauditd_printk_skb: 44 callbacks suppressed
May 13 18:08:23 thiccboii kernel: audit: type=1305 audit(1620954503.216:16943): op=set audit_pid=0 old=1577 auid=4294967295 ses=4294967295 subj==unconfined res=1
May 13 18:08:23 thiccboii kernel: audit: type=1131 audit(1620954503.216:16944): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 13 18:08:23 thiccboii kernel: audit: type=1131 audit(1620954503.216:16945): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
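
The call trace shows the warning firing while plymouthd opens the DRM device and is made DRM master (drm_open → drm_master_open → drm_new_set_master → nv_drm_master_set). To check whether the same warning recurs across boots, the blocks between `[ cut here ]` and `[ end trace ]` can be summarized programmatically. The sketch below is hypothetical (`summarize_warnings` is not an existing tool) and assumes the journalctl -k text layout shown above. Frames printed with a leading `?`, which the kernel uses for stack addresses it could not confirm as part of the call chain, are skipped.

```python
import re
from typing import Any, Dict, List, Optional

# "WARNING: CPU: 6 PID: 16754 at /path/file.c:531 nv_drm_master_set+0x22/0x30 [nvidia_drm]"
WARN_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<src>\S+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<module>\w+)\])?"
)
# "CPU: 6 PID: 16754 Comm: plymouthd Tainted: ..."
COMM_RE = re.compile(r"CPU: \d+ PID: \d+ Comm: (?P<comm>\S+)")
# Call-trace frame such as "kernel:  drm_open+0xf5/0x240 [drm]" or "kernel:  ? foo+0x1/0x2"
FRAME_RE = re.compile(r"kernel:\s+(?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def summarize_warnings(lines: List[str]) -> List[Dict[str, Any]]:
    """Summarize each complete '[ cut here ]' ... '[ end trace ]' block.

    Blocks without an end marker are dropped; '?' frames are skipped.
    """
    results: List[Dict[str, Any]] = []
    current: Optional[Dict[str, Any]] = None
    in_trace = False
    for line in lines:
        if "[ cut here ]" in line:
            current = {"func": None, "module": None, "src": None, "comm": None, "trace": []}
            in_trace = False
        elif current is None:
            continue  # outside any warning block
        elif "---[ end trace" in line:
            results.append(current)
            current = None
        elif (m := WARN_RE.search(line)):
            current.update(func=m.group("func"), module=m.group("module"), src=m.group("src"))
        elif (m := COMM_RE.search(line)):
            current["comm"] = m.group("comm")
        elif "Call Trace:" in line:
            in_trace = True
        elif in_trace and (m := FRAME_RE.search(line)) and not m.group("unreliable"):
            current["trace"].append(m.group("func"))
    return results
```

Comparing the `func`, `comm`, and `trace` fields between boots would show whether it is always plymouthd hitting the same nvidia_drm check, or whether other processes trigger it too.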

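Let me know if I should ask this somewhere else. I have run a RAM test, and since I have been using this computer for over a year, I don't think it's one of the CPUs with the known C-state problem. Troubleshooting has been difficult because a crash takes anywhere from 2 to 20 hours to occur, but I'd like to know if there is anything else I should try.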

Best Answer

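I have seen something similar in the past week. It happens with kernel-default-5.12.13-1 and Nvidia driver 460.84, but it did not start immediately after installing them, so it may be related to other updates (Plasma, Chrome, etc.). It still happens with kernel 5.13.0-1.1. It has happened three times on a desktop that had been running stably for quite a long time.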

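I ran into something similar with Chrome a few years ago. In Google chrome-beta 92.0.4515.80-1, I turned off the advanced option for GPU acceleration, and I have not seen another freeze since. However, I am now also running kernel-default 5.13.0-1.2 and Chrome Beta 92.0.4515.93-1, so that may also have changed things.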

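I would normally ask about this on the Nvidia forums (I've heard that Nvidia support staff have been very helpful in the past), but I'm hesitant to do so until I see a pattern or something interesting in the logs or in /var/log/Xorg.0.log. If you have a /var/log/Xorg.0.log from a recent crash, there may be a clue in it.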
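
Since the answer suggests looking for clues in /var/log/Xorg.0.log, one way to narrow a long log down is to keep only the lines Xorg tags as errors `(EE)` or warnings `(WW)`. This is a minimal sketch with a hypothetical function name, assuming the timestamped `[ seconds] (XX) message` layout used by recent Xorg servers. Requiring the timestamp skips the untimestamped marker legend near the top of the log, which also mentions `(WW)` and `(EE)`.

```python
import re
from typing import Iterable, List, Tuple

# Matches timestamped error/warning lines such as "[  1234.567] (EE) ..."
# (layout assumed from recent Xorg servers). Requiring the "[ seconds ]"
# prefix skips the untimestamped "Markers:" legend continuation lines.
XORG_RE = re.compile(r"^\[\s*(?P<t>\d+\.\d+)\] \((?P<level>EE|WW)\) (?P<msg>.*)$")


def xorg_problems(lines: Iterable[str]) -> List[Tuple[float, str, str]]:
    """Return (timestamp, 'EE' or 'WW', message) for each error or warning line."""
    found = []
    for line in lines:
        m = XORG_RE.match(line.rstrip("\n"))
        if m:
            found.append((float(m.group("t")), m.group("level"), m.group("msg")))
    return found
```

For example, `with open("/var/log/Xorg.0.log", errors="replace") as f: print(xorg_problems(f))`. After a reboot, the X server usually moves the previous session's log to Xorg.0.log.old when it starts, so that file may be the one covering the crash.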