Hi, the patch below fixes a NULL pointer dereference reported on openSUSE 12.3. I am not familiar with this area, so I have no idea whether this is the right way to go, but after applying the patch the problem is no longer reproducible. If the patch is correct, please mark it for stable (3.7+).
Thanks!
---
From a786a701bd6c277329e2b788fea9a69b1c3ced2e Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Tue, 26 Mar 2013 19:04:40 +0100
Subject: [PATCH] drm: fix i_mapping and f_mapping initialization in drm_open in error path
Starting with fdb40a08 (drm: set dev_mapping before calling drm_open_helper), inode and file mappings are set to old_mapping in the error path. old_mapping can be NULL, however, which is handled by initializing dev_mapping to the default inode->i_data. old_mapping itself is left intact, though, so both the inode's and the filp's mappings are reset to NULL in the error path, which is unexpected and can result in crashes later on.
Marco Munderloh has reported such crashes:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff81190be4>] drop_pagecache_sb+0x74/0xe0
PGD 252bc1067 PUD 253d11067 PMD 0
Oops: 0000 [#1] SMP
Modules linked in: fuse af_packet xt_tcpudp xt_pkttype xt_LOG xt_limit bnep bluetooth ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack ip6table_filter ip6_tables x_tables cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq snd_hda_codec_hdmi mperf coretemp snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep kvm_intel snd_pcm arc4 snd_seq snd_timer snd_seq_device kvm iwldvm mac80211 snd uvcvideo crc32c_intel videobuf2_core videodev ghash_clmulni_intel aesni_intel ablk_helper cryptd lrw videobuf2_vmalloc aes_x86_64 iTCO_wdt xts tpm_infineon mei r8169 videobuf2_memops iTCO_vendor_support sr_mod lpc_ich iwlwifi gf128mul sony_laptop rts_pstor(C) cdrom i2c_i801 tpm_tis tpm tpm_bios battery mfd_core soundcore snd_page_alloc cfg80211 rfkill ac sg microcode pcspkr autofs4 xhci_hcd ehci_hcd usbcore usb_common radeon i915 video ttm drm_kms_helper drm i2c_algo_bit thermal button processor thermal_sys scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh_alua scsi_dh
CPU 0
Pid: 1452, comm: bash Tainted: G C 3.7.10-1.1-default ation VPCSA4W9E/VAIO
RIP: 0010:[<ffffffff81190be4>] [<ffffffff81190be4>] drop_pagecache_sb+0x74/0xe0
RSP: 0018:ffff880252bc9e18 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88024ecb7db0 RCX: 0000000000000002
RDX: 0000000000000007 RSI: ffff88024f63a670 RDI: ffff88024ecb7e38
RBP: ffff88024ecb7e38 R08: dead000000200200 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000210 R12: ffff880254d588a0
R13: ffff88024fcb25e8 R14: ffffffff81190b70 R15: ffffffffffffffea
FS: 00007fad2b9ed700(0000) GS:ffff88025fa00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000058 CR3: 0000000252ad2000 CR4: 00000000000407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process bash (pid: 1452, threadinfo ffff880252bc8000, task ffff880253d321c0)
Stack:
 0000000000000001 ffff880254d58800 ffff880254e94800 ffff880254d58868
 0000000000000000 ffffffff8116a499 0000000000000000 0000000000000001
 ffffffff81a228a0 ffff880252bc9f50 0000000000000002 ffffffff81190cce
Call Trace:
 [<ffffffff8116a499>] iterate_supers+0xd9/0xe0
 [<ffffffff81190cce>] drop_caches_sysctl_handler+0x7e/0x90
 [<ffffffff811d0e26>] proc_sys_call_handler.isra.10+0xc6/0xe0
 [<ffffffff81166fd7>] vfs_write+0xa7/0x180
 [<ffffffff81167321>] sys_write+0x51/0xa0
 [<ffffffff8154f2ed>] system_call_fastpath+0x1a/0x1f
 [<00007fad2ae959c0>] 0x7fad2ae959bf
Code: 01 00 00 49 39 c4 48 8d 98 00 ff ff ff 74 68 48 8d ab 88 00 00 00 48 89 ef e8 49 69 3b 00 f6 83 a0 00 00 00 38 75 d0 48 8b 43 30 <48> 83 78 58 00 74 c5 48 89 df e8 dd ef fe ff 66 83 45 00 01 66
RIP [<ffffffff81190be4>] drop_pagecache_sb+0x74/0xe0
 RSP <ffff880252bc9e18>
CR2: 0000000000000058

when dropping caches when inode with NULL i_mapping is encountered. Or a different one when umounting devtmpfs:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
IP: [<ffffffff81122001>] shmem_evict_inode+0x11/0x130
PGD 0
Oops: 0000 [#1] SMP
Modules linked in: xt_tcpudp xt_pkttype xt_LOG xt_limit af_packet ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw ipt_REJECT iptable_raw xt_CT iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ipv4 nf_defrag_ipv4 ip_tables xt_conntrack nf_conntrack bnep bluetooth ip6table_filter ip6_tables cpufreq_conservative x_tables cpufreq_userspace cpufreq_powersave snd_hda_codec_hdmi snd_hda_codec_realtek acpi_cpufreq snd_hda_intel mperf snd_hda_codec coretemp snd_hwdep kvm_intel snd_pcm kvm arc4 snd_seq iwldvm mac80211 crc32c_intel ghash_clmulni_intel snd_timer aesni_intel snd_seq_device iTCO_wdt uvcvideo videobuf2_core iwlwifi videodev sony_laptop videobuf2_vmalloc videobuf2_memops ablk_helper iTCO_vendor_support cryptd cfg80211 tpm_infineon r8169 sr_mod cdrom mei snd lpc_ich battery lrw aes_x86_64 xts rfkill i2c_i801 pcspkr mfd_core tpm_tis ac gf128mul tpm tpm_bios soundcore snd_page_alloc sg microcode autofs4 xhci_hcd ehci_hcd radeon(-) i915 ttm drm_kms_helper usbcore usb_common drm thermal i2c_algo_bit video button processor thermal_sys scsi_dh_emc scsi_dh_rdac scsi_dh_hp_sw scsi_dh_alua scsi_dh
CPU 1
<4>[ 44.175256] Pid: 29, comm: kdevtmpfs Tainted: G W 3.7.10-1-default-patched #4 Sony Corporation VPCSA4W9E/VAIO
RIP: 0010:[<ffffffff81122001>] [<ffffffff81122001>] shmem_evict_inode+0x11/0x130
RSP: 0018:ffff880254ed3d18 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffff88024fb185e8 RCX: 0000000000000034
RDX: 0000000000002433 RSI: 0000000000000c11 RDI: ffff88024fb185e8
RBP: ffff88024fb186e8 R08: 1038000000000000 R09: 024fb186881c0000
R10: fd924f0d6445a207 R11: 0000000000000000 R12: ffffffff8161b640
R13: ffff88024fb185e8 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88025fa40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000068 CR3: 0000000001a0c000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kdevtmpfs (pid: 29, threadinfo ffff880254ed2000, task ffff880254ed0080)
Stack:
 ffff88024fb185e8 ffff88024fb185e8 ffff88024fb186e8 ffffffff8161b640
 0000000000000000 ffffffff8117f5f3 ffff88024e453a80 ffff88024fb185e8
 0000000000000000 ffffffff8117b778 0000000000000000 ffff88024e453a80
Call Trace:
 [<ffffffff8117f5f3>] evict+0xa3/0x190
 [<ffffffff8117b778>] d_delete+0x148/0x180
 [<ffffffff81171d77>] vfs_unlink+0xf7/0x110
 [<ffffffff81386ab2>] handle_remove+0x202/0x250
 [<ffffffff81386de5>] devtmpfsd+0xd5/0x130
 [<ffffffff81066273>] kthread+0xb3/0xc0
 [<ffffffff81549c3c>] ret_from_fork+0x7c/0xb0
Code: 7b 30 b9 01 00 00 00 31 d2 4c 89 f6 e8 69 e3 00 00 e9 23 ff ff ff 0f 1f 40 00 41 55 49 89 fd 41 54 55 53 48 83 ec 08 48 8b 47 30 <48> 81 78 68 00 b7 61 81 74 75 48 8b 7f a8 4d 8d 65 90 e8 b8 1f
RIP [<ffffffff81122001>] shmem_evict_inode+0x11/0x130
 RSP <ffff880254ed3d18>
CR2: 0000000000000068
This patch fixes that by initializing old_mapping to inode->i_data, the same as dev_mapping.
Reported-and-tested-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 drivers/gpu/drm/drm_fops.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 133b413..62a5435 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -139,7 +139,7 @@ int drm_open(struct inode *inode, struct file *filp)
 	mutex_lock(&dev->struct_mutex);
 	old_mapping = dev->dev_mapping;
 	if (old_mapping == NULL)
-		dev->dev_mapping = &inode->i_data;
+		dev->dev_mapping = old_mapping = &inode->i_data;
 	/* ihold ensures nobody can remove inode with our i_data */
 	ihold(container_of(dev->dev_mapping, struct inode, i_data));
 	inode->i_mapping = dev->dev_mapping;
This looks a bit like a hack and it doesn't look right, conceptually. If the call fails, it should restore things as if nothing has ever happened and overwriting old_mapping is not going to do the trick.
I think the right way to fix it would be to separately store the original mapping for filp->f_mapping and inode->i_mapping and restore each from its respective temporary variable if drm_open_helper or drm_setup fails. Attached is a quick patch to show you what I have in mind. Can you please test it? If it solves your problem, I'll send it to Dave.
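To make the failure mode concrete, the error path can be modelled in user space. What follows is only a minimal sketch under stated assumptions, not the kernel code: the `toy_*` types and functions are hypothetical stand-ins for struct inode, struct file and struct drm_device, the open is assumed to fail unconditionally, and the ihold()/iput() reference counting is left out.

```c
#include <stddef.h>

/* Toy stand-ins for the kernel types; only the mapping pointers are modelled. */
struct toy_mapping { int unused; };
struct toy_inode {
	struct toy_mapping i_data;     /* the inode's own default mapping */
	struct toy_mapping *i_mapping;
};
struct toy_file { struct toy_mapping *f_mapping; };
struct toy_dev { struct toy_mapping *dev_mapping; };

/* A failing open whose error path restores everything from old_mapping. */
void toy_failed_open_buggy(struct toy_dev *dev, struct toy_inode *inode,
			   struct toy_file *filp)
{
	struct toy_mapping *old_mapping = dev->dev_mapping;

	if (old_mapping == NULL)
		dev->dev_mapping = &inode->i_data;
	inode->i_mapping = dev->dev_mapping;
	filp->f_mapping = dev->dev_mapping;

	/* err_undo: assume drm_open_helper() or drm_setup() failed here */
	filp->f_mapping = old_mapping;  /* NULL for the first opener */
	inode->i_mapping = old_mapping; /* NULL for the first opener */
	dev->dev_mapping = old_mapping;
}

/* The same failing open, restoring each pointer from its own saved copy. */
void toy_failed_open_restoring(struct toy_dev *dev, struct toy_inode *inode,
			       struct toy_file *filp)
{
	struct toy_mapping *old_fmapping = filp->f_mapping;
	struct toy_mapping *old_imapping = inode->i_mapping;
	struct toy_mapping *old_mapping = dev->dev_mapping;

	if (old_mapping == NULL)
		dev->dev_mapping = &inode->i_data;
	inode->i_mapping = dev->dev_mapping;
	filp->f_mapping = dev->dev_mapping;

	/* err_undo: put back exactly what was there on entry */
	filp->f_mapping = old_fmapping;
	inode->i_mapping = old_imapping;
	dev->dev_mapping = old_mapping;
}
```

Calling the first function with a device whose dev_mapping is NULL leaves the inode and file mappings NULL, which is the state the reported oopses trip over. The second function leaves all three pointers as they were before the call.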
By the way, what specific course of action reproduces the problem? It requires drm_open to fail, but is there anything else that you do?
thanks,
Ilija
On Sat 30-03-13 18:26:53, Ilija Hadzic wrote:
This looks a bit like a hack and it doesn't look right, conceptually. If the call fails, it should restore things as if nothing has ever happened and overwriting old_mapping is not going to do the trick.
OK, I thought this is what the patch does as it falls back to &inode->i_data which is the default mapping for all inodes or it uses what used to be in device mapping.
I am obviously not familiar with the drm code but it feels a bit strange that the device mapping can be different than inode's resp. file's one and even more confusing that inode and file are saved separately.
I think the right way to fix it would be to separately store the original mapping for filp->f_mapping and inode->i_mapping and restore it from their respective temporary variables if drm_open_helper or drm_setup fail. Attached is a quick patch to show you
[...]
@@ -137,6 +139,8 @@ int drm_open(struct inode *inode, struct file *filp)
 	if (!dev->open_count++)
 		need_setup = 1;
 	mutex_lock(&dev->struct_mutex);
+	old_fmapping = filp->f_mapping;
+	old_imapping = inode->i_mapping;

How can file and inode mappings be different?

 	old_mapping = dev->dev_mapping;
 	if (old_mapping == NULL)
 		dev->dev_mapping = &inode->i_data;
@@ -159,8 +163,8 @@ int drm_open(struct inode *inode, struct file *filp)

 err_undo:
 	mutex_lock(&dev->struct_mutex);
-	filp->f_mapping = old_mapping;
-	inode->i_mapping = old_mapping;
+	filp->f_mapping = old_fmapping;
+	inode->i_mapping = old_imapping;
 	iput(container_of(dev->dev_mapping, struct inode, i_data));
 	dev->dev_mapping = old_mapping;
 	mutex_unlock(&dev->struct_mutex);
On Sun, 31 Mar 2013, Michal Hocko wrote:
On Sat 30-03-13 18:26:53, Ilija Hadzic wrote:
This looks a bit like a hack and it doesn't look right, conceptually. If the call fails, it should restore things as if nothing has ever happened and overwriting old_mapping is not going to do the trick.
OK, I thought this is what the patch does as it falls back to &inode->i_data which is the default mapping for all inodes or it uses what used to be in device mapping.
I am obviously not familiar with the drm code but it feels a bit strange that the device mapping can be different than inode's resp. file's one
The reason for this is explained in commit message associated with 949c4a34.
In summary, the device's mapping is that of the inode associated with the first opener. Before 949c4a34, subsequent openers would have to come in through exactly the same inode that the first opener came in (otherwise the open call would fail). So if a user did something like: start X, remove /dev/dri/cardN file, mknod the same file again, the applications started after such an action would stop working. Also, using the GPU from chroot-ed environment was not possible if there was another opener from different root.
Commit 949c4a34 removed this restriction but introduced a problem with VMware GPU drivers, which fdb40a08 fixed. However, fdb40a08 introduced the bug that you have reported.
The problem that I have with your proposed fix is that if the first opener fails, it can set the device's mapping to that of the inode that was never used and never opened (and could even be removed later down the road).
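The concern can be illustrated with a small user-space model. This is a sketch only, assuming an open that always fails: the `sketch_*` names are hypothetical, reference counting is omitted, and the single assignment `dev->dev_mapping = old_mapping = &inode->i_data` is the only line taken from the proposed one-line fix.

```c
#include <stddef.h>

/* Hypothetical stand-ins; only the mapping pointers are modelled. */
struct sketch_mapping { int unused; };
struct sketch_inode {
	struct sketch_mapping i_data;
	struct sketch_mapping *i_mapping;
};
struct sketch_file { struct sketch_mapping *f_mapping; };
struct sketch_dev { struct sketch_mapping *dev_mapping; };

/* A failing first open with old_mapping overwritten as in the one-line fix. */
void sketch_failed_first_open(struct sketch_dev *dev,
			      struct sketch_inode *inode,
			      struct sketch_file *filp)
{
	struct sketch_mapping *old_mapping = dev->dev_mapping;

	if (old_mapping == NULL)
		dev->dev_mapping = old_mapping = &inode->i_data;
	inode->i_mapping = dev->dev_mapping;
	filp->f_mapping = dev->dev_mapping;

	/* err_undo: the open failed, restore from old_mapping */
	filp->f_mapping = old_mapping;
	inode->i_mapping = old_mapping;
	/* No longer NULL: the device keeps the failed opener's mapping. */
	dev->dev_mapping = old_mapping;
}
```

In this model the inode and file mappings come out intact, so the NULL dereference goes away, but dev_mapping now points at the i_data of an inode whose open never succeeded.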
and even more confusing that inode and file are saved separately.
I was trying to quickly get out the patch that was safe in terms of introducing new breakage. So the "conservative" thing to do (without having to think through all possible scenarios) was to restore each of the three pointers from their own temporary variable. Thinking about it, you are probably right that file descriptor's and inode's mapping pointer are equal when open call is entered so we could use one variable. However, you still need a separate variable to store the device's mapping pointer because that one can be different.
Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it fixes your issue.
-- Ilija
On Mon 01-04-13 13:14:50, Ilija Hadzic wrote:
On Sun, 31 Mar 2013, Michal Hocko wrote:
On Sat 30-03-13 18:26:53, Ilija Hadzic wrote:
This looks a bit like a hack and it doesn't look right, conceptually. If the call fails, it should restore things as if nothing has ever happened and overwriting old_mapping is not going to do the trick.
OK, I thought this is what the patch does as it falls back to &inode->i_data which is the default mapping for all inodes or it uses what used to be in device mapping.
I am obviously not familiar with the drm code but it feels a bit strange that the device mapping can be different than inode's resp. file's one
The reason for this is explained in commit message associated with 949c4a34.
In summary, the device's mapping is that of the inode associated with the first opener. Before 949c4a34, subsequent openers would have to come in through exactly the same inode that the first opener came in (otherwise the open call would fail). So if a user did something like: start X, remove /dev/dri/cardN file, mknod the same file again, the applications started after such an action would stop working. Also, using the GPU from chroot-ed environment was not possible if there was another opener from different root.
Oh, I see. Thanks for the clarification.
Commit 949c4a34 removed this restriction but introduced a problem with VMware GPU drivers, which fdb40a08 fixed. However, fdb40a08 introduced the bug that you have reported.
The problem that I have with your proposed fix is that if the first opener fails, it can set the device's mapping to that of the inode that was never used and never opened (and could even be removed later down the road).
Makes sense.
and even more confusing that inode and file are saved separately.
I was trying to quickly get out the patch that was safe in terms of introducing new breakage. So the "conservative" thing to do (without having to think through all possible scenarios) was to restore each of the three pointers from their own temporary variable. Thinking about it, you are probably right that file descriptor's and inode's mapping pointer are equal when open call is entered so we could use one variable. However, you still need a separate variable to store the device's mapping pointer because that one can be different.
Right.
Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it fixes your issue.
OK, this is a call for Marco. I have attached this bug to our bugzilla as well (just for reference: https://bugzilla.novell.com/show_bug.cgi?id=807850)
-- Ilija
From 7e3c832158e2552e5e106a588e2b9e61c35b68f2 Mon Sep 17 00:00:00 2001
From: Ilija Hadzic <ihadzic@research.bell-labs.com>
Date: Sat, 30 Mar 2013 18:20:35 -0400
Subject: [PATCH] drm: correctly restore mappings if drm_open fails
If first drm_open fails, the error-handling path will incorrectly restore inode's mapping to NULL. This can cause the crash later on. Fix by separately storing away mapping pointers that drm_open can touch and restore each from its own respective variable if the call fails.
Reference: http://lists.freedesktop.org/archives/dri-devel/2013-March/036564.html
v2: use one variable to store file and inode mapping since they are the same at the function entry; also fix spelling mistakes in commit message.
Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
Signed-off-by: Ilija Hadzic <ihadzic@research.bell-labs.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: stable@vger.kernel.org
Feel free to add Reviewed-by: Michal Hocko mhocko@suse.cz
Thanks!
Signed-off-by: Ilija Hadzic ihadzic@research.bell-labs.com
 drivers/gpu/drm/drm_fops.c | 6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_fops.c b/drivers/gpu/drm/drm_fops.c
index 13fdcd1..429e07d 100644
--- a/drivers/gpu/drm/drm_fops.c
+++ b/drivers/gpu/drm/drm_fops.c
@@ -123,6 +123,7 @@ int drm_open(struct inode *inode, struct file *filp)
 	int retcode = 0;
 	int need_setup = 0;
 	struct address_space *old_mapping;
+	struct address_space *old_imapping;
 
 	minor = idr_find(&drm_minors_idr, minor_id);
 	if (!minor)
@@ -137,6 +138,7 @@ int drm_open(struct inode *inode, struct file *filp)
 	if (!dev->open_count++)
 		need_setup = 1;
 	mutex_lock(&dev->struct_mutex);
+	old_imapping = inode->i_mapping;
 	old_mapping = dev->dev_mapping;
 	if (old_mapping == NULL)
 		dev->dev_mapping = &inode->i_data;
@@ -159,8 +161,8 @@ int drm_open(struct inode *inode, struct file *filp)
 
 err_undo:
 	mutex_lock(&dev->struct_mutex);
-	filp->f_mapping = old_mapping;
-	inode->i_mapping = old_mapping;
+	filp->f_mapping = old_imapping;
+	inode->i_mapping = old_imapping;
 	iput(container_of(dev->dev_mapping, struct inode, i_data));
 	dev->dev_mapping = old_mapping;
 	mutex_unlock(&dev->struct_mutex);
-- 
1.7.4.1
Attached is a v2 of the patch, for reference. I would appreciate if the original reporter or you tested it in lieu of your proposed patch and let me know if it fixes your issue.
The patch works for me. Neither echo 3 > /proc/sys/vm/drop_caches nor rmmod radeon ends in a crash anymore. However, I still have no clue why either of these makes drm_open fail. On rmmod radeon I get the following log messages. I don't know whether the 'unpin not necessary' messages have anything to do with it.

[drm] radeon: finishing device.
radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary
radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
[TTM] Finalizing pool allocator
[TTM] Finalizing DMA pool allocator
[TTM] Zone kernel: Used memory at exit: 0 kiB
[TTM] Zone dma32: Used memory at exit: 0 kiB
[drm] radeon: ttm finalized
vga_switcheroo: disabled
[drm] Module unloaded
By the way, sometimes my r8169 ethernet controller does not survive suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't know if this is related.
Thanks for testing. Other issues are probably unrelated, so I'll send the last version of the patch to Dave.
-- Ilija
Hi Ilija,
Thanks for testing. Other issues are probably unrelated, so I'll send the last version of the patch to Dave.
I came across another problem which seems related. rmmod radeon works; however, modprobe radeon afterwards results in a crash (divide error). See the attachment.
Best, Marco
Marco,
What makes you think that the crash after the second modprobe is related to the mapping pointers in the DRM module? Can you actually establish a correlation between these patches and the crash, or are you just suspecting one because your other bug had something to do with module removal/insertion?
If it's the latter, then you may want to open another bug report here https://bugs.freedesktop.org/ (use DRI for product and pick DRM/radeon for component) and have this issue tracked and addressed separately.
The divide error that your log shows apparently happens at this line inside r6xx_remap_render_backend:
pipe_rb_ratio = rendering_pipe_num / req_rb_num;
I would suspect that req_rb_num somehow evaluates to zero at the second modprobe. That variable seems to be derived from the last three arguments to r6xx_remap_render_backend. If I look at the caller (evergreen_gpu_init), the arguments in play here are all derived from the GPU's hardware registers (or are constant for a given GPU device). So I suspect that the GPU driver leaves some state in the GPU at module removal that later bites you.
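As an illustration of why a zero req_rb_num is fatal at that line, here is a minimal sketch of a guarded version of the division. The helper `checked_pipe_rb_ratio` is hypothetical and not part of the radeon driver, and how req_rb_num is actually computed is deliberately left out because it is not shown in this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not driver code: computes
 * rendering_pipe_num / req_rb_num and reports failure instead of
 * dividing when req_rb_num is zero.
 */
bool checked_pipe_rb_ratio(unsigned int rendering_pipe_num,
			   unsigned int req_rb_num,
			   unsigned int *pipe_rb_ratio)
{
	if (req_rb_num == 0)
		return false; /* an unguarded division would trap here */

	*pipe_rb_ratio = rendering_pipe_num / req_rb_num;
	return true;
}
```

A guard like this would only hide the symptom. If the value really comes from stale hardware state, the underlying question is why the registers read differently on the second modprobe.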
-- Ilija
On Tue, 2 Apr 2013, Marco Munderloh wrote:
Hi Ilija,
Thanks for testing. Other issues are probably unrelated, so I'll send the last version of the patch to Dave.
I came across another problem which seems related. rmmod radeon works, however, modprobe radeon afterwards results in a crash (divide error), see attachment.
Best, Marco
On 02.04.2013 13:23, Ilija Hadzic wrote:
-- Ilija
On Tue, Apr 2, 2013 at 6:36 AM, Marco Munderloh <munderl@tnt.uni-hannover.de mailto:munderl@tnt.uni-hannover.de> wrote:
Attached is a v2 of the patch, for reference. I would appreciate if
the original reporter or you tested it in lieu of your proposed patch and let me know if it fixes your issue.
The patch works for me. echo 3 > /proc/sys/vm/drop_caches as well as rmmod radeon do not end up in a crash anymore. However, I still have no clue why one of these makes drm_open fail. On rmmod radeon I get the following log messages. I don't know if the 'unpin not necessary' has anything to do with it.
[drm] radeon: finishing device.
radeon 0000:01:00.0: ffff88024e526c00 unpin not necessary
radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
radeon 0000:01:00.0: ffff88024f2f6000 unpin not necessary
[TTM] Finalizing pool allocator
[TTM] Finalizing DMA pool allocator
[TTM] Zone kernel: Used memory at exit: 0 kiB
[TTM] Zone dma32: Used memory at exit: 0 kiB
[drm] radeon: ttm finalized
vga_switcheroo: disabled
[drm] Module unloaded
By the way, sometimes my r8169 ethernet controller does not survive suspend/hibernation (does not detect link). rmmod/modprobe helps. I don't know if this is related.
-- Dipl.-Ing. Marco Munderloh Mail: munderl@tnt.uni-hannover.de Institut für Informationsverarbeitung (TNT) Phone: +49 511 762-19587 Leibniz Universitaet Hannover, Appelstr. 9a Fax: +49 511 762- 5333 30167 Hannover, Germany Web: http://www.tnt.uni-hannover.de/~munderl
On Tue, Apr 2, 2013 at 9:31 AM, Ilija Hadzic ihadzic@research.bell-labs.com wrote:
Marco,
What makes you think that the crash after the second modprobe is related to the mapping pointers in the DRM module? Can you actually establish a correlation between these patches and the crash, or do you just suspect one because your other bug had something to do with module removal/insertion?
If it's the latter, then you may want to open another bug report here https://bugs.freedesktop.org/ (use DRI for product and pick DRM/radeon for component) and have this issue tracked and addressed separately.
The divide error that your log shows apparently happens at this line inside r6xx_remap_render_backend:
pipe_rb_ratio = rendering_pipe_num / req_rb_num;
I would suspect that req_rb_num somehow evaluates to zero at the second modprobe. That variable seems to be derived from the last three arguments to r6xx_remap_render_backend. If I look at the caller (evergreen_gpu_init), the arguments in play here are all derived from the GPU's hardware registers (or are constant for a given GPU device). So I suspect that the GPU driver leaves some state in the GPU at module removal that later bites you.
Newer kernels have a fix for this. http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f6...
Alex