On Fri, Aug 24, 2012 at 02:13:33PM +1000, Dave Airlie wrote:
On Mon, Jul 23, 2012 at 7:26 PM, Daniel Vetter daniel.vetter@ffwll.ch wrote:
Well, actually add some, because currently there's exactly none:
- in handle_to_fd we only take the file_priv's prime lock, but the underlying gem object we're exporting isn't per-file.
- in each driver's dma_buf->release callback we don't grab any locks at all.
Finally got to this patch and applied it to my test box and it explodes in the following
start X, bind udl as an output slave, set a mode on the udl, then kill the X server,
I get an oops on i915/radeon/nouveau, like
I have to admit that I don't see what's happening here :( But thinking some more about the locking issues and race my patch tried to fix I concluded that it's the wrong approach anyway - we still have ugly issues left. The fundamental problem is that for the exporter we have 2 refcounts, the gem refcount and the dma_buf refcount, but we should only destroy the gem object + it's dma_buf export if both are 0. And keep everything around otherwise to avoid nasty issues with re-export or reaping userspace-facing resources like the mmap_offset. But I haven't yet grown clue how to handle this any better.
Slightly related: Jani discovered that we seem to have a leak, putting him on cc just in case he's the one with clue ;-)
I've looked again at the other two patches inspired by this one, which fix some races between gem names & gem flink. And I think those two are still valid.
Yours, Daniel
[ 8046.363596] general protection fault: 0000 [#1] SMP [ 8046.364012] Modules linked in: fuse lockd rfcomm sunrpc bnep tpm_bios ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_conntrack_ ipv4 nf_defrag_ipv4 xt_state nf_conntrack snd_hda_codec_conexant arc4 iwldvm mac80211 coretemp kvm_intel snd_hda_intel snd_hda_codec kvm btusb snd_hwdep bluet ooth iwlwifi i915 snd_seq snd_seq_device microcode snd_pcm cfg80211 snd_page_alloc i2c_i801 lpc_ich e1000e mfd_core snd_timer wmi thinkpad_acpi snd soundcore rfkill video uinput sdhci_pci sdhci udl syscopyarea sysfillrect sysimgblt firewire_ohci mmc_core drm_usb firewire_core crc_itu_t yenta_socket radeon i2c_algo_ bit drm_kms_helper ttm drm i2c_core [ 8046.364012] CPU 0 [ 8046.364012] Pid: 6947, comm: Xorg Tainted: G W 3.6.0-rc3+ #7 LENOVO 406334M/406334M [ 8046.364012] RIP: 0010:[<ffffffffa0388e0c>] [<ffffffffa0388e0c>] i915_gem_unmap_dma_buf+0x5c/0xb0 [i915] [ 8046.364012] RSP: 0018:ffff88002c22dc28 EFLAGS: 00010296 [ 8046.364012] RAX: 0000000000000500 RBX: ffff880007d5d6d8 RCX: 0000000000000000 [ 8046.364012] RDX: 0000000000000500 RSI: ffff8800570ca000 RDI: ffff88005b2ea5a8 [ 8046.364012] RBP: ffff88002c22dc68 R08: 0000000000000000 R09: 0000000000000000 [ 8046.364012] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88005b2ea5a8 [ 8046.364012] R13: 0000000000000000 R14: 6b6b6b6b6b6b6b6b R15: ffff8800570ca000 [ 8046.364012] FS: 00007f79955759c0(0000) GS:ffff880078c00000(0000) knlGS:0000000000000000 [ 8046.364012] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8046.364012] CR2: 00000031808bab90 CR3: 0000000001c0b000 CR4: 00000000000007f0 [ 8046.364012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8046.364012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 8046.364012] Process Xorg (pid: 6947, threadinfo ffff88002c22c000, task ffff88005d680000) [ 8046.364012] Stack: [ 8046.364012] ffff880058d167b0 ffff880000000500 ffff88002c22dc48 ffff88005f6f0f50 [ 8046.364012] ffff88005d9622f8 ffff88005d962290 0000000000000000 ffffffffffffffff [ 8046.364012] ffff88002c22dc78 ffffffff81409182 ffff88002c22dc98 ffffffffa002cfeb [ 8046.364012] Call Trace: [ 8046.364012] [<ffffffff81409182>] dma_buf_unmap_attachment+0x22/0x40 [ 8046.364012] [<ffffffffa002cfeb>] drm_prime_gem_destroy+0x2b/0x50 [drm] [ 8046.364012] [<ffffffffa01f9cb9>] udl_gem_free_object+0x39/0x70 [udl] [ 8046.364012] [<ffffffffa001695a>] drm_gem_object_free+0x2a/0x30 [drm] [ 8046.364012] [<ffffffffa00171a8>] drm_gem_object_release_handle+0xa8/0xd0 [drm] [ 8046.364012] [<ffffffff81319b85>] idr_for_each+0xe5/0x180 [ 8046.364012] [<ffffffffa0017100>] ? drm_gem_vm_open+0x70/0x70 [drm] [ 8046.364012] [<ffffffff810bea2d>] ? trace_hardirqs_on+0xd/0x10 [ 8046.364012] [<ffffffffa0017804>] drm_gem_release+0x24/0x40 [drm] [ 8046.364012] [<ffffffffa0015e1a>] drm_release+0x55a/0x5f0 [drm] [ 8046.364012] [<ffffffff811a908a>] __fput+0xfa/0x2d0 [ 8046.364012] [<ffffffff811a926e>] ____fput+0xe/0x10 [ 8046.364012] [<ffffffff81076c89>] task_work_run+0x69/0xa0 [ 8046.364012] [<ffffffff81056aee>] do_exit+0xa0e/0xb20 [ 8046.364012] [<ffffffff816887d5>] ? retint_swapgs+0x13/0x1b [ 8046.364012] [<ffffffff81056f4c>] do_group_exit+0x4c/0xc0
which implies some reference count is off or some object is freed in the wrong order.
Dave.