On 24/11/16 13:29, Russell King - ARM Linux wrote:
On Thu, Nov 24, 2016 at 01:18:39PM +0000, Robin Murphy wrote:
Hi Liviu, Russell,
I'd been meaning to try digging into this if it hadn't gone away since I first noticed it, but I don't really have the time and it still happens with 4.9-rc and today's -next. Representative splat below, but in summary what happens is that if the HDLCD fails to probe, the TDA998x connector seems to get cleaned up twice, resulting in a NULL dereference the second time. I got as far as sketching out the following flow from a debug session (on the same 4.8-rc2 kernel), but I don't know nearly enough to tell which driver is at fault:
hdlcd_drm_bind
  -> drm_fbdev_cma_init (fails)
  ...
  -> drm_mode_config_cleanup
  ...
  -> drm_connector_cleanup
  -> component_unbind_all
  ...
  -> tda998x_unbind
  -> drm_connector_cleanup (NULL connector)
It's easily reproduced on Juno by booting arm64 defconfig with CONFIG_CMA_SIZE_MBYTES=1 and a sufficiently large monitor connected to warrant a >1MB framebuffer.
It looks to me like a hdlcd bug.
The probe path operates in this order:
- allocates hdlcd - 1
- allocates drm device - 2
- drm_mode_config_init - 3
- hdlcd_load - 4
- binds all components - 5
- enables runtime PM - 6
- drm_vblank_init - 7
- drm_mode_config_reset - 8
- drm_kms_helper_poll_init - 9
- drm_fbdev_cma_init - 10
- drm_dev_register - 11
However, the cleanup operates in this order:
- drm_fbdev_cma_fini - undoes 10
- drm_kms_helper_poll_fini - undoes 9
- drm_mode_config_cleanup - undoes 3
- drm_vblank_cleanup - undoes 7
- pm_runtime_disable - undoes 6
- component_unbind_all - undoes 5
- drm_irq_uninstall - undoes 4
- of_reserved_mem_device_release - undoes other half of 4
- drm_dev_unref - undoes 2
Spot the step which is out of the correct order - drm_mode_config_cleanup() is misplaced - it's reversing the actions of drm_mode_config_init(), not drm_mode_config_reset().
Thanks for the explanation - that saves at least a day's worth of me trying to understand DRM code :)
So, drm_mode_config_cleanup() should be called much later, after step 4 has been undone. Otherwise there are paths that leave various DRM objects (created by drm_mode_create_standard_properties()) referenced, which will cause problems exactly like the one you're seeing here.
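To illustrate the ordering rule, here is a minimal userspace sketch of strict reverse-order unwinding. It is not the hdlcd driver code: probe_sim(), undo_step(), undo_log and the step names are hypothetical, chosen only to mirror the numbered probe steps listed above, and every step is logged on unwind even where the real teardown would be a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NSTEPS 11

/* Hypothetical labels mirroring probe steps 1..11; not real driver calls. */
static const char *const step_name[NSTEPS + 1] = {
	NULL,
	"allocate hdlcd", "allocate drm device", "mode_config_init",
	"hdlcd_load", "bind components", "enable runtime PM",
	"vblank_init", "mode_config_reset", "poll_init",
	"fbdev_cma_init", "dev_register",
};

static int undo_log[NSTEPS];	/* step numbers, in the order they were undone */
static int undo_count;

static void undo_step(int step)
{
	assert(undo_count < NSTEPS);
	undo_log[undo_count++] = step;
	printf("undo %2d: %s\n", step, step_name[step]);
}

/*
 * Run steps 1..NSTEPS in order. If fail_at names a step, that step
 * "fails" and every step completed before it is undone in strict
 * reverse order. Returns 0 on success, -1 on failure.
 */
int probe_sim(int fail_at)
{
	int done = 0;	/* highest step that completed */
	int step;

	undo_count = 0;
	for (step = 1; step <= NSTEPS; step++) {
		if (step == fail_at)
			goto unwind;
		done = step;
	}
	return 0;

unwind:
	for (step = done; step >= 1; step--)
		undo_step(step);
	return -1;
}
```

Calling probe_sim(10) models the drm_fbdev_cma_init failure: steps 9 down to 1 are undone in that order, so step 3 (the mode config init) is only torn down after step 4 and the component bind in step 5 have already been undone.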
Liviu, can I leave this with you then?
Robin.