On Thu, Nov 24, 2016 at 01:18:39PM +0000, Robin Murphy wrote:
Hi Liviu, Russell,
I'd been meaning to try digging into this if it hadn't gone away since I first noticed it, but I don't really have the time and it still happens with 4.9-rc and today's -next. Representative splat below, but in summary what happens is that if the HDLCD fails to probe, the TDA998x connector seems to get cleaned up twice, resulting in a NULL dereference the second time. I got as far as sketching out the following flow from a debug session (on the same 4.8-rc2 kernel), but I don't know nearly enough to tell which driver is at fault:
hdlcd_drm_bind
  -> drm_fbdev_cma_init (fails)
  ...
  -> drm_mode_config_cleanup
    ...
    -> drm_connector_cleanup
  -> component_unbind_all
    ...
    -> tda998x_unbind
      -> drm_connector_cleanup (NULL connector)
It's easily reproduced on Juno by booting arm64 defconfig with CONFIG_CMA_SIZE_MBYTES=1 and a sufficiently large monitor connected to warrant a >1MB framebuffer.
It looks to me like a hdlcd bug.
The probe path operates in this order:
- allocates hdlcd - 1
- allocates drm device - 2
- drm_mode_config_init - 3
- hdlcd_load - 4
- binds all components - 5
- enables runtime PM - 6
- drm_vblank_init - 7
- drm_mode_config_reset - 8
- drm_kms_helper_poll_init - 9
- drm_fbdev_cma_init - 10
- drm_dev_register - 11
However, the cleanup operates in this order:
- drm_fbdev_cma_fini - undoes 10
- drm_kms_helper_poll_fini - undoes 9
- drm_mode_config_cleanup - undoes 3
- drm_vblank_cleanup - undoes 7
- pm_runtime_disable - undoes 6
- component_unbind_all - undoes 5
- drm_irq_uninstall - undoes 4
- of_reserved_mem_device_release - undoes other half of 4
- drm_dev_unref - undoes 2
Spot the step which is out of the correct order - drm_mode_config_cleanup() is misplaced - it's reversing the actions of drm_mode_config_init(), not drm_mode_config_reset().
So, drm_mode_config_cleanup() should be much later, after step 4 has been undone; otherwise there are paths that leave various DRM objects (created by drm_mode_create_standard_properties()) referenced, which will cause problems exactly like the one you're seeing here.
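To illustrate the ordering rule, here is a rough user-space sketch, not the hdlcd code and not a proposed patch. The names model_probe(), note(), trace and step_name are hypothetical, and the four steps are only stand-ins for steps 3, 4, 5 and 10 above. It records each step as it is done and, on failure, undoes the completed steps in strictly reverse order, so the undo of step 3 comes last:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for a few probe steps; not the real driver calls. */
enum { N_STEPS = 4 };

static const char *const step_name[N_STEPS] = {
	"mode_config_init",	/* models step 3 */
	"hdlcd_load",		/* models step 4 */
	"bind_all",		/* models step 5 */
	"fbdev_init",		/* models step 10 */
};

/* Log of what was done ("+name") and undone ("-name"), space separated. */
static char trace[256];

static void note(const char *prefix, const char *name)
{
	/* Guard the fixed-size log against overflow. */
	assert(strlen(trace) + strlen(prefix) + strlen(name) + 2 <= sizeof(trace));
	strcat(trace, prefix);
	strcat(trace, name);
	strcat(trace, " ");
}

/*
 * Run the steps in order. If step index fail_at fails, undo the steps
 * that completed, last one first. Returns 0 on success, -1 on failure.
 */
static int model_probe(int fail_at)
{
	int done;

	trace[0] = '\0';
	for (done = 0; done < N_STEPS; done++) {
		if (done == fail_at)
			goto unwind;
		note("+", step_name[done]);
	}
	return 0;

unwind:
	while (done-- > 0)
		note("-", step_name[done]);
	return -1;
}
```

Calling model_probe(3) models the fbdev init failure from the report: the log then shows bind_all undone first, hdlcd_load second, and mode_config_init last, which is the order the cleanup path should follow.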