New subject: Fw: [Intel-gfx] [PATCH 1/5] dri: cleanup debugfs error handling

11 Oct 2021


      Am 08.10.21 um 17:11 schrieb Greg KH:
...
On Fri, Oct 08, 2021 at 04:22:06PM +0200, Christian König wrote:
...
Hi guys,
thanks Nirmoy for forwarding this, there is seriously something wrong with
the AMD mail servers.
...
On 10/8/2021 1:07 PM, Greg KH wrote:
...
On Fri, Oct 08, 2021 at 12:40:47PM +0300, Jani Nikula wrote:
...
On Fri, 08 Oct 2021, Nirmoy Das nirmoy.das@amd.com wrote:
...
Debugfs API returns encoded error instead of NULL.
This patch cleanups drm debugfs error handling to
properly set dri and its minor's root dentry to NULL.
Also do not error out if dri/minor debugfs directory
creation fails as a debugfs error is not a fatal error.
Cc: Greg
I thought this is the opposite of what Greg's been telling everyone to
do with debugfs.
Yes, that is not good.
You should never care about the result of a debugfs_create* call.  Just
take the result, and if it is a directory, save it off to use it for
creating a file, no need to check anything.
While I totally agree to the intention behind that I'm pretty sure there are
some consequences which are a rather bad idea.
First of all the debugfs functions return a normal struct dentry pointer and
keeping that around unchecked means that we now have pointers in structure
members which can suddenly be an ERR_PTR() without any documentation that
they are not real pointers.
What we could do instead is something like returning a typedef or a
structure with the dentry pointer embedded and then document that those are
special, can be ERR_PTR() and should never be touched except for the debugfs
functions itself.
I agree, and am slowly working toward that, but am not there yet.  If
you look, I have removed the return values for almost all
debugfs_create_* calls, only a few remain.
For now, just treat it like a "void" pointer that you have no context
for and all will be fine.
Ok in that case we should just document on the saved dentry that this 
pointer is not necessary valid.
Nirmoy, can you take care of that?
...
...
The other issue is that adding the same file twice is unfortunately a rather
common coding error which we don't catch or complain about any more if we
don't look at the return value at all.
How can you add the same debugfs file twice?  You have different
directories.
We had multiple occasions triggering that:
1. Code accidentally initializing a component twice.
Except for the debugfs entry and a bit memory leak we had no negative 
effect otherwise and wouldn't had detected that without the error 
message from debugfs.
2. Driver not cleaning up properly on hotplug. E.g. old subdirectory not 
properly removed and re-plug.
3. Driver trying to use the same debugfs file for different devices.
...
And really, who cares, it's debugging code!  :)
Well especially cause 3 once took me a day to figure out that I'm 
looking at the wrong hardware state because the driver was handling two 
devices, but only one showed up under debugfs.
...
...
...
...
And then throw it away, later, when you want to remove the directory,
look it up with a call to debugfs_lookup() and pass that to
debugfs_remove() (which does so recursively).
There should never be a need to save, or check, the result of any
debugfs call.  If so, odds are it is being used incorrectly.
Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs
directory for the drivers to use.
That's fine for now, not a big deal.  And even if there is an error,
again, you can always feed that error back into the debugfs subsystem on
another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer 
but an ERR_PTR instead.
And adding IS_ERR checks all around is even worse than adding NULL checks.
Regards,
Christian.
...
thanks,
greg k-h

Re: Fw: [Intel-gfx] [PATCH 1/5] dri: cleanup debugfs error handling