Am 08.10.21 um 17:11 schrieb Greg KH:
On Fri, Oct 08, 2021 at 04:22:06PM +0200, Christian König wrote:
Hi guys,
thanks Nirmoy for forwarding this, there is seriously something wrong with the AMD mail servers.
On 10/8/2021 1:07 PM, Greg KH wrote:
On Fri, Oct 08, 2021 at 12:40:47PM +0300, Jani Nikula wrote:
On Fri, 08 Oct 2021, Nirmoy Das nirmoy.das@amd.com wrote:
Debugfs API returns encoded error instead of NULL. This patch cleanups drm debugfs error handling to properly set dri and its minor's root dentry to NULL.
Also do not error out if dri/minor debugfs directory creation fails as a debugfs error is not a fatal error.
Cc: Greg
I thought this is the opposite of what Greg's been telling everyone to do with debugfs.
Yes, that is not good.
You should never care about the result of a debugfs_create* call. Just take the result, and if it is a directory, save it off to use it for creating a file, no need to check anything.
While I totally agree to the intention behind that I'm pretty sure there are some consequences which are a rather bad idea.
First of all the debugfs functions return a normal struct dentry pointer and keeping that around unchecked means that we now have pointers in structure members which can suddenly be an ERR_PTR() without any documentation that they are not real pointers.
What we could do instead is something like returning a typedef or a structure with the dentry pointer embedded and then document that those are special, can be ERR_PTR() and should never be touched except for the debugfs functions itself.
I agree, and am slowly working toward that, but am not there yet. If you look, I have removed the return values for almost all debugfs_create_* calls, only a few remain.
For now, just treat it like a "void" pointer that you have no context for and all will be fine.
Ok in that case we should just document on the saved dentry that this pointer is not necessary valid.
Nirmoy, can you take care of that?
The other issue is that adding the same file twice is unfortunately a rather common coding error which we don't catch or complain about any more if we don't look at the return value at all.
How can you add the same debugfs file twice? You have different directories.
We had multiple occasions triggering that:
1. Code accidentally initializing a component twice.
Except for the debugfs entry and a bit memory leak we had no negative effect otherwise and wouldn't had detected that without the error message from debugfs.
2. Driver not cleaning up properly on hotplug. E.g. old subdirectory not properly removed and re-plug.
3. Driver trying to use the same debugfs file for different devices.
And really, who cares, it's debugging code! :)
Well especially cause 3 once took me a day to figure out that I'm looking at the wrong hardware state because the driver was handling two devices, but only one showed up under debugfs.
And then throw it away, later, when you want to remove the directory, look it up with a call to debugfs_lookup() and pass that to debugfs_remove() (which does so recursively).
There should never be a need to save, or check, the result of any debugfs call. If so, odds are it is being used incorrectly.
Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs directory for the drivers to use.
That's fine for now, not a big deal. And even if there is an error, again, you can always feed that error back into the debugfs subsystem on another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer but an ERR_PTR instead.
And adding IS_ERR checks all around is even worse than adding NULL checks.
Regards, Christian.
thanks,
greg k-h
On Mon, Oct 11, 2021 at 04:19:58PM +0200, Christian König wrote:
And then throw it away, later, when you want to remove the directory, look it up with a call to debugfs_lookup() and pass that to debugfs_remove() (which does so recursively).
There should never be a need to save, or check, the result of any debugfs call. If so, odds are it is being used incorrectly.
Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs directory for the drivers to use.
That's fine for now, not a big deal. And even if there is an error, again, you can always feed that error back into the debugfs subsystem on another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer but an ERR_PTR instead.
Again, that is fine, you can feed that into debugfs and it will "just work". Treat it as an opaque pointer, not a *dentry and you will be fine.
thanks,
greg k-h
On Mon, 11 Oct 2021, Greg KH gregkh@linuxfoundation.org wrote:
On Mon, Oct 11, 2021 at 04:19:58PM +0200, Christian König wrote:
And then throw it away, later, when you want to remove the directory, look it up with a call to debugfs_lookup() and pass that to debugfs_remove() (which does so recursively).
There should never be a need to save, or check, the result of any debugfs call. If so, odds are it is being used incorrectly.
Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs directory for the drivers to use.
That's fine for now, not a big deal. And even if there is an error, again, you can always feed that error back into the debugfs subsystem on another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer but an ERR_PTR instead.
Again, that is fine, you can feed that into debugfs and it will "just work". Treat it as an opaque pointer, not a *dentry and you will be fine.
Hmm, some of the patches add things like:
+ + if (!root) + goto error; + minor->debugfs_root = debugfs_create_dir(name, root);
Superficially this seems okay, as it looks like debugfs_create_dir() doesn't actually cope with NULL values. However, since ->debugfs_root comes from debugfs_create_dir() I presume it'll never be NULL on errors anyway but rather an error pointer!
So I think we probably need to go through the drm subsystem and look for existing similar patterns in fix them.
BR, Jani.
thanks,
greg k-h
On Mon, Oct 11, 2021 at 07:38:22PM +0300, Jani Nikula wrote:
On Mon, 11 Oct 2021, Greg KH gregkh@linuxfoundation.org wrote:
On Mon, Oct 11, 2021 at 04:19:58PM +0200, Christian König wrote:
> And then throw it away, later, when you want to remove the directory, > look it up with a call to debugfs_lookup() and pass that to > debugfs_remove() (which does so recursively). > > There should never be a need to save, or check, the result of any > debugfs call. If so, odds are it is being used incorrectly.
Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs directory for the drivers to use.
That's fine for now, not a big deal. And even if there is an error, again, you can always feed that error back into the debugfs subsystem on another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer but an ERR_PTR instead.
Again, that is fine, you can feed that into debugfs and it will "just work". Treat it as an opaque pointer, not a *dentry and you will be fine.
Hmm, some of the patches add things like:
- if (!root)
goto error;
- minor->debugfs_root = debugfs_create_dir(name, root);
Superficially this seems okay, as it looks like debugfs_create_dir() doesn't actually cope with NULL values.
Yes it does, it puts things at the root of debugfs.
But why are you checking for NULL here, as the return value of a debugfs call can never be NULL?
However, since ->debugfs_root comes from debugfs_create_dir() I presume it'll never be NULL on errors anyway but rather an error pointer!
That is correct.
So I think we probably need to go through the drm subsystem and look for existing similar patterns in fix them.
Please do. I know I made one pass at it a while ago but I think someone else went through and cleaned them up again.
thanks,
greg k-h
On Mon, 11 Oct 2021, Greg KH gregkh@linuxfoundation.org wrote:
On Mon, Oct 11, 2021 at 07:38:22PM +0300, Jani Nikula wrote:
On Mon, 11 Oct 2021, Greg KH gregkh@linuxfoundation.org wrote:
On Mon, Oct 11, 2021 at 04:19:58PM +0200, Christian König wrote:
> > And then throw it away, later, when you want to remove the directory, > > look it up with a call to debugfs_lookup() and pass that to > > debugfs_remove() (which does so recursively). > > > > There should never be a need to save, or check, the result of any > > debugfs call. If so, odds are it is being used incorrectly. Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs directory for the drivers to use.
That's fine for now, not a big deal. And even if there is an error, again, you can always feed that error back into the debugfs subsystem on another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer but an ERR_PTR instead.
Again, that is fine, you can feed that into debugfs and it will "just work". Treat it as an opaque pointer, not a *dentry and you will be fine.
Hmm, some of the patches add things like:
- if (!root)
goto error;
- minor->debugfs_root = debugfs_create_dir(name, root);
Superficially this seems okay, as it looks like debugfs_create_dir() doesn't actually cope with NULL values.
Yes it does, it puts things at the root of debugfs.
Oh, thanks for the correction.
But why are you checking for NULL here, as the return value of a debugfs call can never be NULL?
Just musing on what is going on in the patch, and why such changes initially seem like good ideas and get through review.
Thanks, Jani.
However, since ->debugfs_root comes from debugfs_create_dir() I presume it'll never be NULL on errors anyway but rather an error pointer!
That is correct.
So I think we probably need to go through the drm subsystem and look for existing similar patterns in fix them.
Please do. I know I made one pass at it a while ago but I think someone else went through and cleaned them up again.
thanks,
greg k-h
On 10/11/2021 4:19 PM, Christian König wrote:
Am 08.10.21 um 17:11 schrieb Greg KH:
On Fri, Oct 08, 2021 at 04:22:06PM +0200, Christian König wrote:
Hi guys,
thanks Nirmoy for forwarding this, there is seriously something wrong with the AMD mail servers.
On 10/8/2021 1:07 PM, Greg KH wrote:
On Fri, Oct 08, 2021 at 12:40:47PM +0300, Jani Nikula wrote:
On Fri, 08 Oct 2021, Nirmoy Das nirmoy.das@amd.com wrote: > Debugfs API returns encoded error instead of NULL. > This patch cleanups drm debugfs error handling to > properly set dri and its minor's root dentry to NULL. > > Also do not error out if dri/minor debugfs directory > creation fails as a debugfs error is not a fatal error. Cc: Greg
I thought this is the opposite of what Greg's been telling everyone to do with debugfs.
Yes, that is not good.
You should never care about the result of a debugfs_create* call. Just take the result, and if it is a directory, save it off to use it for creating a file, no need to check anything.
While I totally agree to the intention behind that I'm pretty sure there are some consequences which are a rather bad idea.
First of all the debugfs functions return a normal struct dentry pointer and keeping that around unchecked means that we now have pointers in structure members which can suddenly be an ERR_PTR() without any documentation that they are not real pointers.
What we could do instead is something like returning a typedef or a structure with the dentry pointer embedded and then document that those are special, can be ERR_PTR() and should never be touched except for the debugfs functions itself.
I agree, and am slowly working toward that, but am not there yet. If you look, I have removed the return values for almost all debugfs_create_* calls, only a few remain.
For now, just treat it like a "void" pointer that you have no context for and all will be fine.
Ok in that case we should just document on the saved dentry that this pointer is not necessary valid.
Nirmoy, can you take care of that?
Sure, I will send patches to document and clean our drm core debugfs code.
Thanks,
Nirmoy
The other issue is that adding the same file twice is unfortunately a rather common coding error which we don't catch or complain about any more if we don't look at the return value at all.
How can you add the same debugfs file twice? You have different directories.
We had multiple occasions triggering that:
- Code accidentally initializing a component twice.
Except for the debugfs entry and a bit memory leak we had no negative effect otherwise and wouldn't had detected that without the error message from debugfs.
- Driver not cleaning up properly on hotplug. E.g. old subdirectory
not properly removed and re-plug.
- Driver trying to use the same debugfs file for different devices.
And really, who cares, it's debugging code! :)
Well especially cause 3 once took me a day to figure out that I'm looking at the wrong hardware state because the driver was handling two devices, but only one showed up under debugfs.
And then throw it away, later, when you want to remove the directory, look it up with a call to debugfs_lookup() and pass that to debugfs_remove() (which does so recursively).
There should never be a need to save, or check, the result of any debugfs call. If so, odds are it is being used incorrectly.
Yeah, exactly that's the problem I see here.
We save the return value because the DRM subsystem is creating a debugfs directory for the drivers to use.
That's fine for now, not a big deal. And even if there is an error, again, you can always feed that error back into the debugfs subsystem on another call and it will handle it correctly.
Problem is it isn't, we have a crash because the member isn't a pointer but an ERR_PTR instead.
And adding IS_ERR checks all around is even worse than adding NULL checks.
Regards, Christian.
thanks,
greg k-h
dri-devel@lists.freedesktop.org