Hi Ben,
I see in commit f553b79c03f0dbd52f6f03abe8233a2bef8cbd0d that you just changed the nouveau driver to use an internal i2c bit-banging implementation instead of i2c-algo-bit. Let me say here publicly that I disapprove this change. I believe this is a move in the wrong direction. Just because doing so fixed one bug you were seeing doesn't make it right.
If i2c-algo-bit has problems, please report them to the maintainer (i.e. me) and have them fixed. If you have problems with it, others may do as well. If everyone stops using common pieces of code as soon as they have a problem, we will end up with a lot of duplicated code in the kernel, which is bad in many respects.
In the commit message, you complain about the use of cond_resched() in i2c-algo-bit. You might as well be right, maybe it should be removed. It's been there pretty much forever (February 2002 at least) and while I can understand why it was put there, I would agree it isn't necessarily a good idea. Did you try removing it to see if it solved your problem?
There is also a call to yield() somewhere else in the i2c-algo-bit driver. I remember having a discussion about it long ago, someone proposed to change it to cond_resched(), but it never happened. I admit I don't know the difference between cond_resched() and yield() so I can't really make a decision until someone explains it to me.
I see that your reimplementation uses:
#define T_HOLD 5000
which basically means you're running the I2C bus at ~100 kHz, while the original code had:
port->bit.udelay = 40;
which basically ran the bus at ~12.5 kHz, i.e. only a tad faster than the low limit allowed by SMBus. I warned some times ago that such low speeds could be problematic:
http://lists.freedesktop.org/archives/dri-devel/2011-October/015512.html
I even posted a patch:
http://lists.freedesktop.org/archives/dri-devel/2011-October/015504.html
Despite positive comments from the i915 and radeon driver maintainers, it was never applied. Did you try lowering the udelay value with i2c- algo-bit to see if it would solve your problem?
David, any reason why you did not pick this patch? Should I resend it? In a split form maybe?
Ben, again, I would really have appreciated if you had contacted me while investigating your NVS 300 issue, instead of silently running away from i2c-algo-bit. If the code is broken, let's fix it for the benefit of all users. Duplicating code around isn't going to help any.
Thanks,
On Wed, 2012-01-11 at 11:17 +0100, Jean Delvare wrote:
Hi Ben,
Hi Jean,
I see in commit f553b79c03f0dbd52f6f03abe8233a2bef8cbd0d that you just changed the nouveau driver to use an internal i2c bit-banging implementation instead of i2c-algo-bit. Let me say here publicly that I disapprove this change. I believe this is a move in the wrong direction. Just because doing so fixed one bug you were seeing doesn't make it right.
If i2c-algo-bit has problems, please report them to the maintainer (i.e. me) and have them fixed. If you have problems with it, others may do as well. If everyone stops using common pieces of code as soon as they have a problem, we will end up with a lot of duplicated code in the kernel, which is bad in many respects.
In the commit message, you complain about the use of cond_resched() in i2c-algo-bit. You might as well be right, maybe it should be removed. It's been there pretty much forever (February 2002 at least) and while I can understand why it was put there, I would agree it isn't necessarily a good idea. Did you try removing it to see if it solved your problem?
I don't disagree, and I held off a long time before finally committing it to the nouveau repository due to not liking the duplication. My final decision to do it was admittedly due to having very very limited time and a *lot* of other things to work on. And this solution worked now.
The fact i2c-algo-bit can't be used from an atomic context was probably a convenient excuse I guess, but I did also figure there were good reasons for it and there'd likely be push-back at attempting to change that. Again, due to having loads of other things to do, I took the quick approach.
There is also a call to yield() somewhere else in the i2c-algo-bit driver. I remember having a discussion about it long ago, someone proposed to change it to cond_resched(), but it never happened. I admit I don't know the difference between cond_resched() and yield() so I can't really make a decision until someone explains it to me.
I see that your reimplementation uses:
#define T_HOLD 5000
which basically means you're running the I2C bus at ~100 kHz, while the original code had:
port->bit.udelay = 40;
which basically ran the bus at ~12.5 kHz, i.e. only a tad faster than the low limit allowed by SMBus. I warned some times ago that such low speeds could be problematic:
http://lists.freedesktop.org/archives/dri-devel/2011-October/015512.html
I even posted a patch:
http://lists.freedesktop.org/archives/dri-devel/2011-October/015504.html
Despite positive comments from the i915 and radeon driver maintainers, it was never applied. Did you try lowering the udelay value with i2c- algo-bit to see if it would solve your problem?
This was one of the first things I tried actually, and while it changed the behaviour somewhat (went from "nearly always failing" to "mostly always failing" to fetch edid correctly) it didn't fix it.
David, any reason why you did not pick this patch? Should I resend it? In a split form maybe?
Ben, again, I would really have appreciated if you had contacted me while investigating your NVS 300 issue, instead of silently running away from i2c-algo-bit. If the code is broken, let's fix it for the benefit of all users. Duplicating code around isn't going to help any.
As mentioned above, I don't disagree. And I apologise for not thinking to contact you about this.
I'd be more than happy to have these issues (nvs300 and running from an atomic context) solved in i2c-algo-bit, and will gladly revert the nouveau changes once it is. Let me know how I can help you with that.
Thanks, Ben.
Thanks,
Hi Ben,
I'm sorry for the very very late reply. Please do not jump to the conclusion that I do not care - I do! Just I am very busy, just as you.
On Wednesday 11 January 2012 01:40:58 pm Ben Skeggs wrote:
On Wed, 2012-01-11 at 11:17 +0100, Jean Delvare wrote:
In the commit message, you complain about the use of cond_resched() in i2c-algo-bit. You might as well be right, maybe it should be removed. It's been there pretty much forever (February 2002 at least) and while I can understand why it was put there, I would agree it isn't necessarily a good idea. Did you try removing it to see if it solved your problem?
I don't disagree, and I held off a long time before finally committing it to the nouveau repository due to not liking the duplication. My final decision to do it was admittedly due to having very very limited time and a *lot* of other things to work on. And this solution worked now.
The fact i2c-algo-bit can't be used from an atomic context was probably a convenient excuse I guess, but I did also figure there were good reasons for it and there'd likely be push-back at attempting to change that. Again, due to having loads of other things to do, I took the quick approach.
I don't think there actually is any good reason, and not only I wouldn't push back if someone wants to get rid of the call to cond_resched(), but I would even push for it to happen ;)
Now the question is, what do we replace it with? Nothing? cpu_relax()? yield()? udelay(<small value>)?
I guess yield() would have the same property of making the code not callable from atomic context. Not having anything might be harsh from a CPU and I/O load perspective, as it would be a tight loop lasting for several milliseconds. So this leaves us with cpu_relax() and udelay(). Any preference?
(...) I'd be more than happy to have these issues (nvs300 and running from an atomic context) solved in i2c-algo-bit, and will gladly revert the nouveau changes once it is. Let me know how I can help you with that.
Incidentally I just sent a fix upstream for a bug reported by Ville Syrjälä in i2c-algo-bit, that would cause spurious timeouts on heavy CPU load. I am curious if it would have helped in your case. Please take a look:
http://marc.info/?l=linux-kernel&m=133182203301053&w=2
And if you think it could help, I would appreciate if you could give it a try with the nouveau driver and see if it actually does.
Thanks,
dri-devel@lists.freedesktop.org