An awful lot of drivers, mostly DRI drivers, are still mucking with MTRRs directly as opposed to using ioremap_wc() or similar interfaces. In addition to the architecture dependency, this is really undesirable because MTRRs are a limited resource, whereas page table attributes are not.
Furthermore, this perpetuates the need for the horrific hack known as "MTRR cleanup".
What, if anything, can we do to clean up this mess?
-hpa
On 21/06/2013 07:00, H. Peter Anvin wrote:
> An awful lot of drivers, mostly DRI drivers, are still mucking with MTRRs directly as opposed to using ioremap_wc() or similar interfaces. In addition to the architecture dependency, this is really undesirable because MTRRs are a limited resource, whereas page table attributes are not.
> Furthermore, this perpetuates the need for the horrific hack known as "MTRR cleanup".
> What, if anything, can we do to clean up this mess?
> -hpa
The first network driver that used ioremap_wc() back in 2008 (myri10ge) had to keep using MTRR because ioremap_wc() silently falls back to ioremap_nocache() when PAT is disabled.
I asked about this in https://lkml.org/lkml/2008/5/31/42 and there was some talk about putting the MTRR addition in the nocache fallback path but I guess nobody implemented the idea.
Brice
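Why a driver-installed MTRR still helps in the fallback case can be modelled in a few lines. This is a minimal user-space sketch in C, not kernel code. It assumes the uncached fallback mapping uses the UC- page attribute, which (unlike strong UC) a WC MTRR covering the same range can override. The enum and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical memory-type labels for this sketch; not kernel definitions. */
enum mem_type { MT_UC, MT_UC_MINUS, MT_WC };

/* Page attribute a write-combining request ends up with: WC only when PAT
 * is usable, otherwise the uncached (UC-) fallback (assumed here). */
enum mem_type wc_request_page_type(bool pat_enabled)
{
    return pat_enabled ? MT_WC : MT_UC_MINUS;
}

/* Effective type of a device range: the page attribute combined with the
 * type of the MTRR covering it (only UC and WC MTRRs are modelled). */
enum mem_type effective_type(enum mem_type page, enum mem_type mtrr)
{
    if (page == MT_WC)
        return MT_WC;                 /* PAT WC wins regardless of the MTRR */
    if (page == MT_UC_MINUS && mtrr == MT_WC)
        return MT_WC;                 /* UC- yields to a WC MTRR */
    return MT_UC;                     /* everything else stays uncached */
}
```

With PAT disabled and no WC MTRR, `effective_type(wc_request_page_type(false), MT_UC)` is uncached, which is the silent slowdown described above; adding the MTRR brings the range back to WC.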
Why do you care about performance when PAT is disabled?
Brice Goglin brice.goglin@gmail.com wrote:
> On 21/06/2013 07:00, H. Peter Anvin wrote:
>> An awful lot of drivers, mostly DRI drivers, are still mucking with MTRRs directly as opposed to using ioremap_wc() or similar interfaces. In addition to the architecture dependency, this is really undesirable because MTRRs are a limited resource, whereas page table attributes are not.
>> Furthermore, this perpetuates the need for the horrific hack known as "MTRR cleanup".
>> What, if anything, can we do to clean up this mess?
>> -hpa
> The first network driver that used ioremap_wc() back in 2008 (myri10ge) had to keep using MTRR because ioremap_wc() silently falls back to ioremap_nocache() when PAT is disabled.
> I asked about this in https://lkml.org/lkml/2008/5/31/42 and there was some talk about putting the MTRR addition in the nocache fallback path but I guess nobody implemented the idea.
> Brice
On Sun, 23 Jun 2013, H. Peter Anvin wrote:
> Why do you care about performance when PAT is disabled?
It will regress already slow boxes. We blacklist a LOT of P4s, PMs, etc and nobody ever took the pain to track down which ones of those actually have PAT+MTRR aliasing bugs.
These boxes have boards like the Radeon X300, which needs either PAT or MTRR to not become unusable...
On 06/23/2013 12:29 PM, Henrique de Moraes Holschuh wrote:
> On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> Why do you care about performance when PAT is disabled?
> It will regress already slow boxes. We blacklist a LOT of P4s, PMs, etc and nobody ever took the pain to track down which ones of those actually have PAT+MTRR aliasing bugs.
> These boxes have boards like the Radeon X300, which needs either PAT or MTRR to not become unusable...
We're talking hardware which is now many years old, but this is causing very serious problems on real, modern hardware. As far as I understand it, too, the blacklisting was precautionary (the only bug that I personally know about is a performance bug, where WC would be incorrectly converted to UC.)
We need a way forward here. If it is the only way I think we would have to sacrifice the old machines, but perhaps something can be worked out (e.g. if PAT is disabled, fall back to MTRRs if available for ioremap_wc()).
-hpa
>>> Why do you care about performance when PAT is disabled?
breaking old boxes just because, is just going to get reverted when I get the first regression report that you broke old boxes.
Andy Lutomirski just submitted a bunch of patches to clean up the DRM usage of mtrrs, they are in drm-next, afaik we no longer add them on PAT systems.
Dave.
On 06/23/2013 01:30 PM, Dave Airlie wrote:
>>>> Why do you care about performance when PAT is disabled?
> breaking old boxes just because, is just going to get reverted when I get the first regression report that you broke old boxes.
Not "just because", but *if* the choice is between breaking old boxes and breaking new boxes I'll take the latter.
> Andy Lutomirski just submitted a bunch of patches to clean up the DRM usage of mtrrs, they are in drm-next, afaik we no longer add them on PAT systems.
Fantastic news. No issue, then, and no need to break anything.
The only problem I see with having ioremap_wc() installing an MTRR on non-PAT, rather than pushing that into the drivers which is clearly not the right thing, is that we will need a hook to uninstall it when the mapping is destroyed.
-hpa
>> breaking old boxes just because, is just going to get reverted when I get the first regression report that you broke old boxes.
> Not "just because", but *if* the choice is between breaking old boxes and breaking new boxes I'll take the latter.
But Linus won't so your choice doesn't matter.
>> Andy Lutomirski just submitted a bunch of patches to clean up the DRM usage of mtrrs, they are in drm-next, afaik we no longer add them on PAT systems.
> Fantastic news. No issue, then, and no need to break anything.
Granted I haven't tested Andy's patches on my AGP boxes, and I intend to, if they do cause any regressions he'll be working them out :-)
Dave.
On 06/23/2013 01:54 PM, Dave Airlie wrote:
>>> breaking old boxes just because, is just going to get reverted when I get the first regression report that you broke old boxes.
>> Not "just because", but *if* the choice is between breaking old boxes and breaking new boxes I'll take the latter.
> But Linus won't so your choice doesn't matter.
I hate to break it to you, but we regress on ancient hardware all the time. Optimization work gets done on modern machines, so the sweet spot keeps moving. In particular, if supporting ancient hardware means leaving a lot of performance on modern hardware on the table, we may have to take that penalty.
Fortunately, most of the time we don't have to.
-hpa
On Mon, Jun 24, 2013 at 6:58 AM, H. Peter Anvin hpa@zytor.com wrote:
> On 06/23/2013 01:54 PM, Dave Airlie wrote:
>>>> breaking old boxes just because, is just going to get reverted when I get the first regression report that you broke old boxes.
>>> Not "just because", but *if* the choice is between breaking old boxes and breaking new boxes I'll take the latter.
>> But Linus won't so your choice doesn't matter.
> I hate to break it to you, but we regress on ancient hardware all the time. Optimization work gets done on modern machines, so the sweet spot keeps moving. In particular, if supporting ancient hardware means leaving a lot of performance on modern hardware on the table, we may have to take that penalty.
> Fortunately, most of the time we don't have to.
There is a big difference between an optimization sweet spot and deliberately breaking older systems. This is firmly in the second category: lots of Intel hardware stops being usable when neither MTRR nor PAT is working, so much so that we had to add a warning to the driver when we detect such a thing.
Dave.
On Sun, Jun 23, 2013 at 1:38 PM, H. Peter Anvin hpa@zytor.com wrote:
> On 06/23/2013 01:30 PM, Dave Airlie wrote:
>>>>> Why do you care about performance when PAT is disabled?
>> breaking old boxes just because, is just going to get reverted when I get the first regression report that you broke old boxes.
> Not "just because", but *if* the choice is between breaking old boxes and breaking new boxes I'll take the latter.
>> Andy Lutomirski just submitted a bunch of patches to clean up the DRM usage of mtrrs, they are in drm-next, afaik we no longer add them on PAT systems.
> Fantastic news. No issue, then, and no need to break anything.
> The only problem I see with having ioremap_wc() installing an MTRR on non-PAT, rather than pushing that into the drivers which is clearly not the right thing, is that we will need a hook to uninstall it when the mapping is destroyed.
I have trouble believing that this will ever work well -- MTRRs have crazy alignment requirements and interactions with other MTRRs, and a few drivers have to jump through hoops to set up the right MTRRs. There aren't really enough to break down every mapping.
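To make the alignment problem concrete: a variable-range MTRR describes a block whose size is a power of two and whose base is aligned to that size, so an arbitrary range may have to be split across several registers. The following is a rough sketch in C that counts the blocks a simple greedy split would need. Both helpers are hypothetical illustrations, not kernel functions, and the sketch ignores the 4 KiB minimum granularity and any tricks involving overlapping ranges.

```c
#include <stdint.h>

/* Largest power-of-two block that starts at base, keeps base aligned to
 * the block size, and does not exceed size bytes. */
uint64_t largest_aligned_block(uint64_t base, uint64_t size)
{
    uint64_t block = 1;

    /* Double the block while the doubled size still fits and base stays
     * aligned to it; comparing against size / 2 avoids overflow. */
    while (block <= size / 2 && base % (block * 2) == 0)
        block *= 2;
    return block;
}

/* Number of naturally aligned power-of-two blocks a greedy split of
 * [base, base + size) produces, i.e. MTRRs consumed in this model. */
unsigned mtrr_blocks_needed(uint64_t base, uint64_t size)
{
    unsigned count = 0;

    while (size > 0) {
        uint64_t block = largest_aligned_block(base, size);

        base += block;
        size -= block;
        count++;
    }
    return count;
}
```

In this model a 256 MiB aperture at 0xE0000000 fits in one register, while the same aperture at 0xE8000000 needs two, which hints at how quickly a small pool of registers is exhausted.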
My patches (in drm-next) add functions arch_phys_wc_add and arch_phys_wc_del that do nothing except on x86 with MTRRs on and PAT off, in which case arch_phys_wc_add tries to add a WC MTRR and arch_phys_wc_del removes it again. That way the handful of drivers that need WC for performance on old hardware can try (and possibly fail, depending on the usual vagaries of MTRRs). With my patches applied, DRM and agpgart no longer touch MTRRs at all with PAT on.
I didn't get around to excising MTRRs from the non-DRM video drivers or from the few odd cases like myri10ge.
This stuff is painful to test. The only drivers I can really test are i915 and radeon. I have a myri10ge device, but it's on a production server. I also have several mgag200 devices, but they're in a super-secret-locked-down datacenter a few thousand miles away, and trying to gauge framebuffer performance over Dell and/or HP's crappy remoting interface is a lost cause. I'm not sure that my oldest computer (locked in a basement in another state) is old enough to have an AGP port.
--Andy
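The behaviour of the add/del pair described in this message can be sketched as a small user-space model in C. Everything here is hypothetical: `struct wc_env`, `model_wc_add`, `model_wc_del`, the eight-slot pool and the handle convention are assumptions made for illustration, not the kernel's implementation.

```c
#include <stdbool.h>

#define MODEL_MTRR_SLOTS 8   /* assumed size of the MTRR pool in this model */

/* Hypothetical stand-in for the CPU state the helpers would consult. */
struct wc_env {
    bool mtrr_enabled;
    bool pat_enabled;
    bool slot_used[MODEL_MTRR_SLOTS];
};

/* Returns 0 when there is nothing to do (PAT on, or no MTRRs at all),
 * a positive handle when a slot was claimed, or -1 when the pool is
 * exhausted and the caller simply runs without write-combining. */
int model_wc_add(struct wc_env *env)
{
    if (env->pat_enabled || !env->mtrr_enabled)
        return 0;

    for (int i = 0; i < MODEL_MTRR_SLOTS; i++) {
        if (!env->slot_used[i]) {
            env->slot_used[i] = true;
            return i + 1;
        }
    }
    return -1;
}

/* Releases a handle from model_wc_add(); zero and negative handles are
 * ignored, so callers can pass back whatever the add call returned. */
void model_wc_del(struct wc_env *env, int handle)
{
    if (handle >= 1 && handle <= MODEL_MTRR_SLOTS)
        env->slot_used[handle - 1] = false;
}
```

A driver following this pattern would store the handle next to its mapping and hand it back on teardown, which also covers the uninstall-hook concern raised earlier in the thread.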
On Sun, 23 Jun 2013, H. Peter Anvin wrote:
> On 06/23/2013 12:29 PM, Henrique de Moraes Holschuh wrote:
>> On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>>> Why do you care about performance when PAT is disabled?
>> It will regress already slow boxes. We blacklist a LOT of P4s, PMs, etc and nobody ever took the pain to track down which ones of those actually have PAT+MTRR aliasing bugs.
>> These boxes have boards like the Radeon X300, which needs either PAT or MTRR to not become unusable...
> We're talking hardware which is now many years old, but this is causing very serious problems on real, modern hardware. As far as I understand it, too, the blacklisting was precautionary (the only bug that I personally know about is a performance bug, where WC would be incorrectly converted to UC.)
And as far as I could find from Intel's not-that-complete public "specification updates", we are applying the errata workaround to a few more processors than strictly required, but since I have no idea how to write a test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report back.
> We need a way forward here. If it is the only way I think we would have to sacrifice the old machines, but perhaps something can be worked out (e.g. if PAT is disabled, fall back to MTRRs if available for ioremap_wc()).
I'd be quite happy with a MTRR fallback.
On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
> And as far as I could find from Intel's not-that-complete public "specification updates", we are applying the errata workaround to a few more processors than strictly required, but since I have no idea how to write a test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report back.
Which specific erratum are you referring to, here? The "WC becomes UC" erratum? I don't think there is a sane testcase for it since it needs a very complicated setup to trigger.
-hpa
On Sun, 23 Jun 2013, H. Peter Anvin wrote:
> On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
>> And as far as I could find from Intel's not-that-complete public "specification updates", we are applying the errata workaround to a few more processors than strictly required, but since I have no idea how to write a test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report back.
> Which specific erratum are you referring to, here? The "WC becomes UC" erratum? I don't think there is a sane testcase for it since it needs a very complicated setup to trigger.
There are at least two different nasty PAT issues that are not always critical, and one that outright hangs the processor (if the unsupported aliasing of WB with UC/WC happens).
Interestingly enough, most of the P4-Xeons and P4 do not appear to have the "WC becomes UC" errata.
However, LOTS of P4, M-P4, Xeon PIII, Xeon, and Pentium M have a bug where the four highest entries in the PAT table are inactive (aliased to the four lowest entries) in mode B (PSE) and mode C (PAE) for 4k pages. They work fine for large pages.
Also, lots of them can hang if you ever alias WB with UC or WC (which is apparently an unsupported configuration anyway, or so it says in the errata).
There are other weird aliasing nasties, such as one where you get memory corruption if you alias WB data with code (being accessed as UC or WC) in the same cacheline, and some stuff such as weirdness should the page table be on WC memory...
I can track down most of the CPUIDs involved if you want, but someone from Intel would be better (I assume they actually have access to the errata documentation in some less idiotic way than reading a ton of badly indexed PDFs that take forever to find in their site).
The aliasing doesn't matter for Linux because we map the high and low half the same.
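A small model shows why the "upper four entries inactive" erratum is harmless under such a layout. The sketch assumes the PAT entry index is formed from the PAT, PCD and PWT page-table bits, and that the erratum amounts to the PAT bit being ignored for 4k pages. The specific WB, WC, UC-, UC ordering of the table is an assumption of this sketch, and the function names are hypothetical.

```c
#include <stdbool.h>

enum pat_type { PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC };

/* PAT table whose high half repeats the low half (ordering assumed). */
static const enum pat_type pat_table[8] = {
    PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC,
    PAT_WB, PAT_WC, PAT_UC_MINUS, PAT_UC,
};

/* Index selected by the three page-table bits. When the erratum applies,
 * the PAT bit is ignored, so entries 4..7 alias to entries 0..3. */
unsigned pat_index(bool pat, bool pcd, bool pwt, bool erratum)
{
    unsigned idx = (pcd ? 2u : 0u) | (pwt ? 1u : 0u);

    if (pat && !erratum)
        idx |= 4u;
    return idx;
}

/* Memory type a page ends up with for the given bits. */
enum pat_type pat_lookup(bool pat, bool pcd, bool pwt, bool erratum)
{
    return pat_table[pat_index(pat, pcd, pwt, erratum)];
}
```

Because both halves of the table hold the same types, `pat_lookup()` returns the same result with and without the erratum for every bit combination, even though the selected index differs.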
Henrique de Moraes Holschuh hmh@hmh.eng.br wrote:
> On Sun, 23 Jun 2013, H. Peter Anvin wrote:
>> On 06/23/2013 02:56 PM, Henrique de Moraes Holschuh wrote:
>>> And as far as I could find from Intel's not-that-complete public "specification updates", we are applying the errata workaround to a few more processors than strictly required, but since I have no idea how to write a test case, I can't whitelist the 3rd-gen Pentium M on my T43, nor can I get ThinkPad owners to test it for us on 1st and 2nd-gen Pentium M and report back.
>> Which specific erratum are you referring to, here? The "WC becomes UC" erratum? I don't think there is a sane testcase for it since it needs a very complicated setup to trigger.
> There are at least two different nasty PAT issues that are not always critical, and one that outright hangs the processor (if the unsupported aliasing of WB with UC/WC happens).
> Interestingly enough, most of the P4-Xeons and P4 do not appear to have the "WC becomes UC" errata.
> However, LOTS of P4, M-P4, Xeon PIII, Xeon, and Pentium M have a bug where the four highest entries in the PAT table are inactive (aliased to the four lowest entries) in mode B (PSE) and mode C (PAE) for 4k pages. They work fine for large pages.
> Also, lots of them can hang if you ever alias WB with UC or WC (which is apparently an unsupported configuration anyway, or so it says in the errata).
> There are other weird aliasing nasties, such as one where you get memory corruption if you alias WB data with code (being accessed as UC or WC) in the same cacheline, and some stuff such as weirdness should the page table be on WC memory...
> I can track down most of the CPUIDs involved if you want, but someone from Intel would be better (I assume they actually have access to the errata documentation in some less idiotic way than reading a ton of badly indexed PDFs that take forever to find in their site).