On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still, read on...
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
 drivers/gpu/drm/udl/udl_transfer.c | 7 -------
 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200
+++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200
@@ -13,7 +13,6 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 #include <linux/fb.h>
-#include <linux/prefetch.h>
 #include <asm/unaligned.h>

 #include <drm/drmP.h>
@@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac
 	int start = width;
 	int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK the prefetcher fetches new data according to a known history, i.e. based on the previously seen access pattern it tries to get the next batch of data.
But the code above is at the very beginning of the data processing routine, where the prefetcher doesn't yet have any history to tell it what to prefetch and from where.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt, because typically each one is just 1 instruction if the CPU has such an instruction, or nothing if the CPU/compiler doesn't support it.
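To make the pattern under discussion concrete, here is a minimal userspace sketch, not the driver code. `PREFETCH_HINT` and `first_difference` are hypothetical names made up for this illustration. The macro assumes GCC or Clang, where `__builtin_prefetch` is available, and expands to nothing useful elsewhere, which mirrors the "1 instruction or nothing" point above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a prefetch hint: a builtin on GCC/Clang,
 * a no-op on compilers that do not provide it. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_HINT(p) __builtin_prefetch(p)
#else
#define PREFETCH_HINT(p) ((void)(p))
#endif

/*
 * Return the index of the first word where back[] and front[] differ,
 * or width if the two buffers are identical.
 */
static size_t first_difference(const unsigned long *back,
                               const unsigned long *front,
                               size_t width)
{
    size_t j;

    assert(back != NULL && front != NULL);

    /* Hints issued before the loop, i.e. before any access history exists. */
    PREFETCH_HINT(front);
    PREFETCH_HINT(back);

    /* Linear scan: the kind of access a hardware prefetcher can pick up. */
    for (j = 0; j < width; j++) {
        if (back[j] != front[j])
            break;
    }
    return j;
}
```

Whether the two hints help, hurt, or make no difference depends on the CPU, so the sketch only shows where they sit relative to the loop.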
See this post, https://lwn.net/Articles/444336/, where they measured that prefetch hurts performance. Prefetch shouldn't be used unless you have proof that it improves performance.
The problem is that the prefetch instruction causes a pipeline stall when it encounters a TLB miss, whereas the automatic prefetcher doesn't.
Mikulas
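As a rough idea of what "proof" could look like, below is a hedged userspace sketch of a timing harness that runs the same linear compare with and without up-front hints. All names (`count_diff_plain`, `count_diff_hinted`, `time_variant`) are hypothetical, and `__builtin_prefetch` is assumed to be available (GCC/Clang). It uses the standard C `clock()` and is nowhere near a rigorous benchmark: buffer size, cache and TLB state, and compiler optimization all affect the numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Plain linear compare: relies on the hardware prefetcher, if there is one. */
static size_t count_diff_plain(const unsigned long *a,
                               const unsigned long *b, size_t n)
{
    size_t i, diff = 0;

    for (i = 0; i < n; i++)
        diff += (a[i] != b[i]);
    return diff;
}

/* Same compare, with explicit hints issued once before the loop. */
static size_t count_diff_hinted(const unsigned long *a,
                                const unsigned long *b, size_t n)
{
    size_t i, diff = 0;

    __builtin_prefetch(a);  /* assumption: GCC/Clang builtin */
    __builtin_prefetch(b);

    for (i = 0; i < n; i++)
        diff += (a[i] != b[i]);
    return diff;
}

typedef size_t (*count_fn)(const unsigned long *, const unsigned long *,
                           size_t);

/*
 * CPU time in seconds spent in `reps` calls of fn. Results are added to
 * *sink so the caller can check them and the calls have a visible effect.
 */
static double time_variant(count_fn fn, const unsigned long *a,
                           const unsigned long *b, size_t n,
                           int reps, size_t *sink)
{
    clock_t t0;
    int r;

    assert(reps > 0);

    t0 = clock();
    for (r = 0; r < reps; r++)
        *sink += fn(a, b, n);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

A caller would fill two buffers, call `time_variant()` once per variant with the same `reps`, and compare the returned times. To exercise the TLB-miss case mentioned above, the buffers would need to be large and cold, which this sketch does not arrange.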