Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
Mikulas
The displaylink hardware has such a peculiarity that it doesn't render a command until next command is received. This produces occasional corruption, such as when setting 22x11 font on the console, only the first line of the cursor will be blinking if the cursor is located at some specific columns.
When we end up with a repeating pixel, the driver has a bug that it leaves one uninitialized byte after the command (and this byte is enough to flush the command and render it - thus it fixes the screen corruption), however whe we end up with a non-repeating pixel, there is no byte appended and this results in temporary screen corruption.
This patch fixes the screen corruption by always appending a byte 0xAF at the end of URB. It also removes the uninitialized byte.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_fb.c | 5 ++++- drivers/gpu/drm/udl/udl_transfer.c | 11 +++++++---- 2 files changed, 11 insertions(+), 5 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:47:07.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:53:31.000000000 +0200 @@ -153,11 +153,11 @@ static void udl_compress_hline16( raw_pixels_count_byte = cmd++; /* we'll know this later */ raw_pixel_start = pixel;
- cmd_pixel_end = pixel + (min(MAX_CMD_PIXELS + 1, - min((int)(pixel_end - pixel) / bpp, - (int)(cmd_buffer_end - cmd) / 2))) * bpp; + cmd_pixel_end = pixel + min3(MAX_CMD_PIXELS + 1UL, + (unsigned long)(pixel_end - pixel) / bpp, + (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) * bpp;
- prefetch_range((void *) pixel, (cmd_pixel_end - pixel) * bpp); + prefetch_range((void *) pixel, cmd_pixel_end - pixel); pixel_val16 = get_pixel_val16(pixel, bpp);
while (pixel < cmd_pixel_end) { @@ -193,6 +193,9 @@ static void udl_compress_hline16( if (pixel > raw_pixel_start) { /* finalize last RAW span */ *raw_pixels_count_byte = ((pixel-raw_pixel_start) / bpp) & 0xFF; + } else { + /* undo unused byte */ + cmd--; }
*cmd_pixels_count_byte = ((pixel - cmd_pixel_start) / bpp) & 0xFF; Index: linux-4.16.12/drivers/gpu/drm/udl/udl_fb.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_fb.c 2018-05-31 14:47:07.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_fb.c 2018-05-31 14:53:32.000000000 +0200 @@ -137,7 +137,10 @@ int udl_handle_damage(struct udl_framebu
if (cmd > (char *) urb->transfer_buffer) { /* Send partial buffer remaining before exiting */ - int len = cmd - (char *) urb->transfer_buffer; + int len; + if (cmd < (char *) urb->transfer_buffer + urb->transfer_buffer_length) + *cmd++ = 0xAF; + len = cmd - (char *) urb->transfer_buffer; ret = udl_submit_urb(dev, urb, len); bytes_sent += len; } else
If we leave urbs around, it causes not only leak, but also memory corruption. This patch fixes the function udl_free_urb_list, so that it always waits for all urbs that are in progress.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_main.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_main.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_main.c 2018-05-31 10:23:42.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_main.c 2018-05-31 10:28:11.000000000 +0200 @@ -170,18 +170,13 @@ static void udl_free_urb_list(struct drm struct list_head *node; struct urb_node *unode; struct urb *urb; - int ret; unsigned long flags;
DRM_DEBUG("Waiting for completes and freeing all render urbs\n");
/* keep waiting and freeing, until we've got 'em all */ while (count--) { - - /* Getting interrupted means a leak, but ok at shutdown*/ - ret = down_interruptible(&udl->urbs.limit_sem); - if (ret) - break; + down(&udl->urbs.limit_sem);
spin_lock_irqsave(&udl->urbs.lock, flags);
Allocations larger than PAGE_ALLOC_COSTLY_ORDER are unreliable and they may fail anytime. This patch fixes the udl kms driver so that when a large alloactions fails, it tries to do multiple smaller allocations.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_main.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_main.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_main.c 2018-05-31 11:16:15.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_main.c 2018-05-31 11:16:15.000000000 +0200 @@ -200,17 +200,22 @@ static void udl_free_urb_list(struct drm static int udl_alloc_urb_list(struct drm_device *dev, int count, size_t size) { struct udl_device *udl = dev->dev_private; - int i = 0; struct urb *urb; struct urb_node *unode; char *buf; + size_t wanted_size = count * size;
spin_lock_init(&udl->urbs.lock);
+retry: udl->urbs.size = size; INIT_LIST_HEAD(&udl->urbs.list);
- while (i < count) { + sema_init(&udl->urbs.limit_sem, 0); + udl->urbs.count = 0; + udl->urbs.available = 0; + + while (udl->urbs.count * size < wanted_size) { unode = kzalloc(sizeof(struct urb_node), GFP_KERNEL); if (!unode) break; @@ -226,11 +231,16 @@ static int udl_alloc_urb_list(struct drm } unode->urb = urb;
- buf = usb_alloc_coherent(udl->udev, MAX_TRANSFER, GFP_KERNEL, + buf = usb_alloc_coherent(udl->udev, size, GFP_KERNEL, &urb->transfer_dma); if (!buf) { kfree(unode); usb_free_urb(urb); + if (size > PAGE_SIZE) { + size /= 2; + udl_free_urb_list(dev); + goto retry; + } break; }
@@ -241,16 +251,14 @@ static int udl_alloc_urb_list(struct drm
list_add_tail(&unode->entry, &udl->urbs.list);
- i++; + up(&udl->urbs.limit_sem); + udl->urbs.count++; + udl->urbs.available++; }
- sema_init(&udl->urbs.limit_sem, i); - udl->urbs.count = i; - udl->urbs.available = i; - - DRM_DEBUG("allocated %d %d byte urbs\n", i, (int) size); + DRM_DEBUG("allocated %d %d byte urbs\n", udl->urbs.count, (int) size);
- return i; + return udl->urbs.count; }
struct urb *udl_get_urb(struct drm_device *dev)
We must use kzalloc when allocating the fb_deferred_io structure. Otherwise, the field first_io is undefined and it causes a crash.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_fb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_fb.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_fb.c 2018-05-29 17:55:39.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_fb.c 2018-05-29 17:55:39.000000000 +0200 @@ -221,7 +221,7 @@ static int udl_fb_open(struct fb_info *i
struct fb_deferred_io *fbdefio;
- fbdefio = kmalloc(sizeof(struct fb_deferred_io), GFP_KERNEL); + fbdefio = kzalloc(sizeof(struct fb_deferred_io), GFP_KERNEL);
if (fbdefio) { fbdefio->delay = DL_DEFIO_WRITE_DELAY;
The udl driver crashes when fbdefio is used. The crash can be reproduced with this sequence: 1. echo 1 >/sys/module/udl/parameters/fb_defio 2. run some program that maps the framebuffer, such as 'links -g' or 'fbi' 3. allocate memory to the point where the machine starts swapping
The reason for the crash is that udl_gem_get_pages calls drm_gem_get_pages and drm_gem_get_pages allocates the pages using shmem_read_mapping_page. The shmem pages are kept on the memory management lists using the page->lru entry.
However, fbdefio reuses the page->lru entry for the list of pages that were modified, so the memory management lists are corrupted and the machine crashes when vmscan starts to scan memory.
I fixed this crash by allocating pages with "alloc_page" instead. The pages allocated with "alloc_page" have page->lru unused, and thus the system doesn't crash when fbdefio uses it.
Unable to handle kernel paging request at virtual address dead000000000200 Mem abort info: ESR = 0x96000044 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000044 CM = 0, WnR = 1 [dead000000000200] address between user and kernel address ranges Internal error: Oops: 96000044 [#1] PREEMPT SMP Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet autofs4 udl drm_kms_helper cfbfillrect syscopyarea cfbimgblt sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font drm drm_panel_orientation_quirks mousedev hid_generic usbhid hid binfmt_misc snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer snd soundcore ipv6 aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64 sha1_ce sha1_generic xhci_plat_hcd xhci_hcd sd_mod usbcore usb_common mvpp2 unix CPU: 0 PID: 39 Comm: kswapd0 Not tainted 4.16.12 #3 Hardware name: Marvell 8040 MACHIATOBin (DT) pstate: 00000085 (nzcv daIf -PAN -UAO) pc : isolate_lru_pages.isra.16+0x23c/0x2b0 lr : isolate_lru_pages.isra.16+0x104/0x2b0 sp : ffffffc13a897ac0 x29: ffffffc13a897ac0 x28: 0000000000000003 x27: 0000000000000003 x26: 0000000000000004 x25: ffffff80087e84a0 x24: ffffffc13a897b68 x23: ffffffbf04cefc60 x22: ffffffc13a897e44 x21: 0000000000000009 x20: ffffffc13a897c00 x19: ffffffc13a897b40 x18: ffffffbf04d39000 x17: 00000000fffffff8 x16: ffffffbf00000000 x15: 0000000000000006 x14: 0000000000000000 x13: 00000000000001aa x12: 400000000004001c x11: 0000000000000001 x10: 0000000000000001 x9 : 00000000000ee3ac x8 : 00000000000004a0 x7 : ffffff80087e84a0 x6 : 0000000000000000 x5 : 0000000000000000 x4 : 0000000000000002 x3 : ffffff80087e84a0 x2 : 0000000000000200 x1 : dead000000000200 x0 : 400000000004001c Process kswapd0 (pid: 39, stack limit = 0x0000000097f25571) Call trace: isolate_lru_pages.isra.16+0x23c/0x2b0 shrink_inactive_list+0xe4/0x3b0 shrink_node_memcg.constprop.19+0x374/0x630 shrink_node+0x64/0x1c8 kswapd+0x340/0x568 kthread+0x118/0x120 ret_from_fork+0x10/0x18 Code: d2804002 f85e02e0 f85e02ec f9000461 (f9000023) ---[ end trace f9f3ad3856cb2ef3 ]--- note: kswapd0[39] exited with preempt_count 1
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_gem.c | 31 +++++++++++++++++++++++++++---- 1 file changed, 27 insertions(+), 4 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_gem.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_gem.c 2018-01-10 09:31:23.000000000 +0100 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_gem.c 2018-05-29 17:46:10.000000000 +0200 @@ -130,28 +130,51 @@ int udl_gem_fault(struct vm_fault *vmf) int udl_gem_get_pages(struct udl_gem_object *obj) { struct page **pages; + int npages, i;
if (obj->pages) return 0;
- pages = drm_gem_get_pages(&obj->base); - if (IS_ERR(pages)) - return PTR_ERR(pages); + npages = obj->base.size >> PAGE_SHIFT; + + pages = kvmalloc_array(npages, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return -ENOMEM; + + for (i = 0; i < npages; i++) { + struct page *p = alloc_page(GFP_KERNEL | __GFP_ZERO); + if (!p) + goto fail; + pages[i] = p; + }
obj->pages = pages;
return 0; + +fail: + while (i--) + put_page(pages[i]); + kvfree(pages); + return -ENOMEM; }
void udl_gem_put_pages(struct udl_gem_object *obj) { + int npages, i; + if (obj->base.import_attach) { kvfree(obj->pages); obj->pages = NULL; return; }
- drm_gem_put_pages(&obj->base, obj->pages, false, false); + npages = obj->base.size >> PAGE_SHIFT; + + for (i = 0; i < npages; i++) + put_page(obj->pages[i]); + + kvfree(obj->pages); obj->pages = NULL; }
The defio subsystem overwrites the method fb_osp->mmap. That method is stored in module's static data - and that means that if we have multiple diplaylink adapters, they will over write each other's method.
In order to avoid interference between multiple adapters, we copy the fb_ops structure to a device-local memory.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_fb.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
Index: linux-4.17-rc7/drivers/gpu/drm/udl/udl_fb.c =================================================================== --- linux-4.17-rc7.orig/drivers/gpu/drm/udl/udl_fb.c 2018-06-03 13:05:20.000000000 +0200 +++ linux-4.17-rc7/drivers/gpu/drm/udl/udl_fb.c 2018-06-03 13:08:03.000000000 +0200 @@ -34,6 +34,7 @@ module_param(fb_defio, int, S_IWUSR | S_ struct udl_fbdev { struct drm_fb_helper helper; struct udl_framebuffer ufb; + struct fb_ops fb_ops; int fb_count; };
@@ -405,7 +406,8 @@ static int udlfb_create(struct drm_fb_he info->fix.smem_len = size; info->fix.smem_start = (unsigned long)ufbdev->ufb.obj->vmapping;
- info->fbops = &udlfb_ops; + ufbdev->fb_ops = udlfb_ops; + info->fbops = &ufbdev->fb_ops; drm_fb_helper_fill_fix(info, fb->pitches[0], fb->format->depth); drm_fb_helper_fill_var(info, &ufbdev->helper, sizes->fb_width, sizes->fb_height);
Division is slow, so it shouldn't be done by the pixel generating code. The driver supports only 2 or 4 bytes per pixel, so we can replace division with a shift.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/gpu/drm/udl/udl_drv.h | 2 - drivers/gpu/drm/udl/udl_fb.c | 15 ++++++++------ drivers/gpu/drm/udl/udl_transfer.c | 39 ++++++++++++++++++------------------- 3 files changed, 30 insertions(+), 26 deletions(-)
Index: linux-4.17-rc7/drivers/gpu/drm/udl/udl_drv.h =================================================================== --- linux-4.17-rc7.orig/drivers/gpu/drm/udl/udl_drv.h 2018-06-03 13:15:01.000000000 +0200 +++ linux-4.17-rc7/drivers/gpu/drm/udl/udl_drv.h 2018-06-03 13:15:01.000000000 +0200 @@ -110,7 +110,7 @@ udl_fb_user_fb_create(struct drm_device struct drm_file *file, const struct drm_mode_fb_cmd2 *mode_cmd);
-int udl_render_hline(struct drm_device *dev, int bpp, struct urb **urb_ptr, +int udl_render_hline(struct drm_device *dev, int log_bpp, struct urb **urb_ptr, const char *front, char **urb_buf_ptr, u32 byte_offset, u32 device_byte_offset, u32 byte_width, int *ident_ptr, int *sent_ptr); Index: linux-4.17-rc7/drivers/gpu/drm/udl/udl_fb.c =================================================================== --- linux-4.17-rc7.orig/drivers/gpu/drm/udl/udl_fb.c 2018-06-03 13:15:01.000000000 +0200 +++ linux-4.17-rc7/drivers/gpu/drm/udl/udl_fb.c 2018-06-03 13:15:01.000000000 +0200 @@ -91,7 +91,10 @@ int udl_handle_damage(struct udl_framebu int bytes_identical = 0; struct urb *urb; int aligned_x; - int bpp = fb->base.format->cpp[0]; + int log_bpp; + + BUG_ON(!is_power_of_2(fb->base.format->cpp[0])); + log_bpp = __ffs(fb->base.format->cpp[0]);
if (!fb->active_16) return 0; @@ -126,12 +129,12 @@ int udl_handle_damage(struct udl_framebu
for (i = y; i < y + height ; i++) { const int line_offset = fb->base.pitches[0] * i; - const int byte_offset = line_offset + (x * bpp); - const int dev_byte_offset = (fb->base.width * bpp * i) + (x * bpp); - if (udl_render_hline(dev, bpp, &urb, + const int byte_offset = line_offset + (x << log_bpp); + const int dev_byte_offset = (fb->base.width * i + x) << log_bpp; + if (udl_render_hline(dev, log_bpp, &urb, (char *) fb->obj->vmapping, &cmd, byte_offset, dev_byte_offset, - width * bpp, + width << log_bpp, &bytes_identical, &bytes_sent)) goto error; } @@ -150,7 +153,7 @@ int udl_handle_damage(struct udl_framebu error: atomic_add(bytes_sent, &udl->bytes_sent); atomic_add(bytes_identical, &udl->bytes_identical); - atomic_add(width*height*bpp, &udl->bytes_rendered); + atomic_add((width * height) << log_bpp, &udl->bytes_rendered); end_cycles = get_cycles(); atomic_add(((unsigned int) ((end_cycles - start_cycles) >> 10)), /* Kcycles */ Index: linux-4.17-rc7/drivers/gpu/drm/udl/udl_transfer.c =================================================================== --- linux-4.17-rc7.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-06-03 13:15:01.000000000 +0200 +++ linux-4.17-rc7/drivers/gpu/drm/udl/udl_transfer.c 2018-06-03 13:15:01.000000000 +0200 @@ -83,12 +83,12 @@ static inline u16 pixel32_to_be16(const ((pixel >> 8) & 0xf800)); }
-static inline u16 get_pixel_val16(const uint8_t *pixel, int bpp) +static inline u16 get_pixel_val16(const uint8_t *pixel, int log_bpp) { - u16 pixel_val16 = 0; - if (bpp == 2) + u16 pixel_val16; + if (log_bpp == 1) pixel_val16 = *(const uint16_t *)pixel; - else if (bpp == 4) + else pixel_val16 = pixel32_to_be16(*(const uint32_t *)pixel); return pixel_val16; } @@ -125,8 +125,9 @@ static void udl_compress_hline16( const u8 *const pixel_end, uint32_t *device_address_ptr, uint8_t **command_buffer_ptr, - const uint8_t *const cmd_buffer_end, int bpp) + const uint8_t *const cmd_buffer_end, int log_bpp) { + const int bpp = 1 << log_bpp; const u8 *pixel = *pixel_start_ptr; uint32_t dev_addr = *device_address_ptr; uint8_t *cmd = *command_buffer_ptr; @@ -153,12 +154,12 @@ static void udl_compress_hline16( raw_pixels_count_byte = cmd++; /* we'll know this later */ raw_pixel_start = pixel;
- cmd_pixel_end = pixel + min3(MAX_CMD_PIXELS + 1UL, - (unsigned long)(pixel_end - pixel) / bpp, - (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) * bpp; + cmd_pixel_end = pixel + (min3(MAX_CMD_PIXELS + 1UL, + (unsigned long)(pixel_end - pixel) >> log_bpp, + (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) << log_bpp);
prefetch_range((void *) pixel, cmd_pixel_end - pixel); - pixel_val16 = get_pixel_val16(pixel, bpp); + pixel_val16 = get_pixel_val16(pixel, log_bpp);
while (pixel < cmd_pixel_end) { const u8 *const start = pixel; @@ -170,7 +171,7 @@ static void udl_compress_hline16( pixel += bpp;
while (pixel < cmd_pixel_end) { - pixel_val16 = get_pixel_val16(pixel, bpp); + pixel_val16 = get_pixel_val16(pixel, log_bpp); if (pixel_val16 != repeating_pixel_val16) break; pixel += bpp; @@ -179,10 +180,10 @@ static void udl_compress_hline16( if (unlikely(pixel > start + bpp)) { /* go back and fill in raw pixel count */ *raw_pixels_count_byte = (((start - - raw_pixel_start) / bpp) + 1) & 0xFF; + raw_pixel_start) >> log_bpp) + 1) & 0xFF;
/* immediately after raw data is repeat byte */ - *cmd++ = (((pixel - start) / bpp) - 1) & 0xFF; + *cmd++ = (((pixel - start) >> log_bpp) - 1) & 0xFF;
/* Then start another raw pixel span */ raw_pixel_start = pixel; @@ -192,14 +193,14 @@ static void udl_compress_hline16(
if (pixel > raw_pixel_start) { /* finalize last RAW span */ - *raw_pixels_count_byte = ((pixel-raw_pixel_start) / bpp) & 0xFF; + *raw_pixels_count_byte = ((pixel - raw_pixel_start) >> log_bpp) & 0xFF; } else { /* undo unused byte */ cmd--; }
- *cmd_pixels_count_byte = ((pixel - cmd_pixel_start) / bpp) & 0xFF; - dev_addr += ((pixel - cmd_pixel_start) / bpp) * 2; + *cmd_pixels_count_byte = ((pixel - cmd_pixel_start) >> log_bpp) & 0xFF; + dev_addr += ((pixel - cmd_pixel_start) >> log_bpp) * 2; }
if (cmd_buffer_end <= MIN_RLX_CMD_BYTES + cmd) { @@ -222,19 +223,19 @@ static void udl_compress_hline16( * (that we can only write to, slowly, and can never read), and (optionally) * our shadow copy that tracks what's been sent to that hardware buffer. */ -int udl_render_hline(struct drm_device *dev, int bpp, struct urb **urb_ptr, +int udl_render_hline(struct drm_device *dev, int log_bpp, struct urb **urb_ptr, const char *front, char **urb_buf_ptr, u32 byte_offset, u32 device_byte_offset, u32 byte_width, int *ident_ptr, int *sent_ptr) { const u8 *line_start, *line_end, *next_pixel; - u32 base16 = 0 + (device_byte_offset / bpp) * 2; + u32 base16 = 0 + (device_byte_offset >> log_bpp) * 2; struct urb *urb = *urb_ptr; u8 *cmd = *urb_buf_ptr; u8 *cmd_end = (u8 *) urb->transfer_buffer + urb->transfer_buffer_length;
- BUG_ON(!(bpp == 2 || bpp == 4)); + BUG_ON(!(log_bpp == 1 || log_bpp == 2));
line_start = (u8 *) (front + byte_offset); next_pixel = line_start; @@ -244,7 +245,7 @@ int udl_render_hline(struct drm_device *
udl_compress_hline16(&next_pixel, line_end, &base16, - (u8 **) &cmd, (u8 *) cmd_end, bpp); + (u8 **) &cmd, (u8 *) cmd_end, log_bpp);
if (cmd >= cmd_end) { int len = cmd - (u8 *) urb->transfer_buffer;
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front); - prefetch((void *) back); - for (j = 0; j < width; j++) { if (back[j] != front[j]) { start = j; @@ -140,8 +136,6 @@ static void udl_compress_hline16( const u8 *cmd_pixel_start, *cmd_pixel_end = NULL; uint16_t pixel_val16;
- prefetchw((void *) cmd); /* pull in one cache line at least */ - *cmd++ = 0xaf; *cmd++ = 0x6b; *cmd++ = (uint8_t) ((dev_addr >> 16) & 0xFF); @@ -158,7 +152,6 @@ static void udl_compress_hline16( (unsigned long)(pixel_end - pixel) >> log_bpp, (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) << log_bpp);
- prefetch_range((void *) pixel, cmd_pixel_end - pixel); pixel_val16 = get_pixel_val16(pixel, log_bpp);
while (pixel < cmd_pixel_end) {
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still read-on...
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK prefetcher fetches new data according to a known history... i.e. based on previously used pattern we'll trying to get the next batch of data.
But the code above is in the very beginning of the data processing routine where prefetcher doesn't yet have any history to know what and where to prefetch.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt because typically it would be just 1 instruction if those exist or nothing if CPU/compiler doesn't support it.
for (j = 0; j < width; j++) { if (back[j] != front[j]) { start = j; @@ -140,8 +136,6 @@ static void udl_compress_hline16( const u8 *cmd_pixel_start, *cmd_pixel_end = NULL; uint16_t pixel_val16;
prefetchw((void *) cmd); /* pull in one cache line at least */
Pretty much the same is here. Obviously on the first iteration prefetcher dosn't know where to get "cmd". On the next iterations it might be better but given amount of operation happens further in the cycle (and in inner cycles) I won't be completely sure that each and every prefetcher will still keep track of "cmd".
*cmd++ = 0xaf; *cmd++ = 0x6b; *cmd++ = (uint8_t) ((dev_addr >> 16) & 0xFF);
@@ -158,7 +152,6 @@ static void udl_compress_hline16( (unsigned long)(pixel_end - pixel) >> log_bpp, (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) << log_bpp);
prefetch_range((void *) pixel, cmd_pixel_end - pixel);
Again I'm not sure what we gain removing that code in comparison of possible performance degradation on simpler CPUs.
And essentially all the same is applicable to UDLFB patch.
-Alexey
On Tue, Jun 05, 2018 at 10:08:02AM +0000, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still read-on...
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK prefetcher fetches new data according to a known history... i.e. based on previously used pattern we'll trying to get the next batch of data.
But the code above is in the very beginning of the data processing routine where prefetcher doesn't yet have any history to know what and where to prefetch.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt because typically it would be just 1 instruction if those exist or nothing if CPU/compiler doesn't support it.
for (j = 0; j < width; j++) { if (back[j] != front[j]) { start = j; @@ -140,8 +136,6 @@ static void udl_compress_hline16( const u8 *cmd_pixel_start, *cmd_pixel_end = NULL; uint16_t pixel_val16;
prefetchw((void *) cmd); /* pull in one cache line at least */
Pretty much the same is here. Obviously on the first iteration prefetcher dosn't know where to get "cmd". On the next iterations it might be better but given amount of operation happens further in the cycle (and in inner cycles) I won't be completely sure that each and every prefetcher will still keep track of "cmd".
*cmd++ = 0xaf; *cmd++ = 0x6b; *cmd++ = (uint8_t) ((dev_addr >> 16) & 0xFF);
@@ -158,7 +152,6 @@ static void udl_compress_hline16( (unsigned long)(pixel_end - pixel) >> log_bpp, (unsigned long)(cmd_buffer_end - 1 - cmd) / 2) << log_bpp);
prefetch_range((void *) pixel, cmd_pixel_end - pixel);
Again I'm not sure what we gain removing that code in comparison of possible performance degradation on simpler CPUs.
And essentially all the same is applicable to UDLFB patch.
Tested this one on AT91SAM9G20 SoC, but couldn't get any measurable difference. Part of problem is probably that full speed USB is the bottleneck here. However, the same applies on OMAP3630 based board with high speed USB.
As a side note, I didn't experience any problem those paches are fixing, so perhaps testcases could be described briefly, preferably with some numbers (I'm not running console on udlfb, but try to give it a try next week).
Thank you, ladis
-Alexey
On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still read-on...
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK prefetcher fetches new data according to a known history... i.e. based on previously used pattern we'll trying to get the next batch of data.
But the code above is in the very beginning of the data processing routine where prefetcher doesn't yet have any history to know what and where to prefetch.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt because typically it would be just 1 instruction if those exist or nothing if CPU/compiler doesn't support it.
See this post https://lwn.net/Articles/444336/ where they measured that prefetch hurts performance. Prefetch shouldn't be used unless you have a proof that it improves performance.
The problem is that the prefetch instruction causes stalls in the pipeline when it encounters TLB miss and the automatic prefetcher doesn't.
Mikulas
Hi Mikulas,
On Tue, 2018-06-05 at 11:30 -0400, Mikulas Patocka wrote:
On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still read-on...
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK prefetcher fetches new data according to a known history... i.e. based on previously used pattern we'll trying to get the next batch of data.
But the code above is in the very beginning of the data processing routine where prefetcher doesn't yet have any history to know what and where to prefetch.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt because typically it would be just 1 instruction if those exist or nothing if CPU/compiler doesn't support it.
See this post https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_444336... ViXO7breS55ytWkhpk5R81I&m=a5RaqJtvajFkM1hL7bOKD5jV7cpFfTvG2Y1cYCdBPd0&s=w0W8wFtAgENp8TE6RzdPGhdKRasJc_otIn08V0EkgrY&e= where they measured that prefetch hurts performance. Prefetch shouldn't be used unless you have a proof that it improves performance.
The problem is that the prefetch instruction causes stalls in the pipeline when it encounters TLB miss and the automatic prefetcher doesn't.
Wow, thanks for the link. I didn't know about that subtle issue with prefetch instructions on ARM and x86.
So OK in case of UDL these prefetches anyways make not not much sense I guess and there's something worse still, see what I've got from WandBoard Quad running kmscube [1] application with help of perf utility: --------------------------->8------------------------- # Overhead Command Shared Object Symbol # ........ ....... ....................... ........................................ # 92.93% kmscube [kernel.kallsyms] [k] udl_render_hline 2.51% kmscube [kernel.kallsyms] [k] __divsi3 0.33% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 0.22% kmscube [kernel.kallsyms] [k] lock_acquire 0.19% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.17% kmscube [kernel.kallsyms] [k] udl_handle_damage 0.12% kmscube [kernel.kallsyms] [k] v7_dma_clean_range 0.11% kmscube [kernel.kallsyms] [k] l2c210_clean_range 0.06% kmscube [kernel.kallsyms] [k] __memzero --------------------------->8-------------------------
That said it's not even USB 2.0 which is a bottle-neck but computations in the udl_render_hline().
[1] https://cgit.freedesktop.org/mesa/kmscube/
-Alexey
On Wed, 6 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Tue, 2018-06-05 at 11:30 -0400, Mikulas Patocka wrote:
On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still read-on...
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK prefetcher fetches new data according to a known history... i.e. based on previously used pattern we'll trying to get the next batch of data.
But the code above is in the very beginning of the data processing routine where prefetcher doesn't yet have any history to know what and where to prefetch.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt because typically it would be just 1 instruction if those exist or nothing if CPU/compiler doesn't support it.
See this post https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_444336... ViXO7breS55ytWkhpk5R81I&m=a5RaqJtvajFkM1hL7bOKD5jV7cpFfTvG2Y1cYCdBPd0&s=w0W8wFtAgENp8TE6RzdPGhdKRasJc_otIn08V0EkgrY&e= where they measured that prefetch hurts performance. Prefetch shouldn't be used unless you have a proof that it improves performance.
The problem is that the prefetch instruction causes stalls in the pipeline when it encounters TLB miss and the automatic prefetcher doesn't.
Wow, thanks for the link. I didn't know about that subtle issue with prefetch instructions on ARM and x86.
So OK in case of UDL these prefetches anyways make not not much sense I guess and there's something worse still, see what I've got from WandBoard Quad running kmscube [1] application with help of perf utility: --------------------------->8------------------------- # Overhead Command Shared Object Symbol # ........ ....... ....................... ........................................ # 92.93% kmscube [kernel.kallsyms] [k] udl_render_hline 2.51% kmscube [kernel.kallsyms] [k] __divsi3 0.33% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 0.22% kmscube [kernel.kallsyms] [k] lock_acquire 0.19% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.17% kmscube [kernel.kallsyms] [k] udl_handle_damage 0.12% kmscube [kernel.kallsyms] [k] v7_dma_clean_range 0.11% kmscube [kernel.kallsyms] [k] l2c210_clean_range 0.06% kmscube [kernel.kallsyms] [k] __memzero --------------------------->8-------------------------
That said it's not even USB 2.0 which is a bottle-neck but computations in the udl_render_hline().
[1] https://cgit.freedesktop.org/mesa/kmscube/
-Alexey
Try this patch http://people.redhat.com/~mpatocka/patches/kernel/udl/udlkms-avoid-division....
It is doing a lot of divisions - and WandBoard has Cortex-A9, that doesn't have division instruction.
BTW. the framebuffer UDL driver (not the modesetting driver) has performance counters in sysfs. Their location depends on the system, you can find them with find /sys -name "*metrics*"
The file "metrics_reset" resets the counters, so you can measure if the prefetch instructions improve performance or not.
Mikulas
Hi Mikulas,
On Wed, 2018-06-06 at 11:46 -0400, Mikulas Patocka wrote:
On Wed, 6 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Tue, 2018-06-05 at 11:30 -0400, Mikulas Patocka wrote:
On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Not each and every CPU that's capable of running Linux has prefetch functionality :)
Still read-on...
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
drivers/gpu/drm/udl/udl_transfer.c | 7 ------- 1 file changed, 7 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
--- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200 @@ -13,7 +13,6 @@ #include <linux/module.h> #include <linux/slab.h> #include <linux/fb.h> -#include <linux/prefetch.h> #include <asm/unaligned.h>
#include <drm/drmP.h> @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac int start = width; int end = width;
- prefetch((void *) front);
- prefetch((void *) back);
AFAIK prefetcher fetches new data according to a known history... i.e. based on previously used pattern we'll trying to get the next batch of data.
But the code above is in the very beginning of the data processing routine where prefetcher doesn't yet have any history to know what and where to prefetch.
So I'd say this particular usage is good. At least those prefetches shouldn't hurt because typically it would be just 1 instruction if those exist or nothing if CPU/compiler doesn't support it.
See this post https://urldefense.proofpoint.com/v2/url?u=https-3A__lwn.net_Articles_444336... 656e ViXO7breS55ytWkhpk5R81I&m=a5RaqJtvajFkM1hL7bOKD5jV7cpFfTvG2Y1cYCdBPd0&s=w0W8wFtAgENp8TE6RzdPGhdKRasJc_otIn08V0EkgrY&e= where they measured that prefetch hurts performance. Prefetch shouldn't be used unless you have a proof that it improves performance.
The problem is that the prefetch instruction causes stalls in the pipeline when it encounters TLB miss and the automatic prefetcher doesn't.
Wow, thanks for the link. I didn't know about that subtle issue with prefetch instructions on ARM and x86.
So OK in case of UDL these prefetches anyways make not not much sense I guess and there's something worse still, see what I've got from WandBoard Quad running kmscube [1] application with help of perf utility: --------------------------->8------------------------- # Overhead Command Shared Object Symbol # ........ ....... ....................... ........................................ # 92.93% kmscube [kernel.kallsyms] [k] udl_render_hline 2.51% kmscube [kernel.kallsyms] [k] __divsi3 0.33% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 0.22% kmscube [kernel.kallsyms] [k] lock_acquire 0.19% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.17% kmscube [kernel.kallsyms] [k] udl_handle_damage 0.12% kmscube [kernel.kallsyms] [k] v7_dma_clean_range 0.11% kmscube [kernel.kallsyms] [k] l2c210_clean_range 0.06% kmscube [kernel.kallsyms] [k] __memzero --------------------------->8-------------------------
That said it's not even USB 2.0 which is a bottle-neck but computations in the udl_render_hline().
[1] https://urldefense.proofpoint.com/v2/url?u=https-3A__cgit.freedesktop.org_me... 6eViXO7breS55ytWkhpk5R81I&m=pj60-2aLQeePuzK_lo0lQ1j9GenwSnjZ6UmI3r_nbBU&s=JMUk3_YdOpEQTbIyAs0hGvbUgNRhn4ytlIaJ9iE_Lbk&e=
-Alexey
Try this patch https://urldefense.proofpoint.com/v2/url?u=http-3A__people.redhat.com_-7Empa... _X_6JkXFx7AXWqB0tg&r=lqdeeSSEes0GFDDl656eViXO7breS55ytWkhpk5R81I&m=pj60- 2aLQeePuzK_lo0lQ1j9GenwSnjZ6UmI3r_nbBU&s=HNJmAaDh2uo9wtYsChwu_TpeOY5PmkUDpicFArzK3wE&e=
It is doing a lot of divisions - and WandBoard has Cortex-A9, that doesn't have division instruction.
Looks like that patch makes not that much of a difference.
Below are results of perf.
Without patch: ----------------------------->8------------------------------------------- 91.46% kmscube [kernel.kallsyms] [k] udl_render_hline 2.15% kmscube [kernel.kallsyms] [k] __divsi3 1.09% kmscube [kernel.kallsyms] [k] mmioset 0.48% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.48% kmscube [kernel.kallsyms] [k] v7_dma_clean_range 0.34% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 0.28% kmscube [kernel.kallsyms] [k] l2c210_clean_range 0.25% kmscube [kernel.kallsyms] [k] shmem_getpage_gfp.constprop.4 0.18% kmscube [kernel.kallsyms] [k] lock_acquire
With patch: ----------------------------->8------------------------------------------- 94.81% kmscube [kernel.kallsyms] [k] udl_render_hline 0.88% kmscube [kernel.kallsyms] [k] mmioset 0.44% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irq 0.38% kmscube [kernel.kallsyms] [k] v7_dma_clean_range 0.33% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 0.22% kmscube [kernel.kallsyms] [k] l2c210_clean_range 0.16% kmscube [kernel.kallsyms] [k] shmem_getpage_gfp.constprop.4 0.16% kmscube [kernel.kallsyms] [k] udl_handle_damage 0.16% kmscube [kernel.kallsyms] [k] lock_acquire ----------------------------->8-------------------------------------------
There is no mode __divsi3 function in perf results, so patch works. But this function was just ~2% of overall overhead.
-Alexey
spin_lock_irqsave and spin_unlock_irqrestore is inteded to be called from a context where it is unknown if interrupts are enabled or disabled (such as interrupt handlers). From a process context, we should call spin_lock_irq and spin_unlock_irq, that avoids the costly pushf and popf instructions.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/gpu/drm/udl/udl_main.c | 10 ++++------ drivers/gpu/drm/udl/udl_modeset.c | 5 ++--- 2 files changed, 6 insertions(+), 9 deletions(-)
Index: linux-4.16.12/drivers/gpu/drm/udl/udl_main.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_main.c 2018-05-31 11:17:01.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_main.c 2018-05-31 11:17:01.000000000 +0200 @@ -170,7 +170,6 @@ static void udl_free_urb_list(struct drm struct list_head *node; struct urb_node *unode; struct urb *urb; - unsigned long flags;
DRM_DEBUG("Waiting for completes and freeing all render urbs\n");
@@ -178,12 +177,12 @@ static void udl_free_urb_list(struct drm while (count--) { down(&udl->urbs.limit_sem);
- spin_lock_irqsave(&udl->urbs.lock, flags); + spin_lock_irq(&udl->urbs.lock);
node = udl->urbs.list.next; /* have reserved one with sem */ list_del_init(node);
- spin_unlock_irqrestore(&udl->urbs.lock, flags); + spin_unlock_irq(&udl->urbs.lock);
unode = list_entry(node, struct urb_node, entry); urb = unode->urb; @@ -268,7 +267,6 @@ struct urb *udl_get_urb(struct drm_devic struct list_head *entry; struct urb_node *unode; struct urb *urb = NULL; - unsigned long flags;
/* Wait for an in-flight buffer to complete and get re-queued */ ret = down_timeout(&udl->urbs.limit_sem, GET_URB_TIMEOUT); @@ -279,14 +277,14 @@ struct urb *udl_get_urb(struct drm_devic goto error; }
- spin_lock_irqsave(&udl->urbs.lock, flags); + spin_lock_irq(&udl->urbs.lock);
BUG_ON(list_empty(&udl->urbs.list)); /* reserved one with limit_sem */ entry = udl->urbs.list.next; list_del_init(entry); udl->urbs.available--;
- spin_unlock_irqrestore(&udl->urbs.lock, flags); + spin_unlock_irq(&udl->urbs.lock);
unode = list_entry(entry, struct urb_node, entry); urb = unode->urb; Index: linux-4.16.12/drivers/gpu/drm/udl/udl_modeset.c =================================================================== --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_modeset.c 2018-05-31 11:17:01.000000000 +0200 +++ linux-4.16.12/drivers/gpu/drm/udl/udl_modeset.c 2018-05-31 11:17:01.000000000 +0200 @@ -366,7 +366,6 @@ static int udl_crtc_page_flip(struct drm { struct udl_framebuffer *ufb = to_udl_fb(fb); struct drm_device *dev = crtc->dev; - unsigned long flags;
struct drm_framebuffer *old_fb = crtc->primary->fb; if (old_fb) { @@ -377,10 +376,10 @@ static int udl_crtc_page_flip(struct drm
udl_handle_damage(ufb, 0, 0, fb->width, fb->height);
- spin_lock_irqsave(&dev->event_lock, flags); + spin_lock_irq(&dev->event_lock); if (event) drm_crtc_send_vblank_event(crtc, event); - spin_unlock_irqrestore(&dev->event_lock, flags); + spin_unlock_irq(&dev->event_lock); crtc->primary->fb = fb;
return 0;
The udl kms driver writes messages to the syslog whenever some application opens or closes /dev/fb0 and whenever the user switches between the Xserver and the console.
This patch changes the priority of these messages to debug.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/gpu/drm/udl/udl_fb.c | 6 +++--- drivers/gpu/drm/udl/udl_modeset.c | 2 +- 2 files changed, 4 insertions(+), 4 deletions(-)
Index: linux-4.17-rc7/drivers/gpu/drm/udl/udl_fb.c =================================================================== --- linux-4.17-rc7.orig/drivers/gpu/drm/udl/udl_fb.c 2018-06-03 13:17:58.000000000 +0200 +++ linux-4.17-rc7/drivers/gpu/drm/udl/udl_fb.c 2018-06-03 13:42:48.000000000 +0200 @@ -179,7 +179,7 @@ static int udl_fb_mmap(struct fb_info *i
pos = (unsigned long)info->fix.smem_start + offset;
- pr_notice("mmap() framebuffer addr:%lu size:%lu\n", + pr_debug("mmap() framebuffer addr:%lu size:%lu\n", pos, size);
/* We don't want the framebuffer to be mapped encrypted */ @@ -237,7 +237,7 @@ static int udl_fb_open(struct fb_info *i } #endif
- pr_notice("open /dev/fb%d user=%d fb_info=%p count=%d\n", + pr_debug("open /dev/fb%d user=%d fb_info=%p count=%d\n", info->node, user, info, ufbdev->fb_count);
return 0; @@ -262,7 +262,7 @@ static int udl_fb_release(struct fb_info } #endif
- pr_warn("released /dev/fb%d user=%d count=%d\n", + pr_debug("released /dev/fb%d user=%d count=%d\n", info->node, user, ufbdev->fb_count);
return 0; Index: linux-4.17-rc7/drivers/gpu/drm/udl/udl_modeset.c =================================================================== --- linux-4.17-rc7.orig/drivers/gpu/drm/udl/udl_modeset.c 2018-06-03 13:17:58.000000000 +0200 +++ linux-4.17-rc7/drivers/gpu/drm/udl/udl_modeset.c 2018-06-03 13:41:56.000000000 +0200 @@ -243,7 +243,7 @@ static int udl_crtc_write_mode_to_hw(str
memcpy(buf, udl->mode_buf, udl->mode_buf_len); retval = udl_submit_urb(dev, urb, udl->mode_buf_len); - DRM_INFO("write mode info %d\n", udl->mode_buf_len); + DRM_DEBUG("write mode info %d\n", udl->mode_buf_len); return retval; }
I observed that the performance of the udl fb driver degrades over time. On a freshly booted machine, it takes 6 seconds to do "ls -la /usr/bin"; after some time of use, the same operation takes 14 seconds.
The reason is that the value of "limit_sem" decays over time.
The udl driver uses a semaphore "limit_set" to specify how many free urbs are there on dlfb->urbs.list. If the count is zero, the "down" operation will sleep until some urbs are added to the freelist.
In order to avoid some hypothetical deadlock, the driver will not call "up" immediatelly, but it will offload it to a workqueue. The problem is that if we call "schedule_delayed_work" on the same work item multiple times, the work item may only be executed once.
This is happening: * some urb completes * dlfb_urb_completion adds it to the free list * dlfb_urb_completion calls schedule_delayed_work to schedule the function dlfb_release_urb_work to increase the semaphore count * as the urb is on the free list, some other task grabs it and submits it * the submitted urb completes, dlfb_urb_completion is called again * dlfb_urb_completion calls schedule_delayed_work, but the work is already scheduled, so it does nothing * finally, dlfb_release_urb_work is called, it increases the semaphore count by 1, although it should increase it by 2
So, the semaphore count is decreasing over time, and this causes gradual performance degradation.
Note that in the current kernel, the "up" function may be called from interrupt and it may race with the "down" function called by another thread, so we don't have to offload the call of "up" to a workqueue at all. This patch removes the workqueue code. The patch also changes "down_interruptible" to "down" in dlfb_free_urb_list, so that we will clean up the driver properly even if a signal arrives.
With this patch, the performance of udlfb no longer degrades.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/video/fbdev/udlfb.c | 27 ++------------------------- include/video/udlfb.h | 1 - 2 files changed, 2 insertions(+), 26 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-05-31 12:31:04.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-05-31 12:31:04.000000000 +0200 @@ -922,14 +922,6 @@ static void dlfb_free(struct kref *kref) kfree(dlfb); }
-static void dlfb_release_urb_work(struct work_struct *work) -{ - struct urb_node *unode = container_of(work, struct urb_node, - release_urb_work.work); - - up(&unode->dlfb->urbs.limit_sem); -} - static void dlfb_free_framebuffer(struct dlfb_data *dlfb) { struct fb_info *info = dlfb->info; @@ -1789,14 +1781,7 @@ static void dlfb_urb_completion(struct u dlfb->urbs.available++; spin_unlock_irqrestore(&dlfb->urbs.lock, flags);
- /* - * When using fb_defio, we deadlock if up() is called - * while another is waiting. So queue to another process. - */ - if (fb_defio) - schedule_delayed_work(&unode->release_urb_work, 0); - else - up(&dlfb->urbs.limit_sem); + up(&dlfb->urbs.limit_sem); }
static void dlfb_free_urb_list(struct dlfb_data *dlfb) @@ -1805,16 +1790,11 @@ static void dlfb_free_urb_list(struct dl struct list_head *node; struct urb_node *unode; struct urb *urb; - int ret; unsigned long flags;
/* keep waiting and freeing, until we've got 'em all */ while (count--) { - - /* Getting interrupted means a leak, but ok at disconnect */ - ret = down_interruptible(&dlfb->urbs.limit_sem); - if (ret) - break; + down(&dlfb->urbs.limit_sem);
spin_lock_irqsave(&dlfb->urbs.lock, flags);
@@ -1854,9 +1834,6 @@ static int dlfb_alloc_urb_list(struct dl break; unode->dlfb = dlfb;
- INIT_DELAYED_WORK(&unode->release_urb_work, - dlfb_release_urb_work); - urb = usb_alloc_urb(0, GFP_KERNEL); if (!urb) { kfree(unode); Index: linux-4.17-rc7/include/video/udlfb.h =================================================================== --- linux-4.17-rc7.orig/include/video/udlfb.h 2018-05-31 12:31:04.000000000 +0200 +++ linux-4.17-rc7/include/video/udlfb.h 2018-05-31 12:31:04.000000000 +0200 @@ -20,7 +20,6 @@ struct dloarea { struct urb_node { struct list_head entry; struct dlfb_data *dlfb; - struct delayed_work release_urb_work; struct urb *urb; };
The displaylink hardware has such a peculiarity that it doesn't render a command until next command is received. This produces occasional corruption, such as when setting 22x11 font on the console, only the first line of the cursor will be blinking if the cursor is located at some specific columns.
When we end up with a repeating pixel, the driver has a bug that it leaves one uninitialized byte after the command (and this byte is enough to flush the command and render it - thus it fixes the screen corruption), however whe we end up with a non-repeating pixel, there is no byte appended and this results in temporary screen corruption.
This patch fixes the screen corruption by always appending a byte 0xAF at the end of URB. It also removes the uninitialized byte.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/video/fbdev/udlfb.c | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-05-31 14:43:13.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-05-31 14:46:03.000000000 +0200 @@ -27,6 +27,7 @@ #include <linux/slab.h> #include <linux/prefetch.h> #include <linux/delay.h> +#include <asm/unaligned.h> #include <video/udlfb.h> #include "edid.h"
@@ -450,17 +451,17 @@ static void dlfb_compress_hline( raw_pixels_count_byte = cmd++; /* we'll know this later */ raw_pixel_start = pixel;
- cmd_pixel_end = pixel + min(MAX_CMD_PIXELS + 1, - min((int)(pixel_end - pixel), - (int)(cmd_buffer_end - cmd) / BPP)); + cmd_pixel_end = pixel + min3(MAX_CMD_PIXELS + 1UL, + (unsigned long)(pixel_end - pixel), + (unsigned long)(cmd_buffer_end - 1 - cmd) / BPP);
- prefetch_range((void *) pixel, (cmd_pixel_end - pixel) * BPP); + prefetch_range((void *) pixel, (u8 *)cmd_pixel_end - (u8 *)pixel);
while (pixel < cmd_pixel_end) { const uint16_t * const repeating_pixel = pixel;
- *cmd++ = *pixel >> 8; - *cmd++ = *pixel; + put_unaligned_be16(*pixel, cmd); + cmd += 2; pixel++;
if (unlikely((pixel < cmd_pixel_end) && @@ -486,13 +487,16 @@ static void dlfb_compress_hline( if (pixel > raw_pixel_start) { /* finalize last RAW span */ *raw_pixels_count_byte = (pixel-raw_pixel_start) & 0xFF; + } else { + /* undo unused byte */ + cmd--; }
*cmd_pixels_count_byte = (pixel - cmd_pixel_start) & 0xFF; - dev_addr += (pixel - cmd_pixel_start) * BPP; + dev_addr += (u8 *)pixel - (u8 *)cmd_pixel_start; }
- if (cmd_buffer_end <= MIN_RLX_CMD_BYTES + cmd) { + if (cmd_buffer_end - MIN_RLX_CMD_BYTES <= cmd) { /* Fill leftover bytes with no-ops */ if (cmd_buffer_end > cmd) memset(cmd, 0xAF, cmd_buffer_end - cmd); @@ -610,8 +614,11 @@ static int dlfb_handle_damage(struct dlf }
if (cmd > (char *) urb->transfer_buffer) { + int len; + if (cmd < (char *) urb->transfer_buffer + urb->transfer_buffer_length) + *cmd++ = 0xAF; /* Send partial buffer remaining before exiting */ - int len = cmd - (char *) urb->transfer_buffer; + len = cmd - (char *) urb->transfer_buffer; ret = dlfb_submit_urb(dlfb, urb, len); bytes_sent += len; } else @@ -735,8 +742,11 @@ static void dlfb_dpy_deferred_io(struct }
if (cmd > (char *) urb->transfer_buffer) { + int len; + if (cmd < (char *) urb->transfer_buffer + urb->transfer_buffer_length) + *cmd++ = 0xAF; /* Send partial buffer remaining before exiting */ - int len = cmd - (char *) urb->transfer_buffer; + len = cmd - (char *) urb->transfer_buffer; dlfb_submit_urb(dlfb, urb, len); bytes_sent += len; } else
The udlfb driver reprograms the hardware everytime the user switches the console, that makes quite unusable when working on the console.
This patch makes the driver remember the videomode we are in and avoid reprogramming the hardware if we switch to the same videomode.
We mask the "activate" field and the "FB_VMODE_SMOOTH_XPAN" flag when comparing the videomode, because they cause spurious switches when switching to and from the Xserver.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/video/fbdev/udlfb.c | 18 ++++++++++++++++-- include/video/udlfb.h | 1 + 2 files changed, 17 insertions(+), 2 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-05-31 14:49:52.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-05-31 14:49:52.000000000 +0200 @@ -1041,10 +1041,24 @@ static int dlfb_ops_set_par(struct fb_in int result; u16 *pix_framebuffer; int i; + struct fb_var_screeninfo fvs; + + /* clear the activate field because it causes spurious miscompares */ + fvs = info->var; + fvs.activate = 0; + fvs.vmode &= ~FB_VMODE_SMOOTH_XPAN; + + if (!memcmp(&dlfb->current_mode, &fvs, sizeof(struct fb_var_screeninfo))) + return 0;
result = dlfb_set_video_mode(dlfb, &info->var);
- if ((result == 0) && (dlfb->fb_count == 0)) { + if (result) + return result; + + dlfb->current_mode = fvs; + + if (dlfb->fb_count == 0) {
/* paint greenscreen */
@@ -1056,7 +1070,7 @@ static int dlfb_ops_set_par(struct fb_in info->screen_base); }
- return result; + return 0; }
/* To fonzi the jukebox (e.g. make blanking changes take effect) */ Index: linux-4.17-rc7/include/video/udlfb.h =================================================================== --- linux-4.17-rc7.orig/include/video/udlfb.h 2018-05-31 14:49:52.000000000 +0200 +++ linux-4.17-rc7/include/video/udlfb.h 2018-05-31 14:49:52.000000000 +0200 @@ -56,6 +56,7 @@ struct dlfb_data { atomic_t bytes_identical; /* saved effort with backbuffer comparison */ atomic_t bytes_sent; /* to usb, after compression including overhead */ atomic_t cpu_kcycles_used; /* transpired during pixel processing */ + struct fb_var_screeninfo current_mode; };
#define NR_USB_REQUEST_I2C_SUB_IO 0x02
The defio subsystem overwrites the method fb_osp->mmap. That method is stored in module's static data - and that means that if we have multiple diplaylink adapters, they will over write each other's method.
In order to avoid interference between multiple adapters, we copy the fb_ops structure to a device-local memory.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/video/fbdev/udlfb.c | 3 ++- include/video/udlfb.h | 1 + 2 files changed, 3 insertions(+), 1 deletion(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:33.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:33.000000000 +0200 @@ -1665,7 +1665,8 @@ static void dlfb_init_framebuffer_work(s dlfb->info = info; info->par = dlfb; info->pseudo_palette = dlfb->pseudo_palette; - info->fbops = &dlfb_ops; + dlfb->ops = dlfb_ops; + info->fbops = &dlfb->ops;
retval = fb_alloc_cmap(&info->cmap, 256, 0); if (retval < 0) { Index: linux-4.17-rc7/include/video/udlfb.h =================================================================== --- linux-4.17-rc7.orig/include/video/udlfb.h 2018-06-03 13:17:33.000000000 +0200 +++ linux-4.17-rc7/include/video/udlfb.h 2018-06-03 13:17:33.000000000 +0200 @@ -51,6 +51,7 @@ struct dlfb_data { int base8; u32 pseudo_palette[256]; int blank_mode; /*one of FB_BLANK_ */ + struct fb_ops ops; /* blit-only rendering path metrics, exposed through sysfs */ atomic_t bytes_rendered; /* raw pixel-bytes driver asked to render */ atomic_t bytes_identical; /* saved effort with backbuffer comparison */
The default delay 5 jiffies is too much when the kernel is compiled with HZ=100 - it results in jumpy cursor in Xwindow.
In order to find out the optimal delay, I benchmarked the driver on 1280x720x30fps video. I found out that with HZ=1000, 10ms is acceptable, but with HZ=250 or HZ=300, we need 4ms, so that the video is played without any frame skips.
This patch changes the delay to this value.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- include/video/udlfb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
Index: linux-4.17-rc7/include/video/udlfb.h =================================================================== --- linux-4.17-rc7.orig/include/video/udlfb.h 2018-06-03 13:17:37.000000000 +0200 +++ linux-4.17-rc7/include/video/udlfb.h 2018-06-03 13:17:37.000000000 +0200 @@ -88,7 +88,7 @@ struct dlfb_data { #define MIN_RAW_PIX_BYTES 2 #define MIN_RAW_CMD_BYTES (RAW_HEADER_BYTES + MIN_RAW_PIX_BYTES)
-#define DL_DEFIO_WRITE_DELAY 5 /* fb_deferred_io.delay in jiffies */ +#define DL_DEFIO_WRITE_DELAY msecs_to_jiffies(HZ <= 300 ? 4 : 10) /* optimal value for 720p video */ #define DL_DEFIO_WRITE_DISABLE (HZ*60) /* "disable" with long delay */
/* remove these once align.h patch is taken into kernel */
Allocations larger than PAGE_ALLOC_COSTLY_ORDER are unreliable and they may fail anytime. This patch fixes the udlfb driver so that when a large alloactions fails, it tries to do multiple smaller allocations.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/video/fbdev/udlfb.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:38.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:38.000000000 +0200 @@ -1843,17 +1843,22 @@ static void dlfb_free_urb_list(struct dl
static int dlfb_alloc_urb_list(struct dlfb_data *dlfb, int count, size_t size) { - int i = 0; struct urb *urb; struct urb_node *unode; char *buf; + size_t wanted_size = count * size;
spin_lock_init(&dlfb->urbs.lock);
+retry: dlfb->urbs.size = size; INIT_LIST_HEAD(&dlfb->urbs.list);
- while (i < count) { + sema_init(&dlfb->urbs.limit_sem, 0); + dlfb->urbs.count = 0; + dlfb->urbs.available = 0; + + while (dlfb->urbs.count * size < wanted_size) { unode = kzalloc(sizeof(*unode), GFP_KERNEL); if (!unode) break; @@ -1866,11 +1871,16 @@ static int dlfb_alloc_urb_list(struct dl } unode->urb = urb;
- buf = usb_alloc_coherent(dlfb->udev, MAX_TRANSFER, GFP_KERNEL, + buf = usb_alloc_coherent(dlfb->udev, size, GFP_KERNEL, &urb->transfer_dma); if (!buf) { kfree(unode); usb_free_urb(urb); + if (size > PAGE_SIZE) { + size /= 2; + dlfb_free_urb_list(dlfb); + goto retry; + } break; }
@@ -1881,14 +1891,12 @@ static int dlfb_alloc_urb_list(struct dl
list_add_tail(&unode->entry, &dlfb->urbs.list);
- i++; + up(&dlfb->urbs.limit_sem); + dlfb->urbs.count++; + dlfb->urbs.available++; }
- sema_init(&dlfb->urbs.limit_sem, i); - dlfb->urbs.count = i; - dlfb->urbs.available = i; - - return i; + return dlfb->urbs.count; }
static struct urb *dlfb_get_urb(struct dlfb_data *dlfb)
Set the variable "line_length" in the function dlfb_ops_set_par. Without this, we get garbage if we select different videomode with fbset.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org
--- drivers/video/fbdev/udlfb.c | 1 + 1 file changed, 1 insertion(+)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-05-31 14:50:00.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-05-31 14:50:00.000000000 +0200 @@ -1057,6 +1057,7 @@ static int dlfb_ops_set_par(struct fb_in return result;
dlfb->current_mode = fvs; + info->fix.line_length = info->var.xres * (info->var.bits_per_pixel / 8);
if (dlfb->fb_count == 0) {
This patch changes udlfb so that it may reallocate the framebuffer when setting higher-resolution mode. If we boot the system without monitor attached, udlfb creates a framebuffer with the size 800x600. This patch makes it possible to select higher videomode with the fbset command when a monitor is attached.
Note that there is no reliable way to prevent the system from touching the old framebuffer, so we must not free it. We add it to the list dlfb->deferred_free and free it when the driver is unloaded.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/video/fbdev/udlfb.c | 70 +++++++++++++++++++++++++++++--------------- include/video/udlfb.h | 1 2 files changed, 48 insertions(+), 23 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:41.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:41.000000000 +0200 @@ -73,6 +73,13 @@ static bool fb_defio = 1; /* Detect mma static bool shadow = 1; /* Optionally disable shadow framebuffer */ static int pixel_limit; /* Optionally force a pixel resolution limit */
+struct dlfb_deferred_free { + struct list_head list; + void *mem; +}; + +static int dlfb_realloc_framebuffer(struct dlfb_data *dlfb, struct fb_info *info, u32 new_len); + /* dlfb keeps a list of urbs for efficient bulk transfers */ static void dlfb_urb_completion(struct urb *urb); static struct urb *dlfb_get_urb(struct dlfb_data *dlfb); @@ -927,6 +934,12 @@ static void dlfb_free(struct kref *kref) { struct dlfb_data *dlfb = container_of(kref, struct dlfb_data, kref);
+ while (!list_empty(&dlfb->deferred_free)) { + struct dlfb_deferred_free *d = list_entry(dlfb->deferred_free.next, struct dlfb_deferred_free, list); + list_del(&d->list); + vfree(d->mem); + kfree(d); + } vfree(dlfb->backing_buffer); kfree(dlfb->edid); kfree(dlfb); @@ -1020,10 +1033,6 @@ static int dlfb_ops_check_var(struct fb_ struct fb_videomode mode; struct dlfb_data *dlfb = info->par;
- /* TODO: support dynamically changing framebuffer size */ - if ((var->xres * var->yres * 2) > info->fix.smem_len) - return -EINVAL; - /* set device-specific elements of var unrelated to mode */ dlfb_var_color_format(var);
@@ -1042,6 +1051,7 @@ static int dlfb_ops_set_par(struct fb_in u16 *pix_framebuffer; int i; struct fb_var_screeninfo fvs; + u32 line_length = info->var.xres * (info->var.bits_per_pixel / 8);
/* clear the activate field because it causes spurious miscompares */ fvs = info->var; @@ -1051,13 +1061,17 @@ static int dlfb_ops_set_par(struct fb_in if (!memcmp(&dlfb->current_mode, &fvs, sizeof(struct fb_var_screeninfo))) return 0;
+ result = dlfb_realloc_framebuffer(dlfb, info, info->var.yres * line_length); + if (result) + return result; + result = dlfb_set_video_mode(dlfb, &info->var);
if (result) return result;
dlfb->current_mode = fvs; - info->fix.line_length = info->var.xres * (info->var.bits_per_pixel / 8); + info->fix.line_length = line_length;
if (dlfb->fb_count == 0) {
@@ -1066,11 +1080,11 @@ static int dlfb_ops_set_par(struct fb_in pix_framebuffer = (u16 *) info->screen_base; for (i = 0; i < info->fix.smem_len / 2; i++) pix_framebuffer[i] = 0x37e6; - - dlfb_handle_damage(dlfb, 0, 0, info->var.xres, info->var.yres, - info->screen_base); }
+ dlfb_handle_damage(dlfb, 0, 0, info->var.xres, info->var.yres, + info->screen_base); + return 0; }
@@ -1146,21 +1160,29 @@ static struct fb_ops dlfb_ops = { };
+static void dlfb_deferred_vfree(struct dlfb_data *dlfb, void *mem) +{ + struct dlfb_deferred_free *d = kmalloc(sizeof(struct dlfb_deferred_free), GFP_KERNEL); + if (!d) + return; + d->mem = mem; + list_add(&d->list, &dlfb->deferred_free); +} + /* * Assumes &info->lock held by caller * Assumes no active clients have framebuffer open */ -static int dlfb_realloc_framebuffer(struct dlfb_data *dlfb, struct fb_info *info) +static int dlfb_realloc_framebuffer(struct dlfb_data *dlfb, struct fb_info *info, u32 new_len) { - int old_len = info->fix.smem_len; - int new_len; + u32 old_len = info->fix.smem_len; unsigned char *old_fb = info->screen_base; unsigned char *new_fb; unsigned char *new_back = NULL;
- new_len = info->fix.line_length * info->var.yres; + new_len = PAGE_ALIGN(new_len);
- if (PAGE_ALIGN(new_len) > old_len) { + if (new_len > old_len) { /* * Alloc system memory for virtual framebuffer */ @@ -1169,14 +1191,15 @@ static int dlfb_realloc_framebuffer(stru dev_err(info->dev, "Virtual framebuffer alloc failed\n"); return -ENOMEM; } + memset(new_fb, 0xff, new_len);
if (info->screen_base) { memcpy(new_fb, old_fb, old_len); - vfree(info->screen_base); + dlfb_deferred_vfree(dlfb, info->screen_base); }
info->screen_base = new_fb; - info->fix.smem_len = PAGE_ALIGN(new_len); + info->fix.smem_len = new_len; info->fix.smem_start = (unsigned long) new_fb; info->flags = udlfb_info_flags;
@@ -1192,7 +1215,7 @@ static int dlfb_realloc_framebuffer(stru dev_info(info->dev, "No shadow/backing buffer allocated\n"); else { - vfree(dlfb->backing_buffer); + dlfb_deferred_vfree(dlfb, dlfb->backing_buffer); dlfb->backing_buffer = new_back; } } @@ -1344,11 +1367,6 @@ static int dlfb_setup_modes(struct dlfb_ * with mode size info, we can now alloc our framebuffer. */ memcpy(&info->fix, &dlfb_fix, sizeof(dlfb_fix)); - info->fix.line_length = info->var.xres * - (info->var.bits_per_pixel / 8); - - result = dlfb_realloc_framebuffer(dlfb, info); - } else result = -EINVAL;
@@ -1436,7 +1454,10 @@ static ssize_t edid_store( if (!dlfb->edid || memcmp(src, dlfb->edid, src_size)) return -EINVAL;
- dlfb_ops_set_par(fb_info); + ret = dlfb_ops_set_par(fb_info); + if (ret) + return ret; + return src_size; }
@@ -1596,6 +1617,7 @@ static int dlfb_usb_probe(struct usb_int }
kref_init(&dlfb->kref); /* matching kref_put in usb .disconnect fn */ + INIT_LIST_HEAD(&dlfb->deferred_free);
dlfb->udev = usbdev; usb_set_intfdata(intf, dlfb); @@ -1693,7 +1715,9 @@ static void dlfb_init_framebuffer_work(s dlfb_select_std_channel(dlfb);
dlfb_ops_check_var(&info->var, info); - dlfb_ops_set_par(info); + retval = dlfb_ops_set_par(info); + if (retval) + goto error;
retval = register_framebuffer(info); if (retval < 0) { Index: linux-4.17-rc7/include/video/udlfb.h =================================================================== --- linux-4.17-rc7.orig/include/video/udlfb.h 2018-06-03 13:17:41.000000000 +0200 +++ linux-4.17-rc7/include/video/udlfb.h 2018-06-03 13:17:41.000000000 +0200 @@ -58,6 +58,7 @@ struct dlfb_data { atomic_t bytes_sent; /* to usb, after compression including overhead */ atomic_t cpu_kcycles_used; /* transpired during pixel processing */ struct fb_var_screeninfo current_mode; + struct list_head deferred_free; };
#define NR_USB_REQUEST_I2C_SUB_IO 0x02
Hi Mikulas,
I love your patch! Perhaps something to improve:
[auto build test WARNING on drm/drm-next] [also build test WARNING on v4.17-rc7 next-20180601] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Mikulas-Patocka/USB-DisplayLink-pat... base: git://people.freedesktop.org/~airlied/linux.git drm-next reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
vim +1198 drivers/video/fbdev/udlfb.c
1171 1172 /* 1173 * Assumes &info->lock held by caller 1174 * Assumes no active clients have framebuffer open 1175 */ 1176 static int dlfb_realloc_framebuffer(struct dlfb_data *dlfb, struct fb_info *info, u32 new_len) 1177 { 1178 u32 old_len = info->fix.smem_len;
1179 unsigned char *old_fb = info->screen_base;
1180 unsigned char *new_fb; 1181 unsigned char *new_back = NULL; 1182 1183 new_len = PAGE_ALIGN(new_len); 1184 1185 if (new_len > old_len) { 1186 /* 1187 * Alloc system memory for virtual framebuffer 1188 */ 1189 new_fb = vmalloc(new_len); 1190 if (!new_fb) { 1191 dev_err(info->dev, "Virtual framebuffer alloc failed\n"); 1192 return -ENOMEM; 1193 } 1194 memset(new_fb, 0xff, new_len); 1195 1196 if (info->screen_base) { 1197 memcpy(new_fb, old_fb, old_len);
1198 dlfb_deferred_vfree(dlfb, info->screen_base);
1199 } 1200 1201 info->screen_base = new_fb; 1202 info->fix.smem_len = new_len; 1203 info->fix.smem_start = (unsigned long) new_fb; 1204 info->flags = udlfb_info_flags; 1205 1206 /* 1207 * Second framebuffer copy to mirror the framebuffer state 1208 * on the physical USB device. We can function without this. 1209 * But with imperfect damage info we may send pixels over USB 1210 * that were, in fact, unchanged - wasting limited USB bandwidth 1211 */ 1212 if (shadow) 1213 new_back = vzalloc(new_len); 1214 if (!new_back) 1215 dev_info(info->dev, 1216 "No shadow/backing buffer allocated\n"); 1217 else { 1218 dlfb_deferred_vfree(dlfb, dlfb->backing_buffer); 1219 dlfb->backing_buffer = new_back; 1220 } 1221 } 1222 return 0; 1223 } 1224
--- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Mon, 4 Jun 2018, kbuild test robot wrote:
Hi Mikulas,
I love your patch! Perhaps something to improve:
[auto build test WARNING on drm/drm-next] [also build test WARNING on v4.17-rc7 next-20180601] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Mikulas-Patocka/USB-DisplayLink-pat...
What is it really complaining about? That URL shows 404 Not Found and this email has no warnings at all.
base: git://people.freedesktop.org/~airlied/linux.git drm-next reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
vim +1198 drivers/video/fbdev/udlfb.c
1171 1172 /* 1173 * Assumes &info->lock held by caller 1174 * Assumes no active clients have framebuffer open 1175 */ 1176 static int dlfb_realloc_framebuffer(struct dlfb_data *dlfb, struct fb_info *info, u32 new_len) 1177 { 1178 u32 old_len = info->fix.smem_len;
1179 unsigned char *old_fb = info->screen_base;
1180 unsigned char *new_fb; 1181 unsigned char *new_back = NULL; 1182 1183 new_len = PAGE_ALIGN(new_len); 1184 1185 if (new_len > old_len) { 1186 /* 1187 * Alloc system memory for virtual framebuffer 1188 */ 1189 new_fb = vmalloc(new_len); 1190 if (!new_fb) { 1191 dev_err(info->dev, "Virtual framebuffer alloc failed\n"); 1192 return -ENOMEM; 1193 } 1194 memset(new_fb, 0xff, new_len); 1195 1196 if (info->screen_base) { 1197 memcpy(new_fb, old_fb, old_len);
1198 dlfb_deferred_vfree(dlfb, info->screen_base);
1199 } 1200 1201 info->screen_base = new_fb; 1202 info->fix.smem_len = new_len; 1203 info->fix.smem_start = (unsigned long) new_fb; 1204 info->flags = udlfb_info_flags; 1205 1206 /* 1207 * Second framebuffer copy to mirror the framebuffer state 1208 * on the physical USB device. We can function without this. 1209 * But with imperfect damage info we may send pixels over USB 1210 * that were, in fact, unchanged - wasting limited USB bandwidth 1211 */ 1212 if (shadow) 1213 new_back = vzalloc(new_len); 1214 if (!new_back) 1215 dev_info(info->dev, 1216 "No shadow/backing buffer allocated\n"); 1217 else { 1218 dlfb_deferred_vfree(dlfb, dlfb->backing_buffer); 1219 dlfb->backing_buffer = new_back; 1220 } 1221 } 1222 return 0; 1223 } 1224
0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
On Tuesday, June 12, 2018 12:32:34 PM Mikulas Patocka wrote:
On Mon, 4 Jun 2018, kbuild test robot wrote:
Hi Mikulas,
I love your patch! Perhaps something to improve:
[auto build test WARNING on drm/drm-next] [also build test WARNING on v4.17-rc7 next-20180601] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Mikulas-Patocka/USB-DisplayLink-pat...
What is it really complaining about? That URL shows 404 Not Found and this email has no warnings at all.
screen_base in struct fb_info is annotated with __iomem tag:
... char __iomem *screen_base; /* Virtual address */ ...
and this tag should be preserved (or explicitly casted).
base: git://people.freedesktop.org/~airlied/linux.git drm-next reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__
You should be able to reproduce the issue with the above sequence.
sparse warnings: (new ones prefixed by >>)
[...]
1178 u32 old_len = info->fix.smem_len;
1179 unsigned char *old_fb = info->screen_base;
1180 unsigned char *new_fb;
[...]
1196 if (info->screen_base) { 1197 memcpy(new_fb, old_fb, old_len);
1198 dlfb_deferred_vfree(dlfb, info->screen_base);
1199 }
Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics
Currently, the udlfb driver only tests for identical bytes at the beginning or at the end of a page and renders anything between the first and last mismatching pixel. But pages are not the same as lines, so this is quite suboptimal - if there is something modified at the beginning of a page and at the end of a page, the whole page is rendered, even if most of the page is not modified.
This patch makes it test for identical pixels at the beginning and end of each rendering command. This patch improves identical byte detection by 41% when playing video in a window.
This patch also fixes a possible screen corruption if the user is writing to the framebuffer while dlfb_render_hline is in progress - the pixel data that is copied to the backbuffer with memcpy may be different from the pixel data that is actually rendered to the hardware (because the content of the framebuffer may change between memcpy and the rendering command). We must make sure that we copy exactly the same pixel as the pixel that is being rendered.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/video/fbdev/udlfb.c | 45 +++++++++++++++++++++++++++++++++----------- 1 file changed, 34 insertions(+), 11 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-05-31 14:51:43.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-05-31 14:51:43.000000000 +0200 @@ -431,7 +431,9 @@ static void dlfb_compress_hline( const uint16_t *const pixel_end, uint32_t *device_address_ptr, uint8_t **command_buffer_ptr, - const uint8_t *const cmd_buffer_end) + const uint8_t *const cmd_buffer_end, + unsigned long back_buffer_offset, + int *ident_ptr) { const uint16_t *pixel = *pixel_start_ptr; uint32_t dev_addr = *device_address_ptr; @@ -444,6 +446,14 @@ static void dlfb_compress_hline( const uint16_t *raw_pixel_start = NULL; const uint16_t *cmd_pixel_start, *cmd_pixel_end = NULL;
+ if (back_buffer_offset && + *pixel == *(u16 *)((u8 *)pixel + back_buffer_offset)) { + pixel++; + dev_addr += BPP; + (*ident_ptr)++; + continue; + } + prefetchw((void *) cmd); /* pull in one cache line at least */
*cmd++ = 0xAF; @@ -462,25 +472,37 @@ static void dlfb_compress_hline( (unsigned long)(pixel_end - pixel), (unsigned long)(cmd_buffer_end - 1 - cmd) / BPP);
+ if (back_buffer_offset) { + /* note: the framebuffer may change under us, so we must test for underflow */ + while (cmd_pixel_end - 1 > pixel && + *(cmd_pixel_end - 1) == *(u16 *)((u8 *)(cmd_pixel_end - 1) + back_buffer_offset)) + cmd_pixel_end--; + } + prefetch_range((void *) pixel, (u8 *)cmd_pixel_end - (u8 *)pixel);
while (pixel < cmd_pixel_end) { const uint16_t * const repeating_pixel = pixel; + u16 pixel_value = *pixel;
- put_unaligned_be16(*pixel, cmd); + put_unaligned_be16(pixel_value, cmd); + if (back_buffer_offset) + *(u16 *)((u8 *)pixel + back_buffer_offset) = pixel_value; cmd += 2; pixel++;
if (unlikely((pixel < cmd_pixel_end) && - (*pixel == *repeating_pixel))) { + (*pixel == pixel_value))) { /* go back and fill in raw pixel count */ *raw_pixels_count_byte = ((repeating_pixel - raw_pixel_start) + 1) & 0xFF;
- while ((pixel < cmd_pixel_end) - && (*pixel == *repeating_pixel)) { - pixel++; - } + do { + if (back_buffer_offset) + *(u16 *)((u8 *)pixel + back_buffer_offset) = pixel_value; + pixel++; + } while ((pixel < cmd_pixel_end) && + (*pixel == pixel_value));
/* immediately after raw data is repeat byte */ *cmd++ = ((pixel - repeating_pixel) - 1) & 0xFF; @@ -531,6 +553,7 @@ static int dlfb_render_hline(struct dlfb struct urb *urb = *urb_ptr; u8 *cmd = *urb_buf_ptr; u8 *cmd_end = (u8 *) urb->transfer_buffer + urb->transfer_buffer_length; + unsigned long back_buffer_offset = 0;
line_start = (u8 *) (front + byte_offset); next_pixel = line_start; @@ -541,6 +564,8 @@ static int dlfb_render_hline(struct dlfb const u8 *back_start = (u8 *) (dlfb->backing_buffer + byte_offset);
+ back_buffer_offset = (unsigned long)back_start - (unsigned long)line_start; + *ident_ptr += dlfb_trim_hline(back_start, &next_pixel, &byte_width);
@@ -549,16 +574,14 @@ static int dlfb_render_hline(struct dlfb dev_addr += offset; back_start += offset; line_start += offset; - - memcpy((char *)back_start, (char *) line_start, - byte_width); }
while (next_pixel < line_end) {
dlfb_compress_hline((const uint16_t **) &next_pixel, (const uint16_t *) line_end, &dev_addr, - (u8 **) &cmd, (u8 *) cmd_end); + (u8 **) &cmd, (u8 *) cmd_end, back_buffer_offset, + ident_ptr);
if (cmd >= cmd_end) { int len = cmd - (u8 *) urb->transfer_buffer;
Modern processors can detect linear memory accesses and prefetch data automatically, so there's no need to use prefetch.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/video/fbdev/udlfb.c | 8 -------- 1 file changed, 8 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-05-31 12:48:35.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-05-31 12:48:35.000000000 +0200 @@ -25,7 +25,6 @@ #include <linux/fb.h> #include <linux/vmalloc.h> #include <linux/slab.h> -#include <linux/prefetch.h> #include <linux/delay.h> #include <asm/unaligned.h> #include <video/udlfb.h> @@ -375,9 +374,6 @@ static int dlfb_trim_hline(const u8 *bba int start = width; int end = width;
- prefetch((void *) front); - prefetch((void *) back); - for (j = 0; j < width; j++) { if (back[j] != front[j]) { start = j; @@ -454,8 +450,6 @@ static void dlfb_compress_hline( continue; }
- prefetchw((void *) cmd); /* pull in one cache line at least */ - *cmd++ = 0xAF; *cmd++ = 0x6B; *cmd++ = dev_addr >> 16; @@ -479,8 +473,6 @@ static void dlfb_compress_hline( cmd_pixel_end--; }
- prefetch_range((void *) pixel, (u8 *)cmd_pixel_end - (u8 *)pixel); - while (pixel < cmd_pixel_end) { const uint16_t * const repeating_pixel = pixel; u16 pixel_value = *pixel;
spin_lock_irqsave and spin_unlock_irqrestore is inteded to be called from a context where it is unknown if interrupts are enabled or disabled (such as interrupt handlers). From a process context, we should call spin_lock_irq and spin_unlock_irq, that avoids the costly pushf and popf instructions.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com
--- drivers/video/fbdev/udlfb.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
Index: linux-4.17-rc7/drivers/video/fbdev/udlfb.c =================================================================== --- linux-4.17-rc7.orig/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:46.000000000 +0200 +++ linux-4.17-rc7/drivers/video/fbdev/udlfb.c 2018-06-03 13:17:46.000000000 +0200 @@ -1855,18 +1855,17 @@ static void dlfb_free_urb_list(struct dl struct list_head *node; struct urb_node *unode; struct urb *urb; - unsigned long flags;
/* keep waiting and freeing, until we've got 'em all */ while (count--) { down(&dlfb->urbs.limit_sem);
- spin_lock_irqsave(&dlfb->urbs.lock, flags); + spin_lock_irq(&dlfb->urbs.lock);
node = dlfb->urbs.list.next; /* have reserved one with sem */ list_del_init(node);
- spin_unlock_irqrestore(&dlfb->urbs.lock, flags); + spin_unlock_irq(&dlfb->urbs.lock);
unode = list_entry(node, struct urb_node, entry); urb = unode->urb; @@ -1944,7 +1943,6 @@ static struct urb *dlfb_get_urb(struct d int ret; struct list_head *entry; struct urb_node *unode; - unsigned long flags;
/* Wait for an in-flight buffer to complete and get re-queued */ ret = down_timeout(&dlfb->urbs.limit_sem, GET_URB_TIMEOUT); @@ -1956,14 +1954,14 @@ static struct urb *dlfb_get_urb(struct d return NULL; }
- spin_lock_irqsave(&dlfb->urbs.lock, flags); + spin_lock_irq(&dlfb->urbs.lock);
BUG_ON(list_empty(&dlfb->urbs.list)); /* reserved one with limit_sem */ entry = dlfb->urbs.list.next; list_del_init(entry); dlfb->urbs.available--;
- spin_unlock_irqrestore(&dlfb->urbs.lock, flags); + spin_unlock_irq(&dlfb->urbs.lock);
unode = list_entry(entry, struct urb_node, entry); return unode->urb;
On 4 June 2018 at 00:40, Mikulas Patocka mpatocka@redhat.com wrote:
Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
Hi,
You probably want to split these up into separate series for the kms and fbdev drivers.
Otherwise at least for drm you've missed this merge window, since it closes around rc6 of the previous kernel, did you use git send-email for these patches, at least some of them viewed funny on my phone, I'll try and look over the kms ones soon. Do you have any numbers for improvements to the kms ones?
Dave.
On Mon, 4 Jun 2018, Dave Airlie wrote:
On 4 June 2018 at 00:40, Mikulas Patocka mpatocka@redhat.com wrote:
Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
Hi,
You probably want to split these up into separate series for the kms and fbdev drivers.
Otherwise at least for drm you've missed this merge window, since it closes around rc6 of the previous kernel,
Could you apply at least the fbdefio patches (without them, fbdefio is unusable due to crashes) and the display corruption of the last line (because most people will hit it)?
did you use git send-email for these patches, at least some of them viewed funny on my phone,
I used the command "quilt mail". I use quilt, not git, for management of my patches.
I'll try and look over the kms ones soon. Do you have any numbers for improvements to the kms ones?
Dave.
I measured performance improvement on the framebuffer patches. The kms driver already performs well, there's not much to do.
I'd like to as you if you could review the patch "udl-kms: fix a linked-list corruption when using fbdefio" - for me it fixes the crashes, but I am not expert in modesetting drivers and I don't know if some other part of the kernel assumes that the framebuffer pages must be allocated with drm_gem_get_pages.
BTW. When I unplug the USB adapter while using the modesetting driver, I get this warning. Do you have an idea how to fix it?
WARNING: CPU: 0 PID: 61 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x250/0x2b8 [drm] Modules linked in: udlfb hid_generic usbhid hid tun bridge stp llc autofs4 binfmt_misc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_conntrack xt_multiport iptable_filter iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables pppoe pppox af_packet ppp_generic slhc udl drm_kms_helper cfbfillrect cfbimgblt cfbcopyarea drm drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops fb font snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer snd soundcore nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack sd_mod ipv6 aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64 sha1_ce xhci_plat_hcd xhci_hcd sha1_generic usbcore usb_common ahci_platform libahci_platform libahci mvpp2 unix CPU: 0 PID: 61 Comm: kworker/0:2 Not tainted 4.17.0-rc7 #1 Hardware name: Marvell 8040 MACCHIATOBin (DT) Workqueue: usb_hub_wq hub_event [usbcore] pstate: 80000005 (Nzcv daif -PAN -UAO) pc : drm_mode_config_cleanup+0x250/0x2b8 [drm] lr : drm_mode_config_cleanup+0x88/0x2b8 [drm] sp : ffffffc13a643920 x29: ffffffc13a643920 x28: ffffffc13a63ac00 x27: ffffffc1380962c0 x26: ffffffc11b95f898 x25: ffffff8000afd1d8 x24: ffffffc11b95f800 x23: 0000000000000060 x22: ffffff8000afd240 x21: ffffffc11b95eb38 x20: ffffffc11b95e800 x19: ffffffc11b95eb30 x18: ffffffc11b95ea7c x17: 0000007fa3591b60 x16: ffffff80081dc178 x15: ffffffc11b95ea78 x14: 0000000000000000 x13: ffffffc12b3fc000 x12: ffffffc12b3fc028 x11: ffffffc12b3fc119 x10: 000000000000001f x9 : 0000000000000028 x8 : ffffff8000a84000 x7 : 0000000000000000 x6 : 0000000000000001 x5 : 0000000000000002 x4 : 0000000000000001 x3 : 0000000000000002 x2 : 000000000000002f x1 : ffffffc11b95eaf8 x0 : ffffffc11b978818 Call trace: drm_mode_config_cleanup+0x250/0x2b8 [drm] udl_modeset_cleanup+0xc/0x18 [udl] udl_driver_unload+0x30/0x50 [udl] drm_dev_unregister+0x3c/0xe8 [drm] drm_dev_unplug+0x18/0x70 [drm] udl_usb_disconnect+0x30/0x40 [udl] usb_unbind_interface+0x6c/0x290 [usbcore] device_release_driver_internal+0x170/0x200 device_release_driver+0x14/0x20 bus_remove_device+0x118/0x128 device_del+0x110/0x308 usb_disable_device+0x8c/0x1f8 [usbcore] usb_disconnect+0xb4/0x218 [usbcore] usb_disconnect+0x9c/0x218 [usbcore] usb_disconnect+0x9c/0x218 [usbcore] hub_event+0xf20/0x1020 [usbcore] process_one_work+0x1c8/0x310 worker_thread+0x44/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x18 ---[ end trace 978a27ff198f1268 ]--- [drm:drm_mode_config_cleanup [drm]] *ERROR* connector DVI-I-1 leaked!
Mikulas
On Mon, Jun 04, 2018 at 10:14:02AM -0400, Mikulas Patocka wrote:
On Mon, 4 Jun 2018, Dave Airlie wrote:
On 4 June 2018 at 00:40, Mikulas Patocka mpatocka@redhat.com wrote:
Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
Hi,
You probably want to split these up into separate series for the kms and fbdev drivers.
Otherwise at least for drm you've missed this merge window, since it closes around rc6 of the previous kernel,
Could you apply at least the fbdefio patches (without them, fbdefio is unusable due to crashes) and the display corruption of the last line (because most people will hit it)?
did you use git send-email for these patches, at least some of them viewed funny on my phone,
I used the command "quilt mail". I use quilt, not git, for management of my patches.
I'll try and look over the kms ones soon. Do you have any numbers for improvements to the kms ones?
Dave.
I measured performance improvement on the framebuffer patches. The kms driver already performs well, there's not much to do.
I'd like to as you if you could review the patch "udl-kms: fix a linked-list corruption when using fbdefio" - for me it fixes the crashes, but I am not expert in modesetting drivers and I don't know if some other part of the kernel assumes that the framebuffer pages must be allocated with drm_gem_get_pages.
BTW. When I unplug the USB adapter while using the modesetting driver, I get this warning. Do you have an idea how to fix it?
Probably the driver is missing a proper shutdown call (for atomic drivers this would be drm_atomic_helper_shutdown), leaving the connector active, which leaves it's reference count elevated. Or something like that.
Aside: Should we maintain uld as part of drm-misc under the small drivers topic? Or is this patch pile here more a one-shot effort? See
https://01.org/linuxgraphics/gfx-docs/maintainer-tools/drm-misc.html#small-d...
Cheers, Daniel
WARNING: CPU: 0 PID: 61 at drivers/gpu/drm/drm_mode_config.c:439 drm_mode_config_cleanup+0x250/0x2b8 [drm] Modules linked in: udlfb hid_generic usbhid hid tun bridge stp llc autofs4 binfmt_misc ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ipt_REJECT nf_reject_ipv4 xt_conntrack xt_multiport iptable_filter iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_nat xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables pppoe pppox af_packet ppp_generic slhc udl drm_kms_helper cfbfillrect cfbimgblt cfbcopyarea drm drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops fb font snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_pcm snd_timer snd soundcore nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack sd_mod ipv6 aes_ce_blk crypto_simd cryptd aes_ce_cipher crc32_ce ghash_ce gf128mul aes_arm64 sha2_ce sha256_arm64 sha1_ce xhci_plat_hcd xhci_hcd sha1_generic usbcore usb_common ahci_platform libahci_platform libahci mvpp2 unix CPU: 0 PID: 61 Comm: kworker/0:2 Not tainted 4.17.0-rc7 #1 Hardware name: Marvell 8040 MACCHIATOBin (DT) Workqueue: usb_hub_wq hub_event [usbcore] pstate: 80000005 (Nzcv daif -PAN -UAO) pc : drm_mode_config_cleanup+0x250/0x2b8 [drm] lr : drm_mode_config_cleanup+0x88/0x2b8 [drm] sp : ffffffc13a643920 x29: ffffffc13a643920 x28: ffffffc13a63ac00 x27: ffffffc1380962c0 x26: ffffffc11b95f898 x25: ffffff8000afd1d8 x24: ffffffc11b95f800 x23: 0000000000000060 x22: ffffff8000afd240 x21: ffffffc11b95eb38 x20: ffffffc11b95e800 x19: ffffffc11b95eb30 x18: ffffffc11b95ea7c x17: 0000007fa3591b60 x16: ffffff80081dc178 x15: ffffffc11b95ea78 x14: 0000000000000000 x13: ffffffc12b3fc000 x12: ffffffc12b3fc028 x11: ffffffc12b3fc119 x10: 000000000000001f x9 : 0000000000000028 x8 : ffffff8000a84000 x7 : 0000000000000000 x6 : 0000000000000001 x5 : 0000000000000002 x4 : 0000000000000001 x3 : 0000000000000002 x2 : 000000000000002f x1 : ffffffc11b95eaf8 x0 : ffffffc11b978818 Call trace: drm_mode_config_cleanup+0x250/0x2b8 [drm] udl_modeset_cleanup+0xc/0x18 [udl] udl_driver_unload+0x30/0x50 [udl] drm_dev_unregister+0x3c/0xe8 [drm] drm_dev_unplug+0x18/0x70 [drm] udl_usb_disconnect+0x30/0x40 [udl] usb_unbind_interface+0x6c/0x290 [usbcore] device_release_driver_internal+0x170/0x200 device_release_driver+0x14/0x20 bus_remove_device+0x118/0x128 device_del+0x110/0x308 usb_disable_device+0x8c/0x1f8 [usbcore] usb_disconnect+0xb4/0x218 [usbcore] usb_disconnect+0x9c/0x218 [usbcore] usb_disconnect+0x9c/0x218 [usbcore] hub_event+0xf20/0x1020 [usbcore] process_one_work+0x1c8/0x310 worker_thread+0x44/0x450 kthread+0x118/0x120 ret_from_fork+0x10/0x18 ---[ end trace 978a27ff198f1268 ]--- [drm:drm_mode_config_cleanup [drm]] *ERROR* connector DVI-I-1 leaked!
Mikulas _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel
Hi Mikulas,
On Sun, 2018-06-03 at 16:40 +0200, Mikulas Patocka wrote:
Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
For such a long series it would be very nice to post a link to your git tree so people may play with your changes easily.
Could you please prepare one?
-Alexey
On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:40 +0200, Mikulas Patocka wrote:
Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
For such a long series it would be very nice to post a link to your git tree so people may play with your changes easily.
Could you please prepare one?
-Alexey
I don't use git to manage my patches, I use quilt.
I uploaded the patches from quilt here: http://people.redhat.com/~mpatocka/patches/kernel/udl/series.html
Mikulas
On Tuesday, June 05, 2018 11:34:42 AM Mikulas Patocka wrote:
On Tue, 5 Jun 2018, Alexey Brodkin wrote:
Hi Mikulas,
On Sun, 2018-06-03 at 16:40 +0200, Mikulas Patocka wrote:
Hi
Here I'm sending bug fixes and performance improvements for the USB DisplayLink framebuffer and modesetting drivers for this merge window.
For such a long series it would be very nice to post a link to your git tree so people may play with your changes easily.
Could you please prepare one?
-Alexey
I don't use git to manage my patches, I use quilt.
I uploaded the patches from quilt here: http://people.redhat.com/~mpatocka/patches/kernel/udl/series.html
I've queued fbdev ones for 4.19 (w/ some minor fixes), thanks.
Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics
dri-devel@lists.freedesktop.org