This set of patches adds support for Tegra20 and Tegra30 host1x and 2D. It is based on linux-next-20130114. The set was regenerated with git format-patch -M.
The fifth version merges the DRM and host1x drivers into one driver. This allowed moving include/linux/host1x.h back into the driver and removed the need for a dummy platform device. This version also uses the code from the tegradrm driver almost as is, so there are far fewer actual code changes.
This patch set does not include the host1x allocator; it uses the CMA helpers for memory management instead.
host1x is the driver that controls the host1x hardware. It supports host1x command channels, synchronization, and memory management. It is split into a logical driver under drivers/gpu/host1x and a physical driver under drivers/gpu/host1x/hw. The physical driver is compiled with the hardware headers of the particular host1x version.
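The logical/physical split can be pictured as an ops table that the hardware-specific layer fills in from a per-version init hook, much like host1x01_init() installs host->syncpt_op in patch 1. What follows is only a user-space sketch under that assumption: fake_host, fake_syncpt, fake_syncpt_ops and the hw01_* functions are hypothetical stand-ins, not the driver's structures, and a plain array stands in for the memory-mapped registers.

```c
#include <stdint.h>

#define FAKE_NB_PTS 32	/* illustrative sync point count */

struct fake_host;

struct fake_syncpt {
	uint32_t id;
	uint32_t min;		/* software shadow of the hardware value */
	struct fake_host *host;
};

/* Hypothetical ops table: the logical layer calls only through these. */
struct fake_syncpt_ops {
	uint32_t (*load_min)(struct fake_syncpt *sp);
	void (*cpu_incr)(struct fake_syncpt *sp);
};

struct fake_host {
	uint32_t regs[FAKE_NB_PTS];	/* stands in for mapped registers */
	struct fake_syncpt_ops syncpt_op;
};

/* "Physical" layer: knows where the (fake) registers live. */
static uint32_t hw01_load_min(struct fake_syncpt *sp)
{
	sp->min = sp->host->regs[sp->id];
	return sp->min;
}

static void hw01_cpu_incr(struct fake_syncpt *sp)
{
	sp->host->regs[sp->id]++;
}

/* Per-version init hook: installs the ops for this hardware generation. */
static int hw01_init(struct fake_host *host)
{
	host->syncpt_op.load_min = hw01_load_min;
	host->syncpt_op.cpu_incr = hw01_cpu_incr;
	return 0;
}

/* "Logical" layer: hardware independent, dispatches through the table. */
static uint32_t fake_syncpt_load_min(struct fake_syncpt *sp)
{
	return sp->host->syncpt_op.load_min(sp);
}

static void fake_syncpt_cpu_incr(struct fake_syncpt *sp)
{
	sp->host->syncpt_op.cpu_incr(sp);
}
```

With this shape, supporting another host1x version would mean adding another init hook and register layout while the logical layer stays unchanged.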
The hardware units are described (briefly) in the Tegra2 TRM. The wiki page at https://gitorious.org/linux-tegra-drm/pages/Host1xIntroduction also contains a short description of the functionality.
The patch set merges tegradrm into host1x and adds a 2D driver, which uses host1x channels and sync points. The patch set also adds a user space API to tegradrm for accessing host1x and 2D.
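For readers new to sync points: in this series each one is tracked with two u32 values, a "min" that shadows the value last read from hardware and a "max" that counts the increments already promised. The sketch below is a simplified user-space illustration of that bookkeeping, not the driver code. The names sp_state, sp_incr_max, sp_check_max and sp_is_idle are hypothetical; the signed-difference comparison mirrors host1x_syncpt_check_max() from patch 1 and assumes the usual two's complement conversion to int32_t.

```c
#include <stdbool.h>
#include <stdint.h>

struct sp_state {
	uint32_t min;	/* shadow of the value last read from hardware */
	uint32_t max;	/* value reached once all queued increments land */
};

/* Announce 'incrs' future increments; returns the new max. */
static uint32_t sp_incr_max(struct sp_state *sp, uint32_t incrs)
{
	sp->max += incrs;	/* unsigned arithmetic wraps modulo 2^32 */
	return sp->max;
}

/*
 * True if a value read from hardware has not run past max.
 * The subtraction is done modulo 2^32 and then viewed as signed,
 * so the check keeps working when the counter wraps around.
 */
static bool sp_check_max(const struct sp_state *sp, uint32_t real)
{
	return (int32_t)(sp->max - real) >= 0;
}

/* True if there are no outstanding operations (min == max). */
static bool sp_is_idle(const struct sp_state *sp)
{
	return sp->min == sp->max;
}
```

Comparing max and the hardware value directly with >= would give the wrong answer across a wrap, which is why the difference is taken first.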
Terje Bergstrom (8):
  gpu: host1x: Add host1x driver
  gpu: host1x: Add syncpoint wait and interrupts
  gpu: host1x: Add channel support
  gpu: host1x: Add debug support
  drm: tegra: Move drm to live under host1x
  gpu: host1x: Remove second host1x driver
  ARM: tegra: Add board data and 2D clocks
  drm: tegra: Add gr2d device
arch/arm/mach-tegra/board-dt-tegra20.c | 1 + arch/arm/mach-tegra/board-dt-tegra30.c | 1 + arch/arm/mach-tegra/tegra20_clocks_data.c | 2 +- arch/arm/mach-tegra/tegra30_clocks_data.c | 1 + drivers/gpu/Makefile | 1 + drivers/gpu/drm/Kconfig | 2 - drivers/gpu/drm/Makefile | 1 - drivers/gpu/drm/tegra/Makefile | 7 - drivers/gpu/drm/tegra/drm.c | 115 ----- drivers/gpu/host1x/Kconfig | 32 ++ drivers/gpu/host1x/Makefile | 22 + drivers/gpu/host1x/cdma.c | 473 ++++++++++++++++++ drivers/gpu/host1x/cdma.h | 107 +++++ drivers/gpu/host1x/channel.c | 140 ++++++ drivers/gpu/host1x/channel.h | 58 +++ drivers/gpu/host1x/cma.c | 116 +++++ drivers/gpu/host1x/cma.h | 43 ++ drivers/gpu/host1x/debug.c | 215 +++++++++ drivers/gpu/host1x/debug.h | 50 ++ drivers/gpu/host1x/dev.c | 251 ++++++++++ drivers/gpu/host1x/dev.h | 170 +++++++ drivers/gpu/{drm/tegra => host1x/drm}/Kconfig | 2 +- drivers/gpu/{drm/tegra => host1x/drm}/dc.c | 7 +- drivers/gpu/{drm/tegra => host1x/drm}/dc.h | 0 drivers/gpu/host1x/drm/drm.c | 548 +++++++++++++++++++++ drivers/gpu/{drm/tegra => host1x/drm}/drm.h | 37 +- drivers/gpu/{drm/tegra => host1x/drm}/fb.c | 0 drivers/gpu/host1x/drm/gr2d.c | 325 +++++++++++++ drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c | 7 +- drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h | 0 drivers/gpu/{drm/tegra => host1x/drm}/host1x.c | 0 drivers/gpu/{drm/tegra => host1x/drm}/output.c | 0 drivers/gpu/{drm/tegra => host1x/drm}/rgb.c | 0 drivers/gpu/host1x/host1x.h | 29 ++ drivers/gpu/host1x/host1x_client.h | 34 ++ drivers/gpu/host1x/hw/Makefile | 6 + drivers/gpu/host1x/hw/cdma_hw.c | 478 ++++++++++++++++++ drivers/gpu/host1x/hw/cdma_hw.h | 37 ++ drivers/gpu/host1x/hw/channel_hw.c | 148 ++++++ drivers/gpu/host1x/hw/debug_hw.c | 400 ++++++++++++++++ drivers/gpu/host1x/hw/host1x01.c | 45 ++ drivers/gpu/host1x/hw/host1x01.h | 25 + drivers/gpu/host1x/hw/host1x01_hardware.h | 150 ++++++ drivers/gpu/host1x/hw/hw_host1x01_channel.h | 120 +++++ drivers/gpu/host1x/hw/hw_host1x01_sync.h | 241 ++++++++++ 
drivers/gpu/host1x/hw/hw_host1x01_uclass.h | 168 +++++++ drivers/gpu/host1x/hw/intr_hw.c | 178 +++++++ drivers/gpu/host1x/hw/syncpt_hw.c | 157 ++++++ drivers/gpu/host1x/intr.c | 383 +++++++++++++++ drivers/gpu/host1x/intr.h | 109 +++++ drivers/gpu/host1x/job.c | 612 ++++++++++++++++++++++++ drivers/gpu/host1x/job.h | 164 +++++++ drivers/gpu/host1x/memmgr.c | 173 +++++++ drivers/gpu/host1x/memmgr.h | 72 +++ drivers/gpu/host1x/syncpt.c | 399 +++++++++++++++ drivers/gpu/host1x/syncpt.h | 165 +++++++ drivers/video/Kconfig | 2 + include/drm/tegra_drm.h | 131 +++++ include/trace/events/host1x.h | 272 +++++++++++ 59 files changed, 7295 insertions(+), 137 deletions(-) delete mode 100644 drivers/gpu/drm/tegra/Makefile delete mode 100644 drivers/gpu/drm/tegra/drm.c create mode 100644 drivers/gpu/host1x/Kconfig create mode 100644 drivers/gpu/host1x/Makefile create mode 100644 drivers/gpu/host1x/cdma.c create mode 100644 drivers/gpu/host1x/cdma.h create mode 100644 drivers/gpu/host1x/channel.c create mode 100644 drivers/gpu/host1x/channel.h create mode 100644 drivers/gpu/host1x/cma.c create mode 100644 drivers/gpu/host1x/cma.h create mode 100644 drivers/gpu/host1x/debug.c create mode 100644 drivers/gpu/host1x/debug.h create mode 100644 drivers/gpu/host1x/dev.c create mode 100644 drivers/gpu/host1x/dev.h rename drivers/gpu/{drm/tegra => host1x/drm}/Kconfig (94%) rename drivers/gpu/{drm/tegra => host1x/drm}/dc.c (99%) rename drivers/gpu/{drm/tegra => host1x/drm}/dc.h (100%) create mode 100644 drivers/gpu/host1x/drm/drm.c rename drivers/gpu/{drm/tegra => host1x/drm}/drm.h (86%) rename drivers/gpu/{drm/tegra => host1x/drm}/fb.c (100%) create mode 100644 drivers/gpu/host1x/drm/gr2d.c rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c (99%) rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h (100%) rename drivers/gpu/{drm/tegra => host1x/drm}/host1x.c (100%) rename drivers/gpu/{drm/tegra => host1x/drm}/output.c (100%) rename drivers/gpu/{drm/tegra => host1x/drm}/rgb.c (100%) create 
mode 100644 drivers/gpu/host1x/host1x.h create mode 100644 drivers/gpu/host1x/host1x_client.h create mode 100644 drivers/gpu/host1x/hw/Makefile create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h create mode 100644 drivers/gpu/host1x/hw/channel_hw.c create mode 100644 drivers/gpu/host1x/hw/debug_hw.c create mode 100644 drivers/gpu/host1x/hw/host1x01.c create mode 100644 drivers/gpu/host1x/hw/host1x01.h create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h create mode 100644 drivers/gpu/host1x/hw/intr_hw.c create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c create mode 100644 drivers/gpu/host1x/intr.c create mode 100644 drivers/gpu/host1x/intr.h create mode 100644 drivers/gpu/host1x/job.c create mode 100644 drivers/gpu/host1x/job.h create mode 100644 drivers/gpu/host1x/memmgr.c create mode 100644 drivers/gpu/host1x/memmgr.h create mode 100644 drivers/gpu/host1x/syncpt.c create mode 100644 drivers/gpu/host1x/syncpt.h create mode 100644 include/drm/tegra_drm.h create mode 100644 include/trace/events/host1x.h
Add host1x, the driver for host1x and its client unit 2D.
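As an aid to reading the register accessors in this patch, here is a small stand-alone sketch of how the sync register offsets are derived. The constants are the ones that appear in the patch (sync_offset 0x3000, SYNCPT_0 at 0x400, SYNCPT_BASE_0 at 0x600, SYNCPT_CPU_INCR at 0x700); the helper names are hypothetical, and the modulo-32 bit mask assumes a 32-bit BIT_MASK() as on the ARM targets this driver is written for.

```c
#include <stdint.h>

/* Values taken from this patch (host1x01 on Tegra20 and Tegra30). */
#define SYNC_OFFSET		0x3000u
#define SYNC_SYNCPT_0		0x400u
#define SYNC_SYNCPT_BASE_0	0x600u
#define SYNC_SYNCPT_CPU_INCR	0x700u

/* Byte offset, from the start of the host1x aperture, of sync point 'id'. */
static uint32_t syncpt_reg(uint32_t id)
{
	return SYNC_OFFSET + SYNC_SYNCPT_0 + id * 4;
}

/* Byte offset of wait base 'id'. */
static uint32_t waitbase_reg(uint32_t id)
{
	return SYNC_OFFSET + SYNC_SYNCPT_BASE_0 + id * 4;
}

/* CPU increment: one bit per sync point, 32 sync points per register. */
static uint32_t cpu_incr_reg(uint32_t id)
{
	return SYNC_OFFSET + SYNC_SYNCPT_CPU_INCR + (id / 32) * 4;
}

/* Bit to write into the CPU increment register for sync point 'id'. */
static uint32_t cpu_incr_mask(uint32_t id)
{
	return UINT32_C(1) << (id % 32);
}
```

With 32 sync points, as in the device info table of this patch, only the first CPU increment register is ever addressed.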
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/Makefile                      |   1 +
 drivers/gpu/host1x/Kconfig                |   6 +
 drivers/gpu/host1x/Makefile               |   8 ++
 drivers/gpu/host1x/dev.c                  | 161 +++++++++++++++++++++
 drivers/gpu/host1x/dev.h                  |  73 ++++++++++
 drivers/gpu/host1x/hw/Makefile            |   6 +
 drivers/gpu/host1x/hw/host1x01.c          |  35 +++++
 drivers/gpu/host1x/hw/host1x01.h          |  25 ++++
 drivers/gpu/host1x/hw/host1x01_hardware.h |  26 ++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h  |  72 ++++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c         | 146 +++++++++++++++++++
 drivers/gpu/host1x/syncpt.c               | 217 +++++++++++++++++++++++++++++
 drivers/gpu/host1x/syncpt.h               | 153 ++++++++++++++++++++
 drivers/video/Kconfig                     |   2 +
 include/trace/events/host1x.h             |  61 ++++++++
 15 files changed, 992 insertions(+)
 create mode 100644 drivers/gpu/host1x/Kconfig
 create mode 100644 drivers/gpu/host1x/Makefile
 create mode 100644 drivers/gpu/host1x/dev.c
 create mode 100644 drivers/gpu/host1x/dev.h
 create mode 100644 drivers/gpu/host1x/hw/Makefile
 create mode 100644 drivers/gpu/host1x/hw/host1x01.c
 create mode 100644 drivers/gpu/host1x/hw/host1x01.h
 create mode 100644 drivers/gpu/host1x/hw/host1x01_hardware.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_sync.h
 create mode 100644 drivers/gpu/host1x/hw/syncpt_hw.c
 create mode 100644 drivers/gpu/host1x/syncpt.c
 create mode 100644 drivers/gpu/host1x/syncpt.h
 create mode 100644 include/trace/events/host1x.h
diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile index cc92778..7e227097 100644 --- a/drivers/gpu/Makefile +++ b/drivers/gpu/Makefile @@ -1 +1,2 @@ obj-y += drm/ vga/ stub/ +obj-$(CONFIG_TEGRA_HOST1X) += host1x/ diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig new file mode 100644 index 0000000..e89fb2b --- /dev/null +++ b/drivers/gpu/host1x/Kconfig @@ -0,0 +1,6 @@ +config TEGRA_HOST1X + tristate "Tegra host1x driver" + help + Driver for the Tegra host1x hardware. + + Required for enabling tegradrm. diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile new file mode 100644 index 0000000..363e6ab --- /dev/null +++ b/drivers/gpu/host1x/Makefile @@ -0,0 +1,8 @@ +ccflags-y = -Idrivers/gpu/host1x + +host1x-y = \ + syncpt.o \ + dev.o \ + hw/host1x01.o + +obj-$(CONFIG_TEGRA_HOST1X) += host1x.o diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c new file mode 100644 index 0000000..cd2b1ef --- /dev/null +++ b/drivers/gpu/host1x/dev.c @@ -0,0 +1,161 @@ +/* + * Tegra host1x driver + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include <linux/module.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/of.h> +#include <linux/of_device.h> +#include <linux/clk.h> +#include <linux/io.h> +#include "dev.h" +#include "hw/host1x01.h" + +#define CREATE_TRACE_POINTS +#include <trace/events/host1x.h> + +#define DRIVER_NAME "tegra-host1x" + +void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r) +{ + void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset; + + writel(v, sync_regs + r); +} + +u32 host1x_sync_readl(struct host1x *host1x, u32 r) +{ + void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset; + + return readl(sync_regs + r); +} + +static struct host1x_device_info host1x_info = { + .nb_channels = 8, + .nb_pts = 32, + .nb_mlocks = 16, + .nb_bases = 8, + .init = host1x01_init, + .sync_offset = 0x3000, +}; + +static struct of_device_id host1x_match[] = { + { .compatible = "nvidia,tegra30-host1x", .data = &host1x_info, }, + { .compatible = "nvidia,tegra20-host1x", .data = &host1x_info, }, + { }, +}; + +static int host1x_probe(struct platform_device *dev) +{ + struct host1x *host; + struct resource *regs; + int syncpt_irq; + int err; + const struct of_device_id *devid = + of_match_device(host1x_match, &dev->dev); + + if (!devid) + return -EINVAL; + + regs = platform_get_resource(dev, IORESOURCE_MEM, 0); + if (!regs) { + dev_err(&dev->dev, "missing regs\n"); + return -ENXIO; + } + + syncpt_irq = platform_get_irq(dev, 0); + if (IS_ERR_VALUE(syncpt_irq)) { + dev_err(&dev->dev, "missing irq\n"); + return -ENXIO; + } + + host = devm_kzalloc(&dev->dev, sizeof(*host), GFP_KERNEL); + if (!host) { + dev_err(&dev->dev, "failed to alloc host1x\n"); + return -ENOMEM; + } + + host->dev = dev; + memcpy(&host->info, devid->data, sizeof(struct host1x_device_info)); + + /* set common host1x device data */ + platform_set_drvdata(dev, host); + + host->regs = devm_request_and_ioremap(&dev->dev, regs); + if (!host->regs) { + dev_err(&dev->dev, "failed to remap host 
registers\n"); + return -ENXIO; + } + + if (host->info.init) { + err = host->info.init(host); + if (err) + return err; + } + + err = host1x_syncpt_init(host); + if (err) + return err; + + host->clk = devm_clk_get(&dev->dev, NULL); + if (IS_ERR(host->clk)) { + dev_err(&dev->dev, "failed to get clock\n"); + err = PTR_ERR(host->clk); + goto fail_deinit_syncpt; + } + + err = clk_prepare_enable(host->clk); + if (err < 0) { + dev_err(&dev->dev, "failed to enable clock\n"); + goto fail_deinit_syncpt; + } + + host1x_syncpt_reset(host); + + dev_info(&dev->dev, "initialized\n"); + + return 0; + +fail_deinit_syncpt: + host1x_syncpt_deinit(host); + return err; +} + +static int __exit host1x_remove(struct platform_device *dev) +{ + struct host1x *host = platform_get_drvdata(dev); + host1x_syncpt_deinit(host); + clk_disable_unprepare(host->clk); + return 0; +} + +static struct platform_driver platform_driver = { + .probe = host1x_probe, + .remove = __exit_p(host1x_remove), + .driver = { + .owner = THIS_MODULE, + .name = DRIVER_NAME, + .of_match_table = host1x_match, + }, +}; + +module_platform_driver(platform_driver); + +MODULE_AUTHOR("Terje Bergstrom tbergstrom@nvidia.com"); +MODULE_DESCRIPTION("Host1x driver for Tegra products"); +MODULE_LICENSE("GPL"); diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h new file mode 100644 index 0000000..d8f5979 --- /dev/null +++ b/drivers/gpu/host1x/dev.h @@ -0,0 +1,73 @@ +/* + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef HOST1X_DEV_H +#define HOST1X_DEV_H + +#include "syncpt.h" + +struct host1x; +struct host1x_syncpt; +struct platform_device; + +struct host1x_syncpt_ops { + void (*reset)(struct host1x_syncpt *); + void (*reset_wait_base)(struct host1x_syncpt *); + void (*read_wait_base)(struct host1x_syncpt *); + u32 (*load_min)(struct host1x_syncpt *); + void (*cpu_incr)(struct host1x_syncpt *); + int (*patch_wait)(struct host1x_syncpt *, void *patch_addr); + void (*debug)(struct host1x_syncpt *); + const char * (*name)(struct host1x_syncpt *); +}; + +struct host1x_device_info { + int nb_channels; /* host1x: num channels supported */ + int nb_pts; /* host1x: num syncpoints supported */ + int nb_bases; /* host1x: num syncpoints supported */ + int nb_mlocks; /* host1x: number of mlocks */ + int (*init)(struct host1x *); /* initialize per SoC ops */ + int sync_offset; +}; + +struct host1x { + void __iomem *regs; + struct host1x_syncpt *syncpt; + struct platform_device *dev; + struct host1x_device_info info; + struct clk *clk; + + struct host1x_syncpt_ops syncpt_op; + + struct dentry *debugfs; +}; + +static inline +struct host1x *host1x_get_host(struct platform_device *_dev) +{ + struct platform_device *pdev; + + if (_dev->dev.parent) { + pdev = to_platform_device(_dev->dev.parent); + return platform_get_drvdata(pdev); + } else + return platform_get_drvdata(_dev); +} + +void host1x_sync_writel(struct host1x *host1x, u32 r, u32 v); +u32 host1x_sync_readl(struct host1x *host1x, u32 r); + +#endif diff --git a/drivers/gpu/host1x/hw/Makefile b/drivers/gpu/host1x/hw/Makefile new file mode 100644 index 0000000..9b50863 --- /dev/null +++ b/drivers/gpu/host1x/hw/Makefile @@ -0,0 +1,6 @@ +ccflags-y = -Idrivers/gpu/host1x + +host1x-hw-objs = \ + host1x01.o + +obj-$(CONFIG_TEGRA_HOST1X) += host1x-hw.o diff --git 
a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c new file mode 100644 index 0000000..ea6e604 --- /dev/null +++ b/drivers/gpu/host1x/hw/host1x01.c @@ -0,0 +1,35 @@ +/* + * Host1x init for T20 and T30 Architecture Chips + * + * Copyright (c) 2011-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/init.h> +#include <linux/clk.h> +#include <linux/of.h> +#include <linux/of_platform.h> + +#include "hw/host1x01.h" +#include "dev.h" +#include "hw/host1x01_hardware.h" + +#include "hw/syncpt_hw.c" + +int host1x01_init(struct host1x *host) +{ + host->syncpt_op = host1x_syncpt_ops; + + return 0; +} diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h new file mode 100644 index 0000000..6ec30051 --- /dev/null +++ b/drivers/gpu/host1x/hw/host1x01.h @@ -0,0 +1,25 @@ +/* + * Host1x init for T20 and T30 Architecture Chips + * + * Copyright (c) 2011-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ +#ifndef HOST1X_HOST1X01_H +#define HOST1X_HOST1X01_H + +struct host1x; + +int host1x01_init(struct host1x *); + +#endif /* HOST1X_HOST1X01_H_ */ diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h new file mode 100644 index 0000000..c1d5324 --- /dev/null +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h @@ -0,0 +1,26 @@ +/* + * Tegra host1x Register Offsets for Tegra20 and Tegra30 + * + * Copyright (c) 2010-2013 NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_HOST1X01_HARDWARE_H +#define __HOST1X_HOST1X01_HARDWARE_H + +#include <linux/types.h> +#include <linux/bitops.h> +#include "hw_host1x01_sync.h" + +#endif diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h new file mode 100644 index 0000000..b12c1a4 --- /dev/null +++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h @@ -0,0 +1,72 @@ +/* + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + * + */ + + /* + * Function naming determines intended use: + * + * <x>_r(void) : Returns the offset for register <x>. + * + * <x>_w(void) : Returns the word offset for word (4 byte) element <x>. + * + * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits. + * + * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted + * and masked to place it at field <y> of register <x>. This value + * can be |'d with others to produce a full register value for + * register <x>. + * + * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This + * value can be ~'d and then &'d to clear the value of field <y> for + * register <x>. + * + * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted + * to place it at field <y> of register <x>. This value can be |'d + * with others to produce a full register value for <x>. + * + * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register + * <x> value 'r' after being shifted to place its LSB at bit 0. + * This value is suitable for direct comparison with other unshifted + * values appropriate for use in field <y> of register <x>. + * + * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for + * field <y> of register <x>. This value is suitable for direct + * comparison with unshifted values appropriate for use in field <y> + * of register <x>. 
+ */ + +#ifndef __hw_host1x01_sync_h__ +#define __hw_host1x01_sync_h__ + +static inline u32 host1x_sync_syncpt_0_r(void) +{ + return 0x400; +} +#define HOST1X_SYNC_SYNCPT_0 \ + host1x_sync_syncpt_0_r() +static inline u32 host1x_sync_syncpt_base_0_r(void) +{ + return 0x600; +} +#define HOST1X_SYNC_SYNCPT_BASE_0 \ + host1x_sync_syncpt_base_0_r() +static inline u32 host1x_sync_syncpt_cpu_incr_r(void) +{ + return 0x700; +} +#define HOST1X_SYNC_SYNCPT_CPU_INCR \ + host1x_sync_syncpt_cpu_incr_r() +#endif /* __hw_host1x01_sync_h__ */ diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c new file mode 100644 index 0000000..16e3ada --- /dev/null +++ b/drivers/gpu/host1x/hw/syncpt_hw.c @@ -0,0 +1,146 @@ +/* + * Tegra host1x Syncpoints + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/io.h> +#include "syncpt.h" +#include "dev.h" + +/* + * Write the current syncpoint value back to hw. + */ +static void syncpt_reset(struct host1x_syncpt *sp) +{ + struct host1x *dev = sp->dev; + int min = host1x_syncpt_read_min(sp); + host1x_sync_writel(dev, min, HOST1X_SYNC_SYNCPT_0 + sp->id * 4); +} + +/* + * Write the current waitbase value back to hw. 
+ */ +static void syncpt_reset_wait_base(struct host1x_syncpt *sp) +{ + struct host1x *dev = sp->dev; + host1x_sync_writel(dev, sp->base_val, + HOST1X_SYNC_SYNCPT_BASE_0 + sp->id * 4); +} + +/* + * Read waitbase value from hw. + */ +static void syncpt_read_wait_base(struct host1x_syncpt *sp) +{ + struct host1x *dev = sp->dev; + sp->base_val = host1x_sync_readl(dev, + HOST1X_SYNC_SYNCPT_BASE_0 + sp->id * 4); +} + +/* + * Updates the last value read from hardware. + * (was host1x_syncpt_load_min) + */ +static u32 syncpt_load_min(struct host1x_syncpt *sp) +{ + struct host1x *dev = sp->dev; + u32 old, live; + + do { + old = host1x_syncpt_read_min(sp); + live = host1x_sync_readl(dev, + HOST1X_SYNC_SYNCPT_0 + sp->id * 4); + } while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old); + + if (!host1x_syncpt_check_max(sp, live)) + dev_err(&dev->dev->dev, + "%s failed: id=%u, min=%d, max=%d\n", + __func__, + sp->id, + host1x_syncpt_read_min(sp), + host1x_syncpt_read_max(sp)); + + return live; +} + +/* + * Write a cpu syncpoint increment to the hardware, without touching + * the cache. Caller is responsible for host being powered. + */ +static void syncpt_cpu_incr(struct host1x_syncpt *sp) +{ + struct host1x *dev = sp->dev; + u32 reg_offset = sp->id / 32; + + if (!host1x_syncpt_client_managed(sp) + && host1x_syncpt_min_eq_max(sp)) { + dev_err(&dev->dev->dev, + "Trying to increment syncpoint id %d beyond max\n", + sp->id); + return; + } + host1x_sync_writel(dev, BIT_MASK(sp->id), + HOST1X_SYNC_SYNCPT_CPU_INCR + reg_offset * 4); + wmb(); +} + +static const char *syncpt_name(struct host1x_syncpt *sp) +{ + struct host1x_device_info *info = &sp->dev->info; + const char *name = NULL; + + if (sp->id < info->nb_pts) + name = sp->name; + + return name ? 
name : ""; +} + +static void syncpt_debug(struct host1x_syncpt *sp) +{ + u32 i; + for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) { + u32 max = host1x_syncpt_read_max(sp); + u32 min = host1x_syncpt_load_min(sp); + if (!max && !min) + continue; + dev_info(&sp->dev->dev->dev, + "id %d (%s) min %d max %d\n", + i, sp->name, + min, max); + + } + + for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) { + u32 base_val; + host1x_syncpt_read_wait_base(sp); + base_val = sp->base_val; + if (base_val) + dev_info(&sp->dev->dev->dev, + "waitbase id %d val %d\n", + i, base_val); + + } +} + +static const struct host1x_syncpt_ops host1x_syncpt_ops = { + .reset = syncpt_reset, + .reset_wait_base = syncpt_reset_wait_base, + .read_wait_base = syncpt_read_wait_base, + .load_min = syncpt_load_min, + .cpu_incr = syncpt_cpu_incr, + .debug = syncpt_debug, + .name = syncpt_name, +}; diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c new file mode 100644 index 0000000..b45651f --- /dev/null +++ b/drivers/gpu/host1x/syncpt.c @@ -0,0 +1,217 @@ +/* + * Tegra host1x Syncpoints + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/stat.h> +#include <linux/module.h> +#include "syncpt.h" +#include "dev.h" +#include <trace/events/host1x.h> + +#define MAX_SYNCPT_LENGTH 5 + +static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host, + struct platform_device *pdev, + int client_managed); + +u32 host1x_syncpt_id(struct host1x_syncpt *sp) +{ + return sp->id; +} + +/* + * Updates the value sent to hardware. + */ +u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs) +{ + return (u32)atomic_add_return(incrs, &sp->max_val); +} + +/* + * Resets syncpoint and waitbase values to sw shadows + */ +void host1x_syncpt_reset(struct host1x *dev) +{ + struct host1x_syncpt *sp_base = dev->syncpt; + u32 i; + + for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) + dev->syncpt_op.reset(sp_base + i); + for (i = 0; i < host1x_syncpt_nb_bases(dev); i++) + dev->syncpt_op.reset_wait_base(sp_base + i); + wmb(); +} + +/* + * Updates sw shadow state for client managed registers + */ +void host1x_syncpt_save(struct host1x *dev) +{ + struct host1x_syncpt *sp_base = dev->syncpt; + u32 i; + + for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) { + if (host1x_syncpt_client_managed(sp_base + i)) + dev->syncpt_op.load_min(sp_base + i); + else + WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i)); + } + + for (i = 0; i < host1x_syncpt_nb_bases(dev); i++) + dev->syncpt_op.read_wait_base(sp_base + i); +} + +/* + * Updates the last value read from hardware. + */ +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp) +{ + u32 val; + val = sp->dev->syncpt_op.load_min(sp); + trace_host1x_syncpt_load_min(sp->id, val); + + return val; +} + +/* + * Get the current syncpoint base + */ +u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp) +{ + u32 val; + sp->dev->syncpt_op.read_wait_base(sp); + val = sp->base_val; + return val; +} + +/* + * Write a cpu syncpoint increment to the hardware, without touching + * the cache. 
Caller is responsible for host being powered.
+ */
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.cpu_incr(sp);
+}
+
+/*
+ * Increment syncpoint value from cpu, updating cache
+ */
+void host1x_syncpt_incr(struct host1x_syncpt *sp)
+{
+	if (host1x_syncpt_client_managed(sp))
+		host1x_syncpt_incr_max(sp, 1);
+	host1x_syncpt_cpu_incr(sp);
+}
+
+void host1x_syncpt_debug(struct host1x_syncpt *sp)
+{
+	sp->dev->syncpt_op.debug(sp);
+}
+
+int host1x_syncpt_init(struct host1x *host)
+{
+	struct host1x_syncpt *syncpt, *sp;
+	int i;
+
+	syncpt = sp = devm_kzalloc(&host->dev->dev,
+			sizeof(struct host1x_syncpt) * host->info.nb_pts,
+			GFP_KERNEL);
+	if (!syncpt)
+		return -ENOMEM;
+
+	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
+		sp->id = i;
+		sp->dev = host;
+	}
+
+	host->syncpt = syncpt;
+
+	return 0;
+}
+
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	char *name;
+
+	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
+		;
+	if (sp->pdev)
+		return NULL;
+
+	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
+			pdev ? dev_name(&pdev->dev) : NULL);
+	if (!name)
+		return NULL;
+
+	sp->pdev = pdev;
+	sp->name = name;
+	sp->client_managed = client_managed;
+
+	return sp;
+}
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed)
+{
+	struct host1x *host = host1x_get_host(pdev);
+	return _host1x_syncpt_alloc(host, pdev, client_managed);
+}
+
+void host1x_syncpt_free(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kfree(sp->name);
+	sp->pdev = NULL;
+	sp->name = NULL;
+	sp->client_managed = 0;
+}
+
+void host1x_syncpt_deinit(struct host1x *host)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	for (i = 0; i < host->info.nb_pts; i++, sp++)
+		kfree(sp->name);
+}
+
+int host1x_syncpt_nb_pts(struct host1x *dev)
+{
+	return dev->info.nb_pts;
+}
+
+int host1x_syncpt_nb_bases(struct host1x *dev)
+{
+	return dev->info.nb_bases;
+}
+
+int host1x_syncpt_nb_mlocks(struct host1x *dev)
+{
+	return dev->info.nb_mlocks;
+}
+
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
+{
+	return dev->syncpt + id;
+}
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
new file mode 100644
index 0000000..d9b9b0a
--- /dev/null
+++ b/drivers/gpu/host1x/syncpt.h
@@ -0,0 +1,153 @@
+/*
+ * Tegra host1x Syncpoints
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef __HOST1X_SYNCPT_H
+#define __HOST1X_SYNCPT_H
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/atomic.h>
+
+struct host1x;
+
+#define NVSYNCPT_INVALID (-1)
+
+struct host1x_syncpt {
+	int id;
+	atomic_t min_val;
+	atomic_t max_val;
+	u32 base_val;
+	const char *name;
+	int client_managed;
+	struct host1x *dev;
+	struct platform_device *pdev;
+};
+
+/* Initialize sync point array */
+int host1x_syncpt_init(struct host1x *);
+
+/* Free sync point array */
+void host1x_syncpt_deinit(struct host1x *);
+
+/*
+ * Read max. It indicates how many operations there are in queue, either in
+ * channel or in a software thread.
+ * */
+static inline u32 host1x_syncpt_read_max(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->max_val);
+}
+
+/*
+ * Read min, which is a shadow of the current sync point value in hardware.
+ */
+static inline u32 host1x_syncpt_read_min(struct host1x_syncpt *sp)
+{
+	smp_rmb();
+	return (u32)atomic_read(&sp->min_val);
+}
+
+/* Return number of sync point supported. */
+int host1x_syncpt_nb_pts(struct host1x *dev);
+
+/* Return number of wait bases supported. */
+int host1x_syncpt_nb_bases(struct host1x *dev);
+
+/* Return number of mlocks supported. */
+int host1x_syncpt_nb_mlocks(struct host1x *dev);
+
+/*
+ * Check sync point sanity. If max is larger than min, there have too many
+ * sync point increments.
+ *
+ * Client managed sync point are not tracked.
+ * */
+static inline bool host1x_syncpt_check_max(struct host1x_syncpt *sp, u32 real)
+{
+	u32 max;
+	if (sp->client_managed)
+		return true;
+	max = host1x_syncpt_read_max(sp);
+	return (s32)(max - real) >= 0;
+}
+
+/* Return true if sync point is client managed. */
+static inline int host1x_syncpt_client_managed(struct host1x_syncpt *sp)
+{
+	return sp->client_managed;
+}
+
+/*
+ * Returns true if syncpoint min == max, which means that there are no
+ * outstanding operations.
+ */
+static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
+{
+	int min, max;
+	smp_rmb();
+	min = atomic_read(&sp->min_val);
+	max = atomic_read(&sp->max_val);
+	return (min == max);
+}
+
+/* Return pointer to struct denoting sync point id. */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
+
+/* Request incrementing a sync point. */
+void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
+
+/* Load current value from hardware to the shadow register. */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp);
+
+/* Save host1x sync point state into shadow registers. */
+void host1x_syncpt_save(struct host1x *dev);
+
+/* Reset host1x sync point state from shadow registers. */
+void host1x_syncpt_reset(struct host1x *dev);
+
+/* Read current wait base value into shadow register and return it. */
+u32 host1x_syncpt_read_wait_base(struct host1x_syncpt *sp);
+
+/* Increment sync point and its max. */
+void host1x_syncpt_incr(struct host1x_syncpt *sp);
+
+/* Indicate future operations by incrementing the sync point max. */
+u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs);
+
+/* Do a debug dump of sync point values. */
+void host1x_syncpt_debug(struct host1x_syncpt *sp);
+
+/* Check if sync point id is valid. */
+static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
+{
+	return sp->id != NVSYNCPT_INVALID &&
+		sp->id < host1x_syncpt_nb_pts(sp->dev);
+}
+
+/* Return id of the sync point */
+u32 host1x_syncpt_id(struct host1x_syncpt *sp);
+
+/* Allocate a sync point for a device. */
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed);
+
+/* Free a sync point. */
+void host1x_syncpt_free(struct host1x_syncpt *sp);
+
+#endif
diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig
index e7068c5..776ddba 100644
--- a/drivers/video/Kconfig
+++ b/drivers/video/Kconfig
@@ -21,6 +21,8 @@ source "drivers/gpu/vga/Kconfig"

 source "drivers/gpu/drm/Kconfig"

+source "drivers/gpu/host1x/Kconfig"
+
 source "drivers/gpu/stub/Kconfig"

 config VGASTATE
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
new file mode 100644
index 0000000..3c14cac
--- /dev/null
+++ b/include/trace/events/host1x.h
@@ -0,0 +1,61 @@
+/*
+ * include/trace/events/host1x.h
+ *
+ * Nvhost event logging to ftrace.
+ *
+ * Copyright (c) 2010-2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM host1x
+
+#if !defined(_TRACE_HOST1X_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_HOST1X_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(host1x,
+	TP_PROTO(const char *name),
+	TP_ARGS(name),
+	TP_STRUCT__entry(__field(const char *, name)),
+	TP_fast_assign(__entry->name = name;),
+	TP_printk("name=%s", __entry->name)
+);
+
+TRACE_EVENT(host1x_syncpt_load_min,
+	TP_PROTO(u32 id, u32 val),
+
+	TP_ARGS(id, val),
+
+	TP_STRUCT__entry(
+		__field(u32, id)
+		__field(u32, val)
+	),
+
+	TP_fast_assign(
+		__entry->id = id;
+		__entry->val = val;
+	),
+
+	TP_printk("id=%d, val=%d", __entry->id, __entry->val)
+);
+
+#endif /* _TRACE_HOST1X_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
Add host1x, the driver for host1x and its client unit 2D.
Maybe this could be a bit more verbose. Perhaps describe what host1x is.
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
[...]
@@ -0,0 +1,6 @@
+config TEGRA_HOST1X
+	tristate "Tegra host1x driver"
+	help
+	  Driver for the Tegra host1x hardware.
Maybe s/Tegra/NVIDIA Tegra/?
+	  Required for enabling tegradrm.
This should probably be dropped. Either encode such knowledge as explicit dependencies or in this case just remove it altogether since we will probably merge both drivers anyway.
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
[...]
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/of_device.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+#include "dev.h"
Maybe add a blank line between the previous two lines to visually separate standard Linux includes from driver-specific ones.
+#include "hw/host1x01.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/host1x.h>
+#define DRIVER_NAME "tegra-host1x"
You only ever use this once, so maybe it can just be dropped?
+static struct host1x_device_info host1x_info = {
Perhaps this should be host1x01_info in order to match the hardware revision? That'll avoid it having to be renamed later on when other revisions start to appear.
+static int host1x_probe(struct platform_device *dev)
+{
[...]
+	syncpt_irq = platform_get_irq(dev, 0);
+	if (IS_ERR_VALUE(syncpt_irq)) {
This is somewhat unusual. It should be fine to just do:
if (syncpt_irq < 0)
but IS_ERR_VALUE() should work fine too.
+	memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
Why not make the .info field a pointer to struct host1x_device_info instead? That way you don't have to keep separate copies of the same information.
+	/* set common host1x device data */
+	platform_set_drvdata(dev, host);
+	host->regs = devm_request_and_ioremap(&dev->dev, regs);
+	if (!host->regs) {
+		dev_err(&dev->dev, "failed to remap host registers\n");
+		return -ENXIO;
+	}
This should probably be rewritten as:
	host->regs = devm_ioremap_resource(&dev->dev, regs);
	if (IS_ERR(host->regs))
		return PTR_ERR(host->regs);
Though that function will only be available in 3.9-rc1.
+	err = host1x_syncpt_init(host);
+	if (err)
+		return err;
[...]
+	host1x_syncpt_reset(host);
Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why it might be useful to have host1x_syncpt_reset() as a separate function but couldn't it be called as part of host1x_syncpt_init()?
+	dev_info(&dev->dev, "initialized\n");
I don't think this is very useful. We should make sure to tell people when things fail. When everything goes as planned we don't need to brag about it =)
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
+struct host1x_syncpt_ops {
[...]
+	const char * (*name)(struct host1x_syncpt *);
Why do we need this? Could we not refer to the syncpt name directly instead of going through this wrapper? I'd expect the name to be static.
+struct host1x_device_info {
Maybe this should be called simply host1x_info? _device seems redundant.
+	int nb_channels; /* host1x: num channels supported */
+	int nb_pts; /* host1x: num syncpoints supported */
+	int nb_bases; /* host1x: num syncpoints supported */
+	int nb_mlocks; /* host1x: number of mlocks */
+	int (*init)(struct host1x *); /* initialize per SoC ops */
+	int sync_offset;
+};
While this isn't public API, maybe it would still be useful to turn the comments into proper kerneldoc? That's what people are used to.
+struct host1x {
+	void __iomem *regs;
+	struct host1x_syncpt *syncpt;
+	struct platform_device *dev;
+	struct host1x_device_info info;
+	struct clk *clk;
+	struct host1x_syncpt_ops syncpt_op;
Maybe make this a struct host1x_syncpt_ops * instead so you don't have separate copies? While at it, maybe this should be const as well.
+	struct dentry *debugfs;
This doesn't seem to be used anywhere.
+static inline
+struct host1x *host1x_get_host(struct platform_device *_dev)
+{
+	struct platform_device *pdev;
+	if (_dev->dev.parent) {
+		pdev = to_platform_device(_dev->dev.parent);
+		return platform_get_drvdata(pdev);
+	} else
+		return platform_get_drvdata(_dev);
+}
There is a lot of needless casting in here. Why not pass in a struct device * and use dev_get_drvdata() instead?
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
+#include "hw/host1x01.h"
+#include "dev.h"
+#include "hw/host1x01_hardware.h"
The ordering here looks funny.
+#include "hw/syncpt_hw.c"
Why include the source file here? Can't you compile it separately instead?
diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
[...]
+int host1x01_init(struct host1x *);
For completeness you should probably name the parameter, even if this is a prototype.
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
[...]
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include "hw_host1x01_sync.h"
Again, a blank line might help between the above two. I also assume that this file will be filled with more content later on, so I guess it's not worth the trouble to postpone its creation until a later point.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
[...]
+static inline u32 host1x_sync_syncpt_0_r(void)
+{
+	return 0x400;
+}
+#define HOST1X_SYNC_SYNCPT_0 \
+	host1x_sync_syncpt_0_r()
+static inline u32 host1x_sync_syncpt_base_0_r(void)
+{
+	return 0x600;
+}
+#define HOST1X_SYNC_SYNCPT_BASE_0 \
+	host1x_sync_syncpt_base_0_r()
+static inline u32 host1x_sync_syncpt_cpu_incr_r(void)
+{
+	return 0x700;
+}
Perhaps it would be useful to modify these to take the syncpt ID as a parameter? That way you don't have to remember to do the multiplication every time you access the register.
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
[...]
+/*
+ * Updates the last value read from hardware.
+ * (was host1x_syncpt_load_min)
Can the comment in () not be dropped? Given that this is new code nobody would know about the old name.
+ */
+static u32 syncpt_load_min(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 old, live;
+	do {
+		old = host1x_syncpt_read_min(sp);
+		live = host1x_sync_readl(dev,
+				HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
+	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
I think this warrants a comment.
+	if (!host1x_syncpt_check_max(sp, live))
+		dev_err(&dev->dev->dev,
+			"%s failed: id=%u, min=%d, max=%d\n",
+			__func__,
+			sp->id,
+			host1x_syncpt_read_min(sp),
+			host1x_syncpt_read_max(sp));
You could probably make this fit into fewer lines.
+/*
+ * Write a cpu syncpoint increment to the hardware, without touching
+ * the cache. Caller is responsible for host being powered.
+ */
The last part of this comment applies to every host1x function, right? So maybe it should just be dropped.
+static void syncpt_debug(struct host1x_syncpt *sp)
+{
+	u32 i;
+	for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
+		u32 max = host1x_syncpt_read_max(sp);
+		u32 min = host1x_syncpt_load_min(sp);
+		if (!max && !min)
+			continue;
+		dev_info(&sp->dev->dev->dev,
+			"id %d (%s) min %d max %d\n",
+			i, sp->name,
+			min, max);
+	}
There's a gratuitous blank line above.
+	for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
+		u32 base_val;
+		host1x_syncpt_read_wait_base(sp);
+		base_val = sp->base_val;
+		if (base_val)
+			dev_info(&sp->dev->dev->dev,
+				"waitbase id %d val %d\n",
+				i, base_val);
+	}
And another one.
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
[...]
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
I don't think this is needed.
+#include <linux/module.h>
+#include "syncpt.h"
+#include "dev.h"
+#include <trace/events/host1x.h>
Again, some more spacing would be nice here. And the ordering is a bit weird. Maybe put the trace include above syncpt.h and dev.h?
+#define MAX_SYNCPT_LENGTH 5
This doesn't seem to be used anywhere.
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed);
Can't you move the actual implementation here? Also I'm not sure if passing the platform_device is the best choice here. struct device should work just as well.
+/*
+ * Resets syncpoint and waitbase values to sw shadows
+ */
+void host1x_syncpt_reset(struct host1x *dev)
Maybe host1x_syncpt_flush() would be a better name given the above description? Reset does have this hardware reset connotation so my first intuition had been that this would reset the syncpt value to 0.
If you decide to change the name, make sure to change it in the syncpt ops as well.
+/*
+ * Updates sw shadow state for client managed registers
+ */
+void host1x_syncpt_save(struct host1x *dev)
+{
+	struct host1x_syncpt *sp_base = dev->syncpt;
+	u32 i;
+	for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
+		if (host1x_syncpt_client_managed(sp_base + i))
+			dev->syncpt_op.load_min(sp_base + i);
+		else
+			WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
+	}
+	for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
+		dev->syncpt_op.read_wait_base(sp_base + i);
+}
A similar comment applies here. Though I'm not so sure about a better name. Perhaps host1x_syncpt_sync()?
I know that this must sound all pretty straightforward to you, but for somebody who hasn't used these functions at all the names are quite confusing. So instead of making people go read the documentation, I tend to think that making the names as descriptive as possible is essential here.
+/*
+ * Updates the last value read from hardware.
+ */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	trace_host1x_syncpt_load_min(sp->id, val);
+	return val;
+}
I don't know that I understand what this means exactly. Does it read the value that hardware last incremented? Perhaps this will become clearer when you add a comment to the syncpt_load_min() implementation.
+int host1x_syncpt_init(struct host1x *host)
+{
+	struct host1x_syncpt *syncpt, *sp;
+	int i;
+	syncpt = sp = devm_kzalloc(&host->dev->dev,
+			sizeof(struct host1x_syncpt) * host->info.nb_pts,
You can make this a bit shorter by using sizeof(*sp) instead.
+	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
+		sp->id = i;
+		sp->dev = host;
Perhaps:
	syncpt[i].id = i;
	syncpt[i].dev = host;
To avoid the need to explicitly keep track of sp?
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	char *name;
+	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
+		;
+	if (sp->pdev)
+		return NULL;
+	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
+			pdev ? dev_name(&pdev->dev) : NULL);
+	if (!name)
+		return NULL;
+	sp->pdev = pdev;
+	sp->name = name;
+	sp->client_managed = client_managed;
+	return sp;
+}
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed)
+{
+	struct host1x *host = host1x_get_host(pdev);
+	return _host1x_syncpt_alloc(host, pdev, client_managed);
+}
I think it's enough to keep track of the struct device here instead of the struct platform_device.
Also the syncpoint is not actually allocated here, so maybe host1x_syncpt_request() would be a better name. As a nice side-effect it makes the naming more similar to the IRQ API and might be easier to work with.
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
+{
+	return dev->syncpt + id;
+}
Should this perhaps do some error checking? What if the specified syncpt hasn't actually been requested before?
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
[...]
+struct host1x_syncpt {
+	int id;
+	atomic_t min_val;
+	atomic_t max_val;
+	u32 base_val;
+	const char *name;
+	int client_managed;
Is this field actually ever used? Looking through the patches none of the clients actually set this.
+/*
+ * Returns true if syncpoint min == max, which means that there are no
+ * outstanding operations.
+ */
+static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
+{
+	int min, max;
+	smp_rmb();
+	min = atomic_read(&sp->min_val);
+	max = atomic_read(&sp->max_val);
+	return (min == max);
+}
Maybe call this host1x_syncpt_idle() or something similar instead?
+{
+	return sp->id != NVSYNCPT_INVALID &&
+		sp->id < host1x_syncpt_nb_pts(sp->dev);
+}
Is there really a need for NVSYNCPT_INVALID? Even if you want to keep the special case you can drop the explicit check because -1 will be larger than host1x_syncpt_nb_pts() anyway.
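One caveat when dropping the explicit check: -1 only compares as larger than the sync point count if the comparison is unsigned. In the quoted header, id is an int and host1x_syncpt_nb_pts() returns int, so the comparison there is signed. A minimal userspace sketch of the difference (the helper names are made up for illustration and are not part of the patch):

```c
#include <stdbool.h>

#define NVSYNCPT_INVALID (-1)

/* Signed comparison, as with the "int id" field in the quoted struct. */
static bool id_valid_signed(int id, int nb_pts)
{
	return id < nb_pts;
}

/* Unsigned comparison: -1 converts to UINT_MAX and is rejected. */
static bool id_valid_unsigned(unsigned int id, unsigned int nb_pts)
{
	return id < nb_pts;
}
```

With the signed variant, NVSYNCPT_INVALID passes the range test, so dropping the explicit check would presumably go together with making the id unsigned.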
Thierry
On 04.02.2013 01:09, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
Add host1x, the driver for host1x and its client unit 2D.
Maybe this could be a bit more verbose. Perhaps describe what host1x is.
Sure. I could just steal the paragraph from Stephen:
The Tegra host1x module is the DMA engine for register access to Tegra's graphics- and multimedia-related modules. The modules served by host1x are referred to as clients. host1x includes some other functionality, such as synchronization.
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
[...]
@@ -0,0 +1,6 @@ +config TEGRA_HOST1X
tristate "Tegra host1x driver"
help
Driver for the Tegra host1x hardware.
Maybe s/Tegra/NVIDIA Tegra/?
Sounds good.
Required for enabling tegradrm.
This should probably be dropped. Either encode such knowledge as explicit dependencies or in this case just remove it altogether since we will probably merge both drivers anyway.
I think this was left from previous versions. Now it just doesn't make sense. I'll just drop it.
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
[...]
+#include <linux/module.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/of.h> +#include <linux/of_device.h> +#include <linux/clk.h> +#include <linux/io.h> +#include "dev.h"
Maybe add a blank line between the previous two lines to visually separate standard Linux includes from driver-specific ones.
Ok. You commented in quite a few places in a similar way. I'll fix all of them to first include system includes, then the driver's own includes, and add a blank line in between.
+#include "hw/host1x01.h"
+#define CREATE_TRACE_POINTS +#include <trace/events/host1x.h>
+#define DRIVER_NAME "tegra-host1x"
You only ever use this once, so maybe it can just be dropped?
Yes.
+static struct host1x_device_info host1x_info = {
Perhaps this should be host1x01_info in order to match the hardware revision? That'll avoid it having to be renamed later on when other revisions start to appear.
Ok, will do. I thought it'd be awkward being alone until the second version appears, but I'll add it.
+static int host1x_probe(struct platform_device *dev) +{
[...]
syncpt_irq = platform_get_irq(dev, 0);
if (IS_ERR_VALUE(syncpt_irq)) {
This is somewhat unusual. It should be fine to just do:
if (syncpt_irq < 0)
but IS_ERR_VALUE() should work fine too.
I'll use the simpler version.
memcpy(&host->info, devid->data, sizeof(struct host1x_device_info));
Why not make the .info field a pointer to struct host1x_device_info instead? That way you don't have to keep separate copies of the same information.
This had something to do with __init data and non-init data. But, we're not really putting this data into __init, so we should be able to use just a pointer.
/* set common host1x device data */
platform_set_drvdata(dev, host);
host->regs = devm_request_and_ioremap(&dev->dev, regs);
if (!host->regs) {
dev_err(&dev->dev, "failed to remap host registers\n");
return -ENXIO;
}
This should probably be rewritten as:
	host->regs = devm_ioremap_resource(&dev->dev, regs);
	if (IS_ERR(host->regs))
		return PTR_ERR(host->regs);
Though that function will only be available in 3.9-rc1.
Ok, 3.9-rc1 is fine as a basis.
err = host1x_syncpt_init(host);
if (err)
return err;
[...]
host1x_syncpt_reset(host);
Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why it might be useful to have host1x_syncpt_reset() as a separate function but couldn't it be called as part of host1x_syncpt_init()?
host1x_syncpt_init() is used for initializing the syncpt structures, and is called in probe. host1x_syncpt_reset() should be called whenever we think hardware state is lost, for example if VDD_CORE was rail gated due to system suspend.
dev_info(&dev->dev, "initialized\n");
I don't think this is very useful. We should make sure to tell people when things fail. When everything goes as planned we don't need to brag about it =)
True. I wish other kernel drivers followed that same philosophy. :-) I'll remove that. It's mainly useful as a debugging aid, but it's just as easy to check the state from sysfs.
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
+struct host1x_syncpt_ops {
[...]
const char * (*name)(struct host1x_syncpt *);
Why do we need this? Could we not refer to the syncpt name directly instead of going through this wrapper? I'd expect the name to be static.
This must be a relic. I'll remove the wrapper.
+struct host1x_device_info {
Maybe this should be called simply host1x_info? _device seems redundant.
Sure.
int nb_channels; /* host1x: num channels supported */
int nb_pts; /* host1x: num syncpoints supported */
int nb_bases; /* host1x: num syncpoints supported */
int nb_mlocks; /* host1x: number of mlocks */
int (*init)(struct host1x *); /* initialize per SoC ops */
int sync_offset;
+};
While this isn't public API, maybe it would still be useful to turn the comments into proper kerneldoc? That's what people are used to.
Ok.
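For reference, a kerneldoc version of the structure could look roughly like the sketch below. It uses the shorter host1x_info name suggested earlier in the review. The field descriptions are taken from the quoted comments, with nb_bases described as wait bases to match the syncpt.h comment; the description of sync_offset is an assumption, since the patch does not document it.

```c
#include <stddef.h>

struct host1x;

/**
 * struct host1x_info - per-SoC host1x parameters
 * @nb_channels: number of channels supported
 * @nb_pts: number of sync points supported
 * @nb_bases: number of wait bases supported
 * @nb_mlocks: number of mlocks supported
 * @init: initialize per-SoC ops
 * @sync_offset: offset of the sync registers (assumed meaning)
 */
struct host1x_info {
	int nb_channels;
	int nb_pts;
	int nb_bases;
	int nb_mlocks;
	int (*init)(struct host1x *host);
	int sync_offset;
};
```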
+struct host1x {
void __iomem *regs;
struct host1x_syncpt *syncpt;
struct platform_device *dev;
struct host1x_device_info info;
struct clk *clk;
struct host1x_syncpt_ops syncpt_op;
Maybe make this a struct host1x_syncpt_ops * instead so you don't have separate copies? While at it, maybe this should be const as well.
Sounds good. I guess there are other areas in need of a const, too.
struct dentry *debugfs;
This doesn't seem to be used anywhere.
It's a failed split - it's used in the debug patch (4/8).
+static inline +struct host1x *host1x_get_host(struct platform_device *_dev) +{
struct platform_device *pdev;
if (_dev->dev.parent) {
pdev = to_platform_device(_dev->dev.parent);
return platform_get_drvdata(pdev);
} else
return platform_get_drvdata(_dev);
+}
There is a lot of needless casting in here. Why not pass in a struct device * and use dev_get_drvdata() instead?
Hmm, true, this should fit into smaller space.
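A sketch of the smaller version, assuming the helper takes a struct device * as suggested. The struct device and dev_get_drvdata() below are minimal userspace stand-ins so the fragment compiles on its own; in the driver the real kernel definitions would be used.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel types, just enough for this sketch. */
struct device {
	struct device *parent;
	void *driver_data;
};

static inline void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

struct host1x;

/* Same logic as the quoted helper: use the parent's data if there is one. */
static inline struct host1x *host1x_get_host(struct device *dev)
{
	return dev_get_drvdata(dev->parent ? dev->parent : dev);
}
```

This removes the to_platform_device() round trip, since the driver data lives in struct device either way.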
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
+#include "hw/host1x01.h" +#include "dev.h" +#include "hw/host1x01_hardware.h"
The ordering here looks funny.
I'll make it more alphabetic.
+#include "hw/syncpt_hw.c"
Why include the source file here? Can't you compile it separately instead?
It's because we need to compile with the hardware headers of that host1x version, because we haven't been good at keeping compatibility. So host1x01.c #includes version 01 headers, and syncpt_hw.c in this compilation unit gets compiled with that. 02 would include 02 headers, and syncpt_hw.c would get compiled with its register definitions etc.
diff --git a/drivers/gpu/host1x/hw/host1x01.h b/drivers/gpu/host1x/hw/host1x01.h
[...]
+int host1x01_init(struct host1x *);
For completeness you should probably name the parameter, even if this is a prototype.
Ok.
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
[...]
+#include <linux/types.h> +#include <linux/bitops.h> +#include "hw_host1x01_sync.h"
Again, a blank line might help between the above two. I also assume that this file will be filled with more content later on, so I guess it's not worth the trouble to postpone its creation until a later point.
Yeah, most of the content gets added by the dreaded patch 3.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
[...]
+static inline u32 host1x_sync_syncpt_0_r(void) +{
return 0x400;
+} +#define HOST1X_SYNC_SYNCPT_0 \
host1x_sync_syncpt_0_r()
+static inline u32 host1x_sync_syncpt_base_0_r(void) +{
return 0x600;
+} +#define HOST1X_SYNC_SYNCPT_BASE_0 \
host1x_sync_syncpt_base_0_r()
+static inline u32 host1x_sync_syncpt_cpu_incr_r(void) +{
return 0x700;
+}
Perhaps it would be useful to modify these to take the syncpt ID as a parameter? That way you don't have to remember to do the multiplication every time you access the register.
Yeah, sounds good.
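As an illustration of the parameterized form, the accessors could look something like this. The function names are hypothetical. The 0x400 and 0x600 base offsets come from the quoted header, and the 4-byte stride from the "sp->id * 4" arithmetic in syncpt_load_min(); applying the same stride to the wait base registers is an assumption.

```c
#include <stdint.h>

/* Offset of the value register for sync point "id". */
static inline uint32_t host1x_sync_syncpt_r(unsigned int id)
{
	return 0x400 + id * 4;
}

/* Offset of wait base register "id" (stride assumed to be 4 bytes). */
static inline uint32_t host1x_sync_syncpt_base_r(unsigned int id)
{
	return 0x600 + id * 4;
}
```

A caller would then write host1x_sync_readl(dev, host1x_sync_syncpt_r(sp->id)) instead of repeating the multiplication at each call site.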
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
[...]
+/*
- Updates the last value read from hardware.
- (was host1x_syncpt_load_min)
Can the comment in () not be dropped? Given that this is new code nobody would know about the old name.
Yes, it should be dropped.
- */
+static u32 syncpt_load_min(struct host1x_syncpt *sp) +{
struct host1x *dev = sp->dev;
u32 old, live;
do {
old = host1x_syncpt_read_min(sp);
live = host1x_sync_readl(dev,
HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
I think this warrants a comment.
Sure. It just loops in case there's a race writing to min_val.
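To make the retry explicit, here is a userspace model of the same loop using C11 atomics instead of the kernel's atomic_cmpxchg(). The struct and function names are invented for the sketch, and hw_value stands in for the register read done by host1x_sync_readl().

```c
#include <stdatomic.h>
#include <stdint.h>

struct syncpt_model {
	_Atomic uint32_t min_val;	/* shadow of the hardware value */
	uint32_t hw_value;		/* stand-in for the hardware register */
};

static uint32_t model_load_min(struct syncpt_model *sp)
{
	uint32_t old, live;

	do {
		old = atomic_load(&sp->min_val);
		live = sp->hw_value;
		/*
		 * Store live only if min_val still equals old. If another
		 * thread updated min_val in between, read both again.
		 */
	} while (!atomic_compare_exchange_strong(&sp->min_val, &old, live));

	return live;
}
```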
if (!host1x_syncpt_check_max(sp, live))
dev_err(&dev->dev->dev,
"%s failed: id=%u, min=%d, max=%d\n",
__func__,
sp->id,
host1x_syncpt_read_min(sp),
host1x_syncpt_read_max(sp));
You could probably make this fit into fewer lines.
Yes, definitely. Will do.
+/*
- Write a cpu syncpoint increment to the hardware, without touching
- the cache. Caller is responsible for host being powered.
- */
The last part of this comment applies to every host1x function, right? So maybe it should just be dropped.
Yeah, we don't really have runtime PM, so host1x is always turned on anyway. In downstream, with dynamic power management, some functions require the caller to ensure power is on, and some functions turn on power themselves.
I'll remove these comments, as they do not apply until we have runtime PM.
+static void syncpt_debug(struct host1x_syncpt *sp) +{
u32 i;
for (i = 0; i < host1x_syncpt_nb_pts(sp->dev); i++) {
u32 max = host1x_syncpt_read_max(sp);
u32 min = host1x_syncpt_load_min(sp);
if (!max && !min)
continue;
dev_info(&sp->dev->dev->dev,
"id %d (%s) min %d max %d\n",
i, sp->name,
min, max);
}
There's a gratuitous blank line above.
Will remove.
for (i = 0; i < host1x_syncpt_nb_bases(sp->dev); i++) {
u32 base_val;
host1x_syncpt_read_wait_base(sp);
base_val = sp->base_val;
if (base_val)
dev_info(&sp->dev->dev->dev,
"waitbase id %d val %d\n",
i, base_val);
}
And another one.
Consider it gone.
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
[...]
+#include <linux/platform_device.h> +#include <linux/slab.h> +#include <linux/stat.h>
I don't think this is needed.
Yup, gone.
+#include <linux/module.h> +#include "syncpt.h" +#include "dev.h" +#include <trace/events/host1x.h>
Again, some more spacing would be nice here. And the ordering is a bit weird. Maybe put the trace include above syncpt.h and dev.h?
Will do.
+#define MAX_SYNCPT_LENGTH 5
This doesn't seem to be used anywhere.
Yeah, it was an old restriction for length of syncpt name, but as we moved to dynamic allocation, it doesn't apply.
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
struct platform_device *pdev,
int client_managed);
Can't you move the actual implementation here? Also I'm not sure if passing the platform_device is the best choice here. struct device should work just as well.
True, and sp->pdev needs to become struct device *, too.
+/*
- Resets syncpoint and waitbase values to sw shadows
- */
+void host1x_syncpt_reset(struct host1x *dev)
Maybe host1x_syncpt_flush() would be a better name given the above description? Reset does have this hardware reset connotation so my first intuition had been that this would reset the syncpt value to 0.
Right, it actually reloads values stored in shadow registers back to host1x. Flush doesn't feel like it's conveying the meaning. Would host1x_syncpt_restore() work? That'd match with host1x_syncpt_save(), which just updates all shadow registers from hardware and is used just before host1x loses power.
If you decide to change the name, make sure to change it in the syncpt ops as well.
Sure.
+/*
- Updates sw shadow state for client managed registers
- */
+void host1x_syncpt_save(struct host1x *dev) +{
struct host1x_syncpt *sp_base = dev->syncpt;
u32 i;
for (i = 0; i < host1x_syncpt_nb_pts(dev); i++) {
if (host1x_syncpt_client_managed(sp_base + i))
dev->syncpt_op.load_min(sp_base + i);
else
WARN_ON(!host1x_syncpt_min_eq_max(sp_base + i));
}
for (i = 0; i < host1x_syncpt_nb_bases(dev); i++)
dev->syncpt_op.read_wait_base(sp_base + i);
+}
A similar comment applies here. Though I'm not so sure about a better name. Perhaps host1x_syncpt_sync()?
I know that this must sound all pretty straightforward to you, but for somebody who hasn't used these functions at all the names are quite confusing. So instead of making people go read the documentation, I tend to think that making the names as descriptive as possible is essential here.
I definitely agree that naming should be descriptive. This is used when saving host1x state before it loses power, so that's why it's called host1x_syncpt_save().
But I'm open to changing the naming, if something else would feel more descriptive.
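The save/restore pairing described above can be modelled in a few lines. This is a simplified userspace sketch, not the driver code: it ignores the client-managed distinction and wait bases, and the names and the NB_PTS size are made up for the example.

```c
#include <stdint.h>

#define NB_PTS 4	/* arbitrary size for the sketch */

struct host_model {
	uint32_t hw[NB_PTS];		/* stand-in for hardware registers */
	uint32_t shadow[NB_PTS];	/* software copies of the values */
};

/* Before power is lost: copy hardware values into the shadows. */
static void model_syncpt_save(struct host_model *h)
{
	for (int i = 0; i < NB_PTS; i++)
		h->shadow[i] = h->hw[i];
}

/* After power returns: write the shadows back to hardware. */
static void model_syncpt_restore(struct host_model *h)
{
	for (int i = 0; i < NB_PTS; i++)
		h->hw[i] = h->shadow[i];
}
```

Seen this way, the two functions are inverses around a power loss, which is the argument for naming them as a save/restore pair.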
+/*
- Updates the last value read from hardware.
- */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp) +{
u32 val;
val = sp->dev->syncpt_op.load_min(sp);
trace_host1x_syncpt_load_min(sp->id, val);
return val;
+}
I don't know that I understand what this means exactly. Does it read the value that hardware last incremented? Perhaps this will become clearer when you add a comment to the syncpt_load_min() implementation.
It just loads the current syncpt value into the shadow register. The shadow register is called min because host1x tracks the range of sync point increments that hardware is still going to do, so min is the lower boundary of that range.
max is the value the sync point is expected to reach before hardware is considered idle.
host1x will, for example, nop out waits for sync point values outside the range, because hardware isn't good at handling syncpt value wrapping.
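To make the min/max range concrete, here is a minimal user-space sketch of a wrap-safe range check. The struct and the helper name syncpt_wait_is_pending() are hypothetical and not part of the patch; the sketch only assumes that a wait is worth keeping when its threshold lies in the window (min, max] computed modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shadow state of one sync point. */
struct syncpt_shadow {
	uint32_t min;	/* last value read back from hardware */
	uint32_t max;	/* value hardware reaches once queued work is done */
};

/*
 * Returns true if a wait for 'thresh' is still meaningful, i.e. the
 * threshold lies in (min, max]. Unsigned subtraction keeps the test
 * correct when the counter wraps around 2^32.
 */
static bool syncpt_wait_is_pending(const struct syncpt_shadow *sp,
				   uint32_t thresh)
{
	uint32_t span, dist;

	assert(sp != NULL);

	span = sp->max - sp->min;	/* size of the outstanding window */
	dist = thresh - sp->min;	/* how far thresh is ahead of min */

	return dist != 0 && dist <= span;
}
```

A wait for which this returns false has either been reached already or can never be reached by the queued work, so it is a candidate for being nopped out.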
+int host1x_syncpt_init(struct host1x *host)
+{
+	struct host1x_syncpt *syncpt, *sp;
+	int i;
+	syncpt = sp = devm_kzalloc(&host->dev->dev,
+			sizeof(struct host1x_syncpt) * host->info.nb_pts,
You can make this a bit shorter by using sizeof(*sp) instead.
Will do.
+	for (i = 0; i < host->info.nb_pts; ++i, ++sp) {
+		sp->id = i;
+		sp->dev = host;
Perhaps:
	syncpt[i].id = i;
	syncpt[i].dev = host;
To avoid the need to explicitly keep track of sp?
Sounds good. I usually prefer indexing, so I'm happy with this.
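Both suggestions together could look roughly like the sketch below. It is a user-space stand-in, not the driver code: calloc() replaces devm_kzalloc(), and the struct and the function name syncpt_array_init() are simplified placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified placeholder for struct host1x_syncpt. */
struct syncpt {
	int id;
	void *dev;
};

/* Allocate and initialise nb_pts sync points; returns NULL on failure. */
static struct syncpt *syncpt_array_init(void *host, size_t nb_pts)
{
	struct syncpt *syncpt;
	size_t i;

	assert(host != NULL);

	/* sizeof(*syncpt) follows the pointer's type automatically */
	syncpt = calloc(nb_pts, sizeof(*syncpt));
	if (!syncpt)
		return NULL;

	/* indexing removes the need for a second walking pointer */
	for (i = 0; i < nb_pts; i++) {
		syncpt[i].id = (int)i;
		syncpt[i].dev = host;
	}

	return syncpt;
}
```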
+static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host,
+		struct platform_device *pdev,
+		int client_managed)
+{
+	int i;
+	struct host1x_syncpt *sp = host->syncpt;
+	char *name;
+	for (i = 0; i < host->info.nb_pts && sp->name; i++, sp++)
+		;
+	if (sp->pdev)
+		return NULL;
+	name = kasprintf(GFP_KERNEL, "%02d-%s", sp->id,
+			pdev ? dev_name(&pdev->dev) : NULL);
+	if (!name)
+		return NULL;
+	sp->pdev = pdev;
+	sp->name = name;
+	sp->client_managed = client_managed;
+	return sp;
+}
+
+struct host1x_syncpt *host1x_syncpt_alloc(struct platform_device *pdev,
+		int client_managed)
+{
+	struct host1x *host = host1x_get_host(pdev);
+	return _host1x_syncpt_alloc(host, pdev, client_managed);
+}
I think it's enough to keep track of the struct device here instead of the struct platform_device.
Yes, I actually managed to comment the same thing earlier.
Also the syncpoint is not actually allocated here, so maybe host1x_syncpt_request() would be a better name. As a nice side-effect it makes the naming more similar to the IRQ API and might be easier to work with.
I'm not entirely sure about the difference, but isn't the number to be allocated usually passed to a function ending in _request? Allocate would just allocate the next available - as host1x_syncpt_allocate does.
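The distinction being discussed can be sketched with two hypothetical helpers over a fixed table. Neither name nor the NB_PTS constant comes from the patch; this only illustrates "take the next free one" versus "take this specific one".

```c
#include <assert.h>
#include <stddef.h>

#define NB_PTS 4	/* hypothetical; the driver uses host->info.nb_pts */

struct syncpt {
	int id;
	const char *name;	/* non-NULL once the sync point is taken */
};

/* Allocate-style: the caller gets whichever sync point is free next. */
static struct syncpt *syncpt_alloc_next(struct syncpt *pts, const char *name)
{
	size_t i;

	assert(pts != NULL && name != NULL);

	for (i = 0; i < NB_PTS; i++) {
		if (!pts[i].name) {
			pts[i].id = (int)i;
			pts[i].name = name;
			return &pts[i];
		}
	}

	return NULL;	/* every sync point is in use */
}

/* Request-style, as with IRQs: the caller names the resource it wants. */
static struct syncpt *syncpt_request_id(struct syncpt *pts, size_t id,
					const char *name)
{
	assert(pts != NULL && name != NULL);

	if (id >= NB_PTS || pts[id].name)
		return NULL;

	pts[id].id = (int)id;
	pts[id].name = name;
	return &pts[id];
}
```

Note that the search loop stops at NB_PTS and reports failure instead of looking at the entry one past the end of the table.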
+struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id)
+{
+	return dev->syncpt + id;
+}
Should this perhaps do some error checking? What if the specified syncpt hasn't actually been requested before?
I'll need to check the use of host1x_syncpt_get(). It might be called for un-allocated (or requested, if we choose that) syncpoints. An error check would make sense at least to check that id is smaller than nb_pts.
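A checked lookup along these lines might look like the following sketch. The structures are cut-down stand-ins and syncpt_get_checked() is a hypothetical name; whether the "has it been requested" test is wanted depends on the audit of callers mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the driver structures. */
struct host1x_syncpt {
	uint32_t id;
	const char *name;	/* non-NULL once the sync point is requested */
};

struct host1x {
	struct host1x_syncpt *syncpt;
	uint32_t nb_pts;
};

/* Hypothetical checked lookup: NULL for out-of-range or unrequested ids. */
static struct host1x_syncpt *syncpt_get_checked(struct host1x *host,
						uint32_t id)
{
	assert(host != NULL && host->syncpt != NULL);

	if (id >= host->nb_pts)
		return NULL;
	if (!host->syncpt[id].name)
		return NULL;

	return host->syncpt + id;
}
```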
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
[...]
+struct host1x_syncpt {
+	int id;
+	atomic_t min_val;
+	atomic_t max_val;
+	u32 base_val;
+	const char *name;
+	int client_managed;
Is this field actually ever used? Looking through the patches none of the clients actually set this.
VBLANK should be set client_managed, so a follow-up patch would add a call from dc.c to here, with client_managed=true.
+/*
+ * Returns true if syncpoint min == max, which means that there are no
+ * outstanding operations.
+ */
+static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp)
+{
+	int min, max;
+	smp_rmb();
+	min = atomic_read(&sp->min_val);
+	max = atomic_read(&sp->max_val);
+	return (min == max);
+}
Maybe call this host1x_syncpt_idle() or something similar instead?
Sounds fine - although the syncpt itself isn't idle, but the corresponding client.
+{
return sp->id != NVSYNCPT_INVALID &&
sp->id < host1x_syncpt_nb_pts(sp->dev);
+}
Is there really a need for NVSYNCPT_INVALID? Even if you want to keep the special case you can drop the explicit check because -1 will be larger than host1x_syncpt_nb_pts() anyway.
No, it's not really needed, so I'll remove it.
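One caveat when dropping the explicit check: -1 only compares larger than the sync point count if the comparison is done on unsigned values. The sketch below uses hypothetical helper names and a made-up count of 32 to show both outcomes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NB_PTS 32u	/* hypothetical sync point count */

/* Signed comparison: -1 is smaller than the count and slips through. */
static bool id_valid_signed(int id)
{
	return id < (int)NB_PTS;
}

/* Unsigned comparison: -1 converts to UINT32_MAX and is rejected. */
static bool id_valid_unsigned(int id)
{
	return (uint32_t)id < NB_PTS;
}

/* Small self-check showing the difference between the two variants. */
static void id_valid_example(void)
{
	assert(id_valid_signed(-1));	/* the trap */
	assert(!id_valid_unsigned(-1));	/* the intended behaviour */
}
```

Since the id field in struct host1x_syncpt is declared as int, the bounds check needs an unsigned operand (or an unsigned id field) for the simplification to hold.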
Terje
On Mon, Feb 04, 2013 at 07:30:08PM -0800, Terje Bergström wrote:
On 04.02.2013 01:09, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
Add host1x, the driver for host1x and its client unit 2D.
Maybe this could be a bit more verbose. Perhaps describe what host1x is.
Sure. I could just steal the paragraph from Stephen:
The Tegra host1x module is the DMA engine for register access to Tegra's graphics- and multimedia-related modules. The modules served by host1x are referred to as clients. host1x includes some other functionality, such as synchronization.
Yes, that sounds good.
+	err = host1x_syncpt_init(host);
+	if (err)
+		return err;
[...]
+	host1x_syncpt_reset(host);
Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why it might be useful to have host1x_syncpt_reset() as a separate function but couldn't it be called as part of host1x_syncpt_init()?
host1x_syncpt_init() is used for initializing the syncpt structures, and is called in probe. host1x_syncpt_reset() should be called whenever we think hardware state is lost, for example if VDD_CORE was rail gated due to system suspend.
My point was that you could include the call to host1x_syncpt_reset() within host1x_syncpt_init(). That will keep unneeded code out of the host1x_probe() function. Also you don't want to use the syncpoints uninitialized, right?
+#include "hw/syncpt_hw.c"
Why include the source file here? Can't you compile it separately instead?
It's because we need to compile with the hardware headers of that host1x version, because we haven't been good at keeping compatibility. So host1x01.c #includes version 01 headers, and syncpt_hw.c in this compilation unit gets compiled with that. 02 would include 02 headers, and syncpt_hw.c would get compiled with its register definitions etc.
Okay, fair enough.
+ */
+static u32 syncpt_load_min(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 old, live;
+	do {
+		old = host1x_syncpt_read_min(sp);
+		live = host1x_sync_readl(dev,
+				HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
+	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
I think this warrants a comment.
Sure. It just loops in case there's a race writing to min_val.
Oh, I see. That'd make a good comment. Is the cast to (u32) really necessary?
+/*
+ * Resets syncpoint and waitbase values to sw shadows
+ */
+void host1x_syncpt_reset(struct host1x *dev)
Maybe host1x_syncpt_flush() would be a better name given the above description? Reset does have this hardware reset connotation so my first intuition had been that this would reset the syncpt value to 0.
Right, it actually reloads values stored in shadow registers back to host1x. Flush doesn't feel like it's conveying the meaning. Would host1x_syncpt_restore() work? That'd match with host1x_syncpt_save(), which just updates all shadow registers from hardware and is used just before host1x loses power.
Save/restore has the disadvantage of the direction not being implicit. Save could mean save to hardware or save to software. The same is true for restore. However if the direction is clearly defined, save and restore work for me.
Maybe the comment could be changed to be more explicit. Something like:
/*
 * Write cached syncpoint and waitbase values to hardware.
 */
And for host1x_syncpt_save():
/*
 * For client-managed registers, update the cached syncpoint and
 * waitbase values by reading from the registers.
 */
+/*
+ * Updates the last value read from hardware.
+ */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	trace_host1x_syncpt_load_min(sp->id, val);
+	return val;
+}
I'm not sure I understand what this means exactly. Does it read the value that hardware last incremented? Perhaps this will become clearer when you add a comment to the syncpt_load_min() implementation.
It just loads the current syncpt value into the shadow register. The shadow register is called min because host1x tracks the range of sync point increments that hardware is still going to do, so min is the lower boundary of that range.
max is the value the sync point is expected to reach before hardware is considered idle.
host1x will, for example, nop out waits for sync point values outside the range, because hardware isn't good at handling syncpt value wrapping.
Maybe the function should be called host1x_syncpt_load() if there is no equivalent way to load the maximum value (since there is no register to read from).
Also the syncpoint is not actually allocated here, so maybe host1x_syncpt_request() would be a better name. As a nice side-effect it makes the naming more similar to the IRQ API and might be easier to work with.
I'm not entirely sure about the difference, but isn't the number to be allocated usually passed to a function ending in _request? Allocate would just allocate the next available - as host1x_syncpt_allocate does.
That's certainly true for interrupts. However, if you look at the DMA subsystem for example, you can also request an unnamed resource.
The difference is sufficiently subtle that host1x_syncpt_allocate() would work for me too, though. I just have a slight preference for host1x_syncpt_request().
Thierry
On 04.02.2013 23:43, Thierry Reding wrote:
My point was that you could include the call to host1x_syncpt_reset() within host1x_syncpt_init(). That will keep unneeded code out of the host1x_probe() function. Also you don't want to use the syncpoints uninitialized, right?
Of course, sorry, I misunderstood. That makes a lot of sense.
+ */
+static u32 syncpt_load_min(struct host1x_syncpt *sp)
+{
+	struct host1x *dev = sp->dev;
+	u32 old, live;
+	do {
+		old = host1x_syncpt_read_min(sp);
+		live = host1x_sync_readl(dev,
+				HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
+	} while ((u32)atomic_cmpxchg(&sp->min_val, old, live) != old);
I think this warrants a comment.
Sure. It just loops in case there's a race writing to min_val.
Oh, I see. That'd make a good comment. Is the cast to (u32) really necessary?
I'll add a comment. atomic_cmpxchg returns a signed value, so I think the cast is needed.
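For reference, the retry pattern can be reproduced in user space with C11 atomics. This is only a sketch: fake_hw_syncpt stands in for the register read done by host1x_sync_readl(), and shadow_load_min() is a hypothetical name.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware sync point register. */
static uint32_t fake_hw_syncpt;

struct syncpt_shadow {
	_Atomic uint32_t min_val;	/* cached copy of the hardware value */
};

/* Refresh the cached value from "hardware" and return what was read. */
static uint32_t shadow_load_min(struct syncpt_shadow *sp)
{
	uint32_t old, live;

	assert(sp != NULL);

	do {
		old = atomic_load(&sp->min_val);
		live = fake_hw_syncpt;
		/*
		 * If another thread changed min_val between the load and
		 * the exchange, the exchange fails and we read again.
		 */
	} while (!atomic_compare_exchange_strong(&sp->min_val, &old, live));

	return live;
}
```

On the cast itself: when an int is compared with a u32, C converts the int operand to unsigned before comparing, so the result is the same with or without (u32). The cast mainly documents intent and keeps sign-compare warnings quiet.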
Save/restore has the disadvantage of the direction not being implicit. Save could mean save to hardware or save to software. The same is true for restore. However if the direction is clearly defined, save and restore work for me.
Maybe the comment could be changed to be more explicit. Something like:
/*
 * Write cached syncpoint and waitbase values to hardware.
 */
And for host1x_syncpt_save():
/*
 * For client-managed registers, update the cached syncpoint and
 * waitbase values by reading from the registers.
 */
I was using save in the same way as, for example, i915 (i915_suspend.c): save state of hardware to RAM, restore state from RAM. I'll add proper comments, but save and restore are for all syncpts, not only client managed.
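The direction convention described here (save is hardware to RAM, restore is RAM to hardware) can be pinned down with a small sketch. The arrays and function names below are hypothetical stand-ins for the registers and shadow state, not the driver's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NB_PTS 4	/* hypothetical sync point count */

static uint32_t fake_hw[NB_PTS];	/* stands in for hardware registers */
static uint32_t shadow[NB_PTS];		/* cached copy kept in RAM */

/* save: hardware -> RAM, done just before the block loses power */
static void syncpt_save_all(void)
{
	size_t i;

	for (i = 0; i < NB_PTS; i++)
		shadow[i] = fake_hw[i];
}

/* restore: RAM -> hardware, done after power comes back */
static void syncpt_restore_all(void)
{
	size_t i;

	for (i = 0; i < NB_PTS; i++)
		fake_hw[i] = shadow[i];
}

/* Simulated power cycle: values survive through the shadow copy. */
static void syncpt_power_cycle_example(void)
{
	fake_hw[0] = 5;
	syncpt_save_all();
	fake_hw[0] = 0;		/* register contents lost */
	syncpt_restore_all();
	assert(fake_hw[0] == 5);
}
```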
+/*
+ * Updates the last value read from hardware.
+ */
+u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
+{
+	u32 val;
+	val = sp->dev->syncpt_op.load_min(sp);
+	trace_host1x_syncpt_load_min(sp->id, val);
+	return val;
+}
Maybe the function should be called host1x_syncpt_load() if there is no equivalent way to load the maximum value (since there is no register to read from).
Sounds good. Maximum is just a software concept.
That's certainly true for interrupts. However, if you look at the DMA subsystem for example, you can also request an unnamed resource.
The difference is sufficiently subtle that host1x_syncpt_allocate() would work for me too, though. I just have a slight preference for host1x_syncpt_request().
I don't really have a strong preference, so I'll follow your suggestion.
Terje
Add support for sync point interrupts and sync point wait. Sync point wait uses interrupts for unblocking the wait.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Makefile              |   1 +
 drivers/gpu/host1x/dev.c                 |  21 +-
 drivers/gpu/host1x/dev.h                 |  17 +-
 drivers/gpu/host1x/hw/host1x01.c         |   2 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h |  42 ++++
 drivers/gpu/host1x/hw/intr_hw.c          | 178 +++++++++++++++
 drivers/gpu/host1x/intr.c                | 356 ++++++++++++++++++++++++++++++
 drivers/gpu/host1x/intr.h                | 103 +++++++++
 drivers/gpu/host1x/syncpt.c              | 163 ++++++++++++++
 drivers/gpu/host1x/syncpt.h              |   5 +
 10 files changed, 883 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/host1x/hw/intr_hw.c
 create mode 100644 drivers/gpu/host1x/intr.c
 create mode 100644 drivers/gpu/host1x/intr.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 363e6ab..5ef47ff 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -3,6 +3,7 @@ ccflags-y = -Idrivers/gpu/host1x
 host1x-y = \
 	syncpt.o \
 	dev.o \
+	intr.o \
 	hw/host1x01.o
obj-$(CONFIG_TEGRA_HOST1X) += host1x.o diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index cd2b1ef..7f9f389 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -24,6 +24,7 @@ #include <linux/clk.h> #include <linux/io.h> #include "dev.h" +#include "intr.h" #include "hw/host1x01.h"
#define CREATE_TRACE_POINTS @@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
/* set common host1x device data */ platform_set_drvdata(dev, host); - host->regs = devm_request_and_ioremap(&dev->dev, regs); if (!host->regs) { dev_err(&dev->dev, "failed to remap host registers\n"); @@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev) }
err = host1x_syncpt_init(host); - if (err) + if (err) { + dev_err(&dev->dev, "failed to init sync points"); return err; + } + + err = host1x_intr_init(&host->intr, syncpt_irq); + if (err) { + dev_err(&dev->dev, "failed to init irq"); + goto fail_deinit_syncpt; + }
host->clk = devm_clk_get(&dev->dev, NULL); if (IS_ERR(host->clk)) { dev_err(&dev->dev, "failed to get clock\n"); err = PTR_ERR(host->clk); - goto fail_deinit_syncpt; + goto fail_deinit_intr; }
err = clk_prepare_enable(host->clk); if (err < 0) { dev_err(&dev->dev, "failed to enable clock\n"); - goto fail_deinit_syncpt; + goto fail_deinit_intr; }
host1x_syncpt_reset(host);
+ host1x_intr_start(&host->intr, clk_get_rate(host->clk)); + dev_info(&dev->dev, "initialized\n");
return 0;
+fail_deinit_intr: + host1x_intr_deinit(&host->intr); fail_deinit_syncpt: host1x_syncpt_deinit(host); return err; @@ -139,6 +151,7 @@ fail_deinit_syncpt: static int __exit host1x_remove(struct platform_device *dev) { struct host1x *host = platform_get_drvdata(dev); + host1x_intr_deinit(&host->intr); host1x_syncpt_deinit(host); clk_disable_unprepare(host->clk); return 0; diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index d8f5979..8376092 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -17,11 +17,12 @@ #ifndef HOST1X_DEV_H #define HOST1X_DEV_H
+#include <linux/platform_device.h> #include "syncpt.h" +#include "intr.h"
struct host1x; struct host1x_syncpt; -struct platform_device;
struct host1x_syncpt_ops { void (*reset)(struct host1x_syncpt *); @@ -34,6 +35,18 @@ struct host1x_syncpt_ops { const char * (*name)(struct host1x_syncpt *); };
+struct host1x_intr_ops { + void (*init_host_sync)(struct host1x_intr *); + void (*set_host_clocks_per_usec)( + struct host1x_intr *, u32 clocks); + void (*set_syncpt_threshold)( + struct host1x_intr *, u32 id, u32 thresh); + void (*enable_syncpt_intr)(struct host1x_intr *, u32 id); + void (*disable_syncpt_intr)(struct host1x_intr *, u32 id); + void (*disable_all_syncpt_intrs)(struct host1x_intr *); + int (*free_syncpt_irq)(struct host1x_intr *); +}; + struct host1x_device_info { int nb_channels; /* host1x: num channels supported */ int nb_pts; /* host1x: num syncpoints supported */ @@ -46,11 +59,13 @@ struct host1x_device_info { struct host1x { void __iomem *regs; struct host1x_syncpt *syncpt; + struct host1x_intr intr; struct platform_device *dev; struct host1x_device_info info; struct clk *clk;
struct host1x_syncpt_ops syncpt_op; + struct host1x_intr_ops intr_op;
struct dentry *debugfs; }; diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c index ea6e604..3d633a3 100644 --- a/drivers/gpu/host1x/hw/host1x01.c +++ b/drivers/gpu/host1x/hw/host1x01.c @@ -26,10 +26,12 @@ #include "hw/host1x01_hardware.h"
#include "hw/syncpt_hw.c" +#include "hw/intr_hw.c"
int host1x01_init(struct host1x *host) { host->syncpt_op = host1x_syncpt_ops; + host->intr_op = host1x_intr_ops;
return 0; } diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h index b12c1a4..5da9afb 100644 --- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h +++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h @@ -51,12 +51,54 @@ #ifndef __hw_host1x01_sync_h__ #define __hw_host1x01_sync_h__
+static inline u32 host1x_sync_syncpt_thresh_cpu0_int_status_r(void) +{ + return 0x40; +} +#define HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS \ + host1x_sync_syncpt_thresh_cpu0_int_status_r() +static inline u32 host1x_sync_syncpt_thresh_int_disable_r(void) +{ + return 0x60; +} +#define HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE \ + host1x_sync_syncpt_thresh_int_disable_r() +static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void) +{ + return 0x68; +} +#define HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 \ + host1x_sync_syncpt_thresh_int_enable_cpu0_r() +static inline u32 host1x_sync_usec_clk_r(void) +{ + return 0x1a4; +} +#define HOST1X_SYNC_USEC_CLK \ + host1x_sync_usec_clk_r() +static inline u32 host1x_sync_ctxsw_timeout_cfg_r(void) +{ + return 0x1a8; +} +#define HOST1X_SYNC_CTXSW_TIMEOUT_CFG \ + host1x_sync_ctxsw_timeout_cfg_r() +static inline u32 host1x_sync_ip_busy_timeout_r(void) +{ + return 0x1bc; +} +#define HOST1X_SYNC_IP_BUSY_TIMEOUT \ + host1x_sync_ip_busy_timeout_r() static inline u32 host1x_sync_syncpt_0_r(void) { return 0x400; } #define HOST1X_SYNC_SYNCPT_0 \ host1x_sync_syncpt_0_r() +static inline u32 host1x_sync_syncpt_int_thresh_0_r(void) +{ + return 0x500; +} +#define HOST1X_SYNC_SYNCPT_INT_THRESH_0 \ + host1x_sync_syncpt_int_thresh_0_r() static inline u32 host1x_sync_syncpt_base_0_r(void) { return 0x600; diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c new file mode 100644 index 0000000..12488e2 --- /dev/null +++ b/drivers/gpu/host1x/hw/intr_hw.c @@ -0,0 +1,178 @@ +/* + * Tegra host1x Interrupt Management + * + * Copyright (C) 2010 Google, Inc. + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/io.h>
+#include <asm/mach/irq.h>
+
+#include "intr.h"
+#include "dev.h"
+
+/* Spacing between sync registers */
+#define REGISTER_STRIDE 4
+
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
+
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
+{
+	struct host1x_intr_syncpt *sp =
+		container_of(work, struct host1x_intr_syncpt, work);
+	host1x_syncpt_thresh_fn(sp);
+}
+
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
+{
+	struct host1x *host1x = dev_id;
+	struct host1x_intr *intr = &host1x->intr;
+	unsigned long reg;
+	int i, id;
+
+	for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
+		reg = host1x_sync_readl(host1x,
+				HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
+				i * REGISTER_STRIDE);
+		for_each_set_bit(id, &reg, BITS_PER_LONG) {
+			struct host1x_intr_syncpt *sp =
+				intr->syncpt + (i * BITS_PER_LONG + id);
+			host1x_intr_syncpt_thresh_isr(sp);
+			queue_work(intr->wq, &sp->work);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void host1x_intr_init_host_sync(struct host1x_intr *intr)
+{
+	struct host1x *host1x = intr_to_host1x(intr);
+	int i, err;
+
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
+	host1x_sync_writel(host1x, 0xffffffffUL,
+		HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
+
+	for (i = 0; i < host1x->info.nb_pts; i++)
+		INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
+
+	err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
+				syncpt_thresh_cascade_isr,
+				IRQF_SHARED,
"host1x_syncpt", host1x); + WARN_ON(IS_ERR_VALUE(err)); + + /* disable the ip_busy_timeout. this prevents write drops */ + host1x_sync_writel(host1x, 0, HOST1X_SYNC_IP_BUSY_TIMEOUT); + + /* + * increase the auto-ack timout to the maximum value. 2d will hang + * otherwise on Tegra2. + */ + host1x_sync_writel(host1x, 0xff, HOST1X_SYNC_CTXSW_TIMEOUT_CFG); +} + +static void host1x_intr_set_host_clocks_per_usec(struct host1x_intr *intr, + u32 cpm) +{ + struct host1x *host1x = intr_to_host1x(intr); + /* write microsecond clock register */ + host1x_sync_writel(host1x, cpm, HOST1X_SYNC_USEC_CLK); +} + +static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr, + u32 id, u32 thresh) +{ + struct host1x *host1x = intr_to_host1x(intr); + host1x_sync_writel(host1x, thresh, + HOST1X_SYNC_SYNCPT_INT_THRESH_0 + id * REGISTER_STRIDE); +} + +static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id) +{ + struct host1x *host1x = intr_to_host1x(intr); + + host1x_sync_writel(host1x, BIT_MASK(id), + HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 + + BIT_WORD(id) * REGISTER_STRIDE); +} + +static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id) +{ + struct host1x *host1x = intr_to_host1x(intr); + + host1x_sync_writel(host1x, BIT_MASK(id), + HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE + + BIT_WORD(id) * REGISTER_STRIDE); + + host1x_sync_writel(host1x, BIT_MASK(id), + HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS + + BIT_WORD(id) * REGISTER_STRIDE); +} + +static void host1x_intr_disable_all_syncpt_intrs(struct host1x_intr *intr) +{ + struct host1x *host1x = intr_to_host1x(intr); + u32 reg; + + for (reg = 0; reg <= BIT_WORD(host1x->info.nb_pts) * REGISTER_STRIDE; + reg += REGISTER_STRIDE) { + host1x_sync_writel(host1x, 0xffffffffu, + HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE + + reg); + + host1x_sync_writel(host1x, 0xffffffffu, + HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS + reg); + } +} + +/* + * Sync point threshold interrupt service function + * Handles sync 
point threshold triggers, in interrupt context + */ +static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt) +{ + unsigned int id = syncpt->id; + struct host1x_intr *intr = intr_syncpt_to_intr(syncpt); + struct host1x *host1x = intr_to_host1x(intr); + u32 reg = BIT_WORD(id) * REGISTER_STRIDE; + + host1x_sync_writel(host1x, BIT_MASK(id), + HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE + reg); + host1x_sync_writel(host1x, BIT_MASK(id), + HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS + reg); +} + +static int host1x_free_syncpt_irq(struct host1x_intr *intr) +{ + struct host1x *host1x = intr_to_host1x(intr); + + devm_free_irq(&host1x->dev->dev, intr->syncpt_irq, host1x); + flush_workqueue(intr->wq); + return 0; +} + +static const struct host1x_intr_ops host1x_intr_ops = { + .init_host_sync = host1x_intr_init_host_sync, + .set_host_clocks_per_usec = host1x_intr_set_host_clocks_per_usec, + .set_syncpt_threshold = host1x_intr_set_syncpt_threshold, + .enable_syncpt_intr = host1x_intr_enable_syncpt_intr, + .disable_syncpt_intr = host1x_intr_disable_syncpt_intr, + .disable_all_syncpt_intrs = host1x_intr_disable_all_syncpt_intrs, + .free_syncpt_irq = host1x_free_syncpt_irq, +}; diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c new file mode 100644 index 0000000..26099b8 --- /dev/null +++ b/drivers/gpu/host1x/intr.c @@ -0,0 +1,356 @@ +/* + * Tegra host1x Interrupt Management + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include "intr.h" +#include <linux/interrupt.h> +#include <linux/slab.h> +#include <linux/irq.h> +#include "dev.h" + +/* Wait list management */ + +struct host1x_waitlist { + struct list_head list; + struct kref refcount; + u32 thresh; + enum host1x_intr_action action; + atomic_t state; + void *data; + int count; +}; + +enum waitlist_state { + WLS_PENDING, + WLS_REMOVED, + WLS_CANCELLED, + WLS_HANDLED +}; + +static void waiter_release(struct kref *kref) +{ + kfree(container_of(kref, struct host1x_waitlist, refcount)); +} + +/* + * add a waiter to a waiter queue, sorted by threshold + * returns true if it was added at the head of the queue + */ +static bool add_waiter_to_queue(struct host1x_waitlist *waiter, + struct list_head *queue) +{ + struct host1x_waitlist *pos; + u32 thresh = waiter->thresh; + + list_for_each_entry_reverse(pos, queue, list) + if ((s32)(pos->thresh - thresh) <= 0) { + list_add(&waiter->list, &pos->list); + return false; + } + + list_add(&waiter->list, queue); + return true; +} + +/* + * run through a waiter queue for a single sync point ID + * and gather all completed waiters into lists by actions + */ +static void remove_completed_waiters(struct list_head *head, u32 sync, + struct list_head completed[HOST1X_INTR_ACTION_COUNT]) +{ + struct list_head *dest; + struct host1x_waitlist *waiter, *next; + + list_for_each_entry_safe(waiter, next, head, list) { + if ((s32)(waiter->thresh - sync) > 0) + break; + + dest = completed + waiter->action; + + /* PENDING->REMOVED or CANCELLED->HANDLED */ + if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) { + list_del(&waiter->list); + kref_put(&waiter->refcount, waiter_release); + } else { + list_move_tail(&waiter->list, dest); + } + } +} + +static void reset_threshold_interrupt(struct host1x_intr *intr, + struct list_head *head, + unsigned int 
id) +{ + struct host1x *host1x = intr_to_host1x(intr); + u32 thresh = list_first_entry(head, + struct host1x_waitlist, list)->thresh; + + host1x->intr_op.set_syncpt_threshold(intr, id, thresh); + host1x->intr_op.enable_syncpt_intr(intr, id); +} + +static void action_wakeup(struct host1x_waitlist *waiter) +{ + wait_queue_head_t *wq = waiter->data; + + wake_up(wq); +} + +static void action_wakeup_interruptible(struct host1x_waitlist *waiter) +{ + wait_queue_head_t *wq = waiter->data; + + wake_up_interruptible(wq); +} + +typedef void (*action_handler)(struct host1x_waitlist *waiter); + +static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = { + action_wakeup, + action_wakeup_interruptible, +}; + +static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT]) +{ + struct list_head *head = completed; + int i; + + for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i, ++head) { + action_handler handler = action_handlers[i]; + struct host1x_waitlist *waiter, *next; + + list_for_each_entry_safe(waiter, next, head, list) { + list_del(&waiter->list); + handler(waiter); + WARN_ON(atomic_xchg(&waiter->state, WLS_HANDLED) + != WLS_REMOVED); + kref_put(&waiter->refcount, waiter_release); + } + } +} + +/* + * Remove & handle all waiters that have completed for the given syncpt + */ +static int process_wait_list(struct host1x_intr *intr, + struct host1x_intr_syncpt *syncpt, + u32 threshold) +{ + struct host1x *host1x = intr_to_host1x(intr); + struct list_head completed[HOST1X_INTR_ACTION_COUNT]; + unsigned int i; + int empty; + + for (i = 0; i < HOST1X_INTR_ACTION_COUNT; ++i) + INIT_LIST_HEAD(completed + i); + + spin_lock(&syncpt->lock); + + remove_completed_waiters(&syncpt->wait_head, threshold, completed); + + empty = list_empty(&syncpt->wait_head); + if (empty) + host1x->intr_op.disable_syncpt_intr(intr, syncpt->id); + else + reset_threshold_interrupt(intr, &syncpt->wait_head, + syncpt->id); + + spin_unlock(&syncpt->lock); + + run_handlers(completed); + + 
return empty; +} + +/* + * Sync point threshold interrupt service thread function + * Handles sync point threshold triggers, in thread context + */ +irqreturn_t host1x_syncpt_thresh_fn(void *dev_id) +{ + struct host1x_intr_syncpt *syncpt = dev_id; + unsigned int id = syncpt->id; + struct host1x_intr *intr = intr_syncpt_to_intr(syncpt); + struct host1x *host1x = intr_to_host1x(intr); + + (void)process_wait_list(intr, syncpt, + host1x_syncpt_load_min(host1x->syncpt + id)); + + return IRQ_HANDLED; +} + +int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh, + enum host1x_intr_action action, void *data, + void *_waiter, + void **ref) +{ + struct host1x *host1x = intr_to_host1x(intr); + struct host1x_waitlist *waiter = _waiter; + struct host1x_intr_syncpt *syncpt; + int queue_was_empty; + + if (waiter == NULL) { + pr_warn("%s: NULL waiter\n", __func__); + return -EINVAL; + } + + /* initialize a new waiter */ + INIT_LIST_HEAD(&waiter->list); + kref_init(&waiter->refcount); + if (ref) + kref_get(&waiter->refcount); + waiter->thresh = thresh; + waiter->action = action; + atomic_set(&waiter->state, WLS_PENDING); + waiter->data = data; + waiter->count = 1; + + syncpt = intr->syncpt + id; + + spin_lock(&syncpt->lock); + + queue_was_empty = list_empty(&syncpt->wait_head); + + if (add_waiter_to_queue(waiter, &syncpt->wait_head)) { + /* added at head of list - new threshold value */ + host1x->intr_op.set_syncpt_threshold(intr, id, thresh); + + /* added as first waiter - enable interrupt */ + if (queue_was_empty) + host1x->intr_op.enable_syncpt_intr(intr, id); + } + + spin_unlock(&syncpt->lock); + + if (ref) + *ref = waiter; + return 0; +} + +void *host1x_intr_alloc_waiter(void) +{ + return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL); +} + +void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref) +{ + struct host1x_waitlist *waiter = ref; + struct host1x_intr_syncpt *syncpt; + struct host1x *host1x = intr_to_host1x(intr); + + while 
(atomic_cmpxchg(&waiter->state, + WLS_PENDING, WLS_CANCELLED) == WLS_REMOVED) + schedule(); + + syncpt = intr->syncpt + id; + (void)process_wait_list(intr, syncpt, + host1x_syncpt_load_min(host1x->syncpt + id)); + + kref_put(&waiter->refcount, waiter_release); +} + +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync) +{ + unsigned int id; + struct host1x *host1x = intr_to_host1x(intr); + u32 nb_pts = host1x_syncpt_nb_pts(host1x); + + intr->syncpt = devm_kzalloc(&host1x->dev->dev, + sizeof(struct host1x_intr_syncpt) * + host1x->info.nb_pts, + GFP_KERNEL); + + if (!host1x->intr.syncpt) + return -ENOMEM; + + mutex_init(&intr->mutex); + intr->syncpt_irq = irq_sync; + intr->wq = create_workqueue("host_syncpt"); + if (!intr->wq) + return -ENOMEM; + + for (id = 0; id < nb_pts; ++id) { + struct host1x_intr_syncpt *syncpt = &intr->syncpt[id]; + + syncpt->intr = &host1x->intr; + syncpt->id = id; + spin_lock_init(&syncpt->lock); + INIT_LIST_HEAD(&syncpt->wait_head); + snprintf(syncpt->thresh_irq_name, + sizeof(syncpt->thresh_irq_name), + "host1x_sp_%02d", id); + } + + return 0; +} + +void host1x_intr_deinit(struct host1x_intr *intr) +{ + host1x_intr_stop(intr); + destroy_workqueue(intr->wq); +} + +void host1x_intr_start(struct host1x_intr *intr, u32 hz) +{ + struct host1x *host1x = intr_to_host1x(intr); + mutex_lock(&intr->mutex); + + host1x->intr_op.init_host_sync(intr); + host1x->intr_op.set_host_clocks_per_usec(intr, + DIV_ROUND_UP(hz, 1000000)); + + mutex_unlock(&intr->mutex); +} + +void host1x_intr_stop(struct host1x_intr *intr) +{ + unsigned int id; + struct host1x *host1x = intr_to_host1x(intr); + struct host1x_intr_syncpt *syncpt; + u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr)); + + mutex_lock(&intr->mutex); + + host1x->intr_op.disable_all_syncpt_intrs(intr); + + for (id = 0, syncpt = intr->syncpt; + id < nb_pts; + ++id, ++syncpt) { + struct host1x_waitlist *waiter, *next; + list_for_each_entry_safe(waiter, next, + &syncpt->wait_head, list) { + if 
(atomic_cmpxchg(&waiter->state, + WLS_CANCELLED, WLS_HANDLED) + == WLS_CANCELLED) { + list_del(&waiter->list); + kref_put(&waiter->refcount, waiter_release); + } + } + + if (!list_empty(&syncpt->wait_head)) { /* output diagnostics */ + mutex_unlock(&intr->mutex); + pr_warn("%s cannot stop syncpt intr id=%d\n", + __func__, id); + return; + } + } + + host1x->intr_op.free_syncpt_irq(intr); + + mutex_unlock(&intr->mutex); +} diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h new file mode 100644 index 0000000..679a7b4 --- /dev/null +++ b/drivers/gpu/host1x/intr.h @@ -0,0 +1,103 @@ +/* + * Tegra host1x Interrupt Management + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_INTR_H +#define __HOST1X_INTR_H + +#include <linux/interrupt.h> +#include <linux/workqueue.h> + +enum host1x_intr_action { + /* + * Wake up a task. + * 'data' points to a wait_queue_head_t + */ + HOST1X_INTR_ACTION_WAKEUP, + + /* + * Wake up a interruptible task. 
+ * 'data' points to a wait_queue_head_t + */ + HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE, + + HOST1X_INTR_ACTION_COUNT +}; + +struct host1x_intr; + +struct host1x_intr_syncpt { + struct host1x_intr *intr; + u8 id; + spinlock_t lock; + struct list_head wait_head; + char thresh_irq_name[12]; + struct work_struct work; +}; + +struct host1x_intr { + struct host1x_intr_syncpt *syncpt; + struct mutex mutex; + int syncpt_irq; + struct workqueue_struct *wq; +}; +#define intr_to_host1x(x) container_of(x, struct host1x, intr) +#define intr_syncpt_to_intr(is) (is->intr) + +/* + * Schedule an action to be taken when a sync point reaches the given threshold. + * + * @id the sync point + * @thresh the threshold + * @action the action to take + * @data a pointer to extra data depending on action, see above + * @waiter waiter allocated with host1x_intr_alloc_waiter - assumes ownership + * @ref must be passed if cancellation is possible, else NULL + * + * This is a non-blocking api. + */ +int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh, + enum host1x_intr_action action, void *data, + void *waiter, + void **ref); + +/* + * Allocate a waiter. + */ +void *host1x_intr_alloc_waiter(void); + +/* + * Unreference an action submitted to host1x_intr_add_action(). + * You must call this if you passed non-NULL as ref. 
+ * @ref the ref returned from host1x_intr_add_action() + */ +void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref); + +/* Initialize host1x sync point interrupt */ +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync); + +/* Deinitialize host1x sync point interrupt */ +void host1x_intr_deinit(struct host1x_intr *intr); + +/* Enable host1x sync point interrupt */ +void host1x_intr_start(struct host1x_intr *intr, u32 hz); + +/* Disable host1x sync point interrupt */ +void host1x_intr_stop(struct host1x_intr *intr); + +irqreturn_t host1x_syncpt_thresh_fn(void *dev_id); +#endif diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c index b45651f..32e2b42 100644 --- a/drivers/gpu/host1x/syncpt.c +++ b/drivers/gpu/host1x/syncpt.c @@ -22,9 +22,12 @@ #include <linux/module.h> #include "syncpt.h" #include "dev.h" +#include "intr.h" #include <trace/events/host1x.h>
#define MAX_SYNCPT_LENGTH 5 +#define SYNCPT_CHECK_PERIOD (2 * HZ) +#define MAX_STUCK_CHECK_COUNT 15
static struct host1x_syncpt *_host1x_syncpt_alloc(struct host1x *host, struct platform_device *pdev, @@ -119,6 +122,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp) host1x_syncpt_cpu_incr(sp); }
+/* + * Updated sync point form hardware, and returns true if syncpoint is expired, + * false if we may need to wait + */ +static bool syncpt_load_min_is_expired( + struct host1x_syncpt *sp, + u32 thresh) +{ + sp->dev->syncpt_op.load_min(sp); + return host1x_syncpt_is_expired(sp, thresh); +} + +/* + * Main entrypoint for syncpoint value waits. + */ +int host1x_syncpt_wait(struct host1x_syncpt *sp, + u32 thresh, long timeout, u32 *value) +{ + DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq); + void *ref; + void *waiter; + int err = 0, check_count = 0; + u32 val; + + if (value) + *value = 0; + + /* first check cache */ + if (host1x_syncpt_is_expired(sp, thresh)) { + if (value) + *value = host1x_syncpt_read_min(sp); + return 0; + } + + /* try to read from register */ + val = sp->dev->syncpt_op.load_min(sp); + if (host1x_syncpt_is_expired(sp, thresh)) { + if (value) + *value = val; + goto done; + } + + if (!timeout) { + err = -EAGAIN; + goto done; + } + + /* schedule a wakeup when the syncpoint value is reached */ + waiter = host1x_intr_alloc_waiter(); + if (!waiter) { + err = -ENOMEM; + goto done; + } + + err = host1x_intr_add_action(&(sp->dev->intr), sp->id, thresh, + HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE, &wq, + waiter, + &ref); + if (err) + goto done; + + err = -EAGAIN; + /* Caller-specified timeout may be impractically low */ + if (timeout < 0) + timeout = LONG_MAX; + + /* wait for the syncpoint, or timeout, or signal */ + while (timeout) { + long check = min_t(long, SYNCPT_CHECK_PERIOD, timeout); + int remain = wait_event_interruptible_timeout(wq, + syncpt_load_min_is_expired(sp, thresh), + check); + if (remain > 0 || host1x_syncpt_is_expired(sp, thresh)) { + if (value) + *value = host1x_syncpt_read_min(sp); + err = 0; + break; + } + if (remain < 0) { + err = remain; + break; + } + timeout -= check; + if (timeout && check_count <= MAX_STUCK_CHECK_COUNT) { + dev_warn(&sp->dev->dev->dev, + "%s: syncpoint id %d (%s) stuck waiting %d, timeout=%ld\n", + current->comm, sp->id, 
sp->name, + thresh, timeout); + sp->dev->syncpt_op.debug(sp); + check_count++; + } + } + host1x_intr_put_ref(&(sp->dev->intr), sp->id, ref); + +done: + return err; +} +EXPORT_SYMBOL(host1x_syncpt_wait); + +/* + * Returns true if syncpoint is expired, false if we may need to wait + */ +bool host1x_syncpt_is_expired( + struct host1x_syncpt *sp, + u32 thresh) +{ + u32 current_val; + u32 future_val; + smp_rmb(); + current_val = (u32)atomic_read(&sp->min_val); + future_val = (u32)atomic_read(&sp->max_val); + + /* Note the use of unsigned arithmetic here (mod 1<<32). + * + * c = current_val = min_val = the current value of the syncpoint. + * t = thresh = the value we are checking + * f = future_val = max_val = the value c will reach when all + * outstanding increments have completed. + * + * Note that c always chases f until it reaches f. + * + * Dtf = (f - t) + * Dtc = (c - t) + * + * Consider all cases: + * + * A) .....c..t..f..... Dtf < Dtc need to wait + * B) .....c.....f..t.. Dtf > Dtc expired + * C) ..t..c.....f..... Dtf > Dtc expired (Dct very large) + * + * Any case where f==c: always expired (for any t). Dtf == Dcf + * Any case where t==c: always expired (for any f). Dtf >= Dtc (because Dtc==0) + * Any case where t==f!=c: always wait. Dtf < Dtc (because Dtf==0, + * Dtc!=0) + * + * Other cases: + * + * A) .....t..f..c..... Dtf < Dtc need to wait + * A) .....f..c..t..... Dtf < Dtc need to wait + * A) .....f..t..c..... Dtf > Dtc expired + * + * So: + * Dtf >= Dtc implies EXPIRED (return true) + * Dtf < Dtc implies WAIT (return false) + * + * Note: If t is expired then we *cannot* wait on it. We would wait + * forever (hang the system). + * + * Note: do NOT get clever and remove the -thresh from both sides. It + * is NOT the same. + * + * If future valueis zero, we have a client managed sync point. In that + * case we do a direct comparison. 
+ */ + if (!host1x_syncpt_client_managed(sp)) + return future_val - thresh >= current_val - thresh; + else + return (s32)(current_val - thresh) >= 0; +} + void host1x_syncpt_debug(struct host1x_syncpt *sp) { sp->dev->syncpt_op.debug(sp); diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h index d9b9b0a..b46d044 100644 --- a/drivers/gpu/host1x/syncpt.h +++ b/drivers/gpu/host1x/syncpt.h @@ -114,6 +114,7 @@ void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
/* Load current value from hardware to the shadow register. */ u32 host1x_syncpt_load_min(struct host1x_syncpt *sp); +bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh);
/* Save host1x sync point state into shadow registers. */ void host1x_syncpt_save(struct host1x *dev); @@ -133,6 +134,10 @@ u32 host1x_syncpt_incr_max(struct host1x_syncpt *sp, u32 incrs); /* Do a debug dump of sync point values. */ void host1x_syncpt_debug(struct host1x_syncpt *sp);
+/* Wait until sync point reaches a threshold value, or a timeout. */ +int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, + long timeout, u32 *value); + /* Check if sync point id is valid. */ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp) {
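For readers following the wraparound reasoning in the host1x_syncpt_is_expired() comment, here is a minimal user-space sketch of the two comparisons. The function names syncpt_expired() and syncpt_expired_client() are hypothetical and exist only for this illustration; the arithmetic is the one the patch's comment describes (c = min_val, f = max_val, t = thresh).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Host-managed sync point: c = current (min), f = future (max),
 * t = threshold. Unsigned subtraction wraps mod 2^32, so the test
 * "Dtf >= Dtc" stays correct across counter wraparound.
 */
static bool syncpt_expired(uint32_t c, uint32_t f, uint32_t t)
{
	return (uint32_t)(f - t) >= (uint32_t)(c - t);
}

/* Client-managed sync point: no max value is tracked, compare directly. */
static bool syncpt_expired_client(uint32_t c, uint32_t t)
{
	return (int32_t)(c - t) >= 0;
}
```

Case A from the comment (c < t < f) yields "wait", while cases B and C yield "expired", including when the counter has wrapped past zero.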
On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
[...]
@@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
/* set common host1x device data */ platform_set_drvdata(dev, host);
- host->regs = devm_request_and_ioremap(&dev->dev, regs); if (!host->regs) { dev_err(&dev->dev, "failed to remap host registers\n");
This seems an unrelated (and actually undesirable) change.
@@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev) }
err = host1x_syncpt_init(host);
- if (err)
- if (err) {
dev_err(&dev->dev, "failed to init sync points"); return err;
- }
This error message should probably have gone in the previous patch as well.
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index d8f5979..8376092 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -17,11 +17,12 @@ #ifndef HOST1X_DEV_H #define HOST1X_DEV_H
+#include <linux/platform_device.h> #include "syncpt.h" +#include "intr.h"
struct host1x; struct host1x_syncpt; -struct platform_device;
Why include platform_device.h here?
@@ -34,6 +35,18 @@ struct host1x_syncpt_ops { const char * (*name)(struct host1x_syncpt *); };
+struct host1x_intr_ops {
- void (*init_host_sync)(struct host1x_intr *);
- void (*set_host_clocks_per_usec)(
struct host1x_intr *, u32 clocks);
Could the above two not be combined? The only reason to keep them separate would be if the host1x clock was dynamically changed, but I don't think we support that, right?
- void (*set_syncpt_threshold)(
struct host1x_intr *, u32 id, u32 thresh);
- void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
- void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
- void (*disable_all_syncpt_intrs)(struct host1x_intr *);
Can disable_all_syncpt_intrs() not be implemented generically using the number of syncpoints as exposed by host1x_device_info and the .disable_syncpt_intr() function?
@@ -46,11 +59,13 @@ struct host1x_device_info { struct host1x { void __iomem *regs; struct host1x_syncpt *syncpt;
struct host1x_intr intr; struct platform_device *dev; struct host1x_device_info info; struct clk *clk;
struct host1x_syncpt_ops syncpt_op;
struct host1x_intr_ops intr_op;
I think carrying a const pointer to the interrupt operations structure is a better option here.
diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
[...]
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
Can we avoid this forward declaration?
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
syncpt_thresh_work()?
+{
- struct host1x_intr_syncpt *sp =
container_of(work, struct host1x_intr_syncpt, work);
- host1x_syncpt_thresh_fn(sp);
Couldn't we inline the host1x_syncpt_thresh_fn() implementation here? Why do we need to go through an external function declaration?
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id) +{
- struct host1x *host1x = dev_id;
- struct host1x_intr *intr = &host1x->intr;
- unsigned long reg;
- int i, id;
- for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
reg = host1x_sync_readl(host1x,
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
i * REGISTER_STRIDE);
for_each_set_bit(id, &reg, BITS_PER_LONG) {
struct host1x_intr_syncpt *sp =
intr->syncpt + (i * BITS_PER_LONG + id);
host1x_intr_syncpt_thresh_isr(sp);
Have you considered mimicking the IRQ API and naming this something like host1x_intr_syncpt_thresh_handle(), with the actual ISR named just syncpt_thresh_isr()? Not so important, but it makes things a bit clearer in my opinion.
queue_work(intr->wq, &sp->work);
Should the call to queue_work() perhaps be moved into host1x_intr_syncpt_thresh_isr()?
+static void host1x_intr_init_host_sync(struct host1x_intr *intr) +{
- struct host1x *host1x = intr_to_host1x(intr);
- int i, err;
- host1x_sync_writel(host1x, 0xffffffffUL,
HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
- host1x_sync_writel(host1x, 0xffffffffUL,
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
- for (i = 0; i < host1x->info.nb_pts; i++)
INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
- err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
syncpt_thresh_cascade_isr,
IRQF_SHARED, "host1x_syncpt", host1x);
- WARN_ON(IS_ERR_VALUE(err));
Do we really want to continue in this case?
+static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
- u32 id, u32 thresh)
+{
- struct host1x *host1x = intr_to_host1x(intr);
- host1x_sync_writel(host1x, thresh,
HOST1X_SYNC_SYNCPT_INT_THRESH_0 + id * REGISTER_STRIDE);
+}
Again, maybe defining the register stride as part of the register definition might be better. I think HOST1X_SYNC_SYNCPT_INT_THRESH(id) is easier to read.
+static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id) +{
- struct host1x *host1x = intr_to_host1x(intr);
- host1x_sync_writel(host1x, BIT_MASK(id),
HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 +
BIT_WORD(id) * REGISTER_STRIDE);
+}
Same here.
+static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id) +{
- struct host1x *host1x = intr_to_host1x(intr);
- host1x_sync_writel(host1x, BIT_MASK(id),
HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE +
BIT_WORD(id) * REGISTER_STRIDE);
- host1x_sync_writel(host1x, BIT_MASK(id),
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
BIT_WORD(id) * REGISTER_STRIDE);
+}
And here.
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
+#include "intr.h" +#include <linux/interrupt.h> +#include <linux/slab.h> +#include <linux/irq.h> +#include "dev.h"
More funky ordering of includes.
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
enum host1x_intr_action action, void *data,
void *_waiter,
void **ref)
Why do you pass in _waiter as void * and not struct host1x_waitlist *?
I think I've said this before. The interface doesn't seem optimal to me here. Passing in an enumeration to choose which action to perform looks difficult to work with (not to mention the symbols are rather long and therefore result in ugly code).
Maybe doing this by passing around a pointer to a handler function would be nicer. However since I haven't really used this yet, I can't really tell. So maybe we should just merge the implementation as-is for now. We can always clean it up later.
+void *host1x_intr_alloc_waiter(void) +{
- return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
+}
I'm not sure why this is separate from host1x_syncpt_wait() since it is only used inside that function and the waiter returned never leaves the scope of that function, so it might be better to allocate it directly in host1x_syncpt_wait() instead.
Actually, it looks like the waiter doesn't ever leave scope, so you may even want to allocate it on the stack.
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
Here again, you pass in the waiter via a void *. Why's that?
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
Maybe you should keep the type of the irq_sync here so that it properly propagates to the call to devm_request_irq().
+{
- unsigned int id;
- struct host1x *host1x = intr_to_host1x(intr);
- u32 nb_pts = host1x_syncpt_nb_pts(host1x);
- intr->syncpt = devm_kzalloc(&host1x->dev->dev,
sizeof(struct host1x_intr_syncpt) *
host1x->info.nb_pts,
GFP_KERNEL);
- if (!host1x->intr.syncpt)
The above blank line isn't necessary.
+void host1x_intr_stop(struct host1x_intr *intr) +{
- unsigned int id;
- struct host1x *host1x = intr_to_host1x(intr);
- struct host1x_intr_syncpt *syncpt;
- u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
- mutex_lock(&intr->mutex);
- host1x->intr_op.disable_all_syncpt_intrs(intr);
I haven't commented on this everywhere, but I think this could benefit from a wrapper that forwards this to the intr_op. The same goes for the syncpt_op.
- for (id = 0, syncpt = intr->syncpt;
id < nb_pts;
++id, ++syncpt) {
I don't think you need to explicitly keep track of syncpt within the for statement. Instead you could either index intr->syncpt directly or obtain a reference within the loop. It allows the for statement to be written much more canonically.
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
[...]
+#define intr_syncpt_to_intr(is) (is->intr)
This one doesn't buy you anything. It actually uses up more characters so you can just drop it.
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
[...]
@@ -119,6 +122,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp) host1x_syncpt_cpu_incr(sp); }
+/*
- Updated sync point form hardware, and returns true if syncpoint is expired,
- false if we may need to wait
- */
+static bool syncpt_load_min_is_expired(
- struct host1x_syncpt *sp,
- u32 thresh)
This can all go on one line.
+/*
- Main entrypoint for syncpoint value waits.
- */
+int host1x_syncpt_wait(struct host1x_syncpt *sp,
u32 thresh, long timeout, u32 *value)
+{
[...]
+} +EXPORT_SYMBOL(host1x_syncpt_wait);
This doesn't only seem to be the main entrypoint, but it's basically the only way to currently wait for syncpoints. One actual use-case where this might turn out to be a problem is video capturing. The problem is that using this API you can't very well asynchronously capture frames. So eventually I think we need a way to allow a generic handler to be attached to syncpoints so that you can have this handler continuously invoked after each frame is captured and just pass the buffer back to userspace.
+bool host1x_syncpt_is_expired(
- struct host1x_syncpt *sp,
- u32 thresh)
This can go on one line.
Thierry
On 04.02.2013 02:30, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
[...]
@@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
/* set common host1x device data */ platform_set_drvdata(dev, host);
host->regs = devm_request_and_ioremap(&dev->dev, regs); if (!host->regs) { dev_err(&dev->dev, "failed to remap host registers\n");
This seems an unrelated (and actually undesirable) change.
@@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev) }
err = host1x_syncpt_init(host);
if (err)
if (err) {
dev_err(&dev->dev, "failed to init sync points"); return err;
}
This error message should probably have gone in the previous patch as well.
Oops, will move these to previous patch. I'm pretty sure I already fixed this once. :-(
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index d8f5979..8376092 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -17,11 +17,12 @@ #ifndef HOST1X_DEV_H #define HOST1X_DEV_H
+#include <linux/platform_device.h> #include "syncpt.h" +#include "intr.h"
struct host1x; struct host1x_syncpt; -struct platform_device;
Why include platform_device.h here?
host1x_get_host() actually needs that, so this #include should've also been in previous patch.
@@ -34,6 +35,18 @@ struct host1x_syncpt_ops { const char * (*name)(struct host1x_syncpt *); };
+struct host1x_intr_ops {
void (*init_host_sync)(struct host1x_intr *);
void (*set_host_clocks_per_usec)(
struct host1x_intr *, u32 clocks);
Could the above two not be combined? The only reason to keep them separate would be if the host1x clock was dynamically changed, but I don't think we support that, right?
I've left this as a placeholder to at some point start supporting host1x clock scaling. But I don't think we're going to do that for a while, so I could merge them.
void (*set_syncpt_threshold)(
struct host1x_intr *, u32 id, u32 thresh);
void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
void (*disable_all_syncpt_intrs)(struct host1x_intr *);
Can disable_all_syncpt_intrs() not be implemented generically using the number of syncpoints as exposed by host1x_device_info and the .disable_syncpt_intr() function?
disable_all_syncpt_intrs() disables all interrupts in one write (or one per 32 sync points), so it's more efficient.
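To illustrate the efficiency argument, here is a small user-space model, not the driver code. The `enabled` array stands in for the interrupt enable state, write_disable() for a write to the THRESH_INT_DISABLE register, and NB_PTS is an assumed sync point count; all of these names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NB_PTS   32u			/* assumed sync point count */
#define NB_WORDS ((NB_PTS + 31u) / 32u)

static uint32_t enabled[NB_WORDS];	/* modelled interrupt enable state */
static unsigned int nr_writes;		/* counts modelled register writes */

/* Model of a write-1-to-disable register. */
static void write_disable(unsigned int word, uint32_t mask)
{
	enabled[word] &= ~mask;
	nr_writes++;
}

/* Generic variant: one register write per sync point. */
static void disable_all_generic(void)
{
	unsigned int id;

	for (id = 0; id < NB_PTS; id++)
		write_disable(id / 32u, UINT32_C(1) << (id % 32u));
}

/* Bulk variant: one register write per 32 sync points. */
static void disable_all_bulk(void)
{
	unsigned int word;

	for (word = 0; word < NB_WORDS; word++)
		write_disable(word, 0xffffffffu);
}
```

With 32 sync points the generic loop issues 32 writes and the bulk variant issues one, with the same end state.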
@@ -46,11 +59,13 @@ struct host1x_device_info { struct host1x { void __iomem *regs; struct host1x_syncpt *syncpt;
struct host1x_intr intr; struct platform_device *dev; struct host1x_device_info info; struct clk *clk; struct host1x_syncpt_ops syncpt_op;
struct host1x_intr_ops intr_op;
I think carrying a const pointer to the interrupt operations structure is a better option here.
Ok.
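A rough sketch of what carrying a const pointer could look like, reduced to user-space C. The struct and function names here (struct intr, hw01_intr_ops, intr_enable_syncpt() and so on) are made up for the example and are not taken from the patch; the thin wrappers also show one way to hide the ops indirection from callers.

```c
#include <assert.h>
#include <stdint.h>

struct intr;

/* Hypothetical, trimmed-down operations table. */
struct intr_ops {
	void (*enable_syncpt_intr)(struct intr *intr, uint32_t id);
	void (*disable_syncpt_intr)(struct intr *intr, uint32_t id);
};

struct intr {
	const struct intr_ops *ops;	/* shared, read-only table */
	uint32_t enabled;		/* stand-in for the enable register */
};

static void hw01_enable(struct intr *intr, uint32_t id)
{
	intr->enabled |= UINT32_C(1) << id;
}

static void hw01_disable(struct intr *intr, uint32_t id)
{
	intr->enabled &= ~(UINT32_C(1) << id);
}

/* One table per hardware version; it can live in read-only data. */
static const struct intr_ops hw01_intr_ops = {
	.enable_syncpt_intr = hw01_enable,
	.disable_syncpt_intr = hw01_disable,
};

/* Thin wrappers so callers never touch the ops table directly. */
static void intr_enable_syncpt(struct intr *intr, uint32_t id)
{
	intr->ops->enable_syncpt_intr(intr, id);
}

static void intr_disable_syncpt(struct intr *intr, uint32_t id)
{
	intr->ops->disable_syncpt_intr(intr, id);
}
```

Compared with embedding the whole structure, the device only stores one pointer and the function pointers cannot be modified at run time.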
diff --git a/drivers/gpu/host1x/hw/intr_hw.c b/drivers/gpu/host1x/hw/intr_hw.c
[...]
+static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt *syncpt);
Can we avoid this forward declaration?
I think we can, if I just move the isr to top of file.
+static void syncpt_thresh_cascade_fn(struct work_struct *work)
syncpt_thresh_work()?
Sounds good.
+{
struct host1x_intr_syncpt *sp =
container_of(work, struct host1x_intr_syncpt, work);
host1x_syncpt_thresh_fn(sp);
Couldn't we inline the host1x_syncpt_thresh_fn() implementation here? Why do we need to go through an external function declaration?
If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do that. That'd simplify the interrupt path.
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id) +{
struct host1x *host1x = dev_id;
struct host1x_intr *intr = &host1x->intr;
unsigned long reg;
int i, id;
for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
reg = host1x_sync_readl(host1x,
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
i * REGISTER_STRIDE);
for_each_set_bit(id, &reg, BITS_PER_LONG) {
struct host1x_intr_syncpt *sp =
intr->syncpt + (i * BITS_PER_LONG + id);
host1x_intr_syncpt_thresh_isr(sp);
Have you considered mimicking the IRQ API and naming this something like host1x_intr_syncpt_thresh_handle(), with the actual ISR named just syncpt_thresh_isr()? Not so important, but it makes things a bit clearer in my opinion.
This gets a bit confusing, because we have an ISR that calls a function that is also called ISR. I've kept "isr" in names of both to emphasize that this is running in interrupt context. I'm open to renaming these to make it clearer.
Were you referring to the chained IRQ handlers in linux/irq.h when you mentioned the IRQ API as a reference for naming?
queue_work(intr->wq, &sp->work);
Should the call to queue_work() perhaps be moved into host1x_intr_syncpt_thresh_isr()?
I'm not sure, either way would be ok to me. The current structure allows host1x_intr_syncpt_thresh_isr() to only take one parameter (host1x_intr_syncpt). If we move queue_work, we'd also need to pass host1x_intr.
+static void host1x_intr_init_host_sync(struct host1x_intr *intr) +{
struct host1x *host1x = intr_to_host1x(intr);
int i, err;
host1x_sync_writel(host1x, 0xffffffffUL,
HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
host1x_sync_writel(host1x, 0xffffffffUL,
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
for (i = 0; i < host1x->info.nb_pts; i++)
INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
syncpt_thresh_cascade_isr,
IRQF_SHARED, "host1x_syncpt", host1x);
WARN_ON(IS_ERR_VALUE(err));
Do we really want to continue in this case?
Hmm, we'd need to actually return an error code. There's not much the driver can do without syncpt interrupts.
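As a sketch of the direction discussed here, the init hook could return an int and the caller could propagate it. The code below is a user-space model under that assumption; fake_request_irq() is a hypothetical stand-in for devm_request_irq(), and the function names are illustrative only.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for devm_request_irq(): fails for a negative irq. */
static int fake_request_irq(int irq)
{
	return irq < 0 ? -EINVAL : 0;
}

/* Init hook that reports failure instead of only warning about it. */
static int intr_init_host_sync(int irq)
{
	int err;

	err = fake_request_irq(irq);
	if (err < 0)
		return err;

	return 0;
}

/* The caller stops early if the interrupt could not be requested. */
static int intr_start(int irq)
{
	int err;

	err = intr_init_host_sync(irq);
	if (err < 0)
		return err;

	/* ... program clocks-per-usec and other state here ... */
	return 0;
}
```

This would also require host1x_intr_start() and the corresponding ops member to change from void to int.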
+static void host1x_intr_set_syncpt_threshold(struct host1x_intr *intr,
u32 id, u32 thresh)
+{
struct host1x *host1x = intr_to_host1x(intr);
host1x_sync_writel(host1x, thresh,
HOST1X_SYNC_SYNCPT_INT_THRESH_0 + id * REGISTER_STRIDE);
+}
Again, maybe defining the register stride as part of the register definition might be better. I think HOST1X_SYNC_SYNCPT_INT_THRESH(id) is easier to read.
Sounds good.
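A minimal illustration of folding the stride into the register definition. The base offset and stride values below are placeholders chosen for the example, not the real hardware values; those come from the hardware headers.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values; the real ones live in the hardware headers. */
#define REGISTER_STRIDE			4u
#define HOST1X_SYNC_SYNCPT_INT_THRESH_0	0x500u

/* Parameterized form: callers no longer spell out the stride. */
#define HOST1X_SYNC_SYNCPT_INT_THRESH(id) \
	(HOST1X_SYNC_SYNCPT_INT_THRESH_0 + (id) * REGISTER_STRIDE)

/* Example call site: compute the threshold register offset for a syncpt. */
static uint32_t syncpt_thresh_offset(uint32_t id)
{
	return HOST1X_SYNC_SYNCPT_INT_THRESH(id);
}
```

The parentheses around `id` in the macro keep expressions such as `HOST1X_SYNC_SYNCPT_INT_THRESH(i + 1)` correct.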
+static void host1x_intr_enable_syncpt_intr(struct host1x_intr *intr, u32 id) +{
struct host1x *host1x = intr_to_host1x(intr);
host1x_sync_writel(host1x, BIT_MASK(id),
HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 +
BIT_WORD(id) * REGISTER_STRIDE);
+}
Same here.
Yep.
+static void host1x_intr_disable_syncpt_intr(struct host1x_intr *intr, u32 id) +{
struct host1x *host1x = intr_to_host1x(intr);
host1x_sync_writel(host1x, BIT_MASK(id),
HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE +
BIT_WORD(id) * REGISTER_STRIDE);
host1x_sync_writel(host1x, BIT_MASK(id),
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
BIT_WORD(id) * REGISTER_STRIDE);
+}
And here.
Yep.
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
+#include "intr.h" +#include <linux/interrupt.h> +#include <linux/slab.h> +#include <linux/irq.h> +#include "dev.h"
More funky ordering of includes.
Will fix.
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
enum host1x_intr_action action, void *data,
void *_waiter,
void **ref)
Why do you pass in _waiter as void * and not struct host1x_waitlist *?
struct host1x_waitlist is defined inside intr.c, so I've chosen to pass void *. I could naturally just forward declare host1x_waitlist in intr.h and change the allocation and add_action to use that.
I think I've said this before. The interface doesn't seem optimal to me here. Passing in an enumeration to choose which action to perform looks difficult to work with (not to mention the symbols are rather long and therefore result in ugly code).
Maybe doing this by passing around a pointer to a handler function would be nicer. However since I haven't really used this yet, I can't really tell. So maybe we should just merge the implementation as-is for now. We can always clean it up later.
We're using the enum also to index into arrays. We do it so that we can remove all the completed waiters from the wait_head, and insert them into lists per action type. This way we can run all actions in priority order: first action_submit_complete, then action_wakeup, and then action_wakeup_interruptible.
Now, we've recently noticed that the priority order is actually wrong. The first priority should be to wake up non-interruptible tasks, then interruptible tasks. Cleaning up memory of completed submits should be lower priority.
I've considered this part as something private to the host1x driver, and it's not really meant to be called from, e.g., DRM. But as you seem to need an asynchronous wait for a fence, we'd need to figure something out for that.
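The per-action lists described above can be sketched roughly as follows. This is a simplified user-space model, not the intr.c code: the enum values, struct waiter, and both functions are hypothetical, and the enum order reflects the corrected priority (wakeups first, submit cleanup last).

```c
#include <assert.h>
#include <stddef.h>

/* Enum order doubles as priority order and as the array index. */
enum action {
	ACTION_WAKEUP,
	ACTION_WAKEUP_INTERRUPTIBLE,
	ACTION_SUBMIT_COMPLETE,
	ACTION_COUNT
};

struct waiter {
	enum action action;
	int tag;		/* identifies the waiter in this example */
	struct waiter *next;
};

/* Move completed waiters into one list per action type. */
static void bucket_waiters(struct waiter *completed,
			   struct waiter *heads[ACTION_COUNT])
{
	struct waiter *tails[ACTION_COUNT] = { NULL };
	int a;

	for (a = 0; a < ACTION_COUNT; a++)
		heads[a] = NULL;

	while (completed) {
		struct waiter *w = completed;

		completed = w->next;
		w->next = NULL;
		if (tails[w->action])
			tails[w->action]->next = w;
		else
			heads[w->action] = w;
		tails[w->action] = w;
	}
}

/* Run the lists in priority order, recording the order of execution. */
static size_t run_actions(struct waiter *heads[ACTION_COUNT],
			  int *order, size_t max)
{
	size_t n = 0;
	int a;

	for (a = 0; a < ACTION_COUNT; a++)
		for (struct waiter *w = heads[a]; w && n < max; w = w->next)
			order[n++] = w->tag;

	return n;
}
```

A handler-pointer interface would need some other way to express this ordering, for example an explicit priority field per waiter.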
+void *host1x_intr_alloc_waiter(void) +{
return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
+}
I'm not sure why this is separate from host1x_syncpt_wait() since it is only used inside that function and the waiter returned never leaves the scope of that function, so it might be better to allocate it directly in host1x_syncpt_wait() instead.
Actually, it looks like the waiter doesn't ever leave scope, so you may even want to allocate it on the stack.
In patch 3, at submit time we first allocate the waiter, then take submit_lock, write the submit to the channel, and add the waiter while holding the lock. I did this so that host1x_intr_add_action() can always succeed. Otherwise I'd need to write another code path to handle the case where we wrote a job to the channel but were not able to add a submit_complete action to it.
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
Here again, you pass in the waiter via a void *. Why's that?
host1x_waitlist is hidden inside intr.c.
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
Maybe you should keep the type of the irq_sync here so that it properly propagates to the call to devm_request_irq().
I'm not sure what you mean. Do you mean that I should use unsigned int, as that's the type used in devm_request_irq()?
+{
unsigned int id;
struct host1x *host1x = intr_to_host1x(intr);
u32 nb_pts = host1x_syncpt_nb_pts(host1x);
intr->syncpt = devm_kzalloc(&host1x->dev->dev,
sizeof(struct host1x_intr_syncpt) *
host1x->info.nb_pts,
GFP_KERNEL);
if (!host1x->intr.syncpt)
The above blank line isn't necessary.
Will remove.
+void host1x_intr_stop(struct host1x_intr *intr) +{
unsigned int id;
struct host1x *host1x = intr_to_host1x(intr);
struct host1x_intr_syncpt *syncpt;
u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
mutex_lock(&intr->mutex);
host1x->intr_op.disable_all_syncpt_intrs(intr);
I haven't commented on this everywhere, but I think this could benefit from a wrapper that forwards this to the intr_op. The same goes for the syncpt_op.
You mean something like "host1x_disable_all_syncpt_intrs"?
for (id = 0, syncpt = intr->syncpt;
id < nb_pts;
++id, ++syncpt) {
I don't think you need to explicitly keep track of syncpt within the for statement. Instead you could either index intr->syncpt directly or obtain a reference within the loop. It allows the for statement to be written much more canonically.
Yep, will do.
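For illustration, the canonical form of such a loop might look like the sketch below. The struct and its `pending` field are invented for the example; only the loop shape is the point.

```c
#include <assert.h>

/* Hypothetical per-syncpoint state for the example. */
struct intr_syncpt {
	unsigned int id;
	int pending;
};

/* Count sync points with pending waiters using a plain indexed loop. */
static unsigned int count_pending(struct intr_syncpt *syncpts,
				  unsigned int nb_pts)
{
	unsigned int id, count = 0;

	for (id = 0; id < nb_pts; id++) {
		/* Obtain the reference inside the loop body. */
		struct intr_syncpt *syncpt = &syncpts[id];

		if (syncpt->pending)
			count++;
	}

	return count;
}
```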
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
[...]
+#define intr_syncpt_to_intr(is) (is->intr)
This one doesn't buy you anything. It actually uses up more characters so you can just drop it.
True, it's useless. I'll remove.
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
[...]
@@ -119,6 +122,166 @@ void host1x_syncpt_incr(struct host1x_syncpt *sp) host1x_syncpt_cpu_incr(sp); }
+/*
- Updated sync point form hardware, and returns true if syncpoint is expired,
- false if we may need to wait
- */
+static bool syncpt_load_min_is_expired(
struct host1x_syncpt *sp,
u32 thresh)
This can all go on one line.
Ok.
+/*
- Main entrypoint for syncpoint value waits.
- */
+int host1x_syncpt_wait(struct host1x_syncpt *sp,
u32 thresh, long timeout, u32 *value)
+{
[...]
+} +EXPORT_SYMBOL(host1x_syncpt_wait);
This doesn't only seem to be the main entrypoint, but it's basically the only way to currently wait for syncpoints. One actual use-case where this might turn out to be a problem is video capturing. The problem is that using this API you can't very well asynchronously capture frames. So eventually I think we need a way to allow a generic handler to be attached to syncpoints so that you can have this handler continuously invoked after each frame is captured and just pass the buffer back to userspace.
Yep, so far all asynchronous waits have been done in user space. We would probably allow attaching a handler to a syncpt value, so that we'd call that handler once a value is reached. In effect, similar to a wake_up event that is now added via host1x_intr_add_action, but simpler. That'd mean that the handler needs to be re-added after each frame.
We could also add the handler as persistent if re-adding would be a problem. That'd require some new wiring and I'll have to think how to implement that.
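One possible shape for such a handler, sketched in user-space C under the assumption that a notifier carries a threshold and an optional re-arm step. None of these names (struct syncpt_notifier, syncpt_notify(), count_calls()) exist in the patch set; this only illustrates the one-shot versus persistent distinction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef void (*syncpt_handler_t)(uint32_t thresh, void *data);

struct syncpt_notifier {
	uint32_t thresh;	/* value at which the handler fires */
	uint32_t step;		/* 0 = one-shot, else re-arm at thresh + step */
	bool armed;
	syncpt_handler_t fn;
	void *data;
};

/* Called with the current sync point value, e.g. from the work item. */
static void syncpt_notify(struct syncpt_notifier *n, uint32_t value)
{
	/* Wrap-safe comparison, as for client-managed sync points. */
	while (n->armed && (int32_t)(value - n->thresh) >= 0) {
		n->fn(n->thresh, n->data);
		if (n->step)
			n->thresh += n->step;	/* persistent: re-arm */
		else
			n->armed = false;	/* one-shot: disarm */
	}
}

/* Example handler: counts how many times it was invoked. */
static void count_calls(uint32_t thresh, void *data)
{
	(void)thresh;
	++*(unsigned int *)data;
}
```

With step set to 1, a capture driver would get one callback per frame without having to re-add the handler each time.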
+bool host1x_syncpt_is_expired(
struct host1x_syncpt *sp,
u32 thresh)
This can go on one line.
Will join.
Terje
On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
On 04.02.2013 02:30, Thierry Reding wrote:
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index d8f5979..8376092 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -17,11 +17,12 @@ #ifndef HOST1X_DEV_H #define HOST1X_DEV_H
+#include <linux/platform_device.h> #include "syncpt.h" +#include "intr.h"
struct host1x; struct host1x_syncpt; -struct platform_device;
Why include platform_device.h here?
host1x_get_host() actually needs that, so this #include should've also been in previous patch.
No need to if you pass struct device * instead. You might need linux/device.h instead, though.
void (*set_syncpt_threshold)(
struct host1x_intr *, u32 id, u32 thresh);
void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
void (*disable_all_syncpt_intrs)(struct host1x_intr *);
Can disable_all_syncpt_intrs() not be implemented generically using the number of syncpoints as exposed by host1x_device_info and the .disable_syncpt_intr() function?
disable_all_syncpt_intrs() disables all interrupts in one write (or one per 32 sync points), so it's more efficient.
Yes, I noticed that and failed to remove this comment.
+{
struct host1x_intr_syncpt *sp =
container_of(work, struct host1x_intr_syncpt, work);
host1x_syncpt_thresh_fn(sp);
Couldn't we inline the host1x_syncpt_thresh_fn() implementation here? Why do we need to go through an external function declaration?
If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do that. That'd simplify the interrupt path.
I like simplification. =)
+static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id) +{
struct host1x *host1x = dev_id;
struct host1x_intr *intr = &host1x->intr;
unsigned long reg;
int i, id;
for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
reg = host1x_sync_readl(host1x,
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
i * REGISTER_STRIDE);
for_each_set_bit(id, &reg, BITS_PER_LONG) {
struct host1x_intr_syncpt *sp =
intr->syncpt + (i * BITS_PER_LONG + id);
host1x_intr_syncpt_thresh_isr(sp);
Have you considered mimicking the IRQ API and name this something like host1x_intr_syncpt_thresh_handle() and name the actual ISR just syncpt_thresh_isr()? Not so important but it makes things a bit clearer in my opinion.
This gets a bit confusing, because we have an ISR that calls a function that is also called ISR. I've kept "isr" in names of both to emphasize that this is running in interrupt context. I'm open to renaming these to make it clearer.
Did you refer to chained IRQ handler in linux/irq.h when you mentioned IRQ API as reference for naming?
What I had in mind was more along the lines of kernel/irq/chip.c, which has a bunch of handlers for various types of interrupts, such as handle_nested_irq() or handle_simple_irq().
Hence my proposal to rename host1x_intr_syncpt_thresh_isr() to host1x_intr_syncpt_handle() because it handles the interrupt from a single syncpoint and syncpt_thresh_cascade_isr() to syncpt_thresh_isr() to keep it shorter.
Another variant would be host1x_syncpt_irq() for the top-level handler and something like host1x_handle_syncpt() to handle individual syncpoints. I like this one best, but this is pure bike-shedding and there's nothing technically wrong with the names you chose, so I can't really object if you want to stick to them.
queue_work(intr->wq, &sp->work);
Should the call to queue_work() perhaps be moved into host1x_intr_syncpt_thresh_isr()?
I'm not sure, either way would be ok to me. The current structure allows host1x_intr_syncpt_thresh_isr() to only take one parameter (host1x_intr_syncpt). If we move queue_work, we'd also need to pass host1x_intr.
I think I'd still prefer to have all the code in one function because it makes subsequent modification easier and less error-prone.
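As a rough sketch of the shape being discussed (a top-level handler plus a per-syncpoint handler that also does the queueing), here is a user-space model. The function names follow the suggestion above; struct host, struct syncpt_intr, the simulated status words and the "queued" counter standing in for queue_work() are assumptions made for the illustration, as is the acknowledge step.

```c
#include <stdint.h>

#define NB_PTS 64u

struct syncpt_intr {
	unsigned int id;
	unsigned int queued;	/* times work was queued for this syncpoint */
};

struct host {
	uint32_t thresh_status[NB_PTS / 32];	/* simulated status registers */
	struct syncpt_intr syncpt[NB_PTS];
};

/* Handles one syncpoint; the queueing lives here, not in the caller. */
static void host1x_handle_syncpt(struct host *host, struct syncpt_intr *sp)
{
	/* Assumed step: acknowledge the interrupt before deferring the work. */
	host->thresh_status[sp->id / 32] &= ~(UINT32_C(1) << (sp->id % 32));
	sp->queued++;		/* stands in for queue_work() */
}

/* Top-level handler: scan each status word and dispatch every set bit. */
void host1x_syncpt_irq(struct host *host)
{
	for (unsigned int i = 0; i < NB_PTS / 32; i++) {
		uint32_t reg = host->thresh_status[i];

		for (unsigned int bit = 0; bit < 32; bit++)
			if (reg & (UINT32_C(1) << bit))
				host1x_handle_syncpt(host,
						     &host->syncpt[i * 32 + bit]);
	}
}
```

The cost noted in the thread is visible here: the per-syncpoint handler now needs the host pointer as a second parameter.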
+static void host1x_intr_init_host_sync(struct host1x_intr *intr)
+{
struct host1x *host1x = intr_to_host1x(intr);
int i, err;
host1x_sync_writel(host1x, 0xffffffffUL,
HOST1X_SYNC_SYNCPT_THRESH_INT_DISABLE);
host1x_sync_writel(host1x, 0xffffffffUL,
HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS);
for (i = 0; i < host1x->info.nb_pts; i++)
INIT_WORK(&intr->syncpt[i].work, syncpt_thresh_cascade_fn);
err = devm_request_irq(&host1x->dev->dev, intr->syncpt_irq,
syncpt_thresh_cascade_isr,
IRQF_SHARED, "host1x_syncpt", host1x);
WARN_ON(IS_ERR_VALUE(err));
Do we really want to continue in this case?
Hmm, we'd need to actually return an error code. There's not much the driver can do without syncpt interrupts.
Yeah, in that case I think we should bail out. It's not like we're expecting any failures. If the interrupt cannot be requested, something must seriously be wrong and we should tell users about it so that it can be fixed. Trying to continue on a best effort basis isn't useful here, I think.
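A minimal sketch of the bail-out variant, assuming the init function is changed to return int. It runs in user space: fake_request_irq() is a hypothetical stand-in for devm_request_irq(), and the failure condition (irq 0) exists only so that the error path can be exercised.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for devm_request_irq(); fails for irq 0. */
static int fake_request_irq(unsigned int irq)
{
	return irq == 0 ? -EINVAL : 0;
}

/* Returns 0 on success or a negative errno so that the caller can abort. */
int intr_init_host_sync(unsigned int irq)
{
	int err = fake_request_irq(irq);

	if (err < 0) {
		/* Report the failure instead of warning and carrying on. */
		fprintf(stderr, "failed to request syncpt IRQ %u: %d\n",
			irq, err);
		return err;
	}

	return 0;
}
```

The caller would then propagate the error up to probe, so that the driver does not come up without syncpoint interrupts.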
+int host1x_intr_add_action(struct host1x_intr *intr, u32 id, u32 thresh,
enum host1x_intr_action action, void *data,
void *_waiter,
void **ref)
Why do you pass in _waiter as void * and not struct host1x_waitlist *?
struct host1x_waitlist is defined inside intr.c, so I've chosen to pass void *. I could naturally just forward declare host1x_waitlist in intr.h and change the allocation and add_action to use that.
Yes, that's definitely better.
I think I've said this before. The interface doesn't seem optimal to me here. Passing in an enumeration to choose which action to perform looks difficult to work with (not to mention the symbols are rather long and therefore result in ugly code).
Maybe doing this by passing around a pointer to a handler function would be nicer. However since I haven't really used this yet, I can't really tell. So maybe we should just merge the implementation as-is for now. We can always clean it up later.
We're using the enum also to index into arrays. We do it so that we can remove all the completed waiters from the wait_head, and insert them into lists per action type. This way we can run all actions in priority order: first action_submit_complete, then action_wakeup, and then action_wakeup_interruptible.
Now, we've recently noticed that the priority order is actually wrong. The first priority should be to wake up non-interruptible tasks, then interruptible tasks. Cleaning up memory of completed submits should be lower priority.
I've considered this part as something private to host1x driver and it's not really meant to be called f.ex. from DRM. But, as you seem to have a need to have an asynchronous wait for a fence, we'd need to figure something out for that.
Okay, let's keep it as-is for now and see how it can be improved later when we have an actual use-case for using it externally.
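The enum-indexed lists described above can be sketched in user space as follows. This is an illustration, not the intr.c implementation: the struct layout, the singly linked list and all function names are hypothetical, and the enumerator order follows the corrected priority mentioned in the thread (wake-ups before submit cleanup) rather than the order in the posted patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Enumerators double as array indices; a lower value runs first. */
enum action {
	ACTION_WAKEUP,
	ACTION_WAKEUP_INTERRUPTIBLE,
	ACTION_SUBMIT_COMPLETE,
	ACTION_COUNT
};

struct waiter {
	enum action action;
	uint32_t thresh;
	struct waiter *next;
};

/* Move waiters whose threshold has been reached from *head to done[action]. */
void remove_completed(struct waiter **head, uint32_t value,
		      struct waiter *done[ACTION_COUNT])
{
	struct waiter **pp = head;

	while (*pp) {
		struct waiter *w = *pp;

		if ((int32_t)(value - w->thresh) >= 0) { /* wrap-safe compare */
			*pp = w->next;
			w->next = done[w->action];
			done[w->action] = w;
		} else {
			pp = &w->next;
		}
	}
}

/*
 * Visit completed waiters bucket by bucket, highest priority first, and
 * record the visiting order in out[]. Returns the number of waiters stored.
 */
size_t run_in_priority_order(struct waiter *done[ACTION_COUNT],
			     struct waiter *out[], size_t max)
{
	size_t n = 0;

	for (int a = 0; a < ACTION_COUNT; a++)
		for (struct waiter *w = done[a]; w && n < max; w = w->next)
			out[n++] = w;

	return n;
}
```

Because the priority is encoded in the enumerator order, changing it means reordering the enum, which is one reason a plain handler pointer would not be a drop-in replacement.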
+void *host1x_intr_alloc_waiter(void)
+{
return kzalloc(sizeof(struct host1x_waitlist), GFP_KERNEL);
+}
I'm not sure why this is separate from host1x_syncpt_wait() since it is only used inside that function and the waiter returned never leaves the scope of that function, so it might be better to allocate it directly in host1x_syncpt_wait() instead.
Actually, it looks like the waiter doesn't ever leave scope, so you may even want to allocate it on the stack.
In patch 3, at submit time we first allocate the waiter, then take submit_lock, write the submit to the channel, and add the waiter while holding the lock. I did this so that host1x_intr_add_action() can always succeed. Otherwise I'd need to write another code path to handle the case where we wrote a job to channel, but we're not able to add a submit_complete action to it.
Okay. In that case why not allocate it on the stack in the first place so you don't have to bother with allocations (and potential failure) at all? The variable doesn't leave the function scope, so there shouldn't be any issues, right?
Or if that doesn't work it would still be preferable to allocate memory in host1x_syncpt_wait() directly instead of going through the wrapper.
+void host1x_intr_put_ref(struct host1x_intr *intr, u32 id, void *ref)
Here again, you pass in the waiter via a void *. Why's that?
host1x_waitlist is hidden inside intr.c.
I don't think that's necessary here. I'd rather have the compiler check for types rather than hide the structure.
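For illustration, a forward declaration is enough to get typed pointers in the prototypes while the definition stays in the implementation file. The sketch below puts both halves in one user-space file; the waiter_*() helpers and the single thresh field are hypothetical and are not taken from the patch.

```c
#include <stdlib.h>

/* Header part: a forward declaration suffices for pointer parameters. */
struct host1x_waitlist;

struct host1x_waitlist *waiter_alloc(void);
void waiter_set_thresh(struct host1x_waitlist *waiter, unsigned int thresh);
unsigned int waiter_thresh(const struct host1x_waitlist *waiter);

/* Implementation part: the full definition is only needed here. */
struct host1x_waitlist {
	unsigned int thresh;
};

struct host1x_waitlist *waiter_alloc(void)
{
	return calloc(1, sizeof(struct host1x_waitlist));
}

void waiter_set_thresh(struct host1x_waitlist *waiter, unsigned int thresh)
{
	waiter->thresh = thresh;
}

unsigned int waiter_thresh(const struct host1x_waitlist *waiter)
{
	return waiter->thresh;
}
```

With these prototypes, passing an unrelated pointer type draws a compiler diagnostic, which a void * parameter would silently accept.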
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
Maybe you should keep the type of the irq_sync here so that it properly propagates to the call to devm_request_irq().
I'm not sure what you mean. Do you mean that I should use unsigned int, as that's the type used in devm_request_irq()?
Yes.
+void host1x_intr_stop(struct host1x_intr *intr)
+{
unsigned int id;
struct host1x *host1x = intr_to_host1x(intr);
struct host1x_intr_syncpt *syncpt;
u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
mutex_lock(&intr->mutex);
host1x->intr_op.disable_all_syncpt_intrs(intr);
I haven't commented on this everywhere, but I think this could benefit from a wrapper that forwards this to the intr_op. The same goes for the sync_op.
You mean something like "host1x_disable_all_syncpt_intrs"?
Yes. I think that'd be useful for each of the op functions. Perhaps you could even pass in a struct host1x * to make calls more uniform.
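A wrapper of the kind suggested here might look roughly like the following user-space sketch. The wrapper name comes from the exchange above; the reduced struct host1x, the intrs_enabled field and the host1x01_* backend are assumptions made so that the example is self-contained.

```c
struct host1x;

/* Per-hardware-version operations, reduced to the one op discussed here. */
struct host1x_intr_ops {
	void (*disable_all_syncpt_intrs)(struct host1x *host);
};

struct host1x {
	struct host1x_intr_ops intr_op;
	unsigned int intrs_enabled;	/* hypothetical state for the demo */
};

/* Wrapper: callers no longer reach into intr_op themselves. */
static inline void host1x_disable_all_syncpt_intrs(struct host1x *host)
{
	host->intr_op.disable_all_syncpt_intrs(host);
}

/* Hypothetical backend for one hardware version. */
static void host1x01_disable_all_syncpt_intrs(struct host1x *host)
{
	host->intrs_enabled = 0;
}

const struct host1x_intr_ops host1x01_intr_ops = {
	.disable_all_syncpt_intrs = host1x01_disable_all_syncpt_intrs,
};
```

A call site such as host1x_intr_stop() would then read host1x_disable_all_syncpt_intrs(host1x) instead of dereferencing the op table directly.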
+/*
+ * Main entrypoint for syncpoint value waits.
+ */
+int host1x_syncpt_wait(struct host1x_syncpt *sp,
u32 thresh, long timeout, u32 *value)
+{
[...]
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
This doesn't only seem to be the main entrypoint, but it's basically the only way to currently wait for syncpoints. One actual use-case where this might turn out to be a problem is video capturing. The problem is that using this API you can't very well asynchronously capture frames. So eventually I think we need a way to allow a generic handler to be attached to syncpoints so that you can have this handler continuously invoked after each frame is captured and just pass the buffer back to userspace.
Yep, so far all asynchronous waits have been done in user space. We would probably allow attaching a handler to a syncpt value, so that we'd call that handler once a value is reached. In effect, similar to a wake_up event that is now added via host1x_intr_add_action, but simpler. That'd mean that the handler needs to be re-added after each frame.
We could also add the handler as persistent if re-adding would be a problem. That'd require some new wiring and I'll have to think how to implement that.
Yes, that sounds like what I had in mind. Again, no need to worry about it now. We can cross that bridge when we come to it.
Thierry
On 05.02.2013 00:42, Thierry Reding wrote:
On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
host1x_get_host() actually needs that, so this #include should've also been in previous patch.
No need to if you pass struct device * instead. You might need linux/device.h instead, though.
Can do.
Another variant would be host1x_syncpt_irq() for the top-level handler and something like host1x_handle_syncpt() to handle individual syncpoints. I like this one best, but this is pure bike-shedding and there's nothing technically wrong with the names you chose, so I can't really object if you want to stick to them.
I could use these names. They sound logical to me, too.
queue_work(intr->wq, &sp->work);
Should the call to queue_work() perhaps be moved into host1x_intr_syncpt_thresh_isr()?
I'm not sure, either way would be ok to me. The current structure allows host1x_intr_syncpt_thresh_isr() to only take one parameter (host1x_intr_syncpt). If we move queue_work, we'd also need to pass host1x_intr.
I think I'd still prefer to have all the code in one function because it makes subsequent modification easier and less error-prone.
Ok, I'll do that change.
Yeah, in that case I think we should bail out. It's not like we're expecting any failures. If the interrupt cannot be requested, something must seriously be wrong and we should tell users about it so that it can be fixed. Trying to continue on a best effort basis isn't useful here, I think.
Yep, I agree.
In patch 3, at submit time we first allocate the waiter, then take submit_lock, write the submit to the channel, and add the waiter while holding the lock. I did this so that host1x_intr_add_action() can always succeed. Otherwise I'd need to write another code path to handle the case where we wrote a job to channel, but we're not able to add a submit_complete action to it.
Okay. In that case why not allocate it on the stack in the first place so you don't have to bother with allocations (and potential failure) at all? The variable doesn't leave the function scope, so there shouldn't be any issues, right?
The submit code in patch 3 allocates a waiter, and the waiter outlives the function scope. That waiter will clean up job queue once a job is complete.
Or if that doesn't work it would still be preferable to allocate memory in host1x_syncpt_wait() directly instead of going through the wrapper.
This was done purely because I'm hiding the struct size from the caller. If the caller needs to allocate, I need to expose the struct in a header, not just a forward declaration.
+int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
Maybe you should keep the type of the irq_sync here so that it properly propagates to the call to devm_request_irq().
I'm not sure what you mean. Do you mean that I should use unsigned int, as that's the type used in devm_request_irq()?
Yes.
Ok, will do.
+void host1x_intr_stop(struct host1x_intr *intr)
+{
unsigned int id;
struct host1x *host1x = intr_to_host1x(intr);
struct host1x_intr_syncpt *syncpt;
u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
mutex_lock(&intr->mutex);
host1x->intr_op.disable_all_syncpt_intrs(intr);
I haven't commented on this everywhere, but I think this could benefit from a wrapper that forwards this to the intr_op. The same goes for the sync_op.
You mean something like "host1x_disable_all_syncpt_intrs"?
Yes. I think that'd be useful for each of the op functions. Perhaps you could even pass in a struct host1x * to make calls more uniform.
Ok, I'll add the wrapper, and I'll check if passing struct host1x * would make sense. In effect that'd render struct host1x_intr mostly unused, so how about if we just merge the contents of host1x_intr to host1x?
Terje
On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
On 05.02.2013 00:42, Thierry Reding wrote:
[...]
Or if that doesn't work it would still be preferable to allocate memory in host1x_syncpt_wait() directly instead of going through the wrapper.
This was done purely because I'm hiding the struct size from the caller. If the caller needs to allocate, I need to expose the struct in a header, not just a forward declaration.
I don't think we need to hide the struct from the caller. This is all host1x internal. Even if a host1x client uses the struct it makes little sense to hide it. They are all part of the same code base so there's not much to be gained by hiding the structure definition.
+void host1x_intr_stop(struct host1x_intr *intr)
+{
unsigned int id;
struct host1x *host1x = intr_to_host1x(intr);
struct host1x_intr_syncpt *syncpt;
u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
mutex_lock(&intr->mutex);
host1x->intr_op.disable_all_syncpt_intrs(intr);
I haven't commented on this everywhere, but I think this could benefit from a wrapper that forwards this to the intr_op. The same goes for the sync_op.
You mean something like "host1x_disable_all_syncpt_intrs"?
Yes. I think that'd be useful for each of the op functions. Perhaps you could even pass in a struct host1x * to make calls more uniform.
Ok, I'll add the wrapper, and I'll check if passing struct host1x * would make sense. In effect that'd render struct host1x_intr mostly unused, so how about if we just merge the contents of host1x_intr to host1x?
We can probably do that. It might make some sense to keep it in order to scope the related fields but struct host1x isn't very large yet, so I think omitting host1x_intr should be fine.
Thierry
On 06.02.2013 12:38, Thierry Reding wrote:
On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
This was done purely because I'm hiding the struct size from the caller. If the caller needs to allocate, I need to expose the struct in a header, not just a forward declaration.
I don't think we need to hide the struct from the caller. This is all host1x internal. Even if a host1x client uses the struct it makes little sense to hide it. They are all part of the same code base so there's not much to be gained by hiding the structure definition.
I agree, and will change.
Ok, I'll add the wrapper, and I'll check if passing struct host1x * would make sense. In effect that'd render struct host1x_intr mostly unused, so how about if we just merge the contents of host1x_intr to host1x?
We can probably do that. It might make some sense to keep it in order to scope the related fields but struct host1x isn't very large yet, so I think omitting host1x_intr should be fine.
Yes, it's not very large, and it'd remove a lot of casting between host1x and host1x_intr, so I'll just do that.
Terje
Add support for host1x client modules, and host1x channels to submit work to the clients. The work is submitted in GEM CMA buffers, so this patch adds support for them.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Kconfig                  |  25 +-
 drivers/gpu/host1x/Makefile                 |   5 +
 drivers/gpu/host1x/cdma.c                   | 439 +++++++++++++++++++
 drivers/gpu/host1x/cdma.h                   | 107 +++++
 drivers/gpu/host1x/channel.c                | 140 ++++++
 drivers/gpu/host1x/channel.h                |  58 +++
 drivers/gpu/host1x/cma.c                    | 116 +++++
 drivers/gpu/host1x/cma.h                    |  43 ++
 drivers/gpu/host1x/dev.c                    |  13 +
 drivers/gpu/host1x/dev.h                    |  59 +++
 drivers/gpu/host1x/host1x.h                 |  29 ++
 drivers/gpu/host1x/hw/cdma_hw.c             | 475 +++++++++++++++++++++
 drivers/gpu/host1x/hw/cdma_hw.h             |  37 ++
 drivers/gpu/host1x/hw/channel_hw.c          | 148 +++++++
 drivers/gpu/host1x/hw/host1x01.c            |   6 +
 drivers/gpu/host1x/hw/host1x01_hardware.h   | 124 ++++++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h | 102 +++++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    |  12 +
 drivers/gpu/host1x/hw/hw_host1x01_uclass.h  | 168 ++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |  10 +
 drivers/gpu/host1x/intr.c                   |  29 +-
 drivers/gpu/host1x/intr.h                   |   6 +
 drivers/gpu/host1x/job.c                    | 612 +++++++++++++++++++++++++++
 drivers/gpu/host1x/job.h                    | 164 +++++++
 drivers/gpu/host1x/memmgr.c                 | 173 ++++++++
 drivers/gpu/host1x/memmgr.h                 |  72 ++++
 drivers/gpu/host1x/syncpt.c                 |  11 +
 drivers/gpu/host1x/syncpt.h                 |   4 +
 include/trace/events/host1x.h               | 211 +++++++++
 29 files changed, 3396 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/host1x/cdma.c
 create mode 100644 drivers/gpu/host1x/cdma.h
 create mode 100644 drivers/gpu/host1x/channel.c
 create mode 100644 drivers/gpu/host1x/channel.h
 create mode 100644 drivers/gpu/host1x/cma.c
 create mode 100644 drivers/gpu/host1x/cma.h
 create mode 100644 drivers/gpu/host1x/host1x.h
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.c
 create mode 100644 drivers/gpu/host1x/hw/cdma_hw.h
 create mode 100644 drivers/gpu/host1x/hw/channel_hw.c
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x01_uclass.h
 create mode 100644 drivers/gpu/host1x/job.c
 create mode 100644 drivers/gpu/host1x/job.h
 create mode 100644 drivers/gpu/host1x/memmgr.c
 create mode 100644 drivers/gpu/host1x/memmgr.h
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index e89fb2b..57680a6 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -3,4 +3,27 @@ config TEGRA_HOST1X
 	help
 	  Driver for the Tegra host1x hardware.

-	  Required for enabling tegradrm.
+	  Required for enabling tegradrm and 2D acceleration.
+
+if TEGRA_HOST1X
+
+config TEGRA_HOST1X_CMA
+	bool "Support DRM CMA buffers"
+	depends on DRM
+	default y
+	select DRM_GEM_CMA_HELPER
+	select DRM_KMS_CMA_HELPER
+	help
+	  Say yes if you wish to use DRM CMA buffers.
+
+	  If unsure, choose Y.
+
+config TEGRA_HOST1X_FIREWALL
+	bool "Enable HOST1X security firewall"
+	default y
+	help
+	  Say yes if kernel should protect command streams from tampering.
+
+	  If unsure, choose Y.
+
+endif
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 5ef47ff..cdd87c8 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -4,6 +4,11 @@ host1x-y = \
 	syncpt.o \
 	dev.o \
 	intr.o \
+	cdma.o \
+	channel.o \
+	job.o \
+	memmgr.o \
 	hw/host1x01.o
+host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c new file mode 100644 index 0000000..d6a38d2 --- /dev/null +++ b/drivers/gpu/host1x/cdma.c @@ -0,0 +1,439 @@ +/* + * Tegra host1x Command DMA + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include "cdma.h" +#include "channel.h" +#include "dev.h" +#include "memmgr.h" +#include "job.h" +#include <asm/cacheflush.h> + +#include <linux/slab.h> +#include <linux/kfifo.h> +#include <linux/interrupt.h> +#include <trace/events/host1x.h> + +#define TRACE_MAX_LENGTH 128U + +/* + * Add an entry to the sync queue. + */ +static void add_to_sync_queue(struct host1x_cdma *cdma, + struct host1x_job *job, + u32 nr_slots, + u32 first_get) +{ + if (job->syncpt_id == NVSYNCPT_INVALID) { + dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n", + __func__); + return; + } + + job->first_get = first_get; + job->num_slots = nr_slots; + host1x_job_get(job); + list_add_tail(&job->list, &cdma->sync_queue); +} + +/* + * Return the status of the cdma's sync queue or push buffer for the given event + * - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-) + * - pb space: returns the number of free slots in the channel's push buffer + * Must be called with the cdma lock held. 
+ */ +static unsigned int cdma_status_locked(struct host1x_cdma *cdma, + enum cdma_event event) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + switch (event) { + case CDMA_EVENT_SYNC_QUEUE_EMPTY: + return list_empty(&cdma->sync_queue) ? 1 : 0; + case CDMA_EVENT_PUSH_BUFFER_SPACE: { + struct push_buffer *pb = &cdma->push_buffer; + return host1x->cdma_pb_op.space(pb); + } + default: + return 0; + } +} + +/* + * Sleep (if necessary) until the requested event happens + * - CDMA_EVENT_SYNC_QUEUE_EMPTY : sync queue is completely empty. + * - Returns 1 + * - CDMA_EVENT_PUSH_BUFFER_SPACE : there is space in the push buffer + * - Return the amount of space (> 0) + * Must be called with the cdma lock held. + */ +unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma, + enum cdma_event event) +{ + for (;;) { + unsigned int space = cdma_status_locked(cdma, event); + if (space) + return space; + + trace_host1x_wait_cdma(cdma_to_channel(cdma)->dev->name, + event); + + /* If somebody has managed to already start waiting, yield */ + if (cdma->event != CDMA_EVENT_NONE) { + mutex_unlock(&cdma->lock); + schedule(); + mutex_lock(&cdma->lock); + continue; + } + cdma->event = event; + + mutex_unlock(&cdma->lock); + down(&cdma->sem); + mutex_lock(&cdma->lock); + } + return 0; +} + +/* + * Start timer for a buffer submition that has completed yet. + * Must be called with the cdma lock held. + */ +static void cdma_start_timer_locked(struct host1x_cdma *cdma, + struct host1x_job *job) +{ + struct host1x *host = cdma_to_host1x(cdma); + + if (cdma->timeout.clientid) { + /* timer already started */ + return; + } + + cdma->timeout.clientid = job->clientid; + cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id); + cdma->timeout.syncpt_val = job->syncpt_end; + cdma->timeout.start_ktime = ktime_get(); + + schedule_delayed_work(&cdma->timeout.wq, + msecs_to_jiffies(job->timeout)); +} + +/* + * Stop timer when a buffer submition completes. 
+ * Must be called with the cdma lock held. + */ +static void stop_cdma_timer_locked(struct host1x_cdma *cdma) +{ + cancel_delayed_work(&cdma->timeout.wq); + cdma->timeout.clientid = 0; +} + +/* + * For all sync queue entries that have already finished according to the + * current sync point registers: + * - unpin & unref their mems + * - pop their push buffer slots + * - remove them from the sync queue + * This is normally called from the host code's worker thread, but can be + * called manually if necessary. + * Must be called with the cdma lock held. + */ +static void update_cdma_locked(struct host1x_cdma *cdma) +{ + bool signal = false; + struct host1x *host1x = cdma_to_host1x(cdma); + struct host1x_job *job, *n; + + /* If CDMA is stopped, queue is cleared and we can return */ + if (!cdma->running) + return; + + /* + * Walk the sync queue, reading the sync point registers as necessary, + * to consume as many sync queue entries as possible without blocking + */ + list_for_each_entry_safe(job, n, &cdma->sync_queue, list) { + struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id; + + /* Check whether this syncpt has completed, and bail if not */ + if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) { + /* Start timer on next pending syncpt */ + if (job->timeout) + cdma_start_timer_locked(cdma, job); + break; + } + + /* Cancel timeout, when a buffer completes */ + if (cdma->timeout.clientid) + stop_cdma_timer_locked(cdma); + + /* Unpin the memory */ + host1x_job_unpin(job); + + /* Pop push buffer slots */ + if (job->num_slots) { + struct push_buffer *pb = &cdma->push_buffer; + host1x->cdma_pb_op.pop_from(pb, job->num_slots); + if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE) + signal = true; + } + + list_del(&job->list); + host1x_job_put(job); + } + + if (list_empty(&cdma->sync_queue) && + cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY) + signal = true; + + /* Wake up CdmaWait() if the requested event happened */ + if (signal) { + cdma->event = CDMA_EVENT_NONE; + 
up(&cdma->sem); + } +} + +void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma, + struct platform_device *dev) +{ + u32 get_restart; + u32 syncpt_incrs; + struct host1x_job *job = NULL; + u32 syncpt_val; + struct host1x *host1x = cdma_to_host1x(cdma); + + syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt); + + dev_dbg(&dev->dev, + "%s: starting cleanup (thresh %d)\n", + __func__, syncpt_val); + + /* + * Move the sync_queue read pointer to the first entry that hasn't + * completed based on the current HW syncpt value. It's likely there + * won't be any (i.e. we're still at the head), but covers the case + * where a syncpt incr happens just prior/during the teardown. + */ + + dev_dbg(&dev->dev, + "%s: skip completed buffers still in sync_queue\n", + __func__); + + list_for_each_entry(job, &cdma->sync_queue, list) { + if (syncpt_val < job->syncpt_end) + break; + + host1x_job_dump(&dev->dev, job); + } + + /* + * Walk the sync_queue, first incrementing with the CPU syncpts that + * are partially executed (the first buffer) or fully skipped while + * still in the current context (slots are also NOP-ed). + * + * At the point contexts are interleaved, syncpt increments must be + * done inline with the pushbuffer from a GATHER buffer to maintain + * the order (slots are modified to be a GATHER of syncpt incrs). + * + * Note: save in get_restart the location where the timed out buffer + * started in the PB, so we can start the refetch from there (with the + * modified NOP-ed PB slots). This lets things appear to have completed + * properly for this buffer and resources are freed. 
+ */ + + dev_dbg(&dev->dev, + "%s: perform CPU incr on pending same ctx buffers\n", + __func__); + + get_restart = cdma->last_put; + if (!list_empty(&cdma->sync_queue)) + get_restart = job->first_get; + + /* do CPU increments as long as this context continues */ + list_for_each_entry_from(job, &cdma->sync_queue, list) { + /* different context, gets us out of this loop */ + if (job->clientid != cdma->timeout.clientid) + break; + + /* won't need a timeout when replayed */ + job->timeout = 0; + + syncpt_incrs = job->syncpt_end - syncpt_val; + dev_dbg(&dev->dev, + "%s: CPU incr (%d)\n", __func__, syncpt_incrs); + + host1x_job_dump(&dev->dev, job); + + /* safe to use CPU to incr syncpts */ + host1x->cdma_op.timeout_cpu_incr(cdma, + job->first_get, + syncpt_incrs, + job->syncpt_end, + job->num_slots); + + syncpt_val += syncpt_incrs; + } + + list_for_each_entry_from(job, &cdma->sync_queue, list) + if (job->clientid == cdma->timeout.clientid) + job->timeout = 500; + + dev_dbg(&dev->dev, + "%s: finished sync_queue modification\n", __func__); + + /* roll back DMAGET and start up channel again */ + host1x->cdma_op.timeout_teardown_end(cdma, get_restart); +} + +/* + * Create a cdma + */ +int host1x_cdma_init(struct host1x_cdma *cdma) +{ + int err; + struct push_buffer *pb = &cdma->push_buffer; + struct host1x *host1x = cdma_to_host1x(cdma); + + mutex_init(&cdma->lock); + sema_init(&cdma->sem, 0); + + INIT_LIST_HEAD(&cdma->sync_queue); + + cdma->event = CDMA_EVENT_NONE; + cdma->running = false; + cdma->torndown = false; + + err = host1x->cdma_pb_op.init(pb); + if (err) + return err; + return 0; +} + +/* + * Destroy a cdma + */ +void host1x_cdma_deinit(struct host1x_cdma *cdma) +{ + struct push_buffer *pb = &cdma->push_buffer; + struct host1x *host1x = cdma_to_host1x(cdma); + + if (cdma->running) { + pr_warn("%s: CDMA still running\n", + __func__); + } else { + host1x->cdma_pb_op.destroy(pb); + host1x->cdma_op.timeout_destroy(cdma); + } +} + +/* + * Begin a cdma submit + */ +int 
host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + + mutex_lock(&cdma->lock); + + if (job->timeout) { + /* init state on first submit with timeout value */ + if (!cdma->timeout.initialized) { + int err; + err = host1x->cdma_op.timeout_init(cdma, + job->syncpt_id); + if (err) { + mutex_unlock(&cdma->lock); + return err; + } + } + } + if (!cdma->running) + host1x->cdma_op.start(cdma); + + cdma->slots_free = 0; + cdma->slots_used = 0; + cdma->first_get = host1x->cdma_pb_op.putptr(&cdma->push_buffer); + + trace_host1x_cdma_begin(job->ch->dev->name); + return 0; +} + +/* + * Push two words into a push buffer slot + * Blocks as necessary if the push buffer is full. + */ +void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2) +{ + host1x_cdma_push_gather(cdma, NULL, 0, op1, op2); +} + +/* + * Push two words into a push buffer slot + * Blocks as necessary if the push buffer is full. + */ +void host1x_cdma_push_gather(struct host1x_cdma *cdma, + struct mem_handle *handle, + u32 offset, u32 op1, u32 op2) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + u32 slots_free = cdma->slots_free; + struct push_buffer *pb = &cdma->push_buffer; + + if (slots_free == 0) { + host1x->cdma_op.kick(cdma); + slots_free = host1x_cdma_wait_locked(cdma, + CDMA_EVENT_PUSH_BUFFER_SPACE); + } + cdma->slots_free = slots_free - 1; + cdma->slots_used++; + host1x->cdma_pb_op.push_to(pb, handle, op1, op2); +} + +/* + * End a cdma submit + * Kick off DMA, add job to the sync queue, and a number of slots to be freed + * from the pushbuffer. The handles for a submit must all be pinned at the same + * time, but they can be unpinned in smaller chunks. 
+ */ +void host1x_cdma_end(struct host1x_cdma *cdma, + struct host1x_job *job) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + bool was_idle = list_empty(&cdma->sync_queue); + + host1x->cdma_op.kick(cdma); + + add_to_sync_queue(cdma, + job, + cdma->slots_used, + cdma->first_get); + + /* start timer on idle -> active transitions */ + if (job->timeout && was_idle) + cdma_start_timer_locked(cdma, job); + + trace_host1x_cdma_end(job->ch->dev->name); + mutex_unlock(&cdma->lock); +} + +/* + * Update cdma state according to current sync point values + */ +void host1x_cdma_update(struct host1x_cdma *cdma) +{ + mutex_lock(&cdma->lock); + update_cdma_locked(cdma); + mutex_unlock(&cdma->lock); +} diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h new file mode 100644 index 0000000..d9cabef --- /dev/null +++ b/drivers/gpu/host1x/cdma.h @@ -0,0 +1,107 @@ +/* + * Tegra host1x Command DMA + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_CDMA_H +#define __HOST1X_CDMA_H + +#include <linux/sched.h> +#include <linux/semaphore.h> + +#include <linux/list.h> + +struct host1x_syncpt; +struct host1x_userctx_timeout; +struct host1x_job; +struct mem_handle; +struct platform_device; + +/* + * cdma + * + * This is in charge of a host command DMA channel. 
+ * Sends ops to a push buffer, and takes responsibility for unpinning + * (& possibly freeing) of memory after those ops have completed. + * Producer: + * begin + * push - send ops to the push buffer + * end - start command DMA and enqueue handles to be unpinned + * Consumer: + * update - call to update sync queue and push buffer, unpin memory + */ + +struct push_buffer { + u32 *mapped; /* mapped pushbuffer memory */ + dma_addr_t phys; /* physical address of pushbuffer */ + u32 fence; /* index we've written */ + u32 cur; /* index to write to */ + struct mem_handle **handle; /* handle for each opcode pair */ +}; + +struct buffer_timeout { + struct delayed_work wq; /* work queue */ + bool initialized; /* timer one-time setup flag */ + struct host1x_syncpt *syncpt; /* buffer completion syncpt */ + u32 syncpt_val; /* syncpt value when completed */ + ktime_t start_ktime; /* starting time */ + /* context timeout information */ + int clientid; +}; + +enum cdma_event { + CDMA_EVENT_NONE, /* not waiting for any event */ + CDMA_EVENT_SYNC_QUEUE_EMPTY, /* wait for empty sync queue */ + CDMA_EVENT_PUSH_BUFFER_SPACE /* wait for space in push buffer */ +}; + +struct host1x_cdma { + struct mutex lock; /* controls access to shared state */ + struct semaphore sem; /* signalled when event occurs */ + enum cdma_event event; /* event that sem is waiting for */ + unsigned int slots_used; /* pb slots used in current submit */ + unsigned int slots_free; /* pb slots free in current submit */ + unsigned int first_get; /* DMAGET value, where submit begins */ + unsigned int last_put; /* last value written to DMAPUT */ + struct push_buffer push_buffer; /* channel's push buffer */ + struct list_head sync_queue; /* job queue */ + struct buffer_timeout timeout; /* channel's timeout state/wq */ + bool running; + bool torndown; +}; + +#define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma) +#define cdma_to_host1x(cdma) host1x_get_host(cdma_to_channel(cdma)->dev) +#define 
cdma_to_memmgr(cdma) ((cdma_to_host1x(cdma))->memmgr) +#define pb_to_cdma(pb) container_of(pb, struct host1x_cdma, push_buffer) + +int host1x_cdma_init(struct host1x_cdma *cdma); +void host1x_cdma_deinit(struct host1x_cdma *cdma); +void host1x_cdma_stop(struct host1x_cdma *cdma); +int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job); +void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2); +void host1x_cdma_push_gather(struct host1x_cdma *cdma, + struct mem_handle *handle, u32 offset, u32 op1, u32 op2); +void host1x_cdma_end(struct host1x_cdma *cdma, + struct host1x_job *job); +void host1x_cdma_update(struct host1x_cdma *cdma); +void host1x_cdma_peek(struct host1x_cdma *cdma, + u32 dmaget, int slot, u32 *out); +unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma, + enum cdma_event event); +void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma, + struct platform_device *dev); +#endif diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c new file mode 100644 index 0000000..ff647ac --- /dev/null +++ b/drivers/gpu/host1x/channel.c @@ -0,0 +1,140 @@ +/* + * Tegra host1x Channel + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include "channel.h" +#include "dev.h" +#include "job.h" + +#include <linux/slab.h> +#include <linux/module.h> + +/* Constructor for the host1x device list */ +void host1x_channel_list_init(struct host1x *host1x) +{ + INIT_LIST_HEAD(&host1x->chlist.list); + mutex_init(&host1x->chlist_mutex); +} + +/* + * Iterator function for host1x device list + * It takes a fptr as an argument and calls that function for each + * device in the list + */ +void host1x_channel_for_all(struct host1x *host1x, void *data, + int (*fptr)(struct host1x_channel *ch, void *fdata)) +{ + struct host1x_channel *ch; + int ret; + + list_for_each_entry(ch, &host1x->chlist.list, list) { + if (ch && fptr) { + ret = fptr(ch, data); + if (ret) { + pr_info("%s: iterator error\n", __func__); + break; + } + } + } +} + + +int host1x_channel_submit(struct host1x_job *job) +{ + return host1x_get_host(job->ch->dev)->channel_op.submit(job); +} + +struct host1x_channel *host1x_channel_get(struct host1x_channel *ch) +{ + int err = 0; + + mutex_lock(&ch->reflock); + if (ch->refcount == 0) + err = host1x_cdma_init(&ch->cdma); + if (!err) + ch->refcount++; + + mutex_unlock(&ch->reflock); + + return err ? 
NULL : ch; +} + +void host1x_channel_put(struct host1x_channel *ch) +{ + mutex_lock(&ch->reflock); + if (ch->refcount == 1) { + host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma); + host1x_cdma_deinit(&ch->cdma); + } + ch->refcount--; + mutex_unlock(&ch->reflock); +} + +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev) +{ + struct host1x_channel *ch = NULL; + struct host1x *host1x = host1x_get_host(pdev); + int chindex; + int max_channels = host1x->info.nb_channels; + int err; + + mutex_lock(&host1x->chlist_mutex); + + chindex = host1x->allocated_channels; + if (chindex >= max_channels) + goto fail; + + ch = kzalloc(sizeof(*ch), GFP_KERNEL); + if (ch == NULL) + goto fail; + + /* Link platform_device to host1x_channel */ + err = host1x->channel_op.init(ch, host1x, chindex); + if (err < 0) + goto fail; + + ch->dev = pdev; + + /* Add to channel list */ + list_add_tail(&ch->list, &host1x->chlist.list); + + host1x->allocated_channels++; + + mutex_unlock(&host1x->chlist_mutex); + return ch; + +fail: + dev_err(&pdev->dev, "failed to init channel\n"); + kfree(ch); + mutex_unlock(&host1x->chlist_mutex); + return NULL; +} + +void host1x_channel_free(struct host1x_channel *ch) +{ + struct host1x *host1x = host1x_get_host(ch->dev); + struct host1x_channel *chiter, *tmp; + list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) { + if (chiter == ch) { + list_del(&chiter->list); + kfree(ch); + host1x->allocated_channels--; + + return; + } + } +} diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h new file mode 100644 index 0000000..41eb01e --- /dev/null +++ b/drivers/gpu/host1x/channel.h @@ -0,0 +1,58 @@ +/* + * Tegra host1x Channel + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation.
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_CHANNEL_H +#define __HOST1X_CHANNEL_H + +#include <linux/cdev.h> +#include <linux/io.h> +#include "cdma.h" + +struct host1x; +struct platform_device; + +/* + * host1x device list in debug-fs dump of host1x and client device + * as well as channel state + */ +struct host1x_channel { + struct list_head list; + + int refcount; + int chid; + struct mutex reflock; + struct mutex submitlock; + void __iomem *regs; + struct device *node; + struct platform_device *dev; + struct cdev cdev; + struct host1x_cdma cdma; +}; + +/* channel list operations */ +void host1x_channel_list_init(struct host1x *); +void host1x_channel_for_all(struct host1x *, void *data, + int (*fptr)(struct host1x_channel *ch, void *fdata)); + +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev); +void host1x_channel_free(struct host1x_channel *ch); +struct host1x_channel *host1x_channel_get(struct host1x_channel *ch); +void host1x_channel_put(struct host1x_channel *ch); +int host1x_channel_submit(struct host1x_job *job); + +#endif diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c new file mode 100644 index 0000000..06b7959 --- /dev/null +++ b/drivers/gpu/host1x/cma.c @@ -0,0 +1,116 @@ +/* + * Tegra host1x CMA support + * + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <drm/drmP.h> +#include <drm/drm.h> +#include <drm/drm_gem_cma_helper.h> +#include <linux/mutex.h> + +#include "cma.h" +#include "memmgr.h" + +static inline struct drm_gem_cma_object *to_cma_obj(struct mem_handle *h) +{ + return (struct drm_gem_cma_object *)(((u32)h) & MEMMGR_ID_MASK); +} + +struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags) +{ + return NULL; +} + +void host1x_cma_put(struct mem_handle *handle) +{ + struct drm_gem_cma_object *obj = to_cma_obj(handle); + struct mutex *struct_mutex = &obj->base.dev->struct_mutex; + + mutex_lock(struct_mutex); + drm_gem_object_unreference(&obj->base); + mutex_unlock(struct_mutex); +} + +struct sg_table *host1x_cma_pin(struct mem_handle *handle) +{ + return NULL; +} + +void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt) +{ + +} + + +void *host1x_cma_mmap(struct mem_handle *handle) +{ + return (to_cma_obj(handle))->vaddr; +} + +void host1x_cma_munmap(struct mem_handle *handle, void *addr) +{ + +} + +void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum) +{ + return (to_cma_obj(handle))->vaddr + pagenum * PAGE_SIZE; +} + +void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum, + void *addr) +{ + +} + +struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev) +{ + struct drm_gem_cma_object *obj = to_cma_obj((void *)id); + struct mutex *struct_mutex = &obj->base.dev->struct_mutex; + + mutex_lock(struct_mutex); + drm_gem_object_reference(&obj->base); + mutex_unlock(struct_mutex); + + return (struct mem_handle *) ((u32)id | 
mem_mgr_type_cma); +} + +int host1x_cma_pin_array_ids(struct platform_device *dev, + long unsigned *ids, + long unsigned id_type_mask, + long unsigned id_type, + u32 count, + struct host1x_job_unpin_data *unpin_data, + dma_addr_t *phys_addr) +{ + int i; + int pin_count = 0; + + for (i = 0; i < count; i++) { + struct mem_handle *handle; + + if ((ids[i] & id_type_mask) != id_type) + continue; + + handle = host1x_cma_get(ids[i], dev); + + phys_addr[i] = (to_cma_obj(handle)->paddr); + unpin_data[pin_count].h = handle; + + pin_count++; + } + return pin_count; +} diff --git a/drivers/gpu/host1x/cma.h b/drivers/gpu/host1x/cma.h new file mode 100644 index 0000000..82ad710 --- /dev/null +++ b/drivers/gpu/host1x/cma.h @@ -0,0 +1,43 @@ +/* + * Tegra host1x cma memory manager + * + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#ifndef __HOST1X_CMA_H +#define __HOST1X_CMA_H + +#include "memmgr.h" + +struct platform_device; + +struct mem_handle *host1x_cma_alloc(size_t size, size_t align, int flags); +void host1x_cma_put(struct mem_handle *handle); +struct sg_table *host1x_cma_pin(struct mem_handle *handle); +void host1x_cma_unpin(struct mem_handle *handle, struct sg_table *sgt); +void *host1x_cma_mmap(struct mem_handle *handle); +void host1x_cma_munmap(struct mem_handle *handle, void *addr); +void *host1x_cma_kmap(struct mem_handle *handle, unsigned int pagenum); +void host1x_cma_kunmap(struct mem_handle *handle, unsigned int pagenum, + void *addr); +struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev); +int host1x_cma_pin_array_ids(struct platform_device *dev, + long unsigned *ids, + long unsigned id_type_mask, + long unsigned id_type, + u32 count, + struct host1x_job_unpin_data *unpin_data, + dma_addr_t *phys_addr); +#endif diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 7f9f389..80311ca 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -25,6 +25,7 @@ #include <linux/io.h> #include "dev.h" #include "intr.h" +#include "channel.h" #include "hw/host1x01.h"
#define CREATE_TRACE_POINTS @@ -46,6 +47,16 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r) return readl(sync_regs + r); }
+void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r) +{ + writel(v, ch->regs + r); +} + +u32 host1x_ch_readl(struct host1x_channel *ch, u32 r) +{ + return readl(ch->regs + r); +} + static struct host1x_device_info host1x_info = { .nb_channels = 8, .nb_pts = 32, @@ -135,6 +146,8 @@ static int host1x_probe(struct platform_device *dev)
host1x_syncpt_reset(host);
+ host1x_channel_list_init(host); + host1x_intr_start(&host->intr, clk_get_rate(host->clk));
dev_info(&dev->dev, "initialized\n"); diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 8376092..2fefa78 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -18,11 +18,58 @@ #define HOST1X_DEV_H
#include <linux/platform_device.h> + +#include "channel.h" #include "syncpt.h" #include "intr.h"
struct host1x; +struct host1x_intr; struct host1x_syncpt; +struct host1x_channel; +struct host1x_cdma; +struct host1x_job; +struct push_buffer; +struct dentry; +struct mem_handle; +struct platform_device; + +struct host1x_channel_ops { + int (*init)(struct host1x_channel *, + struct host1x *, + int chid); + int (*submit)(struct host1x_job *job); +}; + +struct host1x_cdma_ops { + void (*start)(struct host1x_cdma *); + void (*stop)(struct host1x_cdma *); + void (*kick)(struct host1x_cdma *); + int (*timeout_init)(struct host1x_cdma *, + u32 syncpt_id); + void (*timeout_destroy)(struct host1x_cdma *); + void (*timeout_teardown_begin)(struct host1x_cdma *); + void (*timeout_teardown_end)(struct host1x_cdma *, + u32 getptr); + void (*timeout_cpu_incr)(struct host1x_cdma *, + u32 getptr, + u32 syncpt_incrs, + u32 syncval, + u32 nr_slots); +}; + +struct host1x_pushbuffer_ops { + void (*reset)(struct push_buffer *); + int (*init)(struct push_buffer *); + void (*destroy)(struct push_buffer *); + void (*push_to)(struct push_buffer *, + struct mem_handle *, + u32 op1, u32 op2); + void (*pop_from)(struct push_buffer *, + unsigned int slots); + u32 (*space)(struct push_buffer *); + u32 (*putptr)(struct push_buffer *); +};
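The `reset`, `push_to`, `pop_from` and `space` hooks declared above operate on a power-of-two ring in which `fence == cur` means "full" rather than "empty". The following is a minimal standalone sketch of that arithmetic, not the driver code itself: the names `ring`, `ring_reset`, `ring_space`, `ring_push` and `ring_pop` are hypothetical, and the constants only mirror the 512-slot, 8-bytes-per-slot sizing used by cdma_hw.h.

```c
#include <assert.h>
#include <stdint.h>

#define SLOT_BYTES 8u                         /* one slot = two 32-bit words */
#define RING_SLOTS 512u                       /* assumed; must be a power of two */
#define RING_BYTES (RING_SLOTS * SLOT_BYTES)

/* Hypothetical stand-in for the fence/cur pair of struct push_buffer. */
struct ring {
	uint32_t fence;  /* byte offset the producer must not reach */
	uint32_t cur;    /* byte offset of the next slot to write */
};

/* Empty ring: fence sits one slot behind cur, so one slot is never used. */
static void ring_reset(struct ring *r)
{
	r->fence = RING_BYTES - SLOT_BYTES;
	r->cur = 0;
}

/* Number of free two-word slots; the mask handles wraparound. */
static uint32_t ring_space(const struct ring *r)
{
	return ((r->fence - r->cur) & (RING_BYTES - 1)) / SLOT_BYTES;
}

/* Producer side: consume one slot. Caller must have checked for space. */
static void ring_push(struct ring *r)
{
	assert(r->cur != r->fence);
	r->cur = (r->cur + SLOT_BYTES) & (RING_BYTES - 1);
}

/* Consumer side: release slots the hardware has finished with. */
static void ring_pop(struct ring *r, uint32_t slots)
{
	r->fence = (r->fence + slots * SLOT_BYTES) & (RING_BYTES - 1);
}
```

Because one slot is sacrificed to tell "full" from "empty", a freshly reset ring of this size reports `RING_SLOTS - 1` free slots.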
struct host1x_syncpt_ops { void (*reset)(struct host1x_syncpt *); @@ -64,9 +111,19 @@ struct host1x { struct host1x_device_info info; struct clk *clk;
+ /* Sync point dedicated to replacing waits for expired fences */ + struct host1x_syncpt *nop_sp; + + struct host1x_channel_ops channel_op; + struct host1x_cdma_ops cdma_op; + struct host1x_pushbuffer_ops cdma_pb_op; struct host1x_syncpt_ops syncpt_op; struct host1x_intr_ops intr_op;
+ struct mutex chlist_mutex; + struct host1x_channel chlist; + int allocated_channels; + struct dentry *debugfs; };
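The ops structures embedded in struct host1x let the generic code call hardware-version-specific functions, for example `host1x->cdma_op.kick(cdma)`. A rough sketch of that dispatch pattern follows, under the assumption that the table is copied by value at init time; `struct host`, `struct cdma`, `v1_kick`, `host_v1_init`, `cdma_end` and the `hw_writes` counter are hypothetical stand-ins, not names from this series.

```c
#include <stdint.h>

struct cdma;

/* Hypothetical, reduced version of an ops table: one hook only. */
struct cdma_ops {
	void (*kick)(struct cdma *);
};

/* Hypothetical host object; holds the table by value. */
struct host {
	struct cdma_ops cdma_op;
};

struct cdma {
	struct host *host;
	uint32_t put;            /* where the producer has written up to */
	uint32_t last_put;       /* last value handed to the hardware */
	unsigned int hw_writes;  /* stands in for a register write */
};

/* Version-specific hook: only touch the "hardware" if PUT changed. */
static void v1_kick(struct cdma *c)
{
	if (c->put != c->last_put) {
		c->hw_writes++;
		c->last_put = c->put;
	}
}

static const struct cdma_ops v1_cdma_ops = {
	.kick = v1_kick,
};

/* Per-version init: copy the const table into the host object. */
static void host_v1_init(struct host *h)
{
	h->cdma_op = v1_cdma_ops;
}

/* Generic code never names v1_kick directly; it goes through the table. */
static void cdma_end(struct cdma *c)
{
	c->host->cdma_op.kick(c);
}
```

Keeping the table inside the host object means supporting another hardware revision would only need a second init function that fills in different function pointers.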
@@ -84,5 +141,7 @@ struct host1x *host1x_get_host(struct platform_device *_dev)
void host1x_sync_writel(struct host1x *host1x, u32 r, u32 v); u32 host1x_sync_readl(struct host1x *host1x, u32 r); +void host1x_ch_writel(struct host1x_channel *ch, u32 v, u32 r); +u32 host1x_ch_readl(struct host1x_channel *ch, u32 r);
#endif diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h new file mode 100644 index 0000000..ded0660 --- /dev/null +++ b/drivers/gpu/host1x/host1x.h @@ -0,0 +1,29 @@ +/* + * Tegra host1x driver + * + * Copyright (c) 2009-2013, NVIDIA Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. + */ + +#ifndef __LINUX_HOST1X_H +#define __LINUX_HOST1X_H + +enum host1x_class { + NV_HOST1X_CLASS_ID = 0x1, + NV_GRAPHICS_2D_CLASS_ID = 0x51, +}; + +#endif diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c new file mode 100644 index 0000000..7a44418 --- /dev/null +++ b/drivers/gpu/host1x/hw/cdma_hw.c @@ -0,0 +1,475 @@ +/* + * Tegra host1x Command DMA + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/slab.h> +#include <linux/scatterlist.h> +#include <linux/dma-mapping.h> +#include "cdma.h" +#include "channel.h" +#include "dev.h" +#include "memmgr.h" + +#include "cdma_hw.h" + +static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get) +{ + return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) + | HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) + | HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get); +} + +static void cdma_timeout_handler(struct work_struct *work); + +/* + * push_buffer + * + * The push buffer is a circular array of words to be fetched by command DMA. + * Note that it works slightly differently to the sync queue; fence == cur + * means that the push buffer is full, not empty. + */ + + +/** + * Reset to empty push buffer + */ +static void push_buffer_reset(struct push_buffer *pb) +{ + pb->fence = PUSH_BUFFER_SIZE - 8; + pb->cur = 0; +} + +/** + * Init push buffer resources + */ +static void push_buffer_destroy(struct push_buffer *pb); +static int push_buffer_init(struct push_buffer *pb) +{ + struct host1x_cdma *cdma = pb_to_cdma(pb); + struct host1x *host1x = cdma_to_host1x(cdma); + pb->mapped = NULL; + pb->phys = 0; + pb->handle = NULL; + + host1x->cdma_pb_op.reset(pb); + + /* allocate and map pushbuffer memory */ + pb->mapped = dma_alloc_writecombine(&host1x->dev->dev, + PUSH_BUFFER_SIZE + 4, &pb->phys, GFP_KERNEL); + if (!pb->mapped) + goto fail; + + /* memory for storing mem client and handles for each opcode pair */ + pb->handle = kzalloc(HOST1X_GATHER_QUEUE_SIZE * + sizeof(struct mem_handle *), + GFP_KERNEL); + if (!pb->handle) + goto fail; + + /* put the restart at the end of pushbuffer memory */ + *(pb->mapped + (PUSH_BUFFER_SIZE >> 2)) = + host1x_opcode_restart(pb->phys); + + return 0; + +fail: + push_buffer_destroy(pb); + return -ENOMEM; +} + +/* + * Clean up push buffer 
resources + */ +static void push_buffer_destroy(struct push_buffer *pb) +{ + struct host1x_cdma *cdma = pb_to_cdma(pb); + struct host1x *host1x = cdma_to_host1x(cdma); + + if (pb->phys != 0) + dma_free_writecombine(&host1x->dev->dev, + PUSH_BUFFER_SIZE + 4, + pb->mapped, pb->phys); + + kfree(pb->handle); + + pb->mapped = NULL; + pb->phys = 0; + pb->handle = NULL; +} + +/* + * Push two words to the push buffer + * Caller must ensure push buffer is not full + */ +static void push_buffer_push_to(struct push_buffer *pb, + struct mem_handle *handle, + u32 op1, u32 op2) +{ + u32 cur = pb->cur; + u32 *p = (u32 *)((u32)pb->mapped + cur); + u32 cur_mem = (cur/8) & (HOST1X_GATHER_QUEUE_SIZE - 1); + WARN_ON(cur == pb->fence); + *(p++) = op1; + *(p++) = op2; + pb->handle[cur_mem] = handle; + pb->cur = (cur + 8) & (PUSH_BUFFER_SIZE - 1); +} + +/* + * Pop a number of two word slots from the push buffer + * Caller must ensure push buffer is not empty + */ +static void push_buffer_pop_from(struct push_buffer *pb, + unsigned int slots) +{ + /* Clear the mem references for old items from pb */ + unsigned int i; + u32 fence_mem = pb->fence/8; + for (i = 0; i < slots; i++) { + int cur_fence_mem = (fence_mem+i) + & (HOST1X_GATHER_QUEUE_SIZE - 1); + pb->handle[cur_fence_mem] = NULL; + } + /* Advance the next write position */ + pb->fence = (pb->fence + slots * 8) & (PUSH_BUFFER_SIZE - 1); +} + +/* + * Return the number of two word slots free in the push buffer + */ +static u32 push_buffer_space(struct push_buffer *pb) +{ + return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8; +} + +static u32 push_buffer_putptr(struct push_buffer *pb) +{ + return pb->phys + pb->cur; +} + +/* + * The syncpt incr buffer is filled with methods to increment syncpts, which + * is later GATHER-ed into the mainline PB. It's used when a timed out context + * is interleaved with other work, so needs to inline the syncpt increments + * to maintain the count (but otherwise does no work). 
+ */ + +/* + * Init timeout resources + */ +static int cdma_timeout_init(struct host1x_cdma *cdma, + u32 syncpt_id) +{ + if (syncpt_id == NVSYNCPT_INVALID) + return -EINVAL; + + INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler); + cdma->timeout.initialized = true; + + return 0; +} + +/* + * Clean up timeout resources + */ +static void cdma_timeout_destroy(struct host1x_cdma *cdma) +{ + if (cdma->timeout.initialized) + cancel_delayed_work(&cdma->timeout.wq); + cdma->timeout.initialized = false; +} + +/* + * Increment timedout buffer's syncpt via CPU. + */ +static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr, + u32 syncpt_incrs, u32 syncval, u32 nr_slots) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + struct push_buffer *pb = &cdma->push_buffer; + u32 i, getidx; + + for (i = 0; i < syncpt_incrs; i++) + host1x_syncpt_cpu_incr(cdma->timeout.syncpt); + + /* after CPU incr, ensure shadow is up to date */ + host1x_syncpt_load_min(cdma->timeout.syncpt); + + /* NOP all the PB slots */ + getidx = getptr - pb->phys; + while (nr_slots--) { + u32 *p = (u32 *)((u32)pb->mapped + getidx); + *(p++) = HOST1X_OPCODE_NOOP; + *(p++) = HOST1X_OPCODE_NOOP; + dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n", + __func__, pb->phys + getidx); + getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1); + } + wmb(); +} + +/* + * Start channel DMA + */ +static void cdma_start(struct host1x_cdma *cdma) +{ + struct host1x_channel *ch = cdma_to_channel(cdma); + struct host1x *host1x = cdma_to_host1x(cdma); + + if (cdma->running) + return; + + cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer); + + host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false), + HOST1X_CHANNEL_DMACTRL); + + /* set base, put, end pointer (all of memory) */ + host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART); + host1x_ch_writel(ch, cdma->last_put, HOST1X_CHANNEL_DMAPUT); + host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND); + + /* reset GET */ + host1x_ch_writel(ch, 
host1x_channel_dmactrl(true, true, true), + HOST1X_CHANNEL_DMACTRL); + + /* start the command DMA */ + host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false), + HOST1X_CHANNEL_DMACTRL); + + cdma->running = true; +} + +/* + * Similar to cdma_start(), but rather than starting from an idle + * state (where DMA GET is set to DMA PUT), on a timeout we restore + * DMA GET from an explicit value (so DMA may again be pending). + */ +static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + struct host1x_channel *ch = cdma_to_channel(cdma); + + if (cdma->running) + return; + + cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer); + + host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false), + HOST1X_CHANNEL_DMACTRL); + + /* set base, end pointer (all of memory) */ + host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART); + host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND); + + /* set GET, by loading the value in PUT (then reset GET) */ + host1x_ch_writel(ch, getptr, HOST1X_CHANNEL_DMAPUT); + host1x_ch_writel(ch, host1x_channel_dmactrl(true, true, true), + HOST1X_CHANNEL_DMACTRL); + + dev_dbg(&host1x->dev->dev, + "%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n", + __func__, + host1x_ch_readl(ch, HOST1X_CHANNEL_DMAGET), + host1x_ch_readl(ch, HOST1X_CHANNEL_DMAPUT), + cdma->last_put); + + /* deassert GET reset and set PUT */ + host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false), + HOST1X_CHANNEL_DMACTRL); + host1x_ch_writel(ch, cdma->last_put, HOST1X_CHANNEL_DMAPUT); + + /* start the command DMA */ + host1x_ch_writel(ch, host1x_channel_dmactrl(false, false, false), + HOST1X_CHANNEL_DMACTRL); + + cdma->running = true; +} + +/* + * Kick channel DMA into action by writing its PUT offset (if it has changed) + */ +static void cdma_kick(struct host1x_cdma *cdma) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + struct host1x_channel *ch = cdma_to_channel(cdma); + u32 put; + + 
put = host1x->cdma_pb_op.putptr(&cdma->push_buffer); + + if (put != cdma->last_put) { + host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT); + cdma->last_put = put; + } +} + +static void cdma_stop(struct host1x_cdma *cdma) +{ + struct host1x_channel *ch = cdma_to_channel(cdma); + + mutex_lock(&cdma->lock); + if (cdma->running) { + host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY); + host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false), + HOST1X_CHANNEL_DMACTRL); + cdma->running = false; + } + mutex_unlock(&cdma->lock); +} + +/* + * Stops both channel's command processor and CDMA immediately. + * Also, tears down the channel and resets corresponding module. + */ +static void cdma_timeout_teardown_begin(struct host1x_cdma *cdma) +{ + struct host1x *dev = cdma_to_host1x(cdma); + struct host1x_channel *ch = cdma_to_channel(cdma); + u32 cmdproc_stop; + + if (cdma->torndown && !cdma->running) { + dev_warn(&dev->dev->dev, "Already torn down\n"); + return; + } + + dev_dbg(&dev->dev->dev, + "begin channel teardown (channel id %d)\n", ch->chid); + + cmdproc_stop = host1x_sync_readl(dev, HOST1X_SYNC_CMDPROC_STOP); + cmdproc_stop |= BIT(ch->chid); + host1x_sync_writel(dev, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP); + + dev_dbg(&dev->dev->dev, + "%s: DMA GET 0x%x, PUT HW 0x%x / shadow 0x%x\n", + __func__, + host1x_ch_readl(ch, HOST1X_CHANNEL_DMAGET), + host1x_ch_readl(ch, HOST1X_CHANNEL_DMAPUT), + cdma->last_put); + + host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false), + HOST1X_CHANNEL_DMACTRL); + + host1x_sync_writel(dev, BIT(ch->chid), HOST1X_SYNC_CH_TEARDOWN); + + cdma->running = false; + cdma->torndown = true; +} + +static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr) +{ + struct host1x *host1x = cdma_to_host1x(cdma); + struct host1x_channel *ch = cdma_to_channel(cdma); + u32 cmdproc_stop; + + dev_dbg(&host1x->dev->dev, + "end channel teardown (id %d, DMAGET restart = 0x%x)\n", + ch->chid, getptr); + + cmdproc_stop = 
host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP); + cmdproc_stop &= ~(BIT(ch->chid)); + host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP); + + cdma->torndown = false; + cdma_timeout_restart(cdma, getptr); +} + +/* + * If this timeout fires, it indicates the current sync_queue entry has + * exceeded its TTL and the userctx should be timed out and remaining + * submits already issued cleaned up (future submits return an error). + */ +static void cdma_timeout_handler(struct work_struct *work) +{ + struct host1x_cdma *cdma; + struct host1x *host1x; + struct host1x_channel *ch; + + u32 syncpt_val; + + u32 prev_cmdproc, cmdproc_stop; + + cdma = container_of(to_delayed_work(work), struct host1x_cdma, + timeout.wq); + host1x = cdma_to_host1x(cdma); + ch = cdma_to_channel(cdma); + + mutex_lock(&cdma->lock); + + if (!cdma->timeout.clientid) { + dev_dbg(&host1x->dev->dev, + "cdma_timeout: expired, but has no clientid\n"); + mutex_unlock(&cdma->lock); + return; + } + + /* stop processing to get a clean snapshot */ + prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP); + cmdproc_stop = prev_cmdproc | BIT(ch->chid); + host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP); + + dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n", + prev_cmdproc, cmdproc_stop); + + syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt); + + /* has buffer actually completed?
*/ + if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) { + dev_dbg(&host1x->dev->dev, + "cdma_timeout: expired, but buffer had completed\n"); + /* restore */ + cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid)); + host1x_sync_writel(host1x, cmdproc_stop, + HOST1X_SYNC_CMDPROC_STOP); + mutex_unlock(&cdma->lock); + return; + } + + dev_warn(&host1x->dev->dev, + "%s: timeout: %d (%s), HW thresh %d, done %d\n", + __func__, + cdma->timeout.syncpt->id, cdma->timeout.syncpt->name, + syncpt_val, cdma->timeout.syncpt_val); + + /* stop HW, resetting channel/module */ + host1x->cdma_op.timeout_teardown_begin(cdma); + + host1x_cdma_update_sync_queue(cdma, ch->dev); + mutex_unlock(&cdma->lock); +} + +static const struct host1x_cdma_ops host1x_cdma_ops = { + .start = cdma_start, + .stop = cdma_stop, + .kick = cdma_kick, + + .timeout_init = cdma_timeout_init, + .timeout_destroy = cdma_timeout_destroy, + .timeout_teardown_begin = cdma_timeout_teardown_begin, + .timeout_teardown_end = cdma_timeout_teardown_end, + .timeout_cpu_incr = cdma_timeout_cpu_incr, +}; + +static const struct host1x_pushbuffer_ops host1x_pushbuffer_ops = { + .reset = push_buffer_reset, + .init = push_buffer_init, + .destroy = push_buffer_destroy, + .push_to = push_buffer_push_to, + .pop_from = push_buffer_pop_from, + .space = push_buffer_space, + .putptr = push_buffer_putptr, +}; + diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h new file mode 100644 index 0000000..80a085a --- /dev/null +++ b/drivers/gpu/host1x/hw/cdma_hw.h @@ -0,0 +1,37 @@ +/* + * Tegra host1x Command DMA + * + * Copyright (c) 2011-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_CDMA_HW_H +#define __HOST1X_CDMA_HW_H + +/* + * Size of the sync queue. If it is too small, we won't be able to queue up + * many command buffers. If it is too large, we waste memory. + */ +#define HOST1X_SYNC_QUEUE_SIZE 512 + +/* + * Number of gathers we allow to be queued up per channel. Must be a + * power of two. Currently sized such that pushbuffer is 4KB (512*8B). + */ +#define HOST1X_GATHER_QUEUE_SIZE 512 + +/* 8 bytes per slot. (This number does not include the final RESTART.) */ +#define PUSH_BUFFER_SIZE (HOST1X_GATHER_QUEUE_SIZE * 8) + +#endif diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c new file mode 100644 index 0000000..905cfd2 --- /dev/null +++ b/drivers/gpu/host1x/hw/channel_hw.c @@ -0,0 +1,148 @@ +/* + * Tegra host1x Channel + * + * Copyright (c) 2010-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include "host1x.h" +#include "channel.h" +#include "dev.h" +#include <linux/slab.h> +#include "intr.h" +#include "job.h" +#include <trace/events/host1x.h> + +static void submit_gathers(struct host1x_job *job) +{ + /* push user gathers */ + int i; + for (i = 0 ; i < job->num_gathers; i++) { + struct host1x_job_gather *g = &job->gathers[i]; + u32 op1 = host1x_opcode_gather(g->words); + u32 op2 = g->mem_base + g->offset; + host1x_cdma_push_gather(&job->ch->cdma, + job->gathers[i].ref, + job->gathers[i].offset, + op1, op2); + } +} + +static int channel_submit(struct host1x_job *job) +{ + struct host1x_channel *ch = job->ch; + struct host1x_syncpt *sp; + u32 user_syncpt_incrs = job->syncpt_incrs; + u32 prev_max = 0; + u32 syncval; + int err; + void *completed_waiter = NULL; + + sp = host1x_get_host(job->ch->dev)->syncpt + job->syncpt_id; + trace_host1x_channel_submit(ch->dev->name, + job->num_gathers, job->num_relocs, job->num_waitchk, + job->syncpt_id, job->syncpt_incrs); + + /* before error checks, return current max */ + prev_max = job->syncpt_end = host1x_syncpt_read_max(sp); + + /* get submit lock */ + err = mutex_lock_interruptible(&ch->submitlock); + if (err) + goto error; + + completed_waiter = host1x_intr_alloc_waiter(); + if (!completed_waiter) { + mutex_unlock(&ch->submitlock); + err = -ENOMEM; + goto error; + } + + /* begin a CDMA submit */ + err = host1x_cdma_begin(&ch->cdma, job); + if (err) { + mutex_unlock(&ch->submitlock); + goto error; + } + + if (job->serialize) { + /* + * Force serialization by inserting a host wait for the + * previous job to finish before this one can commence. 
+ */ + host1x_cdma_push(&ch->cdma, + host1x_opcode_setclass(NV_HOST1X_CLASS_ID, + host1x_uclass_wait_syncpt_r(), + 1), + host1x_class_host_wait_syncpt(job->syncpt_id, + host1x_syncpt_read_max(sp))); + } + + syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs); + + job->syncpt_end = syncval; + + /* add a setclass for modules that require it */ + if (job->class) + host1x_cdma_push(&ch->cdma, + host1x_opcode_setclass(job->class, 0, 0), + HOST1X_OPCODE_NOOP); + + submit_gathers(job); + + /* end CDMA submit & stash pinned hMems into sync queue */ + host1x_cdma_end(&ch->cdma, job); + + trace_host1x_channel_submitted(ch->dev->name, + prev_max, syncval); + + /* schedule a submit complete interrupt */ + err = host1x_intr_add_action(&host1x_get_host(ch->dev)->intr, + job->syncpt_id, syncval, + HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch, + completed_waiter, + NULL); + completed_waiter = NULL; + WARN(err, "Failed to set submit complete interrupt"); + + mutex_unlock(&ch->submitlock); + + return 0; + +error: + kfree(completed_waiter); + return err; +} + +static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx) +{ + p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES; + return p; +} + +static int host1x_channel_init(struct host1x_channel *ch, + struct host1x *dev, int index) +{ + ch->chid = index; + mutex_init(&ch->reflock); + mutex_init(&ch->submitlock); + + ch->regs = host1x_channel_regs(dev->regs, index); + return 0; +} + +static const struct host1x_channel_ops host1x_channel_ops = { + .init = host1x_channel_init, + .submit = channel_submit, +}; diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c index 3d633a3..7569a1e 100644 --- a/drivers/gpu/host1x/hw/host1x01.c +++ b/drivers/gpu/host1x/hw/host1x01.c @@ -23,13 +23,19 @@
 #include "hw/host1x01.h"
 #include "dev.h"
+#include "channel.h"
 #include "hw/host1x01_hardware.h"

+#include "hw/channel_hw.c"
+#include "hw/cdma_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"

 int host1x01_init(struct host1x *host)
 {
+	host->channel_op = host1x_channel_ops;
+	host->cdma_op = host1x_cdma_ops;
+	host->cdma_pb_op = host1x_pushbuffer_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;
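Reviewer aid, not part of the patch: the sizing comment in cdma_hw.h and submit_gathers() in channel_hw.c together imply that each queued gather occupies one 8-byte push buffer slot made of two 32-bit words. The user-space sketch below works through that arithmetic. The sketch_* names are hypothetical, the gather encoding is copied from host1x_opcode_gather() in host1x01_hardware.h later in this patch, and the masked wrap-around in sketch_next_slot() is only an assumption about why the queue size must be a power of two; the driver's cdma code may do it differently.

```c
#include <stdint.h>

/* Values copied from cdma_hw.h in this patch. */
#define HOST1X_GATHER_QUEUE_SIZE 512
#define PUSH_BUFFER_SIZE (HOST1X_GATHER_QUEUE_SIZE * 8)

/* The header requires a power of two; check it at compile time. */
_Static_assert((HOST1X_GATHER_QUEUE_SIZE & (HOST1X_GATHER_QUEUE_SIZE - 1)) == 0,
	       "gather queue size must be a power of two");

/* Same encoding as host1x_opcode_gather() in host1x01_hardware.h. */
static inline uint32_t sketch_opcode_gather(uint32_t count)
{
	return (6u << 28) | count;
}

/* One push buffer slot: two 32-bit words, 8 bytes in total. */
struct sketch_slot {
	uint32_t op1;	/* GATHER opcode carrying the word count */
	uint32_t op2;	/* address the gather is fetched from */
};

/*
 * Hypothetical helper: build the pair of words that submit_gathers()
 * computes for one gather. A 32-bit address is an assumption here.
 */
static inline struct sketch_slot sketch_make_gather_slot(uint32_t mem_base,
							  uint32_t offset,
							  uint32_t words)
{
	struct sketch_slot s = { sketch_opcode_gather(words), mem_base + offset };
	return s;
}

/*
 * Hypothetical helper: advance a byte position by one slot and wrap at
 * the end of the buffer. Masking is valid only for power-of-two sizes.
 */
static inline uint32_t sketch_next_slot(uint32_t pos)
{
	return (pos + (uint32_t)sizeof(struct sketch_slot)) &
	       (PUSH_BUFFER_SIZE - 1);
}
```

With the patch's values, 512 slots of 8 bytes give the 4 KB push buffer mentioned in the comment, and a gather of 16 words encodes as 0x60000010 in the first word of its slot.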
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h index c1d5324..03873c0 100644 --- a/drivers/gpu/host1x/hw/host1x01_hardware.h +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h @@ -21,6 +21,130 @@
#include <linux/types.h> #include <linux/bitops.h> +#include "hw_host1x01_channel.h" #include "hw_host1x01_sync.h" +#include "hw_host1x01_uclass.h" + +/* channel registers */ +#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384 + +static inline u32 host1x_class_host_wait_syncpt( + unsigned indx, unsigned threshold) +{ + return host1x_uclass_wait_syncpt_indx_f(indx) + | host1x_uclass_wait_syncpt_thresh_f(threshold); +} + +static inline u32 host1x_class_host_load_syncpt_base( + unsigned indx, unsigned threshold) +{ + return host1x_uclass_load_syncpt_base_base_indx_f(indx) + | host1x_uclass_load_syncpt_base_value_f(threshold); +} + +static inline u32 host1x_class_host_wait_syncpt_base( + unsigned indx, unsigned base_indx, unsigned offset) +{ + return host1x_uclass_wait_syncpt_base_indx_f(indx) + | host1x_uclass_wait_syncpt_base_base_indx_f(base_indx) + | host1x_uclass_wait_syncpt_base_offset_f(offset); +} + +static inline u32 host1x_class_host_incr_syncpt_base( + unsigned base_indx, unsigned offset) +{ + return host1x_uclass_incr_syncpt_base_base_indx_f(base_indx) + | host1x_uclass_incr_syncpt_base_offset_f(offset); +} + +static inline u32 host1x_class_host_incr_syncpt( + unsigned cond, unsigned indx) +{ + return host1x_uclass_incr_syncpt_cond_f(cond) + | host1x_uclass_incr_syncpt_indx_f(indx); +} + +static inline u32 host1x_class_host_indoff_reg_write( + unsigned mod_id, unsigned offset, bool auto_inc) +{ + u32 v = host1x_uclass_indoff_indbe_f(0xf) + | host1x_uclass_indoff_indmodid_f(mod_id) + | host1x_uclass_indoff_indroffset_f(offset); + if (auto_inc) + v |= host1x_uclass_indoff_autoinc_f(1); + return v; +} + +static inline u32 host1x_class_host_indoff_reg_read( + unsigned mod_id, unsigned offset, bool auto_inc) +{ + u32 v = host1x_uclass_indoff_indmodid_f(mod_id) + | host1x_uclass_indoff_indroffset_f(offset) + | host1x_uclass_indoff_rwn_read_v(); + if (auto_inc) + v |= host1x_uclass_indoff_autoinc_f(1); + return v; +} + + +/* cdma opcodes */ +static inline u32 
host1x_opcode_setclass( + unsigned class_id, unsigned offset, unsigned mask) +{ + return (0 << 28) | (offset << 16) | (class_id << 6) | mask; +} + +static inline u32 host1x_opcode_incr(unsigned offset, unsigned count) +{ + return (1 << 28) | (offset << 16) | count; +} + +static inline u32 host1x_opcode_nonincr(unsigned offset, unsigned count) +{ + return (2 << 28) | (offset << 16) | count; +} + +static inline u32 host1x_opcode_mask(unsigned offset, unsigned mask) +{ + return (3 << 28) | (offset << 16) | mask; +} + +static inline u32 host1x_opcode_imm(unsigned offset, unsigned value) +{ + return (4 << 28) | (offset << 16) | value; +} + +static inline u32 host1x_opcode_imm_incr_syncpt(unsigned cond, unsigned indx) +{ + return host1x_opcode_imm(host1x_uclass_incr_syncpt_r(), + host1x_class_host_incr_syncpt(cond, indx)); +} + +static inline u32 host1x_opcode_restart(unsigned address) +{ + return (5 << 28) | (address >> 4); +} + +static inline u32 host1x_opcode_gather(unsigned count) +{ + return (6 << 28) | count; +} + +static inline u32 host1x_opcode_gather_nonincr(unsigned offset, unsigned count) +{ + return (6 << 28) | (offset << 16) | BIT(15) | count; +} + +static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count) +{ + return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count; +} + +#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0) + +static inline u32 host1x_mask2(unsigned x, unsigned y) +{ + return 1 | (1 << (y - x)); +}
#endif diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h new file mode 100644 index 0000000..dad4fee --- /dev/null +++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h @@ -0,0 +1,102 @@ +/* + * Copyright (c) 2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + * + */ + + /* + * Function naming determines intended use: + * + * <x>_r(void) : Returns the offset for register <x>. + * + * <x>_w(void) : Returns the word offset for word (4 byte) element <x>. + * + * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits. + * + * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted + * and masked to place it at field <y> of register <x>. This value + * can be |'d with others to produce a full register value for + * register <x>. + * + * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This + * value can be ~'d and then &'d to clear the value of field <y> for + * register <x>. + * + * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted + * to place it at field <y> of register <x>. This value can be |'d + * with others to produce a full register value for <x>. + * + * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register + * <x> value 'r' after being shifted to place its LSB at bit 0. 
+ * This value is suitable for direct comparison with other unshifted + * values appropriate for use in field <y> of register <x>. + * + * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for + * field <y> of register <x>. This value is suitable for direct + * comparison with unshifted values appropriate for use in field <y> + * of register <x>. + */ + +#ifndef __hw_host1x_channel_host1x_h__ +#define __hw_host1x_channel_host1x_h__ + +static inline u32 host1x_channel_dmastart_r(void) +{ + return 0x14; +} +#define HOST1X_CHANNEL_DMASTART \ + host1x_channel_dmastart_r() +static inline u32 host1x_channel_dmaput_r(void) +{ + return 0x18; +} +#define HOST1X_CHANNEL_DMAPUT \ + host1x_channel_dmaput_r() +static inline u32 host1x_channel_dmaget_r(void) +{ + return 0x1c; +} +#define HOST1X_CHANNEL_DMAGET \ + host1x_channel_dmaget_r() +static inline u32 host1x_channel_dmaend_r(void) +{ + return 0x20; +} +#define HOST1X_CHANNEL_DMAEND \ + host1x_channel_dmaend_r() +static inline u32 host1x_channel_dmactrl_r(void) +{ + return 0x24; +} +#define HOST1X_CHANNEL_DMACTRL \ + host1x_channel_dmactrl_r() +static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v) +{ + return (v & 0x1) << 0; +} +#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \ + host1x_channel_dmactrl_dmastop_f(v) +static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v) +{ + return (v & 0x1) << 1; +} +#define HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(v) \ + host1x_channel_dmactrl_dmagetrst_f(v) +static inline u32 host1x_channel_dmactrl_dmainitget_f(u32 v) +{ + return (v & 0x1) << 2; +} +#define HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(v) \ + host1x_channel_dmactrl_dmainitget_f(v) +#endif diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h index 5da9afb..3073d37 100644 --- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h +++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h @@ -69,6 +69,18 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void) } #define 
HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 \ host1x_sync_syncpt_thresh_int_enable_cpu0_r() +static inline u32 host1x_sync_cmdproc_stop_r(void) +{ + return 0xac; +} +#define HOST1X_SYNC_CMDPROC_STOP \ + host1x_sync_cmdproc_stop_r() +static inline u32 host1x_sync_ch_teardown_r(void) +{ + return 0xb0; +} +#define HOST1X_SYNC_CH_TEARDOWN \ + host1x_sync_ch_teardown_r() static inline u32 host1x_sync_usec_clk_r(void) { return 0x1a4; diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h new file mode 100644 index 0000000..7af6609 --- /dev/null +++ b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h @@ -0,0 +1,168 @@ +/* + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + * + */ + + /* + * Function naming determines intended use: + * + * <x>_r(void) : Returns the offset for register <x>. + * + * <x>_w(void) : Returns the word offset for word (4 byte) element <x>. + * + * <x>_<y>_s(void) : Returns size of field <y> of register <x> in bits. + * + * <x>_<y>_f(u32 v) : Returns a value based on 'v' which has been shifted + * and masked to place it at field <y> of register <x>. This value + * can be |'d with others to produce a full register value for + * register <x>. + * + * <x>_<y>_m(void) : Returns a mask for field <y> of register <x>. This + * value can be ~'d and then &'d to clear the value of field <y> for + * register <x>. 
+ * + * <x>_<y>_<z>_f(void) : Returns the constant value <z> after being shifted + * to place it at field <y> of register <x>. This value can be |'d + * with others to produce a full register value for <x>. + * + * <x>_<y>_v(u32 r) : Returns the value of field <y> from a full register + * <x> value 'r' after being shifted to place its LSB at bit 0. + * This value is suitable for direct comparison with other unshifted + * values appropriate for use in field <y> of register <x>. + * + * <x>_<y>_<z>_v(void) : Returns the constant value for <z> defined for + * field <y> of register <x>. This value is suitable for direct + * comparison with unshifted values appropriate for use in field <y> + * of register <x>. + */ + +#ifndef __hw_host1x_uclass_host1x_h__ +#define __hw_host1x_uclass_host1x_h__ + +static inline u32 host1x_uclass_incr_syncpt_r(void) +{ + return 0x0; +} +#define HOST1X_UCLASS_INCR_SYNCPT \ + host1x_uclass_incr_syncpt_r() +static inline u32 host1x_uclass_incr_syncpt_cond_f(u32 v) +{ + return (v & 0xff) << 8; +} +#define HOST1X_UCLASS_INCR_SYNCPT_COND_F(v) \ + host1x_uclass_incr_syncpt_cond_f(v) +static inline u32 host1x_uclass_incr_syncpt_indx_f(u32 v) +{ + return (v & 0xff) << 0; +} +#define HOST1X_UCLASS_INCR_SYNCPT_INDX_F(v) \ + host1x_uclass_incr_syncpt_indx_f(v) +static inline u32 host1x_uclass_wait_syncpt_r(void) +{ + return 0x8; +} +#define HOST1X_UCLASS_WAIT_SYNCPT \ + host1x_uclass_wait_syncpt_r() +static inline u32 host1x_uclass_wait_syncpt_indx_f(u32 v) +{ + return (v & 0xff) << 24; +} +#define HOST1X_UCLASS_WAIT_SYNCPT_INDX_F(v) \ + host1x_uclass_wait_syncpt_indx_f(v) +static inline u32 host1x_uclass_wait_syncpt_thresh_f(u32 v) +{ + return (v & 0xffffff) << 0; +} +#define HOST1X_UCLASS_WAIT_SYNCPT_THRESH_F(v) \ + host1x_uclass_wait_syncpt_thresh_f(v) +static inline u32 host1x_uclass_wait_syncpt_base_indx_f(u32 v) +{ + return (v & 0xff) << 24; +} +#define HOST1X_UCLASS_WAIT_SYNCPT_BASE_INDX_F(v) \ + host1x_uclass_wait_syncpt_base_indx_f(v) 
+static inline u32 host1x_uclass_wait_syncpt_base_base_indx_f(u32 v) +{ + return (v & 0xff) << 16; +} +#define HOST1X_UCLASS_WAIT_SYNCPT_BASE_BASE_INDX_F(v) \ + host1x_uclass_wait_syncpt_base_base_indx_f(v) +static inline u32 host1x_uclass_wait_syncpt_base_offset_f(u32 v) +{ + return (v & 0xffff) << 0; +} +#define HOST1X_UCLASS_WAIT_SYNCPT_BASE_OFFSET_F(v) \ + host1x_uclass_wait_syncpt_base_offset_f(v) +static inline u32 host1x_uclass_load_syncpt_base_base_indx_f(u32 v) +{ + return (v & 0xff) << 24; +} +#define HOST1X_UCLASS_LOAD_SYNCPT_BASE_BASE_INDX_F(v) \ + host1x_uclass_load_syncpt_base_base_indx_f(v) +static inline u32 host1x_uclass_load_syncpt_base_value_f(u32 v) +{ + return (v & 0xffffff) << 0; +} +#define HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(v) \ + host1x_uclass_load_syncpt_base_value_f(v) +static inline u32 host1x_uclass_incr_syncpt_base_base_indx_f(u32 v) +{ + return (v & 0xff) << 24; +} +#define HOST1X_UCLASS_INCR_SYNCPT_BASE_BASE_INDX_F(v) \ + host1x_uclass_incr_syncpt_base_base_indx_f(v) +static inline u32 host1x_uclass_incr_syncpt_base_offset_f(u32 v) +{ + return (v & 0xffffff) << 0; +} +#define HOST1X_UCLASS_INCR_SYNCPT_BASE_OFFSET_F(v) \ + host1x_uclass_incr_syncpt_base_offset_f(v) +static inline u32 host1x_uclass_indoff_r(void) +{ + return 0x2d; +} +#define HOST1X_UCLASS_INDOFF \ + host1x_uclass_indoff_r() +static inline u32 host1x_uclass_indoff_indbe_f(u32 v) +{ + return (v & 0xf) << 28; +} +#define HOST1X_UCLASS_INDOFF_INDBE_F(v) \ + host1x_uclass_indoff_indbe_f(v) +static inline u32 host1x_uclass_indoff_autoinc_f(u32 v) +{ + return (v & 0x1) << 27; +} +#define HOST1X_UCLASS_INDOFF_AUTOINC_F(v) \ + host1x_uclass_indoff_autoinc_f(v) +static inline u32 host1x_uclass_indoff_indmodid_f(u32 v) +{ + return (v & 0xff) << 18; +} +#define HOST1X_UCLASS_INDOFF_INDMODID_F(v) \ + host1x_uclass_indoff_indmodid_f(v) +static inline u32 host1x_uclass_indoff_indroffset_f(u32 v) +{ + return (v & 0xffff) << 2; +} +#define HOST1X_UCLASS_INDOFF_INDROFFSET_F(v) \ + 
host1x_uclass_indoff_indroffset_f(v)
+static inline u32 host1x_uclass_indoff_rwn_read_v(void)
+{
+	return 1;
+}
+#define HOST1X_UCLASS_INDOFF_RWN_READ_V \
+	host1x_uclass_indoff_rwn_read_v()
+#endif
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c
index 16e3ada..ba48cee 100644
--- a/drivers/gpu/host1x/hw/syncpt_hw.c
+++ b/drivers/gpu/host1x/hw/syncpt_hw.c
@@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp)
 	wmb();
 }
+/* remove a wait pointed to by patch_addr */
+static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	u32 override = host1x_class_host_wait_syncpt(
+			NVSYNCPT_GRAPHICS_HOST, 0);
+	__raw_writel(override, patch_addr);
+	return 0;
+}
+
 static const char *syncpt_name(struct host1x_syncpt *sp)
 {
 	struct host1x_device_info *info = &sp->dev->info;
@@ -141,6 +150,7 @@ static const struct host1x_syncpt_ops host1x_syncpt_ops = {
 	.read_wait_base = syncpt_read_wait_base,
 	.load_min = syncpt_load_min,
 	.cpu_incr = syncpt_cpu_incr,
+	.patch_wait = syncpt_patch_wait,
 	.debug = syncpt_debug,
 	.name = syncpt_name,
 };
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 26099b8..9d0b5f1 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -20,6 +20,8 @@
 #include <linux/interrupt.h>
 #include <linux/slab.h>
 #include <linux/irq.h>
+#include <trace/events/host1x.h>
+#include "channel.h"
 #include "dev.h"
 /* Wait list management */
@@ -74,7 +76,7 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,
 			struct list_head completed[HOST1X_INTR_ACTION_COUNT])
 {
 	struct list_head *dest;
-	struct host1x_waitlist *waiter, *next;
+	struct host1x_waitlist *waiter, *next, *prev;

 	list_for_each_entry_safe(waiter, next, head, list) {
 		if ((s32)(waiter->thresh - sync) > 0)
@@ -82,6 +84,17 @@ static void remove_completed_waiters(struct list_head *head, u32 sync,

 		dest = completed + waiter->action;

+		/* consolidate submit cleanups */
+		if (waiter->action == HOST1X_INTR_ACTION_SUBMIT_COMPLETE
+			&& !list_empty(dest)) {
+			prev = list_entry(dest->prev,
+					struct host1x_waitlist, list);
+			if (prev->data == waiter->data) {
+				prev->count++;
+				dest = NULL;
+			}
+		}
+
 		/* PENDING->REMOVED or CANCELLED->HANDLED */
 		if (atomic_inc_return(&waiter->state) == WLS_HANDLED || !dest) {
 			list_del(&waiter->list);
@@ -104,6 +117,19 @@ static void reset_threshold_interrupt(struct host1x_intr *intr,
 	host1x->intr_op.enable_syncpt_intr(intr, id);
 }
+static void action_submit_complete(struct host1x_waitlist *waiter)
+{
+	struct host1x_channel *channel = waiter->data;
+	int nr_completed = waiter->count;
+
+	host1x_cdma_update(&channel->cdma);
+
+	/* Add nr_completed to trace */
+	trace_host1x_channel_submit_complete(channel->dev->name,
+			nr_completed, waiter->thresh);
+
+}
+
 static void action_wakeup(struct host1x_waitlist *waiter)
 {
 	wait_queue_head_t *wq = waiter->data;
@@ -121,6 +147,7 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 typedef void (*action_handler)(struct host1x_waitlist *waiter);

 static action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
+	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
 };
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index 679a7b4..979b929 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -24,6 +24,12 @@
enum host1x_intr_action { /* + * Perform cleanup after a submit has completed. + * 'data' points to a channel + */ + HOST1X_INTR_ACTION_SUBMIT_COMPLETE = 0, + + /* * Wake up a task. * 'data' points to a wait_queue_head_t */ diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c new file mode 100644 index 0000000..cc9c84a --- /dev/null +++ b/drivers/gpu/host1x/job.c @@ -0,0 +1,612 @@ +/* + * Tegra host1x Job + * + * Copyright (c) 2010-2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/kref.h> +#include <linux/err.h> +#include <linux/vmalloc.h> +#include <linux/scatterlist.h> +#include <trace/events/host1x.h> +#include <linux/dma-mapping.h> +#include "job.h" +#include "channel.h" +#include "syncpt.h" +#include "dev.h" +#include "memmgr.h" + +#ifdef CONFIG_TEGRA_HOST1X_FIREWALL +static int host1x_firewall = 1; +#else +static int host1x_firewall; +#endif + +struct host1x_job *host1x_job_alloc(struct host1x_channel *ch, + u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks) +{ + struct host1x_job *job = NULL; + int num_unpins = num_cmdbufs + num_relocs; + s64 total; + void *mem; + + /* Check that we're not going to overflow */ + total = sizeof(struct host1x_job) + + num_relocs * sizeof(struct host1x_reloc) + + num_unpins * sizeof(struct host1x_job_unpin_data) + + num_waitchks * sizeof(struct host1x_waitchk) + + num_cmdbufs * sizeof(struct host1x_job_gather) + + num_unpins * sizeof(dma_addr_t) + + num_unpins * sizeof(u32 *); + if (total > ULONG_MAX) + return NULL; + + mem = job = kzalloc(total, GFP_KERNEL); + if (!job) + return NULL; + + kref_init(&job->ref); + job->ch = ch; + + /* First init state to zero */ + + /* + * Redistribute memory to the structs. + * Overflows and negative conditions have + * already been checked in job_alloc(). + */ + mem += sizeof(struct host1x_job); + job->relocarray = num_relocs ? mem : NULL; + mem += num_relocs * sizeof(struct host1x_reloc); + job->unpins = num_unpins ? mem : NULL; + mem += num_unpins * sizeof(struct host1x_job_unpin_data); + job->waitchk = num_waitchks ? mem : NULL; + mem += num_waitchks * sizeof(struct host1x_waitchk); + job->gathers = num_cmdbufs ? mem : NULL; + mem += num_cmdbufs * sizeof(struct host1x_job_gather); + job->addr_phys = num_unpins ? mem : NULL; + mem += num_unpins * sizeof(dma_addr_t); + job->pin_ids = num_unpins ? 
mem : NULL;
+
+	job->reloc_addr_phys = job->addr_phys;
+	job->gather_addr_phys = &job->addr_phys[num_relocs];
+
+	return job;
+}
+
+void host1x_job_get(struct host1x_job *job)
+{
+	kref_get(&job->ref);
+}
+
+static void job_free(struct kref *ref)
+{
+	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
+
+	kfree(job);
+}
+
+void host1x_job_put(struct host1x_job *job)
+{
+	kref_put(&job->ref, job_free);
+}
+
+void host1x_job_add_gather(struct host1x_job *job,
+		u32 mem_id, u32 words, u32 offset)
+{
+	struct host1x_job_gather *cur_gather =
+			&job->gathers[job->num_gathers];
+
+	cur_gather->words = words;
+	cur_gather->mem_id = mem_id;
+	cur_gather->offset = offset;
+	job->num_gathers++;
+}
+
+/*
+ * Check driver supplied waitchk structs for syncpt thresholds
+ * that have already been satisfied and NULL the comparison (to
+ * avoid a wrap condition in the HW).
+ */
+static int do_waitchks(struct host1x_job *job, struct host1x *host,
+		u32 patch_mem, struct mem_handle *h)
+{
+	int i;
+
+	/* compare syncpt vs wait threshold */
+	for (i = 0; i < job->num_waitchk; i++) {
+		struct host1x_waitchk *wait = &job->waitchk[i];
+		struct host1x_syncpt *sp =
+			host1x_syncpt_get(host, wait->syncpt_id);
+
+		/* validate syncpt id */
+		if (wait->syncpt_id >= host1x_syncpt_nb_pts(host))
+			continue;
+
+		/* skip all other gathers */
+		if (patch_mem != wait->mem)
+			continue;
+
+		trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
+				wait->syncpt_id, wait->thresh,
+				host1x_syncpt_read_min(sp));
+		if (host1x_syncpt_is_expired(
+			host1x_syncpt_get(host, wait->syncpt_id),
+			wait->thresh)) {
+			struct host1x_syncpt *sp =
+				host1x_syncpt_get(host, wait->syncpt_id);
+
+			void *patch_addr = NULL;
+
+			/*
+			 * NULL an already satisfied WAIT_SYNCPT host method,
+			 * by patching its args in the command stream.
The + * method data is changed to reference a reserved + * (never given out or incr) NVSYNCPT_GRAPHICS_HOST + * syncpt with a matching threshold value of 0, so + * is guaranteed to be popped by the host HW. + */ + dev_dbg(&host->dev->dev, + "drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n", + wait->syncpt_id, sp->name, wait->thresh, + host1x_syncpt_read_min(sp)); + + /* patch the wait */ + patch_addr = host1x_memmgr_kmap(h, + wait->offset >> PAGE_SHIFT); + if (patch_addr) { + host1x_syncpt_patch_wait(sp, + (patch_addr + + (wait->offset & ~PAGE_MASK))); + host1x_memmgr_kunmap(h, + wait->offset >> PAGE_SHIFT, + patch_addr); + } else { + pr_err("Couldn't map cmdbuf for wait check\n"); + } + } + + wait->mem = 0; + } + return 0; +} + + +static int pin_job_mem(struct host1x_job *job) +{ + int i; + int count = 0; + int result; + + for (i = 0; i < job->num_relocs; i++) { + struct host1x_reloc *reloc = &job->relocarray[i]; + job->pin_ids[count] = reloc->target; + count++; + } + + for (i = 0; i < job->num_gathers; i++) { + struct host1x_job_gather *g = &job->gathers[i]; + job->pin_ids[count] = g->mem_id; + count++; + } + + /* validate array and pin unique ids, get refs for unpinning */ + result = host1x_memmgr_pin_array_ids(job->ch->dev, + job->pin_ids, job->addr_phys, + count, + job->unpins); + + if (result > 0) + job->num_unpins = result; + + return result; +} + +static int do_relocs(struct host1x_job *job, + u32 cmdbuf_mem, struct mem_handle *h) +{ + int i = 0; + int last_page = -1; + void *cmdbuf_page_addr = NULL; + + /* pin & patch the relocs for one gather */ + while (i < job->num_relocs) { + struct host1x_reloc *reloc = &job->relocarray[i]; + + /* skip all other gathers */ + if (cmdbuf_mem != reloc->cmdbuf_mem) { + i++; + continue; + } + + if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) { + if (cmdbuf_page_addr) + host1x_memmgr_kunmap(h, + last_page, cmdbuf_page_addr); + + cmdbuf_page_addr = host1x_memmgr_kmap(h, + reloc->cmdbuf_offset >> PAGE_SHIFT); + last_page = 
reloc->cmdbuf_offset >> PAGE_SHIFT; + + if (unlikely(!cmdbuf_page_addr)) { + pr_err("Couldn't map cmdbuf for relocation\n"); + return -ENOMEM; + } + } + + __raw_writel( + (job->reloc_addr_phys[i] + + reloc->target_offset) >> reloc->shift, + (cmdbuf_page_addr + + (reloc->cmdbuf_offset & ~PAGE_MASK))); + + /* remove completed reloc from the job */ + if (i != job->num_relocs - 1) { + struct host1x_reloc *reloc_last = + &job->relocarray[job->num_relocs - 1]; + reloc->cmdbuf_mem = reloc_last->cmdbuf_mem; + reloc->cmdbuf_offset = reloc_last->cmdbuf_offset; + reloc->target = reloc_last->target; + reloc->target_offset = reloc_last->target_offset; + reloc->shift = reloc_last->shift; + job->reloc_addr_phys[i] = + job->reloc_addr_phys[job->num_relocs - 1]; + job->num_relocs--; + } else { + break; + } + } + + if (cmdbuf_page_addr) + host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr); + + return 0; +} + +static int check_reloc(struct host1x_reloc *reloc, + u32 cmdbuf_id, int offset) +{ + int err = 0; + if (reloc->cmdbuf_mem != cmdbuf_id + || reloc->cmdbuf_offset != offset * sizeof(u32)) + err = -EINVAL; + + return err; +} + +static int check_mask(struct host1x_job *job, + struct platform_device *pdev, + struct host1x_reloc **reloc, int *num_relocs, + u32 cmdbuf_id, int *offset, + u32 *words, u32 class, u32 reg, u32 mask) +{ + while (mask) { + if (*words == 0) + return -EINVAL; + + if (mask & 1) { + if (job->is_addr_reg(pdev, class, reg)) { + if (!*num_relocs || + check_reloc(*reloc, cmdbuf_id, *offset)) + return -EINVAL; + (*reloc)++; + (*num_relocs)--; + } + (*words)--; + (*offset)++; + } + mask >>= 1; + reg += 1; + } + + return 0; +} + +static int check_incr(struct host1x_job *job, + struct platform_device *pdev, + struct host1x_reloc **reloc, int *num_relocs, + u32 cmdbuf_id, int *offset, + u32 *words, u32 class, u32 reg, u32 count) +{ + while (count) { + if (*words == 0) + return -EINVAL; + + if (job->is_addr_reg(pdev, class, reg)) { + if (!*num_relocs || + 
check_reloc(*reloc, cmdbuf_id, *offset)) + return -EINVAL; + (*reloc)++; + (*num_relocs)--; + } + reg += 1; + (*words)--; + (*offset)++; + count--; + } + + return 0; +} + +static int check_nonincr(struct host1x_job *job, + struct platform_device *pdev, + struct host1x_reloc **reloc, int *num_relocs, + u32 cmdbuf_id, int *offset, + u32 *words, u32 class, u32 reg, u32 count) +{ + int is_addr_reg = job->is_addr_reg(pdev, class, reg); + + while (count) { + if (*words == 0) + return -EINVAL; + + if (is_addr_reg) { + if (!*num_relocs || + check_reloc(*reloc, cmdbuf_id, *offset)) + return -EINVAL; + (*reloc)++; + (*num_relocs)--; + } + (*words)--; + (*offset)++; + count--; + } + + return 0; +} + +static int validate(struct host1x_job *job, struct platform_device *pdev, + struct host1x_job_gather *g) +{ + struct host1x_reloc *reloc = job->relocarray; + int num_relocs = job->num_relocs; + u32 *cmdbuf_base; + int offset = 0; + unsigned int words; + int err = 0; + int class = 0; + + if (!job->is_addr_reg) + return 0; + + cmdbuf_base = host1x_memmgr_mmap(g->ref); + if (!cmdbuf_base) + return -ENOMEM; + + words = g->words; + while (words && !err) { + u32 word = cmdbuf_base[offset]; + u32 opcode = (word & 0xf0000000) >> 28; + u32 mask = 0; + u32 reg = 0; + u32 count = 0; + + words--; + offset++; + + switch (opcode) { + case 0: + class = word >> 6 & 0x3ff; + mask = word & 0x3f; + reg = word >> 16 & 0xfff; + err = check_mask(job, pdev, + &reloc, &num_relocs, g->mem_id, + &offset, &words, class, reg, mask); + if (err) + goto out; + break; + case 1: + reg = word >> 16 & 0xfff; + count = word & 0xffff; + err = check_incr(job, pdev, + &reloc, &num_relocs, g->mem_id, + &offset, &words, class, reg, count); + if (err) + goto out; + break; + + case 2: + reg = word >> 16 & 0xfff; + count = word & 0xffff; + err = check_nonincr(job, pdev, + &reloc, &num_relocs, g->mem_id, + &offset, &words, class, reg, count); + if (err) + goto out; + break; + + case 3: + mask = word & 0xffff; + reg = word 
>> 16 & 0xfff;
+			err = check_mask(job, pdev,
+					&reloc, &num_relocs, g->mem_id,
+					&offset, &words, class, reg, mask);
+			if (err)
+				goto out;
+			break;
+		case 4:
+		case 5:
+		case 14:
+			break;
+		default:
+			err = -EINVAL;
+			break;
+		}
+	}
+
+	/* No relocs should remain at this point */
+	if (num_relocs)
+		err = -EINVAL;
+
+out:
+	host1x_memmgr_munmap(g->ref, cmdbuf_base);
+
+	return err;
+}
+
+static inline int copy_gathers(struct host1x_job *job,
+		struct platform_device *pdev)
+{
+	size_t size = 0;
+	size_t offset = 0;
+	int i;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		size += g->words * sizeof(u32);
+	}
+
+	job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
+			size, &job->gather_copy, GFP_KERNEL);
+	if (!job->gather_copy_mapped) {
+		/* allocation failure is reported as NULL, not ERR_PTR */
+		job->gather_copy_size = 0;
+		return -ENOMEM;
+	}
+
+	job->gather_copy_size = size;
+
+	for (i = 0; i < job->num_gathers; i++) {
+		struct host1x_job_gather *g = &job->gathers[i];
+		void *gather = host1x_memmgr_mmap(g->ref);
+		memcpy(job->gather_copy_mapped + offset,
+				gather + g->offset,
+				g->words * sizeof(u32));
+
+		host1x_memmgr_munmap(g->ref, gather);
+		g->mem_base = job->gather_copy;
+		g->offset = offset;
+		g->mem_id = 0;
+		g->ref = 0;
+
+		offset += g->words * sizeof(u32);
+	}
+
+	return 0;
+}
+
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev)
+{
+	int err = 0, i = 0, j = 0;
+	struct host1x *host = host1x_get_host(pdev);
+	DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
+
+	bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
+	for (i = 0; i < job->num_waitchk; i++) {
+		u32 syncpt_id = job->waitchk[i].syncpt_id;
+		if (syncpt_id < host1x_syncpt_nb_pts(host))
+			set_bit(syncpt_id, waitchk_mask);
+	}
+
+	/* get current syncpt values for waitchk */
+	for_each_set_bit(i, &waitchk_mask[0], host1x_syncpt_nb_pts(host))
+		host1x_syncpt_load_min(host->syncpt + i);
+
+	/* pin memory */
+	err = 
pin_job_mem(job); + if (err <= 0) + goto out; + + /* patch gathers */ + for (i = 0; i < job->num_gathers; i++) { + struct host1x_job_gather *g = &job->gathers[i]; + + /* process each gather mem only once */ + if (!g->ref) { + g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev); + if (IS_ERR(g->ref)) { + err = PTR_ERR(g->ref); + g->ref = NULL; + break; + } + + g->mem_base = job->gather_addr_phys[i]; + + for (j = 0; j < job->num_gathers; j++) { + struct host1x_job_gather *tmp = + &job->gathers[j]; + if (!tmp->ref && tmp->mem_id == g->mem_id) { + tmp->ref = g->ref; + tmp->mem_base = g->mem_base; + } + } + err = 0; + if (host1x_firewall) + err = validate(job, pdev, g); + if (err) + dev_err(&pdev->dev, + "Job validate returned %d\n", err); + if (!err) + err = do_relocs(job, g->mem_id, g->ref); + if (!err) + err = do_waitchks(job, host, + g->mem_id, g->ref); + host1x_memmgr_put(g->ref); + if (err) + break; + } + } + + if (host1x_firewall && !err) { + err = copy_gathers(job, pdev); + if (err) { + host1x_job_unpin(job); + return err; + } + } + +out: + wmb(); + + return err; +} + +void host1x_job_unpin(struct host1x_job *job) +{ + int i; + + for (i = 0; i < job->num_unpins; i++) { + struct host1x_job_unpin_data *unpin = &job->unpins[i]; + host1x_memmgr_unpin(unpin->h, unpin->mem); + host1x_memmgr_put(unpin->h); + } + job->num_unpins = 0; + + if (job->gather_copy_size) + dma_free_writecombine(&job->ch->dev->dev, + job->gather_copy_size, + job->gather_copy_mapped, job->gather_copy); +} + +/* + * Debug routine used to dump job entries + */ +void host1x_job_dump(struct device *dev, struct host1x_job *job) +{ + dev_dbg(dev, " SYNCPT_ID %d\n", + job->syncpt_id); + dev_dbg(dev, " SYNCPT_VAL %d\n", + job->syncpt_end); + dev_dbg(dev, " FIRST_GET 0x%x\n", + job->first_get); + dev_dbg(dev, " TIMEOUT %d\n", + job->timeout); + dev_dbg(dev, " NUM_SLOTS %d\n", + job->num_slots); + dev_dbg(dev, " NUM_HANDLES %d\n", + job->num_unpins); +} diff --git a/drivers/gpu/host1x/job.h 
b/drivers/gpu/host1x/job.h new file mode 100644 index 0000000..428c670 --- /dev/null +++ b/drivers/gpu/host1x/job.h @@ -0,0 +1,164 @@ +/* + * Tegra host1x Job + * + * Copyright (c) 2011-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef __HOST1X_JOB_H +#define __HOST1X_JOB_H + +struct platform_device; + +struct host1x_job_gather { + u32 words; + dma_addr_t mem_base; + u32 mem_id; + int offset; + struct mem_handle *ref; +}; + +struct host1x_cmdbuf { + __u32 mem; + __u32 offset; + __u32 words; + __u32 pad; +}; + +struct host1x_reloc { + __u32 cmdbuf_mem; + __u32 cmdbuf_offset; + __u32 target; + __u32 target_offset; + __u32 shift; + __u32 pad; +}; + +struct host1x_waitchk { + __u32 mem; + __u32 offset; + __u32 syncpt_id; + __u32 thresh; +}; + +/* + * Each submit is tracked as a host1x_job. 
+ */ +struct host1x_job { + /* When refcount goes to zero, job can be freed */ + struct kref ref; + + /* List entry */ + struct list_head list; + + /* Channel where job is submitted to */ + struct host1x_channel *ch; + + int clientid; + + /* Gathers and their memory */ + struct host1x_job_gather *gathers; + int num_gathers; + + /* Wait checks to be processed at submit time */ + struct host1x_waitchk *waitchk; + int num_waitchk; + u32 waitchk_mask; + + /* Array of handles to be pinned & unpinned */ + struct host1x_reloc *relocarray; + int num_relocs; + struct host1x_job_unpin_data *unpins; + int num_unpins; + + dma_addr_t *addr_phys; + dma_addr_t *gather_addr_phys; + dma_addr_t *reloc_addr_phys; + + /* Sync point id, number of increments and end related to the submit */ + u32 syncpt_id; + u32 syncpt_incrs; + u32 syncpt_end; + + /* Maximum time to wait for this job */ + int timeout; + + /* Null kickoff prevents submit from being sent to hardware */ + bool null_kickoff; + + /* Index and number of slots used in the push buffer */ + int first_get; + int num_slots; + + /* Copy of gathers */ + size_t gather_copy_size; + dma_addr_t gather_copy; + u8 *gather_copy_mapped; + + /* Temporary space for unpin ids */ + long unsigned int *pin_ids; + + /* Check if register is marked as an address reg */ + int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class); + + /* Request a SETCLASS to this class */ + u32 class; + + /* Add a channel wait for previous ops to complete */ + u32 serialize; +}; +/* + * Allocate memory for a job. Just enough memory will be allocated to + * accomodate the submit. + */ +struct host1x_job *host1x_job_alloc(struct host1x_channel *ch, + u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks); + +/* + * Add a gather to a job. + */ +void host1x_job_add_gather(struct host1x_job *job, + u32 mem_id, u32 words, u32 offset); + +/* + * Increment reference going to host1x_job. 
+ */ +void host1x_job_get(struct host1x_job *job); + +/* + * Decrement reference job, free if goes to zero. + */ +void host1x_job_put(struct host1x_job *job); + +/* + * Pin memory related to job. This handles relocation of addresses to the + * host1x address space. Handles both the gather memory and any other memory + * referred to from the gather buffers. + * + * Handles also patching out host waits that would wait for an expired sync + * point value. + */ +int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev); + +/* + * Unpin memory related to job. + */ +void host1x_job_unpin(struct host1x_job *job); + +/* + * Dump contents of job to debug output. + */ +void host1x_job_dump(struct device *dev, struct host1x_job *job); + +#endif diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c new file mode 100644 index 0000000..eceb782 --- /dev/null +++ b/drivers/gpu/host1x/memmgr.c @@ -0,0 +1,173 @@ +/* + * Tegra host1x Memory Management Abstraction + * + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#include <linux/kernel.h> +#include <linux/err.h> + +#include "memmgr.h" +#include "cma.h" + +struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align, int flags) +{ + return NULL; +} + +struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev) +{ + struct mem_handle *h = NULL; + + switch (host1x_memmgr_type(id)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + h = (struct mem_handle *) host1x_cma_get(id, dev); + break; +#endif + default: + break; + } + + return h; +} + +void host1x_memmgr_put(struct mem_handle *handle) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + host1x_cma_put(handle); + break; +#endif + default: + break; + } +} + +struct sg_table *host1x_memmgr_pin(struct mem_handle *handle) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + return host1x_cma_pin(handle); + break; +#endif + default: + return NULL; + break; + } +} + +void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + host1x_cma_unpin(handle, sgt); + break; +#endif + default: + break; + } +} + +void *host1x_memmgr_mmap(struct mem_handle *handle) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + return host1x_cma_mmap(handle); + break; +#endif + default: + return NULL; + break; + } +} + +void host1x_memmgr_munmap(struct mem_handle *handle, void *addr) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + host1x_cma_munmap(handle, addr); + break; +#endif + default: + break; + } +} + +void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + 
return host1x_cma_kmap(handle, pagenum); + break; +#endif + default: + return NULL; + break; + } +} + +void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum, + void *addr) +{ + switch (host1x_memmgr_type((u32)handle)) { +#if defined(CONFIG_TEGRA_HOST1X_CMA) + case mem_mgr_type_cma: + host1x_cma_kunmap(handle, pagenum, addr); + break; +#endif + default: + break; + } +} + +int host1x_memmgr_pin_array_ids(struct platform_device *dev, + long unsigned *ids, + dma_addr_t *phys_addr, + u32 count, + struct host1x_job_unpin_data *unpin_data) +{ + int pin_count = 0; + +#if defined(CONFIG_TEGRA_HOST1X_CMA) + { + int cma_count = host1x_cma_pin_array_ids(dev, + ids, MEMMGR_TYPE_MASK, + mem_mgr_type_cma, + count, &unpin_data[pin_count], + phys_addr); + + if (cma_count < 0) { + /* clean up previous handles */ + while (pin_count) { + pin_count--; + /* unpin, put */ + host1x_memmgr_unpin(unpin_data[pin_count].h, + unpin_data[pin_count].mem); + host1x_memmgr_put(unpin_data[pin_count].h); + } + return cma_count; + } + pin_count += cma_count; + } +#endif + return pin_count; +} diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h new file mode 100644 index 0000000..a265fe8 --- /dev/null +++ b/drivers/gpu/host1x/memmgr.h @@ -0,0 +1,72 @@ +/* + * Tegra host1x Memory Management Abstraction header + * + * Copyright (c) 2012-2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#ifndef _HOST1X_MEM_MGR_H +#define _HOST1X_MEM_MGR_H + +struct mem_handle; +struct platform_device; + +struct host1x_job_unpin_data { + struct mem_handle *h; + struct sg_table *mem; +}; + +enum mem_mgr_flag { + mem_mgr_flag_uncacheable = 0, + mem_mgr_flag_write_combine = 1, +}; + +/* Buffer encapsulation */ +enum mem_mgr_type { + mem_mgr_type_cma = 2, +}; + +#define MEMMGR_TYPE_MASK 0x3 +#define MEMMGR_ID_MASK ~0x3 + +static inline int host1x_memmgr_type(u32 id) { return id & MEMMGR_TYPE_MASK; } +static inline int host1x_memmgr_id(u32 id) { return id & MEMMGR_ID_MASK; } +static inline unsigned int host1x_memmgr_host1x_id(u32 type, u32 handle) +{ + if (host1x_memmgr_type(type) != type || + host1x_memmgr_id(handle) != handle) + return 0; + + return handle | type; +} + +struct mem_handle *host1x_memmgr_alloc(size_t size, size_t align, + int flags); +struct mem_handle *host1x_memmgr_get(u32 id, struct platform_device *dev); +void host1x_memmgr_put(struct mem_handle *handle); +struct sg_table *host1x_memmgr_pin(struct mem_handle *handle); +void host1x_memmgr_unpin(struct mem_handle *handle, struct sg_table *sgt); +void *host1x_memmgr_mmap(struct mem_handle *handle); +void host1x_memmgr_munmap(struct mem_handle *handle, void *addr); +void *host1x_memmgr_kmap(struct mem_handle *handle, unsigned int pagenum); +void host1x_memmgr_kunmap(struct mem_handle *handle, unsigned int pagenum, + void *addr); + +int host1x_memmgr_pin_array_ids(struct platform_device *dev, + long unsigned *ids, + dma_addr_t *phys_addr, + u32 count, + struct host1x_job_unpin_data *unpin_data); + +#endif diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c index 32e2b42..f21c688 100644 --- a/drivers/gpu/host1x/syncpt.c +++ b/drivers/gpu/host1x/syncpt.c @@ -287,6 +287,12 @@ void host1x_syncpt_debug(struct host1x_syncpt *sp) sp->dev->syncpt_op.debug(sp); }
+/* remove a wait pointed to by patch_addr */
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr)
+{
+	return sp->dev->syncpt_op.patch_wait(sp, patch_addr);
+}
+
 int host1x_syncpt_init(struct host1x *host)
 {
 	struct host1x_syncpt *syncpt, *sp;
@@ -305,6 +311,11 @@ int host1x_syncpt_init(struct host1x *host)

 	host->syncpt = syncpt;

+	/* Allocate sync point to use for clearing waits for expired fences */
+	host->nop_sp = _host1x_syncpt_alloc(host, NULL, 0);
+	if (!host->nop_sp)
+		return -ENOMEM;
+
 	return 0;
 }
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index b46d044..255a3a3 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -26,6 +26,7 @@ struct host1x;

 #define NVSYNCPT_INVALID (-1)
+#define NVSYNCPT_GRAPHICS_HOST 0

 struct host1x_syncpt {
 	int id;
@@ -145,6 +146,9 @@ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 		sp->id < host1x_syncpt_nb_pts(sp->dev);
 }

+/* Patch a wait by replacing it with a wait for syncpt 0 value 0 */
+int host1x_syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr);
+
 /* Return id of the sync point */
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h index 3c14cac..c63d75c 100644 --- a/include/trace/events/host1x.h +++ b/include/trace/events/host1x.h @@ -37,6 +37,190 @@ DECLARE_EVENT_CLASS(host1x, TP_printk("name=%s", __entry->name) );
+DEFINE_EVENT(host1x, host1x_channel_open, + TP_PROTO(const char *name), + TP_ARGS(name) +); + +DEFINE_EVENT(host1x, host1x_channel_release, + TP_PROTO(const char *name), + TP_ARGS(name) +); + +DEFINE_EVENT(host1x, host1x_cdma_begin, + TP_PROTO(const char *name), + TP_ARGS(name) +); + +DEFINE_EVENT(host1x, host1x_cdma_end, + TP_PROTO(const char *name), + TP_ARGS(name) +); + +TRACE_EVENT(host1x, + TP_PROTO(const char *name, int timeout), + + TP_ARGS(name, timeout), + + TP_STRUCT__entry( + __field(const char *, name) + __field(int, timeout) + ), + + TP_fast_assign( + __entry->name = name; + __entry->timeout = timeout; + ), + + TP_printk("name=%s, timeout=%d", + __entry->name, __entry->timeout) +); + +TRACE_EVENT(host1x_cdma_push, + TP_PROTO(const char *name, u32 op1, u32 op2), + + TP_ARGS(name, op1, op2), + + TP_STRUCT__entry( + __field(const char *, name) + __field(u32, op1) + __field(u32, op2) + ), + + TP_fast_assign( + __entry->name = name; + __entry->op1 = op1; + __entry->op2 = op2; + ), + + TP_printk("name=%s, op1=%08x, op2=%08x", + __entry->name, __entry->op1, __entry->op2) +); + +TRACE_EVENT(host1x_cdma_push_gather, + TP_PROTO(const char *name, u32 mem_id, + u32 words, u32 offset, void *cmdbuf), + + TP_ARGS(name, mem_id, words, offset, cmdbuf), + + TP_STRUCT__entry( + __field(const char *, name) + __field(u32, mem_id) + __field(u32, words) + __field(u32, offset) + __field(bool, cmdbuf) + __dynamic_array(u32, cmdbuf, words) + ), + + TP_fast_assign( + if (cmdbuf) { + memcpy(__get_dynamic_array(cmdbuf), cmdbuf+offset, + words * sizeof(u32)); + } + __entry->cmdbuf = cmdbuf; + __entry->name = name; + __entry->mem_id = mem_id; + __entry->words = words; + __entry->offset = offset; + ), + + TP_printk("name=%s, mem_id=%08x, words=%u, offset=%d, contents=[%s]", + __entry->name, __entry->mem_id, + __entry->words, __entry->offset, + __print_hex(__get_dynamic_array(cmdbuf), + __entry->cmdbuf ? 
__entry->words * 4 : 0)) +); + +TRACE_EVENT(host1x_channel_submit, + TP_PROTO(const char *name, u32 cmdbufs, u32 relocs, u32 waitchks, + u32 syncpt_id, u32 syncpt_incrs), + + TP_ARGS(name, cmdbufs, relocs, waitchks, syncpt_id, syncpt_incrs), + + TP_STRUCT__entry( + __field(const char *, name) + __field(u32, cmdbufs) + __field(u32, relocs) + __field(u32, waitchks) + __field(u32, syncpt_id) + __field(u32, syncpt_incrs) + ), + + TP_fast_assign( + __entry->name = name; + __entry->cmdbufs = cmdbufs; + __entry->relocs = relocs; + __entry->waitchks = waitchks; + __entry->syncpt_id = syncpt_id; + __entry->syncpt_incrs = syncpt_incrs; + ), + + TP_printk("name=%s, cmdbufs=%u, relocs=%u, waitchks=%d," + "syncpt_id=%u, syncpt_incrs=%u", + __entry->name, __entry->cmdbufs, __entry->relocs, __entry->waitchks, + __entry->syncpt_id, __entry->syncpt_incrs) +); + +TRACE_EVENT(host1x_channel_submitted, + TP_PROTO(const char *name, u32 syncpt_base, u32 syncpt_max), + + TP_ARGS(name, syncpt_base, syncpt_max), + + TP_STRUCT__entry( + __field(const char *, name) + __field(u32, syncpt_base) + __field(u32, syncpt_max) + ), + + TP_fast_assign( + __entry->name = name; + __entry->syncpt_base = syncpt_base; + __entry->syncpt_max = syncpt_max; + ), + + TP_printk("name=%s, syncpt_base=%d, syncpt_max=%d", + __entry->name, __entry->syncpt_base, __entry->syncpt_max) +); + +TRACE_EVENT(host1x_channel_submit_complete, + TP_PROTO(const char *name, int count, u32 thresh), + + TP_ARGS(name, count, thresh), + + TP_STRUCT__entry( + __field(const char *, name) + __field(int, count) + __field(u32, thresh) + ), + + TP_fast_assign( + __entry->name = name; + __entry->count = count; + __entry->thresh = thresh; + ), + + TP_printk("name=%s, count=%d, thresh=%d", + __entry->name, __entry->count, __entry->thresh) +); + +TRACE_EVENT(host1x_wait_cdma, + TP_PROTO(const char *name, u32 eventid), + + TP_ARGS(name, eventid), + + TP_STRUCT__entry( + __field(const char *, name) + __field(u32, eventid) + ), + + 
TP_fast_assign( + __entry->name = name; + __entry->eventid = eventid; + ), + + TP_printk("name=%s, event=%d", __entry->name, __entry->eventid) +); + TRACE_EVENT(host1x_syncpt_load_min, TP_PROTO(u32 id, u32 val),
@@ -55,6 +239,33 @@ TRACE_EVENT(host1x_syncpt_load_min, TP_printk("id=%d, val=%d", __entry->id, __entry->val) );
+TRACE_EVENT(host1x_syncpt_wait_check, + TP_PROTO(u32 mem_id, u32 offset, u32 syncpt_id, u32 thresh, u32 min), + + TP_ARGS(mem_id, offset, syncpt_id, thresh, min), + + TP_STRUCT__entry( + __field(u32, mem_id) + __field(u32, offset) + __field(u32, syncpt_id) + __field(u32, thresh) + __field(u32, min) + ), + + TP_fast_assign( + __entry->mem_id = mem_id; + __entry->offset = offset; + __entry->syncpt_id = syncpt_id; + __entry->thresh = thresh; + __entry->min = min; + ), + + TP_printk("mem_id=%08x, offset=%05x, id=%d, thresh=%d, current=%d", + __entry->mem_id, __entry->offset, + __entry->syncpt_id, __entry->thresh, + __entry->min) +); + #endif /* _TRACE_HOST1X_H */
/* This part must be outside protection */
On Tue, Jan 15, 2013 at 01:43:59PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index e89fb2b..57680a6 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -3,4 +3,27 @@ config TEGRA_HOST1X
 	help
 	  Driver for the Tegra host1x hardware.

-	  Required for enabling tegradrm.
+	  Required for enabling tegradrm and 2D acceleration.
I don't think I commented on this in the other patches, but I think this could use a bit more information about what host1x is. Also mentioning that it is a requirement for tegra-drm and 2D acceleration isn't very useful because it can equally well be expressed in Kconfig. If you add some description about what host1x is, people will know that they want to enable it.
+if TEGRA_HOST1X
+
+config TEGRA_HOST1X_CMA
+	bool "Support DRM CMA buffers"
+	depends on DRM
+	default y
+	select DRM_GEM_CMA_HELPER
+	select DRM_KMS_CMA_HELPER
+	help
+	  Say yes if you wish to use DRM CMA buffers.
+
+	  If unsure, choose Y.
Perhaps make this not user-selectable (for now)? If somebody disables this explicitly they won't get a working driver, right?
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
[...]
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include "job.h"
+#include <asm/cacheflush.h>
+
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/interrupt.h>
+#include <trace/events/host1x.h>
+
+#define TRACE_MAX_LENGTH 128U
"" includes generally follow <> ones.
+/*
+ * Add an entry to the sync queue.
+ */
+static void add_to_sync_queue(struct host1x_cdma *cdma,
+			      struct host1x_job *job,
+			      u32 nr_slots,
+			      u32 first_get)
+{
+	if (job->syncpt_id == NVSYNCPT_INVALID) {
+		dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
+				__func__);
+		return;
+	}
+
+	job->first_get = first_get;
+	job->num_slots = nr_slots;
+	host1x_job_get(job);
+	list_add_tail(&job->list, &cdma->sync_queue);
+}
It's a bit odd that you pass a job in here along with some parameters that are then assigned to the job's fields. Couldn't you just assign them to the job's fields before passing the job into this function?
I also see that you only use this function once, so maybe you could open-code it instead.
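To make the open-coding suggestion a bit more concrete, here is a rough userspace sketch of what the tail of host1x_cdma_end() could look like once the helper is gone. The struct names (job, cdma), the plain integer refcount and the list helpers are simplified stand-ins invented for this illustration; they are not the driver's real types, and the NVSYNCPT_INVALID check is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical, trimmed-down versions of host1x_job and host1x_cdma. */
struct job {
	struct list_head list;
	int refcount;		/* stand-in for struct kref */
	unsigned int first_get;
	unsigned int num_slots;
};

struct cdma {
	struct list_head sync_queue;
	unsigned int first_get;
	unsigned int slots_used;
};

/* Open-coded replacement for the add_to_sync_queue() call. */
static void cdma_end(struct cdma *cdma, struct job *job)
{
	assert(cdma && job);

	/* Record where the job lives in the push buffer. */
	job->first_get = cdma->first_get;
	job->num_slots = cdma->slots_used;

	/* The sync queue holds its own reference to the job. */
	job->refcount++;
	list_add_tail(&job->list, &cdma->sync_queue);
}
```

With the assignments visible at the call site, it is easier to see that first_get and num_slots are simply copied from the cdma state.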
+/*
+ * Return the status of the cdma's sync queue or push buffer for the given event
+ *  - sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
+ *  - pb space: returns the number of free slots in the channel's push buffer
+ * Must be called with the cdma lock held.
+ */
+static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
+		enum cdma_event event)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	switch (event) {
+	case CDMA_EVENT_SYNC_QUEUE_EMPTY:
+		return list_empty(&cdma->sync_queue) ? 1 : 0;
+	case CDMA_EVENT_PUSH_BUFFER_SPACE: {
+		struct push_buffer *pb = &cdma->push_buffer;
+		return host1x->cdma_pb_op.space(pb);
+	}
+	default:
+		return 0;
+	}
+}
Similarly this function is only used in one place and it requires a whole lot of documentation to define the meaning of the return value. If you implement this functionality directly in host1x_cdma_wait_locked() you have much more context and don't require all this "protocol".
+/*
+ * Start timer for a buffer submition that has completed yet.
"submission". And I don't understand the "that has completed yet" part.
+ * Must be called with the cdma lock held.
+ */
+static void cdma_start_timer_locked(struct host1x_cdma *cdma,
+		struct host1x_job *job)
You use two different styles to indent the function parameters. You might want to stick to one, preferably aligning them with the first parameter on the first line.
+{
+	struct host1x *host = cdma_to_host1x(cdma);
+
+	if (cdma->timeout.clientid) {
+		/* timer already started */
+		return;
+	}
+
+	cdma->timeout.clientid = job->clientid;
+	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt_val = job->syncpt_end;
+	cdma->timeout.start_ktime = ktime_get();
+
+	schedule_delayed_work(&cdma->timeout.wq,
+			msecs_to_jiffies(job->timeout));
+}
+
+/*
+ * Stop timer when a buffer submition completes.
"submission"
+/*
+ * For all sync queue entries that have already finished according to the
+ * current sync point registers:
+ *  - unpin & unref their mems
+ *  - pop their push buffer slots
+ *  - remove them from the sync queue
+ * This is normally called from the host code's worker thread, but can be
+ * called manually if necessary.
+ * Must be called with the cdma lock held.
+ */
+static void update_cdma_locked(struct host1x_cdma *cdma)
+{
+	bool signal = false;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct host1x_job *job, *n;
+
+	/* If CDMA is stopped, queue is cleared and we can return */
+	if (!cdma->running)
+		return;
+
+	/*
+	 * Walk the sync queue, reading the sync point registers as necessary,
+	 * to consume as many sync queue entries as possible without blocking
+	 */
+	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
+		struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;
host1x_syncpt_get()?
+		/* Check whether this syncpt has completed, and bail if not */
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+			/* Start timer on next pending syncpt */
+			if (job->timeout)
+				cdma_start_timer_locked(cdma, job);
+			break;
+		}
+
+		/* Cancel timeout, when a buffer completes */
+		if (cdma->timeout.clientid)
+			stop_cdma_timer_locked(cdma);
+
+		/* Unpin the memory */
+		host1x_job_unpin(job);
+
+		/* Pop push buffer slots */
+		if (job->num_slots) {
+			struct push_buffer *pb = &cdma->push_buffer;
+			host1x->cdma_pb_op.pop_from(pb, job->num_slots);
+			if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
+				signal = true;
+		}
+
+		list_del(&job->list);
+		host1x_job_put(job);
+	}
+
+	if (list_empty(&cdma->sync_queue) &&
+				cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
+			signal = true;
This looks funny, maybe:
	if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
	    list_empty(&cdma->sync_queue))
		signal = true;
?
+	/* Wake up CdmaWait() if the requested event happened */
CdmaWait()? Where's that?
+	if (signal) {
+		cdma->event = CDMA_EVENT_NONE;
+		up(&cdma->sem);
+	}
+}
+
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
+		struct platform_device *dev)
There's nothing in this function that requires a platform_device, so passing struct device should be enough. Or maybe host1x_cdma should get a struct device * field?
+{
+	u32 get_restart;
Maybe just call this "restart" or "restart_addr". get_restart sounds like a function name.
+	u32 syncpt_incrs;
+	struct host1x_job *job = NULL;
+	u32 syncpt_val;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
+
+	dev_dbg(&dev->dev,
+		"%s: starting cleanup (thresh %d)\n",
+		__func__, syncpt_val);
This fits on two lines.
+	/*
+	 * Move the sync_queue read pointer to the first entry that hasn't
+	 * completed based on the current HW syncpt value. It's likely there
+	 * won't be any (i.e. we're still at the head), but covers the case
+	 * where a syncpt incr happens just prior/during the teardown.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: skip completed buffers still in sync_queue\n",
+		__func__);
This too.
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		if (syncpt_val < job->syncpt_end)
+			break;
+
+		host1x_job_dump(&dev->dev, job);
+	}
That's potentially a lot of debug output. I wonder if it might make sense to control parts of this via a module parameter. Then again, if somebody really needs to debug this, maybe they really want *all* the information.
+	/*
+	 * Walk the sync_queue, first incrementing with the CPU syncpts that
+	 * are partially executed (the first buffer) or fully skipped while
+	 * still in the current context (slots are also NOP-ed).
+	 *
+	 * At the point contexts are interleaved, syncpt increments must be
+	 * done inline with the pushbuffer from a GATHER buffer to maintain
+	 * the order (slots are modified to be a GATHER of syncpt incrs).
+	 *
+	 * Note: save in get_restart the location where the timed out buffer
+	 * started in the PB, so we can start the refetch from there (with the
+	 * modified NOP-ed PB slots). This lets things appear to have completed
+	 * properly for this buffer and resources are freed.
+	 */
+
+	dev_dbg(&dev->dev,
+		"%s: perform CPU incr on pending same ctx buffers\n",
+		__func__);
Can be collapsed to two lines.
+	get_restart = cdma->last_put;
+	if (!list_empty(&cdma->sync_queue))
+		get_restart = job->first_get;
Perhaps:
	if (list_empty(&cdma->sync_queue))
		restart = cdma->last_put;
	else
		restart = job->first_get;
?
+	list_for_each_entry_from(job, &cdma->sync_queue, list)
+		if (job->clientid == cdma->timeout.clientid)
+			job->timeout = 500;
I think this warrants a comment.
+/*
+ * Destroy a cdma
+ */
+void host1x_cdma_deinit(struct host1x_cdma *cdma)
+{
+	struct push_buffer *pb = &cdma->push_buffer;
+	struct host1x *host1x = cdma_to_host1x(cdma);
+
+	if (cdma->running) {
+		pr_warn("%s: CDMA still running\n",
+				__func__);
+	} else {
+		host1x->cdma_pb_op.destroy(pb);
+		host1x->cdma_op.timeout_destroy(cdma);
+	}
+}
There's no way to recover from the situation where a cdma is still running. Can this not return an error code (-EBUSY?) if the cdma can't be destroyed?
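As a rough illustration of the -EBUSY idea, the userspace sketch below models deinit with an int return value. The struct layout and the two "destroyed" flags are invented stand-ins for the cdma_pb_op.destroy() and cdma_op.timeout_destroy() calls; the real function would of course keep calling those ops.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down cdma state for illustration only. */
struct cdma {
	bool running;
	bool pb_destroyed;	/* stand-in for cdma_pb_op.destroy() */
	bool timeout_destroyed;	/* stand-in for cdma_op.timeout_destroy() */
};

/* Refuse to tear down a running cdma and tell the caller about it. */
static int cdma_deinit(struct cdma *cdma)
{
	assert(cdma);

	if (cdma->running)
		return -EBUSY;

	cdma->pb_destroyed = true;
	cdma->timeout_destroyed = true;

	return 0;
}
```

The caller can then decide whether to stop the cdma and retry, or to propagate the error, instead of only getting a warning in the log.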
+/*
+ * End a cdma submit
+ * Kick off DMA, add job to the sync queue, and a number of slots to be freed
+ * from the pushbuffer. The handles for a submit must all be pinned at the same
+ * time, but they can be unpinned in smaller chunks.
+ */
+void host1x_cdma_end(struct host1x_cdma *cdma,
+		struct host1x_job *job)
+{
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	bool was_idle = list_empty(&cdma->sync_queue);
Maybe just "idle"? It reflects the current state of the CDMA, not any old state.
+	host1x->cdma_op.kick(cdma);
+
+	add_to_sync_queue(cdma,
+			job,
+			cdma->slots_used,
+			cdma->first_get);
No need to split this over so many lines. Also, shouldn't the order be reversed here? I.e. first add to sync queue, then start DMA?
+	/* start timer on idle -> active transitions */
+	if (job->timeout && was_idle)
+		cdma_start_timer_locked(cdma, job);
This could be part of add_to_sync_queue(), but if you open-code that as I suggest earlier it should obviously stay.
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
[...]
+struct platform_device;
No need for this if you pass struct device * instead.
+/*
+ * cdma
+ *
+ * This is in charge of a host command DMA channel.
+ * Sends ops to a push buffer, and takes responsibility for unpinning
+ * (& possibly freeing) of memory after those ops have completed.
+ * Producer:
+ *	begin
+ *		push - send ops to the push buffer
+ *	end - start command DMA and enqueue handles to be unpinned
+ * Consumer:
+ *	update - call to update sync queue and push buffer, unpin memory
+ */
I find the name to be a bit confusing. For some reason I automatically think of GSM when I read CDMA. This really is more of a job queue, so maybe calling it host1x_job_queue might be more appropriate. But I've already requested a lot of things to be renamed, so I think I can live with this being called CDMA if you don't want to change it.
Alternatively all of these could be moved to the struct host1x_channel given that there's only one of each of the push_buffer, buffer_timeout and host1x_cma objects per channel.
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
[...]
+#include "channel.h"
+#include "dev.h"
+#include "job.h"
+
+#include <linux/slab.h>
+#include <linux/module.h>
Again the include ordering is strange.
+/*
+ * Iterator function for host1x device list
+ * It takes a fptr as an argument and calls that function for each
+ * device in the list
+ */
+void host1x_channel_for_all(struct host1x *host1x, void *data,
+	int (*fptr)(struct host1x_channel *ch, void *fdata))
+{
+	struct host1x_channel *ch;
+	int ret;
+
+	list_for_each_entry(ch, &host1x->chlist.list, list) {
+		if (ch && fptr) {
+			ret = fptr(ch, data);
+			if (ret) {
+				pr_info("%s: iterator error\n", __func__);
+				break;
+			}
+		}
+	}
+}
Couldn't you rewrite this as a macro, similar to list_for_each_entry() so that users could do something like:
host1x_for_each_channel(channel, host1x) { ... }
That's a bit friendlier than having each user write a separate function to be called from this iterator.
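For what it's worth, a macro along those lines could look roughly like the userspace sketch below. Everything here is a simplified stand-in: the list helpers mimic the kernel's list.h, struct host1x is reduced to a bare list head (the patch uses host1x->chlist.list), and in the driver the macro body would most likely be a one-line wrapper around list_for_each_entry().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's list.h helpers. */
struct list_head { struct list_head *prev, *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	assert(n && h);
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical, trimmed-down driver structures. */
struct host1x_channel {
	struct list_head list;
	int id;
};

struct host1x {
	struct list_head chlist;
};

/* Iterate over all channels of a host1x instance. */
#define host1x_for_each_channel(ch, host)				\
	for ((ch) = container_of((host)->chlist.next,			\
				 struct host1x_channel, list);		\
	     &(ch)->list != &(host)->chlist;				\
	     (ch) = container_of((ch)->list.next,			\
				 struct host1x_channel, list))

/* Example user: no callback function or void *data needed. */
static int sum_channel_ids(struct host1x *host)
{
	struct host1x_channel *ch;
	int sum = 0;

	host1x_for_each_channel(ch, host)
		sum += ch->id;

	return sum;
}
```

The user keeps its local variables in scope inside the loop body, which is the main ergonomic gain over the function-pointer iterator.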
+int host1x_channel_submit(struct host1x_job *job)
+{
+	return host1x_get_host(job->ch->dev)->channel_op.submit(job);
+}
I'd expect a function named host1x_channel_submit() to take a struct host1x_channel *. Should this perhaps be called host1x_job_submit()?
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
+{
+	int err = 0;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 0)
+		err = host1x_cdma_init(&ch->cdma);
+	if (!err)
+		ch->refcount++;
+
+	mutex_unlock(&ch->reflock);
+
+	return err ? NULL : ch;
+}
Why don't you use any of the kernel's reference counting mechanisms?
+void host1x_channel_put(struct host1x_channel *ch)
+{
+	mutex_lock(&ch->reflock);
+	if (ch->refcount == 1) {
+		host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
+		host1x_cdma_deinit(&ch->cdma);
+	}
+	ch->refcount--;
+	mutex_unlock(&ch->reflock);
+}
I think you can do all of this using a kref.
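A kref-based version might be shaped like the userspace sketch below. The struct kref here is a non-atomic stand-in that only mirrors the kernel API's semantics (kref_init() starts at 1, kref_put() calls the release function on the last put and returns 1 in that case); struct channel and the cdma_running flag are invented for the illustration. One consequence worth noting: a kref cannot be revived from zero, so the lazy host1x_cdma_init() on the 0 -> 1 transition would have to move to wherever the channel is first handed out.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for the kernel's struct kref; the real one is atomic. */
struct kref { int refcount; };

static void kref_init(struct kref *k) { k->refcount = 1; }
static void kref_get(struct kref *k) { k->refcount++; }

static int kref_put(struct kref *k, void (*release)(struct kref *))
{
	assert(k->refcount > 0);
	if (--k->refcount == 0) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical, trimmed-down channel. */
struct channel {
	struct kref ref;
	int cdma_running;
};

/* Runs once, when the last reference is dropped. */
static void channel_release(struct kref *ref)
{
	struct channel *ch = container_of(ref, struct channel, ref);

	/* This is where cdma_op.stop() and host1x_cdma_deinit() would go. */
	ch->cdma_running = 0;
}

static struct channel *channel_get(struct channel *ch)
{
	kref_get(&ch->ref);
	return ch;
}

static void channel_put(struct channel *ch)
{
	kref_put(&ch->ref, channel_release);
}
```

This drops the separate reflock mutex and the open-coded "refcount == 1" check from the put path.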
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
+{
+	struct host1x_channel *ch = NULL;
+	struct host1x *host1x = host1x_get_host(pdev);
+	int chindex;
+	int max_channels = host1x->info.nb_channels;
+	int err;
+
+	mutex_lock(&host1x->chlist_mutex);
+
+	chindex = host1x->allocated_channels;
+	if (chindex > max_channels)
+		goto fail;
+
+	ch = kzalloc(sizeof(*ch), GFP_KERNEL);
+	if (ch == NULL)
+		goto fail;
+
+	/* Link platform_device to host1x_channel */
+	err = host1x->channel_op.init(ch, host1x, chindex);
+	if (err < 0)
+		goto fail;
+
+	ch->dev = pdev;
+
+	/* Add to channel list */
+	list_add_tail(&ch->list, &host1x->chlist.list);
+
+	host1x->allocated_channels++;
+
+	mutex_unlock(&host1x->chlist_mutex);
+	return ch;
+
+fail:
+	dev_err(&pdev->dev, "failed to init channel\n");
+	kfree(ch);
+	mutex_unlock(&host1x->chlist_mutex);
+	return NULL;
+}
I think the critical section could be shorter here. It's probably not worth the extra trouble, though, given that channels are not often allocated.
+void host1x_channel_free(struct host1x_channel *ch)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct host1x_channel *chiter, *tmp;
+
+	list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
+		if (chiter == ch) {
+			list_del(&chiter->list);
+			kfree(ch);
+			host1x->allocated_channels--;
+
+			return;
+		}
+	}
+}
This doesn't free the channel if it happens to not be part of the host1x channel list. Perhaps an easier way to write it would be:
	host1x = host1x_get_host(ch->dev);

	list_del(&ch->list);
	kfree(ch);

	host1x->allocated_channels--;
Looking at the rest of the code, it seems like a channel will never not be part of the host1x channel list, so I don't think there's a need to scan the list.
On a side-note: generally if you break out of the loop right after freeing the memory of a removed node, there's no need to use the _safe variant since you won't be accessing the .next field of the freed node anyway.
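The side-note can be illustrated with a small userspace sketch. The list helpers, struct node and the function names are all made up for this example. The point is only that the loop returns immediately after freeing the matching node, so the iterator never reads the freed node's next pointer and no lookahead variable is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal stand-ins for the kernel's list.h helpers. */
struct list_head { struct list_head *prev, *next; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

struct node {
	struct list_head list;
	int id;
};

static struct node *add_node(struct list_head *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n);
	n->id = id;
	list_add_tail(&n->list, head);
	return n;
}

static int count_nodes(const struct list_head *head)
{
	const struct list_head *pos;
	int count = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		count++;
	return count;
}

/* Remove and free the node with the given id; returns 1 if it was found. */
static int remove_id(struct list_head *head, int id)
{
	struct list_head *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		struct node *n = container_of(pos, struct node, list);

		if (n->id == id) {
			list_del(pos);
			free(n);
			/* Returning here means pos->next is never read again. */
			return 1;
		}
	}

	return 0;
}
```

If the loop kept going after the removal, the plain iteration would dereference freed memory and the _safe variant would be required.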
Maybe these should also adopt a similar naming as what we discussed for the syncpoints. That is:
struct host1x_channel *host1x_channel_request(struct device *dev);
?
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
[...]
+/*
- host1x device list in debug-fs dump of host1x and client device
- as well as channel state
- */
I don't understand this comment.
+struct host1x_channel {
- struct list_head list;
- int refcount;
- int chid;
This can probably just be id. It is a field of host1x_channel, so the ch prefix is redundant.
- struct mutex reflock;
- struct mutex submitlock;
- void __iomem *regs;
- struct device *node;
This is never used.
- struct platform_device *dev;
Can this be just struct device *?
- struct cdev cdev;
This is never used.
+/* channel list operations */ +void host1x_channel_list_init(struct host1x *); +void host1x_channel_for_all(struct host1x *, void *data,
- int (*fptr)(struct host1x_channel *ch, void *fdata));
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev); +void host1x_channel_free(struct host1x_channel *ch);
Is it a good idea to make host1x_channel_free() publicly available? Shouldn't the host1x_channel_alloc()/host1x_channel_request() return a host1x_channel with a reference count of 1 and everybody release their reference using host1x_channel_put() to make sure the channel is freed only after the last reference disappears?
Otherwise whoever calls host1x_channel_free() will confuse everybody else that's still keeping a reference.
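A rough userspace sketch of the get/put scheme I have in mind. The names are hypothetical, and the plain counter is not thread-safe; in the kernel this would more likely be a struct kref or be protected by the existing reflock:

```c
#include <stdlib.h>

/* Hypothetical, reduced channel with a plain (non-atomic) reference count. */
struct channel {
	unsigned int refcount;
};

/* The caller receives the channel holding the first reference. */
static struct channel *channel_request(void)
{
	struct channel *ch = calloc(1, sizeof(*ch));

	if (!ch)
		return NULL;

	ch->refcount = 1;
	return ch;
}

/* Take an additional reference; returns the object for convenience. */
static struct channel *channel_get(struct channel *ch)
{
	ch->refcount++;
	return ch;
}

/*
 * Drop a reference. The channel is freed only when the last reference
 * goes away. Returns 1 if the channel was freed, 0 otherwise.
 */
static int channel_put(struct channel *ch)
{
	if (--ch->refcount > 0)
		return 0;

	free(ch);
	return 1;
}
```

With this shape there is no public free function at all, so nobody can pull the channel out from under another user that still holds a reference.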
diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
[...]
Various spurious blank lines in this file, and the alignment of function parameters is off.
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)
I don't think this needs platform_device either.
+{
- struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
- struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
- mutex_lock(struct_mutex);
- drm_gem_object_reference(&obj->base);
- mutex_unlock(struct_mutex);
I think it's more customary to obtain a pointer to struct drm_device and then use mutex_{lock,unlock}(&drm->struct_mutex). Or you could just use drm_gem_object_reference_unlocked(&obj->base) instead. Which doesn't exist yet, apparently. But it could be added.
+int host1x_cma_pin_array_ids(struct platform_device *dev,
long unsigned *ids,
long unsigned id_type_mask,
long unsigned id_type,
u32 count,
struct host1x_job_unpin_data *unpin_data,
dma_addr_t *phys_addr)
struct device * and unsigned long please. count also doesn't need to be a sized type; unsigned int will do just fine. The return value can also be unsigned int if you don't expect to return any error conditions.
+{
- int i;
- int pin_count = 0;
Both should be unsigned as well, and can go on one line:
unsigned int pin_count = 0, i;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
struct host1x; +struct host1x_intr; struct host1x_syncpt; +struct host1x_channel; +struct host1x_cdma; +struct host1x_job; +struct push_buffer; +struct dentry;
I think this already belongs in a previous patch. The debugfs dentry isn't added in this patch.
+struct host1x_channel_ops {
- int (*init)(struct host1x_channel *,
struct host1x *,
int chid);
Please add the parameter names as well (the same goes for all ops declared in this file). And "id" will be enough. Also the channel ID can surely be unsigned, right?
+struct host1x_cdma_ops {
- void (*start)(struct host1x_cdma *);
- void (*stop)(struct host1x_cdma *);
- void (*kick)(struct host1x_cdma *);
- int (*timeout_init)(struct host1x_cdma *,
u32 syncpt_id);
- void (*timeout_destroy)(struct host1x_cdma *);
- void (*timeout_teardown_begin)(struct host1x_cdma *);
- void (*timeout_teardown_end)(struct host1x_cdma *,
u32 getptr);
- void (*timeout_cpu_incr)(struct host1x_cdma *,
u32 getptr,
u32 syncpt_incrs,
u32 syncval,
u32 nr_slots);
+};
Can the timeout_ prefix not be dropped? The functions are generally useful and not directly related to timeouts, even though they seem to only be used during timeout handling.
Also, is it really necessary to abstract these into an ops structure? I get that newer hardware revisions might require different ops for syncpoint handling because the register layout or number of syncpoints may be different, but the CDMA and push buffer (below) concepts are pretty much a software abstraction, and as such their implementation is unlikely to change with some future hardware revision.
+struct host1x_pushbuffer_ops {
- void (*reset)(struct push_buffer *);
- int (*init)(struct push_buffer *);
- void (*destroy)(struct push_buffer *);
- void (*push_to)(struct push_buffer *,
struct mem_handle *,
u32 op1, u32 op2);
- void (*pop_from)(struct push_buffer *,
unsigned int slots);
Maybe just push() and pop()?
- u32 (*space)(struct push_buffer *);
- u32 (*putptr)(struct push_buffer *);
+};
struct host1x_syncpt_ops { void (*reset)(struct host1x_syncpt *); @@ -64,9 +111,19 @@ struct host1x { struct host1x_device_info info; struct clk *clk;
/* Sync point dedicated to replacing waits for expired fences */
struct host1x_syncpt *nop_sp;
struct host1x_channel_ops channel_op;
struct host1x_cdma_ops cdma_op;
struct host1x_pushbuffer_ops cdma_pb_op; struct host1x_syncpt_ops syncpt_op; struct host1x_intr_ops intr_op;
struct mutex chlist_mutex;
struct host1x_channel chlist;
Shouldn't this just be struct list_head?
- int allocated_channels;
unsigned int? And maybe just "num_channels"?
diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
[...]
+enum host1x_class {
- NV_HOST1X_CLASS_ID = 0x1,
- NV_GRAPHICS_2D_CLASS_ID = 0x51,
This entry belongs in a later patch, right? And I find it convenient if enumeration constants start with the enum name as prefix. Furthermore it'd be nice to reuse the hardware module names, like so:
enum host1x_class {
	HOST1X_CLASS_HOST1X,
	HOST1X_CLASS_GR2D,
	HOST1X_CLASS_GR3D,
};
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
[...]
+#include <linux/slab.h> +#include <linux/scatterlist.h> +#include <linux/dma-mapping.h> +#include "cdma.h" +#include "channel.h" +#include "dev.h" +#include "memmgr.h"
+#include "cdma_hw.h"
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get) +{
- return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
| HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
| HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
I think it is more customary to put the | at the end of the preceding line:
	return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) |
	       HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) |
	       HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
Also since these are all single bits, I'd prefer if you could drop the _F suffix and not make them take a parameter. I think it'd even be better not to have this function at all, but make the intent explicit where the register is written. That is, have each call site set the bits explicitly instead of calling this helper. Having a parameter list such as (true, false, false) or (true, true, true) is confusing since you have to keep looking up the meaning of the parameters.
+}
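For the DMACTRL helper discussed above, this is roughly what I mean by making the intent explicit at the call site. It is a userspace sketch: the bit positions are an assumption on my part and must be checked against the hardware headers, and ch_writel() with its fake register only stands in for host1x_ch_writel():

```c
#include <stdint.h>

/* Assumed bit positions; verify against the hardware headers. */
#define HOST1X_CHANNEL_DMACTRL_DMASTOP		(1u << 0)
#define HOST1X_CHANNEL_DMACTRL_DMAGETRST	(1u << 1)
#define HOST1X_CHANNEL_DMACTRL_DMAINITGET	(1u << 2)

/* Stand-in for the MMIO register; a driver would write through writel(). */
static uint32_t fake_dmactrl;

static void ch_writel(uint32_t value)
{
	fake_dmactrl = value;
}

/* Previously: host1x_channel_dmactrl(true, false, false) */
static void cdma_stop_dma(void)
{
	ch_writel(HOST1X_CHANNEL_DMACTRL_DMASTOP);
}

/* Previously: host1x_channel_dmactrl(true, true, true) */
static void cdma_reset_get(void)
{
	ch_writel(HOST1X_CHANNEL_DMACTRL_DMASTOP |
		  HOST1X_CHANNEL_DMACTRL_DMAGETRST |
		  HOST1X_CHANNEL_DMACTRL_DMAINITGET);
}
```

Reading the second function, it is obvious which bits end up in the register without looking up a parameter list.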
+static void cdma_timeout_handler(struct work_struct *work);
Can this prototype be avoided?
+/**
- Reset to empty push buffer
- */
+static void push_buffer_reset(struct push_buffer *pb) +{
- pb->fence = PUSH_BUFFER_SIZE - 8;
- pb->cur = 0;
Maybe position is a better name than cur.
+/**
- Init push buffer resources
- */
+static void push_buffer_destroy(struct push_buffer *pb);
You should be careful with these comment blocks. If you start them with /**, then you should make them proper kerneldoc comments. But you don't really need that for static functions, so you could just make them /*-style.
Also this particular comment is confusingly placed on top of the prototype of push_buffer_destroy().
+/*
- Push two words to the push buffer
- Caller must ensure push buffer is not full
- */
+static void push_buffer_push_to(struct push_buffer *pb,
struct mem_handle *handle,
u32 op1, u32 op2)
+{
- u32 cur = pb->cur;
- u32 *p = (u32 *)((u32)pb->mapped + cur);
You do all this extra casting to make sure to increment by bytes and not 32-bit words. How about you change pb->cur to contain the word index, so that you don't have to go through hoops each time around.
Alternatively you could make it a pointer to u32 and not have to index or cast at all. So you'd end up with something like:
struct push_buffer {
	u32 *start;
	u32 *end;
	u32 *ptr;
};
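With such a structure, pushing two words could look like the following userspace sketch. The init helper and the wrap-around policy are assumptions for illustration; the real push buffer may reserve a slot for a restart opcode, which is not modelled here:

```c
#include <stddef.h>
#include <stdint.h>

struct push_buffer {
	uint32_t *start;	/* first word of the buffer */
	uint32_t *end;		/* one past the last word */
	uint32_t *ptr;		/* next word to be written */
};

/* words is assumed to be even, since slots are two words each. */
static void push_buffer_init(struct push_buffer *pb, uint32_t *mem,
			     size_t words)
{
	pb->start = mem;
	pb->end = mem + words;
	pb->ptr = mem;
}

/*
 * Push two words to the push buffer. The caller must ensure that the
 * push buffer is not full. No casts or byte offsets are needed because
 * pointer arithmetic already advances in 32-bit words.
 */
static void push_buffer_push(struct push_buffer *pb, uint32_t op1,
			     uint32_t op2)
{
	*pb->ptr++ = op1;
	*pb->ptr++ = op2;

	if (pb->ptr == pb->end)
		pb->ptr = pb->start;
}
```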
+/*
- Return the number of two word slots free in the push buffer
- */
+static u32 push_buffer_space(struct push_buffer *pb) +{
- return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than PUSH_BUFFER_SIZE, can it?
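One case where the mask would matter, worked through as a userspace sketch: if fence can be numerically smaller than cur after the buffer has wrapped, the u32 subtraction wraps as well and the mask reduces it to the distance modulo the buffer size. Whether that case can actually occur in this driver is an assumption on my part. The 4 KB size is taken from the comment in cdma_hw.h:

```c
#include <stdint.h>

#define PUSH_BUFFER_SIZE 4096u	/* 512 slots of 8 bytes */

/* The expression from the patch. */
static uint32_t space_masked(uint32_t fence, uint32_t cur)
{
	return ((fence - cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
}

/* The same expression without the mask. */
static uint32_t space_unmasked(uint32_t fence, uint32_t cur)
{
	return (fence - cur) / 8;
}
```

Right after reset (fence = 4088, cur = 0) both variants give 511 slots. With fence = 8 and cur = 16 the masked variant still gives 511, whereas the unmasked one gives 536870911 because 8 - 16 wraps to 0xfffffff8.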
+/*
- Init timeout resources
- */
+static int cdma_timeout_init(struct host1x_cdma *cdma,
u32 syncpt_id)
+{
- if (syncpt_id == NVSYNCPT_INVALID)
return -EINVAL;
Do we really need the syncpt_id check here? It is the only reason why we need to pass the parameter in the first place, and if we get to this point we should already have made sure that the syncpoint is actually valid.
+/*
- Increment timedout buffer's syncpt via CPU.
Nit: "timed out buffer's"
- */
+static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
u32 syncpt_incrs, u32 syncval, u32 nr_slots)
The syncval parameter isn't used.
+{
- struct host1x *host1x = cdma_to_host1x(cdma);
- struct push_buffer *pb = &cdma->push_buffer;
- u32 i, getidx;
- for (i = 0; i < syncpt_incrs; i++)
host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
- /* after CPU incr, ensure shadow is up to date */
- host1x_syncpt_load_min(cdma->timeout.syncpt);
- /* NOP all the PB slots */
- getidx = getptr - pb->phys;
- while (nr_slots--) {
u32 *p = (u32 *)((u32)pb->mapped + getidx);
*(p++) = HOST1X_OPCODE_NOOP;
*(p++) = HOST1X_OPCODE_NOOP;
dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
__func__, pb->phys + getidx);
getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
- }
- wmb();
Why the memory barrier?
+/*
- Similar to cdma_start(), but rather than starting from an idle
- state (where DMA GET is set to DMA PUT), on a timeout we restore
- DMA GET from an explicit value (so DMA may again be pending).
- */
+static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr) +{
- struct host1x *host1x = cdma_to_host1x(cdma);
- struct host1x_channel *ch = cdma_to_channel(cdma);
- if (cdma->running)
return;
- cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
- host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
HOST1X_CHANNEL_DMACTRL);
- /* set base, end pointer (all of memory) */
- host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
- host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
According to the TRM, writing to HOST1X_CHANNEL_DMASTART will start a DMA transfer on the channel (if DMA_PUT != DMA_GET). Irrespective of that, why set the valid range to all of physical memory? We know the valid range of the push buffer, why not set the limits accordingly?
+/*
- Kick channel DMA into action by writing its PUT offset (if it has changed)
- */
+static void cdma_kick(struct host1x_cdma *cdma) +{
- struct host1x *host1x = cdma_to_host1x(cdma);
- struct host1x_channel *ch = cdma_to_channel(cdma);
- u32 put;
- put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
- if (put != cdma->last_put) {
host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
cdma->last_put = put;
- }
+}
kick() sounds unusual. Maybe flush or commit or something similar would be more accurate.
+static void cdma_stop(struct host1x_cdma *cdma) +{
- struct host1x_channel *ch = cdma_to_channel(cdma);
- mutex_lock(&cdma->lock);
- if (cdma->running) {
host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
HOST1X_CHANNEL_DMACTRL);
cdma->running = false;
- }
- mutex_unlock(&cdma->lock);
+}
Perhaps this should be renamed cdma_stop_sync() or similar to make it clear that it waits for the queue to run empty.
+static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)
Maybe the last parameter should be called restart to match its purpose?
+{
- struct host1x *host1x = cdma_to_host1x(cdma);
- struct host1x_channel *ch = cdma_to_channel(cdma);
- u32 cmdproc_stop;
- dev_dbg(&host1x->dev->dev,
"end channel teardown (id %d, DMAGET restart = 0x%x)\n",
ch->chid, getptr);
- cmdproc_stop = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
- cmdproc_stop &= ~(BIT(ch->chid));
No need for the extra parentheses.
- host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
- cdma->torndown = false;
- cdma_timeout_restart(cdma, getptr);
+}
I find this a bit non-intuitive. We tear down a channel, and when we're done tearing down, the torndown variable is set to false and the channel is actually restarted. Maybe you could explain some more how this works and what its purpose is.
+/*
- If this timeout fires, it indicates the current sync_queue entry has
- exceeded its TTL and the userctx should be timed out and remaining
- submits already issued cleaned up (future submits return an error).
- */
I can't seem to find what causes subsequent submits to return an error. Also, how is the channel reset so that new jobs can be submitted?
+static void cdma_timeout_handler(struct work_struct *work) +{
- struct host1x_cdma *cdma;
- struct host1x *host1x;
- struct host1x_channel *ch;
- u32 syncpt_val;
- u32 prev_cmdproc, cmdproc_stop;
- cdma = container_of(to_delayed_work(work), struct host1x_cdma,
timeout.wq);
- host1x = cdma_to_host1x(cdma);
- ch = cdma_to_channel(cdma);
- mutex_lock(&cdma->lock);
- if (!cdma->timeout.clientid) {
dev_dbg(&host1x->dev->dev,
"cdma_timeout: expired, but has no clientid\n");
mutex_unlock(&cdma->lock);
return;
- }
How can the CDMA not have a client?
- /* stop processing to get a clean snapshot */
- prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
- cmdproc_stop = prev_cmdproc | BIT(ch->chid);
- host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
- dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
prev_cmdproc, cmdproc_stop);
- syncpt_val = host1x_syncpt_load_min(host1x->syncpt);
- /* has buffer actually completed? */
- if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
dev_dbg(&host1x->dev->dev,
"cdma_timeout: expired, but buffer had completed\n");
Maybe this should really be a warning?
/* restore */
cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
No need for the extra parentheses. Also, why not just use prev_cmdproc, which shouldn't have the bit set anyway?
diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
[...]
+/*
- Size of the sync queue. If it is too small, we won't be able to queue up
- many command buffers. If it is too large, we waste memory.
- */
+#define HOST1X_SYNC_QUEUE_SIZE 512
I don't see this used anywhere.
+/*
- Number of gathers we allow to be queued up per channel. Must be a
- power of two. Currently sized such that pushbuffer is 4KB (512*8B).
- */
+#define HOST1X_GATHER_QUEUE_SIZE 512
More pieces falling into place.
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
[...]
+#include "host1x.h" +#include "channel.h" +#include "dev.h" +#include <linux/slab.h> +#include "intr.h" +#include "job.h" +#include <trace/events/host1x.h>
More include ordering issues.
+static void submit_gathers(struct host1x_job *job) +{
- /* push user gathers */
- int i;
unsigned int?
- for (i = 0 ; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
u32 op1 = host1x_opcode_gather(g->words);
u32 op2 = g->mem_base + g->offset;
host1x_cdma_push_gather(&job->ch->cdma,
job->gathers[i].ref,
job->gathers[i].offset,
op1, op2);
- }
+}
Perhaps inline this into channel_submit()? I'm not sure how useful it really is to split off smallish functions such as this which aren't reused anywhere else. I don't have any major objection though, so you can keep it separate if you want.
+static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx) +{
- p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
- return p;
+}
+static int host1x_channel_init(struct host1x_channel *ch,
- struct host1x *dev, int index)
+{
- ch->chid = index;
- mutex_init(&ch->reflock);
- mutex_init(&ch->submitlock);
- ch->regs = host1x_channel_regs(dev->regs, index);
- return 0;
+}
You only use host1x_channel_regs() once, so I really don't think it buys you anything to split it off. Both host1x_channel_regs() and host1x_channel_init() are short enough that they can be collapsed.
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
#include "hw/host1x01.h" #include "dev.h" +#include "channel.h" #include "hw/host1x01_hardware.h"
+#include "hw/channel_hw.c" +#include "hw/cdma_hw.c" #include "hw/syncpt_hw.c" #include "hw/intr_hw.c"
int host1x01_init(struct host1x *host) {
- host->channel_op = host1x_channel_ops;
- host->cdma_op = host1x_cdma_ops;
- host->cdma_pb_op = host1x_pushbuffer_ops; host->syncpt_op = host1x_syncpt_ops; host->intr_op = host1x_intr_ops;
I think I mentioned this before, but I'd prefer not to have the .c files included here, but rather reference the ops structures externally. But I still think that especially CDMA and push buffer ops don't need to be in separate structures since they aren't likely to change with new hardware revisions.
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
[...]
index c1d5324..03873c0 100644 --- a/drivers/gpu/host1x/hw/host1x01_hardware.h +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h @@ -21,6 +21,130 @@
#include <linux/types.h> #include <linux/bitops.h> +#include "hw_host1x01_channel.h" #include "hw_host1x01_sync.h" +#include "hw_host1x01_uclass.h"
+/* channel registers */ +#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
The only user of this seems to be host1x_channel_regs(), so it could be moved to that file. Also the name is overly long, why not something like HOST1X_CHANNEL_SIZE?
+#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)
HOST1X_OPCODE_NOP would be more canonical in my opinion.
+static inline u32 host1x_mask2(unsigned x, unsigned y) +{
- return 1 | (1 << (y - x));
+}
What's this? I don't see it used anywhere.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
[...]
+#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
- host1x_channel_dmactrl_dmastop_f(v)
I mentioned this elsewhere already, but I think the _F suffix (and _f for that matter) along with the v parameter should go away.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
[...]
What does the "uclass" stand for? It seems a bit useless to me.
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c index 16e3ada..ba48cee 100644 --- a/drivers/gpu/host1x/hw/syncpt_hw.c +++ b/drivers/gpu/host1x/hw/syncpt_hw.c @@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp) wmb(); }
+/* remove a wait pointed to by patch_addr */ +static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr) +{
- u32 override = host1x_class_host_wait_syncpt(
NVSYNCPT_GRAPHICS_HOST, 0);
- __raw_writel(override, patch_addr);
__raw_writel() isn't meant to be used for regular memory addresses, but only for MMIO addresses. patch_addr will be a kernel virtual address to a location in RAM, so you can just treat it as a normal pointer, so:
*(u32 *)patch_addr = override;
A small optimization might be to make override a static const, so that it doesn't have to be composed every time.
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
+static void action_submit_complete(struct host1x_waitlist *waiter) +{
- struct host1x_channel *channel = waiter->data;
- int nr_completed = waiter->count;
No need for this variable.
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
[...]
+#ifdef CONFIG_TEGRA_HOST1X_FIREWALL +static int host1x_firewall = 1; +#else +static int host1x_firewall; +#endif
You could use IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) in the code, which will have the nice side-effect of compiling code out if the symbol isn't selected.
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
Maybe make the parameters unsigned int instead of u32?
+{
- struct host1x_job *job = NULL;
- int num_unpins = num_cmdbufs + num_relocs;
unsigned int?
- s64 total;
This doesn't need to be signed, u64 will be good enough. None of the terms in the expression that assigns to total can be negative.
- void *mem;
- /* Check that we're not going to overflow */
- total = sizeof(struct host1x_job)
+ num_relocs * sizeof(struct host1x_reloc)
+ num_unpins * sizeof(struct host1x_job_unpin_data)
+ num_waitchks * sizeof(struct host1x_waitchk)
+ num_cmdbufs * sizeof(struct host1x_job_gather)
+ num_unpins * sizeof(dma_addr_t)
+ num_unpins * sizeof(u32 *);
"+"s at the end of the preceding lines.
- if (total > ULONG_MAX)
return NULL;
- mem = job = kzalloc(total, GFP_KERNEL);
- if (!job)
return NULL;
- kref_init(&job->ref);
- job->ch = ch;
- /* First init state to zero */
- /*
* Redistribute memory to the structs.
* Overflows and negative conditions have
* already been checked in job_alloc().
*/
The last two lines don't really apply here. The checks are in this same function and they check only for overflow, not negative conditions, which can't happen anyway since the counts are all unsigned.
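A userspace sketch of the size computation with the above applied: unsigned counts, an unsigned 64-bit total and the "+" operators at the end of the lines. The element structs and the limit parameter are placeholders for illustration; the real sizes come from the driver's structures. Each count is widened to 64 bits before multiplying, since on a 32-bit platform the products would otherwise be computed in 32-bit arithmetic and could wrap before the check ever sees them:

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder element types; only their sizes matter for the sketch. */
struct reloc { uint32_t v[6]; };
struct waitchk { uint32_t v[4]; };
struct gather { uint32_t v[5]; };

/*
 * Compute the allocation size for a job. Returns 0 and stores the size
 * in *out on success, or -1 if the total would exceed limit.
 */
static int job_size(unsigned int num_cmdbufs, unsigned int num_relocs,
		    unsigned int num_waitchks, size_t limit, size_t *out)
{
	uint64_t num_unpins = (uint64_t)num_cmdbufs + num_relocs;
	uint64_t total;

	total = (uint64_t)num_relocs * sizeof(struct reloc) +
		(uint64_t)num_waitchks * sizeof(struct waitchk) +
		(uint64_t)num_cmdbufs * sizeof(struct gather) +
		num_unpins * sizeof(uint32_t *);

	if (total > limit)
		return -1;

	*out = (size_t)total;
	return 0;
}
```

In the driver, the limit would be whatever the allocator can take (the patch compares against ULONG_MAX).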
+void host1x_job_get(struct host1x_job *job) +{
- kref_get(&job->ref);
+}
I think it is common for *_get() functions to return a pointer to the referenced object.
+void host1x_job_add_gather(struct host1x_job *job,
u32 mem_id, u32 words, u32 offset)
+{
- struct host1x_job_gather *cur_gather =
&job->gathers[job->num_gathers];
Should this check for overflow?
+/*
- Check driver supplied waitchk structs for syncpt thresholds
- that have already been satisfied and NULL the comparison (to
- avoid a wrap condition in the HW).
- */
+static int do_waitchks(struct host1x_job *job, struct host1x *host,
u32 patch_mem, struct mem_handle *h)
+{
- int i;
- /* compare syncpt vs wait threshold */
- for (i = 0; i < job->num_waitchk; i++) {
struct host1x_waitchk *wait = &job->waitchk[i];
struct host1x_syncpt *sp =
host1x_syncpt_get(host, wait->syncpt_id);
/* validate syncpt id */
if (wait->syncpt_id > host1x_syncpt_nb_pts(host))
continue;
/* skip all other gathers */
if (patch_mem != wait->mem)
continue;
trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
wait->syncpt_id, wait->thresh,
host1x_syncpt_read_min(sp));
if (host1x_syncpt_is_expired(
host1x_syncpt_get(host, wait->syncpt_id),
wait->thresh)) {
You already have the sp variable that you could use here to make it more readable.
struct host1x_syncpt *sp =
host1x_syncpt_get(host, wait->syncpt_id);
And you don't need this then, since you already have sp pointing to the same syncpoint.
void *patch_addr = NULL;
/*
* NULL an already satisfied WAIT_SYNCPT host method,
* by patching its args in the command stream. The
* method data is changed to reference a reserved
* (never given out or incr) NVSYNCPT_GRAPHICS_HOST
* syncpt with a matching threshold value of 0, so
* is guaranteed to be popped by the host HW.
*/
dev_dbg(&host->dev->dev,
"drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
wait->syncpt_id, sp->name, wait->thresh,
host1x_syncpt_read_min(sp));
/* patch the wait */
patch_addr = host1x_memmgr_kmap(h,
wait->offset >> PAGE_SHIFT);
if (patch_addr) {
host1x_syncpt_patch_wait(sp,
(patch_addr +
(wait->offset & ~PAGE_MASK)));
host1x_memmgr_kunmap(h,
wait->offset >> PAGE_SHIFT,
patch_addr);
} else {
pr_err("Couldn't map cmdbuf for wait check\n");
}
This is a case where splitting out a small function would actually be useful to make the code more readable since you can remove two levels of indentation. You can just pass in the handle and the offset, let it do the actual patching. Maybe
host1x_syncpt_patch_offset(sp, h, wait->offset);
?
}
wait->mem = 0;
- }
- return 0;
+}
There's a gratuitous blank line.
+static int pin_job_mem(struct host1x_job *job) +{
- int i;
- int count = 0;
- int result;
These (and the return value) can all be unsigned int.
+static int do_relocs(struct host1x_job *job,
u32 cmdbuf_mem, struct mem_handle *h)
+{
- int i = 0;
This can also be unsigned int.
- int last_page = -1;
And this should match the type of cmdbuf_offset (u32). You can initially set it to something like ~0 to make sure it doesn't match any valid offset.
- void *cmdbuf_page_addr = NULL;
- /* pin & patch the relocs for one gather */
- while (i < job->num_relocs) {
struct host1x_reloc *reloc = &job->relocarray[i];
/* skip all other gathers */
if (cmdbuf_mem != reloc->cmdbuf_mem) {
i++;
continue;
}
if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
if (cmdbuf_page_addr)
host1x_memmgr_kunmap(h,
last_page, cmdbuf_page_addr);
cmdbuf_page_addr = host1x_memmgr_kmap(h,
reloc->cmdbuf_offset >> PAGE_SHIFT);
last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
if (unlikely(!cmdbuf_page_addr)) {
pr_err("Couldn't map cmdbuf for relocation\n");
return -ENOMEM;
}
}
__raw_writel(
(job->reloc_addr_phys[i] +
reloc->target_offset) >> reloc->shift,
(cmdbuf_page_addr +
(reloc->cmdbuf_offset & ~PAGE_MASK)));
Again, wrong __raw_writel() usage.
/* remove completed reloc from the job */
if (i != job->num_relocs - 1) {
struct host1x_reloc *reloc_last =
&job->relocarray[job->num_relocs - 1];
reloc->cmdbuf_mem = reloc_last->cmdbuf_mem;
reloc->cmdbuf_offset = reloc_last->cmdbuf_offset;
reloc->target = reloc_last->target;
reloc->target_offset = reloc_last->target_offset;
reloc->shift = reloc_last->shift;
job->reloc_addr_phys[i] =
job->reloc_addr_phys[job->num_relocs - 1];
job->num_relocs--;
} else {
break;
}
- }
- if (cmdbuf_page_addr)
host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
- return 0;
+}
Also the algorithm seems a bit strange and hard to follow. Instead of removing relocs from the job, replacing them with the last entry and decrementing job->num_relocs, how much is the penalty for always iterating over all relocs? This is one of the other cases where I'd argue that simplicity is key. Furthermore you need to copy quite a bit of data to replace the completed relocs, so I'm not sure it buys you much.
It could always be optimized later on by just setting a bit in the reloc to mark it as completed, or keep a bitmask of completed relocations or whatever.
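The simpler variant could look roughly like this. It is a userspace sketch under the assumption that the command buffer is already mapped as one flat array, so the per-page kmap/kunmap handling of the real code is left out, and the struct is reduced to the fields the loop needs:

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical version of struct host1x_reloc. */
struct reloc {
	uint32_t cmdbuf_mem;
	uint32_t cmdbuf_offset;	/* in bytes */
	uint32_t target_offset;
	uint32_t shift;
};

/*
 * Patch all relocations that belong to the gather identified by
 * cmdbuf_mem. Every reloc is visited on every call; nothing is removed
 * from the array. Returns the number of relocations patched.
 */
static unsigned int do_relocs(const struct reloc *relocs,
			      const uint32_t *target_phys,
			      unsigned int num_relocs,
			      uint32_t cmdbuf_mem, uint32_t *cmdbuf)
{
	unsigned int i, patched = 0;

	for (i = 0; i < num_relocs; i++) {
		const struct reloc *reloc = &relocs[i];

		/* skip all other gathers */
		if (reloc->cmdbuf_mem != cmdbuf_mem)
			continue;

		cmdbuf[reloc->cmdbuf_offset / sizeof(uint32_t)] =
			(target_phys[i] + reloc->target_offset) >> reloc->shift;
		patched++;
	}

	return patched;
}
```

The cost is one comparison per reloc and gather, in exchange for not copying five fields and an address each time a reloc is completed.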
+static int check_reloc(struct host1x_reloc *reloc,
u32 cmdbuf_id, int offset)
offset can be unsigned int.
+{
- int err = 0;
- if (reloc->cmdbuf_mem != cmdbuf_id
|| reloc->cmdbuf_offset != offset * sizeof(u32))
err = -EINVAL;
- return err;
+}
More canonically:
offset *= sizeof(u32);
if (reloc->cmdbuf_mem != cmdbuf_id || reloc->cmdbuf_offset != offset)
	return -EINVAL;
return 0;
+static int check_mask(struct host1x_job *job,
struct platform_device *pdev,
struct host1x_reloc **reloc, int *num_relocs,
u32 cmdbuf_id, int *offset,
u32 *words, u32 class, u32 reg, u32 mask)
num_relocs and offset can be unsigned int *.
Same comment for the other check_*() functions. That said I think the code would become a lot more readable if you were to wrap all of these parameters into a structure, say host1x_firewall, and just pass that into the functions.
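Something along these lines, as a userspace sketch. The field selection simply mirrors the current check_mask() parameter list; the struct layout and the names are only a suggestion, not something that exists in the patch:

```c
#include <errno.h>
#include <stdint.h>

/* Reduced, hypothetical version of struct host1x_reloc. */
struct reloc {
	uint32_t cmdbuf_mem;
	uint32_t cmdbuf_offset;	/* in bytes */
};

/* State of the firewall while it walks one gather. */
struct firewall {
	const struct reloc *reloc;	/* next expected relocation */
	unsigned int num_relocs;	/* relocations left to consume */
	uint32_t cmdbuf_id;
	unsigned int offset;		/* current word offset in the gather */
	uint32_t words;			/* words left in the gather */
	uint32_t class_id;
};

/* Does the next relocation match the word the firewall is looking at? */
static int check_reloc(const struct firewall *fw)
{
	unsigned int offset = fw->offset * sizeof(uint32_t);

	if (fw->num_relocs == 0)
		return -EINVAL;

	if (fw->reloc->cmdbuf_mem != fw->cmdbuf_id ||
	    fw->reloc->cmdbuf_offset != offset)
		return -EINVAL;

	return 0;
}
```

The other check_*() functions would then take the same single pointer and update offset, words and num_relocs in place instead of going through int * parameters.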
+static inline int copy_gathers(struct host1x_job *job,
struct platform_device *pdev)
struct device *
+{
- size_t size = 0;
- size_t offset = 0;
- int i;
- for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
size += g->words * sizeof(u32);
- }
- job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
size, &job->gather_copy, GFP_KERNEL);
- if (IS_ERR(job->gather_copy_mapped)) {
dma_alloc_writecombine() returns NULL on failure, so this check is wrong.
int err = PTR_ERR(job->gather_copy_mapped);
job->gather_copy_mapped = NULL;
return err;
- }
- job->gather_copy_size = size;
- for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
void *gather = host1x_memmgr_mmap(g->ref);
memcpy(job->gather_copy_mapped + offset,
gather + g->offset,
g->words * sizeof(u32));
g->mem_base = job->gather_copy;
g->offset = offset;
g->mem_id = 0;
g->ref = 0;
host1x_memmgr_munmap(g->ref, gather);
offset += g->words * sizeof(u32);
- }
- return 0;
+}
I wonder, where's this DMA buffer actually used? I can't find any use between this copy and the corresponding dma_free_writecombine() call.
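On the IS_ERR() check in copy_gathers(), a small userspace sketch of why it cannot catch a NULL return. is_err() below is a replica of the kernel helper written for illustration, with MAX_ERRNO set to 4095 as in the kernel; the two alloc_failed_*() helpers are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/*
 * Userspace replica of the kernel's IS_ERR(): only the top MAX_ERRNO
 * addresses are treated as encoded error values.
 */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* The check as written in the patch: misses a NULL return. */
static int alloc_failed_wrong(const void *ptr)
{
	return is_err(ptr);
}

/* The check that matches an allocator that returns NULL on failure. */
static int alloc_failed_right(const void *ptr)
{
	return ptr == NULL;
}
```

NULL is at the opposite end of the address range from the error values, so the first variant reports success for a failed allocation and the following memcpy() would write through a NULL pointer.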
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev) +{
- int err = 0, i = 0, j = 0;
No need to initialize these here. i and j can also be unsigned.
- struct host1x *host = host1x_get_host(pdev);
- DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
- bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
- for (i = 0; i < job->num_waitchk; i++) {
u32 syncpt_id = job->waitchk[i].syncpt_id;
if (syncpt_id < host1x_syncpt_nb_pts(host))
set_bit(syncpt_id, waitchk_mask);
- }
- /* get current syncpt values for waitchk */
- for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
host1x_syncpt_load_min(host->syncpt + i);
- /* pin memory */
- err = pin_job_mem(job);
- if (err <= 0)
goto out;
pin_job_mem() never returns negative.
- /* patch gathers */
- for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
/* process each gather mem only once */
if (!g->ref) {
g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
if (IS_ERR(g->ref)) {
host1x_memmgr_get() also seems to return NULL on error.
err = PTR_ERR(g->ref);
g->ref = NULL;
break;
}
g->mem_base = job->gather_addr_phys[i];
for (j = 0; j < job->num_gathers; j++) {
struct host1x_job_gather *tmp =
&job->gathers[j];
if (!tmp->ref && tmp->mem_id == g->mem_id) {
tmp->ref = g->ref;
tmp->mem_base = g->mem_base;
}
}
err = 0;
if (host1x_firewall)
if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
err = validate(job, pdev, g);
if (err)
dev_err(&pdev->dev,
"Job validate returned %d\n", err);
if (!err)
err = do_relocs(job, g->mem_id, g->ref);
if (!err)
err = do_waitchks(job, host,
g->mem_id, g->ref);
host1x_memmgr_put(g->ref);
if (err)
break;
}
- }
- if (host1x_firewall && !err) {
And here.
+/*
- Debug routine used to dump job entries
- */
+void host1x_job_dump(struct device *dev, struct host1x_job *job) +{
- dev_dbg(dev, " SYNCPT_ID %d\n",
job->syncpt_id);
- dev_dbg(dev, " SYNCPT_VAL %d\n",
job->syncpt_end);
- dev_dbg(dev, " FIRST_GET 0x%x\n",
job->first_get);
- dev_dbg(dev, " TIMEOUT %d\n",
job->timeout);
- dev_dbg(dev, " NUM_SLOTS %d\n",
job->num_slots);
- dev_dbg(dev, " NUM_HANDLES %d\n",
job->num_unpins);
+}
These don't need to be wrapped.
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
[...]
+struct host1x_job_gather {
- u32 words;
- dma_addr_t mem_base;
- u32 mem_id;
- int offset;
- struct mem_handle *ref;
+};
+struct host1x_cmdbuf {
- __u32 mem;
- __u32 offset;
- __u32 words;
- __u32 pad;
+};
+struct host1x_reloc {
- __u32 cmdbuf_mem;
- __u32 cmdbuf_offset;
- __u32 target;
- __u32 target_offset;
- __u32 shift;
- __u32 pad;
+};
+struct host1x_waitchk {
- __u32 mem;
- __u32 offset;
- __u32 syncpt_id;
- __u32 thresh;
+};
None of these are shared with userspace, so they shouldn't take the __u32 types, but the regular u32 ones.
+/*
- Each submit is tracked as a host1x_job.
- */
+struct host1x_job {
- /* When refcount goes to zero, job can be freed */
- struct kref ref;
- /* List entry */
- struct list_head list;
- /* Channel where job is submitted to */
- struct host1x_channel *ch;
Maybe write it out as "channel"?
- int clientid;
Subsequent patches assign u32 to this field, so maybe the type should be changed here. And maybe leave out the id suffix. It doesn't really add any information.
- /* Gathers and their memory */
- struct host1x_job_gather *gathers;
- int num_gathers;
unsigned int
- /* Wait checks to be processed at submit time */
- struct host1x_waitchk *waitchk;
- int num_waitchk;
unsigned int
- u32 waitchk_mask;
This might need to be changed to a bitfield once future Tegra versions start supporting more than 32 syncpoints.
- /* Array of handles to be pinned & unpinned */
- struct host1x_reloc *relocarray;
- int num_relocs;
unsigned int
- struct host1x_job_unpin_data *unpins;
- int num_unpins;
unsigned int
- dma_addr_t *addr_phys;
- dma_addr_t *gather_addr_phys;
- dma_addr_t *reloc_addr_phys;
- /* Sync point id, number of increments and end related to the submit */
- u32 syncpt_id;
- u32 syncpt_incrs;
- u32 syncpt_end;
- /* Maximum time to wait for this job */
- int timeout;
unsigned int. I think we discussed this already in a slightly different context in patch 2.
- /* Null kickoff prevents submit from being sent to hardware */
- bool null_kickoff;
I don't think this is used anywhere.
- /* Index and number of slots used in the push buffer */
- int first_get;
- int num_slots;
unsigned int
- /* Copy of gathers */
- size_t gather_copy_size;
- dma_addr_t gather_copy;
- u8 *gather_copy_mapped;
Are these really needed? They don't seem to be used anywhere except to store a copy and free that copy sometime later.
- /* Temporary space for unpin ids */
- long unsigned int *pin_ids;
unsigned long
- /* Check if register is marked as an address reg */
- int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
is_addr_reg() sounds a bit unusual. Maybe match this to the name of the main firewall routine, validate()?
- /* Request a SETCLASS to this class */
- u32 class;
- /* Add a channel wait for previous ops to complete */
- u32 serialize;
This is used in code as a boolean. Why does it need to be 32 bits?
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
[...]
+struct mem_handle;
+struct platform_device;
+struct host1x_job_unpin_data {
- struct mem_handle *h;
- struct sg_table *mem;
+};
+enum mem_mgr_flag {
- mem_mgr_flag_uncacheable = 0,
- mem_mgr_flag_write_combine = 1,
+};
I'd like to see this use a more object-oriented approach and more common terminology. All of these handles are essentially buffer objects, so maybe something like host1x_bo would be a nice and short name.
To make this more object-oriented, I propose something like:
struct host1x_bo_ops {
	int (*alloc)(struct host1x_bo *bo, size_t size, unsigned long align,
		     unsigned long flags);
	int (*free)(struct host1x_bo *bo);
	...
};

struct host1x_bo {
	const struct host1x_bo_ops *ops;
};

struct host1x_cma_bo {
	struct host1x_bo base;
	struct drm_gem_cma_object *obj;
};

static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
{
	return container_of(bo, struct host1x_cma_bo, base);
}

static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
				  unsigned long align, unsigned long flags)
{
	return bo->ops->alloc(bo, size, align, flags);
}
...
That should be easy to extend with a new type of BO once the IOMMU-based allocator is ready. And as I said it is much closer in terminology to what other drivers do.
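A minimal userspace sketch of the ops-table idea, assuming malloc() as a stand-in for a CMA GEM object. The obj/size fields, the host1x_cma_bo_ops table and the host1x_bo_free() wrapper are hypothetical and only illustrate how a second backend could plug in; they are not part of the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct host1x_bo;

struct host1x_bo_ops {
	int (*alloc)(struct host1x_bo *bo, size_t size,
		     unsigned long align, unsigned long flags);
	int (*free)(struct host1x_bo *bo);
};

struct host1x_bo {
	const struct host1x_bo_ops *ops;
};

/* Hypothetical backend: malloc() stands in for a CMA GEM object. */
struct host1x_cma_bo {
	struct host1x_bo base;
	void *obj;
	size_t size;
};

static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
{
	return container_of(bo, struct host1x_cma_bo, base);
}

static int cma_bo_alloc(struct host1x_bo *bo, size_t size,
			unsigned long align, unsigned long flags)
{
	struct host1x_cma_bo *cma = to_host1x_cma_bo(bo);

	(void)align;	/* ignored by this stand-in */
	(void)flags;
	cma->obj = malloc(size);
	if (!cma->obj)
		return -1;
	cma->size = size;
	return 0;
}

static int cma_bo_free(struct host1x_bo *bo)
{
	struct host1x_cma_bo *cma = to_host1x_cma_bo(bo);

	free(cma->obj);
	cma->obj = NULL;
	cma->size = 0;
	return 0;
}

static const struct host1x_bo_ops host1x_cma_bo_ops = {
	.alloc = cma_bo_alloc,
	.free = cma_bo_free,
};

/* Generic entry points: callers never see the backend type. */
static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
				  unsigned long align, unsigned long flags)
{
	return bo->ops->alloc(bo, size, align, flags);
}

static inline int host1x_bo_free(struct host1x_bo *bo)
{
	return bo->ops->free(bo);
}
```

An IOMMU-backed type would embed the same struct host1x_bo and supply its own ops table, leaving callers of host1x_bo_alloc() unchanged.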
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index b46d044..255a3a3 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -26,6 +26,7 @@ struct host1x;
#define NVSYNCPT_INVALID (-1)
+#define NVSYNCPT_GRAPHICS_HOST 0
I think these should match other naming, so:
#define HOST1X_SYNCPT_INVALID -1
#define HOST1X_SYNCPT_HOST1X 0
There are a few more occurrences where platform_device is used but I haven't commented on them. I think all of them would work with just a struct device instead. Also I may not have caught all of the places where you should rather be using unsigned int instead of int, so you might want to look out for some of those.
Generally I very much like where this is going. Are there any plans to move the userspace binary driver to this interface at some point so we can more actively test it? Also, is anything else blocking adding a gr3d device similar to gr2d from this patch series?
Thierry
On 25.02.2013 17:24, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:43:59PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index e89fb2b..57680a6 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -3,4 +3,27 @@ config TEGRA_HOST1X
help
Driver for the Tegra host1x hardware.
Required for enabling tegradrm.
Required for enabling tegradrm and 2D acceleration.
I don't think I commented on this in the other patches, but I think this could use a bit more information about what host1x is. Also mentioning that it is a requirement for tegra-drm and 2D acceleration isn't very useful because it can equally well be expressed in Kconfig. If you add some description about what host1x is, people will know that they want to enable it.
Ok, we'll rewrite that. I think we can reuse the text from commit msg that I stole from Stephen's appnote.
+if TEGRA_HOST1X
+config TEGRA_HOST1X_CMA
bool "Support DRM CMA buffers"
depends on DRM
default y
select DRM_GEM_CMA_HELPER
select DRM_KMS_CMA_HELPER
help
Say yes if you wish to use DRM CMA buffers.
If unsure, choose Y.
Perhaps make this not user-selectable (for now)? If somebody disables this explicitly they won't get a working driver, right?
True, there's no alternative, so it should not be user selectable.
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
[...]
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include "job.h"
+#include <asm/cacheflush.h>
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/interrupt.h>
+#include <trace/events/host1x.h>
+#define TRACE_MAX_LENGTH 128U
"" includes generally follow <> ones.
Will do.
+/*
- Add an entry to the sync queue.
- */
+static void add_to_sync_queue(struct host1x_cdma *cdma,
struct host1x_job *job,
u32 nr_slots,
u32 first_get)
+{
if (job->syncpt_id == NVSYNCPT_INVALID) {
dev_warn(&job->ch->dev->dev, "%s: Invalid syncpt\n",
__func__);
return;
}
job->first_get = first_get;
job->num_slots = nr_slots;
host1x_job_get(job);
list_add_tail(&job->list, &cdma->sync_queue);
+}
It's a bit odd that you pass a job in here along with some parameters that are then assigned to the job's fields. Couldn't you just assign them to the job's fields before passing the job into this function?
I also see that you only use this function once, so maybe you could open-code it instead.
I think open coding would be the best choice. There's no real reason to have this as separate function. That'd solve the odd parameters phenomenon, too.
+/*
- Return the status of the cdma's sync queue or push buffer for the given event
- sq empty: returns 1 for empty, 0 for not empty (as in "1 empty queue" :-)
- pb space: returns the number of free slots in the channel's push buffer
- Must be called with the cdma lock held.
- */
+static unsigned int cdma_status_locked(struct host1x_cdma *cdma,
enum cdma_event event)
+{
struct host1x *host1x = cdma_to_host1x(cdma);
switch (event) {
case CDMA_EVENT_SYNC_QUEUE_EMPTY:
return list_empty(&cdma->sync_queue) ? 1 : 0;
case CDMA_EVENT_PUSH_BUFFER_SPACE: {
struct push_buffer *pb = &cdma->push_buffer;
return host1x->cdma_pb_op.space(pb);
}
default:
return 0;
}
+}
Similarly this function is only used in one place and it requires a whole lot of documentation to define the meaning of the return value. If you implement this functionality directly in host1x_cdma_wait_locked() you have much more context and don't require all this "protocol".
I agree, this function is confusing. For some future functionality, it's going to be called from a second place with CDMA_EVENT_SYNC_QUEUE_EMPTY, but it's better if both of those calls are just opened up to get rid of the extra switch().
+/*
- Start timer for a buffer submition that has completed yet.
"submission". And I don't understand the "that has completed yet" part.
It should become "Start timer that tracks the time spent by the job".
- Must be called with the cdma lock held.
- */
+static void cdma_start_timer_locked(struct host1x_cdma *cdma,
struct host1x_job *job)
You use two different styles to indent the function parameters. You might want to stick to one, preferably aligning them with the first parameter on the first line.
I've generally favored "two tabs" indenting, but we'll anyway standardize on one.
+{
struct host1x *host = cdma_to_host1x(cdma);
if (cdma->timeout.clientid) {
/* timer already started */
return;
}
cdma->timeout.clientid = job->clientid;
cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
cdma->timeout.syncpt_val = job->syncpt_end;
cdma->timeout.start_ktime = ktime_get();
schedule_delayed_work(&cdma->timeout.wq,
msecs_to_jiffies(job->timeout));
+}
+/*
- Stop timer when a buffer submition completes.
"submission"
Will fix.
+/*
- For all sync queue entries that have already finished according to the
- current sync point registers:
- unpin & unref their mems
- pop their push buffer slots
- remove them from the sync queue
- This is normally called from the host code's worker thread, but can be
- called manually if necessary.
- Must be called with the cdma lock held.
- */
+static void update_cdma_locked(struct host1x_cdma *cdma)
+{
bool signal = false;
struct host1x *host1x = cdma_to_host1x(cdma);
struct host1x_job *job, *n;
/* If CDMA is stopped, queue is cleared and we can return */
if (!cdma->running)
return;
/*
* Walk the sync queue, reading the sync point registers as necessary,
* to consume as many sync queue entries as possible without blocking
*/
list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
struct host1x_syncpt *sp = host1x->syncpt + job->syncpt_id;
host1x_syncpt_get()?
Yes, that should be used.
/* Check whether this syncpt has completed, and bail if not */
if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
/* Start timer on next pending syncpt */
if (job->timeout)
cdma_start_timer_locked(cdma, job);
break;
}
/* Cancel timeout, when a buffer completes */
if (cdma->timeout.clientid)
stop_cdma_timer_locked(cdma);
/* Unpin the memory */
host1x_job_unpin(job);
/* Pop push buffer slots */
if (job->num_slots) {
struct push_buffer *pb = &cdma->push_buffer;
host1x->cdma_pb_op.pop_from(pb, job->num_slots);
if (cdma->event == CDMA_EVENT_PUSH_BUFFER_SPACE)
signal = true;
}
list_del(&job->list);
host1x_job_put(job);
}
if (list_empty(&cdma->sync_queue) &&
cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
signal = true;
This looks funny, maybe:
if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
    list_empty(&cdma->sync_queue))
	signal = true;
?
Indenting at least is strange. I don't have a preference for the ordering of conditions, so if you like the latter order, we can just use that.
/* Wake up CdmaWait() if the requested event happened */
CdmaWait()? Where's that?
host1x_cdma_wait_locked(). Will fix.
if (signal) {
cdma->event = CDMA_EVENT_NONE;
up(&cdma->sem);
}
+}
+void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
struct platform_device *dev)
There's nothing in this function that requires a platform_device, so passing struct device should be enough. Or maybe host1x_cdma should get a struct device * field?
I think we'll just start using struct device * in general in code. Arto's been already fixing a lot of these, so he might've already fixed this.
+{
u32 get_restart;
Maybe just call this "restart" or "restart_addr". get_restart sounds like a function name.
Ok, how about "restart_dmaget_addr"? That indicates what we're doing with the restart address.
u32 syncpt_incrs;
struct host1x_job *job = NULL;
u32 syncpt_val;
struct host1x *host1x = cdma_to_host1x(cdma);
syncpt_val = host1x_syncpt_load_min(cdma->timeout.syncpt);
dev_dbg(&dev->dev,
"%s: starting cleanup (thresh %d)\n",
__func__, syncpt_val);
This fits on two lines.
Will merge.
/*
* Move the sync_queue read pointer to the first entry that hasn't
* completed based on the current HW syncpt value. It's likely there
* won't be any (i.e. we're still at the head), but covers the case
* where a syncpt incr happens just prior/during the teardown.
*/
dev_dbg(&dev->dev,
"%s: skip completed buffers still in sync_queue\n",
__func__);
This too.
Ok.
list_for_each_entry(job, &cdma->sync_queue, list) {
if (syncpt_val < job->syncpt_end)
break;
host1x_job_dump(&dev->dev, job);
}
That's potentially a lot of debug output. I wonder if it might make sense to control parts of this via a module parameter. Then again, if somebody really needs to debug this, maybe they really want *all* the information.
host1x_job_dump() uses dev_dbg(), so it only dumps a lot if DEBUG has been defined in that file.
/*
* Walk the sync_queue, first incrementing with the CPU syncpts that
* are partially executed (the first buffer) or fully skipped while
* still in the current context (slots are also NOP-ed).
*
* At the point contexts are interleaved, syncpt increments must be
* done inline with the pushbuffer from a GATHER buffer to maintain
* the order (slots are modified to be a GATHER of syncpt incrs).
*
* Note: save in get_restart the location where the timed out buffer
* started in the PB, so we can start the refetch from there (with the
* modified NOP-ed PB slots). This lets things appear to have completed
* properly for this buffer and resources are freed.
*/
dev_dbg(&dev->dev,
"%s: perform CPU incr on pending same ctx buffers\n",
__func__);
Can be collapsed to two lines.
Sure.
get_restart = cdma->last_put;
if (!list_empty(&cdma->sync_queue))
get_restart = job->first_get;
Perhaps:
if (list_empty(&cdma->sync_queue))
	restart = cdma->last_put;
else
	restart = job->first_get;
?
That's equivalent in functionality, and there's one less assignment for one path, so sounds good.
list_for_each_entry_from(job, &cdma->sync_queue, list)
if (job->clientid == cdma->timeout.clientid)
job->timeout = 500;
I think this warrants a comment.
Sure. We're accelerating timing out jobs for the client that submitted the job that timed out. But we'll add a comment. And, in downstream, we already changed this to "job->timeout = max(job->timeout, 500)", so we should use that.
+/*
- Destroy a cdma
- */
+void host1x_cdma_deinit(struct host1x_cdma *cdma)
+{
struct push_buffer *pb = &cdma->push_buffer;
struct host1x *host1x = cdma_to_host1x(cdma);
if (cdma->running) {
pr_warn("%s: CDMA still running\n",
__func__);
} else {
host1x->cdma_pb_op.destroy(pb);
host1x->cdma_op.timeout_destroy(cdma);
}
+}
There's no way to recover from the situation where a cdma is still running. Can this not return an error code (-EBUSY?) if the cdma can't be destroyed?
It's called from close(), which cannot return an error code. It's actually more of a power optimization. The effect is that if there are no users for channel, we'll just not free up the push buffer.
I think the proper fix would actually be to check in host1x_cdma_init() if push buffer is already allocated and cdma->running. In that case we could skip most of initialization.
+/*
- End a cdma submit
- Kick off DMA, add job to the sync queue, and a number of slots to be freed
- from the pushbuffer. The handles for a submit must all be pinned at the same
- time, but they can be unpinned in smaller chunks.
- */
+void host1x_cdma_end(struct host1x_cdma *cdma,
struct host1x_job *job)
+{
struct host1x *host1x = cdma_to_host1x(cdma);
bool was_idle = list_empty(&cdma->sync_queue);
Maybe just "idle"? It reflects the current state of the CDMA, not any old state.
Ok.
host1x->cdma_op.kick(cdma);
add_to_sync_queue(cdma,
job,
cdma->slots_used,
cdma->first_get);
No need to split this over so many lines. Also, shouldn't the order be reversed here? I.e. first add to sync queue, then start DMA?
Yeah, I think the order should be reversed. And, we're anyway moving the code inline, so there's no function call.
/* start timer on idle -> active transitions */
if (job->timeout && was_idle)
cdma_start_timer_locked(cdma, job);
This could be part of add_to_sync_queue(), but if you open-code that as I suggest earlier it should obviously stay.
Yep, let's open-code that.
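A minimal userspace sketch of how host1x_cdma_end() might read with add_to_sync_queue() open-coded and the job queued before DMA is kicked. The struct layouts are simplified stand-ins (a singly linked queue instead of list_head, a counter instead of the kref), cdma_kick() and the job_queued_at_kick/timer_started fields are hypothetical, and the syncpt validity check from the quoted helper is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins; field names follow the quoted patch where possible. */
struct host1x_job {
	struct host1x_job *next;	/* stand-in for the list_head */
	unsigned int first_get;
	unsigned int num_slots;
	unsigned int timeout;
	unsigned int refs;		/* stand-in for the kref */
};

struct host1x_cdma {
	struct host1x_job *head, *tail;	/* stand-in for sync_queue */
	unsigned int slots_used;
	unsigned int first_get;
	bool job_queued_at_kick;	/* records what kick observed */
	bool timer_started;		/* stand-in for the timeout work */
};

/* Stand-in for host1x->cdma_op.kick(): only records what it observed. */
static void cdma_kick(struct host1x_cdma *cdma)
{
	cdma->job_queued_at_kick = cdma->head != NULL;
}

static void host1x_cdma_end(struct host1x_cdma *cdma, struct host1x_job *job)
{
	bool idle = cdma->head == NULL;

	/* Open-coded replacement for add_to_sync_queue(). */
	job->first_get = cdma->first_get;
	job->num_slots = cdma->slots_used;
	job->refs++;			/* stands in for host1x_job_get() */
	job->next = NULL;
	if (cdma->tail)
		cdma->tail->next = job;
	else
		cdma->head = job;
	cdma->tail = job;

	/* Start DMA only once the job is on the queue. */
	cdma_kick(cdma);

	/* Start timer on idle -> active transitions. */
	if (job->timeout && idle)
		cdma->timer_started = true;
}
```

Sampling idle before the job is queued keeps the timer condition equivalent to the original was_idle check.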
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
[...]
+struct platform_device;
No need for this if you pass struct device * instead.
Will change.
+/*
- cdma
- This is in charge of a host command DMA channel.
- Sends ops to a push buffer, and takes responsibility for unpinning
- (& possibly freeing) of memory after those ops have completed.
- Producer:
- begin
push - send ops to the push buffer
- end - start command DMA and enqueue handles to be unpinned
- Consumer:
- update - call to update sync queue and push buffer, unpin memory
- */
I find the name to be a bit confusing. For some reason I automatically think of GSM when I read CDMA. This really is more of a job queue, so maybe calling it host1x_job_queue might be more appropriate. But I've already requested a lot of things to be renamed, so I think I can live with this being called CDMA if you don't want to change it.
Alternatively all of these could be moved to the struct host1x_channel given that there's only one of each of the push_buffer, buffer_timeout and host1x_cma objects per channel.
I did at one point consider merging those two. That should work, as they both deal with channels essentially. I also saw that the resulting file and data structures became quite large, so I have so far preferred to keep them separate.
This way I can keep the "higher level" stuff (inserting setclass, serializing, allocating sync point ranges, etc) in one file and lower level stuff (write to hardware, deal with push buffer pointers, etc) in another.
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
[...]
+#include "channel.h"
+#include "dev.h"
+#include "job.h"
+#include <linux/slab.h>
+#include <linux/module.h>
Again the include ordering is strange.
Will fix.
+/*
- Iterator function for host1x device list
- It takes a fptr as an argument and calls that function for each
- device in the list
- */
+void host1x_channel_for_all(struct host1x *host1x, void *data,
int (*fptr)(struct host1x_channel *ch, void *fdata))
+{
struct host1x_channel *ch;
int ret;
list_for_each_entry(ch, &host1x->chlist.list, list) {
if (ch && fptr) {
ret = fptr(ch, data);
if (ret) {
pr_info("%s: iterator error\n", __func__);
break;
}
}
}
+}
Couldn't you rewrite this as a macro, similar to list_for_each_entry() so that users could do something like:
host1x_for_each_channel(channel, host1x) { ... }
That's a bit friendlier than having each user write a separate function to be called from this iterator.
Sounds good, we'll try that. My macro magic is rusty, but I trust list_for_each_entry() will give a template.
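A minimal userspace sketch of such an iterator macro, modelled on the shape of list_for_each_entry(). It assumes chlist is a plain struct list_head inside struct host1x; the tiny list implementation, the id field and sum_channel_ids() are stand-ins written for this example.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's list_head and container_of(). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

struct host1x_channel {
	struct list_head list;
	unsigned int id;
};

struct host1x {
	struct list_head chlist;
};

/*
 * Hypothetical iterator macro, argument order as proposed in the review.
 * The loop stops when the cursor's embedded node is the list head itself.
 */
#define host1x_for_each_channel(channel, host1x)			\
	for ((channel) = container_of((host1x)->chlist.next,		\
				      struct host1x_channel, list);	\
	     &(channel)->list != &(host1x)->chlist;			\
	     (channel) = container_of((channel)->list.next,		\
				      struct host1x_channel, list))

/* Example user: no callback function or void *data needed. */
static unsigned int sum_channel_ids(struct host1x *host)
{
	struct host1x_channel *ch;
	unsigned int sum = 0;

	host1x_for_each_channel(ch, host)
		sum += ch->id;

	return sum;
}
```

In the driver the macro could simply wrap list_for_each_entry() rather than repeat its body.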
+int host1x_channel_submit(struct host1x_job *job)
+{
return host1x_get_host(job->ch->dev)->channel_op.submit(job);
+}
I'd expect a function named host1x_channel_submit() to take a struct host1x_channel *. Should this perhaps be called host1x_job_submit()?
It calls into channel code directly, and the underlying op also just takes a job. We could add channel as a parameter, and not pass it in host1x_job_alloc(), but we actually need the channel data already in host1x_job_pin(), which comes before submit. We need it so that we pin the buffer to correct engine.
+struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
+{
int err = 0;
mutex_lock(&ch->reflock);
if (ch->refcount == 0)
err = host1x_cdma_init(&ch->cdma);
if (!err)
ch->refcount++;
mutex_unlock(&ch->reflock);
return err ? NULL : ch;
+}
Why don't you use any of the kernel's reference counting mechanisms?
+void host1x_channel_put(struct host1x_channel *ch)
+{
mutex_lock(&ch->reflock);
if (ch->refcount == 1) {
host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
host1x_cdma_deinit(&ch->cdma);
}
ch->refcount--;
mutex_unlock(&ch->reflock);
+}
I think you can do all of this using a kref.
I think the original reason was that there's no reason to use atomic kref, as we anyway have to do mutual exclusion via mutex. But, using kref won't be any problem, so we could use that.
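A minimal userspace sketch of the kref pattern, assuming C11 atomics as a stand-in for the kernel's struct kref. The three kref helpers mirror the shape of the kernel functions of the same name but are reimplemented here; host1x_channel_release() and the cdma_running flag are hypothetical. The sketch does not cover the lazy host1x_cdma_init() on the first get, which is the part that would still need the mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Userspace stand-in mirroring the shape of the kernel's struct kref. */
struct kref {
	atomic_int refcount;
};

static void kref_init(struct kref *kref)
{
	atomic_init(&kref->refcount, 1);
}

static void kref_get(struct kref *kref)
{
	atomic_fetch_add(&kref->refcount, 1);
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
	if (atomic_fetch_sub(&kref->refcount, 1) == 1) {
		release(kref);
		return 1;
	}
	return 0;
}

struct host1x_channel {
	struct kref ref;
	bool cdma_running;	/* stand-in for the real CDMA state */
};

/* Called exactly once, when the last reference goes away. */
static void host1x_channel_release(struct kref *kref)
{
	struct host1x_channel *ch =
		container_of(kref, struct host1x_channel, ref);

	/* The real driver would stop and deinit the CDMA here. */
	ch->cdma_running = false;
}

static struct host1x_channel *host1x_channel_get(struct host1x_channel *ch)
{
	kref_get(&ch->ref);
	return ch;
}

static void host1x_channel_put(struct host1x_channel *ch)
{
	kref_put(&ch->ref, host1x_channel_release);
}
```

With this shape the "refcount == 1" special case in the put path moves into the release callback.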
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
+{
struct host1x_channel *ch = NULL;
struct host1x *host1x = host1x_get_host(pdev);
int chindex;
int max_channels = host1x->info.nb_channels;
int err;
mutex_lock(&host1x->chlist_mutex);
chindex = host1x->allocated_channels;
if (chindex > max_channels)
goto fail;
ch = kzalloc(sizeof(*ch), GFP_KERNEL);
if (ch == NULL)
goto fail;
/* Link platform_device to host1x_channel */
err = host1x->channel_op.init(ch, host1x, chindex);
if (err < 0)
goto fail;
ch->dev = pdev;
/* Add to channel list */
list_add_tail(&ch->list, &host1x->chlist.list);
host1x->allocated_channels++;
mutex_unlock(&host1x->chlist_mutex);
return ch;
+fail:
dev_err(&pdev->dev, "failed to init channel\n");
kfree(ch);
mutex_unlock(&host1x->chlist_mutex);
return NULL;
+}
I think the critical section could be shorter here. It's probably not worth the extra trouble, though, given that channels are not often allocated.
Yeah, boot time isn't measured in microseconds. :-) But, if we just make allocated_channels an atomic, we should be able to drop chlist_mutex altogether and it could simplify the code.
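A minimal userspace sketch of reserving a channel index with an atomic counter, assuming C11 atomics in place of the kernel's atomic_t. reserve_channel_index() is a hypothetical helper, and the sketch assumes valid indices run from 0 to max_channels - 1. It only covers the counter; adding the channel to the list would still need its own protection.

```c
#include <stdatomic.h>

/*
 * Reserve the next free channel index, or return -1 when all
 * max_channels indices are taken. The compare-exchange loop keeps the
 * bound check and the increment in one atomic step.
 */
static int reserve_channel_index(atomic_uint *allocated,
				 unsigned int max_channels)
{
	unsigned int cur = atomic_load(allocated);

	do {
		if (cur >= max_channels)
			return -1;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(allocated, &cur, cur + 1));

	return (int)cur;
}
```

A failed reservation leaves the counter untouched, so no rollback path is needed.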
+void host1x_channel_free(struct host1x_channel *ch)
+{
struct host1x *host1x = host1x_get_host(ch->dev);
struct host1x_channel *chiter, *tmp;
list_for_each_entry_safe(chiter, tmp, &host1x->chlist.list, list) {
if (chiter == ch) {
list_del(&chiter->list);
kfree(ch);
host1x->allocated_channels--;
return;
}
}
+}
This doesn't free the channel if it happens to not be part of the host1x channel list. Perhaps an easier way to write it would be:
host1x = host1x_get_host(ch->dev);
list_del(&ch->list);
kfree(ch);
host1x->allocated_channels--;
Looking at the rest of the code, it seems like a channel will never not be part of the host1x channel list, so I don't think there's a need to scan the list.
I think you're right. This is just overprotective. Your variant does the same thing with much less code.
On a side-note: generally if you break out of the loop right after freeing the memory of a removed node, there's no need to use the _safe variant since you won't be accessing the .next field of the freed node anyway.
That's true.
Maybe these should also adopt a similar naming as what we discussed for the syncpoints. That is:
struct host1x_channel *host1x_channel_request(struct device *dev);
?
Sounds good.
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
[...]
+/*
- host1x device list in debug-fs dump of host1x and client device
- as well as channel state
- */
I don't understand this comment.
Probably because it's not a sentence and doesn't make sense. I think it's just misplaced. We'll find its proper home.
+struct host1x_channel {
struct list_head list;
int refcount;
int chid;
This can probably just be id. It is a field of host1x_channel, so the ch prefix is redundant.
Ok.
struct mutex reflock;
struct mutex submitlock;
void __iomem *regs;
struct device *node;
This is never used.
Yep, let's remove "node".
struct platform_device *dev;
Can this be just struct device *?
I think so. I'll let Arto look at all places where we could change platform_device->device. He was already on it.
struct cdev cdev;
This is never used.
Will remove.
+/* channel list operations */
+void host1x_channel_list_init(struct host1x *);
+void host1x_channel_for_all(struct host1x *, void *data,
int (*fptr)(struct host1x_channel *ch, void *fdata));
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
+void host1x_channel_free(struct host1x_channel *ch);
Is it a good idea to make host1x_channel_free() publicly available? Shouldn't the host1x_channel_alloc()/host1x_channel_request() return a host1x_channel with a reference count of 1 and everybody release their reference using host1x_channel_put() to make sure the channel is freed only after the last reference disappears?
Otherwise whoever calls host1x_channel_free() will confuse everybody else that's still keeping a reference.
The difference is that _put and _get are called to indicate how many user space processes there are for the channel. Even if there are no processes, we won't free the channel structure - we just freeze the channel.
_alloc and _free are different in that they actually create the channel structs and delete them and they follow the lifecycle of the driver. Perhaps we should figure out new naming, but refcounting and alloc/free cannot be merged here.
diff --git a/drivers/gpu/host1x/cma.c b/drivers/gpu/host1x/cma.c
[...]
Various spurious blank lines in this file, and the alignment of function parameters is off.
Will fix.
+struct mem_handle *host1x_cma_get(u32 id, struct platform_device *dev)
I don't think this needs platform_device either.
Will fix.
+{
struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
mutex_lock(struct_mutex);
drm_gem_object_reference(&obj->base);
mutex_unlock(struct_mutex);
I think it's more customary to obtain a pointer to struct drm_device and then use mutex_{lock,unlock}(&drm->struct_mutex). Or you could just use drm_gem_object_reference_unlocked(&obj->base) instead. Which doesn't exist yet, apparently. But it could be added.
I think we could take the former path - just refer to mutex in a different way.
+int host1x_cma_pin_array_ids(struct platform_device *dev,
long unsigned *ids,
long unsigned id_type_mask,
long unsigned id_type,
u32 count,
struct host1x_job_unpin_data *unpin_data,
dma_addr_t *phys_addr)
struct device * and unsigned long please. count also doesn't need to be a sized type. unsigned int will do just fine. The return value can also be unsigned int if you don't expect to return any error conditions.
I think we'll need to check these. ids probably needs to be a u32 *, and id_type_mask and id_type should be u32. They come like that from user space.
+{
int i;
int pin_count = 0;
Both should be unsigned as well, and can go on one line:
unsigned int pin_count = 0, i;
Ok.
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
struct host1x;
+struct host1x_intr;
struct host1x_syncpt;
+struct host1x_channel;
+struct host1x_cdma;
+struct host1x_job;
+struct push_buffer;
+struct dentry;
I think this already belongs in a previous patch. The debugfs dentry isn't added in this patch.
Ok, that was a mistake I made when I re-split after one of the previous rounds. I compiled (at least thought I did) after each patch, so it might be that these aren't actually needed.
+struct host1x_channel_ops {
int (*init)(struct host1x_channel *,
struct host1x *,
int chid);
Please add the parameter names as well (the same goes for all ops declared in this file). And "id" will be enough. Also the channel ID can surely be unsigned, right?
Sure to all of these.
+struct host1x_cdma_ops {
void (*start)(struct host1x_cdma *);
void (*stop)(struct host1x_cdma *);
void (*kick)(struct host1x_cdma *);
int (*timeout_init)(struct host1x_cdma *,
u32 syncpt_id);
void (*timeout_destroy)(struct host1x_cdma *);
void (*timeout_teardown_begin)(struct host1x_cdma *);
void (*timeout_teardown_end)(struct host1x_cdma *,
u32 getptr);
void (*timeout_cpu_incr)(struct host1x_cdma *,
u32 getptr,
u32 syncpt_incrs,
u32 syncval,
u32 nr_slots);
+};
Can the timeout_ prefix not be dropped? The functions are generally useful and not directly related to timeouts, even though they seem to only be used during timeout handling.
All the timeout functions actually access the timeout struct, so they're not generic. Teardown functions are the only ones which don't access timeout.
Also, is it really necessary to abstract these into an ops structure? I get that newer hardware revisions might require different ops for syncpoint handling because the register layout or number of syncpoints may be different, but the CDMA and push buffer (below) concepts are pretty much a software abstraction, and as such its implementation is unlikely to change with some future hardware revision.
Pushbuffer ops can become generic. There's only one catch - init uses the restart opcode. But the opcode is not going to change, so we can generalize that.
+struct host1x_pushbuffer_ops {
void (*reset)(struct push_buffer *);
int (*init)(struct push_buffer *);
void (*destroy)(struct push_buffer *);
void (*push_to)(struct push_buffer *,
struct mem_handle *,
u32 op1, u32 op2);
void (*pop_from)(struct push_buffer *,
unsigned int slots);
Maybe just push() and pop()?
Can do.
u32 (*space)(struct push_buffer *);
u32 (*putptr)(struct push_buffer *);
+};
struct host1x_syncpt_ops {
	void (*reset)(struct host1x_syncpt *);
@@ -64,9 +111,19 @@ struct host1x {
	struct host1x_device_info info;
	struct clk *clk;
/* Sync point dedicated to replacing waits for expired fences */
struct host1x_syncpt *nop_sp;
struct host1x_channel_ops channel_op;
struct host1x_cdma_ops cdma_op;
struct host1x_pushbuffer_ops cdma_pb_op;
struct host1x_syncpt_ops syncpt_op;
struct host1x_intr_ops intr_op;
struct mutex chlist_mutex;
struct host1x_channel chlist;
Shouldn't this just be struct list_head?
I think you're right, to follow the normal kernel conventions.
int allocated_channels;
unsigned int? And maybe just "num_channels"?
num_channels could be thought as "number of available channels", so I'd like to use num_allocated_channels here.
diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
[...]
+enum host1x_class {
NV_HOST1X_CLASS_ID = 0x1,
NV_GRAPHICS_2D_CLASS_ID = 0x51,
This entry belongs in a later patch, right? And I find it convenient if enumeration constants start with the enum name as prefix. Furthermore it'd be nice to reuse the hardware module names, like so:
enum host1x_class {
	HOST1X_CLASS_HOST1X,
	HOST1X_CLASS_GR2D,
	HOST1X_CLASS_GR3D,
};
The naming sounds good. We already use HOST1X_CLASS_HOST1X in code to insert a wait. If you'd prefer, we can move the definition of HOST1X_CLASS_GR2D to the later patch.
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
[...]
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-mapping.h>
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include "cdma_hw.h"
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get) +{
return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
| HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
| HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
I think it is more customary to put the | at the end of the preceding line:
return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) |
       HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) |
       HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
Also since these are all single bits, I'd prefer if you could drop the _F suffix and not make them take a parameter. I think it'd even be better not to have this function at all, but make the intent explicit where the register is written. That is, have each call site set the bits explicitly instead of calling this helper. Having a parameter list such as (true, false, false) or (true, true, true) is confusing since you have to keep looking up the meaning of the parameters.
The operation that the _F macros do is masking and bit shifting the fields correctly. Without that, we'd need to expose several macros to mask and shift, and I'd rather just have one macro to take care of that.
But, we can open code the function to wherever it's used if that's more readable.
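A minimal userspace sketch of what open-coded call sites could look like if the single-bit fields were plain flag macros. The macro names drop the _F suffix as suggested in the review; the bit positions, the fake_dmactrl variable and both function names are assumptions for illustration and are not taken from the hardware headers.

```c
#include <stdint.h>

/* Bit positions are assumptions for illustration, not from the patch. */
#define HOST1X_CHANNEL_DMACTRL_DMASTOP		(UINT32_C(1) << 0)
#define HOST1X_CHANNEL_DMACTRL_DMAGETRST	(UINT32_C(1) << 1)
#define HOST1X_CHANNEL_DMACTRL_DMAINITGET	(UINT32_C(1) << 2)

/* Stand-in for the memory-mapped channel DMACTRL register. */
static uint32_t fake_dmactrl;

static void cdma_stop_and_reset_get(void)
{
	/* The intent is visible at the write, no (true, true, true) to decode. */
	fake_dmactrl = HOST1X_CHANNEL_DMACTRL_DMASTOP |
		       HOST1X_CHANNEL_DMACTRL_DMAGETRST |
		       HOST1X_CHANNEL_DMACTRL_DMAINITGET;
}

static void cdma_resume(void)
{
	/* Clear every control bit, the (false, false, false) case. */
	fake_dmactrl = 0;
}
```

If the generated headers must keep their masking and shifting macros, the same readability can be had by passing 1 only for the fields that are set and omitting the others.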
+}
+static void cdma_timeout_handler(struct work_struct *work);
Can this prototype be avoided?
We could try shuffling the code. There might be some dependency problems that forced this ordering, but we'll try.
+/**
- Reset to empty push buffer
- */
+static void push_buffer_reset(struct push_buffer *pb)
+{
pb->fence = PUSH_BUFFER_SIZE - 8;
pb->cur = 0;
Maybe position is a better name than cur.
Sure.
+/**
- Init push buffer resources
- */
+static void push_buffer_destroy(struct push_buffer *pb);
You should be careful with these comment blocks. If you start them with /**, then you should make them proper kerneldoc comments. But you don't really need that for static functions, so you could just make them /*- style.
Also this particular comment is confusingly placed on top of the prototype of push_buffer_destroy().
You're right. We'll just remove the /** */ notation and use normal comments. And the comment is just misplaced, so we'll move it.
+/*
- Push two words to the push buffer
- Caller must ensure push buffer is not full
- */
+static void push_buffer_push_to(struct push_buffer *pb,
struct mem_handle *handle,
u32 op1, u32 op2)
+{
u32 cur = pb->cur;
u32 *p = (u32 *)((u32)pb->mapped + cur);
You do all this extra casting to make sure to increment by bytes and not 32-bit words. How about you change pb->cur to contain the word index, so that you don't have to go through hoops each time around.
Alternatively you could make it a pointer to u32 and not have to index or cast at all. So you'd end up with something like:
struct push_buffer {
	u32 *start;
	u32 *end;
	u32 *ptr;
};
The complexity comes from the fact that we deal both with device virtual addresses and CPU addresses to the same buffer. We'll need the indexes so that we can convert between the two address spaces, but we might be able to use word indexes. We'll check this.
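A rough sketch of the word-index variant, assuming the buffer is a power-of-two number of 32-bit words and is visible at one CPU address and one device address. The struct and function names are hypothetical and only show that a single word index can serve both address spaces without casts.

```c
#include <stdint.h>

#define PB_WORDS 1024u	/* assumed: 4 KiB buffer of 32-bit words, power of two */

/* Hypothetical layout: one CPU mapping, one device address, one word index. */
struct pb_sketch {
	uint32_t *mapped;	/* CPU virtual address of the buffer */
	uint32_t dev_base;	/* device-visible address of the same buffer */
	uint32_t pos;		/* next free slot, counted in 32-bit words */
};

/* Push two words; the caller must ensure the buffer is not full. */
static void pb_push(struct pb_sketch *pb, uint32_t op1, uint32_t op2)
{
	pb->mapped[pb->pos] = op1;
	pb->mapped[pb->pos + 1] = op2;
	/* pos is always even, so pos + 1 stays in range before wrapping */
	pb->pos = (pb->pos + 2) & (PB_WORDS - 1);
}

/* Device address matching the current position, e.g. for a PUT register. */
static uint32_t pb_putptr(const struct pb_sketch *pb)
{
	return pb->dev_base + pb->pos * (uint32_t)sizeof(uint32_t);
}
```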
+/*
- Return the number of two word slots free in the push buffer
- */
+static u32 push_buffer_space(struct push_buffer *pb) +{
return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than PUSH_BUFFER_SIZE, can it?
You're right, this function doesn't need to worry about wrapping.
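For reference, a small sketch of the arithmetic in question, assuming a 4096-byte buffer and unsigned 32-bit offsets. It only shows what the mask does for given inputs; whether fence can actually end up numerically below cur depends on how the two are advanced, which is outside the quoted hunk.

```c
#include <stdint.h>

#define SK_PB_SIZE 4096u	/* assumed: size in bytes, power of two */

/* Free two-word slots, computed with the mask as in the quoted patch. */
static uint32_t pb_space_masked(uint32_t fence, uint32_t cur)
{
	return ((fence - cur) & (SK_PB_SIZE - 1)) / 8;
}

/* The same computation without the mask. */
static uint32_t pb_space_unmasked(uint32_t fence, uint32_t cur)
{
	return (fence - cur) / 8;
}
```

With fence ahead of cur both variants agree. If fence is numerically behind cur, the unsigned subtraction wraps and only the masked variant stays within the buffer size, so it is worth confirming that case cannot occur before dropping the mask.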
+/*
- Init timeout resources
- */
+static int cdma_timeout_init(struct host1x_cdma *cdma,
u32 syncpt_id)
+{
if (syncpt_id == NVSYNCPT_INVALID)
return -EINVAL;
Do we really need the syncpt_id check here? It is the only reason why we need to pass the parameter in the first place, and if we get to this point we should already have made sure that the syncpoint is actually valid.
True, we can drop this.
+/*
- Increment timedout buffer's syncpt via CPU.
Nit: "timed out buffer's"
Will fix.
- */
+static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
u32 syncpt_incrs, u32 syncval, u32 nr_slots)
The syncval parameter isn't used.
True, that'd be used only with wait base support, as we need to synchronize wait base with the sync point. Will remove.
+{
struct host1x *host1x = cdma_to_host1x(cdma);
struct push_buffer *pb = &cdma->push_buffer;
u32 i, getidx;
for (i = 0; i < syncpt_incrs; i++)
host1x_syncpt_cpu_incr(cdma->timeout.syncpt);
/* after CPU incr, ensure shadow is up to date */
host1x_syncpt_load_min(cdma->timeout.syncpt);
/* NOP all the PB slots */
getidx = getptr - pb->phys;
while (nr_slots--) {
u32 *p = (u32 *)((u32)pb->mapped + getidx);
*(p++) = HOST1X_OPCODE_NOOP;
*(p++) = HOST1X_OPCODE_NOOP;
dev_dbg(&host1x->dev->dev, "%s: NOP at 0x%x\n",
__func__, pb->phys + getidx);
getidx = (getidx + 8) & (PUSH_BUFFER_SIZE - 1);
}
wmb();
Why the memory barrier?
Can't think of any good reason. Will try removing.
+/*
- Similar to cdma_start(), but rather than starting from an idle
- state (where DMA GET is set to DMA PUT), on a timeout we restore
- DMA GET from an explicit value (so DMA may again be pending).
- */
+static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr) +{
struct host1x *host1x = cdma_to_host1x(cdma);
struct host1x_channel *ch = cdma_to_channel(cdma);
if (cdma->running)
return;
cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
HOST1X_CHANNEL_DMACTRL);
/* set base, end pointer (all of memory) */
host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
According to the TRM, writing to HOST1X_CHANNEL_DMASTART will start a DMA transfer on the channel (if DMA_PUT != DMA_GET). Irrespective of that, why set the valid range to all of physical memory? We know the valid range of the push buffer, why not set the limits accordingly?
That'd make sense. Currently we use the RESTART as the barrier, but having hardware check against DMAEND is a good idea, too.
+/*
- Kick channel DMA into action by writing its PUT offset (if it has changed)
- */
+static void cdma_kick(struct host1x_cdma *cdma) +{
struct host1x *host1x = cdma_to_host1x(cdma);
struct host1x_channel *ch = cdma_to_channel(cdma);
u32 put;
put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
if (put != cdma->last_put) {
host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
cdma->last_put = put;
}
+}
kick() sounds unusual. Maybe flush or commit or something similar would be more accurate.
We could use flush.
+static void cdma_stop(struct host1x_cdma *cdma) +{
struct host1x_channel *ch = cdma_to_channel(cdma);
mutex_lock(&cdma->lock);
if (cdma->running) {
host1x_cdma_wait_locked(cdma, CDMA_EVENT_SYNC_QUEUE_EMPTY);
host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
HOST1X_CHANNEL_DMACTRL);
cdma->running = false;
}
mutex_unlock(&cdma->lock);
+}
Perhaps this should be renamed cdma_stop_sync() or similar to make it clear that it waits for the queue to run empty.
Ok, sounds good.
+static void cdma_timeout_teardown_end(struct host1x_cdma *cdma, u32 getptr)
Maybe the last parameter should be called restart to match its purpose?
Makes sense, will do.
+{
struct host1x *host1x = cdma_to_host1x(cdma);
struct host1x_channel *ch = cdma_to_channel(cdma);
u32 cmdproc_stop;
dev_dbg(&host1x->dev->dev,
"end channel teardown (id %d, DMAGET restart = 0x%x)\n",
ch->chid, getptr);
cmdproc_stop = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
cmdproc_stop &= ~(BIT(ch->chid));
No need for the extra parentheses.
Ok, will remove.
host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
cdma->torndown = false;
cdma_timeout_restart(cdma, getptr);
+}
I find this a bit non-intuitive. We teardown a channel, and when we're done tearing down, the torndown variable is set to false and the channel is actually restarted. Maybe you could explain some more how this works and what its purpose is.
Actually, teardown_begin freezes the channel, then we manipulate the queue, and in the end teardown_end restarts the channel. So these should be named freeze and resume. We could even drop the timeout from the names of these functions.
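The freeze and resume steps boil down to setting and clearing one bit per channel in the stop mask. A minimal sketch, with hypothetical helper names and the register access left out:

```c
#include <stdint.h>

#define CH_BIT(chid) (UINT32_C(1) << (chid))

/* Freeze: set the channel's bit in the stop mask; other channels are untouched. */
static uint32_t cmdproc_freeze(uint32_t stop_mask, unsigned int chid)
{
	return stop_mask | CH_BIT(chid);
}

/* Resume: clear the channel's bit again. */
static uint32_t cmdproc_resume(uint32_t stop_mask, unsigned int chid)
{
	return stop_mask & ~CH_BIT(chid);
}
```

In the driver the mask would be read from and written back to HOST1X_SYNC_CMDPROC_STOP, with the queue manipulation happening between the two calls.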
+/*
- If this timeout fires, it indicates the current sync_queue entry has
- exceeded its TTL and the userctx should be timed out and remaining
- submits already issued cleaned up (future submits return an error).
- */
I can't seem to find what causes subsequent submits to return an error. Also, how is the channel reset so that new jobs can be submitted?
That comment actually applies only downstream. We blacklist contexts for channels that carry state across submits (=have hardware contexts implemented). 2D has atomic jobs, so it doesn't need blacklisting.
host1x_cdma_update_sync_queue() purges the failed job, finds the DMAGET for the next job, and sets sync points correctly. It'll call teardown_end (which we'll rename) to resume the channel with the new DMAGET pointer.
+static void cdma_timeout_handler(struct work_struct *work) +{
struct host1x_cdma *cdma;
struct host1x *host1x;
struct host1x_channel *ch;
u32 syncpt_val;
u32 prev_cmdproc, cmdproc_stop;
cdma = container_of(to_delayed_work(work), struct host1x_cdma,
timeout.wq);
host1x = cdma_to_host1x(cdma);
ch = cdma_to_channel(cdma);
mutex_lock(&cdma->lock);
if (!cdma->timeout.clientid) {
dev_dbg(&host1x->dev->dev,
"cdma_timeout: expired, but has no clientid\n");
mutex_unlock(&cdma->lock);
return;
}
How can the CDMA not have a client?
I don't think that's possible. :-) We should just remove the check. It might be that we were just protecting some kind of race between timeout code triggering and something else, but I can't really think of a scenario.
/* stop processing to get a clean snapshot */
prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
cmdproc_stop = prev_cmdproc | BIT(ch->chid);
host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
prev_cmdproc, cmdproc_stop);
syncpt_val = host1x_syncpt_load_min(host1x->syncpt);
/* has buffer actually completed? */
if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
dev_dbg(&host1x->dev->dev,
"cdma_timeout: expired, but buffer had completed\n");
Maybe this should really be a warning?
Not really - it's actually just a normal state. We got a timeout event, but before we process it, it might be that the job manages to complete. This can happen, and is not an error case.
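The completion check in the hunk above relies on a signed difference so that it keeps working when the 32-bit sync point value wraps. A stand-alone sketch of that comparison, with a hypothetical helper name:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True once 'current' has reached or passed 'target', even across a
 * 32-bit wrap. Assumes the two values are less than 2^31 apart and the
 * usual two's complement conversion to int32_t.
 */
static bool syncpt_reached(uint32_t current, uint32_t target)
{
	return (int32_t)(current - target) >= 0;
}
```

For example, a current value of 2 counts as having passed a target of 0xFFFFFFFE, because the counter wrapped in between.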
/* restore */
cmdproc_stop = prev_cmdproc & ~(BIT(ch->chid));
No need for the extra parentheses. Also, why not just use prev_cmdproc, which shouldn't have the bit set anyway?
Yeah, prev_cmdproc is the one we should use directly.
diff --git a/drivers/gpu/host1x/hw/cdma_hw.h b/drivers/gpu/host1x/hw/cdma_hw.h
[...]
+/*
- Size of the sync queue. If it is too small, we won't be able to queue up
- many command buffers. If it is too large, we waste memory.
- */
+#define HOST1X_SYNC_QUEUE_SIZE 512
I don't see this used anywhere.
Sync queue used to be an array. It hasn't been for a long time, but this remained. Will remove.
+/*
- Number of gathers we allow to be queued up per channel. Must be a
- power of two. Currently sized such that pushbuffer is 4KB (512*8B).
- */
+#define HOST1X_GATHER_QUEUE_SIZE 512
More pieces falling into place.
Great. :-)
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
[...]
+#include "host1x.h" +#include "channel.h" +#include "dev.h" +#include <linux/slab.h> +#include "intr.h" +#include "job.h" +#include <trace/events/host1x.h>
More include ordering issues.
Will fix.
+static void submit_gathers(struct host1x_job *job) +{
/* push user gathers */
int i;
unsigned int?
for (i = 0 ; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
u32 op1 = host1x_opcode_gather(g->words);
u32 op2 = g->mem_base + g->offset;
host1x_cdma_push_gather(&job->ch->cdma,
job->gathers[i].ref,
job->gathers[i].offset,
op1, op2);
}
+}
Perhaps inline this into channel_submit()? I'm not sure how useful it really is to split off smallish functions such as this which aren't reused anywhere else. I don't have any major objection though, so you can keep it separate if you want.
I split these out because channel_submit() became so long that I couldn't understand it anymore. I'd prefer keeping them separate just to keep myself (semi-)sane.
+static inline void __iomem *host1x_channel_regs(void __iomem *p, int ndx) +{
p += ndx * NV_HOST1X_CHANNEL_MAP_SIZE_BYTES;
return p;
+}
+static int host1x_channel_init(struct host1x_channel *ch,
struct host1x *dev, int index)
+{
ch->chid = index;
mutex_init(&ch->reflock);
mutex_init(&ch->submitlock);
ch->regs = host1x_channel_regs(dev->regs, index);
return 0;
+}
You only use host1x_channel_regs() once, so I really don't think it buys you anything to split it off. Both host1x_channel_regs() and host1x_channel_init() are short enough that they can be collapsed.
True, will merge.
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
#include "hw/host1x01.h" #include "dev.h" +#include "channel.h" #include "hw/host1x01_hardware.h"
+#include "hw/channel_hw.c" +#include "hw/cdma_hw.c" #include "hw/syncpt_hw.c" #include "hw/intr_hw.c"
int host1x01_init(struct host1x *host) {
host->channel_op = host1x_channel_ops;
host->cdma_op = host1x_cdma_ops;
host->cdma_pb_op = host1x_pushbuffer_ops; host->syncpt_op = host1x_syncpt_ops; host->intr_op = host1x_intr_ops;
I think I mentioned this before, but I'd prefer not to have the .c files included here, but rather reference the ops structures externally. But I still think that especially CDMA and push buffer ops don't need to be in separate structures since they aren't likely to change with new hardware revisions.
The C files need to be included here so that they pick up the hardware defs for the correct SoC. Pushbuffer is probably something we can generalize, but channel registers can change, so they need to be per SoC.
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
[...]
index c1d5324..03873c0 100644 --- a/drivers/gpu/host1x/hw/host1x01_hardware.h +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h @@ -21,6 +21,130 @@
#include <linux/types.h> #include <linux/bitops.h> +#include "hw_host1x01_channel.h" #include "hw_host1x01_sync.h" +#include "hw_host1x01_uclass.h"
+/* channel registers */ +#define NV_HOST1X_CHANNEL_MAP_SIZE_BYTES 16384
The only user of this seems to be host1x_channel_regs(), so it could be moved to that file. Also the name is overly long, why not something like HOST1X_CHANNEL_SIZE?
Sounds good.
+#define HOST1X_OPCODE_NOOP host1x_opcode_nonincr(0, 0)
HOST1X_OPCODE_NOP would be more canonical in my opinion.
Ok, can change.
+static inline u32 host1x_mask2(unsigned x, unsigned y) +{
return 1 | (1 << (y - x));
+}
What's this? I don't see it used anywhere.
It's a shortcut to add two register writes to one MASK opcode, but we'll remove the def as it's not used.
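To make the shortcut concrete, a stand-alone copy of the helper with a worked value. The register offsets in the example are arbitrary; bit 0 selects the first register and bit (y - x) the second, relative to the base offset x.

```c
#include <stdint.h>

/* Mask selecting two registers, x and y, relative to base offset x (x < y). */
static inline uint32_t mask2(unsigned int x, unsigned int y)
{
	return UINT32_C(1) | (UINT32_C(1) << (y - x));
}
```

For offsets 0x10 and 0x13 this yields binary 1001, i.e. the base register and the one three slots above it.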
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
[...]
+#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
host1x_channel_dmactrl_dmastop_f(v)
I mentioned this elsewhere already, but I think the _F suffix (and _f for that matter) along with the v parameter should go away.
I'd prefer keeping them so that I don't have to use two #defines to replace one. That IMO makes the usage harder and more error prone.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
[...]
What does the "uclass" stand for? It seems a bit useless to me.
It means host1x class, i.e. the host1x registers that can be written to from push buffers.
diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c index 16e3ada..ba48cee 100644 --- a/drivers/gpu/host1x/hw/syncpt_hw.c +++ b/drivers/gpu/host1x/hw/syncpt_hw.c @@ -97,6 +97,15 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp) wmb(); }
+/* remove a wait pointed to by patch_addr */ +static int syncpt_patch_wait(struct host1x_syncpt *sp, void *patch_addr) +{
u32 override = host1x_class_host_wait_syncpt(
NVSYNCPT_GRAPHICS_HOST, 0);
__raw_writel(override, patch_addr);
__raw_writel() isn't meant to be used for regular memory addresses, but only for MMIO addresses. patch_addr will be a kernel virtual address to a location in RAM, so you can just treat it as a normal pointer, so:
*(u32 *)patch_addr = override;
Sure, you mentioned it earlier, but I've just forgotten that. Sorry about that.
A small optimization might be to make override a static const, so that it doesn't have to be composed every time.
Can do.
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
+static void action_submit_complete(struct host1x_waitlist *waiter) +{
struct host1x_channel *channel = waiter->data;
int nr_completed = waiter->count;
No need for this variable.
I'm using it for tracing in a follow-up patch. It can be used in traces for checking the queue length at each point of time.
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
[...]
+#ifdef CONFIG_TEGRA_HOST1X_FIREWALL +static int host1x_firewall = 1; +#else +static int host1x_firewall; +#endif
You could use IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL) in the code, which will have the nice side-effect of compiling code out if the symbol isn't selected.
Sure, I just wasn't aware of IS_ENABLED.
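For readers unfamiliar with the pattern, a user-space sketch of how an IS_ENABLED()-style macro turns a config symbol into a compile-time 0 or 1. This is a simplified stand-in that only handles built-in (=y) options, and CONFIG_DEMO_FIREWALL is a made-up symbol.

```c
/* Simplified stand-in for the kernel's IS_ENABLED(); built-in options only. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_ENABLED_SKETCH(option) IS_DEFINED_2(option)

#define CONFIG_DEMO_FIREWALL 1	/* hypothetical option, as if set to y */

/* Toy validation: accept any non-empty gather. */
static int validate_demo(int words)
{
	return words > 0 ? 0 : -1;
}

static int pin_demo(int words)
{
	int err = 0;

	/* Constant condition: the compiler can drop the call when it is 0. */
	if (IS_ENABLED_SKETCH(CONFIG_DEMO_FIREWALL))
		err = validate_demo(words);

	return err;
}
```

Unlike an #ifdef block, the guarded code is still parsed and type-checked in every configuration, and no separate host1x_firewall variable is needed.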
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
Maybe make the parameters unsigned int instead of u32?
I'll check this, but we're getting them from user space, and that API has a fixed length field. That's why I'm carrying that type over.
+{
struct host1x_job *job = NULL;
int num_unpins = num_cmdbufs + num_relocs;
unsigned int?
Sounds good.
s64 total;
This doesn't need to be signed, u64 will be good enough. None of the terms in the expression that assigns to total can be negative.
True, will change.
void *mem;
/* Check that we're not going to overflow */
total = sizeof(struct host1x_job)
+ num_relocs * sizeof(struct host1x_reloc)
+ num_unpins * sizeof(struct host1x_job_unpin_data)
+ num_waitchks * sizeof(struct host1x_waitchk)
+ num_cmdbufs * sizeof(struct host1x_job_gather)
+ num_unpins * sizeof(dma_addr_t)
+ num_unpins * sizeof(u32 *);
"+"s at the end of the preceding lines.
Ok.
if (total > ULONG_MAX)
return NULL;
mem = job = kzalloc(total, GFP_KERNEL);
if (!job)
return NULL;
kref_init(&job->ref);
job->ch = ch;
/* First init state to zero */
/*
* Redistribute memory to the structs.
* Overflows and negative conditions have
* already been checked in job_alloc().
*/
The last two lines don't really apply here. The checks are in this same function and they check only for overflow, not negative conditions, which can't happen anyway since the counts are all unsigned.
Actually overflow and negative in this case meant the same thing. Will fix comment.
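The overflow check discussed above can be sketched in isolation as follows. The element sizes are placeholders for the sizeof() terms in the patch, and SIZE_MAX stands in for the limit the allocator can accept.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element sizes standing in for the sizeof() terms. */
#define HDR_SIZE    64u
#define RELOC_SIZE  24u
#define GATHER_SIZE 32u

/*
 * Returns the total allocation size, or 0 if it would not fit in size_t.
 * The sum is formed in 64 bits, so 32-bit counts cannot wrap it; all
 * terms are unsigned, so the total can never be negative.
 */
static size_t job_alloc_size(uint32_t num_relocs, uint32_t num_gathers)
{
	uint64_t total = (uint64_t)HDR_SIZE +
			 (uint64_t)num_relocs * RELOC_SIZE +
			 (uint64_t)num_gathers * GATHER_SIZE;

	if (total > SIZE_MAX)
		return 0;

	return (size_t)total;
}
```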
+void host1x_job_get(struct host1x_job *job) +{
kref_get(&job->ref);
+}
I think it is common for *_get() functions to return a pointer to the referenced object.
Ok, can do.
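A minimal sketch of the convention, using a plain counter in place of kref so that it runs in user space. The type and function names are hypothetical.

```c
struct job_sketch {
	unsigned int refcount;
};

/* Take a reference and hand the object back, so calls can be chained. */
static struct job_sketch *job_sketch_get(struct job_sketch *job)
{
	job->refcount++;	/* stand-in for kref_get(); not atomic */
	return job;
}
```

This allows assignments such as waiter->job = job_sketch_get(job) in a single statement.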
+void host1x_job_add_gather(struct host1x_job *job,
u32 mem_id, u32 words, u32 offset)
+{
struct host1x_job_gather *cur_gather =
&job->gathers[job->num_gathers];
Should this check for overflow?
As a defensive measure we could, but this is not exploitable.
+/*
- Check driver supplied waitchk structs for syncpt thresholds
- that have already been satisfied and NULL the comparison (to
- avoid a wrap condition in the HW).
- */
+static int do_waitchks(struct host1x_job *job, struct host1x *host,
u32 patch_mem, struct mem_handle *h)
+{
int i;
/* compare syncpt vs wait threshold */
for (i = 0; i < job->num_waitchk; i++) {
struct host1x_waitchk *wait = &job->waitchk[i];
struct host1x_syncpt *sp =
host1x_syncpt_get(host, wait->syncpt_id);
/* validate syncpt id */
if (wait->syncpt_id > host1x_syncpt_nb_pts(host))
continue;
/* skip all other gathers */
if (patch_mem != wait->mem)
continue;
trace_host1x_syncpt_wait_check(wait->mem, wait->offset,
wait->syncpt_id, wait->thresh,
host1x_syncpt_read_min(sp));
if (host1x_syncpt_is_expired(
host1x_syncpt_get(host, wait->syncpt_id),
wait->thresh)) {
You already have the sp variable that you could use here to make it more readable.
True, will use that.
struct host1x_syncpt *sp =
host1x_syncpt_get(host, wait->syncpt_id);
And you don't need this then, since you already have sp pointing to the same syncpoint.
Ok.
void *patch_addr = NULL;
/*
* NULL an already satisfied WAIT_SYNCPT host method,
* by patching its args in the command stream. The
* method data is changed to reference a reserved
* (never given out or incr) NVSYNCPT_GRAPHICS_HOST
* syncpt with a matching threshold value of 0, so
* is guaranteed to be popped by the host HW.
*/
dev_dbg(&host->dev->dev,
"drop WAIT id %d (%s) thresh 0x%x, min 0x%x\n",
wait->syncpt_id, sp->name, wait->thresh,
host1x_syncpt_read_min(sp));
/* patch the wait */
patch_addr = host1x_memmgr_kmap(h,
wait->offset >> PAGE_SHIFT);
if (patch_addr) {
host1x_syncpt_patch_wait(sp,
(patch_addr +
(wait->offset & ~PAGE_MASK)));
host1x_memmgr_kunmap(h,
wait->offset >> PAGE_SHIFT,
patch_addr);
} else {
pr_err("Couldn't map cmdbuf for wait check\n");
}
This is a case where splitting out a small function would actually be useful to make the code more readable since you can remove two levels of indentation. You can just pass in the handle and the offset, let it do the actual patching. Maybe
host1x_syncpt_patch_offset(sp, h, wait->offset);
?
Sounds good from a readability point of view.
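The core of such a helper is splitting the byte offset into a page index for kmap and an offset within that page. A stand-alone sketch, assuming 4 KiB pages and hypothetical names:

```c
#include <stdint.h>

#define SK_PAGE_SHIFT 12u	/* assumed: 4 KiB pages */
#define SK_PAGE_SIZE  (UINT32_C(1) << SK_PAGE_SHIFT)

struct page_pos {
	uint32_t page;		/* index of the page to map */
	uint32_t offset;	/* byte offset inside that page */
};

/* Split a byte offset into page index and in-page offset. */
static struct page_pos split_offset(uint32_t byte_offset)
{
	struct page_pos pos = {
		.page = byte_offset >> SK_PAGE_SHIFT,
		.offset = byte_offset & (SK_PAGE_SIZE - 1),
	};

	return pos;
}
```

The mask here is equivalent to the & ~PAGE_MASK in the patch, since the kernel's PAGE_MASK is the complement of PAGE_SIZE - 1.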
}
wait->mem = 0;
}
return 0;
+}
There's a gratuitous blank line.
Will remove.
+static int pin_job_mem(struct host1x_job *job) +{
int i;
int count = 0;
int result;
These (and the return value) can all be unsigned int.
True, will fix.
+static int do_relocs(struct host1x_job *job,
u32 cmdbuf_mem, struct mem_handle *h)
+{
int i = 0;
This can also be unsigned int.
True, will fix.
int last_page = -1;
And this should match the type of cmdbuf_offset (u32). You can initially set it to something like ~0 to make sure it doesn't match any valid offset.
You're right, will change.
void *cmdbuf_page_addr = NULL;
/* pin & patch the relocs for one gather */
while (i < job->num_relocs) {
struct host1x_reloc *reloc = &job->relocarray[i];
/* skip all other gathers */
if (cmdbuf_mem != reloc->cmdbuf_mem) {
i++;
continue;
}
if (last_page != reloc->cmdbuf_offset >> PAGE_SHIFT) {
if (cmdbuf_page_addr)
host1x_memmgr_kunmap(h,
last_page, cmdbuf_page_addr);
cmdbuf_page_addr = host1x_memmgr_kmap(h,
reloc->cmdbuf_offset >> PAGE_SHIFT);
last_page = reloc->cmdbuf_offset >> PAGE_SHIFT;
if (unlikely(!cmdbuf_page_addr)) {
pr_err("Couldn't map cmdbuf for relocation\n");
return -ENOMEM;
}
}
__raw_writel(
(job->reloc_addr_phys[i] +
reloc->target_offset) >> reloc->shift,
(cmdbuf_page_addr +
(reloc->cmdbuf_offset & ~PAGE_MASK)));
Again, wrong __raw_writel() usage.
Yes, sorry, I forgot about this.
/* remove completed reloc from the job */
if (i != job->num_relocs - 1) {
struct host1x_reloc *reloc_last =
&job->relocarray[job->num_relocs - 1];
reloc->cmdbuf_mem = reloc_last->cmdbuf_mem;
reloc->cmdbuf_offset = reloc_last->cmdbuf_offset;
reloc->target = reloc_last->target;
reloc->target_offset = reloc_last->target_offset;
reloc->shift = reloc_last->shift;
job->reloc_addr_phys[i] =
job->reloc_addr_phys[job->num_relocs - 1];
job->num_relocs--;
} else {
break;
}
}
if (cmdbuf_page_addr)
host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
return 0;
+}
Also the algorithm seems a bit strange and hard to follow. Instead of removing relocs from the job, replacing them with the last entry and decrementing job->num_relocs, how much is the penalty for always iterating over all relocs? This is one of the other cases where I'd argue that simplicity is key. Furthermore you need to copy quite a bit of data to replace the completed relocs, so I'm not sure it buys you much.
It could always be optimized later on by just setting a bit in the reloc to mark it as completed, or keep a bitmask of completed relocations or whatever.
This was done in a big optimization patch, but we'll check if we could remove this. Previously we just set cmdbuf_mem for the completed reloc to 0, and that should work in this case.
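A sketch of the simpler variant: always walk the full array and mark applied entries by zeroing cmdbuf_mem. It assumes 0 is never a valid memory id; the struct is a cut-down stand-in and the actual patching is replaced by copying a value out.

```c
#include <stddef.h>
#include <stdint.h>

struct reloc_sketch {
	uint32_t cmdbuf_mem;	/* 0 marks an entry that was already applied */
	uint32_t value;		/* stand-in for the address to patch in */
};

/* Applies all relocs belonging to 'mem' and returns how many were applied. */
static unsigned int apply_relocs(struct reloc_sketch *relocs, size_t count,
				 uint32_t mem, uint32_t *out)
{
	unsigned int applied = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (relocs[i].cmdbuf_mem != mem)
			continue;	/* other gather, or already done */

		out[applied++] = relocs[i].value;
		relocs[i].cmdbuf_mem = 0;	/* mark as completed */
	}

	return applied;
}
```

The array keeps its length and order, so nothing has to be copied and num_relocs does not change while iterating.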
+static int check_reloc(struct host1x_reloc *reloc,
u32 cmdbuf_id, int offset)
offset can be unsigned int.
Yep, will change.
+{
int err = 0;
if (reloc->cmdbuf_mem != cmdbuf_id
|| reloc->cmdbuf_offset != offset * sizeof(u32))
err = -EINVAL;
return err;
+}
More canonically:
offset *= sizeof(u32);

if (reloc->cmdbuf_mem != cmdbuf_id || reloc->cmdbuf_offset != offset)
	return -EINVAL;

return 0;
Ok, both do the same thing, so can change.
+static int check_mask(struct host1x_job *job,
struct platform_device *pdev,
struct host1x_reloc **reloc, int *num_relocs,
u32 cmdbuf_id, int *offset,
u32 *words, u32 class, u32 reg, u32 mask)
num_relocs and offset can be unsigned int *.
Same comment for the other check_*() functions. That said I think the code would become a lot more readable if you were to wrap all of these parameters into a structure, say host1x_firewall, and just pass that into the functions.
True, might improve performance, too. We'll do that.
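A possible shape for that structure, reduced to the fields check_reloc() needs. The field set and names are guesses for illustration, not the eventual driver layout.

```c
#include <errno.h>
#include <stdint.h>

struct reloc_entry {
	uint32_t cmdbuf_mem;
	uint32_t cmdbuf_offset;	/* in bytes */
};

/* Hypothetical bundle of the state the check_*() functions pass around. */
struct firewall_sketch {
	const struct reloc_entry *reloc;	/* next expected reloc */
	unsigned int num_relocs;		/* relocs left to match */
	uint32_t cmdbuf_id;			/* gather being validated */
	unsigned int offset;			/* current position, in words */
};

/* Checks that the next reloc targets the current word of this gather. */
static int fw_check_reloc(const struct firewall_sketch *fw)
{
	if (fw->num_relocs == 0)
		return -EINVAL;

	if (fw->reloc->cmdbuf_mem != fw->cmdbuf_id ||
	    fw->reloc->cmdbuf_offset != fw->offset * sizeof(uint32_t))
		return -EINVAL;

	return 0;
}
```

Each check_*() function then takes a single pointer instead of nine parameters.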
+static inline int copy_gathers(struct host1x_job *job,
struct platform_device *pdev)
struct device *
Will do.
+{
size_t size = 0;
size_t offset = 0;
int i;
for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
size += g->words * sizeof(u32);
}
job->gather_copy_mapped = dma_alloc_writecombine(&pdev->dev,
size, &job->gather_copy, GFP_KERNEL);
if (IS_ERR(job->gather_copy_mapped)) {
dma_alloc_writecombine() returns NULL on failure, so this check is wrong.
Oops, will fix.
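To see why the IS_ERR() check never fires for a NULL return, here is a user-space sketch of the idea behind the macro: only the topmost 4095 addresses are treated as encoded error codes. The helper name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095

/* Same idea as the kernel's IS_ERR(): error codes live at the top of the address space. */
static bool sk_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SK_MAX_ERRNO;
}
```

NULL is address 0 and therefore not in that range, so an allocator that returns NULL on failure needs a plain if (!ptr) check with an explicit -ENOMEM.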
int err = PTR_ERR(job->gather_copy_mapped);
job->gather_copy_mapped = NULL;
return err;
}
job->gather_copy_size = size;
for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
void *gather = host1x_memmgr_mmap(g->ref);
memcpy(job->gather_copy_mapped + offset,
gather + g->offset,
g->words * sizeof(u32));
g->mem_base = job->gather_copy;
g->offset = offset;
g->mem_id = 0;
g->ref = 0;
host1x_memmgr_munmap(g->ref, gather);
offset += g->words * sizeof(u32);
}
return 0;
+}
I wonder, where's this DMA buffer actually used? I can't find any use between this copy and the corresponding dma_free_writecombine() call.
We replace the gathers in host1x_job with the ones we allocate here, so they are used when pushing the gathers to hardware.
This is done so that user space cannot tamper with the gathers once they've been checked by firewall.
+int host1x_job_pin(struct host1x_job *job, struct platform_device *pdev) +{
int err = 0, i = 0, j = 0;
No need to initialize these here. i and j can also be unsigned.
Ok.
struct host1x *host = host1x_get_host(pdev);
DECLARE_BITMAP(waitchk_mask, host1x_syncpt_nb_pts(host));
bitmap_zero(waitchk_mask, host1x_syncpt_nb_pts(host));
for (i = 0; i < job->num_waitchk; i++) {
u32 syncpt_id = job->waitchk[i].syncpt_id;
if (syncpt_id < host1x_syncpt_nb_pts(host))
set_bit(syncpt_id, waitchk_mask);
}
/* get current syncpt values for waitchk */
for_each_set_bit(i, &waitchk_mask[0], sizeof(waitchk_mask))
host1x_syncpt_load_min(host->syncpt + i);
/* pin memory */
err = pin_job_mem(job);
if (err <= 0)
goto out;
pin_job_mem() never returns negative.
Ok, will fix.
/* patch gathers */
for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
/* process each gather mem only once */
if (!g->ref) {
g->ref = host1x_memmgr_get(g->mem_id, job->ch->dev);
if (IS_ERR(g->ref)) {
host1x_memmgr_get() also seems to return NULL on error.
I think I'll change memmgr_get() to return an ERR_PTR().
err = PTR_ERR(g->ref);
g->ref = NULL;
break;
}
g->mem_base = job->gather_addr_phys[i];
for (j = 0; j < job->num_gathers; j++) {
struct host1x_job_gather *tmp =
&job->gathers[j];
if (!tmp->ref && tmp->mem_id == g->mem_id) {
tmp->ref = g->ref;
tmp->mem_base = g->mem_base;
}
}
err = 0;
if (host1x_firewall)
if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
Will fix.
err = validate(job, pdev, g);
if (err)
dev_err(&pdev->dev,
"Job validate returned %d\n", err);
if (!err)
err = do_relocs(job, g->mem_id, g->ref);
if (!err)
err = do_waitchks(job, host,
g->mem_id, g->ref);
host1x_memmgr_put(g->ref);
if (err)
break;
}
}
if (host1x_firewall && !err) {
And here.
Here, too.
+/*
- Debug routine used to dump job entries
- */
+void host1x_job_dump(struct device *dev, struct host1x_job *job) +{
dev_dbg(dev, " SYNCPT_ID %d\n",
job->syncpt_id);
dev_dbg(dev, " SYNCPT_VAL %d\n",
job->syncpt_end);
dev_dbg(dev, " FIRST_GET 0x%x\n",
job->first_get);
dev_dbg(dev, " TIMEOUT %d\n",
job->timeout);
dev_dbg(dev, " NUM_SLOTS %d\n",
job->num_slots);
dev_dbg(dev, " NUM_HANDLES %d\n",
job->num_unpins);
+}
These don't need to be wrapped.
True, will merge lines.
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
[...]
+struct host1x_job_gather {
u32 words;
dma_addr_t mem_base;
u32 mem_id;
int offset;
struct mem_handle *ref;
+};
+struct host1x_cmdbuf {
__u32 mem;
__u32 offset;
__u32 words;
__u32 pad;
+};
+struct host1x_reloc {
__u32 cmdbuf_mem;
__u32 cmdbuf_offset;
__u32 target;
__u32 target_offset;
__u32 shift;
__u32 pad;
+};
+struct host1x_waitchk {
__u32 mem;
__u32 offset;
__u32 syncpt_id;
__u32 thresh;
+};
None of these are shared with userspace, so they shouldn't take the __u32 types, but the regular u32 ones.
True. We copy stuff from user space types to these, but we don't use these directly in user space API.
+/*
- Each submit is tracked as a host1x_job.
- */
+struct host1x_job {
/* When refcount goes to zero, job can be freed */
struct kref ref;
/* List entry */
struct list_head list;
/* Channel where job is submitted to */
struct host1x_channel *ch;
Maybe write it out as "channel"?
Ok.
int clientid;
Subsequent patches assign u32 to this field, so maybe the type should be changed here. And maybe leave out the id suffix. It doesn't really add any information.
Good catch, will change.
/* Gathers and their memory */
struct host1x_job_gather *gathers;
int num_gathers;
unsigned int
Will change.
/* Wait checks to be processed at submit time */
struct host1x_waitchk *waitchk;
int num_waitchk;
unsigned int
Ok.
u32 waitchk_mask;
This might need to be changed to a bitfield once future Tegra versions start supporting more than 32 syncpoints.
True, I think we'll need to get this changed already now. We actually drop the usage of waitchk_mask in downstream because of this. It's basically just an optimization that doesn't gain any real world speed advantage.
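If the mask were kept, a bitmap sized by the number of sync points would remove the 32-entry limit. A user-space sketch with a made-up count of 64 and hypothetical helper names; in the kernel this would use DECLARE_BITMAP, set_bit and test_bit.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_NUM_SYNCPTS 64u	/* hypothetical count, larger than 32 */
#define SK_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define SK_WORDS ((SK_NUM_SYNCPTS + SK_BITS_PER_WORD - 1) / SK_BITS_PER_WORD)

struct syncpt_mask {
	unsigned long bits[SK_WORDS];
};

/* Mark a sync point id; out-of-range ids are ignored. */
static void syncpt_mask_set(struct syncpt_mask *m, unsigned int id)
{
	if (id < SK_NUM_SYNCPTS)
		m->bits[id / SK_BITS_PER_WORD] |= 1UL << (id % SK_BITS_PER_WORD);
}

/* Report whether a sync point id is marked. */
static bool syncpt_mask_test(const struct syncpt_mask *m, unsigned int id)
{
	if (id >= SK_NUM_SYNCPTS)
		return false;

	return (m->bits[id / SK_BITS_PER_WORD] >> (id % SK_BITS_PER_WORD)) & 1UL;
}
```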
/* Array of handles to be pinned & unpinned */
struct host1x_reloc *relocarray;
int num_relocs;
unsigned int
Will change.
struct host1x_job_unpin_data *unpins;
int num_unpins;
unsigned int
Will change.
dma_addr_t *addr_phys;
dma_addr_t *gather_addr_phys;
dma_addr_t *reloc_addr_phys;
/* Sync point id, number of increments and end related to the submit */
u32 syncpt_id;
u32 syncpt_incrs;
u32 syncpt_end;
/* Maximum time to wait for this job */
int timeout;
unsigned int. I think we discussed this already in a slightly different context in patch 2.
Sure, will change. I think timeouts were discussed wrt syncpt wait timeout.
/* Null kickoff prevents submit from being sent to hardware */
bool null_kickoff;
I don't think this is used anywhere.
True, we can remove this as we haven't posted the code for null kickoff.
/* Index and number of slots used in the push buffer */
int first_get;
int num_slots;
unsigned int
Ok.
/* Copy of gathers */
size_t gather_copy_size;
dma_addr_t gather_copy;
u8 *gather_copy_mapped;
Are these really needed? They don't seem to be used anywhere except to store a copy and free that copy sometime later.
They're needed so that kernel can take a copy of the gathers so that user space cannot tamper with them post-submit.
/* Temporary space for unpin ids */
long unsigned int *pin_ids;
unsigned long
Will change.
/* Check if register is marked as an address reg */
int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
is_addr_reg() sounds a bit unusual. Maybe match this to the name of the main firewall routine, validate()?
The point of this op is to just tell if a register for a class is pointing to a buffer. validate then uses this information. But both answers (yes/no) and both types of registers are still valid, so validate() wouldn't be the proper name.
validation is then done by checking that there's a reloc corresponding to each register write to a register that can hold an address.
/* Request a SETCLASS to this class */
u32 class;
/* Add a channel wait for previous ops to complete */
u32 serialize;
This is used in code as a boolean. Why does it need to be 32 bits?
No need, will change to bool.
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
[...]
+struct mem_handle; +struct platform_device;
+struct host1x_job_unpin_data {
struct mem_handle *h;
struct sg_table *mem;
+};
+enum mem_mgr_flag {
mem_mgr_flag_uncacheable = 0,
mem_mgr_flag_write_combine = 1,
+};
I'd like to see this use a more object-oriented approach and more common terminology. All of these handles are essentially buffer objects, so maybe something like host1x_bo would be a nice and short name.
To make this more object-oriented, I propose something like:
struct host1x_bo_ops {
	int (*alloc)(struct host1x_bo *bo, size_t size, unsigned long align,
		     unsigned long flags);
	int (*free)(struct host1x_bo *bo);
	...
};

struct host1x_bo {
	const struct host1x_bo_ops *ops;
};

struct host1x_cma_bo {
	struct host1x_bo base;
	struct drm_gem_cma_object *obj;
};

static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
{
	return container_of(bo, struct host1x_cma_bo, base);
}

static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
				  unsigned long align, unsigned long flags)
{
	return bo->ops->alloc(bo, size, align, flags);
}

...
That should be easy to extend with a new type of BO once the IOMMU-based allocator is ready. And as I said it is much closer in terminology to what other drivers do.
One complexity is that we're using the same type for communicating with user space. Each buffer carries with it a flag indicating its allocator. We might be able to model the internal structure to be more like what you propose, but for the API we still need the flag.
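A compilable user-space sketch of the proposed pattern, with a local container_of and a single size operation standing in for alloc and free. All names carry a _sketch suffix to make clear they are illustrative; the allocator flag needed for the user space API is not modelled.

```c
#include <stddef.h>

/* Local stand-in for the kernel's container_of(). */
#define sk_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct bo_sketch;

struct bo_ops_sketch {
	size_t (*size)(struct bo_sketch *bo);
};

/* Generic buffer object: only knows its operations. */
struct bo_sketch {
	const struct bo_ops_sketch *ops;
};

/* One concrete allocator type embedding the generic object. */
struct cma_bo_sketch {
	struct bo_sketch base;
	size_t bytes;
};

static size_t cma_bo_size(struct bo_sketch *bo)
{
	struct cma_bo_sketch *cma =
		sk_container_of(bo, struct cma_bo_sketch, base);

	return cma->bytes;
}

static const struct bo_ops_sketch cma_bo_ops = { .size = cma_bo_size };

/* Callers go through the ops table and never see the concrete type. */
static size_t bo_size(struct bo_sketch *bo)
{
	return bo->ops->size(bo);
}
```

A second allocator would add its own struct embedding bo_sketch and its own ops table, without touching the callers of bo_size().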
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index b46d044..255a3a3 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -26,6 +26,7 @@ struct host1x;
#define NVSYNCPT_INVALID (-1)
+#define NVSYNCPT_GRAPHICS_HOST 0
I think these should match other naming, so:
#define HOST1X_SYNCPT_INVALID -1
#define HOST1X_SYNCPT_HOST1X 0
Sure, sounds good.
There are a few more occurrences where platform_device is used but I haven't commented on them. I don't think any of them won't work with just a struct device instead. Also I may not have caught all of the places where you should rather be using unsigned int instead of int, so you might want to look out for some of those.
Yes, we'll go through the code with this in mind.
Generally I very much like where this is going. Are there any plans to move the userspace binary driver to this interface at some point so we can more actively test it? Also, is anything else blocking adding a gr3d device similar to gr2d from this patch series?
We're doing this in stages. I don't want to change the code base and APIs both in one step, because big moves in both user and kernel space tend to fail easily.
First we upstream code, and try to get feature parity. Then we re-engineer our downstream driver delta on top of the upstream code, but in this phase we keep the downstream kernel API.
In the next step, we'll start moving to the DRM APIs.
So, there's quite a few steps still before we're on DRM APIs, but we'll reach it at some point. :-)
3D driver should work on top of this. I don't see anything blocking that.
Terje
On Tue, Feb 26, 2013 at 11:48:18AM +0200, Terje Bergström wrote:
On 25.02.2013 17:24, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:43:59PM +0200, Terje Bergstrom wrote:
[...]
+/*
- Start timer for a buffer submition that has completed yet.
"submission". And I don't understand the "that has completed yet" part.
It should become "Start timer that tracks the time spent by the job".
Yes, that's a lot better.
if (list_empty(&cdma->sync_queue) &&
cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY)
signal = true;
This looks funny, maybe:
if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
    list_empty(&cdma->sync_queue))
	signal = true;
?
Indenting at least is strange. I don't have a preference for the ordering of conditions, so if you like the latter order, we can just use that.
I just happen to find it easier to read that way. If you want to keep the ordering that's fine, but the indentation needs to be fixed.
+{
u32 get_restart;
Maybe just call this "restart" or "restart_addr". get_restart sounds like a function name.
Ok, how about "restart_dmaget_addr"? That indicates what we're doing with the restart address.
Sounds good.
list_for_each_entry(job, &cdma->sync_queue, list) {
if (syncpt_val < job->syncpt_end)
break;
host1x_job_dump(&dev->dev, job);
}
That's potentially a lot of debug output. I wonder if it might make sense to control parts of this via a module parameter. Then again, if somebody really needs to debug this, maybe they really want *all* the information.
host1x_job_dump() uses dev_dbg(), so it only dumps a lot if DEBUG has been defined in that file.
Okay, let's leave it like that then.
+/*
- Destroy a cdma
- */
+void host1x_cdma_deinit(struct host1x_cdma *cdma) +{
struct push_buffer *pb = &cdma->push_buffer;
struct host1x *host1x = cdma_to_host1x(cdma);
if (cdma->running) {
pr_warn("%s: CDMA still running\n",
__func__);
} else {
host1x->cdma_pb_op.destroy(pb);
host1x->cdma_op.timeout_destroy(cdma);
}
+}
There's no way to recover from the situation where a cdma is still running. Can this not return an error code (-EBUSY?) if the cdma can't be destroyed?
It's called from close(), which cannot return an error code. It's actually more of a power optimization. The effect is that if there are no users for channel, we'll just not free up the push buffer.
I think the proper fix would actually be to check in host1x_cdma_init() if push buffer is already allocated and cdma->running. In that case we could skip most of initialization.
Yes, in that case it might be useful to do this. I still think it's worth returning an error code to the caller, even if it can't be propagated. That way the caller at least has the possibility to react.
I'm still not quite sure I understand the necessity for this, though. Maybe you can give an example of when this will actually happen?
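To make the suggestion above concrete, here is a minimal userspace sketch of a deinit routine that reports -EBUSY instead of only printing a warning. The struct and its fields are simplified stand-ins invented for illustration, not the driver's actual struct host1x_cdma.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for the CDMA state; field names are hypothetical. */
struct cdma {
	bool running;			/* command DMA still active */
	bool push_buffer_allocated;	/* push buffer has not been freed */
};

/*
 * Tear down the CDMA state. Returns 0 on success, or -EBUSY if the
 * channel is still running and nothing was freed.
 */
static int cdma_deinit(struct cdma *cdma)
{
	if (cdma->running)
		return -EBUSY;

	cdma->push_buffer_allocated = false;
	return 0;
}
```

A caller such as close() that cannot propagate the error could still log it or leave the push buffer in place for the next open.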
+/*
- cdma
- This is in charge of a host command DMA channel.
- Sends ops to a push buffer, and takes responsibility for unpinning
- (& possibly freeing) of memory after those ops have completed.
- Producer:
- begin
push - send ops to the push buffer
- end - start command DMA and enqueue handles to be unpinned
- Consumer:
- update - call to update sync queue and push buffer, unpin memory
- */
I find the name to be a bit confusing. For some reason I automatically think of GSM when I read CDMA. This really is more of a job queue, so maybe calling it host1x_job_queue might be more appropriate. But I've already requested a lot of things to be renamed, so I think I can live with this being called CDMA if you don't want to change it.
Alternatively all of these could be moved to the struct host1x_channel given that there's only one of each of the push_buffer, buffer_timeout and host1x_cma objects per channel.
I did consider merging those two at one point. That should work, as they both essentially deal with channels. I also saw that the resulting file and data structures became quite large, so I have so far preferred to keep them separate.
This way I can keep the "higher level" stuff (inserting setclass, serializing, allocating sync point ranges, etc) in one file and lower level stuff (write to hardware, deal with push buffer pointers, etc) in another.
Alright. I can live with that.
+int host1x_channel_submit(struct host1x_job *job) +{
return host1x_get_host(job->ch->dev)->channel_op.submit(job);
+}
I'd expect a function named host1x_channel_submit() to take a struct host1x_channel *. Should this perhaps be called host1x_job_submit()?
It calls into channel code directly, and the underlying op also just takes a job. We could add channel as a parameter, and not pass it in host1x_job_alloc(), but we actually need the channel data already in host1x_job_pin(), which comes before submit. We need it so that we pin the buffer to the correct engine.
That's all fine. My point was that this operates on a job object, so I'd find it more intuitive if the function name reflected that. There's nothing wrong with submitting a job without explicitly specifying the channel if it is tied to one channel anyway.
host1x_channel_submit() would imply "submit channel", which doesn't make sense, so the next best alternative is "submit job to channel", but that isn't reflected in the parameters. So host1x_job_submit() fits pretty well. There's no reason why it has to be prefixed host1x_channel_*, right?
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev) +{
struct host1x_channel *ch = NULL;
struct host1x *host1x = host1x_get_host(pdev);
int chindex;
int max_channels = host1x->info.nb_channels;
int err;
mutex_lock(&host1x->chlist_mutex);
chindex = host1x->allocated_channels;
if (chindex > max_channels)
goto fail;
ch = kzalloc(sizeof(*ch), GFP_KERNEL);
if (ch == NULL)
goto fail;
/* Link platform_device to host1x_channel */
err = host1x->channel_op.init(ch, host1x, chindex);
if (err < 0)
goto fail;
ch->dev = pdev;
/* Add to channel list */
list_add_tail(&ch->list, &host1x->chlist.list);
host1x->allocated_channels++;
mutex_unlock(&host1x->chlist_mutex);
return ch;
+fail:
dev_err(&pdev->dev, "failed to init channel\n");
kfree(ch);
mutex_unlock(&host1x->chlist_mutex);
return NULL;
+}
I think the critical section could be shorter here. It's probably not worth the extra trouble, though, given that channels are not often allocated.
Yeah, boot time isn't measured in microseconds. :-) But, if we just make allocated_channels an atomic, we should be able to drop chlist_mutex altogether and it could simplify the code.
You still need to protect the list from concurrent modification.
+/* channel list operations */
+void host1x_channel_list_init(struct host1x *);
+void host1x_channel_for_all(struct host1x *, void *data,
int (*fptr)(struct host1x_channel *ch, void *fdata));
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev);
+void host1x_channel_free(struct host1x_channel *ch);
Is it a good idea to make host1x_channel_free() publicly available? Shouldn't the host1x_channel_alloc()/host1x_channel_request() return a host1x_channel with a reference count of 1 and everybody release their reference using host1x_channel_put() to make sure the channel is freed only after the last reference disappears?
Otherwise whoever calls host1x_channel_free() will confuse everybody else that's still keeping a reference.
The difference is that _put and _get are called to indicate how many user space processes there are for the channel. Even if there are no processes, we won't free the channel structure - we just freeze the channel.
_alloc and _free are different in that they actually create the channel structs and delete them, and they follow the lifecycle of the driver. Perhaps we should figure out new naming, but refcounting and alloc/free cannot be merged here.
I understand. Perhaps better names would be host1x_channel_setup() and host1x_channel_teardown()?
+{
struct drm_gem_cma_object *obj = to_cma_obj((void *)id);
struct mutex *struct_mutex = &obj->base.dev->struct_mutex;
mutex_lock(struct_mutex);
drm_gem_object_reference(&obj->base);
mutex_unlock(struct_mutex);
I think it's more customary to obtain a pointer to struct drm_device and then use mutex_{lock,unlock}(&drm->struct_mutex). Or you could just use drm_gem_object_reference_unlocked(&obj->base) instead. Which doesn't exist yet, apparently. But it could be added.
I think we could take the former path - just refer to mutex in a different way.
You'll get extra points if you add the function =). The documentation in Documentation/DocBook/drm.tmpl says that it exists, but it doesn't, so you'd even be fixing a bug along the way.
+int host1x_cma_pin_array_ids(struct platform_device *dev,
long unsigned *ids,
long unsigned id_type_mask,
long unsigned id_type,
u32 count,
struct host1x_job_unpin_data *unpin_data,
dma_addr_t *phys_addr)
struct device * and unsigned long please. count also doesn't need to be a sized type; unsigned int will do just fine. The return value can also be unsigned int if you don't expect to return any error conditions.
I think we'll need to check these. ids probably needs to be a u32 *, and id_type_mask and id_type should be u32. They come like that from user space.
Okay. My main point was that it's more usual to use "unsigned long" than "long unsigned":
linux.git $ git grep -n 'long unsigned' | wc -l
72
linux.git $ git grep -n 'unsigned long' | wc -l
106575
Also the more I think about it, the more I have doubts that passing around IDs like this (or using ID types and masks) is the right thing to do. I'll get back to that later.
int allocated_channels;
unsigned int? And maybe just "num_channels"?
num_channels could be thought of as "number of available channels", so I'd like to use num_allocated_channels here.
Okay.
diff --git a/drivers/gpu/host1x/host1x.h b/drivers/gpu/host1x/host1x.h
[...]
+enum host1x_class {
NV_HOST1X_CLASS_ID = 0x1,
NV_GRAPHICS_2D_CLASS_ID = 0x51,
This entry belongs in a later patch, right? And I find it convenient if enumeration constants start with the enum name as prefix. Furthermore it'd be nice to reuse the hardware module names, like so:
enum host1x_class {
	HOST1X_CLASS_HOST1X,
	HOST1X_CLASS_GR2D,
	HOST1X_CLASS_GR3D,
};
The naming sounds good. We already use HOST1X_CLASS_HOST1X in code to insert a wait. If you'd prefer, we can move the definition of HOST1X_CLASS_GR2D to the later patch.
Yes, it's better to introduce it in the patch that first uses it.
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
[...]
+#include <linux/slab.h>
+#include <linux/scatterlist.h>
+#include <linux/dma-mapping.h>
+#include "cdma.h"
+#include "channel.h"
+#include "dev.h"
+#include "memmgr.h"
+#include "cdma_hw.h"
+static inline u32 host1x_channel_dmactrl(int stop, int get_rst, int init_get) +{
return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop)
| HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst)
| HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
I think it is more customary to put the | at the end of the preceding line:
return HOST1X_CHANNEL_DMACTRL_DMASTOP_F(stop) |
       HOST1X_CHANNEL_DMACTRL_DMAGETRST_F(get_rst) |
       HOST1X_CHANNEL_DMACTRL_DMAINITGET_F(init_get);
Also since these are all single bits, I'd prefer if you could drop the _F suffix and not make them take a parameter. I think it'd even be better not to have this function at all, but make the intent explicit where the register is written. That is, have each call site set the bits explicitly instead of calling this helper. Having a parameter list such as (true, false, false) or (true, true, true) is confusing since you have to keep looking up the meaning of the parameters.
The operation that the _F macros do is masking and bit shifting the fields correctly. Without that, we'd need to expose several macros to mask and shift, and I'd rather just have one macro to take care of that.
But, we can open code the function to wherever it's used if that's more readable.
I wasn't arguing against masking and shifting, but rather in favour of treating these like normal bit definitions. So instead of passing a boolean parameter to the macro, you just don't use it if the bit isn't supposed to be set. And if you want to set the bit you or in the value.
So:
static inline u32 host1x_channel_dmactrl_dmastop(void)
{
	return 1 << 0;
}

#define HOST1X_CHANNEL_DMACTRL_DMASTOP \
	host1x_channel_dmactrl_dmastop()
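As a rough illustration of the difference between the two styles, the sketch below puts a boolean-parameter helper next to plain bit definitions. Only the DMASTOP position (bit 0) is taken from the example above; the positions of the other two bits and all function names are assumptions made for this sketch.

```c
#include <stdint.h>

/* DMASTOP at bit 0 as in the reply above; bits 1 and 2 are assumed. */
#define HOST1X_CHANNEL_DMACTRL_DMASTOP		(1u << 0)
#define HOST1X_CHANNEL_DMACTRL_DMAGETRST	(1u << 1)
#define HOST1X_CHANNEL_DMACTRL_DMAINITGET	(1u << 2)

/* Boolean-parameter style: the call site reads as (1, 0, 0). */
static uint32_t dmactrl_from_flags(int stop, int get_rst, int init_get)
{
	return ((uint32_t)(stop & 1) << 0) |
	       ((uint32_t)(get_rst & 1) << 1) |
	       ((uint32_t)(init_get & 1) << 2);
}

/* Bit-definition style: only the bits that are set appear at the call site. */
static uint32_t dmactrl_stop(void)
{
	return HOST1X_CHANNEL_DMACTRL_DMASTOP;
}

static uint32_t dmactrl_stop_and_reset_get(void)
{
	return HOST1X_CHANNEL_DMACTRL_DMASTOP |
	       HOST1X_CHANNEL_DMACTRL_DMAGETRST |
	       HOST1X_CHANNEL_DMACTRL_DMAINITGET;
}
```

Both styles produce the same register values; the second one names each bit where the register is written, so the reader does not need to look up the parameter order.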
+/*
- Similar to cdma_start(), but rather than starting from an idle
- state (where DMA GET is set to DMA PUT), on a timeout we restore
- DMA GET from an explicit value (so DMA may again be pending).
- */
+static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr) +{
struct host1x *host1x = cdma_to_host1x(cdma);
struct host1x_channel *ch = cdma_to_channel(cdma);
if (cdma->running)
return;
cdma->last_put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
host1x_ch_writel(ch, host1x_channel_dmactrl(true, false, false),
HOST1X_CHANNEL_DMACTRL);
/* set base, end pointer (all of memory) */
host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMASTART);
host1x_ch_writel(ch, 0xFFFFFFFF, HOST1X_CHANNEL_DMAEND);
According to the TRM, writing to HOST1X_CHANNEL_DMASTART will start a DMA transfer on the channel (if DMA_PUT != DMA_GET). Irrespective of that, why set the valid range to all of physical memory? We know the valid range of the push buffer, why not set the limits accordingly?
That'd make sense. Currently we use the RESTART as the barrier, but having hardware check against DMAEND is a good idea, too.
Any reason why DMASTART shouldn't be used to restrict the range as well?
+/*
- Kick channel DMA into action by writing its PUT offset (if it has changed)
- */
+static void cdma_kick(struct host1x_cdma *cdma) +{
struct host1x *host1x = cdma_to_host1x(cdma);
struct host1x_channel *ch = cdma_to_channel(cdma);
u32 put;
put = host1x->cdma_pb_op.putptr(&cdma->push_buffer);
if (put != cdma->last_put) {
host1x_ch_writel(ch, put, HOST1X_CHANNEL_DMAPUT);
cdma->last_put = put;
}
+}
kick() sounds unusual. Maybe flush or commit or something similar would be more accurate.
We could use flush.
Great.
host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
cdma->torndown = false;
cdma_timeout_restart(cdma, getptr);
+}
I find this a bit non-intuitive. We teardown a channel, and when we're done tearing down, the torndown variable is set to false and the channel is actually restarted. Maybe you could explain some more how this works and what its purpose is.
Actually, teardown_begin freezes the channel, then we manipulate the queue, and in the end teardown_end restarts the channel. So these should be named freeze and resume. We could even drop the timeout from the names of these functions.
Sounds good.
/* stop processing to get a clean snapshot */
prev_cmdproc = host1x_sync_readl(host1x, HOST1X_SYNC_CMDPROC_STOP);
cmdproc_stop = prev_cmdproc | BIT(ch->chid);
host1x_sync_writel(host1x, cmdproc_stop, HOST1X_SYNC_CMDPROC_STOP);
dev_dbg(&host1x->dev->dev, "cdma_timeout: cmdproc was 0x%x is 0x%x\n",
prev_cmdproc, cmdproc_stop);
syncpt_val = host1x_syncpt_load_min(host1x->syncpt);
/* has buffer actually completed? */
if ((s32)(syncpt_val - cdma->timeout.syncpt_val) >= 0) {
dev_dbg(&host1x->dev->dev,
"cdma_timeout: expired, but buffer had completed\n");
Maybe this should really be a warning?
Not really - it's actually just a normal state. We got a timeout event, but before we process it, it might be that the job manages to complete. This can happen, and is not an error case.
Okay, I see. That's fine then.
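The completion check in the hunk above relies on a signed difference so that it keeps working when the 32-bit syncpoint counter wraps. A small standalone sketch of that comparison follows; the function name is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a free-running 32-bit counter has reached the threshold.
 * The unsigned subtraction wraps modulo 2^32, and interpreting the result
 * as signed treats differences of less than 2^31 as "already reached".
 * The conversion to int32_t assumes two's complement, as on Linux targets.
 */
static bool syncpt_reached(uint32_t current, uint32_t threshold)
{
	return (int32_t)(current - threshold) >= 0;
}
```

A plain `current >= threshold` would give the wrong answer for a job whose threshold sits just below the wrap point while the counter has already wrapped to a small value.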
for (i = 0 ; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
u32 op1 = host1x_opcode_gather(g->words);
u32 op2 = g->mem_base + g->offset;
host1x_cdma_push_gather(&job->ch->cdma,
job->gathers[i].ref,
job->gathers[i].offset,
op1, op2);
}
+}
Perhaps inline this into channel_submit()? I'm not sure how useful it really is to split off smallish functions such as this which aren't reused anywhere else. I don't have any major objection though, so you can keep it separate if you want.
I split these out because channel_submit() became so long that I couldn't understand it anymore. I'd prefer keeping them separate just to keep myself (semi-)sane.
Okay. =)
diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c
[...]
#include "hw/host1x01.h"
#include "dev.h"
+#include "channel.h"
#include "hw/host1x01_hardware.h"
+#include "hw/channel_hw.c"
+#include "hw/cdma_hw.c"
#include "hw/syncpt_hw.c"
#include "hw/intr_hw.c"
int host1x01_init(struct host1x *host)
{
host->channel_op = host1x_channel_ops;
host->cdma_op = host1x_cdma_ops;
host->cdma_pb_op = host1x_pushbuffer_ops;
host->syncpt_op = host1x_syncpt_ops;
host->intr_op = host1x_intr_ops;
I think I mentioned this before, but I'd prefer not to have the .c files included here, but rather reference the ops structures externally. But I still think that especially CDMA and push buffer ops don't need to be in separate structures since they aren't likely to change with new hardware revisions.
The C files need to be included here so that they pick up the hardware defs for the correct SoC. Pushbuffer is probably something we can generalize, but channel registers can change, so they need to be per SoC.
We can do the same using extern variables, can't we? If you're concerned about the definitions that come from the headers, we can probably make that work by parameterizing more.
I think we can live with this way for now and clean it up later, though.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
[...]
+#define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \
host1x_channel_dmactrl_dmastop_f(v)
I mentioned this elsewhere already, but I think the _F suffix (and _f for that matter) along with the v parameter should go away.
I'd prefer keeping it so that I don't have to use two #defines to replace one. That IMO makes the usage harder and more error prone.
That's precisely my point. This actually makes it harder to use. If you don't want to set the bit, just don't or it in. It's completely pointless to shift and mask an unset bit.
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_uclass.h b/drivers/gpu/host1x/hw/hw_host1x01_uclass.h
[...]
What does the "uclass" stand for? It seems a bit useless to me.
It means host1x class, i.e. the host1x registers that can be written to from push buffers.
I still don't understand why we need uclass. It seems redundant.
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
[...]
+static void action_submit_complete(struct host1x_waitlist *waiter) +{
struct host1x_channel *channel = waiter->data;
int nr_completed = waiter->count;
No need for this variable.
I'm using it for tracing in a follow-up patch. It can be used in traces for checking the queue length at each point of time.
Any reason why it can't be introduced in the follow-up patch?
+struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
u32 num_cmdbufs, u32 num_relocs, u32 num_waitchks)
Maybe make the parameters unsigned int instead of u32?
I'll check this, but we're getting them from user space, and that API has a fixed length field. That's why I'm carrying that type over.
Okay, it isn't that important.
+void host1x_job_add_gather(struct host1x_job *job,
u32 mem_id, u32 words, u32 offset)
+{
struct host1x_job_gather *cur_gather =
&job->gathers[job->num_gathers];
Should this check for overflow?
As defensive measure, could do, but this is not exploitable.
Alright then.
/* remove completed reloc from the job */
if (i != job->num_relocs - 1) {
struct host1x_reloc *reloc_last =
&job->relocarray[job->num_relocs - 1];
reloc->cmdbuf_mem = reloc_last->cmdbuf_mem;
reloc->cmdbuf_offset = reloc_last->cmdbuf_offset;
reloc->target = reloc_last->target;
reloc->target_offset = reloc_last->target_offset;
reloc->shift = reloc_last->shift;
job->reloc_addr_phys[i] =
job->reloc_addr_phys[job->num_relocs - 1];
job->num_relocs--;
} else {
break;
}
}
if (cmdbuf_page_addr)
host1x_memmgr_kunmap(h, last_page, cmdbuf_page_addr);
return 0;
+}
Also the algorithm seems a bit strange and hard to follow. Instead of removing relocs from the job, replacing them with the last entry and decrementing job->num_relocs, how much is the penalty for always iterating over all relocs? This is one of the other cases where I'd argue that simplicity is key. Furthermore you need to copy quite a bit of data to replace the completed relocs, so I'm not sure it buys you much.
It could always be optimized later on by just setting a bit in the reloc to mark it as completed, or keep a bitmask of completed relocations or whatever.
This was done in a big optimization patch, but we'll check if we could remove this. Previously we just set cmdbuf_mem for the completed reloc to 0, and that should work in this case.
That certainly sounds simpler.
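A minimal sketch of the simpler scheme discussed here, where a completed reloc is marked by clearing its cmdbuf_mem field instead of being swapped with the last entry. The struct is a cut-down stand-in for struct host1x_reloc, and the sketch assumes that 0 is never a valid handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for a relocation entry; 0 in cmdbuf_mem means "done". */
struct reloc {
	uint32_t cmdbuf_mem;
	uint32_t cmdbuf_offset;
	uint32_t target;
};

/*
 * Process every pending reloc that belongs to the given command buffer
 * and mark it completed. Returns the number of relocs handled.
 * cmdbuf must be non-zero, since 0 is the "completed" marker.
 */
static unsigned int patch_relocs(struct reloc *relocs, size_t num,
				 uint32_t cmdbuf)
{
	unsigned int patched = 0;
	size_t i;

	for (i = 0; i < num; i++) {
		if (relocs[i].cmdbuf_mem != cmdbuf)
			continue;

		/* Real code would patch the command buffer word here. */
		relocs[i].cmdbuf_mem = 0;
		patched++;
	}

	return patched;
}
```

This always walks the full array, which costs a few extra comparisons per command buffer but avoids copying entries and changing num_relocs during iteration.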
int err = PTR_ERR(job->gather_copy_mapped);
job->gather_copy_mapped = NULL;
return err;
}
job->gather_copy_size = size;
for (i = 0; i < job->num_gathers; i++) {
struct host1x_job_gather *g = &job->gathers[i];
void *gather = host1x_memmgr_mmap(g->ref);
memcpy(job->gather_copy_mapped + offset,
gather + g->offset,
g->words * sizeof(u32));
g->mem_base = job->gather_copy;
g->offset = offset;
g->mem_id = 0;
g->ref = 0;
host1x_memmgr_munmap(g->ref, gather);
offset += g->words * sizeof(u32);
}
return 0;
+}
I wonder, where's this DMA buffer actually used? I can't find any use between this copy and the corresponding dma_free_writecombine() call.
We replace the gathers in host1x_job with the ones we allocate here, so they are used when pushing the gathers to hardware.
This is done so that user space cannot tamper with the gathers once they've been checked by firewall.
Oh, I had missed how g->mem_base is assigned job->gather_copy, so I had thought the memory wasn't used anywhere. I wonder if it wouldn't be more efficient to pre-allocate this buffer. The number of gathers is limited by HOST1X_GATHER_QUEUE_SIZE, right? So we could allocate a buffer of the appropriate size for each job to avoid continuously reallocating and freeing every time the job is pinned or unpinned.
Also jobs are allocated for each submit and allocating them is quite expensive, so eventually we may want to pool them. Which will not be trivial though, given that it requires the number of command buffers and relocs to match. Some clever checks can probably make this work, though.
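The copy step under discussion can be sketched in plain C as follows: every gather is copied into one kernel-owned buffer and then redirected to the copy, so later changes to the original memory no longer affect what is submitted. The types and names are simplified assumptions, not the driver's structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified gather descriptor; field names are hypothetical. */
struct gather {
	const uint32_t *src;	/* words to submit */
	size_t words;		/* number of 32-bit words */
	size_t offset;		/* byte offset inside the copy buffer */
};

/*
 * Copy all gathers back to back into "copy" and point each gather at its
 * private copy. Returns 0 and stores the bytes used in *used on success,
 * or -1 if the buffer is too small.
 */
static int copy_gathers(struct gather *g, size_t num, void *copy,
			size_t copy_size, size_t *used)
{
	size_t offset = 0, i;

	for (i = 0; i < num; i++) {
		size_t bytes = g[i].words * sizeof(uint32_t);

		if (bytes > copy_size - offset)
			return -1;

		memcpy((char *)copy + offset, g[i].src, bytes);
		g[i].offset = offset;
		g[i].src = (const uint32_t *)((char *)copy + offset);
		offset += bytes;
	}

	*used = offset;
	return 0;
}
```

With a pre-allocated buffer of a fixed maximum size, as suggested above, the caller would pass the same copy buffer for every pin instead of allocating a new one each time.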
/* Null kickoff prevents submit from being sent to hardware */
bool null_kickoff;
I don't think this is used anywhere.
True, we can remove this as we haven't posted the code for null kickoff.
Make sure to explain what this is used for when you post. The one comment above is a bit vague.
/* Check if register is marked as an address reg */
int (*is_addr_reg)(struct platform_device *dev, u32 reg, u32 class);
is_addr_reg() sounds a bit unusual. Maybe match this to the name of the main firewall routine, validate()?
The point of this op is to just tell if a register for a class is pointing to a buffer. validate then uses this information. But both answers (yes/no) and both types of registers are still valid, so validate() wouldn't be the proper name.
validation is then done by checking that there's a reloc corresponding to each register write to a register that can hold an address.
I just remembered that we discussed this already and I think we agreed that a table lookup might be a better implementation. That'd get rid of the naming issue altogether, since you can just name the table something like address_registers, which is quite unambiguous.
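A table-based lookup along these lines could look like the sketch below. The register offsets are placeholders invented for illustration; the real list would have to come from the hardware documentation for each class.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder offsets of registers that hold buffer addresses. */
static const uint32_t gr2d_address_registers[] = { 0x1a, 0x1b, 0x26, 0x2b };

/* Return true if reg appears in the table of address registers. */
static bool is_address_register(const uint32_t *table, size_t count,
				uint32_t reg)
{
	size_t i;

	for (i = 0; i < count; i++)
		if (table[i] == reg)
			return true;

	return false;
}
```

The firewall would then require a matching reloc for every write to a register found in the table, and each client would only need to supply its own table rather than a callback.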
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
[...]
+struct mem_handle; +struct platform_device;
+struct host1x_job_unpin_data {
struct mem_handle *h;
struct sg_table *mem;
+};
+enum mem_mgr_flag {
mem_mgr_flag_uncacheable = 0,
mem_mgr_flag_write_combine = 1,
+};
I'd like to see this use a more object-oriented approach and more common terminology. All of these handles are essentially buffer objects, so maybe something like host1x_bo would be a nice and short name.
To make this more object-oriented, I propose something like:
struct host1x_bo_ops {
	int (*alloc)(struct host1x_bo *bo, size_t size, unsigned long align,
		     unsigned long flags);
	int (*free)(struct host1x_bo *bo);
	...
};

struct host1x_bo {
	const struct host1x_bo_ops *ops;
};

struct host1x_cma_bo {
	struct host1x_bo base;
	struct drm_gem_cma_object *obj;
};

static inline struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
{
	return container_of(bo, struct host1x_cma_bo, base);
}

static inline int host1x_bo_alloc(struct host1x_bo *bo, size_t size,
				  unsigned long align, unsigned long flags)
{
	return bo->ops->alloc(bo, size, align, flags);
}

...
That should be easy to extend with a new type of BO once the IOMMU-based allocator is ready. And as I said it is much closer in terminology to what other drivers do.
One complexity is that we're using the same type for communicating with user space. Each buffer carries with it a flag indicating its allocator. We might be able to model the internal structure to be more like what you propose, but for the API we still need the flag.
I disagree. I don't see any need for passing around the type at all. We've discussed this a few times already, and correct me if I'm wrong, but I think we agreed that we don't want to mix handle/buffer types.
We only support CMA for now, so all buffers will be allocated from CMA. Once the IOMMU-based allocator is ready we'll want to switch to that for Tegra30 and later, but stick to CMA for Tegra20 since the GART isn't very usable.
So the way I see it, the decision about which allocator to use is done once at driver probe time. So all that's really needed is a function that allocates a buffer object and returns the proper one for the given Tegra SoC. Once a host1x_bo object is returned it can be used throughout and we get rid of the additional memmgr abstraction. I think it'll make things much simpler.
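A userspace sketch of this idea, assuming a single CMA-like backend: one creation function returns a host1x_bo with its ops already set, and callers never see an allocator type. The pin op, the paddr field and host1x_bo_create() are hypothetical names chosen for the sketch, and container_of() is redefined here because the kernel macro is not available outside the kernel.

```c
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct host1x_bo;

struct host1x_bo_ops {
	int (*pin)(struct host1x_bo *bo, unsigned long *addr);
	void (*free)(struct host1x_bo *bo);
};

struct host1x_bo {
	const struct host1x_bo_ops *ops;
};

/* Hypothetical CMA-backed buffer; paddr stands in for the real object. */
struct host1x_cma_bo {
	struct host1x_bo base;
	unsigned long paddr;
};

static struct host1x_cma_bo *to_host1x_cma_bo(struct host1x_bo *bo)
{
	return container_of(bo, struct host1x_cma_bo, base);
}

static int cma_bo_pin(struct host1x_bo *bo, unsigned long *addr)
{
	*addr = to_host1x_cma_bo(bo)->paddr;
	return 0;
}

static void cma_bo_free(struct host1x_bo *bo)
{
	free(to_host1x_cma_bo(bo));
}

static const struct host1x_bo_ops cma_bo_ops = {
	.pin = cma_bo_pin,
	.free = cma_bo_free,
};

/* The backend is picked here once; only CMA exists in this sketch. */
static struct host1x_bo *host1x_bo_create(unsigned long paddr)
{
	struct host1x_cma_bo *cma = calloc(1, sizeof(*cma));

	if (!cma)
		return NULL;

	cma->base.ops = &cma_bo_ops;
	cma->paddr = paddr;
	return &cma->base;
}

static inline int host1x_bo_pin(struct host1x_bo *bo, unsigned long *addr)
{
	return bo->ops->pin(bo, addr);
}

static inline void host1x_bo_free(struct host1x_bo *bo)
{
	bo->ops->free(bo);
}
```

An IOMMU-backed type would add a second struct embedding host1x_bo and its own ops table, with host1x_bo_create() choosing between the two based on the SoC detected at probe time.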
Thierry
On 26.02.2013 11:48, Terje Bergström wrote:
On 25.02.2013 17:24, Thierry Reding wrote:
You use two different styles to indent the function parameters. You might want to stick to one, preferably aligning them with the first parameter on the first line.
I've generally favored "two tabs" indenting, but we'll anyway standardize on one.
We standardized on the convention used in tegradrm, i.e. aligning with first parameter.
There's nothing in this function that requires a platform_device, so passing struct device should be enough. Or maybe host1x_cdma should get a struct device * field?
I think we'll just start using struct device * in general in code. Arto's been already fixing a lot of these, so he might've already fixed this.
We did a sweep in the code and now I hope everything that can, uses struct device *. The side effect was getting rid of a lot of casting, which is good.
Why don't you use any of the kernel's reference counting mechanisms?
+void host1x_channel_put(struct host1x_channel *ch) +{
mutex_lock(&ch->reflock);
if (ch->refcount == 1) {
host1x_get_host(ch->dev)->cdma_op.stop(&ch->cdma);
host1x_cdma_deinit(&ch->cdma);
}
ch->refcount--;
mutex_unlock(&ch->reflock);
+}
I think you can do all of this using a kref.
I think the original reason was that there's no reason to use atomic kref, as we anyway have to do mutual exclusion via mutex. But, using kref won't be any problem, so we could use that.
Actually, we ended up with a problem with this. kref assumes that once refcount goes to zero, it gets destroyed. In ch->refcount, going to zero is just fine and just indicates that we need to initialize. And, we anyway need to do locking, so we didn't do the conversion to kref.
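The behaviour described here, where the count may drop to zero and rise again without the object being destroyed, can be sketched as below. The struct and function names are simplified stand-ins, and the locking that the driver does with ch->reflock is left to the caller in this sketch.

```c
#include <stdbool.h>

/* Simplified channel; a refcount of 0 means "idle", not "freed". */
struct channel {
	unsigned int refcount;
	bool cdma_running;
};

/* Take a user reference; the first user (re)starts command DMA. */
static void channel_get(struct channel *ch)
{
	if (ch->refcount == 0)
		ch->cdma_running = true;
	ch->refcount++;
}

/* Drop a user reference; the last user stops command DMA. */
static void channel_put(struct channel *ch)
{
	if (ch->refcount == 0)
		return;	/* unbalanced put, nothing to do */

	if (--ch->refcount == 0)
		ch->cdma_running = false;
}
```

A kref does not fit this pattern directly, because kref_get() on an object whose count has already reached zero is treated as a bug, whereas here the channel struct outlives all of its users.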
+struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev) +{
(...)
+}
I think the critical section could be shorter here. It's probably not worth the extra trouble, though, given that channels are not often allocated.
Yeah, boot time isn't measured in microseconds. :-) But, if we just make allocated_channels an atomic, we should be able to drop chlist_mutex altogether and it could simplify the code.
There wasn't much we could have moved outside the critical section, so we didn't touch this area.
Also, is it really necessary to abstract these into an ops structure? I get that newer hardware revisions might require different ops for syncpoint handling because the register layout or number of syncpoints may be different, but the CDMA and push buffer (below) concepts are pretty much a software abstraction, and as such their implementation is unlikely to change with some future hardware revision.
Pushbuffer ops can become generic. There's only one catch - init uses the restart opcode. But the opcode is not going to change, so we can generalize that.
We ended up keeping the init as an operation, but the rest of the push buffer ops became generic.
+/*
- Push two words to the push buffer
- Caller must ensure push buffer is not full
- */
+static void push_buffer_push_to(struct push_buffer *pb,
struct mem_handle *handle,
u32 op1, u32 op2)
+{
u32 cur = pb->cur;
u32 *p = (u32 *)((u32)pb->mapped + cur);
You do all this extra casting to make sure to increment by bytes and not 32-bit words. How about you change pb->cur to contain the word index, so that you don't have to go through hoops each time around.
When we changed DMASTART and DMAEND to actually denote the push buffer area, we noticed that DMAGET and DMAPUT are actually relative to DMASTART and DMAEND. This and the need to access both CPU and device virtual addresses coupled with changing to word indexes didn't actually simplify the code, so we kept using byte indexes.
+/*
- Return the number of two word slots free in the push buffer
- */
+static u32 push_buffer_space(struct push_buffer *pb) +{
return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
+}
Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than PUSH_BUFFER_SIZE, can it?
You're right, this function doesn't need to worry about wrapping.
Arto noticed this, but actually I was wrong - the wrapping is very possible. We just have to remember that if we're processing something at the end of push buffer, cur might be in the end, and fence in the beginning.
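A small worked example of why the mask is needed. The function mirrors the expression from the patch but takes the two offsets as plain arguments; the buffer size of 4096 bytes is an assumption for this sketch, and it must be a power of two for the mask to work.

```c
#include <stdint.h>

#define PUSH_BUFFER_SIZE 4096u	/* assumed size in bytes, power of two */

/*
 * Number of free two-word (8-byte) slots between cur and fence.
 * The unsigned subtraction wraps when fence is numerically below cur,
 * and the mask folds the result back into the buffer size.
 */
static uint32_t push_buffer_space(uint32_t fence, uint32_t cur)
{
	return ((fence - cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
}
```

With cur at byte 4000 and fence at byte 16, fence - cur wraps to a huge unsigned value, and the mask reduces it to 112 bytes, or 14 free slots. Without the mask the function would report far more space than the buffer holds.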
diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
[...]
+struct mem_handle; +struct platform_device;
+struct host1x_job_unpin_data {
struct mem_handle *h;
struct sg_table *mem;
+};
+enum mem_mgr_flag {
mem_mgr_flag_uncacheable = 0,
mem_mgr_flag_write_combine = 1,
+};
I'd like to see this use a more object-oriented approach and more common terminology. All of these handles are essentially buffer objects, so maybe something like host1x_bo would be a nice and short name.
We did this a bit differently, but following pretty much the same principles. We have host1x_mem_handle, which contains an ops pointer. The handle gets encapsulated inside drm_gem_cma_object.
_bo structs usually seem to contain a drm_gem_object, so we thought it better not to reuse that term.
Please check the code and let us know what you think. This pretty much follows what Lucas proposed a while ago, and neatly keeps the DRM-specific parts inside the drm directory.
Other than these, we should have implemented all changes that we agreed to include. If something's missing, it's because there were so many that we just dropped the ball.
Terje
On Fri, Mar 08, 2013 at 06:16:16PM +0200, Terje Bergström wrote:
On 26.02.2013 11:48, Terje Bergström wrote:
On 25.02.2013 17:24, Thierry Reding wrote:
[...]
+struct mem_handle;
+struct platform_device;
+
+struct host1x_job_unpin_data {
+	struct mem_handle *h;
+	struct sg_table *mem;
+};
+
+enum mem_mgr_flag {
+	mem_mgr_flag_uncacheable = 0,
+	mem_mgr_flag_write_combine = 1,
+};
I'd like to see this use a more object-oriented approach and more common terminology. All of these handles are essentially buffer objects, so maybe something like host1x_bo would be a nice and short name.
We did this a bit differently, but following pretty much the same principles. We have host1x_mem_handle, which contains an ops pointer. The handle gets encapsulated inside drm_gem_cma_object.
_bo structs usually seem to contain a drm_gem_object, so we thought it better not to reuse that term.
Please check the code and let us know what you think. This pretty much follows what Lucas proposed a while ago, and neatly keeps the DRM-specific parts inside the drm directory.
A bo is just a buffer object, so I don't see why the name shouldn't be used. The name is in no way specific to DRM or GEM. But the point that I was trying to make was that there is nothing to suggest that we couldn't use drm_gem_object as the underlying scaffold to base all host1x buffer objects on.
Furthermore I don't understand why you've chosen this approach. It is completely different from what other drivers do and therefore makes it more difficult to comprehend. That alone I could live with if there were any advantages to that approach, but as far as I can tell there are none.
Thierry
On 08.03.2013 22:43, Thierry Reding wrote:
A bo is just a buffer object, so I don't see why the name shouldn't be used. The name is in no way specific to DRM or GEM. But the point that I was trying to make was that there is nothing to suggest that we couldn't use drm_gem_object as the underlying scaffold to base all host1x buffer objects on.
Furthermore I don't understand why you've chosen this approach. It is completely different from what other drivers do and therefore makes it more difficult to comprehend. That alone I could live with if there were any advantages to that approach, but as far as I can tell there are none.
I was following the plan we agreed on earlier in email discussion with you and Lucas:
On 29.11.2012 11:09, Lucas Stach wrote:
We should aim for a clean split here. GEM handles are something which is really specific to how DRM works and as such should be constructed by tegradrm. nvhost should really just manage allocations/virtual address space and provide something that is able to back all the GEM handle operations.
nvhost really has no reason at all to even know about GEM handles. If you back a GEM object by a nvhost object you can just peel out the nvhost handles from the GEM wrapper in the tegradrm submit ioctl handler and queue the job to nvhost using its native handles.
This way you would also be able to construct different handles (like GEM obj or V4L2 buffers) from the same backing nvhost object. Note that I'm not sure how useful this would be, but it seems like a reasonable design to me being able to do so.
With this structure, we are already prepared for non-DRM APIs. It's a matter of familiarity of code versus future expansion. The code paths for both are equally simple or complex, so neither has a direct technical advantage in performance.
I know other DRM drivers have opted to hard code GEM dependency throughout the code. Then again, host1x hardware is managing much more than graphics, so we need to think outside the DRM box, too.
Terje
On Mon, Mar 11, 2013 at 08:29:59AM +0200, Terje Bergström wrote:
On 08.03.2013 22:43, Thierry Reding wrote:
A bo is just a buffer object, so I don't see why the name shouldn't be used. The name is in no way specific to DRM or GEM. But the point that I was trying to make was that there is nothing to suggest that we couldn't use drm_gem_object as the underlying scaffold to base all host1x buffer objects on.
Furthermore I don't understand why you've chosen this approach. It is completely different from what other drivers do and therefore makes it more difficult to comprehend. That alone I could live with if there were any advantages to that approach, but as far as I can tell there are none.
I was following the plan we agreed on earlier in email discussion with you and Lucas:
On 29.11.2012 11:09, Lucas Stach wrote:
We should aim for a clean split here. GEM handles are something which is really specific to how DRM works and as such should be constructed by tegradrm. nvhost should really just manage allocations/virtual address space and provide something that is able to back all the GEM handle operations.
nvhost really has no reason at all to even know about GEM handles. If you back a GEM object by a nvhost object you can just peel out the nvhost handles from the GEM wrapper in the tegradrm submit ioctl handler and queue the job to nvhost using its native handles.
This way you would also be able to construct different handles (like GEM obj or V4L2 buffers) from the same backing nvhost object. Note that I'm not sure how useful this would be, but it seems like a reasonable design to me being able to do so.
With this structure, we are already prepared for non-DRM APIs. It's a matter of familiarity of code versus future expansion. The code paths for both are equally simple or complex, so neither has a direct technical advantage in performance.
I know other DRM drivers have opted to hard code GEM dependency throughout the code. Then again, host1x hardware is managing much more than graphics, so we need to think outside the DRM box, too.
This sounds a bit over-engineered at this point in time. DRM is currently the only user. Is anybody working on any non-DRM drivers that would use this?
Even that aside, I don't think host1x_mem_handle is a good choice of name here. The objects are much more than handles. They are in fact buffer objects, which can optionally be attached to a handle. I also think that using a void * to store the handle specific data isn't such a good idea.
So how about the following proposal, which I think might satisfy both of us:
	struct host1x_bo;

	struct host1x_bo_ops {
		struct host1x_bo *(*get)(struct host1x_bo *bo);
		void (*put)(struct host1x_bo *bo);
		dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
		...
	};

	struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
	void host1x_bo_put(struct host1x_bo *bo);
	dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
	...

	struct host1x_bo {
		const struct host1x_bo_ops *ops;
		...
	};

	struct tegra_drm_bo {
		struct host1x_bo base;
		...
	};
That way you can get rid of the host1x_memmgr_create_handle() helper and instead embed host1x_bo into driver-/framework-specific structures with the necessary initialization.
It also allows you to interact directly with the objects instead of having to go through the memmgr API. The memory manager doesn't really exist anymore so keeping the name in the API is only confusing. Your current proposal deals with memory handles directly already so it's really just making the naming more consistent.
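For readers following along, the embedding pattern proposed here can be tried out in user space roughly as follows. This is only a sketch under simplifying assumptions: all sketch_* names are hypothetical, pin() returns a plain address instead of taking an sg_table, and the reference count is not atomic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bo;

/* Operations a backing implementation provides (pin is simplified). */
struct sketch_bo_ops {
	struct sketch_bo *(*get)(struct sketch_bo *bo);
	void (*put)(struct sketch_bo *bo);
	uint64_t (*pin)(struct sketch_bo *bo);
};

/* Generic buffer object: it only knows its ops table. */
struct sketch_bo {
	const struct sketch_bo_ops *ops;
};

/* Generic entry points just dispatch through the ops table. */
static struct sketch_bo *sketch_bo_get(struct sketch_bo *bo)
{
	return bo->ops->get(bo);
}

static void sketch_bo_put(struct sketch_bo *bo)
{
	bo->ops->put(bo);
}

static uint64_t sketch_bo_pin(struct sketch_bo *bo)
{
	return bo->ops->pin(bo);
}

/* Framework-specific object embedding the generic one (cf. tegra_drm_bo). */
struct sketch_drm_bo {
	int refcount;
	uint64_t dma_addr;
	struct sketch_bo base;
};

/* Recover the outer object from the embedded base, like container_of(). */
static struct sketch_drm_bo *to_sketch_drm_bo(struct sketch_bo *bo)
{
	return (struct sketch_drm_bo *)
		((char *)bo - offsetof(struct sketch_drm_bo, base));
}

static struct sketch_bo *drm_bo_get(struct sketch_bo *bo)
{
	to_sketch_drm_bo(bo)->refcount++;
	return bo;
}

static void drm_bo_put(struct sketch_bo *bo)
{
	struct sketch_drm_bo *obj = to_sketch_drm_bo(bo);

	assert(obj->refcount > 0);
	obj->refcount--;
}

static uint64_t drm_bo_pin(struct sketch_bo *bo)
{
	return to_sketch_drm_bo(bo)->dma_addr;
}

static const struct sketch_bo_ops sketch_drm_bo_ops = {
	.get = drm_bo_get,
	.put = drm_bo_put,
	.pin = drm_bo_pin,
};

/* The embedding code initialises the base itself; no create helper. */
static void sketch_drm_bo_init(struct sketch_drm_bo *obj, uint64_t dma_addr)
{
	obj->refcount = 1;
	obj->dma_addr = dma_addr;
	obj->base.ops = &sketch_drm_bo_ops;
}
```

The generic side never sees struct sketch_drm_bo, so another framework could embed the same base in its own structure with a different ops table.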
Thierry
On 11.03.2013 09:18, Thierry Reding wrote:
This sounds a bit over-engineered at this point in time. DRM is currently the only user. Is anybody working on any non-DRM drivers that would use this?
Well, this contains the beginning of that:
http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/vi...
I don't want to give these guys any excuse not to port it over to host1x code base. :-)
Even that aside, I don't think host1x_mem_handle is a good choice of name here. The objects are much more than handles. They are in fact buffer objects, which can optionally be attached to a handle. I also think that using a void * to store the handle specific data isn't such a good idea.
Naming is not an issue for me - we can easily agree on using _bo.
So how about the following proposal, which I think might satisfy both of us:
	struct host1x_bo;

	struct host1x_bo_ops {
		struct host1x_bo *(*get)(struct host1x_bo *bo);
		void (*put)(struct host1x_bo *bo);
		dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
		...
	};

	struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
	void host1x_bo_put(struct host1x_bo *bo);
	dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
	...

	struct host1x_bo {
		const struct host1x_bo_ops *ops;
		...
	};

	struct tegra_drm_bo {
		struct host1x_bo base;
		...
	};
That way you can get rid of the host1x_memmgr_create_handle() helper and instead embed host1x_bo into driver-/framework-specific structures with the necessary initialization.
This would make sense. We'll get back when we have enough of the implementation done to understand it all. One consequence is that we cannot use drm_gem_cma_create() anymore. We'll have to introduce a function that does the same as drm_gem_cma_create(), but takes a pre-allocated drm_gem_cma_object pointer. That way we can allocate the struct ourselves, and use DRM CMA just to initialize the drm_gem_cma_object.
The other way would be to just take a copy of the DRM CMA helper, but I'd like to defer that to the next step, when we implement an IOMMU-aware allocator.
It also allows you to interact directly with the objects instead of having to go through the memmgr API. The memory manager doesn't really exist anymore so keeping the name in the API is only confusing. Your current proposal deals with memory handles directly already so it's really just making the naming more consistent.
The memmgr APIs are currently just a shortcut wrapper to the ops, so in that sense the memmgr does not really exist. I think it might still make sense to keep static inline wrappers for calling the ops within, but we could rename them to host1x_bo_somethingandother. Then it'd follow the pattern we are using for the hw ops in the latest set.
Terje
On Mon, Mar 11, 2013 at 11:21:05AM +0200, Terje Bergström wrote:
On 11.03.2013 09:18, Thierry Reding wrote:
This sounds a bit over-engineered at this point in time. DRM is currently the only user. Is anybody working on any non-DRM drivers that would use this?
Well, this contains the beginning of that:
http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/vi...
I don't want to give these guys any excuse not to port it over to host1x code base. :-)
I was aware of that driver but I didn't realize it had been available publicly. It's great to see this, though, and one more argument in favour of not binding the host1x_bo too tightly to DRM/GEM.
So how about the following proposal, which I think might satisfy both of us:
	struct host1x_bo;

	struct host1x_bo_ops {
		struct host1x_bo *(*get)(struct host1x_bo *bo);
		void (*put)(struct host1x_bo *bo);
		dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
		...
	};

	struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
	void host1x_bo_put(struct host1x_bo *bo);
	dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
	...

	struct host1x_bo {
		const struct host1x_bo_ops *ops;
		...
	};

	struct tegra_drm_bo {
		struct host1x_bo base;
		...
	};
That way you can get rid of the host1x_memmgr_create_handle() helper and instead embed host1x_bo into driver-/framework-specific structures with the necessary initialization.
This would make sense. We'll get back when we have enough of the implementation done to understand it all. One consequence is that we cannot use drm_gem_cma_create() anymore. We'll have to introduce a function that does the same as drm_gem_cma_create(), but takes a pre-allocated drm_gem_cma_object pointer. That way we can allocate the struct ourselves, and use DRM CMA just to initialize the drm_gem_cma_object.
I certainly think that introducing a drm_gem_cma_object_init() function shouldn't pose a problem. If you do, make sure to update the existing drm_gem_cma_create() to use it. Having both lets users have the choice to use drm_gem_cma_create() if they don't need to embed it, or drm_gem_cma_object_init() otherwise.
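The create/init split suggested here follows a common shape, sketched below in plain user-space C. Nothing in this sketch is the real DRM CMA helper API: cma_obj_sketch, its two functions and driver_bo_sketch are invented names, and calloc() merely stands in for the actual backing allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a helper-managed object. */
struct cma_obj_sketch {
	size_t size;
	void *vaddr;
};

/* Initialise a caller-provided object; returns 0 on success, -1 on failure. */
static int cma_obj_sketch_init(struct cma_obj_sketch *obj, size_t size)
{
	assert(obj != NULL);

	obj->vaddr = calloc(1, size);	/* placeholder for the real backing */
	if (!obj->vaddr)
		return -1;
	obj->size = size;
	return 0;
}

/* Allocating variant, implemented on top of the init variant. */
static struct cma_obj_sketch *cma_obj_sketch_create(size_t size)
{
	struct cma_obj_sketch *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	if (cma_obj_sketch_init(obj, size)) {
		free(obj);
		return NULL;
	}
	return obj;
}

/* A driver object that embeds the helper object instead of pointing to it. */
struct driver_bo_sketch {
	int driver_flags;
	struct cma_obj_sketch cma;
};
```

A driver that embeds the object calls the init variant on its own allocation, while existing users keep calling the create variant unchanged.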
The other way would be to just take a copy of the DRM CMA helper, but I'd like to defer that to the next step, when we implement an IOMMU-aware allocator.
I'm not sure I understand what you're saying, but if you add a function as discussed above this shouldn't be necessary.
It also allows you to interact directly with the objects instead of having to go through the memmgr API. The memory manager doesn't really exist anymore so keeping the name in the API is only confusing. Your current proposal deals with memory handles directly already so it's really just making the naming more consistent.
The memmgr APIs are currently just a shortcut wrapper to the ops, so in that sense the memmgr does not really exist. I think it might still make sense to keep static inline wrappers for calling the ops within, but we could rename them to host1x_bo_somethingandother. Then it'd follow the pattern we are using for the hw ops in the latest set.
Yes, that's exactly what I had in mind in the above proposal. They could be inline, but it's probably also okay if they're not. They aren't meant to be used very frequently, so the extra function call shouldn't matter much. It might be easier to add some additional checks if they aren't inlined. I'm fine either way.
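As a sketch of what such "additional checks" in an out-of-line wrapper might look like (hypothetical chk_* names, user-space C, and the choice of -EINVAL is an assumption rather than something agreed in this thread):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct chk_bo;

struct chk_bo_ops {
	int (*pin)(struct chk_bo *bo);
};

struct chk_bo {
	const struct chk_bo_ops *ops;
	int pin_count;
};

/* Out-of-line wrapper: validate arguments before dispatching to the op. */
static int chk_bo_pin(struct chk_bo *bo)
{
	if (!bo || !bo->ops || !bo->ops->pin)
		return -EINVAL;
	return bo->ops->pin(bo);
}

/* Example backend op: counts pins and returns the new count. */
static int chk_pin_impl(struct chk_bo *bo)
{
	assert(bo != NULL);
	return ++bo->pin_count;
}

static const struct chk_bo_ops chk_ops = {
	.pin = chk_pin_impl,
};
```

Keeping the checks in one non-inline function means they are not duplicated at every call site, at the cost of one extra call.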
Thierry
Add support for host1x debugging. Adds debugfs entries, and dumps channel state to UART in case of stuck job.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Makefile                 |   1 +
 drivers/gpu/host1x/cdma.c                   |  34 +++
 drivers/gpu/host1x/debug.c                  | 215 ++++++++++++++
 drivers/gpu/host1x/debug.h                  |  50 ++++
 drivers/gpu/host1x/dev.c                    |   3 +
 drivers/gpu/host1x/dev.h                    |  17 ++
 drivers/gpu/host1x/hw/cdma_hw.c             |   3 +
 drivers/gpu/host1x/hw/debug_hw.c            | 400 +++++++++++++++++++++++++++
 drivers/gpu/host1x/hw/host1x01.c            |   2 +
 drivers/gpu/host1x/hw/hw_host1x01_channel.h |  18 ++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h    | 115 ++++++++
 drivers/gpu/host1x/hw/syncpt_hw.c           |   1 +
 drivers/gpu/host1x/syncpt.c                 |   3 +
 13 files changed, 862 insertions(+)
 create mode 100644 drivers/gpu/host1x/debug.c
 create mode 100644 drivers/gpu/host1x/debug.h
 create mode 100644 drivers/gpu/host1x/hw/debug_hw.c
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index cdd87c8..697d49a 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -7,6 +7,7 @@ host1x-y = \
 	cdma.o \
 	channel.o \
 	job.o \
+	debug.o \
 	memmgr.o \
 	hw/host1x01.o
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index d6a38d2..12dd46c 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 #include "job.h"
 #include <asm/cacheflush.h>
@@ -370,12 +371,42 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 	return 0;
 }
 
+static void trace_write_gather(struct host1x_cdma *cdma,
+		struct mem_handle *ref,
+		u32 offset, u32 words)
+{
+	void *mem = NULL;
+
+	if (host1x_debug_trace_cmdbuf)
+		mem = host1x_memmgr_mmap(ref);
+
+	if (mem) {
+		u32 i;
+		/*
+		 * Write in batches of 128 as there seems to be a limit
+		 * of how much you can output to ftrace at once.
+		 */
+		for (i = 0; i < words; i += TRACE_MAX_LENGTH) {
+			trace_host1x_cdma_push_gather(
+				cdma_to_channel(cdma)->dev->name,
+				(u32)ref,
+				min(words - i, TRACE_MAX_LENGTH),
+				offset + i * sizeof(u32),
+				mem);
+		}
+		host1x_memmgr_munmap(ref, mem);
+	}
+}
+
 /*
  * Push two words into a push buffer slot
  * Blocks as necessary if the push buffer is full.
  */
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 {
+	if (host1x_debug_trace_cmdbuf)
+		trace_host1x_cdma_push(cdma_to_channel(cdma)->dev->name,
+				op1, op2);
 	host1x_cdma_push_gather(cdma, NULL, 0, op1, op2);
 }
 
@@ -391,6 +422,9 @@ void host1x_cdma_push_gather(struct host1x_cdma *cdma,
 	u32 slots_free = cdma->slots_free;
 	struct push_buffer *pb = &cdma->push_buffer;
 
+	if (handle)
+		trace_write_gather(cdma, handle, offset, op1 & 0xffff);
+
 	if (slots_free == 0) {
 		host1x->cdma_op.kick(cdma);
 		slots_free = host1x_cdma_wait_locked(cdma,
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
new file mode 100644
index 0000000..29cbe93
--- /dev/null
+++ b/drivers/gpu/host1x/debug.c
@@ -0,0 +1,215 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling konkers@android.com
+ *
+ * Copyright (C) 2011-2012 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/uaccess.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "channel.h"
+
+static pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+static pid_t host1x_debug_force_timeout_pid;
+static u32 host1x_debug_force_timeout_val;
+static u32 host1x_debug_force_timeout_channel;
+
+void host1x_debug_output(struct output *o, const char *fmt, ...)
+{
+	va_list args;
+	int len;
+
+	va_start(args, fmt);
+	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
+	va_end(args);
+	o->fn(o->ctx, o->buf, len);
+}
+
+static int show_channels(struct host1x_channel *ch, void *data)
+{
+	struct host1x *m = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		m->debug_op.show_channel_fifo(m, ch, o, ch->chid);
+		m->debug_op.show_channel_cdma(m, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_syncpts(struct host1x *m, struct output *o)
+{
+	int i;
+	host1x_debug_output(o, "---- syncpts ----\n");
+	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
+		u32 max = host1x_syncpt_read_max(m->syncpt + i);
+		u32 min = host1x_syncpt_load_min(m->syncpt + i);
+		if (!min && !max)
+			continue;
+		host1x_debug_output(o, "id %d (%s) min %d max %d\n",
+				i, m->syncpt[i].name,
+				min, max);
+	}
+
+	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
+		u32 base_val;
+		base_val = host1x_syncpt_read_wait_base(m->syncpt + i);
+		if (base_val)
+			host1x_debug_output(o, "waitbase id %d val %d\n",
+					i, base_val);
+	}
+
+	host1x_debug_output(o, "\n");
+}
+
+static void show_all(struct host1x *m, struct output *o)
+{
+	m->debug_op.show_mlocks(m, o);
+	show_syncpts(m, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(m, o, show_channels);
+}
+
+#ifdef CONFIG_DEBUG_FS
+static int show_channels_no_fifo(struct host1x_channel *ch, void *data)
+{
+	struct host1x *host1x = host1x_get_host(ch->dev);
+	struct output *o = data;
+
+	mutex_lock(&ch->reflock);
+	if (ch->refcount) {
+		mutex_lock(&ch->cdma.lock);
+		host1x->debug_op.show_channel_cdma(host1x, ch, o, ch->chid);
+		mutex_unlock(&ch->cdma.lock);
+	}
+	mutex_unlock(&ch->reflock);
+
+	return 0;
+}
+
+static void show_all_no_fifo(struct host1x *host1x, struct output *o)
+{
+	host1x->debug_op.show_mlocks(host1x, o);
+	show_syncpts(host1x, o);
+	host1x_debug_output(o, "---- channels ----\n");
+	host1x_channel_for_all(host1x, o, show_channels_no_fifo);
+}
+
+static int host1x_debug_show_all(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_show(struct seq_file *s, void *unused)
+{
+	struct output o = {
+		.fn = write_to_seqfile,
+		.ctx = s
+	};
+	show_all_no_fifo(s->private, &o);
+	return 0;
+}
+
+static int host1x_debug_open_all(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show_all, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_all_fops = {
+	.open = host1x_debug_open_all,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+static int host1x_debug_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, host1x_debug_show, inode->i_private);
+}
+
+static const struct file_operations host1x_debug_fops = {
+	.open = host1x_debug_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = single_release,
+};
+
+void host1x_debug_init(struct host1x *host1x)
+{
+	struct dentry *de = debugfs_create_dir("tegra-host1x", NULL);
+
+	if (!de)
+		return;
+
+	/* Store the created entry */
+	host1x->debugfs = de;
+
+	debugfs_create_file("status", S_IRUGO, de,
+			host1x, &host1x_debug_fops);
+	debugfs_create_file("status_all", S_IRUGO, de,
+			host1x, &host1x_debug_all_fops);
+
+	debugfs_create_u32("null_kickoff_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_null_kickoff_pid);
+	debugfs_create_u32("trace_cmdbuf", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_trace_cmdbuf);
+
+	if (host1x->debug_op.debug_init)
+		host1x->debug_op.debug_init(de);
+
+	debugfs_create_u32("force_timeout_pid", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_pid);
+	debugfs_create_u32("force_timeout_val", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_val);
+	debugfs_create_u32("force_timeout_channel", S_IRUGO|S_IWUSR, de,
+			&host1x_debug_force_timeout_channel);
+}
+
+void host1x_debug_deinit(struct host1x *host1x)
+{
+	debugfs_remove_recursive(host1x->debugfs);
+}
+#else
+void host1x_debug_init(struct host1x *host1x)
+{
+}
+void host1x_debug_deinit(struct host1x *host1x)
+{
+}
+#endif
+
+void host1x_debug_dump(struct host1x *host1x)
+{
+	struct output o = {
+		.fn = write_to_printk
+	};
+	show_all(host1x, &o);
+}
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
new file mode 100644
index 0000000..fd3560b
--- /dev/null
+++ b/drivers/gpu/host1x/debug.h
@@ -0,0 +1,50 @@
+/*
+ * Tegra host1x Debug
+ *
+ * Copyright (c) 2011-2012 NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see http://www.gnu.org/licenses/.
+ */
+#ifndef __NVHOST_DEBUG_H
+#define __NVHOST_DEBUG_H
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+struct host1x;
+
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
+
+static inline void write_to_seqfile(void *ctx, const char *str, size_t len)
+{
+	seq_write((struct seq_file *)ctx, str, len);
+}
+
+static inline void write_to_printk(void *ctx, const char *str, size_t len)
+{
+	pr_info("%s", str);
+}
+
+void __printf(2, 3) host1x_debug_output(struct output *o, const char *fmt, ...);
+
+extern unsigned int host1x_debug_trace_cmdbuf;
+
+void host1x_debug_init(struct host1x *master);
+void host1x_debug_deinit(struct host1x *master);
+void host1x_debug_dump(struct host1x *master);
+
+#endif /*__NVHOST_DEBUG_H */
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 80311ca..5aa7d28 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -26,6 +26,7 @@
 #include "dev.h"
 #include "intr.h"
 #include "channel.h"
+#include "debug.h"
 #include "hw/host1x01.h"
 
 #define CREATE_TRACE_POINTS
@@ -150,6 +151,8 @@ static int host1x_probe(struct platform_device *dev)
 
 	host1x_intr_start(&host->intr, clk_get_rate(host->clk));
 
+	host1x_debug_init(host);
+
 	dev_info(&dev->dev, "initialized\n");
 
 	return 0;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 2fefa78..467a92e 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -33,6 +33,7 @@ struct push_buffer;
 struct dentry;
 struct mem_handle;
 struct platform_device;
+struct output;
 
 struct host1x_channel_ops {
 	int (*init)(struct host1x_channel *,
@@ -71,6 +72,21 @@ struct host1x_pushbuffer_ops {
 	u32 (*putptr)(struct push_buffer *);
 };
 
+struct host1x_debug_ops {
+	void (*debug_init)(struct dentry *de);
+	void (*show_channel_cdma)(struct host1x *,
+			struct host1x_channel *,
+			struct output *,
+			int chid);
+	void (*show_channel_fifo)(struct host1x *,
+			struct host1x_channel *,
+			struct output *,
+			int chid);
+	void (*show_mlocks)(struct host1x *m,
+			struct output *o);
+
+};
+
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -117,6 +133,7 @@ struct host1x {
 	struct host1x_channel_ops channel_op;
 	struct host1x_cdma_ops cdma_op;
 	struct host1x_pushbuffer_ops cdma_pb_op;
+	struct host1x_debug_ops debug_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
 
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 7a44418..2228246 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -22,6 +22,7 @@
 #include "cdma.h"
 #include "channel.h"
 #include "dev.h"
+#include "debug.h"
 #include "memmgr.h"
 
 #include "cdma_hw.h"
@@ -407,6 +408,8 @@ static void cdma_timeout_handler(struct work_struct *work)
 	host1x = cdma_to_host1x(cdma);
 	ch = cdma_to_channel(cdma);
 
+	host1x_debug_dump(cdma_to_host1x(cdma));
+
 	mutex_lock(&cdma->lock);
 
 	if (!cdma->timeout.clientid) {
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
new file mode 100644
index 0000000..0b8d466
--- /dev/null
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -0,0 +1,400 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Erik Gilling konkers@android.com
+ *
+ * Copyright (C) 2011 NVIDIA Corporation
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+
+#include <linux/io.h>
+
+#include "dev.h"
+#include "debug.h"
+#include "cdma.h"
+#include "channel.h"
+#include "memmgr.h"
+
+#define NVHOST_DEBUG_MAX_PAGE_OFFSET 102400
+
+enum {
+	NVHOST_DBG_STATE_CMD = 0,
+	NVHOST_DBG_STATE_DATA = 1,
+	NVHOST_DBG_STATE_GATHER = 2
+};
+
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
+		mask = val & 0x3f;
+		if (mask) {
+			host1x_debug_output(o,
+				"SETCL(class=%03x, offset=%03x, mask=%02x, [",
+				val >> 6 & 0x3ff, val >> 16 & 0xfff, mask);
+			*count = hweight8(mask);
+			return NVHOST_DBG_STATE_DATA;
+		} else {
+			host1x_debug_output(o, "SETCL(class=%03x)\n",
+				val >> 6 & 0x3ff);
+			return NVHOST_DBG_STATE_CMD;
+		}
+
+	case 0x1:
+		host1x_debug_output(o, "INCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x2:
+		host1x_debug_output(o, "NONINCR(offset=%03x, [",
+			val >> 16 & 0xfff);
+		*count = val & 0xffff;
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x3:
+		mask = val & 0xffff;
+		host1x_debug_output(o, "MASK(offset=%03x, mask=%03x, [",
+			val >> 16 & 0xfff, mask);
+		*count = hweight16(mask);
+		return NVHOST_DBG_STATE_DATA;
+
+	case 0x4:
+		host1x_debug_output(o, "IMM(offset=%03x, data=%03x)\n",
+			val >> 16 & 0xfff, val & 0xffff);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x5:
+		host1x_debug_output(o, "RESTART(offset=%08x)\n", val << 4);
+		return NVHOST_DBG_STATE_CMD;
+
+	case 0x6:
+		host1x_debug_output(o,
+			"GATHER(offset=%03x, insert=%d, type=%d, count=%04x, addr=[",
+			val >> 16 & 0xfff, val >> 15 & 0x1, val >> 14 & 0x1,
+			val & 0x3fff);
+		*count = val & 0x3fff; /* TODO: insert */
+		return NVHOST_DBG_STATE_GATHER;
+
+	case 0xe:
+		subop = val >> 24 & 0xf;
+		if (subop == 0)
+			host1x_debug_output(o, "ACQUIRE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else if (subop == 1)
+			host1x_debug_output(o, "RELEASE_MLOCK(index=%d)\n",
+				val & 0xff);
+		else
+			host1x_debug_output(o, "EXTEND_UNKNOWN(%08x)\n", val);
+		return NVHOST_DBG_STATE_CMD;
+
+	default:
+		return NVHOST_DBG_STATE_CMD;
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr, u32 words, struct host1x_cdma *cdma);
+
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
+
+	switch (*state) {
+	case NVHOST_DBG_STATE_CMD:
+		if (addr)
+			host1x_debug_output(o, "%08x: %08x:", addr, val);
+		else
+			host1x_debug_output(o, "%08x:", val);
+
+		*state = show_channel_command(o, addr, val, count);
+		dont_print = 0;
+		start_count = *count;
+		if (*state == NVHOST_DBG_STATE_DATA && *count == 0) {
+			*state = NVHOST_DBG_STATE_CMD;
+			host1x_debug_output(o, "])\n");
+		}
+		break;
+
+	case NVHOST_DBG_STATE_DATA:
+		(*count)--;
+		if (start_count - *count < 64)
+			host1x_debug_output(o, "%08x%s",
+				val, *count > 0 ? ", " : "])\n");
+		else if (!dont_print && (*count > 0)) {
+			host1x_debug_output(o, "[truncated; %d more words]\n",
+				*count);
+			dont_print = 1;
+		}
+		if (*count == 0)
+			*state = NVHOST_DBG_STATE_CMD;
+		break;
+
+	case NVHOST_DBG_STATE_GATHER:
+		*state = NVHOST_DBG_STATE_CMD;
+		host1x_debug_output(o, "%08x]):\n", val);
+		if (cdma) {
+			show_channel_gather(o, addr, val,
+					*count, cdma);
+		}
+		break;
+	}
+}
+
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we're given different hardware address to the same
+	 * page - in these cases the offset will get an invalid number and
+	 * we just have to bail out.
+	 */
+	if (offset > NVHOST_DEBUG_MAX_PAGE_OFFSET) {
+		host1x_debug_output(o, "[address mismatch]\n");
+	} else {
+		/* GATHER buffer starts always with commands */
+		state = NVHOST_DBG_STATE_CMD;
+		for (i = 0; i < words; i++)
+			show_channel_word(o, &state, &count,
+					phys_addr + i * 4,
+					*(map_addr + offset/4 + i),
+					cdma);
+	}
+}
+
+static void show_channel_gather(struct output *o, u32 addr,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	struct push_buffer *pb = &cdma->push_buffer;
+	u32 cur = addr - pb->phys;
+	struct mem_handle *mem = pb->handle[cur/8];
+	u32 *map_addr, offset;
+	struct sg_table *sgt;
+
+	if (!mem) {
+		host1x_debug_output(o, "[already deallocated]\n");
+		return;
+	}
+
+	map_addr = host1x_memmgr_mmap(mem);
+	if (!map_addr) {
+		host1x_debug_output(o, "[could not mmap]\n");
+		return;
+	}
+
+	/* Get base address from mem */
+	sgt = host1x_memmgr_pin(mem);
+	if (IS_ERR(sgt)) {
+		host1x_debug_output(o, "[couldn't pin]\n");
+		host1x_memmgr_munmap(mem, map_addr);
+		return;
+	}
+
+	offset = phys_addr - sg_dma_address(sgt->sgl);
+	do_show_channel_gather(o, phys_addr, words, cdma,
+			sg_dma_address(sgt->sgl), map_addr);
+	host1x_memmgr_unpin(mem, sgt);
+	host1x_memmgr_munmap(mem, map_addr);
+}
+
+static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
+{
+	struct host1x_job *job;
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		int i;
+		host1x_debug_output(o,
+				"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+				" first_get=%08x, timeout=%d"
+				" num_slots=%d, num_handles=%d\n",
+				job,
+				job->syncpt_id,
+				job->syncpt_end,
+				job->first_get,
+				job->timeout,
+				job->num_slots,
+				job->num_unpins);
+
+		for (i = 0; i < job->num_gathers; i++) {
+			struct host1x_job_gather *g = &job->gathers[i];
+			u32 *mapped = host1x_memmgr_mmap(g->ref);
+			if (!mapped) {
+				host1x_debug_output(o, "[could not mmap]\n");
+				continue;
+			}
+
+			host1x_debug_output(o,
+				" GATHER at %08x+%04x, %d words\n",
+				g->mem_base, g->offset, g->words);
+
+			do_show_channel_gather(o, g->mem_base + g->offset,
+					g->words, cdma, g->mem_base, mapped);
+			host1x_memmgr_munmap(g->ref, mapped);
+		}
+	}
+}
+
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+		struct host1x_channel *ch, struct output *o, int chid)
+{
+	struct host1x_channel *channel = ch;
+	struct host1x_cdma *cdma = &channel->cdma;
+	u32 dmaput, dmaget, dmactrl;
+	u32 cbstat, cbread;
+	u32 val, base, baseval;
+
+	dmaput = host1x_ch_readl(channel, HOST1X_CHANNEL_DMAPUT);
+	dmaget = host1x_ch_readl(channel, HOST1X_CHANNEL_DMAGET);
+	dmactrl = host1x_ch_readl(channel, HOST1X_CHANNEL_DMACTRL);
+	cbread = host1x_sync_readl(m, HOST1X_SYNC_CBREAD0 + 4 * chid);
+	cbstat = host1x_sync_readl(m, HOST1X_SYNC_CBSTAT_0 + 4 * chid);
+
+	host1x_debug_output(o, "%d-%s: ", chid,
+			channel->dev->name);
+
+	if (HOST1X_CHANNEL_DMACTRL_DMASTOP_V(dmactrl)
+		|| !channel->cdma.push_buffer.mapped) {
+		host1x_debug_output(o, "inactive\n\n");
+		return;
+	}
+
+	switch (cbstat) {
+	case 0x00010008:
+		host1x_debug_output(o, "waiting on syncpt %d val %d\n",
+			cbread >> 24, cbread
& 0xffffff); + break; + + case 0x00010009: + base = (cbread >> 16) & 0xff; + baseval = host1x_sync_readl(m, + HOST1X_SYNC_SYNCPT_BASE_0 + 4 * base); + val = cbread & 0xffff; + host1x_debug_output(o, "waiting on syncpt %d val %d " + "(base %d = %d; offset = %d)\n", + cbread >> 24, baseval + val, + base, baseval, val); + break; + + default: + host1x_debug_output(o, + "active class %02x, offset %04x, val %08x\n", + HOST1X_SYNC_CBSTAT_0_CBCLASS0_V(cbstat), + HOST1X_SYNC_CBSTAT_0_CBOFFSET0_V(cbstat), + cbread); + break; + } + + host1x_debug_output(o, "DMAPUT %08x, DMAGET %08x, DMACTL %08x\n", + dmaput, dmaget, dmactrl); + host1x_debug_output(o, "CBREAD %08x, CBSTAT %08x\n", cbread, cbstat); + + show_channel_gathers(o, cdma); + host1x_debug_output(o, "\n"); +} + +static void host1x_debug_show_channel_fifo(struct host1x *m, + struct host1x_channel *ch, struct output *o, int chid) +{ + u32 val, rd_ptr, wr_ptr, start, end; + struct host1x_channel *channel = ch; + int state, count; + + host1x_debug_output(o, "%d: fifo:\n", chid); + + val = host1x_ch_readl(channel, HOST1X_CHANNEL_FIFOSTAT); + host1x_debug_output(o, "FIFOSTAT %08x\n", val); + if (HOST1X_CHANNEL_FIFOSTAT_CFEMPTY_V(val)) { + host1x_debug_output(o, "[empty]\n"); + return; + } + + host1x_sync_writel(m, 0x0, HOST1X_SYNC_CFPEEK_CTRL); + host1x_sync_writel(m, HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(1) + | HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_CHANNR_F(chid), + HOST1X_SYNC_CFPEEK_CTRL); + + val = host1x_sync_readl(m, HOST1X_SYNC_CFPEEK_PTRS); + rd_ptr = HOST1X_SYNC_CFPEEK_PTRS_CF_RD_PTR_V(val); + wr_ptr = HOST1X_SYNC_CFPEEK_PTRS_CF_WR_PTR_V(val); + + val = host1x_sync_readl(m, HOST1X_SYNC_CF0_SETUP + 4 * chid); + start = HOST1X_SYNC_CF0_SETUP_CF0_BASE_V(val); + end = HOST1X_SYNC_CF0_SETUP_CF0_LIMIT_V(val); + + state = NVHOST_DBG_STATE_CMD; + + do { + host1x_sync_writel(m, 0x0, HOST1X_SYNC_CFPEEK_CTRL); + host1x_sync_writel(m, HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(1) + | HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_CHANNR_F(chid) + | 
HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ADDR_F(rd_ptr), + HOST1X_SYNC_CFPEEK_CTRL); + val = host1x_sync_readl(m, HOST1X_SYNC_CFPEEK_READ); + + show_channel_word(o, &state, &count, 0, val, NULL); + + if (rd_ptr == end) + rd_ptr = start; + else + rd_ptr++; + } while (rd_ptr != wr_ptr); + + if (state == NVHOST_DBG_STATE_DATA) + host1x_debug_output(o, ", ...])\n"); + host1x_debug_output(o, "\n"); + + host1x_sync_writel(m, 0x0, HOST1X_SYNC_CFPEEK_CTRL); +} + +static void host1x_debug_show_mlocks(struct host1x *m, struct output *o) +{ + int i; + + host1x_debug_output(o, "---- mlocks ----\n"); + for (i = 0; i < host1x_syncpt_nb_mlocks(m); i++) { + u32 owner = host1x_sync_readl(m, + HOST1X_SYNC_MLOCK_OWNER_0 + i); + if (HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CH_OWNS_0_V(owner)) + host1x_debug_output(o, "%d: locked by channel %d\n", + i, + HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_OWNER_CHID_0_F( + owner)); + else if (HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CPU_OWNS_0_V(owner)) + host1x_debug_output(o, "%d: locked by cpu\n", i); + else + host1x_debug_output(o, "%d: unlocked\n", i); + } + host1x_debug_output(o, "\n"); +} + +static const struct host1x_debug_ops host1x_debug_ops = { + .show_channel_cdma = host1x_debug_show_channel_cdma, + .show_channel_fifo = host1x_debug_show_channel_fifo, + .show_mlocks = host1x_debug_show_mlocks, +}; diff --git a/drivers/gpu/host1x/hw/host1x01.c b/drivers/gpu/host1x/hw/host1x01.c index 7569a1e..1bc1552 100644 --- a/drivers/gpu/host1x/hw/host1x01.c +++ b/drivers/gpu/host1x/hw/host1x01.c @@ -28,6 +28,7 @@
 #include "hw/channel_hw.c"
 #include "hw/cdma_hw.c"
+#include "hw/debug_hw.c"
 #include "hw/syncpt_hw.c"
 #include "hw/intr_hw.c"

@@ -36,6 +37,7 @@ int host1x01_init(struct host1x *host)
 	host->channel_op = host1x_channel_ops;
 	host->cdma_op = host1x_cdma_ops;
 	host->cdma_pb_op = host1x_pushbuffer_ops;
+	host->debug_op = host1x_debug_ops;
 	host->syncpt_op = host1x_syncpt_ops;
 	host->intr_op = host1x_intr_ops;

diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
index dad4fee..79bcd5a 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -51,6 +51,18 @@
 #ifndef __hw_host1x_channel_host1x_h__
 #define __hw_host1x_channel_host1x_h__
+static inline u32 host1x_channel_fifostat_r(void) +{ + return 0x0; +} +#define HOST1X_CHANNEL_FIFOSTAT \ + host1x_channel_fifostat_r() +static inline u32 host1x_channel_fifostat_cfempty_v(u32 r) +{ + return (r >> 10) & 0x1; +} +#define HOST1X_CHANNEL_FIFOSTAT_CFEMPTY_V(r) \ + host1x_channel_fifostat_cfempty_v(r) static inline u32 host1x_channel_dmastart_r(void) { return 0x14; @@ -87,6 +99,12 @@ static inline u32 host1x_channel_dmactrl_dmastop_f(u32 v) } #define HOST1X_CHANNEL_DMACTRL_DMASTOP_F(v) \ host1x_channel_dmactrl_dmastop_f(v) +static inline u32 host1x_channel_dmactrl_dmastop_v(u32 r) +{ + return (r >> 0) & 0x1; +} +#define HOST1X_CHANNEL_DMACTRL_DMASTOP_V(r) \ + host1x_channel_dmactrl_dmastop_v(r) static inline u32 host1x_channel_dmactrl_dmagetrst_f(u32 v) { return (v & 0x1) << 1; diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h index 3073d37..22daa3f 100644 --- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h +++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h @@ -69,6 +69,24 @@ static inline u32 host1x_sync_syncpt_thresh_int_enable_cpu0_r(void) } #define HOST1X_SYNC_SYNCPT_THRESH_INT_ENABLE_CPU0 \ host1x_sync_syncpt_thresh_int_enable_cpu0_r() +static inline u32 host1x_sync_cf0_setup_r(void) +{ + return 0x80; +} +#define HOST1X_SYNC_CF0_SETUP \ + host1x_sync_cf0_setup_r() +static inline u32 host1x_sync_cf0_setup_cf0_base_v(u32 r) +{ + return (r >> 0) & 0x1ff; +} +#define HOST1X_SYNC_CF0_SETUP_CF0_BASE_V(r) \ + host1x_sync_cf0_setup_cf0_base_v(r) +static inline u32 host1x_sync_cf0_setup_cf0_limit_v(u32 r) +{ + return (r >> 16) & 0x1ff; +} +#define HOST1X_SYNC_CF0_SETUP_CF0_LIMIT_V(r) \ + host1x_sync_cf0_setup_cf0_limit_v(r) static inline u32 host1x_sync_cmdproc_stop_r(void) { return 0xac; @@ -99,6 +117,30 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void) } #define HOST1X_SYNC_IP_BUSY_TIMEOUT \ host1x_sync_ip_busy_timeout_r() +static inline u32 host1x_sync_mlock_owner_0_r(void) +{ + return 0x340; +} +#define 
HOST1X_SYNC_MLOCK_OWNER_0 \ + host1x_sync_mlock_owner_0_r() +static inline u32 host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(u32 v) +{ + return (v & 0xf) << 8; +} +#define HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_OWNER_CHID_0_F(v) \ + host1x_sync_mlock_owner_0_mlock_owner_chid_0_f(v) +static inline u32 host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(u32 r) +{ + return (r >> 1) & 0x1; +} +#define HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CPU_OWNS_0_V(r) \ + host1x_sync_mlock_owner_0_mlock_cpu_owns_0_v(r) +static inline u32 host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(u32 r) +{ + return (r >> 0) & 0x1; +} +#define HOST1X_SYNC_MLOCK_OWNER_0_MLOCK_CH_OWNS_0_V(r) \ + host1x_sync_mlock_owner_0_mlock_ch_owns_0_v(r) static inline u32 host1x_sync_syncpt_0_r(void) { return 0x400; @@ -123,4 +165,77 @@ static inline u32 host1x_sync_syncpt_cpu_incr_r(void) } #define HOST1X_SYNC_SYNCPT_CPU_INCR \ host1x_sync_syncpt_cpu_incr_r() +static inline u32 host1x_sync_cbread0_r(void) +{ + return 0x720; +} +#define HOST1X_SYNC_CBREAD0 \ + host1x_sync_cbread0_r() +static inline u32 host1x_sync_cfpeek_ctrl_r(void) +{ + return 0x74c; +} +#define HOST1X_SYNC_CFPEEK_CTRL \ + host1x_sync_cfpeek_ctrl_r() +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_addr_f(u32 v) +{ + return (v & 0x1ff) << 0; +} +#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ADDR_F(v) \ + host1x_sync_cfpeek_ctrl_cfpeek_addr_f(v) +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_channr_f(u32 v) +{ + return (v & 0x7) << 16; +} +#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_CHANNR_F(v) \ + host1x_sync_cfpeek_ctrl_cfpeek_channr_f(v) +static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v) +{ + return (v & 0x1) << 31; +} +#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \ + host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v) +static inline u32 host1x_sync_cfpeek_read_r(void) +{ + return 0x750; +} +#define HOST1X_SYNC_CFPEEK_READ \ + host1x_sync_cfpeek_read_r() +static inline u32 host1x_sync_cfpeek_ptrs_r(void) +{ + return 0x754; +} +#define HOST1X_SYNC_CFPEEK_PTRS \ + 
host1x_sync_cfpeek_ptrs_r() +static inline u32 host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(u32 r) +{ + return (r >> 0) & 0x1ff; +} +#define HOST1X_SYNC_CFPEEK_PTRS_CF_RD_PTR_V(r) \ + host1x_sync_cfpeek_ptrs_cf_rd_ptr_v(r) +static inline u32 host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(u32 r) +{ + return (r >> 16) & 0x1ff; +} +#define HOST1X_SYNC_CFPEEK_PTRS_CF_WR_PTR_V(r) \ + host1x_sync_cfpeek_ptrs_cf_wr_ptr_v(r) +static inline u32 host1x_sync_cbstat_0_r(void) +{ + return 0x758; +} +#define HOST1X_SYNC_CBSTAT_0 \ + host1x_sync_cbstat_0_r() +static inline u32 host1x_sync_cbstat_0_cboffset0_v(u32 r) +{ + return (r >> 0) & 0xffff; +} +#define HOST1X_SYNC_CBSTAT_0_CBOFFSET0_V(r) \ + host1x_sync_cbstat_0_cboffset0_v(r) +static inline u32 host1x_sync_cbstat_0_cbclass0_v(u32 r) +{ + return (r >> 16) & 0x3ff; +} +#define HOST1X_SYNC_CBSTAT_0_CBCLASS0_V(r) \ + host1x_sync_cbstat_0_cbclass0_v(r) + #endif /* __hw_host1x01_sync_h__ */ diff --git a/drivers/gpu/host1x/hw/syncpt_hw.c b/drivers/gpu/host1x/hw/syncpt_hw.c index ba48cee..c64c3b0 100644 --- a/drivers/gpu/host1x/hw/syncpt_hw.c +++ b/drivers/gpu/host1x/hw/syncpt_hw.c @@ -90,6 +90,7 @@ static void syncpt_cpu_incr(struct host1x_syncpt *sp) dev_err(&dev->dev->dev, "Trying to increment syncpoint id %d beyond max\n", sp->id); + host1x_debug_dump(sp->dev); return; } host1x_sync_writel(dev, BIT_MASK(sp->id), diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c index f21c688..191f65f 100644 --- a/drivers/gpu/host1x/syncpt.c +++ b/drivers/gpu/host1x/syncpt.c @@ -23,6 +23,7 @@ #include "syncpt.h" #include "dev.h" #include "intr.h" +#include "debug.h" #include <trace/events/host1x.h>
 #define MAX_SYNCPT_LENGTH	5
@@ -211,6 +212,8 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp,
 					current->comm, sp->id, sp->name,
 					thresh, timeout);
 				sp->dev->syncpt_op.debug(sp);
+				if (check_count == MAX_STUCK_CHECK_COUNT)
+					host1x_debug_dump(sp->dev);
 				check_count++;
 			}
 		}
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
[...]
+static pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+static pid_t host1x_debug_force_timeout_pid;
+static u32 host1x_debug_force_timeout_val;
+static u32 host1x_debug_force_timeout_channel;
Please group static and non-static variables.
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[...]
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this.
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -117,6 +133,7 @@ struct host1x {
 	struct host1x_channel_ops channel_op;
 	struct host1x_cdma_ops cdma_op;
 	struct host1x_pushbuffer_ops cdma_pb_op;
+	struct host1x_debug_ops debug_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
Again, better to pass in a const pointer to the ops structure.
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them.
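A minimal sketch of what such symbolic names could look like. The enum and helper names are hypothetical and not part of the patch; the numeric values are only the case labels visible in show_channel_command() in this posting, so opcodes outside that hunk are left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical names. The numeric values are the case labels visible in
 * show_channel_command(); opcodes not shown in the posted hunk are omitted.
 */
enum host1x_opcode {
	HOST1X_OPCODE_MASK	= 0x3,
	HOST1X_OPCODE_IMM	= 0x4,
	HOST1X_OPCODE_RESTART	= 0x5,
	HOST1X_OPCODE_GATHER	= 0x6,
	HOST1X_OPCODE_EXTEND	= 0xe,
};

/* The opcode lives in the top four bits of a command word. */
static inline unsigned int host1x_opcode(uint32_t word)
{
	return word >> 28;
}

/* Map an opcode to the mnemonic used in the debug output. */
static const char *host1x_opcode_name(unsigned int opcode)
{
	assert(opcode <= 0xf);	/* only four bits are available */

	switch (opcode) {
	case HOST1X_OPCODE_MASK:
		return "MASK";
	case HOST1X_OPCODE_IMM:
		return "IMM";
	case HOST1X_OPCODE_RESTART:
		return "RESTART";
	case HOST1X_OPCODE_GATHER:
		return "GATHER";
	case HOST1X_OPCODE_EXTEND:
		return "EXTEND";
	default:
		return "UNKNOWN";
	}
}
```

With names like these, the switch in show_channel_command() could use `case HOST1X_OPCODE_GATHER:` instead of `case 0x6:` without changing behavior.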
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
What if two processes read debug information at the same time?
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we're given different hardware address to the same
+	 * page - in these cases the offset will get an invalid number and
+	 * we just have to bail out.
+	 */
Why's that?
+	map_addr = host1x_memmgr_mmap(mem);
+	if (!map_addr) {
+		host1x_debug_output(o, "[could not mmap]\n");
+		return;
+	}
+
+	/* Get base address from mem */
+	sgt = host1x_memmgr_pin(mem);
+	if (IS_ERR(sgt)) {
+		host1x_debug_output(o, "[couldn't pin]\n");
+		host1x_memmgr_munmap(mem, map_addr);
+		return;
+	}
Maybe you should stick with one of "could not" or "couldn't". Makes it easier to search for.
+static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
+{
+	struct host1x_job *job;
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		int i;
+		host1x_debug_output(o,
+				"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+				" first_get=%08x, timeout=%d"
+				" num_slots=%d, num_handles=%d\n",
+				job,
+				job->syncpt_id,
+				job->syncpt_end,
+				job->first_get,
+				job->timeout,
+				job->num_slots,
+				job->num_unpins);
This could go on fewer lines.
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
[...]
+	switch (cbstat) {
+	case 0x00010008:
Again, symbolic names would be nice.
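As a rough sketch of how the cbstat literals could be named: the two magic values split cleanly into the class and offset fields that the patch already extracts with HOST1X_SYNC_CBSTAT_0_CBCLASS0_V and HOST1X_SYNC_CBSTAT_0_CBOFFSET0_V. The macro and enum names below are assumptions made for illustration; only the field layout, the two literals, and the cbread decoding come from the posted code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the class and offsets behind the two literals. */
#define HOST1X_CLASS_HOST1X		0x1
#define HOST1X_UCLASS_WAIT_SYNCPT	0x8
#define HOST1X_UCLASS_WAIT_SYNCPT_BASE	0x9

/* Same layout as the CBCLASS0 (bits 16..25) and CBOFFSET0 (bits 0..15) fields. */
#define CBSTAT(class, offset)	(((uint32_t)(class) << 16) | (offset))
#define CBSTAT_WAIT_SYNCPT \
	CBSTAT(HOST1X_CLASS_HOST1X, HOST1X_UCLASS_WAIT_SYNCPT)
#define CBSTAT_WAIT_SYNCPT_BASE \
	CBSTAT(HOST1X_CLASS_HOST1X, HOST1X_UCLASS_WAIT_SYNCPT_BASE)

/* The named constants must equal the literals used in the patch. */
static_assert(CBSTAT_WAIT_SYNCPT == 0x00010008, "WAIT_SYNCPT literal");
static_assert(CBSTAT_WAIT_SYNCPT_BASE == 0x00010009, "WAIT_SYNCPT_BASE literal");

enum channel_wait {
	CHANNEL_ACTIVE,
	CHANNEL_WAIT_SYNCPT,
	CHANNEL_WAIT_SYNCPT_BASE,
};

/* Classify a CBSTAT value the same way the switch in the patch does. */
static enum channel_wait classify_cbstat(uint32_t cbstat)
{
	switch (cbstat) {
	case CBSTAT_WAIT_SYNCPT:
		return CHANNEL_WAIT_SYNCPT;
	case CBSTAT_WAIT_SYNCPT_BASE:
		return CHANNEL_WAIT_SYNCPT_BASE;
	default:
		return CHANNEL_ACTIVE;
	}
}

/* For a plain syncpt wait, CBREAD holds the id in bits 24..31. */
static inline uint32_t wait_syncpt_id(uint32_t cbread)
{
	return cbread >> 24;
}

/* ...and the threshold value in bits 0..23. */
static inline uint32_t wait_syncpt_thresh(uint32_t cbread)
{
	return cbread & 0xffffff;
}
```

Building the constants from class and offset keeps the relationship to the register fields visible, while the switch still compares the full register value exactly as the patch does.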
Thierry
On 04.02.2013 03:03, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
[...]
+static pid_t host1x_debug_null_kickoff_pid;
+unsigned int host1x_debug_trace_cmdbuf;
+
+static pid_t host1x_debug_force_timeout_pid;
+static u32 host1x_debug_force_timeout_val;
+static u32 host1x_debug_force_timeout_channel;
Please group static and non-static variables.
Will do.
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[...]
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this.
This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk.
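The pattern described here, one dump routine feeding two different sinks, can be sketched in plain user-space C. This is only an illustration under stated assumptions: the struct mirrors the quoted struct output, but debug_output(), the two sink functions, and dump_syncpt() are hypothetical stand-ins for the driver's debugfs and printk paths.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the quoted struct output: a sink callback plus a scratch buffer. */
struct output {
	void (*fn)(void *ctx, const char *str, size_t len);
	void *ctx;
	char buf[256];
};

/* Format into o->buf, then hand the text to whichever sink is installed. */
static void debug_output(struct output *o, const char *fmt, ...)
{
	va_list args;
	int len;

	assert(o && o->fn);

	va_start(args, fmt);
	len = vsnprintf(o->buf, sizeof(o->buf), fmt, args);
	va_end(args);

	if (len < 0)
		return;
	if ((size_t)len >= sizeof(o->buf))
		len = sizeof(o->buf) - 1;	/* the text was truncated */
	o->fn(o->ctx, o->buf, (size_t)len);
}

/* Hypothetical sink standing in for printk: writes to a stdio stream. */
static void write_to_stream(void *ctx, const char *str, size_t len)
{
	fwrite(str, 1, len, (FILE *)ctx);
}

/* Hypothetical sink standing in for a debugfs file: appends to memory. */
struct mem_sink {
	char data[1024];
	size_t used;
};

static void write_to_mem(void *ctx, const char *str, size_t len)
{
	struct mem_sink *m = ctx;
	size_t room = sizeof(m->data) - 1 - m->used;

	if (len > room)
		len = room;	/* drop what does not fit */
	memcpy(m->data + m->used, str, len);
	m->used += len;
	m->data[m->used] = '\0';
}

/* One dump routine, written once, usable with either sink. */
static void dump_syncpt(struct output *o, int id, unsigned int val)
{
	debug_output(o, "syncpt %d: val %u\n", id, val);
}
```

A caller picks the destination by filling in `fn` and `ctx`, for example `struct output o = { .fn = write_to_stream, .ctx = stderr };`, and the dump code itself never needs to know which one is in use.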
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
[...]
 struct host1x_syncpt_ops {
 	void (*reset)(struct host1x_syncpt *);
 	void (*reset_wait_base)(struct host1x_syncpt *);
@@ -117,6 +133,7 @@ struct host1x {
 	struct host1x_channel_ops channel_op;
 	struct host1x_cdma_ops cdma_op;
 	struct host1x_pushbuffer_ops cdma_pb_op;
+	struct host1x_debug_ops debug_op;
 	struct host1x_syncpt_ops syncpt_op;
 	struct host1x_intr_ops intr_op;
Again, better to pass in a const pointer to the ops structure.
Ok.
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them.
I have another suggestion. In downstream I removed the decoding part and just print out a string of hex. That removes quite a bit of code from the kernel. It also makes the debug output more compact.
It's much easier to write a user space program to decode than maintain it in kernel.
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
What if two processes read debug information at the same time?
show_channels() acquires cdma.lock, so that shouldn't happen.
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we're given different hardware address to the same
+	 * page - in these cases the offset will get an invalid number and
+	 * we just have to bail out.
+	 */
Why's that?
Because of a race - memory might've been unpinned and unmapped from IOMMU and when we re-map (pin), we are given a new address.
But I think this comment is a bit stale: we also used to dump old gathers. The latest code only dumps jobs in the sync queue, so the race shouldn't happen.
+	map_addr = host1x_memmgr_mmap(mem);
+	if (!map_addr) {
+		host1x_debug_output(o, "[could not mmap]\n");
+		return;
+	}
+
+	/* Get base address from mem */
+	sgt = host1x_memmgr_pin(mem);
+	if (IS_ERR(sgt)) {
+		host1x_debug_output(o, "[couldn't pin]\n");
+		host1x_memmgr_munmap(mem, map_addr);
+		return;
+	}
Maybe you should stick with one of "could not" or "couldn't". Makes it easier to search for.
I prefer "could not", so I'll use that.
+static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
+{
+	struct host1x_job *job;
+
+	list_for_each_entry(job, &cdma->sync_queue, list) {
+		int i;
+		host1x_debug_output(o,
+				"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
+				" first_get=%08x, timeout=%d"
+				" num_slots=%d, num_handles=%d\n",
+				job,
+				job->syncpt_id,
+				job->syncpt_end,
+				job->first_get,
+				job->timeout,
+				job->num_slots,
+				job->num_unpins);
This could go on fewer lines.
Yes, will merge.
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
[...]
+	switch (cbstat) {
+	case 0x00010008:
Again, symbolic names would be nice.
I propose I remove the decoding from kernel, and save 200 lines.
Terje
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
On 04.02.2013 03:03, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
[...]
+struct output {
+	void (*fn)(void *ctx, const char *str, size_t len);
+	void *ctx;
+	char buf[256];
+};
Do we really need this kind of abstraction? There really should be only one location where debug information is obtained, so I don't see a need for this.
This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk.
Yes, that was precisely my point. Why bother providing the same data via several output methods? debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints, for instance.
If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better.
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
+static int show_channel_command(struct output *o, u32 addr, u32 val, int *count)
+{
+	unsigned mask;
+	unsigned subop;
+
+	switch (val >> 28) {
+	case 0x0:
These can easily be derived by looking at the debug output, but it may still make sense to assign symbolic names to them.
I have another suggestion. In downstream I removed the decoding part and just print out a string of hex. That removes quite a bit of code from the kernel. It also makes the debug output more compact.
It's much easier to write a user space program to decode than maintain it in kernel.
I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infrastructure exists to help with implementing something similar.
So I think having debugfs output some data about syncpoints or the state of channels might be useful to quickly diagnose a certain set of problems but for anything more involved maybe a complete binary dump may be better.
I'm not sure whether doing this would be acceptable though. Maybe Dave or somebody else on the lists can answer that. An alternative way to achieve the same would be to hook ioctl() from userspace and not do any of it in kernel space.
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
What if two processes read debug information at the same time?
show_channels() acquires cdma.lock, so that shouldn't happen.
Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant.
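For illustration, a debug context along these lines might look like the following user-space sketch. The struct and function names are hypothetical; the truncation logic follows the NVHOST_DBG_STATE_DATA branch of show_channel_word(), with the literal 64 from the patch given a name, and stdio standing in for host1x_debug_output().

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-reader context replacing the function-local statics. */
struct dump_ctx {
	int start_count;	/* data words announced by the last command */
	int remaining;		/* data words still expected */
	int truncated;		/* set once the truncation notice was printed */
};

/* The patch uses the literal 64 as the cut-off for printed data words. */
#define DUMP_MAX_PRINTED_WORDS 64

/* Called after a command word announced `count` data words. */
static void dump_begin(struct dump_ctx *c, int count)
{
	c->start_count = count;
	c->remaining = count;
	c->truncated = 0;
}

/*
 * Handle one data word. Returns 1 if the word was printed and 0 if it was
 * suppressed because the cut-off was reached.
 */
static int dump_data_word(struct dump_ctx *c, FILE *out, unsigned int val)
{
	assert(c->remaining > 0);

	c->remaining--;
	if (c->start_count - c->remaining < DUMP_MAX_PRINTED_WORDS) {
		fprintf(out, "%08x%s", val, c->remaining > 0 ? ", " : "])\n");
		return 1;
	}

	if (!c->truncated && c->remaining > 0) {
		fprintf(out, "[truncated; %d more words]\n", c->remaining);
		c->truncated = 1;
	}
	return 0;
}
```

Because all state lives in the caller's struct dump_ctx, two readers can interleave calls without disturbing each other, independent of any locking done elsewhere.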
+static void do_show_channel_gather(struct output *o,
+		phys_addr_t phys_addr,
+		u32 words, struct host1x_cdma *cdma,
+		phys_addr_t pin_addr, u32 *map_addr)
+{
+	/* Map dmaget cursor to corresponding mem handle */
+	u32 offset;
+	int state, count, i;
+
+	offset = phys_addr - pin_addr;
+	/*
+	 * Sometimes we're given different hardware address to the same
+	 * page - in these cases the offset will get an invalid number and
+	 * we just have to bail out.
+	 */
Why's that?
Because of a race - memory might've been unpinned and unmapped from IOMMU and when we re-map (pin), we are given a new address.
But I think this comment is a bit stale: we also used to dump old gathers. The latest code only dumps jobs in the sync queue, so the race shouldn't happen.
Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on.
+static void host1x_debug_show_channel_cdma(struct host1x *m,
+	struct host1x_channel *ch, struct output *o, int chid)
+{
[...]
+	switch (cbstat) {
+	case 0x00010008:
Again, symbolic names would be nice.
I propose I remove the decoding from kernel, and save 200 lines.
I think it could be more than 200 lines. If all we provide in the kernel is some statistics about syncpoint usage or channel state that should be a lot less code than we have now.
However that would make it necessary to provide userspace tools that can provide the same quality of diagnostics, so I'm not sure if it's doable without access to the in-kernel data structures.
Thierry
On 05.02.2013 01:15, Thierry Reding wrote:
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk.
Yes, that was precisely my point. Why bother providing the same data via several output methods? debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints, for instance.
If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better.
debugfs is there so that we can get a reliable dump of host1x state (e.g. no lines intermixed with other output).
printk output is there because often we get just UART logs from failure cases, and having as much information as possible in the logs speeds up debugging.
Both of them need to output the values of sync points, and the channel state. Dumping all of that consists of a lot of code, and I wouldn't want to duplicate that for two output formats.
I have another suggestion. In downstream I removed the decoding part and just print out a string of hex. That removes quite a bit of code from the kernel. It also makes the debug output more compact.
I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infrastructure exists to help with implementing something similar.
It's not actually binary. I think I misrepresented the suggestion.
I'm suggesting that we display the contents of the command FIFO and of gathers (i.e. all opcodes) only in hex format, not decoded. All other text would remain as is, so syncpt values etc. would be readable at a glance.
The user space tool can then take the streams and decode them if needed.
We've noticed that the decoded opcodes format can be very long and sometimes takes a minute to dump out via a slow console. The hex output is much more compact and faster to dump.
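A user-space decoder for such a hex stream could be quite small. The sketch below is an assumption about how one might look, not an existing tool: it takes whitespace-separated hex words, recognizes only the opcodes decoded by show_channel_command() in this patch, and treats the word after a GATHER as its address, as show_channel_word() does. The function names and output format are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Count set bits in the low 16 bits, like the kernel's hweight16(). */
static int popcount16(uint32_t v)
{
	int n = 0;

	for (v &= 0xffff; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Decode whitespace-separated hex words from text, printing one line per
 * word. Returns the number of command words, or -1 on malformed input.
 */
static int decode_hex_stream(const char *text, FILE *out)
{
	int following = 0;	/* words that belong to the previous command */
	int commands = 0;

	assert(text && out);

	for (;;) {
		char *end;
		uint32_t w = (uint32_t)strtoul(text, &end, 16);

		if (end == text)
			break;	/* nothing left that parses as hex */
		text = end;

		if (following > 0) {
			fprintf(out, "  %08x\n", (unsigned int)w);
			following--;
			continue;
		}

		commands++;
		switch (w >> 28) {
		case 0x3:
			fprintf(out, "MASK(offset=%03x, mask=%03x)\n",
				(unsigned int)(w >> 16 & 0xfff),
				(unsigned int)(w & 0xffff));
			following = popcount16(w);	/* one word per mask bit */
			break;
		case 0x4:
			fprintf(out, "IMM(offset=%03x, data=%03x)\n",
				(unsigned int)(w >> 16 & 0xfff),
				(unsigned int)(w & 0xffff));
			break;
		case 0x5:
			fprintf(out, "RESTART(offset=%08x)\n",
				(unsigned int)(uint32_t)(w << 4));
			break;
		case 0x6:
			fprintf(out, "GATHER(offset=%03x, count=%04x)\n",
				(unsigned int)(w >> 16 & 0xfff),
				(unsigned int)(w & 0x3fff));
			following = 1;	/* the gather address word */
			break;
		default:
			fprintf(out, "OPCODE_%x(%08x)\n",
				(unsigned int)(w >> 28), (unsigned int)w);
			break;
		}
	}

	while (isspace((unsigned char)*text))
		text++;
	return *text == '\0' ? commands : -1;
}
```

For example, `decode_hex_stream("30010005 aaaaaaaa bbbbbbbb 40020003", stdout)` treats the two words after the MASK command as its data (the mask 0x0005 has two bits set) and counts two commands in total.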
Actual tracing, or a wireshark kind of capability, would come from decoding the ftrace log. When tracing is enabled, everything that is written to the channel is also written to ftrace.
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
What if two processes read debug information at the same time?
show_channels() acquires cdma.lock, so that shouldn't happen.
Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant.
Actually, the debugging process needs cdma.lock, because it goes through the cdma queue. Also, command FIFO dumping is something that must be done by a single thread at a time.
Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on.
In many cases, looking at syncpt values and channel state (active, waiting on a syncpt, etc.) indicates the current state of the hardware. But very often problems are ripple effects of something that happened earlier, and the job that caused the problem has already been freed and is not visible in the dump.
To get a full history, we often need the ftrace log.
Terje
On Wed, Feb 06, 2013 at 12:58:19PM -0800, Terje Bergström wrote:
On 05.02.2013 01:15, Thierry Reding wrote:
On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
This is used by debugfs code to direct to debugfs, and nvhost_debug_dump() to send via printk.
Yes, that was precisely my point. Why bother providing the same data via several output methods? debugfs is good for showing large amounts of data such as register dumps or a tabular representation of syncpoints, for instance.
If, however, you want to interactively show debug information using printk the same format isn't very useful and something more reduced is often better.
debugfs is there so that we can get a reliable dump of host1x state (e.g. no lines intermixed with other output).
printk output is there because often we get just UART logs from failure cases, and having as much information as possible in the logs speeds up debugging.
Both of them need to output the values of sync points, and the channel state. Dumping all of that consists of a lot of code, and I wouldn't want to duplicate that for two output formats.
I'm still not convinced, but I think I could live with it. =)
I have another suggestion. In downstream I removed the decoding part and just print out a string of hex. That removes quite a bit of code from the kernel. It also makes the debug output more compact.
I don't know. I think if you use in-kernel debugging facilities such as debugfs or printk, then the output should be self-explanatory. However I do see the usefulness of having a binary dump that can be decoded in userspace. But I think if we want to go that way we should make that a separate interface. USB provides something like that, which can then be fed to libpcap or wireshark to capture and analyze USB traffic. If done properly you get replay functionality for free. I don't know what infrastructure exists to help with implementing something similar.
It's not actually binary. I think I misrepresented the suggestion.
I'm suggesting that we display the contents of the command FIFO and of gathers (i.e. all opcodes) only in hex format, not decoded. All other text would remain as is, so syncpt values etc. would be readable at a glance.
The user space tool can then take the streams and decode them if needed.
We've noticed that the decoded opcodes format can be very long and sometimes takes a minute to dump out via a slow console. The hex output is much more compact and faster to dump.
Actual tracing, or a wireshark kind of capability, would come from decoding the ftrace log. When tracing is enabled, everything that is written to the channel is also written to ftrace.
Okay, I'll have to take a closer look at ftrace since I've never used it before. It sounds like extra infrastructure won't be necessary then.
+static void show_channel_word(struct output *o, int *state, int *count,
+		u32 addr, u32 val, struct host1x_cdma *cdma)
+{
+	static int start_count, dont_print;
What if two processes read debug information at the same time?
show_channels() acquires cdma.lock, so that shouldn't happen.
Okay. Another solution would be to pass around a debug context which keeps track of the variables. But if we opt for a more involved dump interface as discussed above this will no longer be relevant.
Actually, the debugging process needs cdma.lock, because it goes through the cdma queue. Also, command FIFO dumping is something that must be done by a single thread at a time.
Okay. In the context of a channel dump interface this may not be relevant anymore. Can you think of any issue that wouldn't be detectable or debuggable by analyzing a binary dump of the data within a channel? I'm asking because at that point we wouldn't be able to access any of the in-kernel data structures but would have to rely on the data itself for diagnostics. IOMMU virtual addresses won't be available and so on.
In many cases, looking at syncpt values and channel state (active, waiting on a syncpt, etc.) gives an indication of the current state of the hardware. But very often problems are ripple effects of something that happened earlier, and the job that caused the problem has already been freed and is not visible in the dump.
To get a full history, we often need the ftrace log.
So that's already covered. Great!
Thierry
Make drm part of host1x driver.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/drm/Kconfig | 2 --
 drivers/gpu/drm/Makefile | 1 -
 drivers/gpu/drm/tegra/Makefile | 7 -------
 drivers/gpu/host1x/Kconfig | 3 +++
 drivers/gpu/host1x/Makefile | 6 ++++++
 drivers/gpu/{drm/tegra => host1x/drm}/Kconfig | 0
 drivers/gpu/{drm/tegra => host1x/drm}/dc.c | 0
 drivers/gpu/{drm/tegra => host1x/drm}/dc.h | 0
 drivers/gpu/{drm/tegra => host1x/drm}/drm.c | 0
 drivers/gpu/{drm/tegra => host1x/drm}/drm.h | 6 +++---
 drivers/gpu/{drm/tegra => host1x/drm}/fb.c | 0
 drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c | 0
 drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h | 0
 drivers/gpu/{drm/tegra => host1x/drm}/host1x.c | 0
 drivers/gpu/{drm/tegra => host1x/drm}/output.c | 0
 drivers/gpu/{drm/tegra => host1x/drm}/rgb.c | 0
 drivers/gpu/host1x/host1x_client.h | 25 ++++++++++++++++++++++++
 17 files changed, 37 insertions(+), 13 deletions(-)
 delete mode 100644 drivers/gpu/drm/tegra/Makefile
 rename drivers/gpu/{drm/tegra => host1x/drm}/Kconfig (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/dc.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/dc.h (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/drm.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/drm.h (98%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/fb.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/hdmi.h (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/host1x.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/output.c (100%)
 rename drivers/gpu/{drm/tegra => host1x/drm}/rgb.c (100%)
 create mode 100644 drivers/gpu/host1x/host1x_client.h
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 983201b..18321b68b 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -210,5 +210,3 @@ source "drivers/gpu/drm/mgag200/Kconfig" source "drivers/gpu/drm/cirrus/Kconfig"
source "drivers/gpu/drm/shmobile/Kconfig" - -source "drivers/gpu/drm/tegra/Kconfig" diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index 6f58c81..f54c72a 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -49,5 +49,4 @@ obj-$(CONFIG_DRM_GMA500) += gma500/ obj-$(CONFIG_DRM_UDL) += udl/ obj-$(CONFIG_DRM_AST) += ast/ obj-$(CONFIG_DRM_SHMOBILE) +=shmobile/ -obj-$(CONFIG_DRM_TEGRA) += tegra/ obj-y += i2c/ diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile deleted file mode 100644 index 80f73d1..0000000 --- a/drivers/gpu/drm/tegra/Makefile +++ /dev/null @@ -1,7 +0,0 @@ -ccflags-y := -Iinclude/drm -ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG - -tegra-drm-y := drm.o fb.o dc.o host1x.o -tegra-drm-y += output.o rgb.o hdmi.o - -obj-$(CONFIG_DRM_TEGRA) += tegra-drm.o diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig index 57680a6..558b660 100644 --- a/drivers/gpu/host1x/Kconfig +++ b/drivers/gpu/host1x/Kconfig @@ -1,4 +1,5 @@ config TEGRA_HOST1X + depends on DRM tristate "Tegra host1x driver" help Driver for the Tegra host1x hardware. @@ -26,4 +27,6 @@ config TEGRA_HOST1X_FIREWALL
If unsure, choose Y.
+source "drivers/gpu/host1x/drm/Kconfig" + endif diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index 697d49a..ffc8bf1 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -12,4 +12,10 @@ host1x-y = \ hw/host1x01.o
host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o + +ccflags-y += -Iinclude/drm +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG + +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o diff --git a/drivers/gpu/drm/tegra/Kconfig b/drivers/gpu/host1x/drm/Kconfig similarity index 100% rename from drivers/gpu/drm/tegra/Kconfig rename to drivers/gpu/host1x/drm/Kconfig diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/host1x/drm/dc.c similarity index 100% rename from drivers/gpu/drm/tegra/dc.c rename to drivers/gpu/host1x/drm/dc.c diff --git a/drivers/gpu/drm/tegra/dc.h b/drivers/gpu/host1x/drm/dc.h similarity index 100% rename from drivers/gpu/drm/tegra/dc.h rename to drivers/gpu/host1x/drm/dc.h diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/host1x/drm/drm.c similarity index 100% rename from drivers/gpu/drm/tegra/drm.c rename to drivers/gpu/host1x/drm/drm.c diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/host1x/drm/drm.h similarity index 98% rename from drivers/gpu/drm/tegra/drm.h rename to drivers/gpu/host1x/drm/drm.h index 741b5dc..e68b4ac 100644 --- a/drivers/gpu/drm/tegra/drm.h +++ b/drivers/gpu/host1x/drm/drm.h @@ -7,8 +7,8 @@ * published by the Free Software Foundation. */
-#ifndef TEGRA_DRM_H -#define TEGRA_DRM_H 1 +#ifndef HOST1X_DRM_H +#define HOST1X_DRM_H 1
#include <drm/drmP.h> #include <drm/drm_crtc_helper.h> @@ -213,4 +213,4 @@ extern struct platform_driver tegra_hdmi_driver; extern struct platform_driver tegra_dc_driver; extern struct drm_driver tegra_drm_driver;
-#endif /* TEGRA_DRM_H */ +#endif /* HOST1X_DRM_H */ diff --git a/drivers/gpu/drm/tegra/fb.c b/drivers/gpu/host1x/drm/fb.c similarity index 100% rename from drivers/gpu/drm/tegra/fb.c rename to drivers/gpu/host1x/drm/fb.c diff --git a/drivers/gpu/drm/tegra/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c similarity index 100% rename from drivers/gpu/drm/tegra/hdmi.c rename to drivers/gpu/host1x/drm/hdmi.c diff --git a/drivers/gpu/drm/tegra/hdmi.h b/drivers/gpu/host1x/drm/hdmi.h similarity index 100% rename from drivers/gpu/drm/tegra/hdmi.h rename to drivers/gpu/host1x/drm/hdmi.h diff --git a/drivers/gpu/drm/tegra/host1x.c b/drivers/gpu/host1x/drm/host1x.c similarity index 100% rename from drivers/gpu/drm/tegra/host1x.c rename to drivers/gpu/host1x/drm/host1x.c diff --git a/drivers/gpu/drm/tegra/output.c b/drivers/gpu/host1x/drm/output.c similarity index 100% rename from drivers/gpu/drm/tegra/output.c rename to drivers/gpu/host1x/drm/output.c diff --git a/drivers/gpu/drm/tegra/rgb.c b/drivers/gpu/host1x/drm/rgb.c similarity index 100% rename from drivers/gpu/drm/tegra/rgb.c rename to drivers/gpu/host1x/drm/rgb.c diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h new file mode 100644 index 0000000..fdd2920 --- /dev/null +++ b/drivers/gpu/host1x/host1x_client.h @@ -0,0 +1,25 @@ +/* + * Copyright (c) 2013, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. 
+ */ + +#ifndef HOST1X_CLIENT_H +#define HOST1X_CLIENT_H + +struct platform_device; + +void host1x_set_drm_data(struct platform_device *pdev, void *data); +void *host1x_get_drm_data(struct platform_device *pdev); + +#endif
On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index 697d49a..ffc8bf1 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -12,4 +12,10 @@ host1x-y = \ hw/host1x01.o
host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
+ccflags-y += -Iinclude/drm +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
+host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
Can this be moved into a separate Makefile in the drm subdirectory?
diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h
[...]
new file mode 100644
index 0000000..fdd2920
--- /dev/null
+++ b/drivers/gpu/host1x/host1x_client.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef HOST1X_CLIENT_H
+#define HOST1X_CLIENT_H
+
+struct platform_device;
+
+void host1x_set_drm_data(struct platform_device *pdev, void *data);
+void *host1x_get_drm_data(struct platform_device *pdev);
+
+#endif
These aren't defined or used yet.
Thierry
On 04.02.2013 03:08, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index 697d49a..ffc8bf1 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -12,4 +12,10 @@ host1x-y = \ hw/host1x01.o
host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
+ccflags-y += -Iinclude/drm +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
+host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
Can this be moved into a separate Makefile in the drm subdirectory?
I tried, and the kernel build helpfully created two .ko files. As cyclic dependencies between two modules aren't nice, I merged them into the same module, and that seemed to force merging the Makefiles.
If anybody has an idea on how to do it otherwise, I'd be happy to keep the Makefiles separate.
diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h
[...]
new file mode 100644
index 0000000..fdd2920
--- /dev/null
+++ b/drivers/gpu/host1x/host1x_client.h
@@ -0,0 +1,25 @@
+/*
+ * Copyright (c) 2013, NVIDIA Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see http://www.gnu.org/licenses/.
+ */
+
+#ifndef HOST1X_CLIENT_H
+#define HOST1X_CLIENT_H
+
+struct platform_device;
+
+void host1x_set_drm_data(struct platform_device *pdev, void *data);
+void *host1x_get_drm_data(struct platform_device *pdev);
+
+#endif
These aren't defined or used yet.
Hmm, right, they would go to patch 7.
Terje
On Mon, Feb 04, 2013 at 08:45:36PM -0800, Terje Bergström wrote:
On 04.02.2013 03:08, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index 697d49a..ffc8bf1 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -12,4 +12,10 @@ host1x-y = \ hw/host1x01.o
host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
+ccflags-y += -Iinclude/drm +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
+host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
Can this be moved into a separate Makefile in the drm subdirectory?
I tried, and the kernel build helpfully created two .ko files. As cyclic dependencies between two modules aren't nice, I merged them into the same module, and that seemed to force merging the Makefiles.
If anybody has an idea on how to do it otherwise, I'd be happy to keep the Makefiles separate.
Okay, I'll take a look.
Thierry
Remove second host1x driver, and bind tegra-drm to the new host1x driver. The logic to parse device tree and track clients is moved to drm.c.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Makefile | 2 +-
 drivers/gpu/host1x/dev.c | 58 +++++++++-
 drivers/gpu/host1x/dev.h | 6 +
 drivers/gpu/host1x/drm/Kconfig | 2 +-
 drivers/gpu/host1x/drm/dc.c | 7 +-
 drivers/gpu/host1x/drm/drm.c | 213 +++++++++++++++++++++++++++++++++++-
 drivers/gpu/host1x/drm/drm.h | 3 -
 drivers/gpu/host1x/drm/hdmi.c | 7 +-
 drivers/gpu/host1x/host1x_client.h | 9 ++
 9 files changed, 294 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile index ffc8bf1..c35ee19 100644 --- a/drivers/gpu/host1x/Makefile +++ b/drivers/gpu/host1x/Makefile @@ -16,6 +16,6 @@ host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o ccflags-y += -Iinclude/drm ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
-host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o obj-$(CONFIG_TEGRA_HOST1X) += host1x.o diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c index 5aa7d28..17ee01c 100644 --- a/drivers/gpu/host1x/dev.c +++ b/drivers/gpu/host1x/dev.c @@ -28,12 +28,25 @@ #include "channel.h" #include "debug.h" #include "hw/host1x01.h" +#include "host1x_client.h"
#define CREATE_TRACE_POINTS #include <trace/events/host1x.h>
#define DRIVER_NAME "tegra-host1x"
+void host1x_set_drm_data(struct platform_device *pdev, void *data) +{ + struct host1x *host1x = platform_get_drvdata(pdev); + host1x->drm_data = data; +} + +void *host1x_get_drm_data(struct platform_device *pdev) +{ + struct host1x *host1x = platform_get_drvdata(pdev); + return host1x->drm_data; +} + void host1x_sync_writel(struct host1x *host1x, u32 v, u32 r) { void __iomem *sync_regs = host1x->regs + host1x->info.sync_offset; @@ -153,6 +166,8 @@ static int host1x_probe(struct platform_device *dev)
host1x_debug_init(host);
+ host1x_drm_alloc(dev); + dev_info(&dev->dev, "initialized\n");
return 0; @@ -173,7 +188,7 @@ static int __exit host1x_remove(struct platform_device *dev) return 0; }
-static struct platform_driver platform_driver = { +static struct platform_driver tegra_host1x_driver = { .probe = host1x_probe, .remove = __exit_p(host1x_remove), .driver = { @@ -183,8 +198,47 @@ static struct platform_driver platform_driver = { }, };
-module_platform_driver(platform_driver); +static int __init tegra_host1x_init(void) +{ + int err; + + err = platform_driver_register(&tegra_host1x_driver); + if (err < 0) + return err; + +#ifdef CONFIG_TEGRA_DRM + err = platform_driver_register(&tegra_dc_driver); + if (err < 0) + goto unregister_host1x; + + err = platform_driver_register(&tegra_hdmi_driver); + if (err < 0) + goto unregister_dc; +#endif + + return 0; + +#ifdef CONFIG_TEGRA_DRM +unregister_dc: + platform_driver_unregister(&tegra_dc_driver); +unregister_host1x: + platform_driver_unregister(&tegra_host1x_driver); + return err; +#endif +} +module_init(tegra_host1x_init); + +static void __exit tegra_host1x_exit(void) +{ +#ifdef CONFIG_TEGRA_DRM + platform_driver_unregister(&tegra_hdmi_driver); + platform_driver_unregister(&tegra_dc_driver); +#endif + platform_driver_unregister(&tegra_host1x_driver); +} +module_exit(tegra_host1x_exit);
+MODULE_AUTHOR("Thierry Reding thierry.reding@avionic-design.de"); MODULE_AUTHOR("Terje Bergstrom tbergstrom@nvidia.com"); MODULE_DESCRIPTION("Host1x driver for Tegra products"); MODULE_LICENSE("GPL"); diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h index 467a92e..ff3a365 100644 --- a/drivers/gpu/host1x/dev.h +++ b/drivers/gpu/host1x/dev.h @@ -142,6 +142,8 @@ struct host1x { int allocated_channels;
struct dentry *debugfs; + + void *drm_data; };
static inline @@ -161,4 +163,8 @@ u32 host1x_sync_readl(struct host1x *host1x, u32 r); void host1x_ch_writel(struct host1x_channel *ch, u32 r, u32 v); u32 host1x_ch_readl(struct host1x_channel *ch, u32 r);
+extern struct platform_driver tegra_hdmi_driver; +extern struct platform_driver tegra_dc_driver; +extern struct platform_driver tegra_gr2d_driver; + #endif diff --git a/drivers/gpu/host1x/drm/Kconfig b/drivers/gpu/host1x/drm/Kconfig index be1daf7..7db9b3a 100644 --- a/drivers/gpu/host1x/drm/Kconfig +++ b/drivers/gpu/host1x/drm/Kconfig @@ -1,5 +1,5 @@ config DRM_TEGRA - tristate "NVIDIA Tegra DRM" + bool "NVIDIA Tegra DRM" depends on DRM && OF && ARCH_TEGRA select DRM_KMS_HELPER select DRM_GEM_CMA_HELPER diff --git a/drivers/gpu/host1x/drm/dc.c b/drivers/gpu/host1x/drm/dc.c index 656b2e3..ac31e96 100644 --- a/drivers/gpu/host1x/drm/dc.c +++ b/drivers/gpu/host1x/drm/dc.c @@ -17,6 +17,7 @@
#include "drm.h" #include "dc.h" +#include "host1x_client.h"
struct tegra_dc_window { fixed20_12 x; @@ -736,7 +737,8 @@ static const struct host1x_client_ops dc_client_ops = {
static int tegra_dc_probe(struct platform_device *pdev) { - struct host1x *host1x = dev_get_drvdata(pdev->dev.parent); + struct host1x *host1x = + host1x_get_drm_data(to_platform_device(pdev->dev.parent)); struct resource *regs; struct tegra_dc *dc; int err; @@ -800,7 +802,8 @@ static int tegra_dc_probe(struct platform_device *pdev)
static int tegra_dc_remove(struct platform_device *pdev) { - struct host1x *host1x = dev_get_drvdata(pdev->dev.parent); + struct host1x *host1x = + host1x_get_drm_data(to_platform_device(pdev->dev.parent)); struct tegra_dc *dc = platform_get_drvdata(pdev); int err;
diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c index 3a503c9..bef9051 100644 --- a/drivers/gpu/host1x/drm/drm.c +++ b/drivers/gpu/host1x/drm/drm.c @@ -16,6 +16,7 @@ #include <asm/dma-iommu.h>
#include "drm.h" +#include "host1x_client.h"
#define DRIVER_NAME "tegra" #define DRIVER_DESC "NVIDIA Tegra graphics" @@ -24,13 +25,221 @@ #define DRIVER_MINOR 0 #define DRIVER_PATCHLEVEL 0
+struct host1x_drm_client { + struct host1x_client *client; + struct device_node *np; + struct list_head list; +}; + +static int host1x_add_drm_client(struct host1x *host1x, struct device_node *np) +{ + struct host1x_drm_client *client; + + client = kzalloc(sizeof(*client), GFP_KERNEL); + if (!client) + return -ENOMEM; + + INIT_LIST_HEAD(&client->list); + client->np = of_node_get(np); + + list_add_tail(&client->list, &host1x->drm_clients); + + return 0; +} + +static int host1x_activate_drm_client(struct host1x *host1x, + struct host1x_drm_client *drm, + struct host1x_client *client) +{ + mutex_lock(&host1x->drm_clients_lock); + list_del_init(&drm->list); + list_add_tail(&drm->list, &host1x->drm_active); + drm->client = client; + mutex_unlock(&host1x->drm_clients_lock); + + return 0; +} + +static int host1x_remove_drm_client(struct host1x *host1x, + struct host1x_drm_client *client) +{ + mutex_lock(&host1x->drm_clients_lock); + list_del_init(&client->list); + mutex_unlock(&host1x->drm_clients_lock); + + of_node_put(client->np); + kfree(client); + + return 0; +} + +static int host1x_parse_dt(struct host1x *host1x) +{ + static const char * const compat[] = { + "nvidia,tegra20-dc", + "nvidia,tegra20-hdmi", + "nvidia,tegra30-dc", + "nvidia,tegra30-hdmi", + }; + unsigned int i; + int err; + + for (i = 0; i < ARRAY_SIZE(compat); i++) { + struct device_node *np; + + for_each_child_of_node(host1x->dev->of_node, np) { + if (of_device_is_compatible(np, compat[i]) && + of_device_is_available(np)) { + err = host1x_add_drm_client(host1x, np); + if (err < 0) + return err; + } + } + } + + return 0; +} + +int host1x_drm_alloc(struct platform_device *pdev) +{ + struct host1x *host1x; + int err; + + host1x = devm_kzalloc(&pdev->dev, sizeof(*host1x), GFP_KERNEL); + if (!host1x) + return -ENOMEM; + + mutex_init(&host1x->drm_clients_lock); + INIT_LIST_HEAD(&host1x->drm_clients); + INIT_LIST_HEAD(&host1x->drm_active); + mutex_init(&host1x->clients_lock); + 
INIT_LIST_HEAD(&host1x->clients); + host1x->dev = &pdev->dev; + + err = host1x_parse_dt(host1x); + if (err < 0) { + dev_err(&pdev->dev, "failed to parse DT: %d\n", err); + return err; + } + + host1x_set_drm_data(pdev, host1x); + + return 0; +} + +int host1x_drm_init(struct host1x *host1x, struct drm_device *drm) +{ + struct host1x_client *client; + + mutex_lock(&host1x->clients_lock); + + list_for_each_entry(client, &host1x->clients, list) { + if (client->ops && client->ops->drm_init) { + int err = client->ops->drm_init(client, drm); + if (err < 0) { + dev_err(host1x->dev, + "DRM setup failed for %s: %d\n", + dev_name(client->dev), err); + return err; + } + } + } + + mutex_unlock(&host1x->clients_lock); + + return 0; +} + +int host1x_drm_exit(struct host1x *host1x) +{ + struct platform_device *pdev = to_platform_device(host1x->dev); + struct host1x_client *client; + + if (!host1x->drm) + return 0; + + mutex_lock(&host1x->clients_lock); + + list_for_each_entry_reverse(client, &host1x->clients, list) { + if (client->ops && client->ops->drm_exit) { + int err = client->ops->drm_exit(client); + if (err < 0) { + dev_err(host1x->dev, + "DRM cleanup failed for %s: %d\n", + dev_name(client->dev), err); + return err; + } + } + } + + mutex_unlock(&host1x->clients_lock); + + drm_platform_exit(&tegra_drm_driver, pdev); + host1x->drm = NULL; + + return 0; +} + +int host1x_register_client(struct host1x *host1x, struct host1x_client *client) +{ + struct host1x_drm_client *drm, *tmp; + int err; + + mutex_lock(&host1x->clients_lock); + list_add_tail(&client->list, &host1x->clients); + mutex_unlock(&host1x->clients_lock); + + list_for_each_entry_safe(drm, tmp, &host1x->drm_clients, list) + if (drm->np == client->dev->of_node) + host1x_activate_drm_client(host1x, drm, client); + + if (list_empty(&host1x->drm_clients)) { + struct platform_device *pdev = to_platform_device(host1x->dev); + + err = drm_platform_init(&tegra_drm_driver, pdev); + if (err < 0) { + dev_err(host1x->dev, 
"drm_platform_init(): %d\n", err); + return err; + } + } + + return 0; +} + +int host1x_unregister_client(struct host1x *host1x, + struct host1x_client *client) +{ + struct host1x_drm_client *drm, *tmp; + int err; + + list_for_each_entry_safe(drm, tmp, &host1x->drm_active, list) { + if (drm->client == client) { + err = host1x_drm_exit(host1x); + if (err < 0) { + dev_err(host1x->dev, "host1x_drm_exit(): %d\n", + err); + return err; + } + + host1x_remove_drm_client(host1x, drm); + break; + } + } + + mutex_lock(&host1x->clients_lock); + list_del_init(&client->list); + mutex_unlock(&host1x->clients_lock); + + return 0; +} + static int tegra_drm_load(struct drm_device *drm, unsigned long flags) { - struct device *dev = drm->dev; + struct platform_device *pdev = to_platform_device(drm->dev); struct host1x *host1x; int err;
- host1x = dev_get_drvdata(dev); + host1x = host1x_get_drm_data(pdev); drm->dev_private = host1x; host1x->drm = drm;
diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h index e68b4ac..e7101d5 100644 --- a/drivers/gpu/host1x/drm/drm.h +++ b/drivers/gpu/host1x/drm/drm.h @@ -208,9 +208,6 @@ extern int tegra_output_exit(struct tegra_output *output); extern int tegra_drm_fb_init(struct drm_device *drm); extern void tegra_drm_fb_exit(struct drm_device *drm);
-extern struct platform_driver tegra_host1x_driver; -extern struct platform_driver tegra_hdmi_driver; -extern struct platform_driver tegra_dc_driver; extern struct drm_driver tegra_drm_driver;
#endif /* HOST1X_DRM_H */ diff --git a/drivers/gpu/host1x/drm/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c index e060c7e..2f1e7b4 100644 --- a/drivers/gpu/host1x/drm/hdmi.c +++ b/drivers/gpu/host1x/drm/hdmi.c @@ -20,6 +20,7 @@ #include "hdmi.h" #include "drm.h" #include "dc.h" +#include "host1x_client.h"
struct tegra_hdmi { struct host1x_client client; @@ -1198,7 +1199,8 @@ static const struct host1x_client_ops hdmi_client_ops = {
static int tegra_hdmi_probe(struct platform_device *pdev) { - struct host1x *host1x = dev_get_drvdata(pdev->dev.parent); + struct host1x *host1x = + host1x_get_drm_data(to_platform_device(pdev->dev.parent)); struct tegra_hdmi *hdmi; struct resource *regs; int err; @@ -1287,7 +1289,8 @@ static int tegra_hdmi_probe(struct platform_device *pdev)
static int tegra_hdmi_remove(struct platform_device *pdev) { - struct host1x *host1x = dev_get_drvdata(pdev->dev.parent); + struct host1x *host1x = + host1x_get_drm_data(to_platform_device(pdev->dev.parent)); struct tegra_hdmi *hdmi = platform_get_drvdata(pdev); int err;
diff --git a/drivers/gpu/host1x/host1x_client.h b/drivers/gpu/host1x/host1x_client.h index fdd2920..938df7e 100644 --- a/drivers/gpu/host1x/host1x_client.h +++ b/drivers/gpu/host1x/host1x_client.h @@ -19,6 +19,15 @@
struct platform_device;
+#ifdef CONFIG_DRM_TEGRA +int host1x_drm_alloc(struct platform_device *pdev); +#else +static inline int host1x_drm_alloc(struct platform_device *pdev) +{ + return 0; +} +#endif + void host1x_set_drm_data(struct platform_device *pdev, void *data); void *host1x_get_drm_data(struct platform_device *pdev);
On Tue, Jan 15, 2013 at 01:44:02PM +0200, Terje Bergstrom wrote: [...]
+void host1x_set_drm_data(struct platform_device *pdev, void *data)
+{
+	struct host1x *host1x = platform_get_drvdata(pdev);
+	host1x->drm_data = data;
+}
+
+void *host1x_get_drm_data(struct platform_device *pdev)
+{
+	struct host1x *host1x = platform_get_drvdata(pdev);
+	return host1x->drm_data;
+}
Passing around struct device * should be enough and avoids the need for the explicit cast to struct platform_device.
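A minimal sketch of what this suggestion might look like. The `struct device`, `struct platform_device` and `struct host1x` definitions below, and the two `dev_*_drvdata()` helpers, are cut-down user-space stand-ins so the example compiles on its own; they are not the kernel definitions. Only the accessor signatures are the point of the example.

```c
#include <stddef.h>

/* User-space stand-ins for the kernel types; only the fields used here. */
struct device {
	struct device *parent;
	void *driver_data;
};

struct platform_device {
	struct device dev;
};

struct host1x {
	void *drm_data;
};

/* Stand-ins mirroring the kernel helpers of the same name. */
static void *dev_get_drvdata(const struct device *dev)
{
	return dev->driver_data;
}

static void dev_set_drvdata(struct device *dev, void *data)
{
	dev->driver_data = data;
}

/* Accessors taking struct device * directly, as suggested in the review. */
static void host1x_set_drm_data(struct device *dev, void *data)
{
	struct host1x *host1x = dev_get_drvdata(dev);

	host1x->drm_data = data;
}

static void *host1x_get_drm_data(struct device *dev)
{
	struct host1x *host1x = dev_get_drvdata(dev);

	return host1x->drm_data;
}
```

With this shape, a client probe function could pass `pdev->dev.parent` straight to `host1x_get_drm_data()` without going through `to_platform_device()`.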
It is a bit unfortunate that we now have two structures called host1x, but I think we can live with it for now. We can clean that up once the code has been merged.
Thierry
Add a driver alias gr2d for Tegra 2D device, and assign a duplicate of 2D clock to that driver alias.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 arch/arm/mach-tegra/board-dt-tegra20.c | 1 +
 arch/arm/mach-tegra/board-dt-tegra30.c | 1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c | 2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c b/arch/arm/mach-tegra/board-dt-tegra20.c index 171ba3c..9fcc800 100644 --- a/arch/arm/mach-tegra/board-dt-tegra20.c +++ b/arch/arm/mach-tegra/board-dt-tegra20.c @@ -96,6 +96,7 @@ static struct of_dev_auxdata tegra20_auxdata_lookup[] __initdata = { OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000D800, "spi_tegra.2", NULL), OF_DEV_AUXDATA("nvidia,tegra20-slink", 0x7000DA00, "spi_tegra.3", NULL), OF_DEV_AUXDATA("nvidia,tegra20-host1x", 0x50000000, "host1x", NULL), + OF_DEV_AUXDATA("nvidia,tegra20-gr2d", 0x54140000, "gr2d", NULL), OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54200000, "tegradc.0", NULL), OF_DEV_AUXDATA("nvidia,tegra20-dc", 0x54240000, "tegradc.1", NULL), OF_DEV_AUXDATA("nvidia,tegra20-hdmi", 0x54280000, "hdmi", NULL), diff --git a/arch/arm/mach-tegra/board-dt-tegra30.c b/arch/arm/mach-tegra/board-dt-tegra30.c index cfe5fc0..0b4a1f0 100644 --- a/arch/arm/mach-tegra/board-dt-tegra30.c +++ b/arch/arm/mach-tegra/board-dt-tegra30.c @@ -59,6 +59,7 @@ static struct of_dev_auxdata tegra30_auxdata_lookup[] __initdata = { OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DC00, "spi_tegra.4", NULL), OF_DEV_AUXDATA("nvidia,tegra30-slink", 0x7000DE00, "spi_tegra.5", NULL), OF_DEV_AUXDATA("nvidia,tegra30-host1x", 0x50000000, "host1x", NULL), + OF_DEV_AUXDATA("nvidia,tegra30-gr2d", 0x54140000, "gr2d", NULL), OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54200000, "tegradc.0", NULL), OF_DEV_AUXDATA("nvidia,tegra30-dc", 0x54240000, "tegradc.1", NULL), OF_DEV_AUXDATA("nvidia,tegra30-hdmi", 0x54280000, "hdmi", NULL), diff --git a/arch/arm/mach-tegra/tegra20_clocks_data.c b/arch/arm/mach-tegra/tegra20_clocks_data.c index a23a073..15d440a 100644 --- a/arch/arm/mach-tegra/tegra20_clocks_data.c +++ b/arch/arm/mach-tegra/tegra20_clocks_data.c @@ -1041,7 +1041,7 @@ static struct clk_duplicate tegra_clk_duplicates[] = { CLK_DUPLICATE("usbd", "utmip-pad", NULL), CLK_DUPLICATE("usbd", "tegra-ehci.0", NULL), CLK_DUPLICATE("usbd", "tegra-otg", NULL), - 
CLK_DUPLICATE("2d", "tegra_grhost", "gr2d"), + CLK_DUPLICATE("2d", "gr2d", "gr2d"), CLK_DUPLICATE("3d", "tegra_grhost", "gr3d"), CLK_DUPLICATE("epp", "tegra_grhost", "epp"), CLK_DUPLICATE("mpe", "tegra_grhost", "mpe"), diff --git a/arch/arm/mach-tegra/tegra30_clocks_data.c b/arch/arm/mach-tegra/tegra30_clocks_data.c index 741d264..5c4b7b7 100644 --- a/arch/arm/mach-tegra/tegra30_clocks_data.c +++ b/arch/arm/mach-tegra/tegra30_clocks_data.c @@ -1338,6 +1338,7 @@ static struct clk_duplicate tegra_clk_duplicates[] = { CLK_DUPLICATE("pll_p", "tegradc.0", "parent"), CLK_DUPLICATE("pll_p", "tegradc.1", "parent"), CLK_DUPLICATE("pll_d2_out0", "hdmi", "parent"), + CLK_DUPLICATE("2d", "gr2d", "gr2d"), };
static struct clk *tegra_ptr_clks[] = {
On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
Add a driver alias gr2d for Tegra 2D device, and assign a duplicate of 2D clock to that driver alias.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
 arch/arm/mach-tegra/board-dt-tegra20.c | 1 +
 arch/arm/mach-tegra/board-dt-tegra30.c | 1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c | 2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)
With Prashant's clock rework patches now merged this patch can be dropped.
Thierry
On 02/04/2013 04:26 AM, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
Add a driver alias gr2d for Tegra 2D device, and assign a duplicate of 2D clock to that driver alias.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 arch/arm/mach-tegra/board-dt-tegra20.c | 1 +
 arch/arm/mach-tegra/board-dt-tegra30.c | 1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c | 2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)
With Prashant's clock rework patches now merged this patch can be dropped.
Assuming this series is applied for 3.10 and not earlier, yes. I'd certainly recommend applying for 3.10 not 3.9; the dependencies to apply this for 3.9 given the AUXDATA/... requirements would be too painful.
On 04.02.2013 03:26, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
Add a driver alias gr2d for Tegra 2D device, and assign a duplicate of 2D clock to that driver alias.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
 arch/arm/mach-tegra/board-dt-tegra20.c | 1 +
 arch/arm/mach-tegra/board-dt-tegra30.c | 1 +
 arch/arm/mach-tegra/tegra20_clocks_data.c | 2 +-
 arch/arm/mach-tegra/tegra30_clocks_data.c | 1 +
 4 files changed, 4 insertions(+), 1 deletion(-)
With Prashant's clock rework patches now merged this patch can be dropped.
Yes, and I'll also need to start calling 2D clock with name NULL in gr2d.c.
Terje
Add client driver for 2D device, and IOCTLs to pass work to host1x channel for 2D.
Also adds functions that can be called to access sync points from DRM.
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
---
 drivers/gpu/host1x/Makefile | 1 +
 drivers/gpu/host1x/dev.c | 7 +
 drivers/gpu/host1x/drm/drm.c | 226 +++++++++++++++++++++++++++-
 drivers/gpu/host1x/drm/drm.h | 28 ++++
 drivers/gpu/host1x/drm/gr2d.c | 325 +++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/syncpt.c | 5 +
 drivers/gpu/host1x/syncpt.h | 3 +
 include/drm/tegra_drm.h | 131 +++++++++++++++++
 8 files changed, 725 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/host1x/drm/gr2d.c
 create mode 100644 include/drm/tegra_drm.h
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index c35ee19..c2120ad 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -18,4 +18,5 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG

 host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o
 host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
+host1x-$(CONFIG_DRM_TEGRA) += drm/gr2d.o
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 17ee01c..40d9938 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -214,11 +214,17 @@ static int __init tegra_host1x_init(void)
 	err = platform_driver_register(&tegra_hdmi_driver);
 	if (err < 0)
 		goto unregister_dc;
+
+	err = platform_driver_register(&tegra_gr2d_driver);
+	if (err < 0)
+		goto unregister_hdmi;
 #endif

 	return 0;

 #ifdef CONFIG_TEGRA_DRM
+unregister_hdmi:
+	platform_driver_unregister(&tegra_hdmi_driver);
 unregister_dc:
 	platform_driver_unregister(&tegra_dc_driver);
 unregister_host1x:
@@ -231,6 +237,7 @@ module_init(tegra_host1x_init);
 static void __exit tegra_host1x_exit(void)
 {
 #ifdef CONFIG_TEGRA_DRM
+	platform_driver_unregister(&tegra_gr2d_driver);
 	platform_driver_unregister(&tegra_hdmi_driver);
 	platform_driver_unregister(&tegra_dc_driver);
 #endif
diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
index bef9051..f8f8508 100644
--- a/drivers/gpu/host1x/drm/drm.c
+++ b/drivers/gpu/host1x/drm/drm.c
@@ -14,9 +14,11 @@
 #include <mach/clk.h>
 #include <linux/dma-mapping.h>
 #include <asm/dma-iommu.h>
+#include <drm/tegra_drm.h>

 #include "drm.h"
 #include "host1x_client.h"
+#include "syncpt.h"

 #define DRIVER_NAME "tegra"
 #define DRIVER_DESC "NVIDIA Tegra graphics"
@@ -78,8 +80,10 @@ static int host1x_parse_dt(struct host1x *host1x)
 	static const char * const compat[] = {
 		"nvidia,tegra20-dc",
 		"nvidia,tegra20-hdmi",
+		"nvidia,tegra20-gr2d",
 		"nvidia,tegra30-dc",
 		"nvidia,tegra30-hdmi",
+		"nvidia,tegra30-gr2d",
 	};
 	unsigned int i;
 	int err;
@@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp) { - return 0; + struct host1x_drm_fpriv *fpriv; + int err = 0; + + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL); + if (!fpriv) + return -ENOMEM; + + INIT_LIST_HEAD(&fpriv->contexts); + filp->driver_priv = fpriv; + + return err; +} + +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp) +{ + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp); + struct host1x_drm_context *context, *tmp; + + list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) { + context->client->ops->close_channel(context); + kfree(context); + } + kfree(fpriv); }
static void tegra_drm_lastclose(struct drm_device *drm) @@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm) drm_fbdev_cma_restore_mode(host1x->fbdev); }
+static int +tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct host1x *host1x = drm->dev_private; + struct tegra_drm_syncpt_read_args *args = data; + struct host1x_syncpt *sp = + host1x_syncpt_get_bydev(host1x->dev, args->id); + + if (!sp) + return -EINVAL; + + args->value = host1x_syncpt_read_min(sp); + return 0; +} + +static int +tegra_drm_ioctl_syncpt_incr(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct host1x *host1x = drm->dev_private; + struct tegra_drm_syncpt_incr_args *args = data; + struct host1x_syncpt *sp = + host1x_syncpt_get_bydev(host1x->dev, args->id); + + if (!sp) + return -EINVAL; + + host1x_syncpt_incr(sp); + return 0; +} + +static int +tegra_drm_ioctl_syncpt_wait(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct host1x *host1x = drm->dev_private; + struct tegra_drm_syncpt_wait_args *args = data; + struct host1x_syncpt *sp = + host1x_syncpt_get_bydev(host1x->dev, args->id); + + if (!sp) + return -EINVAL; + + return host1x_syncpt_wait(sp, args->thresh, + args->timeout, &args->value); +} + +static int +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct tegra_drm_open_channel_args *args = data; + struct host1x_client *client; + struct host1x_drm_context *context; + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv); + struct host1x *host1x = drm->dev_private; + int err = 0; + + context = kzalloc(sizeof(*context), GFP_KERNEL); + if (!context) + return -ENOMEM; + + list_for_each_entry(client, &host1x->clients, list) { + if (client->class == args->class) { + context->client = client; + err = client->ops->open_channel(client, context); + if (err) + goto out; + + list_add(&context->list, &fpriv->contexts); + args->context = (uintptr_t)context; + goto out; + } + } + err = -ENODEV; + +out: + if (err) + kfree(context); + + return err; +} + +static int 
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct tegra_drm_open_channel_args *args = data; + struct host1x_drm_context *context, *tmp; + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv); + int err = 0; + + list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) { + if ((uintptr_t)context == args->context) { + context->client->ops->close_channel(context); + list_del(&context->list); + kfree(context); + goto out; + } + } + err = -EINVAL; + +out: + return err; +} + +static int +tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct tegra_drm_get_channel_param_args *args = data; + struct host1x_drm_context *context; + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv); + int err = 0; + + list_for_each_entry(context, &fpriv->contexts, list) { + if ((uintptr_t)context == args->context) { + args->value = + context->client->ops->get_syncpoint(context, + args->param); + goto out; + } + } + err = -ENODEV; + +out: + return err; +} + +static int +tegra_drm_ioctl_submit(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct tegra_drm_submit_args *args = data; + struct host1x_drm_context *context; + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv); + int err = 0; + + list_for_each_entry(context, &fpriv->contexts, list) { + if ((uintptr_t)context == args->context) { + err = context->client->ops->submit(context, args, drm, + file_priv); + goto out; + } + } + err = -ENODEV; + +out: + return err; + +} + +static int +tegra_drm_create_ioctl(struct drm_device *drm, void *data, + struct drm_file *file_priv) +{ + struct tegra_gem_create *args = data; + struct drm_gem_cma_object *cma_obj; + int ret; + + cma_obj = drm_gem_cma_create(drm, args->size); + if (IS_ERR(cma_obj)) + goto err_cma_create; + + ret = drm_gem_handle_create(file_priv, &cma_obj->base, &args->handle); + if (ret) + goto err_handle_create; + + 
args->offset = cma_obj->base.map_list.hash.key << PAGE_SHIFT; + + drm_gem_object_unreference(&cma_obj->base); + + return 0; + +err_handle_create: + drm_gem_cma_free_object(&cma_obj->base); +err_cma_create: + return -ENOMEM; +} + static struct drm_ioctl_desc tegra_drm_ioctls[] = { + DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, + tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ, + tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR, + tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT, + tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL, + tegra_drm_ioctl_open_channel, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_CLOSE_CHANNEL, + tegra_drm_ioctl_close_channel, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_SYNCPOINT, + tegra_drm_ioctl_get_syncpoint, DRM_UNLOCKED), + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SUBMIT, + tegra_drm_ioctl_submit, DRM_UNLOCKED), };
static const struct file_operations tegra_drm_fops = { @@ -303,6 +526,7 @@ struct drm_driver tegra_drm_driver = { .load = tegra_drm_load, .unload = tegra_drm_unload, .open = tegra_drm_open, + .preclose = tegra_drm_close, .lastclose = tegra_drm_lastclose,
.gem_free_object = drm_gem_cma_free_object, diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h index e7101d5..dc4c128 100644 --- a/drivers/gpu/host1x/drm/drm.h +++ b/drivers/gpu/host1x/drm/drm.h @@ -17,6 +17,7 @@ #include <drm/drm_gem_cma_helper.h> #include <drm/drm_fb_cma_helper.h> #include <drm/drm_fixed.h> +#include <drm/tegra_drm.h>
struct tegra_framebuffer { struct drm_framebuffer base; @@ -49,17 +50,44 @@ struct host1x {
struct host1x_client;
+struct host1x_drm_context { + struct host1x_client *client; + struct host1x_channel *channel; + struct list_head list; +}; + struct host1x_client_ops { int (*drm_init)(struct host1x_client *client, struct drm_device *drm); int (*drm_exit)(struct host1x_client *client); + int (*open_channel)(struct host1x_client *, + struct host1x_drm_context *); + void (*close_channel)(struct host1x_drm_context *); + u32 (*get_syncpoint)(struct host1x_drm_context *, int index); + int (*submit)(struct host1x_drm_context *, + struct tegra_drm_submit_args *, + struct drm_device *, + struct drm_file *); +}; + +struct host1x_drm_fpriv { + struct list_head contexts; };
+static inline struct host1x_drm_fpriv * +host1x_drm_fpriv(struct drm_file *file_priv) +{ + return file_priv ? file_priv->driver_priv : NULL; +} + struct host1x_client { struct host1x *host1x; struct device *dev;
const struct host1x_client_ops *ops;
+ u32 class; + struct host1x_channel *channel; + struct list_head list; };
diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c new file mode 100644 index 0000000..dc7d6c6 --- /dev/null +++ b/drivers/gpu/host1x/drm/gr2d.c @@ -0,0 +1,325 @@ +/* + * drivers/video/tegra/host/gr2d/gr2d.c + * + * Tegra Graphics 2D + * + * Copyright (c) 2012, NVIDIA Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#include <linux/export.h> +#include <linux/of.h> +#include <linux/of_device.h> +#include <linux/clk.h> +#include <drm/tegra_drm.h> +#include "drm.h" +#include "job.h" +#include "channel.h" +#include "host1x.h" +#include "syncpt.h" +#include "memmgr.h" +#include "host1x_client.h" + +struct gr2d { + struct host1x_client client; + struct clk *clk; + struct host1x_syncpt *syncpt; + struct host1x_channel *channel; +}; + +static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg); + +static int gr2d_client_init(struct host1x_client *client, + struct drm_device *drm) +{ + return 0; +} + +static int gr2d_client_exit(struct host1x_client *client) +{ + return 0; +} + +static int gr2d_open_channel(struct host1x_client *client, + struct host1x_drm_context *context) +{ + struct gr2d *gr2d = dev_get_drvdata(client->dev); + context->channel = host1x_channel_get(gr2d->channel); + + if (!context->channel) + return -ENOMEM; + + return 0; +} + +static void gr2d_close_channel(struct host1x_drm_context *context) +{ + host1x_channel_put(context->channel); +} + +static 
u32 gr2d_get_syncpoint(struct host1x_drm_context *context, int index) +{ + struct gr2d *gr2d = dev_get_drvdata(context->client->dev); + if (index != 0) + return UINT_MAX; + + return host1x_syncpt_id(gr2d->syncpt); +} + +static u32 handle_cma_to_host1x(struct drm_device *drm, + struct drm_file *file_priv, u32 gem_handle) +{ + struct drm_gem_object *obj; + struct drm_gem_cma_object *cma_obj; + u32 host1x_handle; + + obj = drm_gem_object_lookup(drm, file_priv, gem_handle); + if (!obj) + return 0; + + cma_obj = to_drm_gem_cma_obj(obj); + host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj); + drm_gem_object_unreference(obj); + + return host1x_handle; +} + +static int gr2d_submit(struct host1x_drm_context *context, + struct tegra_drm_submit_args *args, + struct drm_device *drm, + struct drm_file *file_priv) +{ + struct host1x_job *job; + int num_cmdbufs = args->num_cmdbufs; + int num_relocs = args->num_relocs; + int num_waitchks = args->num_waitchks; + struct tegra_drm_cmdbuf __user *cmdbufs = + (void * __user)(uintptr_t)args->cmdbufs; + struct tegra_drm_reloc __user *relocs = + (void * __user)(uintptr_t)args->relocs; + struct tegra_drm_waitchk __user *waitchks = + (void * __user)(uintptr_t)args->waitchks; + struct tegra_drm_syncpt_incr syncpt_incr; + int err; + + /* We don't yet support other than one syncpt_incr struct per submit */ + if (args->num_syncpt_incrs != 1) + return -EINVAL; + + job = host1x_job_alloc(context->channel, + args->num_cmdbufs, + args->num_relocs, + args->num_waitchks); + if (!job) + return -ENOMEM; + + job->num_relocs = args->num_relocs; + job->num_waitchk = args->num_waitchks; + job->clientid = (u32)args->context; + job->class = context->client->class; + job->serialize = true; + + while (num_cmdbufs) { + struct tegra_drm_cmdbuf cmdbuf; + err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf)); + if (err) + goto fail; + + cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem); + if (!cmdbuf.mem) + goto fail; + + 
host1x_job_add_gather(job, + cmdbuf.mem, cmdbuf.words, cmdbuf.offset); + num_cmdbufs--; + cmdbufs++; + } + + err = copy_from_user(job->relocarray, + relocs, sizeof(*relocs) * num_relocs); + if (err) + goto fail; + + while (num_relocs--) { + job->relocarray[num_relocs].cmdbuf_mem = + handle_cma_to_host1x(drm, file_priv, + job->relocarray[num_relocs].cmdbuf_mem); + job->relocarray[num_relocs].target = + handle_cma_to_host1x(drm, file_priv, + job->relocarray[num_relocs].target); + + if (!job->relocarray[num_relocs].target || + !job->relocarray[num_relocs].cmdbuf_mem) + goto fail; + } + + err = copy_from_user(job->waitchk, + waitchks, sizeof(*waitchks) * num_waitchks); + if (err) + goto fail; + + err = host1x_job_pin(job, to_platform_device(context->client->dev)); + if (err) + goto fail; + + err = copy_from_user(&syncpt_incr, + (void * __user)(uintptr_t)args->syncpt_incrs, + sizeof(syncpt_incr)); + if (err) + goto fail; + + job->syncpt_id = syncpt_incr.syncpt_id; + job->syncpt_incrs = syncpt_incr.syncpt_incrs; + job->timeout = 10000; + job->is_addr_reg = gr2d_is_addr_reg; + if (args->timeout && args->timeout < 10000) + job->timeout = args->timeout; + + err = host1x_channel_submit(job); + if (err) + goto fail_submit; + + args->fence = job->syncpt_end; + + host1x_job_put(job); + return 0; + +fail_submit: + host1x_job_unpin(job); +fail: + host1x_job_put(job); + return err; +} + +static struct host1x_client_ops gr2d_client_ops = { + .drm_init = gr2d_client_init, + .drm_exit = gr2d_client_exit, + .open_channel = gr2d_open_channel, + .close_channel = gr2d_close_channel, + .get_syncpoint = gr2d_get_syncpoint, + .submit = gr2d_submit, +}; + +static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg) +{ + int ret; + + if (class == NV_HOST1X_CLASS_ID) + ret = reg == 0x2b; + else + switch (reg) { + case 0x1a: + case 0x1b: + case 0x26: + case 0x2b: + case 0x2c: + case 0x2d: + case 0x31: + case 0x32: + case 0x48: + case 0x49: + case 0x4a: + case 0x4b: + case 0x4c: 
+ ret = 1; + break; + default: + ret = 0; + break; + } + + return ret; +} + +static struct of_device_id gr2d_match[] = { + { .compatible = "nvidia,tegra30-gr2d" }, + { .compatible = "nvidia,tegra20-gr2d" }, + { }, +}; + +static int gr2d_probe(struct platform_device *dev) +{ + struct host1x *host1x = + host1x_get_drm_data(to_platform_device(dev->dev.parent)); + int err; + struct gr2d *gr2d = NULL; + + gr2d = devm_kzalloc(&dev->dev, sizeof(*gr2d), GFP_KERNEL); + if (!gr2d) + return -ENOMEM; + + gr2d->clk = devm_clk_get(&dev->dev, "gr2d"); + if (IS_ERR(gr2d->clk)) { + dev_err(&dev->dev, "cannot get clock\n"); + return PTR_ERR(gr2d->clk); + } + + err = clk_prepare_enable(gr2d->clk); + if (err) { + dev_err(&dev->dev, "cannot turn on clock\n"); + return err; + } + + gr2d->channel = host1x_channel_alloc(dev); + if (!gr2d->channel) + return -ENOMEM; + + gr2d->syncpt = host1x_syncpt_alloc(dev, 0); + if (!gr2d->syncpt) { + host1x_channel_free(gr2d->channel); + return -ENOMEM; + } + + gr2d->client.ops = &gr2d_client_ops; + gr2d->client.dev = &dev->dev; + gr2d->client.class = NV_GRAPHICS_2D_CLASS_ID; + + err = host1x_register_client(host1x, &gr2d->client); + if (err < 0) { + dev_err(&dev->dev, "failed to register host1x client: %d\n", + err); + return err; + } + + platform_set_drvdata(dev, gr2d); + return 0; +} + +static int __exit gr2d_remove(struct platform_device *dev) +{ + struct host1x *host1x = + host1x_get_drm_data(to_platform_device(dev->dev.parent)); + struct gr2d *gr2d = platform_get_drvdata(dev); + int err; + + err = host1x_unregister_client(host1x, &gr2d->client); + if (err < 0) { + dev_err(&dev->dev, "failed to unregister host1x client: %d\n", + err); + return err; + } + + host1x_syncpt_free(gr2d->syncpt); + return 0; +} + +struct platform_driver tegra_gr2d_driver = { + .probe = gr2d_probe, + .remove = __exit_p(gr2d_remove), + .driver = { + .owner = THIS_MODULE, + .name = "gr2d", + .of_match_table = gr2d_match, + } +}; diff --git a/drivers/gpu/host1x/syncpt.c 
b/drivers/gpu/host1x/syncpt.c index 191f65f..ccaaece 100644 --- a/drivers/gpu/host1x/syncpt.c +++ b/drivers/gpu/host1x/syncpt.c @@ -392,3 +392,8 @@ struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id) { return dev->syncpt + id; } + +struct host1x_syncpt *host1x_syncpt_get_bydev(struct device *dev, u32 id) +{ + return host1x_syncpt_get(dev_get_drvdata(dev), id); +} diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h index 255a3a3..d15d4c5 100644 --- a/drivers/gpu/host1x/syncpt.h +++ b/drivers/gpu/host1x/syncpt.h @@ -110,6 +110,9 @@ static inline bool host1x_syncpt_min_eq_max(struct host1x_syncpt *sp) /* Return pointer to struct denoting sync point id. */ struct host1x_syncpt *host1x_syncpt_get(struct host1x *dev, u32 id);
+/* Return pointer to struct denoting sync point id, when given client pdev. */ +struct host1x_syncpt *host1x_syncpt_get_bydev(struct device *dev, u32 id); + /* Request incrementing a sync point. */ void host1x_syncpt_cpu_incr(struct host1x_syncpt *sp);
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h new file mode 100644 index 0000000..11fb019 --- /dev/null +++ b/include/drm/tegra_drm.h @@ -0,0 +1,131 @@ +/* + * Copyright (c) 2012, NVIDIA CORPORATION. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License + * along with this program. If not, see http://www.gnu.org/licenses/. + */ + +#ifndef _TEGRA_DRM_H_ +#define _TEGRA_DRM_H_ + +struct tegra_gem_create { + __u64 size; + unsigned int flags; + unsigned int handle; + unsigned int offset; +}; + +struct tegra_gem_invalidate { + unsigned int handle; +}; + +struct tegra_gem_flush { + unsigned int handle; +}; + +struct tegra_drm_syncpt_read_args { + __u32 id; + __u32 value; +}; + +struct tegra_drm_syncpt_incr_args { + __u32 id; + __u32 pad; +}; + +struct tegra_drm_syncpt_wait_args { + __u32 id; + __u32 thresh; + __s32 timeout; + __u32 value; +}; + +#define DRM_TEGRA_NO_TIMEOUT (-1) + +struct tegra_drm_open_channel_args { + __u32 class; + __u32 pad; + __u64 context; +}; + +struct tegra_drm_get_channel_param_args { + __u64 context; + __u32 param; + __u32 value; +}; + +struct tegra_drm_syncpt_incr { + __u32 syncpt_id; + __u32 syncpt_incrs; +}; + +struct tegra_drm_cmdbuf { + __u32 mem; + __u32 offset; + __u32 words; + __u32 pad; +}; + +struct tegra_drm_reloc { + __u32 cmdbuf_mem; + __u32 cmdbuf_offset; + __u32 target; + __u32 target_offset; + __u32 shift; + __u32 pad; +}; + +struct tegra_drm_waitchk { + __u32 mem; + __u32 offset; + __u32 syncpt_id; + __u32 thresh; +}; + 
+struct tegra_drm_submit_args { + __u64 context; + __u32 num_syncpt_incrs; + __u32 num_cmdbufs; + __u32 num_relocs; + __u32 submit_version; + __u32 num_waitchks; + __u32 waitchk_mask; + __u32 timeout; + __u32 pad; + __u64 syncpt_incrs; + __u64 cmdbufs; + __u64 relocs; + __u64 waitchks; + __u32 fence; /* Return value */ + + __u32 reserved[5]; /* future expansion */ +}; + +#define DRM_TEGRA_GEM_CREATE 0x00 +#define DRM_TEGRA_DRM_SYNCPT_READ 0x01 +#define DRM_TEGRA_DRM_SYNCPT_INCR 0x02 +#define DRM_TEGRA_DRM_SYNCPT_WAIT 0x03 +#define DRM_TEGRA_DRM_OPEN_CHANNEL 0x04 +#define DRM_TEGRA_DRM_CLOSE_CHANNEL 0x05 +#define DRM_TEGRA_DRM_GET_SYNCPOINT 0x06 +#define DRM_TEGRA_DRM_SUBMIT 0x08 + +#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct tegra_gem_create) +#define DRM_IOCTL_TEGRA_DRM_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_READ, struct tegra_drm_syncpt_read_args) +#define DRM_IOCTL_TEGRA_DRM_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_INCR, struct tegra_drm_syncpt_incr_args) +#define DRM_IOCTL_TEGRA_DRM_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SYNCPT_WAIT, struct tegra_drm_syncpt_wait_args) +#define DRM_IOCTL_TEGRA_DRM_OPEN_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_OPEN_CHANNEL, struct tegra_drm_open_channel_args) +#define DRM_IOCTL_TEGRA_DRM_CLOSE_CHANNEL DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_CLOSE_CHANNEL, struct tegra_drm_open_channel_args) +#define DRM_IOCTL_TEGRA_DRM_GET_SYNCPOINT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_GET_SYNCPOINT, struct tegra_drm_get_channel_param_args) +#define DRM_IOCTL_TEGRA_DRM_SUBMIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_DRM_SUBMIT, struct tegra_drm_submit_args) + +#endif
On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
@@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)

 static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 {
-	return 0;
+	struct host1x_drm_fpriv *fpriv;
+	int err = 0;

Can be dropped.

+	fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
+	if (!fpriv)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&fpriv->contexts);
+	filp->driver_priv = fpriv;
+
+	return err;

return 0;

+static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
+{
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
+	struct host1x_drm_context *context, *tmp;
+
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		context->client->ops->close_channel(context);
+		kfree(context);
+	}
+	kfree(fpriv);
 }
Maybe you should add host1x_drm_context_free() to wrap the loop contents?
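A rough sketch of what such a helper could look like, written as plain userspace C so it can be compiled on its own. The struct layouts are cut-down stand-ins, the singly linked list replaces struct list_head, free() stands in for kfree(), and demo_close_channel() and demo_release_contexts() are hypothetical names used only for this illustration:

```c
#include <stdlib.h>

struct host1x_drm_context;

/* Stand-in for the close_channel member of struct host1x_client_ops. */
struct host1x_client_ops {
	void (*close_channel)(struct host1x_drm_context *context);
};

struct host1x_client {
	const struct host1x_client_ops *ops;
};

/* Simplified context: a "next" pointer replaces struct list_head. */
struct host1x_drm_context {
	struct host1x_client *client;
	struct host1x_drm_context *next;
};

/* Hypothetical client callback that only counts how often it is called. */
static int closed_channels;

static void demo_close_channel(struct host1x_drm_context *context)
{
	(void)context;
	closed_channels++;
}

/* Close the channel behind a context, then release the context itself. */
static void host1x_drm_context_free(struct host1x_drm_context *context)
{
	context->client->ops->close_channel(context);
	free(context);	/* kfree() in the driver */
}

/* What the loop in the close path reduces to once the helper exists. */
static void demo_release_contexts(struct host1x_drm_context *head)
{
	while (head) {
		struct host1x_drm_context *next = head->next;

		host1x_drm_context_free(head);
		head = next;
	}
}
```

The same helper could then also be used by the CLOSE_CHANNEL ioctl after list_del().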
@@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
 	drm_fbdev_cma_restore_mode(host1x->fbdev);
 }

+static int
+tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
+			    struct drm_file *file_priv)

static int and function name on one line, please.

+{
+	struct host1x *host1x = drm->dev_private;
+	struct tegra_drm_syncpt_read_args *args = data;
+	struct host1x_syncpt *sp =
+		host1x_syncpt_get_bydev(host1x->dev, args->id);
I don't know if we need this, except maybe to work around the problem that we have two different structures named host1x. The _bydev() suffix is misleading because all you really do here is obtain the syncpt from the host1x.
+static int
+tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
+			     struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct host1x_client *client;
+	struct host1x_drm_context *context;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	struct host1x *host1x = drm->dev_private;
+	int err = 0;

err = -ENODEV; (see below)

+	context = kzalloc(sizeof(*context), GFP_KERNEL);
+	if (!context)
+		return -ENOMEM;
+
+	list_for_each_entry(client, &host1x->clients, list) {
+		if (client->class == args->class) {
+			context->client = client;
+			err = client->ops->open_channel(client, context);
+			if (err)
+				goto out;
+
+			list_add(&context->list, &fpriv->contexts);
+			args->context = (uintptr_t)context;

Perhaps cast this to __u64 directly instead? There's little sense in taking the detour via uintptr_t.

+			goto out;

return 0;

+		}
+	}
+	err = -ENODEV;
+
+out:
+	if (err)
+		kfree(context);
+
+	return err;
+}
Then this simply becomes:
kfree(context); return err;
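Put together, the control flow suggested above could be sketched like this. It is a userspace approximation, not the driver code: demo_client, demo_context and demo_open_channel() are hypothetical stand-ins, the client list is a plain array, and open_result simulates the return value of the client's open_channel callback:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a registered host1x client. */
struct demo_client {
	uint32_t class_id;
	int open_result;	/* simulated result of ops->open_channel() */
};

struct demo_context {
	uint32_t class_id;
};

static int demo_open_channel(const struct demo_client *clients, size_t count,
			     uint32_t class_id, struct demo_context **out)
{
	struct demo_context *context;
	int err = -ENODEV;	/* default when no client matches */
	size_t i;

	context = calloc(1, sizeof(*context));
	if (!context)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		if (clients[i].class_id != class_id)
			continue;

		err = clients[i].open_result;
		if (err)
			goto out;

		context->class_id = class_id;
		*out = context;
		return 0;	/* success leaves the function right here */
	}

out:
	/* Only failures reach this label, so the free is unconditional. */
	free(context);
	return err;
}
```

With the success path returning early, the "if (err)" check in front of the free is no longer needed.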
+static int
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
+			      struct drm_file *file_priv)
+{
+	struct tegra_drm_open_channel_args *args = data;
+	struct host1x_drm_context *context, *tmp;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			context->client->ops->close_channel(context);
+			list_del(&context->list);
+			kfree(context);
+			goto out;
+		}
+	}
+	err = -EINVAL;
+
+out:
+	return err;
+}
Same comments as for tegra_drm_ioctl_open_channel().
+static int
+tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
+			      struct drm_file *file_priv)
+{
+	struct tegra_drm_get_channel_param_args *args = data;
+	struct host1x_drm_context *context;
+	struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
+	int err = 0;
+
+	list_for_each_entry(context, &fpriv->contexts, list) {
+		if ((uintptr_t)context == args->context) {
+			args->value =
+				context->client->ops->get_syncpoint(context,
+								    args->param);
+			goto out;
+		}
+	}
+	err = -ENODEV;
+
+out:
+	return err;
+}
Same comments as well. Also you may want to factor out the context lookup into a separate function so you don't have to repeat the same code over and over again.
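For illustration, a factored-out lookup might look roughly like the following userspace sketch. context_lookup() is a hypothetical name, the structs are simplified stand-ins, and a "next" pointer replaces the kernel's struct list_head:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the per-file context bookkeeping. */
struct host1x_drm_context {
	struct host1x_drm_context *next;
};

struct host1x_drm_file {
	struct host1x_drm_context *contexts;
};

/* Return the context owned by this file that matches handle, or NULL. */
static struct host1x_drm_context *context_lookup(struct host1x_drm_file *file,
						 uint64_t handle)
{
	struct host1x_drm_context *context;

	for (context = file->contexts; context; context = context->next)
		if ((uint64_t)(uintptr_t)context == handle)
			return context;

	return NULL;
}
```

Each of the CLOSE_CHANNEL, GET_SYNCPOINT and SUBMIT handlers would then start with one call to the helper and bail out if it returns NULL. Because the handle is only ever compared against contexts on the caller's own list, a value made up by userspace cannot be dereferenced.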
I wonder if we shouldn't remove .get_syncpoint() from the client ops and replace it by a simple array instead. The only use-case for this is if a client wants more than a single syncpoint, right? In that case just keep an array of syncpoints and the number of syncpoints per client. Otherwise each client will have to rewrite the same function.
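To make the array idea concrete, here is a minimal userspace sketch. The field names syncpts and num_syncpts and the function demo_client_get_syncpt() are assumptions for this example and do not exist in the series:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client that simply lists the syncpoints it owns. */
struct demo_client {
	const uint32_t *syncpts;	/* syncpoint IDs owned by the client */
	unsigned int num_syncpts;
};

/* Generic lookup shared by all clients; no per-client callback needed. */
static int demo_client_get_syncpt(const struct demo_client *client,
				  unsigned int index, uint32_t *id)
{
	if (index >= client->num_syncpts)
		return -EINVAL;

	*id = client->syncpts[index];
	return 0;
}
```

A client with a single syncpoint would register an array of length one, and the bounds check lives in one place instead of in every client.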
Also, how useful is it to create a context? Looking at the gr2d implementation for .open_channel(), it will return the same channel to whichever userspace process requests them. Can you explain why it is necessary at all? From the name I would have expected some kind of context switching to take place when different applications submit requests to the same client, but that doesn't seem to be the case.
+static int
+tegra_drm_create_ioctl(struct drm_device *drm, void *data,
+		       struct drm_file *file_priv)

tegra_drm_gem_create_ioctl() please.

 static struct drm_ioctl_desc tegra_drm_ioctls[] = {
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
+			  tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),

TEGRA_DRM_GEM_CREATE

 static const struct file_operations tegra_drm_fops = {
@@ -303,6 +526,7 @@ struct drm_driver tegra_drm_driver = {
 	.load = tegra_drm_load,
 	.unload = tegra_drm_unload,
 	.open = tegra_drm_open,
+	.preclose = tegra_drm_close,
I think it'd make sense to name the function tegra_drm_preclose() to match the name in struct drm_driver.
diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
[...]
+struct host1x_drm_fpriv {
+	struct list_head contexts;
 };

Maybe name this host1x_drm_file. fpriv isn't very specific.

+static inline struct host1x_drm_fpriv *
+host1x_drm_fpriv(struct drm_file *file_priv)
+{
+	return file_priv ? file_priv->driver_priv : NULL;
+}

I think it's fine to just directly do filp->driver_priv instead of going through this wrapper.

 struct host1x_client {
 	struct host1x *host1x;
 	struct device *dev;

 	const struct host1x_client_ops *ops;

+	u32 class;
Should this perhaps be an enum?
diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
[...]
+static u32 gr2d_get_syncpoint(struct host1x_drm_context *context, int index)
+{
+	struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
+	if (index != 0)
+		return UINT_MAX;
+
+	return host1x_syncpt_id(gr2d->syncpt);
+}
Maybe get_syncpoint() should return int and negative error codes on failure. That still leaves room for 2^31 possible syncpoints.
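As a sketch of that calling convention, assuming hypothetical demo_ types in place of the driver structures: the callback returns either a non-negative syncpoint ID or a negative errno, and the ioctl-level caller only writes the result back on success:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct gr2d and the ioctl argument struct. */
struct demo_gr2d {
	uint32_t syncpt_id;
};

struct demo_get_syncpoint_args {
	uint32_t param;
	uint32_t value;
};

/* Returns the syncpoint ID (>= 0) or a negative error code. */
static int demo_gr2d_get_syncpoint(const struct demo_gr2d *gr2d, int index)
{
	if (index != 0)
		return -EINVAL;

	return (int)gr2d->syncpt_id;
}

/* Caller side: propagate errors, write the ID back only on success. */
static int demo_ioctl_get_syncpoint(const struct demo_gr2d *gr2d,
				    struct demo_get_syncpoint_args *args)
{
	int id = demo_gr2d_get_syncpoint(gr2d, (int)args->param);

	if (id < 0)
		return id;

	args->value = (uint32_t)id;
	return 0;
}
```

Compared with returning UINT_MAX, userspace gets a proper error from the ioctl instead of a magic value it has to know about.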
+static u32 handle_cma_to_host1x(struct drm_device *drm,
+				struct drm_file *file_priv, u32 gem_handle)
+{
+	struct drm_gem_object *obj;
+	struct drm_gem_cma_object *cma_obj;
+	u32 host1x_handle;
+
+	obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
+	if (!obj)
+		return 0;
+
+	cma_obj = to_drm_gem_cma_obj(obj);
+	host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
+	drm_gem_object_unreference(obj);
+
+	return host1x_handle;
+}
I thought we had settled in previous reviews on having only a single allocator and not converting between various types?
+static int gr2d_submit(struct host1x_drm_context *context,
+		       struct tegra_drm_submit_args *args,
+		       struct drm_device *drm,
+		       struct drm_file *file_priv)
+{
+	struct host1x_job *job;
+	int num_cmdbufs = args->num_cmdbufs;
+	int num_relocs = args->num_relocs;
+	int num_waitchks = args->num_waitchks;
+	struct tegra_drm_cmdbuf __user *cmdbufs =
+		(void * __user)(uintptr_t)args->cmdbufs;
+	struct tegra_drm_reloc __user *relocs =
+		(void * __user)(uintptr_t)args->relocs;
+	struct tegra_drm_waitchk __user *waitchks =
+		(void * __user)(uintptr_t)args->waitchks;

No need for all the uintptr_t casts.

+	struct tegra_drm_syncpt_incr syncpt_incr;
+	int err;
+
+	/* We don't yet support other than one syncpt_incr struct per submit */
+	if (args->num_syncpt_incrs != 1)
+		return -EINVAL;
+
+	job = host1x_job_alloc(context->channel,
+			       args->num_cmdbufs,
+			       args->num_relocs,
+			       args->num_waitchks);
+	if (!job)
+		return -ENOMEM;
+
+	job->num_relocs = args->num_relocs;
+	job->num_waitchk = args->num_waitchks;
+	job->clientid = (u32)args->context;
+	job->class = context->client->class;
+	job->serialize = true;
+
+	while (num_cmdbufs) {
+		struct tegra_drm_cmdbuf cmdbuf;
+		err = copy_from_user(&cmdbuf, cmdbufs, sizeof(cmdbuf));
+		if (err)
+			goto fail;
+
+		cmdbuf.mem = handle_cma_to_host1x(drm, file_priv, cmdbuf.mem);
+		if (!cmdbuf.mem)
+			goto fail;
+
+		host1x_job_add_gather(job,
+				      cmdbuf.mem, cmdbuf.words, cmdbuf.offset);
+		num_cmdbufs--;
+		cmdbufs++;
+	}
+
+	err = copy_from_user(job->relocarray,
+			     relocs, sizeof(*relocs) * num_relocs);
+	if (err)
+		goto fail;
+
+	while (num_relocs--) {
+		job->relocarray[num_relocs].cmdbuf_mem =
+			handle_cma_to_host1x(drm, file_priv,
+					     job->relocarray[num_relocs].cmdbuf_mem);
+		job->relocarray[num_relocs].target =
+			handle_cma_to_host1x(drm, file_priv,
+					     job->relocarray[num_relocs].target);
+
+		if (!job->relocarray[num_relocs].target ||
+		    !job->relocarray[num_relocs].cmdbuf_mem)
+			goto fail;
+	}
+
+	err = copy_from_user(job->waitchk,
+			     waitchks, sizeof(*waitchks) * num_waitchks);
+	if (err)
+		goto fail;
+
+	err = host1x_job_pin(job, to_platform_device(context->client->dev));
+	if (err)
+		goto fail;
+
+	err = copy_from_user(&syncpt_incr,
+			     (void * __user)(uintptr_t)args->syncpt_incrs,
+			     sizeof(syncpt_incr));
+	if (err)
+		goto fail;
+
+	job->syncpt_id = syncpt_incr.syncpt_id;
+	job->syncpt_incrs = syncpt_incr.syncpt_incrs;
+	job->timeout = 10000;
+	job->is_addr_reg = gr2d_is_addr_reg;
+	if (args->timeout && args->timeout < 10000)
+		job->timeout = args->timeout;
+
+	err = host1x_channel_submit(job);
+	if (err)
+		goto fail_submit;
+
+	args->fence = job->syncpt_end;
+
+	host1x_job_put(job);
+	return 0;
+
+fail_submit:
+	host1x_job_unpin(job);
+fail:
+	host1x_job_put(job);
+	return err;
+}
Most of this looks very generic. Can't it be split out into separate functions and reused in other (gr3d) modules?
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
+{
+	int ret;
+
+	if (class == NV_HOST1X_CLASS_ID)
+		ret = reg == 0x2b;
+	else
+		switch (reg) {
+		case 0x1a:
+		case 0x1b:
+		case 0x26:
+		case 0x2b:
+		case 0x2c:
+		case 0x2d:
+		case 0x31:
+		case 0x32:
+		case 0x48:
+		case 0x49:
+		case 0x4a:
+		case 0x4b:
+		case 0x4c:
+			ret = 1;
+			break;
+		default:
+			ret = 0;
+			break;
+		}
+
+	return ret;
+}
I should probably bite the bullet and read through the (still) huge patch 3 to understand exactly why this is needed.
+static struct of_device_id gr2d_match[] = {
static const please.
+static int __exit gr2d_remove(struct platform_device *dev)
+{
+	struct host1x *host1x =
+		host1x_get_drm_data(to_platform_device(dev->dev.parent));
+	struct gr2d *gr2d = platform_get_drvdata(dev);
+	int err;
+	err = host1x_unregister_client(host1x, &gr2d->client);
+	if (err < 0) {
+		dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
+			err);
+		return err;
+	}
+	host1x_syncpt_free(gr2d->syncpt);
+	return 0;
+}
Isn't this missing a host1x_channel_put() or host1x_free_channel()?
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
[...]
+struct tegra_gem_create {
+	__u64 size;
+	unsigned int flags;
+	unsigned int handle;
+	unsigned int offset;
+};
I think it's better to consistently use the explicitly sized types here.
+struct tegra_gem_invalidate {
+	unsigned int handle;
+};
+struct tegra_gem_flush {
+	unsigned int handle;
+};
Where are these used?
+struct tegra_drm_syncpt_wait_args {
+	__u32 id;
+	__u32 thresh;
+	__s32 timeout;
+	__u32 value;
+};
+#define DRM_TEGRA_NO_TIMEOUT (-1)
Is this the only reason why timeout is signed? If so maybe a better choice would be __u32 and DRM_TEGRA_NO_TIMEOUT 0xffffffff.
+struct tegra_drm_get_channel_param_args {
+	__u64 context;
+	__u32 param;
+	__u32 value;
+};
What's the reason for not calling this tegra_drm_get_syncpoint?
+struct tegra_drm_syncpt_incr {
+	__u32 syncpt_id;
+	__u32 syncpt_incrs;
+};
Maybe the fields would be better named id and incrs. Though I also notice that incrs is never used. I guess that's supposed to be used in the future to allow increments by more than a single value. If so, perhaps value would be a better name.
Now on to the dreaded patch 3...
Thierry
On 04.02.2013 04:56, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote: [...]
diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
@@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp) {
return 0;
struct host1x_drm_fpriv *fpriv;
int err = 0;
Can be dropped.
fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
if (!fpriv)
return -ENOMEM;
INIT_LIST_HEAD(&fpriv->contexts);
filp->driver_priv = fpriv;
return err;
return 0;
Ok.
+static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
+{
struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
struct host1x_drm_context *context, *tmp;
list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
context->client->ops->close_channel(context);
kfree(context);
}
kfree(fpriv);
}
Maybe you should add host1x_drm_context_free() to wrap the loop contents?
Makes sense. Will do.
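The suggested wrapper could look roughly like the following. This is a minimal userspace sketch, not the driver code: every sketch_* name is hypothetical, a plain next pointer stands in for struct list_head, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_context;

struct sketch_client_ops {
	void (*close_channel)(struct sketch_context *context);
};

struct sketch_client {
	const struct sketch_client_ops *ops;
	int open_channels;		/* bookkeeping for this sketch only */
};

struct sketch_context {
	struct sketch_client *client;
	struct sketch_context *next;	/* stands in for struct list_head */
};

/* Example close_channel implementation used by the sketch. */
static void sketch_close_channel(struct sketch_context *context)
{
	context->client->open_channels--;
}

static const struct sketch_client_ops sketch_ops = {
	.close_channel = sketch_close_channel,
};

/* Allocate a context and push it onto the per-file list. */
static int sketch_context_new(struct sketch_client *client,
			      struct sketch_context **head)
{
	struct sketch_context *context = calloc(1, sizeof(*context));

	if (!context)
		return -1;
	context->client = client;
	context->next = *head;
	*head = context;
	client->open_channels++;
	return 0;
}

/* Wraps the two steps the close loop repeats for every context. */
static void sketch_context_free(struct sketch_context *context)
{
	assert(context && context->client);
	context->client->ops->close_channel(context);
	free(context);
}

/* Shape of the close path once the helper exists. */
static void sketch_close_all(struct sketch_context **head)
{
	struct sketch_context *context = *head, *tmp;

	while (context) {
		tmp = context->next;	/* read next before freeing */
		sketch_context_free(context);
		context = tmp;
	}
	*head = NULL;
}
```

The same helper would then also serve the close-channel ioctl, which performs the same two steps after unlinking the context.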
@@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
 	drm_fbdev_cma_restore_mode(host1x->fbdev);
 }
+static int
+tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
struct drm_file *file_priv)
static int and function name on one line, please.
Ok, will re-split the lines.
+{
struct host1x *host1x = drm->dev_private;
struct tegra_drm_syncpt_read_args *args = data;
struct host1x_syncpt *sp =
host1x_syncpt_get_bydev(host1x->dev, args->id);
I don't know if we need this, except maybe to work around the problem that we have two different structures named host1x. The _bydev() suffix is misleading because all you really do here is obtain the syncpt from the host1x.
Yeah, it's actually working around the host1x duplicate naming. host1x_syncpt_get takes struct host1x as a parameter, but that's a different host1x than the one in this code.
+static int
+tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
struct drm_file *file_priv)
+{
struct tegra_drm_open_channel_args *args = data;
struct host1x_client *client;
struct host1x_drm_context *context;
struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
struct host1x *host1x = drm->dev_private;
int err = 0;
err = -ENODEV; (see below)
Ok, makes sense.
context = kzalloc(sizeof(*context), GFP_KERNEL);
if (!context)
return -ENOMEM;
list_for_each_entry(client, &host1x->clients, list) {
if (client->class == args->class) {
context->client = client;
err = client->ops->open_channel(client, context);
if (err)
goto out;
list_add(&context->list, &fpriv->contexts);
args->context = (uintptr_t)context;
Perhaps cast this to __u64 directly instead? There's little sense in taking the detour via uintptr_t.
I think compiler complained about a direct cast to __u64, but I'll try again.
goto out;
return 0;
}
}
err = -ENODEV;
+out:
if (err)
kfree(context);
return err;
+}
Then this simply becomes:
kfree(context); return err;
Sounds good.
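For reference, the control flow being agreed on here (initialise err to -ENODEV, return 0 directly on success, and let every other path fall through to a single cleanup) might be sketched as below. This is a userspace approximation under stated assumptions: the sketch_* types are hypothetical, the class values are arbitrary, and the fail_open flag merely stands in for a failing ops->open_channel() call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_client {
	unsigned int class;
	int fail_open;		/* simulates open_channel() failing */
};

struct sketch_context {
	struct sketch_client *client;
};

static int sketch_open_channel(struct sketch_client *clients, size_t count,
			       unsigned int class,
			       struct sketch_context **out)
{
	struct sketch_context *context;
	int err = -ENODEV;	/* default result: no client matched */
	size_t i;

	assert(out != NULL);

	context = calloc(1, sizeof(*context));
	if (!context)
		return -ENOMEM;

	for (i = 0; i < count; i++) {
		if (clients[i].class != class)
			continue;

		err = clients[i].fail_open ? -EBUSY : 0;
		if (err)
			break;

		context->client = &clients[i];
		*out = context;
		return 0;	/* success leaves without touching cleanup */
	}

	/* Only failure paths reach this point, so no "if (err)" is needed. */
	free(context);
	return err;
}
```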
+static int
+tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
struct drm_file *file_priv)
+{
struct tegra_drm_open_channel_args *args = data;
struct host1x_drm_context *context, *tmp;
struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
int err = 0;
list_for_each_entry_safe(context, tmp, &fpriv->contexts, list) {
if ((uintptr_t)context == args->context) {
context->client->ops->close_channel(context);
list_del(&context->list);
kfree(context);
goto out;
}
}
err = -EINVAL;
+out:
return err;
+}
Same comments as for tegra_drm_ioctl_open_channel().
Ok, will apply.
+static int
+tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
struct drm_file *file_priv)
+{
struct tegra_drm_get_channel_param_args *args = data;
struct host1x_drm_context *context;
struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
int err = 0;
list_for_each_entry(context, &fpriv->contexts, list) {
if ((uintptr_t)context == args->context) {
args->value =
context->client->ops->get_syncpoint(context,
args->param);
goto out;
}
}
err = -ENODEV;
+out:
return err;
+}
Same comments as well. Also you may want to factor out the context lookup into a separate function so you don't have to repeat the same code over and over again.
Will do.
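A factored-out lookup could take roughly this shape. It is only a sketch with hypothetical sketch_* names, using a singly linked list in place of the kernel list helpers; the point is that the ioctls would share one function that maps the user-supplied 64-bit value back to a context.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_context {
	struct sketch_context *next;	/* stands in for struct list_head */
};

struct sketch_file {
	struct sketch_context *contexts;	/* contexts opened via this file */
};

/* Returns the context only if it is on this file's list, else NULL. */
static struct sketch_context *
sketch_context_lookup(struct sketch_file *file, uint64_t handle)
{
	struct sketch_context *context;

	assert(file != NULL);

	for (context = file->contexts; context; context = context->next)
		if ((uint64_t)(uintptr_t)context == handle)
			return context;

	return NULL;
}
```

Because the handle is only ever compared against entries already on the list, a bogus value from user space is never dereferenced; callers just translate a NULL result into an error code.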
I wonder if we shouldn't remove .get_syncpoint() from the client ops and replace it by a simple array instead. The only use-case for this is if a client wants more than a single syncpoint, right? In that case just keep an array of syncpoints and the number of syncpoints per client. Otherwise each client will have to rewrite the same function.
That makes sense. Will do.
Also, how useful is it to create a context? Looking at the gr2d implementation for .open_channel(), it will return the same channel to whichever userspace process requests them. Can you explain why it is necessary at all? From the name I would have expected some kind of context switching to take place when different applications submit requests to the same client, but that doesn't seem to be the case.
Hardware context switching will be a later submit, and it'll actually create a new structure. Hardware context might live longer than the process that created it, so they'll need to be separate.
We've used the context as a place for storing flags and the reference to hardware context. It'd allow also opening channels to multiple devices, and context would be used in submit to find out the target device. But as hardware context switching is not implemented in this patch set, and neither is support for anything but 2D, it's difficult to justify it.
Perhaps the justification is that this way we can keep the kernel API stable even when we add support for hardware contexts and other clients.
+static int
+tegra_drm_create_ioctl(struct drm_device *drm, void *data,
struct drm_file *file_priv)
tegra_drm_gem_create_ioctl() please.
Sure.
static struct drm_ioctl_desc tegra_drm_ioctls[] = {
DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE,
tegra_drm_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
TEGRA_DRM_GEM_CREATE
Will change.
 static const struct file_operations tegra_drm_fops = {
@@ -303,6 +526,7 @@ struct drm_driver tegra_drm_driver = {
 	.load = tegra_drm_load,
 	.unload = tegra_drm_unload,
 	.open = tegra_drm_open,
.preclose = tegra_drm_close,
I think it'd make sense to name the function tegra_drm_preclose() to match the name in struct drm_driver.
Yes, and I think you added preclose in your vblank patch set, so I'll need to rebase.
diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
[...]
+struct host1x_drm_fpriv {
struct list_head contexts;
};
Maybe name this host1x_drm_file. fpriv isn't very specific.
host1x_drm_file sounds a bit odd, because it's not really a file, but a private data pointer stored in driver_priv.
+static inline struct host1x_drm_fpriv *
+host1x_drm_fpriv(struct drm_file *file_priv)
+{
return file_priv ? file_priv->driver_priv : NULL;
+}
I think it's fine to just directly do filp->driver_priv instead of going through this wrapper.
Ok.
 struct host1x_client {
 	struct host1x *host1x;
 	struct device *dev;
const struct host1x_client_ops *ops;
u32 class;
Should this perhaps be an enum?
That would make sense. I've kept it u32, because the type of class in hardware is u32, but the two don't need to match.
diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
[...]
+static u32 gr2d_get_syncpoint(struct host1x_drm_context *context, int index)
+{
struct gr2d *gr2d = dev_get_drvdata(context->client->dev);
if (index != 0)
return UINT_MAX;
return host1x_syncpt_id(gr2d->syncpt);
+}
Maybe get_syncpoint() should return int and negative error codes on failure. That still leaves room for 2^31 possible syncpoints.
That'd be enough. Will do. :-)
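Combined with the earlier idea of keeping a per-client array of syncpoints, the int-returning variant might look like this. It is a hedged userspace sketch: struct sketch_client and its fields are hypothetical, and -EINVAL is just one plausible choice of error code for an out-of-range index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_client {
	const unsigned int *syncpts;	/* syncpoint ids owned by the client */
	size_t num_syncpts;
};

/*
 * Generic replacement for a per-client get_syncpoint() callback:
 * a non-negative return value is a syncpoint id, a negative one an errno.
 */
static int sketch_get_syncpoint(const struct sketch_client *client,
				unsigned int index)
{
	assert(client != NULL);

	if (index >= client->num_syncpts)
		return -EINVAL;

	return (int)client->syncpts[index];
}
```

With this shape no client has to supply its own function; a client with a single syncpoint simply registers a one-element array.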
+static u32 handle_cma_to_host1x(struct drm_device *drm,
struct drm_file *file_priv, u32 gem_handle)
+{
struct drm_gem_object *obj;
struct drm_gem_cma_object *cma_obj;
u32 host1x_handle;
obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
if (!obj)
return 0;
cma_obj = to_drm_gem_cma_obj(obj);
host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
drm_gem_object_unreference(obj);
return host1x_handle;
+}
I thought we had settled in previous reviews on only having a single allocator and not doing the conversion between various types?
I'll need to agree with Lucas on how to handle this. He intended to make a patch to fix this, but he hasn't had time to do that.
But, I'd still like to keep the possibility open to add dma_buf as memory handle type, and fit that into the same API, so there's still a need to have the mem_mgr_type abstraction.
+static int gr2d_submit(struct host1x_drm_context *context,
struct tegra_drm_submit_args *args,
struct drm_device *drm,
struct drm_file *file_priv)
+{
struct host1x_job *job;
int num_cmdbufs = args->num_cmdbufs;
int num_relocs = args->num_relocs;
int num_waitchks = args->num_waitchks;
struct tegra_drm_cmdbuf __user *cmdbufs =
(void * __user)(uintptr_t)args->cmdbufs;
struct tegra_drm_reloc __user *relocs =
(void * __user)(uintptr_t)args->relocs;
struct tegra_drm_waitchk __user *waitchks =
(void * __user)(uintptr_t)args->waitchks;
No need for all the uintptr_t casts.
Will try to remove - but I do remember getting compiler warnings without them.
(...)
Most of this looks very generic. Can't it be split out into separate functions and reused in other (gr3d) modules?
That's actually how most of this is downstream. I thought to make everything really simple and make it all 2D specific in the first patch set, and split into generic when we add support for another device.
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
+{
int ret;
if (class == NV_HOST1X_CLASS_ID)
ret = reg == 0x2b;
else
switch (reg) {
case 0x1a:
case 0x1b:
case 0x26:
case 0x2b:
case 0x2c:
case 0x2d:
case 0x31:
case 0x32:
case 0x48:
case 0x49:
case 0x4a:
case 0x4b:
case 0x4c:
ret = 1;
break;
default:
ret = 0;
break;
}
return ret;
+}
I should probably bite the bullet and read through the (still) huge patch 3 to understand exactly why this is needed.
That's the security firewall. It walks through each submit, and ensures that each register write that writes an address, goes through the host1x reloc mechanism. This way user space cannot ask 2D to write to arbitrary memory locations.
+static struct of_device_id gr2d_match[] = {
static const please.
Ok.
+static int __exit gr2d_remove(struct platform_device *dev)
+{
struct host1x *host1x =
host1x_get_drm_data(to_platform_device(dev->dev.parent));
struct gr2d *gr2d = platform_get_drvdata(dev);
int err;
err = host1x_unregister_client(host1x, &gr2d->client);
if (err < 0) {
dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
err);
return err;
}
host1x_syncpt_free(gr2d->syncpt);
return 0;
+}
Isn't this missing a host1x_channel_put() or host1x_free_channel()?
All references should be handled in gr2d_open_channel() and gr2d_close_channel(). I think we'd need to ensure all contexts are closed at this point.
diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
[...]
+struct tegra_gem_create {
__u64 size;
unsigned int flags;
unsigned int handle;
unsigned int offset;
+};
I think it's better to consistently use the explicitly sized types here.
+struct tegra_gem_invalidate {
unsigned int handle;
+};
+struct tegra_gem_flush {
unsigned int handle;
+};
Where are these used?
Arto, please go through these.
+struct tegra_drm_syncpt_wait_args {
__u32 id;
__u32 thresh;
__s32 timeout;
__u32 value;
+};
+#define DRM_TEGRA_NO_TIMEOUT (-1)
Is this the only reason why timeout is signed? If so maybe a better choice would be __u32 and DRM_TEGRA_NO_TIMEOUT 0xffffffff.
I believe it is so. In fact we'd need to rename it to something like INFINITE_TIMEOUT, because we also have a case of timeout=0, which returns immediately, i.e. doesn't have a timeout either.
+struct tegra_drm_get_channel_param_args {
__u64 context;
__u32 param;
__u32 value;
+};
What's the reason for not calling this tegra_drm_get_syncpoint?
I wanted to use the same struct for other parameters, too: wait bases, mutexes. But it doesn't really optimize anything, so I can make them each specific structs.
+struct tegra_drm_syncpt_incr {
__u32 syncpt_id;
__u32 syncpt_incrs;
+};
Maybe the fields would be better named id and incrs. Though I also notice that incrs is never used. I guess that's supposed to be used in the future to allow increments by more than a single value. If so, perhaps value would be a better name.
It's actually used in the dreaded patch 3, as part of tegra_drm_submit_args.
Now on to the dreaded patch 3...
Enjoy. :-)
Terje
On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
On 04.02.2013 04:56, Thierry Reding wrote:
On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
+{
struct host1x *host1x = drm->dev_private;
struct tegra_drm_syncpt_read_args *args = data;
struct host1x_syncpt *sp =
host1x_syncpt_get_bydev(host1x->dev, args->id);
I don't know if we need this, except maybe to work around the problem that we have two different structures named host1x. The _bydev() suffix is misleading because all you really do here is obtain the syncpt from the host1x.
Yeah, it's actually working around the host1x duplicate naming. host1x_syncpt_get takes struct host1x as a parameter, but that's a different host1x than the one in this code.
So maybe a better way would be to rename the DRM host1x after all. If it avoids the need for workarounds such as this I think it justifies the additional churn.
Also, how useful is it to create a context? Looking at the gr2d implementation for .open_channel(), it will return the same channel to whichever userspace process requests them. Can you explain why it is necessary at all? From the name I would have expected some kind of context switching to take place when different applications submit requests to the same client, but that doesn't seem to be the case.
Hardware context switching will be a later submit, and it'll actually create a new structure. Hardware context might live longer than the process that created it, so they'll need to be separate.
Why would it live longer than the process? Isn't the whole purpose of the context to keep per-process state? What use is that state if the process dies?
We've used the context as a place for storing flags and the reference to hardware context. It'd allow also opening channels to multiple devices, and context would be used in submit to find out the target device. But as hardware context switching is not implemented in this patch set, and neither is support for anything but 2D, it's difficult to justify it.
Perhaps the justification is that this way we can keep the kernel API stable even when we add support for hardware contexts and other clients.
We don't need a stable kernel API. But I guess it is fine to keep it, if for no other reason than to fill the context returned in the ioctl() with meaningful data.
diff --git a/drivers/gpu/host1x/drm/drm.h b/drivers/gpu/host1x/drm/drm.h
[...]
+struct host1x_drm_fpriv {
struct list_head contexts;
};
Maybe name this host1x_drm_file. fpriv isn't very specific.
host1x_drm_file sounds a bit odd, because it's not really a file, but a private data pointer stored in driver_priv.
The same is true for struct drm_file, which is stored in struct file's .private_data field. I find it to be very intuitive if the inheritance is reflected in the structure name. struct host1x_drm_file is host1x' driver-specific part of struct drm_file.
+static u32 handle_cma_to_host1x(struct drm_device *drm,
struct drm_file *file_priv, u32 gem_handle)
+{
struct drm_gem_object *obj;
struct drm_gem_cma_object *cma_obj;
u32 host1x_handle;
obj = drm_gem_object_lookup(drm, file_priv, gem_handle);
if (!obj)
return 0;
cma_obj = to_drm_gem_cma_obj(obj);
host1x_handle = host1x_memmgr_host1x_id(mem_mgr_type_cma, (u32)cma_obj);
drm_gem_object_unreference(obj);
return host1x_handle;
+}
I thought we had settled in previous reviews on only having a single allocator and not doing the conversion between various types?
I'll need to agree with Lucas on how to handle this. He intended to make a patch to fix this, but he hasn't had time to do that.
But, I'd still like to keep the possibility open to add dma_buf as memory handle type, and fit that into the same API, so there's still a need to have the mem_mgr_type abstraction.
I fail to see how dma_buf would require a separate mem_mgr_type. Can we perhaps postpone this to a later point and just go with CMA as the only alternative for now until we have an actual working implementation that we can use this for?
+static int gr2d_submit(struct host1x_drm_context *context,
struct tegra_drm_submit_args *args,
struct drm_device *drm,
struct drm_file *file_priv)
+{
struct host1x_job *job;
int num_cmdbufs = args->num_cmdbufs;
int num_relocs = args->num_relocs;
int num_waitchks = args->num_waitchks;
struct tegra_drm_cmdbuf __user *cmdbufs =
(void * __user)(uintptr_t)args->cmdbufs;
struct tegra_drm_reloc __user *relocs =
(void * __user)(uintptr_t)args->relocs;
struct tegra_drm_waitchk __user *waitchks =
(void * __user)(uintptr_t)args->waitchks;
No need for all the uintptr_t casts.
Will try to remove - but I do remember getting compiler warnings without them.
I think you shouldn't even have to cast to void * first. Just cast to the target type directly. I don't see why the compiler should complain.
(...)
Most of this looks very generic. Can't it be split out into separate functions and reused in other (gr3d) modules?
That's actually how most of this is downstream. I thought to make everything really simple and make it all 2D specific in the first patch set, and split into generic when we add support for another device.
Okay, that's fine then.
+static int gr2d_is_addr_reg(struct platform_device *dev, u32 class, u32 reg)
+{
int ret;
if (class == NV_HOST1X_CLASS_ID)
ret = reg == 0x2b;
else
switch (reg) {
case 0x1a:
case 0x1b:
case 0x26:
case 0x2b:
case 0x2c:
case 0x2d:
case 0x31:
case 0x32:
case 0x48:
case 0x49:
case 0x4a:
case 0x4b:
case 0x4c:
ret = 1;
break;
default:
ret = 0;
break;
}
return ret;
+}
I should probably bite the bullet and read through the (still) huge patch 3 to understand exactly why this is needed.
That's the security firewall. It walks through each submit, and ensures that each register write that writes an address, goes through the host1x reloc mechanism. This way user space cannot ask 2D to write to arbitrary memory locations.
I see. Can this be made more generic? Perhaps adding a table of valid registers to the device and use a generic function to iterate over that instead of having to provide the same function for each client.
+static int __exit gr2d_remove(struct platform_device *dev)
+{
struct host1x *host1x =
host1x_get_drm_data(to_platform_device(dev->dev.parent));
struct gr2d *gr2d = platform_get_drvdata(dev);
int err;
err = host1x_unregister_client(host1x, &gr2d->client);
if (err < 0) {
dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
err);
return err;
}
host1x_syncpt_free(gr2d->syncpt);
return 0;
+}
Isn't this missing a host1x_channel_put() or host1x_free_channel()?
All references should be handled in gr2d_open_channel() and gr2d_close_channel(). I think we'd need to ensure all contexts are closed at this point.
Yes, that'd work as well. Actually I would assume that all contexts associated with a given file should be freed when the file is closed. That way all of this should work pretty much automatically.
+struct tegra_drm_syncpt_wait_args {
__u32 id;
__u32 thresh;
__s32 timeout;
__u32 value;
+};
+#define DRM_TEGRA_NO_TIMEOUT (-1)
Is this the only reason why timeout is signed? If so maybe a better choice would be __u32 and DRM_TEGRA_NO_TIMEOUT 0xffffffff.
I believe it is so. In fact we'd need to rename it to something like INFINITE_TIMEOUT, because we also have a case of timeout=0, which returns immediately, i.e. doesn't have a timeout either.
For timeout == 0 I don't think we need a symbolic name. It is pretty common for 0 to mean no timeout. But yes, DRM_TEGRA_INFINITE_TIMEOUT should be okay.
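The three cases under discussion (0 returns immediately, an all-ones value waits forever, anything else is a bounded wait) can be spelled out as follows. This is only an illustrative sketch: the SKETCH_* names are hypothetical, and the value 0xffffffff is the one proposed earlier in the review for an unsigned timeout field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counterpart of the proposed DRM_TEGRA_INFINITE_TIMEOUT. */
#define SKETCH_INFINITE_TIMEOUT 0xffffffffu

enum sketch_wait_mode {
	SKETCH_WAIT_POLL,	/* timeout == 0: check once and return */
	SKETCH_WAIT_BOUNDED,	/* give up after the given timeout */
	SKETCH_WAIT_FOREVER,	/* never time out */
};

/* Classify a __u32-style timeout value from user space. */
static enum sketch_wait_mode sketch_classify_timeout(uint32_t timeout)
{
	if (timeout == 0)
		return SKETCH_WAIT_POLL;
	if (timeout == SKETCH_INFINITE_TIMEOUT)
		return SKETCH_WAIT_FOREVER;
	return SKETCH_WAIT_BOUNDED;
}
```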
+struct tegra_drm_syncpt_incr {
__u32 syncpt_id;
__u32 syncpt_incrs;
+};
Maybe the fields would be better named id and incrs. Though I also notice that incrs is never used. I guess that's supposed to be used in the future to allow increments by more than a single value. If so, perhaps value would be a better name.
It's actually used in the dreaded patch 3, as part of tegra_drm_submit_args.
Okay. The superfluous syncpt_ prefixes should still go away.
Thierry
On 05.02.2013 01:54, Thierry Reding wrote:
On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
Yeah, it's actually working around the host1x duplicate naming. host1x_syncpt_get takes struct host1x as a parameter, but that's a different host1x than the one in this code.
So maybe a better way would be to rename the DRM host1x after all. If it avoids the need for workarounds such as this I think it justifies the additional churn.
Ok, I'll include that. Do you have a preference for the name? Something like "host1x_drm" might work?
Also, how useful is it to create a context? Looking at the gr2d implementation for .open_channel(), it will return the same channel to whichever userspace process requests them. Can you explain why it is necessary at all? From the name I would have expected some kind of context switching to take place when different applications submit requests to the same client, but that doesn't seem to be the case.
Hardware context switching will be a later submit, and it'll actually create a new structure. Hardware context might live longer than the process that created it, so they'll need to be separate.
Why would it live longer than the process? Isn't the whole purpose of the context to keep per-process state? What use is that state if the process dies?
Hardware context has to be kept alive for as long as there's a job running from that process. If an app sends 10 jobs to 2D channel, and dies immediately, there's no sane way for host1x to remove the jobs from queue. The jobs will keep on running and kernel will need to track them.
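One way to picture that lifetime rule is a reference count held both by the owning process and by every queued job. The thread does not say how the tracking would be implemented, so the sketch below is purely an assumption for illustration, with hypothetical sketch_* names and a boolean in place of actually freeing anything.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_hwctx {
	unsigned int refcount;	/* one for the process, one per queued job */
	bool released;		/* set when the last reference is dropped */
};

/* Take a reference, e.g. when a job using the context is queued. */
static void sketch_hwctx_get(struct sketch_hwctx *ctx)
{
	assert(!ctx->released);
	ctx->refcount++;
}

/* Drop a reference, e.g. when a job completes or the process exits. */
static void sketch_hwctx_put(struct sketch_hwctx *ctx)
{
	assert(ctx->refcount > 0);
	if (--ctx->refcount == 0)
		ctx->released = true;
}
```

In this model a process that queues jobs and then dies drops only its own reference; the context stays valid until the last job has finished and dropped its reference too.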
Perhaps the justification is that this way we can keep the kernel API stable even when we add support for hardware contexts and other clients.
We don't need a stable kernel API. But I guess it is fine to keep it, if for no other reason than to fill the context returned in the ioctl() with meaningful data.
Sorry, I meant stable IOCTL API, so we agree on this.
host1x_drm_file sounds a bit odd, because it's not really a file, but a private data pointer stored in driver_priv.
The same is true for struct drm_file, which is stored in struct file's .private_data field. I find it to be very intuitive if the inheritance is reflected in the structure name. struct host1x_drm_file is host1x' driver-specific part of struct drm_file.
Ok, makes sense. I'll do that.
I fail to see how dma_buf would require a separate mem_mgr_type. Can we perhaps postpone this to a later point and just go with CMA as the only alternative for now until we have an actual working implementation that we can use this for?
Each submit refers to a number of buffers. Some of them are the streams, some are textures or other input/output buffers. Each of these buffers might be passed as a GEM handle, or (when implemented) as a dma_buf fd. Thus we need a field to tell host1x which API to call to handle that handle.
I think we can leave out the code for managing the type until we actually have separate memory managers. That'd make GEM handles effectively of type 0, as we don't set it.
+static int gr2d_submit(struct host1x_drm_context *context,
struct tegra_drm_submit_args *args,
struct drm_device *drm,
struct drm_file *file_priv)
+{
struct host1x_job *job;
int num_cmdbufs = args->num_cmdbufs;
int num_relocs = args->num_relocs;
int num_waitchks = args->num_waitchks;
struct tegra_drm_cmdbuf __user *cmdbufs =
(void * __user)(uintptr_t)args->cmdbufs;
struct tegra_drm_reloc __user *relocs =
(void * __user)(uintptr_t)args->relocs;
struct tegra_drm_waitchk __user *waitchks =
(void * __user)(uintptr_t)args->waitchks;
No need for all the uintptr_t casts.
Will try to remove - but I do remember getting compiler warnings without them.
I think you shouldn't even have to cast to void * first. Just cast to the target type directly. I don't see why the compiler should complain.
This is what I get without them:
drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:112:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-c
The problem is that the fields are __u64's and can't be cast directly into 32-bit pointers.
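The intermediate cast is what silences the warning: uintptr_t is an integer type of pointer width, so the 64-bit value is first converted to an integer of the right size and only then to a pointer. A minimal, portable illustration follows; the helper names are hypothetical and the kernel's __user annotation is omitted because it has no meaning in userspace.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a 64-bit ioctl field to a pointer via a pointer-sized integer. */
static void *sketch_u64_to_ptr(uint64_t value)
{
	return (void *)(uintptr_t)value;
}

/* The reverse direction, as user space would do when filling the field. */
static uint64_t sketch_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}
```

On a 32-bit build the conversion to uintptr_t drops the upper 32 bits, which is harmless as long as user space filled the field from a real pointer.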
That's the security firewall. It walks through each submit, and ensures that each register write that writes an address, goes through the host1x reloc mechanism. This way user space cannot ask 2D to write to arbitrary memory locations.
I see. Can this be made more generic? Perhaps adding a table of valid registers to the device and use a generic function to iterate over that instead of having to provide the same function for each client.
For which one does gcc generate more efficient code? I've thought a switch-case statement might get compiled into something more efficient than a table lookup.
But the rest of the code is generic - just the one function which compares against known address registers is specific to 2D.
+static int __exit gr2d_remove(struct platform_device *dev)
+{
struct host1x *host1x =
host1x_get_drm_data(to_platform_device(dev->dev.parent));
struct gr2d *gr2d = platform_get_drvdata(dev);
int err;
err = host1x_unregister_client(host1x, &gr2d->client);
if (err < 0) {
dev_err(&dev->dev, "failed to unregister host1x client: %d\n",
err);
return err;
}
host1x_syncpt_free(gr2d->syncpt);
return 0;
+}
Isn't this missing a host1x_channel_put() or host1x_free_channel()?
All references should be handled in gr2d_open_channel() and gr2d_close_channel(). I think we'd need to ensure all contexts are closed at this point.
Yes, that'd work as well. Actually I would assume that all contexts associated with a given file should be freed when the file is closed. That way all of this should work pretty much automatically.
Naturally they are, so we're actually already good. All contexts get closed at file close.
For timeout == 0 I don't think we need a symbolic name. It is pretty common for 0 to mean no timeout. But yes, DRM_TEGRA_INFINITE_TIMEOUT should be okay.
Ok, will do that.
+struct tegra_drm_syncpt_incr {
__u32 syncpt_id;
__u32 syncpt_incrs;
+};
Maybe the fields would be better named id and incrs. Though I also notice that incrs is never used. I guess that's supposed to be used in the future to allow increments by more than a single value. If so, perhaps value would be a better name.
It's actually used in the dreaded patch 3, as part of tegra_drm_submit_args.
Okay. The superfluous syncpt_ prefixes should still go away.
Sure, forgot to comment, but I'm fine with that.
Terje
On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
On 05.02.2013 01:54, Thierry Reding wrote:
On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
Yeah, it's actually working around the host1x duplicate naming. host1x_syncpt_get takes struct host1x as a parameter, but that's a different host1x than the one in this code.
So maybe a better way would be to rename the DRM host1x after all. If it avoids the need for workarounds such as this I think it justifies the additional churn.
Ok, I'll include that. Do you have a preference for the name? Something like "host1x_drm" might work?
Yes, that sounds good.
Also, how useful is it to create a context? Looking at the gr2d implementation for .open_channel(), it will return the same channel to whichever userspace process requests them. Can you explain why it is necessary at all? From the name I would have expected some kind of context switching to take place when different applications submit requests to the same client, but that doesn't seem to be the case.
Hardware context switching will be a later submit, and it'll actually create a new structure. Hardware context might live longer than the process that created it, so they'll need to be separate.
Why would it live longer than the process? Isn't the whole purpose of the context to keep per-process state? What use is that state if the process dies?
Hardware context has to be kept alive for as long as there's a job running from that process. If an app sends 10 jobs to 2D channel, and dies immediately, there's no sane way for host1x to remove the jobs from queue. The jobs will keep on running and kernel will need to track them.
Okay, I understand now. There was one additional thing that I wanted to point out, but the context is gone now. I'll go through the patch again and reply there.
I fail to see how dma_buf would require a separate mem_mgr_type. Can we perhaps postpone this to a later point and just go with CMA as the only alternative for now until we have an actual working implementation that we can use this for?
Each submit refers to a number of buffers. Some of them are the streams, some are textures or other input/output buffers. Each of these buffers might be passed as a GEM handle, or (when implemented) as a dma_buf fd. Thus we need a field to tell host1x which API to call to handle that handle.
Understood.
I think we can leave out the code for managing the type until we actually have separate memory managers. That'd make GEM handles effectively of type 0, as we don't set it.
I think that's a good idea. Let's start simple for now and who knows what else will have changed by the time we get to implement dma_buf. Maybe Lucas will have finished his work on the allocator and we will need to synchronize with that anyway.
+static int gr2d_submit(struct host1x_drm_context *context,
+		struct tegra_drm_submit_args *args,
+		struct drm_device *drm,
+		struct drm_file *file_priv)
+{
+	struct host1x_job *job;
+	int num_cmdbufs = args->num_cmdbufs;
+	int num_relocs = args->num_relocs;
+	int num_waitchks = args->num_waitchks;
+	struct tegra_drm_cmdbuf __user *cmdbufs =
+		(void * __user)(uintptr_t)args->cmdbufs;
+	struct tegra_drm_reloc __user *relocs =
+		(void * __user)(uintptr_t)args->relocs;
+	struct tegra_drm_waitchk __user *waitchks =
+		(void * __user)(uintptr_t)args->waitchks;
No need for all the uintptr_t casts.
Will try to remove - but I do remember getting compiler warnings without them.
I think you shouldn't even have to cast to void * first. Just cast to the target type directly. I don't see why the compiler should complain.
This is what I get without them:
drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:112:3: warning: cast to pointer from integer of different size [-Wint-to-pointer-c
The problem is that the fields are __u64's and can't be cast directly into 32-bit pointers.
Alright.
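The warning above comes from converting a 64-bit integer straight to a pointer on a platform whose pointers are 32 bits wide. A minimal user-space sketch of the usual workaround follows; `submit_args`, `u64_to_ptr()` and `ptr_to_u64()` are hypothetical names chosen for this example, not part of the patch. Going through `uintptr_t` first narrows (or keeps) the value to pointer width explicitly, so the compiler sees an integer-to-pointer cast of matching size.

```c
#include <stdint.h>

/*
 * Hypothetical ioctl-style argument block. Pointers travel as fixed-width
 * 64-bit fields so 32-bit and 64-bit user space share one layout.
 */
struct submit_args {
	uint64_t cmdbufs;      /* user pointer stored as an integer */
	uint32_t num_cmdbufs;
	uint32_t pad;
};

/* Convert via uintptr_t so the integer matches the pointer width. */
static inline void *u64_to_ptr(uint64_t value)
{
	return (void *)(uintptr_t)value;
}

/* Inverse direction, used by the caller filling in the argument block. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}
```

A caller would store `ptr_to_u64(buffer)` in `cmdbufs`, and the receiving side would recover the pointer with `u64_to_ptr(args.cmdbufs)` before assigning it to the typed pointer it needs.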
That's the security firewall. It walks through each submit, and ensures that each register write that writes an address, goes through the host1x reloc mechanism. This way user space cannot ask 2D to write to arbitrary memory locations.
I see. Can this be made more generic? Perhaps adding a table of valid registers to the device and use a generic function to iterate over that instead of having to provide the same function for each client.
For which one does gcc generate more efficient code? I've thought a switch-case statement might get compiled into something more efficient than a table lookup.
But the rest of the code is generic - just the one function which compares against known address registers is specific to 2D.
Table lookup should be pretty fast. I wouldn't worry too much about performance at this stage, though. Readability is more important in my opinion. A lookup table is a lot more readable and reusable I think. If it turns out that using a function is actually faster we can always optimize later.
Thierry
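The table-based approach suggested above might look roughly like this. It is a sketch under stated assumptions rather than the driver's code: the register offsets are made up, the command stream is reduced to a list of (offset, has-reloc) pairs, and `client_regs`, `is_addr_reg()` and `firewall_check()` are hypothetical names. The point is that each client only supplies data, while the lookup and the check stay generic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-client description of registers that take addresses. */
struct client_regs {
	const uint32_t *addr_regs;
	size_t num_addr_regs;
};

/* Made-up offsets standing in for a client's address registers. */
static const uint32_t example_2d_addr_regs[] = { 0x1a, 0x1b, 0x26, 0x2b, 0x2c };

static const struct client_regs example_2d = {
	.addr_regs = example_2d_addr_regs,
	.num_addr_regs = sizeof(example_2d_addr_regs) /
			 sizeof(example_2d_addr_regs[0]),
};

/* Generic linear lookup shared by all clients. */
static bool is_addr_reg(const struct client_regs *client, uint32_t offset)
{
	for (size_t i = 0; i < client->num_addr_regs; i++)
		if (client->addr_regs[i] == offset)
			return true;
	return false;
}

/* Simplified view of one register write found in a submit. */
struct reg_write {
	uint32_t offset;
	bool has_reloc;   /* true if the value is patched through a reloc */
};

/* Reject the submit if any address register is written without a reloc. */
static bool firewall_check(const struct client_regs *client,
			   const struct reg_write *writes, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (is_addr_reg(client, writes[i].offset) && !writes[i].has_reloc)
			return false;
	return true;
}
```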
On 07.02.2013 23:07, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
>>>> That's the security firewall. It walks through each submit, and ensures that each register write that writes an address, goes through the host1x reloc mechanism. This way user space cannot ask 2D to write to arbitrary memory locations.
>>> I see. Can this be made more generic? Perhaps adding a table of valid registers to the device and use a generic function to iterate over that instead of having to provide the same function for each client.
>> For which one does gcc generate more efficient code? I've thought a switch-case statement might get compiled into something more efficient than a table lookup. But the rest of the code is generic - just the one function which compares against known address registers is specific to 2D.
> Table lookup should be pretty fast. I wouldn't worry too much about performance at this stage, though. Readability is more important in my opinion. A lookup table is a lot more readable and reusable I think. If it turns out that using a function is actually faster we can always optimize later.
You're right about performance. We already saw quite a bad performance hit with the current firewall, so we'll need to worry about performance later.
I'll take a look at converting the register list to a table. Instead of always doing a linear search of a table, a bitfield might be more appropriate.
Terje
On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
> On 07.02.2013 23:07, Thierry Reding wrote:
>> On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
>>>>> That's the security firewall. It walks through each submit, and ensures that each register write that writes an address, goes through the host1x reloc mechanism. This way user space cannot ask 2D to write to arbitrary memory locations.
>>>> I see. Can this be made more generic? Perhaps adding a table of valid registers to the device and use a generic function to iterate over that instead of having to provide the same function for each client.
>>> For which one does gcc generate more efficient code? I've thought a switch-case statement might get compiled into something more efficient than a table lookup. But the rest of the code is generic - just the one function which compares against known address registers is specific to 2D.
>> Table lookup should be pretty fast. I wouldn't worry too much about performance at this stage, though. Readability is more important in my opinion. A lookup table is a lot more readable and reusable I think. If it turns out that using a function is actually faster we can always optimize later.
> You're right about performance. We already saw quite a bad performance hit with the current firewall, so we'll need to worry about performance later.
I guess the additional overhead of looking up in a table vs. an actual function being run will be rather small compared to the total overhead incurred by having the firewall in the first place.
> I'll take a look at converting the register list to a table. Instead of always doing a linear search of a table, a bitfield might be more appropriate.
I don't know. Just a plain table with register offsets seems a lot more straightforward than a bitfield. In my opinion an array of offsets is a lot more readable than a field of bits. Especially since you can't just setup a bitfield easily with initialized values.
Thierry
On 10.02.2013 22:44, Thierry Reding wrote:
> On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
>> You're right about performance. We already saw quite a bad performance hit with the current firewall, so we'll need to worry about performance later.
> I guess the additional overhead of looking up in a table vs. an actual function being run will be rather small compared to the total overhead incurred by having the firewall in the first place.
Yeah, I'll just implement a simple linear table lookup and let's see what happens. I'll optimize with bitfield if needed.
Terje
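One way to reconcile the two positions in this exchange is to keep the readable table of offsets as the source of truth and derive a bitmap from it once at setup, so that each lookup is a single bit test. The sketch below assumes a register window of 256 offsets and uses made-up names (`addr_reg_bitmap`, `bitmap_from_table()`, `bitmap_is_addr_reg()`); it is an illustration of the idea, not what the driver ended up doing.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REG_OFFSET 256u  /* assumed size of the register window */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS   ((MAX_REG_OFFSET + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per register offset; set means "this register takes an address". */
struct addr_reg_bitmap {
	unsigned long bits[BITMAP_WORDS];
};

/* Build the bitmap from a plain table of offsets, done once at setup. */
static void bitmap_from_table(struct addr_reg_bitmap *map,
			      const uint32_t *table, size_t count)
{
	memset(map, 0, sizeof(*map));
	for (size_t i = 0; i < count; i++) {
		uint32_t offset = table[i];

		if (offset >= MAX_REG_OFFSET)
			continue;   /* ignore entries outside the window */
		map->bits[offset / BITS_PER_WORD] |=
			1UL << (offset % BITS_PER_WORD);
	}
}

/* Constant-time replacement for the linear table search. */
static bool bitmap_is_addr_reg(const struct addr_reg_bitmap *map,
			       uint32_t offset)
{
	if (offset >= MAX_REG_OFFSET)
		return false;
	return (map->bits[offset / BITS_PER_WORD] >>
		(offset % BITS_PER_WORD)) & 1UL;
}
```

The table stays easy to read and initialize statically, which addresses the objection to hand-written bitfields, while the hot path avoids the linear search.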
On 15.01.2013 13:43, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and 2D. It is based on linux-next-20130114. The set was regenerated with git format-patch -M.
I have pushed both the kernel patches and libdrm changes to git@gitorious.org:linux-host1x/linux-host1x.git and git@gitorious.org:linux-host1x/libdrm-host1x.git.
They're not intended to compete with any other repository - I just wanted to have one place where people can download the kernel patches, libdrm changes and test suite. I'll remove them once they've served their purpose.
I'd appreciate feedback on the patches. So far the only feedback has been from Stephen about clock changes.
Terje
On 15.01.2013 03:43, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and 2D. It is based on linux-next-20130114. The set was regenerated with git format-patch -M.
> The fifth version merges DRM and host1x drivers into one driver. This allowed moving include/linux/host1x.h back into the driver and removed the need for a dummy platform device. This version also uses the code from tegradrm driver almost as is, so there are a lot less actual code changes.
> This patch set does not have the host1x allocator, but it uses CMA helpers for memory management.
> host1x is the driver that controls host1x hardware. It supports host1x command channels, synchronization, and memory management. It is sectioned into logical driver under drivers/gpu/host1x and physical driver under drivers/host1x/hw. The physical driver is compiled with the hardware headers of the particular host1x version.
> The hardware units are described (briefly) in the Tegra2 TRM. Wiki page https://gitorious.org/linux-tegra-drm/pages/Host1xIntroduction also contains a short description of the functionality.
> The patch set merges tegradrm into host1x and adds 2D driver, which uses host1x channels and sync points. The patch set also adds user space API to tegradrm for accessing host1x and 2D.
Could you please have a look at the patch set? Are you expecting me to do something? I got barely any response (only one from Stephen), which means host1x upstreaming is completely stalled.
Lucas mentioned in another thread that he'd also need to review libdrm, and then there's the host1x allocator issue: Lucas wanted to create the allocator but hasn't had time.
Could we postpone the host1x allocator to a later patchset? It shouldn't be a merge blocker, as it shouldn't have an API impact, and it is strictly needed only with IOMMU support. The DRM CMA helper is already working.
If the amount of code related to channels is the problem, should we scope this patch set to include only sync point support and drop channels, debug and 2D driver? That way we could get at least the basic code rearrangement done.
Terje
dri-devel@lists.freedesktop.org