This patch-set enables amdkfd to let HSA processes create SDMA user-mode queues.
Each queue can be scheduled on either of Kaveri's two SDMA engines. The engine is assigned when the queue is created, alternating between the first engine and the second: the first SDMA queue is assigned to engine 1, the second to engine 2, the third to engine 1 again, and so forth.
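The alternating assignment described above can be sketched as a small round-robin picker. This is only an illustration: struct sdma_engine_picker and pick_sdma_engine() are hypothetical names, not part of the patches, and the sketch numbers the engines from zero, so "engine 1" and "engine 2" in the text correspond to 0 and 1 here.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SDMA_ENGINES 2u	/* Kaveri has two SDMA engines (from the text) */

/* Hypothetical allocator state; not a structure from the patches. */
struct sdma_engine_picker {
	unsigned int next_engine;
};

/* Return the engine for the next queue and advance the round-robin cursor. */
static unsigned int pick_sdma_engine(struct sdma_engine_picker *p)
{
	unsigned int engine;

	assert(p != NULL);

	engine = p->next_engine;
	p->next_engine = (p->next_engine + 1) % NUM_SDMA_ENGINES;

	return engine;
}
```

Keeping the cursor in a caller-owned structure, rather than in a function-local static, lets the caller decide which lock protects it.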
Queues are created and destroyed through the same IOCTLs that are used for regular compute queues. The create queue ioctl identifies an SDMA queue by the queue_type argument that the HSA process passes to amdkfd. That argument is already present in the current interface, so the change is backward compatible.
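As a rough sketch of the identification step, the ioctl handler could translate the user-supplied queue_type into an internal queue type and reject anything it does not know. The EX_* constants, enum ex_queue_type and ex_queue_type_from_ioctl() are hypothetical; the real ioctl values and names are not shown in this cover letter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-visible values; the real ioctl constants may differ. */
#define EX_IOC_QUEUE_TYPE_COMPUTE	0u
#define EX_IOC_QUEUE_TYPE_SDMA		1u

/* Hypothetical internal queue types. */
enum ex_queue_type {
	EX_QUEUE_TYPE_COMPUTE,
	EX_QUEUE_TYPE_SDMA
};

/* Map the ioctl argument to an internal type; -EINVAL for unknown values. */
static int ex_queue_type_from_ioctl(unsigned int ioc_type,
				    enum ex_queue_type *out)
{
	assert(out != NULL);

	switch (ioc_type) {
	case EX_IOC_QUEUE_TYPE_COMPUTE:
		*out = EX_QUEUE_TYPE_COMPUTE;
		return 0;
	case EX_IOC_QUEUE_TYPE_SDMA:
		*out = EX_QUEUE_TYPE_SDMA;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Because the value used for compute queues does not change, existing callers that only create compute queues keep working.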
The patch-set adds four new functions to the interface between kfd and kgd. Three of those functions are used only in no-HWS mode, which is used during debug and bring-up.
The main abstraction is done at the MQD level: the SDMA MQD has a different layout than a compute MQD.
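One way to picture the MQD-level abstraction is a per-type table of operations, so that callers never look inside the MQD themselves. The sketch below is a simplified illustration and not the mqd_manager from the patches: all ex_* names are hypothetical, the two layouts are reduced to a couple of fields, and the "8 queues per pipe" and "2 queues per engine" factors are assumptions used only to make the result observable.

```c
#include <assert.h>
#include <stddef.h>

enum ex_mqd_type { EX_MQD_TYPE_CP, EX_MQD_TYPE_SDMA };

/* Two deliberately different (and heavily reduced) MQD layouts. */
struct ex_cp_layout { unsigned int header; };
struct ex_sdma_layout { unsigned int engine_id; unsigned int queue_id; };

/* Operations that hide the layout from the caller. */
struct ex_mqd_ops {
	size_t mqd_size;
	/* Returns the index of the hardware slot that would be programmed. */
	int (*load)(void *mqd, unsigned int pipe_id, unsigned int queue_id);
};

static int ex_load_cp(void *mqd, unsigned int pipe_id, unsigned int queue_id)
{
	(void)mqd;
	return (int)(pipe_id * 8 + queue_id);	/* assumed 8 queues per pipe */
}

/* The SDMA MQD carries its own engine and queue ids; pipe/queue are unused. */
static int ex_load_sdma(void *mqd, unsigned int pipe_id, unsigned int queue_id)
{
	const struct ex_sdma_layout *m = mqd;

	assert(m != NULL);
	(void)pipe_id;
	(void)queue_id;
	return (int)(m->engine_id * 2 + m->queue_id);	/* assumed 2 per engine */
}

static const struct ex_mqd_ops *ex_get_mqd_ops(enum ex_mqd_type type)
{
	static const struct ex_mqd_ops cp_ops = {
		sizeof(struct ex_cp_layout), ex_load_cp
	};
	static const struct ex_mqd_ops sdma_ops = {
		sizeof(struct ex_sdma_layout), ex_load_sdma
	};

	switch (type) {
	case EX_MQD_TYPE_CP:
		return &cp_ops;
	case EX_MQD_TYPE_SDMA:
		return &sdma_ops;
	}
	return NULL;
}
```

The shared function signature is why the SDMA variants in the series accept pipe and queue arguments that they then ignore.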
Oded Gabbay
Ben Goz (6):
  drm/amd: Add SDMA functions to kfd-->kgd interface
  drm/radeon: Implement SDMA interface functions
  amdkfd: Add SDMA mqd support
  amdkfd: Add SDMA user-mode queues support to QCM
  amdkfd: Identify SDMA queue in create queue ioctl
  amdkfd: Pass queue type to pqm_create_queue()
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   6 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 167 ++++++++++++++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   5 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       | 121 +++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |   8 +
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   2 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  17 +-
 drivers/gpu/drm/radeon/cik_reg.h                   | 200 ++++++++++++++++++++-
 drivers/gpu/drm/radeon/radeon_kfd.c                | 132 +++++++++++++-
 9 files changed, 641 insertions(+), 17 deletions(-)
From: Ben Goz ben.goz@amd.com
This patch adds four new functions to the kfd2kgd interface:
1. init_sdma_engines() - Initializes the SDMA engines through GPU registers.
2. hqd_sdma_load() - Loads SDMA mqd to a H/W SDMA hqd slot. Used only in no HWS mode.
3. hqd_sdma_is_occupied() - Checks if an SDMA hqd slot is occupied. Used only in no HWS mode.
4. hqd_sdma_destroy() - Destructs and preempts the SDMA queue assigned to that SDMA hqd slot. Used only in no HWS mode.
These functions are needed to support scheduling of SDMA queues.
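To illustrate how the three no-HWS functions fit together, here is a minimal sketch that replaces the GPU registers with a plain structure. Everything prefixed ex_ is hypothetical; only the idea of an enable bit in RB_CNTL and the 20 ms polling step are taken from the patches. The polls_until_idle field simulates hardware that reports idle after some number of status reads, and the timeout check is written so the unsigned counter cannot wrap, which is a choice of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_RB_ENABLE	(1u << 0)	/* mirrors RB_ENABLE in the patches */

/* Hypothetical stand-in for one hardware SDMA hqd slot. */
struct ex_sdma_slot {
	uint32_t rb_cntl;
	uint32_t rb_base;
	unsigned int polls_until_idle;	/* simulated hardware latency */
};

/* Hypothetical MQD reduced to the two fields this sketch needs. */
struct ex_slot_mqd {
	uint32_t rb_cntl;
	uint32_t rb_base;
};

static void ex_hqd_sdma_load(struct ex_sdma_slot *slot,
			     const struct ex_slot_mqd *m)
{
	assert(slot != NULL && m != NULL);
	slot->rb_base = m->rb_base;
	slot->rb_cntl = m->rb_cntl;	/* written last: may set the enable bit */
}

static bool ex_hqd_sdma_is_occupied(const struct ex_sdma_slot *slot)
{
	return (slot->rb_cntl & EX_RB_ENABLE) != 0;
}

/* Returns 0 on success, -1 if the slot did not go idle in time. */
static int ex_hqd_sdma_destroy(struct ex_sdma_slot *slot,
			       unsigned int timeout_ms)
{
	const unsigned int step_ms = 20;

	slot->rb_cntl &= ~EX_RB_ENABLE;

	while (slot->polls_until_idle > 0) {
		if (timeout_ms < step_ms)
			return -1;
		timeout_ms -= step_ms;		/* stands in for msleep(20) */
		slot->polls_until_idle--;	/* stands in for a status read */
	}

	slot->rb_base = 0;
	return 0;
}
```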
Signed-off-by: Ben Goz ben.goz@amd.com
Reviewed-by: Oded Gabbay oded.gabbay@amd.com
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h index 47b5519..3da21e7 100644 --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h @@ -141,13 +141,23 @@ struct kgd2kfd_calls { * * @init_pipeline: Initialized the compute pipelines. * + * @init_sdma_engines: Initialize the sdma engines. + * * @hqd_load: Loads the mqd structure to a H/W hqd slot. used only for no cp * sceduling mode. * + * @hqd_sdma_load: Loads the SDMA mqd structure to a H/W SDMA hqd slot. + * used only for no HWS mode. + * * @hqd_is_occupies: Checks if a hqd slot is occupied. * * @hqd_destroy: Destructs and preempts the queue assigned to that hqd slot. * + * @hqd_sdma_is_occupied: Checks if an SDMA hqd slot is occupied. + * + * @hqd_sdma_destroy: Destructs and preempts the SDMA queue assigned to that + * SDMA hqd slot. + * * @get_fw_version: Returns FW versions from the header * * This structure contains function pointers to services that the kgd driver @@ -179,16 +189,19 @@ struct kfd2kgd_calls { int (*init_memory)(struct kgd_dev *kgd); int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id, uint32_t hpd_size, uint64_t hpd_gpu_addr); - + int (*init_sdma_engines)(struct kgd_dev *kgd); int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr); - + int (*hqd_sdma_load)(struct kgd_dev *kgd, void *mqd); bool (*hqd_is_occupies)(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
int (*hqd_destroy)(struct kgd_dev *kgd, uint32_t reset_type, unsigned int timeout, uint32_t pipe_id, uint32_t queue_id); + bool (*hqd_sdma_is_occupied)(struct kgd_dev *kgd, void *mqd); + int (*hqd_sdma_destroy)(struct kgd_dev *kgd, void *mqd, + unsigned int timeout); uint16_t (*get_fw_version)(struct kgd_dev *kgd, enum kgd_engine_type type); };
From: Ben Goz ben.goz@amd.com
This patch implements the new SDMA interface functions. It also adds defines and structures related to SDMA registers.
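The register block of a given SDMA queue is found by adding a per-engine and a per-queue offset to the SDMA0_RLC0_* register addresses. The sketch below shows that arithmetic in isolation. The 0x200 queue stride is KFD_CIK_SDMA_QUEUE_OFFSET from this patch; the 0x800 engine stride is an assumption, inferred from the difference between SDMA1_CNTL (0xD810) and SDMA0_CNTL (0xD010), since the definition of SDMA1_REGISTER_OFFSET is not part of this diff. The ex_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_SDMA_QUEUE_OFFSET		0x200u	/* KFD_CIK_SDMA_QUEUE_OFFSET */
#define EX_SDMA_ENGINE_OFFSET		0x800u	/* assumed: 0xD810 - 0xD010 */
#define EX_SDMA_ENGINES			2u
#define EX_SDMA_QUEUES_PER_ENGINE	2u
#define EX_SDMA0_RLC0_RB_CNTL		0xD400u	/* SDMA0_RLC0_RB_CNTL */

/* Offset to add to an SDMA0_RLC0_* register for (engine_id, queue_id). */
static uint32_t ex_sdma_base_addr(uint32_t engine_id, uint32_t queue_id)
{
	assert(engine_id < EX_SDMA_ENGINES);
	assert(queue_id < EX_SDMA_QUEUES_PER_ENGINE);

	return engine_id * EX_SDMA_ENGINE_OFFSET +
	       queue_id * EX_SDMA_QUEUE_OFFSET;
}
```

Under these assumptions, RB_CNTL of queue 1 on engine 1 would sit at EX_SDMA0_RLC0_RB_CNTL + ex_sdma_base_addr(1, 1).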
Signed-off-by: Ben Goz ben.goz@amd.com
---
 drivers/gpu/drm/radeon/cik_reg.h    | 200 +++++++++++++++++++++++++++++++++++-
 drivers/gpu/drm/radeon/radeon_kfd.c | 132 +++++++++++++++++++++++-
 2 files changed, 329 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/radeon/cik_reg.h b/drivers/gpu/drm/radeon/cik_reg.h index 79c45e8..5008964 100644 --- a/drivers/gpu/drm/radeon/cik_reg.h +++ b/drivers/gpu/drm/radeon/cik_reg.h @@ -147,10 +147,73 @@
#define CIK_LB_DESKTOP_HEIGHT 0x6b0c
+#define KFD_CIK_SDMA_QUEUE_OFFSET 0x200 + +#define SQ_IND_INDEX 0x8DE0 +#define SQ_CMD 0x8DEC +#define SQ_IND_DATA 0x8DE4 + +#define TCP_WATCH0_ADDR_H (0x32A0*4) +#define TCP_WATCH1_ADDR_H (0x32A3*4) +#define TCP_WATCH2_ADDR_H (0x32A6*4) +#define TCP_WATCH3_ADDR_H (0x32A9*4) +#define TCP_WATCH0_ADDR_L (0x32A1*4) +#define TCP_WATCH1_ADDR_L (0x32A4*4) +#define TCP_WATCH2_ADDR_L (0x32A7*4) +#define TCP_WATCH3_ADDR_L (0x32AA*4) +#define TCP_WATCH0_CNTL (0x32A2*4) +#define TCP_WATCH1_CNTL (0x32A5*4) +#define TCP_WATCH2_CNTL (0x32A8*4) +#define TCP_WATCH3_CNTL (0x32AB*4) + #define CP_HQD_IQ_RPTR 0xC970u #define AQL_ENABLE (1U << 0) - -#define IDLE (1 << 2) +#define SDMA0_RLC0_RB_CNTL 0xD400u +#define RB_ENABLE (1 << 0) +#define RB_SIZE(x) (x << 1) +#define RB_SWAP_ENABLE (1 << 9) +#define RPTR_WRITEBACK_ENABLE (1 << 12) +#define RPTR_WRITEBACK_SWAP_ENABLE (1 << 13) +#define RPTR_WRITEBACK_TIMER(x) (x << 16) +#define RB_VMID(x) (x << 24) +#define SDMA0_RLC0_RB_BASE 0xD404u +#define SDMA0_RLC0_RB_BASE_HI 0xD408u +#define SDMA0_RLC0_RB_RPTR 0xD40Cu +#define SDMA0_RLC0_RB_WPTR 0xD410u +#define SDMA0_RLC0_RB_WPTR_POLL_CNTL 0xD414u +#define SDMA0_RLC0_RB_WPTR_POLL_ADDR_HI 0xD418u +#define SDMA0_RLC0_RB_WPTR_POLL_ADDR_LO 0xD41Cu +#define SDMA0_RLC0_RB_RPTR_ADDR_HI 0xD420u +#define SDMA0_RLC0_RB_RPTR_ADDR_LO 0xD424u +#define SDMA0_RLC0_IB_CNTL 0xD428u +#define SDMA0_RLC0_IB_RPTR 0xD42Cu +#define SDMA0_RLC0_IB_OFFSET 0xD430u +#define SDMA0_RLC0_IB_BASE_LO 0xD434u +#define SDMA0_RLC0_IB_BASE_HI 0xD438u +#define SDMA0_RLC0_IB_SIZE 0xD43Cu +#define SDMA0_RLC0_SKIP_CNTL 0xD440u +#define SDMA0_RLC0_CONTEXT_STATUS 0xD444u +#define SELECTED (1 << 0) +#define IDLE (1 << 2) +#define EXPIRED (1 << 3) +#define EXCEPTION (1 << 4) +#define CTXSW_ABLE (1 << 7) +#define CTXSW_READY (1 << 8) +#define SDMA0_RLC0_DOORBELL 0xD448u +#define OFFSET(x) (x << 0) +#define DB_ENABLE (1 << 28) +#define CAPTURED (1 << 30) +#define SDMA0_RLC0_VIRTUAL_ADDR 0xD49Cu +#define ATC (1 << 0) +#define VA_PTR32 (1 
<< 4) +#define VA_SHARED_BASE(x) (x << 8) +#define VM_HOLE (1 << 30) +#define SDMA0_RLC0_APE1_CNTL 0xD4A0u +#define SDMA0_RLC0_DOORBELL_LOG 0xD4A4u +#define SDMA0_RLC0_WATERMARK 0xD4A8u +#define SDMA0_CNTL 0xD010 +#define SDMA1_CNTL 0xD810 +#define AUTO_CTXSW_ENABLE (1 << 18)
struct cik_mqd { uint32_t header; @@ -283,4 +346,137 @@ struct cik_mqd { uint32_t queue_doorbell_id15; };
+struct cik_sdma_rlc_registers { + uint32_t sdma_rlc_rb_cntl; + uint32_t sdma_rlc_rb_base; + uint32_t sdma_rlc_rb_base_hi; + uint32_t sdma_rlc_rb_rptr; + uint32_t sdma_rlc_rb_wptr; + uint32_t sdma_rlc_rb_wptr_poll_cntl; + uint32_t sdma_rlc_rb_wptr_poll_addr_hi; + uint32_t sdma_rlc_rb_wptr_poll_addr_lo; + uint32_t sdma_rlc_rb_rptr_addr_hi; + uint32_t sdma_rlc_rb_rptr_addr_lo; + uint32_t sdma_rlc_ib_cntl; + uint32_t sdma_rlc_ib_rptr; + uint32_t sdma_rlc_ib_offset; + uint32_t sdma_rlc_ib_base_lo; + uint32_t sdma_rlc_ib_base_hi; + uint32_t sdma_rlc_ib_size; + uint32_t sdma_rlc_skip_cntl; + uint32_t sdma_rlc_context_status; + uint32_t sdma_rlc_doorbell; + uint32_t sdma_rlc_virtual_addr; + uint32_t sdma_rlc_ape1_cntl; + uint32_t sdma_rlc_doorbell_log; + uint32_t reserved_22; + uint32_t reserved_23; + uint32_t reserved_24; + uint32_t reserved_25; + uint32_t reserved_26; + uint32_t reserved_27; + uint32_t reserved_28; + uint32_t reserved_29; + uint32_t reserved_30; + uint32_t reserved_31; + uint32_t reserved_32; + uint32_t reserved_33; + uint32_t reserved_34; + uint32_t reserved_35; + uint32_t reserved_36; + uint32_t reserved_37; + uint32_t reserved_38; + uint32_t reserved_39; + uint32_t reserved_40; + uint32_t reserved_41; + uint32_t reserved_42; + uint32_t reserved_43; + uint32_t reserved_44; + uint32_t reserved_45; + uint32_t reserved_46; + uint32_t reserved_47; + uint32_t reserved_48; + uint32_t reserved_49; + uint32_t reserved_50; + uint32_t reserved_51; + uint32_t reserved_52; + uint32_t reserved_53; + uint32_t reserved_54; + uint32_t reserved_55; + uint32_t reserved_56; + uint32_t reserved_57; + uint32_t reserved_58; + uint32_t reserved_59; + uint32_t reserved_60; + uint32_t reserved_61; + uint32_t reserved_62; + uint32_t reserved_63; + uint32_t reserved_64; + uint32_t reserved_65; + uint32_t reserved_66; + uint32_t reserved_67; + uint32_t reserved_68; + uint32_t reserved_69; + uint32_t reserved_70; + uint32_t reserved_71; + uint32_t reserved_72; + uint32_t 
reserved_73; + uint32_t reserved_74; + uint32_t reserved_75; + uint32_t reserved_76; + uint32_t reserved_77; + uint32_t reserved_78; + uint32_t reserved_79; + uint32_t reserved_80; + uint32_t reserved_81; + uint32_t reserved_82; + uint32_t reserved_83; + uint32_t reserved_84; + uint32_t reserved_85; + uint32_t reserved_86; + uint32_t reserved_87; + uint32_t reserved_88; + uint32_t reserved_89; + uint32_t reserved_90; + uint32_t reserved_91; + uint32_t reserved_92; + uint32_t reserved_93; + uint32_t reserved_94; + uint32_t reserved_95; + uint32_t reserved_96; + uint32_t reserved_97; + uint32_t reserved_98; + uint32_t reserved_99; + uint32_t reserved_100; + uint32_t reserved_101; + uint32_t reserved_102; + uint32_t reserved_103; + uint32_t reserved_104; + uint32_t reserved_105; + uint32_t reserved_106; + uint32_t reserved_107; + uint32_t reserved_108; + uint32_t reserved_109; + uint32_t reserved_110; + uint32_t reserved_111; + uint32_t reserved_112; + uint32_t reserved_113; + uint32_t reserved_114; + uint32_t reserved_115; + uint32_t reserved_116; + uint32_t reserved_117; + uint32_t reserved_118; + uint32_t reserved_119; + uint32_t reserved_120; + uint32_t reserved_121; + uint32_t reserved_122; + uint32_t reserved_123; + uint32_t reserved_124; + uint32_t reserved_125; + uint32_t reserved_126; + uint32_t reserved_127; + uint32_t sdma_engine_id; + uint32_t sdma_queue_id; +}; + #endif diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c index 242fd8b..b77aee0 100644 --- a/drivers/gpu/drm/radeon/radeon_kfd.c +++ b/drivers/gpu/drm/radeon/radeon_kfd.c @@ -71,13 +71,17 @@ static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr); - +static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd); static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id);
static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type, unsigned int timeout, uint32_t pipe_id, uint32_t queue_id); +static bool kgd_hqd_sdma_is_occupied(struct kgd_dev *kgd, void *mqd); +static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd, + unsigned int timeout); +static int kgd_init_sdma_engines(struct kgd_dev *kgd);
static const struct kfd2kgd_calls kfd2kgd = { .init_sa_manager = init_sa_manager, @@ -91,9 +95,13 @@ static const struct kfd2kgd_calls kfd2kgd = { .set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping, .init_memory = kgd_init_memory, .init_pipeline = kgd_init_pipeline, + .init_sdma_engines = kgd_init_sdma_engines, .hqd_load = kgd_hqd_load, + .hqd_sdma_load = kgd_hqd_sdma_load, .hqd_is_occupies = kgd_hqd_is_occupies, + .hqd_sdma_is_occupied = kgd_hqd_sdma_is_occupied, .hqd_destroy = kgd_hqd_destroy, + .hqd_sdma_destroy = kgd_hqd_sdma_destroy, .get_fw_version = get_fw_version };
@@ -435,11 +443,43 @@ static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id, return 0; }
+static int kgd_init_sdma_engines(struct kgd_dev *kgd) +{ + uint32_t value; + + value = read_register(kgd, SDMA0_CNTL); + value |= AUTO_CTXSW_ENABLE; + write_register(kgd, SDMA0_CNTL, value); + + value = read_register(kgd, SDMA1_CNTL); + value |= AUTO_CTXSW_ENABLE; + write_register(kgd, SDMA1_CNTL, value); + + return 0; +} + +static inline uint32_t get_sdma_base_addr(struct cik_sdma_rlc_registers *m) +{ + uint32_t retval; + + retval = m->sdma_engine_id * SDMA1_REGISTER_OFFSET + + m->sdma_queue_id * KFD_CIK_SDMA_QUEUE_OFFSET; + + pr_debug("kfd: sdma base address: 0x%x\n", retval); + + return retval; +} + static inline struct cik_mqd *get_mqd(void *mqd) { return (struct cik_mqd *)mqd; }
+static inline struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd) +{ + return (struct cik_sdma_rlc_registers *)mqd; +} + static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr) { @@ -517,6 +557,45 @@ static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id, return 0; }
+static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd) +{ + struct cik_sdma_rlc_registers *m; + uint32_t sdma_base_addr; + + m = get_sdma_mqd(mqd); + sdma_base_addr = get_sdma_base_addr(m); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_VIRTUAL_ADDR, + m->sdma_rlc_virtual_addr); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_RB_BASE, + m->sdma_rlc_rb_base); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_RB_BASE_HI, + m->sdma_rlc_rb_base_hi); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_RB_RPTR_ADDR_LO, + m->sdma_rlc_rb_rptr_addr_lo); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_RB_RPTR_ADDR_HI, + m->sdma_rlc_rb_rptr_addr_hi); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_DOORBELL, + m->sdma_rlc_doorbell); + + write_register(kgd, + sdma_base_addr + SDMA0_RLC0_RB_CNTL, + m->sdma_rlc_rb_cntl); + + return 0; +} + static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id) { @@ -538,6 +617,24 @@ static bool kgd_hqd_is_occupies(struct kgd_dev *kgd, uint64_t queue_address, return retval; }
+static bool kgd_hqd_sdma_is_occupied(struct kgd_dev *kgd, void *mqd) +{ + struct cik_sdma_rlc_registers *m; + uint32_t sdma_base_addr; + uint32_t sdma_rlc_rb_cntl; + + m = get_sdma_mqd(mqd); + sdma_base_addr = get_sdma_base_addr(m); + + sdma_rlc_rb_cntl = read_register(kgd, + sdma_base_addr + SDMA0_RLC0_RB_CNTL); + + if (sdma_rlc_rb_cntl & RB_ENABLE) + return true; + + return false; +} + static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type, unsigned int timeout, uint32_t pipe_id, uint32_t queue_id) @@ -566,6 +663,39 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, uint32_t reset_type, return 0; }
+static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd, + unsigned int timeout) +{ + struct cik_sdma_rlc_registers *m; + uint32_t sdma_base_addr; + uint32_t temp; + + m = get_sdma_mqd(mqd); + sdma_base_addr = get_sdma_base_addr(m); + + temp = read_register(kgd, sdma_base_addr + SDMA0_RLC0_RB_CNTL); + temp = temp & ~RB_ENABLE; + write_register(kgd, sdma_base_addr + SDMA0_RLC0_RB_CNTL, temp); + + while (true) { + temp = read_register(kgd, sdma_base_addr + + SDMA0_RLC0_CONTEXT_STATUS); + if (temp & IDLE) + break; + if (timeout == 0) + return -ETIME; + msleep(20); + timeout -= 20; + } + + write_register(kgd, sdma_base_addr + SDMA0_RLC0_DOORBELL, 0); + write_register(kgd, sdma_base_addr + SDMA0_RLC0_RB_RPTR, 0); + write_register(kgd, sdma_base_addr + SDMA0_RLC0_RB_WPTR, 0); + write_register(kgd, sdma_base_addr + SDMA0_RLC0_RB_BASE, 0); + + return 0; +} + static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type) { struct radeon_device *rdev = (struct radeon_device *) kgd;
From: Ben Goz ben.goz@amd.com
This patch adds support for SDMA mqd operations:
- init_mqd_sdma
- uninit_mqd_sdma
- load_mqd_sdma
- update_mqd_sdma
- destroy_mqd_sdma
- is_occupied_sdma
It also adds SDMA queue information to some private structures of amdkfd.
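As an illustration of what update_mqd_sdma does with the queue properties, the sketch below composes the RB_CNTL value and applies the "queue is active" condition. The bit positions are copied from the cik_reg.h additions in this series and the writeback timer value 6 from the patch. The ex_* names and struct ex_queue_props are hypothetical, and the ring-size field is passed in as a parameter because this sketch makes no claim about how the hardware expects it to be encoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit layout copied from the cik_reg.h additions in this series. */
#define EX_RB_ENABLE			(1u << 0)
#define EX_RB_SIZE(x)			((uint32_t)(x) << 1)
#define EX_RPTR_WRITEBACK_ENABLE	(1u << 12)
#define EX_RPTR_WRITEBACK_TIMER(x)	((uint32_t)(x) << 16)
#define EX_RB_VMID(x)			((uint32_t)(x) << 24)

/* Hypothetical subset of the queue properties. */
struct ex_queue_props {
	uint64_t queue_address;
	uint32_t queue_size;
	uint32_t queue_percent;
	uint32_t vmid;
};

/* Same condition the patch uses before setting RB_ENABLE. */
static bool ex_queue_is_active(const struct ex_queue_props *q)
{
	return q->queue_size > 0 &&
	       q->queue_address != 0 &&
	       q->queue_percent > 0;
}

static uint32_t ex_build_rb_cntl(const struct ex_queue_props *q,
				 uint32_t rb_size_field)
{
	uint32_t v;

	assert(q != NULL);

	v = EX_RB_SIZE(rb_size_field) |
	    EX_RB_VMID(q->vmid) |
	    EX_RPTR_WRITEBACK_ENABLE |
	    EX_RPTR_WRITEBACK_TIMER(6);

	if (ex_queue_is_active(q))
		v |= EX_RB_ENABLE;

	return v;
}
```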
Signed-off-by: Ben Goz ben.goz@amd.com
---
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c | 121 +++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |   8 ++
 2 files changed, 129 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c index adc3147..9eda956 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c @@ -111,6 +111,37 @@ static int init_mqd(struct mqd_manager *mm, void **mqd, return retval; }
+static int init_mqd_sdma(struct mqd_manager *mm, void **mqd, + struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr, + struct queue_properties *q) +{ + int retval; + struct cik_sdma_rlc_registers *m; + + BUG_ON(!mm || !mqd || !mqd_mem_obj); + + retval = kfd2kgd->allocate_mem(mm->dev->kgd, + sizeof(struct cik_sdma_rlc_registers), + 256, + KFD_MEMPOOL_SYSTEM_WRITECOMBINE, + (struct kgd_mem **) mqd_mem_obj); + + if (retval != 0) + return -ENOMEM; + + m = (struct cik_sdma_rlc_registers *) (*mqd_mem_obj)->cpu_ptr; + + memset(m, 0, sizeof(struct cik_sdma_rlc_registers)); + + *mqd = m; + if (gart_addr != NULL) + *gart_addr = (*mqd_mem_obj)->gpu_addr; + + retval = mm->update_mqd(mm, m, q); + + return retval; +} + static void uninit_mqd(struct mqd_manager *mm, void *mqd, struct kfd_mem_obj *mqd_mem_obj) { @@ -118,11 +149,24 @@ static void uninit_mqd(struct mqd_manager *mm, void *mqd, kfd2kgd->free_mem(mm->dev->kgd, (struct kgd_mem *) mqd_mem_obj); }
+static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd, + struct kfd_mem_obj *mqd_mem_obj) +{ + BUG_ON(!mm || !mqd); + kfd2kgd->free_mem(mm->dev->kgd, (struct kgd_mem *) mqd_mem_obj); +} + static int load_mqd(struct mqd_manager *mm, void *mqd, uint32_t pipe_id, uint32_t queue_id, uint32_t __user *wptr) { return kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id, wptr); +}
+static int load_mqd_sdma(struct mqd_manager *mm, void *mqd, + uint32_t pipe_id, uint32_t queue_id, + uint32_t __user *wptr) +{ + return kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd); }
static int update_mqd(struct mqd_manager *mm, void *mqd, @@ -170,6 +214,41 @@ static int update_mqd(struct mqd_manager *mm, void *mqd, return 0; }
+static int update_mqd_sdma(struct mqd_manager *mm, void *mqd, + struct queue_properties *q) +{ + struct cik_sdma_rlc_registers *m; + + BUG_ON(!mm || !mqd || !q); + + m = get_sdma_mqd(mqd); + m->sdma_rlc_rb_cntl = + RB_SIZE((ffs(q->queue_size / sizeof(unsigned int)))) | + RB_VMID(q->vmid) | + RPTR_WRITEBACK_ENABLE | + RPTR_WRITEBACK_TIMER(6); + + m->sdma_rlc_rb_base = lower_32_bits(q->queue_address >> 8); + m->sdma_rlc_rb_base_hi = upper_32_bits(q->queue_address >> 8); + m->sdma_rlc_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr); + m->sdma_rlc_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr); + m->sdma_rlc_doorbell = OFFSET(q->doorbell_off) | DB_ENABLE; + m->sdma_rlc_virtual_addr = q->sdma_vm_addr; + + m->sdma_engine_id = q->sdma_engine_id; + m->sdma_queue_id = q->sdma_queue_id; + + q->is_active = false; + if (q->queue_size > 0 && + q->queue_address != 0 && + q->queue_percent > 0) { + m->sdma_rlc_rb_cntl |= RB_ENABLE; + q->is_active = true; + } + + return 0; +} + static int destroy_mqd(struct mqd_manager *mm, void *mqd, enum kfd_preempt_type type, unsigned int timeout, uint32_t pipe_id, @@ -179,6 +258,18 @@ static int destroy_mqd(struct mqd_manager *mm, void *mqd, pipe_id, queue_id); }
+/* + * preempt type here is ignored because there is only one way + * to preempt sdma queue + */ +static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd, + enum kfd_preempt_type type, + unsigned int timeout, uint32_t pipe_id, + uint32_t queue_id) +{ + return kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout); +} + static bool is_occupied(struct mqd_manager *mm, void *mqd, uint64_t queue_address, uint32_t pipe_id, uint32_t queue_id) @@ -189,6 +280,13 @@ static bool is_occupied(struct mqd_manager *mm, void *mqd,
}
+static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd, + uint64_t queue_address, uint32_t pipe_id, + uint32_t queue_id) +{ + return kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd); +} + /* * HIQ MQD Implementation, concrete implementation for HIQ MQD implementation. * The HIQ queue in Kaveri is using the same MQD structure as all the user mode @@ -301,6 +399,21 @@ static int update_mqd_hiq(struct mqd_manager *mm, void *mqd, return 0; }
+/* + * SDMA MQD Implementation + */ + +struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd) +{ + struct cik_sdma_rlc_registers *m; + + BUG_ON(!mqd); + + m = (struct cik_sdma_rlc_registers *)mqd; + + return m; +} + struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, struct kfd_dev *dev) { @@ -335,6 +448,14 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type, mqd->destroy_mqd = destroy_mqd; mqd->is_occupied = is_occupied; break; + case KFD_MQD_TYPE_CIK_SDMA: + mqd->init_mqd = init_mqd_sdma; + mqd->uninit_mqd = uninit_mqd_sdma; + mqd->load_mqd = load_mqd_sdma; + mqd->update_mqd = update_mqd_sdma; + mqd->destroy_mqd = destroy_mqd_sdma; + mqd->is_occupied = is_occupied_sdma; + break; default: kfree(mqd); return NULL; diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index a2e053c..87735d8 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h @@ -286,6 +286,10 @@ struct queue_properties { bool is_active; /* Not relevant for user mode queues in cp scheduling */ unsigned int vmid; + /* Relevant only for sdma queues*/ + uint32_t sdma_engine_id; + uint32_t sdma_queue_id; + uint32_t sdma_vm_addr; };
/** @@ -328,6 +332,8 @@ struct queue { uint32_t pipe; uint32_t queue;
+ unsigned int sdma_id; + struct kfd_process *process; struct kfd_dev *device; }; @@ -530,6 +536,8 @@ int kfd_init_apertures(struct kfd_process *process); /* Queue Context Management */ inline uint32_t lower_32(uint64_t x); inline uint32_t upper_32(uint64_t x); +struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd); +inline uint32_t get_sdma_base_addr(struct cik_sdma_rlc_registers *m);
int init_queue(struct queue **q, struct queue_properties properties); void uninit_queue(struct queue *q);
From: Ben Goz ben.goz@amd.com
This patch adds support for SDMA user-mode queues to the QCM - the Queue management system that manages queues-per-device and queues-per-process.
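In no-HWS mode the QCM hands out SDMA queue slots from a small bitmap and then splits the slot id into an engine id and a per-engine queue id. A minimal user-space sketch of that bookkeeping follows. The sizes (4 slots, 2 per engine) come from the defines added in this patch; the ex_* names are hypothetical, and a plain loop stands in for the kernel's bitmap helpers. Note that the patch computes the engine id by dividing by CIK_SDMA_ENGINE_NUM, which has the same value (2) as CIK_SDMA_QUEUES_PER_ENGINE; this sketch divides by the per-engine count.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SDMA_QUEUES			4u	/* CIK_SDMA_QUEUES */
#define EX_SDMA_QUEUES_PER_ENGINE	2u	/* CIK_SDMA_QUEUES_PER_ENGINE */

/* A set bit means the slot is free. */
struct ex_sdma_alloc {
	unsigned int free_bitmap;
};

static void ex_sdma_alloc_init(struct ex_sdma_alloc *a)
{
	assert(a != NULL);
	a->free_bitmap = (1u << EX_SDMA_QUEUES) - 1u;
}

/* Take the lowest free slot; returns its id, or -1 if none is free. */
static int ex_sdma_alloc_get(struct ex_sdma_alloc *a)
{
	unsigned int bit;

	for (bit = 0; bit < EX_SDMA_QUEUES; bit++) {
		if (a->free_bitmap & (1u << bit)) {
			a->free_bitmap &= ~(1u << bit);
			return (int)bit;
		}
	}
	return -1;
}

/* Return a slot to the pool; out-of-range ids are ignored. */
static void ex_sdma_alloc_put(struct ex_sdma_alloc *a, unsigned int id)
{
	if (id >= EX_SDMA_QUEUES)
		return;
	a->free_bitmap |= 1u << id;
}

static unsigned int ex_sdma_engine_of(unsigned int id)
{
	return id / EX_SDMA_QUEUES_PER_ENGINE;
}

static unsigned int ex_sdma_queue_of(unsigned int id)
{
	return id % EX_SDMA_QUEUES_PER_ENGINE;
}
```

With this split, slot ids 0 and 1 land on engine 0 and slot ids 2 and 3 on engine 1.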
Signed-off-by: Ben Goz ben.goz@amd.com
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 167 +++++++++++++++++++--
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   5 +
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   2 +-
 3 files changed, 164 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c index 7b6df51..55ee2da 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c @@ -46,9 +46,24 @@ static int set_pasid_vmid_mapping(struct device_queue_manager *dqm, static int create_compute_queue_nocpsch(struct device_queue_manager *dqm, struct queue *q, struct qcm_process_device *qpd); + static int execute_queues_cpsch(struct device_queue_manager *dqm, bool lock); static int destroy_queues_cpsch(struct device_queue_manager *dqm, bool lock);
+static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm, + struct queue *q, + struct qcm_process_device *qpd); + +static void deallocate_sdma_queue(struct device_queue_manager *dqm, + unsigned int sdma_queue_id); + +static inline +enum KFD_MQD_TYPE get_mqd_type_from_queue_type(enum kfd_queue_type type) +{ + if (type == KFD_QUEUE_TYPE_SDMA) + return KFD_MQD_TYPE_CIK_SDMA; + return KFD_MQD_TYPE_CIK_CP; +}
static inline unsigned int get_pipes_num(struct device_queue_manager *dqm) { @@ -189,7 +204,10 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm, *allocated_vmid = qpd->vmid; q->properties.vmid = qpd->vmid;
- retval = create_compute_queue_nocpsch(dqm, q, qpd); + if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE) + retval = create_compute_queue_nocpsch(dqm, q, qpd); + if (q->properties.type == KFD_QUEUE_TYPE_SDMA) + retval = create_sdma_queue_nocpsch(dqm, q, qpd);
if (retval != 0) { if (list_empty(&qpd->queues_list)) { @@ -202,7 +220,8 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
list_add(&q->list, &qpd->queues_list); dqm->queue_count++; - + if (q->properties.type == KFD_QUEUE_TYPE_SDMA) + dqm->sdma_queue_count++; mutex_unlock(&dqm->lock); return 0; } @@ -279,8 +298,7 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm, struct queue *q) { int retval; - struct mqd_manager *mqd; - + struct mqd_manager *mqd, *mqd_sdma; BUG_ON(!dqm || !q || !q->mqd || !qpd);
retval = 0; @@ -294,6 +312,12 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm, goto out; }
+ mqd_sdma = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_SDMA); + if (mqd_sdma == NULL) { + mutex_unlock(&dqm->lock); + return -ENOMEM; + } + retval = mqd->destroy_mqd(mqd, q->mqd, KFD_PREEMPT_TYPE_WAVEFRONT, QUEUE_PREEMPT_DEFAULT_TIMEOUT_MS, @@ -302,7 +326,12 @@ static int destroy_queue_nocpsch(struct device_queue_manager *dqm, if (retval != 0) goto out;
- deallocate_hqd(dqm, q); + if (q->properties.type == KFD_QUEUE_TYPE_COMPUTE) + deallocate_hqd(dqm, q); + else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) { + dqm->sdma_queue_count--; + deallocate_sdma_queue(dqm, q->sdma_id); + }
mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
@@ -324,7 +353,7 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q) BUG_ON(!dqm || !q || !q->mqd);
mutex_lock(&dqm->lock); - mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_COMPUTE); + mqd = dqm->get_mqd_manager(dqm, q->properties.type); if (mqd == NULL) { mutex_unlock(&dqm->lock); return -ENOMEM; @@ -536,6 +565,11 @@ static int init_pipelines(struct device_queue_manager *dqm, }
+static int init_sdma_engines(struct device_queue_manager *dqm) +{ + return kfd2kgd->init_sdma_engines(dqm->dev->kgd); +} + static int init_scheduler(struct device_queue_manager *dqm) { int retval; @@ -549,7 +583,10 @@ static int init_scheduler(struct device_queue_manager *dqm) return retval;
retval = init_memory(dqm); + if (retval != 0) + return retval;
+ retval = init_sdma_engines(dqm); return retval; }
@@ -565,6 +602,7 @@ static int initialize_nocpsch(struct device_queue_manager *dqm) mutex_init(&dqm->lock); INIT_LIST_HEAD(&dqm->queues); dqm->queue_count = dqm->next_pipe_to_allocate = 0; + dqm->sdma_queue_count = 0; dqm->allocated_queues = kcalloc(get_pipes_num(dqm), sizeof(unsigned int), GFP_KERNEL); if (!dqm->allocated_queues) { @@ -576,6 +614,7 @@ static int initialize_nocpsch(struct device_queue_manager *dqm) dqm->allocated_queues[i] = (1 << QUEUES_PER_PIPE) - 1;
dqm->vmid_bitmap = (1 << VMID_PER_DEVICE) - 1; + dqm->sdma_bitmap = (1 << CIK_SDMA_QUEUES) - 1;
init_scheduler(dqm); return 0; @@ -607,6 +646,77 @@ static int stop_nocpsch(struct device_queue_manager *dqm) return 0; }
+static int allocate_sdma_queue(struct device_queue_manager *dqm, + unsigned int *sdma_queue_id) +{ + int bit; + + if (dqm->sdma_bitmap == 0) + return -ENOMEM; + + bit = find_first_bit((unsigned long *)&dqm->sdma_bitmap, + CIK_SDMA_QUEUES); + + clear_bit(bit, (unsigned long *)&dqm->sdma_bitmap); + *sdma_queue_id = bit; + + return 0; +} + +static void deallocate_sdma_queue(struct device_queue_manager *dqm, + unsigned int sdma_queue_id) +{ + if (sdma_queue_id < 0 || sdma_queue_id >= CIK_SDMA_QUEUES) + return; + set_bit(sdma_queue_id, (unsigned long *)&dqm->sdma_bitmap); +} + +static void init_sdma_vm(struct device_queue_manager *dqm, struct queue *q, + struct qcm_process_device *qpd) +{ + uint32_t value = ATC; + + if (q->process->is_32bit_user_mode) + value |= VA_PTR32 | get_sh_mem_bases_32(qpd_to_pdd(qpd)); + else + value |= VA_SHARED_BASE(get_sh_mem_bases_nybble_64( + qpd_to_pdd(qpd))); + q->properties.sdma_vm_addr = value; +} + +static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm, + struct queue *q, + struct qcm_process_device *qpd) +{ + struct mqd_manager *mqd; + int retval; + + mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_SDMA); + if (!mqd) + return -ENOMEM; + + retval = allocate_sdma_queue(dqm, &q->sdma_id); + if (retval != 0) + return retval; + + q->properties.sdma_queue_id = q->sdma_id % CIK_SDMA_QUEUES_PER_ENGINE; + q->properties.sdma_engine_id = q->sdma_id / CIK_SDMA_ENGINE_NUM; + + pr_debug("kfd: sdma id is: %d\n", q->sdma_id); + pr_debug(" sdma queue id: %d\n", q->properties.sdma_queue_id); + pr_debug(" sdma engine id: %d\n", q->properties.sdma_engine_id); + + retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj, + &q->gart_mqd_addr, &q->properties); + if (retval != 0) { + deallocate_sdma_queue(dqm, q->sdma_id); + return retval; + } + + init_sdma_vm(dqm, q, qpd); + return 0; +} + /* * Device Queue Manager implementation for cp scheduler */ @@ -648,6 +758,7 @@ static int initialize_cpsch(struct device_queue_manager *dqm) 
mutex_init(&dqm->lock); INIT_LIST_HEAD(&dqm->queues); dqm->queue_count = dqm->processes_count = 0; + dqm->sdma_queue_count = 0; dqm->active_runlist = false; retval = init_pipelines(dqm, get_pipes_num(dqm), 0); if (retval != 0) @@ -692,6 +803,8 @@ static int start_cpsch(struct device_queue_manager *dqm) dqm->fence_addr = dqm->fence_mem->cpu_ptr; dqm->fence_gpu_addr = dqm->fence_mem->gpu_addr;
+	init_sdma_engines(dqm);
+
 	list_for_each_entry(node, &dqm->queues, list)
 		if (node->qpd->pqm->process && dqm->dev)
 			kfd_bind_process_to_device(dqm->dev,
@@ -762,6 +875,14 @@ static void destroy_kernel_queue_cpsch(struct device_queue_manager *dqm,
 	mutex_unlock(&dqm->lock);
 }

+static void select_sdma_engine_id(struct queue *q)
+{
+	static int sdma_id;
+
+	q->sdma_id = sdma_id;
+	sdma_id = (sdma_id + 1) % 2;
+}
+
 static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 			struct qcm_process_device *qpd, int *allocate_vmid)
 {
@@ -777,7 +898,12 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,

 	mutex_lock(&dqm->lock);

-	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_CP);
+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+		select_sdma_engine_id(q);
+
+	mqd = dqm->get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+
 	if (mqd == NULL) {
 		mutex_unlock(&dqm->lock);
 		return -ENOMEM;
@@ -794,6 +920,9 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = execute_queues_cpsch(dqm, false);
 	}

+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+		dqm->sdma_queue_count++;
+
 out:
 	mutex_unlock(&dqm->lock);
 	return retval;
@@ -817,6 +946,14 @@ static int fence_wait_timeout(unsigned int *fence_addr,
 	return 0;
 }

+static int destroy_sdma_queues(struct device_queue_manager *dqm,
+				unsigned int sdma_engine)
+{
+	return pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_SDMA,
+			KFD_PREEMPT_TYPE_FILTER_ALL_QUEUES, 0, false,
+			sdma_engine);
+}
+
 static int destroy_queues_cpsch(struct device_queue_manager *dqm, bool lock)
 {
 	int retval;
@@ -829,6 +966,15 @@ static int destroy_queues_cpsch(struct device_queue_manager *dqm, bool lock)
 		mutex_lock(&dqm->lock);
 	if (dqm->active_runlist == false)
 		goto out;
+
+	pr_debug("kfd: Before destroying queues, sdma queue count is : %u\n",
+		dqm->sdma_queue_count);
+
+	if (dqm->sdma_queue_count > 0) {
+		destroy_sdma_queues(dqm, 0);
+		destroy_sdma_queues(dqm, 1);
+	}
+
 	retval = pm_send_unmap_queue(&dqm->packets, KFD_QUEUE_TYPE_COMPUTE,
 			KFD_PREEMPT_TYPE_FILTER_ALL_QUEUES, 0, false, 0);
 	if (retval != 0)
@@ -900,13 +1046,16 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,

 	/* remove queue from list to prevent rescheduling after preemption */
 	mutex_lock(&dqm->lock);
-
-	mqd = dqm->get_mqd_manager(dqm, KFD_MQD_TYPE_CIK_CP);
+	mqd = dqm->get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
 	if (!mqd) {
 		retval = -ENOMEM;
 		goto failed;
 	}

+	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
+		dqm->sdma_queue_count--;
+
 	list_del(&q->list);
 	dqm->queue_count--;

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index c3f189e8..554c06e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -36,6 +36,9 @@
 #define KFD_VMID_START_OFFSET			(8)
 #define VMID_PER_DEVICE				CIK_VMID_NUM
 #define KFD_DQM_FIRST_PIPE			(0)
+#define CIK_SDMA_QUEUES				(4)
+#define CIK_SDMA_QUEUES_PER_ENGINE		(2)
+#define CIK_SDMA_ENGINE_NUM			(2)

 struct device_process_node {
 	struct qcm_process_device *qpd;
@@ -130,8 +133,10 @@ struct device_queue_manager {
 	struct list_head	queues;
 	unsigned int		processes_count;
 	unsigned int		queue_count;
+	unsigned int		sdma_queue_count;
 	unsigned int		next_pipe_to_allocate;
 	unsigned int		*allocated_queues;
+	unsigned int		sdma_bitmap;
 	unsigned int		vmid_bitmap;
 	uint64_t		pipelines_addr;
 	struct kfd_mem_obj	*pipeline_mem;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index d12f9d3..948b1ca 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -128,7 +128,6 @@ static int create_cp_queue(struct process_queue_manager *pqm,
 	/* let DQM handle it*/
 	q_properties->vmid = 0;
 	q_properties->queue_id = qid;
-	q_properties->type = KFD_QUEUE_TYPE_COMPUTE;

 	retval = init_queue(q, *q_properties);
 	if (retval != 0)
@@ -189,6 +188,7 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 	}

 	switch (type) {
+	case KFD_QUEUE_TYPE_SDMA:
 	case KFD_QUEUE_TYPE_COMPUTE:
 		/* check if there is over subscription */
 		if ((sched_policy == KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION) &&

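
For readers following the engine assignment in the HWS path above, here is a minimal standalone sketch of the alternating selection. It is not the driver code: the sketch_* names are hypothetical, and keeping the counter in a per-device structure (rather than the function-static variable that select_sdma_engine_id() uses in the patch) is an assumption made only so the behaviour can be exercised in isolation.

```c
/* Mirrors CIK_SDMA_ENGINE_NUM from the patch; the name is hypothetical. */
#define SKETCH_SDMA_ENGINE_NUM 2

/* Hypothetical stand-in for struct queue; only the field used here. */
struct sketch_queue {
	unsigned int sdma_id;
};

/* Hypothetical stand-in for the per-device queue manager state. */
struct sketch_dqm {
	unsigned int next_sdma_id;
};

/*
 * Assign the next engine to the queue, then advance the counter so that
 * consecutive SDMA queues alternate between the engines.
 */
static void sketch_select_sdma_engine(struct sketch_dqm *dqm,
				      struct sketch_queue *q)
{
	q->sdma_id = dqm->next_sdma_id;
	dqm->next_sdma_id = (dqm->next_sdma_id + 1) % SKETCH_SDMA_ENGINE_NUM;
}
```

With a zero-initialised sketch_dqm, three consecutive calls assign engine ids 0, 1 and 0, which matches the alternation described in the cover letter.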
From: Ben Goz ben.goz@amd.com
This patch adds a check to the create queue ioctl path that identifies the SDMA queue type when it is passed by userspace.
Signed-off-by: Ben Goz ben.goz@amd.com
Reviewed-by: Oded Gabbay oded.gabbay@amd.com
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 4083dbc..fbaa98e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -191,6 +191,8 @@ static int set_queue_properties_from_user(struct queue_properties *q_properties,
 	if (args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE ||
 		args->queue_type == KFD_IOC_QUEUE_TYPE_COMPUTE_AQL)
 		q_properties->type = KFD_QUEUE_TYPE_COMPUTE;
+	else if (args->queue_type == KFD_IOC_QUEUE_TYPE_SDMA)
+		q_properties->type = KFD_QUEUE_TYPE_SDMA;
 	else
 		return -ENOTSUPP;

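
The mapping performed by the hunk above can be sketched in isolation as follows. This is an illustration only: the sketch_* enums and their numeric values are placeholders rather than the real uapi definitions, and SKETCH_ENOTSUPP is a local stand-in for the kernel-internal ENOTSUPP code, which userspace errno.h does not provide.

```c
/* Local stand-in for the kernel-internal ENOTSUPP value. */
#define SKETCH_ENOTSUPP 524

/* Placeholder ioctl-side queue types; values are NOT the real uapi ones. */
enum sketch_ioc_queue_type {
	SKETCH_IOC_QUEUE_TYPE_COMPUTE,
	SKETCH_IOC_QUEUE_TYPE_SDMA,
	SKETCH_IOC_QUEUE_TYPE_COMPUTE_AQL
};

/* Placeholder driver-internal queue types. */
enum sketch_queue_type {
	SKETCH_QUEUE_TYPE_COMPUTE,
	SKETCH_QUEUE_TYPE_SDMA
};

/*
 * Translate the type requested by userspace into the internal type.
 * Returns 0 on success and a negative error code for unknown types,
 * leaving *out untouched in the error case.
 */
static int sketch_map_queue_type(unsigned int ioc_type,
				 enum sketch_queue_type *out)
{
	switch (ioc_type) {
	case SKETCH_IOC_QUEUE_TYPE_COMPUTE:
	case SKETCH_IOC_QUEUE_TYPE_COMPUTE_AQL:
		*out = SKETCH_QUEUE_TYPE_COMPUTE;
		return 0;
	case SKETCH_IOC_QUEUE_TYPE_SDMA:
		*out = SKETCH_QUEUE_TYPE_SDMA;
		return 0;
	default:
		return -SKETCH_ENOTSUPP;
	}
}
```

Both compute variants collapse to one internal type, SDMA gets its own, and anything else is rejected before a queue is created.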
From: Ben Goz ben.goz@amd.com
This patch passes the correct queue type to pqm_create_queue() instead of a fixed KFD_QUEUE_TYPE_COMPUTE type.
Signed-off-by: Ben Goz ben.goz@amd.com
Reviewed-by: Oded Gabbay oded.gabbay@amd.com
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index fbaa98e..77008d5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -259,8 +259,8 @@ static long kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 			p->pasid, dev->id);

-	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties, 0,
-				KFD_QUEUE_TYPE_COMPUTE, &queue_id);
+	err = pqm_create_queue(&p->pqm, dev, filep, &q_properties,
+				0, q_properties.type, &queue_id);
 	if (err != 0)
 		goto err_create_queue;

dri-devel@lists.freedesktop.org