On Tue, Sep 14 2021, Jason Gunthorpe jgg@nvidia.com wrote:
On Mon, Sep 13, 2021 at 04:31:54PM -0400, Eric Farman wrote:
I rebased it and fixed it up here:
https://github.com/jgunthorpe/linux/tree/vfio_ccw
Can you try again?
That does address the crash, but then why is it processing a BROKEN event? Seems problematic.
The stuff related to the NOT_OPER looked really wonky to me. I'm guessing this is the issue - not sure about the pmcw.ena either..
[I have still not been able to digest the whole series, sorry.]
diff --git a/drivers/s390/cio/vfio_ccw_fsm.c b/drivers/s390/cio/vfio_ccw_fsm.c index 5ea392959c0711..0d4d4f425befac 100644 --- a/drivers/s390/cio/vfio_ccw_fsm.c +++ b/drivers/s390/cio/vfio_ccw_fsm.c @@ -380,29 +380,19 @@ static void fsm_open(struct vfio_ccw_private *private, spin_unlock_irq(sch->lock); }
-static void fsm_close(struct vfio_ccw_private *private,
enum vfio_ccw_event event)
+static int flush_sch(struct vfio_ccw_private *private) { struct subchannel *sch = private->sch; DECLARE_COMPLETION_ONSTACK(completion); int iretry, ret = 0;
- spin_lock_irq(sch->lock);
- if (!sch->schib.pmcw.ena)
goto err_unlock;
- ret = cio_disable_subchannel(sch);
- if (ret != -EBUSY)
goto err_unlock;
- iretry = 255; do {
- ret = cio_cancel_halt_clear(sch, &iretry);
- if (ret == -EIO) { pr_err("vfio_ccw: could not quiesce subchannel 0.%x.%04x!\n", sch->schid.ssid, sch->schid.sch_no);
break;
return ret;
Looking at this, I wonder why we had special-cased -EIO -- for -ENODEV we should be done as well, as then the device is dead and we do not need to disable it.
} /*
@@ -413,13 +403,28 @@ static void fsm_close(struct vfio_ccw_private *private, spin_unlock_irq(sch->lock);
if (ret == -EBUSY)
wait_for_completion_timeout(&completion, 3*HZ);
wait_for_completion_timeout(&completion, 3 * HZ);
private->completion = NULL; flush_workqueue(vfio_ccw_work_q); spin_lock_irq(sch->lock); ret = cio_disable_subchannel(sch); } while (ret == -EBUSY);
return ret;
+}
+static void fsm_close(struct vfio_ccw_private *private,
enum vfio_ccw_event event)
+{
- struct subchannel *sch = private->sch;
- int ret;
- spin_lock_irq(sch->lock);
- if (!sch->schib.pmcw.ena)
goto err_unlock;
- ret = cio_disable_subchannel(sch);
cio_disable_subchannel() should be happy to disable an already disabled subchannel, so I guess we can just walk through this and end up in CLOSED state... unless entering with !ena actually indicates that we messed up somewhere else in the state machine. I still need to find time to read the patches.
- if (ret == -EBUSY)
if (ret) goto err_unlock; private->state = VFIO_CCW_STATE_CLOSED;ret = flush_sch(private);