"Pointers on SI are 64-bit." Talk about a "duh" moment. I'll see what I can do to fix and test pointer generation in the vload/vstore implementation. It's broken on SI, possibly on 64-bit PTX as well, and would also break on x86_64 (and others) if we ever get around to supporting CPU targets in libclc.