I went through the steps and ran this on a similar r6a.16xlarge, but the model only seems to start loading after the first prompt. From there it spends over half an hour trying to load and still produces no answer. I also couldn't reproduce the context size from the post: with 512 GB of RAM I can't go above a 4k context without the model refusing to load outright. I'm new to model setups, so I might have missed something.