With a typical transformer and a GPU the batch size that saturates the compute i...

mmoskal 8 months ago | parent | context | favorite | on: GDDR7 Memory Supercharges AI Inference

With a typical transformer and a GPU the batch size that saturates the compute is at least hundreds. Otherwise (including typical size of 1 for local inference) you're memory bound.