
wesleyyue on March 4, 2024 | on: Claude 3 model family

More early impressions on performance: besides the endpoint erroring out at a higher rate than OpenAI's, time-to-first-token (TTFT) is also much slower :(

p50: 2.14s p95: 3.02s

And these aren't super long prompts either. For comparison, GPT-4 TTFT:

p50: 0.63s p95: 1.47s
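
For anyone wanting to collect comparable numbers, here is a minimal sketch of one way to measure TTFT and summarize it as p50/p95. It is not the method used for the figures above. `start_stream` is a hypothetical callable that sends the request and returns an iterator of streamed chunks; the nearest-rank percentile is just one of several common definitions, so small sample sets may give slightly different values than other tools.

```python
import math
import time
from typing import Callable, Iterable, Sequence


def measure_ttft(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Seconds from sending the request until the first chunk arrives.

    `start_stream` is a hypothetical placeholder: it should issue the
    request and return an iterator over streamed chunks.
    """
    start = clock()  # start timing before the request goes out
    stream = iter(start_stream())
    try:
        next(stream)  # block until the first chunk is received
    except StopIteration:
        raise ValueError("stream ended before any token arrived") from None
    return clock() - start


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples: Sequence[float]) -> str:
    """Format samples the same way as the numbers quoted in the comment."""
    return f"p50: {percentile(samples, 50):.2f}s p95: {percentile(samples, 95):.2f}s"
```

Calling `measure_ttft` once per request and passing the collected list to `summarize` gives a line in the same shape as the ones above. Using the same prompts, region, and time of day for both providers matters more than the percentile definition.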