You can compute the similarity much faster by using BLAS. Good BLAS libraries have SIMD-optimized implementations. Or if you do multiple tokens as once, you can do a matrix-vector multiplication (sgemv), which will be even faster in many implementations. Alternatively, there is probably also a SIMD implementation in Go using assembly (it has been 7 years since I looked at anything in the Go ecosystem).
You could also normalize the vectors while loading. Then during runtime the cosine similarity is just the dot product of the vectors (whether it pays off depends on the size of your embedding matrix and the size of the haystack that you are going to search).
SIMD situation in Go is still rather abysmal, it’s likely easier to just FFI (though FFI is very slow so I guess you're stuck with ugly go asm if you are using short vectors). As usual, there's a particular high-level language that does it very well, and has standard vector similarity function nowadays...
Mojo: https://docs.modular.com/mojo/stdlib/builtin/simd which follows the above two, it too targets LLVM so expect good SIMD codegen as long the lowering strategy does it in an LLVM-friendly way, which I have not looked at yet.
Julia is less "general-purpose" and I know little about the quality of its codegen (it does target LLVM but I haven't seen numbers that place it exactly next to C or C++ which is the case with C#).
Mojo team's blog posts do indicate they care about optimal compiler output, and it seems to have ambitions for a wider domain of application which is why it is mentioned.
https://github.com/arunsupe/semantic-grep/blob/b7dcc82a7cbab...
You can read the vector all at once. See e.g.:
https://github.com/danieldk/go2vec/blob/ee0e8720a8f518315f35...
---
https://github.com/arunsupe/semantic-grep/blob/b7dcc82a7cbab...
You can compute the similarity much faster by using BLAS. Good BLAS libraries have SIMD-optimized implementations. Or if you do multiple tokens as once, you can do a matrix-vector multiplication (sgemv), which will be even faster in many implementations. Alternatively, there is probably also a SIMD implementation in Go using assembly (it has been 7 years since I looked at anything in the Go ecosystem).
You could also normalize the vectors while loading. Then during runtime the cosine similarity is just the dot product of the vectors (whether it pays off depends on the size of your embedding matrix and the size of the haystack that you are going to search).