Just migrated all of our embeddings to this same model a few weeks ago at my company, and it's a game changer. Having 32k context is a 64x increase compared with our previously used model. Plus, being natively multilingual and producing standard 1024-dimensional vectors made it a seamless transition, even with millions of embeddings across thousands of databases.
I do recommend using https://github.com/huggingface/text-embeddings-inference for fast inference.
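With TEI it's basically a container plus an HTTP call. A rough sketch of the client side (the URL and port here just assume the default local setup; swap in whatever host you're actually serving on):

    # Query a text-embeddings-inference server via its /embed endpoint.
    # Assumes a TEI instance is already running and reachable locally.
    import requests

    TEI_URL = "http://127.0.0.1:8080/embed"

    def embed(texts):
        # TEI accepts a string or a list of strings under "inputs"
        resp = requests.post(TEI_URL, json={"inputs": texts})
        resp.raise_for_status()
        return resp.json()  # one embedding vector per input

    vectors = embed(["forum topic text goes here", "another topic"])
    print(len(vectors), len(vectors[0]))  # e.g. 2 x 1024

Nothing fancy, but it batches well and the server handles tokenization and truncation for you.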
What does it mean to generate a 1000-long float16 array from 32k tokens of context? Surely the embedding you get is no longer representative of the text.
Depends on your needs. You surely don't want 32k long chunks for doing the standard RAG pipeline, that's for sure.
My use case is basically a recommendation engine, where I retrieve a list of similar forum topics based on the one currently being read. Since it's dynamic user-generated content, a topic can vary from 10 to 100k tokens. Ideally I would generate embeddings from an LLM-generated summary, but that would increase inference costs considerably at the scale I'm applying it.
Having a larger possible context out of the box meant that a simple swap of embedding models greatly increased the quality of recommendations.
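For anyone curious, the retrieval side is nothing exotic: embed each topic once, then rank by cosine similarity against the embedding of the topic being read. A simplified in-memory sketch (in practice this would be a query against a vector index/database rather than a brute-force scan; names here are just illustrative):

    import numpy as np

    def cosine_scores(query_vec, topic_matrix):
        # Cosine similarity between one query vector and a matrix of topic vectors
        q = query_vec / np.linalg.norm(query_vec)
        m = topic_matrix / np.linalg.norm(topic_matrix, axis=1, keepdims=True)
        return m @ q

    def recommend(current_vec, topic_matrix, top_k=10):
        # topic_matrix: (N, 1024) precomputed topic embeddings
        # current_vec: (1024,) embedding of the topic currently being read
        scores = cosine_scores(current_vec, topic_matrix)
        return np.argsort(-scores)[:top_k]  # indices of the most similar topics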