How many layers to fine-tune?
Qdrant (read: quadrant) is a vector similarity search engine. It provides a production-ready service with a convenient API to store, search, and manage points: vectors with an additional payload. Qdrant is tailored to extended filtering support, which makes it useful for all sorts of neural-network or semantic-based matching, faceted search, and other applications.
Qdrant is written in Rust 🦀, which makes it fast and reliable even under high load.
With Qdrant, embeddings or neural network encoders can be turned into full-fledged applications for matching, searching, recommending, and much more!
Model fine-tuning lets you improve the quality of pre-trained models with just a fraction of the resources spent on training the original model.
But there is a trade-off between the number of layers you tune and the precision you get.
Tuning fewer layers allows for faster training with a larger batch size, while tuning more layers increases the model's capacity.
The Qdrant team ran experiments so you can make more educated choices.

Here are some highlights:
Training only the head of a model (5% of weights) gives a 2x boost on metrics, while full training gives only 3x.
Training only the head layer lets you use larger models with bigger batch sizes, which compensates for the lost precision.
If you only have a small dataset, full model tuning has an even smaller effect.
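
To make the "number of layers you tune" knob concrete, here is a minimal sketch in plain Python. It assumes a hypothetical model described only by per-layer parameter counts; the layer names, sizes, and the helpers `freeze_all_but_last` and `trainable_fraction` are made up for illustration and are not taken from the article or from any Qdrant API. It shows how the share of trainable weights grows as you unfreeze more layers, starting from the head.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    num_params: int
    trainable: bool = True


def freeze_all_but_last(layers, n_tuned):
    """Return a copy of `layers` where only the last `n_tuned` are trainable."""
    if not 0 <= n_tuned <= len(layers):
        raise ValueError("n_tuned must be between 0 and the number of layers")
    cutoff = len(layers) - n_tuned
    return [
        Layer(layer.name, layer.num_params, trainable=(i >= cutoff))
        for i, layer in enumerate(layers)
    ]


def trainable_fraction(layers):
    """Share of all weights that would receive gradient updates."""
    total = sum(layer.num_params for layer in layers)
    tuned = sum(layer.num_params for layer in layers if layer.trainable)
    return tuned / total


# Hypothetical model: parameter counts are illustrative assumptions only.
model = [
    Layer("encoder.block1", 400_000),
    Layer("encoder.block2", 400_000),
    Layer("encoder.block3", 400_000),
    Layer("encoder.block4", 400_000),
    Layer("head", 100_000),
]

# Sweep from head-only tuning to full fine-tuning.
for n in range(1, len(model) + 1):
    tuned = freeze_all_but_last(model, n)
    share = trainable_fraction(tuned)
    print(f"tune last {n} layer(s): {share:.1%} of weights trainable")
```

In a real framework, "frozen" usually means that gradients are not computed for those weights (in PyTorch, for example, by setting `requires_grad = False` on the parameters), which is what saves memory and leaves room for a larger batch size.
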
Read more in the article https://qdrant.tech/articles/embedding-recycler/ by Yusuf Sarıgöz.

