Description of the feature
Problem
In MTEB v1, corpus_chunk_size could be tuned via the public evaluation API (e.g. evaluation.run(..., corpus_chunk_size=500)), which was useful to reduce memory usage on large retrieval corpora.
In the current v2 codebase, the SearchEncoderWrapper class accepts corpus_chunk_size as a constructor parameter (default 50,000), but this parameter is never passed through the public evaluation API. As a result, there is no way to change the chunk size when running via the new evaluation API, mteb.evaluate(...).
The same limitation applies to bitext mining tasks (BitextMiningEvaluator).
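To illustrate why the chunk size matters for memory, here is a minimal sketch of chunked exact search in plain Python. It is not MTEB's implementation: the function chunked_top_k and its arguments are hypothetical, and dot-product similarity is assumed. The corpus is scored one chunk at a time, so only a len(queries) x corpus_chunk_size score block is held at once, and lowering the chunk size trades speed for a smaller peak memory footprint.

```python
import heapq
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product used as the similarity score (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def chunked_top_k(
    queries: List[Vector],
    corpus: List[Vector],
    k: int,
    corpus_chunk_size: int,
) -> List[List[Tuple[float, int]]]:
    """Hypothetical helper: top-k (score, corpus_index) pairs per query.

    The score matrix kept in memory is at most
    len(queries) x corpus_chunk_size, not len(queries) x len(corpus).
    """
    if corpus_chunk_size <= 0:
        raise ValueError("corpus_chunk_size must be positive")
    # One min-heap per query keeps only the k best hits seen so far.
    heaps: List[List[Tuple[float, int]]] = [[] for _ in queries]
    for start in range(0, len(corpus), corpus_chunk_size):
        chunk = corpus[start:start + corpus_chunk_size]
        # Scores for this chunk only: len(queries) x len(chunk).
        scores = [[dot(q, d) for d in chunk] for q in queries]
        for qi, row in enumerate(scores):
            heap = heaps[qi]
            for offset, score in enumerate(row):
                item = (score, start + offset)
                if len(heap) < k:
                    heapq.heappush(heap, item)
                elif item > heap[0]:
                    # Evict the current worst hit in favour of a better one.
                    heapq.heapreplace(heap, item)
    # Sort each query's hits from best to worst.
    return [sorted(h, reverse=True) for h in heaps]


queries = [[1.0, 0.0], [0.0, 1.0]]
corpus = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [1.0, 0.0]]
hits = chunked_top_k(queries, corpus, k=2, corpus_chunk_size=2)
print([[idx for _, idx in row] for row in hits])
```

In this sketch the returned hits are identical for any chunk size; only peak memory and runtime change.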
Feature request
Could corpus_chunk_size be made tunable through the public evaluation API again?