Make `corpus_chunk_size` tunable for retrieval tasks

### Description of the feature

### Problem

In MTEB v1, `corpus_chunk_size` could be tuned via the public evaluation API (e.g. `evaluation.run(..., corpus_chunk_size=500)`), which was useful to reduce memory usage on large retrieval corpora.
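
For context, here is a minimal sketch of why a smaller chunk size lowers peak memory. This is not MTEB's actual implementation; `iter_chunks`, `chunked_top_k`, and `toy_encode` are hypothetical names used only for illustration. The corpus is encoded one chunk at a time, and only a running top-k is kept between chunks, so at most `corpus_chunk_size` document embeddings are held at once.

```python
import heapq
from typing import Callable, Iterator, Sequence


def iter_chunks(
    corpus: Sequence[str], corpus_chunk_size: int
) -> Iterator[tuple[int, Sequence[str]]]:
    """Yield (start_index, chunk) pairs covering the corpus in order."""
    for start in range(0, len(corpus), corpus_chunk_size):
        yield start, corpus[start:start + corpus_chunk_size]


def chunked_top_k(
    query_vec: Sequence[float],
    corpus: Sequence[str],
    encode: Callable[[Sequence[str]], list[list[float]]],
    top_k: int,
    corpus_chunk_size: int = 50_000,
) -> list[tuple[float, int]]:
    """Return the top_k (score, doc_index) pairs, best first."""
    best: list[tuple[float, int]] = []  # min-heap of the current top_k
    for start, chunk in iter_chunks(corpus, corpus_chunk_size):
        # Only this chunk's embeddings exist in memory at this point.
        chunk_vecs = encode(chunk)
        for offset, vec in enumerate(chunk_vecs):
            score = sum(q * d for q, d in zip(query_vec, vec))  # dot product
            item = (score, start + offset)
            if len(best) < top_k:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heapreplace(best, item)
    return sorted(best, reverse=True)


def toy_encode(texts: Sequence[str]) -> list[list[float]]:
    """Hypothetical stand-in for a real embedding model."""
    return [[float(len(t)), float(t.count("a"))] for t in texts]
```

The ranking does not depend on the chunk size, only the peak number of embeddings held at once does, which is why exposing the parameter is safe from a results point of view.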

In the current v2 codebase, the `SearchEncoderWrapper` class accepts `corpus_chunk_size` as a constructor parameter with a default of **50,000**, but this parameter is never passed through the public evaluation API. As a result, there is no way to change the chunk size when running via the new evaluation API, `mteb.evaluate(...)`.

The same limitation applies to bitext mining tasks (`BitextMiningEvaluator`).

### Feature request

Could `corpus_chunk_size` be made tunable through the public evaluation API again?



Make `corpus_chunk_size` tunable for retrieval tasks #4450
