Can we modify convert_texts_to_vector
in https://github.com/harmonydata/harmony/blob/main/src/harmony/matching/default_matcher.py
to allow items to be batched when sent to the LLM?
The batch size should be variable.
Rationale
If a user wants to harmonise 10,000 items, they will not fit in memory in a single pass, even on a high-performance machine. Small laptops can probably only handle batches of about 20 items at a time. The batch size should be configurable, perhaps as a parameter, since batching will slow things down.
People have reported that the website cannot cope with large harmonisations, e.g. the comment below from Discord (23 Oct 2024).
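
To make the request concrete, here is a minimal sketch of what a batched version might look like. It is not the current Harmony implementation: `convert_texts_to_vector_batched`, `iter_batches` and the `encode` callable are hypothetical names used for illustration, and the default of 20 is only the laptop estimate from the rationale above. The sketch returns plain Python lists to stay self-contained; a real change would presumably stack NumPy arrays instead.

```python
from typing import Callable, Iterator, List, Sequence

Vector = List[float]


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def convert_texts_to_vector_batched(
    texts: Sequence[str],
    encode: Callable[[List[str]], List[Vector]],  # hypothetical stand-in for the model call
    batch_size: int = 20,  # assumption: the "small laptop" figure from the issue
) -> List[Vector]:
    """Encode `texts` one batch at a time and return the vectors in input order."""
    vectors: List[Vector] = []
    for batch in iter_batches(texts, batch_size):
        # Only one batch is passed to the model at a time, which bounds
        # the memory needed per call.
        vectors.extend(encode(batch))
    return vectors
```

A caller on a larger machine could pass a bigger `batch_size` to reduce the number of model calls, while a small laptop could keep the default. Note that the full list of output vectors is still held in memory; only the per-call input to the model is bounded.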