Abstract: This paper introduces TURSpider, a novel Turkish Text-to-SQL dataset developed through human translation of the widely used Spider dataset, aimed at addressing the current lack of complex, ...
Abstract: In this article, we present BenchING, a new benchmark for evaluating large language models (LLMs) on their ability to follow structured output format instructions in text-based procedural ...
Weights & Biases is a helpful tool to analyze experiments, while Optuna is an effective tool for hyperparameter tuning. To use either of these tools, make sure to check out the notebooks in the ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results