How to Test LLM Models

How to choose the best LLM using R and vitals

Use the vitals package with ellmer to evaluate and compare the accuracy of LLMs, including writing evals to test local models ...

Guardrailing LLMs: The Practical Path To Safe AI Products

In practice, the choice between small modular models and guardrail LLMs quickly becomes an operating model decision.

Communications of the ACM

LLM Evaluation is Key to Accurate, Reliable, Effective GenAI

Enter large language model (LLM) evaluation. The purpose of LLM evaluation is to analyze and refine GenAI outputs to improve their accuracy and reliability while avoiding bias. The evaluation process ...

UC San Diego Today

A New Method to Steer AI Output Uncovers Vulnerabilities and Potential Improvements

A team of researchers has found a way to steer the output of large language models by manipulating specific concepts inside ...

ZDNet

IBM to test Southeast Asian LLM and facilitate localization efforts

IBM has inked an agreement with AI Singapore (AISG) to test the latter's Southeast Asian large language model (LLM) and make it available for developers to build customized artificial intelligence (AI ...

The Conversation

Putting DeepSeek to the test: how its performance compares against other AI tools

Cardiff Metropolitan University provides funding as a member of The Conversation UK. China’s new DeepSeek large language model (LLM) has disrupted the US-dominated market, offering a relatively ...
