(ICML 2024) TrustLLM: Trustworthiness in Large Language Models
Topics: llm, ai, nlp, large-language-models, natural-language-processing, benchmark, evaluation, toolkit, pypi-package, dataset, trustworthy-ai, trustworthy-machine-learning · Python
Updated 5 months ago
Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
Topics: llm, rag, llmops, prompt-engineering, testing, prompts, evaluation-framework, evaluation, llm-eval, cicd, ci-cd, ci, llm-evaluation, llm-evaluation-framework, prompt-testing · TypeScript
Updated 6 months ago
The production toolkit for LLMs. Observability, prompt management, and evaluations.
TypeScript
Updated 7 months ago
Comparing quality and performance of NLP systems for the Russian language
Python
Updated 4 months ago