Test your prompts, models, and RAGs. Catch regressions and improve prompt quality. LLM evals for OpenAI, Azure, Anthropic, Gemini, Mistral, Llama, Bedrock, Ollama, and other local & private models with CI/CD integration.
Topics: llm, rag, llmops, prompt-engineering, testing, prompts, evaluation-framework, evaluation, llm-eval, cicd, ci-cd, ci, llm-evaluation, llm-evaluation-framework, prompt-testing. Language: TypeScript
Updated 6 months ago
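A prompt-testing tool of this kind runs each prompt against a set of assertions and fails the build when outputs regress, which is what makes the CI/CD integration useful. The TypeScript sketch below illustrates that idea only; `callModel`, `Assertion`, and `PromptTest` are hypothetical stand-ins for a real provider client and this repository's actual config schema.

```ts
// Hypothetical sketch of prompt regression testing for CI. `callModel` and
// the assertion shapes are illustrative, not this repository's real API.

type Assertion =
  | { type: "contains"; value: string }
  | { type: "matches"; pattern: RegExp };

interface PromptTest {
  prompt: string;
  assertions: Assertion[];
}

// Stand-in for a real provider call (OpenAI, Anthropic, Ollama, ...).
async function callModel(prompt: string): Promise<string> {
  return prompt.includes("French") ? "bonjour" : "(no answer)";
}

function check(output: string, a: Assertion): boolean {
  switch (a.type) {
    case "contains":
      return output.includes(a.value);
    case "matches":
      return a.pattern.test(output);
  }
}

// Run every test; a non-zero exit code makes the CI pipeline fail.
async function run(tests: PromptTest[]): Promise<void> {
  let failures = 0;
  for (const t of tests) {
    const output = await callModel(t.prompt);
    for (const a of t.assertions) {
      if (!check(output, a)) {
        failures++;
        console.error(`FAIL: "${t.prompt}" violated assertion "${a.type}"`);
      }
    }
  }
  console.log(failures === 0 ? "All prompt tests passed" : `${failures} failure(s)`);
  process.exitCode = failures === 0 ? 0 : 1;
}

void run([
  {
    prompt: "Translate 'hello' to French",
    assertions: [{ type: "contains", value: "bonjour" }],
  },
]);
```

Catching regressions then amounts to running this suite on every commit and comparing pass/fail status against the previous run.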
AI Observability & Evaluation - Evaluate, troubleshoot, and fine-tune your LLM, CV, and NLP models in a notebook.
Topics: llmops, mlops, model-monitoring, ai-monitoring, ai-observability, ai-roi, clustering, llm-eval, ml-monitoring, ml-observability, model-observability, umap. Language: Jupyter Notebook
Updated 7 months ago
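The core of observability tooling like this is capturing each model call as a structured record (input, output, latency, errors) so runs can be inspected and compared later. A minimal TypeScript sketch of that pattern follows; `TraceLogger` and the span fields are hypothetical illustrations, and the real project is notebook/Python-based with a different API.

```ts
// Hypothetical sketch of LLM call tracing for observability. `TraceLogger`
// and the span fields are illustrative, not this project's actual API.

interface LlmSpan {
  name: string;
  input: string;
  output: string;
  startedAt: number; // epoch ms
  latencyMs: number;
  error?: string;
}

class TraceLogger {
  private spans: LlmSpan[] = [];

  // Wrap any async model call, recording timing and errors as a span.
  async traced(
    name: string,
    input: string,
    call: (input: string) => Promise<string>,
  ): Promise<string> {
    const startedAt = Date.now();
    try {
      const output = await call(input);
      this.spans.push({ name, input, output, startedAt, latencyMs: Date.now() - startedAt });
      return output;
    } catch (err) {
      this.spans.push({
        name, input, output: "",
        startedAt, latencyMs: Date.now() - startedAt,
        error: String(err),
      });
      throw err;
    }
  }

  // Dump collected spans, e.g. to spot slow or failing calls.
  report(): void {
    for (const s of this.spans) {
      console.log(`${s.name}: ${s.latencyMs}ms${s.error ? ` ERROR ${s.error}` : ""}`);
    }
  }
}

// Usage with a stubbed model call.
async function main(): Promise<void> {
  const logger = new TraceLogger();
  await logger.traced("summarize", "long document ...", async (i) => `summary of: ${i.slice(0, 16)}`);
  logger.report();
}
void main();
```

In a real deployment these spans would be exported to a backing store and visualized (clustering, UMAP projections, and similar views mentioned in the topics above) rather than printed to the console.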