MERA Industrial


MERA Industrial: A Unified Framework for Evaluating Industrial Tasks.

🚀 About

MERA Industrial brings together a domain-specific collection of evaluation tasks under one roof. Built on top of the Language Model Evaluation Harness (v0.4.9), it enables researchers and practitioners to:

  • Compare models on identical tasks and metrics
  • Reproduce results with fixed prompts and few-shot settings
  • Submit standardized ZIP archives for leaderboard integration

🔍 Datasets Overview

| Set | Task Name | Metrics | Size | Prompts | Skills |
|---|---|---|---|---|---|
| Private | ruTXTMedQFundamental | ExactMatch, F1 | 4590 | 10 | Anatomy, Biochemistry, Bioorganic Chemistry, Biophysics, Clinical Laboratory Diagnostics, Faculty Surgery, General Chemistry, General Surgery, Histology, Hygiene, Microbiology, Normal Physiology, Parasitology, Pathological Anatomy, Pathological Physiology, Pharmacology, Propaedeutics in Internal Medicine |
| Private | ruTXTAgroBench | ExactMatch, F1 | 2642 | 10 | Botany, Forage Production and Grassland Management, Land Reclamation, General Genetics, General Agriculture, Fundamentals of Plant Breeding, Plant Production, Seed Production and Seed Science, Agricultural Systems in Various Agricultural Landscapes, Crop Cultivation Technologies |
| Private | ruTXTAquaBench | ExactMatch, F1 | 992 | 10 | Industrial aquaculture; Ichthyopathology: veterinary medicine, prevention and optimization of fish farming technologies; Feeding fish and other aquatic organisms; Mariculture; Breeding crayfish and shrimp; Artificial pearl cultivation |
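
Once the tasks from this repository are set up, you can check which task identifiers the harness sees. A sketch assuming the tasks are registered with lm-eval (the exact MERA Industrial task names are defined in this repository):

```bash
# Print every task name currently registered with the harness;
# the MERA Industrial task identifiers should appear in the output.
lm_eval --tasks list
```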

🛠 Getting Started

Clone the repository with submodule

First, clone the MERA_Industrial repository and load the submodule:
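
A minimal sketch of a typical invocation (the repository URL is an assumption):

```bash
# Clone the repository together with its submodule
# (URL is an assumption; substitute the actual repository URL).
git clone --recurse-submodules https://github.com/MERA-Evaluation/MERA_Industrial.git
cd MERA_Industrial

# If the repository was cloned without submodules, fetch them afterwards:
git submodule update --init --recursive
```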

Installing dependencies

Remote Scoring: quick setup for cloud-based scoring — install only core dependencies, run the evaluation, and submit the resulting ZIP archive to our website to get the score.

Install the lm-eval library and the optional packages needed for evaluations:
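
A minimal sketch, assuming a standard pip setup pinned to the harness version mentioned above (the extra's name follows the upstream harness and is an assumption here):

```bash
# Pin the harness to the version this framework builds on.
pip install lm_eval==0.4.9

# Optional extras for specific backends; install only what your setup needs.
# (Extra name is an assumption; see the harness docs for the full list.)
pip install "lm_eval[api]"
```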

Running evaluations

We have prepared a script that launches evaluations via the `lm-eval` library and packs the evaluation logs into a ZIP archive:
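
A sketch of the invocation; the script's actual arguments may differ:

```bash
# Run the evaluation and pack the logs into a submission archive.
# <model_name_or_path> is a placeholder; check the script's help
# output for the real interface.
bash run_evaluation.sh <model_name_or_path>
```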

More details on `run_evaluation.sh` usage are available via:
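
Assuming the script exposes a standard help flag:

```bash
# Print usage information for the evaluation script.
bash run_evaluation.sh --help
```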

How it works inside...
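
Conceptually, the script wraps a harness call along the lines of this sketch (model, task selector, and output path are illustrative assumptions, not the script's literal contents):

```bash
# Illustrative sketch of the kind of call run_evaluation.sh wraps;
# the real task names and arguments live inside the script.
lm_eval --model hf \
  --model_args pretrained=Qwen/Qwen2.5-0.5B-Instruct \
  --tasks <mera_industrial_tasks> \
  --output_path logs/ \
  --log_samples
# The logs/ folder is then zipped into <model>_submission.zip for upload.
```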

📁 Repository Structure

💪 How to Join the Leaderboard

Follow these steps to see your model on the Leaderboard:

  1. Run Remote Scoring   Evaluate the benchmark in the Remote Scoring regime (see 🛠 Getting Started above). Note that we do not provide gold answers for private tasks, so local scoring is not available.

You’ll end up with a logs folder and a ready-to-submit ZIP archive like `Qwen2.5-0.5B-Instruct_submission.zip`.

  2. Submit on the website   Head over to Create Submission, upload the archive, and move on to the form.

  3. Fill in Model Details   Provide accurate information about the model and evaluation. These details are crucial for reproducibility; if something is missing, administrators may ping you (or your submission might be rejected).

  4. Wait for Scoring ⏳   Scoring usually wraps up in ~10-15 minutes.

  5. Publish your result   Once scoring finishes, click "Submit for moderation". After approval, your model goes Public and appears on the Leaderboard.

Good luck, and happy benchmarking! 🎉    

📝 License

Distributed under the MIT License. See LICENSE for details.