EleutherAI Harness Evaluation
EleutherAI Harness is an evaluation framework that measures how well a model performs across a range of standardized benchmarks. The steps below walk through the evaluation process in Transformer Lab.
1. Selecting a Model from the Foundation Tab
Start by navigating to the Foundation tab in Transformer Lab. Choose the model you want to evaluate from the list provided.

2. Downloading the Appropriate Plugin
To run evaluations, download the plugin that matches your system:
- For Mac Systems: Download the Eleuther AI LM Evaluation Harness MLX plugin. This version is optimized for Mac systems and provides better support.
- For Other Systems: Download the Eleuther AI LM Evaluation Harness plugin.
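The plugin choice above can be sketched as a small helper. This is only an illustration of the rule described in this step, not part of Transformer Lab: the function name `recommended_plugin` is hypothetical, and it assumes that "Mac systems" corresponds to the value "Darwin" returned by Python's `platform.system()`.

```python
import platform

# Plugin names as given in this walkthrough.
MLX_PLUGIN = "Eleuther AI LM Evaluation Harness MLX"
DEFAULT_PLUGIN = "Eleuther AI LM Evaluation Harness"


def recommended_plugin(system_name: str) -> str:
    """Return the plugin name to download for the given OS name.

    `system_name` is expected to look like a value returned by
    platform.system(), for example "Darwin" (macOS), "Linux", or "Windows".
    This helper is hypothetical and only mirrors the rule in the text.
    """
    # macOS reports itself as "Darwin"; every other system gets the default plugin.
    if system_name == "Darwin":
        return MLX_PLUGIN
    return DEFAULT_PLUGIN


if __name__ == "__main__":
    # Print the recommendation for the machine running this script.
    print(recommended_plugin(platform.system()))
```

The plugin itself is still downloaded through the Transformer Lab interface; the helper only makes the decision rule explicit.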

3. Configuring the Evaluation Task
Configure your evaluation task by following these steps:
- Name Your Evaluation Task: Enter a descriptive name for easy identification.
- Select Evaluation Tasks: Choose the Harness tasks (benchmarks) you want to run the model on.
- Define the Evaluation Scope: Select the fraction of samples to evaluate. The recommended fraction is 1 (using the full benchmark) for a thorough assessment. For testing or debugging, you can choose a lower fraction.
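To get a feel for what a given fraction means in practice, the sketch below converts a fraction into an approximate number of samples. The function `samples_to_evaluate` is hypothetical, and the rounding rule (round to nearest, with at least one sample) is an assumption made for illustration; the plugin may select samples differently.

```python
def samples_to_evaluate(total_samples: int, fraction: float) -> int:
    """Estimate how many benchmark samples a given fraction covers.

    Hypothetical helper: the rounding behaviour is an assumption,
    not the plugin's documented behaviour.
    """
    if total_samples < 0:
        raise ValueError("total_samples must be non-negative")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the range (0, 1]")
    if total_samples == 0:
        return 0
    # Round to the nearest whole sample, but never drop below one sample
    # and never exceed the size of the benchmark.
    return min(total_samples, max(1, round(total_samples * fraction)))


if __name__ == "__main__":
    benchmark_size = 1000  # illustrative size, not a real benchmark
    for fraction in (1, 0.1, 0.01):
        print(fraction, samples_to_evaluate(benchmark_size, fraction))
```

A fraction of 1 covers every sample, which matches the recommendation above; a small fraction such as 0.01 is usually enough to confirm that the job runs end to end before committing to a full evaluation.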

4. Running the Evaluation
Once you have set up the task, click the Queue button to start the evaluation.

5. Viewing the Results
After the evaluation completes, you can review the results:
- Job Output: Check the job's output logs for immediate results and execution details.
- Detailed Report: Access the detailed report generated by Harness for an in-depth analysis of the evaluation outcomes.
