
EleutherAI Harness Evaluation

The EleutherAI Harness (EleutherAI's LM Evaluation Harness) is a powerful evaluation framework that lets you measure how well a model performs across a range of standardized benchmarks. Follow the steps below for a guided walkthrough of the evaluation process.

1. Selecting a Model from the Foundation Tab

Start by navigating to the Foundation tab in Transformer Lab. Choose the model you want to evaluate from the list provided.


2. Downloading the Appropriate Plugin

To use the evaluation features, you need to download the appropriate plugin:

  • For Mac Systems: Download the Eleuther AI LM Evaluation Harness MLX plugin. This version is built on Apple's MLX framework and is optimized for Macs.
  • For Other Systems: Download the Eleuther AI LM Evaluation Harness plugin. (If you are unsure which applies to your machine, see the quick check below.)
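
If you are unsure which category your machine falls into, the short check below can help. It is a hypothetical helper, not part of Transformer Lab or either plugin; it only reports whether you are on an Apple Silicon Mac, where the MLX build applies.

```python
# Hypothetical helper: suggest which Harness plugin fits this machine.
# Not part of Transformer Lab; it only inspects the local platform.
import platform

def recommended_plugin() -> str:
    # MLX runs on Apple Silicon, so the MLX build applies to arm64 Macs;
    # every other system uses the standard Harness plugin.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "Eleuther AI LM Evaluation Harness MLX"
    return "Eleuther AI LM Evaluation Harness"

if __name__ == "__main__":
    print(recommended_plugin())
```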

3. Configuring the Evaluation Task

Configure your evaluation task by following these steps:

  • Name Your Evaluation Task: Enter a descriptive name for easy identification.
  • Select Evaluation Tasks: Choose the Harness tasks you want to run the model against.
  • Define the Evaluation Scope: Select the fraction of samples to evaluate. A fraction of 1 (the full benchmark) is recommended for a thorough assessment; for testing or debugging, choose a lower fraction. (A sketch of an equivalent direct invocation follows this list.)
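
For context, the plugins drive EleutherAI's open-source lm-evaluation-harness library. The sketch below is a rough illustration of an equivalent direct invocation, not what the plugin literally executes; the model name, task list, and sample fraction are placeholder values.

```python
# Rough sketch of an equivalent direct lm-evaluation-harness call.
# The model, tasks, and limit below are placeholders, not Transformer Lab defaults.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                      # Hugging Face model backend
    model_args="pretrained=EleutherAI/pythia-160m",  # placeholder for the model picked in the Foundation tab
    tasks=["hellaswag", "arc_easy"],                 # placeholder task suite
    limit=0.1,                                       # evaluate 10% of samples; omit to run the full benchmark
)

# Per-task metrics live under the "results" key.
print(results["results"])
```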

4. Running the Evaluation

Once the task is set up, click the Queue button to start the evaluation.


5. Viewing the Results

After the evaluation completes, you can review the results:

  • Job Output: Check the job's output log for a quick view of the scores and a record of the run.
  • Detailed Report: Open the detailed report generated by the Harness for a per-task breakdown of the results. (A sketch for inspecting an exported report follows below.)
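
If you export the report for further analysis, the snippet below shows one way to summarise it. It assumes the report is the standard lm-evaluation-harness results JSON; the filename is a placeholder for wherever you saved the file.

```python
# Hypothetical sketch: summarising an exported Harness results file.
# Assumes the standard lm-evaluation-harness JSON format; the path is a placeholder.
import json

with open("results.json") as f:
    report = json.load(f)

# Scores are keyed by task name, with one entry per metric.
for task, metrics in sorted(report.get("results", {}).items()):
    for metric, value in metrics.items():
        print(f"{task:25s} {metric:25s} {value}")
```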