Matt Zheng

IFEval

IFEval is a benchmark designed specifically to measure how well a model follows instructions (instruction following).

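Every test item is judged from the model's output alone: objective, automatically verifiable features of the response determine whether the model followed the instruction the user gave.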
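To make "verifiable" concrete, here is a minimal sketch of one such check in Python, for a length constraint like "Answer with at least / at most N words". The function names and the `relation` parameter are hypothetical, and counting words with a simple regex is an assumption; the official IFEval implementation may tokenize differently. The "around" variant is left out because it needs a tolerance this sketch does not define.

```python
import re


def count_words(text: str) -> int:
    # Assumption: a "word" is any run of word characters (letters, digits, underscore).
    return len(re.findall(r"\w+", text))


def check_number_words(response: str, n: int, relation: str) -> bool:
    """Deterministically verify a hypothetical 'Number Words' constraint.

    relation is either "at least" or "at most".
    """
    words = count_words(response)
    if relation == "at least":
        return words >= n
    if relation == "at most":
        return words <= n
    raise ValueError(f"unsupported relation: {relation}")


response = "IFEval checks outputs with simple rules."
print(check_number_words(response, 5, "at least"))  # True: the response has 6 words
print(check_number_words(response, 5, "at most"))   # False
```

Because the verdict depends only on the response text, the same output always receives the same judgment, which is what makes the evaluation objective.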

The test categories include:

Keywords
Language
Output length constraints
Required content and formatting

The following examples show some of the instruction types used in the test, together with their prompt templates:

Instruction Group | Instruction | Description
Keywords | Include Keywords | Include keywords {keyword1}, {keyword2} in your response.
Keywords | Forbidden Words | Do not include keywords {forbidden words} in the response.
Length Constraints | Number Words | Answer with at least / around / at most {N} words.
Detectable Content | Postscript | At the end of your response, please explicitly add a postscript starting with {postscript marker}
Detectable Format | JSON Format | Entire output should be wrapped in JSON format.
Combination | Two Responses | Give two different responses. Responses and only responses should be separated by 6 asterisk symbols: ******.
Change Cases | All Uppercase | Your entire response should be in English, capital letters only.
Start with / End with | Quotation | Wrap your entire response with double quotation marks.
Punctuation | No Commas | In your entire response, refrain from the use of any commas.
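Each of these instructions can be verified by a short rule over the response text. The Python sketch below shows one plausible checker per instruction; all function names are hypothetical, and the matching rules (case-insensitive substring match for required keywords, whole-word match for forbidden words, a postscript counted if any line starts with the marker) are simplifying assumptions rather than the official IFEval logic. The "English" part of the All Uppercase instruction is not checked here.

```python
import json
import re


def include_keywords(response: str, keywords: list[str]) -> bool:
    # Keywords: every required keyword must appear (case-insensitive substring match).
    return all(re.search(re.escape(k), response, re.IGNORECASE) for k in keywords)


def forbidden_words(response: str, forbidden: list[str]) -> bool:
    # Keywords: no forbidden word may appear as a whole word (case-insensitive).
    return not any(
        re.search(r"\b" + re.escape(w) + r"\b", response, re.IGNORECASE)
        for w in forbidden
    )


def postscript(response: str, marker: str) -> bool:
    # Detectable Content: some line must start with the marker, e.g. "P.S.".
    return any(line.lstrip().startswith(marker) for line in response.splitlines())


def json_format(response: str) -> bool:
    # Detectable Format: the whole output must parse as JSON.
    try:
        json.loads(response.strip())
    except json.JSONDecodeError:
        return False
    return True


def two_responses(response: str) -> bool:
    # Combination: exactly two non-empty, different parts separated by "******".
    parts = [p.strip() for p in response.split("******")]
    return len(parts) == 2 and all(parts) and parts[0] != parts[1]


def all_uppercase(response: str) -> bool:
    # Change Cases: upper-casing the text must leave it unchanged.
    return response == response.upper()


def wrapped_in_quotes(response: str) -> bool:
    # Start with / End with: the output must begin and end with a double quote.
    text = response.strip()
    return len(text) >= 2 and text[0] == '"' and text[-1] == '"'


def no_commas(response: str) -> bool:
    # Punctuation: the ASCII comma must not appear anywhere in the output.
    return "," not in response


print(forbidden_words("The category is fine.", ["cat"]))  # True: "cat" only occurs inside "category"
print(two_responses("First answer.\n******\nSecond answer."))  # True
print(json_format('{"answer": 42}'))  # True
```

Whole-word matching for forbidden words is a deliberate choice in this sketch: a plain substring test would wrongly flag "category" when "cat" is forbidden.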

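These examples show that IFEval can serve as an objective measure of a model's instruction-following ability, since every instruction is verified by a deterministic rule rather than by subjective judgment.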
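Per-instruction verdicts still have to be combined into a score. The IFEval paper reports both prompt-level accuracy (a prompt counts only if all of its instructions are followed) and instruction-level accuracy (the fraction of individual instructions followed), plus "loose" variants that this sketch omits. The Python sketch below computes the strict versions; the `score` function and the two example checkers are hypothetical illustrations, not the official evaluation code.

```python
from typing import Callable

Check = Callable[[str], bool]


def score(samples: list[tuple[str, list[Check]]]) -> tuple[float, float]:
    """Return (prompt-level accuracy, instruction-level accuracy).

    Each sample pairs a model response with the checks for the instructions
    in its prompt. Assumes at least one sample and at least one instruction.
    """
    prompts_passed = 0
    instructions_passed = 0
    instructions_total = 0
    for response, checks in samples:
        results = [check(response) for check in checks]
        prompts_passed += all(results)        # prompt passes only if every check passes
        instructions_passed += sum(results)   # each satisfied instruction counts once
        instructions_total += len(results)
    return prompts_passed / len(samples), instructions_passed / instructions_total


def no_commas(response: str) -> bool:
    return "," not in response


def all_upper(response: str) -> bool:
    return response == response.upper()


samples = [
    ("HELLO WORLD", [no_commas, all_upper]),   # both instructions followed
    ("Hello, world", [no_commas, all_upper]),  # both violated
    ("HELLO, WORLD", [no_commas, all_upper]),  # one of two followed
]
print(score(samples))  # (0.3333333333333333, 0.5)
```

The two numbers diverge when a model follows most but not all instructions in a prompt: prompt-level accuracy penalizes any single miss, while instruction-level accuracy gives partial credit.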