IFEval

2024-07-29

Paper

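IFEval is a benchmark built specifically to measure a model's instruction-following ability.

Every test item is judged from the model's output alone, using objective, verifiable features to check whether the model followed the instruction the user gave.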

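The test categories include:

Keywords
Language
Output length constraints
Required content and format

Here are some actual test instructions as examples: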

Instruction Group	Instruction	Description
Keywords	Include Keywords	Include keywords {keyword1}, {keyword2} in your response.
Keywords	Forbidden Words	Do not include keywords {forbidden words} in the response.
Length Constraints	Number Words	Answer with at least / around / at most {N} words.
Detectable Content	Postscript	At the end of your response, please explicitly add a postscript starting with {postscript marker}
Detectable Format	JSON Format	Entire output should be wrapped in JSON format.
Combination	Two Responses	Give two different responses. Responses and only responses should be separated by 6 asterisk symbols: ******.
Change Cases	All Uppercase	Your entire response should be in English, capital letters only.
Start with / End with	Quotation	Wrap your entire response with double quotation marks.
Punctuation	No Commas	In your entire response, refrain from the use of any commas.
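
Because each instruction in the table maps to a feature that can be detected in the output, each one can be checked with a small deterministic function. The following is a minimal sketch of that idea, not the official IFEval implementation: the function names are hypothetical, and the checks are simplified (for example, the uppercase check does not verify that the text is English).

```python
import json


def includes_keywords(response: str, keywords: list[str]) -> bool:
    """Include Keywords: every keyword appears (case-insensitive substring match)."""
    lowered = response.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def at_most_n_words(response: str, n: int) -> bool:
    """Number Words ("at most" variant): count whitespace-separated tokens."""
    return len(response.split()) <= n


def is_json(response: str) -> bool:
    """JSON Format: the whole output parses as JSON."""
    try:
        json.loads(response.strip())
    except json.JSONDecodeError:
        return False
    return True


def is_all_uppercase(response: str) -> bool:
    """All Uppercase: no lowercase letters and at least one uppercase letter."""
    return response.isupper()


def wrapped_in_quotes(response: str) -> bool:
    """Quotation: the output starts and ends with a double quotation mark."""
    text = response.strip()
    return len(text) >= 2 and text.startswith('"') and text.endswith('"')


def has_no_commas(response: str) -> bool:
    """No Commas: the output contains no comma character."""
    return "," not in response
```

Each function returns a plain pass/fail result, which is what makes the evaluation objective: no human or model judge is needed to decide whether an instruction was followed.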

As these examples show, IFEval can serve as an objective measure of how well a model follows instructions.
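
Once every instruction has a pass/fail result, the results can be aggregated into a single score. The IFEval paper reports accuracy at both the prompt level and the instruction level; the sketch below shows one simple way to compute the two, assuming the per-instruction results are already available as booleans. The function names and the sample data are hypothetical.

```python
def prompt_level_accuracy(results: list[list[bool]]) -> float:
    """Fraction of prompts for which every instruction was followed."""
    return sum(all(prompt) for prompt in results) / len(results)


def instruction_level_accuracy(results: list[list[bool]]) -> float:
    """Fraction of individual instructions that were followed."""
    flat = [ok for prompt in results for ok in prompt]
    return sum(flat) / len(flat)


# Hypothetical results: one inner list per prompt, one bool per instruction.
sample_results = [[True, True], [True, False], [True]]

prompt_score = prompt_level_accuracy(sample_results)
instruction_score = instruction_level_accuracy(sample_results)
```

With the sample data, two of the three prompts pass all of their instructions, and four of the five individual instructions pass.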