IFEval
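IFEval is a benchmark designed specifically to evaluate a model's instruction following: how closely it obeys the instructions a user gives.
Every test item is judged from the model's output alone, using objective, verifiable features to decide whether the model followed the instruction.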
The instructions it tests cover:
Keywords
Language
Output length limits
Required content and format
The table below lists some actual test instructions as examples.
Instruction Group | Instruction | Description |
---|---|---|
Keywords | Include Keywords | Include keywords {keyword1}, {keyword2} in your response. |
Keywords | Forbidden Words | Do not include keywords {forbidden words} in the response. |
Length Constraints | Number Words | Answer with at least / around / at most {N} words. |
Detectable Content | Postscript | At the end of your response, please explicitly add a postscript starting with {postscript marker} |
Detectable Format | JSON Format | Entire output should be wrapped in JSON format. |
Combination | Two Responses | Give two different responses. Responses and only responses should be separated by 6 asterisk symbols: ******. |
Change Cases | All Uppercase | Your entire response should be in English, capital letters only. |
Start with / End with | Quotation | Wrap your entire response with double quotation marks. |
Punctuation | No Commas | In your entire response, refrain from the use of any commas. |
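
Because each instruction in the table refers to a detectable property of the output, it can be checked with a few lines of code. The sketch below is a minimal illustration, not IFEval's actual implementation: the function names are hypothetical, and details such as whole-word matching for forbidden words or splitting on whitespace to count words are assumptions made for the example.

```python
import json
import re


def include_keywords(response: str, keywords: list[str]) -> bool:
    """Include Keywords: every keyword appears (case-insensitive, by assumption)."""
    lowered = response.lower()
    return all(k.lower() in lowered for k in keywords)


def forbidden_words(response: str, forbidden: list[str]) -> bool:
    """Forbidden Words: no forbidden word appears as a whole word."""
    return not any(
        re.search(r"\b" + re.escape(w) + r"\b", response, re.IGNORECASE)
        for w in forbidden
    )


def at_most_n_words(response: str, n: int) -> bool:
    """Number Words ("at most" variant): count whitespace-separated tokens."""
    return len(response.split()) <= n


def has_postscript(response: str, marker: str) -> bool:
    """Postscript: some line starts with the marker, e.g. "P.S."."""
    return any(line.strip().startswith(marker) for line in response.splitlines())


def is_json(response: str) -> bool:
    """JSON Format: the whole output parses as JSON."""
    try:
        json.loads(response.strip())
        return True
    except json.JSONDecodeError:
        return False


def two_responses(response: str) -> bool:
    """Two Responses: exactly two non-empty parts separated by ******."""
    parts = response.split("******")
    return len(parts) == 2 and all(p.strip() for p in parts)


def all_uppercase(response: str) -> bool:
    """All Uppercase: every cased character is uppercase."""
    return response.isupper()


def wrapped_in_quotes(response: str) -> bool:
    """Quotation: the output starts and ends with a double quotation mark."""
    text = response.strip()
    return len(text) >= 2 and text.startswith('"') and text.endswith('"')


def no_commas(response: str) -> bool:
    """No Commas: the output contains no comma."""
    return "," not in response
```

Each function returns a plain pass/fail result, so no human or model judge is needed to grade a response.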
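As these examples show, every instruction can be verified directly from the output, so IFEval can serve as an objective measure of how well a model follows instructions.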
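
To turn per-instruction pass/fail results into a single number, the results can be aggregated into accuracies. The sketch below assumes each prompt carries one or more instructions and that each has already been checked. It computes a prompt-level accuracy (a prompt counts only if all of its instructions pass) and an instruction-level accuracy (each instruction counts separately). The function name `score` and the data layout are hypothetical.

```python
def score(results: list[list[bool]]) -> tuple[float, float]:
    """Aggregate pass/fail results into two accuracies.

    results[i] holds one boolean per instruction attached to prompt i.
    Returns (prompt_level_accuracy, instruction_level_accuracy).
    """
    # A prompt passes only when every one of its instructions passes.
    prompt_level = sum(all(r) for r in results) / len(results)

    # Flatten so every instruction is counted on its own.
    flat = [ok for r in results for ok in r]
    instruction_level = sum(flat) / len(flat)
    return prompt_level, instruction_level


# Three prompts: the first passes fully, the second fails one of its two
# instructions, and the third fails its only instruction.
prompt_acc, instruction_acc = score([[True, True], [True, False], [False]])
print(f"prompt-level: {prompt_acc:.2f}, instruction-level: {instruction_acc:.2f}")
```

Prompt-level accuracy is the stricter of the two, since a single failed instruction fails the whole prompt.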