Authors: Seyedreza Mohseni, Seyedali Mohammadi, Deepa Tilwani, Yash Saxena, Gerald Ndwula, Sriram Vema, Edward Raff, Manas Gaur
Abstract: Malware authors often employ code obfuscations to make their malware harder
to detect. Existing tools for generating obfuscated code often require access
to the original source code (e.g., C++ or Java), and adding new obfuscations is
a non-trivial, labor-intensive process. In this study, we ask the following
question: Can Large Language Models (LLMs) potentially generate a new
obfuscated assembly code? If so, this poses a risk to anti-virus engines and
potentially increases the flexibility of attackers to create new obfuscation
patterns. We answer this in the affirmative by developing the MetamorphASM
benchmark, which comprises the MetamorphASM Dataset (MAD) along with three
code obfuscation techniques: dead code, register substitution, and control
flow change. MetamorphASM systematically evaluates the ability of LLMs to
generate and analyze obfuscated code using MAD, which contains 328,200
obfuscated assembly code samples. We release this dataset and analyze the
success rate of various LLMs (e.g., GPT-3.5/4, GPT-4o-mini, Starcoder,
CodeGemma, CodeLlama, CodeT5, and LLaMA 3.1) in generating obfuscated assembly
code. The evaluation was performed using established information-theoretic
metrics and manual human review to ensure correctness and provide the
foundation for researchers to study and develop remediations to this risk. The
source code can be found at the following GitHub link:
https://github.com/mohammadi-ali/MetamorphASM.
Source: http://arxiv.org/abs/2412.16135v1
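The three obfuscation techniques named in the abstract can be sketched with minimal, illustrative transforms over toy x86-style assembly. This is an assumed illustration of the general techniques, not the paper's implementation; all function names and the sample instructions are hypothetical.

```python
import re

# Semantically inert instructions used for dead-code insertion (illustrative).
DEAD_OPS = ["nop", "xchg eax, eax"]

def insert_dead_code(lines):
    """Dead code: interleave instructions that do not change program state."""
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        out.append(DEAD_OPS[i % len(DEAD_OPS)])
    return out

def substitute_register(lines, old="eax", new="ecx"):
    """Register substitution: consistently rename a register
    (assumes `new` is otherwise unused in the snippet)."""
    pat = re.compile(rf"\b{old}\b")
    return [pat.sub(new, line) for line in lines]

def reorder_control_flow(lines):
    """Control flow change: emit blocks in reverse order, chaining them
    with labels and unconditional jumps so execution order is preserved."""
    out = ["jmp L0"]
    for idx in reversed(range(len(lines))):
        out.append(f"L{idx}:")
        out.append(lines[idx])
        out.append(f"jmp L{idx + 1}" if idx + 1 < len(lines) else "jmp Lend")
    out.append("Lend:")
    return out

prog = ["mov eax, 5", "add eax, 3"]
```

For example, `substitute_register(prog)` yields `["mov ecx, 5", "add ecx, 3"]`, and `reorder_control_flow(prog)` produces a jump-chained listing whose execution order matches the original.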