Authors: Sakhinana Sagar Srinivas, Chidaksh Ravuru, Geethan Sannidhi, Venkataramana Runkana
Abstract: Semiconductor imaging and analysis are critical yet understudied in deep
learning, limiting our ability to precisely control and optimize
semiconductor manufacturing. We introduce a small-scale multimodal framework
for analyzing semiconductor electron microscopy images (MAEMI) through
vision-language instruction tuning. We generate a customized
instruction-following dataset using large multimodal models for microscopic
image analysis. We perform knowledge transfer from larger to smaller models
through knowledge distillation, resulting in improved accuracy of smaller
models on visual question answering (VQA) tasks. This approach eliminates the
need for expensive, human expert-annotated datasets for microscopic image
analysis tasks. Enterprises can further finetune MAEMI on their proprietary
data, enhancing privacy and performance on low-cost consumer hardware. Our
experiments show that MAEMI outperforms traditional methods, adapts to data
distribution shifts, and supports high-throughput screening.
Source: http://arxiv.org/abs/2408.13248v1
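The teacher-to-student knowledge transfer mentioned in the abstract can be illustrated with the standard distillation objective: a temperature-scaled KL divergence between the teacher's and student's output distributions. This is a minimal sketch of that generic objective, not the paper's actual implementation; the function names and the temperature value are assumptions.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-target classes.
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence KL(teacher || student) over softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    p = softmax(teacher_logits, T)  # teacher (target) distribution
    q = softmax(student_logits, T)  # student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return T * T * kl
```

In practice this term is combined with the ordinary task loss (e.g. cross-entropy on VQA answers), so the smaller model learns both from ground-truth labels and from the larger model's soft predictions.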