MONOCODER: Domain-Specific Code Language Model for HPC Codes and Tasks

MonoCoder is a small, HPC-specific language model that outperforms larger general-purpose LLMs on HPC code tasks, covering both code comprehension and code generation.

Published on December 18, 2024

Abstract

This paper introduces MonoCoder, a domain-specific language model designed for high-performance computing (HPC) tasks. The researchers developed MonoCoder by pre-training on HPCorpus, an HPC-specific dataset of C and C++ programs mined from GitHub. Despite being significantly smaller than comparable models, MonoCoder outperforms larger state-of-the-art multilingual LLMs on normalized perplexity tests and delivers competitive CodeBLEU scores for parallel and high-performance code generation. These results indicate that specialized, domain-focused models can achieve better results with fewer parameters than general-purpose alternatives.
