
MONOCODER: Domain-Specific Code Language Model for HPC Codes and Tasks
MONOCODER is a smaller, HPC-specific language model that outperforms larger, general-purpose LLMs on HPC code tasks, including both code generation and code comprehension.

Abstract
This paper introduces MonoCoder, a domain-specific language model designed for high-performance computing (HPC) tasks. The researchers developed MonoCoder by pre-training on HPCorpus, an HPC-specific dataset of C and C++ programs mined from GitHub. Despite being significantly smaller than comparable models, MonoCoder demonstrates superior performance on HPC-related tasks, outperforming larger state-of-the-art multilingual LLMs on normalized perplexity tests and delivering competitive CodeBLEU scores for parallel and high-performance code generation. This research demonstrates that specialized, domain-focused models can achieve better results with fewer parameters than general-purpose alternatives.
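The abstract cites normalized perplexity as the headline metric. As a rough illustration, length-normalized perplexity is the exponential of the mean negative log-probability the model assigns to each token; normalizing by token count makes scores comparable across snippets of different lengths. The sketch below is a generic illustration of that metric, not the paper's evaluation code, and assumes per-token log-probabilities are already available from some model.

```python
import math

def normalized_perplexity(token_logprobs):
    """Length-normalized perplexity: exp of the mean negative log-probability.

    Dividing by the token count normalizes for sequence length, so scores
    are comparable across code snippets (and models) of different sizes.
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# Hypothetical example: a model assigning probability 0.5 to each of 4 tokens
lps = [math.log(0.5)] * 4
print(normalized_perplexity(lps))  # → 2.0
```

A lower score means the model finds the held-out code less surprising; the paper's comparison is that MonoCoder achieves lower normalized perplexity on HPC code than much larger general-purpose models.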
Related Research
UniPar: A Unified LLM-Based Framework for Parallel and Accelerated Code Translation in HPC
Introduces UniPar, an evaluation framework for assessing how large language models translate between parallel programming languages, achieving 69% compilation success and 33% functional correctness through fine-tuning and compiler-guided repair.
Workflows vs Agents for Code Translation
Compares structured workflows versus agentic approaches for MATLAB-to-HDL translation, showing that agentic methods with the Model Context Protocol increase simulation reach rates by over 20 percentage points on mid-sized models.