Journal of Theoretical Physics & Mathematics Research

Geometric Concept Spaces in Small Encoders: A Comparative Mechanistic Probing of ModernBERT and DeBERTa-v3

Abstract

Cristian Leo

Bidirectional transformer encoders have bifurcated into two optimization paradigms: topological precision via disentangled attention (DeBERTa-v3) and hardware-aware scaling via rotary positional embeddings (ModernBERT). This study presents a geometric and mechanistic investigation of these architectures using 100,000 activation samples. Through linear probing, Centered Kernel Alignment (CKA), and intrinsic dimensionality estimation, we find a 16.5% gap in linear concept separability favoring DeBERTa-v3 (p < 0.001). We identify an extreme “Topological Collapse” in ModernBERT’s final layers, where concept manifolds condense from 30 dimensions to 2. We quantify a fundamental stability-precision trade-off: ModernBERT’s RoPE provides 4.3x higher local positional stability but induces severe semantic entanglement, while DeBERTa-v3 uses sparse, specialized sub-circuits to maintain precise orthogonal boundaries. Our findings provide a geometric explanation for the “token classification anomaly” in modern encoders.
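For readers unfamiliar with CKA, the following is a minimal sketch of the standard linear variant, computed on synthetic data. It is not the authors' pipeline: the function name `linear_cka`, the array names, and the random activations are all illustrative assumptions, and the abstract does not say which CKA kernel the study used.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features).

    Hypothetical helper for illustration; assumes both matrices describe
    the same n_samples inputs in the same order.
    """
    # Center each feature so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Synthetic stand-ins for layer activations (assumption: not real model data).
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 16))

# A rotated and rescaled copy of acts_a: same geometry, different basis.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
acts_b = 3.0 * acts_a @ rotation

# An unrelated set of activations.
acts_c = rng.normal(size=(200, 16))

cka_same = linear_cka(acts_a, acts_b)
cka_indep = linear_cka(acts_a, acts_c)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling, so `cka_same` comes out at 1 up to floating-point error, while `cka_indep` is much lower. This invariance is what makes the measure suitable for comparing layers across architectures whose hidden bases are not aligned.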

