Evaluation of Register-Based Machine Translation Using Text Classification Methods
Abstract
This study assesses the effectiveness of register-based machine translation (MT) using text classification methods. Because different registers—whether formal, informal, academic, or conversational—call for different translation strategies, the aim is to evaluate how well MT systems adapt their output to each register. A multi-domain dataset of texts translated by an MT engine was classified with supervised machine learning methods to measure register-specific accuracy and appropriateness. The evaluation focuses on linguistic features, translation accuracy, and register consistency. The results show that register-aware MT significantly improves translation quality and contextual relevance, particularly in academic and professional domains, and demonstrate how text classification can be integrated into MT evaluation frameworks to enhance output quality and guide future system development. These findings support treating register as an essential component of machine translation assessment.
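The evaluation pipeline described above—training a supervised classifier on register-labeled texts, then checking whether MT output preserves the source register—can be sketched as follows. This is an illustrative example only, not the paper's actual system: the paper does not specify its classifier or features, so this sketch assumes a simple bag-of-words Naive Bayes model and a binary formal/informal register scheme, and the function and class names (`NaiveBayesRegisterClassifier`, `register_consistency`) are hypothetical.

```python
"""Sketch of supervised register classification for MT evaluation.

Assumptions (not from the paper): a bag-of-words multinomial Naive Bayes
classifier with add-one smoothing, and a two-way formal/informal register
distinction. The paper's actual model and feature set are unspecified.
"""
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a deliberately simple stand-in)."""
    return text.lower().split()


class NaiveBayesRegisterClassifier:
    """Multinomial Naive Bayes over word counts, one class per register."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.word_counts = {lab: Counter() for lab in self.labels}
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        """Return the register whose log-probability is highest."""
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, float("-inf")
        for label in self.labels:
            # Log prior plus add-one-smoothed log likelihood of each token.
            score = math.log(self.label_counts[label] / total_docs)
            n_words = sum(self.word_counts[label].values())
            for tok in tokens:
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (n_words + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def register_consistency(classifier, source_registers, translations):
    """Fraction of MT outputs whose predicted register matches the source's."""
    matches = sum(
        1
        for reg, hyp in zip(source_registers, translations)
        if classifier.predict(hyp) == reg
    )
    return matches / len(translations)


# Toy demonstration with invented training data.
train_texts = [
    "we hereby request your prompt response regarding the aforementioned matter",
    "the committee shall review the proposal pursuant to regulations",
    "hey can you send me that thing asap",
    "gonna grab lunch wanna come",
]
train_labels = ["formal", "formal", "informal", "informal"]

clf = NaiveBayesRegisterClassifier().fit(train_texts, train_labels)
mt_outputs = ["please review the aforementioned proposal"]
score = register_consistency(clf, ["formal"], mt_outputs)
```

In a real evaluation, `register_consistency` would be computed over a held-out, multi-domain test set and reported alongside translation-accuracy metrics, so that a drop in register consistency can be separated from a drop in adequacy.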