Natural Language Processing Research Papers

Semantic Annotation of Deverbal Nominalizations in the Spanish AnCora corpus

2025

Semantic Annotation of Deverbal Nominalizations in the Spanish AnCora corpus. Aina Peris, CLiC-UB, Gran Via, 585, 08007 Barcelona, aina.peris@ub.edu; Mariona Taulé, CLiC-UB, Gran Via, 585, 08007 Barcelona, mtaule@ub.edu; Horaci Rodríguez, TALP-UPC... more


AnCora-Nom: A Spanish Lexicon of Deverbal Nominalizations

by Aina Peris

2025, Procesamiento de Lenguaje Natural

Abstract: This article describes a new resource: AnCora-Nom, a lexicon of Spanish deverbal nominalizations. It currently contains 1,655 lexical entries and 3,094 senses, where each sense is associated with the denotative type... more


Explaining the Deep Natural Language Processing by Mining Textual Interpretable Features

by Salvatore Greco

2025, arXiv (Cornell University)

Despite the high accuracy offered by state-of-the-art deep natural-language models (e.g. LSTM, BERT), their application in real-life settings is still widely limited, as they behave like a black box to the end-user. Hence, explainability... more


CBR in Dependency-based Machine Translation

by Mannes Poel

2025

A case-based reasoning approach is introduced as a learning technique in the domain of machine translation of natural language. In our approach, syntactic and semantic features are part of the cases in the case base. To implement this,... more


Introduction: Acquisition of Wh-Movement

by Tom Roeper

2025, University of Massachusetts Occasional Papers in Linguistics


A Multi-View Sentiment Corpus

by Enza Messina

2025


AI-based Arabic Language and Speech Tutor

by Abdessamad Mbarki

2025, 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA)

In the past decade, we have observed a growing interest in using technologies such as artificial intelligence (AI), machine learning, and chatbots to provide assistance to language learners, especially in second language learning. By... more


On Linguistic Reviews of Arabic and Bangla: A Comparative Study

by Sufia Sultana

2025, Theory and Practice in Language Studies

This research work sets out to explore the major distinctions between Arabic and Bangla, two languages with different origins. Comparing and analyzing the various features of these two languages requires considerable linguistic expertise in the... more


PROMETHEUS: A Corpus of Proverbs Annotated with Metaphors

by Gözde Özbal

2025

Proverbs are commonly metaphoric in nature and the mapping across domains is commonly established in proverbs. The abundance of proverbs in terms of metaphors makes them an extremely valuable linguistic resource since they can be utilized... more


Automation and Evaluation of the Keyword Method for Second Language Learning

by Gözde Özbal

2025, Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

In this paper, we combine existing NLP techniques with minimal supervision to build memory tips according to the keyword method, a well-established mnemonic device for second language learning. We present what we believe to be the first... more


BRAINSUP: Brainstorming Support for Creative Sentence Generation

by Gözde Özbal

2025

We present BRAINSUP, an extensible framework for the generation of creative sentences in which users are able to force several words to appear in the sentences and to control the generation process across several semantic dimensions,... more


SVNIT @ SemEval 2017 Task-6: Learning a Sense of Humor Using Supervised Approach

by Mukesh A Zaveri SVNIT

2025, Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

This paper describes the system developed for SemEval 2017 Task 6: #HashTagWars: Learning a Sense of Humor. Learning to recognize a sense of humor is an important task for language understanding applications. Different sets of features... more


Deep learning based Arabic short answer grading in serious games

by Lotfi Elaachak

2025, International Journal of Power Electronics and Drive Systems

Automatic short answer grading (ASAG) has become part of natural language processing problems. Modern ASAG systems start with natural language preprocessing and end with grading. Researchers started experimenting with machine learning in... more


Statistical parsing of morphologically rich languages (SPMRL): what, how and whither

by Yoav Goldberg

2025

The term Morphologically Rich Languages (MRLs) refers to languages in which significant information concerning syntactic units and relations is expressed at word-level. There is ample evidence that the application of readily available... more


Approaches to Machine Translation: A Review

by John Oladosu

2025, FUOYE Journal of Engineering and Technology

Translation is the transfer of the meaning of a text from one language to another. It is a means of sharing information across languages and therefore essential for addressing information inequalities. The work of translation was... more


A large scale lexical and semantic analysis of Spanish language variations in Twitter

by Sabino Miranda

2025, ArXiv

Dialectometry is a discipline devoted to studying the variations of a language across a geographical region. One of its goals is the creation of linguistic atlases capturing the similarities and differences of the language under study... more


Dynamic Lexical Features of PhD Theses across Disciplines: A Text Mining Approach

by Shuyi Amelia Sun

2025, Journal of Quantitative Linguistics

This study employed a text mining method to investigate the lexical features and their dynamic changes of PhD theses across the natural sciences, social sciences and humanities. Four quantitative indices, i.e. TTR, h-point, R1 and... more


Unlocking the World Through Words: The Interplay of Child Psychology and Linguistics

by Kanchana Kularathna

2025, Kanchana

This article explores the dynamic relationship between child psychology and linguistics, focusing on how language development both reflects and influences cognitive, emotional, and social growth in children. Drawing on... more


Ação Do Canabidiol Em Doenças Neurológicas

by Bruno Veronez de Lima

2025

in the publication of the event proceedings. RESP is a multidisciplinary scientific publication, published on a continuous monthly basis, whose aim is to support the intellectual production of knowledge about population health, always... more


A strategy for the syntactic parsing of corpora: from Constraint Grammar output to unification-based processing

by Àngels Egea

2025

This paper presents a strategy for syntactic analysis based on the combination of two different parsing techniques: lexical syntactic tagging and phrase structure syntactic parsing. The basic proposal is to take advantage of the good... more


The learnability consequences of Zipfian distributions: Word Segmentation is Facilitated in More Predictable Distributions

by Inbal Arnon

2025

One of the striking commonalities between languages is the way word frequencies are distributed. Across languages, word frequencies follow a Zipfian distribution, showing a power law relation between a word's frequency and its rank. Intuitively, this means that languages have relatively few high-frequency words and many low-frequency ones. While these distributions have been studied extensively, little work has explored the learnability consequences of the greater predictability of words in such distributions. Here, we propose that such distributions confer a learnability advantage for word segmentation, a foundational aspect of language acquisition. We capture the greater predictability of words using the information-theoretic notion of efficiency, which tells us how predictable a distribution is relative to a uniform one. We first use corpus analyses to show that child-directed speech is similarly predictable across fifteen different languages. We then experimentally investigate the impact of distribution predictability on children and adults. We show that word segmentation is uniquely facilitated at the predictability levels found in language, compared both with uniform distributions and with skewed distributions that are less predictable than those of natural language. We further show that distribution predictability impacts learning more than distribution shape, and that learning is not improved further in distributions more predictable than natural language. These novel findings illustrate learners' sensitivity to the overall predictability of the linguistic environment; suggest that the predictability levels found in language provide an optimal environment for learning; and point to the possible role of cognitive pressures in the emergence and propensity of such distributions in language.

While the world's languages differ in many respects, they share certain commonalities: these can provide crucial insight on our shared cognition and how it impacts language structure. We offer a novel learnability perspective on one of the striking commonalities between languages: the Zipfian distribution of word frequencies. We show that languages have similarly predictable distributions, and that these predictability levels are uniquely facilitative for word segmentation in children and adults. We explain their lower and upper bounds as a trade-off between the competing pressures of learnability and expressivity. These findings have far-reaching empirical and theoretical implications, illustrating learners' sensitivity to distribution predictability and pointing to the role of cognitive biases in the recurrence of such skewed distributions in language.

* The formula for Zipfian distributions, as extended by Mandelbrot (40), is f(r) ∝ 1 / (r + β)^α. It shows the relation between a word's frequency f(r) and its rank r, with two constants that determine the shape of the distribution: α sets the steepness of the curve, and β introduces a skew which enables a better fit to natural language (1, 2).
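
As a rough illustration of the two quantities this abstract relies on, the sketch below builds a Zipf-Mandelbrot distribution, f(r) proportional to 1 / (r + β)^α, and compares its efficiency with that of a uniform distribution. It assumes that "efficiency" means normalized Shannon entropy, H / log2(N); the paper's exact definition is not given in this excerpt and may differ. The function names, the vocabulary size, and the α and β values are hypothetical choices for illustration, not values from the study.

```python
import math


def zipf_mandelbrot(n_types, alpha=1.0, beta=0.0):
    """Probabilities for ranks 1..n_types, proportional to 1 / (rank + beta) ** alpha.

    alpha sets the steepness of the curve; beta shifts (skews) the head of it.
    """
    weights = [1.0 / (rank + beta) ** alpha for rank in range(1, n_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def efficiency(probs):
    """Normalized Shannon entropy H(p) / log2(N) (assumed definition of efficiency).

    Equals 1.0 for a uniform distribution; lower values mean a more
    predictable distribution.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two outcomes")
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(n)


if __name__ == "__main__":
    n_types = 1000  # hypothetical vocabulary size
    uniform = [1.0 / n_types] * n_types
    zipfian = zipf_mandelbrot(n_types, alpha=1.0, beta=2.7)  # illustrative constants

    print(f"uniform efficiency: {efficiency(uniform):.3f}")
    print(f"zipfian efficiency: {efficiency(zipfian):.3f}")
```

Under this definition, raising α concentrates probability on the top-ranked words and lowers efficiency, which is one way to read "more predictable than a uniform distribution".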


Teaching Specialised Translation Through Corpus Linguistics: Translation Quality Assessment and Methodology Evaluation and Enhancement by Experimental Approach

by Alexandra Mestivier

2025, Meta

In the current context of rapid and constant evolution of global communication and specialised discourses, the need for devising methods for ensuring both high quality levels of specialised translation and successful translation training... more


Text To Image Generation By Using Stable Diffusion Model With Variational Autoencoder Decoder

by Srikar Konda

2025, International Journal for Research in Applied Science and Engineering Technology


Efficient Feature Extraction in Sentiment Classification for Contrastive Sentences

by Anurag Baghel

2025, International Journal of Modern Education and Computer Science

Sentiment Classification is a special task of Sentiment Analysis in which a text document is assigned to a category such as positive, negative, or neutral on the basis of subjective information contained in the document. This... more


Performance Enhancement of Machine Translation Evaluation Systems for English – Hindi Language Pair

by Anurag Baghel

2025, International Journal of Modern Education and Computer Science

Machine Translation (MT) is the automated conversion of text from one natural language (such as English) to another (such as Hindi) by computer software. To process any such conversion, through... more


Low-Resource Language Information Processing using Dwarf Mongoose Optimization with Deep Learning Based Sentiment Classification

by Vimal Gaur

2025, ACM Transactions on Asian and Low-Resource Language Information Processing

Asian and low-resource language information processing refers to the field of computational linguistics that aims to develop natural language processing (NLP) technologies for languages that have fewer available language resources or are... more


Vietnamese Word Segmentation

by Nguyen Van Toan

2025, Proceedings of NLPRS'01

Word segmentation is the first, obligatory task in every NLP application. For inflectional languages like English, French, and Dutch, word boundaries are simply assumed to be whitespace or punctuation. Whilst in various Asian languages,... more


Automatic Assessment of Students' Free-Text Answers with Different Levels

by Diana Perez

2025, International Journal on Artificial Intelligence Tools

For improving the interaction between students and teachers, it is fundamental for teachers to understand students' learning levels. An intelligent computer system should be able to automatically evaluate students' answers when... more


Automated Test Case Generation Using Code Models and Domain Adaptation

by Sepehr Hashtroudi

2025, arXiv (Cornell University)

Recently, deep learning-based test case generation approaches have been proposed to automate the generation of unit test cases. In this study, we leverage Transformer-based code models to generate unit tests with the help of Domain... more


Politeness Strategies in Response to Request in British and Persian Family Discourse

by Neda Kameh Khosh

2025, Proceedings of ADVED 2020- 6th International Conference on Advances in Education

Success in intercultural communication relies strongly on understanding the communicative purposes of interlocutors and the pragmatic meaning of their utterances. Established on numerous cross-cultural studies, politeness is a... more


Korpus Lingvistikasida Tarjimashunoslik Masalasi

by Matluba Rayimjonova

2025, COMPUTER LINGUISTICS: PROBLEMS, SOLUTIONS, PROSPECT

First-year master's student in Computational Linguistics, Faculty of Journalism, National University of Uzbekistan. Abstract: In this article, the distinctive role of the parallel corpus in text translation, as well as in learning languages and, in relation to one another,... more



Content-rich biological network constructed by mining PubMed abstracts

by Burt Sharp

2025, BMC Bioinformatics

BACKGROUND: The integration of the rapidly expanding corpus of information about the genome, transcriptome, and proteome, engendered by powerful technological advances, such as microarrays, and the availability of genomic sequence from... more


Can I hear you? Sentiment Analysis on Medical Forums

by Tanveer Jamali

2025

Text mining studies have started to investigate relations between positive and negative opinions and patients' physical health. Several studies linked the personal lexicon with health and the health-related behavior of the individual.... more


Units of Translation and the Limited Capacity of Working Memory

by Mónica Naranjo Ruiz

2025, IntechOpen eBooks

A unit of translation is a source text fragment of any length or nature that piques a translator's interest during translation. Alves and Vale proposed the concept of macro and micro translation units based on pauses and times identified... more


Emotional Drivers of Financial Decision-Making: Unveiling the Link between Emotions and Stock Market Behavior (Part 2)

by Alain Finet

2025, Journal of Next-Generation Research 5.0 (JNGR 5.0)

This study is the second part of a three-part analysis (if it meets the review requirements) of emotions carried out by written documents. These documents were collected from eight students who took part in a three-day stock market... more


Prediction of Learner Native Language by Writing Error Pattern

by Takahiko Suzuki

2025, Lecture Notes in Computer Science

The native language of a foreign language learner can have an effect on the errors they make because of similarities or differences between the two languages. In order to provide effective error prediction and correction for nonnative... more


Classification and Clustering English Writing Errors Based on Native Language

by Takahiko Suzuki

2025, 2014 IIAI 3rd International Conference on Advanced Applied Informatics

It is important for language learners to determine and reflect on their writing errors in order to overcome weaknesses. Each language learner has their own unique writing error characteristics and therefore has different learning needs.... more


The Evolutionary Leap: Genes, Intelligence, and the Rise of Artificial Intelligence

by Mohamed Louadi

2025

This paper explores the idea that intelligence, rather than humanity, serves as the primary force driving evolution, with a particular focus on Artificial General Intelligence (AGI). As AI technology progresses, we must reconsider... more


The semantic categories in teaching “If Clauses” in ESL classes

by Sinem Bezircilioglu

2025, Procedia - Social and Behavioral Sciences

For many second language learners, learning the target language is supposed to be identical with the mastery of the grammar of that language. When we say "the mastery of the grammar", we refer to the mastery of rules which revolve around... more



Goroawase Dalam Bahasa Jepang

by I Putu Tirta Fidhu Wijana

2025


Identification of Sarcasm in Textual Data: A Comparative Study

by Devpriya Soni

2025, Journal of Data and Information Science

Purpose: The ever-increasing penetration of the Internet in our lives has led to an enormous amount of multimedia content being generated online. Textual data contributes a major share of the data generated on the World Wide Web. Understanding people's sentiment is an important aspect of natural language processing, but the inferred opinion can be biased or incorrect if people use sarcasm while commenting, posting status updates, or reviewing a product or a movie. Thus, it is of utmost importance to detect sarcasm correctly and make a correct prediction about people's intentions.

Design/methodology/approach: This study evaluates various machine learning models along with standard and hybrid deep learning models across various standardized datasets. Text was vectorized using word embedding techniques to convert the textual data into vectors for analysis. We used three standardized datasets available in the public domain and three word embeddings, i.e., Word2Vec, GloVe and fastText, to validate the hypothesis. The results were analyzed and conclusions drawn. The key finding is that the hybrid models that include Bidirectional Long Short-Term Memory (Bi-LSTM) and Convolutional Neural Network (CNN) outperform other conventional machine learning as well as deep learning models across all the datasets considered in this study, making our hypothesis valid.

Research limitations: Using data from different sources and customizing the models according to each dataset slightly decreases the usability of the technique. Overall, however, this methodology provides effective measures to identify the presence of sarcasm, with a minimum average accuracy of 80% or above for one dataset and better than the current baseline results for the other datasets. The results provide solid insights for system developers to integrate this model into real-time analysis of any review or comment posted in the public domain.

This study has various other practical implications for businesses that depend on user ratings and public opinions. This study also provides a launching platform for various researchers to work on the problem of sarcasm identification in textual data.


Identification of sarcasm using word embeddings and hyperparameters tuning

by Devpriya Soni

2025, Journal of Discrete Mathematical Sciences and Cryptography

Around the world, most of the proposed techniques for the identification of sarcasm either take the utterance in isolation or these methods only perform the categorization of the textual data. Very limited work has been done on how to... more


An Optimum Database for Isolated Word in Speech Recognition System

by Syifaun Nafisah, ST, MT

2025, TELKOMNIKA (Telecommunication Computing Electronics and Control)

An automatic speech recognition (ASR) system is a technology that allows computers to receive input as spoken words. This technology requires sample words, stored in a database, for the pattern-matching process. There is no reference as... more


La tecnología orientada a objetos y la ingeniería de software ante la complejidad inherente al software

by Olivia Allende Hernandez

2025


A Comparative Analysis of Clustering Methods on the 20 Newsgroups Dataset for Analytics

by Hanza Parayil Salim and

2025, International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This paper presents a comparative analysis of two different approaches for clustering textual data from the 20 Newsgroups dataset. The first approach leverages a Large Language Model (LLM) to classify each text into predefined categories... more


by Justin Hicks

2025, Bioinformatics

Motivation: The most widely used literature search techniques, such as those offered by NCBI's PubMed system, require significant effort on the part of the searcher, and inexperienced searchers do not use these systems as effectively... more


Browsing multilingual information with the multisemcor web interface

by Marcello Ranieri

2025, Proceedings of the LREC 2004 Satellite Workshop on The Amazing Utility of Parallel and Comparable Corpora

Parallel and comparable corpora represent a crucial resource for different Natural Language Processing tasks like machine translation, lexical acquisition, and knowledge structuring but are also suitable to be consulted by humans for... more


The Structure of Dictionary Entries in Terminological Dictionaries

by Milica Mihaljevic

2025, Filologija

Milica MIHALJEVIC, Zavod za hrvatski jezik, HR, Zagreb. THE STRUCTURE OF DICTIONARY ENTRIES IN TERMINOLOGICAL DICTIONARIES

The structure of a dictionary entry in terminological dictionaries varies depending on the type and purpose of the dictionary. Some small translation terminological dictionaries have a very simple structure. In explanatory dictionaries, however, the dictionary entry can be very complex. Using examples from some existing Croatian terminological dictionaries, the paper analyzes the structure of the dictionary entry and proposes the most acceptable structure for different types of terminological dictionaries.

Terminological dictionaries are special dictionaries that are divided into several types depending on the classification criterion. Since the structure of the dictionary entry in a terminological dictionary depends on the type of dictionary, I will briefly say something here about the classification of terminological dictionaries. I will discuss only those classification criteria on which the structure of the dictionary entry depends. According to how the headword is treated, dictionaries are divided into explanatory and correlative ones. Explanatory dictionaries give a definition and sometimes a usage example or an illustration. Correlative dictionaries only link (correlate) signifiers in the same language (synonym dictionaries) or in different languages (translation dictionaries). Although a terminological synonym dictionary is conceivable, in practice terminological dictionaries are mostly only translation dictionaries, explanatory dictionaries, or both. According to the number of languages they cover, dictionaries are divided into monolingual and multilingual (bilingual, trilingual, quadrilingual, quinquelingual, etc.). By size, dictionaries are divided into large, medium, and small. By field, terminological dictionaries are divided into mathematical, chemical, biological, linguistic, etc.

Small bilingual translation dictionaries have the simplest entry structure. Alongside the headword (1), they give equivalents in one or more languages. Example

1 Every term has two sides: the concept it denotes and the linguistic expression by which that concept is denoted. In a dictionary, that linguistic expression is called the headword. The headword (Ger. Schlagwort, Stichwort; Eng. entry) is a lexical unit (a group of words, a word, a morpheme) that serves as the title of the dictionary entry in which its meaning is described and the concept it denotes is explained. The headword is usually set off in a special typeface or lettering (most often in bold).

