About
Jan Kocoń is an Assistant Professor at the Wroclaw University of Science and Technology, where he received a Ph.D. in computer science (2018) and an M.Sc. Eng. degree (2012). He is also AI/ML Team Leader and Senior ML/NLP Data Scientist in the CLARIN-BIZ project. He has worked on natural language processing (NLP) for over a decade, especially using machine learning methods. He is the author of more than 60 scientific publications, presented at conferences such as ACL, ICDM, CoNLL, COLING, PerCom, ICCS, KES, LREC, RANLP, and GWC. He is currently working on advanced personalized deep learning models for subjective tasks, such as emotion, sentiment, hate speech, and humor recognition. He also works on cross-lingual knowledge transfer and the application of language-agnostic models. He has actively participated in the following projects using machine learning-based solutions: SYNAT, NEKST, CLARIN-PL, Parthenos, AZON, Sentimenti, CLARIN-BIZ, Q-Travel, and AI Tech. He lectures on an introduction to data science, the application of artificial intelligence in natural language processing, and the construction of advanced deep neural network models. He won HackYeah 2021, the largest hackathon in Europe, in a task related to optimizing the construction of a power plant based on renewable energy sources for hydrogen production.
Research Interests
Education
MSc in Computer Science
Wroclaw University of Science and Technology · 2012
Ph.D. in Computer Science, Artificial Intelligence
Wroclaw University of Science and Technology · 2018
Publications
AI Overload: A Multi-Level Taxonomy and the Path Forward
2026 · IEEE Intelligent Systems 41(2)
Breaking the Illusion of Reasoning in Polish LLMs
2026 · EACL 2026
Enhancing AI Face Realism: Cost-Efficient Quality in Distilled Diffusion Models
2025 · ICCS Workshops 3 (2025)
The PLLuM Instruction Corpus
2025 · arXiv 2511.17161
CLARIN-PL: a user centred language technology infrastructure
2025 · Language Resources and Evaluation 59(4)
AggTruth: Contextual Hallucination Detection Using Aggregated Attention Scores in LLMs
2025 · ICCS Workshops 5 (2025)
Improving LLM-Based Recommender Systems with User-Controllable Profiles
2025 · WWW Companion 2025
Predicting Stock Prices with ChatGPT-Annotated Reddit Sentiment
2025 · ICCS Workshops 3 (2025)
SupResDiffGAN: A New Approach for the Super-Resolution Task
2025 · ICCS Workshops 3 (2025)
Architectural Concepts for Integrating Fundamental Drives and Emotions Into AI
2025 · IEEE Intelligent Systems 40(6)
Integrating personalized and contextual information in fine-grained emotion recognition in text
2025 · Information Fusion 118
PLLuM: A Family of Polish Large Language Models
2025 · arXiv 2511.03823
HalluBrainScan: Model-agnostic Reference-free Activation-based LLM Hallucination Detection
2025 · arXiv (preprint)
Fortifying NLP Models Against Poisoning Attacks: The Power of Personalized Prediction Architectures
2025 · Information Fusion 114
Typology of Image Crises Using LLMs: A Novel Approach to Crisis Classification
2025 · J. Contingencies and Crisis Management 33(4)
Backtranslation and Paraphrasing in the LLM Era? Comparing Data Augmentation Methods
2025 · ICCS (1) 2025
Comprehensive Sentiment Analysis of Polish Book Reviews
2024 · ICDM Workshops 2024
Improving Training Dataset Balance with ChatGPT Prompt Engineering
2024 · Electronics 13(12)
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
2024 · arXiv 2404.05892
Personalized Large Language Models
2024 · ICDM Workshops 2024
Into the Unknown: Self-Learning Large Language Models
2024 · ICDM Workshops 2024
Small Language Models for Emotion Recognition in Polish Stock Market Investor Opinions
2024 · ICDM Workshops 2024
Deep Emotions Across Languages: A Novel Approach for Sentiment Propagation in Multilingual WordNets
2023 · ICDM Workshops 2023
Personalized Models Resistant to Malicious Attacks for Human-centered Trusted AI
2023 · SafeAI@AAAI 2023
From Big to Small Without Losing It All: Text Augmentation with ChatGPT
2023 · ICDM Workshops 2023
ChatGPT: Jack of all trades, master of none
2023 · Information Fusion 99
Human-Centered Neural Reasoning for Subjective Content Processing
2023 · Information Fusion 94
Towards Model-Based Data Acquisition for Subjective Multi-Task NLP
2023 · ICDM Workshops 2023
Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows
2023 · ICDM Workshops 2023
Differential Dataset Cartography: Explainable AI in Personalized Sentiment Analysis
2023 · ICCS 2023
PALS: Personalized Active Learning for Subjective Tasks in NLP
2023 · EMNLP 2023
CLARIN-Emo: Training Emotion Recognition Models Using Human Annotation and ChatGPT
2023 · ICCS 2023
RWKV: Reinventing RNNs for the Transformer Era
2023 · Findings of EMNLP 2023
Capturing Human Perspectives in NLP: Questionnaires, Annotations, and Biases
2023 · NLPerspectives@ECAI 2023
Multi-Modal Personalized Hate Speech Analysis using Differential Dataset Cartography
2023 · DE-FACTIFY@AAAI 2023
MultiEmo: language-agnostic sentiment analysis
2022 · *Computational Science - ICCS 2022 : 22nd International Conference London, UK, June 21-23, 2022 : proceedings. Pt. 2*
Deep neural sequence to sequence lexical substitution for the Polish language
2022 · *Computational Science - ICCS 2022 : 22nd International Conference London, UK, June 21-23, 2022 : proceedings. Pt. 1*
Multilingual and language-agnostic recognition of emotions, valence and arousal in large-scale multi-domain text reviews
2022 · *Human language technology : challenges for computer science and linguistics : 9th Language and Technology Conference, LTC 2019, Poznan, Poland, May 17–19, 2019 : revised selected papers*
Multitask personalized recognition of emotions evoked by textual content
2022 · *2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops) : 21-25 March 2022, Pisa, Italy.*
Evaluating natural language processing tools for Polish during PolEval 2019
2022 · *Human language technology : challenges for computer science and linguistics : 9th Language and Technology Conference, LTC 2019, Poznan, Poland, May 17–19, 2019 : revised selected papers*
Neuro-symbolic models for sentiment analysis
2022 · *Computational Science - ICCS 2022 : 22nd International Conference London, UK, June 21-23, 2022 : proceedings. Pt. 2*
What if ground truth is subjective? Personalized deep neural hate speech detection
2022 · *LREC 2022 : Workshop Language Resources and Evaluation Conference : 20th June 2022, 1st Workshop on Perspectivist Approaches to NLP (NLPerspectives) : proceedings*
StudEmo: a non-aggregated review dataset for personalized emotion recognition
2022 · *LREC 2022 : Workshop Language Resources and Evaluation Conference : 20th June 2022, 1st Workshop on Perspectivist Approaches to NLP (NLPerspectives) : proceedings*
AspectEmo: Multi-domain corpus of consumer reviews for aspect-based sentiment analysis
2021 · *21st IEEE International Conference on Data Mining Workshops ICDMW 2021, 7-10 December 2021, Virtual Conference : proceedings*
Multi-task sequence classification for disjoint tasks in low-resource languages
2021 · *Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 25th International Conference KES 2021*
Mapping WordNet onto human brain connectome in emotion processing and semantic similarity recognition
2021 · *Information Processing & Management*
MultiEmo: multilingual, multilevel, multidomain sentiment analysis corpus of consumer reviews
2021 · *Computational Science - ICCS 2021 : 21st International Conference Krakow, Poland, June 16-18, 2021 : proceedings. Pt. 2*
Offensive, aggressive, and hate speech analysis: from data-centric to human-centred approach
2021 · *Information Processing & Management*
Controversy and conformity: from generalized to personalized aggressiveness detection
2021 · *The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021, August 1-6, 2021 : Proceedings of the Conference, Vol. 1 (Long Papers)*
Personal bias in prediction of emotions elicited by textual opinions
2021 · *The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021, August 5-6, 2021, Bangkok, Thailand (online) : Proceedings of the Student Research Workshop*
Emotion norms for 6000 Polish word meanings with a direct mapping to the Polish wordnet
2021 · *Behavior Research Methods*
Deep neural language-agnostic multi-task text classifier
2021 · *21st IEEE International Conference on Data Mining Workshops ICDMW 2021, 7-10 December 2021, Virtual Conference : proceedings*
Learning personal human biases and representations for subjective tasks in natural language processing
2021 · *21st IEEE International Conference on Data Mining ICDM 2021, 7-10 December 2021, Virtual Conference : proceedings*
Cross-lingual deep neural transfer learning in sentiment analysis
2020 · *Knowledge-Based and Intelligent Information & Engineering Systems: Proceedings of the 24th International Conference KES 2020*
PolEval 2019 - the next chapter in evaluating Natural Language Processing tools for Polish
2019 · *Human language technologies as a challenge for computer science and linguistics - 2019*
Recognition of emotions, valence and arousal in large-scale multi-domain text reviews
2019 · *Human language technologies as a challenge for computer science and linguistics - 2019*
Recognition and normalisation of temporal expressions using Conditional Random Fields and Cascade of Partial Rules
2019 · *Poznań Studies in Contemporary Linguistics*
Results of the PolEval 2019 shared task 1: recognition and normalization of temporal expressions
2019 · *Proceedings of the PolEval 2019 Workshop*
Multi-level analysis and recognition of the text sentiment on the example of consumer opinions
2019 · *International Conference Recent Advances in Natural Language Processing RANLP 2019 : Natural Language Processing in a Deep Learning World, Varna, Bulgaria, 2-4 September, 2019 : proceedings*
Recent advances in cross-domain sentiment analysis of Polish texts
2019 · *Polskie Porozumienie na Rzecz Rozwoju Sztucznej Inteligencji : 16-18.10.2019, Wrocław, Poland : conference proceedings.*
Multi-level sentiment analysis of PolEmo 2.0: extended corpus of multi-domain consumer reviews
2019 · *Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL) : November 3-4, 2019, Hong Kong, China*
Propagation of emotions, arousal and polarity in WordNet using heterogeneous structured synset embeddings
2019 · *Proceedings of the Tenth Global Wordnet Conference : July 23-27, 2019, Wrocław (Poland)*
Classifier-based polarity propagation in a Wordnet
2018 · *Proceedings of the Eleventh International Conference on Language Resources and Evaluation, LREC 2018 : Miyazaki, Japan, May 07-12, 2018*
Statistical distributions of parts of speech frequencies in Polish
2018 · *Structure, function and process in texts*
Rozpoznawanie wyrażeń temporalnych i opisów sytuacji w dokumentach tekstowych dla języka polskiego [Recognition of temporal expressions and situation descriptions in text documents for Polish]
2018
Recognition of named entities for Polish - comparison of deep learning and conditional random fields approaches
2018 · *Proceedings of the PolEval 2018 Workshop*
Evaluating KGR10 Polish word embeddings in the recognition of temporal expressions using BiLSTM-CRF
2018 · *Schedae Informaticae*
Context-sensitive sentiment propagation in WordNet
2018 · *Proceedings of the 9th Global WordNet Conference (GWC 2018) : January 8-12, 2018 Singapore*
Three attempts in PolEval 2017 Sentiment Analysis Task
2017 · *Human language technologies as a challenge for computer science and linguistics : 8th Language & Technology Conference, November 17-19, 2017, Poznań, Poland : proceedings*
Liner2 - a generic framework for named entity recognition
2017 · *The 6th Workshop on Balto-Slavic Natural Language Processing, BSNLP 2017 : Valencia, Spain, 4 April, 2017 : proceedings of the Workshop.*
Supervised approach to recognise Polish temporal expressions and rule-based interpretation of timexes
2017 · *Natural Language Engineering*
Improved recognition and normalisation of Polish temporal expressions
2017 · *International Conference Recent Advances in Natural Language Processing 2017 : Varna, Bulgaria, 2-8 September, 2017 : proceedings*
Inforex - a collaborative system for text corpora annotation and analysis
2017 · *International Conference Recent Advances in Natural Language Processing 2017 : Varna, Bulgaria, 2-8 September, 2017 : proceedings*
Recognition of genuine Polish suicide notes
2017 · *International Conference Recent Advances in Natural Language Processing 2017 : Varna, Bulgaria, 2-8 September, 2017 : proceedings*
plWordNet as a basis for large emotive lexicons of Polish
2017 · *Human language technologies as a challenge for computer science and linguistics : 8th Language & Technology Conference, November 17-19, 2017, Poznań, Poland : proceedings*
Generating of events dictionaries from Polish WordNet for the recognition of events in Polish documents
2016 · *Text, Speech, and Dialogue : 19th International Conference, TSD 2016, Brno, Czech Republic, September 12-16, 2016 : proceedings*
Towards an event annotated corpus of Polish
2015 · *Cognitive Studies*
Temporal expressions in Polish Corpus KPWr
2015 · *Cognitive Studies*
Recognition of Polish temporal expressions
2015 · *International Conference Recent Advances in Natural Language Processing : Hissar, Bulgaria, 7-9 September, 2015 : proceedings*
Distributionally extended network-based Word Sense Disambiguation in semantic clustering of Polish texts
2014 · *2014 International Conference on Future Information Engineering (FIE 2014) : Beijing, China 7-8 July, 2014*
Named entity matching method based on the context-free morphological generator
2014 · *Advances in natural language processing : 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014 : proceedings*
Recognition of named entities boundaries in Polish texts
2013 · *4th Biennial International Workshop on Balto-Slavic Natural Language Processing : workshop proceedings.*
Liner2 – a customizable framework for proper names recognition for Polish
2013 · *Intelligent tools for building a scientific information platform : advanced architectures and solutions*
Inforex: a web-based tool for text corpus management and semantic annotation
2012 · *Proceedings of the Eighth International Conference on Language Resources and Evaluation, LREC 2012, Istanbul, Turkey, May 23-24-25, 2012*
Heterogeneous named entity similarity function
2012 · *Text, Speech and Dialogue : 15th International Conference, TSD 2012, Brno, Czech Republic, September 3-7, 2012 : proceedings*
Projects
AI-Tech
active
CLARIN-PL-Biz
active
