Activities

Listed in reverse chronological order are all relevant academic activities.

2025

Supervising MSc theses in Data Science (DSS): Giannis Machairidis, Jesse Osseweijer, Ralitsa Andreeva, Sara Welter, Emma (Zhiyan) Zhang, Carol John, Rens Van Schaardenburg, and Mikel Villalabeitia Ford, and one MSc thesis in Cognitive Science & AI (CSAI): Yifei Mao.
TAO received an 8k SIDN grant in collaboration with, among others, Justice for Prosperity.
I assisted during the first lecture of Introduction to AI 2 (CSAI BSc).

2024

TAO submitted ‘Automatic Large-scale Political Bias Detection of News Outlets’ to PLOS One.
Taught guest lectures on Social Media Monitoring for the Natural Language Processing course (joint CSAI and DSS MSc) and on Web Intelligence for the Data Science for Society minor.
Developed thesis track material on reproducibility and model deployment.
Teaching the Fall semester of RS: Data Processing (880254).
Our abstract “Learning from neural networks: a strategy to identify systematic improvements to activity coefficient models” was accepted to NPS.
Submitted a HORIZON-AG grant proposal.
I received a Teacher Award Certificate for Excellent Course Evaluation of Language & AI (JBC090) from TU/e.
Submitted an EMIF grant proposal (not granted).
Attended LREC/COLING 2024.
I gave a Studium Generale talk ‘Framing Intelligence: Web Intelligence as Artificial Intelligence’ about programming and what people would like you to believe is intelligence.

I taught a guest lecture on ‘Calling Bullsh*t: the CL Edition’ for the Computational Linguistics course of our CSAI BSc.
Reviewed for Journal of Big Data.
Submitted a paper to COLM 2024.
Ronja Rönnback started as a PhD student on a Starter Grant project at the Tilburg Algorithm Observatory under supervision of myself, Dr. Henry Brighton, and Prof. Dr. Marie Šafář as promotor.
BigNLI: Native Language Identification with Big Bird Embeddings and SOBR: A Corpus for Stylometry, Obfuscation, and Bias on Reddit accepted to LREC/COLING 2024 (2, 3, 3 and 4, 3, 3).
I gave an ‘Introductie AI’ guest lecture at the Tilburg Law School.
I taught a guest lecture on profiling and obfuscation for the Responsible AI course of our CSAI MSc.
My DSS master thesis student Sergey Kramp won Tilburg’s best ‘23 master thesis prize (associated paper: Native Language Identification with Big Bird Embeddings).
Supervising MSc theses in Data Science (DSS): Beāte Abāšina, Bram Diesch, Ben Witte, Orcun Erdem, Storm Zeelenberg, Yexin Yang, and a BSc thesis in Data Science (JBC): Jason Lin – augmenting research retrieval systems with LLMs.
Won a 10k faculty grant to collaborate on researching the role and impact of Large Language Models (LLMs) used in Dutch public healthcare platforms with Supraja Sankaran, Emiel Krahmer, Chris van der Lee, Ronald Leenes, Tineke Broer, Bennett Kleinberg, and Bartho Hengst.
Teaching the Spring semester of RS: Data Processing (880254).

2023

Taught Language & AI for the JADS-based Data Science Bachelor at TU/e (JBC090).
Supervised MSc theses in Data Science (DSS), described by general topic: Feyza Aslan — assessing bias in LLMs, Brian Leung — author profiling on Reddit, Heleen de Vos — obfuscation through paraphrasing.
Proposed a new DSS MSc course around reproducibility and computational model deployment.
Submitted two papers to LREC/COLING 2024.
Contributed to some departmental culture activities: organized DCA team participation for the 5K Business Run of the Tilburg Ten Miles. We finished 19th out of 52 teams. Gave input to the remodeling of our Dante building. Volunteered to organize a session on financial administration.
Taught the Fall semester of RS: Data Processing (880254).
Native Language Identification with Big Bird Embeddings available on arXiv. Work by DSS MSc student Sergey Kramp.
Gave a talk about User-Centered Security in Natural Language Processing at CLIN 2023.
Wrote a chapter about Machine Morality for the Encyclopedia of Heroism Studies.
Gave a talk on Reproducibility Caveats in Machine Learning at the Open to Complexity Symposium on Open Science.
Received a 240K starter grant and will be looking for PhD candidates soon.
Received a 150k grant under the Tilburg Algorithm Observatory to work on LGBTQ+ misinformation propagation. This work will be conducted together with the Prague Pride non-profit and the Department of Political Studies at Charles University in Prague.
Was appointed as a member of the Program Committee for Data Science and Society.
Won a 9.9K faculty grant for data collection on digital globalization (extremism in particular) with Bennett Kleinberg.
Contributed to our department’s internal policy surrounding LLMs and am currently part of the University’s workgroup dealing with short- and long-term educational policy.
Supervising MSc theses in Data Science and MSc Artificial Intelligence (CSAI), described by general topic: Sergey Kramp (DSS) — classification of native language style in English, Ronja Rönnback (CSAI) — political bias in news media.
Gave an ‘Introductie AI’ guest lecture at the Tilburg Law School.
I was a guest on the One Deeper Podcast, hosted by Udesh Habaraduwa (one of our CSAI students), talking about ChatGPT, the impact of (generative) AI, safety, privacy, computational creativity, etc.

Gave a talk on ChatGPT and the potential impact of language generation on our (higher) education.
Successfully defended my PhD thesis on Friday the 13th of January.
Wiped Twitter account. You can find me on Mastodon (and elsewhere).

2022

Finished ‘User-Centered Security in Natural Language Processing’ for print.
Started the Tilburg Algorithm Observatory with principal investigator Henry Brighton.
Supervised MSc theses in Data Science (DSS) and BSc Artificial Intelligence (CSAI), described by general topic: Gordon Arscott (CSAI) — distantly supervised political bias assessment on Reddit, Benoit de Booij (DSS) — meta-learning in Phishing Website Detection, Emiel Culleton (DSS) — detecting sensitive information in text corpora, Melanie Waterham (DSS) — sentiment bias in news-related comments on Twitter.
Taught Language & AI for the JADS-based Data Science Bachelor at TU/e (JBC090).
Taught the Fall semester of RS: Data Processing (880254).
Accepted a permanent position as Assistant Professor at the department of Cognitive Science & Artificial Intelligence of Tilburg University per 1st of October.
R&R for Neural Data-to-Text Generation Based on Small Datasets: Comparing the Added Value of Two Semi-Supervised Learning Approaches on Top of a Large Language Model submitted to a journal.
Submitted a proposal for two GPUs to the NVIDIA Academic Hardware Grant program (not granted).
Won a 12.9K faculty grant for data collection and research on digitization in society with Bennett Kleinberg.
Attended LREC 2022.
Elected as faculty council member.
Supervised theses in Data Science (described by general topic): Mikhail Bogdanov — Predicting educational outcomes, Koen Kurver — Adaptive age classification targets, Anne Plomp — Automatic categorization of academic output under sustainability goals, Bob Sanders — Distant toxicity classification, Sam Sanders — Extending author obfuscation to multiple platforms, and Ronald van Os — Replicating transformer-based lexical substitution.
Pre-proposal submitted for NWO call on subversive crime.
Acquired an 18k grant for educational innovation for a broad implementation of Git, and specifically GitHub Classroom, in courses at our Department (and more broadly providing tools for the university).
Runner-up for the DSA Pattern Best Teacher award (Bachelors).
Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations accepted to LREC 2022.
Taught the Spring semester of RS: Data Processing (880254).
Gave an Introduction to AI and ‘Introduction to Machine Learning’ guest lecture at the Tilburg Law School.
Gave a few internal talks about reproducibility and robust academic workflows.
Dissertation ‘User-Centered Security in Natural Language Processing’ internally submitted (and accepted).
Submitted Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations to LREC 2022.

2021

Supervised theses in Data Science (described by general topic): Eklavya Dahiya — Unsupervised Clustering of News Articles, and Sena Solak — Sentiment Classification on Turkish Political Twitter.
Taught Cognitive Science 2 (Language & AI) for the JADS-based Data Science Bachelor at TU/e (JBC090).
NeuTral Rewriter: A Rule-Based and Neural Approach to Automatic Rewriting into Gender Neutral Alternatives accepted to EMNLP 2021.
Taught the Fall semester of Data Mining for Business and Governance (880022).
Collaboration paper submitted to EMNLP 2021.
Attended EACL 2021 (virtually).
Gave an Introduction to AI and ‘Introduction to Machine Learning’ guest lecture at the Tilburg Law School.
Supervised theses in Data Science (described by general topic): Anke Bodewes — differential privacy and e-mail categorization, Cas Goos — effects of off-topic conversations on sentiment classifiers, Cas van den Hurk — polarization on reddit, and Leon Korbee — predicting the Dutch elections.
Taught the Spring semester of RS: Data Processing (880254).
Gave a talk about my work on Invasive Artificial Intelligence at the kick-off of the TAISIG Talks.
Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling accepted to EACL 2021 (2.5, 3.5, 4.5).

2020

Taught the Fall semester of RS: Spatiotemporal Data Analysis (800880).
The CACAPO Dataset: A Multilingual, Multi-Domain Dataset for Neural Pipeline and End-to-End Data-to-Text Generation accepted to INLG 2020 and CONLL 2020 (retracted the latter).
Collaboration paper submitted to INLG / CONLL 2020.
Submitted paper to EACL 2021.
Supervised theses in Data Science (described by general topic): Robbin Breeuwer — extending adversarial stylometry via lexical substitution attacks, Max Knegt — debiasing racial hate speech using distant supervision, Mert Torun — comparing polls with Twitter sentiment for the 2020 US elections.
Taught the Fall semester of Data Mining for Business and Governance (880022), and RS: Data Processing (880254).
‘Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity’ published in Language Resources and Evaluation.
Paper rejected for EMNLP 2020 (1.5, 3.0, 3.5, 4.0).
Supervised theses in Data Science (described by general topic): Myrthe Reuver — predicting complaint labels for short clinical texts.
Submitted paper to EMNLP 2020.
Became a founding member of the Culture Committee at the Department of Cognitive Science & Artificial Intelligence, and Head of the PhD Culture Board. Started organizing weekly meetings with Lieke Gelderloos, Paris Mavromoustakos Blom, and George Aalbers.
‘Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity’ accepted to Language Resources and Evaluation with minor revisions.

2019

‘Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity’ available on arXiv.
Submitted paper to Language Resources and Evaluation.
Gave a guest lecture on Data Quality for the Tax & Technology course at the Vrije Universiteit.
Attended ATILA 2019.
Reviewed for CONLL 2019.
‘Towards Replication in Computational Cognitive Modeling: A Machine Learning Perspective’ accepted as commentary paper in Computational Brain & Behaviour.
Submitted commentary to Computational Brain & Behaviour.
Attended the Blackbox@NL workshop at JADS.
Gave a talk on (Adversarial) Computational Stylometry for our department’s colloquium series.
Gave an Introduction to AI guest lecture at the Tilburg Law School.
Attended ICT with Industry 2019 at the Lorentz Center in Leiden, predicting public importance of news for the Persgroep. My observations here.

2018

Presented Style Obfuscation by Invariance at ATILA 2018.
‘Automatic detection of cyberbullying in social media text’ accepted in PLoS ONE.
Supervising theses in Data Science (described by general topic): Gytha Muller — Improving Cross-Domain Distant Profiling, Ruben van de Kerkhof — Stylometric Features in Sarcasm Detection.
Teaching Assistant (provided the Decision Tree lecture) for Machine Learning (880083).
Taught the Fall semester of Data Mining for Business and Governance (880022).
Invited lecturer at the Vrije Universiteit, providing the Data Quality lecture for the Tax & Technology course.
Part of the local organisation committee for BNAIC2018/BENELEARN2018.
‘Style Obfuscation by Invariance’ accepted for COLING 2018.
Supervised Master theses in Data Science (described by general topic): Joey Dokter — Predicting Customer Conversion for In-House Advertisements (in collaboration with OnMarc), Alejandra Hernández Réjon — Gender Differences in Cyberbullying Detection, Kostas Stoitas — Evaluating Word Embeddings for Cyberbullying Detection, Tzoulian Prodromos Ninas — Investigating Cyberbullying and Toxicity, Anwar Amezoug — Out of Domain Performance of Profiling, Coen van Duijnhoven — Robustness of Simple Queries on Political Preference Detection, Martijn Oele — Linguistically Informed Local Changes for Author Obfuscation, Jan de Rooij — Replicating Style Obfuscation by Invariance.
Blog shared by KDnuggets, again:

Euclidean vs. Cosine Distance
"This post was written as a reply to a question asked in the Data Mining (https://t.co/oCLBFN8ELc) course: When to use the cosine similarity?" https://t.co/Wf3GjpWn0t pic.twitter.com/MOT7Pw2AE2
— KDnuggets (@kdnuggets) March 23, 2018

Submitted long paper to COLING 2018.
Part of the Scientific Committee of LREC 2018; reviewed for main conference and TA-COS workshop.
Presented ‘Attribute Obfuscation with Gradient Reversal’ at CLIN 28.
Taught the Spring semester of Data Mining for Business and Governance (880022).

2017

Blog shared by KDnuggets, twice:

Scikit-learn Pipeline Persistence and JSON Serialization Part II https://t.co/q0DdKRHR45 pic.twitter.com/WMQUqhsTf2
— KDnuggets (@kdnuggets) December 27, 2017

Scikit-learn Pipeline Persistence and JSON Serialization https://t.co/eLJ5jiPxRd pic.twitter.com/EiimZV6zz8
— KDnuggets (@kdnuggets) December 27, 2017

Presented ‘Distantly Supervised Prediction of Demographic Information on Social Media’ and tōku at the TiCC PhD Day 2017 and won the Best Demo Award.
Organized ATILA’17 in Tilburg.
Was part of the jury at the Xomnia Datathon against cyberbullying.
Taught Fall semester Data Mining for Business and Governance (880022).
Attended EMNLP 2017 in Copenhagen.
‘Simple Queries as Distant Labels for Detecting Gender on Twitter’ accepted for W-NUT 2017.
Organized informal NLP with Deep Learning summer school at Tilburg University.
Co-reviewed for CONLL 2017 and EMNLP 2017.
Submitted short paper to EMNLP 2017.
Co-reviewed for EACL 2017.
Supervised Master theses in Data Science: Evie Izeboud — Prediction of Job Transition Using Publicly Available Professional Profiles (in collaboration with 8vance), Thomas Rockx — Profile pictures as predictor for network size on Twitter, Vannesa Berhitoe — Multi-label emotion detection on Twitter.
Taught Spring semester Social Data Mining (880022).

2016

Built a concept version for the CS&AI group’s webpage.
Attended ATILA 2016.
Started teaching Fall semester Text Mining (880091) and Social Data Mining (880022) in context of the new Data Science master. Materials for both courses are open-source.
Moved to the CS&AI group at Tilburg University as part Lecturer, part joint PhD candidate with CLiPS at the University of Antwerp under the supervision of Eric Postma, Grzegorz Chrupała, and Walter Daelemans.
Attended the 2nd Summer School on Integrating Vision & Language: Deep Learning (iV&L).
Gave a LaTeX tutorial for the course Computational Models of Language Understanding.
‘Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource’ accepted for LREC 2016.
Part of the Scientific Committee of LREC 2016; reviewed for main conference and TA-COS workshop.

2015

Wrote a Python docstring to Markdown convertor (markdoc).
Made Omesa open-source.
Submitted ‘Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource’ to LREC.
Wrote an author profiling module and demo environment for AMiCA (profl).
Presented ‘Domain Adaptation of Simulated Data for Cyberbullying Research’ and demonstrated ‘Shed: a Framework for Reproducible Text Mining’ at ATILA 2015.
Organized ATILA 2015 and built its website.
Wrote a Python wrapper for Stanford’s Topic Modelling Toolbox (topbox).
Participated in the Lisbon Machine Learning School (LxMLS).
Presented ‘Modelling Discussion Topics to Improve News Article Tagging’ and demonstrated ‘Ebacs: a Minimalistic Conference Manager’ at DHBenelux.
Presented ‘Topic Modelling in Online Discussions’ at CLIN 2015 and accepted the STIL prize.
Accepted the Leo Coolen award at Tilburg University.

2014

Made the book of abstracts for CLIN25 and developed ec2latex in the process.
Attended ATILA 2014.
Gave a guest lecture for Tilburg University’s course Social Intelligence about my research and applying Text Mining for business purposes.
Started as a PhD candidate on the AMiCA project.