"Your living room is the final frontier for robots." - Cynthia Breazeal
My primary research interests lie in three areas:
Dialogue systems and their applications
Computational processing of figurative language and other forms of linguistic creativity
Grounded language learning
Furthermore, I am interested in deploying research in those areas to facilitate cognitive and psychological well-being, primarily via human-robot interactive systems. Below, I provide a brief overview of the projects in which I have been involved to date; for additional details, please refer to the relevant papers listed (if applicable) or send me an email. Most of my research has been conducted in the Human Intelligence and Language Technologies Laboratory, part of the UNT Department of Computer Science and Engineering.
Metaphor Novelty Scoring |
Metaphor Novelty Dataset(s) |
Automatically Aggregating Crowdsourced Labels |
Multimodal Language Grounding |
Sarcasm Detection |
Cybersecurity and EEG |
Crowdsourcing Hardware Mapping Algorithms
Metaphor Novelty Scoring
For my dissertation work, I developed a human-robot book discussion system that focuses its discussions on particularly novel or creative metaphors in the books being discussed. A central component of this project was developing an accurate metaphor novelty scoring model: the system needed to avoid asking questions about conventional metaphors (e.g., spending an hour on homework) and instead ask only about those that the reader was unlikely to have encountered on a regular basis (e.g., frowning like a thunderstorm). However, prior work on computational metaphor processing had confined itself to the problem of metaphor detection (that is, determining whether or not a fragment of text is a metaphor) rather than extending to the problem of determining how novel that metaphor might be.
I built a deep neural network to predict metaphor novelty for new word pairs along a continuous scale. I investigated a wide array of features for the task, in order to determine which data characteristics commonly used for detecting metaphors transferred well to this new scoring problem, as well as which types of features were particularly suitable for this task. I also compared my scoring model against a high-performing metaphor detection approach, modified such that it produced continuous labels rather than discrete 1s and 0s, to provide supporting evidence that scoring metaphor novelty is a distinct task from simple metaphor detection (as opposed to merely the same task solved with a regression model).
I found that my approach outperformed the standard metaphor detection model by more than 60%. I also found that a combination of syntactic (POS tags, syntactic relation type, and word distance) and semantic (word embeddings) features proved most beneficial for this task, while some features known to perform well in metaphor detection (concreteness, imageability, and sentiment) were not particularly useful for scoring metaphor novelty. Somewhat surprisingly, features based on learned topic models also fell into this category of less-useful features; this suggests that the same conceptual mappings are generating both conventional and novel metaphors, with these mappings merely being linguistically instantiated in different ways. Many more details about my work on automatically scoring metaphor novelty can be found here:
Natalie Parde and Rodney D. Nielsen. Exploring the Terrain of Metaphor Novelty: A Regression-based Approach for Automatically Scoring Metaphors. In the
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18). New Orleans, Louisiana, February 2-7, 2018.
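As a rough illustration of how the syntactic and semantic features mentioned above could be combined into a single input vector for one word pair, here is a minimal sketch. It is not the implementation from the paper: the tag inventories, the `WordPair` fields, and the `pair_features` function are hypothetical names chosen for this example, and the toy embeddings are made up.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical tag inventories for illustration only.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV"]
RELATIONS = ["nsubj", "dobj", "amod", "advmod"]


@dataclass
class WordPair:
    head: str
    dependent: str
    head_pos: str
    dependent_pos: str
    relation: str         # syntactic relation type linking the two words
    head_index: int       # token position of the head in the sentence
    dependent_index: int  # token position of the dependent in the sentence


def one_hot(value: str, inventory: List[str]) -> List[float]:
    """Encode value as a one-hot vector; unknown values become all zeros."""
    return [1.0 if value == item else 0.0 for item in inventory]


def pair_features(pair: WordPair,
                  embeddings: Dict[str, List[float]],
                  dim: int) -> List[float]:
    """Concatenate semantic (embedding) and syntactic features for a pair."""
    zero = [0.0] * dim  # fallback for out-of-vocabulary words
    features: List[float] = []
    features += embeddings.get(pair.head, zero)
    features += embeddings.get(pair.dependent, zero)
    features += one_hot(pair.head_pos, POS_TAGS)
    features += one_hot(pair.dependent_pos, POS_TAGS)
    features += one_hot(pair.relation, RELATIONS)
    # Word distance: number of token positions separating the two words.
    features.append(float(abs(pair.head_index - pair.dependent_index)))
    return features


# Toy two-dimensional embeddings (invented values).
toy_embeddings = {"frown": [0.1, 0.2], "thunderstorm": [0.3, 0.4]}
pair = WordPair("frown", "thunderstorm", "VERB", "NOUN", "nmod", 0, 3)
vector = pair_features(pair, toy_embeddings, dim=2)
```

A vector like this would then be passed to a regression model that outputs a continuous novelty score.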
Metaphor Novelty Dataset(s)
To evaluate my work on automatically scoring metaphor novelty, I built a large, publicly available dataset of syntactically-related word pairs labeled for metaphor novelty on a continuous scale. The word pairs were extracted from running text originating in the VU Amsterdam Metaphor Corpus, the most widely used metaphor detection dataset to date. The VUAMC comprises text fragments from news articles, academic publications, fiction narratives, and transcribed conversations; within the text fragments, individual words are labeled as metaphors. I extracted 18,439 syntactically-related pairs of nouns, verbs, adjectives, and adverbs in which at least one of the two words was originally tagged as a metaphor in the VUAMC, and collected annotations for the word pairs from Amazon Mechanical Turk workers on a scale from 0 to 3, with 3 meaning the word pair formed a highly novel metaphor. This dataset will be described in our upcoming LREC paper:
Natalie Parde and Rodney D. Nielsen. A Corpus of Metaphor Novelty Scores for Syntactically-Related Word Pairs. In the
Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). Miyazaki, Japan, May 7-12, 2018.
The dataset can be downloaded here. I am currently in the process of developing a complementary corpus of syntactically-related word pairs extracted from Project Gutenberg books, also labeled for metaphor novelty; when finished, this dataset will be publicly available as well.
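The pair extraction criterion described above (both words are nouns, verbs, adjectives, or adverbs; the words are syntactically related; at least one was tagged as a metaphor) can be sketched as a simple filter. This is only an illustration under assumed inputs: the `Token` type, the `candidate_pairs` function, and the hand-written dependency edges and metaphor flags below are hypothetical, not the actual extraction code or real VUAMC annotations.

```python
from typing import List, NamedTuple, Tuple

CONTENT_POS = {"NOUN", "VERB", "ADJ", "ADV"}


class Token(NamedTuple):
    text: str
    pos: str
    is_metaphor: bool  # metaphor tag carried over from the source corpus


def candidate_pairs(tokens: List[Token],
                    edges: List[Tuple[int, int]]) -> List[Tuple[str, str]]:
    """Keep (head, dependent) pairs of content words where at least one
    of the two words is tagged as a metaphor."""
    pairs = []
    for head_i, dep_i in edges:
        head, dep = tokens[head_i], tokens[dep_i]
        if head.pos not in CONTENT_POS or dep.pos not in CONTENT_POS:
            continue  # skip pairs involving function words
        if head.is_metaphor or dep.is_metaphor:
            pairs.append((head.text, dep.text))
    return pairs


# "She spent an hour on homework", with illustrative hand-set tags.
tokens = [
    Token("She", "PRON", False),
    Token("spent", "VERB", True),
    Token("an", "DET", False),
    Token("hour", "NOUN", False),
    Token("on", "ADP", False),
    Token("homework", "NOUN", False),
]
# (head index, dependent index) edges, written by hand for this example.
edges = [(1, 0), (1, 3), (3, 2), (1, 5), (5, 4)]
pairs = candidate_pairs(tokens, edges)
```

Each surviving pair would then be sent to annotators for a novelty rating.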
Automatically Aggregating Crowdsourced Labels
I collected multiple annotations for each word pair via Amazon Mechanical Turk when building my metaphor novelty dataset, so to determine the best "true" score for each instance, I had to decide how to best aggregate its multiple crowdsourced annotations. Particularly since determining the novelty of a given metaphor is a difficult, somewhat subjective task for humans, I wanted to avoid using standard label aggregation techniques like taking the average or majority annotation (both of these aggregation strategies could be skewed by confused or malicious workers).
Instead, I built a supervised regression model to predict "gold standard" label aggregations based on features extracted from the crowdsourced annotations themselves. These features included information primarily based on different aspects of label distribution and worker correlation. When I evaluated this new aggregation strategy against other common label aggregation techniques, on both my dataset and on third-party crowdsourcing datasets, I found that my method predicted aggregations closer to gold standard values than did other methods. My label aggregation dataset and my source code for this approach are available here, and many more details about the approach can be found in the paper here:
Natalie Parde and Rodney D. Nielsen. Finding Patterns in Noisy Crowds: Regression-based Annotation Aggregation for Crowdsourced Data. In the
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2017). Copenhagen, Denmark, September 7-11, 2017.
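To give a sense of what features "based on label distribution" might look like for a single instance, here is a minimal sketch using only the Python standard library. The specific features and the `distribution_features` name are assumptions made for this example; the actual feature set (including the worker correlation features, which are not shown here) is described in the paper.

```python
from collections import Counter
from statistics import mean, median, pstdev
from typing import Dict, List


def distribution_features(labels: List[int],
                          scale_max: int = 3) -> Dict[str, float]:
    """Summarize the crowdsourced labels for one instance.

    labels holds the integer annotations (0..scale_max) that different
    workers assigned to the same word pair.
    """
    counts = Counter(labels)
    n = len(labels)
    features = {
        "mean": float(mean(labels)),
        "median": float(median(labels)),
        "stdev": float(pstdev(labels)),  # population standard deviation
        "min": float(min(labels)),
        "max": float(max(labels)),
    }
    # Fraction of workers who chose each point on the scale.
    for value in range(scale_max + 1):
        features[f"share_{value}"] = counts[value] / n
    return features


# Five workers disagree sharply about one word pair.
features = distribution_features([0, 0, 1, 3, 3])
```

Feature dictionaries like this one, paired with gold standard scores for a subset of instances, could serve as training data for a supervised regressor that predicts the aggregated label for the rest.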
Multimodal Language Grounding
In addition to my dissertation project, a major project that I worked on throughout the course of my Ph.D. was "I Spy," which focuses on enabling robots to automatically ground language using local visual information captured during game-based interactions. My research advisor (Rodney D. Nielsen) and I originally conceptualized this project as part of a month-long summer school I attended in Athens, Greece, at the National Center for Scientific Research; I developed the original source code there in collaboration with researchers from N.C.S.R. as well as with visiting researchers from the University of Texas at Arlington. The project won both First Prize and People's Choice Award at the summer school, competing against projects developed by other teams of researchers from around the world.
The basic format of the project is such that a robot is first placed in front of an everyday object, and it captures images of the object from different angles and distances using its built-in cameras. A user then provides a natural-language description of the object. The robot parses the description and identifies keywords, or concepts, for which it then proceeds to build models grounded in visual features extracted from the images it captured of the object. So, for example, if a robot has learned about an apple and a mug and both were described as being red, the robot would at that time have images of those two objects represented in its mental model of the word "red." As it learned about additional red objects later on, this model would expand to include more diverse samples of "red" objects, over time allowing the robot to learn better grounded models.
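The way examples accumulate under each concept can be sketched as follows. This is a simplified illustration, not the project's code: the stopword list, the whitespace-based keyword extraction, the `ConceptStore` class, and the three-number "image features" are all stand-ins invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Tiny illustrative stopword list; a real parser would be far more careful.
STOPWORDS = {"a", "an", "the", "is", "it", "and", "with", "of", "this"}


def extract_concepts(description: str) -> Set[str]:
    """Pick out candidate keywords from a natural-language description."""
    words = (w.strip(".,!?").lower() for w in description.split())
    return {w for w in words if w and w not in STOPWORDS}


class ConceptStore:
    """Maps each concept word to the visual examples seen for it so far."""

    def __init__(self) -> None:
        self.examples: Dict[str, List[List[float]]] = defaultdict(list)

    def learn(self, description: str,
              image_features: List[List[float]]) -> None:
        # Every image of the object becomes an example of every concept
        # mentioned in its description.
        for concept in extract_concepts(description):
            self.examples[concept].extend(image_features)


store = ConceptStore()
store.learn("The apple is red.", [[0.9, 0.1, 0.1]])
store.learn("A red mug", [[0.8, 0.2, 0.1], [0.7, 0.1, 0.2]])
red_count = len(store.examples["red"])  # examples from both objects
```

After these two descriptions, the examples stored under "red" come from both the apple and the mug, mirroring the scenario described above.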
During a guessing game, the robot then attempts to determine which of a set of objects on the ground in front of it the human player has in mind. It does so by asking questions featuring the concepts it has learned (e.g., "Is it red?"), with the human providing positive or negative responses. These positive and negative responses, combined with images that the robot captured during the game itself, can be used as additional feedback for improving upon its existing language models. Eventually, the robot reaches a confidence threshold for one of the objects, and makes a guess; it subsequently either wins or loses the game.
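One simple way to picture the guessing loop is as a belief over the candidate objects that is updated after each yes/no answer until one object passes a confidence threshold. The sketch below is an assumption-laden illustration rather than the robot's actual decision procedure: the Bayes-style update, the alphabetical question order, the 0.8 threshold, the `play` function, and all probabilities are invented for this example.

```python
from typing import Callable, Dict, List, Tuple


def play(concept_probs: Dict[str, Dict[str, float]],
         answer: Callable[[str], bool],
         threshold: float = 0.8) -> Tuple[str, List[str]]:
    """Ask yes/no concept questions until one object is confident enough.

    concept_probs maps object -> concept -> estimated probability that the
    concept applies to the object (standing in for the learned models).
    answer(concept) returns the human player's yes/no response.
    """
    objects = list(concept_probs)
    belief = {obj: 1.0 / len(objects) for obj in objects}
    concepts = sorted({c for probs in concept_probs.values() for c in probs})
    asked: List[str] = []
    for concept in concepts:
        if max(belief.values()) >= threshold:
            break  # confident enough to guess
        asked.append(concept)
        said_yes = answer(concept)
        for obj in objects:
            p = concept_probs[obj].get(concept, 0.5)
            belief[obj] *= p if said_yes else 1.0 - p
        total = sum(belief.values())
        if total > 0:  # renormalize so the beliefs sum to one
            belief = {obj: b / total for obj, b in belief.items()}
    guess = max(belief, key=belief.get)
    return guess, asked


concept_probs = {
    "apple": {"red": 0.9, "round": 0.9, "handle": 0.05},
    "mug": {"red": 0.9, "round": 0.3, "handle": 0.95},
    "banana": {"red": 0.05, "round": 0.1, "handle": 0.05},
}
# The human player is thinking of the mug.
truth = {"red": True, "round": False, "handle": True}
guess, asked = play(concept_probs, lambda concept: truth[concept])
```

In this toy run, a single "yes" to the handle question already pushes the belief in the mug above the threshold, so the robot guesses after one question.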
Since returning from the summer school in Athens, I have continued working on this project with many of the undergraduate and high school students whom I mentor. Additional information about the project can be found in the following paper:
Natalie Parde, Adam Hair, Michalis Papakostas, Konstantinos Tsiakas, Maria Dagioglou, Vangelis Karkaletsis, and Rodney D. Nielsen. Grounding the Meaning of Words through Vision and Interactive Gameplay. In
Proceedings of the 2015 International Joint Conference on Artificial Intelligence. Buenos Aires, Argentina, July 25-31, 2015.
Sarcasm Detection
As a side project, I worked on developing a domain-general approach to sarcasm detection. I trained my sarcasm detection model using tweets that had been self-tagged by Twitter users as either #sarcasm (the positive class) or #happiness, #sadness, #anger, #fear, #disgust, or #surprise (the negative class; it was assumed that these tweets expressed emotion but in a non-sarcastic way), and applied it to sarcastic and non-sarcastic Amazon product reviews. I developed a variety of syntactic and semantic features for the task, including those based on word and sentence polarity, those based on pointwise mutual information, and those based on two different bag-of-words models. I experimented with models trained only on tweets, only on product reviews, on a combination of the two, and on a combination of the two with an added domain adaptation transformation step, finding that the last of these approaches worked best and outperformed prior sarcasm detection work that learned only from in-domain data. More information about this project, as well as a comprehensive error analysis, can be found in the following papers:
Natalie Parde and Rodney D. Nielsen. Detecting Sarcasm is Extremely Easy ;-). In the
Proceedings of the NAACL 2018 Workshop on Computational Semantics Beyond Events and Roles (SemBEaR 2018). New Orleans, Louisiana, June 5, 2018.
Natalie Parde and Rodney D. Nielsen. #SarcasmDetection is soooo general! Towards a Domain-Independent Approach for Detecting Sarcasm. In
Proceedings of the 30th International FLAIRS Conference. Marco Island, Florida, May 22-24, 2017.
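For readers unfamiliar with pointwise mutual information (PMI), it measures how much more often two events co-occur than would be expected if they were independent: PMI(x, y) = log2(P(x, y) / (P(x) P(y))). The sketch below computes PMI between a word and a class label over a toy corpus. How exactly PMI was turned into features for the sarcasm model is described in the papers; the `pmi` function, the choice of word-label pairs, and the example documents here are assumptions for illustration.

```python
import math
from typing import List, Set, Tuple


def pmi(word: str, label: str, docs: List[Tuple[Set[str], str]]) -> float:
    """Pointwise mutual information between a word and a class label.

    docs is a list of (set of words in the document, document label).
    Returns negative infinity if the word never occurs with the label.
    """
    n = len(docs)
    n_word = sum(1 for words, _ in docs if word in words)
    n_label = sum(1 for _, lab in docs if lab == label)
    n_joint = sum(1 for words, lab in docs if word in words and lab == label)
    if n_joint == 0:
        return float("-inf")
    return math.log2((n_joint / n) / ((n_word / n) * (n_label / n)))


# Invented toy corpus: two sarcastic and two non-sarcastic documents.
docs = [
    ({"great", "love"}, "sarcasm"),
    ({"great"}, "sarcasm"),
    ({"great"}, "other"),
    ({"sad"}, "other"),
]
love_score = pmi("love", "sarcasm", docs)    # occurs only with sarcasm
great_score = pmi("great", "sarcasm", docs)  # occurs with both classes
```

In this toy corpus, "love" receives a higher PMI with the sarcasm class than "great" does, because "great" also appears in a non-sarcastic document.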
Cybersecurity and EEG
In the past, I have occasionally collaborated with researchers from UNT's Center for Information and Cyber Security on NLP- or cognitive science-based aspects of work in the cybersecurity domain. In one recent study, I worked with these collaborators and additional researchers from UNT's College of Business and UNT's Department of Electrical Engineering to conduct an electroencephalogram (EEG) and eye tracking study to determine whether there are neural signatures or gaze patterns associated with the performance of malicious computer activity (e.g., hacking). My primary role was in data acquisition; I assisted in designing the experimental setup, ran participants, and monitored the EEG and eye tracking systems during data collection. Additional studies using the data collected are still underway, but one paper co-authored on this project recently won the Dr. Hermann Zemlicka Award for Most Visionary Paper at the Gmunden Retreat on NeuroIS; that paper can be found here:
Nabila Salma, Bin Mai, Kamesh Namuduri, Rasel Mamun, Yassir Hashem, Hassan Takabi, Natalie Parde, and Rodney Nielsen. Using EEG Signal to Analyze IS Decision Making Cognitive Processes. In Information Systems and Neuroscience, Lecture Notes in Information Systems and Organisation.
Crowdsourcing Hardware Mapping Algorithms
Finally, as an undergraduate I worked with researchers from UNT's Department of Electrical Engineering to develop UNTANGLED, an online puzzle game in which the puzzles are abstractions of real-world hardware mapping problems. The goal of the project is to harness human intuition to solve these hardware problems by crowdsourcing players' mapping strategies and using them to train machine learning models to identify optimal placement of logic blocks given new hardware constraints. During my time working on UNTANGLED, the game won the People's Choice Award in the Games & Apps category of the 2012-2013 International Science and Engineering Visualization Challenge, held by the National Science Foundation and Science magazine. Selected articles describing this project can be found here:
Gayatri Mehta, Krunal K. Patel, Natalie Parde, and Nancy S. Pollard. "Data-driven mapping using local patterns."
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32.11 (2013): 1668-1681.
Gayatri Mehta, Carson Crawford, Xiaozhong Luo, Natalie Parde, Krunal Patel, Brandon Rodgers, Anil Sistla, Anil Yadav, and Marc Reisner. "UNTANGLED – A game environment for discovery of creative mapping strategies."
ACM Transactions on Reconfigurable Technology and Systems 6.3 (2013): 1-26.