CS 421: Natural Language Processing
Fall 2023
Contact Information
- Professor: Natalie Parde (parde@uic.edu)
- Office Hours: Tuesday 3:00 - 5:00 p.m. CST
- Teaching Assistants:
  - Usman Shahid (Office Hours: Thursday 12:00 - 2:00 p.m. CST in SELW 4029)
  - Eli Whitehouse (Office Hours: Wednesday 1:00 - 3:00 p.m. CST)
- Piazza: https://piazza.com/uic/fall2023/cs421
What is this class about?
Natural language processing (NLP) is the subfield of artificial intelligence that focuses on automatically understanding and generating natural language (e.g., Arabic, Navajo, Spanish, or English). It is crucial to many everyday applications: if you've searched for something online or talked to one of your devices today, you've already made use of many different NLP technologies. This class will provide an introduction to the foundations and most popular applications of natural language processing, through a combination of readings, lectures, short assignments, and projects. Topics covered will include text preprocessing, part-of-speech tagging, syntactic and dependency parsing, language modeling, word representations, text classification, and dialogue systems, among others.
Textbooks
Readings, learning content, and (some) assignments for this class will be drawn from the following source:
- Daniel Jurafsky and James H. Martin. Speech and Language Processing (3rd Edition). Draft, 2023.
This textbook is still being written; its current draft can be freely accessed at the link above.
Deliverables
This is a 400-level course, designed for both graduate students and advanced undergraduates. Depending on your classification, you may have enrolled in either the four-hour version (grad students) or the three-hour version (undergrad students). There are slightly different requirements for the two versions of the course, with the biggest difference being that students in the four-hour version will be required to complete a semester-long research study. Undergrads may opt to complete this component as well if they would like, in which case their final course grade will be determined according to the same breakdown as that used for graduate students; however, doing this extra work is certainly not a requirement. Some further details about the work you will be expected to complete for this course are provided below:
- Python Bootcamp (Assignment 0): One introductory coding "bootcamp" assignment is due before the first standard deliverable, to ensure the necessary Python proficiency.
- Assignments: Four assignments will be due over the course of the semester (due dates are indicated on the course calendar). These assignments will contain a mix of theoretical and coding questions. Code should be written in Python.
- Project: All students will complete a semester-long project, divided into three deliverables (due dates are indicated on the course calendar). Code, when applicable, should be written in Python.
- Research Study: Graduate students (and any undergraduates who choose to do so) will complete a semester-long study pertaining to research reproducibility and evaluation, due the week before finals week. For the study, students will select one of two options: (Option A) analyzing the reproducibility of an existing NLP research paper, or (Option B) investigating the reproducibility or evaluation errors that one might face when using pretrained language models. These studies can be completed individually or in pairs; if done in pairs, the submission must be accompanied by a statement detailing which component(s) each student worked on, signed by both students.
Grading rubrics will be posted in the deliverables' descriptions. Final course grades will be determined according to the following breakdowns:
- Undergraduate Students:
  - Python Bootcamp (Assignment 0): 5%
  - Project: 39% (13% for each deliverable)
  - Assignments: 56% (14% for each assignment)
- Graduate Students:
  - Python Bootcamp (Assignment 0): 4%
  - Project: 30% (10% for each deliverable)
  - Assignments: 48% (12% for each assignment)
  - Research Study: 18% (4% for the presentation, 10% for the report, and 4% for the source code)
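As a rough illustration of how the breakdowns above combine into a single percentage, here is a minimal Python sketch. The function name, dictionary keys, and example scores are hypothetical and not part of the course materials. It assumes each component is scored on a 0-100 scale and weighted by the listed percentage. Rounding, late penalties, and letter-grade cutoffs are not specified here, so the sketch leaves them out.

```python
# Weights (in percent) copied from the grading breakdowns above.
# The dictionary keys are illustrative names, not official identifiers.
UNDERGRAD_WEIGHTS = {
    "bootcamp": 5,
    "project": 39,         # 3 deliverables x 13%
    "assignments": 56,     # 4 assignments x 14%
}
GRAD_WEIGHTS = {
    "bootcamp": 4,
    "project": 30,         # 3 deliverables x 10%
    "assignments": 48,     # 4 assignments x 12%
    "research_study": 18,  # presentation 4% + report 10% + source code 4%
}


def final_percentage(scores, weights):
    """Return the weighted average of component scores (each on a 0-100 scale)."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    # Components not listed in `weights` (e.g., an undergraduate who skipped
    # the research study) are simply ignored.
    return sum(scores[name] * weight for name, weight in weights.items()) / 100


# Hypothetical component averages for one student.
example_scores = {
    "bootcamp": 100,
    "project": 90,
    "assignments": 80,
    "research_study": 85,
}
print(final_percentage(example_scores, GRAD_WEIGHTS))
```

An undergraduate who opts into the research study would be scored with `GRAD_WEIGHTS`, matching the note above that their grade then follows the graduate breakdown.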
Schedule
Below is a list of course topics, readings, deadlines, and slides by week. The version of the schedule here is subject to change. All deliverables are due by 12:00 p.m. (noon) CST on the specified due date.
Week | Topic | Readings | Deliverables | Slides
8/21-8/25 | Introduction and Dialogue Systems and Chatbots | Chapter 15 | — | Download
8/28-9/1 | Text Preprocessing and Edit Distance | Chapter 2 | — | Download
9/4-9/8 | N-Gram Language Models, Naive Bayes, and Evaluating Text Classifiers | Chapters 3 and 4 | Assignment 0 (9/8; Recommended much sooner!); Assignment 1 (9/8) | Download, Download
9/11-9/15 | Logistic Regression and Intro to Vector Semantics | Chapters 5 and 6 | — | Download
9/18-9/22 | Word Embeddings and Feedforward Neural Networks | Chapters 6 and 7 | Assignment 2 (9/22) | Download
9/25-9/29 | Overview of Deep Learning for NLP and Reproducibility Workshop Day | Chapters 9-11 (just skim!) | — | Download
10/2-10/6 | Hidden Markov Models and Part-of-Speech Tagging | Appendix A and Chapter 8 (8.1-8.4) | Project Part 1 (10/6) | Download
10/9-10/13 | Constituency Grammars and Constituency Parsing | Chapter 17 and Appendix C | — | Download
10/16-10/20 | Dependency Parsing and Logical Representations of Sentence Meaning | Chapters 18 and 19 | Assignment 3 (10/20) | Download
10/23-10/27 | Relation and Event Extraction and Temporal Reasoning | Chapters 21 and 22 | Project Part 2 (10/27) | Download
10/30-11/3 | Word Senses and WordNet and Semantic Role Labeling | Chapters 23 and 24 | — | Download
11/6-11/10 | Lexicons for Sentiment, Affect, and Connotation and Linguistic Background for Coreference Resolution | Chapters 25 and 26 | Assignment 4 (11/10) | Download
11/13-11/17 | Coreference Resolution and Discourse Coherence | Chapters 26 and 27 | — | Download
11/20-11/22 | Co-Working Day | — | Project Part 3 (11/22) | —
11/27-12/1 | Research Study Presentations | — | Videos (11/27) or Presentations (Tuesday/Thursday); Source Code and Report (12/1) | —
12/4-12/8 | — | — | — | —
Final Notes
This website is provided partially for student convenience, partially for my own record-keeping purposes, and partially for the benefit of others who are not able to enroll in the course but who may find the content interesting for one reason or another. It is not a substitute for the course pages on Blackboard and Gradescope, or the course discussion board on Piazza! Please refer to those sources for copies of the full syllabus, assignments, grading rubrics, submission links, and other useful information. If you are not enrolled in the course but would like to request access to those materials, please send me an email introducing yourself and explaining why you would like to have access to them.
Happy studying!