Instructor: Valentina Gonzalez-Rostani
Professor Contact Details: gonzalez.rostani@usc.edu, https://gonzalez-rostani.com/, Office DMC 303
Semester: Fall 2025
Meeting Time and Place: Tuesday 2-4:50 pm, location TBD
Office hours: Monday 8am-10am and by appointment
The course begins by positioning text-as-data methods within political science and computational social science. We'll discuss research questions that textual data can address, along with opportunities, limitations, and ethical and methodological challenges of data collection and processing.
We'll then cover essential preprocessing techniques—tokenization, stemming, and stopword removal—and methods for numerical text representation (bag-of-words, word embeddings, and association measures). These concepts will be taught through a balance of theory and hands-on coding exercises.
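As a rough preview of what these steps look like in practice, here is a minimal sketch in plain Python. The stopword list, the suffix-stripping "stemmer", and the two example sentences are all made up for illustration; in class we would more likely rely on an established library than on hand-rolled helpers like these.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, drop stopwords, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]

def bag_of_words(docs):
    """Return (vocabulary, document-term count matrix) for a list of texts."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    matrix = []
    for tokens in token_lists:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix

# Made-up example documents.
docs = [
    "The senators are debating the budget.",
    "Voters debated taxes and the budget.",
]
vocab, dtm = bag_of_words(docs)
print(vocab)
print(dtm)
```

Each row of the resulting matrix is one document and each column one vocabulary term, so "debating" and "debated" end up counted in the same column once they are stemmed.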
Next, we'll explore major approaches to measuring social science concepts. These include rule-based methods, supervised learning techniques, and unsupervised methods such as topic modeling and matrix decomposition, with particular emphasis on validation strategies.
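To make the idea of a rule-based measure and its validation concrete, the sketch below flags texts that mention the economy using a keyword dictionary, then compares the result against hand-coded labels. The dictionary, the sentences, and the labels are all hypothetical and chosen only to show the mechanics.

```python
import re

# Hypothetical keyword dictionary for one concept ("economy"); illustrative only.
ECONOMY_TERMS = {"tax", "taxes", "budget", "inflation", "jobs", "wages"}

def mentions_economy(text, min_hits=1):
    """Rule: a text is about the economy if it contains enough dictionary terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = sum(1 for t in tokens if t in ECONOMY_TERMS)
    return hits >= min_hits

# Made-up sentences with made-up hand labels (True = about the economy).
labeled = [
    ("We must cut taxes and balance the budget.", True),
    ("Inflation is hurting working families.", True),
    ("Our schools need more teachers.", False),
    ("The cost of living keeps rising.", True),  # no dictionary term appears
]

predictions = [mentions_economy(text) for text, _ in labeled]
truth = [label for _, label in labeled]

# Validation against the hand labels: precision and recall.
tp = sum(1 for p, t in zip(predictions, truth) if p and t)
fp = sum(1 for p, t in zip(predictions, truth) if p and not t)
fn = sum(1 for p, t in zip(predictions, truth) if not p and t)
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The last sentence is about the economy but contains no dictionary term, so the rule misses it. Catching that kind of failure is what validation against human coding is for.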
The latter half of the semester turns to recent deep learning advances in NLP. We'll cover logistic regression for text classification, embeddings, pretrained language models, and transformer architectures such as BERT. Although deep learning won't be the main focus of the course, it is essential knowledge for modern text analysis.
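For a sense of where this part of the course starts, here is a minimal sketch of logistic regression for text classification, written in plain Python with stochastic gradient descent rather than a library implementation. The four training sentences, their labels, and the learning-rate and epoch settings are assumptions made purely for illustration.

```python
import math
import re

# Made-up training sentences with made-up labels: 1 = economy, 0 = education.
TRAIN = [
    ("cut taxes and grow jobs", 1),
    ("taxes hurt wages", 1),
    ("fund schools and teachers", 0),
    ("teachers need better schools", 0),
]

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Vocabulary: every distinct token seen in training, in sorted order.
VOCAB = sorted({tok for text, _ in TRAIN for tok in tokenize(text)})

def featurize(text):
    """Bag-of-words counts over VOCAB; words outside VOCAB are ignored."""
    tokens = tokenize(text)
    return [float(tokens.count(term)) for term in VOCAB]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, lr=0.1, epochs=300):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(VOCAB)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = featurize(text)
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - label  # gradient of the log loss with respect to the logit
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_proba(text, weights, bias):
    """Estimated probability that the text belongs to class 1 (economy)."""
    x = featurize(text)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

weights, bias = train(TRAIN)
p_economy = predict_proba("higher wages, more jobs", weights, bias)
p_education = predict_proba("better schools and more teachers", weights, bias)
print(round(p_economy, 3), round(p_education, 3))
```

The model is just a weighted sum of word counts passed through a sigmoid. The neural models we cover afterward can be read as replacing these hand-built count features with learned representations.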
We'll conclude with an exploration of large language models (LLMs), such as OpenAI's GPT models and Meta's Llama, examining both their opportunities and challenges.