Welcome! This course introduces social scientists to computational techniques for analyzing large-scale textual data. Vast amounts of text, from policy speeches and legislative documents to social media and news archives, now shape our understanding of the world, and this course equips students with the methodological tools to extract meaningful insights from such unstructured data. Bridging natural language processing (NLP), machine learning, Bayesian statistics, and the social sciences, the course focuses on practical applications in areas such as political discourse analysis, sentiment detection, and policy communication. Students will learn essential preprocessing techniques (e.g., tokenization, stemming, stopword removal), text representation methods (bag-of-words, word embeddings), and advanced modeling approaches, including supervised learning, topic modeling, and deep learning techniques such as BERT and large language models (LLMs) like OpenAI's GPT models and Llama. By integrating hands-on coding exercises with theoretical discussions, the course prepares students to engage with textual data critically and to apply these techniques to real-world social science research while considering the ethical and methodological challenges of working with text-based data.

Class Location TBD
Class Hours Tuesday 2:00-4:50 PM
Instructor Prof. Valentina Gonzalez-Rostani
Email Address gonzalez.rostani@usc.edu
Office Hours Tuesday 8:00-10:00 AM (by appointment)
Course Website Link

Announcements

Welcome to Text as Data for Social Scientists!

<aside> 👋🏽

There are readings for the first week. Please take a look at them and review the syllabus!

</aside>

Course information

Syllabus: Text as Data for Social Scientists

Class Directory

Course Schedule

Classes