I am a Ph.D. candidate in the Language Technologies Institute at Carnegie Mellon University, advised by Graham Neubig.
My main research interest is natural language processing (NLP), and I have worked on NLP tasks across a broad range of domains and languages. My current work focuses on developing models for multilingual and low-resource NLP.
I was recently named to the Forbes 30 Under 30 in Science list for my work on NLP for endangered languages!
In the past, I’ve worked as a research intern at Bloomberg AI and as a research fellow at Microsoft Research.
More information about my work experience, publications, and academic service can be found in my CV.
I am best reached by email at email@example.com. Feel free to reach out about my research or anything else I might be able to help with. I’m always happy to answer questions about getting started with NLP research and applying to Ph.D. programs.
A full list of my publications can be found here.
OCR-EL: optical character recognition for low-resource and endangered languages. [Webpage] [Software] [Papers: 1, 2]
Temporally-aware NER: measuring the effect of temporal drift on named entity recognition. [Dataset] [Paper]
Low-resource entity extraction and linking: using cross-lingual transfer, multilingual knowledge bases, and phonological representations. [Entity linking software] [NER software] [Papers: 1, 2, 3, 4, 5]
Print and probability: OCR models to discover the printers of a 375-year-old document, John Milton’s Areopagitica, one of the most significant documents in the history of the freedom of the press. [Press coverage] [CMU blog coverage] [SoFCB Essay Prize 2021] [Paper]
- Workshop on Computational Methods for Endangered Languages at ACL 2022 [link]
- Student Research Workshop at ACL 2020 [link]
- CMU SCS Graduate Application Support Program, 2020
- CMU LTI Diversity, Equity, and Inclusion Committee [link]
- Diversity and Inclusion Committee at NAACL 2019 [link]
- CMU Language Technologies Mentoring Program (for new graduate students; 2021)
- CMU Graduate Application Support Mentor (2020, 2021)
- CMU AI Mentoring Program (for undergraduates; 2019, 2020, 2021)
- AAAI 2022, AAAI 2021, ARR 2021, EACL 2021, NAACL 2021, ACL 2021, AmericasNLP 2021, AAAI 2020, HAMLETS 2020, LREC 2020, EMNLP 2020, *SEM 2020, AACL SRW 2020, AfricaNLP 2020, TALLIP 2019, CALCS 2018
Lexically-Aware Semi-Supervised Learning for OCR Post-Correction
S. Rijhwani, D. Rosenblum, A. Anastasopoulos, G. Neubig
MasakhaNER: Named Entity Recognition for African Languages
D. I. Adelani et al., including S. Rijhwani
Evaluating the Morphosyntactic Well-formedness of Generated Texts
A. Pratapa, A. Anastasopoulos, S. Rijhwani, A. Chaudhary et al.
Dependency Induction Through the Lens of Visual Perception
R. Su, S. Rijhwani, H. Zhu, J. He, X. Wang, Y. Bisk, G. Neubig
OCR Post-Correction for Endangered Language Texts
S. Rijhwani, A. Anastasopoulos, G. Neubig
Soft Gazetteers for Low-Resource Named Entity Recognition
S. Rijhwani, S. Zhou, G. Neubig, J. Carbonell
Temporally-Informed Analysis of Named Entity Recognition
S. Rijhwani and D. Preotiuc-Pietro
Improving Candidate Generation for Low-resource Cross-lingual Entity Linking
S. Zhou, S. Rijhwani, J. Wieting, J. Carbonell, G. Neubig
AlloVera: A Multilingual Allophone Database
D. R. Mortensen, X. Li, P. Littell, A. Michaud, S. Rijhwani et al.
Damaged Type and Areopagitica’s Clandestine Printers
C. N. Warren, P. Williams, S. Rijhwani, M. G’Sell
Milton Studies, 2020
A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization
G. Neubig, S. Rijhwani, A. Palmer, J. MacKenzie, H. Cruz, X. Li, M. Lee et al.
First Joint SLTU and CCURL Workshop, 2020
Practical Comparable Data Collection for Low-Resource Languages via Images
A. Madaan, S. Rijhwani, A. Anastasopoulos, Y. Yang, G. Neubig
Practical Machine Learning for Developing Countries Workshop, 2020
Zero-shot Neural Transfer for Cross-lingual Entity Linking
S. Rijhwani, J. Xie, G. Neubig, J. Carbonell
Choosing Transfer Languages for Cross-Lingual Learning
Y. Lin, C. Chen, J. Lee, Z. Li, Y. Zhang, M. Xia, S. Rijhwani, J. He et al.
Towards Zero-resource Cross-lingual Entity Linking
S. Zhou, S. Rijhwani, G. Neubig
Workshop on Deep Learning Approaches for Low-Resource NLP, 2019
Parser Combinators for Tigrinya and Oromo Morphology
P. Littell, T. McCoy, N. Han, S. Rijhwani, Z. Sheikh, D. Mortensen, T. Mitamura, L. Levin
Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique
S. Rijhwani, R. Sequiera, M. Choudhury, K. Bali, C. S. Maddila
Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations
P. Michel*, A. Ravichander*, S. Rijhwani*
Second Workshop on Representation Learning for NLP, 2017
Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages
M. Yoder, S. Rijhwani, C. Rosé, L. Levin
Second Workshop on NLP and Computational Social Science, 2017
Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?
K. Rudra, S. Rijhwani, R. Begum, K. Bali, M. Choudhury, N. Ganguly
Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text
S. Sitaram, S. K. Rallabandi, S. Rijhwani, A. W. Black
Ninth ISCA Speech Synthesis Workshop (SSW), 2016