I'm Nay, currently a PhD candidate in Linguistics at Stanford. My research focuses on data-centric approaches to creating speech and language technologies for digitally underserved languages and user populations. In recent work, I have been examining how state-of-the-art models trained on major languages can be adapted for speech information retrieval in endangered language documentation and revitalisation. Selected projects:
- Replication experiments fine-tuning wav2vec 2.0 models on only 10 minutes of transcribed speech (a minimal fine-tuning sketch follows this list): https://github.com/fauxneticien/w2v2-10min-replication
- A pipeline to isolate and transcribe one language in mixed-language speech: https://github.com/CoEDL/vad-sli-asr
- A reference implementation of a query-by-example spoken term detection (QbE-STD) service (the basic matching step is sketched after this list): https://github.com/parledoct/qbestdocks
- An evaluation of feature extraction methods for QbE-STD in low-resource languages: https://github.com/fauxneticien/qbe-std_feats_eval
- QbE-STD using bottleneck features and a convolutional neural network: https://github.com/fauxneticien/bnf_cnn_qbe-std
- An R package for tidying lexicographical data in backslash-coded formats (a small illustration of such data follows this list): https://github.com/CoEDL/tidylex
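For the wav2vec 2.0 work, the core operation is fine-tuning a pretrained model with a CTC head on a very small amount of transcribed speech. Here is a minimal sketch of a single fine-tuning step using the HuggingFace `transformers` implementation; the checkpoint name, the dummy audio/transcript pair, and the optimiser settings are placeholders, and the replication repository has its own data preparation and training loop.

```python
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Placeholder checkpoint; the actual experiments may start from a different model.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
model.freeze_feature_encoder()  # keep the convolutional feature encoder fixed
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# One (audio, transcript) pair standing in for ~10 minutes of transcribed speech.
audio = torch.zeros(16_000).numpy()   # 1 second of dummy 16 kHz audio
transcript = "HELLO WORLD"            # this checkpoint's vocabulary is uppercase characters

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
labels = processor.tokenizer(transcript, return_tensors="pt").input_ids

outputs = model(inputs.input_values, labels=labels)  # CTC loss computed internally
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"CTC loss: {outputs.loss.item():.3f}")
```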
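The QbE-STD projects share the same underlying matching problem: given a short spoken query, decide whether (and where) it occurs in longer search audio by aligning frame-level feature sequences. The sketch below illustrates that step with MFCC features and subsequence dynamic time warping via `librosa`; the file names are placeholders, and the repositories above evaluate stronger representations (such as bottleneck features) and add classification and normalisation on top of the alignment, so treat this only as the simplest baseline form of the idea.

```python
import librosa

def qbe_std_score(query_wav: str, search_wav: str, sr: int = 16_000, n_mfcc: int = 20) -> float:
    """Return a similarity-style score for whether the query term occurs in the search audio."""
    query, _ = librosa.load(query_wav, sr=sr)
    search, _ = librosa.load(search_wav, sr=sr)

    # Frame-level features; the evaluation repository compares alternatives
    # (MFCCs, bottleneck features, self-supervised representations) in this role.
    q_feats = librosa.feature.mfcc(y=query, sr=sr, n_mfcc=n_mfcc)   # shape (n_mfcc, T_query)
    s_feats = librosa.feature.mfcc(y=search, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, T_search)

    # Subsequence DTW: the query may align to any contiguous region of the search audio.
    D, _ = librosa.sequence.dtw(X=q_feats, Y=s_feats, subseq=True, metric="cosine")

    # Cheapest end-of-alignment cost, normalised by query length;
    # negate so that higher scores mean a more likely match.
    return -float(D[-1, :].min()) / q_feats.shape[1]

# Hypothetical usage: rank recordings by how likely they contain the spoken query.
# scores = {wav: qbe_std_score("query.wav", wav) for wav in ["rec1.wav", "rec2.wav"]}
```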
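Backslash-coded (Toolbox/SFM-style) lexicon files mark each field with a code such as `\lx` (lexeme), `\ps` (part of speech), or `\de` (definition), one field per line. The sketch below uses made-up entries to show the kind of tidying involved: turning those lines into a long table with one row per record, line, code, and value. tidylex itself is an R package built around tidyverse conventions, so its actual interface differs from this Python illustration.

```python
import re
import pandas as pd

# Made-up entries illustrating the backslash-coded format (\code value, one field per line).
SAMPLE = """\\lx kanu
\\ps noun
\\de canoe
\\lx yara
\\ps verb
\\de to paddle"""

def tidy_backslash(text: str, record_marker: str = "lx") -> pd.DataFrame:
    """Turn backslash-coded lines into a long table: one row per (record, line, code, value)."""
    rows, record = [], 0
    for line_num, line in enumerate(text.splitlines(), start=1):
        match = re.match(r"\\(\S+)\s*(.*)", line)
        if not match:
            continue  # skip blank or malformed lines
        code, value = match.groups()
        if code == record_marker:
            record += 1  # each \lx line starts a new dictionary entry
        rows.append({"record": record, "line": line_num, "code": code, "value": value})
    return pd.DataFrame(rows)

print(tidy_backslash(SAMPLE))
```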