This page contains all the materials for the course KKLT0030 Automatic text processing (5 credits).
The course Moodle page has private materials, such as possible recordings and announcements.
- Getting started
- Notebook 1
- Commands
- Getting data and printing stuff: wget, echo
- Printing files: cat, head, tail
- Copying, renaming, removing: cp, mv, rm
- Others: wc -w, ls
- Notebook 2
- Commands: egrep, sort, uniq
- Options
- egrep -v, -i, -w, -c, -B, -A
- head -n, tail -n
- wc -l, -w
- uniq -c, sort -r, -n
- Pipes, especially frequency counts
- sort | uniq -c | sort -rn
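The frequency-count pipeline from Notebook 2 can be sketched as follows. This is a minimal illustration, not the notebook's own code: the file `words.txt` and its contents are made up, and `grep -E` is used as the equivalent of `egrep`.

```shell
# Hypothetical input: one word per line (file name and contents are invented).
printf 'the\ncat\nthe\ndog\nthe\ncat\nsat\n' > words.txt

# sort groups identical lines together, uniq -c counts each group,
# and sort -rn orders the counts numerically, largest first.
freq=$(sort words.txt | uniq -c | sort -rn)
echo "$freq"

# grep -E is the same as egrep; -w matches whole words only,
# -c prints the number of matching lines instead of the lines.
the_count=$(grep -E -w -c 'the' words.txt)
echo "$the_count"
```

The first `sort` matters because `uniq` only merges adjacent identical lines. After the final `sort -rn`, the most frequent word is on the first line.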
- Notebook 3 exercises
- Notebook 4
- git clone for cloning GitHub repositories
- Gzipped files using gzip and zcat
- Changing characters using tr
- Combining tr to a frequency list pipeline
- Using tr to normalize
- Regular expressions
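As a rough sketch of how the Notebook 4 topics fit together, the example below reads a gzipped file, normalizes it with `tr`, and feeds the result into the frequency list pipeline. The file name `sample.txt.gz` and its text are invented for illustration, and the normalization shown (lowercasing, splitting on non-letters) is only one possible choice that assumes plain ASCII text.

```shell
# Hypothetical gzipped sample (file name and text are made up).
printf 'The cat. The DOG, the cat!\n' | gzip > sample.txt.gz

# zcat writes the decompressed text to standard output.
# First tr: lowercase everything.
# Second tr: -c selects everything that is NOT a-z, -s squeezes each run
# of such characters into a single newline, giving one word per line.
freq=$(zcat < sample.txt.gz \
  | tr 'A-Z' 'a-z' \
  | tr -cs 'a-z' '\n' \
  | sort | uniq -c | sort -rn)
echo "$freq"
```

Without the lowercasing step, `The` and `the` would be counted as two different words.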
- Notebook 5 exercises
- Notebook 6
- Dependency syntax analysis pipeline
- Sentence + token segmentation, lemmatisation, POS, dependencies
- CoNLL-U format
- Universal Dependencies treebanks
- Trankit parser
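To make the CoNLL-U format more concrete, here is a small sketch that inspects a parsed sentence with command-line tools. The sentence is hand-written for illustration and is not actual parser output. The `cut` command is an assumption here, as it is not among the commands listed for the earlier notebooks. In CoNLL-U, each token is one line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines start with `#`, and sentences are separated by blank lines.

```shell
# Hand-written toy sentence in CoNLL-U (hypothetical file name).
printf '# text = Dogs bark.\n' > toy.conllu
printf '1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n' >> toy.conllu
printf '2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n' >> toy.conllu
printf '3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n\n' >> toy.conllu

# Drop comment lines and blank lines, then keep column 3 (LEMMA).
lemmas=$(grep -E -v '^(#|$)' toy.conllu | cut -f 3)
echo "$lemmas"

# Same filtering, but count the part-of-speech tags in column 4 (UPOS).
upos=$(grep -E -v '^(#|$)' toy.conllu | cut -f 4 | sort | uniq -c | sort -rn)
echo "$upos"
```

Because the columns are tab-separated, `cut -f` works with its default delimiter.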
- Notebook 7
- Running Python scripts
- Notebook 8
- Working on the server (Note that the exam will be on the server!)
- Notebook 8 cont'd
- Scripts
- Notebook 9
- Notebook 10
- For loops
- Recap
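A for loop of the kind covered in Notebook 10 might look like the sketch below, which repeats the same command for several files. The file names and contents are invented for illustration.

```shell
# Two hypothetical input files.
printf 'one two\n' > loop_a.txt
printf 'three four five\n' > loop_b.txt

# Run the same word count for each file in turn.
# tr -d ' ' strips the padding that some versions of wc add.
for f in loop_a.txt loop_b.txt; do
    n=$(wc -w < "$f" | tr -d ' ')
    echo "$f: $n words"
done > counts.txt

cat counts.txt
```

Redirecting after `done` collects the output of the whole loop into a single file.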
- Notebook 11
- If Notebook 11 is not accessible from the GitHub repo, you can use this version: https://colab.research.google.com/drive/16EFdusy496svEkMSxTvP0pHtzTZikHer?usp=sharing
- Exam, option 1
- (TBA)
- Exam, option 2
- (TBA)