pitkant/ATP_kurssi

 
 

ATP_kurssi

This page contains all the materials for the course KKLT0030 Automatic text processing (5 credits).

Private materials, such as announcements and any recordings, are on the course Moodle page: https://moodle.utu.fi/course/view.php?id=29596

Mon Oct 23

  • Getting started
  • Notebook 1
  • Commands
    • Getting data and printing stuff: wget, echo
    • Printing files: cat, head, tail
    • Copying, renaming, removing: cp, rm, mv
    • Others: wc -w, ls
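
The commands listed above can be tried together in one short session. This is a minimal sketch, not the notebook's own content: the file names and the sample text are made up for illustration, and the file is created with printf instead of being downloaded with wget so that the example runs without a network connection.

```shell
# Work in a throwaway directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# In class the data would typically come from: wget <URL>
# Here a small sample file (hypothetical content) is created locally.
printf 'first line here\nsecond line\nthird line\nfourth line\n' > sample.txt

echo "Sample file created"   # echo prints its arguments
cat sample.txt               # print the whole file
head -n 2 sample.txt         # print the first two lines
tail -n 1 sample.txt         # print the last line

cp sample.txt copy.txt       # copy a file
mv copy.txt renamed.txt      # rename (move) a file
rm sample.txt                # remove a file

wc -w renamed.txt            # count the words in a file
ls                           # list the files in the current directory
```

After this session only renamed.txt is left in the directory, because the original was removed with rm.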

Thur Oct 26

  • Notebook 2
  • Commands: egrep, sort, uniq
  • Options
    • egrep -v, -i, -w, -c, -B, -A
    • head -n, tail -n
    • wc -l, -w
    • uniq -c, sort -r, -n
  • Pipes, especially frequency counts
    • sort | uniq -c | sort -rn
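
As a rough illustration of the options and the frequency count pipeline above, the following sketch runs them on a tiny made-up file (lines.txt and its content are assumptions, not course data). It uses grep -E, which is equivalent to egrep; recent GNU grep versions print a warning when egrep is called directly.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data: one short phrase per line.
printf 'the cat\nThe dog\na cat\nthe cat\na bird\n' > lines.txt

grep -E -i 'the' lines.txt   # -i: ignore case, also matches "The dog"
grep -E -v 'cat' lines.txt   # -v: print the lines that do NOT match
grep -E -c 'cat' lines.txt   # -c: print only the number of matching lines
grep -E -w 'a' lines.txt     # -w: match "a" only as a whole word

# Frequency list: sort brings identical lines together, uniq -c counts
# each run of identical lines, sort -rn orders the counts from high to low.
freq=$(sort lines.txt | uniq -c | sort -rn)
echo "$freq"

# The most frequent line is now at the top.
echo "$freq" | head -n 1
```

The first sort is needed because uniq only merges identical lines that are adjacent.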

Mon Oct 30

  • Notebook 3 exercises

Thur Nov 2

  • Notebook 4
  • git clone for cloning GitHub repositories
  • Gzipped files using gzip and zcat
  • Changing characters using tr
    • Combining tr with a frequency list pipeline
    • Using tr to normalize
  • Regular expressions
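
The topics above fit into a single pipeline. The sketch below is one possible way to combine them, using a made-up two-sentence file (text.txt is an assumption). It assumes GNU zcat as found on typical Linux systems; gzip -dc does the same job where zcat behaves differently.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample text.
printf 'The cat saw the dog.\nThe dog saw THE cat!\n' > text.txt

gzip text.txt                  # compresses the file into text.txt.gz
zcat text.txt.gz | head -n 1   # read the compressed file without unpacking it

# Normalize with tr: first lowercase everything, then turn every run of
# non-letter characters into a single newline, giving one token per line.
tokens=$(zcat text.txt.gz | tr 'A-Z' 'a-z' | tr -cs 'a-z' '\n')

# Feed the tokens into the frequency list pipeline.
freq=$(echo "$tokens" | sort | uniq -c | sort -rn)
echo "$freq"

# A regular expression: tokens that start with c or d.
echo "$tokens" | grep -E '^(c|d)'
```

Without the lowercasing step, "The", "the" and "THE" would be counted as three different words.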

Mon Nov 6

  • Notebook 5 exercises

Thur Nov 9

  • Notebook 6
  • Dependency syntax analysis pipeline
  • Sentence + token segmentation, lemmatisation, POS, dependencies
  • CoNLL-U format
  • Universal Dependencies treebanks
  • Trankit parser
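
To make the CoNLL-U format more concrete, here is a small sketch with a hand-written sentence (not parser output, and not taken from any treebank). CoNLL-U has one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with #, and an empty line after each sentence. The cut command used here is an addition for illustration and may not be the tool used in the notebook.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# A hypothetical one-sentence CoNLL-U file, written by hand.
printf '# text = Dogs bark.\n' > sample.conllu
printf '1\tDogs\tdog\tNOUN\t_\tNumber=Plur\t2\tnsubj\t_\t_\n' >> sample.conllu
printf '2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n' >> sample.conllu
printf '3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_\n' >> sample.conllu
printf '\n' >> sample.conllu

# Token lines start with a number, so this skips comments and empty lines.
# Columns 2, 3 and 4 are the word form, the lemma and the universal POS tag.
grep -E '^[0-9]' sample.conllu | cut -f 2,3,4

# Frequency list of the POS tags, using the same pipeline as for words.
upos=$(grep -E '^[0-9]' sample.conllu | cut -f 4 | sort | uniq -c | sort -rn)
echo "$upos"
```

Because the format is plain tab-separated text, the same commands work unchanged on full treebank files.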

Mon Nov 13

  • Notebook 7
  • Running Python scripts

Thur Nov 16

  • Notebook 8
  • Working on the server (Note that the exam will be on the server!)

Mon Nov 20

  • Notebook 8 cont'd
  • Scripts

Thur Nov 23

  • Notebook 9

Mon Nov 27

  • Notebook 9

Thur Nov 30

  • Notebook 10
  • For loops
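
A for loop repeats the same commands for each item in a list, for example for each file in a directory. The sketch below assumes that the loops in question are shell for loops; the file names and contents are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical input files.
printf 'a b c\n' > one.txt
printf 'd e\nf\n' > two.txt

total=0
# The pattern *.txt expands to every matching file name in this directory.
for f in *.txt; do
    n=$(wc -w < "$f")          # word count of the current file
    echo "$f: $n words"
    total=$((total + n))       # add it to the running total
done
echo "total: $total"
```

The body between do and done runs once per file, with the variable f holding the current file name.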

Mon Dec 4

Thur Dec 7

  • Exam, option 1
  • 14.00-16.00 (TBA)

Thur Dec 14

  • Exam, option 2
  • 14.00-16.00 (TBA)
