Word frequency lists (top 100k) from BNC/ANC/COCA, in DSL format, for GoldenDict

jjzz/ZZ-WordFreq

ZZ WordFreq

Current Version

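  • WordFreq updated to v0.3

    The original BNC data comes from Adam Kilgarriff and is now labeled BNC.AK.

    This release adds BNC data from Paul Nation, labeled BNC.PN. It organizes all words by family and groups the families by frequency into bands of 1000 word families each: 14 bands in total, covering the 14000 most common word families and nearly 50000 actual words (including singular and plural forms and so on). For example, society/societal/societies all have a frequency number of 1000, which means this family belongs to the 1000 most common word families.

    btw: the source of the "BNC Top-15000" version is unclear, and it is now deprecated.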
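The BNC.PN banding can be sketched in Python as follows. This is only an illustration, not the script used to build the dictionary: the function names, the `(headword, members)` input shape, and the assumption that families arrive already sorted from most to least frequent are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def band_for_rank(family_rank: int, band_size: int = 1000) -> int:
    """Map a 1-based family rank to its band label (1000, 2000, ...)."""
    if family_rank < 1:
        raise ValueError("family_rank must be >= 1")
    # Ranks 1..1000 -> 1000, ranks 1001..2000 -> 2000, and so on.
    return ((family_rank - 1) // band_size + 1) * band_size


def expand_families(
    families: Iterable[Tuple[str, List[str]]],
    band_size: int = 1000,
) -> Dict[str, int]:
    """Give every member of a word family the band of that family.

    `families` is assumed to be ordered by frequency, most common first.
    If a word occurs in more than one family, the first (more frequent)
    band is kept.
    """
    bands: Dict[str, int] = {}
    for rank, (headword, members) in enumerate(families, start=1):
        band = band_for_rank(rank, band_size)
        for word in [headword, *members]:
            bands.setdefault(word, band)
    return bands


if __name__ == "__main__":
    # Hypothetical input: a single family placed at rank 1.
    sample = [("society", ["societal", "societies"])]
    print(expand_families(sample))
```

With this scheme every member of a family shares one number, so looking up `societies` returns the same band as `society`.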

Introduction

ZZ WordFreq

top 60000 words from BNC.AK/ANC/COCA, 14000 word families from BNC.PN

  • wordfreq.zz.dsl
  • wordfreq.zz.ann

ZZ's BNC Top-15000 Word List (En)

word & frequency only

  • bnc15000.ann
  • bnc15000.dsl

ZZ's BNC Top-15000 Word List (En-Cn)

word & frequency & very simple Chinese translation

  • bnc15000cn.ann
  • bnc15000cn.dsl

Reference

  • BNC (British National Corpus)

https://www.natcorp.ox.ac.uk

https://www.kilgarriff.co.uk/bnc-readme.html

https://www.victoria.ac.nz/lals/about/staff/paul-nation

https://www.audiencedialogue.net/bnc.html

  • OANC (Open American National Corpus)

https://www.anc.org/data/anc-second-release/

  • COCA (The Corpus Of Contemporary American English)

https://corpus.byu.edu/coca/

https://www.pdawiki.com/forum/thread-13667-1-1.html

Screenshot

screenshot

screenshot

"[ANC] 6776" means the word ranks 6776th in the ANC frequency list.

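Notes

  • All words containing digits, certain punctuation marks, or any non-ASCII characters have been removed

  • In OANC, the singular and plural forms of a noun, and the base form, past tense, and past participle of a verb, are merged and treated as a single word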
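
The first rule above could be approximated with a filter like the one below. The README does not say which punctuation marks are rejected, so this sketch makes an assumption: internal hyphens and apostrophes are allowed, and everything else (digits, other punctuation, non-ASCII characters) disqualifies a word. The names `keep_word` and `filter_words` are hypothetical.

```python
import re
from typing import Iterable, List

# ASCII letters, optionally joined by single internal hyphens or apostrophes.
# Assumption: the real word lists may have used a different punctuation policy.
_WORD_PATTERN = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")


def keep_word(word: str) -> bool:
    """Return True if the word has no digits, no non-ASCII characters,
    and no punctuation other than internal hyphens or apostrophes."""
    return _WORD_PATTERN.fullmatch(word) is not None


def filter_words(words: Iterable[str]) -> List[str]:
    """Keep only the words that pass keep_word, preserving input order."""
    return [w for w in words if keep_word(w)]


if __name__ == "__main__":
    print(filter_words(["society", "mp3", "café", "co-operate", "a.m."]))
```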
