Mark Fullmer edited this page Oct 25, 2021 · 30 revisions

Welcome to the Corpus Developer Do It Yourself (DIY) Toolkit!

Who is this for?

This guide is designed to be a starting point or model for researchers planning to develop their own web-based corpus. It documents the software design and deployment process for Crow, the Corpus & Repository of Writing, which is located at https://crow.corporaproject.org.

The software discussed here is not a plug-and-play end product. Making it work with your corpus data requires significant reconfiguration and assumes substantial knowledge of web development in general and of Drupal and Angular development in particular.

First step

Read the executive summary (PDF) to determine if this approach fits your corpus goals.

Contents

The "backend": Data storage & retrieval via the application programming interface (API)

Drupal 9 site, PHP, MySQL

The dataset that makes up the Corpus and Repository of Writing (Crow) is a large-scale learner corpus of English writing samples from university foundational writing courses, together with the pedagogical materials used in those courses. It is designed to contain tens of thousands of individual texts, searchable by word, phrase, or metadata.

The "frontend": Design and usability considerations for a corpus interface

Angular

The user interface for the Crow corpus is designed to cater to multiple audiences: corpus researchers, writing teachers, and students. Registration is required, with different tiers of access.

Context & case studies