pokusew/fel-corpus-viz

Document Corpus Visualization

An app for visualization of document corpora (collections) using D3.js

👉 Available online at pokusew-corpus-viz.netlify.app

The code is written in TypeScript and uses D3.js and React. See the Architecture section for more.

Note: The initial version is finished! 🚀
Next steps: Finish documentation 📖 and refactor 🧹 some parts of the code.

Description

See 👉 Final Report – Visualization of a Document Corpus on Google Docs.

Architecture

Currently, it is a client-side-only single-page application (SPA). It runs entirely in the browser.

The code is written in TypeScript. D3.js draws the visualizations (scatterplot and wordcloud) and React handles the UI.

The project has just a few production dependencies. Everything else is implemented from scratch.

Data preprocessing

There is also a separate data preprocessing pipeline, implemented in Python 3.
See the data-preprocessing directory, which contains its own README with more info.

The app/data directory (versioned in Git) contains preprocessed data for some document collections.

Project structure

The web app source code is in the app directory. Some directories contain feature-specific READMEs. The following diagram briefly describes the main directories and files:

. (project root dir)
├── .github - GitHub config (GitHub Actions)
├── app - the app source code
│   ├── components - React components for the main app logic, UI, state, views, plot wrappers
│   ├── core - D3.js scatterplot and wordcloud, data loading 
│   ├── data - data to visualize - stored results of the data preprocessing pipeline
│   ├── helpers - various common functions
│   ├── images - the PWA app icon and SVG UI icons
│   ├── styles - app styles written in Sass (SCSS)
│   ├── sw - the service worker that handles precaching app shell (not fully integrated)
│   ├── _headers - Netlify HTTP headers customization
│   ├── _redirects - Netlify HTTP redirects/rewrites customization
│   ├── index.js - the app starting point (entrypoint)
│   ├── manifest.json - a web app manifest for PWA
│   ├── robots.txt
│   ├── routes.ts - app routes definitions
│   ├── template.ejs - index.html template to be built by webpack 
│   └── types.js - data, state and API types
├── data-preprocessing - Python scripts used for data preprocessing
├── test - a few tests
├── tools - custom webpack plugins
├── types - TypeScript declarations for non-code imports (SVG, MP3)
├── .browserslistrc - Browserslist config
├── .eslintrc.js - ESLint config
├── .nvmrc - Node.js version specification for Netlify
├── ava.config.js - AVA config
├── babel.config.js - Babel config
├── netlify.toml - Netlify main config
├── package.json
├── postcss.config.js - PostCSS config
├── tsconfig.json - main TypeScript config
├── webpack.config.*.js - webpack configs
└── yarn.lock

Development

Requirements

Set up

  1. Install all dependencies with Yarn (run yarn).
  2. You are ready to go.
  3. Use yarn start to start the dev server with HMR.
  4. Then open http://localhost:3000/ in the browser.

Available commands

  • yarn start – Starts a webpack development server with HMR (hot module replacement).

  • yarn build – Builds the production version and writes it to the dist directory. Note: The dist directory is purged before the actual build runs.

  • yarn analyze – Same as yarn build but it also outputs build/stats.production.json and runs webpack-bundle-analyzer CLI.

  • yarn tsc – Runs TypeScript compiler. Outputs type errors to console.

  • yarn lint – Runs ESLint. Outputs errors to console.

  • yarn test – Runs tests using AVA.

  • yarn test-hot – Runs tests using AVA in watch mode.

Deployment

Currently, we use Netlify, which is practically a CDN on steroids with integrated builds. Three configuration files affect the deployment behavior:

  • netlify.toml – global config
  • app/_headers – HTTP headers customization (mainly for immutable files)
  • app/_redirects – HTTP redirects and rewrites (fallback to index.html for client-side routing)
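
To make the two customizations above more concrete, here is a minimal TypeScript sketch of the behavior they describe. It only models the idea and is not Netlify's implementation or the actual content of the config files. The function names (`resolveRequest`, `cacheControlFor`), the content-hash file naming, and the specific Cache-Control values are all assumptions made for illustration.

```typescript
// Hypothetical model of the deployment rules; not the project's actual config.

/** Returns the file served for a request path, falling back to index.html. */
function resolveRequest(path: string, deployedFiles: ReadonlySet<string>): string {
  // A path that matches a deployed file is served as-is.
  if (deployedFiles.has(path)) {
    return path;
  }
  // Any other path is rewritten to index.html so the client-side router
  // can decide what to render for that URL.
  return '/index.html';
}

/** Picks a Cache-Control value: long-lived for immutable files, revalidate otherwise. */
function cacheControlFor(path: string): string {
  // Assumption: immutable files carry a hex content hash in their name,
  // e.g. /main.3f9a1c2b.js, so a changed file always gets a new URL.
  const hasContentHash = /\.[0-9a-f]{8,}\.[a-z0-9]+$/i.test(path);
  return hasContentHash
    ? 'public, max-age=31536000, immutable'
    : 'public, max-age=0, must-revalidate';
}
```

With this model, a deep link that does not correspond to a deployed file resolves to /index.html, while a hashed bundle is served directly and can be cached indefinitely by the browser.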