Multi-container application that builds on a Solr search engine for topic modeling-enriched data storage and retrieval.

IntelCompH2020/EWB

Intelcomp's Evaluation Workbench (EWB) API Dockers

Overview

The Evaluation Workbench (EWB) API Dockers form a multi-container application comprising a Solr cluster and REST APIs for the Topic Modeling, Inference, and Classification services. The application is orchestrated with a docker-compose script, and all services are connected through the ewb-net network.

Main components

Topic Modeling Service

This service comprises a RESTful API that utilizes the Solr search engine for data storage and retrieval. It enables the indexing of logical corpora and associated topic models, formatted according to the specifications provided by the topicmodeler. Additionally, it facilitates information retrieval through a set of queries.
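Since the service stores and retrieves its data through Solr, it may help to see what a plain Solr query looks like. The sketch below only builds a URL for Solr's standard `/select` request handler and sends nothing over the network. The host, collection name, and field names are hypothetical placeholders, and the helper `build_select_url` is not part of the EWB code base. The EWB REST API defines its own endpoints, which are not reproduced here.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10, fields=None):
    """Build a URL for Solr's standard /select request handler."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # "fl" restricts the fields returned for each matching document.
        params["fl"] = ",".join(fields)
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collection, and field names, for illustration only.
    url = build_select_url(
        "http://localhost:8983",
        "my_corpus",
        "title:energy",
        rows=5,
        fields=["id", "title"],
    )
    print(url)
```

Building the URL separately from sending it keeps the query parameters easy to inspect before pointing them at a running Solr instance.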

This system relies on the following services:

  1. ewb-tm: This service hosts the Topic Modeling RESTful API server. It is built from the Dockerfile in the ewb-tm directory and depends on the Solr service. It requires access to three mounted volumes: ./data/source, ./data/inference, and ./ewb_config. These volumes give the service access to the data in the ITMT (the project folder containing the topic models) and hold the results obtained through the EWB or generated by the Inference service. The ewb_config volume also contains important configuration variables.

  2. ewb-solr: This service operates the Solr search engine. It employs the official Solr image from Docker Hub and relies on the zoo service. The service mounts several volumes, including:

    • The Solr data directory (./db/data/solr:/var/solr) for data persistence.
    • Two custom Solr plugins:
    • The Solr configuration directory (./solr_config:/opt/solr/server/solr) to access the specific Solr schemas for EWB.
  3. ewb-solr-initializer: This short-lived service exists only to initialize the mounted volume /db/data with the permissions Solr requires.

  4. ewb-zoo: This service runs Zookeeper, which is essential for Solr to coordinate cluster nodes. It employs the official zookeeper image and mounts two volumes for data and logs.

  5. ewb-solr-config: This service handles Solr configuration. It is built from the Dockerfile in the solr_config directory and depends on the Solr and zoo services. It mounts the Docker socket and the bash_scripts directory, which contains a script for initializing the Solr configuration for EWB.
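The dependencies listed above imply a start order for the containers. As a minimal sketch, the snippet below encodes those dependencies in a dictionary and derives one valid order with Python's standard `graphlib` module (Python 3.9+). The dictionary is transcribed from the descriptions in this README, not read from the docker-compose script; ewb-solr-initializer is given no dependencies because none are stated.

```python
from graphlib import TopologicalSorter

# Each service maps to the services it depends on, as described in this README.
# Assumption: ewb-solr-initializer has no dependencies (none are stated).
DEPENDS_ON = {
    "ewb-zoo": set(),
    "ewb-solr-initializer": set(),
    "ewb-solr": {"ewb-zoo"},
    "ewb-solr-config": {"ewb-solr", "ewb-zoo"},
    "ewb-tm": {"ewb-solr"},
}


def startup_order(depends_on):
    """Return one start order in which every dependency precedes its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(startup_order(DEPENDS_ON)))
```

Several orders can satisfy the same constraints, so the exact output may vary; what is guaranteed is that Zookeeper comes before Solr, and Solr before both ewb-solr-config and ewb-tm.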

Inference Service

This service is a Topic Model Inferencer, built from the Dockerfile in the ewb-inferencer directory. It requires access to the mounted volumes ./data/source, ./data/inference, and ./ewb_config.

Its primary purpose is to be used internally by the Topic Modeling Service, although it can also function as a standalone component.

Classification Service

This service is an inference system for hierarchical classification. It is built on top of the clf-inference-intelcomp library and classifies texts according to a given hierarchy of language models. It requires access to the mounted volumes ./data/classifier and ./ewb_config.
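Because the Topic Modeling, Inference, and Classification services all rely on host directories being mounted, a quick check before starting the containers can catch a missing folder early. The following is a rough sketch, assuming the script is run from the directory that holds the docker-compose script. The helper `missing_dirs` and the mapping `REQUIRED_DIRS` are illustrative names that do not exist in the repository, and the mapping is keyed by the directory names mentioned in this README.

```python
from pathlib import Path

# Host directories each service expects, taken from the descriptions in this README.
REQUIRED_DIRS = {
    "ewb-tm": ["data/source", "data/inference", "ewb_config"],
    "ewb-inferencer": ["data/source", "data/inference", "ewb_config"],
    "ewb-classifier": ["data/classifier", "ewb_config"],
}


def missing_dirs(project_root, required=REQUIRED_DIRS):
    """Map each service to the required directories that are absent under project_root."""
    root = Path(project_root)
    missing = {}
    for service, rel_paths in required.items():
        absent = [p for p in rel_paths if not (root / p).is_dir()]
        if absent:
            missing[service] = absent
    return missing


if __name__ == "__main__":
    problems = missing_dirs(".")
    if not problems:
        print("All expected directories are present.")
    for service, paths in problems.items():
        print(f"{service}: missing {', '.join(paths)}")
```

An empty result means every directory named above exists; it says nothing about whether the directories contain valid models or configuration.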

Requirements

Python requirements files (ewb-tm, ewb-inferencer and ewb-classifier).

Note that each service installs its requirements directly when its image is built.

Sample data to start using the EWB API Dockers

A sample corpus and model can be downloaded from here.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 101004870. H2020-SC6-GOVERNANCE-2018-2019-2020 / H2020-SC6-GOVERNANCE-2020
