
CTRAN: CNN-Transformer-based Network for Natural Language Understanding

The PyTorch implementation of CTRAN, as described in the paper at https://www.sciencedirect.com/science/article/pii/S0952197623011971.


Introduction

[Figure: CTRAN CNN-Transformer model architecture]

This repository contains the complete source code of the proposed CTRAN network for joint intent detection and slot filling, the two main tasks of natural language understanding. We propose an encoder-decoder model that combines CNNs with Transformers and defines the concept of alignment in the Transformer decoder for the first time. The encoder is shared between the intent detection and slot filling tasks, so the model benefits from the implicit dependency between them. In CTRAN's shared encoder, BERT is used as the word embedding layer. A convolutional operation is then applied to the word embeddings, followed by the "window feature sequence" structure, which transposes the output to the required shape instead of using pooling operations. A stack of Transformer encoders then creates a new contextualized representation of the input that also incorporates the embedding generated by the CNN. The intent detection decoder comprises self-attention and a linear layer that produces the output probabilities. For slot filling, we propose the novel aligned Transformer decoder, followed by a fully connected layer. For more information, please refer to the EAAI article.
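For orientation, below is a minimal PyTorch sketch of the data flow just described. It is an assumption-laden outline, not this repository's actual code: BERT is replaced by a random embedding tensor, the aligned Transformer decoder is stood in for by a standard TransformerDecoder, and all names and sizes (CTRANSketch, d_model, layer and label counts) are illustrative.

import torch
import torch.nn as nn

class CTRANSketch(nn.Module):
    def __init__(self, d_model=768, kernel_size=3, n_heads=8,
                 n_layers=2, n_intents=22, n_slots=121):
        super().__init__()
        # Shared encoder: CNN over BERT embeddings, then Transformer encoders.
        # BERT itself is omitted; `embeddings` stands in for its output.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, padding="same")
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Intent decoder: self-attention plus a linear layer.
        self.intent_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.intent_fc = nn.Linear(d_model, n_intents)
        # Slot decoder: a standard TransformerDecoder stands in for the
        # paper's aligned Transformer decoder, followed by a linear layer.
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.slot_decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.slot_fc = nn.Linear(d_model, n_slots)

    def forward(self, embeddings):
        # Convolve along the sequence axis, transposing in and out instead of
        # pooling -- the "window feature sequence" idea.
        x = self.conv(embeddings.transpose(1, 2)).transpose(1, 2)
        h = self.encoder(x)                            # shared representation
        a, _ = self.intent_attn(h, h, h)
        intent_logits = self.intent_fc(a.mean(dim=1))  # one intent per utterance
        s = self.slot_decoder(h, h)                    # one label per token
        return intent_logits, self.slot_fc(s)

emb = torch.randn(4, 16, 768)  # stand-in for BERT output (batch, seq, dim)
intent_logits, slot_logits = CTRANSketch()(emb)
print(intent_logits.shape, slot_logits.shape)  # (4, 22) and (4, 16, 121)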

Requirements

  • A CUDA-capable GPU
  • Python
  • PyTorch
  • Jupyter

Notes

  • GitHub does not allow files larger than 100 MB, so you might not be able to download the pytorch_model.bin inside the bert-base-uncased directory. In that case, you can download said file from this link.
  • For the best results, we advise using bert-large-uncased, as we did in our experiments. This code defaults to bert-base-uncased so that most users can run it. If you decide to use the large BERT, download the bert-large-uncased files from this link, create a new bert-large-uncased directory, and put the downloaded files there; see the sketch after this list for how such a directory is typically loaded.
  • If you are not able to download the official version from Elsevier, you can download the arXiv preprint from this link. There are a few differences between the two: the published version contains a bit more explanation and analysis, but they are essentially the same in principle.
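As a hypothetical illustration of the bert-large-uncased note above (the variable names in the actual notebooks may differ), a local BERT directory is typically loaded with Hugging Face transformers like this:

from transformers import BertModel, BertTokenizer

bert_dir = "./bert-large-uncased"  # directory holding the downloaded files
tokenizer = BertTokenizer.from_pretrained(bert_dir)
bert = BertModel.from_pretrained(bert_dir)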

Dependencies

numpy==1.23.5
scikit_learn==1.2.2
torch==1.13.0
tqdm==4.64.1
transformers==4.25.1
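Assuming the pinned versions above, they can be installed directly with pip:

pip install numpy==1.23.5 scikit_learn==1.2.2 torch==1.13.0 tqdm==4.64.1 transformers==4.25.1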

Runtime

On Windows 10 with an RTX 3080 and BERT-base, training takes approximately 46.71 seconds per epoch on the ATIS dataset and 119.5 seconds per epoch on SNIPS.

Citation

If you use any part of our code, please consider citing our paper as follows:

@article{Rafiepour2023,
  title    = {CTRAN: CNN-Transformer-based network for natural language understanding},
  journal  = {Engineering Applications of Artificial Intelligence},
  volume   = {126},
  pages    = {107013},
  year     = {2023},
  issn     = {0952-1976},
  doi      = {10.1016/j.engappai.2023.107013},
  url      = {https://www.sciencedirect.com/science/article/pii/S0952197623011971},
  author   = {Mehrdad Rafiepour and Javad Salimi Sartakhti},
  keywords = {Natural language understanding, Slot-filling, Intent-detection, Transformers, CNN - Transformer encoder, Aligned transformer decoder, BERT, ELMo}
}

License

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.