TCN_TRAS_power

Use TCN and Transformer models for forecasting on the "Hourly Energy Consumption" dataset.

1. Temporal Convolutional Networks

  • Advantages of TCN over RNN
    • Parallelism
      A TCN processes the whole sequence in parallel, whereas an RNN must step through it sequentially.
    • Flexible receptive field
      The receptive field of a TCN is set by the number of layers, the kernel size, and the dilation factor, so it can be tailored to the needs of different tasks.
    • Stable gradients
      RNNs often suffer from vanishing and exploding gradients, largely because parameters are shared across time steps. Like ordinary convolutional networks, TCNs are much less prone to these problems.
    • Lower memory usage
      An RNN has to keep the state of every time step, which consumes a lot of memory; a TCN shares its convolution kernels within each layer, so memory usage is lower.

Example: kernel_size=2, dilations=[1, 2, 4, 8]

step 1 : Causal Convolutions

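A minimal PyTorch sketch of a causal convolution (the framework and the CausalConv1d name are illustrative, not necessarily what this repository uses): causality comes from padding only on the left, so each output sees the present and the past but never the future.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Causal 1-D convolution: left-pad by (kernel_size - 1) * dilation so the
    output at time t only depends on inputs at times <= t."""
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, dilation=dilation)

    def forward(self, x):                          # x: (batch, channels, time)
        return self.conv(F.pad(x, (self.pad, 0)))  # pad the past side only

y = CausalConv1d(1, 8, kernel_size=2)(torch.randn(4, 1, 24))
print(y.shape)  # torch.Size([4, 8, 24]) -- same length as the input
```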

step 2 : Dilated + Causal Convolutions

Dilation is added so the network can handle long histories efficiently: when the dilation grows by a fixed factor per layer, the receptive field grows exponentially with depth instead of linearly.

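A sketch of the stack from the example above (kernel_size=2, dilations=[1, 2, 4, 8]); PyTorch is again an assumed framework and the channel counts are illustrative. The comments spell out the receptive-field arithmetic.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Dilated causal stack for the example above: kernel_size=2, dilations=[1, 2, 4, 8].
kernel_size, dilations = 2, [1, 2, 4, 8]
convs = nn.ModuleList([nn.Conv1d(8, 8, kernel_size, dilation=d) for d in dilations])

x = torch.randn(4, 8, 24)                          # (batch, channels, time)
for conv, d in zip(convs, dilations):
    x = torch.relu(conv(F.pad(x, ((kernel_size - 1) * d, 0))))  # left-pad -> causal

# Receptive field: 1 + sum((kernel_size - 1) * d) = 1 + 1 + 2 + 4 + 8 = 16,
# i.e. each output step can look back over the last 16 input steps.
print(x.shape, 1 + sum((kernel_size - 1) * d for d in dilations))
```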

Supplementary information : Dilated Non-Causal Convolutions

A non-causal dilated convolution (here with kernel size = 3) pads on both sides, so each output also sees future time steps, which is why the causal variant is used here.

step 3 : Add Residual block

Residual connections make it practical to train deeper networks and alleviate vanishing and exploding gradients.
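A sketch of a residual block in the usual TCN style, assuming PyTorch; the dropout rate and the 1x1 skip convolution are common choices rather than this repository's exact design.

```python
import torch
import torch.nn as nn

class TCNResidualBlock(nn.Module):
    """Sketch of a TCN residual block: two dilated causal convolutions with ReLU and
    dropout, plus a 1x1 convolution on the skip path when channel counts differ."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation    # trim this much on the right -> causal
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size, padding=self.pad, dilation=dilation)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size, padding=self.pad, dilation=dilation)
        self.relu, self.drop = nn.ReLU(), nn.Dropout(dropout)
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):                          # x: (batch, channels, time)
        out = self.drop(self.relu(self.conv1(x)[..., :-self.pad]))
        out = self.drop(self.relu(self.conv2(out)[..., :-self.pad]))
        return self.relu(out + self.skip(x))       # residual connection

y = TCNResidualBlock(1, 8, kernel_size=2, dilation=4)(torch.randn(4, 1, 24))
print(y.shape)  # torch.Size([4, 8, 24])
```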

Supplementary information : Dropout

Dropout prevents units from co-adapting too much. Units are dropped at random during training; at inference all units are kept active, which approximates averaging the predictions of the many thinned sub-networks. Used this way, dropout greatly reduces the risk of overfitting.
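A tiny PyTorch illustration of that train/inference behaviour (PyTorch uses inverted dropout, which scales the surviving units at training time so that nothing needs rescaling at inference):

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4)

drop.train()    # training: each unit is zeroed with probability p and the survivors
print(drop(x))  # are scaled by 1 / (1 - p), e.g. tensor([[2., 0., 2., 0.]])

drop.eval()     # inference: every unit stays on and no scaling is applied,
print(drop(x))  # tensor([[1., 1., 1., 1.]]) -- matching the training-time expectation
```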

Parameter settings

Given input_length, kernel_size and dilation_base, we can work out the minimum number of layers required for full history coverage, and the basic TCN network is sized accordingly (see the sketch below):
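A sketch of that computation, following the formulation in the unit8 article listed in the references and assuming one dilated convolution per layer with dilation dilation_base**i; with two convolutions per residual block the required depth is roughly halved.

```python
import math

def min_layers_for_full_coverage(input_length, kernel_size, dilation_base):
    """Smallest n such that a stack of n dilated causal convolutions (dilation
    dilation_base**i in layer i) covers the whole input.
    Receptive field of n layers: 1 + (k - 1) * (b**n - 1) / (b - 1)."""
    l, k, b = input_length, kernel_size, dilation_base
    return math.ceil(math.log((l - 1) * (b - 1) / (k - 1) + 1, b))

# e.g. one week of hourly data with kernel_size=2 and dilation_base=2 -> 8 layers
print(min_layers_for_full_coverage(input_length=168, kernel_size=2, dilation_base=2))  # 8
```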

2. Transformer model

  • The Transformer is motivated by the following problems of RNNs:
    • Sequential computation is hard to parallelize.
    • RNNs struggle with long-range dependencies: the farther apart two positions are, the harder it is for them to use each other's information. The attention mechanism addresses this directly.
    • Very long RNN computation graphs can make gradients vanish.

Structure

The Transformer consists of an Encoder and a Decoder, which play very different roles and are best understood separately: the Encoder compresses the information in the input sequence, and the Decoder converts (decompresses) that information into whatever the task requires.
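A minimal sketch using torch.nn.Transformer (an assumed framework choice; the sequence lengths and model sizes below are illustrative, not this repository's settings):

```python
import torch
import torch.nn as nn

# Encoder compresses the source sequence; the decoder turns that representation
# into the target sequence.
model = nn.Transformer(d_model=64, nhead=4,
                       num_encoder_layers=2, num_decoder_layers=2,
                       batch_first=True)

src = torch.randn(8, 168, 64)  # e.g. one week of hourly load, embedded to d_model
tgt = torch.randn(8, 24, 64)   # e.g. the 24 future steps fed to the decoder
out = model(src, tgt)          # (8, 24, 64); project each step to one value to forecast
```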

Component features 1 : Self-attention

The Query and Key together determine the weight given to each Value, and each token contributes a Key-Value pair. When a Query matches a Key strongly for the task at hand, the Value associated with that Key is amplified in the output.

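A minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, assuming PyTorch:

```python
import math
import torch

def attention(q, k, v, mask=None):
    """softmax(Q K^T / sqrt(d_k)) V: queries score keys, and the scores weight the values."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(2, 10, 64)
out = attention(x, x, x)  # self-attention: Q, K, V all come from the same sequence
print(out.shape)          # torch.Size([2, 10, 64])
```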

Component features 2 : Multi-Head Attention

The advantage is that each head, with its own (q, k, v) projections, can focus on different information: some heads attend to local patterns, others to global context, and so on.
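A minimal sketch with torch.nn.MultiheadAttention (assumed framework; the dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Eight heads, each with its own learned projections of (q, k, v); the per-head
# outputs are concatenated and projected back to embed_dim.
mha = nn.MultiheadAttention(embed_dim=64, num_heads=8, batch_first=True)

x = torch.randn(2, 10, 64)
out, weights = mha(x, x, x)      # self-attention: query = key = value = x
print(out.shape, weights.shape)  # torch.Size([2, 10, 64]) torch.Size([2, 10, 10])
```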

Component features 3 : Add & Norm layer

Since the sublayer's input and output have the same shape, a residual connection simply adds the output back onto the input, and LayerNorm then normalizes each instance along the feature dimension.

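A minimal sketch of the Add & Norm step, assuming PyTorch (the AddNorm name is illustrative):

```python
import torch
import torch.nn as nn

class AddNorm(nn.Module):
    """Residual connection followed by LayerNorm: the sublayer output has the same
    shape as its input, so the two can simply be added before normalizing."""
    def __init__(self, d_model):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x, sublayer_out):
        return self.norm(x + sublayer_out)

x = torch.randn(2, 10, 64)
print(AddNorm(64)(x, torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```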

Component features 4 : Layer Normalization vs Batch Normalization

Layer Normalization (LN) is often used with RNNs. The idea is similar to Batch Normalization (BN), but LN computes its statistics within each sample (over the feature dimension), whereas BN computes them per feature across the whole batch.
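A small PyTorch comparison (assumed framework; shapes are illustrative) of which axes the two normalizations average over:

```python
import torch
import torch.nn as nn

x = torch.randn(32, 10, 64)                   # (batch, time, features)

ln = nn.LayerNorm(64)                         # LN: mean/variance per sample (and per
y_ln = ln(x)                                  # time step), taken over the 64 features

bn = nn.BatchNorm1d(64)                       # BN: mean/variance per feature, taken over
y_bn = bn(x.transpose(1, 2)).transpose(1, 2)  # the whole batch; expects (batch, features, time)
```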

Component features 5 : Sequence Mask

The Decoder is not allowed to see future positions, so when producing the output at position i we mask out everything from position i+1 onward.

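A minimal sketch of such a mask, assuming PyTorch (the function name sequence_mask is illustrative):

```python
import torch

def sequence_mask(size):
    """Upper-triangular mask: True marks future positions that position i must not attend to."""
    return torch.triu(torch.ones(size, size), diagonal=1).bool()

print(sequence_mask(4))
# tensor([[False,  True,  True,  True],
#         [False, False,  True,  True],
#         [False, False, False,  True],
#         [False, False, False, False]])
# The True entries are set to -inf before the softmax (see the attention sketch above).
```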

References

https://unit8.com/resources/temporal-convolutional-networks-and-forecasting/
https://www.twblogs.net/a/5d6dc709bd9eee541c33c0b2
https://meetonfriday.com/posts/7c0020de/
https://medium.com/@a5560648/dropout-5fb2105dbf7c
https://zhuanlan.zhihu.com/p/52477665
https://ithelp.ithome.com.tw/articles/10206317
https://medium.com/%E5%B7%A5%E4%BA%BA%E6%99%BA%E6%85%A7/review-attention-is-all-you-need-62a1c93c48a5
https://medium.com/ching-i/transformer-attention-is-all-you-need-c7967f38af14
