Open access

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection

Authors:

Ling Liu

WWW '23: Proceedings of the ACM Web Conference 2023

Pages 2189 - 2197

https://doi.org/10.1145/3543507.3583345

Published: 30 April 2023

Abstract

As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities and protect susceptible users from being victimized. Current studies rely solely on graph-based fraud detection approaches, which we argue may not be well suited to highly repetitive, skew-distributed and heterogeneous Ethereum transactions. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH leverages the modeling capability of the Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical strategies: repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation shows that BERT4ETH significantly outperforms state-of-the-art methods on the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.
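
The abstract names repetitiveness reduction as one of the pre-training strategies but does not spell out its mechanism. As a rough illustration only, the Python sketch below assumes it could involve collapsing consecutive repeated counterparty addresses in an account's transaction sequence before BERT-style masking. The function names, the `[MASK]` token string, and the mask ratio are hypothetical and are not taken from the BERT4ETH code.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical mask token, not taken from the BERT4ETH repository


def dedupe_consecutive(addresses: List[str]) -> List[str]:
    """Collapse each run of identical counterparty addresses into one token."""
    out: List[str] = []
    for addr in addresses:
        if not out or out[-1] != addr:
            out.append(addr)
    return out


def mask_sequence(
    tokens: List[str], mask_ratio: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask a random subset of positions for a masked-token objective.

    Returns the masked sequence and per-position labels: the original
    token at masked positions, None elsewhere.
    """
    if not tokens:
        return [], []
    # Mask at least one position, never more than the sequence length.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK if i in positions else t for i, t in enumerate(tokens)]
    labels = [t if i in positions else None for i, t in enumerate(tokens)]
    return masked, labels


if __name__ == "__main__":
    # Toy counterparty sequence for one account (assumed data).
    raw = ["0xa", "0xa", "0xb", "0xa", "0xa", "0xc"]
    seq = dedupe_consecutive(raw)
    print(seq)  # ['0xa', '0xb', '0xa', '0xc']
    masked, labels = mask_sequence(seq, mask_ratio=0.5, rng=random.Random(0))
    print(masked, labels)
```

The mask ratio is left as a parameter because the abstract does not state one; the value 0.5 in the usage lines is arbitrary.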

Index Terms

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection
1. Social and professional topics

Index terms have been assigned to the content through auto-classification.

Published In

WWW '23: Proceedings of the ACM Web Conference 2023

April 2023

4293 pages

ISBN:9781450394161

DOI:10.1145/3543507

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution International 4.0 License.

Sponsors

SIGWEB: ACM Special Interest Group on Hypertext, Hypermedia, and Web

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

WWW '23

Sponsor:

SIGWEB

WWW '23: The ACM Web Conference 2023

April 30 - May 4, 2023

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Bibliometrics

Article Metrics

Total Citations: 14
Total Downloads: 2,080
Downloads (Last 12 months): 1,353
Downloads (Last 6 weeks): 170

Reflects downloads up to 05 Nov 2024

Cited By

Muzammil M, Wu Z, Balasubramanian A, Nikiforakis N, Vallina-Rodríguez N, Suarez-Tángil G, Levin D, Pelsser C. (2024). Panning for gold.eth: Understanding and Analyzing ENS Domain Dropcatching. Proceedings of the 2024 ACM on Internet Measurement Conference, 731-738. Online publication date: 4-Nov-2024.
https://dl.acm.org/doi/10.1145/3646547.3689009
Ding Z, Shi J, Li Q, Cao J, Serra E, Spezzano F. (2024). Effective Illicit Account Detection on Large Cryptocurrency MultiGraphs. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 457-466. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679707
Huang W, Zhao Z, Chen X, Zhang Q, Li M, Su H, Wu Q, Serra E, Spezzano F. (2024). A Payment Transaction Pre-training Model for Fraud Transaction Detection. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 932-941. Online publication date: 21-Oct-2024.
https://dl.acm.org/doi/10.1145/3627673.3679670
Lin D, Wu J, Yu Y, Fu Q, Zheng Z, Yang C, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H. (2024). DenseFlow: Spotting Cryptocurrency Money Laundering in Ethereum Transaction Graphs. Proceedings of the ACM Web Conference 2024, 4429-4438. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645692
Zhou C, Chen H, Wu H, Zhang J, Cai W, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H. (2024). ARTEMIS: Detecting Airdrop Hunters in NFT Markets with a Graph Learning System. Proceedings of the ACM Web Conference 2024, 1824-1834. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645597
Hu S, Huang T, Chow K, Wei W, Wu Y, Liu L, Chua T, Ngo C, Ka-Wei Lee R, Kumar R, Lauw H. (2024). ZipZap: Efficient Training of Language Models for Large-Scale Fraud Detection on Blockchain. Proceedings of the ACM Web Conference 2024, 2807-2816. Online publication date: 13-May-2024.
https://dl.acm.org/doi/10.1145/3589334.3645352
Zhang Z, Devaraj M, Bai X, Lu H. (2024). Ethereum Phishing Scams Detection: A Survey. 2024 International Conference on Artificial Intelligence and Digital Technology (ICAIDT), 70-75. Online publication date: 7-Jun-2024.
https://doi.org/10.1109/ICAIDT62617.2024.00023
Lu L, Wen Z, Yuan Y, He Q, Chen J, Liu Z. (2024). ANNProof: Building a verifiable and efficient outsourced approximate nearest neighbor search system on blockchain. Future Generation Computer Systems 156, 206-220. Online publication date: Jul-2024.
https://doi.org/10.1016/j.future.2024.03.002
Ravindranath V, Nallakaruppan M, Shri M, Balusamy B, Bhattacharyya S. (2024). Evaluation of performance enhancement in Ethereum fraud detection using oversampling techniques. Applied Soft Computing 161:C. Online publication date: 1-Aug-2024.
https://dl.acm.org/doi/10.1016/j.asoc.2024.111698
Lin Y, Jiang P, Guo F, Zhu L. (2024). CrossAAD: Cross-Chain Abnormal Account Detection. Information Security and Privacy, 84-104. Online publication date: 15-Jul-2024.
https://doi.org/10.1007/978-981-97-5101-3_5
