Research article · Open access · DOI: 10.1145/3543507.3583345

BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection

Published: 30 April 2023

Abstract

As various forms of fraud proliferate on Ethereum, it is imperative to safeguard against these malicious activities and protect susceptible users from being victimized. Current studies rely solely on graph-based fraud detection approaches, which we argue may be ill-suited to Ethereum transactions because they are highly repetitive, skew-distributed and heterogeneous. To address these challenges, we propose BERT4ETH, a universal pre-trained Transformer encoder that serves as an account representation extractor for detecting various fraud behaviors on Ethereum. BERT4ETH leverages the strong modeling capability of the Transformer to capture the dynamic sequential patterns inherent in Ethereum transactions, and addresses the challenges of pre-training a BERT model for Ethereum with three practical and effective strategies: repetitiveness reduction, skew alleviation and heterogeneity modeling. Our empirical evaluation shows that BERT4ETH significantly outperforms state-of-the-art methods on the phishing account detection and de-anonymization tasks. The code for BERT4ETH is available at: https://github.com/git-disl/BERT4ETH.
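To make the sequential view of an account concrete, the sketch below shows one way an account's transaction history could be turned into a BERT-style masked pre-training example. It is a minimal illustration under stated assumptions, not the BERT4ETH implementation: the `Tx` record, the `(counterparty, direction)` token format and the helpers `account_sequence` and `mask_sequence` are hypothetical, and the 15% target rate with 80/10/10 replacement is the scheme from the original BERT paper, not necessarily what BERT4ETH uses. The repeated `0xB` tokens in the example hint at the repetitiveness the abstract mentions.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical token format: (counterparty address, "in" or "out").
Token = Tuple[str, str]
MASK = "[MASK]"


@dataclass(frozen=True)
class Tx:
    """A simplified Ethereum transfer (hypothetical schema for illustration)."""
    sender: str
    receiver: str
    value_eth: float
    timestamp: int


def account_sequence(account: str, txs: Sequence[Tx]) -> List[Token]:
    """Return the account's counterparties in chronological order.

    Keeping the transfer direction next to each address is one simple way
    to retain some heterogeneity information (an assumption, not BERT4ETH's design).
    """
    seq: List[Token] = []
    for tx in sorted(txs, key=lambda t: t.timestamp):
        if tx.sender == account:
            seq.append((tx.receiver, "out"))
        elif tx.receiver == account:
            seq.append((tx.sender, "in"))
    return seq


def mask_sequence(
    seq: List[Token],
    mask_prob: float = 0.15,
    vocab: Optional[List[str]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[Token], List[Optional[str]]]:
    """BERT-style masking: each position becomes a prediction target with
    probability mask_prob; a target's address is replaced by [MASK] 80% of
    the time, by a random vocabulary address 10% of the time, and kept 10%.
    Labels hold the original address for targets and None elsewhere."""
    rng = rng or random.Random(0)
    # Simplification: default vocabulary is the sequence's own addresses.
    vocab = vocab or sorted({addr for addr, _ in seq})
    inputs: List[Token] = []
    labels: List[Optional[str]] = []
    for addr, direction in seq:
        if rng.random() < mask_prob:
            labels.append(addr)  # the model must recover this address
            r = rng.random()
            if r < 0.8:
                inputs.append((MASK, direction))
            elif r < 0.9:
                inputs.append((rng.choice(vocab), direction))
            else:
                inputs.append((addr, direction))
        else:
            labels.append(None)  # not a prediction target
            inputs.append((addr, direction))
    return inputs, labels


if __name__ == "__main__":
    txs = [
        Tx("0xA", "0xB", 1.0, 3),
        Tx("0xC", "0xA", 2.5, 1),
        Tx("0xA", "0xB", 0.5, 2),
    ]
    seq = account_sequence("0xA", txs)
    print(seq)  # [('0xC', 'in'), ('0xB', 'out'), ('0xB', 'out')]
    inputs, labels = mask_sequence(seq, mask_prob=0.5, rng=random.Random(42))
    print(inputs, labels)
```

In this sketch only the address is masked while the direction stays visible, and the random replacement draws from the sequence's own addresses rather than a corpus-wide vocabulary; both are illustrative choices. A real pipeline would map addresses to integer ids and feed the masked inputs to a Transformer encoder.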

    Published In

    WWW '23: Proceedings of the ACM Web Conference 2023
    April 2023
    4293 pages
    ISBN:9781450394161
    DOI:10.1145/3543507
    This work is licensed under a Creative Commons Attribution 4.0 International License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    WWW '23: The ACM Web Conference 2023
    April 30 - May 4, 2023
    Austin, TX, USA

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Article Metrics

    • Downloads (last 12 months): 1,353
    • Downloads (last 6 weeks): 170
    Reflects downloads up to 05 Nov 2024

    Cited By
    • (2024) Panning for gold.eth: Understanding and Analyzing ENS Domain Dropcatching. In Proceedings of the 2024 ACM on Internet Measurement Conference, 731-738. DOI: 10.1145/3646547.3689009. Published online: 4 Nov 2024.
    • (2024) Effective Illicit Account Detection on Large Cryptocurrency MultiGraphs. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 457-466. DOI: 10.1145/3627673.3679707. Published online: 21 Oct 2024.
    • (2024) A Payment Transaction Pre-training Model for Fraud Transaction Detection. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 932-941. DOI: 10.1145/3627673.3679670. Published online: 21 Oct 2024.
    • (2024) DenseFlow: Spotting Cryptocurrency Money Laundering in Ethereum Transaction Graphs. In Proceedings of the ACM Web Conference 2024, 4429-4438. DOI: 10.1145/3589334.3645692. Published online: 13 May 2024.
    • (2024) ARTEMIS: Detecting Airdrop Hunters in NFT Markets with a Graph Learning System. In Proceedings of the ACM Web Conference 2024, 1824-1834. DOI: 10.1145/3589334.3645597. Published online: 13 May 2024.
    • (2024) ZipZap: Efficient Training of Language Models for Large-Scale Fraud Detection on Blockchain. In Proceedings of the ACM Web Conference 2024, 2807-2816. DOI: 10.1145/3589334.3645352. Published online: 13 May 2024.
    • (2024) Ethereum Phishing Scams Detection: A Survey. In 2024 International Conference on Artificial Intelligence and Digital Technology (ICAIDT), 70-75. DOI: 10.1109/ICAIDT62617.2024.00023. Published online: 7 Jun 2024.
    • (2024) ANNProof: Building a verifiable and efficient outsourced approximate nearest neighbor search system on blockchain. Future Generation Computer Systems 156, 206-220. DOI: 10.1016/j.future.2024.03.002. Published online: Jul 2024.
    • (2024) Evaluation of performance enhancement in Ethereum fraud detection using oversampling techniques. Applied Soft Computing 161:C. DOI: 10.1016/j.asoc.2024.111698. Published online: 1 Aug 2024.
    • (2024) CrossAAD: Cross-Chain Abnormal Account Detection. In Information Security and Privacy, 84-104. DOI: 10.1007/978-981-97-5101-3_5. Published online: 15 Jul 2024.