DOI: 10.1145/3597503.3639132
research-article

LibAlchemy: A Two-Layer Persistent Summary Design for Taming Third-Party Libraries in Static Bug-Finding Systems

Published: 12 April 2024

Abstract

Despite the benefits of using third-party libraries (TPLs), the misuse of TPL functions raises quality and security concerns. Using traditional static analysis to detect bugs caused by TPL functions is non-trivial. One promising solution would be to automatically generate and persist the summaries of TPL functions offline and then reuse these summaries in compositional static analysis online. However, when dealing with millions of lines of TPL code, the summaries designed by existing studies suffer from an unresolved paradox: a highly precise form of summary leads to unaffordable space and time overhead, while an imprecise one seriously hurts the precision or recall of the analysis.
To address the paradox, we propose a novel two-layer summary design. The first layer utilizes a linear-sized program representation known as the program dependence graph to compactly encode path conditions, while the second layer encodes bug-type-specific properties. We implemented our idea as a tool called LibAlchemy and evaluated it on fifteen mature and extensively checked open-source projects. Experimental results show that LibAlchemy can check over ten million lines of code within ten hours. LibAlchemy has detected 55 true bugs with a high precision of 90.16%, eleven of which have been assigned CVE IDs. Compared to whole-program analysis and the conventional design of path-sensitively precise summaries, LibAlchemy achieves an 18.56x and 12.77x speedup and saves 91.49% and 90.51% of memory usage, respectively.
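The two-layer idea can be pictured with a small sketch. The following Python is only an illustration under stated assumptions and is not LibAlchemy's actual summary format: the class `FunctionSummary`, its fields, the `null-return` label, and the toy `lookup` function are all hypothetical. Layer one stores a dependence graph for a function, so that the branches guarding a statement can be recovered later instead of storing an explicit path condition. Layer two stores facts relevant to one bug type. The summary is persisted as JSON (standing in for the offline step) and reloaded (standing in for the online step).

```python
import json
from dataclasses import dataclass, field


@dataclass
class FunctionSummary:
    """Hypothetical two-layer summary for one library function."""
    name: str
    # Layer 1: dependence graph. Nodes are statements; an edge
    # (src, dst, kind) says dst depends on src ("data" or "control").
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)
    # Layer 2: bug-type-specific facts, keyed by a bug-type label.
    properties: dict = field(default_factory=dict)

    def guards_of(self, node):
        """Branch nodes that `node` is directly control dependent on."""
        return sorted(src for src, dst, kind in self.edges
                      if dst == node and kind == "control")

    def to_json(self):
        """Persist the summary as text (stands in for the offline step)."""
        return json.dumps({"name": self.name, "nodes": self.nodes,
                           "edges": self.edges,
                           "properties": self.properties})

    @classmethod
    def from_json(cls, text):
        """Reload a persisted summary (stands in for the online step)."""
        raw = json.loads(text)
        # JSON turns int keys into strings and tuples into lists; undo both.
        nodes = {int(key): stmt for key, stmt in raw["nodes"].items()}
        edges = [tuple(edge) for edge in raw["edges"]]
        return cls(raw["name"], nodes, edges, raw["properties"])


# Toy library function: returns NULL when the key is NULL.
summary = FunctionSummary(
    name="lookup",
    nodes={1: "if (key == NULL)",
           2: "return NULL",
           3: "return table_get(key)"},
    edges=[(1, 2, "control"), (1, 3, "control")],
    properties={"null-return": [2]},
)

reloaded = FunctionSummary.from_json(summary.to_json())
for node in reloaded.properties["null-return"]:
    print(reloaded.nodes[node], "guarded by nodes", reloaded.guards_of(node))
```

In this sketch a checker for one bug type would read only its own entry in `properties` and walk the shared graph on demand, which is one plausible reading of why separating the layers keeps summaries compact.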

Published In

ICSE '24: Proceedings of the IEEE/ACM 46th International Conference on Software Engineering
May 2024
2942 pages
ISBN:9798400702174
DOI:10.1145/3597503
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

In-Cooperation

  • Faculty of Engineering of University of Porto

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. static bug-finding
  2. function summary
  3. third-party library

Qualifiers

  • Research-article

Conference

ICSE '24

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
