本文概述了一种机器学习方法,该方法可在提交前预测易受攻击的代码更改,在大型开源数据集上展示了高精度,并呼吁社区范围内共享开发者可信度数据以加强软件供应链安全。本文概述了一种机器学习方法,该方法可在提交前预测易受攻击的代码更改,在大型开源数据集上展示了高精度,并呼吁社区范围内共享开发者可信度数据以加强软件供应链安全。

机器学习工具提前发现80%的引发漏洞的提交

2025/11/20 18:00

摘要

I. 引言

II. 背景

III. 设计

  • 定义
  • 设计目标
  • 框架
  • 扩展

IV. 建模

  • 分类器
  • 特征

V. 数据收集

VI. 特性分析

  • 漏洞修复延迟
  • 漏洞修复变更分析
  • 漏洞引入变更分析

VII. 结果

  • N折验证
  • 使用在线部署模式的评估

VIII. 讨论

  • 对多项目的影响
  • 对安卓安全工作的影响
  • 有效性威胁
  • 替代方法

IX. 相关工作

结论和参考文献

\ \

X. 结论

本文提出了一种实用的、预防性的安全测试方法,该方法基于在代码提交前对可能存在漏洞的代码变更进行准确的在线预测。我们提出了三种在漏洞预测中有效的新特征数据类型,并通过使用来自大型且重要的Android开源项目的数据进行N折验证,评估了它们的召回率和精确度。

\ 我们还评估了在线部署模式,并确定了不特定于目标项目(即收集训练数据的项目)的特征数据类型子集,因此可用于其他项目(例如,多项目设置)。评估结果表明,我们的VP框架在提交前能够识别约80%的已评估的引入漏洞的变更,精确度达98%,假阳性率低于1.7%。

\ 这些积极结果呼吁未来研究(例如,使用先进的机器学习和生成式AI技术)利用VP方法或框架来支持由社区管理的上游开源项目,这些项目对数十亿用户日常使用的众多软件和计算机产品至关重要。

\ 本文的紧迫性源于其潜在的社会效益。广泛采用基于机器学习的方法(如VP框架)可以极大地增强我们分享开源贡献者和项目可信度数据的能力。这些共享数据将使开源社区能够对抗假账户等威胁(如Linux XZ工具后门攻击16所示)。

\ 此外,这种基于机器学习的方法可以在长期计划的攻击出现时促进开源项目间的快速响应。在类似或下游项目间共享信息可以增强准备度并减少对类似攻击的响应时间。

\ 因此,我们呼吁开源社区发起倡议,建立共享开发者和项目可信度数据库的实践,以加强我们的开源软件供应链,众多计算机和软件产品都依赖于这些供应链。

\

参考文献

[1] T. Menzies, J. Greenwald, and A. Frank, "Data Mining Static Code Attributes to Learn Defect Predictors," IEEE Transactions on Software Engineering, 33(1):2-13, 2007.

[2] M. Halstead, Elements of Software Science, Elsevier, 1977.

[3] T. McCabe, "A Complexity Measure," IEEE Transactions on Software Engineering, 2(4):308-320, 1976.

[4] R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufman, 1992.

[5] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, "Cross-project defect prediction: a large scale experiment on data vs. domain vs. process," in Proceedings of the Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 91-100, 2009.

[6] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An Empirical Study of Operating Systems Errors," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 73-88, 2001.

[7] S. Kim, T. Zimmermann, E. J. Whitehead Jr., and A. Zeller, "Predicting Faults from Cached History," in Proceedings of the ACM International Conference on Software Engineering (ICSE), pp. 489- 498, 2007.

[8] F. Rahman, D. Posnett, A. Hindle, E. Barr, and P. Devanbu, "BugCache for inspections," in Proceedings of the ACM SIGSOFT Symposium and the European Conference on Foundations of Software Engineering (SIGSOFT/FSE), p. 322, 2011.

[9] C. Lewis, Z. Lin, C. Sadowski, X. Zhu, R. Ou, and E. J. Whitehead Jr., "Does bug prediction support human developers? findings from a google case study," in Proceedings of the International Conference on Software Engineering (ICSE), pp. 372-381, 2013.

[10] J. Walden, J. Stuckman, and R. Scandariato, "Predicting Vulnerable Components: Software Metrics vs Text Mining," in Proceedings of the IEEE International Symposium on Software Reliability Engineering, pp. 23-33, 2014.

[11] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An empirical study of operating systems errors," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 73-88, 2001.

[12] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, 20(6):476-493, 1994.

[13] R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, San Mateo, CA, 1993.

[14] A. Chou, J. Yang, B. Chelf, S. Hallem, and D. Engler, "An empirical study of operating systems errors," in Proceedings of the ACM Symposium on Operating Systems Principles (SOSP), pp. 73-88, 2001.

[15] R. Chillarege, I. S. Bhandari, J. K. Chaar, M. J. Halliday, D. S. Moebus, B. K. Ray, and M-Y. Wong, "Orthogonal defect classification-a concept for in-process measurements", IEEE Transactions on Software Engineering, 18(11):943-956, 1992.

[16] R. Natella, D. Cotroneo, and H. Madeira, "Assessing Dependability with Software Fault Injection: A Survey", ACM Computing Surveys, 48(3), 2016.

[17] K. S. Yim, "Norming to Performing: Failure Analysis and Deployment Automation of Big Data Software Developed by Highly Iterative Models," in Proceedings of the IEEE International Symposium on Software Reliability Engineering (ISSRE), pp. 144-155, 2014.

[18] S. R. Chidamber and C. F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Transactions on Software Engineering, 20(6):476-493, 1994.

[19] K. S. Yim, "Assessment of Security Defense of Native Programs Against Software Faults," System Dependability and Analytics, Springer Series in Reliability Engineering, Springer, Cham., 2023.

[20] M. Fourné, D. Wermke, S. Fahl and Y. Acar, "A Viewpoint on Human Factors in Software Supply Chain Security: A Research Agenda," IEEE Security & Privacy, vol. 21, no. 6, pp. 59-63, Nov.-Dec. 2023.

[21] P. Ladisa, H. Plate, M. Martinez and O. Barais, "SoK: Taxonomy of Attacks on Open-Source Software Supply Chains," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 1509-1526, 2023.

[22] D. Wermke et al., ""Always Contribute Back": A Qualitative Study on Security Challenges of the Open Source Supply Chain," in Proceedings of the IEEE Symposium on Security and Privacy (SP), pp. 1545-1560, 2023.

[23] A. Dann, H. Plate, B. Hermann, S. E. Ponta and E. Bodden, "Identifying Challenges for OSS Vulnerability Scanners - A Study & Test Suite," IEEE Transactions on Software Engineering, vol. 48, no. 9, pp. 3613-3625, 1 Sept. 2022.

[24] S. Torres-Arias, A. K. Ammula, R. Curtmola, and J. Cappos, "On omitting commits and committing omissions: Preventing git metadata tampering that (re)introduces software vulnerabilities," in Proceedings of the 25th USENIX Security Symposium, pp. 379-395, 2016.

[25] R. Goyal, G. Ferreira, C. Kastner, and J. Herbsleb, "Identifying unusual commits on github," Journal of Software: Evolution and Process, vol. 30, no. 1, p. e1893, 2018.

[26] C. Soto-Valero, N. Harrand, M. Monperrus, and B. Baudry, "A comprehensive study of bloated dependencies in the maven ecosystem," Empirical Software Engineering, vol. 26, Mar 2021.

[27] R. Duan, O. Alrawi, R. P. Kasturi, R. Elder, B. Saltaformaggio, and W. Lee, "Towards measuring supply chain attacks on package managers for interpreted languages," arXiv preprint arXiv:2002.01139, 2020.

[28] Enduring Security Framework, "Securing the software supply chain: Recommended practices guide for developers," Cybersecurity and Infrastructure Security Agency, Washington, DC, USA, August 2022.

[29] Z. Durumeric et al., "The matter of heartbleed," in Proceedings of the ACM Internet Measurement Conference, pp. 475-488, 2014.

[30] D. Everson, L. Cheng, and Z. Zhang, "Log4shell: Redefining the web attack surface," in Proceedings of the Workshop on Measurements, Attacks, and Defenses for the Web (MADWeb), pp. 1-8, 2022.

[31] "Highly evasive attacker leverages SolarWinds supply chain to compromise multiple global victims with SUNBURST backdoor," Mandiant, available at https://www.mandiant.com/resources/blog/evasive-attackerleverages-solarwinds-supply-chain-compromises-with-sunburstbackdoor

[32] W. Enck and L. Williams, "Top five challenges in software supply chain security: Observations from 30 industry and government organizations," IEEE Security Privacy, vol. 20, no. 2, pp. 96-100, 2022.

[33] "CircleCI incident report for January 4, 2023 security incident." CircleCI, available at https://circleci.com/blog/jan-4-2023-incidentreport/ [34] K. Toubba, "Security incident update and recommended actions," LastPass, available at https://blog.lastpass.com/2023/03/securityincident-update-recommended-actions/

[35] D. Wermke, N. Wöhler, J. H. Klemmer, M. Fourné, Y. Acar, and S. Fahl, "Committed to trust: A qualitative study on security & trust in open source software projects," in Proceedings of the IEEE Symposium on Security and Privacy (S&P), pp. 1880-1896, 2022.

[36] D. A. Wheeler, "Countering trusting trust through diverse doublecompiling," in Proceedings of the IEEE Annual Computer Security Applications Conference (ACSAC), pp. 13-48, 2005.

[37] K. S. Yim, I. Malchev, A. Hsieh, and D. Burke, "Treble: Fast Software Updates by Creating an Equilibrium in an Active Software Ecosystem of Globally Distributed Stakeholders," ACM Transactions on Embedded Computing Systems, 18(5s):104, 2019.

[38] G. Holmes, A. Donkin, and I. H. Witten, "WEKA: a machine learning workbench," in Proceedings of the Australian New Zealnd Intelligent Information Systems Conference (ANZIIS), pp. 357-361, 1994.

[39] D. Yeke, M. Ibrahim, G. S. Tuncay, H. Farrukh, A. Imran, A. Bianchi, and Z. B. Celik,, "Wear's my Data? Understanding the Cross-Device Runtime Permission Model in Wearables," in Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), pp. 76-76, 2024.

[40] G. S. Tuncay, J. Qian, C. A. Gunter, "See No Evil: Phishing for Permissions with False Transparency," in Proceedings of the 29th USENIX Security Symposium, pp. 415-432, 2020.

[41] K. R. Jones, T.-F. Yen, S. C. Sundaramurthy, and A. G. Bardas, "Deploying Android Security Updates: an Extensive Study Involving Manufacturers, Carriers, and End Users," in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security (CCS), 551-567, 2020.

免责声明: 本网站转载的文章均来源于公开平台,仅供参考。这些文章不代表 MEXC 的观点或意见。所有版权归原作者所有。如果您认为任何转载文章侵犯了第三方权利,请联系 service@support.mexc.com 以便将其删除。MEXC 不对转载文章的及时性、准确性或完整性作出任何陈述或保证,并且不对基于此类内容所采取的任何行动或决定承担责任。转载材料仅供参考,不构成任何商业、金融、法律和/或税务决策的建议、认可或依据。