LLMs (Large Language Models) aren’t often thought of as an effective tool for identifying security vulnerabilities. In this use case, we’ll demonstrate how security teams can use LLMs to flag pull requests (PRs) that may introduce vulnerable code, resulting in targeted testing, reduced MTTD (Mean Time to Detection) and fewer vulnerabilities reaching critical production environments.
Introducing the vulnerability
A vulnerability was introduced into a staging environment that allowed privilege escalation within the application: any user could set their own privileges, including upgrading to an administrative user with full control over the application and its data. This could have resulted in unauthorised access to data stored by the application, including sensitive information about application users, as well as to administrative functions (e.g. deletion of application users, access to sensitive customer data).
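The original code was not shared, but a flaw of this kind is commonly the result of unchecked mass assignment: an update endpoint copies every field the client submits onto the stored user record, including privilege fields. The following Python sketch is purely illustrative (the function and field names are hypothetical, not taken from the affected application):

```python
# Fields an ordinary user is allowed to change about themselves.
ALLOWED_FIELDS = {"display_name", "email"}

def update_profile_vulnerable(user: dict, request_data: dict) -> dict:
    # Vulnerable: every submitted field is copied onto the user record,
    # so a request body containing {"role": "admin"} escalates privileges.
    user.update(request_data)
    return user

def update_profile_safe(user: dict, request_data: dict) -> dict:
    # Fixed: only an explicit allowlist of fields is applied, so "role"
    # can never be set by the requesting user.
    user.update({k: v for k, v in request_data.items() if k in ALLOWED_FIELDS})
    return user
```

The key design point is the allowlist: denying known-dangerous fields tends to break as the schema grows, whereas an explicit list of user-editable fields fails safe by default.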
Discovering the vulnerability
Due to the nature of the vulnerability, the bug was not flagged by the code reviewer or the automated smoke tests implemented in the CI/CD (Continuous Integration / Continuous Deployment) pipeline.
The bug also went undetected by SAST / DAST tools, since it was a flaw in the application's own logic, namely its authorisation and permission system.
Instead, the vulnerability was discovered with the help of an LLM. The model was given an overview of the changes (commits) introduced in the pull request and asked what vulnerabilities the PR could introduce into the application. It flagged the issue; after reading the description of the change, the security engineer used the testing plan provided by the LLM to manually check for authorisation vulnerabilities introduced by the change, and the vulnerability was identified.
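The exact prompt used was not shared, but the core of such a workflow is assembling the PR context into a structured request for the model. A minimal sketch of that step in Python (the model call itself is omitted; wording and structure are illustrative assumptions):

```python
def build_review_prompt(pr_title: str, diff: str) -> str:
    # Combine the PR title and diff into a prompt asking the model to
    # summarise the change, flag likely vulnerability classes, and
    # produce a manual testing plan for the security engineer.
    return (
        "You are a security reviewer. The following pull request is about "
        "to be merged.\n\n"
        f"PR title: {pr_title}\n\n"
        f"Diff:\n{diff}\n\n"
        "1. Summarise what this change does.\n"
        "2. List vulnerabilities this change could introduce, paying "
        "particular attention to authorisation and permission flaws.\n"
        "3. Provide a step-by-step manual testing plan for each risk."
    )
```

The returned string would then be sent to whichever LLM API the team uses; the value is in asking for a testing plan, not just a verdict, since that is what the engineer acted on in this case.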
Remediating the vulnerability
The associated open PR was closed and a write-up of the finding was provided to the developer within 5 hours of the vulnerable code reaching the staging environment. This included a description of the vulnerability, how and where it was introduced as well as other supporting material, such as screenshots.
The developer reviewed the issue write-up and associated code, and within 24 hours made the changes necessary to prevent the vulnerability from recurring.
Business impact of using LLMs to detect run-time vulnerabilities
Security engineering teams are often tasked with working through hundreds, if not thousands, of changes and commits per release cycle or sprint, depending on the size of their organisation. Allowing machine learning, particularly language models, to assist in classifying these changes can significantly speed up and focus this process. Language models also act as an effective safety net, highlighting vulnerabilities missed by SAST / DAST scans.
Targeted testing
A large contributor to a successful security test is understanding the application being tested. In a typical point-in-time assessment, a security engineer is tasked with assessing an entire application in one go. This likely equates to hundreds, or maybe thousands, of changes since the last test. At this scale, it becomes a significant burden to work backwards and understand what these changes actually do, which areas of the application they might affect and what vulnerabilities they could introduce, particularly when the only available resource is the developer who made the changes, perhaps months ago, and who is likely to have forgotten the details.
Capturing changes close to their inception and categorising them into testable or non-testable changes enables security engineers and penetration testers to deliver value where it matters: they can focus most of their time on targeted testing rather than discovery. This is analogous to a developer using an IDE (Integrated Development Environment) to write code: the IDE isn't strictly necessary, but it makes life a lot easier. Similarly, using language models for this initial classification is not a replacement for manual testing, but it helps ensure testing is delivered faster and more effectively.
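Before any LLM triage, a cheap pre-filter can discard changes that plainly need no security testing, such as commits touching only documentation or assets. A minimal Python sketch of this idea (the suffix list is an illustrative assumption, not a recommendation from the case study):

```python
# File types that, on their own, a hypothetical triage pass might treat
# as non-testable: documentation, images and dependency lock files.
NON_TESTABLE_SUFFIXES = (".md", ".txt", ".png", ".svg", ".lock")

def is_testable(changed_paths: list[str]) -> bool:
    # A change is flagged for security triage if it touches anything
    # other than docs or assets; doc-only commits are filtered out.
    return any(not path.endswith(NON_TESTABLE_SUFFIXES) for path in changed_paths)
```

Commits that pass this filter would then go to the language-model classification described above, keeping the more expensive step focused on changes that could actually affect application behaviour.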
Improved communication
A secondary benefit of tapping into change data such as pull requests and commits is the reduction of communication issues between security and software engineers. Conveying remediation advice to a developer down to the code level fosters a deeper shared understanding and keeps both teams fully informed.
Summary
Using LLMs to summarise code changes introduced via PRs significantly improves the speed and effectiveness of vulnerability identification.
Shifting left and incorporating continuous security testing into the SDLC ensures that vulnerabilities are identified well before they reach production environments. This results in the following improvements:
- Reduced MTTD for vulnerabilities, preventing them from reaching critical production environments, potentially within hours of release rather than weeks or months.
- Targeted, higher-quality testing of the individual application components that require it, reducing the time security engineers spend on non-testable changes.
- Increased efficiency and performance of security engineers, who spend less time understanding application capabilities and more time testing the areas of applications affected by change.