None of this was written by ChatGPT. But that is exactly what you would expect an AI to say, isn’t it?
With the rapid rise in popularity of Artificial Intelligence (AI) services like ChatGPT, Dall-E, and GitHub Copilot, many people are looking at ways to leverage the new capabilities of these systems to improve their existing businesses. From writing internal documentation to generating software code; from producing static images for presentations to creating whole video sequences: the possibilities appear both enticing and endless.
But what is the potential security impact of using these systems, particularly when applied to the payment industry? What is AI being used for, and what are the risks associated with those uses?
Garbage in, Garbage out?
Until recently, the uses of AI systems have been limited to data analysis – automating operations that could otherwise be performed by humans, but where the input data is so large, or the required response times so short, that relying on humans is not feasible. This is where the payment industry has traditionally used AI: to help spot anomalies, which may be indicators of fraud, in enormous global transaction datasets.
PCI Security Standards Council (PCI SSC) standards, such as Mobile Payments on Commercial Off the Shelf (COTS) devices (PCI MPoC), rely on attestation and monitoring systems that can utilize AI to analyze data from the multitude of COTS devices used to accept payments, helping to detect attacks on the system that may be underway. Similar AI-based detection has existed in Endpoint Detection and Response (EDR) tools for some time, analyzing data to look for anomalies in enterprise workstations and networks.
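As a simplified illustration of how this kind of anomaly detection can work, the sketch below trains an unsupervised model on historical transaction features and flags outliers in new data. It is a minimal sketch only: the feature names, values, and thresholds are assumptions for illustration, not taken from any PCI SSC standard or real fraud-detection system.

```python
# Minimal sketch of unsupervised anomaly detection on transaction data.
# Feature names and values are illustrative assumptions only.
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical features per transaction: [amount, hour_of_day, distance_from_home_km]
historical = np.array([
    [25.00, 12, 3.1],
    [9.99, 18, 0.5],
    [42.50, 9, 5.0],
    [15.75, 20, 1.2],
    [31.20, 14, 2.8],
])

# Train an isolation forest on "normal" historical behaviour.
model = IsolationForest(contamination=0.01, random_state=0)
model.fit(historical)

# Score new transactions: -1 indicates a likely anomaly, 1 looks normal.
new_transactions = np.array([
    [27.10, 13, 2.0],      # similar to historical behaviour
    [4800.00, 3, 950.0],   # unusually large, odd hour, far from home
])
print(model.predict(new_transactions))
```

In practice, a model like this is one signal among many; flagged transactions typically feed into further automated or human review rather than being rejected outright.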
However, these systems are often only as good as the data used to ’train’ the AI – the data that creates the model on which the AI’s behavior is based. Because of this, when utilizing or relying upon an AI system, it is important to consider the bias that might be contained within that training data, and how this may impact the output of the AI system and anything relying on that output.
A major concern with AI systems is that we often do not know why they make the determinations they do; we only know that those determinations are based on the training data provided. Exactly which aspects of that data the system has found useful can be hard to determine. There is an active area of research in the field of adversarial machine learning1 that attempts to exploit gaps or flaws in AI behavior to produce unexpected or malicious output. For example, research has demonstrated attacks where images that appear identical to humans are classified as entirely different objects by an AI system2.
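To make the idea concrete, the sketch below applies a small perturbation in the style of the classic ‘fast gradient sign method’ (FGSM) to the input of a toy image classifier. The model and data are dummies used purely for illustration; the point is that a change imperceptible to a human can be enough to alter the model’s prediction.

```python
# Minimal FGSM-style adversarial perturbation against a toy classifier.
# The model and image are dummies for illustration only.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "classifier": flattens a 28x28 image and maps it to 10 class scores.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
model.eval()

image = torch.rand(1, 1, 28, 28, requires_grad=True)

# Take the model's current prediction as the label we want to push away from.
with torch.no_grad():
    label = model(image).argmax(dim=1)

# Compute the loss gradient with respect to the input pixels.
loss = nn.functional.cross_entropy(model(image), label)
loss.backward()

# FGSM: nudge every pixel slightly in the direction that increases the loss.
epsilon = 0.1
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1)

# The perturbation is tiny, yet it is often enough to change the predicted class.
print("original prediction:   ", label.item())
print("adversarial prediction:", model(adversarial).argmax(dim=1).item())
```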
The Next Generation
Increasingly, AI systems are being used not just to filter and categorize input data, but to generate new output data such as text, images, videos, and even software source code. Generation of original content by an AI system, categorized as ‘generative AI,’ is relatively new but improving at a rapid pace.
While generative AI holds a lot of promise, it is important to understand the limitations of these types of systems. For example, AI trained to generate functional code may not always generate the most secure code3 – ‘functionality’ and ‘security’ are different things. This is why various PCI SSC standards contain specific requirements pertaining to software development and to training staff on secure coding techniques.
An entity that relies on AI-generated code needs to ensure that it not only works (which is not always guaranteed), but that it is also written in a secure way.
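As a generic illustration of that difference (not output from any particular AI tool), both of the functions below ‘work’ for well-behaved input, but only the second resists SQL injection. The table and column names are hypothetical.

```python
# Both functions return the same result for well-behaved input, but only the
# second is written securely. Table and column names are illustrative only.
import sqlite3

def lookup_order_insecure(conn: sqlite3.Connection, order_id: str):
    # Functional, but vulnerable: user input is concatenated into the SQL
    # string, so input like "1 OR 1=1" changes the meaning of the query.
    return conn.execute(
        "SELECT id, status FROM orders WHERE id = " + order_id
    ).fetchall()

def lookup_order_secure(conn: sqlite3.Connection, order_id: str):
    # Also functional, and secure: the parameterized query keeps user input
    # as data, never as part of the SQL statement itself.
    return conn.execute(
        "SELECT id, status FROM orders WHERE id = ?", (order_id,)
    ).fetchall()
```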
Generative AI also faces challenges around copyright and the use of the output it creates. Where does the training data come from, and have the owners of that data agreed for it to be used as input to an AI system? If an entity is entering its own data, or its customers’ data, into an AI system – for example, to review the content of documents or to perform some form of code review – how is that data secured and managed? Will it be recycled and used as further training data, and is the entity losing ownership of that data by using the AI system? There have been instances where AI systems leaked4 personal and payment information that other users had previously input into the model.
It is important to understand that generative AI systems are not procedural in nature; they may not always give the same output for a given input. These systems continue to ‘learn’ from the data they are provided and change over time as that data changes. As such, it is recommended that entities continually test and vet the output of these systems to ensure it remains fit for purpose.
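One way to approach this is to treat the AI service like any other changing dependency and run periodic regression checks against known prompts. The sketch below is a minimal illustration: the generate() wrapper and the acceptance checks are hypothetical placeholders for whatever AI service and criteria an entity actually uses.

```python
# Minimal sketch of a periodic regression check for generative AI output.
# `generate` is a hypothetical wrapper around whatever AI service is in use;
# the prompts and checks are placeholders for an entity's real acceptance tests.

def generate(prompt: str) -> str:
    raise NotImplementedError("call your AI service here")

# Known prompts paired with properties the output must still satisfy.
REGRESSION_CHECKS = [
    ("Summarize our refund policy in one sentence.",
     lambda out: "refund" in out.lower() and len(out) < 400),
    ("Generate a Python function that validates a card expiry date.",
     lambda out: "def " in out and "eval(" not in out),
]

def run_checks() -> bool:
    ok = True
    for prompt, passes in REGRESSION_CHECKS:
        output = generate(prompt)
        if not passes(output):
            ok = False
            print(f"DRIFT DETECTED for prompt: {prompt!r}")
    return ok

if __name__ == "__main__":
    run_checks()
```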
Attack of the Robots
Unfortunately, AI systems are not only being deployed to make things better. As with many modern technologies, bad actors are often among the first to recognize their value and to deploy them as part of ongoing attacks. With the ability of AI to create convincing conversational dialog – and even to generate ‘deep fake’ pictures, audio, and video – this technology can act as a multiplier for phishing and other types of social engineering attacks.
Systems which rely on remote validation of user identity can also be impacted by such generative AI. In a world where pictures, videos, and voice can be instantly – and falsely – generated, the difficulty of authenticating remote customers is dramatically increased.
As noted, correctly trained AI can generate functional code, but AI can also be trained to find vulnerabilities in software5. Many software validation and testing tools now include AI elements that can speed up the detection of common flaws or find software flaws that would previously have gone undetected.
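As a concrete example of the foundation these tools build on, a coverage-guided fuzzer such as Atheris can be pointed at a parsing routine with a short harness like the sketch below. The parse_payment_message target is hypothetical, and the AI assistance described in that research lies in generating and improving harnesses of this kind, rather than in the harness itself.

```python
# Minimal Atheris fuzzing harness sketch. `parse_payment_message` stands in
# for whatever parsing code an entity wants to test; it is hypothetical.
import sys
import atheris

def parse_payment_message(data: bytes) -> dict:
    # Placeholder target: real code might parse an ISO 8583 message, a JSON
    # payload, etc. Bugs here are what the fuzzer is trying to surface.
    text = data.decode("utf-8", errors="ignore")
    fields = text.split("|")
    return {"type": fields[0], "amount": fields[1]}  # IndexError on short input

def TestOneInput(data: bytes):
    try:
        parse_payment_message(data)
    except ValueError:
        pass  # a rejected message is expected; anything else is a crash worth triaging

atheris.instrument_all()
atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()
```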
It is difficult to estimate the impact that AI will have in the future, but those impacts will be profound and wide-ranging. If you are considering the use of AI in your business, or how you can defend against AI-based attackers, the following provides a starting point for approaching security in this new threat landscape.
Considerations When Using AI-based Systems:
- Does the input you are providing to the AI system contain sensitive data such as customer data, account data, cryptographic keys, or API (Application Programming Interface) keys? How does the AI system manage and guarantee the security of that data, and how does this impact the scope of your security and compliance obligations? (A simple prompt-screening sketch follows this list.)
- Are you validating any generative AI output through expert review to ensure there are no security issues or inconsistencies, and that the output is functionally correct and/or accurate?
- Do you regularly check the output of any generative AI systems to confirm they continue to work as originally validated, and their output has not significantly changed over time in a way that affects your use case?
- Are you confident in the legal position of any AI uses and/or generative output you are relying on, as they apply to your input data, use-cases, industry, and region?
- Are you confident in your understanding of how the AI system operates, where data processing occurs, and how decisions are made? What process is used to update AI models, and how does that affect your use-cases and data?
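On the first point above, one pragmatic control is to screen prompts for obviously sensitive values before they leave your environment. The sketch below is a minimal illustration only: the regular expressions are deliberately simplistic, and a real deployment would need much broader coverage as part of a proper data-loss-prevention approach.

```python
# Minimal sketch: screen text for obvious sensitive values (candidate card
# numbers and API-key-like strings) before sending it to an external AI service.
# The patterns here are deliberately simplistic and illustrative only.
import re

def luhn_valid(digits: str) -> bool:
    """Standard Luhn check, used to filter out random digit runs."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
API_KEY_LIKE = re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9_]{16,}\b", re.IGNORECASE)

def redact(text: str) -> str:
    def mask_card(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        return "[REDACTED PAN]" if luhn_valid(digits) else match.group(0)

    text = CARD_CANDIDATE.sub(mask_card, text)
    return API_KEY_LIKE.sub("[REDACTED KEY]", text)

print(redact("Customer card 4111 1111 1111 1111, key sk_live_abcdefghijklmnop"))
```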
Considerations when Defending Against AI-based Systems:
- Are you relying on behavioral input that can be easily mimicked by an AI system, such as voice, pictures/video, etc.? If so, can you supplement this with other authentication factors such as strong passwords or cryptographic systems (such as passkeys)? (See the challenge-response sketch after this list.)
- Does your threat model rely on the inability for attackers to scale human-level comprehension or input, such as through interactive chat or CAPTCHAs? If so, does this need to be reconsidered due to the increased ability of AI systems?
- What protections do you have in place against phishing and other types of remote social-engineering attacks, and how is your staff security-awareness training adapting to consider AI-based attacks? Assume these types of attacks will increase rapidly with the increased capability of AI, including expanding beyond text to include (fake/generated) voice and video.
- Can users identify and validate content or services you provide so that they can protect themselves against AI-generated fakes or phishing they may believe originates from you?
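On the first point above, the value of cryptographic authentication is that it depends on something an attacker cannot synthesize from public recordings of a person. The sketch below shows the underlying challenge-response idea using an Ed25519 key pair; it is a simplified illustration, not a full passkey/WebAuthn implementation, which additionally provides origin binding, attestation, and platform-managed keys.

```python
# Simplified challenge-response sketch using an Ed25519 key pair, illustrating
# why signature-based authentication resists AI-generated voice or video: the
# attacker would need the private key, not a convincing imitation of the user.
# This is not a full passkey/WebAuthn implementation.
import os
from cryptography.hazmat.primitives.asymmetric import ed25519
from cryptography.exceptions import InvalidSignature

# Enrollment: the user's device generates a key pair; the server stores only
# the public key.
device_private_key = ed25519.Ed25519PrivateKey.generate()
server_stored_public_key = device_private_key.public_key()

# Authentication: the server issues a fresh random challenge...
challenge = os.urandom(32)

# ...the device signs it with the private key it alone holds...
signature = device_private_key.sign(challenge)

# ...and the server verifies the signature against the stored public key.
try:
    server_stored_public_key.verify(signature, challenge)
    print("authenticated")
except InvalidSignature:
    print("rejected")
```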
Sources:
(1): https://www.nccoe.nist.gov/ai/adversarial-machine-learning#:~:text=Adversarial%20machine%20learning%20(AML)%20is,to%20obtain%20a%20preferred%20outcome
(2): https://www.science.org/content/article/turtle-or-rifle-hackers-easily-fool-ais-seeing-wrong-thing
(3): https://ee.stanford.edu/dan-boneh-and-team-find-relying-ai-more-likely-make-your-code-buggier
(4): https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html
(5): https://security.googleblog.com/2023/08/ai-powered-fuzzing-breaking-bug-hunting.html