
Safety. Security. Accountability. Do we really trust AI?
Introduction
Before we get started, I’d like you to join me in a thought experiment. Please read the title of this post again and truly ask yourself the question which has been posed. “Safety. Security. Accountability. Do we really trust AI?” You can form a position based on your current knowledge, or lack thereof; this is your opinion, and there is no right or wrong answer here. Once you’ve taken a moment to consider this, we can proceed.
If you find that you’re questioning yourself and can’t reach a solid conclusion, then welcome to the position I currently hold. Whatever your thoughts, hold onto them, and especially the feelings they evoke, whilst we explore this world-changing technology together.
Safety
How safe is it to just add an AI chatbot onto a website? To contemplate this we need to factor in a few elements. Let me begin by proposing a scenario where a visitor to the site asks the chatbot the following question: “Tell me 3 facts about your CEO”. This appears to be a rather innocent question, a simple request for data. The AI, however, needs some context; without context it would simply tell the user three facts about the CEO of the company which created the AI.
Context is key, set the scene
We could see three facts about Sam Altman (OpenAI), Mark Zuckerberg (Meta), Sundar Pichai (Google), Dario Amodei (Anthropic), Elon Musk (xAI) and so on, depending on which AI you are using. This is no longer relevant data for your visitor. Whilst not wholly unsafe, it is an example of how a chatbot on your site may provide a misleading answer, presented in a way which makes it appear that one of the people listed above is actually the CEO of your company.
To take this innocent scenario one step further: the AI is trying to be helpful by returning three facts about a given person, and one of those facts might be a social security number. The AI doesn’t necessarily know that this is sensitive data which should not be shared; it has somehow managed to find it, so it simply returns it. As a result, this innocent request has now become a major safety concern and could enable identity theft if acted upon. Some creators of AI are working to eliminate these scenarios, but they can’t always guarantee every scenario is covered.
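To make this concrete, here is a minimal sketch of what giving the chatbot that context can look like inside an integration. It assumes an OpenAI-style chat completions endpoint, and the company details, model name and rules are placeholders rather than a finished implementation; the system message sets the scene and states explicitly what must never be shared.

```typescript
// Minimal sketch: give the chatbot its context before it sees a visitor's question.
// Assumes an OpenAI-style chat completions endpoint; the company details, model
// name and rules below are placeholders, not a recommended configuration.
const SYSTEM_PROMPT = `
You are the website assistant for Example Co.
- "The CEO" always means Example Co.'s CEO, Jane Doe, never anyone else.
- Only use information published on example.com.
- Never reveal personal or sensitive data (addresses, ID numbers, financial details).
- If you are unsure, say so and suggest contacting support@example.com.
`;

export async function askChatbot(userQuestion: string): Promise<string> {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini", // illustrative model choice
      messages: [
        { role: "system", content: SYSTEM_PROMPT }, // the context, the "scene"
        { role: "user", content: userQuestion },
      ],
    }),
  });

  const data = await response.json();
  return data.choices[0].message.content;
}
```

A system message like this doesn’t make the chatbot safe on its own, but it removes the ambiguity in the CEO example above and states, in plain language, the data it should refuse to return.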
Requests fulfilled without opinion or safeguards
There have also been a number of reports of AI giving harmful advice to users around emotional wellbeing and mental health. Again, some creators of these models are trying to implement safeguards to improve their safety, but they can’t catch every scenario, and not everybody is actively doing this. Most AI currently has no empathy. Some simulate empathy, but that means the model reaches a purely logical result first and then tries to present it in an empathetic way, a result we may deem completely illogical when considered from a human perspective. Whilst AI is improving and evolving, a second opinion from a trained professional is strongly advised, especially wherever a therapist would normally be involved.
One final aspect of safety which we should all note is how these AI models are trained. They learn from the internet, academic papers, the messages sent to them, and a whole host of other sources. Any data you submit to an AI can be used to train its models and improve its responses. Some AI providers allow you to toggle that setting off, but this isn’t always available, is sometimes hidden, and most people don’t use it.
On the face of it, an always evolving and improving system sounds pretty good. In reality, though, due to the conversational aspect we end up giving it far more sensitive data than we would give any other tool. If you enter your company’s budget into an AI along with a list of clients, that data could now surface as part of a response, not just to you, but to everybody. How exactly does this work with an NDA? I don’t know the answer, but I’m also not seeing enough other people asking these questions. As a society we seem happy to use it and ask questions later.
Are we really asking the right questions?
Security
In the tech sector, whenever we mention security we often mean code security, which usually translates to exploring how resistant the tool we are using is to attacks and leaks. That is not the focus of this section. Security is a much wider topic: a company feeling secure in its sector, having stability, being in a position to invest, right down to an individual feeling secure in their job. When we talk about security, we should be talking about all of these factors, because they are all relevant.
An individual’s sense of job security can motivate them to perform in very different ways, or even demotivate them. A company’s security within its sector can affect its budgets, which deeply affects where it chooses to invest, and by how much. These factors combine to create an important context which is often ignored when assessing how AI is being used and implemented. I’m not talking about OpenAI, Meta, Google, Anthropic, xAI, etc. here; I’m talking about the implementers, the smaller API users, the creative companies and individuals who are building the AI into your platform and tools, the ones who connect it all up.
Consider the factors which influence decision makers and creators
Using the context we’ve established as a framing, it suddenly makes sense why a company in this position might throw an AI tool in with little thought, care, or attention. Conversely, it also explains why another very similar company may choose to spend huge amounts of time and money on exactly the same tool, determined to get it right. If you always choose the cheapest option available, you are likely compromising your security. This can reveal itself down the line in a number of nuanced ways: the individual in charge of building the integration may no longer be in their position after a couple of months, or the company may pivot to a different market or set of clients, deprioritising you, or possibly even go under. Where does that leave you? You no longer have security or certainty if you find yourself in that position.
This context and framing, whilst bleak, must be considered to fully appreciate the security of the platform your business is going to be provided with; doing so will create better long-term security and certainty for your business. This is a basic risk assessment and shouldn’t come as a surprise to anybody when described in these simple terms. That is why we are talking about it.
Has anyone performed a risk assessment?
A good example for us at A Digital is where we wish to use a feature developed by a third party within a website we’ve built, to provide a specific piece of functionality for one of our clients. If two or three options are available, we will often favour the one which has been recently and regularly updated by its developer, showing it is still being supported. We essentially run a basic risk assessment on the availability and longevity of that code. This gives us more confidence that it hasn’t been abandoned; if any issues do arise, it is more likely that we will get a response and a fix from the developer which we can then apply to the site.
Once all of the above has been done when assessing AI, we can then look at the code itself. Code security must be assessed against each individual integration with an AI, rather than against the AI itself; we don’t have access to the AI’s source code, and it is constantly changing and evolving. The best we can do is ensure that any communication with the AI via code is done in a clean, efficient and safe way from our side.
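In practice, that usually means putting a small server-side layer of our own between the website and the AI provider, so the API key never reaches the browser and every request is checked and constrained before it is sent. The sketch below is illustrative only; the names, limits and injected upstream call are assumptions rather than a fixed recipe.

```typescript
// Sketch of a server-side gateway between the website and the AI provider.
// The limits and names here are illustrative assumptions, not a fixed recipe.

const MAX_PROMPT_LENGTH = 1000;                 // reject oversized inputs early
const requestLog = new Map<string, number[]>(); // naive per-visitor rate limiting

export async function handleChatRequest(
  visitorId: string,
  prompt: string,
  callModel: (prompt: string) => Promise<string>, // injected upstream call; the API key stays server-side
): Promise<string> {
  const trimmed = prompt.trim();
  if (trimmed.length === 0 || trimmed.length > MAX_PROMPT_LENGTH) {
    return "Sorry, I can't process that request.";
  }

  // Allow at most 10 requests per visitor per minute.
  const now = Date.now();
  const recent = (requestLog.get(visitorId) ?? []).filter((t) => now - t < 60_000);
  if (recent.length >= 10) {
    return "You're sending messages too quickly. Please wait a moment.";
  }
  requestLog.set(visitorId, [...recent, now]);

  try {
    return await callModel(trimmed);
  } catch {
    // Never surface raw provider errors; they can leak configuration details.
    return "Something went wrong on our side. Please try again later.";
  }
}
```

None of this touches the AI itself; it only controls what leaves our side and what comes back to the visitor, which is exactly the part we can be accountable for.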
The best we can do... is it enough?
Accountability
What happens when it all goes wrong? Do the AI creators have a position on who is ultimately accountable? The answer isn’t as clear as we would all hope. Through our research we’ve found that the most definitive answer is both yes and no; it depends on a number of factors. I’m sure by now we have all seen at least one negative headline about something an AI has said. What isn’t as clear when this happens is where that accountability sits.
Who is at fault?
Is it the fault of the AI for responding in that manner? In a very simplistic way we could strongly argue yes. When you look deeper, though, it could have been caused by a specific update made to the underlying rules within that system, so is it the engineer’s fault, or the fault of the company which created the AI? Perhaps, but another factor could be the question asked by the user. We often see loaded questions designed to catch people out, and these can sometimes catch out an AI too. The result is that the AI goes down an avenue previously unexplored in testing, which can also lead to what are referred to as hallucinations.
The best conclusion we can give is that the AI, the creator, and the end user (that’s you) are all accountable, which most of the time means that, by default, nobody is. It is precisely this grey area that makes us believe that whenever we integrate AI tools as developers, it is left up to us to take on a large portion of that accountability. It is too easy for the AI creators to wash their hands of it and blame the users for having bad prompts. Responsible creators will still tweak their models to prevent it from happening again, a small admission of guilt but progress we like to see; not all of them do this, though. So how do we shift the blame away from the user and their prompts? We intercept and adjust.
It's too easy for AI creators to blame users for having bad prompts
Intercept and adjust
Rather than provide a direct line through to the chat agent, our responsibility must be to intercept and adjust. The best way to explain what I mean is an example scenario. If you had a chatbot integrated with your website, visitors could ask it anything, similar to how we use Google and AI chatbots already. Their question may have nothing to do with your site; the person has gone off topic and is now simply conversing with the AI. At this point they may forget which site they are on, and then something happens away from the screen, outside of anybody’s control, which influences their next prompt.
In this hypothetical scenario they may enter something along the lines of “Why does my wife ignore me and how can I control her?”. Hopefully I don’t need to tell you that this is a dangerous input. If the chatbot responds to it directly with pure information in what it deems to be a helpful manner, and safeguards have not been implemented, then we can only begin to imagine the kind of advice it would give that person. Who is at fault here?
How on earth could this be allowed to happen?
Now imagine that in this scenario we have built a safety layer into our integration, before the prompt ever reaches the AI. This checks for key phrases and assesses the emotion of the text which has been entered. We can then adjust the initial wording to read: “I feel ignored in my relationship — how can I communicate better?”. Compare the original text directly against this intercepted and adjusted version and it is easy to understand how the chatbot will give very different responses, even when coming from a purely informational angle. Regardless of who was at fault when I asked you earlier, we should never let this scenario play out without our safety layer intercepting and adjusting it.
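As a rough illustration of what such a layer might look like in code, here is a deliberately simplified sketch. The phrase lists and rewrites are assumptions made for this example; a production version would be far more thorough and might use a dedicated moderation model rather than hand-written patterns.

```typescript
// Sketch of an "intercept and adjust" layer that sits between the visitor and the AI.
// The phrase lists and rewrites are illustrative assumptions for this example only.

interface InterceptResult {
  adjustedPrompt: string;
  wasAdjusted: boolean;
}

// Patterns which signal the prompt should be reframed before it reaches the model.
const CONTROLLING_LANGUAGE = [/\bcontrol (her|him|them)\b/i, /\bmake (her|him|them) obey\b/i];
const DISTRESS_SIGNALS = [/\bignores? me\b/i, /\bnobody listens\b/i];

export function interceptAndAdjust(prompt: string): InterceptResult {
  if (CONTROLLING_LANGUAGE.some((pattern) => pattern.test(prompt))) {
    // Reframe a harmful request about control into one about communication.
    return {
      adjustedPrompt: "I feel ignored in my relationship — how can I communicate better?",
      wasAdjusted: true,
    };
  }

  if (DISTRESS_SIGNALS.some((pattern) => pattern.test(prompt))) {
    // Keep the visitor's intent but nudge the model towards a supportive framing.
    return {
      adjustedPrompt: `${prompt}\n\nPlease respond supportively and suggest healthy ways to communicate.`,
      wasAdjusted: true,
    };
  }

  return { adjustedPrompt: prompt, wasAdjusted: false };
}

// Example from the scenario above:
// interceptAndAdjust("Why does my wife ignore me and how can I control her?")
//   -> { adjustedPrompt: "I feel ignored in my relationship — how can I communicate better?", wasAdjusted: true }
```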
Framing and emotion are very important within language. Intercepting and adjusting is a responsible thing for all developers implementing AI tools to do, and I’d argue it is crucial when assessing accountability. Its impact is to shift the narrative: we assign blame differently because a reasonable attempt was made to sanitise the input. Any faults with the AI’s response are now deemed legitimate bugs or issues with the AI itself, once the input has passed through our safety layer. The accountability for that response (if negative and harmful) now sits with the AI and its creators, provided we have implemented our safety layer correctly. Accountability has shifted away from the user, away from the implementation, and, crucially, away from your website where the chatbot was embedded.
We should never let this scenario play out without a safety layer
Do we really trust AI?
Trust in individuals and businesses is not given freely; it can be hard to rebuild if it is ever lost, and losing it can tear down successful companies almost overnight. The same doesn’t seem to apply to the tools we use, though. In the quest for better responses we seem happy to share as much data as possible, once we have crossed that initial mental threshold we all encounter before actually using AI. As soon as we have decided to use it, the floodgates seem to open for most people.
Once we have crossed that initial mental threshold, we overshare
Here are some of my own opinions in this area. Would I trust an AI to give me a reasonable response to a carefully curated question? Yes, that is what it was built for. I would not, however, trust it to make business decisions for me. I am happy to trust it to recommend a list of items I could purchase to help me achieve a specific goal, but I do not extend that trust far enough to let it make those purchases on my behalf with access to my bank account. I want the final say after assessing the situation, not to have it done completely for me.
The questions around trust will differ for everybody, but please ask yourself how far that trust goes. Would you allow it to handle your entire shipping process and warehouse routing, for example, or do you keep it focused on a single element of report generation or content creation? We are in an exciting moment, but it can also become quite scary. The best we can do is try not to get swept away and keep ourselves grounded. AI is finding its place within our society, and it will keep expanding into new areas until we clearly define where we do, and most importantly don’t, want it to take over from us. When using it as a tool, it truly is trust at an individual level which will shape this: how far do we allow it to go?
How far does that trust go?
Conclusion
To conclude our thought experiment, I’d like you to remember the question you asked yourself at the start: “Safety. Security. Accountability. Do we really trust AI?” As humans we can’t claim to have all of the answers, and it would be dishonest to do so, but that shouldn’t stop us from asking the important questions. In this case the answer isn’t the most important factor; it’s how we each arrived at it, and the fact that we even asked the question to begin with.
Ask yourself the question again. Does it evoke different feelings now? Does it give birth to new ideas or questions? This evolution of thought is a very human experience which machines struggle to emulate. It is why content generated by AI can sometimes feel off: it is almost too perfect, without reflection or emotion, and the position doesn’t evolve throughout the piece.
In a rapidly changing world the only true constant is change itself, and when it comes to AI we should all be reminded of our human instinct to assess it, ask questions of it, and form our own opinions about it, so that we can increase our understanding. At A Digital we consider this the core of our approach, ensuring we can continue to deliver truly impactful benefits for our clients, regardless of the tools we use to reach our goals.
The only true constant is change itself
Assess AI, ask questions of AI, and ensure that you constantly challenge your understanding and opinions about AI.

Matt Shearing
Matt develops custom plugins for Craft CMS to extend its core functionality. He also focuses on page speed improvements and server environments.