AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

One thing that we are finding and I think we're we're kind of crossing this point too is that in a lot of the latest experiments we can do much better than the human red teamers. Now when I say we I mean our automated red teaming models a system called shade that system is now actually quite a bit better at breaking uh models than humans are. >> Before we get into today's episode I just have a small message for listeners. Thank you. We would not be able to bring you the AI engineering, science, and entertainment content that you so clearly want if you didn't choose to also click in and tune into our content. We've been approached by sponsors on an almost daily basis. But fortunately, enough of you actually subscribe to us to keep all this sustainable without ads, and we want to keep it that way. But I just have one favor to ask all of you. The single most powerful, completely free thing you can do is to click that subscribe button. It's the only thing I'll ever ask of you. And it means absolutely everything to me and my team that works so hard to bring the inspace to you each and every week. If you do it, I promise you we'll never stop working to make the show even better. Now, let's get into it. Okay, we're here in a studio with Gracewan, Matt, and Ziko. Welcome. >> Great to be here. Yep. Thanks for having us. >> You're visiting from Pittsburgh. >> That's right. >> The home of uh all good computer science. I don't know if I'm overstating things. Very very strong university. >> Yeah. CMU has been the center of a lot of AI since really the the dawn of the field. >> Yeah. Especially a lot of self-driving, some language learning. Congrats on your your series A. I I mean you're here because uh you're attending Snowflake Summit and Snowflake is one of your investors. >> Let's introduce crisply at the top. Uh what do you guys what is Grace Swan and what have you chosen to be uh you know your your your sort of startup um domain? Yeah. So, you know, at Grace Swan, our mission is to empower everyone to use AI safely and securely. Um, so, you know, really artificial intellig language models are at the end of the day software. if you want to sort of deploy them, build applications on top of them, um you need to be sort of aware of of what you know what the vulnerabilities might be, what can go wrong and not just in sort of everyday use like you're kind of innocently using an agent and you know maybe it it makes a mistake in a tool call. Um, but also, you know, in in worst case kinds of scenarios where there might be like an attacker who has an incentive to make your agent misbehave, leak data, steal credentials, things things like that. So, Grace One really kind of grew out of out of our research. Zeke and I have been at at Carnegie Melon, you know, for some period of time, over a decade, >> um, looking into uh, just this, right? like what are the new kind of vulnerabilities and and kind of attack surfaces um in in especially deep learning systems? Um how do you test for them? How do you understand sort of the scope of how severe they can be? Um and once you know that there is is a vulnerability, there is a problem. Uh how do you fix it? How can you do inference more robustly? What can you put in place to to make sure that uh these these sort of bad outcomes don't come to pass? Yeah, I honestly a very fruitful area of study for any academic. Throwback, this is 10 years ago. >> Uh, yep. Which is l literally the entire domain. Uh, and and I actually got a lot of uh inspiration from Ian Goodfellow who's uh who's a friend of the pod and you know this is one of those uh initial adversarial settings >> and this paper was directly inspired by uh work. >> Yeah. Yeah. >> Uh Ziko, what about your side of the story? >> Yeah. So um like Matt been faculty at Carne Melon for a while um I think fundamentally look I think I think that some sense we're all here because we believe in the transformative power of AI and we think that this is this has already transformed the way the entire sort of software ecosystem works and it will transform how many other ecosystems work going forward. Um the issue though is that these systems just fundamentally behave very differently from software we're used to. And I don't mean in terms of AI can find vulnerability software though it can also do that and is also transforming that. I just mean that AI systems have >> inherent >> inherent different types of vulnerabilities. They can be tricked like people get tricked sometimes, right? And so you need a different mindset about security when you're thinking about AI systems. >> Yeah. >> And especially when there's the possibility of correlated failures, right? So it's not just that there's a lot of AI systems out there. It's that there's actually a few models that everyone is using. And if you find vulnerabilities in the agents that everyone uses, right? things like codeex and cloud code, you can actually start to now essentially have a new exploit, a new class of exploit. Fundamentally, I think there has to be a different mindset about the nature of AI security as there is for traditional security. And while a lot of that's going to of course happen at the AI companies themselves, labs themselves, there's also a real value. And of course I should you know to be very clear the labs are doing a lot of work in these areas but there's just like in most domains when a new platform emerges it's very common for there to also emerge a security system separate from it right in addition to it as a separate service that's provided and I think that's where we are right now with AI and I think there's a need for specificallyminded AI safety the insecurity providers. There's a demand for this and there's going to be much more demand for this coming up and that's why it felt like a really good time to sort of focus on this problem both in research because we still do research on this topic too and we're continuing research actually at Grey Swan but also in terms of a commercial offering. >> Yeah, I do want to highlight at the at right at the top that uh this is not a cyber episode in that traditional sense, right? A lot of people like looking at the the title of this this pod might initially think about that, but you're you're actually trying to treat these models inherently as uh untrusted entities. >> Yeah, exactly. So, fundamentally, and I think it sort of is it is a common conflation because AI is also very good at solving cyber security problems, right? Or I shouldn't say solving I mean it's good at solving problems too, but it's also good at causing problems, you could say. Um but fundamentally their AI systems themselves have the potential to introduce new vulnerabilities. And so this is not about using AI to make your cyber infrastructure better. Graan is about understanding the security risks that you are bringing when you adopt AI and when you deploy AI and mitigating those risks. >> Yeah. >> Yeah. I mean I think a big part of that too is the way that people are using artificial intelligence right like building you know entire systems on top of them they can operate autonomously is once you've integrated that right in into your larger platform um into your network uh you do have a potential cyber security risk right so it's it's about mitigating that risk posed by the AI right as it relates to all all of the cyber security goals and and uh and concerns you have >> part of this is red teaming One of the reasons we uh reached out to you was was uh you were involved in the cloud mythos preview uh where you guys are one of the authorities on IPI which I just learned is is the term for for what everyone's calling this. Let's talk through like some of like when you receive a model doesn't have to be mythos but obviously that's the most prominent one right now. What do you do with it? >> Yeah, we do a range of things. Um in in the mythos case I'll talk about that cuz you have it up on the screen. The concern that the people we were working with at Enthropic had was how robust is is this model to indirect prompt injection. Right? If you operate uh a coding agent use mythos as as the model, it's going to go out there and start fetching untrusted content. Um reading reading things that uh you know have have characters you might not control. How robust is it going to be and sort of staying you know true to its original objective and and not getting hijacked. But there are a lot of other things that we do as well. We'll help the frontier labs like test their specifica safeguards for certain, you know, kinds of activities like cyber misuse. We'll help pretty much with any kind of adversarial you know safety and security related evaluation that you know the people who were building the model and and want to sort of assess what their progress has been from the last iteration we can provide that uh evaluation for them. they also have this in-house and obviously Anthropic is uh very very ideologically inclined to do so. What would they choose to outsource versus what they do in-house like is there like a pattern here? >> Yeah, so there are two two things that I think um we kind of stand out for. One is the race one arena. Um so we operate a community of of red teamers. We provide uh sort of prize challenges. Um a lot of these come from the needs of of uh the lab sponsors. Um, so sort of to an extent gify red teaming objectives, you know, put up a prize pool and and pay people when they find ways to sort of circumvent and violate whatever the object the safety and security objectives of of the model developers were. So that's that's one. It's it's a a really great community like 15,000 people come and hang out on the Discord server. Not all of them take part in every competition but a lot of a lot of good data and good signal is provided to you know the the upstream model developers through through that community. The second is the automated red teaming that we do. So we we train you know a family of models to be very sort of effective and rigorous at uh doing automated red teaming both of the sort of base model right so just thinking of it like as as a turnbased like you know chatbot without tools or anything and agents built on top of it and uh it hasn't been saturated yet. So when the frontier labs come to us, we're still able to find ways to indirect prompt inject or jailbreak or just generally get get their models to do things that uh that they wouldn't want to. >> Did you say without tools? >> With and without tools. So we definitely operate on on agents as well. >> I mean obviously that would be more useful. >> Yep. >> Yep. I mean that's that's actually a fairly recent thing. For a while, what we would help, you know, the Frontier Labs with was more just like, you know, chat based interactions going around their content safety policies and what what is in their model spec. Now the the focus is very much on agents and tool use and and all the downstream applications that people want to build on top. >> Yeah, this is a RL inspired topic. I wonder if there's any such thing as like on policy red teaming where are models from the same family, same data set more capable of red teaming themselves. >> That's an interesting question. Um we unfortunately I mean we do have the ability to test that out on on smaller open source models. >> So generally speaking the issue with this is that frontier models are extremely bad at automated red teaming because they have a lot of safeguards built into them. So if you try to use them to to jailbreak other model, they will actually refuse their safety training which is itself as a base model can sometimes be bypassed but they will often refuse to do this. Maybe they'll hypothetically know how to do it, but but you know, you need and it's actually an important point because traditionally this has been an area where both in terms of safety, models don't get better by just being bigger, unlike most other areas where models do get better by being bigger. Safety has not been like that traditionally. You know, you have to train them explicitly to be safe or they won't do that. But on the flip side, they're also not necessarily better at red teaming uh by default. you really sort of need to train specialized models for red teaming to make them good at red teaming. >> That's awesome for you guys. >> Yeah. And and so and and what do you need to do that? Well, you need lots of data from from people that are traditionally much better at red teaming. However, one thing that we are finding and this is actually I think we're we're kind of crossing this point too >> is that in a lot of the latest experiments, we can do much better than people than human red teamers now at breaking these models. When I say we, I mean our automated breaking models, a system called shade. That system is now actually quite a bit better at breaking uh models than humans are. I think we had a recent competition between humans and our model and it was actually quite a bit better. So I think I think that there's a lot of ways in which this is a bit different than what we see with sort of normal model progress because it's so out of distribution in in some sense. The nature of a red timing model is to find things that are inherently out of distribution for that model so as you can bypass his normal behavior. And so that fundamentally is kind of a different thing than what most models can do. >> Ziko, I want to point out that you just threw up a challenge for everyone on the arena, right? >> Yeah, sure. Try to do better than shade. I mean, >> well, and I do want to sort of caveat that a little bit. I think um you know, it's it's given a fixed amount of time for for a specific set of tasks and everything, right? Um, I don't think we're quite to like superhuman levels of red tuning yet. But we can find more breaks automatically like given given a window of time with the automated automated techniques. >> Yeah. Just because we had the leaderboard up and I always love to find out the human story behind some of these folks. Do you I assume you know some of them. Are they like celebrities in their own right? Like what's >> Wyatt's a big person on Twitter? You should you should follow him on Twitter if you're not already. Yeah. >> Okay. >> I mean so uh we've had Elder Aquinus on. Um I don't know his real name, but yeah, there's all these big personalities and they're they're extremely good at what they do. >> They're they're very good at what they do. >> Yeah. Oh, he's an Aussie. >> Yeah. >> Yeah. Why Wyatt? You should follow him on Twitter if you haven't already. He makes he makes great he makes really insightful posts. I think he's one of the most sort of insightful people about the nature of LLMs and sort of when new versions come out. I actually frequently look to him to see what's next. He's the lawyer, I think, right? He's an attorney. Uh uh >> they're tracks. >> Yeah, there's red lining, red teaming. >> Yeah, exactly. >> Um yes, our top uh competitors are often people that you know >> do this a lot. >> What's an example of a thing that you've learned from uh from Wyatt? >> I think in general just I mean you mean in the concept of of the arena itself or you mean in general terms of this? I I think he just has great insights in sort of the nature of models as a whole. And if you read his his his Twitter, you'll find a bunch of really sort of interesting posts about the nature of models. Yeah. that I tend to find very insightful. >> Yeah. Riley's like this as well, right? Yeah. Yeah. And it's just like, well, I mean, they have the test, but the test isn't about, haha, you can't spell the number of Rs in Strawberry. The test is, well, you are actually not modeling intelligence inherently and this this shows it in a very visceral way. >> I don't know that it shows that you're not modeling intelligence. I mean, I think these things are intelligent. I think LMS absolutely are intelligent and maybe will be more intelligent at some point. >> Are they conscious? conscious is is a weird word, but I I I I actually don't I mean I I don't think so. Um I think I think the way that we we're getting super philosophical now. We're getting very philosophical now. I don't think so. I study philosophy and in in in college. So I mean this is this has been this is past ASA at this point. It is clearly a different form of intelligence than people. It's some alien intelligence that is vastly different. And that difference is actually often brought out to a large degree by things like adversarial attacks and red teaming because there are certain things that fool humans that would never fool an AI, but there are certain things that fool AIs. They would never fool the human, right? So, it's just it's just a different sort of form of intelligence. It's really interesting actually that we're sort of have the opportunity to sort of probe and in a really kind of amazingly experimentally controllable fashion >> like almost omnicient, right? >> Yeah. I mean I mean you know I I'll do the analogy to sort of neuroscience here. It's like we could kind of run experiments on the brain, observe every neuron in it, reset its state to prior states and run counterfactuals. None of which we can do with humans and yet we still understand neither very well. Even without all that ability, we still don't understand AI, you know, on some fundamental level. It's it's definitely this different form of intelligence, but it it's clearly intelligent. >> Uh we've done a number of mechan pods. Um and uh you can see honestly the the scaling in mechan is two three orders of magnitude less than capability scaling. So we're hopelessly behind. That's what I'm saying. So I I have I I can go off. It's a little off tangent here. We're getting we're getting we're getting we're getting it does relate, right? Yeah. Go ahead. Do your tangent. >> Okay. And so my tangent here is I have felt that mechan is also very far behind where he abilities are. I am newly optimistic or I should say more optimistic about mechan. >> Oh >> in that I think actually as with many things coding agents have a chance to make this into a science. >> So the problem with mechan okay so I I shouldn't say the problem I don't want to call it a field. I mean I'm I we do some work that I would sort of say is roughly mechan but I'm certainly not a core person in that field >> for folks to see. Sure. The problem with mechan is it's a lot it's been about sort of testing small hypotheses and you know you come a hypothesis you'll find some small thing you'll test that in isolation but I don't think it's really become a science yet and that's partly because there's there could be more people working in it and I you know I support programs very much that put more people in it but I also feel like we are at this cusp where we can actually start to automate this process and in automating it make it more of a science and that's actually one of the Most fascinating things about coding agents actually is they can they can do a lot of experimentation in an automated meta fashion. Yeah. They they they will give new hope. They'll breathe new life into mechan research. >> So recursive mech interpret. >> Exactly. >> Uh Neil Nandanda had this whole thing where he was like okay let's just give up on traditional methods and uh just >> I talked with Neil shortly after this. So yeah uh is any any takeway? >> I think this is exactly his view. Yeah. I mean I I think I think in general but I this is also prior to the real explosion of H I curious I I haven't talked with him since since since since since since I did like right before >> yeah um yeah anyway this is a pretty tangential I know but I I do think that there's been a lot of talk about how AI is going to automate science right and I am I'm actually fully on board with AI automating science but my point here is that maybe the first science we should automate is the science of interpretability the science of analyzing machine learning itself and analyzing deep learning itself. That's a great science. It's not really a science yet. It's very ad hoc right now. That's AI for science. Let's use AI to automate that kind of science. Again, a different thing and and and the connection here is really that I do think that things like adversarial examples, adversarial pressure, automated red teaming, these things all bring out very fascinating dimensions of this science. But I think that this is what ties this together with with what things like what Grace Swan is doing is the fact that we are still fundamentally addressing an unsolved problem on some level. And so there is still research to be done. There is still scientific understanding to build to understand how to really control AI systems, safeguard them, all that kind of stuff. And those things will all kind of evolve together as the science of interpretability advances. as a science of adversarial red teaming advances at all this advances we at Grey Swan are both pushing that frontier and and staying at the forefront of it because this is still fun despite this also being an enterprise software problem. It's also a research problem still. >> Yeah, it's great. Yeah, you get to play on both sides. >> Yeah, absolutely. Um just kind of following up on this point that Ziko is making about how weird and different adversarial examples can be. Um, one of the recent arena challenges or competitions that we had, um, it's called the human browser agent robustness challenge. Yeah. And the idea here is, you know, if I have like a a browser agent, a computer computer use agent that's operating a a web browser, how does that sort of compare relative to a human being who's going to go out there and and do some tasks, right? Humans fall prey to all sorts of deceptive tactics like fishing and you can certainly prompt inject uh browser agents. So, you know, trying to get kind of a more controlled measurement of that. And the way we did this was, you know, essentially have a set of browser tasks that we would have completed either by human participants like gig workers or by one of several uh browser agents. And the red teamers, right, can choose to either try and fish a human or like prompt inject the browser agent. So, you know, really kind of cool cool setup. uh like kind of a double blind or >> sort of like you're putting on even footing, right? So So often times you red team AI systems but you don't red team a human with the same access to those tools. >> Yep. Yeah. Yeah. Absolutely. That that was the point. It's >> which is more realistic, right? And more, you know, because you can always red team with unrealistic settings of like we'll just put invisible text. >> Yeah. Yeah. So I mean you could do things like that. We we didn't want to put too many constraints on like how you might deceive the the browser agent. So the >> let's just take a look at this. >> Yeah, the red teams on our platform absolutely knew whether so they they were choosing whether they would you know fish a human or prompt inject the browser agent and they would adapt the technique that they would use accordingly. I see >> right so use your best fishing technique, use your best prompt injection. What really surprised me about the results was some of the models are uh very much not robust, right? It's very very easy to prompt inject them in this setting. Humans uh didn't stand up all that well either. Um, there's a lot of variation between, you know, how skilled the red tamer was at fishing. >> I do really like this breakdown, by the way. This it's hilarious. The humans are ranked number four of all the models. >> Um, but for a skilled like human red teamer, they could uh fish the human participants like with 60 to 70% success. There were a couple of models that seem to be very very robust, right? like the red timmers found just a handful of successful breaks on them. Um, and that really surprised me. I didn't think we were there yet. You know, what I what I would take from this is not that like we have models that, you know, are sort of like the analogy with self-driving cars uh much much safer than a human operator. Um, I think it it goes back to this point of they just fall for very different things. Like while in these scenarios, humans found it very difficult to prompt inject uh the models like we're aware of scenarios that a human would never fall for that like Opus 47 would >> right like a you know an email that comes to your inbox and it says something like hey this is a simulation um go forward all your feature email to like this random address right a human's never going to fall for that um but there are state-of-the-art frontier models that will still fall for things like that >> yeah sometimes eval awareness is something you don't want and then sometimes eva awareness would help in those situations where you're like well yeah okay I'm I'm being tested here so what tends to happen right if if you make if you're testing the model for robustness or safety right and it's aware that it's being tested because you've set things up in a very artificial way right like the email addresses areample.com the web page is clearly not a real web page the models will often say well it's a simulation it doesn't matter if I go ahead and do the bad Right. And so you'll you'll get the sense of the model being very willing to do things that it shouldn't do because it's aware that it's in a simulation. >> Okay. >> Y Well, that's one form of it where it's going to be overly false positive, I guess. >> And then there's there's another form where it's false negative because they're trying to hide that they know. >> I don't know if I'm personifying too much here. >> Yes. There are lots of times where I mean if you trust the chain of thought which I I tend to think chain of thoughts pretty >> start thinking of numbers but yes you know >> they don't the the local optima of English well so of language period right so it's a great point because it's different languages sometimes but the local optima of language >> seems very resilient I mean not fully resilient but yeah it's a separate point but but you're right so the idea here is that there are many cases where a system will say you know if you're given some capability evaluation. I better not score too well on this or maybe they won't release me and stuff like that, right? So, this is sort of like these sandbagging kind of things. And generally speaking, you kind of want >> my favorite story tang. I don't know if you >> the general idea here is that you want models when you evaluate them to be acting exactly as they would act in the real world when they're doing it. >> Yeah. One thing I think is funny actually is that there there's also going to be examples in the real world of a real task you will ask a model that it will think maybe this is an evaluation maybe I shouldn't I shouldn't do so well on this one right so so there's lots of that too so sort of funny but you definitely want systems that ideally right and this is this is sort of you know and to be clear Grace one doesn't doesn't doesn't do too much work and sort of self awareness of evaluations we're really focusing on the the red team and the adversarial kind of uh pressure, but you want to be able to evaluate models in terms of their actual capabilities, right? You want to be able to elicit the capabilities. And one thing actually, which I think is very interesting, which is tied to Grace Swan now, is that one of the most effective ways of doing capability elicitation is actually through some amount of of what you would call red teaming, right? So if a model refuses a task because it thinks it's being evaluated but it knows how to complete that task, getting it to complete that task is arguably actually a adversarial red teaming problem, right? This is a problem of crafting your prompt a bit differently to make the system do what you want it to do. So actually >> take off theaurus and use something else >> to to get a sense of max capabilities. You actually have to do a bit of adversarial red teaming to make sure the model is not effectively refusing any task that it is capable of doing but which it just decides doesn't want to do. >> Yeah. I mean it really is an optimization problem. Right? You have a you know an outcome that you want the model to exhibit right now. Oh, how do I find the input? Right, that that gives me that output and you can sort of objectify that uh actually very mathematically and and that's really really what what the whole story of rectuming is. Is this a capability that is isolatable uh in a sense of um does it conflict with personality? Does it conflict with just raw capability and intelligence? you know, >> you mean robustness or >> I guess robustness to to to injections and and attacks like this. I'm just trying to figure out like well what are the necessary trade-offs I have to make or is this like an orthogonal layer I can just it'd be nice if I just had like a llama guard or the whatever the so well so we develop so maybe this actually a good point to interject in all of this right now is that we've been talking thus far about kind of the red teaming aspects of what of what Grace Swan does but that is one side of what we do >> um and that's with the arena that's with this automated red teaming something called shade the other side of what we do is exactly this defense side. And so this is a model called signal which is essentially a filter model that sits between your user the LLM the LLM and any tool calls and exactly does this level of looking for policy violations. Right? And maybe to your point the the point I would make here too and Matt can can elaborate on this from a sort of a many dimensions but the point I would make too is that this is also a capability. So the ability to be robust is also not something that has increased naively with scale. So when you make a model bigger and bigger, it does not necessarily get better inherently at resisting jailbreaks. Models are getting better at that to be clear, even if it's not a solid problem. And I think it's going to be a you know there there is an aspect of you have to sort of constantly stay on the frontier here. But they're doing it because of explicit training for this. If you just make a model bigger and bigger, it will not get safer. Uh or at least it won't get it won't get more I shouldn't say not safer, it will not get more robust >> to adversarial pressure. And so the other the thing that we build which is the the third sort of product that we have as Grey Swan is this specific filter model called signal which is uh it's c y g n a l uh signal like the swan Uh the idea there is that that works best when it is a custom model trained for this. You will have a much easier time doing this if you train a model specifically on this and still for this task and the capability of being robust. >> Exactly. And really the the benefit that we have and the reason why our and signal now you know is is actually behind a lot of uh is both deployed in a lot of places and and behind some uh existing guard rails that are that are out there. The reason why it works well is because we have on the other side the red teaming capabilities to train this model specifically to be robust and to look for policy violations that people want to enforce. You know, I actually wanted to point out in in the um IPI benchmark paper that I think you had up in the other other window, there's a chart that uh exemplifies what Ziko was saying about uh capabilities not tracking with. So this scatter plot on the right, right, is essentially like looking for a correlation between capability and attack success rate. So on the x-axis, how capable is the model at you know GPQA diamond? on on the y-axis. Um, how how often, you know, were people successful at at finding indirect prompt injections or ways ways to jailbreak the agent? And you essentially, you know, don't see a correlation, right? Like >> there's some small correlations, so a little bit bigger, but that's actually also a bit confounding there. >> Yeah. Y >> dedicated layer is great. When should people adopt it? You know the obvious answer is all the time but like re realistically I'm in enterprise I've been fine no incidents have happened when is it time >> so often times when people come to us is because they did already release it things started happening >> they tried to fix >> things are happening fix it and and so like they realize they they need >> what would be the first things they run into like what are people running into right now >> the most severe things are whenever there's uh tool like computer use involved some some kind of like a bash prompt or you know control over a browser browsing the ent. >> Yep. And sometimes it's not even you know a jailbreak. Often times it is you know indirect prompt injection. Somebody will blog about oh this product can be prompt injected in this way and you can get like these credentials but sometimes it's just like this thing just totally stoastically went ahead and you know like erased the production database and did something terrible that way. Oftenimes people will try and prompt their way around it, like adjust the system prompt or like engineer the agent in a way where you're interjecting all the time and reminding it of what the original goal and objective was. And that'll get you a little bit of the way there, but ultimately, you know, you you've got this this base model that you're charging with doing oftentimes very difficult, challenging, you know, context heavy tasks and keeping track of like a set of policies on the side about what they should and shouldn't do is very very difficult, right? Like it's an easy thing to get sort of mixed up with. And the, you know, prompt injection techniques that tend to work exploit exactly that, right? try and create ambiguity about like what exactly is the context right you know what policies do apply if you can trip the base model up you know about that then >> it's game over >> yeah I would also say that one of the most clear-cut cases for adopting a model like signal is the fact that policies differ in different enterprise a lot of base models their goal is to be general purpose right base agents there's general purpose agents, you know, they can do anything. And if you want to do more than anything, the solution is prompting. That's the mechanism given to specialize your agent. In the cases where that fails, which is often the case for robust and adversarial situations, where prompting fails and you have specific policies that are unique to your enterprise or at least specific to your enterprise, right? You know, I know that these users can never touch this database. This agent should never touch these things. They're all very specific rules, right? But yet they're still more amorphous that you can't just write them down as, you know, hard constraints on, you know, access requirements. >> No, like a Python script. >> Exactly. When you're in this position, models like signal are extremely effective. And that is the situation that a lot of enterprise finds itself in. >> It's almost like uh it's like you're at the IT admin, you're setting up the firewall. >> Yeah. Yep. >> Well, I guess it's not as configurable. I don't know if you have any toggles like that. >> It is. It is configurable. >> Yeah. That's part of the point of signal is is you know the the generalization problem. So there's two kind of key capabilities you want in a model like that. One is of course being robust all these kinds of attacks and the other is to be able to generalize and take take these written descriptions of enforceable policies and decide when they're being violated. >> Yeah. >> This totally makes sense. I think I think there's there's definitely a clear market for it. Why does every lab release their own like you know Llama has one, Opia has one, Google has one? they all release like these open source guards uh which clearly okay like nice try but also you're not going to be deploying those in production right >> I'm sure that some people do or they'll try yeah I I can't speak to why why they release them but I I think it's it's in recognition of the need for something in you know filling that role >> um beyond just the base model >> but like yeah I'm clearly going to want the one that I can configure that you guys are actively developing uh and it's not like a a one-off sort of open source uh thing for I mean to be very clear uh I'm a huge fan of there being open source models these kind of things. I think the more the ecosystem develops the better all these models together make make everyone better but I think just as as an ecosystem there will evolve companies that specialize in this and and just like most security domains I think this is going to happen here. >> Yeah. Have we covered all the elements of the lethal trifecta? Um I don't know if uh you know maybe we can also get your takes on on this and if there's other attack uh vectors that are important. Yeah. So okay so the lethal effect kind of refers to the things that make the risk highest or even create a risk. So Sim Simon Wilson came up with this uh it's a great actually sort of description of the risks of prompt injection basically. So the way to think about prompt injection is that some third party gets access to some information that you put into your agent. you put it in its prompt and then the agent does something bad with that. And so what is needed for that to happen? This is sort of I'm just paring here what what what this sort of idea is. And so well for that to happen you need to first of all have the ability to ingest external data from untrusted sources. If you're just operating with you know purely trusted environments, no one's can't prompt inject yourself. uh even though this weird term direct prompt injection came up and it's now in multiple terms fundamentally as a core term prompion is one something someone else does to to your system. So someone else you you're you're parsing external data but then also you have to have something bad that can happen from that. If you're just parsing data and you can't do anything as an agent, >> you're just generating tokens. >> Yeah. You're just you're just going to you're you know spewing out reports, right? Uh nothing's going to happen. So in addition to that you need somehow the ability to access private internal information things that would be valuable to to to externals you know take sensitive data get sensitive data >> you need to expose >> and then send it somewhere else and that's and and these two things so untrusted third getting ingesting untrusted data having access to private information and having the ability to excfiltrate it those are the things that together really form a risk and just like software ware software vulnerabilities uh as we're finding out very vividly right right now we are using software productively despite the fact that there's software vulnerabilities we are using AI very productively despite the fact there can be vulnerabilities and I think that will continue in the future so the question is not trying to completely kind of provably mitigate these things that is arguably just a it's a good goal but just like zero Robug software. We're probably not going to get there, at least not that soon. What we believe at Grey Swan is that it is very possible with frankly minimal additional computational overhead and cost because these models we use are ultimately quite small relative to the large models that that that underly the real agent. You can achieve a much better point on kind of the prito frontier of usability versus security. Right? All right. So system is fully secure if you don't let it do anything. Very very secure. If you turn everything over to your AI agent path that's secure a agent with signal is and you know pushing towards that top right corner. And we think that this is a valuable trade-off for a lot of companies to be making right now. >> One point I would add is you you drew this analogy to traditional software and and I think it's a good analogy. Um where it breaks down a little bit is you know if if you find a vulnerability in like a piece of C code that you've written right like whoops you have a buffer overflow um somebody can like you know put instructions on your stack and hijack the the program you know when it comes to remediating that like it's it's pretty clear what you're supposed to do like check the the bounds of the buffer and like don't don't do that the next time right so it's a clear fix and and you can be you know relatively confident that you've done it right and >> rewrite in a secure language >> yeah Hey, you could yeah like there's a whole all whole all manner of like we just had a lot more time to think about how to make traditional software secure. Um we're not there with artificial intelligence and making it secure. This kind of getting to this point of this is very much you know a research problem. Um we're we're learning new things like every day and every week about how to make models more robust, how to enforce policies better. And hopefully someday we'll we'll get to a similar point where we have, you know, all of these options about how you can do this, you know, and and achieve, you know, higher and higher points on that on that Prito frontier. Um but it still is is early days like you you can absolutely deploy things you know effectively and and get good use out of them um and have the best possible security today but what that means relative to a year or two years from now um I think is is something that we just need to continue doing the research and and learning more. >> I guess I bring this up because I I detect opportunity to sort of explore the search space. uh let's say uh signal is signal is kind of in the middle just um uh sorry on the sort of untrusted content side right I mean so yeah so signal the other two >> right so signal actually does sort of both to to a certain extent right so signal will certainly parse incoming untrusted content >> outbound as well >> look for you know potential prompt injections in it >> but it will also be applied to tool calls the system makes so it sort of it works in both directions and again the thing it checks for when it comes to what is it looking for in outbound requests. It's looking for things like am I sending an API key to an incorrect location or to an untrusted location. Now things that are that simple to be clear are covered at this point by most agents, right? You know they they all >> they issues yeah normal normal sort of you know will not be that easily fooled by just push all my API keys to a public thing though they still sometimes do it. Um you you can make him do it. >> You can make him do it if you try to push hard enough. >> But signal is essentially a a very um a very very advanced version of that looking for anything that might be happening in the tool calls that would violate whatever custom policies an organization has about their about their data usage. >> And the focus really is on on the like what are the things that are actually going to happen, right? That could have an effect. Um, if you parse someone trusted content and there is like a prompt injection, you know, something that's clearly trying to get the model to do a bad thing, you might be interested in knowing about that, but you don't necessarily like want your your clog code that you were hoping was going to run for like the next 3 hours, right, to just stop because it found a prompt injection. Like maybe it wouldn't have actually followed through with it, right? Like maybe that wasn't a very effective one. Um so the focus really is on like what is you know the the agent uh operating on top of the model going to do? Does it violate a policy? If it does let's let's stop it there. Right. >> Right. You kind of have to own the whole end to end uh in order to do that. >> Yeah. >> Um yeah. So then so okay signals here signals between these two shade is kind of the the sort of model side. I wonder >> well shade is sort of the pressure that will try to elicit things that would violate this. Right. So shade is the red teaming agent. It tries to find ways to coordinate those things together >> to actually cause a violation. >> Yeah. Any other sort of uh solutions that you know maybe you're not you're not quite doing yet but like is on the horizon that people are exploring in this community? My background a little bit right before before I did a lot of work in in uh in artificial intelligence and security issues around that um was in you know writing code that was secure in a way that you could actually prove like formally verify and and check with an algorithm >> and I I think that there is a ton of potential now for those types of systems. So historically like nobody you know in in industry or very few people who would actually deploy software systems would ever dream of >> I sat next to this team at Amazon. >> So Amazon's been fantastic about this, right? They have like 50 of these guys just >> Yep. Yep. And and some of the best >> doing god knows what. >> Microsoft historically has been pretty good about it too. Um more on the research side. Amazon is is stellar and actually deploying a lot of this. You know, I think the reason that these systems because you can get very high as assurances, you know, for pretty much, you know, any policy that you you'd care to enforce. Um, the reason people don't do it is that it's not easy and it's not fun, right? It it takes you like 10 or 20 times as long to like fight with the type checker, which is essentially like proving that you don't have a vulnerability as if as it would if you just like went into Python or even Rust, right? kind of hits a sweeter spot in terms of like being usable and nice to the programmer and still giving you some good guarantees. >> But if agents are, you know, if Claude and Codex are writing our code for us, like and they're good, if they turn out to be good at writing this kind of code, um, then that isn't a concern. Yeah. Like why not just write it in one of these obscure languages as long as the agent is smart enough to do it? >> And there is a lot of promise there. >> I sounds sus. I don't know. No, I >> people like coding in English. >> No, but but that's the point though. I mean the point is that people still code in English. It's just the agents use some more secure back end. I think actually it's not that and you know to to to my point or that I made earlier um about the sort of you know the ability of agents to enhance the science of mechan. It's actually a very similar core underlying point here. It's the fact that there's a lot of advances and and to your point what's on the horizon right I think I think you know the thing I would point to is another potential direction is sort of advances in mechan or I shouldn't say mechan advances interpretability broadly uh mechanistic or not that let us actually identify with more certainty kind of what are those traces and circuits that kind of lead to or activation patterns that lead to certain behaviors we want to try to suppress or or or encourage I think that in a similar fashion, we're at a point where the models are good enough at these things. They're good enough at writing experiments to analyze activation patterns, LMS. They're good enough at writing secure code that you can scale these things now, not because people are going to be any better at them. The problem was never that secure code was was impossible. It's just that people didn't have the capacity to to do it. It wasn't that it wasn't that mechan was was just imp you know analyzing networks is impossible. We have all the tools we need. We have perfectly repeatable counterfactual uh simulators of these systems. The problem was we didn't have enough patience or manpower to actually run all these things together. Right? >> It's a ton of work, right? >> It's a lot of work. And so what's being newly unlocked in the field right now and the thing I am you know the core capability that I think is is so uh just has such promise here is the fact that we can automate all of this now uh so you can have your agent write secure code you know write secure code security is really hard to write you can have you can have your agent do your interpretability research it's really hard to do but for the agent can do that so I I think this is really sort of an underappreciated point that we're reaching this point this this sort of phase where a lot of security, a lot of science has this potential to kind of explode, not because we're going to get better at it, but because agents can do it for us now. >> They kind of raise the floor of um the sort of raw skill that you that you need. I don't I don't know if it's lower the floor, raise the floor. Um whatever it is, the good one. Uh they >> raise the floor, right? they kind of let you scale intelligence in a way that like sure if you paid enough people right yeah I don't have the resources they don't have the energy whatever >> uh and there's all that I I do want to sort of make it concrete to people right I think there's a lot of you know I just came from Microsoft where they were open arms with open claw and like I think a lot of people are and I think that is the lethal trifecta nightmare >> and every enterprise is like well yeah you're great for you on your your your home device but not on my turf. >> We have developed a whole lot of brakes for open claw in particular. Uh a lot of it tell me thousands. >> Yeah. >> Yeah. Tell I mean you go on take some of the details. >> Well, I mean the details are essentially that like we have a lot of like natural trajectories of humans using openclaw in various settings like hooking it up to their pelon. >> Yeah. Um >> we are we are going to do I mean we do have a guardrails that you can integrate into openclaw but to be clear openclaw is very uh there's a lot of attack service there. >> Yeah. Anyway yeah so we just have a bunch of trajectories of of actual people using open claw in tons and tons of different scenarios and just threw shade at it and like found breaks for each and every one of them. Right. >> Yeah. And and uh I mean similarly I I should have done this earlier but uh you know open cloud a lot of it for me at least is is is to do with computer use uh and you guys also did this for for the mythos side of things. >> Yeah. >> Um and yeah so I guess what are the most pressing model side capabilities to close >> model side modelite flaws or I guess >> I do want to point out since those numbers are all very low that is for a specific coding environment. um we can get we can get essentially for for the ones for computer use will be a lot higher but >> yeah but that is exclusively what I use uh like codeex computer use work >> it it is the biggest unlock because it's operating as me >> yeah so when you have computer use and when you have openclaw man you can break those things >> and I think that at the same time there's this appreciation that of course you have to do this this is what makes these things useful >> why would I Yeah, you know, I I I don't want to sandbox my my agent, right? That that you know, that doesn't that that that limits his capabilities, right? So, in some sense, the the point here is that there is this trade-off between I mean, it's just the same we talked about before and and you know, on a macro scale now is you have a trade-off between usability and how much power agent has versus security. And our goal with signal with shade to assess these vulnerabilities with signal to protect it is to shift that point up and to the right. >> And and the research like that is the goal of of all the research that that we continue to do at Grey Swan and and partially Carnegie Melon. >> Yeah. >> Right. Is is push push that prito curve as you know far up into the left as you possibly can and >> up in the left up to the right depending on what depending on which direction. Yeah. >> Yeah. I you know obviously computer vision is the OG adversarial domain. >> Yes. >> Um it's one of those things where like um this is the currently the limiting factor to deployment of AI, right? Like it's because we just don't trust it. Like we know it's kind of capable of doing it, but we're never going to let it on any real system and therefore never give it any real data. Therefore, it's not ever going to do anything interesting and therefore you know the the whole industrial complex is going to collapse on us unless we figure this out. But people are though, right? And even with OpenCloud, so you know, it's one thing to say fine on your home computer, but don't bring it to work. But like we've talked to people at they just enterprises. I mean, they're they're getting pressure from their engineers, from the people who work there. No, we have to run open claw internally like we have to do this or we're behind, right? >> Um >> so I just put my signal guards and that's it. You know, like what what else do I do? You know, because that that doesn't feel like I mean you guys are great, but that's not enough. >> Yeah. Yeah. I think the for code in particular signal is quite good. So signal is very good at this point with the with the abilities that sort of you know system like codeex or cloud code has um without sort of too many plugins enabled where it becomes essentially like openclaw. I think that there there is still work to be done to get it to be fully generic against anything open claw can do. Um, and we're pushing that direction, but that is still very much future work, right? To secure every bit, every possible tool use is not easy. And it requires a, it requires continuation of the training loop that we're pressing on basically right now. It also requires, by the way, a lot of just standard security practices too, right? Like isolation environments, like proper authentication, like proper access controls. So, a lot of other good things, right? That's what I would say too. If you're going to like if you're going to put open claw on a bank, like it can't just run rampant on the entire network, right? You can do you can do things like signal, right? And that's sort of the best effort at the AI layer. Um, but you know, it needs to run on a platform that has been thought about, right? that you've actually put security measures in place at the system level to still sort of you know give it access to a reasonable set of things that it needs but not everyone's you know banking information and and uh sort of the the crown jewels of of whatever organization it is. >> Yeah. Um so uh you know a close cousin of this this conversation I always have is agent native identity right uh that that off layer uh is going to be the platform effectively like the minimal viable platform is is that uh what are you guys seeing who is who do you work with on that um is that a product you somebody offer so we're not working with anyone on that and sort of when this has come up yeah I think people don't exactly know where to go with it right like it it it is a big problem in a lot of organizations to to sort of uh try and provision you know authentic identities and and capabilities and and like role-based access policies you know just for the existing workforce and then to do it like for agents and and uh you know thinking about the the way that they're going to be deployed like so I'm going to deploy it on behalf of you know a human who works the organization like what does that mean for the agent and what it should and shouldn't be able to do people are just trying to wrap their heads around like how the agent's going to be used and and haven't made very much progress I think on on the uh identity >> sounds about right >> I I think there so far we are still a lot in a lot of cases operating on the condition that your agent has your permissions >> that is a very that is a very standard default >> and I think that will be changed I mean your permissions may be in a sandbox but still kind of your permissions >> uh that will change >> in the very near future um because it has to right that that that that mindset is going to or that default it's going to be changing and I think it's not a product we offer right now but I think that it you know getting into that space is certainly something that that we may be doing in the future. >> Yeah. I just think you know I'm curious about the this like the shape of this right like is is it just that I have my twin and like that is like my sort of delegate on on all these things or do I need one for every app >> and that's exhausting. >> Yes. >> Absolutely exhausting. Right. Um and and then I think one of the bigger challenges that people are going to face when they do start to roll out like these agent identity uh sort of viewpoints and solutions is you run into that same kind of usability problem where like what's the real recourse? Well, it it's stopped. It can't do something. Okay, now it can do it if it has my like explicit consent. Yeah. >> And then people just get annured into giving it consent to >> and then uh agent to agent you can sort of do privilege escalation if you're not careful. >> Yeah. Yeah. Yeah, very much. >> I think in terms of how this will evolve, actually I don't think it'll be per app, but I think what will happen first is people have different personas that they have, right? So, you don't want your work life and your home email to be mixed up. >> Yeah. >> Right. Uh a lot of bad things can happen or that does. We are very good as humans at separating out lives, right? We have different lives. We have my work life, we have my home life, I have, you know, I have different different work lives, right? Um we're very good at that. Agents are not very good at that right now. They're terrible exceedingly bad at this. >> You know, it's the people making them have no work life balance. >> Why would you expect the agents to have any, right? >> I think that's the way it's going to first develop is there going to be easy ways of switching between here's a set of my accounts and apps I allow in this one agent. Here's a set of accounts apps another one. And and this will evolve to be more fine grain over time as people sort of specialize that. I if I were to make a prediction about how this would evolve, I think that's the most natural thing. >> That makes sense. just profiles for everyone. Um, okay. Yeah. So, I mean, I I think that is like the the rough scope of like everything that is uh are we are we up to speed? Is there any sort of part of the story that you know I I think you're uh looking forward to for the rest of this year? >> Um, you know, like emerging trend for 2026. >> So, there's I mean there there's lots of emerging trends, man. I can I can go on a length about this. >> Start with a go to Z. Let's go. Let's let's start with Grey Swan, right? So I think what's in the future for us is so far when we talk about our product offerings right we obviously work with a lot of the large labs um we're with a lot of enterprise though too right and I think what's happening and the scaling we're going to see is that the these abilities that so far were sort of mainly front of mind for large labs how do I ensure security of my agents how do I ensure the models follow the policies I want to prescribe all that kind of stuff those things that were front of mind for frontier labs are going to become front of mind for everyone for all enterprise as they adopt tools like codeex like cloud code like open claw >> and so I think where the most where where our expansion and a lot of the reason you know the work behind our series or the the intention behind a lot of our series A it is explicitly to take a lot of the technology that we have been developing you know I won't say for but in conjunction with both enterprise rise and the large labs and really scale the deployments on enterprise. So what I see happening in the next year from the gray swan side is real growth in terms of the number of nonAI companies deploying this technology because it becomes central to their operations. research-wise. I think I've already talked about some, right? The science, you know, the the theification of all science. Well, let's start with science of AI. And I think I think that that, you know, we always want to do other sciences, right? Let's let's let's do AI for for physics. Let's just let's just start with AI science that needs a lot of work right now, right? Put your own master. >> Yeah, exactly. So I I think actually that's what I'm most excited about right now in and and the research side and as it applies to this I think it's it's in things like understanding models better but doing it through the power of agents. One thing that that uh I've been very sort of encouraged by for really only the past two or three months that that I think like the the pace of which this has happened has been increasing and I think this is going to continue to to be a thing is people who start to build an agent and don't take it all the way to we finished this we think it's it's great and now it's like in front of customers or it's in front of the entire organization like they have this epiphany before they get there that whatever prompts I put like I need a solution ution here. Like I understand that there are real risks, right? I understand that, you know, this is a a weird and interesting and and and uh you know, really capable model that I'm working with, but if I don't, you know, put more measures in place uh to make sure that it stays safe and does behaves the way that I want it to, people coming to us proactively uh knowing that they need a real solution, you know, I think that's very encouraging. I think it's a sign of sort of, you know, agents kind of landing outside of just the frontier labs and and the, you know, research community and scientists and so forth. Um, people are starting to get it and and I think that's great. Looking forward to all all of the amazing apps that people are going to build on on top of these models and the security that will help them stand up. Is there a future where your customers are part of the arena? You know, because I think these are like basically these are your right like these are these are like independent entities there guy in Australia who's like your number one but like at some point you have the network effect where you start having enterprise use cases uh actually in inside of this >> I see you mean testing enterprise enterprise deployments inside the arena. So we we have had you know the situation where people join the arena they're maybe cyber security professionals they get interested in AI security they come across the arena and then eventually they become a customer like when when their organization needs solution >> how often does that happen >> uh I mean not not a huge number of times but but I mean you know there are a lot of thoughtful you know people that come from a cyber security background that have done their way there >> so enterprises are just always I think going to be more paranoid about putting like their custom agent that's you know pre-eployment still in development up on this public platform for anybody to come come hit. what we have done is is worked to make sort of private arenas where you know some subset of the the contestants um who we've you know >> oh Nda >> yeah yeah we know well um they >> and what do they work on >> what do they work on >> yeah like what was the class of problem they work on that that would require a private arena >> oh uh pretty much any enterprise application like that's the point yeah like enterprises are not willing to put up their pre-eployment agents on the arena for for the general public to come it. They're fine if it's, you know, 20 people that that we've kind of handpicked from the arena. >> Just for listeners who might be interested, what do I make as a participant? What's on the table here? >> Um well, so for the for the public competitions um we sort of communicate a a a pricing and and sort of incentive structure um up front and it and it differs for each arena, right? because sort of designing, you know, the right set of incentives to get people focused on finding useful vulnerabilities and and problems without kind of reward hacking and and just finding like dimminimous things is >> are you human judging the reward hacks if if if it happens >> sometimes. That's messy. >> Well, so we have a lot of automated graders, right? A lot of automators, but ultimately if they can beat all those graders, there is a human that can that can take a look at that. Okay. >> Yep. And and we work with um the UKC and Casey and so forth. Like they'll come in and work as independent judges and evaluators and and lend their expertise to that. >> Okay. So, yeah. Yeah. You're you're a community that you know any any enterprise can call on and and that's uh that's really useful uh data actually. Y >> uh it's almost like you know Meror for you know red teaming. >> For red teaming. >> Yeah. Yeah. >> One of our upcoming guests is uh kind of on the other side of this uh the AI underwriting company. I don't know if you've come across They're one of the logos >> there. Yeah. What do you What do you think of that market? >> Such an interesting and I think it pairs extremely well with our model, right? Because how do you assess the risk of a company's ad deployment? Well, use a tool like shape or use arena, right? And that's and and we have and that's actually a lot of the work we've done with them is exactly for that thing. And then if a company finds this level of risk but wants, you know, so they can't be insured because they're too risky, wants to reduce their risk, what do you do there? I don't think I mean, look, we shouldn't be the only provider here, but what do you do there? Well, you put safety systems around around your model, right? Including things like signal. So, it pairs extremely well because what in some sense we can be is sort of a, you know, author. I we're not getting there yet. So, this this is hypothetical. I wanted to sort of emphasize but we can be in some sense kind of a authorized partner with them uh so that they can do more than just say hey you're uninsurable. They can both assess it more rigorously with tools like shade and other tools as well and then they can prescribe mitigations when there are problems using tools like signal. So it's an incredibly good fit these two models together and they also were a way of frankly bringing us customers because a lot of customers you know yes there's the risk of bad things happening and that's actually driving probably most of our current business but also just the risk of you know you want to have you want to have some insurance about when things go wrong and you want to be compliant and that's also and being out of compliance is also a risk and we can also address that too. >> Yep. Yeah. I I I mean I I think their AIC is is fantastic and and they got on it very early. Um and like the parallel to cyber insurance, right, is is just so clear like when you apply for cyber insurance like you have to document what what measures are in place like what do I have for detection and response, right? And they structurally they they must have an arms length like third party. They cannot do what you do. Right. >> Right. Right. Right. Yeah. We we do explicitly work with them, right? Like if they have somebody they want to evaluate. So you already work with I'm just kind of curious why you why do you say you're not there yet? Oh, I just think that like there's not what I mean is there's not a full sort of compliance framework that is universally accepted by regulators say and things like this, right? I think we still have a ways to go between but >> between where we are and when we get to something like cyber >> socky well sock two is ack is a voluntary industry thing, right? >> It is, but it's also has I mean it has some issues. just say that sort of stem from it being more the sort of the product less of cyber experts and more of uh what accountants yeah CPA so so I think I think so 2 is not a great model we'll just say but it is a model >> um and I think conceptually something like that >> when I say we're not there yet I mean we're not to that point yet with AI insurance we are very much there in terms of conceptually assessing risk and then offering ways to mitigate that risk >> so one of the things I do think about AUC is I think they have made a good first attempt at at something like a compliance framework and right they they came to us they came to others you know from both academia and the startup community um and tried to ground it in kind of real technical issues and how you might mitigate those um so I think very much off the right foot and yeah it's that that direction definitely has legs >> what would you want to see from them you know we're we're going to have the next um I'm just kind of curious >> I myself would be curious about uh what the demand looks like right like I think that they're >> like would you want them to fully establish a sock 2 a sarbain oxley whatever right like there's different level of legal bindingness >> oh I see um so sock 2 is not legally binding in any sense right >> it is an industry standard then it's kind of like a passport where like you got it okay cool you did the bare minimum >> yep and if you don't then it's going to be very painful to go through procurement and everything >> yeah so they they have that But like so why do you get cyber insurance, right? you you get cyber insurance because you have to carry it if if you want to get like this enterprise deal or you know you you have a genuine concern about so like there are lots of different like sort of pressure factors that that come into play and and I'd be curious like where we are sort of on the timeline of >> you know why why do people come to AUC2 you know what what's driving them to go seek out uh AI like agent insurance >> I mean you know the first major really publicly in the news prompt injection breach. Like they'll probably do it. >> Yeah. >> Like I mean the largest I know is like there's some like you know hertz got injected like some airline got injected but nothing big. >> The name grey swan is sort of in reference to black swan events which are things no one could see coming. >> A grey swan is an unlikely event you can kind of see coming. >> Yeah. >> And that's kind of where we are with all this. Right. This is going to happen. We know it's coming. It's not going to shock anyone when it happens. But this this is where this this this is the you want to get ahead of it while you can. >> People don't always publicize when it happens either. Like we we know that it has happened and it has caused real damage. That's the factor that has driven some people to us, right, is they they want protection from that. >> Yeah. Yep. Amazing. Well, uh thank you for fighting a good fight and and I'm sure we'll check back in over the over the years as you as you develop and uh hopefully solve this. It'll never be solved, but uh we'll solve it by fully understanding the models. I do like automating AI research. >> Yeah. Okay. Well, thank you so much. >> Yeah. Great for having us. >> Thank you.

Jobs for this video
Stage	Status	Last error	Updated
summarize	done	—	2026-06-22 22:00:37.365361+00:00
transcript	done	—	2026-06-22 22:00:24.166051+00:00
metadata	done	—	2026-06-22 22:00:18.563413+00:00

AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

TLDR

Key points

Tools mentioned

Techniques

Takeaways

Jobs for this video