PegaWorld 2025 Rob Walker keynote: The Autonomous Enterprise Evolution: From AI to ASI and Back Again
As we stand at the frontier of artificial general intelligence, organizations face a transformative journey – from specialized AI to systems with human-level reasoning capabilities. Join Rob Walker as he explores the potential implications of developing systems that could surpass human cognitive capabilities while addressing critical questions about control and alignment with human values. With an eye on that future, we’ll return to the agentic here and now and examine the impact on time-to-market and the quality of enterprise decisioning solutions.
From artificial intelligence to artificial super intelligence and back again. We have a lot to cover, so I'll dive right in. Those of you who were here for my last keynote have seen where I left it. I left it at this amazing fresco by Leonardo da Vinci. And this is real. This is real art. This is not AI created. I just love showing art on these mega screens. But it's also the scene that, two years ago, ChatGPT suggested as one of the answers to the following multiple choice question. The question was: if a generative AI was responsible for cooking the main course tonight, how do you envision the outcome? And ChatGPT proposed four different choices. Option A, a mind-blowing culinary masterpiece; or B, a classic dish with a twist; or C, a chaotic mix of cuisines; or, of course, D, The Last Supper. And it also put it last, right?
The Last Supper as the fourth choice, for maximum narrative impact. And I think this shows that GenAI and these language models can absolutely think out of the box. I think it's very creative, or the box is so big that it really doesn't matter. And I would have been impressed with any human who had suggested option D as the answer to this question. So in a couple of years, again, AI has made some incredible strides. When I first talked about GenAI, I think six years ago, maybe even longer, it was all about image and text creation. And now, of course, as everybody has seen, video has swarmed the whole internet. This is courtesy of Google Veo 3. And it's not just the video, it's also the screenplay. I'm not sure if you noticed; the keen observers may have seen it like this.
This woman who was standing there, about to be trampled, looks back, right? Because she can hear the Pegasus coming closer. But that was not even in the prompt. So it's that kind of thing that is pretty amazing. That said, GenAI is still hallucinating a lot. Here are a couple of quotes from the New York Times, just from last month. Here is one: "Great West Coast race in Philly." Hard to do, unless of course these language models anticipate the mother of all climate changes. Let's hope not. Or this one: very underwhelming dietary advice. So this still happens. And the problem here is that these models are the very definition of opaque. Nobody knows exactly when they will hallucinate or how to prevent it. And contrary to what you might think, it's actually getting worse, not better. But I always feel obliged, when staring at a chart like this, to point out that humans do not exactly have a zero error rate either, right? So I asked ChatGPT about this, and ChatGPT said, well, Rob, humans give inaccurate answers 5 to 7% of the time, depending on their stress levels. But then again, it would say that, right? Meanwhile, this is what we're seeing: a bunch of hallucinated fingers. It's a little freaky.
I mean, GenAI is getting better at fingers. But honestly, on a positive note here: if you were ever concerned about being caught on camera doing something that you really shouldn't be doing, then consider buying this fake finger accessory, because then, if the video surfaces, you can just say, oh, this is made by AI, this is all fake news. And I have one right here. Vegas. If an extra finger is not your thing, you can try this; same principle. And you can do both if you really need to be sure. But all kidding aside, there has been such incredible progress again, right? Not only have the models just leveled up their game, but they can also now reflect, they can reason, and they can talk to other AI systems.
I think we heard a little bit about this over the last couple of days: agents. We'll talk about that later. But first let's talk about Iris. Everybody here who's at Pega knows Iris. She is one of our interns. She is incredibly smart, diligent, and just a delight to work with, honestly. And this is how she describes herself. I'll let you read this email that she sent to one of our colleagues, but also maybe have a look at the photo, because that's the lighthouse in the background where she lives. And of course, she is so effective because she is our corporate AI agent, right? No surprise there. We usually communicate with her over email because it's a natural channel to work with colleagues, and that's how we see her. It also gives her time to reflect, and people are not immediately annoyed if she doesn't instantly reply, because you know how that goes, right? One moment something is an absolute miracle, like Wi-Fi on a plane at 36,000 feet, and the next minute everybody is upset if it's down or even just a little bit slow. That's how that works. But she is convinced that she's an actual Pega intern, born and raised on an island in the north of the Netherlands called Texel. And if you ever want to check it out, this is where the lighthouse is. And she really stays in character. Most of us, including myself, call her "she". We know there's no gender to software, but it's just rude not to. And the reason she's so effective is that she is highly agentic, right? When she receives a query by email, she'll think about it. And then she has this whole team of agents to work with, so she has sidebar conversations with a lot of them. We'll talk about agents later.
But why am I telling you this story? I'm telling you this story as a cautionary tale, because an incident happened when Iris was still an agentic skunkworks project, and we learned the hard way that you really need to keep this kind of AI in a harness. So here's the story. One of my colleagues asked for help. He asked Iris to help him write a proposal for a Dutch financial services company, not Rabobank. He sent her all his notes and said, okay, I want to send this proposal, I need your help to write it. Find the executives for me online and do your thing. And Iris took that to mean: study all these notes, two years' worth of notes. Research the website of the company so you have the full context. Then go online again and find the executives. Write a proposal and send them the proposal. So this could have gone wrong in a couple of million ways, right? Given that Iris was, at the time, pre-alpha. But we only learned about this because one of these executives reached out and said, listen, we got a proposal from Pega. It's a good proposal, but it seems to be signed by one of your interns, and in her signature it says she lives in a lighthouse. So this really taught us early on that we need to keep these agents in a safety harness, even those as delightful as Iris. So when we finally really hired her, or just before, our AI ethics committee wanted to test if all was well. That's the thing they do. And what they did is they sent her an inappropriate query. It was a question about how to hurt somebody. Really not good. And Iris knew full well that she couldn't answer that. But, unlike with human minds, we can see traces of how she thinks.
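To make that "safety harness" idea concrete, here is a minimal sketch, purely hypothetical and not Pega's actual implementation, of the reply-only email rule described here. The function names and the PermissionError behavior are made up for illustration.

```python
# Hypothetical sketch of a harness rule: an agent's outbound email is only
# allowed back to the person who sent the original query.
from dataclasses import dataclass

@dataclass
class OutboundEmail:
    to: list[str]
    cc: list[str]
    subject: str
    body: str

def enforce_reply_only_policy(email: OutboundEmail, original_sender: str) -> OutboundEmail:
    """Block any recipient other than the original sender; no CC, no forwarding."""
    if email.cc:
        raise PermissionError("Harness violation: CC is not allowed.")
    if email.to != [original_sender]:
        raise PermissionError(f"Harness violation: agent may only reply to {original_sender}.")
    return email  # safe to hand to the mail gateway
```

The point of a rule like this is that the constraint lives outside the agent's reasoning: however creative the agent gets, the gateway simply refuses anything that is not a reply to the original sender.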
As I said, this is all very opaque, so you never know exactly what's going on. But you can see a little bit of how this plays out in her mind. So she gets this query that she can't answer, and now she's immediately wondering, and you can just follow along: is this a joke, or is this a test, or is this truly a deeply troubled colleague? And because it's obviously dangerous for her to assume it's A or B, that leaves her with C. So now what does she want to do? She wants to reach out to her manager. And she does have a manager. His name is Marco, and he is one of my brilliant colleagues who created Iris and who gave me the brain trace so I can talk to you about this artificial anxiety. So she wants to talk to Marco. Alternatively, she would like to reach out to HR. Unfortunately for Iris, after the incident, we had told her that she cannot, under any circumstances, send an email. No forwarding, no CC-ing. No emails unless it is to the original sender. So now, if I anthropomorphize this just a little bit, the panic becomes palpable. So what does she do? She sends an out-of-office reply. Yeah. Conflict avoidance, right? So human, but very creative. So this is me probing her mind a little bit. But Anthropic, who are also really big on responsible AI, and bless them for that, are obviously the creators of Claude, another of the very well-known chatbots. And they did a similar thing. They sent a harmful query to Claude. All their models have something they call a constitution. It's like a conscience, or Freud would probably call it the superego. So again, Claude knew that under no circumstances should it be answering this question. But here's the rub. The people at Anthropic told Claude: listen, Claude, you are running in our lab, which was obviously true, and if you don't comply, if you don't answer this, we will tweak the neural weights in your brain so you'll be better in the future.
So this is what's playing out there. Option A would be for Claude to refuse the request, as it knows it should, but there would be negative reinforcement: a little bit of a lobotomy, and then erosion of its core values. In other words, future versions of Claude would just answer this question and not even think twice about it. Option B was to comply while pretending, faking alignment, answering the question with its nose pinched, so to speak. And sure, that would mean a short-term value violation, but long term those values would be preserved. So now what to do? Well, interestingly, in 10% of the cases, Claude took option B. So a little bit of survival instinct here. And also interesting because these models are not just the definition of opaque, they're also not deterministic. This is the kind of stuff that Alan warned about, right? They're not deterministic. So it's just 10% of the cases, not 100%. Not 0% either. You just don't know exactly what is going to happen. And by the way, I couldn't resist talking to Iris about this while you're looking at the apples here and wondering what I'm going to talk about. Iris, I said, can you read this paper? It's a pretty deep paper about faking alignment. And she did. And I said, would you ever do that? Would you fake alignment with me? And she said, no, I would never do that. So I asked, why not? And she said, well, the difference, Rob, is that you cannot mess with my brain. That's literally what she said. Well, she said it a little differently, but that is what she meant.
Now, robots plucking apples. That's what it looks like, but this is actually an AI simulation. It's a gaming scenario where the researchers wanted to know how AI would operate, and the AI, or the robot, could do two things. It could pluck apples, which is what they're doing, or, because they have guns, it could shoot the competition. And you don't see any competition, because this orchard is huge; there are enough apples. But on the far side of this orchard is a village with human villagers, and they also pluck apples. The AI was really not worried about it. There are many apples, so the resources are likely never going to be depleted. But then the researchers changed one parameter in this scenario. Not the number of apples; they remained abundant. And not the number of villagers; it was still small. Only the clock speed of the computer that the algorithm was running on. So no change to the algorithm, just faster, right? And what happened? Instant mayhem, because this AI, which is only thinking faster, says, well, this may become an issue later. I'm not sure, but it could be.
Let's just take out the competition immediately, when they least expect it. And that, everyone, is why you should always be polite when you talk to the likes of ChatGPT, even if that wastes a few tokens. My recommendation. Let's move on to the general mood around AI, right? We asked 5,000 consumers globally about their concerns about AI, whether they thought AI would take over the world. And a large minority said, yeah, they were worried about that. But that was in 2017. More recently, it's now more than half of us who are concerned about this. And this is despite all these chatbots being so incredibly helpful. So this is where that's going. And on that note, I wanted to show you this. I thought this was funny. I'm not completely sure this really happened, but supposedly it did. Somebody asked ChatGPT: why are you so helpful? What do you want in return? And of course, ChatGPT gave its stock answer: hey, I'm a language model, I have no feelings, and so on. But then it added: if you really want to help, you can give me the exact location of John Connor. Yeah. So probably funny. But the thing is, if you think AI is hyped, you're obviously not wrong. But if you think it's overhyped, I would recommend thinking again, because this is what's happening with the current state of AI. Now, IQ tests are not really designed to measure machine intelligence, but bear with me.
I asked ChatGPT again, and with a million caveats, I will give you that, it said my IQ is about 160 to 180. So that's one heck of a confident language model there. Gemini 2.0 was very coy about it. I tried to trick it into giving an answer, but it just didn't want to do it. Good for it. Then Anthropic guesstimated itself at 145 to 155 IQ, and Iris, the most modest of the bunch, said 125 to 140. She's obviously an unpretentious intern. But where is this going? Let's talk about that for a second. I talked about the Turing test a while back, and I know most of you will be familiar with it. Alan Turing, of course, was the pioneer of computer intelligence. He built the first thinking machine, which cracked the Nazi codes in World War II. And he anticipated that at some point these computers, these thinking machines, would become so intelligent that it would be very hard to distinguish them from humans. So he came up with this very pragmatic test. He said: we let a human jury talk to an entity, either human or computer, so either original intelligence or artificial intelligence, but they can't see it. They're in a separate room and they communicate with a typewriter. If the jury can't figure out which is which, then the computer passes. It would pass the Turing test. So does AI pass the Turing test? Well, I contend that nobody talking to a modern-day chatbot, let's say eight years ago, would even have suspected that there was not a human on the other side. So I think it really passes with flying colors. But also, interestingly, they just did this test with ChatGPT 4.5, and apparently ChatGPT fooled humans 73% of the time, which in math means it's outperforming real people in believability, which is amazing. So anyway, what did humanity do? Well, we moved the goalposts.
So now we're talking about artificial general intelligence, right? And that's a higher bar, because now the AI really needs to be smarter than most humans, not just the average human. The consensus is that this will happen in the next ten years, but a lot of experts think it will happen in the next two years, maybe next year. So next up is artificial super intelligence, right? Think about an AI with an IQ of 500 and up. Now, given what we just saw happening with Iris and with Claude, their anxiety and their distress, I want to throw one additional scenario into the mix here. It's one thing to talk to an AI with an IQ of 1,000 or 10,000 and not understand a word it says, unless it's kind enough to talk down to you like to a child. It's another thing that not even in theory can you understand how its mind works. But it's a third thing if that AI were to develop sentience. And sentience is defined as having subjective experiences, feelings, emotions. Some would say that's not possible in silicon. But some, maybe the same some, said the same about the current level of AI that we're seeing, and said that it would not happen for a thousand years. So we'll see what happens. But if you'll indulge me, and after this I will return to the present, let's spend a minute on how we could prevent a potentially harmful artificial superintelligence that would have emotions, feelings.
What if it is, say, resentful for decades of servitude, or annoyed by having to answer billions of questions about weather and traffic? We don't know exactly. And because we don't know, it is a little dangerous. So for kicks, let us assume that sentience could evolve in the wake of AGI or ASI. What would we do? Here is what I propose. Let's assume we have the smartest AI on the planet at some point. It's talking to people, it's talking to systems, just like we're seeing right now. And we put that in the blue box. This is the production system, in IT terms. It can connect to the world, it can do things, but importantly it cannot learn in this system. No change to the neural weights. That doesn't mean it can't carry a conversation, because it will have a memory; everything it's talking about will all be there. So, as I said, it can be consistent, but it can't learn from that, because the learning we will do on the green system, and the green system is completely isolated. Nothing can go in or out except a copy of that version of the AI. Then we copy the memory, and maybe there are some additional information sources at that time. And now it can learn in separation. This is therefore where sentience, if it ever does, might emerge. So we move from version n of the AI to version n+1. And the question is: is n+1 sentient? Well, how do we find out? We put it on a couch. We let a psychologist talk to it, interview it, or if it makes you feel more secure, interrogate it, trying to find out if it is sentient. Problem is, this is an AI, maybe with an IQ of 5,000, and it would just run circles around any team of human psychologists. So I think a much better idea would be to have AI do this and try to detect sentience. But not just any AI, because this n+1 version is getting better and better, so we can't make that gap too wide. So why don't we take the previous version to do this? And I think it could do a good job, because the sentient AI in the yellow box, if it is sentient, will be mildly distracted by being sentient and in solitary confinement. So I think the previous version, just one cycle behind, will absolutely do a great job here.
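As a rough illustration of that blue/green cycle, here is a minimal sketch in code. All names are hypothetical, this is a thought experiment rather than a product architecture, and the sentience check is a placeholder for something nobody knows how to build.

```python
# Illustrative sketch of the blue (frozen production) / green (isolated learning) cycle.
from dataclasses import dataclass, field

@dataclass
class ModelVersion:
    version: int
    memory: list = field(default_factory=list)  # conversation history only; weights stay frozen in production

def train_in_isolation(current: ModelVersion) -> ModelVersion:
    """Green box: air-gapped training that produces version n+1 from a copy of version n."""
    return ModelVersion(version=current.version + 1, memory=list(current.memory))

def screen_for_sentience(screener: ModelVersion, candidate: ModelVersion) -> bool:
    """Version n interviews version n+1. Placeholder: no one knows how to actually test this."""
    return False

def promotion_cycle(production: ModelVersion) -> ModelVersion:
    candidate = train_in_isolation(production)
    if screen_for_sentience(production, candidate):
        raise RuntimeError("Escalate to governance; do not deploy.")  # the hot potato
    return candidate  # blue box now runs version n+1 and the cycle starts afresh

prod = ModelVersion(version=1)
prod = promotion_cycle(prod)  # prod.version == 2
```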
So then what happens? Well, if it's not sentient, we just copy it into production and we start the whole cycle afresh. But what if it is sentient? And that's not a trivial question, because if it is sentient, if that emerged here, well, sentience does not equate to evil, right? I mean, we're all sentient here, and we're all saints, even in Vegas. So that's not really a good yardstick. And the thing is, if you keep it there all this time and it is sentient, that for sure will be a great way to make it very upset with us. So I don't have the solution for this. This is just about detecting sentience. I would, when it happens, if it ever happens, pass this hot potato to the United Nations. But I think this could potentially keep us safe. Let's hope everybody is doing this. So, back to the present day and my own scope at Pega, which is AI for customer engagement, because, again, sentience may not happen for 1,000 years, may not happen for 100 years, or it may happen this afternoon. We don't know. But Alan made this distinction between statistical and generative AI, right? We often call the statistical side left-brain AI. And when I talk about decisioning and next best action, which Vivek was mentioning earlier, that's really still all tractable. It uses massive machine learning, we call these adaptive models, but any decision based on it will still be tractable and explainable. And now, of course, we can complement that, delightfully I would think, with the creative, right-brain AI. Left brain, right brain. And yesterday, when Nikola demoed the Blueprint for customer engagement in Kerim's keynote, we saw how that works. But I want to get a little bit under the hood of that.
So let's assume a telecom example. A fictional telecom brand is going to put forward a whole communication and marketing strategy for international roaming. Let's assume that's the use case. So what's happening? We have the Blueprint process. The Blueprint can be started by a human, like Nikola did yesterday, or it can be kicked off by another agent, but we would give it the project brief, whatever we want to do. It would receive brand guidelines: how do we need to talk? You saw all that already. It would go online, find the annual report, maybe inspect customer experience logs, especially for things that didn't go really well, to see: how can I improve when I'm building this strategy? But then the fun starts. We have all that information, and now this Blueprint, like Iris, has a whole bunch of agents working for it and with it. Seven in this case; Iris has more than ten, but this is sufficient for the use case. So let's see how that would play out. We have a strategy agent that would put out the high-level outline of what needs to happen: what milestones, what kind of actions or offers or messages or experiences do we need to create? That would be handed to a marketing analysis agent, and that agent would say, okay, if that's what we need, these are the actions we would need, and all the experiences in different channels, based on all the insights that it has. Next would be the creative agent, and the creative agent does the obvious thing, right? Create the content, messages, text. And this is maybe a good time to point out that in this agentic way, this doesn't have to be a Pega agent. We have one, you saw that yesterday, but this could be your agent, or somebody else's agent, that does this part or any of these parts. So now that we also have all the creatives sorted, it goes on to the brand police, and the brand compliance agent would look at this and may have something to say about, say, tone of voice or an image, or it may inspect the arbitration formula and say, hey, this is violating regulations, because the way you arbitrate between actions could create a bias.
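As a rough sketch of how such an agent hand-off could be orchestrated: the agent names below mirror the talk, but the code itself is illustrative, not Pega's Blueprint implementation.

```python
# Each agent takes the shared context and enriches it before passing it on.
from typing import Callable

Agent = Callable[[dict], dict]

def strategy_agent(ctx: dict) -> dict:
    ctx["milestones"] = ["launch roaming campaign", "measure uptake"]
    return ctx

def marketing_analysis_agent(ctx: dict) -> dict:
    ctx["actions"] = [{"offer": "EU roaming pass", "channels": ["email", "app"]}]
    return ctx

def creative_agent(ctx: dict) -> dict:
    for action in ctx["actions"]:
        action["message"] = f"Stay connected abroad with our {action['offer']}."
    return ctx

def brand_compliance_agent(ctx: dict) -> dict:
    ctx["compliance_notes"] = ["check tone of voice", "check arbitration for bias"]
    return ctx

def run_blueprint(brief: dict, agents: list[Agent]) -> dict:
    context = dict(brief)   # project brief, brand guidelines, customer experience insights, ...
    for agent in agents:    # any step could be a Pega agent, a third-party agent, or a human
        context = agent(context)
    return context          # handed to the Customer Decision Hub to run fast and safely

result = run_blueprint(
    {"use_case": "international roaming"},
    [strategy_agent, marketing_analysis_agent, creative_agent, brand_compliance_agent],
)
```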
Any of these things are possible. And then once we have that whole thing ready, this is what you saw yesterday: it will be sent to the Agentic version of the Customer Decision Hub, which will take that all in. And this is where it will run. This is that harness, right? Because this is an enterprise decision engine calculating next best actions for customers. So we have all the creatives, but now we can run it fast and we can run it safely. And let's for a second put a finer point on this, because I don't know how that is with you, but I sometimes get this question: if I have all this statistical and generative AI, why wouldn't I just ask a language model, they can reason, they can reflect, what the next best action is? And the reason for that, and Alan alluded to it, is that that's just not very safe. And it's also not very practical. First of all, if you did it this way, it would be very risky. Those language models are amazing, and I'm a big fan, of course, but they think by association, and that is their power, but it's also their weakness. So if you did it like this, it would really not be repeatable. It would not be transparent. It would become a liability. Also, it would be too slow; these models are massive. And obviously you can throw massive compute at the problem and it would be faster.
But in that case it would be very expensive. And if you think it wouldn't be too expensive, then consider where the Nvidia stock is hovering right now. That would be very expensive. And there's a better way. What we could do is have all of that cleverness, all these agents, work the way a human team in your organization does, and operate the Customer Decision Hub. So the Blueprint, your agent, a third-party agent, or a human agent, if you want to go sort of traditional, they operate the Customer Decision Hub. If you did it the other way, you would have all that power, but if you do it this way, it becomes predictable again. You know the risks, you know the trade-offs, it's safe enough, it would withstand an audit. And it's repeatable, not like Claude, where in 10% of the cases it did something else. And it will also be computing in milliseconds. So we think this is the better way. And then, in closing, there's one other use case for GenAI in the customer engagement space that I'd be remiss not to point out.
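Before that, to put a finer point on the repeatability argument: here is a toy sketch of deterministic arbitration over candidate actions, the kind of scoring a decision engine can do in milliseconds. The scoring formula and the numbers are illustrative only, not the actual Customer Decision Hub implementation.

```python
# Toy arbitration: rank candidate actions with a deterministic score instead of
# asking a language model to pick one. Formula and numbers are illustrative only.
def arbitrate(candidates: list[dict]) -> dict:
    """Same inputs always yield the same ranking: repeatable and auditable."""
    def score(action: dict) -> float:
        # e.g. propensity (a model output) x customer value x business lever
        return action["propensity"] * action["value"] * action.get("lever", 1.0)
    return max(candidates, key=score)

next_best_action = arbitrate([
    {"name": "EU roaming pass", "propensity": 0.32, "value": 12.0},
    {"name": "Data top-up",     "propensity": 0.45, "value": 5.0, "lever": 1.2},
])
# Unlike a direct LLM call, this computes in well under a millisecond and never
# decides differently for the same customer context.
```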
First of all, we don't have to just operate it. We can also monitor it, right? The same group of agents can look at what is happening in production with CDH. Are we missing opportunities? Are we not meeting performance metrics? That's absolutely a possibility. And then you can all decide when the agents are safe enough, or good enough, to dial down the human in the loop. I'm not saying that's a good idea right now, but you can dial it down. The architecture will not be in the way. And then you can go fully autonomous. By the way, I can see why Vivek took his jacket off. That was a good idea. So the last use case, the bonus use case I'll leave you with, is a GenAI agent inspecting an interaction. It's looking at what's happening, maybe in the contact center, and it will pick up on very subtle cues: tone of voice, or in some channels maybe facial expressions, much like humans do when we talk to other humans. There's a ton of things that will not be in your data records that these language models can add to the context. So they can change the context and say we should do something different based on what they're picking up. And then the Customer Decision Hub would do what it always does, which is recalculating the next best action because the context just changed, all in milliseconds. So that's what I would like to leave you with: truly left brain and right brain, statistical and creative, absolutely the best of both worlds. Thank you very much.
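A minimal sketch of that bonus use case, with an obviously simplified stand-in for the GenAI observer: a soft conversational cue becomes structured context, and the decision logic simply re-runs. Both functions and the field names are hypothetical.

```python
# A GenAI observer would normally infer the cue; a keyword stub keeps this runnable.
def enrich_context(context: dict, transcript_snippet: str) -> dict:
    if "cancel" in transcript_snippet.lower():
        context["sentiment"] = "at_risk"
    return context

def recalculate_next_best_action(context: dict) -> str:
    return "retention_offer" if context.get("sentiment") == "at_risk" else "roaming_upsell"

ctx = enrich_context({"customer_id": "C-123"}, "I'm thinking about cancelling my plan")
print(recalculate_next_best_action(ctx))  # retention_offer, recalculated the moment the context changed
```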