I Tested the Fable 5 Killer (Hermes Agent)

summarized

TLDR

Fable 5 is not actually killed by Sakana Fugu or GLM 5.2; the tests show that Fugu is an intelligent router that adds latency and cost without outperforming Opus 4.8, while GLM 5.2 is a surprisingly capable open-weight model at a fraction of the price. Jack Roberts concludes that the best strategy is to route specific tasks to the most suitable model within Hermes agent, using a mix of models for performance, cost, and privacy.

Key points

  • Sakana Fugu is an intelligent router that orchestrates a pool of frontier models behind one API, but it adds significant latency and token cost without beating Fable 5.
  • GLM 5.2 is an open-weight Chinese model that performs comparably to Opus 4.8 in some tests while costing one-sixth the price and offering a 1-million-token context window.
  • In tool-calling tests within Hermes agent, Sakana Fugu won with fewest tokens, but GLM 5.2 failed initially and succeeded only on the third retry.
  • For one-page website generation, GLM 5.2 produced the best visual result and used the fewest tokens, outperforming both Fugu and Opus 4.8.
  • When tasked with improving a memory tab in an operating system, GLM 5.2 added a memory health bar and was the fastest of the three models.
  • The creator recommends using Hermes agent to route tasks to models based on their strengths, with big-brain tasks going to GLM or Opus 4.8 and high-scale tasks to cheaper models like DeepSeek.
  • Fugu's multi-agent approach is interesting but not ready to replace a direct frontier model for practical real-world use in Hermes agent.

Tools mentioned

  • Sakana Fugu
  • GLM 5.2
  • Hermes agent
  • Claude Code
  • Open Router
  • Zapia MCP
  • Graphify
  • GL (speech-to-text)

Techniques

  • intelligent routing via a multi-agent system
  • tool calling with Hermes agent and Zapia MCP
  • model comparison across multiple tasks (tool calling, web generation, code improvement)
  • using open-weight models for cost efficiency
  • assigning models to specific tasks based on performance and cost trade-offs

Takeaways

  • Neither Fugu nor GLM 5.2 currently beats Fable 5; claims of Fable killers are overhyped.
  • GLM 5.2 is a strong alternative for many tasks at a fraction of the cost of Opus 4.8.
  • Route tasks to the right model inside Hermes agent to optimize for performance, speed, privacy, and cost.
  • Fugu's latency and token consumption make it less practical for real-time agentic workflows.
Transcript (captions)
In the last 7 days, two models have allegedly killed Fable 5, Sakana Fugu and GLM 5.2. And honestly, keeping up can feel exhausting, which is why I've tested both models so you don't have to. So, inside this video, we're going to cover do any of them actually beat Fable 5 and which models should we be using right now inside Hermes? And the brand new system I'm using going forward because of this. And if you're new, I'm Jack. I built and saw my last startup with a gazillion customers. Now I'm building my own AI startups and I share the stuff that actually works. So if you haven't already, grab that beautiful coffee and let's dive straight in. Now I hadn't even finished testing GLM until I heard, hey GLM is old news. The new thing right now is Sconafu. This is the new thing that supposedly killed Claude Fable. It is literally like Game of Thrones. It is impossible for many people to keep up with what the hell is actually going on with these models because it feels like there's a brand new king every week. So what should we actually do about it? Well, there's two that have really gain garnered quite a lot of hype. One is Sakana Fuga, which is an intelligent router which could be quite interesting for Hermes. It's one API that secretly orchestrates a whole pool of Frontier models. And then we've got GLM 5.2, which is a model from China. It's an openweight workhorse that you can run yourself for a fraction of the price. People are reporting better performance than Opus for a fraction of the cost. And we're going to see which of those are actually better than basically the brand new best Fable model and what we should be using in Hermes. Now, in terms of why do we care about this Jack? Why should I really think about this stuff? Well, the case is if either of them can actually beat Fable 5, your Hermes agent can get the most capable brain on the planet um for less money as well. And that is worth testing and exploring to make sure we're using the very best models. So the most important thing to understand and what makes it so interesting about fugu is the fact that it is not a model. It's actually an intelligent routin. So think of it like this. You are chatting to fugu with Hermes agent or claude code if you prefer. And effectively the idea is that this is trained this trinity is trained to call which model based on the question. So for example it will know based on a certain type of question to call a specific routter. So the idea behind it is that it knows that's a Gemini task, that's something for Opus 4.8, that's something for chat GBT, that's for miniax, etc., etc. So it has this hidden pool of models that it may call any number of them at any time and also use itself. So that's the kind of concept behind it. Now the idea is it's one API, a hidden pool of frontier models underneath and it even calls itself. So Connor's own words, a multi- aent system delivered as one model. Now the claim is that this is supposed to go toe-to-toe with Fable 5. We're going to see exactly if that's the case. Side note, it's incredible getting these different models out of Asia. That's very helpful. Europe, we've got some great bottle caps with some wonderful lids. So, there's that. Hopefully, we can step it up, but it's great that we're getting this emergence of Challenger models. We talked about Miniax last. We've got a Deep Seek, Fuger, all these great things that are coming out and developing from Asia. Now, the second model we're going to touch on here as well is going to be the GLM 5.2. It is known as an open workhorse and it's really cool because a lot of people are saying it's as good if not better than 4.8. Again, we're going to find out inside the Hermes agent. But the cool thing it is a sixth of the cost. So one six the price for similar if not better benefits 1 million um context window. We can run this in code and we can also run it ourselves inside her agent. The important part to take away here is Frontier class coding at pennies on the dollar. Now it's important to be in mind that everybody basically grades their own homework. You can see some of the indicative performances that we've got here. And the second vector to look at this is price. So performance is one thing. How much are we paying to do this? Local obviously being $0, Deepseek Minia, GLM, Opus 4.8, Fuga Ultra, and GBT 5.5. So let's start by connecting Sakana to Hermes agent. Now I'll put a link down below for this notion document so you can grab it. I teach you how to connect it to Hermes very easily and also how you can connect this to cla code. So depending on where you want to run it, you can just grab this document, copy whatever you need to, and throw it in there. So I'll assume that you've done that and then essentially when go over to Sakana website you're going to go over to start using the Sakana fuguum and you need to create an account and sign in. Once you've done that you'll see you'll have a dashboard that looks a little bit something like this and what you need to do is come over here grab your API keys. Now you come down, you can create a simple API key and then you're going to give that to Hermes agents effectively in a conversation. And the safest way to do that is say, "Hey there, I want to give you an API key for Sakana. Create for me a terminal command that I can enter to provide that securely." And then Hermes will do that exact same thing for you. You open up a terminal with command spacebar, type in terminal like so, and then literally just give it whatever command that Hermes gives you. And as you can see, Hermes has given us this. So all we do is copy this, come over here, throw that in, and then we can enter in our API key when prompted. So now we've done that. Let's actually test Sukuna out. Hey there, I would like you to use the Sukuna model. And do me a favor, tell me how many Rs are there in Strawberry and create for me just a I don't know a a Bronteesque one paragraph on why Laqua is the best drink anybody can be having right now. So I just want to vow that these models work. Then when it's done, we can actually test these together to see which is the best model to be using in Hermes for each individual task. And side note, what I really like about Hermes, by the way, is the fact that it can dynamically tag in any model at once. So by just voice commands. So we don't actually manually have to for/model find the right thing and then select it. We can say, hey, delegate to this and use deepseek for this and claude for this and it will go ahead and it will do that for us. Beautiful. And it's come back. It's given us the prompt tokens um the completion tokens and how much it took. Laqua, modest, effevescent, and strangely sublime. I think that's a wonderful explanation of it. And so now I've done that, let's go ahead and connect GLM. Now for setting up GLM 5.2, I'll put another link down here for this document. This is if you want to grab the GLM API key directly. In reality, if you're connected to Open Routter, which I highly recommend that you do, you can effectively use it via that system, which is the easiest way of doing it. But if you want to get a direct API key, you can do the slightly more challenging part, I say challenging, but it's good to have this guide, is when you want to connect GLM 5.2 to Claude Code. So again, I put a link in the description down there for you. So you can grab that if you want to connect it as well. And if you do want to take your cord code to another level, I'm just going to put a link down below for you for this cloud code masterass. It is the most comprehensive course I have ever done through power features, memory systems, Hermes agent apps, building anything, websites, you name it. I'll put a link for it down below. You also get immediate access to the beautiful uh AI operating system for Hermes and Claude. Huge game changer. I'll put a link down below for that. And now what we need to do is check GLM. So hey, I want to use GLM 5.2 too. Once again, give me a Bronte-esque overview as to why MMA is one of the best sports on the planet and in addition to that, how many RS are in strawberry? Give us a quick question and send that one off. And in fact, let's ask this one other question. Hey, if I was going to a car wash at 15 m away, should I drive or should I walk there? This technically speaking, it's been a bit of an interesting one since Andre Kapi posited this question. Now, I'm not sure if it's just been baked into every model now, but effectively models, even smart ones, would say, "Hey, Jack, I really think you should walk. driving makes no sense for 50 m. Let's see what GLM 5.2 says. Beautiful. We've got the answer from GLM 5.2. It's picked the number of Rs and Strawberry correctly and also has denoted the key of driving instead of walking, which is a great start. So, the next thing we need to do is actually put it to the test. So, what we're going to do is we're going to compare GLM 5.2, Opus 4.8, and this brilliant Japanese model across three very interesting levels to see exactly what we should be using. Now, level one is going to be its ability to do tool calling. I think that is massively understated specifically for an agent like Hermes. So I'm going to give it the following question which is this. Hey there. Now I'd like you to do a test between GLM 5.2, Sukuna and Opus 4.8. I want them all to be in separate environments with their own individual context and the exact same prompt such that we can assess which model performs best on its own merits. Now, the first thing I'd like to do is to use the Zapia MCP to go and retrieve the actual subject line of my last email inside my Outlook. And by the way, the reason I use Zapia MCP is it's this universal almost like tool to connect anything to anything. I find it specifically helpful as well with Outlook because otherwise it can like if you connect to it include for example, it demands you to have a specific professional email. So, you've got a personal one and you want to do various different tests. It's super freaking easy. You can come down to choose your AI agent. Come down, select other, grab this one. Just in case you haven't got this set up, you do this. Literally all you do then is come down to add apps and then effectively you just search for the thing that you want. So if I wanted to add in Outlook, for example, would you believe it? Outlook does appear. And then I can actually pick and select the actual kind of capabilities that I want to give it. I always recommend that you give it principle of least access. That's really important. Means that it can't do more. So I never give the ability to create or write emails for example. So now for example, I've done that. To connect it, I just come over here to the connect button and it'll give me all the information I need to go ahead and do that. As you can see, I get this information. I click on generate token. I can copy that, give that to Hermes. Then Hermes will have the full access. Once again, I recommend you do that terminal setup just so Hermes never actually sees the token. And after the first test, we get some very interesting results, things that you might not have expected. Now, I've had this actually assess and pull together a bit of an overview. So, so Kafuga actually won the task. Okay. And it did it in the least amount of tokens ironically compared to Claude 4 4.8. But what failed it was GLM 5.2. How interesting is that? So look, if we just look at a quick summary here, Sakana won 230,000 tokens. Claude Opus 4.8 got it correct, but probably about 60% more tokens and GLM failed. Um, but I had it retry and it won on its third try. So in reality, it could keep on trying. Um, but I did want to see that it was able to do tool calls properly. I think GLM strength is actually coding and stuff but interesting that it failed to get the tool call correct in this in this case which is interesting for kind of like how robust it actually is. Then we look at latency 154 82 by claude and only 43 by GLM. So probably if it just fudged its way through GM would actually have been the fastest. And so the second test is it ability to create a onepage website. Hey there. I would like to go ahead and create for me a very simple onepage website that looks beautiful, visually engaging. Make it only three sections, so it's not absolutely massive. And the website I would like you to make for is something for sparkling water. You have full creative autonomy, and I'd like three links to open up my local host for the exact same three models for me to go ahead and test. Okay, going to put it down there. I'm going to say I will judge it based on how visually beautiful it is. uh any inaccuracies will be points down and it is going to be for a sparkling waters company to just make sure that it's epic and use those three models. Okay, cool. We're going to send that one off and we'll see exactly how they perform. And by the way guys, if you're wondering what I'm using for speech to text is GL and we've just launched on Windows. I have so many people saying Jack when it comes to Windows, we are officially on Windows. So I'll put a link down below for GLD if you want to go ahead and grab that. Beautiful. And just like that, they are complete. So we'll check out the websites then come back to the data. First one over here is by Fugu. The world deserves a better bubble. Not too bad. It's done this. Bear in mind Fugu may itself just be using Opus 4.8. But the point is what's the actual outcome? Like obviously this is a little bit too much dominating. Like it actually goes underneath. This is kind of directly correct but a little bit on the on the kind of side. Definitely not on a oneshot level as good as Fable 5 is. Luxury audited by Tiny Bubbles. Yeah. Okay. I mean it's different. And it's definitely not got that kind of like Clawude edge, that kind of classic I've been made by Claude. This is cool. I hover over it. It's got some interesting ideas. So Fug is okay, I think. Let's go take a look at our next one over here, which is going to be Opus, the sparkling awards, double establishment. This is not great to be honest. I mean, it's got an effect, but again, it's directionally okay. Definitive ranking of things such as fears. Yeah, that's okay. I mean, I'm not blown away by any of these websites to be completely honest with you. No bottles escapes the panel without earning it bubbles. This is really basic. Very basic for a one shot. Then finally, we've got over here the final one, which is GLM 5.2. Really cool. It's got this bubble effect coming up. I think that's with the best of them so far. This year's contenders. I mean, this is really poor, I think. Yeah, not massively a big fan of these. So, what I'm going to do is go actually ask them to add in a contact form. And then just touching on the tokens. So, Fugu Ultra is 776,000 tokens, which is preposterously large for its output. And the wait time on that guys 3.5x greater than Opus 4.84 an output that is probably a little bit better in its first shot. It did 15 API calls um all relatively the same size. The fastest GLM and Opus 4.8 were essentially on par which I thought was really interesting. But the biggest standout here is is the actual sheer amount of tokens that Fugu actually took to deliver that. Now, I'm also going to say that I think the website quality is is so to speak not as great as it usually is because of the prompt given to it in Hermes agent. I just said, "Hey, spin up a really simple website." I didn't give it any the the w the razledazzle or any of the specific tools or skills. So, it is going to be fairly basic. So, judge them how they compare to each other rather than how it sits as a website by itself. Beautiful. So, that's now updated. This is Fugu now. Weirdly enough, it just is not getting like the height right. Do you see what I'm saying? Cuz usually it's one frame, one position. sits here. This is barely legible. I would say this is not a great uh website. Not a brilliant example. Not blowing me away. We come over to Opus 4.8. Again, way too big. This is nuts. It's crazy how how broad this is. Doesn't make any sense at all. And then bear in mind, no website skills for this. And then finally, we got GLM. This is the best of them all so far, I think, for given what I've given it here. Sparkling water nomination. I think GLM is the winner in this one. And if I even compare on the numbers here, GLM was the fewest tokens by far. But actually honestly at the moment GLM 5.2 is actually defeating Opus 4.8 in terms of its initial style overview. And now for the third test I'm going to give it probably hardest test yet. So I'm going to give it the code to the Hermes and Claude code agentic operating system. Specifically you'll know my last video I added in there the ability to have conversations with Hermes. For example, let me show you what I mean. If I come down here, begin the chat. I might say something along the lines of just loaded. Hey there. So, I would like to just understand what agent am I talking to right now? >> Hey, you're talking to Hermes' voice assistant. >> Awesome. Let me just cut you off right there. What is 10 + 10? >> That's easy. 10 + 10 is >> Okay, that's awesome. But I'm actually probably going to head over to my knowledge graph, right? Which basically helps me understand what different repositories look like. I've got loads of stuff in here for my memory, my obsidian memory if I want to to understand all this sort of stuff. So, let's pick one random section and see how Hermes might actually improve that based on the model. So, let's start with the memory. Beautiful. Now, what I'd like to do is go over to my claude code and Hermes operating system. And I'd like you to give a challenge to the three models to improve the memory tab in one way. It could be anything you want to think about the user. Use graphify to more quickly and easily understand the codebase for each of the three models and then let me know and basically just let them build it and then give a tiny bit of explanation about what they've actually done and we can compare it. What I'd like to do is open it up in a local host separate from my actual code so I can assess it directly and independently. Beautiful. Now we can see the update. So let's have a look at what they've actually physically done. Very quick tell the art. So all three completed it. Opus did it in around 3 minutes. GLM took about 6 to 7 minutes and Fuger took 13 to 14 minutes which is crazy. And honestly, what's weird is I get I deliberately kept it quite an open challenge like, "Hey, learn the codebase and add just add something." Right? If I come over to GLM, honestly, of the three, this I think was the best one. It was a memory health bar. They've got a nice kind of like, you know, green to yellow gradient on there. Pretty cool. Could be, you know, could play around, but but fairly decent. And then we have both Opus 4.8, which added a recent activity filter thing which I guess is okay to search by that and then literally the same thing we got on Figure Ultra but at twice the amount of time. But then coming back to the original operating system here, what's cool is you can use graphify to get this level of kind of understanding about how these things all connect together very very quickly while saving our tokens. So when you look at what Fugu is actually doing, it isn't a Fable killer. These models are not better at the moment than Claude Fable. my own testing, I validated that GLM is surprisingly impressive, which I'll come on to in a second. But if you look at what Fugu is actually physically doing, it's essentially just rooting to different models, which in principle is great, but it's added so much latency like and a lot of times it was twice as long to do the actual task compared to Opus and GLM was was even faster, which I thought was super interesting. Now, when it comes to using models in Hermes, there's four things that matter. We have the performance. In other words, how good is the thing? How quick is it? So we know that using many Macs was the fastest model I've ever seen in Hermes. Then we have privacy running it on your own MacBook, your own Windows laptop or a virtual server and then the actual cost. So ultimately we are trading off based on these four variables. And so with all that in mind, this is the strategy that I'm going to be using and effectively it's like making Hermes itself a blackbox and the idea of this is that Hermes will route to the most optimum model based on the questions. I haven't seen from Fugu based on my personal use of it. Not like we don't want a benchmark queen that looks great on paper. We want to use it in real life to see how is it actually doing things. Honestly, I found that when I ask Hermes agent to route to different models or for example, if I'm in my own pantheon, right, which I can show you in Hermes agent over here, you can actually see in the pantheon when we're quoting skills, we can specify the specific model. So, I can come down. Let's go to the alchemist here. I can see the job, the description, the system prompt. Then I can specify the actual model that I want to do. the task. I can do it freely when I'm having conversations with Hermes over here or I can do it specifically if I create a skill that I use very regularly. I can tag in and Hermes knows exactly which model to use for which task. If you look at 80% of the things that you do with Hermes agent, you'll find that most of them actually fit into a series of common things. Research X, um, build me Y, build me Z. And it's far better to find out what good models do that, but also give Hermes the flexibility to build on those. And you can build on those by essentially attributing some tasks for the models that they're best designed for without having to wait twice as much time or spend way more money. For example, big brain tasks. You may use GLM, Opus 4.8, and have them actually discuss and debate each other. And then, for example, you could use DeepSeek for high scale things, which is kind of like gray on terms of performance, but very, very cheap. But then the actual real impressive model that came out of this was GLM 5.2 because even if you said that 4.8 and it was better and I don't think it was really clear that was the case in the test that I've done. GLM 5.2 is surprisingly good. It's 16th of the price which is crazy. So in terms of power per token ratio, GLM 5.2 is really impressive. Now having the best models to power your Hermes agent is one thing but if you don't understand the individual levels of Hermes agent, you're not going to get the most out of it. Which is why the next thing we need to do is check out every level of Hermes agent right

Jobs for this video

Jobs for this video
Stage Status Attempts Last error Updated
summarize done 0 2026-06-24 22:01:05.940470+00:00
transcript done 0 2026-06-24 22:00:45.009214+00:00
metadata done 0 2026-06-24 22:00:21.294418+00:00