TLDR
Hermes Agent allows swapping different AI models for different tasks to optimize cost and performance, breaking the Claude/ChatGPT duopoly. Minimax M3 is highlighted as a particularly cost-effective model that rivals top-tier models at a fraction of the price, using sparse attention for efficiency. The creator demonstrates how to integrate Minimax M3 with Hermes Agent via Telegram for multimodal, web-scraping, and voice-interaction tasks.
Key points
- Hermes Agent lets users swap in any AI model (e.g., Claude, Minimax, GPT) for different tasks, acting as the car chassis while the model is the engine.
- Minimax M3 matches GPT-5.5 on coding benchmarks at 4% of the price and edges Sonnet 4.6 on some tasks at 8% of the price, with a 1M token context window.
- Minimax M3 uses sparse attention to reduce compute by 20x, making long-context (1M tokens) cost cents instead of dollars.
- The creator shows how to set up Minimax M3 with Hermes Agent by installing Hermes, selecting Minimax global in the terminal, and entering the API key.
- Minimax M3 demonstrates fast multimodal capabilities (image description), web scraping via Firecrawl, YouTube video analysis, and voice interaction within Hermes Agent.
- Users should treat models as workers not bosses, verify outputs, and dynamically route tasks to the best model for each job (e.g., big brain for planning, cheaper models for routine work).
- Minimax offers 12% off any tier via a sponsor link, and commercial use above $20M requires a separate agreement.
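The price ratios in the key points are easier to compare as fractions: 4% of a price is 1/25 of it, and 8% is 2/25 (1/12.5). Below is a minimal sketch of that arithmetic. The $100 baseline workload is hypothetical and is not a figure from the video.

```python
from fractions import Fraction

def price_fraction(percent: int) -> Fraction:
    """Turn 'X% of the price' into an exact fraction of the baseline price."""
    return Fraction(percent, 100)

def relative_cost(baseline_cost: float, percent: int) -> float:
    """Cost of the same workload billed at `percent` of the baseline price."""
    return baseline_cost * percent / 100

print(price_fraction(4))        # 1/25
print(price_fraction(8))        # 2/25
# Hypothetical workload that would cost $100 on the baseline model.
print(relative_cost(100.0, 4))  # 4.0
print(relative_cost(100.0, 8))  # 8.0
```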
Tools mentioned
- Hermes Agent
- Minimax M3 (MiniMax)
- Firecrawl
- Telegram
- Glad (transcription software)
Techniques
- Sparse attention for cost-efficient long-context reasoning
- Model-agnostic routing: swapping models per task to optimize performance per dollar
- Dynamic skill-based routing: pre-allocating models to specific skills in Hermes Agent
- Using powerful models for planning/strategy and cheaper models for routine execution
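Skill-based routing can be sketched as a lookup table from skill to model, with a default fallback. This is a hypothetical illustration. It does not reproduce Hermes Agent's actual routing skill or configuration format, and the skill names and model identifiers are placeholders.

```python
from dataclasses import dataclass

# Placeholder model identifiers (hypothetical, not official API model names).
DEFAULT_MODEL = "cheap-general-model"
SKILL_ROUTES = {
    "planning": "frontier-reasoning-model",  # big brain for strategy and scoping
    "brand_research": "minimax-m3",          # multimodal, long-context web work
    "command_line": "coding-specialist-model",
}

@dataclass
class Task:
    skill: str
    prompt: str

def route(task: Task, routes: dict[str, str] = SKILL_ROUTES,
          default: str = DEFAULT_MODEL) -> str:
    """Return the model pre-allocated to the task's skill, else the default."""
    return routes.get(task.skill, default)

print(route(Task("planning", "Scope the course curriculum")))  # frontier-reasoning-model
print(route(Task("reminder", "Remind me in 2 hours")))        # cheap-general-model
```

Keeping the table separate from the routing function means reallocating a skill to a different model is a one-line data change.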
Takeaways
- Use Hermes Agent to break the Claude/ChatGPT duopoly by swapping in cheaper but capable models like Minimax M3.
- Minimax M3's sparse attention drastically reduces compute cost for long-context tasks.
- Always verify model outputs and use the right model for each specific task to maximize performance per dollar.
Transcript (captions)
Hermes is the best AI agent in the world, but most people ignore one system that costs you more money and makes your results worse. So, in this video, I'm going to show you exactly how to increase the performance of Hermes Agent by using the best model for each task, and it's something you can do so easily even if you've never used Hermes before. So, you can save more time, get better results, and get light years ahead of everybody else. And if you're new, I'm Jack. I built and sold my last tech startup with like a gazillion customers, and now I build my own AI startups and share the things that actually work. So if you haven't already, grab that beautiful coffee and let's dive straight in. With our beautiful Hermes Agent, the first thing we have to do is stop overpaying for the wrong brain. The beautiful thing about Hermes Agent is that we can use any model we want on the planet, which is absolutely fantastic. The core concept I want you to think about is that you have to understand the things the agents actually do. They can code and use tools. Do I want to scrape things from the internet? Do I want to build things, connect to different software, go onto the web and see and visualize things? Then there's its reasoning ability, its ability to store memory and context, and how long it can think. Think of it as substituting a different driver with a different skill set for a specific task. Some drivers are more expensive than others, and some are way more capable. The core thing we have to do is break the duopoly of only Claude and only ChatGPT, because you're missing out on too much if you're restricted to those two models. So think of it like this: you're going to own the car and swap the engine. This is a really good analogy. This is our beautiful Hermes, and we can drop in anything from Claude, MiniMax, GLM, GPT, whatever we want.
And in this analogy, Hermes itself is the car. It's the wheels, the wiring, the dashboard that you build once and then keep. All we're ever going to do is decide which engine to drop in for each specific task. If you follow this process, you will get more performance per dollar you put into Hermes Agent, meaning the agent you're using is just way more capable. This could apply to any model, but the model I'm going to be talking about in this video is MiniMax M3. To understand what this looks like, if you compare it to GPT 5.5 on the benchmarks, it is tied on coding for 4% of the price. Imagine a model that's as capable as GPT 5.5 on coding but costs only about 1/25th of the price. Against Sonnet 4.6, it edges it out on some things (obviously, you can check out the benchmarks), and that's at 8% of the price. The model itself is really easy to set up and has a 1 million token context window. It can see images and videos, which some of the other upcoming models can't at the moment. It can browse the live web, and it's open weight. Now, this model is super cool, and I emailed MiniMax and they even agreed to sponsor this video. So thank you, MiniMax, for doing that. I really want to drive home that whatever model we switch to, breaking down this duopoly of either Claude or ChatGPT will open up your horizons, and every dollar you spend will go further. I'll show you exactly what I mean. MiniMax M3 has done something really interesting, which is why I was so excited to talk about it today. You can see its performance here and how it matches up to the best models in the world right now for things like reasoning, command line, code, and web. You can also see the different trade-offs in terms of where it's great and where it's still up and coming.
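The car/engine analogy maps onto a familiar software design. The agent depends on a small interface, and any model that satisfies it can be swapped in while tools and memory stay put. Below is a minimal sketch, assuming a hypothetical `Engine` protocol with one `complete` method. `EchoEngine` is a stand-in; a real engine would call a provider's API instead.

```python
from typing import Protocol

class Engine(Protocol):
    """Anything that turns a prompt into text: the swappable 'engine'."""
    name: str

    def complete(self, prompt: str) -> str: ...

class EchoEngine:
    """Hypothetical stand-in; a real engine would call a model provider's API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"

class Agent:
    """The 'car': tools, memory and wiring stay fixed; only the engine changes."""

    def __init__(self, engine: Engine) -> None:
        self.engine = engine
        self.history: list[str] = []

    def swap_engine(self, engine: Engine) -> None:
        self.engine = engine

    def ask(self, prompt: str) -> str:
        reply = self.engine.complete(prompt)
        self.history.append(reply)  # memory lives in the agent, so it survives a swap
        return reply

agent = Agent(EchoEngine("claude"))
agent.ask("plan the launch")
agent.swap_engine(EchoEngine("minimax-m3"))
print(agent.ask("draft the emails"))  # [minimax-m3] draft the emails
print(len(agent.history))             # 2
```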
But what it did that was really interesting is this general line here, right? It's basically how good the model is versus how expensive the model is. One of the really cool things is that it's an outlier in terms of its power against its price, which is really freaking interesting. So you may be thinking: awesome, but how is it actually achieving this result? How is it able to be this good at this particular price point? Well, apart from the fact that models are just evolving and getting better, it's using something called sparse attention. Think about it from this point of view: there's an expression that says a horse designed by committee is actually a goat. In one scenario, everybody's talking to everybody. It's chaotic. It's slow. But over here, we only talk to the people that actually matter, so it's 20 times less work. Picture a meeting where everyone talks over each other: pure chaos. MiniMax M3's sparse attention only lets the people who matter speak. It's the same answer, a fraction of the noise, and 1/20th of the compute. You can see a graph here that explains some of the technical stuff happening under the hood with the model. But essentially, this efficiency is the whole reason why 1 million tokens of context costs cents here instead of dollars elsewhere. Then look at what a dollar actually buys with a model like this: you can see the output tokens per $1 for MiniMax is really good. Remember, every model is great for one specific thing. So it's not about committing to only one model. We want to use the best model for a specific job. That's the whole thing: we are model agnostic. Only by looking at data points like this can we really start to break down these perceptions and get way more output out of the models we're using.
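The meeting analogy describes sparse attention in general terms, and the video does not say which sparsity pattern MiniMax M3 actually uses. As a generic illustration only, here is sliding-window attention, one common sparse pattern, in plain Python. Each query scores only nearby keys instead of every key. The window of 24 is an arbitrary choice that happens to land near the 20x reduction mentioned above for a 1,000-token sequence.

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, window=None):
    """Scaled dot-product attention over lists of vectors.

    window=None: dense, every query scores every key (n * n scores).
    window=w:    sliding-window sparse, query i scores only keys i-w .. i+w.
    Returns (outputs, number_of_scores_computed).
    """
    n, d = len(queries), len(queries[0])
    outputs, scored = [], 0
    for i, q in enumerate(queries):
        lo, hi = (0, n) if window is None else (max(0, i - window), min(n, i + window + 1))
        idx = range(lo, hi)
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d) for j in idx]
        scored += len(scores)
        weights = softmax(scores)
        outputs.append([sum(w * values[j][k] for w, j in zip(weights, idx))
                        for k in range(len(values[0]))])
    return outputs, scored

def scores_computed(n: int, window=None) -> int:
    """Count query-key scores without running attention (for large n)."""
    if window is None:
        return n * n
    return sum(min(n, i + window + 1) - max(0, i - window) for i in range(n))

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
dense_out, dense_scored = attention(vecs, vecs, vecs)
sparse_out, sparse_scored = attention(vecs, vecs, vecs, window=1)
print(dense_scored, sparse_scored)  # 16 10

dense, sparse = scores_computed(1000), scores_computed(1000, window=24)
print(dense, sparse, round(dense / sparse, 1))  # 1000000 48400 20.7
```

With a window at least as large as the sequence, the sparse version reproduces dense attention exactly. Smaller windows trade long-range visibility for compute, which is why designs such as Longformer pair a sliding window with a few globally attending tokens.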
And you can see the comparison with GLM 5.2, which is going super viral right now and is again an awesome model. These are all great models for different things. You can see some of the differences: MiniMax can natively see and browse the web, and its output is four times cheaper per 1 million tokens, while GLM itself is crushing it for command-line coding. It is even better, I believe, than 4.8 A, and the test results are crazy. It's such an exciting time to have all these different models that we can tag in and use for anything we want. So, I'm going to come over now to minimax.io, and I'm on platform.minimax.io. I'm going to grab a plan. You can do $20, $50, or $120, and you can see the tokens you get for that in terms of usage. The $20 plan is 1.7 billion tokens. It works pretty much like a Claude or ChatGPT subscription. Choose a plan if you want to give MiniMax a whirl. You've got the token plans, which are going to be more economical, or you've got the API pricing, so you can come down and check out the actual pricing if you just want to pay as you go. Then, to grab your API key, we're just going to click on Get API key, come down to Subscription, and literally copy this value here. Now, if you don't have Hermes Agent installed, go over to the Hermes website, come down, and literally copy this code. Once we've got the code, we open the terminal and paste it in like so. This will download the entire Hermes Agent onto your computer. And if this sounds like I'm speaking Spanish, I'm going to put a link down below to my full Claude Code masterclass that will take you from zero to hero.
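The plan pricing quoted above converts directly into a per-dollar figure, which is the "performance per dollar" framing the video keeps returning to. The $20 and 1.7 billion token numbers come from the video, which doesn't state the billing period. The $1.00-per-million pay-as-you-go rate is purely hypothetical, so check MiniMax's current API pricing before relying on any break-even figure.

```python
def tokens_per_dollar(plan_price_usd: float, plan_tokens: int) -> float:
    """How many tokens each dollar of a flat plan buys."""
    return plan_tokens / plan_price_usd

def break_even_tokens(plan_price_usd: float, api_usd_per_million: float) -> float:
    """Usage at which a flat plan costs the same as pay-as-you-go API billing."""
    return plan_price_usd / api_usd_per_million * 1_000_000

# Figures quoted in the video: the $20 plan includes 1.7 billion tokens.
print(tokens_per_dollar(20, 1_700_000_000))  # 85000000.0
# HYPOTHETICAL pay-as-you-go rate of $1.00 per million tokens, for illustration only.
print(break_even_tokens(20, 1.00))           # 20000000.0
```

Above the break-even usage the flat plan is cheaper, assuming you stay within its token allowance. Below it, paying as you go costs less.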
We go through foundations, building a website, power features, memory systems, and Hermes Agent from start to finish. You also get the full beautiful agentic operating system, the Hermes OS, which has so many cool bells and whistles that will take you to a completely new level. I'll put a link for this down below so you can grab it if you want to. Now, here's the cool thing. The terminal is here. To connect MiniMax, we're literally just going to come down and select MiniMax global. Just press the spacebar like so, and we're going to go for MiniMax at the top by pressing the spacebar. Then it's going to ask for the API key. We'll just drop in that API key, and once that's done, hit enter like so. The API key is now saved, and we can begin to have a conversation with it in Telegram. Again, if you've not seen that before, either click the link down below or check out this video on screen for setting up Hermes Agent from start to finish. Beautiful. Now, I've opened up Telegram. Let's switch over to MiniMax and take it for a spin. I'm going to type in model like so, and the model selector will pop up inside Telegram, which is excellent, and we can just switch over. Cool. As you can see, MiniMax is right there, so we're going to click on that one. Then you can pick the model you want. We're going to go with MiniMax M3, and then we're ready to have a full conversation. Now that's connected, let's ask it a question: "Hey there, which model is this?" And just like that, it should come back and say, "Hey, this is MiniMax." The first thing I want to demonstrate is that it's got multimodal capabilities. What we're going to do is take a screenshot of this, drop it in, and ask it exactly what it is. Let's copy it to the clipboard, which is awesome, and drop it over here.
"Hey there, describe to me what you see in this image." Send it off and see its ability to work multimodally in anything it's doing. Just like that, it's come back with this, and I love the way it's broken this down. Bear in mind that when you're using these models in Telegram, they don't always format things the best, and it's done an incredible job with this. So, let's put it to the test using our Firecrawl skill. I'm going to come down and write a prompt: "Hey there. I want you to use the Firecrawl skill. I want you to go over to glider.com, extract their brand identity, and then create for me a little mini HTML overview that I can open up in the browser to understand their brand, color palette, logo, that kind of thing." Send that off. It's a pretty complicated task. I'm using Firecrawl mainly because it's the best way to actually grab information. And before I even answer that, guys, look how quickly this came back and asked me clarifying questions. It's asking, "Is it glad.com?" I'm going: yes, it is glad.com, and yes, you should have a skill to use Firecrawl. I hit enter on that one and watch it work its magic in the background. It could just be me, but this does seem to be coming back extremely quickly on MiniMax, which is pretty impressive. Now, while that's working in the background, let's fire up Firecrawl over here and pull this bad boy up. In Firecrawl, we can search the web, we can scrape websites, and we can also interact with them, which is pretty cool. One of the reasons why this is so epic (I talk about it all the time) is that you're not pulling back all the HTML from a page. It's just extracting the relevant text, which saves us a fortune. And one thing I will say is that I'm super impressed with how quickly it's coming back and giving us updates while it's doing everything. Quick update on the website itself.
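The claim that stripping markup "saves a fortune" comes down to sending the model only the readable text rather than the full HTML. Firecrawl's own API isn't shown in the video, so this sketch uses only Python's standard-library `html.parser` on a made-up page to illustrate the idea. It is not how Firecrawl works internally.

```python
from html.parser import HTMLParser

class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = """<html><head><style>body { color: #111; }</style>
<script>window.analytics = { id: 42 };</script></head>
<body><nav class="menu"><a href="/">Home</a></nav>
<h1>Example Co</h1><p>Plans start at $20.</p></body></html>"""

text = extract_text(page)
print(text)  # Home Example Co Plans start at $20.
print(len(page), "->", len(text), "characters")
```

A static parser like this only sees the HTML the server sends. For a single-page app that ships minimal static HTML, anything rendered by JavaScript would be missing, which is one reason a scraping tool that renders pages can help.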
"Glad.com appears to be a single-page app with minimal static HTML." Glad, of course, is the software I'm using right now to transcribe everything; that's how I'm able to talk and the text just appears. And then, just like that, guys, it's confirmed what it's done, and this is what it's created for the Glad brand guidelines. Look at this: it's correctly identified that we save people 20 hours a month, it's got all the logos, it's got the color palette, and it actually grabbed this color palette from the real website. I'm really, really impressed with it. I mean, look at this: it's even grabbed this image, and wow, I can click on this, go over to the actual website, and you can see the similarities. Now, let's test it with something more difficult. Why don't we ask it to check out a video on my desktop and see how it handles that? So I might say, "Hey, I want you to go into my desktop, find the latest video I recorded, and tell me the first sentence I say in that video." Now, I haven't necessarily given it the tools to do this, so we'll see if it can figure it out. And it's done that: it's found the pizza commercial I did in my ads video, which is crazy. Let's give it one other challenge here. Why don't I say, "Hey, now head over to the Jack Roberts YouTube channel, check out my latest YouTube video, and tell me what the intro for that video is." Now I'm asking it to go online and grab my information. We can see it's got my channel ID and it's looking at my latest video. I wanted to share how it's working because one of the things that really jumps out to me with MiniMax is how quickly it processes information. Even the streaming, the speed at which you get the information back, is crazy. And look at this. That is actually completely correct as well.
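The video doesn't show how the agent derived the color palette. One plausible approach is to count hex colors in a site's CSS and keep the most frequent ones. This is a hypothetical sketch of that idea, and the stylesheet below is made up.

```python
import re
from collections import Counter

# Matches 6- or 3-digit hex colors such as #5B3DF5 or #fff.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{6}|[0-9a-fA-F]{3})\b")

def normalize(hex_code: str) -> str:
    """Lowercase and expand shorthand (#fff -> #ffffff) so duplicates merge."""
    digits = hex_code.lower().lstrip("#")
    if len(digits) == 3:
        digits = "".join(c * 2 for c in digits)
    return "#" + digits

def palette(css: str, top: int = 5) -> list[tuple[str, int]]:
    """Most frequent hex colors in a stylesheet, most common first."""
    counts = Counter(normalize(m) for m in HEX_COLOR.findall(css))
    return counts.most_common(top)

css = """
:root { --brand: #5B3DF5; --ink: #111; }
.button { background: #5b3df5; color: #FFFFFF; }
.link { color: #5B3DF5; }
.card { border-color: #111111; background: #fff; }
"""
print(palette(css))
```

Real sites also declare colors with `rgb()`, `hsl()`, named colors, and images, so treat frequency counting as a starting heuristic rather than a complete brand extractor.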
It's actually pulled down one of my latest videos, which is pretty incredible: "Claude Code agentic systems are the future. Unlock capabilities that 99% of don't exist. Blah blah blah. This open source repo will 10x your Hermes agent." And it got it almost instantaneously, which is crazy. We can also head over to the Hermes agentic OS. I can go to Intelligence and begin a conversation with Hermes in just the same way using the MiniMax model. I can come down here and begin the conversation. So I might say, "Hey there, who am I talking with right now?" "Hey Jack, you're talking with me, Hermes' voice. I'm your AI companion." Essentially, anything we can do in the chat, we can also do in this voice terminal. And the cool thing is you can speak to MiniMax in Hermes Agent and say, "Hey, I want you to build me out a routing strategy." Remember, Hermes will dynamically route to a specific model based on the task at hand. I've actually built a full skill for this. I'll put it in the description so you can grab it if you want to check it out. It just enables Hermes to use this intelligence to route to the correct resource. I did a full video on it, and I'll put a link on screen so you can check that out. And remember, guys, you can also give it reminders like this: "Hey there, go ahead and set me a reminder that in 2 hours' time I need to order some LaCroix." Don't judge me, guys. Coke is out. LaCroix is in. It just happened. Don't blame me; it's just the way it is. And just like that, it creates reminders in the exact same way. And bear in mind, guys, that Hermes Agent can dynamically route your query.
So you can say to Hermes something like, "Hey there, I want you to create for me a skill that will dynamically route my queries based on the model I'm using." And bear in mind, Hermes can do this. Obviously, if you're in an agentic operating system, you can actually (and this is a really cool thing) build out specific skills with specific models. For example, you may have the philosopher for some different models. I have orius here, which does deep reasoning on different topics. You can use MiniMax for certain things and pre-allocate certain skills to certain models with certain objectives, meaning it routes there dynamically and automatically, which is a huge, huge benefit. Now, a couple of things to know before you wire this into your awesome system. Number one, you need to think about this as a worker, not a boss. It can be brilliant on the routine work 90% of the time, but always, always, always verify the output, whatever model you're building with. Okay? Really, really important. The key thing to understand is that we want the right model for the right job. Whether that's MiniMax, Opus, or GPT, we are model agnostic. We bring in the best one for that particular job, basically to be token efficient. It's going to increase performance and save you cost as you scale at the same time. A great way to think about this is that you own the agent and rent the brain. MiniMax has kindly offered us 12% off any tier, so if you use the link, which I'll put down below, you'll get 12% off, which is super generous, and we appreciate that. And remember, you can automate the switching of brains with a really cool tool routing skill that you can add directly into Hermes. The other good thing to know about MiniMax is that if you're using it for commercial purposes, they haven't asked that you put a "Built with MiniMax M3" credit on it.
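"Worker, not boss" can be made concrete as a verification loop: try the cheaper model first, check its output, and escalate only when the check fails. Below is a minimal sketch. The two model functions are hypothetical stand-ins, and JSON validity is just one example of a check you might use.

```python
import json
from typing import Callable

Model = Callable[[str], str]

def run_with_verification(prompt: str,
                          models: list[tuple[str, Model]],
                          verify: Callable[[str], bool]) -> tuple[str, str]:
    """Try models in order (cheapest first) and return the first verified output.

    Raises RuntimeError when no model's output passes `verify`.
    """
    for name, model in models:
        output = model(prompt)
        if verify(output):
            return name, output
    raise RuntimeError("no model produced a verified answer; escalate to a human")

def is_valid_json(text: str) -> bool:
    """Example check: the task demanded JSON, so reject anything unparsable."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# Hypothetical stand-ins for a cheap model and a stronger, pricier one.
def cheap_model(prompt: str) -> str:
    return '{"status": "ok"'  # malformed: missing closing brace

def strong_model(prompt: str) -> str:
    return '{"status": "ok"}'

print(run_with_verification("Summarize as JSON",
                            [("cheap", cheap_model), ("strong", strong_model)],
                            is_valid_json))
# ('strong', '{"status": "ok"}')
```

Ordering the list cheapest-first means the expensive model is only paid for when the cheap one's output fails the check.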
And if it's above $20 million, you can reach out and have a separate conversation. Now, generally speaking, when we're talking about doing the right job, you want the most powerful brain possible to do the initial planning and conversation. That's where you get the maximum bang for your buck: scoping it out and designing the dashboard, whether it's an operating system, a course curriculum, or whatever it is you're building. You essentially want the big brain to do the big strategy stuff, and then we can start to delegate to different models to drive that within the structure we've already created. Now, knowing the right models to use is one thing, but if we don't have an operating system to combine them all properly, we're not going to get the most out of our entire system. Which is why the next thing we need to do is leverage all of them together, which we cover in this video right
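The "big brain plans, cheaper models execute" split can be sketched as one planner call followed by one executor call per step. The planner and executor below are hypothetical stand-ins with a call counter, included to show where the calls land. In practice each would wrap a real model API.

```python
from collections import Counter
from typing import Callable

calls: Counter = Counter()  # how many times each (hypothetical) model is called

def big_planner(goal: str) -> str:
    """Stand-in for an expensive frontier model: called once to write the plan."""
    calls["big-model"] += 1
    return "Scope the modules\nOutline each lesson\nDraft the quizzes"

def cheap_executor(step: str) -> str:
    """Stand-in for a cheaper model: called once per step of the plan."""
    calls["cheap-model"] += 1
    return f"done: {step}"

def plan_and_execute(goal: str,
                     planner: Callable[[str], str],
                     executor: Callable[[str], str]) -> list[str]:
    steps = [line.strip() for line in planner(goal).splitlines() if line.strip()]
    return [executor(step) for step in steps]

results = plan_and_execute("Build a course curriculum", big_planner, cheap_executor)
print(results)
print(dict(calls))  # {'big-model': 1, 'cheap-model': 3}
```

The expensive model is called once regardless of plan length, while the per-step work scales on the cheaper model.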
Jobs for this video
| Stage | Status | Attempts | Last error | Updated |
|---|---|---|---|---|
| summarize | done | 0 | — | 2026-06-23 22:00:40.480768+00:00 |
| transcript | done | 0 | — | 2026-06-23 22:00:27.712385+00:00 |
| metadata | done | 0 | — | 2026-06-23 22:00:16.137709+00:00 |