TLDR
Open Montage turns AI coding assistants into full video production studios, Dear Flow from ByteDance is a super agent harness for long-horizon tasks, and Anthropic's cybersecurity skills package gives agents access to frameworks like MITRE ATT&CK. Other notable projects include Hyperframes for deterministic MP4 video generation, Codebase Memory MCP for ultra-fast code indexing, and Matt Pocock's and Garry Tan's skill sets that codify expert engineering and startup-building processes.
Key points
- Open Montage transforms an AI coding assistant into a full video production studio with 12 production pipelines and 400 agent skills.
- Dear Flow (ByteDance) is an open-source super agent harness designed for long-horizon tasks, orchestrating sub-agents, memory, and sandboxes.
- Anthropic's cybersecurity skills package provides agents with six cyber frameworks (e.g., MITRE ATT&CK, NIST) to improve application defenses.
- Hyperframes (Haygen) is an open-source framework that converts HTML/CSS animations into deterministic MP4 videos for product demos and motion graphics.
- Codebase Memory MCP (Deus Data) indexes codebases in milliseconds, supports 158 languages, and uses 120x fewer tokens for agentic code understanding.
- Matt Pocock's skills and G Stack (Garry Tan) codify expert engineering and startup-building processes into agent skills for real development work.
- Baidu released a new open-weight vision language model for lightning-fast OCR, capable of reading and highlighting PDFs with spatial understanding.
- Nvidia's Skill Specter scans AI agent skills for 65 vulnerability patterns across 16 categories to ensure security before installation.
Tools mentioned
- Open Montage
- Dear Flow
- Anthropic cybersecurity skills
- Hyperframes
- Codebase Memory MCP
- Matt Pocock's skills
- G Stack
- Baidu vision language model
- Skill Specter (Nvidia)
- Palmier Pro
- Hermes agent
- Voicebox
- Merlin AI
Techniques
- long-horizon task orchestration with sub-agents
- deterministic video rendering from HTML/CSS
- ultra-fast codebase indexing with structural queries
- skill-based agent programming for expert workflows
- security scanning for agent skills
- OCR with spatial understanding for document analysis
- voice cloning and transcription with local models
Takeaways
- Open-source AI projects now offer production-grade video creation, long-running agent harnesses, and cybersecurity skill packages.
- Codebase Memory MCP and Skill Specter address critical needs for code understanding and security in agent workflows.
- Expert-coded skill sets (Matt Pocock, Garry Tan) enable agents to replicate real engineering and startup-building processes.
- Voicebox provides a complete open-source voice IO stack, including cloning and transcription, that can run locally.
Transcript (captions)
Over the weekend, I found some incredible open-source AI projects that I wanted to share with you. Let me show you them right now. All right, the first one is called Open Montage and it turns your AI agent into a full video production team. This one has almost 15,000 stars. It describes itself as turning your AI coding assistant into a full video production studio. Describe what you want in plain language, your agent handles research, scripting, asset generation, editing, and final composition. And here's an example. So, this is a cinematic sci-fi trailer produced by Open Montage, created the concept, the script, the scene plan, used VO to generate the motion clips, the soundtrack was generated, and Remotion handled the composition. You can also start from a video that you already love. You simply give it the clip that you want and it'll create a video very similar to that, and of course, you can change it any way you see fit. And it has a ton of features including 12 production pipelines, explainer videos, talking heads, screen demos, cinematic trailers, animation, podcast, localization, documentary montages, and more. 400 different agent skills, and so much more. I'm going to drop all the links for these down below. And by the way, if you like these videos where I curate the best AI open-source projects, drop a like, subscribe to the channel. It very much does help. Thank you in advance. Next, we have a new agent harness coming out of ByteDance with possibly the weirdest name out of all of these projects. It is called Dear Flow. And it has almost 74,000 stars. Dear Flow stands for deep exploration and efficient research flow. It is an open-source super agent harness that orchestrates sub-agents, memory, and sandboxes to do almost anything powered by extensible skills. Dear Flow is made for long-horizon tasks. That means you give the agent something and it goes off for hours, maybe even days at a time. That is what this is especially good at. It uses sub agents to break down complex tasks. It has sandboxes. Of course, it has memory and tools and skill use. Everything you need, everything that you're familiar with with all of the best agent harnesses out there. But this one really leans into long horizon work. So, they talk about users using Deerflow for building data pipelines, generating slide decks, spinning up dashboards, automating content workflows. So, if you've tried OpenClaw, if you've tried Hermes, this might be another really cool option to experiment with. All right, and next in what likely will have to change its name, we have Anthropic cybersecurity skills coming in right below 20,000 stars. This gives your agent everything it needs to be a cybersecurity expert. Then, you can point it at your codebase and say, "Improve my cybersecurity defenses." It works with Claude code, GitHub Copilot, Codex CLI, Cursor, Gemini CLI, basically anything. Any agent that supports skills, you can plug this in. So, it comes with six different cyber frameworks. I have not actually heard of these. I'm not deep in the world of cyber, but we have the MITRE ATT&CK, NIST, and more. All of these different frameworks, a bunch of different tactics within them, a bunch of different techniques within them, just basically giving your agent a leg up in trying to protect your own apps that you're building. Now, what's interesting about it is they actually took real cyber frameworks. So, for example, this MITRE Fight Fraud Framework, co-developed by JP Morgan Chase, Citigroup, Lloyds Banking Group, Standards Charter, CrowdStrike, Verizon Business, and so on. It basically puts together a framework for cybersecurity defense. And of course, because it's only a skill, you simply copy-paste the URL, put it in your agent, and say, "Install it." And it'll just work. Next, we have an open-source project from the company Haygen coming in at just above 30,000 stars, Hyperframes. Hyperframes is an open-source framework for turning HTML, CSS, media, and seekable animations into deterministic MP4 videos. So, basically, it's going to write really cool animations for you in HTML, in CSS, and then it's going to convert that to video format, MP4. This is really good for product demos, for slides, for motion graphics, and so much more. It supports rendering in Chrome with FFmpeg and works with many animation libraries, including one of my favorite, 3.js. So, here's an example of what it looks like. It has no sound, so I'm going to do my own voiceover. >> Hey, I'm Hyperframes guy, and I'm going to tell you about Hyperframes. Here are some cool Hyperframes logos and cool transitions. Here's a prompt I'm writing. You can ask your agent anything, and >> there we go. And I know all of these open-source research projects are awesome, but they need to be powered by AI, and all of those subscriptions become very expensive. That's why I'm excited to tell you about the sponsor of today's video, Merlin AI. Merlin AI is an all-in-one AI tool, and they are giving my audience, you guys, a massive discount. So, stick around to the end of this read, and I'll give you that code. So, you probably keep multiple AI tools on deck at all times because you'll have one tool that's good at one thing, another tool that's better at another, and it can get expensive. Merlin AI puts all the top LLM models from ChatGPT to Claude to Gemini all in one place. So, check this out. I click the Merlin AI extension, I chat with the webpage to summarize what I'm reading, and I pull the important parts. And if I need to go deeper, I just flip the switch on deep research. It also has the top video and image generation models ready. So, let's say you're going to pay for everything separately. ChatGPT, 20 bucks. Claude, 20 bucks. Gemini, 20 bucks. And it adds up fast. Merlin AI is cheaper because they buy all of it via the API in bulk, and then they pass along those discounts to you. So, essentially, with Merlin's discount, it is only five bucks. Okay, so here's the 75% off Merlin AI code. It is Matt 5. Apply the code, and then the total drops to $60 per year. Basically, five bucks a month. So, go try out Merlin AI. I'll drop all of this in the description below. Now, back to the video. All right, so this next one is super interesting and a little bit under the radar coming in just above 12,000 stars. This is called Codebase Memory MCP by Deus Data. This is described as the fastest and most efficient code intelligence engine for AI coding agents. It full indexes an average repository in milliseconds. The full Linux kernel, 28 million lines of code, in 3 minutes. And then, you can answer structural queries in under 1 millisecond. This is meant to give your agent a better understanding of your codebase, massive codebases, and to be hyper-efficient. It supports 158 different coding languages. It uses 120 times fewer tokens. It works in 11 different agentic harnesses, including all your favorites, Claude, Code, Cursor, Codex, etc. It has a built-in 3D visualization, so you can actually explore your codebase and see how it looks in this 3D graph space. And it is one line to install. Very easy to use. Go check it out. All right, next, coming in at 143,000 stars, one of the most popular repositories in all of GitHub, we have Matt Pocock's skills. These are skills to give your agent to allow them to develop software just like Matt Pocock does. And if you don't know who Matt Pocock is, this is Matt Pocock, author of Total TypeScript and AI Hero. He was previously at Vercel, and he is a developer educator. He basically teaches you how to write great code. So, he took everything he learned throughout his entire career, built it into skills, so you can give it to your agent, and you can code just like him. My agent skills that I use every day to do real engineering, not vibe coding. Here are some of the features that come with the skills. We have Ask Matt, ask which skill or flow fits your situation, a router over the user invoked skills in this repo. Grill with docs, grilling session that also builds your project's domain model, sharpening terminology, and updating context.md and ADRs inline. This is for real engineers. And obviously, it's being used quite a bit because it has so many GitHub stars. Go check it out. Again, links down below to all of these projects. Next, we have the very popular G Stack by Garry Tan, president of Y Combinator, coming in at 114,000 stars. G Stack turns your engineering agent into a full engineering team. This is how Garry thinks about building things, and he basically codified all of the lessons he's learned throughout his career into G Stack, so you can easily use it, give it to your agent, and allow them, your agents, to build things the same way that Garry Tan does. He helps you not only with the building, but also the thinking through of different problems that you might want to solve. This is especially valuable if you're thinking about building a startup. You can do things like run {slash} office hours, which is something they do at Y Combinator. You sit with a YC partner, and they ask you a bunch of questions, and then give you feedback about the problem space, your solution, your team, everything. And again, he just codified all of it and put it into G Stack. Installation is super simple. It is just a set of skills, and so you just copy-paste the URL, and your agent will know how to ingest it. So, he describes G stack as a process, not a collection of tools. You are supposed to run these skills in order. Think, plan, build, review, test, ship, reflect, and you do so with all of these different schools. So, we have {slash} office hours, {slash} plan CEO review, plan and review, plan design review, all the way through QA, pair agent, CSO ship, land and deploy, canary, benchmark, document release, everything it handles it all end-to-end for you. It is truly awesome. Next, we have an open-source open weights vision language model from Baidu. This one is brand new, coming in just under 3,000 stars, released just a couple days ago. You download this open weights model, and all of a sudden, you have lightning-fast OCR at your fingertips. And if you're not familiar with what OCR is, it basically just means reading and analyzing documents. So, here are a few examples of what you can do. On the left side, what you see is a research paper. And what this project and model are doing is actually highlighting a PDF. And if this doesn't sound impressive, let me tell you, this is actually a very hard problem. You not only have to fully understand what's on the page, but you have to understand where on the page it is, and how to highlight it appropriately. And it is super fast, as you're seeing here. Go to Hugging Face to download over here in files, and the model is only about 6 and 1/2 GB in size. So, very small. Awesome, awesome job. All right, next, an open-source project out of Nvidia. And I should have probably put this one at the top of the video, but this one scans skills before you install them to make sure they are secure and safe. This one is coming in under 10,000 stars. It describes itself as a security scanner for AI agent skills. Detect vulnerabilities, malicious patterns, and security risks. And it's from Nvidia, so you know, you can trust it. So, it comes with multi-format input. You can scan Git repos, URLs, zip files, directories, or single files. It looks for 65 vulnerability patterns across 16 categories, including prompt injection, data exfiltration, privilege escalation, supply chain, excessive agency output handling, and more. Anytime you're about to install a skill, you should first use Skill Specter to inspect it. All right, next. This one is really cool. It's called Palmier Pro. It's at 8,000 stars, and it is a full AI native video editor that you can download for macOS. Right now, it's only macOS. Hopefully, they release it for Windows and Linux later, but right now, you download it, and the best part is, again, it's open source. So, it has a full MCP server built-in generative AI. It integrates with your existing agents, including Claude, Codex, Cursor via MCP, so you can control the downloaded video editor with your agent. So, you can tell it exactly how you want it to edit your videos, and it will control this video editor. It is beautiful. And the best part is it's absolutely free. And next, truly one of the most popular repos on all of GitHub, we have the Hermes agent coming in above. Congratulations to them. 200,000 stars, and boy, it seems like everybody is using it now. It is a great alternative to Openclaw. Nothing wrong with Openclaw. This is just a very cool extra option that you can try out. It has all the features and functionality that you're familiar with if you've ever used Openclaw, and it really leans into the self-healing, self-improving functionality. So, as you're using Hermes Agent, if one of the skills fails, it will automatically fix itself and improve for the next run. It has a million features, so definitely go try this out. I would need a full video to review this in full. All right, next we have Voicebox. Coming in at 33,000 stars, this promises to be both 11 Labs and WhisperFlow. So, think about voice output, AI-generated voice output, and also voice transcription. So, voice input. Put them together and you have both sides of voice, and it's all open source, and it's all free. You can even plug in local models and run it completely on your computer. Really well, actually. They sound quite good. So, clone any voice, generate speech, dictate into any app, talk to agents in voices you own, the full voice IO stack running locally on your machine. Here's an example of what the interface looks like. Here you can type out what you want said. It's using Coin 1.7B, and you can see it's generating right there. And it's a beautiful interface, very easy to use, very easy to edit your voice outputs, and it has a bunch of features. It is a very complete product. So, it has near-perfect voice cloning, thank you to whatever model you're using. It has a stories editor, so you can actually edit the audio like you would in any timeline editor. You have audio effects pipeline, local or remote. We have audio transcription and unlimited generation, all coming to you within this product. And I actually made a video going over four incredible open-source projects in-depth. I actually show you how to install them, show you how to use them. Check it out right here.
Jobs for this video
| Stage | Status | Attempts | Last error | Updated |
|---|---|---|---|---|
| summarize | done | 0 | — | 2026-06-24 22:01:26.337158+00:00 |
| transcript | done | 0 | — | 2026-06-24 22:00:57.844306+00:00 |
| metadata | done | 0 | — | 2026-06-24 22:00:31.309636+00:00 |