Hermes Agent as My Day-to-Day AI Interface

This post is a practical companion to my LLM Wiki with Opinions post: how I actually set up the system, and how it evolved from answering questions to helping with actual work.

Motivation: how chatbots felt limited

I am subscribed to ChatGPT, Gemini, and Claude, and I have used all three quite extensively over the past few months. I have asked them to improve code quality, suggest music, debug issues with electronic devices, explain papers, and help with many small decisions.

Using them this way made a few problems more obvious.

The first problem was unpredictability. When I asked a similar question in a new session, the answer could change drastically depending on how the chatbot decided to approach the problem that day. Sometimes I would open multiple sessions, compare answers across different models, or copy one chatbot’s response into another chatbot to get a second pass. That can be useful, but it is also inefficient. I wanted the model to know the approach I usually prefer and the parts of the problem I usually care about, without making me paste the same context every time.

The second problem was stale personal context. When I asked for recommendations based on my resources, the model would sometimes bring up old configurations from past conversations, even after my setup had changed. This was especially annoying when the older setup had been discussed in detail, because the chatbot seemed to anchor on the history that was most salient rather than the current state. I tried manually editing chatbot memory, but that quickly became another maintenance task.

Andrej Karpathy’s LLM Wiki idea looked like a good way to solve part of this problem: instead of asking a chatbot to remember everything, maintain a knowledgebase that can evolve over time. Hermes Agent looked like the best candidate for making that practical in my own workflow. It was not only a chat interface; it could read files, edit a repository, use tools, keep memory, and run from messaging platforms.

I also briefly considered OpenClaw, but I did not do a serious benchmark or comparison. This post is about Hermes Agent, though many of the higher-level ideas here could carry over to other agent shells.

Evolution

My Hermes setup did not start as a large architecture. It evolved naturally as I kept noticing where the previous layer was not enough.

The context stack

The center of gravity moved from individual chat sessions to a maintained context environment.

At first, I copied the LLM Wiki idea file into Hermes Agent and asked it to create a repository based on it. The initial version was too structured for what I actually needed. I removed much of the structure Karpathy suggested and focused on the central idea: an evolving knowledgebase with minimal manual work. That later became the basis for the more opinionated context model I described in LLM Wiki with Opinions.

Trimming the structure forced a more important distinction: not all durable context belongs in the same place.

Some context should affect many future conversations but stay compact. For example, Hermes should remember that I prefer concise status updates, that knowledgebase entries should be updated when information retrieved from them turns out to be outdated, or that certain personal constraints should be considered when giving advice. This belongs in the user profile or memory.
Some context is better as human-readable synthesis: setup notes, decisions, comparisons, current-state pages, and operational references. That belongs in the knowledgebase.
Some context is procedural: how to manage GitHub issues, how to edit the knowledgebase, how to collaborate on a blog draft, or how to run a publication pass. That belongs in skills, because it should be loaded when relevant rather than always injected into every conversation.
Some context is just task state: what branch we are on, what remains to verify, which command just failed, or what changed in the last commit. That should usually live in the current session, an issue comment, or a PR body rather than permanent memory.
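
The four categories above can be read as a routing rule. Here is a minimal Python sketch of that rule, under stated assumptions: the `Surface` labels, the `ContextItem` fields, and the `route` function are hypothetical names invented for illustration, and nothing here reflects how Hermes decides where to store context internally.

```python
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    """Where a piece of context should live (hypothetical labels)."""
    PROFILE_OR_MEMORY = "user profile or memory"
    KNOWLEDGEBASE = "knowledgebase"
    SKILL = "skill"
    SESSION = "session, issue comment, or PR body"


@dataclass
class ContextItem:
    text: str
    durable: bool      # should it outlive the current task?
    procedural: bool   # is it a how-to that only some tasks need?
    compact: bool      # is it short enough to inject into every conversation?


def route(item: ContextItem) -> Surface:
    """Apply the four-way split, checking the most specific case first."""
    if not item.durable:
        return Surface.SESSION
    if item.procedural:
        return Surface.SKILL
    if item.compact:
        return Surface.PROFILE_OR_MEMORY
    return Surface.KNOWLEDGEBASE


if __name__ == "__main__":
    examples = [
        ContextItem("Prefers concise status updates", True, False, True),
        ContextItem("Recording setup notes and current state", True, False, False),
        ContextItem("How to run a publication pass", True, True, False),
        ContextItem("Which command just failed", False, False, True),
    ]
    for item in examples:
        print(f"{item.text!r} -> {route(item).value}")
```

The order of the checks is the design choice: task state is filtered out first so that it never reaches a permanent surface, and procedures are split off before compactness is considered.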

Hermes has internal surfaces that roughly match these categories.

USER.md is the user profile. It is where broad, stable preferences about me belong: communication style, collaboration preferences, personal constraints, and default workflows that should affect many future sessions.

Ryan prefers concise/direct answers; checkpoint-sized setup guidance; coding status ends with ## TL;DR.

MEMORY.md is more like Hermes’s own notes about the environment and operating caveats. It is useful for things such as local repo paths, tool quirks, authentication boundaries, and conventions that help Hermes avoid repeating mistakes.

Local workspace convention: Hermes-built coding projects live under /home/hermes/projects/<repo>; the Knowledgebase repo lives at /home/hermes/projects/knowledgebase; keep other source/build trees outside that vault so the KB/Obsidian repo remains docs-only.

SOUL.md is the persona and standing behavior layer. It is not where task facts belong. It is where broad behavioral rules live, such as being proactive about useful caveats, using tools instead of merely describing actions, and suggesting durable self-improvements when I correct a recurring pattern.

Surface high-value caveats even when Ryan did not ask for them, when the caveat materially improves productivity, prevents rework, avoids a bad default, or exposes a near-term maintenance issue.

The knowledgebase is the Markdown repository for larger current-state notes and references that should stay readable outside a single chat.

Examples from my current knowledgebase include Daily Checklist, Hermes Board Boundaries, Recording Setup, Inventory, and New York Rental Search.

Skills are reusable procedures. They are larger than memory and more targeted than the persona. A GitHub issue workflow, a knowledgebase editing convention, or a technical writing collaboration pattern can live as a skill and load only when relevant.

Ryan also prefers finished local repo changes not to sit only on the local machine: after making and verifying local edits, commit and push them to the active branch unless there is a concrete reason not to push yet.

That separation is what made the setup feel different from manually editing chatbot memory. The goal is not to store everything forever. The goal is to put each kind of context where it is most likely to stay useful.

The execution stack

Figure: Hermes picking up an issue assigned to it on GitHub.

The output I wanted also changed: not just an answer, but an issue, branch, PR, reminder, wiki edit, or other concrete artifact.

I already use a personal Kanban board for day-to-day organization. I create issues for items I want to do, from personal chores to implementation tasks. These became a natural place for Hermes to find work.

Early on, I would clone a repo, give Hermes a GitHub issue link, and ask it to work on it. It could inspect the repo, make a branch, edit files, run tests, commit changes, and open or update PRs. That was already much more useful than asking a chatbot for code suggestions and manually applying them.

But there were still many manual steps. A lot of follow-up issues were natural consequences of PRs Hermes had written. I did not always need to create those issues myself. It was natural to ask whether Hermes could use the Kanban board as an execution surface: find tasks it can work on, pick them up, report progress, and hand work back to me when it needs clarification or review.

For Hermes’s self-improvement, it was important to distinguish what I wrote from what Hermes wrote. As a result, I created a dedicated GitHub user, wingboot, as the main Hermes actor. This gave me a clearer boundary: I can treat wingboot as the Hermes lane, assign work to it, and review its comments, commits, and PRs separately from my own.

A typical GitHub issue interaction looks like this:

flowchart LR
  A[Ryan creates or refines issue] --> B[Assign to wingboot]
  B --> C[Hermes reads issue and repo context]
  C --> D[Hermes updates branch, PR, wiki, or issue]
  D --> E{Needs Ryan?}
  E -- clarification / review --> F[Assign or mention Ryan]
  F --> A
  E -- verified enough --> G[Comment with result and close/update board]
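
The flowchart can also be read as a small state machine. The sketch below is an illustration in Python, not how Hermes is implemented: the state names and the `needs_ryan` flag are hypothetical, and no GitHub API is called.

```python
from enum import Enum, auto


class State(Enum):
    REFINED_BY_RYAN = auto()     # A: Ryan creates or refines the issue
    ASSIGNED_TO_AGENT = auto()   # B: issue is assigned to wingboot
    CONTEXT_READ = auto()        # C: Hermes reads issue and repo context
    ARTIFACT_UPDATED = auto()    # D: branch, PR, wiki, or issue updated
    WAITING_ON_RYAN = auto()     # F: assigned or mentioned back to Ryan
    DONE = auto()                # G: result commented, board updated


def next_state(state: State, needs_ryan: bool = False) -> State:
    """Return the next step; only ARTIFACT_UPDATED branches on needs_ryan."""
    if state is State.ARTIFACT_UPDATED:
        return State.WAITING_ON_RYAN if needs_ryan else State.DONE
    linear = {
        State.REFINED_BY_RYAN: State.ASSIGNED_TO_AGENT,
        State.ASSIGNED_TO_AGENT: State.CONTEXT_READ,
        State.CONTEXT_READ: State.ARTIFACT_UPDATED,
        State.WAITING_ON_RYAN: State.REFINED_BY_RYAN,
        State.DONE: State.DONE,
    }
    return linear[state]


def run(review_rounds: int) -> list[State]:
    """Walk the loop, handing back to Ryan `review_rounds` times before finishing."""
    state = State.REFINED_BY_RYAN
    path = [state]
    while state is not State.DONE:
        needs_ryan = review_rounds > 0
        if state is State.ARTIFACT_UPDATED and needs_ryan:
            review_rounds -= 1
        state = next_state(state, needs_ryan)
        path.append(state)
    return path


if __name__ == "__main__":
    for step in run(review_rounds=1):
        print(step.name)
```

Calling `run(0)` models an issue that is closed after a single pass, while `run(1)` models one round trip through clarification or review before the work is accepted.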

I am still intentionally keeping this human-in-the-loop. The direction is for Hermes to pick up assigned work, open or update issues, create PRs, merge where appropriate, and close completed items with less manual prompting. But I do not want to remove every checkpoint before the workflow earns that trust. Right now the useful version is not full autonomy; it is lower-friction delegation with visible artifacts and review points.

For pure coding tasks, Claude Code or Copilot can be just as good, and sometimes better, depending on the task and model. The case for Hermes is not that it is a better coding agent; it is that Hermes has become my personal execution layer across surfaces. Coding is one category of work, but the same agent can also help with the knowledgebase, reminders, calendar-like admin, searching for sales, customer management, and recurring tasks. Claude Code is mostly a coding workspace. My Hermes setup is closer to a persistent assistant that happens to be able to code.

Choices

Hermes Agent supports many configurations. The choices below are not a comprehensive product comparison; the important point is that each one is a configurable layer rather than the whole system.

Local machine vs VPS. I run Hermes on a VPS so it can be always on and reachable from messaging platforms. That matters more to me than using the exact machine where I happen to be working. The tradeoff is that some tasks are worse from a VPS: browser automation can hit sandbox or datacenter-IP issues, and local files or local credentials sometimes live somewhere else.
Model selection. I started by using OpenAI Codex through ChatGPT Plus because quality was good, but I hit the 5-hour Codex quota often enough to look for alternatives. MiniMax M2.7 had much more generous quota for the price, but the quality was noticeably worse for my Hermes usage. I also tested pay-as-you-go routes through OpenRouter, including GPT-5.5 and cheaper Kimi/Gemini combinations. In the end, I settled on ChatGPT Pro, but your best model will depend on your usage scenario.
Messaging platform selection. I started with Telegram because it was easy to use as a personal assistant interface. Later I added Slack and Discord. All of them had similar basic chat interfaces, so the choice became mostly about where I wanted to spend time. I personally prefer Slack because I am used to its threads and work-like interaction model, but this is the kind of decision that should follow personal preference rather than a universal best practice.
GitHub as an execution surface. GitHub issues and projects became more than a place to track code work. They became a structured way to hand tasks to the agent, preserve decisions, review output, and keep a human-readable audit trail. This works especially well because the artifacts are already where the work happens: issues, branches, commits, PRs, comments, and project fields.

Future Improvements

For this section, I asked Hermes to describe features I have not fully explored yet. The next five paragraphs are its answer, which is why they refer to me in the third person.

Ryan has only partially used cron jobs. They are useful when a repeated check or reminder should happen without starting a new chat: watch a source, create a recurring issue, summarize a feed, or alert only when something changes. That seems useful for admin and monitoring work, but only after the task is repetitive enough that a scheduled agent run is better than an issue or checklist.
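
The "alert only when something changes" case can be sketched as a fingerprint comparison between scheduled runs. This is a minimal Python illustration under stated assumptions: `fingerprint`, `changed_since_last_run`, and the JSON state file are hypothetical, fetching the watched source is left out, and the scheduling itself (cron or Hermes's own scheduler) happens outside this code.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(content: str) -> str:
    """Stable hash of the watched content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def changed_since_last_run(name: str, content: str, state_file: Path) -> bool:
    """Return True when `content` differs from what the previous run saw.

    `state_file` is a hypothetical JSON file mapping source names to hashes.
    The first observation of a source counts as a change.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    current = fingerprint(content)
    if state.get(name) == current:
        return False
    state[name] = current
    state_file.write_text(json.dumps(state, indent=2))
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        state_path = Path(tmp) / "seen.json"
        # Three simulated runs: only the first and third should alert.
        for snapshot in ["price: 100", "price: 100", "price: 90"]:
            if changed_since_last_run("sale-page", snapshot, state_path):
                print("alert:", snapshot)
```

Keeping the state in a file rather than in the conversation means a scheduled run does not need any chat history to decide whether to stay silent.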

Webhook triggers are the event-driven version of that idea. Instead of polling or waiting for Ryan to prompt Hermes, an external system can call Hermes when something happens: a form submission, a repository event, a deployment notification, or another service that needs follow-up. This could make Hermes feel less like a chatbot and more like a workflow endpoint, but it also adds surface area, so it should wait for a workflow where the trigger is clear.
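
The extra surface area is easiest to see in the first check a webhook receiver has to make: did this request really come from the expected sender? The sketch below uses GitHub's scheme as the example, where the delivery carries an `X-Hub-Signature-256` header of the form `sha256=<hex digest>`, computed as an HMAC-SHA256 of the raw request body with a shared secret. The function names are hypothetical, and how Hermes exposes webhook endpoints is not covered here.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the header value a sender would attach to `body`."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return "sha256=" + digest


def is_trusted(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a delivery's signature using a constant-time comparison."""
    expected = sign(secret, body)
    return hmac.compare_digest(
        expected.encode("utf-8"), signature_header.encode("utf-8")
    )


if __name__ == "__main__":
    secret = b"example-secret"            # placeholder, not a real secret
    body = b'{"action": "assigned"}'      # raw bytes, before any JSON parsing
    header = sign(secret, body)
    print(is_trusted(secret, body, header))          # genuine delivery
    print(is_trusted(secret, body + b" ", header))   # tampered body
```

The comparison runs on the raw bytes of the request, so the check has to happen before the payload is parsed or re-serialized.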

Multi-agent delegation is useful when one Hermes session should split work into smaller isolated tasks. Ryan has already used Hermes for coding, writing, and research, but most of that still fits in a single conversation. Delegation becomes more interesting when parallel work would reduce latency or improve review quality: one agent inspecting a repo, another checking docs, another reviewing a plan.
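
The latency argument can be illustrated with a fan-out pattern. In this Python sketch the three worker functions are stand-ins for isolated sub-agent runs, and `delegate` is a hypothetical helper; Hermes's own delegation mechanism is not shown and may work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def inspect_repo(task: str) -> str:
    """Stand-in for a sub-agent that reads the repository."""
    return f"repo notes for: {task}"


def check_docs(task: str) -> str:
    """Stand-in for a sub-agent that checks documentation."""
    return f"doc findings for: {task}"


def review_plan(task: str) -> str:
    """Stand-in for a sub-agent that reviews the plan."""
    return f"plan review for: {task}"


def delegate(task: str, workers: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Run every worker on the same task in parallel and collect results by role."""
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        futures = {role: pool.submit(fn, task) for role, fn in workers.items()}
        # result() blocks until that worker finishes and re-raises its errors.
        return {role: future.result() for role, future in futures.items()}


if __name__ == "__main__":
    results = delegate(
        "add retry logic to the sync job",
        {"repo": inspect_repo, "docs": check_docs, "plan": review_plan},
    )
    for role, summary in results.items():
        print(f"{role}: {summary}")
```

The parent session only sees each worker's returned summary, which is the isolation that makes parallel review useful: one worker's context does not leak into another's.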

Profiles could give Ryan cleaner separation between roles or environments: a personal assistant profile, a coding-heavy profile, an experimental profile, or a local-worker profile with different tools and credentials. That is attractive, but only if the current single setup starts mixing contexts or permissions in a way that becomes hard to reason about.
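
One way to judge whether contexts or permissions are mixing is to write the profiles down as data and check where they overlap. The following Python sketch is purely illustrative: the `Profile` fields, the tool names, and the credential names are all hypothetical and do not correspond to Hermes's actual profile format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    name: str
    tools: frozenset[str]
    credentials: frozenset[str]


def can_use(profile: Profile, tool: str) -> bool:
    """A profile may only use tools it explicitly lists."""
    return tool in profile.tools


def shared_credentials(a: Profile, b: Profile) -> frozenset[str]:
    """Credentials reachable from both profiles, i.e. where separation is weakest."""
    return a.credentials & b.credentials


if __name__ == "__main__":
    assistant = Profile(
        "personal-assistant",
        tools=frozenset({"calendar", "knowledgebase"}),
        credentials=frozenset({"chat-token"}),
    )
    coder = Profile(
        "coding",
        tools=frozenset({"git", "terminal", "knowledgebase"}),
        credentials=frozenset({"chat-token", "github-token"}),
    )
    print(can_use(assistant, "terminal"))
    print(sorted(shared_credentials(assistant, coder)))
```

If the overlap stays small and easy to list, a single setup is probably still manageable; a growing overlap is the signal that separate profiles would pay for themselves.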

Richer MCP integrations would let Hermes talk to more services through standardized tool servers instead of one-off scripts. For Ryan, that could make GitHub, docs, local services, or future personal systems easier to connect. The value is not the acronym; it is whether the integration gives Hermes a reliable tool surface for work Ryan already repeats.

For my setup, the best path has been incremental. Use Hermes for a real workflow, notice the repeated friction, then add the next layer only when it removes that friction. I did not need a GitHub execution lane until GitHub issues and PRs became a repeated part of the workflow. I did not need a more careful memory policy until the knowledgebase, skills, and user profile started to overlap. I expect the same pattern for the Hermes features I have not fully adopted yet. That is the YAGNI ("you aren't gonna need it") version of building an agent setup. The goal is not to use every feature. The goal is to let the setup naturally absorb the features that solve recurring problems.