CoMIC: Cloud-Edge Memory for LLM Agents
CoMIC's cloud-edge architecture lets agents share learned trajectories without retraining. Is this where your memory budget should actually live?
Summary
CoMIC proposes a cloud-edge architecture for multi-agent memory sharing that separates reflection from execution across compute tiers. The core insight is that cross-agent learning does not require parameter updates, only structured trajectory evaluation and routed guidance. If you are building long-horizon agents today, this paper reframes where your memory budget should actually live.
The standard approach to multi-agent memory is embarrassingly local. Each agent accumulates its own context, hits its own dead ends, and has no mechanism to benefit from what a sibling agent just learned on a structurally identical subgoal three tasks ago. CoMIC is a direct attack on this problem, and its architecture is worth understanding at the protocol level, not just the headline level.
The Architecture Is a Deliberate Asymmetry
CoMIC does not treat cloud and edge as interchangeable compute. The split is principled and load-bearing.
Edge Agents Are Intentionally Constrained
Edge agents in CoMIC run smaller, weaker models. They execute locally, which means low latency and no round-trip cost for every action decision. Their memory structure is subgoal-oriented and hierarchical: the agent does not maintain a flat context window of everything that has happened. It maintains a decomposed representation keyed by subgoal identifiers, and it uses selective re-expansion to pull back relevant history only when the current subgoal warrants it.
This is a practical compression decision. Long-horizon tasks generate long trajectories. Feeding the full trajectory back into a weak edge model on every step is a context management disaster. Subgoal-keyed hierarchical memory sidesteps this by making retrieval conditional on semantic relevance rather than recency.
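A minimal sketch of what subgoal-keyed memory with selective re-expansion could look like. The class names, method names, and the "summary plus raw steps" layout are assumptions made for illustration, not the paper's data structures.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubgoalNode:
    """One subgoal's record: a short summary plus the raw steps behind it."""
    subgoal_id: str
    summary: str = ""
    steps: list[str] = field(default_factory=list)


class SubgoalMemory:
    """Hypothetical edge-side memory keyed by subgoal ID, not by timestep."""

    def __init__(self) -> None:
        self.nodes: dict[str, SubgoalNode] = {}

    def record(self, subgoal_id: str, step: str) -> None:
        node = self.nodes.setdefault(subgoal_id, SubgoalNode(subgoal_id))
        node.steps.append(step)

    def close(self, subgoal_id: str, summary: str) -> None:
        # Attach a one-line summary once the subgoal is finished.
        self.nodes[subgoal_id].summary = summary

    def build_context(self, current: str, related: frozenset[str] = frozenset()) -> list[str]:
        """Full steps for the current and related subgoals, summaries for the rest."""
        lines: list[str] = []
        for sid, node in self.nodes.items():
            if sid == current or sid in related:
                lines.extend(f"[{sid}] {s}" for s in node.steps)
            elif node.summary:
                lines.append(f"[{sid}] summary: {node.summary}")
        return lines


memory = SubgoalMemory()
memory.record("find_key", "open drawer")
memory.record("find_key", "take key")
memory.close("find_key", "key found in drawer")
memory.record("unlock_door", "go to door")

# Finished subgoals stay collapsed unless the caller asks to re-expand them.
compact = memory.build_context(current="unlock_door")
expanded = memory.build_context(current="unlock_door", related=frozenset({"find_key"}))
```

In this sketch the prompt for a weak edge model grows with the number of subgoals rather than the number of steps, and the `related` argument is where a relevance check would decide what to re-expand.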
The Cloud Critic Is Not a Supervisor
Here is where most readers will default to a wrong mental model. The cloud-side LLM critic in CoMIC is not a supervisor issuing real-time commands. It operates asynchronously on completed trajectories. After an edge agent finishes a run, the cloud critic evaluates what happened, identifies reusable experience, and packages it as guidance keyed by the same semantic subgoal identifiers the edge agents use locally.
This is closer to a distillation loop than a hierarchical control structure. The cloud component is doing retrospective trajectory evaluation and then routing extracted insights back to agents that will encounter similar subgoals. No model parameters are updated. The learning is entirely in the structured memory passed at inference time.
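The retrospective loop can be sketched as a queue of finished trajectories that a critic drains off the critical path. Everything named here (`Trajectory`, `CloudCritic`, `toy_critic`) is hypothetical, and `toy_critic` is a trivial stand-in for the cloud LLM call, which the paper does not specify at this level.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trajectory:
    agent_id: str
    # (subgoal_id, action, succeeded) for each step of a finished run
    steps: list[tuple[str, str, bool]]


# Assumed critic signature: a trajectory in, {subgoal_id: guidance text} out.
Critic = Callable[[Trajectory], dict[str, str]]


def toy_critic(traj: Trajectory) -> dict[str, str]:
    """Stand-in for the cloud LLM: keeps the last successful action per subgoal."""
    guidance: dict[str, str] = {}
    for subgoal_id, action, ok in traj.steps:
        if ok:
            guidance[subgoal_id] = f"'{action}' worked here"
    return guidance


class CloudCritic:
    def __init__(self, critic: Critic) -> None:
        self.critic = critic
        self.pending: deque[Trajectory] = deque()
        self.guidance: dict[str, list[str]] = defaultdict(list)
        self.calls = 0

    def submit(self, traj: Trajectory) -> None:
        # Called by an edge agent after its run ends; no critic work happens here.
        self.pending.append(traj)

    def process_pending(self) -> None:
        # Runs later, in batch: one critic call per completed trajectory.
        while self.pending:
            traj = self.pending.popleft()
            self.calls += 1
            for subgoal_id, note in self.critic(traj).items():
                self.guidance[subgoal_id].append(note)
```

Note that nothing in this loop touches model weights. The only output is text stored under subgoal keys, which is what gets handed to edge agents at inference time.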
Cloud Inference Runs Once per Trajectory, Never During Execution
The architectural consequence is significant: cloud computation is batched and asynchronous, not on the critical path of task execution. You pay cloud inference cost once per completed trajectory, not once per action step. For long-horizon tasks with hundreds of action steps, that cuts the number of cloud calls by roughly the step count compared with a critic consulted on every action.
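A back-of-the-envelope comparison of call counts makes the difference concrete. The figures (50 trajectories of 200 steps each) are illustrative assumptions, not numbers from the paper.

```python
def critic_calls(trajectories: int, steps_per_trajectory: int, per_step: bool) -> int:
    """Number of cloud inference calls under each placement of the critic."""
    return trajectories * (steps_per_trajectory if per_step else 1)


# Illustrative workload: 50 trajectories, 200 action steps each.
per_step_calls = critic_calls(50, 200, per_step=True)          # 10000
per_trajectory_calls = critic_calls(50, 200, per_step=False)   # 50
reduction = per_step_calls // per_trajectory_calls             # 200
```

This counts calls only. A per-trajectory call reads a much longer input than a per-step call would, so token cost does not necessarily shrink by the same factor; what disappears is the round trip on every action.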
What "Insights Circulation" Actually Means at the Protocol Level
The name CoMIC foregrounds circulation, which is the right word. This is not a shared memory store that all agents read from. That design has well-known problems: stale reads, write conflicts, context bloat when the store grows large. Circulation implies routing, filtering, and delivery rather than open access.
Semantic Subgoal Identifiers Are the Routing Key
When the cloud critic extracts guidance from a completed trajectory, it does not store a blob of narrative text. It keys the guidance by semantic subgoal identifiers, the same identifiers used in the edge agents' hierarchical memory. This means that when an edge agent encounters subgoal X, it can receive guidance specifically generated from other agents' experience on subgoal X, filtered by the cloud critic for reusability.
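One way to picture routing by subgoal identifier is a small router that only ever answers "what do other agents know about this subgoal". The `GuidanceRouter` class, the exact-match-after-normalization rule, and the `limit` parameter are all assumptions for this sketch; the paper's identifiers are semantic, and a real system might match them with something richer than string equality.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Guidance:
    subgoal_id: str
    source_agent: str
    text: str


class GuidanceRouter:
    """Hypothetical router: delivery is by subgoal ID, never a full-store read."""

    def __init__(self) -> None:
        self._by_subgoal: dict[str, list[Guidance]] = defaultdict(list)

    @staticmethod
    def _key(subgoal_id: str) -> str:
        # Assumption: identifiers match after simple normalization.
        return subgoal_id.strip().lower()

    def publish(self, item: Guidance) -> None:
        self._by_subgoal[self._key(item.subgoal_id)].append(item)

    def fetch(self, subgoal_id: str, requesting_agent: str, limit: int = 3) -> list[str]:
        """Newest-first guidance from other agents, for this subgoal only."""
        items = self._by_subgoal.get(self._key(subgoal_id), [])
        others = [g.text for g in reversed(items) if g.source_agent != requesting_agent]
        return others[:limit]
```

The `limit` cap is what keeps delivered guidance from turning back into context bloat on the edge side: an agent receives a few items for the subgoal it is on, not the whole store.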
The filtering step is the part that determines whether this works in practice. Not all trajectory experience is transferable. An agent that succeeded on a subgoal through a path that relied on a specific environmental state is not necessarily generating useful guidance for an agent in a different environment instance. The cloud critic's job is to make this call and suppress noise before it enters the circulation loop.
Weak Filtering Poisons The Entire Guidance Loop
The paper does not fully specify the critic's filtering mechanism, which is the part practitioners should push on before adopting this pattern. If the filtering is weak, circulation becomes contamination.
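Since the mechanism is unspecified, the following is only one possible gate, offered as a starting point for stress-testing rather than a description of CoMIC. It assumes the critic can attach two hypothetical fields to each candidate: how many distinct trajectories supported it, and whether it depended on a particular environment state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    subgoal_id: str
    text: str
    support: int        # distinct trajectories where this guidance held (assumed field)
    env_specific: bool  # critic judged it tied to one environment state (assumed field)


def admit(candidates: list[Candidate], min_support: int = 2) -> list[Candidate]:
    """Keep only guidance seen in several runs and not tied to one environment."""
    return [c for c in candidates if c.support >= min_support and not c.env_specific]
```

A useful experiment with a gate like this is to loosen `min_support` or ignore `env_specific` and watch whether downstream agents get worse, which would show how much of the benefit depends on filtering quality.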
Three Layers of CoMIC Memory Design
1. Edge hierarchical memory organizes experience by subgoal, not by timestep. This keeps working context bounded regardless of trajectory length.
2. Cloud trajectory evaluation runs asynchronously after task completion. The critic filters reusable experience and generates subgoal-keyed guidance without updating any weights.
3. Cross-agent guidance routing delivers insights to agents that encounter semantically matching subgoals, closing the loop without synchronous coordination overhead.
Progress Rate and Action Grounding Are Separate Wins
CoMIC reports improvements on two distinct metrics: progress rate and action grounding. These are not the same thing, and conflating them obscures what the architecture is actually buying.
Progress rate measures how far an agent advances through a long-horizon task before stalling or failing. Action grounding measures whether individual actions are contextually appropriate, i.e., whether the agent is doing things that make sense given its current state and goal. You can have high progress rate with poor action grounding if the agent is mostly in easy subgoal territory. You can have high action grounding with low progress rate if the agent executes each action correctly but fails to sequence them into meaningful task advancement.
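A toy calculation shows how the two metrics can diverge. The definitions below (fraction of subgoals completed, fraction of actions valid in their state) are simplified assumptions for illustration; the benchmarks behind the paper's numbers may define both differently.

```python
def progress_rate(completed_subgoals: int, total_subgoals: int) -> float:
    """Assumed definition: share of a task's subgoals the agent completed."""
    return completed_subgoals / total_subgoals


def grounding_rate(action_valid: list[bool]) -> float:
    """Assumed definition: share of actions that were valid in their state."""
    return sum(action_valid) / len(action_valid)


# Agent A: every action is sensible, but it finishes only 1 of 4 subgoals.
agent_a = (progress_rate(1, 4), grounding_rate([True] * 10))

# Agent B: gets through 3 of 4 subgoals despite 4 invalid actions out of 10.
agent_b = (progress_rate(3, 4), grounding_rate([True] * 6 + [False] * 4))
```

Under these assumed definitions, agent A scores 0.25 on progress and 1.0 on grounding, while agent B scores 0.75 and 0.6, so ranking the two agents depends entirely on which metric you read.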
Two Wins Addressing Two Very Different Problems
CoMIC's improvements on both dimensions suggest the hierarchical memory structure is helping with local coherence (grounding) while the cross-agent guidance circulation is helping with global task navigation (progress). These are architecturally separable contributions, and you could imagine deploying the hierarchical memory component alone if you do not have a multi-agent setup to justify the cloud critic overhead.
When cross-agent guidance is keyed by semantic subgoal identifiers rather than agent identity or task ID, trajectory experience becomes reusable infrastructure instead of ephemeral context.
What This Does Not Solve
CoMIC does not fully address the cold-start problem. When the first agents in a system are completing their first trajectories, the cloud critic has no prior experience to draw from, so the circulation loop is empty and early task performance depends entirely on the edge agents' baseline capabilities. Separately, the paper's aggregate success-rate gains are task-dependent: some tasks benefit substantially and others do not.
The benchmark covers five long-horizon agent task types, which is a reasonable scope for a research paper. It is not enough to characterize which task structures benefit most from cross-agent guidance circulation versus which are better served by stronger edge models or better retrieval. That characterization matters a great deal for practitioners deciding where to invest.
Frozen Weights Mean Permanently Frozen Ceilings
The "without updating model parameters" framing is a feature in research terms because it makes the system architecture-portable and model-agnostic. In production terms, it also means you are bounded by what inference-time guidance can do. There is a class of persistent failure modes that guidance cannot fix because they stem from the limits of the edge model itself. CoMIC does not claim otherwise, but practitioners need to keep this boundary clearly in mind.
The Bottom Line
- Subgoal-keyed hierarchical memory is a practical pattern for controlling context growth in long-horizon agents, worth adopting independent of the multi-agent setup.
- The cloud critic's asynchronous trajectory evaluation is the architectural move that makes cross-agent learning cost-feasible. Do not confuse it with real-time supervision.
- The filtering quality of the cloud critic determines whether guidance circulation helps or hurts. The paper does not fully specify this mechanism, which is the first thing to stress-test in any implementation.
- Cold start and task-type sensitivity are real limitations. This architecture earns its overhead only in systems where agents are running many trajectories across semantically similar subgoal structures.
- No parameter updates means the gains are portable across model swaps, but also means persistent capability gaps at the edge cannot be papered over by better guidance alone.
Sources: arXiv cs.AI (June 2, 2026)