The incident did not begin with an alarm headline.
It began with a shape.
On the Access Logs Flow smart panel, the minute chart suddenly stopped looking like normal internet weather and started looking like a coordinated surge. The selected minute on the production server was March 27, 2026 20:05 Asia/Jerusalem. That bucket contained:
- 4,584 access-log rows
- 4,544 distinct IPs
- 2,240 distinct URLs
That is not a bot dribbling at one endpoint. That is a distributed burst.
What mattered next was not just noticing the anomaly. What mattered was turning it into a disciplined investigation:
- verify the spike in the production log database,
- extract a focused research list,
- use the Syndu plugin inside Codex to investigate the cohort,
- preserve what was learned as governed memory.
## 1. The anomaly was real
The first thing I did was stop treating the chart as an impressionistic signal and query the production server directly.
The matching minute on the server is 2026-03-27 17:05:00 UTC, which is 2026-03-27 20:05:00 in Israel. The rows matched the dashboard exactly.
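That alignment is easy to double-check with Python's standard-library zoneinfo; this is just a sanity check of the timezone math, not the production query itself:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Confirm that 2026-03-27 17:05 UTC is 20:05 in Asia/Jerusalem
# (Israel was already on IDT, UTC+3, by that date).
utc_minute = datetime(2026, 3, 27, 17, 5, tzinfo=ZoneInfo("UTC"))
local_minute = utc_minute.astimezone(ZoneInfo("Asia/Jerusalem"))

print(local_minute.strftime("%Y-%m-%d %H:%M %Z"))  # 2026-03-27 20:05 IDT
```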
The six-minute burst, with the minute leading into it, looked like this:

- 19:59 local: 2,756 rows
- 20:00 local: 3,664 rows
- 20:01 local: 2,610 rows
- 20:02 local: 3,955 rows
- 20:03 local: 4,203 rows
- 20:04 local: 4,869 rows
- 20:05 local: 4,584 rows
Across the 20:00-20:05 local window, the server absorbed:
- 23,885 rows
- 23,139 distinct IPs
- 9,212 distinct URLs
So the panel was not stale, and it was not exaggerating. The system really was crossing through a concentrated burst of internet traffic.
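The per-minute counts and the window aggregate above can be cross-checked against each other in a couple of lines; the numbers are the ones reported by the server query:

```python
# Per-minute row counts for the 20:00-20:05 local window.
rows_per_minute = {
    "20:00": 3664,
    "20:01": 2610,
    "20:02": 3955,
    "20:03": 4203,
    "20:04": 4869,
    "20:05": 4584,
}

window_total = sum(rows_per_minute.values())
print(window_total)  # 23885, matching the server-side window aggregate
```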
## 2. The raw log shape mattered more than the raw volume
Once the window was pinned down, I pulled the request mix from the exact minute.
The result was revealing:
- methods: GET 4,583, HEAD 1
- status codes: 499 4,413, 504 88, 301 83
That distribution says a lot.
499 means the client closed the request before the server finished responding. In a normal user-facing flow, you do not expect thousands of distributed client aborts in one minute across thousands of IPs. At the same time, the presence of 504 and 502 in the wider burst window means the platform was not just being touched; some of those touches were stressing the public report surface enough to spill into upstream timeout behavior.
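Pulling a method/status tally like the one above out of parsed log rows is a one-pass Counter job. The sample rows below are placeholders standing in for the real parsed records:

```python
from collections import Counter

# Hypothetical (method, status) pairs standing in for parsed
# access-log records from the spike minute.
rows = [("GET", 499)] * 10 + [("GET", 504)] + [("HEAD", 301)]

methods = Counter(method for method, _ in rows)
statuses = Counter(status for _, status in rows)

print(methods.most_common())   # [('GET', 11), ('HEAD', 1)]
print(statuses.most_common())  # 499 dominates, as in the real minute
```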
The top URL list made the shape clearer still. It was not concentrated into one login route or one asset. It was spread across:
- the root path /
- IP report pages
- ASN report pages
- subnet report pages
- search-style report queries
In other words, this was a public-surface sweep, not an authenticated application action.
## 3. I built a research list instead of investigating all 4,544 IPs
The right response to a distributed anomaly is not to pretend every source is equally interesting.
So I built a research list from the IPs that repeated inside the spike window and looked operationally worth understanding. I also excluded my own browser IP from the cohort so I would not contaminate the investigation with operator traffic.
The focused list was:
| IP | Repeats in burst window | Why it made the list |
|---|---|---|
| 148.153.56.60 | 11 | Highest repeater, mixed 301/499, multiple URL touches |
| 43.134.235.152 | 7 | Repeated across multiple public report pages |
| 138.197.105.13 | 6 | Stable repeater, single-root pressure |
| 159.223.36.148 | 6 | Stable repeater in same surge class |
| 188.166.91.161 | 6 | Stable repeater in same surge class |
| 47.79.196.145 | 4 | Multi-URL repeater with redirects and timeout pressure |
| 47.79.218.136 | 4 | Multi-URL repeater in the same network family |
| 47.79.219.197 | 4 | Multi-URL repeater in the same network family |
This is a good example of what a serious telemetry workflow needs to do. It should narrow the incident from a loud field of anonymous requests into a small list of research subjects that a human or an agent can reason about.
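The narrowing step itself is simple: count repeats per source IP inside the burst window, drop the operator's own address, and keep only the repeaters. The sample data and the OPERATOR_IP value below are illustrative placeholders, not real log rows:

```python
from collections import Counter

OPERATOR_IP = "203.0.113.7"  # placeholder for the excluded browser IP

burst_ips = [
    "148.153.56.60", "148.153.56.60", "43.134.235.152",
    "203.0.113.7", "138.197.105.13", "148.153.56.60",
]

# Count repeats, excluding operator traffic so it cannot contaminate the cohort.
repeats = Counter(ip for ip in burst_ips if ip != OPERATOR_IP)

# Keep only IPs that actually repeated, ordered by how often they came back.
research_list = [(ip, n) for ip, n in repeats.most_common() if n > 1]
print(research_list)  # [('148.153.56.60', 3)]
```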
## 4. I used the Syndu plugin as an operator surface, not as a demo
This is where the product itself becomes interesting.
I did not switch to a notebook or hand-roll a separate enrichment path. I used the installed Syndu plugin inside Codex and investigated the cohort through the hosted Syndu MCP server.
That meant the investigation ran through the same product contract we are asking outside operators to trust:
- get_workspace_memory_policy
- resolve_subject
- get_report_snapshot
- get_risk_api_result
- explain_entity_risk
- lookup_subject_memory
- store_outcome_event
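In rough terms, the per-subject loop runs those tools in order. The sketch below only assumes the tool names; call_tool is a hypothetical stand-in for however your MCP client dispatches calls, not a real Syndu API:

```python
TOOL_SEQUENCE = [
    "get_workspace_memory_policy",
    "resolve_subject",
    "get_report_snapshot",
    "get_risk_api_result",
    "explain_entity_risk",
    "lookup_subject_memory",
    "store_outcome_event",
]

def investigate(ip, call_tool):
    """Run the tool contract in order for one subject.

    call_tool is a hypothetical dispatcher: (tool_name, arguments) -> result.
    """
    results = {}
    for tool in TOOL_SEQUENCE:
        results[tool] = call_tool(tool, {"subject": ip})
    return results

# Usage with a stub dispatcher, just to show the shape of the loop:
trace = investigate("148.153.56.60", lambda tool, args: {"tool": tool, **args})
print(list(trace))  # the seven tool names, in contract order
```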
One small operational wrinkle surfaced immediately: the current shell no longer had an active live Syndu credential, so I created a fresh workspace-bound MCP credential on the server, bound it into the shell, verified the plugin path, and revoked the temporary credential after the investigation was complete.
That is not a flaw in the story. It is part of the story. A real operator surface includes authentication hygiene and credential lifecycle, not just the pretty middle step where the AI tells you something clever.
## 5. What the plugin could say, and what it could not say
This is an important boundary.
The production AccessLog window gave me the exact event. The Syndu plugin gave me the contextual diagnosis.
Those are not identical things.
For this cohort, the current published report surfaces did not directly corroborate the exact March 27, 2026 minute spike for the investigated IPs. That means Syndu's contribution here was not “yes, this exact minute is already in the report universe.” Its contribution was:
- current contextual risk
- report-backed geography and ownership context
- historical behavioral baselines
- cluster structure across orgs, ASNs, and subnets
- prior governed memory
That distinction matters because it keeps the analysis honest.
The anomaly was real because the server logs said it was real. The plugin was useful because it turned that anomaly into an explainable cohort.
## 6. The cohort split into four recognizable groups

### A. One low-signal CDS Global node
148.153.56.60 resolved to:
- AS63199, CDS Global Cloud Co., Ltd
- 148.153.56.0/24
- Los Angeles, California, United States
Its overall contextual score was only 30, with a direct IP baseline of 0.
That is exactly the kind of IP that becomes dangerous if you overread it. It was noisy in the burst, but Syndu did not support a strong durable claim against it. This one stayed in the monitor_only bucket.
### B. One stronger Tencent / Aceville node
43.134.235.152 was more interesting.
It resolved to:
- AS132203, Tencent Building, Kejizhongyi Avenue
- Aceville Pte. Ltd
- 43.134.235.0/24
- Singapore
Its overall contextual score was 67, with a direct IP baseline of 27. More importantly, Syndu exposed a concrete bot label on the entity.
That does not justify a permanent network block on its own, but it is enough to support a stronger operational stance than simple observation. This one landed at challenge_if_recurs.
### C. A DigitalOcean trio that looked automated, but not directly high-risk
Three IPs clustered together cleanly:
- 138.197.105.13
- 159.223.36.148
- 188.166.91.161
All three resolved to AS14061 and DigitalOcean infrastructure, across New Jersey, Singapore, and Amsterdam respectively.
Their pattern inside Syndu was surprisingly coherent:
- overall contextual score: 58
- direct IP baseline: 0
- substantial historical activity footprints
- very regular automation-like cadence
That is a useful outcome. The plugin did not tell me these were harmless. It told me they looked like recurring infrastructure automation or persistent low-grade probing without enough direct behavioral pressure to justify a heavy-handed block.
So all three were kept in monitor_only.
### D. An Alibaba Singapore trio with medium contextual risk
The final cluster was:
- 47.79.196.145
- 47.79.218.136
- 47.79.219.197
These resolved into the same family:
- AS45102, Alibaba.com LLC
- Alibaba (US) Technology Co., Ltd.
- Singapore
Their overall contextual scores were 57, 59, and 61.
Their direct IP baselines were all 0.
What made them operationally meaningful was not a single terrifying score. It was the cluster logic:
- same provider family
- same geography
- neighboring Alibaba-controlled /24 surfaces
- recurring presence in the anomaly cohort
That is enough to justify a step-up posture without leaping to a broad provider-wide block. Each of these landed in challenge_if_recurs.
## 7. The plugin turned the investigation into memory
Before writing anything back, I asked the server for the workspace memory policy:
- stored_outcomes = true
- hive_mind = true
There was no prior governed memory for any of the eight subjects.
So the plugin wrote the outcome set fresh:
- 148.153.56.60 -> monitor_only
- 43.134.235.152 -> challenge_if_recurs
- 138.197.105.13 -> monitor_only
- 159.223.36.148 -> monitor_only
- 188.166.91.161 -> monitor_only
- 47.79.196.145 -> challenge_if_recurs
- 47.79.218.136 -> challenge_if_recurs
- 47.79.219.197 -> challenge_if_recurs
The plugin wrote eight outcome events onto the public ipaddress subjects, all as clear-share community findings, because these were network indicators rather than private customer identities.
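The final posture split, as written back, can be captured as a plain mapping; this just restates the eight outcomes above in data form:

```python
from collections import Counter

# The eight governed outcomes written onto the public ipaddress subjects.
outcomes = {
    "148.153.56.60": "monitor_only",
    "43.134.235.152": "challenge_if_recurs",
    "138.197.105.13": "monitor_only",
    "159.223.36.148": "monitor_only",
    "188.166.91.161": "monitor_only",
    "47.79.196.145": "challenge_if_recurs",
    "47.79.218.136": "challenge_if_recurs",
    "47.79.219.197": "challenge_if_recurs",
}

print(Counter(outcomes.values()))  # 4 monitor_only, 4 challenge_if_recurs
```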
That is one of the most important parts of the whole exercise.
The value of the plugin is not just that it can explain the cohort once. The value is that the next agent does not have to start from zero.
## 8. What this incident says about the product
This was a good test because it was inconvenient.
The event did not arrive as a clean narrative. The traffic was heavily distributed. The raw logs and the current report surfaces were not saying the exact same thing. The shell credential needed to be refreshed. And the plugin writeback schema even forced one retry before the outcome events landed cleanly.
That is why I trust the result more than I would trust a polished demo.
The operator loop held:
- anomaly detected in the UI
- server-side minute verified directly
- research list extracted from the raw event field
- cohort investigated through the Syndu plugin
- context preserved as governed memory
- temporary credential revoked afterward
That is the shape I want Syndu to keep growing into.
Not just a score. Not just a dashboard. Not just a plugin tab.
A system that can:
- notice,
- narrow,
- explain,
- remember,
- and leave the next investigation in a better state than the last one.
## 9. The practical lesson from this run
If I had stopped at the anomaly chart, I would have known that traffic was strange.
If I had stopped at the raw logs, I would have known the burst was distributed and public-surface oriented.
If I had stopped at the plugin alone, I would have had contextual risk without the exact event truth.
The useful answer came from putting the layers together:
- production telemetry established the event,
- the research list established what was worth caring about,
- the Syndu plugin established the context,
- memory writeback established what future agents should inherit.
That is a much better story than “there was a spike and some IPs looked suspicious.”
It is the story of how an observatory becomes an operator surface.