There is a lazy way to read Syndu.
You can look at the plugin, the MCP surface, or the Risk API and decide that the system is just another way to ask for a score.
That is not what is happening.
The score is the front edge of a much larger data product.
Underneath it is a disciplined lineage:
- raw unsolicited traffic,
- enriched fact rows,
- annotation hits,
- IP-level truth tables,
- eight report cubes,
- a contextual risk vector,
- and finally a score that can be explained, linked, and reused.
This post is the data overview of that system.
It is meant to show how the score is actually made, what shapes the data takes on its way there, and why the output belongs in the category of operated analytical intelligence rather than in the category of a thin plugin wrapper.
1. The score sits on top of a real working dataset
The first thing to understand is that Syndu does not start with a score.
It starts with traffic we actually observe, then transforms that traffic into increasingly durable analytical shapes.
In the current retained working window on the aggregation side, the active dataset spans:
- 17,321,851 raw access-log records,
- 17,321,851 enriched fact rows,
- 28,204,735 annotation rows,
- across 2026-03-15 22:00:00 UTC through 2026-03-29 21:59:59 UTC.
That recent working band is only the live transformation window, not the whole published analytical surface.
The published report universe already extends far beyond that immediate retention window and currently includes:
- 8,668,305 IP report totals,
- 8,668,303 IP risk totals,
- 3,798,353 subnet snapshots,
- 1,089,621 subnet risk totals,
- 139,456 ISP snapshots,
- 36,178 ISP risk totals,
- 26,113 ASN snapshots,
- 26,113 ASN risk totals,
- 54,481 organization report totals,
- 41,407 city report totals,
- 3,254 region report totals,
- 209 country report totals.
At the IP layer alone, the currently published totals account for 67,165,480 observed hits.
That matters because it is the difference between a score that exists in a vacuum and a score that is backed by a broad, cumulative, inspectable report surface.
2. Four data shapes define the lineage
The easiest way to understand the system is to stop thinking in terms of pages and start thinking in terms of record shapes.
Syndu moves through four major shapes before it becomes a contextual score.
Shape 1: the raw access record
The raw layer is simple on purpose.
It captures the unsolicited event as it arrived:
{
"timestamp": "2026-03-27T17:05:12Z",
"ip": "198.51.100.24",
"method": "GET",
"url": "/report_asn/asn/17012/",
"status": 200,
"response_size": 18234,
"referer": "",
"user_agent": "Mozilla/5.0 ..."
}
At this point, the row is still only a request observation.
It is useful, but it has not yet been turned into context.
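To make the raw shape concrete, here is a minimal sketch of how a combined-format access-log line could be parsed into the record shown above. The regex and field layout are assumptions for illustration, not Syndu's actual ingest code.

```python
import json
import re

# Hypothetical parser for a combined-format access-log line.
# The pattern and field names are illustrative assumptions.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_access_line(line: str) -> dict:
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError("unparseable access-log line")
    return {
        "timestamp": m.group("ts"),
        "ip": m.group("ip"),
        "method": m.group("method"),
        "url": m.group("url"),
        "status": int(m.group("status")),
        "response_size": int(m.group("size")),
        "referer": m.group("referer"),
        "user_agent": m.group("ua"),
    }

line = ('198.51.100.24 - - [27/Mar/2026:17:05:12 +0000] '
        '"GET /report_asn/asn/17012/ HTTP/1.1" 200 18234 "" "Mozilla/5.0 ..."')
record = parse_access_line(line)
print(json.dumps(record, indent=2))
```

The point of the shape is exactly this simplicity: a flat observation with no interpretation attached yet.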
Shape 2: the enriched fact row
The fact layer is the first major transformation.
AccessEventFact preserves the event, but turns it into a denormalized analytical row with the network coordinates needed downstream:
{
"access_log_id": 123456789,
"ts": "2026-03-27T17:05:12Z",
"ip_text": "198.51.100.24",
"ip_subnet": "198.51.100.0/24",
"ip_country": "US",
"ip_region": "Virginia",
"ip_city": "Ashburn",
"ip_isp": "Example Transit",
"ip_org": "Example Hosting LLC",
"asn": 64500,
"as_org_name": "Example Hosting LLC",
"method": "GET",
"url": "/report_asn/asn/17012/",
"status_code": 200,
"is_bot": false
}
This is the row shape that makes rollups possible.
Once the event has a subnet, ISP, ASN, organization, and geography attached to it, it can begin contributing to multiple analytical boundaries at once.
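The multi-boundary contribution can be sketched directly: because the fact row carries every network coordinate, a single event increments totals at several analytical boundaries at once. The rollup keys below are illustrative assumptions; field names follow the fact row shown above.

```python
from collections import Counter

# One enriched fact row, trimmed to the coordinates used for rollups.
fact = {
    "ip_text": "198.51.100.24",
    "ip_subnet": "198.51.100.0/24",
    "ip_country": "US",
    "ip_city": "Ashburn",
    "asn": 64500,
}

# Illustrative per-boundary counters; the real rollup tables are richer.
rollups = {
    "ip": Counter(), "subnet": Counter(), "country": Counter(),
    "city": Counter(), "asn": Counter(),
}

def contribute(fact: dict) -> None:
    # A single event feeds five boundaries simultaneously.
    rollups["ip"][fact["ip_text"]] += 1
    rollups["subnet"][fact["ip_subnet"]] += 1
    rollups["country"][fact["ip_country"]] += 1
    rollups["city"][fact["ip_city"]] += 1
    rollups["asn"][fact["asn"]] += 1

contribute(fact)
print(rollups["subnet"]["198.51.100.0/24"])  # → 1
```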
Shape 3: the annotation hit
The annotation layer is where the event stops being motion and starts becoming evidence.
AnnotatedAccessEvent keeps the event coordinates, but adds behavioral interpretation:
{
"access_event_id": 123456789,
"ts": "2026-03-27T17:05:12Z",
"ip_text": "198.51.100.24",
"asn": 64500,
"annotator_code": "credential_probe",
"label": "credential-bruteforce-shape",
"severity": "high",
"confidence": 92,
"summary": "Request stream matches repeated credential probing behavior.",
"tags": ["auth", "bruteforce", "automation"]
}
This is the layer that gives Syndu explainability.
The system is no longer saying only "this IP looks risky." It is preserving the specific signal families that caused the risk to accumulate.
Shape 4: the report row and the contextual vector
The report layer turns event evidence into durable analytical truth.
At the IP boundary, that means totals like:
{
"ip_text": "198.51.100.24",
"total_hits": 913,
"total_errors": 207,
"total_annotations": 144,
"distinct_annotators": 6,
"distinct_labels": 19,
"risk_score": 84,
"risk_level": "high",
"risk_components": {
"raw_total": 5402.0,
"formula": "score=100*(1-exp(-raw/K))",
"top_contributors": [
{"code": "credential_probe", "raw": 2201.0},
{"code": "scanner", "raw": 1380.0}
]
}
}
And at the contextual layer, the system resolves a vector of matched report dimensions:
{
"kind": "ipaddress",
"overall_score": 72,
"dimensions": [
{"code": "country", "score": 48, "matched": true},
{"code": "region", "score": 55, "matched": true},
{"code": "city", "score": 61, "matched": true},
{"code": "asn", "score": 70, "matched": true},
{"code": "org", "score": 77, "matched": true},
{"code": "isp", "score": 62, "matched": true},
{"code": "subnet", "score": 80, "matched": true},
{"code": "ipaddress", "score": 84, "matched": true}
],
"behavioral_baseline": {
"kind": "ipaddress",
"score": 84
}
}
That vector is what ultimately collapses into the contextual score.
3. The pipeline is a transformation chain, not a page render
The operational picture is simple when viewed through the data itself.
One node collects and presents the live web surface. Another node assembles the analytical corpus, scores it, publishes the rollups, and serves the memory and scoring contracts from those published results.
What matters in this overview is not the topology. What matters is the transformation order:
- raw access records land,
- privacy boundaries strip out private control-plane traffic,
- closed windows are ingested into enriched facts,
- annotators write behavioral signal rows,
- IP traffic, annotator, risk, and report tables are built,
- higher-order cubes roll upward from that IP truth,
- the contextual score resolves the relevant dimensions from those cubes.
That is exactly why the Luna main chain matters.
Not because it is an infrastructure story, but because it is the contract that keeps the transformations ordered and repeatable:
- ingest,
- enrich,
- annotate,
- roll up,
- publish,
- sync the published truth back out.
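The ordering contract above can be sketched as a trivially strict runner: each stage executes only after the previous one has completed, so published truth is always derived from fully enriched and annotated data. The runner itself is hypothetical; only the stage names come from the chain.

```python
from typing import Callable, List, Tuple

def run_chain(stages: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run stages strictly in order; any failure halts before later stages."""
    completed = []
    for name, stage in stages:
        stage()
        completed.append(name)
    return completed

# Stage bodies are stubs here; the names follow the Luna main chain.
chain = [(name, lambda: None) for name in
         ["ingest", "enrich", "annotate", "roll_up", "publish", "sync"]]
print(run_chain(chain))
```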
In other words, the contextual score is not computed directly on raw browsing tables.
It is computed on top of a published analytical universe that has already been normalized, annotated, rolled up, and versioned.
4. The IP layer is the root of the report universe
The eight report cubes are not independent product lines.
They are eight analytical boundaries built from the same transformed evidence.
Those boundaries are:
- IP address
- subnet
- ISP
- ASN
- organization
- city
- region
- country
The IP layer is the root.
That is where the event stream first becomes durable behavior:
- traffic totals,
- annotation totals,
- risk totals,
- and report totals.
From there, higher-order cubes inherit the same evidence in broader forms.
For example:
- city traffic is built from per-IP daily traffic plus IP geography,
- ISP snapshots are built from IP totals and IP risk rows,
- subnet snapshots aggregate subnet traffic and hit-weighted subnet risk,
- organization, region, and country reports fold IP evidence into broader analytical bodies while preserving risk components.
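The "hit-weighted" idea in the subnet case can be sketched as follows: the subnet-level risk is the average of its member IPs' risk scores, weighted by each IP's hit count, so heavily-trafficked IPs dominate the rollup. The exact weighting Syndu uses may differ; this shows the principle.

```python
# Illustrative IP report rows inside one subnet (field names follow
# the IP report shape shown earlier; the values are made up).
ip_rows = [
    {"ip": "198.51.100.24", "total_hits": 913, "risk_score": 84},
    {"ip": "198.51.100.77", "total_hits": 40,  "risk_score": 12},
]

def hit_weighted_risk(rows: list) -> float:
    """Average member risk scores, weighted by observed hit volume."""
    total_hits = sum(r["total_hits"] for r in rows)
    if total_hits == 0:
        return 0.0
    return sum(r["risk_score"] * r["total_hits"] for r in rows) / total_hits

print(round(hit_weighted_risk(ip_rows), 1))  # → 81.0
```

The low-traffic, low-risk neighbor barely moves the subnet score, which is the behavior you want when one noisy IP dominates a block.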
So when Syndu says it has eight dimensions, it is not gluing together unrelated data feeds.
It is re-expressing one transformed event universe across eight legitimate report boundaries.
5. Risk is made from weighted evidence, not from hand-waving
At every report level, the risk model follows the same principle:
behavioral evidence is accumulated first, then collapsed into a 0-100 risk score.
The annotation rollups already preserve a weighted total.
Across the hierarchy, that weighted total follows the same basic structure:
weighted_total = total * severity_score * code_weight
That means the model does not treat every signal equally.
A high-severity credential attack family should move the raw evidence more than a low-severity nuisance pattern, and a strategically important annotator family should carry more weight than a generic background label.
Once those weighted totals are accumulated, the score is not a hand-tuned bucket. It is passed through a smooth saturating curve:
score = 100 * (1 - exp(-raw / K))
with:
- K = 2500,
- "medium" beginning at 35,
- "high" beginning at 70.
That choice matters.
It means the model behaves like a real evidence curve:
- small evidence stays small,
- repeated aligned evidence escalates clearly,
- and the score saturates instead of growing without bound.
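The whole collapse can be sketched with the stated constants (K = 2500, "medium" at 35, "high" at 70). The severity and code weights below are illustrative placeholders, not Syndu's actual weight tables.

```python
import math

K = 2500.0  # stated saturation constant

def weighted_total(total: int, severity_score: float, code_weight: float) -> float:
    """weighted_total = total * severity_score * code_weight (as in the text)."""
    return total * severity_score * code_weight

def risk_score(raw: float) -> int:
    """score = 100 * (1 - exp(-raw / K)): a smooth saturating curve."""
    return round(100 * (1 - math.exp(-raw / K)))

def risk_level(score: int) -> str:
    # Thresholds from the text: medium at 35, high at 70.
    if score >= 70:
        return "high"
    if score >= 35:
        return "medium"
    return "low"

# Hypothetical annotator family: 144 hits, high severity, strategic weight.
raw = weighted_total(total=144, severity_score=3.0, code_weight=12.5)  # 5400.0
score = risk_score(raw)
print(score, risk_level(score))  # → 88 high
```

Note how the exponential keeps the output bounded: doubling the raw evidence from here would add only a few points, not double the score.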
The result is a score that can be inspected through its components.
Each risk row still carries the structure of how it was formed:
- raw total,
- model version,
- and top contributors.
That is the opposite of a mystery number.
6. The contextual score is a vector collapse, not a single lookup
The contextual score is where Syndu stops being only a directory system and becomes a contextual model.
The scorer does not guess at arbitrary neighbors.
It resolves the actual context for the queried entity and builds a dimension list from the published report hierarchy.
For an IP address, that can legitimately include all eight dimensions.
For a subnet, it can include subnet plus the higher layers above it.
For a country, it should include only the country dimension.
This discipline is explicit in the scorer:
- only dimensions at or above the queried boundary are eligible,
- only matched in-scope dimensions contribute,
- the default contextual score is the average of those matched dimensions,
- and an optional weighted mode can bias the collapse if a caller requests it.
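The default collapse described above can be sketched directly: average the matched in-scope dimensions, with an optional weight map for the weighted mode. The dimension shape follows the contextual vector shown earlier; the weighting API here is an assumption.

```python
from typing import List, Optional

def collapse(dimensions: List[dict], weights: Optional[dict] = None) -> float:
    """Average matched dimension scores; optionally bias with per-code weights."""
    matched = [d for d in dimensions if d["matched"]]
    if not matched:
        return 0.0
    if weights is None:
        # Default mode: plain average of matched dimensions.
        return sum(d["score"] for d in matched) / len(matched)
    # Weighted mode: unnamed codes default to weight 1.0.
    total_w = sum(weights.get(d["code"], 1.0) for d in matched)
    return sum(d["score"] * weights.get(d["code"], 1.0) for d in matched) / total_w

# A trimmed version of the contextual vector shown earlier.
dims = [
    {"code": "country", "score": 48, "matched": True},
    {"code": "asn", "score": 70, "matched": True},
    {"code": "subnet", "score": 80, "matched": True},
    {"code": "ipaddress", "score": 84, "matched": True},
]
print(collapse(dims))                               # → 70.5
print(collapse(dims, weights={"ipaddress": 3.0}))   # → 75.0, biased toward the queried entity
```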
That last point is crucial.
Syndu is not cheating by pretending every query has eight equally valid dimensions.
It respects hierarchy.
That keeps the contextual score honest.
The scorer also preserves the behavioral baseline of the queried entity itself. So the contextual score is never just “neighbor mood.” It stays anchored in the thing actually being queried.
7. Why the sample size matters
This whole structure would be much less convincing if the pipeline were tiny.
It is not tiny.
The current recent working band alone gives the scorer:
- more than 17.3 million raw events,
- more than 17.3 million enriched fact rows,
- more than 28.2 million annotation hits.
And the currently published surface gives the contextual model:
- more than 8.6 million scored IP boundaries,
- more than 3.7 million subnet snapshots,
- more than 139 thousand ISP boundaries,
- more than 26 thousand ASN boundaries,
- more than 54 thousand organization boundaries,
- more than 41 thousand city boundaries,
- more than 3 thousand region boundaries,
- and 209 country boundaries.
That does not make the model magically perfect.
But it does mean the score is not being improvised from a shallow layer.
It is being collapsed from a report universe with enough density to behave like a serious analytical product.
8. Why this should not be mistaken for a “dump plugin”
This is the category mistake I most want to prevent.
The plugin is the access surface. The MCP server is the operating contract. The Risk API is the application interface.
None of those are the data product by themselves.
The actual product center is the report-backed contextual intelligence layer:
- transformed from raw traffic,
- enriched into stable fact rows,
- annotated into explainable signals,
- rolled into eight report cubes,
- collapsed into a contextual vector,
- and then made reusable through scoring and memory surfaces.
That is why the installable surfaces matter so much less than the lineage beneath them.
Without the lineage, the plugin would indeed be just a wrapper.
With the lineage, it becomes a doorway into a much deeper score universe.
9. Why the illustrations look the way they do
I wanted the art for this post to behave like symbolic documentation.
That is why the images use:
- one central sigil for the score nucleus,
- hive nodes for the analytical cubes,
- layered particles for traffic and evidence,
- and two-way motion to show that Syndu is not only accumulating observations, but also publishing reusable context back outward.
The animation language matters here.
One-way arrows would make the system feel like a pipeline that disappears into storage.
Two-way flows make the right point:
- observation moves inward,
- published context moves outward,
- and the score exists in the middle as a reusable compression of the report universe.
That is the right symbolic shape for Syndu.
10. The score is the smallest readable output of a larger analytical machine
That is the real summary.
Syndu does not begin with a score. It earns one.
It earns it by moving through a chain of increasingly meaningful data shapes:
- raw event,
- enriched fact,
- annotated signal,
- report truth,
- contextual vector,
- collapsed score.
That is what makes the score worth using.
And that is why Syndu should be understood as a shared analytical intelligence layer with installable operating surfaces, not as an installable shell looking for substance underneath it.