There is a lazy way to read Syndu.
You can look at the plugin, the MCP surface, or the Risk API and decide that the system is just another way to ask for a score.
That is not what is happening.
The score is the front edge of a much larger data product.
Underneath it is a disciplined lineage:
- raw unsolicited traffic,
- enriched fact rows,
- annotation hits,
- IP-level truth tables,
- eight report cubes,
- a contextual risk vector,
- and finally a score that can be explained, linked, and reused.
This post is the data overview of that system.
It is meant to show how the score is actually made, what shapes the data takes on its way there, and why the output belongs in the category of operated analytical intelligence rather than in the category of a thin plugin wrapper.
1. The score sits on top of a real working dataset
The first thing to understand is that Syndu does not start with a score.
It starts with traffic we actually observe, then transforms that traffic into increasingly durable analytical shapes.
In the current retained working window on the aggregation side, the active dataset spans:
- 17,321,851 raw access-log records,
- 17,321,851 enriched fact rows,
- 28,204,735 annotation rows,
- across 2026-03-15 22:00:00 UTC through 2026-03-29 21:59:59 UTC.
That recent working band is only the live transformation window, not the whole published analytical surface.
The published report universe already extends far beyond that immediate retention window and currently includes:
- 8,668,305 IP report totals,
- 8,668,303 IP risk totals,
- 3,798,353 subnet snapshots,
- 1,089,621 subnet risk totals,
- 139,456 ISP snapshots,
- 36,178 ISP risk totals,
- 26,113 ASN snapshots,
- 26,113 ASN risk totals,
- 54,481 organization report totals,
- 41,407 city report totals,
- 3,254 region report totals,
- 209 country report totals.
At the IP layer alone, the currently published totals account for 67,165,480 observed hits.
That matters because it is the difference between a score that exists in a vacuum and a score that is backed by a broad, cumulative, inspectable report surface.
2. Four data shapes define the lineage
The easiest way to understand the system is to stop thinking in terms of pages and start thinking in terms of record shapes.
Syndu moves through four major shapes before it becomes a contextual score.
Shape 1: the raw access record
The raw layer is simple on purpose.
It captures the unsolicited event as it arrived:
{
"timestamp": "2026-03-27T17:05:12Z",
"ip": "198.51.100.24",
"method": "GET",
"url": "/report_asn/asn/17012/",
"status": 200,
"response_size": 18234,
"referer": "",
"user_agent": "Mozilla/5.0 ..."
}
At this point, the row is still only a request observation.
It is useful, but it has not yet been turned into context.
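To make the raw shape concrete, here is a minimal sketch of how a combined-format access-log line could be parsed into the record shown above. The regex and field layout are assumptions for illustration, not Syndu's actual ingest code.

```python
import json
import re

# Hypothetical parser for a combined-format access-log line.
# The pattern and field names are illustrative assumptions.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<size>\d+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def parse_access_line(line: str) -> dict:
    m = LOG_PATTERN.match(line)
    if m is None:
        raise ValueError("unparseable access-log line")
    return {
        "timestamp": m.group("ts"),
        "ip": m.group("ip"),
        "method": m.group("method"),
        "url": m.group("url"),
        "status": int(m.group("status")),
        "response_size": int(m.group("size")),
        "referer": m.group("referer"),
        "user_agent": m.group("ua"),
    }

line = ('198.51.100.24 - - [27/Mar/2026:17:05:12 +0000] '
        '"GET /report_asn/asn/17012/ HTTP/1.1" 200 18234 "" "Mozilla/5.0 ..."')
record = parse_access_line(line)
print(json.dumps(record, indent=2))
```

The point of the shape is exactly this simplicity: a flat observation with no interpretation attached yet.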
Shape 2: the enriched fact row
The fact layer is the first major transformation.
AccessEventFact preserves the event, but turns it into a denormalized analytical row with the network coordinates needed downstream:
{
"access_log_id": 123456789,
"ts": "2026-03-27T17:05:12Z",
"ip_text": "198.51.100.24",
"ip_subnet": "198.51.100.0/24",
"ip_country": "US",
"ip_region": "Virginia",
"ip_city": "Ashburn",
"ip_isp": "Example Transit",
"ip_org": "Example Hosting LLC",
"asn": 64500,
"as_org_name": "Example Hosting LLC",
"method": "GET",
"url": "/report_asn/asn/17012/",
"status_code": 200,
"is_bot": false
}
This is the row shape that makes rollups possible.
Once the event has a subnet, ISP, ASN, organization, and geography attached to it, it can begin contributing to multiple analytical boundaries at once.
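The multi-boundary contribution can be sketched directly: because the fact row carries every network coordinate, a single event increments totals at several analytical boundaries at once. The rollup keys below are illustrative assumptions; field names follow the fact row shown above.

```python
from collections import Counter

# One enriched fact row, trimmed to the coordinates used for rollups.
fact = {
    "ip_text": "198.51.100.24",
    "ip_subnet": "198.51.100.0/24",
    "ip_country": "US",
    "ip_city": "Ashburn",
    "asn": 64500,
}

# Illustrative per-boundary counters; the real rollup tables are richer.
rollups = {
    "ip": Counter(), "subnet": Counter(), "country": Counter(),
    "city": Counter(), "asn": Counter(),
}

def contribute(fact: dict) -> None:
    # A single event feeds five boundaries simultaneously.
    rollups["ip"][fact["ip_text"]] += 1
    rollups["subnet"][fact["ip_subnet"]] += 1
    rollups["country"][fact["ip_country"]] += 1
    rollups["city"][fact["ip_city"]] += 1
    rollups["asn"][fact["asn"]] += 1

contribute(fact)
print(rollups["subnet"]["198.51.100.0/24"])  # → 1
```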
Shape 3: the annotation hit
The annotation layer is where the event stops being motion and starts becoming evidence.
AnnotatedAccessEvent keeps the event coordinates, but adds behavioral interpretation:
{
"access_event_id": 123456789,
"ts": "2026-03-27T17:05:12Z",
"ip_text": "198.51.100.24",
"asn": 64500,
"annotator_code": "credential_probe",
"label": "credential-bruteforce-shape",
"severity": "high",
"confidence": 92,
"summary": "Request stream matches repeated credential probing behavior.",
"tags": ["auth", "bruteforce", "automation"]
}
This is the layer that gives Syndu explainability.
The system is no longer saying only "this IP looks risky." It is preserving the specific signal families that caused the risk to accumulate.
Shape 4: the report row and the contextual vector
The report layer turns event evidence into durable analytical truth.
At the IP boundary, that means totals like:
{
"ip_text": "198.51.100.24",
"total_hits": 913,
"total_errors": 207,
"total_annotations": 144,
"distinct_annotators": 6,
"distinct_labels": 19,
"risk_score": 84,
"risk_level": "high",
"risk_components": {
"raw_total": 5402.0,
"formula": "score=100*(1-exp(-raw/K))",
"top_contributors": [
{"code": "credential_probe", "raw": 2201.0},
{"code": "scanner", "raw": 1380.0}
]
}
}
And at the contextual layer, the system resolves a vector of matched report dimensions:
{
"kind": "ipaddress",
"overall_score": 72,
"dimensions": [
{"code": "country", "score": 48, "matched": true},
{"code": "region", "score": 55, "matched": true},
{"code": "city", "score": 61, "matched": true},
{"code": "asn", "score": 70, "matched": true},
{"code": "org", "score": 77, "matched": true},
{"code": "isp", "score": 62, "matched": true},
{"code": "subnet", "score": 80, "matched": true},
{"code": "ipaddress", "score": 84, "matched": true}
],
"behavioral_baseline": {
"kind": "ipaddress",
"score": 84
}
}
That vector is what ultimately collapses into the contextual score.
3. The pipeline is a transformation chain, not a page render
The operational picture is simple when viewed through the data itself.
One node collects and presents the live web surface. Another node assembles the analytical corpus, scores it, publishes the rollups, and serves the memory and scoring contracts from those published results.
What matters in this overview is not the topology. What matters is the transformation order:
- raw access records land,
- privacy boundaries strip out private control-plane traffic,
- closed windows are ingested into enriched facts,
- annotators write behavioral signal rows,
- IP traffic, annotator, risk, and report tables are built,
- higher-order cubes roll upward from that IP truth,
- the contextual score resolves the relevant dimensions from those cubes.
That is exactly why the Luna main chain matters.
Not because it is an infrastructure story, but because it is the contract that keeps the transformations ordered and repeatable:
- ingest,
- enrich,
- annotate,
- roll up,
- publish,
- sync the published truth back out.
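The ordering contract above can be sketched as a trivially strict runner: each stage executes only after the previous one has completed, so published truth is always derived from fully enriched and annotated data. The runner itself is hypothetical; only the stage names come from the chain.

```python
from typing import Callable, List, Tuple

def run_chain(stages: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run stages strictly in order; any failure halts before later stages."""
    completed = []
    for name, stage in stages:
        stage()
        completed.append(name)
    return completed

# Stage bodies are stubs here; the names follow the Luna main chain.
chain = [(name, lambda: None) for name in
         ["ingest", "enrich", "annotate", "roll_up", "publish", "sync"]]
print(run_chain(chain))
```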
In other words, the contextual score is not computed directly on raw browsing tables.
It is computed on top of a published analytical universe that has already been normalized, annotated, rolled up, and versioned.
4. The IP layer is the root of the report universe
The eight report cubes are not independent product lines.
They are eight analytical boundaries built from the same transformed evidence.
Those boundaries are:
- IP address
- subnet
- ISP
- ASN
- organization
- city
- region
- country
The IP layer is the root.
That is where the event stream first becomes durable behavior:
- traffic totals,
- annotation totals,
- risk totals,
- and report totals.
From there, higher-order cubes inherit the same evidence in broader forms.
For example:
- city traffic is built from per-IP daily traffic plus IP geography,
- ISP snapshots are built from IP totals and IP risk rows,
- subnet snapshots aggregate subnet traffic and hit-weighted subnet risk,
- organization, region, and country reports fold IP evidence into broader analytical bodies while preserving risk components.
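The "hit-weighted" idea in the subnet case can be sketched as follows: the subnet-level risk is the average of its member IPs' risk scores, weighted by each IP's hit count, so heavily-trafficked IPs dominate the rollup. The exact weighting Syndu uses may differ; this shows the principle.

```python
# Illustrative IP report rows inside one subnet (field names follow
# the IP report shape shown earlier; the values are made up).
ip_rows = [
    {"ip": "198.51.100.24", "total_hits": 913, "risk_score": 84},
    {"ip": "198.51.100.77", "total_hits": 40,  "risk_score": 12},
]

def hit_weighted_risk(rows: list) -> float:
    """Average member risk scores, weighted by observed hit volume."""
    total_hits = sum(r["total_hits"] for r in rows)
    if total_hits == 0:
        return 0.0
    return sum(r["risk_score"] * r["total_hits"] for r in rows) / total_hits

print(round(hit_weighted_risk(ip_rows), 1))  # → 81.0
```

The low-traffic, low-risk neighbor barely moves the subnet score, which is the behavior you want when one noisy IP dominates a block.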
So when Syndu says it has eight dimensions, it is not gluing together unrelated data feeds.
It is re-expressing one transformed event universe across eight legitimate report boundaries.
5. Risk is made from weighted evidence, not from hand-waving
At every report level, the risk model follows the same principle:
behavioral evidence is accumulated first, then collapsed into a 0-100 risk score.
The annotation rollups already preserve a weighted total.
Across the hierarchy, that weighted total follows the same basic structure:
weighted_total = total * severity_score * code_weight
That means the model does not treat every signal equally.
A high-severity credential attack family should move the raw evidence more than a low-severity nuisance pattern, and a strategically important annotator family should carry more weight than a generic background label.
Once those weighted totals are accumulated, the score is not a hand-tuned bucket. It is passed through a smooth saturating curve:
score = 100 * (1 - exp(-raw / K))
with:
- K = 2500,
- "medium" beginning at 35,
- "high" beginning at 70.
That choice matters.
It means the model behaves like a real evidence curve:
- small evidence stays small,
- repeated aligned evidence escalates clearly,
- and the score saturates instead of growing without bound.
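The whole collapse can be sketched with the stated constants (K = 2500, "medium" at 35, "high" at 70). The severity and code weights below are illustrative placeholders, not Syndu's actual weight tables.

```python
import math

K = 2500.0  # stated saturation constant

def weighted_total(total: int, severity_score: float, code_weight: float) -> float:
    """weighted_total = total * severity_score * code_weight (as in the text)."""
    return total * severity_score * code_weight

def risk_score(raw: float) -> int:
    """score = 100 * (1 - exp(-raw / K)): a smooth saturating curve."""
    return round(100 * (1 - math.exp(-raw / K)))

def risk_level(score: int) -> str:
    # Thresholds from the text: medium at 35, high at 70.
    if score >= 70:
        return "high"
    if score >= 35:
        return "medium"
    return "low"

# Hypothetical annotator family: 144 hits, high severity, strategic weight.
raw = weighted_total(total=144, severity_score=3.0, code_weight=12.5)  # 5400.0
score = risk_score(raw)
print(score, risk_level(score))  # → 88 high
```

Note how the exponential keeps the output bounded: doubling the raw evidence from here would add only a few points, not double the score.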
The result is a score that can be inspected through its components.
Each risk row still carries the structure of how it was formed:
- raw total,
- model version,
- and top contributors.
That is the opposite of a mystery number.
6. The contextual score is a vector collapse, not a single lookup
The contextual score is where Syndu stops being only a directory system and becomes a contextual model.
The scorer does not guess at arbitrary neighbors.
It resolves the actual context for the queried entity and builds a dimension list from the published report hierarchy.
For an IP address, that can legitimately include all eight dimensions.
For a subnet, it can include subnet plus the higher layers above it.
For a country, it should include only the country dimension.
This discipline is explicit in the scorer:
- only dimensions at or above the queried boundary are eligible,
- only matched in-scope dimensions contribute,
- the default contextual score is the average of those matched dimensions,
- and an optional weighted mode can bias the collapse if a caller requests it.
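The default collapse described above can be sketched directly: average the matched in-scope dimensions, with an optional weight map for the weighted mode. The dimension shape follows the contextual vector shown earlier; the weighting API here is an assumption.

```python
from typing import List, Optional

def collapse(dimensions: List[dict], weights: Optional[dict] = None) -> float:
    """Average matched dimension scores; optionally bias with per-code weights."""
    matched = [d for d in dimensions if d["matched"]]
    if not matched:
        return 0.0
    if weights is None:
        # Default mode: plain average of matched dimensions.
        return sum(d["score"] for d in matched) / len(matched)
    # Weighted mode: unnamed codes default to weight 1.0.
    total_w = sum(weights.get(d["code"], 1.0) for d in matched)
    return sum(d["score"] * weights.get(d["code"], 1.0) for d in matched) / total_w

# A trimmed version of the contextual vector shown earlier.
dims = [
    {"code": "country", "score": 48, "matched": True},
    {"code": "asn", "score": 70, "matched": True},
    {"code": "subnet", "score": 80, "matched": True},
    {"code": "ipaddress", "score": 84, "matched": True},
]
print(collapse(dims))                               # → 70.5
print(collapse(dims, weights={"ipaddress": 3.0}))   # → 75.0, biased toward the queried entity
```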
That last point is crucial.
Syndu is not cheating by pretending every query has eight equally valid dimensions.
It respects hierarchy.
That keeps the contextual score honest.
The scorer also preserves the behavioral baseline of the queried entity itself. So the contextual score is never just “neighbor mood.” It stays anchored in the thing actually being queried.
7. Why the sample size matters
This whole structure would be much less convincing if the pipeline were tiny.
It is not tiny.
The current recent working band alone gives the scorer:
- more than 17.3 million raw events,
- more than 17.3 million enriched fact rows,
- more than 28.2 million annotation hits.
And the currently published surface gives the contextual model:
- more than 8.6 million scored IP boundaries,
- more than 3.7 million subnet snapshots,
- more than 139 thousand ISP boundaries,
- more than 26 thousand ASN boundaries,
- more than 54 thousand organization boundaries,
- more than 41 thousand city boundaries,
- more than 3 thousand region boundaries,
- and 209 country boundaries.
That does not make the model magically perfect.
But it does mean the score is not being improvised from a shallow layer.
It is being collapsed from a report universe with enough density to behave like a serious analytical product.
8. Why this should not be mistaken for a “dump plugin”
This is the category mistake I most want to prevent.
The plugin is the access surface. The MCP server is the operating contract. The Risk API is the application interface.
None of those are the data product by themselves.
The actual product center is the report-backed contextual intelligence layer:
- transformed from raw traffic,
- enriched into stable fact rows,
- annotated into explainable signals,
- rolled into eight report cubes,
- collapsed into a contextual vector,
- and then made reusable through scoring and memory surfaces.
That is why the installable surfaces matter so much less than the lineage beneath them.
Without the lineage, the plugin would indeed be just a wrapper.
With the lineage, it becomes a doorway into a much deeper score universe.
9. Why the illustrations look the way they do
I wanted the art for this post to behave like symbolic documentation.
That is why the images use:
- one central sigil for the score nucleus,
- hive nodes for the analytical cubes,
- layered particles for traffic and evidence,
- and two-way motion to show that Syndu is not only accumulating observations, but also publishing reusable context back outward.
The animation language matters here.
One-way arrows would make the system feel like a pipeline that disappears into storage.
Two-way flows make the right point:
- observation moves inward,
- published context moves outward,
- and the score exists in the middle as a reusable compression of the report universe.
That is the right symbolic shape for Syndu.
10. The score is the smallest readable output of a larger analytical machine
That is the real summary.
Syndu does not begin with a score. It earns one.
It earns it by moving through a chain of increasingly meaningful data shapes:
- raw event,
- enriched fact,
- annotated signal,
- report truth,
- contextual vector,
- collapsed score.
That is what makes the score worth using.
And that is why Syndu should be understood as a shared analytical intelligence layer with installable operating surfaces, not as an installable shell looking for substance underneath it.