Zoomable Collapsible Org‑Tree (Vega) — Line‑by‑Line Step‑by‑Step Guide

This document explains every important part of the Zoomable, collapsible tree Vega JSON you provided. It walks through the spec step‑by‑step: what each section is for, what the key signals and transforms do, how the layout and marks are built, and how the interactivity (pan, zoom, expand/collapse, highlight) works.

Tip: open this document side‑by‑side with your Deneb editor so you can test changes and see results immediately.

Overview (how it works)
Top‑level metadata
Signals — interactive variables (full explanation)
Data pipelines & transforms (wideToTall → treeCalcs → layout → visibleNodes)
Scales (x/y, KPI, colour)
Marks (links, node group and internal marks)
Interactivity — how events change signals and visuals
How data flows (end to end)
Practical notes for Deneb / Power BI
Small edits & common tweaks
Exercises and next steps

1) Overview — how this spec works (short)

At a high level:

Power BI / Deneb feeds rows into the dataset table. Each row typically contains hierarchical columns (level1..level5), person, and kpi.
wideToTall transforms that wide-structure into a tall (id,parent) list so Vega can stratify it into a hierarchy.
The tree transform computes x, y, depth, and a child count for each node.
treeLayout + fullTreeLayout and filtering logic compute the set of currently visible nodes depending on the opening depth or user clicks.
links are computed with treelinks + linkpath to create smooth connecting paths.
Marks render links and a grouped node (a group contains rects, kpi bars and text). Signals drive zoom/pan, hover highlights, and top‑level controls.

We’ll now open each section and explain every essential line.

2) Top‑level metadata (first ~10 lines)

"$schema": "https://vega.github.io/schema/vega/v5.json",
"description": "Zoomable, collapsable tree by David Bacci: https://www.linkedin.com/in/davbacci/",
"width": {"signal": "1240"},
"height": {"signal": "600"},
"background": "#f5f5f5",
"autosize": "pad",
"padding": 5,

Explanation:

$schema: declares the Vega version (v5). Use this to get proper validation and features.
description: free text for humans.
width / height: they are signals (so can be updated dynamically) — here set to constants (1240×600). In Deneb you may change these or map them to dashboard container size.
background: chart background color.
autosize: "pad" + padding: ensures marks fit properly inside the container instead of being clipped.

3) Signals — interactive variables (detailed)

Signals are Vega’s reactive variables. Events (mouse, timers, signals) update them; changes re‑compute transforms & marks.

I will list each signal from your spec, show the JSON snippet, and explain it in plain English.

a) Node / spacing constants

{ "name": "nodeWidth", "value": 190 },
{ "name": "nodeHeight", "value": 45 },
{ "name": "verticalNodeGap", "value": 10 },
{ "name": "horizontalNodeGap", "value": 140 },

Simple constants to control node rectangle size and spacing used later in nodeSize and layout tweaks.

b) `startingDepth` — initial reveal depth

{
  "name": "startingDepth",
  "value": 1,
  "on": [
    {
      "events": { "type": "timer", "throttle": 0 },
      "update": "-1"
    }
  ]
}

Starts as 1, but the timer event immediately sets it to -1 (a trick often used to run a setup step once). This allows the spec to insert an initial set of nodes into the persistent store — read on to treeClickStorePerm for how startingDepth triggers that.

c) `node` — id of clicked node (or 0)

{
  "name": "node",
  "value": 0,
  "on": [
    { "events": { "type": "click", "markname": "node" }, "update": "datum.id" },
    { "events": { "type": "timer", "throttle": 10 }, "update": "0" }
  ]
}

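When a mark named node is clicked, node becomes datum.id (the clicked node's id). The timer handler resets it to 0 shortly afterwards, so node behaves like a momentary pulse: because Vega only propagates a signal when its value changes, the reset lets a repeat click on the same node register as a fresh change.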

d) `nodeHighlight` — node + ancestor highlight on hover

{
  "name": "nodeHighlight",
  "value": "[0]",
  "on": [
    { "events": { "type": "mouseover", "markname": "node" }, "update": "pluck(treeAncestors('treeCalcs', datum.id), 'id')" },
    { "events": { "type": "mouseout" }, "update": "[0]" }
  ]
}

On mouseover of a node, this becomes an array of ids returned by pluck(treeAncestors('treeCalcs', datum.id), 'id').
treeAncestors('treeCalcs', id) returns a list of node objects from the treeCalcs dataset, starting with the node itself and followed by each parent up to the root. pluck(...,'id') extracts their id properties. So nodeHighlight becomes [hoveredId, parentId, ..., rootId]. Used to visually emphasize the path to the root.
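
To make the ordering concrete, here is a minimal Python sketch of what that expression evaluates to. It is not part of the spec: the parent_of dictionary and the node ids are hypothetical stand-ins for the treeCalcs dataset, and the loop only imitates what Vega's built-in treeAncestors does for you.

```python
def ancestor_ids(node_id, parent_of):
    """Return [node_id, parent, ..., root], like pluck(treeAncestors(...), 'id')."""
    chain = []
    current = node_id
    while current is not None:
        chain.append(current)
        current = parent_of.get(current)  # becomes None once we step past the root
    return chain

# Hypothetical three-level hierarchy using '|'-joined keys.
parent_of = {"CEO": None, "CEO|Sales": "CEO", "CEO|Sales|EMEA": "CEO|Sales"}
highlight = ancestor_ids("CEO|Sales|EMEA", parent_of)
print(highlight)
```

Hovering the deepest node would therefore highlight that node and every link on the way back to the root.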

e) `isExpanded` — whether a node is expanded (click logic)

{
  "name": "isExpanded",
  "value": 0,
  "on": [
    {
      "events": { "type": "click", "markname": "node" },
      "update": "datum.children > 0 && indata('treeClickStorePerm', 'id', datum.childrenIds[0])?true:false"
    }
  ]
}

When clicking a node, isExpanded becomes true if
- the clicked node has children AND
- the first child id is already present in the treeClickStorePerm dataset (i.e., the children are currently stored as visible), otherwise false.
This determines whether the click is expanding or collapsing.
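
The same test can be written out as a small Python sketch. The datum dictionaries and the perm_store_ids set are hypothetical; in the spec the lookup is done by indata against the treeClickStorePerm dataset.

```python
def is_expanded(datum, perm_store_ids):
    """Mirror of: datum.children > 0 && indata('treeClickStorePerm', 'id', datum.childrenIds[0])."""
    if datum["children"] <= 0:
        return False  # a leaf can never be "expanded"
    return datum["childrenIds"][0] in perm_store_ids

# Hypothetical clicked node with two children.
sales = {"id": "CEO|Sales", "children": 2,
         "childrenIds": ["CEO|Sales|EMEA", "CEO|Sales|APAC"]}

collapsed_view = {"CEO", "CEO|Sales"}             # children not stored yet
expanded_view = collapsed_view | {"CEO|Sales|EMEA", "CEO|Sales|APAC"}

print(is_expanded(sales, collapsed_view))  # the click should expand
print(is_expanded(sales, expanded_view))   # the click should collapse
```

Checking only the first child is enough here because children are always inserted and removed together.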

f) `xrange` / `yrange` — pixel ranges for scales

{ "name": "xrange", "update": "[0, width]" },
{ "name": "yrange", "update": "[0, height]" },

Simple arrays giving the visible pixel extents used by xscale and yscale via signals.

g) `down`, `xcur`, `ycur`, `delta` — panning calculations

{ "name": "down", "value": null, "on": [ ... ] },
{ "name": "xcur", "value": null, "on": [ ... ] },
{ "name": "ycur", "value": null, "on": [ ... ] },
{ "name": "delta", "value": [0,0], "on": [ ... ] }

down captures the mouse/touch start coordinate via xy() on mousedown/touchstart and resets on touchend.
xcur / ycur store slices of the domain at mousedown/touchstart/touchend so panning math can compute new domain positions.
delta computes movement vector while the user drags: when mousemove or touchmove occurs it updates to down ? [down[0]-x(), down[1]-y()] : [0,0]. That value is used to pan (xdom/ydom update using delta).

h) `anchor`, `dist1`, `dist2`, `zoom` — pinch/scroll zooming

{ "name": "anchor", "value": [0,0], "on": [ ... ] },
{ "name": "dist1", "value": 0, "on": [ ... ] },
{ "name": "dist2", "value": 0, "on": [ ... ] },
{ "name": "zoom", "value": 1, "on": [ ... ] }

anchor: when mouse wheel events occur it stores the zoom anchor point (converted to data coords using invert('xscale', x()) etc.). For two-finger touchstart it stores the mid-point of the two touches.
dist1/dist2: used to measure pinch distance initial (dist1) and current pinch (dist2) so a ratio dist1/dist2 can be used as a zoom factor.
zoom: updated either on wheel! (mouse wheel; the trailing ! consumes the event so the page behind the visual does not scroll) or on pinch (signal change). For wheel, it uses pow(1.001, event.deltaY * pow(16, event.deltaMode)) to produce a smooth scale factor.
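
As a rough illustration of the two zoom factors, here is the arithmetic in Python. The function names are made up for this sketch; only the formulas come from the spec.

```python
def wheel_zoom_factor(delta_y, delta_mode=0):
    """pow(1.001, event.deltaY * pow(16, event.deltaMode)) from the zoom signal."""
    return 1.001 ** (delta_y * 16 ** delta_mode)

def pinch_zoom_factor(dist1, dist2):
    """Ratio of the starting pinch distance to the current one."""
    return dist1 / dist2

print(wheel_zoom_factor(100))        # a little above 1
print(wheel_zoom_factor(-100))       # a little below 1
print(pinch_zoom_factor(120, 240))   # fingers moved apart
```

A factor above 1 widens the visible domain (zoom out) and a factor below 1 narrows it (zoom in), because the factor multiplies the distance between the anchor and each end of xdom/ydom. The pow(16, deltaMode) term scales up wheel events that report lines or pages instead of pixels.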

i) `xdom` / `ydom` — current data domain visible (driven by pan/zoom)

{
  "name": "xdom",
  "update": "slice(xext)",
  "on": [
    { "events": { "signal": "delta" }, "update": "[xcur[0] + span(xcur) * delta[0] / width, xcur[1] + span(xcur) * delta[0] / width]" },
    { "events": { "signal": "zoom" }, "update": "[anchor[0] + (xdom[0] - anchor[0]) * zoom, anchor[0] + (xdom[1] - anchor[0]) * zoom]" },
    { "events": "dblclick", "update": "[0,width]" }
  ]
}

xdom is the domain (data coordinate range) mapped to the visible area. It starts from slice(xext) (i.e. current extent).
On delta (drag), it shifts both endpoints proportionally to mouse movement to pan horizontally.
On zoom, it scales the domain about the anchor point.
On dblclick, it resets to [0, width] (full extent).
ydom has similar logic for vertical panning and zooming.
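
The pan and zoom updates are plain interval arithmetic. The Python sketch below reproduces the two xdom expressions with hypothetical numbers; span() here is a stand-in for Vega's span function (last element minus first).

```python
def span(domain):
    return domain[1] - domain[0]

def pan(xcur, delta_x, width):
    """xdom update on 'delta': shift both ends by the dragged fraction of the view."""
    shift = span(xcur) * delta_x / width
    return [xcur[0] + shift, xcur[1] + shift]

def zoom_about(xdom, anchor_x, zoom):
    """xdom update on 'zoom': scale each end's distance from the anchor."""
    return [anchor_x + (xdom[0] - anchor_x) * zoom,
            anchor_x + (xdom[1] - anchor_x) * zoom]

print(pan([0, 1240], 124, 1240))       # dragged 124 px on a 1240 px wide view
print(zoom_about([0, 1000], 250, 0.5)) # zoom in 2x around data x = 250
```

In the second call the anchor sits a quarter of the way across the view both before and after zooming, which is what keeps the point under the cursor fixed on screen.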

j) Scaling helpers — scaledNodeWidth / scaledFont* / scaledKPIHeight / scaledLimit

{ "name": "scaledNodeWidth", "update": "(nodeWidth/ span(xdom))*width" },
{ "name": "scaledNodeHeight", "update": "abs(nodeHeight/ span(ydom))*height" },
{ "name": "scaledFont13", "update": "(13/ span(xdom))*width" },
{ "name": "scaledKPIHeight", "update": "(5/ span(xdom))*width" },
{ "name": "scaledLimit", "update": "(20/ span(xdom))*width" }

These compute pixel sizes for nodes, fonts, KPI bars, and label trimming based on the current pan/zoom (span(xdom) / span(ydom)). As you zoom in/out, fonts and node sizes scale so the layout remains consistent.
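
A quick way to see the effect is to evaluate one of these helpers at two zoom levels. This Python sketch uses a made-up helper name (scaled) and hypothetical domains; the formula is the one from the signals above.

```python
def scaled(size, domain, pixels):
    """(size / span(domain)) * pixels.

    The spec wraps only the height version in abs(); applying abs() here
    too does not change the result for positive spans.
    """
    return abs(size / (domain[1] - domain[0])) * pixels

width = 1240
print(scaled(190, [0, 1240], width))  # nodeWidth at the default domain
print(scaled(190, [0, 620], width))   # same node after zooming in 2x
print(scaled(13, [0, 620], width))    # 13 px font after zooming in 2x
```

When the domain spans the same number of units as the view has pixels, sizes come out unchanged; halving the domain doubles everything.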

4) Data pipelines & transforms (full explanation)

This section shows every data entry and explains transforms in order:

a) `dataset` and `source`

{"name": "dataset"},
{"name": "source", "source": "dataset"}

dataset is the raw data provided by Deneb (Power BI). source is simply a named alias pointing to dataset for clarity.

b) `wideToTall` — converting wide hierarchy columns (level1..level5) to tall rows

This block contains many formula transforms that create l1..l5 objects, then fold them and finally extract id, parent, title, person, and kpi.

Key steps (simplified):

For each row in source, create l1, l2, l3, l4, l5 where each is an object {key: ..., parent: ..., person:..., kpi:...}. l2 includes level1 as its parent, l3 has l2 as parent, etc.
fold the fields l1..l5 to turn columns into rows of objects.
project the key and value (value holds the object from l1..l5).
id formula: datum.value.key (the unique key for that node level)
title formula: reverse(split(datum.value.key,'|'))[0] — takes the last part of the | separated key (the actual label)
parent formula: datum.value.parent — the parent id string
filter removes rows with empty titles or 'null' strings
aggregate groups by id,parent,title,value to remove duplicates (if multiple source rows referenced the same node)
Extract person and kpi from the value object.

Result: wideToTall produces a list of unique node records with fields: id, parent, title, person, kpi.
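
If it helps to see the reshaping outside Vega, the following Python sketch produces the same kind of output from a few hypothetical rows. It is an approximation under stated assumptions, not a translation of the spec: keys are built by joining labels with '|', and person/kpi are attached to the deepest level present in each row (how the spec assigns them to intermediate levels is not shown in this guide).

```python
LEVELS = ("level1", "level2", "level3", "level4", "level5")

def wide_to_tall(rows):
    nodes = {}  # keyed by id, so repeated ancestors collapse into one record
    for row in rows:
        path = []
        for level in LEVELS:
            label = row.get(level)
            if label in (None, "", "null"):
                break  # no deeper levels for this row
            parent = "|".join(path) if path else None
            path.append(label)
            node_id = "|".join(path)
            nodes.setdefault(node_id, {"id": node_id, "parent": parent,
                                       "title": label, "person": None, "kpi": None})
        if path:
            # Assumption: the row describes its deepest node.
            leaf = nodes["|".join(path)]
            leaf["person"], leaf["kpi"] = row.get("person"), row.get("kpi")
    return list(nodes.values())

# Hypothetical input rows in the wide shape Deneb would supply.
rows = [
    {"level1": "CEO", "person": "Ana", "kpi": 80},
    {"level1": "CEO", "level2": "Sales", "person": "Ben", "kpi": 55},
    {"level1": "CEO", "level2": "Ops", "person": "Cy", "kpi": 70},
]
tall = wide_to_tall(rows)
for node in tall:
    print(node)
```

Three wide rows become three unique nodes, and the CEO node appears once even though every row mentions it.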

c) `treeCalcs` — stratify + tree layout

{ "type": "stratify", "key": "id", "parentKey": "parent" },
{ "type": "tree", "method": { "signal": "'tidy'" }, "separation": { "signal": "false" }, "as": ["y","x","depth","children"] },
{ "as": "parent", "type": "formula", "expr": "datum.parent" }

stratify builds the hierarchical structure from id & parent.
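tree computes layout coordinates plus depth (the level, with the root at 0) and children (the number of children each node has, which is why later expressions test datum.children > 0). The as array lists y before x, so the transform's x output is written to y and vice versa, which lays the tree out left to right. method: 'tidy' picks the tidy tree algorithm. separation: false turns off the extra gap that would otherwise be placed between cousin nodes, so siblings and cousins are spaced alike.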
The final formula extracts parent as a top-level field on each node so downstream datasets can reference it easily.
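
The non-geometric outputs are easy to reproduce by hand. This Python sketch derives depth and children from an id/parent list; it deliberately leaves out the x/y coordinates, which need the real tidy-tree algorithm. The node list is hypothetical and the sketch assumes every parent id also appears as a node.

```python
def depths_and_child_counts(nodes):
    parent_of = {n["id"]: n["parent"] for n in nodes}
    children = {node_id: 0 for node_id in parent_of}
    for parent in parent_of.values():
        if parent is not None:
            children[parent] += 1

    def depth(node_id):
        steps = 0
        while parent_of[node_id] is not None:  # walk up until the root
            node_id = parent_of[node_id]
            steps += 1
        return steps

    return {node_id: {"depth": depth(node_id), "children": children[node_id]}
            for node_id in parent_of}

nodes = [
    {"id": "CEO", "parent": None},
    {"id": "CEO|Sales", "parent": "CEO"},
    {"id": "CEO|Ops", "parent": "CEO"},
    {"id": "CEO|Sales|EMEA", "parent": "CEO|Sales"},
]
info = depths_and_child_counts(nodes)
print(info)
```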

d) `treeChildren` — gather children ids for quick lookup

It aggregates on parent to build childrenObjects and childrenIds (via pluck). Useful later to know how many children and their ids.

e) `treeAncestors` — compute ancestor list for each node

Uses the built-in treeAncestors('treeCalcs', datum.id) to get the ancestor objects up to the root, then flattens that list so each row carries a single ancestor id in allParents.

f) `treeChildrenAll` — for nodes in the ancestor chain, aggregate their children

Projects relevant fields and aggregates to compute allChildrenIds for each ancestor. This helps quickly find which nodes are under a given ancestor.

g) `treeClickStoreTemp` — temporary store to decide which nodes to display after a click

This transform uses a fairly complex filter expression:

startingDepth!=-1 ? datum.depth <= startingDepth : node !=0 && !isExpanded ? datum.parent == node : node !=0 && isExpanded ? datum.allParents == node : false

Interpretation:

If startingDepth != -1: show nodes with depth <= startingDepth (initial loading behavior).
Else, if node != 0 (a node was clicked) and isExpanded is false (we are expanding): include rows whose parent == node (immediate children)
Else, if node != 0 and isExpanded is true (we are collapsing): include rows whose allParents == node (everything under the clicked node), so that removal works.
Otherwise, include nothing.

After filtering, project + aggregate remove duplicates and produce treeClickStoreTemp as a set of nodes to insert into the permanent store when triggered.
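
Nested ternaries are hard to read, so here is the same decision written as a Python function. The argument names follow the signals; the sample rows are hypothetical.

```python
def keep_in_temp(datum, starting_depth, node, is_expanded):
    """Row-level equivalent of the treeClickStoreTemp filter expression."""
    if starting_depth != -1:
        return datum["depth"] <= starting_depth       # initial load
    if node != 0 and not is_expanded:
        return datum["parent"] == node                # expanding: direct children
    if node != 0 and is_expanded:
        return datum["allParents"] == node            # collapsing: all descendants
    return False                                      # no click pending

child = {"depth": 1, "parent": "CEO", "allParents": "CEO"}
grandchild = {"depth": 2, "parent": "CEO|Sales", "allParents": "CEO"}

print(keep_in_temp(child, 1, 0, False))            # initial load, depth 1 is shown
print(keep_in_temp(grandchild, -1, "CEO", False))  # expanding CEO skips grandchildren
print(keep_in_temp(grandchild, -1, "CEO", True))   # collapsing CEO includes them
```

The asymmetry is intentional: expanding reveals one level at a time, while collapsing has to sweep up every descendant that earlier clicks may have revealed.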

h) `treeClickStorePerm` — persistent set of visible nodes (empty initial values)

{"name": "treeClickStorePerm", "values": [], "on": [ ... ] }

This dataset is initially empty. It has on triggers that insert treeClickStoreTemp:
- When startingDepth>=0 trigger runs, it inserts data('treeClickStoreTemp') so the initial depth is shown
- When node triggers, it inserts !isExpanded ? data('treeClickStoreTemp') : false — i.e., expand on click inserts the temp nodes
- When node triggers and isExpanded is true, it removes data('treeClickStoreTemp') (collapse)

Thus treeClickStorePerm holds the set of currently visible node ids controlled by initial load and user clicks.
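
One way to picture the store is as a set that grows on expand and shrinks on collapse. The Python sketch below walks through a load, an expand, and a collapse with hypothetical ids; update_store is a made-up helper standing in for the dataset's insert/remove triggers.

```python
def update_store(perm_ids, temp_ids, is_expanded):
    """Insert the temp nodes when expanding, remove them when collapsing."""
    if is_expanded:
        return perm_ids - temp_ids
    return perm_ids | temp_ids

# Initial load: startingDepth selects the root and its direct reports.
store = update_store(set(), {"CEO", "CEO|Sales", "CEO|Ops"}, is_expanded=False)

# Click on Sales while collapsed: its children are inserted.
store = update_store(store, {"CEO|Sales|EMEA"}, is_expanded=False)
after_expand = set(store)

# Click on Sales again: its descendants are removed.
store = update_store(store, {"CEO|Sales|EMEA"}, is_expanded=True)
print(sorted(after_expand))
print(sorted(store))
```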

i) `treeLayout` — final node positions to render (filtered to visible nodes)

Key steps:

filter to only nodes where indata('treeClickStorePerm', 'id', datum.id) is true — keep only visible nodes.
stratify + tree again on this subset (so layout is recomputed for the visible subtree). The nodeSize uses nodeHeight + verticalNodeGap and nodeWidth + horizontalNodeGap to space nodes.
Compute y offset: datum.y + (height/2) centers the tree.
Compute xscaled as scale('xscale', datum.x) — used later for link endpoints.

j) `fullTreeLayout` — enrich layout with child/parent lookups

Runs lookup transforms to attach childrenIds, allChildrenIds, and children arrays from earlier datasets using keys.
Adds treeParent formula: reverse(pluck(treeAncestors('treeCalcs', datum.id), 'id'))[1]. The plucked chain runs from the node up to the root, so reversing it puts the root at index 0 and the root's child on this node's branch at index 1. That second element is the top-level bucket the node belongs to, and it is used for consistent group colours.
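
The indexing is easier to follow with a concrete chain. In this Python sketch the chains are hypothetical and written in the order treeAncestors returns them (node first, root last); tree_parent is a made-up name for the formula.

```python
def tree_parent(chain):
    """reverse(chain)[1]: the element just below the root, or None for the root itself."""
    from_root = list(reversed(chain))
    return from_root[1] if len(from_root) > 1 else None

print(tree_parent(["CEO|Sales|EMEA", "CEO|Sales", "CEO"]))  # bucket of a deep node
print(tree_parent(["CEO|Sales", "CEO"]))                    # a top-level node is its own bucket
print(tree_parent(["CEO"]))                                 # the root has no bucket
```

In the Vega expression the out-of-range index yields undefined rather than None, so the root simply gets no treeParent value.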

k) `visibleNodes` — a final filtered list used by marks

Filters fullTreeLayout again with presence in treeClickStorePerm.

l) `maxWidthAndHeight` — compute layout extents

Aggregates visibleNodes by depth and computes maxNodes, maxX, maxY. These can be used to resize or adjust visual bounds.

m) `links` — compute link path geometry

Source is treeLayout, which has already been filtered to the visible nodes and laid out again, so the link geometry follows the positions that are actually drawn.
treelinks transform produces {source:{x,y,id}, target:{x,y,id}} items.
linkpath with orient: 'horizontal', shape: 'diagonal' computes path geometry for smooth curved links. The sourceX, sourceY, targetX, targetY are calculated using scale('yscale', datum.source.y) and scale('xscale', datum.source.x + nodeWidth) etc.
Final filter ensures only links whose target.id is present in treeClickStorePerm (visible) are kept.
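
For intuition about the curve itself, a horizontal diagonal link can be drawn as a cubic Bézier whose two control points sit at the horizontal midpoint between source and target. The Python sketch below builds such an SVG path string; it is an assumption about the general shape, and the exact string Vega's linkpath emits may be formatted differently. The pixel coordinates are hypothetical.

```python
def diagonal_path(sx, sy, tx, ty):
    """SVG path from (sx, sy) to (tx, ty) with control points at the mid x."""
    mid_x = (sx + tx) / 2
    return f"M{sx},{sy}C{mid_x},{sy} {mid_x},{ty} {tx},{ty}"

# Source is the right edge of a parent node, target the left edge of a child.
path = diagonal_path(190, 300, 330, 245)
print(path)
```

Because both control points share the same x, the curve leaves the parent horizontally and arrives at the child horizontally, which gives the familiar S-shaped connector.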

5) Scales

{
  "name": "xscale",
  "zero": false,
  "domain": {"signal": "xdom"},
  "range": {"signal": "xrange"}
},
{
  "name": "yscale",
  "zero": false,
  "domain": {"signal": "ydom"},
  "range": {"signal": "yrange"}
},
{
  "name": "kpiscale",
  "zero": false,
  "domain": [0,100],
  "range": {"signal": "[0,scaledNodeWidth]"}
},
{
  "name": "colour",
  "type": "ordinal",
  "range": [ ...colors... ],
  "domain": { "data": "visibleNodes", "field": "treeParent" }
}

Explanation:

xscale maps the tree x coordinates into pixel space using xdom and xrange signals (which change on zoom/pan).
yscale maps y coordinates.
kpiscale maps KPI values (0–100) into a pixel width for the KPI bar inside the node. The range is [0, scaledNodeWidth] so the KPI fits inside the node rectangle.
colour is ordinal; its domain is the distinct treeParent values in visibleNodes. Each top-level bucket gets a consistent color from the palette.
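
Since xscale, yscale and kpiscale declare no type, they are linear scales, and a linear scale is just a proportional mapping from domain to range. The Python sketch below shows the arithmetic with hypothetical domains; linear_scale is a helper written for this example, not a Vega API.

```python
def linear_scale(domain, rng):
    d0, d1 = domain
    r0, r1 = rng

    def scale(value):
        # Fraction of the way through the domain, applied to the range.
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale

kpiscale = linear_scale([0, 100], [0, 190])   # range end = scaledNodeWidth
xscale = linear_scale([125, 625], [0, 1240])  # xdom after some zooming

print(kpiscale(50))  # half-full KPI bar
print(xscale(250))   # pixel x of data x = 250
```

Changing xdom changes where every node lands in pixels without touching the layout data, which is why pan and zoom only need to update the domain signals.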

6) Marks (paths and node groups)

A) Link paths

{
  "type": "path",
  "interactive": false,
  "from": { "data": "links" },
  "encode": {
    "update": {
      "path": { "field": "path" },
      "strokeWidth": { "signal": "indexof(nodeHighlight, datum.target.id)> -1? 2.5:0.4" },
      "stroke": { "scale": "colour", "signal": "reverse(pluck(treeAncestors('treeCalcs', datum.target.id), 'id'))[1]" }
    }
  }
}

Draws the curved path previously computed by linkpath (field path).
strokeWidth is thicker if the target.id is contained in nodeHighlight (i.e., the hovered path to root gets emphasized).
stroke color uses the colour scale. The scale key is the node’s top-level parent (computed via reverse(pluck(...))[1]) so links match node bucket colors.

B) Node group (big block that contains rectangle, KPI, texts)

This is a group mark named node, and it draws from the visibleNodes dataset:

{
  "name": "node",
  "type": "group",
  "from": { "data": "visibleNodes" },
  "encode": {
    "update": {
      "x": { "field": "x", "scale": "xscale" },
      "width": { "signal": "scaledNodeWidth" },
      "yc": { "field": "y", "scale": "yscale" },
      "height": { "signal": "scaledNodeHeight" },
      "fill": { "signal": "merge(hsl(scale('colour', datum.treeParent)), {l:0.94})" },
      "stroke": { "signal": "merge(hsl(scale('colour', datum.treeParent)), {l:0.79})" },
      "cornerRadius": { "value": 2 },
      "cursor": { "signal": "datum.children>0?'pointer':''" },
      "tooltip": { "signal": "" }
    }
  },
  "marks": [ ... internal marks ... ]
}

Key notes:

The group sets x position (scaled by xscale) and a width/height. Inside marks position themselves relative to the group.
fill & stroke both use merge(hsl(scale('colour', datum.treeParent)), {l: ...}): this converts the scale color to HSL and then overrides the lightness, with l:0.94 for the fill and l:0.79 for the stroke, so the node background is a very light tint of the bucket color and its border a slightly darker one.
cursor becomes pointer when a node has children indicating it is actionable.
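
The lightness override can be tried outside Vega with Python's standard colorsys module, which uses the same hue/lightness/saturation model. This is only an illustration: the hex colour is a hypothetical palette entry and with_lightness is a helper written for this sketch.

```python
import colorsys

def with_lightness(hex_colour, lightness):
    """Keep hue and saturation, replace lightness, return a hex colour."""
    r, g, b = (int(hex_colour[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, _, s = colorsys.rgb_to_hls(r, g, b)
    r2, g2, b2 = colorsys.hls_to_rgb(h, lightness, s)
    return "#{:02x}{:02x}{:02x}".format(round(r2 * 255), round(g2 * 255), round(b2 * 255))

bucket = "#1f77b4"                      # hypothetical bucket colour
fill = with_lightness(bucket, 0.94)     # very light tint for the background
stroke = with_lightness(bucket, 0.79)   # darker tint for the border
print(fill, stroke)
```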

Internal marks inside the node group

The group contains several marks that together make the visual appearance:

highlight rect — intended to change background on hover/ancestors
- It uses item.mark.group.x1 / item.mark.group.width / height to size itself to the whole group area.
- Fill/stroke are conditional using indexof(nodeHighlight, parent.id) > -1 test.
- Note: there’s a comment in the original code noting this highlight doesn’t always behave as expected on the group element — Vega quirks sometimes require attaching event handlers to a specific mark.
KPI background rect — a small, translucent bar area anchored at the bottom inside the node group.
- Uses item.mark.group.height - scaledKPIHeight to align at bottom.
- fill uses the bucket colour and opacity: 0.2 to make a faint band.
KPI rect — actual KPI fill inside the KPI background.
- Its width is scale('kpiscale', parent.kpi) so KPI 0..100 maps to 0..scaledNodeWidth.
- fill uses the bucket colour.
text: name — person name (larger, bold)
- x and y offsets are calculated via signals that scale with zoom so text placement remains proportional.
- text is parent.person, and limit (the maximum rendered width in pixels, beyond which Vega truncates the text) uses scaledNodeWidth - scaledLimit to avoid overflow.
text: title — role/title line beneath the name; smaller font.
- Uses parent.title.
text: node children — shows number of children (if any) aligned to the right.
- text expression: parent.children>0?parent.children:'' — so empty if no children.

These marks together form the node visual: background, small KPI bar at bottom, name/title text and child count.

7) Interactivity (how events change signals and visuals)

The interactivity is implemented via signal on handlers and mark-level markname event triggers.

Main interactions:

Hover a node → nodeHighlight updates via mouseover and mouseout to contain the ancestor chain. The path strokeWidth and highlight rect check indexof(nodeHighlight, id) to emphasize the path.
Click a node → node becomes the clicked datum.id and isExpanded determines whether to insert children nodes into treeClickStorePerm (expand) or remove them (collapse). This dynamically grows/shrinks the treeClickStorePerm dataset and triggers recomputation of visibleNodes and links.
Drag / Pan → mousedown/mousemove/mouseup and touch equivalents compute delta, which updates xdom/ydom and thus pan the view through xscale/yscale domain changes.
Zoom → mouse wheel (wheel!) or pinch gestures update zoom and anchor, and xdom/ydom update accordingly. invert('xscale', x()) is used to convert pixel mouse location to data coords for anchored zooming.
Double click → resets zoom to default domain via dblclick handler on xdom and ydom.

Because Vega is reactive: changing domain signals automatically repositions marks (since many marks use scale('xscale', ...) and scale('yscale', ...) or field+scale encodings).

8) End‑to‑End Data / Render Flow (summary)

Data ingestion: Deneb passes Power BI table as dataset.
Normalization: wideToTall converts hierarchical columns to id/parent rows.
Hierarchy creation: treeCalcs stratifies and calculates a base layout for the whole tree.
Ancillary lookups: treeChildren, treeAncestors, treeChildrenAll compute relationships and caches.
Click/initialization logic: treeClickStoreTemp selects nodes to show on initial load or clicks; treeClickStorePerm stores the visible nodes.
Layout for visible nodes: treeLayout recomputes layout for visible nodes only.
Link geometry: links creates curved paths connecting parent→child for visible links.
Rendering: path marks render links; group node marks render node rectangles, KPI, and text.
Interactions update signals, which re-run transforms/encodings and update visuals.

9) Practical notes for Deneb / Power BI

Required input columns (expected by wideToTall): level1, level2, level3, level4, level5, person, kpi (or change spec accordingly). If your dataset uses different column names, update the formula expressions in wideToTall to reference your columns.
Performance: For very large orgs (>1000 nodes) recomputing the tree transforms can be slow. Consider precomputing hierarchy and sending a tidy id,parent,title,person,kpi table to Deneb to skip the wideToTall step.
Debugging: Use the data and signal viewers in Deneb's debug pane to inspect intermediate datasets (for example treeClickStorePerm) and signals (for example node and isExpanded). You can also add a simple text mark that writes datum.id for the first few visibleNodes to verify.
Colors: The colour scale domain is dynamic. If you want fixed colors for named top-level groups, replace the domain expression with a static array of group ids.

10) Small edits & common tweaks (examples)

Change node width: adjust nodeWidth signal value.
Start collapsed: set startingDepth to 0 to only show root node initially.
Limit depth: in treeClickStoreTemp adjust the filter to datum.depth <= someValue.
Change link style: set the linkpath shape to line or curve.
Show tooltips: set the tooltip signal on the node group to datum.title + ' - ' + datum.person. On the group itself the node record is datum; parent only applies to the marks nested inside the group.

11) Exercises & next steps (choose one)

Annotate the JSON: I can produce a commented copy of your full JSON with inline one‑line comments for each line/value (useful to open in an editor). Do you want that? (Yes / No)
Adapt to your column names: Share your exact column names and I’ll edit the wideToTall formulas so the spec works with your dataset out of the box.
Simplify for large orgs: I can produce a lighter spec that assumes you precompute id,parent,title,person,kpi and remove folding / aggregation transforms to speed up rendering.

If you want the annotated JSON (inline comments next to each code line) say "annotate JSON" and I’ll add it to this document. If you prefer I tailor this spec to your Power BI field names, paste them here and I’ll update the wideToTall transforms accordingly.

Monday, 18 August 2025

Collapsable Orgtree
