Update index.md #1

Merged
ramonawalls merged 1 commit into main from ramonawalls-patch-1
Jan 13, 2021

Conversation

@ramonawalls
Contributor

Update architecture workflow

ramonawalls merged commit 40a2dc2 into main on Jan 13, 2021
datadavev added a commit that referenced this pull request May 10, 2022
rdhyee added a commit that referenced this pull request Feb 13, 2026
* Add H3 spatial indexing, two-tier facet loading, and benchmark optimizations (#5)

Add H3 spatial indexing, two-tier facet loading, and benchmark optimizations

## Changes
- isamples_explorer.qmd: Two-tier facet loading (2KB summary for instant counts)
- parquet_cesium_isamples_wide.qmd: Zoom-adaptive H3 clustering with LOD
- zenodo_isamples_analysis.qmd: Data-driven H3 regional analysis
- narrow_vs_wide_performance.qmd: Added geospatial and facet benchmarks

## Fixes Applied (Codex review)
- Fixed MODE(n) → MODE(source) for cluster coloring
- Added camera listener cleanup to prevent leaks
- Added NaN guard for cluster label parsing
- Added user-facing warning for facet summary failures

Closes #1, #2, #3, #4

* Add progressive globe demo with H3 aggregated loading

Loads 580KB H3 res4 summary for instant globe render (<1s),
then switches to res6/res8 on zoom with viewport filtering.
Click triggers sample detail query from full 280MB parquet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix progressive globe: render stats bar from OJS cells

DOM elements created in raw HTML aren't available when OJS cells
execute. Move legend, stats bar, and phase indicator into OJS cells
and add null guards on all getElementById calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Click cluster dot to fly-to and drill down to next H3 resolution

Clicking an H3 cluster now flies the camera to that location at
an altitude that triggers the next resolution level (res4→res6→res8).
The zoom watcher then automatically loads finer detail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Redesign progressive globe: side panel + global data + info-only clicks

- Side-by-side layout: globe left, live info panel right (always visible)
- Load full H3 files globally (no viewport filtering) — no gaps when panning
- Click shows cluster info + nearby samples in side panel (no camera fly-to)
- Zoom watcher switches resolution automatically: res4 → res6 → res8
- Stats, legend, cluster card, and sample list all in side panel

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix infinite loop: replace OJS reactivity with imperative DOM updates

The side panel was causing a reactive cycle:
  globeStatus change → sidePanel re-render → layout re-render →
  viewer re-create → phase1 re-run → globeStatus change → loop

Fix: all side panel content is static HTML. Stats, cluster card,
and sample list are updated via getElementById/innerHTML only.
No OJS mutable variables, no reactive cascade.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add sub-res8 individual sample drill-down to progressive globe

- New 4th zoom tier: below 120km altitude, switches from H3 clusters
  to individual sample points loaded from lite parquet (60MB vs 280MB)
- Two-stage sample card: instant metadata from lite file, lazy-loaded
  description from full wide parquet on click
- Viewport caching with 30% padding for smooth panning
- Stale-request guards for async camera/query flows
- Hysteresis thresholds (120km enter / 180km exit) to prevent flicker
- Separate PointPrimitiveCollection for samples vs clusters
- Cluster click queries now use lite parquet instead of wide (5x faster)

Data files on R2:
- isamples_202601_samples_map_lite.parquet (60MB, 6M rows, 9 columns)
- Still uses H3 summary files for res4/6/8 cluster view
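
The enter/exit hysteresis described above can be sketched as a small state function. This is an illustration only, not the explorer's code: `nextViewMode`, the mode names, and the constant names are hypothetical, and only the 120 km and 180 km values come from the commit message.

```javascript
// Hypothetical names; only the 120 km / 180 km thresholds come from the commit message.
const ENTER_SAMPLES_KM = 120; // switch clusters -> samples below this altitude
const EXIT_SAMPLES_KM = 180;  // switch samples -> clusters above this altitude

// Returns the view mode for the next frame given the current mode and camera altitude.
function nextViewMode(currentMode, altitudeKm) {
  if (currentMode === 'clusters' && altitudeKm < ENTER_SAMPLES_KM) return 'samples';
  if (currentMode === 'samples' && altitudeKm > EXIT_SAMPLES_KM) return 'clusters';
  return currentMode; // between 120 km and 180 km the mode is left unchanged
}
```

Because the two thresholds differ, a camera hovering around 150 km keeps whichever mode it is already in, which is what prevents the flicker.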

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix bugs from Codex review: deadlock, schema mismatch, timing

- loadRes: wrap in try/catch/finally so `loading` flag always resets
  on query failure (was permanent deadlock — finding #2)
- Schema fix: the cluster-click query used `n as source`, but the lite
  parquet has a column named `source` (finding #4)
- Remove unnecessary ORDER BY on H3 loads (finding #8)
- Use .pop() instead of [0] for performance timing entries (finding #11)
- Add rel="noopener noreferrer" to target="_blank" link (finding #7)

Deferred: XSS escaping (trusted data), antimeridian handling,
detail click caching, startup error fallback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add progressive globe to sidebar navigation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix cluster-click query: remove description column missing from lite parquet

The samples_map_lite.parquet doesn't have a description column.
Use place_name for nearby sample cards instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add URL state encoding for shareable deep links

- Hash-based URL state: lat, lng, alt, heading, pitch, mode, pid
- v=1 schema versioning for future compatibility
- parseNum with Number.isFinite (avoids lat=0 bug from Codex review)
- replaceState for continuous camera movement, pushState for mode
  transitions and sample/cluster selection
- Browser back/forward via hashchange listener with flight animation
- Suppress flag prevents hash write loops during navigation restore
- Deep-link startup: fly to position and restore sample card from pid
- Share View button copies current URL to clipboard with toast
- pid takes precedence over h3 (canonicalized on write)
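
A minimal sketch of the hash round-trip described above, assuming the state lives in a plain object. The field names follow the commit message; the function bodies are assumptions, and `heading`, `pitch`, and `mode` are omitted for brevity.

```javascript
// Sketch only: field names follow the commit message; function bodies are assumptions.

// Parse a numeric field; returns fallback when the field is missing, empty, or not finite.
// A truthiness test such as `parseFloat(s) || fallback` would wrongly discard lat=0.
function parseNum(raw, fallback = null) {
  if (raw === null || raw.trim() === '') return fallback;
  const n = Number(raw);
  return Number.isFinite(n) ? n : fallback;
}

// Read state from a location.hash-style string such as "#v=1&lat=0&lng=32.8".
function readHash(hash) {
  const p = new URLSearchParams(hash.replace(/^#/, ''));
  return {
    lat: parseNum(p.get('lat')),
    lng: parseNum(p.get('lng')),
    alt: parseNum(p.get('alt')),
    pid: p.get('pid'),
    h3: p.get('pid') ? null : p.get('h3'), // pid takes precedence over h3
  };
}

// Write state back out; v=1 marks the schema version.
function buildHash(state) {
  const p = new URLSearchParams({ v: '1' });
  for (const key of ['lat', 'lng', 'alt']) {
    if (Number.isFinite(state[key])) p.set(key, String(state[key]));
  }
  if (state.pid) p.set('pid', state.pid);
  else if (state.h3) p.set('h3', state.h3);
  return '#' + p.toString();
}
```

In a browser, the result of `buildHash` would be passed to `history.replaceState` during camera movement and to `history.pushState` on mode or selection changes, as the list above describes.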

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix startup crash: move _initialHash before globalRect block that reads it

v._initialHash was set after the once() closure that references it,
causing undefined.lat TypeError on page load.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
rdhyee added a commit that referenced this pull request Apr 10, 2026
* Add cross-filtering to Explorer facet counts

When any filter is active, facet counts now reflect the intersection
of all OTHER active filters. For example, selecting SESAR as source
updates material/context/specimen counts to show only what exists
in SESAR data. Uses parallel GROUP BY queries via DuckDB-WASM.

Counts update via DOM manipulation to avoid resetting checkbox
selections. Zero-count facet values are dimmed for visual clarity.
When no filters are active, pre-computed summaries are used (instant).
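
The counting rule (each facet is counted under every active filter except its own) can be illustrated in memory. This is a sketch under stated assumptions: the explorer runs GROUP BY queries in DuckDB-WASM rather than looping over arrays, and `crossFilteredCounts` and the sample records are hypothetical.

```javascript
// In-memory illustration; the explorer runs GROUP BY queries in DuckDB-WASM instead.
// `records` is an array of plain objects; `filters` maps facet name -> array of selected values.
function crossFilteredCounts(records, filters, facets) {
  const counts = {};
  for (const facet of facets) {
    counts[facet] = {};
    for (const rec of records) {
      // Apply every active filter except the one on the facet being counted.
      const passes = Object.entries(filters).every(([name, selected]) =>
        name === facet || selected.length === 0 || selected.includes(rec[name]));
      if (passes) counts[facet][rec[facet]] = (counts[facet][rec[facet]] || 0) + 1;
    }
  }
  return counts;
}

// Made-up records for illustration.
const records = [
  { source: 'SESAR', material: 'rock' },
  { source: 'SESAR', material: 'rock' },
  { source: 'GEOME', material: 'organicmaterial' },
  { source: 'OPENCONTEXT', material: 'rock' },
];
const counts = crossFilteredCounts(records, { source: ['SESAR'] }, ['source', 'material']);
```

With SESAR selected, the material counts only include SESAR records, while the source counts still cover every source, so the user can see what switching the selection would yield.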

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix cross-filtering: use pre-computed cache + correct column mapping

- Add 6KB pre-computed cross-filter cache for instant single-filter lookups
- Add 21MB sample_facets view with URI-string columns for on-the-fly fallback
- Fix column name mismatch: wide parquet has p__* BIGINT[] columns, but
  facet values are URI strings — cross-filter now queries sample_facets
- Main whereClause uses pid subquery against sample_facets for facet filters
- Source filter still queries wide parquet directly (n column is correct)

Supplementary files on data.isamples.org:
- isamples_202601_facet_cross_filter.parquet (6 KB, 526 rows)
- isamples_202601_sample_facets_v2.parquet (21 MB, 6M rows)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix three cross-filter bugs

1. Multi-value within single facet: fast path now requires exactly
   one value in the active facet, not just one active dimension.
   Multiple selections (e.g., SESAR+GEOME) correctly fall through
   to on-the-fly queries.

2. Text search participates in cross-filtering: buildCrossFilterWhere
   now includes ILIKE conditions. sample_facets_v2 regenerated with
   label, description, place_name columns (63 MB on R2).

3. Clearing filters restores baseline counts: the update cell now
   resets all facet-count labels to baseline values and removes
   zero-count dimming when crossFilteredFacets is null.
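
The fast-path condition from item 1 can be sketched as a predicate. `canUseCrossFilterCache` is a hypothetical helper, and treating any text search as a reason to skip the cache is an assumption inferred from item 2, not something the commit states outright.

```javascript
// Hypothetical helper: decides whether the pre-computed single-filter cache can answer.
// `filters` maps facet name -> array of selected values; `searchText` is the free-text query.
function canUseCrossFilterCache(filters, searchText = '') {
  if (searchText.trim() !== '') return false; // assumption: text search needs an on-the-fly query
  const active = Object.values(filters).filter((values) => values.length > 0);
  // Exactly one active facet AND exactly one selected value within it.
  return active.length === 1 && active[0].length === 1;
}
```

A selection such as SESAR+GEOME has one active facet but two values, so it falls through to the on-the-fly query.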

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Fix count universe inconsistency and blank-value mismatch

Codex review found two bugs:

1. facet_summaries counted all 6.68M records but sample_facets only
   had the 5.98M with coordinates — counts jumped when toggling filters.
   Regenerated all three parquet files from the same base universe
   (lat IS NOT NULL). SESAR now consistently 4,389,231 across all files.

2. Baseline summaries included blank-string facet values, but on-the-fly
   queries excluded them with != ''. Regenerated summaries now exclude
   blanks, matching the on-the-fly behavior.

Also: removed dead getDisplayCounts(), fixed stale 0.3MB comment,
added missing quote escaping on source cache lookup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add cross-filtering interaction tests

5 new tests in TestExplorerCrossFiltering:
- Baseline SESAR count matches summaries (>4M)
- Clicking source updates material counts (organicmaterial decreases)
- Clearing filter restores baseline counts
- Zero-count items get dimmed (opacity < 1)
- New parquet endpoints (cross_filter, sample_facets_v2) reachable

Cross-filter tests gracefully skip if data attributes not yet deployed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Clean up blank-string facet values in sample_facets

Convert blank strings to NULL with NULLIF in sample_facets_v2 generation
(586 blank context rows → NULL). Remove redundant != '' guards from
on-the-fly queries since IS NOT NULL now handles both.

Addresses Codex finding #2: blank values in sample_facets caused state
mismatch with baseline summaries (which correctly excluded blanks).
Finding #1 (count universe mismatch) was a false positive — Codex
cached stale files; live CDN has consistent counts across all three
artifacts (SESAR=4,389,231, total=5,980,282).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
rdhyee added a commit that referenced this pull request May 9, 2026
…ase 1) (#186)

* explorer: persist selected cluster identity in URL via &h3=<cell>

Phase 1 of EXPLORER_CLUSTER_URL_PROPOSAL.md (#182). Cluster selection now
round-trips through the URL hash, complementing the existing &pid= for
samples. Use case: share or bookmark a specific cluster you clicked, and
have collaborators land on the same H3 cell with side-panel populated.

Encoding:

  #h3=843f6d3ffffffff

H3 cell index in canonical 15-char hex (no 0x prefix). The cell index
encodes its own resolution; no separate &res= or &cluster_source= field
needed. The existing &sources= filter (already URL-persisted) covers the
only filter that affects cluster aggregation — material/context/object_type
filters can't, per the comment at :1706-1710.

Mechanics:

- h3_cell carried into the runtime cluster .id at both add() sites
  (:992, :1335) as a hex string via row.h3_cell.toString(16). The parquet
  column is UBIGINT; converting to hex once at ingestion keeps the URL
  representation canonical.
- _globeState.selectedH3 added; mutated by cluster-click (:923) and
  cleared by sample-click (:895) for mutual exclusion. Same pattern as
  selectedPid.
- readHash parses h3 (:626); buildHash emits h3 when set (:645).
- fetchClusterByH3 helper at :1791 looks up the row across res4/res6/res8
  parquets via UNION ALL. DuckDB-WASM doesn't accept 0x... literals, so
  hex is converted to decimal in JS via BigInt and CAST AS UBIGINT in SQL.
- hydrateClusterUI helper at :1827 mirrors the cluster-click side-panel +
  nearby-samples query, called from both the boot deep-link (:2266) and
  the back/forward hashchange handler (:1899).
- Mutual-exclusion at hydration time: &pid= wins if both are present, per
  the proposal §4.
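
The hex-to-decimal step can be sketched as follows. The helper name and the exact SQL text are assumptions; only the BigInt conversion and the `CAST AS UBIGINT` idea come from the commit message. The input is assumed to be already-validated hex, since `BigInt` throws a SyntaxError on anything else.

```javascript
// Sketch: turn a 15-char hex H3 token into a decimal literal for SQL.
// The helper name and query shape are assumptions; h3_cell is the column named in the commit.
function h3HexToUbigintSQL(hex) {
  const decimal = BigInt('0x' + hex).toString(10); // exact, no floating-point rounding
  return `CAST('${decimal}' AS UBIGINT)`;
}

const where = `WHERE h3_cell = ${h3HexToUbigintSQL('843f6d3ffffffff')}`;
```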

EXPLORER_STATE.md §2 updated with the new h3 row.

Verified locally:
- URL #h3=843f6d3ffffffff (a known res4 cell with 151,334 OpenContext
  samples in central Turkey) round-trips: side panel shows 'Selected
  Cluster / OpenContext / H3 res4 / 151,334 samples / 37.6619, 32.8334'
  with 30 nearby samples loaded.
- Empty hash + hash without h3/pid both load without errors.

Closes Phase 1 of #182. Phase 2 (unified &sel=) deferred unless a third
selection type appears.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: address Codex review of #186

Five fixes from Codex review:

1. Race-safe hash hydration (BLOCKER)
   Both pid and h3 hashchange branches now use a monotonic `viewer._selGen`
   token, bumped per hashchange and rechecked after every await. Fast
   back/forward across pid/h3/empty no longer lets stale fetch results
   repaint the side panel.

2. Strict h3 validation
   Replaced `replace(/[^0-9a-fA-F]/g, '')` with `/^[0-9a-f]{15}$/i.test()`
   over a lowercased input. Reject malformed input rather than silently
   strip — `h3=xxx843f...` no longer becomes a different lookup key.

3. Canonical lowercase normalization
   After successful lookup, runtime `selectedH3` is set from the parquet
   row's `h3_cell.toString(16)` (always lowercase), not the raw URL token.
   Subsequent `buildHash` writes always emit canonical form regardless of
   what the user typed. Boot deep-link applies the same normalization.

4. Resolution routing instead of UNION ALL
   Canonical H3 cells encode resolution in the 2nd hex char (after the
   leading-zero strip). `RES_TO_H3_URL[parseInt(lower[1], 16)]` picks the
   right parquet directly — one fetch instead of three on every &h3=
   load.

5. Mutual-exclusion in buildHash
   Changed independent `if`s for `selectedPid` / `selectedH3` to `else if`,
   making the runtime invariant load-bearing in one place.

Also: unknown / malformed h3 now actively clears the cluster card and
nearby-samples list, matching the empty-hash and missing-pid paths
(previously left stale content).
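
Fixes 2 to 4 combine naturally into one parsing step. The sketch below assumes a hypothetical `parseH3Token` helper and placeholder file names in `RES_TO_H3_URL`; the real table's values are not shown in the commit message.

```javascript
// Hypothetical URL table; the real RES_TO_H3_URL values are not shown in the commit message.
const RES_TO_H3_URL = { 4: 'h3_res4.parquet', 6: 'h3_res6.parquet', 8: 'h3_res8.parquet' };

// Returns { hex, res, url } for a well-formed token, or null to reject it.
function parseH3Token(raw) {
  const lower = String(raw).toLowerCase();
  if (!/^[0-9a-f]{15}$/.test(lower)) return null; // reject, never strip
  const res = parseInt(lower[1], 16);              // 2nd hex char carries the resolution
  const url = RES_TO_H3_URL[res];
  return url ? { hex: lower, res, url } : null;    // no summary file for this resolution
}
```

For example, `parseH3Token('843F6D3FFFFFFFF')` yields the lowercase token with `res` 4, while `xxx843f...` or `zzz_NOT_HEX_zzz` is rejected outright instead of being stripped into a different key.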

Verified locally:
- Uppercase #h3=843F6D3FFFFFFFF — hydrates, then runtime canonicalizes.
- Unknown well-formed cell #h3=843ffffffffffff — side panel clears, no
  errors.
- Non-hex #h3=zzz_NOT_HEX_zzz — silent reject, no JS errors.
- Known #h3=843f6d3ffffffff — round-trips identically.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: thread freshness check into hydrateClusterUI; bump _selGen earlier

Codex's second review found the previous race fix was incomplete:
hydrateClusterUI has its own internal `await db.query(...)` for the nearby-
samples list, then calls updateSamples(samples). The hashchange-handler-
side selGen check happened only AFTER hydrateClusterUI returned, so a
stale fetch INSIDE hydrateClusterUI could still repaint the side panel
with samples for an older h3 selection.

Fix: hydrateClusterUI now accepts an optional `isStale` predicate and
checks it after its inner await, before updateSamples (and before the
catch-path's "Query failed" message). The hashchange caller passes
`() => selGen !== viewer._selGen`. The cluster-click and boot-deep-link
callers leave it undefined — clicks are user-serialized and there's only
one boot, so no race possible there.
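
The generation-token pattern behind `_selGen` and the `isStale` predicate can be sketched independently of the viewer. `createSelectionGuard` and `hydrate` are hypothetical names; the real code keeps the counter on `viewer._selGen` and passes the predicate into `hydrateClusterUI`.

```javascript
// Sketch of a generation-token guard; `createSelectionGuard` is a hypothetical name.
function createSelectionGuard() {
  let generation = 0; // plays the role of viewer._selGen
  return {
    // Call at the start of each navigation; returns a predicate for that navigation.
    begin() {
      const mine = ++generation;
      return () => mine !== generation; // true once a newer navigation has started
    },
  };
}

// An async handler checks the predicate after every await before touching the DOM.
async function hydrate(guard, fetchSamples, render) {
  const isStale = guard.begin();
  const samples = await fetchSamples();
  if (isStale()) return false; // a newer navigation won; drop this result
  render(samples);
  return true;
}
```

The key property is that the check sits after the `await` and before the render, so a slow fetch for an older selection can never repaint the panel.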

Also (Codex non-blocking nits):
- Bump `_selGen` at the very top of the hashchange handler, before the
  lat/lng early return — so even hashchanges that lack lat/lng invalidate
  any in-flight stale work.
- Reject non-cell H3 modes (`lower[0] !== '8'`) in fetchClusterByH3 —
  defensive guard against edges/vertices/etc. ever ending up in `&h3=`.

Verified locally: known-good `#h3=843f6d3ffffffff` round-trips identically
(151,334 OpenContext samples, 30 nearby rendered).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: address Codex review v3 — boot race + source-filter consistency

Two P2 findings from Codex's third pass on PR #186:

1. Boot deep-link could still race with a later hashchange.
   The hashchange listener registers earlier in the same OJS cell, so a
   slow initial &h3= or &pid= lookup can be superseded by browser
   back/forward (or a manual hash edit) during the await — the boot path
   would then finish later and repaint stale data. Apply the same _selGen
   guard to the boot path: bump the token at boot start, capture
   bootSelGen, define isBootStale = () => bootSelGen !== viewer._selGen,
   and check it after every await (pid lookup, wide-parquet description
   fetch, h3 lookup, and inside hydrateClusterUI via the existing
   isStale-predicate parameter).

2. fetchClusterByH3 bypassed the active source filter.
   The cluster lookup did `WHERE h3_cell = ?` without sourceFilterSQL —
   so an &h3= URL whose dominant_source is currently unchecked in
   ?sources= would still hydrate a cluster card for a dot the user can't
   see on the globe. Worse, hydrateClusterUI's nearby-samples query DOES
   apply source filter, producing a mismatched panel: full unfiltered
   cluster card with a filtered-down samples list. Add
   sourceFilterSQL('dominant_source') to the lookup; an excluded source
   now returns null and the side panel stays empty (matching what the
   globe shows).
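
The commit does not show `sourceFilterSQL` itself, so the following is a hypothetical reimplementation that only illustrates how the same helper can serve both the cluster lookup (`dominant_source`) and the nearby-samples query (`source`). The signature, the `clusters` table name, and the return format are assumptions.

```javascript
// Hypothetical reimplementation; the real sourceFilterSQL is not shown in the commit message.
// Returns an SQL fragment restricting `column` to the checked sources, or '' when all are checked.
function sourceFilterSQL(column, checked, allSources) {
  if (checked.length === allSources.length) return '';
  if (checked.length === 0) return ' AND FALSE'; // nothing checked, nothing matches
  const quoted = checked.map((s) => `'${s.replace(/'/g, "''")}'`); // escape single quotes
  return ` AND ${column} IN (${quoted.join(', ')})`;
}

const ALL = ['SESAR', 'OPENCONTEXT', 'GEOME', 'SMITHSONIAN'];
// `clusters` is a placeholder table name.
const lookup =
  'SELECT * FROM clusters WHERE h3_cell = ?' +
  sourceFilterSQL('dominant_source', ['SESAR', 'GEOME', 'SMITHSONIAN'], ALL);
```

With OPENCONTEXT unchecked, a cluster whose `dominant_source` is OPENCONTEXT no longer matches, so the lookup returns no row and the side panel stays empty.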

Verified locally:
- ?sources=SESAR,GEOME,SMITHSONIAN#h3=843f6d3ffffffff (the cluster's
  dominant_source OPENCONTEXT is excluded) → side panel stays empty.
- ?sources= default (all checked) #h3=843f6d3ffffffff → hydrates as
  before with 151,334 OpenContext samples and 30 nearby rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: source-filter invalidates selection; boot finalize in try/finally

Two findings from Codex's fourth review:

1. Source-filter changes don't invalidate selection state.
   When the user unchecked the source for an already-hydrated cluster
   (or sample), the globe correctly hid the dot but the side panel and
   `&h3=` / `&pid=` URL stayed stale. Source filter changes also raced
   against in-flight selection lookups since they didn't bump `_selGen`.

   Fix: in the source-filter change handler (`:1690`), bump `_selGen`
   immediately, then after the existing globe-data reload, re-validate
   the current selection under the new filter:
     - Cluster (selectedH3): re-run fetchClusterByH3 (already honors
       sourceFilterSQL after v3); if returns null, clear selectedH3,
       cluster card, samples list, and rewrite the URL via replaceState.
     - Sample (selectedPid): probe lite_url with the same source filter;
       if no match, clear selectedPid + side panel + URL.
   Both branches re-check `_selGen` after the await to bail if a newer
   filter change has fired.

2. Boot's stale-abort early-returns skipped `_suppressHashWrite = false`.
   A no-lat/lng hashchange during boot's awaits could leave hash writes
   suppressed forever (the lat/lng path clears it via _suppressTimer; a
   stale-aborted boot leaves it set with no later cleanup).

   Fix: wrap the boot deep-link block in try/finally; move the
   `_suppressHashWrite = false` assignment into the finally so it runs
   on every path, including stale-abort early returns.

Verified locally:
- Load #h3=843f6d3ffffffff (OpenContext cluster); side panel hydrates.
- Uncheck OPENCONTEXT in the source filter → `&h3=` drops from URL,
  cluster card returns to empty state, samples list clears, ?sources=
  written with the remaining 3 sources. Globe also re-renders without
  OpenContext clusters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* explorer: fix UBIGINT precision-loss in h3_cell + rehydrate cluster on filter

Two issues from Codex's fifth review:

1. (P2 NEW) Selected cluster surviving the filter wasn't being rehydrated.
   When the user toggled a non-cluster source (e.g. unchecked SESAR while
   the selected cluster's dominant_source = OPENCONTEXT), the cluster
   stayed in URL but the nearby-samples list could now show stale rows
   from unchecked sources or miss newly-checked ones (hydrateClusterUI's
   nearby query uses sourceFilterSQL('source')).

   Fix: in the source-filter handler's revalidate branch, when meta is
   truthy (cluster still valid), call hydrateClusterUI(meta, isStale) to
   refresh the side panel under the new filter — not just leave it.

2. (UBIGINT precision regression — surfaced by testing #1) DuckDB-WASM
   returns h3_cell (UBIGINT > 2^53) as a JS Number, which loses precision
   on .toString(16). Boot worked because the SQL WHERE matched at the
   parquet level, but `selectedH3 = meta.h3_cell` (lossy roundtrip)
   stored a corrupted hex; subsequent revalidations against the corrupted
   key would never match and the panel would clear. The bug was latent
   in PR-as-of-ebd7978; the rehydrate branch above made it visible.

   Fix: SQL SELECT now CASTs h3_cell to VARCHAR (decimal string), and JS
   converts to hex via BigInt(decString).toString(16) — no precision
   loss. Applied at the two cluster-render sites (phase1, loadRes).
   fetchClusterByH3's return now uses the validated input `lower` as the
   canonical hex so the helper is also lossless.

   `to_hex()` doesn't exist in DuckDB-WASM (tried first; it errored with
   "Catalog Error: Scalar Function with name to_hex does not exist!").
   The VARCHAR cast + JS BigInt approach is portable across versions.
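
The precision loss is easy to reproduce outside DuckDB. The sketch below uses the res4 cell quoted throughout this PR and compares the lossy Number path with the BigInt path; the variable names are illustrative.

```javascript
// A res4 cell from the commit message, as the decimal string a VARCHAR cast would return.
const hex = '843f6d3ffffffff';
const decString = BigInt('0x' + hex).toString(10);

// Lossy path: a JS Number has a 53-bit significand, so the low bits are rounded away.
const viaNumber = Number(decString).toString(16);

// Lossless path: parse the decimal string with BigInt, then format as hex.
const viaBigInt = BigInt(decString).toString(16);

// The value is above Number.MAX_SAFE_INTEGER, which is why the Number path cannot be exact.
const exceedsSafeRange = BigInt(decString) > BigInt(Number.MAX_SAFE_INTEGER);
console.log({ exceedsSafeRange, numberMatches: viaNumber === hex, bigIntMatches: viaBigInt === hex });
```

Only the BigInt path reproduces the original token, which is why storing a key derived from the Number path made later revalidations miss.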

Verified locally:
- Boot at #h3=843f6d3ffffffff hydrates correctly.
- Uncheck SESAR (OPENCONTEXT survives): cluster card unchanged, samples
  re-rendered with only OpenContext rows, &h3= preserved.
- Uncheck OPENCONTEXT (cluster's own source): card + samples cleared,
  &h3= dropped from URL.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* docs: sync EXPLORER_STATE.md h3 row to current implementation

The only finding from Codex's sixth review (P3, non-runtime): the EXPLORER_STATE.md
description still reflected the original v1 implementation:
- "regex `[^0-9a-fA-F]` strip" → now strict `/^[0-9a-f]{15}$/i` reject-not-strip
- "UNION ALL across all 3 parquets" → now resolution-routed via RES_TO_H3_URL
- Missing: cell-mode guard (`lower[0] === '8'`)
- Missing: source filter applied (sourceFilterSQL('dominant_source'))
- Missing: UBIGINT precision-loss workaround (CAST AS VARCHAR + BigInt)
- Missing: source-filter change re-validation
- Missing: _selGen race guard

Updated the h3 row to describe current behavior so future URL-state work
finds accurate docs.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rdhyee added a commit that referenced this pull request May 13, 2026
Codex flagged three blockers on the cumulative M-1A + M-1B diff:

1. Mobile overlay collided with the Cesium toolbar. At max-width: 900px
   the overlay was switching to left: 8px, covering the toolbar column
   at left: 5px. Removed the mobile left-shift; keep the overlay offset
   at left: 50px so the vertical toolbar remains hit-targetable. Width
   adjusts to calc(100% - 58px) instead.

2. Base-layer picker dropdown could be occluded by the overlay
   (dropdown opens at left: 36px; overlay starts at left: 50px). Bump
   .cesium-baseLayerPicker-dropDown z-index to 1100 so it wins the
   z-stack against the overlay's 1000.

3. No automated coverage for overlay-vs-toolbar collision. Add
   tests/playwright/explorer-map-overlay.spec.js with four specs:
   desktop overlap, mobile overlap, dropdown z-index ordering, and a
   smoke test that the sidebar↔in-map input mirror is bidirectional.

Also picked up the small IME concern from Codex ask #1: gate both Enter
handlers (in-map and sidebar) on `!e.isComposing && e.keyCode !== 229`
so IMEs that emit Enter to commit a candidate don't trigger a search on
the pre-commit value.
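
The IME gate can be expressed as a small predicate shared by both handlers. `shouldSubmitSearch` is a hypothetical name; the two composition checks are the ones quoted above.

```javascript
// Sketch: returns true only for an Enter keypress that is not part of IME composition.
// `shouldSubmitSearch` is a hypothetical name; the two checks come from the commit message.
function shouldSubmitSearch(e) {
  if (e.key !== 'Enter') return false;
  // isComposing covers modern browsers; keyCode 229 is the legacy "IME is processing" signal.
  return !e.isComposing && e.keyCode !== 229;
}
```

A keydown listener on either input would call the search only when this returns true, so an Enter that commits an IME candidate is ignored.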

Non-blocker items (URL-clears-search edge case, aria-label on
#sampleSearch, aria-describedby for hints) deferred — to be picked up
in the EXPLORER_STATE.md / a11y commit at the end of the PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
