How the Interactive Explorer Queries Work
A walkthrough of the data architecture and SQL queries powering the Interactive Explorer, for contributors who want to understand or extend it.
Source code: tutorials/isamples_explorer.qmd
Architecture: Two-Tier Loading
The Explorer uses DuckDB-WASM — a full SQL engine running in the browser. No server is involved. It queries parquet files on data.isamples.org using HTTP range requests (only downloads the bytes it needs).
Tier 1 — Instant facet counts (2 KB):
data.isamples.org/isamples_202601_facet_summaries.parquet
Pre-computed counts for each filter category (source, material, sampled feature, specimen type). Loads in milliseconds so the sidebar filters appear immediately.
Tier 2 — Full sample data (278 MB, but only partial bytes transferred):
data.isamples.org/isamples_202601_wide.parquet
The wide-format PQG file with 20M rows. DuckDB-WASM uses range requests to read only the columns and rows matching the current filters — typically a few MB of actual transfer even though the file is 278 MB.
The Key Queries
1. Facet summaries (Tier 1)
Source: lines 346–357
SELECT * FROM read_parquet('https://data.isamples.org/isamples_202601_facet_summaries.parquet')
Returns rows like:
| facet_type |
facet_value |
count |
| source |
SESAR |
3,428,639 |
| source |
OPENCONTEXT |
1,096,127 |
| material |
Rock |
1,234,567 |
| context |
Earth interior |
890,123 |
| object_type |
Physical specimen |
2,345,678 |
These populate the checkbox counts in the sidebar before the big parquet file loads.
2. Create a SQL view over the remote parquet
Source: lines 311–316
CREATE VIEW samples AS
SELECT * FROM read_parquet('https://data.isamples.org/isamples_202601_wide.parquet')
This doesn't download the file — it registers it as a virtual table. DuckDB-WASM only fetches data when you actually query it.
3. Dynamic WHERE clause from filters
Source: lines 388–434
The WHERE clause is built dynamically from the current UI state:
-- Always applied (base conditions):
otype = 'MaterialSampleRecord'
AND latitude IS NOT NULL
-- Text search (when user types in search box):
AND (label ILIKE '%pottery%'
OR description ILIKE '%pottery%'
OR CAST(place_name AS VARCHAR) ILIKE '%pottery%')
-- Source filter (checkbox selection):
AND n IN ('SESAR', 'OPENCONTEXT')
-- Material filter:
AND has_material_category IN ('Rock', 'Ite Biology')
-- Sampled feature filter:
AND has_context_category IN ('Earth interior')
-- Specimen type filter:
AND has_specimen_category IN ('Physical specimen')
Key column mappings in the wide parquet:
n → source name (SESAR, OPENCONTEXT, GEOME, SMITHSONIAN)
otype → entity type (MaterialSampleRecord, SamplingEvent, etc.)
has_material_category → material classification
has_context_category → sampled feature / context
has_specimen_category → specimen type
label, description, place_name → text-searchable fields
4. Count matching records
Source: lines 447–455
SELECT COUNT(*) as count
FROM samples
WHERE {whereClause}
5. Load sample data for display
Source: lines 460–500
SELECT
row_id, pid, label,
COALESCE(description, '') as description,
latitude, longitude,
n as source, place_name
FROM samples
WHERE {whereClause}
ORDER BY RANDOM()
LIMIT {maxSamples} -- default 25,000
The ORDER BY RANDOM() ensures a representative geographic spread rather than returning the first N rows (which might cluster in one region). The LIMIT caps browser memory usage.
Data Flow Summary
User adjusts filters
↓
Observable reactivity rebuilds WHERE clause
↓
DuckDB-WASM sends HTTP range requests to data.isamples.org
↓
Only matching columns/row groups are downloaded (~2-10 MB)
↓
Results rendered as Globe (Cesium) / Table / List
↓
URL params updated for bookmarkability
Data Files Reference
All files served from Cloudflare R2 via data.isamples.org with range request + CORS support.
How to Extend
Add a new filter:
- Add a new facet type to the
facet_summaries parquet (generated by PQG pipeline)
- Add a checkbox widget in the Filters section
- Add a condition to the
whereClause builder
Add a new view mode:
- Add option to the
viewMode radio selector
- Add rendering branch in the results display block
Change data source: Update parquet_url and facet_summaries_url at the top of the file.
How the Interactive Explorer Queries Work
A walkthrough of the data architecture and SQL queries powering the Interactive Explorer, for contributors who want to understand or extend it.
Source code:
tutorials/isamples_explorer.qmdArchitecture: Two-Tier Loading
The Explorer uses DuckDB-WASM — a full SQL engine running in the browser. No server is involved. It queries parquet files on
data.isamples.orgusing HTTP range requests (only downloads the bytes it needs).Tier 1 — Instant facet counts (2 KB):
Pre-computed counts for each filter category (source, material, sampled feature, specimen type). Loads in milliseconds so the sidebar filters appear immediately.
Tier 2 — Full sample data (278 MB, but only partial bytes transferred):
The wide-format PQG file with 20M rows. DuckDB-WASM uses range requests to read only the columns and rows matching the current filters — typically a few MB of actual transfer even though the file is 278 MB.
The Key Queries
1. Facet summaries (Tier 1)
Source: lines 346–357
Returns rows like:
These populate the checkbox counts in the sidebar before the big parquet file loads.
2. Create a SQL view over the remote parquet
Source: lines 311–316
This doesn't download the file — it registers it as a virtual table. DuckDB-WASM only fetches data when you actually query it.
3. Dynamic WHERE clause from filters
Source: lines 388–434
The WHERE clause is built dynamically from the current UI state:
Key column mappings in the wide parquet:
n→ source name (SESAR, OPENCONTEXT, GEOME, SMITHSONIAN)otype→ entity type (MaterialSampleRecord, SamplingEvent, etc.)has_material_category→ material classificationhas_context_category→ sampled feature / contexthas_specimen_category→ specimen typelabel,description,place_name→ text-searchable fields4. Count matching records
Source: lines 447–455
5. Load sample data for display
Source: lines 460–500
The
ORDER BY RANDOM()ensures a representative geographic spread rather than returning the first N rows (which might cluster in one region). TheLIMITcaps browser memory usage.Data Flow Summary
Data Files Reference
facet_summaries.parquetwide.parquetwide_h3.parquetsamples_map_lite.parquetAll files served from Cloudflare R2 via
data.isamples.orgwith range request + CORS support.How to Extend
Add a new filter:
facet_summariesparquet (generated by PQG pipeline)whereClausebuilderAdd a new view mode:
viewModeradio selectorChange data source: Update
parquet_urlandfacet_summaries_urlat the top of the file.