iSamples MVP Cleanup & Simplification Strategy #49

@rdhyee

Overview

This issue documents the cleanup strategy for wrapping up the current round of the iSamples project. The goal is to simplify to an MVP first, then refine.

Date: 2026-01-29
Scope: 3 repositories (isamplesorg-metadata, isamples-python/examples, isamplesorg.github.io)


The Stack at a Glance

┌─────────────────────────────────────────────────────────────────────┐
│                        USER EXPERIENCE                               │
├──────────────────────────────┬──────────────────────────────────────┤
│  isamplesorg.github.io       │  isamples-python (examples)          │
│  (Browser - Zero Install)    │  (Jupyter - Developer)               │
│  • isamples_explorer.qmd     │  • isamples_explorer.ipynb           │
│  • parquet_cesium_*.qmd      │  • geoparquet.ipynb                  │
│  • DuckDB-WASM + Cesium      │  • DuckDB + Lonboard                 │
└──────────────────────────────┴──────────────────────────────────────┘
                                    │
                          ┌─────────▼─────────┐
                          │   DATA LAYER      │
                          │   Cloudflare R2   │
                          │   Wide: 280MB     │
                          │   Narrow: 850MB   │
                          └─────────┬─────────┘
                                    │
┌───────────────────────────────────▼─────────────────────────────────┐
│                     METADATA STANDARD                                │
│                   isamplesorg-metadata                               │
│  • 8 Entity Types (MaterialSampleRecord, SamplingEvent, etc.)       │
│  • 14 Predicates (produced_by, has_material_category, etc.)         │
│  • JSON Schema, JSON-LD, SKOS vocabularies                          │
└─────────────────────────────────────────────────────────────────────┘

MVP Definition: What to Keep

Tier 1: Essential (The Core Product)

| Component | Location | Purpose |
|-----------|----------|---------|
| Data on R2 | Cloudflare | Wide parquet (280MB) - single source of truth |
| Schema | isamplesorg-metadata | isamples_core.yaml + JSON Schema |
| Browser Explorer | isamplesorg.github.io | isamples_explorer.qmd - main discovery UX |
| 3D Globe | isamplesorg.github.io | parquet_cesium_isamples_wide.qmd |
| Jupyter Explorer | isamples-python | isamples_explorer.ipynb - developer entry point |
| Visualization Patterns | isamples-python | geoparquet.ipynb - Lonboard patterns |

Tier 2: Educational (Keep for Learning)

| Component | Location | Purpose |
|-----------|----------|---------|
| PQG Demo | isamples-python | pqg_demo.ipynb - property graph queries |
| Schema Comparison | isamples-python | schema_comparison.ipynb - narrow vs wide |
| SQL Deep Dive | isamplesorg.github.io | zenodo_isamples_analysis.qmd |
| Graph Documentation | isamplesorg-metadata | src/docs/UNDERSTANDING_THE_GRAPH.md |

Cleanup Actions

✅ Completed: isamples-python (PR #2 merged)

  • Archived defunct API client → archive/defunct-api-client/
  • Archived export parquet tools → archive/export-parquet-tools/
  • Updated pyproject.toml to examples-only repo
  • Updated README, CLAUDE.md, STATUS.md

🔲 TODO: isamples-python (remaining)

  • Remove playwright/ directory (-15MB node_modules cruft)
  • Clean .ipynb_checkpoints/ across examples
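The checkpoint cleanup can be scripted. The following is a minimal sketch, not part of the repo: the function names `find_checkpoint_dirs` and `remove_checkpoint_dirs` are hypothetical, and the dry-run default is an assumption chosen so that nothing is deleted until the list has been reviewed.

```python
import shutil
from pathlib import Path


def find_checkpoint_dirs(root: Path) -> list[Path]:
    """Return every .ipynb_checkpoints directory under root, sorted by path."""
    return sorted(p for p in root.rglob(".ipynb_checkpoints") if p.is_dir())


def remove_checkpoint_dirs(root: Path, dry_run: bool = True) -> list[Path]:
    """Report checkpoint directories under root; delete them when dry_run is False.

    Hypothetical helper: the directory list is collected before anything is
    removed, and ignore_errors covers a nested directory whose parent was
    already deleted earlier in the loop.
    """
    found = find_checkpoint_dirs(root)
    if not dry_run:
        for path in found:
            shutil.rmtree(path, ignore_errors=True)
    return found
```

Calling `remove_checkpoint_dirs(Path("examples"))` only lists what would go; passing `dry_run=False` performs the deletion. Adding `.ipynb_checkpoints/` to `.gitignore` would keep the directories from coming back into version control.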

🔲 TODO: isamplesorg.github.io

| Action | Impact | Effort |
|--------|--------|--------|
| Delete assets/oc_isamples_pqg.parquet | -724MB | 5 min |
| Archive empty stubs (parquet.qmd, etc.) | Clarity | 10 min |
| Update tutorial index to highlight 3 core tutorials | UX | 15 min |
| Add cross-links to isamples-python | Discovery | 10 min |
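Before deleting the 724MB parquet file, it may be worth checking whether other large binaries are sitting in the working tree. A minimal sketch follows; the helper name `large_files` and the size threshold are assumptions for illustration, and the scan covers the working tree only, not Git history.

```python
from pathlib import Path


def large_files(root: Path, min_bytes: int) -> list[tuple[Path, int]]:
    """Return (path, size_in_bytes) for files under root of at least min_bytes.

    Hypothetical helper: skips anything inside a .git directory and sorts
    the result largest first.
    """
    hits = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if ".git" in path.relative_to(root).parts:
            continue
        size = path.stat().st_size
        if size >= min_bytes:
            hits.append((path, size))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

For example, `large_files(Path("."), 50 * 1024 * 1024)` run from the site repo root would list every file of 50MB or more. Deleting a file from the working tree does not shrink the repository history; that would need a separate history rewrite, which is outside the scope of this sketch.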

🔲 TODO: isamplesorg-metadata

| Action | Impact | Effort |
|--------|--------|--------|
| Add "Related Repositories" section to README | Discovery | 10 min |
| Archive notes/vocabulary/ (moved to separate repo) | Clarity | 5 min |
| Archive examples/APItesting/ | Clarity | 5 min |
| Consolidate README + new PQG docs into "Getting Started" | Onboarding | 30 min |

Cross-Repo Linking

Each repo should include this section in its README:

## Related iSamples Repositories

| Repo | Purpose | Start Here |
|------|---------|------------|
| [isamplesorg-metadata](https://github.com/isamplesorg/metadata) | Schema definition | `src/schemas/isamples_core.yaml` |
| [isamples-python](https://github.com/isamplesorg/examples) | Jupyter examples | `examples/basic/isamples_explorer.ipynb` |
| [isamplesorg.github.io](https://isamplesorg.github.io/) | Browser tutorials | `tutorials/isamples_explorer.qmd` |
| [vocabularies](https://github.com/isamplesorg/vocabularies) | SKOS terms | Material types, context categories |
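To track which repos have adopted the shared section, a small check over README text might be enough. This is a sketch under assumptions: the helper names are hypothetical, and the check only looks for the exact heading line shown above, not for the table contents.

```python
HEADING = "## Related iSamples Repositories"


def has_related_section(readme_text: str) -> bool:
    """True if the README text contains the shared heading on its own line."""
    return any(line.strip() == HEADING for line in readme_text.splitlines())


def repos_missing_section(readmes: dict[str, str]) -> list[str]:
    """Given {repo_name: readme_text}, list the repos that still lack the section."""
    return sorted(
        name for name, text in readmes.items() if not has_related_section(text)
    )
```

In practice the dictionary could be built by reading `README.md` from local checkouts of each repository, for example `{"isamples-python": Path("isamples-python/README.md").read_text()}`.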

Canonical Data URLs

All repos should reference:

# Wide format (primary) - 280MB, 20M rows
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"

# Narrow format (advanced) - 850MB, 106M rows  
NARROW_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet"
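As a quick check that a canonical URL is reachable and queryable, the file can be read in place with DuckDB. The sketch below assumes the `duckdb` Python package is installed and that the R2 bucket allows anonymous HTTPS reads; the helper names `row_count_sql` and `count_rows` are hypothetical.

```python
WIDE_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202601_wide.parquet"
NARROW_URL = "https://pub-a18234d962364c22a50c787b7ca09fa5.r2.dev/isamples_202512_narrow.parquet"


def row_count_sql(url: str) -> str:
    """Build a COUNT(*) query over a remote parquet file (single quotes escaped)."""
    escaped = url.replace("'", "''")
    return f"SELECT COUNT(*) FROM read_parquet('{escaped}')"


def count_rows(url: str) -> int:
    """Run the count against the remote file; needs network access and duckdb."""
    import duckdb  # imported lazily so row_count_sql works without duckdb installed

    con = duckdb.connect()       # in-memory database
    con.execute("INSTALL httpfs")  # extension for reading over HTTP(S)
    con.execute("LOAD httpfs")
    return con.execute(row_count_sql(url)).fetchone()[0]
```

Calling `count_rows(WIDE_URL)` should return the row count of the wide file. A plain `COUNT(*)` can usually be answered from the parquet footer metadata, so it should not need to download the full 280MB.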

The "Bow" Summary

What iSamples MVP delivers:

  1. A domain-agnostic metadata standard for material samples (metadata repo)
  2. 6.7M samples from 4 sources in efficient geoparquet format (R2)
  3. Zero-install browser exploration with Cesium + DuckDB-WASM (website)
  4. Developer-friendly Jupyter examples for custom analysis (python repo)

What makes it work:

  • Single data source (R2 parquet) - no API dependency
  • Consistent schema (8 types, 14 predicates) across all domains
  • Two complementary UIs: browser (discovery) + Jupyter (analysis)
