explorer: 'Samples in View' counter is the fetch budget, not the real count (#201 Part 1) #206

Originally filed as Part 1 of #201. Splitting it out into a dedicated issue because #201 was closed by #203 / #205, which fixed only Part 2.

## Symptom

The "Samples in View" stat box reads exactly 5,000 in dense regions, which is the value of `DEFAULT_POINT_BUDGET` at `explorer.qmd:418`. In Cyprus (lat ≈ 34.99, lng ≈ 33.70), a direct DuckDB query against `data.isamples.org/isamples_202601_samples_map_lite.parquet` returns 23,421 samples in a ±0.1° box, so the counter underreports there by a factor of about 4.7 (23,421 / 5,000). The cluster is a single dense site (almost certainly Polis Excavations, from the OPENCONTEXT source).
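
To reproduce the comparison, the ±0.1° box can be turned into a direct count query and run in DuckDB outside the explorer. A minimal sketch: `boxCountSQL` is a hypothetical helper, the `https://` scheme on the parquet path is an assumption, and DuckDB may need its httpfs extension to read a remote URL. A query of this shape should reproduce the figure above, up to the exact box used.

```javascript
// Hypothetical helper (not in explorer.qmd): build a DuckDB count query for a
// ±halfWidth° box centred on (lat, lng).
function boxCountSQL(url, lat, lng, halfWidth = 0.1) {
  // Round so float noise (e.g. 33.800000000000004) does not leak into the SQL text.
  const r = (x) => Number(x.toFixed(6));
  const south = r(lat - halfWidth), north = r(lat + halfWidth);
  const west = r(lng - halfWidth), east = r(lng + halfWidth);
  return `SELECT count(*) AS n
FROM read_parquet('${url}')
WHERE latitude BETWEEN ${south} AND ${north}
  AND longitude BETWEEN ${west} AND ${east}`;
}

// Assumption: the lite parquet is served over HTTPS at this path.
const LITE_URL = 'https://data.isamples.org/isamples_202601_samples_map_lite.parquet';
const sql = boxCountSQL(LITE_URL, 34.99, 33.70);
console.log(sql);
```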

## Root cause

`explorer.qmd:1530-1538` — the point-mode viewport query:

```sql
SELECT pid, label, source, latitude, longitude, place_name, result_time
FROM read_parquet('${lite_url}')
WHERE latitude BETWEEN ${padded.south} AND ${padded.north}
  AND longitude BETWEEN ${padded.west} AND ${padded.east}
  ${sourceFilterSQL('source')}
  ${facetFilterSQL()}
LIMIT 5000
```

`explorer.qmd:1557`:

```js
updateStats('Samples', cachedData.length, cachedData.length, ..., 'Samples in View', 'Samples in View');
```

`cachedData.length` IS the row count of the LIMIT 5000 result. The counter therefore tops out at 5000 by construction.
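
The cap can be modeled in a couple of lines: the displayed value is the smaller of the true count and the budget. A sketch with hypothetical names (`displayedCount`, `underreportFactor`); the 5,000 and 23,421 figures are the ones from the symptom above.

```javascript
// Model of the current behaviour: the counter shows cachedData.length,
// which can never exceed the LIMIT (the point budget).
function displayedCount(trueInView, budget) {
  return Math.min(trueInView, budget);
}

// How many times larger the real count is than what the counter shows.
function underreportFactor(trueInView, budget) {
  if (trueInView === 0) return 1; // empty view: nothing is hidden
  return trueInView / displayedCount(trueInView, budget);
}

const BUDGET = 5000; // value of DEFAULT_POINT_BUDGET per this report
console.log(displayedCount(23421, BUDGET));               // 5000
console.log(underreportFactor(23421, BUDGET).toFixed(2)); // "4.68"
console.log(displayedCount(1200, BUDGET));                // 1200: sparse regions are exact
```

Below the budget the counter is correct, which is why the bug only shows up in dense regions.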

Secondary smells:
- **No `ORDER BY` before `LIMIT`** → which 5000 rows are returned is unspecified (probably stable for DuckDB on parquet, but not guaranteed).
- **Label says "in View", but the fetch uses a padded (30%) viewport** (`explorer.qmd:1514-1522`). Even without the cap, the number counts the padded box rather than what is on screen.
- `renderSamplePoints` plots all of `cachedData`, including rows outside the actual viewport.
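
The gap between the fetch box and the visible box is easiest to see with concrete bounds. A sketch that assumes the 30% padding is added as 30% of the viewport span on each side; the real formula is at `explorer.qmd:1514-1522` and may differ. `padBounds`, `inBounds`, and the mock rows are illustrative; the `{south, north, west, east}` shape mirrors the `padded.*` fields in the SQL above, and antimeridian wrapping is ignored.

```javascript
// Assumption: padding = frac × span on every side. Check explorer.qmd:1514-1522.
function padBounds(b, frac = 0.3) {
  const dLat = (b.north - b.south) * frac;
  const dLng = (b.east - b.west) * frac;
  return {
    south: b.south - dLat, north: b.north + dLat,
    west: b.west - dLng, east: b.east + dLng,
  };
}

// Inclusive test, matching SQL BETWEEN. Ignores antimeridian wrap.
function inBounds(row, b) {
  return row.latitude >= b.south && row.latitude <= b.north &&
         row.longitude >= b.west && row.longitude <= b.east;
}

const view = { south: 34.9, north: 35.1, west: 33.6, east: 33.8 };
const padded = padBounds(view);
// Mock rows standing in for a fetch result.
const cachedData = [
  { pid: 'a', latitude: 35.0, longitude: 33.7 },   // on screen
  { pid: 'b', latitude: 35.13, longitude: 33.7 },  // in padding only
  { pid: 'c', latitude: 34.95, longitude: 33.55 }, // in padding only
];
const fetched = cachedData.filter((r) => inBounds(r, padded));
const visible = cachedData.filter((r) => inBounds(r, view));
console.log(fetched.length, visible.length); // 3 1
```

Counting `visible` rather than every fetched row would make the number match the "in View" label even before the cap is addressed.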

## Fix directions (from Codex retrospective on #203)

In rough order of effort:

1. **Honest relabel** (cheapest): change the label to "Samples Loaded (max N)" and wire the budget value into the label. Counter stops lying.
2. **Compute real count alongside**: run a `SELECT count(*)` with the same WHERE clause and no LIMIT; this should be cheap on the lite parquet via DuckDB-WASM range reads. Display "X loaded / Y total in view", with an explicit signal when Y > X.
3. **Adaptive aggregation**: if real count > budget, fall back to a cluster-style representation or surface a "too dense to render individually — Y samples here" affordance.
4. **Add `ORDER BY pid`** to the point query so the 5000 subset is at least deterministic across browsers and sessions.
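
Directions 1 through 3 mostly come down to what the stat box shows for a given (loaded, total, budget) triple. A sketch of that decision; `counterLabel` and the `dense` flag are hypothetical, and the wording is only a suggestion.

```javascript
// Decide what the stat box shows. `total` is the uncapped count(*) for the same
// WHERE clause (direction 2); pass null if it has not been computed (direction 1).
function counterLabel(loaded, total, budget) {
  const fmt = (n) => n.toLocaleString('en-US');
  if (total === null) {
    // Direction 1: no real count, so state plainly that the number is capped.
    return { text: `${fmt(loaded)} loaded (max ${fmt(budget)})`, dense: loaded >= budget };
  }
  if (total > loaded) {
    // Direction 2: explicit signal that not every sample is drawn.
    // Direction 3 would switch to an aggregated rendering when `dense` is true.
    return { text: `${fmt(loaded)} loaded / ${fmt(total)} in view`, dense: true };
  }
  return { text: `${fmt(total)} in view`, dense: false };
}

console.log(counterLabel(5000, 23421, 5000).text); // "5,000 loaded / 23,421 in view"
console.log(counterLabel(1200, 1200, 5000).text);  // "1,200 in view"
console.log(counterLabel(5000, null, 5000).text);  // "5,000 loaded (max 5,000)"
```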

Direction 2 (real-count alongside) is probably the right user-visible answer; direction 4 is independent and could ship with any of the others.
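
Directions 2 and 4 both touch the query text. Building the WHERE clause once and reusing it for the capped point fetch and the uncapped count keeps the two numbers consistent by construction. A sketch: `viewportQueries` is hypothetical, `extraFilters` stands in for whatever `sourceFilterSQL('source')` and `facetFilterSQL()` produce (assumed to be fragments starting with `AND`, or empty), and values are interpolated directly, as the existing code does.

```javascript
// Build both viewport queries from one WHERE clause so they cannot drift apart.
function viewportQueries(liteUrl, b, budget, extraFilters = '') {
  const where = `WHERE latitude BETWEEN ${b.south} AND ${b.north}
  AND longitude BETWEEN ${b.west} AND ${b.east}
  ${extraFilters}`;
  return {
    // Direction 4: ORDER BY pid makes the capped subset deterministic.
    points: `SELECT pid, label, source, latitude, longitude, place_name, result_time
FROM read_parquet('${liteUrl}')
${where}
ORDER BY pid
LIMIT ${budget}`,
    // Direction 2: same filters, no LIMIT.
    count: `SELECT count(*) AS n
FROM read_parquet('${liteUrl}')
${where}`,
  };
}

const q = viewportQueries('lite.parquet',
  { south: 34.9, north: 35.1, west: 33.6, east: 33.8 }, 5000,
  "AND source IN ('OPENCONTEXT')"); // hypothetical filter fragment
console.log(q.points);
console.log(q.count);
```

One trade-off to keep in mind: with `ORDER BY pid`, DuckDB has to consider every matching row to pick the first N by `pid`, whereas a bare `LIMIT` can stop early. The subset is then "lowest N pids", which may not be spatially representative.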

## Acceptance

- [ ] Counter accurately represents the in-view sample count, or is unambiguously labeled as a capped/loaded count.
- [ ] Cyprus deep-link (`#v=1&lat=34.9957&lng=33.6798&alt=15212&mode=point`) shows a number that does not silently understate the real density.
- [ ] No regression in cluster-mode "Samples in View" (which already counts viewport intersections correctly).
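
The Cyprus criterion can be turned into an automated check. A sketch, assuming a test harness can read the total the counter shows and run the uncapped count separately; `parseDeepLink` and `understates` are hypothetical, and the hash format is taken from the deep link above.

```javascript
// Parse an explorer deep-link hash such as "#v=1&lat=...&mode=point".
function parseDeepLink(hash) {
  const p = new URLSearchParams(hash.replace(/^#/, ''));
  return {
    lat: Number(p.get('lat')),
    lng: Number(p.get('lng')),
    alt: Number(p.get('alt')),
    mode: p.get('mode'),
  };
}

// Fails the criterion: a number below the real count, not marked as capped.
function understates(shownTotal, realCount, markedCapped) {
  return shownTotal < realCount && !markedCapped;
}

const link = parseDeepLink('#v=1&lat=34.9957&lng=33.6798&alt=15212&mode=point');
console.log(link.mode, link.lat, link.lng); // point 34.9957 33.6798
```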
