Commit c1d935f

pfillion42 and claude committed
feat: add semantic clustering with UnionFind, clusters endpoint and ClusterView page
Sprint 13: reusable UnionFind class, GET /api/memories/clusters with KNN + UMAP centroids, ClusterView page with ScatterPlot colorMap support, 16 new tests (314 total). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 04796e2 commit c1d935f
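
The commit message names a reusable UnionFind class with find(), union() and getClusters(), but its source is not part of the files shown here. The following is a minimal sketch of what such a class might look like, assuming members are addressed by array index and that two memories are merged when their similarity reaches the threshold. The constructor argument, the minSize parameter and the sample pairs are illustrative assumptions, not the project's actual code.

```typescript
// Hypothetical sketch: only the method names find/union/getClusters come from the commit.
class UnionFind {
  private parent: number[];
  private rank: number[];

  constructor(size: number) {
    // Every element starts as the root of its own singleton set.
    this.parent = Array.from({ length: size }, (_, i) => i);
    this.rank = new Array<number>(size).fill(0);
  }

  // Return the representative of x's set, compressing the path on the way.
  find(x: number): number {
    let root = x;
    while (this.parent[root] !== root) root = this.parent[root];
    while (this.parent[x] !== root) {
      const next = this.parent[x];
      this.parent[x] = root;
      x = next;
    }
    return root;
  }

  // Merge the sets containing a and b (union by rank).
  union(a: number, b: number): void {
    const ra = this.find(a);
    const rb = this.find(b);
    if (ra === rb) return;
    if (this.rank[ra] < this.rank[rb]) {
      this.parent[ra] = rb;
    } else if (this.rank[ra] > this.rank[rb]) {
      this.parent[rb] = ra;
    } else {
      this.parent[rb] = ra;
      this.rank[ra] += 1;
    }
  }

  // Group indices by representative and keep groups with at least minSize members.
  getClusters(minSize = 1): number[][] {
    const groups = new Map<number, number[]>();
    for (let i = 0; i < this.parent.length; i++) {
      const root = this.find(i);
      const group = groups.get(root);
      if (group) group.push(i);
      else groups.set(root, [i]);
    }
    return Array.from(groups.values()).filter(g => g.length >= minSize);
  }
}

// Illustrative neighbor pairs (made-up similarities, not project data).
const pairs = [
  { a: 0, b: 1, similarity: 0.8 },
  { a: 1, b: 2, similarity: 0.7 },
  { a: 3, b: 4, similarity: 0.4 },
];
const uf = new UnionFind(5);
for (const p of pairs) {
  if (p.similarity >= 0.6) uf.union(p.a, p.b);
}
const clusters = uf.getClusters(2);
```

With a threshold of 0.6 and a minimum size of 2, indices 0, 1 and 2 end up in one cluster, while 3 and 4 stay singletons and are filtered out.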

10 files changed

Lines changed: 803 additions & 46 deletions

PLAN.md

Lines changed: 31 additions & 2 deletions
@@ -164,11 +164,39 @@ with a modern web interface and a backend API. Can interface with Claude.
 - Client build OK
 - Security audit: SQL lookup map (fixed), no specific vulnerability, pre-existing items (rate-limit, helmet, zod) in the Sprint 8.2 backlog
 
+## Sprint 13 - Automatic semantic clustering
+
+### 13.1 Extract UnionFind into a reusable class - COMPLETE
+- UnionFind class exported with find(), union(), getClusters()
+- Refactored /duplicates to use the class (0 regressions)
+
+### 13.2 Backend - Endpoint GET /api/memories/clusters - COMPLETE
+- Parameters: threshold (0-1, default 0.6), min_size (>= 2, default 2)
+- KNN vec0 with 50 neighbors, Union-Find clustering, UMAP centroids
+- 5-minute cache, invalidated on modifications
+- 400 validation error for threshold outside [0,1] and min_size < 2
+- 8 backend tests
+
+### 13.3 Frontend - /clusters page (ClusterView) - COMPLETE
+- Cluster and ClustersResponse types in types.ts
+- useClusters(threshold, minSize) hook with React Query
+- ClusterView page: ScatterPlot (60%) + cluster list (40%)
+- Sliders for threshold (0.3-0.9) and min_size (2-10)
+- ScatterPlot: colorMap prop for per-cluster coloring (10 colors)
+- /clusters route, NavLink between Embeddings and Graphe
+- 8 frontend tests
+
+### Sprint 13 summary
+- 8 backend tests + 8 frontend tests = 16 new tests
+- Total: 314 tests (188 server + 126 client), all green
+- Client build OK
+- Security audit: no vulnerability specific to this sprint, pre-existing items (rate-limit, helmet, zod) in the Sprint 8.2 backlog
+
 ## Backlog - Future features
 
 ### Exploration and understanding
 - [x] 2D projection of embeddings (UMAP) - view of the full vector space (Sprint 10)
-- [ ] Automatic clustering - group by semantic proximity
+- [x] Automatic clustering - group by semantic proximity (Sprint 13)
 
 ### Navigation and UX
 - [x] Light mode / theme toggle (Sprint 9)
@@ -200,7 +228,7 @@ with a modern web interface and a backend API. Can interface with Claude.
 - Embedding: all-MiniLM-L6-v2 (384 dims, cosine distance)
 - Dependency injection: `createMemoriesRouter(db)` to make testing easier
 - Frontend: React Query + React Router, custom hooks, dark theme via CSS custom properties
-- Navigation: / (Dashboard), /timeline (Timeline), /memories (MemoryList), /memories/:hash (MemoryDetail), /duplicates (Duplicates), /tags (Tags), /stale (Stale), /embeddings (EmbeddingView), /graph (GraphView)
+- Navigation: / (Dashboard), /timeline (Timeline), /memories (MemoryList), /memories/:hash (MemoryDetail), /duplicates (Duplicates), /tags (Tags), /stale (Stale), /embeddings (EmbeddingView), /clusters (ClusterView), /graph (GraphView)
 - Embedder: @huggingface/transformers (all-MiniLM-L6-v2), dependency injection for tests
 - Graph: react-force-graph-2d for force-directed visualization
 - Projection: umap-js (Google PAIR) for the 2D projection of embeddings, computed server-side with caching
@@ -225,3 +253,4 @@ with a modern web interface and a backend API. Can interface with Claude.
 | 2026-02-14 | Sprint 10 2D projection | UMAP endpoint, ScatterPlot canvas, /embeddings page, 264 tests green |
 | 2026-02-14 | Sprint 11 access counter | POST access, accessStats, auto-increment MemoryDetail, Dashboard UI, 280 tests green |
 | 2026-02-14 | Sprint 12 usage stats | memory_access_log, GET usage-stats, UsageChart, Dashboard period toggle, 298 tests green |
+| 2026-02-15 | Sprint 13 semantic clustering | UnionFind class, GET clusters, ClusterView, ScatterPlot colorMap, 314 tests green |
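
The plan above states the endpoint's parameter rules (threshold in [0,1] with default 0.6, min_size >= 2 with default 2, HTTP 400 otherwise), but the server handler is not among the files shown. As a rough sketch of how that validation could be expressed, assuming query values arrive as strings: the function name parseClusterParams, the result shape, the error messages and the requirement that min_size be an integer are all hypothetical.

```typescript
// Hypothetical sketch of the documented validation rules; not the project's handler.
type ClusterParams = { threshold: number; min_size: number };
type ParseResult =
  | { ok: true; params: ClusterParams }
  | { ok: false; status: 400; error: string };

function parseClusterParams(query: Record<string, string | undefined>): ParseResult {
  // Fall back to the documented defaults when a parameter is absent.
  const threshold = query.threshold === undefined ? 0.6 : Number(query.threshold);
  const minSize = query.min_size === undefined ? 2 : Number(query.min_size);

  // threshold must be a finite number inside [0, 1].
  if (!Number.isFinite(threshold) || threshold < 0 || threshold > 1) {
    return { ok: false, status: 400, error: 'threshold must be between 0 and 1' };
  }
  // min_size must be at least 2 (treated as an integer here, which is an assumption).
  if (!Number.isInteger(minSize) || minSize < 2) {
    return { ok: false, status: 400, error: 'min_size must be an integer >= 2' };
  }
  return { ok: true, params: { threshold, min_size: minSize } };
}
```

A route handler could call this first and answer with the returned status and error before doing any KNN or clustering work.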

client/src/App.tsx

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ import { Timeline } from './pages/Timeline';
1010
import { Tags } from './pages/Tags';
1111
import { Stale } from './pages/Stale';
1212
import { EmbeddingView } from './pages/EmbeddingView';
13+
import { ClusterView } from './pages/ClusterView';
1314
import { Logo } from './components/Logo';
1415
import { KeyboardHelp } from './components/KeyboardHelp';
1516
import { useKeyboardShortcuts } from './hooks/useKeyboardShortcuts';
@@ -93,6 +94,9 @@ function AppContent() {
9394
<NavLink to="/embeddings" style={({ isActive }) => isActive ? activeStyle : navStyle}>
9495
Embeddings
9596
</NavLink>
97+
<NavLink to="/clusters" style={({ isActive }) => isActive ? activeStyle : navStyle}>
98+
Clusters
99+
</NavLink>
96100
<NavLink to="/graph" style={({ isActive }) => isActive ? activeStyle : navStyle}>
97101
Graphe
98102
</NavLink>
@@ -142,6 +146,7 @@ function AppContent() {
142146
<Route path="/tags" element={<Tags />} />
143147
<Route path="/stale" element={<Stale />} />
144148
<Route path="/embeddings" element={<EmbeddingView />} />
149+
<Route path="/clusters" element={<ClusterView />} />
145150
<Route path="/graph" element={<GraphView />} />
146151
</Routes>
147152
</main>

client/src/components/ScatterPlot.tsx

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -19,13 +19,15 @@ interface ScatterPlotProps {
1919
onPointClick?: (hash: string) => void;
2020
width?: number;
2121
height?: number;
22+
colorMap?: Record<string, string>;
2223
}
2324

24-
function getColor(type: string | null): string {
25+
function getColor(type: string | null, hash?: string, colorMap?: Record<string, string>): string {
26+
if (colorMap && hash && colorMap[hash]) return colorMap[hash];
2527
return TYPE_COLORS[type || ''] || DEFAULT_COLOR;
2628
}
2729

28-
export function ScatterPlot({ points, onPointClick, width = 900, height = 500 }: ScatterPlotProps) {
30+
export function ScatterPlot({ points, onPointClick, width = 900, height = 500, colorMap }: ScatterPlotProps) {
2931
const canvasRef = useRef<HTMLCanvasElement>(null);
3032
const [hoveredIndex, setHoveredIndex] = useState<number | null>(null);
3133
const [transform, setTransform] = useState({ scale: 1, offsetX: 0, offsetY: 0 });
@@ -84,7 +86,7 @@ export function ScatterPlot({ points, onPointClick, width = 900, height = 500 }:
8486
const p = points[i];
8587
const px = normalizeX(p.x);
8688
const py = normalizeY(p.y);
87-
const color = getColor(p.memory_type);
89+
const color = getColor(p.memory_type, p.content_hash, colorMap);
8890
const isHovered = i === hoveredIndex;
8991
const r = isHovered ? HOVER_RADIUS : POINT_RADIUS;
9092
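
The updated getColor gives a colorMap entry precedence over the per-type color and falls back to the type color when the hash is absent from the map. The standalone sketch below reproduces the function from the diff to illustrate that precedence. The values of TYPE_COLORS and DEFAULT_COLOR are placeholders chosen for the example, since their real definitions are outside the hunks shown.

```typescript
// Placeholder values: the real TYPE_COLORS and DEFAULT_COLOR are not visible in this diff.
const TYPE_COLORS: Record<string, string> = { note: '#3b82f6' };
const DEFAULT_COLOR = '#64748b';

// Same body as the function added in ScatterPlot.tsx.
function getColor(type: string | null, hash?: string, colorMap?: Record<string, string>): string {
  if (colorMap && hash && colorMap[hash]) return colorMap[hash];
  return TYPE_COLORS[type || ''] || DEFAULT_COLOR;
}

const clusterColors = { abc123: '#f43f5e' }; // hypothetical hash -> cluster color

const byCluster = getColor('note', 'abc123', clusterColors); // colorMap entry wins
const byType = getColor('note', 'zzz999', clusterColors);    // hash not in map: type color
const fallback = getColor(null);                             // no type, no map: default
```

Because colorMap is optional, existing callers such as the /embeddings page keep their per-type coloring without any change.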

client/src/hooks/useClusters.ts

Lines changed: 21 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,21 @@
1+
import { useQuery } from '@tanstack/react-query';
2+
import type { ClustersResponse } from '../types';
3+
4+
async function fetchClusters(threshold: number, minSize: number): Promise<ClustersResponse> {
5+
const params = new URLSearchParams({
6+
threshold: threshold.toString(),
7+
min_size: minSize.toString(),
8+
});
9+
10+
const res = await fetch(`/api/memories/clusters?${params}`);
11+
if (!res.ok) throw new Error('Erreur lors du chargement des clusters');
12+
return res.json();
13+
}
14+
15+
export function useClusters(threshold = 0.6, minSize = 2) {
16+
return useQuery({
17+
queryKey: ['clusters', threshold, minSize],
18+
queryFn: () => fetchClusters(threshold, minSize),
19+
staleTime: 60_000,
20+
});
21+
}
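
The hook builds its request URL from the two slider values and uses the same values in the query key, so each (threshold, minSize) combination is cached as a separate entry. The sketch below isolates those two pieces without React or fetch. The helper names clustersUrl and clustersQueryKey are hypothetical and do not exist in the commit.

```typescript
// Hypothetical helpers mirroring what fetchClusters and useClusters do inline.
function clustersUrl(threshold: number, minSize: number): string {
  const params = new URLSearchParams({
    threshold: threshold.toString(),
    min_size: minSize.toString(),
  });
  return `/api/memories/clusters?${params.toString()}`;
}

function clustersQueryKey(threshold: number, minSize: number): readonly [string, number, number] {
  // Same shape as the queryKey passed to useQuery in the hook.
  return ['clusters', threshold, minSize];
}

const defaultUrl = clustersUrl(0.6, 2);
const defaultKey = clustersQueryKey(0.6, 2);
const otherKey = clustersQueryKey(0.65, 2);
```

Here defaultUrl is `/api/memories/clusters?threshold=0.6&min_size=2`. Since every slider step produces a new key, dragging a slider can trigger one request per step, so debouncing the slider values before passing them to the hook may be worth considering.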

client/src/pages/ClusterView.tsx

Lines changed: 171 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,171 @@
1+
import { useState, useCallback, useMemo } from 'react';
2+
import { useNavigate } from 'react-router-dom';
3+
import { useClusters } from '../hooks/useClusters';
4+
import { ScatterPlot } from '../components/ScatterPlot';
5+
import type { ProjectionPoint } from '../types';
6+
7+
const CLUSTER_COLORS = [
8+
'#3b82f6', '#f59e0b', '#22c55e', '#a78bfa', '#f43f5e',
9+
'#6366f1', '#14b8a6', '#e879f9', '#fb923c', '#64748b',
10+
];
11+
12+
export function ClusterView() {
13+
const [threshold, setThreshold] = useState(0.6);
14+
const [minSize, setMinSize] = useState(2);
15+
const { data, isLoading, isError } = useClusters(threshold, minSize);
16+
const navigate = useNavigate();
17+
18+
const handlePointClick = useCallback((hash: string) => {
19+
navigate(`/memories/${hash}`);
20+
}, [navigate]);
21+
22+
// Construire les points pour le ScatterPlot et la colorMap
23+
const { points, colorMap } = useMemo(() => {
24+
if (!data || data.clusters.length === 0) return { points: [], colorMap: {} };
25+
26+
const pts: ProjectionPoint[] = [];
27+
const cMap: Record<string, string> = {};
28+
29+
for (const cluster of data.clusters) {
30+
const color = CLUSTER_COLORS[cluster.id % CLUSTER_COLORS.length];
31+
for (const mem of cluster.members) {
32+
pts.push({
33+
content_hash: mem.content_hash,
34+
x: cluster.centroid.x + (Math.random() - 0.5) * 0.5,
35+
y: cluster.centroid.y + (Math.random() - 0.5) * 0.5,
36+
content: mem.content.length > 100 ? mem.content.substring(0, 100) + '...' : mem.content,
37+
memory_type: mem.memory_type,
38+
tags: mem.tags,
39+
created_at_iso: mem.created_at_iso,
40+
});
41+
cMap[mem.content_hash] = color;
42+
}
43+
}
44+
45+
return { points: pts, colorMap: cMap };
46+
}, [data]);
47+
48+
if (isLoading) {
49+
return <p style={{ color: 'var(--text-muted)' }}>Chargement...</p>;
50+
}
51+
52+
if (isError) {
53+
return <p style={{ color: 'var(--error)' }}>Erreur lors du chargement des clusters.</p>;
54+
}
55+
56+
const clusters = data?.clusters ?? [];
57+
const totalClusters = data?.total_clusters ?? 0;
58+
59+
return (
60+
<div>
61+
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: '16px' }}>
62+
<h2 style={{ margin: 0, fontSize: '16px', fontWeight: 600, color: 'var(--text-primary)' }}>
63+
Clusters semantiques
64+
</h2>
65+
<span style={{ fontSize: '12px', color: 'var(--text-muted)' }}>
66+
{totalClusters} clusters
67+
</span>
68+
</div>
69+
70+
{/* Controles */}
71+
<div style={{
72+
display: 'flex',
73+
gap: '24px',
74+
marginBottom: '16px',
75+
padding: '12px 16px',
76+
backgroundColor: 'var(--bg-surface)',
77+
borderRadius: 'var(--radius-md)',
78+
border: '1px solid var(--border-subtle)',
79+
alignItems: 'center',
80+
flexWrap: 'wrap',
81+
}}>
82+
<label style={{ display: 'flex', alignItems: 'center', gap: '8px', fontSize: '13px', color: 'var(--text-secondary)' }}>
83+
Seuil (threshold)
84+
<input
85+
type="range"
86+
min="0.3"
87+
max="0.9"
88+
step="0.05"
89+
value={threshold}
90+
onChange={e => setThreshold(parseFloat(e.target.value))}
91+
aria-label="Seuil"
92+
style={{ width: '100px' }}
93+
/>
94+
<span style={{ fontSize: '12px', color: 'var(--text-muted)', minWidth: '32px' }}>{threshold.toFixed(2)}</span>
95+
</label>
96+
97+
<label style={{ display: 'flex', alignItems: 'center', gap: '8px', fontSize: '13px', color: 'var(--text-secondary)' }}>
98+
Taille min (min_size)
99+
<input
100+
type="range"
101+
min="2"
102+
max="10"
103+
step="1"
104+
value={minSize}
105+
onChange={e => setMinSize(parseInt(e.target.value))}
106+
aria-label="Taille minimale"
107+
style={{ width: '100px' }}
108+
/>
109+
<span style={{ fontSize: '12px', color: 'var(--text-muted)', minWidth: '24px' }}>{minSize}</span>
110+
</label>
111+
</div>
112+
113+
{/* Layout : ScatterPlot + liste */}
114+
{clusters.length === 0 ? (
115+
<p style={{ color: 'var(--text-muted)', textAlign: 'center', padding: '40px 0' }}>
116+
Aucun cluster trouve avec ces parametres.
117+
</p>
118+
) : (
119+
<div style={{ display: 'flex', gap: '16px' }}>
120+
{/* ScatterPlot a gauche (60%) */}
121+
<div style={{ flex: '0 0 60%' }}>
122+
<ScatterPlot
123+
points={points}
124+
onPointClick={handlePointClick}
125+
width={650}
126+
height={500}
127+
colorMap={colorMap}
128+
/>
129+
</div>
130+
131+
{/* Liste des clusters a droite (40%) */}
132+
<div style={{ flex: '1', overflow: 'auto', maxHeight: '500px' }}>
133+
{clusters.map(cluster => (
134+
<div
135+
key={cluster.id}
136+
style={{
137+
padding: '12px 16px',
138+
marginBottom: '8px',
139+
backgroundColor: 'var(--bg-surface)',
140+
borderRadius: 'var(--radius-md)',
141+
border: '1px solid var(--border-subtle)',
142+
}}
143+
>
144+
<div style={{ display: 'flex', justifyContent: 'space-between', alignItems: 'center', marginBottom: '6px' }}>
145+
<div style={{ display: 'flex', alignItems: 'center', gap: '8px' }}>
146+
<span style={{
147+
width: '10px',
148+
height: '10px',
149+
borderRadius: '50%',
150+
backgroundColor: CLUSTER_COLORS[cluster.id % CLUSTER_COLORS.length],
151+
display: 'inline-block',
152+
}} />
153+
<span style={{ fontSize: '13px', fontWeight: 600, color: 'var(--text-primary)' }}>
154+
{cluster.label}
155+
</span>
156+
</div>
157+
<span style={{ fontSize: '12px', color: 'var(--text-muted)' }}>
158+
{cluster.size} memoires
159+
</span>
160+
</div>
161+
<div style={{ fontSize: '11px', color: 'var(--text-secondary)' }}>
162+
Similarite : {cluster.avg_similarity.toFixed(2)}
163+
</div>
164+
</div>
165+
))}
166+
</div>
167+
</div>
168+
)}
169+
</div>
170+
);
171+
}
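
ClusterView picks each cluster's color with `cluster.id % CLUSTER_COLORS.length`, so the palette repeats once there are more than ten clusters, and every member hash is mapped to its cluster's color. The sketch below extracts that logic into two standalone functions to make the wrap-around visible. The names colorForCluster and buildColorMap, the reduced cluster shape and the sample hashes are hypothetical, and ids are assumed to be non-negative integers.

```typescript
// Palette copied from ClusterView.tsx.
const CLUSTER_COLORS = [
  '#3b82f6', '#f59e0b', '#22c55e', '#a78bfa', '#f43f5e',
  '#6366f1', '#14b8a6', '#e879f9', '#fb923c', '#64748b',
];

// Reduced shape for the example; the real Cluster type carries more fields.
interface ClusterLike {
  id: number;
  members: { content_hash: string }[];
}

// Cycle through the palette by cluster id (assumes id >= 0).
function colorForCluster(id: number): string {
  return CLUSTER_COLORS[id % CLUSTER_COLORS.length];
}

// Map every member hash to the color of the cluster it belongs to.
function buildColorMap(clusters: ClusterLike[]): Record<string, string> {
  const map: Record<string, string> = {};
  for (const cluster of clusters) {
    const color = colorForCluster(cluster.id);
    for (const member of cluster.members) {
      map[member.content_hash] = color;
    }
  }
  return map;
}

// Made-up hashes for illustration only.
const sampleMap = buildColorMap([
  { id: 1, members: [{ content_hash: 'hash-a' }, { content_hash: 'hash-b' }] },
  { id: 11, members: [{ content_hash: 'hash-c' }] },
]);
```

In this example clusters 1 and 11 share '#f59e0b', which shows that colors alone stop distinguishing clusters beyond the tenth. The list on the right still separates them by label.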

client/src/types.ts

Lines changed: 15 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,21 @@ export interface UsageStatsResponse {
122122
accesses: UsageDataPoint[];
123123
}
124124

125+
export interface Cluster {
126+
id: number;
127+
label: string;
128+
size: number;
129+
members: Memory[];
130+
avg_similarity: number;
131+
centroid: { x: number; y: number };
132+
}
133+
134+
export interface ClustersResponse {
135+
clusters: Cluster[];
136+
total_clusters: number;
137+
params: { threshold: number; min_size: number };
138+
}
139+
125140
export interface ProjectionPoint {
126141
content_hash: string;
127142
x: number;

client/tests/App.test.tsx

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -41,6 +41,12 @@ beforeEach(() => {
4141
json: () => Promise.resolve({ groups: [], total: 0 }),
4242
} as Response);
4343
}
44+
if (urlStr.includes('/api/memories/clusters')) {
45+
return Promise.resolve({
46+
ok: true,
47+
json: () => Promise.resolve({ clusters: [], total_clusters: 0, params: { threshold: 0.6, min_size: 2 } }),
48+
} as Response);
49+
}
4450
if (urlStr.includes('/api/tags')) {
4551
return Promise.resolve({
4652
ok: true,
@@ -73,6 +79,7 @@ describe('App', () => {
7379
expect(screen.getByText('Doublons')).toBeDefined();
7480
expect(screen.getByText('Tags')).toBeDefined();
7581
expect(screen.getByText('Obsoletes')).toBeDefined();
82+
expect(screen.getByText('Clusters')).toBeDefined();
7683
expect(screen.getByText('Graphe')).toBeDefined();
7784
});
7885
