feat(scan): optional location_filter in portals.yml + persist location to scan-history #570
mstine wants to merge 1 commit into santifer:main from
Conversation
…n to scan-history

Adds opt-in location filtering to scan.mjs. Backwards compatible — an absent location_filter block means "allow all" (current behavior).

Semantics:
- Empty job location → pass (don't penalize missing data)
- block keyword present → reject (precedence over allow)
- allow empty → pass (cleared block)
- allow non-empty → must match at least one keyword
- All matches are case-insensitive substring

Also persists location as a 7th column in scan-history.tsv, appended at the end so existing 6-column files continue to parse (loadSeenUrls only reads column 0).

- scan.mjs: buildLocationFilter() + filter step + counter + TSV column
- templates/portals.example.yml: documented commented-out example block
- modes/scan.md: brief addition to the existing filter section (step 6b)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
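The opt-in block described above might look like this in portals.yml. This is a hypothetical sketch: the allow/block key names come from the commit message, but the exact values shown are illustrative, and the authoritative documented example lives in templates/portals.example.yml.

```yaml
# Hypothetical portals.yml excerpt. location_filter is optional; omit the whole
# block to keep the current "allow all" behavior. Keywords are matched as
# case-insensitive substrings, and block takes precedence over allow.
location_filter:
  allow:
    - "Remote"
    - "United States"
  block:
    - "Hybrid"
    - "On-site"
```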
📝 Walkthrough

A new optional location filtering stage is added to the job scanner. Jobs are filtered by case-insensitive substring matching against configured allow/block lists, with block taking precedence. Job location is now persisted as a 7th column in the scan history for audit purposes. All changes are backward-compatible.

Changes: Location Filtering Feature
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modes/scan.md`:
- Around line 112-119: The Scan History examples and Step 8 still show the old
6-column schema; update all example rows and Step 8 to include the new 7th
column (location) and ensure the description mentions that location is persisted
as the 7th column in scan-history.tsv; search for references to location_filter,
portals.yml, scan-history.tsv, "Scan History" and Step 8 in this file and modify
the sample TSV rows and any explanatory text to use seven tab-separated columns
(with the new location column as column 7) and note that matching is
case-insensitive substring per the location_filter rules.
In `@scan.mjs`:
- Around line 146-156: The buildLocationFilter function currently calls .map()
on locationFilter.allow and locationFilter.block assuming arrays; guard and
normalize these fields first so non-array or string values don't crash the scan:
check Array.isArray(locationFilter.allow) and
Array.isArray(locationFilter.block) (or coerce strings into single-element
arrays) before mapping, default to [] when missing or invalid, then proceed to
create the lower-cased allow and block lists and return the existing predicate.
Update the logic around the allow/block creation in buildLocationFilter to
perform this normalization.
- Around line 242-248: The current appendToScanHistory only creates a new file
with the 7-column header but does not upgrade existing legacy files with a
6-column header; update appendToScanHistory to detect and migrate an old header:
when SCAN_HISTORY_PATH exists, read the first line (utf-8), split on '\t' and if
it lacks the 'location' column or has fewer than 7 columns, replace the header
line with 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation\n' and
rewrite the file preserving all subsequent lines; keep the existing behavior of
creating the file when missing and ensure migration is a one-time header-only
change so no rows are shifted or data lost.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 35d3210d-f4f4-4d3f-8754-48e9880ec451
📒 Files selected for processing (3)
- modes/scan.md
- scan.mjs
- templates/portals.example.yml
> 6b. **Filter by location (optional)** using `location_filter` from `portals.yml`:
>    - If the `location_filter` block is absent, all locations pass (default behavior)
>    - Empty location on an offer → pass (don't penalize missing data)
>    - Any `block` keyword present → reject (takes precedence over allow)
>    - Empty `allow` → pass (already cleared block)
>    - Non-empty `allow` → must match at least one keyword
>    - All matching is case-insensitive substring
>    - The location is persisted as the 7th column in `scan-history.tsv` for later auditing
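The rules above can be exercised with a minimal predicate. This is an illustrative reimplementation of the documented semantics, not the actual `buildLocationFilter()` from scan.mjs:

```javascript
// Minimal sketch of the documented filter semantics; the real helper is
// buildLocationFilter() in scan.mjs. Keyword lists are assumed to be arrays.
function locationPasses(location, { allow = [], block = [] } = {}) {
  if (!location) return true;                        // empty location → pass
  const lower = location.toLowerCase();
  if (block.some(k => lower.includes(k.toLowerCase()))) return false; // block wins
  if (allow.length === 0) return true;               // allow empty → pass
  return allow.some(k => lower.includes(k.toLowerCase()));
}

const filter = { allow: ['Remote', 'New York'], block: ['Hybrid'] };
console.log(locationPasses('', filter));                 // true  (missing data)
console.log(locationPasses('Hybrid - Remote', filter));  // false (block wins)
console.log(locationPasses('Remote (US)', filter));      // true
console.log(locationPasses('Berlin, Germany', filter));  // false (no allow match)
```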
Update the later scan-history.tsv examples to 7 columns as well.
This section introduces location as column 7, but Step 8 and the Scan History sample below still show 6-field rows. Leaving both schemas in one doc will cause manual scan workflows to keep writing the old format.
```js
function buildLocationFilter(locationFilter) {
  if (!locationFilter) return () => true;
  const allow = (locationFilter.allow || []).map(k => k.toLowerCase());
  const block = (locationFilter.block || []).map(k => k.toLowerCase());

  return (location) => {
    if (!location) return true;
    const lower = location.toLowerCase();
    if (block.length > 0 && block.some(k => lower.includes(k))) return false;
    if (allow.length === 0) return true;
    return allow.some(k => lower.includes(k));
  };
}
```
Guard location_filter.allow and block before calling .map().
Lines 148-149 assume both YAML values are arrays. A common config typo like allow: "Remote" will throw here and abort the entire scan before any company is processed. Normalize or coerce these fields first so the optional filter degrades gracefully instead of crashing.
🛠️ Suggested hardening

```diff
 function buildLocationFilter(locationFilter) {
   if (!locationFilter) return () => true;
-  const allow = (locationFilter.allow || []).map(k => k.toLowerCase());
-  const block = (locationFilter.block || []).map(k => k.toLowerCase());
+  const toKeywords = (value) =>
+    (Array.isArray(value) ? value : typeof value === 'string' ? [value] : [])
+      .filter(v => typeof v === 'string' && v.trim() !== '')
+      .map(v => v.toLowerCase());
+  const allow = toKeywords(locationFilter.allow);
+  const block = toKeywords(locationFilter.block);
   return (location) => {
-    if (!location) return true;
-    const lower = location.toLowerCase();
+    const lower = String(location ?? '').trim().toLowerCase();
+    if (!lower) return true;
     if (block.length > 0 && block.some(k => lower.includes(k))) return false;
     if (allow.length === 0) return true;
     return allow.some(k => lower.includes(k));
   };
 }
```
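A quick sanity check of the coercion behavior the review suggestion aims for. This is a sketch of the hardened version, not the code currently on the branch:

```javascript
// Sketch of the hardened builder from the review suggestion (not the code on
// the branch): string values are coerced to one-element arrays, and any other
// invalid value degrades to [] instead of crashing the scan.
function buildLocationFilter(locationFilter) {
  if (!locationFilter) return () => true;
  const toKeywords = (value) =>
    (Array.isArray(value) ? value : typeof value === 'string' ? [value] : [])
      .filter(v => typeof v === 'string' && v.trim() !== '')
      .map(v => v.toLowerCase());
  const allow = toKeywords(locationFilter.allow);
  const block = toKeywords(locationFilter.block);
  return (location) => {
    const lower = String(location ?? '').trim().toLowerCase();
    if (!lower) return true;
    if (block.length > 0 && block.some(k => lower.includes(k))) return false;
    if (allow.length === 0) return true;
    return allow.some(k => lower.includes(k));
  };
}

// The config typo from the comment (allow: "Remote") no longer throws:
const pass = buildLocationFilter({ allow: 'Remote' });
console.log(pass('Remote - EU'));  // true
console.log(pass('Madrid'));       // false
console.log(buildLocationFilter({ allow: 42 })('Madrid')); // true (invalid → [])
```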
```diff
 function appendToScanHistory(offers, date) {
-  // Ensure file + header exist
+  // Ensure file + header exist. Location appended as 7th column for non-breaking
+  // backward compat — older scan-history.tsv files with 6 columns still parse fine
+  // since loadSeenUrls only reads column 0.
   if (!existsSync(SCAN_HISTORY_PATH)) {
-    writeFileSync(SCAN_HISTORY_PATH, 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\n', 'utf-8');
+    writeFileSync(SCAN_HISTORY_PATH, 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation\n', 'utf-8');
   }
```
Migrate legacy scan-history.tsv headers before appending location rows.
Line 247 only fixes brand-new files. Existing repositories keep the old 6-column header forever, so the new location field lands under an unlabeled column. URL dedup still works, but the scan-history audit/export format stays inconsistent until the header is upgraded once.
🧾 Suggested one-time header upgrade

```diff
 function appendToScanHistory(offers, date) {
+  const header6 = 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus';
+  const header7 = `${header6}\tlocation`;
+
   // Ensure file + header exist. Location appended as 7th column for non-breaking
   // backward compat — older scan-history.tsv files with 6 columns still parse fine
   // since loadSeenUrls only reads column 0.
   if (!existsSync(SCAN_HISTORY_PATH)) {
-    writeFileSync(SCAN_HISTORY_PATH, 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation\n', 'utf-8');
+    writeFileSync(SCAN_HISTORY_PATH, `${header7}\n`, 'utf-8');
+  } else {
+    const text = readFileSync(SCAN_HISTORY_PATH, 'utf-8');
+    if (text.startsWith(`${header6}\n`) && !text.startsWith(`${header7}\n`)) {
+      writeFileSync(
+        SCAN_HISTORY_PATH,
+        text.replace(`${header6}\n`, `${header7}\n`),
+        'utf-8',
+      );
+    }
   }
   const lines = offers.map(o =>
     `${o.url}\t${date}\t${o.source}\t${o.title}\t${o.company}\tadded\t${o.location || ''}`
```
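The one-time header migration the reviewer describes can be exercised standalone against a temp file. This is a sketch with hypothetical names (migrateHeader, the demo path), separate from scan.mjs itself:

```javascript
// Standalone sketch of the suggested one-time header migration, run against a
// temp file rather than the real SCAN_HISTORY_PATH. Only the header line
// changes; data rows are preserved byte-for-byte.
import { writeFileSync, readFileSync, existsSync } from 'node:fs';
import { join } from 'node:path';
import { tmpdir } from 'node:os';

const HEADER6 = 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus';
const HEADER7 = `${HEADER6}\tlocation`;

function migrateHeader(path) {
  if (!existsSync(path)) {
    writeFileSync(path, `${HEADER7}\n`, 'utf-8');
    return;
  }
  const text = readFileSync(path, 'utf-8');
  // A 7-column header starts with HEADER6 + '\t', not '\n', so this check
  // only matches legacy files and the migration is naturally idempotent.
  if (text.startsWith(`${HEADER6}\n`)) {
    writeFileSync(path, `${HEADER7}\n` + text.slice(HEADER6.length + 1), 'utf-8');
  }
}

// Legacy 6-column file gets its header upgraded; the data row is untouched.
const path = join(tmpdir(), 'scan-history-demo.tsv');
writeFileSync(path, `${HEADER6}\nhttps://x.test/job\t2026-05-06\tlever\tFDE\tAcme\tadded\n`, 'utf-8');
migrateHeader(path);
console.log(readFileSync(path, 'utf-8').split('\n')[0].endsWith('\tlocation')); // true
```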
…Fs, archived 3 below-threshold)
- scan.mjs added 2 new offers (ElevenLabs Enterprise Solutions Engineer NA + EU-German)
- Level 3 WebSearch added 4 Lever FDE candidates (Research Innovations [closed], JetBridge, zaimler, Symmetry Systems Associate L3)
- 5 A-G evaluations: santifer#566 ElevenLabs NA (3.2/5, PDF), santifer#567 ElevenLabs EU German (1.5/5 SKIP), santifer#568 JetBridge LATAM (2.4/5 SKIP), santifer#569 zaimler NYC (2.7/5 SKIP), santifer#570 Symmetry L3 (3.0/5, PDF — fuzzy-merged into existing santifer#105)
- Research Innovations TC2259 marked closed (Lever 404; defense/EW + US Secret + 8-12 yrs hard veto)
- All reports include Location in header per Shortlist Display Policy
- 0 errors / 0 warnings on verify-pipeline.mjs

https://claude.ai/code/session_overnight-2026-05-06T04:26Z
Closes #569.
Summary

Adds opt-in location filtering to scan.mjs. Backwards compatible — if the new `location_filter` block is absent from `portals.yml`, behavior is identical to before.

Changes

- `scan.mjs` (~25 lines net): `buildLocationFilter()` helper alongside `buildTitleFilter()`, a "Filtered by location" counter in the summary output, and location persisted to `scan-history.tsv` (appended — existing 6-column files still parse fine since `loadSeenUrls` only reads column 0)
- `templates/portals.example.yml`: documented commented-out example block above `title_filter`
- `modes/scan.md`: brief addition to the filter section (step 6b)

Filter semantics

- `block` keyword present → reject (precedence over allow)
- `allow` empty → pass once block is cleared
- `allow` non-empty → must match at least one

Verification

Local re-scan against 13 enabled companies on a real `portals.yml` with the new filter configured for US-only + ATL/NYC metros; `npm run doctor` passes. Manually verified that `scan-history.tsv` rows now include the location column for audit.

Test plan

- `npm run doctor` passes
- With `location_filter` configured, foreign roles are filtered correctly
- Absent `location_filter` block produces identical behavior to pre-change (backwards compat)
- `scan-history.tsv` 7th column populates with the job's location string

Non-goals

- Changing the `pipeline.md` line format — avoids breaking downstream parsers

🤖 Generated with Claude Code
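The backward-compat claim — that legacy 6-column and new 7-column rows can coexist because dedup only reads column 0 — can be sketched as follows. This `loadSeenUrls` is an illustrative stand-in, not the exact implementation from scan.mjs:

```javascript
// Minimal sketch of why mixed 6-/7-column rows are safe: URL dedup only ever
// looks at column 0 of each row. Illustrative stand-in for scan.mjs's
// loadSeenUrls, not the real function.
function loadSeenUrls(tsvText) {
  return new Set(
    tsvText.trim().split('\n')
      .slice(1)                       // skip the header row
      .map(line => line.split('\t')[0])
  );
}

const history = [
  'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation',
  'https://a.test/1\t2026-05-01\tlever\tFDE\tAcme\tadded',                // legacy 6-col row
  'https://a.test/2\t2026-05-06\tlever\tFDE\tAcme\tadded\tRemote (US)',   // new 7-col row
].join('\n');

const seen = loadSeenUrls(history);
console.log(seen.has('https://a.test/1')); // true
console.log(seen.has('https://a.test/2')); // true
```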