
feat(scan): optional location_filter in portals.yml + persist location to scan-history#570

Open
mstine wants to merge 1 commit into santifer:main from mstine:feat/location-filter

Conversation


@mstine mstine commented May 4, 2026

Closes #569.

Summary

Adds opt-in location filtering to scan.mjs. Backwards compatible — if the new location_filter block is absent from portals.yml, behavior is identical to before.

Changes

  • scan.mjs (~25 lines net):
    • New buildLocationFilter() helper alongside buildTitleFilter()
    • Applied in the filter chain after title, before dedup
    • New Filtered by location counter in the summary output
    • Location persisted as 7th column in scan-history.tsv (appended — existing 6-column files still parse fine since loadSeenUrls only reads column 0)
  • templates/portals.example.yml: documented commented-out example block above title_filter
  • modes/scan.md: brief addition to the filter section (step 6b)
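For orientation, the documented `location_filter` block in `templates/portals.example.yml` has roughly this shape (the keyword values below are illustrative, not the shipped defaults, and the template ships the block commented out):

```yaml
# Optional — omit this block entirely to keep current behavior (allow all).
location_filter:
  allow:            # job passes if its location contains at least one of these
    - united states
    - atlanta
    - new york
  block:            # takes precedence over allow
    - germany
```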

Filter semantics

  • Empty job location string → pass (don't penalize missing data)
  • Any block keyword present → reject (precedence over allow)
  • allow empty → pass once block is cleared
  • allow non-empty → must match at least one
  • All matches case-insensitive substring
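These semantics can be sketched as a small predicate builder. This is a hypothetical illustration of the rules above, not the `scan.mjs` source verbatim:

```javascript
// Sketch of the documented filter semantics (illustrative, not the shipped code).
function buildLocationFilter(locationFilter) {
  if (!locationFilter) return () => true; // no block in portals.yml → allow all
  const allow = (locationFilter.allow || []).map(k => k.toLowerCase());
  const block = (locationFilter.block || []).map(k => k.toLowerCase());
  return (location) => {
    if (!location) return true;                           // empty location → pass
    const lower = location.toLowerCase();
    if (block.some(k => lower.includes(k))) return false; // block wins over allow
    if (allow.length === 0) return true;                  // allow empty → pass
    return allow.some(k => lower.includes(k));            // must match at least one
  };
}

const pass = buildLocationFilter({ allow: ['united states', 'atlanta'], block: ['germany'] });
console.log(pass('Atlanta, GA, United States')); // true
console.log(pass('Berlin, Germany'));            // false
console.log(pass(''));                           // true (missing data passes)
```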

Verification

Local re-scan against 13 enabled companies on a real portals.yml with the new filter configured for US-only + ATL/NYC metros:

| Metric | Before | After |
| --- | --- | --- |
| Total jobs found | 3,144 | 3,144 |
| Filtered by title | 2,730 | 2,730 |
| Filtered by location | n/a | 308 |
| New offers added | 273 | 99 |

npm run doctor passes. Manually verified scan-history.tsv rows now include location column for audit.
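With the new column, an appended scan-history.tsv row looks something like this (tab-separated; URL, company, and values are hypothetical):

```text
url	first_seen	portal	title	company	status	location
https://example.com/jobs/123	2026-05-04	lever	Solutions Engineer	ExampleCo	added	Atlanta, GA, United States
```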

Test plan

  • npm run doctor passes
  • Real scan with location_filter configured filters foreign roles correctly
  • Scan with no location_filter block produces identical behavior to pre-change (backwards compat)
  • scan-history.tsv 7th column populates with the job's location string
  • Maintainer review of API design choices (substring vs regex, allow/block precedence)

Non-goals

  • No regex support — substring-only keeps the code surface tiny
  • No per-company location overrides — companies stay enabled/disabled at the company level
  • Doesn't change pipeline.md line format — avoids breaking downstream parsers

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added optional location-based filtering to the scan workflow with case-insensitive matching and configurable allow/block rules.
    • Location information is now recorded in scan history for audit purposes.
    • Enhanced console output to separately report location-filtered job removals.
  • Documentation

    • Added comprehensive configuration documentation and examples for the new location filtering feature.

feat(scan): optional location_filter in portals.yml + persist location to scan-history

Adds opt-in location filtering to scan.mjs. Backwards compatible — absent
location_filter block means "allow all" (current behavior).

Semantics:
  - Empty job location → pass (don't penalize missing data)
  - block keyword present → reject (precedence over allow)
  - allow empty → pass (cleared block)
  - allow non-empty → must match at least one keyword
  - All matches case-insensitive substring

Also persists location as 7th column in scan-history.tsv. Appended at the
end so existing 6-column files continue to parse (loadSeenUrls only reads
column 0).

- scan.mjs: buildLocationFilter() + filter step + counter + tsv column
- templates/portals.example.yml: documented commented-out example block
- modes/scan.md: brief addition to the existing filter section (step 6b)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

A new optional location filtering stage is added to the job scanner. Jobs are filtered by case-insensitive substring matching against configured allow/block lists, with block taking precedence. Job location is now persisted as a 7th column in the scan history for audit purposes. All changes are backward-compatible.

Changes

Location Filtering Feature

| Layer / File(s) | Summary |
| --- | --- |
| Configuration schema & documentation: `templates/portals.example.yml`, `modes/scan.md` | New location_filter block documented with allow/block semantics (case-insensitive substring matching, block precedence, empty allow passes, non-empty allow requires match). Example configuration provided and marked as optional/commented out. |
| Filtering predicate: `scan.mjs` (lines 137–158) | `buildLocationFilter()` creates a predicate that gates jobs by `job.location`: missing/empty values pass; block matches reject with precedence; empty allow passes; non-empty allow requires at least one substring match (case-insensitive). |
| Scan loop integration: `scan.mjs` (lines 313–334) | Main scan loop extended to apply both title and location filter predicates; new counters `totalFilteredTitle` and `totalFilteredLocation` track removals separately; jobs failing either filter increment the appropriate counter and skip further processing. |
| Filter instantiation: `scan.mjs` (line 292) | `locationFilter` predicate is built from `config.location_filter` once at startup in `main()`. |
| History persistence: `scan.mjs` (lines 243–253) | `appendToScanHistory()` updated to write a 7-column header including location when the file is created; each appended row now includes `o.location` as the final column. Backward-compatible: existing 6-column files continue to parse since `loadSeenUrls` reads only column 0. |
| Reporting: `scan.mjs` (lines 368–369) | Console summary output extended to report title-filtered and location-filtered removal counts separately (replacing the previous combined title-filter report). |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding optional location filtering and persisting location data to scan history. |
| Linked issues check | ✅ Passed | The code changes fully implement the requirements from issue #569: optional location_filter with proper semantics, persistence to scan-history.tsv, and documentation. |
| Out-of-scope changes check | ✅ Passed | All changes directly support the location filtering feature. No unrelated modifications or scope creep detected. |
| Docstring coverage | ✅ Passed | Docstring coverage is 83.33%, which meets the required threshold of 80.00%. |



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modes/scan.md`:
- Around line 112-119: The Scan History examples and Step 8 still show the old
6-column schema; update all example rows and Step 8 to include the new 7th
column (location) and ensure the description mentions that location is persisted
as the 7th column in scan-history.tsv; search for references to location_filter,
portals.yml, scan-history.tsv, "Scan History" and Step 8 in this file and modify
the sample TSV rows and any explanatory text to use seven tab-separated columns
(with the new location column as column 7) and note that matching is
case-insensitive substring per the location_filter rules.

In `@scan.mjs`:
- Around line 146-156: The buildLocationFilter function currently calls .map()
on locationFilter.allow and locationFilter.block assuming arrays; guard and
normalize these fields first so non-array or string values don't crash the scan:
check Array.isArray(locationFilter.allow) and
Array.isArray(locationFilter.block) (or coerce strings into single-element
arrays) before mapping, default to [] when missing or invalid, then proceed to
create the lower-cased allow and block lists and return the existing predicate.
Update the logic around the allow/block creation in buildLocationFilter to
perform this normalization.
- Around line 242-248: The current appendToScanHistory only creates a new file
with the 7-column header but does not upgrade existing legacy files with a
6-column header; update appendToScanHistory to detect and migrate an old header:
when SCAN_HISTORY_PATH exists, read the first line (utf-8), split on '\t' and if
it lacks the 'location' column or has fewer than 7 columns, replace the header
line with 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation\n' and
rewrite the file preserving all subsequent lines; keep the existing behavior of
creating the file when missing and ensure migration is a one-time header-only
change so no rows are shifted or data lost.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 35d3210d-f4f4-4d3f-8754-48e9880ec451

📥 Commits

Reviewing files that changed from the base of the PR and between 13b4fef and a74d6af.

📒 Files selected for processing (3)
  • modes/scan.md
  • scan.mjs
  • templates/portals.example.yml

Comment thread modes/scan.md
Comment on lines +112 to +119
6b. **Filter by location (optional)** using `location_filter` from `portals.yml`:
- If the `location_filter` block is absent, all locations pass (default behavior)
- Empty location on an offer → pass (don't penalize missing data)
- Any `block` keyword present → reject (takes precedence over allow)
- `allow` empty → pass (block already cleared)
- `allow` non-empty → must match at least one keyword
- All matches are case-insensitive substring
- Location is persisted as the 7th column in `scan-history.tsv` for later audit

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update the later scan-history.tsv examples to 7 columns as well.

This section introduces location as column 7, but Step 8 and the Scan History sample below still show 6-field rows. Leaving both schemas in one doc will cause manual scan workflows to keep writing the old format.


Comment thread scan.mjs
Comment on lines +146 to +156
```js
function buildLocationFilter(locationFilter) {
  if (!locationFilter) return () => true;
  const allow = (locationFilter.allow || []).map(k => k.toLowerCase());
  const block = (locationFilter.block || []).map(k => k.toLowerCase());

  return (location) => {
    if (!location) return true;
    const lower = location.toLowerCase();
    if (block.length > 0 && block.some(k => lower.includes(k))) return false;
    if (allow.length === 0) return true;
    return allow.some(k => lower.includes(k));
```

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard location_filter.allow and block before calling .map().

Lines 148-149 assume both YAML values are arrays. A common config typo like allow: "Remote" will throw here and abort the entire scan before any company is processed. Normalize or coerce these fields first so the optional filter degrades gracefully instead of crashing.

🛠️ Suggested hardening
```diff
 function buildLocationFilter(locationFilter) {
   if (!locationFilter) return () => true;
-  const allow = (locationFilter.allow || []).map(k => k.toLowerCase());
-  const block = (locationFilter.block || []).map(k => k.toLowerCase());
+  const toKeywords = (value) =>
+    (Array.isArray(value) ? value : typeof value === 'string' ? [value] : [])
+      .filter(v => typeof v === 'string' && v.trim() !== '')
+      .map(v => v.toLowerCase());
+  const allow = toKeywords(locationFilter.allow);
+  const block = toKeywords(locationFilter.block);

   return (location) => {
-    if (!location) return true;
-    const lower = location.toLowerCase();
+    const lower = String(location ?? '').trim().toLowerCase();
+    if (!lower) return true;
     if (block.length > 0 && block.some(k => lower.includes(k))) return false;
     if (allow.length === 0) return true;
     return allow.some(k => lower.includes(k));
   };
 }
```
As per coding guidelines, "Ensure scripts handle missing data/ directories gracefully."
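With that normalization, a string-valued `allow` degrades gracefully instead of aborting the scan. A standalone sketch of the hardened helper (same logic as the diff above, extracted so it can be run in isolation; not the committed code):

```javascript
// Normalize allow/block to lowercase keyword arrays; invalid values become [].
const toKeywords = (value) =>
  (Array.isArray(value) ? value : typeof value === 'string' ? [value] : [])
    .filter(v => typeof v === 'string' && v.trim() !== '')
    .map(v => v.toLowerCase());

function buildLocationFilter(locationFilter) {
  if (!locationFilter) return () => true;
  const allow = toKeywords(locationFilter.allow);
  const block = toKeywords(locationFilter.block);
  return (location) => {
    const lower = String(location ?? '').trim().toLowerCase();
    if (!lower) return true;
    if (block.some(k => lower.includes(k))) return false;
    if (allow.length === 0) return true;
    return allow.some(k => lower.includes(k));
  };
}

// A config typo like `allow: "Remote"` no longer crashes the scan:
console.log(buildLocationFilter({ allow: 'Remote' })('Remote, US')); // true
console.log(buildLocationFilter({ allow: 42 })('Berlin'));           // true (invalid allow → allow all)
```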

Comment thread scan.mjs
Comment on lines 242 to 248
```diff
 function appendToScanHistory(offers, date) {
-  // Ensure file + header exist
+  // Ensure file + header exist. Location appended as 7th column for non-breaking
+  // backward compat — older scan-history.tsv files with 6 columns still parse fine
+  // since loadSeenUrls only reads column 0.
   if (!existsSync(SCAN_HISTORY_PATH)) {
-    writeFileSync(SCAN_HISTORY_PATH, 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\n', 'utf-8');
+    writeFileSync(SCAN_HISTORY_PATH, 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation\n', 'utf-8');
   }
```
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Migrate legacy scan-history.tsv headers before appending location rows.

Line 247 only fixes brand-new files. Existing repositories keep the old 6-column header forever, so the new location field lands under an unlabeled column. URL dedup still works, but the scan-history audit/export format stays inconsistent until the header is upgraded once.

🧾 Suggested one-time header upgrade

```diff
 function appendToScanHistory(offers, date) {
+  const header6 = 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus';
+  const header7 = `${header6}\tlocation`;
+
   // Ensure file + header exist. Location appended as 7th column for non-breaking
   // backward compat — older scan-history.tsv files with 6 columns still parse fine
   // since loadSeenUrls only reads column 0.
   if (!existsSync(SCAN_HISTORY_PATH)) {
-    writeFileSync(SCAN_HISTORY_PATH, 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus\tlocation\n', 'utf-8');
+    writeFileSync(SCAN_HISTORY_PATH, `${header7}\n`, 'utf-8');
+  } else {
+    const text = readFileSync(SCAN_HISTORY_PATH, 'utf-8');
+    if (text.startsWith(`${header6}\n`) && !text.startsWith(`${header7}\n`)) {
+      writeFileSync(
+        SCAN_HISTORY_PATH,
+        text.replace(`${header6}\n`, `${header7}\n`),
+        'utf-8',
+      );
+    }
   }

   const lines = offers.map(o =>
     `${o.url}\t${date}\t${o.source}\t${o.title}\t${o.company}\tadded\t${o.location || ''}`
```
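The header upgrade can be exercised in isolation. A minimal runnable sketch under the assumption that only the header line changes and all data rows are preserved (the file path and helper name here are hypothetical):

```javascript
import { writeFileSync, readFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

const HEADER6 = 'url\tfirst_seen\tportal\ttitle\tcompany\tstatus';
const HEADER7 = `${HEADER6}\tlocation`;

// One-time, idempotent upgrade: rewrite only a legacy 6-column header line.
function migrateHeader(path) {
  const text = readFileSync(path, 'utf-8');
  if (text.startsWith(`${HEADER6}\n`) && !text.startsWith(`${HEADER7}\n`)) {
    writeFileSync(path, `${HEADER7}\n` + text.slice(HEADER6.length + 1), 'utf-8');
  }
}

// Demo against a throwaway legacy file with one 6-column row.
const file = join(tmpdir(), 'scan-history-demo.tsv');
writeFileSync(
  file,
  `${HEADER6}\nhttps://example.com/j/1\t2026-05-01\tlever\tSE\tExampleCo\tadded\n`,
  'utf-8',
);
migrateHeader(file);
console.log(readFileSync(file, 'utf-8').split('\n')[0]); // now the 7-column header
```

Running it twice is safe: the second call sees the 7-column header and leaves the file untouched.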

deepak-glitch pushed a commit to deepak-glitch/career-ops that referenced this pull request May 6, 2026
…Fs, archived 3 below-threshold)

- scan.mjs added 2 new offers (ElevenLabs Enterprise Solutions Engineer NA + EU-German)
- Level 3 WebSearch added 4 Lever FDE candidates (Research Innovations [closed], JetBridge, zaimler, Symmetry Systems Associate L3)
- 5 A-G evaluations: santifer#566 ElevenLabs NA (3.2/5, PDF), santifer#567 ElevenLabs EU German (1.5/5 SKIP), santifer#568 JetBridge LATAM (2.4/5 SKIP), santifer#569 zaimler NYC (2.7/5 SKIP), santifer#570 Symmetry L3 (3.0/5, PDF — fuzzy-merged into existing santifer#105)
- Research Innovations TC2259 marked closed (Lever 404; defense/EW + US Secret + 8-12 yrs hard veto)
- All reports include Location in header per Shortlist Display Policy
- 0 errors / 0 warnings on verify-pipeline.mjs

https://claude.ai/code/session_overnight-2026-05-06T04:26Z

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: optional location_filter in portals.yml + persist location to scan-history.tsv

1 participant