feat: optimize full pipeline for internship search #556
vineetjangiriitb wants to merge 1 commit into santifer:main
Fixes two critical bugs that caused the zero-token scanner to scan 0 companies, and realigns all evaluation modes from senior-role framing to intern-level framing for an IIT Bombay B.Tech 2027 candidate.
**scan.mjs**
- Fix Greenhouse URL regex to match both `boards.greenhouse.io` and `job-boards.greenhouse.io` variants (was silently skipping Anthropic, W&B, Arize, Retool)
- Add Ashby `employmentType` filter: drop jobs explicitly tagged full-time/permanent before title filtering
- Replace substring title matching with word-boundary regex so "intern" no longer matches "International" or "Internal Tooling"
**modes/oferta.md**
- Step 0: read archetypes from `_profile.md` instead of hardcoded 6
- Block A: add Duration and Stipend fields for intern postings
- Block B: replace FDE/SA/PM/LLMOps framing with intern archetypes
- Block C: replace "sell senior / downlevel" plan with "demonstrate learning velocity / low stipend" plan
- Block D: add Internshala as comp source; add intern-calibrated scoring scale (₹15k–₹100k+ / $7k+ USD)
- Block F: remove "signals seniority" framing for Reflection column; update archetype framing and red-flag questions to intern level
**batch/batch-prompt.md**
- Replace all 6 senior archetypes with 5 intern archetypes
- Replace senior adaptive framing with intern proof-point framing
- Update Block C and Block D to match oferta.md changes
- Translate header from Spanish to English
**modes/followup.md**
- Replace senior follow-up example (15 yrs PHP) with intern example using candidate's actual projects
**modes/_shared.md**
- Cover letter guidance now distinguishes short text-box (3–4 sentences) from full 1-page cover letter
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
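The Greenhouse URL fix and the Ashby `employmentType` filter described for `scan.mjs` can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `GREENHOUSE_URL`, `greenhouseSlug`, `keepByEmploymentType`, and the `FullTime`-style tag spellings are all assumptions made for the example.

```javascript
// Hypothetical sketch: names and patterns are illustrative, not copied from scan.mjs.

// Accepts both boards.greenhouse.io/<slug> and job-boards.greenhouse.io/<slug>.
const GREENHOUSE_URL = /^https?:\/\/(?:job-)?boards\.greenhouse\.io\/([^/?#]+)/i;

// Returns the company slug from a Greenhouse board URL, or null if it is not one.
function greenhouseSlug(url) {
  const match = GREENHOUSE_URL.exec(url);
  return match ? match[1] : null;
}

// Assumed spellings of the tags to drop, normalized to lowercase letters only.
const EXCLUDED_EMPLOYMENT_TYPES = new Set(['fulltime', 'permanent']);

// Drops postings explicitly tagged full-time/permanent; untagged postings are kept
// so that the later title filter can still decide.
function keepByEmploymentType(job) {
  if (!job.employmentType) return true;
  const normalized = String(job.employmentType).toLowerCase().replace(/[^a-z]/g, '');
  return !EXCLUDED_EMPLOYMENT_TYPES.has(normalized);
}
```

Keeping untagged postings is a deliberate choice in this sketch: the description only says jobs "explicitly tagged" full-time are dropped, so a missing tag is not treated as a reason to exclude.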
Welcome to career-ops, @vineetjangiriitb! Thanks for your first PR. A few things to know:
We'll review your PR soon. Join our Discord if you have questions. |
📝 Walkthrough
This PR transitions the career-ops system from Spanish to English and reorients it from senior-hire evaluation to internship-focused job discovery and evaluation. Changes include language policy enforcement, archetype redefinition, evaluation workflow expansion, and job discovery filtering logic updates across documentation and a script file.
Changes
Internship-Focused System Reorientation
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
batch/batch-prompt.md (2)
96-103: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
Critical: Block B references obsolete senior archetypes instead of the new intern archetypes.
Lines 96-103 still reference the old senior-role archetypes (FDE, SA, PM, LLMOps, Agentic, Transformation), but Step 0 (lines 58-86) now detects intern archetypes (ML/Deep Learning Intern, AI Engineering Intern, Backend/Fullstack Intern, Research Intern, Data Science Intern). This mismatch will break the archetype-adapted evaluation logic in Block B.
🔧 Proposed fix to align Block B with intern archetypes
 **Adaptado al arquetipo:**
-- FDE → priorizar delivery rápida y client-facing
-- SA → priorizar diseño de sistemas e integrations
-- PM → priorizar product discovery y métricas
-- LLMOps → priorizar evals, observability, pipelines
-- Agentic → priorizar multi-agent, HITL, orchestration
-- Transformation → priorizar change management, adoption, scaling
+- If ML / Deep Learning → prioritize training/fine-tuning experiments, PyTorch fluency, paper implementation
+- If AI Engineering → prioritize LLM API integration, RAG pipelines, end-to-end AI feature delivery
+- If Backend / Fullstack → prioritize API design, systems thinking, database schema, shipped code
+- If Research (CV/NLP) → prioritize paper reading, benchmark implementation, research methodology
+- If Data Science → prioritize EDA, modeling pipeline, SQL, clear communication of results
This matches the archetype-specific guidance from modes/oferta.md:29-34.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@batch/batch-prompt.md` around lines 96 - 103, Block B currently uses obsolete senior archetype labels (FDE, SA, PM, LLMOps, Agentic, Transformation) which no longer match the intern archetypes detected in Step 0; update the archetype mapping in Block B to use the intern archetype names (ML/Deep Learning Intern, AI Engineering Intern, Backend/Fullstack Intern, Research Intern, Data Science Intern) and adjust any role-specific guidance text to match the intern-focused priorities referenced in modes/oferta.md lines 29-34 so the archetype-adapted evaluation logic aligns with the output of Step 0.
3-377: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Translate the entire prompt to English to comply with output requirements.
The document body remains predominantly in Spanish ("Eres un worker de evaluación", "Fuentes de Verdad", "Bloque A-G", etc.), which violates the learning requirement that ALL outputs must be in English. Since this is a prompt template that generates evaluation reports, keeping it in Spanish will likely produce Spanish-language outputs.
Based on learnings: "ALL outputs MUST be in English, including internal reports, generated CVs, outreach messages, and all communication with the user. NEVER generate content in Spanish or any other language unless explicitly requested by the user for a specific task."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@batch/batch-prompt.md` around lines 3 - 377, Summary: The prompt file is mostly in Spanish and must be fully translated into English while preserving all placeholders, filenames, rules, and structural semantics. Fix: Translate every user-facing and internal instruction in batch/batch-prompt.md into clear English (preserve exact placeholders like {{URL}}, {{JD_FILE}}, {{REPORT_NUM}}, {{DATE}}, {{ID}}; keep filenames cv.md, llms.txt, article-digest.md, generate-pdf.mjs, templates/cv-template.html, modes/_profile.md; retain blocks A–G, pipeline steps, TSV format, JSON output schema and “NUNCA/SIEMPRE” rules), update the explicit rule that "ALL outputs must be in English" into the document, and ensure examples, tables, and template placeholders remain semantically identical. Also ensure formatting, headings (A–G), and the PDF generation command and template placeholder names are unchanged except for language; do not alter behavior, filenames, or numeric formats.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c878b676-d86b-487c-b754-d92cef5911eb
📒 Files selected for processing (9)
- GEMINI.md
- batch/batch-prompt.md
- modes/_shared.md
- modes/followup.md
- modes/oferta.md
- modes/pipeline.md
- modes/scan.md
- modes/tracker.md
- scan.mjs
Usar WebSearch para stipends actuales de internships — buscar explícitamente "internship" para no confundir con salarios full-time (Internshala, Glassdoor, Levels.fyi intern section, company reviews). Reputación comp de la empresa para interns (return offer rate, stipend ranges). Tendencia demanda. Tabla con datos y fuentes citadas. Si no hay datos, decirlo — missing stipend disclosure is common for internships.
Score de comp (1-5): 5=top quartile, 4=above market, 3=median, 2=slightly below, 1=well below.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add the detailed intern comp calibration scale for consistency.
Line 118 correctly updates the research guidance to focus on internship stipends, but line 120 provides only a generic 1-5 score description. Consider adding the specific intern-calibrated scale from modes/oferta.md to ensure consistent scoring across interactive and batch evaluation modes.
📊 Proposed addition of detailed calibration scale
Score de comp (1-5): 5=top quartile, 4=above market, 3=median, 2=slightly below, 1=well below.
+
+**Comp score calibration for interns (1–5):**
+- 5 → top-tier stipend (₹100k+/month India or $7k+/month USD)
+- 4 → above market for role type
+- 3 → market-rate
+- 2 → slightly below (but not dealbreaker for strong brand/learning)
+- 1 → unpaid or token stipend (< ₹15k/month)
This matches the calibration guidance in modes/oferta.md:57-62 and provides concrete numerical anchors for evaluators.
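To make the numeric anchors concrete, here is a minimal sketch of how the two hard thresholds in the proposed scale could be checked before falling back to market research. The function name `compScoreAnchor`, the currency codes, and the choice to return `null` for the middle of the scale are assumptions for illustration; scores 2 to 4 are defined relative to market data and are intentionally not computed here.

```javascript
// Hypothetical helper: only the two numeric anchors come from the proposed scale.
// Scores 2-4 depend on market comparison and are left to the evaluator.
function compScoreAnchor(monthlyStipend, currency) {
  if (currency === 'INR') {
    if (monthlyStipend >= 100000) return 5; // top-tier: ₹100k+/month
    if (monthlyStipend < 15000) return 1;   // unpaid or token stipend
  } else if (currency === 'USD') {
    if (monthlyStipend >= 7000) return 5;   // top-tier: $7k+/month
    if (monthlyStipend === 0) return 1;     // unpaid
  }
  return null; // between anchors: needs cited market data to score 2-4
}
```

Returning `null` rather than guessing keeps the helper consistent with the instruction to cite sources when assigning a score.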
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@batch/batch-prompt.md` around lines 118 - 120, Update the generic 1-5 scoring
text in batch-prompt.md (around the internship stipend guidance) to include the
detailed intern-specific calibration scale from modes/oferta.md (refer to the
calibration anchors at lines ~57-62) so evaluators have concrete numerical
anchors for 5=top quartile, 4=above market, 3=median, 2=slightly below, 1=well
below; replace or expand the existing Score de comp paragraph to
verbatim/near-verbatim include those intern-calibrated descriptors and examples
(stipend ranges or qualitative cues) and ensure the prompt instructs to cite
sources when assigning scores.
1. **Read** `data/pipeline.md` → find `- [ ]` items in the "Pending" section
2. **For each pending URL**:
   a. Calculate next sequential `REPORT_NUM` (read `reports/`, take highest number + 1)
   b. **Extract JD** using Playwright (browser_navigate + browser_snapshot) → WebFetch → WebSearch
   c. If URL is not accessible → mark as `- [!]` with a note and continue
   d. **Execute full auto-pipeline**: A-G Evaluation → Report .md → PDF (if score >= 3.0) → Tracker
   e. **Move from "Pending" to "Processed"**: `- [x] #NNN | URL | Company | Role | Score/5 | PDF ✅/❌`
3. **If there are 3+ pending URLs**, launch agents in parallel (Agent tool with `run_in_background`) to maximize speed.
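Step 2a above (highest existing report number plus one) can be sketched as below. This is an illustration under assumptions: `nextReportNum` is a hypothetical name, report filenames are assumed to begin with a numeric prefix, and the three-digit padding is inferred from the `#NNN` placeholder in step 2e.

```javascript
// Hypothetical helper. Assumes report files start with a numeric prefix,
// for example "042-company-role.md"; files without one are ignored.
function nextReportNum(fileNames) {
  let highest = 0;
  for (const name of fileNames) {
    const match = /^(\d+)/.exec(name);
    if (match) highest = Math.max(highest, Number(match[1]));
  }
  // Zero-pad to three digits to fit the "#NNN" format.
  return String(highest + 1).padStart(3, '0');
}
```

In practice the list would come from reading the `reports/` directory, for example with `fs.readdirSync('reports')`.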
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify scan.mjs still uses Spanish section names
rg -n "Pendientes|Procesadas" scan.mjs
Repository: santifer/career-ops
Length of output: 353
🏁 Script executed:
grep -n "^## " modes/pipeline.md | head -20
Repository: santifer/career-ops
Length of output: 227
scan.mjs section markers must match the English headers in modes/pipeline.md.
The appendToPipeline function in scan.mjs searches for Spanish section headers ("## Pendientes" and "## Procesadas"), but modes/pipeline.md now uses English headers ("## Pending" and "## Processed"). This causes the function to fail appending entries to the correct section.
Update scan.mjs lines 205 and 209:
- const marker = '## Pendientes';
+ const marker = '## Pending';
- const procIdx = text.indexOf('## Procesadas');
+ const procIdx = text.indexOf('## Processed');
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modes/pipeline.md` around lines 7 - 14, The appendToPipeline function is
looking for Spanish section headers and therefore fails to append; update the
header string constants in appendToPipeline (currently "## Pendientes" and "##
Procesadas") to the English headers used in modes/pipeline.md ("## Pending" and
"## Processed") so the function finds and inserts entries into the correct
sections, and verify any related index/regex that matches those headers is
updated accordingly.
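As a rough illustration of why the header strings matter, here is a minimal sketch of a marker-based insert using the English headers. It is a hypothetical reconstruction, not the actual `appendToPipeline` from `scan.mjs`: the name `appendToPending`, its signature, and the decision to throw on a missing header are assumptions.

```javascript
// Hypothetical sketch; the real appendToPipeline() in scan.mjs may differ.
const PENDING_HEADER = '## Pending';
const PROCESSED_HEADER = '## Processed';

// Inserts `entry` as the last line of the Pending section of `text`.
function appendToPending(text, entry) {
  const pendingIdx = text.indexOf(PENDING_HEADER);
  if (pendingIdx === -1) {
    // Fail loudly so a header/language mismatch is noticed instead of skipped.
    throw new Error(`Missing "${PENDING_HEADER}" section`);
  }
  const processedIdx = text.indexOf(PROCESSED_HEADER, pendingIdx);
  const insertAt = processedIdx === -1 ? text.length : processedIdx;
  const before = text.slice(0, insertAt).trimEnd();
  const after = text.slice(insertAt);
  return after ? `${before}\n${entry}\n\n${after}` : `${before}\n${entry}\n`;
}
```

With a file that still uses `## Pendientes`, this version throws, which makes the kind of mismatch flagged above visible at scan time.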
Hi, scan.mjs buildTitleFilter() wraps the entire keyword with lookarounds, so keywords that include a trailing space (e.g., "Java ", "SAP ") will no longer match normal titles like "Java Developer", causing excluded roles to slip through.
Severity: action required | Category: correctness
How to fix: Trim keywords before regex
Agent prompt to fix - you can give this to your LLM of choice:
Found by Qodo code review
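The suggested fix (trim each keyword before wrapping it in lookarounds) could look roughly like this. It is a hedged reconstruction, not the code in `scan.mjs`: the signature of `buildTitleFilter`, the `escapeRegExp` helper, and the exact lookaround character class are assumptions.

```javascript
// Hypothetical reconstruction; the real buildTitleFilter() in scan.mjs may differ.

// Escapes characters that have special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Returns a predicate that is true when any keyword appears in the title as a
// whole word (not embedded in a longer alphanumeric run).
function buildTitleFilter(keywords) {
  const patterns = keywords
    .map((keyword) => keyword.trim()) // "Java " -> "Java", so the lookahead sees "D", not " "
    .filter((keyword) => keyword.length > 0)
    .map((keyword) => new RegExp(`(?<![A-Za-z0-9])${escapeRegExp(keyword)}(?![A-Za-z0-9])`, 'i'));
  return (title) => patterns.some((pattern) => pattern.test(title));
}

const matchesPositive = buildTitleFilter(['intern']);
const matchesExcluded = buildTitleFilter(['Java ', 'SAP ']);
```

Without the `trim()`, the pattern for `"Java "` would require a non-alphanumeric character after the trailing space, which is why `"Java Developer"` would stop matching.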
…Fs, archived 3 below-threshold)
- scan.mjs: ElevenLabs Spain FDE (1 new offer added by Level 1/2)
- Level 3 WebSearch: Greenhouse new grad + Workable HF/Jalasoft/Vettura + Breezy Urrly + Ashby P-1 AI + Himalayas Robots & Pencils + Remotive Littlebird (8 URLs added; 3 archived as fetch errors)
- 6 new evaluations (santifer#556-santifer#561; ElevenLabs Spain dedup'd against santifer#468):
  * santifer#557 Underdog 2026 New Grad (1.8/5, no visa sponsorship; SKIP)
  * santifer#558/562 P-1 AI FDE (3.0/5, engineering AGI seed; PDF generated)
  * santifer#559 Vettura AI/ML (3.0/5, mid breadth NLP+CV+GenAI; PDF generated)
  * santifer#560 Robots & Pencils (1.8/5, Colombia + senior; SKIP)
  * santifer#561 Littlebird Applied AI (3.5/5, RAG hybrid search; PDF generated)
- merge-tracker + verify-pipeline: 0 errors / 0 warnings
- cleanup-low-scores: 3 archived to reports/below-threshold/
https://claude.ai/code/session_overnight_2026-05-05
Summary
This PR fixes two critical bugs that caused the zero-token scanner to scan 0 companies, and realigns all evaluation modes from senior-role framing to intern-level framing.
- `scan.mjs`: Fix Greenhouse URL regex (was silently skipping Anthropic, W&B, Arize, Retool); add Ashby `employmentType` filter to drop full-time jobs at source; replace substring title matching with word-boundary regex so `"intern"` no longer matches `"International"` or `"Internal Tooling"`
- `modes/oferta.md`: Step 0 reads archetypes from `_profile.md`; Block A adds Duration + Stipend fields; Block B/F updated to intern archetype framing; Block C replaces "sell senior / downlevel" with "demonstrate learning velocity / low stipend"; Block D adds Internshala and an intern-calibrated comp scoring scale; Block F reflection column no longer says "signals seniority"; red-flag examples updated to intern level
- `batch/batch-prompt.md`: All 6 senior archetypes replaced with 5 intern archetypes; adaptive framing updated; Block C/D aligned with oferta.md changes; header translated to English
- `modes/followup.md`: Follow-up example replaced: was "15 years PHP experience", now an intern example using candidate's actual projects
- `modes/_shared.md`: Cover letter guidance distinguishes short text-box (3–4 sentences) from full 1-page cover letter
Test plan
- `node scan.mjs --dry-run`: should show 10+ companies detected (not 0)
- `/career-ops oferta` on a sample intern JD: verify Block C generates "learning velocity" strategy, not "sell senior" strategy
🤖 Generated with Claude Code