jooservices/XCrawler

XCrawler

XCrawler is a Laravel 12 operational crawler application for metadata crawling, normalization, Elasticsearch indexing, and React/Inertia crawl management.

It crawls movie metadata through the local Modules/Crawler parser and adapter module, stores normalized catalog data in MySQL, uses Redis/Horizon for queue work, records crawl logs, indexes movies into Elasticsearch, and provides an admin-focused UI for dashboard, movies, performers, sites, crawl operations, and search.

The companion ../xcrawler-observability service is a separate NestJS/React/PostgreSQL observability app. XCrawler's Laravel OBS integration lives in Modules/Observability and can send signed downstream event copies through observability_outbox for dashboards and timelines. These copies are supplementary: the integration must not replace XCrawler's local crawl_logs, Redis backoff/throttle state, Horizon queues, MySQL source data, or Elasticsearch indexing.

Stack

  • PHP 8.5, Laravel 12
  • React 19, Inertia 3, TypeScript 6, Vite 7, Tailwind CSS 4
  • MySQL 9, Redis 7, Elasticsearch 9.x, MongoDB 8.3, Qdrant 1.13
  • Laravel Horizon
  • JOOservices packages: client, dto, laravel-config, laravel-controller, laravel-repository, useragent

Quick Install With Docker

Use Docker Desktop with the desktop-linux context.

docker context use desktop-linux
cp .env.example .env
docker compose up -d --build
docker compose exec app composer install
docker compose exec app npm install
docker compose exec app php artisan key:generate
docker compose exec app php artisan migrate
docker compose exec app php artisan search:ensure-index
docker compose exec app php artisan vector:ensure-collection
docker compose exec app npm run build

For a new disposable local DB only:

docker compose exec app php artisan db:seed

App: http://127.0.0.1:8080

Horizon: http://127.0.0.1:8080/horizon
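Right after docker compose up, the app URL may not answer immediately. The sketch below is one way to poll until a command succeeds before running further steps. The wait_for helper is hypothetical and not part of the repository; only the URL comes from this README.

```shell
# Hypothetical helper (not shipped with XCrawler): retry a command until it
# succeeds or the attempt budget is used up.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Run the probe command quietly; stop as soon as it succeeds.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  # Every attempt failed.
  return 1
}
```

For example, `wait_for 30 2 curl -fsS http://127.0.0.1:8080` would probe the app up to 30 times, two seconds apart, assuming curl is installed on the host.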

Common Commands

docker compose exec app php artisan crawl:dispatch-due --limit=5
docker compose exec app php artisan crawl:site onejav
docker compose exec app php artisan search:reindex-movies
docker compose exec app php artisan vector:ensure-collection
docker compose exec app php artisan vector:sync-movies
docker compose exec app php artisan horizon:status
docker compose exec app php artisan schedule:list
composer test
npm run typecheck
npm run lint
npm run format:check
npm run build
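For a partial or troubleshooting run, the individual checks above can be chained so that every check runs and failures are summarized at the end. This is a minimal sketch with a hypothetical run_checks helper; it is not a substitute for the full Docker quality gate described in the Safety Summary.

```shell
# Hypothetical helper (not part of the repository): run each check given as a
# quoted command string, print PASS/FAIL per check, and return non-zero if any
# check failed. Output of the checks themselves is suppressed.
run_checks() {
  local failed=0 check
  for check in "$@"; do
    if sh -c "$check" >/dev/null 2>&1; then
      echo "PASS  $check"
    else
      echo "FAIL  $check"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]
}
```

A possible invocation, using only commands listed in this README: `run_checks "composer test" "npm run typecheck" "npm run lint" "npm run format:check"`.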

Safety Summary

  • Use syncWithoutDetaching() for movie performer/genre pivots.
  • Use TaxonomyService for performer/genre names.
  • Use App\Support\UrlHash for URL hashes.
  • Do not run search:ensure-index --force or search:reindex-movies --recreate without environment confirmation.
  • Run the full Docker quality gate, bash scripts/docker-testing.sh, before every commit and push; it is the primary local quality gate. Before a push, also run a narrow real crawl smoke test (crawl:probe-site onejav) in the isolated docker-compose.test.yml stack. Individual checks such as composer test are for partial or troubleshooting runs only.
  • Do not use Colima for this repository; use Docker Desktop desktop-linux.
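The rule about search:ensure-index --force and search:reindex-movies --recreate could be enforced locally with a small wrapper, sketched below. The function names and the XCRAWLER_CONFIRM_ENV variable are assumptions made for illustration; the repository may handle environment confirmation differently.

```shell
# Hypothetical guard (not part of XCrawler). Returns 0 when the artisan
# arguments match one of the two combinations flagged in the Safety Summary.
is_destructive() {
  local cmd="" force=0 recreate=0 arg
  for arg in "$@"; do
    case "$arg" in
      search:ensure-index|search:reindex-movies) cmd="$arg" ;;
      --force) force=1 ;;
      --recreate) recreate=1 ;;
    esac
  done
  if [ "$cmd" = "search:ensure-index" ] && [ "$force" -eq 1 ]; then
    return 0
  fi
  if [ "$cmd" = "search:reindex-movies" ] && [ "$recreate" -eq 1 ]; then
    return 0
  fi
  return 1
}

# Refuse flagged commands unless the operator has explicitly named the
# environment they verified in XCRAWLER_CONFIRM_ENV (an assumed variable).
safe_artisan() {
  if is_destructive "$@" && [ -z "${XCRAWLER_CONFIRM_ENV:-}" ]; then
    echo "refusing: set XCRAWLER_CONFIRM_ENV after confirming the environment" >&2
    return 1
  fi
  docker compose exec app php artisan "$@"
}
```

With this in place, `safe_artisan search:reindex-movies` passes straight through to artisan, while `safe_artisan search:reindex-movies --recreate` is refused until the variable is set.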

Documentation

Start with the Documentation Hub.

Important links:

Modules

AI / Agent Guidance

Community
