Customizable web scraper that sends alerts when criteria are met on websites.
Jack Stockley 9b8b4d1f40
2026-06-03 18:59:59 -05:00

WebScraper

  • This program scrapes data from websites using different scrapers and sends an email when it finds a match or a change, depending on the scraper used
  • There are 2 types of scrapers:
    • Generic: Can scrape any website, but might not be as precise
    • Specific: Can scrape only specific websites, but will be more precise

Generic Scrapers

  • Text
  • Diff

Specific Scrapers

  • Cars.com

How to use

Text

  1. Set these specific env variables:
      SCRAPER=text # Scraper to use
      URL=<URL> # URL to scrape
      TEXT=<TEXT> # Text to look for
    
  2. Ensure all other required env variables are set
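
As a rough illustration of what a text scraper does with `URL` and `TEXT`, here is a minimal sketch. It is not the project's actual implementation. The function names `fetch_page`, `page_contains`, and `check_once` are hypothetical, and the case-insensitive substring match is an assumption about how a match might be decided.

```python
import urllib.request
from typing import Callable, Mapping


def fetch_page(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its body decoded as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "web-scraper-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def page_contains(page: str, text: str) -> bool:
    """Case-insensitive substring check (assumed rule, the real one may differ)."""
    return text.casefold() in page.casefold()


def check_once(env: Mapping[str, str], fetch: Callable[[str], str] = fetch_page) -> bool:
    """Run a single check using the URL and TEXT settings from env."""
    return page_contains(fetch(env["URL"]), env["TEXT"])
```

Passing `os.environ` as `env` would read the same `URL` and `TEXT` variables listed above. The `fetch` parameter only exists so the check can be tried without network access.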

Diff

  1. Set these specific env variables:
      SCRAPER=diff # Scraper to use
      URL=<URL> # URL to scrape
      PERCENTAGE=<PERCENTAGE_DIFF> # Percentage difference to look for
    
  2. Ensure all other required env variables are set
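
The README does not say how the percentage difference is computed. One plausible way to turn two page snapshots into a percentage is sketched below with the standard library's `difflib`. The line-based comparison, the function names `percent_changed` and `should_alert`, and the "greater than or equal" threshold rule are all assumptions.

```python
import difflib


def percent_changed(old: str, new: str) -> float:
    """Return how different two snapshots are: 0.0 (identical) to 100.0."""
    # Compare line by line; autojunk=False disables the popularity heuristic
    # so that frequently repeated lines are still matched.
    matcher = difflib.SequenceMatcher(
        None, old.splitlines(), new.splitlines(), autojunk=False
    )
    return (1.0 - matcher.ratio()) * 100.0


def should_alert(old: str, new: str, threshold_percent: float) -> bool:
    """Alert when the change reaches the threshold (assumed to be inclusive)."""
    return percent_changed(old, new) >= threshold_percent
```

With `PERCENTAGE=5`, a call such as `should_alert(previous_page, current_page, 5.0)` would report a change once roughly 5% of the lines differ under this metric.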

Cars.com

  1. Set these specific env variables:
      SCRAPER=cars_com # Scraper to use
      URL=https://www.cars.com/shopping/results/ # URL to scrape; must be the results page for a specific search
    
  2. Ensure all other required env variables are set
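
Because the Cars.com scraper only works on the results page, it can help to check the `URL` value before starting the container. The helper below is a hypothetical sketch, not part of the project. It only checks the scheme, host, and path shown in the example above.

```python
from urllib.parse import urlparse


def is_cars_com_results_url(url: str) -> bool:
    """Return True when url points at the Cars.com shopping results page."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    return (
        parts.scheme == "https"
        and host in {"cars.com", "www.cars.com"}
        # Accept the path with or without a trailing slash.
        and parts.path.rstrip("/") == "/shopping/results"
    )
```

Search filters in the query string are not inspected by this sketch, so any query parameters on the results URL pass through unchanged.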

Required env variables

SLEEP_TIME_SEC= # Time to sleep between scrapes, in seconds
SENDER_EMAIL= # Email to send from
FROM_EMAIL= # Name to send from, e.g. '"Web Scraper" <no-reply@jstockley.com>'
RECEIVER_EMAIL= # Email to send to
PASSWORD= # Password for the sender's email
SMTP_SERVER= # SMTP server to use
SMTP_PORT= # SMTP port to use
TLS= # True/False to use TLS
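
To show how these variables might fit together, here is a minimal sketch that checks for missing settings and sends an alert with the standard library's `smtplib`. It is an illustration under stated assumptions, not the project's code. The helper names are hypothetical, and `TLS=True` is assumed to mean STARTTLS on a plain SMTP connection rather than implicit TLS.

```python
import smtplib
from email.message import EmailMessage
from typing import List, Mapping

REQUIRED_VARS = (
    "SLEEP_TIME_SEC", "SENDER_EMAIL", "FROM_EMAIL", "RECEIVER_EMAIL",
    "PASSWORD", "SMTP_SERVER", "SMTP_PORT", "TLS",
)


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def build_message(env: Mapping[str, str], subject: str, body: str) -> EmailMessage:
    """Build the alert email from the configured sender and receiver."""
    message = EmailMessage()
    message["From"] = env["FROM_EMAIL"]      # display name plus address
    message["To"] = env["RECEIVER_EMAIL"]
    message["Subject"] = subject
    message.set_content(body)
    return message


def send_alert(env: Mapping[str, str], subject: str, body: str) -> None:
    """Send one alert email using the SMTP settings in env."""
    message = build_message(env, subject, body)
    with smtplib.SMTP(env["SMTP_SERVER"], int(env["SMTP_PORT"])) as smtp:
        if env["TLS"].strip().lower() == "true":
            smtp.starttls()  # assumption: TLS=True means STARTTLS
        smtp.login(env["SENDER_EMAIL"], env["PASSWORD"])
        # Use the login account as the envelope sender.
        smtp.send_message(message, from_addr=env["SENDER_EMAIL"])
```

Calling `missing_vars(os.environ)` at startup gives a quick list of anything still to be filled in from `sample.env`.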

Running multiple of the same scraper

To run two or more scrapers of the same type, e.g. two diff scrapers, give each one a different host folder mapping. Example:

  diff-scraper-1:
    image: jnstockley/web-scraper:latest
    volumes:
      - ./diff-scraper-1-data/:/app/data/
    environment:
      - TZ=America/Chicago
      - SCRAPER=diff
      - URL=https://google.com
      - PERCENTAGE=5
      - SLEEP_TIME_SEC=21600

  diff-scraper-2:
    image: jnstockley/web-scraper:latest
    volumes:
      - ./diff-scraper-2-data/:/app/data/
    environment:
      - TZ=America/Chicago
      - SCRAPER=diff
      - URL=https://yahoo.com
      - PERCENTAGE=5
      - SLEEP_TIME_SEC=21600