How does the 5-layer hierarchy decide which layer to use?

Layers run in order. If Layer 1 (static) extracts enough data with high confidence, layers 2-5 are skipped; if its confidence falls below the threshold, the next layer runs. Browser rendering (Layer 5) is used only when all other methods fail, typically on heavy single-page applications.
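As a rough illustration of this escalation, here is a minimal Python sketch. It assumes each layer is a callable that returns extracted data plus a 0-100 confidence score (the same scale the quality assessment step uses). The threshold of 70 and the names `run_hierarchy` and `LayerResult` are hypothetical, not the product's actual API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LayerResult:
    data: dict
    confidence: float  # 0-100, as estimated by the layer itself (assumed scale)


# A layer is any callable that takes a URL and returns a LayerResult.
Layer = Callable[[str], LayerResult]


def run_hierarchy(
    url: str, layers: list[tuple[str, Layer]], threshold: float = 70.0
) -> tuple[str, LayerResult]:
    """Run layers in order and stop at the first one that meets the threshold.

    If no layer is confident enough, return the best result seen.
    """
    best_name, best = "none", LayerResult({}, 0.0)
    for name, layer in layers:
        result = layer(url)
        if result.confidence > best.confidence:
            best_name, best = name, result
        if result.confidence >= threshold:
            break  # confident enough: later layers never run
    return best_name, best
```

In this sketch the caller orders the layers from static to browser, so Playwright only runs when every earlier layer came back below the threshold.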

What happens when a CAPTCHA is detected?

The scraper immediately stops crawling that domain and marks it as requires_manual. No CAPTCHA bypass is attempted. To make CAPTCHAs less likely in the first place, the scraper waits a human-like 2-5 s between requests and rotates user agents.
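A minimal sketch of that policy, assuming a CAPTCHA page can be recognized by a few marker strings in its HTML. The marker list, `crawl_domain`, and the `fetch`/`sleep` hooks are hypothetical; `sleep` is injectable so the 2-5 s delay can be tested without waiting. User agent rotation is left out.

```python
import random
import time
from typing import Callable

# Hypothetical markers; real CAPTCHA detection would likely check more signals.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-challenge")


def looks_like_captcha(html: str) -> bool:
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)


def human_delay(
    low: float = 2.0, high: float = 5.0, sleep: Callable[[float], None] = time.sleep
) -> float:
    """Pause for a random interval (2-5 s by default) and return its length."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def crawl_domain(
    domain: str,
    urls: list[str],
    fetch: Callable[[str], str],
    status: dict[str, str],
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Fetch a domain's pages; on the first CAPTCHA, stop and flag the domain."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            human_delay(sleep=sleep)  # wait between requests, not before the first
        html = fetch(url)
        if looks_like_captcha(html):
            status[domain] = "requires_manual"  # no bypass is attempted
            return pages
        pages.append(html)
    status[domain] = "done"
    return pages
```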

Can I control the crawling budget?

Yes. Per-company limits are configurable: max pages (default 15), max browser renders (default 3), and max runtime (default 120 s). These caps stop runaway crawls on any single site while still leaving room for a thorough extraction.
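One way to enforce these limits is a small budget object consulted before every page fetch and every browser render. This sketch uses the defaults listed above; the class name, method names, and the injectable clock are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CrawlBudget:
    """Per-company crawl limits; defaults mirror the values quoted above."""

    max_pages: int = 15
    max_browser_renders: int = 3
    max_runtime_s: float = 120.0
    clock: Callable[[], float] = time.monotonic  # injectable for testing
    pages_used: int = field(default=0, init=False)
    renders_used: int = field(default=0, init=False)
    started_at: float = field(default=0.0, init=False)

    def __post_init__(self) -> None:
        self.started_at = self.clock()

    def time_left(self) -> bool:
        return self.clock() - self.started_at < self.max_runtime_s

    def try_page(self) -> bool:
        """Reserve one page fetch; False means crawling this company should stop."""
        if self.pages_used >= self.max_pages or not self.time_left():
            return False
        self.pages_used += 1
        return True

    def try_render(self) -> bool:
        """Reserve one browser render (tracked separately from page fetches)."""
        if self.renders_used >= self.max_browser_renders or not self.time_left():
            return False
        self.renders_used += 1
        return True
```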

Does it work with JavaScript-heavy sites?

Yes. Layer 2 extracts JS data without rendering (parsing __NEXT_DATA__, React props, inline JSON). Layer 5 uses Playwright for full browser rendering when needed. Most modern sites are handled by layers 2-3 without needing the browser.

Website Intelligence Scraper - 5-Layer Adaptive Web Crawling

Step 1

URL Resolution

Resolve domain, follow redirects, validate SSL, check against bad domain filter list.
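The offline part of this step (normalizing the input and checking the filter list) might look like the sketch below. The contents of `BAD_DOMAINS` are hypothetical, since the real list is not published here, and redirect following and SSL validation are omitted because they need network access.

```python
from urllib.parse import urlsplit

# Hypothetical filter entries (e.g. social networks that are not company sites).
BAD_DOMAINS = {"facebook.com", "linkedin.com", "blogspot.com"}


def normalize_domain(raw: str) -> str:
    """Reduce input such as 'HTTPS://WWW.Acme.com/about' to 'acme.com'."""
    candidate = raw.strip()
    if "://" not in candidate:
        candidate = "https://" + candidate  # urlsplit needs a scheme to find the host
    host = urlsplit(candidate).hostname or ""  # .hostname is already lowercased
    return host.removeprefix("www.").rstrip(".")


def is_blocked(domain: str, bad_domains: set[str] = BAD_DOMAINS) -> bool:
    """True if the domain or any parent (jobs.linkedin.com -> linkedin.com) is listed."""
    labels = domain.split(".")
    return any(".".join(labels[i:]) in bad_domains for i in range(len(labels) - 1))
```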

Step 2

Static Layer (Layer 1)

HTTP GET with anti-detection headers. Parse HTML with regex, extract schema.org, JSON-LD, Open Graph metadata.
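Because this layer parses HTML with regular expressions rather than building a DOM, the JSON-LD and Open Graph part could be sketched as follows. The patterns are simplified assumptions: the Open Graph regex expects `property` to come before `content`, and schema.org microdata is not covered.

```python
import json
import re

# Matches <script type="application/ld+json">...</script> blocks.
JSON_LD_RE = re.compile(
    r'<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)
# Matches <meta property="og:xxx" content="..."> (property before content only).
OG_RE = re.compile(
    r'<meta[^>]+property=["\']og:([\w:]+)["\'][^>]+content=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def extract_static(html: str) -> dict:
    """Pull JSON-LD blocks and Open Graph tags out of raw HTML."""
    json_ld = []
    for block in JSON_LD_RE.findall(html):
        try:
            json_ld.append(json.loads(block))
        except json.JSONDecodeError:
            continue  # malformed blocks are common in the wild; skip them
    open_graph = {key: value for key, value in OG_RE.findall(html)}
    return {"json_ld": json_ld, "open_graph": open_graph}
```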

Step 3

JS Data Layer (Layer 2)

Extract __NEXT_DATA__, data-react-props, inline JSON, and API endpoint data from page source without rendering.
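A sketch of reading embedded framework data straight from the page source. It assumes `__NEXT_DATA__` sits in a `<script id="__NEXT_DATA__">` tag and that `data-react-props` holds HTML-escaped JSON, as helpers such as react-rails emit. The function name and output keys are hypothetical.

```python
import html as html_lib
import json
import re

NEXT_DATA_RE = re.compile(
    r'<script[^>]+id=["\']__NEXT_DATA__["\'][^>]*>(.*?)</script>', re.DOTALL
)
# Group 1 captures the opening quote so the value ends at the matching quote.
REACT_PROPS_RE = re.compile(r'data-react-props=(["\'])(.*?)\1', re.DOTALL)


def extract_js_data(page: str) -> dict:
    """Read framework-embedded JSON from page source without rendering."""
    out = {"next_data": None, "react_props": []}
    match = NEXT_DATA_RE.search(page)
    if match:
        try:
            out["next_data"] = json.loads(match.group(1))
        except json.JSONDecodeError:
            pass
    for _quote, raw in REACT_PROPS_RE.findall(page):
        try:
            # attribute values are HTML-escaped (&quot; etc.), so unescape first
            out["react_props"].append(json.loads(html_lib.unescape(raw)))
        except json.JSONDecodeError:
            continue
    return out
```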

Step 4

Structural Layer (Layer 3)

Discover and crawl /team, /about, /contact, /leadership, /people pages. Build cross-page context map.
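Link discovery for this layer could be sketched with the standard library's `html.parser`: collect every `<a href>`, resolve it against the homepage, and keep same-host URLs whose path contains one of the keywords above. Substring matching and the `limit` parameter are assumptions, and the cross-page context map is not shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Path keywords from the step description; substring matching is an assumption.
TARGET_KEYWORDS = ("team", "about", "contact", "leadership", "people")


class _LinkCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def discover_structural_pages(base_url: str, page_html: str, limit: int = 10) -> list[str]:
    """Return same-site URLs whose path mentions a team/about/contact-style keyword."""
    parser = _LinkCollector()
    parser.feed(page_html)
    base_host = urlsplit(base_url).hostname
    found: list[str] = []
    for href in parser.hrefs:
        url = urljoin(base_url, href).split("#")[0]  # drop fragments
        parts = urlsplit(url)
        if parts.hostname != base_host:
            continue  # stay on the company's own site
        path = parts.path.lower()
        if any(k in path for k in TARGET_KEYWORDS) and url not in found:
            found.append(url)
    return found[:limit]
```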

Step 5

Semantic Layer (Layer 4)

4-method contact extraction: JSON-LD structured data, team card detection, heuristic proximity analysis, LinkedIn X-ray.
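A minimal sketch of the heuristic proximity method. It pairs each job title with the closest person name within a character window. Names are matched by a naive "Firstname Lastname" pattern and titles by a short hard-coded list; both are hypothetical simplifications, and ties go to the name that appears before the title.

```python
import re

# Hypothetical vocabularies: a short title list and a naive two-word name pattern.
TITLE_RE = re.compile(
    r"\b(?:CEO|CTO|CFO|COO|Founder|Co-Founder|Director|VP|Head of [A-Z][a-z]+)\b"
)
NAME_RE = re.compile(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b")


def _gap(a: re.Match, b: re.Match) -> int:
    """Number of characters separating two non-overlapping matches."""
    return max(b.start() - a.end(), a.start() - b.end(), 0)


def proximity_contacts(text: str, window: int = 60) -> list[tuple[str, str]]:
    """Pair each title with the closest name at most `window` characters away."""
    names = list(NAME_RE.finditer(text))
    pairs = []
    for title in TITLE_RE.finditer(text):
        candidates = [
            # (distance, name-comes-after-title, name): False sorts first on ties
            (_gap(name, title), name.start() > title.start(), name.group(0))
            for name in names
            if _gap(name, title) <= window
        ]
        if candidates:
            pairs.append((min(candidates)[2], title.group(0)))
    return pairs
```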

Step 6

Browser Layer (Layer 5)

Playwright rendering for JS-heavy pages. Budget-capped at 3 renders per company. Singleton pool management.
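A sketch of a process-wide singleton pool with a per-company render cap. To keep it runnable without a browser installed, the rendering itself is injected as `render_fn`; in a real deployment that would wrap Playwright calls such as `page.goto(url)` followed by `page.content()`. The class and method names are hypothetical.

```python
import threading
from typing import Callable, Optional


class BrowserPool:
    """Singleton that hands out browser renders and caps them per company."""

    _instance: Optional["BrowserPool"] = None
    _instance_lock = threading.Lock()

    def __init__(self, render_fn: Callable[[str], str], max_renders_per_company: int = 3):
        self._render_fn = render_fn  # stand-in for a real Playwright page render
        self._max = max_renders_per_company
        self._used: dict[str, int] = {}
        self._lock = threading.Lock()

    @classmethod
    def get(cls, render_fn: Callable[[str], str]) -> "BrowserPool":
        """Return the shared pool, creating it on first use only."""
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls(render_fn)
            return cls._instance

    def render(self, company: str, url: str) -> Optional[str]:
        """Render a page, or return None once this company's budget is spent."""
        with self._lock:
            if self._used.get(company, 0) >= self._max:
                return None  # caller keeps whatever earlier layers found
            self._used[company] = self._used.get(company, 0) + 1
        return self._render_fn(url)
```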

Step 7

Quality Assessment

Score extraction confidence 0-100. Flag companies below threshold for re-crawl with deeper methods.
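A simple way to produce a 0-100 score is a weighted checklist of extracted fields. The weights, field names, and the re-crawl threshold of 60 below are illustrative assumptions, not the product's actual scoring model.

```python
# Hypothetical field weights (they sum to 100); the real model is not documented here.
WEIGHTS = {
    "company_name": 15,
    "description": 10,
    "contacts": 35,
    "emails": 20,
    "social_profiles": 10,
    "team_page_found": 10,
}
RECRAWL_THRESHOLD = 60  # assumed cut-off


def quality_score(record: dict) -> int:
    """Score 0-100: each non-empty field contributes its full weight."""
    score = sum(weight for name, weight in WEIGHTS.items() if record.get(name))
    return min(score, 100)


def needs_recrawl(record: dict, threshold: int = RECRAWL_THRESHOLD) -> bool:
    """Flag records below the threshold for a deeper re-crawl."""
    return quality_score(record) < threshold
```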

Use Cases

Pre-Event Intelligence

Crawl all exhibitor websites before a trade show to extract team pages, contact info, and company profiles.

CRM Enrichment

Batch crawl domains from your CRM to fill missing company data, contacts, and social profiles.

Competitive Analysis

Monitor competitor websites for team changes, new hires, and organizational structure updates.

Market Research

Crawl industry directories and company listings to build comprehensive market maps.

Tech Stack Detection

Extract technology signals from website source code, meta tags, and JS frameworks.

Lead Qualification

Crawl prospect websites to assess company size, team structure, and contact availability before outreach.

More Contacts

Compared with single-method scrapers, thanks to 5-layer escalation

85% Fewer Browser Renders

Most data is extracted before the Playwright layer is reached

15-Page Per-Company Budget

Prevents runaway crawls and wasted resources

<2s Average Page Time

Static and JS layers process pages in milliseconds

Stop guessing which scraper to use

The 5-layer hierarchy adapts automatically. Upload your domain list and let the pipeline choose the right method for each site.

FAQ

Frequently Asked Questions

Everything you need to know about our platform.

Still have questions?

Our team can walk you through the pipeline, pricing, and your use case.

Talk to us

Related Workflows

Domain Intelligence (Product): DNS, SSL, and WHOIS analysis. /products/domain-intelligence

Decision Maker Discovery (Product): 4-method contact extraction. /products/decision-maker-discovery

Tech Stack Detection (Product): technology fingerprinting. /products/tech-stack-detection

Scraping Hierarchy (Platform): full technical documentation. /platform/scraping
