Contact Intelligence Engine
Web scraping pipeline for contact extraction, decision maker mapping, and LinkedIn enrichment
The Contact Intelligence Engine takes the domains discovered by the B2B Discovery Engine and analyzes each website with an adaptive web scraping pipeline. It crawls in two tiers (Playwright stealth browser first, httpx fallback), extracts contacts through 4 cascading methods, infers seniority levels from job titles for B2B sales intelligence, classifies personas (Economic Buyer, Champion, Technical Evaluator, Influencer), and discovers social profiles across 8 platforms, including LinkedIn.
Pipeline Stages
Each stage executes automatically, escalating only when needed.
Stealth Crawl
Two-tier web scraping pipeline: Playwright stealth browser first (handles SPAs, JS-heavy sites), httpx fallback for static HTML. Page classification (team, leadership, about, contact, careers) determines crawl priority for AI lead generation.
Contact Extraction
4-method cascade for B2B sales intelligence: JSON-LD structured data -> team card CSS pattern detection -> heuristic proximity analysis (name near email/phone) -> LinkedIn profile URL extraction.
Decision Maker Map
Seniority inference from job titles using NLP scoring, feeding the lead enrichment platform. Persona classification into 4 types: Economic Buyer, Champion, Technical Evaluator, Influencer.
Social Discovery
Social profile extraction and enrichment across 8 platforms: LinkedIn, Twitter, Facebook, Instagram, YouTube, GitHub, Crunchbase, Glassdoor. URL pattern matching with a DuckDuckGo (DDG) search fallback.
Output
Streaming push with per-contact authority scoring, persona labels, social profile links, and confidence scores for the sales intelligence API.
Key Capabilities
Two-Tier Web Scraping Pipeline
Playwright stealth browser handles JavaScript-heavy SPAs, React/Vue apps, and dynamically loaded team pages. Falls back to lightning-fast httpx for static HTML sites. Budget-capped at 3 browser renders per company.
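The budget-capped, browser-first fallback described above can be sketched roughly as follows. This is a minimal illustration, not the engine's actual code: `CompanyCrawler`, `render_page`, and `fetch_static` are hypothetical names, the two fetchers are injected as plain callables (in practice they might wrap a Playwright page load and an httpx request), and counting failed renders against the budget is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical fetcher signature: takes a URL, returns HTML or None on failure.
Fetcher = Callable[[str], Optional[str]]

MAX_BROWSER_RENDERS = 3  # per-company cap stated in the text


@dataclass
class CompanyCrawler:
    render_page: Fetcher   # tier 1: browser rendering (e.g. a Playwright wrapper)
    fetch_static: Fetcher  # tier 2: plain HTTP fetch (e.g. an httpx wrapper)
    renders_used: int = 0

    def fetch(self, url: str) -> Tuple[Optional[str], str]:
        """Return (html, tier), where tier is 'browser', 'static' or 'failed'."""
        if self.renders_used < MAX_BROWSER_RENDERS:
            # Assumption: an attempt consumes budget even if the render fails.
            self.renders_used += 1
            html = self.render_page(url)
            if html:
                return html, "browser"
        # Budget exhausted or render failed: fall back to the static tier.
        html = self.fetch_static(url)
        return (html, "static") if html else (None, "failed")
```

One crawler instance per company keeps the render counter scoped to that company, so a JS-heavy site cannot consume the budget of the next one.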
Page Classification for AI Lead Generation
Intelligent URL classification identifies team, leadership, about, contact, and careers pages. Priority scoring ensures the most valuable pages for B2B sales intelligence are crawled first.
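As a rough illustration of URL-based page classification with priority ordering, consider the sketch below. The five page types come from the description above; the keyword lists and the numeric priorities are assumptions made for the example, and real classification is likely more involved than substring matching on the path.

```python
from urllib.parse import urlparse

# (page type, priority, path keywords). Keywords and priorities are
# illustrative assumptions; only the page types come from the text.
PAGE_TYPES = [
    ("team", 5, ("team", "people", "staff")),
    ("leadership", 5, ("leadership", "management", "executives")),
    ("about", 4, ("about", "who-we-are")),
    ("contact", 3, ("contact",)),
    ("careers", 2, ("careers", "jobs")),
]


def classify_url(url):
    """Return (page_type, priority) based on keywords in the URL path."""
    path = urlparse(url).path.lower()
    for page_type, priority, keywords in PAGE_TYPES:
        if any(keyword in path for keyword in keywords):
            return page_type, priority
    return "other", 0


def crawl_order(urls):
    """Sort URLs so that higher-priority page types are crawled first."""
    return sorted(urls, key=lambda u: classify_url(u)[1], reverse=True)
```

Because the rules are checked in order, a path such as `/about/team` is classified as a team page rather than an about page.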
4-Method Contact Extraction
JSON-LD structured data (highest quality) -> Team card CSS layout detection -> Heuristic proximity analysis -> LinkedIn URL extraction. Each method feeds the lead enrichment platform pipeline.
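The first method in the cascade, JSON-LD extraction, might look something like this sketch. It uses only the Python standard library and assumes contacts are published as schema.org `Person` objects; the function name `extract_people` and the output field names are hypothetical, and the other three methods are not shown.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def _walk(node):
    """Yield every dict nested anywhere inside a parsed JSON value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from _walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from _walk(item)


def extract_people(html):
    """Return a list of contacts found as schema.org Person objects."""
    parser = _JsonLdCollector()
    parser.feed(html)
    people = []
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip and let later methods try
        for obj in _walk(data):
            types = obj.get("@type")
            types = types if isinstance(types, list) else [types]
            if "Person" in types and obj.get("name"):
                people.append({
                    "name": obj["name"],
                    "title": obj.get("jobTitle"),
                    "email": obj.get("email"),
                })
    return people
```

An empty result here would be the signal to move on to the next method in the cascade, such as team card detection.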
Persona Classification for Sales Intelligence
Contacts classified as Economic Buyer (CFO, VP Finance), Champion (VP Sales, Director), Technical Evaluator (CTO, Engineer), or Influencer (Marketing, Growth) for AI outbound automation.
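A simple keyword-based version of this persona mapping could be sketched as below. The keywords are taken directly from the example titles above; the rule order (which decides ties such as "Marketing Director") and the `None` result for unmatched titles are assumptions of this sketch rather than documented behavior.

```python
import re

# Rules are checked in order; the first persona with a matching keyword wins.
# "VP" is omitted because it appears in both Economic Buyer and Champion examples.
PERSONA_RULES = [
    ("Economic Buyer", {"cfo", "finance"}),
    ("Technical Evaluator", {"cto", "engineer"}),
    ("Influencer", {"marketing", "growth"}),
    ("Champion", {"sales", "director"}),
]


def classify_persona(title):
    """Return the persona label for a job title, or None if no rule matches."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    for persona, keywords in PERSONA_RULES:
        if words & keywords:
            return persona
    return None
```

For example, `classify_persona("VP Finance")` returns `"Economic Buyer"` while `classify_persona("VP Sales")` returns `"Champion"`, because the function word decides the persona rather than the rank.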
Seniority Inference
NLP-based title parsing scores seniority from 1 to 5: C-suite=5, VP=4, Director=3, Manager=2, Individual contributor=1. The score is used for decision authority ranking in the B2B sales intelligence pipeline.
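The 1-5 scale can be illustrated with a small sketch. Note that it swaps in plain regular expressions for the NLP-based parsing mentioned above, and the specific title patterns (for example treating SVP and EVP as VP level) are assumptions made for the example.

```python
import re

# (score, pattern) pairs checked from most to least senior.
# The scale comes from the text; the patterns are illustrative.
SENIORITY_PATTERNS = [
    (5, r"\b(?:chief|ceo|cfo|cto|coo|cmo|cio)\b"),
    (4, r"\b(?:[se]?vp|vice president)\b"),
    (3, r"\bdirector\b"),
    (2, r"\bmanager\b"),
]


def score_seniority(title):
    """Return a seniority score from 1 (individual) to 5 (C-suite)."""
    lowered = title.lower()
    for score, pattern in SENIORITY_PATTERNS:
        if re.search(pattern, lowered):
            return score
    return 1  # no senior keyword found: treat as individual contributor
```

Checking patterns from the top down means a title that mentions two levels, such as "VP and Director of Sales", receives the higher score.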
8-Platform Social and LinkedIn Enrichment
Extracts social profiles from LinkedIn, Twitter, Facebook, Instagram, YouTube, GitHub, Crunchbase, and Glassdoor using URL patterns and search engine fallback for comprehensive enrichment.
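The URL pattern matching step might be approximated as in the sketch below, which scans page HTML for links and maps their hostnames to the 8 platforms. The host table (including treating `x.com` as Twitter), the keep-first-link policy, and the function name are assumptions; the search engine fallback is not shown.

```python
import re
from urllib.parse import urlparse

# Hostname -> platform label. The platform list comes from the text;
# the exact hostnames are assumptions of this sketch.
PLATFORM_HOSTS = {
    "linkedin.com": "linkedin",
    "twitter.com": "twitter",
    "x.com": "twitter",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "youtube.com": "youtube",
    "github.com": "github",
    "crunchbase.com": "crunchbase",
    "glassdoor.com": "glassdoor",
}

URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_social_profiles(html):
    """Return {platform: url} for the first profile link found per platform."""
    profiles = {}
    for url in URL_RE.findall(html):
        parsed = urlparse(url)
        host = parsed.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        platform = PLATFORM_HOSTS.get(host)
        # Skip bare homepages (no path) and keep only the first hit per platform.
        if platform and parsed.path.strip("/") and platform not in profiles:
            profiles[platform] = url
    return profiles
```

Matching on the parsed hostname rather than on a substring avoids false positives such as `https://example.com/github.com/x`. A fuller version would also need to filter share and intent links, which this sketch does not do.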
Accepted Inputs
- Array of company objects with domains (from Discovery Engine)
- CSV file with domain column
- Excel file with website/domain column
- Public URL to domain list
- Apify KV Store key
Configuration
- Workers: 1-20 concurrent (default 5)
- Max Pages Per Domain: 1-50 (default 15)
- Company Timeout: 10-600 seconds (default 120)
- Page Delays: min/max milliseconds between requests
- Playwright: enable/disable browser rendering
- Social Enrichment: toggle 8-platform social profile enrichment (LinkedIn and 7 others)
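The documented ranges and defaults could be captured in a validated settings object along these lines. This is a sketch only: the class and field names are hypothetical, and the defaults for page delays and the two toggles are placeholders, since the list above does not state them.

```python
from dataclasses import dataclass


def _check_range(name, value, low, high):
    """Raise ValueError if value falls outside the inclusive range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


@dataclass
class EngineConfig:
    workers: int = 5                 # documented: 1-20, default 5
    max_pages_per_domain: int = 15   # documented: 1-50, default 15
    company_timeout_s: int = 120     # documented: 10-600 seconds, default 120
    min_delay_ms: int = 500          # placeholder default, not documented
    max_delay_ms: int = 2000         # placeholder default, not documented
    use_playwright: bool = True      # placeholder default, not documented
    social_enrichment: bool = True   # placeholder default, not documented

    def __post_init__(self):
        _check_range("workers", self.workers, 1, 20)
        _check_range("max_pages_per_domain", self.max_pages_per_domain, 1, 50)
        _check_range("company_timeout_s", self.company_timeout_s, 10, 600)
        if not 0 <= self.min_delay_ms <= self.max_delay_ms:
            raise ValueError("page delays must satisfy 0 <= min <= max")
```

Validating at construction time means an out-of-range value such as `EngineConfig(workers=21)` fails immediately instead of partway through a crawl.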