FAAQJAVED/FAAQJAVED

Afaq Javed

Python Automation · Web Scraping · B2B Lead Generation

     


6 interconnected tools
565 tests (zero network calls)
20,000+ leads generated
3 CI operating systems
8.7 / 10 avg project rating

What I build

A complete, production-grade B2B lead generation system in Python — six interconnected tools that together cover the full pipeline from discovery → enrichment → CRM-ready output. Any business directory on the internet. Any city. Any sector.

This toolkit has processed 20,000+ verified business leads across the UK property management sector.


The pipeline

  DISCOVERY    find companies from any source
  ---------------------------------------------------------------
  +-  Google Maps Scraper       Maps listings + email enrichment
  +-  LeadHunter Pro            4 search engines, HOT/WARM/COLD
  +-  Trustpilot Scraper        reputation-filtered businesses
  +-  JSON Directory Harvester  any JSON API, config-only
  +-  HTML Directory Scrapers   any HTML or WordPress directory

  ENRICHMENT   add verified contact details
  ---------------------------------------------------------------
  +-  Email & Phone Enricher    HTTP + Playwright, CF bypass

  OUTPUT       consistent schema across all 6 tools
  ---------------------------------------------------------------
  +-  3-sheet Excel  ( Data  /  Flagged  /  Summary )
      deduplicated  *  validated  *  CRM-importable

A client who needs 10,000 verified business contacts — starting from zero — gets back a single deduplicated, formatted Excel file drawn from multiple sources and ready for CRM import.


Projects

Google Maps Scraper

Playwright-driven Maps scraper with concurrent email enrichment, Cloudflare XOR decode, and atomic checkpoint/resume. Runs survive interruption and pick up exactly where they left off.

Playwright · ThreadPoolExecutor · openpyxl · Cloudflare bypass
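
The atomic checkpoint/resume behaviour described above can be sketched as follows. This is a minimal illustration, not the scraper's actual code: the function names, the JSON state format, and the `.tmp` suffix handling are assumptions. The idea is that state is written to a temporary file first and then swapped into place with `os.replace()`, so a crash mid-write never leaves a half-written checkpoint.

```python
import json
import os


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to path atomically (hypothetical helper)."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
        fh.flush()
        os.fsync(fh.fileno())  # push bytes to disk before the swap
    # os.replace overwrites the destination in a single step,
    # so readers see either the old file or the new one.
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or an empty dict on a fresh run."""
    try:
        with open(path, "r", encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}
```

On restart, a run would call `load_checkpoint()` and skip whatever the saved state marks as already processed.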

Email & Phone Enricher

Two-pass crawler: fast HTTP first, Playwright fallback for JS-rendered pages. E.164 phone normalisation. Works standalone or as the enrichment layer for any discovery tool.

httpx · Playwright · E.164 normalisation · Cloudflare bypass
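
As a rough illustration of E.164 phone normalisation, here is a small sketch. It assumes UK-style input with a default country code of 44; the function name, the handling of the `(0)` trunk marker, and the length bounds are assumptions for this example, not the enricher's real rules.

```python
import re
from typing import Optional


def to_e164(raw: str, default_country_code: str = "44") -> Optional[str]:
    """Best-effort conversion of a phone string to E.164 (hypothetical)."""
    text = raw.strip().replace("(0)", "")  # drop the optional trunk "(0)"
    has_plus = text.startswith("+")
    digits = re.sub(r"\D", "", text)

    if has_plus:
        pass  # already carries a country code
    elif digits.startswith("00"):
        digits = digits[2:]  # international dialling prefix
    elif digits.startswith("0"):
        digits = default_country_code + digits[1:]  # national format
    else:
        return None  # ambiguous: no country code and no trunk prefix

    # E.164 allows at most 15 digits; the lower bound is a sanity check only.
    if not 8 <= len(digits) <= 15 or digits.startswith("0"):
        return None
    return "+" + digits
```

For example, `to_e164("020 7946 0958")` and `to_e164("+44 (0)20 7946 0958")` both yield `+442079460958`.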

LeadHunter Pro

4-engine parallel search (Bing, DuckDuckGo, Mojeek, Yahoo) via abstract base class architecture. Configurable HOT/WARM/COLD/NOISE keyword scoring. Domain deduplication before enrichment.

Abstract base classes · 4-engine orchestration · YAML scoring config
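
A minimal sketch of the scoring and domain deduplication steps. The keyword lists, the tier precedence (NOISE first, then HOT, then WARM, else COLD), and all function names are assumptions for illustration. The real tool reads its scoring config from YAML; a plain dict stands in for it here.

```python
from urllib.parse import urlparse

# Hypothetical scoring config; the real one is loaded from YAML.
SCORING = {
    "NOISE": ["jobs", "wikipedia"],
    "HOT": ["block management", "property management"],
    "WARM": ["lettings", "estate agent"],
}


def classify(text: str, config: dict = SCORING) -> str:
    """Return the first tier whose keywords appear in the text."""
    lowered = text.lower()
    for tier in ("NOISE", "HOT", "WARM"):
        if any(keyword in lowered for keyword in config.get(tier, [])):
            return tier
    return "COLD"


def domain_of(url: str) -> str:
    """Extract a comparable domain: lowercased, no port, no leading www."""
    netloc = urlparse(url if "://" in url else "//" + url).netloc.lower()
    netloc = netloc.split("@")[-1].split(":")[0]
    return netloc[4:] if netloc.startswith("www.") else netloc


def dedupe_by_domain(urls: list) -> list:
    """Keep the first URL seen for each domain, preserving order."""
    seen, unique = set(), []
    for url in urls:
        domain = domain_of(url)
        if domain and domain not in seen:
            seen.add(domain)
            unique.append(url)
    return unique
```

Deduplicating by domain before enrichment means each site is crawled once, however many engines returned it.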

Trustpilot Scraper

Selenium + Chrome scraper for Trustpilot listings — name, contact, website, rating, review count. Targets only established, active companies. Anti-scraping evasion built in.

Selenium · Chrome · reputation filtering · anti-bot evasion

JSON Directory Harvester

Config-only retargeting — no code changes to point at a new JSON API directory. Two pagination modes, dot-path navigation, geographic bounding-box filter, two-pass deduplication. Best-documented codebase in the toolkit.

Generic pipeline · dot-path JSON · geo filtering · pure functions
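
Dot-path navigation and bounding-box filtering can both be written as small pure functions. The sketch below is an assumption about how such helpers might look; the names `get_path` and `in_bbox`, the path syntax, and the `(south, west, north, east)` tuple order are illustrative, not taken from the harvester.

```python
from typing import Any, Tuple


def get_path(obj: Any, path: str, default: Any = None) -> Any:
    """Walk nested dicts and lists using a dotted path such as 'data.items.0.name'."""
    current = obj
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        else:
            return default
    return current


def in_bbox(lat: float, lon: float, bbox: Tuple[float, float, float, float]) -> bool:
    """True if the point lies inside (south, west, north, east)."""
    south, west, north, east = bbox
    return south <= lat <= north and west <= lon <= east
```

With helpers like these, pointing the tool at a new API is a matter of changing the path strings and bounding box in the config rather than the code.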

HTML Directory Scrapers

Dual-engine toolkit: Engine 1 handles any paginated HTML via CSS selectors; Engine 2 handles WordPress AJAX, with automatic nonce extraction, mid-run nonce refresh, and manual gzip/zlib decompression. Most technically complex codebase in the portfolio.

CSS selectors · WordPress AJAX · nonce lifecycle · ThreadPoolExecutor
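
One way to handle manual gzip/zlib decompression is to inspect the first bytes of the response body and pick a decoder accordingly. This is a sketch under that assumption; the function name and the fall-back-to-raw behaviour are hypothetical, not the toolkit's actual logic.

```python
import gzip
import zlib


def decompress_body(raw: bytes) -> bytes:
    """Decompress a response body if it looks like gzip or zlib, else return it as-is."""
    # gzip streams start with the magic bytes 1f 8b.
    if raw[:2] == b"\x1f\x8b":
        return gzip.decompress(raw)

    # A zlib header has compression method 8 (deflate) in the low nibble
    # of the first byte, and the first two bytes form a multiple of 31.
    looks_like_zlib = (
        len(raw) >= 2
        and (raw[0] & 0x0F) == 8
        and ((raw[0] << 8) | raw[1]) % 31 == 0
    )
    if looks_like_zlib:
        try:
            return zlib.decompress(raw)
        except zlib.error:
            pass  # header matched by coincidence; treat as plain bytes

    return raw
```

Sniffing the payload rather than trusting the `Content-Encoding` header helps when a server mislabels or omits the encoding.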


Stack

Languages & runtime

Scraping & automation

Specialist techniques: Cloudflare XOR email decoding · WordPress admin-ajax.php nonce lifecycle · manual gzip/zlib decompression · SMTP RCPT email verification · anti-bot fingerprint evasion
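
For reference, Cloudflare's email obfuscation stores the address as a hex string in a `data-cfemail` attribute: the first byte is an XOR key and every following byte is one character of the address XORed with that key. A minimal decoding sketch follows; the function names and the regex-based extraction are assumptions for illustration.

```python
import re


def decode_cfemail(encoded: str) -> str:
    """Decode a Cloudflare data-cfemail hex string."""
    data = bytes.fromhex(encoded)
    key = data[0]  # first byte is the XOR key
    return bytes(byte ^ key for byte in data[1:]).decode("utf-8")


def extract_cf_emails(html: str) -> list:
    """Find and decode every data-cfemail attribute in an HTML string."""
    return [
        decode_cfemail(match)
        for match in re.findall(r'data-cfemail="([0-9a-fA-F]+)"', html)
    ]
```

Because the key travels with the payload, decoding needs no network call or JavaScript execution.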

Data engineering

Techniques: two-pass deduplication (ID-based + normalised name+postcode) · E.164 phone normalisation · geographic bounding-box filtering · HOT/WARM/COLD/NOISE keyword scoring · configurable record validation
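
The two-pass deduplication idea can be sketched like this. The field names `id`, `name`, and `postcode`, and the exact normalisation rules, are assumptions for the example; the real tools may use different keys.

```python
import re


def _name_postcode_key(record: dict) -> tuple:
    """Normalise name and postcode so trivial differences do not matter."""
    name = re.sub(r"[^a-z0-9]", "", (record.get("name") or "").lower())
    postcode = re.sub(r"\s+", "", record.get("postcode") or "").upper()
    return (name, postcode)


def dedupe(records: list) -> list:
    """Pass 1 drops repeated IDs; pass 2 drops repeated name+postcode pairs."""
    seen_ids, first_pass = set(), []
    for record in records:
        record_id = record.get("id")
        if record_id is not None:
            if record_id in seen_ids:
                continue
            seen_ids.add(record_id)
        first_pass.append(record)

    seen_keys, result = set(), []
    for record in first_pass:
        key = _name_postcode_key(record)
        if all(key):  # only compare when both parts are present
            if key in seen_keys:
                continue
            seen_keys.add(key)
        result.append(record)
    return result
```

The second pass catches the same business listed under different IDs, for example when two sources are merged.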

Concurrency & resilience

Patterns: exponential-backoff retry · circuit-breaker (N failures → auto-pause) · atomic file ops (.tmp → os.replace()) · crash-safe checkpoint/resume · cross-platform keyboard listener · command.txt headless controls · tqdm-safe log routing
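
A compact sketch of how exponential-backoff retry and a circuit breaker might fit together. The class and function names, the default thresholds, and the choice to raise an exception when the circuit opens (leaving the caller to pause) are all assumptions, not the toolkit's actual design.

```python
import time


class CircuitOpen(Exception):
    """Raised when too many consecutive failures have occurred."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 5):
        self.max_failures = max_failures
        self.failures = 0  # consecutive failures so far

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1


def call_with_retry(func, breaker, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func, retrying with delays of base_delay * 2**attempt."""
    for attempt in range(retries + 1):
        if breaker.is_open:
            raise CircuitOpen("too many consecutive failures")
        try:
            result = func()
        except Exception:
            breaker.record_failure()
            if attempt == retries:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
        else:
            breaker.record_success()
            return result
```

Passing `sleep` as a parameter keeps the function testable without real waiting, which fits a test suite that makes no network calls.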

Infrastructure & DevOps

CI matrix: Ubuntu · Windows · macOS across Python 3.9 – 3.12


What I'm learning next

Each of these directly extends what the current toolkit can do — async throughput, database output, cloud-deployable APIs, visual dashboards.


Specialisation

B2B sales, lead generation, and CRM. Property management, sales, lettings, and professional services directories.

Postcode validation, geographic filtering, and UK, US, and EU sector categorisation are built into the architecture — not retrofitted.


Contact

Open to remote freelance projects — scraping pipelines, lead generation, data extraction, automation.

📧 faaqjaved@gmail.com  ·  LinkedIn  ·  GitHub
