
Extracts data from a list of web pages

  Multi Page Scraper

scraperMultiPage

The turnkey task scraperMultiPage extracts data from a list of web pages. Enter the websites in the input.json file and click START to scrape the required data. The crawler can also be started with the command "dex8 start -i input.json".



Features:

  • Extract data from many web pages and from many HTML elements
  • Extract data from dynamic HTML content that is loaded by JavaScript
  • Extract data from HTML elements selected by a CSS selector
  • Extract the text, the HTML, or an attribute value of an HTML tag
  • Filter extracted data with a regular expression
  • Correct extracted data with a custom JS function
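The extraction options in the list above map onto basic DOM properties. Below is a minimal sketch, not the DEX8 implementation. It assumes that "tip": "text" reads an element's textContent, that "tip": "html" reads its innerHTML (the "html" value is inferred from the feature list and does not appear in the example input), and that "tip": "attr" reads the attribute named in "attribute". The function name extractAll is hypothetical. It accepts any object with querySelectorAll, so it runs in a browser page context, where dynamically loaded content is already present.

```javascript
// Hypothetical sketch: apply one "extracts" entry to a document-like root.
// Assumed mapping: "text" -> textContent, "html" -> innerHTML,
// "attr" -> getAttribute(extract.attribute).
function extractAll(root, extract) {
  const elements = Array.from(root.querySelectorAll(extract.selector));
  return elements.map((el) => {
    switch (extract.tip) {
      case "text":
        return el.textContent.trim();
      case "html":
        return el.innerHTML;
      case "attr":
        return el.getAttribute(extract.attribute);
      default:
        throw new Error(`Unknown tip: ${extract.tip}`);
    }
  });
}

// Usage in a browser console on any page:
// extractAll(document, { tip: "attr", attribute: "href", selector: "a" });
```

Each entry in "extracts" would be applied independently, yielding one array of values per selector.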




Input fields

Example of an input file (input.json):
{
  "device_name": "Desktop Linux",

  "urls": [
    "adsuu.com",
    "dex8.com"
  ],
  "encodeURL": false,

  "extracts": [
    {
      "tip": "text",
      "selector": "title"
    },
    {
      "tip": "attr",
      "attribute": "content",
      "selector": "meta[name=\"keywords\"]"
    },
    {
      "tip": "attr",
      "attribute": "content",
      "selector": "meta[name=\"description\"]"
    },
    {
      "tip": "attr",
      "attribute": "href",
      "selector": "a"
    }
  ],

  "filter": {
    "reg_str": "",
    "reg_flags": ""
  },


  "corrector": "return result;"
}

The "corrector" value may also be set to false (no correction).
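The "filter" and "corrector" fields post-process the extracted values. Below is a minimal sketch, not confirmed by the DEX8 documentation. It assumes that reg_str and reg_flags are passed to the JavaScript RegExp constructor, that an empty reg_str disables filtering, that values which do not match are dropped, and that the corrector string is the body of a function whose argument is named result (as "return result;" suggests), applied to each value. The helper name postProcess is hypothetical.

```javascript
// Hypothetical post-processing step; not the DEX8 implementation.
// Assumes: empty reg_str = no filtering; non-matching values are dropped;
// corrector is either false or a JS function body that receives `result`.
function postProcess(values, filter, corrector) {
  let results = values;

  if (filter && filter.reg_str) {
    const re = new RegExp(filter.reg_str, filter.reg_flags || "");
    // A "g" or "y" flag makes test() stateful, so reset lastIndex per value.
    results = results.filter((v) => {
      re.lastIndex = 0;
      return re.test(String(v));
    });
  }

  if (corrector) {
    // new Function(argName, body) builds a function from the corrector string.
    const correct = new Function("result", corrector);
    results = results.map((v) => correct(v));
  }

  return results;
}

// Usage: keep only absolute https links and strip a trailing slash.
const links = ["https://dex8.com/", "/about", "https://adsuu.com"];
const out = postProcess(
  links,
  { reg_str: "^https://", reg_flags: "i" },
  "return result.replace(/\\/$/, '');"
);
console.log(out); // [ 'https://dex8.com', 'https://adsuu.com' ]
```

Building the corrector with new Function keeps the input file plain JSON, but it executes arbitrary code taken from input.json, so only run input files you trust.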

Price: 10.00 EUR/month
