Class: Crawler


new Crawler(args)

Class responsible for loading and executing all the crawler components in the proper sequence.
Responsibilities:
  • Loads the other classes and instantiates them
  • Supplies the requirements of the other classes
  • Creates singleton objects that are shared across the program
Parameters:
Name Type Description
args Object Object containing the command-line arguments
Author:
Source:

Members

(private, inner) bot_obj :Bot

Stores the Bot object.
Type:
  • Bot
Source:

(private, inner) child_manager :ChildManager

Stores the ChildManager object.
Type:
  • ChildManager
Source:

(private, inner) cluster :Cluster

Stores the Cluster object.
Type:
  • Cluster
Source:

(private, inner) config :Config

Stores the Config object.
Type:
  • Config
Source:

(private, inner) isClusterStarted :boolean

Set to true once the cluster has started.
Type:
  • boolean
Source:

(private, inner) isDBLoaded :boolean

Set to true once the database has loaded.
Type:
  • boolean
Source:

(private, inner) isInputsParsed :boolean

Set to true once the inputs have been parsed.
Type:
  • boolean
Source:

(private, inner) isLoggerLoaded :boolean

Set to true once the logger has loaded.
Type:
  • boolean
Source:

(private, inner) isNormalCrawl

Set to true when the normal crawl can continue.
Source:

(private, inner) log :Logger

Stores the Logger object.
Type:
  • Logger
Source:

(private, inner) message_obj :Message

Message object shared with all the crawler components.
Type:
  • Message
Source:

(private, inner) mongo_pool :MongoDB

Stores the MongoDB object.
Type:
  • MongoDB
Source:

(private, inner) seed :SeedLoader

Stores the Seed object (a SeedLoader instance).
Type:
  • SeedLoader
Source:

(private, inner) that :boolean

Stores the current object context (this) so that nested functions can reach it.
Type:
  • boolean
Source:
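
The role of a member like that can be illustrated with a minimal sketch. ContextSketch and markLoaded are hypothetical names invented for this example, not the project's code; the sketch only assumes that the constructor captures this in a local variable for its nested functions.

```javascript
// Hypothetical stand-in for a constructor that keeps a `that` reference.
function ContextSketch() {
  var that = this; // capture the instance for nested functions
  this.isDBLoaded = false;

  function markLoaded() {
    // Called as a plain function, so `this` is not the instance here;
    // the captured `that` is used instead.
    that.isDBLoaded = true;
  }

  this.loadDB = function () {
    markLoaded();
    return that.isDBLoaded;
  };
}

const contextDemo = new ContextSketch();
console.log(contextDemo.loadDB()); // prints true
```

Arrow functions would make the capture unnecessary, since they keep the enclosing this; the captured variable is the older idiom for the same problem.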

Methods

botStopped() → {boolean}

Returns whether the bot has stopped. Defaults to null; returns true once the bot has stopped.
Source:
Returns:
status
Type
boolean

cleanUp(fn)

Performs clean up operations before closing crawler.
Parameters:
Name Type Description
fn function Callback
Source:

exit()

Exits the crawler by calling cleanUp.
Source:
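
The relationship between exit and cleanUp(fn) can be sketched as follows. This is an illustration under assumptions, not the crawler's implementation: makeExitSketch, the closers list, and the injected exitFn are all hypothetical, and a real program would most likely pass process.exit as exitFn.

```javascript
// Hypothetical sketch: closers and exitFn are illustrative, not the project's API.
function makeExitSketch(closers, exitFn) {
  return {
    // Runs each closer in order, then invokes the callback once.
    cleanUp: function (fn) {
      let index = 0;
      function next() {
        if (index === closers.length) {
          return fn();
        }
        const closer = closers[index];
        index += 1;
        closer(next);
      }
      next();
    },
    // Exits only after cleanUp has finished.
    exit: function () {
      this.cleanUp(function () {
        exitFn(0);
      });
    },
  };
}

const exitEvents = [];
const exitSketch = makeExitSketch(
  [
    function closeDb(done) { exitEvents.push("db closed"); done(); },
    function flushLog(done) { exitEvents.push("log flushed"); done(); },
  ],
  function fakeExit(code) { exitEvents.push("exit " + code); }
);
exitSketch.exit();
console.log(exitEvents.join(", ")); // prints "db closed, log flushed, exit 0"
```

Injecting the exit function keeps the sketch testable, because the process is not terminated while the ordering is being checked.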

isStarted() → {boolean}

Returns whether the bot started successfully.
Source:
Returns:
status - status from messages
Type
boolean

loadConfig(c)

Stores the Config object in the private variable config.
Parameters:
Name Type Description
c Config
Source:

loadDB(p)

Creates an instance of MongoDB, calls createConnection on it, and sets the resulting DB object in Config and Seed. Sets isDBLoaded to true.
Parameters:
Name Type Description
p Pool Pool object, returns constructor for MongoDB
Source:
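
The sequence described for loadDB can be sketched with stubs. Everything below is hypothetical: FakeMongoDB stands in for the MongoDB class, getDB and setDB are invented accessor names, and the extra done callback is an assumption added so the example can report completion. Only createConnection, mongo_pool, and isDBLoaded are names taken from this page.

```javascript
// Stub standing in for the MongoDB class; the real constructor arguments are unknown.
function FakeMongoDB() {
  this.connected = false;
}
FakeMongoDB.prototype.createConnection = function (done) {
  this.connected = true;
  done(null);
};

// Hypothetical Pool: getDB() is an invented accessor that returns the constructor.
const fakePool = { getDB: function () { return FakeMongoDB; } };

function DbLoaderSketch(config, seed) {
  const that = this;
  this.isDBLoaded = false;
  this.mongo_pool = null;

  this.loadDB = function (p, done) {
    const MongoCtor = p.getDB();
    that.mongo_pool = new MongoCtor();
    that.mongo_pool.createConnection(function (err) {
      if (err) {
        return done(err);
      }
      config.setDB(that.mongo_pool); // setDB is an assumed setter name
      seed.setDB(that.mongo_pool);
      that.isDBLoaded = true; // flag flips only after both components have the DB
      done(null);
    });
  };
}

const dbConfig = { setDB: function (db) { this.db = db; } };
const dbSeed = { setDB: function (db) { this.db = db; } };
const dbLoader = new DbLoaderSketch(dbConfig, dbSeed);
dbLoader.loadDB(fakePool, function (err) {
  console.log(err, dbLoader.isDBLoaded, dbConfig.db === dbSeed.db); // prints "null true true"
});
```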

loadSeed(s)

Stores the Seed object in the private variable seed.
Parameters:
Name Type Description
s SeedLoader
Source:

processInput(argv_obj)

Parses the input and applies the overridden config returned by ArgumentParser to the Config object.
Parameters:
Name Type Description
argv_obj ArgumentProcesser
Source:

reset(fn)

Resets the bot when the --reset argument is passed.
Parameters:
Name Type Description
fn function Callback function
Source:

restart()

Restarts the bot.
Source:

run()

Main method of the Crawler. Executes the crawler by loading all components.
Source:
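
One way to picture a run method that loads components in sequence is a small step runner that records a status flag after each successful step. This is a sketch under assumptions: runSketch and stepSketch are hypothetical, and the order of the flags below is a guess, since this page does not state the actual loading order.

```javascript
// Hypothetical sketch of running steps in order and recording a flag per step.
function runSketch(steps, done) {
  const status = {};
  let index = 0;
  function next(err) {
    if (err) {
      return done(err, status); // stop at the first failing step
    }
    if (index === steps.length) {
      return done(null, status);
    }
    const current = steps[index];
    index += 1;
    current.action(function (stepErr) {
      if (!stepErr) {
        status[current.flag] = true;
      }
      next(stepErr);
    });
  }
  next(null);
}

// Stand-in steps; a real loader would do actual work before calling back.
function stepSketch(flag) {
  return { flag: flag, action: function (cb) { cb(null); } };
}

let finalStatus = null;
runSketch(
  [
    stepSketch("isLoggerLoaded"),
    stepSketch("isInputsParsed"),
    stepSketch("isDBLoaded"),
    stepSketch("isClusterStarted"),
  ],
  function (err, status) {
    finalStatus = status;
    console.log(err, Object.keys(status).join(" -> "));
  }
);
```

Stopping at the first error means later flags stay unset, so a caller can tell how far the start-up sequence got.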

selectInput()

Called once the arguments are parsed to select the crawler's action.
Source:

setLogger(l)

Sets the Logger object in all Crawler components.
Parameters:
Name Type Description
l Logger
Source:

startCluster()

Starts the cluster by creating the Cluster and Bot objects.
Source:

startNormalCrawl()

Called by this.selectInput when no special arguments are given.
Source:

(private, inner) checkDependency()

Loads depcheck.js and checks the dependencies. Exits if the dependencies are not met.
Source:

(private, inner) deathCleanUp()

Calls cleanUp and kills all active_pids on a death event (for example, Ctrl+C).
Source:
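
A handler of this kind might look like the sketch below. It is not the crawler's code: killActivePids and installDeathHandler are hypothetical helpers, the choice of SIGTERM for the children is an assumption, and the injectable killFn exists only so the logic can be exercised without signalling real processes. The Node.js calls used (process.once, process.kill, process.exit) are standard.

```javascript
// Hypothetical helper: sends SIGTERM to every pid and reports the ones that failed.
function killActivePids(activePids, killFn) {
  const kill = killFn || process.kill.bind(process);
  const failed = [];
  activePids.forEach(function (pid) {
    try {
      kill(pid, "SIGTERM");
    } catch (err) {
      failed.push(pid); // for example, the process has already exited
    }
  });
  return failed;
}

// Hypothetical wiring: on Ctrl+C, stop the children, clean up, then exit.
function installDeathHandler(activePids, cleanUp) {
  process.once("SIGINT", function () {
    killActivePids(activePids);
    cleanUp(function () {
      process.exit(0);
    });
  });
}
```

A caller would invoke installDeathHandler once during start-up, passing the list of child process ids and a cleanUp function that accepts a callback.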

(private, inner) msg()

Calls the Logger object with the name of the calling function.
Source:

(private, inner) setGlobals()

Sets all process-wide global variables.
Source:

(private, inner) startBotManager(botObjs)

Calls the seed method of MongoDB, then loads the ChildManager into child_manager.
Parameters:
Name Type Description
botObjs Object Parsed robots.txt data
Source: