new Crawler(args)
Class responsible for loading and executing all the crawler components in proper sequence.
Responsibilities:
Loads other classes and instantiates them
Supplies all the requirements of other classes
Creates singleton objects which are shared across the program
Parameters:
Name | Type | Description |
---|---|---|
args | Object | object containing cmd line args |
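The sequencing role described above can be illustrated with a minimal sketch. Everything in it is hypothetical (`SketchCrawler`, the step functions, the `steps` log, and the assumed order logger, inputs, DB); it only mirrors the documented `is*` flags and is not the actual implementation.

```javascript
// Hypothetical sketch: a coordinator that runs its loading steps in a fixed
// order and records each completed step in a private flag.
function SketchCrawler(args) {
  var that = this;            // keep the instance reachable from nested functions
  var isLoggerLoaded = false;
  var isInputsParsed = false;
  var isDBLoaded = false;

  that.args = args;           // command-line arguments, stored as given
  that.steps = [];            // illustrative only: records the execution order

  function loadLogger() {
    isLoggerLoaded = true;
    that.steps.push('logger');
  }

  function parseInputs() {
    isInputsParsed = true;
    that.steps.push('inputs');
  }

  function loadDB() {
    // Assumed ordering constraint: inputs are parsed before the DB is loaded.
    if (!isInputsParsed) {
      throw new Error('inputs must be parsed before the DB is loaded');
    }
    isDBLoaded = true;
    that.steps.push('db');
  }

  that.run = function () {
    loadLogger();
    parseInputs();
    loadDB();
    return isLoggerLoaded && isInputsParsed && isDBLoaded;
  };
}

var crawler = new SketchCrawler({ reset: false });
var allLoaded = crawler.run();
```

Keeping the flags as closure variables rather than public properties matches the "private, inner" members listed below.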
Members
(private, inner) bot_obj :Bot
Stores the Bot object.
Type:
- Bot
(private, inner) child_manager :ChildManager
Stores the ChildManager object.
Type:
- ChildManager
(private, inner) cluster :Cluster
Stores the Cluster object.
Type:
- Cluster
(private, inner) config :Config
Stores the Config object.
Type:
- Config
(private, inner) isClusterStarted :boolean
Set to true once the cluster has started.
Type:
- boolean
(private, inner) isDBLoaded :boolean
Set to true once the DB has been loaded.
Type:
- boolean
(private, inner) isInputsParsed :boolean
Set to true once the inputs have been parsed.
Type:
- boolean
(private, inner) isLoggerLoaded :boolean
Set to true once the logger has been loaded.
Type:
- boolean
(private, inner) isNormalCrawl
Set to true when normal crawl can continue.
(private, inner) log :Logger
Stores the Logger object.
Type:
- Logger
(private, inner) message_obj :Message
Message object which is shared with all the crawler components.
Type:
- Message
(private, inner) mongo_pool :MongoDB
Stores the MongoDB object.
Type:
- MongoDB
(private, inner) seed :SeedLoader
Stores the Seed object.
Type:
- SeedLoader
(private, inner) that :boolean
Stores current obj context for nested functions.
Type:
- boolean
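The `that` member reflects a common JavaScript idiom: capture `this` in a local variable so that nested functions can still reach the instance. A minimal sketch, using a hypothetical `Counter` class that is not part of the crawler:

```javascript
// Hypothetical example of the `var that = this` idiom.
function Counter() {
  var that = this;   // captured once, while `this` is the new instance
  this.count = 0;

  this.incrementLater = function (schedule) {
    // Inside this nested function `this` is no longer the Counter,
    // so the captured `that` is used instead.
    schedule(function () {
      that.count += 1;
    });
  };
}

var counter = new Counter();
// A trivial scheduler that runs the task immediately, to keep the sketch synchronous.
counter.incrementLater(function (task) { task(); });
```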
Methods
botStopped() → {boolean}
Returns whether the bot has stopped.
The value is null by default and true once the bot has stopped.
Returns:
status
- Type
- boolean
cleanUp(fn)
Performs clean up operations before closing crawler.
Parameters:
Name | Type | Description |
---|---|---|
fn | function | Callback |
exit()
Exits the crawler by calling cleanUp
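The relationship between `cleanUp(fn)` and `exit()` can be sketched as follows. This is an assumption-laden illustration: `makeLifecycle`, the `resources` list, and the injected `terminate` function are hypothetical stand-ins (in a real Node.js program `terminate` would typically be `process.exit`).

```javascript
// Hypothetical sketch: exit() delegates to cleanUp() and terminates only
// after the clean-up callback has fired.
function makeLifecycle(terminate) {
  var resources = ['db', 'cluster'];   // illustrative open resources
  var closed = [];

  function cleanUp(fn) {
    // Release resources in reverse order of acquisition.
    while (resources.length > 0) {
      closed.push(resources.pop());
    }
    fn();
  }

  function exit() {
    cleanUp(function () {
      terminate(0);
    });
  }

  return { cleanUp: cleanUp, exit: exit, closed: closed };
}

var exitCodes = [];
var lifecycle = makeLifecycle(function (code) { exitCodes.push(code); });
lifecycle.exit();
```

Injecting `terminate` keeps the sketch testable without ending the process.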
isStarted() → {boolean}
Returns whether the bot started successfully.
Returns:
status - status from messages
- Type
- boolean
loadConfig(c)
Stores the given Config in the private variable config.
Parameters:
Name | Type | Description |
---|---|---|
c | Config | |
loadDB(p)
Creates an instance of MongoDB.
Calls createConnection in MongoDB and sets the DB object
in Config and Seed. Sets isDBLoaded to true.
Parameters:
Name | Type | Description |
---|---|---|
p | Pool | Pool object, returns constructor for MongoDB |
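The flow described for `loadDB` can be sketched roughly as below. The names `FakeMongoDB`, `getDB`, and `setDB` are hypothetical, as is the callback signature of `createConnection`; only the overall sequence (obtain the constructor from the pool, connect, hand the DB object to Config and Seed, set the flag) comes from the description.

```javascript
// Hypothetical stand-in for the MongoDB class.
function FakeMongoDB() {
  this.connected = false;
}
FakeMongoDB.prototype.createConnection = function (fn) {
  this.connected = true;
  fn(null, this);   // assumed Node-style callback: (err, db)
};

// Hypothetical pool whose accessor returns the constructor, not an instance.
var pool = { getDB: function () { return FakeMongoDB; } };

// Minimal stand-ins for the Config and Seed objects.
var config = { db: null, setDB: function (db) { this.db = db; } };
var seed = { db: null, setDB: function (db) { this.db = db; } };
var isDBLoaded = false;

function loadDB(p) {
  var DBClass = p.getDB();
  var mongoPool = new DBClass();
  mongoPool.createConnection(function (err, db) {
    if (err) { throw err; }
    config.setDB(db);
    seed.setDB(db);
    isDBLoaded = true;
  });
}

loadDB(pool);
```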
loadSeed(s)
Stores the given SeedLoader in the private variable seed.
Parameters:
Name | Type | Description |
---|---|---|
s | SeedLoader | |
processInput(argv_obj)
Parses the input and sets the overridden config returned by ArgumentParser on the Config object.
Parameters:
Name | Type | Description |
---|---|---|
argv_obj | ArgumentProcesser | |
reset(fn)
Resets the bot when the --reset argument is passed.
Parameters:
Name | Type | Description |
---|---|---|
fn | function | Callback function |
restart()
Restarts the bot.
run()
Main method of the Crawler.
Executes the crawler by loading all components.
selectInput()
Called once the args have been parsed to select the crawler's action.
setLogger(l)
Sets the Logger object in all Crawler components.
Parameters:
Name | Type | Description |
---|---|---|
l | Logger | |
startCluster()
Starts the cluster by creating the Cluster and Bot objects.
startNormalCrawl()
Called by this.selectInput when no special args are given.
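The dispatch between `selectInput`, `reset`, and `startNormalCrawl` might look like the sketch below. It assumes the parsed args expose a boolean `reset` property and that the actions are passed in as an object; both are illustrative choices, and only the `--reset` flag itself is taken from the documentation.

```javascript
// Hypothetical sketch: choose the crawler's action from the parsed args.
function selectInput(args, actions) {
  if (args.reset) {
    // Special argument given: reset first, then report completion via callback.
    actions.reset(function () {
      actions.log.push('reset done');
    });
    return 'reset';
  }
  // No special args: fall through to the normal crawl.
  actions.startNormalCrawl();
  return 'normal';
}

function makeActions() {
  var log = [];
  return {
    log: log,
    reset: function (fn) { log.push('reset'); fn(); },
    startNormalCrawl: function () { log.push('startNormalCrawl'); }
  };
}

var resetActions = makeActions();
var resetChoice = selectInput({ reset: true }, resetActions);

var normalActions = makeActions();
var normalChoice = selectInput({}, normalActions);
```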
(private, inner) checkDependency()
Loads depcheck.js and checks dependencies.
Exits if the dependencies are not met.
(private, inner) deathCleanUp()
Calls cleanUp and kills all active_pids on a death event (for example, Ctrl+C).
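A death handler of this kind is usually attached to the `SIGINT` event, which Node.js emits on the `process` object when Ctrl+C is pressed. The sketch below is a hedged illustration: `installDeathCleanUp`, the injected `kill` function, and the order (clean up first, then kill) are assumptions, and a plain `EventEmitter` stands in for `process` so the example can run without sending a real signal.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical sketch: on SIGINT, run cleanUp and then kill every tracked pid.
// In real use `proc` would be `process` and `kill` would wrap `process.kill`.
function installDeathCleanUp(proc, activePids, cleanUp, kill) {
  proc.on('SIGINT', function () {
    cleanUp(function () {
      activePids.forEach(function (pid) {
        kill(pid);
      });
    });
  });
}

var fakeProcess = new EventEmitter();   // stand-in for `process`
var killed = [];
var cleanedUp = false;

installDeathCleanUp(
  fakeProcess,
  [101, 102],                                     // illustrative child pids
  function (fn) { cleanedUp = true; fn(); },      // stand-in for cleanUp(fn)
  function (pid) { killed.push(pid); }            // stand-in for process.kill
);

fakeProcess.emit('SIGINT');   // simulate Ctrl+C
```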
(private, inner) msg()
Calls the Logger object with the name of the calling function.
(private, inner) setGlobals()
Sets all process-wide global variables.
(private, inner) startBotManager(botObjs)
Calls the seed method of MongoDB
and loads the ChildManager into child_manager.
Parameters:
Name | Type | Description |
---|---|---|
botObjs | Object | Robots.txt parsed data |