Class: ChildManager

ChildManager

new ChildManager(message_obj)

Manages the workers owned by the bot and handles communication and coordination between them.
Parameters:
Name Type Description
message_obj Message
Author:
Source:

Members

(private, inner) bloom :BloomFilter

A Bloom filter used for the URL-seen test, which reduces duplicate errors from the db. Its parameters are:
  • n: number of items in the filter
  • p: probability of false positives, a float between 0 and 1 or a number indicating 1-in-p
  • m: number of bits in the filter
  • k: number of hash functions
Example: n = 10,000,000, p = 1.0E-6 (1 in 1,000,000) → m = 287,551,752 (34.28MB), k = 20. See http://hur.st/bloomfilter
Type:
  • BloomFilter
Source:
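
The m and k values quoted for this member follow from the standard Bloom filter sizing formulas, m = ceil(-n ln p / (ln 2)^2) and k = round((m / n) ln 2). The sketch below reproduces them; `bloomParams` is a hypothetical helper for illustration and is not part of ChildManager.

```javascript
// Hypothetical helper (not part of ChildManager): derive the number of bits (m)
// and hash functions (k) from the expected item count (n) and the target
// false-positive probability (p).
function bloomParams(n, p) {
  const m = Math.ceil((-n * Math.log(p)) / (Math.LN2 * Math.LN2)); // bits
  const k = Math.round((m / n) * Math.LN2); // hash functions
  return { m, k, megabytes: m / 8 / 1024 / 1024 };
}

const { m, k, megabytes } = bloomParams(10000000, 1.0e-6);
console.log(m, k, megabytes.toFixed(2)); // 287551752 20 34.28
```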

(private, inner) BLOOM_K

Bloom filter k value
Source:

(private, inner) bloom_length

Tracks size of bloom filter
Source:

(private, inner) BLOOM_M

Bloom filter m value
Source:

(private, inner) BLOOM_N

Bloom filter n value
Source:

(private, inner) prev_domain_grp

Used as a queue, so that different domain group buckets are fetched from db for crawling.
Source:
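
One way such a queue could spread fetches across domain groups is sketched below. This is an assumption about the intent, not the actual implementation: `makeGroupPicker`, `maxRemembered` and the selection rule are all hypothetical.

```javascript
// Hypothetical sketch: remember recently fetched domain groups in a FIFO queue
// and prefer a group that is not in it, so consecutive fetches hit different groups.
function makeGroupPicker(maxRemembered) {
  const prevDomainGrp = []; // oldest entry at index 0
  return function pick(availableGroups) {
    if (availableGroups.length === 0) return null;
    // Prefer a group that was not fetched recently.
    let chosen = availableGroups.find((g) => !prevDomainGrp.includes(g));
    if (chosen === undefined) {
      // Every group was fetched recently: reuse the least recently fetched one.
      chosen = prevDomainGrp.find((g) => availableGroups.includes(g));
    }
    // Move the chosen group to the back of the queue and trim the front.
    const idx = prevDomainGrp.indexOf(chosen);
    if (idx !== -1) prevDomainGrp.splice(idx, 1);
    prevDomainGrp.push(chosen);
    if (prevDomainGrp.length > maxRemembered) prevDomainGrp.shift();
    return chosen;
  };
}

const pick = makeGroupPicker(2);
const groups = ["a", "b", "c"];
console.log([pick(groups), pick(groups), pick(groups), pick(groups)]);
```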

Methods

In case of clean up, flushInlinks into the db.
Parameters:
Name Type Description
fn function callback
Source:

getActiveChilds()

Gets the number of active children in the manager.
Source:

isManagerLocked()

Returns the lock state of the starter function.
Source:

killWorkers(fn)

Kill the workers spawned by the child manager.
Parameters:
Name Type Description
fn function callback
Source:
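
A minimal sketch of the kill-then-callback pattern, assuming each worker is a Node.js `ChildProcess` (which provides `kill()` and emits `exit`). Unlike the documented method, this standalone version takes the worker list as an explicit argument; that parameter is an assumption made for illustration.

```javascript
// Hypothetical standalone variant: the documented killWorkers(fn) reads the
// worker list from the manager's own state, here it is passed in explicitly.
function killWorkers(workers, fn) {
  let remaining = workers.length;
  if (remaining === 0) return fn(); // nothing to kill
  for (const worker of workers) {
    // Invoke the callback only after every worker has actually exited.
    worker.once("exit", () => {
      remaining -= 1;
      if (remaining === 0) fn();
    });
    worker.kill(); // sends SIGTERM by default
  }
}
```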

setManagerLocked(state)

Locks or unlocks the starter function, which runs on an interval.
Parameters:
Name Type Description
state boolean true to lock, false to unlock
Source:
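
The lock can be pictured as a flag that the interval callback checks before doing any work. The sketch below assumes a locked tick is simply skipped; `makeStarterLock` and `guard` are hypothetical names, and the real ChildManager keeps this state internally.

```javascript
// Hypothetical sketch of a lock guarding an interval-driven starter function.
function makeStarterLock() {
  let locked = false;
  return {
    isManagerLocked() { return locked; },
    setManagerLocked(state) { locked = Boolean(state); },
    // Wraps the starter body so that a tick is skipped while the lock is held.
    guard(starter) {
      return function guardedStarter() {
        if (locked) return false; // skipped
        starter();
        return true;
      };
    },
  };
}

const lock = makeStarterLock();
let runs = 0;
const tick = lock.guard(() => { runs += 1; });
tick();                        // runs the starter
lock.setManagerLocked(true);
tick();                        // skipped while locked
lock.setManagerLocked(false);
tick();                        // runs again
console.log(runs); // 2
```

In a long-running process, `tick` would be passed to `setInterval` rather than called by hand.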

(private, inner) childFeedback(data)

Receives messages from all the workers.
Parameters:
Name Type Description
data Object {"bot": "spawn", "insertRssFeed": [link.details.url, feeds]}
Source:
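
Given the message shape shown above, one plausible way to route worker messages is to treat `bot` as the sender tag and the remaining key as the action name, with its array as the arguments. That interpretation is an assumption, and `makeChildFeedback` and the handler table are hypothetical.

```javascript
// Hypothetical dispatcher: "bot" identifies the sender, and the other key names
// the action to run, with its array value spread as the handler's arguments.
function makeChildFeedback(handlers) {
  return function childFeedback(data) {
    for (const key of Object.keys(data)) {
      if (key === "bot") continue; // sender tag, not an action
      const handler = handlers[key];
      if (typeof handler !== "function") return false; // unknown action
      handler(...data[key]);
      return true;
    }
    return false; // message carried no action
  };
}

const seenFeeds = [];
const childFeedback = makeChildFeedback({
  insertRssFeed: (url, feeds) => seenFeeds.push({ url, feeds }),
});
childFeedback({
  bot: "spawn",
  insertRssFeed: ["http://example.com/", ["http://example.com/feed.xml"]],
});
```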

(private, inner) createChild(bucket_links, hash, refresh_label)

Spawns a new child process for the normal queue.
Parameters:
Name Type Description
bucket_links Object Fetched batch by getNextBatch
hash String Batch hash id
refresh_label String Fetch Interval of the batch
Source:
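
Spawning a worker for a batch could look roughly like the sketch below, which uses Node's `child_process.fork`. The worker path `./worker.js`, the `init` message shape, and the extra `onMessage` and `forkFn` parameters are all assumptions for illustration; `forkFn` exists only so the function can be exercised without starting a real process.

```javascript
const { fork } = require("child_process");

// Hypothetical sketch: fork a worker, listen for its messages, and hand it the batch.
// "./worker.js" and the { init: [...] } message are made-up placeholders.
function createChild(bucketLinks, hash, refreshLabel, onMessage, forkFn = fork) {
  const child = forkFn("./worker.js", [], {});
  child.on("message", onMessage); // worker -> manager feedback
  child.send({ init: [bucketLinks, hash, refreshLabel] }); // manager -> worker batch
  return child;
}
```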

(private, inner) createChild_for_failed_queue(bucket_links, hash, refresh_label)

Spawns a new child process for the failed queue.
Parameters:
Name Type Description
bucket_links Object Fetched batch by getNextBatch
hash String Batch hash id
refresh_label String Fetch Interval of the batch
Source:

(private, inner) elasticIndex(dict)

Indexes a JS object into Elasticsearch, if Elasticsearch is enabled in the config.
Parameters:
Name Type Description
dict JSON
Source:

(private, inner) msg()

Used to call Logger object with the caller function name.
Source:

(private, inner) nextBatch(fn)

Called by starter to fetch next batch from db.
Parameters:
Name Type Description
fn function callback
Source:

(private, inner) nextFailedBatch()

Fetches a batch from failed queue.
Source:

Fetches RSS files and updates links from them. RSS file links are provided from the rss collection of the crawler. This function is run in a setInterval.
Source:

(private, inner) starter()

Responsible for allocating vacant children. This function runs continuously on an interval to check and reallocate workers.
Source:
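
A single pass of such an allocation loop might be sketched as follows. The `state` object, the `maxChildren` limit, and the callback signatures of `nextBatch` and `createChild` are assumptions made for this example and may differ from the real functions.

```javascript
// Hypothetical sketch of one starter tick: fill every vacant worker slot.
function starterTick(state, maxChildren, nextBatch, createChild) {
  const vacant = maxChildren - state.active;
  for (let i = 0; i < vacant; i++) {
    state.active += 1; // reserve the slot before the (possibly async) fetch returns
    nextBatch((err, batch) => {
      if (err || !batch) {
        state.active -= 1; // nothing to crawl: release the slot
        return;
      }
      createChild(batch.links, batch.hash, batch.refreshLabel);
    });
  }
}
```

Reserving the slot before the fetch completes keeps a later tick from over-allocating while batches are still in flight.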

(private, inner) startTika()

Launches a child process for PDF requests.
Source: