Class: URLCreator

URLCreator

new URLCreator(message_obj)

Represents a URL and its crawled details. Provides parsing functions.
Parameters:
Name Type Description
message_obj Message
Author:
Source:

Methods

url(url_input, d, p)

Returns a URL object with the URL details and helper methods.
Parameters:
Name Type Description
url_input String
d String Domain of the URL
p String Parent URL
Source:

(private, inner) extractDomain(url)

Extracts the domain from the URL.
Parameters:
Name Type Description
url String
Source:
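
As a rough illustration of domain extraction, the sketch below uses the standard WHATWG URL parser. The function name extractDomainSketch and the choice to return null for unparseable input are assumptions made for this example; the private helper itself may behave differently.

```javascript
// Hypothetical stand-in for the private extractDomain helper.
function extractDomainSketch(url) {
  // The WHATWG URL parser (a global in Node.js and browsers) throws on
  // input it cannot parse, so unparseable input is reported as null here.
  try {
    return new URL(url).hostname;
  } catch (err) {
    return null;
  }
}
```

For example, extractDomainSketch('http://www.example.com:8080/a/b?x=1') returns 'www.example.com'.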

(private, inner) getFileType(url)

Returns 'file' or 'webpage' based on the URL and the Tika config.
Parameters:
Name Type Description
url String
Source:
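
One plausible way to make the 'file' versus 'webpage' decision is to compare the extension of the last path segment against a list of extensions handled by Tika. In the sketch below, getFileTypeSketch and ASSUMED_TIKA_EXTENSIONS are hypothetical names, and the extension list is a placeholder for whatever the real Tika config contains.

```javascript
// Assumed extension list for illustration only; the real list would come
// from the crawler's Tika config.
const ASSUMED_TIKA_EXTENSIONS = ['pdf', 'doc', 'docx'];

// Hypothetical stand-in for the private getFileType helper.
function getFileTypeSketch(url, extensions = ASSUMED_TIKA_EXTENSIONS) {
  // Look only at the path, so query strings and fragments are ignored.
  const pathname = new URL(url).pathname.toLowerCase();
  const lastSegment = pathname.split('/').pop();
  const dot = lastSegment.lastIndexOf('.');
  const ext = dot === -1 ? '' : lastSegment.slice(dot + 1);
  return extensions.includes(ext) ? 'file' : 'webpage';
}
```

With the assumed list, 'http://example.com/docs/report.PDF' is classified as 'file' and 'http://example.com/index.html' as 'webpage'.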

(private, inner) isAccepted(url, domain) → {boolean}

Returns whether the URL is accepted or rejected, based on the regexes in the config.
Parameters:
Name Type Description
url String
domain String
Source:
Returns:
Type
boolean
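
A regex-based accept/reject check might look like the sketch below. The shape of the rules object (per-domain accept and reject arrays), the name isAcceptedSketch, and the policy of letting reject rules win and accepting everything when no accept rules exist are all assumptions for illustration, not the crawler's documented config format.

```javascript
// Hypothetical stand-in for the private isAccepted helper.
// `rules` is an assumed structure: { [domain]: { accept: RegExp[], reject: RegExp[] } }.
function isAcceptedSketch(url, domain, rules) {
  const { accept = [], reject = [] } = rules[domain] || {};
  // Reject rules take precedence over accept rules.
  if (reject.some((re) => re.test(url))) {
    return false;
  }
  // With no accept rules configured, everything not rejected passes.
  return accept.length === 0 || accept.some((re) => re.test(url));
}
```

The regexes should not carry the g flag, since RegExp.prototype.test is stateful for global regexes and would give inconsistent results across calls.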

(private, inner) normalizeDomain(url)

Normalizes the domain.
Parameters:
Name Type Description
url String
Source:

(private, inner) normalizeProtocol(url)

Normalizes the protocol to 'http:'.
Parameters:
Name Type Description
url String
Source:
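
A minimal sketch of protocol normalization, assuming it amounts to rewriting an https or protocol-relative prefix to http. The name normalizeProtocolSketch is hypothetical, and the handling of protocol-relative URLs is an assumption not stated in the description above.

```javascript
// Hypothetical stand-in for the private normalizeProtocol helper.
function normalizeProtocolSketch(url) {
  // Matches "http://", "https://" (any letter case) or a bare "//" prefix
  // and replaces it with "http://". Other inputs are returned unchanged.
  return url.replace(/^(https?:)?\/\//i, 'http://');
}
```

For example, normalizeProtocolSketch('HTTPS://example.com/a') returns 'http://example.com/a'.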

(private, inner) normalizeURL(url)

Normalizes a URL.
Parameters:
Name Type Description
url String
Source:

(private, inner) nutchStyleURLKey(url)

Returns a Nutch-style URL key.
Parameters:
Name Type Description
url String
Source:
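
Keys in the style of Apache Nutch typically reverse the host labels so that URLs from the same domain sort next to each other. The sketch below builds a key of the form reversed-host, protocol, optional port, then path and query. The exact layout this crawler uses is not stated here, so both the format and the name nutchStyleURLKeySketch are assumptions.

```javascript
// Hypothetical stand-in for the private nutchStyleURLKey helper.
function nutchStyleURLKeySketch(url) {
  const parsed = new URL(url);
  // "bar.foo.com" becomes "com.foo.bar".
  const reversedHost = parsed.hostname.split('.').reverse().join('.');
  // URL.protocol includes the trailing colon ("http:"), which is dropped here.
  const protocol = parsed.protocol.replace(/:$/, '');
  // URL.port is an empty string when the port is the scheme default.
  const port = parsed.port ? ':' + parsed.port : '';
  return reversedHost + ':' + protocol + port + parsed.pathname + parsed.search;
}
```

Under these assumptions, 'http://bar.foo.com/to/index.html?a=b' maps to 'com.foo.bar:http/to/index.html?a=b'.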

(private, inner) sortedParams(url)

Sorts the query parameters of the URL.
Parameters:
Name Type Description
url String
Source:
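
Sorting query parameters gives URLs that differ only in parameter order the same string form, which helps with de-duplication. The sketch below relies on the standard URLSearchParams.sort method. The name sortedParamsSketch is hypothetical, and returning the whole URL rather than only the parameters is an assumption.

```javascript
// Hypothetical stand-in for the private sortedParams helper.
function sortedParamsSketch(url) {
  const parsed = new URL(url);
  // sort() orders the parameters by name in place, using a stable sort,
  // so repeated names keep their relative order.
  parsed.searchParams.sort();
  return parsed.toString();
}
```

For example, sortedParamsSketch('http://example.com/search?q=node&b=2&a=1') returns 'http://example.com/search?a=1&b=2&q=node'.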