# module `core.crawler`

## class `Crawler`

Base class for crawlers. To create a new crawler, inherit from this class; it provides helper methods for fetching and parsing pages.
**Args:**

- `url` (str): Base url for the crawler.

### method `__init__`

`__init__(url: str)`
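The description above says new crawlers should inherit from `Crawler` and pass a base url to the constructor. A minimal sketch of that pattern, using a reduced stand-in for the real class (only the constructor is reproduced; the attribute name `self.url` and the subclass are assumptions for illustration):

```python
# Hypothetical stand-in for core.crawler.Crawler, reduced to the
# constructor documented above; the real class also provides the
# get_page / get_page_soup / get_response / join_url helpers.
class Crawler:
    def __init__(self, url: str):
        # Base url that relative paths are joined against.
        self.url = url


# A new crawler inherits from Crawler and supplies its base url.
class ExampleCrawler(Crawler):
    def __init__(self):
        super().__init__("https://example.com")


crawler = ExampleCrawler()
print(crawler.url)  # → https://example.com
```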
### method `get_page`

`get_page(url: str = None, path: str = str, **kwargs) → str`

Get a page from a given url. This is just a wrapper around the `get_response` method.
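Since `get_page` is described as a wrapper around `get_response`, it presumably fetches a response and returns its body as text. A sketch of that relationship, with a stub response object standing in for `requests.Response` (the stub and the `.text` extraction are assumptions, not the library's actual code):

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Stand-in for requests.Response; only .text is modeled."""
    text: str


class Crawler:
    def __init__(self, url: str):
        self.url = url

    def get_response(self, url=None, **kwargs):
        # The real method delegates to requests.get; stubbed here
        # so the sketch runs without network access.
        return FakeResponse(text="<html><body>hello</body></html>")

    def get_page(self, url=None, **kwargs):
        # get_page is just get_response plus body extraction.
        return self.get_response(url=url, **kwargs).text


print(Crawler("https://example.com").get_page())
# → <html><body>hello</body></html>
```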
### method `get_page_soup`

`get_page_soup(url: str = None, enable_cache: bool = True, **kwargs) → BeautifulSoup`

Get a `BeautifulSoup` object from a given url. This is just a wrapper around the `get_page` method.
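The `enable_cache` flag suggests that parsed pages are memoized per url. A sketch of that caching pattern under stated assumptions: the cache layout (`_soup_cache` keyed by url) is invented for illustration, and a plain string stands in for the `BeautifulSoup` tree so the example runs without `bs4`:

```python
class Crawler:
    def __init__(self, url: str):
        self.url = url
        self._soup_cache = {}  # url -> parsed page (assumed layout)
        self.fetch_count = 0   # instrumentation for this sketch only

    def get_page(self, url=None, **kwargs):
        # Stub page fetch; the real method goes through get_response.
        self.fetch_count += 1
        return "<title>example</title>"

    def get_page_soup(self, url=None, enable_cache=True, **kwargs):
        key = url or self.url
        if enable_cache and key in self._soup_cache:
            return self._soup_cache[key]
        # The real class would do BeautifulSoup(self.get_page(url), ...);
        # a plain string stands in for the parsed tree here.
        soup = self.get_page(url, **kwargs)
        if enable_cache:
            self._soup_cache[key] = soup
        return soup


c = Crawler("https://example.com")
c.get_page_soup()
c.get_page_soup()
print(c.fetch_count)  # → 1  (second call served from the cache)
```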
### method `get_response`

`get_response(url: str = None, path: str = str, **kwargs) → Response`

Get a response from a given url.

**Args:**

- `url` (str, optional): Url to get response from. Defaults to None.
- `path` (str, optional): Path to join with the base url. Defaults to `str`.
- `**kwargs`: Keyword arguments to pass to `requests.get`.

**Returns:**

- `requests.Response`: Response from the given url.

### method `join_url`
`join_url(*args: str) → str`

Join url parts.

**Args:**

- `*args` (str): Url parts.

**Returns:**

- `str`: Joined url.

---

_This file was automatically generated via lazydocs._
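To close, a sketch of what `join_url(*args)` plausibly does: combine a base url and path fragments into one url. The implementation below is one reasonable guess (strip stray slashes, rejoin with `/`); the real method's edge-case behavior may differ:

```python
def join_url(*args: str) -> str:
    # One plausible implementation: drop leading/trailing slashes
    # from each part, then rejoin with single slashes.
    return "/".join(part.strip("/") for part in args)


print(join_url("https://example.com/", "/api/", "v1"))
# → https://example.com/api/v1
```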