Examples
The best way to explain the features spiderpig gives you is to show it in use.
Configuration Only
Imagine we have the following configuration stored in config.yaml:
who: world
Spiderpig can inject the configuration property who into every function annotated with the spiderpig.configured decorator. This configuration can be overridden locally by the spiderpig.configuration context. If the parameter is passed explicitly when a function is invoked, it also overrides the global configuration, including all nested invocations of other spiderpig functions.
import spiderpig as sp


@sp.configured()
def hello(who=None):
    print('Hello', who)


@sp.configured()
def interview(interviewer, who=None):
    print(interviewer, end=': ')
    hello()


# you can also use the spiderpig.init function instead of the "with" block
with sp.spiderpig(config_file='config.yaml'):
    hello()
    with sp.configuration(who='universe'):
        hello()
    hello('everybody')
    interview('God', 'Jesus')
Resulting output:
Hello world
Hello universe
Hello everybody
God: Hello Jesus
Instead of specifying the configuration file, the parameters can be passed directly to spiderpig.spiderpig (or spiderpig.init):
with sp.spiderpig(who='world'):
    ...
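The same keyword parameters also work with spiderpig.init when you prefer to initialize spiderpig without a with block, as the comment in the first example already hinted. A minimal sketch of this variant, mirroring the sp.init(n=5) call used in the caching example below:

# initialization without the "with" block (hello as defined above)
sp.init(who='world')
hello()  # prints: Hello world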
Or you can use environment variables:
export WHO=world
with sp.spiderpig():
    hello()
    with sp.configuration(who='universe'):
        hello()
Resulting output:
Hello world
Hello universe
Caching
Spiderpig is also able to cache time-expensive computations. In this case, the working directory has to be specified. Imagine the following scenario where we compute factorials:
import spiderpig as sp


@sp.cached()
def factorial(n=0):
    print('Computing factorial({})'.format(n))
    if n <= 1:
        return 1
    return n * factorial(n - 1)


sp.init(n=5)

print(factorial())
print(factorial(6))
Resulting output:
Computing factorial(5)
Computing factorial(4)
Computing factorial(3)
Computing factorial(2)
Computing factorial(1)
120
Computing factorial(6)
720
Notice that the second call computes only factorial(6); the results of all nested calls are taken from the cache filled by the first invocation.
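The snippet above does not show where the cache actually lives. If you want it stored in a specific directory, the initialization has to point spiderpig there; the following is only a sketch and assumes the keyword mirrors the command-line tool's --cache-dir option shown below, which may not be the exact parameter name:

# assumption: the keyword mirrors the CLI's --cache-dir option; check the
# spiderpig documentation for the exact parameter name
sp.init(cache_dir='.spiderpig-cache', n=5)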
Command-line Tool
Using spiderpig, you can easily build a command-line tool (the full example is available on GitHub). Imagine you want to create a crawler. Let's start with a command which downloads and prints an HTML page from its URL. For the crawler, we will build the following directory structure:
.
├── crawler.py
├── general
│   ├── commands
│   │   ├── __init__.py
│   │   └── url_html.py
│   ├── __init__.py
│   └── model.py
└── wikipedia
    ├── commands
    │   ├── __init__.py
    │   └── intro.py
    └── __init__.py
Python Code
First, we create a function which downloads the HTML as plain text, and then we pass this text to BeautifulSoup so the output can be prettified. We put this code into the general/model.py file.
from bs4 import BeautifulSoup
from spiderpig.msg import Verbosity, print_debug
from urllib.request import urlopen

import spiderpig as sp


@sp.cached()
def load_page_content(url, verbosity=Verbosity.INFO):
    if verbosity > Verbosity.INFO:
        print_debug('Downloading {}'.format(url))
    return urlopen(url).read()


def load_html(url):
    return BeautifulSoup(load_page_content(url), 'html.parser')
Please notice that the load_page_content function is annotated with the spiderpig.cached decorator.
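Thanks to that decorator, repeated calls with the same url download the page only once; a minimal sketch, assuming spiderpig has already been initialized (for example via run_cli or sp.init):

from general import model

# the first call downloads the page, the second one is served from the cache
first = model.load_html('http://example.com')
second = model.load_html('http://example.com')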
Second, we create the command itself. For spiderpig, commands are all modules from the specified packages that have an execute function. We put the source code of the command into the general/commands/url_html.py file.
"""
Download HTML web page from the specified URL and print it on the standard
output or to the specified file.
"""
from .. import model
import os
def execute(url, output=None):
if not url.startswith('http'):
url = 'http://' + url
html = model.load_html(url).prettify()
if output:
directory = os.path.dirname(output)
if not os.path.exists(directory):
os.makedirs(directory)
with open(output, 'w') as f:
f.write(html)
else:
print(html)
Finally, we create an executable file crawler.py:
#!/usr/bin/env python
from spiderpig import run_cli

import general.commands
import wikipedia.commands


run_cli(
    command_packages=[general.commands],
)
If you have more packages with commands, it is useful to prefix them with a namespace:
run_cli(
    command_packages=[general.commands],
    namespaced_command_packages={'wiki': wikipedia.commands}
)
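The wikipedia/commands/intro.py module itself is not listed in this text (it is part of the full example on GitHub). The following is only a hypothetical sketch of such a command, reusing load_html from general/model.py and assuming a single keyword parameter; the docstring is written to match the wiki-intro help text in the usage output below, since the help text of url-html matches its module docstring as well.

"""
Download and print the first paragraph from Wikipedia for the given keyword.
"""
from general import model


def execute(keyword):
    # hypothetical sketch: fetch the article and print its first non-empty paragraph
    url = 'https://en.wikipedia.org/wiki/' + keyword
    html = model.load_html(url)
    for paragraph in html.find_all('p'):
        text = paragraph.get_text().strip()
        if text:
            print(text)
            break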
Usage
Spiderpig automatically loads your commands and makes them accessible for you.
$ ./crawler.py --help
usage: crawler.py [-h] [--cache-dir CACHE_DIR] [--override-cache]
                  [--verbosity {0,1,2,3}] [--max-in-memory-entries MAX_IN_MEMORY_ENTRIES]
                  {url-html,wiki-intro,spiderpig-executions} ...

positional arguments:
  {url-html,wiki-intro,spiderpig-executions}
    url-html            Download HTML web page from the specified URL and
                        print it on the standard output or to the specified
                        file.
    wiki-intro          Download and print the first paragraph from Wikipedia
                        for the given keyword.
    spiderpig-executions

optional arguments:
  -h, --help            show this help message and exit
  --cache-dir CACHE_DIR
  --override-cache
  --verbosity {0,1,2,3}
  --max-in-memory-entries MAX_IN_MEMORY_ENTRIES
$
It automatically creates argparse parsers from the parameters of your execute function; parameters with a default value (such as output) become optional arguments, while parameters without one (such as url) are required:
$ ./crawler.py url-html --help
usage: crawler.py url-html [-h] [--output OUTPUT] --url URL

optional arguments:
  -h, --help       show this help message and exit
  --output OUTPUT  default: None
  --url URL
$
Using debugging prints, we can easily check that the caching works as we expect:
$ ./crawler.py --verbosity 1 url-html --url google.com --output /dev/null
Downloading http://google.com
$ ./crawler.py --verbosity 1 url-html --url google.com --output /dev/null
$