There are two ways to use Jetstream: run tasks on your data with the ‘jet’ command-line tool that ships with Jetstream, or embed it in your own application.
In either case, Jetstream must first be configured using YAML. The configuration tells Jetstream which components to instantiate, how to instantiate them, and how to combine them into one or more pipes that Jetstream can run.
The YAML configuration consists of two or more sections declaring the components to be used (at least an Input and an Output) and a section declaring the pipes.
Each component is configured under a component type section, which is one of ‘inputs’, ‘introspectors’, ‘transformers’, or ‘outputs’. Under the type section, each component is listed as:
<component title>: &<component_id>
  description: <some description here>
  use: <a fully-qualified Python dotted name of the factory>
The description field is optional but recommended. Also note that the component_id cannot include spaces.
An example configuration of an Input component:
inputs:
  my MySQL data source: &sqlsource
    description: an example MySQL data source
    use: mypackage.mymodule.get_my_sql_source
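The ‘use’ field names a factory by its dotted path. As an illustration only, the sketch below shows one common way to turn such a name into a callable with the Python standard library. The helper resolve_dotted_name is hypothetical and not part of Jetstream; Jetstream's own loader may work differently.

```python
import importlib


def resolve_dotted_name(dotted_name):
    """Return the object named by a fully-qualified dotted name.

    Hypothetical helper for illustration; not part of Jetstream.
    """
    # Split "package.module.attr" into the module path and the attribute name.
    module_path, _, attr_name = dotted_name.rpartition(".")
    if not module_path:
        raise ValueError(f"not a fully-qualified dotted name: {dotted_name!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Resolve a standard-library callable the same way a 'use' value could be resolved.
factory = resolve_dotted_name("collections.OrderedDict")
component = factory()
```

A name without a module part raises ValueError here, and a missing module or attribute surfaces as the usual ImportError or AttributeError.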
Each pipe is configured under the ‘pipes’ section and has the form:
<pipe title>:
  - *<component_id>
  - *<component2_id>
Any number of component ids can be listed.
An example configuration of a pipe with two components:
pipes:
  example pipe:
    - *sqlsource
    - *csvoutput
Any number of components can be freely arranged into a pipe, as long as the pipe starts with an Input and ends with an Output. If no Output is configured, Jetstream adds a standard Output that simply prints out the data records.
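The ordering rule above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the (kind, name) tuples stand in for configured components, and finalize_pipe and the "print records" default are illustrative names, not Jetstream's API.

```python
def finalize_pipe(components):
    """Check pipe ordering and append a default output when none is present.

    `components` is a list of (kind, name) tuples, with kind being a label
    such as "input", "transformer" or "output". Hypothetical representation
    for illustration; not Jetstream's internal model.
    """
    if not components or components[0][0] != "input":
        raise ValueError("a pipe must start with an Input")
    pipe = list(components)
    # Mirror the documented fallback: with no Output, records are printed.
    if pipe[-1][0] != "output":
        pipe.append(("output", "print records"))
    return pipe


pipe = finalize_pipe([("input", "sqlsource"), ("transformer", "map")])
print([name for _, name in pipe])
# prints ['sqlsource', 'map', 'print records']
```

The sketch only checks the first and last positions; it does not try to validate what sits in the middle of the pipe.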
Here is the full configuration file from the Jetstream tests:
inputs:
  dummy source: &source
    description: a dummy source
    use: tests.components.Input
inspectors:
  dummy validator: &validate
    description: a dummy validator
    use: tests.components.Validator
transformers:
  dummy mapper: &map
    description: a basic dummy mapper
    use: jetstream.util.FieldMapper
    map:
      - number: Numero
      - description: Selite
      - amount: Summa
  dummy constructor: &construct
    description: a simple object constructor
    use: jetstream.util.KlassConstructor
outputs:
  dummy subscriber: &subscribe
    description: a dummy subscriber
    use: tests.components.Subscriber
pipes:
  dummy pipe:
    - *source
    - *map
    - *validate
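The ‘map’ option of the dummy mapper lists pairs of field names. The document does not say which side of each pair is the source field, so the sketch below assumes each entry reads <new field name>: <source field name>. The function map_fields and the sample record are made up for illustration and are not the actual jetstream.util.FieldMapper.

```python
def map_fields(record, field_map):
    """Build a new record by renaming fields.

    `field_map` mirrors the YAML 'map' list: a list of single-key dicts.
    ASSUMPTION: each entry is {new_name: source_name}; the real
    jetstream.util.FieldMapper may read the pairs the other way round.
    """
    mapped = {}
    for entry in field_map:
        for new_name, source_name in entry.items():
            mapped[new_name] = record[source_name]
    return mapped


# The same pairs as in the test configuration, as plain Python data.
field_map = [{"number": "Numero"}, {"description": "Selite"}, {"amount": "Summa"}]
# Made-up sample record, for illustration only.
record = {"Numero": 1, "Selite": "kahvi", "Summa": 2.5}
print(map_fields(record, field_map))
# prints {'number': 1, 'description': 'kahvi', 'amount': 2.5}
```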
Todo
write the cli & docs
To use Jetstream in your own project:
import jetstream
Todo
explain how to embed Jetstream