dalood.loader package

Submodules

dalood.loader.base module

Data loader base class.

class dalood.loader.base.LoaderBase

Bases: ABC

Base class for loading data from URIs or filepaths.

get_mtime(src)

Attempt to determine the last modification time of the source.

Parameters:

src – The data source (e.g. a file path or URI).

Returns:

A datetime.datetime object representing the last known modification time, or None if the time cannot be determined.

abstractmethod load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

property patterns

An iterator over 2-tuples of patterns and their PatternType associated with this loader. For example, a text file loader might return the 2-tuple (r"^(?!https?://).*\.txt$", PatternType.REGEX) to load local files by the ".txt" extension.
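To illustrate the example pattern, the following sketch checks which sources a regular expression such as ^(?!https?://).*\.txt$ accepts. The negative lookahead rejects anything starting with "http://" or "https://", and the escaped dot means only a literal ".txt" suffix matches. It uses only Python's re module; whether dalood applies PatternType.REGEX patterns with re.match or some other call is an assumption here, and the helper name matches() is hypothetical.

```python
import re

# Example pattern from the docstring: a ".txt" extension, but not an http(s) URL.
# (?!https?://) is a negative lookahead that rejects sources starting with a URL scheme.
TXT_LOCAL = re.compile(r"^(?!https?://).*\.txt$")


def matches(src: str) -> bool:
    """Return True if src looks like a local .txt path (hypothetical helper)."""
    return TXT_LOCAL.match(src) is not None


for src in ["notes.txt", "/tmp/data/readme.txt", "https://example.com/a.txt", "notes_txt"]:
    print(f"{src!r}: {matches(src)}")
```

Only the first two sources match; "notes_txt" is rejected because the pattern requires an actual period before "txt".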

register_patterns(manager, prioritize=False)

Register this loader with a manager for each of its common patterns.

Parameters:
  • manager – An instance of Manager.

  • prioritize – If True, ensure that this loader's patterns take precedence over existing patterns in the manager.
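A concrete loader must implement load() and may override patterns and get_mtime(). The sketch below is self-contained, so instead of importing dalood it declares simplified stand-ins (PatternType, LoaderBaseSketch) that mirror the documented interface; their default behaviours are assumptions. UpperCaseLoader and its "upper:" prefix are purely hypothetical. With the real package one would presumably subclass dalood.loader.base.LoaderBase and then call register_patterns(manager) so the manager routes matching sources to the loader.

```python
import datetime
from abc import ABC, abstractmethod
from enum import Enum
from typing import Any, Iterator, Optional, Tuple


class PatternType(Enum):
    """Hypothetical stand-in for dalood's PatternType; only REGEX is modelled."""

    REGEX = "regex"


class LoaderBaseSketch(ABC):
    """Simplified stand-in mirroring the documented LoaderBase interface."""

    def get_mtime(self, src: Any) -> Optional[datetime.datetime]:
        # Assumed default: the modification time cannot be determined.
        return None

    @abstractmethod
    def load(self, src: Any) -> Any:
        """Load data from the given source."""

    @property
    def patterns(self) -> Iterator[Tuple[str, PatternType]]:
        # Assumed default: the loader claims no patterns.
        return iter(())


class UpperCaseLoader(LoaderBaseSketch):
    """Hypothetical loader that 'loads' a source by upper-casing it."""

    def load(self, src: str) -> str:
        return src.upper()

    @property
    def patterns(self) -> Iterator[Tuple[str, PatternType]]:
        # Claim every source that starts with the hypothetical "upper:" prefix.
        yield (r"^upper:", PatternType.REGEX)


loader = UpperCaseLoader()
print(loader.load("upper:hello"))  # UPPER:HELLO
print(list(loader.patterns))
```

Because load is abstract, instantiating LoaderBaseSketch directly raises TypeError, which is the same guarantee the ABC base gives real loaders.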

dalood.loader.file module

Base class for other file loaders.

class dalood.loader.file.FileLoaderBase(*args, **kwargs)

Bases: LoaderBase

Base class for loading data from files.

DEFAULT_ENCODING = 'utf-8'
__init__(*args, **kwargs)
Parameters:
  • *args – Positional arguments passed through to pathlib.Path.open().

  • **kwargs – Keyword arguments passed through to pathlib.Path.open().

get_mtime(src)

Attempt to determine the last modification time of the source.

Parameters:

src – The data source (e.g. a file path or URI).

Returns:

A datetime.datetime object representing the last known modification time, or None if the time cannot be determined.
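For a file source, the modification time is naturally the filesystem timestamp. The sketch below assumes that is what the file loader reads; the helper name file_mtime is hypothetical, and whether dalood returns timezone-aware or naive datetimes is also an assumption (this version returns UTC-aware values).

```python
import datetime
from pathlib import Path
from typing import Optional, Union


def file_mtime(src: Union[str, Path]) -> Optional[datetime.datetime]:
    """Return the file's last modification time, or None if it cannot be read."""
    try:
        # st_mtime is a POSIX timestamp (seconds since the epoch).
        timestamp = Path(src).stat().st_mtime
    except OSError:
        # Missing file, permission error, etc.: the time cannot be determined.
        return None
    return datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)
```

Returning None rather than raising matches the documented contract that the time may simply be unknown.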

stream(src, *exceptions)

Context manager for accessing the open file buffer.

dalood.loader.json module

JSON data loaders.

class dalood.loader.json.JSONFileLoader

Bases: FileLoaderBase

JSON file data loader.

__init__()
Parameters:
  • *args – Positional arguments passed through to pathlib.Path.open().

  • **kwargs – Keyword arguments passed through to pathlib.Path.open().

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

property patterns

An iterator over 2-tuples of patterns and their PatternType associated with this loader. For example, a text file loader might return the 2-tuple (r"^(?!https?://).*\.txt$", PatternType.REGEX) to load local files by the ".txt" extension.
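Conceptually, a JSON file loader's load() opens the file and parses it with Python's json module. The sketch below assumes exactly that; load_json_file is a hypothetical helper, not part of dalood, and the UTF-8 default simply mirrors FileLoaderBase.DEFAULT_ENCODING.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Union


def load_json_file(src: Union[str, Path], encoding: str = "utf-8") -> Any:
    """Parse the JSON document stored at src (hypothetical helper)."""
    # Open in text mode with an explicit encoding, then let json parse the stream.
    with Path(src).open("r", encoding=encoding) as handle:
        return json.load(handle)


# Usage: write a small JSON file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text('{"debug": true, "levels": [1, 2]}', encoding="utf-8")
    data = load_json_file(path)

print(data)  # {'debug': True, 'levels': [1, 2]}
```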

class dalood.loader.json.JSONUrlLoader(*args, **kwargs)

Bases: UrlLoaderBase

JSON URL data loader.

__init__(*args, **kwargs)
Parameters:

timeout – Timeout parameter passed through to requests.

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

dalood.loader.map module

Base class for loaders that map sources to data via a user-defined map.

class dalood.loader.map.MapLoaderBase(mapping=None)

Bases: LoaderBase

Base class for loaders that map sources to data via a user-defined map.

__init__(mapping=None)
Parameters:

mapping – A dict mapping the source arguments to user-defined data. How that data is used will be determined by the implementation of load and get_mtime in derived classes. For example, for a loader that loads Pandas DataFrames from a database, the mapping could map source arguments to SQL statements that are passed through to pandas.read_sql.

map(src, value)

Map a source to a value in the internal mapping.

Parameters:
  • src – The source to map.

  • value – The value to which to map the source.

property patterns

An iterator over 2-tuples of patterns and their PatternType associated with this loader. For example, a text file loader might return the 2-tuple (r"^(?!https?://).*\.txt$", PatternType.REGEX) to load local files by the ".txt" extension.

dalood.loader.memory module

Loader for objects in memory.

class dalood.loader.memory.MemoryLoader(mapping=None)

Bases: MapLoaderBase

Loader for objects in memory. This simply provides a uniform API for accessing objects that are already in memory, such as those passed in by the user. In this case, the mapping of the parent MapLoaderBase class maps source arguments directly to the returned data.

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.
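Since the memory loader's mapping sends source arguments directly to the returned data, its behaviour reduces to a dictionary with a map() setter and a lookup in load(). The sketch below is a hypothetical stand-in (MemoryLoaderSketch) rather than dalood's class; copying the initial mapping and raising KeyError for unknown sources are assumptions.

```python
from typing import Any, Dict, Hashable, Optional


class MemoryLoaderSketch:
    """Hypothetical stand-in for the documented MapLoaderBase/MemoryLoader behaviour."""

    def __init__(self, mapping: Optional[Dict[Hashable, Any]] = None):
        # Copy the caller's dict so later map() calls do not mutate it (an assumption).
        self.mapping: Dict[Hashable, Any] = dict(mapping) if mapping else {}

    def map(self, src: Hashable, value: Any) -> None:
        """Map a source to a value, replacing any earlier value."""
        self.mapping[src] = value

    def load(self, src: Hashable) -> Any:
        # For in-memory data, loading is a plain lookup; unknown sources raise KeyError.
        return self.mapping[src]


loader = MemoryLoaderSketch({"defaults": {"debug": False}})
loader.map("numbers", [1, 2, 3])
print(loader.load("numbers"))   # [1, 2, 3]
print(loader.load("defaults"))  # {'debug': False}
```

The point of routing such lookups through a loader at all is uniformity: callers request "numbers" the same way they would request a file path or URL.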

dalood.loader.pandas module

Pandas DataFrame loaders.

class dalood.loader.pandas.DataFrameCSVLoader(**kwargs)

Bases: FileLoaderBase

CSV & TSV file dataframe loader.

__init__(**kwargs)
Parameters:

**kwargs – Keyword arguments for pandas.read_csv().

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

property patterns

An iterator over 2-tuples of patterns and their PatternType associated with this loader. For example, a text file loader might return the 2-tuple (r"^(?!https?://).*\.txt$", PatternType.REGEX) to load local files by the ".txt" extension.

class dalood.loader.pandas.DataFrameSQLLoader(mapping=None, **kwargs)

Bases: MapLoaderBase

SQL query DataFrame loader.

__init__(mapping=None, **kwargs)
Parameters:
  • mapping – A dict mapping the source arguments to SQL statements. When a source is requested, the mapped SQL statement will be passed through to pandas.read_sql() along with the keyword arguments in kwargs.

  • **kwargs – Keyword arguments for pandas.read_sql().

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.
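The SQL loader's mapping sends short source names to SQL statements, which are then executed against a database connection. The sketch below illustrates that lookup-then-query flow under stated assumptions: to stay free of third-party dependencies it swaps pandas.read_sql for a plain sqlite3 query returning row tuples, and SQLMapSketch, the readings table and the "all_readings" name are all hypothetical.

```python
import sqlite3
from typing import Any, Dict, List, Tuple


class SQLMapSketch:
    """Hypothetical stand-in: maps source names to SQL statements."""

    def __init__(self, connection: sqlite3.Connection, mapping: Dict[str, str]):
        self.connection = connection
        self.mapping = dict(mapping)

    def load(self, src: str) -> List[Tuple[Any, ...]]:
        # Look up the SQL statement registered for this source name.
        sql = self.mapping[src]
        # With pandas available, pandas.read_sql(sql, self.connection) would return
        # a DataFrame instead; here sqlite3 returns plain row tuples.
        return self.connection.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [("a", 1.5), ("b", 2.0)])

loader = SQLMapSketch(conn, {"all_readings": "SELECT sensor, value FROM readings ORDER BY sensor"})
print(loader.load("all_readings"))  # [('a', 1.5), ('b', 2.0)]
```

Keeping SQL text in the mapping means callers only ever see stable source names, while the queries themselves can change in one place.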

dalood.loader.text module

Text data loaders.

class dalood.loader.text.TextFileLoader(encoding='utf-8')

Bases: FileLoaderBase

Text file data loader.

__init__(encoding='utf-8')
Parameters:

encoding – The file encoding.
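The encoding parameter matters whenever a file was not written as UTF-8. The sketch below shows the effect with a hypothetical load_text helper built on pathlib.Path.read_text; it assumes the text loader decodes the whole file at once, which is not confirmed by the reference above.

```python
import tempfile
from pathlib import Path


def load_text(src, encoding: str = "utf-8") -> str:
    """Read a whole text file, decoding it with the given encoding (hypothetical helper)."""
    return Path(src).read_text(encoding=encoding)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "latin1.txt"
    # "café" in Latin-1 is stored as the bytes b"caf\xe9".
    path.write_bytes("café".encode("latin-1"))
    as_latin1 = load_text(path, encoding="latin-1")
    try:
        # The default UTF-8 decoder rejects the lone 0xE9 byte.
        load_text(path)
        utf8_ok = True
    except UnicodeDecodeError:
        utf8_ok = False

print(as_latin1)  # café
print(utf8_ok)    # False
```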

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

property patterns

An iterator over 2-tuples of patterns and their PatternType associated with this loader. For example, a text file loader might return the 2-tuple (r"^(?!https?://).*\.txt$", PatternType.REGEX) to load local files by the ".txt" extension.

class dalood.loader.text.TextUrlLoader(*args, encoding='utf-8', **kwargs)

Bases: UrlLoaderBase

Text URL data loader.

__init__(*args, encoding='utf-8', **kwargs)
Parameters:

encoding – The stream encoding.

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

dalood.loader.url module

Base class for other URL loaders.

class dalood.loader.url.UrlLoaderBase(timeout=5)

Bases: LoaderBase

Base class for loading data from URLs. This will check modification times via a HEAD request.

__init__(timeout=5)
Parameters:

timeout – Timeout parameter passed through to requests.

get_mtime(src)

Attempt to determine the last modification time of the source.

Parameters:

src – The data source (e.g. a file path or URI).

Returns:

A datetime.datetime object representing the last known modification time, or None if the time cannot be determined.
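The URL base class checks modification times via a HEAD request. A minimal sketch, assuming the value comes from the HTTP Last-Modified response header: dalood itself uses requests, whereas this version uses the standard library's urllib.request so it runs without third-party packages. Both function names are hypothetical.

```python
import datetime
import email.utils
import urllib.request
from typing import Mapping, Optional


def mtime_from_headers(headers: Mapping[str, str]) -> Optional[datetime.datetime]:
    """Parse an HTTP Last-Modified header into a datetime, or return None."""
    value = headers.get("Last-Modified")
    if not value:
        return None
    try:
        return email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Malformed header (older Pythons raise TypeError, newer ValueError).
        return None


def head_mtime(url: str, timeout: float = 5) -> Optional[datetime.datetime]:
    """Send a HEAD request and return the Last-Modified time, if any."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return mtime_from_headers(response.headers)
    except OSError:
        # URLError and socket timeouts are OSError subclasses: treat as unknown.
        return None
```

A HEAD request returns only headers, so checking freshness this way avoids downloading the body just to decide whether a cached copy is stale.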

stream(src, *exceptions)

Context manager for accessing the raw data stream. It provides the response object through which the content can be accessed.

dalood.loader.yaml module

YAML data loaders.

class dalood.loader.yaml.YAMLFileLoader(encoding='utf-8')

Bases: FileLoaderBase

YAML file data loader.

__init__(encoding='utf-8')
Parameters:

encoding – The file encoding.

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

property patterns

An iterator over 2-tuples of patterns and their PatternType associated with this loader. For example, a text file loader might return the 2-tuple (r"^(?!https?://).*\.txt$", PatternType.REGEX) to load local files by the ".txt" extension.

class dalood.loader.yaml.YAMLUrlLoader(*args, **kwargs)

Bases: UrlLoaderBase

YAML URL data loader.

__init__(*args, **kwargs)
Parameters:

timeout – Timeout parameter passed through to requests.

load(src)

Load data from the given source.

Parameters:

src – The data source. For some loaders this may simply be a file path or URI. For others it may be an arbitrary string that only has meaning to the loader. For example, some loaders can map user-defined strings to pre-defined static data, or SQL statements that can be used to retrieve a Pandas DataFrame from an open database connection.

Returns:

The loaded data in a form appropriate for the loader.

Module contents

Package stub.