numpy
HAS_NUMPY = True
NumpySaver(file_path, approach='append')
Bases: Saver
A saver that writes data to a .npy file.
NumpySaver persists computational results in NumPy's .npy format, which is designed for storing and loading NumPy arrays. The format preserves shape, data type, and other array metadata, making it well suited to numerical data.
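The round trip below illustrates the shape and dtype preservation described above. It is a minimal sketch that uses only the standard `numpy.save` and `numpy.load` calls, not `NumpySaver` itself; the file name `example.npy` is arbitrary.

```python
import os
import tempfile

import numpy as np

# A 2-D float32 array: both the shape and the dtype should survive the round trip.
original = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")
    np.save(path, original)    # the .npy header records shape and dtype
    restored = np.load(path)   # read back into an in-memory array

print(restored.shape, restored.dtype)  # (2, 3) float32
```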
This implementation supports three approaches to saving data:
- append: Add new data to the existing array (the file is created if it does not exist)
- overwrite: Replace existing data with new data
- update: Update specific indices in the existing array with new values
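In array terms, the three approaches roughly correspond to the in-memory operations below. This is a sketch under two assumptions not confirmed by this page: append concatenates along the first axis, and update uses NumPy integer-array indexing. It also omits the file I/O that `NumpySaver` performs.

```python
import numpy as np

existing = np.zeros(5)         # stands in for the array already on disk
new = np.array([1.0, 2.0])     # the incoming data

# append (assumed): concatenate the new data after the existing array
appended = np.concatenate([existing, new])

# overwrite: the new data simply replaces the old array
overwritten = np.asarray(new)

# update (assumed): write the new values at the given positions of a copy
updated = existing.copy()
updated[[0, 3]] = new
```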
NumPy's .npy format offers several advantages:
- Fast load and save operations
- Preservation of data types and array structures
- Compact binary storage
- Native integration with NumPy's ecosystem
Examples:
Basic usage:
```python
# Create a NumpySaver for storing results
saver = NumpySaver("results.npy")
# Use with TaskManager
task_manager = TaskManager(MyTask)
task_manager.submit_tasks(items, saver=saver, save_interval=100)
```
Overwriting existing data:
```python
# Create a saver that overwrites existing data
saver = NumpySaver("daily_metrics.npy", approach="overwrite")
# Save new results, replacing any existing file
results = process_batch(today_data)
saver.save(results)
```
Updating specific indices:
```python
# Create a saver for updating existing data
saver = NumpySaver("time_series.npy", approach="update")
# Update specific indices with new values
new_data = [99.5, 98.3, 97.8]
indices = [10, 20, 30]  # Positions to update
saver.save(new_data, indices=indices)
```
Notes
- NumPy's .npy format is best suited for numerical data where the entire array structure needs to be preserved.
- For very large datasets where memory is a concern, consider using HDF5Saver or ZarrSaver instead, since appending to or updating a .npy file with this saver requires loading the entire array into memory.
- The append operation loads the entire existing array into memory before appending, which may be inefficient for very large arrays.
- For multidimensional arrays, shape compatibility is important when using append or update approaches.
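The shape-compatibility note can be seen with plain NumPy. The sketch below assumes, without confirmation from this page, that append concatenates along axis 0; under that assumption, new data must match the existing array in every dimension except the first.

```python
import numpy as np

existing = np.zeros((4, 3))  # four rows of three columns

# Compatible: same trailing dimension (3), so rows stack along axis 0.
ok = np.concatenate([existing, np.ones((2, 3))], axis=0)
print(ok.shape)  # (6, 3)

# Incompatible: rows of two columns cannot follow rows of three columns.
try:
    np.concatenate([existing, np.ones((2, 2))], axis=0)
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("cannot append:", exc)
```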
Initialize a NumpySaver instance.
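PARAMETER | DESCRIPTION |
---|---|
`file_path` | The path to the .npy file where data will be saved. |
`approach` | One of `'append'`, `'overwrite'`, or `'update'` (default `'append'`). Determines how data is written when the file already exists. |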
Notes
- The file_path should have the .npy extension. `numpy.save` appends .npy to a file name that lacks it, so a path without the extension may be written under a different name than the one later read back.
- The approach parameter determines the behavior when saving data to an existing file:
  - append: Concatenates new data to existing data
  - overwrite: Replaces the entire file with new data
  - update: Modifies specific indices in the existing data
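The extension matters because `numpy.save` appends `.npy` to a file name that lacks it, while `numpy.load` opens the path exactly as given. The sketch below shows that mismatch with plain NumPy calls only; how `NumpySaver` itself reads the file back is not shown on this page.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "results")  # note: no .npy extension
    np.save(target, np.arange(3))          # numpy.save writes "results.npy" instead
    written = sorted(os.listdir(tmp))
    print(written)  # ['results.npy']

    # numpy.load opens the path as given, and "results" does not exist.
    try:
        np.load(target)
        load_failed = False
    except FileNotFoundError:
        load_failed = True
```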
approach = approach.strip().lower()
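The attribute value above shows that `approach` is stripped and lower-cased, so `' Append '` and `'append'` behave the same. The helper below is a hypothetical sketch of that normalization plus a validity check against the three documented options. According to the Raises table, the real class reports an unknown approach from `save`, which may happen later than this eager check.

```python
VALID_APPROACHES = ("append", "overwrite", "update")  # the three documented options


def normalize_approach(approach: str) -> str:
    """Hypothetical helper: normalize like `approach.strip().lower()`, then validate."""
    normalized = approach.strip().lower()
    if normalized not in VALID_APPROACHES:
        raise ValueError(f"Unknown approach: {approach!r}")
    return normalized


print(normalize_approach("  Overwrite "))  # overwrite
```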
file_path = file_path
save(data, indices=None, **kwargs)
Saves the data to a .npy file according to the specified approach.
This method implements the abstract save method from the Saver base class. It persists the provided data to a NumPy .npy file using the configured approach (append, overwrite, or update).
The method handles creating new files, appending to existing files, or updating specific indices in existing files. It automatically converts the input data to a numpy array before saving.
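The control flow described above can be sketched with `numpy.save` and `numpy.load`. `sketch_save` is a hypothetical stand-in, not the library's implementation. It assumes append concatenates along axis 0 and update uses integer-array indexing, and it raises the documented `ValueError` and `FileNotFoundError` cases in an order chosen for illustration.

```python
import os
import tempfile

import numpy as np


def sketch_save(file_path, data, approach="append", indices=None):
    """Hypothetical stand-in for NumpySaver.save; not the library's actual code."""
    array = np.asarray(data)  # convert the input list to a NumPy array

    if approach == "overwrite" or (approach == "append" and not os.path.exists(file_path)):
        # overwrite always replaces the file; append creates it on first use
        np.save(file_path, array)
    elif approach == "append":
        # load the whole existing array, concatenate along axis 0, write it back
        existing = np.load(file_path)
        np.save(file_path, np.concatenate([existing, array]))
    elif approach == "update":
        if indices is None:
            raise ValueError("indices are required when approach is 'update'")
        if not os.path.exists(file_path):
            raise FileNotFoundError(file_path)
        existing = np.load(file_path)
        existing[indices] = array  # write the new values at the requested positions
        np.save(file_path, existing)
    else:
        raise ValueError(f"Unknown approach: {approach!r}")


# Usage: mirror the documented append and update examples in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    appended_path = os.path.join(tmp, "results.npy")
    sketch_save(appended_path, [1, 2, 3, 4, 5])   # first save creates the file
    sketch_save(appended_path, [6, 7, 8, 9, 10])  # later saves append to it
    appended = np.load(appended_path)

    values_path = os.path.join(tmp, "values.npy")
    sketch_save(values_path, [0] * 10, approach="overwrite")
    sketch_save(values_path, [99, 88, 77], approach="update", indices=[2, 5, 8])
    updated = np.load(values_path)
```

Each append rewrites the whole file, which is the cost the Notes below warn about for very large arrays.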
PARAMETER | DESCRIPTION |
---|---|
`data` | A list of results to save. The data is converted to a NumPy array before saving. |
`indices` | Required when approach is `'update'`; specifies the indices in the existing array where data should be written. Must be compatible with the shape of the input data. |
`**kwargs` | Additional keyword arguments. The current implementation does not use them; they are accepted for compatibility with the Saver interface. |
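"Compatible" here likely means NumPy broadcasting compatibility: assuming update writes values with integer-array indexing (an assumption, since the implementation is not shown), `data` must broadcast to the shape selected by `indices`. The sketch below uses plain NumPy arrays rather than a saver.

```python
import numpy as np

values = np.zeros(10)

# Compatible: three indices receive three values.
values[[2, 5, 8]] = [99.0, 88.0, 77.0]

# Also compatible: a single scalar broadcasts to every selected index.
values[[0, 1]] = -1.0

# Incompatible: two indices cannot receive three values.
scratch = np.zeros(10)
try:
    scratch[[2, 5]] = [1.0, 2.0, 3.0]
    mismatch_raised = False
except ValueError as exc:
    mismatch_raised = True
    print("incompatible shapes:", exc)
```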
RAISES | DESCRIPTION |
---|---|
`ImportError` | If the numpy library is not installed. |
`ValueError` | If approach is `'update'` but `indices` is None. |
`ValueError` | If an unknown approach is specified. |
`FileNotFoundError` | If attempting to update a non-existent file. |
`TypeError` | If the data cannot be converted to a NumPy array. |
Examples:
Saving data with the append approach:
```python
saver = NumpySaver("results.npy")
# First save creates the file
saver.save([1, 2, 3, 4, 5])
# Subsequent saves append to it
saver.save([6, 7, 8, 9, 10])
```
Overwriting existing data:
```python
saver = NumpySaver("metrics.npy", approach="overwrite")
# Save data, replacing any existing file
saver.save([10, 20, 30, 40, 50])
```
Updating specific indices:
```python
saver = NumpySaver("values.npy", approach="update")
# First create the file
saver.approach = "overwrite"
saver.save([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Then update specific positions
saver.approach = "update"
new_values = [99, 88, 77]
indices = [2, 5, 8]  # Positions to update
saver.save(new_values, indices=indices)
# Result would be [0, 0, 99, 0, 0, 88, 0, 0, 77, 0]
```
Notes
- The append operation loads the entire existing file into memory, concatenates the new data, and saves the combined result. This may be inefficient for very large arrays.
- When updating, the indices and data must have compatible shapes.
- For large datasets, consider using HDF5Saver or ZarrSaver which have more efficient append and update operations.