.. _repacking_nwb_files:

How to Repack NWB Files
=======================

When an existing NWB file was created without optimal chunking and compression settings, or when you want to convert between storage backends (HDF5 and Zarr), you can use NeuroConv's repacking functionality to create a new file with improved configurations without losing any data.

What is Repacking?
------------------

Repacking is the process of reading an existing NWB file and writing its contents to a new file with updated backend configuration settings. This is useful for:

- Applying recommended chunking and compression to files created without them
- Converting between HDF5 and Zarr storage backends
- Updating compression methods or levels for better performance or storage efficiency
- Applying NeuroConv's default backend configurations to legacy files

The :py:func:`~neuroconv.tools.nwb_helpers.repack_nwbfile` function handles this process automatically, reading the source file and writing a new file with optimal default settings.

.. note::

    Repacking creates a **new file** and does not modify the original. Both files will exist after the operation completes, so ensure you have sufficient disk space.

Basic Repacking Example
-----------------------

The simplest use case is repacking a file to apply NeuroConv's recommended default settings:

.. code-block:: python

    from neuroconv.tools.nwb_helpers import repack_nwbfile

    # Repack with default recommended settings
    repack_nwbfile(
        nwbfile_path="original_file.nwb",
        export_nwbfile_path="repacked_file.nwb",
    )

This will:

1. Read the original NWB file (automatically detecting whether it is HDF5 or Zarr)
2. Apply NeuroConv's recommended chunking and compression settings
3. Write a new file with the same backend type as the original

The repacked file contains all of the same data as the original, but with optimized storage settings that can improve read performance and reduce file size.

Converting Between Backends
---------------------------

You can convert between HDF5 and Zarr backends by specifying the ``export_backend`` parameter:

**Converting HDF5 to Zarr**

.. code-block:: python

    from neuroconv.tools.nwb_helpers import repack_nwbfile

    # Convert from HDF5 (.nwb) to Zarr (.nwb.zarr)
    repack_nwbfile(
        nwbfile_path="file.nwb",
        export_nwbfile_path="file.nwb.zarr",
        export_backend="zarr",
    )

**Converting Zarr to HDF5**

.. code-block:: python

    from neuroconv.tools.nwb_helpers import repack_nwbfile

    # Convert from Zarr (.nwb.zarr) to HDF5 (.nwb)
    repack_nwbfile(
        nwbfile_path="file.nwb.zarr",
        export_nwbfile_path="file.nwb",
        export_backend="hdf5",
    )

.. tip::

    Zarr is particularly well-suited for cloud storage and parallel access, while HDF5 is a more mature format with broader tool support. Choose the backend that best fits your use case.
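After converting, it is worth confirming that the new file opens and contains the expected objects. The following is a minimal sketch that reads back the Zarr file produced above using ``NWBZarrIO`` from the ``hdmf-zarr`` package (which underlies NWB's Zarr support); the file name simply reuses the conversion example:

.. code-block:: python

    from hdmf_zarr.nwb import NWBZarrIO

    # Open the converted Zarr file read-only and list its acquisition objects
    with NWBZarrIO("file.nwb.zarr", mode="r") as io:
        nwbfile = io.read()
        print(list(nwbfile.acquisition))  # same contents as the source HDF5 file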
Complete Workflow Example
-------------------------

Here's a complete example showing how to create an uncompressed file, inspect its properties, repack it, and verify the improvements:

.. code-block:: python

    from datetime import datetime, timezone
    from pathlib import Path
    from uuid import uuid4

    import h5py
    import numpy as np
    from pynwb import NWBFile, NWBHDF5IO, TimeSeries

    from neuroconv.tools.nwb_helpers import repack_nwbfile

    # Create sample data
    session_start_time = datetime(2020, 1, 1, 12, 30, 0, tzinfo=timezone.utc)
    nwbfile = NWBFile(
        identifier=str(uuid4()),
        session_start_time=session_start_time,
        session_description="Example session for repacking demo",
    )

    # Add a large time series without compression
    data = np.random.randn(10000, 10)  # 10,000 time points, 10 channels
    timestamps = np.arange(10000) * 0.001  # 1 kHz sampling
    time_series = TimeSeries(
        name="LargeTimeSeries",
        description="Example data without compression",
        unit="volts",
        data=data,
        timestamps=timestamps,
    )
    nwbfile.add_acquisition(time_series)

    # Write without compression
    original_path = "uncompressed_file.nwb"
    with NWBHDF5IO(original_path, mode="w") as io:
        io.write(nwbfile)

    # Check original file properties
    with h5py.File(original_path, "r") as f:
        dataset = f["acquisition/LargeTimeSeries/data"]
        print("Original file:")
        print(f"  Chunks: {dataset.chunks}")
        print(f"  Compression: {dataset.compression}")
        print(f"  Size: {Path(original_path).stat().st_size / 1024:.1f} KB")

    # Repack with recommended settings
    repacked_path = "repacked_file.nwb"
    repack_nwbfile(
        nwbfile_path=original_path,
        export_nwbfile_path=repacked_path,
    )

    # Check repacked file properties
    with h5py.File(repacked_path, "r") as f:
        dataset = f["acquisition/LargeTimeSeries/data"]
        print("\nRepacked file:")
        print(f"  Chunks: {dataset.chunks}")
        print(f"  Compression: {dataset.compression}")
        print(f"  Size: {Path(repacked_path).stat().st_size / 1024:.1f} KB")

Example output (exact chunk shapes and sizes will vary with your data and library versions)::

    Original file:
      Chunks: None
      Compression: None
      Size: 823.5 KB

    Repacked file:
      Chunks: (10000, 1)
      Compression: gzip
      Size: 156.2 KB

This demonstrates that repacking can significantly reduce file size while preserving all of the original data.

See Also
--------

- :doc:`../user_guide/backend_configuration` - For advanced backend configuration options, custom compression settings, and troubleshooting
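Finally, the compression codecs involved here are lossless, so the repacked arrays should match the originals exactly. If you want to confirm this yourself, here is a minimal sketch that reads back both files from the workflow above and compares the stored data:

.. code-block:: python

    import numpy as np
    from pynwb import NWBHDF5IO

    # Load the same dataset from the original and repacked files
    with NWBHDF5IO("uncompressed_file.nwb", mode="r") as io:
        original_data = io.read().acquisition["LargeTimeSeries"].data[:]
    with NWBHDF5IO("repacked_file.nwb", mode="r") as io:
        repacked_data = io.read().acquisition["LargeTimeSeries"].data[:]

    # gzip compression is lossless, so the arrays should be identical
    assert np.array_equal(original_data, repacked_data)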