How to Repack NWB Files

When you have an existing NWB file that was created without optimal chunking and compression settings, or when you want to convert between storage backends (HDF5 and Zarr), you can use NeuroConv’s repacking functionality to create a new file with improved configurations without losing any data.

What is Repacking?

Repacking is the process of reading an existing NWB file and writing its contents to a new file with updated backend configuration settings. This is useful for:

  • Applying recommended chunking and compression to files created without them

  • Converting between HDF5 and Zarr storage backends

  • Updating compression methods or levels for better performance or storage efficiency

  • Applying NeuroConv’s default backend configurations to legacy files

The repack_nwbfile() function handles this process automatically, reading the source file and writing a new file with optimal default settings.

Note

Repacking creates a new file and does not modify the original. Both files will exist after the operation completes, so ensure you have sufficient disk space.
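
As a rough pre-check, you can compare the size of the source file against the free space where the new file will be written. The following is a minimal sketch using only the Python standard library; the paths are placeholders:

import shutil
from pathlib import Path

source = Path("original_file.nwb")
target_dir = Path(".")  # directory where the repacked file will be written

# The repacked file is usually smaller than the source, so the source
# size is a conservative upper bound on the space needed.
if shutil.disk_usage(target_dir).free < source.stat().st_size:
    raise RuntimeError("Not enough free disk space for the repacked file.")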

Basic Repacking Example

The simplest use case is repacking a file to apply NeuroConv’s recommended default settings:

from neuroconv.tools.nwb_helpers import repack_nwbfile

# Repack with default recommended settings
repack_nwbfile(
    nwbfile_path="original_file.nwb",
    export_nwbfile_path="repacked_file.nwb",
)

This will:

  1. Read the original NWB file (automatically detecting whether it’s HDF5 or Zarr)

  2. Apply NeuroConv’s recommended chunking and compression settings

  3. Write a new file with the same backend type as the original

The repacked file will contain all the same data as the original, but with optimized storage settings that can improve read performance and reduce file size.

Converting Between Backends

You can convert between HDF5 and Zarr backends by specifying the export_backend parameter:

Converting HDF5 to Zarr

from neuroconv.tools.nwb_helpers import repack_nwbfile

# Convert from HDF5 (.nwb) to Zarr (.nwb.zarr)
repack_nwbfile(
    nwbfile_path="file.nwb",
    export_nwbfile_path="file.nwb.zarr",
    export_backend="zarr",
)

Converting Zarr to HDF5

from neuroconv.tools.nwb_helpers import repack_nwbfile

# Convert from Zarr (.nwb.zarr) to HDF5 (.nwb)
repack_nwbfile(
    nwbfile_path="file.nwb.zarr",
    export_nwbfile_path="file.nwb",
    export_backend="hdf5",
)

Tip

Zarr is particularly well-suited for cloud storage and parallel access, while HDF5 is a more mature format with broader tool support. Choose the backend that best fits your use case.
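
To confirm a conversion succeeded, you can open the new file with the reader that matches its backend. A minimal sketch; reading Zarr-backed NWB files requires the hdmf-zarr package, which provides NWBZarrIO:

from hdmf_zarr.nwb import NWBZarrIO
from pynwb import NWBHDF5IO

# Read back the Zarr copy produced by the HDF5-to-Zarr conversion
with NWBZarrIO("file.nwb.zarr", mode="r") as io:
    print(io.read().acquisition)

# Read back the HDF5 copy produced by the Zarr-to-HDF5 conversion
with NWBHDF5IO("file.nwb", mode="r") as io:
    print(io.read().acquisition)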

Complete Workflow Example

Here’s a complete example showing how to create an uncompressed file, inspect its properties, repack it, and verify the improvements:

from datetime import datetime, timezone
from uuid import uuid4
from pathlib import Path

from pynwb import NWBFile, TimeSeries, NWBHDF5IO
from neuroconv.tools.nwb_helpers import repack_nwbfile
import h5py
import numpy as np

# Create sample data
# pynwb expects a timezone-aware session start time
session_start_time = datetime(2020, 1, 1, 12, 30, 0, tzinfo=timezone.utc)
nwbfile = NWBFile(
    identifier=str(uuid4()),
    session_start_time=session_start_time,
    session_description="Example session for repacking demo",
)

# Add a large time series without compression
data = np.random.randn(10000, 10)  # 10,000 time points, 10 channels
timestamps = np.arange(10000) * 0.001  # 1 kHz sampling

time_series = TimeSeries(
    name="LargeTimeSeries",
    description="Example data without compression",
    unit="volts",
    data=data,
    timestamps=timestamps,
)
nwbfile.add_acquisition(time_series)

# Write without compression
original_path = "uncompressed_file.nwb"
with NWBHDF5IO(original_path, mode="w") as io:
    io.write(nwbfile)

# Check original file properties
with h5py.File(original_path, "r") as f:
    dataset = f["acquisition/LargeTimeSeries/data"]
    print("Original file:")
    print(f"  Chunks: {dataset.chunks}")
    print(f"  Compression: {dataset.compression}")
    print(f"  Size: {Path(original_path).stat().st_size / 1024:.1f} KB")

# Repack with recommended settings
repacked_path = "repacked_file.nwb"
repack_nwbfile(
    nwbfile_path=original_path,
    export_nwbfile_path=repacked_path,
)

# Check repacked file properties
with h5py.File(repacked_path, "r") as f:
    dataset = f["acquisition/LargeTimeSeries/data"]
    print("\nRepacked file:")
    print(f"  Chunks: {dataset.chunks}")
    print(f"  Compression: {dataset.compression}")
    print(f"  Size: {Path(repacked_path).stat().st_size / 1024:.1f} KB")

Expected output (exact values will vary with the random data and your NeuroConv version):

Original file:
  Chunks: None
  Compression: None
  Size: 823.5 KB

Repacked file:
  Chunks: (10000, 1)
  Compression: gzip
  Size: 156.2 KB

This demonstrates that repacking can significantly reduce file size while maintaining all the original data.
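
If you want to verify this directly, you can read both files back with pynwb and compare the datasets element-wise. A minimal sketch continuing from the example above:

import numpy as np
from pynwb import NWBHDF5IO

# Slice with [:] to load the full arrays into memory for comparison
with NWBHDF5IO(original_path, mode="r") as io:
    original_data = io.read().acquisition["LargeTimeSeries"].data[:]

with NWBHDF5IO(repacked_path, mode="r") as io:
    repacked_data = io.read().acquisition["LargeTimeSeries"].data[:]

assert np.array_equal(original_data, repacked_data), "Repacked data differs from the original"
print("Verified: original and repacked data are identical.")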

See Also

  • Backend Configuration - For advanced backend configuration options, custom compression settings, and troubleshooting