DSJSON: A Python Package to Make Dataset-JSON Simple

Trinath Panda
Aug 23

Making Dataset-JSON files usually means a lot of manual work—lining up data, adding metadata by hand, and checking if it all matches. I wanted to make that easier. So, I built a Python package that takes your SDTM/ADaM data and metadata (CSV, Excel, or JSON) and turns it straight into a Dataset-JSON v1.1 file.

Under the Hood: The Implementation

Core Components

The package is built around two main functions:

load_metadata: For loading column metadata from various sources.
to_dataset_json: For creating the final Dataset-JSON structure.

Key Design Decisions

Automatic generation of datasetJSONCreationDateTime.
Enforcement of required top-level fields like name, label, and itemGroupOID.
Validation to ensure that the columns in the metadata and data match.
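
To make these three decisions concrete, here is a minimal sketch of how such checks could look in plain Python. It is not the package's actual code: the helper names (creation_timestamp, check_required, check_columns_match) and the timestamp format are assumptions for illustration only.

```python
from datetime import datetime, timezone

# Required top-level fields named in the list above.
REQUIRED_TOP_LEVEL = ("name", "label", "itemGroupOID")


def creation_timestamp():
    """Return the current UTC time as an ISO 8601 string (assumed format)."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")


def check_required(top_level):
    """Raise ValueError if a required top-level field is missing or empty."""
    missing = [key for key in REQUIRED_TOP_LEVEL if not top_level.get(key)]
    if missing:
        raise ValueError(f"Missing required top-level fields: {missing}")


def check_columns_match(metadata_names, data_names):
    """Raise ValueError if metadata and data do not name the same columns."""
    meta, data = set(metadata_names), set(data_names)
    if meta != data:
        raise ValueError(
            f"Columns only in metadata: {sorted(meta - data)}; "
            f"columns only in data: {sorted(data - meta)}"
        )
```

Failing early with a message that lists the mismatched columns is usually more helpful than discovering the problem later in a malformed output file.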

The Development Journey

I’ll be honest: building a package always looks simple until you start.

Step 1: Framing the Problem

I broke down the Dataset-JSON spec into minimum viable building blocks:

Rows: the actual data.
Columns metadata: name, label, data type, and mapping.
Top-level metadata: datasetJSONVersion, originator, datasetJSONCreationDateTime, etc.

If I could get those three aligned, the rest would fall into place.
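
As a rough illustration of how the three blocks fit together, the sketch below assembles them into a single Python dict and serializes it. The values (subject IDs, OIDs, originator, timestamp) are made up, and the column attribute itemOID and the version string "1.1.0" reflect my reading of the Dataset-JSON v1.1 specification rather than anything this package guarantees.

```python
import json

# Columns metadata: one entry per variable (values are illustrative).
columns = [
    {"itemOID": "IT.VS.USUBJID", "name": "USUBJID",
     "label": "Unique Subject Identifier", "dataType": "string"},
    {"itemOID": "IT.VS.VSSTRESN", "name": "VSSTRESN",
     "label": "Numeric Result/Finding in Standard Units", "dataType": "float"},
]

# Rows: one list per record, with values in the same order as `columns`.
rows = [
    ["STUDY01-001", 72.0],
    ["STUDY01-002", 68.5],
]

# Top-level metadata wraps the two blocks above.
dataset = {
    "datasetJSONCreationDateTime": "2025-08-19T12:00:00",  # fixed for the example
    "datasetJSONVersion": "1.1.0",
    "originator": "Example Sponsor",
    "itemGroupOID": "IG.VS",
    "name": "VS",
    "label": "Vital Signs",
    "records": len(rows),
    "columns": columns,
    "rows": rows,
}

# "Aligned" here means every row supplies exactly one value per column.
assert all(len(row) == len(columns) for row in rows)

print(json.dumps(dataset, indent=2))
```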

Step 2: Implementation Choices

Base on pandas: Every clinical programmer moving into Python touches pandas. That had to be my foundation.
Metadata flexibility: Some teams maintain metadata in CSVs, some in Excel, some in JSON. So, I built loaders for all three.
Minimal dependencies: Keep it lean so it doesn’t break when someone installs it in a restricted clinical IT environment.
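
A loader that supports several metadata formats can be little more than a dispatch on file type. The sketch below is one hypothetical way to write it, not the real load_metadata: the function name load_column_metadata and its return shape (a list of dicts) are assumptions. It uses only the standard library for CSV and JSON and imports pandas lazily for Excel, which is one way to keep the dependency footprint small.

```python
import csv
import json


def load_column_metadata(path, file_type):
    """Hypothetical loader: return column metadata as a list of dicts."""
    if file_type == "csv":
        with open(path, newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if file_type == "json":
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    if file_type == "excel":
        # Imported lazily so CSV/JSON users do not need an Excel engine.
        import pandas as pd
        return pd.read_excel(path).to_dict(orient="records")
    raise ValueError(f"Unsupported file_type: {file_type!r}")
```

Whatever the source format, every branch returns the same shape, so the code that builds the final file never has to know where the metadata came from.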

Step 3: The First Working Version

The first time I ran:

from dsjson import load_metadata, to_dataset_json
import pandas as pd
rows = pd.read_csv("examples/vs.csv")
columns = load_metadata("examples/columns_vs.csv", file_type="csv")
ds = to_dataset_json(rows, columns, dataset_name="VS", dataset_label="Vital Signs")

…and it produced a valid Dataset-JSON file. That was one of those small developer victories that feels huge.

From Local Script to Python Package

Here’s where things got interesting. Writing code is one thing; turning it into a package others can install with pip is a different game.

I had to:

Structure the repo properly (core, tests, examples).
Write documentation that wouldn’t make people quit after the first read.
Create a CHANGELOG.md (because future me will forget why I changed things).
Push to PyPI with a clean versioning system.

On August 19, 2025, I finally tagged and released v1.0 to PyPI. That moment when pip install dsjson actually worked. Priceless.

What the Package Does Today

At v1.0, DSJSON is focused and pragmatic:

Input: Any pandas-friendly dataset (CSV, Excel, JSON) + column metadata from CSV, Excel, or JSON.
Output: A conformant Dataset-JSON v1.1 file with all required metadata (datasetJSONVersion, datasetJSONCreationDateTime, name, label, itemGroupOID, columns, rows, records, originator, sourceSystem_name, etc.).
Utility functions: Load metadata, validate structure, and generate Dataset-JSON in one shot.

It doesn’t try to be everything. It just does one job well: make Dataset-JSON generation simple and reproducible.

Lessons Learned Along the Way

Simplicity wins: Don’t try to build the “perfect” package on day one. Ship something useful, then improve.
Documentation matters: If you don’t explain it well, even good code looks unusable.
Versioning discipline: Writing a changelog is boring… until it saves you from asking “what the hell did I change last month?”
Releasing is a skill: Getting it onto PyPI took as much learning as writing the code itself.

What’s Next

The roadmap is clear:

Add support for XML metadata input.
Build validation against the official Dataset-JSON schema.
Explore integration with FHIR resources for real-world data pipelines.

Closing Thoughts

Version 1.0 is just the beginning. The package will grow with feedback, new ideas, and real-world use. It’s out there now: easy to try, easy to use, and open for contributions.

The code is on GitHub: DSJSON-PY.

Check out the PyPI package: dsjson

If you have ideas or find issues, open an Issue or drop me a message. Let’s keep making clinical data tools better together.