Managing Scientific Data: A Practical How-To Guide
You’re probably dealing with this already. The experiment is moving, timers are running, gloves are on, and the most important observation of the day happens when you can’t comfortably type. So you tell yourself you’ll write it up later. Later turns into a rushed reconstruction from memory, a stained paper note, a photo in your camera roll, and a file named something like results_final_revised2.
That’s the problem with managing scientific data. It usually doesn’t fail at the archive. It fails at the bench.
High-level standards help. The FAIR Data Principles became a global standard for making research data more findable, accessible, interoperable, and reusable, but practical guidance still often misses the bench-level reality of capturing observations during active work. That gap matters because the pressure is not abstract. Global research data generation reached 181 zettabytes in 2025, and in GxP-regulated biotech, poor data handling is associated with $50 billion yearly in rework and delays, while 25% of FDA submissions were rejected due to inadequate records as of 2024, according to BAU’s summary of data science statistics.
I’ve seen the same pattern repeatedly. Labs don’t usually need a grand digital transformation first. They need a workable daily system: decide what gets captured, capture it when it happens, save it in a place people can find, and preserve it in a form that still makes sense months later.
Table of Contents
- Laying the Foundation with a Practical Data Management Plan
- Creating Order with Naming Conventions and Metadata
- The Art of Contemporaneous Data Capture at the Bench
- Implementing Secure Storage Versioning and Access Control
- Ensuring Longevity with Backup Retention and Audit-Readiness
- A Sample Workflow from Bench to Archive
Laying the Foundation with a Practical Data Management Plan
At 5:40 p.m., the incubator run is done, the plate reader has exported three files with default names, and someone has already left for the day. The experiment is still recoverable at that point, but only if the lab has already decided what gets saved, where it goes, and what counts as the record of that run.
A practical data management plan solves that bench-level problem. It is not grant language pasted into a PDF. It is a short working document that tells people how data moves from instrument to notebook to analysis folder to archive, while the work is still fresh enough to document correctly.

In a wet lab, FAIR principles only help if they survive contact with real routines. Files come off shared instruments with bad names. Handwritten notes get transcribed late. Analysis starts before anyone has agreed on which export is raw and which spreadsheet is already cleaned. A usable plan closes that gap. It turns broad data management goals into a few repeatable decisions the team can follow during a normal week.
What a useful DMP actually answers
For a small lab, one page is often enough if it answers the questions people trip over in practice:
- What data will be created: raw instrument output, images, notebook entries, calculations, summary tables, scripts, and submission-ready files.
- What the official record is: the notebook entry, the instrument file, or a signed-off processed file. Pick one for each data type.
- Who does what: who captures observations at the bench, who moves files off the instrument computer, who reviews derived outputs, and who closes the project.
- Where each file type belongs: acquisition PC, ELN, team drive, approved cloud folder, or archive.
- When data changes status: active, under analysis, approved, submitted, or archived.
- What must be recorded immediately: times, deviations, failed runs, sample mix-ups, reruns, and any manual intervention.
If two people would store the same run in different places, the plan is still too vague.
A format small teams will actually maintain
I build these plans around the data lifecycle, but I write them in operating language, not policy language. That keeps the document usable at the bench.
Capture
- What is recorded during the experiment?
- What must be entered contemporaneously?
- How are timestamps, sample IDs, and deviations preserved?
Process
- Which files are raw?
- Which changes are allowed to create derived data?
- Where are calculations, exclusions, and cleanup steps recorded?
Store
- What is the working location?
- What is the protected archive location?
- Who can edit, approve, and view?
Share or submit
- What leaves the lab?
- In what format?
- What context has to travel with it so someone else can interpret it correctly?
Retain
- What stays with the project at closeout?
- What gets archived, and what can be discarded under your institution's rules?
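If it helps to make those lifecycle decisions concrete, here is a minimal sketch of a one-page DMP expressed as a checkable structure in Python. The stage names follow the lifecycle above; every field value is an illustrative assumption to replace with your lab's actual decisions.

```python
# A one-page DMP as a checkable structure -- a sketch, not a standard.
# All values below are illustrative assumptions; fill in your lab's decisions.

dmp = {
    "Capture": {
        "recorded during experiment": "timestamps, sample IDs, deviations, raw exports",
        "contemporaneous entries": "observations, timer events, manual interventions",
    },
    "Process": {
        "raw files": "untouched instrument exports, never edited",
        "allowed derivations": "cleaning, normalization, annotation -- each as a new version",
    },
    "Store": {
        "working location": "team drive, active project folder",
        "archive location": "institutional server, read-only after approval",
    },
    "Share or submit": {
        "what leaves the lab": "summary tables and reports only",
        "required context": "README, method version, sample map",
    },
    "Retain": {
        "kept at closeout": "raw data, final versions, README, changelog",
        "discard rules": "per institutional retention policy",
    },
}

# Flag any decision the team has not actually made yet.
for stage, decisions in dmp.items():
    for field, value in decisions.items():
        if not value.strip():
            print(f"Unresolved: {stage} -> {field}")
```

The loop at the end is the two-minute check in script form: if a field is blank, the plan is still too vague.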
For sponsored work, this can later map to a formal grant document. For day-to-day lab operations, start with a short version that people can check in under two minutes. This guide to building a data management and sharing plan for research workflows is a useful reference if you need a starting structure.
Where labs usually get stuck
The failure mode is rarely lack of policy. It is unresolved ambiguity.
I see the same problems repeatedly: nobody defines the source record, instrument computers become unofficial long-term storage, and teams wait until analysis starts to decide how files should have been handled. By then, reconstruction takes longer than the original setup would have.
Overloading the plan with too much detail causes a different problem. If every edge case is documented before the team has tested the workflow, the document becomes shelfware. Start with the decisions that prevent loss, confusion, and undocumented changes. Add detail only after the lab has used the system for a few weeks.
A good DMP is checked during setup, not after something is missing.
Creating Order with Naming Conventions and Metadata
Messy file names don’t look like a scientific problem until someone needs to defend a result, reproduce an analysis, or locate the right image set six months later. Then the naming problem becomes a traceability problem.
The fastest improvement most labs can make is a file naming rule that nobody has to interpret. UC San Diego’s research data management guidance recommends defining naming conventions and versioning policies up front, including a format like ProjectName_Section_YYYYMMDD_v1.0.ext. The same guidance notes that structured organization can improve reusability by 50-70%, and institutions report an 85% reduction in lost-file incidents with organized systems, according to UC San Diego Library’s data management best practices.

A naming scheme that holds up in real lab work
Use a format that answers four questions immediately: what project, what type, what date, what version.
A practical pattern is:
ProjectID_ExperimentType_YYYYMMDD_Initials_v1.0.ext
Examples:
- ENZ01_Assay_20260425_JD_v1.0.xlsx
- ENZ01_Microscopy_20260425_JD_v1.0.tif
- ENZ01_Analysis_20260426_JD_v1.1.R
- ENZ01_Report_20260427_JD_v1.0.pdf
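If you want to enforce that pattern mechanically, a small validation script works. This is a minimal sketch in Python; the project ID shape and the list of allowed experiment types are assumptions you'd replace with your lab's controlled terms.

```python
import re

# Minimal filename check for the pattern above (a sketch, not a standard).
# The project ID shape and allowed types are illustrative assumptions.
PATTERN = re.compile(
    r"^(?P<project>[A-Z]{3}\d{2})_"                  # ProjectID, e.g. ENZ01
    r"(?P<type>Assay|Microscopy|Analysis|Report)_"   # controlled experiment type
    r"(?P<date>\d{8})_"                              # YYYYMMDD
    r"(?P<initials>[A-Z]{2,3})_"                     # operator initials
    r"v(?P<major>\d+)\.(?P<minor>\d+)"               # version, e.g. v1.0
    r"\.(?P<ext>[A-Za-z0-9]+)$"                      # file extension
)

def check_name(filename: str) -> bool:
    """Return True if the filename matches the lab convention."""
    return PATTERN.match(filename) is not None

print(check_name("ENZ01_Assay_20260425_JD_v1.0.xlsx"))  # True
print(check_name("results_final_revised2.xlsx"))         # False
```

Run it over a folder before archiving and nonconforming names surface immediately, while the operator still remembers what the file is.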
A few details matter more than people expect:
- Use YYYYMMDD: it sorts correctly.
- Avoid spaces: some systems handle them badly.
- Keep terms controlled: decide whether the folder says Microscopy or Imaging, not both.
- Reserve “final” for nothing: version numbers are clearer than emotional declarations.
Folders should reflect how work happens
Don’t build a folder tree that looks elegant but fights the daily workflow. Most small teams do well with something like this:
| Folder | What goes there |
|---|---|
| 01_RawData | instrument exports, original images, untouched source files |
| 02_ProcessedData | cleaned datasets, transformed outputs, normalized files |
| 03_Analysis | scripts, notebooks, statistical outputs, graphs |
| 04_Documentation | protocols, README, codebook, methods notes |
| 05_Reports | slide decks, PDFs, submission-ready summaries |
That structure separates source material from interpretation. It also reduces accidental overwrites because raw files aren’t mixed with active analysis files.
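Setting up that tree by hand for every project invites drift. Here is a minimal setup-script sketch, assuming the five-folder layout above; the project root name is illustrative.

```python
from pathlib import Path

# Create the standard folder skeleton for a new project (a sketch).
FOLDERS = [
    "01_RawData",
    "02_ProcessedData",
    "03_Analysis",
    "04_Documentation",
    "05_Reports",
]

def create_project(root: str) -> None:
    """Create the standard folder tree; existing folders are left alone."""
    for name in FOLDERS:
        Path(root, name).mkdir(parents=True, exist_ok=True)

create_project("ENZ01")  # hypothetical project ID
```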
Metadata is just the context your future self will wish you had written down.
Metadata without the pain
Scientists often hear “metadata” and picture a library catalog. In daily practice, metadata can be a simple README stored in the project root. It should answer:
- Who generated the data
- What the file contains
- Which instrument or method produced it
- Any abbreviations or codes used
- How raw data became processed data
For a biology project, that might include media conditions, cell line identifiers, instrument settings, and analysis script names. For chemistry, it might include batch identifiers, solvent system notes, and naming rules for spectra.
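One way to make the README habitual is to scaffold it at project creation. A minimal sketch, assuming the folder layout from earlier; the field list mirrors the questions above, and the values are left blank for the operator to fill in.

```python
from datetime import date
from pathlib import Path

# README scaffold for the project root (a sketch; fields mirror the list above).
README_TEMPLATE = """\
# README -- {project}
Created: {created}

Who generated the data:
What the files contain:
Instrument / method:
Abbreviations and codes:
How raw data became processed data:
"""

def write_readme(project_root: str) -> None:
    """Write a README skeleton; never overwrite an existing one."""
    path = Path(project_root, "04_Documentation", "README.md")
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(README_TEMPLATE.format(
            project=project_root, created=date.today().isoformat()
        ))

write_readme("ENZ01")  # hypothetical project ID
```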
What doesn’t work is relying on memory or putting all context only inside one person’s notebook. Files become findable with naming. They become understandable with metadata.
The Art of Contemporaneous Data Capture at the Bench
The weakest point in many data systems is still the moment data is created. Not stored. Not archived. Created.
Wet-lab scientists know the improvisations: write on a glove, jot on a Kimwipe box, snap a photo of the instrument screen, promise to transfer notes after the run. Those workarounds feel harmless because they’re common. They’re also where detail disappears.

High-level data stewardship guidance often focuses on harmonization and reuse later in the lifecycle. The bench problem is different. A review in Frontiers notes a critical gap in guidance for real-time data capture at the point of experimentation, leaving researchers without practical ways to standardize data during experiments rather than after them, as discussed in this analysis of bench-level FAIR infrastructure gaps.
Why delayed documentation keeps failing
Memory edits events. It smooths timings, drops anomalies, and compresses repeated steps into something cleaner than what actually happened. That may make notes easier to write, but it makes records weaker.
In regulated work, delayed transcription creates another problem. The lab may still have a record, but not necessarily a contemporaneous one. That distinction matters whenever someone asks whether the entry reflects what happened at the time, or what someone reconstructed later.
A lot of teams think the solution is “be more disciplined with the ELN.” That’s only partly true. Standard ELNs are useful, but many still assume the user can stop, type, move around, and format while the experiment is active. In real bench work, that assumption often fails.
What practical contemporaneous capture looks like
A workable system has to fit physical reality. The scientist might have one free hand. They may be moving between incubator, hood, centrifuge, and instrument. The note-taking method has to be fast enough that using it doesn’t interrupt the work itself.
That usually means:
- Capture at the moment of observation: not at lunch, not later in the day.
- Preserve timestamps automatically: don’t rely on manual time entry unless you have to.
- Separate note sections clearly: objective, materials, procedure, observations, and results should not blur together.
- Allow quick correction before finalization: raw capture is useful, but the finalized record still needs review.
If the documentation method requires the experiment to pause, people won’t use it consistently.
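Even without a dedicated tool, an append-only log with automatic timestamps beats retyping from memory. A minimal sketch, assuming a plain-text log in the project's documentation folder; the path and entry format are illustrative.

```python
from datetime import datetime
from pathlib import Path

# Append-only bench log with automatic timestamps (a sketch).
LOG = Path("ENZ01/04_Documentation/bench_log.txt")  # hypothetical location

def note(observation: str) -> None:
    """Append one timestamped observation; earlier entries are never edited."""
    LOG.parent.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().isoformat(timespec="seconds")
    with LOG.open("a") as f:
        f.write(f"{stamp}\t{observation}\n")

note("Plate 24A into incubator, timer started")
note("Well B7 sat ~4 min before wash -- flagged for review")
```

The timestamp comes from the system clock at the moment of entry, which is the point: nobody reconstructs times later.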
Teams should evaluate tools based on workflow fit, not feature count. For GxP-sensitive work, I’d also weigh whether the tool keeps data local, whether it produces timestamped records, and whether the output can be archived in a stable format. Labs thinking through those requirements should review practical GxP documentation requirements for scientific records.
What to stop doing immediately
I’d retire three habits first.
- Backfilling from memory: acceptable only for clearly labeled summaries, not source observations.
- Using disposable scraps as primary capture: paper towels and glove notes are reminders, not durable records.
- Merging raw observation with interpretation: write what you saw first, then your conclusion.
That small shift changes the quality of the whole record. Once bench capture is reliable, the rest of managing scientific data becomes much easier because the core facts were preserved at the right time.
Implementing Secure Storage Versioning and Access Control
After capture, the next question is where the record should live. Most labs don’t have one perfect answer because they’re balancing competing needs: fast access, restricted access, backup reliability, low administrative overhead, and collaboration.
The practical choice usually sits between three options: local storage, institutional servers, and approved cloud storage. Each can work. Each can also create problems if the lab uses it for the wrong purpose.
A simple decision framework
| Storage option | Best for | Main risk | Good practice |
|---|---|---|---|
| Local computer or instrument PC | immediate acquisition and short-term active work | hardware failure, weak backup discipline | treat as temporary working storage only |
| Institutional server or managed network drive | controlled access, longer-term project storage | slower setup, dependency on institutional processes | use as the main team record when available |
| Approved cloud folder | distributed access and easy sharing | policy conflicts for sensitive work, oversharing | use only when allowed and with clear folder permissions |
If a project involves sensitive IP, regulated workflows, or restricted policies, local processing plus controlled storage often makes more sense than sending files through a broad-sharing cloud workflow. For teams evaluating that balance, this overview of data security and compliance in lab documentation is a useful checklist.
Versioning without making it complicated
Many scientists hear “version control” and assume they need Git. Most don’t. For routine lab work, versioning can be much simpler.
Use these rules:
- Never overwrite raw data
- Create a new version when you clean, transform, or annotate
- Make milestone files read-only after review
- Keep a changelog in the README for major updates
A straightforward sequence looks like this:
- ENZ01_Assay_20260425_JD_v1.0.xlsx for the first processed dataset
- ENZ01_Assay_20260425_JD_v1.1.xlsx after fixing a labeling issue
- ENZ01_Assay_20260425_JD_v2.0.xlsx after a major restructuring
That beats final, final2, and final_use_this_one.
A version number should tell you whether the file changed a little or changed meaningfully.
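Two of those rules are easy to support with a small helper: bumping the version tag when you create a derived copy, and making milestone files read-only after review. A minimal sketch, assuming the naming scheme from earlier.

```python
import re
import stat
from pathlib import Path

# Version-bump and freeze helpers for the naming scheme above (a sketch).

def next_version(filename: str, major: bool = False) -> str:
    """Return the filename with its version bumped (v1.1 -> v1.2, or v2.0)."""
    m = re.search(r"v(\d+)\.(\d+)", filename)
    if m is None:
        raise ValueError("no version tag found")
    maj, minor = int(m.group(1)), int(m.group(2))
    new = f"v{maj + 1}.0" if major else f"v{maj}.{minor + 1}"
    return filename[:m.start()] + new + filename[m.end():]

def freeze(path: str) -> None:
    """Make a reviewed milestone file read-only for everyone."""
    Path(path).chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

print(next_version("ENZ01_Assay_20260425_JD_v1.0.xlsx"))              # ...v1.1.xlsx
print(next_version("ENZ01_Assay_20260425_JD_v1.1.xlsx", major=True))  # ...v2.0.xlsx
```

The minor/major split mirrors the rule above: small fix, minor bump; meaningful restructuring, major bump.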
Access control should be boring
Good access control is intentionally unremarkable. People should know where to work, where to read, and where not to edit.
For a small team, I’d use three permission levels:
Capture or contribute
- people actively generating notes or raw files
Review and edit processed material
- analysts, senior researchers, or project leads
Read-only archive access
- broader team members, QA, or management
What doesn’t work is a shared drive where everyone can edit everything. That feels collaborative until somebody renames a source folder, replaces a file, or “cleans up” a directory structure that another person depends on.
A decent storage system isn’t the fanciest one. It’s the one that preserves originals, records changes clearly, and limits accidental damage.
Ensuring Longevity with Backup Retention and Audit-Readiness
Most labs think about backup after something goes wrong. The better time is before the first important file is generated.
The core rule is still the 3-2-1 backup strategy described in research data management guidance: keep 3 copies, on 2 media types, with 1 offsite. UC San Diego’s best-practices page pairs that with quality controls like activity logs and calibration records, which is exactly how backup becomes part of data integrity rather than just disaster recovery. Managing scientific data well means treating backup, retention, and audit readiness as one system, not three separate chores.

What 3-2-1 looks like in a lab
In a wet-lab setting, a simple implementation might be:
Copy 1
- active working data on the acquisition machine or current project folder
Copy 2
- synchronized copy on a managed server or team storage location
Copy 3
- protected offsite backup managed by the institution or approved provider
The point isn’t to create three random duplicates. Each copy serves a different failure scenario. One protects against accidental deletion, another against local hardware loss, and another against a site-level incident.
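Copies only count if they match. Here is a minimal sketch of a verification pass that compares SHA-256 checksums between the working copy and one backup location; both paths are illustrative assumptions.

```python
import hashlib
from pathlib import Path

# Compare working files against a backup copy by checksum (a sketch).

def sha256(path: Path) -> str:
    """Hash a file in chunks so large instrument exports don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(working_dir: str, backup_dir: str) -> None:
    """Report files that are missing from, or differ in, the backup."""
    for src in Path(working_dir).rglob("*"):
        if not src.is_file():
            continue
        dst = Path(backup_dir) / src.relative_to(working_dir)
        if not dst.exists():
            print(f"MISSING  {dst}")
        elif sha256(src) != sha256(dst):
            print(f"DIFFERS  {dst}")

verify("ENZ01", "/mnt/team_backup/ENZ01")  # hypothetical paths
```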
Retention is the next piece. Labs should define when active files move into archive status, which versions must be preserved, and what supporting documents stay attached. Raw data without methods notes, version history, or context isn’t really preserved in a useful way.
Audit-ready records come from daily habits
Audit readiness is often framed as a special preparation phase. In reality, auditors usually expose routine weaknesses that have been there all along: missing timestamps, unclear authorship, overwritten source files, unexplained changes, and records that don’t line up with the actual sequence of events.
That’s why ALCOA+ is still a useful practical lens:
- Attributable: you can tell who created the entry or file.
- Legible: another person can read and understand it.
- Contemporaneous: the record was made when the event happened.
- Original: the source record is preserved.
- Accurate: the record reflects what happened.
The “plus” principles matter just as much in daily work:
- Complete
- Consistent
- Enduring
- Available
A clean archive can’t rescue a weak source record. Audit readiness starts at capture, not at inspection time.
Why this matters beyond compliance
The FAIR principles became a global standard in part because science has a reproducibility problem. A 2016 Nature survey found over 70% of researchers failed to reproduce others’ experiments, and poor research data management contributes to an estimated $28 billion annually in wasted U.S. preclinical research funds, as summarized in this review of FAIR, reproducibility, and scientific data stewardship.
Those numbers usually get discussed at the level of institutions or the literature. In a lab, the same issue shows up in smaller but familiar ways:
- a result can’t be traced to the exact raw file
- a time-sensitive step wasn’t documented clearly
- a processed dataset survives, but the original export is missing
- a project handoff fails because the folder makes sense only to one person
A practical audit-readiness checklist
Use this as a routine review, not just before an inspection.
Source records preserved
- Raw files remain untouched and identifiable.
Timestamps present
- Entries and timed events reflect when work occurred.
Authorship clear
- Notes and file ownership are attributable.
Naming consistent
- Files follow the lab’s agreed format.
Versions controlled
- Processed outputs show revision history.
Context attached
- README, methods notes, or metadata explain what the files mean.
Backups verified
- The team knows not just that backups exist, but that recovery works.
Many labs overcomplicate this. You don’t need an expensive enterprise system to become more defensible. You need regular habits that make records durable, understandable, and easy to retrieve.
A Sample Workflow from Bench to Archive
At 5:40 p.m., the run is done, the plate reader has exported six files with useless default names, and someone asks whether well B7 was the sample that sat an extra four minutes. A workable data system has to hold up in that moment.
Here is a bench-level workflow that small labs can run without buying a full informatics stack. The difference is in the handoffs. Most data problems show up there, not in the folder structure itself.
Before the experiment
Set up the run packet before gloves go on. That packet can be a project folder plus a one-page run sheet, paper or digital, with the fields that are easy to forget under time pressure: operator, instrument ID, reagent lot numbers, sample map, planned start time, acceptance criteria, and where the raw export is expected to land.
Pre-label the run with a unique run ID before any data exists. Put that ID on the bench sheet, in the notebook entry title, and in the temporary holding folder if the instrument writes to a local PC. If the assay tends to generate multiple exports, note that upfront. For example: raw image set, instrument method file, CSV export, and analysis output. Teams often preserve the CSV and forget the method file that explains how the instrument produced it.
For assays with busy hands, decide the capture method in advance. Voice notes, a scribe, a paper timing sheet, or a phone parked outside the splash zone all work if the team agrees on one method and uses it consistently.
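The run packet itself is easy to script. A minimal sketch that generates a unique run ID and writes the run-sheet skeleton before any data exists; the field names mirror the packet described above, and the layout assumes the numbered-folder structure from earlier.

```python
from datetime import datetime
from pathlib import Path

# Pre-label a run: unique run ID plus a run-sheet skeleton (a sketch).

def open_run(project: str, operator: str) -> str:
    """Create the run folder and run sheet before any data exists."""
    run_id = f"{project}_RUN_{datetime.now():%Y%m%d_%H%M}"
    folder = Path(project, "01_RawData", run_id)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "run_sheet.txt").write_text(
        f"Run ID: {run_id}\n"
        f"Operator: {operator}\n"
        "Instrument ID:\n"
        "Reagent lots:\n"
        "Sample map:\n"
        "Planned start:\n"
        "Acceptance criteria:\n"
        "Expected raw exports:\n"
    )
    return run_id

print(open_run("ENZ01", "JD"))  # hypothetical project and operator
```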
During the experiment
Record events that change interpretation, not just steps that went as planned. Actual start and stop times matter. So do reagent swaps, bubbles in a key well, a centrifuge restart, a clogged tip, or a pause because the instrument queue backed up.
Use short entries tied to event time. Long narrative summaries written an hour later tend to clean up the messy parts that matter most.
A useful pattern is:
- Time
- What happened
- What was affected
- Immediate decision
Example:
- 10:14
- Wash step extended because line pressure dropped
- Plate 24A, all wells
- Incubation adjusted by 2 minutes and deviation flagged for review
That format helps later when someone is checking whether the deviation was cosmetic or whether it invalidated part of the run.
After the run
Do a first-pass triage before anyone starts analysis. Confirm which files are raw source files, which files are machine-generated reports, and which files are analyst outputs. If the instrument produces files with cryptic names, create an index note in the project folder that maps the original filenames to the run ID and sample set. Renaming copies is fine if your lab allows it. Keep the untouched original export as the source record.
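The index note can be generated rather than typed. A minimal sketch that maps each raw export to the run ID and sample set without renaming anything; paths are illustrative.

```python
import csv
from pathlib import Path

# Build an index mapping cryptic instrument filenames to the run ID (a sketch).
# Originals are never renamed; the index travels with them.

def index_exports(raw_folder: str, run_id: str, sample_set: str) -> None:
    """Append one index row per raw export in the folder."""
    folder = Path(raw_folder)
    index = folder / "file_index.csv"
    new = not index.exists()
    with index.open("a", newline="") as f:
        writer = csv.writer(f)
        if new:
            writer.writerow(["original_filename", "run_id", "sample_set"])
        for export in sorted(folder.glob("*")):
            if export.is_file() and export.name not in {"file_index.csv", "run_sheet.txt"}:
                writer.writerow([export.name, run_id, sample_set])

index_exports("ENZ01/01_RawData/ENZ01_RUN_20260425_1740",
              "ENZ01_RUN_20260425_1740", "Plate 24A")  # hypothetical run
```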
Package context with the data while the run is still fresh. Add the sample map, method version used, instrument settings if they are not embedded in the raw file, and any deviation note that would matter to a reviewer six months from now. A short README is enough if it answers two questions clearly: what is in this folder, and what would surprise another scientist about this run?
If part of the run failed, split the package deliberately. Mark the affected subset as invalid or excluded, state why, and keep it with the rest of the run rather than deleting it. Deleting failed data creates bigger problems than keeping it with a clear explanation.
Common failure points and how to handle them
The instrument PC is offline
Use a temporary transfer log. Record the export time, local file path, operator initials, and the time the files were copied to approved storage. If network sync happens later, note that too. Instrument PCs are a common weak point because they fall outside normal backup routines.
The instrument software overwrites the previous export
Create a manual hold folder for each run before export starts. If the software cannot change naming behavior, the operator has to change the destination every time. This is tedious and still cheaper than reconstructing a lost run.
Half the run is invalid because of a deviation
Do not relabel the whole experiment as failed. Mark exactly which samples, wells, lanes, or timepoints were affected, and why. Keep unaffected data active if the science supports that decision. The record should show the boundary of the problem.
A trainee processed the data before raw files were moved
Stop and recover the raw export first. If recovery is not possible, document that gap plainly in the run record and treat the processed file as weaker evidence. This comes up more often than teams admit.
Two people touched the same dataset
Assign one person to finalize the archive package. Shared editing is where version confusion starts. Review can be shared. Final packaging should not be.
End of day closeout
Use a closeout that tests retrieval, not just presence.
- Open at least one raw file from the approved storage location
- Confirm the run ID matches the notebook entry and sample map
- Check that any excluded data is labeled, not deleted
- Confirm the method file or settings record is present
- Save the final summary or signed notebook page in the archive package
- Record who completed closeout and when
That last line matters. In practice, unowned closeout tasks drift to the next day and then to the end of the week.
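A closeout script can force the retrieval test instead of trusting a glance at the folder. A minimal sketch, assuming the run-folder layout used above; note that it actually opens a raw file rather than just checking that one exists.

```python
from datetime import datetime
from pathlib import Path

# Closeout that tests retrieval, not just presence (a sketch).

def closeout(run_folder: str, operator: str) -> None:
    """Run the end-of-day checks and record who completed them, and when."""
    folder = Path(run_folder)
    checks = {
        "raw file opens": False,
        "run sheet present": (folder / "run_sheet.txt").exists(),
        "file index present": (folder / "file_index.csv").exists(),
    }
    raw = [p for p in folder.glob("*")
           if p.suffix in {".csv", ".tif", ".xlsx"} and p.name != "file_index.csv"]
    if raw:
        with raw[0].open("rb") as f:  # actually open one raw file
            checks["raw file opens"] = bool(f.read(1))
    stamp = datetime.now().isoformat(timespec="seconds")
    with (folder / "closeout_log.txt").open("a") as f:
        f.write(f"{stamp}\t{operator}\t{checks}\n")
    print(checks)

closeout("ENZ01/01_RawData/ENZ01_RUN_20260425_1740", "JD")  # hypothetical run
```

The log line at the end is the ownership record: closeout has a name and a time attached, so it can't quietly drift to next week.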
A defensible archive package lets another scientist answer three questions without chasing the original operator: what happened, what files are primary, and what should be treated with caution.
If your biggest documentation gap happens at the bench, Verbex is worth a look. It’s a voice-first lab notebook app for iPhone that lets scientists capture observations by voice during active experiments, structure them into ELN-style sections, preserve timestamps, document timer events, review the transcript, and export a clean PDF. Everything runs on-device, so no data leaves the phone. That makes it a practical fit for labs that need contemporaneous records without adding cloud exposure or more typing to an already busy workflow.