Traverse a Pooled and Demultiplexed Sample History/Genealogy

The large capacity of current Next Generation Sequencing (NGS) instruments means that labs are able to perform multiplexed experiments with multiple samples pooled into a single lane or region of the container. Before being pooled, samples are assigned a unique tag or index. After sequencing and initial analysis are complete, the sequencing results must be demultiplexed to separate data and relate the results back to each individual sample.

Clarity LIMS allows you to track a multiplexing workflow by adding reagents and reagent labels to artifacts, and then using the reagent labels to demultiplex the resulting files.

There are several ways to apply reagent labels. However, all methods involve creating placeholders that link the final sequences back to the original submitted samples. Either the lab scientist or an automated process must determine which file actually belongs with which placeholder. For more information on applying reagent labels, refer to Work with Multiplexing.

This example assigns user-defined field (UDF)/custom field values to the demultiplexed output files based on the UDF/custom field values of upstream derived samples (analytes). To do so, it traverses the sample history/genealogy upward by following the assigned reagent labels, rather than relying strictly on process input-output mappings.

As of Clarity LIMS v5, the term user-defined field (UDF) has been replaced with custom field in the user interface. However, the API resource is still called UDF.

There are two types of custom fields:

- Master step fields: Configured on master steps. Master step fields apply only to the following:
  - The master step on which the fields are configured.
  - The steps derived from that master step.
- Global fields: Configured on entities (for example, submitted sample, derived sample, or measurement). Global fields apply to the entire Clarity LIMS system.

Prerequisites

If you are using Clarity LIMS v5 or later, make sure that the following conditions are met:

- You have created a project and added multiple samples to it.
- You have run the samples through a sequence of steps that perform the following:
  - Reagent addition / reagent label assignment
  - Pooling
  - Demultiplexing (to produce a set of per-reagent-label result file outputs)
- You have set a Numeric custom field value on each derived sample input to the reagent addition process.
- A Numeric custom field with no assigned value exists on each of the per-reagent-label result file outputs. The value of this field is computed from the upstream derived sample custom field values that correspond to the reagent label of the result file.

Also make sure that API v2 r21 or later is installed.

Code Example

Due to the complexity of NGS workflows, beginning at the top-level submitted sample resource and working down to the result file is not the most efficient way to traverse the sample history/genealogy. It is easier to start with the result file artifact and trace upward to find the process with the UDFs/custom fields that you are looking for.

Starting from the per-reagent-label result file, you can traverse upward in the sample history using the parent process URI in the XML returned for each artifact. At each level of the sample history, the number of artifacts returned may increase due to processes that pooled individual artifacts.
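As a minimal sketch of what that lookup involves, the snippet below parses a hand-written artifact document and reads the two things the traversal relies on: the reagent label names and the parent process URI. The XML is a trimmed, hypothetical example (the host name, LIMS IDs, and label name are made up), and the import assumes Groovy 3 or later; on older versions XmlParser is available without it.

```groovy
import groovy.xml.XmlParser

// Hypothetical, trimmed artifact XML; a real response contains more elements.
def artifactXml = '''
<art:artifact xmlns:art="http://genologics.com/ri/artifact" limsid="92-1234">
  <name>Demultiplexed result</name>
  <type>ResultFile</type>
  <parent-process uri="https://lims.example.com/api/v2/processes/24-100" limsid="24-100"/>
  <reagent-label name="Index 1 (ATCACG)"/>
</art:artifact>
'''

def artifactNode = new XmlParser().parseText(artifactXml)

// Each reagent-label child carries the label in its 'name' attribute.
def labelNames = artifactNode.'reagent-label'.collect { it.'@name' }

// A root artifact has no parent-process element, so use safe navigation.
def parentProcessUri = artifactNode.'parent-process'[0]?.'@uri'

println "labels=${labelNames}, parent=${parentProcessUri}"
```

An artifact produced by pooling would list several reagent-label elements, which is why the traversal compares label names instead of assuming a single label.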

In this example:

- The upstreamArtifactLUIDs list represents the current set of relevant artifacts.
- The foundUpstreamArtifactNodes list stores the target upstream artifact nodes found.
- The sample history traversal stops at the inputs to the process that performed the reagent addition/reagent label assignment.

// Excerpt from inside the loop over the artifact LUIDs being processed (hence the 'continue')
targetDownstreamArtifactNode = GLSRestApiUtils.httpGET(artifactsListURI + artifactLUID, username, password)
targetReagentLabel = targetDownstreamArtifactNode.'reagent-label'[0]?.'@name'
if (!targetReagentLabel) {
    println "Specified artifact should contain at least one reagent-label.  Skipping ${artifactLUID}..."
    continue
}
// At each upstream level of the workflow the number of searched artifacts may increase due to Pooling processes
upstreamArtifactLUIDs = [ artifactLUID ]
/*
 * This 'stack' will store all upstream artifacts that serve as input to an 'Add Multiple Reagents' process
 * which are subsequently assigned the 'target' Reagent Label by this process.
 */
foundUpstreamArtifactNodes = []

The traversal is executed using a while loop over the contents of the upstreamArtifactLUIDs list.

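The list serves as a stack of artifacts. With each iteration of the loop, an artifact is popped off the list and the relevant input artifacts to its parent process are pushed onto it. Which end of the list pop() and push() use depends on the Groovy version (the end before Groovy 2.5, the front from 2.5 onward), but in both cases the order is last in, first out.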

while (!upstreamArtifactLUIDs.isEmpty()) {
    currentArtifactLUID = upstreamArtifactLUIDs.pop()
    currentArtifactNode = GLSRestApiUtils.httpGET(artifactsListURI + currentArtifactLUID, username, password)
    /*
     * Upstream traversal will stop when either an artifact is found that does not have reagent label(s) assigned
     * (i.e. the artifact is the input to a process that adds reagents and reagent labels), or a root artifact is found.
     * At this point, the current artifact is added to the list of 'found' upstream artifact nodes.
     */
    if (currentArtifactNode.'reagent-label'.isEmpty() || currentArtifactNode.'parent-process'.isEmpty()) {
        foundUpstreamArtifactNodes += [currentArtifactNode]
    } else if (currentArtifactNode.'reagent-label'.collect { it.'@name' }.contains(targetReagentLabel)) {
        /*
         * If the current artifact contains the 'target' reagent label, continue traversing upstream.
         * Get the artifact's parent process
         */
        parentProcessURI = currentArtifactNode.'parent-process'[0].@uri
        parentProcessNode = GLSRestApiUtils.httpGET(parentProcessURI, username, password)
        // Find all input-output maps for the parent process
        parentProcessNode.'input-output-map'.each {
            ioMapInputLUID = it.'input'[0].@limsid
            ioMapOutputLUID = it.'output'[0].@limsid
            // Push all process input artifacts that have the current artifact as the mapped process output onto the 'stack'
            if (ioMapOutputLUID == currentArtifactLUID && !upstreamArtifactLUIDs.contains(ioMapInputLUID)) {
                upstreamArtifactLUIDs.push(ioMapInputLUID)
            }
        }
    }
}
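To see the stack behavior in isolation, here is a minimal sketch that runs the same kind of traversal over an in-memory map instead of the REST API. The artifacts map, its LIMS IDs, and the findLabelSources helper are hypothetical stand-ins: each entry lists an artifact's reagent labels and the inputs that its parent process maps to it. An ArrayDeque is used so that push and pop are unambiguously last in, first out.

```groovy
// Hypothetical sample history: two labeled analytes are pooled, then demultiplexed.
def artifacts = [
    'A1'  : [labels: [], inputs: []],                            // input to reagent addition
    'A2'  : [labels: [], inputs: []],
    'L1'  : [labels: ['Index 1'], inputs: ['A1']],               // A1 after labeling
    'L2'  : [labels: ['Index 2'], inputs: ['A2']],
    'POOL': [labels: ['Index 1', 'Index 2'], inputs: ['L1', 'L2']],
    'RF1' : [labels: ['Index 1'], inputs: ['POOL']],             // per-label result file
]

def findLabelSources(Map artifacts, String startId) {
    def targetLabel = artifacts[startId].labels[0]
    def stack = new ArrayDeque<String>([startId])
    def found = []
    while (!stack.isEmpty()) {
        def id = stack.pop()
        def artifact = artifacts[id]
        if (artifact.labels.isEmpty() || artifact.inputs.isEmpty()) {
            // Unlabeled or root artifact: the traversal stops here.
            found << id
        } else if (artifact.labels.contains(targetLabel)) {
            // Still carries the target label: keep walking upstream.
            artifact.inputs.each { if (!stack.contains(it)) stack.push(it) }
        }
        // Artifacts that carry only other labels (L2 here) are dropped.
    }
    return found
}

println findLabelSources(artifacts, 'RF1')
```

In this mock history, the pool fans out to both labeled analytes, but only the branch carrying the target label is followed, which is the difference from a traversal based purely on input-output mappings.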

After the loop completes, the foundUpstreamArtifactNodes list contains every artifact that is assigned the target reagent label by the next process downstream in its sample history (the reagent addition step).

The final step in the script assigns a value to a Numeric UDF/custom field, Mean DNA Prep 260:280 Ratio, on the per-reagent-label output result file. The value is the mean of a Numeric UDF/custom field, DNA prep 260:280 ratio, across the foundUpstreamArtifactNodes.

First, compute the mean using the following example:

/*
 * Compute the 'Mean DNA Prep 260:280 Ratio' for all upstream analyte artifacts that have the
 * target Reagent Label applied.  The assumption here is that the 'DNA prep 260:280 ratio' UDF
 * is set on analytes that serve as input to an 'Add Multiple Reagents' process that assigns Reagent Labels.
 */
avgAcrossUpstreamArtifacts = foundUpstreamArtifactNodes.collect {
    foundUdf = it.'udf:field'.find{ it.'@name' == upstreamArtifactUdfToMine }
    return foundUdf ? foundUdf.value()[0] as double : 0.0
}.sum()/foundUpstreamArtifactNodes.size()
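Note that the snippet above counts a missing field as 0.0 and divides by the number of artifacts found, so a missing value lowers the mean and an empty list makes the calculation fail. One possible variation, sketched here with plain maps standing in for the artifact nodes (the meanOfPresent helper and the sample values are hypothetical), averages only the values that are present and returns null when there are none.

```groovy
// Hypothetical helper: mean of the numeric values that are actually set.
def meanOfPresent(List<Map> artifacts, String fieldName) {
    def values = artifacts.collect { it[fieldName] }.findAll { it != null }.collect { it as double }
    // An empty list is falsy in Groovy, so this also guards against dividing by zero.
    return values ? values.sum() / values.size() : null
}

// Plain maps stand in for upstream artifact nodes; the third has no value set.
def upstream = [
    ['DNA prep 260:280 ratio': '1.8'],
    ['DNA prep 260:280 ratio': '2.0'],
    [:],
]

println meanOfPresent(upstream, 'DNA prep 260:280 ratio')
```

Whether a missing value should be skipped, counted as zero, or treated as an error depends on the lab's process, so this is a design choice rather than a correction.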

Then, set the UDF/custom field on the per-reagent-label output result file using the following example:

// Set the computed mean on the 'Mean DNA Prep 260:280 Ratio' UDF on the target downstream ResultFile
targetDownstreamArtifactUDF = targetDownstreamArtifactNode.'udf:field'.find{ it.'@name' == downstreamArtifactUdfToUpdate }
if (targetDownstreamArtifactUDF) {
    targetDownstreamArtifactUDF.setValue(avgAcrossUpstreamArtifacts)
} else {
    targetDownstreamArtifactNode.appendNode('udf:field',
                    ['name':downstreamArtifactUdfToUpdate,
                     'xmlns:udf':'http://genologics.com/ri/userdefined'],
                     avgAcrossUpstreamArtifacts)
}
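The set-or-append pattern can be tried without a server. The following sketch applies it to a hypothetical, trimmed result file document and prints the resulting XML. The parser is created with namespace awareness turned off so that 'udf:field' is treated as a plain element name; that choice, the sample XML, and the values used are assumptions made for illustration, and the imports assume Groovy 3 or later. Saving the change back to Clarity LIMS would still require an HTTP PUT, which is not shown.

```groovy
import groovy.xml.XmlParser
import groovy.xml.XmlUtil

def fieldName = 'Mean DNA Prep 260:280 Ratio'

// Hypothetical result file XML that does not yet carry the UDF.
def xml = '''
<art:artifact xmlns:art="http://genologics.com/ri/artifact"
              xmlns:udf="http://genologics.com/ri/userdefined" limsid="92-1234">
  <type>ResultFile</type>
</art:artifact>
'''

// Arguments: validating = false, namespaceAware = false
def node = new XmlParser(false, false).parseText(xml)

def setOrAppendUdf = { target, name, value ->
    def existing = target.'udf:field'.find { it.'@name' == name }
    if (existing) {
        // Field already present: overwrite its text value.
        existing.setValue(value.toString())
    } else {
        // Field absent: add a new child element carrying the value.
        target.appendNode('udf:field', [name: name], value.toString())
    }
}

setOrAppendUdf(node, fieldName, 1.9d)   // first call appends the field
setOrAppendUdf(node, fieldName, 2.0d)   // second call updates it in place

println XmlUtil.serialize(node)
```

Calling the helper twice shows both branches: the document ends up with a single udf:field element holding the most recent value.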

Attachments

TraversingPooledDemuxGenealogy.groovy (5 KB)

