Clarity LIMS
Illumina Connected Software
Clarity LIMS Software
Clarity LIMS Software
  • Announcements
  • Clarity LIMS
    • Clarity & LabLink
  • API and Database
    • API Portal
      • REST
        • REST General Concepts
        • REST Web Services
        • HTTP Response Codes and Errors
        • XML UTF-8 Character Encoding
        • Requesting API Version Information
        • Viewing Paginated List Resources
        • Filtering List Resources
        • Working with User-Defined Fields (UDF) and Types (UDT)
        • Traversing a Genealogy
        • Working with Batch Resources
      • Getting Started with API
        • Understanding API Terminology (LIMS v5 and later)
        • API-Based URIs (LIMS v4 and later)
        • Development Prerequisites
        • Structure of REST Resources
        • The Life Cycle of a Sample: Stages Versus Steps
        • Integrating Scripts
      • Automation
        • Automation Triggers and Command Line Calls
        • Automation Execution Environment
        • Supported Command Line Interpreters
        • Automation Channels
        • Error Handling
        • Automation Tokens
          • Derived Sample Automation Tokens
          • Step Automation Tokens
          • Project Automation Tokens
        • Automation Testing
        • Troubleshooting Automation
      • Tips and Tricks
        • Accessing Step UDFs from a different Step
        • Obfuscating Sensitive Data in Scripts
        • Integrating Clarity LIMS with Upstream Sample Accessioning Systems
        • Creating Samples and Projects via the API
        • Displaying Files From an Earlier Step
        • Transitioning Output Artifacts into the Next Step
        • Determining the Workflow(s) to Which a Sample is Assigned
        • Standardizing Sample Naming via the API
        • Copying UDF Values from Source to Destination
        • Updating Preset Value of a Step UDF through API
        • Automating BCL Conversion
        • Finding QC Flags in Aggregate QC (Library Validation) via REST API
        • Setting the Value of a QC Flag on an Artifact
        • Creating Notifications When Files are Added via LabLink
        • Remote HTTP Filestore Setup
      • Cookbook
        • Get Started with the Cookbook
          • Tips and Troubleshooting
          • Obtain and Use the REST API Utility Classes
        • Work with EPP/Automation and Files
          • Automation Trigger Configuration
          • Process Execution with EPP/Automation Support
        • Work with Submitted Samples
          • Adding Samples to the System
          • Renaming Samples
          • Assigning Samples to Workflows
          • Updating Sample Information
          • Show the Relationship Between Samples and Analyte Artifacts (Derived Samples)
        • Work with Containers
          • Add an Empty Container to the System
          • Find the Contents of a Well Location in a Container
          • Filter Containers by Name
        • Work with Derived Sample Automations
          • Remove Samples from Workflows
          • Requeue Samples
          • Rearray Samples
        • Work with Process/Step Outputs
          • Update UDF/Custom Field Values for a Derived Sample Output
          • Rename Derived Samples Using the API
          • Find the Container Location of a Derived Sample
          • Traverse a Pooled and Demultiplexed Sample History/Genealogy
          • View the Inputs and Outputs of a Process/Step
        • Work with Projects and Accounts
          • Remove Information from a Project
          • Add a New Project to the System with UDF/Custom Field Value
          • Get a Project Name
          • Find an Account Registered in the System
          • Update Contact (User and Client) Information
        • Work with Multiplexing
          • Find the Index Sequence for a Reagent Label
          • Demultiplexing
          • Pool Samples with Reagent Labels
          • Apply Reagent Labels with REST
          • Apply Reagent Labels When Samples are Imported
          • Apply Reagent Labels by Adding Reagents to Samples
        • Working with User Defined Fields/Custom Fields
          • About UDFs/Custom Fields and UDTs
          • Performing Post-Step Calculations with Custom Fields/UDFs
        • Work with Processes/Steps
          • Filter Processes by Date and Type
          • Find Terminal Processes/Steps
          • Run a Process/Step
          • Update UDF/Custom Field Information for a Process/Step
          • Work with the Steps Pooling Endpoint
        • Work with Batch Resources
          • Introduction to Batch Resources
          • Update UDF/Custom Field Information with Batch Operations
          • Retrieve Multiple Entities with a Single API Interaction
          • Select the Optimal Batch Size
        • Work with Files
          • Attach a File with REST and Python
          • Attach Files Located Outside the Default File Storage Repository
          • Attach a File to a File Placeholder with REST
        • Work with Controls
          • Automated Removal of Controls from a Workflow
      • Application Examples
        • Python API Library (glsapiutil.py) Location
        • Scripts That Help Automate Steps
          • Route Artifacts Based Off a Template File
          • Invoking bcl2fastq from BCL Conversion and Demultiplexing Step
          • Email Notifications
          • Finishing the Current Step and Starting the Next
          • Adding Downstream Samples to Additional Workflows
          • Advancing/Completing a Protocol Step via the API
          • Setting a Default Next Action
          • Automatic Placement of Samples Based on Input Plate Map (Multiple Plates)
          • Automatic Placement of Samples Based on Input Plate Map
          • Publishing Files to LabLink
          • Automatic Pooling Based on a Sample UDF/Custom Field
          • Completing a Step Programmatically
          • Automatic Sample Placement into Existing Containers
          • Routing Output Artifacts to Specific Workflows/Stages
          • Creating Multiple Containers / Types for Placement
          • Starting a Protocol Step via the API
          • Setting Quality Control Flags
          • Applying Indexing Patterns to Containers Automatically
          • Assignment of Sample Next Steps Based On a UDF
          • Parsing Metadata into UDFs (BCL Conversion and Demultiplexing)
        • Scripts That Validate Step Contents
          • Validating Process/Step Level UDFs
          • Checking That Containers Are Named Appropriately
          • Checking for Index Clashes Based on Index Sequence
          • Validating Illumina TruSeq Index Adapter Combinations
        • Scripts Triggered Outside of Workflows/Steps
          • Repurposing a Process to Upload Indexes
          • Adding Users in Bulk
          • Moving Reagent Kits & Lots to New Clarity LIMS Server
          • Programatically Importing the Sample Submission Excel File
          • Generating an MS Excel Sample Submission Spreadsheet
          • Assigning Samples to New Workflows
        • Miscellaneous Scripts
          • Illumina LIMS Integration
          • Generating a Hierarchical Sample History
          • Protocol-based Permissions
          • Self-Incremental Counters
          • Generic CSV Parser Template (Python)
          • Renaming Samples to Add an Internal ID
          • Creating Custom Sample Sheets
          • Copying Output UDFs to Submitted Samples
          • Parsing Sequencing Meta-Data into Clarity LIMS
          • Submit to a Compute Cluster via PBS
          • Downloading a File and PDF Image Extraction
        • Resources and References
          • Understanding LIMS ID Prefixes
          • Container States
          • Useful Tools
          • Unsupported Artifact Types
          • Unsupported Process Types
          • Suggested Reading
          • API Training Videos
  • Illumina Preset Protocols
    • IPP v2.10
      • Release Notes
      • Installation and User Configuration
      • Manual Upgrade
    • IPP v2.9
      • Release Notes
      • Installation and User Configuration
    • IPP v2.8
      • Release Notes
      • Installation and User Configuration
      • Manual Upgrade
    • IPP v2.7
      • Release Notes
      • Installation and User Configuration
    • IPP v2.6
      • Release Notes
      • Installation and User Configuration
      • Manual Upgrade
  • Sample Prep
    • QC and Sample Prep
      • DNA Initial QC 5.1.2
      • RNA Initial QC 5.1.2
      • Library Validation QC 5.1.2
  • Library Prep
    • AmpliSeq for Illumina
      • BRCA Panel
        • Library Preparation v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Cancer HotSpot Panel v2
        • Library Preparation v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Childhood Cancer Panel
        • DNA Library Prep v1.1
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Comprehensive Cancer Panel
        • Library Preparation v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Comprehensive Panel v3
        • DNA Library Prep v1.1
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Custom DNA Panel
        • Library Preparation v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Focus Panel
        • DNA Library Prep v1.1
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Immune Repertoire Panel
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Immune Response Panel
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
      • Myeloid Panel
        • DNA Library Prep v1.1
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
      • TCR beta-SR Panel
        • DNA Library Prep v1.1
        • RNA Library Prep v1.1
      • Transcriptome Human Gene Expression Panel
        • RNA Library Prep v1.1
        • Equalizer v1.1
        • Standard v1.1
    • Library Prep Validation
    • Nextera
      • Nextera Mate Pair v1.0
      • Nextera Rapid Capture Custom Enrichment v2.0
      • Nextera XT v2.0
    • Targeted Enrichment
      • Illumina DNA Prep with Enrichment (S) Tagmentation v1.2
      • Illumina RNA Prep with Enrichment (L) Tagmentation v1.1
    • TruSeq
      • TruSeq ChIP-Seq v1.0
      • TruSeq Custom Amplicon v1.0
      • TruSeq DNA Exome v2.0
      • TruSeq DNA PCR-Free v2.0
      • TruSeq Methyl Capture EPIC v2.0
      • TruSeq Nano DNA v1.0
      • TruSeq RNA Access v2.0
      • TruSeq RNA Exome v1.0
      • TruSeq Small RNA v1.0
      • TruSeq Stranded mRNA v2.0
    • TruSight
      • TruSight Oncology 500 ctDNA v1.1
      • TruSight Oncology 500 HT v1.1
      • TruSight Oncology 500 v1.1
      • TruSight Tumor 170 v2.0
    • Other DNA Protocols
      • Illumina DNA PCR-Free Library Prep Manual v1.1
      • Illumina DNA Prep (M) Tagmentation v1.0
    • Other RNA Protocols
      • Illumina Stranded mRNA Prep Ligation 1.1
      • Illumina Stranded Total RNA Prep Ligation with Ribo-Zero Plus v1.1
  • iLASS & Infinium Arrays
    • iLASS
      • iLASS Infinium Genotyping v1.1
        • iLASS Infinium Batch DNA v1.1
        • iLASS Infinium Genotyping Assay v1.1
        • iLASS Infinium Genotyping with PGx Assay v1.1
      • iLASS Infinium Genotyping v1.0
        • iLASS Infinium Genotyping Assay v1.0
        • iLASS Infinium Genotyping with PGx Assay v1.0
    • Infinium Arrays
      • Infinium HD Methylation Assay Manual v1.2
      • Infinium HTS Assay Manual v1.2
      • Infinium LCG Assay Manual v1.2
      • Infinium XT Assay Manual v1.2
      • GenomeStudio v1.0
  • Applications
    • IGA
      • IGA v2.1
        • IGA Library Prep Automated v2.1
        • IGA NovaSeq Sequencing v2.1
    • Viral Pathogen Protocols
      • CDC COVID-19 RT-PCR
        • Sort Specimens to Extraction v1.1
        • Qiagen QIAamp DSP Viral RNA Mini Kit v1.1
        • Qiagen EZ1 Advanced XL v1.1
        • Roche MagNA Pure LC v1.1
        • Roche MagNA Pure Compact v1.1
        • Roche MagNA Pure 96 v1.1
        • bioMerieux NucliSENS easyMAG Instrument v1.1
        • bioMerieux EMAG Instrument v1.1
        • Real-Time RT-PCR Prep v1.1
      • Illumina COVIDSeq v1.6
      • Respiratory Virus Panel v1.0
  • Instruments & Integrations
    • Compatibility
    • Integration Properties
      • Integration Properties Details
    • Clarity LIMS Product Analytics
      • Supported Workflows
      • Workflow Customization
      • Clarity LIMS Product Analytics v1.4.0
        • Configuration
      • Clarity LIMS Product Analytics v1.3.1
        • Configuration
      • Clarity LIMS Product Analytics v1.3.0
        • Configuration
      • Clarity LIMS Product Analytics v1.2.0
        • Configuration
    • Illumina Run Manager
      • Illumina Run Manager v1.0.0
        • Installation and User Interaction
    • iScan
      • iScan System
      • iScan v1.2.0
        • Release Notes
        • BeadChip Accessioning, Imaging, and Analysis
      • iScan v1.1.0
        • Release Notes
        • BeadChip Accessioning, Imaging, and Analysis
      • iScan System v1.0
    • iSeq 100 Run Setup v1.0
    • MiniSeq v1.0
    • MiSeq
      • MiSeq v8.3.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
      • MiSeq v8.2.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
        • Manual Upgrade
    • MiSeq i100 (On-Prem)
      • MiSeq i100 On-Prem v1.0.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • MiSeq i100 (Hosted)
      • MiSeq i100 v1.0.0
        • Release Notes
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • MiSeqDx
      • MiSeqDx Sample Sheet Generation (v1.11.0 and later)
      • MiSeqDx v1.11.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
      • MiSeqDx v1.10.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
        • Sample Sheet Generation
        • Manual Upgrade
    • Next Generation Sequencing Package
      • Release Notes
        • NGS Extensions v5.25.0
        • NGS Extensions v5.24.0
        • NGS Extensions v5.23.0
      • Accession Kit Lots
      • Auto-Placement of Reagent Indexes
      • Compute Replicate Average
      • Copy UDFs
      • Initialize Artifact UDFs
      • Label Non-Labeled Outputs
      • Linear Regression Calculation
      • Normalization Buffer Volumes
      • Process Summary Report
      • Routing Script
      • Set UDF
      • Validate Complete Plate
      • Validate Sample Count
      • Validate Unique Indexes
    • NextSeq 500/550
      • NextSeq 500/550 v2.5.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
        • Manual Upgrade
      • NextSeq 500/550 v2.4.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
      • NextSeq 500/550 v2.3.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • NextSeq 1000/2000 (Hosted)
      • NextSeq 1000/2000 v2.5.1
        • Release Notes
      • NextSeq 1000/2000 v2.5.0
        • Release Notes
        • Configuration
        • User Interaction, Validation and Troubleshooting
        • Manual Upgrade
      • NextSeq 1000/2000 v2.4.0
        • Release Notes
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • NextSeq 1000/2000 (On-Prem)
      • NextSeq 1000/2000 On-Prem v1.0.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • NovaSeq 6000 (API-based)
      • NovaSeq 6000 API-based v3.7.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
      • NovaSeq 6000 API-based v3.6.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
        • Manual Upgrade
    • NovaSeq 6000 (File-based)
      • NovaSeq 6000 File-based v2.6.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
      • NovaSeq 6000 File-based v2.5.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • NovaSeq 6000Dx (API-based)
      • NovaSeq 6000Dx API-based v1.3.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
      • NovaSeq 6000Dx API-based v1.2.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • NovaSeq X Series (Hosted)
      • NovaSeq X Series v1.3.0
        • Release Notes
        • Configuration
        • Manual Upgrade
      • NovaSeq X Series v1.2.1
        • Release Notes
      • NovaSeq X Series v1.2.0
        • Release Notes
        • Configuration
        • User Interaction, Validation and Troubleshooting
        • Manual Upgrade
      • NovaSeq X Series v1.1.0
        • Release Notes
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • NovaSeq X Series (On-Prem)
      • NovaSeq X Series On-Prem v1.0.0
        • Release Notes
        • Installation
        • Configuration
        • User Interaction, Validation and Troubleshooting
    • References
      • Configure Multiple Identical netPathPrefixSearch Values
      • Configure Support for Samples Having Duplicate Names with Different Indexes
      • Illumina Instrument Sample Sheets
      • Terminology
  • Integration Toolkits
    • Lab Instrument Toolkit
      • Template File Generator
        • Creating Template Files
        • Template File Contents
        • Template File Generator Troubleshooting
      • Add Blank Lines
      • Convert CSV to Excel
      • Parse CSV
      • Name Matching XML Parser
      • Sample Placement Helper
    • Lab Logic Toolkit
      • Working with Lab Logic Toolkit
        • Data Collection Entities
        • Failing a Script
        • Mapping Field Types
        • Non-UDF/Custom Field Properties
        • Setting QC Flags
        • Setting Next Actions
        • Specifying Custom Fields
        • Working with Submitted Samples
        • Working with Containers
      • Lab Logic Toolkit Script Examples
        • Comparing Stop/Start Dates and Times with LLTK
      • Lab Logic Toolkit FAQ
  • Known Issues
    • Integration
      • Sample Sheet Generation Issue and CLPA Issues When Samples Have Been Assigned QC Flag Prior to Entering Steps
  • Security Bulletin
    • Investigation of OpenSSH vulnerability with Clarity LIMS
  • Resources
    • Third Party Software Information
  • Others
    • Revision History
Powered by GitBook
On this page
  • Solution
  • User Interaction
  • About the code
  • Assumptions and Notes
  • Attachments

Was this helpful?

Export as PDF
  1. API and Database
  2. API Portal
  3. Application Examples
  4. Miscellaneous Scripts

Parsing Sequencing Meta-Data into Clarity LIMS

PreviousCopying Output UDFs to Submitted SamplesNextSubmit to a Compute Cluster via PBS

Last updated 10 months ago

Was this helpful?

Once a sequencing run has occurred, there is often a requirement to store the locations of the FASTQ / BAM files in Clarity LIMS.

For paired-end sequencing, it is likely that the meta-data file that describes the locations of these files will contain two rows for each sample sequenced: one for the first read, and another for the second read.

Such a file is illustrated below:

Parsing_sequencing_meta_data_ExSeq4.png

Column 2 of the file, Sample ID, contains the LIMS IDs of the artifacts for which we want to store the FASTQ file values listed in column 3 (Fastq File).

This example discusses the strategy for parsing and storing data against process inputs, when that data is represented by multiple lines in a data file.

Solution

The attached script will parse a data file containing multiple lines of FASTQ file values, and will store the locations of those FASTQ files in user-defined fields.

Process configuration

In this example, the process is configured to have a single shared ResultFile output.

Parameters

The EPP command is configured to pass the following parameters:

-l

The limsid of the process invoking the script (Required)

The {processLuid} token

-u

The username of the current user (Required)

The {username} token

-p

The password of the current user (Required)

The {password} token

-s

The URI of the step that launches the script (Required)

The {stepURI:v2:http} token

-i

The limsid of the shared ResultFile output (Required)

The {compoundOutputFileLuid0} token

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/parseMetadata.py -l 24-9953 -u admin -p securepassword -s http://192.168.9.123:8080/api/v2/steps/24-9953 -i 92-20553

User Interaction

The user-interaction is comprised of the following steps:

Step 1

The user runs the process up to the Record Details screen as shown in the following image. Note that initially:

  • The Sequencing meta-data file is still to be uploaded.

  • The values for the R1 Filename and R2 Filename fields are empty.

Step 2

The user clicks Upload file and attaches the meta-data file. Once attached, the user's screen will resemble this:

Step 3

Now that the meta-data file is attached, the user clicks Parse Meta-data File. This invokes the parsing script.

If parsing was successful, the user's screen will resemble Figure 4 below.

Note that the values for the R1 Filenames and R2 Filenames have been parsed from the file and will be stored in Clarity LIMS.

About the code

The key methods of interest are main(), parseFile() and fetchFile(). The main() method calls parseFile(), which in turn calls fetchFile().

fetchFile() method

The fetchFile() method relies upon the fact that the script is running on the Clarity LIMS server, and as such has access to the local file system in which the file (uploaded in Step 2) now resides.

Thus, fetchFile() can use the API to:

  1. Convert the LIMSID of the file to the location on disk.

  2. Copy the file to the local working directory, ready to be parsed by parseFile().

parseFile() method

  1. The parseFile() method creates two data structures that are used in the subsequent code within the script:

    • The COLS dictionary has the column names from the first line of the file as its key, and the index of the column as the value.

    • The DATA array contains each subsequent line of the file as a single element. Note that this parsing logic is overly-simplistic, and would need to be supplemented in a production environment. For example, if the CSV file being parsed does not have the column names in the first row, then exceptions would likely occur. Similarly, we assume the file being parsed is CSV, and likewise any data elements which themselves contain commas would likely cause a problem. For the sake of clarity such exception handling has been omitted from the script.

  2. Once parseFile() has executed successfully the inputs to the process that has invoked the script are gathered, and 'batch' functionality is used to gather all of the artifacts in a single batch-retrieve transaction.

  3. All that remains is to step through the elements within the DATA array, and for each line, gather the values of the Fastq File and Sample ID columns. For each Sample ID value:

    • The corresponding artifact is retrieved.

    • Depending on whether the value of the Fastq File column represents the filename for the first or second read, either the R1 Filename or the R2 Filename user-defined field is updated.

  4. Once the modified artifacts have been saved, the values will display in the Clarity LIMS Web Interface.

Assumptions and Notes

  • Both of the attached files are placed on the Clarity LIMS server, in the /opt/gls/clarity/customextensions folder.

  • You will need to update the HOSTNAME global variable such that it points to your Clarity LIMS server.

  • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

parseMetadata.py:

Parsing_sequencing_meta_data_Fig2.png
Parsing_sequencing_meta_data_Fig3.png
Parsing_sequencing_meta_data_Fig4.png
4KB
parseMetadata.py