Generic CSV Parser Template (Python)

Compatibility: API version 2

Many different types of CSV files are attached to BaseSpace Clarity LIMS. This example provides a template for a script that can parse a wide range of simple CSV files into Clarity LIMS. You can change the configurable variables in the script to match the format of the file to be parsed.

The Lab Instrument Tool Kit includes the parseCSV script, which allows for parsing of CSV files. However, this tool has strict limitations in its strategy for mapping data from the file to corresponding samples in Clarity LIMS. For information on the parseCSV script, refer to the Clarity LIMS Integrations and Tool Kits documentation.

Solution

Protocol step configuration

CSV files are attached to a protocol step. The artifact UDFs to which the data will be written must be configured for the artifacts and for the protocol step.

Parameters

The script accepts the following parameters:

-r

The LUID of the result file to which the CSV or TXT file is attached. (Required)

-u

The username of the current user. (Required)

-p

The password of the current user. (Required)

-s

The URI of the step that launches the script (the {stepURI:v2:http} token). (Required)

An example of the full syntax to invoke the script is as follows:

/usr/bin/python /opt/gls/clarity/customextensions/genericParser.py -u {username} -p {password} -s {stepURI:v2} -r {compoundOutputFileLuid0}
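As a rough illustration of how the four options above could be read, here is a minimal sketch using the standard argparse module. The function name parse_args, the attribute names (result_luid, step_uri, and so on), and the sample values are assumptions made for this example. The attached genericParser.py may read its options differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the four required options described in the Parameters section."""
    parser = argparse.ArgumentParser(
        description="Generic CSV parser (illustrative sketch only)")
    parser.add_argument("-r", dest="result_luid", required=True,
                        help="LUID of the result file the CSV or TXT file is attached to")
    parser.add_argument("-u", dest="username", required=True,
                        help="username of the current user")
    parser.add_argument("-p", dest="password", required=True,
                        help="password of the current user")
    parser.add_argument("-s", dest="step_uri", required=True,
                        help="URI of the step that launches the script")
    return parser.parse_args(argv)


# Hypothetical values standing in for the tokens that Clarity LIMS substitutes.
args = parse_args([
    "-u", "apiuser",
    "-p", "secret",
    "-s", "https://lims.example.com/api/v2/steps/24-1234",
    "-r", "92-5678",
])
print(args.step_uri)
```

Passing an explicit list to parse_args keeps the sketch easy to try outside of Clarity LIMS. When called with no arguments, it reads sys.argv as usual.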

About the Code

The script contains an area with a number of configurable variables. This allows a FAS or bioinformatician to customize the script to parse their specific txt file. The following variables within the script are configurable:

MAPPING MODE

What will the script use to map the measurements to the artifacts in LIMS?

artifactUDFMap

A Python dictionary where the key is the name of a column in the txt file, and the value is a UDF in Clarity LIMS.

delim

How is the file delimited? (For example, ',' for commas or '\t' for tabs.)
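To show what the delim setting controls, the following sketch reads a small tab-delimited sample with the standard csv module. The helper name read_rows and the sample data are assumptions made for illustration, not code taken from genericParser.py.

```python
import csv
import io

delim = "\t"  # use "," for comma-separated files


def read_rows(handle, delimiter):
    """Return the header row and the non-empty data rows of a delimited file."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


# Made-up sample content; a real run would open the attached result file instead.
sample = io.StringIO(
    "Sample Name\tConcentration\tAvg. Size\n"
    "Library-1\t12.5\t350\n"
    "Library-2\t8.1\t410\n"
)
header, rows = read_rows(sample, delim)
print(header)
print(rows)
```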

MAPPING MODES

Several sample attributes can be used to map the data in the text file to the corresponding derived samples in Clarity LIMS. Configure the script so that one of these modes is set to True.

Three modes are available:

MapTo_ArtifactName

The data will be associated with the names of the output artifacts for the given step.

MapTo_WellLocation

The data will be associated with the well locations of the output artifacts for the given step.

MapTo_UDFValue

The data will be associated with the value of a specified UDF of the output artifacts.

For any of the three modes, a mapping column value must be explicitly given. The value is the index of the column containing the mapping data (either artifact name, well location, or UDF value).

If using the MapTo_UDFValue mode, a UDFName must also be given. This is the name of the UDF in Clarity LIMS whose value is matched against the value found in the mapping column.
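The matching logic described above can be sketched as follows. This is only an illustration under several assumptions: artifacts are represented as plain dictionaries rather than API responses, the mapping column index is treated as zero-based, and the requirement that exactly one mode be True is one reading of the text. The names active_mode, artifact_key, match_rows, and mapping_column are hypothetical.

```python
def active_mode(flags):
    """Return the name of the single enabled mode, or raise if the flags are ambiguous."""
    enabled = [name for name, on in flags.items() if on]
    if len(enabled) != 1:
        raise ValueError("Set exactly one mapping mode to True, got: %r" % enabled)
    return enabled[0]


def artifact_key(artifact, mode, udf_name=None):
    """Return the value of an artifact that is compared with the mapping column."""
    if mode == "MapTo_ArtifactName":
        return artifact["name"]
    if mode == "MapTo_WellLocation":
        return artifact["well"]
    if mode == "MapTo_UDFValue":
        if not udf_name:
            raise ValueError("MapTo_UDFValue requires a UDFName")
        return artifact["udfs"].get(udf_name)
    raise ValueError("Unknown mapping mode: %r" % mode)


def match_rows(rows, artifacts, mode, mapping_column, udf_name=None):
    """Pair each file row with the artifact whose key equals the mapping column value."""
    by_key = {artifact_key(a, mode, udf_name): a for a in artifacts}
    matches = []
    for row in rows:
        artifact = by_key.get(row[mapping_column])
        if artifact is not None:  # rows without a matching artifact are skipped
            matches.append((artifact, row))
    return matches


# Made-up data for illustration only.
flags = {"MapTo_ArtifactName": False, "MapTo_WellLocation": True, "MapTo_UDFValue": False}
artifacts = [
    {"name": "Library-1", "well": "A:1", "udfs": {}},
    {"name": "Library-2", "well": "B:1", "udfs": {}},
]
rows = [["A:1", "12.5"], ["B:1", "8.1"], ["C:1", "3.3"]]
matches = match_rows(rows, artifacts, active_mode(flags), mapping_column=0)
for artifact, row in matches:
    print(artifact["name"], row)
```

In this sketch, a row with no matching artifact (C:1 here) is silently skipped. A production script would likely log or report such rows.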

artifactUDFMap

artifactUDFMap = {
    "Concentration" : "Concentration",
    "Avg. Size" : "Average Size"
}

This Python dictionary maps the names of columns in the txt file to artifact UDFs for the outputs of the step. The data from these columns in the file is written to these UDFs on the output artifacts. The dictionary can contain any number of UDFs. The dictionary keys (left side) are the names of the columns in the txt file, and the dictionary values (right side) are the names of the UDFs as configured for the artifacts.
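As a small worked example of how such a dictionary could be applied, the sketch below turns one file row into a set of UDF name and value pairs. The helper name udf_values_for_row and the sample header and row are assumptions for illustration. Writing the values back to Clarity LIMS through the API is not shown.

```python
artifactUDFMap = {
    "Concentration": "Concentration",
    "Avg. Size": "Average Size",
}


def udf_values_for_row(header, row, udf_map):
    """Return {UDF name: cell value} for every mapped column present in the header."""
    values = {}
    for column_name, udf_name in udf_map.items():
        if column_name not in header:
            continue  # column missing from this file; nothing to write
        values[udf_name] = row[header.index(column_name)]
    return values


# Made-up header and data row for illustration only.
header = ["Sample Name", "Concentration", "Avg. Size"]
row = ["Library-1", "12.5", "350"]
values = udf_values_for_row(header, row, artifactUDFMap)
print(values)
```

The values are kept as strings, exactly as read from the file. Converting them to numbers before writing to a numeric UDF is left to the caller.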

Assumptions and Notes

  • You are running a version of Python that is supported by Clarity LIMS, as documented in the Clarity LIMS Technical Requirements.

  • The attached files are placed on the LIMS server, in the /opt/gls/clarity/customextensions folder.

  • The example code is provided for illustrative purposes only. It does not contain sufficient exception handling for use 'as is' in a production environment.

Attachments

genericParser.py:

glsfileutil.py:
