Clarity LIMS
Illumina Connected Software
Select the Optimal Batch Size

The Clarity LIMS API has batch retrieve endpoints for samples, artifacts, containers, and files. This article discusses links generically; the guidance applies to any of these four entities.

Batch endpoints are typically used to process hundreds of links or more. Intuitively, a single API call containing all the links might seem like the fastest way to retrieve the data. However, analysis of API performance shows that once the number of links exceeds a certain threshold, the time per object increases.

To retrieve the data most efficiently, split the links across multiple POST requests, each containing an optimally sized batch. For a single sample (or other entity), a GET to that entity's endpoint is faster than a batch call. However, once more than one or two entities are needed, the batch endpoint is more efficient.
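The following is a minimal sketch of splitting a list of links into fixed-size batches and posting each batch to a batch retrieve endpoint. It assumes the endpoint path `<server>/<entity>s/batch/retrieve` (for example, `.../api/v2/artifacts/batch/retrieve`), an `<ri:links>` request body in the `http://genologics.com/ri` namespace, and HTTP Basic authentication. The helper names (`chunk`, `build_links_payload`, `batch_retrieve`) are hypothetical and not taken from the attached script; verify the payload format against the batch resource documentation for your API version.

```python
import base64
import urllib.request
from xml.sax.saxutils import quoteattr

# Assumed namespace for the batch request body; check your API documentation.
RI_NS = "http://genologics.com/ri"


def chunk(links, batch_size):
    """Split a list of entity URIs into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [links[i:i + batch_size] for i in range(0, len(links), batch_size)]


def build_links_payload(links, rel):
    """Build the XML body for one batch retrieve POST (assumed <ri:links> format)."""
    items = "".join(f"<link uri={quoteattr(u)} rel={quoteattr(rel)}/>" for u in links)
    return f'<ri:links xmlns:ri="{RI_NS}">{items}</ri:links>'


def batch_retrieve(server, entity, links, batch_size, username, password):
    """POST each batch to <server>/<entity>s/batch/retrieve; return the raw XML responses."""
    url = f"{server.rstrip('/')}/{entity}s/batch/retrieve"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    headers = {"Content-Type": "application/xml", "Authorization": f"Basic {token}"}
    responses = []
    for batch in chunk(links, batch_size):
        body = build_links_payload(batch, rel=f"{entity}s").encode("utf-8")
        req = urllib.request.Request(url, data=body, headers=headers, method="POST")
        with urllib.request.urlopen(req) as resp:  # one POST per batch
            responses.append(resp.read().decode("utf-8"))
    return responses
```

For example, `chunk(links, 500)` on 5,000 links yields 10 batches of 500, the same split described under Prerequisites.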

Prerequisites

Before you follow the example, be aware of the following factors that affect the optimal batch size:

  • The optimal size depends on your specific server and on the number of UDFs/custom fields or other data attached to the objects being retrieved.

  • The optimal batch size may be different for artifacts, samples, files, and containers. For example, if the optimal size for samples is 500, 10 batches of 500 samples will retrieve the data faster than one batch of 5000.

  • You must also have a compatible API version (v2 r21 or later).

Determining Optimal Batch Size

The attached Python script times batch retrieve calls across a range of batch sizes. Efficiency is measured as the duration of the call divided by the number of links posted.
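The efficiency metric can be sketched as follows. `fetch` is a hypothetical callable that performs one batch retrieve POST for the given links; how the attached script actually issues and times its calls is not shown here. As a worked example from the sample output in this article, the 275-artifact batch took about 8.10 s, or about 8.10 / 275 ≈ 0.0295 s per artifact.

```python
import time


def time_per_link(fetch, links):
    """Time one batch call; return (duration_s, duration_s / number_of_links)."""
    if not links:
        raise ValueError("links must not be empty")
    start = time.perf_counter()
    fetch(links)  # hypothetical: performs one batch retrieve POST
    duration = time.perf_counter() - start
    return duration, duration / len(links)


if __name__ == "__main__":
    # Stand-in for a real API call: sleep 50 ms regardless of batch size.
    d, per = time_per_link(lambda batch: time.sleep(0.05), list(range(10)))
    print(f"{d:.3f} s total, {per:.4f} s per link")
```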

Hard-coded Parameters

The attached script has hard-coded parameters that define the range and increment of the batch sizes to test, as well as the number of replications for each size. These parameters are found on line 110 of the script. They may not require any modification, since they are set to the following defaults:

```python
replications = 3        # how many times each batch will be measured
repetitions = 1         # how many measurements will be taken for each batch size
R = range( 100, 300 )   # range of the batch sizes to be measured (where min >= 1)
q = 25                  # batch size incremental increase
```

For example, the above parameters will test the following sizes: 100, 125, 150, 175, 200, 225, 250, 275.
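A minimal sketch of how these parameters might drive the measurements, assuming `R` and `q` combine as `range(R.start, R.stop, q)` (which reproduces the sizes listed above) and that each size is timed `replications` times and averaged. `fetch` is a hypothetical stand-in for a batch retrieve call, and `repetitions` is omitted because its exact role in the attached script is not described here.

```python
import statistics
import time


def batch_sizes(r, q):
    """Batch sizes to test: from r.start up to (not including) r.stop in steps of q."""
    return list(range(r.start, r.stop, q))


def measure(fetch, links, sizes, replications=3):
    """Return {size: mean seconds per link}, averaged over `replications` timed calls."""
    if len(links) < max(sizes):
        raise ValueError("need at least max(sizes) links to test every size")
    results = {}
    for size in sizes:
        batch = links[:size]
        samples = []
        for _ in range(replications):
            start = time.perf_counter()
            fetch(batch)  # hypothetical: performs one batch retrieve POST
            samples.append((time.perf_counter() - start) / size)
        results[size] = statistics.mean(samples)
    return results


if __name__ == "__main__":
    R = range(100, 300)
    q = 25
    print(batch_sizes(R, q))  # [100, 125, 150, 175, 200, 225, 250, 275]
```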

Command-line Parameters

Parameters specific to your server are entered at the command line:

  • -u: username
  • -p: password
  • -s: hostname, including "/api/v2"
  • -t: entity type (one of: artifact, sample, file, container)

An example of the full syntax to invoke the script is as follows:

```shell
python BatchOptimalSizeTest.py -p apipassword -u apiuser -s https://demo.basespacelims.com/api/v2 -t artifact
```
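If you are writing your own variant of the script, the same four options could be parsed with the standard-library `argparse` module. This is a hypothetical reimplementation; the attached script's own argument handling is not shown here. The check that `-s` ends with `/api/v2` is an added assumption based on the parameter description.

```python
import argparse

ENTITIES = ("artifact", "sample", "file", "container")


def parse_args(argv=None):
    """Parse the -u/-p/-s/-t options (hypothetical reimplementation)."""
    parser = argparse.ArgumentParser(
        description="Time batch retrieve calls over a range of batch sizes.")
    parser.add_argument("-u", dest="username", required=True, help="API username")
    parser.add_argument("-p", dest="password", required=True, help="API password")
    parser.add_argument("-s", dest="server", required=True,
                        help='server URL, including "/api/v2"')
    parser.add_argument("-t", dest="entity", required=True, choices=ENTITIES,
                        help="entity type to test")
    args = parser.parse_args(argv)
    if not args.server.rstrip("/").endswith("/api/v2"):
        parser.error('-s must include "/api/v2"')  # exits with status 2
    return args


if __name__ == "__main__":
    args = parse_args(["-p", "apipassword", "-u", "apiuser",
                       "-s", "https://demo.basespacelims.com/api/v2", "-t", "artifact"])
    print(args.entity, args.server)
```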

Expected Results

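The script records how long each batch call takes to complete. It writes a .txt file containing the raw numeric data and the batch size with the lowest time per entity, which is the most efficient size. The example output below covers batch sizes from 25 to 975 in steps of 25, a wider range than the default parameters.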

Analyzing results for: artifact

Batch sizes:
[25, 50, 75, 100, 125, 150, 175, 200, 225, 250, 275, 300, 325, 350, 375, 400, 425, 450, 475, 500, 525, 550, 575, 600, 625, 650, 675, 700, 725, 750, 775, 800, 825, 850, 875, 900, 925, 950, 975]

Time (s) per entity:
[0.061350816726684576, 0.04790449237823486, 0.040710381189982096, 0.03354618215560913, 0.033738230133056636, 0.03324082946777344, 0.03209760447910854, 0.03409448790550232, 0.03184072346157498, 0.031050360870361327, 0.029453758586536753, 0.03295832395553589, 0.03149744004469652, 0.03347888347080776, 0.033550281016031906, 0.030628018498420718, 0.03328620989182416, 0.03454347112443712, 0.035195479945132606, 0.0361147011756897, 0.03584921982174828, 0.0383262753053145, 0.037979933946029, 0.03772751696904501, 0.03774445213317871, 0.03933756652245155, 0.04524845660174335, 0.03916741977419172, 0.04273618560001768, 0.043037356503804525, 0.04183078679730815, 0.044450711250305176, 0.0478362009453051, 0.04694189671909108, 0.044135747201102124, 0.04349724955028958, 0.04686621408204775, 0.046690188458091336, 0.05018808247492863] 

Duration (s) of batch call: 
[1.5337704181671143, 2.395224618911743, 3.053278589248657, 3.354618215560913, 4.21727876663208, 4.986124420166016, 5.617080783843994, 6.8188975811004635, 7.1641627788543705, 7.762590217590332, 8.099783611297607, 9.887497186660767, 10.236668014526368, 11.717609214782716, 12.581355381011964, 12.251207399368287, 14.146639204025268, 15.544562005996704, 16.717852973937987, 18.057350587844848, 18.820840406417847, 21.079451417922975, 21.838462018966673, 22.636510181427003, 23.590282583236693, 25.569418239593507, 30.54270820617676, 27.417193841934203, 30.983734560012817, 32.278017377853395, 32.418859767913816, 35.56056900024414, 39.46486577987671, 39.90061221122742, 38.61877880096436, 39.14752459526062, 43.351248025894165, 44.35567903518677, 48.93338041305542]

275 artifacts was the most efficient batch size

Plotted as a scatterplot, this data shows that the optimal batch size for the artifacts/batch/retrieve endpoint is roughly 200 to 300 artifacts. This result is valid for artifacts only; evaluate each entity (sample, file, or container) separately.

The batch size with the shortest time per artifact is the most efficient; in this example, that was 275 artifacts.
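Selecting the most efficient size from the measured durations amounts to taking the minimum of duration divided by batch size. A minimal sketch (the function name is hypothetical), run here on five data points copied from the example output:

```python
def most_efficient(sizes, durations):
    """Return (size, seconds_per_entity) for the batch with the lowest duration / size."""
    if len(sizes) != len(durations) or not sizes:
        raise ValueError("sizes and durations must be non-empty and the same length")
    per_entity = [d / s for s, d in zip(sizes, durations)]
    best = min(range(len(sizes)), key=per_entity.__getitem__)
    return sizes[best], per_entity[best]


if __name__ == "__main__":
    sizes = [200, 225, 250, 275, 300]
    durations = [6.8188975811004635, 7.1641627788543705, 7.762590217590332,
                 8.099783611297607, 9.887497186660767]
    size, per_entity = most_efficient(sizes, durations)
    # prints: 275 artifacts was the most efficient batch size (0.0295 s each)
    print(f"{size} artifacts was the most efficient batch size ({per_entity:.4f} s each)")
```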

Proxy timeout

By default, the LIMS proxy send and receive timeout is 60 seconds. Very large batch calls will not complete if their duration exceeds this timeout; in the example output above, the 975-artifact batch already took about 49 seconds. The timeout is configured in:

/etc/httpd/conf/httpd.conf 
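Because the timeout caps the duration of a single call, one way to guard against it is to keep only batch sizes whose measured duration stays comfortably below the timeout. A minimal sketch, assuming the 60-second default and an arbitrary 75% safety margin (both are illustrative parameters, not LIMS settings), using the last four data points from the example output:

```python
def largest_safe_size(sizes, durations, timeout_s=60.0, margin=0.75):
    """Largest measured batch size whose call finished under margin * timeout_s, else None."""
    limit = timeout_s * margin
    safe = [s for s, d in zip(sizes, durations) if d < limit]
    return max(safe) if safe else None


if __name__ == "__main__":
    sizes = [900, 925, 950, 975]
    durations = [39.14752459526062, 43.351248025894165,
                 44.35567903518677, 48.93338041305542]
    print(largest_safe_size(sizes, durations))  # 950 (975 took ~48.9 s, above 45 s)
```

Because the time per entity tends to grow with batch size in the example data, extrapolating beyond the largest measured size is not reliable; measure instead.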

Attachments

BatchOptimalSizeTest.py (6 KB)


Last updated 10 months ago

