CN112955729B - Subsampling flow cytometry event data - Google Patents

Publication number
CN112955729B
Authority
CN
China
Prior art keywords
flow cytometry
event data
bin
bins
cytometry event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202080004716.2A
Other languages
Chinese (zh)
Other versions
CN112955729A (en
Inventor
乔纳森·林
基根·奥斯利
大卫·A·罗伯茨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Becton Dickinson and Co
Original Assignee
Becton Dickinson and Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Becton Dickinson and Co
Publication of CN112955729A
Application granted
Publication of CN112955729B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N15/00 Investigating characteristics of particles; Investigating permeability, pore-volume or surface-area of porous materials
    • G01N15/10 Investigating individual particles
    • G01N2015/1006 Investigating individual particles for cytology
    • G01N15/14 Optical investigation techniques, e.g. flow cytometry
    • G01N15/1404 Handling flow, e.g. hydrodynamic focusing
    • G01N15/1429 Signal processing
    • G01N15/1456 Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals
    • G01N15/1459 Optical investigation techniques, e.g. flow cytometry without spatial resolution of the texture or inner structure of the particle, e.g. processing of pulse signals the analysis being performed on a sample stream
    • G01N15/149 Optical investigation techniques, e.g. flow cytometry specially adapted for sorting particles, e.g. by their size or optical properties
    • G01N2015/1402 Data analysis by thresholding or gating operations performed on the acquired signals or stored data
    • G01N2015/1477 Multiparameters
    • G01N2015/1488 Methods for deciding
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Health & Medical Sciences (AREA)
  • Dispersion Chemistry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Signal Processing (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Apparatus Associated With Microorganisms And Enzymes (AREA)

Abstract

The present invention includes systems, devices, computer-readable media, and methods for subsampling flow cytometry event data. First flow cytometry event data and second flow cytometry event data may be transformed into a low-dimensional space associated with a plurality of bins and assigned to a first bin and a second bin, respectively. Subsampled flow cytometry event data comprising the first flow cytometry event data may then be generated. The subsampled flow cytometry event data may include the second flow cytometry event data if the first bin is different from the second bin, and may exclude the second flow cytometry event data if the first bin and the second bin are the same.

Description

Subsampling flow cytometry event data

Technical Field

The present invention relates generally to the field of automated particle assessment and, more particularly, to sample analysis and particle characterization methods.

Background Art

Particle analyzers (e.g., flow cytometers) can characterize particles based on electro-optical measurements such as light scattering and fluorescence. In a flow cytometer, for example, particles in a fluid suspension (e.g., molecules, analyte-bound beads, or individual cells) pass through a detection region where they are exposed to excitation light, typically from one or more lasers, and the light scattering and fluorescence properties of the particles are measured. Particles or their components are typically labeled with fluorescent dyes to facilitate detection. Many different particles or components can be detected simultaneously by labeling them with spectrally distinct fluorescent dyes. Different cell types can be identified by their light scattering properties and by the fluorescence emissions that result from labeling various cellular proteins or other components with fluorescent dye-labeled antibodies or other fluorescent probes. The data obtained by analyzing cells (or other particles) with multicolor flow cytometry are multidimensional: each cell corresponds to a point in the multidimensional space defined by the measured parameters. Populations of cells or particles are identified as clusters of points in that data space.

Summary of the Invention

The present invention discloses systems, devices, computer-readable media, and methods for subsampling flow cytometry event data. In some embodiments, a method comprises, under control of a processor: transforming first flow cytometry event data, associated with a first event of a first plurality of events in a flow cytometry event data set, from a high-dimensional space into first transformed flow cytometry event data associated with the first event in a first low-dimensional space. The first event may be associated with a positive subsampling requirement. The first low-dimensional space may be associated with a first plurality of bins.

The first transformed flow cytometry event data may be associated with a first bin of the first plurality of bins. The method may include transforming second flow cytometry event data, associated with a second event of the first plurality of events in the flow cytometry event data set, from the high-dimensional space into second transformed flow cytometry event data associated with the second event in the first low-dimensional space. The second event may be associated with the positive subsampling requirement. The second transformed flow cytometry event data may be associated with a second bin of the first plurality of bins. The method may include determining that the first bin associated with the first transformed flow cytometry event data and the second bin associated with the second transformed flow cytometry event data are different. The method may include generating a subsampled flow cytometry event data set of the flow cytometry event data, the subsampled data set comprising the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.
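The flow described above (transform each event, assign it to a bin, and keep it only when its bin differs from the bins of events already kept) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the patent: the stand-in reduction `project_first_two`, the uniform grid binning in `assign_bin`, and the `bin_width` parameter.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Event = Sequence[float]   # one event's measurements in the high-dimensional space
Bin = Tuple[int, ...]     # bin descriptor: grid coordinates in the low-dimensional space


def project_first_two(event: Event) -> Tuple[float, float]:
    """Stand-in reduction (assumption): keep only the first two parameters."""
    return (event[0], event[1])


def assign_bin(point: Sequence[float], bin_width: float = 1.0) -> Bin:
    """Map a low-dimensional point to the grid cell (bin) that contains it."""
    return tuple(int(coord // bin_width) for coord in point)


def subsample(events: Iterable[Event],
              reduce: Callable[[Event], Sequence[float]] = project_first_two,
              bin_width: float = 1.0) -> List[Event]:
    """Keep an event only if no previously kept event fell into the same bin."""
    seen_bins = set()
    kept = []
    for event in events:
        b = assign_bin(reduce(event), bin_width)
        if b not in seen_bins:      # bin differs from every bin seen so far
            seen_bins.add(b)
            kept.append(event)      # the original high-dimensional data is kept
    return kept


events = [
    (0.2, 0.3, 5.0),   # first event:  bin (0, 0), kept
    (2.5, 0.1, 7.0),   # second event: bin (2, 0), a different bin, kept
    (0.8, 0.9, 9.0),   # third event:  bin (0, 0), same bin as the first, dropped
]
subsampled = subsample(events)
```

Note that the subsampled set holds the untransformed event data; the low-dimensional coordinates are used only to decide which events to keep.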

In some embodiments, the method may include receiving flow cytometry event data comprising the first flow cytometry event data and the second flow cytometry event data. The method may include: determining that the first flow cytometry event data of the first event of the first plurality of events is associated with the positive subsampling requirement; and/or determining that the second flow cytometry event data of the second event of the first plurality of events is associated with the positive subsampling requirement. The method may include: determining that the first transformed flow cytometry event data is associated with the first bin of the first plurality of bins; and/or determining that the second transformed flow cytometry event data is associated with the second bin of the first plurality of bins. The method may include: determining a first descriptor of the first transformed flow cytometry event data based on the first bin of the first plurality of bins; and/or determining a second descriptor of the second transformed flow cytometry event data based on the second bin of the first plurality of bins. The first descriptor of the first transformed flow cytometry event data associated with the first bin may be a first bin number of the first bin of the first plurality of bins; and/or the second descriptor of the second transformed flow cytometry event data associated with the second bin may be a second bin number of the second bin of the first plurality of bins. The first flow cytometry event data may be associated with a first rare cell and/or the second flow cytometry event data may be associated with a second rare cell. The first rare cell and the second rare cell may be cells of different cell types.

The method may include: adding the first bin, the first descriptor, and/or the first bin number to an in-memory data structure; and/or adding the second bin, the second descriptor, and/or the second bin number to the in-memory data structure.

In some embodiments, the method includes transforming third flow cytometry event data, associated with a third event of the first plurality of events in the flow cytometry event data set, from the high-dimensional space into third transformed flow cytometry event data associated with the third event in the first low-dimensional space. The third event may be associated with the positive subsampling requirement. The third transformed flow cytometry event data may be associated with a third bin of the first plurality of bins. The method may include determining that the third bin associated with the third transformed flow cytometry event data is the first bin associated with the first transformed flow cytometry event data or the second bin associated with the second transformed flow cytometry event data. The third flow cytometry event data may not be in the subsampled flow cytometry event data set of the flow cytometry event data. The method may include determining a third descriptor of the third transformed flow cytometry event data based on the third bin of the first plurality of bins. The third descriptor of the third transformed flow cytometry event data associated with the third bin may be a third bin number of the third bin of the first plurality of bins. The method may include determining that the third bin, the third descriptor, and/or the third bin number are not in the in-memory data structure.

In some embodiments, the method includes determining that fourth flow cytometry event data, associated with a fourth event of the first plurality of events, is associated with a negative subsampling requirement. The generating may include generating the subsampled flow cytometry event data set of the flow cytometry event data so that it comprises the fourth flow cytometry event data associated with the fourth event. The method may include receiving a plurality of gates defining a plurality of cells of interest, wherein the fourth flow cytometry event data is associated with a cell of interest of the plurality of cells of interest. The fourth flow cytometry event data is associated with a sorted cell.
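As a rough illustration of the negative subsampling requirement, the sketch below keeps every event that falls inside a gate and applies bin-based subsampling only to the remaining events. The rectangular gate, the use of the first two parameters as the low-dimensional coordinates, and all function names are hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence, Tuple

Event = Sequence[float]
Gate = Callable[[Event], bool]


def rectangle_gate(lo: Tuple[float, float], hi: Tuple[float, float]) -> Gate:
    """Hypothetical gate: true when the first two parameters lie inside a rectangle."""
    def inside(event: Event) -> bool:
        return lo[0] <= event[0] <= hi[0] and lo[1] <= event[1] <= hi[1]
    return inside


def subsample_with_gates(events: Sequence[Event],
                         gates: Sequence[Gate],
                         bin_width: float = 1.0) -> List[Event]:
    seen_bins = set()
    kept = []
    for event in events:
        if any(gate(event) for gate in gates):
            kept.append(event)      # negative subsampling requirement: always kept
            continue
        # Positive subsampling requirement: keep one event per bin.
        b = (int(event[0] // bin_width), int(event[1] // bin_width))
        if b not in seen_bins:
            seen_bins.add(b)
            kept.append(event)
    return kept


gates = [rectangle_gate((5.0, 5.0), (6.0, 6.0))]   # region holding cells of interest
events = [(0.1, 0.1), (0.2, 0.2), (5.1, 5.2), (5.3, 5.4)]
kept = subsample_with_gates(events, gates)
```

Here the two gated events share a bin but are both retained, while the second ungated event is dropped because its bin was already seen.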

In some embodiments, the method includes transforming second flow cytometry event data, associated with a second event of a second plurality of events in the flow cytometry event data set, from the high-dimensional space into second transformed flow cytometry event data associated with the second event of the second plurality of events in the first low-dimensional space.

The second event of the second plurality of events may be associated with the positive subsampling requirement. The second transformed flow cytometry event data (associated with the second event of the second plurality of events) may be associated with a second bin of the first plurality of bins. The second bin associated with the second transformed flow cytometry event data (associated with the second event of the second plurality of events) and the first bin associated with the first transformed flow cytometry event data (associated with the first event of the first plurality of events) are the same. The generating may include generating the subsampled flow cytometry event data set of the flow cytometry event data so that it comprises the second flow cytometry event data associated with the second event of the second plurality of events. The method may include determining that the last event of the first plurality of events is associated with a time parameter or an event count that exceeds a predetermined threshold. The method may include resetting the in-memory data structure. The method may include adding, to the in-memory data structure, the second bin associated with the second transformed flow cytometry event data (associated with the second event of the second plurality of events). In some embodiments, the method may include receiving a subsampling parameter level. The method may include determining the predetermined threshold based on the subsampling parameter level.
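The reset behaviour can be sketched as follows: once the number of events processed reaches a threshold, the set of seen bins is cleared, so an event from the next plurality of events is kept even if its bin matches one used earlier. The event-count window, the function names, and the linear mapping from subsampling level to threshold are assumptions made for illustration; the text equally allows a time-based threshold.

```python
from typing import Hashable, List, Sequence


def subsample_windowed(bin_ids: Sequence[Hashable], window_size: int) -> List[int]:
    """Return indices of kept events; seen bins are reset every `window_size` events."""
    seen = set()
    kept = []
    for i, b in enumerate(bin_ids):
        if i > 0 and i % window_size == 0:
            seen.clear()            # threshold reached: reset the in-memory structure
        if b not in seen:
            seen.add(b)
            kept.append(i)
    return kept


def window_size_for_level(level: int, base: int = 1000) -> int:
    """Hypothetical mapping from a subsampling parameter level to the threshold."""
    return base * level


bin_ids = [7, 7, 3, 7, 3, 9]        # bin number of each event, in arrival order
kept_indices = subsample_windowed(bin_ids, window_size=3)
```

With a window of three events, the fourth event (bin 7) is kept after the reset even though bin 7 was already used in the first window. Under this assumed mapping, a larger window discards more events.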

In some embodiments, transforming the first flow cytometry event data comprises transforming the first flow cytometry event data using a first dimensionality reduction function. Transforming the second flow cytometry event data may comprise transforming the second flow cytometry event data using the first dimensionality reduction function. The first dimensionality reduction function and/or the second dimensionality reduction function may be a linear dimensionality reduction function. The first dimensionality reduction function and/or the second dimensionality reduction function may be a nonlinear dimensionality reduction function. The nonlinear dimensionality reduction function may be t-distributed stochastic neighbor embedding (t-SNE). The method may include receiving the first dimensionality reduction function or an identifier thereof.
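A linear dimensionality reduction function can be written as a fixed matrix applied to each event, which makes it cheap to apply event by event. The sketch below is one such function under stated assumptions: the weight matrix `W` is invented for illustration and is not derived from any data or from the patent.

```python
from typing import Callable, Sequence, Tuple

Event = Sequence[float]


def make_linear_reduction(weights: Sequence[Sequence[float]]) -> Callable[[Event], Tuple[float, ...]]:
    """Build a function that maps an event x to W @ x, one output per row of W."""
    def reduce(event: Event) -> Tuple[float, ...]:
        return tuple(sum(w * x for w, x in zip(row, event)) for row in weights)
    return reduce


# Hypothetical 2 x 4 matrix: four measured parameters reduced to two dimensions.
W = [
    [1.0, 0.0, 0.0, 0.0],   # first output: the first parameter unchanged
    [0.0, 0.5, 0.5, 0.0],   # second output: mean of the second and third parameters
]
reduce_event = make_linear_reduction(W)
low_dim_point = reduce_event((2.0, 4.0, 6.0, 8.0))
```

A nonlinear method such as t-SNE has no fixed matrix of this kind, so applying it to newly arriving events would need a separately fitted mapping; that part is not sketched here.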

In some embodiments, transforming the first flow cytometry event data comprises transforming the first flow cytometry event data, using a second dimensionality reduction function, into first transformed flow cytometry event data associated with the first event in a second low-dimensional space. The second low-dimensional space may be associated with a second plurality of bins.

The first transformed flow cytometry event data in the second low-dimensional space may be associated with a first bin of the second plurality of bins. Transforming the second flow cytometry event data may comprise transforming the second flow cytometry event data, using the second dimensionality reduction function, into second transformed flow cytometry event data associated with the second event in the second low-dimensional space. The second transformed flow cytometry event data in the second low-dimensional space may be associated with a second bin of the second plurality of bins. The first bin of the first plurality of bins may be associated with a first cell type of interest. The second bin of the second plurality of bins may be associated with a second cell type of interest. The second bin of the first plurality of bins is not associated with the first cell type of interest. The second bin of the first plurality of bins is not associated with the second cell type of interest. The first bin of the second plurality of bins is not associated with the second cell type of interest. The first bin of the second plurality of bins is not associated with the first cell type of interest. The combination of the first bin of the first plurality of bins and the first bin of the second plurality of bins is associated with the first cell type of interest. The combination of the second bin of the first plurality of bins and the second bin of the second plurality of bins is associated with the second cell type of interest. The combination of the first bin of the first plurality of bins and the second bin of the second plurality of bins is associated with neither the first cell type of interest nor the second cell type of interest. The combination of the second bin of the first plurality of bins and the first bin of the second plurality of bins is associated with neither the first cell type of interest nor the second cell type of interest.
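One way to picture the bin combinations described above is a lookup keyed by a pair of bin numbers, one from each low-dimensional space. The table contents, bin numbers, and function name below are hypothetical; the sketch only shows that a cell type is tied to the combination rather than to either bin alone.

```python
from typing import Dict, Optional, Tuple

# Key: (bin number in the first low-dimensional space,
#       bin number in the second low-dimensional space). Values are illustrative.
CELL_TYPE_BY_BIN_PAIR: Dict[Tuple[int, int], str] = {
    (1, 1): "first cell type of interest",
    (2, 2): "second cell type of interest",
}


def cell_type_for(bin_in_space_1: int, bin_in_space_2: int) -> Optional[str]:
    """Return the cell type tied to this bin combination, or None if there is none."""
    return CELL_TYPE_BY_BIN_PAIR.get((bin_in_space_1, bin_in_space_2))


matched = cell_type_for(1, 1)      # first bins of both spaces
unmatched = cell_type_for(1, 2)    # mixed combination, not tied to either cell type
```

The same pair could also serve as the key stored in the in-memory data structure, so that events are deduplicated per combination of bins.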

In some embodiments, two bins of the first plurality of bins have the same size. Each bin of the first plurality of bins may have the same size. Two bins of the first plurality of bins may have different sizes. Two bins of the first plurality of bins may contain (approximately) the same number of transformed flow cytometry event data. Each bin of the first plurality of bins may contain approximately the same number of transformed flow cytometry event data. The method may include determining the size of each bin of the first plurality of bins. The method may include determining the size of each bin of the first plurality of bins based on a plurality of gates. The method may include determining the size of each bin of the first plurality of bins based on the transformed flow cytometry event data associated with a plurality of cells of interest.
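The two sizing options mentioned above, bins of equal size and bins holding roughly equal numbers of events, can be contrasted along a single low-dimensional axis. This is a minimal sketch with made-up values; the function names and the quantile-style edge selection are assumptions, and sizing bins from gates is not covered.

```python
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Bin edges that split the observed range into bins of the same size."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def equal_count_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Bin edges chosen so that each bin holds about the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(i * n) // n_bins] for i in range(n_bins)]
    edges.append(ordered[-1])       # closing edge at the largest value
    return edges


values = [0.0, 1.0, 2.0, 3.0, 10.0, 20.0, 30.0, 40.0]
width_edges = equal_width_edges(values, 4)
count_edges = equal_count_edges(values, 4)
```

With these values the equal-size bins crowd half of the events into the first bin, while the equal-count edges place two events in every bin, giving finer bins where the data are dense.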

The present invention includes embodiments of a computing system for subsampling flow cytometry event data. In some embodiments, the computing system comprises: a non-transitory memory configured to store executable instructions; and a processor (e.g., a hardware processor or a virtual processor) in communication with the non-transitory memory, the processor programmed by the executable instructions to: transform first flow cytometry event data, associated with a first event of a first plurality of events in a flow cytometry event data set, from a high-dimensional space into first transformed flow cytometry event data associated with the first event in a first low-dimensional space, wherein the first event is associated with a positive subsampling requirement, wherein the first low-dimensional space is associated with a first plurality of bins, and wherein the first transformed flow cytometry event data is associated with a first bin of the first plurality of bins. The processor may be programmed by the executable instructions to: transform second flow cytometry event data, associated with a second event of the first plurality of events in the flow cytometry event data set, from the high-dimensional space into second transformed flow cytometry event data associated with the second event in the first low-dimensional space, wherein the second event is associated with the positive subsampling requirement, and wherein the second transformed flow cytometry event data is associated with a second bin of the first plurality of bins. The processor may be programmed by the executable instructions to determine that the first bin associated with the first transformed flow cytometry event data and the second bin associated with the second transformed flow cytometry event data are different. The processor may be programmed by the executable instructions to generate a subsampled flow cytometry event data set of the flow cytometry event data, the subsampled data set comprising the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.

In some embodiments, the processor is programmed by the executable instructions to receive flow cytometry event data comprising the first flow cytometry event data and the second flow cytometry event data. The processor may be programmed by the executable instructions to determine that the first flow cytometry event data of the first event of the first plurality of events is associated with the positive subsampling requirement. The processor may be programmed by the executable instructions to determine that the second flow cytometry event data of the second event of the first plurality of events is associated with the positive subsampling requirement. The processor may be programmed by the executable instructions to determine that the first transformed flow cytometry event data is associated with the first bin of the first plurality of bins. The processor may be programmed by the executable instructions to determine that the second transformed flow cytometry event data is associated with the second bin of the first plurality of bins.

In some embodiments, the processor is programmed by the executable instructions to determine a first descriptor of the first transformed flow cytometry event data based on the first bin of the first plurality of bins. The processor may be programmed by the executable instructions to determine a second descriptor of the second transformed flow cytometry event data based on the second bin of the first plurality of bins. The first descriptor of the first transformed flow cytometry event data associated with the first bin may be a first bin number of the first bin of the first plurality of bins; and/or the second descriptor of the second transformed flow cytometry event data associated with the second bin may be a second bin number of the second bin of the first plurality of bins. The first flow cytometry event data may be associated with a first rare cell and/or the second flow cytometry event data may be associated with a second rare cell. The first rare cell and the second rare cell may be cells of different cell types.

In some embodiments, the processor is programmed by the executable instructions to: add the first bin, the first descriptor, and/or the first bin number to a memory data structure; and/or add the second bin, the second descriptor, and/or the second bin number to the memory data structure. In some embodiments, the processor is programmed by the executable instructions to transform third flow cytometry event data, associated with a third event in the first plurality of events, of the flow cytometry event data set in the high-dimensional space into third transformed flow cytometry event data associated with the third event in the first low-dimensional space. The third event may be associated with the positive subsampling requirement. The third transformed flow cytometry event data may be associated with a third bin in the first plurality of bins. The processor may be programmed by the executable instructions to determine that the third bin associated with the third transformed flow cytometry event data is the first bin associated with the first transformed flow cytometry event data or the second bin associated with the second transformed flow cytometry event data. In that case, the third flow cytometry event data may not be in the subsampled flow cytometry event data set. The processor may be programmed by the executable instructions to determine a third descriptor of the third transformed flow cytometry event data based on the third bin in the first plurality of bins. The third descriptor of the third transformed flow cytometry event data associated with the third bin may be a third bin number of the third bin in the first plurality of bins. The processor may be programmed by the executable instructions to determine that the third bin, the third descriptor, and/or the third bin number is not in the memory data structure.
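The bin-number descriptor and the memory data structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `bin_number` and `subsample`, the uniform grid over a fixed range `[lo, hi)`, and the use of a Python `set` as the memory data structure are all hypothetical choices made for the example.

```python
def bin_number(point, lo, hi, bins_per_axis):
    """Map a low-dimensional point to one integer bin number (the descriptor).

    Assumes a uniform grid over [lo, hi) on every axis (hypothetical choice).
    """
    number = 0
    for coordinate in point:
        index = int((coordinate - lo) / (hi - lo) * bins_per_axis)
        # Clamp so that out-of-range points fall into the edge bins.
        index = min(max(index, 0), bins_per_axis - 1)
        number = number * bins_per_axis + index
    return number


def subsample(events, transform, lo, hi, bins_per_axis):
    """Keep the first event seen in each bin; drop later events in that bin."""
    seen_bins = set()  # stands in for the memory data structure
    kept = []
    for event in events:
        descriptor = bin_number(transform(event), lo, hi, bins_per_axis)
        if descriptor not in seen_bins:
            seen_bins.add(descriptor)
            kept.append(event)  # the original high-dimensional data is kept
    return kept
```

For instance, with a transform that keeps only the first two parameters, two events that land in the same grid cell yield a single retained event, while an event in a different cell is retained as well.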

In some embodiments, the processor is programmed by the executable instructions to determine that fourth flow cytometry event data, associated with a fourth event in the first plurality of events, is associated with a negative subsampling requirement. To generate the subsampled flow cytometry event data set, the processor may be programmed by the executable instructions to generate the subsampled flow cytometry event data set of the flow cytometry event data such that it includes the fourth flow cytometry event data associated with the fourth event. The processor may be programmed by the executable instructions to receive a plurality of gates defining a plurality of cells of interest. The fourth flow cytometry event data may be associated with a cell of interest in the plurality of cells of interest. The fourth flow cytometry event data may be associated with a sorted cell.
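One way to read the negative subsampling requirement is that events falling inside any gate of interest bypass the bin check and are always retained. The sketch below assumes that reading; `subsample_with_gates`, the callable `bin_of`, and the representation of gates as predicate functions are hypothetical.

```python
def subsample_with_gates(events, bin_of, gates):
    """Always keep gated events; keep other events only for unseen bins."""
    seen_bins = set()
    kept = []
    for event in events:
        if any(gate(event) for gate in gates):
            # Negative subsampling requirement: never discard this event.
            kept.append(event)
            continue
        bin_id = bin_of(event)
        if bin_id not in seen_bins:
            seen_bins.add(bin_id)
            kept.append(event)
    return kept
```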

In some embodiments, the processor is programmed by the executable instructions to transform second flow cytometry event data, associated with a second event in a second plurality of events, of the flow cytometry event data set in the high-dimensional space into second transformed flow cytometry event data associated with the second event in the second plurality of events in the first low-dimensional space. The second event in the second plurality of events may be associated with the positive subsampling requirement. The second transformed flow cytometry event data associated with the second event in the second plurality of events may be associated with a second bin in the first plurality of bins. The second bin associated with the second transformed flow cytometry event data (associated with the second event in the second plurality of events) and the first bin associated with the first transformed flow cytometry event data (associated with the first event in the first plurality of events) may be the same. To generate the subsampled flow cytometry event data set, the processor may be programmed by the executable instructions to generate the subsampled flow cytometry event data set of the flow cytometry event data such that it includes the second flow cytometry event data associated with the second event in the second plurality of events. The processor may be programmed by the executable instructions to determine that a last event in the first plurality of events is associated with a time parameter or a number of events exceeding a predetermined threshold. The processor may be programmed by the executable instructions to reset the memory data structure. The processor may be programmed by the executable instructions to add, to the memory data structure, the second bin associated with the second transformed flow cytometry event data associated with the second event in the second plurality of events.
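The reset behavior can be illustrated with an event-count threshold: once a window of events has been processed, the memory data structure is cleared, so an event whose bin was already seen in the previous window is retained again. The function name, the `bin_of` callable, and the choice of an event count rather than a time parameter are assumptions made for this sketch.

```python
def subsample_with_reset(events, bin_of, max_events_per_window):
    """Keep one event per bin per window of `max_events_per_window` events."""
    seen_bins = set()
    events_in_window = 0
    kept = []
    for event in events:
        if events_in_window >= max_events_per_window:
            # Threshold reached: reset the memory data structure.
            seen_bins.clear()
            events_in_window = 0
        events_in_window += 1
        bin_id = bin_of(event)
        if bin_id not in seen_bins:
            seen_bins.add(bin_id)
            kept.append(event)
    return kept
```

A smaller window retains more events; a window larger than the whole data set reduces to keeping one event per bin.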

In some embodiments, the processor is programmed by the executable instructions to receive a subsampling parameter level. The processor may be programmed by the executable instructions to determine the predetermined threshold based on the subsampling parameter level.

In some embodiments, to transform the first flow cytometry event data, the processor may be programmed by the executable instructions to transform the first flow cytometry event data using the first dimensionality reduction function; and/or to transform the second flow cytometry event data, the processor may be programmed by the executable instructions to transform the second flow cytometry event data using the first dimensionality reduction function. The first dimensionality reduction function and/or the second dimensionality reduction function may be a linear dimensionality reduction function. The first dimensionality reduction function and/or the second dimensionality reduction function may be a nonlinear dimensionality reduction function. The nonlinear dimensionality reduction function may be t-distributed stochastic neighbor embedding (t-SNE). The processor may be programmed by the executable instructions to first receive the dimensionality reduction function or an identifier thereof.
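As a concrete instance of a linear dimensionality reduction function, each output coordinate can be a fixed weighted sum of the input parameters. The sketch below is illustrative only; `make_linear_reduction` and the weight matrix are hypothetical, and the text does not specify how the weights would be obtained.

```python
def make_linear_reduction(weights):
    """Return a function projecting an event onto len(weights) dimensions.

    `weights` holds one row per output dimension and one column per input
    parameter (hypothetical layout).
    """
    def reduce(event):
        return tuple(
            sum(w * x for w, x in zip(row, event)) for row in weights
        )
    return reduce


# Project a four-parameter event onto two dimensions.
to_2d = make_linear_reduction([[1, 0, 0, 0], [0, 0.5, 0.5, 0]])
```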

In some embodiments, to transform the first flow cytometry event data, the processor is programmed by the executable instructions to transform, using a second dimensionality reduction function, the first flow cytometry event data into first transformed flow cytometry event data associated with the first event in a second low-dimensional space. The second low-dimensional space may be associated with a second plurality of bins. The first transformed flow cytometry event data in the second low-dimensional space may be associated with a first bin in the second plurality of bins. To transform the second flow cytometry event data, the processor may be programmed by the executable instructions to transform, using the second dimensionality reduction function, the second flow cytometry event data into second transformed flow cytometry event data associated with the second event in the second low-dimensional space.

The second transformed flow cytometry event data in the second low-dimensional space may be associated with a second bin in the second plurality of bins. The first bin in the first plurality of bins may be associated with a first cell type of interest, the second bin in the second plurality of bins may be associated with a second cell type of interest, the second bin in the first plurality of bins may not be associated with the first cell type of interest, the second bin in the first plurality of bins may not be associated with the second cell type of interest, the first bin in the second plurality of bins may not be associated with the second cell type of interest, and/or the first bin in the second plurality of bins may not be associated with the first cell type of interest. The combination of the first bin in the first plurality of bins and the first bin in the second plurality of bins may be associated with a first cell type of interest, and/or the combination of the second bin in the first plurality of bins and the second bin in the second plurality of bins may be associated with a second cell type of interest. The combination of the first bin in the first plurality of bins and the second bin in the second plurality of bins may be associated with neither the first cell type of interest nor the second cell type of interest, and/or the combination of the second bin in the first plurality of bins and the first bin in the second plurality of bins may be associated with neither the first cell type of interest nor the second cell type of interest.
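The association between combinations of bins and cell types of interest can be pictured as a lookup keyed by a pair of bin numbers, one from each low-dimensional space. The bin numbers and cell type labels below are invented for illustration.

```python
# Hypothetical mapping: (bin in first space, bin in second space) -> cell type.
CELL_TYPE_BY_BIN_PAIR = {
    (3, 7): "cell type A",
    (5, 2): "cell type B",
}


def cell_type_for(bin_in_space_1, bin_in_space_2):
    """Return the associated cell type, or None for an unassociated pair."""
    return CELL_TYPE_BY_BIN_PAIR.get((bin_in_space_1, bin_in_space_2))
```

Here the mixed pairs `(3, 2)` and `(5, 7)` map to no cell type, mirroring the case in which only matching combinations of bins are associated with a cell type of interest.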

In some embodiments, two bins in the first plurality of bins have the same size. Each bin in the first plurality of bins may have the same size. Two bins in the first plurality of bins may have different sizes. Two bins in the first plurality of bins may contain approximately the same number of transformed flow cytometry event data. Each bin in the first plurality of bins may contain approximately the same number of transformed flow cytometry event data. The processor may be programmed by the executable instructions to determine the size of each bin in the first plurality of bins. The processor may be programmed by the executable instructions to determine the size of each bin in the first plurality of bins based on a plurality of gates. The processor may be programmed by the executable instructions to determine the size of each bin in the first plurality of bins based on the transformed flow cytometry event data associated with a plurality of cells of interest.
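The difference between bins of equal size and bins holding approximately equal numbers of events can be shown along one axis. The two helper functions below are hypothetical and return bin edges; they assume a non-empty list of values and at least one bin.

```python
def equal_width_edges(values, n_bins):
    """Edges of n_bins bins that all have the same size."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(n_bins + 1)]


def equal_count_edges(values, n_bins):
    """Edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    inner = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return [ordered[0]] + inner + [ordered[-1]]
```

With skewed data, equal-width bins leave most bins nearly empty, whereas equal-count bins become narrow where events are dense and wide where they are sparse.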

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a functional block diagram of one example of a sorting control system for analyzing and displaying biological events.

FIG. 2A is a schematic diagram of a particle sorter system according to one embodiment presented herein.

FIG. 2B is a schematic diagram of another particle sorter system according to one embodiment presented herein.

FIG. 3 shows a functional block diagram of a particle analysis system for computation-based sample analysis and particle characterization.

FIG. 4 is a flow chart showing an exemplary method of subsampling flow cytometry event data.

FIG. 5 is an illustrative computing system configured to implement a method of subsampling flow cytometry event data.

DETAILED DESCRIPTION

In the more detailed description below, reference is made to the accompanying drawings, which form a part hereof. In the drawings, similar symbols typically identify similar components unless context dictates otherwise. The illustrative embodiments described in the detailed description, drawings, and claims are not meant to be limiting. Other embodiments may be used, and other changes may be made, without departing from the spirit or scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein and illustrated in the drawings, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are explicitly contemplated herein and made part of the disclosure herein.

Particle analyzers, such as flow cytometers and scanning cytometers, are analytical tools that characterize particles based on electro-optical measurements such as light scatter and fluorescence. In a flow cytometer, for example, particles in a fluid suspension (e.g., molecules, analyte-bound beads, or individual cells) pass through a detection region in which they are exposed to excitation light, typically from one or more lasers, and their light scattering and fluorescence properties are measured. Particles or their components are typically labeled with fluorescent dyes to facilitate detection. Many different particles or components can be detected simultaneously by labeling them with spectrally distinct fluorescent dyes.

In some embodiments, the analyzer includes multiple photodetectors, one for each scatter parameter to be measured and one or more for each distinct dye to be detected. For example, some embodiments include spectral configurations in which more than one sensor or detector is used per dye. The data obtained comprise the signals measured for each of the light scatter detectors and the fluorescence emissions.

A particle analyzer may further comprise means for recording the measured data and analyzing the data. For example, data storage and analysis may be carried out using a computer connected to the detection electronics. For example, the data can be stored in tabular form, where each row corresponds to the data for one particle and the columns correspond to the measured features. Storing data from a particle analyzer in a standard file format, such as the Flow Cytometry Standard ("FCS") file format, facilitates analyzing the data with separate programs and/or machines. With current analysis methods, the data are typically displayed as one-dimensional histograms or two-dimensional (2D) plots for ease of visualization, but other methods may be used to visualize multidimensional data.
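The tabular layout described above, with one row per particle and one column per measured feature, and a one-dimensional histogram over a single column can be sketched as follows. The parameter names and numeric values are made up for illustration, and this sketch does not read or write the FCS file format.

```python
from collections import Counter

# Hypothetical column names and event rows (one row per particle).
PARAMETERS = ("FSC", "SSC", "FL1")
EVENTS = [
    (520.0, 130.0, 12.5),
    (498.0, 141.0, 880.0),
    (1210.0, 655.0, 15.0),
]


def column(rows, name):
    """Return all values of one measured parameter."""
    index = PARAMETERS.index(name)
    return [row[index] for row in rows]


def histogram(values, bin_width):
    """Count values per bin; keys are integer bin indices."""
    return Counter(int(value // bin_width) for value in values)
```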

For example, the parameters measured using a flow cytometer typically include light scattered by the particle at a narrow angle along a mostly forward direction, referred to as forward scatter (FSC); light scattered by the particle in a direction orthogonal to the excitation laser, referred to as side scatter (SSC); and light emitted by fluorescent molecules and measured in one or more detectors that each measure signal over a range of spectral wavelengths, or light emitted by the fluorescent dye that is primarily detected in that particular detector or detector array. Different cell types can be identified by the light scatter characteristics and fluorescence emissions that result from labeling various cellular proteins or other constituents with fluorescent dye-labeled antibodies or other fluorescent probes.

Both flow cytometers and scanning cytometers are commercially available from, for example, BD Biosciences (San Jose, CA). Flow cytometry is described in, for example, Landy et al. (eds.), Clinical Flow Cytometry, Annals of the New York Academy of Sciences, Vol. 677 (1993); Bauer et al. (eds.), Clinical Flow Cytometry: Principles and Applications, Williams & Wilkins (1993); Ormerod (ed.), Flow Cytometry: A Practical Approach, Oxford University Press (1994); Jaroszeski et al. (eds.), Flow Cytometry Protocols, Methods in Molecular Biology, No. 91, Humana Press (1997); and Shapiro, Practical Flow Cytometry, 4th ed., Wiley-Liss (2003); the contents of these publications are incorporated herein by reference.

Fluorescence imaging microscopy is described in, for example, Pawley (ed.), Handbook of Biological Confocal Microscopy, 2nd ed., Plenum Press (1989), which is incorporated herein by reference.

The data obtained from an analysis of cells (or other particles) by multicolor flow cytometry are multidimensional, with each cell corresponding to a point in a multidimensional space defined by the measured parameters. Populations of cells or particles are identified as clusters of points in the data space. Clusters and populations can be identified manually by drawing a gate around a population displayed in one or more two-dimensional plots of the data, referred to as "scatter plots" or "dot plots." Alternatively, clusters can be identified, and gates that define the limits of the populations can be determined, automatically. Examples of methods for automated gating are described in, for example, U.S. Patent Nos. 4,845,653; 5,627,040; 5,739,000; 5,795,727; 5,962,238; 6,014,904; 6,944,338; and 8,990,047; the contents of these patents are incorporated herein by reference.

Flow cytometry is an effective method for analyzing and isolating biological particles such as cells and constituent molecules, and it therefore has a wide range of diagnostic and therapeutic applications. The method uses a fluid medium to linearly separate particles so that they can pass in single file through a detection apparatus. Individual cells can be distinguished according to their location in the fluid medium and the presence or absence of detectable markers. A flow cytometer can thus be used to characterize, and to produce a diagnostic profile of, a population of biological particles.

Isolation of biological particles has been achieved by adding a sorting or collection capability to flow cytometers. Particles in a segregated stream that are detected as having one or more desired characteristics are individually isolated from the sample stream by mechanical or electrical separation. This method of flow sorting has been used to sort cells of different types, to separate sperm bearing X and Y chromosomes for animal breeding, to sort chromosomes for genetic analysis, and to isolate particular organisms from complex biological populations.

Gating is used to help make sense of, and classify, the large quantity of data that may be generated from a sample. Given the large quantity of data presented for a given sample, there is a need to efficiently control the graphical display of the data.

Fluorescence-activated particle sorting or cell sorting is a specialized type of flow cytometry.

Fluorescence-activated particle sorting or cell sorting provides a method for sorting a heterogeneous mixture of particles into one or more containers, one cell at a time, based on the specific light scattering and fluorescence characteristics of each cell. It records fluorescent signals from individual cells and physically separates the cells of particular interest. The acronym FACS is a trademark owned by Becton, Dickinson and Company (Franklin Lakes, NJ) and may be used to refer to devices that perform fluorescence-activated particle sorting or cell sorting.

The particle suspension is placed near the center of a narrow, rapidly flowing stream of liquid. The flow is arranged so that, on average, there is a large separation between particles relative to their diameter as they arrive at the detection region at random (e.g., as a Poisson process). A vibrating mechanism causes the emerging fluid stream to break up stably into individual droplets that contain the particles previously characterized in the detection region. The system can generally be adjusted so that the probability of more than one particle being present in a droplet is low. If a particle is classified as one to be collected, a charge can be applied to the flow cell and the emerging stream during the period of time in which one or more droplets form and break off from the stream. These charged droplets then pass through an electrostatic deflection system that diverts them into target containers according to the charge applied to each droplet.
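If particle arrivals are modeled as a Poisson process, as the paragraph above suggests, the chance that a droplet holds more than one particle follows from the mean number of particles per droplet, $\lambda$: $P(k \ge 2) = 1 - e^{-\lambda}(1 + \lambda)$. The sketch below evaluates this expression; the function name and the example rates in the usage note are assumptions, not values from this document.

```python
import math


def probability_multiple_particles(event_rate_hz, droplet_rate_hz):
    """P(two or more particles in one droplet) under a Poisson model."""
    mean_per_droplet = event_rate_hz / droplet_rate_hz  # lambda
    p_zero_or_one = math.exp(-mean_per_droplet) * (1.0 + mean_per_droplet)
    return 1.0 - p_zero_or_one
```

For example, a hypothetical 5,000 events per second at 50,000 droplets per second gives $\lambda = 0.1$ and a probability of roughly 0.5% that a droplet contains more than one particle.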

A sample may include thousands, if not millions, of cells. Cells may be sorted to purify a sample to the cells of interest. The sorting process can generally identify three kinds of cells: cells of interest, cells that are not of interest, and cells that cannot be identified. To sort cells with high purity (e.g., a high concentration of cells of interest), a droplet-generating cell sorter can electronically abort the sort if a desired cell is too close to another, unwanted cell, thereby reducing contamination of the sorted population caused by inadvertently including an unwanted particle in a droplet that contains a particle of interest.

Disclosed herein are systems, devices, computer-readable media, and methods for subsampling flow cytometry event data. In some embodiments, a method comprises, under control of a processor: transforming first flow cytometry event data associated with a first event in a first plurality of events in a high-dimensional space into first transformed flow cytometry event data associated with the first event in a first low-dimensional space. The first event may be associated with a positive subsampling requirement.

The first low-dimensional space may be associated with a first plurality of bins. The first transformed flow cytometry event data may be associated with a first bin in the first plurality of bins. The method may comprise transforming second flow cytometry event data associated with a second event in the first plurality of events in the high-dimensional space into second transformed flow cytometry event data associated with the second event in the first low-dimensional space. The second event may be associated with the positive subsampling requirement. The second transformed flow cytometry event data may be associated with a second bin in the first plurality of bins. The method may comprise determining that the first bin associated with the first transformed flow cytometry event data and the second bin associated with the second transformed flow cytometry event data are different. The method may comprise generating a subsampled flow cytometry event data set of the flow cytometry event data that includes the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.

The present disclosure includes embodiments of a computing system for subsampling flow cytometry event data. In some embodiments, the computing system comprises: a non-transitory memory configured to store executable instructions; and a processor (e.g., a hardware processor or a virtual processor) in communication with the non-transitory memory, the processor being programmed by the executable instructions to: transform first flow cytometry event data, associated with a first event in a first plurality of events, of a flow cytometry event data set in a high-dimensional space into first transformed flow cytometry event data associated with the first event in a first low-dimensional space, wherein the first event is associated with a positive subsampling requirement, wherein the first low-dimensional space is associated with a first plurality of bins, and wherein the first transformed flow cytometry event data is associated with a first bin in the first plurality of bins. The processor may be programmed by the executable instructions to transform second flow cytometry event data, associated with a second event in the first plurality of events, of the flow cytometry event data set in the high-dimensional space into second transformed flow cytometry event data associated with the second event in the first low-dimensional space, wherein the second event is associated with the positive subsampling requirement, and wherein the second transformed flow cytometry event data is associated with a second bin in the first plurality of bins.

The processor may be programmed by the executable instructions to determine that the first bin associated with the first transformed flow cytometry event data and the second bin associated with the second transformed flow cytometry event data are different. The processor may be programmed by the executable instructions to generate a subsampled flow cytometry event data set of the flow cytometry event data that includes the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.

Definitions

As used herein, the terms set forth with particularity below have the following definitions. Unless otherwise defined in this section, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

As used herein, "system," "instrument," "apparatus," and "device" generally encompass hardware components (e.g., mechanical and electronic hardware) and, in some embodiments, associated software components (e.g., specialized computer programs for graphics control).

As used herein, "event" or "event data" generally refers to the data (e.g., an assembled packet of data) measured from a single particle, such as a cell or a synthetic particle. Typically, the data measured from a single particle include a number of parameters or features, including one or more light scattering parameters or features and at least one other parameter or feature derived from fluorescence detected from the particle, such as the intensity of the fluorescence. Thus, each event can be represented as a vector of parameter and feature measurements, where each measured parameter or feature corresponds to one dimension of the data space. In some embodiments, the data measured from a single particle include image, electrical, temporal, or acoustic data. An event may be associated with an experiment, an assay, or a sample source, which may be identified in the measurement data.

As used herein, a "population" or "subpopulation" of particles, such as cells or other particles, generally refers to a group of particles that possess properties (e.g., optical, impedance, or temporal properties) with respect to one or more measured parameters such that the measured parameter data form a cluster in the data space. Thus, populations can be recognized as clusters in the data. Conversely, each data cluster is generally interpreted as corresponding to a population of a particular type of cell or particle, although clusters that correspond to noise or background are typically also observed.

A cluster may be defined in a subset of the dimensions (e.g., with respect to a subset of the measured parameters), corresponding to populations that differ in only a subset of the measured parameters or features extracted from the measurements of the cell or particle.

As used herein, "gate" generally refers to a classifier boundary that identifies a subset of data of interest. In cytometry, a gate may bound a group of events of particular interest. As used herein, "gating" generally refers to the process of classifying the data in a given data set using a defined gate, where the gate may be one or more regions of interest combined with Boolean logic.
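The definition of a gate as regions of interest combined with Boolean logic lends itself to a small sketch in which each region is a predicate over an event and gates are built by combining predicates. The rectangular region shape and all function names here are assumptions for illustration.

```python
def rectangle(x_index, y_index, x_range, y_range):
    """Region of interest: a rectangle on the plot of two event parameters."""
    def contains(event):
        return (x_range[0] <= event[x_index] <= x_range[1]
                and y_range[0] <= event[y_index] <= y_range[1])
    return contains


def gate_and(*regions):
    """Gate satisfied only when the event lies in every region."""
    return lambda event: all(region(event) for region in regions)


def gate_or(*regions):
    """Gate satisfied when the event lies in at least one region."""
    return lambda event: any(region(event) for region in regions)


def gate_not(region):
    """Gate satisfied when the event lies outside the region."""
    return lambda event: not region(event)
```

Gating a data set then amounts to filtering its events with the combined predicate, for example `[e for e in events if gate(e)]`.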

Specific examples of various embodiments, and of the systems in which they may be implemented, are described further below.

Sorting Control System

FIG. 1 shows a functional block diagram of one example of a sorting control system, such as an analysis controller 100, for analyzing and displaying biological events. The analysis controller 100 can be configured to implement a variety of processes for controlling the graphical display of biological events.

A particle analyzer or sorting system 102 can be configured to acquire biological event data. For example, a flow cytometer can generate flow cytometry event data. The particle analyzer 102 can be configured to provide the biological event data to the analysis controller 100. A data communication channel can be included between the particle analyzer 102 and the analysis controller 100, and the biological event data can be provided to the analysis controller 100 via that channel.

The analysis controller 100 can be configured to receive biological event data from the particle analyzer 102. The biological event data received from the particle analyzer 102 can include flow cytometry event data. The analysis controller 100 can be configured to provide a graphical display including a first plot of the biological event data to a display device 106. For example, the analysis controller 100 can be further configured to render a region of interest as a gate around a population of the biological event data shown by the display device 106, overlaid on the first plot. In some embodiments, the gate can be a logical combination of one or more graphical regions of interest drawn on a single-parameter histogram or a bivariate plot.

Alternatively, the analysis controller 100 can be further configured to display the biological event data inside the gate on the display device 106 differently from other events in the biological event data outside the gate. For example, the analysis controller 100 can be configured to render the biological event data contained inside the gate in a color distinct from that of the biological event data outside the gate. The display device 106 can be implemented as a monitor, a tablet computer, a smartphone, or another electronic device configured to present graphical interfaces.

The analysis controller 100 can be configured to receive a gate selection signal identifying the gate from a first input device. For example, the first input device can be implemented as a mouse 110. The mouse 110 can send a gate selection signal to the analysis controller 100 to identify the gate to be displayed on, or manipulated via, the display device 106 (e.g., by clicking on or in the desired gate when the cursor is positioned there). In some embodiments, the first input device can be implemented as a keyboard 108 or another means for providing an input signal to the analysis controller 100, such as a touch screen, a stylus, an optical detector, or a voice recognition system. Some input devices can include multiple input functions. In such embodiments, each input function can be considered an input device. For example, as shown in FIG. 1, the mouse 110 can include a right mouse button and a left mouse button, each of which can generate a trigger event.

The trigger event can cause the analysis controller 100 to change the manner in which the data is displayed, to change which portions of the data are actually displayed on the display device 106, and/or to provide input for further processing, such as selecting a population of interest for particle sorting.

In some embodiments, the analysis controller 100 can be configured to detect when gate selection is initiated by the mouse 110. The analysis controller 100 can be further configured to automatically modify the plot visualization to facilitate the gating process. The modification can be based on the specific distribution of the biological event data received by the analysis controller 100.

The analysis controller 100 can be connected to a storage device 104. The storage device 104 can be configured to receive and store biological event data from the analysis controller 100. The storage device 104 can also be configured to receive and store flow cytometry event data from the analysis controller 100. The storage device 104 can further be configured to allow the analysis controller 100 to retrieve biological event data, such as flow cytometry event data.

The display device 106 can be configured to receive display data from the analysis controller 100. The display data can include plots of biological event data and gates outlining sections of the plots. The display device 106 can be further configured to alter the information presented according to input received from the analysis controller 100 together with input from the particle analyzer 102, the storage device 104, the keyboard 108, and/or the mouse 110.

In some embodiments, the analysis controller 100 can generate a user interface to receive example events for sorting. For example, the user interface can include a control for receiving example events or example images. The example events, images, or gates can be provided before event data for a sample is collected, or based on an initial set of events for a portion of the sample.

Particle sorter system

A common flow sorting technique, referred to as "electrostatic cell sorting," uses droplet sorting: a stream or moving liquid column containing linearly separated particles is broken into droplets, and the droplets containing particles of interest are electrically charged and deflected into a collection tube by passage through an electric field. Droplet sorting systems can form droplets at a rate of 100,000 drops per second in a fluid stream that passes through a nozzle with a diameter of less than 100 micrometers. Droplet sorting typically requires that the droplets break off from the stream at a fixed distance from the nozzle tip. That distance is normally on the order of a few millimeters from the nozzle tip; for an unperturbed fluid stream it is stable and can be maintained by oscillating the nozzle tip at a predefined frequency, with an amplitude chosen to hold the break-off constant. For example, in some embodiments, the amplitude of a sine-wave voltage pulse at a given frequency is adjusted to hold the break-off stable and constant.

Typically, the linearly separated particles in the stream are characterized as they pass through an observation point located within a flow cell or cuvette, or just below the nozzle tip. Once a particle is identified as meeting one or more desired criteria, the time at which it will reach the droplet break-off point and leave the stream in a droplet can be predicted. Ideally, a brief charge is applied to the fluid stream just before the droplet containing the selected particle breaks from the stream, and the stream is then grounded immediately after the droplet breaks off.

The droplet to be sorted retains its electrical charge as it breaks off from the fluid stream, while all other droplets are left uncharged. The charged droplet is deflected sideways by the electric field from the downward trajectory of the other droplets and is collected in a sample tube. The uncharged droplets fall directly into a drain.

FIG. 2A is a schematic diagram of a particle sorter system 200 (e.g., the particle analyzer 102) according to one embodiment presented herein. In some embodiments, the particle sorter system 200 is a cell sorter system. As shown in FIG. 2A, a droplet formation sensor 202 (e.g., a piezoelectric oscillator) is coupled to a fluid conduit 201, which can be coupled to, can include, or can be a nozzle 203. Within the fluid conduit 201, sheath fluid 204 hydrodynamically focuses a sample fluid 206 containing particles 209 into a moving liquid column 208 (e.g., a stream). Within the moving liquid column 208, the particles 209 (e.g., cells) are lined up in single file to cross a monitored area 211 (e.g., where the laser and the stream intersect) that is irradiated by an irradiation source 212 (e.g., a laser). Vibration of the droplet formation sensor 202 causes the moving liquid column 208 to break into a plurality of droplets 210, some of which contain particles 209.

In operation, a detection station 214 (e.g., an event detector) identifies when a particle of interest (or cell of interest) crosses the monitored area 211. The detection station 214 feeds into a timing circuit 228, which in turn feeds into a flash charging circuit 230. At the droplet break-off point, informed by a timed droplet delay (Δt), a flash charge can be applied to the moving liquid column 208 so that the droplet of interest carries a charge. The droplet of interest can include one or more particles or cells to be sorted. The charged droplet can then be sorted by activating deflection plates (not shown) to deflect the droplet into a vessel such as a collection tube or a multi-well or microwell sample plate, where a well or microwell can be associated with droplets of particular interest. As shown in FIG. 2A, the droplets can be collected in a drain receptacle 238.

A detection system 216 (e.g., a droplet boundary detector) serves to automatically determine the phase of the droplet drive signal when a particle of interest passes the monitored area 211. An exemplary droplet boundary detector is described in U.S. Pat. No. 7,679,039, which is incorporated herein by reference in its entirety. The detection system 216 allows the instrument to accurately calculate the position of each detected particle within a droplet. The detection system 216 can feed into an amplitude signal 220 and/or a phase signal 218, which in turn feeds (via an amplifier 222) into an amplitude control circuit 226 and/or a frequency control circuit 224.

The amplitude control circuit 226 and/or the frequency control circuit 224, in turn, controls the droplet formation sensor 202. The amplitude control circuit 226 and/or the frequency control circuit 224 can be included in a control system.

In some embodiments, the sorting electronics (e.g., the detection system 216, the detection station 214, and a processor 240) can be coupled with a memory configured to store the detected events and the sorting decisions based on them. The sorting decision can be included in the event data for a particle. In some embodiments, the detection system 216 and the detection station 214 can be implemented as a single detection unit or communicatively coupled, such that an event measurement collected by one of the detection system 216 or the detection station 214 can be provided to the non-collecting element.

FIG. 2B is a schematic diagram of a particle sorter system according to one embodiment presented herein. The particle sorter system 200 shown in FIG. 2B includes deflection plates 252 and 254. A charge is applied via a stream-charging wire in a barb. This creates a stream of droplets 210 containing particles 210 for analysis. The particles can be illuminated with one or more light sources (e.g., lasers) to generate light scatter and fluorescence information. The information for a particle is analyzed, for example, by sorting electronics or another detection system (not shown in FIG. 2B). The deflection plates 252 and 254 can be independently controlled to attract or repel the charged droplets and guide them toward a destination collection vessel (e.g., one of 272, 274, 276, or 278). As shown in FIG. 2B, the deflection plates 252 and 254 can be controlled to direct a particle along a first path 262 toward the vessel 274 or along a second path 268 toward the vessel 278. If the particle is not of interest (e.g., it does not exhibit scatter or illumination information within a specified sort range), the deflection plates can allow the particle to continue along a flow path 264. Such uncharged droplets can pass into a waste receptacle, for example via an aspirator 270.

The sorting electronics can be included to initiate collection of measurements, receive fluorescence signals for particles, and determine how to adjust the deflection plates to sort the particles. Example implementations of the embodiment shown in FIG. 2B include the BD FACSAria™ line of flow cytometers commercially available from Becton, Dickinson and Company (Franklin Lakes, NJ).

In some embodiments, one or more of the components described for the particle sorter system 200 can be used to analyze and characterize particles, with or without physically sorting the particles into collection vessels. Likewise, one or more of the components described below for the particle analysis system 300 (FIG. 3) can be used to analyze and characterize particles, with or without physically sorting the particles into collection vessels. For example, using one or more components of the particle sorter system 200 or the particle analysis system 300, particles can be grouped or displayed in a tree that includes at least three of the groups described herein.

FIG. 3 shows a functional block diagram of a particle analysis system for computation-based sample analysis and particle characterization. In some embodiments, the particle analysis system 300 is a flow system. The particle analysis system 300 shown in FIG. 3 can be configured to perform, in whole or in part, the methods described herein. The particle analysis system 300 includes a fluidics system 302. The fluidics system 302 can include or be coupled with a sample tube 310 and a moving liquid column within the sample tube, in which particles 330 (e.g., cells) of a sample move along a common sample path 320.

The particle analysis system 300 includes a detection system 304 configured to collect a signal from each particle as it passes one or more detection stations along the common sample path. A detection station 308 generally refers to a monitored area 340 of the common sample path. In some embodiments, detection can include detecting light or one or more other properties of the particles 330 as they pass through the monitored area 340. FIG. 3 shows one detection station 308 with one monitored area 340. Some embodiments of the particle analysis system 300 can include multiple detection stations. Furthermore, some detection stations can monitor more than one area.

Each signal is assigned a signal value to form a data point for each particle. As described above, this data can be referred to as event data. The data point can be a multidimensional data point that includes a value for each property measured for the particle. The detection system 304 is configured to collect a succession of such data points in a first time interval.

The particle analysis system 300 also includes a control system 306. The control system 306 can include one or more processors, an amplitude control circuit 226, and/or a frequency control circuit 224, as shown in FIG. 2B. The control system 306 shown can be operationally associated with the fluidics system 302.

The control system 306 can be configured to generate a calculated signal frequency for at least a portion of the first time interval, based on a Poisson distribution and the number of data points collected by the detection system 304 during the first time interval. The control system 306 can be further configured to generate an experimental signal frequency based on the number of data points in that portion of the first time interval. The control system 306 can additionally compare the experimental signal frequency with the calculated signal frequency or with a predetermined signal frequency.

Subsampling flow cytometry event data

Disclosed herein are systems, devices, computer-readable media, and methods for subsampling a data set (e.g., a large, high-dimensional data set) that weight rare events and populations so that they are adequately represented in the resulting subset. In some embodiments, where saving the entire data set is undesirable, all populations (including rare cells and rare populations) can be represented in a subset of the data set, or subsampled data set, that preserves them. In some embodiments, the subset or subsampled data set retains representative samples from rare subpopulations. In some embodiments, the data set can be subsampled without discarding rare events or events of interest (e.g., events corresponding to rare cells or cells of interest). The system automatically detects rare events and saves them, while discarding common events more aggressively.

In some embodiments, the data can be subsampled non-randomly (e.g., semi-randomly). A desired subsampling rate can be selected, and the data can then be fed sequentially through the subsampling method. The method can decide whether to save or discard each event (or the multidimensional event data associated with the event) on an event-by-event basis. Being able to discard events without analyzing the overall distribution of events eliminates the need to save and analyze large amounts of data.

First, the user can select a subsampling parameter level. The subsampling parameter level can determine how long the algorithm's "memory" persists.

Second, the user can select one or more transformation, or "fingerprinting," functions. A transformation or fingerprinting function can be a mathematical equation that transforms the data in some way, for example from a high-dimensional space to a low-dimensional space. For example, the transformation or fingerprinting function can be t-distributed stochastic neighbor embedding (t-SNE). An event can be transformed into an event in a low-dimensional space that is divided into bins. The bin number can then serve as a descriptor for the event.

In some embodiments, the binning can be uniform or can be based on event density. The binning can be based on automated population detection. The binning can be based in part on arbitrarily drawn gates (e.g., user-drawn gates). In some embodiments, the transformation or fingerprinting function can transform events such that similar events have the same identifier. The identifier can be smaller than the data used to generate it. The transformation can be computationally inexpensive. An inverse of the transformation or function may or may not exist. In some embodiments, multiple fingerprinting functions can be used. For example, different target populations can be defined using different fingerprinting functions. As another example, a target population can be defined based on the combined output of multiple fingerprinting functions.
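The following is a minimal sketch of one possible fingerprinting function, assuming uniform binning. A fixed linear projection stands in for the transformation here because it can be applied to one event at a time; it is a swapped-in technique and not t-SNE. The projection matrix, axis limits, bin count, and function names are all hypothetical values chosen for illustration.

```python
from typing import Sequence, Tuple


def project(event: Sequence[float],
            matrix: Sequence[Sequence[float]]) -> Tuple[float, ...]:
    """Map a high-dimensional event to a low-dimensional point (one dot product per output axis)."""
    return tuple(sum(w * v for w, v in zip(row, event)) for row in matrix)


def bin_number(point: Sequence[float], lows: Sequence[float],
               highs: Sequence[float], bins_per_axis: int) -> int:
    """Return the number of the uniform grid bin that contains the point."""
    index = 0
    for value, lo, hi in zip(point, lows, highs):
        frac = (value - lo) / (hi - lo)
        # Clamp so out-of-range points land in the nearest edge bin.
        cell = min(max(int(frac * bins_per_axis), 0), bins_per_axis - 1)
        index = index * bins_per_axis + cell
    return index


def fingerprint(event, matrix, lows, highs, bins_per_axis) -> int:
    """Descriptor of an event: the bin number of its low-dimensional image."""
    return bin_number(project(event, matrix), lows, highs, bins_per_axis)


# Hypothetical 4-parameter events reduced to 2 dimensions, 10 x 10 bins.
MATRIX = [[0.5, 0.5, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.5]]
LOWS, HIGHS, BINS = (0.0, 0.0), (1000.0, 1000.0), 10

a = fingerprint((100, 120, 900, 880), MATRIX, LOWS, HIGHS, BINS)
b = fingerprint((105, 125, 905, 885), MATRIX, LOWS, HIGHS, BINS)  # similar to a
c = fingerprint((900, 880, 100, 120), MATRIX, LOWS, HIGHS, BINS)  # different
```

In this sketch the two similar events receive the same descriptor while the dissimilar one does not, and a single integer replaces four measured values, which illustrates the "identifier smaller than the data" property described above.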

Third, the user can describe events that should not be subsampled. For example, a gate around a region of interest can be drawn automatically or by the user, and events inside that gate may be exempt from subsampling. As another example, any events that are sorted (e.g., sorted cells) may be exempt from subsampling.

The event data can be subsampled using the subsampling method disclosed herein. For example, for each event:

ii. Check whether the event should be subsampled. If not, save the event.

iii. If the event should be subsampled, generate a descriptor using the fingerprinting function.

1. Compare the descriptor to the algorithm's "memory." Has this descriptor been seen before?

a. Yes. Discard the event.

b. No. Save the event and store the descriptor in memory.

iv. Check the elapsed time or the number of events. If the time and/or the number of events exceeds the corresponding threshold generated from the user's subsampling parameter level, reset the algorithm's memory.
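The per-event steps above can be sketched as a streaming filter. This is an illustrative sketch rather than the disclosed implementation: it assumes the memory is a set of descriptors, that the reset threshold is an event count (`reset_after`, standing in for the value derived from the user's subsampling parameter level), and that the caller supplies the hypothetical `fingerprint` and `exempt` callables.

```python
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

E = TypeVar("E")


def subsample(events: Iterable[E],
              fingerprint: Callable[[E], Hashable],
              exempt: Callable[[E], bool],
              reset_after: int) -> Iterator[E]:
    """Yield the events to save; all other events are discarded."""
    seen = set()       # the algorithm "memory" of descriptors
    since_reset = 0    # events processed since the last memory reset
    for event in events:
        if exempt(event):
            yield event                    # not to be subsampled: always saved
        else:
            descriptor = fingerprint(event)
            if descriptor not in seen:     # first time seen in this window
                seen.add(descriptor)
                yield event
            # otherwise the descriptor is already in memory: discard
        since_reset += 1
        if since_reset >= reset_after:     # assumed count-based threshold
            seen.clear()
            since_reset = 0


# Hypothetical stream: a common event type with one rare event in the middle.
stream = ["common"] * 6 + ["rare"] + ["common"] * 3
kept = list(subsample(stream, fingerprint=lambda e: e,
                      exempt=lambda e: False, reset_after=5))
```

Because the function decides on each event as it arrives and holds only the descriptor set, it never needs the full data set in memory. A shorter `reset_after` keeps more of the common events, while a longer one discards them more aggressively.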

The subsampling method disclosed herein (a non-random subsampling method) can complement or supplement random subsampling methods for subsampling large data sets. Random sampling of the data can eliminate rare populations, whereas the disclosed subsampling method can include some, most, or all of the rare populations. For particle analysis (e.g., flow cytometry analysis), rare events can be highly valuable. Preserving rare populations can be useful so that they can still be detected when the reduced data set is analyzed. The non-random subsampling method can deliberately deviate from the random sampling process so that rare populations are more likely to be represented in the final subsampled data set.

Simply partitioning the data space without dimensionality reduction can leave the bins sparsely populated because of the so-called "curse of dimensionality." The dimensionality reduction transformation or function used can be a relationship-preserving embedding, which allows binning to be performed in a low-dimensional space and allows the data to be grouped more effectively before subsampling.
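To make the sparsity argument concrete, the short calculation below counts the bins of a uniform grid and bounds the fraction of bins that can hold at least one event. The figures used (10 bins per axis, one million events, 2 versus 20 dimensions) are illustrative assumptions, not values from the disclosure.

```python
def grid_bins(bins_per_axis: int, dims: int) -> int:
    """Number of bins in a uniform grid with the same bin count on every axis."""
    return bins_per_axis ** dims


def max_occupied_fraction(n_events: int, bins_per_axis: int, dims: int) -> float:
    """Upper bound on the fraction of non-empty bins (each event fills at most one bin)."""
    return min(1.0, n_events / grid_bins(bins_per_axis, dims))


n_events = 10**6
low_dim = max_occupied_fraction(n_events, 10, 2)    # 100 bins in total
high_dim = max_occupied_fraction(n_events, 10, 20)  # 10**20 bins in total
```

With these assumed numbers, every bin of the 2-dimensional grid can be populated, while in 20 dimensions almost all bins are necessarily empty, so nearly every event would look "new" to a memory keyed on bin numbers.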

Method for subsampling particle analysis event data

FIG. 4 is a flow diagram showing an exemplary method 400 of subsampling particle sorting event data (e.g., flow cytometry event data). The method 400 can be embodied in a set of executable program instructions stored on a computer-readable medium, such as one or more disk drives of a computing system. For example, the computing system 500 shown in FIG. 5 and described in greater detail below can execute a set of executable program instructions to implement the method 400. When the method 400 is initiated, the executable program instructions can be loaded into memory, such as RAM, and executed by one or more processors of the computing system 500. Although the method 400 is described with respect to the computing system 500 shown in FIG. 5, the description is illustrative only and is not intended to be limiting. In some embodiments, the method 400 or portions thereof can be performed serially or in parallel by multiple computing systems.

After the method 400 begins at block 404, the method 400 proceeds to block 408, where a computing system transforms first flow cytometry event data, which belongs to a flow cytometry event data set in a high-dimensional space and is associated with a first event of a first plurality of events, into first transformed flow cytometry event data associated with the first event in a first low-dimensional space. The first event can be associated with a positive subsampling requirement. For example, when flow cytometry event data containing the first flow cytometry event data is subsampled, the subsampled flow cytometry event data may not include the first flow cytometry event data. The first low-dimensional space can be associated with a first plurality of bins. The first transformed flow cytometry event data can be associated with a first bin of the first plurality of bins. The computing system can indicate (e.g., in a data structure) that the first flow cytometry event data should be included when the subsampled flow cytometry event data is generated.

In some embodiments, the computing system can receive flow cytometry event data that includes the first flow cytometry event data. The computing system can determine that the first flow cytometry event data of the first event of the first plurality of events is associated with the positive subsampling requirement. The computing system can determine that the first transformed flow cytometry event data is associated with the first bin of the first plurality of bins.

The processor can be programmed by the executable instructions to determine a first descriptor of the first transformed flow cytometry event data based on the first bin of the first plurality of bins. The first descriptor of the first transformed flow cytometry event data associated with the first bin can be a first bin number of the first bin in the first plurality of bins. The computing system can add the first bin, the first descriptor, and/or the first bin number to a memory data structure.

In some embodiments, two bins of the first plurality of bins have the same size. Each bin of the first plurality of bins can have the same size. Two bins of the first plurality of bins can have different sizes. Two bins of the first plurality of bins can contain approximately the same amount of transformed flow cytometry event data. Each bin of the first plurality of bins can contain approximately the same amount of transformed flow cytometry event data. The computing system can determine the size of each bin of the first plurality of bins. The processor can be programmed by the executable instructions to determine the size of each bin of the first plurality of bins based on a plurality of gates. The computing system can determine the size of each bin of the first plurality of bins based on transformed flow cytometry event data associated with a plurality of cells of interest.
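The two sizing options described above (bins of equal size versus bins holding roughly equal amounts of data) can be contrasted along a single low-dimensional axis. The sketch below is illustrative only: the helper names and sample values are assumptions, and the equal-count edges are taken from order statistics, which can produce duplicate edges when the data contains many ties.

```python
import bisect
from typing import List, Sequence


def equal_width_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Bin edges that split the observed range into bins of the same size."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + i * step for i in range(n_bins + 1)]


def equal_count_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Bin edges chosen so each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    inner = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return [ordered[0]] + inner + [ordered[-1]]


def bin_counts(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values per bin; each bin is [edge_i, edge_i+1), the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        counts[min(max(i, 0), len(counts) - 1)] += 1
    return counts


# Hypothetical axis values: a dense cluster plus one distant outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]
width_counts = bin_counts(values, equal_width_edges(values, 3))
count_counts = bin_counts(values, equal_count_edges(values, 3))
```

With equal-size bins the dense cluster collapses into a single bin and one bin stays empty, whereas density-based edges spread the same events evenly. Which behavior is preferable depends on whether the descriptor should resolve structure inside dense populations.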

In some embodiments, to transform the first flow cytometry event data, the computing system can transform it using a first dimensionality reduction function. The first dimensionality reduction function can be a linear dimensionality reduction function or a nonlinear dimensionality reduction function. The nonlinear dimensionality reduction function can be t-distributed stochastic neighbor embedding (t-SNE). The computing system can first receive the dimensionality reduction function or an identification thereof.

Method 400 proceeds to block 412, where the computing system may transform second flow cytometry event data, associated with a second event of the first plurality of events in the flow cytometry event data set within the high-dimensional space, into second transformed flow cytometry event data associated with the second event within the first low-dimensional space. The second event may be associated with the positive subsampling requirement. The second transformed flow cytometry event data may be associated with a second bin in the first plurality of bins. In some embodiments, to transform the second flow cytometry event data, the computing system may: transform the second flow cytometry event data using a second dimensionality reduction function. The first dimensionality reduction function and the second dimensionality reduction function may be the same.

In some embodiments, the computing system may receive flow cytometry event data comprising the second flow cytometry event data. The computing system may determine that the second flow cytometry event data of the second event of the first plurality of events is associated with the positive subsampling requirement. The computing system may determine that the second transformed flow cytometry event data is associated with the second bin in the first plurality of bins.

The processor may be programmed by the executable instructions to: determine a second descriptor of the second transformed flow cytometry event data based on the second bin in the first plurality of bins. The second descriptor of the second transformed flow cytometry event data associated with the second bin may be a second bin number of the second bin in the first plurality of bins. The computing system may add the second bin, the second descriptor, and/or the second bin number to the in-memory data structure.

The first flow cytometry event data may be associated with a first rare cell, and/or the second flow cytometry event data may be associated with a second rare cell. The first rare cell and the second rare cell may be cells of different cell types.

Method 400 proceeds from block 412 to block 416, where the computing system may determine that the first bin associated with the first transformed flow cytometry event data and the second bin associated with the second transformed flow cytometry event data are different. The computing system may indicate (e.g., in a data structure) that the second flow cytometry event data should be included when the subsampled flow cytometry event data is generated.

In some embodiments, to transform the first flow cytometry event data, the computing system may: transform the first flow cytometry event data, using a second dimensionality reduction function, into first transformed flow cytometry event data associated with the first event within a second low-dimensional space. The second low-dimensional space may be associated with a second plurality of bins. The first transformed flow cytometry event data within the second low-dimensional space may be associated with a first bin in the second plurality of bins. To transform the second flow cytometry event data, the computing system may: transform the second flow cytometry event data, using the second dimensionality reduction function, into second transformed flow cytometry event data associated with the second event within the second low-dimensional space. The second transformed flow cytometry event data within the second low-dimensional space may be associated with a second bin in the second plurality of bins. The first bin in the first plurality of bins may be associated with a first cell type of interest, the second bin in the second plurality of bins may be associated with a second cell type of interest, the second bin in the first plurality of bins is not associated with the first cell type of interest, the second bin in the first plurality of bins is not associated with the second cell type of interest, the first bin in the second plurality of bins is not associated with the second cell type of interest, and/or the first bin in the second plurality of bins is not associated with the first cell type of interest.

A first combination of the first bin in the first plurality of bins and the first bin in the second plurality of bins may be associated with a first cell type of interest, and/or a second combination of the second bin in the first plurality of bins and the second bin in the second plurality of bins may be associated with a second cell type of interest. A first combination of the first bin in the first plurality of bins and the second bin in the second plurality of bins is not associated with the first cell type of interest or the second cell type of interest, and/or a second combination of the second bin in the first plurality of bins and the first bin in the second plurality of bins is not associated with the first cell type of interest or the second cell type of interest. The computing system may determine that the first combination and the second combination are different.
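One way to picture a combination of bins from two low-dimensional spaces is as a tuple holding one bin identifier per space, so that two events match only if they share a bin in every space. The reducers, the bin width of 50, and the name `combined_descriptor` below are hypothetical choices made for this sketch.

```python
def combined_descriptor(event, reducers, binners):
    """Return one bin identifier per low-dimensional space, as a tuple."""
    return tuple(to_bin(reduce_fn(event)) for reduce_fn, to_bin in zip(reducers, binners))


# Hypothetical setup: space 1 keeps parameter 0, space 2 keeps parameter 1,
# and both spaces use one-dimensional bins that are 50 units wide.
reducers = [lambda e: e[0], lambda e: e[1]]
binners = [lambda x: int(x // 50), lambda x: int(x // 50)]

a = combined_descriptor((10.0, 120.0), reducers, binners)
b = combined_descriptor((60.0, 130.0), reducers, binners)
c = combined_descriptor((20.0, 110.0), reducers, binners)
```

Here events `a` and `b` fall in the same bin of the second space but in different bins of the first, so their combinations differ, while `c` repeats the combination of `a`.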

Method 400 proceeds to block 420, where the computing system may generate subsampled flow cytometry event data of the flow cytometry event data, the subsampled data comprising the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.
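Blocks 408 through 420 can be sketched, under stated assumptions, as a single pass that retains the first event observed in each bin. The function `subsample`, the stand-in reduction `take_first_two`, and the grid binning `grid_bin` are hypothetical and only illustrate the control flow; they are not the disclosed implementation.

```python
from typing import Callable, Hashable, Iterable, List, Sequence

Event = Sequence[float]


def subsample(
    events: Iterable[Event],
    reduce_fn: Callable[[Event], Sequence[float]],
    to_bin: Callable[[Sequence[float]], Hashable],
) -> List[Event]:
    """Keep the first event observed in each bin of the low-dimensional space."""
    seen_bins = set()  # in-memory data structure of bins already represented
    kept = []
    for event in events:
        bin_id = to_bin(reduce_fn(event))
        if bin_id not in seen_bins:  # bin differs from every bin seen so far
            seen_bins.add(bin_id)
            kept.append(event)
    return kept


def take_first_two(event):
    """Hypothetical stand-in for a dimensionality reduction function."""
    return event[:2]


def grid_bin(point):
    """Hypothetical grid of square bins that are 10 units wide."""
    return tuple(int(c // 10) for c in point)


events = [(12.0, 47.0, 3.0), (85.0, 9.0, 1.0), (14.0, 41.0, 8.0)]
kept = subsample(events, take_first_two, grid_bin)
```

In this example the third event lands in the same bin as the first and is therefore left out of the subsampled data.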

In some embodiments, the computing system may: transform third flow cytometry event data, associated with a third event of the first plurality of events in the flow cytometry event data set within the high-dimensional space, into third transformed flow cytometry event data associated with the third event within the first low-dimensional space. The third event may be associated with the positive subsampling requirement. The third transformed flow cytometry event data may be associated with a third bin in the first plurality of bins. The processor may be programmed by the executable instructions to: determine that the third bin associated with the third transformed flow cytometry event data is the first bin associated with the first transformed flow cytometry event data or the second bin associated with the second transformed flow cytometry event data. The third flow cytometry event data may not be in the subsampled flow cytometry event data of the flow cytometry event data. The computing system may: determine a third descriptor of the third transformed flow cytometry event data based on the third bin in the first plurality of bins. The third descriptor of the third transformed flow cytometry event data associated with the third bin may be a third bin number of the third bin in the first plurality of bins. The computing system may: determine that the third bin, the third descriptor, and/or the third bin number is not in the in-memory data structure.

In some embodiments, the computing system may: determine that fourth flow cytometry event data, associated with a fourth event of the first plurality of events, is associated with a negative subsampling requirement. The computing system may: generate the subsampled flow cytometry event data of the flow cytometry event data, the subsampled data comprising the fourth flow cytometry event data associated with the fourth event. The computing system may: receive a plurality of gates defining a plurality of cells of interest. The fourth flow cytometry event data may be associated with a cell of interest of the plurality of cells of interest. The fourth flow cytometry event data may be associated with a sorted cell.
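The handling of a negative subsampling requirement can be sketched as a bypass: events that are not to be subsampled are always retained, and only the remaining events go through the bin check. The dictionary layout, the `in_gate` flag, and the function name are assumptions introduced for this example.

```python
def subsample_with_requirements(events, needs_subsampling, bin_of):
    """Always keep events with a negative requirement; thin out the rest by bin."""
    seen_bins = set()
    kept = []
    for event in events:
        if not needs_subsampling(event):
            kept.append(event)  # negative requirement: retained unconditionally
            continue
        bin_id = bin_of(event)
        if bin_id not in seen_bins:
            seen_bins.add(bin_id)
            kept.append(event)
    return kept


# Hypothetical events: "in_gate" marks a cell of interest defined by a gate.
events = [
    {"id": 1, "bin": 3, "in_gate": False},
    {"id": 2, "bin": 3, "in_gate": False},
    {"id": 3, "bin": 3, "in_gate": True},
    {"id": 4, "bin": 5, "in_gate": False},
]
kept = subsample_with_requirements(
    events,
    needs_subsampling=lambda e: not e["in_gate"],
    bin_of=lambda e: e["bin"],
)
kept_ids = [e["id"] for e in kept]
```

Event 3 is kept even though its bin is already represented, because it belongs to a gated population of interest.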

In some embodiments, the computing system may: transform second flow cytometry event data, associated with a second event of a second plurality of events in the flow cytometry event data set within the high-dimensional space, into second transformed flow cytometry event data associated with the second event of the second plurality of events within the first low-dimensional space. The second event of the second plurality of events may be associated with the positive subsampling requirement.

The second transformed flow cytometry event data associated with the second event of the second plurality of events may be associated with a second bin in the first plurality of bins. The second bin associated with the second transformed flow cytometry event data (associated with the second event of the second plurality of events) and the first bin associated with the first transformed flow cytometry event data (associated with the first event of the first plurality of events) may be the same. The computing system may: generate the subsampled flow cytometry event data of the flow cytometry event data, the subsampled data comprising the second flow cytometry event data associated with the second event of the second plurality of events.

The computing system may: determine that a last event of the first plurality of events is associated with a time parameter or an event count that exceeds a predetermined threshold. The computing system may: reset the in-memory data structure. The processor may be programmed by the executable instructions to: add, to the in-memory data structure, the second bin associated with the second transformed flow cytometry event data associated with the second event of the second plurality of events. In some embodiments, the computing system may: receive a subsampling parameter level. The computing system may: determine the predetermined threshold based on the subsampling parameter level.
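Resetting the in-memory data structure once an event count reaches a threshold can be sketched as follows. The sketch assumes the threshold is expressed as a number of events per window; the function name and the sample bin identifiers are hypothetical, and a time-based threshold would follow the same pattern.

```python
def windowed_subsample(bin_ids, window_size):
    """Return indices of kept events, forgetting seen bins after each window."""
    seen_bins = set()
    kept_indices = []
    count_in_window = 0
    for index, bin_id in enumerate(bin_ids):
        if count_in_window >= window_size:
            seen_bins.clear()  # reset the in-memory data structure
            count_in_window = 0
        count_in_window += 1
        if bin_id not in seen_bins:
            seen_bins.add(bin_id)
            kept_indices.append(index)
    return kept_indices


# Hypothetical stream of bin identifiers, one per event, in acquisition order.
kept_indices = windowed_subsample([7, 7, 2, 7, 2, 2], window_size=3)
```

After the reset, the event at index 3 is kept even though its bin was already represented in the first window, which corresponds to the second plurality of events described above. A smaller window retains more events, so the threshold acts as the knob that a subsampling parameter level could control.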

Method 400 ends at block 424.

Execution Environment

FIG. 5 depicts the general architecture of an example computing device 500 configured to operate the metabolite, annotation, and gene integration system disclosed herein. The general architecture of the computing device 500 shown in FIG. 5 includes an arrangement of computer hardware and software components. The computing device 500 may include more (or fewer) elements than those shown in FIG. 5. It is not necessary, however, to show all of these generally conventional elements in order to provide an enabling disclosure. As shown, the computing device 500 includes a processing unit 510, a network interface 520, a computer-readable medium drive 530, an input/output device interface 540, a display 550, and an input device 560, all of which may communicate with one another by way of a communication bus. The network interface 520 may provide connectivity to one or more networks or computing systems. The processing unit 510 may thus receive information and instructions from other computing systems or services via a network.

The processing unit 510 may also communicate with the memory 570 and may further provide output information for the optional display 550 via the input/output device interface 540. The input/output device interface 540 may also accept input from the optional input device 560, such as a keyboard, mouse, digital pen, microphone, touch screen, gesture recognition system, voice recognition system, gamepad, accelerometer, gyroscope, or other input device.

The memory 570 may contain computer program instructions (grouped as modules or components in some embodiments) that the processing unit 510 executes in order to implement one or more embodiments. The memory 570 generally includes RAM, ROM, and/or other persistent, auxiliary, or non-transitory computer-readable media. The memory 570 may store an operating system 572 that provides computer program instructions for use by the processing unit 510 in the general administration and operation of the computing device 500. The memory 570 may further include computer program instructions and other information for implementing aspects of the present disclosure.

For example, in one embodiment, the memory 570 includes a subsampling module 574 for subsampling particle analysis event data, for example according to the subsampling method 400 described with reference to FIG. 4. In addition, the memory 570 may include or communicate with a data store 590 and/or one or more other data stores that store the generated flow cytometry event data set or the subsampled flow cytometry event data set.

Terminology

As used herein, the term "determining" encompasses a wide variety of actions. For example, "determining" may include calculating, computing, processing, deriving, investigating, looking up (e.g., looking up in a table, a database, or another data structure), ascertaining, and the like. In addition, "determining" may include receiving (e.g., receiving information), accessing (e.g., accessing data in a memory), and the like. "Determining" may also include resolving, selecting, choosing, establishing, and the like.

As used herein, the term "providing" encompasses a wide variety of actions. For example, "providing" may include storing a value in a location of a storage device for subsequent retrieval, transmitting a value directly to a recipient via at least one wired or wireless communication medium, transmitting or storing a reference to a value, and the like. "Providing" may also include encoding, decoding, encrypting, decrypting, validating, verifying, and the like via a hardware element.

As used herein, the term "selectively" or "selective" may encompass a wide variety of actions. For example, a "selective" process may include determining one option from multiple options. A "selective" process may include one or more of: dynamically determined inputs, preconfigured inputs, or user-initiated inputs for making the determination. In some implementations, an n-input switch may be included to provide selective functionality, where n is the number of inputs used to make the selection.

As used herein, the term "message" encompasses a wide variety of formats for communicating (e.g., transmitting or receiving) information. A message may include a machine-readable aggregation of information such as an XML document, a fixed-field message, a comma-separated message, or the like. In some implementations, a message may include a signal used to transmit one or more representations of the information. Although recited in the singular, it will be understood that a message may be composed, transmitted, stored, received, etc. in multiple parts.

As used herein, a "user interface" (also referred to as an interactive user interface, a graphical user interface, or a UI) may refer to a network-based interface including data fields, buttons, or other interactive controls for receiving input signals, for providing electronic information, or for providing information to a user in response to any received input signals. A UI may be implemented in whole or in part using technologies such as Hypertext Markup Language (HTML), JAVASCRIPT™, FLASH™, JAVA™, .NET™, WINDOWS OS™, macOS™, web services, and Rich Site Summary (RSS). In some implementations, a UI may be included in a stand-alone client (e.g., a thick client or fat client) configured to communicate (e.g., send or receive data) in accordance with one or more of the aspects described.

As used herein, a "data store" may be embodied in hard disk drives, solid-state memories, and/or any other type of non-transitory computer-readable storage medium accessible to or by a device such as an access device, a server, or another computing device described herein. As is known in the art, a data store may also or alternatively be distributed or partitioned across multiple local and/or remote storage devices without departing from the scope of the present disclosure. In other embodiments, a data store may include or be embodied in a data storage web service.

Those skilled in the art will understand that information, messages, and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The techniques described herein may be implemented in hardware, software, firmware, or any combination thereof. Such techniques may be implemented in any of a variety of devices, such as specially programmed event processing computers, wireless communication devices, or integrated circuit devices. Any features described as modules or components may be implemented together in an integrated logic device or separately as discrete but interoperable logic devices. If implemented in software, the techniques may be realized at least in part by a computer-readable data storage medium comprising instructions that, when executed, perform one or more of the methods described above. The computer-readable data storage medium may form part of a computer program product, which may include packaging materials. The computer-readable medium may comprise memory or data storage media, such as random access memory (RAM) (e.g., synchronous dynamic random access memory (SDRAM)), read-only memory (ROM), non-volatile random access memory (NVRAM), electrically erasable programmable read-only memory (EEPROM), flash memory, magnetic or optical data storage media, and the like. The computer-readable medium may be a non-transitory storage medium.

Additionally or alternatively, the techniques may be realized at least in part by a computer-readable communication medium, such as a propagated signal or wave, that carries or communicates program code in the form of instructions or data structures and that can be accessed, read, and/or executed by a computing device.

The program code may be executed by a specially programmed sorting strategy processor, which may include one or more processors, such as one or more digital signal processors (DSPs), configurable microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. The graphics processor may be specially configured to perform any of the techniques described in this disclosure. A combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration in at least partial data connection) may perform one or more of the described functions. In some aspects, the functionality described herein may be provided within dedicated software modules or hardware modules configured for encoding and decoding, or incorporated into a dedicated sorting control card.

Those skilled in the art will understand that, in general, terms used herein, and especially in the appended claims (e.g., in the bodies of the appended claims), are generally intended as "open" terms (e.g., the term "including" should be interpreted as "including but not limited to," the term "having" should be interpreted as "having at least," etc.). Those skilled in the art will further understand that if a specific number of an introduced claim recitation is intended, such an intent will be explicitly recited in the claim, and in the absence of such recitation no such intent is present. For example, as an aid to understanding, the appended claims may contain the introductory phrases "at least one" and "one or more" to introduce claim recitations. However, the use of such phrases should not be construed to imply that the introduction of a claim recitation by the indefinite article "a" or "an" limits any particular claim containing such an introduced recitation to embodiments containing only one such recitation, even when the same claim includes the introductory phrase "one or more" or "at least one" and an indefinite article such as "a" or "an" (e.g., "a" and/or "an" should be interpreted to mean "at least one" or "one or more"); the same holds true for the use of definite articles used to introduce claim recitations.

In addition, even if a specific number of an introduced claim recitation is explicitly recited, those skilled in the art will recognize that such a recitation should be interpreted to mean at least the recited number (e.g., the bare recitation of "two recitations," without other modifiers, means at least two recitations, or two or more recitations). Furthermore, where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense in which one skilled in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Where a convention analogous to "at least one of A, B, or C, etc." is used, in general such a construction is intended in the sense in which one skilled in the art would understand the convention (e.g., "a system having at least one of A, B, or C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.). Those skilled in the art will further understand that virtually any disjunctive word and/or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms. For example, the phrase "A or B" will be understood to include the possibilities of "A" or "B" or "A and B."

In addition, where features or aspects of the disclosure are described in terms of Markush groups, those skilled in the art will recognize that the disclosure is thereby also described in terms of any individual member or subgroup of members of the Markush group.

Those skilled in the art will understand that, for any and all purposes, such as in terms of providing a written description, all ranges disclosed herein also encompass any and all possible sub-ranges and combinations of sub-ranges thereof. Any listed range can be easily recognized as sufficiently describing and enabling the same range being broken down into at least equal halves, thirds, quarters, fifths, tenths, etc. As a non-limiting example, each range discussed herein can be readily broken down into a lower third, a middle third, and an upper third, etc. Those skilled in the art will also understand that all language such as "up to," "at least," "greater than," "less than," and the like includes the number recited and refers to ranges that can subsequently be broken down into sub-ranges as discussed above. Finally, those skilled in the art will understand that a range includes each individual member. Thus, for example, a group having 1-3 units refers to groups having 1, 2, or 3 units. Similarly, a group having 1-5 units refers to groups having 1, 2, 3, 4, or 5 units, and so forth.

The methods disclosed herein comprise one or more steps or actions for achieving the described method. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of steps or actions is specified, the order and/or use of specific steps and/or actions may be modified without departing from the scope of the claims.

While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purposes of illustration and are not intended to be limiting, with the true scope and spirit being indicated by the following claims.

Claims (74)

1. A method for subsampling flow cytometry event data, comprising: under control of a processor:
converting first flow cytometry event data, in a flow cytometry event data set, associated with a first event of a first plurality of events within a high-dimensional space into first converted flow cytometry event data associated with the first event within a first low-dimensional space, wherein the first event is associated with a positive subsampling requirement, wherein the first low-dimensional space is associated with a first plurality of bins, and wherein the first converted flow cytometry event data is associated with a first bin of the first plurality of bins;
converting second flow cytometry event data, in the flow cytometry event data set, associated with a second event of the first plurality of events within the high-dimensional space into second converted flow cytometry event data associated with the second event within the first low-dimensional space, wherein the second event is associated with the positive subsampling requirement, and wherein the second converted flow cytometry event data is associated with a second bin of the first plurality of bins;
determining that the first bin associated with the first converted flow cytometry event data and the second bin associated with the second converted flow cytometry event data are different; and
generating a subsampled flow cytometry event data set, of the flow cytometry event data, comprising the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.
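As a rough illustration of the steps recited in claim 1, the following Python sketch converts each event to a low-dimensional point, assigns the point to a grid bin, and keeps one event per occupied bin. The reducer `project_first_two`, the square grid, and the `bin_width` parameter are assumptions made for illustration only; the claim does not prescribe them. The sketch also skips later events that land in an already occupied bin, which goes beyond what claim 1 itself recites.

```python
from typing import Callable, List, Sequence, Set, Tuple

Event = Sequence[float]


def project_first_two(event: Event) -> Tuple[float, float]:
    """Hypothetical dimension-reduction function: keep the first two parameters."""
    return (event[0], event[1])


def bin_of(point: Sequence[float], bin_width: float = 1.0) -> Tuple[int, ...]:
    """Return the grid index of the bin that contains a low-dimensional point."""
    return tuple(int(coord // bin_width) for coord in point)


def subsample(
    events: List[Event],
    reduce: Callable[[Event], Sequence[float]] = project_first_two,
    bin_width: float = 1.0,
) -> List[Event]:
    """Keep the first event seen in each bin of the low-dimensional space."""
    seen_bins: Set[Tuple[int, ...]] = set()
    kept: List[Event] = []
    for event in events:
        event_bin = bin_of(reduce(event), bin_width)
        if event_bin not in seen_bins:  # bins differ from all bins seen so far
            seen_bins.add(event_bin)
            kept.append(event)  # the original high-dimensional data is kept
    return kept
```

With three events whose first and third projections share a bin, only the first two events would be returned; the output holds the original high-dimensional data, not the converted points.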
2. The method of claim 1, comprising: receiving flow cytometry event data comprising the first flow cytometry event data and the second flow cytometry event data.
3. The method according to any one of claims 1-2, comprising:
determining that the first flow cytometry event data of the first event of the first plurality of events is associated with the positive sub-sampling requirement; and
Determining that the second flow cytometry event data for the second event of the first plurality of events is associated with the positive sub-sampling requirement.
4. The method according to any one of claims 1-2, comprising:
determining that the first converted flow cytometry event data is associated with the first bin of the first plurality of bins; and
Determining that the second converted flow cytometry event data is associated with the second bin of the first plurality of bins.
5. The method according to any one of claims 1-2, comprising:
determining a first descriptor of the first converted flow cytometry event data based on the first bin of the first plurality of bins; and
determining a second descriptor of the second converted flow cytometry event data based on the second bin of the first plurality of bins.
6. The method of claim 5, wherein the first descriptor of the first converted flow cytometry event data associated with the first bin is a first bin number of the first bin of the first plurality of bins, and wherein the second descriptor of the second converted flow cytometry event data associated with the second bin is a second bin number of the second bin of the first plurality of bins.
7. The method of any one of claims 1-2, wherein the first flow cytometry event data is associated with a first rare cell and/or the second flow cytometry event data is associated with a second rare cell, optionally wherein the first rare cell and the second rare cell are cells of different cell types.
8. The method of claim 6, comprising:
adding the first bin, the first descriptor, and/or the first bin number to an in-memory data structure; and
adding the second bin, the second descriptor, and/or the second bin number to the in-memory data structure.
9. The method as claimed in claim 8, comprising:
Converting third flow cytometry event data in the set of flow cytometry event data associated with a third event of the first plurality of events within the high-dimensional space to third converted flow cytometry event data associated with the third event within the first low-dimensional space, wherein the third event is associated with the positive subsampling requirement, and wherein the third converted flow cytometry event data is associated with a third bin of the first plurality of bins; and
Determining that the third bin associated with the third converted flow cytometry event data is the first bin associated with the first converted flow cytometry event data or the second bin associated with the second converted flow cytometry event data, wherein the third flow cytometry event data is not in the subsampled flow cytometry event data of the flow cytometry event data.
10. The method as claimed in claim 9, comprising: determining a third descriptor of the third converted flow cytometry event data based on the third bin of the first plurality of bins, wherein the third descriptor of the third converted flow cytometry event data associated with the third bin is a third bin number of the third bin of the first plurality of bins.
11. The method of claim 10, comprising: determining that the third bin, the third descriptor, and/or the third bin number is not in the in-memory data structure.
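Claims 5-11 refer to a bin number used as a descriptor and to an in-memory data structure that records which bins have been seen. One way this could look, sketched under stated assumptions, is a row-major bin number on a regular grid paired with a fixed-size flag table indexed by that number. The names `bin_number` and `OccupancyTable`, the clamping of out-of-range points, and the grid bounds are all hypothetical and are not taken from the patent.

```python
from typing import Sequence


def bin_number(
    point: Sequence[float],
    lo: Sequence[float],
    hi: Sequence[float],
    bins_per_axis: int,
) -> int:
    """Flatten a grid position into one integer bin number (row-major order)."""
    number = 0
    for coord, low, high in zip(point, lo, hi):
        fraction = (coord - low) / (high - low)
        # Clamp so that out-of-range points fall into the edge bins (assumption).
        index = min(bins_per_axis - 1, max(0, int(fraction * bins_per_axis)))
        number = number * bins_per_axis + index
    return number


class OccupancyTable:
    """Fixed-size table of flags, one per bin number."""

    def __init__(self, n_bins: int) -> None:
        self._flags = bytearray(n_bins)

    def __contains__(self, number: int) -> bool:
        return self._flags[number] == 1

    def add(self, number: int) -> None:
        self._flags[number] = 1

    def reset(self) -> None:
        """Clear every flag so that all bins count as unseen again."""
        self._flags = bytearray(len(self._flags))
```

A flag table keeps the membership check to a single indexed read, at the cost of allocating one byte per bin up front; a hash set would trade that memory for a slower lookup.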
12. The method according to any one of claims 1-2, comprising:
Determining that fourth flow cytometry event data is associated with a negative subsampling requirement, wherein the fourth flow cytometry event data is associated with a fourth event of the first plurality of events; and
Wherein the generating includes generating the subsampled flow cytometry event data set of the flow cytometry event data including the fourth flow cytometry event data associated with the fourth event.
13. The method as claimed in claim 12, comprising:
receiving a plurality of gates defining a plurality of cells of interest, wherein the fourth flow cytometry event data is associated with a cell of interest of the plurality of cells of interest.
14. The method of claim 13, wherein the fourth flow cytometry event data is associated with sorted cells.
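Claims 12-14 describe events with a negative subsampling requirement, for example events that fall inside a gate defining cells of interest, which are included in the output regardless of their bin. A minimal sketch of that bypass follows. The rectangular gate on the first two parameters, the helper names, and the choice to bin on those same two parameters are illustrative assumptions.

```python
from typing import Callable, List, Sequence, Set, Tuple

Event = Sequence[float]
Gate = Callable[[Event], bool]


def rectangle_gate(x_range: Tuple[float, float], y_range: Tuple[float, float]) -> Gate:
    """Build a hypothetical rectangular gate on the first two parameters."""
    def contains(event: Event) -> bool:
        return (x_range[0] <= event[0] <= x_range[1]
                and y_range[0] <= event[1] <= y_range[1])
    return contains


def grid_bin(event: Event, bin_width: float = 1.0) -> Tuple[int, int]:
    """Bin on the first two parameters (stand-in for a real reduction step)."""
    return (int(event[0] // bin_width), int(event[1] // bin_width))


def subsample_with_gates(
    events: List[Event], gates: List[Gate], bin_width: float = 1.0
) -> List[Event]:
    """Always keep gated events; subsample the remaining events by bin."""
    seen_bins: Set[Tuple[int, int]] = set()
    kept: List[Event] = []
    for event in events:
        if any(gate(event) for gate in gates):
            kept.append(event)  # negative subsampling requirement: never dropped
            continue
        event_bin = grid_bin(event, bin_width)
        if event_bin not in seen_bins:  # positive requirement: one per bin
            seen_bins.add(event_bin)
            kept.append(event)
    return kept
```

In this sketch gated events do not mark their bin as occupied, so they never cause an ungated event to be dropped; whether that matches a given implementation is an open design choice.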
15. The method as claimed in claim 8, comprising:
converting third flow cytometry event data, in the flow cytometry event data set, associated with a second event of a second plurality of events within the high-dimensional space into third converted flow cytometry event data associated with the second event of the second plurality of events within the first low-dimensional space, wherein the second event of the second plurality of events is associated with the positive subsampling requirement, wherein the third converted flow cytometry event data is associated with a third bin of the first plurality of bins, wherein the third bin associated with the third converted flow cytometry event data and the first bin associated with the first converted flow cytometry event data are the same, wherein the third converted flow cytometry event data is associated with the second event of the second plurality of events, and the first converted flow cytometry event data is associated with the first event of the first plurality of events,
Wherein the generating includes generating the subsampled flow cytometry event data set of the flow cytometry event data including the third flow cytometry event data associated with the second event of the second plurality of events.
16. The method as claimed in claim 15, comprising:
Determining that a last event of the first plurality of events is associated with a time parameter or number of events exceeding a predetermined threshold;
resetting the in-memory data structure; and
Adding the third bin associated with the third converted flow cytometry event data to the in-memory data structure, wherein the third converted flow cytometry event data is associated with the second event of the second plurality of events.
17. The method as claimed in claim 16, comprising:
receiving a subsampling parameter level; and
determining the predetermined threshold based on the subsampling parameter level.
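Claims 15-17 describe resetting the in-memory data structure once the current plurality of events exceeds a threshold, so that a later event may be kept even though its bin was already seen in the earlier plurality. The sketch below uses an event count as the threshold. The mapping from a subsampling level to a window size in `window_size_for_level` is purely illustrative and is not given in the claims.

```python
from typing import Callable, Hashable, List, Sequence, Set

Event = Sequence[float]


def window_size_for_level(level: int, base: int = 1000) -> int:
    """Hypothetical mapping: each level doubles the number of events per window."""
    return base * (2 ** level)


def subsample_windowed(
    events: List[Event],
    bin_of: Callable[[Event], Hashable],
    window: int,
) -> List[Event]:
    """Keep one event per bin within each window of `window` events."""
    seen_bins: Set[Hashable] = set()
    kept: List[Event] = []
    count = 0
    for event in events:
        if count >= window:
            seen_bins.clear()  # reset the in-memory data structure
            count = 0
        count += 1
        event_bin = bin_of(event)
        if event_bin not in seen_bins:
            seen_bins.add(event_bin)
            kept.append(event)
    return kept
```

A larger window keeps fewer events overall, since each bin contributes at most one event per window. A time-based threshold could be handled the same way by comparing event timestamps instead of a count.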
18. The method of any one of claims 1-2, wherein converting the first flow cytometry event data comprises converting the first flow cytometry event data using a first dimension-reduction function, and/or wherein converting the second flow cytometry event data comprises converting the second flow cytometry event data using the first dimension-reduction function.
19. The method of claim 18, wherein the first dimension-reduction function is a linear dimension-reduction function.
20. The method of claim 18, wherein the first dimension-reduction function is a nonlinear dimension-reduction function.
21. The method of claim 20, wherein the nonlinear dimension-reduction function is t-distributed stochastic neighbor embedding (t-SNE).
22. The method of claim 18, comprising: receiving the first dimension-reduction function or an identification thereof.
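Claims 18-22 allow the dimension-reduction function to be linear or nonlinear and to be supplied either directly or by an identifier. A minimal sketch of that arrangement is a registry of reducers looked up by name. The registry keys, the weight matrix, and the `log1p` stand-in are assumptions for illustration; in particular, the nonlinear example is not t-SNE, which needs a fitted embedding.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Event = Sequence[float]
Reducer = Callable[[Event], Tuple[float, ...]]


def make_linear_reducer(weights: Sequence[Sequence[float]]) -> Reducer:
    """Build a linear reducer: each output coordinate is a weighted sum of inputs."""
    def reduce(event: Event) -> Tuple[float, ...]:
        return tuple(sum(w * x for w, x in zip(row, event)) for row in weights)
    return reduce


def log1p_reducer(event: Event) -> Tuple[float, ...]:
    """Stand-in nonlinear reducer: log-compress the first two parameters."""
    return (math.log1p(event[0]), math.log1p(event[1]))


# Hypothetical registry so that a reducer can be selected by an identifier.
REDUCERS: Dict[str, Reducer] = {
    "linear": make_linear_reducer([[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]),
    "log1p": log1p_reducer,
}


def get_reducer(identifier: str) -> Reducer:
    """Return the reducer registered under `identifier`."""
    return REDUCERS[identifier]
```

In practice the weight matrix could come from a principal component analysis fitted on earlier data, which would keep the per-event cost to a few multiply-adds.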
23. The method according to any one of claims 1-2,
Wherein converting the first flow cytometry event data comprises converting the first flow cytometry event data using a second dimension-reduction function to first converted flow cytometry event data associated with the first event within a second low-dimensional space, wherein the second low-dimensional space is associated with a second plurality of bins, and wherein the first converted flow cytometry event data within the second low-dimensional space is associated with a first bin of the second plurality of bins, and/or
Wherein converting the second flow cytometry event data comprises converting the second flow cytometry event data using the second dimension-reduction function to second converted flow cytometry event data associated with the second event within the second low-dimensional space, wherein the second converted flow cytometry event data within the second low-dimensional space is associated with a second bin of the second plurality of bins.
24. The method according to claim 23,
Wherein the first bin of the first plurality of bins is associated with a first cell type of interest,
Wherein the second bin of the second plurality of bins is associated with a second cell type of interest,
Wherein the second bin of the first plurality of bins is unassociated with the first cell type of interest, wherein the second bin of the first plurality of bins is unassociated with the second cell type of interest,
Wherein the first bin of the second plurality of bins is unassociated with the second cell type of interest, and/or wherein the first bin of the second plurality of bins is unassociated with the first cell type of interest.
25. The method of claim 24, wherein a combination of the first bin of the first plurality of bins and the first bin of the second plurality of bins is associated with a first cell type of interest, and/or wherein a combination of the second bin of the first plurality of bins and the second bin of the second plurality of bins is associated with a second cell type of interest.
26. The method of claim 25, wherein a combination of the first bin of the first plurality of bins and the second bin of the second plurality of bins is unassociated with the first cell type of interest and the second cell type of interest, and/or wherein a combination of the second bin of the first plurality of bins and the first bin of the second plurality of bins is unassociated with the first cell type of interest and the second cell type of interest.
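Claims 23-26 use two low-dimensional spaces and associate a cell type of interest with a combination of one bin from each space, while the mixed combinations are associated with neither type. The sketch below illustrates that lookup with two one-dimensional spaces. The reducers, the bin width, and the cell-type labels are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Event = Sequence[float]
Bin = Tuple[int, ...]


def bin_of(point: Sequence[float], bin_width: float = 1.0) -> Bin:
    """Return the grid index of the bin that contains a point."""
    return tuple(int(coord // bin_width) for coord in point)


def reduce_a(event: Event) -> Tuple[float]:
    """Hypothetical first dimension-reduction function."""
    return (event[0],)


def reduce_b(event: Event) -> Tuple[float]:
    """Hypothetical second dimension-reduction function."""
    return (event[1],)


# Only matching combinations of bins identify a cell type of interest.
CELL_TYPE_BY_COMBINATION: Dict[Tuple[Bin, Bin], str] = {
    ((0,), (0,)): "cell type 1",
    ((1,), (1,)): "cell type 2",
}


def combined_bins(event: Event) -> Tuple[Bin, Bin]:
    """Pair the event's bin in the first space with its bin in the second."""
    return (bin_of(reduce_a(event)), bin_of(reduce_b(event)))


def cell_type(event: Event) -> Optional[str]:
    """Look up the cell type for a bin combination, or None if unassociated."""
    return CELL_TYPE_BY_COMBINATION.get(combined_bins(event))
```

Using the pair of bins as the key means that neither space has to separate the populations on its own.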
27. The method of any of claims 1-2, wherein two bins of the first plurality of bins have the same size.
28. The method of any of claims 1-2, wherein each of the first plurality of bins has the same size.
29. The method of any of claims 1-2, wherein two bins of the first plurality of bins have different sizes.
30. The method of any one of claims 1-2, wherein two bins of the first plurality of bins contain the same number of converted flow cytometry event data.
31. The method of any one of claims 1-2, wherein each bin of the first plurality of bins contains the same number of transformed flow cytometry event data.
32. The method of any of claims 1-2, comprising determining a size of each of the first plurality of bins.
33. The method of claim 32, comprising determining the size of each bin of the first plurality of bins based on a plurality of gates.
34. The method of claim 32, comprising determining the size of each bin of the first plurality of bins based on the transformed flow cytometry event data associated with a plurality of cells of interest.
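Claims 27-34 cover bins of equal size, bins of different sizes, and bins that contain the same number of converted events. In one dimension the two main options can be sketched as equal-width edges versus edges placed at quantiles of reference data. The function names and the sample values are illustrative assumptions.

```python
import bisect
from typing import List, Sequence


def equal_width_edges(lo: float, hi: float, n_bins: int) -> List[float]:
    """Interior edges that split [lo, hi] into n_bins bins of the same size."""
    return [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]


def equal_count_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Interior edges at quantiles, so each bin holds about the same count."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


def bin_for(value: float, edges: Sequence[float]) -> int:
    """Index of the bin containing value, given sorted interior edges."""
    return bisect.bisect_right(edges, value)
```

For skewed data such as `[1, 2, 3, 4, 100, 200, 300, 400]`, equal-width bins over 0 to 400 put the four small values in one bin, while equal-count edges spread the values evenly across bins of different sizes.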
35. A computing system that subsamples flow cytometry event data, comprising: a non-transitory memory configured to store executable instructions; and
a processor in communication with the non-transitory memory, the processor programmed with the executable instructions to:
convert first flow cytometry event data, in a flow cytometry event data set, associated with a first event of a first plurality of events within a high-dimensional space into first converted flow cytometry event data associated with the first event within a first low-dimensional space, wherein the first event is associated with a positive subsampling requirement,
wherein the first low-dimensional space is associated with a first plurality of bins, and wherein the first converted flow cytometry event data is associated with a first bin of the first plurality of bins;
convert second flow cytometry event data, in the flow cytometry event data set, associated with a second event of the first plurality of events within the high-dimensional space into second converted flow cytometry event data associated with the second event within the first low-dimensional space, wherein the second event is associated with the positive subsampling requirement, and wherein the second converted flow cytometry event data is associated with a second bin of the first plurality of bins;
determine that the first bin associated with the first converted flow cytometry event data and the second bin associated with the second converted flow cytometry event data are different; and
generate a subsampled flow cytometry event data set, of the flow cytometry event data, comprising the first flow cytometry event data associated with the first event and the second flow cytometry event data associated with the second event.
36. The system of claim 35, wherein the processor is programmed with the executable instructions to: receive flow cytometry event data comprising the first flow cytometry event data and the second flow cytometry event data.
37. The system of any of claims 35-36, wherein the processor is programmed with the executable instructions to:
determine that the first flow cytometry event data of the first event of the first plurality of events is associated with the positive subsampling requirement; and
determine that the second flow cytometry event data of the second event of the first plurality of events is associated with the positive subsampling requirement.
38. The system of any of claims 35-36, wherein the processor is programmed with the executable instructions to:
determine that the first converted flow cytometry event data is associated with the first bin of the first plurality of bins; and
determine that the second converted flow cytometry event data is associated with the second bin of the first plurality of bins.
39. The system of any of claims 35-36, wherein the processor is programmed with the executable instructions to:
determine a first descriptor of the first converted flow cytometry event data based on the first bin of the first plurality of bins; and
determine a second descriptor of the second converted flow cytometry event data based on the second bin of the first plurality of bins.
40. The system of claim 39, wherein the first descriptor of the first converted flow cytometry event data associated with the first bin is a first bin number of the first bin of the first plurality of bins, and wherein the second descriptor of the second converted flow cytometry event data associated with the second bin is a second bin number of the second bin of the first plurality of bins.
41. The system of any one of claims 35-36, wherein the first flow cytometry event data is associated with a first rare cell and/or the second flow cytometry event data is associated with a second rare cell, optionally wherein the first rare cell and the second rare cell are cells of different cell types.
42. The system of claim 40, wherein the processor is programmed with the executable instructions to:
add the first bin, the first descriptor, and/or the first bin number to an in-memory data structure; and
add the second bin, the second descriptor, and/or the second bin number to the in-memory data structure.
43. The system of claim 42, wherein the processor is programmed with the executable instructions to:
convert third flow cytometry event data, in the flow cytometry event data set, associated with a third event of the first plurality of events within the high-dimensional space into third converted flow cytometry event data associated with the third event within the first low-dimensional space, wherein the third event is associated with the positive subsampling requirement, and wherein the third converted flow cytometry event data is associated with a third bin of the first plurality of bins; and
determine that the third bin associated with the third converted flow cytometry event data is the first bin associated with the first converted flow cytometry event data or the second bin associated with the second converted flow cytometry event data,
wherein the third flow cytometry event data is not in the subsampled flow cytometry event data of the flow cytometry event data.
44. The system of claim 43, wherein the processor is programmed with the executable instructions to: determine a third descriptor of the third converted flow cytometry event data based on the third bin of the first plurality of bins, wherein the third descriptor of the third converted flow cytometry event data associated with the third bin is a third bin number of the third bin of the first plurality of bins.
45. The system of claim 44, wherein the processor is programmed with the executable instructions to: determine that the third bin, the third descriptor, and/or the third bin number is not in the in-memory data structure.
46. The system of any of claims 35-36, wherein the processor is programmed with the executable instructions to:
determine that fourth flow cytometry event data is associated with a negative subsampling requirement, wherein the fourth flow cytometry event data is associated with a fourth event of the first plurality of events,
wherein to generate the subsampled flow cytometry event data set, the processor is programmed with the executable instructions to: generate the subsampled flow cytometry event data set, of the flow cytometry event data, comprising the fourth flow cytometry event data associated with the fourth event.
47. The system of claim 46, wherein the processor is programmed with the executable instructions to:
receive a plurality of gates defining a plurality of cells of interest, wherein the fourth flow cytometry event data is associated with a cell of interest of the plurality of cells of interest.
48. The system of claim 47, wherein the fourth flow cytometry event data is associated with sorted cells.
49. The system of claim 42, wherein the processor is programmed with the executable instructions to:
convert third flow cytometry event data, in the flow cytometry event data set, associated with a second event of a second plurality of events within the high-dimensional space into third converted flow cytometry event data associated with the second event of the second plurality of events within the first low-dimensional space,
wherein the second event of the second plurality of events is associated with the positive subsampling requirement, wherein the third converted flow cytometry event data is associated with a third bin of the first plurality of bins, wherein the third bin associated with the third converted flow cytometry event data is the same as the first bin associated with the first converted flow cytometry event data, wherein the third converted flow cytometry event data is associated with the second event of the second plurality of events, and the first converted flow cytometry event data is associated with the first event of the first plurality of events,
wherein to generate the subsampled flow cytometry event data set, the processor is programmed with the executable instructions to: generate the subsampled flow cytometry event data set, of the flow cytometry event data, comprising the third flow cytometry event data associated with the second event of the second plurality of events.
50. The system of claim 49, wherein the processor is programmed with the executable instructions to:
determine that a last event of the first plurality of events is associated with a time parameter or number of events exceeding a predetermined threshold;
reset the in-memory data structure; and
add the third bin associated with the third converted flow cytometry event data to the in-memory data structure, wherein the third converted flow cytometry event data is associated with the second event of the second plurality of events.
51. The system of claim 50, wherein the processor is programmed with the executable instructions to:
receive a subsampling parameter level; and
determine the predetermined threshold based on the subsampling parameter level.
52. The system of any one of claims 35-36,
wherein to convert the first flow cytometry event data, the processor is programmed with the executable instructions to: convert the first flow cytometry event data using a first dimension-reduction function, and/or
wherein to convert the second flow cytometry event data, the processor is programmed with the executable instructions to: convert the second flow cytometry event data using the first dimension-reduction function.
53. The system of claim 52, wherein the first dimension-reduction function is a linear dimension-reduction function.
54. The system of claim 52, wherein the first dimension-reduction function is a nonlinear dimension-reduction function.
55. The system of claim 54, wherein the nonlinear dimension-reduction function is t-distributed stochastic neighbor embedding (t-SNE).
56. The system of claim 52, wherein the processor is programmed with the executable instructions to: receive the first dimension-reduction function or an identification thereof.
57. The system of any one of claims 35-36,
wherein to convert the first flow cytometry event data, the processor is programmed with the executable instructions to: convert the first flow cytometry event data into first converted flow cytometry event data associated with the first event within a second low-dimensional space using a second dimension-reduction function, wherein the second low-dimensional space is associated with a second plurality of bins, and wherein the first converted flow cytometry event data within the second low-dimensional space is associated with a first bin of the second plurality of bins, and/or
wherein to convert the second flow cytometry event data, the processor is programmed with the executable instructions to: convert the second flow cytometry event data, using the second dimension-reduction function, into second converted flow cytometry event data associated with the second event within the second low-dimensional space, wherein the second converted flow cytometry event data within the second low-dimensional space is associated with a second bin of the second plurality of bins.
58. The system of claim 57,
Wherein the first bin of the first plurality of bins is associated with a first cell type of interest,
Wherein the second bin of the second plurality of bins is associated with a second cell type of interest,
Wherein the second bin of the first plurality of bins is unassociated with the first cell type of interest, wherein the second bin of the first plurality of bins is unassociated with the second cell type of interest,
Wherein the first bin of the second plurality of bins is unassociated with the second cell type of interest, and/or wherein the first bin of the second plurality of bins is unassociated with the first cell type of interest.
59. The system of claim 58, wherein a combination of the first bin of the first plurality of bins and the first bin of the second plurality of bins is associated with a first cell type of interest, and/or wherein a combination of the second bin of the first plurality of bins and the second bin of the second plurality of bins is associated with a second cell type of interest.
60. The system of claim 59, wherein a combination of the first bin of the first plurality of bins and the second bin of the second plurality of bins is unassociated with the first cell type of interest and the second cell type of interest, and/or wherein a combination of the second bin of the first plurality of bins and the first bin of the second plurality of bins is unassociated with the first cell type of interest and the second cell type of interest.
61. The system of any of claims 35-36, wherein two bins of the first plurality of bins have the same size.
62. The system of any of claims 35-36, wherein each of the first plurality of bins has the same size.
63. The system of any of claims 35-36, wherein two bins of the first plurality of bins have different sizes.
64. The system of any one of claims 35-36, wherein two bins of the first plurality of bins contain the same number of converted flow cytometry event data.
65. The system of any one of claims 35-36, wherein each bin of the first plurality of bins contains the same number of transformed flow cytometry event data.
66. The system of any of claims 35-36, wherein the processor is programmed with the executable instructions to: determine a size of each bin of the first plurality of bins.
67. The system of claim 66, wherein the processor is programmed with the executable instructions to: determine the size of each bin of the first plurality of bins based on a plurality of gates.
68. The system of claim 66, wherein the processor is programmed with the executable instructions to: determine the size of each bin of the first plurality of bins based on the converted flow cytometry event data associated with a plurality of cells of interest.
69. The method of claim 23, wherein the second dimension-reduction function is a linear dimension-reduction function.
70. The method of claim 23, wherein the second dimension-reduction function is a nonlinear dimension-reduction function.
71. The method of claim 70, wherein the nonlinear dimension-reduction function is t-distributed stochastic neighbor embedding (t-SNE).
72. The system of claim 57, wherein the second dimension-reduction function is a linear dimension-reduction function.
73. The system of claim 57, wherein the second dimension-reduction function is a nonlinear dimension-reduction function.
74. The system of claim 73, wherein the nonlinear dimension-reduction function is t-distributed stochastic neighbor embedding (t-SNE).
CN202080004716.2A 2019-04-19 2020-02-27 Subsampling flow cytometry event data Active CN112955729B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962836537P 2019-04-19 2019-04-19
US62/836,537 2019-04-19
PCT/US2020/020095 WO2020214248A1 (en) 2019-04-19 2020-02-27 Subsampling flow cytometric event data

Publications (2)

Publication Number Publication Date
CN112955729A CN112955729A (en) 2021-06-11
CN112955729B true CN112955729B (en) 2024-10-22

Family

ID=72832216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080004716.2A Active CN112955729B (en) 2019-04-19 2020-02-27 Subsampling flow cytometry event data

Country Status (5)

Country Link
US (3) US11402317B2 (en)
EP (1) EP3956649A4 (en)
JP (2) JP2022529196A (en)
CN (1) CN112955729B (en)
WO (1) WO2020214248A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112955729B (en) * 2019-04-19 2024-10-22 Becton Dickinson and Co Subsampling flow cytometry event data
WO2023146623A1 (en) * 2022-01-28 2023-08-03 Becton, Dickinson And Company Methods for array binning flow cytometry data and systems for same
WO2024097099A1 (en) * 2022-11-02 2024-05-10 Becton, Dickinson And Company Methods and systems for dimensionality reduction

Family Cites Families (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7043500B2 (en) * 2001-04-25 2006-05-09 Board Of Regents, The University Of Texas System Subtractive clustering for use in analysis of data
US7796815B2 (en) * 2005-06-10 2010-09-14 The Cleveland Clinic Foundation Image analysis of biological objects
KR20080034012A (en) * 2005-08-02 2008-04-17 루미넥스 코포레이션 Methods, data structures, and systems for classifying microparticles
US7299135B2 (en) 2005-11-10 2007-11-20 Idexx Laboratories, Inc. Methods for identifying discrete populations (e.g., clusters) of data within a flow cytometer multi-dimensional data set
US8214157B2 (en) * 2006-03-31 2012-07-03 Nodality, Inc. Method and apparatus for representing multidimensional data
US7853432B2 (en) * 2007-10-02 2010-12-14 The Regents Of The University Of Michigan Method and apparatus for clustering and visualization of multicolor cytometry data
ES2763537T3 (en) * 2008-09-16 2020-05-29 Beckman Coulter Inc Interactive tree diagram for flow cytometric data
FR2954907B1 (en) 2010-01-04 2012-02-24 Oreal COSMETIC COMPOSITION, COSMETIC PROCESSING METHOD, AND KIT
US10289802B2 (en) * 2010-12-27 2019-05-14 The Board Of Trustees Of The Leland Stanford Junior University Spanning-tree progression analysis of density-normalized events (SPADE)
US8705031B2 (en) * 2011-02-04 2014-04-22 Cytonome/St, Llc Particle sorting apparatus and method
US8809057B2 (en) * 2012-01-04 2014-08-19 Raytheon Bbn Technologies Corp. Methods of evaluating gene expression levels
WO2013119924A1 (en) * 2012-02-09 2013-08-15 Beckman Coulter, Inc. Sorting flow cytometer
WO2013134633A1 (en) * 2012-03-09 2013-09-12 Firefly Bioworks, Inc. Methods and apparatus for classification and quantification of multifunctional objects
JP6396911B2 (en) * 2012-10-15 2018-09-26 ナノセレクト バイオメディカル, インコーポレイテッド System, apparatus and method for sorting particles
US20140336942A1 (en) * 2012-12-10 2014-11-13 The Trustees Of Columbia University In The City Of New York Analyzing High Dimensional Single Cell Data Using the T-Distributed Stochastic Neighbor Embedding Algorithm
US10088407B2 (en) * 2013-05-17 2018-10-02 Becton, Dickinson And Company Systems and methods for efficient contours and gating in flow cytometry
US9551644B2 (en) * 2014-07-11 2017-01-24 Intellicyt Methods and apparatus for real-time detection and clearing of a clog
SG10201507049XA (en) * 2014-09-10 2016-04-28 Agency Science Tech & Res Method and system for automatically assigning class labels to objects
CA2971129A1 (en) 2015-01-22 2016-07-28 The Board Of Trustees Of The Leland Stanford Junior University Methods and systems for determining proportions of distinct cell subsets
US10685045B2 (en) * 2016-07-15 2020-06-16 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for cluster matching across samples and guided visualization of multidimensional cytometry data
KR102476676B1 (en) 2016-08-05 2022-12-12 패러데이 그리드 리미티드 Power supply system and method
KR102469620B1 (en) 2016-08-22 2022-11-21 아이리스 인터내셔널 인크. Classification systems and methods for biological particles
CN106548205A (en) * 2016-10-21 2017-03-29 北京信息科技大学 Fast automatic clustering and gating method for flow cytometry data
US9965702B1 (en) * 2016-12-27 2018-05-08 Cesar Angeletti Method for analysis and interpretation of flow cytometry data
US10360499B2 (en) * 2017-02-28 2019-07-23 Anixa Diagnostics Corporation Methods for using artificial neural network analysis on flow cytometry data for cancer diagnosis
WO2018165530A1 (en) * 2017-03-09 2018-09-13 Cytovas Llc Method of constructing a reusable low-dimensionality map of high-dimensionality data
WO2018217933A1 (en) 2017-05-25 2018-11-29 FlowJo, LLC Visualization, comparative analysis, and automated difference detection for large multi-parameter data sets
US11029242B2 (en) * 2017-06-12 2021-06-08 Becton, Dickinson And Company Index sorting systems and methods
AU2019236297B2 (en) * 2018-03-16 2024-08-01 The United States Of America, As Represented By The Secretary, Department Of Health & Human Services Using machine learning and/or neural networks to validate stem cells and their derivatives for use in cell therapy, drug discovery, and diagnostics
CN112136035B (en) 2018-04-26 2024-06-21 Becton Dickinson and Co Double index conversion for particle sorter
US20190360914A1 (en) * 2018-05-01 2019-11-28 Stephen W. Meehan Fully automated (unsupervised) identification, matching, display and quantitation of subsets (clusters) by exhaustive projection pursuit methods
EP3844482B1 (en) 2018-08-30 2025-03-12 Becton, Dickinson and Company Characterization and sorting for particle analyzers
WO2020081582A1 (en) * 2018-10-16 2020-04-23 Anixa Diagnostics Corporation Methods of diagnosing cancer using multiple artificial neural networks to analyze flow cytometry data
JP7437393B2 (en) 2018-10-17 2024-02-22 Becton, Dickinson and Company Adaptive sorting for particle analyzers
CN112955729B (en) * 2019-04-19 2024-10-22 Becton Dickinson and Co Subsampling flow cytometry event data
US20220082489A1 (en) * 2020-06-26 2022-03-17 Cytek Biosciences, Inc. Methods and apparatus for full spectrum flow cytometer

Also Published As

Publication number Publication date
EP3956649A4 (en) 2023-01-04
US20220317018A1 (en) 2022-10-06
CN112955729A (en) 2021-06-11
US12019006B2 (en) 2024-06-25
JP2025000819A (en) 2025-01-07
US20230258551A1 (en) 2023-08-17
US11402317B2 (en) 2022-08-02
US11674881B2 (en) 2023-06-13
US20200333236A1 (en) 2020-10-22
JP2022529196A (en) 2022-06-20
WO2020214248A1 (en) 2020-10-22
EP3956649A1 (en) 2022-02-23

Similar Documents

Publication Publication Date Title
US11994459B2 (en) Adaptive sorting for particle analyzers
JP7394786B2 (en) Characterization and sorting for particle analyzers
CN113811754B (en) Characterization and sorting of particle analyzers
US11674881B2 (en) Subsampling flow cytometric event data
US12130223B2 (en) Optimized sorting gates
CN113039428B (en) Compensation Editor
US20200333330A1 (en) Cytometric bead array analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant