WO2023196662A1 - Time series data layered storage systems and methods - Google Patents
- Publication number
- WO2023196662A1 (PCT/US2023/017981)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- segment
- record
- partition
- time series
- Prior art date
- 2022-04-08
Classifications
- G—PHYSICS; G06—COMPUTING, CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
- G06F16/24568—Data stream processing; Continuous queries
- G06F16/162—Delete operations
- G06F16/164—File meta data generation
- G06F16/2477—Temporal data queries
Definitions
- the present disclosure relates generally to systems and methods for managing data. More specifically, the present disclosure relates to systems and methods for managing time series data using layered data storage techniques.
- Time series data, such as, for example and without limitation, data generated by Internet-of-Things ("IoT") networks, may benefit from highly scalable solutions for data ingestion, storage, and retrieval.
- Embodiments of the disclosed systems and methods may use layered data storage techniques.
- data may be stored in at least two storage layers.
- a hot storage layer, where data may be stored in a record-oriented manner, may be used. Data stored in the hot storage layer may be made available for query with relatively minimal delay using more robust storage infrastructure.
- a hot storage layer may have a mechanism to expire and/or otherwise clean up older data (e.g., delete and/or mark and/or otherwise schedule for eventual deletion), based on user direction and/or automatically after a certain period of time and/or after data ages by a certain amount (e.g., after a number of subsequent data entries, after data is not queried for a certain period of time, and/or the like) and/or via other automated data management processes.
- a cold storage layer described in more detail below, may employ similar mechanisms to expire and/or otherwise clean up older data.
- a cold storage layer may be used, where data may be stored in relatively inexpensive storage infrastructure.
- data stored in a cold storage layer may be stored in a compressed and/or columnar format.
- the access latency for data available in the cold storage layer, which in certain instances herein may be referred to as data availability latency (i.e., the time it takes for ingested data to be made available for access), may be relatively high compared with data in the hot storage layer, but the storage costs for larger volumes of data may be relatively smaller.
- data may be stored in a way where certain data may be made available with relatively minimal query response latency and certain data may be stored in a relatively low cost and/or efficient storage solution.
- Further embodiments of the disclosed systems and methods provide techniques for managing and/or otherwise updating data stored between hot and cold storage layers using data merging and/or compaction techniques.
- data storage and management techniques consistent with various aspects disclosed herein may be relatively seamless from the perspective of a user as to where the data is materialized. Indeed, in some embodiments, data may be stored in both hot and cold storage layers, with duplicate data stored in both storage layers being removed at query time.
- Figure 1 illustrates a non-limiting example of a data storage and/or management service architecture consistent with certain embodiments disclosed herein.
- Figure 2 illustrates a non-limiting example of a multi-dimensional data management structure using data partitions and data segments consistent with certain embodiments disclosed herein.
- Figure 3 illustrates a non-limiting example of a data compaction process consistent with certain embodiments disclosed herein.
- Figure 4 illustrates a flow chart of a non-limiting example of a data compaction process consistent with certain embodiments disclosed herein.
- Figure 5 illustrates a flow chart of a non-limiting example of a data record deletion process using data record tombstones consistent with certain embodiments disclosed herein.
- Figure 6 illustrates a non-limiting example of a system that may be used to implement certain embodiments of the systems and methods of the present disclosure.
- Embodiments of the disclosed systems and methods may use layered data storage techniques to, among other things, provide data storage and/or management with relatively fast query response while reducing reliance on relatively expensive data storage infrastructure.
- a data storage and/or management architecture may comprise a hot storage layer, where data may be made available for query with relatively minimal data availability latency, and a cold storage layer, where data may be stored in a compacted columnar format in a relatively inexpensive storage infrastructure.
- AWS block storage on solid state drives may be significantly more expensive than AWS cloud object storage.
- This cost difference may be more significant if achieving the durability offered by some cloud object storage services requires data replication as part of a data management architecture.
- cloud object storage like AWS S3 may provide higher durability within a base service. Achieving comparable durability using fast block storage, however, may involve data replication that may be associated with more storage space and introduce extra costs.
- data size when stored in compact columnar format may be smaller than when stored in row-oriented format.
- Embodiments of the disclosed systems and methods may manage data storage between hot and cold storage layers in a way that more efficiently realizes storage savings in view of the storage cost differences between the layers.
- records entering into the system may be processed by hot storage layer components. Processing data by the hot storage layer may, in some implementations, make incoming records available for queries with relatively shorter delay. A copy of the record may be added to a store, which may be referred to in certain instances as a canonical store, where data may be stored in record-oriented compressed chunks in a cloud object store. These chunks of data may be used to produce cold storage layer updates, which may be periodic in nature.
- stored data may be partitioned. That is, in some implementations, a mechanism may be employed to divide dataset records into data partitions so that data records with the same key values end up in the same data partition.
- an object store may not permit changing of existing objects and/or files, and as such data updates may produce new files written to the object store.
- an update may produce a new update file, which may be a columnar file, for every time bucket (e.g., fixed-size periods based on data timestamps used to divide data) and for every data partition (assuming there are records belonging to a given data partition and a given time bucket).
- a data retrieval mechanism may filter out possible duplicate records.
- incoming records may be numbered and/or otherwise associated with sequence numbers.
- sequence numbers may be unique within a given partition (but in some implementations not necessarily unique globally) and may be monotonically increasing, although it will be appreciated that other suitable sequence number paradigms may also be used.
- records in the system may be associated with a primary key, which may comprise a set of record values determining distinct records in the system, and a sequence number.
- a data deduplication mechanism consistent with various aspects of the disclosed embodiments may be employed in instances where there are multiple records with the same primary key, such that the record with the highest sequence number is used (in the case of monotonically increasing sequence numbers).
- Embodiments of the disclosed systems and methods may allow for record updates by allowing insertion of a new record with the same primary key.
- Embodiments of the deduplication mechanism described above may ensure that a latest record is used.
- a specific record with the same primary key as the record to be deleted, but with no data values (such a record which may be referred to herein in certain instances as a data record tombstone and/or derivatives thereof), may be inserted into the storage system.
- a data record tombstone may be assigned a sequence number higher than the record that is intended to be replaced and/or otherwise deleted. The data record deduplication mechanism may thus retain the data record tombstone as the record with the highest sequence number.
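- As a non-limiting illustration of the deduplication and tombstone behavior described above, the following Python sketch (using hypothetical record dictionaries and field names that are not part of the disclosure) keeps, for each primary key, only the record with the highest sequence number and drops keys whose surviving record is a tombstone:

```python
# A minimal sketch (hypothetical record dicts and field names, not part of the disclosure)
# of the deduplication rule: keep the record with the highest sequence number per primary
# key, and treat a winning record with no data values as a tombstone (logical delete).

def deduplicate(records):
    latest = {}
    for rec in records:
        key = rec["primary_key"]
        if key not in latest or rec["seq"] > latest[key]["seq"]:
            latest[key] = rec
    # Keys whose highest-sequence record is a tombstone are dropped from the result.
    return [rec for rec in latest.values() if rec["values"] is not None]

records = [
    {"primary_key": ("sensor-1", "2023-02-01T00:00Z"), "seq": 10, "values": {"temperature": 20.1}},
    {"primary_key": ("sensor-1", "2023-02-01T00:00Z"), "seq": 12, "values": {"temperature": 20.4}},  # update
    {"primary_key": ("sensor-2", "2023-02-01T00:00Z"), "seq": 11, "values": None},                   # tombstone
]
print(deduplicate(records))  # only the seq-12 update for sensor-1 survives
```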
- columnar data files produced by cold storage layer updates may be divided.
- the data may first be divided into data partitions (e.g., based on some selected column values - that is, selector values). The data may then be divided into fixed-size periods based on data timestamps, which may be referred to as time buckets.
- Time buckets may, for example and without limitation, comprise daily, monthly, and/or yearly buckets. The choice of bucket size may be made based on expected query patterns, which may depend (at least in part) on which period(s) are likely to be queried together.
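- For illustration, a monthly time bucket may be derived from a record timestamp as in the following Python sketch (the bucket label format and the record fields shown are assumptions for the example, not part of the disclosure):

```python
# A minimal sketch of mapping a record timestamp to a fixed-size (monthly) time bucket.
from datetime import datetime, timezone

def monthly_bucket(ts: datetime) -> str:
    # Fixed-size period derived from the record timestamp, e.g. "2023-02".
    return ts.strftime("%Y-%m")

record = {"sensor_id": "sensor-17",
          "time": datetime(2023, 2, 28, 17, 59, tzinfo=timezone.utc),
          "temperature": 21.4}
bucket = monthly_bucket(record["time"])  # -> "2023-02"
# A periodic cold-layer update would append a new segment file for
# (partition = sensor-17, time bucket = 2023-02).
```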
- Certain embodiments of the disclosed data management systems and methods may address segmentation of data over time. To keep relatively less data in the hot storage layer, the system may update the cold storage layer with some frequency. Cold storage layer updates may produce a new segment file per data partition and per time bucket, leading over time to a fragmented representation of data. This may increase data retrieval times, as a larger number of segment files may need to be retrieved and processed to answer a query request. To address data fragmentation associated with data storage in the cold storage layer, embodiments of the disclosed systems and methods may merge multiple segment files of the same data partition and the same time bucket into one or more compacted segment files.
- a timeseries data table may comprise temperature measurements.
- Each record may have 3 fields: sensor_id, time, and temperature.
- the data may be partitioned by sensor_id - that is, each sensor's data may belong to a separate data partition.
- sensors may report data once per minute.
- a table may be configured to materialize data into a cold storage layer which may be updated periodically (e.g., daily).
- the data may be collected into time buckets (e.g., monthly time buckets).
- it may be assumed that the sensors operate correctly and therefore that there are few, if any, gaps in incoming data and minimal delays in data arrival.
- the example system may run a cold store update at 6 PM on February 28th.
- An update may produce at least one new segment file for each sensor containing record(s) since the last update (e.g., since 6 PM on the preceding day, February 27th).
- the next update, running at 6 PM on March 1st, may produce two segment files per sensor: one file belonging to the February time bucket containing records from the last update until March 1st at 00:00, and one file belonging to the March time bucket.
- the number of segment files within any given time bucket of a data partition may trigger data compaction processes consistent with various aspects of the disclosed embodiments.
- data compaction processes may eliminate duplicate records and/or tombstones using various aspects of the record deduplication mechanism detailed above.
- data compaction processes may be performed and/or otherwise triggered periodically (e.g., based on a user specified period and/or the like), based on reaching a threshold size of segment files and/or records, and/or the like.
- Data storage in storage layers consistent with various aspects of the disclosed embodiments may depend, at least in part, on one or more time periods, which may be set by a user and/or otherwise adjusted as appropriate.
- a time period to update data stored in the cold storage layer may be denoted as Td.
- a minimal time period for which data is kept in the hot storage layer may be Td + Tp (where Tp is the maximum time period required for processing a data increment so that it becomes available in the cold storage layer).
- the cold storage layer may be updated daily (e.g., every 24 hours) and the cold storage layer update process may be set to not exceed one hour.
- in such a configuration, data in the hot storage layer may be assigned a time-to-live ("TTL") of at least Td + Tp (e.g., 24 hours + 1 hour = 25 hours).
- the set of possibly overlapping records (i.e., records present in both the hot and cold storage layers) may fall into the time period Tp.
- one of the hot storage layer or the cold storage layer may be used instead of using both the hot storage layer and the cold storage layer for data storage and management.
- the hot storage layer may be used in applications where the most recent data need be stored and made available with minimal latency (and relatively low data volumes).
- the cold storage layer may be used in applications where data availability latency requirements are relatively low, but data volumes are relatively high.
- Figure 1 illustrates a non-limiting example of a data storage and/or management service architecture 100 consistent with certain embodiments disclosed herein.
- the architecture 100 may comprise systems, services, and/or components associated with a hot storage layer and a cold storage layer.
- the architecture 100 may further comprise systems, services, and/or components shared between the hot and cold storage layers and systems, services, and/or components associated with canonical storage.
- the definitions metastore 102 may provide definitions relating to namespaces, which may allow for different users to operate on and/or process data in a particular table while operating in different namespaces.
- Namespaces may be used, for example and without limitation, to localize table names (e.g., table names may be unique within a namespace) and/or to apply access rights to a namespace.
- information included in the definitions metastore 102 may be used to grant access rights based on namespaces (e.g., by an access management system and/or service). For example, users may be granted privileges to access certain data tables and be restricted from accessing certain other data tables.
- the definitions metastore 102 may further provide definitions relating to data tables, which may define the logical structure of data tables stored and/or otherwise managed by the service. Definitions relating to data tables may comprise, for example and without limitation, information relating to table elements and/or columns, data types, and/or the like. In some embodiments, the definitions metastore 102 may further provide information relating to one or more partitioning schemes (e.g., projections) supported by the data management service.
- the definitions metastore 102 may provide definitions relating to storage layers. For example, definitions may be provided regarding whether and/or what data should be stored in a hot storage layer, a cold storage layer, both storage layers, and/or the like, retention periods for stored data, which in some implementations may differ depending on the layer, update information for the hot and/or cold storage layers, criteria for data compaction operations, and/or the like. In this manner, information included in the definitions metastore 102 may help define the logical structure of data, how it should be partitioned by the service, how it should be written to storage, etc.
- the hot storage layer may comprise a streaming writer 104 and a hot data store 106.
- Data ingested into the data storage and management service may be published into one or more partitioned topics, which in some implementations may comprise partitioned Kafka topics.
- each message published to a topic may have a sequence number within an associated partition.
- each message published to a Kafka topic may have an offset within a given Kafka topic partition, which may function as a sequence number for various data management operations consistent with embodiments disclosed herein.
- the data storage and management service may expose a REST API that may allow external systems and/or services to insert data records into the data storage and/or management service.
- Data may be consumed (e.g., consumed from each topic) by a streaming writer 104.
- the streaming writer 104 may be configured to detect which data partition an incoming data record belongs to, store the record in the hot data store in the partition, and/or associate the data record with the data partition key associated with the target data partition.
- the hot data store 106 written to by the streaming writer 104 may comprise a Cassandra key-value database.
- the streaming writer 104 may further detect new data partitions from the ingested data records, potentially repartitioning the ingested data if needed (e.g., based on information included in the definitions metastore 102), add the data partition record to a data partitions index 108 (if needed), which may be shared between the hot storage and cold storage layers, and then store the data record with the new data partition key in the hot data store 106.
- sequence numbers may be assigned during the data ingestion process (e.g., assigned by the streaming writer 104).
- sequence numbers may be globally unique and/or increase monotonically.
- sequence numbers may be monotonically increasing and/or unique within a given data partition.
- data associated with topics ingested by the service may be associated with unique offset numbers within a given topic partition (e.g., as may be in the case with Kafka topics), which may be used as and/or otherwise associated with sequence numbers consistent with various aspects of the disclosed embodiments. It will be appreciated that sequence numbers may be associated with other paradigms.
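- As a hedged illustration of the above, a streaming writer consuming a partitioned Kafka topic might reuse the message offset as the per-partition sequence number, as in the following Python sketch (the kafka-python client, topic name, and helper functions are assumptions for the example and not part of the disclosure):

```python
# A minimal sketch: consume a partitioned topic and use the Kafka offset, which is unique
# and monotonically increasing within a topic partition, as the record's sequence number.
import json
from kafka import KafkaConsumer  # kafka-python (assumed client library)

consumer = KafkaConsumer("sensor-data",
                         bootstrap_servers="localhost:9092",
                         value_deserializer=lambda v: json.loads(v.decode("utf-8")))

for msg in consumer:
    record = msg.value
    sequence_number = msg.offset                      # sequence number within msg.partition
    partition_key = compute_partition_key(record)     # hypothetical helper (see hash(keyN) mod P below)
    write_to_hot_store(partition_key, record, sequence_number)  # hypothetical hot-store write
```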
- data stored in the hot data store 106 may be associated with a time-to-live ("TTL") specifying a time and/or period that the data should be kept in the hot data store 106.
- this information may be specified in the definitions metastore 102.
- the relevant Cassandra table may have TTL set according to a user-specified configuration.
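- For example, where the hot data store is backed by Cassandra, a table-level TTL covering Td + Tp (e.g., 24 hours + 1 hour = 25 hours = 90,000 seconds) might be configured as in the following sketch (the Python cassandra-driver usage, keyspace, and schema are illustrative assumptions, not part of the disclosure):

```python
# A minimal sketch: create a hot-layer table whose rows expire automatically after 25 hours.
from cassandra.cluster import Cluster  # assumed DataStax Python driver

session = Cluster(["127.0.0.1"]).connect("hot_layer")  # assumes the keyspace already exists
session.execute("""
    CREATE TABLE IF NOT EXISTS sensor_data (
        partition_key text,
        ts timestamp,
        seq bigint,
        temperature double,
        PRIMARY KEY (partition_key, ts, seq)
    ) WITH default_time_to_live = 90000
""")
```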
- a canonical storage layer may comprise a canonical store writer 110, a canonical store 112, and a canonical segment index 114.
- Data ingested into the data storage and management service may be provided to the canonical store writer 110.
- the canonical store writer 110 may consume received topic record data, process the data, and/or store the data in a canonical store 112.
- the canonical store 112 may, in some embodiments, comprise a cloud-based storage service such as, for example and without limitation, AWS S3.
- Files written to the canonical store 112 may be associated with a record added to the canonical segment index 114, which may provide index information relating to records stored in the canonical store 112.
- Data stored in the canonical store 112 may be used in connection with various cold layer storage operations, as discussed in more detail below, partitioning and/or repartitioning operations, data backup operations, and/or the like.
- the cold storage layer may comprise a segment extraction service 116, a cold data segment store 118, a data segment indexer 120, a data segment index 122, and/or a segment compaction service 126.
- data stored in the canonical store 112 and/or index information included in the canonical segment index 114 may be used to build data records within the cold storage layer.
- the segment extraction service 116 may interact with the canonical store 112 and/or the canonical segment index 114 to access data from the canonical store 112, potentially process the data (e.g., partitioning and/or otherwise organizing the data into time buckets ordered by record time), and store the data within the cold data segment store 118.
- the segment extraction service 116 may interact with the data segment indexer service 120 to generate one or more records in a data segment index 122 associated with the data stored in the cold data segment store 118.
- the segment extraction service 116 may store data in the cold data segment store 118 based, at least in part, on information included in the definitions metastore 102.
- the definitions metastore 102 may include information relating to cold data storage layer data storage and/or update scheduling, which may comprise information relating to update period, update frequency, update data amount thresholds, and/or the like. This information may be used by the segment extraction service 116 to schedule data recordation actions and/or updates from the canonical store 112 to the cold data segment store 118.
- the definitions metastore 102 may include update scheduling information indicating that the cold storage layer should be updated daily.
- Records added to the canonical store 112 during the day period may then be retrieved by the segment extraction service 116, which may partition the records in accordance with a partitioning scheme (which may be defined by information included in the definitions metastore 102) and then write the partitioned data to the cold data segment store 118.
- data stored in the cold data segment store 118 may comprise columnar files.
- data written to the cold data segment store 118 may be divided between time periods, which may be referred to in certain instances herein as time buckets, so that data of a single data partition associated with timestamps belonging to a given time period are stored in the same time bucket. This may, among other things, facilitate streamlined data retrieval and/or management operations. For example, in connection with data retrieval over a specific time range, time bucket information may be used to quickly identify data segments for retrieval.
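- As a purely illustrative sketch of one possible object layout (the key convention shown is an assumption, not part of the disclosure), segments might be keyed by data partition and time bucket so that a time-range query can prune objects quickly:

```python
# A minimal sketch of a per-partition, per-time-bucket object key for a segment file.
def segment_key(table: str, partition_key: str, time_bucket: str, update_id: int) -> str:
    return f"{table}/partition={partition_key}/bucket={time_bucket}/segment-{update_id:06d}.parquet"

segment_key("sensor_data", "sensor-17", "2023-02", 42)
# -> 'sensor_data/partition=sensor-17/bucket=2023-02/segment-000042.parquet'
```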
- Data written to the cold data segment store 118 may be associated with one or more records included in a data segment index 122.
- the segment extraction service 116 may interact with a data segment indexer 120 to add an index record to the data segment index 122 associated with the data record.
- the segment extraction service 116 may be implemented using Apache Spark and the cold data segment store 118 may be implemented using Parquet and/or AWS S3 storage.
- a Spark job may be launched by the segment extraction service 116, potentially on a periodic basis (e.g., on a user-specified periodic basis).
- the Spark job may produce a new data segment for storage by the cold data segment store 118 as a Parquet file for defined data partitions and time buckets.
- the segment may be stored in AWS S3 storage and a relevant entry may be added to the data segment index 122 by the data segment indexer 120.
- Another Spark job (e.g., a periodic Spark job) may be executed to implement segment compaction for datasets, which may in some implementations meet user-specified compaction criteria.
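- As a hedged sketch of such a periodic extraction job (the PySpark code, paths, and column names below are assumptions for the example, not the disclosed implementation), the job might read the latest canonical chunks, assign a monthly time bucket to each record, and write one Parquet segment tree per (data partition, time bucket):

```python
# A minimal PySpark sketch of a periodic segment extraction job.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("segment-extraction").getOrCreate()

# Hypothetical canonical-store location and schema: partition_key, ts (timestamp), seq, value columns.
canonical = spark.read.parquet("s3a://canonical-store/table=sensor_data/ingest_date=2023-02-28/")

segments = canonical.withColumn("time_bucket", F.date_format(F.col("ts"), "yyyy-MM"))

# One output directory per (partition_key, time_bucket); index records for the data segment
# index would be written separately by the data segment indexer.
(segments
 .repartition("partition_key", "time_bucket")
 .sortWithinPartitions("ts", "seq")
 .write
 .partitionBy("partition_key", "time_bucket")
 .mode("append")
 .parquet("s3a://cold-segment-store/table=sensor_data/"))
```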
- use of a canonical storage layer in conjunction with a cold storage layer may allow for certain optimized data processing, management, retrieval, and/or query functionality.
- the canonical store 112 may store record data in a compacted form, but the partitioning and/or division of data and use of time buckets in connection with the cold data segment store 118 may provide certain data processing, retrieval, management, and/or querying efficiencies that may not be otherwise realized directly by the canonical storage layer.
- the definitions metastore 102 may comprise information used by various systems, services, and/or components of the disclosed service to determine which ingested topics should be recorded by the hot data storage layer and the canonical store (and by extension, the cold data storage layer).
- the streaming writer 104 and the canonical store writer 110 may use information included in the definitions metastore 102 to determine which ingested data should be recorded in the hot data store 106 and/or the canonical store 112.
- an entire incoming data stream may be ingested by the canonical store writer 110 for storage in the canonical store 112 (and/or the cold data storage layer), but only a subset of data may be ingested by the streaming writer 104 for storage in the hot data store 106.
- the subset may be associated with particular data topics, tables, and/or associated projections.
- the definitions metastore 102 may include information directing that the streaming writer 104 process incoming data associated with a particular topic for storage in the hot data store 106 (e.g., if there is a hot storage materialization defined for the incoming topic and/or the like).
- the definitions metastore 102 may comprise information specifying a variety of other ways that data included in a data stream may be processed and/or otherwise ingested by the canonical store writer 110 and/or the streaming writer 104.
- the definitions metastore 102 may comprise information specifying that all incoming data may be ingested by both the hot storage layer and the canonical storage layer.
- data stored in the canonical store 112 may be used in connection with data restoration and/or backup operations. For example, if data is deleted from the hot storage layer and/or the cold storage layer but remains stored in the canonical store 112, it may be restored to the hot storage layer and/or the cold storage layer from the canonical store 112.
- data stored in the canonical store 112 may be used in connection with data repartitioning operations.
- the data storage and/or management service and/or a user thereof may determine that it is advantageous to repartition data stored in the cold storage layer from the original materialized projection (e.g., based on how the data in the cold storage layer is being queried or the like).
- the data may be repartitioned and stored in the cold data storage layer consistent with the updated projection. It will be appreciated that a variety of other events triggering a repartitioning of data in the cold data storage layer may be used in connection with various aspects of the disclosed embodiments.
- a streaming read API 124 may be queried with relevant query information (e.g., identifying data partitions and/or time periods).
- the streaming read API 124 may query the hot and cold storage layers based on the identified data partitions and/or time periods.
- low level data retrieval components may apply filters to the fetched data.
- the time-ordered sequences of records belonging to data partitions fetched from both layers may be processed by a deduplicator, where records having the same primary key but lower sequence numbers may be discarded. Then records from different data partitions may be merged into a single result, and optional post-processing such as sorting or aggregation may be executed.
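- The read path described above might be sketched as follows in Python (the fetch helpers, field names, and ordering assumptions are hypothetical): time-ordered records from the hot and cold layers are merged, deduplicated by primary key keeping the highest sequence number, and returned in time order:

```python
# A minimal sketch of query-time merging and deduplication across the hot and cold layers.
import heapq

def read_partition(partition_key, time_range):
    hot = fetch_from_hot_store(partition_key, time_range)       # hypothetical, time-ordered iterator
    cold = fetch_from_cold_segments(partition_key, time_range)  # hypothetical, time-ordered iterator
    merged = heapq.merge(hot, cold, key=lambda r: (r["time"], r["seq"]))

    latest = {}
    for rec in merged:
        key = rec["primary_key"]
        if key not in latest or rec["seq"] > latest[key]["seq"]:
            latest[key] = rec   # duplicates overlapping between the layers collapse here

    return sorted(latest.values(), key=lambda r: r["time"])
```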
- a segment compaction service 126 may launch segment merging and/or compaction operations consistent with various disclosed embodiments (e.g., by launching an associated Spark job), potentially on a periodic basis and/or according to a user-specified schedule.
- the segment compaction operation may be performed and/or otherwise implement certain user-specified compaction criteria.
- Figure 2 illustrates a non-limiting example of a multi-dimensional data management structure 200 using data partitions and data segments consistent with certain embodiments disclosed herein.
- ingested data may be organized in a multi-dimensional space with a first dimension comprising an index to data partitions 202 and a second dimension comprising an index to data segments 204 within data partitions.
- data and/or entries within data segments may be time ordered.
- a data table may comprise columns, from which a subset of columns may be selected for calculating data partitioning keys.
- these columns of a data table may comprise entries that may be referred to as selectors.
- Selectors may be associated with a given partitioning scheme (which may be referred to in certain instances herein as a data projection and/or derivatives of the same).
- data partition keys may be calculated as a function of certain data values included in a data table (e.g., a hash function). As illustrated in connection with Figure 2, selectors may be included in a data partitions index 202 associated with data partition keys.
- the data segments index 204 may associate data partition keys with specific timestamp ranges.
- a data partition key may be associated with multiple segments of a particular data partition (e.g., data partition key key 1 may be associated with the first and second data segments 206, 208 of the first data partition).
- a data partition key may be associated with a single segment of a data partition (e.g., data partition key key M may be associated with a first data segment 210 of an Mth data partition).
- records with timestamp time 1.T may exist in both the first and the second segments 206, 208 of the first data partition.
- ingested records and/or data may be associated with a sequence number. Multiple records associated with the same timestamp may be differentiated based on associated sequence numbers.
- sequence numbers may be globally unique and increase monotonically. In further embodiments, sequence numbers may be unique within a given data partition. In the event there are duplicate records in the system, during data retrieval and/or querying processes, duplicate records may be filtered out so that only the data and/or record with the highest sequence number is returned. In certain embodiments, additional table columns may be associated with a data record to allow for additional information to be associated with the record and be used in connection with record differentiation.
- Sequence numbers associated with data records may be used in connection with a variety of data operations including, for example and without limitation, data update, data access, data deletion, and/or data compaction and/or merging operations.
- sequence numbers may be used in connection with ingesting and retrieving updates of previously ingested data records, where a data record with a higher sequence number may be retrieved as part of a query to ensure the most up to date record is retrieved.
- a data record associated with timestamp time 1.1 in the first segment 206 of the first data partition may be associated with sequence number seq 1.
- An update to the data record, also associated with timestamp time 1.1 may be ingested and stored in same segment 206.
- the updated data record may be assigned sequence number seq 2.
- in connection with a query, the record with the greater sequence number - that is, sequence number seq 2 - would be retrieved and/or otherwise considered the most up-to-date data record.
- a data table may comprise columns col1, col2, col3, col4, where col2 and col3 are the selectors for projection p1, and a record rN is a collection of tuples of column identifier and value ((id(col1), val1N), (id(col2), val2N), (id(col3), val3N), (id(col4), val4N)).
- the relevant processing/storage partition index can be calculated as hash(keyN) mod P.
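- A hedged Python sketch of that calculation (using Python's hashlib rather than any particular production hash function, with hypothetical column names) is shown below:

```python
# A minimal sketch: derive the partition key from the projection's selector columns
# and map it to one of P processing/storage partitions via hash(keyN) mod P.
import hashlib

P = 16  # hypothetical number of partitions

def partition_index(record: dict, selectors=("col2", "col3")) -> int:
    key = "|".join(str(record[c]) for c in selectors)        # keyN built from selector values
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % P                               # hash(keyN) mod P

partition_index({"col1": "a", "col2": "sensor-17", "col3": "room-3", "col4": 21.4})
```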
- Sequence numbers consistent with various aspects of the disclosed systems and methods may allow for streamlined data updates and/or retrieval operations.
- original data may not be deleted when updates are received (and in some embodiments may be assigned higher sequence numbers)
- use of sequence numbers consistent with various aspects of the disclosed embodiments may facilitate data auditing and/or other methods of inspecting data record history, provenance, and/or the like.
- original data may not be immediately deleted (and/or may be configured to be retained in perpetuity and/or for some length of time depending on how data cleanup and/or deduplication processes are configured)
- use of sequence numbers consistent with aspects of the disclosed embodiments may provide data record versioning and/or backup functionality, where data records with lower sequence numbers may be accessed to access prior versions of data records.
- Figure 3 illustrates a non-limiting example of a data compaction process 300 consistent with certain embodiments disclosed herein.
- a first record 308 associated with timestamp time 1.1 may be stored in a first segment of a first data partition 302 and be associated with sequence number seq 1.
- a second record 310 also associated with timestamp time 1.1 may also be stored in the first segment of the first data partition 302, associated with a higher sequence number seq 2.
- the second record 310 may comprise, for example, an update to the first record 308.
- data compaction processes may generate a compacted segment of a first data partition 306 that comprises the record with the higher sequence number - that is, the second data record 310 associated with timestamp time 1.1 and sequence number seq 2.
- a data record 312 associated with timestamp time 1.T may be stored in the first segment of the first data partition 302 and be associated with sequence number seq S.
- a different data record 314 associated with timestamp time 1.T may be stored in the second segment of the first data partition 304 and be associated with sequence number seq S+1.
- a data compaction process consistent with certain embodiments disclosed herein may add the record with the higher sequence number - that is, the data record 314 associated with timestamp time 1.T and sequence number seq S+1 - to the compacted data segment 306.
- a data record 316 included in the second segment of the first data partition 304, associated with timestamp time 1.T+1 and sequence number seq S+2, may also be added to the compacted data segment 306.
- the most current data records of the first and second segments of the first partition 302, 304 may be combined in the compacted data segment of the first data partition 306 (with the first and second segments 302, 304 being scheduled for eventual deletion).
- Figure 4 illustrates a flow chart of a non-limiting example of data compaction process 400 consistent with certain embodiments disclosed herein.
- the illustrated process 400 may be implemented in a variety of ways, including using software, firmware, hardware, and/or any combination thereof.
- various aspects of the process 400 and/or its constituent steps may be performed by one or more systems and/or services, including systems and/or services that may implement aspects of a hot data storage layer, a cold data storage layer, a canonical data store, and/or various shared systems and/or services.
- the data compaction process 400 and/or aspects thereof may be initiated periodically, based on user direction, and/or following one or more conditions and/or triggers.
- the disclosed data compaction process and/or aspects thereof may be initiated based on determining that a total number of data segments of a data partition has reached a threshold number of data segments, a total storage size of a data partition has reached a threshold total storage size, and/or the like.
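- As a hedged illustration (the thresholds and index helpers below are assumptions, not part of the disclosure), such a trigger check might look like the following Python sketch:

```python
# A minimal sketch of deciding whether a (data partition, time bucket) should be compacted.
MAX_SEGMENTS = 10           # hypothetical threshold on the number of segment files
MAX_BYTES = 512 * 1024**2   # hypothetical threshold on total partition storage size

def should_compact(partition_key, time_bucket):
    segments = list_segments(partition_key, time_bucket)  # hypothetical data-segment-index lookup
    total_bytes = sum(s["size_bytes"] for s in segments)
    return len(segments) >= MAX_SEGMENTS or total_bytes >= MAX_BYTES
```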
- a first time series data record stored in a first segment of a first data partition may be identified.
- the first data partition may be stored in a cold data store (e.g., a cold data store managed, at least in part, by a data management service system).
- the cold data store may comprise a cloud service data store.
- the first time series data record may be associated with a first timestamp and a first sequence identifier.
- sequence identifiers may comprise sequence numbers, although other types of sequence identifiers may also be used.
- sequence identifiers and/or numbers may be monotonically increasing and/or be unique within a given partition.
- It may be determined at 404 whether another record in the first data partition is associated with the same timestamp as the first timestamp (e.g., as may be the case if an update, revision, and/or newer record to the first time series data record has been stored in the first data partition).
- the process 400 may proceed to 406, where a second time series data record stored in the first data partition that is associated with the first timestamp may be identified.
- the second time series data record may be associated with a second sequence identifier.
- the second time series data record may be stored in the first segment of the first data partition. In further embodiments, the second time series data record may be stored in another segment of the first data partition.
- a compacted segment of the first data partition may be generated and/or stored at 410.
- the compacted data segment may comprise the second time series data record.
- a record may be updated multiple times. In such instances, the record with the largest sequence identifier (e.g., indicating it is the most recent updated record) among records sharing a timestamp may be identified and included in the compacted data segment.
- the first segment of the first data partition may be marked for deletion at 412.
- a cold storage layer data segment index may be updated.
- a third time series data record may be identified in the first data segment of the first data partition that is associated with a second timestamp and a third sequence identifier.
- a fourth time series data record stored in a second segment of the first data partition may be further identified that is also associated with the second timestamp and is further associated with a fourth sequence identifier. It may be determined that the fourth sequence identifier is greater than the third sequence identifier, and the generated compacted segment of the first data partition may further include the fourth time series data record.
- a compacted segment may comprise records originating from a plurality of data segments of a data partition.
- the second data segment may be marked for deletion as part of the compaction process.
- marking one or more data segments for deletion may comprise scheduling data segment(s) for deletion.
- deletion of data segments may be scheduled to occur at a particular deletion time.
- the deletion time may comprise, for example, a next scheduled deletion time, which may be periodic and/or scheduled by a user, a deletion time determined based, at least in part, on determining that a total number of data segments of the first data partition has reached a threshold number of data segments, determining that a total storage size of the first data partition has reached a threshold total storage size, and/or the like.
- Figure 5 illustrates a flow chart of a non-limiting example of data record deletion process 500 using data record tombstones consistent with certain embodiments disclosed herein.
- the illustrated process 500 may be implemented in a variety of ways, including using software, firmware, hardware, and/or any combination thereof.
- various aspects of the process 500 and/or its constituent steps may be performed by one or more systems and/or services, including systems and/or services that may implement aspects of a hot data storage layer, a cold data storage layer, a canonical data store, and/or various shared systems and/or services.
- a time series data record may be received for storage in a data partition (e.g., a data partition of a cold data store). Consistent with various disclosed embodiments, the time series data record may be associated with a timestamp and a first sequence identifier. The time series data record may be stored in the data partition at 504.
- a request to delete the time series data record may be received at 506.
- a time series data record tombstone may be generated and stored in the data partition (in the same and/or a different segment within the data partition) at 508.
- the data record tombstone may be associated with the timestamp and a second sequence identifier, which may be higher and/or greater than the first sequence identifier, indicating that the tombstone was recorded after the time series data record.
- the time series data record tombstone may not include any data values (i.e., it may be an empty data record) and/or may comprise information indicating and/or otherwise identifying that the record is a tombstone record.
- the tombstone record may be identified as part of data cleanup, compaction, and/or merging processes consistent with various disclosed embodiments and used in connection with data management processes.
- a record associated with a tombstone record may not be included in a compacted data segment (which may or may not include the tombstone record) generated as part of a data compaction and/or merging process.
- Figure 6 illustrates an example of a system 600 that may be used to implement certain embodiments of the systems and methods of the present disclosure.
- the various systems, services, and/or devices used in connection with aspects of the disclosed embodiments may be communicatively coupled using a variety of networks and/or network connections (e.g., network 608).
- the network 608 may comprise a variety of network communication devices and/or channels and may utilize any suitable communications protocols and/or standards facilitating communication between the systems and/or devices.
- the network 608 may comprise the Internet, a local area network, a virtual private network, and/or any other communication network utilizing one or more electronic communication technologies and/or standards (e.g., Ethernet or the like).
- the network 608 may comprise a wireless carrier system such as a personal communications system ("PCS"), and/or any other suitable communication system incorporating any suitable communication standards and/or protocols.
- the network 608 may comprise an analog mobile communications network and/or a digital mobile communications network utilizing, for example, code division multiple access (“CDMA”), Global System for Mobile Communications or Groupe Special Mobile (“GSM”), frequency division multiple access (“FDMA”), time divisional multiple access (“TDMA”) standards, 4G and/or 5G communication standards (e.g., Long-Term Evolution (“LTE”), 5G New Radio (“NR”), orthogonal frequency division multiple access (“OFDMA”), etc.).
- the network 608 may incorporate one or more satellite communication links.
- the network may utilize IEEE's 802.11 standards, Bluetooth®, ultra-wide band ("UWB"), Zigbee®, and/or any other suitable standard or standards.
- the various systems and/or devices used in connection with aspects of the disclosed embodiments may comprise a variety of computing devices and/or systems, including any computing system or systems suitable to implement the systems and methods disclosed herein.
- the connected devices and/or systems may comprise a variety of computing devices and systems, including laptop computer systems, desktop computer systems, server computer systems, distributed computer systems, smartphones, tablet computers, and/or the like.
- the systems and/or devices may comprise at least one processor system configured to execute instructions stored on an associated non-transitory computer-readable storage medium.
- systems used in connection with implementing various aspects of the disclosed embodiments may further comprise a secure processing unit ("SPU") configured to perform sensitive operations such as trusted credential and/or key management, cryptographic operations, secure policy management, and/or other aspects of the systems and methods disclosed herein.
- the systems and/or devices may further comprise software and/or hardware configured to enable electronic communication of information between the devices and/or systems via a network using any suitable communication technology and/or standard.
- the example system 600 may comprise: a processing unit 602; system memory 604, which may include high-speed random access memory ("RAM"), non-volatile memory ("ROM"), and/or one or more bulk non-volatile non-transitory computer-readable storage mediums (e.g., a hard disk, flash memory, etc.) for storing programs and other data for use and execution by the processing unit 602; a port 614 for interfacing with removable memory 616 that may include one or more diskettes, optical storage mediums (e.g., flash memory, thumb drives, USB dongles, compact discs, DVDs, etc.) and/or other non-transitory computer-readable storage mediums; a network interface 606 for communicating with other systems via one or more network connections and/or networks 608 using one or more communication technologies; and a user interface 612 that may include a display and/or one or more input/output devices such as, for example, a touchscreen, a keyboard, a mouse, a track pad, and/or the like.
- the system 600 may, alternatively or in addition, include an SPU 610 that is protected from tampering by a user of the system 600 or other entities by utilizing secure physical and/or virtual security techniques.
- An SPU 610 can help enhance the security of sensitive operations such as personal information management, trusted credential and/or key management, privacy and policy management, and other aspects of the systems and methods disclosed herein.
- the SPU 610 may operate in a logically secure processing domain and be configured to protect and operate on secret information, as described herein.
- the SPU 610 may include internal memory storing executable instructions or programs configured to enable the SPU 610 to perform secure operations, as described herein.
- the operation of the system 600 may be generally controlled by the processing unit 602 and/or an SPU 610 operating by executing software instructions and programs stored in the system memory 604 (and/or other computer-readable media, such as removable memory 616).
- the system memory 604 may store a variety of executable programs or modules for controlling the operation of the system 600.
- the system memory may include an operating system ("OS") 620 that may manage and coordinate, at least in part, system hardware resources and provide for common services for execution of various applications and a trust and privacy management system 622 for implementing trust and privacy management functionality including protection and/or management of personal data through management and/or enforcement of associated policies.
- the system memory 604 may further include, without limitation, communication software 624 configured to enable in part communication with and by the system 600, one or more applications, data management services 626 configured to implement various aspects of the disclosed systems and/or methods, and/or any other information and/or applications configured to implement embodiments of the systems and methods disclosed herein and/or aspects thereof.
- the systems and methods disclosed herein are not inherently related to any particular computer, electronic control unit, or other apparatus and may be implemented by a suitable combination of hardware, software, and/or firmware.
- Software implementations may include one or more computer programs comprising executable code/instructions that, when executed by a processor, may cause the processor to perform a method defined at least in part by the executable instructions.
- the computer program can be written in any form of programming language, including compiled or interpreted languages, and can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. Further, a computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.
- Software embodiments may be implemented as a computer program product that comprises a non-transitory storage medium configured to store computer programs and instructions, that when executed by a processor, are configured to cause the processor to perform a method according to the instructions.
- the non-transitory storage medium may take any form capable of storing processor-readable instructions on a non-transitory storage medium.
- a non-transitory storage medium may be embodied by a compact disk, digital-video disk, a magnetic disk, flash memory, integrated circuits, or any other non-transitory digital processing apparatus memory device.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP23721159.4A EP4505320A1 (en) | 2022-04-08 | 2023-04-07 | Time series data layered storage systems and methods |
AU2023248423A AU2023248423A1 (en) | 2022-04-08 | 2023-04-07 | Time series data layered storage systems and methods |
JP2024559449A JP2025511879A (en) | 2022-04-08 | 2023-04-07 | Time series data tiered storage system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263329346P | 2022-04-08 | 2022-04-08 | |
US63/329,346 | 2022-04-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023196662A1 true WO2023196662A1 (en) | 2023-10-12 |
Family
ID=86286160
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2023/017981 WO2023196662A1 (en) | 2022-04-08 | 2023-04-07 | Time series data layered storage systems and methods |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230325363A1 (en) |
EP (1) | EP4505320A1 (en) |
JP (1) | JP2025511879A (en) |
AU (1) | AU2023248423A1 (en) |
WO (1) | WO2023196662A1 (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12001406B2 (en) * | 2022-07-29 | 2024-06-04 | Oracle International Corporation | Method and system to implement directory reads for a database file system |
CN117648297B (en) * | 2024-01-30 | 2024-06-11 | 中国人民解放军国防科技大学 | Offline merging method, system, device and medium of small files based on object storage |
CN118820264B (en) * | 2024-09-18 | 2024-12-17 | 戎行技术有限公司 | Partition-based data processing method and apparatus |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9753935B1 (en) * | 2016-08-02 | 2017-09-05 | Palantir Technologies Inc. | Time-series data storage and processing database system |
2023
- 2023-04-07 WO PCT/US2023/017981 patent/WO2023196662A1/en active Application Filing
- 2023-04-07 US US18/132,348 patent/US20230325363A1/en active Pending
- 2023-04-07 AU AU2023248423A patent/AU2023248423A1/en active Pending
- 2023-04-07 EP EP23721159.4A patent/EP4505320A1/en active Pending
- 2023-04-07 JP JP2024559449A patent/JP2025511879A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10033706B2 (en) * | 2015-12-04 | 2018-07-24 | Samsara Networks Inc. | Secure offline data offload in a sensor network |
US10775976B1 (en) * | 2018-10-01 | 2020-09-15 | Splunk Inc. | Visual previews for programming an iterative publish-subscribe message processing system |
US20200167360A1 (en) * | 2018-11-23 | 2020-05-28 | Amazon Technologies, Inc. | Scalable architecture for a distributed time-series database |
US11068537B1 (en) * | 2018-12-11 | 2021-07-20 | Amazon Technologies, Inc. | Partition segmenting in a distributed time-series database |
Also Published As
Publication number | Publication date |
---|---|
AU2023248423A1 (en) | 2024-09-26 |
EP4505320A1 (en) | 2025-02-12 |
US20230325363A1 (en) | 2023-10-12 |
JP2025511879A (en) | 2025-04-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230325363A1 (en) | Time series data layered storage systems and methods | |
CN110249321B (en) | System and method for capturing change data from a distributed data source for use by heterogeneous targets | |
US10764045B2 (en) | Encrypting object index in a distributed storage environment | |
US10158483B1 (en) | Systems and methods for efficiently and securely storing data in a distributed data storage system | |
US10296494B2 (en) | Managing a global namespace for a distributed filesystem | |
US10761758B2 (en) | Data aware deduplication object storage (DADOS) | |
US7257690B1 (en) | Log-structured temporal shadow store | |
US7756831B1 (en) | Cooperative locking between multiple independent owners of data space | |
US9804928B2 (en) | Restoring an archived file in a distributed filesystem | |
US10635632B2 (en) | Snapshot archive management | |
US9852149B1 (en) | Transferring and caching a cloud file in a distributed filesystem | |
US9792298B1 (en) | Managing metadata and data storage for a cloud controller in a distributed filesystem | |
US9811662B2 (en) | Performing anti-virus checks for a distributed filesystem | |
US9678968B1 (en) | Deleting a file from a distributed filesystem | |
CN113515487B (en) | Directory query method, computing device and distributed file system | |
WO2011108021A1 (en) | File level hierarchical storage management system, method, and apparatus | |
US20150293699A1 (en) | Network-attached storage enhancement appliance | |
JP6968876B2 (en) | Expired backup processing method and backup server | |
US20140006354A1 (en) | Executing a cloud command for a distributed filesystem | |
US11093387B1 (en) | Garbage collection based on transmission object models | |
US8533158B1 (en) | Reclaiming data space by rewriting metadata | |
EP2583183A1 (en) | Data deduplication | |
US20070061540A1 (en) | Data storage system using segmentable virtual volumes | |
EP3788505B1 (en) | Storing data items and identifying stored data items | |
US10628298B1 (en) | Resumable garbage collection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23721159 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: AU23248423 Country of ref document: AU |
|
ENP | Entry into the national phase |
Ref document number: 2023248423 Country of ref document: AU Date of ref document: 20230407 Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2024559449 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023721159 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2023721159 Country of ref document: EP Effective date: 20241108 |