WO2019164718A1 - Supervised learning system - Google Patents

Supervised learning system Download PDF

Info

Publication number
WO2019164718A1
Authority
WO
WIPO (PCT)
Prior art keywords
data item
item
decision
information
classifier
Application number
PCT/US2019/017777
Other languages
French (fr)
Inventor
Lukas Machlica
Ivan Nikolaev
Jan Brabec
Original Assignee
Cisco Technology, Inc.
Application filed by Cisco Technology, Inc. filed Critical Cisco Technology, Inc.
Priority to EP19707599.7A (EP3756146B1)
Publication of WO2019164718A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F 21/53 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F 21/55 Detecting local intrusion or implementing counter-measures
    • G06F 21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F 21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L 63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00 Network architectures or network communication protocols for network security
    • H04L 63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L 63/1441 Countermeasures against malicious traffic
    • H04L 63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L 9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials

Definitions

  • The “reasoning” may comprise the decisions made at nodes 210a, 210b, 210c, and 210d, based at the respective nodes on associated determination information 235, 236, 237, and 238, respectively.
  • In the particular example discussed, the “reasoning” will comprise “size of item exceeds 1056 bytes”, per the decision made at node 210b based on the determination information 236 associated with node 210b; the “reasoning” will also comprise information per the decisions made at nodes 210a, 210c, and 210d, based on determination information 235, 237, and 238, respectively.
  • In addition, the “reasoning” may comprise non-available determination information 252 and / or 253, which, as indicated above, is information that was available at the time of training but relates to one or more characteristics which are not readily known / not readily available at the time when the item for classification is to be classified.
  • For example, the non-available determination information 252 may comprise “execution in a controlled environment suggests malware”.
  • Fig. 3 illustrates pseudo code 300 which provides a particularly detailed non-limiting example of how the decision tree of Fig. 2 may be built.
  • a) Pairing between data sources is implicit in the input functions f1, ..., fn.
  • For example, the network behavior of a particular piece of code is known based on the behavior of that piece of code when executed in a sandbox.
  • Information extracted from VirusTotal may also be used.
  • The reference to the “regular Random Forest algorithm” may, in one non-limiting example, refer to the regular Random Forest algorithm described above.
  • FIG. 4 is a simplified block diagram illustration of an exemplary device 400 suitable for implementing various ones of the systems, methods or processes described above.
  • the exemplary device 400 comprises one or more processors, such as processor 401, providing an execution platform for executing machine readable instructions such as software.
  • processors such as by way of non-limiting example the illustrated processor 401, may be a special purpose processor operative to perform the methods for building a tree and/or the methods for classifying items described herein above.
  • Processor 401 comprises dedicated hardware logic circuits, in the form of an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), digital signal processor (DSP), or full-custom integrated circuit, or a combination of such devices.
  • This software may be downloaded to the processor in electronic form, over a network, for example.
  • the software may be stored on tangible storage media, such as optical, magnetic, or electronic memory media.
  • the system 400 also includes a main memory 403, such as a Random Access Memory (RAM) 404, where machine readable instructions may reside during runtime, and further includes a secondary memory 405.
  • the secondary memory 405 includes, for example, a hard disk drive 407 and/or a removable storage drive 408, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, a flash drive, etc., or a nonvolatile memory where a copy of the machine readable instructions or software may be stored.
  • The secondary memory 405 may also include ROM (read-only memory), EPROM (erasable programmable ROM), and EEPROM (electrically erasable programmable ROM).
  • Data representing the decision tree 200 of Fig. 2, discussed above, or (without limiting the generality of the foregoing) other similar data, may be stored in the main memory 403 and/or the secondary memory 405.
  • the removable storage drive 408 is read from and/or written to by a removable storage control unit 409 in a well-known manner.
  • a network interface 419 is provided for communicating with other systems and devices via a network.
  • the network interface 419 typically includes a wireless interface for communicating with wireless devices in the wireless community.
  • a wired network interface (e.g. an Ethernet interface) may be present as well.
  • The exemplary device 400 may also comprise other interfaces, including, but not limited to, Bluetooth and HDMI. It is appreciated that logic and/or software may, in addition to what is described above and below, be stored other than in the main memory 403 and/or the secondary memory 405; without limiting the generality of the foregoing, logic and/or software may be stored in a cloud and/or on a network and may be accessed through the network interface 419 and executed by the processor 401.
  • the exemplary device 400 shown in Fig. 4 is provided as an example of a possible platform that may be used; other types of platforms may be used as is known in the art.
  • One or more of the steps described above and/or below may be implemented as instructions embedded on a computer readable medium and executed on the exemplary device 400.
  • the steps may be embodied by a computer program, which may exist in a variety of forms both active and inactive. For example, they may exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats for performing some of the steps.
  • Any of the above may be embodied on a computer readable medium, which includes storage devices and signals, in compressed or uncompressed form.
  • Suitable computer readable storage devices include conventional computer system RAM (random access memory), ROM (read-only memory), EPROM (erasable programmable ROM), EEPROM (electrically erasable programmable ROM), and magnetic or optical disks or tapes.
  • Examples of computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running a computer program may be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general. It is therefore to be understood that those functions enumerated above may be performed by any electronic device capable of executing the above-described functions.
  • software components of the present invention may, if desired, be implemented in ROM (read only memory) form.
  • the software components may, generally, be implemented in hardware, if desired, using conventional techniques.
  • The software components may be instantiated, for example, as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention.
  • FIG. 5 is a simplified flowchart illustration of an exemplary method for training a classifier.
  • At step 510, at least one first information source is accessed; this source is available when a classifier is trained, but is not readily available at the time when the classifier is applied.
  • At least one second information source is also accessed; the second information source is available at the time of training the classifier and is also readily available when the classifier is applied.
  • The classifier is trained based on the at least one second information source at step 530, and decision determining information from the at least one second information source is stored in the classifier at step 540.
  • Decision explanation information from the at least one first information source is stored in the classifier at step 550 (a combined sketch of this method and the method of Fig. 6 follows this list).
  • Fig. 6 is a simplified flowchart illustration of a method for applying a trained classifier.
  • A trained classifier is accessed.
  • The trained classifier is a classifier trained based at least on a second information source which is available when the classifier is trained and is also readily available when the classifier is applied.
  • The trained classifier also includes decision explanation information from at least one first information source which is available when the classifier is trained, but which is not readily available when the classifier is applied.
  • An item to be classified is received at step 620, and the classifier is used to classify the item at step 630.
  • Finally, item decision information for the item is provided; the item decision information is based on at least a part of the decision explanation information from the at least one first information source.
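
The following compact sketch ties the two flowcharts together. It is a minimal illustration under stated assumptions (a one-split decision "stump", hypothetical sample identifiers and field names, and an assumed numbering for the steps the text leaves unnumbered); it is not a reproduction of the pseudo code of Fig. 3, nor of Figs. 5 and 6 themselves.

    # Minimal end-to-end sketch of the training method of Fig. 5 and the
    # application method of Fig. 6. All identifiers are illustrative.

    # Step 510: first information source - available at training time only
    # (e.g. sandbox results), used purely for explanation.
    sandbox_annotations = {
        "sample-1": "execution in a controlled environment suggests malware",
        "sample-2": "no suspicious behavior observed in the sandbox",
    }
    # Second information source (step number not given in the text): features
    # that are also readily available when the classifier is applied.
    live_features = {"sample-1": {"size_in_bytes": 2048},
                     "sample-2": {"size_in_bytes": 100}}

    # Steps 530-540: "train" a one-split stump on the live feature and store
    # the decision determining information; step 550: also store the decision
    # explanation information from the first source.
    classifier = {
        "feature": "size_in_bytes", "threshold": 1056,   # determining info
        "above": {"label": "suspected dangerous malware",
                  "explanation": sandbox_annotations["sample-1"]},
        "below": {"label": "benign",
                  "explanation": sandbox_annotations["sample-2"]},
    }

    # Steps 620-630, then providing the reason: classify a received item using
    # only live information and report the stored explanation.
    def apply_classifier(clf, item):
        side = "above" if item[clf["feature"]] > clf["threshold"] else "below"
        return clf[side]["label"], clf[side]["explanation"]

    print(apply_classifier(classifier, {"size_in_bytes": 4096}))
    # -> ('suspected dangerous malware',
    #     'execution in a controlled environment suggests malware')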

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In one embodiment, a method including accessing a trained classifier, the trained classifier trained based at least on a first data item and including both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item; receiving an item for classification; using the trained classifier to classify the item for classification; and providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information. Other embodiments are also described.

Description

SUPERVISED LEARNING SYSTEM
TECHNICAL FIELD
[1] The present disclosure generally relates to supervised learning systems, and more specifically to systems for providing explanations of classification decisions made using supervised learning systems.
BACKGROUND
[2] Machine learning solutions are known in which supervised learning is used to train a black-box classifier. One non-limiting example of such a classifier is a decision tree; other examples of black-box classifiers are known in the art. For simplicity of description, and without limiting the generality of the foregoing, the example of a decision tree is often used throughout the present specification.
[3] Once a decision tree has been trained, items for classification are entered into the decision tree and classified. Some solutions for explaining why a decision tree chose to classify a given item in a given way are known in the art.
BRIEF DESCRIPTION OF THE DRAWINGS
[4] The present disclosure will be understood and appreciated more fully from the following detailed description, taken in conjunction with the drawings in which:
[5] Fig. 1 is a simplified schematic illustration of a decision tree constructed and operative in accordance with an embodiment of the present disclosure;
[6] Fig. 2 is a simplified schematic illustration of another decision tree constructed and operative in accordance with another embodiment of the present disclosure;
[7] Fig. 3 illustrates pseudo code which provides a particularly detailed non-limiting example of how the decision tree of Fig. 2 may be built;
[8] Fig. 4 is a simplified block diagram illustration of an exemplary device suitable for implementing various ones of the systems, methods or processes described herein;
[9] Fig. 5 is a simplified flowchart illustration of a method for training a classifier; and
[10] Fig. 6 is a simplified flowchart illustration of a method for applying a trained classifier.
OVERVIEW
[11] Aspects of the invention are set out in the independent claims and preferred features are set out in the dependent claims. Features of one aspect may be applied to each aspect alone or in combination with other aspects.
[12] A system includes a processor and a memory to store data used by the processor. The processor is operative to access at least one first data item used to train a classifier; access at least one second data item, the second data item not being used to train the classifier; produce a trained classifier based on training using the at least one first data item; store in the trained classifier, as decision determining information, information of the at least one first data item; and also store in the trained classifier, in association with the decision determining information, decision explanation information of the at least one second data item.
[13] A system includes a processor; and a memory to store data used by the processor. The processor is operative to: access a trained classifier, the trained classifier trained based at least on a first data item and including both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item; receive an item for classification; use the trained classifier to classify the item for classification; and provide item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information.
[14] A method includes accessing at least one first data item used to train a classifier; accessing at least one second data item, the second data item not being used to train the classifier; producing a trained classifier based on training using the at least one first data item; storing in the trained classifier, as decision determining information, information of the at least one first data item; and also storing in the trained classifier, in association with the decision determining information, decision explanation information of the at least one second data item.
[15] A method includes accessing a trained classifier, the trained classifier trained based at least on a first data item and including both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item; receiving an item for classification; using the trained classifier to classify the item for classification; and providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information.
[16] A computer-readable storage medium includes stored therein data representing software executable by a computer, the software including instructions including: instructions for accessing at least one first data item used to train a classifier; instructions for accessing at least one second data item, the second data item not being used to train the classifier; instructions for producing a trained classifier based on training using the at least one first data item; instructions for storing in the trained classifier, as decision determining information, information of the at least one first data item; and instructions for also storing in the trained classifier, in association with the decision determining information, decision explanation information of the at least one second data item.
[17] A computer-readable storage medium includes stored therein data representing software executable by a computer, the software including instructions including: instructions for accessing a trained classifier, the trained classifier trained based at least on a first data item and including both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item; instructions for receiving an item for classification; instructions for using the trained classifier to classify the item for classification; and instructions for providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information.
DESCRIPTION OF EXAMPLE EMBODIMENTS
[18] As explained above, machine learning solutions are known in which supervised learning is used to train a black-box classifier such as, by way of non-limiting example, a decision tree. Other non-limiting examples of such classifiers include logistic regression models, neural networks, and random forests. Once a classifier (such as a decision tree) has been trained, items for classification are entered into the trained classifier and are classified. Some solutions for explaining why a classifier chose to classify a given item in a given way are known in the art and are discussed below.
[19] For simplicity of description, and without limiting the generality of the foregoing, the example of a decision tree is often used throughout the present specification. In the case of a decision tree, when items for classification are presented for classification a series of decisions is made at various branches (nodes) of the tree, based on various criteria, until a leaf node of the tree is reached and the item has been classified. Therefore, it is straightforward to provide an explanation of the ultimate classification decision by outputting / stating (“playing back”) the decisions made at various branch nodes of the tree. Examples of more general ways of providing an explanation for the decision of a classifier, applicable more widely than a case of a decision tree, are known to persons skilled in the art.
[20] A different problem is presented in some cases. One example of such a case is when the items to be classified comprise encrypted traffic, such as encrypted network traffic. In such a case, the information used to make a decision at various branches of a decision tree may be obscure and difficult to verify as correct. In particular, and without limiting the generality of the foregoing, such information may be obscure and difficult for a human being to understand, such that if a human operator were to query the reason for a given classification (whether directly or via a log file or the like) and the decisions made at various branches were played back (whether directly or into a log file or the like), the “reasoning” behind the classification would still be quite unclear to the human operator. Certain embodiments presented herein are designed to address these problems, and to provide better explanations of classification decisions.
[21] Reference is now made to Fig. 1, which is a simplified schematic illustration of a decision tree constructed and operative in accordance with an embodiment of the present disclosure. In Fig. 1 a decision tree 100 is shown. The decision tree 100 comprises a plurality, generally a multiplicity, of branch nodes 110 which include branch nodes 110a - 110g, and also comprises leaf nodes 120 which include leaf nodes 120a - 120h. For simplicity of depiction, a limited number of branch nodes 110 and leaf nodes 120 is depicted in Fig. 1, it being appreciated that in practice a larger number of such nodes may be comprised in the decision tree 100.
[22] The decision tree 100 of Fig. 1 is generally created by a training process. Each depicted branch node 110 represents a decision regarding an item to be classified, based on associated decision information; for example, in Fig. 1 decision determination information 135 is associated with root node 110a of the decision tree 100. In a training process, known items conceptually enter the tree at the root node 110a and are classified by passing through branch nodes 110 until reaching a leaf node 120. For example, for a plurality of known items which are either known to be “good” or known to be “bad”, the decision tree 100 can be determined to be successful or unsuccessful according to how well it succeeds in classifying known-good items as good, and known-bad items as bad.
[23] One non-limiting example of a training process suitable for training the decision tree 100 of Fig. 1 is referred to herein as the “regular Random Forest algorithm”. In the regular Random Forest algorithm, a decision tree such as the decision tree 100 of Fig. 1 is trained automatically using a training set comprising exemplar data. At each branch node 110 a split function is defined and optimized so that the data is split as well as possible, “best” being defined in a particular way given the particular task to be performed when using the decision tree. For a tree like the decision tree 100 of Fig. 1, “best” could mean that the child nodes of each branch node 110 are as “pure” as possible, so that each child node would in practice receive as many items which are similar to each other as possible, and as few different items as possible. In addition, the training process is generally constrained to produce a decision tree having, for example, one or more of the following: a maximum number of levels; a determined level of “purity” as described above; and no less than a minimum number of items at each leaf node 120.
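
Purely as an illustration of the kind of impurity-driven training just described, the following minimal Python sketch grows a tree by exhaustively searching single-feature thresholds and keeping the split whose child nodes are "purest" under the Gini criterion, subject to a maximum depth and a minimum leaf size. All names are assumptions made for this example; a full Random Forest would additionally train many such trees on randomized subsets of the data and features, and this is not the pseudo code of Fig. 3.

    # Minimal sketch of impurity-driven decision-tree training; illustrative only.
    from dataclasses import dataclass
    from typing import Optional, Sequence

    @dataclass
    class Node:
        feature: Optional[int] = None      # index of the feature tested here
        threshold: Optional[float] = None  # "if x[feature] > threshold, go right"
        left: Optional["Node"] = None
        right: Optional["Node"] = None
        label: Optional[str] = None        # set only on leaf nodes

    def gini(labels: Sequence[str]) -> float:
        """Gini impurity: 0.0 for a perfectly 'pure' set of labels."""
        n = len(labels)
        return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

    def train(X, y, depth=0, max_depth=4, min_leaf=2) -> Node:
        # Stop on purity or on the structural constraints described above.
        if depth >= max_depth or len(y) < 2 * min_leaf or gini(y) == 0.0:
            return Node(label=max(set(y), key=y.count))
        best = None  # (weighted child impurity, feature index, threshold)
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                left = [lab for row, lab in zip(X, y) if row[f] <= t]
                right = [lab for row, lab in zip(X, y) if row[f] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, t)
        if best is None:  # no admissible split: make a leaf
            return Node(label=max(set(y), key=y.count))
        _, f, t = best
        lo = [(row, lab) for row, lab in zip(X, y) if row[f] <= t]
        hi = [(row, lab) for row, lab in zip(X, y) if row[f] > t]
        return Node(feature=f, threshold=t,
                    left=train([r for r, _ in lo], [l for _, l in lo],
                               depth + 1, max_depth, min_leaf),
                    right=train([r for r, _ in hi], [l for _, l in hi],
                                depth + 1, max_depth, min_leaf))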
[24] Once a decision tree such as decision tree 100 of Fig. 1 has been trained, the decision tree 100 is used to classify “unknown” items (continuing the above example, items for which it is not known whether the items are “good” or “bad”). When an item to be classified (also termed herein an “item for classification”) is received, the item for classification (not shown) conceptually enters the tree at the root node 110a. At the root node 110a decision determination information 135 is used to begin classifying the item for classification. In the example of Fig. 1, based on the decision determination information 135 associated with the root node 110a, the item for classification is passed on to node 110b.
[25] Similarly, the item for classification continues to pass through the decision tree at nodes 110c and 110d. At nodes 110a, 110b, 110c, and 110d a test based on associated determination information 135, 136, 137, and 138, respectively, is used to send the item for classification on to a further node; for simplicity of depiction, only a portion of the determination information has been assigned reference numerals in Fig. 1. For example, at node 110b the item for classification is examined based on the determination information 136 associated with node 110b. For example, the determination information might comprise “if size of item for classification exceeds 1056 bytes proceed to node 110c; else proceed to node 120b”. In the particular example shown in Fig. 1, the item for classification is sent on to node 110c, and not to node 120b, because the size of the item for classification exceeds 1056 bytes.
[26] When the item for classification reaches a leaf node 120, the item for classification has been classified. In the example of Fig. 1, the item reaches leaf node 120a and is classified accordingly; that is, the item for classification is classified according to a classification associated with leaf node 120a. For example, if leaf node 120a is associated with the classification “suspected dangerous malware”, then the item for classification may be classified at leaf node 120a as “suspected dangerous malware”. For ease of depiction, the nodes 110a, 110b, 110c, 110d, and 120a which were “visited” by the item for classification are shown with hashing.
[27] If it is desired to provide an explanation of the “reasoning” behind the classification (whether to a human operator, to a log file, or otherwise), the “reasoning” may comprise the decisions made at nodes 110a, 110b, 110c, and 110d, based in each such case on associated determination information 135, 136, 137, and 138 respectively. In the particular example discussed, the “reasoning” will comprise “size of item exceeds 1056 bytes”, per the decision made at node 110b based on the determination information 136 associated with node 110b; the “reasoning” will also comprise information per the decisions made at nodes 110a, 110c, and 110d, based on determination information 135, 137, and 138, respectively.
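
Continuing the sketch above, the "playback" of branch decisions described in the preceding paragraph might look as follows; the function, the feature name "size_in_bytes", and the toy training set are illustrative assumptions.

    def classify_with_reasoning(node, x, feature_names):
        """Walk the tree, recording each branch decision so it can be played
        back (to an operator or a log file) as the 'reasoning' for the result."""
        path = []
        while node.label is None:  # branch nodes carry no label
            went_right = x[node.feature] > node.threshold
            path.append(f"{feature_names[node.feature]} "
                        f"{'>' if went_right else '<='} {node.threshold}")
            node = node.right if went_right else node.left
        return node.label, path

    # The played-back reasoning contains only low-level tests such as
    # "size_in_bytes > 300.0", which an operator may find unilluminating.
    tree = train([[2048.0], [4096.0], [100.0], [300.0]],
                 ["suspected dangerous malware", "suspected dangerous malware",
                  "benign", "benign"])
    print(classify_with_reasoning(tree, [3000.0], ["size_in_bytes"]))
    # -> ('suspected dangerous malware', ['size_in_bytes > 300.0'])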
[28] As described above, there may be cases in which the “reasoning” provided by a decision tree such as the decision tree 100 of Fig. 1 is inadequate. One example of such a case is when the items to be classified comprise encrypted traffic, such as, by way of non-limiting example, encrypted network traffic. In such a case, the information used to make a decision at each such branch of a decision tree may be obscure and difficult for a human being to understand, such that if a human operator were to query the reason for a given classification and, in order to provide such a reason, the decisions made at each such branch were played back, the “reasoning” behind the classification would still be quite unclear to the human operator. For example, as described above the “reasoning” may comprise “size of item exceeds 1056 bytes”; it may not be apparent to a human operator why “size of item exceeds 1056 bytes” is part of the reasoning for classifying an item as suspected dangerous malware.
[29] It will be appreciated that one of the challenges in providing “reasoning” which would be clear to a human operator is that, during use of a decision tree such as the decision tree 100 to classify items, the determination information 135, 136, 137, and 138 relates to characteristics of an item for classification which were used to train the decision tree 100 and which are readily known at the time of classification of the item for classification. For example, during a training phase of the decision tree 100, as described above, an item may have been determined to be suspected dangerous malware by being executed in a controlled environment, such as a sandbox, and it may have been determined that many items which are suspected dangerous malware have a size exceeding 1056 bytes, thus leading to the determination information 136. However, in the training phase the decision tree 100 was trained based on information (such as the determination information 135, 136, 137, and 138) which would be readily known at the later time of classification of an item; an item to be classified is not executed in a sandbox when it is to be classified, and hence the results of execution in a sandbox, which execution may have taken place at the time of training the decision tree 100, are not included in the determination information 135, 136, 137, and 138.
[30] Data sources from a sandboxing environment can be used to show Indicators of Compromise (IOCs) associated with the classified behavior. Examples of such IOCs, based on behavior during execution in a sandbox, include, by way of non-limiting example: accessing the Windows registry or certain sensitive portions thereof; modifying or attempting to modify an executable file; executing portions of memory in a way which is deemed suspicious; creating or attempting to create a DLL file; and so forth.
[31] It is appreciated that execution in a sandbox, as described above, is provided as one particular example of a mechanism for determining one or more characteristics known at the time of training but not readily known, or difficult to determine, regarding an item for classification when that item is to be classified; for example, execution in a sandbox would be expected to be difficult and/or time-consuming to carry out when an item for classification is to be classified. Other examples of such characteristics which are difficult to determine when an item for classification is to be classified include, but are not limited to, information from proxy logs captured on the training data, or features that are easy to understand but are expensive to calculate in a “live” environment when the trained decision tree 100 is used to classify an item. Characteristics which would be expected to be difficult and/or time-consuming to determine when an item for classification is to be classified are also termed herein “inappropriate to use in real time”.
[32] Proxy logs created when a proxy is used to connect to a site can, for example, provide information about Uniform Resource Locators (URLs), user agent/s, referrer/s and similar information. In general, log entries in proxy logs reveal information about the client making the request, the date/time of the request, and the name of an object or objects requested. It is appreciated that the log entry information listed is a non-limiting example of log entry information that might be found in a proxy log.
[33] Examples of expensive features as referred to above may include, by way of non-limiting example:
[34] information extracted from external data feeds, such as a query to VirusTotal (a product/site available via the World Wide Web which includes information aggregated from malware vendors; accessing VirusTotal requires an application programming interface (API) key and significant resource use, and would thus be inappropriate to use in real time);
[35] information extracted from a whois database; and
[36] features calculated from large amounts of data during training; such features might include additional status information, the number of users who visited a particular domain, etc.; such information changes quickly and takes a long time to determine, and thus would be inappropriate to use in real time.
[37] Thus, in a very particular example, it could be possible and might be desirable for the “reasoning” to not simply be “this particular behavior is malicious”, or “this particular behavior is malicious because of excessive up-packets in the 83rd percentile of the distribution in combination with irregular access timings”. The “reasoning” could specifically point out the malicious behavior and a list of associated informative IOCs, such as modifying the registry, sending a number of emails which exceeds a particular limit, and accessing domains that have a lot of hits on VirusTotal, as explained above.
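
To make the shape of such enriched reasoning concrete, the record below shows one hypothetical way the explanation material gathered at training time could be held; every field name and value is an illustrative assumption drawn from the sources named above (sandbox IOCs, VirusTotal, whois, proxy logs), not a format defined by the disclosure.

    # Hypothetical explanation record assembled while training data is at hand.
    explanation_info = {
        "sandbox_iocs": [
            "modified the Windows registry",
            "attempted to modify an executable file",
            "created a DLL file",
        ],
        "virustotal": {"vendor_hits": 37},           # expensive external query
        "whois": {"domain_age_days": 3},             # from a whois database
        "proxy_log": {"user_agent": "Mozilla/4.0"},  # captured on training data
    }

    def render_reason(label, info):
        """Turn stored explanation information into an operator-readable reason."""
        iocs = "; ".join(info["sandbox_iocs"])
        return (f"Classified as '{label}'. Associated IOCs: {iocs}. "
                f"VirusTotal vendor hits: {info['virustotal']['vendor_hits']}.")

    print(render_reason("suspected dangerous malware", explanation_info))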
[38] Reference is now made to Fig. 2, which is a simplified schematic illustration of another decision tree constructed and operative in accordance with another embodiment of the present disclosure. In Fig. 2 a decision tree 200 is shown. The decision tree 200 comprises a plurality, generally a multiplicity, of branch nodes 210 which include branch nodes 210a - 210g, and also comprises leaf nodes 220 which include leaf nodes 220a - 220h. For simplicity of depiction, a limited number of branch nodes 210 and leaf nodes 220 is depicted in Fig. 2, it being appreciated that in practice a larger number of such nodes may be comprised in the decision tree 200.
[39] The decision tree 200 may be created by a training process which differs from the training process described above for the decision tree 100 of Fig. 1. In particular, as a result of the training process (a particularly detailed non-limiting example of which is described below), determination information comprised in the decision tree 200 includes, as described in more detail above and below, both information readily available at a time when an item for classification is to be classified (that information being used for training the decision tree 200), and also other information which is available when training the decision tree 200 but which is not readily available at the time when an item for classification is to be classified by the already-trained decision tree 200.
[40] Once a decision tree such as the decision tree 200 of Fig. 2 has been trained, the decision tree 200 is used to classify "unknown" items. When an item to be classified ("item for classification") is received, the item for classification (not shown) conceptually enters the tree at the root node 210a. At the root node 210a, decision determination information 235 is used to begin classifying the item for classification. In the example of Fig. 2, based on the decision determination information 235 associated with the root node 210a, the item for classification is passed on to node 210b.
[41] Similarly, the item for classification continues to pass through the decision tree at nodes 210c and 210d. At nodes 210a, 210b, 210c, and 210d a test based on associated determination information 235, 236, 237, and 238, respectively, is used to send the item for classification on to a further node. For example, at node 210b the item for classification is examined based on the determination information 236 associated with node 210b.
[42] In the decision tree 200, determination information such as the determination information 236 may comprise, as explained above, both information available at a time when an item for classification is to be classified, and other information which is available at a time of training but which is not readily available at the time when an item for classification is to be classified. For example, the determination information 236 may comprise available determination information 251, which is actually used for classifying an item to be classified, as well as non-available determination information 252 and 253: information that was available at a time of training and relates to one or more characteristics typical of items for classification, but that is not readily known / not readily available regarding a particular item for classification at the time when that item is to be classified.
[43] For example, the determination information might comprise available determination information 251 indicating "if size of item for classification exceeds 1056 bytes proceed to node 210c; else proceed to node 220b". In the particular example shown in Fig. 2, the item for classification is sent on to node 210c, and not to node 220b, because the size of the item for classification exceeds 1056 bytes.
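A minimal Python sketch of a branch node carrying both kinds of information might look as follows; the class layout and field names are assumptions for illustration, not the disclosed data structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    # Available determination information (e.g. information 251): the test
    # actually applied when classifying, such as "size exceeds 1056 bytes".
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # taken when feature value <= threshold
    right: Optional["Node"] = None   # taken when feature value > threshold
    # Non-available determination information (e.g. information 252, 253):
    # explanation text derived from training-time-only sources such as
    # sandbox execution, VirusTotal queries, or whois data.
    explanations: List[str] = field(default_factory=list)
    # For leaf nodes such as 220a: the classification assigned.
    label: Optional[str] = None
```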
[44] When the item for classification reaches a leaf node 220, the item for classification has been classified. In the example of Fig. 2, the item reaches leaf node 220a and is classified accordingly; that is, the item for classification is classified according to a classification associated with leaf node 220a. For example, if leaf node 220a is associated with the classification "suspected dangerous malware", then the item for classification may be classified at leaf node 220a as "suspected dangerous malware". For ease of depiction, the nodes 210a, 210b, 210c, 210d, and 220a which were "visited" by the item for classification are shown with hashing.
[45] If it is desired to provide an explanation of the "reasoning" behind the classification (whether to a human operator, to a log file, or otherwise), the "reasoning" may comprise the decisions made at nodes 210a, 210b, 210c, and 210d, based at the respective nodes on associated determination information 235, 236, 237, and 238 respectively. In the particular example discussed, the "reasoning" will comprise "size of item exceeds 1056 bytes", per the decision made at node 210b based on the determination information 236 associated with node 210b; the "reasoning" will also comprise information per the decisions made at nodes 210a, 210c, and 210d, based on determination information 235, 237, and 238, respectively. In addition, the "reasoning" may comprise non-available determination information 252 and / or 253, which, as indicated above, are information that was available at a time of training but relate to one or more characteristics which are not readily known / not readily available at the time when the item for classification is to be classified. For example, the non-available determination information 252 may comprise "execution in a controlled environment suggests malware".
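Continuing the sketch above, and still under the same assumptions, a traversal can collect the "reasoning" — both the decisions applied and any stored explanation information — on the way from the root to a leaf:

```python
def classify_with_reasoning(root: "Node", features: dict):
    # Walk from the root to a leaf, recording the decision made at each
    # visited node together with any explanation information stored there.
    node, reasoning = root, []
    while node.label is None:
        went_right = features[node.feature] > node.threshold
        reasoning.append(
            f"{node.feature} {'exceeds' if went_right else 'does not exceed'} "
            f"{node.threshold}"
        )
        reasoning.extend(node.explanations)  # training-time-only information
        node = node.right if went_right else node.left
    return node.label, reasoning

# Usage, echoing the example above: a single split on item size.
root = Node(feature="size_bytes", threshold=1056,
            left=Node(label="benign"),
            right=Node(label="suspected dangerous malware"),
            explanations=["execution in a controlled environment "
                          "suggests malware"])
label, why = classify_with_reasoning(root, {"size_bytes": 2048})
# label == "suspected dangerous malware"
# why includes both the size decision and the sandbox-derived explanation.
```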
[46] Reference is now made to Fig. 3, which illustrates pseudo code 300 which provides a particularly detailed non-limiting example of how the decision tree of Fig. 2 may be built. In the pseudo code of Fig. 3:
[47] a) pairing between data sources is implicit in the input functions f1, ..., fn. In one example described above, where a sandbox is used, network behavior of a particular piece of code is known based on behavior of the piece of code when executed in a sandbox. In another example, where VirusTotal is used, information extracted from VirusTotal (based, for example, on a particular domain) may be used.
[48] b) the reference to the "regular Random Forest algorithm" may, in one non-limiting example, refer to the regular Random Forest algorithm described above.
[49] Reference is now made to Fig. 4, which is a simplified block diagram illustration of an exemplary device 400 suitable for implementing various ones of the systems, methods or processes described above.
[50] The exemplary device 400 comprises one or more processors, such as processor 401, providing an execution platform for executing machine readable instructions such as software. One of the processors, such as by way of non-limiting example the illustrated processor 401, may be a special purpose processor operative to perform the methods for building a tree and/or the methods for classifying items described herein above. Processor 401 comprises dedicated hardware logic circuits, in the form of an application-specific integrated circuit (ASIC), field programmable gate array (FPGA), or full-custom integrated circuit, or a combination of such devices. Alternatively or additionally, some or all of the functions of the processor 401 may be carried out by a programmable microprocessor or digital signal processor (DSP), under the control of suitable software. This software may be downloaded to the processor in electronic form, over a network, for example. Alternatively or additionally, the software may be stored on tangible storage media, such as optical, magnetic, or electronic memory media.
[51] Commands and data from the processor 401 are communicated over a communication bus 402. The exemplary device 400 also includes a main memory 403, such as a Random Access Memory (RAM) 404, where machine readable instructions may reside during runtime, and further includes a secondary memory 405. The secondary memory 405 includes, for example, a hard disk drive 407 and/or a removable storage drive 408, representing a floppy diskette drive, a magnetic tape drive, a compact disk drive, a flash drive, etc., or a nonvolatile memory where a copy of the machine readable instructions or software may be stored. The secondary memory 405 may also include ROM (read only memory), EPROM (erasable, programmable ROM), and EEPROM (electrically erasable, programmable ROM). In addition to software, data representing, without limiting the generality of the foregoing, the decision tree 200 of Fig. 2 discussed above, or other similar data, may be stored in the main memory 403 and/or the secondary memory 405. The removable storage drive 408 is read from and/or written to by a removable storage control unit 409 in a well-known manner.
[52] A network interface 419 is provided for communicating with other systems and devices via a network. The network interface 419 typically includes a wireless interface for communicating with wireless devices in the wireless community. A wired network interface (e.g. an Ethernet interface) may be present as well. The exemplary device 400 may also comprise other interfaces, including, but not limited to, Bluetooth and HDMI. It is appreciated that logic and/or software may, in addition to what is described above and below, be stored other than in the main memory 403 and/or the secondary memory 405; without limiting the generality of the foregoing, logic and/or software may be stored in a cloud and/or on a network and may be accessed through the network interface 419 and executed by the processor 401.
[53] It will be apparent to one of ordinary skill in the art that one or more of the components of the exemplary device 400 may not be included and/or other components may be added as is known in the art. The exemplary device 400 shown in Fig. 4 is provided as an example of a possible platform that may be used; other types of platforms may be used as is known in the art. One or more of the steps described above and/or below may be implemented as instructions embedded on a computer readable medium and executed on the exemplary device 400. The steps may be embodied by a computer program, which may exist in a variety of forms both active and inactive. For example, they may exist as software program(s) comprised of program instructions in source code, object code, executable code or other formats for performing some of the steps. Any of the above may be embodied on a computer readable medium, which includes storage devices and signals, in compressed or uncompressed form. Examples of suitable computer readable storage devices include conventional computer system RAM (random access memory), ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes. Examples of computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running a computer program may be configured to access, including signals downloaded through the Internet or other networks. Concrete examples of the foregoing include distribution of the programs on a CD ROM or via Internet download. In a sense, the Internet itself, as an abstract entity, is a computer readable medium. The same is true of computer networks in general. It is therefore to be understood that those functions enumerated above may be performed by any electronic device capable of executing the above-described functions.
[54] It is appreciated that software components of the present invention may, if desired, be implemented in ROM (read only memory) form. The software components may, generally, be implemented in hardware, if desired, using conventional techniques. It is further appreciated that the software components may be instantiated, for example, as a computer program product or on a tangible medium. In some cases, it may be possible to instantiate the software components as a signal interpretable by an appropriate computer, although such an instantiation may be excluded in certain embodiments of the present invention.
[55] Reference is now made to Fig. 5, which is a simplified flowchart illustration of an exemplary method for training a classifier. In the method of Fig. 5, at least one first information source available when a classifier is trained, but not readily available at a time when the classifier is applied, is accessed at step 510. At step 520, at least one second information source is accessed, the second information source being available at the time of training the classifier and also being readily available when the classifier is applied. The classifier is trained based on the at least one second information source at step 530, and decision determining information from the at least one second information source is stored in the classifier at step 540. In addition to the decision determining information, decision explanation information from the at least one first information source is stored in the classifier at step 550.
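A hedged, minimal sketch of steps 510-550, reusing the Node class from the sketch above and reducing "training" to a single best-split search over the readily-available features (a real implementation would grow a full tree or forest):

```python
def train_classifier(explanations, samples, labels):
    """Steps 510-550, reduced to a one-split decision stump.

    explanations: step 510 - explanation strings from the first information
                  source (e.g. sandbox observations), training-time only.
    samples:      step 520 - dicts of readily-available features (second
                  information source), one per training example.
    labels:       class labels aligned with samples.
    """
    def majority(ls):
        return max(set(ls), key=ls.count)

    best = None
    # Step 530: train the classifier on the second information source only.
    for feature in samples[0]:
        for s in samples:
            t = s[feature]
            left = [l for x, l in zip(samples, labels) if x[feature] <= t]
            right = [l for x, l in zip(samples, labels) if x[feature] > t]
            if not left or not right:
                continue
            err = (sum(l != majority(left) for l in left)
                   + sum(l != majority(right) for l in right))
            if best is None or err < best[0]:
                best = (err, feature, t, majority(left), majority(right))
    if best is None:  # degenerate data: no useful split found
        return Node(label=majority(labels), explanations=list(explanations))
    _, feature, t, left_label, right_label = best
    # Step 540: the decision determining information (the threshold test)
    # is stored in the classifier by construction.
    root = Node(feature=feature, threshold=t,
                left=Node(label=left_label), right=Node(label=right_label))
    # Step 550: additionally store decision explanation information from
    # the first information source in association with the decision.
    root.explanations = list(explanations)
    return root
```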
[56] Reference is now made to Fig. 6, which is a simplified flowchart illustration of a method for applying a trained classifier. In step 610, a trained classifier is accessed. The trained classifier is a classifier trained based at least on a second information source available when the classifier is trained, and also readily available when the classifier is applied. The trained classifier also includes decision explanation information from at least one first information source which is available when the classifier is trained, but which is not readily available when the classifier is applied. An item to be classified is received at step 620, and the classifier is used to classify the item at step 630. At step 640, item decision information for the item is provided; the item decision information is based on at least a part of the decision explanation information from the at least one first information source.
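And a matching sketch of steps 610-640, again reusing the pieces assumed above:

```python
def apply_classifier(trained_root, item_features):
    # Step 610: access the trained classifier (trained_root).
    # Step 620: receive the item for classification (item_features).
    # Step 630: classify it using only readily-available features.
    # Step 640: the returned reasoning is the item decision information,
    # and includes explanation information stored at training time.
    return classify_with_reasoning(trained_root, item_features)

# Usage: train on two toy samples, then classify a new item.
label, item_decision_info = apply_classifier(
    train_classifier(
        ["execution in a controlled environment suggests malware"],
        [{"size_bytes": 200}, {"size_bytes": 4096}],
        ["benign", "suspected dangerous malware"],
    ),
    {"size_bytes": 2048},
)
```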
[57] The methods of Figs. 5 and 6 are believed to be self-explanatory with reference to the above discussion, and in particular with reference to the above discussion of Figs. 2 and 3.

[58] In summary, in one embodiment, a method includes accessing a trained classifier, the trained classifier trained based at least on a first data item and including both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item; receiving an item for classification; using the trained classifier to classify the item for classification; and providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information. Other embodiments are also described.
[59] It is appreciated that various features of the invention which are, for clarity, described in the contexts of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
[60] It will be appreciated by persons skilled in the art that the present invention is not limited by what has been particularly shown and described hereinabove. Rather, the scope of the invention is defined by the appended claims and equivalents thereof.

Claims

What is claimed is:
1. A system comprising a processor; and a memory to store data used by the processor, wherein the processor is operative to:
access at least one first data item used to train a classifier;
access at least one second data item, the second data item not being used to train the classifier;
produce a trained classifier based on training using the at least one first data item;
store in the trained classifier, as decision determining information, information of the at least one first data item; and
also store in the trained classifier, in association with the decision determining information, decision explanation information of the at least one second data item.
2. The system according to claim 1 and wherein the processor is also operative to:
use the trained classifier to classify an item;
provide information from the trained classifier regarding a reason for classifying the item, the information including the decision explanation information.
3. The system according to claim 2 and wherein the item comprises an event.
4. The system according to claim 3 and wherein the event comprises receiving an encrypted data item.
5. The system according to claim 4 and wherein the encrypted data item comprises an executable data item, and the reason comprises behavior of the encrypted data item when executed.
6. The system according to claim 4 and wherein the encrypted data item comprises an executable data item, and the reason comprises behavior of the encrypted data item when executed in a sandbox.
7. The system according to any of claims 4 to 6 and wherein the behavior comprises behavior classified as suspicious behavior.
8. The system according to any of claims 1 to 7 and wherein the classifier comprises a decision tree.
9. The system according to claim 8 and wherein the decision tree comprises a plurality of decision trees.
10. A system comprising a processor; and a memory to store data used by the processor, wherein the processor is operative to:
access a trained classifier, the trained classifier trained based at least on a first data item and comprising both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item;
receive an item for classification;
use the trained classifier to classify the item for classification; and
provide item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information.
11. The system according to claim 10 and wherein the item for classification comprises an event.
12. The system according to claim 11 and wherein the event comprises receiving an encrypted data item.
13. The system according to claim 12 and wherein the encrypted data item comprises an executable data item, and the reason comprises behavior of the encrypted data item when executed.
14. The system according to claim 12 and wherein the encrypted data item comprises an executable data item, and the reason comprises behavior of the encrypted data item when executed in a sandbox.
15. The system according to any of claims 12 to 14 and wherein the behavior comprises behavior classified as suspicious behavior.
16. The system according to any of claims 10 to 15 and wherein the classifier comprises a decision tree.
17. The system according to claim 16 and wherein the decision tree comprises a plurality of decision trees.
18. A method comprising:
accessing at least one first data item used to train a classifier;
accessing at least one second data item, the second data item not being used to train the classifier;
producing a trained classifier based on training using the at least one first data item;
storing in the trained classifier, as decision determining information, information of the at least one first data item; and
also storing in the trained classifier, in association with the decision determining information, decision explanation information of the at least one second data item.
19. The method according to claim 18 and wherein the classifier comprises a decision tree.
20. A method comprising:
accessing a trained classifier, the trained classifier trained based at least on a first data item and comprising both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item;
receiving an item for classification;
using the trained classifier to classify the item for classification; and
providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information.
21. The method according to claim 20 and wherein the trained classifier comprises a decision tree.
22. Apparatus comprising:
means for accessing at least one first data item used to train a classifier;
means for accessing at least one second data item, the second data item not being used to train the classifier;
means for producing a trained classifier based on training using the at least one first data item;
means for storing in the trained classifier, as decision determining information, information of the at least one first data item; and
means for storing in the trained classifier, in association with the decision determining information, decision explanation information of the at least one second data item.
23. The apparatus according to claim 22 further comprising means for implementing the system of any of claims 2 to 9.
24. Apparatus comprising:
means for accessing a trained classifier, the trained classifier trained based at least on a first data item and comprising both decision determination information of the first data item and decision explanation information of at least one second data item, the second data item being distinct from the first data item;
means for receiving an item for classification;
means for using the trained classifier to classify the item for classification; and
means for providing item decision information regarding a reason for classifying the item for classification, the item decision information being based on at least a part of the decision explanation information.
25. The apparatus according to claim 24 further comprising means for implementing the system of any of claims 11 to 17.
26. A computer program, computer program product or logic encoded on a tangible computer readable medium comprising instructions for implementing the method according to any one of claims 18 to 21.
PCT/US2019/017777 2018-02-22 2019-02-13 Supervised learning system WO2019164718A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP19707599.7A EP3756146B1 (en) 2018-02-22 2019-02-13 Supervised learning system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US15/901,915 2018-02-22
US15/901,915 US20190258965A1 (en) 2018-02-22 2018-02-22 Supervised learning system

Publications (1)

Publication Number Publication Date
WO2019164718A1 true WO2019164718A1 (en) 2019-08-29

Family

ID=65529853

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2019/017777 WO2019164718A1 (en) 2018-02-22 2019-02-13 Supervised learning system

Country Status (3)

Country Link
US (1) US20190258965A1 (en)
EP (1) EP3756146B1 (en)
WO (1) WO2019164718A1 (en)

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050172027A1 (en) * 2004-02-02 2005-08-04 Castellanos Maria G. Management of service level agreements for composite Web services
WO2008014328A2 (en) * 2006-07-25 2008-01-31 Pivx Solutions, Inc. Systems and methods for digitally-signed updates
US9542535B1 (en) * 2008-08-25 2017-01-10 Symantec Corporation Systems and methods for recognizing behavorial attributes of software in real-time
RU2638730C2 (en) * 2012-09-06 2017-12-15 Конинклейке Филипс Н.В. Support for making decisions based on manual
US9292599B2 (en) * 2013-04-30 2016-03-22 Wal-Mart Stores, Inc. Decision-tree based quantitative and qualitative record classification
US10452995B2 (en) * 2015-06-29 2019-10-22 Microsoft Technology Licensing, Llc Machine learning classification on hardware accelerators with stacked memory
US10291634B2 (en) * 2015-12-09 2019-05-14 Checkpoint Software Technologies Ltd. System and method for determining summary events of an attack
US10824959B1 (en) * 2016-02-16 2020-11-03 Amazon Technologies, Inc. Explainers for machine learning classifiers
US10726128B2 (en) * 2017-07-24 2020-07-28 Crowdstrike, Inc. Malware detection using local computational models
US11030691B2 (en) * 2018-03-14 2021-06-08 Chicago Mercantile Exchange Inc. Decision tree data structure based processing system
US10839394B2 (en) * 2018-10-26 2020-11-17 Microsoft Technology Licensing, Llc Machine learning system for taking control actions
US20200134037A1 (en) * 2018-10-26 2020-04-30 Ca, Inc. Narration system for interactive dashboards

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8375450B1 (en) * 2009-10-05 2013-02-12 Trend Micro, Inc. Zero day malware scanner
US20160036844A1 (en) * 2014-07-15 2016-02-04 Cisco Technology, Inc. Explaining network anomalies using decision trees

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MARCO TULIO RIBEIRO ET AL: ""Why Should I Trust You?"", PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD '16, ACM PRESS, NEW YORK, NEW YORK, USA, 13 August 2016 (2016-08-13), pages 1135 - 1144, XP058276906, ISBN: 978-1-4503-4232-2, DOI: 10.1145/2939672.2939778 *

Also Published As

Publication number Publication date
US20190258965A1 (en) 2019-08-22
EP3756146B1 (en) 2025-04-09
EP3756146A1 (en) 2020-12-30

Similar Documents

Publication Publication Date Title
US11609991B2 (en) Methods and apparatus for using machine learning on multiple file fragments to identify malware
US11068587B1 (en) Dynamic guest image creation and rollback
US11689549B2 (en) Continuous learning for intrusion detection
US10902117B1 (en) Framework for classifying an object as malicious with machine learning for deploying updated predictive models
US10534906B1 (en) Detection efficacy of virtual machine-based analysis with application specific events
JP5802848B2 (en) Computer-implemented method, non-temporary computer-readable medium and computer system for identifying Trojanized applications (apps) for mobile environments
US9495180B2 (en) Optimized resource allocation for virtual machines within a malware content detection system
US8925076B2 (en) Application-specific re-adjustment of computer security settings
CN111160749B (en) Information quality assessment and information fusion method and device
US11797668B2 (en) Sample data generation apparatus, sample data generation method, and computer readable medium
US11816213B2 (en) System and method for improved protection against malicious code elements
US9477444B1 (en) Method and apparatus for validating and recommending software architectures
CN108600259B (en) Authentication and binding method of equipment, computer storage medium and server
WO2024249450A1 (en) Method and system for predicting malicious entities
EP3756146B1 (en) Supervised learning system
US11763004B1 (en) System and method for bootkit detection
CN110581857B (en) Virtual execution malicious software detection method and system
CN107229865B (en) Method and device for analyzing Webshell intrusion reason
CN113596600B (en) Security management method, device, equipment and storage medium for live broadcast embedded program
WO2020228564A1 (en) Application service method and device
Andoor A filtering based Android Malware Detection system for google playstore
CN115629721B (en) Data processing method and platform suitable for data migration
US20240160400A1 (en) Sound System Using Over-the-Air and Operation Method Thereof
NZ754552B2 (en) Continuous learning for intrusion detection
CN108804924A (en) A kind of method for detecting virus, system and relevant apparatus based on sandbox

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19707599

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019707599

Country of ref document: EP

Effective date: 20200922
