US10140321B2 - Preserving privacy in natural langauge databases - Google Patents
- Publication number
- US10140321B2 (application US14/288,793)
- Authority
- US
- United States
- Prior art keywords
- value
- speaker
- feature vector
- transcription
- sanitized text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G06F17/30303—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
- G06F21/6254—Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
Definitions
- the present invention relates to preserving private or confidential information in natural language databases, and more specifically to extraction of private information from natural language databases and to hiding an identity of a person associated with the private information.
- Goal-oriented spoken dialog systems aim to identify intents of humans, expressed in natural language, and take actions accordingly to satisfy their requests.
- In a spoken dialog system, the speaker's utterance is typically first recognized using an automatic speech recognizer (ASR). Then, the intent of the speaker is identified from the recognized sequence, using a spoken language understanding (SLU) component.
- these calls may include very sensitive information about the callers, such as names, credit card numbers, and phone numbers.
- labeling means assigning one or more of the predefined intent(s) (call-type(s)) to each utterance.
- Consider, for example, the utterance “I would like to pay my bill” in a customer care application.
- the corresponding intent or the call-type would be Pay(Bill) and the action would be learning the caller's account number and credit card number and fulfilling the request.
- the transcribed and labeled data may then be used to train automatic speech recognition and call classification models.
- the bottleneck in building an accurate statistical system is the time spent preparing high quality labeled data. Sharing of this data is extremely important for machine learning, data mining, information extraction and retrieval, and natural language processing research. Reuse of the data from one application, while building another application is also crucial in reducing the development time and making the process scalable. However, preserving privacy while sharing data is important since such data may contain confidential information. Outsourcing the data and tasks that require private data is another example of information sharing that may jeopardize the privacy of speakers. It is possible to mine natural language databases to gather aggregate information using statistical methods. The gathered information may be confidential or sensitive. For example, in an application from the medical domain, using the caller utterances and their call-types, one can extract statistical information such as the following:
- a method for preserving privacy in natural language databases is provided. Natural language input may be received. At least one of sanitizing or anonymizing the natural language input may be performed to form a clean output. The clean output may be stored.
- an apparatus for preserving privacy in natural language databases may include a processor and storage configured to store a plurality of instructions for the processor.
- the processor may be configured to receive natural language input, perform at least one of sanitizing or anonymizing the natural language input to form a clean output, and store the clean output.
- an apparatus for preserving privacy in natural language databases may include means for receiving natural language input, means for performing at least one of sanitizing, or anonymizing the natural language input to form a clean output, and means for storing the clean output.
- FIG. 1 illustrates an exemplary spoken dialog system
- FIG. 2 illustrates an exemplary system which may be used in implementations consistent with the principles of the invention
- FIG. 3 is a flowchart of a process that may be performed in implementations consistent with the principles of the invention.
- FIG. 4 illustrates a simple Backus Naur Form (BNF) that defines a grammar for a phone number.
- FIG. 1 is a functional block diagram of an exemplary natural language spoken dialog system 100 .
- Natural language spoken dialog system 100 may include an automatic speech recognition (ASR) module 102 , a spoken language understanding (SLU) module 104 , a dialog management (DM) module 106 , a spoken language generation (SLG) module 108 , and a text-to-speech (TTS) module 110 .
- ASR module 102 may analyze speech input and may provide a transcription of the speech input as output.
- SLU module 104 may receive the transcribed input and may use a natural language understanding model to analyze the group of words that are included in the transcribed input to derive a meaning from the input.
- DM module 106 may receive the meaning of the speech input as input and may determine an action, such as, for example, providing a spoken response, based on the input.
- SLG module 108 may generate a transcription of one or more words in response to the action provided by DM 106 .
- TTS module 110 may receive the transcription as input and may provide generated audible speech as output based on the transcribed speech.
- the modules of system 100 may recognize speech input, such as speech utterances, may transcribe the speech input, may identify (or understand) the meaning of the transcribed speech, may determine an appropriate response to the speech input, may generate text of the appropriate response and from that text, generate audible “speech” from system 100 , which the user then hears. In this manner, the user can carry on a natural language dialog with system 100 .
- a computing device such as a smartphone (or any processing device having an audio processing capability, for example a PDA with audio and a WiFi network interface) may have an ASR module wherein a user may say “call mom” and the smartphone may act on the instruction without a “spoken dialog interaction”.
- FIG. 2 illustrates an exemplary processing system 200 in which one or more of the modules of system 100 may be implemented.
- system 100 may include at least one processing system, such as, for example, exemplary processing system 200 .
- System 200 may include a bus 210 , a processor 220 , a memory 230 , a read only memory (ROM) 240 , a storage device 250 , an input device 260 , an output device 270 , and a communication interface 280 .
- Bus 210 may permit communication among the components of system 200 .
- Processor 220 may include at least one conventional processor or microprocessor that interprets and executes instructions.
- Memory 230 may be a random access memory (RAM) or another type of dynamic storage device that stores information and instructions for execution by processor 220 .
- Memory 230 may also store temporary variables or other intermediate information used during execution of instructions by processor 220 .
- ROM 240 may include a conventional ROM device or another type of static storage device that stores static information and instructions for processor 220 .
- Storage device 250 may include any type of media, such as, for example, magnetic or optical recording media and its corresponding drive.
- Input device 260 may include one or more conventional mechanisms that permit a user to input information to system 200 , such as a keyboard, a mouse, a pen, a microphone, a voice recognition device, etc.
- Output device 270 may include one or more conventional mechanisms that output information to the user, including a display, a printer, one or more speakers, or a medium, such as a memory, or a magnetic or optical disk and a corresponding disk drive.
- Communication interface 280 may include any transceiver-like mechanism that enables system 200 to communicate via a network.
- communication interface 280 may include a modem, or an Ethernet interface for communicating via a local area network (LAN).
- communication interface 280 may include other mechanisms for communicating with other devices and/or systems via wired, wireless or optical connections.
- communication interface 280 may not be included in processing system 200 when natural spoken dialog system 100 is implemented completely within a single processing system 200 .
- System 200 may perform functions in response to processor 220 executing sequences of instructions contained in a computer-readable medium, such as, for example, memory 230 , a magnetic disk, or an optical disk.
- Computer-readable mediums and computer-readable storage mediums can be tangible, non-transitory, or transitory. Tangible computer-readable storage media, computer-readable storage devices, or computer-readable memory devices, expressly exclude media such as transitory waves, energy, carrier signals, electromagnetic waves, and signals per se.
- Such instructions may be read into memory 230 from another computer-readable medium, such as storage device 250 , or from a separate device via communication interface 280 .
- FIG. 3 is a flowchart that illustrates an exemplary process that may be performed by implementations consistent with the principles of the invention.
- the process may be performed on a system, such as system 200 , and may be performed on data, such as transcribed utterance data, prior to releasing the data to third parties.
- the process may begin with retrieval of a transcribed utterance (act 302 ).
- the transcribed utterance may then be sanitized (act 304 ).
- the details of sanitization are described below.
- the transcribed utterance may then be anonymized such that a source of the utterance (i.e., a speaker) may not be easily determined (act 306 ).
- the details of anonymization are described below.
- the modified or cleaned transcribed utterances may then be stored (act 308 ).
- a check may be performed to determine whether any additional transcribed utterances remain to be processed (act 310 ). If so, then acts 302 - 310 may be repeated.
- Otherwise, the utterances may be upsampled or downsampled according to their call-types to change their call-type distribution (act 312 ), and the process is completed.
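The flow of FIG. 3 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `sanitize` and `anonymize` bodies are placeholder assumptions, `clean_database` and `per_call_type` are hypothetical names, and the sketch assumes the resampling of act 312 runs once after every utterance has been processed.

```python
import re
from collections import defaultdict


def sanitize(utterance: str) -> str:
    # Placeholder sanitizer (assumption): mask every run of digits.
    return re.sub(r"\d+", "<NUMBER>", utterance)


def anonymize(utterance: str) -> str:
    # Placeholder anonymizer (assumption): only normalizes whitespace.
    return " ".join(utterance.split())


def clean_database(records, per_call_type=None):
    """records: iterable of (transcribed utterance, call-type) pairs."""
    store = []
    for utterance, call_type in records:          # acts 302 and 310
        cleaned = anonymize(sanitize(utterance))  # acts 304 and 306
        store.append((cleaned, call_type))        # act 308
    if per_call_type is None:
        return store
    # Act 312: resample so each call-type has exactly per_call_type examples.
    by_type = defaultdict(list)
    for cleaned, call_type in store:
        by_type[call_type].append(cleaned)
    resampled = []
    for call_type, items in by_type.items():
        for i in range(per_call_type):
            # Cycling through the items up-samples small classes and
            # truncates (down-samples) large ones.
            resampled.append((items[i % len(items)], call_type))
    return resampled
```

Passing `per_call_type=None` skips the resampling step and returns the cleaned utterances with their original call-type distribution.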
- the aim of sanitization is to hide personal information, given privacy requirements, in order to disable data mining approaches from extracting personal or other private business related information in spoken language databases. This can be considered privacy preserving text mining.
- Sanitization depends on the corresponding task. Data quality should be preserved after the sanitization. Data quality may be measured in terms of readability and the ability to use the sanitized text for the corresponding task. For example, if the data is going to be used for text classification, sanitization should be performed without adversely affecting classification accuracy. For example, if information retrieval is to be performed, sanitization methods should not interfere with indexing and document matching methods.
- the methods include value distortion, value dissociation and value-class membership.
- Value distortion replaces confidential values that need to be hidden with random values.
- Value dissociation keeps a true distribution of the values, but replaces each value in a record with a value of the same field from another record. This can be achieved, for example, by exchanging the values across sentences.
- Value-class membership exchanges individual values with disjoint, mutually exhaustive classes. For example, all names of people may be changed to a single token <NAME>.
- Modifying the values of named entities or replacing the values with generic tokens is the simplest form of text sanitization. If the named entities are not already marked during transcription or labeling, automatic named entity extraction methods, which are well studied in the computational linguistics community, may be utilized. K-anonymity can be assured for text sanitization while determining the generic tokens. As it applies to names of people, k-anonymity means that the names and other features that may be used to identify a person are generalized such that they map to at least k people. For k-anonymity as it applies to numeric values such as salary, a concept hierarchy may be exploited. For example, the salary may be mapped to a more generic value (e.g., the linguistic hedges low, average, high, and astronomic in the concept hierarchy).
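As a rough illustration of the salary example, the sketch below maps numeric salaries onto the four hedges and checks whether every resulting class covers at least k records. The thresholds and the function names `generalize_salary` and `is_k_anonymous` are assumptions made for this example; the source does not specify them.

```python
from collections import Counter


def generalize_salary(salary, bounds=((30_000, "low"),
                                      (80_000, "average"),
                                      (250_000, "high"))):
    # The upper bounds are illustrative assumptions, not values from the text.
    for upper, label in bounds:
        if salary < upper:
            return label
    return "astronomic"


def is_k_anonymous(labels, k):
    # Every generalized value must be shared by at least k records.
    return all(count >= k for count in Counter(labels).values())


salaries = [22_000, 28_000, 95_000, 120_000]
labels = [generalize_salary(s) for s in salaries]
print(labels, is_k_anonymous(labels, 2))
```

If the check fails, one option would be to move one level up the hierarchy (for example, merging adjacent hedges) and test again.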
- the named entities may be found in a given transcribed utterance database of calls, and the named entities may be hidden by using any of the three previously-discussed sanitization methods.
- For value distortion, the named entity values may be replaced with random values from the same named entity category.
- For value dissociation, the value of the named entity may be exchanged with the value of another named entity of the same category in the transcribed utterance database.
- For value-class membership, the named entity values may be replaced with generic named entity category tokens, such as <NAME> and <PHONE_NUMBER>. This last approach may be likely to improve accuracy of call-type classification due to better generalization of word n-gram features, because call-types are expected to have strong associations with named entity categories, but not necessarily with their values.
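The three sanitization methods can be compared side by side in a small sketch. It assumes the named entities have already been extracted, so each record is a hypothetical `(template, category, value)` triple with a `{}` slot in the template; the record layout and all function names are assumptions made for illustration.

```python
import random


def value_distortion(records, pool, rng):
    # Replace each value with a random value from the same category.
    return [(t, kind, rng.choice(pool[kind])) for t, kind, _ in records]


def value_dissociation(records, rng):
    # Exchange values among records of the same category. The overall
    # distribution of values is kept. A plain shuffle may leave a value
    # in place; a stricter version would force every value to move.
    out = list(records)
    by_kind = {}
    for i, (_, kind, _) in enumerate(records):
        by_kind.setdefault(kind, []).append(i)
    for kind, idxs in by_kind.items():
        values = [records[i][2] for i in idxs]
        rng.shuffle(values)
        for i, v in zip(idxs, values):
            out[i] = (records[i][0], kind, v)
    return out


def value_class_membership(records):
    # Replace each value with its generic category token.
    return [(t, kind, f"<{kind}>") for t, kind, _ in records]


def render(records):
    return [t.format(v) for t, _, v in records]
```

For example, `render(value_class_membership(records))` turns "my name is John Smith" into "my name is <NAME>", which is the form the text expects to generalize best for call-type classification.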
- the purpose of named entity extraction is to identify the sub-string of the input utterance that contains a named entity, and extract its type. For example, in the utterance “my phone number is 1 2 3 4 5 6 7 8 9 0”, the sub-string “1 2 3 4 5 6 7 8 9 0” contains the named entity of type <PHONE_NUMBER>. After named entity extraction, this entity can be marked in the utterance using eXtensible Markup Language (XML) tags: “my phone number is <PHONE_NUMBER>1 2 3 4 5 6 7 8 9 0</PHONE_NUMBER>” for sanitization purposes.
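The marking step for the phone-number example might look like the sketch below. The regular expression, which accepts exactly ten space-separated digits, and the name `mark_phone_numbers` are assumptions for illustration; a real extractor would need to handle many more spoken forms.

```python
import re

# Ten single digits separated by single spaces (illustrative assumption).
PHONE = re.compile(r"\b\d(?: \d){9}\b")


def mark_phone_numbers(utterance: str) -> str:
    # Wrap each match in XML-style tags so a later pass can sanitize it.
    return PHONE.sub(
        lambda m: f"<PHONE_NUMBER>{m.group(0)}</PHONE_NUMBER>", utterance)


print(mark_phone_numbers("my phone number is 1 2 3 4 5 6 7 8 9 0"))
```

Keeping extraction and sanitization as separate passes means the same tagged text can feed any of the three sanitization methods.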
- Implementations consistent with the principles of the invention may employ a rule-based or a statistical approach for named entity extraction.
- a grammar in Backus Naur Form (BNF) may be manually created.
- the creation could involve the reuse and extension of a library of application-independent named entity grammars (“phone numbers”, “dates”, etc.) or a set of named entity grammars may be created for the current application.
- FIG. 4 shows a simple example of a grammar that may be used to extract phone numbers.
- These grammars are typically regular expressions written in a grammar rule notation.
- the grammars may be compiled into finite-state transducers whose arcs are labeled with the terminals of the grammars. The two components of the arc labels may then be interpreted as the input and the output symbols leading to a finite-state transducer representation.
- PHONE_NUMBER is made up of an area code followed by a local number. The area code consists of 3 digits, the local number consists of 7 digits, and each digit is any of the numbers 0 through 9.
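FIG. 4 itself is not reproduced in this text, so the following is only a sketch of a grammar matching the prose description, compiled into a regular expression rather than a finite-state transducer. The dictionary layout, the optional-whitespace separator, and the name `compile_rule` are assumptions.

```python
import re

# Each non-terminal expands to a sequence of symbols; DIGIT is a terminal.
GRAMMAR = {
    "PHONE_NUMBER": ["AREA_CODE", "LOCAL_NUMBER"],
    "AREA_CODE": ["DIGIT"] * 3,
    "LOCAL_NUMBER": ["DIGIT"] * 7,
    "DIGIT": "[0-9]",
}


def compile_rule(symbol: str, sep: str = r"\s*") -> str:
    # Recursively expand a symbol into a regular expression, allowing
    # optional whitespace between adjacent symbols.
    rule = GRAMMAR[symbol]
    if isinstance(rule, str):
        return rule
    return sep.join(compile_rule(s, sep) for s in rule)


PHONE_RE = re.compile(compile_rule("PHONE_NUMBER"))
print(bool(PHONE_RE.fullmatch("973 1239684")))
```

Because the grammar has no recursion, it describes a regular language, which is why it can be compiled to a regular expression or a finite-state machine.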
- each utterance FSM (U_i) may be composed with each entity grammar (F_j) sequentially, resulting in an FSM (M_i) representing the utterances with the named entities marked.
- the grammar rules can also specify the context in which they can apply, to prevent false acceptances.
- Detecting names of people may be difficult using regular grammars.
- a heuristic or automated approach may be employed in implementations consistent with the principles of the invention to detect names.
- grammars may be used to detect location and organization names. All other words that start with an upper-case letter may be assumed to be names. Because names may already be marked with an uppercase initial letter, the heuristic approach is reasonable and a significant performance improvement was observed during experiments.
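A minimal sketch of the capitalization heuristic follows. It assumes the transcription capitalizes only proper nouns, and that location and organization names have already been found by grammars and are passed in as `non_names`; both the parameter and the function name `replace_names` are hypothetical.

```python
import re


def replace_names(utterance: str, non_names=frozenset()) -> str:
    # Any capitalized word that is not a known location or organization
    # is assumed to be a person name and replaced with a generic token.
    def repl(match):
        word = match.group(0)
        return word if word in non_names else "<NAME>"
    return re.sub(r"\b[A-Z][a-z]+\b", repl, utterance)


print(replace_names("this is John Smith calling from Boston",
                    non_names={"Boston"}))
```

On text with ordinary sentence casing, sentence-initial words would also be replaced, so the heuristic depends on the transcription convention.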
- the above sanitization approaches may be used to sanitize output of an ASR component as well as human transcriptions.
- the initial letters of proper names can also be in upper-case in the ASR output, if the proper names are also capitalized in the training data.
- NE Named Entity
- Text sanitization may also help protect data against some indirect threats.
- the utterances, and therefore the utterance and call-type distribution, may be changed by up-sampling or down-sampling the data.
- Spoken language understanding models may be trained using spoken dialog utterances labeled with user intents (call-types). Changing the utterance distribution, and therefore the call-type distribution, will prevent others from extracting such indirect information.
- the utterances may be down-sampled by collapsing the same or very similar utterances into one example. This is known as cloning.
- the utterances may be down-sampled by just collapsing. For up-sampling, some utterances may be selected and duplicated by adding variations and by inserting dysfluencies, using a synonym list to change words, paraphrasing, or changing the named entity values.
- the utterances may be compiled into a finite state machine (FSM), from which as many paths and utterances as needed may be generated. FSMs may be used to generate the alternatives of similar frequent sequences such as “I would like to” and “I wanna”, and named entities.
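Both directions of resampling can be sketched briefly. Down-sampling here collapses exact duplicates only, and up-sampling models a linear FSM as a list of slots with alternative arcs; the slot representation and the names `collapse` and `expand` are assumptions made for this example.

```python
from itertools import product


def collapse(utterances):
    # Down-sample: keep one example of each distinct utterance,
    # preserving first-seen order.
    return list(dict.fromkeys(utterances))


def expand(slots):
    # Up-sample: each slot lists alternative phrasings, and every path
    # through the slots yields one generated utterance.
    return [" ".join(path) for path in product(*slots)]


print(collapse(["pay my bill", "pay my bill", "cancel"]))
print(expand([["I would like to", "I wanna"], ["pay my bill"]]))
```

Collapsing "very similar" rather than identical utterances would need a similarity measure, which this sketch leaves out.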
- Text anonymization is therefore necessary to protect the privacy of the authors, as well as speakers. Text anonymization aims at preventing the identification of the author or speaker (who is also considered to be the owner) of a given text or utterances.
- the concept of k-anonymity can be used as a privacy metric for anonymization in the data mining context. K-anonymity against text classification is satisfied if text classification tools cannot differentiate between k authors for a text.
- documents may include electronically stored text or transcribed utterances
- a fixed set of documents or utterances may be assumed, for example, a digital library which collects all the major work of a given set of authors.
- Authorship information for some documents may be known and some of the authorships may not be known.
- a typical example is a set of articles and a set of reviews for those articles.
- the adversary is able to find another set of documents for the authors, for example, by searching the internet, where the number of documents that could be found is practically infinite.
- Text classification techniques may be used to first parse the text to obtain the features.
- Features that may be used to classify text may include the frequencies of words, phrases, and punctuation marks.
- Each document may be represented as a feature vector where each feature may be represented by a real number.
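One simple way to build such a feature vector is to use relative frequencies of words and punctuation marks over a fixed vocabulary. The tokenizer pattern, the choice of relative frequency, and the name `feature_vector` are assumptions for this sketch; the source only says each feature is a real number.

```python
import re
from collections import Counter


def feature_vector(text: str, vocabulary):
    # Tokens are lower-cased words or single punctuation marks.
    tokens = re.findall(r"[a-z']+|[.,;:!?]", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    # One real-valued feature per vocabulary entry.
    return [counts[term] / total for term in vocabulary]


print(feature_vector("The bill, the card.", ["the", ",", "bill"]))
```

Phrase features could be added in the same way by counting n-grams of the token list.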
- Let DP denote the set of documents where the authorship information is public.
- Let DA denote the set of documents where the authorship information is confidential.
- An adversary could train a classification model using DP to predict the authorship information of a document in DA. Since DP is known and fixed, anonymization can work on both DP and DA.
- the documents in DP and DA may be modified in order to change their feature vectors so that the data mining tools may not classify the document accurately.
- the most general model that an adversary may use is a classification model that returns probabilities for each author for a given document. This way each author will have a certain probability of being an author for a specific anonymous document.
- One approach that may be used to achieve k-anonymity is to change the probability of the real author so that (s)he falls into one of the top 1 . . . k positions randomly selected among the top-k authors with the highest probabilities. Probabilities may then be changed by updating the documents in DP and DA. This process may be performed in such a way that the original meaning of the document is preserved.
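The bookkeeping behind this approach can be sketched as follows: compute where the real author currently ranks in the adversary's probability output, and draw a random target rank among the top k. How the documents are then edited to reach that rank is not shown; the function names and the dictionary input are assumptions.

```python
import random


def author_rank(probabilities: dict, author: str) -> int:
    # Rank 1 is the author the classifier considers most likely.
    ordered = sorted(probabilities, key=probabilities.get, reverse=True)
    return ordered.index(author) + 1


def target_rank(k: int, rng: random.Random) -> int:
    # Pick one of the top 1..k positions uniformly at random.
    return rng.randint(1, k)


probs = {"author_a": 0.5, "author_b": 0.3, "author_c": 0.2}
print(author_rank(probs, "author_b"))
```

A document would be modified until `author_rank` for its real author equals the drawn target, so an adversary cannot tell which of the top k candidates is correct.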
- If DP is not fixed, then the model that could be constructed by the adversary cannot be known in advance, which complicates the anonymization process.
- the approach may be to update the anonymous documents in such a way that their feature vectors look alike, in order to confuse the adversary.
- This can be achieved by changing the feature vectors such that at least k of the documents with different authors have the same feature vector. This may be accomplished by taking the mean of k feature vectors of documents with different authors and assigning the mean as the new feature vector.
- the disclosed method includes sanitizing sensitive information found in a transcription from a speaker, to yield a clean transcription including sanitized text and non-sanitized text, generating a mean feature vector associated with a plurality of speakers and anonymizing the non-sanitized text by replacing the feature vector associated with the non-sanitized text with the mean feature vector of the plurality of speakers.
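The mean-vector step can be illustrated on feature vectors alone. The sketch below picks the first k documents with distinct authors and assigns each of them the mean of their vectors; it does not rewrite the underlying text, and the selection rule and function names are assumptions.

```python
def mean_vector(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def anonymize_vectors(docs, k):
    """docs: list of (author, feature_vector) pairs."""
    chosen, seen = [], set()
    for i, (author, _) in enumerate(docs):
        if author not in seen:
            seen.add(author)
            chosen.append(i)
        if len(chosen) == k:
            break
    if len(chosen) < k:
        raise ValueError("fewer than k distinct authors")
    mean = mean_vector([docs[i][1] for i in chosen])
    out = list(docs)
    for i in chosen:
        # All k documents now share one feature vector.
        out[i] = (docs[i][0], list(mean))
    return out
```

After this step a classifier that relies only on these features sees k documents by different authors as identical.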
- the anonymization method may depend heavily on features of a classifier used for authorship identification by the adversary. If the classifier only uses unigram word distributions, then anonymization may be achieved simply by replacing the words with their synonyms or by mapping them to more generic terms, as was done for sanitization. If the classifier uses a different feature set, such as the distribution of stop-words (such as “the” or “by”) or words from a closed class part of speech (word category) tags (that is, almost all words which are not nouns, verbs, or adjectives) then revising the sentences may be a solution. If the classifier uses other features such as passive or active voice, specific clauses, average length of sentences, etc., these features may need to be specifically addressed. If the text anonymization task has no information about the features of the classifier that the adversary is using, then the optimal solution may be to assume that the classifier uses all possible features of which one may think and anonymize the text accordingly.
- Embodiments within the scope of the present invention may also include computer-readable media for carrying or having computer-executable instructions or data structures stored thereon.
- Such computer-readable media can be any available media that can be accessed by a general purpose or special purpose computer.
- Such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code means in the form of computer-executable instructions or data structures.
- a network or another communications connection either hardwired, wireless, or combination thereof
- any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of the computer-readable media.
- Computer-executable instructions include, for example, instructions and data which cause a general purpose computer, special purpose computer, or special purpose processing device to perform a certain function or group of functions.
- Computer-executable instructions also include program modules that are executed by computers in stand-alone or network environments.
- program modules include routines, programs, objects, components, and data structures, etc. that perform particular tasks or implement particular abstract data types.
- Computer-executable instructions, associated data structures, and program modules represent examples of the program code means for executing steps of the methods disclosed herein. The particular sequence of such executable instructions or associated data structures represents examples of corresponding acts for implementing the functions described in such steps.
- Embodiments of the invention may be practiced in network computing environments with many types of computer system configurations, including personal computers, hand-held devices, multi-processor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Embodiments may also be practiced in distributed computing environments where tasks are performed by local and remote processing devices that are linked (either by hardwired links, wireless links, or by a combination thereof) through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Medical Informatics (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
Description
-
- System: How may I help you?
- User: Hello. This is John Smith. My phone number is 973 area code 1239684. I wish to have my bill, long distance bill, sent to my Discover card for payment.
- System: OK, I can help you with that. What is your credit card number?
- User: My Discover card number is 28743617891257 hundred and it expires on first month of next year.
- System: . . . .
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/288,793 US10140321B2 (en) | 2004-07-30 | 2014-05-28 | Preserving privacy in natural langauge databases |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US59285504P | 2004-07-30 | 2004-07-30 | |
US11/086,954 US8473451B1 (en) | 2004-07-30 | 2005-03-22 | Preserving privacy in natural language databases |
US13/926,404 US8751439B2 (en) | 2004-07-30 | 2013-06-25 | Preserving privacy in natural language databases |
US14/288,793 US10140321B2 (en) | 2004-07-30 | 2014-05-28 | Preserving privacy in natural langauge databases |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/926,404 Continuation US8751439B2 (en) | 2004-07-30 | 2013-06-25 | Preserving privacy in natural language databases |
Publications (2)
Publication Number | Publication Date |
---|---|
US20140278409A1 (en) | 2014-09-18 |
US10140321B2 (en) | 2018-11-27 |
Family
ID=48627784
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/086,954 Active 2026-03-17 US8473451B1 (en) | 2004-07-30 | 2005-03-22 | Preserving privacy in natural language databases |
US13/926,404 Expired - Lifetime US8751439B2 (en) | 2004-07-30 | 2013-06-25 | Preserving privacy in natural language databases |
US14/288,793 Active 2025-10-21 US10140321B2 (en) | 2004-07-30 | 2014-05-28 | Preserving privacy in natural langauge databases |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/086,954 Active 2026-03-17 US8473451B1 (en) | 2004-07-30 | 2005-03-22 | Preserving privacy in natural language databases |
US13/926,404 Expired - Lifetime US8751439B2 (en) | 2004-07-30 | 2013-06-25 | Preserving privacy in natural language databases |
Country Status (1)
Country | Link |
---|---|
US (3) | US8473451B1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020117074A1 (en) * | 2018-12-06 | 2020-06-11 | Motorola Solutions, Inc | Method and system to ensure a submitter of an anonymous tip remains anonymous |
US10999256B2 (en) * | 2018-01-29 | 2021-05-04 | Sap Se | Method and system for automated text anonymization |
US11550937B2 (en) | 2019-06-13 | 2023-01-10 | Fujitsu Limited | Privacy trustworthiness based API access |
US20230134796A1 (en) * | 2021-10-29 | 2023-05-04 | Glipped, Inc. | Named entity recognition system for sentiment labeling |
US12039082B2 (en) | 2022-08-09 | 2024-07-16 | Motorola Solutions, Inc. | System and method for anonymizing a person captured in an image |
Families Citing this family (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9111540B2 (en) * | 2009-06-09 | 2015-08-18 | Microsoft Technology Licensing, Llc | Local and remote aggregation of feedback data for speech recognition |
US8935154B1 (en) * | 2012-04-13 | 2015-01-13 | Symantec Corporation | Systems and methods for determining authorship of an unclassified notification message |
US9093069B2 (en) | 2012-11-05 | 2015-07-28 | Nuance Communications, Inc. | Privacy-sensitive speech model creation via aggregation of multiple user models |
US9131369B2 (en) | 2013-01-24 | 2015-09-08 | Nuance Communications, Inc. | Protection of private information in a client/server automatic speech recognition system |
US9437207B2 (en) * | 2013-03-12 | 2016-09-06 | Pullstring, Inc. | Feature extraction for anonymized speech recognition |
US9514740B2 (en) * | 2013-03-13 | 2016-12-06 | Nuance Communications, Inc. | Data shredding for speech recognition language model training under data retention restrictions |
US9514741B2 (en) * | 2013-03-13 | 2016-12-06 | Nuance Communications, Inc. | Data shredding for speech recognition acoustic model training under data retention restrictions |
EP3069287A4 (en) | 2013-11-14 | 2017-05-17 | 3M Innovative Properties Company | Obfuscating data using obfuscation table |
WO2016068883A1 (en) * | 2014-10-28 | 2016-05-06 | Hewlett Packard Enterprise Development Lp | Entity anonymization for a query directed to a multiplex graph |
US9934406B2 (en) | 2015-01-08 | 2018-04-03 | Microsoft Technology Licensing, Llc | Protecting private information in input understanding system |
US9881613B2 (en) * | 2015-06-29 | 2018-01-30 | Google Llc | Privacy-preserving training corpus selection |
US10402469B2 (en) * | 2015-10-16 | 2019-09-03 | Google Llc | Systems and methods of distributed optimization |
US9779756B2 (en) * | 2015-12-11 | 2017-10-03 | International Business Machines Corporation | Method and system for indicating a spoken word has likely been misunderstood by a listener |
US10360404B2 (en) * | 2016-02-25 | 2019-07-23 | International Business Machines Corporation | Author anonymization |
US10209907B2 (en) | 2016-06-14 | 2019-02-19 | Microsoft Technology Licensing, Llc | Secure removal of sensitive data |
CN108701037A (en) * | 2017-02-23 | 2018-10-23 | 华为技术有限公司 | A kind of method, apparatus and terminal of the application task list of cleaning terminal |
CN107016073B (en) * | 2017-03-24 | 2019-06-28 | 北京科技大学 | A kind of text classification feature selection approach |
US10963493B1 (en) | 2017-04-06 | 2021-03-30 | AIBrain Corporation | Interactive game with robot system |
US10839017B2 (en) * | 2017-04-06 | 2020-11-17 | AIBrain Corporation | Adaptive, interactive, and cognitive reasoner of an autonomous robotic system utilizing an advanced memory graph structure |
US10929759B2 (en) | 2017-04-06 | 2021-02-23 | AIBrain Corporation | Intelligent robot software platform |
US11151992B2 (en) | 2017-04-06 | 2021-10-19 | AIBrain Corporation | Context aware interactive robot |
US10810371B2 (en) * | 2017-04-06 | 2020-10-20 | AIBrain Corporation | Adaptive, interactive, and cognitive reasoner of an autonomous robotic system |
US10909978B2 (en) * | 2017-06-28 | 2021-02-02 | Amazon Technologies, Inc. | Secure utterance storage |
US10453447B2 (en) | 2017-11-28 | 2019-10-22 | International Business Machines Corporation | Filtering data in an audio stream |
EP3496090A1 (en) * | 2017-12-07 | 2019-06-12 | Thomson Licensing | Device and method for privacy-preserving vocal interaction |
US11120199B1 (en) * | 2018-02-09 | 2021-09-14 | Voicebase, Inc. | Systems for transcribing, anonymizing and scoring audio content |
DE102018202018B3 (en) * | 2018-02-09 | 2019-05-09 | Siemens Schweiz Ag | Method and system for providing a voice-based service, in particular for the control of room control elements in buildings |
US10984198B2 (en) * | 2018-08-30 | 2021-04-20 | International Business Machines Corporation | Automated testing of dialog systems |
WO2020074651A1 (en) * | 2018-10-10 | 2020-04-16 | Koninklijke Philips N.V. | Free text de-identification |
US11195524B2 (en) | 2018-10-31 | 2021-12-07 | Walmart Apollo, Llc | System and method for contextual search query revision |
US11238850B2 (en) | 2018-10-31 | 2022-02-01 | Walmart Apollo, Llc | Systems and methods for e-commerce API orchestration using natural language interfaces |
US11404058B2 (en) | 2018-10-31 | 2022-08-02 | Walmart Apollo, Llc | System and method for handling multi-turn conversations and context management for voice enabled ecommerce transactions |
US11183176B2 (en) * | 2018-10-31 | 2021-11-23 | Walmart Apollo, Llc | Systems and methods for server-less voice applications |
US20230032536A1 (en) * | 2019-12-23 | 2023-02-02 | Medsavana S.L. | Privacy preservation in a queryable database built from unstructured texts |
US11217223B2 (en) * | 2020-04-28 | 2022-01-04 | International Business Machines Corporation | Speaker identity and content de-identification |
US11968230B2 (en) | 2021-03-18 | 2024-04-23 | International Business Machines Corporation | Managing communication privacy in encroaching environments |
US20220399009A1 (en) * | 2021-06-09 | 2022-12-15 | International Business Machines Corporation | Protecting sensitive information in conversational exchanges |
EP4396743A1 (en) * | 2021-10-01 | 2024-07-10 | Schneider Electric USA, Inc. | Maintenance data sanitization |
US20240144931A1 (en) * | 2022-11-01 | 2024-05-02 | Microsoft Technology Licensing, Llc | Systems and methods for gpt guided neural punctuation for conversational speech |
FR3150317A1 (en) * | 2023-06-22 | 2024-12-27 | Tuito | Method of pseudoanonymizing a first query, in natural language, for interrogating a database containing confidential data. |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5909680A (en) * | 1996-09-09 | 1999-06-01 | Ricoh Company Limited | Document categorization by word length distribution analysis |
US5911129A (en) | 1996-12-13 | 1999-06-08 | Intel Corporation | Audio font used for capture and rendering |
US6085178A (en) | 1997-03-21 | 2000-07-04 | International Business Machines Corporation | Apparatus and method for communicating between an intelligent agent and client computer process using disguised messages |
US20020039408A1 (en) | 2000-10-03 | 2002-04-04 | Securetell, Inc. | Method and system for enabling workers to communicate anonymously with their employers |
US6404872B1 (en) | 1997-09-25 | 2002-06-11 | At&T Corp. | Method and apparatus for altering a speech signal during a telephone call |
US6507643B1 (en) | 2000-03-16 | 2003-01-14 | Breveon Incorporated | Speech recognition system and method for converting voice mail messages to electronic mail messages |
US20030105634A1 (en) | 2001-10-15 | 2003-06-05 | Alicia Abella | Method for dialog management |
US20030217272A1 (en) | 2002-05-15 | 2003-11-20 | International Business Machines Corporation | System and method for digital watermarking of data repository |
US20040148154A1 (en) * | 2003-01-23 | 2004-07-29 | Alejandro Acero | System for using statistical classifiers for spoken language understanding |
US6792425B2 (en) | 2000-11-30 | 2004-09-14 | Hitachi, Ltd. | Secure multi database system including a client, multi database server, and database server |
US20040181514A1 (en) | 2003-03-13 | 2004-09-16 | International Business Machines Corporation | Byte-code representations of actual data to reduce network traffic in database transactions |
US20040181670A1 (en) | 2003-03-10 | 2004-09-16 | Carl Thune | System and method for disguising data |
US20060005017A1 (en) | 2004-06-22 | 2006-01-05 | Black Alistair D | Method and apparatus for recognition and real time encryption of sensitive terms in documents |
US7028184B2 (en) | 2001-01-17 | 2006-04-11 | International Business Machines Corporation | Technique for digitally notarizing a collection of data streams |
- 2005-03-22: US, US11/086,954, patent US8473451B1 (en), Active
- 2013-06-25: US, US13/926,404, patent US8751439B2 (en), Expired - Lifetime
- 2014-05-28: US, US14/288,793, patent US10140321B2 (en), Active
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5909680A (en) * | 1996-09-09 | 1999-06-01 | Ricoh Company Limited | Document categorization by word length distribution analysis |
US5911129A (en) | 1996-12-13 | 1999-06-08 | Intel Corporation | Audio font used for capture and rendering |
US6085178A (en) | 1997-03-21 | 2000-07-04 | International Business Machines Corporation | Apparatus and method for communicating between an intelligent agent and client computer process using disguised messages |
US6404872B1 (en) | 1997-09-25 | 2002-06-11 | At&T Corp. | Method and apparatus for altering a speech signal during a telephone call |
US6507643B1 (en) | 2000-03-16 | 2003-01-14 | Breveon Incorporated | Speech recognition system and method for converting voice mail messages to electronic mail messages |
US20020039408A1 (en) | 2000-10-03 | 2002-04-04 | Securetell, Inc. | Method and system for enabling workers to communicate anonymously with their employers |
US6792425B2 (en) | 2000-11-30 | 2004-09-14 | Hitachi, Ltd. | Secure multi database system including a client, multi database server, and database server |
US7028184B2 (en) | 2001-01-17 | 2006-04-11 | International Business Machines Corporation | Technique for digitally notarizing a collection of data streams |
US20030105634A1 (en) | 2001-10-15 | 2003-06-05 | Alicia Abella | Method for dialog management |
US20030217272A1 (en) | 2002-05-15 | 2003-11-20 | International Business Machines Corporation | System and method for digital watermarking of data repository |
US20040148154A1 (en) * | 2003-01-23 | 2004-07-29 | Alejandro Acero | System for using statistical classifiers for spoken language understanding |
US20040181670A1 (en) | 2003-03-10 | 2004-09-16 | Carl Thune | System and method for disguising data |
US20040181514A1 (en) | 2003-03-13 | 2004-09-16 | International Business Machines Corporation | Byte-code representations of actual data to reduce network traffic in database transactions |
US20060005017A1 (en) | 2004-06-22 | 2006-01-05 | Black Alistair D | Method and apparatus for recognition and real time encryption of sensitive terms in documents |
Non-Patent Citations (6)
Title |
---|
Allen L. Gorin, "Automated Natural Spoken Dialog", Apr. 2002, IEEE, pp. 51-56. |
Curry Guinn, "Extracting Emotional Information from the Text of Spoken Dialog", 2003, Proceedings of the 9th International Conference. |
Francis Kubala, "Named Entity Extraction from Speech", 1998, en.scientificcommunications.org. |
Patrick Ruch et al., "Medical Document Anonymization with a Semantic Lexicon", Medical Informatics Division, University Hospital of Geneva; ISSCO, University of Geneva, 2000. |
Rakesh Agrawal et al., "Privacy-Preserving Data Mining", IBM Almaden Research Center, San Jose, California, Jun. 2000. |
Richard Conway et al., "Selective Partial Access to a Database", Cornell University, Ithaca, New York, Oct. 20, 1976. |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10999256B2 (en) * | 2018-01-29 | 2021-05-04 | Sap Se | Method and system for automated text anonymization |
WO2020117074A1 (en) * | 2018-12-06 | 2020-06-11 | Motorola Solutions, Inc | Method and system to ensure a submitter of an anonymous tip remains anonymous |
US12039271B2 (en) | 2018-12-06 | 2024-07-16 | Motorola Solutions, Inc. | Method and system to ensure a submitter of an anonymous tip remains anonymous |
US11550937B2 (en) | 2019-06-13 | 2023-01-10 | Fujitsu Limited | Privacy trustworthiness based API access |
US20230134796A1 (en) * | 2021-10-29 | 2023-05-04 | Glipped, Inc. | Named entity recognition system for sentiment labeling |
US12039082B2 (en) | 2022-08-09 | 2024-07-16 | Motorola Solutions, Inc. | System and method for anonymizing a person captured in an image |
Also Published As
Publication number | Publication date |
---|---|
US8473451B1 (en) | 2013-06-25 |
US20140278409A1 (en) | 2014-09-18 |
US20130289984A1 (en) | 2013-10-31 |
US8751439B2 (en) | 2014-06-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10140321B2 (en) | Preserving privacy in natural langauge databases | |
US20220222437A1 (en) | Systems and methods for structured phrase embedding and use thereof | |
US11250876B1 (en) | Method and system for confidential sentiment analysis | |
US9111540B2 (en) | Local and remote aggregation of feedback data for speech recognition | |
Tur et al. | Spoken language understanding: Systems for extracting semantic information from speech | |
US7742911B2 (en) | Apparatus and method for spoken language understanding by using semantic role labeling | |
US9218810B2 (en) | System and method for using semantic and syntactic graphs for utterance classification | |
EP1016074B1 (en) | Text normalization using a context-free grammar | |
EP1290676B1 (en) | Creating a unified task dependent language models with information retrieval techniques | |
US20080235004A1 (en) | Disambiguating text that is to be converted to speech using configurable lexeme based rules | |
US20090112600A1 (en) | System and method for increasing accuracy of searches based on communities of interest | |
US20140343942A1 (en) | Multitask Learning for Spoken Language Understanding | |
US11386269B2 (en) | Fault-tolerant information extraction | |
CA3177453A1 (en) | System and method for query authorization and response generation using machine learning | |
Dahl | Natural language processing: past, present and future | |
KR100684160B1 (en) | Apparatus and method for dialogue analysis using entity name recognition | |
Chowdhury et al. | Bangla grapheme to phoneme conversion using conditional random fields | |
Tang et al. | Preserving privacy in spoken language databases | |
Oudah et al. | Studying the impact of language-independent and language-specific features on hybrid Arabic Person name recognition | |
KR102162850B1 (en) | System for identifying human name in unstructured documents | |
CN114417387A (en) | Message encryption method based on semantic connotation | |
Yeh et al. | Ontology‐based speech act identification in a bilingual dialog system using partial pattern trees | |
Celikkaya et al. | A mobile assistant for Turkish | |
Jansche | Inference of string mappings for language technology | |
Arnab et al. | Shohojogi: an automated Bengali voice chat system for the banking customer services | |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: AT&T CORP., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: HAKKANI-TUR, DILEK Z.; SAYGIN, YUCEL; TANG, MING; AND OTHERS; SIGNING DATES FROM 20050127 TO 20050410; REEL/FRAME: 038128/0199 |
AS | Assignment | Owner name: AT&T INTELLECTUAL PROPERTY II, L.P., GEORGIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: AT&T PROPERTIES, LLC; REEL/FRAME: 038529/0240. Effective date: 20160204. Owner name: AT&T PROPERTIES, LLC, NEVADA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: AT&T CORP.; REEL/FRAME: 038529/0164. Effective date: 20160204 |
AS | Assignment | Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: AT&T INTELLECTUAL PROPERTY II, L.P.; REEL/FRAME: 041512/0608. Effective date: 20161214 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 4 |
AS | Assignment | Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: NUANCE COMMUNICATIONS, INC.; REEL/FRAME: 065533/0389. Effective date: 20230920 |