CN111428052A - Method for constructing educational concept graph with multiple relations from multi-source data - Google Patents


Info

Publication number
CN111428052A
Authority
CN
China
Prior art keywords
concept
concepts
educational
key
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010235272.5A
Other languages
Chinese (zh)
Other versions
CN111428052B (en)
Inventor
刘淇
陈恩红
黄小青
王超
马建辉
苏喻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN202010235272.5A
Publication of CN111428052A
Application granted
Publication of CN111428052B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Educational Administration (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for constructing an educational concept map with multiple relationships from multi-source data. The method comprises: crawling multi-source data and extracting concept texts with data mining methods to form a training data set; obtaining experts' annotations of the training data set and, according to each concept's source and label, extracting features of concepts and of the relationships between concepts; using the annotated training data set with traditional machine learning methods to train a support vector machine that predicts key educational concepts and, based on the key educational concepts annotated in the training data set and the prerequisite and co-learning relationships between pairs of key educational concepts, training a hybrid model that predicts the prerequisite and co-learning relationships of key educational concept pairs; and using the trained support vector machine and hybrid model to construct an educational concept map from a new data set. The method can accurately construct educational concept maps with multiple relationships.

Figure 202010235272

Description

A method for constructing an educational concept map with multiple relationships from multi-source data

Technical field

The invention relates to the technical field of educational data mining, and in particular to a method for constructing an educational concept map with multiple relationships from multi-source data.

Background

A concept map consists of concepts and the relationships among them, and is a widely used graphical tool for organizing and representing knowledge. Among the various kinds of concept maps, educational concept maps focus on the pedagogical relationships between concepts, so they help students organize and acquire the knowledge of a subject. Building educational concept maps not only helps students strengthen their self-directed learning strategies but also, to a large extent, helps teachers with tasks such as science education, teaching evaluation, and curriculum planning. An educational concept map can also be used to recommend test questions or learning resources to students (collectively referred to as follow-up tasks).

Educational concept maps help students learn efficiently and in a personalized way, and are an important cornerstone of intelligent personalized teaching. Constructing concept maps automatically and accurately helps students understand their own learning paths clearly, and helps parents and teachers formulate personalized learning strategies for students. How to construct concept maps automatically and accurately has therefore long been an important problem in educational data mining.

In current research and patents, methods for constructing educational concept maps mainly fall into the following two kinds:

1) Manual construction of educational concept maps.

At present, manually constructed educational concept maps are mainly subject-specific and are provided by teachers or teaching assistants.

2) Machine-learning-based construction of educational concept maps.

Machine-learning-based construction methods rely on classification algorithms commonly used in traditional machine learning (such as support vector machines); some researchers have used this approach to extract concept maps from Wikipedia.

Both kinds of method have shortcomings. The first is time-consuming, and teachers and teaching assistants can only develop personalized concept maps for students from their own experience, so manually built concept maps inevitably contain errors and omissions. The second does not consider how multi-source information can help in constructing educational concept maps, and existing methods of this kind each focus on only one pedagogical relationship, so the resulting maps are incomplete. Because educational concept maps serve as reference data for follow-up tasks, an insufficiently accurate map also degrades the results of those tasks.

Summary of the invention

The purpose of the present invention is to provide a method for constructing an educational concept map with multiple relationships from multi-source data. By accurately modeling, analyzing, and processing the different data sources, the method improves the accuracy of the prediction results and can therefore accurately construct an educational concept map with multiple relationships.

The purpose of the present invention is achieved through the following technical solution:

A method for constructing an educational concept map with multiple relationships from multi-source data, comprising:

Step 11. Crawl multi-source data and extract concept texts with data mining methods to form a training data set.

Step 12. Obtain experts' annotations of the training data set. The annotations include a label for each concept, assigned according to its importance, marking it as a key educational concept or a non-key concept, together with the prerequisite and co-learning relationships between pairs of key educational concepts. Then, according to each concept's source and label, extract features of concepts and of the relationships between concepts.

Step 13. Using the annotated training data set with traditional machine learning methods, train a support vector machine that predicts key educational concepts; and, based on the key educational concepts annotated in the training data set and the prerequisite and co-learning relationships between pairs of key educational concepts, train with traditional machine learning methods a hybrid model that predicts the prerequisite and co-learning relationships of key educational concept pairs.

As can be seen from the technical solution provided above, the method handles several different data sources and extracts different features according to the characteristics of each data set. On this basis it addresses three tasks: it first extracts key concepts from the relevant features, and then extracts two different relationships, the prerequisite relationship and the co-learning relationship. By using multiple data sources and extracting multiple relationships, the method overcomes the single-relationship limitation and the unsatisfactory classification performance of existing methods, constructs the educational concept map more accurately, and thereby supports more accurate personalized recommendation of test questions or learning resources to students.

Description of drawings

To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a method for constructing an educational concept map with multiple relationships from multi-source data according to an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

An embodiment of the present invention provides a method for constructing an educational concept map with multiple relationships from multi-source data. As shown in FIG. 1, it mainly comprises the following steps:

Step 11. Crawl multi-source data and extract concept texts with data mining methods to form a training data set.

In the embodiment of the present invention, the crawled multi-source data include at least: textbook data and historical answer records for the relevant subject, and the corresponding data from Wikipedia.

1) The textbook data for the subject comprise n electronic textbooks of the same subject, denoted S = {B_1, …, B_x, …, B_n}, where B_x is the x-th electronic textbook. Each electronic textbook B contains H subsections, denoted B = {C_1, …, C_h, …, C_H}, where C_h is the h-th subsection. Each subsection contains a title CT and Y sentences, denoted C = {ct, s_1, …, s_y, …, s_Y}, where s_y is the y-th sentence of subsection C.

For example, the electronic textbooks (primary, junior high, and senior high school) can be downloaded from the Internet and then converted to txt format with an OCR tool.

2) A test-question answer record includes the student's score, the answer time, and the question information. Each record is a five-tuple (u, q, s_uq, t_uq, con_q), where u ∈ U denotes a student and U is the set of students; q ∈ Q denotes a test question and Q is the set of test questions; s_uq is the answer score; t_uq is the answer time; and con_q is the question text, which comprises the question content (symbol image: Figure BDA0002430745790000032) and the question analysis (symbol image: Figure BDA0002430745790000031).

For example, each student's answer records can be obtained from the online learning platform Zhixue.com (智学网).

3) The relevant data in Wikipedia correspond to M pages, denoted P = {p_1, …, p_m, …, p_M}, where p_m is the m-th page. Each page p contains a title p_t, an abstract p_abs, and page content p_con, denoted p = (p_t, p_abs, p_con).
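
The three data sources described above can be held in simple record types. The following is only a minimal sketch: the class and field names are hypothetical, and splitting con_q into separate `content` and `analysis` fields is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subsection:
    """One subsection C of a textbook: a title and its sentences."""
    title: str
    sentences: List[str]


@dataclass
class AnswerRecord:
    """One five-tuple (u, q, s_uq, t_uq, con_q); con_q is split in two here."""
    student: str      # u
    question: str     # q
    score: float      # s_uq
    time: float       # t_uq
    content: str      # question content part of con_q (assumed field)
    analysis: str     # question analysis part of con_q (assumed field)


@dataclass
class WikiPage:
    """One Wikipedia page p = (p_t, p_abs, p_con)."""
    title: str
    abstract: str
    content: str


# A textbook B is then a list of subsections, and S a list of textbooks.
Textbook = List[Subsection]
```

With these types, a textbook set S is a `List[Textbook]` and the page set P is a `List[WikiPage]`.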

The text in the data set is segmented with a word segmentation tool, and the resulting tokens are matched against Wikipedia page titles to extract the distinct concept texts, which form a concept set. A specified number of concepts (the number can be set as needed) is then selected at random from the concept set to form the training data set.
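
The matching and sampling step can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the text is taken to be already segmented into tokens (the source does not name the segmentation tool), matching is exact string equality, and the function names `extract_concepts` and `sample_training_concepts` are hypothetical.

```python
import random


def extract_concepts(tokens, wiki_titles):
    """Return the tokens that equal a Wikipedia page title, in first-seen order."""
    titles = set(wiki_titles)
    seen = set()
    concepts = []
    for tok in tokens:
        if tok in titles and tok not in seen:
            seen.add(tok)
            concepts.append(tok)
    return concepts


def sample_training_concepts(concepts, k, seed=0):
    """Randomly pick up to k concepts to form the training set (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(concepts, min(k, len(concepts)))
```

Deduplicating while preserving order keeps the concept set stable across runs, which makes the subsequent expert annotation easier to track.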

Those skilled in the art will understand that a concept here mainly refers to a concept in the form commonly used in mathematics, such as "quadratic equation in one variable", "function", or "decimal".

Step 12. Obtain experts' annotations of the training data set. The annotations include a label for each concept, assigned according to its importance, marking it as a key educational concept or a non-key concept, together with the prerequisite and co-learning relationships between pairs of key educational concepts. Then, according to each concept's source and label, extract features of concepts and of the relationships between concepts.

In the embodiment of the present invention, the importance of a concept is the criterion for deciding whether it is a key educational concept. Importance can be determined in a variety of conventional ways. For example, it can be judged by the number of times the concept appears in the titles of mathematics textbooks: if the number of occurrences exceeds a specified value, the concept is considered highly important and is a key educational concept, as with "decimal" mentioned above. Importance can also be determined by experts based on experience.

In the embodiment of the present invention, the following features are extracted according to the characteristics of the multi-source data set and the data source of each concept:

(1) Concept semantic similarity features for each data source, including: the title matching feature, which indicates whether a concept appears in a title; the concept matching feature, which represents the relationship between the concepts of a pair; and word embedding similarity, which represents the similarity and distance of a concept pair in vector space.

(2) Wikipedia link features, including: the in-degree and out-degree of a concept pair in Wikipedia pages, the degree of common neighbors of a concept pair, the Wikipedia abstract definition, the normalized Google page distance, and the reference distance.

(3) Textbook structural features and the degree of concept co-occurrence. The textbook structural features include the table-of-contents structural feature and the inter-textbook structural feature; the degree of concept co-occurrence is the number of times the two concepts of a pair appear together in one sentence.

(4) Answer-record features, including: the concept frequency feature, the concept difficulty distance, the question content-analysis distance, and the student answer-record feature.

Of the above features, the title matching feature, the concept frequency feature, and the in-degree and out-degree in Wikipedia pages apply to a single concept, so there is no need to distinguish whether the concept is a key educational concept. The remaining features apply to concept pairs and are therefore extracted only for pairs of key educational concepts (again taking the data source into account). For ease of description, w_i and w_j are used below to denote concepts in the training data set, without distinguishing data sources or labels.

Each type of feature is described in detail below.

1. Concept semantic similarity features.

1) Title matching feature.

A title summarizes the content of a subsection and points out its main points. If a concept appears in a title, it is likely to be a key concept. The title matching feature is expressed as:

TM(w_i, ct) ∈ {0, 1}

where ct ∈ {CT, p_t, q′}, q′ denotes the title of test question q, and w_i denotes a concept. TM(w_i, ct) = 1 when concept w_i appears in the corresponding title; otherwise TM(w_i, ct) = 0.
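
The title matching feature TM is a plain indicator, which can be sketched as below. Treating "appears in the title" as a substring test is an assumption; the function name `title_match` is hypothetical.

```python
def title_match(concept: str, title: str) -> int:
    """TM(w_i, ct): 1 if the concept string occurs in the title, else 0."""
    # Assumption: occurrence is tested as a substring match on raw text.
    return 1 if concept in title else 0
```

The same function applies to a subsection title CT, a Wikipedia page title p_t, or a question title q′, since all three are plain strings.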

2) Concept matching feature.

Given a concept pair <w_i, w_j>, if concept w_i appears within concept w_j, then w_i is more likely to have a prerequisite relationship with w_j. The concept matching feature is expressed as:

[Formula image: Figure BDA0002430745790000051]

where ||·|| denotes a counting operator.

3) Degree of concept co-occurrence.

4) Word embedding similarity.

Word embedding similarity comprises the cosine similarity WEcs(w_i, w_j) and the Euclidean distance WEed(w_i, w_j).

The cosine similarity WEcs(w_i, w_j) reflects the semantic association between the concepts of the pair (w_i, w_j) and is expressed as:

[Formula image: Figure BDA0002430745790000052]

The Euclidean distance WEed(w_i, w_j) is the Euclidean distance between the concepts of the pair (w_i, w_j) in vector space and is expressed as:

[Formula image: Figure BDA0002430745790000053]

where the symbols in Figure BDA0002430745790000056 denote the word vectors of concepts w_i and w_j respectively, k is the index of an element in the vector, and P is the vector length.
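
The two word embedding measures can be sketched with their standard textbook definitions, summing over the P vector elements indexed by k. The formula images are not available in this text, so this sketch assumes the usual forms of cosine similarity and Euclidean distance; the function names and the zero-vector fallback are hypothetical choices.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Standard cosine similarity between two equal-length word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed fallback for a zero vector
    return dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Standard Euclidean distance between two equal-length word vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

How the word vectors themselves are trained is not stated in this section, so any embedding lookup that returns equal-length vectors would fit here.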

2. Wikipedia link features.

1) In-degree and out-degree of a concept pair in Wikipedia pages.

The in-degree and out-degree of each concept are computed from the Wikipedia pages; for a concept pair (w_i, w_j) they are defined as IN(w_i), OUT(w_i), IN(w_j), and OUT(w_j).
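
In-degree and out-degree can be computed from a page link map, as in the sketch below. The input format (a dictionary from a concept's page to the concepts it links to) and the function name `link_degrees` are assumptions for illustration; duplicate links on one page are counted once.

```python
from typing import Dict, Iterable, Tuple


def link_degrees(links: Dict[str, Iterable[str]]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (in_degree, out_degree) for every concept in a page link map."""
    out_deg = {page: len(set(targets)) for page, targets in links.items()}
    in_deg = {page: 0 for page in links}
    for targets in links.values():
        for target in set(targets):
            in_deg[target] = in_deg.get(target, 0) + 1
    # Concepts that are only ever linked to have out-degree 0.
    for page in in_deg:
        out_deg.setdefault(page, 0)
    return in_deg, out_deg
```

For a pair (w_i, w_j), the four values IN(w_i), OUT(w_i), IN(w_j), and OUT(w_j) are then simple dictionary lookups.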

2) Degree of common neighbors of a concept pair.

For a concept pair (w_i, w_j), the more common neighbors the two concepts have, the higher the semantic similarity of the pair. This is expressed as:

[Formula image: Figure BDA0002430745790000054]

3) Wikipedia abstract definition.

If concept w_i appears in the abstract definition of concept w_j, then w_i is a prerequisite concept of w_j. This is expressed as:

[Formula image: Figure BDA0002430745790000055]

4) Normalized Google page distance.

The degree of association between concepts is obtained from the hyperlinks between concepts in Google web pages and is expressed as:

[Formula image: Figure BDA0002430745790000061]
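
Because the formula image is not available here, the following sketch uses the standard normalized Google distance applied to sets of linking pages, which may differ from the patent's exact formula. The function name, the set-based inputs, and the conventions for empty or disjoint sets are all assumptions.

```python
import math
from typing import Iterable


def normalized_link_distance(links_a: Iterable[str], links_b: Iterable[str],
                             total_pages: int) -> float:
    """Standard normalized Google distance over two sets of linking pages.

    links_a / links_b: pages that link to concept a / concept b (assumed input).
    total_pages: total number of pages in the collection.
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not a or not b or not common:
        return float("inf")  # assumed convention: no shared evidence
    numerator = math.log(max(len(a), len(b))) - math.log(len(common))
    denominator = math.log(total_pages) - math.log(min(len(a), len(b)))
    return numerator / denominator if denominator > 0 else 0.0
```

Smaller values indicate more closely associated concepts; two concepts linked from exactly the same pages get a distance of 0.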

5) Reference distance.

If the concepts most closely associated with w_i all point to w_j, then w_i is more likely to be a prerequisite concept of w_j. This is expressed as:

[Formula image: Figure BDA0002430745790000062]

where O_1 is the number of other concepts on the Wikipedia page of concept w_i; O_2 is the number of other concepts on the Wikipedia page of w_i that are linked by other concepts on the Wikipedia page of w_j; O_3 is the number of other concepts on the Wikipedia page of concept w_j; and O_4 is the number of other concepts on the Wikipedia page of w_j that are linked by other concepts on the Wikipedia page of w_i. The symbols in Figure BDA0002430745790000068 and Figure BDA0002430745790000069 each denote a concept on the corresponding Wikipedia page. The symbol in Figure BDA0002430745790000067 indicates whether the concept in Figure BDA00024307457900000615 points to the Wikipedia page of concept w_i (1 if it does, 0 if it does not). The symbol in Figure BDA0002430745790000066 is the importance of the concept in Figure BDA00024307457900000614 on the Wikipedia page of concept w_j, and the symbol in Figure BDA00024307457900000610 indicates whether the concept in Figure BDA00024307457900000613 points to the Wikipedia page of concept w_i. The symbol in Figure BDA00024307457900000612 is the importance of the concept in Figure BDA00024307457900000611 on the Wikipedia page of concept w_i, and the symbol in Figure BDA0002430745790000065 indicates whether the concept in Figure BDA00024307457900000616 points to the Wikipedia page of concept w_j.

3. Textbook structural features and the degree of concept co-occurrence.

The degree of concept co-occurrence is the number of times the two concepts of a pair appear together in one sentence. It is computed as:

[Formula image: Figure BDA0002430745790000063]

where r(s, w_i) ∈ {0, 1} indicates whether concept w_i appears in sentence s: it is 1 if w_i appears in s and 0 otherwise. r(s, w_j) is defined in the same way.
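
Using the indicator r(s, w) defined above, the co-occurrence count can be sketched as the number of sentences in which both indicators equal 1. The formula image is not available, so summing r(s, w_i)·r(s, w_j) over sentences is an assumption drawn from the prose description; substring matching and the function names are also hypothetical.

```python
from typing import Iterable


def appears(sentence: str, concept: str) -> int:
    """r(s, w): 1 if the concept occurs in the sentence, else 0."""
    return 1 if concept in sentence else 0


def cooccurrence(sentences: Iterable[str], w_i: str, w_j: str) -> int:
    """Number of sentences in which both concepts appear."""
    return sum(appears(s, w_i) * appears(s, w_j) for s in sentences)
```

For example, over the sentences "a function maps a variable", "a variable is a symbol", and "function and variable", the pair ("function", "variable") co-occurs twice.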

The textbook table of contents (TOC) and the textbook structure reveal the intrinsic connections between concepts, because teachers plan their courses on the basis of this information. Two kinds of hierarchical textbook features are defined, the table-of-contents structural feature and the inter-textbook structural feature, to help infer the relationships between concepts.

1) Table-of-contents structural feature. The relationship of a concept pair (w_i, w_j) within subsections C is expressed as:

[Formula image: Figure BDA0002430745790000064]

where |B| denotes the number of textbooks, |S| denotes the number of books, and f(w_i, C) is the number of subsections C that contain concept w_i, so the result is a count; likewise, f(w_j, C) is the number of subsections C that contain concept w_j.

2) Inter-textbook structural feature.

Similarly to the table-of-contents structural feature, the inter-textbook structural feature reflects the relationship of a concept pair (w_i, w_j) across textbooks and is expressed as:

[Formula image: Figure BDA0002430745790000071]

where f(w_i, B) is the number of textbooks B that contain concept w_i.

4. Answer-record features.

1) Concept frequency feature.

If concept w_i is frequently mentioned in question content, then w_i is more likely to be a key concept. On the basis of this assumption, the feature can be used to extract key concepts.

The concept frequency feature is defined as the occurrence frequency of concept w_i and is expressed as:

[Formula image: Figure BDA0002430745790000072]

where the symbol in Figure BDA0002430745790000076 is the number of times concept w_i appears in the question content.

2) Concept difficulty distance.

The concept difficulty distance is the difference between the average difficulty of the questions containing the concept w_i and the average difficulty of the questions containing the concept w_j, expressed as:

CDD(w_i, w_j) = CD(w_i) - CD(w_j)

where CD(w_i) and CD(w_j) are the average difficulties of the concepts w_i and w_j. In general, the difficulty of a test question is the proportion of students who answered it correctly, and the average concept difficulty CD(w_i) is the average difficulty of the test questions that contain w_i. CD(w_i) is computed as:

Figure BDA0002430745790000073

where
Figure BDA0002430745790000074
is the number of times the concept w_i appears in the test question content
Figure BDA0002430745790000075
which reflects the importance of w_i in question q; dif_q is the difficulty of question q; L is the subset of the test question set Q whose questions contain w_i, and |L| is the number of questions in L.

CD(w_j) is computed in the same way, with the subscript i replaced by j.
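A minimal sketch of CD and CDD follows. The CD formula is available only as a figure, so the form used here, the sum of (occurrence count × difficulty) over the questions in L divided by |L|, is an assumption inferred from the symbols listed above. The function names and sample data are hypothetical.

```python
def concept_difficulty(questions, concept):
    """CD(w): assumed to be sum(count * difficulty) over L, divided by |L|.

    `questions` is a list of (content, difficulty) pairs, where difficulty is
    the proportion of students who answered the question correctly.
    """
    in_l = [(content.count(concept), dif)
            for content, dif in questions if concept in content]
    if not in_l:
        raise ValueError(f"no question contains {concept!r}")
    return sum(n * dif for n, dif in in_l) / len(in_l)


def concept_difficulty_distance(questions, w_i, w_j):
    """CDD(w_i, w_j) = CD(w_i) - CD(w_j)."""
    return concept_difficulty(questions, w_i) - concept_difficulty(questions, w_j)


# Hypothetical questions with their difficulty values.
questions = [
    ("Evaluate the linear function y = 2x + 1 at x = 3.", 0.9),
    ("Find the roots of the quadratic function y = x^2 - 1.", 0.6),
    ("Where does the quadratic function y = x^2 meet the linear function y = x?", 0.5),
]
cdd = concept_difficulty_distance(questions, "linear function", "quadratic function")
```

Because difficulty is defined as the proportion of correct answers, a positive CDD here means the questions on the first concept were answered correctly more often.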

3) Test question content-analysis distance. Concepts that appear in the content of a test question are generally learned after the concepts that appear in its analysis. Based on this property, the content-analysis distance is used to measure the order relation between two concepts.

The content-analysis distance is computed as:

Qcad(w_i, w_j) = Qcaw(w_j, w_i) - Qcaw(w_i, w_j)

where:

Figure BDA0002430745790000081

Figure BDA0002430745790000082

where
Figure BDA0002430745790000086
is the number of times the concept w_j appears in the test question content
Figure BDA0002430745790000087
;
Figure BDA0002430745790000085
indicates whether the concept w_j appears in the test question analysis
Figure BDA0002430745790000088
and
Figure BDA0002430745790000089
indicates whether the concept w_i appears in the test question analysis
Figure BDA00024307457900000810
taking the value 1 if it appears and 0 otherwise. If w_i (or w_j) appears in the question content and w_j (or w_i) appears in the question analysis, then Qcaw(w_i, w_j) (respectively Qcaw(w_j, w_i)) becomes larger, which matches the situation in practice.

4) Student answer record feature.

Let Q be the test question set of student u, let I(Q; w_i) be the indices of the questions in Q that contain the concept w_i, and let I(Q; w_j) be the indices of the questions in Q that contain the concept w_j. For example, if w_i appears in the first and third questions of Q, then I(Q; w_i) = {1, 3}. Suppose w_j is a prerequisite concept of w_i. In student u's answer sequence, if the student answers a question containing w_i incorrectly, then student u is more likely to answer a question containing w_j incorrectly. Based on this observation, for a given concept pair <w_i, w_j>, define S(Q) = {(i_1, j_1) | i_1 ∈ I(Q; w_i), j_1 ∈ I(Q; w_j), i_1 < j_1}. The student answer record feature is then:

Figure BDA0002430745790000083

where
Figure BDA0002430745790000084
are the scores of student u on question i_1 and question j_1 respectively, U is the set of students, and |U| is the number of students in U.
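The index sets I(Q; w) and the pair set S(Q) defined above can be sketched directly from their definitions. The feature formula itself is given only as a figure and is not reproduced here. Substring matching, the function names, and the sample questions are assumptions made for illustration.

```python
def concept_indices(questions, concept):
    """I(Q; w): 1-based indices of the questions in Q whose text contains `concept`."""
    return [k for k, text in enumerate(questions, start=1) if concept in text]


def ordered_pairs(questions, w_i, w_j):
    """S(Q): pairs (i1, j1) with i1 in I(Q; w_i), j1 in I(Q; w_j) and i1 < j1."""
    idx_i = concept_indices(questions, w_i)
    idx_j = concept_indices(questions, w_j)
    return [(i1, j1) for i1 in idx_i for j1 in idx_j if i1 < j1]


# Hypothetical answer sequence of one student.
Q = [
    "Solve the quadratic equation x^2 - 4 = 0.",
    "Sketch the quadratic function y = x^2 - 4.",
    "Factor the quadratic equation x^2 - 5x + 6 = 0.",
    "Find the axis of symmetry of the quadratic function y = x^2 - 2x.",
]
I_i = concept_indices(Q, "quadratic equation")
S = ordered_pairs(Q, "quadratic equation", "quadratic function")
```

Each pair in S identifies a question on w_i that the student answered before a question on w_j, which is the ordering the feature relies on.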

Step 13: Using the labeled training data set together with traditional machine learning methods, train a support vector machine for predicting educational key concepts; and, based on the educational key concepts labeled in the training data set and the prerequisite relationships and common learning relationships between educational key concept pairs, train a hybrid model for predicting the prerequisite relationships and common learning relationships of educational key concept pairs.

Because concept graph construction lacks large-scale labeled data sets, in this embodiment of the present invention three binary classifiers are trained with traditional machine learning methods. The first classifier (the support vector machine) combines the title matching feature, the concept frequency feature, and the in-degree and out-degree of the concept pair in Wikipedia pages to extract the educational key concept set C'. The other two binary classifiers form the hybrid model: given the educational key concept set C', they predict the prerequisite relationship and the common learning relationship between the key concept pairs (w_i', w_j') in C'. A preferred implementation of the training phase is as follows:

1) Train the support vector machine.

Using the labeled training data set, the label of each concept, and the previously extracted concept features (the title matching feature and, depending on the source of the concept, the concept frequency feature and/or the in-degree and out-degree of the concept pair in Wikipedia pages), train the support vector machine to obtain its complete parameters W_1 and the first threshold K*. The training objective is to minimize the error between the predicted label
Figure BDA0002430745790000091
and the actual label X_i:

Figure BDA0002430745790000092

where M_1 is the number of concepts in the training data set;
Figure BDA0002430745790000093
is the label of the i-th concept predicted by the support vector machine (that is, whether the concept is an educational key concept or a non-educational key concept);
Figure BDA0002430745790000099
denotes the relevant features of the i-th concept;
Figure BDA0002430745790000094
is the parameter for the i-th concept, where the superscript T denotes matrix transposition; the M_1 parameters
Figure BDA0002430745790000095
together form the complete parameters W_1 of the support vector machine; X_i is the label annotated by experts for the i-th concept (the actual label); λ_1||W_1||^2 is the regularization term, and λ_1 is a manually tuned parameter.
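The shape of such a regularized objective can be sketched as below. The actual objective is given only as a figure, so the squared-error form is an assumption (a standard support vector machine would normally use a hinge loss), as is the use of a single shared weight vector instead of one parameter per concept. The function name and the numbers are hypothetical.

```python
def regularized_loss(weights, features, labels, lam):
    """Sum of squared errors between w.f and the expert label, plus lam * ||w||^2.

    Assumptions: squared error, a linear score w.f, and one shared weight vector.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    error = sum((dot(weights, f) - x) ** 2 for f, x in zip(features, labels))
    return error + lam * dot(weights, weights)


# Two hypothetical concepts with two features each; labels 1 = key, 0 = not key.
features = [[1.0, 0.5], [0.0, 0.1]]
labels = [1.0, 0.0]
loss_fit = regularized_loss([1.0, 0.0], features, labels, lam=0.1)
loss_zero = regularized_loss([0.0, 0.0], features, labels, lam=0.1)
```

In this toy case the first weight vector reproduces both labels, so its loss consists only of the regularization term, while the all-zero vector pays the full error on the first concept.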

2) Train the binary classifier for predicting prerequisite relationships.

The prerequisite relationship between a key concept pair (w_i', w_j') is predicted from the concept matching feature, the word representation similarity, the concept difficulty distance, the test question content-analysis distance, the student answer record feature, the directory structure feature, the inter-textbook structure feature, the common neighbor degree of the concept pair, the Wikipedia abstract definition, the normalized Google page distance, and the reference distance.

In the training phase, the educational key concepts are selected according to the concept labels in the training data set. Using the expert-labeled prerequisite relationships between educational key concept pairs, together with the concept matching feature and word representation similarity of each pair and, depending on the source of the pair, the concept difficulty distance, content-analysis distance and student answer record feature, the directory structure feature and inter-textbook structure feature, and/or the common neighbor degree of the pair, the Wikipedia abstract definition, the normalized Google page distance and the reference distance, the binary classifier for predicting prerequisite relationships is trained to obtain its complete parameters W_2 and the second threshold P_2. The training objective is to minimize the error between the predicted label
Figure BDA0002430745790000096
and the actual label X'_l:

Figure BDA0002430745790000097

where M_2 is the number of educational key concept pairs;
Figure BDA0002430745790000098
is the label predicted by the binary classifier for the l-th educational key concept pair, that is, whether a prerequisite relationship exists for the pair;
Figure BDA0002430745790000104
denotes the relevant features of the l-th educational key concept pair; W_2^l is the parameter for the l-th educational key concept pair, and the M_2 parameters W_2^l together form the complete parameters W_2 of the binary classifier; X'_l is the prerequisite relationship annotated by experts for the l-th educational key concept pair (the actual label); λ_2||W_2||^2 is the regularization term, and λ_2 is a manually tuned parameter.

3) Train the binary classifier for predicting common learning relationships.

If a concept pair (w_i, w_j) has a common learning relationship, it should have the following properties:

Semantic similarity: the two concepts share the same semantic information;

Co-occurrence: they may appear in the same sentence;

Concept matching: they may share common words;

Similar difficulty: a question A containing w_i and a question B containing w_j may have the same difficulty;

Similar neighbors: they may share the same neighbors in Wikipedia links;

Shared definition: w_i may appear in the definition of w_j, and vice versa.

Based on these assumptions, the common learning relationship between an educational key concept pair (w_i', w_j') is predicted from the concept matching feature, the word representation similarity, the concept co-occurrence degree, the concept difficulty distance, the common neighbor degree of the concept pair, and the Wikipedia abstract definition.

In the training phase, the educational key concepts are selected according to the concept labels in the training data set. Using the expert-labeled common learning relationships between educational key concept pairs, together with the concept matching feature and word representation similarity of each pair and, depending on the source of the pair, the concept co-occurrence degree, the concept difficulty distance, and/or the common neighbor degree of the pair and the Wikipedia abstract definition, the binary classifier for predicting common learning relationships is trained to obtain its complete parameters W_3 and the third threshold P_3. The training objective is to minimize the error between the predicted label
Figure BDA0002430745790000101
and the actual label X''_l:

Figure BDA0002430745790000102

where M_2 is the number of educational key concept pairs;
Figure BDA0002430745790000103
is the label predicted by the binary classifier for the l-th educational key concept pair, that is, whether a common learning relationship exists for the pair;
Figure BDA0002430745790000105
denotes the relevant features of the l-th educational key concept pair; W_3^l is the parameter for the l-th educational key concept pair, and the M_2 parameters W_3^l together form the complete parameters W_3 of the binary classifier; X''_l is the common learning relationship annotated by experts for the l-th educational key concept pair (the actual label); λ_3||W_3||^2 is the regularization term, and λ_3 is a manually tuned parameter.

In this embodiment of the present invention, the value of the first threshold K* can be adjusted as needed. For example, to select more educational key concepts, K* can be lowered; conversely, to select fewer, K* can be raised.
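The effect of moving the first threshold can be illustrated with a small sketch. It assumes that the classifier yields a numeric score per concept and that a concept is kept when its score is at least K*; both the comparison rule and the scores below are hypothetical, since the decision formula is given only as a figure.

```python
def select_key_concepts(scores, threshold):
    """Keep the concepts whose classifier score reaches the threshold K*.

    Assumption: a concept is selected when score >= threshold.
    """
    return {concept for concept, score in scores.items() if score >= threshold}


# Hypothetical classifier scores for three candidate concepts.
scores = {"quadratic function": 0.92, "linear function": 0.71, "example": 0.30}
strict = select_key_concepts(scores, 0.8)
loose = select_key_concepts(scores, 0.5)
```

Lowering the threshold from 0.8 to 0.5 admits one more concept, which is the trade-off described in the paragraph above.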

Those skilled in the art will understand that the features of a concept pair are computed from the information in the data source in which the pair occurs, so a concept pair here mainly refers to two concepts in the same data source. In most cases the same concept pair exists in all three data sources; that is, for a given concept pair, the four classes of features described in step 12 can be computed from the information in the three data sources. The case in which a concept pair appears in only one or two data sources is also considered; then only two or three of the feature classes described in step 12 can be extracted for that pair. This is why the training procedure above uses "and/or" between the features extracted according to the source of the concept pair.

Step 14: Use the trained support vector machine and hybrid model to construct an educational concept graph for a new data set.

For a new, unpublished data set, extract the concept texts as in step 11 and the features of concepts and concept pairs as in step 12; then construct the concept graph G using the parameters and thresholds of the trained support vector machine and hybrid model, as follows:

First, extract the concept texts as in step 11 (that is, by word segmentation) to form the concept candidate set R. Combining the relevant features of each candidate concept
Figure BDA00024307457900001110
with the parameters W_1 of the support vector machine and the first threshold K*, extract the key concept set C', expressed as:

Figure BDA0002430745790000111

Figure BDA0002430745790000112

where the relevant features
Figure BDA0002430745790000113
are the features of the t-th concept (analogous to
Figure BDA0002430745790000114
in step 13), namely the title matching feature and, depending on the source of the concept, the concept frequency feature or the in-degree and out-degree of the concept pair in Wikipedia pages.

Given the key concept set C', use the parameters W_2 and W_3 of the hybrid model and the two thresholds P_2 and P_3 to predict, for the key concept pairs {(w_i', w_j') | w_i', w_j' ∈ C'}, whether a prerequisite relationship and whether a common learning relationship exists:

Figure BDA0002430745790000115

Figure BDA0002430745790000116

Figure BDA0002430745790000117

where <w_i', w_j'> = 0 means that there is neither a prerequisite relationship nor a common learning relationship between the concepts w_i' and w_j', <w_i', w_j'> = 1 means that there is a prerequisite relationship between them, and <w_i', w_j'> = 2 means that there is a common learning relationship between them;
Figure BDA0002430745790000118
denote the relevant features of the l'-th concept pair (w_i', w_j') in the key concept set C' that are used to predict the prerequisite relationship and the common learning relationship respectively, analogous to
Figure BDA0002430745790000119
in step 13. That is,
Figure BDA0002430745790000121
contains the concept matching feature and the word representation similarity and, depending on the source of the concept pair, the concept difficulty distance, content-analysis distance and student answer record feature, or the directory structure feature and inter-textbook structure feature, or the common neighbor degree of the concept pair, the Wikipedia abstract definition, the normalized Google page distance and the reference distance;
Figure BDA0002430745790000122
contains the concept matching feature and the word representation similarity and, depending on the source of the concept pair, the concept co-occurrence degree, or the concept difficulty distance, or the common neighbor degree of the concept pair and the Wikipedia abstract definition. Each educational key concept in the selected key concept set C' becomes a node, and the connections between nodes are built according to whether a prerequisite relationship or a common learning relationship exists between the corresponding educational key concept pairs, thereby constructing the educational concept graph.
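The labeling scheme (0, 1, 2) and the graph construction can be sketched as follows. The decision formulas are given only as figures, so several points are assumptions: each classifier is represented by a hypothetical scoring function, a relationship holds when its score is at least the threshold, the prerequisite test is applied first, and an edge (w_i, w_j) labeled 1 is read as "w_i is a prerequisite of w_j".

```python
from itertools import permutations

NONE, PREREQUISITE, COMMON_LEARNING = 0, 1, 2


def relation_label(prereq_score, common_score, p2, p3):
    """Label of one ordered concept pair; the prerequisite test is assumed to come first."""
    if prereq_score >= p2:
        return PREREQUISITE
    if common_score >= p3:
        return COMMON_LEARNING
    return NONE


def build_concept_graph(key_concepts, prereq_score, common_score, p2, p3):
    """Return {(w_i, w_j): label} for every ordered pair with a non-zero label."""
    edges = {}
    for w_i, w_j in permutations(sorted(key_concepts), 2):
        label = relation_label(prereq_score(w_i, w_j), common_score(w_i, w_j), p2, p3)
        if label != NONE:
            edges[(w_i, w_j)] = label
    return edges


# Hypothetical classifier scores standing in for the trained hybrid model.
prereq = {("linear function", "quadratic function"): 0.9}
common = {
    ("quadratic equation", "quadratic function"): 0.8,
    ("quadratic function", "quadratic equation"): 0.8,
}
graph = build_concept_graph(
    {"linear function", "quadratic function", "quadratic equation"},
    lambda a, b: prereq.get((a, b), 0.0),
    lambda a, b: common.get((a, b), 0.0),
    p2=0.5,
    p3=0.5,
)
```

Storing the graph as a dictionary keyed by ordered pairs keeps the direction of prerequisite edges while letting symmetric common learning edges appear in both directions.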

Because a new, unpublished data set usually corresponds to students, the educational concept graph can reflect how well the students have mastered the material. After the graph is linked to test questions, a recommended test question list can be generated from the information in the graph and recommended to the corresponding student. For example, if the graph shows that a student's understanding of the educational key concept "quadratic function" is insufficient, a recommended question list can be generated to test whether the student understands the prerequisite concept of quadratic function (linear function) and its common learning concept (quadratic equation). In this way the student's abilities can be examined layer by layer until the root of the misunderstanding is found, and that finding can then drive personalized recommendation of test questions or learning resources.
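The quadratic function example can be sketched as a lookup over a small concept graph. The edge encoding (label 1 for "first concept is a prerequisite of the second", label 2 for a common learning relationship), the question identifiers, and the function names are all hypothetical choices made for illustration.

```python
def concepts_to_check(edges, weak_concept):
    """Prerequisite and common learning neighbors of a concept the student struggles with."""
    prerequisites = sorted(a for (a, b), label in edges.items()
                           if b == weak_concept and label == 1)
    common = sorted(a for (a, b), label in edges.items()
                    if b == weak_concept and label == 2)
    return prerequisites, common


def recommend(edges, question_bank, weak_concept):
    """Questions on the prerequisite concepts first, then on the common learning concepts."""
    prerequisites, common = concepts_to_check(edges, weak_concept)
    return [q for concept in prerequisites + common
            for q in question_bank.get(concept, [])]


# Hypothetical graph edges and question bank.
edges = {
    ("linear function", "quadratic function"): 1,
    ("quadratic equation", "quadratic function"): 2,
    ("quadratic function", "quadratic equation"): 2,
}
question_bank = {"linear function": ["Q17", "Q42"], "quadratic equation": ["Q8"]}
recommended = recommend(edges, question_bank, "quadratic function")
```

Applying the same lookup again to any concept the student also fails on gives the layer-by-layer check described above.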

The above scheme of this embodiment of the present invention extracts different features from several different data sources according to the characteristics of each data set. On this basis, three tasks are addressed: key concepts are first extracted from the relevant features, and then two different relationships, the prerequisite relationship and the common learning relationship, are extracted separately. By using multiple data sources and extracting multiple relationships, the scheme addresses the single-relationship limitation and unsatisfactory classification performance of existing methods, and so constructs the educational concept graph more accurately.

From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments can be implemented in software, or in software together with a necessary general-purpose hardware platform. On this understanding, the technical solutions of the embodiments may be embodied as a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) and includes instructions that cause a computer device (such as a personal computer, server, or network device) to perform the methods described in the embodiments of the present invention.

The above is only a preferred specific embodiment of the present invention, but the scope of protection of the present invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (10)

1. A method for constructing an educational concept graph with multiple relationships from multi-source data, comprising:
step 11, crawling multi-source data, extracting concept texts using a data mining method, and forming a training data set;
step 12, obtaining the experts' labeling results for the training data set, the labeling results comprising: a label marking each concept as an educational key concept or a non-educational key concept according to the importance of the concept, and the prerequisite relationships and common learning relationships between educational key concept pairs; and extracting the relevant features of concepts and of concept pairs according to the sources of the concepts and the labels of the concepts;
step 13, using the labeled training data set together with traditional machine learning methods, training a support vector machine for predicting educational key concepts, and, based on the educational key concepts labeled in the training data set and the prerequisite relationships and common learning relationships between educational key concept pairs, training a hybrid model for predicting the prerequisite relationships and common learning relationships of educational key concept pairs;
step 14, constructing an educational concept graph for a new data set using the trained support vector machine and hybrid model.
2. The method of claim 1, wherein the crawled multi-source data comprises at least: textbook data of the related discipline, historical answer information, and related data in the corresponding Wikipedia; wherein:
the textbook data of the related discipline contains n electronic textbooks of the same discipline, denoted S = {B_1, …, B_x, …, B_n}, where B_x denotes the x-th electronic textbook; each electronic textbook B contains H subsections, denoted B = {C_1, …, C_h, …, C_H}, where C_h denotes the h-th subsection; each subsection C contains a title CT and Y sentences, denoted C = {CT, s_1, …, s_y, …, s_Y}, where s_y denotes the y-th sentence of subsection C;
the test question answer records comprise: the students' answer scores, answering times and question information; one test question answer record is a quintuple (u, q, s_uq, t_uq, con_q), where u ∈ U denotes a student, U denotes the set of students, q ∈ Q denotes a test question, Q denotes the set of test questions, s_uq denotes the answer score, t_uq denotes the answering time, and con_q denotes the test question text, including the test question content
Figure FDA0002430745780000011
and the test question analysis
Figure FDA0002430745780000012
the related data in Wikipedia corresponds to M pages, denoted P = {p_1, …, p_m, …, p_M}, where p_m denotes the m-th page, and each page p contains a title p_t, an abstract p_abs and page content p_con, denoted p = (p_t, p_abs, p_con);
the text content of the data sources is segmented with a word segmentation tool, the segments are matched against encyclopedia titles to extract distinct mathematical concepts and form a concept set, and a specified number of concepts are randomly selected from the concept set to form the training data set.
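The three data sources described in claim 2 can be pictured as simple record types. The sketch below is illustrative only: the class and field names are hypothetical, and the test question text con_q is split into separate content and analysis fields for convenience, whereas the claim treats it as one element of the quintuple.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subsection:
    """A subsection C = {CT, s_1, ..., s_Y}: a title and its sentences."""
    title: str
    sentences: List[str] = field(default_factory=list)


@dataclass
class AnswerRecord:
    """One answer record (u, q, s_uq, t_uq, con_q), with con_q split in two."""
    student: str
    question: str
    score: float
    answer_time: float
    content: str
    analysis: str


@dataclass
class WikiPage:
    """A Wikipedia page p = (p_t, p_abs, p_con)."""
    title: str
    abstract: str
    content: str


# A textbook B is a list of subsections; the textbook set S is a list of textbooks.
textbook = [Subsection("Quadratic functions", ["A quadratic function has degree two."])]
record = AnswerRecord("u1", "q7", 1.0, 35.0,
                      "Solve x^2 = 4.", "Take the square root of both sides.")
page = WikiPage("Quadratic function", "A polynomial function of degree two.", "...")
```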
3. The method of claim 2, wherein the features extracted according to the source of the concept comprise:
concept semantic similarity features for every data source, including: a title matching feature indicating whether a concept appears in a title; a concept matching feature for the relationship between a concept pair; and word representation similarity, representing the similarity and the distance of a concept pair in vector space;
Wikipedia link features, including: the in-degree and out-degree of a concept pair in Wikipedia pages, the common neighbor degree of a concept pair, the Wikipedia abstract definition, the normalized Google page distance, and the reference distance;
textbook structure features and the concept co-occurrence degree, wherein the textbook structure features include the directory structure feature and the inter-textbook structure feature, and the concept co-occurrence degree represents the number of times a concept pair appears together in one sentence;
test question answer record features, including: the concept frequency feature, the concept difficulty distance, the test question content-analysis distance, and the student answer record feature;
wherein the title matching feature, the concept frequency feature, and the in-degree and out-degree of concepts in Wikipedia pages apply to a single concept, regardless of whether the concept is an educational key concept, and the remaining features are extracted only for educational key concept pairs.
4. The method for constructing an educational concept graph with multiple relationships from multi-source data according to claim 3, wherein
the title matching feature is expressed as:
TM(w_i, ct) ∈ {0, 1}
where ct ∈ {CT, p_t, q'}, q' denotes the title of test question q, and w_i denotes a concept; when the concept w_i appears in the corresponding title, TM(w_i, ct) = 1; otherwise, TM(w_i, ct) = 0;
the concept matching feature is expressed as:
Figure FDA0002430745780000021
where (w_i, w_j) is a concept pair and ||·|| denotes the counting operator;
the word representation similarity comprises the cosine similarity WEcs(w_i, w_j) and the Euclidean distance WEed(w_i, w_j);
the cosine similarity WEcs(w_i, w_j) reflects the semantic association between the concept pair (w_i, w_j) and is expressed as:
Figure FDA0002430745780000022
the Euclidean distance WEed(w_i, w_j) represents the distance between the concept pair (w_i, w_j) in vector space and is expressed as:
Figure FDA0002430745780000023
where
Figure FDA0002430745780000024
respectively denote the vector representations of the concepts w_i and w_j, k is the index of an element in the vector, and P is the vector length.
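The title matching feature and the two word representation measures of claim 4 can be sketched as below. The claim gives the cosine similarity and Euclidean distance formulas only as figures; the code uses their standard textbook definitions, which is an assumption, as are the substring test for title matching and the toy vectors.

```python
import math


def title_match(concept, title):
    """TM(w_i, ct): 1 if the concept appears in the title, otherwise 0."""
    return 1 if concept in title else 0


def cosine_similarity(v_i, v_j):
    """Standard cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(v_i, v_j))
    norm_i = math.sqrt(sum(a * a for a in v_i))
    norm_j = math.sqrt(sum(b * b for b in v_j))
    return dot / (norm_i * norm_j)


def euclidean_distance(v_i, v_j):
    """Standard Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v_i, v_j)))


# Hypothetical 3-dimensional word vectors for two concepts.
v_i, v_j = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
tm = title_match("quadratic function", "Chapter 3: The quadratic function")
cs = cosine_similarity(v_i, v_j)
ed = euclidean_distance(v_i, v_j)
```

In practice the vectors would come from a trained word embedding model rather than being written by hand.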
5. The method for constructing an educational concept graph having multiple relationships from multiple data according to claim 3,
concept versus degree of entry and exit in wikipedia pages: will concept pair (w)i,wj) Is defined as IN (w)i)、OUT(wi)、IN(wj)、OUT(wj);
Common neighbor degree of concept pair: for concept pair (w)i,wj) Concept pair (w)i,wj) The more common neighbors there are, thenConcept pair (w)i,wj) The higher the semantic similarity of (a) is, it is expressed as:
Figure FDA0002430745780000031
wikipedia abstract definition: if the concept wiAt concept wjIn the abstract definition of (1), then the concept wiIs a concept wjThe precedence concept of (a) is expressed as:
Figure FDA0002430745780000032
normalized google page distance: through the hyperlink between the concepts in the Google webpage, the association degree between the concepts is obtained and is expressed as:
Figure FDA0002430745780000033
quote distance, expressed as:
Figure FDA0002430745780000034
wherein ,O1Representing a concept wiNumber of other concepts in Wikipedia Page, O2Representing a concept wiOther concepts in the Wikipedia page are conceptualized wjThe number of links to other concepts in the Wikipedia page, O3Representing a concept wjNumber of other concepts in Wikipedia Page, O4Representing a concept wjOther concepts in the Wikipedia page are conceptualized wiThe number of links of other concepts in the Wikipedia page;
(Figure FDA0002430745780000035) and (Figure FDA0002430745780000036) both represent concepts that have a corresponding page in Wikipedia;
(Figure FDA0002430745780000037) indicates whether concept (Figure FDA0002430745780000038) points to the Wikipedia page of concept w_i, where 1 indicates pointing and 0 indicates not pointing;
(Figure FDA0002430745780000039) represents the importance of concept (Figure FDA00024307457800000310) in the Wikipedia page of concept w_j;
(Figure FDA00024307457800000311) indicates whether concept (Figure FDA00024307457800000312) points to the Wikipedia page of concept w_i;
(Figure FDA00024307457800000313) represents the importance of concept (Figure FDA00024307457800000314) in the Wikipedia page of concept w_i;
(Figure FDA00024307457800000315) indicates whether concept (Figure FDA00024307457800000316) points to the Wikipedia page of concept w_j.
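The link-based features of this claim can be illustrated with a small sketch. The claim's exact formulas are given in figures that are not reproduced in this text, so the code below only assumes a plain count of shared neighbors for the common neighbor degree and a substring test for the abstract-definition feature. The `links` and `abstracts` dictionaries and all function names are hypothetical and stand in for data that would be collected from Wikipedia.

```python
from typing import Dict, Set

# Hypothetical toy data: outgoing links of each concept's Wikipedia page.
links: Dict[str, Set[str]] = {
    "function": {"set", "variable", "mapping"},
    "derivative": {"function", "limit", "variable"},
    "limit": {"function", "sequence"},
}

# Hypothetical abstract text of each concept's Wikipedia page.
abstracts: Dict[str, str] = {
    "function": "a relation between sets that assigns each input one output",
    "derivative": "the derivative of a function measures its rate of change",
    "limit": "the value that a function or sequence approaches",
}


def out_degree(w: str) -> int:
    """OUT(w): number of concepts that the page of w links to."""
    return len(links.get(w, set()))


def in_degree(w: str) -> int:
    """IN(w): number of other pages that link to w."""
    return sum(1 for src, targets in links.items() if src != w and w in targets)


def neighbors(w: str) -> Set[str]:
    """Concepts connected to w by a link in either direction."""
    outgoing = links.get(w, set())
    incoming = {src for src, targets in links.items() if w in targets}
    return (outgoing | incoming) - {w}


def common_neighbor_count(wi: str, wj: str) -> int:
    """Assumed form of the common neighbor degree: size of the shared neighborhood."""
    return len((neighbors(wi) & neighbors(wj)) - {wi, wj})


def abstract_definition(wi: str, wj: str) -> int:
    """1 if wi appears in the abstract of wj (wi is then a candidate prerequisite of wj)."""
    return 1 if wi in abstracts.get(wj, "") else 0
```

With this toy data, `abstract_definition("function", "derivative")` is 1 while the reverse direction is 0, which matches the asymmetry that the abstract-definition feature is meant to capture.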
6. The method for constructing an educational concept graph having multiple relationships from multi-source data according to claim 3, wherein:
Directory structure feature, which reflects the concept pair (w_i, w_j) within subsection C, expressed as:
Figure FDA00024307457800000317
where |B| represents the number of textbooks, |S| represents the number of books, f(w_i, C) represents the number of subsections C that contain concept w_i, and f(w_j, C) represents the number of subsections C that contain concept w_j;
Structural feature between textbooks, which reflects the concept pair (w_i, w_j) across the textbooks, expressed as:
Figure FDA0002430745780000041
wherein f(w_i, B) represents the number of textbooks B that contain concept w_i;
Concept co-occurrence degree, calculated by the following formula:
Figure FDA0002430745780000042
wherein r(s, w_i) ∈ {0,1} indicates whether concept w_i appears in sentence s: the value is 1 if it appears and 0 otherwise; r(s, w_j) ∈ {0,1} indicates in the same way whether concept w_j appears in sentence s.
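As an illustration of the indicator r(s, w) defined above, the following sketch counts the sentences in which both concepts appear. The exact co-occurrence formula is in a figure that is not reproduced here, so summing r(s, w_i)·r(s, w_j) over sentences is an assumption, and the function names and sample sentences are hypothetical.

```python
from typing import List


def r(sentence: str, concept: str) -> int:
    """Indicator r(s, w): 1 if the concept appears in the sentence, else 0."""
    return 1 if concept in sentence else 0


def co_occurrence(sentences: List[str], wi: str, wj: str) -> int:
    """Assumed co-occurrence degree: number of sentences containing both concepts."""
    return sum(r(s, wi) * r(s, wj) for s in sentences)


# Hypothetical textbook sentences.
sentences = [
    "the derivative of a function is defined by a limit",
    "a function maps inputs to outputs",
    "the limit of a sequence may not exist",
]
```

A real pipeline would likely match concepts on tokenized text rather than raw substrings; the substring test is used here only to keep the sketch short.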
7. The method for constructing an educational concept graph having multiple relationships from multi-source data according to claim 3, wherein:
Concept frequency feature, representing the frequency of concept w_i, expressed as:
Figure FDA0002430745780000043
wherein (Figure FDA0002430745780000044) is the number of times concept w_i appears in the test question content;
Concept difficulty distance, representing the difference between the average difficulty of the test questions containing concept w_i and the average difficulty of the test questions containing concept w_j, expressed as:
CDD(w_i, w_j) = CD(w_i) - CD(w_j)
wherein CD(w_i) and CD(w_j) represent the average difficulty of concepts w_i and w_j, respectively; CD(w_i) is calculated as follows:
Figure FDA0002430745780000045
wherein (Figure FDA0002430745780000046) represents the number of times concept w_i appears in the content of test question (Figure FDA0002430745780000047), which reflects the importance of concept w_i in test question q; dif_q is the difficulty of test question q; L represents the set of test questions in the test question set Q that contain concept w_i, and |L| represents the number of test questions in L;
Test question content-analysis distance, calculated as follows:
Qcad(w_i, w_j) = Qcaw(w_j, w_i) - Qcaw(w_i, w_j)
wherein :
Figure FDA0002430745780000048
Figure FDA0002430745780000051
wherein (Figure FDA0002430745780000052) represents the number of times concept w_j appears in the content of test question (Figure FDA0002430745780000053); (Figure FDA0002430745780000054) indicates whether concept w_j appears in the analysis of test question (Figure FDA0002430745780000055); and (Figure FDA0002430745780000056) indicates whether concept w_i appears in the analysis of test question (Figure FDA0002430745780000057), taking the value 1 if it appears and 0 otherwise;
Student answer record feature, expressed as:
Figure FDA0002430745780000058
wherein (Figure FDA0002430745780000059) is the score of student u on test question i_1 and test question j_1; S(Q) = {(i_1, j_1) | i_1 ∈ I(Q; w_i), j_1 ∈ I(Q; w_j), i_1 < j_1}; I(Q; w_i) and I(Q; w_j) are the sets of test questions in the test question set Q that contain concepts w_i and w_j, respectively; and U is the set of students.
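The concept difficulty distance CDD(w_i, w_j) = CD(w_i) - CD(w_j) can be sketched as follows. The formula for CD is in a figure that is not reproduced here; the code assumes one plausible reading of the surrounding definitions, namely that each question's difficulty is weighted by the number of occurrences of the concept and the sum is divided by |L|. The question bank, field names and function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical question bank: text content plus a difficulty value dif_q.
questions: List[Dict] = [
    {"content": "find the derivative of the function f", "difficulty": 0.75},
    {"content": "evaluate the function at x = 2", "difficulty": 0.25},
    {"content": "compare the derivative with the second derivative", "difficulty": 0.5},
]


def concept_difficulty(concept: str, bank: List[Dict]) -> float:
    """CD(w) under the assumed form: (1/|L|) * sum over q in L of count_q(w) * dif_q."""
    containing = [q for q in bank if concept in q["content"]]  # the set L
    if not containing:
        return 0.0  # assumption: a concept with no questions gets difficulty 0
    weighted = sum(q["content"].count(concept) * q["difficulty"] for q in containing)
    return weighted / len(containing)


def concept_difficulty_distance(wi: str, wj: str, bank: List[Dict]) -> float:
    """CDD(w_i, w_j) = CD(w_i) - CD(w_j), as stated in the claim."""
    return concept_difficulty(wi, bank) - concept_difficulty(wj, bank)
```

A negative value, as for `concept_difficulty_distance("function", "derivative", questions)`, indicates that the first concept tends to appear in easier questions than the second.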
8. The method of claim 3, wherein training the support vector machine for predicting educational key concepts comprises:
using the labeled training data set, training the support vector machine according to the label of each concept and the extracted concept features, namely the title matching feature together with the concept frequency feature extracted from the concept-pair sources and/or the in-degree and out-degree of the concept pair in Wikipedia pages, to obtain the complete parameters W_1 of the support vector machine and a first threshold K*; the goal of the training is to minimize the error between the predicted label (Figure FDA00024307457800000510) and the actual label X_i:
Figure FDA00024307457800000511
wherein M_1 represents the number of concepts in the training data set; (Figure FDA00024307457800000512) represents the label of the i-th concept predicted by the support vector machine; (Figure FDA00024307457800000513) is the relevant feature of the i-th concept; (Figure FDA00024307457800000514) is the parameter for the i-th concept, where the superscript T denotes matrix transposition; the M_1 parameters (Figure FDA00024307457800000515) form the complete parameters W_1 of the support vector machine; X_i represents the expert-annotated label of the i-th concept; λ_1||W_1||^2 is a regularization term, and λ_1 is a manually adjusted parameter.
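The training goal described in this claim (a prediction error plus the regularization term λ_1||W_1||^2, followed by a threshold) can be sketched in a few lines. The exact loss is in a figure that is not reproduced here, so the code below swaps in a squared error minimized by plain gradient descent on a single shared weight vector; it is not a support vector machine solver. The feature layout (title match, concept frequency, constant bias), the default threshold of 0.5 and all names are assumptions made for illustration.

```python
from typing import List, Sequence


def predict_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score W^T x for one concept."""
    return sum(wk * xk for wk, xk in zip(w, x))


def train(features: List[List[float]], labels: List[int],
          lam: float = 0.01, lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Minimize mean squared error + lam * ||W||^2 by gradient descent (assumed loss)."""
    dim, m = len(features[0]), len(features)
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [2.0 * lam * wk for wk in w]  # gradient of the regularization term
        for x, y in zip(features, labels):
            err = predict_score(w, x) - y
            for k in range(dim):
                grad[k] += 2.0 * err * x[k] / m
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def is_key_concept(w: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> bool:
    """Apply the threshold (the role played by K* in the claim) to the linear score."""
    return predict_score(w, x) >= threshold


# Hypothetical labeled data: [title_match, concept_frequency, bias].
X = [[1.0, 0.9, 1.0], [1.0, 0.7, 1.0], [0.0, 0.2, 1.0], [0.0, 0.1, 1.0]]
y = [1, 1, 0, 0]
W1 = train(X, y)
```

In practice an off-the-shelf SVM implementation would normally replace the hand-written loop; the sketch only shows how the regularized objective and the threshold fit together.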
9. The method of claim 8, wherein the hybrid model comprises a classifier for predicting prerequisite relationships and a classifier for predicting common learning relationships; wherein:
training a classifier for predicting prerequisite relationships includes:
in the training stage, selecting the educational key concepts in the training data set according to the labels of the concepts in the training data set; then, using the expert-annotated prerequisite relationships between educational key concept pairs, training a binary classifier for predicting the prerequisite relationship on the following features: the concept matching feature and word representation similarity between the educational key concept pairs; the concept difficulty distance, test question content-analysis distance and student answer record feature extracted from the concept-pair sources; the directory structure feature and the structural feature between textbooks; and/or the common neighbor degree, Wikipedia abstract definition, normalized Google page distance and reference distance of the concept pairs; thereby obtaining the complete parameters W_2 of the binary classifier and a second threshold P_2; the goal of the training is to minimize the error between the predicted label (Figure FDA0002430745780000061) and the actual label X'_l:
Figure FDA0002430745780000062
wherein M_2 represents the number of educational key concept pairs; (Figure FDA0002430745780000063) represents the label of the l-th educational key concept pair predicted by the classifier, i.e., whether a prerequisite relationship exists for that pair; (Figure FDA0002430745780000064) is the relevant feature of the l-th educational key concept pair; W_2^l is the parameter for the l-th educational key concept pair, and the M_2 parameters W_2^l form the complete parameters W_2 of the classifier; X'_l represents the expert-annotated prerequisite relationship of the l-th educational key concept pair; λ_2||W_2||^2 is a regularization term, and λ_2 is a manually adjusted parameter;
training the classifier for predicting common learning relationships includes:
in the training stage, selecting the educational key concepts in the training data set according to the labels of the concepts in the training data set; then, using the expert-annotated common learning relationships between educational key concept pairs, training a binary classifier on the concept matching feature and word representation similarity between the educational key concept pairs, the concept co-occurrence degree and concept difficulty distance extracted from the concept-pair sources, and the common neighbor degree and Wikipedia abstract definition of the concept pairs, thereby obtaining the complete parameters W_3 of the binary classifier for predicting the common learning relationship and a third threshold P_3; the goal of the training is to minimize the error between the predicted label (Figure FDA0002430745780000065) and the actual label X''_l:
Figure FDA0002430745780000066
wherein M_2 represents the number of educational key concept pairs; (Figure FDA0002430745780000067) represents the label of the l-th educational key concept pair predicted by the classifier, i.e., whether a common learning relationship exists for that pair; (Figure FDA0002430745780000068) is the relevant feature of the l-th educational key concept pair; W_3^l is the parameter for the l-th educational key concept pair, and the M_2 parameters W_3^l form the complete parameters W_3 of the classifier; X''_l represents the expert-annotated common learning relationship of the l-th educational key concept pair; λ_3||W_3||^2 is a regularization term, and λ_3 is a manually adjusted parameter.
10. The method for building an educational concept graph with multiple relationships from multi-source data according to claim 3, 8 or 9, wherein the building of the educational concept graph for the new data set using the trained support vector machine and the hybrid model comprises:
for a new data set that has not been released, extracting each concept text in the manner of step 11 and extracting the relevant features between the concepts according to step 12; then constructing a concept graph G using the parameters of the trained support vector machine and hybrid model and the associated thresholds, with the following steps:
first, extracting each concept text in the manner of step 11 to form a candidate concept set R, and combining the relevant features (Figure FDA0002430745780000071) of each candidate concept with the support vector machine parameters W_1 and the first threshold K* to extract a key concept set C' as follows:
Figure FDA0002430745780000072
Figure FDA0002430745780000073
then, on the basis of the key concept set C', predicting, according to the hybrid model parameters W_2 and W_3 and the two thresholds P_2 and P_3, whether a prerequisite relationship and a common learning relationship exist for each key concept pair {(w_i', w_j') | w_i', w_j' ∈ C'}:
Figure FDA0002430745780000074
Figure FDA0002430745780000075
Figure FDA0002430745780000076
wherein <w_i', w_j'> = 0 denotes that neither a prerequisite relationship nor a common learning relationship exists between concept w_i' and concept w_j'; <w_i', w_j'> = 1 denotes that a prerequisite relationship exists between concept w_i' and concept w_j'; and <w_i', w_j'> = 2 denotes that a common learning relationship exists between concept w_i' and concept w_j';
(Figure FDA0002430745780000077) respectively represent the relevant features of the l'-th concept pair (w_i', w_j') in the key concept set C' that are used for predicting the prerequisite relationship and the common learning relationship;
finally, taking each educational key concept in the screened key concept set C' as a node, and constructing the connections between the corresponding nodes according to whether a prerequisite relationship and a common learning relationship exist between the educational key concept pairs, thereby constructing the educational concept graph.
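The graph-construction step of this claim can be sketched as below. The decision rule itself is in figures that are not reproduced here, so the code assumes that a pair is labeled 1 when its prerequisite score reaches P_2, otherwise 2 when its common learning score reaches P_3, and otherwise 0; giving the prerequisite relationship priority is an assumption. The score dictionary, threshold values and function names are hypothetical, and the scores would in practice come from the two trained classifiers.

```python
from typing import Dict, Set, Tuple

NO_RELATION, PREREQUISITE, CO_LEARNING = 0, 1, 2


def label_pair(prereq_score: float, colearn_score: float, p2: float, p3: float) -> int:
    """Assumed rule: prerequisite wins if above P_2, else co-learning if above P_3."""
    if prereq_score >= p2:
        return PREREQUISITE
    if colearn_score >= p3:
        return CO_LEARNING
    return NO_RELATION


def build_concept_graph(key_concepts: Set[str],
                        pair_scores: Dict[Tuple[str, str], Tuple[float, float]],
                        p2: float, p3: float) -> Dict:
    """Nodes are the key concepts; edges carry label 1 (prerequisite) or 2 (co-learning)."""
    edges: Dict[Tuple[str, str], int] = {}
    for (wi, wj), (prereq, colearn) in pair_scores.items():
        if wi not in key_concepts or wj not in key_concepts:
            continue  # only pairs drawn from the key concept set C' are considered
        label = label_pair(prereq, colearn, p2, p3)
        if label != NO_RELATION:
            edges[(wi, wj)] = label
    return {"nodes": sorted(key_concepts), "edges": edges}


# Hypothetical classifier outputs: (prerequisite score, co-learning score) per pair.
scores = {
    ("function", "derivative"): (0.9, 0.2),
    ("limit", "derivative"): (0.3, 0.8),
    ("function", "limit"): (0.1, 0.1),
    ("function", "matrix"): (0.9, 0.9),
}
graph = build_concept_graph({"function", "derivative", "limit"}, scores, p2=0.5, p3=0.5)
```

Here the pair involving "matrix" is skipped because "matrix" is not in the key concept set, and the pair with two low scores produces no edge.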
CN202010235272.5A 2020-03-30 2020-03-30 A method for building educational concept maps with multiple relationships from multi-source data Active CN111428052B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010235272.5A CN111428052B (en) 2020-03-30 2020-03-30 A method for building educational concept maps with multiple relationships from multi-source data

Publications (2)

Publication Number Publication Date
CN111428052A (en) 2020-07-17
CN111428052B (en) 2023-06-16

Family

ID=71549179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010235272.5A Active CN111428052B (en) 2020-03-30 2020-03-30 A method for building educational concept maps with multiple relationships from multi-source data

Country Status (1)

Country Link
CN (1) CN111428052B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040123237A1 (en) * 2002-12-24 2004-06-24 Industrial Technology Research Institute Example-based concept-oriented data extraction method
US7493253B1 (en) * 2002-07-12 2009-02-17 Language And Computing, Inc. Conceptual world representation natural language understanding system and method
US20130138696A1 (en) * 2011-11-30 2013-05-30 The Institute for System Programming of the Russian Academy of Sciences Method to build a document semantic model
US20150056596A1 (en) * 2013-08-20 2015-02-26 Chegg, Inc. Automated Course Deconstruction into Learning Units in Digital Education Platforms
CN106875014A (en) * 2017-03-02 2017-06-20 上海交通大学 The automation of the soft project knowledge base based on semi-supervised learning builds implementation method
US20170242909A1 (en) * 2016-02-24 2017-08-24 Linkedln Corporation Universal concept graph for a social networking service
CN109299282A (en) * 2018-08-16 2019-02-01 山东女子学院 An automatic generation method of concept map based on text analysis and association rule mining
CN109308323A (en) * 2018-12-07 2019-02-05 中国科学院长春光学精密机械与物理研究所 A method, device and device for constructing a causal relationship knowledge base
CN110347894A (en) * 2019-05-31 2019-10-18 平安科技(深圳)有限公司 Knowledge mapping processing method, device, computer equipment and storage medium based on crawler
CN110532328A (en) * 2019-08-26 2019-12-03 哈尔滨工程大学 A kind of text concept figure building method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
向芳玉;郝建江;顾文玲;黄冬明;: "基于概念图的可视化教学整合研究――以地理概念为例" *
涂新辉;何婷婷;李芳;王建文;: "基于排序学习的文本概念标注方法研究" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111949843A (en) * 2020-07-21 2020-11-17 江苏海洋大学 An Intelligent Learning Diagnosis Method Based on Concept Map Construction
CN111949843B (en) * 2020-07-21 2023-11-03 江苏海洋大学 Intelligent learning diagnosis method based on conceptual diagram construction

Also Published As

Publication number Publication date
CN111428052B (en) 2023-06-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant