US20150056596A1 - Automated Course Deconstruction into Learning Units in Digital Education Platforms - Google Patents
- Publication number
- US20150056596A1 (application US 13/971,738)
- Authority
- US
- United States
- Prior art keywords
- content
- activities
- course
- concepts
- passive
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B5/00—Electrically-operated educational appliances
- G09B7/00—Electrically-operated teaching apparatus or devices working with questions and answers
- G09B7/02—Electrically-operated teaching apparatus or devices working with questions and answers of the type wherein the student is expected to construct an answer to the question which is presented or wherein the machine gives an answer to the question presented by a student
Definitions
- This invention relates to automated processing of education materials and services associated with digital education platforms.
- An educational course is automatically deconstructed into discrete learning units.
- Content related to the course that has been stored by an integrated education platform is analyzed, and distinct concepts are extracted from the content.
- the learning activities in which users engage while accessing integrated learning services from the platform are recorded. These activities can generally be divided into passive, active, and recall activities.
- a general model of learning is then applied that connects concepts to the activities undertaken by students to learn those concepts.
- a model of learning is developed where courses are atomized into individual learning units, each of which comprises a concept and at least one learning activity. The learning units then can be delivered independently or aggregated as desired.
- FIG. 1 illustrates an example publishing platform, according to one embodiment.
- FIG. 2 is a block diagram illustrating interactions with a publishing platform, according to one embodiment.
- FIG. 3 illustrates a document reconstruction process, according to one embodiment.
- FIG. 4 illustrates an automated course deconstruction system, according to one embodiment.
- FIG. 5 illustrates the operation of automated learning units extraction systems, according to one embodiment.
- FIGS. 6A and 6B illustrate a process of learning unit extraction performed by automated learning units extraction systems, according to one embodiment.
- FIG. 7 illustrates a predictive model of course organization, according to one embodiment.
- FIG. 8 illustrates an example course deconstruction into distinct learning units, according to one embodiment.
- Embodiments of the invention identify and organize the learning activities in which students engage during their education. After educational courses are deconstructed into individual concepts, a general model of learning is applied to predict the activities in which students will engage.
- Embodiments of the invention will be described in the context of a versatile education social learning platform for digital content interactive services distribution and consumption.
- personalized learning services are paired with secured distribution and analytics systems for reporting on both connected user activities and effectiveness of deployed services.
- the platform is able to deconstruct courses into individual concepts and pair these concepts to activities that users are likely to do.
- a model of learning is developed where courses are atomized into individual “learning units” that can be expressed independently or aggregated as desired.
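- The learning unit described above (one concept paired with at least one learning activity, deliverable alone or in aggregate) can be pictured with a small data model. This is a minimal sketch only: the names `ActivityType`, `LearningUnit`, and `aggregate_by_activity` are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from enum import Enum


class ActivityType(Enum):
    """The three broad activity categories named in the text."""
    PASSIVE = "passive"
    ACTIVE = "active"
    RECALL = "recall"


@dataclass(frozen=True)
class LearningUnit:
    """Hypothetical record: one concept plus at least one activity type."""
    concept: str
    activities: frozenset

    def __post_init__(self):
        # The text requires at least one learning activity per unit.
        if not self.activities:
            raise ValueError("a learning unit needs at least one activity")


def aggregate_by_activity(units):
    """Group concepts by activity type (one possible way to aggregate units)."""
    grouped = {kind: [] for kind in ActivityType}
    for unit in units:
        for kind in unit.activities:
            grouped[kind].append(unit.concept)
    return grouped
```

- Each `LearningUnit` stands on its own, so a caller could deliver a single unit or pass any subset to `aggregate_by_activity`.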
- FIG. 1 is a high-level block diagram illustrating the education platform environment 100.
- The education platform environment 100 is organized around four function blocks: content 101, management 102, delivery 103, and experience 104.
- Content block 101 automatically gathers and aggregates content from a large number of sources, categories, and partners. Whether the content is curated, perishable, on-line, or personal, these systems define the interfaces and processes to automatically collect various content sources into a formalized staging environment.
- Management block 102 comprises five blocks with respective submodules: ingestion 120, publishing 130, distribution 140, back office system 150, and eCommerce system 160.
- The ingestion module 120, including staging, validation, and normalization subsystems, ingests published documents that may be in a variety of different formats, such as PDF, ePUB2, ePUB3, SVG, XML, or HTML.
- the ingested document may be a book, such as a textbook, a set of self-published notes, or any other published document, and may be subdivided in any manner. For example, the document may have a plurality of pages organized into chapters, which could be further divided into one or more sub-chapters. Each page may have text, images, tables, graphs, or other items distributed across the page.
- The documents are passed to the publishing system 130, which in one embodiment includes transformation, correlation, and metadata subsystems. If the document ingested by the ingestion module 120 is not in a markup language format, the publishing system 130 automatically identifies, extracts, and indexes all the key elements and composition of the document to reconstruct it into a modern, flexible, and interactive HTML5 format.
- the ingested documents are converted into markup language documents well-suited for distribution across various computing devices.
- the publishing system 130 reconstructs published documents so as to accommodate dynamic add-ons, such as user-generated and related content, while maintaining page fidelity to the original document.
- the transformed content preserves the original page structure including pagination, number of columns and arrangement of paragraphs, placement and appearance of graphics, titles and captions, and fonts used, regardless of the original format of the source content and complexity of the layout of the original document.
- the page structure information is assembled into a document-specific table of contents describing locations of chapter headings and sub-chapter headings within the reconstructed document, as well as locations of content within each heading.
- Document metadata describing a product description, pricing, and terms (e.g., whether the content is for sale, rent, or subscription, or whether it is accessible for a certain time period or geographic region) is also added to the reconstructed document.
- the reconstructed document's table of contents indexes the content of the document into a description of the overall structure of the document, including chapter headings and sub-chapter headings. Within each heading, the table of contents identifies the structure of each page. As content is added dynamically to the reconstructed document, the content is indexed and added to the table of contents to maintain a current representation of the document's structure.
- The process performed by the publishing system 130 to reconstruct a document and generate a table of contents is described further with respect to FIG. 3.
- the distribution system 140 packages content for delivery, uploads the content to content distribution networks, and makes the content available to end users based on the content's digital rights management policies.
- the distribution system 140 includes digital content management, content delivery, and data collection and analysis subsystems.
- the distribution system 140 may aggregate additional content layers from numerous sources into the ingested or reconstructed document. These layers, including related content, advertising content, social content, and user-generated content, may be added to the document to create a dynamic, multilayered document.
- related content may comprise material supplementing the foundation document, such as study guides, self-testing material, solutions manuals, glossaries, or journal articles.
- Advertising content may be uploaded by advertisers or advertising agencies to the publishing platform, such that advertising content may be displayed with the document.
- Social content may be uploaded to the publishing platform by the user or by other nodes (e.g., classmates, teachers, authors, etc.) in the user's social graph.
- Examples of social content include interactions between users related to the document and content shared by members of the user's social graph.
- User-generated content includes annotations made by a user during an eReading session, such as highlighting or taking notes.
- user-generated content may be self-published by a user and made available to other users as a related content layer associated with a document or as a standalone document.
- page information and metadata of the document are referenced by all layers to merge the multilayered document into a single reading experience.
- the publishing system 130 may also add information describing the supplemental layers to the reconstructed document's table of contents. Because the page-based document ingested into the management block 102 or the reconstructed document generated by the publishing system 130 is referenced by all associated content layers, the ingested or reconstructed document is referred to herein as a “foundation document,” while the “multilayered document” refers to a foundation document and the additional content layers associated with the foundation document.
- the back-office system 150 of management block 102 enables business processes such as human resources tasks, sales and marketing, customer and client interactions, and technical support.
- The eCommerce system 160 interfaces with back office system 150, publishing 130, and distribution 140 to integrate marketing, selling, servicing, and receiving payment for digital products and services.
- Delivery block 103 of an educational digital publication and reading platform distributes content for user consumption by, for example, pushing content to edge servers on a content delivery network.
- Experience block 104 manages user interaction with the publishing platform through browser application 170 by updating content, reporting users' reading and other educational activities to be recorded by the platform, and assessing network performance.
- the content distribution and protection system is interfaced directly between the distribution sub-system 140 and the browser application 170 , essentially integrating the digital content management (DCM), content delivery network (CDN), delivery modules, and eReading data collection interface for capturing and serving all users' content requests.
- the platform content catalog is a mosaic of multiple content sources which are collectively processed and assembled into the overall content service offering.
- the content catalog is based upon multilayered publications that are created from reconstructed foundation documents augmented by supplemental content material resulting from users' activities and platform back-end processes.
- FIG. 2 illustrates an example of a publishing platform where multilayered content document services are assembled and distributed to desktop, mobile, tablet, and other connected devices. As illustrated in FIG. 2 , the process is typically segmented into three phases: Phase 1: creation of the foundation document layer; Phase 2: association of the content service layers to the foundation document layer; and Phase 3: management and distribution of the content.
- In Phase 1, the licensed document is ingested into the publishing platform and automatically reconstructed into a series of basic elements, while maintaining page fidelity to the original document structure. Document reconstruction is described in more detail below with reference to FIG. 3.
- the publishing platform runs several processes to enhance the reconstructed document and transform it into a personalized multilayered content experience. For instance, several distinct processes are run to identify the related content to the reconstructed document, user generated content created by registered users accessing the reconstructed document, advertising or merchandising material that can be identified by the platform and indexed within the foundation document and its layers, and finally social network content resulting from registered users' activities.
- The elements referenced within each class are identified by their respective content layer. Specifically, all the related content page-based elements that are matched with a particular reconstructed document are classified as part of the related content layer.
- The outcome of Phase 2 is a series of static and dynamic page-based content layers that are logically stacked on top of each other and which collectively enhance the reconstructed foundation document.
- The resulting multilayered content is then published to the platform content catalog and pushed to the content servers and distribution network for distribution.
- the content distribution systems are effectively authorizing and directing the real-time download of page-based layered content services to a user's paired devices. These devices access the services through time sensitive dedicated URLs which, in one embodiment, only stay valid for a few minutes, all under control of the platform service provider.
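- One common way to build a URL that stays valid for only a few minutes is to embed an expiry time and sign it with a server-side secret. The sketch below illustrates that general technique; the text does not say how the platform implements its dedicated URLs, and `SECRET_KEY`, `TTL_SECONDS`, `sign_url`, and `verify_url` are all hypothetical names.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"example-secret"  # hypothetical server-side secret
TTL_SECONDS = 300               # assumption: "a few minutes" taken as five


def _signature(path, expires, key):
    """HMAC-SHA256 over the path and the expiry timestamp."""
    message = f"{path}:{expires}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def sign_url(path, now, ttl=TTL_SECONDS, key=SECRET_KEY):
    """Return the path with an expiry time and signature appended."""
    expires = int(now) + ttl
    query = urlencode({"expires": expires, "sig": _signature(path, expires, key)})
    return f"{path}?{query}"


def verify_url(url, now, key=SECRET_KEY):
    """Accept the URL only if it is unexpired and the signature matches."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    expected = _signature(parsed.path, expires, key)
    return hmac.compare_digest(expected, signature)
```

- Because the expiry is part of the signed message, a client cannot extend the validity window by editing the query string.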
- the browser-based applications are embedded, for example, into HTML5 compliant web browsers which control the fetching, requesting, synchronization, prioritization, normalization and rendering of all available content services.
- The publishing system 130 receives original documents for reconstruction from the ingestion system 120 illustrated in FIG. 1.
- a series of modules of the publishing system 130 are configured to perform the document reconstruction process.
- FIG. 3 illustrates a process within the publishing system 130 for reconstructing a document.
- Embodiments are described herein with reference to an original document in the Portable Document Format (PDF) that is ingested into the publishing system 130 .
- the format of the original document is not limited to PDF; other unstructured document formats can also be reconstructed into a markup language format by a similar process.
- a PDF page contains one or more content streams, which include a sequence of objects, such as path objects, text objects, and external objects.
- A path object describes vector graphics made up of lines, rectangles, and curves. Paths can be stroked or filled with colors and patterns as specified by the operators at the end of the path object.
- A text object comprises character strings identifying sequences of glyphs to be drawn on the page. The text object also specifies the encodings and fonts for the character strings.
- An external object (XObject) defines an outside resource, such as a raster image in JPEG format.
- An XObject of an image contains image properties and an associated stream of the image data.
- graphical objects within a page are identified and their respective regions and bounding boxes are determined.
- a path object in a PDF page may include multiple path construction operators that describe vector graphics made up of lines, rectangles, and curves.
- Metadata associated with each of the images in the document page is extracted, such as resolutions, positions, and captions of the images. Resolution of an image is often measured by horizontal and vertical pixel counts in the image; higher resolution means more image details.
- the image extraction process may extract the image in the original resolution as well as other resolutions targeting different eReading devices and applications. For example, a large XVGA image can be extracted and down sampled to QVGA size for a device with QVGA display. The position information of each image may also be determined.
- the position information of the images can be used to provide page fidelity when rendering the document pages in eReading browser applications, especially for complex documents containing multiple images per page.
- a caption associated with each image that defines the content of the image may also be extracted by searching for key words, such as “Picture”, “Image”, and “Tables”, from text around the image in the original page.
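- The keyword search for captions can be sketched with a regular expression over the text lines found near an image. This is an illustration under assumptions: the keywords follow the examples in the text (with singular "Table" also accepted), how "text around the image" is collected is not shown, and `find_caption` is a hypothetical helper.

```python
import re

# Keywords from the text; matching is case-insensitive and anchored to the
# start of the line so that ordinary sentences are not mistaken for captions.
CAPTION_PATTERN = re.compile(r"^\s*(Picture|Image|Tables?)\b", re.IGNORECASE)


def find_caption(nearby_lines):
    """Return the first nearby line that starts with a caption keyword."""
    for line in nearby_lines:
        if CAPTION_PATTERN.match(line):
            return line.strip()
    return None
```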
- the extracted image metadata for the page may be stored to the overall document metadata and indexed by the page number.
- Image extraction 301 may also extract tables, comprising graphics (horizontal and vertical lines), text rows, and/or text columns.
- the lines forming the tables can be extracted and stored separately from the rows and columns of the text.
- the image extraction process may be repeated for all the pages in the ingested document until all images in each page are identified and extracted.
- an image map that includes all graphics, images, tables and other graphic elements of the document is generated for the eReading platform.
- Text and embedded fonts are extracted from the original document, and the locations of the text elements on each page are identified.
- Text is extracted from the pages of the original document tagged as having text.
- the text extraction may be done at the individual character level, together with markers separating words, lines, and paragraphs.
- the extracted text characters and glyphs are represented by the Unicode character mapping determined for each.
- the position of each character is identified by its horizontal and vertical locations within a page. For example, if an original page is in A4 standard size, the location of a character on the page can be defined by its X and Y location relative to the A4 page dimensions.
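- As a small illustration of expressing a character position relative to the page dimensions, the sketch below normalizes X and Y to fractions of an A4 page and rescales them for a target display. The A4 size in PDF points (about 595.28 by 841.89) is standard; the function names and the rescaling step are assumptions, not part of the described system.

```python
A4_POINTS = (595.28, 841.89)  # A4 width and height in PDF points (1/72 inch)


def normalize_position(x, y, page_size=A4_POINTS):
    """Express a position as fractions of the page width and height."""
    width, height = page_size
    return (x / width, y / height)


def to_device(normalized, device_size):
    """Map a normalized position onto a device of the given pixel size."""
    return (round(normalized[0] * device_size[0]),
            round(normalized[1] * device_size[1]))
```

- Storing positions relative to the page lets the same extracted data be rendered at different resolutions without re-extracting the text.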
- text extraction is performed on a page-by-page basis.
- Embedded fonts may also be extracted from the original document, which are stored and referenced by client devices for rendering the text content.
- the pages in the original document having text are tagged as having text. In one embodiment, all the pages with one or more text objects in the original document are tagged. Alternatively, only the pages without any embedded text are marked.
- The output of text extraction 302 is, therefore, a dataset referenced by the page number, comprising the characters and glyphs in a Unicode character mapping with associated location information and embedded fonts used in the original document.
- Text coalescing 303 coalesces the text characters previously extracted.
- the extracted text characters are coalesced into words, words into lines, lines into paragraphs, and paragraphs into bounding boxes and regions.
- text coalescence into words is performed based on spacing.
- the spacing between adjacent characters is analyzed and compared to the expected character spacing based on the known text direction, font type, style, and size, as well as other graphics state parameters, such as character-spacing and zoom level.
- the average spacing between adjacent characters within a word is smaller than the spacing between adjacent words. For example, a string of “Berriesaregood” represents extracted characters without considering spacing information. Once taking the spacing into consideration, the same string becomes “Berries are good,” in which the average character spacing within a word is smaller than the spacing between words.
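- The spacing rule can be sketched as follows: measure the gap between consecutive glyphs and start a new word whenever a gap is clearly larger than the typical one. This is a simplified stand-in, since the text compares against an expected spacing derived from font and graphics state, whereas the sketch uses the median observed gap. `Glyph`, `coalesce_words`, `layout`, `gap_factor`, and `min_gap` are hypothetical.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Glyph:
    char: str
    x: float      # left edge of the glyph on the line
    width: float  # advance width of the glyph


def coalesce_words(glyphs, gap_factor=2.0, min_gap=1.0):
    """Split one line of glyphs into words wherever the gap is unusually wide."""
    if not glyphs:
        return []
    gaps = [b.x - (a.x + a.width) for a, b in zip(glyphs, glyphs[1:])]
    # Assumption: the median gap approximates the within-word spacing.
    threshold = max(median(gaps) * gap_factor, min_gap) if gaps else min_gap
    words = [glyphs[0].char]
    for gap, glyph in zip(gaps, glyphs[1:]):
        if gap > threshold:
            words.append(glyph.char)   # wide gap: start a new word
        else:
            words[-1] += glyph.char    # narrow gap: extend the current word
    return words


def layout(words, width=5.0, char_gap=1.0, word_gap=6.0):
    """Build synthetic glyph positions for a demo line."""
    glyphs, x = [], 0.0
    for word in words:
        for char in word:
            glyphs.append(Glyph(char, x, width))
            x += width + char_gap
        x += word_gap - char_gap
    return glyphs
```

- With the synthetic layout above, `coalesce_words(layout(["Berries", "are", "good"]))` recovers the three words from the unspaced glyph sequence.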
- extracted text characters may be assembled into words based on semantics.
- the string of “Berriesaregood” may be input to a semantic analysis tool, which matches the string to dictionary entries or Internet search terms, and outputs the longest match found within the string. The outcome of this process is a semantically meaningful string of “Berries are good.”
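- A greedy longest-match segmenter is one simple way to picture the dictionary lookup described above. It is a sketch, not the semantic analysis tool itself: the dictionary here is a tiny hypothetical set, and greedy matching can pick a wrong split on ambiguous strings.

```python
def segment(text, dictionary):
    """Split an unspaced string by repeatedly taking the longest dictionary match."""
    words = []
    lowered = text.lower()
    max_len = max(map(len, dictionary), default=0)
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for end in range(min(len(text), i + max_len), i, -1):
            if lowered[i:end] in dictionary:
                words.append(text[i:end])
                i = end
                break
        else:
            # No entry matches: keep the single character and move on.
            words.append(text[i])
            i += 1
    return words
```

- For example, `" ".join(segment("Berriesaregood", {"berries", "berry", "are", "a", "good", "go"}))` yields "Berries are good".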
- the same text is analyzed by both spacing and semantics, so that word grouping results may be verified and enhanced.
- Words may be assembled into lines by determining an end point of each line of text. Based on the text direction, the horizontal spacing between words may be computed and averaged. The end point may have word spacing larger than the average spacing between words. For example, in a two-column page, the end of the line of the first column may be identified based on it having a spacing value much larger than the average word spacing within the column. On a single column page, the end of the line may be identified by the space after a word extending to the side of the page or bounding box.
- lines may be assembled into paragraphs. Based on the text direction, the average vertical spacing between consecutive lines can be computed. The end of the paragraph may have a vertical spacing that is larger than the average. Additionally or alternatively, semantic analysis may be applied to relate syntactic structures of phrases and sentences, so that meaningful paragraphs can be formed.
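- The vertical-spacing rule for paragraphs can be sketched in the same spirit: compute the average distance between consecutive lines and break wherever a gap exceeds it. The `tolerance` multiplier is an assumption added to avoid splitting on small variations, and the input format (a list of `(y, text)` pairs sorted top to bottom) is hypothetical.

```python
from statistics import mean


def group_paragraphs(lines, tolerance=1.2):
    """Group (y, text) lines into paragraphs using larger-than-average gaps."""
    if not lines:
        return []
    gaps = [b[0] - a[0] for a, b in zip(lines, lines[1:])]
    average = mean(gaps) if gaps else 0.0
    paragraphs = [[lines[0][1]]]
    for gap, (_, text) in zip(gaps, lines[1:]):
        if gap > average * tolerance:
            paragraphs.append([text])      # large gap: new paragraph
        else:
            paragraphs[-1].append(text)    # normal gap: same paragraph
    return [" ".join(paragraph) for paragraph in paragraphs]
```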
- the identified paragraphs may be assembled into bounding boxes or regions.
- the paragraphs may be analyzed based on lexical rules associated with the corresponding language of the text.
- a semantic analyzer may be executed to identify punctuation at the beginning or end of a paragraph. For example, a paragraph may be expected to end with a period. If the end of a paragraph does not have a period, the paragraph may continue either on a next column or a next page.
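- The punctuation check can be sketched as a merge pass: if a paragraph fragment does not end with terminal punctuation, the next fragment (from the next column or page) is treated as its continuation. The text mentions only the period; treating "!" and "?" as terminators too is an assumption, and `merge_continued` is a hypothetical name.

```python
TERMINATORS = (".", "!", "?")  # only the period is named in the text


def merge_continued(fragments):
    """Join each fragment to the previous one when that one lacks an ending mark."""
    merged = []
    for fragment in fragments:
        if merged and not merged[-1].rstrip().endswith(TERMINATORS):
            merged[-1] = merged[-1].rstrip() + " " + fragment.lstrip()
        else:
            merged.append(fragment)
    return merged
```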
- the syntactic structures of the paragraphs may be analyzed to determine the text flow from one paragraph to the next, and may combine two or more paragraphs based on the syntactic structure. If multiple combinations of the paragraphs are possible, reference may be made to an external lexical database, such as WORDNET®, to determine which paragraphs are semantically similar.
- a Unicode character mapping for each glyph in a document to be reconstructed is determined.
- The mapping ensures that no two glyphs are mapped to the same Unicode character.
- a set of rules is defined and followed, including applying the Unicode mapping found in the embedded font file; determining the Unicode mapping by looking up postscript character names in a standard table, such as a system TrueType font dictionary; and determining the Unicode mapping by looking for patterns, such as hex codes, postscript name variants, and ligature notations.
- pattern recognition techniques may be applied on the rendered font to identify Unicode characters. If pattern recognition is still unsuccessful, the unrecognized characters may be mapped into the private use area (PUA) of Unicode. In this case, the semantics of the characters are not identified, but the encoding uniqueness is guaranteed. As such, rendering ensures fidelity to the original document.
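- The rule chain and the private use area fallback can be sketched as an ordered lookup. Several points are assumptions: the tiny `POSTSCRIPT_NAMES` table stands in for a real name table, only the `uniXXXX` naming pattern is recognized, the pattern-recognition step on rendered glyphs is omitted, and uniqueness is enforced by sending any colliding glyph to the PUA (U+E000 to U+F8FF). `GlyphMapper` is a hypothetical class.

```python
import re

PUA_START, PUA_END = 0xE000, 0xF8FF  # Basic Multilingual Plane private use area
POSTSCRIPT_NAMES = {"space": " ", "fi": "\ufb01", "eacute": "\u00e9"}  # illustrative subset
UNI_NAME = re.compile(r"^uni([0-9A-Fa-f]{4})$")


def from_pattern(name):
    """Decode glyph names of the form uniXXXX (four hex digits)."""
    match = UNI_NAME.match(name)
    return chr(int(match.group(1), 16)) if match else None


class GlyphMapper:
    def __init__(self, embedded_map):
        self.embedded_map = embedded_map  # mapping taken from the embedded font
        self.assigned = {}                # glyph name -> chosen character
        self.used = set()                 # characters already handed out
        self.next_pua = PUA_START

    def map_glyph(self, name):
        if name in self.assigned:
            return self.assigned[name]
        # Rules in order: embedded font map, name table, name pattern.
        char = (self.embedded_map.get(name)
                or POSTSCRIPT_NAMES.get(name)
                or from_pattern(name))
        if char is None or char in self.used:
            char = self._allocate_pua()   # unknown or colliding glyph
        self.assigned[name] = char
        self.used.add(char)
        return char

    def _allocate_pua(self):
        while self.next_pua <= PUA_END and chr(self.next_pua) in self.used:
            self.next_pua += 1
        if self.next_pua > PUA_END:
            raise RuntimeError("private use area exhausted")
        char = chr(self.next_pua)
        self.next_pua += 1
        return char
```

- A glyph that lands in the PUA has no known meaning, but it keeps a unique code point, so it can still be rendered with the embedded font.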
- content of the reconstructed document is indexed.
- the indexed content is aggregated into a document-specific table of contents that describes the structure of the document at the page level. For example, when converting printed publications into electronic documents with preservation of page fidelity, it may be desirable to keep the digital page numbering consistent with the numbering of the original document pages.
- the table of contents may be optimized at different levels of the table.
- the chapter headings within the original document such as headings for a preface, chapter numbers, chapter titles, an appendix, and a glossary may be indexed.
- a chapter heading may be found based on the spacing between chapters.
- a chapter heading may be found based on the font face, including font type, style, weight, or size.
- the headings may have a font face that is different from the font face used throughout the rest of the document. After identifying the headings, the number of the page on which each heading is located is retrieved.
- sub-chapter headings within the original document may be identified, such as dedications and acknowledgments, section titles, image captions, and table titles.
- Vertical spacing between sections, text, and/or font face may be used to segment each chapter. For example, each chapter may be parsed to identify all occurrences of the sub-chapter heading font face, and determine the page number associated with each identified sub-chapter heading.
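- Heading detection by font face can be sketched by treating the most frequent font face as body text and flagging every line set in a different face, together with its page number. This assumes body text dominates the document; `TextLine` and `find_headings` are hypothetical, and a fuller version would also weigh spacing, style, and weight.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    text: str
    page: int
    font: str
    size: float


def find_headings(lines):
    """Return (text, page) for lines whose font face differs from the body face."""
    if not lines:
        return []
    faces = Counter((line.font, line.size) for line in lines)
    body_face = faces.most_common(1)[0][0]  # assumption: most common face is body text
    return [(line.text, line.page)
            for line in lines
            if (line.font, line.size) != body_face]
```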
- FIG. 4 illustrates an automated educational course deconstruction system, according to one embodiment.
- FIG. 4 demonstrates the interaction between learning units extraction systems 440 , the education platform 450 , and the HTML5 browser environment 470 .
- The learning units extraction systems 440 may be integrated as part of the platform environment 100 illustrated in FIG. 1, and in other embodiments they may be separate systems.
- The education platform 450 may have components in common with the functional blocks of the platform environment 100.
- The HTML5 browser environment 470 may be the same as the eReading application 170 of the experience block 104 of the platform environment 100, or the functionality may be implemented in different modules.
- the education platform 450 serves the education services to registered users 471 based on a process of requesting and fetching on-line services in the context of authenticated on-line sessions.
- The education platform 450 includes a content catalog database 451, publishing systems 452, content distribution systems 453, and reporting systems 454.
- The content catalog database 451 contains the collection of content available via the education platform 450.
- The content catalog database 451 feeds the content to the publishing systems 452.
- The publishing systems 452 serve the content to registered users 471 via the content distribution system 453.
- Reporting systems 454 receive reports of user experience and user activities from the connected devices 470 operated by the registered users 471. This feedback is used by content distribution system 453 for managing the distribution of the content and for capturing UGC and other forms of user activities to add to the content catalog database 451.
- The learning units extraction systems 440 receive published content from the publishing systems 452 for analysis, and provide a mapping of concepts to activities for storage in the learning units database 445.
- The learning units extraction systems 440 include modules for content analysis 441, concepts extraction 442, activities mapping 443, and timeline mapping, and include a learning units database 445.
- the content analysis module 441 analyzes the content available from the content catalog database 451 . This includes content added by registered users 471 through their interactions with the education platform 450 . The content analysis module 441 collects and prepares related content for further processing by the learning units extraction systems 440 .
- the concepts extraction module 442 extracts concepts from the analyzed content to determine a list of concepts.
- the extracted list of concepts is stored in the learning units database 445 .
- the activities mapping module 443 determines which activities undertaken by the registered users 471 are related to which concepts.
- the activities mapping module 443 stores the association in the learning units database 445 . Because the content and other services are originating from the same platform environment 100 , the users' activities are analyzed and correlated to each other. These activities can be aggregated over time into distinct categories. These activities are broadly categorized as passive 472 , active 473 , and recall 474 . Each concept is mapped to at least one type of user activity, and may be mapped to all three types of user activities.
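- The mapping of concepts to activity categories can be sketched as an aggregation over recorded events. The event names in `EVENT_CATEGORY` are hypothetical examples chosen for illustration, and the way an event is tied to a concept is assumed to be given; the text does not specify either.

```python
from collections import defaultdict

# Hypothetical event types and the category each one is assumed to fall into.
EVENT_CATEGORY = {
    "page_read": "passive",
    "note_created": "active",
    "highlight_created": "active",
    "self_test_answered": "recall",
}


def map_activities(events):
    """Aggregate (concept, event_type) pairs into concept -> set of categories."""
    mapping = defaultdict(set)
    for concept, event_type in events:
        category = EVENT_CATEGORY.get(event_type)
        if category is not None:          # ignore events with no known category
            mapping[concept].add(category)
    return dict(mapping)
```

- A concept that appears with all three categories in the result corresponds to the case where it is mapped to all three types of user activities.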
- Passive activities 472 include activities in which registered users passively interact with published academic content materials associated with a particular course.
- The reference materials for a course typically include one or more published documents, such as a textbook, summary, syllabus, and other related digital content, which are aggregated at the course level and accessible from the registered users' connected devices. These activities are defined as “passive” because they are typically orchestrated by each user around multiple authenticated on-line reading sessions when accessing the structured HTML course-based referenced documents.
- the connected education platform analyzes the passive reading activities within each course, correlating how registered users are interacting with the referenced academic content within any course delivery.
- Activities are defined as “active” when registered users are interacting with course-defined referenced academic documents by creating their own user generated content (UGC) layer as managed by the platform services.
- Unlike passive activities, where content is predetermined, static, and structured as part of a course description, the process of creating user generated content is unique to each user in terms of, for example, actual material, format, frequency, or structure.
- UGC is defined by the creation of personal notes, highlights, asking or answering questions, and other comments, or interacting with other registered users 471 through the education platform 450 while accessing the referenced course-based HTML documents.
- Other types of UGC include seeking support when help is needed, running step-by-step problems associated to particular sections of course-based HTML documents, connecting and exchanging feedback with peers, among others.
- UGC activities are authenticated through on-line “active” sessions that are processed and correlated by the platform content distribution system 453 and reporting system 454 .
- the platform 450 can correlate how registered users add their UGC layer within any course delivery.
- Activities are defined as “recall” activities when registered users are being tested against the knowledge acquired from their previous passive and active sessions. By contrast to the previous passive and active sessions, recalls can be orchestrated around combined predetermined content material with user generated content. For instance, the assignments, quizzes and other testing materials associated to the particular course and its curriculum are typically predefined and offered to registered users as structured documents that are enhanced once personal content is added into them. Typically, a set of predetermined questions which are aggregated by the platform 450 into a digital testing material is described as a structured HTML document that is published either as a stand-alone document or as supplemental to a course-based document. By contrast, the individual answers to these questions are expressed as UGC in some testing-like activities.
- the platform 450 can correlate how registered users interact with the testing documents within any course delivery.
- the timeline mapping module 444 determines the starting point and/or end point of activities that are recorded.
- the timeline mapping module 444 stores the respective times associated with the activities in the learning units database 445 .
- FIG. 5 illustrates the operation of automated learning units extraction systems 440 , according to one embodiment.
- The extraction of learning units drives the identification of activities in which users can be predicted to engage during the delivery of a course. Generally, the extraction is performed by atomizing the courses that users take into individual learning concepts. A general model of learning is then applied to these individual concepts in order to determine which activities the users are likely to perform.
- the course structured content library 550 is made up of data that supports passive 472 , active 473 , and recall 474 activities that a registered user 471 may undertake as part of the user's study of at least one course.
- the course structured content library 550 may exist within the content catalog database 451 .
- the content analysis module 441 analyzes the materials that make up and/or are generated by these passive 472 , active 473 , and recall 474 activities along with additional documents from the content catalog database 451 , and indexes them for the concepts extraction module 442 .
- the concepts extraction module 442 ranks the content affiliated per course of a plurality of courses and processes the content by extracting and normalizing the content into a unique combination of operands and operators that characterize the respective course.
- Each extracted combination of operand and operator forms a concept.
- As concepts are extracted by the concepts extraction module 442, they are indexed into the concepts index database 552.
- the concepts are mapped to the activities that engage users who are studying those concepts as part of at least one course by the activities mapping module 443 .
- The activities are also mapped to a timeline by a timeline mapping module 444, according to the start time, end time, and/or elapsed time of the activities that are undertaken.
- the timeline mapping is informed by a school syllabus database 554 that contains information about course dates, lesson plans, or the like.
- the respective mappings of activities and timeline for each concept are stored in the learning units database 553 .
- course learning units 555 are output of the learning units extraction systems 440 .
- Course learning units 555 are composed of the mapping between a concept and the activities that are performed by a user that are related to that concept within the time boundaries of an educational course. The coupling of one concept to at least one learning activity collectively defines a discrete learning unit.
- the learning unit attributes are expressed as the unique combination of a single concept with its mapped activities.
- a course is composed of a plurality of learning units, which may each be associated with a start time, an end time, a length of time, or an elapsed time in which the learning unit is studied and the activities associated with the learning unit are performed.
- the learning units can be shuffled into different orders.
- the learning units can be mixed, matched, or assembled into new courses.
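- The coupling of one concept to at least one activity, with optional time bounds, can be pictured as a small data structure. The following is a minimal sketch, not the specification's implementation; the names `ActivityType`, `Concept`, `LearningUnit`, and `assemble_course`, and the use of course-relative day numbers, are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class ActivityType(Enum):
    PASSIVE = "passive"  # reading course reference material
    ACTIVE = "active"    # creating user generated content (UGC)
    RECALL = "recall"    # quizzes, assignments, and other testing


@dataclass(frozen=True)
class Concept:
    operator: str  # the action a student learns to perform
    operand: str   # the thing the action is performed on


@dataclass
class LearningUnit:
    concept: Concept
    # Hypothetical layout: activity type -> descriptions of mapped activities.
    activities: dict[ActivityType, list[str]] = field(default_factory=dict)
    start_day: Optional[int] = None  # assumed course-relative day numbers
    end_day: Optional[int] = None

    def __post_init__(self) -> None:
        # A learning unit couples one concept to at least one activity.
        if not any(self.activities.values()):
            raise ValueError("a learning unit needs at least one activity")


def assemble_course(units: list[LearningUnit], order: list[int]) -> list[LearningUnit]:
    """Return the units rearranged (or subset) according to the given indices."""
    return [units[i] for i in order]
```

- Because each unit carries its own concept and activities, reordering or mixing units from several courses reduces to list manipulation in this sketch.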
- FIGS. 6A-B illustrate a process of learning unit extraction performed by automated learning units extraction systems, according to one embodiment.
- content analysis 441 begins with materials from the platform content catalog database 451 which have been associated with passive, active, and recall user activities.
- content media types are identified. The media types may be inferred from data from the content catalog database 451 or the users' content activities, or the media types may be explicitly tagged. Certain media types, such as a summary of a course and a course syllabus, may tend to be more reliable and indicative of the learning units that will be studied in a course than other media types, such as related content and UGC, which may only be tangentially related to the learning units studied in the course.
- the identification of the media types in step 601 allows the media types to be sorted in step 602 .
- the media types may be sorted for example, in descending order of reliability or importance in terms of containing valuable concepts for extraction as the basis of a learning unit.
- the media types are ordered as follows: summary, syllabus, textbook, related content, UGC, Q&A, and testing materials, with all other materials following.
- the summary, syllabus, and textbook may be considered primary sources, whereas the remainder of the sources may be considered secondary sources.
- the content is loaded in the sorted order for extraction.
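- The sorting of content by media type described above can be sketched as follows. The ordering list mirrors the order given in the text; the function names and the dictionary layout of a content item (a `media_type` key) are assumptions made for illustration.

```python
# Order taken from the description: most reliable media types first.
MEDIA_TYPE_ORDER = [
    "summary", "syllabus", "textbook",                        # primary sources
    "related content", "UGC", "Q&A", "testing materials",     # secondary sources
]
PRIMARY_SOURCES = {"summary", "syllabus", "textbook"}


def sort_by_reliability(items: list[dict]) -> list[dict]:
    """Sort content items by media type; unknown types follow all known ones."""
    rank = {media: i for i, media in enumerate(MEDIA_TYPE_ORDER)}
    # sorted() is stable, so items of the same media type keep their input order.
    return sorted(items, key=lambda item: rank.get(item["media_type"], len(MEDIA_TYPE_ORDER)))


def is_primary(item: dict) -> bool:
    """True when the item's media type is one of the primary sources."""
    return item["media_type"] in PRIMARY_SOURCES
```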
- FIG. 6B continues FIG. 6A at the point that content has been loaded for extraction.
- the concepts extraction module 442 and the timeline mapping module 444 illustrate several processing steps that may be undertaken sequentially or in parallel for each media type.
- relationships and tags are extracted from the loaded content.
- these relationships are the relationships between the content and the course, subject of study, jobs relevant to the field of study, and any other field that may be tracked by an education platform.
- the relationships may be explicitly tagged within the content catalog database 451 .
- These explicit tags can be fed into the concept data record or listing 604 .
- The tags may also be fed into the grouped content activities 605, which will be discussed in greater detail below.
- the loaded content can be sliced into logical groups.
- the logical groups may be determined based at least in part on the structure of the loaded content, for example a subsection of a textbook, a chapter of a study guide, or a paragraph of a course summary.
- the logical groups are diced into key phrases 607 using language analysis.
- the key phrases are candidates for concepts, and are composed of a combination of an “operator” and an “operand”. It can also be thought of as a verb and a direct object. It is this combination that uniquely identifies a particular concept.
- the “operator” is the action that a student learns to perform and the “operand” is the type of thing that the student learns to perform the action on. Either can be specific or broad.
- a major difference between the Humanities and Science is that the operands in the Humanities courses are very specific and the operators are very broad, while the opposite is true for Science courses.
- Key phrases typically have either a specific operator or a specific operand.
- Concepts where both are broad and generic are listed as undefined because they lack meaningful boundaries.
- Concepts where both are specific tend not to be as useful in some embodiments, although they are practical when learning to do repetitive tasks.
- the undefined key phrases which are identified may be listed separately and/or excluded from further processing.
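- The operator/operand screening described above can be sketched as a simple classification. How the specification decides that a term is "broad" is not stated; the two word lists below are purely hypothetical placeholders, as are the function name and the returned labels.

```python
# Hypothetical vocabularies of "broad" terms; a real system would derive
# these from language analysis rather than from fixed lists.
BROAD_OPERATORS = {"understand", "know", "learn", "analyze", "study"}
BROAD_OPERANDS = {"topics", "things", "material", "concepts"}


def classify_key_phrase(operator: str, operand: str) -> str:
    """Label a key phrase by how specific its operator and operand are."""
    operator_specific = operator.lower() not in BROAD_OPERATORS
    operand_specific = operand.lower() not in BROAD_OPERANDS
    if not operator_specific and not operand_specific:
        return "undefined"      # both broad: no meaningful boundaries
    if operator_specific and operand_specific:
        return "both_specific"  # usable, but mainly for repetitive tasks
    return "candidate"          # exactly one side is specific
```

- In this sketch a Humanities-style phrase such as ("analyze", "Hamlet") has a broad operator and a specific operand, so it is kept as a candidate concept.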
- the key phrases that are identified from the loaded content are stored in step 608 as part of the concept data record 604 .
- citations are extracted for character strings (e.g., character strings that have been identified as key phrases or may be text recitations of concepts). For example, for each loaded content item, the citations for a text string are indexed.
- the text strings are normalized in format, for example by removing unwanted characters, eliminating punctuation, and standardizing language (e.g., making nouns singular or plural, and/or truncating verbs, or the like). The normalized citations are then stored in step 611 as part of the concept data record 604 .
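- The normalization of text strings can be illustrated with a short sketch. The specification names the goals (removing unwanted characters, eliminating punctuation, standardizing language) but not the rules, so the lowercase folding and the deliberately crude singularization below are assumptions.

```python
import re


def normalize_citation(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace, crudely singularize."""
    # Replace anything that is not a word character or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", text.lower())
    words = []
    for word in cleaned.split():
        # Naive plural stripping; skips short words and common non-plural endings.
        if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "is", "us")):
            word = word[:-1]
        words.append(word)
    return " ".join(words)
```

- A production system would more likely rely on a proper stemmer or lemmatizer; the point here is only that differently formatted citations of the same string collapse to one key.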
- In step 612, the loaded content is tagged as stemming from passive, active, or recall activities undertaken by users.
- the tags are also passed to the concept data record 604 .
- In step 613, as part of the timeline mapping module 444, the loaded content is analyzed to extract the time that is relevant to the content (for example, when pages were read, when a quiz was completed, or when an assignment was started), as informed by the users activity logs 651.
- In step 614, by referencing the school syllabus data 652, the process can normalize the extracted time relevant to the content according to the school in order to determine a relative time within the course at which the content was acted upon.
- the normalized timing of activities is stored in the concepts data record 604 .
- the normalized timing of activities may be reported for use in updating the school syllabus data 652 to be responsive to adjustments in the flow of the course delivery.
- a class may linger on a learning unit longer than planned at the outset of the course, for example, and the reported normalized timing of activities can be used to dynamically update the course syllabus according to the reality of the course delivery.
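- One way to picture the normalization of an activity time against the school syllabus is as a fraction of the course that has elapsed. This is a sketch under that assumption; the function name and the clamping to the range 0 to 1 are illustrative choices, not taken from the specification.

```python
from datetime import datetime


def relative_course_time(event: datetime, course_start: datetime, course_end: datetime) -> float:
    """Return the fraction of the course elapsed when the activity occurred."""
    total = (course_end - course_start).total_seconds()
    if total <= 0:
        raise ValueError("course_end must be after course_start")
    fraction = (event - course_start).total_seconds() / total
    # Clamp activities logged slightly before the start or after the end.
    return min(1.0, max(0.0, fraction))
```

- Expressing times this way lets activities from schools with different calendars be compared on the same scale.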
- In step 614, also as part of the timeline mapping module 444, the loaded content is analyzed to extract the time duration of an activity, for example how long a user spent reading a chapter or working on a problem set, as informed by the users activity logs 651.
- For example, a student may have spent 7 hours studying a chapter of a textbook, as revealed by the elapsed time in each of the user's reading sessions for that chapter recorded in the users activity logs 651.
- the user's individual time can be normalized across users in step 616 by referring to users activity logs 651 to determine a typical duration for the activity.
- the user's specific duration or the normalized duration may also be stored in the concepts data record 604 .
- the normalized duration of activities may be reported, for example, for use in planning future iterations of a course.
- an appropriate duration of the learning unit and an appropriate number of learning units for a course can be planned so that the course fits within the school schedule and engages students at an appropriate level of involvement.
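- The duration steps above (summing a user's sessions for an activity, then deriving a typical duration across users) can be sketched as follows. Using the median as the "typical" value is an assumption; the specification does not name a statistic. The `(user_id, hours)` session layout is likewise hypothetical.

```python
from collections import defaultdict
from statistics import median


def typical_duration(sessions: list[tuple[str, float]]) -> float:
    """Sum session hours per user, then take the median across users."""
    per_user: dict[str, float] = defaultdict(float)
    for user_id, hours in sessions:
        per_user[user_id] += hours
    # median() raises StatisticsError when there are no sessions at all.
    return median(per_user.values())
```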
- In step 617, similar concepts can be combined and collapsed into one concept in the concepts index database 552 to avoid duplication and simplify the database 552.
- Such a recombination process may be performed iteratively as new content is loaded for extraction and analyzed. It is noted that while some concepts may be uncommon in a particular course, such as covering the personally favored topics of a particular instructor, a great many are shared between courses that share the same logical curricular block. For example, every course on “beginning linear algebra” covers the topic “linear independence.” Therefore, the extraction systems are effectively building up the list of concepts for courses over time even when the list is not complete for a given course.
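- The collapsing of similar concepts can be sketched with a string-similarity pass. The specification does not say how similarity is measured; the use of Python's `difflib.SequenceMatcher`, the 0.9 threshold, and the function name are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def collapse_concepts(concepts: list[str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Group concept strings whose similarity to a canonical form meets the threshold."""
    merged: dict[str, list[str]] = {}  # canonical form -> original variants
    for concept in concepts:
        key = concept.strip().lower()
        for canonical in merged:
            if SequenceMatcher(None, key, canonical).ratio() >= threshold:
                merged[canonical].append(concept)
                break
        else:
            # No sufficiently similar concept seen so far: start a new group.
            merged[key] = [concept]
    return merged
```

- Because the function can be called again on the canonical keys plus newly extracted strings, it fits the iterative recombination described above, at the cost of a quadratic number of comparisons.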
- the concepts index database 552 can be used together with the grouped content activities 605 to create a learning unit.
- the association between a concept from the concepts index database 552 and the passive, active, and recall activities 605 that a student undertakes to study the concept make up the learning unit, which is stored in the learning units database 445 .
- an interested party can search concepts by relationships or tags in step 618 through the concepts index database 552 , or an interested party can search learning units by concepts or activities in step 619 through the learning units database 445 .
- the interested party may be a student seeking to fill gaps in their education, a teacher planning a course, an administrator organizing a curriculum, an employer designing job requirements or seeking job applicants, or any other person or system interested in how students engage in their education on a digital education platform.
- FIG. 7 illustrates a predictive model of course organization, according to one embodiment.
- the course structured content activities are completed between a fixed start 701 and a fixed ending 702 of the course timeline.
- the course timeline includes a subdivision that predicts what activities a user studying the course will undertake in each of several time periods throughout the actual delivery of the course.
- the course is divided into eight equal time periods, but the time periods may vary in length in other examples.
- passive sessions will be interleaved with active sessions and followed by a recall session if available.
- the platform 450 can build a model of likely activities for each course in its catalog. The outcome of this is a predictive learning model for a course and/or class.
- the time interval, delta T, along with passive 472 , active 473 , and recall 474 activities, are predicted based on analytics from previous courses and classes that have been delivered. The model is then applied across multiple instances of that course, enabling direct comparisons between similar courses and cross indexing likely content activities and events.
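- The predictive timeline can be sketched as an equal subdivision of the course, with each period holding interleaved passive and active sessions followed by a recall session when one is available. The function names, the numeric day scale, and the fixed number of passive/active pairs per period are assumptions; in the described model these quantities are predicted from analytics of previously delivered courses.

```python
def split_timeline(start: float, end: float, periods: int = 8) -> list[tuple[float, float]]:
    """Divide the course timeline into equal periods of length delta T."""
    if periods <= 0 or end <= start:
        raise ValueError("need a positive number of periods and end > start")
    delta_t = (end - start) / periods
    return [(start + i * delta_t, start + (i + 1) * delta_t) for i in range(periods)]


def predict_sessions(pairs: int, has_recall: bool) -> list[str]:
    """Interleave passive and active sessions, then append recall if available."""
    plan: list[str] = []
    for _ in range(pairs):
        plan += ["passive", "active"]
    if has_recall:
        plan.append("recall")
    return plan
```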
- FIG. 8 illustrates an example course deconstruction into distinct learning units, according to one embodiment.
- the structure and delivery of a course can be expressed as the aggregation of passive, active, and recall on-line sessions that collectively summarize all the events and content activities associated to that course by the registered users during its actual delivery.
- the course is effectively atomized into a series of concepts and relevant activities, determining a new structure and organization for that course.
- The course referenced content (i.e., the textbook), which traditionally determines the structure of the course, is replaced with individual learning units that are more accurate indicators and representations of what students need to achieve.
- The modular nature of the learning units allows them to be rearranged, mixed with learning units from other courses, and aggregated in different orders to adapt learning materials to different learning styles, instructor preferences, or institutional goals, or for any other reason.
- each learning unit 801 is associated with a concept 802 and is associated with at least one activity.
- each concept 802 is supported by a passive 472 , an active 473 , and a recall 474 activity, but that need not always be the case.
- the extraction system 440 can fill in the activities for a given concept.
- the platform 450 suggests a wide range of products and services that can fulfill one of the activity types of the learning unit.
- The platform 450 can suggest additional reading for more passive learning, homework help and additional tutoring for active learning, and practice quizzes and tests for recall, all based on the indexed concepts and activities in the learning units database 445 that correspond to the learning units that a student is studying in a course. These can be useful to the user regardless of whether they are perfectly aligned to the particular assignments that are offered in the school because they are focused on mastering the same concepts. In this way, the platform services can monitor users to ensure they are on track to succeed and prod them to seek more help if they are falling behind.
- Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present invention could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
- the present invention also relates to an apparatus for performing the operations herein.
- This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer and run by a computer processor.
- A computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
- the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
- the present invention is not limited to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references to specific languages, such as HTML or HTML5, are provided for enablement and best mode of the present invention.
- the present invention is well suited to a wide variety of computer network systems over numerous topologies.
- the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
Description
- 1. Field of the Invention
- This invention relates to automated processing of education materials and services associated with digital education platforms.
- 2. Description of the Related Art
- The rising demand for high-skilled resources in a global economy is putting growing pressure on traditional education systems and environments. As too many students find themselves unable to compete effectively in today's job market, the need for education platforms to produce better tailored learning solutions is compounded by rising tuition costs amid challenging economies.
- While businesses at-large have embraced the digital revolution by providing increasingly sophisticated online services, education, by contrast, has been slow to adapt to new technologies in terms of infrastructure, curriculum, and publishing platforms. Typically, the structure of a course, with the required textbook as its central point of reference, has stayed remarkably monolithic over time, regardless of content or delivery formats.
- Furthermore, as interactive and other testing activities are progressively embedded within digital course offerings, it becomes increasingly complex to manage and organize these additional user-based content services into an integrated learning experience. As traditional courses are shifting from a static textbook-centric model to a connected one where related, personalized and other social-based content activities are being aggregated dynamically within the core academic material, it becomes strategic for education publishing platforms and their distribution systems to be able to translate these activities into new models of learning among a plurality of users and connected systems.
- An educational course is automatically deconstructed into discrete learning units. Content related to the course that has been stored by an integrated education platform is analyzed, and distinct concepts are extracted from the content. In addition, the learning activities in which users engage while accessing integrated learning services from the platform are recorded. These activities can generally be divided into passive, active, and recall activities. By deconstructing educational courses into individual concepts, a general model of learning is then applied that connects concepts to the activities undertaken by students to learn those concepts. As a result, a model of learning is developed where courses are atomized into individual learning units, each of which comprises a concept and at least one learning activity. The learning units then can be delivered independently or aggregated as desired.
- The features and advantages described in this summary and the following detailed description are not all-inclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims.
- FIG. 1 illustrates an example publishing platform, according to one embodiment.
- FIG. 2 is a block diagram illustrating interactions with a publishing platform, according to one embodiment.
- FIG. 3 illustrates a document reconstruction process, according to one embodiment.
- FIG. 4 illustrates an automated course deconstruction system, according to one embodiment.
- FIG. 5 illustrates the operation of automated learning units extraction systems, according to one embodiment.
- FIGS. 6A and 6B illustrate a process of learning unit extraction performed by automated learning units extraction systems, according to one embodiment.
- FIG. 7 illustrates a predictive model of course organization, according to one embodiment.
- FIG. 8 illustrates an example course deconstruction into distinct learning units, according to one embodiment.
- The figures depict various embodiments of the present invention for purposes of illustration only. One skilled in the art will readily recognize from the following discussion that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles of the invention described herein.
- Embodiments of the invention identify and organize the learning activities in which students engage during their education. By deconstructing educational courses into individual concepts, a general model of learning is then applied in order to predict activities in which students will engage. Embodiments of the invention will be described in the context of a versatile education social learning platform for digital content interactive services distribution and consumption. In the platform, personalized learning services are paired with secured distribution and analytics systems for reporting on both connected user activities and effectiveness of deployed services. By analyzing key activities that users are engaging in while accessing integrated learning services, the platform is able to deconstruct courses into individual concepts and pair these concepts to activities that users are likely to do. As a result, a model of learning is developed where courses are atomized into individual “learning units” that can be expressed independently or aggregated as desired.
-
FIG. 1 is a high-level block diagram illustrating theeducation platform environment 100. Theeducation platform environment 100 is organized around four function blocks:content 101,management 102,delivery 103, and experience 104. -
Content block 101 automatically gathers and aggregates content from a large number of sources, categories, and partners. Whether the content is curated, perishable, on-line, or personal, these systems define the interfaces and processes to automatically collect various content sources into a formalized staging environment. -
Management block 102 comprises five blocks with respective submodules:ingestion 120,publishing 130,distribution 140,back office system 150, andeCommerce system 160. Theingestion module 120, including staging, validation, and normalization subsystems, ingests published documents that may be in a variety of different formats, such as PDF, ePUB2, ePUB3, SVG, XML, or HTML. The ingested document may be a book, such as a textbook, a set of self-published notes, or any other published document, and may be subdivided in any manner. For example, the document may have a plurality of pages organized into chapters, which could be further divided into one or more sub-chapters. Each page may have text, images, tables, graphs, or other items distributed across the page. - After ingestion, the documents are passed to the
publishing system 130, which in one embodiment includes transformation, correlation, and metadata subsystems. If the document ingested by theingestion module 120 is not in a markup language format, thepublishing system 130 automatically identifies, extracts, and indexes all the key elements and composition of the document to reconstruct it into a modern, flexible, and interactive HTML5 format. The ingested documents are converted into markup language documents well-suited for distribution across various computing devices. In one embodiment, thepublishing system 130 reconstructs published documents so as to accommodate dynamic add-ons, such as user-generated and related content, while maintaining page fidelity to the original document. The transformed content preserves the original page structure including pagination, number of columns and arrangement of paragraphs, placement and appearance of graphics, titles and captions, and fonts used, regardless of the original format of the source content and complexity of the layout of the original document. - The page structure information is assembled into a document-specific table of contents describing locations of chapter headings and sub-chapter headings within the reconstructed document, as well as locations of content within each heading. During reconstruction, document metadata describing a product description, pricing, and terms (e.g., whether the content is for sale, rent, or subscription, or whether it is accessible for a certain time period or geographic region, etc.) are also added to the reconstructed document.
- The reconstructed document's table of contents indexes the content of the document into a description of the overall structure of the document, including chapter headings and sub-chapter headings. Within each heading, the table of contents identifies the structure of each page. As content is added dynamically to the reconstructed document, the content is indexed and added to the table of contents to maintain a current representation of the document's structure. The process performed by the
publishing system 130 to reconstruct a document and generate a table of contents is described further with respect toFIG. 3 . - The
distribution system 140 packages content for delivery, uploads the content to content distribution networks, and makes the content available to end users based on the content's digital rights management policies. In one embodiment, thedistribution system 140 includes digital content management, content delivery, and data collection and analysis subsystems. - Whether the ingested document is in a markup language document or is reconstructed by the
publishing system 130, thedistribution system 140 may aggregate additional content layers from numerous sources into the ingested or reconstructed document. These layers, including related content, advertising content, social content, and user-generated content, may be added to the document to create a dynamic, multilayered document. For example, related content may comprise material supplementing the foundation document, such as study guides, self-testing material, solutions manuals, glossaries, or journal articles. Advertising content may be uploaded by advertisers or advertising agencies to the publishing platform, such that advertising content may be displayed with the document. Social content may be uploaded to the publishing platform by the user or by other nodes (e.g., classmates, teachers, authors, etc.) in the user's social graph. Examples of social content include interactions between users related to the document and content shared by members of the user's social graph. User-generated content includes annotations made by a user during an eReading session, such as highlighting or taking notes. In one embodiment, user-generated content may be self-published by a user and made available to other users as a related content layer associated with a document or as a standalone document. - As layers are added to the document, page information and metadata of the document are referenced by all layers to merge the multilayered document into a single reading experience. The
publishing system 130 may also add information describing the supplemental layers to the reconstructed document's table of contents. Because the page-based document ingested into the management block 102 or the reconstructed document generated by the publishing system 130 is referenced by all associated content layers, the ingested or reconstructed document is referred to herein as a “foundation document,” while the “multilayered document” refers to a foundation document and the additional content layers associated with the foundation document.
- The back-office system 150 of management block 102 enables business processes such as human resources tasks, sales and marketing, customer and client interactions, and technical support. The eCommerce system 160 interfaces with the back-office system 150, the publishing system 130, and the distribution system 140 to integrate marketing, selling, servicing, and receiving payment for digital products and services. -
Delivery block 103 of an educational digital publication and reading platform distributes content for user consumption by, for example, pushing content to edge servers on a content delivery network. Experience block 104 manages user interaction with the publishing platform through browser application 170 by updating content, reporting users' reading and other educational activities to be recorded by the platform, and assessing network performance. - In the example illustrated in
FIG. 1, the content distribution and protection system is interfaced directly between the distribution sub-system 140 and the browser application 170, essentially integrating the digital content management (DCM), content delivery network (CDN), delivery modules, and eReading data collection interface for capturing and serving all users' content requests. By having content served dynamically and mostly on-demand, the content distribution and protection system effectively authorizes the download of one page of content at a time through time-sensitive dedicated URLs which only stay valid for a limited time, for example a few minutes in one embodiment, all under control of the platform service provider. - The platform content catalog is a mosaic of multiple content sources which are collectively processed and assembled into the overall content service offering. The content catalog is based upon multilayered publications that are created from reconstructed foundation documents augmented by supplemental content material resulting from users' activities and platform back-end processes.
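By way of illustration only, the time-sensitive dedicated URLs described above could be realized with an expiry timestamp and a keyed signature. The following Python sketch is an assumption and not the platform's actual mechanism: the function names `make_page_url` and `is_valid`, the query parameters `expires` and `sig`, the 300-second lifetime, and the host `cdn.example.com` are all hypothetical.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse


def _sign(secret, doc_id, page, expires):
    """Keyed SHA-256 signature over the page identity and its expiry time."""
    message = f"{doc_id}:{page}:{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_page_url(base, doc_id, page, secret, now, ttl_seconds=300):
    """Build a URL for one page that stops validating after ttl_seconds."""
    expires = int(now) + ttl_seconds
    query = urlencode({"expires": expires, "sig": _sign(secret, doc_id, page, expires)})
    return f"{base}/{doc_id}/{page}?{query}"


def is_valid(url, secret, now):
    """Return True only if the signature matches and the URL has not expired."""
    parsed = urlparse(url)
    try:
        doc_id, page = parsed.path.strip("/").split("/")[-2:]
        query = parse_qs(parsed.query)
        expires = int(query["expires"][0])
        signature = query["sig"][0]
    except (KeyError, ValueError, IndexError):
        return False
    expected = _sign(secret, doc_id, page, expires)
    # Constant-time comparison avoids leaking signature bytes through timing.
    return hmac.compare_digest(signature.encode(), expected.encode()) and now < expires
```

In this sketch a URL issued at time 1000 validates at time 1100 but not at time 1300, and changing the page number in the path invalidates the signature, so each URL authorizes exactly one page for a limited time.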
FIG. 2 illustrates an example of a publishing platform where multilayered content document services are assembled and distributed to desktop, mobile, tablet, and other connected devices. As illustrated in FIG. 2, the process is typically segmented into three phases: Phase 1: creation of the foundation document layer; Phase 2: association of the content service layers to the foundation document layer; and Phase 3: management and distribution of the content. - During
Phase 1, the licensed document is ingested into the publishing platform and automatically reconstructed into a series of basic elements, while maintaining page fidelity to the original document structure. Document reconstruction will be described in more detail below with reference to FIG. 2. - During
Phase 2, once a foundation document has been reconstructed and its various elements extracted, the publishing platform runs several processes to enhance the reconstructed document and transform it into a personalized multilayered content experience. For instance, several distinct processes are run to identify the content related to the reconstructed document, user generated content created by registered users accessing the reconstructed document, advertising or merchandising material that can be identified by the platform and indexed within the foundation document and its layers, and finally social network content resulting from registered users' activities. Because each of these processes focuses on specific classes of content and databases, the elements referenced within each class are identified by their respective content layer. Specifically, all the related content page-based elements that are matched with a particular reconstructed document are classified as part of the related content layer. Similarly, all other document enhancement processes, including user generated, advertising and social among others, are classified by their specific content layer. The outcome of Phase 2 is a series of static and dynamic page-based content layers that are logically stacked on top of each other and which collectively enhance the reconstructed foundation document. - During
Phase 3, once the various content layers have been identified and processed, the resulting multilayered content is then published to the platform content catalog and pushed to the content servers and distribution network for distribution. By having multilayered content services served dynamically and on-demand through secured authenticated web sessions, the content distribution systems are effectively authorizing and directing the real-time download of page-based layered content services to a user's paired devices. These devices access the services through time-sensitive dedicated URLs which, in one embodiment, only stay valid for a few minutes, all under control of the platform service provider. The browser-based applications are embedded, for example, into HTML5 compliant web browsers which control the fetching, requesting, synchronization, prioritization, normalization and rendering of all available content services. - The
publishing system 130 receives original documents for reconstruction from the ingestion system 120 illustrated in FIG. 1. In one embodiment, a series of modules of the publishing system 130 are configured to perform the document reconstruction process. -
FIG. 3 illustrates a process within the publishing system 130 for reconstructing a document. Embodiments are described herein with reference to an original document in the Portable Document Format (PDF) that is ingested into the publishing system 130. However, the format of the original document is not limited to PDF; other unstructured document formats can also be reconstructed into a markup language format by a similar process. - A PDF page contains one or more content streams, which include a sequence of objects, such as path objects, text objects, and external objects. A path object describes vector graphics made up of lines, rectangles, and curves. Paths can be stroked or filled with colors and patterns as specified by the operators at the end of the path object. A text object comprises character strings identifying sequences of glyphs to be drawn on the page. The text object also specifies the encodings and fonts for the character strings. An external object (XObject) defines an outside resource, such as a raster image in JPEG format. An XObject of an image contains image properties and an associated stream of the image data.
- During
image extraction 301, graphical objects within a page are identified and their respective regions and bounding boxes are determined. For example, a path object in a PDF page may include multiple path construction operators that describe vector graphics made up of lines, rectangles, and curves. Metadata associated with each of the images in the document page is extracted, such as resolutions, positions, and captions of the images. Resolution of an image is often measured by horizontal and vertical pixel counts in the image; higher resolution means more image details. The image extraction process may extract the image in the original resolution as well as other resolutions targeting different eReading devices and applications. For example, a large XVGA image can be extracted and down sampled to QVGA size for a device with QVGA display. The position information of each image may also be determined. The position information of the images can be used to provide page fidelity when rendering the document pages in eReading browser applications, especially for complex documents containing multiple images per page. A caption associated with each image that defines the content of the image may also be extracted by searching for key words, such as “Picture”, “Image”, and “Tables”, from text around the image in the original page. The extracted image metadata for the page may be stored to the overall document metadata and indexed by the page number. -
Image extraction 301 may also extract tables, comprising graphics (horizontal and vertical lines), text rows, and/or text columns. The lines forming the tables can be extracted and stored separately from the rows and columns of the text. - The image extraction process may be repeated for all the pages in the ingested document until all images in each page are identified and extracted. At the end of the process, an image map that includes all graphics, images, tables and other graphic elements of the document is generated for the eReading platform.
- During
text extraction 302, text and embedded fonts are extracted from the original document and the locations of the text elements on each page are identified. - Text is extracted from the pages of the original document tagged as having text. The text extraction may be done at the individual character level, together with markers separating words, lines, and paragraphs. The extracted text characters and glyphs are represented by the Unicode character mapping determined for each. The position of each character is identified by its horizontal and vertical locations within a page. For example, if an original page is in A4 standard size, the location of a character on the page can be defined by its X and Y location relative to the A4 page dimensions. In one embodiment, text extraction is performed on a page-by-page basis. Embedded fonts may also be extracted from the original document, and are stored and referenced by client devices for rendering the text content.
- The pages in the original document having text are tagged as having text. In one embodiment, all the pages with one or more text objects in the original document are tagged. Alternatively, only the pages without any embedded text are marked.
- The output of
text extraction 302, therefore, is a dataset referenced by the page number, comprising the characters and glyphs in a Unicode character mapping with associated location information and embedded fonts used in the original document. - Text coalescing 303 coalesces the text characters previously extracted. In one embodiment, the extracted text characters are coalesced into words, words into lines, lines into paragraphs, and paragraphs into bounding boxes and regions. These steps leverage the known attributes about extracted text in each page, such as information on the text position within the page, text direction (e.g., left to right, or top to bottom), font type (e.g., Arial or Courier), font style (e.g., bold or italic), expected spacing between characters based on font type and style, and other graphics state parameters of the pages.
- In one embodiment, text coalescence into words is performed based on spacing. The spacing between adjacent characters is analyzed and compared to the expected character spacing based on the known text direction, font type, style, and size, as well as other graphics state parameters, such as character-spacing and zoom level. Despite different rendering engines adopted by the
browser applications 170, the average spacing between adjacent characters within a word is smaller than the spacing between adjacent words. For example, a string of “Berriesaregood” represents extracted characters without considering spacing information. Once taking the spacing into consideration, the same string becomes “Berries are good,” in which the average character spacing within a word is smaller than the spacing between words. - Additionally or alternatively, extracted text characters may be assembled into words based on semantics. For example, the string of “Berriesaregood” may be input to a semantic analysis tool, which matches the string to dictionary entries or Internet search terms, and outputs the longest match found within the string. The outcome of this process is a semantically meaningful string of “Berries are good.” In one embodiment, the same text is analyzed by both spacing and semantics, so that word grouping results may be verified and enhanced.
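As a non-limiting illustration of the spacing-based grouping described above, the following Python sketch splits a run of extracted glyphs into words wherever the gap between adjacent glyphs is noticeably larger than the typical gap. The `Glyph` record, the `gap_factor` threshold, and the `layout` helper that fabricates glyph positions for the demonstration are assumptions introduced here, not elements of the described system; the semantic (dictionary-based) approach is not shown.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Glyph:
    char: str     # Unicode character mapped for the glyph
    x: float      # left edge of the glyph on the page
    width: float  # horizontal extent of the glyph


def coalesce_words(glyphs, gap_factor=2.0):
    """Group left-to-right glyphs into words using inter-glyph gaps.

    gap_factor is a hypothetical tuning value: a gap wider than
    gap_factor times the median gap is treated as a word boundary.
    """
    if not glyphs:
        return []
    # Gap between the right edge of one glyph and the left edge of the next.
    gaps = [b.x - (a.x + a.width) for a, b in zip(glyphs, glyphs[1:])]
    typical = median(gaps) if gaps else 0.0
    threshold = max(typical * gap_factor, 1e-9)
    words, current = [], glyphs[0].char
    for glyph, gap in zip(glyphs[1:], gaps):
        if gap > threshold:
            words.append(current)
            current = glyph.char
        else:
            current += glyph.char
    words.append(current)
    return words


def layout(text, char_width=5.0, char_gap=0.5, word_gap=3.0):
    """Demonstration helper: fabricate glyph positions, dropping the spaces."""
    glyphs, x = [], 0.0
    for ch in text:
        if ch == " ":
            x += word_gap - char_gap  # widen the gap before the next glyph
            continue
        glyphs.append(Glyph(ch, x, char_width))
        x += char_width + char_gap
    return glyphs
```

With these synthetic positions, `coalesce_words(layout("Berries are good"))` recovers the three words even though no space characters are present in the glyph list.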
- Words may be assembled into lines by determining an end point of each line of text. Based on the text direction, the horizontal spacing between words may be computed and averaged. The end point may have word spacing larger than the average spacing between words. For example, in a two-column page, the end of the line of the first column may be identified based on it having a spacing value much larger than the average word spacing within the column. On a single column page, the end of the line may be identified by the space after a word extending to the side of the page or bounding box.
- After determining the end point of each line, lines may be assembled into paragraphs. Based on the text direction, the average vertical spacing between consecutive lines can be computed. The end of the paragraph may have a vertical spacing that is larger than the average. Additionally or alternatively, semantic analysis may be applied to relate syntactic structures of phrases and sentences, so that meaningful paragraphs can be formed.
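The paragraph grouping described above can be sketched in a similar, non-limiting way. The Python example below assumes each line has already been assembled and is represented by its vertical position and text; the `(y, text)` tuple layout and the `factor` margin over the average spacing are illustrative assumptions, and the semantic analysis alternative is not shown.

```python
from statistics import mean


def group_paragraphs(lines, factor=1.2):
    """Group lines into paragraphs by vertical spacing.

    lines: list of (y, text) tuples ordered top to bottom, where y is the
    vertical position of the line on the page.
    factor: hypothetical margin; a gap larger than factor times the average
    line-to-line spacing is treated as the end of a paragraph.
    """
    if not lines:
        return []
    gaps = [b[0] - a[0] for a, b in zip(lines, lines[1:])]
    average = mean(gaps) if gaps else 0.0
    paragraphs, current = [], [lines[0][1]]
    for (_, text), gap in zip(lines[1:], gaps):
        if gap > average * factor:
            paragraphs.append(" ".join(current))
            current = [text]
        else:
            current.append(text)
    paragraphs.append(" ".join(current))
    return paragraphs
```

For example, five lines at vertical positions 0, 12, 24, 48, and 60 have an average spacing of 15, so only the 24-unit gap exceeds the threshold and the lines are grouped into two paragraphs.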
- The identified paragraphs may be assembled into bounding boxes or regions. In one embodiment, the paragraphs may be analyzed based on lexical rules associated with the corresponding language of the text. A semantic analyzer may be executed to identify punctuation at the beginning or end of a paragraph. For example, a paragraph may be expected to end with a period. If the end of a paragraph does not have a period, the paragraph may continue either on a next column or a next page. The syntactic structures of the paragraphs may be analyzed to determine the text flow from one paragraph to the next, and may combine two or more paragraphs based on the syntactic structure. If multiple combinations of the paragraphs are possible, reference may be made to an external lexical database, such as WORDNET®, to determine which paragraphs are semantically similar.
- In
fonts mapping 304, in one embodiment, a Unicode character mapping for each glyph in a document to be reconstructed is determined. The mapping ensures that no two glyphs are mapped to a same Unicode character. To achieve this goal, a set of rules is defined and followed, including applying the Unicode mapping found in the embedded font file; determining the Unicode mapping by looking up postscript character names in a standard table, such as a system TrueType font dictionary; and determining the Unicode mapping by looking for patterns, such as hex codes, postscript name variants, and ligature notations. - For those glyphs or symbols that cannot be mapped by following the above rules, pattern recognition techniques may be applied on the rendered font to identify Unicode characters. If pattern recognition is still unsuccessful, the unrecognized characters may be mapped into the private use area (PUA) of Unicode. In this case, the semantics of the characters are not identified, but the encoding uniqueness is guaranteed. As such, rendering ensures fidelity to the original document.
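The rule cascade and private use area fallback described above might be sketched as follows. This Python example is illustrative only: the function name `map_glyphs` and the two lookup dictionaries are assumptions, the pattern rule handles only glyph names of the form `uniXXXX` (four hexadecimal digits), and the pattern recognition step on the rendered font is omitted.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # Private Use Area of the Basic Multilingual Plane


def map_glyphs(glyph_names, embedded_map, name_table):
    """Assign every glyph name a Unicode code point that no other glyph shares.

    embedded_map: glyph name -> code point taken from the embedded font file.
    name_table:   glyph name -> code point from a standard name table.
    Glyphs that no rule resolves, or whose code point is already taken,
    receive the next free code point in the private use area.
    """
    assigned, used = {}, set()
    next_pua = PUA_START
    for name in glyph_names:
        if name in assigned:
            continue
        code = embedded_map.get(name)            # rule 1: embedded font mapping
        if code is None:
            code = name_table.get(name)          # rule 2: standard name table
        if code is None and len(name) == 7 and name.startswith("uni"):
            try:
                code = int(name[3:], 16)         # rule 3: hex-code name pattern
            except ValueError:
                code = None
        if code is None or code in used:
            while next_pua in used:
                next_pua += 1
            if next_pua > PUA_END:
                raise ValueError("private use area exhausted")
            code = next_pua                      # fallback: unique PUA code point
        assigned[name] = code
        used.add(code)
    return assigned
```

In this sketch the semantics of a glyph mapped into the private use area remain unknown, but every glyph still receives a distinct code point, which is what preserves rendering fidelity.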
- In table of
contents optimization 305, content of the reconstructed document is indexed. In one embodiment, the indexed content is aggregated into a document-specific table of contents that describes the structure of the document at the page level. For example, when converting printed publications into electronic documents with preservation of page fidelity, it may be desirable to keep the digital page numbering consistent with the numbering of the original document pages. - The table of contents may be optimized at different levels of the table. At the primary level, the chapter headings within the original document, such as headings for a preface, chapter numbers, chapter titles, an appendix, and a glossary may be indexed. A chapter heading may be found based on the spacing between chapters. Alternatively, a chapter heading may be found based on the font face, including font type, style, weight, or size. For example, the headings may have a font face that is different from the font face used throughout the rest of the document. After identifying the headings, the number of the page on which each heading is located is retrieved.
- At a secondary level, sub-chapter headings within the original document may be identified, such as dedications and acknowledgments, section titles, image captions, and table titles. Vertical spacing between sections, text, and/or font face may be used to segment each chapter. For example, each chapter may be parsed to identify all occurrences of the sub-chapter heading font face, and determine the page number associated with each identified sub-chapter heading.
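One possible, simplified way to find headings by font face is sketched below in Python. It assumes that the most frequent font face in the document is the body text and that any other face marks a heading, with larger sizes ranked as higher levels; the `Line` record and `build_toc` function are hypothetical, and spacing-based detection is not shown.

```python
from collections import Counter
from typing import NamedTuple


class Line(NamedTuple):
    page: int    # page number in the original document
    font: str    # font name, including style or weight
    size: float  # font size in points
    text: str


def build_toc(lines):
    """Return (level, heading text, page) entries in reading order."""
    if not lines:
        return []
    # Assumption: the most common font face is the body text.
    body_face = Counter((l.font, l.size) for l in lines).most_common(1)[0][0]
    heading_faces = sorted(
        {(l.font, l.size) for l in lines if (l.font, l.size) != body_face},
        key=lambda face: (-face[1], face[0]),  # larger faces are higher levels
    )
    level = {face: i + 1 for i, face in enumerate(heading_faces)}
    return [
        (level[(l.font, l.size)], l.text, l.page)
        for l in lines
        if (l.font, l.size) in level
    ]
```

Because each entry keeps the page number of the original document, the resulting table of contents stays consistent with the original page numbering.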
-
FIG. 4 illustrates an automated educational course deconstruction system, according to one embodiment. FIG. 4 demonstrates the interaction between learning units extraction systems 440, the education platform 450, and the HTML5 browser environment 470. In some embodiments, the learning units extraction systems 440 may be integrated as part of the platform environment 100 illustrated in FIG. 1, and in other embodiments they may be separate systems. Likewise, the education platform 450 may have components in common with the functional blocks of the platform environment 100, and the HTML5 browser environment 470 may be the same as the eReading application 170 of the experience block 104 of the platform environment 100, or the functionality may be implemented in different modules. - The
education platform 450 serves the education services to registered users 471 based on a process of requesting and fetching on-line services in the context of authenticated on-line sessions. In the example illustrated in FIG. 4, the education platform 450 includes a content catalog database 451, publishing systems 452, content distribution systems 453, and reporting systems 454. The content catalog database 451 contains the collection of content available via the education platform 450. The content catalog database 451 feeds the content to the publishing systems 452. The publishing systems 452 serve the content to registered users 471 via the content distribution system 453. Reporting systems 454 receive reports of user experience and user activities from the connected devices 470 operated by the registered users 471. This feedback is used by content distribution system 453 for managing the distribution of the content and for capturing UGC and other forms of user activities to add to the content catalog database 451. - The learning
units extraction systems 440 receive published content from the publishing systems 452 for analysis, and provide a mapping of concepts to activities for storage in the learning units database 445. The learning units extraction systems 440 include modules for content analysis 441, concepts extraction 442, activities mapping 443, and timeline mapping 444, and include a learning units database 445. - The
content analysis module 441 analyzes the content available from the content catalog database 451. This includes content added by registered users 471 through their interactions with the education platform 450. The content analysis module 441 collects and prepares related content for further processing by the learning units extraction systems 440. - The
concepts extraction module 442 extracts concepts from the analyzed content to determine a list of concepts. The extracted list of concepts is stored in the learning units database 445. - The
activities mapping module 443 determines which activities undertaken by the registered users 471 are related to which concepts. The activities mapping module 443 stores the association in the learning units database 445. Because the content and other services originate from the same platform environment 100, the users' activities are analyzed and correlated to each other. These activities can be aggregated over time into distinct categories. These activities are broadly categorized as passive 472, active 473, and recall 474. Each concept is mapped to at least one type of user activity, and may be mapped to all three types of user activities. -
Passive activities 472 include activities where registered users are passively interacting with published academic content materials associated with a particular course. For example, the reference materials for a course typically include one or more published documents, such as a textbook, summary, syllabus, and other digital related content which are aggregated at the course level and accessible from the registered users' connected devices. These activities are defined as “passive” because they are typically orchestrated by each user around multiple on-line reading authenticated sessions when accessing the structured HTML course-based referenced documents. By directly handling the fetching and requesting of all HTML course-based document pages for its registered users, the connected education platform analyzes the passive reading activities within each course, correlating how registered users are interacting with the referenced academic content within any course delivery. - Activities are defined as “active” when registered users are interacting with course-defined referenced academic documents by creating their own user generated content (UGC) layer as managed by the platform services. By contrast to “passive” activities, where content is predetermined, static and structured as part of a course description, the process of creating user generated content is unique to each user in terms of actual material, format, frequency, or structure, for example. In this instance, UGC is defined by the creation of personal notes, highlights, asking or answering questions, and other comments, or interacting with other registered users 471 through the
education platform 450 while accessing the referenced course-based HTML documents. Other types of UGC include seeking support when help is needed, running step-by-step problems associated with particular sections of course-based HTML documents, connecting and exchanging feedback with peers, among others. These UGC activities are authenticated through on-line “active” sessions that are processed and correlated by the platform content distribution system 453 and reporting system 454. By directly handling the fetching and requesting of all UGC content for registered users, the platform 450 can correlate how registered users add their UGC layer within any course delivery. - Activities are defined as “recall” activities when registered users are being tested against the knowledge acquired from their previous passive and active sessions. By contrast to the previous passive and active sessions, recalls can be orchestrated around predetermined content material combined with user generated content. For instance, the assignments, quizzes and other testing materials associated with the particular course and its curriculum are typically predefined and offered to registered users as structured documents that are enhanced once personal content is added into them. Typically, a set of predetermined questions which are aggregated by the
platform 450 into a digital testing material is described as a structured HTML document that is published either as a stand-alone document or as supplemental to a course-based document. By contrast, the individual answers to these questions are expressed as UGC in some testing-like activities. When registered users are answering questions as part of a testing exercise within a course delivery, the resulting authenticated on-line sessions are processed and correlated by the platform content distribution 453 and reporting systems 454. By directly handling the fetching and requesting of all testing content for registered users, the platform 450 can correlate how registered users interact with the testing documents within any course delivery. - The
timeline mapping module 444 determines the starting point and/or end point of activities that are recorded. The timeline mapping module 444 stores the respective times associated with the activities in the learning units database 445. -
FIG. 5 illustrates the operation of automated learning units extraction systems 440, according to one embodiment. The extraction of learning units drives the identification of activities in which users can be predicted to engage during the delivery of a course. Generally, the extraction is performed by atomizing the courses that users take into individual learning concepts. A general model of learning is then applied to these individual concepts in order to determine which activities the users are likely to perform. - In the example of
FIG. 5, the course structured content library 550 is made up of data that supports passive 472, active 473, and recall 474 activities that a registered user 471 may undertake as part of the user's study of at least one course. The course structured content library 550 may exist within the content catalog database 451. The content analysis module 441 analyzes the materials that make up and/or are generated by these passive 472, active 473, and recall 474 activities along with additional documents from the content catalog database 451, and indexes them for the concepts extraction module 442. The concepts extraction module 442 ranks the content affiliated with each course of a plurality of courses and processes the content by extracting and normalizing the content into a unique combination of operands and operators that characterize the respective course. Each extracted combination of operand and operator forms a concept. As concepts are extracted by the concepts extraction module 442, they are indexed into the concepts index database 552. Then, the concepts are mapped by the activities mapping module 443 to the activities that engage users who are studying those concepts as part of at least one course. A timeline mapping module 444 also maps the activities to a timeline by start time, end time, and/or elapsed time of the activities that are undertaken. The timeline mapping is informed by a school syllabus database 554 that contains information about course dates, lesson plans, or the like. The respective mappings of activities and timeline for each concept are stored in the learning units database 553. - As depicted in
FIG. 5, course learning units 555 are the output of the learning units extraction systems 440. Course learning units 555 are composed of the mapping between a concept and the activities that are performed by a user that are related to that concept within the time boundaries of an educational course. The coupling of one concept to at least one learning activity collectively defines a discrete learning unit. The learning unit attributes are expressed as the unique combination of a single concept with its mapped activities. A course is composed of a plurality of learning units, which may each be associated with a start time, an end time, a length of time, or an elapsed time in which the learning unit is studied and the activities associated with the learning unit are performed. Once a course has been deconstructed into a plurality of learning units, the learning units can be shuffled into different orders. Alternatively or additionally, once a plurality of courses have been deconstructed into discrete learning units, the learning units can be mixed, matched, or assembled into new courses. -
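For illustration only, a learning unit as described above can be modeled as a record that couples one concept to its mapped activities and optional timing. In the Python sketch below, the class names `Concept` and `LearningUnit`, the field names, the string labels for the three activity types, and the helper functions `assemble_course` and `shuffled` are all assumptions chosen for the example rather than elements of the described system.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional

ACTIVITY_TYPES = ("passive", "active", "recall")


@dataclass(frozen=True)
class Concept:
    operator: str  # the action a student learns to perform
    operand: str   # the kind of thing the action is performed on


@dataclass
class LearningUnit:
    concept: Concept
    activities: Dict[str, List[str]]  # activity type -> content item ids
    start: Optional[float] = None     # e.g. fraction of the term elapsed
    end: Optional[float] = None

    def __post_init__(self):
        unknown = set(self.activities) - set(ACTIVITY_TYPES)
        if unknown:
            raise ValueError(f"unknown activity types: {sorted(unknown)}")
        # A learning unit couples one concept to at least one activity.
        if not any(self.activities.values()):
            raise ValueError("a learning unit needs at least one mapped activity")


def assemble_course(*courses):
    """Mix units from deconstructed courses, keeping the first unit per concept."""
    seen, merged = set(), []
    for course in courses:
        for unit in course:
            if unit.concept not in seen:
                seen.add(unit.concept)
                merged.append(unit)
    return merged


def shuffled(units, seed=None):
    """Return the same learning units in a different order."""
    return random.Random(seed).sample(units, k=len(units))
```

Representing the concept as an immutable operator and operand pair makes it usable as a key, so units drawn from several courses can be de-duplicated by concept when a new course is assembled.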
FIGS. 6A-B illustrate a process of learning unit extraction performed by automated learning units extraction systems, according to one embodiment. As illustrated in FIG. 6A, content analysis 441 begins with materials from the platform content catalog database 451 which have been associated with passive, active, and recall user activities. In step 601, content media types are identified. The media types may be inferred from data from the content catalog database 451 or the users' content activities, or the media types may be explicitly tagged. Certain media types, such as a summary of a course and a course syllabus, may tend to be more reliable and indicative of the learning units that will be studied in a course than other media types, such as related content and UGC, which may only be tangentially related to the learning units studied in the course. The identification of the media types in step 601 allows the media types to be sorted in step 602. The media types may be sorted, for example, in descending order of reliability or importance in terms of containing valuable concepts for extraction as the basis of a learning unit. In one embodiment, the media types are ordered as follows: summary, syllabus, textbook, related content, UGC, Q&A, and testing materials, with all other materials following. The summary, syllabus, and textbook may be considered primary sources, whereas the remainder of the sources may be considered secondary sources. The content is loaded in the sorted order for extraction. -
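The ordering of media types in steps 601 and 602 can be illustrated with a short Python sketch. The priority list follows the order given above for one embodiment; the dictionary-based content items, the `media_type` key, and the function names are assumptions made for the example.

```python
# Order given for one embodiment; any other media type sorts after these.
MEDIA_PRIORITY = [
    "summary", "syllabus", "textbook",
    "related content", "UGC", "Q&A", "testing materials",
]
PRIMARY_SOURCES = set(MEDIA_PRIORITY[:3])


def sort_for_extraction(items):
    """Order content items by descending reliability of their media type.

    items: list of dicts with a hypothetical "media_type" key.
    Unknown media types are placed last; ties keep their original order
    because Python's sort is stable.
    """
    rank = {media: i for i, media in enumerate(MEDIA_PRIORITY)}
    return sorted(items, key=lambda item: rank.get(item["media_type"], len(MEDIA_PRIORITY)))


def is_primary(item):
    """Summary, syllabus, and textbook are treated as primary sources."""
    return item["media_type"] in PRIMARY_SOURCES
```

Loading content in this order means that concepts from the more reliable primary sources are encountered before those from secondary sources such as related content and UGC.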
FIG. 6B continues FIG. 6A at the point where content has been loaded for extraction. The concepts extraction module 442 and the timeline mapping module 444 carry out several processing steps that may be undertaken sequentially or in parallel for each media type. - In
step 603, relationships and tags are extracted from the loaded content. In one embodiment, these relationships are the relationships between the content and the course, subject of study, jobs relevant to the field of study, and any other field that may be tracked by an education platform. The relationships may be explicitly tagged within the content catalog database 451. These explicit tags can be fed into the concept data record or listing 604. The tags may also be fed into the grouped content activities 605, which will be discussed in greater detail below. - In
step 606, the loaded content can be sliced into logical groups. The logical groups may be determined based at least in part on the structure of the loaded content, for example a subsection of a textbook, a chapter of a study guide, or a paragraph of a course summary. Then, in step 607, the logical groups are diced into key phrases 607 using language analysis. Generally, the key phrases are candidates for concepts, and are composed of a combination of an “operator” and an “operand”. A key phrase can also be thought of as a verb and a direct object. It is this combination that uniquely identifies a particular concept. The “operator” is the action that a student learns to perform and the “operand” is the type of thing that the student learns to perform the action on. Either can be specific or broad. A major difference between the Humanities and Science is that the operands in the Humanities courses are very specific and the operators are very broad, while the opposite is true for Science courses. For example, “Perspectives of Free Black Soldiers in the American Civil War” is a dramatically specific “operand” but the “operator” for a class studying it is nearly the same as every other Humanities course, based on “compare,” “contrast,” “analyze,” “look for trends and patterns,” and “develop an opinion or argument for an underlying reason or structure.” Even more specifically, classes can focus entirely around a single work, such as Thomas Pynchon's novel The Crying of Lot 49. By contrast, a mathematics course focuses on a specific action, “determining the linear independence of a system of equations,” for example. However, it can then be applied to a wide range of different problems, and not just the specific instances of problems studied in class. In one embodiment, key phrases have either a specific operator or a specific operand. Concepts where both are broad and generic are listed as undefined because they lack meaningful boundaries.
Concepts where both are specific tend not to be as useful in some embodiments, although they are practical when learning to do repetitive tasks. As a result, the undefined key phrases which are identified may be listed separately and/or excluded from further processing. The key phrases that are identified from the loaded content are stored in step 608 as part of the concept data record 604. - In
step 609, citations are extracted for character strings (e.g., character strings that have been identified as key phrases or may be text recitations of concepts). For example, for each loaded content item, the citations for a text string are indexed. In step 610, the text strings are normalized in format, for example by removing unwanted characters, eliminating punctuation, and standardizing language (e.g., making nouns singular or plural, and/or truncating verbs, or the like). The normalized citations are then stored in step 611 as part of the concept data record 604. - In
step 612, the loaded content is tagged as stemming from passive, active, and recall activities undertaken by users. The tags are also passed to the concept data record 604. - In
step 613, as part of the timeline mapping module 444, the loaded content is analyzed to extract the time which is relevant to the content, for example, when pages were read, when a quiz was completed, when an assignment was started, or the like, as informed by the users activity logs 651. In step 614, by referencing the school syllabus data 652, the process can normalize the extracted time relevant to the content according to the school in order to determine a relative time within the course at which the content was acted upon. For example, if a section of a textbook was read in the fourth week of a 10-week summer course, it could be normalized to being studied when 40% of the term was complete, and thus compared against the normalized timing of reading the section of a textbook 40% through a longer Fall semester, or against the timing of reading of the section of a textbook in previous summer terms. The normalized timing of activities is stored in the concept data record 604. In one embodiment, the normalized timing of activities may be reported for use in updating the school syllabus data 652 to be responsive to adjustments in the flow of the course delivery. Depending on circumstances, a class may linger on a learning unit longer than planned at the outset of the course, for example, and the reported normalized timing of activities can be used to dynamically update the course syllabus according to the reality of the course delivery. - In
step 615, also as part of the timeline mapping module 444, the loaded content is analyzed to extract the time duration of an activity, for example how long a user spent reading a chapter, working on a problem set, or the like, as informed by the users activity logs 651. For example, a student may have spent 7 hours studying a chapter of a textbook, as revealed by the elapsed time in each of a user's reading sessions for that chapter recorded in the users activity log 651. The user's individual time can be normalized across users in step 616 by referring to the users activity logs 651 to determine a typical duration for the activity. The user's specific duration or the normalized duration may also be stored in the concept data record 604. In one embodiment, the normalized duration of activities may be reported, for example, for use in planning future iterations of a course. By knowing on average how long students spend on each activity in a learning unit, an appropriate duration of the learning unit and an appropriate number of learning units for a course can be planned so that the course fits within the school schedule and engages students at an appropriate level of involvement. - As described above, several processes of the
concepts extraction module 442 and the timeline mapping module 444 have populated the concept data record 604, which is indexed by the concepts index database 552. In step 617, similar concepts can be combined and collapsed into one concept in the concepts index database 552 to avoid duplication and simplify the database 552. Such a recombination process may be performed iteratively as new content is loaded for extraction and analyzed. It is noted that while some concepts may be uncommon in a particular course, such as those covering the personally favored topics of a particular instructor, a great many are shared between courses that share the same logical curricular block. For example, every course on “beginning linear algebra” covers the topic “linear independence.” Therefore, the extraction systems are effectively building up the list of concepts for courses over time even when the list is not complete for a given course. - The
concepts index database 552 can be used together with the grouped content activities 605 to create a learning unit. The association between a concept from the concepts index database 552 and the passive, active, and recall activities 605 that a student undertakes to study the concept together make up the learning unit, which is stored in the learning units database 445. Accordingly, by using course learning units 555, an interested party can search concepts by relationships or tags in step 618 through the concepts index database 552, or an interested party can search learning units by concepts or activities in step 619 through the learning units database 445. The interested party may be a student seeking to fill gaps in their education, a teacher planning a course, an administrator organizing a curriculum, an employer designing job requirements or seeking job applicants, or any other person or system interested in how students engage in their education on a digital education platform. - 
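The pairing of a concept with its passive, active, and recall activities, and the two searches over learning units, can be sketched as a small in-memory structure. This is an illustrative sketch only: `LearningUnit`, `LearningUnitStore`, and their methods are hypothetical names, a dictionary stands in for the learning units database, and keying concepts by lowercased text is an assumed simplification of how similar concepts are collapsed.

```python
from dataclasses import dataclass, field

ACTIVITY_TYPES = ("passive", "active", "recall")


@dataclass
class LearningUnit:
    concept: str
    # Maps an activity type to the content items studied through it.
    activities: dict = field(default_factory=dict)

    def add_activity(self, kind: str, item: str) -> None:
        if kind not in ACTIVITY_TYPES:
            raise ValueError(f"unknown activity type: {kind}")
        self.activities.setdefault(kind, []).append(item)


class LearningUnitStore:
    """In-memory stand-in for a learning units database (assumption)."""

    def __init__(self):
        self._units = {}

    def unit_for(self, concept: str) -> LearningUnit:
        # Lowercasing the key collapses trivially duplicated concepts.
        key = concept.strip().lower()
        if key not in self._units:
            self._units[key] = LearningUnit(concept=key)
        return self._units[key]

    def search_by_concept(self, text: str):
        needle = text.strip().lower()
        return [unit for key, unit in self._units.items() if needle in key]

    def search_by_activity(self, kind: str):
        return [unit for unit in self._units.values() if unit.activities.get(kind)]


store = LearningUnitStore()
unit = store.unit_for("Linear independence")
unit.add_activity("passive", "textbook section on linear independence")
unit.add_activity("recall", "practice quiz")
store.unit_for("linear independence ").add_activity("active", "problem set")
store.unit_for("Eigenvalues").add_activity("passive", "lecture notes")
```

Here the two spellings of "linear independence" resolve to one unit, so a search by concept or by activity type returns the same record with all three activity types attached.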
FIG. 7 illustrates a predictive model of course organization, according to one embodiment. In this example, the course structured content activities are completed between a fixed start 701 and a fixed ending 702 of the course timeline. The course timeline includes a subdivision that predicts what activities a user studying the course will undertake in each of several time periods throughout the actual delivery of the course. In this example, the course is divided into eight equal time periods, but the time periods may vary in length in other examples. Likewise, in this example, passive 472 activities are expected in the schedule of each of the time periods, while active 473 and recall 474 activities are scheduled in only some of the time periods, but that need not be the case in other examples. Typically, passive sessions will be interleaved with active sessions and followed by a recall session if available. When applied to general course delivery, registered students read one or more pages of the course referenced document, then interact with these pages by adding their own content layer, and continue to iterate on that basis until reaching and completing a testing event, which allows them to move to the next phase of the course delivery. The course is completed once all chapter-based activities are executed, sequentially or not, or when time runs out. In addition, as the same courses are delivered repeatedly over time, the platform 450 can build a model of likely activities for each course in its catalog. The outcome of this is a predictive learning model for a course and/or class. The time interval, delta T, along with passive 472, active 473, and recall 474 activities, are predicted based on analytics from previous courses and classes that have been delivered. The model is then applied across multiple instances of that course, enabling direct comparisons between similar courses and cross indexing likely content activities and events. - 
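A rough sketch of how such a timeline model could be assembled from past deliveries follows. It reuses the term-relative normalization described earlier for the timeline mapping module (an activity in the fourth week of a 10-week term sits at 40% of the term) and assigns each past activity to one of eight equal periods. The function names, the tuple layout of `history`, the example dates, and the reading of "fourth week" as 28 elapsed days are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date, timedelta


def term_fraction(event: date, start: date, end: date) -> float:
    """Fraction of the term elapsed at the event, clamped to [0, 1]."""
    total = (end - start).days
    if total <= 0:
        raise ValueError("term end must be after term start")
    elapsed = (event - start).days
    return min(max(elapsed / total, 0.0), 1.0)


def period_index(fraction: float, periods: int = 8) -> int:
    """Index of the equal-length period that contains the fraction."""
    return min(int(fraction * periods), periods - 1)


def predicted_schedule(history, periods: int = 8):
    """Collect the activity types observed in each period of past deliveries.

    history: iterable of (activity_type, event_date, term_start, term_end).
    """
    schedule = defaultdict(set)
    for kind, event, start, end in history:
        index = period_index(term_fraction(event, start, end), periods)
        schedule[index].add(kind)
    return {p: sorted(schedule[p]) for p in range(periods)}


# Hypothetical terms: a 10-week summer course and a 15-week Fall semester.
summer_start = date(2013, 6, 3)
summer_end = summer_start + timedelta(weeks=10)
fall_start = date(2013, 9, 2)
fall_end = fall_start + timedelta(weeks=15)

history = [
    ("passive", summer_start + timedelta(weeks=4), summer_start, summer_end),
    ("passive", fall_start + timedelta(weeks=6), fall_start, fall_end),
    ("recall", fall_end, fall_start, fall_end),
]
schedule = predicted_schedule(history)
```

Both reading events normalize to 40% of their terms and fall in the same period, which is what makes activity in terms of different lengths directly comparable.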
FIG. 8 illustrates an example course deconstruction into distinct learning units, according to one embodiment. As discussed previously, the structure and delivery of a course can be expressed as the aggregation of passive, active, and recall on-line sessions that collectively summarize all the events and content activities associated with that course by the registered users during its actual delivery. Once the learning units have been extracted, the course is effectively atomized into a series of concepts and relevant activities, determining a new structure and organization for that course. In this new learning model, the course referenced content, i.e., the textbook, which traditionally determines the structure of the course, is replaced with individual learning units that are more accurate indicators and representations of what students need to achieve. In some embodiments of the invention, the modular nature of the learning units allows them to be rearranged, mixed with learning units from other courses, and aggregated in different orders in order to adapt learning materials to suit different learning styles, instructor preferences, institutional goals, or for any other reason. - As shown in
FIG. 8, the course is deconstructed into a timeline between a fixed start 701 and a fixed ending 702. In this scenario, the time allotted to each learning unit 801 is predicted based on analytics of user activities reported from previous courses/classes. Each learning unit 801 is associated with a concept 802 and is associated with at least one activity. In this example, each concept 802 is supported by a passive 472, an active 473, and a recall 474 activity, but that need not always be the case. In one embodiment, if holes exist, the extraction system 440 can fill in the activities for a given concept. Depending on the type of concept, the platform 450 suggests a wide range of products and services that can fulfill one of the activity types of the learning unit. For instance, the platform 450 can suggest additional reading for more passive learning, homework help and additional tutoring for active learning, and practice quizzes and tests for recall, all based on the indexed concepts and activities in the learning units database 445 that correspond to the learning units that a student is studying in a course. These can be useful to the user regardless of whether they are perfectly aligned to the particular assignments that are offered in the school because they are focused on mastering the same concepts. This way, the platform services can monitor users to ensure they are on track to succeed and prod them to seek more help if they are falling behind. - The present invention has been described in particular detail with respect to several possible embodiments. Those of skill in the art will appreciate that the invention may be practiced in other embodiments. The particular naming of the components, capitalization of terms, the attributes, data structures, or any other programming or structural aspect is not mandatory or significant, and the mechanisms that implement the invention or its features may have different names, formats, or protocols.
Further, the system may be implemented via a combination of hardware and software, as described, or entirely in hardware elements. Also, the particular division of functionality between the various system components described herein is merely exemplary, and not mandatory; functions performed by a single system component may instead be performed by multiple components, and functions performed by multiple components may instead be performed by a single component.
- Some portions of the above description present the features of the present invention in terms of algorithms and symbolic representations of operations on information. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. These operations, while described functionally or logically, are understood to be implemented by computer programs. Furthermore, it has also proven convenient at times to refer to these arrangements of operations as modules or by functional names, without loss of generality.
- Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as “determining” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system memories or registers or other such information storage, transmission or display devices.
- Certain aspects of the present invention include process steps and instructions described herein in the form of an algorithm. It should be noted that the process steps and instructions of the present invention could be embodied in software, firmware or hardware, and when embodied in software, could be downloaded to reside on and be operated from different platforms used by real time network operating systems.
- The present invention also relates to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored on a computer readable medium that can be accessed by the computer and run by a computer processor. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, application specific integrated circuits (ASICs), or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. Furthermore, the computers referred to in the specification may include a single processor or may be architectures employing multiple processor designs for increased computing capability.
- In addition, the present invention is not limited to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any references to specific languages, such as HTML or HTML5, are provided for enablement and best mode of the present invention.
- The present invention is well suited to a wide variety of computer network systems over numerous topologies. Within this field, the configuration and management of large networks comprise storage devices and computers that are communicatively coupled to dissimilar computers and storage devices over a network, such as the Internet.
- Finally, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention.
Claims (16)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/971,738 US9378647B2 (en) | 2013-08-20 | 2013-08-20 | Automated course deconstruction into learning units in digital education platforms |
PCT/US2014/050960 WO2015026607A1 (en) | 2013-08-20 | 2014-08-13 | Automated course deconstruction into learning units in digital education platforms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/971,738 US9378647B2 (en) | 2013-08-20 | 2013-08-20 | Automated course deconstruction into learning units in digital education platforms |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150056596A1 (en) | 2015-02-26 |
US9378647B2 (en) | 2016-06-28 |
Family
ID=52480698
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/971,738 Active 2034-06-28 US9378647B2 (en) | 2013-08-20 | 2013-08-20 | Automated course deconstruction into learning units in digital education platforms |
Country Status (2)
Country | Link |
---|---|
US (1) | US9378647B2 (en) |
WO (1) | WO2015026607A1 (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150082197A1 (en) * | 2013-09-13 | 2015-03-19 | Box, Inc. | Systems and methods for configuring event-based automation in cloud-based collaboration platforms |
US20150086946A1 (en) * | 2013-09-20 | 2015-03-26 | David A. Mandina | NDT File Cabinet |
US20160117339A1 (en) * | 2014-10-27 | 2016-04-28 | Chegg, Inc. | Automated Lecture Deconstruction |
US20160260336A1 (en) * | 2015-03-03 | 2016-09-08 | D2L Corporation | Systems and methods for collating course activities from a plurality of courses into a personal learning stream |
US20160358488A1 (en) * | 2015-06-03 | 2016-12-08 | International Business Machines Corporation | Dynamic learning supplementation with intelligent delivery of appropriate content |
US20180033106A1 (en) * | 2016-07-26 | 2018-02-01 | Hope Yuan-Jing Chung | Learning Progress Monitoring System |
US9894119B2 (en) | 2014-08-29 | 2018-02-13 | Box, Inc. | Configurable metadata-based automation and content classification architecture for cloud-based collaboration platforms |
US9904435B2 (en) | 2012-01-06 | 2018-02-27 | Box, Inc. | System and method for actionable event generation for task delegation and management via a discussion forum in a web-based collaboration environment |
CN108257057A (en) * | 2018-01-26 | 2018-07-06 | 河南工学院 | A kind of computer assisted instruction system and method |
US20180350255A1 (en) * | 2017-05-31 | 2018-12-06 | Pearson Education, Inc. | Automated learner-focused content divisions |
US20180350254A1 (en) * | 2017-05-31 | 2018-12-06 | Pearson Education, Inc. | Multi table of contents and courseware generation |
WO2018222218A1 (en) * | 2017-05-31 | 2018-12-06 | Pearson Education, Inc. | Multi table of contents and courseware generation |
US20190311215A1 (en) * | 2018-04-09 | 2019-10-10 | Kåre L. Andersson | Systems and methods for adaptive data processing associated with complex dynamics |
CN111428052A (en) * | 2020-03-30 | 2020-07-17 | 中国科学技术大学 | Method for constructing educational concept graph with multiple relations from multi-source data |
CN112860983A (en) * | 2019-11-27 | 2021-05-28 | 上海流利说信息技术有限公司 | Learning content pushing method, system, equipment and readable storage medium |
US11158204B2 (en) * | 2017-06-13 | 2021-10-26 | Cerego Japan Kabushiki Kaisha | System and method for customizing learning interactions based on a user model |
EP4053732A1 (en) * | 2021-03-02 | 2022-09-07 | Canva Pty Ltd. | Systems and methods for extracting text from portable document format data |
WO2022253225A1 (en) * | 2021-06-04 | 2022-12-08 | International Business Machines Corporation | Reformatting digital content for digital learning platforms using suitability scores |
US20230035338A1 (en) * | 2020-11-09 | 2023-02-02 | Xi'an Jiaotong University | Community question-answer website answer sorting method and system combined with active learning |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9940606B2 (en) | 2013-10-30 | 2018-04-10 | Chegg, Inc. | Correlating jobs with personalized learning activities in online education platforms |
US10049416B2 (en) | 2013-11-26 | 2018-08-14 | Chegg, Inc. | Job recall services in online education platforms |
US11436286B1 (en) | 2019-04-04 | 2022-09-06 | Otsuka America Pharmaceutical, Inc. | System and method for using deconstructed document sections to generate report data structures |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070136233A1 (en) * | 2005-12-12 | 2007-06-14 | Sbc Knowledge Ventures Lp | Method for analyzing, deconstructing, reconstructing, and repurposing rhetorical content |
US20070162465A1 (en) * | 2003-06-27 | 2007-07-12 | Bill Cope | Method and apparatus for the creation, location and formatting of digital content |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101024140B1 (en) | 2008-11-13 | 2011-03-22 | 박덕용 | Content Delivery System for Online Learning |
US20120231435A1 (en) | 2011-03-09 | 2012-09-13 | Mcbride Matthew D | System and method for education including community-sourced data and community interactions |
KR20130049433A (en) | 2011-11-04 | 2013-05-14 | 두산동아 주식회사 | Distance education system and method for editing learning contents by stages in digital textbook |
TWI446306B (en) | 2011-11-21 | 2014-07-21 | Palmforce Software Inc | Method and system for feedback learning language |
US20130164727A1 (en) | 2011-11-30 | 2013-06-27 | Zeljko Dzakula | Device and method for reinforced programmed learning |
WO2013085699A1 (en) | 2011-12-09 | 2013-06-13 | Chegg, Inc. | Time based data visualization |
- 2013-08-20: US application US13/971,738, patent US9378647B2 (status: Active)
- 2014-08-13: WO application PCT/US2014/050960, publication WO2015026607A1 (status: Application Filing)
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9904435B2 (en) | 2012-01-06 | 2018-02-27 | Box, Inc. | System and method for actionable event generation for task delegation and management via a discussion forum in a web-based collaboration environment |
US10509527B2 (en) * | 2013-09-13 | 2019-12-17 | Box, Inc. | Systems and methods for configuring event-based automation in cloud-based collaboration platforms |
US11822759B2 (en) | 2013-09-13 | 2023-11-21 | Box, Inc. | System and methods for configuring event-based automation in cloud-based collaboration platforms |
US11435865B2 (en) | 2013-09-13 | 2022-09-06 | Box, Inc. | System and methods for configuring event-based automation in cloud-based collaboration platforms |
US20150082197A1 (en) * | 2013-09-13 | 2015-03-19 | Box, Inc. | Systems and methods for configuring event-based automation in cloud-based collaboration platforms |
US20150086946A1 (en) * | 2013-09-20 | 2015-03-26 | David A. Mandina | NDT File Cabinet |
US11876845B2 (en) | 2014-08-29 | 2024-01-16 | Box, Inc. | Configurable metadata-based automation and content classification architecture for cloud-based collaboration platforms |
US11146600B2 (en) | 2014-08-29 | 2021-10-12 | Box, Inc. | Configurable metadata-based automation and content classification architecture for cloud-based collaboration platforms |
US9894119B2 (en) | 2014-08-29 | 2018-02-13 | Box, Inc. | Configurable metadata-based automation and content classification architecture for cloud-based collaboration platforms |
US10708321B2 (en) | 2014-08-29 | 2020-07-07 | Box, Inc. | Configurable metadata-based automation and content classification architecture for cloud-based collaboration platforms |
US11151188B2 (en) | 2014-10-27 | 2021-10-19 | Chegg, Inc. | Automated lecture deconstruction |
US11797597B2 (en) | 2014-10-27 | 2023-10-24 | Chegg, Inc. | Automated lecture deconstruction |
US10140379B2 (en) * | 2014-10-27 | 2018-11-27 | Chegg, Inc. | Automated lecture deconstruction |
US20160117339A1 (en) * | 2014-10-27 | 2016-04-28 | Chegg, Inc. | Automated Lecture Deconstruction |
US20160260336A1 (en) * | 2015-03-03 | 2016-09-08 | D2L Corporation | Systems and methods for collating course activities from a plurality of courses into a personal learning stream |
US20160358489A1 (en) * | 2015-06-03 | 2016-12-08 | International Business Machines Corporation | Dynamic learning supplementation with intelligent delivery of appropriate content |
US20160358488A1 (en) * | 2015-06-03 | 2016-12-08 | International Business Machines Corporation | Dynamic learning supplementation with intelligent delivery of appropriate content |
US10586297B2 (en) * | 2016-07-26 | 2020-03-10 | Hope Yuan-Jing Chung | Learning progress monitoring system |
US20180033106A1 (en) * | 2016-07-26 | 2018-02-01 | Hope Yuan-Jing Chung | Learning Progress Monitoring System |
US20180350255A1 (en) * | 2017-05-31 | 2018-12-06 | Pearson Education, Inc. | Automated learner-focused content divisions |
US20180350254A1 (en) * | 2017-05-31 | 2018-12-06 | Pearson Education, Inc. | Multi table of contents and courseware generation |
WO2018222218A1 (en) * | 2017-05-31 | 2018-12-06 | Pearson Education, Inc. | Multi table of contents and courseware generation |
US20210343176A1 (en) * | 2017-06-13 | 2021-11-04 | Cerego Japan Kabushiki Kaisha | System and method for customizing learning interactions based on a user model |
US11158204B2 (en) * | 2017-06-13 | 2021-10-26 | Cerego Japan Kabushiki Kaisha | System and method for customizing learning interactions based on a user model |
US11776417B2 (en) * | 2017-06-13 | 2023-10-03 | Cerego Japan Kabushiki Kaisha | System and method for customizing learning interactions based on a user model |
CN108257057A (en) * | 2018-01-26 | 2018-07-06 | 河南工学院 | A kind of computer assisted instruction system and method |
US11604937B2 (en) * | 2018-04-09 | 2023-03-14 | Kåre L. Andersson | Systems and methods for adaptive data processing associated with complex dynamics |
US20190311215A1 (en) * | 2018-04-09 | 2019-10-10 | Kåre L. Andersson | Systems and methods for adaptive data processing associated with complex dynamics |
CN112860983A (en) * | 2019-11-27 | 2021-05-28 | 上海流利说信息技术有限公司 | Learning content pushing method, system, equipment and readable storage medium |
CN111428052A (en) * | 2020-03-30 | 2020-07-17 | 中国科学技术大学 | Method for constructing educational concept graph with multiple relations from multi-source data |
US20230035338A1 (en) * | 2020-11-09 | 2023-02-02 | Xi'an Jiaotong University | Community question-answer website answer sorting method and system combined with active learning |
US11874862B2 (en) * | 2020-11-09 | 2024-01-16 | Xi'an Jiaotong University | Community question-answer website answer sorting method and system combined with active learning |
EP4053732A1 (en) * | 2021-03-02 | 2022-09-07 | Canva Pty Ltd. | Systems and methods for extracting text from portable document format data |
US12067351B2 (en) | 2021-03-02 | 2024-08-20 | Canva Pty Ltd | Systems and methods for extracting text from portable document format data |
US11557218B2 (en) | 2021-06-04 | 2023-01-17 | International Business Machines Corporation | Reformatting digital content for digital learning platforms using suitability scores |
WO2022253225A1 (en) * | 2021-06-04 | 2022-12-08 | International Business Machines Corporation | Reformatting digital content for digital learning platforms using suitability scores |
Also Published As
Publication number | Publication date |
---|---|
US9378647B2 (en) | 2016-06-28 |
WO2015026607A1 (en) | 2015-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11816637B2 (en) | Correlating jobs with personalized learning activities in online education platforms | |
US9378647B2 (en) | Automated course deconstruction into learning units in digital education platforms | |
US11790467B2 (en) | Job recall services in online education platforms | |
US11797597B2 (en) | Automated lecture deconstruction | |
US9852132B2 (en) | Building a topical learning model in a content management system | |
Pavlik | Collaborating with ChatGPT: Considering the implications of generative artificial intelligence for journalism and media education | |
US11741290B2 (en) | Automated testing materials in electronic document publishing | |
US20160034757A1 (en) | Generating an Academic Topic Graph from Digital Documents | |
US20150324459A1 (en) | Method and apparatus to build a common classification system across multiple content entities | |
US9870358B2 (en) | Augmented reading systems | |
US20130151300A1 (en) | Time Based Data Visualization | |
US20150302352A1 (en) | Knowledge proximity detector | |
Pirnay-Dummer et al. | Automated knowledge visualization and assessment | |
Mealand | Hellenistic Greek and the New Testament: A stylometric perspective | |
Santos Almeida et al. | Sequential Pattern Mining of Students Data: A Case Study with Moodle Log Data | |
US10255329B1 (en) | System information management | |
Siegel | Research Guides: A Social Work Research Guide: Sessions & Handouts | |
Gibendi | Managing digital academic grey literature using D-Space at Strathmore University Library. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: CHEGG, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BERCOVITZ, BENJAMIN JAMES;SRI, PAUL CHRIS;MADHAVAN, ANAND;AND OTHERS;SIGNING DATES FROM 20130815 TO 20130820;REEL/FRAME:032032/0723 |
| | AS | Assignment | Owner name: CHEGG, INC., CALIFORNIA Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE'S ADDRESS FROM;ASSIGNORS:BERCOVITZ, BENJAMIN JAMES;SRI, PAUL CHRIS;MADHAVAN, ANAND;AND OTHERS;SIGNING DATES FROM 20130815 TO 20130820;REEL/FRAME:038827/0672 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | AS | Assignment | Owner name: WELLS FARGO BANK, NATIONAL ASSOCIATION, CALIFORNIA Free format text: SECURITY INTEREST;ASSIGNOR:CHEGG, INC.;REEL/FRAME:039837/0859 Effective date: 20160921 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |