Curricula

To support the objectives of the PlanetData project, it is important to ensure access to appropriate guidance on the technologies and approaches that will underlie future large-scale data management tasks. The PlanetData project organizes a number of training events in support of these aims. In particular, we collaborate closely with the EUCLID project on curriculum alignment and joint events. This page gives an overview of this collaboration, which resulted in an analysis of recent data science training offerings and their relevance to semantic data management. A core part of the summary is the learning materials developed in the Educational Curriculum for the usage of Linked Data (EUCLID) project, which are also used in the ESWC summer school.

EUCLID Curriculum

This section provides an overview of the structure of the EUCLID curriculum:

Objectives

A curriculum helps both trainers and learners understand what is expected to have been covered by the end of the training, so that learners reach the level of competency in the relevant technologies and approaches that they need to consider themselves skilled in large-scale data management. This curriculum is designed to give a better grasp of the principles of large-scale data management.

The curriculum acts as a guideline for creating concrete training and education offers in the course of, and following, the project.

 

Structure

  • Introduction and Application Scenarios
  • Querying Linked Data
  • Providing Linked Data
  • Interaction with Linked Data
  • Building Linked Data Applications
  • Scaling-up

 

PlanetData will enrich its training curriculum by implementing and using some of the elements presented in EUCLID's curriculum. This will be done by extracting relevant material from the six modules developed in EUCLID and mapping it to the PlanetData structure, with the greatest impact on the Linked Data topic. Another topic that could benefit from EUCLID's materials is Semantic Technologies.

PlanetData is currently creating materials and learning paths for its learners through a selection of presentations and video lectures that address its specific target groups. The EUCLID curriculum takes a much more practical approach, giving specific examples, sets of tools, and solutions that can be used directly for Linked Data. It does not focus on the overall architectural approach or on the individual roles of the parties involved in data consumption and production.

EUCLID has identified a set of target audiences from the business and professional arena: data architects, data managers, data analysts, and data application developers. PlanetData addresses the same audiences, so content from more than one PlanetData training section will be combined with the individual EUCLID modules in order to optimize competencies in the four main fields of expertise defined by the PlanetData curriculum.

At the Introductory level, the "Principles of Linked Data" and "The Web of Data" materials will be used for all four groups within the professional target group. At the Intermediate level, the "Linked Data Design Considerations" and "Recipes for Publishing Linked Data" sub-topics will be reused for managers and architects. Finally, at the Advanced level, the "Consuming Linked Data" sub-topic will be cross-checked.

The skills alignment with the PlanetData curriculum, by level and target role, is as follows:

Introductory Level

  • 1. Principles of Linked Data: Data Architect, Data Manager, Data Analyst, Data Application Developer
  • 2. The Web of Data: Data Architect, Data Manager, Data Analyst, Data Application Developer

Intermediate Level

  • 3. Linked Data Design Considerations: Data Architect, Data Manager
  • 4. Recipes for Publishing Linked Data: Data Architect, Data Manager

Advanced Level

  • 5. Consuming Linked Data

For details, see the EUCLID deliverables.


PlanetData Curriculum

The PlanetData project aims to establish a community of researchers that supports organizations in exposing their data in new and useful ways. In this context, the curriculum covers four main topics:

  • Semantic Technology 
  • Database Technology
  • Linked Data
  • Data Streams

These topics were chosen based on the PlanetData network's insight into the underlying technologies that will be part of future large-scale data management approaches.

Within each topic, the sub-topics relevant to future large-scale data managers (i.e. people who will need to use computer systems and software to produce, consume, and manage data at large scale) are described.

To help learners gain a better understanding of each topic in the training curriculum, the course material can be delivered in several ways:

  • Self-training
  • Distance learning
  • Webinars
  • On-site training

Below is a brief overview of the curriculum:

Semantic Technology

1. Semantic Web Foundations

a. A Short History of Knowledge Systems
b. Birth of the Semantic Web
c. Semantic Applications
d. Core Concepts
e. RDF/S
f. OWL
g. Rules
h. SPARQL

2. Ontologies and the Semantic Web

a. Ontologies: a brief history
b. Ontology development process
c. Hands-on: use an example and create a requirements document
d. Knowledge elicitation
e. Hands-on: formulate competency questions, carry out interviews, and make first draft of ontology
f. Ontology creation (tools)
g. Ontology design
h. Hands-on: create ontology in the chosen editor

3. Application Development

a. Semantic Web Application Framework
b. Development frameworks
c. Development methodologies
d. Creating the Semantic Web
e. Storing the Semantic Web
f. Creating Semantic Web clients
g. Querying the Semantic Web
h. An application example, e.g. a semantic web portal

4. KR and Reasoning on the Semantic Web

a. Core concepts in reasoning and logic
b. Description Logic-based Knowledge Representation
c. RDFS and Taxonomic reasoning
d. OWL semantics
e. Hands on session: seeing inferences using an ontology editor and a reasoner
f. Reasoners for the Semantic Web
g. Logic Programming
h. Semantic Web and Logic Programming

5. Ontology Lifecycle

a. Ontology lifecycle
b. Collaboratively developing an ontology
c. Finding ontologies
d. Ontology modularisation
e. Re-using ontologies
f. Ontology evaluation
g. Ontology refinement
h. Ontology evolution and versioning

6. Semantic Web Services

a. From Web Services to Semantic Web Services
b. Adding Semantics to existing services: SAWSDL
c. OWL-S
d. The WSM stack
e. Semantic Web Service application deployment
f. Hands-on: creating a WSMO service with WSMT
g. Hands-on: discovering and executing WSMO services with WSMX
h. Hands-on: creating SWS with IRS-III and WSMO Studio

7. Semantic Web Services in Depth

a. SWS matching
b. SWS mediation
c. SWS orchestration
d. SWS choreography
e. SWS co-ordination
f. Trust and agreement between services
g. Capturing business rules
h. Capturing business processes

Database Technology

1. A brief history of DBMS

a. Data models
b. Query languages
c. Kinds of DBMS: relational, object-oriented, semi-structured, XML, Multi-dimensional, ...
d. Distributed architectures: distributed, federated, multi-DBMSs, P2P DBMSs
e. Major commercial/open-source DBMSs

2. Performance Benchmarking

a. Components: hardware platform, data structures, algebraic optimizer, SQL parser
b. Measures: throughput, response time, availability; speedup, scaleup, sizeup.
c. Relational benchmarks: Wisconsin, AS3AP, TPC-*, ...
d. XML benchmarks: XMark, XPathMark, XBench, ...

3. Basic database techniques

b. Query execution
c. Column-store architectures (MonetDB)
d. XML query processing

4. Advanced database techniques

a. Parallel and Distributed Database
b. Spatial and Geographic Databases

Linked Data

Introduction

a. The Data Deluge
b. The Rationale for Linked Data
c. Intended Audience
d. Introducing Big Lynx Productions

1. Principles of Linked Data

a. The Principles in a Nutshell
b. Naming Things with URIs
c. Making URIs Dereferenceable
d. Providing Useful RDF Information
e. Including Links to other Things

2. The Web of Data

a. Bootstrapping the Web of Data
b. Topology of the Web of Data

3. Linked Data Design Considerations

a. Using URIs as Names for Things
b. Describing Things with RDF
c. Publishing Data about Data
d. Choosing and Using Vocabularies
e. Making Links with RDF

4. Recipes for Publishing Linked Data

a. Linked Data Publishing Patterns
b. The Recipes
c. Additional Approaches to Publishing Linked Data
d. Testing and Debugging Linked Data
e. Linked Data Publishing Checklist

5. Consuming Linked Data

a. Deployed Linked Data Applications
b. Architecture of Linked Data Applications
c. Effort Distribution between Publishers, Consumers and Third

Data Streams

1. Motivation

a. Comparison with Relational DB storage

2. Streaming data models

a. Unbounded streams
b. Tuples, Windows
c. Timestamps
d. K-constraints

3. Query Languages

a. Relational operators
b. Window operators, temporal operators
c. Aggregators
d. Joins

4. Semantic streaming data

a. RDF Stream data models
b. SPARQL extensions for RDF Streams
c. Reasoning with Streams
d. Complex event processing
e. Linked Streaming Data

5. Query processing

a. Continuous queries
b. Window evaluation
c. Aggregate evaluation, approximate queries
d. Static optimization
e. Query optimization, statistics
f. Load shedding
g. Sampling

The complete PlanetData curriculum is available in D6.1 Training Curriculum.


Related Training Offerings

  • Open Data Institute (ODI)
  • Lean Semantic Web
  • Cloudera Data Scientist Curriculum
  • EMC Data Science and Big Data Analytics Curriculum
  • GATE Training

Links

PlanetData Project: D6.1 Training Curriculum

EUCLID Project: D1.1.3 Curriculum Delivery