Eureka Research Workbench: A Semantic Approach to an Open ...

Toward Semantic Representation of Science in Electronic Laboratory Notebooks (ELNs)
Stuart J. Chalk
Department of Chemistry, University of North Florida
[email protected]
#ACSCINFDataSummit
CINF Paper 50, 251st ACS Meeting, Spring 2016

Outline
  • Utopia: A Global Research Network
  • What is an Electronic Notebook?
  • The Semantics of Semantics
  • What Needs to be Semantically Represented?
  • Current lay of the land
  • ELN Item Manifest
  • P-PLAN Ontology
  • VIVO-ISF Ontology
  • Chemical Analysis Metadata Platform
  • HCLS Community Profiles
  • Electronic Notebook Ontology
  • A generic scientific data model
  • Experimental information for LD (ExptLD)
  • Take Home
  • Conclusion

Utopia: A Global Research Network
  • Big Data and the Semantic Web are the buzzwords du jour, but what do they mean for chemistry?
  • Lots of heterogeneous data and metadata, with even more semantic data to represent it
  • Look at what we want rather than what we have
  • We want chemical data that is:
    • Easy to share, find, and compare
    • Freely available but with provenance
    • Globally sourced and without IP restrictions on reuse

What is an Electronic Laboratory Notebook?
  • An electronic way to record data...
  • ...equivalent to a laboratory notebook
  • But ELNs should not be thought of so lowly...
  • An ELN must:*
    • Keep track of research data
    • Reference resources used in research
    • Capture the story of research
  * Insight from Tony Williams

What should an ELN be?
  • The interface should mirror a laboratory notebook
  • Behind the scenes, though, it should use state-of-the-art software, data formats, data/metadata practices, and web technologies to manage data generation, workflows, remote data access, authentication, etc.
  • As a result, it needs to speak the same language as other data sources and store data in a format that others can read and reuse
  • Foundational building block of a Global Research Network

The Semantics of Semantics
  • Semantics is the study of meaning
  • -> We need to give meaning to what is created in an ELN
  • Described in computers using the Resource Description Framework (RDF), which:
    • Makes statements about objects and their relationships to other objects...
    • ...using subject-predicate-object triples
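As a minimal sketch of the subject-predicate-object idea, the Python snippet below models a few statements as plain tuples and serializes them in N-Triples syntax. The `http://example.org/` IRIs, the `Literal` helper, and the `to_ntriples` function are hypothetical names chosen for illustration and are not vocabulary from the talk; only the XML Schema datatype IRIs are standard.

```python
from typing import NamedTuple

XSD = "http://www.w3.org/2001/XMLSchema#"  # standard XML Schema datatype namespace
EX = "http://example.org/"                 # hypothetical namespace, illustration only


class Literal(NamedTuple):
    """A literal value: lexical form plus an optional datatype IRI."""
    value: str
    datatype: str = ""


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines.

    A bare string is treated as an IRI; a Literal is quoted and, when a
    datatype is given, suffixed with ^^<datatype>.
    """
    def fmt(term):
        if isinstance(term, Literal):
            escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
            suffix = f"^^<{term.datatype}>" if term.datatype else ""
            return f'"{escaped}"{suffix}'
        return f"<{term}>"

    return "\n".join(f"{fmt(s)} {fmt(p)} {fmt(o)} ." for s, p, o in triples)


person = EX + "people/1"  # hypothetical identifier for the person being described
triples = [
    (person, EX + "name", Literal("Stuart Chalk")),
    (person, EX + "isAlive", Literal("true", XSD + "boolean")),
    (person, EX + "age", Literal("49", XSD + "integer")),
]

ntriples = to_ntriples(triples)
print(ntriples)
```

Each printed line is one statement: an IRI for the subject, an IRI for the predicate, and either an IRI or a typed literal for the object. In practice a library such as rdflib would handle parsing and serialization; the hand-rolled version here only makes the triple structure visible.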

RDF allows knowledge representation. Meaning is represented by using one or more ontologies.

RDF in JSON-LD

    {
      "@context": {
        "name": "",
        "isAlive": "",
        "age": "",
        "height": "",
        "@base": ""
      },
      "@id": "",
      "name": "Stuart Chalk",
      "isAlive": true,
      "age": 49,
      "height": 188.0
    }

RDF in JSON-LD

    "49"^^ .
    "true"^^ .
    "188"^^ .
    "Stuart Chalk" .

What Needs to be Semantically Represented?
  • Everything!
  • What areas?
    • Data, Results and Resources
    • Models, Tools for Data Workup (Equations, Tests, Stats)
    • General Workflows (Protocols and Procedures)
    • The Research Story (What, Why, How)
    • User discussion and annotation
    • ELN usage timeline
    • The Science (Area, Hypotheses, Theories)
    • The People (Expertise, Provenance, Integrity, Eminence)

Workflows: The P-PLAN Ontology
  • Implement in Kepler, Taverna, KNIME?

People: The VIVO-ISF Ontology

The Science: ChAMP (an example)

The Chemical Analysis Metadata Platform (ChAMP)
  • Identification of metadata related to chemical analysis and definition of an ontology to describe terms
  • Examples in both XML and JSON-LD, with associated XML Schema and JSON-LD context
    • Journal Article
    • Standard Method of Analysis
    • Reference Material

Data Descriptions: HCLS Community Profile
  • The Healthcare and Life Science (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group
  • "Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval."

Data Descriptions: HCLS Community Profile
  • Describes three levels for description of datasets
  • Summary Level
    • Type declaration (rdf:type = dctypes:Dataset)
    • Title (dct:title = rdf:langString)
    • Description (dct:description = rdf:langString)
    • Publisher (dct:publisher = IRI)
  • Version Level
    • Publisher (dct:publisher = IRI)
    • Version identifier (pav:version = xsd:string)
    • Version linking (dct:isVersionOf = IRI)
  • Distribution Level
    • Type declaration (rdf:type = void:Dataset OR dcat:Distribution)
    • Title (dct:title = rdf:langString)
    • Description (dct:description = rdf:langString)
    • Creator (dct:creator = IRI)
    • Publisher (dct:publisher = IRI)
    • License (dct:license = IRI)

    • Type declaration (rdf:type = dctypes:Dataset)
    • Title (dct:title = rdf:langString)
    • Description (dct:description = rdf:langString)

Electronic Notebook Ontology (ENO)

Data and Resources
  • Use a Generic Scientific Data Model
  • Captures data and metadata about datasets and links to related data
  • JSON-LD is ideal file format

Experiment Markup Language (ExptML)
  • A specification (written in XML) that describes different data types of information recorded during the scientific process:
    Annotation, Api, Calculation, Chemical, Citation, Communication, Customer, Data, Dataset, Definition, Element, Equipment, Event, Experiment, Group, Project, Protocol, Quote, Report, Result, Sample, Solution, Space, Specimen, Substance, Task, Template, Timeline, User, Vendor

Experimental Linked Data (ExptLD)
  • Define data packets that capture the metadata of:
    • Resources
    • Data
  • Integrate with other ExptLD packets to create a SciData document
  • Or convert to RDF and store in a triplestore
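The packet workflow above can be sketched roughly as follows, assuming each ExptLD packet is a small JSON-LD-style dictionary carrying an `@id` and an `@type`. The field names, the `make_packet`, `merge_packets`, and `to_triples` helpers, the `example.org` IRIs, and the `@graph` layout are all illustrative assumptions; they do not reproduce the actual ExptLD or SciData specifications.

```python
import json

BASE = "http://example.org/"  # hypothetical base IRI, illustration only


def make_packet(kind, local_id, **metadata):
    """Build one packet: an identified, typed bundle of metadata."""
    return {"@id": f"{BASE}{kind}/{local_id}", "@type": kind, **metadata}


def merge_packets(doc_id, packets):
    """Combine packets into a single document, merging entries that share an @id."""
    graph = {}
    for packet in packets:
        graph.setdefault(packet["@id"], {}).update(packet)
    return {"@id": BASE + doc_id, "@graph": list(graph.values())}


def to_triples(document):
    """Flatten the document into (subject, predicate, object) tuples."""
    triples = []
    for node in document["@graph"]:
        for key, value in node.items():
            if key != "@id":
                triples.append((node["@id"], key, value))
    return triples


# One packet for a resource and one for the data it produced (hypothetical values).
resource = make_packet("equipment", "ph-meter-1", name="pH meter")
data = make_packet(
    "dataset", "run-42", title="Titration run", usedEquipment=resource["@id"]
)

document = merge_packets("doc/1", [resource, data])
triples = to_triples(document)

print(json.dumps(document, indent=2))
for triple in triples:
    print(triple)
```

The same packets feed both routes named on the slide: they can be bundled into one document, or flattened into triples that a triplestore could ingest. A real implementation would use a proper JSON-LD context so that keys such as `title` expand to full vocabulary IRIs.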

Take Home
  • A lot exists to semantically represent the scientific process that can be leveraged as part of an ELN system
  • A data standard needs to be agreed upon
  • Agreeing on implementation standards will take time because of the size of the user community
  • Integration and coverage of ontologies will be necessary to fully implement a system that underpins a Global Research Network
  • Domain-specific knowledge representation is needed in many areas

Questions?
  [email protected]
  Phone: 904-620-1938
  Skype: stuartchalk
  LinkedIn/SlideShare: stuchalk
  ORCID:
  ResearcherID:
