Exploiting the DataCite schema and Elasticsearch for complex queries

The following complex searches were formulated as part of a project to establish how deep searches can be constructed in a subject domain such as chemistry. They make particular use of the Subject element of the DataCite schema. They are not necessarily optimal (more efficient variations of the same search may well exist), but are presented as examples for the community to experiment with.
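The search URLs in the table below share one pattern: a Lucene-style query string placed in the `query` parameter of https://search.datacite.org/works, with spaces written as `+`. The Python sketch below assembles such a URL from field:value clauses so that the encoding stays consistent. The helper name `datacite_search_url` is hypothetical and not part of any DataCite library; only the base address and the query syntax come from this document.

```python
from urllib.parse import quote_plus

# Base address used by every search URL in this document.
BASE = "https://search.datacite.org/works"

def datacite_search_url(*clauses: str, operator: str = "AND") -> str:
    """Join Lucene-style field:value clauses with a boolean operator and
    return a DataCite search URL (hypothetical helper, for illustration)."""
    query = f" {operator} ".join(clauses)
    # quote_plus turns spaces into '+', as in the hand-written URLs;
    # characters listed in `safe` stay readable instead of being percent-encoded.
    return f"{BASE}?query={quote_plus(query, safe=':/*()')}"

url = datacite_search_url(
    "media.media_type:chemical/x-mnpub*",
    "subjects.subjectScheme:NMR_Nucleus",
    "subjects.subject:1H",
)
print(url)
```

With these three clauses the helper reproduces the URL listed for entry 9. Characters outside the `safe` set (quotes, square brackets) are percent-encoded, which a server decodes to the same query.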

Entry  Description  Elasticsearch query
1 Media (MIME) type https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*
https://search.datacite.org/works?query=media.media_type:chemical/x-jeol-jdf*
https://search.datacite.org/works?query=media.media_type:chemical/x-jcamp*
2 Combining Media with the DataCite Subject https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*+AND+subjects.subjectScheme:inchikey+AND+subjects.subject:XZYDALXOGPZGNV-UHFFFAOYSA-M+AND+media.media_type:chemical/x-gaussian*
3 Combining ORCID with Media https://search.datacite.org/works?query=contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390+AND+media.media_type:chemical/x-mnpub*
4 Exploiting Subject https://search.datacite.org/works?query=subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:"-39.946176"
5 Exploiting Subject with a range query https://search.datacite.org/works?query=subjects.subjectScheme:Gibbs_energy+AND+subjects.subject:[\-649.1+TO+\-649.8]
6 Nested search with two Subjects (each value paired with the other's scheme) https://search.datacite.org/works?query=(subjects.subjectScheme:inchikey+AND+subjects.subject:"-1082.980914")+AND+(subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:KTOSDSJYNBIDCN-UHFFFAOYSA-N)
Nested search with two Subjects transposed (each value paired with its own scheme) https://search.datacite.org/works?query=(subjects.subjectScheme:inchikey+AND+subjects.subject:KTOSDSJYNBIDCN-UHFFFAOYSA-N)+AND+(subjects.subjectScheme:Gibbs_Energy+AND+subjects.subject:"-1082.980914")
7 Two different Media types https://search.datacite.org/works?query=media.media_type:chemical/x-gaussian*+AND+media.media_type:chemical/x-mnpub*
8 License type https://search.datacite.org/works?query=rightsList.rights:"Creative Commons Public Domain Dedication (CC0 1.0)"
9 Exploiting subjectScheme (NMR_Nucleus) https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*+AND+subjects.subjectScheme:NMR_Nucleus+AND+subjects.subject:1H
10 Exploiting subjectScheme (NMR_Pulse) https://search.datacite.org/works?query=media.media_type:chemical/x-mnpub*+AND+subjects.subjectScheme:NMR_Pulse+AND+subjects.subject:1D
11 Simple PID query https://search.datacite.org/works?query=identifier:*10.14469/hpc*
12 Combining ORCID with PID query https://search.datacite.org/works?query=(contributors.nameIdentifiers.nameIdentifier:*0000-0002-8635-8390)+AND+(identifier:*10.14469/hpc*)
13 Combining researcher name with PID query https://search.datacite.org/works?query=(identifier:*10.14469/hpc*)+AND+(contributors.contributor.contributorName:Henry+Rzepa)
14 Entries in a specific repository (Imperial) referencing a specific journal https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs.orglett*)+AND+(identifier:*10.14469/hpc*)
15 Entries in a specific repository (Cambridge) referencing a specific journal https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs.orglett*)+AND+(identifier:*10.17863/cam*)
18 Entries in a specific repository (Cambridge) referencing all of one publisher's journals https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs*)+AND+(identifier:*10.17863/cam*)
16 Entries in all repositories except one referencing a specific journal https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021/acs.orglett*)+NOT+(identifier:*10.5517*)
17 Entries in a specific repository referencing one publisher https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021*)+AND+(identifier:*10.5517*)
19 Entries referencing all of one publisher's journals, excluding one data repository https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:10.1021*)+NOT+(identifier:*10.5517*)
20 Datasets referencing entries in an institutional repository https://search.datacite.org/works?query=(relatedIdentifiers.relatedIdentifier:*10.14469/spiral*)+AND+(identifier:*)+AND+(types.resourceTypeGeneral:Dataset)
21 Molecular formula search https://search.datacite.org/works?query=subjects.subjectScheme:inchi+AND+subjects.subject:*C11H8ClN3O*
22 Entries with an IsReferencedBy relation to a specific DOI https://search.datacite.org/works?query=(related_identifiers.relatedIdentifier:10.14469/hpc/3603)+AND+(related_identifiers.relationType:IsReferencedBy)
23 Finding institutional entries using the new ROR identifier https://search.datacite.org/works?affiliation-id=https://ror.org/041kmwe10
24 Retrieving Crossref metadata (UNIXSD XML) for a journal article DOI https://api.crossref.org/works/10.1021/acsomega.8b03005/transform/application/vnd.crossref.unixsd+xml
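Entry 5 writes its range with the less negative number first, `[\-649.1 TO \-649.8]`. Read numerically, that range is empty, because -649.1 is greater than -649.8. Read as strings, it is not: "-649.1" sorts before "-649.8". The sketch below assumes, without confirmation from DataCite, that `subjects.subject` values are compared as strings, and contrasts the two readings on invented sample values. It also shows one side effect of string ordering: a numerically equal value written with a trailing zero ("-649.80") falls outside the string range.

```python
# Invented Gibbs-energy strings, as they might be stored in subjects.subject.
values = ["-649.1", "-649.25", "-649.8", "-649.80", "-649.95", "-650.2"]

lower, upper = "-649.1", "-649.8"  # bounds as written in entry 5

# String (lexicographic) reading: inclusive range on the raw text.
as_strings = [v for v in values if lower <= v <= upper]

# Numeric reading with the bounds as written: lower > upper, so nothing matches.
as_numbers = [v for v in values if float(lower) <= float(v) <= float(upper)]

# Numeric reading with the bounds put in ascending order.
lo, hi = sorted((float(lower), float(upper)))
as_sorted_numbers = [v for v in values if lo <= float(v) <= hi]

print(as_strings)         # string range
print(as_numbers)         # numeric range, bounds as written
print(as_sorted_numbers)  # numeric range, bounds reordered
```

Under this assumption the string range selects the intended window for values written with the same precision, but it is sensitive to how each number happens to be formatted.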
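Entry 6 and its transposed variant each pair a subjectScheme clause with a subject clause. In Elasticsearch, an array of objects that is not mapped with the `nested` type is flattened, so every field becomes an independent list of values and the link between a scheme and its value is lost. Whether DataCite indexes `subjects` this way is an assumption here. The plain-Python sketch below (record and function names invented for illustration) simulates both behaviours to show why the two orderings of entry 6 could return the same record.

```python
# Hypothetical record with two subjects, shaped like the DataCite subjects list.
record = {
    "subjects": [
        {"subjectScheme": "inchikey", "subject": "KTOSDSJYNBIDCN-UHFFFAOYSA-N"},
        {"subjectScheme": "Gibbs_Energy", "subject": "-1082.980914"},
    ]
}

def match_flat(subjects, scheme, value):
    """Object (non-nested) behaviour: scheme and value may come from different
    entries. Exact comparison stands in for real text analysis."""
    schemes = {s["subjectScheme"] for s in subjects}
    values = {s["subject"] for s in subjects}
    return scheme in schemes and value in values

def match_nested(subjects, scheme, value):
    """Nested behaviour: one subject entry must hold both scheme and value."""
    return any(s["subjectScheme"] == scheme and s["subject"] == value
               for s in subjects)

paired = [("inchikey", "KTOSDSJYNBIDCN-UHFFFAOYSA-N"),
          ("Gibbs_Energy", "-1082.980914")]
swapped = [("inchikey", "-1082.980914"),
           ("Gibbs_Energy", "KTOSDSJYNBIDCN-UHFFFAOYSA-N")]

subjects = record["subjects"]
for name, clauses in (("paired", paired), ("swapped", swapped)):
    flat = all(match_flat(subjects, sc, v) for sc, v in clauses)
    nested = all(match_nested(subjects, sc, v) for sc, v in clauses)
    print(name, "flat:", flat, "nested:", nested)
```

If the index behaves like `match_flat`, both variants of entry 6 match; with a `nested` mapping only the correctly paired form would. The same consideration applies to entry 22, where relatedIdentifier and relationType are matched independently.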
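Entry 21 finds a molecular formula by wildcard matching inside the InChI string. Because `*C11H8ClN3O*` is a substring pattern, it would also match any InChI whose formula merely starts with those characters, such as C11H8ClN3O2. A minimal Python sketch of a client-side post-filter, assuming standard InChI strings in which the main formula is the layer between the first and second `/`. The InChI strings below are truncated, invented placeholders, not real records.

```python
from fnmatch import fnmatchcase

# Illustrative, truncated InChI strings (layers after the formula are elided).
inchis = [
    "InChI=1S/C11H8ClN3O/c...",
    "InChI=1S/C11H8ClN3O2/c...",
    "InChI=1S/C11H8ClN3OS/c...",
]

target = "C11H8ClN3O"

def formula_layer(inchi: str) -> str:
    """Return the main formula layer: the text between the first and second '/'."""
    parts = inchi.split("/")
    return parts[1] if len(parts) > 1 else ""

# Emulates the wildcard in entry 21: any InChI containing the target text.
wildcard_hits = [s for s in inchis if fnmatchcase(s, f"*{target}*")]

# Post-filter: keep only records whose formula layer equals the target exactly.
exact_hits = [s for s in inchis if formula_layer(s) == target]

print(len(wildcard_hits), len(exact_hits))
```

The wildcard search is still a useful first pass against the index; the exact comparison is applied afterwards to the returned records.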

Interesting and/or valuable uses of the DataCite schema and search engine by other communities are very welcome for inclusion here. We can all learn from them.

If you have a query or suggestion, please contact me at rzepa@imperial.ac.uk (ORCID: https://orcid.org/0000-0002-8635-8390). This document has DOI: 10.14469/hpc/5920