Elsevier, a global information analytics business, offers the following webinar on BrightTALK:
Using machine learning to extract chemical information from patents
Tuesday, October 7, 2020 at 10:00 AM EST | 4 PM CET
… Patent authorities make patents available but do not provide systematic, continuous chemical annotations. Different text-mining approaches exist to extract chemical information from patents, but less attention has been given to the relevancy of a compound in a patent. … Using advanced technologies in artificial intelligence (AI), machine learning (ML) and natural language processing (NLP), we have developed models to overcome these limitations. Through a shared evaluation campaign, we have also invited academic and industrial teams to further develop, improve and contribute to the domain of patent information extraction.
The webinar will discuss:
- The challenges of patent mining in the chemical domain
- Chemical information extraction: from relevant document to relevant section to relevant information
- How to create a quality training set for machine learning in chemistry
- The ChEMU shared task for named entity and event extraction [see notes below - ab]
Presented by
Saber Akhondi, Principal NLP Scientist, Elsevier
About speaker:
Saber Akhondi obtained his MSc degree in Bioinformatics and Systems Biology from Chalmers University of Technology, Sweden. In 2011 he started as a PhD student in the biosemantics group at Erasmus Medical Center Rotterdam. He currently works at Elsevier as a Principal NLP Scientist, where he applies NLP and machine learning techniques to extract information useful for large commercial and research communities. (See the list of selected publications by the speaker below.)
Registration / On Demand (after event) link
https://www.brighttalk.com/webcast/16527/425810 (full description)
------------------------------------------------------------------------------------------------
Additional information:
Background information on the ChEMU (Cheminformatics Elsevier Melbourne University) project can be found in the Sep. 17, 2020 PIUG-PF post.
The ChEMU evaluation lab was part of the 11th Conference and Labs of the Evaluation Forum (CLEF-2020) and involved extraction tasks over chemical reactions from patents.
Task 1 – Named entity recognition: identifying chemical compounds as well as their roles in a chemical reaction.
Task 2 – Event extraction over chemical reactions (see the illustrative examples in Fig. 1 and Table 2, attached to the earlier PIUG-PF post).
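As a rough illustration of what the two tasks produce, the sketch below represents Task 1 output as labelled text spans and Task 2 output as an event that links a trigger word to its arguments. The sentence, the label names and the class names are all made up for illustration; they are assumptions and should not be read as the official ChEMU annotation scheme or data format.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical example sentence (not taken from a real patent).
TEXT = "The mixture was stirred in methanol at 25 °C for 2 h."


@dataclass(frozen=True)
class Entity:
    """Task 1 style output: a character span plus a role label."""
    start: int
    end: int
    label: str  # illustrative label, not necessarily an official one


@dataclass(frozen=True)
class Event:
    """Task 2 style output: a trigger word linked to its arguments."""
    trigger: Entity
    arguments: Tuple[Entity, ...]


def span(phrase: str, label: str) -> Entity:
    """Locate the first occurrence of `phrase` in TEXT and label it."""
    start = TEXT.index(phrase)
    return Entity(start, start + len(phrase), label)


# Task 1: entities and their roles in the reaction.
solvent = span("methanol", "SOLVENT")
temperature = span("25 °C", "TEMPERATURE")
duration = span("2 h", "TIME")

# Task 2: an event ties the trigger to the entities it involves.
trigger = span("stirred", "REACTION_STEP")
event = Event(trigger, (solvent, temperature, duration))

for arg in event.arguments:
    print(arg.label, "->", TEXT[arg.start:arg.end])
```

Storing character offsets rather than only the matched strings keeps each annotation tied to an exact position in the patent text, which matters when the same compound name occurs more than once.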
The results of the CLEF2020 ChEMU task evaluation have been published in the following papers:
He, J., Nguyen, D.Q., Akhondi, S.A., Druckenbrodt, C., Thorne, C., Hoessel, R., Afzal, Z., Zhai, Z., Fang, B., Yoshikawa, H., Albahem, A., Cavedon, L., Cohn, T., Baldwin, T., Verspoor, K., 2020. Overview of ChEMU 2020: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents, in: Arampatzis, A., Kanoulas, E., Tsikrika, T., Vrochidis, S., Joho, H., Lioma, C., Eickhoff, C., Névéol, A., Cappellato, L., Ferro, N. (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 237–254. https://doi.org/10.1007/978-3-030-58219-7_18 (Submitted copy)
He, J., Nguyen, D.Q., Akhondi, S.A., Druckenbrodt, C., Thorne, C., Hoessel, R., Afzal, Z., Zhai, Z., Fang, B., Yoshikawa, H., Albahem, A., Wang, J., Ren, Y.R., Zhang, Z., Zhang, Y., Dao, M.H., Ruas, P., Lamurias, A., Couto, F.M., Copara, J., Naderi, N., Knafou, J., Ruch, P., Teodoro, D., Lowe, D., Mayfield, J., Köksal, A., Dönmez, H., Özkırımlı, E., Özgür, A., Mahendran, D., Gurdin, G., Lewinski, N., Tang, C., McInnes, B.T., C.S., M., Rk Rao., P., Lalitha Devi, S., Cavedon, L., Cohn, T., Baldwin, T., Verspoor, K., 2020. An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents, in: Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum. 31 p. (Posted by one of authors)
The CLEF-2021 ChEMU tasks, as presented by Prof. Karin Verspoor (Melbourne University, Australia) at the CLEF2020 meeting, will focus on reference resolution in chemical patents and consist of two new tasks:
Task 1 – Chemical reaction reference resolution: Given a chemical reaction snippet, the task aims to find similar reactions and general conditions that it refers to (see attached Fig. 1)
Task 2 – Anaphora resolution: Five types of references are defined: Coreference, Transformed, Reaction-associated, Work-up, Contained (see attached Fig. 2 & Fig. 3)
https://www.loom.com/share/d3ee08e4b8c64685bf13f0f37dc900b6
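For readers who think in data structures, the five reference types listed for the anaphora resolution task can be sketched as an enumeration, with each annotation linking an anaphor to its antecedent. The class names and the example snippet are hypothetical and are assumptions made for illustration; only the five type names come from the task description above.

```python
from dataclasses import dataclass
from enum import Enum


class ReferenceType(Enum):
    """The five reference types named in the task description."""
    COREFERENCE = "Coreference"
    TRANSFORMED = "Transformed"
    REACTION_ASSOCIATED = "Reaction-associated"
    WORK_UP = "Work-up"
    CONTAINED = "Contained"


@dataclass(frozen=True)
class ReferenceLink:
    """One annotation: an anaphor, the antecedent it points to, and the type."""
    anaphor: str
    antecedent: str
    relation: ReferenceType


# Hypothetical snippet: "This intermediate" points back to "Intermediate A",
# i.e. both expressions denote the same thing (a coreference link).
snippet = "Intermediate A was added to the flask. This intermediate dissolved."
link = ReferenceLink("This intermediate", "Intermediate A", ReferenceType.COREFERENCE)

print(f"{link.anaphor!r} -> {link.antecedent!r} [{link.relation.value}]")
```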
------------------------------------------
See also a list of selected publications by the speaker, Dr. Saber Akhondi, related to the recognition of chemical entities in patents:
Nguyen, D.Q., Zhai, Z., Yoshikawa, H., Fang, B., Druckenbrodt, C., Thorne, C., Hoessel, R., Akhondi, S.A., Cohn, T., Baldwin, T., Verspoor, K., 2020. ChEMU: Named Entity Recognition and Event Extraction of Chemical Reactions from Patents, in: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P., Ferro, N., Silva, M.J., Martins, F. (Eds.), Advances in Information Retrieval, Lecture Notes in Computer Science. Springer International Publishing, Cham, pp. 572–579. https://doi.org/10.1007/978-3-030-45442-5_74
Verspoor, K., Nguyen, D.Q., Akhondi, S.A., Druckenbrodt, C., Thorne, C., Hoessel, R., He, J., Zhai, Z., 2020. ChEMU dataset for information extraction from chemical patents. https://doi.org/10.17632/wy6745bjfj.1
Akhondi, S.A., Rey, H., Schwörer, M., Maier, M., Toomey, J., Nau, H., Ilchmann, G., Sheehan, M., Irmer, M., Bobach, C., Doornenbal, M., Gregory, M., Kors, J.A., 2019. Automatic identification of relevant chemical compounds from patents. Database (Oxford) 2019, baz001. https://doi.org/10.1093/database/baz001
Akhondi, S., 2018. Text Mining for Chemical Compounds (Ph.D.). Erasmus University Rotterdam. 167 p. (Ph.D. portfolio includes several articles published by the author)
Akhondi, S.A., Pons, E., Afzal, Z., van Haagen, H., Becker, B.F.H., Hettne, K.M., van Mulligen, E.M., Kors, J.A., 2016. Chemical entity recognition in patents by combining dictionary-based and statistical approaches. Database (Oxford) 2016, baw061. https://doi.org/10.1093/database/baw061
Akhondi, S.A., Klenner, A.G., Tyrchan, C., Manchala, A.K., Boppana, K., Lowe, D., Zimmermann, M., Jagarlapudi, S.A.R.P., Sayle, R., Kors, J.A., Muresan, S., 2014. Annotated Chemical Patent Corpus: A Gold Standard for Text Mining. PLoS ONE 9(9): e107477. https://doi.org/10.1371/journal.pone.0107477