Patent Information Users Group, Inc.

The International Society for Patent Information Professionals


Presentations on methods of inventor & assignee disambiguation on the program of USPTO Symposium on Entity Resolution, March 22 & 24 9:00 AM -12:35 PM ET, March 26 9:00-11:35 AM

  • 20 Mar 2021 5:16 AM
    Message # 10216758

    Note: The USPTO symposium on Entity Resolution includes presentations by researchers who have developed methods of inventor and assignee disambiguation. The program of the symposium, as of 3/20/2021, lists only the names of speakers and links to relevant publications. Presentations by speakers with publications on inventor or assignee disambiguation are selected below, and abstracts have been added to highlight the scope of the relevant research.

     USPTO is conducting a virtual symposium “to discuss state-of-the-art approaches to, and current practices and applications of, entity resolution, with a particular focus on patent applications.”

    March 22 & 24, 9:00 am -12:35 pm ET and March 26, 9:00 am-11:35 am ET

    Registration link

    USPTO symposium on entity resolution

    Join the USPTO for a virtual symposium on best practices for entity resolution, a three-day event hosted by the American Institutes for Research.

    The symposium will bring together computer scientists, information scientists, economists, and others to discuss state-of-the-art approaches to, and current practices and applications of, entity resolution, with a particular focus on patent applications. It will include sessions on methods and applications, with the goals of:

    • Providing an overview of current approaches from leading scholars in the field,
    • Building knowledge,
    • Identifying a community of practitioners, and
    • Facilitating the application of common approaches.

    The symposium will consist of three half-day sessions during the week of March 22–26, 2021, featuring the following speakers and panelists:

    • Monday, March 22, 9 a.m. to 12:30 p.m. ET: Rebecca Steorts, Duke University; Greg Morrison, University of Houston; Nicholas Monath, University of Massachusetts Amherst; and Osmat Jefferson, Lens.org
    • Wednesday, March 24, 9 a.m. to 12:30 p.m. ET: Donatella Firmani, Roma Tre University; Xin Luna Dong, Amazon; Julie Callaert, Catholic University Leuven; and Deyun Yin, World Intellectual Property Organization and Harbin Institute of Technology, Shenzhen
    • Friday, March 26, 9 – 11:30 a.m. ET: Mariagrazia Squicciarini and Hélène Dernis, Organization for Economic Cooperation and Development; Thorsten Doherr, Centre for European Economic Research; and Matthew Ross, Claremont Graduate School

    Program/Symposium webpage  (presentation slides would be placed here)

    Agenda

    Registration link


    Presentations of selected speakers (with relevant publications linked to the program)

    Monday, March 22, 2021

    9:55–10:40 a.m.

    Presenter: Greg Morrison, Professor, University of Houston, Department of Physics

    "Disambiguation of patent inventors and assignees using high-resolution geolocation data" (presentation, 23 slides)

    Paper:

    Morrison, G., Riccaboni, M., Pammolli, F., 2017. Disambiguation of patent inventors and assignees using high-resolution geolocation data. Scientific Data 4(1), 1-21. https://doi.org/10.1038/sdata.2017.64 [the article has been added - ab]
    … A major obstacle to extracting useful information from [patents] is the problem of name disambiguation: linking alternate spellings of individuals or institutions to a single identifier to uniquely determine the parties involved in knowledge production and diffusion. In this paper, we describe a new algorithm that uses high-resolution geolocation to disambiguate both inventors and assignees on about 8.5 million patents found in the EPO, under the PCT, and in the USPTO.


    10:55-11:25 a.m.

    Presenter: Nicholas Monath, PhD student, University of Massachusetts Amherst, Computer Science

    "Disambiguation in PatentsView & Non-Greedy Incremental Clustering" (presentation, 152 p)

    Monath, N., Madhavan, S., DiPietro, C., McCallum, A., Jones, C. (University of Massachusetts Amherst & American Institutes for Research), 2021. Disambiguating Patent Inventors, Assignees, and their Locations in PatentsView. 2 p.
    PatentsView, an initiative supported by the Office of Chief Economist in the US Patent & Trademark Office (USPTO), is a tool for patent search, analysis, and visualization. It provides a curated and easy-to-use database of pre-granted and granted USPTO patents from 1976 to present. Not only does PatentsView carefully process and clean raw patent data collections from USPTO’s bulk XML files (https://bulkdata.uspto.gov/), but it also performs entity resolution of the ambiguous inventor names, assignee names, and the location names of the inventors and assignees. This process disambiguates which inventor, assignee, and location names refer to the same entity in PatentsView. In this work, we describe the entity resolution models and algorithms used in PatentsView, highlight their technical and empirical strengths, and provide examples of studies that have benefited from PatentsView’s disambiguation.

    Monath, N., McCallum, A., Wick, M., Sullivan, J., Kobren, A. (University of Massachusetts Amherst). Discriminative Hierarchical Coreference for Inventor Disambiguation. Presented at the PatentsView Inventor Disambiguation Technical Workshop, Sep. 24, 2015 (USPTO), 124 p. [this presentation has been added - ab]
    The team of Nicholas Monath and Prof. Andrew McCallum of the University of Massachusetts Amherst presented the best algorithm among six research teams; it has been implemented in PatentsView, the USPTO’s patent data visualization and analysis tool.

     

    11:25–11:55 a.m.

    Presenter: Osmat Azzam Jefferson, PhD, Director of Product Development, Cambia, Lens.org

    "A metarecord architecture for reconciling global entity resolution efforts" (presentation, 16 p.)


    Jefferson, O.A., Jaffe, A., Ashton, D, et al. 2018. Mapping the global influence of published research on industry and innovation. Nature Biotechnology 36, 31–39. https://doi.org/10.1038/nbt.4049
    … we have integrated and interconnected scholarly citations with global patent literature and created new tools to link the scholarly literature with the patent literature. … We outline an evolving toolkit, Lens Influence Mapping, that allows assessment of individual scholarly works and aggregated outputs of authors for influence on industry and enterprise, as measured by citations within patents.
    (Cf.  In4M metrics in Using the In4M Metric to rank global research institutions (methodology))

    What is the Lens…  Mapping PORTs (?). The Lens, the flagship project of the social enterprise Cambia, seeks to source, merge and link diverse open knowledge sets, including scholarly works and patents…https://about.lens.org/

    Jefferson, O.A., An open platform for discovery, analytics, mapping and management of research works and innovation pathways. Sep.8, 2020. 20 p. (Patcite: Open Influence Mapping Facility, p.14; In4M [=International Industry & Innovation Influence Metric] reports, p.15; 15-digit LensID for …scholarly works or patents, p.7, Lens scholarly.data, 220 mln records, Aug. 2020, p.6, 8-9, etc.) [the article has been added - ab]

    See also Lens Patcite, for analyzing linkages between academic research and inventions [based on citations to and from articles and patents]: “The granularity of this tool allows you to gain real-time insights into how science and scholarship are shaping patent-based inventions and which research article, which scientists or researchers, and potentially, which institutions have influence over a subset of economic activity.”

    12:00–12:30 p.m.
    Moderators: Andrew Toole, PhD, Chief Economist, USPTO, and Mark Finlayson, PhD, Professor, Florida International University, Computing & Information Sci.
    Panel with speakers on best practices for results validation

     

    Wednesday, March 24, 2021

    10:55–11:25 a.m.
    Presenter: Julie Callaert, PhD, Senior Researcher, ECOOM - Centre for Research and Development Monitoring, KU Leuven (Belgium)

    "The craft of harmonizing patentees" (presentation, 32 p.)

    Additional references in the presentation:
    Compendium of underlying methodologies:

    Callaert, J., Du Plessis, M., Grouwels, J., Lecocq, C., Magerman, T., Peeters, B., Song, X., Van Looy, B., Vereyen, C., 2011. Patent Statistics at Eurostat: Methods for Regionalisation, Sector Allocation and Name Harmonisation. Eurostat Methodologies and Working Papers, 2011 edition. Eurostat. 72 p. https://ec.europa.eu/eurostat/documents/3859598/5916785/KS-RA-11-008-EN.PDF/ffe43370-8063-4e07-b77e-0319f1a79294?version=1.0

    Linked publications:

    Peeters, B., Song, X., Callaert, J., Grouwels, J., Van Looy, B., 2010. Harmonizing harmonized patentee names: an exploratory assessment of top patentees, Eurostat Working Paper. European Commission, Brussels (Belgium). 29 p. https://lirias.kuleuven.be/retrieve/106430
    Several name harmonization approaches have been developed in the past to correct for different name variants occurring for one organization or individual. Each of these methods however has limitations regarding coverage and/or accuracy. In this paper, we explore a methodology to complement automated harmonizing efforts by inspecting outcomes of harmonized name efforts. The emphasis is put on a high coverage in terms of patent volumes, on high accuracy and on completeness (all person names of the PATSTAT person table that are patentees). The approach developed by Magerman et al. (2009) [see below] serves as a starting point.

    Magerman, T., Van Looy, B., Song, X., 2006. Data Production Methods for Harmonized Patent Statistics: Patentee Name Harmonization (Research Report No. MSI 0605). K.U. Leuven, Leuven, Belgium. 88 p. https://doi.org/10.2139/ssrn.944470
    … In this paper, we develop a comprehensive method to achieve harmonization of patentee names in an automated way so that analysis at the level of patentees can be facilitated. The method has been applied to an extensive set of all patentee names found for all EPO patent applications published between 1978 and 2004 and all granted USPTO patents published between 1991 and 2003. Priority has been given to accuracy (the extent to which the name-harmonization procedure correctly allocates name variants to a single, harmonized patentee name) (compare to 'completeness' as the extent to which the name-harmonization procedure is able to capture all name variants of the same patentee). The focus of the methodology outlined in this paper is on patentee name harmonization, which does not equate to harmonization on the level of the legal entity. (Legal entity harmonization is concerned with the identification of all patents owned by one and the same legal entity and takes mergers and acquisitions, name changes, and subsidiaries into account.)
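
    As a rough illustration of what automated patentee name harmonization involves, the sketch below normalizes case and punctuation and strips trailing legal-form tokens so that spelling variants share one key. It is not the Magerman et al. procedure: the LEGAL_FORMS list, the function names and the sample names are assumptions made for this example, and a real rule set would be much larger.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of legal-form tokens (assumption for
# illustration only; real harmonization rule sets are far more extensive).
LEGAL_FORMS = {
    "INC", "CORP", "CORPORATION", "CO", "COMPANY", "LTD", "LIMITED",
    "GMBH", "AG", "SA", "NV", "BV", "LLC",
}


def harmonize(name: str) -> str:
    """Return a crude harmonized key for a patentee name."""
    text = name.upper().replace("&", " AND ")
    text = text.replace(".", "")              # "A.G." -> "AG"
    # Remaining punctuation becomes a space. Accented letters are also
    # dropped here, which a real implementation would transliterate instead.
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    tokens = text.split()
    # Strip trailing legal-form tokens, but never empty the name entirely.
    while len(tokens) > 1 and tokens[-1] in LEGAL_FORMS:
        tokens.pop()
    return " ".join(tokens)


def group_variants(names):
    """Group raw patentee names by their harmonized key."""
    groups = defaultdict(list)
    for raw in names:
        groups[harmonize(raw)].append(raw)
    return dict(groups)


if __name__ == "__main__":
    sample = ["Siemens AG", "SIEMENS A.G.", "Samsung Electronics Co., Ltd."]
    for key, variants in group_variants(sample).items():
        print(key, "<-", variants)
```

    In line with the distinction drawn in the abstract, a sketch like this only merges spelling variants of one name; it does nothing for mergers, renamings or subsidiaries, which belong to legal-entity harmonization.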

    See also:
    Magerman, T., Peeters, B., Song, X., Grouwels, J., Callaert, J., Van Looy, B., 2011. Name harmonisation, in: Patent Statistics at Eurostat: Methods for Regionalisation, Sector Allocation and Name Harmonisation, Eurostat Methodologies and Working Papers. 2011 edition. Eurostat, pp. 30–54. https://lirias.kuleuven.be/retrieve/153491 [the article has been added - ab]

    11:25–11:55 a.m.
    Presenter: Deyun Yin, PhD, Economist, World Intellectual Property Organization

    "Large-scale Name Disambiguation of Chinese Patent Inventors" (presentation, 26 p.)

    Linked publications
    de Rassenfosse, G., Kozak, J., Seliger, F., 2019. Geocoding of worldwide patent data. Scientific Data 6, 260 (15 p.) https://doi.org/10.1038/s41597-019-0264-6
    The dataset provides geographic coordinates for inventor and applicant locations in 18.8 million patent documents spanning over more than 30 years. The geocoded data are further allocated to the corresponding countries, regions and cities. When the address information was missing in the original patent document, we imputed it by using information from subsequent filings in the patent family. The resulting database can be used to study patenting activity at a fine-grained geographic level without creating bias towards the traditional, established patent offices.
    [Dr. Yin is acknowledged in the paper for providing valuable advice on the assignment of Chinese city information]
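
    The family-based imputation mentioned in the abstract above (taking a missing address from a subsequent filing in the same patent family) can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout with the keys 'doc_id' and 'address', and the assumption that filings arrive sorted by filing date, are hypothetical.

```python
def impute_addresses(family):
    """Fill missing addresses from later filings in the same patent family.

    `family` is a list of filings sorted by filing date, earliest first.
    Each filing is a dict with the hypothetical keys 'doc_id' and 'address'
    (None or an empty string when the address is missing).
    Returns new dicts with 'address' filled where possible and an
    'address_source' key naming the filing the address came from.
    """
    result = []
    for i, filing in enumerate(family):
        address = filing.get("address")
        source = filing["doc_id"] if address else None
        if not address:
            # Look only at subsequent filings, as described in the abstract.
            for later in family[i + 1:]:
                if later.get("address"):
                    address = later["address"]
                    source = later["doc_id"]
                    break
        result.append({**filing, "address": address, "address_source": source})
    return result


if __name__ == "__main__":
    family = [
        {"doc_id": "A1", "address": None},
        {"doc_id": "A2", "address": "Lausanne, CH"},
        {"doc_id": "A3", "address": None},
    ]
    for row in impute_addresses(family):
        print(row)
```

    Recording the source filing alongside the imputed value keeps imputed and original addresses distinguishable in later analysis.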

    World Intellectual Property Report 2019. The Geography of Innovation: Local Hotspots, Global Networks, World Intellectual Property Organization, 2019. 128 p. (WIPO Publication No. 944) https://www.wipo.int/edocs/pubdocs/en/wipo_pub_944_2019.pdf
    The report was prepared by a team led by Julio Raffo, Intan Hamdan-Livramento, Maryam Zehtabchi and Deyun Yin, all from WIPO’s Economics and Statistics Division (ESD).
    Technical notes
    Geocoding  (p.123)
    In the case of patents, 87 percent of the international patent families filed from 1976 to 2015 were geocoded. … As far as possible, the geocoding was applied to the inventors’ addresses by using the most complete and reliable data source available within each patent family. In addition, the data were enriched with existing geocoded patent data (see Yin and Motohashi, 2018; Ikeuchi et al., 2017; Li et al., 2014; de Rassenfosse et al., 2019; Morrison et al., 2017). … When there was more than one source for a given patent family, the following order of priority was given: (1) sources having information from the inventor (inventor principle); (2) sources having more inventors’ addresses covered (coverage principle); (3) sources with the best geocoding resolution (resolution principle); (4) sources closest to the address country – e.g., entrusting Chinese addresses to CNIPA data, Japanese addresses to Japan Patent Office (JPO) data, etc. (local principle); and (5) manual check and ad hoc selection when two or more sources were still available. … For more information, please refer to Miguelez et al. (2019).
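
    The five-step priority order quoted above reads naturally as a lexicographic sort with a fallback to manual review. The sketch below is one possible reading under stated assumptions: the GeoSource fields, the numeric resolution scale (lower means finer) and the exact-match test for the local principle are all hypothetical, not taken from the WIPO report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoSource:
    name: str
    from_inventor: bool      # (1) inventor principle
    inventors_covered: int   # (2) coverage principle
    resolution_rank: int     # (3) resolution principle, lower = finer (assumed scale)
    office_country: str      # (4) local principle: country of the supplying office


def choose_source(sources, address_country):
    """Pick a geocoding source for one patent family.

    Returns (best_source, needs_manual_check). The flag is True when two or
    more sources are still tied after principles (1) to (4), which is the
    situation the report resolves by manual check (5).
    """
    def key(s):
        return (
            not s.from_inventor,                  # inventor information first
            -s.inventors_covered,                 # then widest coverage
            s.resolution_rank,                    # then finest resolution
            s.office_country != address_country,  # then the local office
        )

    ranked = sorted(sources, key=key)
    best = ranked[0]
    tied = [s for s in ranked if key(s) == key(best)]
    return best, len(tied) > 1


if __name__ == "__main__":
    candidates = [
        GeoSource("other", False, 5, 0, "CN"),
        GeoSource("EPO", True, 3, 1, "EP"),
        GeoSource("CNIPA", True, 3, 1, "CN"),
    ]
    best, manual = choose_source(candidates, address_country="CN")
    print(best.name, manual)
```

    A tuple key keeps each principle strictly subordinate to the ones before it, so coverage only matters among sources that agree on the inventor principle, and so on down the list.
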
    Refs.
    Yin, D. and K. Motohashi (2018). Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985–2016), RIETI Discussion Paper Series, 18-E-018. 56 p. https://econpapers.repec.org/paper/etidpaper/18018.htm
    Ikeuchi, K., K. Motohashi, R. Tamura and N. Tsukada (2017). Measuring Science Intensity of Industry using Linked Dataset of Science, Technology and Industry. RIETI Discussion Paper Series, 17-E-056. 46 p. www.rieti.go.jp/en/publications/summary/17030073.html
    Li, G.C., R. Lai, A. D’Amour, D.M. Doolin, Y. Sun, V.I. Torvik, A.Z. Yu and L. Fleming (2014). Disambiguation and co-authorship networks of the U.S. patent inventor database (1975–2010). Research Policy, 43, 941–955  http://funginstitute.berkeley.edu/wp-content/uploads/2014/05/Research-Policy.pdf
    de Rassenfosse, G., Kozak, J., Seliger, F., 2019. Geocoding of worldwide patent data. Scientific Data 6, 260 (15 p.) https://doi.org/10.1038/s41597-019-0264-6 [see above]
    Morrison, G., M. Riccaboni and F. Pammolli (2017). Disambiguation of patent inventors and assignees using high-resolution geolocation data. Scientific Data, 4(1), 1-21. https://doi.org/10.1038/sdata.2017.64
    Miguelez, E., J. Raffo, C. Chacua, M. Coda-Zabetta, D. Yin, F. Lissoni and G. Tarasconi (2019). Tied In: The Global Network of Local Innovation. WIPO Working Paper No. 58, November. Geneva: WIPO. https://tind.wipo.int/record/40558/files/wipo_pub_econstat_wp_58.pdf [Corresponding author: Deyun Yin]


    Yin, D., Motohashi, K., 2018. Inventor Name Disambiguation with Gradient Boosting Decision Tree and Inventor Mobility in China (1985-2016) (Discussion paper). Research Institute of Economy, Trade and Industry (RIETI). https://www.rieti.go.jp/jp/publications/dp/18e018.pdf  [the article has been added - ab]
    … [We] found that gradient boosting decision trees classifier outperforms all other classifiers with the highest F1-score and stable performance in solving the homonym problem prevailing in Chinese names. … In the last step … we clustered records with the density-based spatial clustering of applications with noise (DBSCAN) based on the distance matrix predicted by the GBDT classifier.
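
    The final clustering step described in the abstract, DBSCAN run on a precomputed pairwise distance matrix, can be illustrated with a small pure-Python version. This is a sketch under assumptions: the distance values, eps and min_samples below are invented for the example, whereas in the paper the distances come from the GBDT classifier, and the authors' actual implementation and settings are not given in this post.

```python
def dbscan_precomputed(dist, eps, min_samples):
    """DBSCAN on a symmetric distance matrix; returns labels, -1 meaning noise."""
    n = len(dist)
    labels = [None] * n           # None = not yet visited
    cluster = -1

    def neighbors(i):
        # Neighborhood includes the point itself, as in the usual definition.
        return [j for j in range(n) if dist[i][j] <= eps]

    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1        # provisional noise, may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # former noise joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)    # j is a core point, keep expanding
    return labels


if __name__ == "__main__":
    # Invented distances between six patent records sharing one inventor name:
    # records 0-2 and records 3-4 are close to each other, record 5 is isolated.
    dist = [
        [0.0, 0.1, 0.2, 0.9, 0.9, 0.9],
        [0.1, 0.0, 0.1, 0.9, 0.8, 0.9],
        [0.2, 0.1, 0.0, 0.9, 0.9, 0.9],
        [0.9, 0.9, 0.9, 0.0, 0.1, 0.9],
        [0.9, 0.8, 0.9, 0.1, 0.0, 0.9],
        [0.9, 0.9, 0.9, 0.9, 0.9, 0.0],
    ]
    print(dbscan_precomputed(dist, eps=0.3, min_samples=2))
```

    With min_samples=2 the isolated record is labeled as noise; setting min_samples=1 would instead give it a cluster of its own, which may be the more natural choice when a single patent can represent a distinct inventor.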

    12:00–12:30 p.m.
    Moderators: Andrew Toole, PhD, Chief Economist, USPTO, and Mark Finlayson, PhD, Professor, Florida International University, Computing & Information Sci.
    Panel with speakers on best practices for results validation
     

     

    Friday, March 26, 2021

    9:10–9:40 a.m.
    Presenters: Mariagrazia Squicciarini, PhD, Senior Economist, Directorate for Science, Technology, and Industry, Organization for Economic Cooperation and Development (OECD); Hélène Dernis, Analyst, OECD

    "Harmonisation of IP Applicant Names: The OECD Experience" (presentation, 13 p.)

    Squicciarini, M., Dernis, H., 2013. A Cross-Country Characterisation of the Patenting Behaviour of Firms based on Matched Firm and Patent Data. OECD Science, Technology and Industry Working Papers, No. 2013/05. 35 p. https://doi.org/10.1787/5k40gxd4vh41-en
    This work proposes a characterisation of the patenting behaviours of firms. It relies on patent data linked to firm data from a commercial dataset, regards firms of 20 or more employees located in 15 countries, and refers to the period 1999-2010. The way in which patent assignees’ names are linked to firm names is explained, and the coverage and representativeness of the firm database used is discussed using information from structural business statistics. The profile of patenting and non-patenting firms is delineated on the basis of characteristics such as firm size, ownership, firm age and industry, and of combinations thereof.


    9:40–10:10 a.m.
    Presenter: Thorsten Doherr, PhD, Dept. of Economics of Innovation and Industrial Dynamics, Centre for European Economic Research (ZEW)

    "Disambiguation by Namesake Risk Assessment" (presentation, 23 p.)

     


    Doherr, T. Disambiguation by namesake risk assessment, ZEW Discussion Papers, No. 21–021. February 2021. 40 p.
    … We introduce a universal method to assess the risk […] collecting the documents into personalized clusters. A theoretical setup for the probability of drawing a namesake depending on the number of namesakes in the population and the size of the observed unit replaces the need for training datasets, thereby avoiding a namesake bias caused by the inherent underestimation of namesakes in training/benchmark data.
    See the example of a trait vector constructed for the EPO patents, p. 21-23, and the application of the algorithm to patent data, p. 30-32.


    11:00–11:30 a.m.
    Moderators: Andrew Toole, PhD, Chief Economist, USPTO, and Mark Finlayson, PhD, Professor, Florida International University, Computing & Information Sci.
    Panel with speakers on best practices for results validation



    ----------------------------------------------------

    See also presentation on the theory of entity resolution:
    Wednesday, March 24, 2021
    9:10–9:55 a.m.
    Donatella Firmani, PhD, Professor, Roma Tre University, Engineering Department, Computer Science and Automation

    A link to Dr. Firmani's slides for five lectures given in June 2020 for a course on “Modern approaches to Entity Resolution” (includes Introduction to Entity Resolution and Data Integration; Modern Approaches for Recognition of Duplicates; Modern Approaches for Clustering and Reducing the Duplicates Search Space; Explainable AI methods for Entity Resolution; Beyond Entity Resolution: Knowledge Graphs)

    Update 3/26/2021 8:46 AM: Titles and links to presentations have been added.
    Last modified: 26 Mar 2021 8:47 AM | Anonymous member
