Extracting Problem Linkages to Improve Knowledge Exchange between Science and Technology Domains using an Attention-based Language Model

Extracting Problem Linkage Using an Attention-based Language Model

  • H. Sasaki Institute for Future Initiatives, The University of Tokyo, Japan
  • S. Yamamoto Data Artist Inc., Tokyo, Japan
  • A. Agchbayar Data Artist Inc., Tokyo, Japan
  • N. Nkhbayasgalan Data Artist Inc., Tokyo, Japan
Keywords: problem extraction, attention-based language model, information matching, scientometrics, Literature-Based Discovery (LBD)


Science and technology activities can be considered problem-solving activities, and scientific papers and patent publications can be viewed as records of the explicit knowledge gained through problem-solving in academia and industry, respectively. However, even within the same field, the approaches to the same problem are not consistent between a paper and a patented technology. The resulting information silos between science and technology make human intellectual production less efficient. This study therefore examines whether insights from technical problems can be shared with academics to help solve scientific problems. We propose linking the problems of these two domains through a linguistic approach to knowledge discovery that connects science and technology. We collected scientific papers from the Association for Computational Linguistics dataset and patent literature from the Derwent Innovation platform. From these, pairs of problem-defining sentences were identified and extracted using an attention-based language model. For example, we extracted issues that do not necessarily originate in scientific papers, such as annotation difficulties in the analysis of social network data, but that were hinted at by patented techniques published before the paper. These results suggest that scientific problems and industrial solutions can provide mutual insight. This knowledge discovery approach is recommended not only for benefiting corporate activities but also for grasping research trends.





