The New Dataset MITWPU-1K for Object Recognition and Image Captioning Tasks
Received: 5 May 2022 | Revised: 19 May 2022 | Accepted: 20 May 2022 | Online: 7 August 2022
Corresponding author: M. Bhalekar
Abstract
In the domain of image captioning, many publicly available datasets exist. Using these datasets, models can be trained to automatically generate descriptions of an image's contents. Researchers usually do not spend much time creating and annotating a new dataset for a specific application; instead, they simply use the existing ones. MS COCO, ImageNet, Flickr, and Pascal VOC are well-known datasets that are widely used for the task of generating image captions. However, most available image captioning datasets lack the textual information present within images, which can play a vital role in generating more precise image descriptions. This paper presents the process of creating a new dataset that consists of images along with their embedded text and captions. Images of the vicinity of the campus of MIT World Peace University (MITWPU), India, were taken for the new dataset, named MITWPU-1K. This dataset can be used for object detection and image caption generation. The objective of this paper is to highlight the steps required for creating a new dataset, which necessitated a review of existing datasets beforehand. A sequential convolutional model for detecting objects in the new dataset is also presented, and the insights gained while creating the image captioning dataset are described.
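To make the abstract's mention of a sequential convolutional model for object recognition concrete, the following is a minimal, hypothetical sketch of such a model in Keras. The layer configuration, input resolution (224x224), number of classes, and the "MITWPU-1K/train" directory path are illustrative assumptions only and do not reflect the authors' exact architecture or data layout.

    # Hypothetical sketch of a sequential convolutional classifier of the kind
    # described in the abstract; sizes and paths are assumptions, not the
    # authors' published model.
    import tensorflow as tf
    from tensorflow.keras import layers, models

    NUM_CLASSES = 10          # assumed number of object categories in MITWPU-1K
    IMAGE_SIZE = (224, 224)   # assumed input resolution

    model = models.Sequential([
        layers.Input(shape=(*IMAGE_SIZE, 3)),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

    # Training would then read labelled images from a folder-per-class layout,
    # e.g. (assumed path):
    # train_ds = tf.keras.utils.image_dataset_from_directory(
    #     "MITWPU-1K/train", image_size=IMAGE_SIZE)
    # model.fit(train_ds, epochs=10)

In practice, the class list, image resolution, and training schedule would follow the labelling scheme and dataset creation steps detailed in the full paper.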
Keywords:
convolutional model, dataset, image captioning, image labelling, object detection
References
T.-Y. Lin et al., "Microsoft COCO: Common Objects in Context," in Computer Vision – ECCV 2014, 2014, pp. 740–755. DOI: https://doi.org/10.1007/978-3-319-10602-1_48
M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The Pascal Visual Object Classes Challenge: A Retrospective," International Journal of Computer Vision, vol. 111, no. 1, pp. 98–136, Jan. 2015. DOI: https://doi.org/10.1007/s11263-014-0733-5
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, Jun. 2009, pp. 248–255. DOI: https://doi.org/10.1109/CVPR.2009.5206848
R. Doon, T. Kumar Rawat, and S. Gautam, "Cifar-10 Classification using Deep Convolutional Neural Network," in 2018 IEEE Punecon, Pune, India, Aug. 2018. DOI: https://doi.org/10.1109/PUNECON.2018.8745428
J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 6517–6525. DOI: https://doi.org/10.1109/CVPR.2017.690
Y. Jia et al., "Caffe: Convolutional Architecture for Fast Feature Embedding," in MM ’14: Proceedings of the 22nd ACM international conference on Multimedia, New York, NY, USA, Aug. 2014, pp. 675–678. DOI: https://doi.org/10.1145/2647868.2654889
V. Sharma and R. N. Mir, "A comprehensive and systematic look up into deep learning based object detection techniques: A review," Computer Science Review, vol. 38, Nov. 2020, Art. no. 100301. DOI: https://doi.org/10.1016/j.cosrev.2020.100301
M. D. Z. Hossain, F. Sohel, M. F. Shiratuddin, and H. Laga, "A Comprehensive Survey of Deep Learning for Image Captioning," ACM Computing Surveys, vol. 51, no. 6, pp. 118:1-118:36, Oct. 2019. DOI: https://doi.org/10.1145/3295748
G. Tanner, "Creating your own object detector," Towards Data Science, Feb. 06, 2019. https://towardsdatascience.com/creating-your-own-object-detector-ad69dda69c85.
M. Bhalekar and M. Bedekar, "D-CNN: A New model for Generating Image Captions with Text Extraction Using Deep Learning for Visually Challenged Individuals," Engineering, Technology & Applied Science Research, vol. 12, no. 2, pp. 8366–8373, Apr. 2022. DOI: https://doi.org/10.48084/etasr.4772
B. Ahmed, G. Ali, A. Hussain, A. Baseer, and J. Ahmed, "Analysis of Text Feature Extractors using Deep Learning on Fake News," Engineering, Technology & Applied Science Research, vol. 11, no. 2, pp. 7001–7005, Apr. 2021. DOI: https://doi.org/10.48084/etasr.4069
S. Nuanmeesri, "A Hybrid Deep Learning and Optimized Machine Learning Approach for Rose Leaf Disease Classification," Engineering, Technology & Applied Science Research, vol. 11, no. 5, pp. 7678–7683, Oct. 2021. DOI: https://doi.org/10.48084/etasr.4455
License
Copyright (c) 2022 Madhuri Bhalekar, Mangesh Bedekar
This work is licensed under a Creative Commons Attribution 4.0 International License.