
Imputation of trip data for a docked bike-sharing system


Affiliations
1 Department of Civil Engineering, Rajiv Gandhi Institute of Technology, Kottayam 686 501, India
2 Department of Civil Engineering, Indian Institute of Science, Bengaluru 560 012, India
3 Department of Civil Engineering, Transport Division, Universidad de Chile, Chile
 

Abstract

Mobile application-based transportation services are reshaping the urban transportation industries of both the developed and developing worlds. They generate massive amounts of data, which have the potential to provide deeper insights into urban travel activity than ever before. The bike-sharing service (BSS) market is growing rapidly, with new service providers entering the arena. However, several BSS start-ups in India have failed in recent years. All these cases have one aspect in common: user dissatisfaction caused by insufficient or ineffective rebalancing. BSS operators rely on data insights to drive their policies and strategies. However, the data generated by these services are found to contain several incomplete records, such as trips with a missing origin or destination, as a result of various technical errors. As most BSS modelling focuses on trip origin and destination, ignoring (listwise deleting) trips with missing information discards valuable data still present in the other observed variables, such as trip duration and the date and time of the trip. This study proposes two methods for imputing the missing data: (i) a probabilistic approach based on Bayes’ theorem, and (ii) a machine learning approach based on the k-nearest neighbor algorithm. Both methodologies are presented in detail. Data from a BSS that operated on the Indian Institute of Science campus, Bengaluru, India, are used to illustrate the proposed approaches. This is followed by a brief discussion of the results and a comparison of the performance of the two approaches.

Keywords

Bike-sharing system, imputation, incomplete records, origin and destination, probabilistic and machine learning approaches, trip data.
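
The abstract names the two imputation approaches but does not give their formulation. The following is a minimal sketch of how each could work on toy data. It is not the authors' implementation. The trip tuples, station labels, duration bin width, smoothing constant `alpha`, distance scaling and `k` are all illustrative assumptions.

```python
from collections import Counter, defaultdict
import math

# Hypothetical complete trips: (origin, destination, duration_min, start_hour).
COMPLETE_TRIPS = [
    ("A", "B", 5, 9), ("A", "B", 6, 9), ("A", "B", 5, 10),
    ("A", "C", 14, 9), ("A", "C", 15, 17), ("A", "C", 16, 18),
    ("B", "A", 5, 17), ("B", "C", 10, 12), ("C", "A", 15, 8),
]


def duration_bin(minutes, width=5):
    """Discretise trip duration into bins of `width` minutes (assumed width)."""
    return int(minutes // width)


def bayes_impute_destination(trips, origin, duration, alpha=1.0):
    """Pick the destination d maximising P(d | origin) * P(bin | origin, d).

    Both terms are estimated from complete trips with Laplace smoothing.
    Returns the most probable destination and the normalised posterior.
    """
    stations = sorted({t[0] for t in trips} | {t[1] for t in trips})
    bins = {duration_bin(t[2]) for t in trips} | {duration_bin(duration)}
    prior = Counter(t[1] for t in trips if t[0] == origin)
    likelihood = defaultdict(Counter)
    for o, d, dur, _ in trips:
        if o == origin:
            likelihood[d][duration_bin(dur)] += 1
    n_origin = sum(prior.values())
    b = duration_bin(duration)
    scores = {}
    for d in stations:
        p_d = (prior[d] + alpha) / (n_origin + alpha * len(stations))
        p_b = (likelihood[d][b] + alpha) / (prior[d] + alpha * len(bins))
        scores[d] = p_d * p_b
    total = sum(scores.values())
    posterior = {d: s / total for d, s in scores.items()}
    return max(posterior, key=posterior.get), posterior


def knn_impute_destination(trips, origin, duration, hour, k=3):
    """Majority vote among the k nearest complete trips from the same origin.

    Distance uses duration and start hour, scaled by assumed constants.
    Falls back to all trips if no complete trip shares the origin.
    """
    candidates = [t for t in trips if t[0] == origin] or list(trips)

    def dist(t):
        gap = abs(t[3] - hour)
        hour_gap = min(gap, 24 - gap)  # hours wrap around midnight
        return math.hypot((t[2] - duration) / 5.0, hour_gap / 6.0)

    nearest = sorted(candidates, key=dist)[:k]
    votes = Counter(t[1] for t in nearest)
    return votes.most_common(1)[0][0]


bayes_dest, bayes_posterior = bayes_impute_destination(COMPLETE_TRIPS, "A", 15)
knn_dest = knn_impute_destination(COMPLETE_TRIPS, "A", 15, 17)
```

In this toy setting, a 15-minute trip starting at station A resembles the recorded A-to-C trips, so both sketches assign destination C. A missing origin could be handled symmetrically by swapping the roles of origin and destination.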
Authors

Milan Mathew Thomas
Department of Civil Engineering, Rajiv Gandhi Institute of Technology, Kottayam 686 501, India
Ashish Verma
Department of Civil Engineering, Indian Institute of Science, Bengaluru 560 012, India
Sai Kiran Mayakuntla
Department of Civil Engineering, Transport Division, Universidad de Chile, Chile

DOI: https://doi.org/10.18520/cs%2Fv122%2Fi3%2F310-318