User:Quaenuncabibis/Biogeme

Biogeme
Original author(s): Michel Bierlaire
Developer(s): Michel Bierlaire
Initial release: 2000
Stable release: 3.2.6 / 5 June 2020
Repository: github.com/michelbierlaire/biogeme
Written in: Python
Operating system: Linux, macOS, Windows
Available in: English
Type: Scientific
Website: biogeme.epfl.ch

Biogeme is an open-source software package dedicated to the estimation of discrete choice models. It implements optimization algorithms to perform maximum likelihood estimation of the model parameters. Users specify the model using a modeling language. The current version of the software is distributed as a Python package.

History

The development of Biogeme as a freeware package started in 2000. Its precursor was a software package called "Hielow". Developed in the 1990s for Windows, Hielow was designed to estimate nested logit models and came with a graphical user interface.[1]

Versions

Several versions of Biogeme have been released over the years. Each of the three major releases was a complete reimplementation of the software, designed to exploit the most up-to-date technology.[2]

Version 1.0

BisonBiogeme, released in 2000, is stand-alone software written in C++. It is based on a simple modeling language whose parser is produced with Bison, a parser generator.[3][4]

Version 2.0

PythonBiogeme, released in 2010, is stand-alone software written in C++ that uses Python as the modeling language.[5]

Version 3.0

PandasBiogeme, released in 2018, is a Python package written in both C++ and Python. It uses Python as the modeling language and Pandas for data management.[6]

Development

Biogeme is developed and maintained by Michel Bierlaire at the Transport and Mobility Laboratory of EPFL (École Polytechnique Fédérale de Lausanne).[7][8] The latest version, PandasBiogeme, is a Python package developed in Python and C++.

Distribution

The source code is available on GitHub.[9] The Python package is available on the Python Package Index.[10] Material, including examples of models, text and video tutorials, and real data, is available on the Biogeme webpage.[2]

Features

Biogeme gives the modeler considerable flexibility to code the (log) likelihood function of a model using the features of the Python language, together with several utilities that are specific to Biogeme. The software calculates the analytical derivatives of the log likelihood function by automatic differentiation. A library of choice models is directly available from the package. It includes the logit model, the nested logit model,[11][12][13] the cross-nested logit model,[14][15] multivariate extreme value models,[16] and mixtures of logit models.[17] A tutorial is available for the specification of hybrid choice models,[18][19][20] that is, choice models with latent variables. Biogeme can handle both cross-sectional and panel data.
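
To give a feel for what automatic differentiation of a log likelihood involves, the following is a minimal sketch of forward-mode differentiation with dual numbers, applied to a binary logit with a single parameter. It is not Biogeme's implementation: the Dual class, the helper functions, and the numerical values are all assumptions made for illustration.

```python
import math


class Dual:
    """A value paired with its derivative with respect to one parameter."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (f * g)' = f' * g + f * g'
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def dual_exp(x):
    # Chain rule: exp(f)' = exp(f) * f'
    return Dual(math.exp(x.value), math.exp(x.value) * x.deriv)


def dual_log(x):
    # Chain rule: log(f)' = f' / f
    return Dual(math.log(x.value), x.deriv / x.value)


def log_likelihood(beta, time_a, time_b):
    """Log probability of choosing A in a binary logit with V = beta * time."""
    v_a = beta * time_a
    v_b = beta * time_b
    return v_a + (-1.0) * dual_log(dual_exp(v_a) + dual_exp(v_b))


# Seed the derivative of the parameter with 1 to differentiate with respect to it.
beta = Dual(-1.2, 1.0)
result = log_likelihood(beta, time_a=0.5, time_b=0.8)

# Closed-form derivative for comparison: P(B) * (time_a - time_b).
prob_b = 1.0 / (1.0 + math.exp(-1.2 * (0.5 - 0.8)))
closed_form = prob_b * (0.5 - 0.8)

print(result.value, result.deriv, closed_form)
```

The derivative carried by the dual number is exact up to floating-point rounding, which is why it can be compared directly with the closed-form expression.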

Biogeme also offers a simulation feature that applies a previously estimated model to the database and derives various indicators,[21] such as consumer surplus, elasticities, and market shares.
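
The kind of indicators mentioned above can be illustrated with a short sketch in plain Python. It does not use the Biogeme simulation interface. The coefficients are the estimates reported in the Example section, while the three observations are hypothetical values chosen only for illustration.

```python
import math

# Estimates from the output table of the Example section (ASC_SM is fixed to 0).
BETAS = {"ASC_TRAIN": -0.701147, "ASC_SM": 0.0, "ASC_CAR": -0.154603,
         "B_TIME": -1.277885, "B_COST": -1.083768}
ASC = {"train": "ASC_TRAIN", "sm": "ASC_SM", "car": "ASC_CAR"}

# Hypothetical observations: (scaled travel time, scaled cost) per alternative.
OBSERVATIONS = [
    {"train": (1.2, 0.5), "sm": (0.6, 0.6), "car": (1.0, 0.7)},
    {"train": (2.0, 0.9), "sm": (1.0, 1.1), "car": (1.8, 1.3)},
    {"train": (0.8, 0.3), "sm": (0.5, 0.4), "car": (0.7, 0.4)},
]


def logit_probabilities(obs):
    """Choice probabilities of a logit model for one observation."""
    utilities = {alt: BETAS[ASC[alt]] + BETAS["B_TIME"] * t + BETAS["B_COST"] * c
                 for alt, (t, c) in obs.items()}
    denominator = sum(math.exp(v) for v in utilities.values())
    return {alt: math.exp(v) / denominator for alt, v in utilities.items()}


probs = [logit_probabilities(obs) for obs in OBSERVATIONS]

# Market share of an alternative: average of its choice probabilities.
market_shares = {alt: sum(p[alt] for p in probs) / len(probs) for alt in ASC}

# Direct point elasticity of the train probability with respect to train cost
# in a logit model: B_COST * cost * (1 - P_train).
train_elasticities = [BETAS["B_COST"] * obs["train"][1] * (1.0 - p["train"])
                      for obs, p in zip(OBSERVATIONS, probs)]

# Aggregate elasticity: probability-weighted average of the individual ones.
aggregate_elasticity = (sum(p["train"] * e for p, e in zip(probs, train_elasticities))
                        / sum(p["train"] for p in probs))

print(market_shares)
print(aggregate_elasticity)
```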

Beyond standard maximum likelihood estimation, Biogeme can handle models with random parameters by performing simulated maximum likelihood estimation based on Monte Carlo simulation. A library of draw generation functions (random, Halton, antithetic) is available in the package.
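
As an illustration of the draw types listed above, here is a minimal sketch of Halton and antithetic uniform draws in plain Python. It is not the code of Biogeme's draw library, and the function names are assumptions made for this example.

```python
def halton(index, base):
    """Element `index` (starting at 1) of the Halton sequence in a prime base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_draws(count, base=2):
    """First `count` elements of the Halton sequence on the unit interval."""
    return [halton(i, base) for i in range(1, count + 1)]


def antithetic(draws):
    """Complete each uniform draw u with its antithetic counterpart 1 - u."""
    return draws + [1.0 - u for u in draws]


draws = halton_draws(4)            # base 2: 0.5, 0.25, 0.75, 0.125
paired = antithetic(halton_draws(100, base=3))

# Simulation replaces an integral by an average over the draws,
# here the integral of u**2 over [0, 1], whose exact value is 1/3.
approximation = sum(u ** 2 for u in paired) / len(paired)
print(draws, approximation)
```

Halton draws cover the unit interval more evenly than pseudo-random draws, and antithetic pairs have a sample mean of exactly 0.5, both of which tend to reduce the variance of the simulated quantity.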

The documentation consists of videos,[22] reports, and online documentation of the code.[23]

Example

The following example illustrates how to use Biogeme to estimate a logit model with three alternatives. It uses the Swissmetro dataset,[24] collected from a stated preference survey in Switzerland in 1998[25] and used for teaching purposes. The code is broken down into several parts:

Importing the modules:

import pandas as pd
import biogeme.database as db
import biogeme.biogeme as bio
import biogeme.models as models
from biogeme.expressions import Beta

Reading the data:

df = pd.read_csv('swissmetro.dat', sep='\t')
database = db.Database('swissmetro', df)
globals().update(database.variables)

Removing some observations from the data set:

exclude = ((PURPOSE != 1) * (PURPOSE != 3) + (CHOICE == 0)) > 0
database.remove(exclude)
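
The exclusion condition above uses arithmetic in place of logical operators: a product of comparisons acts as "and", and a sum acts as "or". The sketch below checks this equivalence in plain Python on a few hypothetical rows. The rows and the two helper functions are illustrative only and are not part of Biogeme.

```python
# Hypothetical rows (illustrative values), using the column names of the example.
rows = [
    {"PURPOSE": 1, "CHOICE": 2},
    {"PURPOSE": 3, "CHOICE": 0},
    {"PURPOSE": 2, "CHOICE": 1},
    {"PURPOSE": 3, "CHOICE": 3},
]


def exclude_arithmetic(row):
    # Same form as the expression in the text: product = AND, sum = OR.
    return ((row["PURPOSE"] != 1) * (row["PURPOSE"] != 3) + (row["CHOICE"] == 0)) > 0


def exclude_logical(row):
    # Equivalent condition written with logical operators.
    return row["PURPOSE"] not in (1, 3) or row["CHOICE"] == 0


kept = [row for row in rows if not exclude_arithmetic(row)]
print(kept)
```

Only observations whose purpose is 1 or 3 and whose choice is not 0 remain in the data set.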

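Defining the parameters to be estimated. The arguments of Beta are the name, the starting value, the lower and upper bounds, and a status flag. A status of 1 keeps the parameter fixed at its starting value, as for ASC_SM here: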

ASC_CAR = Beta('ASC_CAR', 0, None, None, 0)
ASC_TRAIN = Beta('ASC_TRAIN', 0, None, None, 0)
ASC_SM = Beta('ASC_SM', 0, None, None, 1)
B_TIME = Beta('B_TIME', 0, None, None, 0)
B_COST = Beta('B_COST', 0, None, None, 0)

Defining new variables:

SM_COST = SM_CO * (GA == 0)
TRAIN_COST = TRAIN_CO * (GA == 0)
CAR_AV_SP = CAR_AV * (SP != 0)
TRAIN_AV_SP = TRAIN_AV * (SP != 0)
TRAIN_TT_SCALED = TRAIN_TT / 100
TRAIN_COST_SCALED = TRAIN_COST / 100
SM_TT_SCALED = SM_TT / 100
SM_COST_SCALED = SM_COST / 100
CAR_TT_SCALED = CAR_TT / 100
CAR_CO_SCALED = CAR_CO / 100

Defining the utility functions:

V1 = ASC_TRAIN + \
     B_TIME * TRAIN_TT_SCALED + \
     B_COST * TRAIN_COST_SCALED
V2 = ASC_SM + \
     B_TIME * SM_TT_SCALED + \
     B_COST * SM_COST_SCALED
V3 = ASC_CAR + \
     B_TIME * CAR_TT_SCALED + \
     B_COST * CAR_CO_SCALED

Associating the utility functions with the numbering of alternatives:

V = {1: V1,
     2: V2,
     3: V3}

Associating the availability conditions with the alternatives:

av = {1: TRAIN_AV_SP,
      2: SM_AV,
      3: CAR_AV_SP}

Defining the contribution of each observation to the log likelihood function. Here, a logit model is considered:

logprob = models.loglogit(V, av, CHOICE)
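
For reference, the quantity that a logit log probability with availability conditions represents can be sketched in plain Python as follows. This is an illustration of the formula rather than the library code, and the function name and numerical values are assumptions.

```python
import math


def log_logit_probability(utilities, availability, chosen):
    """Log of the logit probability of `chosen` among the available alternatives.

    Assumes that the chosen alternative is itself available.
    """
    available = [alt for alt in utilities if availability[alt]]
    # Subtract the largest utility before exponentiating, for numerical stability.
    v_max = max(utilities[alt] for alt in available)
    log_denominator = v_max + math.log(
        sum(math.exp(utilities[alt] - v_max) for alt in available))
    return utilities[chosen] - log_denominator


# Hypothetical utilities for alternatives 1, 2 and 3; alternative 3 is unavailable.
V = {1: -0.5, 2: 0.0, 3: -1.0}
av = {1: 1, 2: 1, 3: 0}

log_p2 = log_logit_probability(V, av, chosen=2)
total = sum(math.exp(log_logit_probability(V, av, alt)) for alt in (1, 2))
print(log_p2, total)
```

Unavailable alternatives are left out of the denominator, so the probabilities of the available alternatives sum to one.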

Creating the Biogeme object:

biogeme = bio.BIOGEME(database, logprob)
biogeme.modelName = '01logit'

Estimating the parameters:

results = biogeme.estimate()

Obtaining the results as a Pandas table:

pandasResults = results.getEstimatedParameters()
print(pandasResults)

The output is:

              Value   Std err     t-test   p-value  Rob. Std err  Rob. t-test  Rob. p-value
ASC_CAR   -0.154603  0.043235  -3.575840  0.000349      0.058163    -2.658079      0.007859
ASC_TRAIN -0.701147  0.054874 -12.777443  0.000000      0.082562    -8.492375      0.000000
B_COST    -1.083768  0.051830 -20.910063  0.000000      0.068224   -15.885339      0.000000
B_TIME    -1.277885  0.056883 -22.464979  0.000000      0.104255   -12.257328      0.000000
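
The t-test and p-value columns of this table appear consistent with the usual definitions: the estimate divided by its standard error, and a two-sided p-value under a standard normal distribution. The sketch below recomputes them for ASC_CAR from the rounded values shown above. It is a plain Python check written for this example, not Biogeme code.

```python
import math


def t_test(value, std_err):
    """t-test against zero: estimate divided by its standard error."""
    return value / std_err


def p_value(t):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# ASC_CAR row of the table: value, standard error, robust standard error.
value, std_err, rob_std_err = -0.154603, 0.043235, 0.058163

t = t_test(value, std_err)
p = p_value(t)
rob_t = t_test(value, rob_std_err)
rob_p = p_value(rob_t)

print(t, p, rob_t, rob_p)
```

Small differences from the table in the last digits are expected, because the inputs used here are already rounded.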

Other software packages for discrete choice models

  • MIXL: Simulated Maximum Likelihood Estimation of Mixed Logit Models for Large Datasets, by Joseph Molloy.[26]
  • LARCH: A Freeware Package for Estimating Discrete Choice Models, by Jeffrey Newman.[27]
  • Apollo: a flexible, powerful and customisable freeware package for choice model estimation and application, by Stephane Hess and David Palma.[28]
  • ALOGIT: Software for estimating and analysing generalised logit choice models, by Andrew Daly.[29]
  • NLOGIT: Estimation and analysis tools for multinomial choice modeling, by William Greene.[30]

Literature

References

  1. ^ Dellaert, B. G. C.; Waerden, P. van der (1997). "A review of the software package HIELOW: Hierarchical Logit for Windows". Journal of Retailing and Consumer Services. 4 (2): 135–138. ISSN 0969-6989.
  2. ^ a b "Biogeme". biogeme.epfl.ch. Retrieved 2021-01-23.
  3. ^ Bierlaire, Michel (July 20, 2015). "BisonBiogeme: estimating a first model" (PDF). Technical Report - Transport and Mobility Laboratory. TRANSP-OR 150720.
  4. ^ Bierlaire, Michel (November 2, 2015). "BisonBiogeme: syntax of the modeling language" (PDF). Technical Report - Transport and Mobility Laboratory. TRANSP-OR 151102.
  5. ^ Bierlaire, Michel (July 6, 2016). "PythonBiogeme: a short introduction" (PDF). Technical Report - Transport and Mobility Laboratory. TRANSP-OR 160706.
  6. ^ Bierlaire, Michel (June 5, 2020). "A short introduction to PandasBiogeme" (PDF). Technical Report - Transport and Mobility Laboratory. TRANSP-OR 200605.
  7. ^ "Michel Bierlaire". EPFL.
  8. ^ "Transport and Mobility Laboratory". www.epfl.ch. Retrieved 2021-01-25.
  9. ^ Bierlaire, Michel (2021-01-15), michelbierlaire/biogeme, retrieved 2021-01-25
  10. ^ Bierlaire, Michel, biogeme: Estimation and application of discrete choice models, retrieved 2021-01-25
  11. ^ Ben-Akiva, Moshe (1973). "Structure of passenger travel demand models". PhD dissertation, Department of Civil Engineering, MIT.
  12. ^ Daly, Andrew; Zachary, Stanley (1979). Hensher, David; Dalvi, Q. (eds.). "Improved multiple choice models". Identifying and Measuring the Determinants of Mode Choice. Teakfield, London.
  13. ^ Williams, H. C. W. L. (1977). "On the formation of travel demand models and economic evaluation measures of user benefit". Environment and Planning A. 9: 285–344.
  14. ^ Vovsha, Peter (1997). "Application of Cross-Nested Logit Model to Mode Choice in Tel Aviv, Israel, Metropolitan Area". Transportation Research Record. 1607: 6–15.
  15. ^ Bierlaire, Michel (2006). "A theoretical analysis of the cross-nested logit model". Annals of Operations Research. 144: 287–300.
  16. ^ McFadden, Daniel (1978). Karlquist (ed.). "Modeling the choice of residential location". Spatial Interaction Theory and Residential Location. North Holland, Amsterdam: 75–96.
  17. ^ McFadden, Daniel; Train, Kenneth (2000). "Mixed MNL models for discrete response". Journal of Applied Econometrics. 15 (5): 447–470.
  18. ^ Bierlaire, Michel (2018). "Estimating choice models with latent variables with PandasBiogeme" (PDF). Technical Report - Transport and Mobility Laboratory. TRANSP-OR 181227.
  19. ^ Ben-Akiva, Moshe; et al. (2002). "Hybrid Choice Models: Progress and Challenges". Marketing Letters. 13: 163–175.
  20. ^ Abou-Zeid, Maya; Ben-Akiva, Moshe (2014). "Hybrid Choice Models". In Hess, Stephane; Daly, Andrew (eds.). Handbook of Choice Modelling. Edward Elgar Publishing. pp. 383–412. ISBN 9781781003145.
  21. ^ Bierlaire, Michel (2018). "Calculating indicators with PandasBiogeme" (PDF). Technical Report - Transport and Mobility Laboratory. TRANSP-OR 181223.
  22. ^ "Biogeme". biogeme.epfl.ch. Retrieved 2021-01-25.
  23. ^ "Welcome to Biogeme's documentation! — Biogeme 3.2.6 documentation". biogeme.epfl.ch. Retrieved 2021-01-25.
  24. ^ "Swissmetro Dataset". EPFL.
  25. ^ Bierlaire, Michel; Axhausen, Kay; Abay, Georg (2001). "The acceptance of modal innovation: The case of Swissmetro" (PDF). Proceedings of the 1st Swiss Transport Research Conference.
  26. ^ "mixl: Simulated Maximum Likelihood Estimation of Mixed Logit Models for Large Datasets version 1.2.3 from CRAN". rdrr.io. Retrieved 2021-01-25.
  27. ^ "Larch Documentation — Larch 5.4.1 documentation". larch.newman.me. Retrieved 2021-01-25.
  28. ^ "Apollo website". www.apollochoicemodelling.com. Retrieved 2021-01-25.
  29. ^ "ALOGIT". www.alogit.com. Retrieved 2021-01-25.
  30. ^ "NLOGIT Software | Multinomial Logistic Regression | LIMDEP Included". www.limdep.com. Retrieved 2021-01-25.

Category:Open-source software