Simulated Data for Linear Regression with Structured and Sparse Penalties: Introducing pylearn-simulate
Brainomics Team, Neurospin, CEA Saclay, France. ORCID iD: 0000-0001-7119-7646
Brainomics Team, Neurospin, CEA Saclay, France.
Brainomics Team, Neurospin, CEA Saclay, France.
Brainomics Team, Neurospin, CEA Saclay, France.
2018 (English). In: Journal of Statistical Software, ISSN 1548-7660, E-ISSN 1548-7660, Vol. 87, no. 3. Journal article (Peer-reviewed). Published
Abstract [en]

How to incorporate structure and prior knowledge into machine learning methods is currently a very active field of research. It has led to numerous developments in the field of non-smooth convex minimization. With recently developed methods, it is possible to perform an analysis in which the computed model is linked to a given structure of the data while simultaneously performing variable selection to find a few important features in the data. However, there is still no way to unambiguously simulate data for testing proposed algorithms, since the exact solutions to such problems are unknown.

The main aim of this paper is to present a theoretical framework for generating simulated data. These simulated data are appropriate when comparing optimization algorithms in the context of linear regression problems with sparse and structured penalties. Additionally, this approach allows the user to control the signal-to-noise ratio, the correlation structure of the data and the optimization problem to which they are the solution.

The traditional approach is to simulate random data without taking into account the actual model that will be fitted to the data. With such an approach, however, it is not possible to know the exact solution of the underlying optimization problem. With our contribution, the exact theoretical solution of a penalized linear regression problem is known, and algorithms can thus be compared without the need to use, e.g., cross-validation.

We also present our implementation, the Python package pylearn-simulate, available at https://github.com/neurospin/pylearn-simulate and released under the BSD 3-clause license. We describe the package and give examples at the end of the paper.
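
To illustrate the idea of simulating data with a known exact solution, the following is a minimal sketch for the plain lasso problem, min 0.5*||X b - y||^2 + lam*||b||_1. It does not use pylearn-simulate and is not the construction from the paper; the function names (`simulate_lasso_problem`, `ista`), the problem sizes, and the choice to build the residual from the optimality condition are all assumptions made for illustration. A target solution `beta_star` is chosen first, and `y` is then constructed so that the subgradient optimality condition X^T (y - X beta_star) = lam * z holds for a valid subgradient z of the l1 norm. The sketch does not control the signal-to-noise ratio or the correlation structure mentioned in the abstract.

```python
import numpy as np


def simulate_lasso_problem(n=50, p=10, lam=0.5, seed=0):
    """Build (X, y) such that beta_star minimizes
    0.5 * ||X b - y||^2 + lam * ||b||_1 (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))  # assumed: n > p, full column rank

    # Sparse solution chosen in advance.
    beta_star = np.zeros(p)
    beta_star[:3] = [2.0, -1.5, 1.0]

    # A subgradient z of the l1 norm at beta_star: sign on the support,
    # any value in [-1, 1] elsewhere.
    z = np.sign(beta_star)
    zero = beta_star == 0
    z[zero] = rng.uniform(-0.9, 0.9, size=zero.sum())

    # Residual e satisfying X^T e = lam * z, so that
    # X^T (X beta_star - y) + lam * z = 0 holds exactly.
    e = X @ np.linalg.solve(X.T @ X, lam * z)
    y = X @ beta_star + e
    return X, y, beta_star, z


def ista(X, y, lam, n_iter=2000):
    """Plain proximal gradient descent (ISTA) for the lasso."""
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        v = beta - X.T @ (X @ beta - y) / L
        beta = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)  # soft threshold
    return beta


if __name__ == "__main__":
    X, y, beta_star, z = simulate_lasso_problem()
    beta_hat = ista(X, y, lam=0.5)
    # The error is measured against the known solution, not a held-out score.
    print("max |beta_hat - beta_star| =", np.max(np.abs(beta_hat - beta_star)))
```

Because `beta_star` is known by construction, the distance between an algorithm's iterate and the true minimizer can be reported directly, which is the kind of comparison the abstract describes as possible without cross-validation.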

Place, publisher, year, edition, pages
Foundation for Open Access Statistics, 2018. Vol. 87, no. 3
Keywords [en]
simulated data, sparse and structured penalties, linear regression, Python
National subject category
Other Mathematics
Research subject
mathematics; mathematical statistics
Identifiers
URN: urn:nbn:se:umu:diva-153002
DOI: 10.18637/jss.v087.i03
OAI: oai:DiVA.org:umu-153002
DiVA, id: diva2:1260085
Available from: 2018-11-01 Created: 2018-11-01 Last updated: 2018-11-01

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text: https://www.jstatsoft.org/article/view/v087i03

By author/editor
Löfstedt, Tommy