pandalyse

This is the documentation of pandalyse.

Note

This package is in an early stage of development.

What is pandalyse

Pandalyse offers an analysis environment and tools for pandas. The main features are:

  • Selector: Define and store cuts in a Selector object: cutted_df = selector(df)
  • Trainer: Train multiple MVA methods that follow the scikit-learn interface: trainer.fit(signal_df, background_df)
  • Analysis: Store and retrieve Selectors, Trainings, numpy arrays and dataframes in predefined locations: df = ana.data.get("MySignalData")

Installation

pip install pandalyse

Usage

Selectors

Selectors store cuts on columns of a pandas.DataFrame. All cuts are stored as a list of strings and are combined with a logical AND when the Selector is applied.

Example:

import pandalyse

sel = pandalyse.Selector(['column1 > 0', 'column2 == 1'])

# Assume the existence of two pandas dataframes, 'df' and 'second_df'
df_cutted_1 = sel(df)

sel.add_cut('column3 < 100')
df_cutted_2 = sel(second_df)

df_cutted_3 = sel(df, 'Temporary_Cut == 1')
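
The AND combination of string cuts can be illustrated with plain pandas. The following is a minimal sketch, assuming the cuts behave like pandas `DataFrame.query` expressions; `apply_cuts` is a hypothetical helper written for this illustration and is not part of pandalyse.

```python
import pandas as pd

# Toy data; the column names mirror the Selector example.
df = pd.DataFrame({
    "column1": [-1, 2, 3, 4],
    "column2": [1, 1, 0, 1],
    "column3": [50, 150, 20, 99],
})


def apply_cuts(frame, cuts):
    """Hypothetical helper: join all string cuts with AND and filter the frame."""
    if not cuts:
        return frame
    expression = " and ".join(f"({cut})" for cut in cuts)
    return frame.query(expression)


cuts = ["column1 > 0", "column2 == 1"]
selected = apply_cuts(df, cuts)

# Adding a further cut can only narrow the selection.
selected_tight = apply_cuts(df, cuts + ["column3 < 100"])

print(selected)
print(selected_tight)
```

Because every cut is joined with AND, the order of the cuts does not change which rows survive.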

Analysis

The Analysis is the central part of pandalyse. It consists of a .pandalyse file that records the folders in which pandas.DataFrames, pandalyse.Selectors, pandalyse.Trainers and numpy.arrays are stored.

Example:

import pandalyse
import numpy as np

ana = pandalyse.analysis('path/to/(desired)/analysis/dir')
# ana = pandalyse.analysis() will use `pwd`

# ...
# assuming the existence of a signal dataframe 'df_sig' and a background dataframe 'df_bkg'
ana.data.add(df_bkg, 'background')
ana.data.add(df_sig, 'signal')

# doing some calculations
ana.values.add(0.5, 'efficiency')
ana.values.add(np.arange(3), 'example_array')

print(ana.values.example_array/ana.values.efficiency)
# >> [0. 2. 4.]

# ls path/to/(desired)/analysis/dir
# >> background.hdf signal.hdf efficiency.val example_array.val
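
The idea of adding values under a name and retrieving them later from a fixed directory can be sketched without pandalyse. The `ValueStore` class below is a hypothetical stand-in, and its `.npy` file format is a choice made for this sketch; it is an assumption and does not describe how pandalyse writes its `.val` files.

```python
import tempfile
from pathlib import Path

import numpy as np


class ValueStore:
    """Hypothetical name-based store backed by one file per value."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def add(self, value, name):
        # One file per name; .npy is used here only for illustration.
        np.save(self.directory / f"{name}.npy", np.asarray(value))

    def get(self, name):
        loaded = np.load(self.directory / f"{name}.npy")
        # Return a plain Python scalar for 0-d arrays, the array otherwise.
        return loaded.item() if loaded.ndim == 0 else loaded

    def names(self):
        return sorted(path.stem for path in self.directory.glob("*.npy"))


workdir = tempfile.mkdtemp()
store = ValueStore(workdir)
store.add(0.5, "efficiency")
store.add(np.arange(3), "example_array")

# Dividing [0, 1, 2] by 0.5 doubles every entry.
ratio = store.get("example_array") / store.get("efficiency")
print(ratio)
print(store.names())
```

Keeping one file per name means a later session can read a single value back without loading anything else.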

Trainer

A pandalyse.Trainer takes a list of feature columns of a dataframe. Classifiers with an sklearn-like interface can then be added as methods.

Example:

import pandalyse

ana = pandalyse.analysis()

tr = pandalyse.Trainer(['column1', 'column2'])
tr.add_method('bdt', some.sklearn_like.classifyer())
tr.add_method('nn',  some.sklearn_like.classifyer2())

tr.fit(ana.data.get('signal'), ana.data.get('background'))

ana.trainings.add(tr, 'first_training')
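
What an "sklearn interface" means for a signal/background fit can be shown with a toy classifier. This is a sketch under stated assumptions: `NearestMeanClassifier` and `fit_methods` are hypothetical names invented for this illustration, and labelling signal as 1 and background as 0 is an assumption, not a description of pandalyse's internals. Any object with `fit(X, y)` and `predict(X)`, such as a scikit-learn estimator, would fit the same slot.

```python
import numpy as np
import pandas as pd


class NearestMeanClassifier:
    """Toy classifier exposing fit/predict in the scikit-learn style."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # One mean vector per class.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Distance of every row to every class mean; pick the closest class.
        distances = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[distances.argmin(axis=1)]


def fit_methods(methods, features, signal_df, background_df):
    """Hypothetical helper: label signal as 1, background as 0, fit each method."""
    X = pd.concat([signal_df[features], background_df[features]], ignore_index=True)
    y = np.concatenate([np.ones(len(signal_df)), np.zeros(len(background_df))])
    for classifier in methods.values():
        classifier.fit(X, y)
    return methods


features = ["column1", "column2"]
signal_df = pd.DataFrame({"column1": [2.0, 2.2, 1.8], "column2": [1.0, 1.1, 0.9]})
background_df = pd.DataFrame({"column1": [-2.0, -1.9, -2.1], "column2": [0.0, 0.1, -0.1]})

methods = fit_methods({"toy": NearestMeanClassifier()}, features, signal_df, background_df)

new_events = pd.DataFrame({"column1": [1.5, -1.5], "column2": [1.0, 0.0]})
prediction = methods["toy"].predict(new_events)
print(prediction)
```

Holding the methods in a dictionary keyed by name mirrors the `add_method('bdt', ...)` call pattern, so several classifiers can be trained on the same features in one pass.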
