MDN (Multivariate analysis with Deep neural Network) Python package

A Python script to easily implement multivariate regression with deep neural networks in TensorFlow. The script is available at the bottom of this post, embedded as a Gist.

How to use.


import multivariate_dnn as mdn

Simply import this program.

Demonstration with sample data “d”.


nvar = 5 # specify number of factors (Variable X) to generate target variables
ylist = ['r', 'b']
d = mdn.generate_sample_data(nobs = 1000, nvar = nvar, ylist = ylist, seed = 1000)
d = mdn.create_dataset_multimodal(d = d, X = ['x'+str(i+1) for i in range(nvar)], Y = ['y'+str(i+1) for i in range(len(ylist))])

With generate_sample_data() you need to specify the following args:
seed: seed for the random generator (for reproducibility)
nobs: number of observations to generate
nvar: number of explanatory variables to generate
ylist: types of the target variables, as a list
– ‘l’ or ‘linear’ for linear (continuous variable)
– ‘r’ or ‘relu’ for the ReLU function (non-negative continuous variable)
– ‘s’ or ‘sigmoid’ for the sigmoid function (continuous variable in 0–1)
– ‘b’ or ‘binomial’ for binomial (discrete variable 0/1)
The number of types you specify in this list is automatically recognized as the number of target variables.
e.g. when you specify [‘r’, ‘s’, ‘b’], targets Y1, Y2, Y3 are generated with ReLU, sigmoid, and binomial respectively.
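To make the ylist options concrete, here is a minimal NumPy sketch of what each target type could look like. This is not the package's actual implementation — the linear combination and the transform formulas are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1000)
X = rng.normal(size=(1000, 3))   # explanatory variables
z = X @ rng.normal(size=3)       # a linear combination of X, as a stand-in

# One hypothetical transform per ylist code; the package's actual
# formulas may differ.
transforms = {
    'l': lambda z: z,                                # linear: continuous
    'r': lambda z: np.maximum(z, 0),                 # relu: non-negative
    's': lambda z: 1 / (1 + np.exp(-z)),             # sigmoid: in (0, 1)
    'b': lambda z: (rng.uniform(size=z.shape)        # binomial: 0/1 draws
                    < 1 / (1 + np.exp(-z))).astype(int),
}

for code in ['r', 's', 'b']:
    y = transforms[code](z)
    print(code, float(y.min()), float(y.max()))
```

Note how each code constrains the range of the generated target: ReLU clips at zero, sigmoid stays in (0, 1), and binomial yields only 0/1 values.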

After generating the sample data, or when you have your own dataset, you need to convert it to the Dataset class constructed for MDN with create_dataset_multimodal():
d: the data you created, as a pandas.DataFrame.
X: list of explanatory variables X. The sample code automatically generates the list x1, x2, … according to nvar.
Y: list of target variables Y. The sample code automatically generates the list y1, y2, … according to the length of ylist.
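The comprehensions used in the sample code above simply expand to plain lists of column names:

```python
nvar = 5
ylist = ['r', 's']

# The same comprehensions used in the sample code above.
X = ['x' + str(i + 1) for i in range(nvar)]
Y = ['y' + str(i + 1) for i in range(len(ylist))]

print(X)  # ['x1', 'x2', 'x3', 'x4', 'x5']
print(Y)  # ['y1', 'y2']
```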

Implement multivariate DNN


net = mdn.Network(
    data = d, 
    hidden_layers = [3, 3, 3], 
    activate_function = 'r', 
    output_function = ['r', 'b'], 
    use_standard_loss = False,
    opt = 'gd',
    batchsize = 100,
    nepoch = 3000,
    display_step = 100,
    result_with_no_step = False
)
net.inference()

With Network() you need to specify the following args:
data: Dataset class generated just for MDN. You can create your own Dataset with mdn.create_dataset_multimodal().
hidden_layers: number of hidden layers (the length of the list) and number of neurons in each hidden layer.
e.g. with [100, 50, 5] you get a 1st hidden layer with 100 neurons, a 2nd with 50, and a 3rd with 5.
activate_function: activation function FOR HIDDEN LAYERS.
output_function: the number and the types of the target variables (usage is the same as ylist in generate_sample_data).
use_standard_loss: currently under development. Only False is valid.
opt: optimizer used to minimize the loss.
– ‘gd’ for gradient descent
– ‘adagrad’ for AdaGrad
– ‘momentum’ for momentum
– ‘adam’ for Adam
batchsize: batch size for training
nepoch: number of epochs in training
display_step: results are displayed every [display_step] steps.
result_with_no_step: when True, the training process is hidden and only the final results are shown.
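As a rough illustration of what hidden_layers implies for model size, the weight and bias counts of a plain fully connected network can be computed from the layer spec. This is a sketch under the assumption of standard dense layers; the package's internals may differ.

```python
def count_parameters(n_inputs, hidden_layers, n_outputs):
    """Count weights and biases in a plain fully connected network."""
    sizes = [n_inputs] + hidden_layers + [n_outputs]
    # One weight matrix per adjacent pair of layers, one bias per non-input unit.
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    biases = sum(sizes[1:])
    return weights + biases

# 5 inputs, the example spec [100, 50, 5], 2 targets
print(count_parameters(5, [100, 50, 5], 2))  # 5917
```

This makes it easy to see how quickly the [100, 50, 5] example grows compared with the small [3, 3, 3] spec used in the demos.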

Finally, call net.inference() to run everything. It automatically performs training, testing, and plotting of the results.

import multivariate_dnn as mdn
import pandas as pd
from IPython.display import Markdown, display, HTML

%matplotlib inline
nvar = 5 # specify number of factors (Variable X) to generate target variables
ylist = ['r', 's']
d = mdn.generate_sample_data(nobs = 1000, nvar = nvar, ylist = ylist, seed = 1000)
d = mdn.create_dataset_multimodal(d = d, X = ['x'+str(i+1) for i in range(nvar)], Y = ['y'+str(i+1) for i in range(len(ylist))])

mdn.new_inference(
    data = d, 
    varlist = ['x'+str(i+1) for i in range(nvar)],
    targetlist = ['y'+str(i+1) for i in range(len(ylist))],
    hidden_layers = [3, 3, 3], 
    activate_function = 'r', 
    output_function = ['r', 's'], 
    use_standard_loss = False,
    opt = 'gd',
    batchsize = 100,
    nepoch = 3000,
    display_step = 100
)

You can implement MDN in a single call with new_inference().
This function wraps Dataset(), Network(), and Network().inference().

Compare performances across the different sets of variables

You can easily compare results with and without several groups of variables X, running exactly the same network settings on each group.

import multivariate_dnn as mdn
import pandas as pd
from IPython.display import Markdown, display, HTML

%matplotlib inline
nvar = 10 # specify number of factors (Variable X) to generate target variables
ylist = ['r', 's']
d = mdn.generate_sample_data(nobs = 1000, nvar = nvar, ylist = ylist, seed = 1000)
d = mdn.create_dataset_multimodal(d = d, X = ['x'+str(i+1) for i in range(nvar)], Y = ['y'+str(i+1) for i in range(len(ylist))])

mdn.compare_multiple_varlist(
    data = d,
    varlist_list = [
        ['x1', 'x2'], # model 1
        ['x1', 'x2', 'x5'], # model 2
        ['x1', 'x5'],
        ['x'+str(i+1) for i in range(5)]
    ], 
    targetlist = ['y1', 'y2'], 
    hidden_layers = [3, 3, 3], 
    activate_function = 's', 
    output_function = ['r', 's'], 
    batchsize = 100, 
    nepoch = 1000, 
    display_step = 100, 
    use_standard_loss = False, 
    opt = 'gd'
)
varlist_list: previously you specified a single “varlist”; here you need to specify multiple varlists as a nested list.

Compare performances across the different settings of network across the different sets of variables

Moreover, you can easily compare results across different network settings, each run with and without the several groups of variables X.

import multivariate_dnn as mdn
import pandas as pd
from IPython.display import Markdown, display, HTML

nvar = 5
ylist = ['r', 's']
d = mdn.generate_sample_data(nobs = 1000, nvar = nvar, ylist = ylist, seed = 1000)
d = mdn.create_dataset_multimodal(d = d, X = ['x'+str(i+1) for i in range(nvar)], Y = ['y'+str(i+1) for i in range(len(ylist))])

mdn.compare_multiple_setting(
    data = d, 
    varlist_list = [
        ['x1', 'x2'], # model 1
        ['x1', 'x2', 'x5'], # model 2
        ['x1', 'x5'],
        ['x'+str(i+1) for i in range(5)]
    ], 
    targetlist = ['y1', 'y2'], 
    hidden_layers_list = [
        [3, 3, 3], 
        [5, 5, 5], 
        [10, 10]
    ], 
    activate_function_list = ['r','s'], 
    output_function = ['r', 's'], 
    batchsize_list = [10,100,300,500], 
    nepoch_list = [500,5000,10000,30000], 
    display_step = 100, 
    use_standard_loss = False, 
    opt_list = ['gd'],
    result_filename = 'demo', 
    result_replace =  True
)
[any]_list arguments loop over the specified settings.
e.g. batchsize_list = [10,100,300,500] runs the loop over 4 different batch size settings for each of the 4 variable models.
result_filename: performance will be saved to [specified string].csv
result_replace: Boolean choosing whether to replace the existing result file with the new loops, or to append the results of the new loops to the existing result file.
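The looping behavior of the *_list arguments can be sketched with itertools.product. Assuming compare_multiple_setting runs one fit per combination of settings (an assumption — the source does not spell out the loop order), the call above would produce this many runs:

```python
from itertools import product

# The *_list settings from the compare_multiple_setting() call above.
varlist_list = [['x1', 'x2'], ['x1', 'x2', 'x5'], ['x1', 'x5'],
                ['x' + str(i + 1) for i in range(5)]]
hidden_layers_list = [[3, 3, 3], [5, 5, 5], [10, 10]]
activate_function_list = ['r', 's']
batchsize_list = [10, 100, 300, 500]
nepoch_list = [500, 5000, 10000, 30000]
opt_list = ['gd']

runs = list(product(varlist_list, hidden_layers_list, activate_function_list,
                    batchsize_list, nepoch_list, opt_list))
print(len(runs))  # 4 * 3 * 2 * 4 * 4 * 1 = 384
```

The grid grows multiplicatively, so long *_list settings can make the comparison very expensive.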

Finally, you can compare the results with:
result = pd.read_csv(result_filename + '.csv', index_col=0)
display(result)

Disclaimer

You can use this for free; however, I take no responsibility for the use of this script or the results it generates.
