TRAIN_TEST_SPLIT

The TRAIN_TEST_SPLIT node is used to split the data into test and training according to a size specified before any ML tasks.Params:test_size : floatThe size of testing data specified.Returns:train : DataFrameA dataframe of training data.test : DataFrameA dataframe of test data.

Python Code

from typing import TypedDict
from flojoy import flojoy, DataFrame
from sklearn.model_selection import train_test_split


class TrainTestSplitOutput(TypedDict):
    train: DataFrame
    test: DataFrame


@flojoy(deps={"scikit-learn": "1.2.2"})
def TRAIN_TEST_SPLIT(
    default: DataFrame, test_size: float = 0.2
) -> TrainTestSplitOutput:
    """The TRAIN_TEST_SPLIT node is used to split the data into test and training according to a size specified before any ML tasks.

    Parameters
    ----------
    test_size : float
        The size of testing data specified.

    Returns
    -------
    train: DataFrame
        A dataframe of training data.
    test: DataFrame
        A dataframe of test data.
    """

    df = default.m
    train, test = train_test_split(df, test_size=test_size)
    return TrainTestSplitOutput(train=DataFrame(df=train), test=DataFrame(df=test))

Find this Flojoy Block on GitHub

Example

Having problem with this example app? Join our Discord community and we will help you out!

In this example, the READ_CSV node loads a local .csv file and passes it to our TRAIN_TEST_SPLIT node which divides up the data file according to the test size specified which then can be used for training and testing for ML models. The information is displayed with TABLE node.