nlpretext

All the go-to functions you need to handle NLP use cases, integrated in NLPretext.

class Preprocessor

Bases: object

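pipe(operation, args=None)

Add an operation and its arguments to the preprocessor's pipeline.

Parameters:

operation – the preprocessing operation to add
args – arguments for the operation (optional; defaults to None)

Return type:

None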

static build_pipeline(operation_list)

Build a scikit-learn pipeline from an operation list.

Parameters: operation_list (iterable) – list of preprocessing operations
Return type: sklearn.pipeline.Pipeline

run(text)

Apply the pipeline to text.
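The pipe-then-apply pattern described above can be sketched with the standard library alone. This is only an illustration of the idea: `SketchPreprocessor` and its helper functions are hypothetical stand-ins written for this example, not NLPretext's implementation, and a plain loop stands in for the scikit-learn pipeline the real class builds.

```python
import re
from typing import Any, Callable, Dict, List, Optional


class SketchPreprocessor:
    """Hypothetical stand-in for illustration; not the NLPretext class."""

    def __init__(self) -> None:
        # Each entry stores a callable and the keyword arguments to call it with.
        self._operations: List[Dict[str, Any]] = []

    def pipe(self, operation: Callable[..., str],
             args: Optional[Dict[str, Any]] = None) -> None:
        # Queue the operation; nothing is executed at this point.
        self._operations.append({"operation": operation, "args": args or {}})

    def run(self, text: str) -> str:
        # Apply the queued operations in the order they were added.
        for step in self._operations:
            text = step["operation"](text, **step["args"])
        return text


def collapse_whitespace(text: str) -> str:
    # Hypothetical helper: squeeze runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def replace_digits(text: str, token: str = "0") -> str:
    # Hypothetical helper: replace every digit with a placeholder token.
    return re.sub(r"\d", token, text)


preprocessor = SketchPreprocessor()
preprocessor.pipe(collapse_whitespace)
preprocessor.pipe(replace_digits, args={"token": "#"})
result = preprocessor.run("  call   me at 555 1234  ")
print(result)  # prints: call me at ### ####
```

Storing each step as a callable plus its keyword arguments keeps configuration separate from execution, so the same queued operations can be applied to any number of texts.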

class TextLoader(text_column='text', encoding='utf-8', file_format=None, use_dask=True)

Bases: object

read_text(files_path, file_format=None, encoding=None, compute_to_pandas=True, preprocessor=None)

Read the text files stored in files_path.

Parameters:

files_path (string | list[string]) – path to a single file, or a list of file paths
file_format (string) – format of the files to load: one of csv, json, parquet or txt
encoding (Optional[str]) – encoding of the text to load, for example utf-8 or latin-1
compute_to_pandas (bool) – True to compute the Dask DataFrame into a pandas DataFrame, False to keep it as a Dask DataFrame
preprocessor (nlpretext.preprocessor.Preprocessor) – optional NLPretext preprocessor applied to the text after loading

Return type:

dask.dataframe | pandas.DataFrame
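To make the roles of files_path, file_format, encoding and text_column concrete, here is a minimal standard-library sketch of a loader with a similar shape. It is not how TextLoader works internally: the real method returns a Dask or pandas DataFrame, whereas the hypothetical `read_text_sketch` below returns a plain list of strings. Inferring the format from the file extension when file_format is None, and treating a JSON file as a list of records, are both assumptions of this sketch; parquet is left out because the standard library cannot read it.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import List, Optional, Union


def read_text_sketch(files_path: Union[str, List[str]],
                     file_format: Optional[str] = None,
                     encoding: str = "utf-8",
                     text_column: str = "text") -> List[str]:
    """Hypothetical helper: collect text from txt, csv or json files."""
    paths = [files_path] if isinstance(files_path, str) else list(files_path)
    texts: List[str] = []
    for raw_path in paths:
        path = Path(raw_path)
        # Assumption: fall back to the file extension when no format is given.
        fmt = file_format or path.suffix.lstrip(".").lower()
        if fmt == "txt":
            # One text per line.
            with path.open(encoding=encoding) as handle:
                texts.extend(line.rstrip("\n") for line in handle)
        elif fmt == "csv":
            # Text is taken from the column named by text_column.
            with path.open(encoding=encoding, newline="") as handle:
                texts.extend(row[text_column] for row in csv.DictReader(handle))
        elif fmt == "json":
            # Assumption: the file holds a list of records (dicts).
            with path.open(encoding=encoding) as handle:
                texts.extend(record[text_column] for record in json.load(handle))
        else:
            raise ValueError(f"unsupported format in this sketch: {fmt!r}")
    return texts


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.csv"
    sample.write_text("id,text\n1,hello\n2,world\n", encoding="utf-8")
    print(read_text_sketch(str(sample)))  # prints: ['hello', 'world']
```

Accepting either a single path or a list mirrors the files_path parameter documented above; passing encoding="latin-1" would be the way to read files that are not UTF-8.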