Multiprocessing DataFrame objects

class flockmp.dataframe.DataFrameAsync
classmethod apply(dataframe, function, style='row-like', chunksize=100, poolSize=5)

First, the original DataFrame is segmented into chunks; executeAsync() then parallelizes the function’s operations on the segmented DataFrames. There are two modes of operation: row-like or block-like.

Parameters:
  • dataframe (DataFrame) – Input DataFrame
  • function (func) – Function to be applied to the DataFrame
  • chunksize (int) – Number of chunks into which the original DataFrame is split
  • poolSize (int) – Number of pools of processes
  • style (str) – If “row-like”, function() is applied row by row; otherwise it is applied to each DataFrame chunk as a whole.
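
The difference between the two styles can be illustrated with a minimal sketch. This is not flockmp's implementation: split_into_chunks and apply_sketch are hypothetical helpers, n_chunks and workers are illustrative parameter names, and a thread pool from the standard library stands in for the process pool so the snippet stays self-contained. It only assumes that the DataFrame is cut into contiguous row blocks and that the per-chunk results are concatenated in order.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def split_into_chunks(df, n_chunks):
    """Split df into at most n_chunks contiguous row blocks (hypothetical helper)."""
    if df.empty:
        return [df]
    n_chunks = max(1, min(n_chunks, len(df)))
    size = -(-len(df) // n_chunks)  # ceiling division: rows per chunk
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]


def apply_sketch(df, function, style="row-like", n_chunks=4, workers=2):
    """Illustrative stand-in for DataFrameAsync.apply, not flockmp's code."""
    if style == "row-like":
        # The function receives one row (a Series) at a time.
        def work(chunk):
            return chunk.apply(function, axis=1)
    else:
        # Block-like: the function receives the whole chunk (a DataFrame).
        work = function

    chunks = split_into_chunks(df, n_chunks)
    # Threads are used here only to keep the sketch simple; the library
    # described above works with processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(work, chunks))
    return pd.concat(parts)


df = pd.DataFrame({"a": range(10), "b": range(10, 20)})
squared = apply_sketch(df, lambda chunk: chunk ** 2, style="block-like")
row_sums = apply_sketch(df, lambda row: row["a"] + row["b"], style="row-like")
```

In the block-like call the lambda operates on an entire chunk at once, which is usually cheaper for vectorized operations; in the row-like call it sees one row per invocation, which suits logic that cannot be expressed column-wise.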

Example

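from pandas import DataFrame
from flockmp.dataframe import DataFrameAsync

# Square every value, applying the function to whole chunks at a time
df = DataFrame({"a": list(range(1000)),
                "b": list(range(1000, 2000))})
res = DataFrameAsync.apply(df, lambda x: x ** 2, style="block-like")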