Multiprocessing DataFrame objects
class flockmp.dataframe.DataFrameAsync
classmethod apply(dataframe, function, style='row-like', chunksize=100, poolSize=5)
First, the original DataFrame is segmented into chunks; then executeAsync() parallelizes the function's operations over the segmented DataFrames. There are two modes of operation: row-like or block-like.
Parameters:
- dataframe (DataFrame) – Input DataFrame
- function (func) – Function to be applied to the DataFrame
- chunksize (int) – Number of chunks into which the original DataFrame is split
- poolSize (int) – Number of pools of processes
- style (str) – If "row-like", function() is applied row by row; otherwise it is applied to DataFrame chunks.
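The chunk-then-parallelize idea described above can be sketched with pandas and the standard library. This is a minimal illustration under stated assumptions, not the flockmp implementation: split_frame, apply_chunked, and the n_chunks parameter are hypothetical names, and the executor is passed in by the caller rather than created from a pool size.

```python
from concurrent.futures import Executor, ThreadPoolExecutor

import pandas as pd


def split_frame(df: pd.DataFrame, n_chunks: int) -> list:
    """Split df into contiguous row blocks of near-equal size (hypothetical helper)."""
    n_chunks = max(1, min(n_chunks, len(df)))
    base, extra = divmod(len(df), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional row.
        stop = start + base + (1 if i < extra else 0)
        chunks.append(df.iloc[start:stop])
        start = stop
    return chunks


def _apply_rows(chunk: pd.DataFrame, func):
    # Row-like: call func once per row of the chunk.
    return chunk.apply(func, axis=1)


def apply_chunked(df, func, executor: Executor, style="row-like", n_chunks=4):
    """Apply func to df chunk by chunk on the given executor (illustrative only)."""
    chunks = split_frame(df, n_chunks)
    if style == "row-like":
        futures = [executor.submit(_apply_rows, c, func) for c in chunks]
    else:
        # Block-like: func receives each chunk as a whole DataFrame.
        futures = [executor.submit(func, c) for c in chunks]
    # Collect results in submission order so the original row order is kept.
    return pd.concat([f.result() for f in futures])


# Threads keep this demo runnable anywhere, including with lambdas.
demo = pd.DataFrame({"a": list(range(10)), "b": list(range(10, 20))})
with ThreadPoolExecutor(max_workers=2) as pool:
    squared = apply_chunked(demo, lambda c: c ** 2, pool, style="block-like", n_chunks=3)
    sums = apply_chunked(demo, lambda r: r["a"] + r["b"], pool, style="row-like", n_chunks=3)
```

For process-based parallelism, a concurrent.futures.ProcessPoolExecutor can be passed instead. In that case the applied function generally has to be a picklable, module-level function rather than a lambda, and the calling code should sit under an `if __name__ == "__main__":` guard.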
Example
from pandas import DataFrame
from flockmp.dataframe import DataFrameAsync

df = DataFrame({"a": list(range(1000)),
                "b": list(range(1000, 2000))})
res = DataFrameAsync.apply(df, lambda x: x ** 2, style="block-like")