HMLP: High-performance Machine Learning Primitives
#include <thread.hpp>
Public Member Functions

  Worker ()
      Device implementation. More...
  Worker (thread_communicator *my_comm)
  void Communicator (thread_communicator *comm)
  bool Master ()
  void Barrier ()
  void InitWithCommunicator (thread_communicator *comm, size_t tid, size_t gid)
  Worker Split ()
  template<typename Arg>
  void Bcast (Arg &buffer)
  template<int ALIGN_SIZE, typename T>
  T * AllocateSharedMemory (size_t count)
  template<typename T>
  void FreeSharedMemory (T *ptr)
  size_t BalanceOver1DGangs (size_t n, size_t default_size, size_t nb)
  tuple<size_t, size_t, size_t> DistributeOver1DGangs (size_t beg, size_t end, size_t nb)
  tuple<size_t, size_t, size_t> DistributeOver1DThreads (size_t beg, size_t end, size_t nb)
  void SetDevice (class Device *device)
  class Device * GetDevice ()
  bool Execute (class Task *task)
      The worker executes the task in the runtime system. Some code is left
      commented out because there is no GPU support yet. With GPUs (or other
      distributed devices), data must first be gathered before execution. More...
  void WaitExecute ()
      Impose a barrier if the device owned by this worker is performing
      asynchronous execution. More...
  float EstimateCost (class Task *task)
Public Attributes

  int tid = 0
  int gid = 0
  int child_gid = 0
  int jc_id
  int pc_id
  int ic_id
  int jr_id
  int ic_jr
  int jc_nt
  int pc_nt
  int ic_nt
  int jr_nt
  thread_communicator * my_comm
  thread_communicator * jc_comm
  thread_communicator * pc_comm
  thread_communicator * ic_comm
  thread_communicator * comm = NULL
  class Scheduler * scheduler
bool hmlp::Worker::Execute ( class Task *task )
The worker executes the task in the runtime system. Some code is left commented out because there is no GPU support yet. With GPUs (or other distributed devices), data must first be gathered before execution.

The worker loops over each task in the batch and executes it; notice that this may be an asynchronous execution, and some tasks may already be in "EXECUTED" status. It then waits for all tasks in the batch to terminate, loops over the batch once more to move past each finished task, and finally sets its current executing task to NULL.

Parameters:
  task  The current task pointer.
class Device * hmlp::Worker::GetDevice ( )
Return the device pointer attached to the worker.
void hmlp::Worker::SetDevice ( class Device *device )
Assign a device to this worker.
Worker hmlp::Worker::Split ( )
By default, we split threads evenly using "close" affinity. Threads with the same color will be in the same subcommunicator.
Example (n_splits = 2):

  tid              0 1 2 3 4 5 6 7
  color (gang id)  0 0 0 0 1 1 1 1
  first            0 0 0 0 4 4 4 4
  last             4 4 4 4 8 8 8 8
  child_tid        0 1 2 3 0 1 2 3
  child_n_threads  4 4 4 4 4 4 4 4
Make sure all threads have the new communicator before returning.
void hmlp::Worker::WaitExecute ( )
Impose a barrier if the device owned by this worker is performing asynchronous execution.