group2 0.1.0
CSE 125 Group 2
group2::perf Namespace Reference

Namespaces

namespace  shotlog

Classes

struct  PerScopeStats
 Per-scope, all-thread atomic counters. More...
struct  NetworkCounters
 Per-tick network counters maintained by the network code. More...
struct  Snapshot
 Globally-visible snapshot returned to the aggregator callback. More...
class  ScopeTimer
 RAII scoped timer. More...

Typedefs

using ScopeId = std::uint16_t
 Dense small id used to index the global stats table.

Functions

void initParallelFromEnv ()
 Initialize from environment.
template<class Iter, class Fn>
void parallelFor (Iter begin, Iter end, Fn &&fn)
 Call fn(*it) for every element in [begin, end).
ScopeId registerScope (const char *name)
 Register (or look up) a scope name and return its dense id.
const char * scopeName (ScopeId id)
 Returns the human-readable name a ScopeId was registered with, or "" if id is out of range.
std::size_t scopeCount ()
 Returns the highest registered id + 1.
void recordSample (ScopeId id, std::uint64_t ticks) noexcept
 Recording entry point — public so unit tests can invoke it directly without a real ScopeTimer.
void tickEnd (std::uint64_t tickWallNs) noexcept
 Tick boundary marker — call once per server tick() end.
void initFromEnv ()
 Initialize from environment variables.
void startAggregator (std::function< void(const Snapshot &)> cb)
 Spawn the 1 Hz aggregator thread.
void stopAggregator ()
 Stop the aggregator and join its thread. Idempotent.
NetworkCounters & net ()
 Network counter accessor. Hot-path code increments these directly.
std::uint64_t ticksToNs (std::uint64_t ticks) noexcept
 Convenience: convert SDL performance-counter ticks to nanoseconds.

Variables

std::atomic< bool > parallelEnabled {true}
 Master switch for parallel execution.
constexpr std::size_t k_parallelThreshold = 64
 Minimum items below which parallelFor runs sequentially even when the master switch is on.
std::atomic< bool > enabled {false}
 Master switch.
constexpr std::size_t k_maxScopes = 64
 Compile-time caps.
constexpr std::size_t k_histogramBuckets = 32
 Histogram bucket count.
constexpr ScopeId k_invalidScope = static_cast<ScopeId>(-1)

Typedef Documentation

◆ ScopeId

using group2::perf::ScopeId = std::uint16_t

Dense small id used to index the global stats table.

Function Documentation

◆ initFromEnv()

void group2::perf::initFromEnv ( )

Initialize from environment variables.

Call once at startup, before the first scope is hit.

GROUP2_SERVER_PROFILE=1 → enable sampling + 1 Hz log line
GROUP2_SERVER_PROFILE_CSV=path → also write CSV rows to path

Idempotent.
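A typical launch under profiling might look like the following (a sketch; the `./server` binary name is a placeholder, not part of the documented API):

```shell
# Enable sampling plus the 1 Hz log line, and mirror rows to a CSV file.
export GROUP2_SERVER_PROFILE=1
export GROUP2_SERVER_PROFILE_CSV=/tmp/perf.csv
# ./server   <- placeholder for the actual server binary
```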

◆ initParallelFromEnv()

void group2::perf::initParallelFromEnv ( )
inline

Initialize from environment.

Idempotent.

Default ON (PR-8). GROUP2_SERVER_PARALLEL=0 flips it off for diagnostics / A-B comparison; any other value (or unset) leaves it on.

◆ net()

NetworkCounters & group2::perf::net ( )
inline

Network counter accessor. Hot-path code increments these directly.

◆ parallelFor()

template<class Iter, class Fn>
void group2::perf::parallelFor ( Iter begin,
Iter end,
Fn && fn )
inline

Call fn(*it) for every element in [begin, end).

Routes through TBB when (a) available, (b) the runtime flag is on, and (c) the input range is large enough to amortize dispatch cost. Otherwise sequential.

◆ recordSample()

void group2::perf::recordSample ( ScopeId id,
std::uint64_t ticks )
noexcept

Recording entry point — public so unit tests can invoke it directly without a real ScopeTimer.

◆ registerScope()

ScopeId group2::perf::registerScope ( const char * name)

Register (or look up) a scope name and return its dense id.

First call for a given name is O(n) over already-registered scopes; subsequent calls are cached at the call site. Thread-safe.

◆ scopeCount()

std::size_t group2::perf::scopeCount ( )

Returns the highest registered id + 1.

◆ scopeName()

const char * group2::perf::scopeName ( ScopeId id)

Returns the human-readable name a ScopeId was registered with, or "" if id is out of range.

Used by the aggregator's logger.

◆ startAggregator()

void group2::perf::startAggregator ( std::function< void(const Snapshot &)> cb)

Spawn the 1 Hz aggregator thread.

Calls cb(snap) once per second on a dedicated thread. Call it at most once. cb runs on the aggregator thread, so do not touch ECS / non-thread-safe state from inside it.

◆ stopAggregator()

void group2::perf::stopAggregator ( )

Stop the aggregator and join its thread. Idempotent.

◆ tickEnd()

void group2::perf::tickEnd ( std::uint64_t tickWallNs)
noexcept

Tick boundary marker — call once per server tick() end.

◆ ticksToNs()

std::uint64_t group2::perf::ticksToNs ( std::uint64_t ticks)
inlinenoexcept

Convenience: convert SDL performance-counter ticks to nanoseconds.

Variable Documentation

◆ enabled

std::atomic< bool > group2::perf::enabled {false}

Master switch.

Toggled at process startup based on GROUP2_SERVER_PROFILE. When false, ScopeTimer ctor early-outs after a single relaxed atomic load.

◆ k_histogramBuckets

std::size_t group2::perf::k_histogramBuckets = 32
inlineconstexpr

Histogram bucket count.

Buckets are log2-spaced over the SDL performance-counter tick scale (resolution typically 100 ns). The bucket index is the sample's most-significant-bit position, derived via __builtin_clzll; bucket 0 holds the smallest measurable values.

◆ k_invalidScope

ScopeId group2::perf::k_invalidScope = static_cast<ScopeId>(-1)
inlineconstexpr

◆ k_maxScopes

std::size_t group2::perf::k_maxScopes = 64
inlineconstexpr

Compile-time caps.

Both caps bound the size of the global stats table: at ~512 B per PerScopeStats, all 64 scopes fit comfortably in L2, keeping hot-scope counters cache-resident.

◆ k_parallelThreshold

std::size_t group2::perf::k_parallelThreshold = 64
inlineconstexpr

Minimum items below which parallelFor runs sequentially even when the master switch is on.

Avoids paying TBB dispatch overhead for trivially-small work where sequential is faster.

◆ parallelEnabled

std::atomic<bool> group2::perf::parallelEnabled {true}
inline

Master switch for parallel execution.

Defaults ON as of PR-8. Earlier benches (idle-bot loadtest, pre-PR-7) suggested defaulting off because the synthetic test's per-item work was too small. With AI bots actually moving + PR-7 (collision/movement parallel) + PR-8 (per-component-type parallel serialization) the per-item work is meaningful and the 16-core box pays clear dividends:

N=100, AI: tick p99 1.57 ms (off) → 0.39 ms (on)
N=300, AI: tick p99 12 ms (off) → 1.57 ms (on)
N=500, AI: tick p99 50+ ms (off) → 3.15 ms (on), when the OS gives us CPU

Below the k_parallelThreshold element-count, parallelFor short-circuits to sequential anyway, so small inputs still win.

Kill switch: GROUP2_SERVER_PARALLEL=0 flips back to sequential without rebuilding — useful for diff bisection if a regression appears.