Serverless computing relies on complex orchestration to manage resource scaling and initialization, and evaluating these orchestrators requires rigorous benchmarking. A common approach is to rely on "naive" synthetic methodologies that use uniform or Gaussian distributions. This thesis investigates the extent to which such synthetic benchmarks distort perceived system performance compared to trace-driven testing based on real-world data. Using an OpenFaaS testbed with a modified Locust load generator, we conducted a comparative analysis between standard synthetic benchmarks and a replay of the Azure Functions production trace. We implemented a "universal function" to simulate a wide array of workload profiles, mimicking the Zipfian popularity and log-normal execution-time distributions found in the production data. The results show that standard uniform function selection creates an artificially "warm" environment that masks cold-start penalties. The synthetic model also fails to generate the outliers responsible for head-of-line blocking and resource contention. We conclude that naive synthetic benchmarking leads to an overestimation of system throughput and stability. To achieve comprehensive performance evaluations, future research should adopt trace-driven methodologies that preserve the statistical irregularities of real-world traffic.