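Performance optimization is a never-ending topic that needs continuous work. SRS2 went through one fairly large round of performance optimization, raising capacity from 3k to 7k. More optimization is still needed, and the process and data will be posted in this issue.
SRS2 already did some optimization earlier, see: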
Play RTMP benchmark
The data for playing RTMP was benchmarked by [SB][srs-bench]:
Update | SRS | Clients | Type | CPU | Memory | Commit |
---|---|---|---|---|---|---|
2014-12-07 | 2.0.67 | 10k(10000) | players | 95% | 656MB | code |
2014-12-05 | 2.0.57 | 9.0k(9000) | players | 90% | 468MB | code |
2014-12-05 | 2.0.55 | 8.0k(8000) | players | 89% | 360MB | code |
2014-11-22 | 2.0.30 | 7.5k(7500) | players | 87% | 320MB | code |
2014-11-13 | 2.0.15 | 6.0k(6000) | players | 82% | 203MB | code |
2014-11-12 | 2.0.14 | 3.5k(3500) | players | 95% | 78MB | code |
2014-11-12 | 2.0.14 | 2.7k(2700) | players | 69% | 59MB | - |
2014-11-11 | 2.0.12 | 2.7k(2700) | players | 85% | 66MB | - |
2014-11-11 | 1.0.5 | 2.7k(2700) | players | 85% | 66MB | - |
2014-07-12 | 0.9.156 | 2.7k(2700) | players | 89% | 61MB | code |
2014-07-12 | 0.9.156 | 1.8k(1800) | players | 68% | 38MB | - |
2013-11-28 | 0.5.0 | 1.8k(1800) | players | 90% | 41MB | - |
Publish RTMP benchmark
The data for publishing RTMP was benchmarked by [SB][srs-bench]:
Update | SRS | Clients | Type | CPU | Memory | Commit |
---|---|---|---|---|---|---|
2014-12-04 | 2.0.52 | 4.0k(4000) | publishers | 80% | 331MB | code |
2014-12-04 | 2.0.51 | 2.5k(2500) | publishers | 91% | 259MB | code |
2014-12-04 | 2.0.49 | 2.5k(2500) | publishers | 95% | 404MB | code |
2014-12-04 | 2.0.49 | 1.4k(1400) | publishers | 68% | 144MB | - |
2014-12-03 | 2.0.48 | 1.4k(1400) | publishers | 95% | 140MB | code |
2014-12-03 | 2.0.47 | 1.4k(1400) | publishers | 95% | 140MB | - |
2014-12-03 | 2.0.47 | 1.2k(1200) | publishers | 84% | 76MB | code |
2014-12-03 | 2.0.12 | 1.2k(1200) | publishers | 96% | 43MB | - |
2014-12-03 | 1.0.10 | 1.2k(1200) | publishers | 96% | 43MB | - |
Play HTTP FLV benchmark
The data for playing HTTP FLV was benchmarked by [SB][srs-bench]:
Update | SRS | Clients | Type | CPU | Memory | Commit |
---|---|---|---|---|---|---|
2014-05-25 | 2.0.171 | 6.0k(6000) | players | 84% | 297MB | code |
2014-05-24 | 2.0.170 | 3.0k(3000) | players | 89% | 96MB | code |
2014-05-24 | 2.0.169 | 3.0k(3000) | players | 94% | 188MB | code |
2014-05-24 | 2.0.168 | 2.3k(2300) | players | 92% | 276MB | code |
2014-05-24 | 2.0.167 | 1.0k(1000) | players | 82% | 86MB | - |
Latency benchmark
The latency between encoder and player with realtime config ([CN][v3_CN_LowLatency], [EN][v3_EN_LowLatency]):
Update | SRS | VP6 | H.264 | VP6+MP3 | H.264+MP3 |
---|---|---|---|---|---|
2014-12-16 | 2.0.72 | 0.1s | 0.4s | 0.8s | 0.6s |
2014-12-12 | 2.0.70 | 0.1s | 0.4s | 1.0s | 0.9s |
2014-12-03 | 1.0.10 | 0.4s | 0.4s | 0.9s | 1.2s |
SRS4: Refine ST Iterate Coroutines Performance
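There is an ST optimization that may improve performance by 5% to 10%. It mainly addresses a problem that occurs when iterating coroutines. For the data, see: ossrs/state-threads#5 (comment)
This optimization is a fairly large change, so it will not land in SRS3; it is expected in SRS4.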
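The post does not describe the change itself, so the following is only a toy model of why iteration cost can matter in a dispatcher, not ST's code. It contrasts scanning every waiting coroutine on each wakeup with visiting only the coroutines whose descriptors became ready. All names (`dispatch_scan_all`, `dispatch_by_fd`, `build_index`) are hypothetical.

```python
from collections import defaultdict


def dispatch_scan_all(waiters, ready_fds):
    """Wake waiters by scanning every (name, fd) pair: cost grows with len(waiters)."""
    ready = set(ready_fds)
    woken, visited = [], 0
    for name, fd in waiters:
        visited += 1  # every waiting coroutine is inspected, ready or not
        if fd in ready:
            woken.append(name)
    return woken, visited


def build_index(waiters):
    """Group waiting coroutine names by the descriptor they wait on."""
    index = defaultdict(list)
    for name, fd in waiters:
        index[fd].append(name)
    return index


def dispatch_by_fd(index, ready_fds):
    """Wake waiters by looking up only the ready descriptors."""
    woken, visited = [], 0
    for fd in ready_fds:
        for name in index.get(fd, ()):
            visited += 1  # only coroutines on ready descriptors are inspected
            woken.append(name)
    return woken, visited


# Hypothetical workload: 1000 waiting coroutines, 2 descriptors ready.
waiters = [(f"co{i}", i) for i in range(1000)]
ready_fds = [3, 500]

print(dispatch_scan_all(waiters, ready_fds))
print(dispatch_by_fd(build_index(waiters), ready_fds))
```

Both functions wake the same coroutines; the difference is how many entries each one has to touch per wakeup, which is the kind of cost that shows up as a hotspot when thousands of clients are connected.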
MacPro info:
Docker info:
Note: SRS is bound to CPU0, and SB is bound to CPU2-3.
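The post does not say how the processes were bound to CPUs. As one possible approach on Linux, the sketch below parses a CPU list such as `2-3` and applies it with `os.sched_setaffinity`. The helper names `parse_cpu_list` and `pin_process` are hypothetical, and the affinity call is only available on platforms that support it.

```python
import os


def parse_cpu_list(spec):
    """Parse a CPU list such as "0" or "0,2-3" into a set of CPU indexes."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def pin_process(pid, spec):
    """Pin a process to the given CPUs; returns the resulting set, or None if unsupported."""
    if not hasattr(os, "sched_setaffinity"):
        return None  # e.g. macOS or Windows: no affinity API in the os module
    os.sched_setaffinity(pid, parse_cpu_list(spec))
    return os.sched_getaffinity(pid)


# Assumed usage, mirroring the note above (the pids are placeholders):
#   pin_process(srs_pid, "0")    # server on CPU0
#   pin_process(sb_pid, "2-3")   # load generator on CPU2-3
```

Keeping the server and the load generator on separate cores avoids the benchmark tool competing with the server for CPU time.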
SRS3 for Playing Baseline
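SRS3 without this optimization serves as the performance baseline, to see how much this PR improves things by comparison.
The interpretation is as follows:
`_st_epoll_dispatch`, as well as the RTMP message handling logic.
SRS3 for Playing with ST Refined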
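SRS3 with this PR merged, which optimizes ST's iteration logic.
The interpretation is as follows:
Note: After optimizing ST, performance does improve to some extent, and `_st_epoll_dispatch` is no longer a hotspot function.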
SRS3: Use Compiler O2 To Improve Performance
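SRS1, 2, and 3 have always defaulted to O0, which turns compiler optimization off. We can enable optimization and compare the data.
MacPro info:
Docker info:
Note: SRS is bound to CPU0, and SB is bound to CPU2-3.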
SRS3 Play Baseline
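First the baseline data: CPU usage averages 66%, with 39% in user space and 22% in system space.
SRS3 Play with Compiler O2
With the O2 compiler option enabled, SRS3 gains roughly 10% in performance: CPU usage is around 52%, with 26% in user space and 17% in system space.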
c47b9e46
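The size of the gain depends on whether it is counted in percentage points or relative to the baseline. A small sketch of that arithmetic, using only the CPU figures quoted above; the helper name `cpu_change` is hypothetical.

```python
def cpu_change(baseline_pct, optimized_pct):
    """Return (percentage-point drop, relative reduction in percent)."""
    points = baseline_pct - optimized_pct
    relative = points / baseline_pct * 100.0
    return points, relative


# (baseline, with O2) CPU usage in percent, as reported in the post.
figures = {"total": (66, 52), "user": (39, 26), "system": (22, 17)}

for name, (before, after) in figures.items():
    points, relative = cpu_change(before, after)
    print(f"{name}: {points} points lower, {relative:.1f}% relative reduction")
```

Since the post also reports that the Docker baseline fluctuates, these differences are best read as indicative rather than exact.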
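We found that the baseline may be unstable in the Docker environment: it is sometimes high and sometimes low, and the difference is very large, as shown in the figure below:
Some optimizations were made, and some of them, such as enabling O2, were expected to help. Because the baseline is unstable, they are set aside for now until a physical machine is available for testing. The optimization branches are below: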
Regarding ST optimization, the points that can be optimized are:
For the analysis of ST, see: https://github.com/ossrs/state-threads/tree/srs#analysis