Programming Rants: csharp


2022-03-20

1 million Go goroutines vs C# tasks

Let's compare 1 million goroutines with 1 million tasks: which one is more efficient in CPU and memory usage? The code is forked from kjpgit's techdemo.
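For readers who haven't seen the techdemo, here is a minimal Go sketch of the kind of workload being measured. It assumes each goroutine sleeps for a fixed time and then atomically increments a shared counter (the post mentions the sleep comes before the atomic increment), and it models the `_1t` to `_32t` variants with `runtime.GOMAXPROCS`. The function name `runWorkload`, the sleep duration, and the thread mapping are my assumptions, not kjpgit's actual code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// runWorkload spawns n goroutines up front; each one sleeps for `sleep`
// and then atomically increments a shared counter. threads limits how many
// OS threads run Go code at once (assumed to model the "_Nt" suffixes).
func runWorkload(n int, sleep time.Duration, threads int) int64 {
	runtime.GOMAXPROCS(threads)
	var counter int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			time.Sleep(sleep)            // simulated waiting (e.g. I/O)
			atomic.AddInt64(&counter, 1) // record that this one finished
		}()
	}
	wg.Wait() // block until every goroutine has finished
	return atomic.LoadInt64(&counter)
}

func main() {
	start := time.Now()
	// 100_000 keeps a casual run light; the post uses 1_000_000.
	done := runWorkload(100_000, time.Second, 4)
	fmt.Printf("completed %d goroutines in %v\n", done, time.Since(start))
}
```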

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t     38.96    0.68     582,951     636,136  1:00
c#_2t     88.33    0.95     623,956     820,620  1:02
c#_4t    142.86    1.09     687,365     814,028  1:03
c#_8t    235.80    1.71     669,882     820,704  1:05
c#_16t   434.76    4.01     734,545     771,240  1:08
c#_32t   717.39    4.81     720,235     769,888  1:11
go_1t     58.77    0.65   2,635,380   2,635,632  1:04
go_2t     64.48    0.71   2,639,206   2,642,752  1:00
go_4t     72.55    1.42   2,651,086   2,654,972  1:00
go_8t     80.87    2.82   2,641,664   2,643,392  1:00
go_16t    83.18    4.03   2,673,404   2,681,100  1:00
go_32t    86.65    4.30   2,645,494   2,657,580  1:00
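One way to see where Go's much larger RSS in this table comes from is to park many idle goroutines and watch `runtime.MemStats.StackInuse` grow. This sketch is my own illustration, not part of the techdemo. In Go 1.17 (the version used here) each goroutine starts with a 2KB stack, so 1 million goroutines need about 2GB of stack alone; assuming the RSS columns are in KB, that accounts for most of the roughly 2.6 million in the Go rows. The goroutine descriptor lives on the heap and is not counted by this measurement.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// stackBytesPerGoroutine parks n idle goroutines and returns how much
// MemStats.StackInuse grew per goroutine. Goroutine descriptors live on
// the heap and are not included, so the true cost per goroutine is higher.
func stackBytesPerGoroutine(n int) float64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	release := make(chan struct{})
	var started sync.WaitGroup
	started.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-release // stay alive (keeping the stack) until released
		}()
	}
	started.Wait() // all n goroutines now exist and own a stack

	runtime.ReadMemStats(&after)
	close(release) // let the goroutines exit
	return float64(after.StackInuse-before.StackInuse) / float64(n)
}

func main() {
	fmt.Printf("~%.0f bytes of stack per goroutine\n", stackBytesPerGoroutine(100_000))
}
```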

The result is as expected, because all tasks/goroutines are spawned before any processing starts: Go's scheduler is more efficient in CPU usage, but the C# runtime is more efficient in memory usage. That is normal, though, because each goroutine needs at least 2KB (its initial stack), which costs far more than spawning a task. What if we increase to 10 million tasks/goroutines and do the spawning in another task/goroutine, so that finished goroutines can give their memory back to the GC? Here's the result:

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t     12.78    1.28   2,459,190   5,051,528  0:13
c#_2t     22.60    1.54   2,692,439   5,934,796  0:18
c#_4t     42.09    1.54   2,370,239   5,538,280  0:21
c#_8t     88.54    2.29   2,522,053   6,334,176  0:29
c#_16t   204.39    3.32   2,395,001   5,803,808  0:34
c#_32t   259.09    3.25   1,842,458   4,710,012  0:28
go_1t     13.97    0.97   4,514,200   6,151,088  0:14
go_2t     12.35    1.51   5,595,418   9,506,076  0:07
go_4t     22.09    2.40   6,394,162  12,517,848  0:07
go_8t     31.00    3.09   7,115,281  13,428,344  0:06
go_16t    40.32    3.52   7,126,851  13,764,940  0:06
go_32t    58.58    3.58   7,104,882  12,145,396  0:06
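The "spawn from another task/goroutine" variant can be sketched like this, assuming a dedicated spawner goroutine launches the workers and reports back on a channel once all of them are done, so `main` is never busy spawning. `spawnInBackground` is a hypothetical name; the techdemo's actual structure may differ.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// spawnInBackground returns immediately; a separate spawner goroutine
// launches n workers (sleep, then atomic increment) and sends the final
// count on the returned channel once every worker has finished.
func spawnInBackground(n int, sleep time.Duration) <-chan int64 {
	result := make(chan int64, 1)
	go func() {
		var counter int64
		var wg sync.WaitGroup
		for i := 0; i < n; i++ {
			wg.Add(1) // Add happens in the spawner, before its own Wait
			go func() {
				defer wg.Done()
				time.Sleep(sleep)
				atomic.AddInt64(&counter, 1)
			}()
		}
		wg.Wait()
		result <- atomic.LoadInt64(&counter)
	}()
	return result
}

func main() {
	start := time.Now()
	done := <-spawnInBackground(100_000, time.Second) // main is free until here
	fmt.Printf("completed %d goroutines in %v\n", done, time.Since(start))
}
```

Because workers that finish early exit while the spawner is still running, the runtime can reuse or reclaim their stacks instead of holding every goroutine's memory at once.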

The result seems normal: the high memory usage comes from many goroutines being spawned at the same time from another thread (without blocking the main thread), and once they finish, their memory is collected by the GC. (Previously the exit condition was time-based; this time the program exits after all work is done, since I moved the sleep before the atomic increment.) What if we go back to 1 million, keeping the same exit rule, spawning from a separate task/goroutine, and checking for completion every 1s? Here's the result:

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t      1.18    0.16     328,134     511,652  0:02
c#_2t      2.18    0.22     294,608     554,488  0:02
c#_4t      3.19    0.20     305,336     554,064  0:02
c#_8t      7.77    0.31     292,281     530,368  0:02
c#_16t    12.33    0.25     304,352     569,460  0:02
c#_32t    37.90    1.25     337,837     684,252  0:03
go_1t      2.72    0.42   1,592,978   2,519,040  0:03
go_2t      3.04    0.47   1,852,084   2,637,532  0:03
go_4t      3.65    0.54   1,936,626   2,637,272  0:03
go_8t      3.27    0.59   1,768,540   2,655,208  0:02
go_16t     4.01    0.71   1,770,673   2,664,504  0:02
go_32t     4.96    0.72   1,770,354   2,669,244  0:02
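The "check completion every 1s" exit rule could look like the following sketch, assuming `main` polls a shared atomic counter on a `time.Ticker` instead of waiting on a WaitGroup. `runPolled` and its parameters are illustrative names of mine.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// runPolled spawns n workers from a background goroutine, then checks the
// shared counter once per interval and returns when all n have finished.
// It reports the final count and how many checks were needed.
func runPolled(n int, sleep, interval time.Duration) (int64, int) {
	var counter int64
	go func() { // spawner goroutine, so polling starts right away
		for i := 0; i < n; i++ {
			go func() {
				time.Sleep(sleep)
				atomic.AddInt64(&counter, 1)
			}()
		}
	}()

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	checks := 0
	for range ticker.C {
		checks++
		if done := atomic.LoadInt64(&counter); done >= int64(n) {
			return done, checks
		}
	}
	return atomic.LoadInt64(&counter), checks // not reached: ticker.C never closes
}

func main() {
	done, checks := runPolled(100_000, time.Second, time.Second)
	fmt.Printf("completed %d goroutines after %d checks\n", done, checks)
}
```

The trade-off of polling is latency: completion is only noticed at the next tick, so the measured wall time is effectively rounded up to the polling interval.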

The difference in processing time is negligible, but CPU and memory usage differ considerably. Next, let's spawn in bursts (100K per second) by adding a 1-second sleep after every 100K tasks/goroutines, since receiving that many at once is not realistic even for a DDoS'ed server (unless the server is finely tuned). Here's the result:

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t      0.61    0.08     146,849     284,436  0:05
c#_2t      1.17    0.10     131,778     261,720  0:05
c#_4t      1.53    0.08     133,505     289,584  0:05
c#_8t      4.17    0.15     131,924     284,960  0:05
c#_16t    10.94    0.68     135,446     289,028  0:05
c#_32t    19.86    3.01     130,533     284,924  0:05
go_1t      1.84    0.24     731,872   1,317,796  0:06
go_2t      1.87    0.26     659,382   1,312,220  0:05
go_4t      2.00    0.30     661,296   1,322,152  0:05
go_8t      2.37    0.34     660,641   1,324,684  0:05
go_16t     2.82    0.39     660,225   1,323,932  0:05
go_32t     3.36    0.45     659,176   1,327,264  0:05
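Burst spawning (a pause after every fixed number of spawns) might be sketched as below. The burst size is a parameter, so the later 200K and 400K runs only change one argument. `runBursts` is a hypothetical helper, and it waits with a WaitGroup rather than polling to keep the sketch short.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runBursts spawns n workers, pausing for `pause` after every `burst`
// spawns. It returns how many workers completed and how long spawning took.
func runBursts(n, burst int, pause, sleep time.Duration) (int64, time.Duration) {
	var counter int64
	var wg sync.WaitGroup
	start := time.Now()
	for i := 0; i < n; i++ {
		if i > 0 && i%burst == 0 {
			time.Sleep(pause) // throttle: at most `burst` spawns per pause
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(sleep)
			atomic.AddInt64(&counter, 1)
		}()
	}
	spawnTime := time.Since(start)
	wg.Wait()
	return atomic.LoadInt64(&counter), spawnTime
}

func main() {
	// Scaled down from the post's 1M total / 100K burst / 1s pause.
	done, spawnTime := runBursts(100_000, 10_000, 100*time.Millisecond, time.Second)
	fmt.Printf("completed %d, spawning took %v\n", done, spawnTime)
}
```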

And for 5 million:

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t      3.39    0.24     309,103     573,772  0:11
c#_2t      8.30    0.26     278,683     553,592  0:11
c#_4t     13.65    0.32     274,679     658,104  0:11
c#_8t     23.20    0.46     286,336     641,376  0:12
c#_16t    45.85    1.32     286,311     640,336  0:12
c#_32t    64.83    2.46     264,866     615,552  0:12
go_1t      6.25    0.50   1,397,434   2,629,936  0:13
go_2t      6.20    0.56   1,386,336   2,631,580  0:11
go_4t      7.52    0.65   1,410,523   2,625,308  0:11
go_8t      8.21    0.86   1,441,080   2,779,456  0:11
go_16t    11.17    0.96   1,436,220   2,687,908  0:11
go_32t    12.97    1.06   1,430,573   2,668,816  0:11

And for 25 million:

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t     15.94    0.69     590,411   1,190,340  0:24
c#_2t     34.88    0.84     699,288   1,615,372  0:32
c#_4t     59.95    0.89     761,308   1,794,116  0:34
c#_8t    100.64    1.36     758,161   1,845,944  0:36
c#_16t   199.56    2.99     765,791   2,014,856  0:38
c#_32t   332.02    4.07     811,809   1,972,400  0:41
go_1t     21.76    0.71   2,846,565   4,413,968  0:29
go_2t     25.77    1.03   2,949,433   5,553,608  0:25
go_4t     28.74    1.24   2,920,447   5,800,088  0:24
go_8t     37.28    1.96   2,869,074   5,502,776  0:23
go_16t    43.46    2.67   2,987,114   5,769,356  0:24
go_32t    43.77    2.92   3,027,179   5,867,084  0:24

How about 25 million with a sleep every 200K?

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t     18.47    0.91     842,492   1,820,788  0:22
c#_2t     40.32    0.93   1,070,555   2,454,324  0:31
c#_4t     62.39    1.16   1,103,741   2,581,476  0:33
c#_8t    100.84    1.34   1,074,820   2,377,580  0:34
c#_16t   218.26    2.91   1,062,642   2,726,700  0:37
c#_32t   339.00    6.51   1,042,254   2,275,644  0:40
go_1t     22.61    0.88   3,474,195   5,071,944  0:27
go_2t     25.83    1.20   3,912,071   6,964,640  0:20
go_4t     37.98    1.68   4,180,188   7,392,800  0:20
go_8t     38.56    2.44   4,189,265   8,481,852  0:18
go_16t    44.49    3.19   4,187,142   8,483,236  0:18
go_32t    48.82    3.44   4,218,591   8,424,200  0:18

And lastly, 25 million with a sleep every 400K:

Name    UserCPU  SysCPU      AvgRSS      MaxRSS  Wall
c#_1t     18.66    0.98   1,183,313   2,622,464  0:20
c#_2t     41.27    1.14   1,326,415   3,155,948  0:31
c#_4t     67.21    1.11   1,436,280   3,015,212  0:33
c#_8t    107.14    1.56   1,492,179   3,378,688  0:35
c#_16t   233.50    2.45   1,498,421   3,732,368  0:41
c#_32t   346.87    3.74   1,335,756   2,882,676  0:39
go_1t     24.13    0.82   4,048,937   5,099,220  0:26
go_2t     28.85    1.41   4,936,677   8,023,568  0:18
go_4t     31.51    1.95   5,193,653   9,537,080  0:14
go_8t     45.27    2.65   5,461,107   9,499,308  0:14
go_16t    53.43    3.19   5,183,009   9,476,084  0:14
go_32t    61.98    3.86   5,589,156  10,587,788  0:14

How to read the results above: Wall = time needed to complete (lower is better); AvgRSS/MaxRSS = average/maximum memory usage (lower is better); UserCPU = CPU time used, in percent, where >100% means more than one full core's worth of compute time was used (lower is better). Versions used in this benchmark:

go version go1.17.6 linux/amd64
dotnet --version
6.0.201
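To make the column definitions above concrete, here is a Linux-only Go sketch of how a harness might collect these numbers for one run, assuming UserCPU/SysCPU are CPU time divided by wall time (so 200% means two cores busy on average), as described above. It reads the child's resource usage through `ProcessState.SysUsage()`; AvgRSS would require sampling memory while the process runs and is not shown. The `measure` function and `Stats` type are hypothetical, not the techdemo's harness.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
	"time"
)

// Stats holds per-run numbers similar to the columns in the tables.
type Stats struct {
	UserCPU, SysCPU float64 // CPU time as a percentage of wall time
	MaxRSSKB        int64   // peak resident set size (Linux reports KB)
	Wall            time.Duration
}

// measure runs name with args and derives CPU% = cpuTime / wall * 100.
// Linux-only: it relies on SysUsage() returning a *syscall.Rusage.
func measure(name string, args ...string) (Stats, error) {
	cmd := exec.Command(name, args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	start := time.Now()
	if err := cmd.Run(); err != nil {
		return Stats{}, err
	}
	wall := time.Since(start)
	ps := cmd.ProcessState
	s := Stats{
		UserCPU: 100 * ps.UserTime().Seconds() / wall.Seconds(),
		SysCPU:  100 * ps.SystemTime().Seconds() / wall.Seconds(),
		Wall:    wall,
	}
	if ru, ok := ps.SysUsage().(*syscall.Rusage); ok {
		s.MaxRSSKB = ru.Maxrss
	}
	return s, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: measure <command> [args...]")
		return
	}
	s, err := measure(os.Args[1], os.Args[2:]...)
	if err != nil {
		fmt.Println("error:", err)
		os.Exit(1)
	}
	fmt.Printf("UserCPU %.2f%%  SysCPU %.2f%%  MaxRSS %d KB  Wall %v\n",
		s.UserCPU, s.SysCPU, s.MaxRSSKB, s.Wall)
}
```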

2022-02-22

C# vs Go in Simple Benchmark

Today we're going to re-run the associative array and comb sort benchmarks on two of my favorite languages, like in the past benchmark. This measures compile-and-run time, not just runtime performance, because waiting for compilation during development also matters. To install .NET:

wget https://packages.microsoft.com/config/ubuntu/21.04/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
rm packages-microsoft-prod.deb
sudo apt install apt-transport-https
sudo apt-get update
sudo apt-get install -y dotnet-sdk-6.0 aspnetcore-runtime-6.0

To install Go:

sudo add-apt-repository ppa:longsleep/golang-backports
sudo apt-get update
sudo apt install -y golang-1.17

Result (best of 3 runs)

cd assoc; time dotnet run
6009354 6009348 611297
36186112 159701682 23370001

CPU: 14.16s Real: 14.41s RAM: 1945904KB

cd assoc; time go run map.go
6009354 6009348 611297
36186112 159701682 23370001

CPU: 14.80s Real: 12.01s RAM: 2305384KB

This is a bit weird: usually I see Go use less memory but run slower, but in this benchmark it's C# that uses less memory while being a bit slower (14.41s vs 12.01s), possibly because compilation time is also included.
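Because `time go run` (and `time dotnet run`) includes compilation, one way to separate runtime from build time is to build first, e.g. `go build -o map map.go` followed by `time ./map`, or `dotnet build` followed by `dotnet run --no-build`. Note that `dotnet run` builds the Debug configuration unless `-c Release` is passed. Another option is to time the hot loop inside the program. Below is a minimal sketch of that in-program approach, using a hypothetical string-keyed map workload (`timedMapWorkload`) rather than the benchmark's actual map.go.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// timedMapWorkload is a hypothetical stand-in for map.go: it inserts n
// string keys, looks each one up again, and times only that work, so
// compilation done by `go run` is not part of the measurement.
func timedMapWorkload(n int) (hits int, elapsed time.Duration) {
	start := time.Now()
	m := make(map[string]int)
	for i := 0; i < n; i++ {
		m[strconv.Itoa(i)] = i
	}
	for i := 0; i < n; i++ {
		if v, ok := m[strconv.Itoa(i)]; ok && v == i {
			hits++
		}
	}
	return hits, time.Since(start)
}

func main() {
	hits, elapsed := timedMapWorkload(1_000_000)
	fmt.Printf("hits=%d runtime-only=%v\n", hits, elapsed)
}
```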

cd num-assoc; time dotnet run
CPU: 2.21s      Real: 2.19s     RAM: 169208KB

cd num-assoc; time go run comb.go
CPU: 0.46s      Real: 0.44s     RAM: 83100KB

What if we increase N from 1 million to 10 million?

cd num-assoc; time dotnet run
CPU: 19.25s     Real: 19.16s    RAM: 802296KB

cd num-assoc; time go run comb.go
CPU: 4.60s      Real: 4.67s     RAM: 808940KB
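For reference, comb sort is bubble sort with a shrinking gap: compare elements `gap` apart, shrink the gap by a factor of about 1.3 each pass, and finish with gap-1 passes until nothing swaps. A minimal Go sketch (my own, not the benchmark's comb.go; 1.3 is the commonly used shrink factor):

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// combSort sorts a in place. Each pass compares elements `gap` apart;
// the gap shrinks by ~1.3 per pass, and once it reaches 1 the sort ends
// after the first pass that makes no swaps.
func combSort(a []int) {
	gap := len(a)
	sorted := false
	for !sorted {
		gap = gap * 10 / 13 // integer form of dividing by 1.3
		if gap <= 1 {
			gap = 1
			sorted = true // assume sorted unless this pass swaps
		}
		for i := 0; i+gap < len(a); i++ {
			if a[i] > a[i+gap] {
				a[i], a[i+gap] = a[i+gap], a[i]
				sorted = false
			}
		}
	}
}

func main() {
	const n = 1_000_000 // the post's first N; try 10_000_000 as well
	a := make([]int, n)
	for i := range a {
		a[i] = rand.Intn(n)
	}
	combSort(a)
	fmt.Println("sorted:", sort.IntsAreSorted(a))
}
```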

If you want to contribute (for example, if I made a mistake coding the C# or Go version of an algorithm, or if there's a more efficient data structure), just fork and create a pull request, and I will redo the benchmark.

2019-07-25

The Benchmarker's Web Framework Benchmark

Latest update (2019-07-19) from the-benchmarker's web-framework:

Language (Runtime)	Framework (Middleware)	Requests / s	Throughput
`c` (`11`)	agoo-c (0.5)	199670.00	115.49 MB
`python` (`3.7`)	japronto (0.1)	177634.00	212.57 MB
`java` (`8`)	rapidoid (5.5)	153167.00	275.56 MB
`go` (`1.12`)	fasthttprouter (0.1)	146986.67	236.54 MB
`python` (`3.6`)	vibora (0.0)	144171.33	163.66 MB
`c` (`99`)	kore (3.1)	142837.67	370.30 MB
`cpp` (`11`)	evhtp (1.2)	141011.33	136.87 MB
`java` (`8`)	act (1.8)	137266.33	236.87 MB
`ruby` (`2.6`)	agoo (2.8)	132990.67	76.84 MB
`rust` (`1.36`)	gotham (0.4)	130192.33	266.35 MB
`crystal` (`0.29`)	router.cr (0.2)	123911.33	116.40 MB
`nim` (`0.2`)	jester (0.4)	123719.00	248.70 MB
`crystal` (`0.29`)	raze (0.3)	122186.33	114.87 MB
`crystal` (`0.29`)	spider-gazelle (1.4)	120138.00	128.27 MB
`crystal` (`0.29`)	kemal (0.25)	114424.33	187.01 MB
`rust` (`1.36`)	actix-web (1.0)	114286.67	163.27 MB
`crystal` (`0.29`)	amber (0.28)	105704.33	193.62 MB
`rust` (`1.36`)	nickel (0.11)	102067.33	202.98 MB
`csharp` (`7.3`)	aspnetcore (2.2)	100367.67	163.49 MB
`rust` (`1.36`)	iron (0.6)	99692.33	125.66 MB
`crystal` (`0.29`)	orion (1.7)	95829.67	156.64 MB
`go` (`1.12`)	gorouter (4.0)	91250.00	121.51 MB
`node` (`12.6`)	polkadot (1.0)	90498.00	135.64 MB
`go` (`1.12`)	chi (4.0)	89401.33	119.52 MB
`node` (`12.6`)	0http (1.0)	88940.67	133.26 MB
`go` (`1.12`)	gin (1.4)	88229.00	154.70 MB
`go` (`1.12`)	violetear (7.0)	87979.00	116.68 MB
`node` (`12.6`)	restana (3.3)	87181.67	130.61 MB
`go` (`1.12`)	echo (4.1)	86944.33	152.32 MB
`go` (`1.12`)	kami (2.2)	85569.00	113.85 MB
`go` (`1.12`)	beego (1.12)	83531.33	112.24 MB
`go` (`1.12`)	gorilla-mux (1.7)	83107.67	110.75 MB
`kotlin` (`1.3`)	ktor (1.2)	76189.67	118.63 MB
`go` (`1.12`)	gf (1.8)	73145.67	110.94 MB
`node` (`12.6`)	polka (0.5)	71049.67	106.46 MB
`scala` (`2.12`)	akkahttp (10.1)	69006.00	147.87 MB
`node` (`12.6`)	rayo (1.3)	68066.67	102.05 MB
`python` (`3.7`)	falcon (2.0)	60301.00	141.34 MB
`swift` (`5.0`)	perfect (3.1)	60239.67	56.60 MB
`node` (`12.6`)	muneem (2.4)	58723.67	87.98 MB
`scala` (`2.12`)	http4s (0.18)	58317.33	102.08 MB
`node` (`12.6`)	fastify (2.6)	58029.33	147.94 MB
`node` (`12.6`)	foxify (0.1)	53745.00	112.74 MB
`java` (`8`)	spring-boot (2.1)	52174.00	39.04 MB
`node` (`12.6`)	koa (2.7)	50993.67	107.80 MB
`python` (`3.7`)	blacksheep (0.1)	50145.67	102.88 MB
`python` (`3.7`)	bottle (0.12)	49704.67	122.36 MB
`node` (`12.6`)	restify (8.2)	45617.00	79.87 MB
`php` (`7.3`)	slim (3.12)	43847.33	217.11 MB
`php` (`7.3`)	zend-expressive (3.2)	42281.00	209.34 MB
`php` (`7.3`)	symfony (4.3)	42019.67	208.50 MB
`python` (`3.7`)	starlette (0.12)	41710.67	89.72 MB
`node` (`12.6`)	express (4.17)	41081.33	100.31 MB
`php` (`7.3`)	zend-framework (3.1)	39650.00	196.61 MB
`swift` (`5.0`)	kitura (2.7)	39061.33	72.50 MB
`ruby` (`2.6`)	roda (3.22)	38720.67	36.90 MB
`swift` (`5.0`)	vapor (3.3)	38685.00	64.54 MB
`python` (`3.7`)	hug (2.5)	37882.33	93.84 MB
`php` (`7.3`)	lumen (5.8)	37822.00	196.49 MB
`ruby` (`2.6`)	cuba (3.9)	35257.00	41.55 MB
`crystal` (`0.28`)	lucky (0.14)	33885.00	41.73 MB
`crystal` (`0.29`)	onyx (0.5)	32685.67	83.76 MB
`node` (`12.6`)	turbo_polka (2.0)	31139.67	29.22 MB
`ruby` (`2.6`)	rack-routing (0.0)	29710.33	17.13 MB
`node` (`12.6`)	hapi (18.1)	29298.33	75.73 MB
`php` (`7.3`)	laravel (5.8)	28941.33	151.14 MB
`swift` (`5.0`)	kitura-nio (2.7)	28372.00	53.53 MB
`python` (`3.7`)	fastapi (0.33)	27457.67	59.12 MB
`python` (`3.7`)	aiohttp (3.5)	23169.00	52.40 MB
`ruby` (`2.6`)	flame (4.18)	20298.33	11.70 MB
`python` (`3.7`)	molten (0.27)	19610.00	36.40 MB
`python` (`3.7`)	flask (1.1)	19088.33	46.94 MB
`ruby` (`2.6`)	hanami (1.3)	18242.67	137.89 MB
`rust` (`nightly`)	rocket (0.4)	17988.33	27.86 MB
`python` (`3.7`)	bocadillo (0.18)	17408.33	33.59 MB
`python` (`3.7`)	sanic (19.6)	14934.00	26.61 MB
`ruby` (`2.6`)	sinatra (2.0)	14907.33	38.66 MB
`swift` (`5.0`)	swifter (1.4)	11351.67	14.52 MB
`python` (`3.7`)	quart (0.9)	10817.67	21.55 MB
`python` (`3.7`)	responder (1.3)	8826.33	19.23 MB
`python` (`3.7`)	django (2.2)	7604.67	22.02 MB
`python` (`3.7`)	tornado (5.1)	7089.33	20.92 MB
`python` (`3.7`)	masonite (2.2)	6298.67	15.47 MB
`crystal` (`0.29`)	athena (0.7)	6247.67	7.81 MB
`ruby` (`2.6`)	rails (5.2)	3680.33	11.28 MB
`python` (`3.7`)	cyclone (0.0)	2889.33	7.85 MB

It's interesting to see new frameworks (or ones I had never heard of, such as Vibora, Agoo, and Gotham) performing well.
But as usual, this only benchmarks the router; in real applications the bottleneck is almost always the database.
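As a point of reference for what such a router benchmark exercises, here is a minimal Go `net/http` server for three routes that I'm assuming match the benchmark's spec (GET `/`, GET `/user/{id}`, POST `/user`). It uses only the standard library, not any of the listed frameworks, and port 3000 is arbitrary. Since it never touches a database, it only measures routing and HTTP overhead.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

// newRouter wires three routes assumed to mirror the benchmark's:
// GET / (empty 200), GET /user/{id} (echoes id), POST /user (empty 200).
func newRouter() http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/" { // "/" also catches every unmatched path
			http.NotFound(w, r)
			return
		}
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/user", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/user/", func(w http.ResponseWriter, r *http.Request) {
		id := strings.TrimPrefix(r.URL.Path, "/user/")
		if id == "" || strings.Contains(id, "/") {
			http.NotFound(w, r)
			return
		}
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		_, _ = w.Write([]byte(id))
	})
	return mux
}

// serve runs one request through the router in memory (no network),
// returning the status code and body; handy for a quick self-check.
func serve(method, path string) (int, string) {
	rec := httptest.NewRecorder()
	newRouter().ServeHTTP(rec, httptest.NewRequest(method, path, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	log.Fatal(http.ListenAndServe(":3000", newRouter()))
}
```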
