Most admins have encountered the problem before: someone popular boosts a post from your instance, and suddenly you are bombarded with a few thousand HTTP requests while your instance becomes unusable until the storm is over. It doesn't have to be this way; in fact, it's pretty easy to prevent.
There is an appalling lack of documentation on this. I will explain what I do for my own instance, which is based on Sharkey (and therefore Misskey). Some advice will be specific to Misskey-based instances, but the lion's share of this post will be database and reverse proxy configuration entirely disconnected from the backing instance software, minus some paths you will have to adjust. This post assumes familiarity with instance administration and reverse proxies.
I have not, and will not, use Mastodon, therefore I cannot help you with configuring it.
Database Optimization
This is the bare minimum. A default Postgres installation does not use your hardware to its full potential and will give you miserable performance once your instance grows a little.
Enter your setup into pgtune, save its output into your Postgres config, then restart the database. Use the "Online Transaction Processing" template; I've personally gotten the best results with it.
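As an illustration, pgtune's OLTP suggestions for a hypothetical 4-core, 8 GB RAM machine with SSD storage look roughly like this (the numbers are examples for that assumed hardware, not recommendations; generate your own):

```
# Example pgtune OLTP output for an assumed 4 cores / 8 GB RAM / SSD
max_connections = 300
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
random_page_cost = 1.1          # low value because of flash storage
effective_io_concurrency = 200
work_mem = 3495kB
min_wal_size = 2GB
max_wal_size = 8GB
```

Note in particular `random_page_cost = 1.1`: this tells the planner that random reads are cheap, which only holds on flash storage.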
Try to always have your database on flash memory. The disk latency that classical hard drives have is going to be a massive performance killer. Buy a cheap SSD to just plop into your server, it doesn't need to be top of the line, it just needs to not be spinning rust. The best-performing option is of course NVMe storage.
Reverse proxies
These are your lifeblood. Making them "dumb" HTTP forwarding machines is leaving almost everything they can do on the table, and is sadly the default in most documentation.
I use HAProxy for my instance, an open-source high-performance load balancer and proxy. We are going to configure rate-limiting, request prioritization, caching, keepalives, and some more things.
If you need anything else for HAProxy, the docs available here are a lot better than the "raw" ones, in my opinion.
Basic HAProxy setup
HAProxy by itself is quite simple to set up. Here's the base config we'll be working off of:
frontend fedi
    mode http
    option httplog
    timeout client 10s
    timeout http-request 10s
    bind [::]:80 v4v6
    bind [::]:443 v4v6 ssl crt [YOUR .PEM SSL CERTIFICATE PATH] allow-0rtt
    http-request redirect scheme https unless { ssl_fc } # redirect to HTTPS
    default_backend instance_backend

backend instance_backend
    mode http
    option forwarded
    option forwardfor
    option redispatch # re-dispatch a request whose server timed out
    timeout connect 5s
    timeout server 10s
    timeout queue 15s
    default-server check observe layer7 error-limit 10 on-error mark-down inter 2s rise 10 slowstart 10s maxconn 50
    server backend1 [YOURBACKENDIP]:[YOURBACKENDPORT] check
Frontend sections accept requests from ports they bind to, process and modify them, and then pass them on to backends.
Basic request prioritization
acl is_activitypub_req hdr_sub(Accept) -i ld+json application/activity+json
acl is_activitypub_payload hdr_sub(Content-Type) -i application/ld+json application/activity+json
acl is_jscss path_end .js .css
acl is_image path_end .png .jpg .jpeg .webp
http-request set-priority-class int(1) if { path_beg /api } || is_jscss
http-request set-priority-class int(2) if is_image
http-request set-priority-class int(100) if is_activitypub_req || is_activitypub_payload
This little block in the fedi "frontend" detects ActivityPub requests by their Accept header and ActivityPub deliveries by their Content-Type header. It covers both of the common media types you will see.
Those conditions are used to determine the priority of the request. HAProxy has an internal request queueing system, which we can instruct to move user-relevant requests to the top of the queue. This prevents your users from having to stare at a spinner while your server processes 2000 ActivityPub requests.
/api is the sub-path used for all API requests in Misskey. The same seems to go for Mastodon, but again, I don't use Mastodon. API requests and frontend-critical files are the top priority (especially with Misskey, which hot-loads UI components), requests for images rank below them, and ActivityPub requests are the least important.
This doesn't mean anything will get dropped, by the way. Everything in the queue will still usually be responded to, just sometimes with multiple seconds of delay.
HTTP Keepalives
That's already a pretty big improvement, but we can do more.
Why close the connection to our backends at all? Establishing a new TCP connection costs a round trip every time; instead, we can use modern HTTP between our reverse proxy and backend to keep connections open and reuse them.
http-reuse aggressive
This goes into the backend section. I'm going to warn you here: this is fine for Misskey (and probably Mastodon), as its requests are stateless. This is probably the case for all fedi software, but you never know what someone is cooking up.
This instructs HAProxy to keep connections open and reuse any idle ones for different requests. Avoiding unnecessary round trips will prevent a lot of latency.
Caching
How effective this is will vary depending on how much you want to cache. HAProxy keeps the cache in memory; it does not feature a disk cache like nginx does and offers no persistence.
frontend fedi
    filter cache fedi_cache
    http-request cache-use fedi_cache
    http-response cache-store fedi_cache
    http-response set-header X-Cache-Status HIT if !{ srv_id -m found }
    http-response set-header X-Cache-Status MISS if { srv_id -m found }

cache fedi_cache
    total-max-size 2048 # in MB
    max-object-size 5242880 # 5 MB max size, in bytes
    max-age 960 # in seconds
Due to the limitation of caching being in-memory only, you really want to limit your caching to actually "hot" data that benefits from fast responses. The above config defines the cache to contain at most 2GB of data, with each entry being at most 5MB large. Entries are evicted from cache after 16 minutes.
To make caching easier to debug, we add an X-Cache-Status header that reveals the caching status. If you don't want this info revealed, just drop those lines.
HAProxy is quite good at determining what to cache. It respects Cache-Control headers and handles most of what you would otherwise have to worry about.
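If you would rather not rely on that and restrict caching to specific content yourself, you can gate the cache rules behind an ACL. A sketch that only caches static images (reusing the is_image ACL from the priority section; the variable name is my own choice):

```
frontend fedi
    acl is_image path_end .png .jpg .jpeg .webp
    # remember the decision so the response rule can see it
    http-request set-var(txn.cacheable) bool(true) if is_image
    http-request cache-use fedi_cache if is_image
    http-response cache-store fedi_cache if { var(txn.cacheable) -m bool }
```

The transaction variable is needed because the storage decision happens on the response side, where it is cleaner to test a flag set during request processing than to re-evaluate request properties.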
Bandwidth limiting
This is gonna vary based on your setup, I'd heavily suggest you test and adjust the number to be high enough to not piss anyone off, but low enough to reduce your network load. One person should not be able to max out your NIC.
frontend fedi
    filter bwlim-out download_limit default-limit 10242880 default-period 1s
    http-response set-bandwidth-limit download_limit
The above defines a maximum download speed of 10242880 bytes per second (just under 10 MB/s) for each individual stream.
Request compression
From what I've seen, Misskey does not compress its responses by default. This is very easily fixed with HAProxy, and API responses compress down into almost nothing with gzip.
frontend fedi
    filter compression
    compression algo gzip
    compression offload
"compression offload" ensures that HAProxy is handling compression, not your backend. It is also possible to compress requests before they are forwarded to the backend, but I've not seen this have any positive effect on performance.
Mastodon uses gzip by default. If you run Mastodon, skip this section and the pre-compressed responses will be passed through untouched.
Misskey tuning
You must tune your worker parameters to fit your system if you want to have a responsive instance even under load.
Sadly, a lot of this will depend on your database performance, CPU, networking setup, and many more factors, so I cannot offer a general recommendation for it. There are however some core rules you should adhere to.
First of all: your "clusterLimit" should not exceed the number of CPU cores you have available. Exceeding it seems to cause slowdowns, lowering your throughput below what you get by simply matching the core count.
Next is the job rate limiter. This is NOT a value you should bump up to 99999 in the hope of emptying your queue faster. Federation isn't a race; it's better to have a slowly draining queue than one that gets processed quickly while the instance appears frozen to your users. This value will also depend on your system; 96 is a good starting point for delivery jobs.
Job concurrency determines how many jobs a worker takes at once. The worker will be unavailable for anything else while it processes this batch of jobs, which includes requests from users. Again, experiment with these. Delivery jobs can get stuck for a very long time if faced with a server that times out, and you usually only have ~8-16 workers. 16-32 is a decent point to start experimenting with.
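Putting those rules together, the relevant part of a Misskey/Sharkey config could look something like this for an assumed 8-core machine (the numbers are starting points from the text above, not tuned values):

```
# .config/default.yml (worker-related excerpt, assumed 8-core host)
clusterLimit: 8            # never more than your CPU core count

deliverJobConcurrency: 16  # jobs one worker takes per batch
inboxJobConcurrency: 16
deliverJobPerSec: 96       # rate limiter; resist the urge to max it out
inboxJobPerSec: 32
```

Adjust the concurrency values first if your workers seem to stall on slow remote servers, and the per-second limits if your queue drains too aggressively for the instance to stay responsive.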
The golden solution
The ultimate way to ensure that your instance stays responsive under load is to run two servers: one dedicated to handling federation and queue work, the other dedicated to handling users. This of course takes more resources and is likely excessive for smaller instances; the measures mentioned before should more than suffice for you.
It is however pretty cool. :3
Misskey makes this quite trivial. Spin up a server with the "MK_ONLY_SERVER='true'" environment variable, and it will only handle web requests, no queue jobs. You can keep the same config if you wish, the worker parts will simply be ignored (except for the clusterLimit).
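As a sketch, if you run the instance under systemd, the web-only process could get that variable via a drop-in like this (the unit name is an assumption; use whatever your service is called):

```
# /etc/systemd/system/misskey-web.service.d/override.conf
[Service]
Environment=MK_ONLY_SERVER=true
```

After `systemctl daemon-reload` and a restart, that process will serve web requests only, while the unmodified unit keeps working the queue.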
Mastodon scaling is documented here.
frontend fedi
    use_backend federation_workers if is_activitypub_req || is_activitypub_payload
    default_backend web_workers

backend web_workers
    server web_server [IP]:[PORT] check
    server federation_worker [IP]:[PORT] check backup

backend federation_workers
    server federation_worker [IP]:[PORT] check
    server web_server [IP]:[PORT] check backup
Here's a sample of how that config would look in HAProxy, reusing the ActivityPub ACLs defined earlier. We check whether the request has ActivityPub headers; if it does, it gets passed to the separate federation backend.
This also comes with the advantage of your server now being fault-tolerant. If the main server of a backend fails, the backup will take over.
Closing
I don't really have closing words. I'm open to any questions you may have at https://plasmatrap.com/@privateger. Respond directly to this post using fedi, post available at https://plasmatrap.com/notes/9y7d6izir8.
This is also not even close to everything HAProxy can do. A bunch of fun stuff can be added, I may write more posts about it in the future (custom error pages, QUIC support...).
Seeya. :3