Most admins have encountered the problem before: someone popular boosts a post from your instance, and suddenly you are bombarded with a few thousand HTTP requests while your instance becomes unusable until the storm is over. It doesn't have to be this way; in fact, it's pretty easy to prevent.
There is an appalling lack of documentation on this. I will explain what I do for my own instance, which is based on Sharkey (and therefore Misskey). Some advice will be specific to Misskey-based instances, but the lion's share of this post will be database and reverse proxy configuration entirely disconnected from the backing instance software, minus some paths you will have to adjust. This post assumes familiarity with instance administration and reverse proxies.
I have not, and will not, use Mastodon, therefore I cannot help you with configuring it.
Database Optimization
This is the bare minimum. A default Postgres installation does not use your hardware to its full potential and will give you miserable performance once your instance grows a little.
Enter your setup into pgtune, save its output into your Postgres config, then restart the database. Use the "Online Transaction Processing" template; I've personally gotten the best results with it.
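As an illustration, pgtune's OLTP suggestions for a hypothetical 4-core, 8 GB RAM machine with SSD storage look roughly like this (the numbers are examples for that assumed hardware, not recommendations; generate your own):

```
# Example pgtune OLTP output for an assumed 4 cores / 8 GB RAM / SSD
max_connections = 300
shared_buffers = 2GB
effective_cache_size = 6GB
maintenance_work_mem = 512MB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
random_page_cost = 1.1          # low value because of flash storage
effective_io_concurrency = 200
work_mem = 3495kB
min_wal_size = 2GB
max_wal_size = 8GB
```

Note in particular `random_page_cost = 1.1`: this tells the planner that random reads are cheap, which only holds on flash storage.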
Try to always have your database on flash memory. The disk latency that classical hard drives have is going to be a massive performance killer. Buy a cheap SSD to just plop into your server, it doesn't need to be top of the line, it just needs to not be spinning rust. The best-performing option is of course NVMe storage.
Reverse proxies
These are your lifeblood. Making them "dumb" HTTP forwarding machines is leaving almost everything they can do on the table, and is sadly the default in most documentation.
I use HAProxy for my instance, an open-source high-performance load balancer and proxy. We are going to configure rate-limiting, request prioritization, caching, keepalives, and some more things.
If you need anything else for HAProxy, the docs available here are a lot better than the "raw" ones, in my opinion.
Basic HAProxy setup
HAProxy by itself is quite simple to set up. Here's the base config we'll be working off of:
frontend fedi
    mode http
    option httplog
    timeout client 10s
    timeout http-request 10s
    bind [::]:80 v4v6
    bind [::]:443 v4v6 ssl crt [YOUR .PEM SSL CERTIFICATE PATH] allow-0rtt
    http-request redirect scheme https unless { ssl_fc } # redirect to HTTPS
    default_backend instance_backend

backend instance_backend
    mode http
    option forwarded
    option forwardfor
    option redispatch # re-dispatch a request whose server timed out
    timeout connect 5s
    timeout server 10s
    timeout queue 15s
    default-server check observe layer7 error-limit 10 on-error mark-down inter 2s rise 10 slowstart 10s maxconn 50
    server backend1 [YOURBACKENDIP]:[YOURBACKENDPORT] check
Frontend sections accept requests from ports they bind to, process and modify them, and then pass them on to backends.
Basic request prioritization
acl is_activitypub_req hdr_sub(Accept) -i ld+json application/activity+json
acl is_activitypub_payload hdr_sub(Content-Type) -i application/ld+json application/activity+json
acl is_jscss path_end .js .css
acl is_image path_end .png .jpg .jpeg .webp
http-request set-priority-class int(1) if { path_beg /api } || is_jscss
http-request set-priority-class int(2) if is_image
http-request set-priority-class int(100) if is_activitypub_req || is_activitypub_payload
This little block in the fedi "frontend" detects ActivityPub requests by their Accept header and ActivityPub deliveries by their Content-Type header. It covers both of the common media types you will see.
Those conditions are used to determine the priority of the request. HAProxy has an internal request queueing system, which we can instruct to move user-relevant requests to the top of the queue. This prevents your users from having to stare at a spinner while your server processes 2000 ActivityPub requests.
/api is the sub-path used for all API requests in Misskey. The same seems to go for Mastodon, but again, I don't use Mastodon. API requests and frontend-critical files are the top priority (especially with Misskey, which hot-loads UI components), requests for images rank below them, and ActivityPub requests are the least important.
This doesn't mean anything will get dropped, by the way. Everything in the queue will still usually be responded to, just sometimes with multiple seconds of delay.
HTTP Keepalives
That's already a pretty big improvement, but we can do more.
Why close the connection to our backends at all? Establishing a new TCP connection costs a round trip every time; instead, we can use modern HTTP between our reverse proxy and backend to keep connections open and reuse them.
http-reuse aggressive
This goes into the backend section. I'm going to warn you here: this is fine for Misskey (and probably Mastodon), as its requests are stateless. This is probably the case for all fedi software, but you never know what someone is cooking up.
This instructs HAProxy to keep connections open and reuse any idle ones for different requests. Avoiding unnecessary round trips will prevent a lot of latency.
Caching
How effective this is will vary depending on how much you want to cache. HAProxy keeps the cache in memory; it does not feature a disk cache like nginx does and offers no persistence.
frontend fedi
    filter cache fedi_cache
    http-request cache-use fedi_cache
    http-response cache-store fedi_cache
    http-response set-header X-Cache-Status HIT if !{ srv_id -m found }
    http-response set-header X-Cache-Status MISS if { srv_id -m found }

cache fedi_cache
    total-max-size 2048 # in MB
    max-object-size 5242880 # 5 MB max size, in bytes
    max-age 960 # in seconds
Due to the limitation of caching being in-memory only, you really want to limit your caching to actually "hot" data that benefits from fast responses. The above config defines the cache to contain at most 2GB of data, with each entry being at most 5MB large. Entries are evicted from cache after 16 minutes.
To make caching easier to debug, we add an X-Cache-Status header that reveals the caching status. If you don't want this info revealed, just drop those lines.
HAProxy is quite good at determining what to cache. It respects Cache-Control headers and handles most of what you would otherwise have to worry about.
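If you would rather not rely on that and restrict caching to specific content yourself, you can gate the cache rules behind an ACL. A sketch that only caches static images (reusing the is_image ACL from the priority section; the variable name is my own choice):

```
frontend fedi
    acl is_image path_end .png .jpg .jpeg .webp
    # remember the decision so the response rule can see it
    http-request set-var(txn.cacheable) bool(true) if is_image
    http-request cache-use fedi_cache if is_image
    http-response cache-store fedi_cache if { var(txn.cacheable) -m bool }
```

The transaction variable is needed because the storage decision happens on the response side, where it is cleaner to test a flag set during request processing than to re-evaluate request properties.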
Bandwidth limiting
This is gonna vary based on your setup, I'd heavily suggest you test and adjust the number to be high enough to not piss anyone off, but low enough to reduce your network load. One person should not be able to max out your NIC.
frontend fedi
    filter bwlim-out download_limit default-limit 10242880 default-period 1s
    http-response set-bandwidth-limit download_limit
The above defines a maximum download speed of 10242880 bytes per second (just under 10 MB/s) for each individual stream.
Request compression
From what I've seen, Misskey does not compress its responses by default. This is very easily fixed with HAProxy, and API responses compress down into almost nothing with gzip.
frontend fedi
    filter compression
    compression algo gzip
    compression offload
"compression offload" ensures that HAProxy is handling compression, not your backend. It is also possible to compress requests before they are forwarded to the backend, but I've not seen this have any positive effect on performance.
Mastodon uses gzip by default. If you run Mastodon, skip this section and the pre-compressed responses will be passed through untouched.
Misskey tuning
You must tune your worker parameters to fit your system if you want to have a responsive instance even under load.
Sadly, a lot of this will depend on your database performance, CPU, networking setup, and many more factors, so I cannot offer a general recommendation for it. There are however some core rules you should adhere to.
First of all: your "clusterLimit" should not exceed the number of CPU cores you have available. Exceeding it seems to cause slowdowns, lowering your throughput below what you get by simply matching the core count.
Next is the job rate limiter. This is NOT a value you should bump up to 99999 in the hope of emptying your queue faster. Federation isn't a race; it's better to have a slowly draining queue than one that gets processed quickly while the instance appears frozen to your users. This value will also depend on your system; 96 is a good starting point for delivery jobs.
Job concurrency determines how many jobs a worker takes at once. The worker will be unavailable for anything else while it processes this batch of jobs, which includes requests from users. Again, experiment with these. Delivery jobs can get stuck for a very long time if faced with a server that times out, and you usually only have ~8-16 workers. 16-32 is a decent point to start experimenting with.
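Putting those rules together, the relevant part of a Misskey/Sharkey config could look something like this for an assumed 8-core machine (the numbers are starting points from the text above, not tuned values):

```
# .config/default.yml (worker-related excerpt, assumed 8-core host)
clusterLimit: 8            # never more than your CPU core count

deliverJobConcurrency: 16  # jobs one worker takes per batch
inboxJobConcurrency: 16
deliverJobPerSec: 96       # rate limiter; resist the urge to max it out
inboxJobPerSec: 32
```

Adjust the concurrency values first if your workers seem to stall on slow remote servers, and the per-second limits if your queue drains too aggressively for the instance to stay responsive.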
The golden solution
The ultimate way to ensure that your instance stays responsive under load is to run two servers: one dedicated to handling federation and queue work, the other dedicated to handling users. This of course takes more resources and is likely excessive for smaller instances; the measures mentioned before should more than suffice for you.
It is however pretty cool. :3
Misskey makes this quite trivial. Spin up a server with the "MK_ONLY_SERVER='true'" environment variable, and it will only handle web requests, no queue jobs. You can keep the same config if you wish, the worker parts will simply be ignored (except for the clusterLimit).
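As a sketch, if you run the instance under systemd, the web-only process could get that variable via a drop-in like this (the unit name is an assumption; use whatever your service is called):

```
# /etc/systemd/system/misskey-web.service.d/override.conf
[Service]
Environment=MK_ONLY_SERVER=true
```

After `systemctl daemon-reload` and a restart, that process will serve web requests only, while the unmodified unit keeps working the queue.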
Mastodon scaling is documented here.
frontend fedi
    use_backend federation_workers if is_activitypub_req || is_activitypub_payload
    default_backend web_workers

backend web_workers
    server web_server [IP]:[PORT] check
    server federation_worker [IP]:[PORT] check backup

backend federation_workers
    server federation_worker [IP]:[PORT] check
    server web_server [IP]:[PORT] check backup
Here's a sample of how that config would look in HAProxy, reusing the ActivityPub ACLs defined earlier. We check whether the request has ActivityPub headers; if it does, it gets passed to the separate federation backend.
This also comes with the advantage of your server now being fault-tolerant. If the main server of a backend fails, the backup will take over.
Closing
I don't really have closing words. I'm open to any questions you may have at https://plasmatrap.com/@privateger. Respond directly to this post using fedi, post available at https://plasmatrap.com/notes/9y7d6izir8.
This is also not even close to everything HAProxy can do. A bunch of fun stuff can be added, I may write more posts about it in the future (custom error pages, QUIC support...).
Seeya. :3