Fly + Bun for browser automation. An unfinished attempt
Some time ago, a friend who works with Elixir told me about a new cloud provider, Fly. Since I aim to build a scalable AI platform that often relies on components (e.g. browser automation, dynamic code execution, video editing) that are hard to scale with serverless, it piqued my interest.
In this article I aim to create a POC for a scalable API for browser automation tasks using Bun as the main toolkit.
Coming from a mix of regular cloud VMs (Linode) and serverless hosting on Vercel, Fly is really something different, because it:
- automatically sets up SSL and a host for your VMs
- allows multiple machines for an app with an easy API to create/remove/pause/start them.
- automatically load-balances traffic for you.
- exposes this all in a very general purpose way so you can build your own solutions with it.
The creators of Fly showed a new pattern that rethinks serverless: the FLAME pattern. It was first implemented in Elixir, and they created a small (unfinished) POC for JavaScript, but to accommodate my existing codebase of 120k lines, I need this pattern in TypeScript. My conclusion is that this is easier with Bun.
What I want to do:
- Have a Fly setup with a Docker image that installs Node.js, ffmpeg, a browser, and anything else my monolith requires.
- Ensure that Docker image also starts up the server correctly.
- Implement the FLAME pattern so it uses an available machine if there is one, and otherwise starts or creates a new one.
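To make that last point concrete, here's a rough sketch of the "reuse or create a machine" step against the Fly Machines API. The app name, image, and VM size are placeholders, FLY_API_TOKEN is assumed to be set, and error handling and concurrency control are omitted; this only shows the shape of the logic.

```ts
// Sketch: pick a started machine, else wake a stopped one, else create one.
const API = "https://api.machines.dev/v1/apps/my-runner-app";
const headers = {
  Authorization: `Bearer ${process.env.FLY_API_TOKEN}`,
  "Content-Type": "application/json",
};

type Machine = { id: string; state: string };

async function getRunnerMachine(): Promise<Machine> {
  // 1. Prefer a machine that is already running.
  const machines: Machine[] = await fetch(`${API}/machines`, { headers }).then((r) => r.json());
  const started = machines.find((m) => m.state === "started");
  if (started) return started;

  // 2. Otherwise wake up a stopped one.
  const stopped = machines.find((m) => m.state === "stopped");
  if (stopped) {
    await fetch(`${API}/machines/${stopped.id}/start`, { method: "POST", headers });
    return stopped;
  }

  // 3. Nothing available: create a fresh machine from the runner image.
  return fetch(`${API}/machines`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      config: {
        image: "registry.fly.io/my-runner-app:latest",
        guest: { cpu_kind: "performance", cpus: 1, memory_mb: 8192 },
      },
    }),
  }).then((r) => r.json());
}
```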
It'd be very nice to have an infinitely scalable server that has FAR FEWER CONSTRAINTS than regular serverless.
What I'd want:
- Open browsers with Playwright, with arbitrary extension code, proxies, etc.
- Process files with things like ffmpeg
- Run heavy AI models on GPUs
- Run arbitrary code
This would be a very good thing to have and - if I document it well (OpenAPI) and make it infinitely scalable using fly.io - it'd be an amazing service in itself.
An alternative to (or diversion from) the FLAME pattern came from an idea from Theo: use Vercel serverless with OpenAPI by default, but automatically route a function to Fly when it can't run on Vercel (such as ffmpeg or browsers). I might do this.
TODO: Proof of concept
To create a POC, let's:
✅ Figure out how to create a Bun + ffmpeg + Chromium Docker image
✅ Make a Playwright script behind an HTTP endpoint that opens Chrome, goes to a URL, waits for it to load, gets the text, and returns that in the response.
❌ Playwright support in Bun seems bad
✅ Try Puppeteer instead: https://github.com/rgl/try-puppeteer-in-bun/blob/main/main.js ... first on localhost, then also on the VM (a sketch of the resulting endpoint follows this list)...
✅ Fix Chrome not found on the VM (https://stackoverflow.com/questions/59979188/error-failed-to-launch-the-browser-process-puppeteer)
❌ For some reason, it can't access any network ("could not complete HTTP request to instance: connection closed before message completed"). See https://community.fly.io/t/could-not-make-http-request-to-instance-connection-closed-before-message-completed/14650 ... I tried to follow this, but there is no solution there.
❌ Maybe try https://macarthur.me/posts/puppeteer-with-docker/ ... not working either
✅ Try https://github.com/fly-apps/puppeteer-js-renderer - it works! See https://puppeteer-js-renderer2.fly.dev/api/render?url=https://google.nl
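For reference, this is roughly what the POC endpoint looks like: a Bun HTTP server that drives Puppeteer and returns the page text. The Chromium path and launch flags are assumptions about what the Docker image provides (pointing at the system browser was the fix for the "chrome not found" issue); adjust them to your own image.

```ts
// Sketch: Bun HTTP server that opens ?url=... with Puppeteer and returns its text.
import puppeteer from "puppeteer";

Bun.serve({
  port: 3000,
  async fetch(req) {
    const url = new URL(req.url).searchParams.get("url");
    if (!url) return new Response("missing ?url=", { status: 400 });

    const browser = await puppeteer.launch({
      // In the container we point at the system Chromium and disable the sandbox.
      executablePath: process.env.CHROMIUM_PATH ?? "/usr/bin/chromium",
      args: ["--no-sandbox", "--disable-setuid-sandbox"],
    });
    try {
      const page = await browser.newPage();
      await page.goto(url, { waitUntil: "networkidle0" });
      const text = await page.evaluate(() => document.body.innerText);
      return new Response(text);
    } finally {
      await browser.close();
    }
  },
});
```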
Ultimately I want to run regular functions in Next.js, passing special functions on to Fly. However, Next.js projects don't output a build folder with all JS files.
Try to replicate https://fly.io/javascript-journal/flame-for-javascript-rethinking-serverless/ in the Bun repo, so it starts up a new machine that pauses itself within 1 minute.
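A minimal sketch of that "pause within 1 minute" behaviour, assuming the machine goes to "stopped" once its main process exits (i.e. the restart policy doesn't bring it straight back up):

```ts
// Sketch: exit the process after 60s without requests so the Fly machine can stop.
let lastRequest = Date.now();

setInterval(() => {
  if (Date.now() - lastRequest > 60_000) {
    console.log("idle for 60s, shutting down so the machine can stop");
    process.exit(0);
  }
}, 5_000);

Bun.serve({
  port: 3000,
  fetch() {
    lastRequest = Date.now();
    return new Response("ok");
  },
});
```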
Start up 100 paused machines. Do a stress test with https://www.artillery.io/ ramping up to 100 rps and see how it goes.
If this works, I can mix in my OpenAPI Next.js demo so I host on regular serverless unless that's not possible. I can do this by providing a different handler to /api. Max duration shouldn't be a problem given we're just waiting for a fetch call to return.
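A hypothetical sketch of what that different handler could look like: heavy endpoints get forwarded to the Fly app over a plain fetch, everything else keeps running on Vercel. The Fly hostname and the "heavy" flag are made up for illustration.

```ts
// Sketch: wrap an /api handler so heavy routes are proxied to the Fly app.
const FLY_HOST = "https://my-runner-app.fly.dev";

export function withFlyFallback(
  handler: (req: Request) => Promise<Response>,
  opts: { heavy: boolean },
) {
  return async (req: Request): Promise<Response> => {
    if (!opts.heavy) return handler(req);

    // Forward the original request to the same path on the Fly machine and
    // just wait for the fetch to resolve (hence max duration isn't an issue).
    const url = new URL(req.url);
    const body =
      req.method === "GET" || req.method === "HEAD" ? undefined : await req.arrayBuffer();
    return fetch(`${FLY_HOST}${url.pathname}${url.search}`, {
      method: req.method,
      headers: req.headers,
      body,
    });
  };
}
```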
Further raw findings
Stopped machines cost $0.15 per GB per month, and my machine with Chrome is 1GB.
Destroyed machines cost nothing, and starting a stopped machine with Chromium seems to take ±20 seconds.
To create a highly performant browser automation service, it'd be best to use "performance" VMs. The number of CPUs doesn't matter, so 1 CPU should be fine. The amount of memory does matter, so 8GB (the highest for 1 CPU) should be fastest.
The above setup costs about $0.0846 per hour, but obviously the highest cost will come from LLMs. You could argue for running multiple tabs to make the VM more cost-effective, but that wouldn't reduce LLM cost, which is significantly higher, so this optimisation doesn't seem worth it. LLMs currently cost anywhere between $0.10 and $100 per million tokens (a 1000x difference).
Let's assume:
- A browser action requires 4000 tokens on average and a bot performs 720 actions an hour, yielding 2.88M tokens an hour
- We use Llama 2 70B (4096 context length) at $0.70/$0.80 per million tokens on groq.com
Now our additional cost would be about $2.30 per hour extra for the bot to run (2.88M tokens × ~$0.80 per million).
Last but not least, we can expect to pay extra for residential IPs. These can easily cost anywhere between $0.30 and $10 per GB. Assuming a single bot makes 720 requests per hour worth 1.5MB each, that comes down to 1080MB/hour.
Let's assume for now we can get the proxies for $1 per GB; that makes this our final cost per hour:
- VM: $0.09 per hour
- LLM: $2.30 per hour
- Proxy: $1 per hour
Our AI bot will already cost $3.39 per hour, or about half a cent ($0.005) per pageload at 720 pageloads per hour.
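Spelled out as code, the back-of-the-envelope math looks like this (numbers rounded as above):

```ts
// Rough cost model, mirroring the figures in the text.
const vmPerHour = 0.09;                 // performance VM, 1 CPU / 8GB
const llmPerHour = 2.88 * 0.8;          // 2.88M tokens/hour × ~$0.80 per M ≈ $2.30
const proxyPerHour = 1.0;               // ~1.08GB/hour at $1 per GB, rounded to $1

const totalPerHour = vmPerHour + llmPerHour + proxyPerHour; // ≈ $3.39
const perPageload = totalPerHour / 720;                     // ≈ $0.005 per pageload
console.log({ totalPerHour, perPageload });
```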