// it doesn't seem to love piping or redirecting output without this, even
// with the newlines above
fflush(stdout);
Ah, the full buffering mode. I believe it can be fixed by calling
setvbuf(stdout, NULL, _IOLBF, BUFSIZ);
once at the start.
If you want to make it actually decently safe, one approach would be to make a list of all the syscalls you critically need after you have loaded all the content in memory (strace can help), then write a seccomp filter to block all the others. Since you don’t need filesystem interaction or pretty much anything except socket I/O, your syscall allowlist can be pretty short. This will ensure that even if an attacker manages to exploit a bug (like a UAF) they’ll be dropped into a sandbox with very little useful functionality.
Or (if on OpenBSD) the pledge and unveil syscalls. Pretty similar effect, but much easier.
> RFC 9112 is a fantastic document that details the exact format of HTTP 1.1 requests, how servers should respond to those requests ...
> This server follows almost none of that.
This made me chuckle :-)
I've got a similar one, but with HTTP/1.0 and partial 1.1 support, multi-threaded, etc., in C
https://GitHub.com/lionkor/http
Here's one I wrote 25 years ago that was actually used in production for about a decade. For reasons, it ran on a server with 128MB of RAM and served a web/JS chat server for a large number of schools in England.
http://git.annexia.org/?p=rws.git;a=tree
Noice!
You are lucky that all of your sample files have dots in their names. (-:
I don't understand this, could you explain?
Around line 663 there's a call to strrchr(), checking for a period in the filename. Then immediately after that, there's a strlen() that uses the result.
Which is fine, unless the first call returns NULL because there was no period in the name, and then the program will crash.
Much has been said about Daniel J. Bernstein eschewing the Standard C library in publicfile and other software. But Bernstein's str_rchr() function was designed expressly to avoid this well-known gotcha of the Standard C string functions.
Here's str_rchr() which uses the offset of the terminating NUL as the returned sentinel value:
* https://github.com/jdebp/djbwares/blob/trunk/source/str_rchr...
And here's it being used (by publicfile's httpd and indeed other programs) to find the basename's extension in order to infer a content type:
* https://github.com/jdebp/djbwares/blob/trunk/source/filetype...
The extension is always a non-NULL string, that can always be passed to str_equal(). It is just sometimes a zero-length string.
It's possible, but a bit clunky, to achieve the same effect with two successive calls to Standard C/C++ strrchr(), or strchr(), the second being:
Here's me doing that in my own code:
* https://github.com/jdebp/nosh/blob/c8d635c284b41b483067d5f58...
One can get very lost in the weeds on the comparative merits on different instruction architectures of compiler intrinsics, explicit loop unrolling, whole program optimization, and whatnot. (-:
Oof, thanks.
On the whole, it actually almost implements the minimally required amount of HTTP/1.1. I would suggest adding support for HEAD requests: it's just a single flag that you need to set in try_parse_request_path() and check in generate_response(). Also, probably check that the request path is followed by "HTTP/1." before sending the response? And I'd really recommend finishing reading out all of the request from the socket (that is, until you've seen "\r\n\r\n"), or you may run into the problem of your clients not being sent the complete response [0].
But other than that, yeah, it is an HTTP server. The HTTP protocol is decently well thought out so that you can be oblivious of most of the features you don't want to support.
[0] https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-... — the tl;dr is that if you do close() on a socket that still has the data from the client you haven't recv()d, the client will be sent an RST.
Ah yep, I read about the TCP RST problem in one of the RFC docs, then promptly forgot about it and never implemented anything to avoid it. Thank you for the detailed notes.
Good to see more tiny / small HTTP servers. I'm not a fan of sticking Nginx in a container which may be bigger than the assets it's serving. A statically compiled httpd from busybox has been great for this reason, but it's good to see more options.
I saw the title, and this is everything I have ever hoped for.
Reminds me of Jef Poskanzer's micro_httpd: https://acme.com/software/micro_httpd/
Consider it broke. You are getting hugged to death by HN. Throw Cloudflare in front.
Easiest way to make it safe is
1) Run it in a container
2) Isolate it through a reverse proxy, probably nginx
This doesn't make it safe. It can still be exploited and used to join a botnet, as a proxy, to mine cryptocurrency, to spy on requests or redirect users to malicious websites or phish them, to host malware...
3) Deploy on a cloud provider’s managed Kubernetes behind a WAF. Now it’s web scale!
This should be a rite of passage: Read a sizeable RFC and make a passable implementation.
Should be back up now with a very temporary workaround in place.
I would expect GitHub page. The server seems down
It had a link to the GitHub page while it was still up.
https://github.com/GSGBen/unsafehttp
Doesn't seem to be up =\
Found the issue - a use after free in send_response() if I close the session early due to an error. Was continuing to the next bit. Put a temp fix in place, will push a proper one later.
Still seems to have an issue, but no output before the crash. Will have to do some more debugging. Thanks for the test HN!
Source is here btw: https://github.com/GSGBen/unsafehttp/blob/main/src/main.c
hotfixing httpd UAFs is peak HN spirit :)
Whoops, should be back up now. I'll have to check logs later to see why it went down.
You're going to need a bigger host to support HN traffic :)
I wish submitters would try using .onion sites for small static pages, for example as an alternative URL
Fewer source IPs
What is it about HN that overwhelms small servers like this? It was a small static page so I wouldn't think it'd be that much load on the server itself, even for an OrangePi like this one.
Too many simultaneous connections for his router maybe? Or too much bandwidth for his internet connection?
If they are behind a NAT / stateful firewall, there are only so many connections it will handle at once. I think OpenWrt has something like a 16K maximum by default, for example. So even for fewer than 16K requests from different users/IPs (each entry is kept for about a minute, I think), it will quickly go down, I guess. :)
cat /proc/sys/net/netfilter/nf_conntrack_max
Should give some details.
Do you know if using the DMZ feature on most routers instead of port forwarding would get around this limit, or if there's any other way?
Are you near Sydney? I noted a possible link to the Central Coast. I will contribute a smaller device if you're game to host it.
PS. You may be unaware that your shortened domain name 'benren' from your whois-available real name means "stupid person" in Mandarin. Only noted because there is a company registered with the same name since 1999. On the off chance it's yours, probably not the best marketing in a global world. Just throwing it out there.
It could be self-deprecating! Plus, I would more readily read it as 本人 (this person/me/myself) - than as 笨人 (stupid person).
Also, Pinyin is more susceptible to accidental interpretations than most writing systems due to ambiguity and tonality. For example, “mana” can be parsed into 32 different syllable-tone combinations (man/a or ma/na times 4x4 tone combinations for each syllable), and while most aren’t meaningful, that still gives you a ton of potential words to match against.
Almost everything is going to sound like something else in some other language, I don't know that there's much you can do about that. On the plus side, maybe the silly association will make the name stick in people's heads!
Nice effort, but this isn't interesting at all. You skipped the most interesting part: parsing HTTP. This is Beej's networking tutorial plus writing a file to a socket.
Harsh? Maybe, but you're posting this to a site with some of the most talented developers on the planet. Real talk, sorry.
I swear that the only thing that draws people to this industry is the desire to escape their home village. It certainly isn't the quality of conversation with like-minded tinkerers. It's just losers like you who think a big paycheck for playing with Jira means you're the smartest boy in the world. God help us.
Shitty reply and this critique isn't helpful at all. You assumed the most interesting part; the thing you personally want.
Harsh? Maybe, but you're posting this to a site with some of the most jaded developers on the planet. Not sorry.
Even simple implementations serve as valuable learning exercises, and proper HTTP parsing could be the natural next step in the author's learning journey.
Obviously you aren't one of them with an attitude like that.
Let's see throwaway1492's code
nah this is pretty cool
Parsing HTTP is entirely unnecessary. That's the web client's job.
Do you mean parsing HTML? HTTP is the protocol they use to communicate, so both client and server must speak it. Or did I misunderstand you?