
Download Insomnia 2023.5.8 and disable automatic updates.

Though Insomnia doesn't work with streaming responses at all, which is a bit of a non-starter in the age of AI. Anyone know a good streaming HTTP UI?



Does this describe the "streaming HTTP" to which you refer?

https://gist.github.com/CMCDragonkai/6bfade6431e9ffb7fe88

If not, is there an example of "streaming HTTP" you could provide that illustrates the limitation?


Yes, that is what I mean. All the modern "ChatGPT"-style APIs use this, so if you're building anything that invokes them you can choose between buffering the entire response and modifying/relaying it once complete, or building up a toolchain of streaming-capable utilities.

Of all the HTTP-client applications I tested (curl, Postman, Insomnia, Bruno), somewhat hilariously curl has the best support. It will output any 'Transfer-Encoding: chunked' response with line-by-line buffering, whereas Postman only supports responses precisely following the `Content-Type: text/event-stream` format (strictly less powerful than curl, as this format requires newlines between events, plus a bunch of overhead on top of that). The others buffer the entire response before displaying anything.

The `Content-Type: text/event-stream` format is fine, but I personally prefer to plainly chunk the responses and let the client choose whether to buffer them into memory and operate on the entire value at once, or interpret them live as the chunks come in. With tools like gjp-4-gpt (Gradual JSON Parser 4 GPT's) you can even interpret partial JSON objects as they come in and display complex/nested data structures in a generative manner.
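For reference, the `text/event-stream` framing Postman expects separates events with blank lines and prefixes payload lines with "data:". A rough sketch of extracting the payloads (not production-grade SSE parsing — it ignores `event:`, `id:`, and retry fields):

```javascript
// Extract the data payloads from a text/event-stream buffer.
// Events are separated by blank lines; each data line starts with "data:".
function parseSSE(buffer) {
  const events = [];
  for (const block of buffer.split("\n\n")) {
    const dataLines = block
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).trimStart());
    if (dataLines.length > 0) events.push(dataLines.join("\n"));
  }
  return events;
}
```

This is the overhead referred to above: every event costs a "data:" prefix and a blank-line separator, on top of the chunked framing underneath.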


What if, instead of JSON, i.e., strings of unknown length, one used something more like netstrings?

https://cr.yp.to/proto/netstrings.txt

Personally I use a lame but effective filter: a small C program, an 85.9 KiB static binary, that removes the chunk sizes so the response is ready for use by other programs, e.g., in a pipe. The buffer is set at 8 KiB.
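Roughly, what such a filter does (a simplified JavaScript sketch of the same idea, operating on a string and ignoring chunk extensions and trailers, which the real chunked format allows):

```javascript
// Strip Transfer-Encoding: chunked framing from a response body so the
// plain payload can be piped to other programs. Chunked format:
// hex size, CRLF, data, CRLF, repeated; a zero-size chunk terminates.
function dechunk(raw) {
  let out = "";
  let i = 0;
  while (i < raw.length) {
    const lineEnd = raw.indexOf("\r\n", i);
    if (lineEnd === -1) break;
    const size = parseInt(raw.slice(i, lineEnd), 16); // chunk size in hex
    if (!size) break; // zero-length chunk ends the body
    out += raw.slice(lineEnd + 2, lineEnd + 2 + size);
    i = lineEnd + 2 + size + 2; // skip data plus trailing CRLF
  }
  return out;
}
```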

Is there a way to experiment with one of these streaming JSON GPT APIs non-interactively by just sending an HTTP request, without the need for a third-party program, an account, a JavaScript engine, etc.?


The unknown length isn't much of a problem for me in practice: GPTs are slow enough that getting a large chunk is almost impossible. I like the idea of the C filter, but in the end you're just piping the data to the program, so why add the middle step? Is it to protect against too-large chunks in some way?

I don't know of a public API that returns JSON slowly, but you could simulate one by taking a JSON string, splitting it into 3-5 char chunks, and sending each of those in a `Transfer-Encoding: chunked` response at ~100 ms intervals.

Actually, now that I look at the underlying mechanism behind `Transfer-Encoding: chunked`, it's already basically the same as netstrings. What I'm referring to is the (variable-length) contents of the netstring/chunk being sequential slices of a JSON object.
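Side by side, the two framings really are near-twins (netstrings per the cr.yp.to spec; chunked framing per the HTTP spec, minus chunk extensions and trailers):

```javascript
// Chunked transfer coding frames data as: hex length, CRLF, data, CRLF.
function chunkFrame(s) {
  return s.length.toString(16) + "\r\n" + s + "\r\n";
}

// A netstring frames data as: decimal length, colon, data, comma.
function netstring(s) {
  return s.length + ":" + s + ",";
}
```

The only real differences are hex vs. decimal lengths, the delimiter characters, and chunked encoding's zero-length terminator.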


"I like the idea of the C filter, but in the end you're just piping the data to the program, why add the middle step?"

Only for the flexibility to use more programs. Otherwise every program I use to process HTTP responses needs to be able to accommodate chunked transfer encoding. Plus, only a minority of sites send chunked responses. Instead, have one program that does one thing: remove chunked transfer encoding.

IIUC, what you want is uniform chunk sizes where you know the size before you send the request.

GPTs sound annoying if they are so slow that they only output a few characters every ~100 ms.


Ah I see, I'm working a bit further up the stack from you, so the JS runtime makes the transfer encoding of the response more or less irrelevant: for any response with any encoding, you can access `response.text()` to get the entire contents once they're ready, or do something like `for await (const chunk of response.body) { ... }` to work with the pieces as they roll in.
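Roughly, the incremental route looks like this (a sketch assuming a runtime with WHATWG streams, e.g. Node 18+; `collect` is a made-up helper name):

```javascript
// Consume a response body stream piece by piece instead of buffering
// the whole thing, decoding bytes to text as chunks arrive.
async function collect(stream) {
  const decoder = new TextDecoder();
  let out = "";
  for await (const chunk of stream) {
    // { stream: true } keeps multi-byte UTF-8 sequences intact across chunks
    out += decoder.decode(chunk, { stream: true });
  }
  return out + decoder.decode(); // flush any trailing partial sequence
}

// With fetch you would pass `response.body`:
//   const response = await fetch(url);
//   const text = await collect(response.body);
// versus the buffered route: `await response.text()`.
```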

> IIUC, what you want is uniform chunk sizes where you know the size before you send the request.

I don't think so... I don't really want anything! Just a GUI that displays existing HTTP chunked responses as they come.

> GPTs sound annoying if they are so slow that they only output a few characters every ~100ms..

That's perhaps an exaggeration, but in general the speed and "smartness" of the model are inversely correlated.


I'm not really a GUI person, nor do I use JS. I'm happy to see HTTP responses in textmode. I tried playing around with Postman and some other similar programs a while back in an attempt to understand how they could be superior to working with HTTP as text files. But I failed to see any advantage for what I need to do. One problem with GUIs/TUIs IMHO is that generally few people write them or modify them. And so users must look around to try to find one written by someone else that they like. Then they are stuck with it, and whatever "options" its author chooses to provide. Whereas with textmode programs, I can easily control all the aesthetics myself. Even when using other people's software, if there is something that annoys me, I can usually edit the source and change it.

Best of luck. Hope you can find the right program for viewing HTTP.


Why not do the same with Postman? I still use it without the login.


From where do you download old Postman versions? More importantly, which was the last version recommended?


The "Lite API" mode of current Postman is actually decent, it's the only GUI client I know that supports streaming responses, but you have to use `Content-Type: text/event-stream`. You can't save/share queries, but the local history is decent enough for local development. I prefer it to the Insomnia mutable fixed length saved query implementation for "hacking around" with many different APIs.



