
Highlight stream_request_body in documentation #2952


Description

@CLKBlu3

I am pretty sure this has been addressed somewhere else but I can't find a solution, so I am asking for help here.

I have a POST handler to upload files to my server, as follows:

class uploadHandler(BaseHandler):
    def post(self):
        if checkForwardedHeader(self.request.headers):
            posts.upload(self.request.files['file'][0])
        else:
            raise tornado.web.HTTPError(403)  # Forbidden

(BaseHandler is a custom RequestHandler that enables POST, GET and OPTIONS, nothing else, so it is not relevant here.)
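For context, a minimal sketch of what such a BaseHandler might look like; the exact implementation isn't shown in this issue, so the method allow-list and the OPTIONS reply below are placeholders only:

import tornado.web


class BaseHandler(tornado.web.RequestHandler):
    # Placeholder sketch: the issue only states that POST, GET and OPTIONS are enabled.
    SUPPORTED_METHODS = ("GET", "POST", "OPTIONS")

    def options(self, *args, **kwargs):
        # Answer pre-flight/OPTIONS requests with an empty 204 response.
        self.set_status(204)
        self.finish()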

def upload(filenames):
    try:
        path = xxxxxxxxx  # custom path built from the filenames['filename'] value
        filebody = filenames['body']

        output_file = open(path, 'wb')
        output_file.write(filebody)

        # close file
        output_file.close()
    except Exception as e:
        ..............  # capturing possible errors

This works completely fine with small files (< 100 MB), zero problems.

However, for larger files it does not work, even though Nginx's max body size is set to 800M.

For large files, uploads return:

[I 201110 12:37:48 http1connection:289] Malformed HTTP message from 127.0.0.1: Content-Length too long

with Nginx capturing the following:

[error] 838512#838512: *3332387 readv() failed (104: Connection reset by peer) while reading upstream, client: 127.0.0.1, server: , request: "POST /upload HTTP/1.0", upstream: "http://127.0.0.1:8888/upload", host: "xxxxxx.com", referrer: "https://xxxxx.com"

So I decided to run the application with a custom max_buffer_size bigger than 100 MB to see if that was the issue, since the docs (https://www.tornadoweb.org/en/stable/httpclient.html?highlight=upload) say the default buffer size is 100 MB:

http_server = tornado.httpserver.HTTPServer(application, max_buffer_size=800000000) # 800MB buffer
http_server.listen(options.port)
tornado.ioloop.IOLoop.current().start()

Upon doing this, the error message changed to 502 Bad Gateway, and the web server immediately crashed with a "Killed" message when uploading a file.
Nginx logs, on the other hand, show the following error:

2020/11/10 12:34:41 [error] 838512#838512: *3309856 upstream prematurely closed connection while reading response header from upstream, client: 127.0.0.1, server: , request: "POST /upload HTTP/1.0", upstream: "http://127.0.0.1:8888/upload", host: "xxxxx.com", referrer: "https://xxxxx.com"

Everything works fine for uploads < 100 MB, but fails even at 101 MB :)

How should I actually approach big file uploads with Tornado?

Activity

ploxiln (Contributor) commented on Nov 10, 2020

CLKBlu3 (Author) commented on Nov 11, 2020

I did manage to fix it, thanks for the hint.
Anyway, I think the docs/demo should explain and exemplify the "large files" scenario, as I see it's a recurring issue.

What I did is the following; if you want to add it as an example, it's based on a Stack Overflow example, so you should be free to do so:

@tornado.web.stream_request_body
class uploadHandler(BaseHandler):
    def initialize(self):
        self.meta = dict()
        self.receiver = self.get_receiver()

    def data_received(self, chunk):
        self.receiver(chunk)

    def get_receiver(self):
        index = 0
        SEPARATE = b'\r\n'
        _UPLOADS_ = '/uploads'  # uploads folder

        def receiver(chunk):
            nonlocal index
            if index == 0:
                # First chunk: strip the multipart boundary and part headers.
                index += 1
                split_chunk = chunk.split(SEPARATE)
                self.meta['boundary'] = SEPARATE + split_chunk[0] + b'--' + SEPARATE
                self.meta['header'] = SEPARATE.join(split_chunk[0:3])
                self.meta['header'] += SEPARATE * 2
                self.meta['filename'] = split_chunk[1].split(b'=')[-1].replace(b'"', b'').decode()
                chunk = chunk[len(self.meta['header']):]  # actual stream
                path = _UPLOADS_ + '/' + self.meta['filename'].lower()

                self.fp = open(path, 'wb')
                self.fp.write(chunk)
            else:
                self.fp.write(chunk)
        return receiver

    def post(self, *args, **kwargs):
        # The written file still ends with the closing multipart boundary,
        # so truncate it to the real payload length.
        self.meta['content_length'] = (int(self.request.headers.get('Content-Length'))
                                       - len(self.meta['header'])
                                       - len(self.meta['boundary']))
        self.fp.seek(self.meta['content_length'], 0)
        self.fp.truncate()
        self.fp.close()
        self.finish('OK')
mivade (Contributor) commented on Nov 11, 2020

There is already a file upload demo here. It is also mentioned in passing in the documentation, though I think it could be made more prominent (somewhere in the User's Guide section, for example).
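For reference, a minimal sketch in the spirit of that demo, assuming a streaming handler that writes the raw body to a temporary file; the handler name, target path and size limit below are placeholders, not the demo's actual code:

import os

import tornado.ioloop
import tornado.web

MAX_STREAMED_SIZE = 1024 * 1024 * 1024  # 1 GiB, arbitrary for this sketch


@tornado.web.stream_request_body
class StreamingUploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Allow bodies larger than the server-wide default for this handler only.
        self.request.connection.set_max_body_size(MAX_STREAMED_SIZE)
        # Write incoming bytes straight to disk instead of buffering them in memory.
        self.temp_file = open(os.path.join('/tmp', 'upload.part'), 'wb')

    def data_received(self, chunk):
        # Note: for multipart/form-data uploads this receives the raw body,
        # including the multipart framing, exactly as it arrives.
        self.temp_file.write(chunk)

    def post(self):
        self.temp_file.close()
        self.finish('OK')


if __name__ == '__main__':
    app = tornado.web.Application([(r'/upload', StreamingUploadHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()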

The title was changed from "Upload approaching for large files" to "Highlight stream_request_body in documentation" on Nov 19, 2020.
bdarnell (Member) commented on Nov 19, 2020

Yeah, this isn't documented very well unless you read through the full tornado.web module docs to see what's there. It does deserve to be called out somewhere more prominent, such as the user's guide. The max_buffer_size parameter also needs to be documented, and that documentation should point out that you almost never want to increase it - you may want to decrease it to control memory usage more tightly, but if you need more you really need stream_request_body instead.
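To illustrate the distinction that comment draws, a sketch of the two server-side knobs involved; the values are arbitrary, and only the max_buffer_size / max_body_size keyword arguments themselves are Tornado's real parameters:

import tornado.httpserver
import tornado.ioloop
import tornado.web

application = tornado.web.Application([
    # handlers for the app; a streaming handler such as the one above is what
    # actually makes >100 MB uploads practical
])

http_server = tornado.httpserver.HTTPServer(
    application,
    # max_buffer_size bounds how much of a request may sit in memory at once;
    # lowering it tightens memory usage, raising it just buffers huge bodies in RAM.
    max_buffer_size=10 * 1024 * 1024,
    # max_body_size bounds the accepted Content-Length (the "Content-Length too
    # long" rejection). With stream_request_body on the handler it can be much
    # larger than max_buffer_size, because chunks are passed to data_received
    # as they arrive instead of being accumulated.
    max_body_size=1024 * 1024 * 1024,
)
http_server.listen(8888)
tornado.ioloop.IOLoop.current().start()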
