Performing a Multipart Upload

Multipart upload is recommended for uploading large files. It is applicable to scenarios such as the following:

  • Files to be uploaded are larger than 100 MB.
  • The network is poor and connections to the OBS server are frequently interrupted.
  • The size of the file to be uploaded is not known in advance.

Multipart upload consists of three phases:

  1. Initialize a multipart upload (ObsClient.initiateMultipartUpload).
  2. Upload parts one by one or concurrently (ObsClient.uploadPart).
  3. Combine parts (ObsClient.completeMultipartUpload) or abort the multipart upload (ObsClient.abortMultipartUpload).
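To make the three phases concrete without contacting OBS, the following is a minimal in-memory sketch. The class InMemoryMultipartStore and its methods are hypothetical stand-ins, not part of the OBS SDK. They only mimic the semantics described in this section: a unique upload ID, parts identified by part number, a later upload overwriting an earlier part with the same number, combination in part-number order, and abort discarding the parts.

```python
import uuid


class InMemoryMultipartStore:
    """Hypothetical in-memory model of the multipart upload lifecycle (not the OBS SDK)."""

    def __init__(self):
        self.objects = {}   # object key -> combined bytes
        self._uploads = {}  # upload ID -> (object key, {part number: bytes})

    def initiate(self, key):
        # Phase 1: create a globally unique upload ID for this upload.
        upload_id = uuid.uuid4().hex
        self._uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id, part_number, data):
        # Phase 2: store a part; reusing a part number overwrites the earlier part.
        if not 1 <= part_number <= 10000:
            raise ValueError('part number must be in 1..10000')
        _, parts = self._uploads[upload_id]
        parts[part_number] = data

    def complete(self, upload_id):
        # Phase 3a: combine the parts in ascending part-number order.
        key, parts = self._uploads.pop(upload_id)
        self.objects[key] = b''.join(parts[n] for n in sorted(parts))
        return key

    def abort(self, upload_id):
        # Phase 3b: discard the upload ID and every uploaded part.
        self._uploads.pop(upload_id)


store = InMemoryMultipartStore()
uid = store.initiate('objectkey')
store.upload_part(uid, 2, b'world')
store.upload_part(uid, 1, b'hullo ')
store.upload_part(uid, 1, b'hello ')  # overwrites part 1
store.complete(uid)
print(store.objects['objectkey'])  # b'hello world'
```

Parts are accepted in any order; only the part numbers decide where each part ends up in the combined object.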

Initializing a Multipart Upload

Before uploading parts, you must notify OBS to initialize a multipart upload. This operation returns an upload ID, a globally unique identifier created by the OBS server for the multipart upload. You use this upload ID in related operations, such as aborting the multipart upload, listing multipart uploads, and listing uploaded parts.

You can call ObsClient.initiateMultipartUpload to initialize a multipart upload.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

metadata = {'property1' : 'property-value1', 'property2' : 'property-value2'}
resp = obsClient.initiateMultipartUpload('bucketname', 'objectkey', contentType='text/plain', metadata=metadata)

if resp.status < 300:
    print('requestId:', resp.requestId)
    print('uploadId:', resp.body.uploadId)
else:    
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
NOTE:
  • When initializing a multipart upload, in addition to the bucket name and object name, you can use the contentType parameter to set the MIME type of the object and the metadata parameter to set its custom metadata.
  • initiateMultipartUpload returns an upload ID (resp.body.uploadId), which is required in all subsequent operations on this multipart upload.

Uploading a Part

After initializing a multipart upload, you can upload parts by specifying the object name and upload ID. Each part has a part number ranging from 1 to 10000. Within one upload ID, part numbers are unique and determine the relative position of each part in the object. If you upload two parts with the same part number, the later upload overwrites the earlier one. Every part except the last must be between 100 KB and 5 GB; the last part can be between 0 and 5 GB. Parts can be uploaded in any order and from different processes or machines. OBS combines them into the object in part-number order.
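Given these rules, a caller typically splits a file into (part number, offset, size) triples before calling ObsClient.uploadPart. The helper plan_parts below is a hypothetical sketch, not an SDK function. It enforces the limits stated above: part numbers 1 to 10000, and 100 KB to 5 GB for every part except the last.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

MIN_PART_SIZE = 100 * KB  # applies to every part except the last
MAX_PART_SIZE = 5 * GB
MAX_PARTS = 10000


def plan_parts(file_length, part_size):
    """Return (part_number, offset, size) tuples covering a file of file_length bytes."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError('part size must be between 100 KB and 5 GB')
    # Ceiling division; an empty file still gets one (empty) last part.
    part_count = max(1, -(-file_length // part_size))
    if part_count > MAX_PARTS:
        raise ValueError('too many parts; increase part_size')
    plan = []
    for i in range(part_count):
        offset = i * part_size
        # Only the last part may be shorter than part_size.
        size = min(part_size, file_length - offset)
        plan.append((i + 1, offset, size))
    return plan


print(plan_parts(12 * MB, 5 * MB))
# [(1, 0, 5242880), (2, 5242880, 5242880), (3, 10485760, 2097152)]
```

Each triple maps directly onto the partNumber, offset and partSize arguments used in the sample code in this section.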

You can call ObsClient.uploadPart to upload a part.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)
uploadId = 'upload id from initiateMultipartUpload'
# File to be uploaded
file_path = 'localfile'

# Part number of the first part
partNumber = 1
# Offset of the first part
offset = 0 
# Part size of the first part
partSize = 5 * 1024 * 1024 
# Upload the first part.
resp = obsClient.uploadPart('bucketname', 'objectkey', partNumber=partNumber, uploadId=uploadId, 
                            offset=offset, partSize=partSize, object=file_path, isFile=True)

if resp.status < 300:
    print('requestId:', resp.requestId)
    # Obtain the ETag after the upload.
    print('etag:', resp.body.etag)
else:    
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
    
# Part number of the second part
partNumber = 2
# Offset of the second part
offset = 5 * 1024 * 1024  
# Part size of the second part
partSize = 5 * 1024 * 1024
# Upload the second part.
resp = obsClient.uploadPart('bucketname', 'objectkey', partNumber=partNumber, uploadId=uploadId, 
                            offset=offset, partSize=partSize, object=file_path, isFile=True, isAttachMd5=True)

if resp.status < 300:
    print('requestId:', resp.requestId)
    # Obtain the ETag after the upload.
    print('etag:', resp.body.etag)
else:    
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
NOTE:
  • Every part except the last must be at least 100 KB. Part sizes are not verified at upload time, because OBS cannot determine which part is the last one until the parts are combined.
  • OBS returns the ETag (the MD5 value) of each part it receives.
  • To ensure data integrity, set isAttachMd5 to True (default: False). The SDK then calculates the MD5 value of each part and adds it to the Content-MD5 request header, and the OBS server compares that value with the MD5 value it calculates from the received data.
  • You can use the md5 parameter to set the MD5 value of the uploaded data directly. If md5 is set, isAttachMd5 is ignored.
  • Part numbers range from 1 to 10000. If a part number is outside this range, OBS returns error 400 Bad Request.
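The Content-MD5 header carries the Base64 encoding of the binary 128-bit MD5 digest (RFC 1864), whereas ETags are typically shown as hexadecimal MD5 strings. If you compute the value yourself for the md5 parameter, a sketch like the following reads only the bytes of one part. The helper name part_md5 is hypothetical, and the sketch assumes that the md5 parameter expects the same Base64 form as the Content-MD5 header.

```python
import base64
import hashlib
import os
import tempfile


def part_md5(path, offset, size, chunk_size=1024 * 1024):
    """Return (hex_md5, base64_md5) for `size` bytes of `path` starting at `offset`."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        f.seek(offset)
        remaining = size
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached end of file before `size` bytes
            digest.update(chunk)
            remaining -= len(chunk)
    # Hex form resembles an ETag; Base64 of the raw digest is the Content-MD5 form.
    return digest.hexdigest(), base64.b64encode(digest.digest()).decode('ascii')


# Usage: hash bytes 2..6 (b'23456') of a small temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'0123456789')
hex_md5, b64_md5 = part_md5(tmp.name, 2, 5)
os.remove(tmp.name)
print(hex_md5, b64_md5)
```

Reading in chunks keeps memory use bounded even for 5 GB parts.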

Combining Parts

After all parts are uploaded, call the API for combining parts to generate the object. In this request, you send OBS the part numbers and ETags of all valid parts. OBS verifies each part in turn and, after all parts pass verification, combines them into the final object.

You can call ObsClient.completeMultipartUpload to combine parts.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)
from com.obs.models.complete_multipart_upload_request import CompleteMultipartUploadRequest, CompletePart
 
part1 = CompletePart(partNum=1, etag='etag1')
part2 = CompletePart(partNum=2, etag='etag2')
 
completeMultipartUploadRequest = CompleteMultipartUploadRequest()
completeMultipartUploadRequest.parts = [part1, part2]

resp = obsClient.completeMultipartUpload('bucketname', 'objectkey', 'uploadid', completeMultipartUploadRequest)
   
if resp.status < 300:
    print('requestId:', resp.requestId)
else:
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
NOTE:
  • In the preceding code, CompleteMultipartUploadRequest.parts is the list of part numbers and ETags of the uploaded parts. The list must be in ascending order of part number.
  • Part numbers do not need to be consecutive.
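The parts list sent to completeMultipartUpload is ordered by part number. When ETags are collected in a dictionary, as in concurrent uploads, a small helper can validate and sort them before they are wrapped in CompletePart objects. build_complete_parts is hypothetical and returns plain (part number, ETag) pairs so that it runs without the SDK.

```python
def build_complete_parts(part_etags):
    """Turn {part_number: etag} into an ascending list of (part_number, etag) pairs."""
    for number, etag in part_etags.items():
        if not 1 <= number <= 10000:
            raise ValueError('invalid part number: %d' % number)
        if not etag:
            raise ValueError('missing ETag for part %d' % number)
    # Ascending order by part number; gaps between numbers are allowed.
    return sorted(part_etags.items())


parts = build_complete_parts({3: 'etag3', 1: 'etag1', 7: 'etag7'})
print(parts)  # [(1, 'etag1'), (3, 'etag3'), (7, 'etag7')]
```

Each pair can then become CompletePart(partNum=number, etag=etag), and the resulting list can be passed to CompleteMultipartUploadRequest.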

Concurrently Uploading Parts

Multipart upload is mainly used to upload large files or to upload over a poor network. The following sample code shows how to upload the parts of a multipart upload concurrently:

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

import platform, os, threading, multiprocessing
IS_WINDOWS = platform.system() == 'Windows' or os.name == 'nt'

bucketName = 'bucketname'
objectKey = 'objectkey'
filePath = 'localfile'

def doUploadPart(lock, partETags, bucketName, objectKey, partNumber, uploadId, filePath, partSize, offset):
    resp = obsClient.uploadPart(bucketName, objectKey, partNumber, uploadId, filePath, isFile=True, partSize=partSize, offset=offset)
    if resp.status < 300:
        lock.acquire()
        try:
            partETags[partNumber] = resp.body.etag
            print('\tPart#' + str(partNumber) + ' done\n')
        finally:
            lock.release()
    else:
        # Report the failure; no ETag is recorded for this part.
        print('\tPart#' + str(partNumber) + ' failed:', resp.errorCode, resp.errorMessage)

if __name__ == '__main__':
    # Initialize a multipart upload.
    resp = obsClient.initiateMultipartUpload(bucketName, objectKey)
    if resp.status >= 300:
        raise SystemExit('initiateMultipartUpload failed: ' + str(resp.errorCode))
    uploadId = resp.body.uploadId
    print(uploadId)
    
    # Set the part size to 100 MB.
    partSize = 100 * 1024 * 1024
    
    fileLength = os.path.getsize(filePath)
    # Calculate the number of parts to be uploaded.
    partCount = int(fileLength / partSize) if (fileLength % partSize == 0) else int(fileLength / partSize) + 1
    
    lock = threading.Lock() if IS_WINDOWS else multiprocessing.Lock()
    proc = threading.Thread if IS_WINDOWS else multiprocessing.Process
    partETags = dict() if IS_WINDOWS else multiprocessing.Manager().dict()
    processes = []
    
    #  Start uploading parts concurrently.
    for i in range(partCount):
        # Start point of the part in the file
        offset = i * partSize
        # Part size
        currPartSize = (fileLength - offset) if i + 1 == partCount else partSize
        # Part number
        partNumber = i + 1
        p = proc(target=doUploadPart, args=(lock, partETags, bucketName, objectKey, partNumber, uploadId, filePath, currPartSize, offset))
        p.daemon = True
        processes.append(p)
    
    for p in processes:
        p.start()
    
    # Wait until the upload is complete.
    for p in processes:
        p.join()
    
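    # Combine parts only if every part was uploaded successfully.
    from com.obs.models.complete_multipart_upload_request import CompletePart, CompleteMultipartUploadRequest
    if len(partETags) != partCount:
        # Some parts failed: abort so that OBS deletes the parts already uploaded.
        print('Only', len(partETags), 'of', partCount, 'parts were uploaded; aborting.')
        obsClient.abortMultipartUpload(bucketName, objectKey, uploadId)
    else:
        partETags = sorted(partETags.items(), key=lambda d : d[0])
        parts = [CompletePart(partNum=key, etag=value) for key, value in partETags]
        resp = obsClient.completeMultipartUpload(bucketName, objectKey, uploadId, CompleteMultipartUploadRequest(parts))
        if resp.status < 300:
            print('requestId:', resp.requestId)
        else:
            print('errorCode:', resp.errorCode)
            print('errorMessage:', resp.errorMessage)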
NOTE:
  • When uploading a large file in multipart mode, use the offset and partSize parameters to specify the start position and the size of each part within the file.
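The sample above uses threads on Windows and processes elsewhere. As an alternative, the following sketch uses the standard library's concurrent.futures.ThreadPoolExecutor, which caps the number of concurrent uploads with max_workers and re-raises a failed part's exception in the caller. This is not the SDK's own approach: upload_all_parts and upload_one are hypothetical names, and upload_one is assumed to call ObsClient.uploadPart with offset, partSize and isFile=True, returning resp.body.etag or raising when resp.status is 300 or higher. Because uploads spend most of their time waiting on the network, threads are usually sufficient.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def upload_all_parts(plan, upload_one, max_workers=4):
    """Upload every (part_number, offset, size) triple in `plan` concurrently.

    `upload_one(part_number, offset, size)` is a hypothetical callable that
    uploads one part and returns its ETag, or raises on failure.
    Returns {part_number: etag}; if any part fails, its exception propagates.
    """
    etags = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(upload_one, number, offset, size): number
                   for number, offset, size in plan}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside upload_one.
            etags[futures[future]] = future.result()
    return etags


def fake_upload(part_number, offset, size):
    # Stand-in for a call to ObsClient.uploadPart; returns a made-up ETag.
    return 'etag-%d-%d' % (part_number, size)


plan = [(1, 0, 5), (2, 5, 5), (3, 10, 2)]
etags = upload_all_parts(plan, fake_upload)
print(sorted(etags.items()))  # [(1, 'etag-1-5'), (2, 'etag-2-5'), (3, 'etag-3-2')]
```

Results are collected in the calling thread, so no lock is needed around the dictionary.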

Aborting a Multipart Upload

After a multipart upload is aborted, its upload ID can no longer be used for any operation, and OBS deletes the parts already uploaded.

You can call ObsClient.abortMultipartUpload to abort a multipart upload.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

resp = obsClient.abortMultipartUpload('bucketname', 'objectkey', 'upload id from initiateMultipartUpload')

if resp.status < 300:
    print('requestId:', resp.requestId)
else:    
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
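A common pattern is to abort the multipart upload whenever initiation succeeds but a later step fails, so that OBS deletes the parts already uploaded. upload_or_abort is a hypothetical helper, sketched under the assumption that the client exposes initiateMultipartUpload, completeMultipartUpload and abortMultipartUpload with the argument order shown in this guide. FakeClient only records calls so that the control flow can be checked without OBS.

```python
from types import SimpleNamespace


def upload_or_abort(client, bucket, key, upload_parts):
    """Initiate, upload and combine; abort the upload if any step after initiation fails.

    `upload_parts(upload_id)` is a hypothetical callable that uploads all parts
    and returns the CompleteMultipartUploadRequest to send.
    """
    resp = client.initiateMultipartUpload(bucket, key)
    if resp.status >= 300:
        raise RuntimeError('initiateMultipartUpload failed: %s' % resp.errorCode)
    upload_id = resp.body.uploadId
    try:
        request = upload_parts(upload_id)
        resp = client.completeMultipartUpload(bucket, key, upload_id, request)
        if resp.status >= 300:
            raise RuntimeError('completeMultipartUpload failed: %s' % resp.errorCode)
        return resp
    except Exception:
        # Abort so that OBS deletes the parts already uploaded, then re-raise.
        client.abortMultipartUpload(bucket, key, upload_id)
        raise


class FakeClient:
    """Hypothetical stand-in for ObsClient that only records which calls were made."""

    def __init__(self):
        self.calls = []

    def initiateMultipartUpload(self, bucket, key):
        self.calls.append('initiate')
        return SimpleNamespace(status=200, body=SimpleNamespace(uploadId='u1'))

    def completeMultipartUpload(self, bucket, key, upload_id, request):
        self.calls.append('complete')
        return SimpleNamespace(status=200, requestId='r1')

    def abortMultipartUpload(self, bucket, key, upload_id):
        self.calls.append('abort')
        return SimpleNamespace(status=204)


def failing_parts(upload_id):
    raise OSError('network connection lost')


client = FakeClient()
try:
    upload_or_abort(client, 'bucketname', 'objectkey', failing_parts)
except OSError as err:
    print('upload failed:', err)
print(client.calls)  # ['initiate', 'abort']
```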

Listing Uploaded Parts

You can call ObsClient.listParts to list successfully uploaded parts of a multipart upload.

The following table describes the parameters involved in this API.

| Parameter | Description |
|---|---|
| uploadId | Upload ID, which globally identifies a multipart upload. It is returned by initiateMultipartUpload. |
| maxParts | Maximum number of parts to list per page |
| partNumberMarker | Part number after which listing of uploaded parts begins. Only parts whose part numbers are larger than this value are listed. |
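To see how maxParts and partNumberMarker interact, the following local model pages through a set of part numbers. list_parts_page is hypothetical. It assumes, as the table states, that only parts numbered above the marker are returned, and it additionally assumes that the next marker is the last part number on a truncated page; with the SDK, the real value comes from body.nextPartNumberMarker.

```python
def list_parts_page(part_numbers, max_parts=1000, part_number_marker=None):
    """Simulate one listParts page over uploaded part numbers.

    Returns (page, is_truncated, next_part_number_marker).
    """
    ordered = sorted(part_numbers)
    if part_number_marker is not None:
        # Only parts whose numbers are larger than the marker are listed.
        ordered = [n for n in ordered if n > part_number_marker]
    page = ordered[:max_parts]
    is_truncated = len(ordered) > max_parts
    next_marker = page[-1] if is_truncated else None
    return page, is_truncated, next_marker


uploaded = [8, 1, 3, 2, 5]
print(list_parts_page(uploaded, max_parts=2))                        # ([1, 2], True, 2)
print(list_parts_page(uploaded, max_parts=2, part_number_marker=2))  # ([3, 5], True, 5)
print(list_parts_page(uploaded, max_parts=2, part_number_marker=5))  # ([8], False, None)
```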

  • Listing parts in simple mode
# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

# List uploaded parts. uploadId is obtained from initiateMultipartUpload.
resp = obsClient.listParts('bucketname', 'objectkey', uploadId='upload id from initiateMultipartUpload')

if resp.status < 300:
    print('requestId:', resp.requestId)
    index = 1    
    for part in resp.body.parts:        
        print('part [' + str(index) + ']')  
        # Part number, specified during the upload      
        print('partNumber:', part.partNumber)        
        # Time when the part was last uploaded
        print('lastModified:', part.lastModified)
        # ETag of the part        
        print('etag:', part.etag)    
        # Part size    
        print('size:', part.size)        
        index += 1
else:    
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
NOTE:
  • A single call lists information about at most 1000 parts. If the multipart upload contains more than 1000 parts, body.isTruncated is True in the returned result, indicating that not all parts were listed. In this case, use body.nextPartNumberMarker as the start position (partNumberMarker) of the next listing.
  • To obtain all parts of a specific upload ID, list them in paging mode.

  • Listing all parts

If a multipart upload has more than 1000 parts, you can use the following sample code to list all of them.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

index = 1
nextPartNumberMarker = None
while True: 
    # List uploaded parts. uploadId is obtained from initiateMultipartUpload.
    resp = obsClient.listParts('bucketname', 'objectkey', uploadId='upload id from initiateMultipartUpload', partNumberMarker=nextPartNumberMarker)
    if resp.status < 300:
        print('requestId:', resp.requestId)
        for part in resp.body.parts:        
            print('part [' + str(index) + ']')
            # Part number, specified during the upload      
            print('partNumber:', part.partNumber)        
            # Time when the part was last uploaded
            print('lastModified:', part.lastModified)
            # ETag of the part        
            print('etag:', part.etag)    
            # Part size    
            print('size:', part.size)        
            index += 1
        if not resp.body.isTruncated:
            break
        nextPartNumberMarker = resp.body.nextPartNumberMarker
    else:    
        print('errorCode:', resp.errorCode)
        print('errorMessage:', resp.errorMessage)
        break 
  • Listing all parts in paging mode

The preceding example is a special case of paging in which each page lists up to 1000 parts. The following sample code shows how to specify the number of parts listed per page.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

index = 1
nextPartNumberMarker = None
# Set the number of parts displayed per page to 100.
maxParts = 100
while True: 
    # List uploaded parts. uploadId is obtained from initiateMultipartUpload.
    resp = obsClient.listParts('bucketname', 'objectkey', uploadId='upload id from initiateMultipartUpload', partNumberMarker=nextPartNumberMarker, maxParts=maxParts)
    if resp.status < 300:
        print('requestId:', resp.requestId)
        for part in resp.body.parts:        
            print('part [' + str(index) + ']')
            # Part number, specified upon uploading      
            print('partNumber:', part.partNumber)        
            # Time when the part was last uploaded
            print('lastModified:', part.lastModified)
            # ETag of the part        
            print('etag:', part.etag)    
            # Part size    
            print('size:', part.size)        
            index += 1
        if not resp.body.isTruncated:
            break
        nextPartNumberMarker = resp.body.nextPartNumberMarker
    else:    
        print('errorCode:', resp.errorCode)
        print('errorMessage:', resp.errorMessage)
        break 
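The two loops above differ only in maxParts, so the paging logic can be wrapped once in a generator that callers simply iterate over. iter_all_parts is a hypothetical helper. It relies only on the response fields used in the samples (status, body.parts, body.isTruncated, body.nextPartNumberMarker, errorCode, errorMessage), and the canned pages below stand in for real OBS responses, with plain integers in place of part objects for brevity.

```python
from types import SimpleNamespace


def iter_all_parts(list_page, max_parts=100):
    """Yield every uploaded part, requesting one page of at most max_parts at a time.

    `list_page(part_number_marker, max_parts)` is a hypothetical callable that
    returns a response shaped like the one from ObsClient.listParts.
    """
    marker = None
    while True:
        resp = list_page(marker, max_parts)
        if resp.status >= 300:
            raise RuntimeError('listParts failed: %s %s' % (resp.errorCode, resp.errorMessage))
        for part in resp.body.parts:
            yield part
        if not resp.body.isTruncated:
            return
        # The next page starts after the marker returned with this page.
        marker = resp.body.nextPartNumberMarker


def page(parts, truncated, next_marker):
    return SimpleNamespace(status=200, body=SimpleNamespace(
        parts=parts, isTruncated=truncated, nextPartNumberMarker=next_marker))


# Canned responses keyed by the partNumberMarker of each request.
PAGES = {None: page([1, 2, 3], True, 3), 3: page([5, 8, 13], True, 13), 13: page([21], False, None)}

print(list(iter_all_parts(lambda marker, n: PAGES[marker], max_parts=3)))
# [1, 2, 3, 5, 8, 13, 21]
```

With the SDK, list_page could be a lambda such as `lambda marker, n: obsClient.listParts('bucketname', 'objectkey', uploadId=uploadId, partNumberMarker=marker, maxParts=n)`.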

Listing Multipart Uploads

You can call ObsClient.listMultipartUploads to list multipart uploads. The following table describes parameters involved in ObsClient.listMultipartUploads.

| Parameter | Description |
|---|---|
| ListMultipartUploadsRequest.prefix | Prefix that the object names of the listed multipart uploads must start with |
| ListMultipartUploadsRequest.max_uploads | Maximum number of multipart uploads to list. The value ranges from 1 to 1000. If the value is outside this range, at most 1000 multipart uploads are listed. |
| ListMultipartUploadsRequest.delimiter | Character used to group the object names of multipart uploads. Uploads whose object names contain the same string between the prefix (if specified) and the first occurrence of the delimiter after the prefix are grouped under a single result element, commonPrefix. |
| ListMultipartUploadsRequest.key_marker | Object name from which listing of multipart uploads starts |
| ListMultipartUploadsRequest.upload_id_marker | Upload ID after which listing of multipart uploads begins. It takes effect only together with key_marker: multipart uploads of key_marker that come after upload_id_marker are listed. |
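The delimiter grouping described in the table can be modeled locally. group_by_delimiter is a hypothetical helper, not an SDK function, and it assumes the S3-style convention that each common prefix includes the delimiter itself. It only illustrates which object names would be listed individually and which would be collapsed into commonPrefix.

```python
def group_by_delimiter(keys, prefix='', delimiter=None):
    """Split object names into (individually listed names, common prefixes)."""
    listed, common = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # names outside the prefix are not listed at all
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Group by everything up to and including the first delimiter after the prefix.
            common.add(prefix + rest[:rest.index(delimiter) + len(delimiter)])
        else:
            listed.append(key)
    return listed, sorted(common)


keys = ['logs/2024/a.log', 'logs/2024/b.log', 'logs/2025/c.log', 'logs/readme.txt', 'img/x.png']
print(group_by_delimiter(keys, prefix='logs/', delimiter='/'))
# (['logs/readme.txt'], ['logs/2024/', 'logs/2025/'])
```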

  • Listing multipart uploads in simple mode
# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

resp = obsClient.listMultipartUploads('bucketname')
if resp.status < 300:
    print('requestId:', resp.requestId)
    index = 1
    for upload in resp.body.upload:
        print('upload [' + str(index) + ']')
        print('key:', upload.key)
        print('uploadId:', upload.uploadId)
        print('initiated:', upload.initiated)
        index += 1
else:
    print('errorCode:', resp.errorCode)
    print('errorMessage:', resp.errorMessage)
NOTE:
  • A single call lists information about at most 1000 multipart uploads. If the bucket contains more than 1000 multipart uploads, body.isTruncated is True in the returned result, indicating that not all uploads were listed. In this case, use body.nextKeyMarker and body.nextUploadIdMarker as the start position of the next listing.
  • To obtain all multipart uploads in a bucket, list them in paging mode.

  • Listing all multipart uploads
# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

from com.obs.models.list_multipart_uploads_request import ListMultipartUploadsRequest
multipart = ListMultipartUploadsRequest()

index = 1
while True:
    resp = obsClient.listMultipartUploads('bucketname', multipart=multipart)
    if resp.status < 300:
        print('requestId:', resp.requestId)
        for upload in resp.body.upload:
            print('upload [' + str(index) + ']')
            print('key:', upload.key)
            print('uploadId:', upload.uploadId)
            print('initiated:', upload.initiated)
            index += 1
            
        if not resp.body.isTruncated:
            break
        multipart.key_marker = resp.body.nextKeyMarker
        multipart.upload_id_marker = resp.body.nextUploadIdMarker
    else:
        print('errorCode:', resp.errorCode)
        print('errorMessage:', resp.errorMessage)
        break 
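Listing multipart uploads is a common way to find uploads that were started but never completed or aborted, so that they can be aborted and their parts deleted. The helper select_stale_uploads is hypothetical. It assumes you have already converted each upload.initiated value into a timezone-aware datetime, since the exact format of that field is not described here.

```python
from datetime import datetime, timedelta, timezone


def select_stale_uploads(uploads, now, max_age):
    """Return (key, upload_id) pairs initiated more than max_age before now.

    `uploads` is an iterable of (key, upload_id, initiated) tuples where
    `initiated` is a timezone-aware datetime.
    """
    cutoff = now - max_age
    return [(key, upload_id) for key, upload_id, initiated in uploads if initiated < cutoff]


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
uploads = [
    ('backup/a.tar', 'u1', datetime(2024, 6, 1, tzinfo=timezone.utc)),
    ('backup/b.tar', 'u2', datetime(2024, 6, 9, 12, tzinfo=timezone.utc)),
]
stale = select_stale_uploads(uploads, now, timedelta(days=7))
print(stale)  # [('backup/a.tar', 'u1')]
```

Each stale pair could then be passed to obsClient.abortMultipartUpload('bucketname', key, upload_id).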
  • Listing all multipart uploads in paging mode

The preceding example is a special case of paging in which each page lists up to 1000 uploads. The following sample code shows how to specify the number of uploads listed per page.

# Import the module.
from com.obs.client.obs_client import ObsClient

# Create an instance of ObsClient.
obsClient = ObsClient(
    access_key_id='*** Provide your Access Key ***',    
    secret_access_key='*** Provide your Secret Key ***',    
    server='yourdomainname'
)

from com.obs.models.list_multipart_uploads_request import ListMultipartUploadsRequest
multipart = ListMultipartUploadsRequest()
# List at most 100 multipart uploads per page.
multipart.max_uploads = 100

index = 1
while True:
    resp = obsClient.listMultipartUploads('bucketname', multipart=multipart)
    if resp.status < 300:
        print('requestId:', resp.requestId)
        for upload in resp.body.upload:
            print('upload [' + str(index) + ']')
            print('key:', upload.key)
            print('uploadId:', upload.uploadId)
            print('initiated:', upload.initiated)
            index += 1
            
        if not resp.body.isTruncated:
            break
        multipart.key_marker = resp.body.nextKeyMarker
        multipart.upload_id_marker = resp.body.nextUploadIdMarker
    else:
        print('errorCode:', resp.errorCode)
        print('errorMessage:', resp.errorMessage)
        break