Efficient file transfer to S3 with Golang using Channels
In the age of cloud storage and large-scale file handling, the need for efficient, reliable, and fault-tolerant solutions is more critical than ever. AWS S3, as one of the most widely used cloud storage services, provides the scalability to handle large amounts of data, but when it comes to transferring files from local storage to S3, challenges like network latency, timeouts, and handling large volumes efficiently can arise.
This is where Golang shines. Known for its concurrency model, Golang allows developers to harness the power of goroutines and channels to parallelize tasks like file uploads. By leveraging these tools, you can not only speed up the transfer process but also introduce fault tolerance, ensuring that even if one part of the process fails, others can continue uninterrupted.
In this post, we’ll explore how to build a file transfer system using Golang’s goroutines and channels, allowing for simultaneous, efficient uploads with built-in error handling and retries. This approach highlights Golang’s ability to manage parallel tasks effectively, making it an excellent choice for high-performance cloud operations like file transfers to AWS S3.
This approach is valuable for anyone dealing with large datasets or seeking to optimize cloud infrastructure performance: it is a practical way to improve transfer speed and reliability with minimal code complexity.
The AWS account used in this post was set up using A Cloud Guru sandbox, which is a nice way to test small projects and get used to cloud providers like AWS, Azure and GCP.
The Project Code
Create a file at ./cmd/generator/main.go with the following content:
package main
import (
    "fmt"
    "os"
)

func main() {
    // Make sure ./tmp exists before creating files in it.
    if err := os.MkdirAll("./tmp", 0755); err != nil {
        panic(err)
    }
    for i := 0; i < 30000; i++ {
        f, err := os.Create(fmt.Sprintf("./tmp/file%d.txt", i))
        if err != nil {
            panic(err)
        }
        f.WriteString("Hello, World!")
        // Close right away instead of deferring: deferred calls only run when
        // main returns, which would keep thousands of files open at once.
        f.Close()
    }
}
The code above creates 30,000 small files in ./tmp, which we will use to test the batch transfer from our local environment to the AWS S3 bucket. Run it with go run cmd/generator/main.go.
The next step is to obtain a set of credentials from AWS so we can connect to the account locally and transfer files. First, you need the ACCESS_KEY_ID and SECRET_ACCESS_KEY (A Cloud Guru provides these for the sandbox account). Use the export commands below to configure the credentials in your terminal:
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
Next, run aws sts get-session-token --duration-seconds 3600. You will get back a new access key ID, secret access key, and session token. Re-export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY with those new values, and also export the newly generated session token using export AWS_SESSION_TOKEN=... in the terminal.
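As a quick sanity check that the exported credentials work, you can list the buckets in the account with a few lines of Go. This is only a sketch, using the same aws-sdk-go v1 packages as the uploader further below; the us-east-1 region is an assumption, so adjust it to your sandbox region:
package main
import (
    "fmt"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)
func main() {
    // NewEnvCredentials reads AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and
    // AWS_SESSION_TOKEN from the environment, so nothing is hardcoded here.
    sess, err := session.NewSession(&aws.Config{
        Region:      aws.String("us-east-1"),
        Credentials: credentials.NewEnvCredentials(),
    })
    if err != nil {
        panic(err)
    }
    out, err := s3.New(sess).ListBuckets(&s3.ListBucketsInput{})
    if err != nil {
        panic(err)
    }
    for _, b := range out.Buckets {
        fmt.Println(*b.Name)
    }
}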
The next step is to write the code that actually performs the file transfer. A common problem when transferring many files to S3 at the same time is exceeding the limit on concurrent transfers, so we need a way to catch the failures and retry them. A natural way to do this in Golang is with goroutines and channels: every file that fails to upload is reported on a channel, and a dedicated goroutine picks it up and triggers a retry. If you have not added the SDK to your module yet, run go get github.com/aws/aws-sdk-go. The complete code for this algorithm goes in cmd/uploader/main.go:
package main
import (
    "fmt"
    "io"
    "os"
    "sync"

    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/credentials"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)

var (
    s3Client *s3.S3
    s3Bucket string
    wg       sync.WaitGroup
)
func init() {
    // Replace the placeholders below with the credentials exported earlier
    // (access key ID, secret access key and session token).
    sess, err := session.NewSession(
        &aws.Config{
            Region: aws.String("us-east-1"),
            Credentials: credentials.NewStaticCredentials(
                "---", // access key ID
                "---", // secret access key
                "---", // session token
            ),
        },
    )
    if err != nil {
        panic(err)
    }
    s3Client = s3.New(sess)
    s3Bucket = "bucket-name" // replace with your bucket name
}
func main() {
    dir, err := os.Open("./tmp")
    if err != nil {
        panic(err)
    }
    defer dir.Close()

    // uploadControl works as a semaphore: its buffer size caps the number of
    // uploads running at the same time at 100.
    uploadControl := make(chan struct{}, 100)
    // errorFileUpload receives the names of files whose upload failed.
    errorFileUpload := make(chan string, 10)

    // Retry goroutine: every filename that arrives on errorFileUpload is
    // queued for another upload attempt.
    go func() {
        for filename := range errorFileUpload {
            uploadControl <- struct{}{}
            wg.Add(1)
            go uploadFile(filename, uploadControl, errorFileUpload)
        }
    }()

    // Read the directory one entry at a time and start an upload goroutine for
    // each file, blocking whenever all 100 semaphore slots are taken.
    for {
        files, err := dir.ReadDir(1)
        if err != nil {
            if err == io.EOF {
                break
            }
            fmt.Printf("Error reading directory: %s\n", err)
            continue
        }
        wg.Add(1)
        uploadControl <- struct{}{}
        go uploadFile(files[0].Name(), uploadControl, errorFileUpload)
    }
    wg.Wait()
}
// uploadFile sends a single file to S3. On failure it releases its semaphore
// slot and reports the filename on errorFileUpload so it can be retried.
func uploadFile(filename string, uploadControl <-chan struct{}, errorFileUpload chan<- string) {
    defer wg.Done()
    completeFileName := fmt.Sprintf("./tmp/%s", filename)
    fmt.Printf("Uploading file %s to bucket %s\n", completeFileName, s3Bucket)
    f, err := os.Open(completeFileName)
    if err != nil {
        fmt.Printf("Error opening file %s: %s\n", filename, err)
        <-uploadControl // release a slot in the semaphore
        errorFileUpload <- filename
        return
    }
    defer f.Close()
    _, err = s3Client.PutObject(&s3.PutObjectInput{
        Bucket: aws.String(s3Bucket),
        Key:    aws.String(filename),
        Body:   f,
    })
    if err != nil {
        fmt.Printf("Error uploading file %s\n", completeFileName)
        <-uploadControl // release a slot in the semaphore
        errorFileUpload <- filename
        return
    }
    fmt.Printf("File %s uploaded successfully\n", filename)
    <-uploadControl // release a slot in the semaphore
}
Do not forget to replace the credential placeholders in the code above with the values for your AWS account, as well as to replace “bucket-name” with your actual bucket name. Let’s look more closely at how the limit on concurrent transfers is handled:
- Concurrency Management: The code uses a buffered channel, uploadControl, to control the number of concurrent file uploads. Because the channel buffer is set to 100 (uploadControl := make(chan struct{}, 100)), at most 100 files are uploaded at the same time. Each time an upload starts, the code sends a signal into this channel, and once the upload completes (or fails), it removes that signal, freeing a slot for the next file. This caps the number of concurrent uploads and prevents overloading the system or hitting S3 limits.
- Handling Failed Uploads: A separate channel, errorFileUpload := make(chan string, 10), tracks files that fail to upload. If an error occurs during an upload, the filename is sent to this channel, and another goroutine, running in the background, constantly monitors it for failed uploads.
- Retry Logic: The retry mechanism lives in that second goroutine, which listens for errors on the errorFileUpload channel. Each time a file fails (because of an error like exceeding the concurrent transfer limit or a network failure), its name is sent into errorFileUpload, and the retry goroutine re-attempts the upload by putting the filename back into the upload queue. The stripped-down sketch after this list shows the same mechanism in isolation.
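To see the pattern in isolation, here is a minimal, self-contained sketch of the same idea, with the S3 call replaced by a simulated job that fails roughly 10% of the time (the failure rate, job count and concurrency limit are arbitrary choices for the demo). One deliberate detail: the failing worker calls wg.Add before announcing the failure, so that wg.Wait cannot return while a retry is still pending.
package main
import (
    "fmt"
    "math/rand"
    "sync"
)
func main() {
    var wg sync.WaitGroup
    sem := make(chan struct{}, 5) // semaphore: at most 5 jobs run at once
    retries := make(chan int, 10) // IDs of jobs that need another attempt
    process := func(id int) {
        defer wg.Done()
        if rand.Intn(10) == 0 { // simulated failure
            <-sem         // free the slot first, like the uploader does
            wg.Add(1)     // account for the retry before announcing it
            retries <- id
            return
        }
        fmt.Println("processed job", id)
        <-sem
    }
    // Retry listener: re-queues every job that reported a failure.
    go func() {
        for id := range retries {
            sem <- struct{}{} // wait for a free slot
            go process(id)    // wg.Add was already done by the failing worker
        }
    }()
    for id := 0; id < 100; id++ {
        wg.Add(1)
        sem <- struct{}{}
        go process(id)
    }
    wg.Wait()
}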
Use go run cmd/uploader/main.go to transfer the created files to the S3 bucket. After some time, you should be able to verify that all files were transferred successfully, thanks to the retry logic that uses the error channel to send the failed cases back to be uploaded again.
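If you want to check the final count programmatically instead of opening the AWS console, a short sketch like the one below pages through the bucket and counts its objects. The region and the use of the default credential chain (which picks up the AWS_* environment variables exported earlier) are assumptions; replace "bucket-name" with your bucket:
package main
import (
    "fmt"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
)
func main() {
    sess, err := session.NewSession(&aws.Config{Region: aws.String("us-east-1")})
    if err != nil {
        panic(err)
    }
    client := s3.New(sess)
    total := 0
    err = client.ListObjectsV2Pages(&s3.ListObjectsV2Input{
        Bucket: aws.String("bucket-name"), // replace with your bucket name
    }, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
        total += len(page.Contents)
        return true // keep going through all pages
    })
    if err != nil {
        panic(err)
    }
    fmt.Printf("objects in the bucket: %d\n", total)
}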
The full code for this example can be seen at https://github.com/gabogomes/Go-File-Transfer-Example.