Best Practices on uploading CSVs to Snowflake and how to debug issues

Published
July 5, 2024
Author

Uploading CSV files to Snowflake is an essential task for data engineers and analysts who need to integrate structured data into their data warehousing solutions. Snowflake offers multiple methods for loading CSV files, catering to different user preferences and file sizes. This report delves into the various methods available for uploading CSV files to Snowflake, focusing on web interfaces and command-line tools, and outlines the required steps, privileges, and considerations for each method.

What are the Methods to Upload CSV to Snowflake?

Snowflake provides several methods to upload CSV files, each suited to different scenarios and user needs. The primary methods include using the web interfaces (Snowsight and Classic Console) and the SnowSQL command-line tool. Here’s a detailed look at each method and their specific use cases:

Snowsight

Snowsight is Snowflake’s modern web interface that allows users to upload files up to 250 MB. It offers intuitive dialogs for adding data and creating tables.

  • Load from Local Machine: Users can sign in to Snowsight, navigate to Data » Add Data, and follow the prompts to upload files from their local machine.
  • Load from Stage: Users can create a stage, upload files to it, and then load the data into a table using the Load Data into Table dialog.
  • Create New Table: Users can use the INFER_SCHEMA function to detect file metadata and generate a new table automatically.

Required Privileges:

  • Database USAGE
  • Schema USAGE
  • Stage USAGE (if loading from a stage)
  • File format USAGE (for named file formats)
  • Table OWNERSHIP (to create new tables)

Steps to Load Data Using Snowsight

  1. Sign in: Access the Snowsight interface by signing in to your Snowflake account.
  2. Navigate to Data: Go to the Data tab and select "Add Data."
  3. Follow Prompts: Choose whether to upload from a local machine or a stage. Follow the on-screen prompts to complete the upload.
  4. Select File Format: Choose the appropriate file format (CSV) and handle any errors if necessary.

Classic Console

The Classic Console provides a wizard for loading limited amounts of data and supports loading from local machines or cloud storage services like Amazon S3, Google Cloud Storage, and Microsoft Azure.

Prerequisites:

  • Identify the data file location.
  • Ensure each file is up to 250 MB.
  • Required privileges include USAGE on the database and schema, CREATE STAGE, and CREATE FILE FORMAT.

Steps to Load Data Using Classic Console

  1. Select Databases: Navigate to the Databases section and choose the relevant database and schema.
  2. Open Table Details: Select the table into which you want to load data and open its details.
  3. Load Data: Click on the "Load Data" button and follow the wizard steps to upload data from your computer or cloud storage.

How to Choose Between Snowsight and Classic Console?

Both Snowsight and Classic Console offer user-friendly interfaces for uploading CSV files to Snowflake. However, they cater to slightly different needs and preferences.

Feature Snowsight Classic Console File Size Limit Up to 250 MB Up to 250 MB Data Source Local Machine, Stage Local Machine, Cloud Storage Table Creation INFER_SCHEMA function for automatic creation Manual table creation Interface Modern, intuitive Traditional wizard-based Privileges USAGE on database, schema, stage, file format USAGE on database and schema, CREATE STAGE

What to Do for Files Larger Than 250 MB?

For files larger than 250 MB, Snowflake recommends using SnowSQL, a command-line tool that offers greater flexibility and control over the data loading process.

SnowSQL

SnowSQL is a powerful command-line tool provided by Snowflake for executing SQL queries and performing various database operations, including data loading. It is particularly useful for uploading large files and automating data loading tasks.

Required Privileges:

  • CREATE STAGE
  • CREATE FILE FORMAT
  • USAGE on the database and schema

Steps to Load Data Using SnowSQL

  1. Install SnowSQL: Download and install SnowSQL from the Snowflake website.
  2. Configure SnowSQL: Set up the configuration file with your Snowflake credentials and connection details.
  3. Create a Stage: Use the CREATE STAGE command to create a stage where your files will be uploaded.
  4. Upload Files: Use the PUT command to upload files from your local machine to the stage.
  5. Copy Data into Table: Use the COPY INTO command to load data from the stage into the target table.

Example Commands

Create Stage:

CREATE OR REPLACE STAGE my_stage;

Upload Files:

snowsql -q "PUT file:///path/to/your/file.csv @my_stage"

Copy Data into Table:

COPY INTO my_table
FROM @my_stage/file.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"');

What Are the Common Challenges and Solutions?

Uploading CSV files to Snowflake can sometimes encounter challenges. Here are some common issues and their solutions:

  • File Size Limitations: For files larger than 250 MB, use SnowSQL instead of the web interfaces.
  • Data Format Issues: Ensure the CSV file adheres to the expected format and handle any discrepancies using file format options in Snowflake.
  • Privileges and Permissions: Verify that the necessary privileges are granted to the user account performing the upload.

How to Handle Errors During Data Loading?

Errors during data loading can occur due to various reasons, including format mismatches, missing values, or incorrect privileges. Here are some strategies to handle these errors:

  • Error Handling Options: Use the error handling options in the COPY INTO command to specify how to deal with errors, such as skipping error rows or logging errors to a table.
  • Data Validation: Validate the data before uploading to ensure it meets the required format and constraints.
  • Review Error Logs: Check Snowflake’s error logs for detailed information on any errors encountered during the data loading process.

Example of Handling Errors

COPY INTO my_table
FROM @my_stage/file.csv
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"')
ON_ERROR = 'CONTINUE';

What Are the Best Practices for Uploading CSV to Snowflake?

To ensure a smooth and efficient data loading process, follow these best practices:

  • Preprocess Data: Clean and preprocess the data before uploading to avoid format issues and errors.
  • Use Appropriate Tools: Choose the right tool (Snowsight, Classic Console, or SnowSQL) based on the file size and complexity of the task.
  • Monitor and Optimize: Regularly monitor the performance of data loading operations and optimize them as needed, such as by splitting large files or adjusting file format options.

Common Challenges and Solutions

  • File Size Limitations: For files larger than 250 MB, use SnowSQL instead of the web interfaces.
  • Data Format Issues: Ensure the CSV file adheres to the expected format and handle any discrepancies using file format options in Snowflake.
  • Privileges and Permissions: Verify that the necessary privileges are granted to the user account performing the upload.

Recap

  • Multiple Methods: Snowflake offers various methods for uploading CSV files, including Snowsight, Classic Console, and SnowSQL, each catering to different file sizes and user preferences.
  • Required Privileges: Ensure you have the necessary privileges such as USAGE on the database and schema, CREATE STAGE, and CREATE FILE FORMAT to successfully upload and manage data.
  • Best Practices: Follow best practices like preprocessing data, choosing the right tool, and monitoring performance to ensure a smooth and efficient data loading process.

Keep reading

See all