A Toolkit for OpenAI Batch
NEWS: I deployed the app online to make it easier to use. Visit openaibatch.vercel.app to give it a try!
The online service may not be able to process large CSV files. If you run into errors, clone the repository and run the app on your local machine.
What is OpenAI Batch?
When using the OpenAI API for NLP tasks in social science research, I typically use the openai
package with pandas
to process CSV files, reading the text in and writing the labels from the OpenAI API responses back out. However, once a CSV file grows large, processing slows dramatically after a certain number of requests. This is caused by rate limits, as detailed in the rate limits documentation.
To tackle large workloads more efficiently, the best approach is to use OpenAI Batch. With OpenAI Batch, users can upload a JSONL file in a specific format. OpenAI processes the JSONL file (slower than standard API calls but more efficient for large jobs) and returns the results in the same format.
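For reference, each line of the batch input file is one self-contained request. A minimal line targeting the `/v1/chat/completions` endpoint looks like this (the `custom_id` is any unique string you choose; it is echoed back in the output so you can match results to rows):

```json
{"custom_id": "task-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "max_tokens": 100, "messages": [{"role": "system", "content": "Classify the sentiment of the sentence."}, {"role": "user", "content": "I love this product!"}]}}
```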
Key Advantages of OpenAI Batch
- Higher daily usage limits compared to standard API calls.
- Faster overall turnaround for large-scale NLP projects, since individual requests are no longer throttled by per-minute rate limits.
Workflow for Using OpenAI Batch
Here’s a simple workflow I follow to classify a set of sentences using OpenAI Batch:
- Create a CSV file with my target sentences in one column.
- Configure the task parameters and convert the CSV file to JSONL format.
- Upload the JSONL file to the OpenAI Batch service. If there’s no error, wait for the results.
- Download the processed JSONL file from the server and convert it back to a CSV file.
What Can OpenAI Batch Tools Do?
The above process requires some coding, and dealing with JSONL format and batch service limits can get pretty annoying. That’s why I made this app, which you can download here.
Menu Overview
The app menu looks like this:
It contains three tools to streamline the workflow:
CSV to JSONL Converter
Converts a CSV file to JSONL format, which is required for OpenAI Batch processing.
JSONL File Splitter
Splits a JSONL file into smaller files of equal size. If the JSONL file exceeds batch service limits, you can split it into smaller files, register new OpenAI accounts to process them separately, and combine the results later.
JSONL Response Extractor
After downloading batch outcomes from the OpenAI server, this feature extracts the responses and converts them back to a CSV file.
How to Use
First, prepare a CSV file as input. Specify the column that contains the text you want to analyze in the “Text Column” field, configure the other parameters, and click the “Convert” button to generate a JSONL file.
# Example of the parameters
# Model: I often use gpt-4o-mini, which is cheap and strikes a good balance between speed and quality.
# Max Tokens: The maximum number of tokens the model may generate in its response.
# (The prompt does not count against this limit; it counts against the model's context window.)
# One token is roughly 4 characters of English text.
# Temperature: 1 is the default. Higher values give more creative responses; lower values give more conservative, predictable ones.
Once the JSONL file is ready, you can upload it to the OpenAI Batch service to start processing.
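If you prefer to submit the file programmatically rather than through the web dashboard, the `openai` Python package exposes a Batch API. A minimal sketch, assuming a valid `OPENAI_API_KEY` and a prepared `tasks.jsonl` (both hypothetical here; in practice you would poll the batch status in a loop until it completes):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the JSONL file, then create a batch job that references it.
batch_file = client.files.create(file=open("tasks.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Later: check the job and, once it is done, download the output file.
status = client.batches.retrieve(batch.id)
if status.status == "completed":
    result = client.files.content(status.output_file_id)
    with open("results.jsonl", "w", encoding="utf-8") as f:
        f.write(result.text)
```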
After the batch service finishes processing, download the results and use the “JSONL Response Extractor” tool to convert them back to a CSV file.
I hope it saves you some time!