Using the Bulk APIs

Overview

If you have a large data set to process with our APIs, use the bulk versions of the APIs, where available, rather than the standard ones.  A single bulk API call can process the equivalent of up to 10,000 standard API calls.

The following APIs have a bulk version:

  • Get Company (part of the Data API)
  • Get Contact (part of the Data API)
  • Enrich

These APIs are asynchronous: you submit jobs, each containing a batch of queries, and the jobs are queued and processed on our servers.  A job typically completes between 1 minute and a few hours after submission, depending on its size and on the number of jobs already in our processing queue.

The typical steps are:

  • Submit a batch of queries.  This creates a server job (the API returns the job ID).
  • Monitor the status of the job.  A job can have one of the following statuses:
    • accepted: the job has been received and validated and is in the job waiting queue.
    • processing: the job is being processed.
    • finished: the job processing is complete and the results are ready to be downloaded.
    • failed: the job failed. See the results file for more details.
    • aborted: the job has been cancelled.
  • Download the results of the job.

NOTE: A job can only be cancelled if its processing has not started.
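
The job lifecycle above can be sketched as a small status type plus a polling loop. This is only an illustration: the names BulkJobStatus, BulkJobPoller and waitForCompletion are hypothetical, and fetchStatus stands in for whatever code you use to call the job status endpoint and extract the status string.

```java
import java.util.Locale;
import java.util.function.Supplier;

/** The five job statuses listed above; helper names are illustrative, not part of the API. */
enum BulkJobStatus {
    ACCEPTED, PROCESSING, FINISHED, FAILED, ABORTED;

    /** Parses a status string, ignoring case; throws IllegalArgumentException if it is unknown. */
    static BulkJobStatus parse(String raw) {
        return valueOf(raw.trim().toUpperCase(Locale.ROOT));
    }

    /** Finished, failed and aborted jobs will not change state again. */
    boolean isTerminal() {
        return this == FINISHED || this == FAILED || this == ABORTED;
    }

    /** Per the note above, only a job whose processing has not started can be cancelled. */
    boolean isCancellable() {
        return this == ACCEPTED;
    }
}

final class BulkJobPoller {
    /**
     * Calls fetchStatus until a terminal status is seen or maxPolls calls have been made.
     * fetchStatus is a placeholder for a call to the job status endpoint.
     */
    static BulkJobStatus waitForCompletion(Supplier<String> fetchStatus,
                                           long pauseMillis,
                                           int maxPolls) {
        BulkJobStatus status = BulkJobStatus.parse(fetchStatus.get());
        for (int i = 1; i < maxPolls && !status.isTerminal(); i++) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report the last status seen.
                Thread.currentThread().interrupt();
                return status;
            }
            status = BulkJobStatus.parse(fetchStatus.get());
        }
        return status;
    }
}
```

Since jobs can take from a minute to a few hours, a pause of a minute or more between polls is probably reasonable; status calls do not consume quota, but there is little benefit in polling more often.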

Bulk API calls and Quotas

Bulk API calls affect your quotas in the following ways:

  • The number of queries in your bulk is debited from your quota when the processing of the job starts.
  • A job may be refused by our servers if the number of queries it contains would consume more transactions (1 query = 1 transaction) than what you have left for the quota period.
  • If you cancel a job after its processing has started or completed, the transactions consumed will not be re-credited.
  • Calls to the Job Status, Cancellation or Results endpoints do not impact your quotas.
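
As a rough illustration of the quota rules above, the check below shows the arithmetic involved. The class and method names are hypothetical, and the remaining-transaction count is assumed to be something you track or look up yourself; this document does not describe how to obtain it.

```java
final class BulkQuota {
    /** One query consumes one transaction, so a job fits only if its size does not exceed what is left. */
    static boolean fitsInQuota(int queriesInJob, long transactionsRemaining) {
        if (queriesInJob < 0 || transactionsRemaining < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        return queriesInJob <= transactionsRemaining;
    }

    /** Transactions left once the job starts processing and its queries are debited. */
    static long remainingAfterJob(int queriesInJob, long transactionsRemaining) {
        if (!fitsInQuota(queriesInJob, transactionsRemaining)) {
            throw new IllegalStateException("job would exceed the remaining quota");
        }
        return transactionsRemaining - queriesInJob;
    }
}
```

For example, a 2,500-query job submitted with 10,000 transactions left would leave 7,500 once processing starts.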

Job Inputs and Results

File Format

Input and output files are CSV files.  Their structure is specific to each API; please refer to the documentation of each API for details.  Each row corresponds to one query or one result.

For input files, all fields are mandatory, even for criteria that are not used.
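
The sketch below shows one way to build an input row in which unused criteria are still present as empty fields. It assumes that "mandatory" means every column must appear in every row; the class name CsvRow and the example column values are hypothetical, since the actual columns depend on the API you are calling.

```java
import java.util.List;
import java.util.StringJoiner;

final class CsvRow {
    /** Joins values into one CSV line; null values become empty fields so no column is dropped. */
    static String toCsvLine(List<String> values) {
        StringJoiner line = new StringJoiner(",");
        for (String value : values) {
            line.add(escape(value == null ? "" : value));
        }
        return line.toString();
    }

    /** Quotes a field when it contains a comma, a double quote or a line break. */
    private static String escape(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        return needsQuotes ? "\"" + value.replace("\"", "\"\"") + "\"" : value;
    }
}
```

With four hypothetical columns, two of them unused, CsvRow.toCsvLine(Arrays.asList("id1", "Acme, Inc.", null, "")) yields the line id1,"Acme, Inc.",, with both trailing fields present but empty.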

In results files, multivalued fields (like sources) are represented as concatenated strings separated by the pipe symbol ('|').
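
When reading such a field in Java, note that the pipe is a regular-expression metacharacter and has to be escaped in String.split. A minimal sketch (the class name MultiValueField is hypothetical):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class MultiValueField {
    /** Splits a pipe-separated cell into its values; an empty or missing cell yields an empty list. */
    static List<String> split(String cell) {
        if (cell == null || cell.isEmpty()) {
            return Collections.emptyList();
        }
        // "\\|" matches a literal pipe; the -1 limit keeps trailing empty values.
        return Arrays.asList(cell.split("\\|", -1));
    }
}
```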

Input files have to be passed as an octet stream in the request body.  Here is a code snippet demonstrating how to submit a file in Java using the Apache HttpClient library:

// imports assumed: java.io.File, java.nio.file.Files,
// org.apache.http.client.methods.HttpPost, org.apache.http.client.methods.CloseableHttpResponse,
// org.apache.http.entity.ByteArrayEntity,
// org.apache.http.impl.client.CloseableHttpClient, org.apache.http.impl.client.HttpClients

String apiPath = "https://api.insideview.com/enrich/job";
String accessToken = "";
File file = new File("inputdata.csv");

// instantiate the POST method and set up the necessary headers
HttpPost httppost = new HttpPost(apiPath);
httppost.addHeader("accessToken", accessToken);
httppost.addHeader("Accept", "application/json");
httppost.addHeader("Content-Type", "application/octet-stream");
httppost.setHeader("Connection", "keep-alive");

// read the whole input file into a byte array
byte[] bFile = Files.readAllBytes(file.toPath());

// add the binary data to the request
httppost.setEntity(new ByteArrayEntity(bFile));

// execute the API call
CloseableHttpClient httpclient = HttpClients.createDefault();
CloseableHttpResponse response = httpclient.execute(httppost);

Identification of results

For the Enrich Bulk API, you need to pass a unique ID for each query within a job so that you can associate each result with its query.  The ID field accepts alphanumeric strings of 40 characters or less.

For the Get Contact and Get Company bulk APIs, this is not necessary because the input is already a list of unique IDs.
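
Before submitting an Enrich job, it can help to check the IDs locally. The sketch below assumes that "alphanumeric" means ASCII letters and digits and that an empty ID is not acceptable; both are assumptions, and the class name QueryIds is hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

final class QueryIds {
    // Assumption: 1 to 40 ASCII letters or digits.
    private static final Pattern VALID = Pattern.compile("[A-Za-z0-9]{1,40}");

    static boolean isValid(String id) {
        return id != null && VALID.matcher(id).matches();
    }

    /** Returns the first ID that is invalid or duplicated, or null if every ID is usable. */
    static String firstProblem(List<String> ids) {
        Set<String> seen = new HashSet<>();
        for (String id : ids) {
            // Set.add returns false when the ID was already present.
            if (!isValid(id) || !seen.add(id)) {
                return id;
            }
        }
        return null;
    }
}
```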

Availability of Results

Job results are discarded 7 days after the completion of a job.  The job is then no longer available; trying to access the status or the results will give an HTTP 404 error.
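
If you record when a job completed, you can estimate when its results will disappear. This is only a sketch: the exact cut-off applied by our servers is not specified here, so treat the computed time as approximate and still handle an HTTP 404 when fetching the status or results. The class name ResultRetention is hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

final class ResultRetention {
    /** Results are kept for 7 days after job completion. */
    static final Duration RETENTION = Duration.ofDays(7);

    /** Approximate time after which the job and its results are no longer available. */
    static Instant expiresAt(Instant completedAt) {
        return completedAt.plus(RETENTION);
    }

    /** True when now is at or past the approximate expiry time. */
    static boolean isLikelyExpired(Instant completedAt, Instant now) {
        return !now.isBefore(expiresAt(completedAt));
    }
}
```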

Limits

The bulk APIs have a few limits that you need to be aware of:

  • A job cannot contain more than 10,000 queries.
  • No more than two jobs per license key can be active in our queue at one time.  (Active means a status of "accepted" or "processing".)
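
A data set larger than the per-job limit has to be split across several jobs. The helper below is a minimal sketch of that splitting; the class name BulkBatcher is hypothetical. Because only two jobs can be active at once, the resulting batches would need to be submitted two at a time, waiting for a job to finish before sending the next.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkBatcher {
    /** Maximum number of queries allowed in one job. */
    static final int MAX_QUERIES_PER_JOB = 10_000;

    /** Splits queries into consecutive batches of at most maxPerJob entries each. */
    static <T> List<List<T>> split(List<T> queries, int maxPerJob) {
        if (maxPerJob < 1 || maxPerJob > MAX_QUERIES_PER_JOB) {
            throw new IllegalArgumentException("maxPerJob must be between 1 and " + MAX_QUERIES_PER_JOB);
        }
        List<List<T>> jobs = new ArrayList<>();
        for (int start = 0; start < queries.size(); start += maxPerJob) {
            int end = Math.min(start + maxPerJob, queries.size());
            // Copy the slice so each batch is independent of the source list.
            jobs.add(new ArrayList<>(queries.subList(start, end)));
        }
        return jobs;
    }
}
```

For instance, 25,000 queries split with the default limit produce three batches of 10,000, 10,000 and 5,000 queries.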