I am trying to iterate the rows from TableResult using getValues()
as below.
if I use getValues()
, it’s retrieving only the first page rows. I want to iterate all the rows using getValues()
and NOT using iterateAll()
.
In the below code, the problem is its going infinite time. not ending. while(results.hasNextPage())
is not ending. what is the problem in the below code?
{ query = "select from aa.bb.cc"; QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query) .setPriority(QueryJobConfiguration.Priority.BATCH) .build(); TableResult results = bigquery.query(queryConfig); int i = 0; int j=0; while(results.hasNextPage()) { j++; System.out.println("page " + j); System.out.println("Data Extracted::" + i + " records"); for (FieldValueList row : results.getNextPage().getValues()) { i++; } } System.out.println("Total Count::" + results.getTotalRows()); System.out.println("Data Extracted::" + i + " records"); }
I have only 200,000 records in the source table. below is the out put and I forcefully stopped the process.
page 1 Data Extracted::0 records page 2 Data Extracted::85242 records page 3 Data Extracted::170484 records page 4 Data Extracted::255726 records page 5 Data Extracted::340968 records page 6 Data Extracted::426210 records page 7 Data Extracted::511452 records page 8 Data Extracted::596694 records ....... ....... ....... .......
Advertisement
Answer
In short, you need to update TableResults
variable with your getNextPage()
variable. If you don’t update it you will always be looping the same results over and over. Thats why you are getting tons of records in your output.
If you check the following samples: Bigquery Pagination and Using Java Client Library. There are ways that we can deal with pagination results. Although not specific for single run queries.
As show on the code below, which is partially based on pagination sample, you need to use the output of getNextPage()
to update results
variable and proceed to perform the next iteration inside the while up until it iterates all pages but the last.
QueryRun.Java
package com.projects; // [START bigquery_query] import com.google.cloud.bigquery.BigQuery; import com.google.cloud.bigquery.BigQueryException; import com.google.cloud.bigquery.BigQueryOptions; import com.google.cloud.bigquery.QueryJobConfiguration; import com.google.cloud.bigquery.TableResult; import com.google.cloud.bigquery.Job; import com.google.cloud.bigquery.JobId; import com.google.cloud.bigquery.FieldValueList; import com.google.cloud.bigquery.JobInfo; import com.google.cloud.bigquery.BigQuery.QueryResultsOption; import java.util.UUID; import sun.jvm.hotspot.debugger.Page; public class QueryRun { public static void main(String[] args) { String projectId = "bigquery-public-data"; String datasetName = "covid19_ecdc_eu"; String tableName = "covid_19_geographic_distribution_worldwide"; String query = "SELECT * " + " FROM `" + projectId + "." + datasetName + "." + tableName + "`" + " LIMIT 100"; System.out.println(query); query(query); } public static void query(String query) { try { BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService(); QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build(); // Create a job ID so that we can safely retry. JobId jobId = JobId.of(UUID.randomUUID().toString()); Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build()); TableResult results = queryJob.getQueryResults(QueryResultsOption.pageSize(10)); int i = 0; int j =0; // get all paged data except last line while(results.hasNextPage()) { j++; for (FieldValueList row : results.getValues()) { i++; } results = results.getNextPage(); print_msg(i,j); } // last line run j++; for (FieldValueList row : results.getValues()) { i++; } print_msg(i,j); System.out.println("Query performed successfully."); } catch (BigQueryException | InterruptedException e) { System.out.println("Query not performed n" + e.toString()); } } public static void print_msg(int i,int j) { System.out.println("page " + j); System.out.println("Data Extracted::" + i + " records"); } } // [END bigquery_query]
output:
SELECT * FROM `bigquery-public-data.covid19_ecdc_eu.covid_19_geographic_distribution_worldwide` LIMIT 100 page 1 Data Extracted::10 records page 2 Data Extracted::20 records page 3 Data Extracted::30 records page 4 Data Extracted::40 records page 5 Data Extracted::50 records page 6 Data Extracted::60 records page 7 Data Extracted::70 records page 8 Data Extracted::80 records page 9 Data Extracted::90 records page 10 Data Extracted::100 records Query performed successfully.
As a final note, there are not official sample about pagination for queries so I’m not totally sure of the recommended way to handle pagination with java. Its not quite clear on the BigQuery for Java documentation page. If you can update your question with your approach to pagination I would appreciate.
If you have issues running the attached sample please see Using the BigQuery Java client sample, its github page and its pom.xml
file inside of it and check if you are in compliance with it.