Interpreting Your JSON Results
Read this guide to understand the JSON output files generated by the Indico Agents and Workflows platform.
Introduction
The contents of the result file depend on several factors:
- The type of agent
- The default output version for your cluster
- Whether optional functionalities are enabled, such as:
  - Review
  - Typed Answer Keys (TAK)
  - Linked Labels
  - Summarization
Output Overview
- Classification agents, including Classify and Unbundle agents, output predicted classes with confidence scores.
- Extraction agents output extracted text with confidence levels for each prediction.
- Linked-Label transformers group linked labels using an index in the output dictionary.
- Form agent output includes keys like recognized_forms and form_version to identify processed forms.
- Review or Autoreview output indicates the review status of agent outputs.
- Summarization output varies between result-file versions:
  - For version 1, results mirror single classification output results and contain summary text with citations.
  - For version 3, the output contains an array for each unbundled file in the submission.
Each file also features links to ETL (Extract, Transform, Load) and OCR (Optical Character Recognition) outputs, which offer detailed information about each page within a submission. To understand ETL and OCR output, read our Interpreting Your OCR and ETL Output document.
In addition to the agent type, Typed Answer Keys/Output Normalization also impacts your output. It adds a formatted or normalized key for field normalization.
The spans Key
Agents like Extraction and Classify and Unbundle return labels containing a spans key, which represents semantically related text segments from the source document. The ctx_id field, if provided, tracks the parent span to organize related data through a workflow. If no ctx_id is provided, the agent ran on the entire document.
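As a rough illustration of how a spans list might be consumed, the sketch below collects the pages a prediction covers and slices each span out of the document text. The helper names and the full_text argument are hypothetical. Treating start and end as character offsets into the full document text, with an exclusive end, is an assumption rather than something this guide confirms.

```python
from typing import Any, Dict, List


def pages_covered(prediction: Dict[str, Any]) -> List[int]:
    """Return the sorted, de-duplicated page numbers touched by a prediction's spans."""
    return sorted({span["page_num"] for span in prediction.get("spans", [])})


def span_texts(prediction: Dict[str, Any], full_text: str) -> List[str]:
    """Slice each span out of the document text.

    Assumption: start/end are character offsets into full_text, end exclusive.
    """
    return [full_text[s["start"]:s["end"]] for s in prediction.get("spans", [])]


def ran_on_whole_document(prediction: Dict[str, Any]) -> bool:
    """Per the guide, a missing ctx_id means the agent ran on the entire document."""
    return prediction.get("ctx_id") is None


# Hypothetical prediction shaped like the samples in this guide.
example_prediction = {
    "label": "annual report",
    "spans": [
        {"start": 0, "end": 7, "page_num": 0},
        {"start": 16, "end": 20, "page_num": 1},
    ],
}
print(pages_covered(example_prediction))
print(span_texts(example_prediction, "Invoice 42 from ACME"))
```

If your cluster reports offsets differently (for example, per page rather than per document), the slicing helper would need to be adjusted accordingly.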
Understanding Your Result Files
This guide will walk you through the structure of the results files, explain the significance of each element, and provide practical examples to help you effectively utilize this data in your applications.
File Hierarchy
There are three levels of result files that you get from a workflow:
- Top-level result file that typically has the following name structure: submission_<submission id>_result.json.
- Mid-level result file named etl_output.json, which provides links to the full text of the submission and the low-level result files that contain the detailed results of the workflow.
- Low-level detailed OCR result files:
  - page_info_<index>.json provides detailed OCR data for the page identified by the index in the file name.
  - JSON files that provide granular OCR data on characters, tokens, and blocks.
To learn about the mid- and low-level output file contents, read Interpreting Your OCR and ETL Output.
File Versions
Indico currently offers two versions of results files, version 1 and version 3. The formats have slight variations.
The file version you are using is accessible in the first line of the JSON output file.
{
"file_version": 3,
...
}
It is a best practice to use version 3 whenever possible. Indico is sunsetting version 1. It is currently being supported for backward compatibility only.
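Because the two versions nest results differently, client code usually checks the version before reading anything else. The following is a minimal sketch of that check. The function names are hypothetical, and falling back to the version 1 layout when file_version is absent is an assumption based on the version 1 samples in this guide that omit the key.

```python
import json
from typing import Any, Dict, Optional


def result_file_version(result: Dict[str, Any]) -> Optional[int]:
    """Return the declared file_version, or None when the key is absent."""
    version = result.get("file_version")
    return int(version) if version is not None else None


def predictions_root(result: Dict[str, Any]) -> Any:
    """Return the container that holds predictions for the detected version.

    Version 3 keeps one entry per file under "submission_results".
    Anything else is treated as version 1 (assumption), which nests
    results under results -> document -> results.
    """
    if result_file_version(result) == 3:
        return result["submission_results"]
    return result["results"]["document"]["results"]


# Small inline sample; a real file would be loaded with json.load(open(path)).
sample = json.loads('{"file_version": 3, "submission_results": []}')
print(result_file_version(sample))
print(predictions_root(sample))
```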
Classification
For classification results of all varieties, nested under the name of the agent group, there is a dictionary of document class names and the confidence level of each prediction. Confidence levels are between 0 and 1.
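A confidence dictionary of this shape can be ranked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, and the values are copied from the version 1 single classification sample in this guide.

```python
from typing import Dict, List, Tuple


def rank_classes(confidence: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (class name, confidence) pairs ordered from most to least likely."""
    return sorted(confidence.items(), key=lambda item: item[1], reverse=True)


def top_class(confidence: Dict[str, float]) -> Tuple[str, float]:
    """Return the class with the highest confidence and its score."""
    return max(confidence.items(), key=lambda item: item[1])


confidence = {
    "Class 1": 0.688137223450789,
    "Class 2": 0.08451932419717022,
    "Class 3": 0.06304424768016251,
    "Class 4": 0.021362314937542252,
}
print(top_class(confidence))  # ('Class 1', 0.688137223450789)
```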
Single Classification
For single classification, results are present for every agent group as keys in the results dictionary.
{
"file_version": 1,
"submission_id": "12345",
"etl_output": "indico-file:///storage/submission/ocr_output.json",
"results": {
"document": {
"results": {
"Test Model 1": {
"Class 1": 0.688137223450789,
"Class 2": 0.08451932419717022,
"Class 3": 0.06304424768016251,
"Class 4": 0.021362314937542252
}
}
}
}
}
{
"file_version": 3,
"submission_id": 77924,
"modelgroup_metadata": {
"24616": {
"id": 24616,
"task_type": "classification",
"name": "class with two labels",
"selected_model": {
"id": 35369,
"model_type": "tfidf_gbt"
}
},
...
},
"component_metadata": {
...
"103423": {
"id": 103423,
"name": "Document Classification",
"component_type": "model_group",
"task_type": "classification"
},
...
},
"submission_results": [
{
"submissionfile_id": 182135,
"etl_output": "indico-file:///storage/submission/25803/77924/182135/etl_output.json",
"input_filename": "97783.txt",
"input_filepath": "indico-file:///storage/submission/25803/77924/182135.txt",
"input_filesize": 8171,
"model_results": {
"ORIGINAL": {
"24616": [
{
"field_id": 10559356,
"confidence": {
"email": 0.8453857085865302,
"other insurance files": 0.15461429141346983
},
"label": "email"
}
],
...
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"24616": [],
"24669": []
},
"components": {}
}
},
...
],
"reviews": {},
"errored_files": {}
}
Multi Classification
For regular classification agents, class name and confidence are present for every document class in the agent.
For GenAI classification agents, only the highest probability document class has an associated confidence score.
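One practical consequence is that the label key may hold a single string (single classification) or a list of strings (multi classification). The sketch below walks a version 3 result and normalizes both cases to a list. It assumes the submission_results -> model_results -> ORIGINAL nesting shown in the samples in this guide, and the function name is hypothetical.

```python
from typing import Any, Dict, Iterator, List, Tuple


def iter_v3_labels(result: Dict[str, Any]) -> Iterator[Tuple[Any, str, List[Any]]]:
    """Yield (input_filename, model_group_id, labels) for each prediction."""
    for file_entry in result.get("submission_results", []):
        original = file_entry.get("model_results", {}).get("ORIGINAL", {})
        for model_id, predictions in original.items():
            for prediction in predictions:
                label = prediction.get("label")
                # Single classification stores a string; multi classification a list.
                labels = label if isinstance(label, list) else [label]
                yield file_entry.get("input_filename"), model_id, labels


# Trimmed-down result shaped like the multi classification sample.
result = {
    "file_version": 3,
    "submission_results": [
        {
            "input_filename": "leadership.pdf",
            "model_results": {
                "ORIGINAL": {
                    "4221": [{"label": ["problem", "leadership", "tech"]}]
                }
            },
        }
    ],
}
for filename, model_id, labels in iter_v3_labels(result):
    print(filename, model_id, labels)
```

When reading confidence values for GenAI classification agents, look up scores defensively (for example with dict.get), since only the highest-probability class carries one.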
{
"file_version": 3,
"submission_id": 97350,
"modelgroup_metadata": {
"4221": {
"id": 4221,
"task_type": "classification_multiple",
"name": "multi_classify",
"selected_model": {
"id": 6766,
"model_type": "tfidf_gbt"
}
},
...
},
"component_metadata": {
...
"14735": {
"id": 14735,
"name": null,
"component_type": "output_json_formatter",
"task_type": null
},
"14737": {
"id": 14737,
"name": "Document Multi-Classification",
"component_type": "model_group",
"task_type": "classification_multiple"
},
...
},
"submission_results": [
{
"submissionfile_id": 93642,
"etl_output": "indico-file:///storage/submission/4008/97350/93642/etl_output.json",
"input_filename": "leadership.pdf",
"input_filepath": "indico-file:///storage/submission/4008/97350/93642.pdf",
"input_filesize": 15248,
"model_results": {
"ORIGINAL": {
"4221": [
{
"field_ids": [
267290,
267290,
267290
],
"confidence": {
"problem": 0.9999999961727768,
"leadership": 0.9999999922852744,
"tech": 0.6666662614633455,
"earnings": 0.3333331376906894,
"other": 2.872262090213859e-8,
"acquisition": 2.3355426017253765e-9,
"energy": 9.715453333083489e-10,
"financial": 4.161086406129549e-10,
"foodretail": 4.353135264147288e-11,
"product": 2.2489297185618299e-11,
"auto": 2.2232509679333238e-11,
"pharmaceutical": 1.824398818834385e-14
},
"label": [
"problem",
"leadership",
"tech"
]
}
],
...
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"4221": [],
"4227": []
},
"components": {}
}
}
],
"reviews": {},
"errored_files": {}
}
Classify and Unbundle
{
"file_version": 1,
"submission_id": 191111,
"etl_output": "indico-file:///storage/submission/13733/19302/11111/etl_output.json",
"results": {
"document": {
"results": {
"classify and unbundle invoices": [],
"class filter for invoices": {
"field_id": 6866379,
"confidence": {
"Invoices": 0.999618701653154,
"Receipts": 0.0003806857502734332,
"Other": 6.125965725590266e-7
},
"label": "Other"
}
},
"rejected": {
"classify and unbundle invoices": [],
"class filter for invoices": []
}
}
}
}
{
"file_version": 3,
"submission_id": 97349,
"modelgroup_metadata": {
"5207": {
"id": 5207,
"task_type": "classification_unbundling",
"name": "classify_unbundle_readapiv2_no_unpack",
"selected_model": {
"id": 8302,
"model_type": "unbundle"
}
},
...
},
"component_metadata": {
...
"17002": {
"id": 17002,
"name": "Classify & Unbundle",
"component_type": "model_group",
"task_type": "classification_unbundling"
},
...
},
"submission_results": [
{
"submissionfile_id": 93641,
"etl_output": "indico-file:///storage/submission/4903/97349/93641/etl_output.json",
"input_filename": "bundled_doc-1.pdf",
"input_filepath": "indico-file:///storage/submission/4903/97349/93641.pdf",
"input_filesize": 361241,
"model_results": {
"ORIGINAL": {
...
"5207": [
{
"label": "annual report",
"spans": [
{
"start": 0,
"end": 2426,
"page_num": 0
},
{
"start": 2427,
"end": 2463,
"page_num": 1
}
],
"span_id": "93641:c:17002:idx:0",
"confidence": {
"annual report": 0.9949126243591309,
"avg annual report": 0.002969800028949976,
"financial disclosures": 0.002117517637088895
},
"field_id": 429947,
"location_type": "exact"
},
{
"label": "financial disclosures",
"spans": [
{
"start": 2464,
"end": 3659,
"page_num": 2
},
{
"start": 3660,
"end": 5017,
"page_num": 3
},
{
"start": 5018,
"end": 6477,
"page_num": 4
},
{
"start": 6478,
"end": 8038,
"page_num": 5
},
{
"start": 8039,
"end": 9392,
"page_num": 6
},
{
"start": 9393,
"end": 10802,
"page_num": 7
},
{
"start": 10803,
"end": 11939,
"page_num": 8
},
{
"start": 11940,
"end": 13167,
"page_num": 9
},
{
"start": 13168,
"end": 14544,
"page_num": 10
},
{
"start": 14545,
"end": 15919,
"page_num": 11
},
{
"start": 15920,
"end": 17441,
"page_num": 12
},
{
"start": 17442,
"end": 18723,
"page_num": 13
},
{
"start": 18724,
"end": 19341,
"page_num": 14
},
{
"start": 19342,
"end": 22187,
"page_num": 15
},
{
"start": 22188,
"end": 25429,
"page_num": 16
},
{
"start": 25430,
"end": 27204,
"page_num": 17
},
{
"start": 27205,
"end": 30796,
"page_num": 18
}
],
"span_id": "93641:c:17002:idx:1",
"confidence": {
"annual report": 0.0030741794034838678,
"avg annual report": 0.0038796598091721536,
"financial disclosures": 0.9930461049079895
},
"field_id": 429947,
"location_type": "exact"
},
{
"label": "avg annual report",
"spans": [
{
"start": 30797,
"end": 32809,
"page_num": 19
},
{
"start": 32810,
"end": 36737,
"page_num": 20
}
],
"span_id": "93641:c:17002:idx:2",
"confidence": {
"annual report": 0.003948610741645098,
"avg annual report": 0.9928027987480164,
"financial disclosures": 0.003248531138524413
},
"field_id": 429947,
"location_type": "exact"
}
]
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"5208": [],
"5210": [],
"5209": [],
"5207": []
},
"components": {}
}
}
],
"reviews": {},
"errored_files": {}
}
Extraction
For extraction results, each extracted text (identified by character start and end indexes and text) displays the confidence score (under the confidence nesting) for each of the classes in the agent. The class with the highest confidence is identified as the label (under the label key). Confidence levels are between 0 and 1. You may notice <PAD> in the list of labels. <PAD> indicates the absence of a label.
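A common use of these scores is to keep only the predictions whose chosen label clears a threshold. The following is a minimal sketch. The function names and the 0.9 default threshold are illustrative assumptions, not platform defaults.

```python
from typing import Any, Dict, List


def label_confidence(prediction: Dict[str, Any]) -> float:
    """Return the confidence score of the class chosen as the prediction's label."""
    return prediction["confidence"][prediction["label"]]


def confident_predictions(
    predictions: List[Dict[str, Any]], threshold: float = 0.9
) -> List[Dict[str, Any]]:
    """Keep predictions whose label confidence is at or above the threshold."""
    return [p for p in predictions if label_confidence(p) >= threshold]


# Values adapted from the version 1 extraction sample; the second entry is made up.
predictions = [
    {
        "label": "Invoice Number",
        "text": "redacted",
        "confidence": {"Invoice Number": 0.9999995231628418, "<PAD>": 3.4748592270261724e-7},
    },
    {
        "label": "Vendor",
        "text": "redacted",
        "confidence": {"Vendor": 0.55, "<PAD>": 0.45},
    },
]
print([p["label"] for p in confident_predictions(predictions)])  # ['Invoice Number']
```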
{
"submission_id": 91,
"etl_output": "indico-file:///storage/submission/ocr_output.json",
"errors": [],
"results": {
"document": {
"results": {
"Invoices Extraction": [
{
"start": 115,
"end": 126,
"label": "Invoice Number",
"text": "redacted",
"confidence": {
"Line Item": 4.491627958458366e-9,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Invoice Number": 0.9999995231628418,
"Line Item Value": 1.2659886472476956e-8
}
},
{
"start": 978,
"end": 1004,
"label": "Vendor",
"text": "redacted",
"confidence": {
"Line Item": 2.7375634203963273e-8,
"Total": 1.2861304909961291e-8,
"Vendor": 1,
"<PAD>": 1.0245797099628362e-8,
"Invoice Number": 3.3023983547764146e-8,
"Line Item Value": 4.132349928909207e-8
}
},
{
"start": 1960,
"end": 2069,
"label": "Line Item",
"text": "redacted",
"confidence": {
"Line Item": 0.9999918937683105,
"Total": 7.112359412531077e-8,
"Vendor": 0.0000023915347355796257,
"<PAD>": 0.000003880942585965386,
"Invoice Number": 1.288364899210137e-7,
"Line Item Value": 0.0000016712019714759663
}
},
...
]
}
}
}
}
{
"file_version": 3,
"submission_id": 39855,
"modelgroup_metadata": {
"18584": {
"id": 18584,
"task_type": "annotation",
"name": "model1",
"selected_model": {
"id": 23032,
"model_type": "finetune"
}
},
...
},
"component_metadata": {
"82564": {
"id": 82564,
"name": null,
"component_type": "input_ocr_extraction"
},
"82565": {
"id": 82565,
"name": null,
"component_type": "output_json_formatter"
},
"82567": {
"id": 82567,
"name": "Document Extraction",
"component_type": "model_group"
},
"82569": {
"id": 82569,
"name": "Document Extraction",
"component_type": "model_group"
},
"84000": {
"id": 84000,
"name": "Date and Price Group",
"component_type": "link_label"
},
"91655": {
"id": 91655,
"name": "Standard Output",
"component_type": "default_output"
}
},
"submission_results": [
{
"submissionfile_id": 58851,
"etl_output": "indico-file:///storage/submission/20869/39855/58851/etl_output.json",
"input_filename": "redacted",
"input_filepath": "indico-file:///storage/submission/20869/39855/58851.txt",
"input_filesize": 2919,
"model_results": {
"ORIGINAL": {
"18584": [
{
"label": "person_name",
"spans": [
{
"start": 80,
"end": 98,
"page_num": 0
}
],
"span_id": "58851:c:82567:idx:1",
"confidence": {
"address": 0.0017796654719859362,
"category": 0.00037611479638144374,
"date": 0.0002551926299929619,
"email": 0.1306380033493042,
"person_name": 0.8642205595970154,
"phone": 0.00008364625682588667,
"price": 0.00075566116720438,
"unformatted_summary": 0.0008147378102876246,
"unformatted_text": 0.0003376395034138113
},
"field_id": 7995219,
"location_type": "exact",
"text": "redacted redacted",
"groupings": [],
"normalized": {
"text": "redacted redacted",
"start": 80,
"end": 98,
"structured": null,
"formatted": "redacted redacted",
"status": "SUCCESS",
"comparison_type": "string",
"comparison_value": "redacted redacted",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
...
],
...
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"18584": [
{
"label": "email",
"spans": [
{
"start": 4,
"end": 73,
"page_num": 0
}
],
"span_id": "58851:c:82567:idx:0",
"confidence": {
"address": 0.0001422955101588741,
"category": 0.00004625660221790895,
"date": 0.00002712739478738513,
"email": 0.995801568031311,
"person_name": 0.0005087707540951669,
"phone": 0.000011167575394210871,
"price": 0.00023257092107087374,
"unformatted_summary": 0.0017664311453700066,
"unformatted_text": 0.00026696716668084264
},
"field_id": 7995218,
"location_type": "exact",
"text": "\"redacted\" <redacted>",
"groupings": []
},
...
],
...
},
"components": {}
}
},
...
],
"reviews": {},
"errored_files": {}
}
Other JSON Output Styles
Linked Labels
Your JSON output document may look slightly different if your agent is downstream from a linked labels transformer in a workflow. The results document will continue to follow the standard format with the addition of a groupings key. In the groupings list, your linked label groups are identified by their index (group_index). Each instance of a group has a unique index, so labels that share an index are in the same label group.
If a prediction went through the transformer but was determined not to be part of any group, it will contain an empty groupings list (i.e., "groupings": []).
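The grouping rule described above can be sketched as follows: predictions are bucketed by group name and index, and predictions with an empty groupings list are left out. The function name is hypothetical, and the data mirrors the version 1 linked labels sample in this guide.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def group_linked_labels(
    predictions: List[Dict[str, Any]]
) -> Dict[Tuple[str, int], List[Dict[str, Any]]]:
    """Bucket predictions by (group_name, group_index)."""
    groups: Dict[Tuple[str, int], List[Dict[str, Any]]] = defaultdict(list)
    for prediction in predictions:
        # An empty or missing groupings list means the prediction is ungrouped.
        for grouping in prediction.get("groupings", []):
            key = (grouping["group_name"], grouping["group_index"])
            groups[key].append(prediction)
    return dict(groups)


predictions = [
    {"label": "Invoice Number", "text": "10000023222", "groupings": []},
    {
        "label": "Line Item Description",
        "text": "Hospitalization Level 2",
        "groupings": [{"group_name": "Line Item", "group_index": 1}],
    },
    {
        "label": "Line Item Value",
        "text": "350.00",
        "groupings": [{"group_name": "Line Item", "group_index": 1}],
    },
]
for (name, index), members in group_linked_labels(predictions).items():
    print(name, index, [m["text"] for m in members])
# Line Item 1 ['Hospitalization Level 2', '350.00']
```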
{
"submission_id": 9,
"etl_output": "indico-file:///storage/submission/ocr_output.json",
"errors": [],
"results": {
"document": {
"results": {
"Invoices Extraction": [
{
"start": 115,
"end": 126,
"label": "Invoice Number",
"text": "10000023222",
"confidence": {
"Line Item Description": 4.491627958458366e-9,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Invoice Number": 0.9999995231628418,
"Line Item Value": 1.2659886472476956e-8
},
"groupings": []
},
{
"start": 321,
"end": 340,
"label": "Line Item Description",
"text": "Hospitalization Level 2",
"confidence": {
"Invoice Number": 4.491627958458366e-9,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Line Item Description": 0.9999995231628418,
"Line Item Value": 1.2659886472476956e-8
},
"groupings": [
{
"group_name": "Line Item",
"group_index": 1
}
]
},
{
"start": 351,
"end": 359,
"label": "Line Item Value",
"text": "350.00",
"confidence": {
"Line Item Value": 0.9999995231628418,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Invoice Number": 1.0310143494507429e-7,
"Line Item Description": 1.2659886472476956e-8
},
"groupings": [
{
"group_name": "Line Item",
"group_index": 1
}
]
},
...
]
}
}
}
}
{
"file_version": 3,
"submission_id": 23111,
"modelgroup_metadata": {
"9111": {
"id": 9111,
"task_type": "classification_unbundling",
"name": "multi_models_classify_unbundle",
"selected_model": {
"id": 11111,
"model_type": "unbundle"
}
},
"9112": {
"id": 9112,
"task_type": "classification",
"name": "multi_models_classification_model",
"selected_model": {
"id": 11112,
"model_type": "tfidf_gbt"
}
},
"9113": {
"id": 9113,
"task_type": "annotation",
"name": "multi_models_extraction_model",
"selected_model": {
"id": 11114,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 21111,
"etl_output": "indico-file:///storage/submission/11599/23532/2111/etl_output.json",
"input_filename": "MultiClass.pdf",
"input_filepath": "indico-file:///storage/submission/11599/23111/24607.pdf",
"input_filesize": 15103,
"model_results": {
"ORIGINAL": {
"9111": [
{
"label": "financial disclosures",
"spans": [
{
"start": 0,
"end": 31,
"page_num": 0
}
],
"span_id": "24607:c:47034:idx:0",
"confidence": {
"annual report": 0.01469539012759924,
"avg annual report": 0.005166168324649334,
"financial disclosures": 0.9801384210586548
},
"field_id": 6403711
},
{
"label": "financial disclosures",
"spans": [
{
"start": 32,
"end": 89,
"page_num": 1
}
],
"span_id": "24607:c:47034:idx:1",
"confidence": {
"annual report": 0.011918464675545692,
"avg annual report": 0.004315620753914118,
"financial disclosures": 0.9837659001350403
}
}
],
"9113": []
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"9111": [],
"9112": [],
"9113": []
},
"components": {}
}
}
],
"reviews": {}
}
Forms Output
Output from a Forms agent contains a recognized_forms key for the agent, which details all the forms recognized in the output file, and a form_version key for each prediction.
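To show how form_version can be used, the sketch below groups filled-in zone values by the form version they belong to. It assumes the pages -> zones layout seen in the version 3 sample in this guide. The function name is hypothetical, and skipping blank zones is a design choice of this example rather than platform behavior.

```python
from collections import defaultdict
from typing import Any, Dict, List


def values_by_form_version(pages: List[Dict[str, Any]]) -> Dict[str, Dict[str, str]]:
    """Return {form_version: {label: value}} for zones that contain a value."""
    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for page in pages:
        for zone in page.get("zones", []):
            value = zone.get("value", "")
            if value:  # skip zones the form left blank
                # If a label repeats within a form version, the later zone wins.
                grouped[zone["form_version"]][zone["label"]] = value
    return dict(grouped)


pages = [
    {
        "zones": [
            {"label": "Date", "form_version": "Acord-125-2016-03", "value": ""},
            {"label": "Carrier", "form_version": "Acord-125-2016-03", "value": "Anonymous Insurance"},
            {"label": "PolicyNumber", "form_version": "Acord-125-2016-03", "value": "0123456789"},
        ]
    }
]
print(values_by_form_version(pages))
```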
{
"file_version": 1,
"submission_id": 11111,
"etl_output": "indico-file:///storage/submission/13708/11111/1111/etl_output.json",
"results": {
"document": {
"results": {
"ACORD Model": [
{
"start": null,
"end": null,
"label": "Agency",
"confidence": {
"Agency": 1.0
},
"field_id": 6801111,
"top": 219,
"bottom": 530,
"left": 63,
"right": 1286,
"page_num": 0,
"type": "text",
"text": "My Insurance Group \n1234 Main St. \nBoston, MA 02111",
"normalized": {
"text": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"start": null,
"end": null,
"structured": null,
"formatted": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
...
{
"file_version": 3,
"submission_id": 19111,
{
"processed_file_name": "indico-blob:///storage/submission/0000/6803/0000.pdf",
"recognized_forms": {
"Acord-125-2016-03": [
0
]
},
"etl_output_url": "indico-blob:///storage/submission/2851/6803/0000/etl_output.json"
}
],
"pages": [
{
"template_name": "299485",
"template_page_number": 0,
"match_confidence": 0.98,
"zones": [
{
"top": 119,
"bottom": 231,
"left": 2101,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "",
"label": "Date",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 219,
"bottom": 530,
"left": 63,
"right": 1286,
"page_num": 0,
"type": "text",
"text": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"label": "Agency",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111"
},
{
"top": 219,
"bottom": 331,
"left": 1262,
"right": 2275,
"page_num": 0,
"type": "text",
"text": "Anonymous Insurance",
"label": "Carrier",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": "Anonymous Insurance"
},
{
"top": 219,
"bottom": 331,
"left": 2251,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "",
"label": "NAICCodePg1",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 325,
"bottom": 431,
"left": 1262,
"right": 2200,
"page_num": 0,
"type": "text",
"text": "",
"label": "PolicyProgramName",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 319,
"bottom": 431,
"left": 2176,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "",
"label": "CompanyProductCode",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 419,
"bottom": 531,
"left": 1262,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "0123456789",
"label": "PolicyNumber",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": "0123456789"
},
...
Normalization/Typed Answer Keys
If you have used Typed Answer Keys (TAK) to normalize your output, that normalization will appear in your output file. Each normalized prediction includes a normalized section, which contains the formatted value, whether or not the normalization was successful, and the original text.
A Note on Normalization with Autoreview
Autoreview users who have normalized their results will need to modify their autoreview scripts and their integration with downstream systems. Downstream systems and autoreview scripts should be updated to use the normalized values as the final value rather than the original text.
To update autoreview scripts: Change prediction["text"] to prediction["normalized"]["formatted"] in autoreview and post-processing code.
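The change described above might look like the helper below in post-processing code. This is a sketch only: the function name is hypothetical, and falling back to the original text when normalization is missing or unsuccessful is an assumption of this example, not documented behavior.

```python
from typing import Any, Dict, Optional


def final_value(prediction: Dict[str, Any]) -> Optional[str]:
    """Prefer the normalized, formatted value over the raw extracted text."""
    normalized = prediction.get("normalized") or {}
    if normalized.get("status") == "SUCCESS" and normalized.get("formatted") is not None:
        return normalized["formatted"]
    # Assumption: fall back to the raw text when no usable normalization exists.
    return prediction.get("text")


# Shaped like the "Income Amount" prediction in the sample that follows.
prediction = {
    "label": "Income Amount",
    "text": "000",
    "normalized": {"text": "000", "formatted": "$0.00", "status": "SUCCESS"},
}
print(final_value(prediction))  # $0.00
```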
{
"file_version": 1,
"submission_id": 91111,
"etl_output": "indico-file:///storage/submission/3106/93449/1111/etl_output.json",
"results": {
"document": {
"results": {
"Test Workflow": [
{
"start": 410,
"end": 432,
"label": "Income Amount",
"confidence": {
"Asset Value": 0.001380413887090981,
"Date of Appointment": 3.0055036859266693e-7,
"Department": 3.4668929060899245e-7,
"Income Amount": 0.7819252610206604,
"Liability Amount": 8.27995336294407e-6,
"Liability Type": 1.7472679019192583e-6,
"Name": 1.465798504796112e-6,
"Position": 7.679560098949878e-7,
"Previous Organization": 1.8361643014941365e-6,
"Previous Position": 1.587482643117255e-6
},
"field_id": 91111,
"page_num": 0,
"text": "00 6,575.91 6,575.91 0",
"normalized": {
"text": "00 6,575.91 6,575.91 0",
"start": 410,
"end": 412,
"structured": {
"currency": null,
"amount": 0.0,
"currency_symbol": null
},
"formatted": "$0.00",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
{
"start": 433,
"end": 436,
"label": "Income Amount",
"confidence": {
"Asset Value": 0.010080317035317421,
"Date of Appointment": 5.33476793407317e-7,
"Department": 5.270041469884745e-7,
"Income Amount": 0.6256295442581177,
"Liability Amount": 0.000012726128261419944,
"Liability Type": 5.120524292578921e-6,
"Name": 5.42804627912119e-7,
"Position": 7.942233537505672e-7,
"Previous Organization": 4.114456714887638e-6,
"Previous Position": 6.950397164473543e-6
},
"field_id": 91111,
"page_num": 0,
"text": "000",
"normalized": {
"text": "000",
"start": 433,
"end": 436,
"structured": {
"currency": null,
"amount": 0.0,
"currency_symbol": null
},
"formatted": "$0.00",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
...
{
"file_version": 3,
"submission_id": 25542,
"modelgroup_metadata": {
"12842": {
"id": 12842,
"task_type": "annotation",
"name": "invoices",
"selected_model": {
"id": 18969,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 26781,
"etl_output": "indico-file:///storage/submission/10000/2000/20001/etl_output.json",
"input_filename": "redacted",
"input_filepath": "indico-file:///storage/submission/15200/25542/26781.pdf",
"input_filesize": 105501,
"model_results": {
"ORIGINAL": {
"12842": [
{
"label": "vendor",
"spans": [
{
"start": 29,
"end": 54,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:0",
"confidence": {
"invoice": 0.00000904287207958987,
"vendor": 0.9999739527702332
},
"field_id": 7220809,
"text": "redacted",
"normalized": {
"text": "redacted",
"start": 29,
"end": 54,
"structured": null,
"formatted": "redacted",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
...
]
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"12842": []
},
"components": {}
}
},
...
],
"reviews": {}
}
Output with Review/Autoreview
Raw result files remain unaltered to guarantee that all reviewers can access and evaluate the original file consistently. Results that undergo review closely resemble standard results but include additional nested sections: pre_review, post_reviews, and final for each agent group, along with reviews_meta. Confidence levels are not provided for results that incorporate reviewer corrections.
A Note on Normalization with Autoreview
Autoreview users who have normalized their results will need to modify their autoreview scripts and their integration with downstream systems. Downstream systems and autoreview scripts should be updated to use the normalized values as the final value rather than the original text.
To update autoreview scripts: Change prediction["text"] to prediction["normalized"]["formatted"] in autoreview and post-processing code, or set prediction["normalized"]["formatted"] in addition to prediction["text"] in autoreview alone.
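For downstream systems that only need the reviewed outcome, a small helper can pull the final section for each agent group. The sketch below assumes the version 1 reviewed layout shown in the sample that follows. Both function names are hypothetical, and passing unreviewed entries through unchanged is a choice made for this example.

```python
from typing import Any, Dict


def final_reviewed_results(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map each agent group name to its "final" section when one exists."""
    model_results = result["results"]["document"]["results"]
    finals: Dict[str, Any] = {}
    for name, value in model_results.items():
        if isinstance(value, dict) and "final" in value:
            finals[name] = value["final"]
        else:
            # Entry has no review sections; keep it as is.
            finals[name] = value
    return finals


def any_review_rejected(result: Dict[str, Any]) -> bool:
    """Return True if any entry in reviews_meta is marked as rejected."""
    return any(meta.get("review_rejected") for meta in result.get("reviews_meta", []))


reviewed = {
    "results": {
        "document": {
            "results": {
                "bar": {
                    "pre_review": [{"etl": "Lorem Ipsum"}],
                    "post_reviews": [[{"etl": "dolor sit amet"}]],
                    "final": [{"etl": "consectetur adipiscing elit"}],
                }
            }
        }
    },
    "reviews_meta": [{"review_id": 1, "review_rejected": False}],
}
print(final_reviewed_results(reviewed))
print(any_review_rejected(reviewed))
```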
{
"submission_id": 23,
"etl_output": "foo_etl-output.json",
"errors": [],
"results": {
"document": {
"results": {
"bar": {
"pre_review": [
{
"etl": "Lorem Ipsum"
}
],
"post_reviews": [
[
{
"etl": "dolor sit amet"
}
],
[
{
"etl": "consectetur adipiscing elit"
}
]
],
"final": [
{
"etl": "consectetur adipiscing elit"
}
]
}
}
}
},
"reviews_meta": [
{
"review_id": 2,
"reviewer_id": 2,
"review_notes": "Fooey",
"review_rejected": false,
"review_type": "manual"
},
{
"review_id": 1,
"reviewer_id": 1,
"review_notes": null,
"review_rejected": false,
"review_type": "manual"
}
],
"file_version": 1,
"review_id": 1,
"reviewer_id": 1,
"review_notes": null,
"review_rejected": false,
"review_type": "manual"
}
{
"file_version": 3,
"submission_id": 21111,
"modelgroup_metadata": {
"12111": {
"id": 12111,
"task_type": "annotation",
"name": "invoices",
"selected_model": {
"id": 18111,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 26111,
"etl_output": "indico-file:///storage/submission/10000/2000/20001/etl_output.json",
"input_filename": "redacted",
"input_filepath": "indico-file:///storage/submission/15200/25542/1111.pdf",
"input_filesize": 105501,
"model_results": {
"ORIGINAL": {
"11111": [
{
"label": "vendor",
"spans": [
{
"start": 29,
"end": 54,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:0",
"confidence": {
"invoice": 0.00000904287207958987,
"vendor": 0.9999739527702332
},
"field_id": 7220809,
"text": "redacted",
"normalized": {
"text": "redacted",
"start": 29,
"end": 54,
"structured": null,
"formatted": "redacted",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
...
]
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"12842": []
},
"components": {}
}
}
],
"reviews": {}
}
Output with Summarization Enabled
- Summarization (Version 1): In version 1, the output with summaries contains similar dictionary outputs to single classification agents with the addition of both summary text and citation data contained in the text key, where each agent’s output is structured as a dictionary rather than an array.
- Summarization (Version 3): In version 3, the output is structured as an array with one entry per file in the submission. Each entry includes the summary text and citation data, which point back to segments in the source document.
Citation formatting
The citations included in your output have two ranges:
- One that points to the text in the source document
- Another that links to the text in the generated summary
Each citation corresponds to a specific segment of the generated summary. For instance, "[1-2]" indicates that a section of the summary is based on two different parts of the source document. These citations replace full phrases or sentences in the summary text and are indicated by numbers such as "[0]" or similar.
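The two ranges can be resolved against their respective texts as sketched below. This example assumes that start and end are character offsets with an exclusive end, that document offsets index into the full source text, and that response offsets index into the summary's text value. None of these details are confirmed by this guide, and the function name and sample data are hypothetical.

```python
from typing import Any, Dict, List


def resolve_citations(prediction: Dict[str, Any], document_text: str) -> List[Dict[str, Any]]:
    """Pair each cited summary segment with the source text it points to."""
    summary = prediction.get("text", "")
    resolved = []
    for citation in prediction.get("citations", []):
        doc, resp = citation["document"], citation["response"]
        resolved.append(
            {
                # Assumption: offsets are character indexes, end exclusive.
                "summary_segment": summary[resp["start"]:resp["end"]],
                "source_segment": document_text[doc["start"]:doc["end"]],
                "page_num": doc.get("page_num"),
            }
        )
    return resolved


# Made-up summary and source text for illustration.
prediction = {
    "text": "Alpha [0]. Beta [1].",
    "citations": [
        {
            "document": {"start": 0, "end": 12, "page_num": 0},
            "response": {"start": 6, "end": 9},
        }
    ],
}
print(resolve_citations(prediction, "first source. second source."))
```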
"summary_model": {
"field_id": 162,
"confidence": {
"foo": 1.0
},
"label": "foo",
"text": "• This document ... the company [1-2]. \n• Confidential ... third parties [4][6]. \n• The ... [12-13]. \n• Furthermore, ... date [23].\n• In exchange for ... grant [31][36][39]. \n• The agreement ... Virginia [7]. \n• The ... employment [38][42].",
"citations": [
{
"document": {
"start": 0,
"end": 394,
"page_num": 0
},
"response": {
"start": 188,
"end": 191
}
},
...
]
}
},
...
"model_results": {
"ORIGINAL": {
"160": [
{
"field_id": 160,
"confidence": {
"foo": 1.0
},
"label": "foo",
"text": "• The document ... or boat [0][4].\n• ... wolves [1].\n• Gates ... years [6].",
"citations": [
{
"document": {
"start": 16739,
"end": 17085,
"page_num": 6
},
"response": {
"start": 440,
"end": 443
}
},
...
],
"ctx_id": "75:c:489:idx:1"
},
...
],
...
