Interpreting Your JSON Results
Introduction
This document serves as a guide to understanding the JSON model results files generated by the Indico Intake platform. The results file you see depends on several factors: the type of model, the default output version for your cluster, and whether optional functionalities like Review, Typed Answer Keys, Linked Labels, or Summarization are enabled.
This guide will walk you through the structure of each of these results files, explain the significance of each element, and provide practical examples to help you effectively utilize this data in your applications.
Output Overview
- Classification models, including Classify and Unbundle models, output predicted classes with confidence scores.
- Extraction models: output extracted text with confidence levels for each prediction.
- Linked Labels transformers: group linked labels using an index in the output dictionary.
- Forms models: include keys like recognized_forms and form_version to identify processed forms.
- Review or Autoreview: indicates the review status of model outputs.
- Object Detection
- Summarization: output varies between results versions. For version 1, results mirror single classification output but also contain summary text with citations. For version 3, the output contains an array for each unbundled file in the submission.
Each file also features links to ETL (Extract, Transform, Load) and OCR (Optical Character Recognition) outputs, which offer detailed information about each page within a submission. Read more about understanding ETL and OCR output in our Interpreting your OCR and ETL Output document.
In addition to model types, the following features also impact your output:
- Typed Answer Keys/Output Normalization: adds a "normalized" section, including a "formatted" value, for field normalization.
The Span Key
Models like Extraction and Classify and Unbundle return labels containing a "spans" key, which represents semantically related text segments from the source document. The "ctx_id" field, if provided, tracks the parent span to organize related data through a workflow. If no "ctx_id" is provided, the model ran on the entire document.
Understanding Your JSON Results
Indico currently offers two versions of results files, version 1 and version 3. The file version you are using is accessible in the first line of the JSON output file. The formats have slight variations.
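A minimal sketch of branching on the version before parsing follows. The helper name results_version is hypothetical, and treating a missing "file_version" key as version 1 is an assumption based on the examples in this guide, several of which omit the key.

```python
import json

def results_version(results: dict) -> int:
    """Return the results file version as an integer."""
    # Assumption: a file without "file_version" follows the version 1 layout.
    return int(results.get("file_version", 1))

# Trimmed stand-ins for real results files.
v1 = json.loads('{"file_version": 1, "submission_id": "12345", "results": {}}')
v3 = json.loads('{"file_version": 3, "submission_id": 49994, "submission_results": []}')

for results in (v1, v3):
    if results_version(results) == 3:
        print("version 3: read predictions from 'submission_results'")
    else:
        print("version 1: read predictions from 'results'")
```

Reading the key by name rather than by its position in the file keeps the check working even when the key is not the first entry.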
Classification
For every classification variant, results are reported per model group (keyed by model group name in version 1 and by model group ID in version 3) as a dictionary that maps each class name to the confidence level of its prediction. Confidence levels are between 0 and 1.
Single/Multi Classification
For single classification, results are present for every model group as keys in the "results" dictionary.
For multi-classification, class name and confidence are present for every class in the model.
{
"file_version":1,
"submission_id": "12345",
"etl_output": "indico-file:///storage/submission/ocr_output.json",
"results": {
"document": {
"results": {
"Test Model 1": {
"Class 1": 0.688137223450789,
"Class 2": 0.08451932419717022,
"Class 3": 0.06304424768016251,
"Class 4": 0.021362314937542252
}
}
}
}
}
{
"file_version":3,
"modelgroup_metadata":{
"18":{
"name":"My model",
"selected_model_id": 13,
"task_type":"classification"
},
"19":{
"name":"My model",
"selected_model_id": 18,
"task_type":"annotation"
}
},
"review":{},
"submission_id":49994,
"submission_results":[
{
"submissionfile_id": 49,
"input_filename":"airline-complaint.pdf",
"input_filepath":"indico-file:///storage/submission/13076/49999/91.pdf",
"etl_output":"indico-file:///storage/submission/13076/49999/91/etl_output.json",
"model_results":{
"ORIGINAL":{
"18":[
{
"confidence":{
"neg":0.7378238702168476,
"pos":0.26217612978315247
},
"label":"neg"
}
]
}
}
},
{
"submissionfile_id": 49,
"input_filename":"airline-complaint.pdf",
"input_filepath":"indico-file:///storage/submission/13076/49999/49.pdf",
"etl_output":"indico-file:///storage/submission/13076/49999/49/etl_output.json",
"model_results":{
"ORIGINAL":{
"18":[
{
"confidence":{
"pos":0.7378238702168476,
"neg":0.26217612978315247
},
"label":"pos"
}
],
"19": [
{
"confidence": {
"Line Item": 0.9
},
"label":"Line Item",
"span_id": "span:123",
"spans": [
{
"start": 3,
"end": 8,
"page_num": 0,
"text": "Total"
}
]
}
]
}
}
}
]
}
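As a rough sketch, the predicted class can be recovered from either layout by taking the highest-scoring entry of the confidence dictionary. The helper name top_class is hypothetical, and the sample values are copied from the examples above.

```python
def top_class(confidences):
    """Return (class_name, confidence) for the highest-scoring class."""
    label = max(confidences, key=confidences.get)
    return label, confidences[label]

# Version 1: the model group's entry is the confidence dictionary itself.
v1_model_result = {
    "Class 1": 0.688137223450789,
    "Class 2": 0.08451932419717022,
    "Class 3": 0.06304424768016251,
    "Class 4": 0.021362314937542252,
}

# Version 3: each prediction carries "confidence" plus the chosen "label".
v3_prediction = {
    "confidence": {"neg": 0.7378238702168476, "pos": 0.26217612978315247},
    "label": "neg",
}

print(top_class(v1_model_result))
print(top_class(v3_prediction["confidence"]))
```

In version 3 the "label" key already names the chosen class, so the helper is mainly useful for version 1 or for reading the score of the winning class.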
Classify and Unbundle
{
"file_version": 1,
"submission_id": 191111,
"etl_output": "indico-file:///storage/submission/13733/19302/11111/etl_output.json",
"results": {
"document": {
"results": {
"classify and unbundle invoices": [],
"class filter for invoices": {
"field_id": 6866379,
"confidence": {
"Invoices": 0.999618701653154,
"Receipts": 0.0003806857502734332,
"Other": 6.125965725590266e-7
},
"label": "Other"
}
},
"rejected": {
"classify and unbundle invoices": [],
"class filter for invoices": []
}
}
}
}
{
"file_version":3,
"modelgroup_metadata":{
"18":{
"name":"My model",
"selected_model_id": 13,
"task_type":"classification_unbundling"
},
"19":{
"name":"My model",
"selected_model_id": 18,
"task_type":"classification_multiple"
}
},
"review":{},
"submission_id":49999,
"submission_results":[
{
"submissionfile_id": 91,
"input_filename":"airline-complaint.pdf",
"input_filepath":"indico-file:///storage/submission/13076/49994/91.pdf",
"etl_output":"indico-file:///storage/submission/13076/49994/91/etl_output.json",
"model_results":{
"ORIGINAL":{
"18":[
{
"confidence":{
"neg":0.7378238702168476,
"pos":0.26217612978315247
},
"label":"neg",
"span_id": "span:1",
"spans": [
{"start": 30, "end": 100, "page_num": 1},
{"start": 101, "end": 200, "page_num": 2}
]
}
]
}
}
},
{
"submissionfile_id": 92,
"input_filename":"airline-complaint.pdf",
"input_filepath":"indico-file:///storage/submission/13076/49994/92.pdf",
"etl_output":"indico-file:///storage/submission/13076/49994/92/etl_output.json",
"model_results":{
"ORIGINAL":{
"18":[
{
"confidence":{
"pos":0.7378238702168476,
"neg":0.26217612978315247
},
"label":"pos",
"span_id": "span:5",
"spans": [
{"start": 320, "end": 350, "page_num": 3},
{"start": 350, "end": 400, "page_num": 4}
]
},
{
"confidence":{
"neg":0.7378238702168476,
"pos":0.26217612978315247
},
"label":"neg",
"span_id": "span:7",
"spans": [
{"start": 30, "end": 100, "page_num": 1},
{"start": 101, "end": 200, "page_num": 2}
]
}
],
"19": [
{
"confidence": {
"First Class": 0.9,
"Second Class": 0.8,
"Third Class": 0.1
},
"ctx_id": "span:5",
"label": ["First Class", "Second Class"]
}
]
}
}
}
]
}
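The link between "ctx_id" and "span_id" in the version 3 example above can be followed with a short sketch like the one below. The helper name children_by_ctx is hypothetical, and the sample predictions are trimmed copies of the example (model group 18 produces the parent spans, model group 19 runs on them).

```python
from collections import defaultdict

def children_by_ctx(children):
    """Group downstream predictions by the parent span they ran on."""
    grouped = defaultdict(list)
    for child in children:
        # A missing "ctx_id" (key None) means the model ran on the entire document.
        grouped[child.get("ctx_id")].append(child)
    return dict(grouped)

# Parent spans from the Classify and Unbundle model group.
parents = [
    {"label": "pos", "span_id": "span:5"},
    {"label": "neg", "span_id": "span:7"},
]
# Downstream multi-classification predictions.
children = [
    {"ctx_id": "span:5", "label": ["First Class", "Second Class"]},
]

grouped = children_by_ctx(children)
for parent in parents:
    linked = grouped.get(parent["span_id"], [])
    print(parent["span_id"], parent["label"], [c["label"] for c in linked])
```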
Extraction
For extraction results, each extracted text (identified by character start and end indexes and "text") displays the confidence score (under the "confidence" nesting) for each of the classes in the model. The class with the highest confidence is identified as the label (under the "label" key). Confidence levels are between 0 and 1. You may notice <PAD> in the list of labels; it indicates the absence of a label.
{
"submission_id": 91,
"etl_output": "indico-file:///storage/submission/ocr_output.json",
"errors": [],
"results": {
"document": {
"results": {
"Invoices Extraction": [
{
"start": 115,
"end": 126,
"label": "Invoice Number",
"text": "10000023222",
"confidence": {
"Line Item": 4.491627958458366e-9,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Invoice Number": 0.9999995231628418,
"Line Item Value": 1.2659886472476956e-8
}
},
{
"start": 978,
"end": 1004,
"label": "Vendor",
"text": "Intrado Digital Media, LLC",
"confidence": {
"Line Item": 2.7375634203963273e-8,
"Total": 1.2861304909961291e-8,
"Vendor": 1,
"<PAD>": 1.0245797099628362e-8,
"Invoice Number": 3.3023983547764146e-8,
"Line Item Value": 4.132349928909207e-8
}
},
{
"start": 1960,
"end": 2069,
"label": "Line Item",
"text": "Indico CEO Tom Wilde to Present at Intelligent Automation for Banking, Financial Services and Insurance Event",
"confidence": {
"Line Item": 0.9999918937683105,
"Total": 7.112359412531077e-8,
"Vendor": 0.0000023915347355796257,
"<PAD>": 0.000003880942585965386,
"Invoice Number": 1.288364899210137e-7,
"Line Item Value": 0.0000016712019714759663
}
}
]
}
}
}
}
{
"file_version": 3,
"submission_id": 25542,
"modelgroup_metadata": {
"12842": {
"id": 12842,
"task_type": "annotation",
"name": "invoices",
"selected_model": {
"id": 18969,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 26781,
"etl_output": "indico-file:///storage/submission/10000/2000/20001/etl_output.json",
"input_filename": "amazonaws.pdf",
"input_filepath": "indico-file:///storage/submission/15200/25542/26781.pdf",
"input_filesize": 105501,
"model_results": {
"ORIGINAL": {
"12842": [
{
"label": "vendor",
"spans": [
{
"start": 29,
"end": 54,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:0",
"confidence": {
"invoice": 0.00000904287207958987,
"vendor": 0.9999739527702332
},
"field_id": 7220809,
"text": "Amazon Web Services, Inc.",
"normalized": {
"text": "Amazon Web Services, Inc.",
"start": 29,
"end": 54,
"structured": null,
"formatted": "Amazon Web Services, Inc.",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
{
"label": "invoice",
"spans": [
{
"start": 164,
"end": 173,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:1",
"confidence": {
"invoice": 0.9999997615814209,
"vendor": 5.3698755664299824e-8
},
"field_id": 7220808,
"text": "234983902",
"normalized": {
"text": "234983902",
"start": 164,
"end": 173,
"structured": null,
"formatted": "234983902",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
}
]
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"12842": []
},
"components": {}
}
}
],
"reviews": {}
}
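One common use of these scores is to keep only predictions whose confidence for their own label clears a threshold. The sketch below assumes a hypothetical helper name, confident_extractions, and an arbitrary default threshold of 0.9; the sample predictions are trimmed from the version 3 example above.

```python
def confident_extractions(predictions, threshold=0.9):
    """Keep predictions whose confidence for their own label meets the threshold."""
    kept = []
    for pred in predictions:
        # Look up the score of the class that was chosen as the label.
        score = pred["confidence"].get(pred["label"], 0.0)
        if score >= threshold:
            kept.append((pred["label"], pred["text"], score))
    return kept

predictions = [
    {
        "label": "vendor",
        "text": "Amazon Web Services, Inc.",
        "confidence": {"invoice": 0.00000904287207958987, "vendor": 0.9999739527702332},
    },
    {
        "label": "invoice",
        "text": "234983902",
        "confidence": {"invoice": 0.9999997615814209, "vendor": 5.3698755664299824e-8},
    },
]

for label, text, score in confident_extractions(predictions):
    print(f"{label}: {text} ({score:.4f})")
```

The right threshold depends on the model and the cost of errors downstream, so treat 0.9 as a placeholder.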
Other JSON Output Styles
Linked Labels
Your JSON output document may look slightly different if your model is downstream from a linked labels transformer in a workflow. The results document will continue to follow the standard format with the addition of a "groupings" key. In the groupings dictionary, your linked labels groups will be identified by their index. Each instance of a group has a unique index, so labels that share an index are in the same label group.
If a prediction went through the transformer but was determined not to be part of any group, it will contain an empty groupings list (i.e., "groupings": []).
{
"submission_id": 9,
"etl_output": "indico-file:///storage/submission/ocr_output.json",
"errors": [],
"results": {
"document": {
"results": {
"Invoices Extraction": [
{
"start": 115,
"end": 126,
"label": "Invoice Number",
"text": "10000023222",
"confidence": {
"Line Item Description": 4.491627958458366e-9,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Invoice Number": 0.9999995231628418,
"Line Item Value": 1.2659886472476956e-8
},
"groupings": []
},
{
"start": 321,
"end": 340,
"label": "Line Item Description",
"text": "Hospitalization Level 2",
"confidence": {
"Invoice Number": 4.491627958458366e-9,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Line Item Description": 0.9999995231628418,
"Line Item Value": 1.2659886472476956e-8
},
"groupings": [
{
"group_name": "Line Item",
"group_index": 1
}
]
},
{
"start": 351,
"end": 359,
"label": "Line Item Value",
"text": "350.00",
"confidence": {
"Line Item Value": 0.9999995231628418,
"Total": 1.0310143494507429e-7,
"Vendor": 3.994096786641421e-8,
"<PAD>": 3.4748592270261724e-7,
"Invoice Number": 1.0310143494507429e-7,
"Line Item Description": 1.2659886472476956e-8
},
"groupings": [
{
"group_name": "Line Item",
"group_index": 1
}
]
},
...
]
}
}
}
}
{
"file_version": 3,
"submission_id": 23111,
"modelgroup_metadata": {
"9111": {
"id": 9111,
"task_type": "classification_unbundling",
"name": "multi_models_classify_unbundle",
"selected_model": {
"id": 11111,
"model_type": "unbundle"
}
},
"9112": {
"id": 9112,
"task_type": "classification",
"name": "multi_models_classification_model",
"selected_model": {
"id": 11112,
"model_type": "tfidf_gbt"
}
},
"9113": {
"id": 9113,
"task_type": "annotation",
"name": "multi_models_extraction_model",
"selected_model": {
"id": 11114,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 21111,
"etl_output": "indico-file:///storage/submission/11599/23532/2111/etl_output.json",
"input_filename": "MultiClass.pdf",
"input_filepath": "indico-file:///storage/submission/11599/23111/24607.pdf",
"input_filesize": 15103,
"model_results": {
"ORIGINAL": {
"9111": [
{
"label": "financial disclosures",
"spans": [
{
"start": 0,
"end": 31,
"page_num": 0
}
],
"span_id": "24607:c:47034:idx:0",
"confidence": {
"annual report": 0.01469539012759924,
"avg annual report": 0.005166168324649334,
"financial disclosures": 0.9801384210586548
},
"field_id": 6403711
},
{
"label": "financial disclosures",
"spans": [
{
"start": 32,
"end": 89,
"page_num": 1
}
],
"span_id": "24607:c:47034:idx:1",
"confidence": {
"annual report": 0.011918464675545692,
"avg annual report": 0.004315620753914118,
"financial disclosures": 0.9837659001350403
}
}
],
"9113": []
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"9111": [],
"9112": [],
"9113": []
},
"components": {}
}
}
],
"reviews": {}
}
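To reassemble linked label groups from the "groupings" key described in this section, predictions can be bucketed by group name and index. This is a sketch under the assumption that predictions sharing both values belong together; the helper name group_linked_labels is hypothetical, and the sample data is trimmed from the version 1 example above.

```python
from collections import defaultdict

def group_linked_labels(predictions):
    """Split predictions into linked groups and ungrouped predictions."""
    groups = defaultdict(list)
    ungrouped = []
    for pred in predictions:
        groupings = pred.get("groupings") or []
        if not groupings:
            # An empty list means the prediction belongs to no group.
            ungrouped.append(pred)
            continue
        for grouping in groupings:
            key = (grouping["group_name"], grouping["group_index"])
            groups[key].append((pred["label"], pred["text"]))
    return dict(groups), ungrouped

predictions = [
    {"label": "Invoice Number", "text": "10000023222", "groupings": []},
    {
        "label": "Line Item Description",
        "text": "Hospitalization Level 2",
        "groupings": [{"group_name": "Line Item", "group_index": 1}],
    },
    {
        "label": "Line Item Value",
        "text": "350.00",
        "groupings": [{"group_name": "Line Item", "group_index": 1}],
    },
]

groups, ungrouped = group_linked_labels(predictions)
print(groups)
print([p["label"] for p in ungrouped])
```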
Forms Output
Output from a Forms model contains a recognized_forms key for the model, which details all the forms recognized in the output file, and a form_version key for each prediction.
Example:
{
"file_version": 1,
"submission_id": 11111,
"etl_output": "indico-file:///storage/submission/13708/11111/1111/etl_output.json",
"results": {
"document": {
"results": {
"ACORD Model": [
{
"start": null,
"end": null,
"label": "Agency",
"confidence": {
"Agency": 1.0
},
"field_id": 6801111,
"top": 219,
"bottom": 530,
"left": 63,
"right": 1286,
"page_num": 0,
"type": "text",
"text": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"normalized": {
"text": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"start": null,
"end": null,
"structured": null,
"formatted": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
{
"file_version": 3,
"submission_id": 19111,
{
"processed_file_name": "indico-blob:///storage/submission/0000/6803/0000.pdf",
"recognized_forms": {
"Acord-125-2016-03": [
0
]
},
"etl_output_url": "indico-blob:///storage/submission/2851/6803/0000/etl_output.json"
}
],
"pages": [
{
"template_name": "299485",
"template_page_number": 0,
"match_confidence": 0.98,
"zones": [
{
"top": 119,
"bottom": 231,
"left": 2101,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "",
"label": "Date",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 219,
"bottom": 530,
"left": 63,
"right": 1286,
"page_num": 0,
"type": "text",
"text": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111",
"label": "Agency",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": "Mediocre Insurance Group \n1234 Main St. \nBoston, MA 02111"
},
{
"top": 219,
"bottom": 331,
"left": 1262,
"right": 2275,
"page_num": 0,
"type": "text",
"text": "Anonymouse Insurance",
"label": "Carrier",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": "Anonymouse Insurance"
},
{
"top": 219,
"bottom": 331,
"left": 2251,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "",
"label": "NAICCodePg1",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 325,
"bottom": 431,
"left": 1262,
"right": 2200,
"page_num": 0,
"type": "text",
"text": "",
"label": "PolicyProgramName",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 319,
"bottom": 431,
"left": 2176,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "",
"label": "CompanyProductCode",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": ""
},
{
"top": 419,
"bottom": 531,
"left": 1262,
"right": 2485,
"page_num": 0,
"type": "text",
"text": "0123456789",
"label": "PolicyNumber",
"confidence": 100,
"form_version": "Acord-125-2016-03",
"value": "0123456789"
},
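Many zones in a Forms result are empty, so a small filter can help when reading them. The sketch below assumes you have already located a page's "zones" list; the helper name filled_zones is hypothetical, and the sample zones are trimmed from the example above.

```python
def filled_zones(zones, form_version=None):
    """Return {label: value} for zones with a non-empty value.

    When form_version is given, only zones from that form version are kept.
    """
    result = {}
    for zone in zones:
        if form_version is not None and zone.get("form_version") != form_version:
            continue
        if zone.get("value"):
            result[zone["label"]] = zone["value"]
    return result

zones = [
    {"label": "Date", "form_version": "Acord-125-2016-03", "value": ""},
    {"label": "Carrier", "form_version": "Acord-125-2016-03", "value": "Anonymouse Insurance"},
    {"label": "PolicyNumber", "form_version": "Acord-125-2016-03", "value": "0123456789"},
]

print(filled_zones(zones, form_version="Acord-125-2016-03"))
```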
Normalization/Typed Answer Keys
If you have used Typed Answer Keys to normalize your output, that normalization will appear in your output file. Each prediction will include a "normalized" section, which contains the text's original form ("text"), the normalized value ("formatted"), and whether the normalization succeeded ("status" and "validation").
A Note on Normalization with Autoreview
Autoreview users who have normalized their results will need to modify their autoreview scripts and their integration with downstream systems. Downstream systems and autoreview scripts should be updated to use the normalized values as the final value rather than the original text.
To update autoreview scripts, change prediction["text"] to prediction["normalized"]["formatted"] in autoreview and post-processing code.
Example:
{
"file_version": 1,
"submission_id": 91111,
"etl_output": "indico-file:///storage/submission/3106/93449/1111/etl_output.json",
"results": {
"document": {
"results": {
"Test Workflow": [
{
"start": 410,
"end": 432,
"label": "Income Amount",
"confidence": {
"Asset Value": 0.001380413887090981,
"Date of Appointment": 3.0055036859266693e-7,
"Department": 3.4668929060899245e-7,
"Income Amount": 0.7819252610206604,
"Liability Amount": 8.27995336294407e-6,
"Liability Type": 1.7472679019192583e-6,
"Name": 1.465798504796112e-6,
"Position": 7.679560098949878e-7,
"Previous Organization": 1.8361643014941365e-6,
"Previous Position": 1.587482643117255e-6
},
"field_id": 91111,
"page_num": 0,
"text": "00 6,575.91 6,575.91 0",
"normalized": {
"text": "00 6,575.91 6,575.91 0",
"start": 410,
"end": 412,
"structured": {
"currency": null,
"amount": 0.0,
"currency_symbol": null
},
"formatted": "$0.00",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
{
"start": 433,
"end": 436,
"label": "Income Amount",
"confidence": {
"Asset Value": 0.010080317035317421,
"Date of Appointment": 5.33476793407317e-7,
"Department": 5.270041469884745e-7,
"Income Amount": 0.6256295442581177,
"Liability Amount": 0.000012726128261419944,
"Liability Type": 5.120524292578921e-6,
"Name": 5.42804627912119e-7,
"Position": 7.942233537505672e-7,
"Previous Organization": 4.114456714887638e-6,
"Previous Position": 6.950397164473543e-6
},
"field_id": 91111,
"page_num": 0,
"text": "000",
"normalized": {
"text": "000",
"start": 433,
"end": 436,
"structured": {
"currency": null,
"amount": 0.0,
"currency_symbol": null
},
"formatted": "$0.00",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
}
]
},
"rejected": { "Test Workflow": [] }
}
}
}
{
"file_version": 3,
"submission_id": 25542,
"modelgroup_metadata": {
"12842": {
"id": 12842,
"task_type": "annotation",
"name": "invoices",
"selected_model": {
"id": 18969,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 26781,
"etl_output": "indico-file:///storage/submission/10000/2000/20001/etl_output.json",
"input_filename": "amazonaws.pdf",
"input_filepath": "indico-file:///storage/submission/15200/25542/26781.pdf",
"input_filesize": 105501,
"model_results": {
"ORIGINAL": {
"12842": [
{
"label": "vendor",
"spans": [
{
"start": 29,
"end": 54,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:0",
"confidence": {
"invoice": 0.00000904287207958987,
"vendor": 0.9999739527702332
},
"field_id": 7220809,
"text": "Amazon Web Services, Inc.",
"normalized": {
"text": "Amazon Web Services, Inc.",
"start": 29,
"end": 54,
"structured": null,
"formatted": "Amazon Web Services, Inc.",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
{
"label": "invoice",
"spans": [
{
"start": 164,
"end": 173,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:1",
"confidence": {
"invoice": 0.9999997615814209,
"vendor": 5.3698755664299824e-8
},
"field_id": 7220808,
"text": "234983902",
"normalized": {
"text": "234983902",
"start": 164,
"end": 173,
"structured": null,
"formatted": "234983902",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
}
]
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"12842": []
},
"components": {}
}
}
],
"reviews": {}
}
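In post-processing code, the switch from the original text to the normalized value can be wrapped in one small function. This is a sketch only: the helper name final_value is hypothetical, and falling back to "text" when no normalized value is present is an assumption, not documented platform behavior.

```python
def final_value(prediction):
    """Prefer the normalized, formatted value; fall back to the original text."""
    normalized = prediction.get("normalized") or {}
    formatted = normalized.get("formatted")
    # Assumption: predictions without a normalized value keep their original text.
    return formatted if formatted is not None else prediction["text"]

with_normalization = {
    "text": "000",
    "normalized": {"text": "000", "formatted": "$0.00", "status": "SUCCESS"},
}
without_normalization = {"text": "234983902"}

print(final_value(with_normalization))
print(final_value(without_normalization))
```

A stricter variant could also inspect the "status" and "validation" entries before trusting the formatted value.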
Output with Review/AutoReview
Raw result files remain unaltered to guarantee that all reviewers can access and evaluate the original file consistently. Results that undergo review closely resemble standard results but include additional nested sections for each model group ("pre_review", "post_reviews", and "final") along with a "reviews_meta" list describing each review. Confidence levels are not provided for results that incorporate reviewer corrections.
See the comments in the examples for in-context explanations.
A Note on Normalization with Autoreview
Autoreview users who have normalized their results will need to modify their autoreview scripts and their integration with downstream systems. Downstream systems and autoreview scripts should be updated to use the normalized values as the final value rather than the original text.
To update autoreview scripts, change prediction["text"] to prediction["normalized"]["formatted"] in autoreview and post-processing code, or set prediction["normalized"]["formatted"] in addition to prediction["text"] in autoreview alone.
Example:
{
"submission_id": 23,
"etl_output": "foo_etl-output.json",
"errors": [],
"results": {
"document": {
"results": {
"bar": {
"pre_review": [
{
"etl": "Lorem Ipsum"
}
],
"post_reviews": [
[{
"etl": "dolor sit amet"
}],
[{
"etl": "consectetur adipiscing elit"
}]
],
"final": [
{
"etl": "consectetur adipiscing elit"
}
]
}
}
}
},
# Gives meta data of the review that rejected the submission
"reviews_meta": [
{
"review_id": 2,
"reviewer_id": 2,
"review_notes": "Fooey",
"review_rejected": false,
"review_type": "manual"
},
{
"review_id": 1,
"reviewer_id": 1,
"review_notes": null,
"review_rejected": false,
"review_type": "manual"
}
],
"file_version": 1,
# Final submissions meta info (kept for backwards compatibility)
"review_id": 1,
"reviewer_id": 1,
"review_notes": null,
"review_rejected": false,
"review_type": "manual"
}
# ==== Rejected in one of the reviews ====
{
"submission_id": 23,
"etl_output": "foo_etl-output.json",
"errors": [],
"results": {
"document": {
"results": {
"bar": {
"pre_review": [
{
"etl": "content"
}
],
"post_reviews": [null], # rejections are put in as null, with other reviews following or preceding it
"final": null # null, when any review rejects the submission
}
}
}
}
}
{
"file_version": 3,
"submission_id": 21111,
"modelgroup_metadata": {
"12111": {
"id": 12111,
"task_type": "annotation",
"name": "invoices",
"selected_model": {
"id": 18111,
"model_type": "finetune"
}
}
},
"submission_results": [
{
"submissionfile_id": 26111,
"etl_output": "indico-file:///storage/submission/10000/2000/20001/etl_output.json",
"input_filename": "amazonaws.pdf",
"input_filepath": "indico-file:///storage/submission/15200/25542/1111.pdf",
"input_filesize": 105501,
"model_results": {
"ORIGINAL": {
"11111": [
{
"label": "vendor",
"spans": [
{
"start": 29,
"end": 54,
"page_num": 0
}
],
"span_id": "26781:c:62510:idx:0",
"confidence": {
"invoice": 0.00000904287207958987,
"vendor": 0.9999739527702332
},
"field_id": 7220809,
"text": "Amazon Web Services, Inc.",
# Only for models with normalized output
"normalized": {
"text": "Amazon Web Services, Inc.",
"start": 29,
"end": 54,
"structured": null,
"formatted": "Amazon Web Services, Inc.",
"status": "SUCCESS",
"validation": [
{
"validation_type": "TYPE_CONVERSION",
"error_message": null,
"validation_status": "SUCCESS"
}
]
}
},
{
"label": "invoice",
"spans": [
{
"start": 164,
"end": 173,
"page_num": 0
}
]
}
]
}
},
"component_results": {
"ORIGINAL": {}
},
"rejected": {
"models": {
"12842": []
},
"components": {}
}
}
],
"reviews": {}
}
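For the version 1 layout shown earlier in this section, a downstream script can tell reviewed, rejected, and unreviewed model results apart with a check like the one below. It is a sketch: the helper name reviewed_result and the three status strings are hypothetical, and the sample data is copied from the version 1 examples.

```python
def reviewed_result(model_result):
    """Return (status, predictions) for one model group's version 1 result."""
    is_reviewed = (
        isinstance(model_result, dict)
        and "pre_review" in model_result
        and "final" in model_result
    )
    if not is_reviewed:
        # No review sections: treat the value as the raw model output.
        return "unreviewed", model_result
    if model_result["final"] is None:
        # "final" is null when any review rejects the submission.
        return "rejected", None
    return "accepted", model_result["final"]

accepted = {
    "pre_review": [{"etl": "Lorem Ipsum"}],
    "post_reviews": [[{"etl": "dolor sit amet"}], [{"etl": "consectetur adipiscing elit"}]],
    "final": [{"etl": "consectetur adipiscing elit"}],
}
rejected = {"pre_review": [{"etl": "content"}], "post_reviews": [None], "final": None}

print(reviewed_result(accepted))
print(reviewed_result(rejected))
```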
Output with Summarization Enabled
Summarization (Version 1): In version 1, the output with summaries resembles the dictionary output of single classification models, with the addition of summary text in the "text" key and citation data in the "citations" key. Each model's output is structured as a dictionary rather than an array.
Summarization (Version 3): In version 3, the output is structured as an array with one entry per file in the submission. Each entry includes the summary text and citation data, which point back to segments in the source document.
Citation Formatting
The citations included in your output have two ranges: one ("document") that points to the text in the source document and another ("response") that points to the corresponding position in the generated summary. Each citation corresponds to a specific segment of the generated summary. For instance, "[1-2]" indicates that a section of the summary is based on two different parts of the source document. Rather than quoting full phrases or sentences from the source, the summary text marks each citation with bracketed numbers such as "[0]".
Example:
"summary_model": {
"field_id": 162,
"confidence": {
"foo": 1.0
},
"label": "foo",
"text": "• This document is a Restrictive Covenant and Confidentiality Agreement between Federal Home Loan Mortgage Corporation (Freddie Mac) and Donald J. Bisenius, effective from 11th March 2001 [0].\n• The agreement states that upon termination of employment with Freddie Mac, all materials embodying Confidential Information must be returned to the company [1-2]. \n• Confidential Information consists of various types of proprietary or confidential information relating to Freddie Mac's business, products, customers, and third parties [4][6]. \n• The agreement also contains a non-competition clause, which prevents the executive from seeking or accepting employment with any competitor of Freddie Mac for twelve months after termination of employment with Freddie Mac [12-13]. \n• Furthermore, the executive is prohibited from soliciting any Freddie Mac managerial employee for a period of twelve months after the termination date [23].\n• In exchange for agreeing to be employed by Freddie Mac under the conditions of this agreement, Freddie Mac will provide the executive with a twelve-month severance and a long-term incentive grant [31][36][39]. \n• The agreement is governed by the laws of the Commonwealth of Virginia [7]. \n• The executive acknowledges that breaching this agreement could result in discipline, including termination of employment [38][42].",
"citations": [
{
"document": {
"start": 0,
"end": 394,
"page_num": 0
},
"response": {
"start": 188,
"end": 191
}
},
...
]
}
},
"model_results": {
"ORIGINAL": {
"160": [
{
"field_id": 160,
"confidence": {
"foo": 1.0
},
"label": "foo",
"text": "• The document discusses three national parks in the United States: Dry Tortugas in Florida, Denali in Alaska, and Gates of the Arctic also in Alaska. \n• Dry Tortugas, established on October 26, 1992, is located at the westernmost end of the Florida Keys and is home to Fort Jefferson, the largest masonry structure in the Western Hemisphere. The park features undisturbed coral reefs and shipwrecks and is only accessible by plane or boat [0][4].\n• Denali, established on February 26, 1917, is centered around Denali, the tallest mountain in North America. The park is home to various wildlife species including grizzly bears, Dall sheep, Porcupine caribou, and wolves [1].\n• Gates of the Arctic, established on December 2, 1980, is the northernmost park in the country, located in Alaska's Brooks Range. The park does not have any facilities and has been home to Alaska Natives for 11,000 years [6].",
"citations": [
{
"document": {
"start": 16739,
"end": 17085,
"page_num": 6
},
"response": {
"start": 440,
"end": 443
}
},
...
],
"ctx_id": "75:c:489:idx:1"
},
{
"field_id": 160,
"confidence": {
"foo": 1.0
},
"label": "foo",
"text": "• The Gateway Arch, located in Missouri, is a 630-foot catenary arch built to commemorate the Lewis and Clark Expedition and the subsequent westward expansion of the country. It's located near the old courthouse, the first site of the Dred Scott case about slavery[0-1].\n• Waterton-Glacier International Peace Park, located in Montana, includes 26 glaciers and 130 named lakes surrounded by Rocky Mountain peaks, and it's home to historic hotels and the landmark Going-to-the-Sun Road[2-3].\n• Glacier Bay, situated in Alaska, contains tidewater glaciers, mountains, fjords, a temperate rainforest, and is home to a large population of grizzly bears, mountain goats, whales, seals, and eagles[4-5].\n• The Grand Canyon, located in Arizona, is a large canyon carved by the Colorado River. It's 277 miles long, up to 1 mile deep, and up to 15 miles wide. The erosion over millions of years has exposed the multicolored layers of the Colorado Plateau in mesas and canyon walls[6-7].",
"citations": [
{
"document": {
"start": 17757,
"end": 18405,
"page_num": 7
},
"response": {
"start": 264,
"end": 269
}
},
...
],
"ctx_id": "75:c:489:idx:2"
},
...
],
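The two citation ranges can be resolved with plain string slicing, as in the sketch below. The helper name resolve_citations is hypothetical, and document_text stands for the full source text, which is assumed here to be available from the OCR output. The sample uses the first sentence and first citation of the version 1 example above.

```python
def resolve_citations(summary_text, citations, document_text=None):
    """Pair each citation marker in the summary with its source range."""
    resolved = []
    for citation in citations:
        response = citation["response"]
        document = citation["document"]
        # The "response" range covers the marker inside the summary text.
        marker = summary_text[response["start"]:response["end"]]
        source = None
        if document_text is not None:
            # The "document" range covers the cited passage in the source.
            source = document_text[document["start"]:document["end"]]
        resolved.append({"marker": marker, "page_num": document["page_num"], "source": source})
    return resolved

summary_text = (
    "\u2022 This document is a Restrictive Covenant and Confidentiality Agreement "
    "between Federal Home Loan Mortgage Corporation (Freddie Mac) and "
    "Donald J. Bisenius, effective from 11th March 2001 [0]."
)
citations = [
    {
        "document": {"start": 0, "end": 394, "page_num": 0},
        "response": {"start": 188, "end": 191},
    }
]

resolved = resolve_citations(summary_text, citations)
print(resolved[0]["marker"], "on page", resolved[0]["page_num"])
```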