API & Documentation
How does the PDF2Data service work? The ideal solution.
Example of invoice, template and generated XML result + short explanatory video.
Documentation and a sample project are available here: PDF2Data Java API v1.5.2 and documentation
Supported file types |
|||
Electronic documents |
|||
File type | Description | Supported | Notes |
pdf | Portable Document Format | Yes | Version 1.7 or earlier (including multipage). Only unsecured (not password-protected) PDF files are supported. |
doc / docx | Microsoft Word | Yes* | *experimental; at the moment only ".doc" is supported, ".docx" support is coming soon |
xls / xlsx | Microsoft Excel | Yes* | *experimental; at the moment only the A4 page format is supported for Excel sheets. Content that extends beyond the page borders is moved to the next page. |
ppt / pptx | Microsoft PowerPoint | Yes* | *experimental |
rtf | Rich Text Format | Soon | |
odt | Open Document Text | Yes* | *experimental |
ods | Open Document Spreadsheet | Yes* | *experimental |
odp | OpenDocument Presentation | Yes* | *experimental |
sxw | OpenOffice.org 1.0 Text | Soon | |
sxc | OpenOffice.org 1.0 Spreadsheet | Soon | |
sxi | OpenOffice.org 1.0 Presentation | Soon | |
wpd | Word Perfect | Soon | |
txt | Plain Text | Yes* | *experimental |
tsv | Tab Separated Values | Soon | |
html | HyperText Markup Language | Soon | |
OCR documents. The standard accepted quality is 300 dpi in black and white, grayscale (preferred) or color. For better results we recommend scanning documents at 400 or 500 dpi in grayscale (preferred) or color. Supported OCR languages: click to see the list. |
|||
png | Portable Network Graphics | Yes | Black and white, gray, color |
jpeg / jpg | Joint Photographic Experts Group | Yes | Gray, color |
jp2 / jpc | JPEG 2000 | Yes | Gray - Part1, color - Part1 |
pdf | Portable Document Format | Yes | Version 1.7 or earlier (including multipage) |
tiff / tif | Tagged Image File Format | Yes | Black and white: uncompressed, CCITT3, CCITT4, Packbits, ZIP, LZW; Gray: uncompressed, Packbits, JPEG, ZIP, LZW; 24-bit color: uncompressed, JPEG, ZIP, LZW; 1-, 4-, 8-bit palette: uncompressed, Packbits, ZIP, LZW (including multipage TIFF) |
gif | Graphics Interchange Format | Yes | Black and white: LZW-compressed; 2-, 3-, 4-, 5-, 6-, 7-, 8-bit palette: LZW-compressed |
djvu / djv | DjVu | Yes | Black and white, gray, color |
jb2 | JBIG2 | Yes | Black and white |
Requesting the number of available page credits |
||
URL: http://pdf2data.cloudforpeople.com/api/getCredits |
||
Parameter | Type | Description |
api_key | string |
User authentication. You can find your "api_key" in the "Control Panel" of the PDF2Data web interface. This parameter is required. |
XML result: | ||
The server returns: <result> In case of an error: see the file status.xml <status>
|
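As an illustration of how the endpoint and its "api_key" parameter fit together, here is a minimal Python sketch that builds the request URL. It assumes the parameters are passed in the query string, which this page does not state explicitly. The helper name build_api_url and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Base URL taken from the endpoints listed in this documentation.
API_BASE = "http://pdf2data.cloudforpeople.com/api"


def build_api_url(endpoint, **params):
    """Return the full URL for an endpoint with URL-encoded parameters.

    Assumption: parameters are sent in the query string.
    """
    if not params.get("api_key"):
        raise ValueError("api_key is required")
    return f"{API_BASE}/{endpoint}?{urlencode(params)}"


# "YOUR_API_KEY" is a placeholder for the key shown in your Control Panel.
print(build_api_url("getCredits", api_key="YOUR_API_KEY"))
```

The same helper would cover the other endpoints that take only simple parameters, for example build_api_url("listTemplates", api_key=...).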
Requesting the list of available templates from the PDF2Data server |
||
URL: http://pdf2data.cloudforpeople.com/api/listTemplates |
||
Parameter | Type | Description |
api_key | string |
User authentication. You can find your "api_key" in the "Control Panel" of the PDF2Data web interface. This parameter is required. |
XML result: | ||
The server returns: <templates> In case of an error: see the file status.xml <status>
|
Requesting the Template Schema from the PDF2Data server |
||
URL: http://pdf2data.cloudforpeople.com/api/getTemplateSchema |
||
Parameter | Type | Description |
api_key | string |
User authentication. You can find your "api_key" in the "Control Panel" of the PDF2Data web interface. This parameter is required. |
template_id | integer |
Template ID. This parameter is required. |
XML result: | ||
The server returns: <result> In case of an error: see the file status.xml <status>
|
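A request for a template schema could be sketched in Python as follows. This is not the official client. It assumes that the call is a plain HTTP GET with query-string parameters and that the response body is well-formed XML whose root element is either <result> or <status>. The function name and the opener argument (used to swap in a fake connection for testing) are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

SCHEMA_URL = "http://pdf2data.cloudforpeople.com/api/getTemplateSchema"


def get_template_schema(api_key, template_id, opener=urlopen):
    """Fetch the schema of one template and return the parsed XML root.

    Assumptions: HTTP GET with query-string parameters, XML response.
    """
    # template_id is documented as an integer, so reject anything else early.
    query = urlencode({"api_key": api_key, "template_id": int(template_id)})
    with opener(f"{SCHEMA_URL}?{query}") as response:
        return ET.fromstring(response.read())
```

A caller would check the tag of the returned root: "result" for a schema, "status" when the server reports an error.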
Submitting a document to the PDF2Data server for recognition |
||
Submitting a single document for recognition: URL: http://pdf2data.cloudforpeople.com/api/recognize Required: HTTP method "POST" and content type "multipart/form-data" |
||
Parameter | Type | Description |
api_key | string |
User authentication. You can find your "api_key" in the "Control Panel" of the PDF2Data web interface. This parameter is required. |
|
||
template_id | integer |
Template ID from the list of available templates. If you have created a template with a corresponding "ID" on our server, you can specify "-1" as template_id so that the system automatically recognizes which template to apply to the document. If the document has no appropriate template, the server returns the message "Invalid template_ID". We recommend specifying template_id as "-1". This parameter is required. |
|
||
file | byte array |
File. This parameter is required. |
|
||
scanned | string |
Indicates whether the document is scanned. Allowed values: "true", "false" or "auto". If you do not specify this parameter, the system uses "auto". We recommend either specifying "scanned=auto" or omitting this parameter entirely. This parameter is NOT required. |
|
||
language | string |
The language to use for OCR processing. You can specify one primary and one secondary OCR language, separated by a comma. See the list of available languages here. If you don't specify a language, the primary and secondary languages from your Control Panel in the PDF2Data web interface are applied. This parameter is NOT required. |
|
||
mimeType | string |
MIME type. This parameter is NOT required. |
|
||
batch | Boolean |
Used only for batch submission. "true" if the file is composed of many invoices. *This function is experimental. This parameter is NOT required. |
|
||
split | integer |
Used only for batch submission. This is the splitting step: 0 - split the file by separator sheet (download); 1 - split the file page by page; "n"... - split the file every "n" pages. This parameter is NOT required. |
XML result: |
After the document is submitted, the server returns a "document ID", which must be stored temporarily so that the recognition result can be requested later. NB: the recognition result may be ready "instantly" (3 - 10 sec.) or later (5 min - 24 h). The time depends on the type of document (Electronic/OCR) and on whether it is human-controlled or not. |
|
The server returns: In case of a single document: <result> <documentId>1</documentId> </result>
If the document is "in progress" or an error occurred: see the file status.xml <status> <error code="1800">Internal error.</error>
|
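To make the "POST" and "multipart/form-data" requirement concrete, the following Python sketch assembles a request for the recognize endpoint using only the standard library. The field names (api_key, template_id, scanned, file) come from the parameter table above. The helper names, the generic "application/octet-stream" part type and the choice to always send "scanned" are assumptions made for illustration.

```python
import uuid
from urllib.request import Request

RECOGNIZE_URL = "http://pdf2data.cloudforpeople.com/api/recognize"


def build_multipart(fields, file_name, file_bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The file part uses the documented parameter name "file".
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8"))
    parts.append(file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_recognize_request(api_key, file_name, file_bytes,
                            template_id=-1, scanned="auto"):
    """Return a POST request; template_id=-1 asks for auto-recognition."""
    fields = {"api_key": api_key, "template_id": template_id, "scanned": scanned}
    body, content_type = build_multipart(fields, file_name, file_bytes)
    return Request(RECOGNIZE_URL, data=body,
                   headers={"Content-Type": content_type}, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it. The response should then contain the document ID described above.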
Getting the "status" of a document from the PDF2Data server |
||
URL: http://pdf2data.cloudforpeople.com/api/getStatus |
||
Parameter | Type | Description |
api_key | string |
User authentication. You can find your "api_key" in the "Control Panel" of the PDF2Data web interface. This parameter is required. |
document_id | integer |
Document ID. This parameter is required. |
XML result: | ||
The server returns: see the file status.xml
<status> <error code="1800">Internal error.</error> |
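A status response of this shape can be read with a few lines of Python. The sketch below assumes that the complete document closes the <status> element (the excerpt above is abbreviated) and only handles the <error> element shown here. Other elements that status.xml may contain are not covered. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_status_error(xml_text):
    """Return (code, message) for a <status> error, or None if there is none."""
    root = ET.fromstring(xml_text)
    if root.tag != "status":
        return None
    error = root.find("error")
    if error is None:
        return None
    return int(error.get("code")), (error.text or "").strip()


# Sample modelled on the excerpt above; the closing tag is an assumption.
sample = '<status><error code="1800">Internal error.</error></status>'
print(parse_status_error(sample))  # (1800, 'Internal error.')
```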
Getting results from the PDF2Data server (for both single documents and batches) |
||
URL: http://pdf2data.cloudforpeople.com/api/getResult |
||
Parameter | Type | Description |
api_key | string |
User authentication. You can find your "api_key" in the "Control Panel" of the PDF2Data web interface. This parameter is required. |
document_id | integer |
Document ID. This parameter is required. |
export_format | string |
Returns the result as standard XML (if not specified), as a CSV file or in a personalized format (to activate a personalized format, please contact us; we can support a large variety of personalized export formats). This parameter is NOT required. |
XML result: | ||
The server returns (example only):
If the result is ready "instantly": see the file result.xml
<result> <documentID>DocumentID</documentID> <documentType>Electronic</documentType> <associations> </associations>
If the document is "in progress" or an error occurred: see the file status.xml
<status> <error code="1800">Internal error.</error> |
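Because a result may only become available after some time, a client typically asks for it repeatedly. The Python sketch below shows one possible polling loop. It assumes that a ready result has the root element <result> and that anything else is a <status> document. It cannot tell "in progress" apart from a real error, since that depends on codes not listed on this page. The fetch callable (expected to perform the getResult request and return the XML text), the retry count and the delay are all hypothetical.

```python
import time
import xml.etree.ElementTree as ET


def wait_for_result(fetch, document_id, attempts=10, delay=30.0, sleep=time.sleep):
    """Call fetch(document_id) until it returns a <result> document.

    Assumption: any response whose root is not <result> means "not ready".
    A real client should inspect the status code and stop on real errors.
    """
    for attempt in range(attempts):
        root = ET.fromstring(fetch(document_id))
        if root.tag == "result":
            return root
        if attempt < attempts - 1:
            sleep(delay)  # wait before asking again
    raise TimeoutError(f"document {document_id} not ready after {attempts} attempts")
```

Given the stated range of 3 seconds to 24 hours, a production client would likely use a longer or growing delay rather than a fixed one.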
Error and Info codes |
||
The list of Error and Info codes is available here. |
||
Code | Description | |
Code |
Examples: the document size is > 20 MB; the document format is not supported; the document is "secured"; or other causes. |
|
Code | Server error. |
|
XML result: | ||
The server returns: see the file status.xml
<status> <error code="1800">Internal error.</error>
|
Goodies:
For XML viewing/editing — Notepad ++.
For timestamp conversion — EpochConverter.
We recommend the ultimate IDE for developers — Eclipse.