Using Docparser to create orders in a Laravel Passport application

Brian Kidd
17 min read · Jun 13, 2018


I’m a Docparser integration partner and love how easy it is to extract data trapped in PDF documents. The ease of integrating with their API makes my life easier, and my clients’ lives are better for not having to input data manually. You can check out Docparser here: https://docparser.com

One of my clients, a local commercial printshop, approached me about an application to help them track their orders from prepress through delivery. They had been using a paper-based method for years but were ready to put in the effort to change their processes and embrace technology.

At Magnigen, we take on whatever our clients need, and that sometimes involves working with legacy applications, but when we get to choose, we always reach for Laravel. The Laravel framework offers everything a modern web application requires and it’s a joy to work with.

We implemented the job tracking application and it was meeting their needs. As I mentioned, going from a paper-based method of tracking to using software is quite a leap so I’ve learned to take it slow and find some way to roll out the new process gradually. In this case, we started with a single customer that had the largest volume of orders.

One day I was having lunch with the owner and I asked how it was going. He said it was going well but started explaining how he received the orders from this customer and the effort it took to enter them in the application. They accept orders in a number of ways including by phone, email, and in person. In this case, his client is a large marketing firm and they have a process that includes emailing the purchase order — the purchase order comes in the form of a PDF attachment to an email.

I had started experimenting with Docparser and told him I may have a solution. He was intrigued so I explained to him how it would work:

  1. Client emails the purchase order
  2. Printshop forwards the email to the parser’s email address
  3. Using parsing rules, Docparser extracts the data from the PDF
  4. After extracting, the Docparser webhook posts the data to their order tracking application
  5. A new order is created and enters their job processing queue

He understood it at a high level and gave me the go ahead to implement it.

In this post, we’ll build out the Laravel controller to handle the webhook from Docparser and create the order in the application. We first need an application to work with, so we’ll start a new Laravel application. I’ve also been meaning to try Laravel Passport, a Laravel package for OAuth2 authentication, so we’ll use it here. If it weren’t for Laravel and this package, that might be cause for worry, but this is so dead simple, I think I can even do it :-)

Laravel

New project

The current version of Laravel is 5.6. I develop on a Mac and have installed the Laravel installer. I use Homestead but, if you are using Valet, you can skip some of these steps. You’ll know what you’ll need to do. Our goal at this point is to get the welcome page to render in the browser. Here’s my short list of steps:

  1. In my development directory, run laravel new docparser
  2. Run cd docparser
  3. Run npm install
  4. Add new project to Homestead.yaml
  5. From the vagrant bin directory, run vagrant up, then vagrant provision
  6. Add to /etc/hosts file
  7. Create database
  8. Change database name in .env file

If everything went well, you should be able to reach your application by navigating to the project URL and see the Laravel welcome page:

Laravel default welcome page

Passport

Let’s go ahead and install Passport. All the steps are in the Laravel Passport documentation. If you are a Laracasts subscriber (and if you’re not, you should be), you can watch Taylor Otwell himself implement Passport. Be sure to go as far as registering the Vue Passport components, running npm run dev, and adding the components to the home view. These are provided for us by the package; let’s use them to save time.

If you haven’t already, scaffold out the default authentication using php artisan make:auth in your project root directory.

If you refresh the welcome page, you should have the register and login links in the upper right corner of the page.

Click Register and create a new user. I’m going to name my user Docparser — by doing this, I can add a creator belongs-to relationship on my Order and see that the order was created by Docparser. After entering the required data, click Register and the user will be registered and logged in to the application.

If you followed my suggestion and added the Passport Vue components to the home view, you should see something like this:

Spend some time learning about OAuth. We’ll use a small sliver of Passport to create a personal access token. This token will allow us to interact with our application through the endpoint we’ll implement later.

Because I’m logged in as the Docparser user when I create this token, the token will be associated with that user. When I make requests with it, the application will know it’s me and allow requests that would otherwise be rejected without a valid token. Think of the token as a username and password rolled into one, and treat it like you would a password.

Click Create New Token and enter a name. When you click Create, you’ll be shown the token once and won’t be able to access it again, so copy and save it. The token should then be listed in the Personal Access Tokens section.

We now have authentication in place. Let’s get started with creating our Order model.

Order Model and Migration

From the root of our project directory, run php artisan make:model Order -m — this will create our Order model as well as a migration for our database table.

Let’s start with our migration. Open the migration that was created and modify the up method to look like this (the id column and timestamps should already be there):

<?php

use Illuminate\Support\Facades\Schema;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Database\Migrations\Migration;

class CreateOrdersTable extends Migration
{
    /**
     * Run the migrations.
     *
     * @return void
     */
    public function up()
    {
        Schema::create('orders', function (Blueprint $table) {
            $table->increments('id');
            $table->string('description', 50);
            $table->string('po_number', 50);
            $table->date('due_date');
            $table->string('final_size');
            $table->string('paper_weight');
            $table->unsignedInteger('quantity');
            $table->string('final_form', 50);
            $table->string('contact', 50);
            $table->string('status');
            $table->unsignedInteger('user_id');
            $table->foreign('user_id')->references('id')->on('users');
            $table->text('long_description')->nullable();
            $table->text('delivery_instructions')->nullable();
            $table->text('media_link')->nullable();
            $table->timestamps();
        });
    }

    /**
     * Reverse the migrations.
     *
     * @return void
     */
    public function down()
    {
        Schema::dropIfExists('orders');
    }
}

With the migration file updated, run php artisan migrate to create the database table.
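For the curious, the migration produces roughly the table below. This is an illustrative MySQL sketch, not the exact DDL Laravel generates (Laravel also sets charset, collation, and default string lengths from your configuration):

```sql
-- Approximate shape of the orders table created by the migration
CREATE TABLE orders (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    description VARCHAR(50) NOT NULL,
    po_number VARCHAR(50) NOT NULL,
    due_date DATE NOT NULL,
    final_size VARCHAR(255) NOT NULL,
    paper_weight VARCHAR(255) NOT NULL,
    quantity INT UNSIGNED NOT NULL,
    final_form VARCHAR(50) NOT NULL,
    contact VARCHAR(50) NOT NULL,
    status VARCHAR(255) NOT NULL,
    user_id INT UNSIGNED NOT NULL,
    long_description TEXT NULL,
    delivery_instructions TEXT NULL,
    media_link TEXT NULL,
    created_at TIMESTAMP NULL,
    updated_at TIMESTAMP NULL,
    FOREIGN KEY (user_id) REFERENCES users (id)
);
```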

Order Model

We have a few changes to make to our Order model:

  • Tell Laravel to treat due_date as a date
  • Set our fillable property
  • Set our default status value using the attributes property
  • Define our creator relationship to the User model

Update the Order model to look like this:

<?php

namespace App;

use Illuminate\Database\Eloquent\Model;

class Order extends Model
{
    protected $dates = [
        'due_date',
    ];

    protected $fillable = [
        'description', 'po_number', 'due_date', 'final_size',
        'paper_weight', 'quantity', 'final_form', 'contact',
        'long_description', 'delivery_instructions', 'media_link',
    ];

    protected $attributes = [
        'status' => 'new',
    ];

    public function creator()
    {
        return $this->belongsTo(User::class, 'user_id', 'id');
    }
}

Routing

We need a way to handle the post request from Docparser. Let’s add a new route to routes/api.php to route the request to a controller we’ll create in the next step. I’ve reworked the routes a little with a group where we can include the default user routing as well as our new docparser routing:

Route::middleware('auth:api')->group(function () {

    Route::get('/user', function (Request $request) {
        return $request->user();
    });

    Route::post('/docparser/webhook', 'DocparserController@webhook');

});

That takes care of the routing. Let’s move on to creating our controller by running php artisan make:controller DocparserController, then create a webhook method like this:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;

class DocparserController extends Controller
{
    public function webhook(Request $request)
    {
        info($request);

        return response()->json('OK');
    }
}

For now, we’ll just log the request so we can see what’s being sent by Docparser.

Docparser

Documents are a staple of business and as much as technology has changed over recent years with more applications exposing an API, you’re still most likely going to have to work with documents — and in many cases, those documents will be PDF. PDF is the de facto format for any important document. Needless to say, PDF documents aren’t going anywhere for a long time.

In this scenario, the customer is sending a PDF purchase order, and most companies don’t want to create friction by asking their customers for purchase orders in some other format; they’re just happy to get them. That’s what makes Docparser so great: companies can continue to follow their internal processes, and we as technologists can give their business partners a way to streamline the handling of those documents.

Creating our document parser

If you haven’t already, log in to your Docparser account or sign up for a free trial.

Once you log in, on the default page, you’ll have an option to create a new parser — click Create New Parser.

On the create page, you’ll be presented with several parser types to choose from — you should choose the one that best matches your use case. If you don’t see any that are close, just choose Miscellaneous to start with an empty parser.

Note: Don’t worry if you realize you’ve chosen the “wrong” one — it won’t preclude you from doing what you need to do. Experiment with different parser types when you’re getting started. As we work through this, you’ll see examples of what the parser type does for you.

Because our use case is a purchase order, I’ll select the Purchase Orders type.

Select Purchase Orders and at the bottom of the page, change the Parser Name to Acme Purchase Order and click Click Here To Continue — the parser will be created and you’ll be asked to provide sample documents.

Here are links to the two documents I used:

Purchase Order 1

Purchase Order 2

Drag and drop or select at least two examples of the document you want to parse — once the documents are added, Docparser will start processing them.

Providing a representative sample of the actual document is critical. In this tutorial, we’ll be using high quality PDF documents, but in the real world things aren’t this perfect. For my client, in fact, some of the orders they receive are scanned PDFs. Although Docparser can handle scanned PDFs and has specific filters that can be applied to help with noise in the document, the more samples you have to test with, the better the final outcome will be.

After adding at least two samples, click the Continue To Parser button in the lower right of the screen — you’ll be taken to the Parser overview screen.

Parsing Rules

Let’s see the parser rules created because we started with the Purchase Orders parser — select Parsing Rules from the navigation.

If you used the documents I provided, there were most likely three rules created: PO Number, PO Date, and Totals:

Let’s discuss why these parsing rules were created. Although purchase orders vary in format, most have key data such as a PO number. Most also have line items, and each of those line items has an amount, so we usually want to know the total of those line items.

So now you can see why parsers have different parser types — because I chose the Purchase Orders type, the parser attempts to find key fields related to purchase orders.

Let’s start with the PO Number by clicking it in the parsing rules list — by clicking we enter edit mode for the rule.

Here’s another aspect of Docparser that I really like: we can see the document in each of its phases as the filters are applied. At the top, we see the document contents, and as each filter is applied, we see its results:

Note: If you are using the documents I provided and you see a different PO number, it’s likely that the parser is using a different document than what I used in this step. To change the document, click Change Document on the top right portion of the screen and choose the other document.

We can see that the PO number was correctly parsed from the document so this rule doesn’t need any filter changes.

Let’s move to the next rule, PO Date, by selecting it from the left navigation. In this case, the rule found a date and correctly parsed it as a date, but it’s not the PO date; it happens to be the due date.

We need this value so let’s change the rule name to due_date so that it matches our Order model field name. We can do this by editing it directly in the left navigation.

After changing the name, click Save Parsing Rule at the bottom of the page.

I like the parsing rule name to match the column name in the database, so let’s stay in the Parsing Rule Editor, select PO Number, and change it to po_number. Finish by again clicking Save Parsing Rule at the bottom of the page.

After saving the po_number, hover over the Totals field and click the trash can to delete it.

Now we need to add the other fields we need to parse from the document. Because of the layout of our purchase order form, we can leverage the Text Variable Position rule in the Generic Parsing Rules Preset group. This one works well for us because most of our data is listed just to the right of its label.

After selecting Text Variable Position, the document data will be added. A Text Filter will be added for Define Start Position and the option Text match: after is already selected. In the field to the right, enter Description since we need this value. Once you stop typing, the text filter will be applied and you should see Letter/Reply in the results of the last text filter. We’re almost there but let’s add another text filter to trim the blank spaces from our description.

Notice that by using Text Variable Position, the rule has a default end filter that’s responsible for knowing where to stop extracting the data for the field. The Define End Position should be set to End of line; we need this so keep it there.

Click Add Text Filter that’s below the Define End Position and choose Format & Refine Results -> Remove Blank Spaces:

The filter will be applied and the Letter/Reply text should now be left justified indicating that there are no blank spaces:

Great, you’ve created your first parsing rule! You can see how we can apply filters to fine tune our parser down to exactly what we want.

Now that you have the blueprint, go ahead and create parsing rules for final_size, paper_weight, quantity, final_form, and contact. We can parse these fields using the same rule we used for description; the only difference is for quantity. For quantity, add one more text filter to remove the comma, since our number is formatted with a thousands separator. Choose Add Text Filter -> Search, Replace & Add -> Search & Replace, enter , (a comma) for Search for, and leave Replace with empty:
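If it helps to see the transformation concretely, here’s a plain PHP sketch of what that Search & Replace filter effectively does to a formatted quantity. This runs in Docparser, not in our app; the sample value is hypothetical:

```php
<?php

// A formatted quantity as it might appear on the purchase order
// (hypothetical sample value).
$raw = '2,500';

// Equivalent of the Search & Replace filter: strip the thousands
// separator, then treat the result as an integer.
$quantity = (int) str_replace(',', '', $raw);

echo $quantity; // 2500
```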

Delivery instructions and long description

Delivery instructions and long description are a bit different so we’ll do these together. Let’s start with delivery instructions.

If you refer back to the documents, you’ll see that the delivery instructions are near the top of the form and have the label above the text. You’ll also notice that the text can extend beyond a single line so we’ll employ some additional filters to handle this.

Let’s add a new parsing rule and choose our new friend, Text Variable Position. After selecting it, a Define Start Position filter with Text match: after will be created for us; as we have with the others, enter the field label, Delivery Instructions.

When creating the rule, a Define End Position End of line was also created as a second filter but we need to change it from End of line to Text match: before and set the field name to PO#:

By adding this filter, we now have the text we want but just need to clean up things a bit.

We’ll add another filter after this one to remove the empty lines. Choose Add Text Filter -> Format & Refine Results -> Remove Empty Lines.

The result of this text filter should be this:

We have one more step to finish — we simply need to remove the line breaks. To do this, choose Add Text Filter -> Format & Refine Results -> Remove Line-Breaks:

Change the rule name to delivery_instructions and save the rule.
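To make the filter chain concrete, here’s an illustrative plain PHP sketch of what Remove Empty Lines followed by Remove Line-Breaks does to multi-line text. Docparser does this server-side; the sample text is hypothetical:

```php
<?php

// Hypothetical multi-line extraction, as it might look after the
// start/end position filters.
$extracted = "Deliver to the loading dock.\n\nCall Susan on arrival.\n";

// Remove Empty Lines: drop lines that are blank or whitespace-only.
$lines = array_filter(explode("\n", $extracted), function ($line) {
    return trim($line) !== '';
});

// Remove Line-Breaks: join what remains into a single line.
$deliveryInstructions = implode(' ', $lines);

echo $deliveryInstructions; // Deliver to the loading dock. Call Susan on arrival.
```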

Finally, let’s handle the text labeled Order Description that we’ll map to our field long_description.

Choose Add Parsing Rule and again choose Text Variable Position. For the Define Start Position Text match: after value, we’ll enter Order Description. Between our default filters, Define Start Position and Define End Position, let’s add a filter to remove empty lines using Format & Refine Results -> Remove Empty Lines.

Everything looks fine now, so we could leave it at this, but what if our text spans more than one line? While we’re in here, let’s add another filter after Remove Empty Lines to remove line breaks, just as we did for delivery instructions. Because the delivery instructions spanned multiple lines, it was clear we needed that filter there.

This is where having as many sample documents as possible will help you avoid some pitfalls.

After Remove Empty Lines, choose Add Text Filter -> Format & Refine Results -> Remove Line-Breaks. We won’t see any difference on this document but we’ll know we can handle multi-line order descriptions if it happens.

Change the rule name to long_description to match our database column and save the parsing rule.

We now have parsing rules for all of the data we need from the document:

Next, let’s go to our Parser overview screen — after navigating, we should see our menu on the left with Statistics, Documents, Parsing Rules, etc. Choose Documents.

While we were creating rules, Docparser was processing our documents to apply the new rules we created. If you don’t have any documents in the Processed tab and you see them in Data Parsing Queue, just wait a minute or so until they are processed.

Here’s what I see:

Select the first document in the list — the slide out will show the parsed data:

You can see the data from the document and can switch between all the documents to ensure the data was parsed as you want. Notice two things here: first, you have access to other options, including viewing the original file; second, because our due_date is defined as a date, Docparser captures the original text as well as an ISO 8601 formatted value.

Switching between documents, I can see that the data was parsed on each document as I need it. If your parser needs changes, go ahead and make them now. You should have enough knowledge at this point to work through and dial in exactly what you need.

Now that our parsing rules are done, let’s move on to the document parser’s webhook to notify our app of the parsed document.

Local Development Tunneling

Before we move on to creating our webhook, we need a way to tunnel the request to our development machine. There are other options out there, but I use ngrok. Tunneling is beyond the scope of this post, but to use it with Homestead, from my ngrok directory, I run:

./ngrok http 192.168.10.10:80 -host-header=docparser.test

After running, I’ll get a URL I can use to tunnel the request to Homestead:

Webhook

On the parser’s overview page, choose Integrations. Click Create an integration and on the Add New Webhook Integration page, choose SIMPLE WEBHOOK — notice all of the other integration options while you’re here.

On the Create Simple Webhook page, add a name for the webhook and enter the tunnel URL, including the route to your DocparserController’s webhook method, which is your ngrok URL followed by /api/docparser/webhook.

Let’s go ahead and Save & Test. At this point, we’re hoping this fails because we don’t have any authentication in place.

On the Test Custom Webhook dialog, choose Send test data. We should see a “something went wrong” response with a 302 status code. Because we didn’t authenticate, we’re being redirected to the login page. Let’s fix that.

After closing the failed dialog, you’ll be redirected to the Integrations page and you’ll see the webhook you just created. Hover over the webhook and choose Edit. On the Edit screen, click Advanced Options to expand. In the Additional Headers section, let’s add our authorization header by entering:

Authorization: Bearer <token>

The token is the value we were prompted with in a previous step when we created a Personal Access Token in our Laravel application. Copy and paste the token here. Mine looks like this:

Let’s choose Save & Test and on the Test Custom Webhook dialog, choose Send test data. We should receive a successful response with a 200 status code:

Great! We’ve proven that authentication is enforced and that our routing works. Since we initially set our webhook method to log the request, head back over to your Laravel project and check your log. You should see a request that has some metadata at the beginning, followed by the fields we parsed:

Notice that the due date appears under its rule name with an _iso8601 suffix, i.e. due_date_iso8601.
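For reference, the logged payload looks roughly like this. The keys for the parsed data match the parsing rule names we created; the sample values below are illustrative placeholders, not exact Docparser output:

```json
{
  "po_number": "PO-12345",
  "due_date": "Jun 30, 2018",
  "due_date_iso8601": "2018-06-30",
  "description": "Letter/Reply",
  "final_size": "8.5 x 11",
  "paper_weight": "70#",
  "quantity": "2500",
  "final_form": "Flat",
  "contact": "Jane Smith",
  "delivery_instructions": "Deliver to the loading dock.",
  "long_description": "Letter and reply card for summer campaign."
}
```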

Inserting the order in our database

Now we know what Docparser is sending so let’s go ahead and insert in our database.

Back in our DocparserController, add use App\Order; below the existing use statement and update the webhook method to this:

public function webhook(Request $request)
{
    $order = new Order();

    $order->fill($request->all());

    $order->due_date = $request->input('due_date_iso8601');

    $order->creator()->associate(auth()->user());

    $order->save();

    return response()->json('OK');
}

We’re creating a new Order model and using its fill method for most of the columns. Because the due_date comes from Docparser with the _iso8601 suffix, we set that field explicitly, then associate the authenticated user through the creator relationship before saving.
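If you’re wondering why fill is safe with the raw request data: Eloquent only mass-assigns keys listed in $fillable. Here’s an illustrative plain PHP sketch of that whitelist behavior (no Eloquent involved; the payload values are hypothetical):

```php
<?php

// A simplified webhook payload (hypothetical values).
$payload = [
    'description'      => 'Letter/Reply',
    'po_number'        => 'PO-12345',
    'quantity'         => '2500',
    'due_date_iso8601' => '2018-06-30', // not fillable; we set due_date explicitly
    'status'           => 'shipped',    // not fillable; default 'new' is kept
];

// The whitelist from our Order model's $fillable property.
$fillable = [
    'description', 'po_number', 'due_date', 'final_size',
    'paper_weight', 'quantity', 'final_form', 'contact',
    'long_description', 'delivery_instructions', 'media_link',
];

// Roughly what fill() does: keep only whitelisted keys.
$attributes = array_intersect_key($payload, array_flip($fillable));

echo implode(', ', array_keys($attributes)); // description, po_number, quantity
```

Anything not in the whitelist, like an attacker-supplied status, is silently dropped rather than written to the model.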

Head back to Docparser and on the parser’s Integration page, hover over the webhook and choose Test. On the Test Custom Webhook dialog, choose Send test data. We should receive another 200 successful response. Check the database to ensure the record was inserted successfully:

Testing the complete flow

Now that all of the pieces are in place, back in Docparser, on the parser’s Settings page, select the E-Mail Reception tab. Each parser has its own email address. Copy the email address to your clipboard and head over to your email client. Create an email to this address, attach one of the documents, and send.

Wait a minute or two and then check your database; you should see a new record.

Conclusion

You’ve learned how to create a simple Laravel application, use a Passport personal access token, and build a parser in Docparser.

Of course, we’ve taken the happy path here, but you should have some exception handling in your app for when things don’t go as expected. It’s also worth checking out the parser’s Settings -> Advanced Settings and turning on email alerts for when your app responds with a 4xx or 5xx error.

Continue to experiment with Docparser; I believe it can meet all of your document parsing needs.


Brian Kidd

Owner at Magnigen. Laravel, Vue, and AWS for web, Talend for data integration.