I've created a series of templates for different types of documents (passports, driver's licenses, receipts, insurance policies) and I want to be able to scan the document and have the OCR 1) determine what template it should be on 2) extract relevant information to fill out the "form" aka template so the user doesn't have to
I've already provided fields of what information should be extracted, i.e., passport
- number
- name
- expiration
- country of issue
- place to attach a picture
How do you find an item then? I've read numerous research studies that prove people still prefer navigation over search. Ofer Bergman has done a lot of work.
The thought is that collections should be homogeneous so that for most use cases,
* The number of items would be so small that search would not be necessary, e.g. a collection personal projects
* The items would fall naturally into a timeline so you can search trivially by scrolling, e.g. RAW photos grouped together by month
* The items would be easily identified by name, e.g. MP3 files grouped by album (why am I still holding onto these?)
The intention is not to upload 1000s of individual files in a jumble, but instead, a much smaller number of archives. E.g. If you are archiving the previous semester's homework assignments, instead of uploading a bunch of random documents, each item would be an archive of the assignments from a particular class. You could tag each item with 'Fall 2020' if you want to improve the organization. I'm intending to make that an easy process, where you point the program at a directory and it packages, tags and uploads each subfolder.
Indium tin oxide is a byproduct of the zinc oxide refinement process so it's doesn't see sustainable/enough to go around which seems like graphene is a better choice. Right now, they're primarily made of perovskite which is hybrid inorganic/organic and a lil toxic. MIT has a few studies on using graphene for OPV.
If I'm implementing search in an application and want to use NLP, do I need to train the search or are these solutions already ready to go? I'm not sure how other people do it/how search works/if you need to tell it what to do.
Well, of course the engine needs to have access to some corpus to search on. So the general answer to your question is: yes, however this step typically not called "training" but "indexing".
Most engines will repeatedly index your contents with crawlers or similar.