Hacker News | brittpart_'s comments

I've created a series of templates for different types of documents (passports, driver's licenses, receipts, insurance policies). I want to be able to scan a document and have the OCR 1) determine which template it belongs to and 2) extract the relevant information to fill out the "form" (aka template) so the user doesn't have to.


I've already defined the fields to extract for each template, e.g., passport: number, name, expiration, country of issue, and a place to attach a picture.
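To make the two steps concrete, here is a rough Python sketch of what happens after the OCR engine has returned plain text: pick a template by counting keyword hits, then pull each field out with a regex. Everything in it is an assumption for illustration (the `TEMPLATES` layout, the keywords, the field patterns, the function names), and real documents would need far more forgiving patterns than these.

```python
import re

FLAGS = re.IGNORECASE | re.MULTILINE

# Hypothetical template definitions: keywords that hint at the document
# type, plus one regex per field. Patterns are illustrative only and do
# not describe any real passport or receipt layout.
TEMPLATES = {
    "passport": {
        "keywords": ["passport", "country of issue"],
        "fields": {
            "number": r"passport no\.?\s*[:\-]?\s*([A-Z0-9]{6,9})",
            "name": r"^name\s*[:\-]\s*(.+)$",
            "expiration": r"expiry\s*[:\-]\s*(\d{4}-\d{2}-\d{2})",
            "country_of_issue": r"^country of issue\s*[:\-]\s*(.+)$",
        },
    },
    "receipt": {
        "keywords": ["receipt", "total"],
        "fields": {"total": r"^total\s*[:\-]?\s*\$?(\d+\.\d{2})"},
    },
}


def classify(text):
    """Return the template whose keywords appear most often, or None."""
    lowered = text.lower()
    scores = {
        name: sum(kw in lowered for kw in template["keywords"])
        for name, template in TEMPLATES.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract(text, template_name):
    """Fill each field of the chosen template; missing fields stay None."""
    fields = {}
    for field, pattern in TEMPLATES[template_name]["fields"].items():
        match = re.search(pattern, text, FLAGS)
        fields[field] = match.group(1).strip() if match else None
    return fields


def fill_template(ocr_text):
    """Classify the OCR text, then extract that template's fields."""
    name = classify(ocr_text)
    if name is None:
        return None, {}
    return name, extract(ocr_text, name)
```

Fields that don't match come back as None, so the user only has to fill in what the OCR missed.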


What's a flat collection hierarchy?


I just mean that a collection has no subfolders or other structure. It is simply a list of items like an S3 bucket.


How do you find an item then? I've read numerous research studies that prove people still prefer navigation over search. Ofer Bergman has done a lot of work.


The thought is that collections should be homogeneous so that for most use cases,

* The number of items would be so small that search would not be necessary, e.g. a collection of personal projects

* The items would fall naturally into a timeline so you can search trivially by scrolling, e.g. RAW photos grouped together by month

* The items would be easily identified by name, e.g. MP3 files grouped by album (why am I still holding onto these?)

The intention is not to upload 1000s of individual files in a jumble, but instead a much smaller number of archives. E.g., if you are archiving the previous semester's homework assignments, instead of uploading a bunch of random documents, each item would be an archive of the assignments from a particular class. You could tag each item with 'Fall 2020' if you want to improve the organization. I'm intending to make that an easy process, where you point the program at a directory and it packages, tags and uploads each subfolder.
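A minimal sketch of the package-and-tag part, assuming zip archives and a plain list of dicts as the manifest. The function name `package_subfolders` and the manifest shape are made up for illustration, and the upload step is left out because it depends entirely on the storage backend.

```python
import shutil
from pathlib import Path


def package_subfolders(source_dir, out_dir, tags):
    """Zip each immediate subfolder of source_dir and pair it with tags.

    Loose files directly inside source_dir are ignored. Keep out_dir
    outside source_dir, otherwise it would be packaged as well.
    """
    source = Path(source_dir)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    items = []
    for sub in sorted(p for p in source.iterdir() if p.is_dir()):
        # make_archive appends ".zip" and returns the archive's path.
        archive = shutil.make_archive(str(out / sub.name), "zip", root_dir=str(sub))
        items.append({"name": sub.name, "archive": archive, "tags": list(tags)})
    return items
```

Something like `package_subfolders("homework", "archives", ["Fall 2020"])` would then yield one tagged item per class folder, ready to hand to whatever does the uploading.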


You have to launch the bad thing first and as fast as possible; there's no way around but through.


I agree. Launch and improve iteratively. Don’t waste time on perfection. Launch only core features and build from there.


What's your process been for deciding core features?


Indium, the key ingredient in indium tin oxide, is a byproduct of zinc refining, so it doesn't seem sustainable or plentiful enough to go around, which makes graphene seem like a better choice. Right now, they're primarily made of perovskite, which is a hybrid inorganic/organic material and a lil toxic. MIT has a few studies on using graphene for OPV.

Do you know what they use in BIPV?


Trust the timing of the universe - sounds woo woo but that's how I get through. Nothing that's meant to be yours will pass you.


So essentially the UI is called SSO and the authentication happens with OAuth2/OIDC - that's the combo Apple uses.

Do you know what the barrier to entry is for a company to integrate another company's SSO?
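For a sense of what the integration looks like on the wire, here is a rough sketch of the first step of the OIDC authorization code flow: building the URL that the "Sign in with ..." button sends the user to. The parameter names are the standard ones from the spec; the function name is made up, and the endpoint and client ID are placeholders that the integrating company would get from the provider's documentation and from registering its app with the provider.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_endpoint, client_id, redirect_uri):
    """Build the redirect URL for an OIDC authorization code request.

    Returns the URL plus the state and nonce values, which the app must
    remember so it can check them when the provider redirects back.
    """
    state = secrets.token_urlsafe(16)  # guards the callback against CSRF
    nonce = secrets.token_urlsafe(16)  # ties the ID token to this request
    params = {
        "response_type": "code",    # ask for an authorization code
        "client_id": client_id,     # issued when the app is registered
        "redirect_uri": redirect_uri,
        "scope": "openid email",    # "openid" is what makes this OIDC
        "state": state,
        "nonce": nonce,
    }
    return f"{authorize_endpoint}?{urlencode(params)}", state, nonce
```

After the user signs in, the provider redirects to `redirect_uri` with a code, which the app's backend exchanges for tokens at the provider's token endpoint. That exchange and the ID token validation are not shown here.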


I've recently picked up origami and love the need for precision


What I've heard is that you need to demonstrate that you have an ability to hire strongly.


So I'm screwed? :) I've hired in my past and led small teams. I put that in the slide... but it's just so empty with just me and my credentials


Golden, so helpful - thank you!


Forgive me if this doesn't make sense:

If I'm implementing search in an application and want to use NLP, do I need to train the search or are these solutions already ready to go? I'm not sure how other people do it/how search works/if you need to tell it what to do.


Well, of course the engine needs to have access to some corpus to search on. So the general answer to your question is: yes, however, this step is typically not called "training" but "indexing".

Most engines will repeatedly index your contents with crawlers or similar.
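To show what "indexing" means in the simplest case, here is a toy inverted index in Python: it maps each word to the documents containing it, and a query returns the documents that contain every query word. This is only a sketch with made-up function names; real engines add ranking, stemming, typo tolerance and so on, but nothing here is trained.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Re-running `build_index` whenever the contents change is the toy equivalent of the repeated crawling and indexing described above.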

