Our deep-learning-based patent classifier automatically categorizes patent documents
into 656 subclasses of the hierarchical Cooperative Patent Classification (CPC) scheme.
Built with neural-network-based text classification methods,
our classifier was trained on a large corpus of English-language patent documents.
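Because the classifier predicts at the subclass level, the full CPC symbols attached to a training patent have to be reduced to that level. The sketch below illustrates the CPC hierarchy (section, class, subclass, main group, subgroup) for symbols written like "G06F 16/35". The names `parse_cpc`, `subclass_labels`, and the regular expression are hypothetical helpers written for this illustration; the text does not describe how the training labels were actually prepared.

```python
import re
from typing import NamedTuple

# Hypothetical pattern for a CPC symbol such as "G06F 16/35":
# section letter (A-H or Y), two-digit class, subclass letter, then group/subgroup.
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


class CpcSymbol(NamedTuple):
    section: str     # e.g. "G"
    cpc_class: str   # e.g. "G06"
    subclass: str    # e.g. "G06F" (the level the classifier predicts)
    main_group: str  # e.g. "G06F 16/00"
    subgroup: str    # e.g. "G06F 16/35"


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol into its hierarchical levels."""
    m = CPC_PATTERN.match(symbol.strip())
    if m is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    sec, cls, sub, group, subgroup = m.groups()
    subclass = f"{sec}{cls}{sub}"
    return CpcSymbol(
        section=sec,
        cpc_class=f"{sec}{cls}",
        subclass=subclass,
        main_group=f"{subclass} {group}/00",
        subgroup=f"{subclass} {group}/{subgroup}",
    )


def subclass_labels(symbols):
    """Collapse a patent's full CPC symbols to a sorted, deduplicated list of subclasses."""
    return sorted({parse_cpc(s).subclass for s in symbols})


print(subclass_labels(["G06F 16/35", "G06F 40/30", "G06N 3/08"]))
# ['G06F', 'G06N']
```

Several detailed codes usually collapse onto the same subclass, which is why the label set is deduplicated before training.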
Patent documents are usually very long, and most of the words that appear in a patent occur only rarely.
Patents are also structured documents composed of several sections, and these sections differ
in length and vocabulary. Considering these characteristics of patent documents,
we have experimented with diverse data-processing techniques tailored to automatic patent classification.
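The text does not say which data-processing techniques were tried. As one illustration of handling long, section-structured documents with many rare words, the sketch below truncates each section to its own token budget, prefixes it with a section marker, and maps words below a corpus frequency cutoff to an unknown token. The section names, the budgets in `SECTION_BUDGETS`, and the `min_count` cutoff are all hypothetical values chosen for the example.

```python
from collections import Counter

# Hypothetical per-section token budgets; the real values are not given in the text.
SECTION_BUDGETS = {"title": 32, "abstract": 200, "claims": 400, "description": 800}
UNK = "<unk>"


def tokenize(text):
    """Very simple lowercase whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def build_vocab(docs, min_count=5):
    """Keep only words that occur at least `min_count` times across all sections of the corpus."""
    counts = Counter(tok for doc in docs for text in doc.values() for tok in tokenize(text))
    return {tok for tok, n in counts.items() if n >= min_count}


def preprocess(doc, vocab, budgets=SECTION_BUDGETS):
    """Truncate each section to its budget, map rare words to <unk>, and concatenate.

    A section missing from `doc` still contributes its marker, but no words.
    """
    tokens = []
    for section, budget in budgets.items():
        words = tokenize(doc.get(section, ""))[:budget]
        tokens.append(f"<{section}>")  # marker so the model can tell sections apart
        tokens.extend(w if w in vocab else UNK for w in words)
    return tokens


docs = [
    {"title": "neural patent classifier", "abstract": "a neural network classifies patent text"},
    {"title": "patent search", "abstract": "neural search over patent text"},
]
vocab = build_vocab(docs, min_count=2)
print(preprocess(docs[0], vocab, budgets={"title": 2, "abstract": 3}))
# ['<title>', 'neural', 'patent', '<abstract>', '<unk>', 'neural', '<unk>']
```

Per-section budgets keep a very long description from crowding out the short but informative title and abstract, and the frequency cutoff shrinks the vocabulary that the model has to learn embeddings for.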
Our classifier assigns multiple CPC codes to a patent. In principle, one patent can belong to more than one category
at the same time, and there is no fixed number of categories to be assigned to each patent.
Currently, our classifier is designed to display at least five candidate CPC codes.
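One way to reconcile a variable number of labels per patent with a minimum of five displayed codes is to score every subclass independently and combine a score threshold with a top-k fallback. The sketch below assumes the network emits one logit per subclass that is passed through a sigmoid; the 0.5 threshold and the function name `select_cpc_codes` are illustrative assumptions, not the classifier's documented decision rule.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_cpc_codes(logits, labels, threshold=0.5, min_codes=5):
    """Return (code, score) pairs sorted by score.

    Every code whose sigmoid score is >= threshold is kept (so the count varies
    per patent), but if fewer than `min_codes` clear the threshold, the
    top-`min_codes` codes are returned instead.
    """
    scores = [sigmoid(z) for z in logits]
    ranked = sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)
    above = [pair for pair in ranked if pair[1] >= threshold]
    return above if len(above) >= min_codes else ranked[:min_codes]


labels = ["G06F", "G06N", "H04L", "G06Q", "A61B", "B65D", "G16H"]
logits = [3.0, 1.5, -0.5, -1.0, -2.0, -3.0, 0.2]
print([code for code, _ in select_cpc_codes(logits, labels)])
# ['G06F', 'G06N', 'G16H', 'H04L', 'G06Q']
```

Here only three subclasses clear 0.5, so the fallback fills the list up to five; lowering the threshold to 0.1 would let six codes through, showing that the number of assigned codes is not fixed.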