Patents provide a rich source of technical vocabulary, product names, and person names that complement other data sources used for machine translation.
We have processed patents from the United States Patent and Trademark Office and from the European Patent Organisation. By matching up related patents in different languages, we obtain parallel text that is useful for training machine translation systems. The data is available to download as matched sentences for each pair of languages.
We use information from the European Patent Organisation database to identify patents from different countries that belong to the same "family". These are often patents for the same invention that have been registered in different jurisdictions.
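As a rough illustration of the family-matching idea, the sketch below groups patent records by a family identifier and lists the document pairs within each family that are written in different languages. The record layout (`family_id`, `doc_id`, `lang`) and the function name `pair_by_family` are assumptions made for this example; they do not describe the actual database schema or processing pipeline.

```python
from collections import defaultdict
from itertools import combinations


def pair_by_family(records):
    """Return (family_id, doc_a, doc_b) for documents that share a family
    but differ in language. Field names are hypothetical."""
    # Collect all documents that belong to the same patent family.
    families = defaultdict(list)
    for rec in records:
        families[rec["family_id"]].append(rec)

    pairs = []
    for family_id, members in families.items():
        # Consider every unordered pair of documents within the family.
        for a, b in combinations(members, 2):
            # Only cross-language pairs are candidates for parallel text.
            if a["lang"] != b["lang"]:
                pairs.append((family_id, a["doc_id"], b["doc_id"]))
    return pairs


# Toy records with invented identifiers, for illustration only.
records = [
    {"family_id": "F1", "doc_id": "US-1", "lang": "en"},
    {"family_id": "F1", "doc_id": "EP-1", "lang": "de"},
    {"family_id": "F1", "doc_id": "EP-2", "lang": "fr"},
    {"family_id": "F2", "doc_id": "US-2", "lang": "en"},
]

for family_id, doc_a, doc_b in pair_by_family(records):
    print(family_id, doc_a, doc_b)
```

Document pairs found this way are only candidates: the texts would still need sentence-level alignment before they could be used as matched sentences for training.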