Philipp Poschmann



Jena Parser Application (JenParA)

Social scientists have recently begun to discuss text-mining tools as a fruitful approach to content analysis. To enable social scientists who are not familiar with computer science and software development to use sophisticated text-mining tools, we aim to provide a fully developed and reliable software application for grammatical parsing and the extraction of semantic triplets (i.e., subject-verb-object relations). For grammatical parsing we draw on state-of-the-art software, namely Stanford’s CoreNLP.
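To illustrate what a semantic triplet is, the following minimal sketch derives subject-verb-object relations from a dependency parse. It is not JenParA's actual implementation: the function name extract_triplets and the edge format (governor, relation, dependent) are assumptions made for illustration, and the example edges are written by hand rather than produced by CoreNLP. The relation labels (nsubj, obj, dobj) follow the Universal Dependencies style of annotation.

```python
from collections import defaultdict


def extract_triplets(edges):
    """Return (subject, verb, object) tuples from one sentence's dependency edges.

    `edges` is a hypothetical, simplified input format: a list of
    (governor, relation, dependent) tuples with Universal Dependencies labels.
    """
    subjects = defaultdict(list)
    objects = defaultdict(list)
    for governor, relation, dependent in edges:
        if relation == "nsubj":
            subjects[governor].append(dependent)
        elif relation in ("obj", "dobj"):  # "dobj" is the older label for direct objects
            objects[governor].append(dependent)

    triplets = []
    for verb, verb_subjects in subjects.items():
        for subject in verb_subjects:
            # A verb without a direct object yields no triplet.
            for obj in objects.get(verb, []):
                triplets.append((subject, verb, obj))
    return triplets


# Hand-written edges for the sentence "The union criticized the management."
edges = [
    ("criticized", "nsubj", "union"),
    ("criticized", "obj", "management"),
    ("union", "det", "The"),
    ("management", "det", "the"),
]
print(extract_triplets(edges))  # [('union', 'criticized', 'management')]
```

This sketch only handles active clauses with a direct object; passive constructions, conjunctions, and multi-word phrases would need additional rules.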

Notably, our software’s output, which represents semantic triplets, can be combined with topic modeling results from any software package that researchers prefer to use. To make topic modeling broadly accessible for social scientists, we also include our own topic-modeling scripts based on the “lda” Python package. We thus provide the scripts needed for running topic models and validating their results.

More Information

Jena Entity Extraction Application (JEnExtrA)

Social scientists are currently exploring sophisticated text-mining tools for big data analysis. One class of tools attracting considerable attention is named entity recognizers, which detect social actors and classify them as persons or organizations. However, it remains a technical challenge to automatically disambiguate social actors (who is referred to in a text?) and to specify them (which demographic characteristics are present?). JEnExtrA is a reliable and accurate software architecture for social scientists who are interested in automatically detecting, disambiguating, and demographically specifying social actors in big data. The architecture draws on the online encyclopedia Wikipedia.

More Information


Jena Organization Corpus (JOCo)

JOCo is a corpus of annual reports (ARs) and corporate social responsibility (CSR) reports of US, British, and German corporations that are listed in major stock indices such as Dow Jones, S&P 500, and NASDAQ 100 for the USA; FTSE, FTSE AIM 100, and FTSE 250 for Great Britain; and DAX, MDAX, and TecDAX for Germany. All reports are in English; the German corporations also publish reports in English for their international audiences.

More Information