Research Experience

During my master’s studies, I became fascinated by social computing when I started a project in [Prof. Krcmar's group](https://www.i17.in.tum.de/en/team/prof-dr-krcmar-helmut/) at the Technical University of Munich on analyzing developers’ reactions to changes in application programming interfaces (APIs). The project aimed to bridge the gap between API providers and consumers, to the benefit of both, by generating a list of criteria an API provider should follow to gain popularity among developers. I employed rule-based feature engineering to pinpoint those criteria in annotated developer conversations, an approach that turned out to be time-consuming and to yield low-quality results. Through intensive research, I realized that a distributed representation of the words or sentences in a text is essential. This prompted me to explore neural network-based models for uncovering intrinsic patterns in social media conversations, and inspired me to join [PD Dr. Georg Groh's Social Computing group](https://www.social.in.tum.de/en/group/) at the Technical University of Munich. There I explored different unsupervised and pre-trained neural models, such as Google's XLING and an attention-based clustering model, and studied the extent to which they help in understanding societal contexts, social relations, and cultural differences across regions. The outcome of this research made me aware of the nuances of different societal aspects of the world, which are governed by diverse languages and cultures. It is therefore immensely beneficial to model cross-lingual cultural elements in language in order to better understand behavioral disparities and opinions. This proposal earned me the opportunity to write a master’s thesis on multilingual opinion mining under the supervision of PD Dr. Georg Groh in the Social Computing group.
During my thesis, I realized that a word embedding model trained on large volumes of generic data could not capture the intricate contexts of a domain-specific corpus. After surveying the published literature, I found that orthogonal Procrustes and canonical correlation analysis can align a generic word embedding space with a domain-specific one, so that the lack of context is no longer an obstacle. This combination of mathematical foundations and NLP in the context of social computing made me think about other methods and concepts that could make deep learning-based models more effective and efficient at understanding societal contexts and relations. Furthermore, through my thesis, I realized that there is no single agreed-upon evaluation metric for measuring how coherent the representative words describing a topic are. I also noticed that the analysis and results change depending on the evaluation metric chosen, a problem that persists in topic modeling. Therefore, I want to use my analytical expertise and research experience in NLP to address such difficulties and to explore novel approaches toward robust solutions that benefit society.


Mainak Ghosh
