Yimeng Gu

I am a third-year Computer Science PhD student at Queen Mary University of London, where I am advised by Dr. Gareth Tyson and Dr. Ignacio Castro.

Prior to joining Queen Mary, I obtained my B.E. from Beihang University and two M.S. degrees, one from Carnegie Mellon University and one from The Hong Kong University of Science and Technology.

I work on natural language processing, with applications to misinformation detection. I was a research intern at the Autodesk AI Lab in summer 2023.

Email  /  CV  /  Google Scholar

News

Jan 2024: One paper accepted to ACM WebSci 2024!
Nov 2023: One paper accepted to ICWSM 2024!
Apr 2023: I will be interning at Autodesk Research this summer!
Mar 2022: I ranked 16/69 on sub-task A of SemEval 2022 Task 5: Multimedia Automatic Misogyny Identification!

Research

I'm interested in the broad topic of natural language processing.

Detecting Multimodal Fake News with Gated Variational AutoEncoder
Yimeng Gu, Ignacio Castro, Gareth Tyson
ACM WebSci, 2024

Abstract to be added.

Making the Pick: Understanding Professional Editor Comment Curation in Online News
Yupeng He, Yimeng Gu, Ravi Shekhar, Ignacio Castro, Gareth Tyson
AAAI ICWSM, 2024

This paper studies the growing use of professional editor curation for user-generated comments. We find that editor-picked comments tend to be longer, more relevant to the article, positive in sentiment, and low in toxicity. Our analysis further reveals that editors in different news sections apply different criteria when selecting comments. Finally, we propose a set of models that can automatically identify good candidate editor picks.

MMVAE at SemEval-2022 Task 5: A Multi-modal Multi-task VAE on Misogynous Meme Detection
Yimeng Gu, Ignacio Castro, Gareth Tyson
NAACL SemEval workshop, 2022
[code] [paper] [video]

We propose a Multi-modal Multi-task Variational AutoEncoder (MMVAE) that learns an effective co-representation of the visual and textual features of memes in the latent space, determines whether a meme contains misogynous information, and identifies its fine-grained categories.

Automating Claim Construction in Patent Applications: The CMUmine Dataset
Ozan Tonguz, Yiwei Qin, Yimeng Gu, Hyun Hannah Moon
EMNLP NLLP workshop, 2021
[paper]

We first create a large dataset known as CMUmine™ and then demonstrate that, using NLP and ML techniques, the claim construction process in patent applications can be automated.

Projects
Identifying Mechanisms in Fusion360 Assemblies

We build an AutoEncoder + latentGAN model to learn the probabilistic distribution of the neighboring parts of a given part in an assembly. We evaluate model performance both quantitatively (IoU) and qualitatively. Our approach is able to predict the neighboring parts of a query part from unseen datasets.

Invited Talks

Autodesk Research Connections [Aug 2023]: Identifying Mechanisms in Fusion360 Assemblies

Teaching

ECS765P Big Data Processing - Spring 2022 (TA)

Miscellaneous

In my spare time, I like playing tennis, badminton, ping-pong, and basketball, and working out. I also enjoy watching almost all kinds of sports.

I'm also a museum lover, especially of natural history museums and museums about human culture. Some cool museums I have been to: the Tsingtao Beer Museum and the BMW Museum.


Last updated on Feb 28th, 2024. Template credits to Jon Barron.