Hexiang (Frank) Hu
Ph.D. Student [at] USC
Deep Learner
I am passionate about Machine Learning, Computer Vision, and Natural Language Processing. My objective is to combine the power of vision and language for robots.


Hexiang Hu is a Computer Science Ph.D. student in the Viterbi School of Engineering at the University of Southern California (USC), working with Prof. Fei Sha. Prior to this, he was a Ph.D. student in the Henry Samueli School of Engineering and Applied Science at the University of California, Los Angeles (UCLA). He earned his Bachelor’s degrees in Computer Science from Zhejiang University and Simon Fraser University with honors. His research interests lie in Machine Learning, Computer Vision, and Natural Language Processing. [ Résumé ]


Summer 2018
Intern @ Facebook AI Research
Summer 2017
Applied Scientist Intern @ Amazon AI
Large Scale Machine Learning on Videos
Mentor: Dr. R. Manmatha
2017 -
PhD student @ USC
Large Scale Machine Learning, Vision and Language
Supervisor: Prof. Fei Sha
2016 - 2017
PhD student @ UCLA
Deep Learning, Vision
Supervisor: Prof. Fei Sha

Selected Publications

Cross-Modal and Hierarchical Modeling of Video and Text

Visual data and text data are composed of information at multiple granularities. In this paper, we investigate modeling techniques for such hierarchical sequential data where there are correspondences across multiple modalities.

ECCV 2018 in München, Germany
Multi-Task Learning for Sequence Tagging: An Empirical Study

We study three general multi-task learning (MTL) approaches on 11 sequence tagging tasks. Our extensive empirical results show that in about 50% of cases, jointly learning all 11 tasks improves upon either learning the tasks independently or learning them pairwise. We also show that pairwise MTL can tell us which tasks can benefit others and which tasks can benefit from being learned jointly. We additionally identify tasks that can always benefit others as well as tasks that can always be harmed by others.

COLING 2018 in Santa Fe, New Mexico
[ pdf ] [ bib ]
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets

We show that the design of the decoy answers has a significant impact on how and what the learning models learn from the datasets. In particular, the resulting learner can ignore the visual information, the question, or both while still doing well on the task.

NAACL-HLT 2018 (Oral) in New Orleans, Louisiana
Learning Answer Embedding for Visual Question Answering

We propose a novel probabilistic model for visual question answering.

CVPR 2018 in Salt Lake City, Utah
Cross-Dataset Adaptation for Visual Question Answering

We investigate the problem of cross-dataset adaptation for visual question answering.

CVPR 2018 in Salt Lake City, Utah
Compressed Video Action Recognition

Training robust deep video representations has proven to be much more challenging than learning deep image representations, and this has consequently hampered tasks like video action recognition. Motivated by the fact that superfluous information can be reduced by up to two orders of magnitude with video compression techniques, in this work we propose to train a deep network directly on the compressed video, devoid of redundancy.

CVPR 2018 (Spotlight) in Salt Lake City, Utah
FastMask: Segment Multi-scale Object Candidates in One Shot

We present a novel segment proposal framework, namely FastMask, which takes advantage of the hierarchical structure in deep convolutional neural networks to segment multi-scale objects in one shot. By leveraging a feature pyramid and sliding-window region attention, we make instance proposal not only fast but also more accurate.

CVPR 2017 (Spotlight) in Honolulu, Hawaii
Learning Structured Inference Neural Networks with Label Relations

We propose a generic structured model that leverages diverse label relations to improve image classification performance. It employs a novel stacked label prediction neural network, capturing both inter-level and intra-level label semantics. The design of this framework naturally extends to leverage partial observations in the label space to infer the rest of the label space.

CVPR 2016 in Las Vegas, Nevada