In [1]:
%matplotlib inline
#Some imports
import time
#import xml.etree.ElementTree as etree
from lxml import etree
import feedparser
#Other key imports (pandas, numpy, os, datetime, json, pickle, re, matplotlib) are loaded from my profile (see standard_imports.py in the src folder)
#Paths
top = os.path.dirname(os.getcwd())
#External data (location of the downloaded GRID database and the matched paper data)
ext_data = os.path.join(top,'data/external')
#Interim data (where seed lists etc. are placed)
int_data = os.path.join(top,'data/interim')
#Figures
fig_path = os.path.join(top,'reports/figures')
#Models
mod_path = os.path.join(top,'models')
#Get date for saving files
today = datetime.datetime.today()
today_str = "_".join([str(x) for x in [today.day,today.month,today.year]])
In [2]:
#Load papers
papers = pd.read_csv(ext_data+'/matched_data/compsci_stats_with_tag.csv')
#Focus on good matches
papers = papers.loc[papers.score>0.99,:]
#Focus on papers published between 1991 and 2017
papers = papers.loc[(papers.year<2018) & (papers.year>1990),:]
In [3]:
#Normalise citations by the number of years since publication (relative to 2018)
papers['citations_norm'] = [x/(2018-y) for x,y in zip(papers.citations,papers.year)]
In [4]:
#Quantiles
papers.citations.describe()
Out[4]:
In [5]:
papers.citations_norm.describe()
Out[5]:
In [6]:
#Flag highly cited papers: more than 9 citations overall, or more than 2 citations per year
papers['citations_high'] = [x>9 for x in papers.citations]
papers['citations_high_norm'] = [x>2 for x in papers.citations_norm]
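These cutoffs are hard-coded. As a minimal sketch, they could instead be read off the distributions summarised above; the 0.75 quantile is an assumed choice, not necessarily the rule used in this analysis.
#Sketch: distribution-based cutoffs rather than hard-coded ones (the 0.75 quantile is an assumption)
print(papers.citations.quantile(0.75), papers.citations_norm.quantile(0.75))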
In [7]:
#Mean and median normalised citations by topic
papers.groupby('topic')['citations_norm'].aggregate(['mean','median'])
Out[7]:
In [8]:
#Within each topic, share of papers that are highly cited (column-normalised crosstab)
pd.crosstab(papers.citations_high_norm,papers.topic,normalize=1)
Out[8]:
In [9]:
#Number of distinct cs. categories among the papers' first tags
len(set([x for x in papers.first_tag if 'cs.' in x]))
Out[9]:
In [10]:
#We want to focus on a small number of areas where DL has become more important (i.e. where there was a paradigm shift)
#Top 10 categories by share of papers in the DL topic
100*pd.crosstab(papers.first_tag,papers.topic,normalize=0).sort_values(1,ascending=False)[:10]
Out[10]:
In [11]:
#Focus on CV, CL, ML and LG
#These are areas that are not by definition focused on neural networks (unlike e.g. cs.NE)
selected = ['cs.CV','cs.CL','stat.ML','cs.LG']
papers['selected_topics'] = [x in selected for x in papers.first_tag]
#Focus on the selected papers
papers_selected = papers.loc[papers.selected_topics==True,:]
#Crosstab of selected papers by year and topic
pd.crosstab(papers_selected.year,papers_selected.topic)
Out[11]:
In [12]:
#Number of selected papers and how many of them are in the DL topic (topic==1)
print(len(papers_selected))
np.sum(papers_selected.topic==1)
Out[12]:
In [13]:
#Activity: number and share of papers per year by topic
fig,ax = plt.subplots(nrows=2,figsize=(10,6),sharex=True)
year_tots = pd.crosstab(papers_selected.year,papers_selected.topic)
year_tots_recent = year_tots[year_tots.index>=2000]
#Top panel: counts
year_tots_recent.plot.bar(stacked=True,ax=ax[0])
ax[0].legend(labels=['Other','Deep Learning'],title='Topic')
ax[0].set_title("Number of papers per year and topic")
#Bottom panel: shares (each year sums to 1)
year_props = year_tots_recent.apply(lambda x: x/x.sum(),axis=1)
year_props.plot.bar(stacked=True,ax=ax[1],legend=False)
ax[1].set_title("Share of papers per year and topic (% of total in selected topics)")
plt.tight_layout()
plt.savefig(fig_path+'/paper_all_papers.pdf')
In [14]:
#OK. Let's look at highly cited papers in these domains
highly_cited = papers_selected.loc[(papers_selected.citations_high_norm==True) &
(papers_selected.year>=2000),:]
fig,ax = plt.subplots(nrows=2,figsize=(10,6),sharex=True)
year_tots_cited = pd.crosstab(highly_cited.year,highly_cited.topic)
year_tots_cited.plot.bar(stacked=True,ax=ax[0])
ax[0].legend(labels=['Other','Deep Learning'],title='Topic')
ax[0].set_title("Number of highly cited papers per year and topic")
#Bottom panel: shares (each year sums to 1)
year_props_cited = year_tots_cited.apply(lambda x: x/x.sum(),axis=1)
year_props_cited.plot.bar(stacked=True,ax=ax[1],legend=False)
ax[1].set_title("Share of highly cited papers per year and topic (% of total in selected topics)")
plt.tight_layout()
plt.savefig(fig_path+'/papers_highly_cited.pdf')
There is a sense of a revolution from around 2014: both the number of papers and the number of highly cited papers in the Deep Learning topic increase rapidly from that point.
In [15]:
def create_rca(df):
    '''
    Takes a df where each cell is the activity of the row entity (institute) in the column area
    and returns a df where each cell is the revealed comparative advantage (RCA) of that institute
    in that area: its share of activity in the area divided by the area's share of all activity.
    '''
    area_activity = df.sum(axis=0)
    area_shares = area_activity/area_activity.sum()
    rca = df.apply(lambda x: (x/x.sum())/area_shares, axis=1)
    return(rca)

def extract_rca(df,var,year):
    '''
    Takes the papers df and returns institute x category RCAs for a given year, based either on
    paper counts (var='pid') or on counts of highly cited papers (var='citations_high_norm').
    '''
    #Subset and aggregate by institute and category
    if var=='pid':
        subset = df.loc[df.year==year,:].groupby(['institute','first_tag'])['pid'].count().reset_index(drop=False)
    if var=='citations_high_norm':
        subset = df.loc[df.year==year,:].groupby([
            'institute','first_tag'])[var].sum().reset_index(drop=False)
    #Pivot to get the institute x category activity matrix
    pivoted = pd.pivot_table(subset,index='institute',columns='first_tag',values=var)
    pivoted.fillna(0,inplace=True)
    #RCA
    rca = create_rca(pivoted)
    rca['year'] = year
    return(rca)
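To make the RCA calculation concrete, here is a minimal sketch on a made-up two-institute, two-category table (the numbers are purely illustrative): an RCA above 1 means the institute is relatively specialised in that category.
#Toy illustration of create_rca (made-up numbers): rows = institutes, columns = categories
toy = pd.DataFrame({'cs.CV':[8,2],'cs.CL':[2,8]},index=['org_a','org_b'])
#org_a puts 80% of its papers in cs.CV while cs.CV accounts for 50% of all papers,
#so its RCA in cs.CV is 0.8/0.5 = 1.6 (relatively specialised)
create_rca(toy)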
In [16]:
rca_papers = pd.concat([extract_rca(
papers,'pid',y) for y in np.arange(2005,2018)]).reset_index(drop=False)
rca_citations = pd.concat([extract_rca(
papers,'citations_high_norm',y) for y in np.arange(2005,2018)]).reset_index(drop=False)
In [17]:
#Top 100 institutes by number of papers (value_counts already sorts in descending order)
my_orgs = papers['institute'].value_counts()[:100].index
rca_papers = rca_papers.loc[[
    x in my_orgs for x in rca_papers.institute],['institute','year']+selected].reset_index(drop=True)
rca_papers['period'] = ['2005_12' if x <2012 else '2012_17' for x in rca_papers.year]
rca_papers_averaged = pd.melt(rca_papers.drop(
'year',axis=1),id_vars=['institute','period']).groupby(
['institute','period','variable'])['value'].mean().reset_index(drop=False)
#Correlation between category RCAs across institutes, 2005-2012
rca_2005_12 = pd.pivot_table(rca_papers_averaged.loc[rca_papers_averaged.period=='2005_12',:],
                             index='institute',columns='variable',values='value')
rca_2005_12.corr()
Out[17]:
In [18]:
#This is already quite interesting: there is convergence in specialisations across categories
rca_2012_17 = pd.pivot_table(rca_papers_averaged.loc[rca_papers_averaged.period=='2012_17',:],
                             index='institute',columns='variable',values='value')
rca_2012_17.corr()
Out[18]:
In [19]:
#Create scatters
rca_papers_averaged_wide = pd.pivot_table(rca_papers_averaged,
index=['institute','variable'],
columns='period',values='value').reset_index(drop=False)
fig,ax = plt.subplots(nrows=4,figsize=(4,20))
for num,x in enumerate(set(rca_papers_averaged_wide.variable)):
    subset = rca_papers_averaged_wide.loc[rca_papers_averaged_wide.variable==x,:]
    ax[num].scatter(subset['2012_17'],subset['2005_12'],alpha=0.7)
    corr = np.round(subset['2012_17'].corr(subset['2005_12'],method='spearman'),4)
    ax[num].set_title('Period comparison for topic {x} \n (correlation={corr})'.format(x=x,corr=corr))
    ax[num].set_xlabel('RCA_2012_2017')
    ax[num].set_ylabel('RCA_2005_2012')
    ax[num].hlines(y=np.median(subset['2005_12']),xmin=0,xmax=np.max(subset['2012_17']),color='red',linestyle='dashed')
    ax[num].vlines(x=np.median(subset['2012_17']),ymin=0,ymax=np.max(subset['2005_12']),color='red',linestyle='dashed')
plt.tight_layout()
In [20]:
#Produce charts of DL adoption by area
papers_plot = papers_selected.loc[papers_selected[
'year']>2004].groupby(['year','topic','first_tag'])['pid'].count().reset_index(drop=False)
fig,ax = plt.subplots(nrows=4,figsize=(7,20))
for num,x in enumerate(set(papers_plot['first_tag'])):
    #Subset
    subset = papers_plot.loc[papers_plot.first_tag==x,:]
    #Pivot
    pivoted = pd.pivot_table(subset,index='year',columns='topic',values='pid')
    pivoted_props = pivoted.apply(lambda x: x/x.sum(),axis=1)
    pivoted_props.plot.bar(stacked=True,ax=ax[num])
    ax[num].set_title("Importance of DL in topic {x}".format(x=x))
    ax[num].legend(labels=['Other','DL'],title='Topic')
plt.tight_layout()
In [21]:
#Put everything together in a single chart
fig,ax = plt.subplots(nrows=4,ncols=2,figsize=(8,12))
topics_sorted = ['cs.CV','cs.CL','cs.LG','stat.ML']
for num,x in enumerate(topics_sorted):
    subset = rca_papers_averaged_wide.loc[rca_papers_averaged_wide.variable==x,:]
    ax[num,1].scatter(subset['2012_17'],subset['2005_12'],alpha=0.7)
    corr = np.round(subset['2012_17'].corr(subset['2005_12'],method='spearman'),4)
    ax[num,1].set_title('Period comparison for topic {x} \n (correlation={corr})'.format(x=x,corr=corr))
    ax[num,1].set_xlabel('RCA_2012_2017')
    ax[num,1].set_ylabel('RCA_2005_2012')
    ax[num,1].hlines(y=np.median(subset['2005_12']),xmin=0,xmax=np.max(subset['2012_17']),color='red',linestyle='dashed')
    ax[num,1].vlines(x=np.median(subset['2012_17']),ymin=0,ymax=np.max(subset['2005_12']),color='red',linestyle='dashed')
    #Subset
    subset = papers_plot.loc[papers_plot.first_tag==x,:]
    #Pivot
    pivoted = pd.pivot_table(subset,index='year',columns='topic',values='pid')
    pivoted_props = pivoted.apply(lambda x: x/x.sum(),axis=1)
    pivoted_props.plot.bar(stacked=True,ax=ax[num,0])
    ax[num,0].set_title("Importance of DL in topic {x}".format(x=x))
    ax[num,0].legend(labels=['Other','DL'],title='Topic',loc=3)
plt.tight_layout()
#fig.suptitle('Levels of DL activity and changes in performance before and after DL break \n (by research category)',
# fontsize=14)
plt.savefig(fig_path+'/activity_plots.pdf')
In [22]:
#Map this as a set of scatters.
institute_geo = papers_selected.drop_duplicates('institute')[['institute','lat','lon']]
rca_papers_averaged_wide_geo = pd.merge(rca_papers_averaged_wide,institute_geo,
left_on='institute',right_on='institute')
fig,ax = plt.subplots(nrows=4,ncols=2,figsize=(20,20))
for num,x in enumerate(topics_sorted):
    subset = rca_papers_averaged_wide_geo.loc[rca_papers_averaged_wide_geo.variable==x,:]
    #First plot: 2005-2012
    ax[num,0].scatter(subset['lon'],subset['lat'],s=50*subset['2005_12'],alpha=0.8,c=subset['2005_12'])
    ax[num,0].set_title("RCA for topic {top}, 2005-2012".format(top=x))
    #Second plot: 2012-2017
    ax[num,1].scatter(subset['lon'],subset['lat'],s=50*subset['2012_17'],alpha=0.8,c=subset['2012_17'])
    ax[num,1].set_title("RCA for topic {top}, 2012-2017".format(top=x))
In [23]:
#Load the GRID data
with open(ext_data+'/grid/grid.json','r') as infile:
    grid = json.load(infile)['institutes']
country_continents = pd.read_csv(ext_data+'/country_continent.csv')
country_continents.rename(columns={'iso 3166 country':'country',
'continent code':'continent'},inplace=True)
In [24]:
#Keep GRID records that have a name and look up city and country for our top institutes
grid_has_name = [x for x in grid if 'name' in x.keys()]
top_cs_discs = pd.concat([
    pd.DataFrame({'name':x['name'].lower(),
                  'city':x['addresses'][0]['city'],
                  'country':x['addresses'][0]['country_code']},index=[0])
    for x in grid_has_name if x['name'].lower()
    in set(rca_papers_averaged_wide.institute)]).reset_index(drop=True)
In [25]:
rca_country = pd.merge(pd.merge(rca_papers_averaged_wide,
top_cs_discs,left_on='institute',
right_on='name').drop('name',axis=1),
country_continents,left_on='country',right_on='country')
#Fill missing continent codes with 'AM' (the 'NA' code for North America is typically read in as NaN)
rca_country.fillna('AM',inplace=True)
In [26]:
set(rca_country.continent)
color_lookup = {'AF':'grey','AM':'blue','AS':'green','EU':'orange','OC':'red'}
In [27]:
#Turn this into a bump chart: how do organisation rankings change between the two periods?
fig,ax = plt.subplots(ncols=4,figsize=(20,15))
for num,top in enumerate(topics_sorted):
    subset = rca_country.loc[rca_country.variable==top,:].reset_index(drop=True)
    rank_1 = subset['2005_12'].rank()
    rank_2 = subset['2012_17'].rank()
    for x in np.arange(0,len(rank_1)):
        ax[num].plot([rank_1[x],rank_2[x]],alpha=0.6,color=color_lookup[subset.loc[x,'continent']])
    ax[num].set_title('Organisation rankings \n in category {top}'.format(top=top),size=18)
    ax[num].set_xticks([0,1])
    ax[num].set_xticklabels(['2005_12','2012_17'],fontsize=14)
#ax[num].legend(labels=color_lookup.keys())
plt.savefig(fig_path+'/bump_chart.pdf')
In [31]:
#Load data
arxiv_labelled = pd.read_csv(ext_data+'/arxiv_papers_with_label.csv')
In [32]:
#Imports
In [33]:
#Sentiment analysis and text processing imports
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from gensim import models, corpora
from nltk.corpus import stopwords
import string
sid = SentimentIntensityAnalyzer()
#NLTK stopword lists are keyed by lowercase language names
stop_words = stopwords.words('english')
#Escape punctuation so it can be dropped safely inside a regex character class
symbols = re.escape(string.punctuation)
In [34]:
def pre_process_text(text):
    '''
    Pre-process a text: lowercase, strip newlines and punctuation, tokenise on spaces
    and remove stopwords. (Should be turned into a utility one of these days!)
    '''
    cleaned = re.sub("\n"," ",str(text).lower())
    cleaned_no_signs = re.sub(r'[{x}]'.format(x=symbols),'',cleaned)
    tokenised = cleaned_no_signs.split(" ")
    no_sws = [x for x in tokenised if x not in stop_words and len(x)>0]
    return(no_sws)
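A quick sanity check of the pre-processing on a made-up sentence (the exact output depends on the NLTK stopword list):
#Quick sanity check on a made-up sentence
pre_process_text("We propose a Deep Learning model for image classification.")
#Expected output along the lines of ['propose','deep','learning','model','image','classification']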
In [35]:
#Tokenise the abstract text
arxiv_tokenised = [pre_process_text(x) for x in arxiv_labelled.summary]
In [36]:
#Phrases detects frequent token pairs (bigrams) in the corpus
#Train the phrase model
phrases = models.Phrases(arxiv_tokenised)
#Phraser is a lighter object that transforms token lists into bigrams
bigram = models.phrases.Phraser(phrases)
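As a quick check that the phraser is doing what we expect, it can be applied to a single token list; pairs that co-occur frequently in the corpus (e.g. 'neural' and 'networks', if common enough) come back merged into one underscore-joined token.
#Quick check: frequent pairs should come back joined with an underscore (corpus-dependent)
bigram[['we','train','deep','neural','networks','on','images']]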
In [37]:
new_corpus = bigram[arxiv_tokenised]
dictionary = corpora.Dictionary(new_corpus)
#Remove tokens that appear in fewer than 5 documents
dictionary.filter_extremes(no_below=5)
#Create BOW
corpus_bow = [dictionary.doc2bow(x) for x in new_corpus]
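For reference, doc2bow represents each document as a list of (token_id, count) pairs; a quick peek at the first few entries of the first document (the exact ids and counts depend on the corpus):
#Each document is a list of (token_id, count) pairs
corpus_bow[0][:5]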
In [39]:
?models.LdaMulticore
In [48]:
#Model container: collects [num_topics, model] pairs (initialise once before training)
model_cont = []
#Loop over candidate numbers of topics
for x in [200,300]:
    print(x)
    #Train the model
    mod = models.LdaModel(corpus_bow,num_topics=x,iterations=100,passes=5,
                          id2word=dictionary)
    model_cont.append([x,mod])
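A minimal sketch of how the fitted models could be inspected; show_topics is a standard gensim LdaModel method, and the indexing assumes model_cont holds [num_topics, model] pairs as above.
#Inspect the top words of a few topics in the most recently fitted model
for topic_id, words in model_cont[-1][1].show_topics(num_topics=10,num_words=8):
    print(topic_id, words)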
In [58]:
#Save the fitted models to disk
with open(mod_path+'/initial_models.p','wb') as outfile:
    pickle.dump(model_cont,outfile)
In [598]:
%%time
#Extract the polarity scores from articles
arxiv_sentiment = [sid.polarity_scores(" ".join(x)) for x in arxiv_tokenised]
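For reference, polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' components, which is why the negative score can be pulled out in the next cell. For example (illustrative values only):
#polarity_scores returns a dict with 'neg', 'neu', 'pos' and 'compound' keys
sid.polarity_scores("this approach fails badly on noisy data")
#e.g. {'neg': 0.4, 'neu': 0.6, 'pos': 0.0, 'compound': -0.6} (illustrative values)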
In [614]:
#Attach the negative sentiment component of each abstract to the paper metadata
arxiv_sent = pd.concat([pd.DataFrame([x['neg'] for x in arxiv_sentiment]),
                        arxiv_labelled],axis=1)
arxiv_sent.rename(columns={0:'negativity'},inplace=True)
In [621]:
#Rank unique DL papers by the negativity of their abstracts
neg_ranked = arxiv_sent.drop_duplicates('pid').loc[arxiv_sent.is_Deep_learning==1,:].sort_values(
    'negativity',ascending=False)[['negativity','summary']].reset_index(drop=True)
In [636]:
#Print the 30 most negative DL abstracts
for n in np.arange(0,len(neg_ranked[:30])):
    print(n)
    print(neg_ranked['summary'][n])
    print("\n")
In [ ]: