The Cybersecurity Threat Landscape
APA Style, 2 References (I included some resources if you wish), 2 pages.
Note:
· Describe the concept in layman's terms.
· Use visuals where appropriate.
Machine Learning and Data Analytics
o Describe the concepts of machine learning and data analytics and how applying them to cybersecurity will evolve the field.
o Are there companies providing innovative defensive cybersecurity measures based on these technologies? If so, what are they? Would you recommend any of these to the chief technology officer (CTO)?
Learning Resource
Defining Machine Learning
Machine learning is a method of giving a computational device the ability to learn. It differs from traditional programming, in which the structure is logical, explicit, and conditional. Machine learning uses neural networks (among other techniques) and reinforcement learning to teach the computational device to distinguish between correct and incorrect outputs. One prerequisite of machine learning is data: to teach the computational device how to learn, we need data to feed it.
For example, if we wanted to teach a computational device to identify pictures of dogs, we would need to submit pictures with dogs and pictures without dogs. The basics of a neural network can be found inside the system, and the neural network coupled with reinforcement learning allows us to teach the computer. Essentially, the computational device attempts to identify the pictures with dogs. If the system identifies a picture incorrectly, the human corrects the computational device; this is the reinforcement learning aspect. It is akin to a grade-school teacher providing feedback on a math test. Once the system is trained, it identifies pictures of dogs with a high success rate. A rough sketch of this guess-and-correct loop appears below.
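The following short Python sketch is not part of the original resource; it is a minimal illustration of the guess-and-correct loop described above, using a simple perceptron and invented feature vectors in place of real dog pictures.

```python
# Minimal sketch (assumed example, not from the resource): a perceptron
# "learner" guesses, is corrected when wrong, and gradually improves.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each row summarizes an image as 4 numeric features;
# label 1 = "dog", 0 = "not dog". Real systems would use raw pixels.
X = rng.normal(size=(100, 4))
true_w = np.array([1.5, -2.0, 0.5, 1.0])
y = (X @ true_w > 0).astype(int)        # synthetic ground-truth labels

w = np.zeros(4)                          # the learner starts knowing nothing
for epoch in range(10):
    for features, label in zip(X, y):
        guess = int(features @ w > 0)    # the machine's attempt
        error = label - guess            # the "teacher's" correction
        w += 0.1 * error * features      # nudge weights toward the answer

accuracy = np.mean((X @ w > 0).astype(int) == y)
print(f"accuracy after feedback: {accuracy:.0%}")
```

Once trained, the final weights w play the role of the trained system that identifies dogs with a high success rate.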
Learning Resource
Getting Started With Machine Learning
by Grant Ingersoll
Despite all the flashy headlines from Elon Musk and Stephen Hawking on the impending doom to be visited on us mere mortals by killer robots from the skies, machine learning and artificial intelligence are here to stay. More importantly, machine learning (ML) is quickly becoming a critical skill for developers to enhance their applications and their careers, better understand data, and to help users be more effective.
What is machine learning? It is the use of both historical and current data to make predictions, organize content, and learn patterns in data without being explicitly programmed to do so. This is typically done using statistical techniques that look for significant events, such as co-occurrences and anomalies, in the data and then factor their likelihood into a model that is queried later to provide a prediction for some new piece of data.
Common machine learning tasks include classification (applying labels to items), clustering (grouping items automatically), and topic detection. It is also commonly used in natural language processing. Machine learning is increasingly being used in a wide variety of use cases, including content recommendation, fraud detection, image analysis, and e-commerce. It is useful across many industries, and most popular programming languages have at least one open source library implementing common ML techniques.
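As a hedged illustration (not from Ingersoll's article), the scikit-learn library, one of the open source ML libraries alluded to above, reduces the two most common tasks to a few lines; the dataset and models here are standard textbook choices, not ones the article prescribes.

```python
# Classification (labels known) vs. clustering (no labels), in scikit-learn.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Classification: learn from labeled examples, then predict new items.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted label:", clf.predict(X[:1]))

# Clustering: group the same items automatically, with no labels at all.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:10])
```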
Reflecting the broader push in software toward open source, there are now many vibrant machine learning projects available to experiment with as well as a plethora of books, articles, tutorials, and videos to get you up to speed. Let's look at a few projects leading the way in open source machine learning and a few primers on related ML terminology and techniques.
Learning Resource
How the Machine "Thinks": Understanding Opacity in Machine Learning Algorithms
by Jenna Burrell
This article considers the issue of opacity as a problem for socially consequential mechanisms of classification and ranking, such as spam filters, credit card fraud detection, search engines, news trends, market segmentation and advertising, insurance or loan qualification, and credit scoring. These are just some examples of the mechanisms of classification to which the personal and trace data we generate is subject every day in network-connected, advanced capitalist societies. These mechanisms of classification all frequently rely on computational algorithms and, lately, on machine learning algorithms to do this work.
Opacity seems to be at the very heart of new concerns about "algorithms" among legal scholars and social scientists. The algorithms in question operate on data. Using this data as input, they produce an output; specifically, a classification (i.e., whether to give an applicant a loan, or whether to tag an email as spam). They are opaque in the sense that if one is a recipient of the output of the algorithm (the classification decision), rarely does one have any concrete sense of how or why a particular classification has been arrived at from inputs.
Additionally, the inputs themselves may be entirely unknown or known only partially. The question naturally arises: What are the reasons for this state of not knowing? Is it because the algorithm is proprietary? Because it is complex or highly technical? Or are there, perhaps, other reasons?
By distinguishing forms of opacity that are often conflated in the emerging interdisciplinary scholarship on this topic, I seek to highlight the varied implications of algorithmic classification for longstanding matters of concern to sociologists, such as economic inequality and social mobility.
Three distinct forms of opacity include (1) opacity as intentional corporate or institutional self-protection and concealment and, along with it, the possibility for knowing deception; (2) opacity stemming from the current state of affairs, in which writing (and reading) code is a specialist skill; and (3) opacity that stems from the mismatch between mathematical optimization in high dimensionality, characteristic of machine learning, and the demands of human-scale reasoning and styles of semantic interpretation.
This third form of opacity (often conflated with the second form as part of the general sense that algorithms and code are very technical and complex) is the particular focus of this article. By examining in depth this form of opacity, I point out shortcomings in certain proposals for code or algorithm "audits" as a way to evaluate for discriminatory classification.
To examine this question of opacity, specifically toward the task of getting inside the algorithms themselves, I cite existing literature in computer science, known industry practices (as they are publicly presented), and do some testing and manipulation of code as a form of lightweight audit. Along the way, I relate these forms of opacity to technical and nontechnical solutions proposed to address the impenetrability of machine learning classification. Each form suggests distinct solutions for preventing harm.
So, What Is New?
The word algorithm has recently undergone a shift in public presentation, going from an obscure technical term used almost exclusively among computer scientists to one attached to a polarized discourse. The term appears increasingly in mainstream media outlets. For example, the professional body National Nurses United produced a radio spot that starts with a voice sarcastically declaring, "Algorithms are simple mathematical formulas that nobody understands," and concludes with a nurse swooping in to rescue a distressed patient from a disease diagnosis system that makes a series of comically wrong declarations about the patient's condition (see https://soundcloud.com/national-nurses-united/radio-ad-algorithms). The purpose of the public service announcement is to champion professional care (by nurses), in this case against error-prone automation.
By contrast, efforts at corporate branding of the term algorithm play up notions of algorithmic objectivity over biased human decision making (Sandvig, 2015). In this way, the connotations of the term are actively being shaped as part of advertising culture and corporate self-presentation, as well as challenged by a related counter-discourse tied to general concerns about automation, corporate accountability, and media monopolies (e.g., Tufekci, 2014).
While these new media narratives may be novel, it has long been the case that large organizations (including private sector firms and public institutions) have had internal procedures that were not fully understood by those who were subject to them. These procedures could fairly be described as "algorithms." What should we then make of these new uses of the term and the field of critique and analysis emerging along with it? Is this merely old wine in new bottles, or are there genuinely new and pressing issues related to patterns of algorithmic design as they are employed increasingly in real-world applications?
In addition to the polarization of public discourse about algorithms, much of what is new in this domain lies in the more pervasive technologies and techniques of data collection; the vaster archives of personal data, including purchasing activities, link clicks, and geospatial movement, an outcome of more universally adopted mobile devices, services, and applications; and the reality (in some parts of the world) of constant connectivity. But this does not necessarily have much to do with the algorithms that operate on the data. Often, it is about what composes the data and new concerns about privacy and the possibility (or, troublingly, the impossibility) of opting out.
Other changes have to do with particular application areas and evolving proposals for a regulatory response. Algorithmic automation has shifted into areas that were previously white-collar work, reflected in headlines such as "Will we need teachers or algorithms?" (Khosla, 2012), and into consequential processes of classification that were previously human-determined, such as credit evaluations, in an effort to realize the cost savings that so often fuel shifts toward automation (Straka, 2000).
In the domain of credit and lending, Fourcade and Healy point to a shift from prior practices of exclusionary lending to a select few toward more generous credit offered to a broader spectrum of society, but offered to some on unfavorable, even usurious terms. This shift is made possible by "the emergence and expansion of methods of tracking and classifying consumer behavior" (Fourcade & Healy, 2013, p. 560). These methods are (in part) implemented as algorithms in computers. Here, the account seems to suggest an expansion of the territory of work claimed by particular algorithmic routines: they are taking on a broader range of tasks, at a scale they did not previously reach.
In this emerging critique of "algorithms" carried out by scholars in law and in the social sciences, few have considered in much depth their mathematical design. Many of these critics instead take a broad sociotechnical approach looking at "algorithms in the wild." The algorithms in question are studied for the way they are situated within a corporation, under the pressure of profit and shareholder value, and as they are applied to particular real-world user populations (and the data these populations produce). Thus, something more than the algorithmic logic is being examined.
Such analyses are often particular to an implementation (such as Google's search engine) with its specific user base and uniquely accumulated history of problems and failures with resulting parameter setting and manual tweaking by programmers. Such an approach may not reveal important broader patterns or risks to be found in particular classes of algorithms.
Investigating Opacity: A Method and Approach
In general, we cannot look at the code directly for many important algorithms of classification that are in widespread use. This opacity (at one level) exists because of proprietary concerns. They are closed in order to maintain competitive advantage and/or to keep a few steps ahead of adversaries. Adversaries could be other companies in the market or malicious attackers (relevant in many network security applications). However, it is possible to investigate the general computational designs that we know these algorithms use by drawing from educational materials.
To do this, I draw in part from classic illustrative examples of particular machine learning models, of the sort used in undergraduate education. In this case, I have specifically examined programming assignments for a Coursera course in machine learning. These examples offer hugely simplified versions of computational ideas scaled down to run on a student's personal computer so that they return output almost immediately. Such examples do not force a confrontation with many thorny, real-world application challenges. That said, the ways that opacity endures in spite of such simplification reveal something important and fundamental about the limits to overcoming it.
Machine learning algorithms do not encompass all of the algorithms of interest to scholars now studying what might be placed under the banner of the "politics of algorithms." However, they are interesting to consider specifically because they are typically applied to classification tasks and because they are used to make socially consequential predictions such as "How likely is this loan applicant to default?"
In the broader domain of algorithms implemented in various areas of concern (such as search engines or credit scoring), machine learning algorithms may play either a central or a peripheral role, and it is not always easy to tell which is the case. For example, a search engine request is algorithmically driven (except for the part, generally invisible to users, that may be done manually by human workers who do content moderation, cross-checking, ground truthing, and correction) (Miller, 2014), but search engine algorithms are not, at their core, "machine learning" algorithms.
Search engines employ machine learning algorithms for particular purposes, such as detecting ads or blatant search ranking manipulation and prioritizing search results based on the user's location. (See the question and response on a Reddit AMA with Andrew Ng about why companies make their algorithmic techniques public, https://www.reddit.com/r/MachineLearning/comments/32ihpe/ama_andrew_ng_and_adam_coates/cqbkmyb, and a Quora question and response about how machine learning contributes to the Google search engine, http://www.quora.com/Why-is-machine-learning-used-heavily-for-Googles-ad-ranking-and-less-for-their-search-ranking)
While not all tasks to which machine learning is applied are classification tasks, this is a key area of application and one where many sociological concerns arise. As Bowker and Star note in their account of classification and its consequences, "each category valorizes some point of view and silences another," and there is a long history of lives "broken, twisted, and torqued by their encounters with classification systems," such as the race classification system of apartheid South Africa and the categorization of tuberculosis patients (Bowker & Star, 1999).
The claim that algorithms will classify more "objectively" (thus solving previous inadequacies or injustices in classification) cannot simply be taken at face value given the degree of human judgment still involved in designing the algorithms—choices which become built-in. This human work includes defining features, preclassifying training data, and adjusting thresholds and parameters.
Opacity
Below, I define a typology, starting with the matter of "opacity" as a form of proprietary protection or "corporate secrecy" (Pasquale, 2015). Second, I point to opacity in terms of the readability of code. Code writing is a necessary skill for the computational implementation of algorithms, and one that remains a specialist skill not found widely in the general public. Finally, arriving at the major point of this article, I contrast a third form of opacity centering on the mismatch between the mathematical procedures of machine learning algorithms and human styles of semantic interpretation.
At the heart of this challenge is an opacity that relates to the specific techniques used in machine learning. Each of these forms of opacity may be tackled by different tools and approaches ranging from the legislative to the organizational or programmatic and to the technical. But importantly, the form (or forms) of opacity entailed in a particular algorithmic application must be identified in order to pursue a course of action that is likely to mitigate its problems.
Forms of Opacity
Opacity as Intentional Corporate or State Secrecy
One argument in the emerging literature on the "politics of algorithms" is that algorithmic opacity is a largely intentional form of self-protection by corporations intent on maintaining their trade secrets and competitive advantage. Yet this is not just about one search engine competing with another to keep its "secret sauce" under wraps. It is also the case that dominant platforms and applications, particularly those that use algorithms for ranking, recommending, trending, and filtering, attract those who want to "game" them as part of strategies for securing attention from the general public. The field of search engine optimization does just this.
An approach within machine learning called "adversarial learning" deals specifically with these sorts of evolving strategies. Network security applications of machine learning deal explicitly with spam, scams, and fraud and remain opaque in order to be effective. Sandvig notes that this game of cat and mouse makes it entirely unlikely that most algorithms will be (or necessarily should be) disclosed to the general public (Sandvig et al., 2014, p. 9). That said, an obvious alternative to proprietary and closed algorithms is open-source software. Successful business models have emerged out of the open-source movement. There are options even in "adversarial learning," such as the Apache SpamAssassin spam filter.
On the other hand, Pasquale's more skeptical analysis proposes that the current extent of algorithmic opacity in many domains of application may not be justified and is instead a product of lax or lagging regulations. In his book The Black Box Society: The Secret Algorithms that Control Money and Information, he argues that a kind of adversarial situation is indeed in play, one where the adversary is regulation itself. "What if financiers keep their doings opaque on purpose, precisely to avoid or to confound regulation?" (Pasquale, 2015, p. 2). In reference to this, he defines opacity as "remediable incomprehensibility."
The opacity of algorithms, according to Pasquale, could be attributed to willful self-protection by corporations in the name of competitive advantage, but this could also be a cover for a new form of concealing sidestepped regulations, the manipulation of consumers, and/or patterns of discrimination.
For this type of opacity, one proposed response is to make code available for scrutiny, through regulatory means if necessary (Diakopoulos, 2013; Gandy, 2010; Pasquale, 2015). Underlying this particular explanation for algorithmic opacity is an assumption that if corporations were willing to expose the design of the algorithms they use, it would be possible to ascertain problems of consumer manipulation or regulatory violation by reading the code. Pasquale acknowledges that such measures could render algorithms ineffective, though he suggests that it may still be possible with the use of an independent, trusted auditor who can maintain secrecy while serving the public interest (Pasquale, 2015, p. 141). In the absence of access to the code, Sandvig et al. (2014) detail and compare several forms of algorithmic audit (carried out with or without corporate cooperation) as a possible response and a way of forcing the issue without requiring access to the code itself.
Opacity as Technical Illiteracy
This second level of opacity stems from the fact that, at present, writing (and reading) code and designing algorithms are specialized skills that remain inaccessible to most of the population. Courses in software engineering emphasize the writing of clean, elegant, and intelligible code. While code is implemented in particular programming languages, such as C or Python, and the syntax of these languages must be learned, they are in certain ways quite different from human languages. For one, they adhere strictly to logical rules and require precision in spelling and grammar in order to be read by the machine.
Good code does double duty. It must be interpretable by humans (the original programmer or someone adding to or maintaining the code) as well as by the computational device (Mateas & Montfort, 2005). Writing for the computational device demands a special exactness, formality, and completeness that communication via human languages does not. The art and craft of programming is partly about managing this mediating role and entails some well-known best practices like choosing sensible variable names, including "comments" (one-sided communication to human programmers omitted when the code is compiled for the machine), and choosing the simpler code formulation, all things being equal. (See also Ensmenger, 2003, on programming as craft and programmers as a profession.)
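As a small, invented illustration of this double duty (not from Burrell's article), both functions below compute the same thing, but only the second communicates its purpose to human readers:

```python
# Opaque to humans, fine for the machine.
def f(a, b):
    return a * b * 0.5

# The same logic written for both audiences: descriptive names, a
# docstring for maintainers, and a comment the machine ignores.
def triangle_area(base: float, height: float) -> float:
    """Return the area of a triangle with the given base and height."""
    return base * height * 0.5  # area = (1/2) * base * height

print(f(6, 4), triangle_area(6, 4))  # both print 12.0
```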
Recent calls for greater diversity in STEM fields and for general efforts toward developing computational thinking at all levels of education (Lee et al., 2011; Wing, 2006) are relevant. Diakopoulos (2013) likewise suggests ways that journalists might play a valuable role in reverse engineering algorithms to inform the general public, but notes that this poses a challenge of "human resource" development, one of developing code and computational literacy in journalists or others who wish to do this sort of examination. To address this form of opacity, widespread educational efforts would ideally make the public more knowledgeable about these mechanisms that affect their life opportunities and put people in a better position to directly evaluate and critique them.
Opacity as the Way Algorithms Operate at the Scale of Application
Scholars have noted that algorithms (such as the one underlying the Google search engine) are often large systems built from many components by teams of engineers, producing an opacity that even programmers who are insiders to the algorithm must contend with (Sandvig, Hamilton, Karahalios, & Langbort, 2014; Seaver, 2014). A call for code "audits" (where this means reading the code) and the employment of auditors may underestimate what this would entail in terms of the number of hours required to untangle the logic of the code within a complicated software system. This valid critique is nevertheless nonspecific about different classes of algorithms and their particular logics.
I further argue that there are certain challenges of scale and complexity that are distinctive to machine learning algorithms. These challenges relate not simply to the total number of lines or pages of code, the size of the engineering team, or the multitude of interlinkages between modules or subroutines. They are challenges not just of reading and comprehending code, but of understanding the algorithm in action, operating on data. Though a machine learning algorithm can be implemented so simply that its logic is almost fully comprehensible, in practice such an instance is unlikely to be particularly useful. Machine learning models that prove useful (specifically, in terms of the "accuracy" of classification) possess a degree of unavoidable complexity.
Machine learning in particular is often described as suffering from the "curse of dimensionality" (Domingos, 2012). In a big data era, billions or trillions of data examples and thousands or tens of thousands of properties of the data (termed "features" in machine learning) may be analyzed. The internal decision logic of the algorithm is altered as it learns on training data. Handling a huge number of especially heterogeneous properties of data (i.e., not just words in spam email but also email header info) adds complexity to the code.
Machine learning techniques quickly face computational resource limits as they scale and may manage this using techniques written into the code (such as "principal component analysis"), which add to its opacity. While datasets may be extremely large but possible to comprehend and code may be written with clarity, the interplay between the two in the mechanism of the algorithm is what yields the complexity (and thus opacity). Better understanding this complexity (and the barriers to overcoming the opacity it effects) is the concern of the following examples.
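As a hedged sketch of the dimensionality-management step named above (scikit-learn is assumed here; the article itself supplies no code), principal component analysis compresses many correlated features into a handful of components before classification:

```python
# Dimensionality reduction with principal component analysis (PCA).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)      # 64 pixel features per digit image
pca = PCA(n_components=10).fit(X)
X_small = pca.transform(X)

print("original feature count:", X.shape[1])
print("reduced feature count:", X_small.shape[1])
print(f"variance retained: {pca.explained_variance_ratio_.sum():.2f}")
```

The compression makes the computation tractable, but each component is a weighted blend of the original features, which is precisely the kind of transformation that adds to opacity.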
Machine Learning: A Brief Primer
Machine learning algorithms are used as powerful generalizers and predictors. Since the accuracy of these algorithms is known to improve with greater quantities of data to train on, the growing availability of such data in recent years has brought renewed interest to these algorithms.
A given machine learning algorithm generally includes two parallel operations, or two distinct algorithms: a classifier and a learner.
Classifiers take input (referred to as a set of "features") and produce an output (a "category"). For example, a classifier that does spam filtering takes a set of features (such as email header information, words in the body of the email, etc.) and produces one of two output categories (spam or not spam). A decision support system that does disease diagnosis may take input (clinical presentation/symptoms, blood test results) and produce a disease diagnosis as output (hypertension, heart disease, liver cancer, etc.).
However, machine learning algorithms called "learners" must first train on training data. (This refers to the subset of machine learning approaches called "supervised" learning, which, for the sake of clarity of argument, is what is specifically considered here.) The result of this training is a matrix of weights that will then be used by the classifier to determine the classification for new input data. This training data could, for example, be emails that have been presorted and labeled as spam or not spam.
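The following sketch (an assumed illustration using scikit-learn and a tiny invented corpus, not code from the article) shows the learner/classifier split on the spam example just described:

```python
# A learner trains on presorted emails; the trained model then classifies.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win free money now", "lunch meeting tomorrow",
          "free prize claim now", "project status report"]
labels = ["spam", "not spam", "spam", "not spam"]   # presorted by a human

vec = CountVectorizer()
X_train = vec.fit_transform(emails)     # words become the "features"

learner = MultinomialNB().fit(X_train, labels)   # training yields the weights

# The trained model now acts as the classifier for new input.
new_email = vec.transform(["claim your free money"])
print(learner.predict(new_email))       # expected output: ['spam']
```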
Machine learning encompasses a number of models that are implemented in code in different ways. Some popular machine learning models include neural networks, decision trees, naive Bayes, and logistic regression. The choice of model depends upon the domain (i.e., loan default prediction versus image recognition), its demonstrated accuracy in classification, and available computational resources, among other concerns. Models may also be combined into "model ensembles," an approach often used in machine learning competitions that seek to maximize accuracy in classification. Two applications of machine learning using separate models will be considered below.
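As a sketch of the "model ensemble" idea (again assuming scikit-learn; the dataset and models are illustrative choices, not the article's), three of the models named above can be combined so that the group votes on each classification:

```python
# Combining logistic regression, a decision tree, and naive Bayes.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
ensemble = VotingClassifier(estimators=[
    ("logreg", LogisticRegression(max_iter=5000)),
    ("tree", DecisionTreeClassifier(max_depth=4)),
    ("nb", GaussianNB()),
])
score = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"ensemble accuracy: {score:.3f}")
```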
Visualizing Opacity in a Neural Network
The first model and application of machine learning I wish to consider is a "neural network" applied to an image recognition task. Because this is an image recognition task, it lends itself to an attempt to see the weights output by the training algorithm. The classic example for teaching neural networks to computer science undergraduates is handwriting recognition. (Giving some sense of perhaps how little the algorithms themselves have changed, this is the same example used to teach neural networks in the course I took as an undergraduate in 2001 as well as in the Coursera course I completed in 2013.)
To simplify the computational task for educational purposes, the code is implemented to recognize handwritten digits only (the numbers 0 through 9). To further simplify the task, these digits are drawn within the boundaries of a space-constrained box. In the top figure below, you can see some of the fuzziness and ambiguity of the data that is to be classified. If you take a single handwritten number in an 8 by 8 pixel square, each pixel (and a grayscale value associated with it) becomes an input (or "feature") to the classifier, which ultimately outputs what number it recognizes (in the case of the second figure, it should be the number 6).
Figure: A set of examples of handwritten numbers that a machine learning algorithm (a learner), in this case a neural network, could be trained on.
Figure: A handwritten number in an 8 x 8 pixel square.
In the design of a neural network, a set of input nodes connects to a second set of nodes called the hidden layer (like interlinked neurons in the brain) and then to an output layer (see the next figure). In the network shown in that figure, each input node is connected to every node in the hidden layer, and each node in the hidden layer is connected to every output node. A value or weight is associated with each of these connecting lines.
The optimal values for the matrix of weights are what the learning algorithm learns. What is optimal is defined by the set of weights that produce the most accurate possible classification of inputs (the individual pixels and their intensity ranging from white to black in an 8 by 8 matrix) to outputs (the handwritten numbers these pixels represent).
Figure: Graphical depiction of a neural network.
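The following sketch (assuming scikit-learn rather than the Coursera assignments the author examined) trains exactly this kind of one-hidden-layer network on 8 x 8 digit images and shows the weight matrices the learning algorithm optimizes:

```python
# Train a small neural network on 8x8 handwritten digits and inspect
# the learned weight matrices the text describes.
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)      # 64 pixel inputs per digit
net = MLPClassifier(hidden_layer_sizes=(25,), max_iter=2000,
                    random_state=0).fit(X, y)

# One weight matrix per layer: 64 inputs -> 25 hidden -> 10 outputs.
print([w.shape for w in net.coefs_])     # [(64, 25), (25, 10)]
print(f"training accuracy: {net.score(X, y):.3f}")
```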
Because this is an image recognition task, we can actually visualize the optimized weights coming into the hidden layer node. In this way, we can see the way a neural network breaks down the problem of recognizing a handwritten number (see the following figures).
Figure: Left, the hidden layer: The black areas in each box are the areas (strokes or other patterns) that a particular hidden layer node cues in on in a handwritten digit. Right: The result of the same learning algorithm run a second time with the same training data. The two are not identical because of the random initialization step, which sets the initial weights to very small random numbers.
The figure at left above illustrates the hidden layer in a neural network. If you look at one of the 25 boxes, you can see which part of a handwritten number it cues in on. Each box represents a single node in the hidden layer, and each pixel within the box illustrates the value of the weight coming from one input layer node into that particular hidden layer node. In sum, each box shows the set of weights for a simplified neural network with only one hidden layer.
The regions in the box that are black are the specific pixels to which the node in question is most strongly attuned, that is, the strokes or patterns it has learned to respond to in a handwritten digit.
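Such a figure can be reproduced with a short script; the sketch below (again an assumed scikit-learn illustration, not the article's code) reshapes each hidden node's 64 incoming weights into an 8 x 8 image:

```python
# Visualize what each hidden node "cues in on" by reshaping its
# incoming weights into the same 8x8 grid as the input images.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(25,), max_iter=2000,
                    random_state=0).fit(X, y)

fig, axes = plt.subplots(5, 5, figsize=(6, 6))
for node, ax in enumerate(axes.ravel()):
    ax.imshow(net.coefs_[0][:, node].reshape(8, 8), cmap="gray")
    ax.set_xticks([])
    ax.set_yticks([])
fig.suptitle("Patterns each hidden layer node cues in on")
plt.show()
```

Running the script twice with different random seeds reproduces the left/right contrast in the figure: similar overall behavior, different individual weight patterns.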