Can I get assistance with this case study paper?
Introduction: Context, Questions/Hypotheses
Data & methods: Model selection, Data preparation
Results: Implementation of Descriptive statistical methods & Analytics
Conclusion: Significance of the results & Future work
Datasets, examples & template are attached
Tesla Stock Forecasting Feature Analysis
Austin Li (jl3273), Lang Lei (ll674), Shichen Qi (sq89)
1. Introduction
In modern quantitative research, innovation and the use of previously neglected data sources are what push the industry forward. Traditional stock price forecasting relies heavily on time series analysis; we would like to explore ways of performing such analysis and prediction without relying on time series properties (moving averages, etc.). In this project, we investigate the influence of external factors as additional features for traditional forecasting models, and we compare the performance of different combinations of these features. For instance, we run sentiment analysis on Elon Musk’s tweets to see whether their tone is associated with fluctuations in Tesla’s closing price. State-of-the-art forecasting techniques such as recurrent networks and deep learning are referenced and modified in this project so that it is technically relevant in 2021.
2. Dataset
2.1 Dataset Description
To analyze the relationship between Elon Musk’s Twitter content and Tesla’s stock price, we extracted Elon Musk’s Twitter data through the Twitter API and obtained stock price data for Tesla and its main competitors (Volkswagen, General Motors and Ford) from Yahoo Finance.
Competitors Stock Price dataset:
This dataset contains the open and close stock prices of three of Tesla’s major competitors (Volkswagen, General Motors and Ford) from Jan 1, 2019 to Sep 30, 2021.
Tesla Stock Price dataset:
This dataset contains Tesla’s open and close stock prices from Jan 1, 2019 to Sep 30, 2021, shown below.
To better understand the relationship between the brands, we combined the two datasets and used two line graphs to show the change in stock price from 2019 to 2021, shown below.
It is noticeable that the trend of General Motors looks the most similar to that of Tesla, which suggests that principal component analysis could be a topic for future work. Additionally, we added a stock price difference column for each brand to give the model more features. One benefit of doing so is that these columns make the relationship among a single brand’s prices easy to see.
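As a rough illustration of the difference columns described above, the sketch below derives one extra column per brand. It assumes the difference is close minus open, which the text does not specify. The function name `add_price_difference`, the column names and the sample prices are hypothetical and are not taken from the dataset.

```python
def add_price_difference(rows, brand):
    """Return copies of the rows with a '<brand>_diff' column.

    Assumption: the difference is close minus open for the same day.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay unchanged
        new_row[f"{brand}_diff"] = row[f"{brand}_close"] - row[f"{brand}_open"]
        out.append(new_row)
    return out


# Hypothetical sample rows (illustrative values only)
rows = [
    {"date": "2019-01-02", "GM_open": 33.0, "GM_close": 33.5},
    {"date": "2019-01-03", "GM_open": 33.5, "GM_close": 32.0},
]
with_diff = add_price_difference(rows, "GM")
```

A positive value marks a day on which the brand closed above its open, and a negative value marks the opposite.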
Twitter dataset:
We extracted the content of Musk’s tweets from Sep 30, 2020 to Sep 30, 2021 using the Twitter API.
2.2 Feature Engineering
Twitter dataset:
1) Cleaning:
The main dataset that we extracted from the internet is the Twitter data. We used regular expressions to delete user names (for instance, “@xxx”), image URLs and reference URLs, and then stored all the content in the same format. There can be multiple tweets per day, but we need only one measurement per day. Thus, we aggregated the tweets of the same day and then conducted the sentiment analysis.
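The cleaning and aggregation steps above can be sketched as follows. This is a minimal sketch, not the project’s actual code: the two regular expressions, the function names and the sample tweets are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumed patterns: "@name" mentions and http(s) links (image or reference URLs)
MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def clean_tweet(text):
    """Strip mentions and URLs, then collapse the leftover whitespace."""
    text = MENTION.sub("", text)
    text = URL.sub("", text)
    return " ".join(text.split())


def aggregate_by_day(tweets):
    """Join the cleaned tweets of each day into one string per day."""
    by_day = defaultdict(list)
    for day, text in tweets:
        cleaned = clean_tweet(text)
        if cleaned:  # skip tweets that are empty after cleaning
            by_day[day].append(cleaned)
    return {day: " ".join(parts) for day, parts in by_day.items()}


# Hypothetical (date, text) pairs
tweets = [
    ("2021-01-05", "@xxx Great progress https://t.co/abc123"),
    ("2021-01-05", "Production is ramping up"),
    ("2021-01-06", "https://t.co/def456"),
]
daily = aggregate_by_day(tweets)
```

Each day then has a single text to score, which matches the one-measurement-per-day requirement.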
2) Sentiment Analysis:
We used TextBlob to perform the sentiment analysis and plotted the results together with whether Tesla’s stock went up or down.
The graph above shows that if Elon Musk’s tweets have a positive tone, the scatter points lie above zero, while if they are negative, the points lie below zero. Points on the horizontal line at zero represent a neutral tone. For the stock price, green points mean the price went up, while red points mean the price went down.
We also compared the sentiment of Musk’s tweets with Tesla’s stock price one day later.
The graph above illustrates whether Elon Musk’s tweets from the previous day have some influence on the next day’s stock price. Both graphs show some trend between tone and stock price, so it is reasonable to use a random forest in our model.
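The one-day-delay comparison amounts to pairing each day’s sentiment score with the price direction of the following day. A minimal sketch, assuming that “next day” means the next trading day present in the price data and that dates are ISO strings (so sorting them is chronological); the function name and the sample values are hypothetical.

```python
def pair_with_next_day(sentiment_by_day, went_up_by_day):
    """Pair each day's sentiment with the next trading day's direction."""
    days = sorted(went_up_by_day)  # ISO dates sort chronologically
    pairs = []
    for prev_day, next_day in zip(days, days[1:]):
        if prev_day in sentiment_by_day:  # only days that have tweets
            pairs.append((sentiment_by_day[prev_day], went_up_by_day[next_day]))
    return pairs


# Hypothetical daily sentiment scores and price directions
sentiment = {"2021-01-04": 0.4, "2021-01-05": -0.2}
went_up = {"2021-01-04": True, "2021-01-05": False, "2021-01-06": True}
pairs = pair_with_next_day(sentiment, went_up)
```

The same-day comparison is the special case in which each score is paired with the direction of its own date.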
3. Model
3.1 Avoid overfitting
To find the best degree for the polynomial transformation, we compared the performance at different degrees using the Mean Squared Error (MSE). We used PolynomialFeatures from sklearn to construct the transformations with different degrees. Since this function generates all polynomial and interaction features, that is, all polynomial combinations of the features with degree less than or equal to the given degree, the number of features grows rapidly with the degree, which makes overfitting likely. Therefore, we tried degrees 1, 2 and 3 and compared the loss values. The data were divided into training and testing sets, and the MSE was calculated separately for each.
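To make the growth in feature count concrete, the sketch below counts the monomials of total degree at most d in n input features. This equals the number of output columns when the bias column is included, which is the default (`include_bias=True`) in scikit-learn’s `PolynomialFeatures`. The value n = 6 is only an assumed example (open and close prices for three competitors), and the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb


def count_polynomial_terms(n_features, degree):
    """Count monomials of total degree <= degree, including the constant term."""
    total = 0
    for d in range(degree + 1):
        # each multiset of d feature indices is one monomial of degree d
        total += sum(1 for _ in combinations_with_replacement(range(n_features), d))
    return total


# Assumed example: 6 input features
for degree in (1, 2, 3):
    n_terms = count_polynomial_terms(6, degree)
    # closed form: C(n + d, d)
    assert n_terms == comb(6 + degree, degree)
    print(degree, n_terms)
```

For six inputs this gives 7, 28 and 84 columns for degrees 1, 2 and 3, so a small training set is quickly outnumbered by the features it has to fit.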
From the plot, as the degree increases from 1 to 3, the training MSE decreases, while the testing MSE first decreases and then increases, which means the model starts to overfit when the degree is larger than 2.
3.2 Model Performance Analysis
3.2.1 Linear Regression
In the first part, we used all the historical stock prices, including both Tesla and the other stocks, to predict the stock price of Tesla. From the plot below, we can see that the predictions and the true data are almost identical and fit the linear regression model.
In the second part, we excluded Tesla from the data set and used only the other stocks to predict the stock price of Tesla. According to the plot below, many more points lie away from the linear regression line. This is a reasonable result, since the prediction depends only on the market performance without Tesla itself, which can cause a larger error relative to the actual price.
Lastly, we again used the historical stock prices excluding Tesla as our data set. This time we tested the polynomial transformation of degree 2, since in Section 3.1 we found that 2 is the best degree for the model. Fewer outliers appear in the plot, which means the polynomial transformation model fits the data set better than the linear regression.
3.2.2 Neural Network
After the feature engineering for the Twitter dataset in the data processing part, we found a correlation between price and tweet tone. In the previous models, we predicted the stock price with a polynomial transformation model and reduced the number of outliers relative to the actual price. We therefore decided to look for deeper underlying relationships between the price trend and other factors. We used the stock prices of the three competitors (GM, F, VWAGY), the sentiment scores and Tesla’s trend on the day before (True or False) as predictors of the price trend on the next day.
The code chunk shown above defines the architecture of our neural network and its forward function. In the training step, we used 100 epochs and a constant learning rate of 1×10^-4 without learning rate decay. The loss was computed with a binary cross-entropy loss function because our output is binary.
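For reference, the binary cross-entropy loss named above is the mean of -[y log p + (1 - y) log(1 - p)] over the samples, where y is the 0/1 label and p the predicted probability of the positive class. The plain-Python sketch below only illustrates this standard definition and is not the project’s training code; the clipping constant `eps` and the sample values are assumptions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical labels (1 = price went up) and predicted probabilities
loss = binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6])
```

A prediction of 0.5 for every sample gives a loss of ln 2 (about 0.693), which is a useful reference point when reading training curves.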
The resulting accuracy, though below 0.5, was the best we could obtain by tuning the parameters.
However, we aimed for a better prediction. Thus, we decided to drop several inputs and switch to a new model to see whether that would improve the performance.
3.2.3 Random Forest
In the earlier feature engineering part, we compared the sentiment of Elon Musk’s tweets with the performance of Tesla’s stock price on the same day. The visualization suggests that there is a correlation between the two variables.
So for the third model, we use a random forest to separate the tweets that potentially have a positive influence on the market from those that do not. Compared with a traditional time series model, the distinguishing feature of this random forest model is that it looks at the connection between the price and the content of Musk’s tweets. It can only tell whether the price is going up or down based on Elon Musk’s tweets. For this model, we calculated the accuracy for both the same-day input and the one-day-delay input. They are around 0.655 and 0.431 respectively, which means that for the same-day model about 65.5% of the predictions are correct, while for the one-day-delay model only 43.1% of the price trend predictions are correct.
The graph above shows the accuracy of the random forest model for the same-day input. Red scatter points represent correct predictions, while black points represent wrong predictions.
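The accuracy figures quoted above are the fraction of days on which the predicted direction matches the actual one. A minimal sketch, with hypothetical labels that are not taken from the project’s data:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical up/down labels (True = price went up)
actual = [True, False, True, True]
predicted = [True, True, True, False]
score = accuracy(actual, predicted)
```

For a binary up/down target, a value near 0.5 is about what random guessing would achieve on balanced data.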
4. Conclusion
The historical stock price data set that contains Tesla clearly predicts the stock price best. If we exclude Tesla, the polynomial transformation model of degree 2 improves the prediction compared with the linear regression model. We then calculated the accuracy of predicting price trends with different models. In the neural network model, we used the three competitors’ prices, the sentiment scores and Tesla’s previous-day trend to predict the price trend, but the accuracy was lower than 0.5, which is not useful. So we decided to reduce the inputs, and the random forest model proved more effective: its accuracy is 0.431 for the one-day-delay input and 0.655 for the same-day input. Though our training set is small, which produces higher variance in the prediction, the model still shows a correlation between the price and the Twitter content.
4.1 Weapon of Math Destruction
In the linear regression model, our output is the predicted stock price based on different stock combinations. Thus, the outputs are floats that are easy to measure. In the neural network model, we predicted whether the Tesla price would go up or down and encoded the outcome as True or False. These binary outputs are also measurable. The output of the random forest has the same form. In all, our response variables can be measured quantitatively.
Our models use stock prices and Musk’s tweets to predict Tesla’s stock price and whether the price will go up. All the resources and references are open to the public, which means people are free to use them to build their own models and predict the prices. Thus, our model will not harm anyone.
Finally, since we used only historical stock prices and Twitter content to predict the price, the model does not create a feedback loop: we do not use predicted values as features.
In conclusion, our project is unlikely to produce a Weapon of Math Destruction.
4.2 Fairness
Our team does not think fairness is a major concern for our models. We used all the stock prices and relevant Twitter information as our data set, so there is no discrimination or bias in the data set. Also, since we are trying to predict Tesla’s stock price, a small error relative to the actual price is acceptable, because a company will never invest in a single stock and a portfolio also reduces the risk. Finally, predicting a stock price will not affect anyone’s legal status. Thus, fairness is not an important factor in the models.
5. Limitation and future improvements
5.1 Limitation
The largest limitation is the lack of data. The first constraint comes from the API: we were allowed to extract only one year of Twitter data, and this insufficient data increases variance and can cause bias. Also, we could get only the content of the tweets, not the number of likes and reposts, which might also influence the prediction. Moreover, we planned to add information about successful SpaceX launches as a feature, but it was hard to get from the website. Finally, we got only open and close prices from the stock information and lack other information about the stock, such as volume, market capitalization, and high and low prices.
5.2 Improvement
We could also use positive and negative news about Tesla and related companies as an additional data source, together with stock performance, to better predict the stock price.
Case Study Title
Abstract—This electronic document is a “live” template and already defines the components of your paper [title, text, heads, etc.] in its style sheet. *CRITICAL: Do Not Use Symbols, Special Characters, Footnotes, or Math in Paper Title or Abstract. ( Abstract )
I. Introduction
This template, modified in MS Word 2007 and saved as a “Word 97-2003 Document” for the PC, provides authors with most of the formatting specifications needed for preparing electronic versions of their papers. All standard paper components have been specified for three reasons: (1) ease of use when formatting individual papers, (2) automatic compliance to electronic requirements that facilitate the concurrent or later production of electronic products, and (3) conformity of style throughout a conference proceedings. Margins, column widths, line spacing, and type styles are built-in; examples of the type styles are provided throughout this document and are identified in italic type, within parentheses, following the example. Some components, such as multi-leveled equations, graphics, and tables are not prescribed, although the various table text styles are provided. The formatter will need to create these components, incorporating the applicable criteria that follow.
II. Ease of Use
A. Selecting a Template (Heading 2)
First, confirm that you have the correct template for your paper size. This template has been tailored for output on the A4 paper size. If you are using US letter-sized paper, please close this file and download the Microsoft Word, Letter file.
B. Maintaining the Integrity of the Specifications
The template is used to format your paper and style the text. All margins, column widths, line spaces, and text fonts are prescribed; please do not alter them. You may note peculiarities. For example, the head margin in this template measures proportionately more than is customary. This measurement and others are deliberate, using specifications that anticipate your paper as one part of the entire proceedings, and not as an independent document. Please do not revise any of the current designations.
III. Prepare Your Paper Before Styling
Before you begin to format your paper, first write and save the content as a separate text file. Complete all content and organizational editing before formatting. Please note sections A-D below for more information on proofreading, spelling and grammar.
Keep your text and graphic files separate until after the text has been formatted and styled. Do not use hard tabs, and limit use of hard returns to only one return at the end of a paragraph. Do not add any kind of pagination anywhere in the paper. Do not number text heads; the template will do that for you.
A. Abbreviations and Acronyms
Define abbreviations and acronyms the first time they are used in the text, even after they have been defined in the abstract. Abbreviations such as IEEE, SI, MKS, CGS, sc, dc, and rms do not have to be defined. Do not use abbreviations in the title or heads unless they are unavoidable.
B. Units
· Use either SI (MKS) or CGS as primary units. (SI units are encouraged.) English units may be used as secondary units (in parentheses). An exception would be the use of English units as identifiers in trade, such as “3.5-inch disk drive”.
· Avoid combining SI and CGS units, such as current in amperes and magnetic field in oersteds. This often leads to confusion because equations do not balance dimensionally. If you must use mixed units, clearly state the units for each quantity that you use in an equation.
· Do not mix complete spellings and abbreviations of units: “Wb/m²” or “webers per square meter”, not “webers/m²”. Spell out units when they appear in text: “. . . a few henries”, not “. . . a few H”.
· Use a zero before decimal points: “0.25”, not “.25”. Use “cm³”, not “cc”. (bullet list)
C. Equations
The equations are an exception to the prescribed specifications of this template. You will need to determine whether or not your equation should be typed using either the Times New Roman or the Symbol font (please no other font). To create multileveled equations, it may be necessary to treat the equation as a graphic and insert it into the text after your paper is styled.
Number equations consecutively. Equation numbers, within parentheses, are to position flush right, as in (1), using a right tab stop. To make your equations more compact, you may use the solidus ( / ), the exp function, or appropriate exponents. Italicize Roman symbols for quantities and variables, but not Greek symbols. Use a long dash rather than a hyphen for a minus sign. Punctuate equations with commas or periods when they are part of a sentence, as in:
$$a + b = \gamma \qquad (1)$$
Note that the equation is centered using a center tab stop. Be sure that the symbols in your equation have been defined before or immediately following the equation. Use “(1)”, not “Eq. (1)” or “equation (1)”, except at the beginning of a sentence: “Equation (1) is . . .”
D. Some Common Mistakes
· The word “data” is plural, not singular.
· The subscript for the permeability of vacuum μ₀, and other common scientific constants, is zero with subscript formatting, not a lowercase letter “o”.
· In American English, commas, semicolons, periods, question and exclamation marks are located within quotation marks only when a complete thought or name is cited, such as a title or full quotation. When quotation marks are used, instead of a bold or italic typeface, to highlight a word or phrase, punctuation should appear outside of the quotation marks. A parenthetical phrase or statement at the end of a sentence is punctuated outside of the closing parenthesis (like this). (A parenthetical sentence is punctuated within the parentheses.)
· A graph within a graph is an “inset”, not an “insert”. The word alternatively is preferred to the word “alternately” (unless you really mean something that alternates).
· Do not use the word “essentially” to mean “approximately” or “effectively”.
· In your paper title, if the words “that uses” can accurately replace the word “using”, capitalize the “u”; if not, keep using lower-cased.
· Be aware of the different meanings of the homophones “affect” and “effect”, “complement” and “compliment”, “discreet” and “discrete”, “principal” and “principle”.
· Do not confuse “imply” and “infer”.
· The prefix “non” is not a word; it should be joined to the word it modifies, usually without a hyphen.
· There is no period after the “et” in the Latin abbreviation “et al.”.
· The abbreviation “i.e.” means “that is”, and the abbreviation “e.g.” means “for example”.
An excellent style manual for science writers is [7].
IV. Using the Template
After the text edit has been completed, the paper is ready for the template. Duplicate the template file by using the Save As command, and use the naming convention prescribed by your conference for the name of your paper. In this newly created file, highlight all of the contents and import your prepared text file. You are now ready to style your paper; use the scroll down window on the left of the MS Word Formatting toolbar.
A. Authors and Affiliations
The template is designed for, but not limited to, six authors. A minimum of one author is required for all conference articles. Author names should be listed starting from left to right and then moving down to the next line. This is the author sequence that will be used in future citations and by indexing services. Names should not be listed in columns nor group by affiliation. Please keep your affiliations as succinct as possible (for example, do not differentiate among departments of the same organization).
1) For papers with more than six authors: Add author names horizontally, moving to a third row if needed for more than 8 authors.
2) For papers with less than six authors: To change the default, adjust the template as follows.
a) Selection: Highlight all author and affiliation lines.
b) Change number of columns: Select the Columns icon from the MS Word Standard toolbar and then select the correct number of columns from the selection palette.
c) Deletion: Delete the author and affiliation lines for the extra authors.
B. Identify the Headings
Headings, or heads, are organizational devices that guide the reader through your paper. There are two types: component heads and text heads.
Component heads identify the different components of your paper and are not topically subordinate to each other. Examples include Acknowledgments and References and, for these, the correct style to use is “Heading 5”. Use “figure caption” for your Figure captions, and “table head” for your table title. Run-in heads, such as “Abstract”, will require you to apply a style (in this case, italic) in addition to the style provided by the drop down menu to differentiate the head from the text.
Text heads organize the topics on a relational, hierarchical basis. For example, the paper title is the primary text head because all subsequent material relates and elaborates on this one topic. If there are two or more sub-topics, the next level head (uppercase Roman numerals) should be used and, conversely, if there are not at least two sub-topics, then no subheads should be introduced. Styles named “Heading 1”, “Heading 2”, “Heading 3”, and “Heading 4” are prescribed.
C. Figures and Tables
a) Positioning Figures and Tables: Place figures and tables at the top and bottom of columns. Avoid placing them in the middle of columns. Large figures and tables may span across both columns. Figure captions should be below the figures; table heads should appear above the tables. Insert figures and tables after they are cited in the text. Use the abbreviation “Fig. 1”, even at the beginning of a sentence.
TABLE I. Table Type Styles
Table Head | Table Column Head | | |
 | Table column subhead | Subhead | Subhead |
copy | More table copy^a | | |
Fig. 1. Example of a figure caption. (figure caption)
Figure Labels: Use 8 point Times New Roman for Figure labels. Use words rather than symbols or abbreviations when writing Figure axis labels to avoid confusing the reader. As an example, write the quantity “Magnetization”, or “Magnetization, M”, not just “M”. If including units in the label, present them within parentheses. Do not label axes only with units. In the example, write “Magnetization (A/m)” or “Magnetization {A[m(1)]}”, not just “A/m”. Do not label axes with a ratio of quantities and units. For example, write “Temperature (K)”, not “Temperature/K”.
References
The template will number citations consecutively within brackets [1]. The sentence punctuation follows the bracket [2]. Refer simply to the reference number, as in [3]—do not use “Ref. [3]” or “reference [3]” except at the beginning of a sentence: “Reference [3] was the first …”
Number footnotes separately in superscripts. Place the actual footnote at the bottom of the column in which it was cited. Do not put footnotes in the abstract or reference list. Use letters for table footnotes.
Unless there are six authors or more give all authors’ names; do not use “et al.”. Papers that have not been published, even if they have been submitted for publication, should be cited as “unpublished” [4]. Papers that have been accepted for publication should be cited as “in press” [5]. Capitalize only the first word in a paper title, except for proper nouns and element symbols.
For papers published in translation journals, please give the English citation first, followed by the original foreign-language citation [6].
[1] G. Eason, B. Noble, and I. N. Sneddon, “On certain integrals of Lipschitz-Hankel type involving products of Bessel functions,” Phil. Trans. Roy. Soc. London, vol. A247, pp. 529–551, April 1955. (references)
[2] J. Clerk Maxwell, A Treatise on Electricity and Magnetism, 3rd ed., vol. 2. Oxford: Clarendon, 1892, pp.68–73.
[3] I. S. Jacobs and C. P. Bean, “Fine particles, thin films and exchange anisotropy,” in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271–350.
[4] K. Elissa, “Title of paper if known,” unpublished.
[5] R. Nicole, “Title of paper with only first word capitalized,” J. Name Stand. Abbrev., in press.
Date | High | Low | Open | Close | Volume | Adj Close |
2019-01-02 | 63.0260009765625 | 59.7599983215332 | 61.220001220703125 | 62.02399826049805 | 58293000.0 | 62.02399826049805 |
2019-01-03 | 61.880001068115234 | 59.47600173950195 | 61.400001525878906 | 60.071998596191406 | 34826000.0 | 60.071998596191406 |
2019-01-04 | 63.599998474121094 | 60.54600143432617 | 61.20000076293945 | 63.53799819946289 | 36970500.0 | 63.53799819946289 |
2019-01-07 | 67.3479995727539 | 63.54999923706055 | 64.34400177001953 | 66.99199676513672 | 37756000.0 | 66.99199676513672 |
2019-01-08 | 68.802001953125 | 65.40399932861328 | 68.39199829101562 | 67.06999969482422 | 35042500.0 | 67.06999969482422 |
2019-01-09 | 68.69999694824219 | 66.29399871826172 | 67.0999984741211 | 67.70600128173828 | 27164500.0 | 67.70600128173828 |
2019-01-10 | 69.0780029296875 | 66.35800170898438 | 66.87999725341797 | 68.99400329589844 | 30282000.0 | 68.99400329589844 |
2019-01-11 | 69.68199920654297 | 67.75399780273438 | 68.41799926757812 | 69.4520034790039 | 25195500.0 | 69.4520034790039 |
2019-01-14 | 68.5 | 66.80000305175781 | 68.47599792480469 | 66.87999725341797 | 26236500.0 | 66.87999725341797 |
2019-01-15 | 69.76000213623047 | 66.9000015258789 | 67.0 | 68.88600158691406 | 30283000.0 | 68.88600158691406 |