Please use **Google Chrome** or **Mozilla Firefox** to see the animations properly.

Edexcel has made a constructive move by introducing its Large Data Set (LDS) as part of the statistics course in A Level Applied Mathematics. It contains a substantial amount of data that can be used to experiment with numerous statistical concepts. With it, those who show a knack for statistics can explore many avenues in the field that stem from modern data science and Artificial Intelligence (AI).

The main examination boards in the United Kingdom have recognised the need for large data sets, a marked departure from traditional data tables with relatively small amounts of data; the data science industry needs people with real practical skills in this realm, not just folks who understand the concepts. In this context, the introduction of the Large Data Set by Edexcel is a step in the right direction.

The LDS covers data recorded in 1987 at five towns in the United Kingdom, a city in China, a city in the US and a city in Australia. Jacksonville in the US and Perth in Australia are in the Northern and Southern Hemispheres respectively.

In this tutorial, you will learn the following *interactively:*

- Plotting two sets of data on the same grid over the same time period
- Changing the time period interactively to see how the plots change
- Finding the location and spread of the data during a given period
- Checking whether the two sets of data are correlated
- Taking a random sample of a size of your choice
- The precautions you should take while taking samples from this data set
- Interactive boxplots
- The units of oktas and knots, fully explained
- The areas where you should exercise restraint when forecasting UK-wide weather from this particular data set

The Edexcel large data set covers the following data:

- Daily Mean Temperature (0900-0900) (°C)
- Daily Total Rainfall (0900-0900) (mm)
- Daily Total Sunshine (0000-2400) (hrs)
- Daily Mean Windspeed (0000-2400) (kn)
- Daily Mean Windspeed (0000-2400) (Beaufort conversion)
- Daily Maximum Gust (0000-2400) (kn)
- Daily Maximum Relative Humidity %
- Daily Mean Total Cloud (oktas)
- Daily Mean Visibility (Dm)
- Daily Mean Pressure (hPa)
- Daily Mean Wind Direction (°)
- Cardinal Direction
- Daily Max Gust Corresponding Direction (°)
- Cardinal Direction

Please note that some cells in the Excel large data set contain no data and are marked *n/a*, a serious challenge for a developer to overcome before plotting.
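One way to deal with the missing cells before plotting is sketched below. It is only an illustration: the helper name `cleanColumn` and the sample cells are made up, and the sketch assumes the column has already been read into an array of strings.

```javascript
// Hypothetical helper: turn raw spreadsheet cells into numbers,
// dropping empty cells, "n/a" markers and any other non-numeric entry.
function cleanColumn(cells) {
  return cells
    .map((cell) => String(cell).trim())
    .filter((cell) => cell !== "" && cell.toLowerCase() !== "n/a")
    .map(Number)
    .filter((value) => Number.isFinite(value));
}

// Made-up cells standing in for one column of the spreadsheet.
const rawTemperatures = ["n/a", "12.4", " 13.1 ", "N/A", "", "11.8"];
const temperatures = cleanColumn(rawTemperatures);

console.log(temperatures); // [ 12.4, 13.1, 11.8 ]
```

Only the cleaned array should be passed on to the chart or to any statistical calculation.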

The large data set can be downloaded from the following link (2020):

**Download Edexcel Large Data Set**

The following data locations and data spread are interactively updated:

In the animation, you can change the period of data using a slider below the chart; not only does the chart get updated, but also locations of data and spread for that particular period are updated accordingly.

| | Mean | Mode | Median | Standard Deviation | Maximum | Minimum | Interquartile Range |
|---|---|---|---|---|---|---|---|
| Temperature (°C) | | | | | | | |
| Cloud Cover (oktas) | | | | | | | |

Please note that cloud cover, measured in oktas, is a discrete variable.

**Mean:** x̄ = Σx / n

**Standard Deviation:** σ = √(Σ(x - x̄)²/n) or σ = √(Σf(x - x̄)²/Σf)

**Q1:** 25% of data lies below this

**Q3:** 75% of data lies below this

**Median:** 50% of data lies below this

**IQR:** Q3 - Q1
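The definitions above can be sketched in plain JavaScript as follows. The sample values are made up, and the quartiles use the "median of each half" convention, which is an assumption: textbooks, calculators and libraries may use slightly different quartile rules and so give slightly different values.

```javascript
// Mean: sum of the values divided by how many there are.
function mean(values) {
  return values.reduce((sum, x) => sum + x, 0) / values.length;
}

// Standard deviation with n in the denominator, as in the formula above.
function standardDeviation(values) {
  const m = mean(values);
  const sumOfSquares = values.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return Math.sqrt(sumOfSquares / values.length);
}

// Median of an array that is already sorted in ascending order.
function medianOfSorted(sorted) {
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Quartiles as the medians of the lower and upper halves (assumed convention).
function quartiles(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const half = Math.floor(sorted.length / 2);
  const q1 = medianOfSorted(sorted.slice(0, half));
  const q3 = medianOfSorted(sorted.slice(sorted.length - half));
  return { q1, median: medianOfSorted(sorted), q3, iqr: q3 - q1 };
}

// Made-up temperatures in °C, not taken from the data set.
const sampleTemperatures = [2, 4, 4, 4, 5, 5, 7, 9];

console.log(mean(sampleTemperatures));              // 5
console.log(standardDeviation(sampleTemperatures)); // 2
console.log(quartiles(sampleTemperatures));         // { q1: 4, median: 4.5, q3: 6, iqr: 2 }
```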

When the data values are very large or very small, we use coding to make calculations easier.

**E.g.**

x: 111, 121, 131, 141, 151

This data set can be turned into y as follows by coding:

Let y = (x - 1)/10

y: 11, 12, 13, 14, 15

ȳ = Σy/5 = (11 + 12 + 13 + 14 + 15)/5 = 13

Now, the measures of location can be found in terms of y and then turned into the corresponding x values.

The same process can be used if the data in question is too small.

**Turning Coded Values into Original Values**

Let y = (x - a)/b, where a and b are constants. x and y are original and coded values respectively.

ȳ = Σy/n

= Σ(x - a)/(nb)

= Σx/(nb) - Σa/(nb)

= x̄/b - na/(nb)

= x̄/b - a/b

**x̄ = bȳ + a**

If ȳ, a and b are known, x̄ can be calculated easily.

In the above example, ȳ = 13; a = 1; b = 10

x̄ = 10ȳ + 1

x̄ = 131
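The worked example can be checked with a few lines of JavaScript. This is a minimal sketch using the same values of x, a and b; the function names are illustrative. It also checks the related standard result that coding changes the standard deviation only through b, that is σx = bσy, which is not derived above.

```javascript
const a = 1;
const b = 10;
const x = [111, 121, 131, 141, 151];

// Code each value with y = (x - a) / b.
const y = x.map((value) => (value - a) / b); // [11, 12, 13, 14, 15]

const mean = (values) => values.reduce((sum, v) => sum + v, 0) / values.length;
const standardDeviation = (values) => {
  const m = mean(values);
  return Math.sqrt(values.reduce((sum, v) => sum + (v - m) ** 2, 0) / values.length);
};

// Decode the mean with x̄ = bȳ + a.
const yBar = mean(y);
const xBar = b * yBar + a;

console.log(yBar);              // 13
console.log(xBar);              // 131
console.log(xBar === mean(x));  // true

// Subtracting a shifts every value equally, so only b affects the spread.
const sigmaXFromCoding = b * standardDeviation(y);
console.log(Math.abs(sigmaXFromCoding - standardDeviation(x)) < 1e-9); // true
```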

**Variables** are characteristics, numbers or quantities that can be counted or measured.

E.g. wind speed, cloud cover, number of fish in a lake, number of girls in a class with black hair

In the large data set, LDS, the following units are used to represent the **wind speed** and **cloud cover.**

**Knots**

One knot is a speed of one nautical mile per hour, so the wind speed in knots is the number of nautical miles the air travels in an hour.

Nautical miles are used for navigation.

**1 knot ≈ 1.15 mph**

You can convert **knots** into **mph** by using the following; just put the value in the text box and move the mouse out:
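The conversion itself is a single multiplication. A minimal sketch, using the rounded factor quoted above (the function name is illustrative):

```javascript
// Rounded conversion factor used in this tutorial (1 knot is roughly 1.15 mph).
const MPH_PER_KNOT = 1.15;

function knotsToMph(knots) {
  return knots * MPH_PER_KNOT;
}

console.log(knotsToMph(10).toFixed(1)); // "11.5"
console.log(knotsToMph(0));             // 0
```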

**Okta**

This is the unit of measurement of cloud cover. It's a discrete unit that ranges from 0 to 8, hence the name, which comes from the Greek word for eight.

◯ - 0 okta: clear sky

◔ - 2 oktas: ¼ of the sky covered by clouds

◑ - 4 oktas: ½ of the sky covered by clouds

◕ - 6 oktas: ¾ of the sky covered by clouds

⬤ - 8 oktas: the sky completely covered by clouds

You can take random samples from the LDS, provided that you know how to avoid the cells with no data. For instance, there is no data in the first 16 cells of the **Daily mean wind speed** column. If you treat the whole column as the population and a random number lands in that region, the sample will contain a missing value and any calculation based on it will fail. The same applies to systematic sampling.
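A sketch of both sampling methods is given below. It filters out the missing cells first and samples only from what remains. The column values are made up, the function names are illustrative, and the `random` parameter is an assumption added so that the sample can be made reproducible.

```javascript
// Simple random sample without replacement, using a partial Fisher-Yates shuffle.
function randomSample(values, size, random = Math.random) {
  if (size > values.length) throw new RangeError("Sample size exceeds population size");
  const pool = [...values];
  for (let i = 0; i < size; i++) {
    // Pick a random position from the part of the pool not yet chosen.
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, size);
}

// Systematic sample: every step-th valid reading, starting at index `start`.
function systematicSample(values, size, start = 0) {
  const step = Math.floor(values.length / size);
  if (step < 1) throw new RangeError("Sample size exceeds population size");
  if (start < 0 || start >= step) throw new RangeError("Start must lie within the first interval");
  const sample = [];
  for (let i = 0; i < size; i++) sample.push(values[start + i * step]);
  return sample;
}

// Made-up column: the leading cells are missing, as in the wind speed column.
const column = ["n/a", "n/a", 7, 9, 12, 8, 10, 11, 6, 9];
const valid = column.filter((cell) => typeof cell === "number");

console.log(randomSample(valid, 3));     // three readings, different on each run
console.log(systematicSample(valid, 4)); // [ 7, 12, 10, 6 ]
```

Because the population is the filtered array rather than the whole column, neither method can land on an empty cell.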

These samples from the LDS do not lead to an accurate or reliable forecast of the UK weather for the following reasons:

- The data does not cover the entire United Kingdom.
- The data covers just five areas of the country.
- The data covers only six months of the year, from May to October.


The following interactive chart checks whether there is any correlation between the daily temperature and the cloud cover in the Heathrow area in the United Kingdom. The temperature and cloud cover are plotted along the x-axis and y-axis respectively; the units are °C and oktas respectively.

Data Source: Edexcel
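Besides judging the scatter by eye, the strength of a linear correlation can be measured with the product moment correlation coefficient, r. The sketch below is one way to compute it in plain JavaScript; it is not necessarily how the chart above does it, and the paired readings are made up.

```javascript
// Product moment correlation coefficient: r = Sxy / sqrt(Sxx * Syy).
function correlation(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new RangeError("Need two lists of equal length with at least two values");
  }
  const n = xs.length;
  const meanX = xs.reduce((sum, v) => sum + v, 0) / n;
  const meanY = ys.reduce((sum, v) => sum + v, 0) / n;
  let sxy = 0;
  let sxx = 0;
  let syy = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxy += dx * dy;
    sxx += dx * dx;
    syy += dy * dy;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Made-up paired readings: temperature in °C and cloud cover in oktas.
const temperature = [12, 14, 15, 17, 19, 21];
const cloudCover = [7, 6, 6, 4, 3, 2];

const r = correlation(temperature, cloudCover);
console.log(r.toFixed(2)); // close to -1, a strong negative correlation in this made-up sample
```

A value of r near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none.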

The following histogram is based on the cloud cover data for Heathrow from May to October 1987. Cloud cover is measured in oktas, a discrete variable. The chart is fully interactive.

Data Source: Edexcel

Since the data in question is discrete, the above chart can also be described as a *bar chart.*

The following histogram is based on continuous data, collected over a period of six months in 1987 in the Heathrow area. The data shows that relative humidity stayed above 65% most of the time, which is why the histogram has been restricted to just 3 classes.

Data Source: Edexcel

The boxplot below is based on the data collected from May 1987 to October 1987 in the Heathrow area in the UK, home to one of the busiest airports in the world. As the chart shows, the relative humidity fluctuated between 70% and 90% during the six months covering the summer and autumn seasons. The boxplot is fully interactive.

Data Source: Edexcel

The following interactive animation compares the daily average temperatures at Camborne and Heathrow.

Change the size of the sample and keep an eye on the boxplots and the frequency tables, as they are automatically updated.

When comparing two data sets, please note the following:

1) Compare median and interquartile range

2) Compare mean and standard deviation

3) Do not compare median and standard deviation

4) Do not compare mean and interquartile range


The above histogram shows a sample of the daily mean temperatures at Heathrow from May to October 1987. Answer the following when the sample size is 110:

- Find a formula connecting the frequency of a class with the area of its bar
- Find the frequencies of the classes 6 - 8 and 18 - 20

1) Let's take the sample of 110 data values - you can change the sample size to whatever value you want.

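Since the frequency of a bar of a histogram ∝ area,

f ∝ area

f = k × area

Substituting a class from the histogram whose frequency is 48 and whose bar has an area of 24:

48 = k × 24

24k = 48

k = 2

**f = 2A**

2) For the class 6 - 8, the area of the bar is 1:

f = 2 × 1

= 2

For the class 18 - 20, the area of the bar is 5:

f = 2 × 5

= 10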
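The same calculation can be sketched in JavaScript, using the numbers from the worked solution (the function names are illustrative):

```javascript
// Scale factor k in f = k × area, found from one bar whose
// frequency and area are both known.
function scaleFactor(knownFrequency, knownArea) {
  return knownFrequency / knownArea;
}

const k = scaleFactor(48, 24);
const frequencyFromArea = (area) => k * area;

console.log(k);                    // 2
console.log(frequencyFromArea(1)); // 2  (class 6 - 8)
console.log(frequencyFromArea(5)); // 10 (class 18 - 20)
```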

With the following animation, you can see how minimising the *residual sum of squares* determines the *regression line.* Move the data points closer to the line with your mouse and watch the equation of the regression line change. It's fun, isn't it?
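The idea behind the animation can be sketched as follows: the least squares regression line is the line for which the residual sum of squares is as small as possible. The points below are made up, and the function names are illustrative rather than those used in the animation.

```javascript
// Least squares line y = intercept + slope * x, from Sxy and Sxx.
function leastSquaresLine(points) {
  const n = points.length;
  const meanX = points.reduce((sum, [x]) => sum + x, 0) / n;
  const meanY = points.reduce((sum, [, y]) => sum + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (const [x, y] of points) {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  }
  const slope = sxy / sxx;
  return { slope, intercept: meanY - slope * meanX };
}

// Sum of the squared vertical distances between the points and a given line.
function residualSumOfSquares(points, { slope, intercept }) {
  return points.reduce((sum, [x, y]) => sum + (y - (intercept + slope * x)) ** 2, 0);
}

const points = [[1, 2], [2, 3], [3, 5], [4, 4]]; // made-up data
const bestLine = leastSquaresLine(points);
const otherLine = { slope: 1, intercept: 1 };    // a plausible-looking alternative

console.log(bestLine);                                           // slope ≈ 0.8, intercept ≈ 1.5
console.log(residualSumOfSquares(points, bestLine).toFixed(2));  // "1.80"
console.log(residualSumOfSquares(points, otherLine).toFixed(2)); // "2.00"
```

Any line other than the least squares line gives a larger residual sum of squares, as the second line shows.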

You will find the following tutorials useful too:

Tutorials for A-Level
Quadratic Equations - interactive
Quadratic Equations - Word Problems
Transformations of Graphs
Graph Creator - interactive
Venn Diagrams - interactive
The Binomial Expansion - interactive
Triangular Square Numbers - interactive
Even and Odd functions
Iteration
Basic Differentiation
Basic Integration
Parametric Equations and Integration
How to use Casio calculator for statistics
Integration by Observation
Volume of Revolution - Integration
Trapezium Rule - Integration
Differential Equations - Integration
Mathematical proof
Standard Deviation
Cumulative Frequency - Quartiles, Median, IQR and box plots
Histograms
Stem and Leaf diagrams
Probability
Linear Interpolation
Probability - tree diagrams
Binomial Distribution
Poisson Distribution
Mechanics (Kinematics)
Mechanics (Dynamics - pulleys)
Mechanics (Motion-time graphs)
Mechanics - Momentum
Decision Maths - Chinese Postman Problem
The Dot Product - scalar product
Exponential Functions and Logarithms
Complex Numbers - Further Maths
Projectile Motion
Polynomial Roots and coefficients
Modelling with Quadratics
Vector Animations
The Chain Rule

I used the **fetch()** function from the JavaScript Fetch API to load the data.

- The data, which comes back as a *promise*, was dissected to extract the required data.
- In order to plot the data, the *Chart.js* library was used.
- In order to find the statistical values, the *simple statistics* JavaScript library was used.
- Two JavaScript functions were created to turn *knots* into *mph* and to get a sample from the data set.
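The loading step described above might look something like the sketch below. It is not the code behind this page: the URL, the column name and the helper names are all hypothetical, and it assumes the spreadsheet has been exported as a simple CSV file without quoted commas.

```javascript
// Hypothetical location of a CSV export of one station's data.
const DATA_URL = "data/heathrow-1987.csv";

// Split CSV text into rows of trimmed cells (no support for quoted commas).
function parseCsv(text) {
  return text
    .trim()
    .split(/\r?\n/)
    .map((line) => line.split(",").map((cell) => cell.trim()));
}

// Extract one named column as numbers, skipping empty and "n/a" cells.
function numericColumn(rows, header) {
  const [headers, ...body] = rows;
  const index = headers.indexOf(header);
  if (index === -1) throw new Error(`Column not found: ${header}`);
  return body
    .map((row) => row[index])
    .filter((cell) => cell !== undefined && cell !== "" && cell.toLowerCase() !== "n/a")
    .map(Number)
    .filter((value) => Number.isFinite(value));
}

// fetch() returns a promise for the response, and response.text() returns
// another promise for the body, so the steps are chained with then().
function loadColumn(url, header, fetchFn = fetch) {
  return fetchFn(url)
    .then((response) => {
      if (!response.ok) throw new Error(`Request failed with status ${response.status}`);
      return response.text();
    })
    .then((text) => numericColumn(parseCsv(text), header));
}

// Small in-memory example, so the parsing can be tried without a server.
const exampleCsv = "Date,Temp\n01/05/1987,n/a\n02/05/1987,12.4\n03/05/1987,13.1\n";
console.log(numericColumn(parseCsv(exampleCsv), "Temp")); // [ 12.4, 13.1 ]
```

In a page, a call such as `loadColumn(DATA_URL, "Temp").then((values) => { /* plot or summarise */ })` would hand the cleaned numbers to the charting and statistics code once the file has arrived.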
