From Compact Discs to Streaming: A Comparison of Eras within the Brazilian Market

The music industry has undergone many changes in the last few decades, notably since vinyl, cassettes and compact discs faded away as streaming platforms took the world by storm. This Digital evolution has made huge volumes of data about music consumption available. Based on such data, we perform cross-era comparisons between Physical and Digital media within the music market in Brazil. First, we build artists' success time series to detect and characterize hot streak periods, defined as high-impact bursts that occur in sequence, in both eras. Then, we identify groups of artists with distinct success levels by applying a cluster analysis based on hot streaks' features. We find the same clusters for both Physical and Digital eras: Spike Hit Artists, Big Hit Artists, and Top Hit Artists. Our results reveal significant changes in the music industry dynamics over the years by identifying the core of each era.

Still, streaming popularization has brought new challenges due to the massive volume of music-related data to process and analyze.
Finding and promoting artists with promising careers is an example of a task that has become more complex and important. In the Physical era, having a major label was essential to break through; but now, artists from small or independent labels can go viral and become popular thanks to streaming services, showing just how inherently dynamic the music industry is. In such a context, combining Artificial Intelligence (AI) tools and techniques may be the key to facing the existing challenges of this task, creating significant benefits for both the artists and the A&R (Artists & Repertoire) executives. In fact, many applications use AI-powered technology in the music industry, such as genre classification (SHINOHARA; FOLEISS; TAVARES, 2019) and success prediction (ARAUJO; CRISTO; GIUSTI, 2019; MARTÍN-GUTIÉRREZ et al., 2020).
Regarding the latter, identifying upcoming artists with outstanding success is crucial, as it helps planning and adjusting marketing directions for their careers.
Generally, musical careers present continuous periods of success above average, defined as hot streaks (HS). The concept has been investigated in many domains, including science (SINATRA et al., 2016), social media (GARIMELLA; WEST, 2019), and creative careers (LIU et al., 2018; JANOSOV; BATTISTON; SINATRA, 2020; LIU et al., 2021). In such a context, we explore more than three decades of the Brazilian music market, assessing the evolution of successful careers by comparing data from the Physical (1990-2015) and Digital (2016-2020) eras. In particular, we build artists' success time series based on sales (Physical era) and streams (Digital era).
Based on such time series, we investigate whether the most successful periods in an artist's career occur chronologically close and detect hot streak periods. Then, we perform a cluster analysis to group artists according to their success level. Finally, we characterize such hot streaks to extract insights into the temporal evolution of musical careers.
This article extends a paper from the 18th Brazilian Symposium on Computer Music (BARBOSA et al., 2021). As new material, we perform a clustered success analysis to investigate whether success is grouped in time for most artists. In addition, we deepen the previous analyses on hot streak periods by analyzing other dimensions such as artist nationality and the position of the first hot streak.
The remainder of this paper is organized as follows. First, we discuss related work in Section 1. Then, we describe the data acquisition process in Section 2. We present a clustered success analysis in Section 3. We detail the methodology used to identify hot streaks in Section 4.
We overview the identified clusters and previous analyses in Section 5. Next, we further enrich the cross-era comparison by including the evolution of genre preference, artist nationality preference in the Brazilian Market, and the first hot streak analysis in Section 6. Finally, we conclude with future directions in Section 7.

Related Work
Although streaming platforms are inherently designed not to interfere with the music production process, their leading role in the music industry is unquestionable: they determine the amount paid to music content producers, and dictate the type of music accessible through their recommendation algorithms (LADEIRA, 2018; MARTÍN-GUTIÉRREZ et al., 2020; OLIVEIRA et al., 2020).
After decades of intense transformations in the music market, the Digital era brought novel challenges, including a substantial volume of data. As human inspection is almost impossible at the scale of music big data, specialized algorithms can help with several tasks in Music Information Retrieval (MIR), including music recommendation (BORGES; QUEIROZ, 2017), automatic genre classification (CORRÊA; RODRIGUEZ, 2016; SHINOHARA; FOLEISS; TAVARES, 2019; ARAÚJO LIMA et al., 2020), algorithmic composition (HOLOPAINEN, 2021) and so on. Another possible benefit is to feed machine-learning models for early prediction of musical success, helping to identify trends and new talent. Indeed, evaluating the impact of human performance is a common practice in many research fields (HENDRICKS; PATEL; ZECKHAUSER, 1993; RABIN; VAYANOS, 2010; RAAB; GULA; GIGERENZER, 2012). The term Hot Streak emerges in such context, as the reference to a specific period within professional careers when the success is significantly higher than the average (LIU et al., 2018).
For individual and creative careers, research assessing impact is much more recent. Liu et al. (2018) consider large-scale careers of artists, film directors and scientists to demonstrate that hot streaks are remarkably universal across diverse domains, yet usually unique across different careers.
In this sense, Garimella and West (2019) use data from Twitter, one of the most popular online social networks, and define users' impact as the reach of their content. Janosov et al. (2020) also consider luck as a crucial ingredient to achieve impact in creative domains. Regarding music, they model the historical artist timelines based on the release year of songs and measure success by the total play counts obtained from Last.fm.
Nonetheless, to the best of our knowledge, no previous studies address the dynamics of music artists' success periods (i.e., hot streaks) within the Brazilian market. Also, despite Brazil's high rates of music consumption, little is known about the key factors driving musical success and defining artists' promising careers. As regional markets have their own success patterns and behavior (OLIVEIRA et al., 2020; DE MELO; MACHADO; DE CARVALHO, 2020), such individual analyses are crucial. Therefore, this work is a step forward towards understanding the specific dynamics of music artist success within the Brazilian market.

Data Acquisition
To perform a cross-era comparative analysis between Physical and Digital media, we focus on musical success in Brazil. Our first data source is Spotify, the most popular global audio streaming service. However, its Charts only comprise data from 2016 onwards. Hence, to describe the Digital Era, we consider the available period (2016-2020). Therefore, we also use the Pró-Música Brasil platform to describe the Physical Era, with data from 1990 to 2015. Next, we detail the data acquisition processes for both Physical (Section 2.1) and Digital media (Section 2.2).

Physical Media
Pró-Música Brasil (PMB) is the official representative body of the record labels in the Brazilian phonographic market. It represents artists in legal and financial instances and issues certification awards, as authorized by record companies. The certification awards recognize the work of performers according to sale numbers in the form of "special discs", i.e., Gold, Platinum and Diamond discs. The data on such awards is available on its website and was collected on February 5th, 2021. The final dataset comprises information on awarded artists, release year, disc category, song/album name and media type since 1990. In PMB, the threshold sales number for each certificate depends on whether the artist is Brazilian or not, as shown in Table 1. However, as such information is not available in PMB, we crawled it from Wikipedia using a Python library. Next, we collect the total sales for each musical work based on the certification awarded, nationality, and PMB's sales metric for the disc award.
Finally, we use Spotify's API 6 to associate each artist with their respective genres specified on the

Digital Media
Between 2016 and 2017, there was a crucial change in PMB's metric, which moved from Physical media (i.e., DVD and CD) to Digital media (i.e., Singles and Albums), as depicted in Figure 3. Meanwhile, streaming was already the primary revenue source for Digital media (58.3%).
FIGURE 2 - Discs certificated in Pró-Música Brasil since 1990. In 2016, there was a metric change in the certification, hence the lack of data.
Given its relevance, we extract data referring to the Digital Era from the weekly Spotify Top 200 Chart, which corresponds to the most streamed songs in Brazil. Each chart entry contains the song's name and its artist(s), the number of streams, the song's Spotify URL and its position on the chart. We collect data from January 2017 to December 2020. We also collect artist data using the

Clustered Success
In this section, we present an analysis of the distribution of success over time. We follow the methodology used by Garimella and West (2019) to investigate whether the most successful periods of each era (years or weeks) occur close to each other in artists' careers. We define artists' career as
Our analyses focus on two main points. First, we investigate the timing of the most successful periods of an artist's career for both eras. Then, we look at the distribution of the difference between the positions of the two most successful periods within artists' careers. To check the robustness of our findings, we repeat each analysis on shuffled careers and verify whether the observed effects persist.

Timing of the most impactful periods
First, we analyze the positions of the five most successful years (Physical Era) and the five most successful weeks (Digital Era) within artists' careers. Figure 4 presents scatter plots of the position of the most successful year of the Physical Era, P(Y1), versus the positions of the other considered years, P(Yi), ∀ i ∈ [2, 5], as well as the Pearson correlation coefficient (r) for each plot. All correlation values are statistically significant (p < 0.05).
We consider all artists from our dataset. The results show a linear correlation for all artists' careers, with the highest Pearson coefficient between the first and second most popular years.
For the third, fourth and fifth years, the correlation decreases. Such a finding reinforces the hypothesis that the most impactful years within an artist's career are more likely to happen close to each other. Regarding the Digital Era, Figure 5 presents scatter plots of the position of the most successful week, P(W1), versus the positions of the other considered weeks, P(Wi), ∀ i ∈ [2, 5], as well as the Pearson correlation coefficient (r) for each plot. All correlation values are statistically significant (p < 0.05). Here, we also consider all artists from our dataset. In contrast to the Physical Era, the results show a clearer and stronger linear correlation for all artists' careers. Likewise, the Pearson coefficient is highest between the first and second most popular weeks. However, although the correlation decreases for the third, fourth and fifth weeks, it remains high. These results differ from those found for the Physical Era, which may be directly influenced by the coarser (yearly) granularity of the Physical data.
Nevertheless, it is still possible to verify that the two most successful periods are more correlated than the others, in both cases, reinforcing the premise that artists' most successful periods group in time.
FIGURE 5 - Scatter plots with Pearson correlation (r) of the position of the most successful week (Digital Era) in artist careers (W1) with W2, W3, W4 and W5, respectively. Each point represents an artist.
We then expand the investigation on correlation values to compare the positions of successive periods. Figure 6 shows a decrease in the correlation in all artists' careers for both eras.
Still, this pattern is not observed in shuffled careers, in which the correlation is always between -0.1 and 0.1 for the Physical Era, and between -0.2 and 0.2 for the Digital Era. Therefore, there is a general trend of clustering within the most successful periods (years or weeks) in artist careers, as they tend to happen close to each other in the success time series.
FIGURE 6 - Correlation between the first and i-th most successful years in the Physical Era (left), and between the first and i-th most successful weeks in the Digital Era (right).
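The position-correlation analysis of this section can be illustrated with a minimal, dependency-free sketch. It is not the code used in the study: the function names, the toy careers and the shuffling seed are assumptions introduced only for illustration. Each career is a list of success values (sales per year or streams per week), P(1) and P(i) are the indices of the first and i-th most successful periods, and the Pearson coefficient is computed across artists.

```python
import math
import random


def top_positions(series, k):
    """Indices of the k most successful periods, most successful first."""
    order = sorted(range(len(series)), key=lambda t: series[t], reverse=True)
    return order[:k]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def position_correlation(careers, i):
    """Correlate P(1) with P(i) across artists (i is 1-based, i >= 2)."""
    tops = [top_positions(career, i) for career in careers]
    return pearson([t[0] for t in tops], [t[i - 1] for t in tops])


def shuffle_careers(careers, seed=0):
    """Null model: randomly permute the periods inside each career."""
    rng = random.Random(seed)
    return [rng.sample(career, len(career)) for career in careers]


def toy_career(length, peak):
    # Hypothetical career with a single burst of success centred on `peak`.
    return [max(0.0, 10.0 - 3.0 * abs(t - peak)) + 0.01 * t for t in range(length)]


careers = [toy_career(20, peak) for peak in range(2, 18)]
r_observed = position_correlation(careers, 2)
r_shuffled = position_correlation(shuffle_careers(careers), 2)
print(f"observed r = {r_observed:.2f}, shuffled r = {r_shuffled:.2f}")
```

In these toy careers the two best periods are always adjacent, so the observed correlation is essentially 1, while the shuffled careers serve as the baseline against which the observed value is compared.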

Difference in positions of the most successful periods
In this section, we complement the previous analysis by computing, for each artist, the difference between the positions of the first and second most successful periods to verify whether they happen near each other. For instance, if the two most successful periods of an artist are adjacent (e.g., years 2 and 3 for one artist, and years 5 and 6 for another), subtracting their positions yields a difference of magnitude 1, and the normalized value is close to zero. Thus, if most artists have such small position differences, their success occurs in consecutive periods, i.e., it is grouped in time.

To calculate the difference in the positions of the two most successful periods in artists' careers, we consider P(Y1) and P(Y2) for the Physical Era and P(W1) and P(W2) for the Digital Era. We normalize such a difference by the number N of years/weeks of the artist's time series in the corresponding era. Figure 7 shows that the distribution has a peak around zero for all artists' careers, suggesting that these two periods (years/weeks) are close to each other on the timeline. Note that the results are similar for both eras. Such an outcome agrees with the findings of the previous analysis. Further, when we shuffle artists' careers, the distribution of these differences is markedly different from the original, demonstrating that this behavior of musical careers is not random.
Hence, there is strong evidence that artists may experience periods of outstanding success, or hot streaks, which we investigate in the next section.
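As a small worked example of the normalized position difference, the sketch below computes (P(1) - P(2)) / N for a single career. The function name and the toy series are hypothetical, and the use of a signed difference (rather than its absolute value) is an assumption based on the distribution being described as peaked around zero.

```python
def normalized_top2_gap(series):
    """Signed difference between the positions of the two most successful
    periods, divided by the career length N."""
    n = len(series)
    order = sorted(range(n), key=lambda t: series[t], reverse=True)
    first, second = order[0], order[1]
    return (first - second) / n


# Hypothetical 10-period career: the two best periods are adjacent
# (positions 2 and 3), so the normalized gap is small in magnitude.
career = [1, 2, 9, 8, 3, 1, 0, 2, 1, 1]
print(normalized_top2_gap(career))
```

Collecting this value over all artists and plotting its histogram gives the kind of distribution discussed for Figure 7; repeating the computation on shuffled careers gives the random baseline.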

Hot Streak Detection
To detect hot streaks in the artist's time series, we rely on previous work showing that the most successful points in professional careers tend to happen close to each other (GARIMELLA; WEST, 2019). Hence, we use Piecewise Aggregate Approximation (PAA), a technique that reduces the time series dimensionality to continuous delimited periods within careers. Then, we define a hot streak as the periods in which the success (i.e., Physical sales or Digital streams) is above a certain threshold obtained from the career itself. In other words, the hot streak detection does not consider external factors (e.g., genre and time) because artists reach different levels of success, and choosing a single threshold would make the comparison unfair.
Note that artists' careers may contain points with extreme values for the success metric. Therefore, PAA is a helpful tool to smooth such differences and delimit periods in the careers.
Regarding code, we use the PAA implementation of tslearn (TAVENARD et al., 2020), a Python package for time series analysis. Its only parameter is the number of segments to split the series into (further information on values next).
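To make the smoothing step concrete, the following dependency-free sketch shows what PAA (piecewise aggregate approximation) computes: the series is split into equal-sized segments and each segment is replaced by its mean. This is only an illustration of the idea, not tslearn's implementation; the function name is hypothetical and, for simplicity, the sketch assumes the series length is a multiple of the number of segments.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-sized segment.

    Simplifying assumption: len(series) is divisible by n_segments.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    size = len(series) // n_segments
    return [sum(series[i * size:(i + 1) * size]) / size for i in range(n_segments)]


# Hypothetical success values: the extreme point (30) is smoothed
# into the mean of its segment.
print(paa([1, 3, 10, 30, 2, 2], n_segments=3))
```

The number of segments plays the same role as the single parameter mentioned above: fewer segments mean longer periods and stronger smoothing.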
Finally, we chose a specific threshold for defining the hot streak periods for each artist. Such an individualized approach is based on the percentiles of the success metric, and it allows analyzing the careers of artists with different levels of success. In other words, as success is relative for each artist, we detect HS for widely known artists with higher sales and streams, as well as independent artists who have received only a few certificates and streams.

Existing Clusters and Previous Analyses
Here, we review the previous analysis for both the Physical and Digital Eras (BARBOSA et al., 2021). In Section 5.1, we use the PMB data and Spotify's Brazil Top 200 Charts to build the time series of each artist for the Physical and Digital Eras, respectively. Next, we characterize the hot streaks for both eras and examine their relationship to music genres in Section 5.2. Finally, we perform a cluster analysis to group similar artists based on their success levels in Section 5.3.

Artists' Time Series
In the Physical Era, the evolution of an artist's success is represented by the certificates
From the artists' time series, we detect the hot streak periods by first applying PAA (see Section 4). To do so, we set the number of segments into which the series will be split, as this is the only parameter of the method. After extensive experiments to reach meaningful values, we set the size of each PAA segment to two years for the Physical Era and 12 weeks for the Digital Era.
Hence, we calculate the number of segments by dividing the time series length by the predefined size. In addition, we set the 80th percentile of the success metric in artists' time series as the threshold for defining the hot streak periods.
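The detection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, the percentile is computed on the raw series with linear interpolation, a segment counts as "hot" only when its PAA mean is strictly above the threshold, consecutive hot segments are merged into a single hot streak, and trailing points that do not fill a whole segment are ignored.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between ranks."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def detect_hot_streaks(series, segment_size, q=80):
    """Return hot streaks as (start, end) index pairs, end exclusive."""
    n_segments = len(series) // segment_size
    # PAA step: mean of each fixed-size segment.
    means = [
        sum(series[i * segment_size:(i + 1) * segment_size]) / segment_size
        for i in range(n_segments)
    ]
    # Per-artist threshold taken from the career itself.
    threshold = percentile(series, q)
    streaks, start = [], None
    for i, mean in enumerate(means):
        if mean > threshold and start is None:
            start = i
        elif mean <= threshold and start is not None:
            streaks.append((start * segment_size, i * segment_size))
            start = None
    if start is not None:
        streaks.append((start * segment_size, n_segments * segment_size))
    return streaks


# Hypothetical weekly streams; the segment size of 4 is used only to keep
# the toy example short (the text uses 12 weeks or 2 years).
weekly = [1, 2, 1, 2] + [8, 9, 10, 9] + [9, 8, 9, 10] + [2, 1, 2, 1] * 7
print(detect_hot_streaks(weekly, segment_size=4))
```

Because the threshold is derived from each artist's own series, the same code flags hot streaks for artists with very different absolute sales or stream counts.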

Hot Streak Characterization
We characterize the hot streak periods identified for artists according to their musical genres.
As individually considering closely related music styles may create artist overlapping and bias within the results, we define super-genres for this analysis. For example, we verify that Indie Folk is more frequently associated with Rock than any other super-genre and is then incorporated into Rock.

Cluster Analysis
We now move to the cluster analysis, which helps to better understand the characteristics of different success levels of artists achieved during the Physical and Digital Eras. We apply the K-Means algorithm in the time series and the Elbow method to find its optimal number of clusters.
The considered features for the algorithm include the total number of hot streaks, total sales and the time series threshold. As a result, the method outcome suggests three clusters. We name the clusters Spike Hit Artists (SHA), Big Hit Artists (BHA), and Top Hit Artists (THA); they are described in Table 3 and summarized as follows.
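The clustering step can be sketched with a small, dependency-free K-Means together with the inertia values that the Elbow method inspects. This is an illustration only: the feature vectors are invented and assumed to be already standardized, the farthest-point initialization is an assumption chosen to make the sketch deterministic, and none of the names below come from the study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (labels, inertia)."""
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(
            points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia


# Hypothetical standardized features per artist:
# (number of hot streaks, total sales or streams, time series threshold).
points = [
    (0.0, 0.1, 0.0), (0.1, 0.0, 0.1), (0.0, 0.0, 0.1),
    (1.0, 1.1, 1.0), (1.1, 1.0, 0.9), (0.9, 1.0, 1.0),
    (3.0, 3.1, 3.0), (3.1, 3.0, 2.9),
]

# Elbow method: inspect how the inertia drops as k grows.
for k in range(1, 5):
    _, inertia = kmeans(points, k)
    print(k, round(inertia, 3))
```

The "elbow" is the value of k after which the inertia stops decreasing sharply; in practice a library implementation with several random restarts would be preferable to this single deterministic run.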

Cross-era Comparison
Music is part of people's daily lives regardless of the era experienced, whether Physical or Digital. With musical consumption constantly rising, we may notice similarities between both eras.
From the results obtained in (BARBOSA et al., 2021), we have explored Hot Streaks (HS) in musical careers within the Brazilian market. Such HS periods provide valuable information used in cluster analysis, in which we also notice cross-era similarities. Here, we take a step further by deepening the previous analyses on hot streak periods. Specifically, we discuss Brazilian listeners' main musical genre preferences in Section 6.1. Next, we explore the presence of Brazilian artists as the preferred consumption in the local market in Section 6.2. Finally, we explore when the first Hot Streak occurs in the artists' careers in Section 6.3.

Genre Evolution
Musical genres express the cultural diversity existing in the country. Such diversity can be observed in the number of rhythms and musical styles as well as in the specific characteristics that each of them retains. The most popular genres usually oscillate in the music market. Several factors may influence this oscillation, including the help provided by streaming platforms to spread cultural diversity worldwide in the form of musical genres. Hence, we analyze the temporal evolution of consumption of the main genres in the Brazilian market, both for the Physical and Digital eras.
However, the transition of preference for musical genres over the years is noticeable: in the Physical Era, the predominant rhythms were Axé (e.g., Ivete Sangalo), Sertanejo (e.g., Zezé di Camargo & Luciano) and Rock (e.g., Skank); whereas in the Digital Era, the most successful artists (THA) come from one style, Sertanejo, with more than 50% of streams in late 2020.
Overall, the Digital Era allows the appearance of new popular genres, as well as the decline of previously popular ones. For example, the prevalence of Sertanejo is remarkable over time, while Pop decreases from 2016 to 2020. Moreover, we highlight the rise of Forró in mid-2020 as a well-liked genre, following the growth of popular artists who have burst the regional bubble, such as

Brazilian Artists vs. Foreign Artists
Record companies have been actively working to promote local artists aiming to expand the music ecosystem. Furthermore, according to the IFPI 2022 report, fans are listening to more local artists than ever before, and their music also has the power to go global from day one. As such, the music industry needs to invest in discovering and nurturing the artists of tomorrow. There is a strong predominance of local artists' consumption in Brazil compared to other important countries, such as the USA and European countries. Therefore, our purpose is to identify whether the Brazilian market follows this trend of consuming local artists in the Physical and Digital eras.
Thus, we analyze the distribution of consumption of Brazilian and foreign artists in each period.
Finally, we identify the representativeness of Brazilian and foreign artists in each cluster identified in the previous analyses, also separated by era.
In the Physical Era market, we notice a strong sales presence of Brazilian artists. Figure 11 shows the sales evolution in the Brazilian market for local versus foreign artists, with the number of sales in millions on the left and the percentage of total sales on the right. Sales of Brazilian artists grow steadily, whereas sales of foreign artists tend to remain stable. Indeed, Brazilian artists account for around 90% of sales throughout nearly the entire Physical era. Only around 2012 was there a slight drop in the consumption of local artists, but their sales still represented more than 70% in that period. Hence, Brazilians do indeed favor consuming local musical artists.
Concerning the Digital Era, Figure 12 compares the streaming evolution of Brazilian and foreign artists by the number of streams (left) and its corresponding percentage (right). The beginning of the Digital Era in the Brazilian market is marked by a lower preference for local artists than in the Physical Era: foreign artists reached around 40% of total streams. One possible explanation is that different artists and genres spread more quickly worldwide with the advent of streaming platforms. However, there is a clear growth trend in the consumption of Brazilian artists, which reaches higher levels in 2020, accompanied by a more pronounced decline in the consumption of foreign artists.
Finally, we also analyze the distribution of the presence of Brazilian artists in each of the clusters for the Physical and Digital eras identified in Section 5. Table 4 summarizes the number of Brazilian and foreign artists by era and cluster. In general, there is a strong trend toward local music consumption in the Brazilian market.
We highlight the BHA and THA clusters, which comprise the most successful artists in both eras (i.e., higher sales and streams), including Sandy & Júnior and Anitta. In particular, the second of these clusters (THA) represents the paramount artists. Although, in this cluster, the Physical era has one more foreign artist (five) than Brazilian artists (four), in the Digital era all of its artists are Brazilian, indicating a strong preference for local artists and genres. The SHA cluster, in turn, indicates regular success and accounts for over 90% of the artists.
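The local-versus-foreign comparison above boils down to a per-period share computation, sketched below. The record layout, the function name and the numbers are purely hypothetical and are not values from the datasets; the amount stands for sales in the Physical era or streams in the Digital era.

```python
from collections import defaultdict


def local_share_by_period(records):
    """records: iterable of (period, is_brazilian, amount).

    Returns {period: share of the total amount attributed to Brazilian artists}.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # period -> [local, foreign]
    for period, is_brazilian, amount in records:
        totals[period][0 if is_brazilian else 1] += amount
    return {
        period: local / (local + foreign)
        for period, (local, foreign) in sorted(totals.items())
    }


# Invented toy records, for illustration only.
records = [
    (2017, True, 60), (2017, False, 40),
    (2020, True, 50), (2020, True, 30), (2020, False, 20),
]
print(local_share_by_period(records))
```

The same aggregation, keyed by cluster instead of period, yields the kind of per-cluster counts summarized in Table 4.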

First Hot Streak Analysis
The last comparative aspect is the position of hot streak periods within artist careers. Previous studies on other domains show that such periods are temporally localized and can happen at any point in an individual's sequence of works. Thus, to assess whether there is a similar behavior in the music domain, in both Physical and Digital eras, we investigate at which point artists experience their first stardom. Here, we focus only on the first hot streak, as we detect more than one hot streak for several artists. Figure 13 shows the cumulative distribution of the position of the first hot streak within artist careers in both Physical (left) and Digital (right) eras, grouped by cluster. In the Physical era, in general, the cumulative distribution of the position is very similar among the three identified clusters, probably due to the data granularity (i.e., considered in years and not weeks). Nonetheless, for the BHA and THA clusters, the first explosion of success still occurs earlier than for the artists of the SHA cluster, considering the artists' career time. On the other hand, regarding the Digital era, the two clusters that group artists with the highest success levels (i.e., BHA and THA) stand out much more in comparison to the SHA cluster.
Almost 80% of the BHA/THA artists have their first burst of success early in their careers (i.e., in the first 30% of their timelines), whereas most SHA artists reach their first hot streak much later. Therefore, our results indicate that artists who have achieved their first stardom peak earlier in their careers have a higher overall success, regardless of the musical era. It is important to note that artists may have careers of different lengths depending on their debut date, as the last date in the time series is always the same (i.e., the collection date). However, the main objective of this analysis is to offer a big-picture comparison between the Physical and Digital eras, seeking to highlight the relationship between artists' levels of success and the speed of achieving the first burst of success.
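The cumulative distributions discussed above can be reproduced in spirit with the short sketch below. It assumes hot streaks are given as (start, end) index pairs and that the position of the first hot streak is its start index divided by the career length; both the representation and the toy values are assumptions for illustration.

```python
def first_hot_streak_position(streaks, career_length):
    """Relative position (0 to 1) at which the first hot streak starts."""
    return min(start for start, _ in streaks) / career_length


def empirical_cdf(values, x):
    """Fraction of values that are less than or equal to x."""
    return sum(v <= x for v in values) / len(values)


# Hypothetical relative positions of the first hot streak for five artists
# of one cluster.
positions = [0.1, 0.2, 0.25, 0.3, 0.7]
print(empirical_cdf(positions, 0.3))
```

Evaluating `empirical_cdf` on a grid of thresholds for each cluster gives one curve per cluster, which is the form of comparison shown in Figure 13.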

Conclusion
This article evaluated the success of musical careers in the Brazilian market, comparing two eras: the Physical Era, when listeners purchased Physical media (e.g., LPs, CDs, and DVDs) to engage with their favorite artists; and the Digital Era, when music consumption takes place mainly through streaming, which has democratized access to music, as streaming services do not necessarily require payment (just an internet connection). In this sense, comparing these eras becomes particularly relevant, as it enables identifying similar or divergent patterns that record companies can use to generate valuable insights during decision making. Thus, we performed a cross-era comparative analysis between the Physical and Digital media in the Brazilian music market, which is the largest market for the music industry in Latin America.
First, we found that the artists' most successful periods tend to group in time. Motivated by such results, we built artists' success time series for both eras to identify hot streak periods, defined as continuous high-impact bursts. Next, we characterized such periods to understand the dynamics of success among artists from different musical genres. Although there are similarities among all music styles, our results showed that some genres have meaningful specific patterns for both eras.
Therefore, as in other studies in the MIR field, considering music genre information can be relevant for both the predictive and descriptive models. We also performed a profiling analysis that uncovered three different clusters in both eras: Spike Hit Artists (SHA), Big Hit Artists (BHA), and Top Hit Artists (THA), which acted as class descriptors of successful artists. In addition, we found that artists who have achieved their first stardom peak earlier in their careers have a higher overall success, regardless of the musical era. Finally, we found that Brazilians prefer to consume music by Brazilian artists, and we highlighted the proportion of Brazilian and foreign artists within each identified cluster. Such a pattern repeats in both eras.
Overall, our results shed light on meaningful insights for MIR tasks, such as prediction and recommendation. For example, the identified clusters may serve as input features for musical success prediction models. In addition, they may also help in recommending potentially successful musical partnerships and collaborations. Besides helping the scientific community, this work also benefits the music industry. Analyzing the evolution of artist careers reveals success trends in Brazil from what people consume. Indeed, our results demonstrate that although Brazilians connect with international hit songs, they still have a strong preference for local artists regardless of the era.
Hence, considering regional markets individually is crucial for a better comprehension of the specific factors driving musical success. Finally, understanding hot streak periods and success patterns can enhance the human element in the music industry (e.g., A&R executives and record label CEOs) and people's relationships with music. Our findings may help describe the listeners' behavior and musical trends, allowing the music industry to connect people to songs relevant to them.

Threats to Validity
Here, a limiting factor is that piracy had a high impact on music consumption in Brazil, mainly in the late 2000s and early 2010s. Therefore, data collected from PMB may not precisely reflect Brazilian preference in music (see footnote 9). In addition, although we found similar patterns between the Physical and Digital eras, each data source used considers its own success measure, which can cause biased results. Finally, we only consider artists who are recognized as successful (either through their sales or their position in stream rankings). Future work should overcome such limitations to enhance the results and further advance the state of the art.

ACKNOWLEDGMENT
Work supported by CAPES and CNPq, Brazil.
9 For reference, 52% of the music consumption in Brazil in 2005 came from piracy: https://bit.ly/PiracyReportIFPI